
Psychological Assessment, 2002, Vol. 14, No. 2, 209–220

Copyright 2002 by the American Psychological Association, Inc. 1040-3590/02/$5.00 DOI: 10.1037//1040-3590.14.2.209

Convergent Validity of the Agnew Relationship Measure and the Working Alliance Inventory

William B. Stiles, Miami University
Roxane Agnew-Davies, Refuge
Michael Barkham, University of Leeds
Alison Culverwell, East Kent Community Trust
Marvin R. Goldfried, State University of New York at Stony Brook
Jeremy Halstead, Dewsbury Health Care Trust
Gillian E. Hardy, University of Leeds and University of Sheffield
Patrick J. Raue, Weill Medical College, Cornell University
Anne Rees, University of Leeds
David A. Shapiro, University of Leeds and University of Sheffield

The convergent validity of the Agnew Relationship Measure (ARM) and the Working Alliance Inventory
(WAI) was assessed in samples drawn from 2 comparative clinical trials of time-limited psychotherapies
for depression. In 1 sample, clients (n = 18) and therapists (n = 4) completed self-report versions of both
measures after every session (n = 198). In the other sample, clients (n = 39) and therapists (n = 6)
completed the ARM, and observers subsequently rated selected audiotaped sessions (n = 78) using the
WAI. In both samples, the ARM’s core alliance scales (Bond, Partnership, and Confidence) were
correlated with the WAI’s scales (Bond, Tasks, and Goals) strongly when assessed within client and
therapist perspectives and, with some qualifications, moderately when assessed between client, therapist,
and observer perspectives, supporting the assumption that the ARM and the WAI measure some of the
same core constructs.

William B. Stiles, Department of Psychology, Miami University; Roxane Agnew-Davies, Refuge, London, England; Michael Barkham and Anne Rees, Psychological Therapies Research Centre, University of Leeds, Leeds, United Kingdom; Alison Culverwell, East Kent Community Trust, Canterbury, United Kingdom; Marvin R. Goldfried, Department of Psychology, State University of New York at Stony Brook; Jeremy Halstead, Dewsbury Health Care Trust, Dewsbury, United Kingdom; Gillian E. Hardy and David A. Shapiro, Psychological Therapies Research Centre, University of Leeds and Department of Psychology, University of Sheffield, Sheffield, United Kingdom; Patrick J. Raue, Department of Psychiatry, Weill Medical College, Cornell University.

Portions of this research were supported by the Medical Research Council of the United Kingdom and by Grant MH 40196 from the National Institute of Mental Health. Some staff members involved in this project were supported by funding from the Leeds Community Mental Health and Teaching Trust.

Correspondence concerning this article should be addressed to William B. Stiles, Department of Psychology, Miami University, Oxford, Ohio 45056. E-mail: stileswb@muohio.edu

The alliance between client and therapist is perhaps the most written-about and measured construct in the psychotherapy process research literature (e.g., Constantino, Castonguay, & Shut, 2002; Horvath & Bedi, in press; Horvath & Greenberg, 1994; Horvath & Luborsky, 1993; Orlinsky, Grawe, & Parks, 1994; Safran & Muran, 1998). Following Bordin’s (1979, 1994) seminal conceptualization, which characterized the alliance as encompassing (a) the affective bond between client and therapist, (b) agreement on the goals of treatment, and (c) agreement on treatment tasks, or means of achieving those goals, most alliance researchers have understood the alliance as multidimensional. However, researchers have not agreed on the boundaries of the alliance construct or on the number or names of the dimensions (see Horvath & Greenberg, 1994). Many additional and overlapping dimensions have been posited and assessed, as noted later. The intense interest reflects the alliance’s replicated correlations with measures of psychotherapy outcome across a wide range of therapeutic approaches (Horvath & Bedi, in press; Horvath & Symonds, 1991; Krupnick et al., 1996; Orlinsky et al., 1994; Raue & Goldfried, 1994; Stiles, Agnew-Davies, Hardy, Barkham, & Shapiro, 1998; Watson & Greenberg, 1994).

Different alliance scales have only occasionally been compared in the same study (Bachelor, 1991; Cecero, Fenton, Nich, Frankforter, & Carroll, 2001; Safran & Wallner, 1991; Tichenor & Hill, 1989). Thus, the assumption that they are measuring the same construct has seldom been tested directly. In this article, we report


direct comparisons between two alliance measures, the Working Alliance Inventory (WAI; Horvath & Greenberg, 1986, 1989) and the Agnew Relationship Measure (ARM; Agnew-Davies, Stiles, Hardy, Barkham, & Shapiro, 1998), in samples drawn from two comparative clinical trials of time-limited psychotherapies for depression. Although our comparison bears on the convergent validity of both measures, our motivation was to assess the convergence of the ARM, which is a newer measure, with the more widely used WAI. We examined convergent validity at both the dyad level (comparisons across client–therapist dyads) and the session level (comparisons across each dyad’s sessions).

Structure of the WAI and the ARM

The WAI includes three scales, which were derived from Bordin’s (1979) transtheoretical conception of the alliance (Horvath, 1994). The Bond scale measures the therapeutic bond, which encompasses mutual liking, attachment, and trust. The Tasks scale measures agreement on joint tasks, including the strategies and techniques of treatment. The Goals scale measures agreement about treatment goals, including the areas targeted for change. Although the WAI was developed as a self-report instrument (Horvath & Greenberg, 1986, 1989), Tichenor and Hill (1989) “adapted [an observer-rated version] from the client and therapist forms by altering the pronouns to fit an observer perspective” (p. 197). This observer form was formally specified in a rating manual developed by Raue, Goldfried, and Barkham (1997) and used in one of our samples.

The ARM was developed using a mixed conceptual–empirical strategy, with items constructed to encompass scale content from many previous measures, including the WAI, and then refined through three iterations of item selection and rewriting, administration to therapy dyads, and factor analysis (Agnew-Davies et al., 1998). Items were selected for five scales using three criteria: statistical coherence (based on the factor analyses), conceptual coherence (judged from item content), and comparability of items across client and therapist forms. The five scales are Bond, which concerns the friendliness, acceptance, understanding, and support in the relationship; Partnership, which concerns working jointly on therapeutic tasks and toward therapeutic goals; Confidence, which concerns optimism and respect for the therapist’s professional competence; Openness, which concerns the client’s felt freedom to disclose personal concerns without fear or embarrassment; and Client Initiative, which concerns the client’s taking responsibility for the direction of the therapy. Items and scales are parallel across client and therapist forms (Agnew-Davies et al., 1998). Advantages of the ARM include incorporation of content areas drawn broadly from previous alliance work, a simple format cast in language appropriate for most therapeutic approaches, and parallel forms developed for therapists and clients. As noted by Horvath and Luborsky (1993), in previous instruments, “therapists’ scales are direct rewordings of client instruments; thus far no effort has been made to investigate the specific impressions and experiences that therapists associate with the clients’ experience of positive alliance” (p. 565). The ARM assesses broader aspects of the therapeutic relationship than do most previous instruments, encompassing client initiative, openness, and confidence in addition to the core components of bond and partnership described by Bordin (1979).

Many studies have demonstrated a positive association of WAI scales with gains in treatment (Horvath, 1994; Horvath & Bedi, in press). The ARM has not yet been studied so extensively, but comparisons based on one of the samples from which the present study was drawn broadly confirmed the positive association of alliance with treatment outcomes. Some ARM scales shared up to 40–50% of the variance in some comparisons (Stiles et al., 1998). For example, therapist mean ARM Confidence scores were correlated .44 with residual gains on the Beck Depression Inventory (Beck, Ward, Mendelson, Mock & Erbaugh, 1961) and .51 with residual gains on the Rosenberg Self-Esteem Scale (O’Malley & Bachman, 1979) assessed at end of treatment in a sample of 79 clients treated in time-limited therapies for depression (both ps < .001). (Note that, unless the alliance is considered as a subscale of treatment outcome, the more appropriate coefficient of determination is r, not r²; Ozer, 1985.) However, as in studies involving other alliance measures (see review by Horvath & Bedi, in press), the strength of the alliance–outcome association varied in complex ways across alliance subscales, outcome measures, occasions of outcome assessment (posttreatment, follow-up), and the point in treatment at which the alliance was measured (Stiles et al., 1998). As a contrasting example, client mean ARM Bond scores were correlated only .16 with residual gains on the Symptom Checklist-90-Revised (Derogatis, Lipman, & Covi, 1973) and .16 with residual gains on the Inventory of Interpersonal Problems (Horowitz, Rosenberg, Baer, Ureno, & Villasenor, 1988) assessed at 6-month follow-up in the same sample (p > .05). Current conceptualizations of the alliance do not offer clear accounts of these complex variations (Stiles et al., 1998). Furthermore, whereas some previous authors concluded that alliance–outcome associations are stronger when the alliance is assessed from the client rather than from the therapist perspective and assessed early rather than late in treatment (see reviews by Constantino et al., 2002; Horvath & Symonds, 1991), Stiles et al. (1998) found that similar correlations with residual gains were obtained from therapist and from client ratings (reflecting, perhaps, the closer attention given to the therapist form in the ARM’s construction) and that alliance–outcome correlations were generally higher when alliance was measured later rather than earlier in treatment.¹ Assessing the extent to which different alliance scales measure the same thing must be an early step in any attempt to understand such complexities empirically.

¹ Across ARM alliance scales and across measures and occasions of assessment, the prediction of residual gains from alliance scores tended to improve from earlier to later sessions. The mean improvement-in-prediction correlation was .33 for client-rated alliance and .43 for therapist-rated alliance (N = 140 improvement-in-prediction correlations for each mean). Each improvement-in-prediction correlation was the correlation between session number and the correlation of residual gain (five measures × three occasions of assessment, except one measure was not administered at one of the assessments) with alliance (five ARM scales), calculated for each of two treatment groups (Stiles et al., 1998).

The Bond scales on the ARM and the WAI were meant to measure the same construct, and the ARM’s Partnership scale was meant to measure the constructs measured by the WAI’s Tasks and Goals scales considered together (Agnew-Davies et al., 1998). The ARM’s Confidence, Openness, and Initiative scales were meant to measure constructs described elsewhere in the alliance literature;

for example, the content of the ARM Openness scale overlaps with the Patient Working Capacity scale of the California Psychotherapy Alliance Scales (CALPAS; Marmar, Horowitz, Weiss, & Marziali, 1986; Marmar, Weiss, & Gaston, 1989). The content of the ARM Confidence scale overlaps with therapist and client Confident Collaboration factors identified in analyses of pools of items drawn from several alliance measures (Hatcher, 1999; Hatcher & Barends, 1996). Empirically, in comparisons made within each instrument and within client, therapist, or observer perspectives, the ARM Bond, Partnership, and Confidence scales and, similarly, the three WAI scales (Bond, Tasks, and Goals) have been highly intercorrelated (r > .80 in most comparisons; e.g., Agnew-Davies et al., 1998; Horvath, 1994; Horvath & Greenberg, 1986, 1989; Raue et al., 1997). The ARM Openness and Initiative scales were less strongly correlated with other ARM scales (.18 ≤ r ≤ .66; Agnew-Davies et al., 1998).

The high intercorrelations among some alliance scales would justify collapsing them into a core alliance index, for example, aggregating the ARM Bond, Partnership, and Confidence scales or the WAI Bond, Tasks, and Goals scales. For example, Kivlighan and Shaughnessy (2000) measured the alliance as the average of the three WAI scales in their study of patterns of relationship development in counseling dyads. Other alliance investigators, however, continue to use the scales separately to retain the conceptual distinctions, and there have been some findings suggesting differential relations with other variables (Horvath, 1994; Horvath & Greenberg, 1989; Horvath & Luborsky, 1993). For example, in one of the clinical trials from which the present samples were drawn, clients’ ARM Confidence mean (i.e., Confidence scores averaged across sessions) was more highly correlated with residual gain at end of treatment on the Symptom Checklist-90-Revised (r = .30, p < .01) than was their ARM Bond mean (r = .14, ns; Stiles et al., 1998). We examined convergence for each scale separately, and we have addressed the issue of overlap in our discussion.

Previous Research on the Convergent Validity of Alliance Measures

The few previous direct comparisons of alliance measures with each other have tended to show strong convergence within client, therapist, and observer perspectives but moderate to poor convergence between these different perspectives. Tichenor and Hill (1989) compared client, therapist, and observer versions of the WAI with observer-rated versions of three other measures of working alliance, the CALPAS, the Penn Helping Alliance Scales (PENN; Alexander & Luborsky, 1986), and the Vanderbilt Therapeutic Alliance Scale (VTAS; Hartley & Strupp, 1983), each aggregated across its constituent scales to form a (core) alliance index. These measures were averaged across four sessions from each of eight therapist–client dyads. All of the clients were women 32 to 60 years old; therapists included four men and four women 34 to 78 years old, with 5 to 42 years postdoctoral experience (described more fully by Hill, 1989). The CALPAS, PENN, and VTAS were all highly correlated with the observer-rated WAI (.71 ≤ r ≤ .84; N = 8) and had more mixed intercorrelations among themselves (.34 ≤ r ≤ .80). None of the four observer-rated measures, however, was significantly correlated with either the therapist or client self-report WAI ratings.

In a somewhat similarly designed study with a larger sample of clients (N = 60) drawn from a clinical trial of three treatments for substance abuse, Cecero et al. (2001) compared client- and therapist-rated WAI scales with observer-rated WAI, CALPAS, PENN, and VTAS scales. The clients included 15 (25%) women and 30 (50%) minority participants; 35 (58%) were unemployed. All met criteria for a diagnosis of cocaine dependence and alcohol abuse or dependence and had been randomly assigned to cognitive–behavioral treatment (n = 21), 12-step facilitation (n = 14), or clinical management (n = 25). Cecero et al. observed, “There was a pattern of strong positive correlations among the observer-rated measures and more modest yet significant correlations between the observer-rated measures and the therapist version of the WAI” (p. 7). The client-rated WAI “was not significantly related to any of the observer-rated measures” (p. 6). Within the pattern of strong correlations among observer-rated measures, however, there were anomalies for some scales. For example, the CALPAS Mutual Goals scale was correlated only .19 with the WAI Goals scale, although it was correlated .45 with the VTAS Mutuality scale.

Safran and Wallner (1991) compared self-report versions of the CALPAS and the WAI in a sample of 22 clients (11 men and 11 women, 24 to 52 years old) who presented with depression-related symptoms (54%), anxiety-related symptoms (32%), or a combination of both depression- and anxiety-related symptoms (14%) and received time-limited (20-session) cognitive therapy from 1 of 9 therapists (5 master’s level and 4 doctoral level, with 1 to 5 years of experience with the approach). Clients completed both measures after their third session. Safran and Wallner found high correlations of the CALPAS Patient Commitment and Therapist Positive Contribution scales with the three WAI scales (.64 ≤ r ≤ .82). CALPAS Goal Disagreement was correlated strongly with WAI Goals (r = .73) and Tasks (r = .68) scales but more moderately with WAI Bond (r = .39). CALPAS Patient Working Capacity and Therapist Negative Contributions scales had lower correlations with the WAI scales (.07 ≤ r ≤ .48).

Bachelor (1991) compared self-report versions of the PENN, the Vanderbilt Psychotherapy Process Scale (VPPS; Suh, Strupp, & O’Malley, 1986), and the Therapeutic Alliance Rating System (TARS; Marziali, 1984), each completed two or three times (roughly, at the 3rd, 10th and final sessions, averaged for analyses) by 37 female clients (M age = 31.2 years) and 10 male clients (M age = 28.9 years), including students and community residents, seen for therapy in a university consultation service, and by their therapists, who were master’s-level students in clinical psychology. Diagnoses, as noted on the service’s form, included 33% interpersonal problems, 28% personality disorders, 35% psychoneuroses, and 4% marital or sexual problems, or both. Although both client- and therapist-rated alliance measures were gathered, only within-perspective correlations were reported. These showed moderate to good agreement between conceptually corresponding scales; the strongest correlations involved scales that described positively toned characteristics of the therapist, for example, the PENN Type 1 scale, which reflects the client’s experience of receiving help or a helpful attitude from the therapist, the VPPS Therapist Warmth and Friendliness scale, and the TARS Therapist Positive scale (.62 ≤ r ≤ .82 within each perspective).

In a series of studies, Hatcher and colleagues (Hatcher, 1999; Hatcher & Barends, 1996; Hatcher, Barends, Hansell, & Gutfreund, 1995) studied client and therapist self-report versions of the WAI, the CALPAS, and (in the first two studies) the PENN, collected at one assessment occasion per dyad (after varying numbers of sessions) in a university psychology clinic. They did not directly compare the scales usually scored on these instruments but instead investigated the factor structure of the global alliance (total) scores (in the first study) or of the aggregate pools of items (in the second and third studies). First, Hatcher et al. used confirmatory factor analysis on global alliance scores by 38 therapists and their 144 clients and confirmed three general alliance factors: a shared-view factor incorporating client and therapist views, along with separate client and therapist factors. The loadings on the shared-view factor suggested that clients and therapists tended to agree on helpfulness and on therapist clarity about goals and tasks. Next, Hatcher and Barends used exploratory factor analysis on ratings by 231 clients and identified six factors in the full pool of items, notably a Confident Collaboration factor that drew items from all three measures and that showed the highest correlation of any factor with clients’ estimates of improvement, gathered at the same assessment occasion (r = .37). Finally, using Perfect Congruence Analysis (Ten Berge, 1986), Hatcher first identified components in the WAI and the CALPAS in a survey sample (251 therapists who each rated one client selected from their current practice) and then confirmed the components in ratings gathered in the university psychology clinic (ratings by 63 therapists of 259 clients). In separate analyses, he confirmed four components in the WAI and five components in the CALPAS, none of which corresponded closely to the scales usually scored. Then he confirmed a Therapists’ Confident Collaboration factor, which drew items from both measures and had the highest correlation with therapist estimates of improvement (r = .64 and .62 in the two samples). In a subsample of the clinic group (n = 190), Therapists’ Confident Collaboration also had a small but significant correlation with patients’ estimates of improvement (r = .17, p < .02). Correlations of the therapist factors with the client alliance factors previously identified by Hatcher and Barends based on this subsample ranged from negligible to moderate (−.05 ≤ r ≤ .38), consistent with previous interperspective comparisons.

Dyad and Session Levels of Analysis

We assessed convergent validity of the ARM and the WAI at two levels: the dyad level (correlations of means across therapist–client pairs) and the session level (correlations of deviation scores across sessions within dyads), capitalizing on previously collected samples in which both the ARM and the WAI were applied across multiple sessions of each dyad. As we use the term, dyad-level mean is the mean of a scale across a dyad’s sessions. Dyad-level means thus reflect characteristics of a particular client–therapist pairing averaged across sessions.² A session-level deviation score is the deviation of a raw scale score from the mean score for that case (i.e., the difference between the raw score and the corresponding dyad-level mean). Session-level deviation scores thus reflect session-to-session variation within a case. To illustrate, alliance is considered as a dyad-level variable when it is used to predict treatment outcome but as a session-level variable when it is used to follow alliance rupture and repair cycles, which are thought to be a therapeutically important arena for in-session work on clients’ relationship problems (Safran, Crocker, McMain, & Murray, 1990; Safran & Muran, 1996). Correlations among alliance scores at the dyad level are independent of correlations among alliance scores at the session level, and the interpretations of dyad-level correlations are different from those of session-level correlations (Dill-Standiford, Stiles, & Rorer, 1988; Norman, 1967). In the interscale comparisons reviewed earlier (Bachelor, 1991; Cecero et al., 2001; Hatcher, 1999; Hatcher & Barends, 1996; Hatcher et al., 1995; Safran & Wallner, 1991; Tichenor & Hill, 1989), investigators correlated alliance scores only across dyads, even when data were gathered from several of each dyad’s sessions.

² We did not assess convergent validity separately at the therapist level (mean scores aggregated across each therapist’s clients) and client level (deviation of client-level means from therapist-level means). Most applications of alliance measures consider the dyad to be the relevant unit (e.g., for comparisons with outcome), folding effects of therapist differences into a dyad-level mean, whose convergent validity was thus of focal interest (see reviews by Horvath & Bedi, in press; Horvath & Symonds, 1991; Orlinsky et al., 1994; but see Hatcher et al., 1995, for an exception).

Aims and Design

We aimed to assess convergent validity for multiple dimensions of the alliance as measured by the ARM and the WAI within and between the perspectives of client, therapist, and observer at dyad and session levels. We were particularly interested in assessing convergence for the ARM scales. Whereas the WAI scales had shown good convergence with core alliance scales on other instruments (albeit only within perspectives at the dyad level; Bachelor, 1991; Cecero et al., 2001; Safran & Wallner, 1991; Tichenor & Hill, 1989), the ARM was relatively new, and its convergent validity was previously untested.

According to the measures’ conceptualization, the strongest convergence should be (a) between the Bond scales on the two measures and (b) between ARM Partnership and the WAI Goals and Tasks scales. Other correlations should be weaker. However, more pragmatically, the previously noted findings of high intercorrelations among ARM Bond, Partnership, and Confidence scales and among WAI Bond, Tasks, and Goals scales led us to expect substantial correlations between these sets of scales, with ARM Openness and Initiative appearing more distinct.

Our data were drawn from two previously reported comparative clinical trials of brief therapy for depression, a collaborative psychotherapy project (CPP) carried out in three outpatient facilities of the National Health Service (NHS) of the United Kingdom (Barkham et al., 1996) and the Second Sheffield Psychotherapy Project (SPP2), conducted in a university-based research clinic in the United Kingdom (Shapiro et al., 1994). The CPP was designed as a replication and extension of SPP2. In both projects, the ARM was the primary measure of the alliance, completed by clients and therapists after each session. The WAI was used only in subsets of the sessions, and the present study considered only the sessions for which both ARM and WAI data were available. In one half of the CPP cases, both therapists and clients completed the WAI after each session (yielding data on 198 sessions of 18 clients and 4 therapists); these data have not been previously reported. In SPP2, observers applied the WAI to selected audiotaped sessions (78 sessions of 39 clients and 5 therapists). There have been previous

reports of the ARM data in the full SPP2 sample (Agnew-Davies et al., 1998; Stiles et al., 1998) and of the observer WAI data (Raue et al., 1997), but these ARM and WAI data have not previously been compared.

CPP and SPP2 had very similar overall designs. Briefly, clients who met criteria that included primarily a diagnosis of major depressive episode were randomly assigned to receive either 8 or 16 sessions of either cognitive–behavioral (CB) or psychodynamic–interpersonal (PI) therapy. Clients’ degree of change was assessed at the end of treatment and at follow-up assessments 3 months and 1 year after treatment. The treatments studied in CPP and SPP2 were generally effective. Clients in all cells of the design averaged substantial improvement in both studies. These results, along with results of comparisons among experimental conditions, have been reported previously (Barkham et al., 1996; Shapiro et al., 1994, 1995).

Study 1: Comparisons Within and Between Client and Therapist Perspectives

To study how the ARM converged with the WAI within and between the client and therapist perspectives, we drew data from the CPP (Barkham et al., 1996).

Method

Participants. Clients (n = 18) were 11 women and 7 men with a mean age of 39 years (range = 19–55) who had received a diagnosis of major depressive episode and had met other inclusion and exclusion criteria, including (a) continuous history of the presenting disorder less than 2 years, (b) no more than three sessions of formal psychotherapy within previous 5 years, and (c) no significant change in psychotropic medication within the previous 6 weeks. A further criterion that clients be employed in a professional, managerial, or other white-collar occupation was abandoned part way through the study because of difficulty finding NHS clients who met that criterion. The clients were seen for psychotherapy in three NHS hospitals, in Leicester, Huddersfield, and Sheffield, England. By design, they represented a randomly selected half of the 36 CPP clients (Barkham et al., 1996) who completed the WAI as well as the ARM after each of their sessions. The other half of the CPP clients instead completed measures dealing with another topic. All of the participating clients gave written informed consent for their data to be used for research.

The therapists in CPP were 4 clinical psychologists (1 man and 3 women, with 0 to 6 years of experience since completion of professional training) employed in the clinics where the study was conducted. They were investigators in the project, but, as clients were told, they did not have access to research data until after treatment was completed. The therapists were selected for, and encouraged to maintain, a balanced belief in the effectiveness of both CB and PI therapies. All of the therapists were trained in both CB and PI treatment protocols, and each therapist’s clients were distributed approximately evenly across all cells in the design.

Treatments. The CB and PI therapies have been described in the previous reports and in manuals developed for the Sheffield projects (Firth & Shapiro, 1985; Shapiro & Firth, 1985). Briefly, the CB treatment was a multimodal method emphasizing the provision of a wide range of cognitive and behavioral strategies, including anxiety-control training, self-management, and cognitive restructuring (Beck, Rush, Shaw, & Emery, 1979; Goldfried & Merbaum, 1973; Snaith, 1974). The PI treatment was based on Hobson’s (1985) Conversational Model of therapy and used a combination of psychodynamic, interpersonal, and experiential concepts. It focused on the client–therapist relationship as a vehicle for revealing and resolving interpersonal difficulties, which were viewed as primary in the origins of depression.

The WAI. The WAI (Horvath & Greenberg, 1986, 1989) is composed of 36 items on 7-point scales, with parallel items in the client and therapist self-report versions. It includes three scales, each composed of 12 items: therapeutic Bond (e.g., “My therapist and I understand each other”), agreement on Tasks (e.g., “I am clear about what my responsibilities are in therapy”), and agreement about Goals (e.g., “The goals of these sessions are important for me”). Internal consistency reliabilities of the three WAI scales are reported in Table 1.

The ARM. The ARM (Agnew-Davies et al., 1998) is composed of 28 sentences describing the client, the therapist, and the client–therapist relationship, rated on parallel forms by clients and therapists using 7-point scales anchored from strongly disagree to strongly agree. Instructions on the form read, “Thinking about today’s meeting, please indicate how strongly you agree or disagree with each statement.” Parallel items concern the same person’s experience as viewed from two perspectives. For example, the item “I feel friendly towards my therapist” in the client version is considered as parallel to the item “My client is friendly towards me” in the therapist version (note that the latter item asks whether the client “is friendly,” a judgment from observation, rather than whether the client “feels friendly,” an inference about the client’s private feeling).

Table 1
Means, Standard Deviations, and Internal Consistency of the ARM and the WAI

                        CPP (n = 198 sessions)a     SPP2 (n = 78 sessions)b
Alliance scale          M       SD      α           M       SD      α

Clients
  ARM
    Bond (6)            6.08    0.98    .87         6.07    0.77    .81
    Partnership (4)     6.06    0.97    .81         5.96    0.90    .78
    Confidence (7)      5.91    0.98    .84         5.96    0.85    .86
    Openness (5)        5.75    1.05    .73         5.72    1.06    .78
    Initiative (4)      4.23    1.00    .59         4.48    0.84    .54
  WAI
    Bond (12)           5.79    0.97    .91
    Tasks (12)          5.73    1.01    .92
    Goals (12)          5.48    1.07    .90

Therapists
  ARM
    Bond (6)            5.91    0.69    .83         5.38    0.91    .87
    Partnership (4)     5.52    0.90    .83         5.15    1.03    .77
    Confidence (7)      5.31    0.85    .89         4.96    1.08    .87
    Openness (5)        5.70    0.95    .89         5.07    1.08    .85
    Initiative (4)      4.99    0.73    .47         4.62    0.59    .41
  WAI
    Bond (12)           5.82    0.68    .90
    Tasks (12)          5.46    0.85    .93
    Goals (12)          5.28    0.95    .92

Observers
  WAI
    Bond (12)                                       6.13    0.46    .95
    Tasks (12)                                      6.01    0.50    .94
    Goals (12)                                      6.07    0.48    .94

Note. Numbers in parentheses indicate the number of items on the subscales. ARM = Agnew Relationship Measure; WAI = Working Alliance Inventory; CPP = Collaborative Psychotherapy Project; SPP2 = Second Sheffield Psychotherapy Project.
a Means based on 186 to 196 sessions because of missing data on some items.
b Means based on 75 to 77 sessions because of missing data on some items.

The ARM includes five scales. The following examples are from the client version (see Agnew-Davies et al., 1998, for a list of all of the items). The scales were Bond (6 items, e.g., “My therapist accepts me no matter what I say or do”), Partnership (4 items, e.g., “My therapist follows his/her own plans, ignoring my views of how to proceed”; reversed), Confidence (7 items, e.g., “I have confidence in my therapist and his/her techniques”), Openness (5 items, e.g., “I feel I can openly express my thoughts and feelings to my therapist”), and Client Initiative (4 items, e.g., “I take the lead when I’m with my therapist”). Two of the ARM’s 28 items were not used in any scale because their factor loadings were low or inconsistent across client and therapist perspectives. Internal consistency reliabilities of the five ARM scales are reported in Table 1.

CPP procedure. Parallel procedures for client selection were used at the three NHS sites. Referral letters were scanned for reference to depression as a presenting problem, and background information was checked for consistency with criteria for admission to the study. Possible referrals were brought to a weekly meeting attended by all of the participating therapists. In the absence of excluding evidence, clients were mailed an invitation, an information sheet on the project, and screening measures for completion. Clients returning materials who appeared likely to meet criteria were then offered a clinical interview.

Clients were interviewed by independent assessors. The interviews were structured to gather sufficient information to determine Diagnostic and Statistical Manual of Mental Disorders (3rd ed.; American Psychiatric Association, 1980) diagnoses of major depressive episode, generalized anxiety disorder, and panic disorder. Clients also completed a battery of self-report assessment measures. Clients were excluded for psychotic, manic, or obsessional symptoms, or if depression was attributable to organic illness. Clients meeting criteria were randomly assigned to one of the four treatment conditions. All of the participating clients signed a consent form describing the treatment they were to receive and outlining the schedule for assessments. The 18 clients who provided ARM and WAI data included 6 assigned to 8 sessions of CB, 5 assigned to 8 sessions of PI, 3 assigned to 16 sessions of CB, and 4 assigned to 16 sessions of PI. For further details regarding client selection and assessment procedures, see Barkham et al. (1996).

Sessions took place weekly, and missed sessions were rescheduled. Clients and therapists completed the ARM and the WAI immediately after each session. Completed forms were returned to clinic secretaries with the understanding that they would not be examined until the therapy was completed. We used WAI and ARM data from all 198 of the 18 clients’ sessions (2 scheduled sessions of 1 client’s treatment were not conducted). On some forms, however, a few items were not completed (less than 1%), so scores could not be calculated on a few scales for a few sessions.

Data reduction for analysis. We combined data from CB and PI treatments and for 8- and 16-session treatments in our analyses. Heterogeneity in types of treatments is appropriate for assessing convergent validity of alliance measures, insofar as one potential use of these measures is to compare the alliance across treatments.

Raw scores of each WAI and ARM scale were calculated as the means of constituent items (each scored 1–7, reversed for negatively worded items) for each session. Scale scores were treated as missing if any constituent item on the scale was missing. Means, standard deviations, and internal consistencies for each scale are shown in Table 1. Internal consistencies for all of the scales were good, except for the ARM Initiative scale; they were slightly higher for WAI scales than for ARM scales, possibly reflecting the WAI scales’ greater number of items.

We calculated dyad-level means as the means of each WAI or ARM raw score on each scale across each client’s sessions. We then calculated session-level deviation scores as deviations of the raw scores from each therapist’s or client’s corresponding dyad-level mean.

Results

The first six columns of Table 2 show the dyad-level correlations between the ARM and the WAI in the CPP. Dyad-level interscale correlations indicate the degree to which respondents who reported generally strong or weak alliances on the ARM reported similarly strong or weak alliances on the WAI.

The first six columns of Table 3 show the session-level correlations between the ARM and the WAI in the CPP. Session-level interscale correlations indicate the extent to which alliance scores covaried across sessions for an average dyad.

The number of correlations we calculated made it inappropriate (because of family-wise Type 1 error), as well as impractical, to consider each nominally significant result separately. A full Bonferroni correction for the 90 correlations in each table (including

Table 2
Correlations of Dyad-Level Means of the ARM and the WAI

                                                 WAI scale

               CPP clients (n = 18)      CPP therapists (n = 18)   SPP2 observers (n = 39)
ARM scale      Bond    Tasks   Goals     Bond    Tasks   Goals     Bond    Tasks   Goals

Clients
  Bond         .91**   .89**   .86**     .74**   .65*    .69*      .31     .11     .14
  Partnership  .91**   .92**   .91**     .58*    .56*    .59*      .44*    .42*    .42*
  Confidence   .90**   .96**   .91**     .66*    .66*    .70*      .30     .19     .24
  Openness     .71*    .53*    .63*      .41     .32     .36       .24     .11     .12
  Initiative   .21     .50*    .45       .13     .14     .12       .05    −.08    −.12
Therapists
  Bond         .65*    .62*    .61*      .97**   .85**   .83**     .20     .07     .11
  Partnership  .60*    .67*    .64*      .91**   .96**   .91**     .35*    .47*    .44*
  Confidence   .69*    .68*    .66*      .82**   .96**   .92**     .21     .30     .35
  Openness     .70*    .58*    .55*      .82**   .79**   .75**    −.09     .06     .08
  Initiative   .32     .28     .40       .55*    .46     .49*      .07     .15     .12

Note. ARM = Agnew Relationship Measure; WAI = Working Alliance Inventory; CPP = Collaborative Psychotherapy Project; SPP2 = Second Sheffield Psychotherapy Project.
* nominal p < .05. ** nominal p < .0005 (Bonferroni-corrected p < .05).
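
The dyad-level means correlated in Table 2, and the session-level deviation scores correlated in Table 3, follow the data-reduction steps given in the Study 1 Method. A minimal Python sketch of those steps is shown here under stated assumptions: the item values, the position of the reversed item, and all function names are hypothetical illustrations, not the authors' analysis code.

```python
from statistics import mean


def scale_score(items, reversed_idx=(), lo=1, hi=7):
    """Mean of one scale's items for one session (items scored lo..hi).

    Returns None if any constituent item is missing, matching the rule
    that a scale score is treated as missing in that case.
    """
    if any(v is None for v in items):
        return None
    # Negatively worded items are reversed: on a 1-7 scale, v becomes 8 - v.
    adjusted = [(lo + hi - v) if i in reversed_idx else v
                for i, v in enumerate(items)]
    return mean(adjusted)


def dyad_mean(session_scores):
    """Mean of one rater's non-missing session scores for one dyad."""
    return mean(s for s in session_scores if s is not None)


def deviation_scores(session_scores):
    """Each session score minus the dyad-level mean (missing stays None)."""
    m = dyad_mean(session_scores)
    return [None if s is None else s - m for s in session_scores]


# Hypothetical 4-item scale; the item at index 1 is negatively worded.
sessions = [
    [6, 2, 6, 6],     # reversed item 2 becomes 6, so the score is 6
    [5, 3, None, 5],  # one item missing, so the scale score is missing
    [4, 4, 4, 4],     # reversed item 4 stays 4, so the score is 4
]
scores = [scale_score(s, reversed_idx={1}) for s in sessions]
print(scores)                    # per-session scale scores
print(dyad_mean(scores))         # dyad-level mean
print(deviation_scores(scores))  # session-level deviation scores
```

Dyad-level correlations would then be computed across dyads on the `dyad_mean` values, and session-level correlations across sessions on the pooled `deviation_scores` values.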

Table 3
Correlations of Session-Level Deviation Scores of the ARM and the WAI, Adjusted for Mean Differences Among Therapist–Client Pairs

                                                 WAI scale

               CPP clients               CPP therapists            SPP2 observers
               (n = 198 sessions)a       (n = 198 sessions)a       (n = 78 sessions)b
ARM scale      Bond    Tasks   Goals     Bond    Tasks   Goals     Bond    Tasks   Goals

Clients
  Bond         .70**   .60**   .63**     .29**   .23**   .32**     .48**   .43**   .46**
  Partnership  .54**   .54**   .61**     .20*    .15*    .19*      .36*    .45**   .51**
  Confidence   .56**   .63**   .61**     .21*    .22*    .27**     .25*    .29*    .32*
  Openness     .43**   .34**   .37**     .27**   .13     .19*      .22     .40**   .35*
  Initiative   .24*    .21*    .23*      .15     .00     .10       .09     .14    −.09
Therapists
  Bond         .33**   .23*    .28**     .81**   .67**   .57**     .60**   .61**   .61**
  Partnership  .38**   .42**   .36**     .79**   .80**   .80**     .55**   .54**   .59**
  Confidence   .31**   .43**   .38**     .74**   .85**   .83**     .55**   .59**   .59**
  Openness     .34**   .30**   .27*      .65**   .52**   .66**     .28*    .39**   .31*
  Initiative   .24*    .34**   .26*      .40**   .43**   .50**    −.03     .05     .04

Note. ARM = Agnew Relationship Measure; WAI = Working Alliance Inventory; CPP = Collaborative Psychotherapy Project; SPP2 = Second Sheffield Psychotherapy Project.
a Correlations based on 168 to 188 sessions because of missing data on some items. b Correlations based on 75 to 77 sessions because of missing data on some items.
* nominal p < .05. ** nominal p < .0005 (Bonferroni-corrected p < .05).
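
The double-asterisk criterion in the notes to Tables 2 and 3 reflects a Bonferroni correction: the conventional .05 level divided by the 90 correlations in each table. The arithmetic can be sketched as follows. The function names and the example p values are illustrative assumptions; the tables themselves flag results at the rounded cutoff of .0005.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level that holds the family-wise level."""
    return family_alpha / n_tests


def flag(p, nominal=0.05, corrected=0.0005):
    """Marker in the style of the table notes (cutoffs taken from the notes)."""
    if p < corrected:
        return "**"
    if p < nominal:
        return "*"
    return ""


threshold = bonferroni_alpha(0.05, 90)
# .05 / 90 = .000555..., which the text reports truncated as .000555.
print(f"{threshold:.6f}")

# Hypothetical p values, for illustration only.
for p in (0.0001, 0.01, 0.20):
    print(p, repr(flag(p)))
```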

SPP2 as well as CPP analyses) demands a nominal significance level of .000555 to achieve a conventional .05 significance level for any single correlation. Correlations that met this criterion are indicated in the tables. Ignoring all of the correlations that failed to achieve this corrected significance level, however, would yield an unacceptable rate of Type 2 errors (falsely accepting the null hypothesis and thus overlooking relations that may be real). We offer some observations based on the broad patterns of correlations and the magnitude of the effects. Our observations focus on this study’s goal of assessing convergent validity of ARM and WAI dimensions within and between client and therapist perspectives at both dyad and session levels.

1. Within client and therapist perspectives, the ARM Bond, Partnership, and Confidence scales were strongly correlated with all three WAI scales (Bond, Tasks, Goals) at both dyad and session levels. For example, the Bond scales on the two measures were correlated .91 for clients and .97 for therapists at the dyad level (Table 2) and .70 for clients and .81 for therapists at the session level (Table 3).

2. The conceptual expectation that the correlations between ARM and WAI Bond scales and between ARM Partnership and WAI Tasks and Goals scales should be distinctively higher than other intercorrelations was not fulfilled. Instead, the within-perspective correlations of all of these scales with each other (and with ARM Confidence) appeared generally similar (Tables 2 and 3).

3. The ARM Openness and Initiative scales had relatively lower correlations with WAI scales, although most of them were positive and some were substantial.

4. WAI–ARM correlations within client and therapist perspectives were generally higher than correlations between these two perspectives at both dyad and session levels (Tables 2 and 3).

Study 2: Comparisons of the Observer Perspective With Client and Therapist Perspectives

To study how the client- and therapist-rated ARM converged with the observer-rated WAI, we drew data from SPP2 (Shapiro et al., 1994).

Method

Participants. Clients (n = 39) were 25 women and 14 men with a mean age of 41 years (range = 23–60) who met the same criteria as clients in CPP and, in addition, worked in professional, managerial, and other white-collar occupations. They were self-referred or referred by general practitioners or occupational health workers for treatment of depression. They were seen for treatment in a research clinic in Sheffield, United Kingdom. All of the participating clients gave written informed consent for their data to be used for research.

The 39 SPP2 clients whose sessions were measured with both the ARM and the WAI were a subset of 117 clients who participated in SPP2. They represented the intersection of two subgroups: (a) those who completed the final version of the ARM (n = 79) and (b) those whose sessions were subsequently rated using the observer version of the WAI (n = 57). Because the ARM was being developed while SPP2 was in progress, the final version was used only by the last 79 SPP2 clients (Agnew-Davies et al., 1998). After the completion of SPP2, 2 sessions from each of the 57 clients who had been assigned to 16-session treatments were rated on the observer version of the WAI by Raue et al. (1997), as described later.

The therapists in SPP2 were 5 research clinical psychologists working in the research clinic (3 men and 2 women, with 1 to 17 years of experience since completion of professional training). None of them were therapists in CPP. As in CPP, however, the therapists were investigators in the project and trained in both CB and PI therapies, and their clients were distributed across cells in the design. Treatments followed the same protocols as those in CPP.

The observer-rated WAI. The observer-rated version of the WAI (Raue et al., 1997; Tichenor & Hill, 1989), like the self-report version, is composed of 36 items on a 7-point scale. Items in the three 12-item scales are parallel to those in the self-report versions but reworded to represent an external perspective (Bond, e.g., “There is a good understanding between the client and therapist”; Tasks, e.g., “There is agreement about what the client’s responsibilities are in therapy”; and Goals, e.g., “There is a perception that the goals of the sessions are important for the client”). Internal consistency reliabilities of the three observer-rated WAI scales are reported in Table 1.

SPP2 procedure. As in CPP, clients who met screening criteria were invited for an assessment interview at which the battery of assessment measures was administered. Clients who were diagnosed with major depressive episode and met the other criteria were invited to join the study and, if they accepted, were randomly assigned to one of the treatment conditions. Informed consent was obtained before randomization. Clients were seen weekly, and missed sessions were rescheduled. Of the 39 SPP2 clients in this study, 20 were assigned to 16 sessions of CB therapy and 19 were assigned to 16 sessions of PI therapy. Batteries of assessment measures were readministered at the end of treatment, at 3-month follow-up, and at 1-year follow-up (see Shapiro et al., 1994, 1995, for further details regarding SPP2 procedures).

As in CPP, clients and therapists completed the ARM immediately after each session. Completed forms were returned to clinic secretaries with the understanding that they would not be examined until the therapy was completed.

Session selection and observer WAI rating. Two sessions from each of the 39 clients were rated on the observer version of the WAI (Raue et al., 1997). These sessions were selected mainly on the basis of therapists’ global ratings of session helpfulness on a 7-point scale, completed immediately after each session. One was the session rated as the most helpful, and the other as the least helpful, among Sessions 4–13 (i.e., excluding the first 3 and last 3 sessions). In cases of helpfulness ratings that were tied or within one point of each other (40% of the cases), the selection from among these sessions was based on therapists’ ratings on the Session Evaluation Questionnaire (Stiles, Reynolds, Hardy, Rees, Barkham, & Shapiro, 1994), which was also completed after every session. (See Raue et al., 1997, for further details regarding session selection.)

The raters for the observer form of the WAI were 6 students in a U.S. clinical psychology doctoral program who were trained for approximately one month to adequate reliability among themselves (intraclass correlation coefficient > .60). They also met regularly throughout the rating period to maintain calibration (see Raue et al., 1997, for further details of rater characteristics, selection, and training). Rotating pairs of raters independently rated the sessions by listening to the audiotape and reading the transcript. Thus, each rater rated one third of the sessions, which were presented in randomized order at the rate of approximately two per week.

Data reduction for analysis. ARM and WAI scores, including dyad-level means and session-level deviation scores, were calculated in the same way as in Study 1. Means, standard deviations, and internal consistencies are shown in Table 1.

Results

The last three columns of Table 2 show the dyad-level correlations of the client- and therapist-rated ARM with the observer-rated WAI in SPP2. The last three columns of Table 3 show the corresponding session-level correlations. Continuing our strategy in reporting the Study 1 results, and mindful that there were many correlations addressing similar questions, we focused on a few broad patterns that bear on this study’s goal of assessing the convergent validity of ARM and WAI dimensions.

1. At the session level (Table 3), the ARM Bond, Partnership, and Confidence scales had generally higher correlations with the three observer WAI scales than did the ARM Openness and Initiative scales, as was the case at both levels in Study 1.

2. As in Study 1, the conceptual expectation of distinctively higher correlations between ARM and WAI Bond scales and between ARM Partnership and WAI Tasks and Goals scales was not fulfilled.

3. At the dyad level (Table 2), the ARM Partnership scale was more highly correlated with the observers’ WAI scales than were the other ARM scales. This pattern was apparent for both clients’ and therapists’ ARM ratings.

4. At the dyad level (Table 2), most of the correlations of observers’ WAI scales with clients’ and therapists’ ARM scales (Partnership excepted) were surprisingly low. As noted earlier, the two sessions that contributed to each dyad-level mean in SPP2 were selected as extreme with respect to therapist-judged helpfulness. In an exploratory analysis we recalculated these correlations separately for the least helpful and the most helpful sessions (Table 4). For clients, the resulting patterns of correlations in the least and most helpful sessions were similar to each other and to that shown in Table 2. However, the correlations of therapist ARM scores with observer WAI scores were much higher in the least helpful sessions than in the most helpful sessions, as shown in Table 4.

5. Looking across studies, at the session level (Table 3), correlations of clients’ and therapists’ ARM ratings with observers’ WAI ratings (SPP2) were generally higher than were correlations between client ARM and therapist WAI ratings or between therapist ARM and client WAI ratings (CPP). For example, client Partnership was correlated .51 with observer Tasks but only .19 with therapist Tasks. Put another way, at the session level, therapists and clients seemed to converge more with the observers than they did with each other. This comparison should be considered cautiously, however, insofar as it was based on two different samples.

Table 4
Correlations of Therapists’ Dyad-Level ARM Means With Observers’ Dyad-Level WAI Means in SPP2 Sessions Judged by the Therapist as Least or Most Helpful

                       Observer WAI scale
Therapist ARM scale    Bond    Tasks   Goals

Least helpful sessions (n = 39)a
  Bond                 .43*    .32*    .32*
  Partnership          .62**   .58**   .56**
  Confidence           .47*    .45*    .44*
  Openness             .04     .19     .11
  Initiative           .16     .27     .23
Most helpful sessions (n = 39)a
  Bond                −.01    −.21    −.21
  Partnership         −.08     .08     .06
  Confidence          −.06     .06     .12
  Openness            −.21    −.11    −.08
  Initiative          −.11    −.01    −.05

Note. Observer WAI ratings drawn from SPP2. ARM = Agnew Relationship Measure; WAI = Working Alliance Inventory; SPP2 = Second Sheffield Psychotherapy Project.
a Some correlations based on only 37 or 38 sessions because of missing data on some ARM items.
* p < .05. ** p < .0005.
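
The exploratory analysis reported in Table 4 recalculates each correlation separately within the least helpful and the most helpful sessions. A minimal sketch of that kind of split is given here, assuming one therapist ARM score and one observer WAI score per session. The records are made-up values for illustration only, and the helper names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_by_group(records):
    """Correlate ARM with WAI separately within each session group.

    records: iterable of (group_label, arm_score, wai_score); sessions with
    a missing score (None) are dropped from that group's correlation.
    """
    groups = {}
    for label, arm, wai in records:
        if arm is None or wai is None:
            continue
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(arm)
        ys.append(wai)
    return {label: pearson_r(xs, ys) for label, (xs, ys) in groups.items()}


# Made-up scores: the two groups are built to show contrasting correlations.
records = [
    ("least", 4.0, 5.0), ("least", 5.0, 5.5), ("least", 6.0, 6.0),
    ("least", None, 5.8),  # missing ARM score, excluded
    ("most", 6.0, 6.2), ("most", 6.5, 6.0), ("most", 7.0, 6.2),
]
result = r_by_group(records)
print(result)
```

Splitting in this way makes visible a difference between subsets that a single pooled coefficient would average over.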

General Discussion

For the core alliance scales on the ARM and the WAI, the within-perspective dyad-level comparisons showed excellent convergent validity, extending previous findings (Bachelor, 1991; Safran & Wallner, 1991; Tichenor & Hill, 1989). Specifically, the correlations of the ARM Bond, Partnership, and Confidence scales with the WAI Bond, Goals, and Tasks within client and therapist perspectives in CPP (Table 2) were all in the .80s and .90s. For measuring these core aspects of the alliance at the dyad level, then, the ARM and the WAI seemed strong and nearly equivalent.³ It should be noted that averaging WAI and ARM scores across all of each client’s 8 or 16 sessions in the CPP sample probably made these dyad-level means particularly strong and stable estimates of the alliance in these dyads, in contrast to the weaker dyad-level estimates in the SPP2 sample, discussed later.

Within-perspective convergent validity for the core alliance scales also seemed strong at the session level (Table 3), although the correlations were somewhat lower (.54 ≤ r ≤ .70 for clients; .57 ≤ r ≤ .85 for therapists). Convergence at this level, which had not been shown previously, suggests that these ARM and WAI scales should yield results that are generally similar to each other in studies of session-to-session changes in the alliance, such as rupture and repair or sudden gains within a case (cf. Agnew, Harper, Shapiro, & Barkham, 1994; Tang & DeRubeis, 1999), as they should do when they are used for addressing dyad-level issues, such as predicting outcome. The correlations may have been lower at this level simply because the session-level deviation scores were based on fewer ratings and were therefore less reliable than the dyad-level means (internal consistency of the scales was high, but there was no assessment of test–retest reliability, which would have involved asking participants to rate the relationship twice on the same instrument after each session). Alternatively, it may be that the ARM and the WAI are differentially sensitive to some transitory aspects of the relationship. For example, clients’ reactions to session-to-session variation in the partnership may have diverged in some way from their sense of agreement on treatment tasks, so that their session-level deviation scores varied around their dyad-level means differently for ARM Partnership than for WAI Tasks, leading to lower session-level convergence (r = .54) than dyad-level convergence (r = .92).

Finding moderate convergence between therapists and clients for the core alliance scales at both levels in CPP represents a further confirmation of the alliance construct, insofar as it implies some mutual or shared experience of the relationship. This interperspective convergence contrasts with the null results reported by Tichenor and Hill (1989), but it is consistent with correlations reported by Cecero et al. (2001) and with the shared-view global alliance factor identified by Hatcher et al. (1995), reviewed earlier. The correlations were of the same magnitude as those between client and therapist dyad-level means on the ARM’s core alliance scales in SPP2, reported in a previous study (Agnew-Davies et al., 1998).

It should be expected that the alliance will be experienced somewhat differently from different vantage points. For example, therapists may tend to be more prospective, focused on making interventions likely to have a helpful impact on clients, whereas clients may tend to be more immediate or retrospective, focused on the comfort, safety, understanding, and life changes achieved up to that point. So it is not surprising that the between-perspective correlations were lower than the within-perspective correlations in CPP at both levels. The notably lower client–therapist convergence at the session level (Table 3) than at the dyad level (Table 2) could reflect therapists being less cognizant of session-to-session fluctuations in clients’ evaluations than of the enduring qualities of this particular alliance.

For the core alliance scales at the session level in SPP2, both clients’ and therapists’ convergence with observers was substantial and generally higher than clients’ and therapists’ convergence with each other in CPP (rightmost three columns of Table 3). This comparison could reflect sample differences, but, alternatively, seems plausibly understood as reflecting the observers’ taking both client and therapist perspectives into account, yielding intermediate estimates of alliance qualities. Note that by selecting extreme sessions to represent each client in SPP2, we ensured a large within-client variance in alliance ratings (insofar as alliance ratings tended to be higher in the most helpful sessions and lower in the least helpful sessions; Raue et al., 1997) and thus probably optimized convergence at the session level.

At the dyad level, the convergence between participants and observers in SPP2, shown in the rightmost three columns of Table 2, may have been anomalously low. Because the two sessions that contributed to each dyad-level mean were selected as extreme with respect to therapist-judged helpfulness, they may have been divergent and unrepresentative. The contrast between the negligible therapist–observer convergence in the therapist-judged most helpful sessions and the substantial convergence in the least helpful sessions (Table 4) raises several possibilities. (a) The low correlations for the most helpful sessions may have reflected a restriction of range (a ceiling effect due to uniformly high ratings). Variances for most of the alliance scales (all except ARM Initiative) were numerically smaller for the most than for the least helpful sessions; however, these differences were small for many of the scales, and they were statistically significant (by Levene’s test for equality of variances) only for the ARM Confidence scale and the WAI Goals scale. (b) As a second possibility, perhaps therapists’ estimates of the alliance tended to be exaggerated or distorted in sessions they judged to be extremely helpful, leading to low correlations with other indexes. Interperspective convergence may be stronger when therapists take a more sober view of the session’s accomplishments. (c) As a third possibility, perhaps observers can more easily or accurately see when the alliance is problematic than when it is good; that is, it may be relatively easy to identify moments of rupture, getting stuck, conflict, miscommunication, nonengagement, and so forth, but relatively difficult to assess the extent of liking, trust, deepening, and the like. Several of these factors may have contributed to the contrasting patterns shown in Table 4.

Our results failed to show the expected differentiation among the core alliance scales on the two instruments (ARM Bond,

³ To underline this within-perspective equivalence, we constructed core alliance indexes as the mean of the 17 items on the ARM Bond, Partnership, and Confidence scales and the mean of all 36 WAI items in the CPP sample. The within-perspective dyad-level correlations of these two indexes were .98 for clients and .97 for therapists. The within-perspective session-level correlations of the ARM and WAI core alliance indexes were .79 for clients and .91 for therapists.

Partnership, and Confidence and WAI Bond, Tasks, and Goals). Theoretically, on the basis of Bordin’s (1979, 1994) conceptualization and the design and construction of the ARM (Agnew-Davies et al., 1998), one would expect relatively higher correlations (a) between the Bond scales on the two measures and (b) between ARM Partnership and WAI Tasks and Goals than between other combinations of scales. The same pattern would be expected within and between perspectives at both dyad and session levels. Instead, we found no consistent pattern, and most of the correlations between the core scales were of roughly comparable magnitude within each set of comparisons. This lack of differentiation helps justify the common practice of combining the separate alliance scales into measures of global alliance (e.g., Hatcher et al., 1995; Kivlighan & Shaughnessy, 2000; Tichenor & Hill, 1989).

The main exception to the lack of differentiation among the core alliance scales was the distinctive convergence of participants’ ARM Partnership ratings with all three observer-rated WAI scales at the dyad level (Table 2), more specifically, in the least helpful sessions (Table 4). Such unexpected observations should be replicated before they are strongly credited. But, perhaps, in distinguishing among therapeutic dyads, observers applying the WAI scales in an undifferentiated way were distinctively sensitive to relationship qualities that participants experienced as a partnership (working as a team, sharing a view of the therapeutic tasks and goals), as distinct from alliance aspects measured by ARM Bond and Openness. Such occasional evidence that the core scales are measuring something different from each other (see also Horvath, 1994; Stiles et al., 1998) supports the argument for considering the core scales separately.

Most of the correlations of the WAI scales with ARM Openness and Initiative were relatively lower than were those with the ARM core alliance scales (Bond, Partnership, Confidence) for corresponding comparisons at both levels, particularly for the within-perspective comparisons in CPP. For example, at the dyad level in CPP (Table 2), all of the correlations of client WAI scales with client ARM Openness and Initiative scales were lower (.21 ≤ r ≤ .71) than all of the correlations of the client WAI scales with the client ARM core alliance scales (.86 ≤ r ≤ .96). These results offered some discriminant validity. They suggest that the ARM measures distinct aspects of the alliance not measured by the WAI: the feeling of freedom or constriction in disclosing personal concerns and the degree to which the client took responsibility for session content and process. The distinctiveness of the Openness scale was consistent with Safran and Wallner’s (1991) finding of relatively lower within-perspective correlations of WAI scales with the CALPAS Patient Working Capacity scale, with which the Openness scale overlaps conceptually (Agnew-Davies et al., 1998). The much lower correlations involving Initiative could partly reflect that scale’s weaker internal consistency (Table 1) as well as client characteristics, such as motivation for change, that are relatively independent of the core alliance dimensions.

Limitations of this study include its restrictions to clients diagnosed with depression and to a relatively narrow and small sample of therapists who were also investigators. We know of no reason why convergent validity of alliance instruments should be distorted by diagnostic category. The therapist–investigators’ investment in the project might have led them to be particularly careful and diligent in completing the numerous questionnaires and, perhaps unwittingly, to convey their investment to their clients. Such care and diligence should improve validity; however, personal investment expressed as a bias toward positive ratings could impair convergent validity correlations by restricting the range of scores. The ARM and the WAI use similar rating formats (7-point Likert scales), and although none of the items are worded identically, there is a good deal of overlap in content, raising the possibility that method variables contributed to the convergence, particularly between the self-report versions in CPP. So far, there is no observer version of the ARM, so we were unable to assess convergence within the observer perspective.

In summary, the ARM and the WAI appear to measure at least some of the same things. Our results suggest that investigators who favor a global alliance measure could justifiably either aggregate the three ARM core alliance scales or use the aggregated WAI scales. Investigators who take a multidimensional view of the alliance would do better with the ARM. Finding convergence at the session level as well as the dyad level, at least within client and therapist perspectives, suggests that session-to-session changes in the alliance can be studied with either measure, just as can case-to-case variation. The substantially lower convergence between perspectives than within perspectives underlines the importance of assessing the alliance from multiple viewpoints. Research examining the different perspectives on the alliance could contribute to fine tuning of training in the skills needed for improving the alliance.

References

Agnew, R. M., Harper, H., Shapiro, D. A., & Barkham, M. (1994). Resolving a challenge to the therapeutic relationship: A single case study. British Journal of Medical Psychology, 67, 155–170.
Agnew-Davies, R., Stiles, W. B., Hardy, G. E., Barkham, M., & Shapiro, D. A. (1998). Alliance structure assessed by the Agnew Relationship Measure (ARM). British Journal of Clinical Psychology, 37, 155–172.
Alexander, L. B., & Luborsky, L. (1986). The Penn Helping Alliance scales. In L. S. Greenberg & W. M. Pinsof (Eds.), The Psychotherapeutic Process: A Research Handbook (pp. 325–366). New York: Guilford Press.
American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
Bachelor, A. (1991). Comparison and relationship to outcome of diverse dimensions of the helping alliance as seen by client and therapist. Psychotherapy, 28, 534–539.
Barkham, M., Rees, A., Shapiro, D. A., Stiles, W. B., Agnew, R. M., Halstead, J., Culverwell, A., & Harrington, V. M. G. (1996). Outcomes of time-limited psychotherapy in applied settings: Replicating the Second Sheffield Psychotherapy Project. Journal of Consulting and Clinical Psychology, 64, 1079–1085.
Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford Press.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561–571.
Bordin, E. S. (1979). The generalizability of the psychoanalytic concept of working alliance. Psychotherapy: Theory, Research, and Practice, 16, 252–260.
Bordin, E. S. (1994). Theory and research on the therapeutic working alliance: New directions. In A. O. Horvath & L. S. Greenberg (Eds.), The working alliance: Theory, research and practice (pp. 13–37). New York: Wiley.
Cecero, J. J., Fenton, L. R., Nich, C., Frankforter, T. L., & Carroll, K. M.
(2001). Focus on the therapeutic alliance: The psychometric properties of six measures across three treatments. Psychotherapy, 38, 1–11.
Constantino, M. J., Castonguay, L. G., & Shut, A. J. (2002). The working alliance: A flagship for the scientist–practitioner model in psychotherapy. In G. S. Tryon (Ed.), Counseling based on process research (pp. 81–131). New York: Allyn & Bacon.
Derogatis, L. R., Lipman, R. S., & Covi, L. (1973). SCL–90: An outpatient rating scale. Preliminary report. Psychopharmacology Bulletin, 9, 13–20.
Dill-Standiford, T. J., Stiles, W. B., & Rorer, L. G. (1988). Counselor–client agreement on session impact. Journal of Counseling Psychology, 35, 47–55.
Firth, J. A., & Shapiro, D. A. (1985). Prescriptive therapy manual for the Sheffield Psychotherapy Project (PTRC Memo No. 734). (Available from the Psychological Therapies Research Centre, University of Leeds, 17 Blenheim Terrace, Leeds LS2 9JT, United Kingdom)
Goldfried, M. R., & Merbaum, M. (1973). Behavior change through self-control. New York: Holt, Rinehart & Winston.
Hartley, D. E., & Strupp, H. H. (1983). The therapeutic alliance: Its relationship to outcome in brief psychotherapy. In J. Masling (Ed.), Empirical studies of psychoanalytic theories (Vol. 1, pp. 1–37). Hillsdale, NJ: Analytic Press.
Hatcher, R. L. (1999). Therapists’ views on treatment alliance and collaboration in therapy. Psychotherapy Research, 9, 405–425.
Hatcher, R. L., & Barends, A. W. (1996). Patients’ view of the alliance in psychotherapy: Exploratory factor analysis of three alliance measures. Journal of Consulting and Clinical Psychology, 64, 1326–1336.
Hatcher, R. L., Barends, A., Hansell, J., & Gutfreund, M. J. (1995). Patients’ and therapists’ shared and unique views of the therapeutic alliance: An investigation using confirmatory factor analysis in a nested design. Journal of Consulting and Clinical Psychology, 63, 636–643.
Hill, C. E. (1989). Therapist techniques and client outcomes: Eight cases of brief psychotherapy. Newbury Park, CA: Sage.
Hobson, R. F. (1985). Forms of feeling: The heart of psychotherapy. London: Tavistock.
Horowitz, L. M., Rosenberg, S. E., Baer, B. A., Ureno, G., & Villasenor, V. S. (1988). Inventory of Interpersonal Problems: Psychometric properties and clinical applications. Journal of Consulting and Clinical Psychology, 56, 885–892.
Horvath, A. O. (1994). Empirical validation of Bordin’s pantheoretical model of the alliance: The Working Alliance Inventory perspective. In A. O. Horvath & L. S. Greenberg (Eds.), The working alliance: Theory, research and practice (pp. 259–286). New York: Wiley.
Horvath, A. O., & Bedi, R. P. (in press). The alliance. In J. C. Norcross (Ed.), Psychotherapy relationships that work: Therapist contributions and responsiveness to patient needs. New York: Oxford University Press.
Horvath, A. O., & Greenberg, L. S. (1986). The development of the Working Alliance Inventory. In L. S. Greenberg & W. M. Pinsof (Eds.), The psychotherapeutic process: A research handbook (pp. 529–556). New York: Guilford Press.
Horvath, A. O., & Greenberg, L. S. (1989). Development and validation of the Working Alliance Inventory. Journal of Counseling Psychology, 36, 223–233.
Horvath, A. O., & Greenberg, L. S. (Eds.). (1994). The working alliance: Theory, research and practice. New York: Wiley.
Horvath, A. O., & Luborsky, L. (1993). The role of the therapeutic alliance in psychotherapy. Journal of Consulting and Clinical Psychology, 61, 561–573.
Horvath, A. O., & Symonds, B. D. (1991). Relation between working alliance and outcome in psychotherapy: A meta-analysis. Journal of Counseling Psychology, 38, 139–149.
Kivlighan, D. M., Jr., & Shaughnessy, P. (2000). Patterns of working alliance development: A typology of working alliance ratings. Journal of Counseling Psychology, 47, 362–371.
Krupnick, J. L., Sotsky, S. M., Simmens, S., Moyer, J., Elkin, I., Watkins, J., & Pilkonis, P. A. (1996). The role of the therapeutic alliance in psychotherapy and pharmacotherapy outcome: Findings in the National Institute of Mental Health Treatment of Depression Collaborative Research Program. Journal of Consulting and Clinical Psychology, 64, 532–539.
Marmar, C. R., Horowitz, M. J., Weiss, D. S., & Marziali, E. (1986). The development of the therapeutic alliance rating system. In L. S. Greenberg & W. M. Pinsof (Eds.), The psychotherapeutic process: A research handbook (pp. 367–390). New York: Guilford Press.
Marmar, C. R., Weiss, D. S., & Gaston, L. (1989). Toward the validation of the California Therapeutic Alliance Rating System. Psychological Assessment, 1, 46–52.
Marziali, E. (1984). Three viewpoints on the therapeutic alliance: Similarities, differences and associations with psychotherapy outcome. Journal of Nervous and Mental Disease, 172, 417–423.
Norman, W. T. (1967). On estimating psychological relationships: Social desirability and self-report. Psychological Bulletin, 67, 273–293.
O’Malley, P. M., & Bachman, J. G. (1979). Self-esteem and education: Sex and cohort comparisons among high school seniors. Journal of Personality and Social Psychology, 37, 1153–1159.
Orlinsky, D. E., Grawe, K., & Parks, B. K. (1994). Process and outcome in psychotherapy—Noch einmal. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 270–376). New York: Wiley.
Ozer, D. J. (1985). Correlation and the coefficient of determination. Psychological Bulletin, 97, 307–315.
Raue, P. J., & Goldfried, M. R. (1994). The therapeutic alliance in cognitive–behavior therapy. In A. O. Horvath & L. S. Greenberg (Eds.), The working alliance: Theory, research and practice (pp. 131–152). New York: Wiley.
Raue, P. J., Goldfried, M. R., & Barkham, M. (1997). The therapeutic alliance in psychodynamic–interpersonal and cognitive–behavioral therapy. Journal of Consulting and Clinical Psychology, 65, 582–587.
Safran, J. D., Crocker, P., McMain, S., & Murray, P. (1990). Therapeutic alliance rupture as a therapy event for empirical investigation. Psychotherapy, 27, 154–165.
Safran, J. D., & Muran, J. C. (1996). The resolution of ruptures in the therapeutic alliance. Journal of Consulting and Clinical Psychology, 64, 447–458.
Safran, J. D., & Muran, J. C. (Eds.). (1998). The therapeutic alliance in brief psychotherapy. Washington, DC: American Psychological Association.
Safran, J. D., & Wallner, L. K. (1991). The relative predictive validity of two therapeutic alliance measures in cognitive therapy. Psychological Assessment, 3, 188–195.
Shapiro, D. A., Barkham, M., Rees, A., Hardy, G. E., Reynolds, S., & Startup, M. J. (1994). Effects of treatment duration and severity of depression on the effectiveness of cognitive/behavioral and psychodynamic/interpersonal psychotherapy. Journal of Consulting and Clinical Psychology, 62, 522–534.
Shapiro, D. A., & Firth, J. A. (1985). Exploratory therapy manual for the Sheffield Psychotherapy Project (Memo No. 733). (Available from the Psychological Therapies Research Centre, University of Leeds, 17 Blenheim Terrace, Leeds LS2 9JT, United Kingdom)
Shapiro, D. A., Rees, A., Barkham, M., Hardy, G., Reynolds, S., & Startup, M. (1995). Effects of treatment duration and severity of depression on the maintenance of gains following cognitive–behavioral and psychodynamic–interpersonal psychotherapy. Journal of Consulting and Clinical Psychology, 63, 378–387.
Snaith, R. P. (1974). Psychotherapy based on relaxation techniques. British Journal of Psychiatry, 124, 473–481.
Stiles, W. B., Agnew-Davies, R., Hardy, G. E., Barkham, M., & Shapiro, D. A. (1998). Relations of the alliance with psychotherapy outcome: Findings in the Second Sheffield Psychotherapy Project. Journal of Consulting and Clinical Psychology, 66, 791–802.
Stiles, W. B., Reynolds, S., Hardy, G. E., Rees, A., Barkham, M., & Shapiro, D. A. (1994). Evaluation and description of psychotherapy sessions by clients using the Session Evaluation Questionnaire and the Session Impacts Scale. Journal of Counseling Psychology, 41, 175–185.
Suh, C. S., Strupp, H. H., & O’Malley, S. S. (1986). The Vanderbilt process measures: The Vanderbilt Psychotherapy Process Scale (VPPS) and the Vanderbilt Negative Indicators Scale (VNIS). In L. S. Greenberg & W. M. Pinsof (Eds.), The psychotherapeutic process: A research handbook (pp. 285–324). New York: Guilford Press.
Tang, T. Z., & DeRubeis, R. J. (1999). Sudden gains and critical sessions in cognitive–behavioral therapy for depression. Journal of Consulting and Clinical Psychology, 67, 262–266, 894–904.
Ten Berge, J. M. F. (1986). Rotation to perfect congruence and the cross-validation of component weights across populations. Multivariate Behavioral Research, 21, 41–64.
Tichenor, V., & Hill, C. E. (1989). A comparison of six measures of working alliance. Psychotherapy, 26, 195–199.
Watson, J. C., & Greenberg, L. S. (1994). The alliance in experiential therapy: Enacting the relationship conditions. In A. O. Horvath & L. S. Greenberg (Eds.), The working alliance: Theory, research and practice (pp. 153–172). New York: Wiley.

Received August 17, 2000
Revision received February 2, 2002
Accepted February 12, 2002