Chapter 4
Scaling responses
Introduction
Having devised a set of questions using the methods outlined in the previous
chapter, we must choose a method by which responses
prejudice?” may req
and sophisticated techniques to obtain val
There has been a bewildering amount of research in this area in disciplines
ranging from psychology to economics. Often the results are cont
the correct conclusions are frequently count
describe a wide variety of scaling methods, indicat
recommendations regarding a choice of methods.
Some basic concepts
In considering approaches to the development of response scales, it is helpful
to first consider the kinds
divi
blood pressure. The second
related fe commonly referred to as the level of measurement.
If the response consists of named categories, such as particular
symptoms, a job classifica
called a nominal variable. Ordered categ
educational level (less than high school, high school diploma, some college
or university, university degree, postgraduate degree) are called ordinal variables.
By contrast, variables in which the interval between re
sta interval variables. Te
(Celsius or Fahrenheit), is an interval variable. Get
is known are
sponse is on a five-point or seven-point scale, are not
level measurement, since we can never be sure that the di
between ‘strongly disagree’ and ‘disagree’ is the same as between ‘agree’ and
‘strongly agree’. However, some methods have been devised to achieve interval
level measurement
ables on the other. In the
tions, and differences among the means can be interpreted, and the broad
class of techniques ca ... can, therefore, be used for
analysis. By contrast, since it makes no sense to speak of the average religion or
average sex of a sample of people, nominal and ordinal data must be considered
as frequencies in in
discussed in Chapter 5. However,
little difficulty in deciding on the appropriate response method.
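The practical consequence of the level of measurement can be illustrated with a minimal Python sketch. It assumes nothing beyond the distinction drawn above: means are interpretable for interval data, whereas nominal data are summarized as frequencies. The data values and function names are hypothetical and were invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical data, invented for illustration only.
temperatures_c = [36.6, 37.1, 38.0, 36.9]   # interval variable
religions = ["A", "B", "A", "C", "A"]       # nominal variable (arbitrary labels)

def summarize_interval(values):
    """Interval data: differences are meaningful, so a mean can be interpreted."""
    return mean(values)

def summarize_nominal(values):
    """Nominal data: report how often each category occurs, not an average."""
    return dict(Counter(values))

print(summarize_interval(temperatures_c))
print(summarize_nominal(religions))
```

Asking for the mean of `religions` would be meaningless, which is why the second function returns counts per category.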
Perhaps the most common error wher
they are frequently employed in circumstances where the response is not,
in fact, categorical. Attitudes and behaviours often lie on a continuum. When
we ask a question like ‘Do you have
there are varying degrees
Ignoring the continuous nature of many respons
The first one is fairly obvious: since different people may have different ideas
3, there will likely be
luced into the responses, as well as uncertainty and confusion on
the part of respondents,
Have you ever had a chest X-ray? yes no
Which of the following symptoms are you currently experiencing?
ited choice of response
in Fig. 4.2 might be responded to in one
) the first method effectively reduces
a single number, and
loss of information
The effect is a potent
and a corresponding reduction in
Fig. 4.2 Example of a continuous judgement: the item ‘Doctors carry a heavy responsibility’, rated on a six-point scale from ‘strongly agree’ to ‘strongly disagree’.
The third problem, which is a consequence of the second, is that dichotomizing
a continuous variable leads to a loss of efficiency of the
1 bes, 67 per cent
ant as a continuous one; depending on how the measure was
you needed 67 subjects to show an effect when the outcome is
measured along a continuum, you would need 100 subjects to demonstrate
the same effect when the outcome is dichotomized. When circumstances are
not as ideal, the inflation in the required sample size can be 10 or
Hunter and Schmidt (1990) showed that if the dichotomy resulted
in a 50-50 split, with half of the subjects in one group and half in the other,
the correlation of that instrument with another is reduced by 20 per cent. Any
other split results in a greater attenuation; if the result is that 10 per cent of the
subjects are in one group and 90 per cent in the other, then the reduction is
41 per cent.
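The size of this attenuation can be checked numerically. The sketch below is not taken from Hunter and Schmidt; it assumes that the underlying variable is normally distributed and uses the standard relation between a biserial and a point-biserial correlation, in which the correlation is multiplied by φ(z)/√(p(1 − p)), where p is the proportion in one group and z is the corresponding cut point. The function name is our own.

```python
from math import sqrt
from statistics import NormalDist

def dichotomization_attenuation(p):
    """Factor by which a correlation shrinks when a normally distributed
    variable is split so that a proportion p falls in one group.
    Assumption: the underlying variable is normal."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)   # cut point on the underlying scale
    return std_normal.pdf(z) / sqrt(p * (1.0 - p))

for p in (0.5, 0.1):
    reduction = 1.0 - dichotomization_attenuation(p)
    print(f"proportion {p:.2f} in one group: reduction of {reduction:.3f}")
```

Under these assumptions the factor is roughly 0.80 for an even split and roughly 0.585 for a 10-90 split, close to the reductions quoted above.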
We have demonstrated this result with real data on several occasions.
In a recent study of the certification examinations in internal medicine in
Canada, the reliability of the original scores, inter-rater and test-retest, was
0.76 and 0.47, respectively. These scores were then converted to a pass-fail
decision and the reliability recalculated. The comparable statistics for these
decisions were 0.69 and 0.36, a loss of about 0.09 in reliability.
There are two common, but invalid, objections to the use of multiple
response levels. The first is that the researcher is only interested in whether
respondents agree or disagree, so it is not worth the extra effort. This
argument confuses measurement with decision-making; the decision
can always be made after the fact by establishing a cutoff point on the response
continuum, but information lost from the original responses cannot be
recaptured.
The second argument is that the additional categories are only adding
noise or error to the data; people cannot make finer judgements than
‘agree-disagree’. Although there may be particular circumstances where this
is true, in general, the evidence indicates that people are capable of much
finer discriminations; this will be reviewed in a later section of the chapter
where we discuss the appropriate number of response steps (p. 47).
Continuous judgements
Accepting that many of the variables of interest to healthcare researchers are
continuous rather than categorical, methods must be devised to quantify these
judgements.
The approaches that we will review fall into three broad categories:
Direct estimation techniques, in which subjects are required to indicate their
response by a mark on a line or a check in a box;
Comparative methods, in which subjects choose among a series of alternatives
that have been previously calibrated by a separate criterion group; and
Econometric methods, in which subjects describe their preference by
anchoring it to extreme states (perfect health-death).
Direct estimation methods
Direct estimation methods are designed to elicit from the subject a direct
estimate of the magnitude of an attribute. The approach is usually
straightforward, as in the example used above, where we asked for
a response on a six-point scale ranging from ‘strongly agree’ to ‘strongly
disagree’. This is one of many variations, although all share many common
features. We begin by describing the main contenders, then we will explore their
advantages and disadvantages.
Visual analog scales
The visual analog scale (VAS) is the essence of simplicity: a line of fixed
length, usually 100 mm, with anchors like ‘No pain’ and ‘Pain as bad as it could
be’ at the extreme ends and no words describing intermediate positions.
An example is shown in Fig. 4.3. Respondents are required to place a mark,
usually an ‘X’ or a vertical line, on the line corresponding to their perceived
state. The VAS technique was introduced over 80 years ago (Hayes and
Patterson 1921), at which time it was called the ‘graphic rating method’, but
became popular in clinical psychology only in the 1960s. The method has
been used extensively in medicine to assess a variety of constructs: pain
(Huskisson 1974), mood (Aitken 1969), and functional capacity (Scott and
Huskisson 1978), among many others.
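A response on a VAS is commonly quantified by measuring the distance from one end of the line to the respondent's mark. A minimal Python sketch of that step follows; the function name `vas_score`, the choice to express the result as a proportion of the line, and the example reading are illustrative assumptions, not part of any published scoring protocol.

```python
def vas_score(mark_mm, line_mm=100.0):
    """Express a measured mark position as a proportion (0 to 1) of the line length.
    `mark_mm` is the distance from the left-hand anchor to the mark."""
    if line_mm <= 0:
        raise ValueError("line length must be positive")
    if not 0 <= mark_mm <= line_mm:
        raise ValueError("mark must lie on the line")
    return mark_mm / line_mm

# Hypothetical reading: a mark measured 62 mm from the left-hand anchor.
score = vas_score(62.0)
print(score)
```

Dividing by the line length keeps scores comparable if a line is reproduced at a length other than 100 mm.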
The VAS has also been used for the measurement of change (Scott and
Huskisson 1979). In this approach, researchers are interested in the perceptions
of the degree to which patients feel that they have improved as a result of
treatment. The strategy used is to show patients, at the end of a course of
treatment, where they had marked the line prior to commencing treatment,
and then to ask them to indicate, by a second line, their present state. There
are a number of conceptual and methodological issues in the measurement of
change, by VAS or other means, which will be addressed in Chapter 11.
Fig. 4.3 The visual analogue scale (VAS): the item ‘How severe has your arthritis pain been today?’, with a line ending in the anchor ‘pain as bad as it could be’.
Proponents are enthusiastic in their writings regarding the advantages of the
method over its usual rival, a scale in which intermediate positions are labelled
(e.g. ... ‘severe’); however, the authors frequently then demonstrate
a substantial correlation between the two methods (Downie et al. 1978),
suggesting that the advantages are more perceived than real. One also suspects
that the method provides an illusion of precision, since a number given to two
decimal places (e.g. a length m
accuracy of 1 per cent. Of course, although one can measure a response to this
degree of precision, there is no guarantee that the response accurately represents
the underlying attribute to the same degree of
st using these ‘coarser
The simplicity of the VAS has co
some evidence simple and appealing as
researchers; in one study described above (Huskisson 1974), 7 per cent of
is were unable to complete a VAS, as against 3 per cent for an adjectival
popularity, although there is
there may be an age effect the
a modification of the technique. In
vertical
ed difficulty in using the VAS nding
ad of a horizontal have used a
easier for older people to complete
Even among people who are comfortable with the method, the VAS has
a number of serious drawbacks. Scale constructors tend to give little thought
to the wording of the end-points, yet patients’ ratings of pain are highly
dependent on the exact wording of the descriptors (Seymour et
often easy to describe (none of the attri
thermometer; which
that everyone
imagination.
Perhaps the most serious problem with the VAS is not inherent
attribute of interest.
Fig. 4.4 Examples of adjectival scales.
In conclusion, although the VAS
appears to be sufficient evidence that other methods may yield more precise
measurement and possibly increased levels of satisfaction among respondents.
Adjectival scales
are shown in Fig. 4.4. The top scale uses discrete boxes, forcing the
respondent to select among the four alternatives (the actual number of boxes
ary), while the bottom scale looks more like a continuous line, allowing
the person to place the mark even on a dividing line. This gives more an
illusion of greater flexibility than the reality, as the person who scores the
some rule to assign the answer into one of the ex
ays use the category on the left, or the one closer
reported health (excellent/very good/good
variant of this, used primarily for
called the Juster scale (Hoek and Gendall 1993). As seen in Fig.
adjectival descriptors of probabil
good psychometric properties.
us responses, bears close
the exception that additional des
positions. Although proponents of the VAS
Figure: the Juster scale, a numbered list of probability descriptors.
to give the rater a clear, unequivocal conception of the continuum along
which he is to evaluate objects ...’ (p. 292).
scales, with one exception,
ange from none or little of
the other. In cont
as in Fig. 4.6. The descriptors most often tap agreement
measuring almo
agreeabl
ructors often want
to keep the labels under the boxes the same from one item to another in order
to reduce the burden on the respondents. However, this may lead
where descriptors for agreement appear on items that are better described
Fig. 4.6 Examples of ... (item: ‘The world is in danger of nuclear holocaust.’; response options: strongly agree, agree, no opinion, disagree, strongly disagree).