Health Measurement Scales (Streiner and Norman), Chapters 4 and 7

Chapter 4
Scaling responses

Introduction

Having devised a set of questions using the methods outlined in the previous chapter, we must choose a method by which responses [...] prejudice?' may require [...] and sophisticated techniques to obtain valid [...].

There has been a bewildering amount of research in this area, in disciplines ranging from psychology to economics. Often the results are contradictory, and the correct conclusions are frequently counter-intuitive. [...] describe a wide variety of scaling methods, indicate [...], and make recommendations regarding a choice of methods.

Some basic concepts

In considering approaches to the development of response scales, it is helpful to first consider the kinds [...] blood pressure [...].

The second related feature [...] commonly referred to as the level of measurement. If the response consists of named categories, such as particular symptoms, a job classification [...], it is called a nominal variable. Ordered categories, [...] educational level (less than high school, high school diploma, some college or university, university degree, postgraduate degree), are called ordinal variables. By contrast, variables in which the interval between responses [...] are [...] interval variables. Temperature (Celsius or Fahrenheit) is an interval variable. [...] is known are [...].

[...] response is on a five-point or seven-point scale, are not interval level measurement, since we can never be sure that the difference between 'strongly disagree' and 'disagree' is the same as between 'agree' and 'strongly agree'. However, some methods have been devised to achieve interval level measurement [...].

[...] variables on the other. In the [...], and differences among the means can be interpreted, and the broad class of techniques called parametric statistics can, therefore, be used for analysis. By contrast, since it makes no sense to speak of the average religion or average sex of a sample of people, nominal and ordinal data must be considered as frequencies in [...] discussed in Chapter 5.

However, [...] little difficulty in deciding on the appropriate response method.
Perhaps the most common error when [...] they are frequently employed in circumstances where the response is not, in fact, categorical. Attitudes and behaviours often lie on a continuum. When we ask a question like 'Do you have [...]', there are varying degrees [...].

[Figure: examples of categorical judgements. 'Have you ever had a chest X-ray?' (yes / no); 'Which of the following symptoms are you currently experiencing?']

Ignoring the continuous nature of many responses [...]. The first one is fairly obvious: since different people may have different ideas [...], there will likely be [...] introduced into the responses, as well as uncertainty and confusion on the part of respondents.

[...] choice of response in Fig. 4.2 might be responded to in one [...]) the first method effectively reduces [...] a single number, and loss of information [...]. The effect is a potential [...] and a corresponding reduction in [...].

Fig. 4.2 Example of a continuous judgement: 'Doctors carry a heavy responsibility', rated on a scale running from 'strongly disagree' to 'strongly agree'.

The third problem, which is a consequence of the second, is that dichotomizing a continuous variable leads to a loss of efficiency of the [...], 67 per cent [...] as a continuous one; depending on how the measure was [...] you needed 67 subjects to show an effect when the outcome is measured along a continuum, you would need 100 subjects to demonstrate the same effect when the outcome is dichotomized. When circumstances are not as ideal, the inflation in the required sample size can be 10 or [...]. Hunter and Schmidt (1990) showed that if the dichotomy resulted in a 50-50 split, with half of the subjects in one group and half in the other, the correlation of that instrument with another is reduced by 20 per cent.
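The 20 per cent figure is what one would expect if the underlying attribute is normally distributed and is cut at a single threshold. A minimal sketch of that calculation follows. It assumes the standard relation between biserial and point-biserial correlations, in which the observed correlation is multiplied by the normal density at the cut-point divided by the square root of p(1 - p). The function name dichotomization_attenuation is hypothetical, and the normality assumption is ours rather than something stated in the passage above.

```python
from math import sqrt
from statistics import NormalDist


def dichotomization_attenuation(p: float) -> float:
    """Return the factor by which a correlation shrinks when a normally
    distributed variable is split so that a proportion p of subjects
    falls in one group (assumption: bivariate normality)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    std_normal = NormalDist()           # mean 0, standard deviation 1
    cut_point = std_normal.inv_cdf(p)   # threshold that yields the split
    return std_normal.pdf(cut_point) / sqrt(p * (1.0 - p))


# Evaluate an even split and a markedly uneven one.
for p in (0.5, 0.1):
    factor = dichotomization_attenuation(p)
    print(f"split {p:.0%}/{1 - p:.0%}: "
          f"correlation retained = {factor:.3f}, "
          f"reduction = {1 - factor:.1%}")
```

Under these assumptions an even split retains about 0.80 of the original correlation, a reduction of about 20 per cent, and the further the split moves from 50-50 the smaller the retained fraction becomes.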
Any other split results in a greater attenuation; if the result is that 10 per cent of the subjects are in one group and 90 per cent in the other, then the reduction is 41 per cent.

We have demonstrated this result with real data on several occasions. In a recent study of the certification examinations in internal medicine in Canada, the reliability of the original scores, inter-rater and test-retest, was 0.76 and 0.47, respectively. These scores were then converted to a pass-fail decision and the reliability recalculated. The comparable statistics for these decisions were 0.69 and 0.36, a loss of about 0.09 in reliability.

There are two common, but invalid, objections to the use of multiple response levels. The first is that the researcher is only interested in whether respondents agree or disagree, so it is not worth the extra effort. This argument confuses measurement with decision-making; the decision can always be made after the fact by establishing a cutoff point on the response continuum, but information lost from the original responses cannot be recaptured.

The second argument is that the additional categories are only adding noise or error to the data; people cannot make finer judgements than 'agree-disagree'. Although there may be particular circumstances where this is true, in general, the evidence indicates that people are capable of much finer discriminations; this will be reviewed in a later section of the chapter where we discuss the appropriate number of response steps (p. 47).

Continuous judgements

Accepting that many of the variables of interest to healthcare researchers are continuous rather than categorical, methods must be devised to quantify these judgements. The approaches that we will review fall into three broad categories: direct estimation techniques, in which subjects are required to indicate their response by a mark on a line or a check in a box; comparative methods, in which subjects choose among a series of alternatives that have been previously calibrated by a separate criterion group; and econometric methods, in which subjects describe their preference by anchoring it to extreme states (perfect health-death).

Direct estimation methods

Direct estimation methods are designed to elicit from the subject a direct estimate of the magnitude of an attribute. The approach is usually straightforward, as in the example used above, where we asked for a response on a six-point scale ranging from 'strongly agree' to 'strongly disagree'. This is one of many variations, although all share many common features. We begin by describing the main contenders, then we will explore their advantages and disadvantages.

Visual analog scales

The visual analog scale (VAS) is the essence of simplicity: a line of fixed length, usually 100 mm, with anchors like 'No pain' and 'Pain as bad as it could be' at the extreme ends and no words describing intermediate positions. An example is shown in Fig. 4.3. Respondents are required to place a mark, usually an 'X' or a vertical line, on the line corresponding to their perceived state. The VAS technique was introduced over 80 years ago (Hayes and Patterson 1921), at which time it was called the 'graphic rating method', but became popular in clinical psychology only in the 1960s. The method has been used extensively in medicine to assess a variety of constructs: pain (Huskisson 1974), mood (Aitken 1969), and functional capacity (Scott and Huskisson 1978), among many others.
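Scoring a VAS response amounts to measuring where the mark falls along the line. The sketch below assumes the distance from the left-hand anchor to the mark has already been measured in millimetres. The function name vas_score and the rescaling to a 0-100 range are illustrative choices, not part of the method as described above.

```python
def vas_score(mark_mm: float, line_length_mm: float = 100.0) -> float:
    """Convert the measured position of a respondent's mark into a
    score from 0 (left anchor) to 100 (right anchor)."""
    if line_length_mm <= 0:
        raise ValueError("line length must be positive")
    if not 0.0 <= mark_mm <= line_length_mm:
        raise ValueError("mark must lie on the line")
    # Express the mark as a percentage of the full line length.
    return 100.0 * mark_mm / line_length_mm


# A mark 63 mm along the usual 100 mm line.
print(vas_score(63.0))
# The same rule applied to a hypothetical 150 mm line.
print(vas_score(31.5, line_length_mm=150.0))
```

Dividing by the line length keeps scores comparable when a line other than the usual 100 mm is used.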
The VAS has also been used for the measurement of change (Scott and Huskisson 1979). In this approach, researchers are interested in the perceptions of the degree to which patients feel that they have improved as a result of treatment. The strategy used is to show patients, at the end of a course of treatment, where they had marked the line prior to commencing treatment, and then ask them to indicate, by a second line, their present state. There are a number of conceptual and methodological issues in the measurement of change, by VAS or other means, which will be addressed in Chapter 11.

Fig. 4.3 The visual analogue scale (VAS): 'How severe has your arthritis pain been today?', anchored by 'No pain' and 'Pain as bad as it could be'.

Proponents are enthusiastic in their writings regarding the advantages of the method over its usual rival, a scale in which intermediate positions are labelled ([...] 'severe'); however, the authors frequently then demonstrate a substantial correlation between the two methods (Downie et al. 1978), suggesting that the advantages are more perceived than real. One also suspects that the method provides an illusion of precision, since a number given to two decimal places (e.g. a length measured [...]) [...] accuracy of 1 per cent. Of course, although one can measure a response to this degree of precision, there is no guarantee that the response accurately represents the underlying attribute to the same degree of [...] using these 'coarser' [...].

The simplicity of the VAS has contributed to its popularity, although there is some evidence [...] simple and appealing as researchers; in one study described above (Huskisson 1974), 7 per cent of [...] were unable to complete a VAS, as against 3 per cent for an adjectival [...]. [...] there may be an age effect [...] difficulty in using the VAS [...] a modification of the technique. Instead of a horizontal [...] have used a vertical [...] easier for older people to complete.

Even among people who are comfortable with the method, the VAS has a number of serious drawbacks.
Scale constructors tend to give little thought to the wording of the end-points, yet patients' ratings of pain are highly dependent on the exact wording of the descriptors (Seymour et al. [...]). [...] often easy to describe (none of the attribute [...]) [...] thermometer, which [...] that everyone [...] imagination. Perhaps the most serious problem with the VAS is not inherent [...] attribute of interest.

In conclusion, although the VAS appears to be [...] sufficient evidence that other methods may yield more precise measurement and possibly increased levels of satisfaction among respondents.

Adjectival scales

[...] are shown in Fig. 4.4. The top scale uses discrete boxes, forcing the respondent to select among the four alternatives (the actual number of boxes [...] vary), while the bottom scale looks more like a continuous line, allowing the person to place the mark even on a dividing line. This gives more an illusion of greater flexibility than the reality, as the person who scores the [...] some rule to assign the answer into one of the [...] ([...] always use the category on the left, or the one closer [...]).

Fig. 4.4 Examples of adjectival scales.

[...] reported health (excellent/very good/good/[...]). [...] variant of this, used primarily for [...], called the Juster scale (Hoek and Gendall 1993). As seen in Fig. [...], [...] adjectival descriptors of probability [...] good psychometric properties.

[Figure: the Juster scale, pairing numbers with adjectival descriptors of probability, for example 10 'Certain, practically certain' and 5 'Fairly good possibility'.]

[...] continuous responses, bears close [...] the exception that additional descriptors [...] positions. Although proponents of the VAS [...] 'to give the rater a clear, unequivocal conception of the continuum along which he is to evaluate objects ...' (p. 292).

[...] scales, with one exception, range from none or little of the [...] other. In contrast, [...] as in Fig. 4.6.
The descriptors most often tap agreement [...] measuring almost [...] agreeabl[...]. Scale constructors often want to keep the labels under the boxes the same from one item to another in order to reduce the burden on the respondents. However, this may lead [...] where descriptors for agreement appear on items that are better described [...].

Fig. 4.6 Examples of [...]: 'The world is in danger of nuclear holocaust', rated from 'strongly agree' through 'no opinion' to 'strongly disagree'.
