PIIS109830151060232X

Value in Health, Volume 7, Supplement 1, September/October 2004, S22–S26
Blackwell Science, Ltd, Oxford, UK. ISSN 1098-3015. © 2004 ISPOR
Original Article. Short title: Rasch Analysis in QoL Instrument Development (Tennant et al.)

Application of Rasch Analysis in the Development and Application of Quality of Life Instruments

Alan Tennant, PhD,¹ Stephen P. McKenna, PhD,² Peter Hagell, PhD³

¹Academic Unit of Musculoskeletal & Rehabilitation Medicine, University of Leeds, Leeds, UK; ²Galen Research, Manchester, UK; ³Department of Nursing, Lund University, Lund, Sweden

ABSTRACT

This paper discusses recent advances that have been made in the field of psychometrics, specifically, the application of Rasch analysis to the instrument development process. It emphasizes the importance of assessing the fundamental scaling properties of an instrument prior to consideration of traditional psychometric indicators. The paper introduces Rasch analysis and shows how it has been applied in the development of needs-based measures in order to ensure that they provide unidimensional measurement. By ensuring that scales are based on the same measurement model and that they fit the Rasch model it is possible for QoL scores to be compared across diseases by means of cocalibration and item banking.

Keywords: classical test theory, differential item functioning, needs-based quality of life, Rasch analysis, unidimensionality.

Address correspondence to: Stephen McKenna, Galen Research, Enterprise House, Manchester Science Park, Lloyd Street North, Manchester M15 6SE, UK. E-mail: smckenna@galen-research.com

© ISPOR 1098-3015/04/$15.00/S22 S22–S26

As long as primitive counts and raw scores are routinely mistaken for measures by our colleagues in social, educational, and health research, there is no hope of their professional activities ever developing into a reliable or useful science [1].

A crucial aspect of the application of the needs model is that it has been allied with the most advanced psychometric methods. This paper argues the case for the use of Rasch analysis to ensure that scales are unidimensional [2], a fundamental requirement of construct validity [3]. The paper gives an overview of the nature of Rasch analysis and shows how it aids valid across-disease comparisons of quality of life (QoL) by means of co-calibration and the development of item banks.

Over the last two decades an analytical approach has been adopted that is pivotal both to judging the quality of existing outcome instruments and to developing new instruments. This approach is called Rasch analysis, after its originator, the Danish mathematician Georg Rasch. He developed Poisson models for reading, intelligence, and achievement tests, the last becoming known as the Rasch model [2]. Rasch analysis has been employed in the development of most of the needs-based QoL instruments, to ensure that the resulting scales are unidimensional. Only the earliest developed needs-based measures did not benefit from this approach and studies are underway to ensure that earlier measures are updated to fit the Rasch model [4].

The quest for measurement is an important part of advancing science, and the type of measurement that allows for arithmetic operations such as addition and subtraction is known as fundamental measurement [5]. Most outcome measures used in health care are ordinal in nature, precluding such arithmetic operations [6]. Many such measures focus on attributes that are not directly measurable, such as pain, self-esteem, or quality of life. These measures give a “manifest score” of the construct being measured. Consequently, most outcomes are expressed as ordinal manifest scores, indicating some rank on a perceived underlying latent trait. Although there is a substantial body of nonparametric statistics to analyze such information, the importance of the calculation of change scores in clinical trial analysis (and attributes of measurement such as the “effect size” [7]), which require normally distributed interval-level measurement, gives urgency to achieving a quality of measurement that will sustain such arithmetic operations.

In order to achieve such fundamental measurement certain properties are required. These are reviewed in detail elsewhere [8,9] but essentially they are:

• the numerical properties of order (one mark on the ruler represents more or less of the construct than another);
• addition (points on rulers may be added together); and
• specific objectivity (the calibration of the ruler (item set or questions) is independent of the persons used to calibrate and vice versa).

Where data fit the Rasch model these properties are confirmed and fundamental measurement follows. On a more formal level, the theory of simultaneous conjoint measurement [10] provides the mechanism for translating manifest to latent scores: Rasch analysis delivers conjoint measurement when data fit the model.

The Rasch model is a unidimensional model that has two main assertions:

1. that the easier the item is, the more likely it will be passed (affirmed); and
2. the more able the patient, the more likely they will pass (affirm) an item (or do a task) compared to a less able patient.

Unidimensionality is a prerequisite to the summation of any set of items [3,11,12]. The Rasch model assumes that the probability of a given patient “passing” an item or task is a logistic function of the relative distance between the item location parameter (the difficulty of the task) and the respondent location parameter (the ability of the patient), and only a function of that difference. Expressed formally, this gives:

$$p_i(\theta) = \frac{e^{(\theta - b_i)}}{1 + e^{(\theta - b_i)}}$$

where $p_i(\theta)$ is the probability that patients with ability $\theta$ will be able to do item (task) $i$, and $b_i$ is the item (task) difficulty parameter. The model can be extended to cope with items with more than two response categories. From this, the expected pattern of responses to a set of items or tasks is determined given the estimated $\theta$ and $b$. When the observed response pattern coincides with or does not deviate greatly from the expected response pattern, the items fit the measurement model and constitute a true Rasch scale [13]. Various fit statistics determine whether or not the data do fit the model, and these tend to be software dependent, although all work on the principle of looking at the deviation of the observed data from the model expectation. Finally, where there is local independence of items (that is, no residual associations in the data after the Rasch trait has been removed), this, taken together with fit to the model, supports the contention that the scale is unidimensional [14].

Assuming that the data fit, the Rasch model transforms them from ordinal scores into interval-level measurement with the logit (log odds unit) as the unit of measurement. A logit is the distance along the line of the variable that increases the odds of observing the event by a factor of 2.718. There is a clear relation between the ability–difficulty difference and the probability of affirming an item or undertaking a task. For example, where the difference between a patient's ability and the item or task difficulty is zero, the probability is 0.5. Where the difference is +1 logit (that is, the patient has greater ability, or more of the trait, than expressed by the item) the probability is 0.73, or 0.27 if the difference is −1.0. Where the difference is ±3 logits then the probabilities are 0.95 and 0.05, respectively.

Differential Item Functioning (DIF) can also be examined by fitting data to the Rasch model [15]. Essentially, the scale should work in the same way, irrespective of the group assessed. Thus, the probability of being able to do a task, or affirming an item, for patients at the same level of ability (or, for example, with the same QoL) should remain the same across groups. Assessment of DIF can yield crucial information about the measurement equivalence of an instrument between various cultural groups [16] but should also be applied across gender and age groups within those cultures.

In comparison with classical test theory, the Rasch model provides a means of assessing a range of additional measurement properties, increasing the information available about a scale's performance [17–19]. The model is one of many used in this way, which are generally subsumed under the rubric of Item Response Theory (IRT) [20,21]. The Rasch model is known as the one-parameter model within this framework, but it has unique properties, which are crucial to attaining conjoint measurement [22], a prerequisite for the calculation of change scores [7].

The Rasch model was readily adopted in rehabilitation in the late 1980s [23], as the language of ability and difficulty easily transferred from education. Patients undergoing rehabilitation have a given level of ability. In order to assess this level they can be presented with a range of tasks requiring differing degrees of ability. Since then the approach has been used with a wide range of clinical and diagnostic groups [24,25]. All recent needs-based quality of life instruments are developed using this approach [26–32].

Given that both patients and items are calibrated on the same underlying metric trait, the potential for innovation in measurement is considerable. Consider for example the current debate about disease-specific and generic QoL measures. Where scales are based on the same theoretical unidimensional construct, items from different diseases can be calibrated on the same scale, given that some items (that are free of DIF by diagnosis) common to both scales are employed. This provides disease-specific and comparative QoL measures by “item banking” items or questions onto the same underlying metric [33–35]. Currently this approach, based on the needs-based model of QoL, is being used to establish an item bank for disease-specific QoL measures in the rheumatic diseases [36–38]. A similar exercise is planned for dermatology and links between these two disease areas could be made possible by means of the Psoriatic Arthritis Quality of Life (PsAQoL) measure [36].

The needs-based QoL measures all have the same theoretical basis, are unidimensional (insofar as their items fit the Rasch model) and have good traditional psychometric properties. Not only do they work as effective outcome measures in clinical trials but they also offer the potential for allowing valid comparisons of QoL to be made across diseases [39] and between healthy and diseased populations [40]. While it has been common practice to use generic health status measures such as the SF-36 to make such comparisons, the results have been both misleading and invalid [39]. This is because, although a question is expressed in the same way for all respondents, different types of patients who have had different experiences interpret it differently. For example, a “yes” response to a question about feeling tired can represent a very different response for a healthy person and one with rheumatoid arthritis. This explains why surprising results are frequently obtained for cross-disease comparisons. For example, data collected with the SF-36 suggest both that individuals with psoriasis have worse scores than patients with arthritis, cancer, and myocardial infarction [41] and that such patients have comparable or even better scores than those experienced by an average population [42].

Summary

Only occasionally do we see concerns raised about inappropriate analysis of data that are erroneously assumed to be at the interval level [6]. The extent to which analyses of reliability, validity, and responsiveness are compromised by ignoring such assumptions is unknown. It is also unknown at present to what extent the misuse of ordinal manifest scores compromises the results of clinical trial analyses when these scores are used to calculate changes across experimental and control groups. However, the potential implications should not be underestimated [43–45].

The ability of a scale to provide fundamental measurement should be established prior to the more commonly reported psychometric attributes. Rasch analysis offers a method of ensuring that key measurement assumptions are tested and, where data fit the model, arithmetic operations may be undertaken. It has particular value in the development of new measures, specifically in guiding item reduction. Traditional methods of item reduction that rely on item–total correlations and/or indices of internal consistency can have unfortunate effects on the sensitivity of measures and their ability to provide valid scores at the extremes of the construct range. This is because items at the extreme of the measurement range are generally discarded because too many or too few respondents affirm them. In reality, these “extreme” items may be the most important in a scale, extending its range of coverage of the construct.

References

1 Wright BD. Common sense for measurement. Rasch Meas Trans 1999;13:704–5.
2 Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: University of Chicago Press, 1960 (Reprinted 1980).
3 Streiner D, Norman G. Health Measurement Scales. Oxford: Oxford University Press, 1989.
4 McKenna SP, Whalley D, Cook S. Improving the sensitivity of the Quality of Life in Depression Scale (QLDS). Qual Life Res 2002;11:625.
5 Ellis B. Basic Concepts in Measurement. Cambridge: Cambridge University Press, 1966.
6 Svensson E. Guidelines to statistical evaluation of data from rating scales and questionnaires. J Rehabil Med 2001;33:47–8.
7 Kazis L, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989;27:S178–S189.
8 Andrich D. Rasch Models for Measurement. Series: Quantitative Applications in the Social Sciences No. 68. London: Sage Publications, 1988.
9 Embretson SE, Reise SP. Item Response Theory for Psychologists. NJ: Lawrence Erlbaum, 2000.
10 Luce RD, Tukey JW. Simultaneous conjoint measurement: a new type of fundamental measurement. J Math Psychol 1964;1:1–27.
11 Rasch G. On general laws and the meaning of measurement in psychology. In: Neyman J, ed., Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, IV. Berkeley CA: University of California Press, 1961.
12 Wright BD, Masters GN. Rating Scale Analysis. Chicago: MESA Press, 1982.
13 Van Alphen A, Halfens R, Hasman A, Imbos T. Likert or Rasch? Nothing is more applicable than a good theory. J Adv Nurs 1994;20:196–201.
14 Smith RM. Fit analysis in latent trait measurement models. J Appl Meas 2000;2:199–218.
15 Holland PW, Wainer H, eds. Differential Item Functioning. Mahwah, NJ: Lawrence Erlbaum Associates, 1993.
16 Smith RM. Applications of Rasch Measurement. Sacramento: JAM Press, 1992.
17 Cella DF, Lloyd SR, Wright BD. Cross-cultural instrument equating: current research and future directions. In: Spilker B, ed., Quality of Life and Pharmacoeconomics in Clinical Trials 2nd ed. Philadelphia: Lippincott-Raven Publishers, 1996.
18 Bond TG, Fox CM. Applying the Rasch Model: Fundamental Measurement for the Human Sciences. Mahwah, NJ: Lawrence Erlbaum Associates, Inc., 2001.
19 Smith EV Jr. Evidence for the reliability of measures and validity of measure interpretation: a Rasch measurement perspective. J Appl Meas 2001;2:281–311.
20 Birnbaum A. Some latent trait models and their use in inferring an examinee's ability. In: Lord FM, Novick MR, eds., Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley, 1968.
21 Van der Linden WJ, Hambleton RK, eds. Handbook of Modern Item Response Theory. New York: Springer, 1997.
22 Perline R, Wright BD, Wainer H. The Rasch model as additive conjoint measurement. Appl Psychol Meas 1979;3:237–56.
23 Silverstein B, Kilgore KM, Fisher WP, et al. Applying psychometric criteria to functional assessment in medical rehabilitation: I. Exploring unidimensionality. Arch Phys Med Rehabil 1991;72:631–7.
24 Haley SM, McHorney CA, Ware JE Jr. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10): I. Unidimensionality and reproducibility of the Rasch item scale. J Clin Epidemiol 1994;47:671–84.
25 Shulman JA, Wolfe EW. Development of a nutrition self-efficacy scale for prospective physicians. J Appl Meas 2000;1:107–30.
26 Doward LC, McKenna SP, Kohlmann T, et al. The international development of the RGHQoL: a quality of life measure for recurrent genital herpes. Qual Life Res 1998;7:143–53.
27 McKenna SP, Whalley D, Renck-Hooper U, et al. The development of a quality of life instrument for use with post-menopausal women with urogenital atrophy in the UK and Sweden. Qual Life Res 1999;8:393–8.
28 McKenna SP, Doward LC, Alonso J, et al. The QoL-AGHDA: an instrument for the assessment of quality of life in adults with growth hormone deficiency. Qual Life Res 1999;8:373–83.
29 Whalley D, McKenna SP, Dewar AL, et al. A new instrument for assessing quality of life in atopic dermatitis: International Development of the Quality of Life Index for Atopic Dermatitis (QoLIAD). Br J Dermatol 2004;150:274–83.
30 Whalley D, McKenna SP, Dewar AL, et al. Quality of life in adults with atopic dermatitis: the international development of the QoLIAD. Qual Life Res 2000;9:322.
31 McKenna SP, Cook SA, Whalley D, et al. Development of the PSORIQoL, a psoriasis-specific measure of quality of life designed for use in clinical practice and trials. Br J Dermatol 2003;149:323–31.
32 Whalley D, McKenna SP, Dewar AL, et al. International development of a measure to assess quality of life in childhood atopic dermatitis: the PIQOL-AD. Qual Life Res 2000;9:302.
33 Dobby J, Duckworth D. Objective Assessment by Means of Item Banking. Schools Council Examination Bulletin 40. London: Evans/Methuen Educational, 1979.
34 Revicki DA, Cella DF. Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing. Qual Life Res 1997;6:595–600.
35 Wainer H. Computerized Adaptive Testing (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, 2000.
36 McKenna SP, Doward LC, Whalley D, et al. The development of the PsAQoL: a quality of life instrument specific to Psoriatic Arthritis. Ann Rheum Dis 2004;63:162–9.
37 Doward LC, Whalley D, Dewar AL, et al. The development of the SLE-QoL: a quality of life instrument specific to Systemic Lupus Erythematosus. Qual Life Res 1999;8:609.
38 Doward LC, Spoorenberg A, Cook SA, et al. The development of the ASQoL: a quality of life instrument specific to Ankylosing Spondylitis. Ann Rheum Dis 2003;62:20–6.
39 Cook SA, Whalley D. Looking for common ground: a first step towards comparing quality of life across diseases. Proc Br Psychol Soc 2001;9:64.
40 Wirén L, Whalley D, McKenna SP, Wilhelmsen L. Application of a disease-specific, quality-of-life measure (QoL-AGHDA) in growth hormone-deficient adults and a random population sample in Sweden: validation of the measure by Rasch analysis. Clin Endocrinol (Oxf) 2000;52:143–5.
41 Rapp SR, Feldman SR, Exum ML, et al. Psoriasis causes as much disability as other major medical diseases. J Am Acad Dermatol 1999;41:401–7.
42 Nichol MB, Margolis JE, Lippa E, et al. The application of multiple quality of life instruments in individuals with mild-to-moderate psoriasis. Pharmacoeconomics 1996;10:644–53.
43 Merbitz C, Morris J, Grip JC. Ordinal scales and foundations of misinference. Arch Phys Med Rehabil 1989;70:308–12.
44 Wright BD, Linacre JM. Observations are always ordinal; measurements, however, must be interval. Arch Phys Med Rehabil 1989;70:857–60.
45 Streiner DL, Norman GR. Health Measurement Scales: a Practical Guide to Their Development and Use 2nd ed. Oxford: Oxford University Press, 1995.
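
The logistic form of the Rasch model described in the text, and the probabilities the authors quote for ability–difficulty differences of 0, ±1 and ±3 logits, can be checked numerically. What follows is only an illustrative sketch in Python, a language the paper itself does not use. The function names `rasch_probability` and `odds` are hypothetical helpers introduced here, and the item difficulty is fixed at 0 logits purely for convenience.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of affirming a dichotomous item under the Rasch model.

    theta: person location (ability) in logits.
    b: item location (difficulty) in logits.
    Algebraically equal to exp(theta - b) / (1 + exp(theta - b)).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def odds(p: float) -> float:
    """Convert a probability (0 < p < 1) into odds."""
    return p / (1.0 - p)


if __name__ == "__main__":
    item_difficulty = 0.0  # assumed value, chosen only for illustration
    for difference in (-3.0, -1.0, 0.0, 1.0, 3.0):
        p = rasch_probability(item_difficulty + difference, item_difficulty)
        print(f"difference {difference:+.1f} logits -> probability {p:.2f}")

    # Moving a person one logit up the scale multiplies the odds by e.
    ratio = odds(rasch_probability(1.0, 0.0)) / odds(rasch_probability(0.0, 0.0))
    print(f"odds ratio for a one-logit increase: {ratio:.3f}")
```

Run as a script, this prints probabilities of 0.05, 0.27, 0.50, 0.73 and 0.95 for the five differences, and an odds ratio of 2.718 for a one-logit increase, in agreement with the figures given in the text. It does not estimate person or item parameters from data. That step requires dedicated Rasch software and is outside the scope of this sketch.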
