Assessment instruments for patients with fibromyalgia: properties, applications and interpretation
F. Salaffi1, P. Sarzi-Puttini2, A. Ciapetti1, F. Atzeni2

Rheumatology Department, Polytechnic University of Marche, Ancona, Italy; Rheumatology Unit, L. Sacco University Hospital, Milan, Italy.
Fausto Salaffi, MD
Piercarlo Sarzi Puttini, MD
Alessandro Ciapetti, MD
Fabiola Atzeni, MD
Please address correspondence to: Fausto Salaffi, MD, Clinica Reumatologica, Università Politecnica delle Marche, c/o Ospedale “A. Murri”, Via dei Colli 52, 60035 Jesi, Ancona, Italy.
Received on November 26, 2009; accepted in revised form on December 10, 2009.
Clin Exp Rheumatol 2009: 27 (Suppl. 56):
© Copyright CLINICAL AND
Key words: Fibromyalgia, assessment instruments, pain, fatigue, disturbed sleep, health-related quality of life, clinimetric properties.
Competing interests: none declared.

ABSTRACT
A comprehensive assessment of the multiple symptom domains associated with fibromyalgia (FM) and the impact of FM on multidimensional aspects of function should form a routine part of the care of FM patients. Clinical trials and long-term clinical registries have used various outcome measures, but the key domains include pain, fatigue, disturbed sleep, physical functioning, emotional functioning, patient global ratings of satisfaction, and their health-related quality of life (HRQL). A number of measures have been “borrowed” from the fields of rheumatoid arthritis, psoriatic arthritis, ankylosing spondylitis and adapted to FM, and others are being developed specifically for FM. However, despite the burgeoning theoretical literature and the proliferation of instruments for measuring various health status domains, no unified approach has been developed and there is little agreement concerning the meaning of the results. There is, therefore, still a need for further consensus and the development of a core set of measures and response criteria, more refined measuring instruments, standardised assessor training, cross-cultural adaptations of health status questionnaires, electronic data capture, and the introduction of standardised quantitative measurements into routine clinical care. This article discusses the advantages and limitations of a selection of both newly developed and well-established and validated distress screening instruments that underlines the continuing challenge of assessing FM.

Introduction
Fibromyalgia (FM) is a chronic condition characterised by generalised pain with characteristic tender points upon physical examination that is often accompanied by a number of associated symptoms such as fatigue, sleep disturbances, psychological and cognitive alterations, headache, migraine, variable bowel habits, diffuse abdominal pain, and increased urinary frequency (1-3). It affects at least 2% of the general population in Italy, and more than 90% of the patients are female (4, 5). The societal importance of this condition is underlined by the fact that its economic consequences are as great as those related to chronic low back pain (6). FM is frequently associated with depression, anxiety, memory and concentration difficulties, and accompanied by other chronic painful disorders.
Its outcomes are not clear, but recent studies suggest that FM patients are characterised by an increased number of physician visits, a self-reported reduction in the ability to perform daily activities, a reduced health-related quality of life (HRQL), and an increased risk of qualifying for a disability pension (7-11).
It is difficult to evaluate the effects of FM therapy because of the many aspects of the syndrome, which also explains why it is usually treated with a wide range of treatments. Although some therapies have been tested in randomised controlled trials (RCTs), the lack of standardisation and outcome measures has prevented any clear evaluation of their effects. In an attempt to identify the appropriate outcome domains, a multidimensional set of core symptoms (12) has been proposed for use in clinical trials that includes pain, tenderness, patient global status, fatigue, HRQL, physical function, disturbed sleep, depression and anxiety, and dyscognition (cognitive dysfunction), and received a high level of consensus among the attendees of OMERACT 9 (Outcome Measures in Rheumatoid Arthritis Clinical Trials) (13).

FM assessment instruments / F. Salaffi et al. REVIEW

However, given the multifaceted nature of FM and the new therapies currently being tested (14), further measures are needed in order to develop a reliable and valid composite patient-reported outcome (PRO) response measure that more accurately assesses treatment effects (12). The validity, reliability and responsiveness of PRO data in evaluating and monitoring patients with rheumatic conditions have been clearly documented (15-17), and this article reviews the literature concerning the clinimetric properties of PRO instruments, their advantages and their limitations.

Assessing pain
Chronic generalised pain is a core feature of FM (1-3, 12, 13), and its assessment involves: i) patient reports of typical pain; and ii) an evaluation of the hypersensitivity to palpation of specific tender points (TPs).

Patient reports of pain
The available instruments for the assessment of pain include visual analogue scales (VAS), verbal descriptor scales (VDS), numerical rating scales (NRS), a daily pain diary, the McGill Pain Questionnaire (MPQ) and the Short-Form McGill Pain Questionnaire (SF-MPQ), the Wisconsin Brief Pain Questionnaire (BPQ), the Brief Pain Inventory (BPI), and the Multidisciplinary Pain Inventory (MPI) (1, 18-20).
The standard VAS is a 10 cm scale with a border on each side: to the left of the “0” mark appears the indication “no pain at all”, and to the right of the “10” mark “pain as bad as it could be”. A number of studies have shown that the data obtained using self-report VAS scales are reproducible, but one of their limitations is that they must be administered on paper or electronically (18-21); furthermore, caution is required when photocopying them as this can lead to significant changes in length (20, 22). VDSs use categories such as “none”, “mild”, “moderate”, “severe” and “excruciating” to describe severity. An NRS ranging from 0 (“no pain”) to 10 (“worst pain imaginable”) is more practical than a VAS as it is easier for most people to understand, and does not require vision, dexterity, or paper and pen; another customary range for an NRS is 0–100. It is even possible to determine the intensity of pain accurately by means of a telephone or computerised telephone interview, with the NRS data given by the patient being recorded in a computer database by an operator or directly via the telephone keyboard.
All three measures closely correlate with each other, although the correlation between NRS and VAS is the closest (23-25). Clinical trials have shown that an NRS is more reliable than a VAS, especially in the case of less educated patients (26), a critical issue that has also been pointed out by Joyce et al. (27). Simplicity and ease of rating are overriding criteria for pain assessments in clinical settings, which explains the prevalent use of a simple 0–10 cm NRS (23, 26), although a daily diary has also been used and found to be a useful means of identifying how pain affects the everyday living activities of individuals.

Pain diagrams or drawings
As widespread pain is one of the two FM classification criteria proposed by the American College of Rheumatology (ACR) (28), and widespread pain and/or the extent of pain has been the subject of many investigations, various simple pain diagrams or drawings have been validated. Two of these are the Regional Pain Scale (RPS) (29) and the Self-assessment Pain Scale (SAPS) (30). The RPS is a valid means of measuring the extent of pain that can be used to identify patients with FM, including those with concomitant rheumatoid arthritis (RA) and osteoarthritis (OA); furthermore, as it is disease independent, it works just as well in identifying the patients with severe RA or OA alone who are likely to make the greatest use of the available resources (29). The SAPS considers 16 non-articular sites by asking patients to “indicate below the amount of pain and/or tenderness you have experienced in the last 7 days in each of the body areas”, and has a series of site descriptions followed by four boxes labelled 0 = none, 1 = mild, 2 = moderate, and 3 = severe; the possible scores therefore range from 0 to 48 but, in order to integrate them into one scale, they have been transformed into a 0-10 scale (30).

Patient self-report questionnaires
The development of a clinical science of pain assessment using patient self-report questionnaires has led to the creation of numerous instruments for evaluating various types and subtypes of chronic pain conditions and their impact on function. These often provide information about both the quality and quantity of pain, and many of them also provide information concerning a patient’s psychological and functional status. However, their length may limit patient acceptance, especially if administered during the painful experience.
The complete McGill Pain Questionnaire (MPQ) is one of the most widely tested instruments, and can provide detailed information on the characteristics of pain in FM (31-33). However, it is complex (it includes 78 pain adjectives divided into the four major categories – sensory, affective, evaluative, and miscellaneous sensory) and takes 15-20 minutes to complete. It also includes questions concerning changes in pain over time, and classifies pain intensity as “mild”, “discomforting”, “distressing”, “horrible” and “excruciating” (33-35). This makes it more difficult to administer in non-research clinical settings, and simpler measures – such as a VAS – have become more widely accepted. The Short-Form (SF)-MPQ is a 15-item self-report scale derived from the original MPQ (36) that contains three components. The Pain Rating Index (PRI) consists of 15 representative words on a 4-point Likert-type scale ranging from 0 (none) to 3 (severe). It includes 11 sensory (e.g. tender) and four affective (e.g. sickening) items, and there are two items measuring pain intensity. Overall pain is assessed by means of an NRS based on a 10 cm line that approximates ratings between 0 (no pain) and 10 (unbearable pain) (36). It includes the PRI of the standard MPQ and an NRS (32, 33, 35).
The Wisconsin Brief Pain Questionnaire (BPQ) is a self-administered instrument that assesses pain history, worst pain, usual pain and pain now (37) using a human figure that is shaded

to indicate pain, pain intensity, the relief obtained from medication, and ratings of pain interference (0 = not at all, 1 = a little bit, 2 = moderately, 3 = quite a bit, 4 = extremely).
The Brief Pain Inventory (BPI) was developed to provide information about pain intensity (the sensory dimension) and the extent to which pain interferes with function (the reactive dimension), and also asks questions about pain relief, pain quality, and the patient’s perception of the cause of pain. It uses a 0-10 NRS for item rating because of its simplicity, lack of ambiguity, and ease of use for cross-linguistic pain measurement. As pain can vary during the day, it asks patients to rate their pain at the time of responding to the questionnaire (pain now), and also the worst, least and average pain over the previous week (37); ratings can also be made for the previous 24 hours. Evidence for the validity of the BPI comes from a number of studies involving patients with FM and patients with other painful diseases (38, 39).
The Multidisciplinary Pain Inventory (MPI) is a 61-item questionnaire that provides a more generalised measure of chronic pain and its impact (40, 41). It is divided into three sections (“impact of pain on patient’s life”, “responses of others to patient’s communication of pain”, and “participation in common daily activities”) and 13 scales measuring pain severity, interference, life control, affective distress, support, punishing responses, solicitous responses, distracting responses, household chores, outdoor work, activities away from home, social activities, and general activities. The responses are given using a 7-point numerical scale. The MPI has been shown to be reliable and valid for both chronic pain and FM (42).

Assessment of pain hypersensitivity - tender point (TP) assessment
Another critical pain parameter in FM is hyperalgesic responses to external stimulation. Tender point (TP) assessment is a demonstrably useful part of the official ACR criteria for a diagnosis of FM (28). The guidelines proposed by the ACR indicate that the examination should be carried out by applying, bilaterally, the same manual finger pressure with a force of 4 kg (until blanching of the fingernail bed) at nine anatomical sites: occiput, low cervical, trapezius, supraspinatus, second rib, lateral epicondyle, gluteal, greater trochanter and knee. A TP is considered “positive” when the patient reports pain during the examination (43), and the score is the total number of TPs. In addition to the tender point count, other assessments of intensity have been developed but it does not seem that their use has increased accuracy (43).
Another method of measuring hyperalgesia is to use myalgic scores based on dolorimetry. These are pain thresholds based on the amount of force required to elicit pain at each of the 18 FM TPs. Digital and dolorimeter assessments are methodologically different (44) as the former requires palpation at a constant force, whereas the latter is based on the amount of force required to induce pain. Their scores are affected by the different tactile sensations and surface areas involved, and the two methods may actually assess different aspects of hyperalgesia.

Assessing fatigue
Many of the validated instruments for measuring fatigue have been used in FM patients, but there is still no consensus as to which should be preferentially used (45).
The multidimensional nature of fatigue underlines the challenge of its assessment in a research setting. Although it can be assessed monodimensionally (e.g. by an intensity measurement alone), or as a dichotomous variable (the presence or absence of a defined criterion), or by means of four- or five-point VDS or NRS (Table I), the simplicity of these approaches must be balanced against the missed opportunity to capture information concerning other dimensions, including qualitative differences that may distinguish clinically meaningful subtypes (46); however, these simple scales presumably provide a global measure of fatigue severity.
Another instrument that has been validated in a number of rheumatic conditions is the vitality scale (VT) of the Medical Outcome Study (MOS)-SF36 (47, 48), which explores fatigue and the related concept of energy level. Item responses are rated on a 6-point Likert scale from “all the time” to “none of the time”, and the score can vary from 0 (the worst score) to 100 (the best) (47).

Table I. Monodimensional fatigue measurements.

Type                           Score
4-point verbal rating scale    None, mild, moderate, severe
5-point verbal rating scale    None, mild, moderate, severe, very severe
11-point NRS                   How severe has fatigue been, on average, during the past week on a “0 (no fatigue) – 10 (worst fatigue imaginable)” scale
4-point numerical scale        0 = none
                               1 = increased fatigue over baseline, but not altering normal activities
                               2 = moderate fatigue or fatigue causing difficulty in performing some activities
                               3 = severe fatigue or an inability to perform some activities
                               4 = bed-ridden
VAS                            0 (no fatigue) – 10 (worst possible fatigue)

NRS: numerical rating scale; VAS: visual analogue scale.

Multidimensional fatigue assessment
Multidimensional fatigue assessment captures more information about the characteristics or impact of fatigue, such as the global quality of life and symptom distress. Efforts to measure multiple dimensions began thirty years ago in non-medically ill populations but, since then, many instruments of this type have been validated in populations with chronic diseases (49) (Table I), some of which complement the measurement of fatigue severity by providing information concerning other characteristics, while others measure

the impact of fatigue on different types of functioning (50-56).
Multidimensional fatigue questionnaires have advantages and disadvantages. One important advantage is that they make it possible to analyse and clarify the nature of a fatigue syndrome or evaluate its response to treatment; furthermore, the broader range of captured experiences can add to its validity or improve its sensitivity to clinical changes. However, the disadvantages must also be considered.
A variety of measures have proved to be useful in measuring fatigue in FM and other rheumatic diseases, including the Fibromyalgia and Chronic Fatigue Syndrome Rating Scale (the FibroFatigue scale) (57), the Multidimensional Assessment of Fatigue (MAF) (58), and the Multidimensional Fatigue Inventory (MFI) (59), which measures various types of fatigue including physical and emotional fatigue. Another measure that has been validated in a number of diseases is the Functional Assessment of Chronic Illness Therapy (FACIT-Fatigue) system (60), which can be customised to certain indications. Finally, the Fatigue Severity Scale (FSS) (50), which was originally developed to assess fatigue in multiple sclerosis and lupus, can also be used in FM (Table II).

Table II. Characteristics of the self-administered fatigue instruments.

Instrument                No. of items   Response format                             Score range   Measures
FibroFatigue scale (57)   12             –                                           –             Impact of fatigue on specific types of functioning
MAF (58)                  16             10-point RS (14 items) or multiple-choice   1-50          Degree, severity, distress, impact on activities of daily living
                                         (4 choices) responses (2 items)
MFI (59)                  20             5-point RS                                  20-100        General fatigue, physical fatigue, reduced activity, reduced motivation, mental fatigue
FACIT-F (60)              13             5-point RS                                  0-52          Severity, role and social impact
FSS (50)                  9              7-point RS                                  1-7           Severity, physical, mental and social impact

MAF: Multidimensional Assessment of Fatigue; MFI: Multidimensional Fatigue Inventory; FACIT-F: Functional Assessment of Chronic Illness Therapy-Fatigue scale; FSS: Fatigue Severity Scale; RS: rating scale.

Fibromyalgia and Chronic Fatigue Syndrome Rating Scale (FibroFatigue Scale)
The FibroFatigue scale (57) is an observer’s rating scale whose 12 items measure pain, muscular tension, fatigue, concentration difficulties, failing memory, irritability, sadness, sleep disturbances, autonomic disturbances, irritable bowel, headache, and the subjective experience of infection. Its inter-rater reliability is excellent, and it has been shown to be reliable, valid, capable of monitoring symptom severity and changes during treatment in patients with chronic fatigue syndrome and FM, and effective in detecting and measuring functional disability and symptom severity in FM patients (61, 62).

Multidimensional Assessment of Fatigue (MAF)
The Multidimensional Assessment of Fatigue (MAF) scale (58) is a good means of measuring fatigue in chronic illness as it is easy to administer and score, relatively short, and assesses the subjective aspects of fatigue by means of 16 items that cover the four dimensions of fatigue severity, distress, degree of interference in activities of daily living, and timing. Fourteen items are rated using a 10-point numerical scale, and two by means of multiple-choice responses with four choices. A global fatigue index ranging from 1 (no fatigue) to 50 (severe fatigue) can be computed using 15 of the 16 items (58).

Multidimensional Fatigue Inventory (MFI)
The MFI is organised in five dimensions (general fatigue, physical fatigue, reduced activity, reduced motivation, mental fatigue), each based on four statements (59) with five possible responses to each statement ranging from “yes, that is true” to “no, that is not true”. A global fatigue score combining the five dimensions ranges from 20 to 100, with higher scores indicating greater fatigue. The psychometric properties of the MFI have been well documented, and it has been frequently used in rheumatic disorders, including FM (63).

Functional Assessment of Chronic Illness Therapy-Fatigue scale (FACIT-Fatigue)
This has 13 items and a five-point Likert-type rating scale (0 = “not at all”; 4 = “very much”), and explores the severity of fatigue on a monodimensional basis (60). The total score is the sum of the individual items, and ranges from 0 (maximum fatigue) to 52 (no fatigue). It is widely used to measure cancer-related fatigue, and has also been used in primary Sjögren’s syndrome (64) and RA (65).

Fatigue Severity Scale (FSS)
The Fatigue Severity Scale (FSS) (50) consists of nine items and has a 7-point response format. Sample questions include “I am easily fatigued” and “exercise brings on my fatigue.” The initial validation study found that its internal consistency was high for specific disease groups and healthy controls: it clearly distinguished patients from controls and moderately correlated with a single-item visual analogue scale of fatigue intensity. In all of the patients, a clinical improvement in fatigue was associated with reductions in FSS scores. The scale is also practical as it is brief and easy to administer and score.

Assessing sleep
FM patients frequently report disturbed sleep (1-3): estimates of the percentage experiencing some sleep problem range from 70-80% in the population used to establish the ACR criteria (1-3, 66-76) to as high as 95% and 99% in two recent studies (77, 78). It has also been shown that the symptoms of disturbed sleep in FM predict increased pain levels and decreased physical functioning (77, 79-81), and so accurately assessing the changes in sleep associated with FM treatments is critically important. Various dimensions of sleep have been assessed in FM trials, including quantity, quality, the ease of falling asleep, the frequency of waking, and feeling refreshed upon awakening. The quality of sleep can be assessed using a single-item measure (the Sleep Quality NRS) (81), which instructs patients to “select the number that best describes the quality of your sleep during the past 24 hours” (0 = “best possible sleep” and 10 = “worst possible sleep”) (Fig. 1) or multidimensional instruments (82, 83). A number of multidimensional measures have proved to be useful in measuring disturbed sleep in rheumatic diseases, including the Medical Outcome Study Sleep Scale (MOS-SS) (84, 85), the Pittsburgh Sleep Quality Index (PSQI) (86), the Pittsburgh Sleep Diary (PSD) (87), and the Insomnia Severity Index (ISI) (88), of which the MOS-SS may represent the best choice.

Fig. 1. Sleep Quality Numerical Rating Scale.

Medical Outcome Study Sleep Scale (MOS-SS)
The MOS-SS is a 12-item questionnaire designed to evaluate key constructs of sleep, with derived subscales for the domains of sleep disturbance (4 items), quantity of sleep (1 item), snoring (1 item), awakening short of breath or with headache (1 item), sleep adequacy (2 items), and somnolence (3 items) (84, 85). It is also possible to generate a 9-item Sleep Problems Index that assesses overall sleep problems and includes the four sleep disturbance and two sleep adequacy items, two of the somnolence items, and awakening short of breath/with headache; higher scores indicate greater sleep impairment, and this index is often used in clinical trials as an indication of sleep quality. The MOS-SS has been found to have positive psychometric properties in a broad range of patient populations, including patients with chronic pain conditions similar to FM (89, 90).

Pittsburgh Sleep Quality Index (PSQI)
The Pittsburgh Sleep Quality Index (PSQI) retrospectively measures sleep quality and disturbances (86). It discriminates good and poor sleepers, and provides a brief and clinically useful assessment of multiple sleep disturbances. Its 19 items generate seven component scores, the sum of which (range 0-21) yields a global measure of sleep quality, with higher scores indicating poorer sleep (>5 indicates sleep disturbance). The components assess a broad range of domains associated with sleep quality, including the duration of sleep, sleep latency, the frequency and severity of specific sleep-related problems, and the perceived impact of poor sleep on daytime functioning. The questionnaire is perhaps the most widely used general measure of sleep, and its strengths lie in its coverage of multiple dimensions of sleep quality, its flexibility as a brief clinical tool, and its demonstrated validity and usefulness in chronic pain research and in patients with FM.

Pittsburgh Sleep Diary (PSD)
The Pittsburgh Sleep Diary (PSD) is used to quantify subjectively reported sleep and wake behaviours (87), and is divided into two daily questionnaires completed at “bedtime” and “wake time”, with the timing and duration of various daytime and sleep-wake parameters and activities being completed by the participant. The bedtime component consists of six general items: the timing of meals; the consumption of caffeine, alcohol and tobacco products; the use of medications; and the timing and duration of exercise and nap periods. The daytime component gathers data on bedtime, “lights out” time, sleep latency, final wake time, method of final awakening, the frequency of nightly awakenings, wake after sleep onset time, the reasons for nightly awakenings, sleep quality, mood on final wakening, and alertness on final wakening. In addition to the categorical and frequency data generated by the bedtime questionnaire, the daytime questionnaire makes it possible to calculate standard continuity parameters.

Insomnia Severity Index (ISI)
The Insomnia Severity Index (ISI) (88) is a self-report instrument that measures an individual’s perception of insomnia. It has seven items and a total score that ranges from 0 to 28: according to the recommended score interpretation guidelines, 0–7 indicates “no clinically significant insomnia”, 8–14 “sub-threshold insomnia”, 15–21 “clinical insomnia (moderate severity)”, and 22–28 “clinical insomnia (severe)”. The cut-off level of 14 has optimal sensitivity (94%) and specificity (94%) in distinguishing a group of adults diagnosed with primary insomnia from those without.

Psychological and behavioural assessment
Psychological and behavioural evaluations of FM patients can provide useful information concerning factors that may affect their pain and dysfunction, and give an idea of the impact of pain, fatigue and other symptoms on their psychological health (1-3, 91, 92). Anxiety and depression are major factors affecting a patient’s quality of life, and the associated symptoms (inability to concentrate, loss of motivation,

disturbed sleep, fatigue, pessimistic mood) may affect their response to treatment (14) and rehabilitation programmes (93).
Psychological assessment instruments come in varying lengths and formats (94), and one important factor is their length, defined as the number of questions or items they contain. The term “screening instrument” usually refers to a particularly short test whereas, although longer tests are more expensive to administer, they are sometimes needed to reach acceptable levels of reliability and validity. Table III shows the definitions and characteristics of screening instruments by length, as well as their advantages and disadvantages.

Table III. Definitions and characteristics of screening instruments.

Screening      Items    Time             Advantages                                                 Disadvantages
instruments             required (min)

Ultra-short    1–4      <2               • Very likely to be used in busy clinics                   • Can only assess one domain
                                         • Sensitivity can be high                                  • Unsuitable for research
                                         • Low-to-moderate specificity
                                         • Inexpensive

Short          5–20     2-10             • Moderately likely to be used in busy clinics             • Some cost in scoring
                                         • Probably highly sensitive, moderate-to-high specificity
                                         • Can assess multiple domains
                                         • May be suitable for research, needs to be tested

Long           21–50    >10              • Specificity and sensitivity can be high                  • Routine use unlikely unless automated
                                         • Can assess multiple domains                              • Potentially costly scoring (can be minimised by automation)
                                         • Excellent for research

Ultra-short forms are typically limited to one psychological domain, such as depression or anxiety, and are the easiest to use in routine care settings. They usually consist of only one question, take only 1-2 minutes to complete, and require no scoring. Table IV shows the most frequently used questions for depression. A combination of one depression question, a one-question interview, a Distress Thermometer (DT) and an 11-point NRS creates a further ultra-short questionnaire that can be used in everyday practice (Table V). Ultra-short screening instruments have a potential economic advantage because of their brevity and the need for fewer staff resources to administer them. However, although they may be successfully used in busy daily practice, a recent meta-analysis (95) has shown that they are not very accurate in detecting depression in primary care and should only be used to rule out a diagnosis.

Table IV. Simple verbal questions for depression used as an ultra-short measure.

• ‘Are you depressed?’
• ‘Are you depressed?’ OR ‘Have you lost interest?’
• ‘Are you depressed?’ OR ‘Have you experienced a loss of interest in things or activities that you would normally enjoy?’
• ‘Over the past couple of weeks, have you been feeling unhappy or depressed?’

Among the short instruments (i.e. those with 5–20 items), the Zung Self-rating Depression Scale (ZSDS) (96), the Center for Epidemiologic Studies – Depression Scale (97), the Hospital Anxiety and Depression Scale (98), and the Hamilton Rating Scale for Depression (HRS-D) (99-101) all have adequate psychometric properties. The Somatic Symptoms Checklist (SSC) (102) and the Illness Attitudes Scale (IAS) (103) are less frequently used for FM patients (Table V). The long instruments (i.e. those with 21–50 items) include the Beck Depression Inventory (104), the Four-Dimensional Symptom Questionnaire (4DSQ) (106), the Symptom Checklist (SCL-90) (107), and the Rotterdam Symptom Checklist (108) (Table V).

Zung Self-rating Depression Scale (ZSDS)
The Zung Self-rating Depression Scale (ZSDS) consists of 10 positively worded items and 10 negatively worded items asking about symptoms of depression (96), and has been found to be a reliable and valid means of measuring depressive symptoms in a number of studies (109-112). ZSDS scores are used to define four categories of severity: within the normal range or no significant psychopathology (<40); the presence of minimal to mild depression (40-47); moderate to marked depression (48-55); and the presence of severe to extreme depression (≥56).

Center for Epidemiologic Studies Depression Scale (CES-D)
The Center for Epidemiologic Studies Depression Scale (CES-D) has 20 items and has been validated in mixed samples of cancer patients and reference groups of healthy control subjects (97). Each item is assessed on a 4-point scale that addresses the frequency of the occurrence of each symptom (0 = none of the time, 3 = all of the time). A cut-off score of 19 is commonly used to indicate a need for a further assessment of depression in patients experiencing pain. Various studies of the scale’s sensitivity and specificity have shown that it has very good psychometric properties (113, 114).

Hospital Anxiety and Depression Scale (HADS)
The Hospital Anxiety and Depression Scale (HADS) (98) examines the levels

REVIEW FM assessment instruments / F. Salaffi et al.

Table V. Screening instruments for psychological and behavioural assessments.

Screening instruments                                             No. of items  Validity  Reliability  Generalisable

Ultra-short (1-4 items)

Depression question                                               1             Moderate  –            No
Anxiety question                                                  1             Moderate  –            No
One-question interview                                            1             Moderate  –            Yes
Combination of one depression question                            2             Moderate  Moderate     No
Distress Thermometer (DT)                                         1             Moderate  Moderate     Yes
11-point numerical rating scale                                   1             Moderate  –            No

Short (5-20 items)

Zung Self-rating Depression Scale (ZSDS) [96]                     20            High      High         Yes
Center for Epidemiologic Studies – Depression Scale (CES-D) [97]  20            High      High         Yes
Hospital Anxiety and Depression Scale [98]                        14            Moderate  High         Yes
Hamilton Rating Scale for Depression (HAM-D) [100]                17            Moderate  Moderate     Yes
Somatic Symptoms Checklist (SSC) [102]                            7             Moderate  Moderate     Yes
Illness Attitudes Scales (IAS) [103]                              17            Moderate  Moderate     Yes

Long (21-50 items)

Beck Depression Inventory [104]                                   21            High      High         Yes
Four-Dimensional Symptom Questionnaire (4DSQ) [106]               50            Moderate  High         Yes
Symptom Checklist (SCL-90) [107]                                  90            Moderate  Moderate     Yes
Rotterdam Symptom Checklist [108]                                 30            Moderate  Moderate     Yes
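
The CES-D scoring rule described in the text (20 items, each rated from 0 to 3, with a cut-off of 19) can be illustrated with a minimal sketch. The function names are hypothetical, the cut-off is assumed to mean "19 or more", and any reverse-keyed items are assumed to have been recoded beforehand, as the text does not describe that step.

```python
from typing import Sequence

CESD_ITEMS = 20    # number of items stated in the text
CESD_CUTOFF = 19   # cut-off quoted in the text; treating it as ">=" is an assumption


def cesd_total(responses: Sequence[int]) -> int:
    """Sum 20 responses, each rated 0 (none of the time) to 3 (all of the time)."""
    if len(responses) != CESD_ITEMS:
        raise ValueError(f"expected {CESD_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be an integer from 0 to 3")
    return sum(responses)


def needs_further_assessment(responses: Sequence[int]) -> bool:
    """Flag a total at or above the cut-off (assumed inclusive)."""
    return cesd_total(responses) >= CESD_CUTOFF
```

With these assumptions the total ranges from 0 to 60, and a respondent answering 1 to every item would score 20 and be flagged for further assessment.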

of anxiety and depression in the previous week. It consists of seven items for anxiety (HADS-A) and seven for depression (HADS-D) that are each self-rated on a four-point scale scored 0–3; higher scores are associated with a greater probability of a depressive or anxiety disorder. The depression scale (7 items, score range 0–21) mainly measures anhedonia, which is considered to be the central characteristic of major depressive disorder; the anxiety scale (7 items, score range 0–21) mainly measures symptoms of generalised anxiety disorder. The scale as a whole and each subscale have adequate internal consistency and are sensitive to change (115, 116).

Hamilton Rating Scale for Depression (HAM-D)
The Hamilton Rating Scale for Depression (HAM-D) (100, 101) is probably the most widely used observer-rated scale for depressive symptoms. The original scale had 21 items, but Hamilton suggested scoring only the initial 17 because the last four either occurred infrequently or described only aspects of the illness. The items are ranked 0–4 (when severity is quantifiable) or 0–2 (when they measure symptoms that are more difficult to assess reliably), with the highest scores indicating the greatest severity (100). The range for the 17-item scale is 0–50.

Somatic Symptoms Checklist (SSC)
The Somatic Symptoms Checklist (SSC) (102) was originally designed and validated as a screening test for somatisation disorder. It contains six items (and an additional item for females regarding menstrual cramps) in the form of questions (e.g. ‘have you ever had trouble breathing?’) requiring a yes/no answer, and the scores are summed to provide the total number of reported somatic symptoms.

Illness Attitudes Scales (IAS)
The Illness Attitudes Scales (IAS) (103) consist of two subscales: health anxiety and illness behaviour. The first contains 11 items (e.g. ‘are you worried that you may get a serious illness in the future?’) scored on a five-point scale (0–4) with total scores ranging from 0 to 44; the second contains six items (e.g. ‘how often do you see a doctor?’) also scored on a five-point scale from 0 (‘no’) to 4 (‘most of the time’), with total scores ranging from 0 to 24.

Beck Depression Inventory (BDI)
The Beck Depression Inventory (BDI) (104) is a 21-question multiple-choice self-report questionnaire that is one of the most widely used by healthcare professionals and researchers for measuring the severity of depression in a variety of settings. It was designed for adults and is composed of items relating to symptoms of depression such as hopelessness and irritability; cognition such as guilt and feelings of being punished; and physical symptoms such as fatigue, weight loss and lack of interest in sex. A cut-off score of >9 is used to indicate at least minimal symptoms of depression. The 13-item BDI–Short Form is also widely used, although it has a low level of inter-rater reliability and is only moderately specific (117).

Four-Dimensional Symptom Questionnaire (4DSQ)
The Four-Dimensional Symptom Questionnaire (4DSQ) is a 50-item self-rating questionnaire that measures “distress”, “depression”, “anxiety” and “somatisation” (106) by assessing the psychological and psychosomatic symptoms experienced during the previous seven days. The distress scale (16 items, score range 0–32) measures the symptoms of general psychological distress, which is conceptualised as the most general and most basic expression of human psychological suffering; the depression scale (6 items, score range 0–12) measures severe anhedonia and depressive cognitions (including suicidal ideation) as symptoms that are considered to be characteristic of depressive disorder; the anxiety scale (12 items, score range 0–24) measures irrational fears, panic and avoidance, which are characteristic features of most anxiety disorders; and


the somatisation scale (16 items, score range 0–32) measures a range of “psychosomatic” symptoms characteristic of bodily distress and somatoform disorders. Higher scores on all four scales indicate the presence of more symptoms. Two cut-off points are recommended to divide low, moderate and high scores.

Symptom Checklist (SCL-90)
The Symptom Checklist (SCL-90) is used to assess psychological distress and consists of eight dimensions (anxiety, agoraphobia, depression, somatic symptoms, distrust and interpersonal sensitivity, anger, hostility and sleeping disorders) designed to provide an overview of a patient’s symptoms and their intensity at a specific time (107). The total SCL-90 score reflects general psychoneuroticism or psychological distress, and the Global Severity Index can be used as a summary test. The SCL-90 has 90 items and can be completed in just 12–15 minutes.

Rotterdam Symptom Checklist (RSCL)
The Rotterdam Symptom Checklist (RSCL) is a 30-item questionnaire that has been extensively used in clinical trials (108). Although some studies have found that it has a four- or five-factor structure, it has also been suggested that it has a two-factor psychological and composite somatic structure (118). The psychological subscale has proved to be stable across sub-samples and to have a high degree of internal consistency (119, 120).

Assessing health-related quality of life (HRQL) and function
Assessing chronic pain and its impact on physical, emotional and social functions requires multidimensional qualitative and HRQL instruments (121, 122) as it has been shown that measuring HRQL is a key aspect of screening for disability and improving patient/clinician communications. A distinction is drawn between generic and specific measures of physical function and health status (123-130): the former provide a broad picture of health status across a range of conditions, whereas the latter are more sensitive to the disorder under consideration and therefore more likely to reflect clinically important changes.

Generic measures
Generic measures, which are commonly developed for descriptive epidemiological or social science research applications, provide a profile of scores for the different components of health status and HRQL, or operational definitions of various constructs summarised by a single index value (130-134). The most widely used are the Medical Outcomes Study (MOS) 36-Item Short-Form Health Survey (SF-36) (47, 48), the Sickness Impact Profile (SIP) (135, 136), and the Nottingham Health Profile (NHP) (137-139) (Table VI).

Table VI. Characteristics of selected generic instruments.

Instrument*  No. of items  No. of levels  Administrationº  Scoring options#  Time required (minutes)

SF-36        36            3–6            S, I, P          Pr, SS            10–15
SIP          136           2              S, I, P          Pr, SS, SI        20–30
NHP          38            2              S, I             Pr                10–15
EuroQol      6             3              S, I             SI                7–10

*SF-36: Medical Outcomes Study 36-Item Short-Form Health Survey; SIP: Sickness Impact Profile; NHP: Nottingham Health Profile; EuroQol: European Quality of Life Questionnaire. ºS: self-administered; I: interviewer; P: proxy. #Pr: profile; SS: summary scores; SI: single index.
From: Franchignoni F. & Salaffi F. (130).

Medical Outcomes Study (MOS) 36-Item Short-Form Health Survey (SF-36)
The SF-36 is a generic health questionnaire divided into eight scales that measure different functional domains and aspects of well-being (47, 48):
1) Physical functioning (10 items), or the extent to which health limits activities such as self-care, walking, climbing stairs, bending, lifting, and other moderate and vigorous activities;
2) Social functioning (2 items), or the extent to which physical health or emotional problems interfere with normal social activities;
3) Physical role functioning (4 items), or the extent to which physical health interferes with work or other daily activities;
4) Emotional role functioning (3 items), or the extent to which emotional problems interfere with work or other daily activities;
5) Mental well-being (5 items), or general mental health, including depression, anxiety, behavioural-emotional control, and general positive affect;
6) Vitality (4 items), whether one feels energetic and full of pep or tired and worn out;
7) Bodily pain (2 items), which includes the intensity of pain and its effect on normal work inside and outside the house; and
8) General health perceptions (5 items), a personal evaluation of health that includes current health, health outlook, and resistance to illness.
The SF-36 also includes a single-item measure of health transition that is not used to score any multi-item scales. The eight scales, which are weighted on the basis of a normative algorithm, are scored from 0 to 100, with higher scores reflecting a better quality of life (48).
Subsequent algorithms have also been developed to calculate two psychometrically based summary measures, the Physical Component Summary Scale Score (PCS) and the Mental Component Summary Scale Score (MCS), which provide greater precision, reduce the number of statistical comparisons needed, and eliminate the floor and ceiling effects noted in several of the sub-scales (48).
It has been reported that, in comparison with healthy populations, FM patients are significantly impaired in all eight domains (125, 140). The SF-36 questionnaire takes about 15 minutes to complete, although most elderly patients prefer a standard interview to the self-administered approach.
The SF-36 was later used to develop the SF-12 (141), which measures the same health status concepts but provides only one score for the PCS and MCS summary measurements (140, 141), although their description is the same as that of the SF-36 PCS and MCS scores.
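
As an illustration of the SF-36 structure described above, the following minimal sketch records the number of items per scale and rescales a raw scale score to 0–100. It assumes the standard SF-36 allocation, in which Bodily pain comprises two items so that the eight scales plus the health transition item total 36. The normative weighting algorithm mentioned in the text is not reproduced: a plain linear rescaling is substituted, and all names and the example raw range are hypothetical.

```python
# Items per scale (assumed standard SF-36 allocation; names follow the text).
SF36_SCALES = {
    "Physical functioning": 10,
    "Social functioning": 2,
    "Physical role functioning": 4,
    "Emotional role functioning": 3,
    "Mental well-being": 5,
    "Vitality": 4,
    "Bodily pain": 2,
    "General health perceptions": 5,
}
HEALTH_TRANSITION_ITEMS = 1  # single item, not used in any multi-item scale


def total_items() -> int:
    """Items in the eight scales plus the health transition item."""
    return sum(SF36_SCALES.values()) + HEALTH_TRANSITION_ITEMS


def rescale_0_100(raw: float, lowest: float, highest: float) -> float:
    """Linearly map a raw scale score onto 0-100 (higher = better health).

    This is a simplification standing in for the normative algorithm.
    """
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return (raw - lowest) / (highest - lowest) * 100.0
```

For example, a scale whose raw score can run from 10 to 30 would map a raw score of 20 to the midpoint of the 0–100 range.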


Sickness Impact Profile (SIP)
The Sickness Impact Profile (SIP) contains 136 items grouped into 12 dimensions of daily activity (ambulation, body care and movement, mobility, social interaction, emotional behaviour, alertness, communication, home management, recreation and pastimes, sleep and rest, eating, and work) (135, 136), and asks respondents to check those that apply to them at the time of the interview. Each item is weighted on the basis of the relative severity of dysfunction implied by each statement. The scores for each dimension are summed and expressed as a percentage of the maximum possible score. Three summary scores are also calculated: the total score (includes all domains), a physical score (ambulation, body care and movement, and mobility), and a psychosocial score (social interaction, emotional behaviour, alertness, and communication) (135, 136). Higher scores reflect greater dysfunction. The SIP can be administered by an interviewer or self-administered but, although it is easy to administer and score, it is relatively time-consuming as it takes approximately 30 minutes to complete (135).

Nottingham Health Profile (NHP)
The Nottingham Health Profile (NHP) is a primary healthcare instrument that is intended to provide a brief indication of a patient’s perceived emotional, social and physical health problems (137, 138). It originally consisted of two parts, but only part I is now used: it contains 38 items that can be grouped into six domains (physical mobility, pain, sleep, social isolation, emotional reactions, and energy level), with each question being weighted on the basis of severity. The questions were selected from statements generated in large surveys of people randomly selected from the general population, and respondents are required to answer “yes” or “no” to each. Scores range from 0 (no problems or limitations) to 100 (all problems are present). There is no summary score. The sum of all of the weighted values in a given domain represents a continuum between 0 (best health) and 100 (worst health) (137-139).

None of the above generic measures captures the individual value that a given respondent may assign to a particular health state, and two individuals may rate the same state differently depending on the value they assign to a symptom or impairment, and their willingness to accept trade-offs between benefits and risks.
In the context of HRQL evaluations, preference-based (or utility) measures are specifically designed to assess the value or desirability of a particular health status/outcome (142, 143). They provide a final score on a 0–1 scale where 0 is the worst possible imaginable state (or death) and 1 is perfect health. As the ratings can be elicited from different groups of individuals, such as patients, health professionals or the general public, they can be used as quality of life adjustment weights to calculate, for example, quality-adjusted life years and similar measures that can then be used in economic evaluations (142-144).
There are two main approaches to measuring HRQL. The first is to classify patients into categories on the basis of their responses to questions about their functional status (preference classification systems), and combining these categories or dimensions leads to descriptions of their overall health. One such instrument is the European Quality of Life Measure (EuroQol) (144, 145), a self-administered questionnaire used to measure health outcomes (145) that provides a simple descriptive profile and a single index value for health status that can be used for clinical and economic evaluations of health care, as well as in population health surveys. It covers five dimensions of health (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), each of which is divided into three levels (no problems, some or moderate problems, extreme problems), thus generating a total of 243 theoretically possible health states. The EuroQol is self-completed by respondents and ideally suited for use in postal surveys, clinics and face-to-face interviews. It is cognitively simple, and takes only a few minutes to complete (145).
The second approach to utility measurement is to ask patients to assign a value to their overall health directly. The most widely used techniques are rating scales (RS), time trade-offs (TTO) and the standard gamble (SG) technique (130, 142).

Disease-specific measures
Disease-specific measures are designed to assess specific diagnostic groups or patient populations, often with the goal of measuring responsiveness to treatment or “clinically important” changes. One obvious disadvantage of some of them is that they do not allow comparative judgements of the outcomes of different treatments in patients with different health problems, for example for resource allocation studies (130, 131, 133, 134), although this can be overcome by combining the use of disease-specific and generic measures. There are a number of broad disease-specific measures, such as the Fibromyalgia Impact Questionnaire (FIQ) (146, 147) or the Revised Fibromyalgia Impact Questionnaire (FIQR) (148), the Arthritis Impact Measurement Scales 2 (AIMS2) (149), and the Health Assessment Questionnaire (HAQ) (150), which cover general aspects of functional status together with specific references to states or changes of particular concern to the target population.

Fibromyalgia Impact Questionnaire (FIQ)
The Fibromyalgia Impact Questionnaire (FIQ) (146, 147) is a 10-item, self-administered, disease-specific assessment and outcome instrument developed to measure the components of health status that are believed to be most affected by FM. The first item contains 11 questions related to physical functioning, each of which is rated using a 4-point Likert-type scale; items 2 and 3 ask the patient to mark the number of days they felt well and the number of days they were unable to work (including housework) because of FM symptoms; and items 4–10 are horizontal linear scales marked in 10 increments for the rating of working difficulties, pain, fatigue, morning tiredness, stiffness, anxiety and depression. Each of the 10 items has a maximum score of 10, and so the maximum possible total score is


100. The scoring is complicated by the need to reverse scores in one question and use constants to convert the first 13 questions to a standardised 0–10 scale. The average FM patient scores about 50, and severely affected patients usually 70+. The FIQ takes approximately five minutes to complete, and has been extensively used as an outcome measure in FM-related studies (151). It appears to be a sensitive measure of changes related to symptoms and disability, and makes it possible to distinguish FM from some other health problems involving chronic pain (30).

Revised Fibromyalgia Impact Questionnaire (FIQR)
The Revised Fibromyalgia Impact Questionnaire (FIQR) attempts to address the limitations of the FIQ while retaining the essential properties of the original (148). It has 21 individual questions framed in the context of the previous seven days, all of which are based on an 11-point NRS, with 10 being “worst”. It is divided into three linked sets of domains: a) “function” contains nine questions; b) “overall impact” has two, but they now relate to the overall impact of FM on functioning and overall symptom severity; and c) “symptoms” contains 10 questions, four of which are new and relate to memory, tenderness, balance and environmental sensitivity (loud noises, bright lights, odours, and cold temperatures). The scoring is much simpler than that of the FIQ: the function score (range 0–90) is divided by three, the overall impact score (range 0–20) is unchanged, the symptoms score (range 0–100) is divided by two, and the total score is the sum of the three modified domains. The weighting is different insofar as 30% of the total score is ascribed to “function” (as opposed to 10% in the FIQ), 50% to “symptoms” (as opposed to 70% in the FIQ), while “overall impact” remains the same at 20%, as does the maximum total score of 100. The FIQR takes approximately half as long to complete as the FIQ (148).

Arthritis Impact Measurement Scale 2 (AIMS2)
The Arthritis Impact Measurement Scale 2 (AIMS2), a widely used disease-specific measure with a broad scope that is used to assess functional limitations and disability, has two versions, AIMS2 (78 items) and AIMS2 SF (26 items) (149), both of which are designed to assess the severity of arthritic pain and the extent to which it affects health (152, 153). The respondents are asked to consider the areas of mobility, walking and bending, hand and finger function, arm function, self-care, household tasks, social activity, family support, arthritic pain, work, level of tension, and mood over the previous month and, for each area, rate their degree of satisfaction, the impact of the disease, and where they would like to see improvements. Finally, they are asked to summarise their current, future and overall perceptions of health, and to describe any existing medical problems that affect it.

Health Assessment Questionnaire (HAQ)
The most widely used form of the Stanford Health Assessment Questionnaire (HAQ) is a 20-item, self-administered questionnaire that examines difficulties in performing eight daily living activities (dressing and grooming, rising, eating, walking, hygiene, reach, grip, and outside activities) (150). For each item, the patients are asked to rate the level of difficulty over the previous week on a 4-point scale ranging from 0 (no difficulty) to 3 (unable to perform). The final HAQ score is the average of the eight category scores; it ranges from 0 to 3, with the highest score representing the greatest disability.
Various modifications have been made to the HAQ for RA: the Multidimensional HAQ (MDHAQ) keeps one question from each of the eight categories, thus reducing the number of items to eight, and its score is calculated as the mean of the scores for each activity. The MDHAQ includes 10 activities of daily living (ADLs), eight derived from the HAQ and two additional complex ADLs: walking two miles and participating in sports and games (154). The MDHAQ also includes VASs to assess pain, fatigue and global status, and a listing of 57 symptoms. To analyse the quantitative scores for pain, fatigue, functional disability, and the number of symptoms on a review of systems (including the ratios of scores for pain to physical function and fatigue to physical function), and to study further how these scores can help to identify patients with FM, DeWalt et al. analysed 78 consecutive patients with FM over a two-year period, using 149 patients with RA as a “control” group. The results demonstrated that the FM patients had significantly higher pain:physical function and fatigue:physical function ratios, and reported a significantly larger number of symptoms (155).

Measures of overall health status
The number of TPs (a surrogate for diffuse pain) does not fully capture the essence of FM syndrome, in which accompanying fatigue is often severe and nearly always present, but the Symptom Intensity Scale (SIS) (156) and Fibromyalgia Assessment Status (FAS) (30) are accurate surrogate composite measures. Unlike instruments intended for a particular disease such as the Disease Activity Score (DAS) (157, 158), which measures disease activity only in RA, SIS (156) or FAS scores (30) can be used as a measure of global health status (or disease severity).
The SIS questionnaire consists of two parts: a list of 19 anatomical areas concerning which patients are asked whether they feel pain (the total number of “yes” answers being the RPS score), and a VAS for fatigue (156). The SIS score can be used to identify and quantify FM simply from the information supplied. As the continuous SIS score closely correlates with the patient’s perceived pain and general health, it is ideal for outpatient evaluations and complements a complete patient history and physical examination by measuring biopsychosocial factors.
The FAS index is a short and easy to complete self-administered instrument that combines a set of questions relating to non-articular pain (SAPS range 0–10), fatigue (range 0–10) and the quality of sleep (range 0–10), thus providing a single composite measure of disease severity ranging from 0–10 (30). The final score is calculated by adding the three sub-scores and dividing the


