Treanor 2014

Qual Life Res (2015) 24:339–362
DOI 10.1007/s11136-014-0785-6
REVIEW
A methodological review of the Short Form Health Survey 36 (SF-36) and its derivatives among breast cancer survivors
Charlene Treanor • Michael Donnelly
Accepted: 11 August 2014 / Published online: 20 August 2014

© Springer International Publishing Switzerland 2014
Abstract

Purpose A systematic review of the validity, reliability and sensitivity of the Short Form (SF) health survey measures among breast cancer survivors.

Methods We searched a number of databases for peer-reviewed papers. The methodological quality of the papers was assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN).

Results The review identified seven papers that assessed the psychometric properties of the SF-36 (n = 5), partial SF-36 (n = 1) and SF-12 (n = 1) among breast cancer survivors. Internal consistency scores for the SF measures ranged from acceptable to good across a range of language and ethnic sub-groups. The SF-36 demonstrated good convergent validity with respective subscales of the Functional Assessment of Cancer Therapy-General scale and two lymphedema-specific measures. Divergent validity between the SF-36 and Lymph-ICF was modest. The SF-36 demonstrated good factor structure in the total breast cancer survivor study samples. However, the factor structure appeared to differ between specific language and ethnic sub-groups. The SF-36 discriminated between survivors who reported or did not report symptoms on the Breast Cancer Prevention Trial Symptom Checklist, and SF-36 physical sub-scales, but not mental sub-scales, discriminated between survivors with or without lymphedema. Methodological quality scores varied between and within papers.

Conclusion Short Form measures appear to provide a reliable and valid indication of general health status among breast cancer survivors, though the limited data suggest that particular caution is required when interpreting scores provided by non-English language groups. Further research is required to test the sensitivity or responsiveness of the measure.

Keywords Breast cancer • Quality of life • Systematic review • Psychometric properties • Short Form health survey • SF-36

C. Treanor (✉) • M. Donnelly
UKCRC Centre of Excellence for Public Health, Queen’s University Belfast, Belfast, Northern Ireland
e-mail: c.treanor@qub.ac.uk

C. Treanor • M. Donnelly
Cancer Epidemiology and Health Services Research Group, Centre for Public Health, Queen’s University Belfast, Institute of Clinical Sciences-B Building, Royal Victoria Hospital Site, Grosvenor Road, Belfast BT12 6BJ, Northern Ireland

Introduction

Breast cancer is one of the most commonly occurring and most widely studied cancers in developing countries worldwide with 5-year relative survival rates which exceed 80 % [1]. Breast cancer survivors may experience adverse or late effects following treatment [2], and it is important to understand the impact of these effects on the health and well-being of cancer survivors [3, 4]. Increasingly, health care programmes are shifting focus from traditional clinical outcomes to patient-reported outcomes such as quality of life and health status (e.g. UK National Cancer Survivorship Initiative, 2010), particularly with respect to the new chronic condition-like model of cancer and the recognised importance of capturing the patient perspective in order to aid patient-centred practice, health care policy and the configuration of cancer services [5].

The Short Form (SF) health survey measure is a widely used, generic, self-report measure of health status. We conducted a systematic review of studies which assessed the validity, reliability and sensitivity of the SF-36 and briefer versions (SF-12 and SF-8) among breast cancer survivors. The methodological quality of the papers was assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) and related checklist [5].

Methods

Review method

PubMed, MEDLINE, EMBASE, CINAHL, PsycINFO and the Social Sciences Citation Index were searched using terms which link breast cancer survivor, psychometric properties and SF measures. The search terms for measurement properties were based on a sensitive search filter developed by the COSMIN initiative which is appropriate for identifying papers which focus on the psychometric properties of a specific patient-reported outcome measure (see Fig. 1).

Titles identified from the electronic search were exported to Refworks, and duplicates were removed. The eligibility of a paper was assessed independently by two reviewers, firstly according to its title followed by its abstract and the full paper. The bibliographies of included papers were searched for further eligible papers. Data
Fig. 1 Complete search strategy

(breast) AND (cancer OR neoplasm) AND (survivor* OR patient*)
AND
(medical outcomes study SF-36 OR medical outcomes study SF36 OR medical outcomes
study SF 36 OR medical outcomes study short form-36 OR medical outcomes study short
form 36 OR SF-36 OR SF36 OR SF 36 OR short form-36 OR short form 36 OR MOS SF-36
OR MOS SF36 OR MOS SF 36 OR MOS short form-36 OR MOS short form 36) OR
(medical outcomes study SF-12 OR medical outcomes study SF12 OR medical outcomes
study SF 12 OR medical outcomes study short form-12 OR medical outcomes study short
form 12 OR SF-12 OR SF12 OR SF 12 OR short form-12 OR short form 12 OR MOS SF-12
OR MOS SF12 OR MOS SF 12 OR MOS short form-12 OR MOS short form 12) OR
(medical outcomes study SF-8 OR medical outcomes study SF8 OR medical outcomes study
SF 8 OR medical outcomes study short form-8 OR medical outcomes study short form 8 OR
SF-8 OR SF8 OR SF 8 OR short form-8 OR short form 8 OR MOS SF-8 OR MOS SF8 OR
MOS SF 8 OR MOS short form-8 OR MOS short form 8)
AND
(reproducib*[tw] OR methods [sh] OR valid*[tiab] OR reproducibility of results [MeSH]
OR reliab*[tiab])
NOT
(child* OR paed* OR ped*) AND (hospital OR inpatient) AND (pallia* OR end of life* OR
terminal*)
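
The SF-measure portion of the strategy in Fig. 1 follows a regular pattern: three prefixes (none, "medical outcomes study" and "MOS") combined with five spellings of each instrument name. As a minimal sketch, assuming that pattern holds for all three instruments, the block could be generated rather than typed by hand. The function names below (sf_terms, sf_block, measure_clause) are hypothetical and are not part of the published search.

```python
# Hypothetical helper: regenerates the SF-36 / SF-12 / SF-8 term block of Fig. 1.
# The prefixes and spellings are read off the figure; the function names are
# illustrative assumptions, not part of the published search strategy.

PREFIXES = ["medical outcomes study ", "", "MOS "]
FORMS = ["SF-{n}", "SF{n}", "SF {n}", "short form-{n}", "short form {n}"]


def sf_terms(n):
    """Return the 15 spelling variants for the n-item Short Form measure."""
    return [prefix + form.format(n=n) for prefix in PREFIXES for form in FORMS]


def sf_block(n):
    """Join the variants for one instrument into a parenthesised OR group."""
    return "(" + " OR ".join(sf_terms(n)) + ")"


def measure_clause(versions=(36, 12, 8)):
    """Join the per-instrument groups with OR, in the order used in Fig. 1."""
    return " OR ".join(sf_block(n) for n in versions)


if __name__ == "__main__":
    print(measure_clause())
```

Generating the variants in this way may make it easier to check that no spelling has been dropped when the strategy is adapted to another database.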
extraction onto a standardised pro-forma was conducted independently.

Inclusion/exclusion criteria

Only papers that covered survivors of breast cancer were included. Survivors were defined as individuals who had completed treatment with curative intent (including surgery, radiotherapy and/or chemotherapy); survivors who were in receipt of adjuvant hormone therapy for prophylactic purposes were also included. Papers with survivors from sites other than breast cancer were included if a separate analysis was provided for breast cancer. Papers which assessed the psychometric properties of the SF-36, or its derivatives, the SF-12 and the SF-8, were included. Four papers met the above inclusion criteria. Three papers which utilised the SF measures to assess the psychometric properties of other measures, e.g. the Functional Assessment of Cancer Therapy-General scale (FACT-G), were also included. We focused on the taxonomy developed by the COSMIN initiative which covers three main domains and related measurement properties: reliability (internal consistency and measurement error); validity (content validity, construct validity, and criterion validity); and responsiveness. Peer-reviewed papers which assessed one or more of these measurement properties were included.

Papers were excluded if they covered individuals in receipt of curative treatment (defined above) or palliative care; did not cover breast cancer survivors; did not assess the psychometric properties (defined above) of the SF-36 or its derivatives; or were not published in peer-reviewed journals.

Quality assessment

The COSMIN checklist and scoring manual were used to appraise the methodological quality of studies. The checklist addresses the general requirements of psychometric studies, followed by separate sections on individual psychometric properties [6, 7]. In total, there are 114 checklist items. A four-point scoring system from ‘excellent’, ‘good’, ‘fair’ to ‘poor’ has been developed for each property [8]. Many checklist items require subjective judgements, and this aspect may account for low inter-rater reliability; thus, the authors recommend that independent quality assessors decide a priori how items are to be interpreted [9]. Methodological quality was appraised independently by two reviewers with high agreement. It is not necessary to complete all items in the checklist, as each included study may address one or more psychometric properties.

Results

The search identified 270 papers after duplicates were removed (see Fig. 2 for the complete process of study selection). Tables 1 and 2 present the summary characteristics and findings of included papers.

Five studies were conducted in the United States, and the remaining two studies were conducted in Dutch-speaking countries, Belgium and The Netherlands, respectively. The SF-36 was the focus of all but one study, which focussed on the SF-12. One study included four subscales only of the SF-36. The English language versions of the SF measures were used in three studies; other language versions include Chinese (n = 2), Dutch (n = 2), Korean (n = 1) and Spanish (n = 1). The studies that primarily aimed to assess the psychometric properties of the SF measures focussed on internal consistency (n = 3), construct validity (n = 3), concurrent validity (n = 3) and discriminant validity (n = 1). The studies which provide psychometric information on the SF measures when used to assess other measures focussed on convergent and divergent validity (n = 1), discriminant validity (n = 1) and concurrent validity (n = 1). Due to the small number of papers identified, no exclusions were made based on methodological quality.

Quality appraisal

Methodological quality scores varied between and within papers. The quality assessment of internal consistency included scorings of excellent (n = 1), good (n = 1) and fair (n = 1). Methodological quality scores for construct validity (including discriminant validity, which the COSMIN initiative includes under construct validity) included good (n = 1) and poor (n = 3). The methodological quality of concurrent validity was assessed as fair in each study (n = 4). Within studies, methodological quality ranged from excellent to poor [10] and from good to fair [11, 12]. In general, the main reason for reduced methodological quality across and within studies was the use of less-than-optimal sample sizes, particularly for analyses by sub-group. Additionally, many of the studies did not clearly state a priori which hypotheses they were testing when assessing convergent and divergent validity, including the direction and size of association between convergent or divergent constructs.

Internal consistency

A range of ethnic and language groups were the focus of interest in the three US studies which assessed the internal consistency of the SF measures. The Chinese language SF-12; Spanish, Korean, Chinese and English language SF-36;
and 4 subscales of the English language SF-36 were the measures of interest. A Cronbach’s alpha co-efficient (α) score of greater than 0.7 indicates acceptable internal consistency.

Two of the studies calculated internal consistency scores for the overall breast cancer sample as well as ethnic and language sub-groups. Internal consistency scores for the overall sample across the SF subscales were acceptable in both studies: from α = 0.76 to α = 0.91 [10] and α = 0.72 to α = 0.90 [12]. Only one study assessed the internal consistency scores for the SF-36 subscales among a European-American population; the scores were acceptable (range α = 0.76–0.90) [10].

Two studies assessed the internal consistency of the SF-36 among African-American breast cancer survivor populations. Similar internal consistency scores were reported in both studies, and with the exception of the social functioning subscale, these scores were in the acceptable to good range [10, 12]. For the purposes of comparison of the Ashing-Giwa et al. [10] study, which utilised the full SF-36, with the Ashing-Giwa and Rosales [12] study, which implemented only four SF-36 subscales, internal consistency scores for social functioning (α = 0.64; α = 0.68), physical role limitations (α = 0.86; α = 0.87), bodily pain (α = 0.86; α = 0.84) and general health (α = 0.79; α = 0.78) are highlighted, respectively.

Two studies included Latina-American breast cancer survivors. One study administered the Spanish language SF-36 [10], whereas the other study administered the English language SF-36 (4 subscales) to two groups: survivors who were English language proficient (EP) and survivors who had limited English language proficiency (LEP) [12]. Nevertheless, internal consistency scores were all acceptable and generally similar across each sub-group between and within the two studies. For the purposes of comparison of the Ashing-Giwa et al. [10] study, which utilised the full SF-36, with the Ashing-Giwa and Rosales [12] study, which implemented only four SF-36 subscales, internal consistency scores for social functioning (α = 0.71; EP: α = 0.75; LEP: α = 0.73), physical role limitations (α = 0.94; EP: α = 0.87; LEP: α = 0.93), bodily pain (α = 0.84; EP: α = 0.89; LEP: α = 0.88) and general health (α = 0.84; EP: α = 0.74; LEP: α = 0.76) are highlighted, respectively [10, 12].

Among Chinese-American breast cancer survivors, internal consistency scores for the Chinese language SF-12 at both baseline and at one-year follow-up for the physical and mental component summary scores were acceptable (range α = 0.81–0.82 and α = 0.79–0.80) [13]. Internal consistency scores on the eight subscales of the SF-36 among Asian-American breast cancer survivors were also acceptable in an additional study (range α = 0.79–0.89) [10].

Fig. 2 Number of papers after implementation of search strategy

Concurrent validity: convergent

Two studies assessed the extent of convergence between subscales of the FACT-B and FACT-G measures and the
Table 1 Study characteristics of papers which focussed on the psychometric properties of the SF measuresᵃ
Study Ashing-Giwa et al. [10] Ashing-Giwa et al. [13] Ashing-Giwa and Rosales [12] Wilson et al. [11]
Country USA USA USA USA
Measure(s) of interest SF-36 Spanish-, Korean, Chinese SF-12 Chinese language SF-36 4 sub-scales only (general SF-36 English language version
and English language versions version health; pain; role limitations due
to physical health and social
functioning) English language
version
Functional Assessment of Cancer Functional Assessment of FACT-G Functional Living Index-Cancer
Therapy-Breast (FACT-B) Cancer Therapy-General (FLIC)
(FACT-G)
Life Stress Scale MOS Social Support Survey
Quality of medical care Centre for Epidemiology
satisfaction (study specific) Studies—Depression (CES-D)
Spirituality (study specific)
Body image (study specific)
Sexual impact (study specific)
Short Acculturation Scale for
Hispanics
Aim(s) To assess the construct validity of To assess the internal To assess the reliability and To assess whether the SF-36 and
the QoL measures consistency and construct validity of HRQoLᵇ measures in the FLIC can be used
validity of the SF-12 and minority ethnic populations interchangeably to measure
the FACT-G among (African-American; English HRQoL among breast cancer
Chinese-American breast language proficient (EP) Latina- survivors
cancer survivors American and limited English
language proficient (LEP)
Latina-American)
To assess the internal consistency To assess the concurrent validity To assess whether similarly named
of the measures by ethnic group of the FACT-G with the other sub-scales on the SF-36 and the
outcome measures FLIC measure similar
dimensions of HRQoL
To assess the concurrent validity To assess the extent to which the
of the FACT-B and the SF-36 SF-36 and the FLIC are able to
detect differences in HRQoL
between breast cancer survivors
with and without lymphedema
Study design Cross-sectional telephone or postal Longitudinal postal survey Cross-sectional survey (baseline) Cross-sectional survey
survey data obtained from an
intervention study
Population Criteria Criteria Criteria Criteria
African-American, European- Chinese-American Breast Breast cancer survivors (with no Breast cancer survivors aged
American, Asian-American and cancer survivors between other cancer type) between 1 and 18-65 years, at least 3 months
Latina-American breast cancer 6 months and 3 years 6 years post-diagnosis (stages post-surgery recruited from
survivors between 1 and 5 years post-diagnosis (stages 0-III), aged over 18 years, self- outpatient lists
post-diagnosis (stages 0-III), 0-III), aged over 18 years identified as African- or Latina-
aged 18 and over with no other and identified from a American and identified from
cancer or major medical or cancer registry cancer registries, clinics and
psychiatric condition and support groups
identified from cancer registry
and community groups
Sample Sample Sample Sample
n = 703 n = 74 completed survey n = 320 Lymphedema group n = 32; age:
at both time-points; age: mean = 50.6 years, SD = 10.1;
mean = 54.6 years, time since diagnosis:
SD = 9.1, mean = 2.6 years, SD = 2.1
range = 31–83 years; age
at diagnosis:
mean = 52.7,
SD = 8.7 years; time
since diagnosis:
mean = 2.4 year,
SD = 2.0; 79 %
diagnosed stage I-II
African-American n = 135; age: 78 % more than high African-American n = 88; age: Non-lymphedema group n = 78;
mean = 56 years; age at school education; 40 % 70 % \ 65 years; 77 % age: mean = 52.8 years,
diagnosis: mean = 52; time low income less than diagnosed stage I-II; 82 % more SD = 9.1; time since diagnosis:
since diagnosis: $25,000 than high school education; mean = 2.1 years, SD = 1.7
mean = 3.6 years; 80 % 31 % low income less than
diagnosed stages I-II $25,000
80 % more than high school EP Latina-American n = 95;
education; 30 % low income less 84 % \ 65 years; 79 %
than $25,000 diagnosed stage I-II; 28 % more
than high school education;
31 % low income less than
$25,000
European-American n = 179; LEP Latina-American n = 137;
age: mean = 57 years; age at 85 % \ 65 years; 78 %
diagnosis: mean = 55; time diagnosed stage I-II; 14 % more
since diagnosis: than high school education;
mean = 2.7 years; 74 % 71 % low income less than
diagnosed stages I-II $25,000
90 % more than high school

education; 14 % low income less
than $25,000
Latina-American n = 183; age:
mean = 53 years; age at
diagnosis: mean = 50; time
since diagnosis:
mean = 2.9 years; 75 %

diagnosed stages I-II
than $25,000
Asian-American n = 206; age:
mean = 54 years; age at
diagnosis: mean = 51; time
since diagnosis:
mean = 2.9 years; 74 %
diagnosed stages I-II
than $25,000
Psychometric properties assessed Internal consistency Internal consistency Internal consistency Convergent validity
Construct validity Construct validity Construct validity (confirmatory Discriminative validity
(exploratory factor factor analysis based on original
analysis at baseline and factor structure)
Confirmatory factor
analysis at 1-year follow-
up)
Concurrent validity Concurrent validity
Results Internal consistency Internal consistency Internal consistency Convergent validity
Overall sample: physical Baseline: SF-12 physical SF-36 social functioning, Total SF-36 PCS vs. SF-36 MCS: tau-
functioning a = 0.91, physical component summary sample: a = 0.72; African- b = 0.247
role limitations a = 0.89, (PCS) score a = 0.82; American sample: a = 0.68;
emotional role limitations SF-12 mental component LEP Latina-American sample:
a = 0.86, vitality = 0.85, summary score (MCS) a = 0.73; EP Latina-American
mental health a = 0.84, social a = 0.80 sample: a = 0.75
functioning a = 0.76, bodily
pain a = 0.84, general health
a = 0.80
African-American sample: One-year follow-up: SF-12 SF-36 physical role limitations, SF-36 PCS vs. FLIC total score:
physical functioning a = 0.93, PCS a = 0.81; SF-12 Total sample: a = 0.90; African- tau-b = 0.556
physical role limitations MCS a = 0.79 American sample: a = 0.87;
a = 0.86, emotional role LEP Latina-American sample:
limitations a = 0.84, a = 0.93; EP Latina-American
vitality = 0.87, mental health sample: a = 0.87
a = 0.82, social functioning
a = 0.64, bodily pain a = 0.86,
general health a = 0.79
European-American sample: Construct validity SF-36 pain, Total sample: SF-36 MCS vs. FLIC total score:
physical functioning a = 0.88, a = 0.88; African-American tau-b = 0.490
physical role limitations sample: a = 0.84; LEP Latina-
a = 0.87, emotional role American sample: a = 0.88; EP
limitations a = 0.83, Latina-American sample:
vitality = 0.90, mental health a = 0.89
a = 0.88, bodily pain a = 0.86,
Latina-American sample: Baseline: two factors SF-36 general health, Total SF-36 physical functioning vs.
physical functioning a = 0.93, emerged sample: a = 0.77; African- FLIC physical functioning: tau-
physical role limitations American sample: a = 0.78; b = 0.616
a = 0.94, emotional role LEP Latina-American sample:
limitations a = 0.86, a = 0.76; EP Latina-American
vitality = 0.86, mental health sample: a = 0.74
a = 0.71, bodily pain a = 0.84,
Asian-American sample: physical Factor 1: All 6 MCS items Construct validity SF-36 mental health vs. FLIC
functioning a = 0.89, physical loaded (not mental health: tau-b = 0.586
role limitations a = 0.88, careful = 0.432, social
emotional role limitations time = 0.585,
a = 0.88, vitality = 0.79, energy = 0.632,
mental health a = 0.83, social accomplished less
functioning a = 0.79, bodily emotional = 0.798, blue/
pain a = 0.80, general health sad = 0.849,
a = 0.84 peaceful = 0.896), as
well as 1 PCS item
(general health = 0.426)
Construct validityᶜ Factor 2: All 6 PCS items Total sample: 4-factors explained SF-36 social functioning vs. FLIC
loaded (accomplished less 72 % of the variance and social functioning: tau-
physical = 0.444, general represented the factorial b = 0.526
health = 0.476, pain structure of the original SF-36
interference = 0.589, measure
limited in kind of
work = 0.698, moderate

activities = 0.742, climb
several flights = 0.940)
Only data from the SF-36 general One-year follow-up: Factor 1: physical role limitations SF-36 general health vs. FLIC
health Perception items were 58.1 % of common (all 4 items): range = 0.82–0.86; general health: tau-b = 0.503
presented in the paper variance explained social functioning (1 out of 2
items): 0.49
Overall sample (n = 676): Factor 1: 5/6 PCS items Factor 2: general health (4 out of Discriminative validity
emotional role limitations, (general health = 0.531, 5 items): range = 0.60–0.72
physical role limitations, pain accomplished less
and mental health loaded onto a physical = 0.728, pain
single factor each; general interference = 0.735,
health, physical functioning and moderate
vitality each had mostly good activities = 0.846, climb
factor structure with a few several flights of
inconsistencies; social stairs = 0.977) but one
functioning did not load onto any (limited in kind of work)
factors loaded and 3/6 MCS
items (energy = 0.526,
accomplished less
emotional = 0.480 and
not careful)
loaded = 0.448
African-American (n = 131): Factor 2: 1/6 PCS item Factor 3: general health (2 out of SF-36 PCS (d = 1.20) but not
general health, emotional role loaded (limited in kind of 5 items): range = 0.46-0.51; MCS (d = 0.19) and each of the
limitations and pain items had work = 0.502), and 4/6 bodily pain (all 2 items): subscales (physical functioning:
good factor structures; other MCS items (social range = 0.68-0.82 d = 1.11; physical role
items had loadings on multiple time = 0.733, not limitations: d = 1.02; emotional
factors or no factors careful = 0.403 role limitations: d = 0.72;
peaceful = 0.879 and vitality: d = 0.74; social
blue/sad = 0.827) loaded functioning: d = 0.60; bodily
pain: d = 0.72 and general
health: d = 0.69) with the
exception of mental health
(d = 0.35) had significant effect
sizes
European-American (n = 174): Factor 4: general health (1 out of
discrepancies with factor loading 5 items): 0.72; social functioning
for physical functioning (items (all 2 items): range = 0.54–0.75
loaded onto two separate factors)
and general health (3/5 items
loaded onto one factor); other
subscale items had consistent
factor loadings
Latina-American (n = 170): African-American sample:
physical role limitations, 4-factors explained 73 % of the
emotional role limitations and variance
mental health items had
consistent factor loadings; other
subscale items had less
consistent factor loadings
Asian-American (n = 201): Factor 1: physical role limitations
consistent factor loadings for (all 4 items): range = 0.77–0.89;
general health, emotional role social functioning (1 out of 2
limitations and pain; other items): 0.57
subscale items loaded onto
multiple or no factors
Concurrent validityᵉ Factor 2: general health (3 out of
5 items): range = 0.74–0.84;
bodily pain (1 out of 2 items):
0.45
Sub-scales of the FACT-G were Factor 3: general health (1 out of
significantly correlated to 5 items): 0.74; bodily pain (all 2
respective sub-scales of the SF- items): range = 0.69–0.71 and;
36 social functioning (all 2 items):
range = 0.51–0.71
Total sample: SF-36 general Factor 4: general health (1 out of

health and FACT-G functional 5 items): 0.93
Well-being (q = 0.62); SF-36
physical functioning and FACT-
G physical well-being
(q = 0.60); SF-36 physical role
limitations and FACT-G physical

well-being (q = 0.61); SF-36
emotional role limitations and
FACT-G functional well-being
(q = 0.51); SF-36 vitality and
(q = 0.64) and physical well-
being (q = 0.64); SF-36 mental
health and FACT-G emotional
well-being (q = 0.65); SF-36
social functioning and FACT-G
functional well-being
(q = 0.62); SF-36 bodily pain
and FACT-G physical well-being
(q = 0.68)
African-American: SF-36 general LEP Latina-American sample:
health and FACT-G functional 4-factors explained 75 % of the
Well-being (q = 0.60); SF-36 variance
(q = 0.51) and FACT-G
emotional Well-being
(q = 0.63); SF-36 mental health
and FACT-G emotional well-
being (q = 0.66); SF-36 social
functioning and FACT-G
(q = 0.70)
European-American: SF-36 Factor 1: physical role limitations
general health and FACT-G (all 4 items): range = 0.84–0.89;
functional Well-being bodily pain: (1 out of 2 items):
(q = 0.59); SF-36 physical range = 0.47; social functioning
functioning and FACT-G (1 out of 2 items): 0.42
physical well-being (q = 0.48);
SF-36 physical role limitations
(q = 0.64); SF-36 emotional
role limitations and FACT-G
and FACT-G functional well-
(q = 0.68)
Latina-American: SF-36 general Factor 2: general health (3 out of
health and FACT-G functional 5 items): range = 0.43–0.81
FACT-G physical well-being
(q = 0.63).
Asian-American: SF-36 general Factor 3: general health (3 out of

health and FACT-G functional 5 items): range = 0.52–0.81;
well-being (q = 0.66) and bodily pain (1 out of 2 items):
FACT-G physical Well-being 0.51
(q = 0.66); SF-36 physical
limitations and FACT-G
functional Well-being
(q = 0.53) and FACT-G
physical Well-being (q = 0.53);
SF-36 emotional role limitations
and FACT-G functional well-
being (q = 0.50); SF-36 vitality
SF-36 bodily pain and FACT-G
physical well-being (q = 0.68)
Factor 4: general health (1 out of
5 items): 0.74; bodily pain (all 2
items): range = 0.59–0.61;
social functioning (all 2 items):
range = 0.59–0.81
EP Latina-American sample:
4-factors explained 73 % of the
variance
Factor 1: physical role limitations
(all 4 items): range = 0.77–0.82;
social functioning (1 out of 2
items): 0.52
5 items): range = 0.66–0.88
5 items): range = 0.40–0.62;
bodily pain (all 2 items):
range = 0.85–0.88
5 items): range = 0.54–0.84;
social functioning (all 2 items):
range = 0.52–0.71
Concurrent validityᵈ
African-American: SF-36 general
health vs. FACT-G physical
well-being (q = 0.60); vs.
FACT-G social/family well-
being (q = 0.01 n.s.*); vs.
FACT-G emotional well-being
(q = 0.39); vs. FACT-G
functional well-being (q = 0.43)
SF-36 social functioning vs.
(q = 0.69); vs. FACT-G social/
family well-being (q = 0.16
n.s.*); vs. FACT-G emotional
(q = 0.59)
SF-36 physical role limitations vs.
family well-being (q = -0.01
n.s.*); vs. FACT-G emotional
(q = 0.51)
SF-36 bodily pain vs. FACT-G
vs. FACT-G social/family well-
being (q = 0.04 n.s.*); vs.
(q = 0.20 n.s.); vs. FACT-G
LEP Latina-American: SF-36

general health vs. FACT-G
being (q = 0.42); vs. FACT-G
emotional well-being
(q = 0.44); vs. FACT-G

family well-being (q = 0.40);
vs. FACT-G emotional well-
family well-being (q = -0.17);
being (q = 0.21 n.s.*); vs.
(q = 0.27); vs. FACT-G
EP Latina-American: SF-36
general health vs. FACT-G
(q = 0.42); vs. FACT-G
family well-being (q = 0.43);
family well-being (q = -0.23);
being (q = 0.18 n.s.); vs. FACT-
G functional well-being
(q = 0.53)
(q = 0.25); vs. FACT-G
Authors conclusions Internal consistency Internal consistency Internal consistency Convergent validity
The SF-36 was assessed as having The SF-12 has good The SF-36 had acceptable internal SF-36 PCS and MCS measure
moderate-to-strong reliability internal consistency and consistency across the three sub- distinct domains of HRQoL
across the different ethnic groups the measure is reliable in groups
a Chinese-American
population
Construct validity Construct validity Construct validity There is a modest degree of
construct overlap between the
SF-36 PCS and MCS and the
FLIC total score
Overall, the SF-36 presented good The SF-12 at baseline had The SF-36 role limitations due to The physical, mental and social
factor structure, with the good factor structure physical health subscale had domains of HRQoL are similar
exception of the social which closely reflected good factor structure across each in the two measures, but general
functioning scale the 2 constructs of the sub-group health is not similar
measure
Factor structures by ethnic groups The factor structure of the The SF-36 general health, pain Discriminative validity
were less consistent SF-12 at follow-up was and social functioning sub-scales
less robust, and this may had inconsistent factor structures
be due to a response shift across the three sub-groups
in cancer survivor’s
interpretation of the
items.
Concurrent validity The SF-36 is acceptable for use in SF-36 is able to discriminate
breast cancer survivor between breast cancer survivors
populations from ethnic minority with and without lymphedema in
and low-literacy groups terms of physical HRQoL, but
not mental HRQoL
There was good concurrent Concurrent validity
validity demonstrated between
the FACT-G and the SF-36
There was good concurrent
validity demonstrated between
the FACT-G and the SF-36
Quality Assessment Internal consistency Internal consistency Internal consistency Concurrent validity
Excellent Fair Good
Construct validity Construct validity
Poor Poor
Concurrent validity Construct validity Concurrent validity
Fair Good Fair Fair
a Unless where necessary, information regarding the SF measures only is reported
b HRQoL = health-related quality-of-life
c Participants with missing data were excluded from this analysis
d Although the FACT-G was the focus of the analysis, psychometric information on the SF-36 is also provided
e Only the highest, positive correlations are reported in the table
* n.s. = non-significant
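
The quality assessment ratings in Table 1 follow the COSMIN scoring rule discussed under Strengths and limitations, in which the lowest item rating determines the overall rating for a measurement property. The following is a minimal sketch of that rule only; the function name, the rating labels as strings and the example ratings are assumptions made for illustration and are not taken from the COSMIN checklist or from any paper in this review.

```python
# Ratings ordered from worst to best, using the four levels named in this review.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Return the overall rating for one property: the lowest item rating."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # The rating with the smallest position in RATING_ORDER is the worst one.
    return min(item_ratings, key=lambda rating: RATING_ORDER.index(rating.lower()))


# Hypothetical item ratings for one measurement property in one paper.
example_ratings = ["excellent", "good", "excellent", "poor", "good"]
print(worst_score_counts(example_ratings))  # poor
```

Under this rule a single 'poor' item outweighs any number of 'excellent' items, which is why a paper with generally strong ratings can still receive an overall 'poor' rating.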
SF-36 [10, 12]. The Ashing-Giwa et al. [10] study analysed convergent validity among the total sample and four ethnic and language sub-groups: African-American, European-American, Latina-American and Asian-American. The strength and direction of associations between respective SF and FACT-B subscales were similar across the total sample and ethnic and language sub-groups (see Table 2). The Ashing-Giwa and Rosales [12] study analysed the concurrent validity between the FACT-G and four subscales of the SF-36 among three sub-groups: African-American, LEP Latina-American and EP Latina-American. Except for a few discrepancies, the strength and direction of associations were similar between the FACT-G and SF-36 subscales among each of the sub-groups (see Table 2). Of note, a strong association was found between the SF-36 social functioning and the FACT-G social/family well-being subscales among the African-American group compared to low-moderate correlations observed among the LEP and EP Latina-American groups.

Two additional studies assessed the convergent validity of SF-36 subscales and respective lymphedema-specific measures among breast cancer survivors in Dutch-speaking countries. One of the studies stated five a priori hypotheses to assess convergent validity between the Lymph-ICF and the SF-36. Each of the hypotheses was supported; they are listed in Table 2 [14]. The second study demonstrated acceptable convergent validity with the ULL27 [15]. The highest, positive correlations were between the SF-36 psychological subscales (with the exception of the emotional role limitations subscale) and social subscales and the respective psychological and social domains of the ULL27. However, the SF-36 vitality and physical role limitations subscales did not correlate very strongly with the physical domain of the ULL27 [15].

Both studies demonstrated moderate correlations between the SF-36 bodily pain subscale and respective physical scales of the lymphedema scales in the expected direction. Similar results were demonstrated between the SF-36 social functioning subscale and respective Lymph-ICF and ULL27 social scales. The Devoogdt et al. [14] study found that the SF-36 mental health subscale was correlated strongly with the Lymph-ICF mental function scale; the Viehoff et al. [15] study reported similar findings, except for the association between the ULL27 psychological scale and SF-36 emotional role limitations.

Concurrent validity: divergent

One study stated five hypotheses to assess the divergent validity of SF-36 and the Lymph-ICF (see Table 2). Three of the five hypotheses were supported. The authors expected the greatest divergence and thus weakest association to be between the Lymph-ICF life and social and the SF-36 physical functioning subscales; however, the lowest correlation was reportedly with SF-36 emotional role limitations. Moreover, the Lymph-ICF mobility activities subscale was negatively, but moderately, associated with the SF-36 mental health subscale (ρ = -0.42), which deviated from the a priori hypothesis [14].

Construct validity

Based on the original factor structure of the SF-36, one study utilised Confirmatory Factor Analysis to test the factor structure of four subscales of the SF-36 for their total sample and three sub-groups: African-American, LEP Latina-American and EP Latina-American. For the total sample, the 4 factors explained 72 % of variance and generally represented the structure of the SF-36. Four distinct factors emerged for each SF-36 subscale with few inconsistencies (see Table 1). Although the four factors of the SF-36 explained between 73 and 75 % of the variance in quality-of-life scores within the respective ethnic and language sub-groups, the factor structure of the subscales, with the exception of the physical role limitation subscale, was less consistent. The general health items loaded onto three factors across the three sub-groups, and each of the items for the bodily pain and social functioning subscales loaded onto one factor among the African-American and LEP Latina-American sub-groups (see Table 1) [12].

Exploratory factor analysis data across the total sample and ethnic and language sub-groups were presented for the general health perception items only in one paper [10]. According to the authors’ descriptions of the results for the other subscales (except for SF-36 social functioning), the factor structure was generally consistent for the total sample. The most consistent factor structure was within the European-American sub-group, whereby inconsistencies were found only within the general health and physical functioning subscales. The factor structure of the SF-36 performed similarly within the Asian- and African-American sub-groups. The most inconsistent pattern of factor loadings was within the Latina-American sub-group; good factor structure was identified among the emotional role limitations subscale only [10].

Both studies demonstrate good factor structure of the SF-36 within the total samples under study; however, factor structures were less consistent within the ethnic and language sub-groups. A few notable discrepancies can be found between the two studies. Within the African-American sub-group, discrepancies can be seen for the factor structure of the physical role limitations, general health and bodily pain subscales. Moreover, within the Latina-American sub-group, the only discrepancy can be seen in the factor structure for the physical role limitations subscale [10, 12].

A further study assessed the factor structure of the SF-12 within a Chinese-American breast cancer survivor sample.
Table 2 Study characteristics of papers which provide psychometric information on the SF measures but focussed on other measures^a

Study: Devoogdt et al. [14]
Country: Belgium
Measure(s) of interest: Lymphedema Functioning, Disability and Health questionnaire (Lymph-ICF); SF-36 Dutch language version; study-specific questionnaire
Aim(s): To assess the validity and reliability of the Lymph-ICF questionnaire. Construct validity assessed using the SF-36
Study design: Longitudinal survey: baseline at outpatient appointment; follow-up 24–48 h later to be returned by post
Population (criteria): Dutch-speaking breast cancer survivors who had undergone unilateral axillary dissection at least 12 months previously, recruited from hospital-based physiotherapy and breast clinic appointments
Population (sample): n = 90 breast cancer patients
n = 30 with objective lymphedema; age: mean = 61.2 years; SD = 10.0 years
n = 30 with subjective lymphedema; age: mean = 56.7 years; SD = 9.3 years
n = 30 with no reported lymphedema; age: mean = 58.3 years; SD = 11.9 years
Psychometric properties assessed: Construct validity (Convergent and Divergent)
Convergent validity hypotheses
(1) Lymph-ICF physical function and SF-36 bodily pain
(2) Lymph-ICF mental function and SF-36 mental health
(3) Lymph-ICF household activities and SF-36 physical functioning
(4) Lymph-ICF mobility activities and SF-36 physical functioning
(5) Lymph-ICF life and social activities and SF-36 social functioning
Divergent validity hypotheses
(1) Lymph-ICF physical function and SF-36 role-emotional and mental health
(2) Lymph-ICF mental health and SF-36 physical functioning and physical role limitations
(3) … emotional role limitations and mental health
(4) Lymph-ICF mobility activities and SF-36 emotional role limitations and mental health
(5) Lymph-ICF life and social activities and SF-36 physical limitations
Results: Construct validity
5 convergent validity hypotheses were supported
(1) Lymph-ICF physical function and SF-36 bodily pain (ρ = -0.52)
(2) Lymph-ICF mental function and SF-36 mental health (ρ = -0.70)
(3) Lymph-ICF household activities and SF-36 physical functioning (ρ = -0.51)
(4) Lymph-ICF mobility activities and SF-36 physical functioning (ρ = -0.62)
(5) Lymph-ICF life and social activities and SF-36 social functioning (ρ = -0.33)
3 of the 5 divergent hypotheses were supported
(1) Lymph-ICF physical function and SF-36 role-emotional (ρ = 0.03) and mental health (ρ = -0.14)
(2) Lymph-ICF mental health and SF-36 physical functioning (ρ = -0.24) and physical role limitations (ρ = -0.25)
(3) … emotional role limitations (ρ = -0.22) and mental health (ρ = -0.27)
(4) Support for emotional role limitations only and not mental health: Lymph-ICF mobility activities and SF-36 emotional role limitations (ρ = -0.15) and mental health (ρ = -0.42)
(5) Unsupported: Lymph-ICF life and social activities and SF-36 physical functioning (ρ = -0.25); emotional role limitations had the lowest correlation (ρ = -0.19)
Authors' conclusions: Construct validity
The Lymph-ICF demonstrated good convergent and divergent validity with respective sub-scales of the SF-36
Quality assessment: Construct validity: Poor

Study: Terhorst et al. [16]
Country: USA
Measure(s) of interest: Breast Cancer Prevention Trial Symptom Checklist (BCPT); SF-36 English language version
Aim(s): To assess the psychometric properties of the BCPT with a sample of breast cancer patients before and after adjuvant therapy. To determine the discriminant validity of the presence/absence of symptoms on the BCPT using the SF-36
Study design: Baseline and follow-up data obtained from a longitudinal cohort study
Population (criteria): Breast cancer survivors diagnosed with stages I-IIIa who were part of the Anastrozole use In Menopausal women (AIM) cohort study
Population (sample):
n = 27 chemotherapy only group; age: mean = 58.9 years; range = 49–73 years; education: mean = 14.6 years; 81.5 % white ethnicity
n = 157 anastrozole only group; age: mean = 61.6 years; range = 45–75 years; education: mean = 14.8 years; 98.1 % white ethnicity
n = 94 chemotherapy and anastrozole combined group; age: mean = 59.0; range = 44–68 years; education: mean = 14.7 years; 94.7 % white ethnicity
Psychometric properties assessed: Discriminant validity
Results: Discriminant validity
Low, negative correlations between the symptoms reported on the BCPT and scores on the SF-36 summary component scores at both baseline (range = -0.401 to 0.016) and 6-month follow-up (range = -0.308 to -0.007)
Authors' conclusions: Discriminant validity
Discriminant validity was reported in the expected direction that a higher number of reported symptoms is associated with lower scores on the SF-36. This finding is supported by other research
Quality assessment: Construct validity: Fair

Study: Viehoff et al. [15]
Country: The Netherlands
Measure(s) of interest: Upper limb lymphedema 27-item questionnaire (ULL27); SF-36 Dutch language version
Aim(s): To translate the ULL27 into Dutch (from French) and determine its psychometric properties in a population of patients with lymphedema. To assess the concurrent validity of the ULL27 using the SF-36
Study design: Cross-sectional survey and clinical assessment of lymphedema
Population (criteria): Dutch women with breast cancer with unilateral edema of the upper limb (no distinction made between primary and secondary lymphedema). Patients with progressive disease or infection of the upper limb in the last 2 months were excluded
Population (sample): n = 84; age: mean = 59 years; SD = 11.79; onset of edema after surgery: mean = 26 months; SD = 56.61 months
Psychometric properties assessed: Concurrent validity
Results: Concurrent validity
The Dutch ULL27 domains were significantly correlated with most of the respective SF-36
The ULL27 physical domain correlated highly with SF-36 bodily pain (ρ = 0.69), general health (ρ = 0.60) and social functioning (ρ = 0.55) domains, but not physical role limitations (ρ = 0.38) or vitality (ρ = 0.47) domains as would be expected
The ULL27 psychological domain correlated highly with SF-36 general health (ρ = 0.54), vitality (ρ = 0.55), mental health (ρ = 0.66) and social functioning (ρ = 0.51) domains as would be expected, but not the emotional role limitations (ρ = 0.42) domain
The ULL27 social domain correlated highly with SF-36 physical functioning (ρ = 0.64), general health (ρ = 0.56) and social functioning (ρ = 0.45) domains
Authors' conclusions: Concurrent validity
The lower concurrent validity between the ULL27 physical domain and the respective SF-36 domains may be explained by the focus of lower limb functioning in the SF-36 as compared to upper limb focus in the ULL27
There was good concurrent validity between the psychological and social domains of the ULL27 and respective SF-36 domains
Quality assessment: Construct validity: Fair

a Unless where necessary, information regarding the SF measures only is reported
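
The convergent and divergent hypotheses in Table 2 are judged from the size and sign of correlation coefficients between paired subscale scores. The following is a minimal sketch of a rank correlation (Spearman's ρ), assuming this is the coefficient reported in the table; the review itself does not state which coefficient each study used. The function names and the paired scores are hypothetical and are not data from any of the included studies.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical paired scores for five respondents (illustration only).
lymphedema_scale = [10, 25, 40, 55, 70]  # assumed: higher = more problems
sf36_subscale = [90, 80, 60, 65, 30]     # higher = better HRQoL
print(round(spearman_rho(lymphedema_scale, sf36_subscale), 2))  # -0.9
```

If higher scores on the lymphedema measure indicate more problems while higher SF-36 scores indicate better HRQoL, a convergent pair would be expected to show a clearly negative coefficient, as in the convergent results in Table 2, whereas a divergent pair would be expected to sit close to zero.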
The authors utilised exploratory factor analysis at baseline to explore the factor structure of the measure within this population. The factor structure largely mirrored that of the original measure; all MCS items (with one PCS item) and PCS items loaded onto two emergent factors, respectively. Confirmatory Factor Analysis was used to assess how the SF-12 performed in the same population, 1 year later. Two factors emerged, but the pattern of factor loadings reflecting the MCS and PCS items was less consistent, and only 58.1 % of variance was explained [13].

Discriminant validity

One study administered the English language SF-36 in order to differentiate between groups of breast cancer survivors with and without lymphedema [11], and a second study used the SF-36 to assess the discriminant validity of the Breast Cancer Prevention Trial Symptom Checklist (BCPT) measure [16].

At baseline and 6-month follow-up, low correlations (range ρ = -0.401 to 0.016 and ρ = -0.308 to -0.007, respectively) were reported between the SF-36 component summary scores and BCPT scores. The BCPT requires the presence or absence of a number of symptoms related to breast cancer treatment to be reported, and higher scores on the SF-36 indicate better HRQoL. The findings indicated the SF-36 has good discriminant validity [16].

The SF-36 was able to differentiate between survivors with and without lymphedema in the expected direction (i.e. survivors with lymphedema have lower SF-36 scores) in terms of physical component summary score (d = 1.20), physical role limitations (d = 1.02), physical functioning (d = 1.11), emotional role limitations (d = 0.72), vitality (d = 0.74), social functioning (d = 0.60), bodily pain (d = 1.20) and general health (d = 0.69). The SF-36 mental component summary score (d = 0.19) and mental health subscale were not able to differentiate between survivors with and without lymphedema [11].

Discussion

Seven studies assessed the psychometric properties of the SF measures within diverse ethnic and language breast cancer survivor samples. Overall, the SF measures were found to have good psychometric properties. Further support for the use of the SF-36 among cancer survivor populations is provided by the assessment of the psychometric properties of the SF-36 within a British, childhood cancer survivor cohort [17].

Internal consistency ranged from acceptable to good across the SF-36 subscales and SF-12 component summary scores within breast cancer survivor populations of varying spoken languages and ethnicities. The social functioning subscale had the lowest internal consistency scores (α = 0.64 and α = 0.68) among African-American breast cancer survivors in two studies; however, the scores were not low enough to warrant cause for concern.

The SF-36 and FACT-G were assessed concurrently in two studies within various ethnic and language sub-groups; comparisons between the two studies were, however, limited as one study included only four of the eight SF-36 subscales and included only two similar ethnic and language sub-groups. Differences in study design between the two studies may account for some of the discrepancies in concurrent validity results (e.g. SF-36 subscales and FACT measures social/family well-being subscales) among the African-American samples. One of the studies conducted a population-based cross-sectional telephone or postal survey to investigate different recruitment strategies among different ethnic and language groups and to assess the psychometric properties of health-related outcome measures [10], whereas the other study utilised baseline data from a psycho-education intervention study to undertake a secondary assessment of psychometric properties [12]. No further details of the psycho-education intervention are reported in the study or published elsewhere, in particular how the sample was selected. Breast cancer survivors who already have a good knowledge of breast cancer and its impact may have ‘self-selected’ for the psycho-education intervention compared to survivors with less knowledge; therefore, items on the cancer-specific FACT-G may have more salience for them compared to items on the generic SF-36. A further study assessed the convergent validity of the SF-36 and FLIC to good effect [11]. Given that the SF-36 is a generic measure of health status and the FACT and FLIC are measures specific to cancer populations, it shows promise that the measures seemingly measure the same constructs in diverse ethnic and language breast cancer survivor groups. It would appear that the SF-36 may be a suitable generic alternative to both cancer-specific measures, particularly to make comparisons of the health status of cancer survivors to population norms, the general population or other disease groups.

The concurrent validity of respective subscales of the Dutch language SF-36 and lymphedema-specific measures (ULL27 and Lymph-ICF) was assessed for convergent validity in two studies and divergent validity (Lymph-ICF) in one study to good effect. One of the studies found a less strong association between respective physical subscales of the two measures. The authors report that this is likely due to the lower limb focus of the SF-36 (e.g. ability to climb a flight of stairs) compared to the upper limb focus of the ULL27 [15]. Thus, the SF-36 may be more appropriate for use among survivors with lower limb lymphedema [18], although this would need to be psychometrically assessed. In order to accurately capture the health outcomes of breast
cancer survivors with lymphedema, the SF-36 may not be an adequate substitute for lymphedema-specific measures in terms of capturing similar aspects of health; however, more research is needed in this area due to the limited number of studies identified.

Subscales of the SF-36 (e.g. vitality and bodily pain) have been implemented in studies to assess cancer-related fatigue [19] and cancer-related pain [20], respectively, among breast cancer survivors. No studies were identified which assessed the psychometric performance of these subscales compared to cancer-specific or symptom-specific measures. To use fatigue as an example, scales which define and measure fatigue exclusively as a multidimensional construct may be more reliable than measures that include fatigue as a series of unidimensional items within a domain or subscale [21]. Moreover, variation in the types (i.e. generic, cancer-specific or symptom-specific) of patient-reported outcome measures may account for variance in prevalence rates of fatigue and other cancer-related effects [19, 22]. Therefore, there is a need for further research to identify the psychometric performance of subscales of the SF-36 that may be used to measure cancer-related effects among breast cancer survivors (and other cancer groups) to ensure that health care providers and commissioners adequately capture this information to inform service provision and delivery.

The factor structure of the SF-36 was good for overall study samples and reflected the original measure, but inconsistencies in the factor structure were reported within language and ethnic sub-groups. These inconsistencies may result from the loss of nuances of language or the failure to represent cultural norms adequately, or at all, when translations are made [23]. However, the inconsistencies may also be partially explained by a lack of statistical power to conduct sub-group analyses due to small sample sizes. The COSMIN manual provides recommendations for adequate sample size to undertake factor analysis. For example, a ‘good’ sample size for factor analysis should include more than 100 participants and at least five times the number of measure items [24]. Only one [13] of the three studies which assessed the factor structure of the SF measures had a ‘good’ sample size, perhaps due to using the smaller SF-12 measure and not undertaking sub-group analysis; the other studies had an under-powered, ‘poor’ sample size for sub-group analysis of the SF-36 [10, 12]. Specific inconsistencies between comparable aspects of the two studies may be accounted for by differences in sources of data, as discussed above [10, 12]. The Chinese language SF-12 demonstrated good factor structure (closely reflecting the original SF-12) at baseline, but less consistent factor structure at 1-year follow-up within a Chinese-American breast cancer survivor sample. This may be explained by a response shift in the meaning or interpretation of the SF-12 items in the light of changes that have occurred as a result of diagnosis and treatment as breast cancer survivors live longer with the disease [25]. Further research with adequate sample size should be undertaken to ensure that the factor structure of the SF-36 for use within diverse ethnic and language groups is adequately assessed.

The SF-36 was able to discriminate between breast cancer survivors with and without lymphedema in terms of physical health, but not in terms of mental health. This is supported by further research which found that there was a significant reduction in scores on the SF-36 MCS and mental health subscale associated with arm and shoulder problems which were not lymphedema among breast cancer survivors. However, this significant reduction was not seen among breast cancer survivors with lymphedema [26]. In contrast, in a population-based survey of cancer survivors (the majority of whom were breast cancer survivors), scores on the SF-36 MCS and mental health subscale were significantly lower among cancer survivors with late effects compared to those without late effects. This discrepancy may be explained by the survey not focusing on lymphedema only, and many of the experienced late effects may have been psychological in nature [4]. The authors of one paper suggest that the discriminant validity findings for the MCS and mental health subscales may be accounted for by the limited number of SF-36 items to address issues of anxiety, depression, or distress which may be experienced by many breast cancer survivors post-treatment [11]. One study utilised the SF-36 to assess the discriminant validity of the BCPT, to good effect. This study provides a good illustration of how the SF-36 has been used to validate other measures [16].

Strengths and limitations

A major strength of the review was the use of COSMIN resources including a well-developed search strategy and the use of a methodological quality assessment tool. The COSMIN initiative recommends that their checklist be scored using a ‘worst score counts’ approach. Therefore, papers which have generally ‘excellent’ or ‘good’ scores may score ‘poor’ on one item and receive an overall ‘poor’ rating, as was the case with the assessment of construct validity in one paper [10].

The inclusion of additional studies (n = 3) which did not primarily assess the psychometric properties of the SF measures but which nevertheless provide this information may be questionable [24]. Nevertheless, these studies demonstrate similar results to studies where the SF measures were the primary focus. Moreover, many psychometric properties of the SF-36 were not assessed in the breast cancer survivor population, for example responsiveness (the ability of the SF-36 to detect change), and this would warrant further research attention.
Conclusion

Nuances in language and cultural norms have an important influence on how individuals perceive and interpret items on patient-reported outcome measures, particularly as concepts may not be adequately or accurately represented after translation. The SF measures to assess health status have good internal consistency, convergent validity, divergent validity and moderately good construct validity within diverse language and ethnic breast cancer survivor groups. The SF measures would provide a useful aid for health care providers to assess health-related outcomes of breast cancer survivors in their care.

Acknowledgments CT received funding support from UKCRC Centre of Excellence for Public Health, Queen’s University Belfast to undertake the review.

References

1. Coleman, M. P., Forman, D., Bryant, H., Butler, J., Rachet, B., Nur, U., et al. (2011). Cancer survival in Australia, Canada, Denmark, Norway, Sweden and the UK, 1995–2007 (the International Cancer Benchmarking Partnership): an analysis of population-based cancer registry data. The Lancet, 377, 127–138.
2. Treanor, C., & Donnelly, M. (2014). Late effects of cancer and cancer treatment—A rapid review. Journal of Supportive Oncology, 12, 137–148.
3. Mols, F., Vingerhoets, J. M., Coebergh, J. W., & van de Poll-Franse, L. V. (2005). Quality of life among long-term breast cancer survivors: A systematic review. European Journal of Cancer, 41, 2613–2619.
4. Treanor, C., Santin, O., Mills, M., & Donnelly, M. (2013). Cancer survivors with late effects: Their health status, care needs and service utilisation. Psycho-Oncology, 22, 2428–2435.
5. Department of Health, Macmillan Cancer Support & NHS Improvement. (2010). The national cancer survivorship initiative vision. London: Department of Health.
6. Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., et al. (2010). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: An international Delphi study. Quality of Life Research, 19, 539–549.
7. Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., et al. (2010). International consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes: Results of the COSMIN study. Journal of Clinical Epidemiology, 63, 737–745.
8. Terwee, C. B., Mokkink, L. B., Knol, D. L., Ostelo, R. W. J. G., Bouter, L. M., & de Vet, H. C. W. (2012). Rating the methodological quality in systematic reviews of studies on measurement properties: A scoring system for the COSMIN checklist. Quality of Life Research, 21, 651–657.
9. Mokkink, L. B., Terwee, C. B., Gibbons, E., Stratford, P. W., Alonso, J., Patrick, D. L., et al. (2010). Inter-rater agreement and reliability of the COSMIN (COnsensus-based Standards for the selection of health status Measurement INstruments) Checklist. BMC Medical Research Methodology, 10, 82.
10. Ashing-Giwa, K. T., Padilla, G. V., Tejero, J. S., & Kim, J. (2004). Breast cancer survivorship in a multiethnic sample. Challenges in recruitment and measurement. Cancer, 101, 450–465.
11. Wilson, R. W., Hutson, L. M., & VanStry, D. (2005). Comparison of 2 quality-of-life questionnaires in women treated for breast cancer: The RAND 36-Item Health Survey and the Functional Living Index-Cancer. Physical Therapy, 85, 851–860.
12. Ashing-Giwa, K., & Rosales, M. (2013). A cross-cultural validation of patient-reported outcomes measures: A study of breast cancer survivors. Quality of Life Research, 22, 295–308.
13. Ashing-Giwa, K., Lam, C. N., & Xie, B. (2013). Assessing health-related quality of life of Chinese-American breast cancer survivors: A measurement validation study. Psycho-Oncology, 22, 704–707.
14. Devoogdt, N., van Kampen, M., Geraerts, I., Coremans, T., & Christiaens, M. (2011). Lymphoedema Functioning, Disability and Health Questionnaire (Lymph-ICF): Reliability and validity. Physical Therapy, 91, 944–957.
15. Viehoff, P. B., van Genderen, F. R., & Wittink, H. (2008). Upper limb lymphedema 27 (ULL27) Dutch translation and validation of an illness-specific health-related quality of life questionnaire for patients with upper limb lymphedema. Lymphology, 41, 131–138.
16. Terhorst, L., Blair-Belansky, H., Moore, P. J., & Bender, C. (2011). Evaluation of the psychometric properties of the BCPT Symptom Checklist with a sample of breast cancer patients before and after adjuvant therapy. Psycho-Oncology, 20, 961–968.
17. Reulen, R. C., Zeegers, M. P., Jenkinson, C., Lancashire, E. R., Winter, D. L., Jenney, M. E., et al. (2006). The use of the SF-36 questionnaire in adult survivors of childhood cancer: Evaluation of data quality, score reliability, and scaling assumptions. Health and Quality of Life Outcomes, 4, 77.
18. Ryan, M., Stainton, C. M., Jaconelli, C., Watts, S., MacKenzie, P., & Mansberg, T. (2003). The experience of lower limb lymphedema for women after treatment for gynecologic cancer. Oncology Nursing Forum, 30, 417–423.
19. Bower, T. E., Ganz, P. A., Desmond, K. A., Rowland, J. H., Meyerowitz, B. E., & Belin, T. R. (2000). Fatigue in breast cancer survivors: Occurrence, correlates, and impact on quality of life. Journal of Clinical Oncology, 18, 743–753.
20. Forsythe, L. P., Alfano, C. M., George, S. M., McTiernan, A., Baumgartner, K. B., Bernstein, L., et al. (2013). Pain in long-term breast cancer survivors: The role of body mass index, physical activity, and sedentary behavior. Breast Cancer Research and Treatment, 137, 617–630.
21. Minton, O., & Stone, P. (2009). A systematic review of the scales used for the measurement of cancer-related fatigue (CRF). Annals of Oncology, 20, 17–25.
22. Kim, S. H., Son, B. H., Hwang, S. Y., Han, W., Yang, J., Lee, S., et al. (2008). Fatigue and depression in disease-free breast cancer survivors: Prevalence, correlates, and association with quality of life. Journal of Pain and Symptom Management, 35, 644–655.
23. Stewart, A. L., & Nápoles-Springer, A. (2000). Health-related quality-of-life assessments in diverse population groups in the United States. Medical Care, 38, 102–124.
24. Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., et al. (2012). COSMIN Checklist Manual. http://www.cosmin.nl/images/upload/files/COSMIN%20checklist%20manual%20v9.pdf. Accessed on 1 Dec 2013.
25. Schwartz, C. E., Bode, R., Repucci, N., Becker, N., Sprangers, M. A. G., & Fayers, P. M. (2006). The clinical significance of adaptation to changing health: A meta-analysis of response shift. Quality of Life Research, 15, 1533–1550.
26. Nesvold, I.-L., Fossä, S. D., Holm, I., et al. (2010). Arm/shoulder problems in breast cancer survivors are associated with reduced health and poorer physical quality of life. Acta Oncologica, 49, 347–353.