Treanor 2014
DOI 10.1007/s11136-014-0785-6
REVIEW
340 Qual Life Res (2015) 24:339–362
briefer versions (SF-12 and SF-8) among breast cancer survivors. The methodological quality of the papers was assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) and related checklist [5].

Methods

Review method

PubMed, MEDLINE, EMBASE, CINAHL, PsycINFO and the Social Sciences Citation Index were searched using terms which link breast cancer survivor, psychometric properties and SF measures. The search terms for measurement properties were based on a sensitive search filter developed by the COSMIN initiative which is appropriate for identifying papers which focus on the psychometric properties of a specific patient-reported outcome measure (see Fig. 1).

Titles identified from the electronic search were exported to Refworks, and duplicates were removed. The eligibility of a paper was assessed independently by two reviewers, firstly according to its title followed by its abstract and the full paper. The bibliographies of included papers were searched for further eligible papers. Data
AND
(medical outcomes study SF-36 OR medical outcomes study SF36 OR medical outcomes
study SF 36 OR medical outcomes study short form-36 OR medical outcomes study short
OR MOS SF36 OR MOS SF 36 OR MOS short form-36 OR MOS short form 36) OR
(medical outcomes study SF-12 OR medical outcomes study SF12 OR medical outcomes
study SF 12 OR medical outcomes study short form-12 OR medical outcomes study short
OR MOS SF12 OR MOS SF 12 OR MOS short form-12 OR MOS short form 12) OR
(medical outcomes study SF-8 OR medical outcomes study SF8 OR medical outcomes study
SF 8 OR medical outcomes study short form-8 OR medical outcomes study short form 8 OR
SF-8 OR SF8 OR SF 8 OR short form-8 OR short form 8 OR MOS SF-8 OR MOS SF8 OR
AND
OR reliab*[tiab])
NOT
(child* OR paed* OR ped*) AND (hospital OR inpatient) AND (pallia* OR end of life* OR
terminal*)
and 4 subscales of the English language SF-36 were the measures of interest. A Cronbach’s alpha coefficient (α) score of greater than 0.7 indicates acceptable internal consistency.

Concurrent validity: convergent

Two studies assessed the extent of convergence between subscales of the FACT-B and FACT-G measures and the
Table 1 Study characteristics of papers which focussed on the psychometric properties of the SF measuresa
Study Ashing-Giwa et al. [10] Ashing-Giwa et al. [13] Ashing-Giwa and Rosales [12] Wilson et al. [11]
Country USA USA USA USA
Measure(s) of interest SF-36 Spanish-, Korean, Chinese SF-12 Chinese language SF-36 4 sub-scales only (general SF-36 English language version
and English language versions version health; pain; role limitations due
to physical health and social
functioning) English language
version
Functional Assessment of Cancer Functional Assessment of FACT-G Functional Living Index-Cancer
Therapy-Breast (FACT-B) Cancer Therapy-General (FLIC)
(FACT-G)
Life Stress Scale MOS Social Support Survey
Quality of medical care Centre for Epidemiology
satisfaction (study specific) Studies—Depression (CES-D)
Spirituality (study specific)
Body image (study specific)
Sexual impact (study specific)
Short Acculturation Scale for
Hispanics
Aim(s) To assess the construct validity of To assess the internal To assess the reliability and To assess whether the SF-36 and
the QoL measures consistency and construct validity of HRQoLb measures in the FLIC can be used
validity of the SF-12 and minority ethnic populations interchangeably to measure
the FACT-G among (African-American; English HRQoL among breast cancer
Chinese-American breast language proficient (EP) Latina- survivors
cancer survivors American and limited English
language proficient (LEP)
Latina-American)
To assess the internal consistency To assess the concurrent validity To assess whether similarly named
of the measures by ethnic group of the FACT-G with the other sub-scales on the SF-36 and the
outcome measures FLIC measure similar
dimensions of HRQoL
To assess the concurrent validity To assess the extent to which the
of the FACT-B and the SF-36 SF-36 and the FLIC are able to
detect differences in HRQoL
between breast cancer survivors
with and without lymphedema
Study design Cross-sectional telephone or postal Longitudinal postal survey Cross-sectional survey (baseline) Cross-sectional survey
survey data obtained from an
intervention study
Population Criteria Criteria Criteria Criteria
African-American, European- Chinese-American Breast Breast cancer survivors (with no Breast cancer survivors aged
American, Asian-American and cancer survivors between other cancer type) between 1 and 18-65 years, at least 3 months
Latina-American breast cancer 6 months and 3 years 6 years post-diagnosis (stages post-surgery recruited from
survivors between 1 and 5 years post-diagnosis (stages 0-III), aged over 18 years, self- outpatient lists
post-diagnosis (stages 0-III), 0-III), aged over 18 years identified as African- or Latina-
aged 18 and over with no other and identified from a American and identified from
cancer or major medical or cancer registry cancer registries, clinics and
psychiatric condition and support groups
identified from cancer registry
and community groups
Sample Sample Sample Sample
n = 703 n = 74 completed survey n = 320 Lymphedema group n = 32; age:
at both time-points; age: mean = 50.6 years, SD = 10.1;
mean = 54.6 years, time since diagnosis:
SD = 9.1, mean = 2.6 years, SD = 2.1
range = 31–83 years; age
at diagnosis:
mean = 52.7,
SD = 8.7 years; time
since diagnosis:
mean = 2.4 year,
SD = 2.0; 79 %
diagnosed stage I-II
African-American n = 135; age: 78 % more than high African-American n = 88; age: Non-lymphedema group n = 78;
mean = 56 years; age at school education; 40 % 70 % < 65 years; 77 % age: mean = 52.8 years,
diagnosis: mean = 52; time low income less than diagnosed stage I-II; 82 % more SD = 9.1; time since diagnosis:
since diagnosis: $25,000 than high school education; mean = 2.1 years, SD = 1.7
mean = 3.6 years; 80 % 31 % low income less than
diagnosed stages I-II $25,000
80 % more than high school EP Latina-American n = 95;
education; 30 % low income less 84 % < 65 years; 79 %
than $25,000 diagnosed stage I-II; 28 % more
than high school education;
31 % low income less than
$25,000
European-American n = 179; LEP Latina-American n = 137;
age: mean = 57 years; age at 85 % < 65 years; 78 %
diagnosis: mean = 55; time diagnosed stage I-II; 14 % more
since diagnosis: than high school education;
mean = 2.7 years; 74 % 71 % low income less than
diagnosed stages I-II $25,000
African-American sample: One-year follow-up: SF-12 SF-36 physical role limitations, SF-36 PCS vs. FLIC total score:
physical functioning a = 0.93, PCS a = 0.81; SF-12 Total sample: a = 0.90; African- tau-b = 0.556
physical role limitations MCS a = 0.79 American sample: a = 0.87;
a = 0.86, emotional role LEP Latina-American sample:
limitations a = 0.84, a = 0.93; EP Latina-American
vitality = 0.87, mental health sample: a = 0.87
a = 0.82, social functioning
a = 0.64, bodily pain a = 0.86,
general health a = 0.79
European-American sample: Construct validity SF-36 pain, Total sample: SF-36 MCS vs. FLIC total score:
physical functioning a = 0.88, a = 0.88; African-American tau-b = 0.490
physical role limitations sample: a = 0.84; LEP Latina-
a = 0.87, emotional role American sample: a = 0.88; EP
limitations a = 0.83, Latina-American sample:
vitality = 0.90, mental health a = 0.89
a = 0.86, social functioning
a = 0.88, bodily pain a = 0.86,
general health a = 0.76
Latina-American sample: Baseline: two factors SF-36 general health, Total SF-36 physical functioning vs.
physical functioning a = 0.93, emerged sample: a = 0.77; African- FLIC physical functioning: tau-
physical role limitations American sample: a = 0.78; b = 0.616
a = 0.94, emotional role LEP Latina-American sample:
limitations a = 0.86, a = 0.76; EP Latina-American
vitality = 0.86, mental health sample: a = 0.74
a = 0.84, social functioning
a = 0.71, bodily pain a = 0.84,
general health a = 0.80
Asian-American sample: physical Factor 1: All 6 MCS items Construct validity SF-36 mental health vs. FLIC
functioning a = 0.89, physical loaded (not mental health: tau-b = 0.586
role limitations a = 0.88, careful = 0.432, social
emotional role limitations time = 0.585,
a = 0.88, vitality = 0.79, energy = 0.632,
mental health a = 0.83, social accomplished less
functioning a = 0.79, bodily emotional = 0.798, blue/
pain a = 0.80, general health sad = 0.849,
a = 0.84 peaceful = 0.896), as
well as 1 PCS item
(general health = 0.426)
Construct validityc Factor 2: All 6 PCS items Total sample: 4-factors explained SF-36 social functioning vs. FLIC
loaded (accomplished less 72 % of the variance and social functioning: tau-
physical = 0.444, general represented the factorial b = 0.526
health = 0.476, pain structure of the original SF-36
interference = 0.589, measure
limited in kind of
European-American (n = 174): Factor 4: general health (1 out of
discrepancies with factor loading 5 items): 0.72; social functioning
for physical functioning (items (all 2 items): range = 0.54–0.75
loaded onto two separate factors)
and general health (3/5 items
loaded onto one factor); other
subscale items had consistent
factor loadings
Latina-American (n = 170): African-American sample:
physical role limitations, 4-factors explained 73 % of the
emotional role limitations and variance
mental health items had
consistent factor loadings; other
subscale items had less
consistent factor loadings
Asian-American (n = 201): Factor 1: physical role limitations
consistent factor loadings for (all 4 items): range = 0.77–0.89;
general health, emotional role social functioning (1 out of 2
limitations and pain; other items): 0.57
subscale items loaded onto
multiple or no factors
Concurrent validitye Factor 2: general health (3 out of
5 items): range = 0.74–0.84;
bodily pain (1 out of 2 items):
0.45
Sub-scales of the FACT-G were Factor 3: general health (1 out of
significantly correlated to 5 items): 0.74; bodily pain (all 2
respective sub-scales of the SF- items): range = 0.69–0.71 and;
36 social functioning (all 2 items):
range = 0.51–0.71
European-American: SF-36 Factor 1: physical role limitations
general health and FACT-G (all 4 items): range = 0.84–0.89;
functional Well-being bodily pain: (1 out of 2 items):
(q = 0.59); SF-36 physical range = 0.47; social functioning
functioning and FACT-G (1 out of 2 items): 0.42
physical well-being (q = 0.48);
SF-36 physical role limitations
and FACT-G physical well-being
(q = 0.64); SF-36 emotional
role limitations and FACT-G
functional well-being
(q = 0.61); SF-36 vitality and
FACT-G functional well-being
(q = 0.68); SF-36 mental health
and FACT-G functional well-
being (q = 0.75); SF-36 social
functioning and FACT-G
functional well-being
(q = 0.65); SF-36 bodily pain
and FACT-G physical well-being
(q = 0.68)
Latina-American: SF-36 general Factor 2: general health (3 out of
health and FACT-G functional 5 items): range = 0.43–0.81
Well-being (q = 0.60); SF-36
physical functioning and FACT-
G physical well-being
(q = 0.64); SF-36 physical role
limitations and FACT-G physical
Well-being (q = 0.61); SF-36
emotional role limitations and
FACT-G physical well-being
(q = 0.54); SF-36 vitality and
FACT-G physical well-being
(q = 0.66); SF-36 mental health
and FACT-G emotional well-
being (q = 0.64); SF-36 social
functioning and FACT-G
functional well-being
(q = 0.61); SF-36 bodily pain
and FACT-G physical well-being
(q = 0.63).
functional well-being
(q = 0.57); SF-36 physical role
limitations and FACT-G
functional Well-being
(q = 0.53) and FACT-G
physical Well-being (q = 0.53);
SF-36 emotional role limitations
and FACT-G functional well-
being (q = 0.50); SF-36 vitality
and FACT-G physical well-being
(q = 0.72); SF-36 mental health
and FACT-G emotional well-
being (q = 0.62); SF-36 social
functioning and FACT-G
physical well-being (q = 0.56);
SF-36 bodily pain and FACT-G
physical well-being (q = 0.68)
Factor 4: general health (1 out of
5 items): 0.74; bodily pain (all 2
items): range = 0.59–0.61;
social functioning (all 2 items):
range = 0.59–0.81
EP Latina-American sample:
4-factors explained 73 % of the
variance
Factor 1: physical role limitations
(all 4 items): range = 0.77–0.82;
social functioning (1 out of 2
items): 0.52
Factor 2: general health (2 out of
5 items): range = 0.66–0.88
Factor 3: general health (2 out of
5 items): range = 0.40–0.62;
bodily pain (all 2 items):
range = 0.85–0.88
Factor 4: general health (2 out of
5 items): range = 0.54–0.84;
social functioning (all 2 items):
range = 0.52–0.71
Concurrent validityd
African-American: SF-36 general
health vs. FACT-G physical
well-being (q = 0.60); vs.
FACT-G social/family well-
being (q = 0.01 n.s.*); vs.
FACT-G emotional well-being
(q = 0.39); vs. FACT-G
functional well-being (q = 0.43)
SF-36 social functioning vs.
FACT-G physical well-being
(q = 0.69); vs. FACT-G social/
family well-being (q = 0.16
n.s.*); vs. FACT-G emotional
well-being (q = 0.37); vs.
FACT-G functional well-being
(q = 0.59)
SF-36 physical role limitations vs.
FACT-G physical well-being
(q = 0.56); vs. FACT-G social/
family well-being (q = -0.01
n.s.*); vs. FACT-G emotional
well-being (q = 0.22); vs.
FACT-G functional well-being
(q = 0.51)
SF-36 bodily pain vs. FACT-G
physical well-being (q = 0.69);
vs. FACT-G social/family well-
being (q = 0.04 n.s.*); vs.
FACT-G emotional well-being
(q = 0.20 n.s.); vs. FACT-G
functional well-being (q = 0.52)
SF-36 social functioning vs.
FACT-G physical well-being
(q = 0.49); vs. FACT-G social/
family well-being (q = 0.43);
vs. FACT-G emotional well-
being (q = 0.44); vs. FACT-G
functional well-being (q = 0.62)
SF-36 physical role limitations vs.
FACT-G physical well-being
(q = 0.50); vs. FACT-G social/
family well-being (q = -0.23);
vs. FACT-G emotional well-
being (q = 0.18 n.s.); vs. FACT-
G functional well-being
(q = 0.53)
SF-36 bodily pain vs. FACT-G
physical well-being (q = 0.71);
vs. FACT-G social/family well-
being (q = 0.22); vs. FACT-G
emotional well-being
(q = 0.25); vs. FACT-G
functional well-being (q = 0.63)
Authors' conclusions Internal consistency Internal consistency Internal consistency Convergent validity
The SF-36 was assessed as having The SF-12 has good The SF-36 had acceptable internal SF-36 PCS and MCS measure
moderate-to-strong reliability internal consistency and consistency across the three sub- distinct domains of HRQoL
across the different ethnic groups the measure is reliable in groups
a Chinese-American
population
Construct validity Construct validity Construct validity There is a modest degree of
construct overlap between the
SF-36 PCS and MCS and the
FLIC total score
Overall, the SF-36 presented good The SF-12 at baseline had The SF-36 role limitations due to The physical, mental and social
factor structure, with the good factor structure physical health subscale had domains of HRQoL are similar
exception of the social which closely reflected good factor structure across each in the two measures, but general
functioning scale the 2 constructs of the sub-group health is not similar
measure
Factor structures by ethnic groups The factor structure of the The SF-36 general health, pain Discriminative validity
were less consistent SF-12 at follow-up was and social functioning sub-scales
less robust, and this may had inconsistent factor structures
be due to a response shift across the three sub-groups
in cancer survivor’s
interpretation of the
items.
Concurrent validity The SF-36 is acceptable for use in SF-36 is able to discriminate
breast cancer survivor between breast cancer survivors
populations from ethnic minority with and without lymphedema in
and low-literacy groups terms of physical HRQoL, but
not mental HRQoL
There was good concurrent Concurrent validity
validity demonstrated between
the FACT-G and the SF-36
There was good concurrent
validity demonstrated between
the FACT-G and the SF-36
Quality Assessment Internal consistency Internal consistency Internal consistency Concurrent validity
Excellent Fair Good
Construct validity Construct validity
Poor Poor
Concurrent validity Construct validity Concurrent validity
Fair Good Fair Fair
a Unless where necessary, information regarding the SF measures only is reported
b HRQoL = health-related quality-of-life
c Participants with missing data were excluded from this analysis
d Although the FACT-G was the focus of the analysis, psychometric information on the SF-36 is also provided
e Only the highest, positive correlations are reported in the table
* n.s. = non-significant
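
The internal consistency values in Table 1 are Cronbach's alpha coefficients, judged against the 0.7 threshold stated in the text. As a minimal sketch of how such a coefficient is obtained, the Python below applies the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), to a small set of made-up responses. The function name `cronbach_alpha` and the response data are illustrative assumptions and are not taken from any of the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items (not data from the review)
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]

alpha = cronbach_alpha(responses)
acceptable = alpha > 0.7  # threshold for acceptable internal consistency
print(f"alpha = {alpha:.2f}, acceptable = {acceptable}")
```

With these invented responses the coefficient is about 0.96, so the hypothetical scale would clear the 0.7 threshold.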
SF-36 [10, 12]. The Ashing-Giwa et al. [10] study analysed convergent validity among the total sample and four ethnic and language sub-groups: African-American, European-American, Latina-American and Asian-American. The strength and direction of associations between respective SF and FACT-B subscales were similar across the total sample and ethnic and language sub-groups (see Table 2). The Ashing-Giwa and Rosales [12] study analysed the concurrent validity between the FACT-G and four subscales of the SF-36 among three sub-groups: African-American; LEP and EP Latina-Americans. Except for a few discrepancies, the strength and direction of associations were similar between the FACT-G and SF-36 subscales among each of the sub-groups (see Table 2). Of note, a strong association was found between the SF-36 social functioning and the FACT-G social/family well-being subscales among the African-American group compared to low-moderate correlations observed among the LEP and EP Latina-American groups.

Two additional studies assessed the convergent validity of SF-36 subscales and respective lymphedema-specific measures among breast cancer survivors in Dutch-speaking countries. One of the studies stated five a priori hypotheses to assess convergent validity between the Lymph-ICF and the SF-36. Each of the hypotheses was supported and listed in Table 2 [14]. The second study demonstrated acceptable convergent validity with the ULL27 [15]. The highest, positive correlations were between the SF-36 psychological subscales (with the exception of the emotional role limitations subscale), and social subscales and respective psychological and social domains of the ULL27. However, the SF-36 vitality and physical role limitations subscales did not correlate very strongly with the physical domain of the ULL27 [15].

Both studies demonstrated moderate correlations between the SF-36 bodily pain subscale and respective physical scales of the lymphedema scales in the expected direction. Similar results were demonstrated between the SF-36 social functioning subscale and respective Lymph-ICF and ULL27 social scales. The Devoogdt et al. [14] study found that the SF-36 mental health subscale was correlated strongly with the Lymph-ICF mental function scale; however, the Viehoff et al. [15] study reported similar findings except for the respective ULL27 psychological scale and SF-36 emotional role limitations.

Concurrent validity: divergent

One study stated five hypotheses to assess the divergent validity of SF-36 and the Lymph-ICF (see Table 2). Three of the five hypotheses were supported. The authors expected the greatest divergence and thus weakest association to be between the Lymph-ICF life and social and the SF-36 physical functioning subscales; however, the lowest correlation was reportedly with SF-36 emotional role limitations. Moreover, Lymph-ICF mobility activities subscale was negatively, but moderately associated with the SF-36 emotional role limitations subscale (ρ = -0.42) which deviated from the a priori hypothesis [14].

Construct validity

Based on the original factor structure of the SF-36, one study utilised Confirmatory Factor Analysis to test the factor structure of four subscales of the SF-36 for their total sample and three sub-groups: African-American; LEP and EP Latina-American. For the total sample, the 4 factors explained 72 % of variance and generally represented the structure of the SF-36. Four distinct factors emerged for each SF-36 subscale with few inconsistencies (see Table 1). Although, the four factors of the SF-36 explained between 73 and 75 % of the variance in quality-of-life scores within the respective ethnic and language sub-groups, the factor structure of the subscales, with the exception of the physical role limitation subscale, was less consistent. The general health items loaded onto three factors across the three sub-groups, and each of the items for the bodily pain and social functioning subscales loaded onto one factor among the African-American and LEP Latina-American sub-groups (see Table 1) [12].

Exploratory factor analysis data across the total sample and ethnic and language sub-groups were presented for the general health perception items only in one paper [10]. According to the authors’ descriptions of the results for the other subscales (except for SF-36 social functioning), the factor structure was generally consistent for the total sample. The most consistent factor structure was within the European-American sub-group whereby inconsistencies were found only within the general health and physical functioning subscales. The factor structure of the SF-36 performed similarly within the Asian- and African-American sub-groups. The most inconsistent pattern of factor loadings was within the Latina-American sub-group; good factor structure was identified among the emotional role limitations subscale only [10].

Both studies demonstrate good factor structure of the SF-36 within the total samples under study; however, factor structures were less consistent within the ethnic and language sub-groups. A few notable discrepancies can be found between the two studies. Within the African-American sub-group, discrepancies can be seen for the factor structure of the physical role limitations, general health and bodily pain subscales. Moreover, within the Latina-American sub-group, the only discrepancy can be seen in the factor structure for the physical role limitations subscale [10, 12].

A further study assessed the factor structure of the SF-12 within a Chinese-American breast cancer survivor sample.
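
Statements such as "the 4 factors explained 72 % of variance" summarise how much of the total item variance the retained factors carry. A minimal sketch of that arithmetic is given below, assuming a principal-components-style extraction on standardised items, so that each item contributes one unit of variance and the eigenvalues sum to the number of items. The eigenvalues are invented for illustration (13 values, matching the 4 + 5 + 2 + 2 items of the four subscales listed in Table 1); they are not the values from the study [12], which used confirmatory factor analysis, and the function name `variance_explained` is hypothetical.

```python
def variance_explained(eigenvalues, n_factors):
    """Share of total variance carried by the n_factors largest eigenvalues."""
    if not 0 < n_factors <= len(eigenvalues):
        raise ValueError("n_factors must be between 1 and the number of eigenvalues")
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:n_factors]) / sum(ordered)


# Invented eigenvalues for 13 standardised items (they sum to 13);
# illustrative only, not taken from any reviewed study
eigenvalues = [5.2, 1.9, 1.3, 1.0, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.15, 0.1, 0.05]

share = variance_explained(eigenvalues, 4)
print(f"First 4 factors explain {share:.0%} of the variance")
```

A high share of explained variance does not by itself show that items load on the intended factors, which is why the review examines the pattern of loadings separately.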
Table 2 Study characteristics of papers which provide psychometric information on the SF measures but focussed on other measuresa
Study Devoogdt et al. [14] Terhorst et al. [16] Viehoff et al. [15]
Country Belgium USA The Netherlands
Measure(s) of Lymphedema Functioning, Disability and Health Breast Cancer Prevention Trial Symptom Checklist Upper limb lymphedema 27-item questionnaire (ULL27)
interest questionnaire (Lymph-ICF) (BCPT)
SF-36 Dutch language version SF-36 English language version SF-36 Dutch language version
Study-specific questionnaire
Aim(s) To assess the validity and reliability of the Lymph-ICF To assess the psychometric properties of the BCPT with To translate the ULL27 into Dutch (from French) and
questionnaire a sample of breast cancer patients before and after determine its psychometric properties in a population
(4) Lymph-ICF mobility activities and SF-36 physical
functioning
(5) Lymph-ICF life and social activities and SF-36
social functioning
Divergent validity hypotheses
(1) Lymph-ICF physical function and SF-36 role-
emotional and mental health
(2) Lymph-ICF mental health and SF-36 physical
functioning and physical role limitations
(3) Lymph-ICF household activities and SF-36
emotional role limitations and mental health
(4) Lymph-ICF mobility activities and SF-36
emotional role limitations and mental health
(5) Lymph-ICF life and social activities and SF-36 –
physical limitations
Results
Devoogdt et al. [14]: Construct validity
5 convergent validity hypotheses were supported
(1) Lymph-ICF physical function and SF-36 bodily pain (ρ = -0.52)
(2) Lymph-ICF mental function and SF-36 mental health (ρ = -0.70)
(3) Lymph-ICF household activities and SF-36 physical functioning (ρ = -0.51)
(4) Lymph-ICF mobility activities and SF-36 physical functioning (ρ = -0.62)
(5) Lymph-ICF life and social activities and SF-36 social functioning (ρ = -0.33)
3 of the 5 divergent hypotheses were supported
Terhorst et al. [16]: Discriminant validity
Low, negative correlations between the symptoms reported on the BCPT and scores on the SF-36 summary component scores at both baseline (range = -0.401 to 0.016) and 6-month follow-up (range = -0.308 to -0.007)
Viehoff et al. [15]: Concurrent validity
The Dutch ULL27 domains were significantly correlated with most of the respective SF-36
The ULL27 physical domain correlated highly with SF-36 bodily pain (ρ = 0.69), general health (ρ = 0.60) and social functioning (ρ = 0.55) domains, but not physical role limitations (ρ = 0.38) or vitality (ρ = 0.47) domains as would be expected
The ULL27 psychological domain correlated highly with SF-36 general health (ρ = 0.54), vitality (ρ = 0.55), mental health (ρ = 0.66) and social functioning (ρ = 0.51) domains as would be expected, but not the emotional role limitations (ρ = 0.42) domain
The ULL27 social domain correlated highly with SF-36 physical functioning (ρ = 0.64), general health (ρ = 0.56) and social functioning (ρ = 0.45) domains
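
The convergent and divergent validity results in Table 2 rest on correlation coefficients between pairs of subscale scores. The sketch below shows one way such a coefficient could be computed, assuming a Spearman rank correlation (this excerpt does not name the coefficient used) and using invented scores for six hypothetical respondents. The negative result mirrors the sign pattern in the Devoogdt et al. [14] column, which would arise if higher Lymph-ICF scores denote more impairment while higher SF-36 scores denote better health; the first of these is an assumption rather than something stated in the source. All function names are illustrative.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Invented scores for six hypothetical respondents (not study data)
lymph_icf_mobility = [10, 25, 40, 55, 70, 85]
sf36_physical_functioning = [95, 80, 85, 60, 50, 30]

rho = spearman_rho(lymph_icf_mobility, sf36_physical_functioning)
print(f"rho = {rho:.2f}")
```

A hypothesis of convergence would then be checked by comparing the sign and size of the coefficient with what was stated a priori.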
The authors utilised exploratory factor analysis at baseline to explore the factor structure of the measure within this population. The factor structure largely mirrored that of the original measure, all MCS items (with one PCS item) and PCS items loaded onto two emergent factors, respectively. Confirmatory Factor Analysis was used to assess how the SF-12 performed in the same population, 1 year later. Two factors emerged, but the pattern of factor loadings reflecting the MCS and PCS items was less consistent, and 58.1 % only of variance was explained [13].

Discriminant validity

One study administered the English language SF-36 in order to differentiate between groups of breast cancer survivors with and without lymphedema [11], and a second study used the SF-36 to assess the discriminant validity of the Breast Cancer Prevention Trial Symptom Checklist (BCPT) measure [16].

At baseline and 6-month follow-up, low correlations (range ρ = -0.401 to 0.016 and ρ = -0.308 to -0.007, respectively) were reported between the SF-36 component summary scores and BCPT scores. The BCPT requires the presence or absence of a number of symptoms related to breast cancer treatment to be reported, and higher scores on the SF-36 indicate better HRQoL. The findings indicated the SF-36 has good discriminant validity [16].

The SF-36 was able to differentiate between survivors with and without lymphedema in the expected direction (i.e. survivors with lymphedema have lower SF-36 scores) in terms of physical component summary score (d = 1.20), physical role limitations (d = 1.02), physical functioning (d = 1.11), emotional role limitations (d = 0.72), vitality (d = 0.74), social functioning (d = 0.60), bodily pain (d = 1.20) and general health (d = 0.69). The SF-36 mental component summary score (d = 0.19) and mental health subscale were not able to differentiate between survivors with and without lymphedema [11].

Discussion

Seven studies assessed the psychometric properties of the SF measures within diverse ethnic and language breast cancer survivor samples. Overall, the SF measures were found to have good psychometric properties. Further support for the use of the SF-36 among cancer survivor populations is provided by the assessment of the psychometric properties of the SF-36 within a British, childhood cancer survivor cohort [17].

Internal consistency ranged from acceptable to good across the SF-36 subscales and SF-12 component summary scores within breast cancer survivor populations of varying spoken languages and ethnicities. The social functioning subscale had the lowest internal consistency scores (α = 0.64 and α = 0.68) among African-American breast cancer survivors in two studies; however, the scores were not low enough to warrant cause for concern.

The SF-36 and FACT-G were assessed concurrently in two studies within various ethnic and language sub-groups; comparisons between the two studies were, however, limited as one study included only four of the eight SF-36 subscales and included only two similar ethnic and language sub-groups. Differences in study design between the two studies may account for some of the discrepancies in concurrent validity results (e.g. SF-36 subscales and FACT measures social/family well-being subscales) among the African-American samples. One of the studies conducted a population-based cross-sectional telephone or postal survey to investigate different recruitment strategies among different ethnic and language groups and to assess the psychometric properties of health-related outcome measures [10], whereas the other study utilised baseline data from a psycho-education intervention study to undertake a secondary assessment of psychometric properties [12]. No further details of the psycho-education intervention are reported in the study or published elsewhere, in particular how the sample was selected. Breast cancer survivors who already have a good knowledge of breast cancer and its impact may have ‘self-selected’ for the psycho-education intervention compared to survivors with less knowledge; therefore, items on the cancer-specific FACT-G may have more salience for them compared to items on the generic SF-36. A further study assessed the convergent validity of the SF-36 and FLIC to good effect [11]. Given that the SF-36 is a generic measure of health status and the FACT and FLIC are measures specific to cancer populations, it shows promise that the measures seemingly measure the same constructs in diverse ethnic and language breast cancer survivor groups. It would appear that the SF-36 may be a suitable generic alternative to both cancer-specific measures, particularly to make comparisons of the health status of cancer survivors to population norms, the general population or other disease groups.

The concurrent validity of respective subscales of the Dutch language SF-36 and lymphedema-specific measures (ULL27 and Lymph-ICF) was assessed for convergent validity in two studies and divergent validity (Lymph-ICF) in one study to good effect. One of the studies found a less strong association between respective physical subscales of the two measures. The authors report that this is likely due to the lower limb focus of the SF-36 e.g. ability to climb a flight of stairs compared to the upper limb focus of the ULL27 [15]. Thus, the SF-36 may be more appropriate for use among survivors with lower limb lymphedema [18], although this would need to be psychometrically assessed. In order to accurately capture the health outcomes of breast
cancer survivors with lymphedema, the SF-36 may not be an adequate substitute for lymphedema-specific measures in terms of capturing similar aspects of health; however, more research is needed in this area due to the limited number of studies identified.
Subscales of the SF-36 (e.g. vitality and bodily pain) have been used in studies to assess cancer-related fatigue [19] and cancer-related pain [20], respectively, among breast cancer survivors. No studies were identified which assessed the psychometric performance of these subscales compared to cancer-specific or symptom-specific measures. To use fatigue as an example, scales which define and measure fatigue exclusively as a multidimensional construct may be more reliable than measures that include fatigue as a series of unidimensional items within a domain or subscale [21]. Moreover, variation in the types (i.e. generic, cancer-specific or symptom-specific) of patient-reported outcome measures may account for variance in prevalence rates of fatigue and other cancer-related effects [19, 22]. Therefore, there is a need for further research to identify the psychometric performance of subscales of the SF-36 that may be used to measure cancer-related effects among breast cancer survivors (and other cancer groups) to ensure that health care providers and commissioners adequately capture this information to inform service provision and delivery.
The factor structure of the SF-36 was good for overall study samples and reflected the original measure, but inconsistencies in the factor structure were reported within language and ethnic sub-groups. These inconsistencies may result from the loss of nuances of language or the failure to represent cultural norms adequately, or at all, when translations are made [23]. However, the inconsistencies may also be partially explained by a lack of statistical power to conduct sub-group analyses due to small sample sizes. The COSMIN manual provides recommendations for adequate sample size to undertake factor analysis. For example, a ‘good’ sample size for factor analysis should include more than 100 participants and at least five times the number of measure items [24]. Only one [13] of the three studies which assessed the factor structure of the SF measures had a ‘good’ sample size, perhaps due to using the smaller SF-12 measure and not undertaking sub-group analysis; the other studies had an under-powered, ‘poor’ sample size for sub-group analysis of the SF-36 [10, 12]. Specific inconsistencies between comparable aspects of the two studies may be accounted for by differences in sources of data (see above) [10, 12]. The Chinese language SF-12 demonstrated good factor structure (closely reflecting the original SF-12) at baseline, but less consistent factor structure at 1-year follow-up within a Chinese-American breast cancer survivor sample. This may be explained by a response shift in the meaning or interpretation of the SF-12 items in the light of changes that have occurred as a result of diagnosis and treatment as breast cancer survivors live longer with the disease [25]. Further research with adequate sample size should be undertaken to ensure that the factor structure of the SF-36 for use within diverse ethnic and language groups is adequately assessed.
The SF-36 was able to discriminate between breast cancer survivors with and without lymphedema in terms of physical health, but not in terms of mental health. This is supported by further research which found that there was a significant reduction in scores on the SF-36 MCS and mental health subscale associated with arm and shoulder problems other than lymphedema among breast cancer survivors. However, this significant reduction was not seen among breast cancer survivors with lymphedema [26]. In contrast, in a population-based survey of cancer survivors (the majority of whom were breast cancer survivors), scores on the SF-36 MCS and mental health subscale were significantly lower among cancer survivors with late effects compared to those without late effects. This discrepancy may be explained by the survey not focusing on lymphedema only, and many of the late effects experienced may have been psychological in nature [4]. The authors of one paper suggest that the discriminant validity findings for the MCS and mental health subscales may be accounted for by the limited number of SF-36 items to address issues of anxiety, depression, or distress which may be experienced by many breast cancer survivors post-treatment [11]. One study utilised the SF-36 to assess the discriminant validity of the BCPT, to good effect. This study provides a good illustration of how the SF-36 has been used to validate other measures [16].
Strengths and limitations
A major strength of the review was the use of COSMIN resources including a well-developed search strategy and the use of a methodological quality assessment tool. The COSMIN initiative recommends that its checklist be scored using a ‘worst score counts’ approach. Therefore, papers which have generally ‘excellent’ or ‘good’ scores may score ‘poor’ on one item and receive an overall ‘poor’ rating, as was the case with the assessment of construct validity in one paper [10].
The inclusion of additional studies (n = 3) which did not primarily assess the psychometric properties of the SF measures but which nevertheless provide this information may be questionable [24]. However, these studies demonstrate similar results to studies where the SF measures were the primary focus. Moreover, many psychometric properties of the SF-36 were not assessed in the breast cancer survivor population, for example responsiveness (the ability of the SF-36 to detect change), and this warrants further research attention.
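As an illustration only, the two COSMIN rules referred to in this section (the ‘good’ sample size guidance for factor analysis and the ‘worst score counts’ rating) can be sketched in a few lines of Python. The function names are hypothetical, only the ‘good’ threshold quoted above is encoded, and the ordering of the four rating levels (‘poor’, ‘fair’, ‘good’, ‘excellent’) is assumed rather than taken from this review.

```python
# Illustrative sketch only: names are hypothetical and the rating order
# below is an assumption, not a reproduction of the COSMIN checklist.

RATING_ORDER = ["poor", "fair", "good", "excellent"]  # lowest to highest


def meets_good_sample_size(n_participants: int, n_items: int) -> bool:
    """Apply the 'good' rule quoted in the text: more than 100 participants
    and at least five times the number of measure items."""
    return n_participants > 100 and n_participants >= 5 * n_items


def overall_rating(item_ratings: list[str]) -> str:
    """Return the lowest item rating ('worst score counts').

    Raises ValueError for an empty list or an unknown rating label."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    return min(item_ratings, key=RATING_ORDER.index)


if __name__ == "__main__":
    # The SF-36 has 36 items, so the five-times rule requires at least 180.
    print(meets_good_sample_size(200, 36))
    print(meets_good_sample_size(150, 36))
    # A single 'poor' item determines the overall rating.
    print(overall_rating(["excellent", "good", "poor"]))
```

Under this rule the 12-item SF-12 is bound by the 100-participant floor rather than by the five-times-items requirement, which is consistent with the suggestion above that a ‘good’ sample size is easier to reach with the shorter measure.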