Evidence of Validity For The Hip Outcome Score
RobRoy L. Martin, Ph.D., P.T., C.S.C.S., Bryan T. Kelly, M.D., and Marc J. Philippon, M.D.
Purpose: The purpose of this study was to offer evidence of validity for the Hip Outcome Score
(HOS) based on internal structure, test content, and relation to other variables. Methods: The study
population consisted of 507 subjects with a labral tear. Internal structure was evaluated by use of
factor analysis and coefficient α. Test content was evaluated by use of item response theory. Pearson
correlation coefficients were used to assess relations between the Short Form 36 and the HOS.
Results: The mean subject age was 38 years (range, 13 to 66 years), with 232 male and 273 female
subjects. Of the subjects, 263 (52%) underwent arthroscopic surgery. Factor analysis found that 17
of 19 items on the activities-of-daily-living (ADL) subscale loaded on 1 factor. The 2 items that did
not fit the 1-factor model were omitted from further testing. All 9 items on the sports subscale loaded
on 1 factor. The coefficient α values were .96 and .95 for the ADL and sports subscales, respectively.
The errors associated with a single measure were 4.6 and 3.8 points for the ADL and sports
subscales, respectively. Item response theory found that all items contributed to their test information
curves and were potentially responsive. The correlations between the HOS and Short Form 36
measures of physical function were significantly different from their correlations with measures of mental
functioning (P < .005). Conclusions: The results of this study provide evidence of validity to support
the use of the HOS ADL and sports subscales for individuals with labral tears. This includes
individuals who underwent arthroscopic surgery, as well as those who did not. Specifically, the
results of this study found that the HOS ADL and sports subscales were unidimensional, had adequate
internal consistency, were potentially responsive across the spectrum of ability, and contributed
information across the spectrum of ability. In addition, scores obtained by the HOS related to
measures of function and did not relate to measures of mental health. Level of Evidence: Level III,
development of diagnostic criteria with nonconsecutive patients. Key Words: Hip Outcome Score;
Labral tear; Hip arthroscopy; Outcome instrument; Validity.
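The internal consistency statistic named in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names `cronbach_alpha` and `standard_error_of_measurement` and the small score matrix are hypothetical, and the formula SEM = SD × sqrt(1 − α) is the standard one, which is only assumed to be related to the "errors associated with a single measure" reported above.

```python
import math
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha for rows = subjects, columns = items (sample variances)."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def standard_error_of_measurement(scores, alpha):
    """Standard formula: SD of the total scores times sqrt(1 - alpha)."""
    sd_total = math.sqrt(variance([sum(row) for row in scores]))
    return sd_total * math.sqrt(1 - alpha)

# Hypothetical responses: 5 subjects x 3 items, each item scored 0 to 4.
scores = [[4, 3, 4], [2, 2, 1], [3, 4, 3], [1, 1, 2], [4, 4, 3]]
alpha = cronbach_alpha(scores)
sem = standard_error_of_measurement(scores, alpha)
print(round(alpha, 3), round(sem, 3))
```

Values of α close to 1 indicate that the items vary together, which is why a high α shrinks the standard error attached to a single total score.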
Musculoskeletal hip disorders and hip arthroscopy are areas of growing interest within the
field of orthopaedics. As physicians and other health
care practitioners become more involved in these areas, research that defines the expected outcomes for
various treatments will be needed. This will include
continuing to define the outcomes of both arthroscopic
surgical treatment and nonsurgical treatment for individuals with acetabular labral tears. A number of
self-report evaluative instruments have been developed for individuals with hip pathology.1-8 All of
these instruments have deficiencies that may negatively impact their ability to assess the effect of treatment interventions for individuals with labral tears
who may be functioning throughout a wide range of
ability.
The usefulness of an instrument can be determined
based on concepts associated with contemporary validity theory. Important concepts to consider include
evidence for test content, internal structure, and rela-
Arthroscopy: The Journal of Arthroscopic and Related Surgery, Vol 22, No 12 (December), 2006: pp 1304-1311
R. L. MARTIN ET AL.
ware International, Chicago, IL). Eigenvalues and factor loading patterns were used to identify and extract
factors. Items with the lowest factor loading to the
principal component were sequentially deleted until
only 1 eigenvalue was produced that had a value
greater than 1.
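The deletion rule described above can be sketched as follows. This is an illustration under stated assumptions, not the software the authors used: it assumes the factors are principal components of the item correlation matrix, takes the absolute loading (eigenvector element times the square root of the eigenvalue) as the deletion criterion, and uses a hypothetical 5-item correlation matrix. The names `jacobi_eigen` and `reduce_to_one_factor` are invented for the example.

```python
import math

def jacobi_eigen(sym, sweeps=100, tol=1e-18):
    """Eigenvalues and eigenvectors (columns) of a symmetric matrix, cyclic Jacobi."""
    k = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(k)] for i in range(k)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off < tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):      # column update: M <- M J
                    for i in range(k):
                        mp, mq = m[i][p], m[i][q]
                        m[i][p], m[i][q] = c * mp - s * mq, s * mp + c * mq
                for j in range(k):    # row update: A <- J^T A
                    ap, aq = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(k)], v

def reduce_to_one_factor(corr, names):
    """Delete the item with the lowest |loading| on the first component
    until no more than one eigenvalue exceeds 1."""
    keep = list(range(len(names)))
    while True:
        sub = [[corr[i][j] for j in keep] for i in keep]
        values, vectors = jacobi_eigen(sub)
        top = max(range(len(keep)), key=lambda i: values[i])
        loadings = [abs(vectors[i][top]) * math.sqrt(values[top]) for i in range(len(keep))]
        if sum(val > 1.0 for val in values) <= 1 or len(keep) <= 2:
            kept = [names[i] for i in keep]
            return kept, dict(zip(kept, loadings))
        del keep[min(range(len(keep)), key=lambda i: loadings[i])]

# Hypothetical correlations: items a-c form one cluster, d-e a second cluster.
names = ["a", "b", "c", "d", "e"]
corr = [[1.0 if i == j else (0.8 if (i < 3) == (j < 3) else 0.1) for j in range(5)]
        for i in range(5)]
kept, loadings = reduce_to_one_factor(corr, names)
print(kept, {name: round(val, 2) for name, val in loadings.items()})
```

In this toy matrix the full item set yields two eigenvalues above 1; removing one item from the weaker cluster leaves a single eigenvalue above 1, at which point deletion stops.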
Item Characteristic Curves: MULTILOG (Scientific Software International) was used to perform IRT
and calibrate the items by use of the 2-parameter
graded response model. The results of IRT allow for
item characteristic curves to be constructed in an
Excel spreadsheet (Microsoft, Redmond, WA) for
each item by use of difficulty and discrimination parameters generated by MULTILOG. An appropriate
item characteristic curve with 5 potential responses,
with each response describing a level of proficiency
with the activity in question, should have 5 distinct
and separate curves. Each curve should have 1 peak,
and together, the 5 curves should span the spectrum of
ability (theta).13 Items that did not have appropriate
item characteristic curves were considered for
elimination.
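The criteria for an appropriate item characteristic curve can be checked numerically. The sketch below assumes the usual (Samejima) form of the 2-parameter graded response model, with one discrimination parameter and four ordered thresholds for a 5-category item. The parameter values are hypothetical and are not MULTILOG output, and the helper names are invented for the example.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Probability of each response category at ability theta.
    thresholds must be in ascending order (4 thresholds give 5 categories)."""
    cum = [1.0] + [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def is_single_peaked(curve, eps=1e-12):
    """True if the curve never rises again after it has started to fall."""
    falling = False
    for prev, cur in zip(curve, curve[1:]):
        if cur < prev - eps:
            falling = True
        elif cur > prev + eps and falling:
            return False
    return True

# Hypothetical item: discrimination 2.0, thresholds spread across theta.
a, thresholds = 2.0, [-2.0, -1.0, 0.0, 1.0]
thetas = [-3.2 + 0.1 * i for i in range(71)]          # -3.2 to 3.8
rows = [grm_category_probs(t, a, thresholds) for t in thetas]
curves = list(zip(*rows))                              # one curve per response

single_peaked = [is_single_peaked(curve) for curve in curves]
# Categories that are the most likely response somewhere on the grid.
modal = {max(range(5), key=lambda k: row[k]) for row in rows}
print(single_peaked, sorted(modal))
```

An item whose thresholds are bunched together, or whose discrimination is low, would leave one or more responses never being the most likely category, which is one way an item could fail the "5 distinct and separate curves" criterion.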
TABLE 1.
Unable to Do | Extreme Difficulty | Moderate Difficulty | Slight Difficulty | No Difficulty | Nonapplicable | Missing Response
7 (1.4%)
29 (5.7%)
129 (25.4%)
132 (26%)
206 (40%)
2 (0.4%)
3 (0.4%)
0
5 (1%)
20 (3.9%)
13 (2.6%)
1 (0.2%)
46 (9.1%)
60 (11.8%)
69 (13.6%)
48 (9.5%)
38 (7.5%)
118 (23.3%)
104 (20.5%)
161 (31.8%)
147 (29%)
100 (19.7%)
189 (37.3%)
156 (30.8%)
133 (26.2%)
140 (27.6%)
158 (31.2%)
153 (30.2%)
180 (35.5%)
112 (22.1%)
142 (28%)
209 (41.2%)
0
0
10 (2%)
11 (2.2%)
1 (0.2%)
1 (0.2%)
2 (0.4%)
2 (0.4%)
6 (1.2%)
0
1 (0.2%)
1 (0.2%)
67 (13.2%)
17 (3.4%)
9 (1.8%)
105 (20.7%)
91 (17.9%)
66 (13%)
123 (24.3%)
164 (32.3%)
134 (26.4%)
114 (22.5%)
230 (45.4%)
292 (57.6%)
72 (14.2%)
1 (0.2%)
3 (0.6%)
16 (3.2%)
3 (0.6%)
2 (0.4%)
10 (2%)
10 (2%)
2 (0.4%)
8 (1.6%)
21 (4.1%)
28 (5.5%)
25 (4.9%)
83 (16.4%)
80 (15.8%)
100 (19.7%)
136 (26.8%)
148 (29.2%)
170 (33.5%)
184 (36.3%)
248 (48.9%)
199 (39.3%)
64 (12.6%)
0
0
9 (1.8%)
1 (0.2%)
5 (1%)
12 (2.4%)
41 (8.1%)
107 (21.1%)
154 (30.4%)
190 (37.5%)
3 (0.6%)
33 (6.5%)
82 (16.2%)
137 (27%)
113 (22.3%)
136 (26.8%)
4 (0.8%)
2 (0.4%)
49 (9.7%)
7 (1.4%)
107 (21.1%)
47 (9.3%)
139 (27.4%)
87 (17.2%)
136 (26.8%)
176 (34.7%)
65 (12.8%)
185 (36.5%)
5 (1%)
1 (0.2%)
6 (1.2%)
4 (0.8%)
11 (2.2%)
33 (6.5%)
110 (21.7%)
181 (35.7%)
167 (32.9%)
1 (0.2%)
4 (0.8%)
64 (12.6%)
98 (19.3%)
114 (22.5%)
98 (19.3%)
143 (28.2%)
139 (27.4%)
114 (22.5%)
106 (20.9%)
52 (10.3%)
39 (7.7%)
18 (3.6%)
18 (3.6%)
4 (0.8%)
9 (1.8%)
TABLE 2.
Item | Unable to Do | Extreme Difficulty | Moderate Difficulty | Slight Difficulty | No Difficulty | Nonapplicable | Missing Response
Running 1 mile | 286 (56.4%) | 62 (12.2%) | 60 (11.8%) | 41 (8.1%) | 35 (6.9%) | 21 (4.1%) | 2 (0.4%)
Jumping | 171 (33.7%) | 96 (18.9%) | 84 (16.6%) | 81 (16%) | 64 (12.6%) | 8 (1.6%) | 3 (0.6%)
Swinging objects like a golf club | 82 (16.2%) | 50 (9.9%) | 66 (13%) | 99 (19.5%) | 111 (21.9%) | 92 (18.1%) | 7 (1.4%)
Landing | 110 (21.7%) | 89 (17.6%) | 98 (19.3%) | 96 (18.9%) | 77 (15.2%) | 22 (4.3%) | 15 (3%)
Starting and stopping quickly | 79 (15.6%) | 118 (23.3%) | 131 (25.8%) | 107 (21.1%) | 67 (13.2%) | 3 (0.6%) | 15 (3%)
Cutting/lateral movements | 105 (20.7%) | 133 (26.2%) | 113 (22.3%) | 104 (20.5%) | 33 (6.5%) | 9 (1.8%) | 10 (2%)
Low-impact activities like fast walking | 74 (14.6%) | 76 (15%) | 115 (22.7%) | 132 (26%) | 104 (20.5%) | 4 (0.8%) | 2 (0.4%)
Ability to perform activity with your normal technique | 116 (22.9%) | 95 (18.7%) | 116 (22.9%) | 104 (20.5%) | 61 (12%) | 8 (1.6%) | 7 (1.4%)
Ability to participate in your desired sport as long as you would like | 260 (51.3%) | 96 (18.9%) | 64 (12.6%) | 42 (8.3%) | 29 (5.7%) | 14 (2.8%) | 2 (0.4%)
Factor Loading, ADL items (Item Nos. 1 to 19)
19 Item: .82, .78, .82, .76, .63, .84, .84, .85, .86, .84, .75, .81, .55, .76, .86, .85, .85, .86, .86
17 Item: .83, .75, .80, .75, .87, .84, .75, .76, .89, .85, .74, .74, .89, .81, .82, .77, .77

TABLE 4.
Item No. | Item Content | Factor Loading
1 | Running 1 mile | .90
2 | Jumping | .93
3 | Swinging objects like a golf club | .85
4 | Landing | .94
5 | Starting and stopping quickly | .89
6 | Cutting/lateral movements | .88
7 | Low-impact activities like fast walking | .86
8 | Ability to perform activity with your normal technique | .83
9 | Ability to participate in your desired sport as long as you would like | .87
[Figure: item characteristic curves. Vertical axis: Probability of Response; horizontal axis: THETA (-3.2 to 3.8); response curves RESP0 to RESP4. Test information plot legend: ADL TIF and Sports TIF; vertical axis: Information (0 to 35).]
curbs) were considered for elimination. The test information function was recalculated separately with
each of these items deleted. In each case a decrease in
information was noted throughout the range of ability.
Therefore these 4 items were retained to maximize the
instrument's precision of measurement across the
range of ability.
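The effect of deleting an item on the test information function can be sketched as follows. The example assumes the standard information formula for the graded response model, in which each item's information is the sum over categories of the squared slope of the category curve divided by the category probability, and the test information is the sum over items. The three items and their parameters are hypothetical, and `item_information` and `total_information` are names invented for the example.

```python
import math

def item_information(theta, a, thresholds):
    """Graded response model item information at ability theta."""
    cum = [1.0] + [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        prob = cum[k] - cum[k + 1]
        # Slope of the category curve with respect to theta.
        slope = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        info += slope ** 2 / prob
    return info

def total_information(theta, items):
    """Test information: sum of the item information values."""
    return sum(item_information(theta, a, b) for a, b in items.values())

# Hypothetical (discrimination, thresholds) pairs; item_C is the weakest item.
items = {
    "item_A": (2.0, [-2.0, -1.0, 0.0, 1.0]),
    "item_B": (1.5, [-1.5, -0.5, 0.5, 1.5]),
    "item_C": (0.8, [-2.5, -1.5, -0.5, 0.5]),
}
reduced = {name: par for name, par in items.items() if name != "item_C"}
abilities = [-2.0 + 0.5 * i for i in range(9)]         # -2.0 to 2.0
full = [total_information(t, items) for t in abilities]
without_c = [total_information(t, reduced) for t in abilities]
for t, f, w in zip(abilities, full, without_c):
    print(f"{t:+.1f}  full={f:.2f}  without item_C={w:.2f}")
```

Because item information is never negative, removing any item can only lower the curve. The practical question is how much information is lost and where along the range of ability the loss occurs.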
The 19-item ADL subscale and 9-item sports subscale can be found in Appendix 1 (online only, available at www.arthroscopyjournal.org). The ADL and
sports subscales are scored separately. The item re
[Figure: item characteristic curves. Vertical axis: Probability of Response; horizontal axis: THETA (-3.2 to 3.8); response curves RESP0 to RESP4. Test information plot horizontal axis: Ability (-2.0 to 2.0).]
FIGURE 3. Test information function (TIF) for ADL and sports
subscales showing their potential to provide information across
range of ability. The ADL subscale offers more information regarding function at the lower end of ability, whereas the sports
subscale offers more information at the higher range of ability.
REFERENCES
1. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt
LW. Validation study of WOMAC: A health status instrument
for measuring clinically important patient relevant outcomes to