change interventions: opportunities for innovation
Andrea L. Dunn 1 *, Kenneth Resnicow 2 and Lisa M. Klesges 3 Theoretically based behavioral interventions have demonstrated effectiveness in stopping smoking, increasing physical activity and improving nutrition in order to prevent major chronic diseases such as cardiovascular disease, cancer and diabetes [17]. Despite the development of effective behavior change programs, the magnitude of change in these behaviors has been relatively modest [8, 9]. The rising rates of diabetes and obesity, for example, indicate there is still much to learn about the mech- anisms of behavior change and how to maintain newly acquired behavioral skills. One problem that has slowed behavioral intervention research is that the validity and reliability of our measures has sometimes lagged other innovations such as the development of effective tailored interventions [10, 11] or analytical techniques for assessing moder- ators and mediators of behavior change, hence making it difcult to understand the mechanisms of behavior change and making it difcult to improve our interventions [1215]. This special issue of Health Education Research provides an opportunity to consider an important advancement in our behavioral measurement meth- ods, specically, how we can apply item response models to improve our psychometric methods in health education and health behavior research and practice. Although itemresponse modeling (IRM) has been used in educational testing over the last three decades [16], it is an emerging method in health education and health behavior research [17]. As, for example, in health research, IRM is being widely adopted to improve and revise quality of life ques- tionnaires [1820]. There are many other innovative applications of IRMandas the nine papers inthis issue showus, these applications can aid our understanding of the psychometric properties of scales beyond making questionnaires shorter and beyond what most of us learned in our survey development courses based on classical test theory (CTT). In this brief afterword, we highlight current and future applica- tions of IRM and discuss how these methods might help us improve the efciency of our research. We believe these methods could lead to considerable improvements in intervention methods, understand- ing the mechanisms of behavior change and de- veloping and rening theoretical models to make themmore parsimonious as well as provide a founda- tion for considering the feasibility of computerized adaptivetestingfor measures of behavioral constructs. For those who are not familiar with IRM methods, the two papers by Wilson colleagues [21, 22] pro- vide an excellent tutorial by presenting an example using the Rasch one-parameter model to examine a measure of self-efcacy using both dichotomous and polytomous models and then comparing IRM and CTT methods. In the rst paper, readers learn the basics of IRM, including the relationship of individual items as well as the role of item difculty response patterns. In other words, item response models estimate a result of the underlying construct 1 Klein Buendel, Inc., 1667 Cole Boulevard, Suite 225, Golden, CO 80401, USA, 2 Health Behavior and Health Education School of Public Health, University of Michigan, Ann Arbor, MI, USA and 3 Department of Epidemiology and Cancer Control, St Jude Childrens Research Hospital, Memphis, TN, USA *Correspondence to: A. L. Dunn. E-mail: adunn@kleinbuendel.com 2006 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/ licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. doi:10.1093/her/cyl141 HEALTH EDUCATION RESEARCH Vol.21 (Supplement 1) 2006 Theory & Practice Pages i121i124 Advance Access publication 31 October 2006
a t
M i h a i
E m i n e s c u
C e n t r a l
U n i v e r s i t y
L i b r a r y
o f
I a s i
o n
M a r c h
1 0 ,
2 0 1 4 h t t p : / / h e r . o x f o r d j o u r n a l s . o r g / D o w n l o a d e d
f r o m
for the individual and a difculty estimate of the item on the underlying construct. The implications of these features of IRM are likely to be far reaching for health promotion research and practice. Re- searchers will be able to determine whether items contribute to an overall understanding of a construct and determine if there are gaps in item difculty which could enable a more informed decision about development of new items. We see the beginnings of work that builds toward precise measures that yield equal accuracy at all levels [23]. For example, in the quality of life measures, the goal is to have equally precise measures across the continuum of chronically ill to healthy individuals. A similar goal could be established for behavioral constructs like self-efcacy, barriers, benets, enjoyment and many others. The goal is to increase the reliability of test scores and to improve the amount of information obtained along the continuum of the construct measured by the scale and potentially improving the scales ability to detect behavior change. Another potential application of IRM is in the design of tailored interventions. Behavioral scien- tists have recognized the promise of tailored inter- ventions aimed at discriminating key motivational strategies for individuals but we have lacked methods to effectively differentiate individuals on constructs targeted by these strategies [11, 24, 25]. For example, it is unlikely that current measures of self-efcacy for physical activity clearly differenti- ate levels among individuals with either high or low self-efcacy. For example, in the case of two people who might score 90 out of 100 on a self-efcacy scale for physical activity, we might want to tailor the self-efcacy advice somewhat differently de- pending on how they achieved that level of self- efcacy, e.g did condence increase because of a change of circumstances or because they have been able to work through problems over a long period of time and have developed a lifelong habit. Tailoring interventions is not a new concept and it can be done with CTT, but one of the greatest advantages of IRM is that both the items and total scores are on the same metric and the properties of the items are sample invariant when the IRM ts the data. Having the items and total scores on the same metric means that we know what items a person with low self-efcacy is likely to endorse which strengthen our ability to develop tailored interven- tions. Therefore, by applying IRM methods, we can build databases to better understand the properties of individual items and the location of these items along a difculty or frequency of endorsement con- tinuum. Improved precision of items can lead to improved tailoring of interventions. Furthermore, this process can serve as the foundation to develop item banks that could be a shared resource among researchers and is instrumental for the development of computerized adaptive testing. The assembly of item banks is an important rst step in computerized adaptive testing which has become the norm in educational testing such as the national nursing licensing examinations and Gradu- ate Record Examination. To illustrate, items on each of these exams are ranked by difculty. Individuals who answer easy items correctly move to next levels of difculty and do not waste time on easier questions. The outcome is a reduction in testing time and increased precision (higher reliability and lower standard error of measurement). Computerized adaptive testing is under development in the quality of life assessment and could also be used for the assessment of many constructs used in health pro- motion research and practice. This long-term goal would require an iterative process of developing an initial item bank of the different theoretical con- structs to be measured. Gaps in the item bank would then be determined using IRM methods and new items developed where needed. IRM could then be used again to determine difculty ratings of banked items in order to be able to begin to develop more efcient construct measures [23]. For example, individuals who show high levels of efcacy may be administered only items for the more difcult end of the efcacy distribution. A key issue here is whether behavioral health represents skills or abili- ties that map onto concepts of difculty and whether the eld is able to move forwardin exploring innovations in measurement methodology. The papers by Heesch, Masse and Dunn [26] Watson, Baranowski and Thompson [27] provide examples of the IRM method applied to several Afterword i122
a t
M i h a i
E m i n e s c u
C e n t r a l
U n i v e r s i t y
L i b r a r y
o f
I a s i
o n
M a r c h
1 0 ,
2 0 1 4 h t t p : / / h e r . o x f o r d j o u r n a l s . o r g / D o w n l o a d e d
f r o m
different scales and demonstrate how we might use IRM to evaluate the psychometric properties of our measures. For example, in the paper by Heesch, three scales are evaluated, namely, physical activity enjoyment [28], barriers to physical activity and benets of physical activity [29]. The Rasch analyses of these scales demonstrated the unidi- mensionality of the enjoyment and benets scales but the barriers scale had two separate factors that seem to be related to time demands and barriers internal to the person. Further, the analyses in- dicated problematic items within the scales that seemed to tap other minor dimensions. These items could be dropped in a future scale revision. While this nding could have also been determined with CTT, IRM analyses also revealed that the perceived benets scale was limited in its ability to measure the underlying construct in a large sample of sedentary adults because items were not providing adequate content coverage for those having moder- ately high to high levels of perceived benets. The enjoyment scale also did not indicate good item content coverage for those on the low or high end of physical activity enjoyment. As all of these papers have shown, IRM provides detailed information about the performance of the items by identifying difculty gaps, which provides an important rst step toward improving testing in health promotion research and practice by increasing our understand- ing of the performance of particular items. In addition to identifying difculty gaps, IRM also helps to identify response gaps. For example, the Rasch model not only provides information about whether the distances between responses are equal or not (using the rating scale versus partial credit models) but also tells us about the functioning of the response format using itemcharacteristic curves. By having this information in addition to what CTT provides, we can greatly enhance the efciencies and understanding of our measurements. This lesson is made even clearer in the paper by Masse, Heesch and Eason [30] comparing conrmatory factor analyses (CFA) with CTT and IRM. Al- though all of these methods provide useful in- formation on psychometric properties of scales, IRM provides additional but complementary in- formation that cannot be provided by CFA or CTT. IRM extends our capabilities but does not replace the need to consider CFA or CTT approaches. IRM also provides a method for enhancing our understanding of how various scales purporting to measure comparable constructs might be similar or different. Currently, we have difculty comparing intervention outcomes (or surveys for that matter) that have administered different scales. Methods of scale linkage are highly desirable when trying to evaluate and contrast the results of several studies. The paper by Masse, Allen and Wilson [31] demon- strates how two eight-item self-regulatory scales can be linked and found that this is possible if there were at least four items in common between the measures. Behavioral science researchers often use different scales of the same construct and frequently items within the scale are similar. The IRM linking method is likely to be especially useful in meta- analytic reviews of studies to be able to evaluate the comparability of results as additional scale link- age papers become available. This is relevant to adaptive testing, as this linking could allow us to analyze variables measured with different versions of instruments. We are enthusiastic about the potential of IRM to advance the behavioral science measurement eld. The implications of IRM analyses hold exciting potential to improve our interventions, our under- standing of the mechanisms of behavior change and our evaluation of the theoretical basis for behavior change research. We are grateful to the leadership of Drs. Masse, Baranowski and Wilson for provid- ing a stimulating set of papers and we urge all who are interested in furthering the eld of behavioral science to consider IRM to advance our knowledge of the psychometric properties of our measure- ments. In the near future, there are many existing data sets that could be used to continue to build our knowledge base and we would urge the use of com- plementary approaches that employ CTT, CFA and IRM methods to gain a more complete picture of the performance of these scales in different popu- lations. In the future, it will be possible to develop item banks and to develop computerized adaptive tests for research. Such an effort will require Afterword i123
a t
M i h a i
E m i n e s c u
C e n t r a l
U n i v e r s i t y
L i b r a r y
o f
I a s i
o n
M a r c h
1 0 ,
2 0 1 4 h t t p : / / h e r . o x f o r d j o u r n a l s . o r g / D o w n l o a d e d
f r o m
leadership to build the necessary infrastructure for such an undertaking. Given that behavior underlies much of the chronic disease in the world, it is easy to understand the necessity of reaching these interme- diate and distant goals. References 1. Lancaster T, Stead LF. Individual behavioural counselling for smoking cessation. Cochrane Database Syst Rev 2005; CD001292. 2. Stead LF, Lancaster T. Group behaviour therapy pro- grammes for smoking cessation. Cochrane Database Syst Rev 2005; CD001007. 3. Gotay CC. Behavior and cancer prevention. J Clin Oncol 2005; 23: 30110. 4. Dzewaltowski DA, Estabrooks PA, Klesges LM et al. Behavior change intervention research in community set- tings: how generalizable are the results? Health Promot Int 2004; 19: 23545. 5. Katz DL, OConnell M, Yeh MC et al. Public health strategies for preventing and controlling overweight and obesity in school and worksite settings: a report on recom- mendations of the Task Force on Community Preventive Services. MMWR Recomm Rep 2005; 54: 112. 6. Hardeman W, Grifn S, Johnston M et al. Interventions to prevent weight gain: a systematic review of psychological models and behaviour change methods. Int J Obes Relat Metab Disord 2000; 24: 13143. 7. Avenell A, Broom J, Brown TJ et al. Systematic review of the long-term effects and economic consequences of treatments for obesity and implications for health improve- ment. Health Technol Assess 2004; 8: 21. 8. Rothman AJ. Is there nothing more practical than a good theory?: why innovations and advances in health behavior change will arise if interventions are used to test and rene theory. Int J Behav Nutr Phys Act 2004; 1: 11. 9. Jeffery RW. How can health behavior theory be made more useful for intervention research? Int J Behav Nutr Phys Act 2004; 1: 10. 10. Kroeze W, Werkman A, Brug J. A systematic review of randomized trials on the effectiveness of computer-tailored education on physical activity and dietary behaviors. Ann Behav Med 2006; 31: 20523. 11. Strecher V, Wang C, Derry H et al. Tailored interventions for multiple risk behaviors. Health Educ Res 2002; 17: 61926. 12. MacKinnon DP, Lockwood CM, Hoffman JM et al. A comparison of methods to test mediation and other intervening variable effects. Psychol Methods 2002; 7: 83104. 13. MacKinnon D, Dwyer JH. Estimating mediated effects in prevention studies. Eval Rev 1993; 17: 14458. 14. Bauman AE, Sallis JF, Dzewaltowski DA et al. Toward a better understanding of the inuences on physical activity: the role of determinants, correlates, causal variables, medi- ators, moderators, and confounders. Am J Prev Med 2002; 23: 514. 15. Kraemer HC, Wilson GT, Fairburn CG et al. Mediators and moderators of treatment effects in randomized clinical trials. Arch Gen Psychiatry 2002; 59: 87783. 16. Lord FM. Applications of Item Response Theory to Practical Testing Problems. Conference Proceedings. Mahwah, NJ: Lawrence Erlbaum Associates, Inc., 1980. 17. Masse LC, Dassa C, Gauvin L et al. Emerging measurement and statistical methods in physical activity research. Am J Prev Med 2002; 23: 4455. 18. Petersen MA, Groenvold M, Aaronson Net al. Itemresponse theory was used to shorten EORTC QLQ-C30 scales for use in palliative care. J Clin Epidemiol 2006; 59: 3644. 19. Petersen MA, Groenvold M, Aaronson N et al. Scoring based on item response theory did not alter the measurement ability of EORTC QLQ-C30 scales. J Clin Epidemiol 2005; 58: 9028. 20. Prieto L, Alonso J, Lamarca R. Classical test theory versus Rasch analysis for quality of life questionnaire reduction. Health Qual Life Outcomes 2003; 1: 27. 21. Wilson M, Allen DD, Li JC. Improving measurement in health education and health behavior research using item response modeling: comparison with the classical test theory approach. Health Educ Res 2006; 21(Suppl 1): i19i32. 22. Wilson M, Allen DD, Li JC. Improving measurement in health education and health behavior research using item response modeling: introducing item response modeling. Health Educ Res 2006; 21(Suppl 1): i4i18. 23. McHorney CA. Generic health measurement: past accom- plishments and a measurement paradigmfor the 21st century. Ann Intern Med 1997; 127: 74350. 24. Oenema A. Web-based tailored nutrition education: results of a randomized controlled trial. Health Educ Res 2002; 16: 64760. 25. Marcus BH, Bock BC, Pinto BM et al. Efcacy of an individualized, motivationally-tailored physical activity in- tervention. Ann Behav Med 1998; 20: 17480. 26. Heesch KC, Masse LC, Dunn AL. Using Rasch modeling to re-evaluate three scales related to physical activity: enjoyment, perceived benets and perceived barriers. Health Educ Res 2006; 21(Suppl 1): i58i72. 27. Watson K, Baranowski T, Thompson D. Item response modeling: an evaluation of the childrens fruit and vegetable self-efcacy questionnaire. Health Educ Res 2006; 21(Suppl 1): i47i57. 28. Kendzierski D, DeCarlo KJ. Physical activity enjoyment scale: two validation studies. J Sport Exerc Psychol 1999; 13: 5064. 29. Sallis JF, Hovell MF, Hofstetter CR et al. A multivariate study of determinants of vigorous exercise in a community sample. Prev Med 1989; 18: 2034. 30. Masse LC, Heesch KC, Eason KE et al. Evaluating the properties of a stage-specic self-efcacy scale for physical activity using classical test theory, conrmatory factor analysis, and item response modeling. Health Educ Res 2006; 21(Suppl 1): i33i46. 31. Masse LC, Allen D, Wilson M et al. Introducing equating methodologies to compare test scores from two different self-regulation scales. Health Educ Res 2006; 21(Suppl 1): i110i120. Afterword i124
a t
M i h a i
E m i n e s c u
C e n t r a l
U n i v e r s i t y
L i b r a r y
o f
I a s i
o n
M a r c h
1 0 ,
2 0 1 4 h t t p : / / h e r . o x f o r d j o u r n a l s . o r g / D o w n l o a d e d