Test Translation and Adaptation

The translation and adaptation of psychological tests used for practice and research requires careful
attention to issues of bias and equivalence. Thorough translation methods help reduce bias and enhance
equivalence of multilingual versions of a test. Of equal importance is statistical verification of
equivalence.

Equivalence addresses the question of comparability of observations and test scores across cultures.
Lonner described four types: functional, conceptual, metric, and linguistic equivalence. These range from
issues of comparability of behavior and concepts across cultures to issues of test item characteristics
(form, meaning, structure). Van de Vijver also discussed four types of equivalence. Construct
nonequivalence refers to constructs being so dissimilar across cultures they cannot be compared.
Construct equivalence occurs when a scale measures the same underlying construct and nomological
network across cultural groups, even though the construct may not be defined identically in each group.
With measurement unit
equivalence, the measurement scales for the instruments are equivalent (e.g., interval level), but their
origins are different across groups. Equivalence at this level may limit the comparability of two language
versions of an instrument: the versions may use the same measurement scale (e.g., both employ
interval-level Likert ratings), but because of differential familiarity with the response format, scores in
the two groups have different origins and are not directly comparable. The same holds if the two cultural
groups differ in response style (e.g., acquiescence). The highest level of equivalence is scalar equivalence,
or full score comparability.
Equivalent instruments at this level measure a concept with the same interval/ratio scale across cultures
and the origins of the scales are similar. At this level, bias has been ruled out and direct cross-cultural
comparisons of scores on an instrument can be made.

Bias negatively influences equivalence and refers to factors limiting comparability of test scores across
cultural groups. Construct bias occurs when a construct is not identical across cultural groups (e.g.,
incomplete construct coverage). Method bias may limit scalar equivalence and can stem from specific
characteristics of the instrument (e.g., differential response styles) or from its administration. Item bias
can result from poor translation and item formulation and because item content may not be equally
relevant across cultural groups.

Use of proper translation procedures can minimize bias and help establish equivalence. The International
Test Commission (ITC) published translation guidelines to encourage attention to the cross-cultural
validity of translated or adapted instruments. The context guidelines emphasize minimizing construct,
method, and item bias, and assessing construct similarity or equivalence across cultural groups before
embarking on instrument translation. The development guidelines refer to the translation process itself,
while the administration guidelines suggest ways to minimize method bias. The interpretation guidelines
recommend verification of equivalence between language versions of an instrument.

Two general approaches to translating or adapting tests have been identified. In the applied
approach, items are translated literally. Item content is not altered for the new cultural context, and the
linguistic and psychological appropriateness of the items is assumed. With the adaptation approach,
some items may be literally translated while others require modification of wording and content to
enhance their appropriateness to a new culture. This approach is chosen if there is concern with
construct bias. For both approaches, attention to equivalence and absence of bias is important. Building
on the ITC guidelines and the work of others, the following should be considered when translating or
adapting tests.

Bilingual persons fluent in the original and target languages should perform the translation. A single
person or committee can be used. Employing test translators who are familiar with the target culture,
the construct being assessed, and principles of assessment minimizes item biases that may result from
literal translations.

After the translation team has agreed on the best translation, the measure should be independently
back-translated into the original language by one or more additional persons. The back-translated version is then
compared to the original for linguistic equivalence. If the two versions are not identical, the researcher
works with the team to revise problematic items through a translation/back-translation process until
agreement is reached about equivalence. This process, however, does not guarantee a good scale
translation, as it often leads to literal translation at the cost of readability and naturalness of the
translated version. To minimize this problem, an expert in linguistics should be consulted. Test
instructions also need to go through the translation/back-translation process.

Once there is judgmental evidence of the equivalence of the two language versions, the translated scale
needs to be pretested. One approach is administering both versions to bilingual persons. Item responses
can then be compared using statistical methods. If item differences are discovered between versions, the
translations are reviewed and changed accordingly. Additionally, a small group of bilingual individuals can
be employed to rate each item from both versions on a predetermined scale in regard to the similarity of
meaning conveyed. Problematic items are then refined until satisfactory.

A small sample of participants speaking the target language can also provide verbal or written feedback
about each item. The researcher may, for instance, randomly select scale items and ask probing
questions (e.g., "What did you mean by your response?"). Responses that do not fit an item are
scrutinized and the translation is changed. This method provides insight into how well the meaning of the
original items has fared in the translation. Another method may involve respondents rating their
perceptions of item clarity and appropriateness on a predetermined scale. Unclear or poorly fitting items
are changed. Finally, a focus group approach can be used in which participants respond to the
translated version and discuss with the researcher(s) the meaning they associated with the items and
their perceptions of the clarity and cultural appropriateness of the items. Item wording can then be
changed based on participants’ responses.

Along with the judgmental evidence just mentioned, statistical methods must be used to verify
equivalence and lack of bias. Cronbach’s alpha, item-total correlations, and item means and
variances provide information about an instrument’s psychometric properties. Significantly different
reliability coefficients, for example, may indicate item or construct bias. Comparing these statistics across
different language versions of an instrument offers preliminary evidence about equivalence.
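
As an illustration, the Python sketch below computes two of these statistics, Cronbach’s alpha and
corrected item-total correlations, from a respondents-by-items score matrix; the formulas are standard,
but the function names and data layout are this example’s own, not an established API.

```python
# Preliminary psychometric checks, computed separately for each language
# version. `scores` is a respondents x items NumPy array of item responses.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def corrected_item_total(scores: np.ndarray) -> np.ndarray:
    # Correlation of each item with the sum of the remaining items.
    return np.array([
        np.corrcoef(scores[:, i], np.delete(scores, i, axis=1).sum(axis=1))[0, 1]
        for i in range(scores.shape[1])
    ])

# Markedly different alphas or item-total patterns between the original and
# translated versions would suggest item or construct bias.
```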

Construct, conceptual, and measurement equivalence can also be assessed at the scale level using
factor analyses, multidimensional scaling, and cluster analysis. Scalar or full score equivalence is more
difficult to establish than construct and measurement unit equivalence, and various biases (e.g., item
and method bias) may threaten this level of equivalence. Item bias can be detected by examining the
distributions of item scores across cultural groups. Item response theory, in which differential item
functioning (DIF) is examined, may be used for this purpose, as can analysis of variance (ANOVA), logistic
regression, multiple-group structural equation modeling (SEM) invariance analyses, and multiple-group
mean and covariance structures analysis. Last, factors contributing to method bias can be assessed and
statistically held constant when measuring constructs across cultures.
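
As one concrete possibility, the Python sketch below screens a single dichotomous item for DIF using the
logistic regression approach mentioned above; the data, group coding, and effect sizes are simulated
purely for illustration.

```python
# Logistic-regression DIF screening for one dichotomous item: regress the item
# response on total score, group, and their interaction. A significant group
# coefficient suggests uniform DIF; a significant total-by-group interaction
# suggests nonuniform DIF. All data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
group = rng.integers(0, 2, n)        # 0 = source-language group, 1 = target
total = rng.normal(0, 1, n)          # standardized total test score
# Simulated item with uniform DIF: harder for group 1 at equal total score.
p_correct = 1 / (1 + np.exp(-(0.8 * total - 0.7 * group)))
item = rng.binomial(1, p_correct)

X = sm.add_constant(np.column_stack([total, group, total * group]))
result = sm.Logit(item, X).fit(disp=0)
print(result.summary(xname=["const", "total", "group", "total_x_group"]))
```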

There are many examples of psychological measures translated from English to other languages. For
instance, the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), including the adolescent form
(MMPI-A), is available in nearly 20 languages. Multilingual versions of the Myers-Briggs Type Indicator,
Strong Interest Inventory, California Psychological Inventory, Sixteen Personality Factor Questionnaire (16
PF), Self-Directed Search, Millon Clinical Multiaxial Inventory-III, Revised NEO Personality Inventory (NEO
PI-R), Hare Psychopathy Checklist-Revised, Beck Depression Inventory, State-Trait Anxiety Inventory, and
Wechsler intelligence scales are also available.

Often, information about availability and psychometric properties of translations and adaptations of
tests can be accessed from the tests’ developers or distributors. It is unclear, however, how translations
of the measures mentioned above were performed and whether the tests were adapted for different
cultural and linguistic contexts.

If one uses a test that has been translated into the language of a specific target population, but that has
not been specifically developed and normed for that population, there is little to guarantee equivalence
across such factors as item difficulty and relevance, cultural bias, comprehension/decoding, and validity
within a differing cultural context. Beyond language, culture, and relevance, even the factor structure of
a specific test cannot be assumed to exist in an adapted translation. This has, for instance, been
observed in discussions regarding a 5- or 6-factor solution for the NEO PI-R in some specific cultural/
linguistic adaptations. Psychologists worldwide, however, are striving to develop culturally sensitive and
linguistically accurate translations of existing English version instruments. They are also developing
measures for particular national and ethnic populations.

References:

Ægisdóttir, S., Gerstein, L. H., & Çinarbaş, D. C. (2007, October 9). Methodological issues in cross-cultural
counseling research: Equivalence, bias, and translations. The Counseling Psychologist. Retrieved from
http://tcp.sagepub.com/content/early/2007/10/09/0011000007305384.abstract

Brislin, R. W. (1986). The wording and translation of research instruments. In W. J. Lonner & J. W. Berry
(Eds.), Field methods in cross-cultural research (pp. 137-164). Beverly Hills, CA: Sage.

Byrne, B. M. (2004). Testing for multigroup invariance using AMOS graphics: A road less traveled.
Structural Equation Modeling: A Multidisciplinary Journal, 11, 272-300.

Cleary, T. A., & Hilton, T. L. (1968). An investigation of item bias. Educational and Psychological
Measurement, 28, 61-75.

Hambleton, R. K. (2001). The next generation of the ITC test translation and adaptation guidelines.
European Journal of Psychological Assessment, 17, 164-172.

Hambleton, R. K., & de Jong, J. H. A. L. (2003). Advances in translating and adapting educational and
psychological tests. Language Testing, 20, 127-134.

Little, T. D. (2000). On the comparability of constructs in cross-cultural research: A critique of Cheung and
Rensvold. Journal of Cross-Cultural Psychology, 31, 213-219.

Lonner, W. J. (1985). Issues in testing and assessment in cross-cultural counseling. The Counseling
Psychologist, 13, 599-614.

Van de Vijver, F. J. R., & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European
Psychologist, 1, 89-99.

Van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research.
Thousand Oaks, CA: Sage.
