Frey TestTheoryClassicalTestTheory

Test eory, Classical Test eory
Felix Frey
LMU Munich
felix.frey@iw.lmu.de
To be published in:
J. P. Matthes, R. Potter, & C. S. Davis (Eds.). (in prep.). International Encyclopedia of
Communication Research Methods. Wiley Blackwell.
Abstract
Test eory is concerned with methods and criteria for the construction, evaluation and compar-
ison of ‘tests’, i.e., procedures for measuring observables or constructs of interest. ese methods
and criteria rest on assumptions about the composition of measurements as well as the properties
of these components and the relationship amongst them. e most prominent test-theoretical
frameworks are Classical Test eory (CTT) and Item Response eory (IRT) including the
Rasch model. Within the CTT framework influential quality criteria like objectivity, reliability,
and validity as well as methods for assessing them have been developed. Considering the small
number of simple assumptions made by CTT, it has proven quite fruitful, even in comparison to
“modern” test theories (e.g., Item Response eory) which overcome certain limitations of CTT
at the expense of stronger assumptions and greater mathematical complexity.
Test eory is concerned with methods and criteria for the construction, evaluation, and compar-
ison of tests. Tests are procedures for measuring observables or constructs, typically encompassing
several items then combined into a total test score. ese methods and criteria rest on assumptions
about the composition of measurements as well as the properties of these components and the
relationships amongst them. Typically discussed with regard to psychological constructs like abil-
ities, attitudes or knowledge, test theory in principle applies to the measurement of any type of
variable.
The most prominent test-theoretical frameworks are Classical Test Theory (CTT) and Item
Response Theory (IRT), including the Rasch model. The tenets of CTT were elaborated in the first
half of the 20th century. It is a “theory” only in that it forms a set of interconnected propositions,
not in the sense of being empirically testable.
CTT assumes that measurements usually do not exclusively reflect the true score of the
construct to be measured, but also unsystematic error resulting from various situational sources.
Lord and Novick (1968) define the true score (τ) for a particular subject (P) as the expected value
of the observed score in an infinite number of hypothetical (independent) replications of the
measurement for that subject:
$$\tau = \mathcal{E}(X_P) \qquad (1)$$
Switching to the level of a random sample of subjects from a population, the basic model of CTT
understands observed scores as realizations of an observed-score random variable X. X is
composed of a true-score random variable T and an error random variable E:
$$X = T + E \qquad (2)$$
Among the three variables, only X is directly observable. From (1) and (2) it follows that the
expected measurement error is zero:
$$\mathcal{E}(E) = 0$$
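The decomposition can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_ctt`, the normal distributions, and all parameter values are hypothetical choices made for illustration, and the errors are generated with mean zero to match the model rather than to prove it. Writing T for the true-score variable, E for the error variable, and X for the observed-score variable, the code builds X = T + E and checks that the average error is close to zero.

```python
import random
import statistics


def simulate_ctt(n_subjects=20000, true_mean=50.0, true_sd=10.0,
                 error_sd=5.0, seed=0):
    """Draw true scores T and errors E, and return (T, E, X) with X = T + E.

    All distributions and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    # True scores vary across subjects in the population.
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_subjects)]
    # Unsystematic errors are drawn independently of T, with mean zero.
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_subjects)]
    # Only the observed scores would be available in a real study.
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed


true_scores, errors, observed = simulate_ctt()
print("mean error:", round(statistics.fmean(errors), 3))
print("mean observed - mean true:",
      round(statistics.fmean(observed) - statistics.fmean(true_scores), 3))
```

In a real study T and E are never seen separately; the simulation has access to them only because it generates them.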
CTT further assumes that true scores and error scores are uncorrelated in the population, and that
error scores on one test are uncorrelated with both the error scores and the true scores of a second test.
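One consequence of these assumptions can be worked out directly. The notation here is introduced for illustration only: T is the true-score variable, E the error variable, X the observed-score variable, $\sigma^2$ a population variance, and $\rho_{XX'}$ the reliability coefficient. Because true scores and errors are uncorrelated, their covariance vanishes and the observed-score variance splits into two parts:

$$
\begin{aligned}
\sigma^2_X &= \sigma^2_{T+E} = \sigma^2_T + \sigma^2_E + 2\,\mathrm{Cov}(T, E) = \sigma^2_T + \sigma^2_E \\
\rho_{XX'} &= \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}
\end{aligned}
$$

The second line is the usual CTT definition of reliability as the proportion of observed-score variance that is due to true scores. For example, with $\sigma^2_T = 100$ and $\sigma^2_E = 25$, the observed-score variance is 125 and the reliability is 100/125 = 0.8.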
Within this framework, performance criteria like objectivity, reliability, and validity, as well
as influential methods for estimating reliability and the standard error of measurement, for
equating scores from different test forms, and for scale development (item difficulty,
discrimination, and validity) have been developed. Considering the small number of simple
assumptions, CTT has proven to be quite fruitful. Modern test models, like IRT, overcome several
limitations of CTT, e.g., the sample dependency of test and item statistics (like reliability
coefficients) and CTT’s focus on the test level rather than on individual test items. However, they
do so at the expense of stronger assumptions, greater mathematical complexity, and more time and
effort needed for scale development (Hambleton & Jones, 1993).
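Several of the statistics named above can be computed in a few lines. The following is a minimal sketch, not a prescribed procedure: the response matrix is made up, the names `RESPONSES`, `pearson`, and `item_analysis` are hypothetical, and Cronbach's alpha is used as one common reliability estimate that this entry does not single out. Item difficulty is taken as the proportion of correct responses, and discrimination as the correlation between an item and the sum of the remaining items.

```python
import math
import statistics

# Hypothetical data: rows are respondents, columns are dichotomous items
# (1 = correct, 0 = incorrect).
RESPONSES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def item_analysis(responses):
    """Return classical item and test statistics for a response matrix."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    items = [[row[j] for row in responses] for j in range(n_items)]

    # Difficulty: proportion of respondents answering the item correctly.
    difficulty = [statistics.fmean(col) for col in items]
    # Discrimination: item score vs. total score without that item.
    discrimination = [
        pearson(col, [t - x for t, x in zip(totals, col)]) for col in items
    ]
    # Cronbach's alpha from item variances and total-score variance.
    item_var_sum = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance(totals)
    alpha = n_items / (n_items - 1) * (1 - item_var_sum / total_var)
    # Standard error of measurement: SD of total scores * sqrt(1 - reliability).
    sem = math.sqrt(total_var) * math.sqrt(1 - alpha)
    return {"difficulty": difficulty, "discrimination": discrimination,
            "alpha": alpha, "sem": sem}


result = item_analysis(RESPONSES)
for name, value in result.items():
    print(name, value)
```

All of these values are computed from one particular sample, which is the sample dependency noted above: a different group of respondents would generally yield different difficulties, discriminations, and reliability.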
References
Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response
theory and their applications to test development. Educational Measurement: Issues and Practice,
12(3), 38–47. doi:10.1111/j.1745-3992.1993.tb00543.x
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA:
Addison-Wesley.
Further Readings
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York, NY:
Holt, Rinehart and Winston.
