
Test eory, Classical Test eory

Felix Frey
LMU Munich
felix.frey@iw.lmu.de

To be published in:
J. P. Matthes, R. Potter, & C. S. Davis (Eds.). (in prep.). International Encyclopedia of
Communication Research Methods. Wiley Blackwell.

Abstract

Test eory is concerned with methods and criteria for the construction, evaluation and compar-
ison of ‘tests’, i.e., procedures for measuring observables or constructs of interest. ese methods
and criteria rest on assumptions about the composition of measurements as well as the properties
of these components and the relationship amongst them. e most prominent test-theoretical
frameworks are Classical Test eory (CTT) and Item Response eory (IRT) including the
Rasch model. Within the CTT framework influential quality criteria like objectivity, reliability,
and validity as well as methods for assessing them have been developed. Considering the small
number of simple assumptions made by CTT, it has proven quite fruitful, even in comparison to
“modern” test theories (e.g., Item Response eory) which overcome certain limitations of CTT
at the expense of stronger assumptions and greater mathematical complexity.

Test eory is concerned with methods and criteria for the construction, evaluation, and compar-
ison of tests. Tests are procedures for measuring observables or constructs, typically encompassing
several items then combined into a total test score. ese methods and criteria rest on assumptions
about the composition of measurements as well as the properties of these components and the
relationships amongst them. Typically discussed with regard to psychological constructs like abil-
ities, attitudes or knowledge, test theory in principle applies to the measurement of any type of
variable.
The most prominent test-theoretical frameworks are Classical Test Theory (CTT) and Item
Response Theory (IRT) including the Rasch model. The tenets of CTT were elaborated in the first
half of the 20th century. It is a “theory” only in that it forms a set of interconnected propositions,
not in the sense of being empirically testable.
CTT assumes that measurements usually do not exclusively reflect the true score of the
construct to be measured, but also unsystematic error resulting from various situational sources.
Lord and Novick (1968) define the true score ($\tau$) for a particular subject (P) as the expected
value of the observed score in an infinite number of hypothetical (independent) replications of the
measurement for that subject:

$\tau = \mathcal{E}(X_P)$ (1)

Switching to the level of a random sample of subjects from a population, the basic model of CTT
understands observed scores as realizations of an observed-score random variable X. X is
composed of a true-score random variable T and an error random variable E:

$X = T + E$ (2)

Among the three variables, only X is directly observable. It follows from (1) and (2) that the
expected measurement error is zero:

$\mathcal{E}(E) = 0$
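
The step from (1) and (2) to this result can be spelled out as follows. This is a sketch in notation assumed here rather than taken from the source: $\tau$ is the true score of subject $P$, $\mathcal{E}$ denotes the expected value over hypothetical replications, and $E_P = X_P - \tau$ is that subject's error score.

```latex
\begin{align*}
\mathcal{E}(E_P) &= \mathcal{E}(X_P - \tau) \\
                 &= \mathcal{E}(X_P) - \tau && \text{($\tau$ is constant for subject $P$)} \\
                 &= \tau - \tau = 0 && \text{(by (1))}
\end{align*}
```

Because the expected error is zero for every subject, it is also zero on average across subjects sampled from the population.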

CTT further assumes that true scores and error scores are uncorrelated in the population, and that
error scores on one test are uncorrelated with the error scores and true scores of a second test.
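
The assumptions above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original entry: the function names, the normal distributions, and all parameter values (sample size, means, standard deviations, seed) are assumptions chosen for illustration. It draws true scores and errors independently, forms observed scores as their sum, and checks that the mean error is close to zero, that true scores and errors are nearly uncorrelated, and that the observed-score variance is then approximately the sum of true-score and error variance.

```python
import random
import statistics


def simulate_ctt(n_subjects=20000, true_mean=50.0, true_sd=10.0,
                 error_sd=5.0, seed=0):
    """Draw hypothetical true scores T and errors E, then form X = T + E."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_subjects)]
    # Errors are drawn independently of the true scores, with mean zero.
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_subjects)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (statistics.stdev(a) * statistics.stdev(b))


true_scores, errors, observed = simulate_ctt()

mean_error = statistics.fmean(errors)              # should be near 0
corr_true_error = pearson(true_scores, errors)     # should be near 0
var_observed = statistics.variance(observed)
var_sum = statistics.variance(true_scores) + statistics.variance(errors)

print(f"mean error:        {mean_error:.3f}")
print(f"corr(T, E):        {corr_true_error:.3f}")
print(f"var(X):            {var_observed:.2f}")
print(f"var(T) + var(E):   {var_sum:.2f}")
```

In real testing situations only the observed scores are available, so a simulation like this can only illustrate what the model implies, not verify it for actual data.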
Within this framework, performance criteria like objectivity, reliability, and validity, as well
as influential methods for estimating reliability and the standard error of measurement, for
equating scores from different test forms, and for scale development (item difficulty,
discrimination and validity) have been developed. Considering the small number of simple
assumptions, CTT has proven to be quite fruitful. Modern test models, like IRT, overcome several
limitations of CTT, e.g., the sample dependency of test and item statistics (like reliability
coefficients) and CTT’s focus on the test level rather than on individual test items. However, they
do so at the expense of stronger assumptions, greater mathematical complexity and more time and
effort needed for scale development (Hambleton & Jones, 1993).
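
As an illustration of the reliability and standard error of measurement mentioned above, the following sketch simulates two parallel test forms that share the same true scores but have independent errors. It relies on two standard CTT results not spelled out in this entry: the correlation between parallel forms estimates reliability (the ratio of true-score variance to observed-score variance), and the standard error of measurement equals the observed-score standard deviation times the square root of one minus the reliability. The function names and all parameter values are assumptions chosen for illustration.

```python
import math
import random
import statistics


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (statistics.stdev(a) * statistics.stdev(b))


def simulate_parallel_forms(n_subjects=20000, true_sd=10.0, error_sd=5.0,
                            seed=1):
    """Two hypothetical parallel forms: same true score, independent errors."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_subjects)]
    form_a = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    form_b = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return form_a, form_b


form_a, form_b = simulate_parallel_forms()

# Parallel-forms estimate of reliability.
reliability = pearson(form_a, form_b)

# Standard error of measurement derived from the reliability estimate.
sem = statistics.stdev(form_a) * math.sqrt(1.0 - reliability)

print(f"estimated reliability: {reliability:.3f}")
print(f"estimated SEM:         {sem:.2f}")
```

With the assumed standard deviations (10 for true scores, 5 for errors), the theoretical reliability is 100 / 125 = 0.8 and the theoretical standard error of measurement is 5, so the estimates should land close to these values. Rerunning the sketch with a different `true_sd` changes the reliability estimate even though the test's error variance is unchanged, which is one way to see the sample dependency of reliability coefficients noted above.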

References

Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response
theory and their applications to test development. Educational Measurement: Issues and Practice,
12(3), 38–47. doi:10.1111/j.1745-3992.1993.tb00543.x

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA:
Addison-Wesley.

Further Readings

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York, NY:
Holt, Rinehart and Winston.
