Achievement Tests
Gregory J. Cizek
University of North Carolina, Chapel Hill, North Carolina, USA
high-stakes testing for college admission, licensure to practice a profession, or certification. The design of achievement tests varies depending on whether the inference intended to be drawn regarding examinees' performance is the absolute or relative level of mastery of specific knowledge and skills.

2. DEFINITIONS AND EXAMPLES

form of classroom achievement measures, but would also include standardized in-training examinations and board examinations for persons pursuing professional careers. Achievement testing has a long history in diverse occupational fields. Achievement tests are routinely administered to ascertain levels of knowledge or skills when screening or selecting applicants for positions in business and industry. These tests have traditionally been administered in paper-and-pencil format, although technology has enabled administration via computer or over the Internet to be secure, fast, and accessible. For example, one vendor of computerized achievement tests offers computerized "work sample" achievement tests to assist human resources personnel in selecting applicants for positions in legal offices, food service, information technology, accounting, medical offices, and others. Many state, federal, and private organizations also provide achievement tests for a variety of fields in which licensure or certification is required.

3. TYPES OF ACHIEVEMENT TESTS

In the previous section, it was noted that achievement tests could be categorized according to administration (group or individual) and scale (informal classroom tests or more formal commercially available tests). Another, more important, distinction focuses on the intended purpose, use, or inference that is to be made from the observed test score.

Less formal classroom achievement tests are usually developed by a teacher to align with an instructional unit, or they may be pre-prepared by publishers of classroom textbooks or related materials. The primary purposes of such tests are for educators' use in refining instruction and assigning grades, as well as for both teacher and pupil use in understanding and responding to individual students' strengths and weaknesses.

More formal standardized achievement tests can also be categorized according to the inferences they yield. Three such types of tests are described in this section: criterion-referenced tests (CRTs), standards-referenced tests (SRTs), and norm-referenced tests (NRTs).

CRTs are designed to measure absolute achievement of fixed objectives comprising a domain of interest. The content of CRTs is narrow, highly specific, and tightly linked to the specific objectives. Importantly, a criterion for judging success on a CRT is specified a priori, and performance is usually reported in terms of pass/fail, number of objectives mastered, or similar terms. Thus, an examinee's performance or score on a CRT is interpreted with reference to the criterion. The written driver's license test is a familiar example of a CRT.

SRTs are similar to CRTs in that they are designed to measure an examinee's absolute level of achievement vis-à-vis fixed outcomes. These outcomes are narrowly defined and are referred to as content standards. Unlike CRTs, however, interpretation of examinees' performance is referenced not to a single criterion but rather to descriptions of multiple levels of achievement, called performance standards, that illustrate what performance at the various levels means. Typical reporting methods for SRTs would consist of performance standard categories such as basic, proficient, and advanced or beginner, novice, intermediate, and expert levels. Familiar examples of SRTs include state-mandated testing for K–12 students in English language arts, mathematics, and so on, to the extent that the tests are aligned with the state's content standards in those subjects. At the national level, the National Assessment of Educational Progress (NAEP) is administered at regular intervals to samples of students across the United States.

NRTs are designed to measure achievement in a relative sense. Although NRTs are also constructed based on a fixed set of objectives, the domain covered by an NRT is usually broader than that covered by a CRT. The important distinction of NRTs is that examinee performance is reported with respect to the performance of one or more comparison groups of other examinees. These comparison groups are called norm groups. Tables showing the correspondence between a student's performance and the norm group's performance are called norms. Thus, an examinee's performance or score on an NRT is interpreted with reference to the norms. Typical reporting methods for NRTs include z scores, percentile ranks, normal curve equivalent scores, grade- or age-equivalent scores, stanines, and other derived scale scores. Familiar examples of NRTs include the Iowa Tests of Basic Skills (ITBS), the Scholastic Assessment Test (SAT), and the Graduate Record Examinations (GRE).

Many publishers of large-scale achievement tests for school students also provide companion ability tests to be administered in conjunction with the achievement batteries. The tests are administered in tandem to derive ability/achievement comparisons that describe the extent to which a student is "underachieving" or "overachieving" in school given his or her measured potential. Examples of these test pairings include the Otis–Lennon School Abilities Test (administered with the Stanford Achievement Test) and the Cognitive Abilities Test (administered with the ITBS).
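The three score interpretations described above can be illustrated with a short sketch. All numbers here (the criterion, the cut scores, and the norm-group scores) are invented for the example, and the percentile rank is computed with a normal approximation, whereas operational NRT norms are empirical tables.

```python
from statistics import NormalDist, mean, stdev

raw_score = 31            # an examinee's raw score on a hypothetical 40-item test
criterion = 28            # CRT: a priori criterion for "pass" (invented)
cut_scores = {"basic": 20, "proficient": 27, "advanced": 34}  # SRT cuts (invented)
norm_group = [22, 25, 26, 28, 29, 30, 31, 33, 35, 36]         # NRT norm group (invented)

# CRT: absolute interpretation against a single criterion
crt_result = "pass" if raw_score >= criterion else "fail"

# SRT: interpretation against multiple performance standards;
# report the highest level whose cut score the examinee reached
srt_level = max((lvl for lvl, cut in cut_scores.items() if raw_score >= cut),
                key=lambda lvl: cut_scores[lvl], default="below basic")

# NRT: relative interpretation against the norm group
z = (raw_score - mean(norm_group)) / stdev(norm_group)
percentile_rank = round(NormalDist().cdf(z) * 100)  # normal approximation

print(crt_result, srt_level, round(z, 2), percentile_rank)
```

The same raw score thus yields three different statements: a pass/fail decision, a performance-standard category, and a standing relative to the norm group.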
4. ACHIEVEMENT TEST CONSTRUCTION

Rigorous achievement test development consists of numerous common steps. Achievement test construction differs slightly based on whether the focus of the assessment is classroom use or larger scale. Table I provides a sequence listing 18 steps that would be common to most achievement test development.

In both large and smaller contexts, the test maker would begin with specification of a clear purpose for the test or battery and a careful delineation of the domain to be sampled. Following this, the specific standards or objectives to be tested are developed. If it is a classroom achievement test, the objectives may be derived from a textbook, an instructional unit, a school district curriculum guide, content standards, or another source. Larger scale achievement tests (e.g., state mandated, standards referenced) would begin the test development process with reference to adopted state content standards. Standardized norm-referenced instruments would ordinarily be based on large-scale curriculum reviews, on analysis of content standards adopted in various states, or on standards promulgated by content area professional associations. Licensure, certification, or other credentialing tests would seek a foundation in a job analysis or survey of practitioners in the particular occupation. Regardless of the context, these first steps, which ground the test in content standards, curriculum, or professional practice, provide an important foundation for the validity of eventual test score interpretations.

Common next steps would include deciding on and developing appropriate items or tasks and related scoring guides to be field tested prior to actual administration of the test. At this stage, test developers pay particular attention to characteristics of items and tasks (e.g., clarity, discriminating power, amenability to dependable scoring) that will promote reliability of the eventual scores obtained by examinees on the operational test.

Following item/task tryout in field testing, a database of acceptable items or tasks, called an item bank or item pool, would be created. From this pool, operational test forms would be drawn to match previously decided test specifications. Additional steps would be required, depending on whether the test is to be administered via paper-and-pencil format or computer. Ancillary materials, such as administrator guides and examinee information materials, would also be produced and distributed in advance of test administration. Following test administration, an evaluation of testing procedures and test item/task performance would be conducted. If obtaining scores on the current test form that are comparable to scores from a previous test administration is required, then statistical procedures for equating the two test forms would take place. Once quality assurance procedures have ensured the accuracy of test results, scores for examinees would be reported to individual test takers and other groups as appropriate. Finally, documentation of the entire process would be gathered and refinements would be made prior to cycling back through the steps to develop subsequent test forms (Steps 5–18).

TABLE I
Common Steps in Achievement Test Development

1. Establish need, purpose
2. Delineate domain to be tested
3. Develop specific objectives, content standards
4. Decide on item and test specifications, formats, length, costs
5. Develop items, tasks, scoring guides
6. Conduct item/task review (editorial, appropriateness, alignment, sensitivity)
7. Pilot/field test items/tasks/scoring guides
8. Review item/task performance
9. Create item bank/pool
10. Assemble test form(s) according to specifications
11. Develop test administration guidelines, materials
12. Establish performance standards
5. EVALUATING ACHIEVEMENT TESTS

One source of information about achievement tests is the various test publishers. Many publishers have online information available to help users gain a better understanding of the purposes, audiences, and uses of their products. Often, online information is somewhat limited and rather nontechnical. However, in addition to providing online information, many publishers will provide samples of test materials and technical documentation on request to potential users. Frequently, publishers will provide one set of these packets of information, called specimen sets, at no charge for evaluation purposes.

When evaluating an achievement test, it is important to examine many aspects. A number of authorities have provided advice on how to conduct such a review. For example, one textbook for school counselors by Whiston contains a section titled "Selection of an Assessment Instrument" that consists of several pages of advice and a user-friendly checklist. The single authoritative source for such information would likely be the Standards for Educational and Psychological Testing, jointly sponsored by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education.

Finally, a particularly useful framework for evaluating achievement tests was developed by Rudner in 1994. Table II provides a modified version of key points identified by Rudner that should be addressed when choosing an achievement test.

It is likely that some potential users will not have the time or technical expertise necessary to fully evaluate an achievement test independently. A rich source of information exists for such users in the form of published reviews of tests. Two compilations of test reviews are noteworthy: Mental Measurements Yearbook (MMY) and Tests in Print. These references are available in nearly all academic libraries. In the case of MMY, the editors of these volumes routinely gather test materials and forward those materials to two independent reviewers. The reviewers provide brief (two- to four-page) summaries of the purpose, technical qualities, and administration notes for each test. Along with the summaries, each entry in MMY contains the publication date for the test, information on how to contact the publisher, and cost information for purchasing the test. In these volumes, users can compare several options for an intended use in a relatively short time.
TABLE II
Evaluation Criteria for Achievement Tests

1. Is the purpose of the test clearly stated? What achievement construct is claimed? Is the construct or intended content domain clearly delineated?
2. What are the intended uses of the test? What are the intended audiences for the test results?
3. For what ages, grade levels, or subject areas is the test intended?
4. Are the test materials (e.g., booklets, answer sheets) clear, engaging, and appropriate for the age/grade level of the examinees?
5. What are the costs of the test materials, scoring, training personnel, and time required to administer the test?
6. Are the procedures for administering the test clear? Is the information provided sufficiently detailed so as to provide consistent administrations across users and contexts?
7. What are the qualifications of those who participated in the development of the test? What qualifications are required for test administrators?
8. How were samples selected for developing, pilot testing, norming, estimating reliability, screening out potentially biased items, and gathering validity evidence for the test? Were the samples relevant? Were they representative? Were they collected recently?
9. Does the test yield scores of acceptable reliability? Were appropriate methods used to compute reliability estimates? If decisions are to be made based in part on test performance, what is the evidence regarding decision consistency?
10. Is there adequate validity evidence to support the intended inferences, uses, interpretations, or meanings that will be made from test scores? Is there evidence to support the use of the test with various groups (including non-native speakers of English, special needs learners, students in need of testing accommodations, etc.)?
11. Is the scoring system likely to produce accurate scores? If hand scoring is involved, are scoring keys easy to use and is conversion of raw scores to derived scores facilitated by tables, look-up charts, and the like? If machine scoring is used, are answer documents easy for examinees to use? Are timelines and costs for scoring reasonable?
12. Do score reports provide a clear, detailed, and comprehensible summary of performance and diagnostic information? Are users appropriately cautioned about likely misinterpretations?
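The reliability question in Criterion 9 is commonly answered with an internal-consistency estimate; for dichotomously scored items, one standard choice is the Kuder-Richardson formula 20 (KR-20). A minimal sketch on invented data:

```python
from statistics import pvariance

# Hypothetical item-response matrix, invented for illustration:
# rows = examinees, columns = dichotomously scored items (1 = correct).
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
]

k = len(responses[0])                      # number of items
totals = [sum(row) for row in responses]   # examinees' total scores
var_total = pvariance(totals)              # variance of total scores

# Sum of item variances p*(1-p), where p = proportion correct on each item
pq_sum = 0.0
for i in range(k):
    p = sum(row[i] for row in responses) / len(responses)
    pq_sum += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
print(f"KR-20 = {kr20:.2f}")
```

Published technical manuals report such coefficients for much larger samples; a reviewer applying Criterion 9 would check that they were computed with an appropriate method and on a relevant sample.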
A fee-based search capability for locating test reviews is available at the MMY Web site (www.unl.edu).

See Also the Following Articles

Educational Achievement and Culture

Further Reading

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.
Brigance, A. H., & Glascoe, F. P. (1999). Brigance Comprehensive Inventory of Basic Skills (rev. ed.). North Billerica, MA: Curriculum Associates.
Cizek, G. J. (1997). Learning, achievement, and assessment: Constructs at a crossroads. In G. D. Phye (Ed.), Handbook of classroom assessment (pp. 1–32). San Diego: Academic Press.
Cizek, G. J. (2003). Detecting and preventing classroom cheating: Promoting integrity in schools. Thousand Oaks, CA: Corwin.
CTB/McGraw–Hill. (1997). TerraNova. Monterey, CA: Author.
Gronlund, N. E. (1993). How to make achievement tests and assessments. Boston: Allyn & Bacon.
Harcourt Educational Measurement. (2002). Stanford Achievement Test (10th ed.). San Antonio, TX: Author.
Hoover, H. D., Dunbar, S. B., & Frisbie, D. A. (2001). Iowa Tests of Basic Skills. Itasca, IL: Riverside.
Mardell-Czudnowski, C., & Goldenberg, D. S. (1998). Developmental indicators for the assessment of learning (3rd ed.). Circle Pines, MN: American Guidance Services.
Rudner, L. (1994, April). Questions to ask when evaluating tests (ERIC/AE Digest, EDO-TM-94-06). Washington, DC: ERIC Clearinghouse on Assessment and Evaluation.
Whiston, S. C. (2000). Principles and applications of assessment in counseling. Belmont, CA: Wadsworth.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock–Johnson III Tests of Achievement. Itasca, IL: Riverside.