Achievement Tests
Gregory J. Cizek
University of North Carolina, Chapel Hill, North Carolina, USA
high-stakes testing for college admission, licensure to practice a profession, or certification. The design of achievement tests varies depending on whether the inference intended to be drawn regarding examinees' performance is the absolute or relative level of mastery of specific knowledge and skills.

2. DEFINITIONS AND EXAMPLES

form of classroom achievement measures, but would also include standardized in-training examinations and board examinations for persons pursuing professional careers. Achievement testing has a long history in diverse occupational fields. Achievement tests are routinely administered to ascertain levels of knowledge or skills when screening or selecting applicants for positions in business and industry. These tests have traditionally been administered in paper-and-pencil format, although technology has enabled administration via computer or over the Internet to be secure, fast, and accessible. For example, one vendor of computerized achievement tests offers computerized "work sample" achievement tests to assist human resources personnel in selecting applicants for positions in legal offices, food service, information technology, accounting, medical offices, and others. Many state, federal, and private organizations also provide achievement tests for a variety of fields in which licensure or certification is required.

3. TYPES OF ACHIEVEMENT TESTS

In the previous section, it was noted that achievement tests could be categorized according to administration (group or individual) and scale (informal classroom tests or more formal commercially available tests). Another, more important, distinction focuses on the intended purpose, use, or inference that is to be made from the observed test score.

Less formal classroom achievement tests are usually developed by a teacher to align with an instructional unit, or they may be pre-prepared by publishers of classroom textbooks or related materials. The primary purposes of such tests are for educators' use in refining instruction and assigning grades, as well as for both teacher and pupil use in understanding and responding to individual students' strengths and weaknesses.

More formal standardized achievement tests can also be categorized according to the inferences they yield. Three such types of tests are described in this section: criterion-referenced tests (CRTs), standards-referenced tests (SRTs), and norm-referenced tests (NRTs).

CRTs are designed to measure absolute achievement of fixed objectives comprising a domain of interest. The content of CRTs is narrow, highly specific, and tightly linked to the specific objectives. Importantly, a criterion for judging success on a CRT is specified a priori, and performance is usually reported in terms of pass/fail, number of objectives mastered, or similar terms. Thus, an examinee's performance or score on a CRT is interpreted with reference to the criterion. The written driver's license test is a familiar example of a CRT.

SRTs are similar to CRTs in that they are designed to measure an examinee's absolute level of achievement vis-à-vis fixed outcomes. These outcomes are narrowly defined and are referred to as content standards. Unlike CRTs, however, interpretation of examinees' performance is referenced not to a single criterion but rather to descriptions of multiple levels of achievement, called performance standards, that illustrate what performance at the various levels means. Typical reporting methods for SRTs would consist of performance standard categories such as basic, proficient, and advanced or beginner, novice, intermediate, and expert levels. Familiar examples of SRTs include state-mandated testing for K–12 students in English language arts, mathematics, and so on, to the extent that the tests are aligned with the state's content standards in those subjects. At the national level, the National Assessment of Educational Progress (NAEP) is administered at regular intervals to samples of students across the United States.

NRTs are designed to measure achievement in a relative sense. Although NRTs are also constructed based on a fixed set of objectives, the domain covered by an NRT is usually broader than that covered by a CRT. The important distinction of NRTs is that examinee performance is reported with respect to the performance of one or more comparison groups of other examinees. These comparison groups are called norm groups. Tables showing the correspondence between a student's performance and the norm group's performance are called norms. Thus, an examinee's performance or score on an NRT is interpreted with reference to the norms. Typical reporting methods for NRTs include z scores, percentile ranks, normal curve equivalent scores, grade- or age-equivalent scores, stanines, and other derived scale scores. Familiar examples of NRTs include the Iowa Tests of Basic Skills (ITBS), the Scholastic Assessment Test (SAT), and the Graduate Record Examinations (GRE).

Many publishers of large-scale achievement tests for school students also provide companion ability tests to be administered in conjunction with the achievement batteries. The tests are administered in tandem to derive ability/achievement comparisons that describe the extent to which a student is "underachieving" or "overachieving" in school given his or her measured potential. Examples of these test pairings include the Otis–Lennon School Abilities Test (administered with the Stanford Achievement Test) and the Cognitive Abilities Test (administered with the ITBS).
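The three score interpretations described above can be illustrated with a short sketch. All numbers here (the criterion, the cut scores, and the norm-group scores) are invented for the example, and the percentile rank is computed with a normal approximation, whereas operational NRT norms are empirical tables.

```python
from statistics import NormalDist, mean, stdev

raw_score = 31            # an examinee's raw score on a hypothetical 40-item test
criterion = 28            # CRT: a priori criterion for "pass" (invented)
cut_scores = {"basic": 20, "proficient": 27, "advanced": 34}  # SRT cuts (invented)
norm_group = [22, 25, 26, 28, 29, 30, 31, 33, 35, 36]         # NRT norm group (invented)

# CRT: absolute interpretation against a single criterion
crt_result = "pass" if raw_score >= criterion else "fail"

# SRT: interpretation against multiple performance standards;
# report the highest level whose cut score the examinee reached
srt_level = max((lvl for lvl, cut in cut_scores.items() if raw_score >= cut),
                key=lambda lvl: cut_scores[lvl], default="below basic")

# NRT: relative interpretation against the norm group
z = (raw_score - mean(norm_group)) / stdev(norm_group)
percentile_rank = round(NormalDist().cdf(z) * 100)  # normal approximation

print(crt_result, srt_level, round(z, 2), percentile_rank)
```

The same raw score thus yields three different statements: a pass/fail decision, a performance-standard category, and a standing relative to the norm group.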
4. ACHIEVEMENT TEST CONSTRUCTION

Rigorous achievement test development consists of numerous common steps. Achievement test construction differs slightly based on whether the focus of the assessment is classroom use or larger scale. Table I provides a sequence listing 18 steps that would be common to most achievement test development.

In both large and smaller contexts, the test maker would begin with specification of a clear purpose for the test or battery and a careful delineation of the domain to be sampled. Following this, the specific standards or objectives to be tested are developed. If it is a classroom achievement test, the objectives may be derived from a textbook, an instructional unit, a school district curriculum guide, content standards, or another source. Larger scale achievement tests (e.g., state mandated, standards referenced) would begin the test development process with reference to adopted state content standards. Standardized norm-referenced instruments would ordinarily be based on large-scale curriculum reviews, on analysis of content standards adopted in various states, or on standards promulgated by content area professional associations. Licensure, certification, or other credentialing tests would seek a foundation in a job analysis or survey of practitioners in the particular occupation. Regardless of the context, these first steps, which ground the test in content standards, curriculum, or professional practice, provide an important foundation for the validity of eventual test score interpretations.

Common next steps would include deciding on and developing appropriate items or tasks and related scoring guides to be field tested prior to actual administration of the test. At this stage, test developers pay particular attention to characteristics of items and tasks (e.g., clarity, discriminating power, amenability to dependable scoring) that will promote reliability of the eventual scores obtained by examinees on the operational test.

Following item/task tryout in field testing, a database of acceptable items or tasks, called an item bank or item pool, would be created. From this pool, operational test forms would be drawn to match previously decided test specifications. Additional steps would be required, depending on whether the test is to be administered via paper-and-pencil format or computer. Ancillary materials, such as administrator guides and examinee information materials, would also be produced and distributed in advance of test administration. Following test administration, an evaluation of testing procedures and test item/task performance would be conducted. If obtaining scores on the current test form that are comparable to scores from a previous test administration is required, then statistical procedures for equating the two test forms would take place. Once quality assurance procedures have ensured the accuracy of test results, scores for examinees would be reported to individual test takers and other groups as appropriate. Finally, documentation of the entire process would be gathered and refinements would be made prior to cycling back through the steps to develop subsequent test forms (Steps 5–18).

TABLE I
Common Steps in Achievement Test Development

1. Establish need, purpose
2. Delineate domain to be tested
3. Develop specific objectives, content standards
4. Decide on item and test specifications, formats, length, costs
5. Develop items, tasks, scoring guides
6. Conduct item/task review (editorial, appropriateness, alignment, sensitivity)
7. Pilot/field test items/tasks/scoring guides
8. Review item/task performance
9. Create item bank/pool
10. Assemble test form(s) according to specifications
11. Develop test administration guidelines, materials
12. Establish performance standards
5. EVALUATING ACHIEVEMENT TESTS

One source of information about achievement tests is the various test publishers. Many publishers have online information available to help users gain a better understanding of the purposes, audiences, and uses of their products. Often, online information is somewhat limited and rather nontechnical. However, in addition to providing online information, many publishers will provide samples of test materials and technical documentation on request to potential users. Frequently, publishers will provide one set of these packets of information, called specimen sets, at no charge for evaluation purposes.

When evaluating an achievement test, it is important to examine many aspects. A number of authorities have provided advice on how to conduct such a review. For example, one textbook for school counselors by Whiston contains a section titled "Selection of an Assessment Instrument" that consists of several pages of advice and a user-friendly checklist. The single authoritative source for such information would likely be the Standards for Educational and Psychological Testing, jointly sponsored by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education.

Finally, a particularly useful framework for evaluating achievement tests was developed by Rudner in 1994. Table II provides a modified version of key points identified by Rudner that should be addressed when choosing an achievement test.

It is likely that some potential users will not have the time or technical expertise necessary to fully evaluate an achievement test independently. A rich source of information exists for such users in the form of published reviews of tests. Two compilations of test reviews are noteworthy: Mental Measurements Yearbook (MMY) and Tests in Print. These references are available in nearly all academic libraries. In the case of MMY, the editors of these volumes routinely gather test materials and forward those materials to two independent reviewers. The reviewers provide brief (two- to four-page) summaries of the purpose, technical qualities, and administration notes for each test. Along with the summaries, each entry in MMY contains the publication date for the test, information on how to contact the publisher, and cost information for purchasing the test. In these volumes, users can compare several options for an intended use in a relatively short time.
TABLE II
Evaluation Criteria for Achievement Tests

1. Is the purpose of the test clearly stated? What achievement construct is claimed? Is the construct or intended content domain clearly delineated?
2. What are the intended uses of the test? What are the intended audiences for the test results?
3. For what ages, grade levels, or subject areas is the test intended?
4. Are the test materials (e.g., booklets, answer sheets) clear, engaging, and appropriate for the age/grade level of the examinees?
5. What are the costs of the test materials, scoring, training personnel, and time required to administer the test?
6. Are the procedures for administering the test clear? Is the information provided sufficiently detailed so as to provide consistent administrations across users and contexts?
7. What are the qualifications of those who participated in the development of the test? What qualifications are required for test administrators?
8. How were samples selected for developing, pilot testing, norming, estimating reliability, screening out potentially biased items, and gathering validity evidence for the test? Were the samples relevant? Were they representative? Were they collected recently?
9. Does the test yield scores of acceptable reliability? Were appropriate methods used to compute reliability estimates? If decisions are to be made based in part on test performance, what is the evidence regarding decision consistency?
10. Is there adequate validity evidence to support the intended inferences, uses, interpretations, or meanings that will be made from test scores? Is there evidence to support the use of the test with various groups (including non-native speakers of English, special needs learners, students in need of testing accommodations, etc.)?
11. Is the scoring system likely to produce accurate scores? If hand scoring is involved, are scoring keys easy to use and is conversion of raw scores to derived scores facilitated by tables, look-up charts, and the like? If machine scoring is used, are answer documents easy for examinees to use? Are timelines and costs for scoring reasonable?
12. Do score reports provide a clear, detailed, and comprehensible summary of performance and diagnostic information? Are users appropriately cautioned about likely misinterpretations?
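The reliability question in Criterion 9 is commonly answered with an internal-consistency estimate; for dichotomously scored items, one standard choice is the Kuder-Richardson formula 20 (KR-20). A minimal sketch on invented data:

```python
from statistics import pvariance

# Hypothetical item-response matrix, invented for illustration:
# rows = examinees, columns = dichotomously scored items (1 = correct).
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
]

k = len(responses[0])                      # number of items
totals = [sum(row) for row in responses]   # examinees' total scores
var_total = pvariance(totals)              # variance of total scores

# Sum of item variances p*(1-p), where p = proportion correct on each item
pq_sum = 0.0
for i in range(k):
    p = sum(row[i] for row in responses) / len(responses)
    pq_sum += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
print(f"KR-20 = {kr20:.2f}")
```

Published technical manuals report such coefficients for much larger samples; a reviewer applying Criterion 9 would check that they were computed with an appropriate method and on a relevant sample.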
A fee-based search capability for locating test reviews is available at the MMY Web site (www.unl.edu).

See Also the Following Articles

Educational Achievement and Culture

Further Reading

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.
Brigance, A. H., & Glascoe, F. P. (1999). Brigance Comprehensive Inventory of Basic Skills (rev. ed.). North Billerica, MA: Curriculum Associates.
Cizek, G. J. (1997). Learning, achievement, and assessment: Constructs at a crossroads. In G. D. Phye (Ed.), Handbook of classroom assessment (pp. 1–32). San Diego: Academic Press.
Cizek, G. J. (2003). Detecting and preventing classroom cheating: Promoting integrity in schools. Thousand Oaks, CA: Corwin.
CTB/McGraw–Hill. (1997). TerraNova. Monterey, CA: Author.
Gronlund, N. E. (1993). How to make achievement tests and assessments. Boston: Allyn & Bacon.
Harcourt Educational Measurement. (2002). Stanford Achievement Test (10th ed.). San Antonio, TX: Author.
Hoover, H. D., Dunbar, S. B., & Frisbie, D. A. (2001). Iowa Tests of Basic Skills. Itasca, IL: Riverside.
Mardell-Czudnowski, C., & Goldenberg, D. S. (1998). Developmental indicators for the assessment of learning (3rd ed.). Circle Pines, MN: American Guidance Services.
Rudner, L. (1994, April). Questions to ask when evaluating tests (ERIC/AE Digest, EDO-TM-94-06). Washington, DC: ERIC Clearinghouse on Assessment and Evaluation.
Whiston, S. C. (2000). Principles and applications of assessment in counseling. Belmont, CA: Wadsworth.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock–Johnson III Tests of Achievement. Itasca, IL: Riverside.