
The following is adapted from: Popham, W. J. (1975). Educational evaluation. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.
Criterion-Referenced Tests vs. Norm-Referenced Tests, by Dimension

Purpose
Criterion-Referenced Tests: To determine whether each student has achieved specific skills or concepts. To find out how much students know before instruction begins and after it has finished.
Norm-Referenced Tests: To rank each student with respect to the achievement of others in broad areas of knowledge. To discriminate between high and low achievers.

Content
Criterion-Referenced Tests: Measures specific skills which make up a designated curriculum. These skills are identified by teachers and curriculum experts. Each skill is expressed as an instructional objective.
Norm-Referenced Tests: Measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.

Item Characteristics
Criterion-Referenced Tests: Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing. The items which test any given skill are parallel in difficulty.
Norm-Referenced Tests: Each skill is usually tested by fewer than four items. Items vary in difficulty. Items are selected that discriminate between high and low achievers.

Score Interpretation
Criterion-Referenced Tests: Each individual is compared with a preset standard for acceptable achievement. The performance of other examinees is irrelevant. A student's score is usually expressed as a percentage. Student achievement is reported for individual skills.
Norm-Referenced Tests: Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade-equivalent score, or a stanine. Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.
The differences outlined above are discussed in many texts on testing. The teacher or administrator who
wishes to acquire a more technical knowledge of criterion-referenced tests or their norm-referenced
counterparts may find the text from which this material was adapted particularly helpful.
Additional resources:
• Bond, L. (1996). Norm- and criterion-referenced testing. Practical Assessment, Research & Evaluation, 5(2). Retrieved September 2002, from http://ericae.net/pare/getvn.asp?v=5&n=2.
• Linn, R. (2000). Assessments and accountability. ER Online, 29(2), 4-14. Retrieved September 2002, from http://www.aera.net/pubs/er/arts/29-02/linn01.htm.
• Sanders, W., & Horn, S. (1995). Educational assessment reassessed: The usefulness of standardized and alternative measures of student achievement as indicators for the assessment of educational outcomes. Education Policy Analysis Archives, 3(6). Retrieved September 2002, from http://olam.ed.asu.edu/epaa/v3n6.html.

Norm-Referenced vs. Criterion-Referenced Tests


Norm-referenced tests are specifically designed to rank test takers on a “bell curve,” a
distribution of scores that, when graphed, resembles the outline of a bell: a small
percentage of students perform well, most perform at an average level, and a small percentage
perform poorly. To produce a bell curve each time, test questions are carefully designed to
accentuate performance differences among test takers, not to determine if students have achieved
specified learning standards, learned certain material, or acquired specific skills and knowledge.
Tests that measure performance against a fixed set of standards or criteria are called criterion-
referenced tests.
Reform
Norm-referenced tests have historically been used to make distinctions among students, often for
the purposes of course placement, program eligibility, or school admissions. Yet because norm-referenced
tests are designed to rank student performance on a relative scale (that is, in relation to
the performance of other students), many schools and states have abandoned norm-referenced testing
in favor of criterion-referenced tests, which measure student performance in
relation to a common set of fixed criteria or standards.
It should be noted that norm-referenced tests are typically not the form of standardized test
widely used to comply with state or federal policies—such as the No Child Left Behind Act—
that are intended to measure school performance, close “achievement gaps,” or hold schools
accountable for improving student learning results. In most cases, criterion-referenced tests are
used for these purposes because the goal is to determine whether schools are successfully
teaching students what they are expected to learn.
Similarly, the assessments being developed to measure student achievement of the Common
Core State Standards are criterion-referenced exams. However, some test developers
promote their norm-referenced exams—for example, the TerraNova Common Core—as a way
for teachers to “benchmark” learning progress and determine if students are on track to perform
well on Common Core–based assessments.

Criterion-referenced test results are often based on the number of correct answers provided by
students, and scores might be expressed as a percentage of the total possible number of correct
answers. On a norm-referenced exam, however, the score would reflect how many more or fewer
correct answers a student gave in comparison to other students. Hypothetically, if all the students
who took a norm-referenced test performed poorly, the least-poor results would rank students in
the highest percentile. Similarly, if all students performed extraordinarily well, the least-strong
performance would rank students in the lowest percentile.
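The contrast described above, a percentage score that depends only on a student's own answers versus a percentile rank that depends on everyone else, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the scores are hypothetical, the test is assumed to have 50 items, and `percentile_rank` uses one common convention (the percentage of the group scoring strictly below a given score). Published tests may use other conventions, such as counting half of any tied scores.

```python
# Hypothetical data for illustration only; not taken from the source.

def percent_correct(num_correct: int, num_items: int) -> float:
    """Criterion-referenced style: share of items answered correctly."""
    return 100.0 * num_correct / num_items


def percentile_rank(score: float, group: list[float]) -> float:
    """Norm-referenced style: percentage of the group scoring strictly below `score`.

    This 'strictly below' rule is one convention among several (an assumption here).
    """
    below = sum(1 for s in group if s < score)
    return 100.0 * below / len(group)


NUM_ITEMS = 50  # assumed test length
# A group in which every student performed poorly (raw number correct out of 50).
poor_group = [10, 12, 14, 15, 18]

for raw in poor_group:
    pct = percent_correct(raw, NUM_ITEMS)
    rank = percentile_rank(raw, poor_group)
    print(f"{raw:>2}/{NUM_ITEMS}: {pct:5.1f}% correct, percentile rank {rank:5.1f}")
```

The top scorer in this group answers only 36% of the items correctly, yet has a percentile rank of 80 because that student outscored four of the five students in the group: the "least-poor" result still ranks at the top.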

It should be noted that norm-referenced tests cannot measure the learning achievement or
progress of an entire group of students, but only the relative performance of individuals within a
group. For this reason, criterion-referenced tests are used to measure whole-group performance.
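The point about whole-group performance can be illustrated with a similar sketch. The fall and spring scores and the cutoff of 70 (standing in for a fixed standard) are hypothetical. If every student gains 10 points, each student's percentile rank is unchanged, while the share of students meeting the fixed cutoff doubles, so only the criterion-referenced summary registers the group's progress.

```python
from bisect import bisect_left

CUTOFF = 70  # hypothetical "proficient" standard


def percentile_rank(score: float, group: list[float]) -> float:
    """Percentage of the group scoring strictly below `score`.

    bisect_left on the sorted scores returns the count of values < score.
    """
    ordered = sorted(group)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


def share_meeting_standard(group: list[float], cutoff: float = CUTOFF) -> float:
    """Criterion-referenced group summary: % of students at or above the cutoff."""
    return 100.0 * sum(1 for s in group if s >= cutoff) / len(group)


fall = [55, 60, 65, 72, 80]
spring = [s + 10 for s in fall]  # every student improves by 10 points

fall_ranks = [percentile_rank(s, fall) for s in fall]
spring_ranks = [percentile_rank(s, spring) for s in spring]

print(fall_ranks == spring_ranks)      # True: relative standing is unchanged
print(share_meeting_standard(fall))    # 40.0
print(share_meeting_standard(spring))  # 80.0
```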

http://edglossary.org/norm-referenced-test/
Norm-Referenced

Ricki wants to know if her curriculum will help students learn math skills, and she's written a
math test for the students to take. But how should she determine what passing means?

A norm-referenced test is scored by comparing a person's performance to that of others who are
similar. You can remember norm-referenced by thinking of the word 'normal.' The object of a
norm-referenced test is to compare a person's performance to what is normal for other people
like him or her.

Think of it kind of like a race. If a runner comes in third in a race, that doesn't tell us anything
objectively about what the runner did. We don't know if she finished in 30 seconds or 30
minutes; we only know that she finished after two other runners and ahead of everyone else.

So, if Ricki decides to make her test norm-referenced, she would compare students to what is
normal for that age, grade, or class. Examples of norm-referenced tests include the SAT, IQ tests,
and tests that are graded on a curve. Anytime a test offers a percentile rank, it is a norm-
referenced test. If you score at the 80th percentile, that means that you scored better than 80% of
people in your group.

Norm-referenced tests are a good way to compensate for any mistakes that might be made in
designing the measurement tool. For example, what if Ricki's math test is too easy, and
everybody aces it? If it is a norm-referenced test, that's OK because you're not looking at the
actual scores of the students but how well they did in relation to students in the same age group,
grade, or class.

Criterion-Referenced

But norm-referenced tests aren't perfect. They aren't completely objective and make it hard to
know anything other than how someone did in comparison to others. But what if we want to
know about a person's performance without comparing them to others?

A criterion-referenced test is scored on an absolute scale, with no comparisons made to other test takers. It is
interested in one thing only: did you meet the standards?

Let's go back to our race scenario. Saying that a runner came in third place is norm-referenced
because we are comparing her to the other runners in the race. But if we look at her time in the
race, that's criterion-referenced. Saying she finished the race in 58:42 is an objective measure
that is not a comparison to others.
