Norm and Criterion
norm-referenced tests
To understand what happened, we need to understand the difference
between criterion-referenced tests and norm-referenced tests.
Note that it doesn’t matter how many other people are in line or how tall
or short they are; whether or not you’re allowed to get on the ride is
determined solely by your height. Even if you’re the tallest person in line,
if the top of your head doesn’t reach the line on the height chart, you
can’t ride.
In the charts below, you can see the student’s score and performance
category (“below proficient”) do not change, regardless of whether they
are a top-performing student, in the middle, or a low-performing student.
This means knowing a student’s score for a criterion-referenced test will
only tell you how that specific student performed in relation to the
criterion, but not whether they performed below-average, above-
average, or average when compared to their peers.
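As a minimal sketch of the idea above, the following Python snippet classifies one student's score against a fixed cut score. The score of 70, the cut score of 75, and the three classrooms are hypothetical numbers chosen for illustration, not values from any real test. Note that the peers' scores are never passed to the function.

```python
def classify(score, cut_score):
    """Return the performance category for a score against a fixed cut score."""
    return "proficient" if score >= cut_score else "below proficient"

# Hypothetical values: one student's score and an assumed cut score.
student_score = 70
cut_score = 75

# Three hypothetical sets of classmates; they play no role in the result.
classrooms = {
    "student is the top performer": [40, 50, 55, 60],
    "student is in the middle": [50, 60, 80, 90],
    "student is a low performer": [80, 85, 90, 95],
}

results = {name: classify(student_score, cut_score) for name in classrooms}
for name, category in results.items():
    print(f"{name}: {category}")
```

Every classroom yields the same category, because the only inputs are the student's score and the criterion.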
One norm-referenced measure that many families are familiar with is the
baby weight growth charts in the pediatrician’s office, which show
which percentile a child’s weight falls in:
For example, a baby who weighed 2,600 grams at birth would be in the
7th percentile, weighing less than roughly 93% of the babies in
the norm group. However, despite the very low percentile, 2,600 grams
is classified as a normal or healthy weight for babies born in the United
States—a birth weight of 2,500 grams is the cut-off, or criterion, for a
child to be considered low weight or at risk. (For the curious, 2,600
grams is about 5 pounds and 12 ounces.)
Thus, knowing a baby’s percentile rank for weight can tell you how they
compare with their peers, but not if the baby’s weight is “healthy” or
“unhealthy.”
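The two readings of the same measurement can be sketched in a few lines of Python. The 2,500-gram cut-off, the 2,600-gram weight, and the 7th percentile come from the example above; the function name is hypothetical, and treating exactly 2,500 grams as meeting the criterion is an assumption. The percentile is taken as given from a growth chart rather than computed here.

```python
LOW_BIRTH_WEIGHT_CUTOFF_G = 2500  # criterion quoted in the text

def interpret_birth_weight(weight_g, percentile):
    """Return both interpretations of one birth weight.

    Assumption: a weight at or above the cut-off meets the criterion.
    `percentile` is read off a growth chart; it is not computed here.
    """
    meets_criterion = weight_g >= LOW_BIRTH_WEIGHT_CUTOFF_G
    return {
        "norm_referenced": f"percentile {percentile} of the norm group",
        "criterion_referenced": "normal weight" if meets_criterion else "low birth weight",
    }

reading = interpret_birth_weight(2600, 7)
print(reading["norm_referenced"])
print(reading["criterion_referenced"])
```

The percentile and the criterion answer different questions, which is why the function returns them as separate fields.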
In the charts below, you can see that, while the student’s score doesn’t
change, their percentile rank does change depending on how well the
students in the norm group performed. When the individual is a top-
performing student, they have a high percentile rank; when they are a
low-performing student, they have a low percentile rank.
What we can’t tell from these charts is whether or not the student
should be categorized as proficient or below proficient.
This means knowing a student’s percentile rank on a norm-referenced
test will tell you how well that specific student performed compared to
the performance of the norm group, but will not tell you whether the
student met, exceeded, or fell short of proficiency or any other criterion.
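To make this concrete, here is a small Python sketch that computes a percentile rank for the same score against two different norm groups. Both groups and the score of 70 are hypothetical, and the definition used (percentage of norm-group scores strictly below the student's score) is one common convention among several.

```python
def percentile_rank(score, norm_group):
    """Percentage of norm-group scores that fall strictly below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100.0 * below / len(norm_group)

student_score = 70

# Hypothetical norm groups of ten students each.
weaker_group = [40, 45, 50, 55, 60, 62, 65, 68, 72, 80]
stronger_group = [60, 72, 75, 78, 80, 82, 85, 88, 90, 95]

rank_in_weaker = percentile_rank(student_score, weaker_group)
rank_in_stronger = percentile_rank(student_score, stronger_group)

print(rank_in_weaker)    # 80.0
print(rank_in_stronger)  # 10.0
```

The score is identical in both calls; only the comparison group changes. Nothing in the computation refers to a proficiency cut-off, so the rank alone cannot say whether the student is proficient.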
For example, you might have a student who has a high percentile rank,
but doesn’t meet the criterion for proficiency. Is that student doing well,
because they are outperforming their peers, or are they doing poorly,
because they haven’t achieved proficiency?
The opposite is also possible. A student could have a very low percentile
rank, but still meet the criterion for proficiency. Is this student doing
poorly, because they aren’t performing as well as their peers, or are they
doing well, because they’ve achieved proficiency?
However, these are fairly extreme and rather unlikely cases. Perhaps
more common is a “typical” or “average” student who does not achieve
proficiency because the majority of students are not achieving
proficiency.
In fact, this is the pattern we see with 2017 National Assessment of Educational
Progress (NAEP) scores, where the “typical” fourth-grade student (50th
percentile) has a score of 226 and the “average” fourth-grade student
(average of all student scores) has a score of 222, but proficiency
requires a score of 238 or higher.
What is a norm-referenced test?
A norm-referenced test is a type of standardized test (that is, a test that is identical for every test-taker).
After the items on a norm-referenced test are scored, the scores are compared to those of a comparison group,
or norming group. Because the test-taker is compared to other people, the results are relative rather than
absolute: the same raw score can rank higher or lower depending on how the norming group performed.
A norm-referenced test compares the performance of the test-taker to that of
other test-takers with similar characteristics or demographics.
To develop a norm-referenced test, the test developers select a statistically representative group of individuals
and administer the test items. The scores of this norming group are used to create the scoring system for that test.
The composition of the norming group depends on the test, but factors considered usually include age or
grade level, and the group may also be narrowed down by other demographic information. In addition, some tests
are normed for more than one group. If so, a test administrator gives the test, then chooses the correct scoring
system based on the test-taker's characteristics (e.g., the administrator uses a different scoring chart for a
seven-year-old than for a twelve-year-old).
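The idea of choosing a scoring chart by age can be sketched as a lookup over several norm tables. Everything below is hypothetical: the age bands, the norming scores, and the function names are invented for illustration and do not come from any published test.

```python
import bisect

# Hypothetical norm tables: age band -> sorted raw scores of that norming group.
NORM_TABLES = {
    (6, 8): [10, 12, 15, 18, 20, 22, 25, 28, 30, 35],
    (11, 13): [20, 25, 28, 30, 33, 35, 38, 40, 45, 50],
}

def table_for_age(age):
    """Pick the norming group whose age band contains `age`."""
    for (low, high), scores in NORM_TABLES.items():
        if low <= age <= high:
            return scores
    raise ValueError(f"no norm table covers age {age}")

def percentile_rank_for_age(raw_score, age):
    """Percentage of the age-matched norming group scoring strictly below."""
    scores = table_for_age(age)
    below = bisect.bisect_left(scores, raw_score)  # count of scores < raw_score
    return 100.0 * below / len(scores)

print(percentile_rank_for_age(30, 7))   # 80.0
print(percentile_rank_for_age(30, 12))  # 30.0
```

The same raw score of 30 ranks high among the younger norming group and low among the older one, which is why the administrator has to select the chart that matches the test-taker.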
Results of a norm-referenced test are typically reported as a percentage or a percentile rank. The purpose of
this number is to tell the test-taker what percentage of the norming group scored below them and what
percentage scored above them. Many test developers use a bell curve to organize their data. This means that
they expect the largest percentage of test-takers to score in the middle range of the test, with smaller numbers
of test-takers performing well below average or well above average.
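If scores are assumed to follow a bell curve, a percentile can be read off the normal distribution directly. The sketch below uses Python's standard-library `statistics.NormalDist`; the mean of 100 and standard deviation of 15 are assumed values for a hypothetical scale, not parameters of any particular test.

```python
from statistics import NormalDist

# Assumed norming-group distribution for a hypothetical scale.
norms = NormalDist(mu=100, sigma=15)

def percentile_from_curve(score):
    """Percentage of the modeled norming group expected to score below `score`."""
    return 100.0 * norms.cdf(score)

# Share of test-takers expected within one standard deviation of the mean.
middle_share = percentile_from_curve(115) - percentile_from_curve(85)

print(round(percentile_from_curve(100), 1))  # 50.0
print(round(percentile_from_curve(115), 1))
print(round(middle_share, 1))
```

Under this assumption roughly two thirds of test-takers land within one standard deviation of the mean, which matches the expectation that most scores fall in the middle range.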
The following table provides a summary of the key features of criterion and norm referenced assessment approaches: 2,13
A standard is a specified and definite level of achievement that may be attained.
Criteria are the characteristics by which the quality of something may be judged.
The two terms Norm-Referenced and Criterion-Referenced are commonly used to describe
tests, exams, and assessments. They are often some of the first concepts learned
when studying assessment and psychometrics.
Norm-referenced means that we are referencing how your score compares to other
people. Criterion-referenced means that we are referencing how your score
compares to a criterion such as a cutscore or a body of knowledge.
What if the average score was 95%? Well, that changes your norm-referenced
interpretation (you are now below average) but the criterion-referenced
interpretation does not change.
However, you could interpret your score by looking at your percentile rank compared
to other examinees; it just doesn’t impact the cutscore.