
CHAPTER 9

IMPROVING A CLASSROOM-BASED ASSESSMENT TEST
THE NEW CHAPTER
This chapter deals with practical and necessary ways of improving teacher-developed assessment tools. Popham (2011) suggests two approaches to item improvement: the judgmental approach and the empirical approach.
INTENDED LEARNING OUTCOMES:

At the end of Chapter 9, students are expected to:

• Acquire procedures for improving a classroom-based assessment test
JUDGMENTAL ITEM-IMPROVEMENT
• This approach basically makes use of
human judgment in reviewing the
items.
• The judges are the teachers themselves, who know exactly what the test is for, the instructional outcomes to be assessed, and the level of item difficulty appropriate to their class.
TEACHERS' OWN REVIEW
Presuming that a test is perfect right after its construction may lead to failure to detect shortcomings of the test or assessment task. Popham (2011) gives five suggestions for teachers to follow in exercising judgment:
TEACHERS' OWN REVIEW
1. Adherence to item-specific guidelines and general item-writing commandments. The preceding chapter provided specific guidelines for writing the various forms of objective and non-objective constructed-response types and the selected-response types for measuring lower-level and higher-level thinking skills. Teachers should use these guidelines to check how well the items have been planned and written, particularly their alignment with the intended instructional outcomes.
TEACHERS' OWN REVIEW
2. Contribution to score-based inference. The teacher examines if the scores generated by the test can contribute to making valid inferences about the learners.
3. Accuracy of content. This review should especially be considered when some time has passed since the test was developed. Changes due to new discoveries or developments can redefine the content of a summative test. If this happens, the items or the key to correction may have to be revisited.
TEACHERS' OWN REVIEW
4. Absence of content gaps. This review criterion is
especially useful in strengthening the score-based
inference capability of the test. If the current tool
misses out on important content now prescribed by a
new curriculum standard, the score will likely not
give an accurate description of what is expected to
be assessed.
TEACHERS' OWN REVIEW
5. Fairness. The discussions on item-writing guidelines always warn against unintentionally helping uninformed students obtain higher scores. This happens through inadvertent grammatical clues, unattractive distracters, ambiguous problems, and messy test instructions. Sometimes, unfairness can also arise from undue advantage enjoyed by a particular group, like those seated in front of the classroom or those coming from a particular socio-economic level. Getting rid of faulty and biased items and writing clear instructions definitely adds to the fairness of the test.
PEER REVIEW
There are schools that encourage peer or collegial
review of assessment instruments among themselves.
Time is provided for this activity, and it has almost
always yielded good results for improving tests and
performance-based assessment tasks. During these
teacher dyad or triad sessions, those teaching the same
subject area can openly review together the classroom
tests and tasks they have devised against some
consensual criteria. The suggestions given by test
experts can actually be used collegially as basis for a
review checklist:
PEER REVIEW
a. Do the items follow the specific and general
guidelines in writing items especially on:
• being aligned to instructional objectives?
• making the problem clear and unambiguous?
• providing plausible options?
• avoiding unintentional clues?
• having only one correct answer?
PEER REVIEW
b. Are the items free from inaccurate content?
c. Are the items free from obsolete content?
d. Are the test instructions clearly written for
students to follow?
e. Is the level of difficulty of the test appropriate to the level of the learners?
f. Is the test fair to all kinds of students?
STUDENT REVIEW
Engagement of students in reviewing items
has become a laudable practice for improving
classroom tests. The judgment is based on the
students' experience in taking the test, their
impressions and reactions during the testing event.
The process can be efficiently carried out through
the use of a review questionnaire. Popham (2011)
illustrates a sample questionnaire shown in Table
9.1. It is better to conduct the review activity a day
after taking the test, so the students still remember
the experience when they see a blank copy of the
test.
Table 9.1 Item-Improvement Questionnaire for Students

1. If any of the items seemed confusing, which ones were they?
2. Did any items have more than one correct answer? If so, which ones?
3. Did any items have no correct answers? If so, which ones?
4. Were there words in any items that confused you? If so, which ones?
5. Were the directions for the test, or for particular subsections, unclear? If so, which ones?
EMPIRICALLY-BASED PROCEDURES

• Item improvement using empirically-based methods is aimed at improving the quality of an item using students' responses to the test.
• Test developers refer to this as item analysis, as it utilizes data obtained separately for each item.
ITEM ANALYSIS

• a process of examining the students' responses to individual items in the test.
EMPIRICALLY-BASED PROCEDURES

• An item is considered good when its difficulty index and discrimination index meet certain characteristics.
• For a norm-referenced test, these two indices are related, since the level of difficulty of an item contributes to its discriminability.
EMPIRICALLY-BASED PROCEDURES

• However, an easy item that can be answered correctly by more than 85% of the group, or a difficult item that can be answered correctly by only 15%, is not expected to perform well as a "discriminator."
EMPIRICALLY-BASED PROCEDURES
The difficulty index, however, takes a different meaning when used in the context of criterion-referenced interpretation or testing for mastery. An item with a high difficulty index (p-value) will not be considered an "easy" and therefore weak item, but rather an item that displays the capability of the learners to perform the expected outcome. It therefore becomes evidence of mastery.
EMPIRICALLY-BASED PROCEDURES

Particularly for objective tests, the responses are binary in form (i.e., right or wrong) and are translated into the numerical figures 1 and 0 for obtaining data such as frequencies, percentages, and proportions. Useful data then come in the form of:

a. Total number of students answering the item (T)
b. Total number of students answering the item right (R)
DIFFICULTY INDEX

• is defined as the number of students who are able to answer the item correctly divided by the total number of students.
DIFFICULTY INDEX

An item's difficulty index is obtained by calculating the p-value (p), which is the proportion of students answering the item correctly.

p = R/T

where:
p = difficulty index
R = total number of students answering the item right
T = total number of students answering the item
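To make the computation concrete, here is a minimal Python sketch of p = R/T; the function name difficulty_index is only an illustrative label, and the two calls reuse the figures of the worked example that follows (30 of 45 and 10 of 45 answering correctly).

    def difficulty_index(num_right, num_answering):
        """Difficulty index p = R/T: proportion of students answering the item correctly."""
        if num_answering == 0:
            raise ValueError("T must be greater than zero")
        return num_right / num_answering

    # Figures from the worked example that follows: Items 1 and 2 in a class of 45
    print(round(difficulty_index(30, 45), 2))   # 0.67
    print(round(difficulty_index(10, 45), 2))   # 0.22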
DIFFICULTY INDEX

Item 1: There were 45 students in the class who responded to Item 1, and 30 answered it correctly.

p = 30/45 = 0.67

Item 1 has a p-value of 0.67. Sixty-seven percent (67%) got the item right while 33% missed it.

Item 2: In the same class, only 10 responded correctly to Item 2.

p = 10/45 = 0.22

Item 2 has a p-value of 0.22. Out of 45, only 10 or 22% got the item right while 35 or 78% missed it.

For a norm-referenced test: Between the two items, Item 2 appears to be a much more difficult item, since less than a fourth of the class was able to respond correctly.

For a criterion-referenced test: The class shows much better performance in Item 1 than in Item 2. It is still a long way for many to master Item 2.
DISCRIMINATION INDEX

• is the power of the item to discriminate between students who scored high and those who scored low in the test.
• serves as a basis for measuring the validity of an item.
• is an item statistic that can reveal useful information for improving an item.
• shows the relationship between a student's performance on an item (i.e., right or wrong) and his/her total performance in the test as represented by the total score.
DISCRIMINATION INDEX

For classroom tests, the discrimination index shows if a difference exists between the performance of those who scored high and those who scored low on an item. As a general rule, the higher the discrimination index (D), the more marked the magnitude of the difference is and, thus, the more discriminating the item is. The nature of the difference, however, can take different directions:

a. Positively discriminating item - the proportion of the high-scoring group answering correctly is greater than that of the low-scoring group.
b. Negatively discriminating item - the proportion of the high-scoring group is less than that of the low-scoring group.
c. Non-discriminating item - the proportion of the high-scoring group is equal to that of the low-scoring group.
DISCRIMINATION INDEX

Calculation of the discrimination index therefore requires obtaining the difference between the proportion of the high-scoring group getting the item correct and the proportion of the low-scoring group getting the item correct using this simple formula:

D = P(upper) - P(lower)
DISCRIMINATION INDEX

Another calculation can bring about the same result (Kubiszyn and Borich, 2010):

D = Ru/Tu - Rl/Tl

where:
Ru = number of students in the upper group answering the item right
Tu = total number of students in the upper group
Rl = number of students in the lower group answering the item right
Tl = total number of students in the lower group
DISCRIMINATION INDEX

As you can see, R/T is actually the p-value of an item. So, to get D is to get the difference between the p-value of the upper half and the p-value of the lower half. The formula for the discrimination index (D) can therefore also be given as (Popham, 2011):

D = Pu - Pl
DISCRIMINATION INDEX

To obtain the proportions of the upper and lower groups responding to the item correctly, the teacher follows these steps:

a. Score the test papers using the key to correction to obtain the total scores of the students. The maximum score is the total number of objective items.
b. Order the test papers from highest to lowest score.
c. Split the test papers into halves: high group and low group.
   • For a class of 50 or fewer students, do a 50-50 split. Take the upper half as the HIGH group and the lower half as the LOW group.
   • For a big group of 100 or so, take the upper 25-27% and the lower 25-27%.
   • Maintain equal numbers of test papers for the upper and lower groups.
DISCRIMINATION INDEX

d. Obtain the p-value for the upper group and the p-value for the lower group:

   P(upper) = Ru/Tu          P(lower) = Rl/Tl

e. Get the discrimination index by getting the difference between the p-values, as shown in the sketch after these steps.
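Putting steps a to e together, the sketch below computes D in Python using the 50-50 split suggested for a class of 50 or fewer; the function name and the ten sample responses are hypothetical and serve only as an illustration.

    def discrimination_index(item_responses_by_rank):
        """D = P(upper) - P(lower) for one item.

        item_responses_by_rank: 1 (right) or 0 (wrong) for each student,
        already ordered from highest to lowest total score (steps a and b).
        A 50-50 split into HIGH and LOW groups is used (step c).
        """
        half = len(item_responses_by_rank) // 2
        upper = item_responses_by_rank[:half]            # HIGH group
        lower = item_responses_by_rank[-half:]           # LOW group
        p_upper = sum(upper) / len(upper)                # step d: P(upper) = Ru/Tu
        p_lower = sum(lower) / len(lower)                # step d: P(lower) = Rl/Tl
        return p_upper - p_lower                         # step e

    # Hypothetical responses of 10 students, ranked from highest to lowest scorer
    responses = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
    print(round(discrimination_index(responses), 2))     # 0.8 - 0.2 = 0.6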
DISCRIMINATION INDEX

For purposes of evaluating the discriminating power of items, Popham (2011) offers the guidelines proposed by Ebel & Frisbie (1991), shown in Table 9.2. Teachers can be guided on how to select the satisfactory items and what to do to improve the rest.

Table 9.2 Guidelines for Evaluating the Discriminating Efficiency of Items

Discrimination Index    Item Evaluation
.40 and above           Very good items
.30 - .39               Reasonably good items, but possibly subject to improvement
.20 - .29               Marginal items, usually needing improvement
.19 and below           Poor items, to be rejected or improved by revision
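To apply Table 9.2 to a whole set of computed D values, a teacher could use a small helper like this Python sketch; the labeling function is merely an illustration of the Ebel & Frisbie cut-offs and is not taken from the cited sources.

    def evaluate_item(d):
        """Label a discrimination index using the Table 9.2 guidelines (Ebel & Frisbie, 1991)."""
        if d >= 0.40:
            return "Very good item"
        if d >= 0.30:
            return "Reasonably good item, but possibly subject to improvement"
        if d >= 0.20:
            return "Marginal item, usually needing improvement"
        return "Poor item, to be rejected or improved by revision"

    for d in (0.56, 0.33, 0.24, 0.05):
        print(d, "->", evaluate_item(d))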


DISTRACTER ANALYSIS

• An empirical procedure for discovering areas for item improvement that utilizes an analysis of the distribution of responses across the distracters. Especially when the difficulty index and discrimination index of an item suggest that it is a candidate for revision, distracter analysis becomes a useful follow-up.
• Can detect differences in how the more able students respond to the distracters in a multiple-choice item compared to how the less able ones do.
• Can also provide an index of the plausibility of the alternatives, that is, whether they are functioning as good distracters.
DISTRACTER ANALYSIS

To illustrate this process, consider the frequency distribution of the responses of the upper group and the lower group across the alternatives for two items. Separate counts are done for the upper and lower groups choosing A, B, C, and D. The data are organized in a distracter analysis table.

Table 9.3: Distracter Analysis Table

• What kinds of items do you see based on their D?
• What does their respective D indicate? Cite the data supporting this.
• Which of the two items is more discriminating? Why?
• Which items need to be revised?
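The sketch below shows in Python how such a distracter analysis table can be tallied; the choices of the upper and lower groups and the item key (option B) are hypothetical and invented purely for illustration.

    from collections import Counter

    # Hypothetical choices of 10 upper-group and 10 lower-group students on one item (key = "B")
    upper_choices = ["B", "B", "B", "A", "B", "B", "C", "B", "B", "D"]
    lower_choices = ["A", "B", "C", "C", "A", "D", "B", "C", "A", "D"]
    options, key = ["A", "B", "C", "D"], "B"

    upper_counts = Counter(upper_choices)
    lower_counts = Counter(lower_choices)

    print("Option   Upper   Lower")
    for option in options:
        marker = "*" if option == key else " "           # mark the keyed answer
        print(f"{option}{marker}      {upper_counts[option]:>5}   {lower_counts[option]:>5}")

    # D for the item: difference in the proportions choosing the key in each group
    d = upper_counts[key] / len(upper_choices) - lower_counts[key] / len(lower_choices)
    print("D =", round(d, 2))                            # 0.7 - 0.2 = 0.5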
SENSITIVITY TO INSTRUCTION INDEX

This is referred to as the sensitivity to instruction index (Si), and it signifies a change in students' performance as a result of instruction. The information is useful for criterion-referenced tests, which aim at determining if mastery learning has been attained after a designated or prescribed instructional period. The basic question being addressed is a directional one, i.e., is student performance better after instruction is given? In the context of item performance, Si will indicate if the p-value obtained for the item in the post-test is greater than the p-value in the pre-test. Consider an item where, in a class of 40, 80% answered it correctly in the post-test while only 10% did in the pre-test.
SENSITIVITY TO INSTRUCTION INDEX

Its p-value for the post-test is .80 while for the pre-test it is .10, thus Si = .70 following this calculation:

Sensitivity to instruction (Si) = P(post) - P(pre)
                                = .80 - .10
                                = .70
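A minimal Python sketch of the same calculation, reusing the figures of the example above (p-values of .10 in the pre-test and .80 in the post-test); the function name is only an illustrative label.

    def sensitivity_to_instruction(p_pre, p_post):
        """Si = P(post) - P(pre): gain in the item's p-value after instruction."""
        return p_post - p_pre

    print(round(sensitivity_to_instruction(p_pre=0.10, p_post=0.80), 2))   # 0.7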
END
