Purpose of Item Analysis
—Evaluates the quality of each item
— Rationale: the quality of items determines the
quality of test (i.e., reliability & validity)
— May suggest ways of improving the test items
— Can help with understanding why certain tests predict some criteria but not others
Item Analysis
@ When analyzing the test items, we have several
questions about the performance of each item. Some
of these questions include:
Are the items congruent with the test objectives?
Are the items valid? Do they measure what they're
supposed to measure?
Are the items at an appropriate level of difficulty?
Do the items discriminate between those who do well on the test and those who do not?
Are there any poor-performing items that need to be discarded?
Types of Item Analyses for CTT
Three major types:
1. Assess quality of the distractors
2. Assess difficulty of the items
3. Assess how well an item differentiates between high and low performers
Distractor Analysis
First question of item analysis: How many
people choose each response?
If there is only one best response, then all
other response options are distractors.
Example from in-class assignment (N = 35):
Which method has the best internal consistency?        #
a) projective test                                      1
b) …                                                    …
c) forced choice                                        …
d) differences n.s.                                     …
Distractor Analysis (cont'd)
A perfect test item would have 2 characteristics:
1. Everyone who knows the item gets it right
2. People who do not know the item will have
responses equally distributed across the wrong answers.
It is not desirable to have one of the distractors chosen
more often than the correct answer.
This result indicates a potential problem with the question. The distractor may be too similar to the correct answer, and/or there may be something in the question stem or the alternatives that is misleading.
Distractor Analysis (cont'd)
Calculate the # of people expected to choose each of the distractors. If responding were random, the expected number would be the same for each wrong response (Figure 10-1).
Number of people expected to choose each distractor = (number of people answering the item incorrectly) / (number of distractors)
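For instance, a minimal Python sketch of this check, using hypothetical response counts (the option labels and numbers below are illustrative, not the in-class data):

```python
# Expected vs. observed distractor choices for one multiple-choice item.
# Hypothetical data: 35 test takers, option "b" keyed as correct.
observed = {"a": 4, "b": 25, "c": 3, "d": 3}
correct_option = "b"

n_wrong = sum(n for opt, n in observed.items() if opt != correct_option)
n_distractors = len(observed) - 1

# If guessing among the wrong options were random, each distractor
# would attract about the same number of people.
expected = n_wrong / n_distractors

for opt, n in observed.items():
    if opt == correct_option:
        continue
    note = "  <- more popular than the keyed answer, check the item" if n > observed[correct_option] else ""
    print(f"option {opt}: observed {n}, expected about {expected:.1f}{note}")
```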
Distractor Analysis (cont'd)
What if one of the distractors is extremely popular (chosen by a large proportion of test takers)?
1. It is possible that the choice reflects partial knowledge
2. The item is a poorly worded trick question
A distractor that is never chosen adds little to the item because it is easily eliminated.
A distractor that is extremely popular is likely to lower the reliability and validity of the test.
Item Difficulty Analysis
@ Description and How to Compute
ex: a) (6 × 3) + 4 = ?
b) 9π[ln(−3.68) × (1 − ln(+3.68))] = ?
@lItis often difficult to explain or define difficulty in
terms of some intrinsic characteristic of the item
@ The only common thread of difficult items is that
individuals did not know the answer
p = percentage of test takers who respond correctly
What if p = .00?
What if p = 1.00?
Item Difficulty
— An item with a p value of .0 or 1.0 does not
contribute to measuring individual differences and
thus is certain to be useless
— When comparing 2 test scores, we are interested in who had the higher score or the differences in scores
— Items with a p value near .5 have the most variation, so seek items in this range and remove those with extreme values
— p can also be examined to determine the proportion answering in a particular way for items that don't have a "correct" answer
Item Difficulty (cont.)
What is the best p-value?
— optimal p-value = .50
— maximum discrimination between good
and poor performers
Should we only choose items of .50?
When shouldn't we?
Should we only choose items of .50?
Not necessarily ...
@ When wanting to screen the very top group of
applicants (e.g., admission to university or medical
school).
Cutoffs may be much higher
@ Other institutions want a minimum level (e.g., a minimum
reading level)
Cutoffs may be much lower
Item Difficulty (cont.)
Interpreting the p-value...
example:
100 people take a test
15 got question 1 right
What is the p-value?
Is this an easy or hard item?
Item Difficulty (cont.)
Interpreting the p-value...
example:
100 people take a test
70 got question 1 right
What is the p-value?
Is this an easy or hard item?
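A small Python sketch of the p-value computation for the two examples above (the difficulty labels follow the general rules on the next slide):

```python
# p value = proportion of test takers who answer the item correctly.
def p_value(n_correct: int, n_total: int) -> float:
    return n_correct / n_total

def difficulty_label(p: float) -> str:
    # Rule-of-thumb ranges used in these slides.
    if p < 0.20:
        return "difficult"
    if p <= 0.80:
        return "moderately difficult"
    return "easy"

for n_correct in (15, 70):
    p = p_value(n_correct, 100)
    print(f"{n_correct}/100 correct: p = {p:.2f} ({difficulty_label(p)})")
```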
Item Difficulty (cont'd)
General Rules of Item Difficulty...
p low (< .20): difficult test item
p moderate (.20 - .80): moderately difficult item
p high (> .80): easy item
ITEM DISCRIMINATION
The extent to which an item
differentiates people on the
behavior that the test is designed
to assess.
The computed difference between the percentage of high achievers and the percentage of low achievers who got the item correct.
Item Discrimination (cont.)
compares the performance of upper
group (with high test scores) and lower
group (low test scores) on each item: the % of test takers in each group who were correct
Divide sample into TOP half and
BOTTOM half (or TOP and BOTTOM
third)
Compute Discrimination Index (D)
Item Discrimination
@ D = U - L
U = (# in the upper group with the correct response) / (Total # in upper group)
L = (# in the lower group with the correct response) / (Total # in lower group)
The higher the value of D, the more adequately the item discriminates. (The highest value is 1.0)
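A brief Python sketch of D under these definitions (the group counts are hypothetical):

```python
# D = U - L, where U and L are the proportions of the upper and lower
# groups answering the item correctly.
def discrimination_index(upper_correct, upper_total, lower_correct, lower_total):
    U = upper_correct / upper_total
    L = lower_correct / lower_total
    return U - L

# Hypothetical item: 18 of 20 upper-group and 7 of 20 lower-group test takers correct.
D = discrimination_index(18, 20, 7, 20)
print(f"D = {D:.2f}")  # 0.90 - 0.35 = 0.55, so the item discriminates well
```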
Item Discrimination
Seek items with high positive values of D (those who do well on the test tend to get the item correct).
Items with negative values (lower scorers on the test are more likely to get the item correct) and low positive values (about the same proportion of low and high scorers get the item correct) don't discriminate well and are discarded.
Item Discrimination (cont'd):
Item-Total Correlation
Correlation between each item (a correct response
usually receives a score of 1 and an incorrect a score
of zero) and the total test score.
To what degree do the item and the test measure the same thing?
Positive - item discriminates between high and low scorers
Near 0 - item does not discriminate between high & low scorers
Negative - scores on the item and scores on the test disagree
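A minimal Python sketch of item-total correlations, using a small hypothetical 0/1 score matrix (rows = test takers, columns = items):

```python
import numpy as np

# Hypothetical item scores: 1 = correct, 0 = incorrect.
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])

total = scores.sum(axis=1)  # total test score per test taker

for j in range(scores.shape[1]):
    # Correlation of item j with the total score; a corrected version
    # would first subtract item j from the total.
    r = np.corrcoef(scores[:, j], total)[0, 1]
    print(f"item {j + 1}: item-total r = {r:.2f}")
```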
Item Discrimination (cont'd):
Item-Total Correlation
Item-total correlations are directly
related to reliability.
Because the more each item correlates with the test as a whole, the more highly the items correlate with each other
(= higher coefficient alpha, i.e., higher internal consistency)
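As a rough illustration, coefficient alpha can be computed directly from the same kind of hypothetical 0/1 score matrix (the data are made up):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha: rows = test takers, columns = items."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # higher inter-item correlation -> higher alpha
```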
Level of Difficulty
Index Range     Difficulty Level
0.00 - 0.20     Very Difficult
0.21 - 0.40     Difficult
0.41 - 0.60     Average / Moderately Difficult
0.61 - 0.80     Easy
0.81 - 1.00     Very Easy
* Ebel's (1972) gives the indices of item
discrimination in the terms shown below.
For the average classroom test, these indices are widely accepted.
0.40 and above: very good item
0.30 - 0.39: reasonably good, but subject to improvement
0.20 - 0.29: marginal item, usually needing and being subject to improvement
Below 0.19: poor item, to be rejected or improved by revision
ITEM DISCRIMINATION INDEX (D)
The item discrimination index of a test refers to the degree to which an item discriminates between high-achieving students and low-achieving students in terms of their scores on the total test.
In a technical sense, the item discrimination index addresses the validity of the test item, i.e., the extent to which the item tests the attribute it was intended to test.
+ The formula to determine the item discrimination index is:
D = (RU − RL) / ½N
RU = number of students in the upper group who got the item right.
RL = number of students in the lower group who got the item right.
½N = one half of the total number of students included in the analysis.
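As a quick hypothetical illustration: if 40 students are included in the analysis (so ½N = 20), and RU = 16 while RL = 8, then D = (16 − 8) / 20 = 0.40, which Ebel's criteria classify as a very good item.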
+ The resulting index can range from +1.00, where all students in the upper group got the item right and all students in the lower group got it wrong, to −1.00, the reverse.
+ An index of 0.00 (no discrimination) occurs when an equal number of students in each group got the item right.
* The higher the discrimination index, the better and more reliable the test.
When items have no discrimination power or negative discrimination power, they should be revised or discarded.
This usually indicates that the item is written ambiguously, the answer on the scoring key is wrong, or the content to which the item refers was too obscure.
Types of Discrimination Index
+ Positive Discrimination
+ Negative Discrimination
+ Zero Discrimination
Positive Discrimination
* Happens when more students in the upper
group got the item correctly than those
students in the lower group.
Negative Discrimination
* Occurs when more students in the
lower group got the item correctly than the students in the upper group
Zero Discrimination
+ Happens when the number of students in the upper group and the lower group who answer the item correctly is equal
DIFFICULTY INDEX TABLE
RANGE OF DIFFICULTY INDEX    INTERPRETATION       ACTION
0 - 0.25                     DIFFICULT            REVISE OR DISCARD
0.26 - 0.75                  RIGHT DIFFICULTY     RETAIN
0.76 - ABOVE                 EASY                 REVISE OR DISCARD
Arrange the scores in descending order
Separate the test papers into two subgroups
Take the top 27% of the scores and the bottom 27% of the scores
Count the number of right answers in the highest group (R.H) and the number of right answers in the lowest group (R.L)
Count the non-response (N.R) examinees
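A Python sketch of these steps for a single item (the records below are hypothetical; each is a tuple of total score, whether the item was answered correctly, and whether the examinee responded to it):

```python
import math

# (total test score, item correct? 1/0, item answered? True/False) -- hypothetical data
records = [
    (95, 1, True), (90, 1, True), (88, 1, True), (84, 0, True), (80, 1, True),
    (76, 1, True), (72, 0, True), (70, 1, True), (65, 0, True), (60, 1, True),
    (55, 0, True), (50, 0, True), (45, 0, True), (40, 1, True), (35, 0, False),
]

# 1. Arrange the scores in descending order.
records.sort(key=lambda r: r[0], reverse=True)

# 2-3. Separate the papers and take the top 27% and bottom 27%.
k = max(1, math.floor(0.27 * len(records)))
upper, lower = records[:k], records[-k:]

# 4. Count right answers in the highest group (R.H) and lowest group (R.L).
R_H = sum(r[1] for r in upper)
R_L = sum(r[1] for r in lower)

# 5. Count the non-response (N.R) examinees.
N_R = sum(1 for r in records if not r[2])

print(f"R.H = {R_H}, R.L = {R_L}, N.R = {N_R}, D = {(R_H - R_L) / k:.2f}")
```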
Quantitative Item Analysis
@ Inter-item correlation matrix displays the
correlation of each item with every other
item
@ provides important information for
increasing the test’s internal consistency
@each item should be highly correlated
with every other item measuring the same
construct and not correlated with items
measuring a different construct
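A minimal Python sketch of an inter-item correlation matrix, again using a hypothetical 0/1 score matrix:

```python
import numpy as np

# Hypothetical item scores: rows = test takers, columns = items.
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])

# Entry (i, j) is the correlation between item i and item j across test takers.
inter_item = np.corrcoef(scores, rowvar=False)
print(np.round(inter_item, 2))
# An item that correlates weakly with the others intended to measure the same
# construct is a candidate for revision or removal.
```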
Quantitative Item Analysis
items that are not highly correlated with
other items measuring the same
construct can and should be dropped to
increase internal consistency
Item Discrimination (cont'd):
Interitem Correlation
Possible causes for low inter-item correlation:
Item badly written (revise)
Item measures an attribute other than the rest of the test (discard)
Item correlated with some items, but not
with others: test measures 2 distinct
attributes (subtests or subscales)