Item Analysis Workshop
WORKSHOP FOR NURSING FACULTY
‣We use the overall score on the exam to assess the student’s aptitude/ability
‣We want to know who understands the material and who doesn’t
‣We want to make sure that the student’s score is stable
‣Question: How do we know we are assessing the student's ability as well as we can?
‣So that you can ensure that your items are effectively evaluating student ability
‣Examples:
‣Item Difficulty
‣Item Discrimination
‣Internal Consistency
‣Differential Item Functioning
ITEM ANALYSIS PROCESS
‣How well do the questions separate the students who knew the material from those who did not? (The more such questions you have, the more precisely your exam can measure ability)
‣Item Analysis can answer many questions (“Am I able to know who has
ability and who does not”, “Am I able to get a fine-grained view on ability”, “How
consistent are scores”)
‣Many parties are concerned with knowing the answers to those questions:
‣Academic exams
‣Dean, Head of the programme
‣Quality improvement committee in the college
PRELIMINARY PREPARATION
BEFORE BEGINNING ITEM ANALYSIS
‣Rows = An individual examinee
‣Cells = Whether the examinee got the question right (1) or wrong (0)

Examinee  Q1  Q2  Q3  Q4
Ahmed      1   1   1   1
Adnan      1   1   1   0
Fatima     0   1   1   0
2) The activity sheet contains five students' responses to five different multiple choice questions
c) Each cell indicates whether the student answered the question correctly (1) or incorrectly (0)
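The spreadsheet layout described above can be sketched in code. This is a minimal illustration only; the names and answer patterns mirror the example table, and the dictionary structure is our assumption, not the workshop's actual file:

```python
# Sketch: an exam response matrix like the activity sheet.
# Rows = one examinee each; cells = correct (1) or incorrect (0).
responses = {
    "Ahmed":  [1, 1, 1, 1],
    "Adnan":  [1, 1, 1, 0],
    "Fatima": [0, 1, 1, 0],
}

for examinee, row in responses.items():
    print(examinee, row, "total correct:", sum(row))
```

Keeping the data in this row-per-examinee shape makes every later statistic (difficulty, discrimination, alpha) a simple sum or correlation over rows and columns.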
ITEM DIFFICULTY
ITEM DIFFICULTY: WHAT IS IT?
‣Examples
‣If everyone got the question right, the item is considered “easy”
‣If half the people got the question right, then the item is somewhere
between easy and hard.
ITEM DIFFICULTY: WHY DOES IT MATTER?
‣If you don’t pay attention to Item Difficulty, you don’t get a precise measure of
ability
‣If true ability has a bell-shaped distribution, then your estimated ability
should have a bell-shaped distribution
ITEM DIFFICULTY: WHY DOES IT MATTER?
‣If a question is too easy -> everyone gets the question right (e.g., Ali 100%, Hussain 100%) -> you cannot tell who is on the lower end of ability
‣If a question is too hard -> everyone gets the question wrong (e.g., Hussain 0%) -> you cannot tell who has more or less ability
‣Therefore, it is important to pay attention to item difficulty to have it be just right
ITEM DIFFICULTY: WHY DOES IT MATTER?
‣If item difficulty is too high or low, then scores will be truncated (prevents
symmetry)
‣Imagine a test with THREE questions and FIVE examinees
‣We can total up how many people got each question correct by taking the sum for each question

Examinee  Q1  Q2  Q3
Huda       1   0   1
Ali        1   0   1
Hussain    1   0   1
Ahmed      1   0   1
Noor       1   0   0
Total      5   0   4

‣Question 1 = 5
‣Question 2 = 0
‣Question 3 = 4
ITEM DIFFICULTY: EXAMPLE
‣Divide the total number of people who got the question correct by the total number of people who took the test, and you have the item's difficulty

Question  Total correct / Total test takers  Item Difficulty
Q1        5 / 5                              1.00
Q2        0 / 5                              0.00
Q3        4 / 5                              0.80
‣Item Difficulty = P / N (P = number of examinees answering the question correctly, N = total number of examinees)
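The P / N computation above can be sketched as a short function. The data reproduce the three-question, five-examinee example; the function name `item_difficulty` is ours, not from the slides:

```python
# Sketch: item difficulty = P / N, using the example response matrix above.
scores = {
    "Huda":    [1, 0, 1],
    "Ali":     [1, 0, 1],
    "Hussain": [1, 0, 1],
    "Ahmed":   [1, 0, 1],
    "Noor":    [1, 0, 0],
}

def item_difficulty(scores, question_index):
    """Proportion of examinees answering this question correctly (P / N)."""
    correct = sum(row[question_index] for row in scores.values())  # P
    return correct / len(scores)                                   # divide by N

for q in range(3):
    print(f"Q{q + 1} difficulty: {item_difficulty(scores, q):.2f}")
# Q1 -> 1.00 (too easy), Q2 -> 0.00 (too hard), Q3 -> 0.80
```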
ITEM DIFFICULTY: HOW TO INTERPRET IT
‣How to Interpret Item Difficulty
‣Think of it as “Item Easiness”
‣If value is at an ideal “sweet spot”, then your test can better separate
high ability people from low ability people (discussed in next section)
‣If Item Difficulty is too Low / Item is too Hard (< .25 or .30):
‣The item may have been miskeyed
‣The item may be too challenging relative to the overall level of ability of the
class
‣Find where people are being confused and clarify in the question
‣Not meaningful if:
‣You are assessing content mastery rather than relative standing - the recommended values (.60-.75) assume you want to assess people's ability relative to others; if you are concerned with content mastery, you want all items answered correctly
‣The test had a short time limit ("speed test") - later items only seem difficult because many examinees did not reach them
ACTIVITY
3) Think of a question you can ask your fellow attendees that would probably have a difficulty value close to your group's assigned value
   Example: If you are assigned a difficulty value of .25, what is a question you could ask that only 25% of the attendees would know?
4) Think of 4 multiple choice options to go with your question, one of which is right
5) When you are ready, have one member of the group go up to the presenter and share the group's question and answers
6) WHEN TOLD THE SURVEY IS READY BY THE PRESENTER: Complete the combined survey online (link will be provided) - skip your own question
7) Access the spreadsheet link provided, and compute the difficulty for each question
8) How close was the actual difficulty to the difficulty you were assigned?

‣Remember: Item Difficulty = percentage of people answering the question correctly
‣Larger values = Easier
‣Smaller values = Harder
ITEM DISCRIMINATION
ITEM DISCRIMINATION: WHAT IS IT?
‣People who studied get the question right, people who didn’t study get the question
wrong
ITEM DISCRIMINATION
‣Imagine a test with THREE questions and FIVE students

Examinee  Q1  Q2  Q3
Huda       1   1   1
Ali        1   1   1
Hussain    0   0   1
Ahmed      0   1   0
Noor       0   0   1
ITEM DISCRIMINATION

Examinee  Q1  Q2  Q3  Total
Huda       1   1   1  100%
Ali        1   1   1  100%
Hussain    0   0   1   33%
Ahmed      0   1   0   33%
Noor       0   0   1   33%

‣Huda and Ali did the best (perfect scores)
‣Hussain, Ahmed, and Noor did the worst (33%)
‣Make one column that has whether people got the question right (1) or wrong (0) = Question Scores
‣Make another column that has people's total score on the exam (0 to 100%) = Total Scores
‣If above .20, item is useful for describing people’s overall ability
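One common way to compute item discrimination from the two columns described above is the correlation between each item's 1/0 scores and the total scores (a point-biserial correlation). This sketch uses the five-student example; the hand-rolled `pearson` helper is ours:

```python
from math import sqrt

# Sketch: item discrimination = correlation between question scores (1/0)
# and total exam scores, for the five-student example above.
scores = {
    "Huda":    [1, 1, 1],
    "Ali":     [1, 1, 1],
    "Hussain": [0, 0, 1],
    "Ahmed":   [0, 1, 0],
    "Noor":    [0, 0, 1],
}

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

totals = [sum(row) for row in scores.values()]       # Total Scores column
for q in range(3):
    item = [row[q] for row in scores.values()]       # Question Scores column
    print(f"Q{q + 1} discrimination: {pearson(item, totals):.2f}")
# Q1 -> 1.00, Q2 -> 0.67, Q3 -> 0.41 (all above .20, so all useful)
```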
ITEM DISCRIMINATION: DIAGNOSTICS
‣Low item discrimination is problematic: it suggests that people who know the concepts well overall were not any more likely to understand the specific concept in the question
‣Not meaningful if:
‣Partial credit is given for answers (some answers are less wrong than others)
ACTIVITY: ITEM DISCRIMINATION
1) Think to yourself about a multiple choice exam you might give in your respective field
for a specific topic
EXERCISE
3) Which of those questions, if answered correctly, would indicate that this person understands the topic well as a whole?
a) This question has good discrimination
b) Can tell you who likely has high knowledge and who has lower knowledge
‣True ability = what you actually know for the entire topic
‣If a test is 100% reliable, then the score a person receives is their true score, and they get the same score each time they retake the exam
‣If a test is not 100% reliable, then the score a person receives may be either higher or lower than their actual true score, and the next score might be different
‣If our test is unreliable, then a student’s ability is not reflected in the score received
TEST RELIABILITY: WHAT IS IT?
‣Parallel forms reliability: Consistency from one exam form and another
‣Internal consistency is how consistent the items are with the other
items
‣If you know an exam's internal consistency, then you know the worst case (a lower bound) of its reliability
Alpha       Interpretation
> .90       Excellent
.80 - .90   Good
.70 - .80   Acceptable
.60 - .70   Marginal
< .60       Poor
INTERNAL CONSISTENCY: HOW TO CALCULATE IT
‣How to Calculate Internal Consistency
‣For each question, make a column that has whether people got the question right (1) or
wrong (0)
‣Calculate the correlation between each right/wrong column and every other
right/wrong column
‣If K questions, then K * (K - 1) / 2 comparisons
‣If 20 questions, then 20 * 19 / 2 = 190 comparisons
‣If 40 questions, then 40 * 39 / 2 = 780 comparisons
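Those inter-item relationships feed into Cronbach's alpha. The slides do not give the formula, so this sketch uses the standard variance form, alpha = K / (K - 1) * (1 - sum of item variances / variance of total scores), applied to the five-student, three-question example:

```python
# Sketch: Cronbach's alpha from a 1/0 score matrix (rows = examinees).
def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(matrix):
    """alpha = K/(K-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(matrix[0])                                   # number of questions K
    items = [[row[q] for row in matrix] for q in range(k)]
    totals = [sum(row) for row in matrix]
    return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))

matrix = [
    [1, 1, 1],   # Huda
    [1, 1, 1],   # Ali
    [0, 0, 1],   # Hussain
    [0, 1, 0],   # Ahmed
    [0, 0, 1],   # Noor
]
print(f"alpha = {cronbach_alpha(matrix):.2f}")   # -> alpha = 0.50 (poor)
```

With only three items the alpha is low, which is consistent with the test-length point below: adding more questions on the same domain raises reliability.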
Test length: The more questions on the test, the more reliable the test will be.
Average inter-item correlation: The more the questions address a single common domain, the more reliable the test will be.
- All questions pertain to the same topic area = Higher average correlation between question scores
- All questions pertain to disparate topic areas = Lower average correlation between question scores

EXERCISE
2) What additional question could we ask that would probably INCREASE the average inter-item correlation?
‣Items represent too many distinct dimensions (too many concepts being asked)
‣Excel
‣SPSS
IMPLEMENTING ITEM ANALYSIS: EXCEL
Cronbach’s alpha = Internal Consistency
IMPLEMENTING ITEM ANALYSIS: SPSS
‣You can calculate item analysis with SPSS (or other packages such as R, Minitab, and Stata)
‣At the top menu, go to Analyze -> Scale -> Reliability Analysis
‣Click “Statistics” and ask for “item,” “scale,” and “scale if item deleted” statistics
‣Click “OK”
IMPLEMENTING ITEM ANALYSIS: SPSS - STEP 1 - CHOOSE ANALYSIS
IMPLEMENTING ITEM ANALYSIS: SPSS - STEP 2 - SELECT VARIABLES
IMPLEMENTING ITEM ANALYSIS: SPSS - STEP 3 - INTERPRETATION
Cronbach’s alpha = Internal consistency
‣Items can perform poorly due to wording ambiguity, lack of ability in that
domain, miscoding, lack of conceptual relevance, instructional issues
‣Item Response Theory - estimates each person's ability while taking into account the difficulty and discrimination of the items they answered correctly or incorrectly
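As a brief illustration of how IRT combines these ideas, the widely used two-parameter logistic (2PL) model gives the probability that a person with ability theta answers an item correctly, given the item's discrimination a and difficulty b. The parameter values below are invented for illustration:

```python
from math import exp

# Sketch: 2PL item response function.
# P(correct) = 1 / (1 + exp(-a * (theta - b)))
# theta = person ability, a = item discrimination, b = item difficulty.
def p_correct(theta, a, b):
    return 1 / (1 + exp(-a * (theta - b)))

# A high-ability examinee facing a hard, discriminating item (made-up values):
print(round(p_correct(theta=2.0, a=1.5, b=1.0), 2))   # -> 0.82
# The same item for an average examinee:
print(round(p_correct(theta=0.0, a=1.5, b=1.0), 2))   # -> 0.18
```

Higher a makes the curve steeper (the item discriminates more sharply around its difficulty b), which is exactly the discrimination idea from the earlier section.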
QUESTIONS