
Item Analysis - Outline


1. Types of test items
A. Selected response items
B. Constructed response items
2. Parts of test items
3. Guidelines for writing test items
Item Analysis - Outline
4. Item Analysis
A. Distracter measures
B. Item difficulty measures
C. Item discrimination measures
5. Item Response Theory
A. ICCs
B. Adaptive testing
1. Types of test items

A. Selected response
  - Multiple choice
  - Likert scale
  - Category
  - Q-sort
B. Constructed response
A. Selected response

• Multiple choice or forced choice
  - Task is to choose between set answers
  - Advantage: ease of scoring
  - Advantage: scoring requires little skill
  - Disadvantage: may test memory rather than comprehension
A. Selected response
• Multiple choice or forced choice
  - Correct response must be distinct
  - Distracters should not be obvious or ambiguous
  - If distracters are bad, more distracters = a less reliable test
  - Use 3-4 distracters per item
A. Selected response

• Multiple choice or forced choice
• Likert format
  - Test-taker chooses a point on a scale that expresses their attitude or belief
  - Data lend themselves to factor analysis
Likert scale example item

Parking costs at the university are fair

1 = Strongly agree
2 = Agree
3 = Neutral
4 = Disagree
5 = Strongly disagree
A. Selected response

• Multiple choice or forced choice
• Likert format
• Category
  - Similar to Likert but with more choices
  - Test-taker's commitment
  - Reliability depends on good instructions & # of categories (≤ 10)
  - Scoring shows context effects
A. Selected response

• Multiple choice or forced choice
• Likert format
• Category
• Q-sort
  - A large set of cards, each with a statement referring to a "target"
  - Test-taker sorts the cards into piles in terms of how accurate the statements are as a description of the target
  - Generally 9 piles
1. Types of test items

A. Selected response
B. Constructed response
  - Free response
  - Fill-in-the-blank
  - Essay tests
  - Portfolios
  - In-basket technique
B. Constructed response items

• Free response
  - Test-taker responds without constraint
  - Describes what is important to him/her
B. Constructed response items

• Free response
• Fill-in-the-blank
  - Used to test for knowledge or to find out about beliefs and attitudes
B. Constructed response items

• Free response
• Fill-in-the-blank
• Essay tests
  - Preferred when you want to assess the test-taker's ability to think analytically, integrate ideas, and express himself
B. Constructed response items

• Free response
• Fill-in-the-blank
• Essay tests
• Portfolios
  - Not really a test
  - Collections of things the person being evaluated has produced
  - Let you evaluate things you can't assess with a selected response test
B. Constructed response items

• Free response
• Fill-in-the-blank
• Essay tests
• Portfolios
• In-basket technique
  - Used in business
  - Job candidate gets a set of "everyday" problems and says how he or she would deal with those problems
  - Requires expert raters to grade responses
B. Constructed response items

• Strengths
  - Assess higher-order skills
  - More useful feedback to test-taker
  - Positive influence on study habits?
  - Easier to create items
B. Constructed response items

• Weaknesses
  - Time consuming to use
  - Possible subjectivity in scoring
2. Parts of test items

A. Stimulus or item stem


B. Response format or method
C. Conditions governing the response
D. Procedures for scoring the response
2. Parts of test items

A. Stimulus or item stem
  - What the subject responds to
2. Parts of test items

B. Response format or method
  - Typically multiple choice or constructed response
2. Parts of test items

C. Conditions governing the response
  - e.g., time limits; allowing probes for ambiguous responses; how the response is recorded...
2. Parts of test items

D. Procedures for scoring the response
  - Particularly important for constructed response items
2. Parts of test items

• To some extent, your choices on each of these parts will be dictated by:
• Precedent
  - What did you do last time?
• Experience
  - Did that work?
• Practical considerations
  - How many people have to be tested?
  - How much time is available?
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
F. Break any response “set”
3. Writing test items – guidelines

A. Define clearly
  - Why are you testing?
  - What do you want to know?
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
  - The larger the pool of items you select from, the better the test
  - Selection from this pool is based on item analysis (see below)
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
  - Level too low? More sophisticated test-takers may get bored
  - Level too high? You're testing reading skill as well as the domain you think you're testing
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
  - Then the meaning of the response is clear
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
  - Longer items are more likely to be misinterpreted by test-takers
  - Short items are more likely to be unitary
3. Writing test items - guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
F. Break any response "set"
  - Use reverse-scored items to prevent test-takers from getting into a response set such as just responding "5" for every item on a Likert scale (see the sketch below)
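A minimal sketch of the reverse-scoring just mentioned, in Python. The 1-5 scale, the item names q1-q4, and the response records are hypothetical; the recoding rule for reverse-worded items is simply new score = 6 - old score.

# Reverse-score selected items on a 1-5 Likert scale: 1<->5, 2<->4, 3 stays 3.
SCALE_MAX = 5                   # highest point on the scale
REVERSED_ITEMS = {"q2", "q4"}   # hypothetical reverse-worded items

# Hypothetical data: each dict is one test-taker's responses.
responses = [
    {"q1": 5, "q2": 1, "q3": 4, "q4": 2},
    {"q1": 5, "q2": 5, "q3": 5, "q4": 5},  # possible "all 5s" response set
]

def reverse_score(record):
    """Return a copy of one person's responses with reverse-worded items recoded."""
    return {
        item: (SCALE_MAX + 1 - score) if item in REVERSED_ITEMS else score
        for item, score in record.items()
    }

scored = [reverse_score(r) for r in responses]
print(scored[0])  # {'q1': 5, 'q2': 5, 'q3': 4, 'q4': 4}
print(scored[1])  # {'q1': 5, 'q2': 1, 'q3': 5, 'q4': 1} - no longer uniformly high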
4. Item analysis

A. Multiple choice distracter analysis


B. Item difficulty measure P
C. Discrimination index D
D. Item-total correlation
A. Multiple choice – distracter measures

• How many people choose each distracter? (see the sketch below)
  - Distracters should be equally attractive
  - Correct choice should be based on knowledge
  - Where knowledge is lacking, choice should be random
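A small sketch of a distracter tally, assuming responses to one multiple-choice item are stored as the option letter each test-taker chose; the item, the keyed answer, and the response list are hypothetical.

from collections import Counter

key = "B"  # hypothetical keyed (correct) answer; A, C, D are distracters
choices = ["B", "C", "B", "A", "B", "C", "B", "D", "C", "B"]  # one letter per test-taker

counts = Counter(choices)
print(f"correct ({key}): {counts[key]} of {len(choices)}")
for option in sorted(set(choices) - {key}):
    print(f"distracter {option}: {counts[option]}")
# Roughly equal distracter counts suggest people without the knowledge are
# guessing at random; one dominant distracter may be ambiguous or misleading.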
B. Item Difficulty Measure P

• Difficulty determined by item and population tested

      P(i) = (# who got item i correct) / (# taking the test)
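A minimal sketch of the P(i) formula above, assuming item responses have already been scored 0 (incorrect) or 1 (correct); the small score matrix is hypothetical.

# Rows = test-takers, columns = items, entries = 1 (correct) or 0 (incorrect).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

n_takers = len(scores)
p_values = [sum(row[i] for row in scores) / n_takers for i in range(len(scores[0]))]
print(p_values)  # [1.0, 0.5, 0.25] - the item with P = 1.0 cannot distinguish ability levels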
B. Item Difficulty Measure P

• P = .50 is best
• P = 0 or P = 1 – such items do not distinguish ability levels
C. Item Discrimination Measures

• Discrimination index D
• Item-total correlation
Discrimination Index D

• Extreme groups method

      D = U/nU - L/nL

  - U = # getting item correct in 'top' group
  - L = # getting item correct in 'bottom' group
  - nU = # in top group
  - nL = # in bottom group
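A sketch of the extreme-groups D from the formula above. The 27% top/bottom split used here is a common convention rather than part of the slide's definition, and the item scores and total scores are hypothetical.

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """D = U/nU - L/nL using top and bottom groups defined by total test score."""
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    n_group = max(1, int(len(ranked) * fraction))
    bottom, top = ranked[:n_group], ranked[-n_group:]
    u = sum(item_scores[i] for i in top)     # correct in 'top' group
    l = sum(item_scores[i] for i in bottom)  # correct in 'bottom' group
    return u / len(top) - l / len(bottom)

# Hypothetical data: 0/1 scores on one item and total test scores, same order.
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
total = [9, 8, 8, 7, 6, 5, 4, 4, 3, 2]
print(discrimination_index(item, total))  # 1.0: top group all correct, bottom group all wrong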
Item Total Correlation

• Good item
  - High correlation
  - People who get the item correct have a high score on the test
  - People who get the item wrong have a low score on the test
• Poor item
  - Low correlation: look at wording – may be testing reading skill
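A sketch of an item-total correlation as a Pearson correlation between 0/1 scores on one item and total test scores. Correlating against the total with the item itself removed (a "corrected" item-total correlation) is a common variant and an assumption here, as is the small score matrix.

from statistics import mean

def pearson_r(x, y):
    """Plain Pearson correlation (assumes both variables have some variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical 0/1 score matrix: rows = test-takers, columns = items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

item = [row[0] for row in scores]
rest = [sum(row) - row[0] for row in scores]  # total excluding the item itself
print(round(pearson_r(item, rest), 2))  # 0.61 for this toy data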
5. Item Response Theory

A. Item characteristic curves


B. Adaptive testing using computers
A. Item characteristic curves

• Most important idea: Item Characteristic Curves (ICCs)
  - One curve for each test item
  - X axis: test-taker ability (given by test score)
  - Y axis: probability of choosing an answer
[Figure: example ICCs for Items 1, 2, and 3, plotting probability of a correct response against test score]
A. Item Characteristic Curves

• Slope: how quickly the curve rises (see the sketch below)
  - Indicates how well the item discriminates among persons of differing abilities
  - Analogous to the discrimination index in Classical Test Theory
  - But sample-invariant
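A sketch of ICCs under the two-parameter logistic model, P(correct) = 1 / (1 + exp(-a(theta - b))), where b is the item's difficulty and a its slope (discrimination). The 2PL form and the parameter values are illustrative assumptions; the slides do not commit to a particular IRT model.

import math

def icc(theta, a, b):
    """2PL item characteristic curve: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items with the same difficulty (b = 0) but different slopes.
for theta in (-2, -1, 0, 1, 2):
    steep = icc(theta, a=2.0, b=0.0)    # rises quickly: strong discrimination
    shallow = icc(theta, a=0.5, b=0.0)  # rises slowly: weak discrimination
    print(f"theta={theta:+d}  steep={steep:.2f}  shallow={shallow:.2f}")
# The steeper curve separates low- and high-ability test-takers much more sharply.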
Problems with Item Response Theory

• Obtaining stable estimates of IRT parameters requires rather large samples
• IRT model assumes that the trait being measured is one-dimensional. It may not be.
• Computationally complex
B. Adaptive Testing Using Computers

• Computer selects harder or easier questions as the test-taker gets each question right or wrong
  - Lets you tailor questions for each test-taker
  - Test-taker does not spend most of their time with questions that are too easy or too difficult
B. Adaptive Testing Using Computers

• Facilitates testing of diverse ability groups
• Output = level of difficulty the test-taker can deal with (see the sketch below)
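A toy sketch of the adaptive logic on these two slides: move to a harder item after a correct answer and to an easier one after a miss, then report the level the test-taker ends up handling. Real computerized adaptive tests select items using IRT-based ability estimates; the simple step rule, the 10-level item bank, and the simulated test-taker below are illustrative only.

import random

LEVELS = range(1, 11)  # hypothetical item bank: difficulty 1 (easiest) to 10 (hardest)

def simulate_answer(true_ability, difficulty):
    """Stand-in for a real response: mostly correct at or below the taker's ability."""
    return random.random() < (0.9 if difficulty <= true_ability else 0.2)

def adaptive_test(true_ability, n_items=10):
    level = 5  # start in the middle of the difficulty range
    for _ in range(n_items):
        correct = simulate_answer(true_ability, level)
        # Harder question after a correct answer, easier after an incorrect one.
        level = min(max(level + (1 if correct else -1), min(LEVELS)), max(LEVELS))
    return level  # roughly the difficulty level this test-taker can deal with

random.seed(0)
print(adaptive_test(true_ability=7))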
