
Item Analysis - Outline


1. Types of test items
A. Selected response items
B. Constructed response items
2. Parts of test items
3. Guidelines for writing test items
Item Analysis - Outline
4. Item Analysis
A. Distracter measures
B. Item difficulty measures
C. Item discrimination measures
5. Item Response Theory
A. ICCs
B. Adaptive testing
1. Types of test items

A. Selected response
  - Multiple choice
  - Likert scale
  - Category
  - Q-sort
B. Constructed response
A. Selected response

• Multiple choice or forced choice
  - Task is to choose between set answers
  - Advantage: ease of scoring
  - Advantage: scoring requires little skill
  - Disadvantage: may test memory rather than comprehension
A. Selected response
• Multiple choice or forced choice
  - Correct response must be distinct
  - Distracters should not be obvious or ambiguous
  - If distracters are bad, more distracters = a less reliable test
  - Use 3-4 distracters per item
A. Selected response

• Multiple choice or forced choice
• Likert format
  - Test-taker chooses a point on a scale that expresses their attitude or belief
  - Data lend themselves to factor analysis
Likert scale example item

Parking costs at the university are fair

1 = Strongly agree
2 = Agree
3 = Neutral
4 = Disagree
5 = Strongly disagree
A. Selected response

• Multiple choice or forced choice
• Likert format
• Category
  - Similar to Likert but with more choices
  - Test-taker's commitment
  - Reliability depends on good instructions & # of categories (≤ 10)
  - Scoring shows context effects
A. Selected response

• Multiple choice or forced choice
• Likert format
• Category
• Q-sort
  - A large set of cards, each with a statement referring to a "target"
  - Test-taker sorts the cards into piles in terms of how accurate the statements are as a description of the target
  - Generally 9 piles
1. Types of test items

A. Selected response
B. Constructed response
  - Free response
  - Fill-in-the-blank
  - Essay tests
  - Portfolios
  - In-basket technique
B. Constructed response items

• Free response
  - Test-taker responds without constraint
  - Describes what is important to him/her
B. Constructed response items

• Free response
• Fill-in-the-blank
  - Used to test for knowledge or to find out about beliefs and attitudes
B. Constructed response items

• Free response
• Fill-in-the-blank
• Essay tests
  - Preferred when you want to assess the test-taker's ability to think analytically, integrate ideas, and express himself
B. Constructed response items

• Free response
• Fill-in-the-blank
• Essay tests
• Portfolios
  - Not really a test
  - Collections of things the person being evaluated has produced
  - Let you evaluate things you can't assess with a selected response test
B. Constructed response items

• Free response
• Fill-in-the-blank
• Essay tests
• Portfolios
• In-basket technique
  - Used in business
  - Job candidate gets a set of "everyday" problems and says how he or she would deal with those problems
  - Requires expert raters to grade responses
B. Constructed response items

• Strengths
  - Assess higher-order skills
  - More useful feedback to test-taker
  - Positive influence on study habits?
  - Easier to create items
B. Constructed response items

• Weaknesses
  - Time consuming to use
  - Possible subjectivity in scoring
2. Parts of test items

A. Stimulus or item stem


B. Response format or method
C. Conditions governing the response
D. Procedures for scoring the response
2. Parts of test items

A. Stimulus or item stem
  - What the subject responds to
2. Parts of test items

B. Response format or method
  - Typically multiple choice or constructed response
2. Parts of test items

C. Conditions governing the response
  - e.g., time limits; allowing probes for ambiguous responses; how the response is recorded...
2. Parts of test items

D. Procedures for scoring the response
  - Particularly important for constructed response items
2. Parts of test items

• To some extent, your choices on each of these parts will be dictated by:
• Precedent
  - What did you do last time?
• Experience
  - Did that work?
• Practical considerations
  - How many people have to be tested?
  - How much time is available?
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
F. Break any response “set”
3. Writing test items – guidelines

A. Define clearly
  - Why are you testing?
  - What do you want to know?
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
  - The larger the pool of items you select from, the better the test
  - Selection from this pool is based on item analysis (see below)
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
  - Level too low? More sophisticated test-takers may get bored
  - Level too high? You're testing reading skill as well as the domain you think you're testing
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
  - Then the meaning of the response is clear
3. Writing test items – guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
  - Longer items are more likely to be misinterpreted by test-takers
  - Short items are more likely to be unitary
3. Writing test items - guidelines

A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
F. Break any response "set"
  - Use reverse-scored items to prevent test-takers from getting into a response set such as just responding "5" for every item on a Likert scale (see the sketch below)
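A minimal sketch of the reverse-scoring just mentioned, in Python. The 1-5 scale, the item names q1-q4, and the response records are hypothetical; the recoding rule for reverse-worded items is simply new score = 6 - old score.

# Reverse-score selected items on a 1-5 Likert scale: 1<->5, 2<->4, 3 stays 3.
SCALE_MAX = 5                   # highest point on the scale
REVERSED_ITEMS = {"q2", "q4"}   # hypothetical reverse-worded items

# Hypothetical data: each dict is one test-taker's responses.
responses = [
    {"q1": 5, "q2": 1, "q3": 4, "q4": 2},
    {"q1": 5, "q2": 5, "q3": 5, "q4": 5},  # possible "all 5s" response set
]

def reverse_score(record):
    """Return a copy of one person's responses with reverse-worded items recoded."""
    return {
        item: (SCALE_MAX + 1 - score) if item in REVERSED_ITEMS else score
        for item, score in record.items()
    }

scored = [reverse_score(r) for r in responses]
print(scored[0])  # {'q1': 5, 'q2': 5, 'q3': 4, 'q4': 4}
print(scored[1])  # {'q1': 5, 'q2': 1, 'q3': 5, 'q4': 1} - no longer uniformly high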
4. Item analysis

A. Multiple choice distracter analysis


B. Item difficulty measure P
C. Discrimination index D
D. Item-total correlation
A. Multiple choice – distracter measures

• How many people choose each distracter? (see the sketch below)
  - Distracters should be equally attractive
  - Correct choice should be based on knowledge
  - Where knowledge is lacking, choice should be random
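A small sketch of a distracter tally, assuming responses to one multiple-choice item are stored as the option letter each test-taker chose; the item, the keyed answer, and the response list are hypothetical.

from collections import Counter

key = "B"  # hypothetical keyed (correct) answer; A, C, D are distracters
choices = ["B", "C", "B", "A", "B", "C", "B", "D", "C", "B"]  # one letter per test-taker

counts = Counter(choices)
print(f"correct ({key}): {counts[key]} of {len(choices)}")
for option in sorted(set(choices) - {key}):
    print(f"distracter {option}: {counts[option]}")
# Roughly equal distracter counts suggest people without the knowledge are
# guessing at random; one dominant distracter may be ambiguous or misleading.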
B. Item Difficulty Measure P

• Difficulty determined by item and population tested

      P(i) = (# who got item i correct) / (# taking the test)
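A minimal sketch of the P(i) formula above, assuming item responses have already been scored 0 (incorrect) or 1 (correct); the small score matrix is hypothetical.

# Rows = test-takers, columns = items, entries = 1 (correct) or 0 (incorrect).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

n_takers = len(scores)
p_values = [sum(row[i] for row in scores) / n_takers for i in range(len(scores[0]))]
print(p_values)  # [1.0, 0.5, 0.25] - the item with P = 1.0 cannot distinguish ability levels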
B. Item Difficulty Measure P

• P = .50 is best
• P = 0 or P = 1 – such items do not distinguish ability levels
C. Item Discrimination Measures

• Discrimination index D
• Item-total correlation
Discrimination Index D

• Extreme groups method

      D = U/nU - L/nL

  - U = # getting item correct in 'top' group
  - L = # getting item correct in 'bottom' group
  - nU = # in top group
  - nL = # in bottom group
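A sketch of the extreme-groups D from the formula above. The 27% top/bottom split used here is a common convention rather than part of the slide's definition, and the item scores and total scores are hypothetical.

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """D = U/nU - L/nL using top and bottom groups defined by total test score."""
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    n_group = max(1, int(len(ranked) * fraction))
    bottom, top = ranked[:n_group], ranked[-n_group:]
    u = sum(item_scores[i] for i in top)     # correct in 'top' group
    l = sum(item_scores[i] for i in bottom)  # correct in 'bottom' group
    return u / len(top) - l / len(bottom)

# Hypothetical data: 0/1 scores on one item and total test scores, same order.
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
total = [9, 8, 8, 7, 6, 5, 4, 4, 3, 2]
print(discrimination_index(item, total))  # 1.0: top group all correct, bottom group all wrong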
Item Total Correlation

• Good item
  - High correlation
  - People who get the item correct have a high score on the test
  - People who get the item wrong have a low score on the test
• Poor item
  - Low correlation: look at wording – may be testing reading skill
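A sketch of an item-total correlation as a Pearson correlation between 0/1 scores on one item and total test scores. Correlating against the total with the item itself removed (a "corrected" item-total correlation) is a common variant and an assumption here, as is the small score matrix.

from statistics import mean

def pearson_r(x, y):
    """Plain Pearson correlation (assumes both variables have some variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical 0/1 score matrix: rows = test-takers, columns = items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

item = [row[0] for row in scores]
rest = [sum(row) - row[0] for row in scores]  # total excluding the item itself
print(round(pearson_r(item, rest), 2))  # 0.61 for this toy data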
5. Item Response Theory

A. Item characteristic curves


B. Adaptive testing using computers
A. Item characteristic curves

• Most important idea: Item Characteristic Curves (ICCs)
  - One curve for each test item
  - X axis: test-taker ability (given by test score)
  - Y axis: probability of choosing an answer
[Figure: example ICCs for Items 1, 2, and 3, plotting probability of a correct response against test score]
A. Item Characteristic Curves

• Slope: how quickly the curve rises (see the sketch below)
  - Indicates how well the item discriminates among persons of differing abilities
  - Analogous to the discrimination index in Classical Test Theory
  - But sample-invariant
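A sketch of ICCs under the two-parameter logistic model, P(correct) = 1 / (1 + exp(-a(theta - b))), where b is the item's difficulty and a its slope (discrimination). The 2PL form and the parameter values are illustrative assumptions; the slides do not commit to a particular IRT model.

import math

def icc(theta, a, b):
    """2PL item characteristic curve: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items with the same difficulty (b = 0) but different slopes.
for theta in (-2, -1, 0, 1, 2):
    steep = icc(theta, a=2.0, b=0.0)    # rises quickly: strong discrimination
    shallow = icc(theta, a=0.5, b=0.0)  # rises slowly: weak discrimination
    print(f"theta={theta:+d}  steep={steep:.2f}  shallow={shallow:.2f}")
# The steeper curve separates low- and high-ability test-takers much more sharply.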
Problems with Item Response Theory

• Obtaining stable estimates of IRT parameters requires rather large samples
• IRT model assumes that the trait being measured is one-dimensional. It may not be.
• Computationally complex
B. Adaptive Testing Using Computers

• Computer selects harder or easier questions as the test-taker gets each question right or wrong
  - Lets you tailor questions for each test-taker
  - Test-taker does not spend most of their time with questions that are too easy or too difficult
B. Adaptive Testing Using Computers

• Facilitates testing of diverse ability groups
• Output = level of difficulty the test-taker can deal with (see the sketch below)
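A toy sketch of the adaptive logic on these two slides: move to a harder item after a correct answer and to an easier one after a miss, then report the level the test-taker ends up handling. Real computerized adaptive tests select items using IRT-based ability estimates; the simple step rule, the 10-level item bank, and the simulated test-taker below are illustrative only.

import random

LEVELS = range(1, 11)  # hypothetical item bank: difficulty 1 (easiest) to 10 (hardest)

def simulate_answer(true_ability, difficulty):
    """Stand-in for a real response: mostly correct at or below the taker's ability."""
    return random.random() < (0.9 if difficulty <= true_ability else 0.2)

def adaptive_test(true_ability, n_items=10):
    level = 5  # start in the middle of the difficulty range
    for _ in range(n_items):
        correct = simulate_answer(true_ability, level)
        # Harder question after a correct answer, easier after an incorrect one.
        level = min(max(level + (1 if correct else -1), min(LEVELS)), max(LEVELS))
    return level  # roughly the difficulty level this test-taker can deal with

random.seed(0)
print(adaptive_test(true_ability=7))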
