Week 13 - Student
Test Development
© McGraw Hill 1
Test Conceptualization
Some preliminary questions include:
Norm-referenced versus criterion-referenced tests: Item development issues
Generally, a good item on a norm-referenced achievement test is an item
for which high scorers on the test respond correctly and low scorers
respond incorrectly.
Test items may be pilot studied to evaluate whether they should be included
in the final form of the instrument.
What is a pilot study?
• During pilot work, the test developer aims to determine the most
effective way to measure the targeted construct.
Test Construction
Types of scales
• Age-based scale
• Grade-based scale
• Stanine scale
• Unidimensional & Multidimensional Scale
• Comparative & Categorical Scale
Scaling Methods
An example of a Likert scale item: a statement rated on a five-point continuum ranging from strongly disagree to strongly agree.
Method of paired comparisons: Test takers must choose between two
alternatives according to some rule.
• Example: Select the behavior that you think would be more justified:
Comparative scaling: Entails judgments of a stimulus in comparison
with every other stimulus on the scale.
Categorical scaling: Stimuli are placed into one of two or more
alternative categories that differ quantitatively with respect to some
continuum.
If the following scale were a perfect Guttman scale, then all
respondents who agree with “a” (the most extreme position) should
also agree with “b,” “c,” and “d.”
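The cumulative property described above can be sketched in code (a hypothetical helper; the function name and the 0/1 coding of agree/disagree are assumptions for illustration):

```python
# Illustrative check of cumulative (Guttman) consistency. Items are
# ordered from most extreme ("a") to least extreme ("d"); in a perfect
# Guttman pattern, agreement with an item implies agreement with every
# less extreme item that follows it.

def is_guttman_consistent(pattern):
    """pattern: list of 0/1 answers ordered from most to least extreme.
    Returns True if the pattern fits a perfect Guttman scale."""
    agreed = False
    for answer in pattern:
        if answer == 1:
            agreed = True          # agreed with this (more extreme) item
        elif agreed:               # a later, less extreme item disagreed
            return False
    return True
```

A pattern such as [1, 0, 1, 1], agreeing with the most extreme item but not with a milder one, would not fit a perfect Guttman scale.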
Writing Items
• Item pool: The reservoir or well from which items will or will not be
drawn for the final version of the test.
• Comprehensive sampling provides a basis for content validity of the
final version of the test.
• When devising a standardized test using a multiple-choice format, it is
usually advisable that the first draft contain approximately twice the
number of items that the final version of the test will contain.
• How does one develop items for the item pool?
• The test developer may write a large number of items from
personal experience or academic acquaintance with the subject
matter.
• Help may also be sought from others, including experts.
• Searches through the academic research literature may prove
fruitful, as may searches through other databases.
Selected-response format:
1. Multiple-choice
2. Matching
3. True-false
Multiple-choice format has three elements:
(1) a stem
(2) a correct alternative or option
(3) several incorrect alternatives or options variously referred to as
distractors or foils.
Other commonly used selected-response
formats include matching and
true–false items.
❖ In a matching item, the testtaker is
presented with two columns: premises
on the left and responses on the right.
The testtaker’s task is to determine
which response is best associated with
which premise.
❖ A format that offers only two
possible responses is called a binary-
choice item.
Perhaps the most familiar binary-choice
item is the true–false item.
Constructed-response format: Items require test takers to supply or to create the
correct answer, not merely to select it.
1. completion item
2. short answer
3. essay
• A completion item requires the examinee to provide a word or phrase that completes
a sentence, as in the following example:
Standard deviation of data indicates the spread of scores from the _________
❖ A good completion item should be worded so that the correct answer is specific.
❖ A completion item may also be referred to as a short-answer item.
❖ There are no hard-and-fast rules for how short an answer must be to be considered
a short answer.
❖ But beyond a paragraph or two, the item is more properly referred to as an essay item,
which requires writing a composition, typically one that demonstrates recall of facts,
understanding, analysis, and/or interpretation.
Writing items for computer administration.
• Item bank: A relatively large and easily accessible collection of test
questions.
• Computerized adaptive testing (CAT): An interactive, computer-
administered test-taking process wherein items presented to the test
taker are based in part on the test taker’s performance on previous
items.
• CAT provides economy in testing time and number of items
presented.
• CAT tends to reduce floor effects and ceiling effects.
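The adaptive idea can be sketched as a toy rule (an assumption for illustration only; real CAT systems estimate ability with IRT models rather than a fixed step):

```python
# Minimal sketch of the adaptive logic behind CAT: raise the target
# difficulty after a correct answer, lower it after an incorrect one.
# The 0-1 difficulty scale and fixed step size are simplifying assumptions.

def next_item_difficulty(current, answered_correctly, step=0.1):
    """Return the difficulty (0-1 scale) to target for the next item."""
    if answered_correctly:
        return min(1.0, current + step)
    return max(0.0, current - step)
```

Because each item is chosen near the test taker's demonstrated level, fewer items are wasted on questions that are far too easy or far too hard, which is the source of CAT's economy in testing time.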
Ceiling or floor effects occur when a test or scale is so easy or so
difficult that a substantial proportion of individuals obtain the
maximum or minimum score, so the true extent of their abilities
cannot be determined.
Ceiling effect
❖ occurs when scores on a variable reach a level that cannot be
exceeded
Floor effect
❖ occurs when scores reach a value below which a response cannot be made
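A simple way to screen for these effects is to check what share of respondents sit at the scale's bounds (a hypothetical helper; the 20% threshold is an assumed illustration, not a standard cutoff):

```python
# Hypothetical screen for ceiling and floor effects: flag a measure when
# a substantial share of respondents hit its maximum or minimum score.
# The 20% threshold is an illustrative assumption, not a fixed rule.

def score_effects(scores, min_score, max_score, threshold=0.20):
    n = len(scores)
    at_max = sum(s == max_score for s in scores) / n   # share at ceiling
    at_min = sum(s == min_score for s in scores) / n   # share at floor
    return {"ceiling": at_max >= threshold, "floor": at_min >= threshold}
```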
Scoring Items
Test Tryout
What is a good item?
Item Analysis
The nature of item analysis will vary depending on the goals of the test
developer.
Among the tools test developers might employ to analyze and select
items are:
• Item-difficulty index: The proportion of respondents answering an
item correctly.
• An index of an item’s difficulty (denoted p) is obtained by calculating
the proportion of the total number of testtakers who answered the item
correctly.
• If 50 of the 100 examinees answered the item correctly, p = .50; if 75
of the 100 examinees answered the item correctly, p = .75.
• Note that the larger the item-difficulty index, the easier the item.
• For maximum discrimination among the abilities of the testtakers, the
optimal average item difficulty is approximately .5, with individual
items on the test ranging in difficulty from about .3 to .8.
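The computation of p can be sketched directly (the 0/1 coding of responses, 1 = correct, is assumed):

```python
# Item-difficulty index p from the slide: the proportion of testtakers
# answering the item correctly (responses coded 1 = correct, 0 = incorrect).

def item_difficulty(responses):
    return sum(responses) / len(responses)
```

With this helper, 50 correct out of 100 gives p = .50 and 75 correct out of 100 gives p = .75, matching the slide's examples; larger p means an easier item.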
Item-validity index: Allows test developers to evaluate the validity of
items in relation to a criterion measure.
The higher the item-validity index, the greater the test’s criterion-related
validity.
Item-discrimination index (d): Indicates how adequately an item
separates or discriminates between high scorers and low scorers on an
entire test.
• A measure of the difference between the proportion of high scorers
answering an item correctly and the proportion of low scorers
answering the item correctly.
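The index d can be computed from the two groups' counts (a minimal sketch; the group sizes in the usage note are hypothetical):

```python
# Item-discrimination index d: proportion correct in the upper-scoring
# group minus proportion correct in the lower-scoring group.

def item_discrimination(upper_correct, upper_n, lower_correct, lower_n):
    return upper_correct / upper_n - lower_correct / lower_n
```

For example, if 20 of 25 high scorers but only 5 of 25 low scorers answer an item correctly, d = .80 - .20 = .60, indicating good discrimination.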
Analysis of item alternatives: The quality of each alternative within a
multiple-choice item can be readily assessed with reference to the
comparative performance of upper and lower scorers.
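The comparison of upper and lower scorers across alternatives can be sketched as a simple tally (a hypothetical helper; the option labels "a"-"d" are assumed):

```python
# Hypothetical tally for analyzing item alternatives: count how often the
# upper- and lower-scoring groups chose each option of a multiple-choice
# item. A good distractor draws low scorers more often than high scorers.
from collections import Counter

def alternative_table(upper_choices, lower_choices, alternatives="abcd"):
    upper, lower = Counter(upper_choices), Counter(lower_choices)
    return {alt: {"upper": upper[alt], "lower": lower[alt]}
            for alt in alternatives}
```

If the keyed answer is "a", one would hope to see most upper-group choices concentrated there, with lower-group choices spread across the distractors.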
Item characteristic curves (ICC): A graphic representation of item
difficulty and discrimination.
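In item response theory, such curves are commonly modeled with a logistic function; a sketch of the standard two-parameter logistic (2PL) model, offered as an illustration rather than material from the slides:

```python
import math

# Two-parameter logistic (2PL) item characteristic curve: the probability
# of a correct response rises with ability theta; the slope at the curve's
# midpoint reflects discrimination a, and its location reflects difficulty b.

def icc_2pl(theta, a, b):
    """P(correct | ability theta) for an item with discrimination a
    and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

At theta = b the probability is exactly .5, and a steeper curve (larger a) corresponds to a more discriminating item.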
• On Item C, test takers of high
ability tend to do better; it is a
good item.
• Item D shows a high level of
discrimination; it might be good
if cut scores are being used.
Other Considerations in Item Analysis
Guessing: Test developers and users must decide whether they wish to
correct for guessing, but to date, there has not been any satisfactory
solution to the problem of guessing.
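One classical attempt is the correction-for-guessing formula, score = R - W/(k - 1), which penalizes wrong answers under the assumption of random guessing (as the slide notes, no correction fully solves the problem):

```python
# Classical correction for guessing: R - W / (k - 1), where R is the
# number right, W the number wrong, and k the number of options per item.
# Omitted items are neither rewarded nor penalized.

def corrected_score(right, wrong, options_per_item):
    return right - wrong / (options_per_item - 1)
```

For a four-option test with 30 right and 12 wrong, the corrected score is 30 - 12/3 = 26.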
Speed tests: Item analyses of tests taken under speed conditions yield
misleading or uninterpretable results.
• The closer an item is to the end of the test, the more difficult it may
appear to be.
• In a similar vein, measures of item discrimination may be artificially high
for late-appearing items. This is so because testtakers who know the
material better may work faster and are thus more likely to answer the
later items.
Qualitative Item Analysis
Test Revision
Revision in the life cycle of a test
• Existing tests may be revised if the stimulus material or verbal
material is dated, some outdated words are not readily understood or
are offensive, norms no longer represent the population, psychometric
properties could be improved, or the underlying theory behind the test
has changed.
• In test revision, the same steps are followed as with new tests (i.e.,
test conceptualization, construction, item analysis, tryout, and
revision).
Revision in the Life Cycle of a Test
ANY QUESTIONS?