Chapter 8 - Test Development

Chapter Topics
1. Test conceptualization
2. Test construction
3. Test tryout
4. Item analysis
5. Test revision

Test Development Process
1. Test Conceptualization
2. Test Construction
3. Test Tryout
4. Item Analysis
5. Test Revision

Test Conceptualization
● This is the beginning of any published test.
● An emerging social phenomenon or pattern of behavior might serve as the stimulus for the development of a new test.

Preliminary Questions
1. What is the test designed to measure?
2. What is the objective of the test?
3. Is there a need for this test?
4. Who will use this test?
5. Who will take this test?
6. What content will the test cover?
7. How will the test be administered?
8. What is the ideal format of the test?
9. Should more than one form of the test be developed?
10. What special training will be required of test users for administering or interpreting the test?
11. What type of response will be required of test takers?
12. Who benefits from an administration of this test?
13. Is there any potential for harm as a result of an administration of this test?
14. How will meaning be attributed to scores on this test?

Pilot Work
● Also called a pilot study or pilot research.
● Test items may be pilot studied (or piloted) to evaluate whether they should be included in the final form of the instrument. In developing a structured interview to measure introversion/extraversion, for example, pilot research may involve open-ended interviews with research subjects believed for some reason (perhaps on the basis of an existing test) to be introverted or extraverted.

Test Construction

1. SCALING
● Scaling may be defined as the process of setting rules for assigning numbers in measurement.

Types of Scaling
1. Age-Based Scale
● Interest is in test performance as a function of age.
2. Grade-Based Scale
● Interest is in test performance as a function of grade.
3. Stanine Scale
● Raw scores are transformed into scores that range from 1 to 9.
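The stanine transformation can be made concrete with a short sketch. This is only an illustration that assumes stanines are approximated linearly from z-scores (mean 5, SD 2, clipped to the 1-9 range); exact stanines are usually assigned from fixed percentage bands of the normative distribution, and the sample scores below are invented.

```python
# Illustrative sketch: approximate stanines (1-9) from raw scores.
# Assumes a linear z-score rescaling (mean 5, SD 2); real stanine
# tables are built from fixed percentage bands of the norm group.
from statistics import mean, stdev

def to_stanines(raw_scores):
    m, s = mean(raw_scores), stdev(raw_scores)
    stanines = []
    for x in raw_scores:
        z = (x - m) / s                                    # standardize the raw score
        stanines.append(min(9, max(1, round(2 * z + 5))))  # rescale and clip to 1-9
    return stanines

print(to_stanines([42, 55, 61, 48, 70, 39, 52]))  # one stanine per raw score
```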
Scaling Methods
1. Rating Scale
● A grouping of words, statements, or symbols on which the test taker indicates judgments of the strength of a particular trait, attitude, or emotion.
2. Summative Scale
● The final test score is obtained by summing the ratings across all of the items (a scoring sketch follows this list).
3. Likert Scale
● Presents five to seven alternative responses, usually on a continuum such as Agree/Disagree or Approve/Disapprove.
4. Paired Comparisons
● Test takers are presented with two stimuli, which they must compare in order to select one.
● Example: Select the behavior that you think is more justified:
a. Cheating on taxes if one has a chance.
b. Accepting a bribe in the course of one’s duties.
5. Comparative Scale
● Entails judgments of a stimulus in comparison with every other stimulus on the scale.
● Example of comparative scaling: rank the following according to beauty:
_____ Angel Locsin
_____ Marian Rivera
_____ Anne Curtis
_____ Heart Evangelista
_____ Toni Gonzaga
6. Categorical Scale
● Created by placing stimuli into alternative categories that differ quantitatively.
7. Categorical Scaling (example)
● Thirty cards with various scenarios/situations; the respondent judges whether each scenario is Beautiful, Average, or Ugly.
8. Guttman Scale
● Arranged so that all respondents who agree with the stronger statements will also agree with the milder statements.
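Because summative (Likert-type) scales are so common, a small sketch of how such a scale is typically scored may help. It is only an illustration under assumed conventions: items are rated on a 1-5 scale, and hypothetical reverse-worded items are recoded before the ratings are summed; the item ratings themselves are invented.

```python
# Illustrative summative (Likert-type) scoring sketch.
# Assumes a 1-5 response scale; reverse-worded items are recoded as
# (high + low - rating) before the item ratings are summed.
def summative_score(ratings, reverse_keyed=(), low=1, high=5):
    total = 0
    for item, rating in enumerate(ratings):
        if item in reverse_keyed:
            rating = high + low - rating   # flip a reverse-worded item
        total += rating                    # summative scale: sum of the item ratings
    return total

# One respondent's ratings on a six-item scale; items 2 and 5 are reverse-worded.
print(summative_score([4, 5, 2, 3, 4, 1], reverse_keyed={2, 5}))  # -> 25
```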
Test Construction

2. WRITING ITEMS

Writing Items
Questions to be considered by the test developer:
● What range of content should the items cover?
● Which of the many different item formats should be employed?
● How many items should be written?

Item Pool
● The reservoir or well from which items will be drawn for, or discarded from, the final version of the test.
● Items may be derived from the test developer’s personal experience or academic acquaintance with the subject matter.
● Help may also be sought from experts in the relevant fields.

Item Format
● The form, plan, structure, arrangement, and layout of individual test items.
a. Selected-Response Format
○ Requires test takers to select a response from a set of alternative responses.
○ Multiple choice
○ Binary choice
○ Matching
b. Constructed-Response Format
○ Requires test takers to supply or create the correct answer.
○ Completion items: require the examinee to provide a word or phrase that completes a sentence.

Elements of a Multiple-Choice Item
1. Stem
2. Correct option
3. Several incorrect options (distractors or foils)

Writing Items for Computer Administration
1. Item Bank
● A large, easily accessible collection of test questions.
2. Computer Adaptive Testing (CAT)
● An interactive, computer-administered test-taking process in which the items presented to the test taker are based, in part, on the test taker’s performance on previous items.
3. Item Branching
● The ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items (a toy selection rule is sketched after this list).
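To make item branching concrete, here is a minimal, purely illustrative sketch. It assumes a bank of dichotomously scored items tagged with an invented "hardness" value, uses the running proportion of correct answers as a crude ability proxy, and presents the unused item whose hardness is closest to that proxy. Operational CAT systems estimate ability with item response theory rather than this toy rule.

```python
# Toy item-branching sketch (not an operational CAT algorithm).
# Each bank entry is (item_id, hardness); hardness runs from 0 (easiest) to 1 (hardest).
def next_item(item_bank, administered, n_correct):
    remaining = [it for it in item_bank if it[0] not in administered]
    if not remaining:
        return None                                # item bank exhausted
    if not administered:
        target = 0.5                               # start with a medium item
    else:
        target = n_correct / len(administered)     # crude ability proxy: proportion correct so far
    # branch to the unused item whose hardness best matches the proxy
    return min(remaining, key=lambda it: abs(it[1] - target))

bank = [("i1", 0.2), ("i2", 0.5), ("i3", 0.8), ("i4", 0.65)]
print(next_item(bank, administered=set(), n_correct=0))    # ('i2', 0.5)
print(next_item(bank, administered={"i2"}, n_correct=1))   # ('i3', 0.8)
```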

Test Construction

3. SCORING ITEMS

Scoring Items
1. Cumulative Scoring
● The higher the score on the test, the higher the test taker stands on the ability or trait being measured.
2. Class/Category Scoring
● Responses earn credit toward placement in a particular class or category with other test takers whose response patterns are similar.
3. Ipsative Scoring
● Compares a test taker’s score on one scale within a test with the same test taker’s score on another scale within that same test (contrasted with cumulative scoring in the sketch below).
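A short sketch may clarify the difference between cumulative and ipsative scoring. The inventory, the scale names ("dominance" and "affiliation"), the items, and the ratings below are all invented for illustration.

```python
# Illustrative sketch contrasting cumulative and ipsative scoring.
# 'responses' maps item ids to ratings; 'key' maps item ids to the scale they belong to.
responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 1, "q5": 3, "q6": 4}
key = {"q1": "dominance", "q2": "affiliation", "q3": "dominance",
       "q4": "affiliation", "q5": "dominance", "q6": "affiliation"}

# Cumulative scoring: sum the ratings per scale; a higher total means more of the trait.
totals = {}
for item, rating in responses.items():
    scale = key[item]
    totals[scale] = totals.get(scale, 0) + rating
print(totals)                                          # {'dominance': 12, 'affiliation': 7}

# Ipsative scoring compares one scale with another within the same test taker,
# for example as the difference between the two scale scores.
print(totals["dominance"] - totals["affiliation"])     # 5
```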
Test Tryout
● Tests should be tried out on people who are similar in critical respects to the people for whom the test was designed.
● An informal rule of thumb is that there should be no fewer than 5 subjects, and ideally as many as 10, for each item on the test; in general, the more subjects, the better.
● The tryout should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered.

Item Analysis
1. Item Difficulty Index
● Obtained by calculating the proportion of the total number of test takers who answered the item correctly.
● Values can range from 0 to 1.
● The optimal item difficulty should be determined with respect to the number of response options, to allow for chance guessing.
2. Item Reliability Index
● Provides an indication of a test’s internal consistency; the higher the index, the greater the internal consistency.
● Can be obtained using factor analysis.
3. Item Validity Index
● A statistic designed to provide an indication of the degree to which a test measures what it purports to measure; the higher the item-validity index, the greater the test’s criterion-related validity.
4. Item Discrimination Index
● Indicates how adequately an item separates, or discriminates, between high scorers and low scorers on the test as a whole (the difficulty and discrimination indices are illustrated in the sketch after this list).
5. Qualitative Item Analysis
● Nonstatistical procedures designed to explore how individual test items work.
● “Think aloud” test administration.
● Expert panels.
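Two of these statistics, the item-difficulty index and the item-discrimination index, can be illustrated with a short sketch. The response matrix, the use of the top and bottom thirds of scorers as the upper and lower groups, and the chance-corrected "optimal difficulty" midpoint are illustrative conventions, not the only definitions in use.

```python
# Illustrative item-analysis sketch for dichotomously scored (0/1) responses.
# rows = test takers, columns = items.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

def item_difficulty(item):
    """Item-difficulty index p: proportion of test takers who answered the item correctly."""
    column = [row[item] for row in scores]
    return sum(column) / len(column)

def optimal_difficulty(n_options):
    """Midpoint between the chance success rate (1/k) and 1.00."""
    chance = 1 / n_options
    return (chance + 1.0) / 2

def item_discrimination(item, fraction=1 / 3):
    """Item-discrimination index d: p in the upper group minus p in the lower group."""
    ranked = sorted(scores, key=sum, reverse=True)   # rank test takers by total score
    n = max(1, round(len(ranked) * fraction))        # size of the upper and lower groups
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower

print(item_difficulty(0))       # 4 of 6 test takers got item 0 right -> about 0.67
print(optimal_difficulty(4))    # four-option multiple choice -> 0.625
print(item_discrimination(0))   # 1.0: item 0 fully separates high and low scorers
```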
Test Revision
● Characterize each item according to its strengths and weaknesses.
● Balance the various strengths and weaknesses across items.
● Administer the revised test under standardized conditions to a second appropriate sample of examinees.

Characteristics of Tests That Are Due for Revision
1. Current test takers can no longer relate to the test materials.
2. The test contains vocabulary that is not readily understood by current test takers.
3. Words in the test have taken on inappropriate meanings as dictated by changes in popular culture.
4. Test norms are no longer adequate as a result of changes in group membership.
5. Test norms are no longer adequate as a result of age-related shifts.
6. The reliability or validity of the test can be improved by revision.
7. The theory on which the test was based has been improved.
