The document discusses the process of test development which includes test conceptualization, construction, tryout, item analysis, and revision. It covers topics such as pilot work, scaling methods, writing test items in various formats, scoring items, and revising tests based on tryouts and item analysis. The goal of the process is to create valid and reliable standardized tests through iterative refinement.
Chapter Topics
1. Test conceptualization
2. Test construction
3. Test tryout
4. Item analysis
5. Test revision

Test Development Process
1. Test Conceptualization
2. Test Construction
3. Test Tryout
4. Item Analysis
5. Test Revision

Test Conceptualization
● This is the beginning of any published test.
● An emerging social phenomenon or pattern of behavior might serve as the stimulus for the development of a new test.

Preliminary Questions:
1. What is the test designed to measure?
2. What is the objective of the test?
3. Is there a need for this test?
4. Who will use this test?
5. Who will take the test?
6. What content will the test cover?
7. How will the test be administered?
8. What is the ideal format of the test?
9. Should more than one form of the test be developed?
10. What special training will be required of test users for administering or interpreting the test?
11. What type of response will be required of test takers?
12. Who benefits from the administration of this test?
13. Is there any potential harm as the result of an administration of this test?
14. How will meaning be attributed to scores on this test?

Pilot Work
● Test items may be pilot studied (or piloted) to evaluate whether they should be included in the final form of the instrument. In developing a structured interview to measure introversion/extraversion, for example, pilot research may involve open-ended interviews with research subjects believed for some reason (perhaps based on an existing test) to be introverted or extraverted.

Test Construction

1. SCALING
● Scaling may be defined as the process of setting rules for assigning numbers in measurement.

Types of Scaling
1. Age-based Scale
● Interest is in test performance as a function of age.
2. Grade-based Scale
● Interest is in test performance as a function of grade.
3. Stanine Scale
● Used when raw scores are to be transformed into scores that range from 1 to 9.

Scaling Methods
1. Rating Scale
● A grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the test taker.
2. Summative Scale
● The final test score is obtained by summing the ratings of all the items.
3. Likert Scale
● Contains 5-7 alternative responses, which may include continua such as Agree/Disagree or Approve/Disapprove.
4. Paired Comparisons
● Test takers are presented with 2 stimuli which they must compare in order to select one.
● Example: select the behavior that you think is more justified:
   a. Cheating on taxes if one has a chance.
   b. Accepting a bribe in the course of one's duties.
5. Comparative Scale
● Entails judgment of a stimulus in comparison with the other stimuli on the scale.
● Example (comparative scaling): rank according to beauty:
   _____ Angel Locsin
   _____ Marian Rivera
   _____ Anne Curtis
   _____ Heart Evangelista
   _____ Toni Gonzaga
6. Categorical Scale
● Done by placing stimuli into alternative categories that differ quantitatively.
7. Categorical Scaling
● Example: 30 cards with various scenarios/situations; you are to judge whether each scenario is Beautiful, Average, or Ugly.
8. Guttman Scale
● Entails that all respondents who agree with the stronger statements will also agree with the milder statements.

Test Construction

2. WRITING ITEMS

Questions to Consider by the Test Developer:
● What is the range of content the items should cover?
● Which of the many different types of item formats should be employed?
● How many items should be written?
● Help may also be sought from experts in their respective fields.

Item Format
● The form, plan, structure, arrangement, and layout of individual test items.
a. Selected Response
○ Requires test takers to select a response from a set of alternative responses.
○ Multiple Choice
○ Binary Choice
○ Matching Type
b. Constructed Response
○ Requires test takers to supply or create the correct answer.
○ Completion Items: require the examinee to provide a word or phrase that completes a sentence.

Elements of Multiple Choice
1. Stem
2. Correct Option
3. Several Incorrect Options (distractors or foils)

Writing Items for Computer Administration
1. Item Bank
● A large, easily accessible collection of test questions.
2. Computer Adaptive Testing (CAT)
● An interactive, computer-administered test-taking process wherein the items presented to the test taker are based in part on the test taker's performance on previous items.
3. Item Branching
● The ability of the computer to tailor the content and order of presentation of test items.
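The item-branching idea above can be sketched in a few lines of Python. This is a toy illustration with a hypothetical item set, not a real CAT engine: after each response, the next item is one step harder if the answer was correct, or one step easier if it was wrong.

```python
# Toy item-branching sketch (hypothetical items, not a production CAT algorithm).
# Items are sorted from easiest to hardest.
ITEMS = ["very easy", "easy", "medium", "hard", "very hard"]

def next_item_index(current: int, was_correct: bool) -> int:
    """Branch up one step after a correct answer, down one step after a miss,
    staying within the bounds of the item list."""
    step = 1 if was_correct else -1
    return max(0, min(len(ITEMS) - 1, current + step))

# Example run: start in the middle, then answer correct, correct, wrong.
idx = 2
for correct in (True, True, False):
    idx = next_item_index(idx, correct)
print(ITEMS[idx])  # prints "hard": two steps up, then one step back down
```

The same up/down rule is all a minimal adaptive sequence needs; real CAT systems replace it with statistical item-selection based on an ability estimate.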
Item Pool
● The reservoir or well from which items will be drawn or discarded for the final version of the test.
● Items could be derived from the test developer's personal experience or academic acquaintance with the subject matter.
● Expert panels may also contribute items.

3. SCORING ITEMS

Scoring Items
1. Cumulative Scoring
● The higher the score on the test, the higher the ability or trait being measured.
2. Class/Category Scoring
● Responses earn credit toward placement in a particular class or category with other test takers whose patterns of responses are similar.
3. Ipsative Scoring
● Comparison of a test taker's score on one scale within a test with another scale within that same test.

Test Tryout
● Tests should be tried out on people similar in critical respects to the people for whom the test was designed.
● There should be no fewer than 5 subjects, and ideally as many as 10, for each test item; the more subjects, the better.
● The tryout should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered.

Item Analysis
1. Item Difficulty Index
● Obtained by calculating the proportion of the total number of test takers who got the item right.
● Values can range from 0 to 1.
● The optimal item difficulty should be determined with respect to the number of response options.
2. Item Reliability Index
● Provides an indication of the internal consistency of a test; the higher the index, the greater the internal consistency.
● Can be obtained using factor analysis.
3. Item Validity Index
● A statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure; the higher the item-validity index, the greater the test's criterion-related validity.
4. Item Discrimination Index
● Indicates how adequately an item separates, or discriminates, between high scorers and low scorers on the entire test.
5. Qualitative Item Analysis
● A nonstatistical procedure designed to explore how individual test items work.
● Example: "think aloud" test administration.

Test Revision
● Characterize each item according to its strengths and weaknesses.
● Balance the various strengths and weaknesses across items.
● Administer the revised test under standardized conditions to a second appropriate sample of examinees.

Characteristics of Tests that are Due for Revision
1. Current test takers cannot relate to the test.
2. Vocabulary that is not readily understood by the test taker.
3. The meaning of words has become inappropriate because of popular culture change.
4. Test norms are no longer adequate as a result of group membership changes.
5. Test norms are no longer adequate as a result of age-related shifts.
6. Reliability and validity can be improved by revision.
7. The theory on which the test was based has been improved.
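To make the item difficulty and item discrimination indexes concrete, here is a minimal Python sketch with invented response data. The extreme-groups formula d = (U - L) / n used below is one common way to compute a discrimination index; it is an assumption added here, since the notes above describe the statistic only verbally.

```python
# Minimal item-analysis sketch: responses are coded 1 (correct) / 0 (incorrect).
# All data below are invented for illustration.

def item_difficulty(responses):
    """Item difficulty index p: proportion of test takers who got the item right."""
    return sum(responses) / len(responses)

def item_discrimination(upper, lower):
    """Extreme-groups discrimination index d = (U - L) / n, where U and L are the
    numbers of correct answers in the upper- and lower-scoring groups, and n is
    the size of each group (groups assumed to be equal in size)."""
    return (sum(upper) - sum(lower)) / len(upper)

item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]    # 10 test takers, 7 answered correctly
print(item_difficulty(item))              # 0.7

upper = [1, 1, 1, 1, 0]                   # top scorers: 4 of 5 got the item right
lower = [1, 0, 0, 1, 0]                   # bottom scorers: 2 of 5 got it right
print(item_discrimination(upper, lower))  # 0.4: the item favors high scorers
```

A p near 0 or 1 means nearly everyone missed or passed the item, and a d near 0 (or negative) flags an item that fails to separate high from low scorers, which is exactly what the indexes above are meant to detect.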