Item Analysis

Purpose of Item Analysis

— Evaluates the quality of each item
— Rationale: the quality of the items determines the quality of the test (i.e., its reliability and validity)
— May suggest ways of improving the test items
— Can help with understanding why certain tests predict some criteria but not others

Item Analysis

When analyzing the test items, we have several questions about the performance of each item. Some of these questions include:
— Are the items congruent with the test objectives?
— Are the items valid? Do they measure what they're supposed to measure?
— Are there any poorly performing items that need to be discarded?

Types of Item Analyses for CTT

Three major types:
1. Assess the quality of the distractors
2. Assess the difficulty of the items
3. Assess how well an item differentiates between high and low performers

Distractor Analysis

The first question of item analysis: how many people choose each response? If there is only one best response, then all other response options are distractors.

Example from an in-class assignment (N = 35): "Which method has the best internal consistency?", with alternatives such as a) projective test, c) forced choice, and d) differences n.s.; the number of test takers choosing each alternative is tallied.

Distractor Analysis (cont'd)

A perfect test item would have two characteristics:
1. Everyone who knows the item gets it right.
2. People who do not know the item have responses equally distributed across the wrong answers.

It is not desirable to have one of the distractors chosen more often than the correct answer. This result indicates a potential problem with the question: the distractor may be too similar to the correct answer, and/or there may be something in the stem or the alternatives that is misleading.

Calculate the number of people expected to choose each of the distractors. If responding is random, the expected number is the same for each wrong response (Figure 10-1).
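Under the random-guessing assumption above, the expected count for each distractor is simply the number of incorrect responses divided by the number of distractors. A minimal sketch (the counts below are illustrative, not the ones from the in-class example):

```python
def expected_distractor_count(n_total, n_correct, n_distractors):
    """Expected number choosing each distractor if all wrong answers
    are spread evenly (random) across the distractors."""
    return (n_total - n_correct) / n_distractors

# Hypothetical item: 35 test takers, 20 answered correctly, 3 distractors.
expected = expected_distractor_count(35, 20, 3)
print(expected)  # 5.0 expected per distractor
```

A distractor whose observed count is far above this expectation is worth inspecting for the problems described above.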
Distractor Analysis (cont'd)

When a distractor is chosen more often than expected, there are two possibilities:
1. It is possible that the choice reflects partial knowledge.
2. The item is a poorly worded trick question.

A distractor chosen less often than expected may be ineffective because it is easily eliminated; a distractor that is extremely popular is likely to lower the reliability and validity of the test.

Item Difficulty Analysis

Description and how to compute. Examples:
a) (6 × 3) + 4 = ?
b) 9π[ln(−3.68) × (1 − ln(+3.68))] = ?

It is often difficult to explain or define difficulty in terms of some intrinsic characteristic of the item. The only common thread among difficult items is that individuals did not know the answer.

The difficulty index p is the percentage of test takers who respond correctly. What if p = .00? What if p = 1.00?

Item Difficulty

— An item with a p value of .00 or 1.00 does not contribute to measuring individual differences and thus is certain to be useless.
— When comparing two test scores, we are interested in who had the higher score, i.e., in the differences in scores.
— Items with p values near .5 have the most variation, so seek items in this range and remove those with extreme values.
— p can also be examined to determine the proportion answering in a particular way for items that don't have a "correct" answer.

Item Difficulty (cont.)

What is the best p-value?
— The most optimal p-value is .50, giving maximum discrimination between good and poor performers.

Should we only choose items with p = .50? Not necessarily:
— When wanting to screen the very top group of applicants (e.g., admission to university or medical school), cutoffs may be much higher.
— Other institutions want a minimum level (e.g., a minimum reading level); cutoffs may be much lower.

Item Difficulty (cont.)

Interpreting the p-value. Example: 100 people take a test and 15 get question 1 right. What is the p-value? Is this an easy or hard item? (p = 15/100 = .15, a difficult item.)

Second example: 100 people take a test and 70 get question 1 right. What is the p-value? Is this an easy or hard item? (p = 70/100 = .70, a fairly easy item.)

Item Difficulty (cont'd)

General rules of item difficulty:
— p low (< .20): difficult test item
— p moderate (.20–.80): moderately difficult item
— p high (> .80): easy item

ITEM DISCRIMINATION

The extent to which an item differentiates people on the behavior that the test is designed to assess: the computed difference between the percentage of high achievers and the percentage of low achievers who got the item right.

Item Discrimination (cont.)

Compares the performance of the upper group (with high test scores) and the lower group (with low test scores) on each item, i.e., the percentage of test takers in each group who answered correctly.
— Divide the sample into TOP half and BOTTOM half (or TOP and BOTTOM third).
— Compute the discrimination index (D).

Item Discrimination

D = U − L, where
U = (# in the upper group with a correct response) / (total # in the upper group)
L = (# in the lower group with a correct response) / (total # in the lower group)

The higher the value of D, the more adequately the item discriminates (the highest value is 1.0).

Item Discrimination

Seek items with high positive values (those who do well on the test tend to get the item correct). Items with negative values (lower scorers on the test are more likely to get the item correct) and low positive values (about the same proportion of low and high scorers get the item correct) don't discriminate well and are discarded.

Item Discrimination (cont'd): Item-Total Correlation

The correlation between each item (a correct response usually receives a score of 1 and an incorrect response a score of 0) and the total test score. To what degree do the item and the test measure the same thing?
— Positive: the item discriminates between high and low scorers.
— Near 0: the item does not discriminate between high and low scorers.
— Negative: scores on the item and scores on the test disagree.

Item-total correlations are directly related to reliability.
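As a sketch with made-up data, the item-total correlation is just a Pearson correlation between the 0/1 item scores and the total test scores:

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: 0/1 scores on one item and total test scores
# for eight test takers.
item   = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [9, 8, 8, 7, 6, 4, 3, 2]

r = pearson(item, totals)  # positive: item discriminates high vs. low
```

With these made-up numbers r comes out strongly positive, matching the case where high scorers tend to get the item right.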
This is because the more each item correlates with the test as a whole, the higher all items correlate with each other (i.e., higher alpha, higher internal consistency).

Level of Difficulty

Index range — Difficulty level
0.00–0.20 — Very difficult
0.21–0.40 — Difficult
0.41–0.60 — Average / moderately difficult
0.61–0.80 — Easy
0.81–1.00 — Very easy

Ebel (1972) gives the indices of item discrimination in the terms shown below. For the average classroom test, these indices are widely accepted:
0.40 and above — very good item
0.30–0.39 — reasonably good, but subject to improvement
0.20–0.29 — marginal item, usually needing and being subject to improvement
below 0.19 — poor item, to be rejected or improved by revision

ITEM DISCRIMINATION INDEX (D)

The item discrimination index of a test refers to the degree to which an item discriminates between high-achieving and low-achieving students in terms of the scores on the total test. In a technical sense, the item discrimination index addresses the validity of the test item, i.e., the extent to which the item tests the attribute it was intended to test.

The formula to determine the item discrimination index is:

D = (Ru − Rl) / (½N), where
Ru = number of students in the upper group who got the item right
Rl = number of students in the lower group who got the item right
½N = one half of the total number of students included in the analysis

The resulting index can range from +1.00, where all students in the upper group got the item right and all students in the lower group got it wrong, to −1.00. An index of 0.00, i.e., no discrimination, occurs when an equal number of students in each group got the item right.

The higher the discrimination index, the better and more reliable the test. When items have no discriminating power or negative discriminating power, they should be revised or discarded.
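The (Ru − Rl)/(½N) formula and Ebel's (1972) benchmarks can be sketched directly (the group counts below are hypothetical):

```python
def discrimination_index(r_upper, r_lower, n_total):
    """D = (Ru - Rl) / (N/2), where Ru and Rl are the numbers of correct
    responses in the upper and lower groups, and N is the total number
    of students included in the analysis."""
    return (r_upper - r_lower) / (n_total / 2)

def ebel_label(d):
    """Ebel's (1972) rough benchmarks for average classroom tests."""
    if d >= 0.40:
        return "very good item"
    if d >= 0.30:
        return "reasonably good, but subject to improvement"
    if d >= 0.20:
        return "marginal, usually needing improvement"
    return "poor item: reject or revise"

# Hypothetical item: 40 students analyzed; 18 of the upper 20 and
# 8 of the lower 20 answered correctly.
d = discrimination_index(r_upper=18, r_lower=8, n_total=40)  # 0.5
print(ebel_label(d))  # very good item
```

Note that D = (Ru − Rl)/(½N) is algebraically the same as the U − L difference of proportions given earlier when the two groups are equal halves.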
Such results usually indicate that the item is written ambiguously, that the answer on the scoring key is wrong, or that the content to which the item refers was too obscure.

Types of Discrimination Index

— Positive discrimination: happens when more students in the upper group than in the lower group get the item correct.
— Negative discrimination: occurs when more students in the lower group than in the upper group get the item correct.
— Zero discrimination: happens when equal numbers of students in the upper and lower groups answer the item correctly.

DIFFICULTY INDEX TABLE

Range of difficulty index — Interpretation — Action
0–0.25 — Difficult — Revise or discard
0.26–0.75 — Right difficulty — Retain
0.76 and above — Easy — Revise or discard

Procedure:
1. Arrange the scores in descending order.
2. Separate the test papers into two subgroups.
3. Take the 27% of papers with the highest scores and the 27% falling at the bottom.
4. Count the number of right answers in the highest group (RH) and the number of right answers in the lowest group (RL).
5. Count the non-responding (NR) examinees.

Quantitative Item Analysis

— The inter-item correlation matrix displays the correlation of each item with every other item.
— It provides important information for increasing the test's internal consistency.
— Each item should be highly correlated with every other item measuring the same construct, and not correlated with items measuring a different construct.
— Items that are not highly correlated with other items measuring the same construct can and should be dropped to increase internal consistency.

Item Discrimination (cont'd): Inter-item Correlation

Possible causes of a low inter-item correlation:
— The item is badly written (revise).
— The item measures a different attribute than the rest of the test (discard).
— The item correlates with some items but not with others: the test measures two distinct attributes (split into subtests or subscales).
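A minimal sketch of building an inter-item correlation matrix for 0/1-scored items (the response data are made up for illustration):

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Rows = test takers, columns = items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

items = list(zip(*responses))  # one tuple of scores per item
k = len(items)
matrix = [[pearson(items[i], items[j]) for j in range(k)]
          for i in range(k)]
# The diagonal is 1.0; low or negative off-diagonal entries flag items
# that may not measure the same construct as the rest of the test.
```

Items whose row in the matrix is mostly near zero (or negative) are the candidates for revision, removal, or assignment to a separate subscale, as described above.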
