This document summarizes the key points from a session on designing classroom language tests. It discusses determining the purpose and scope of a test, drafting test specifications, developing test items like multiple choice questions, administering and scoring the test, and providing feedback. It also addresses questions about using options like "all are correct" in multiple choice and the differences between item facility and item discrimination.
GROUP SUMMARY – SESSION 4
DESIGNING CLASSROOM LANGUAGE TESTS
Instructor: Nguyễn Thị Hồng Thắm, Ph.D.
Group members:
1. Hoàng Kỳ Nam - 22814011126
2. Nguyễn Tấn Lộc - 22814011124
3. Nguyễn Thị Ngọc Ân - 22814011108
4. Nguyễn Thị Thùy Dung - 22814011113

Key concepts and meanings
- Test usefulness: the extent to which a test accomplishes its intended criterion or objective
- Item facility: the extent to which an item is easy or difficult for the proposed group of test-takers
- Item discrimination: the extent to which an item differentiates between high- and low-ability test-takers
- Distractor efficiency: the extent to which the distractors "lure" a sufficient number of test-takers and those responses are somewhat evenly distributed across all distractors

Main points
1. Determining the purpose of a test (pp. 59-61)
- Consider the purpose of the exercise to be performed by the students
- The purpose of an assessment = test usefulness (Bachman & Palmer, 1996)
- Issues for consideration: the need to administer a test, the purpose it will serve for students and teachers, its significance to the course and in comparison to other student performance, the use of test results, beneficial washback, and impact
2. Defining abilities to be assessed (pp. 61-63)
- Teachers need to know specifically what they want to test.
- Carefully review what the students should know or be able to do (e.g., forms and functions covered in a course unit, constructs to be tested) based on the objectives of a unit or a course
- Determine the constructs that are appropriately framed and assessable to be demonstrated by the students
- Each construct is stated in terms of performance and target linguistic domain.
3. Drawing up test specifications (pp. 63-65)
- An outline of the test and a guiding plan for designing an instrument, including the constructs, a description of content, item types, tasks, skills, scoring procedures, and reporting of results
- Test specifications are not the actual test items or tasks but the descriptions and details of the test to be followed.
4.
Devising test items (pp. 65-71)
- The elicitation mode (or test prompt) and the response mode can each be oral or written. Each elicitation mode can be matched with either response mode, but not all response modes match all elicitation modes.
- The selection of constructs to be tested is based on the time spent on them in class, the importance assigned to them, and the time available for test administration.
5. Designing multiple-choice items (pp. 72-83)
- Weaknesses of multiple-choice items: they test only recognition knowledge, may allow guessing, restrict what can be tested, are difficult to write successfully, have minimal beneficial washback, and may facilitate cheating
- Two important principles: practicality and reliability
- Multiple-choice items are all receptive-response, or selective-response, items: test-takers choose from a set of responses rather than produce their own (as in supply items)
- Every multiple-choice item has a stem and several options/alternatives to choose from; the key is the correct response, and the others are distractors
- Constructing an effective item:
+ Design each item to measure a single objective
+ State both stem and options as simply and directly as possible
+ Ensure the intended answer is clearly the only correct one
+ Use item indices to accept, discard, or revise items (optional)
- Suitable multiple-choice items can be selected by measuring items against three indices:
+ Item facility/item difficulty (IF): the extent to which an item is easy or difficult for test-takers; it reflects the percentage of students who answer the item correctly (items of moderate facility are also the ones best able to separate high- and low-ability test-takers). IF = n (students answering correctly) / N (students responding to the item)
+ Item discrimination/item differentiation (ID): the extent to which an item differentiates between high- and low-ability test-takers. Though such items are difficult to create, they are a must for standardized norm-referenced tests.
ID = (n correct in high group − n correct in low group) / (0.5 × total students in the two comparison groups). High discriminating power approaches a perfect 1.0; no discriminating power is 0. Items that score near zero would be discarded.
+ Distractor efficiency: the extent to which (1) a distractor "lures" a sufficient number of test-takers, especially ones with low ability, and (2) those responses are evenly distributed across all distractors.
6. Administering the test (pp. 83-84)
- Pre-test considerations: provide appropriate pre-test information (the conditions for the test, materials that students should bring, the kinds of items that will be on the test, suggested strategies for optimal performance, evaluation criteria), offer a review of the components of narrative and descriptive essays, and give students a chance to ask questions and receive responses.
- Test administration details: arrive early, check classroom conditions, try out everything, have extra paper, start on time, distribute the test, sit quietly but remain available for questions, and warn students about the time
7. Scoring, grading, and giving feedback (pp. 84-86)
- Scoring:
+ The scoring plan reflects the relative weight of each section and of the items in each section
+ After administering a test once, teachers can revise the scoring plan for the next run of the course
- Grading: a product of the country, culture, and context of the English classroom; institutional expectations; the explicit and implicit definitions of grades teachers have set; the relationship teachers have with the class; and student expectations
- Giving feedback: feedback has many manifestations, such as scoring/grading a test in terms of a letter grade or a total score/subscores, responses to listening and reading items, oral production tests, and written essays.
+ Reading quiz: in the form of self-assessment and whole-class discussion of the reading passage
+ Grammar unit test: in the form of diagnostic scores, a checklist of areas that need work, and class discussion of the test results
+ Midterm essay: subsequent peer conferences and individual conferences between student and teacher
+ Listening/speaking final exam: minimal oral feedback after the oral interview
Questions
1. Should we include options such as "all are correct" or "all of the above" in multiple-choice questions?
2. Both item facility and item discrimination seem to refer to the extent to which a test item differentiates between high- and low-ability test-takers. So, what is the difference between them?
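The difference raised in Question 2 can be seen by applying the two formulas from section 5: item facility uses the whole group's percentage of correct answers, while item discrimination compares only the top- and bottom-scoring halves. Below is a minimal Python sketch with made-up responses (the data and function names are our own illustration, not from the session materials):

```python
# Illustrative only: computing the three item indices for one
# multiple-choice item. "C" is the key; other letters are distractors.

def item_facility(responses, key):
    """IF = students answering correctly / students responding to the item."""
    return sum(r == key for r in responses) / len(responses)

def item_discrimination(high_responses, low_responses, key):
    """ID = (correct in high group - correct in low group)
            / (0.5 x total students in the two comparison groups)."""
    n_high = sum(r == key for r in high_responses)
    n_low = sum(r == key for r in low_responses)
    return (n_high - n_low) / (0.5 * (len(high_responses) + len(low_responses)))

def distractor_counts(responses, key):
    """How often each distractor 'lured' a test-taker; roughly even,
    non-trivial counts across distractors suggest efficient distractors."""
    counts = {}
    for r in responses:
        if r != key:
            counts[r] = counts.get(r, 0) + 1
    return counts

# Ten hypothetical test-takers, split into top and bottom halves by total score.
high = ["C", "C", "C", "A", "C"]   # top-scoring half: 4 of 5 correct
low  = ["A", "C", "B", "D", "A"]   # bottom-scoring half: 1 of 5 correct
all_responses = high + low

print(item_facility(all_responses, "C"))      # 0.5  -> moderate difficulty
print(item_discrimination(high, low, "C"))    # 0.6  -> discriminates well
print(distractor_counts(all_responses, "C"))  # {'A': 3, 'B': 1, 'D': 1}
```

Note how the same item can score 0.5 on facility (half the class answered correctly) yet 0.6 on discrimination (the correct answers cluster in the high group): the two indices answer different questions about the item.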