
GROUP SUMMARY – SESSION 4
DESIGNING CLASSROOM LANGUAGE TESTS

Course: Language Assessment
Instructor: Nguyễn Thị Hồng Thắm, Ph.D.

Group members:
1. Hoàng Kỳ Nam - 22814011126
2. Nguyễn Tấn Lộc - 22814011124
3. Nguyễn Thị Ngọc Ân - 22814011108
4. Nguyễn Thị Thùy Dung - 22814011113
Key concepts and meanings
- Test usefulness: the extent to which a test accomplishes its intended criterion or objective
- Item facility: the extent to which an item is easy or difficult for the proposed group of test-takers
- Item discrimination: the extent to which an item differentiates between high- and low-ability test-takers
- Distractor efficiency: the extent to which the distractors “lure” a sufficient number of test-takers and those responses are somewhat evenly distributed across all distractors
Main points
1. Determining the purpose of a test (pp. 59-61)
- Consider the purpose of the exercise to be performed by the students
- The purpose of an assessment = test usefulness (Bachman & Palmer, 1996)
- Issues for consideration: the need to administer a test, the purpose it will serve for the students and
teachers, its significance to the course and in comparison to other student performance, use of test
results, beneficial washback, and impact
2. Defining abilities to be assessed (pp. 61-63)
- Teachers need to know specifically what they want to test.
- Carefully review what the students should know or be able to do (e.g. forms and functions covered
in a course unit, constructs to be tested) based on the objectives of a unit or a course
- Determine the constructs that are appropriately framed and assessable to be demonstrated by the
students
- Each construct is stated in terms of performance and target linguistic domain.
3. Drawing up test specifications (pp. 63-65)
- An outline of the test and a guiding plan for designing an instrument, including: the constructs, a
description of content, item types, tasks, skills, scoring procedures, and reporting of results
- Test specifications are not the actual test items or tasks but the descriptions and details of the test
to be followed.
4. Devising test items (pp. 65-71)
- The elicitation mode (or test prompt) and the response mode can each be oral or written, but not every response mode matches every elicitation mode.
- The selection of constructs tested is based on the time spent on those in class, the importance
assigned to them, and the time for test administration.
5. Designing multiple-choice items (pp. 72-83)
- Weaknesses of multiple-choice items: test only recognition knowledge, may allow guessing,
restrict what can be tested, difficult to write successful items, have minimal beneficial washback,
may facilitate cheating
- Two important principles: practicality and reliability
- Multiple-choice items are all receptive (selective) response items: test-takers choose from a set of given responses rather than supplying their own (as in productive, or supply, items)
- Every multiple-choice item has a stem and several options/ alternatives to choose from; the key is
the correct response; the others are distractors
- Construct an effective item:
+ Design each item to measure a single objective
+ State both stem and options as simply and directly as possible
+ Ensure the intended answer is clearly the only correct one
+ Use item indices to accept, discard, or revise items (optional)
- Suitable multiple-choice items can be selected by measuring items against three indices:
+ Item facility/ item difficulty (IF): the extent to which an item is easy or difficult for test-takers; it reflects the percentage of students who answer the item correctly. IF = n (students answering the item correctly) / N (students responding to the item)
+ Item discrimination/ item differentiation (ID): the extent to which an item differentiates between high- and low-ability test-takers. Though difficult to create, highly discriminating items are a must for standardized norm-referenced tests. ID = (number correct in the high group – number correct in the low group) / (0.5 × number of students in the two comparison groups). High discriminating power approaches a perfect 1.0; no discriminating power is 0. Items scoring near zero should be discarded.
+ Distractor efficiency: the extent to which (1) a distractor “lures” a sufficient number of test
takers, especially ones with low ability and (2) those responses are evenly distributed across
all distractors.
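The three item indices above can be computed directly from response data. A minimal Python sketch follows; the function names and the sample responses are illustrative assumptions, not part of the summary:

```python
def item_facility(n_correct, n_responding):
    """IF = proportion of test-takers who answered the item correctly."""
    return n_correct / n_responding

def item_discrimination(high_correct, low_correct, total_in_groups):
    """ID = (correct in high group - correct in low group)
            / (0.5 * students in the two comparison groups).
    Ranges up to a perfect 1.0; items near 0 discriminate poorly."""
    return (high_correct - low_correct) / (0.5 * total_in_groups)

def distractor_counts(responses, key):
    """Tally how often each distractor (non-key option) was chosen;
    efficient distractors lure some test-takers and are roughly
    evenly distributed."""
    counts = {}
    for choice in responses:
        if choice != key:
            counts[choice] = counts.get(choice, 0) + 1
    return counts

# Hypothetical data: 20 students answer one item whose key is "C"
responses = list("CCACCBCDCCACCCBCCDCC")
print(item_facility(responses.count("C"), len(responses)))  # 14/20 = 0.7
# Top 10 vs. bottom 10 overall scorers: 9 vs. 5 answered this item correctly
print(item_discrimination(9, 5, 20))                        # (9 - 5)/(0.5 * 20) = 0.4
print(distractor_counts(responses, "C"))                    # A, B, D each chosen twice
```

Here the distractors are evenly spread, so the item would count as having efficient distractors; a distractor chosen by nobody would call for revision.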
6. Administering the test (pp. 83-84)
- Pre-test considerations: provide appropriate pre-test information (on the conditions for the test,
materials that students should bring, the kinds of items that will be on the test, suggestions of
strategies for optimal performance, evaluation criteria), offer a review of components of narrative
and descriptive essays, give students a chance to ask questions and provide responses.
- Test administration details: arrive early, check classroom conditions, try everything out, have extra paper, etc., start on time, distribute the test, sit quietly but be available for questions, and warn students about the time
7. Scoring, grading, and giving feedback (pp. 84-86)
- Scoring:
+ The scoring plan reflects the relative weight of each section and of the items in each section
+ After administering a test once, teachers can revise the scoring plan for the next offering of the course
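A weighted scoring plan of the kind described above can be sketched as follows; the section names, weights, and raw scores here are invented for illustration, not taken from the summary:

```python
# Hypothetical scoring plan: each section's relative weight in the final score
weights = {"listening": 0.3, "grammar": 0.3, "reading": 0.2, "writing": 0.2}
raw_scores = {"listening": 8, "grammar": 9, "reading": 7, "writing": 6}  # each out of 10

# Weighted total on a 100-point scale
total = sum(weights[s] * (raw_scores[s] / 10) * 100 for s in weights)
print(round(total, 1))  # 24 + 27 + 14 + 12 = 77.0
```

Making the weights explicit like this is what lets a teacher revise the plan after one administration, e.g. by shifting weight toward sections that proved more central to the course.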
- Grading is a product of the country, culture, and context of the English classroom; institutional expectations; the explicit and implicit definitions of grades teachers have set; the relationship teachers have with the class; and student expectations
- Giving feedback: there are many manifestations of feedback such as scoring/ grading for a test in
terms of a letter grade or a total score/ subscores, for responses to listening and reading items, for
oral production tests, for written essays.
+ Reading quiz: in the form of self-assessment and whole-class discussion of the reading passage
+ Grammar unit test: in the form of diagnostic scores, a checklist of areas that need work, and class discussion of the test results
+ Midterm essay: subsequent peer conferences and individual conferences between student and
teacher
+ Listening/ speaking final exam: minimal oral feedback after the oral interview
Questions
1. Should we include options such as “all are correct” or “all of the above” in multiple-choice questions?
2. Both item facility and item discrimination refer to the extent to which an item differentiates between high- and low-ability test-takers. So, what is the difference between them?
