A Study on an Achievement Listening Test Design for ESPD's Students
Bui Ngoc Lien
Context
This test is a final achievement test designed to measure students' knowledge and
academic achievement in listening skills at the Foundation Studies Department of Hanoi
University. The test takers are intermediate learners who have just completed the first
semester of their first academic year and are required to sit an exam. In their first semester,
they studied a listening course consisting of two sub-components, namely Conversation
Listening and Dictation Listening. Conversation Listening focuses on developing learners'
listening skills and strategies so that they can comprehend daily-life conversations and
talks delivered by native speakers with various accents. Meanwhile, Dictation Listening
puts more emphasis on promoting students' capacity to listen to and perceive the accurate
written forms of oral sounds, words and simple sentences, so that they can confidently
dictate (write down exactly what they hear) from speakers. Therefore, the test designers
will choose listening material and design listening tasks that match the description and
objectives of the course as well as the students' level.
I. Literature review
Testing listening skills
According to Heaton (1988), testing the four basic skills plays a significant role because it
evaluates students' language skills for purposes of selection or comparison. In addition, he
claims that testing provides feedback about the results of education for teachers as well as
for learners, which can bring some backwash to the teaching and learning process. As
listening is a receptive skill, listening tests typically resemble reading tests, except that
students listen to a text instead of reading it; therefore, they have few chances to look back
at the information. In general, a listening test includes three rudimentary elements: the
listening stimuli, which represent typical oral language; the question-and-response format;
and the testing environment, which should be free of external distraction (Rubin & Mead, 1984).
Direct test
According to Hughes (1989), a direct test requires examinees to "perform precisely the skill
which we wish to measure" (p. 15). This approach seems attractive, even though there may
be problems related to the reliability of testing productive skills such as writing and
speaking, because such a test is limited to a rather small sample of tasks and, as a result,
cannot be fully representative. However, if test designers have a clear awareness of the
abilities they want to test, a direct test is straightforward and leads to helpful backwash
(Hughes, 1989).
Objective testing
In terms of scoring, objective testing requires no judgment from the scorer. The reason is
that an objective test has only one correct answer, or a limited set of correct answers, so no
matter which teacher marks it, a candidate's score will stay the same (Heaton, 1988).
Heaton also adds that a weakness of objective testing is that preparation and test design are
time-consuming. Despite criticisms that the answers to objective tests may be easy to
guess, Heaton believes that the fact that such tests "look easier is no indication that they are
easier" (p. 26).
Content validity
Content validity is the extent to which the items constitute a representative sample covering
the most appropriate and necessary content essential for a good performance (Hughes,
1989). To achieve and judge content validity, Heaton (1988) points out that it is important
to have a specification in which the particular skills and structures in the test are clearly
written.
Construct validity
Construct validity is the most important type of validity (Cronbach, 1990). It investigates
whether the test measures exactly and adequately the abilities and skills which it aims to
measure.
Face validity
A test is said to have face validity "if it looks as if it measures what it is supposed to
measure" (Hughes, 1989). In other words, it is the extent to which a test appeals to
candidates or to those choosing it on behalf of the candidates. Face validity is crucial
because if students do not perceive the face validity of a test, they might not put maximum
effort into performing the tasks (Heaton, 1988).
Predictive validity
Predictive validity indicates the extent to which an individual's future level on the criterion
is predicted from prior test performance (Hughes, 1989).
Concurrent validity
Concurrent validity is whether a test correlates with, or provides similar results to, another
test of the same skill. In other words, it can be studied when one test is proposed as a
substitute for another, and it is examined when the test score and the criterion score are
determined at essentially the same time (Hughes, 1989).
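In practice, concurrent validity is usually quantified as the correlation between scores on the new test and scores on an established test of the same skill taken at roughly the same time. A minimal sketch of that calculation, using Pearson's r on hypothetical score pairs (the scores below are invented for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two aligned score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores: five candidates on the new listening test
# versus an established benchmark test of the same skill.
new_test = [6.5, 7.0, 5.0, 8.0, 6.0]
benchmark = [6.0, 7.5, 5.5, 8.0, 6.5]
print(round(pearson_r(new_test, benchmark), 3))
```

A coefficient close to 1 would support the claim that the new test can substitute for the established one.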
Test reliability
Reliability is another fundamental consideration in testing because it addresses the
consistency of the testing process in relation to test administration and scoring (Hughes,
1989). In other words, every time the test is administered, it should yield the same outcome.
Consistency within a test, which is called internal consistency, exists when there is
correlation among the variables comprising the test. Besides, inter-rater reliability refers to
the level of agreement between different raters on an instrument.
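Internal consistency is commonly estimated with Cronbach's alpha, which compares the sum of per-item score variances with the variance of candidates' total scores. A minimal sketch, assuming hypothetical item-level scores (three dichotomously scored items, four candidates):

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(item_scores):
    """Cronbach's alpha. item_scores holds one inner list per item,
    aligned across the same candidates."""
    k = len(item_scores)
    item_var = sum(variance(item) for item in item_scores)
    totals = [sum(cand) for cand in zip(*item_scores)]
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical data: 3 items scored 0/1 for 4 candidates.
items = [[1, 1, 0, 1],
         [1, 0, 0, 1],
         [1, 1, 0, 1]]
print(round(cronbach_alpha(items), 3))
```

Higher alpha values indicate that the items behave consistently as measures of the same underlying ability.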
Teachers, as the agents of assessment, need to ensure the reliability and validity of their
classroom assessment and build on that to support their students' learning (Black &
Wiliam, 1998a, 1998b).
II. Test specifications
Specifications for the test must be developed at the very beginning of the test design
process. According to Hughes (2003, p.59), this will contain information regarding the
“content, test structure, timing, medium/channel, techniques to be used, criterial levels of
performance, and scoring procedure.”
Test content
Test content refers to items that will be included in the test, which will subsequently
suggest the construct of the test. As a result, test content needs to be chosen carefully from
samples of the test domain (McNamara, 2000). The domain of the test can either be defined
as “a set of practical, real-world tasks” if the construct is more operational, or “a theory of
components of knowledge and ability that underlie performance in domain” if the construct
is rather abstract (p.25). Importantly, content should be specified as fully as possible in
terms of operations, types, addressees and length of text(s), topics, readability, structural
and vocabulary range, dialect, accent, style and speed of processing.
Scoring procedure
These are of vital importance, especially when scoring is subjective. Test developers should
be clear about ways to achieve reliable and valid scoring, the types of rating scale, the
number of people involved in scoring and procedures to deal with disagreement among
scorers (Hughes, 2003).
Constructed-response items are those in which a student is required to actually produce
language rather than simply selecting answers. Therefore, they are suitable for testing the
interaction of receptive and productive skills, as in listening. In contrast to the selected-
response format, constructed-response tests eliminate most of the guessing, but they
introduce all of the problems associated with subjectivity on the part of scorers, and
marking is more time-consuming. Another potential problem is that test takers may be able
to bluff, which is also a type of guessing, but about what raters want rather than about what
may be correct, as in selected-response tests. Among the types of constructed-response
items, the test designer uses partial dictation. Nation and Newton (2009) consider partial
dictation (PD) an easier variant of full dictation and a plausible activity for enhancing
FL/L2 listening ability.
Students are provided with an incomplete written text and fill in missing words while
listening to an oral version of the text. Some FL/L2 researchers recommended the use of
partial dictation as a reliable, valid, and plausible listening test (Buck, 2001; Hughes, 1989;
Nation & Newton, 2009). Buck (2001) supports Hughes’ (1989) suggestion on the use of
partial dictation for low-level students when dictation proved too difficult for the students.
Using partial dictation helps students focus on missing parts, making it easier for them to
follow the text and/or to get its main points.
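Preparing a partial dictation task amounts to blanking out selected words in a transcript and keeping an answer key. A minimal sketch of that step, assuming a hypothetical transcript and hand-picked gap positions:

```python
def make_partial_dictation(transcript, gap_indices):
    """Blank out the words at the given word indices, returning the
    gapped worksheet text and the numbered answer key."""
    words = transcript.split()
    key = {}
    for n, i in enumerate(sorted(gap_indices), start=1):
        key[n] = words[i]          # record the answer for this gap
        words[i] = f"({n}) ______"  # replace the word with a numbered gap
    return " ".join(words), key

# Hypothetical transcript; gaps chosen at content words (indices 4 and 8).
text = "The library opens at nine and closes at five on weekdays"
worksheet, answers = make_partial_dictation(text, [4, 8])
print(worksheet)
print(answers)
```

In a real test, gap positions would be chosen deliberately (e.g. content words carrying the main ideas) rather than mechanically.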
Section 1

Task 1 (Questions 1-5)
Task description: Listen to a short talk/dialogue and find the correct answer for each
question. The task focuses on identification of specific ideas.
Presentation: Aural
Topic: Education, Work and Leisure, Food, Movie and Television, Memory, Describing
things, Relationships, Money makes the world go round, Travel and Exploration,
Environment
Pattern: Dialogue

Task 2 (Questions 6-10)
Task level: B1
Task description: Listen to a short talk/dialogue and decide whether each sentence is True
or False. The task focuses on identification of specific ideas.
Topic: Education, Work and Leisure, Food, Movie and Television, Memory, Describing
things, Relationships, Money makes the world go round, Travel and Exploration,
Environment
Pattern: Dialogue

Section 2

Task level: B1
Task description: Students listen to a dictation audio twice and fill ONE WORD or A
NUMBER in each gap.
Instructions to candidates: Listen and fill in the gaps. Write no more than ONE WORD
and/or A NUMBER in each gap. You will hear the piece TWICE.
Presentation: Aural
Pattern: Monologue
Timing
About 20-22 minutes, excluding 10 minutes for transferring answers.
Operations
- Students present their answers by writing A, B or C (Questions 1-5), T or F (Questions
6-10), and words from the recording (Questions 11-20) on the answer sheet.
- Candidates record their answers on the question paper as they listen.
- They are given 10 minutes at the end of the test to copy their answers onto the answer
sheet.
- In each part the recording is played twice.
- The recordings contain a variety of different native speaker accents.
Scoring procedures
Section 1 (Questions 1-10): Total: 5 points
- For multiple-choice and binary-choice questions, there is only one correct answer per
question.
- Each correct answer is awarded 0.5 mark; a wrong or blank answer is awarded 0 marks.
Section 2 (Questions 11-20): Total: 5 points
- Each correct answer is awarded 0.25 mark.
- An answer with any of the following problems is awarded 0 marks: wrong spelling,
exceeding the word/number limit, wrong word order, or a blank answer.
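Because the scoring rules above are objective, they can be applied mechanically. A minimal sketch, using hypothetical answer keys and one candidate's hypothetical responses; for Section 2, an exact (case-insensitive) string match enforces the spelling and word/number-limit conditions:

```python
def score_test(responses, key_s1, key_s2):
    """Score per the stated rules: 0.5 mark per correct Section 1 answer,
    0.25 mark per correct Section 2 answer; wrong or blank answers earn 0."""
    score = 0.0
    for q, correct in key_s1.items():
        if responses.get(q, "").strip().upper() == correct:
            score += 0.5
    for q, correct in key_s2.items():
        # Exact match enforces spelling and the one-word/number limit;
        # anything else (including a blank) scores 0.
        if responses.get(q, "").strip().lower() == correct:
            score += 0.25
    return score

# Hypothetical keys and answers for a few questions from each section.
key_s1 = {1: "A", 2: "C", 6: "T", 7: "F"}
key_s2 = {11: "ticket", 12: "9.30"}
answers = {1: "A", 2: "B", 6: "T", 7: "", 11: "ticket", 12: "930"}
print(score_test(answers, key_s1, key_s2))
```

Here the candidate earns 0.5 each for Questions 1 and 6, 0.25 for Question 11, and nothing for the wrong, blank, or misspelled answers.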
References
Black, P. J., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in
Education: Principles, Policy and Practice, 5(1), 7-73.
Black, P. J., & Wiliam, D. (1998b). Inside the black box: Raising standards through
classroom assessment. London: King's College London School of Education.
Brown, J. D., & Hudson, T. (2002). Criterion-referenced language testing. Cambridge:
Cambridge University Press.
Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.
Cronbach, L. J. (1990). Essentials of psychological testing. New York: Harper & Row.
Heaton, J. B. (1988). Writing English Language Tests: Longman Handbook for Language
Teachers (New Edition). London: Longman Group UK Ltd.
Hughes, A. (1989). Testing for Language Teachers. Cambridge: Cambridge University
Press.
Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge: Cambridge
University Press.
Madsen, H. S. (1983). Techniques in Testing. Oxford: Oxford University Press.
McNamara, T. (2000). Language testing. Oxford: Oxford University Press.
Messick, S. (1989b). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp.
13-103). New York: Macmillan.
Nation, I. S. P., & Newton, J. (2009). Teaching ESL/EFL Listening and Speaking. New
York: Routledge.
Rubin, D. L., & Mead, N. A. (1984). Large scale assessment of oral communication skills:
Kindergarten through grade 12. Urbana, IL: ERIC Clearinghouse on Reading and
Communication Skills, National Institute of Education.