
International Christian University

Language Testing

Fall 2000

Randy Thrasher

Lesson Six

The Testing of Listening Comprehension

Testing Passive Skills

We speak of listening and reading as passive or receptive skills. This doesn't mean that they don't involve any activity on the part of the listener or reader. It indicates that there is no visible product of these activities. Comprehension takes place in the mind and so we cannot observe this activity. Writing and speaking also involve activity in the mind but, unlike listening and reading, a written or spoken product results.

The fact that comprehension takes place in the mind and cannot be directly viewed presents a special problem for language testers. We have to get the test takers to do something to demonstrate their comprehension. So we are faced with a situation in which we must mix skills. Whatever activities we use to get the test takers to demonstrate that they have understood what they heard will involve skills other than listening comprehension. The traditional sort of listening comprehension test, in which the test taker must answer printed questions about what he or she has heard, involves reading. Asking the questions orally allows only a partial escape from the problem. The test taker must respond by speaking or, if a multiple choice format is used, by reading and selecting the correct answer. Having the test takers listen to a cue, listen to a question about that cue, and then listen to possible answers to that question in order to select the correct one imposes an impossible burden on memory.

There is another practical problem with using choices given orally when you are testing a group of people. It may be obvious to the weaker test takers what choice the rest of the group considered correct. The timing of the marking activity will indicate which choice those around you consider correct. Isolating the test takers may solve this second problem but the memory issue remains.

In testing English in Japan we can use written questions and responses to check listening comprehension.
This is possible because Japanese learners are almost always more proficient in reading than they are in listening. But when a friend and I were asked to design a test of Japanese listening comprehension for speakers of English we faced a very different situation. The listening and even speaking skills of these test takers were much higher than their ability to read written Japanese. This meant that, if we presented the questions and answers in written Japanese, we could

not be sure if those who missed the item got it wrong because they didn't understand what they heard or because they couldn't read the question and answer choices we provided. As I said, the second possibility was much more likely than the first.1

The broad issue is what sorts of tasks we can pose to the test takers to allow them to demonstrate their comprehension of what we ask them to listen to. We will return to this issue when we discuss possible listening comprehension task types below.

What is Listening?

The other broad consideration in the testing of listening comprehension is the sorts of material we will ask the test takers to listen to. What we include in this list is determined by our construct of listening comprehension. If our construct includes the ability to distinguish between two words that differ in only one sound, then we can have our test takers listen to pairs of words such as 'bit' and 'bite', 'for' and 'far', and 'buzz' and 'bus'. We would also have to add some pairs that are pronounced the same and then ask the test takers to tell us if the two words are the same or different. Such items were very popular 40 years ago but are not seen very much today.

The reason they are rarely used today is not that our construct of listening has changed. We still believe that identification of individual words is important. Such items are unpopular because they are not really testing word identification but only so-called problem pairs of sounds. The 'rice'/'lice' pair nicely illustrates what I'm talking about. It cannot be denied that Japanese learners of English have difficulty with pronouncing English /l/ and /r/. But this pronunciation problem rarely gets in the way of communication. It marks the speaker as having a Japanese accent and we should help our students pronounce these two sounds correctly. That is, it may be a problem in speaking, but it is rarely a problem in listening comprehension.
Heaton, in Writing English Language Tests (64-65), gives some other examples of such sound discrimination listening tests. In one of his examples the test takers are shown a picture of a sock and hear the words 'shark', 'sock', 'sack', and 'sock'. In a variation on the same-different items I mentioned above, Heaton reads three sentences to the test takers and they are instructed to identify the one that is

1 We tried to solve this problem by presenting the question and choices in English, the native language of the test takers. This was not a fully successful solution. The weaker test takers did not complain, but the better ones claimed that switching languages was more difficult than reading the Japanese would have been.

different (or the two that are the same). His example is

A. There's a bend in the middle of the road.
B. There's a bend in the middle of the road.
C. There's a band in the middle of the road.

Heaton's sock item might have some limited usefulness with very beginning level students since it could be argued that this item is testing the ability to identify what the pictured object is called. But his same-different item has serious drawbacks. In order to get sentences that differ in only a single sound in one of the words, meaning is often distorted. What could "There's a bend in the middle of the road." mean? Could it mean that the sides of the road are straight but the middle bends? This seems contrary to the laws of physics. And even if this meaning problem could be overcome with pairs such as "There's a bend in the road. There's a band (marching band) in the road." the question of just what such items are measuring must be faced.

The intention of Heaton and others who use such items is to have the test takers demonstrate that they can distinguish the vowel sound in 'band' from the one in 'bend'. But is this really what the item is testing? Inevitably, one of the pair of sentences is more natural or usual than the other. Native speakers of English will most probably hear the sound of the word they are expecting, not necessarily the one that was pronounced. In other words, to get this item correct, the meaning has to be suppressed. It is difficult to argue that an item is testing a necessary language skill if this is the case.

Sentence Level Listening Comprehension

It is because of these sorts of problems with trying to test individual sounds that we have moved away from this type of item and begun to focus on items involving sentences or longer stretches of language. The listening comprehension items in the TOEFL are good examples of this trend. There are two basic types of items at the sentence level.
Short questions are asked and possible answers are printed in the test booklet. Or a sentence is read and the test taker is instructed to select the best paraphrase from the choices provided. And a number of variations on these two basic types are possible. The choices can be pictured instead of printed. Or a visual clue (a map or drawing) is provided to guide the selection of the correct response.

These sorts of items have been the workhorses of MC listening testing for 40 years but they are not without their drawbacks. The question type often turns out to be no more than a grammar item presented orally.

Cue: Is Mr. Blake a Canadian?
a. Yes, he is.
b. Yes, he does.
c. Yes, he has.
d. Yes, he can.

All the choices have to be grammatically correct sentences. Otherwise the test takers could eliminate the grammatically incorrect sentences before even listening to the cue. But this means that the test takers can look at the four choices above and realize that all they need to do is listen for the verb, which will probably be sentence initial. So the test takers learn to listen to the first word of the cue and ignore the rest. But this strategy defeats the purpose of the item.

I was able to get around this problem in one test I designed by asking questions that any educated Japanese could answer if the question were posed in Japanese. I used questions like,

Cue: What is the capital of Japan?
a. Osaka
b. Kyoto
c. Nara
d. Tokyo

I could use such items because all the test takers were adult college-educated Japanese. These items are clearly not oral grammar items but they have drawbacks of their own. They are more difficult to write than the usual kind. You must decide what educated Japanese do and do not know and realize that there is diversity in this country. I got into trouble asking "Which blooms first, the cherry or the plum?" because they bloom together in Hokkaido. Such items can also get out of date. I asked, "Can you take the Shinkansen to Nagano?" when the correct answer was 'No'. And it is very difficult to write large numbers of such items. I began to run out of ideas after the first 100 or so.

Let me finish my discussion of these MC sentence level listening items by pointing out that they can be used to teach as well as test listening comprehension. When I was teaching what was called English Conversation in a more traditional

university I had the problem of getting the students to come to class and to come on time. So I decided to start each class with a short listening comprehension exercise. I would give the class a 10-item MC listening test using sentence level question and paraphrase type items. But I had to teach several of these classes in one day and there were times when I wanted to repeat the same sort of test with the same class. I included the test as one of the drills in the textbook I designed for the course. But if I used the usual format, the students would soon learn (and tell their friends) that the correct answer to item one was 'd' and the correct answer to item two was 'a' and so on. Or they would simply circle the correct choice in the textbook. I solved this problem by having multiple versions of the test. Each choice of every item was correct in at least one version of the test. The paraphrase type looked like this.

Cue: The book is old.
Cue: The job is difficult.
Cue: The class is long.
Cue: The tree is short.

A. It's not new.
B. It's not easy.
C. It's not short.
D. It's not tall.
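
The bookkeeping behind such multiple versions can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the dictionary layout, the name build_versions, and the rule that rotates the key by one position per item and per version are all hypothetical, not a record of how the original versions were assembled. The cue and choice sentences are the ones from the example above.

```python
from string import ascii_uppercase

# One item from the example above. cues[k] is the cue whose correct
# paraphrase is choices[k]. This dict layout is hypothetical.
ITEM = {
    "cues": [
        "The book is old.",
        "The job is difficult.",
        "The class is long.",
        "The tree is short.",
    ],
    "choices": [
        "It's not new.",
        "It's not easy.",
        "It's not short.",
        "It's not tall.",
    ],
}


def build_versions(items):
    """Return one answer sheet per choice position.

    In version v, item i is read with the cue whose key is choice
    (i + v) mod n, so every choice of every item is the key in exactly
    one version. The rotation rule is an assumption for illustration.
    """
    n = len(items[0]["choices"])
    versions = []
    for v in range(n):
        sheet = []
        for i, item in enumerate(items):
            k = (i + v) % n
            sheet.append({"cue": item["cues"][k], "key": ascii_uppercase[k]})
        versions.append(sheet)
    return versions


if __name__ == "__main__":
    # Show which cue the teacher reads, and the key, for each version.
    for number, sheet in enumerate(build_versions([ITEM]), start=1):
        for entry in sheet:
            print(f"Version {number}: read {entry['cue']!r}, key is {entry['key']}")
```

The printed choices in the textbook stay the same in every version; only the cue that is read aloud changes, so a memorized string of answer letters is of no use to the students.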

And it was possible to do the same thing with question type items and even the dialogue type which we will turn to next. However, I must point out that this technique cannot be used in formal test situations because the difficulty of the item can change with the change in cues.

Beyond the Sentence

So far we have looked at items that test comprehension of single sentences, but it is possible to use MC items to test longer discourse. You are familiar with the TOEFL dialogue items. The cue is a dialogue between two speakers (usually a man and a woman so it is easy to identify the two) and the test taker is expected to select the best answer to a question concerning the dialogue. This question can be presented orally or in writing. I personally prefer to present the question in writing. We can do this with most English learners in Japan because, as I mentioned earlier, their reading ability is superior to their listening comprehension. And printing the question as well as the answer choices in the test booklet has two advantages. It lessens the memory load on the test takers and it provides them with an idea of what it is important to listen for. We usually have some purpose in listening. We may not pay close attention to a report of the weather in Bombay but we listen closely to what the

announcer says the weather will be where we live. Giving the test takers an idea of the information that they will need to answer the question makes such MC items a bit more like real-world listening comprehension. Helping the test taker know what to listen for becomes even more important in longer listening tasks that often have more than one question related to them. In fact, in a listening test using announcements of the type you might hear on airplanes, in department stores, or in rail stations, we instructed the test takers to read the three questions that were posed before listening to the announcement.

Beyond MC Listening Items

The sorts of items we have discussed so far can be converted to short answer ones. Instead of providing choices to select, we ask the test taker to write the answer to the question or the paraphrase of the statement. This change means that, instead of requiring reading comprehension in addition to listening comprehension, writing ability is needed to do the task. I don't believe that, for many Japanese students of English, we can assume that their writing ability is superior to their listening. For those for whom this assumption is valid or in situations where there is need of the positive washback obtained by having the students write something, the short answer format can be justified.

However, we are rarely asked to write the answer to questions or paraphrase what someone has said (at least, not at the sentence level). If the beneficial washback of writing is needed, it would be far better to pick a listening task which is naturally combined with writing. Asking the test takers to take information down that they are given over the telephone would be an example of such a task. Taking notes on a lecture is another. But notice that both of these tasks involve more than just listening and writing. There is, in both, the necessity to summarize or get the main points of what was said.
This is a separate skill that even some native speakers never develop. But if the test you are developing is for bilingual secretaries, the telephone task is appropriate and the note-taking task would be appropriate in academic settings.

Dictation

Dictation has sometimes been claimed to be a good test of listening

comprehension. Lado denounced it2 and Oller praised it3. Oller claimed that it was a good example of what he called a 'pragmatic' test. He defined a pragmatic test as "any procedure or task that causes the learner to process sequences of elements in language that conform to the normal textual constraints of that language and which require the learner to relate sequences of linguistic elements via pragmatic mappings to extralinguistic context." (38) For Oller, dictation was good because it dealt with text that was longer than a single sentence and allowed the test takers to use their knowledge of the world and other clues from the context to reconstruct the message.

All this may be true, but dictation lacks any obvious real world counterpart. We say that secretaries 'take dictation' but what they do is quite different from what those who take dictation tests are expected to do. The secretary listens to the message once, gets the gist of what was said in shorthand and then reconstructs the original message. In a typical dictation test, the material is read once at normal speed, then sentence by sentence, very slowly and perhaps with repetitions, and finally the whole passage is read one more time at normal speed. Not only is this different from real world listening and writing, it requires a special technique. The test takers must learn to get the gist on the first reading, then focus on each word in the sentence by sentence reading, and finally check what they have written during the last reading. Test takers usually learn this technique quickly but there is a clear 'method effect' in dictation tests.

There may be situations in which dictation is a useful test task but the best listening tasks are those that have clear real-world counterparts. It is to those sorts of tasks that we now turn.

Real World Listening Tasks

We have already mentioned some listening tasks that look like the sort of listening that must be done in the real world.
Taking a message over the telephone is one and taking notes on a lecture is another. But not all such tasks are at this level of difficulty. Getting just one piece of information (a telephone number or name) from an announcement could be used with relatively low proficiency students. Following simple spoken instructions is another that can be used with beginning students. But as

2 "[O]n critical inspection it [dictation] appears to measure very little of language." (34)
3 "Although an inspection of the results of dictation tests with appropriate statistical techniques shows the technique to be very reliable and highly valid, it has not always been looked on with favor by the experts." (39)

many have pointed out4, listening is most naturally combined with speaking. Much of our interaction using language involves a constant exchange of listening and speaking. Thus a face to face interview is a good way to test both listening comprehension and the ability to speak. But interviews are not usually appropriate for beginning students. Yet there are ways of combining listening and speaking in tasks that test takers without high proficiency can do. My favorite is a simple conversation task developed by John Upshur, my mentor at the University of Michigan. Upshur prepared sets of three simple pictures. Perhaps in one picture a girl would be pictured between two boys. In another picture in the set the girl would be pictured to the left of the boys and in the final one in the set she would be pictured on their right. Two versions of each set were prepared. In one version the three pictures are merely labelled a, b, and c. In the other version one of the letters would be circled. Two test takers would be placed facing each other across a table. Each would be given one version of the set of pictures. The one who got the version with one letter circled was told to describe that picture so that the person across the table could identify which picture was being described. If the second person correctly identified the intended picture, both test takers were given a point for communication. Sometimes the speaker used an ungrammatical utterance to describe the picture but the listener was able to figure out the meaning. Sometimes the speaker appropriately described the picture but the listener failed to understand. It should be obvious that each test taker (both speaker and listener) needs to have many chances to do such a test. They need chances to be both speaker and listener, to have different partners, and to be exposed to several different sets of pictures. Thus this test is much better as a class exercise than as a formal test. 
There is another reason for using it as a teaching device. It can only be used once as a test. Once the students take such a test they learn how to do it.

The final listening task I would like to describe here is a combination reading and lecture test. Some of you are familiar with this sort of test because we have used it from time to time in the ELP. In it, the test takers are given material to read. After they have had time to read through the material once, they are given a lecture on the same topic. The reading may discuss one aspect of the topic and the lecture another. One of the early ones we used in the ELP dealt with the Chernobyl disaster. The

4 Weir (98) or Hughes (134)

reading passage dealt with the medical and other physical effects of the disaster and the lecture dealt with the political consequences. Another possibility is for the reading to discuss one side of a controversy and the lecture to take the other side. Many other possibilities exist but whatever is selected must be suitable as information for an essay to be written by the test taker. In the Chernobyl topic the test takers were asked to discuss both the physical and political fallout from the disaster. This prompt enabled us to evaluate the essay in three ways. We could examine the essay as an essay (how well it answered the question, how good its organization was, the quality of the grammar and the word choice). But we could also ask if the essay showed that the test taker had understood the reading and if he/she had been able to follow the lecture.

As was mentioned in Lesson Four, if such production tasks are used to measure listening comprehension, we need to come up with a scoring rubric that allows what the test takers produce to be reliably graded. We will discuss such scoring rubrics in Lessons Eight and Nine.

The Content of Good Listening Comprehension Material

Since there are many possible listening comprehension tasks it is difficult to speak in general terms about what constitutes good material for such tasks. But some points can be made. In the previous section we talked about the importance of using listening tasks that have real world counterparts, but not all listening tasks that we encounter every day are equally good as tests of listening comprehension. Using news broadcasts, for example, is not usually a good idea because the specialized vocabulary and rapid delivery make such material unsuitable for all but very advanced students. Also the news involves the presentation of many facts with very few repetitions. That is, the material is very dense. News broadcasts also soon become dated.
Another point that can be made is that the test material should resemble the real world use as closely as possible. Radio or TV lectures may be scripted but most university lectures aren't. Yet, lecture tests tend to be like the radio or TV variety not like the kind you hear in a usual academic setting. One way to avoid using 'unnatural' lectures is to prepare a script of the lecture (to make sure that all the points that need to be covered are in fact dealt with) but use the script as you would notes and not read it.

I have purposely avoided the use of the word 'authentic' in this discussion because authenticity has become one of the shibboleths of recent test theorizing. It is a shibboleth because some would have us believe that authenticity is the one thing that determines whether or not a test is good. The claim is that using authentic material (in our case authentic listening material) makes the test valid and using material that the test writer has created or modified makes the test invalid. I cannot accept this idea. I agree with Messick that validity is a matter of the accuracy of the inferences that are made on the basis of the test results. Therefore, the only way that authenticity could determine validity would be if the selection of material to be listened to determined the accuracy of the inferences that can be drawn from the test results. This is clearly absurd. The listening material is important but so is the task that we ask the test takers to perform in order to demonstrate their comprehension.

Practically speaking, if using authentic materials means that the task becomes too difficult for our test takers, insisting on such materials reduces the usefulness of the test. If the editing of such material can reduce the level of difficulty to an acceptable level, I believe that such tampering with authentic material is justified. The real issue is trying to keep our listening tasks as closely tied to real world language use as possible. This often means using authentic materials. But there are cases in which a better fit can be achieved if we use modified authentic material or even material constructed for the purpose of the test.

Final Comments on Testing Listening Comprehension

Listening is a skill that is very difficult to break up into its constituent parts. Listening is, at bottom, getting meaning from spoken language, and the subskills that make up listening are all subservient to this quest for meaning.
As we saw in our discussion of attempts to check comprehension of individual sounds in sentences, this means that we usually have to sacrifice meaning in order to test the subskills that make up listening. This, I believe, is too big a price to pay. This leaves us with only the possibility of testing different sorts of listening: understanding single words (Heaton's 'sock' example), single sentences (TOEFL LC items), or longer pieces of discourse (dialogues, announcements, lectures, etc.). And we must also realize that we can't totally avoid the introduction of skills other than listening when we try to get our test takers to demonstrate that they did understand what they heard. Yet we must always remember that the additional skill required to

do the task we set should be easier than understanding the listening itself. If we keep this in mind, try to use test tasks that have real world counterparts, and never forget that the question we start with (the reason we are designing a test in the first place) dictates what sort of test task is appropriate, we should be able to build good listening comprehension tests.

Works Cited

Heaton, J.B. (1988) Writing English Language Tests. Longman
Hughes, Arthur (1990) Testing for Language Teachers. Cambridge University Press
Lado, Robert (1961) Language Testing. McGraw-Hill
Oller, John (1979) Language Tests at School. Longman
Weir, Cyril (1993) Understanding and Developing Language Tests. Prentice Hall
