
TESTING AND EVALUATION

A- TESTING AND ASSESSMENT

Summative Assessment: the kind of measurement that takes place to round things off or make a one-off measurement. It includes the end-of-year tests that students take, or the big public exams which many students enter for.

Formative Assessment: relates to the kind of feedback teachers give students as a course is progressing, and which may help them to improve their performance. This happens at a micro level every time we indicate that something is wrong and help students to get it right. The results could suggest that the teacher change the focus of the curriculum or the emphasis he/she is giving to certain lesson elements. It means that teachers, as well as students, may have to change and develop.

A1 – Different types of testing: four main reasons for testing

• Placement: the use of these tests makes it easier to place sts in the right class in a school. They are based on the syllabuses and materials the students will follow once their level has been decided, and they test grammar and vocabulary knowledge and assess productive and receptive skills.
• Diagnostic: can be used to expose learners’ difficulties, gaps in their knowledge and skill deficiencies during a course. Once we know what the problems are, we can do something about them.
• Progress/Achievement: designed to measure learners’ language and skill progress in relation to the syllabus they follow. They are often written by teachers and given to sts every few weeks to see how well they’re doing, and they can form part of a programme of formative assessment. Achievement tests can only work if they contain item types which the students are familiar with: if sts are faced with new material, the test will not measure the learning that has been taking place. Achievement tests at the end of term should reflect PROGRESS, NOT FAILURE. They should reinforce the learning that has been taking place, not go out of their way to expose weakness. They can also help us to decide on changes to future teaching programmes in areas where students do worse in the test than expected.
• Proficiency: these give a general picture of a student’s knowledge and ability. They are frequently used at stages people have to reach if they want to be admitted to a university or a job, or to gain some kind of certificate. They have a strong backwash effect where they are external exams: students want to pass them, and teachers’ reputations sometimes depend upon how many of their students succeed.
• Portfolio: for some people, who say they are not good at exams, one-off testing seems unfair, and many educators claim that “sudden death” testing does not give a true picture of how well students could do in other situations. Many educational institutions therefore allow sts to assemble a portfolio of their work over a period of time, and the student can be assessed by looking at 3 or 4 of the best pieces of work from this period. PROS: portfolios provide evidence of student effort; they help students become autonomous; they can foster student reflection and help sts self-monitor their own learning; they have clear validity; and sts have a chance to edit their work before submitting it. CONS: it is time consuming; teachers will need training in how to select items from the portfolio and how to give them grades; and students may be tempted to leave their portfolios until the end of the course, when their work will be at its best.
A2 – Characteristics of a good test: the criteria by which to measure a test.

• Validity: a test is VALID if it tests what it is supposed to test, and students should share this knowledge before they do the test. A test is also valid if it produces similar results to some other measure, and there is validity in the way it’s marked. A particular kind of validity that concerns test designers is face validity: the test should look as if it is valid. A test consisting of only 3 multiple-choice items would not convince sts of its face validity, however reliable or practical the teacher thought it to be.
• Reliability: a good test should give consistent results: the same students should get the same results on each occasion. In practice, reliability is enhanced by making the test instructions absolutely CLEAR, restricting the scope for variety in the answers and making sure that test conditions remain CONSISTENT. Reliability also depends on the people who mark the tests – the scorers. Clearly a test is unreliable if the result depends to any large extent on who is marking it, and much thought has gone into making the scoring of tests as reliable as possible.

B- TYPES OF TEST ITEM: a major factor in a test’s success or failure as a good measuring instrument is the item types that it contains.

B1 – Direct and indirect test items.

• Direct: an item is direct if it asks candidates to perform the communicative skill which is being tested. Direct items try to replicate real-life language use as much as possible.
• Indirect: indirect items try to measure a student’s knowledge and ability by getting at what lies beneath their receptive and productive skills. They try to find out about a student’s language knowledge through more controlled items, e.g., multiple-choice questions or sentence transformation. They are often quicker to design and easier to mark, and produce greater scorer reliability.

A distinction is also made between DISCRETE-POINT and INTEGRATIVE testing.

• Discrete-point: tests one thing at a time, e.g., asking sts to choose the correct tense of a verb.
• Integrative: expects sts to use a variety of language at any one given time, e.g., writing a composition or taking part in a conversational oral test.

Proficiency tests often contain a mixture of direct and indirect, discrete-point and integrative testing; it is said that this combination gives an overall picture of student ability. Placement tests often use discrete-point testing to measure students against an existing language syllabus, but may combine this with more direct and integrative tasks to get a fuller picture.

B2 – Indirect test item types.

• Multiple-choice questions: these were once considered to be ideal test instruments for measuring students’ knowledge of grammar and vocabulary. They are easy to mark, and the answer sheets can be read by machines. But there are a number of problems: they are extremely difficult to write well, and the “distractors” may put ideas into sts’ heads. Training can enhance students’ multiple-choice abilities, BUT NOT THEIR ENGLISH. They are still widely used, but their validity and reliability are suspect.
• Cloze procedures: seem to offer us the ideal indirect but integrative test item. They can be prepared quickly and are an extremely cost-effective way of finding out about a st’s overall knowledge. Cloze is the DELETION OF EVERY nth WORD IN A TEXT (see the sketch after this list). Because the procedure is random, it avoids test designer failings; anything can be tested, and so the procedure becomes more integrative in its reach. However, the score depends on the particular words that are deleted: some are more difficult to supply than others. Despite these problems of reliability, cloze is too useful a technique to abandon altogether, because it’s clear that supplying the correct word for a blank implies an understanding of context and a knowledge of that word and how it operates. Modified cloze is useful for placement tests, since students can be given texts they would be expected to cope with at certain levels. Cloze items are also useful as part of a test battery in either achievement or proficiency tests.
• Transformation and paraphrase: the re-writing of sentences in a slightly different form, retaining the exact meaning of the original. The student has to understand the first sentence and then know how to construct an equivalent which is grammatically possible.
• Sentence re-ordering: putting words in the right order to make appropriate sentences tells us about a student’s underlying knowledge of syntax and lexico-grammatical elements. Re-ordering items are fairly easy to write, though it’s not always possible to ensure only one correct order.
• There are many other indirect techniques, e.g., fill-ins, choosing the correct tense of verbs in sentences and passages, finding errors in sentences, and choosing the correct form of a word. All these items are quick and efficient to score and aim to tell us something about a student’s underlying knowledge.
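
To make the cloze procedure above concrete, here is a minimal sketch in Python (not from the source; the choice of n = 6 and the sample passage are illustrative assumptions), which deletes every nth word and keeps an answer key:

```python
# Minimal cloze-test generator: delete every nth word and replace it
# with a numbered blank. n = 6 and the passage are illustrative choices.

def make_cloze(text: str, n: int = 6):
    """Return (gapped text, answer key) with every nth word blanked out."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = f"({len(answers)}) ______"
    return " ".join(words), answers

passage = ("The teacher handed out the papers and asked the students to "
           "read the instructions carefully before writing anything at all.")
gapped, key = make_cloze(passage)
print(gapped)  # the text with numbered blanks
print(key)     # the deleted words, kept as the answer key
```

A modified cloze for a placement test would apply the same idea to texts pitched at the levels in question, perhaps choosing which words to delete rather than deleting strictly every nth one.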

B3 – Direct test items: in order to achieve validity and reliability, test designers need to do two things for their students:

• Create a “level playing field”: in a written test, teachers and sts might complain about a certain question, e.g., an essay topic, since it could unfairly favour candidates who have knowledge of that topic. Tests of receptive skills also need to avoid making excessive demands on the sts’ general or specialist knowledge. Receptive-ability testing can likewise be undermined if the means of testing requires sts to perform well in writing or speaking.
• Replicate real-life interaction: traditional testing has often been based exclusively on general essay questions, and speaking tests have often included hypothetical questions about what candidates might say if they happened to be in a certain situation. Tests of reading and listening should also reflect real life: texts should be as realistic as possible, even if they are not authentic, and the tasks should be as much like real reading and listening as possible.

C- WRITING AND MARKING TESTS: tests may range from a lesson test at the end of the week to an achievement test at the end of a term or year.

C1 – Writing tests

• Assess the test situation: remind ourselves of the context in which the test takes place, and decide how much time should be given to the test-taking, when and where it will take place, and how much time there is for marking.
• Decide what to test: list what we want to include. This means deciding which skills to include, knowing what syllabus items can legitimately be included, and knowing what kinds of topics and situations are appropriate for testing each item.
• Balance the elements: we have to make a decision about how many of each item type we should put in our test. Balancing elements involves estimating how long we want each section of the test to take; the amount of space and time we give to each element should also reflect its importance in our teaching.
• Weight the scores: our perception of our sts’ success or failure will depend upon how many marks are given to each section of the test (see the sketch after this list).
• Make the test work: it’s vital that we try out individual items and/or whole tests on colleagues and on other students before administering them to real candidates. Later, having made changes based on our colleagues’ reactions, we will want to try the test out on students; this also lets us discover how long the test takes.
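
As a concrete illustration of weighting, here is a minimal sketch in Python; the section names, raw marks and weights are hypothetical assumptions, not from the source. Each section’s raw score is scaled so that the sections contribute to the final mark in proportion to their intended importance:

```python
# Weighting scores: scale each section's raw mark to its share of the
# final grade. All names and numbers below are illustrative assumptions.

# section -> (raw score, maximum raw score, weight as % of final mark)
sections = {
    "grammar":   (18, 25, 20),
    "listening": (14, 20, 30),
    "writing":   (11, 20, 50),
}

final_mark = 0.0
for name, (raw, out_of, weight) in sections.items():
    contribution = raw / out_of * weight  # rescale the section to its weight
    final_mark += contribution
    print(f"{name}: {raw}/{out_of} -> {contribution:.1f} of {weight}")

print(f"final mark: {final_mark:.1f}/100")
```

Changing the weights changes our perception of the same raw performance, which is exactly the point made above.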

C2 – Marking tests: Cyril Weir had the same 8 exam scripts marked by a group of markers, first on the basis of impressionistic marking out of a possible total of 20 marks. Weir writes that “the worst scripts, if they had been marked by certain markers, might have been given higher marks than the best scripts.” There are a number of possible solutions:

• Training: if scorers have seen examples of scripts at various different levels, their marking is likely to be less erratic than if they come to the task fresh.
• More than one scorer: reliability can be enhanced by having more than one scorer. The more people who look at a script, the greater the chance that its true worth will be located. Two examiners watching an oral exam are more likely to agree on a reliable score than one.
• Global assessment scale: a way of specifying the scores that can be given to productive-skill work is to create “predefined descriptions of performance.” Such descriptions say what students need to be capable of in order to gain the required marks. These are not without problems: perhaps the description doesn’t exactly match the student who is speaking, and there is also the danger that different teachers “will not agree on the meaning of scale descriptors.”
• Analytic profiles: marking tests becomes more reliable when a student’s performance is analyzed in much greater detail. Instead of a general assessment, marks are awarded for different elements. For oral assessment we can judge a student’s speaking in a number of different ways; for example, we may want to rate their ability to get themselves out of trouble and how successfully they completed the task which we set them (see the sketch at the end of this section).

A combination of global and analytic scoring gives us the best chance of reliable marking.

• Scoring and interacting during tests: scorer reliability in oral tests is helped not only by global and analytic profiles but also by separating the role of scorer from the role of interlocutor. This may cause practical problems, but it will allow the teacher to observe and assess. Students are now often put in pairs or groups for certain tasks, since it is felt that this will ensure genuine interaction and will help to relax students in a way that interlocutor-candidate interaction may not.
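
The following sketch in Python (the criteria names and band scores are hypothetical assumptions, not from the source) combines two of the points above: marks are awarded analytically for separate elements, and the profiles of two independent scorers are averaged to improve reliability:

```python
# Analytic-profile marking with two scorers: each scorer awards a band
# score (0-5) per criterion; averaging the two profiles smooths out
# individual marker bias. Criteria and scores are illustrative.

from statistics import mean

scorer_a = {"fluency": 4, "pronunciation": 3, "accuracy": 4, "task completion": 5}
scorer_b = {"fluency": 3, "pronunciation": 3, "accuracy": 4, "task completion": 4}

profile = {c: mean([scorer_a[c], scorer_b[c]]) for c in scorer_a}
overall = mean(profile.values())

for criterion, score in profile.items():
    print(f"{criterion}: {score:.1f}/5")
print(f"overall band: {overall:.2f}/5")
```

A global grade could then be reported alongside this profile, in line with the suggestion that combining global and analytic scoring gives the most reliable marking.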

D- TEACHING FOR TESTS


The thing that most concerns test designers is the WASHBACK EFFECT. Since teachers reasonably want their students to pass the tests and exams they are going to take, their teaching becomes dominated by the test and by the items that are in it. Exam teachers suffering from the washback effect might stick exclusively to exam-format activities: THE FORMAT OF THE EXAM ENDS UP DETERMINING THE FORMAT OF THE LESSON.

Two points need to be taken into account: 1) modern tests are grounded far more in mainstream classroom activities, and there are many direct test questions which would not look out of place in a modern lesson anyway; 2) even if preparing students for a particular test format is a necessity, “it is important to build variety and fun into an exam course.”

Many teachers find teaching exam classes satisfying, in that students perceive a clear sense of purpose; such classes are in some senses “easier” to teach than students whose focus is less clear. Good exam-preparation teachers need to familiarize themselves with the tests their students are taking, and they need to be able to answer their sts’ concerns and worries. There are a number of things we can do in an exam class:

• Train for test types: show the various test types and ask the students what each item is testing, so that they are clear about what is required. We can help by showing them what the teacher is aiming for and by showing them the mark scales, so as to make them aware of what constitutes success. We can also help them to approach test items more effectively. Our task is to make sts so familiar with the test items they will have to face that they can give their best.
 Discuss general exam skills: most sts would benefit from being reminded about
general test and exam skills. They need to pace themselves so that they do not spend a
disproportionate amount of time on only one part of the exam. They need to be able
to organize their work so they can revise effectively.
• Do practice tests: sts need a chance to practice taking the test so that they get a feel for the experience, especially with regard to issues such as pacing.
• Have fun: there are a number of ways of having fun with tests, e.g., practising putting words in order to make sentences by giving sts a set of cards which they have to physically assemble into sentences.
• Ignore the test: when we are preparing students for an exam, we need to ignore the exam from time to time, so that we have opportunities to work on general language issues and so that sts can take part in the kind of motivating activities that are appropriate for all English lessons.
