
Chapter 3

Five major limitations identified and discussed by Bachman (1990) are as
follows:
(a) Subjectivity
(b) Under-specification of domain
(c) Incompleteness
(d) Indirectness
(e) Imprecision

1.1 SUBJECTIVITY
(a) The most obvious form of subjectivity in tests is seen in grading tests that
are in the supply type
or subjective format such as essays and even interviews. This issue has
been addressed at some
length in previous chapters. However, subjectivity does not refer only to
grading but also to
other elements of the test as well. Even when the test is an objective,
select or multiple choice
type test, there is still some amount of subjectivity. This subjectivity is found
in the selection of
passages and item formats as well as content that is to be tested. In a test
that contains a reading
comprehension passage, for example, why was one passage selected over
another? Was it
because of the content? If so, then there must surely be many passages covering the same content. Why, then, was one passage used and not another? The answer is
that decisions that affect the test and its ability to precisely measure are
made by individuals.
There is some degree of subjectivity involved in these decisions.
(b) A test is a measurement of some content. This content can be referred to
as the domain or the
construct of the test. However, while it may be quite easy to specify the
domain to be listening
comprehension, for example, it is not as easy to test or measure the domain.
When any kind of
theoretical domain or construct is operationalised, there is bound to be
some aspect of the
domain that cannot be translated into a test. The test therefore underspecifies the domain. It is
this under-specification of domain that limits a test as a measure of ability,
knowledge or sample

behaviour.
(c) Incompleteness refers to the student's inability to demonstrate the entire
repertoire of the
construct being measured. As a test is constrained by time and physical
setting, a student will
never be able to show all of what he or she is able to do. Because only a few
questions can be
asked in a test due to time constraints, these questions may not be able to
elicit the student's true
or complete ability. Similarly, the constraints placed by the physical setting
of the test may also
restrain the student from demonstrating specific kinds of abilities. As such,
we should take note
that even when a student scores zero points in a test, this does not mean
that he or she is
completely ignorant of the subject or ability being tested. It is just that the
test has not elicited
the knowledge or abilities that the student is able to convey or perform.
(d) While we are aware of the importance of having direct tests, it is unlikely
that a test will be
completely free of being an indirect measure of ability. This limitation is
inherent in the testing
situation itself. Many of us have experienced test anxiety. Once the word
test or assessment is
mentioned, the entire situation changes. While some students will be able to
speak well in
situations outside the classroom, they lose this ability once they become
aware that they are being
tested. In addition to this, every test situation has elements that are not
related to the construct
being tested. This is referred to as construct-irrelevant variance by Messick
(1989) and examples
may include the test rubrics or instructions, time constraints, and other
rules and regulations of
the test. All these are not present in the actual real-world situation and
must be considered as
aspects of indirectness. As such, we can only conclude that the test situation
is indirect because it
is inauthentic. And by being indirect, it fails to capture the true ability of the
students if they were
to perform in the real world.

(e) Finally, we need to acknowledge that there is a degree of imprecision in all tests. While we may
be able to justify some of the weightage in marks or points given to some
items, we will never be
able to be completely accurate and just. Even in a situation where there are
twenty multiple
choice items, each assigned one point, it is almost impossible to claim that
each one of the
twenty items is of equal difficulty. As such, we will not be able to justify
equal weightage of one
point for each item. It is this imprecision that must be acknowledged as
another constraint of
tests.

In addition to the above, Herman et al. (1992) also point out other
limitations such as the
mismatch between test content and curriculum and instruction; the overemphasis on routine
and discrete skills to the neglect of complex thinking and problem solving
skills; and the limited
relevance of major test formats such as the multiple choice format to either
classroom or real-world learning
(pp. 5-6).
Advantages of using portfolios as an assessment
(a) enhances student and teacher involvement in assessment;
(b) provides opportunities for teachers to observe students using meaningful language;
(c) allows students to accomplish various authentic tasks in a variety of contexts and situations;
(d) permits the assessment of the multiple dimensions of language learning;
(e) provides opportunities for both students and teachers to work together and reflect on what it means to assess students' language growth;
(f) increases the variety of information collected on students;
(g) makes teachers' ways of assessing student work more systematic.
Alternative assessment

Chapter 6
4 types of tests
(a) Achievement test.
(b) Aptitude test.
(c) Proficiency test.
(d) Diagnostic test

Additional testing terminology


Several other terms are also important in testing. In language testing, these include the terms: direct and indirect tests; authentic tests, performance tests, integrative and discrete point tests, as well as speeded and power tests.

AUTHENTIC TESTS AND PERFORMANCE TESTS

Closely related to the distinction made between the direct and indirect tests
are authentic tests.

An authentic test is one in which the activity that is performed closely


resembles what is done
outside of the testing situation. For example, if a test requires that you take
down notes based on
a lecture, this task can be considered authentic as it reflects what you may
be required to do in
real life. Tests in which the task does not reflect a real-life task are inauthentic.
Performance-based examinations are tests that assess your ability to perform a specific task. The performance test, therefore, is like a direct test. It is also authentic and can be considered similar to a simulation. It can also be likened to the kinds of badges that the scout movement used to give for being able to perform an action or demonstrate an ability or skill.
Applied to language teaching, we can perhaps imagine the different
communicative scenarios
possible and determine one's ability based on a performance test according
to that scenario.
SPEEDED AND POWER TESTS
It is also important to consider the terms speeded and power tests and how
they differ. A
speeded test is a test that is timed and emphasises the student's ability to complete tasks quickly. A power test, on the other hand, focuses on the student's knowledge and provides them with enough time to demonstrate this knowledge or ability.
Why do we need to distinguish between speeded and power tests? It relates to tests being measures
of behaviour, knowledge, or ability. As such a measure, we must be aware
of whether we have
been fair in providing our students the correct situation to demonstrate
their knowledge. A
timed or speeded test may not allow them to do so. In such a situation, a
power test could be
more relevant. Test takers may be provided with more than enough time to
complete the test as
the demonstration of their knowledge is more important than limiting the
time of the test. In a
different situation, however, one may be interested to know how well
students perform in an
actual social or natural situation. Such a situation may be constrained by
time demands. As such,

a speeded test would be a more relevant and appropriate situation.

Chapter 7
7.1 THE CLOZE TEST
The cloze test is a test that is often associated with language proficiency
testing. It is more than
simply filling in blanks in a passage as it has a theoretical basis. The term
cloze comes from the
word closure and reflects the Gestalt psychological tendency to close any
incomplete object. As
such, the cloze test is thought to elicit a respondent's language competency
by requiring the
respondent to complete a passage which has been mutilated with blanks.
Although it was
initially intended to be a measure of reading ability, the cloze test has often
been considered as a
measure of overall general language proficiency.
There are many different types of cloze tests; two of the more common are distinguished by how the words in the passage are deleted in order to form blanks.
The fixed deletion
cloze is a cloze passage where every nth word in the passage is deleted. For
example, a cloze test
where n = 5 means that every fifth word after the first sentence is deleted.
This method is said to
help assess overall language proficiency as the types of words deleted are
thought to be
representative of language in general, given the fact that they have been
deleted on a more or less
random basis.
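Since the fixed deletion procedure is purely mechanical, it can be sketched in a few lines of code. The following is an illustrative Python sketch, not taken from the text; the function name and the numbered-blank format are my own assumptions. It keeps the first sentence intact and deletes every nth word of the remainder, as described above:

```python
import re

def fixed_deletion_cloze(passage, n=5):
    """Build a fixed deletion cloze: leave the first sentence intact,
    then replace every nth word of the rest with a numbered blank.
    Returns the mutilated passage and the answer key."""
    # Naive split at the first sentence-final punctuation mark.
    match = re.search(r'[.!?]\s+', passage)
    first_sentence = passage[:match.end()] if match else passage
    rest = passage[match.end():] if match else ''

    words = rest.split()
    answers = []
    for i in range(n - 1, len(words), n):   # every nth word (counting from 1)
        answers.append(words[i])
        words[i] = f'({len(answers)}) ______'
    return first_sentence + ' '.join(words), answers
```

For n = 5, every fifth word after the first sentence becomes a blank, and the deleted words are retained as the answer key for grading.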
If the test maker intentionally deletes a certain kind of word, then the cloze
test is referred to as a
rational deletion cloze test. A rational deletion cloze test could involve the
deletion of only verbs,
for example. The number of words between every blank in a rational
deletion cloze test may not
consistently be the same. However, you may also find some cloze tests in
which the passage has
been altered so that only certain types of words are deleted at consistent
intervals. These cloze

passages, even if they consist of blanks that are spaced out equally, are still
rational cloze
passages as the deleted words were selected by the test maker.
7.1.1 THE STRUCTURE OF THE CLOZE TEST
The cloze test consists of a passage with blanks. The first sentence is left
intact without any
blanks. This is to ensure that the test takers have some context to work
with. It also provides
other information, such as the tense of the passage. Normally, the cloze passage is long enough to allow for about 20 blanks, as a longer text would make it extremely difficult. Factors that affect the difficulty level of the cloze procedure include the following:
(a) Length of the text: The longer the text, the more difficult the cloze
passage.
(b) Familiarity of vocabulary and structures: This includes the word that is needed to fill in the blank. For example, in a sentence such as The situation was _____ with danger, it is highly unlikely that non-native speakers would be able to provide the correct word fraught to fill in the blank.
(c) Length and complexity of the sentences: The longer and more
complex the sentence,
the more difficult it becomes for the student to complete the cloze.
(d) Familiarity with chapter and discourse genre: Familiarity with each
of these would
make the cloze easier.
(e) Frequency with which blanks are spaced: The closer together the blanks, the more difficult the cloze passage becomes. Normally, the number of words between blanks, or the n in a cloze passage, is between 5 and 7, and seldom less than 5.
Grading the cloze test

Exact Word Method


In the exact word method, only one answer is accepted for each blank. This
method of grading is
seen to be more objective and therefore more reliable. However, the exact
word method stifles

creativity. Take the following text for example:

Acceptable Word Method


The second method of scoring is the acceptable word method in which we
accept any suitable
answer. This method is more subjective and therefore less reliable.
However, it definitely does
not suffer from being unnecessarily rigid and stringent. A potential problem that should be noted is deciding which words would be acceptable as answers.
Usually, in this
method, the cloze is pretested with native or near-native speakers and the
responses given are
used as the acceptable answers.
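The two grading methods differ only in what counts as a correct response, which a short sketch can make concrete. The function below is an illustrative Python sketch (the names and data shapes are my own assumptions, not from the text); each answer-key entry lists the exact word from the original passage first, followed by any pretested acceptable alternatives:

```python
def score_cloze(responses, answer_key, method='exact'):
    """Score cloze responses under the exact word or acceptable word method.
    responses: {blank_number: word_given}
    answer_key: {blank_number: [exact_word, alternative, ...]}"""
    score = 0
    for blank, given in responses.items():
        acceptable = answer_key[blank]
        if method == 'exact':
            # Exact word method: only the word deleted from the passage counts.
            correct = given.lower() == acceptable[0].lower()
        else:
            # Acceptable word method: any pretested suitable word counts.
            correct = given.lower() in (w.lower() for w in acceptable)
        score += int(correct)
    return score
```

A response of filled for the fraught blank would score under the acceptable word method but not under the exact word method, which illustrates the trade-off between reliability and rigidity discussed above.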

7.2 THE DICTATION TEST
The dictation is a common form of assessment that many of us have
experienced. The dictation
is seen to have some commonalities with the cloze test, especially in that
both are considered to
be able to predict overall language ability. The dictation is also thought to
provide results that are
similar to those obtained in cloze tests but with the added ability of
assessing listening as well
(Hughes, 1972). In a standard dictation test, the teacher begins by selecting
an appropriate
passage. This passage is usually a short passage no longer than one
paragraph. This stage of the
dictation is an important one as the paragraph that has been selected must
be appropriate to the
students' language ability as well as cultural background. After having
selected the passage, the
teacher can proceed with the dictation.

7.2.1 THE STRUCTURE OF THE DICTATION

The dictation passage is usually read out three times. The first time it is
read out, it is done so at
a normal rate of reading. Students are expected to listen and get the gist of
the passage. The
second reading is a little slower and the students are expected to take down
what is read. During
the second reading, the teacher usually pauses to break the passage into
meaningful chunks

referred to as bursts. Finally, the passage is read a third time and students
are expected to check
their work, editing it for errors.


What makes dictation difficult


There are many factors that can contribute to the difficulty of taking down a
dictation. Some of
these factors are listed as follows:
(a) The length of the phrase or burst.
(b) The length of the pauses between bursts.
(c) The content of the dictation passage.
(d) The syntactic and structural properties of the sentences in the passage.
(e) Clarity of voice, expression and pace or tempo.
It is quite obvious that the longer the burst, the more difficult it becomes for
the student as he or
she will have to retain more in short-term memory while taking down the
burst. However, the
longer the pause, the easier it becomes for the students.
How long should a pause be? It is recommended that the pause be long enough for us to silently read the burst we have just dictated twice at normal speed.
When it comes to content, familiar content makes the dictation easier.
Jargon and highly
technical words are difficult for someone who is not trained in the particular
field and is
therefore unfamiliar with such words. It is also not surprising that complex
structures make
dictation more difficult.

For example, compare the following two sentences:


(1) Ali caught the woman who stole the money.
(2) The woman that Ali caught stole the money.
While the two sentences convey more or less the same meaning, the second
sentence is a little
more complicated as its structure is unfamiliar. Furthermore, the woman
which is the subject of
the verb stole does not appear immediately before the verb. The student is
therefore required to
make a long distance connection between the subject and the verb.
Finally, it should also be
mentioned that clarity of voice is important in dictations. Together with this,
you may also
include facial expressions as the more animated the person dictating, the
more cues are provided
to the students.

Variants of the dictation test


In addition to the standard dictation that most of us are familiar with, there
are several variants
of the dictation procedure. Most notably among these are:
(a) the graded or graduated dictation;
(b) the partial dictation;
(c) the dictocomp.

Graded or Graduated Dictation


The graded dictation is simply a technique where the dictation passage
becomes progressively
more difficult. This is done by gradually increasing the number of words in a
burst. A burst is the
number of words the tester dictates between pauses and repeats. The
dictation may begin with a
burst consisting of two words and the number of words slowly increases
until there can be up to
thirteen or fourteen words in a burst. Normally the processing load
becomes too high when a
burst exceeds seven words. However, better and more proficient students
will be able to handle
seven words and perhaps even more by chunking words that collocate
naturally.
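Assuming bursts simply grow by one word at a time up to some ceiling, the graded dictation can be sketched as follows. This is an illustrative Python sketch; the function name, parameters, and the growth rule are my own assumptions rather than a prescribed procedure:

```python
def graded_bursts(passage, start=2, ceiling=14):
    """Split a passage into bursts of progressively increasing length
    for a graded dictation, starting at two words per burst."""
    words = passage.split()
    bursts, i, size = [], 0, start
    while i < len(words):
        bursts.append(' '.join(words[i:i + size]))
        i += size
        size = min(size + 1, ceiling)   # grow each burst by one word, up to the ceiling
    return bursts
```

In practice, a tester might set the ceiling nearer to seven words, since the text notes that the processing load becomes too high for most students beyond that.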

Partial Dictation
The partial dictation is essentially like a listening cloze activity. Students are
provided the passage
with some words or phrases deleted. They are expected to listen to a
passage and fill in words or
phrases. It is commonplace to have partial dictations in which single words
or even short phrases
are deleted.

Dictocomp
Finally, in the dictocomp, the students are expected to use the information
they hear to construct
a coherent piece of composition instead of taking down the passage exactly
as it was dictated. The teacher will determine the key elements of the
original passage which the student is expected
to include in the composition. Therefore, the dictocomp can be said to test
listening
comprehension in a very specific way in that the student has to decide what
pieces of
information are important and should be included. This is reminiscent of
summaries.
Additionally, the dictocomp also tests writing ability as well because the
students are expected to
write a cohesive piece based on the passage that was dictated to them.
Chapter 9

DISCRETE POINT TESTS AND INTEGRATIVE TESTS

Language tests may also be categorised as either discrete point or


integrative. Discrete point tests
examine one element at a time. Integrative tests, on the other hand,
require the candidate to combine
many language elements in the completion of a task (Hughes, 1989: 16). It
is a simultaneous measure of
knowledge and ability of a variety of language features, modes, or skills.
A multiple choice type test is usually cited as an example of a discrete point
test while essays are
commonly regarded as the epitome of integrative tests. However, both the
discrete point test and
the integrative test are a matter of degree. A test may be more discrete
point than another and

similarly a test may be more integrative than another. Perhaps the more
important aspect is to be
aware of the discrete point or integrative nature of a test as we must be
careful of what we
believe the test measures.
This brings us to the question of how discrete point a multiple choice item really is.
While it is definitely more discrete point than an essay, it may still require
more than just one skill
or ability in order to complete. Let's say you are interested in testing a student's knowledge of the
relative pronoun and decide to do so by using a multiple choice test item. If
he fails to answer
this test item correctly, would you conclude that the student has problems
with the relative
pronoun? The answer may not be as straightforward as it seems. The test is
presented in textual
form and therefore requires the student to read. As such, even the multiple
choice test item
involves some integration of language skills as this example shows, where in
addition to the
grammatical knowledge of relative pronouns, the student must also be able
to read and
understand the question.
Perhaps a clearer way of viewing the distinction between the discrete point
and the integrative
test is to examine the perspective each takes toward language. In the
discrete point test, language
is seen to be made up of smaller units and it may be possible to test
language by testing each unit
at a time. Testing knowledge of the relative pronoun, for example, is
certainly assessing the
students on a particular unit of language and not on the language as a
whole. In an integrative
test, on the other hand, the perspective of language is that of an integrated
whole which cannot
be broken up into smaller units or elements. Hence, the testing of language
should maintain the
integrity or wholeness of the language.
Multiple choice

The multiple choice format is perhaps the most common test format to many
of us. It is also
commonly referred to as an objective test as there is seen to be objectivity
in grading the test.
In this section, we will examine the multiple choice format with respect to
its structure, use, and
construction.
There are a number of situations in which a multiple choice format test may
be useful and
appropriate. Ory outlines some of these situations as follows:

- When there is a large number of students taking the test.
- When you wish to reuse the questions in the test.
- When you have to provide the grades quickly.
- When highly reliable test scores must be obtained as efficiently as possible.
- When impartiality of evaluation, fairness and freedom from possible test scoring influences such as fatigue are essential.
- When you are more confident of your ability to construct valid objective test items clearly than of your ability to judge essay test answers fairly.
- When you want to sample a wide range of content.
- When you are especially interested in measuring particular learning objectives such as comprehension, recognition, and recall.
- When you want specific information especially for diagnostic feedback.

It should be noted that these situations reflect the advantages of using the multiple choice question format. These advantages include:
- the ability to create a test item bank;
- quick grading;
- high reliability;
- objective grading;
- wide coverage of content;
- precision in providing information regarding specific skills and abilities.
Negative effects of MCQ

- The technique tests only recognition knowledge and recall of facts.
- Guessing may have a considerable but unknowable effect on test scores.
- The technique severely restricts what can be tested to only lower order skills.
- It is very difficult to write successful items, due especially to the difficulty of finding good distractors.
- Backwash can be harmful for both teaching and learning, as preparing for a multiple choice format test is not reflective of good language teaching and learning practice.
- Cheating may be facilitated.
- It places a high degree of dependence on the student's reading ability and the instructor's writing ability.
- It is time consuming to construct.

Chapter 8
Essay
Unlike the directed writing task, the continuous writing test item provides
little structure other than the question itself. Students are expected to draw
upon their experience and past knowledge as well as knowledge of writing
conventions and organisation in order to complete the task.
The essay test format provides several advantages compared to the multiple
choice test format.
Some of these advantages as mentioned by Kubiszyn and Borich (2000:18)
are:
(a) It can assess higher order skills. Unlike the multiple choice test
format which
is often limited to assessing low order skills, the essay places a premium on
the
ability to analyse, synthesise and evaluate through topics that require
students to

express their opinions or argue a point.


(b) Emphasises communication skills. This is especially important when
we
consider that communication skills are an important aspect of social relations.
(c) Eliminates guessing. The multiple choice question format is notorious
for
allowing students to guess. In the essay format, however, guessing is
unlikely to
occur.
(d) Relatively easy to construct. An essay question can be constructed
within
minutes compared to other test formats which can even take days to
construct.

Scoring essays
As we have seen earlier, scoring an essay is not easy as graders can be
easily swayed by many
factors. Scoring remains one of the major issues in grading essays. There
are generally three
major approaches to scoring essays which are the holistic scoring method,
the analytical scoring
method, and the objective scoring method.

Holistic Scoring
In holistic scoring, the reader reacts to the student's composition as a
whole and a single score
is awarded to the writing. Normally this score is on a scale of 1 to 4, or 1 to
6, or even 1 to 10.
(Bailey, 1998: 187). Each score on the scale will be accompanied by
general descriptors of
ability. The following is an example of a holistic scoring scheme based on a
6 point scale.
The 6 point scale above includes broad descriptors of what a student's essay
reflects for each
band. It is quite apparent that graders using this scale are expected to pay
attention to vocabulary,
meaning, organisation, topic development and communication. Mechanics
such as punctuation
are secondary to communication.

Bailey also describes another type of scoring related to the holistic approach which she refers to as primary trait scoring. In primary trait scoring, a particular functional
focus is selected which is
based on the purpose of the writing and grading is based on how well the
student is able to
express that function. For example, if the function is to persuade, scoring
would be on how well
the author has been able to persuade the grader rather than how well
organised the ideas were, or
how grammatical the structures in the essay were. This technique to
grading emphasises
functional and communicative ability rather than discrete linguistic ability
and accuracy.

Analytical Scoring
Analytical scoring is a familiar approach to many teachers. In analytical
scoring, raters assess
students' performance on a variety of categories which are hypothesised to
make up the skill of
writing. Content, for example, is often seen as an important aspect of
writing, i.e., is there
substance to what is written? Is the essay meaningful? Similarly, we may
also want to consider
the organisation of the essay. Does the writer begin the essay with an
appropriate topic sentence?
Are there good transitions between paragraphs? Other categories that we
may want to also
consider include vocabulary, language use and mechanics. The following are
some possible
components used in assessing writing ability using an analytical scoring
approach and the
suggested weightage assigned to each:

The points assigned to each component reflect the importance of each of the components.
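As a concrete sketch of how analytical scoring combines components, the snippet below computes a weighted total. Since the original weightage table is not reproduced above, the component names and weights here are hypothetical stand-ins chosen only for illustration:

```python
# Hypothetical component weights (illustrative only; the actual
# weightage table from the text is not reproduced here).
WEIGHTS = {'content': 30, 'organisation': 20, 'vocabulary': 20,
           'language use': 25, 'mechanics': 5}

def analytical_score(ratings):
    """Combine per-component ratings (each between 0.0 and 1.0)
    into a single weighted score out of 100."""
    return sum(WEIGHTS[c] * r for c, r in ratings.items())
```

An essay rated perfect on every component would score 100; the small weight on mechanics here reflects the view, noted earlier, that mechanics are secondary to communication.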

Objective Scoring
A third type of scoring approach is the objective scoring approach. This
scoring approach relies
on quantified methods of evaluating students' writing. A sample of how
objective scoring is
conducted is given by Bailey (1999) as follows:
Establish standardization by limiting the length of the assessment: Count
the first 250 words of
the essay.
Identify the elements to be assessed: Go through the essay up to the 250th
word underlining
every mistake from spelling and mechanics through verb tenses,
morphology, vocabulary, etc.
Include every error that a literate reader might note.
Operationalise the assessment: Assign a weight score to each error, from 3
to 1. A score of 3 is a
severe distortion of readability or flow of ideas; 2 is a moderate distortion;
and 1 is a minor error
that does not affect readability in any significant way.
Quantify the assessment: Calculate the essay's Correctness Score by using 250 (the number of words assessed) as the numerator of a fraction and the sum of all the error scores as the denominator.

The steps described above help to provide a clear and systematic method
for assessing essays.
Objective scoring does not necessarily need to use the same values as in
this example. The most
important element in this approach is the objective scoring which is
determined through the
unbiased and fixed values provided according to some concrete aspect of
the essay such as the
number of mistakes made.
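Bailey's steps can be sketched as a small function. This is an illustrative Python version; the function name and the handling of an error-free sample are my own choices, not part of Bailey's description:

```python
def correctness_score(error_weights, words_assessed=250):
    """Objective Correctness Score: divide the number of words assessed
    by the sum of the error severity weights (3 = severe, 2 = moderate,
    1 = minor), following the steps described above."""
    total = sum(error_weights)
    if total == 0:
        # Not addressed in the text: an error-free sample would divide
        # by zero, so a cap or special value is needed in practice.
        return float('inf')
    return words_assessed / total
```

For example, an essay with one severe, one moderate, and three minor errors in its first 250 words scores 250 / 8 = 31.25.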

Familiarisation with a grading scale is an important step in achieving valid


and accurate scores. If
bands are used, there is an obvious need to fully understand what each
band signifies. This stage

of the grading process should therefore be given due consideration and not
ignored. There are
enough incidents of graders jumping the gun and assessing essays
without first becoming
familiar with the scoring criteria. This may only result in having to grade
the paper again.
The purpose of identifying benchmark papers or anchor papers is to provide
a clear and
representative example of students work according to the grading criteria.
Bands can only give a
general description of what is expected. Anchor or benchmark papers
provide concrete examples
and help ensure fairness in grading.
When it comes to the actual grading, some recommend that we first quickly
scan through all the
essays and place them in stacks according to the bands on the scale. All
papers which we
consider A papers will be stacked together, the B papers will be together
and so on. We can then
read each paper more closely in order to confirm our initial impression. If
we need to assign
more precise numerical scores, we can do so at this time. Another pointer in
grading essays,
especially when there are several essays, is to grade all the students on one
essay first before
moving on to the next essay. This is expected to help ensure more
consistent grading.
