TOPIC 1
LINGUISTIC EVALUATION: CONTEXT, HISTORY, ISSUES AND TRENDS
Table of Contents
1 INTRODUCTION
2 THE INTEREST IN EVALUATION
2.1 The nature and quality of evidence
2.2 The effects of evaluation on students
2.3 The fairness of testing with minorities
3 BRIEF HISTORY OF LINGUISTIC EVALUATION
3.1 The pre-scientific trend
3.2 The psychometric-structuralist trend
3.3 The integrative-sociolinguistic trend
3.4 The communicative trend
4 TECHNOLOGICAL ADVANCES IN LINGUISTIC EVALUATION
5 BIBLIOGRAPHIC REFERENCES
1 INTRODUCTION
The specific learning outcomes that the student must achieve at the end of this topic
are:
1. The student defines, uses and relates a series of general concepts of educational
evaluation.
2. The student defines, uses and relates a series of concepts that have appeared
throughout the development of linguistic evaluation.
3. The student defines, uses and relates a series of concepts related to the use of
new information and communication technologies in linguistic evaluation.
2 THE INTEREST IN EVALUATION

Assessment occupies a central place in language teaching systems. Language tests have been the center of intense debate for a multitude of reasons: accusations that the tests were biased against minorities, or that they influence teaching in an undesirable way by paying too much attention to certain types of content to the detriment of others, etc. Considering the importance of assessment in language teaching practice, and the associated issues and debates, it is essential that teachers understand the design, uses and abuses of language assessment instruments.
Decisions about the choice of an educational test, about a test administration, or about the use of a linguistic test, and about educational tests in general, are no longer of interest only to teachers. Currently, society demands effectiveness in foreign language teaching programs.
programs. This increased concern about issues related to language testing stems, in part,
from awareness of the social consequences of testing, especially the danger that certain
tests pose to the rights and opportunities of certain individuals and groups. This concern
has taken the form of attacks on testing, the testing industry and the new standards
governing testing, or requests for postponement of the implementation of new
assessment tools, or accusations that tests are biased and discriminatory. In reality, there
are many compelling reasons to be concerned about the social consequences of
evaluation. However, it is important to distinguish between, on the one hand, the
negative consequences for individuals or groups that originate from failures in the
evaluation instruments and, on the other, failures caused by misinterpretation or misuse
of test scores.
Linn and Gronlund (2000, p. 18) mention three areas that cause controversy in educational evaluation, and that are perfectly applicable to linguistic evaluation: (1) the nature and quality of tests, (2) the effects of evaluation on students, and (3) fairness to minorities.
2.1 The nature and quality of evidence

In the early 1960s, some authors, such as Hoffman (1962, p. 22), argued that multiple-choice items penalized the most intelligent, original or “exceptional” people. Hoffman (1962) supported his claims with a review of standardized test items which showed that some highly creative students, high in the ability being tested, were likely to make interpretations that had not been anticipated by the test designers.[1] Hoffman (1962, p.
17), for example, included the following letter, addressed to the editor of the Times:
Dear Sir:
Among the “mark the different element” questions my son had to answer in a school
entrance test was: “What is the different element in cricket, football, billiards and hockey?”
I said billiards because it is the only game that is played inside a building. A classmate said football because it is the only one in which the ball is not hit with an instrument. A neighbor said cricket because in the other games the objective is to get the ball into a net; and my son, with the confidence that comes with nine springs, decided on hockey because “it's the only game for girls.”

[1] Davies et al. (1999, p. 187) define a standardized test as follows:

A test that ideally has the following characteristics, although so-called standardized linguistic tests do not always have all of them:
A rigorous development, testing and review process, which determines the measurement properties of the test...
Standard procedures for the administration and scoring of the test.
The content of the test is standardized across all versions. This content is based on a set of test specifications that may reflect a theory of linguistic competence or a conception of the expected needs of candidates. Alternative forms of the test are examined to see whether they are equivalent in content.
Hoffman's (1962) criticisms were widely echoed, and they encouraged test authors to supplement the statistical analysis of items with careful logical analysis.
Frederiksen (1984, p. 199) observed that problems in standardized tests are usually well structured, that is, “they are clearly expressed, all the information necessary to solve the problem is available in the problem or - presumably - in the head of the student, and there is an algorithm that guarantees a correct solution if it is applied properly.” However, most of the important problems one faces in life are poorly structured, that is, they are

complex, without defined criteria to determine when the problem has been solved, without all the information necessary to solve the problem, and without a 'legal move generator' to find all the possibilities at each step during the resolution of the problem (ibid.).
These criticisms have led to greater emphasis on open-ended questions and on designing
tests that use computer simulations.
Much of the misinterpretation and misuse of test scores would be avoided if the test
user were aware of the limited nature of the information a test provides. A good test
user takes into account the error that may exist in the test scores and uses information
other than the test score when making their decision. Claiming that better decisions are
made without test scores is claiming that better decisions are made when there is less
information. Test scores are certainly fallible, but they are probably less fallible than
most other types of information used to make educational decisions.
2.2 The effects of evaluation on students

Critics of assessment claim that assessment has undesirable effects on students. Some of the most frequently mentioned criticisms of the use of tests appear below, followed by a few brief comments.
There is no doubt that anxiety increases during a test. For most students, assessment pushes them to try harder. For a few, the anxiety caused by the test may be so high that it interferes with their performance on it. These students usually have high anxiety, and the test simply raises their anxiety level further. Different procedures can be used to reduce test anxiety, such as thorough preparation before the test, practising with a rehearsal version, and providing enough time for the student to take the test with some peace of mind. Fortunately, in recent years the designers of many tests also provide practice versions, and there has been a shift from speed tests to power tests. This should help, but it is still necessary to observe students carefully during the test and to reflect on the scores obtained by students in whom the test produces a high level of anxiety.
There are teachers who attribute stereotypes to students based on test scores, which can
have an undesirable effect on the students' self-concept. It also happens that the student
develops a general feeling of failure from a low score. Teachers must explain to
students who receive low scores that tests are limited measures and that our
competencies (and, therefore, scores) change. Furthermore, the development of the
feeling of failure can be limited if the positive aspects that the student shows in the test
are mentioned. Testing can help students identify their strengths and weaknesses,
thereby contributing to better learning and a positive self-image.
Those who raise this criticism maintain that, when a teacher assigns a score to a test, expectations are created about what each student can achieve. Therefore, those who are expected to achieve more, achieve more, and those who are expected to achieve less, achieve less. This effect, called the Pygmalion effect, was studied by Rosenthal and Jacobsen (1968), although the study was later questioned by other researchers (Elashoff and Snow, 1971; West and Anderson, 1976). It is widely believed that teacher expectations enhance or hinder a student's achievement.
In short, there is some truth in the various criticisms about the undesirable effects of testing on students. But in most cases these criticisms should be directed at the users of the tests, rather than at the tests themselves. The same people who misuse test results are likely to misuse other information, which is probably less accurate and objective. Therefore, the solution is not to stop using tests, but to start using tests and other data more effectively. When tests are used in a positive way - that is, to help students improve their learning - the consequences are likely to be beneficial.
2.3 The fairness of testing with minorities

The issue of fairness to racial and ethnic minorities is critical in any assessment program. Fairness has received increasing attention in the language assessment literature over recent years. The term fairness is associated, according to Linn and Gronlund (2000, pp. 21-22), with several different concepts.
Different concepts can lead to quite different conclusions about the fairness of any
test or assessment instrument. The fourth concept, equality of results, is incompatible
with other principles of assessment, such as the goal of achieving a reliable and valid
measure of what students know, regardless of their origin or ethnic group. If different
groups of students differ in the instruction they have received, in their experiences in
and out of school, and in their interests and effort, a test or assessment instrument that
provides different mean scores for minority groups and for the majority group may
reflect the consequences of unfair treatment of minorities by society.
An absence of bias and procedural fairness are essential for an evaluation to have a
high degree of validity.
3 BRIEF HISTORY OF LINGUISTIC EVALUATION

3.1 The pre-scientific trend

For Spolsky (1978, p. v), the pre-scientific trend, which still prevails in many places in the world, can be characterized by an absence of concern for statistical issues or for notions such as objectivity and reliability:

In its simplest form, it presupposes that we can and should rely entirely on the judgment of an experienced teacher, who can tell what grade should be given after a conversation of several minutes, or after reading the response to an essay (Spolsky, 1978, p. v).
In the pre-scientific trend, oral exams are rare and the exams usually consist of open questions that must be answered in writing.
3.2 The psychometric-structuralist trend

Two groups of specialists shaped this trend:

1. The evaluators, that is, the psychologists responsible for the development of modern theories and techniques of measurement in education, whose main objective is to provide objective measurements through the use of different statistical techniques, which allow the scores to be reliable and the interpretations that we make from those scores to be valid:

The form of the tests... is determined primarily by the need to evaluate the reliability and validity of the tests. This is why, for example, the multiple-choice response technique is so common. In linguistic evaluation this means that we normally resort to the skills of reading and listening comprehension (Ingram, 1968, p. 74).
The evaluators had noticed the poor reliability of traditional exams (Pilliner, 1968, p. 27). Starch and Elliott (1912), for example, observed that the scores that 142 English teachers had assigned to one test ranged between 64 and 98, while on another test the scores ranged between 50 and 98 (Starch, 1913, p. 630). Starch (1913, ibid.) compiled Table 1 from the scores assigned by ten instructors to 10 final first-year English papers from the University of Wisconsin, in which we can appreciate the great disparity in the scores that the instructors assign to a paper written by the same student. Instructor 4, for example, assigns a score of 20 to the paper written by Student 4, while Instructor 8 assigns a score of 68 to this same paper:
Table 1. Scores assigned by 10 instructors to a sample of 10 final first-year English papers from the University of Wisconsin (Starch, 1913, p. 630).
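The disparity Starch describes can be made concrete with a short computation. This is an illustrative sketch: the list of scores below is invented for the example (only the extremes, 20 and 68, echo the two scores mentioned above), and the spread of scores for a single paper serves as a crude index of inter-rater disagreement.

```python
def score_spread(marks):
    """Difference between the most generous and the harshest score
    given to the same paper: a simple index of rater disagreement."""
    return max(marks) - min(marks)

# Hypothetical scores assigned by ten instructors to one paper.
# Only the extremes (20 and 68) echo the example in the text;
# the remaining values are invented for illustration.
paper_4 = [55, 40, 62, 20, 48, 58, 35, 68, 50, 44]

spread = score_spread(paper_4)  # 68 - 20 = 48 points
```

A spread of almost fifty points on a hundred-point scale, for the very same paper, is exactly the kind of unreliability that motivated the evaluators' turn to objective testing.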
1. New types of tasks (such as the task in which the examinee answers by choosing one option from among several possible options) require a written response, which limits linguistic assessment to reading and listening comprehension activities. Agard and Dunkel (1948), for example, stated that the only tests available were written tests of vocabulary, reading and grammar, and that none of these tests evaluated oral production and comprehension skills (cit. in Spolsky, 1978, p. vi; Fulcher, 1999, p. 391).
2. A test developed exclusively by evaluators does not take into account new concepts, procedures and discoveries in language teaching and learning.
2. Experts with training in educational evaluation and linguistics. Already in the 1950s
there were voices that recommended the combination of knowledge from
educational evaluation with linguistic knowledge for the construction of linguistic
tests. Robert Lado (1950), for example, applied this combination of knowledge to
the design of English achievement tests for Latin American students and concluded
the following in his doctoral thesis:
Several conclusions are obtained. These conclusions are (1) that there is a great delay
in the measurement of English as a foreign language, (2) that the delay is related to
unscientific conceptions of the language, (3) that the science of language should be
used in the definition of what to teach... The study provides procedures for the
application of linguistics to the development of foreign language tests (Lado, 1950,
cit. in Carroll, 1953, p. 195).
For Carroll (1953, p. 195), the delay existed, in reality, in “the entire measurement of foreign languages.” Throughout the 1950s and 1960s Lado refined his concepts of linguistic assessment and in 1961 published Language Testing, a book aimed at “teachers of foreign languages and English as a foreign language,” which is based on the assumption that “linguistic knowledge” is a “main contribution” to linguistic evaluation; that is, for Lado (1961, p. vii) linguistic tests had to take into account “the development of modern linguistics during the last thirty-five years”.
According to Spolsky (1978, p. vii), during the 1950s and 1960s the structuralist conception of language, psychological theories and the practical needs of evaluators were combined. On the one hand, the designers of linguistic tests needed extensive lists of items from which to select those to be included in objective tests, while, on the other, the structuralist linguists were describing language as a system composed of elements that combine with each other. In American structural linguistics of the 1950s, a series of hierarchical levels were postulated in the study of language, each composed of a series of units from whose combination the units of the higher level emerged. Lado (1961, p. 25), for example, stated that “language is constructed from sounds, intonation, stress, morphemes, words and combinations of words.” Through this combination of the structural vision of language and objective educational evaluation procedures, the path was clear towards the construction of an objective test with multiple-choice questions based on structural linguistics. The linguistic elements can be evaluated, according to Lado (1961, p. 204), in isolation or in combination in an “integrated skill”, such as listening comprehension (listening), reading comprehension (reading), oral production (speaking), writing (writing) or translation (translation). Below I present two items that appear in Lado (1961), which evaluate isolated elements and combined elements:
The sky highway above the top of the world has become the touchstone of the history of
intercontinental travel, ushering in a new age in commercial aviation (Map of Scandinavian Airlines
Routes)
3.3 The integrative-sociolinguistic trend

For Carroll, the discrete-point approach tests

...very specific items of linguistic knowledge and skill that have been sensibly selected from the generally enormous pool of possible items... It is the type of approach that is necessary and recommended... where knowledge of structure and lexicon, auditory discrimination and the oral production of sounds, and the reading and writing of symbols and individual words (Carroll, 1961[1965], p. 369).[2]
1. The items or tasks that constitute a test designed according to the integrative-sociolinguistic trend are selected from a set that is broader than the set from which the items or tasks of a psychometric-structuralist test are selected. According to Carroll, this is an advantage, since it facilitates the construction of a test that is independent of the curricula that the examinees who are going to take the test have followed.
2. It seems easier to relate the tasks of an integrative-sociolinguistic test to different levels of competence.
[2] Oller (1979, p. 37) defined a discrete-point test as a test “that attempts to concentrate attention on one point of grammar at a time”:

Each test item targets a single element of a given component of a grammar (or perhaps we should say a postulated grammar), such as phonology, syntax, or vocabulary. Furthermore, a discrete-point test is intended to assess only one skill at a time (e.g., listening comprehension, or oral production, or reading, or writing) and only one aspect of a skill (e.g., productive rather than receptive, or oral rather than visual). Within each skill, aspect, and component, discrete items supposedly target exactly one and only one phoneme, morpheme, lexical item, grammatical rule, or whatever the corresponding element is (Oller, 1979, p. 37).
Later, Canale and Swain (1980, pp. 28-31) and Canale (1983, pp. 338-342) developed their concept of communicative competence, which has been very influential in linguistic evaluation.
Other authors have divided the evolution of linguistic evaluation in a slightly different way than Spolsky (1978). James Dean Brown (2005, pp. 19-24), for example, distinguishes four movements in linguistic evaluation, which coexist today: (i) the pre-scientific movement, (ii) the psychometric-structuralist movement, (iii) the integrative-sociolinguistic movement, and (iv) the communicative movement, while Elana Shohamy (1997, p. 141) distinguishes three periods in the history of linguistic evaluation: the period of discrete points, the integrative period and the communicative period.
3.4 The communicative trend

The communicative trend, which began in the United Kingdom and later spread to the United States, is based on three principles.
4 TECHNOLOGICAL ADVANCES IN LINGUISTIC EVALUATION

With the increasing availability and power of microcomputers at a relatively low price, it is not surprising that the use of computer programs to assess the linguistic competence of individuals has become widespread. Some of you may even have already taken, for example, the DIALANG tests (www.dialang.org).
Using a computer to present the items of a linguistic test can have several advantages. For example, instead of having to take the test on an officially scheduled date, examinees can request to take it at a time that best suits their needs. Additionally, instead of having to wait several weeks to receive test results, scores can be obtained immediately. Pearson Driving Assessment (2007) cites the following advantages of computer-based assessment:
The ability to perform testing when the candidate requests it and when it is convenient for the
candidate.
The possibility of creating questions that can be stored in “question banks” and presenting these
questions randomly, reducing “serial” evaluation, that is, the need to evaluate all candidates on
the same day at the same time.
The disappearance of complex logistical problems, such as the distribution, storage and tracking
of exam forms.
Tests can be performed without an Internet connection, thus minimizing the risk of system
failures.
Reduction of effort and time when correcting and reporting results.
Instant results and immediate diagnostic feedback, indicating the candidate's strengths and areas
for improvement.
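The “question bank” idea in the list above can be sketched in a few lines. Everything here is hypothetical (the bank contents, the blueprint of five questions per area): the point is only that every candidate receives the same blueprint but a different random selection of questions, which is what reduces the need for “serial” evaluation.

```python
import random

# A hypothetical question bank, grouped by content area.
bank = {
    "vocabulary": [f"vocab_q{i}" for i in range(1, 31)],
    "grammar": [f"grammar_q{i}" for i in range(1, 31)],
    "reading": [f"reading_q{i}" for i in range(1, 31)],
}

def assemble_form(bank, per_area=5, seed=None):
    """Draw a random test form: the same blueprint (content areas and
    question counts) for every candidate, but a different concrete
    selection of questions."""
    rng = random.Random(seed)
    form = []
    for area in sorted(bank):
        # sample() draws without replacement, so no question repeats.
        form.extend(rng.sample(bank[area], per_area))
    return form

form_a = assemble_form(bank, seed=1)
form_b = assemble_form(bank, seed=2)
```

Operational systems add item metadata (difficulty, skill tested, exposure control) on top of this basic scheme, but the principle is the same.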
Although these advantages are important, the most significant changes have occurred because the computer can easily do things that are difficult in a pencil-and-paper test. The technology allows, for example, the introduction of video recordings, or posing problems that require students to use the Internet, which adds all the advantages that these technologies can provide during the teaching and evaluation processes.
The most widespread change in linguistic assessment has been the use of the computer to administer adaptive tests, that is, tests in which the choice of the next item is based on the examinee's previous responses, such as the DIALANG tests. Adaptive testing can increase the quality of the information available and, therefore, of the decisions made based on that information. An adaptive test typically begins with the presentation of an item believed to be of medium difficulty for the examinee. The second and subsequent items are determined by the examinee's previous responses. In general, if a test taker answers an item correctly, the program next selects a slightly more difficult item. Conversely, a slightly easier item is presented after an incorrect answer. The test ends when the estimate of the test taker's performance reaches a predetermined level of accuracy or when a specified number of items have been presented. It has been shown that adaptive assessment can increase the efficiency and accuracy of measures of certain types of concepts, skills, and abilities. In some cases, adaptive tests can achieve the same level of reliability as a conventional pencil-and-paper test, but in half the time.
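The loop described in this paragraph can be sketched as follows. This is a deliberately simplified illustration, not the DIALANG algorithm: real adaptive tests estimate ability with item response theory, whereas here the difficulty simply moves one step up or down after each response, and the only stopping rule is a fixed number of items.

```python
import random

# Hypothetical item bank: each difficulty level (1 = easiest,
# 9 = hardest) holds a pool of interchangeable items.
ITEM_BANK = {d: [f"item_{d}_{i}" for i in range(20)] for d in range(1, 10)}

def run_adaptive_test(answer_fn, start_difficulty=5, max_items=10):
    """Present items adaptively: a harder item after a correct answer,
    an easier one after an incorrect answer. Stops after max_items
    (a real test would also stop once the ability estimate is
    precise enough)."""
    difficulty = start_difficulty
    history = []
    for _ in range(max_items):
        item = random.choice(ITEM_BANK[difficulty])
        correct = answer_fn(item, difficulty)
        history.append((item, difficulty, correct))
        if correct:
            difficulty = min(difficulty + 1, max(ITEM_BANK))
        else:
            difficulty = max(difficulty - 1, min(ITEM_BANK))
    return history, difficulty

# Simulated examinee who answers correctly whenever the item's
# difficulty does not exceed a (hypothetical) ability of 7: the
# test quickly settles around difficulty levels 7-8.
history, final_level = run_adaptive_test(lambda item, d: d <= 7)
```

Starting at medium difficulty, the simulated test climbs while the examinee answers correctly and then oscillates around the examinee's ability level, which is exactly the behavior the paragraph describes.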
However, you will not understand the full potential of using computers during the assessment process if you only consider computers as tools to present items more easily: the computer can measure competencies that are not adequately measured in conventional pencil-and-paper tests. Video recordings allow the presentation of problems that are more realistic than those normally posed in paper-and-pencil tests. The simulation of problems presented through a computer has several advantages over pencil-and-paper tests in teaching Spanish as a second language: the simulation can force the examinee to concentrate their attention on the use of information to solve a problem, and can help evaluate not only the student's product but also the process the student uses to carry out the activity, including the way in which the activity is approached, the quality of the solution, and the number of hints that may be necessary to solve the activity.
5 BIBLIOGRAPHIC REFERENCES
CARROLL, John B. “Fundamental considerations in testing for English language proficiency of foreign students”. In: Testing the English proficiency of foreign students. Washington, DC: Center for Applied Linguistics, 1961, pp. 30-40. Reprinted in: ALLEN, Harold B. (ed.). Teaching English as a second language: A book of readings. New York: McGraw-Hill, 1965, pp. 364-372.
DAVIES, Alan; BROWN, Annie; ELDER, Cathie; HILL, Kathryn; LUMLEY, Tom; McNAMARA, Tim F. Dictionary of language testing. Cambridge: Cambridge University Press, 1999.
ELASHOFF, Janet D.; SNOW, Richard E. Pygmalion reconsidered; a case study in
statistical inference: reconsideration of the Rosenthal-Jacobson data on teacher
expectancy. Worthington, Ohio: Charles A. Jones, 1971.
SPAIN. Organic Law 2/2006, of May 3, on Education. Official State Gazette , May 4,
2006, no. 106, pp. 17158-17207.
FREDERIKSEN, Norman. “The real test bias: Influences of testing on teaching and
learning.” American Psychologist . 1984, vol. 39, no. 3, pp. 193-202.
FULCHER, Glenn. “Book Review: A history of foreign language testing in the United
States: from its beginnings to the present.” Language Testing . 1999, vol. 16, no. 3,
pp. 389-398.
HOFFMAN, Banesh. The tyranny of testing. New York: Crowell-Collier, 1962.
HYMES, D. H. “On communicative competence”. In: PRIDE, J. B.; HOLMES, Janet (eds.). Sociolinguistics: selected readings. Harmondsworth: Penguin, 1972, pp. 269-293.
INGRAM, Elisabeth. “Attainment and diagnostic testing”. In: DAVIES, Alan (ed.). Language testing symposium: a psycholinguistic approach. London: Oxford University Press, 1968, pp. 70-97.
LADO, Robert. Measurement in English as a foreign language with special reference to Spanish-speaking adults. Doctoral thesis. Ann Arbor, Michigan: University of Michigan, 1950.
LINN, Robert L.; GRONLUND, Norman E. Measurement and assessment in teaching .
Saddle River, NJ: Prentice-Hall, 2000.
OLLER, John W. Language tests at school. London: Longman, 1979.
ORGANIZATION FOR ECONOMIC CO-OPERATION AND DEVELOPMENT. OECD Program for International Student Assessment (PISA): PISA in Spanish [online]. Paris: Organization for Economic Co-operation and Development, n.d. [accessed January 14, 2007]. Available on the World Wide Web: <http://www.pisa.oecd.org/document/25/0,3343,en_32252351_32235731_39733465_1_1_1_1,00.html>.
PEARSON DRIVING ASSESSMENT. Computer-based testing: benefits. Pearson VUE [online]. London: Pearson VUE, 2007 [accessed October 27, 2007]. Available on the World Wide Web: <http://www.pearsonvue.co.uk/home/cbt/benefits/>.
PILLINER, Albert EG “Subjective and objective testing”. In: DAVIES, Alan (ed.).
Language testing symposium: a psycholinguistic approach . London: Oxford
University Press, 1968, pp. 19-35.
ROSENTHAL, Robert; JACOBSEN, Lenore. Pygmalion in the classroom: teacher expectation and pupils' intellectual development. New York: Holt, Rinehart and Winston, 1968.
SHOHAMY, Elana. “Second language assessment”. In: TUCKER, G. Richard;
CORSON, David (eds.). Encyclopedia of language and education, vol. 4: second
language education . Dordrecht: Kluwer, 1997, pp. 141-149.