Downloaded from https://academic.oup.com/applij/article-abstract/14/3/225/184491 by Michigan State University user on 16 October 2019

Assessment Strategies for Second Language Acquisition Theories1
MICHAEL H. LONG
University of Hawaii at Manoa

There are numerous theories of second language acquisition (SLA), many of
them oppositional. Whether or not this is inevitable now, culling will eventually
be necessary if researchers are to meet their social responsibilities or if SLA is to
be explained and a stage of normal science achieved. For the culling to be
principled, a rational approach to theory assessment is needed, and the
difficulty of identifying universally valid evaluation criteria makes this
problematic. Assessment strategies used in other fields can be useful in SLA,
but choice among them will depend on the researcher's (implicit or explicit)
philosophy of science.

0. INTRODUCTION
There are numerous theories of SLA. They differ in form, type, source, and
scope, and as is shown in part 1 of this paper, many are oppositional. SLA
researchers' social responsibilities and the increased likelihood of SLA's
success as a discipline are argued in part 2 to justify some theory culling. In
part 3, minimum requirements on the form and content of SLA theories are
discussed. From a rationalist position, part 4 describes theory assessment
strategies utilized in other disciplines: namely, 16 assessment strategies
proposed by Darden (1991), and 17 criteria from the general philosophy of
science literature, at least four of which seem valuable for SLA theories. Part 5
addresses the problematic issue of whether it is possible to identify universally
valid criteria for theory evaluation, either of single theories in isolation or of two
or more theories comparatively.

1. SLA THEORY PROLIFERATION


By a recent count, there are between 40 and 60 theories of SLA. Some might
prefer 'theories' (in inverted commas), since the list includes theories, hypotheses,
models, metaphors, frameworks, perspectives, theoretical claims,
theoretical models, theoretical frameworks, and theoretical perspectives. The
terms are used in free variation in much of the literature, but as Crookes (1992)
shows, a fuzzy notion of theory is by no means unique to applied linguistics.
With the declining influence of logical positivism and the axiomatic theory form
it espoused, a more tolerant attitude towards what will count as a theory has
permeated most sciences over the past thirty years (Giere 1985).

Applied Linguistics, Vol. 14, No. 3 © Oxford University Press 1993

SLA theories are as diverse as they are numerous.2 They differ in form, with
causal-process (Gardner 1985) and set-of-laws (Spolsky 1989) prevalent, and
are of three basic types: nativist, both specific (White 1989) and general
(Wolfe-Quintero 1992), environmentalist (Schumann 1986), and interactionist
(Pienemann and Johnston 1987). They differ in source, drawing upon work in
linguistics (Cook 1988), pidgin and creole studies (Schumann 1978),
sociolinguistics (Tarone 1983), psychology (McLaughlin, Rossman, and McLeod
1983), psycholinguistics (Clahsen 1987), neurolinguistics (Lamendella 1977),
cognitive science (Gasser 1990), social psychology (Giles and Byrne 1982), and
combinations thereof (Hatch, Flashner, and Hunt 1986). They also differ in
scope, or the range of data they attempt to explain. Some address naturalistic
SLA only (Schumann 1978), some instructed only (Ellis 1990), some both
(Krashen 1985); some children (Wong-Fillmore 1991), some adults (Bley-
Vroman 1989); some a specific cognitive capacity, such as metalinguistic
awareness (Bialystok 1991); some a specific psychological process, such as
transfer (Eckman 1985), restructuring (McLaughlin 1990), or implicit learning
(Hulstijn 1989); some a specific linguistic system, such as phonology (Major
1987) or the lexicon (Hudson 1989); some a specific sub-system, such as word
order (Meisel, Clahsen, and Pienemann 1981), speech act behavior (Wolfson
1988), or interrogative structures (Eckman, Moravcsik, and Wirth 1989).
On the face of it, the multiplicity and heterogeneity of theories may be well
motivated. SLA is a broad and expanding field, encompassing at a minimum the
simultaneous and sequential acquisition and loss of second (third, fourth, etc.)
languages and dialects by children and adults, learning naturalistically or with
the aid of instruction, as individuals or in groups, in second or foreign language
settings. Such an extensive, apparently heterogeneous range of cases and
contexts might need to be broken down into sub-groups, and a theory
developed for each. None of the 40-60 theories has much to say about all of
them, and it could be argued that at least some theories which do operate in one
or more of the same domains differ in other ways which make them potentially
complementary rather than oppositional. In Gregg's terms (this issue), some of
them, at least, may be theories in, not of, SLA. At least three kinds of evidence
show, however, that many current SLA theories are oppositional, not complementary,
i.e. are so different in their underlying assumptions or in the claims
they make that they could not logically both/all be correct.
To begin with, the mere fact that rival claims have not been made in the same
domain may itself be problematic. As Beretta (1991: 497) points out:
Perhaps no-one sees SLA theories as direct rivals. The sense in which opposition exists
is in the assertions made about what an SLA theory must consist of. That is, it is a
question of domain.

The very choice of different domains, he notes, sometimes reveals fundamental
incompatibility. As illustrated by a recent assault on variationist models (Gregg
1990), UG-motivated SLA research is concerned only with linguistic competence
as its domain of inquiry,3 and discounts performance models (for
example, Tarone 1988; Ellis 1989), since their domain is irrelevant from a UG
perspective and because they assume a variable competence, something UG
accounts exclude from the realm of possibility.4
Secondly, both the specific variables and types of variables considered
important by different theorists, as evidenced by the content of their claims,
suggest that some fundamental conflicts exist. If they have not been explicitly
articulated, it may simply be because the domains in which the theories have
been applied thus far have been different, and not because those theories are
truly complementary. Obvious examples, cited earlier, are the simultaneous
allegiance of different groups of researchers to nativist, environmentalist, and
interactionist theories, or to theories whose (explicit) content involves exclusively
cognitive, affective, or linguistic factors.
Thirdly, numerous clear cases of opposition within a domain exist, some of
which have been recognized in the literature, others not. For example, Krashen's
Input Hypothesis (1980 and elsewhere) claimed that learners can only acquire
new structures that are 'one step ahead' of their current stage of development, or
at 'i + 1'. Pienemann (1984 and elsewhere, cf. 1992) made essentially the same
claim in his Learnability and Teachability Hypotheses, but, unlike Krashen, could
derive precise predictions as to which structures will and will not be learnable by
learners at different stages of development, since stages of development in the
Multidimensional Model (Meisel, Clahsen, and Pienemann 1981) were defined
in terms of specified processing constraints. Additional examples include:
Krashen's dichotomous model and several continuum models of interlanguage
variation (see Tarone 1983, for discussion); explicitly conflicting views about
access to principles of Universal Grammar in adult SLA (for example, Bley-
Vroman 1989, Schwartz 1990); sociolinguistically-based and cognitivist
accounts of interlanguage variation (for example, Sato 1984, Tarone 1988, 1989,
Crookes 1989, Hulstijn 1989, Selinker and Douglas 1989); and linguistic and
cognitivist explanations for observed developmental sequences, for example, the
data on Swedish SL negation (Hyltenstam 1977, 1982; Jordens 1980, 1982).
In sum, (1) conflicting domain choice, (2) mutually exclusive theory type and
content, and (3) explicitly competing claims within the same domain, are all
indications of oppositionality. While some theories may turn out to be
complementary, this is certainly not the rule. There are multiple theories of
SLA, not just in SLA.

2. A RATIONALE FOR CULLING


Opinions among SLA researchers vary as to whether multiple theories are a
problem or something to be valued. An extreme relativist position was
advocated by Schumann (1983), who suggested that SLA theories might be
evaluated aesthetically, if at all, like exhibits in an art gallery—neither true nor
false, but more or less pleasing. The relativist position is that, in principle, at
least, any theory is as good as any other because there is no one reality waiting to
be discovered, but multiple realities, each a construction of the individual, in this
case, the individual theorist.5
Others (for example, Selinker and Lamendella 1978, Ellis 1985, McLaughlin
1987) have applauded the diversity on the grounds that none of the available
theories can handle all the known facts; each, they suggest, may have limitations
or weaknesses, but each also offers some unique insight. The danger with this
attitude is that it can lead to the kind of eclecticism in SLA that has long afflicted
the literature on language teaching pedagogy. For example, Ellis (1990) reviews
seven competing 'theoretical positions' regarding classroom language learning
(that he labels the Frequency, Input, Interaction, Output, Discourse, Collaborative
Discourse, and Topicalization Hypotheses), all of which he finds inadequate
and some of which he considers fundamentally flawed, and then proceeds to
outline a theory of instructed SLA 'which is consistent with the theoretical
positions and research discussed in previous chapters' (Ellis 1990: 174).
Not everyone, however, is content with the status quo. In a ground-breaking
examination of the multiple theories issue in SLA, Beretta (1991) draws upon
work in the history and philosophy of science and takes the position that the co-
existence of complementary theories, i.e. theories operating in different
domains and each providing answers to different parts of the SLA puzzle, need
not be a problem provided the complementarity is theoretically coherent.
Conversely, if theories are oppositional, i.e. offer theoretically incompatible,
mutually exclusive explanations of the same facts or of different facts, rational
strategies must be found to choose among them. 'The persistence of multiple
SLA theories (without principled complementarity),' he concludes, 'would be
inimical to progress in theory construction' (Beretta 1991: 495).
Now it would be simplistic to represent (and difficult to explain) scientific
progress as a succession of categorical acceptances and rejections of scientific
theories. There is, as Laudan (1981: 141) points out,
a broad spectrum of cognitive stances which scientists take towards theories, including
accepting, rejecting, pursuing, entertaining, etc.
This is not to say that outright rejection (or falsification which sometimes
precedes rejection) never occurs, however. It clearly does, and indeed has
happened in SLA. In addition to falsification, however, there are at least two
major justifications for culling oppositional theories, one concerning a major
purpose (for some) of SLA work, the other deriving from the history of science.
With regard to the former, even were a relativist position intellectually
sustainable, a major purpose of studying SLA for some researchers (although by
no means all—see Beretta and Crookes, this issue) is to help solve pressing
social problems, for which a theory will be needed until empirically tested
solutions become available. To illustrate, given the scarcity of rigorous program
evaluations, decisions concerning the appropriacy of different kinds of second
language teaching or second language medium educational programs for
learners of different ages and different L1 and L2 proficiencies currently have
to be based in large part on SLA theory, such as J. Cummins' Linguistic
Interdependence Hypothesis (for example, J. Cummins 1991) and some version of
an Age, Sensitive Periods, or Maturational State Hypothesis (for example,
Krashen 1982, Long 1990a, Newport 1990), as well, of course, as educational
theory and a range of other complex social, cultural, economic, and ideological
grounds. These are, of course, decisions which affect the educational and
occupational life chances of millions of people in many countries, including
refugees, migrant workers, international students, their spouses, children, and
parents, and residents of multilingual societies. The field has an obligation to act
as quickly as possible to reach the point where we can respond to practitioners'
questions on such matters in some other way than by informing them of the
existence of numerous different points of view (SLA theories) of equal merit. If
we really think they all have equal merit, we are effectively claiming not to know
much about SLA. This would be a sad state of affairs for us and for the field as a
field, and is in my opinion untrue. If that is not our belief, we should make it clear
for the benefit of insiders and outsiders; and one way of doing this is by a
(rational) process of theory selection and unification, a product in part of
identifying agreed-upon areas of the empirical base, i.e. accepted findings (Long
1990b), and application of a coherent set of theory assessment strategies.
A second, independent reason for culling theories is that the history of
science shows that successful sciences, defined as those which have provided a
rich source of answers to important theoretical and applied questions, are those
which, typically after periods of theoretical ferment, have settled down—not
permanently, of course, and certainly not as a result of the (arbitrary) imposition
of a paradigm—to produce steady productive research under the constructive
guidance of a dominant theory. This position is clearly easier to achieve the
more is (thought to be) known in a field, since there then tends to be less
disagreement among scientists (Fleck 1979, cited in Beretta 1991).
Lest this sound like the arrival of Hitler's Neue Welt Ordnung in SLA, it is
important to note that the existence of a dominant theory does not mean the
eradication of all others, as shown, for example, by the flourishing of several
schools of linguistics (Generalized Phrase Structure Grammar, Lexical-
Functional Grammar, Systemics, etc.) throughout the period of dominance of
Chomsky's theory (which has itself undergone major changes). In fact, it has
been argued that the sociological history of intellectual change shows that
certain structural qualities of the social networks which make up intellectual
communities will always function to preserve more than one theory. On the
basis of studies of the histories of psychology, mathematics, and philosophy
(Ben-David and Collins 1966, Collins and Restivo 1983, Collins 1989), Collins
(1989) claims that in philosophy, at least, there will typically be between three
and six:
There is a lower limit because intellectual creativity is a conflict process. If there is any
creativity at all, there must be an organizational basis in the intellectual world for at
least two positions. If there is freedom to have more than one position, a third always
seems available (a plague on both houses is always a viable intellectual strategy); so, in
fact, there are usually at least three. (Collins 1989: 124)
Collins claims the upper limit derives from the 'structure of conflict', too, with
proliferation of rival positions ultimately being constrained by the level at which
groups can still be relatively large and visible. As a further source of comfort,
Collins (ibid.: 127) observes that more strongly supported positions (in
philosophy) tend to split internally, and weaker ones to consolidate.
The existence of a dominant theory (or paradigm) is necessary if a new field is
ever to attain that state of grace known as 'normal science' (Kuhn 1970, 1977;
Newton-Smith 1982). Normal science takes place when the domain(s) of
interest, relevant questions, and methods of going about answering them have
been specified by a theory to the general satisfaction of researchers in a field
(hence, the theory's acceptance by most of them), such that they can get on
productively with their work, at peace with themselves and each other over
basic issues in the philosophy of science. Research becomes cumulative, details
can be attended to, and applications of theory can be harvested. The work is
done efficiently and economically because the theory tells researchers the
relevant data to collect. It is theory-governed, organized, cooperative effort.
Views will no doubt differ, and rightly so, as to whether or not SLA is ready
for theory convergence just yet. As Beretta (1991) points out, however,
relativists argue that no field could ever be ready, and are against it in principle
either as impossible or as undesirable on cognitive grounds, which Beretta
considers a self-defeating attitude. In fact, it is not clear to me why relativists
would bother to do research at all.
Meanwhile, although some minimal ground rules will be necessary, normal
science and a dominant theory do not have to come about, as some might fear,
by everyone adopting the positivist orthodoxy of 'the scientific method'.
Instead, Laudan (1977), Chalmers (1990), and others suggest that agreement
on the aims (not method) of science can serve as a unifying principle—
specifically, agreement that the purpose of science is to develop, transform,
improve, and extend knowledge about the world. Skeptics, such as Feyerabend
(1981), notwithstanding, it is not necessary to adopt the tenets of the Popperian
scientific method to attain such goals, but neither can they be reached by the
relativist approach of Feyerabend (1975), Barnes and Bloor (1982), and others.
At a minimum, their pursuit requires that candidate laws and explanations be
tested against observational data in a demanding manner, which often involves
experimentation, in order to test their superiority over rivals, with the successful
prediction of novel phenomena being especially significant (Chalmers 1990:
39). Any changes in methods, standards, theories, or paradigms can be assessed,
and either accepted or rejected, not with reference to a dogmatically inherited
set of rules and procedures, but according to whether they are better at (both
conceptual and empirical) problem-solving (Laudan 1977, 1981), and at
producing new knowledge. Put simply, do they work?
The proposal to unify around agreed-upon aims, not the method, was
originally made for the physical sciences, but it is viable for SLA and the social
sciences, too. This is not to equate natural and human sciences. Nor is it to
pretend that prediction of natural phenomena is the only goal of either. It is
certainly not the only goal, or even a goal, for many social scientists. Numerous
problems in SLA research—the role of aptitude, age, and class differences in
acquisition, proficiency measurement, and the effects of instruction on
interlanguage development, for example—are concerned with predictable
regularities in (individual and group) learning processes and outcomes, however,
and Laudan's and Chalmers' views are very useful for people working on those
issues.
An obvious relativist objection to this modest rationalist proposal is to say
that, while circumventing the problems of logical positivism, it begs the question
in that it still assumes superior status for some kind of scientific practice and
method, and that even that assumption is culturally determined. Feyerabend
(1975, 1987), for example, argues that scientific beliefs are different from, not
superior to, those of voodoo, magic, or religion, which are held no less certainly
by their believers. Many cultures and sub-cultures value poison oracles, faith-
healing, horoscopes, or prayer just as highly and believe in them with equal
certainty, and many societies had developed a proven knowledge base for hunting,
fishing, agriculture, navigation, and other matters before big science was a
glimmer in Popper's eye. Feyerabend's (1975: 28) famous conclusion, of
course, was:
There is only one principle that can be defended under all circumstances and in all
stages of human development. It is the principle: anything goes.
Laudan (1990) provides the crucial counter to this. He accepts the success of
many pre-scientific or non-scientific practices and recognizes people's faith in
them, but goes on to suggest that the issue is a comparative one:
Are the methods of science generally more successful at producing what we expect out
of a system of knowledge than are the methods of belief-fixation practiced by
nonscientific cultures? (Laudan 1990: 117)
Non-scientific methods (consulting an oracle, prayer, etc.), he points out (ibid.:
118-19), are generally used for questions that are not independently verifiable—
for example, to find out if Mr Jones is possessed by evil spirits or if Ms Hodgkins
should marry. Non-scientific methods may or may not be successful at guiding
future behavior in the eyes of their users or in fact, but they are demonstrably not
as successful as scientific practice when used to make predictions about natural
events that are independently verifiable. It is the greater success of scientific
predictions (and, one might add, of scientific explanations) when tested against
the real world that confirms the superiority of science, at least for certain ends. If
every view of truth is equally valid, relativists have to explain the increasing
success of theories in physics and elsewhere in predicting natural phenomena
(for discussion, see Newton-Smith 1982).6

3. UNIVERSAL REQUIREMENTS ON SLA THEORIES


Against this background, there are some minimum requirements on the form
and content of SLA theories that can be defended (for insightful discussion, see
R. Cummins 1983, Harre 1985, McShane 1987, Crookes 1992). Whatever
form it takes, a theory of SLA must by definition be a theory of change, or
development. Therefore, static property theories, such as theories of grammar,
which describe extant systems of native-speaker competence never attained by
most learners, alone will be inadequate, although principles held to govern all
possible human languages (including interlanguages) will be relevant at any
developmental stage. What are also needed are one or more transition theories,
which attempt to explain changes in state of systems or their ontogenesis.
Transition theories, in other words, are appropriate when the explanandum is
an event or sequence of events, such as progression through a developmental
sequence, rather than a capacity. If progress through a sequence of developmental
stages is claimed to be driven by an existing or developing capacity of
some kind, then the development, structure, and use of the driving mechanism
itself—for example, an innate language acquisition device—will require a theory
of its own if the whole is not to degenerate into a mystical 'black box' account of
language learning. In sum, the minimum content for an SLA theory must include:
(1) a model of starting (L1/L2) communicative ability; (2) a model of target
communicative potential; (3) some descriptive statements—for example,
generalizations about observed regularities in stages of development towards
end-state knowledge, reflected (say) in developmental sequences; and (4) one or
more models or mechanisms—for example, triggering, parameter-resetting,
generalization, compilation, or restructuring (for discussion and further
examples, see Atkinson 1987, Long 1990b, Schmidt 1992)—to explain how to
get from one stage to the next. Where models are concerned, Harre (1970,
reported in Crookes 1992) suggests that partial analogies with some other
known process are the most conceptually productive. Many SLA 'theories' lack
the fourth component, models or mechanisms. Therefore, while useful steps
along the path to theory construction, they can only be providing the illusion of
true explanation of the SLA process.7
For such a theory to become dominant requires a high level of agreement
among researchers in the field, although by no means complete agreement, as
noted earlier with reference to the history of linguistics from 1957 to the 1990s,
and Collins's observations on the history of philosophy. Two important requirements
for establishing that agreement are: (1) the identification of accepted
research findings which have a solid empirical base; and (2) accountability to
data, including the accepted findings. Both operations assume that science is or
can be a rational endeavor—for the findings must have been obtained rationally,
in as disengaged a manner as possible, if they are to be taken seriously and
theories held accountable to them. Relativists, such as Barnes and Bloor (1982),
and some sociologists of science dispute this rational quality on the basis not
only of the kinds of arguments outlined above, but also because scientists are
fallible human beings acting within a social system (science) and subject to its
pressures.
It is in fact necessary to recognize science's dark side. The system often breaks
down. Most notoriously, it is susceptible to tricksters (see Broad and Wade
1982, for some embarrassing examples). Less obviously, but more seriously, the
influence of societal, political, and professional pressures on consensus
formation among scientists has been widely documented (for example,
Newmeyer and Emonds 1971, Latour and Woolgar 1979, Barnes and Bloor
1982, Bloor 1984, Shapere 1986, Hull 1988, Diesing 1991, Martin 1991,
Mulkay 1991). Gould (1981), for instance, recounts grisly episodes in the
history of the mismeasurement of intelligence. The power of military funding to
determine what research gets done is an obvious current example in US
universities (Dickson 1984, Erlich 1985). Few applied linguists would deny the
importance, documented in other fields (see, for example, Hull 1988), of such
academic and professional gatekeepers as journal editors, manuscript reviewers,
officers in professional organizations, conference organizers, and commercial
publishers in deciding which ideas become approved in the field, and in
some cases, which research findings enter the refereed literature at all. Finally,
there are cases (described, for example, by Pickering 1981, Shapin 1982,
Shapere 1986) where experimental work has been compromised in these ways.
In sum, scientific facts, it must be recognized, are vulnerable to external forces.
Scientists are not necessarily acting rationally just because they are doing
science.
The rationalist counter to challenges of this kind lies essentially in the
conventionalized methodological traditions which function, however imperfectly,
to disengage scientists from their findings. It is only scientists, Newton-Smith
(1981, 1982), Campbell (1988), Laudan (1990), and others, point out, who
recognize and attempt to deal with the threat to the validity of their work posed
by flawed procedures, faulty measurement, and a lack of objectivity. Unlike
witch doctors, faith healers, and priests, scientists accept minimum criteria for
supporting a claim, including the need for evidence, for the evidence to have
been gathered following certain procedures, and for both the evidence and the
procedures to be available for inspection by other scientists. The experiment is
the highest form of this approach, and third-party replication of experiments is
one of the most crucial safety measures for protecting the integrity of a field.
While by no means perfect, scientific procedures are the best we have in a search
for disengagement.8
The defense of experimentation and scientific methodology is not absolute,
however. There is no attempt here to deny the theory-dependence of observation
(the bias a theory may exert on how researchers interpret what they think
they see). Rather, it is argued (for example, Hacking 1983, Chalmers 1990: 71)
that while research and its findings are inevitably reported in theory-dependent
language, the physical results of experiments themselves depend on the way the
world is, not the way the theory that motivated them is, nor the strength of the
researcher's belief in that theory. Although by no means infallible, the failure to
replicate experimental findings in other fields often does serve as a restraint on
the growth of incorrect claims (witness the recent international controversy
surrounding cold fusion), as, it should not be forgotten, does the much more
frequent, if understandably less publicized (or even published) failure of initial
experiments to produce predicted results (Campbell 1988).9 While the methodology
of experimentation never permits certainty that a claim is correct (one is
always operating on the basis of a stated degree of probability that the results are
not due to chance), the increasing accuracy of predictions suggests when the
theorist is getting closer to the truth. As Beretta (personal communication)
points out:
The multiple-truths, anything goes, rationality-is-an-internal-standard-which-is-no-
better-than-other-belief-systems-approach has to be able to account for that success.
The onus is on the relativists.

Much SLA research is not experimental, not only because of its relatively short
history, but because some of the key variables involved—age of onset, aptitude,
memory, native language background, etc.—are not susceptible to random
assignment (although they are to structural equation modeling) and so are
typically examined in quasi-experimental criterion group designs, and because
some questions in social science are simply not amenable to or best studied via
experimentation. Moreover, Lightbown (1984) and Santos (1989) have drawn
attention to the scarcity of replication studies in SLA and applied linguistics.
Nevertheless, examples exist of the accountability of SLA theory to conflicting
evidence—including cases of falsification—especially accountability to experimental
results, often with theorists' explicit recognition that such accountability
is necessary.
First, there is the consistent failure (described in Long and Sato 1984) of
error analysis studies to support early claims about L1 transfer. This was a
major factor forcing the abandonment of the strong form of the Contrastive
Analysis Hypothesis in the 1970s (see, for example, Wardhaugh 1970, Janicki
1990). Another example is the way in which in the mid-1970s the results of
quasi-experimental studies of the 'natural order' for ESL morphology led
Krashen to recognize two additional constraints (knowledge of the rule and
focus on form) to the one he had initially hypothesized (time). Krashen (1979:
155-6) explicitly recognized this, and in the same paper he articulated his view
of scientific method, lending support to the idea that SLA theorists at least
attempt to adhere to the rationalist conventions described by Newton-Smith
and others:
. . . our generalizations [sic] need to be able to predict. They are not merely categories
for previously existing data, but must fit data gathered after the hypotheses were
formulated. The way we test our generalizations [sic], then, is to see whether they
predict new data. If they do, we are still in business, but if they do not, we have to change
the hypothesis, alter it. If these alterations cause major changes in the fundamental
assumptions in the original generalizations, make it too ad hoc, too cumbersome, we
may have to abandon the hypothesis. (Krashen 1979: 159)

A third example concerns a study of the components of SL proficiency by
Bachman and Palmer (1982), the results of which were recognized by Oller
(1981) as falsifying his Unitary Factor Hypothesis and prompted his explicit
withdrawal of the claim. Oller (1981:130) wrote:
In the debate among language testers concerning the strength of a general factor, one
study served to eliminate quite dramatically the possibility of a single global factor
exhausting all of the reliable variance in language tests in general. . . . The study that
settled the question concerning the extreme version of the general factor hypothesis
was conducted by Bachman . . . and Palmer. In a very carefully designed experiment,
they were able to show that a model incorporating a reading factor and a separate
speaking factor fits their test data better than a unitary trait model (i.e. the extreme
general factor hypothesis).

Having attempted to justify theory convergence in principle and a belief in the
possibility of achieving it in SLA, the question arises as to which assessment
strategies, or criteria, should be applied for the purpose. One option, following
Kuhn (1970), would be to wait for proponents of rival theories to retire or die,
but that would only work for older theories and proponents, would not achieve
consensus among younger theorists, and would not help meet SLA researchers'
social obligations. Also, one of the 'older' theories might be right. If the culling is
to be principled and swifter, rational evaluation criteria are needed. Theory
assessment strategies used in other fields can be useful in SLA. Choice among
the many available will depend on the researcher's (implicit or explicit)
philosophy of science. It will be important, however, to be aware of the dangers
inherent in importing criteria from the natural to the social sciences, since we
are dealing with people, who can affect the systems and processes SLA theories
seek to explain in ways physicists, for example, need not worry about.

4. ASSESSMENT STRATEGIES
A potentially useful approach to theory assessment is to be found in the work of
Darden (1991), based on her study of theory change in the history of genetics.
The strategies she outlines may not be appropriate for all areas of, or
approaches to, SLA and applied linguistics, but they are worth consideration by
researchers interested in explanations of learning processes and outcomes
which can be generalized to new educational and other social settings.
One of the appealing qualities of Darden's analysis is her grouping of assess-
ment strategies into five sets distinguished partly according to function. This
may eventually help demystify the wide range of evaluation criteria to which
scientists subscribe. Some of the differences, for example, may be a matter of the
phase in theory development at which the criteria are appropriately applied.
Some differences may also reflect confusion over the dual function many strate-
gies can serve, as constraints on theory construction and/or criteria for theory
evaluation, which Darden sees as interdependent processes (see also Beretta
and Crookes, this issue; Long 1990b). Finally, some differences may be the
consequence of genuine tensions among the outcomes of application of
different criteria.
Darden's first set consists of two criteria applicable before or independent of
empirical testing of a theory. Internal consistency refers to the need to avoid
internal contradictions among a theory's components. A contradiction would arise,
for example, in a theory which assumed continuing access to Universal
Grammar to account for the successful acquisition of some complex syntactic
structures by adult acquirers, and use of general problem-solving procedures to
explain the same adults' failure to acquire other structures, when type of
structure was not a distinction made in the theory (see, for example, Felix 1985).
Non-tautologousness means, obviously, that a theory should not consist of
trivial statements with no empirical consequences.
The second set of eight criteria concerns tests or standards for assessing a
theory's empirical adequacy. Systematicity refers to how systematically inter-
connected the components of a theory are. That is to say, there should be an
internal coherence to a theory's parts, and no obvious cases of ad hocness, as
when a component has been added on simply to deal with an awkward set of
facts and has no independent evidence in its favor. Modularity refers to the
desirability of having distinct components and sub-components within a theory,
rather than a large amorphous mass, in order to aid in the identification of
anomalies (recalcitrant facts and unwanted theoretical consequences), and
thereby to identify the part of a theory that may be wrong and in need of change.
For obvious reasons, Darden flags systematicity and modularity as examples of
potentially conflicting criteria. Clarity is a desirable quality of the way
theoretical claims, entities, and processes are specified (although it may not be a
good strategy in the early stages of theory construction, when vagueness can be
tolerated as a way of encouraging early progress). Explanatory adequacy refers
to the need for a theory to account for phenomena in its domain (although what
will actually count as explanation may vary from one science and scientist to
another), and tends to be used as a post hoc criterion for evaluating a theory's
past successes. Darden emphasizes (see also, Laudan 1990) that new theories
will often not account for as many domain items as the rival they seek to replace,
so this again is a strategy which should not be applied before its time. Explana-
tory adequacy is an example of a criterion that often provokes disputes, but
disputes which generally really concern when, not if, the criterion must be met.
Still in the category of strategies for assessing a theory's empirical adequacy is
predictive adequacy. This criterion, Darden notes, has two aspects: an inability
to make many interesting predictions can act as a constraint on theory construc-
tion, while the ability to make successful predictions is an important issue in
theory assessment. Predictive adequacy and explanatory adequacy are also
related, since a view seems to be emerging among philosophers of science (see,
for example, Giere 1984, Laudan 1990) that, despite the fact that the logical
relationship between a hypothesis and a fact is the same whether the fact was
known or unknown to the theorist at the time the theory was formulated, the
ability to make novel or improbable or surprising predictions is to be valued
more highly than the ability to explain known facts.10 Conversely, Laudan
(1977) and Brush (1989, cited in Darden 1991) have suggested that a new
theory's ability to explain something which was a recalcitrant anomaly for an old
theory may count for more than the theory's ability to predict novel phenomena.
Scope refers to the breadth of the domain, the range of facts (such as
classroom SLA, or classroom and naturalistic SLA), purportedly explained by
the theory. Generality refers to the strength of the claim(s) a theory makes. A
claim that all learners pass through the same six-stage sequence in the
acquisition of German SL word order is more general (also simpler) and more
valued than a claim that children follow one sequence and adults another.
Darden notes, however, that greater generality does not always mean greater
scope. It is sometimes more effective to employ a twin strategy of 'specialization'
of an over-generalization (i.e. restriction of the scope of a claim), plus 'addition'
of a new component in order to deal with the cases no longer covered by the
original over-generalized and partially inaccurate statement. Lack of ad
hocness, or simplicity (which appears to overlap somewhat with systematicity in
Darden's system), refers to the undesirability of having a component in a theory
whose function is to account for one or a few problematic domain items, which
is not systematically connected to other components, and which does not
successfully predict new domain items.
The third set of two strategies can be used to assess a theory's future potential.
Extendability, equivalent to 'smoothness' (Newton-Smith 1981), refers to a
theory's potential for easy accommodation of changes and extensions to explain
new domain items. Fruitfulness refers to a theory's productiveness, or fertility
(see also, McMullin 1976), its ability to generate new research and conceptual
advances.
The fourth category consists of one criterion (here called) consistency with
accepted theories in other fields, with three possible variants, dealing with
relationships to existing theories of other phenomena. Although Darden admits
it exerts a conservative influence and constitutes a strong constraint on theory
construction, it is possible to require that new theories be (a) consistent with, (b)
analogous to, or even (c) tightly interrelated with, accepted theories in other
fields. If this is not required during theory construction, then Darden proposes
that a strategy of assessing a new theory in relation to other accepted theories be
used later for theory evaluation. The idea is to provide a check on the splintering
effects on a field which new theories can have if they are launched without
regard for their consistency with known natural phenomena in related areas,
and also to facilitate future unification. An example in linguistics is the
importance attached by Pinker and Bloom (1990) to the claim that the
evolution of the human capacity for language is consistent with general Darwi-
nian principles of natural selection. An example in language acquisition is Hur-
ford's approach to the study of sensitive periods for development (Hurford
1991).
The fifth and final set concerns metaphysical and methodological constraints.
Experimental testability and the ability to make quantitative predictions are self-
explanatory methodological assessment strategies. (Some fields—for example,
political science—will be as interested in qualitative predictions, it should be
noted.) Simplicity also re-enters the picture here, either as a methodological
criterion for assessing testability, or metaphysically, as an assumption that
nature is simple at some deep level and that simpler theories are therefore more
likely to be true. The previously mentioned strategy of assessing relations to
theories in other fields can also be metaphysically relevant in the sense that its
importance for possible future theory unification may reflect a belief that
science should be unified because it is a body of knowledge about one world.
Again, the Pinker and Bloom (1990) claim about language and natural selection
provides an illustration.
All of Darden's assessment strategies can be used to evaluate single theories
in isolation, and could be used to thin out the ranks of SLA theories in that
manner, or at least to alert theorists to weaknesses they might have overlooked.
How many SLA theories, for example, could pass the tests of internal
consistency, systematicity, and explanatory and predictive adequacy? Several of
the same strategies can also be used in comparative evaluations of two or more
theories. For this exercise, however, a number of additional criteria are
available, several with a more familiar ring to them.
The best known is falsifiability. Falsifiability has fallen into disfavor over the
years for a variety of reasons, including the notorious theory-ladenness of
observation, and underdetermination (past confirmatory evidence adduced in
support of a theory does not guarantee that future observations will be
supportive, and both positive and negative evidence may only support or
challenge part of a theory, not the whole edifice). Strict falsificationism would
require a theory to be abandoned every time it was confronted with observa-
tions incompatible with it, and that would have meant the loss of many
ultimately successful theories in the history of science, including Newton's
astronomy, Mendel's genetics, and just about every theory in the history of
physics. Popper recognized this, and did not advocate strict falsificationism
himself. Conversely, however, the weaker demand that a theory should be held
accountable to mismatches between its predictions and real world phenomena
is a requirement that almost any theory can meet at some level, and so serves
little purpose as a criterion for comparative evaluation (for discussion, see
Chalmers 1990:14-20).
The most obvious problem with falsification is the difficulty of deciding which
components of a theory are implicated by a negative finding and have to be
modified or abandoned, and/or whether just that component or the whole
theory is in jeopardy. Thus, when unpredicted disturbed morpheme orders
were observed on the reading and writing tasks in Larsen-Freeman's (1975)
study, and unpredicted 'natural' orders on a supposedly discrete-point test in
Krashen, Sferlazza, Feldman, and Fathman (1976), it was not clear which of the
so-called hypotheses in the then current version of Krashen's Monitor Theory
was in trouble. Was the order due to a problem with the Monitor Hypothesis,
the Natural Order Hypothesis, or the Acquisition-Learning Hypothesis, or
some combination? (In the end, Krashen's solutions were, respectively, to
invoke a new distinction between 'easy' and 'hard' grammar, and to reclassify the
SLOPE test as an integrative measure.) There is also the problem of ad hocness,
since a theorist might make a token adjustment to deal with a problem finding
and so merely give the illusion of accountability to negative evidence. Lakatos
(1978) proposed treating conflicting findings as anomalies rather than
falsifications, continuing with the theory if it could produce at least some
confirmed novel predictions, and perhaps abandoning it only when a rival
theory could explain the anomalies. As Chalmers (1990: 19) points out,
however, there is no way of knowing how long to continue in the face of
anomalies, since success may always be just around the corner.
Despite these limitations, falsifiability is not without its champions in SLA,
including McLaughlin (1987), Spolsky (1989), and Ellis (1990), who writes:

The theory [of instructed second language acquisition] presented here has two primary
purposes. The first is to provide a set of statements or hypotheses about classroom L2
learning which are testable and, therefore, falsifiable. In this sense, the theory aims to be
scientific, as defined by Popper (1976). (Ellis 1990:174, emphasis in the original)

It also has its detractors (see Schumann, this issue). Falsifiability as an
assessment strategy, it should not be forgotten, assumes a belief in the existence
of truth and falsity on the part of the person applying it. That makes it (and
indeed, probably the whole notion of theory evaluation, other than from an
aesthetic standpoint) a non-issue for relativists, who do not accept the notion of
objective truth.
In the light of the problems of falsifiability as a requirement on theories, it
would be better not to try to make it one—especially not an absolute require-
ment in the early stages of theory construction. Rather, a falsifiable theory may
be judged better than an unfalsifiable one in later comparative evaluations. The
notion has additional value. As one form of accountability to data, it is likely to
encourage adherence to related desirable qualities in a theory, such as clarity
and explanatory adequacy. Finally, when evaluating one or more theories,
whether or not they have in fact been falsified is a reasonable, indeed critical
measure to apply, but as part of a more general criterion of empirical
adequacy.11

5. UNIVERSAL STRATEGIES?
Are any assessment strategies universally valid? It would be tempting to believe
that while theories might differ, scientists would at least agree on how to
evaluate them. It is clear, however, that there is no consensus on a universal set
of criteria at present. Indeed, sociologists of science (for example, Mulkay and
Gilbert 1984/1991) have found that scientists vary both as a group in the
criteria they claim to apply in assessing their own and others' theories, and also
as individuals by applying criteria inconsistently.
Independent of the sociological findings or historical record, philosophers of
science have noted the logical impossibility of a current consensus on assess-
ment strategies. The problem, addressed explicitly by Laudan and Laudan
(1989) and Laudan (1990), is that if scientists all adhere to the same scientific
method ('the scientific method' espoused by positivists, or any other), if they
(supposedly) have access to the same body of 'accepted findings', if they accept
the same set of standards for evaluating theories (and we must recognize we are
implicitly urging this if we advocate a rational narrowing of the range of
theories the field should consider), and if they are rational, how do we explain
the fact that they (ever) disagree? And if standards for theory acceptance are
supposedly rigorous, as is widely believed and advocated, how, too, do we
explain innovation, the appearance of new theories which initially are
inevitably less well developed and have less empirical support, i.e. account for a
narrower range of data than the theories they are designed to replace?
Historically, there are examples of new theories (for example, continental drift
theory) ignoring or reinterpreting widely accepted findings. Conversely, if
disagreement is explained by positing that scientists hold different views about
scientific method and subscribe to different theory assessment criteria, how is
consensus ever to be reached or explained?
The solution offered by Laudan and Laudan (1989) is that scientists do
indeed operate with divergent standards and evaluation criteria, thereby
accounting for the possibility of both disunity and innovation. Those whose
assessment criteria are satisfied first become the innovators. Agreement can be
reached because over time, with developments in the theory and its database,
other scientists' standards can also be met, until everyone is on board. They
support this analysis, using Wegener's continental drift theory of 1915 as their
example, by showing that scientists who accept and support a new theory earlier
do so on different grounds from those who do so later, i.e. hold different
assessment criteria.
Laudan and Laudan report that the writings of early proponents of contin-
ental drift theory reveal they were mostly impressed by the range of different
kinds of data (paleomagnetic, paleoclimatic, and geological) that could be
accounted for, albeit post hoc, by Wegener's theory, even though any of the
individual lines of work alone might not have been sufficiently compelling, and
even though none of this was predictive, and because none of the rival theories
covered so many different kinds of data. Holdouts at this stage included those
who wanted a theory also to explain different phenomena from those it had been
invented to account for, and/or who wanted novel predictive successes, the
more surprising (unrelated) the better. Evidence was produced in 1965-67 that
confirmed two startling novel predictions from the theory, one concerning
symmetrical patterns in the magnetism of rocks on either side of recently
discovered mid-ocean ridges, the other seismological evidence confirming the
specific direction of movement predicted for certain underwater fracture zones
if they were the result of continental drift. This was evidence the theory could
account for different phenomena than those it had been created to explain,
make surprising predictions, and account for data the rival immobilist geology
theory could not handle.
Laudan and Laudan's claim is that early converts subscribed to the variety of
instances criterion, later converts to the importance of surprising predictions
and/or independent testability criteria. They held different criteria, but could
still reach consensus, because continental drift was dominant by all their
standards. The 50-year period over which different groups joined up is evidence
against the claim that there are universal theory assessment criteria, or else
obliges one to explain away the lack of consensus on sociological grounds (see,
for example, Giere 1988), such as the differential power of rival scientific
sub-communities, time-lags in learning of new findings, cases of careers and
research funding being too committed to a rival theory, geography, or
differences in training. Laudan and Laudan note, then, that you can have
consensus by dominance, despite divergent evaluation criteria among those
making up the consensus—a field can agree about a theory while disagreeing
about what makes a theory good.
When scientific consensus is achieved, it is sometimes due to disagreements
about assessment criteria themselves being resolved (Laudan 1984), but this
happens slowly and rarely. Laudan and Laudan (1989) posit an additional
mechanism to explain consensus, theory dominance, i.e. a theory is superior to
all its rivals by every set of assessment criteria used in the field, even though
these may be very different, and is recognized as such by all scientists in that
field. The degree of agreement in the natural sciences is not because scientists
there all subscribe to the same standards, Laudan and Laudan suggest, but
because theories have emerged which dominate because they can satisfy
divergent standards.
If there is no current consensus on assessment criteria, could there be one in
principle? The received view is again negative. Beretta (1991:503-4) considers
that adopting any criterion stiffer than that a theory should be able to explain at
least some data is likely to lead to trouble. The history of science shows, after all,
that almost every ultimately successful theory would have failed at least one of
the other potential evaluative tests. And here he echoes the current view among
philosophers of science (see, for example, Laudan and Laudan 1989, Cushing
1989). After an examination of the possibilities of universals (and, incidentally,
of the potential, which he considers dismal, of an empirical historical approach
for unearthing any), Cushing concludes that:

the (eventual) requirement of (a not wholly theory-independent) empirical adequacy
has attained the status of an almost necessary condition for theory acceptance. Fertility,
or generative potential, is another fairly common characteristic of scientific theories.
No theory satisfies all of these desirable requirements. Scientists make the best
arguments they can in a given set of circumstances. Some carry the day and some do
not. There is no algorithm for success in this enterprise. Detailed, philosophically
sensitive examination of specific episodes in the history of science can reveal what
successful strategies have been, but not universal characteristics. In fact, focusing on
such criteria is too narrow an exercise to reveal the many-faceted and holistic nature of
the scientific enterprise. (Cushing 1989: 20)

This may be unduly pessimistic, however. Showing a lack of consensus on
assessment criteria is arguably no different from showing oppositionality in
theories themselves. If a rationalist position is adopted in one debate, why not in
both, if only in the interest of consistency? If there can be one correct theory
(which it is a researcher's purpose to approach, without ever being sure he or
she has discovered it), why not a set of universally valid assessment criteria,
which might be posited and applied, again without ever being sure they are valid,
but whose validity researchers could continually be drawing closer to
establishing? Over time, theories which made progressively more accurate
predictions of natural phenomena in various fields should also tend to be those
meeting the favored assessment criteria, while theories doing poorly empirically
should also tend to be those evaluated unfavorably by the same criteria.
At the very least, given that the uses of assessment criteria vary, and given, as
Laudan (1981) has pointed out, that the particular criteria scientists have used
have changed significantly during the history of science, why can there not in
principle be a correct set of evaluation criteria for a particular group of scientists
at a particular point in time for a particular purpose? The most typical purpose
is the comparative assessment of two or more theories, rivals in at least partly
the same domain, of greater or lesser simplicity, each apparently having had
sufficient time (it could never be more accurate than this) to be demandingly
tested, to be falsified if that were going to happen, to have made successful novel
predictions, to have provided a satisfying explanation for the SLA process
involved, and to have proved themselves useful solvers of theoretical and
applied problems in the field.
Some apparent historical counter-evidence to the universality of assessment
criteria may really involve problems in the timing of their application, not
conflicts in principle. The empirical adequacy criterion for a theory—i.e. that it
be capable of explaining major accepted findings in a field—could be universally
applied without creating problems for good theories provided it is not applied
too early in the theory development process. (Assessment of isolated theories in
absolute terms is only possible late in the process because a lot of data are
needed before a theory can unambiguously be ruled in or out (Rosenberg
1986).) The same goes for simplicity, the ability to explain phenomena different
from those it was invented to account for, and possibly the ability to make
surprising novel predictions, although here the time-lag needed to assess this
might lead to longer survival of degenerate theories than was desirable.
Skeptics can object that, quite apart from risking the loss of valid theories
(among other possible nasty side-effects), the argument unjustifiably assumes
knowledge of how long to wait before applying the criteria. To this it can be
responded that every aspect of theory construction involves the exercise of
judgment, and that the same is true for which assessment criteria to apply and
when to apply them. There is nothing intrinsically different, however, about the
fallibility of theory assessment criteria and the fallibility of theories themselves.
In just the same way that a theory of SLA is someone's current best shot at
explanation, so can at least four criteria—empirical adequacy, simplicity, the
ability to explain phenomena different from those it was invented to account for,
and the ability to make surprising novel predictions—function for comparative
evaluation needs. The criteria, like the theory, may turn out to be invalid.
However, at any one time, it should be possible to defend adherence to a
particular set of assessment strategies for a particular purpose in exactly the
same way that at any one time it should be possible to justify doing research
motivated by a particular theory or giving advice to practitioners on the basis of
a particular theory. The important (and difficult) thing is not to lose sight of the
fact that both the theory and the assessment criteria used to evaluate it may turn
out to be false, and the need always to alert practitioners to this.

(Revised version received January 1993)

NOTES
1 This is a revised version of a plenary address to the Applied Linguistics at Michigan
State University conference on Theory Construction and Methodology in Second
Language Acquisition Research, 4-6 October 1991. I thank Alan Beretta, Graham
Crookes, Kevin Gregg, Larry Laudan, Charlie Sato, and two anonymous reviewers for
helpful comments on the earlier version. Remaining errors are my own responsibility.
2 In what follows, only one example of each kind of SLA theory is cited to illustrate
these differences. Numerous examples exist of some categories.
3 While competence is claimed to be the proper domain of inquiry, researchers in the
Chomskyan tradition are as obliged as those in any other to investigate competence by
studying performance, e.g. on grammaticality judgment tasks and via other forms of
introspection.
4 Eckman (1991) and Sharwood-Smith (1991) provide interesting commentaries on
the debate in Applied Linguistics between Gregg and Ellis and Tarone on this issue.
Eckman argues that the relevance of variation data is an empirical matter, not something
to be decided a priori.
5 Responding to an audience question at the 1992 Michigan State University
conference, Schumann announced that he now rejected his earlier (1983) relativist
stance, and declared himself a realist (in the philosophical sense).
6 I once asked a famous applied linguist, a devout Christian, how he reconciled the
rigor of his experimental work with his religious beliefs. After thinking for a moment, he
replied, 'You know, I often ask myself the same thing. Of course, I can test them in one
way. When I die, I'll either go to heaven or to hell.' Then, after a pause, 'The problem is, it
isn't replicable.'
7
Prescriptions for language teaching based on SLA theories which fail to provide
explanations are also misleading if presented as 'theory-based'.
8
One Applied Linguistics reviewer proposed structural equation modeling as a more
useful alternative to experimentation, arguing that 'conceptually, it recognizes that
causation is an unrealistic goal, but that finding a reasonable explanation, or an
explanation that is clearly more consistent with the data than other explanations, is a
reasonable goal. Methodologically, it lends itself much more readily to the investigation of
phenomena including multiple variables or factors, which it seems to me, is now a
necessary characteristic of SLA research.'
9
For a vivid example in SLA research, see Eubank (1991).
10
A recent example in SLA of 'prediction' of an earlier, but unknown, finding
concerns sensitive periods in language development. Long (1990a) hypothesized the
closure of the first sensitive period, for suprasegmental phonology, to occur as early as
age six, not at puberty, as traditionally claimed in the literature. Chambers (1992: 689)
cited a just translated finding, hitherto available only in Japanese, by Sibata (1958), to the
effect that of 500 children still in Shirakawa in 1949 after being moved from Tokyo and
Yokohama five or six years earlier to escape US bombing, children aged six or seven on
arrival had acquired the pitch-accent features of the Shirakawa dialect almost perfectly by
the time of the interview, while those aged fourteen or over on arrival had not modified
their dialect at all.
11
At least sixteen other theory-assessment criteria are advocated in the history and
philosophy of science literature. Some overlap with those discussed above, but most
differ in emphasis, at least, and repay careful study. They are: fertility as a paradigm for
puzzle-solving (Kuhn 1970); explanatory power; problem-solving ability (Laudan 1977);
the ability to account for different kinds of data (Nagel 1939: 72, cited in Laudan and
Laudan 1989: 230); the ability to account for phenomena different from those the theory
was invented to explain; novel predictive successes; the ability to account for data a rival
theory cannot handle; simplicity/parsimony; consistency; generality; empirical adequacy
(van Fraassen 1980), especially a record of having survived a strong empirical test;
proven fertility (McMullin 1976), i.e. the amount of research and conceptual work
stimulated by a theory; and unproven fertility, or generative potential (Nickles 1987), i.e. a
theory's potential for stimulating research; continuity/rationality, or the appearance of
being at least partly grounded in currently accepted ideas (Darden 1991); a pragmatic
('get on with things') relationship with experiment (Pickering 1984); and the ability to
resolve fundamental conceptual difficulties. (For discussion, see Cushing 1989, Laudan
and Laudan 1989, Laudan 1990, Darden 1991.)

REFERENCES
Atkinson, M. 1987. 'Mechanisms for language acquisition: learning, parameter-setting
and triggering.' First Language 7: 3-30.
Bachman, L. F. and A. S. Palmer. 1982. 'The construct validation of some components
of communicative proficiency.' TESOL Quarterly 16/4: 449-65.
Barnes, B. and D. Bloor. 1982. 'Relativism, rationalism and the sociology of knowledge'
in M. Hollis and S. Lukes (eds.): Rationality and Relativism. Oxford: Blackwell.
Ben-David, J. and R. Collins. 1966. 'Social factors in the origins of a new science: the case
of psychology.' American Sociological Review 31: 451-65.
Beretta, A. 1991. 'Theory construction in SLA. Complementarity and opposition.'
Studies in Second Language Acquisition 13/3: 493-512.
Bialystok, E. 1991. 'Metalinguistic dimensions of bilingual proficiency' in E. Bialystok
(ed.): Language Processing in Bilingual Children. Cambridge: Cambridge University
Press.
Bley-Vroman, R. 1989. 'The logical problem of foreign language learning.' Linguistic
Analysis 20/1-2: 3-49.
Bloor, D. 1984. 'The strengths of the strong programme' in J. R. Brown (ed.): Scientific
Rationality: The Sociological Turn. Dordrecht: Reidel.
Broad, W. and N. Wade. 1982. Betrayers of the Truth. Fraud and Deceit in the Halls of
Science. New York: Simon and Schuster.
Brush, S. 1989. 'Prediction and theory evaluation: The case of light bending.' Science
246: 1124-9.
Campbell, D. T. 1988. 'A tribal model of the social system vehicle carrying scientific
knowledge' in S. Overman (ed.): Methodology and Epistemology for Social Science:
Selected Papers. Chicago: University of Chicago Press.
Chalmers, A. 1990. Science and its Fabrication. Milton Keynes: Open University Press.
Chambers, J. K. 1992. 'Dialect acquisition.' Language 68/4: 673-705.
Clahsen, H. 1987. 'Connecting theories of language processing and (second) language
acquisition' in C. Pfaff (ed.): First and Second Language Acquisition Processes.
Cambridge, MA: Newbury House.
Collins, R. 1989. 'Toward a theory of intellectual change: The social causes of
philosophies.' Science, Technology, and Human Values 14/2: 107-40.
Collins, R. and S. Restivo. 1983. 'Robber-barons and politicians in mathematics: a
conflict model of science.' Canadian Journal of Sociology 8: 199-227.
Cook, V. 1988. Chomsky's Universal Grammar: An Introduction. Oxford: Blackwell.
Crookes, G. 1989. 'Planning and interlanguage variation.' Studies in Second Language
Acquisition 11/4: 367-87.
Crookes, G. 1992. 'Theory format and SLA theory.' Studies in Second Language
Acquisition 14/4: 425-49.
Cummins, J. 1991. 'Interdependence of first- and second-language proficiency in
bilingual children' in E. Bialystok (ed.): Language Processing in Bilingual Children.
Cambridge: Cambridge University Press.
Cummins, R. 1983. Psychological Explanation. Cambridge, Mass.: MIT Press.
Cushing, J. T. 1989. 'The justification and selection of scientific theories.' Synthese 78:
1-24.
Darden, L. 1991. Theory Change in Science: Strategies from Mendelian Genetics. New
York: Oxford University Press.
Dickson, D. 1984. The New Politics of Science. Chicago: University of Chicago Press.
Diesing, P. 1991. How Does Social Science Work? Reflections on Practice. Pittsburgh,
PA: University of Pittsburgh Press.
Eckman, F. R. 1985. 'The markedness differential hypothesis: theory and applications' in
B. Wheatley, A. Hastings, F. Eckman, L. Bell, G. Krukar, and R. Rutkowski (eds.):
Current Approaches to Second Language Acquisition: Proceedings of the 1984
University of Wisconsin-Milwaukee Linguistics Symposium. Bloomington, Indiana:
Indiana University Linguistics Club.
Eckman, F. R. 1991. 'On the determination of the proper level of abstraction for a theory
of second language acquisition.' Paper presented at the Applied Linguistics at
Michigan State conference on Theory Construction and Methodology in Second
Language Research. East Lansing: Michigan State University.
Eckman, F. R., E. A. Moravcsik, and J. R. Wirth. 1989. 'Implicational universals and
interrogative structures in the interlanguage of ESL learners.' Language Learning
39/2: 173-205.
Ellis, R. 1985. Understanding Second Language Acquisition. Oxford: Oxford University
Press.
Ellis, R. 1989. 'Sources of intra-learner variability in language use, and their relationship
to second language acquisition' in S. Gass, C. Madden, D. Preston, and L. Selinker
(eds.): Variation in Second Language Acquisition: Psycholinguistic Issues. Clevedon,
Avon: Multilingual Matters.
Ellis, R. 1990. Instructed Second Language Acquisition. Oxford: Blackwell.
Erlich, H. J. 1985. 'The university-military connection.' Social Anarchism 8 & 9: 3-21.
Eubank, L. 1991. 'Testing the explanatory adequacy of sentence processing constraints
for L2 acquisition.' Paper presented at the Applied Linguistics at Michigan State
conference on Theory Construction and Methodology in Second Language Research.
East Lansing: Michigan State University.
Felix, S. W. 1985. 'More evidence on competing cognitive systems.' Second Language
Research 1/1: 47-72.
Feyerabend, P. K. 1975. Against Method. London: New Left Books.
Feyerabend, P. K. 1981. 'More clothes from the Emperor's bargain basement.' British
Journal of the Philosophy of Science 32: 57-71.
Feyerabend, P. K. 1987. Farewell to Reason. London: Verso.
Fleck, L. 1979. Genesis and Development of a Scientific Fact. Chicago: University of
Chicago Press.
Gardner, R. 1985. Social Psychology and Second Language Learning: The Role of
Attitudes and Motivation. London: Edward Arnold.
Gasser, M. 1990. 'Connectionist models.' Studies in Second Language Acquisition 12/2:
179-99.
Giere, R. N. 1984. Understanding Scientific Reasoning. New York: Holt, Rinehart and
Winston.
Giere, R. N. 1985. 'Philosophy of science naturalized.' Philosophy of Science 52: 331-56.
Giere, R. N. 1988. Explaining Science: A Cognitive Approach. Chicago: University of
Chicago Press.
Giles, H. and J. Byrne. 1982. 'An intergroup approach to second language acquisition.'
Journal of Multilingual and Multicultural Development 3: 17-40.
Gould, S. J. 1981. The Mismeasure of Man. New York: Norton.
Gregg, K. 1990. 'The variable competence model of second language acquisition, and
why it isn't.' Applied Linguistics 11: 364-83.
Hacking, I. 1983. Representing and Intervening. Cambridge: Cambridge University
Press.
Harre, R. 1970. The Principles of Scientific Thinking. London: Macmillan.
Harre, R. 1985. The Philosophies of Science. Oxford: Oxford University Press.
Hatch, E., V. Flashner, and L. Hunt. 1986. 'The experience model and language
teaching' in R. R. Day (ed.): 'Talking to Learn': Conversation in Second Language
Acquisition. Rowley, MA: Newbury House.
Hudson, W. 1989. 'Semantic theory and L2 lexical development' in S. M. Gass and J.
Schachter (eds.): Linguistic Perspectives on Second Language Acquisition. Cambridge:
Cambridge University Press.
Hull, D. L. 1988. Science as a Process: An Evolutionary Account of the Social and
Conceptual Development of Science. Chicago: University of Chicago Press.
Hulstijn, J. 1989. 'Implicit and incidental second language learning: experiments in the
processing of natural and partly artificial input' in H. W. Dechert and M. Raupach
(eds.): Interlingual Processes. Tübingen: Gunter Narr.
Hurford, J. R. 1991. 'The evolution of the critical period for language acquisition.'
Cognition 40: 159-201.
Hyltenstam, K. 1977. 'Implicational patterns in interlanguage syntax variation.'
Language Learning 27/2: 383-411.
Hyltenstam, K. 1982. 'On descriptive adequacy and psychological plausibility: a reply to
Jordens.' Language Learning 32/1: 167-73.
Janicki, K. 1990. 'A brief falsificationist look at contrastive sociolinguistics.' Papers and
Studies in Contrastive Linguistics 26: 5-10.
Jordens, P. 1980. 'Interlanguage research: interpretation or explanation.' Language
Learning 30/1: 195-207.
Jordens, P. 1982. 'How to make your facts fit: a response from Jordens.' Language
Learning 32/1: 175-81.
Krashen, S. D. 1979. 'A response to McLaughlin, "The Monitor Model: Some
methodological considerations".' Language Learning 29/1: 151-67.
Krashen, S. D. 1980. 'The input hypothesis' in J. Alatis (ed.): Current Issues in Bilingual
Education. Washington, DC: Georgetown University Press.
Krashen, S. D. 1982. 'Accounting for child-adult differences in second language rate
and attainment' in S. D. Krashen, R. C. Scarcella, and M. H. Long (eds.): Child-Adult
Differences in Second Language Acquisition. Rowley, Mass.: Newbury House.
Krashen, S. D. 1985. The Input Hypothesis: Issues and Implications. New York:
Longman.
Krashen, S. D., V. Sferlazza, L. Feldman, and A. Fathman. 1976. 'Adult performance
on the SLOPE test: more evidence for a natural sequence in adult second language
acquisition.' Language Learning 26/1: 145-51.
Kuhn, T. S. 1970. The Structure of Scientific Revolutions. Chicago: University of
Chicago Press.
Kuhn, T. S. 1977. The Essential Tension. Chicago: University of Chicago Press.
Lakatos, I. 1978. 'Newton's effect on scientific standards' in J. Worrall and G. Currie
(eds.): Imre Lakatos, Philosophical Papers, Volume 1: The Methodology of Scientific
Research Programmes. Cambridge: Cambridge University Press.
Lamendella, J. 1977. 'General principles of neurofunctional organization and their
manifestations in primary and nonprimary acquisition.' Language Learning 27/1:
155-96.
Larsen-Freeman, D. 1975. 'The acquisition of grammatical morphemes by adult learners
of English as a second language.' Unpublished PhD dissertation, University of
Michigan.
Larsen-Freeman, D. and M. H. Long. 1991. An Introduction to Second Language
Acquisition Research. London: Longman.
Latour, B. and S. Woolgar. 1979. Laboratory Life. The Social Construction of Scientific
Facts. London: Routledge.
Laudan, L. 1977. Progress and its Problems. Berkeley: University of California Press.
Laudan, L. 1981. 'A problem-solving approach to scientific progress' in I. Hacking (ed.):
Scientific Revolutions. Oxford: Oxford University Press.
Laudan, L. 1984. 'The pseudo-science of science?' in J. R. Brown (ed.): Scientific
Rationality: The Sociological Turn. Dordrecht: Reidel.
Laudan, L. 1990. Science and Relativism. Some Key Controversies in the Philosophy of
Science. Chicago: University of Chicago Press.
Laudan, R. and L. Laudan. 1989. 'Dominance and the disunity of method: Solving the
problems of innovation and consensus.' Philosophy of Science 56: 221-37.
Lightbown, P. M. 1984. 'The relationship between theory and method in second
language acquisition research' in A. Davies, C. Criper, and A. P. R. Howatt (eds.):
Interlanguage. Edinburgh: Edinburgh University Press.
Long, M. H. 1990a. 'Maturational constraints on language development.' Studies in
Second Language Acquisition 12/3: 251-85.
Long, M. H. 1990b. 'The least a second language acquisition theory needs to explain.'
TESOL Quarterly 24/4: 649-66.
Long, M. H. and C. J. Sato. 1984. 'Methodological issues in interlanguage studies: an
interactionist perspective' in A. Davies, C. Criper, and A. P. R. Howatt (eds.):
Interlanguage. Edinburgh: Edinburgh University Press.
Major, R. 1987. 'A model for interlanguage phonology' in G. Ioup and S. Weinberger
(eds.): Interlanguage Phonology. The Acquisition of a Second Language Sound System.
New York: Newbury House/Harper and Row.
Martin, B. 1991. Strip the Experts. London: Freedom Press.
McLaughlin, B. 1987. Theories of Second Language Learning. London: Edward
Arnold.
McLaughlin, B. 1990. 'Restructuring.' Applied Linguistics 11/2: 1-16.
McLaughlin, B., T. Rossman, and B. McLeod. 1983. 'Second language learning: an
information-processing perspective.' Language Learning 33/2: 135-58.
McMullin, E. 1976. 'The fertility of theory and the unit of appraisal in science' in R. S.
Cohen, P. K. Feyerabend, and M. W. Wartofsky (eds.): Boston Studies in the
Philosophy of Science 39: 395-432. Dordrecht: Reidel.
McShane, J. 1987. 'Do we need a metatheory of language development?' Language and
Communication 7/2: 111-21.
Meisel, J., H. Clahsen, and M. Pienemann. 1981. 'On determining developmental
stages in natural second language acquisition.' Studies in Second Language Acquisition
3/1: 109-35.
Mulkay, M. 1991. Sociology of Science. A Sociological Pilgrimage. Buckingham: Open
University Press.
Mulkay, M. and N. Gilbert. 1984/1991. 'Theory choice' in M. Mulkay (ed.): Sociology of
Science. A Sociological Pilgrimage. Buckingham: Open University Press.
Nagel, E. 1939. Principles of the Theory of Probability. Chicago: University of Chicago
Press.
Newell, A. 1990. Unified Theories of Cognition. Cambridge, MA: Harvard University
Press.
Newmeyer, F. J. and J. Emonds. 1971. 'The linguist in American society.' Chicago
Linguistics Society 7: 285-303.
Newport, E. L. 1990. 'Maturational constraints on language learning.' Cognitive Science
14: 11-28.
Newton-Smith, W. 1981. The Rationality of Science. Boston: Routledge and Kegan
Paul.
Newton-Smith, W. 1982. 'Relativism and the possibility of interpretation' in M. Hollis
and S. Lukes (eds.): Rationality and Relativism. Oxford: Blackwell.
Nickles, T. 1987. 'Twixt method and madness' in N. J. Nersessian (ed.): The Process of
Science. Dordrecht: Martinus Nijhoff.
Oller, J. W. Jr. 1981. 'Language testing research (1979-1980)' in R. B. Kaplan, R. L.
Jones, and G. R. Tucker (eds.): Annual Review of Applied Linguistics 1980. Rowley,
Mass.: Newbury House.
Pickering, A. 1981. 'The hunting of the quark.' Isis 72: 216-36.
Pickering, A. 1984. Constructing Quarks: A Sociological History of Particle Physics.
Chicago: University of Chicago Press.
Pienemann, M. 1984. 'Psychological constraints on the teachability of languages.'
Studies in Second Language Acquisition 6/2: 186-214.
Pienemann, M. 1992. 'Teachability theory.' Unpublished MS. Sydney: Language
Acquisition Research Centre, University of Sydney.
Pienemann, M. and M. Johnston. 1987. 'Factors influencing the development of
language proficiency' in D. Nunan (ed.): Applying Second Language Acquisition
Research. Adelaide: National Curriculum Resource Centre.
Pinker, S. and P. Bloom. 1990. 'Natural language and natural selection.' Behavioral and
Brain Sciences 13: 707-27.
Popper, K. 1976. Unended Quest. London: Fontana-Collins.
Rosenberg, A. 1986. 'Philosophy of science and the potentials for knowledge in the social
sciences' in D. W. Fiske and R. A. Shweder (eds.): Metatheory in Social Science:
Pluralisms and Subjectivities. Chicago: University of Chicago Press.
Santos, T. 1989. 'Replication in applied linguistics research.' TESOL Quarterly 23/4:
699-702.
Sato, C. J. 1984. 'Phonological processes in second language acquisition: another look at
syllable structure.' Language Learning 34/4: 43-57.
Schmidt, R. W. 1992. 'Psychological mechanisms underlying second language fluency.'
Studies in Second Language Acquisition 14/4: 357-85.
Schumann, J. H. 1978. The Pidginization Process: A Model for Second Language
Acquisition. Rowley, Mass.: Newbury House.
Schumann, J. H. 1983. 'Art and science in second language acquisition research.'
Language Learning 33: 49-75.
Schumann, J. H. 1986. 'Research on the acculturation model for second language
acquisition.' Journal of Multilingual and Multicultural Development 7/5: 379-92.
Schwartz, B. 1990. 'Unmotivating the motivation for the Fundamental Difference
Hypothesis' in H. Burmeister and P. L. Rounds (eds.): Variability in Second Language
Acquisition. Eugene: University of Oregon.
Selinker, L. and D. Douglas. 1989. 'Research methodology in contextually-based
second language research.' Second Language Research 5/2: 93-126.
Selinker, L. and J. Lamendella. 1978. 'Two perspectives on fossilization in
interlanguage learning.' Interlanguage Studies Bulletin 3/2: 143-91.
Shapere, D. 1974. 'Scientific theories and their domains' in F. Suppe (ed.): The Structure
of Scientific Theories. Urbana: University of Illinois Press.
Shapere, D. 1986. 'External and internal factors in the development of science.' Science
and Technology Studies 4: 1-9.
Shapin, S. 1982. 'History of science and its sociological reconstructions.' History of
Science 20/3:49,157-211.
Sharwood-Smith, M. 1991. 'SLA and the cognitive enterprise.' Plenary address to the
Second Language Research Forum, University of Southern California, Los Angeles,
February 28-March 3.
Sibata, T. 1958. 'Conditions controlling standardization. Excerpt from Nihon no hogen
[The dialects of Japan].' Tokyo: Iwanami Shoten. Translated by Motoei Sawaki, 1990,
MS.
Spolsky, B. 1989. Conditions for Second Language Learning. Oxford: Oxford University
Press.
Tarone, E. 1983. 'On the variability of interlanguage systems.' Applied Linguistics 4/2:
143-63.
Tarone, E. 1988. Variation in Interlanguage. London: Edward Arnold.
Tarone, E. 1989. 'Accounting for style-shifting in interlanguage' in S. Gass, C. Madden,
D. Preston, and L. Selinker (eds.): Variation in Second Language Acquisition.
Clevedon, Avon: Multilingual Matters.
van Fraassen, B. C. 1980. The Scientific Image. Oxford: Clarendon Press.
Wardhaugh, R. 1970. 'The contrastive analysis hypothesis.' TESOL Quarterly 4: 2.
White, L. 1989. Universal Grammar and Second Language Acquisition. Philadelphia:
John Benjamins.
Wolfe-Quintero, K. 1992. 'Learnability theory and the acquisition of extraction in
relative clauses.' Studies in Second Language Acquisition 14/1: 39-70.
Wolfson, N. 1988. 'The bulge: a theory of speech act behavior and social distance' in J.
Fine (ed.): Second Language Discourse: A Textbook of Current Research. Norwood,
NJ: Ablex.
Wong-Fillmore, L. 1991. 'Second-language learning in children: a model of language
learning in social context' in E. Bialystok (ed.): Language Processing in Bilingual
Children. Cambridge: Cambridge University Press.
