
Logical Issues

in Language
Acquisition
Linguistic Models
The publications in this series tackle crucial
problems, both empirical and conceptual, within the
context of progressive research programs. In
particular Linguistic Models will address the
development of formal methods in the study of
language with special reference to the interaction
of grammatical components.

Series Editors:
Teun Hoekstra
Harry van der Hulst

Other books in this series:

1 Michael Moortgat, Harry van der Hulst and Teun Hoekstra (eds)
The Scope of Lexical Rules
2 Harry van der Hulst and Norval Smith (eds)
The Structure of Phonological Representations. Part I
3 Harry van der Hulst and Norval Smith (eds)
The Structure of Phonological Representations. Part II
4 Gerald Gazdar, Ewan Klein and Geoffrey K. Pullum (eds)
Order, Concord and Constituency
5 W. de Geest and Y. Putseys (eds)
Sentential Complementation
6 Teun Hoekstra
Transitivity. Grammatical Relations in Government-Binding Theory
7 Harry van der Hulst and Norval Smith (eds)
Advances in Nonlinear Phonology
8 Harry van der Hulst
Syllable Structure and Stress in Dutch
9 Hans Bennis
Gaps and Dummies
10 Ian G. Roberts
The Representation of Implicit and Dethematized Subjects
11 Harry van der Hulst and Norval Smith (eds)
Autosegmental Studies on Pitch Accent
12 a. Harry van der Hulst and Norval Smith (eds)
Features, Segmental Structures and Harmony Processes (Part I)
12 b. Harry van der Hulst and Norval Smith (eds)
Features, Segmental Structures and Harmony Processes (Part II)
13 D. Jaspers, W. Klooster, Y. Putseys and P. Seuren (eds)
Sentential Complementation and the Lexicon
14 René Kager
A Metrical Theory of Stress and Destressing in English and Dutch
Logical Issues
in Language
Acquisition

I.M. Roca (ed.)

1990
FORIS PUBLICATIONS
Dordrecht - Holland/Providence RI - U.S.A.
Published by:
Foris Publications Holland
P.O. Box 509
3300 AM Dordrecht, The Netherlands

Distributor for the U.S.A. and Canada:


Foris Publications USA, Inc.
P.O. Box 5904
Providence RI 02903
U.S.A.

Distributor for Japan:


Toppan Company, Ltd.
Shufunotomo Bldg.
1-6, Kanda Surugadai
Chiyoda-ku
Tokyo 101, Japan

CIP-DATA KONINKLIJKE BIBLIOTHEEK, DEN HAAG

Logical Issues in Language Acquisition / I.M. Roca (ed.). - Dordrecht [etc.]: Foris. - Ill. -
(Linguistic Models : 15)
With Index, Ref.
ISBN 90 6765 506 6
Subject Heading: Language Acquisition

ISBN 90 6765 506 6

© 1990 Foris Publications - Dordrecht

No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopy, recording, or any information storage and
retrieval system, without permission from the copyright owner.

Printed in The Netherlands by ICG Printing, Dordrecht.


Table of Contents

List of Contributors xi

I.M. Roca

Introduction xv

Martin Atkinson
The logical problem of language acquisition: representational and
procedural issues 1
1. Background 2
2. Representational Issues 10
3. Procedural Issues 16
Footnotes 26
References 29

Vivian Cook
Observational data and the UG theory of language acquisition 33
1. Evidence in the UG model 33
2. I-language and E-language theories 34
3. Observational data, performance and development 35
4. Representativeness of observational data 37
5. Observational data and adult performance 38
6. Evidence of absence 40
7. Correlations within observational data 42
8. General requirements for observational data in UG research 43
References 45

Michael Hammond
Parameters of Metrical Theory and Learnability 47
1. Metrical Theory 48
2. Learnability 49
3. The Seven-Syllable Hypothesis 52
4. Levels and Options 53
5. Short-Term Memory Constraint 57
Footnotes 60
References 61

Teun Hoekstra
Markedness and growth 63
1. Parameters and Markedness 64
2. Developmental markedness 69
3. Extension and Intension 72
4. The notion of growth: the Unique External Argument
Principle 73
5. A-chains 76
5.1. Ergatives 76
5.2. Passives 79
6. Conclusion 82
Footnote 82
References 83

James Hurford
Nativist and Functional Explanations in Language Acquisition 85
1. Preliminaries 85
1.1. Setting and Purpose 85
1.2. Glossogenetic and Phylogenetic mechanisms 87
1.3. Competence/performance, I-Language/E-language 89
1.4. The ambiguity of 'functional' 94
2. Glossogenetic mechanism of functional influence on language
form 96
2.1. The Arena of Use 96
2.2. Frequency, statistics and language acquisition 107
2.3. Grammaticalisation, syntacticisation, phonologisation 113
2.4. The role of invention and individual creativity 120
2.5. The problem of identifying major functional forces 124
2.6. Language drift 129
3. Conclusion 130
Footnotes 131
References 132

Rita Manzini
Locality and Parameters again 137
1. Locality 138
2. English Anaphors and Pronouns 142
3. Italian Reciprocal Constructions 148
4. Parameters in Locality Theory 152
References 156

Marina Nespor
On the rhythm parameter in phonology 157
1. Phonetic evidence against two types of timing 159
2. Phonological evidence against two types of rhythm 162
2.1. Nonrhythmic characteristics of "stress-timed" and "syllable-
timed" languages 162
2.2. On the existence of intermediate systems 163
2.3. On the development of rhythm 165
3. The Phonology of rhythm: arguments for a unified rhythmic
component 166
3.1. The metrical grid in English and Italian 166
3.2. The Rhythm Rule in English and Italian 167
3.3. The definition of stress clash in Italian and English 169
3.4. Stress lapses in English and Italian 171
4. Conclusions 172
Footnotes 173
References 173

Mark Newson
Dependencies in the Lexical Setting of Parameters: a solution to the
undergeneralisation problem 177
1. The Lexical Parameterisation Hypothesis and Ensuing
Problems 177
2. A solution to the problems 179
3. Undergeneralisations and the Binding Theory 180
3.1. Background issues 180
3.2. Generalisations and the Lexical Dependency 182
4. Support for the Lexical Dependency 187
5. A further predicted generalisation 192
Footnotes 195
References 197

Andrew Radford
The Nature of Children's Initial Grammars of English 199
1. Introduction 199
2. Structure of nominals in early child English 202
3. Structure of clauses in early child English 209
4. The overall organisation of early child grammars 219
5. Summary 228
Footnotes 229
References 231

Anjum Saleemi
Null Subjects, Markedness, and Implicit Negative Evidence 235
1. Some background assumptions 236
2. The Licensing Parameter 237
3. The Learnability Problem 242
4. Positive Identification 242
5. Exact Identification 247
6. Is Implicit Negative Evidence Really Necessary? 248
7. Developmental Implications 249
8. Binding Parameters and Markedness 252
Footnotes 255
References 256

Michael Sharwood Smith
Second Language Learnability 259
1. Introduction 259
1.1. The second language learner as a constructor of mental
grammars 259
1.2. L1 and L2 acquisition as special cases of the same process 260
2. Linguistic theory and second language acquisition 262
2.1. L2 learnability 262
2.2. The initial L2 state: logical possibilities 264
2.2.1. The "UG by proxy" view 265
2.2.2. The "back-to-square-one" view 267
2.2.3. The UG-Reorganisation view 267
3. Research strategies 271
4. Conclusion 272
Footnotes 273
References 273

N. V. Smith
Can Pragmatics fix Parameters? 277
1. Introduction 277
2. Exclusions 278
3. Relevance 280
4. Parameters 281
5. Hyams 282
6. Fixing 284
7. Conclusion 287
Footnotes 288
References 288

Author Index

Subject Index
List of Contributors

Martin Atkinson
Department of Language and Linguistics
University of Essex
Colchester
Essex CO4 3SQ
ENGLAND

Vivian Cook
Department of Language and Linguistics
University of Essex
Colchester
Essex CO4 3SQ
ENGLAND

Michael Hammond
Department of Linguistics
University of Arizona
Tucson, AZ 85721
USA

e-mail:hammond@ccit.arizona.edu

Teun Hoekstra
Instituut voor Algemene Taalwetenschap
Rijksuniversiteit
Postbus 9515
2300 RA Leiden
THE NETHERLANDS
e-mail:letthoekstra@nl.leidenuniv.rulcri

James Hurford
Department of Linguistics
Adam Ferguson Building
40 George Square
Edinburgh EH8 9LL
SCOTLAND

e-mail:jim@uk.ac.ed.edling

Rita Manzini
Department of Phonetics and Linguistics
University College
Gower Street
London WC1E 6BT
ENGLAND
Marina Nespor
Italiaans Seminarium
Universiteit van Amsterdam
Spuistraat 210
1012 VT Amsterdam
THE NETHERLANDS

Mark Newson
Department of Language and Linguistics
University of Essex
Colchester
Essex CO4 3SQ
ENGLAND

Andrew Radford
Department of Language and Linguistics
University of Essex
Colchester
Essex CO4 3SQ
ENGLAND
Iggy Roca
Department of Language and Linguistics
University of Essex
Colchester
Essex CO4 3SQ
ENGLAND
e-mail:iggy@uk.ac.essex

Anjum P. Saleemi
English Department
Allama Iqbal Open University
H-8, Islamabad
PAKISTAN

Michael Sharwood Smith
English Department
Rijksuniversiteit te Utrecht
Trans 10
3512 JK Utrecht
THE NETHERLANDS
e-mail:smith@nl.ruu.let.ruulet

N.V. Smith
Department of Phonetics and Linguistics
University College
Gower Street
London WC1E 6BT
ENGLAND
e-mail:uclynvs@uk.ac.ucl
Introduction
I.M. Roca
University of Essex

This volume grew out of a seminar series on the theme 'The Logical Problem
of Language Acquisition' that I organised for the Department of Language
and Linguistics of the University of Essex in 1988, and at which most
of the papers included here were first presented. The aim of the series
was to examine the impact of the issue on various areas of language research,
thus offering as broad as possible an overview of what is rapidly becoming
the focal point of generative linguistics.
The change in concerns and outlook which has taken place in linguistics
over the past quarter century is nicely encapsulated in the contrast between
the basic tenet of American descriptive linguistics that 'languages ... differ
from each other without limit and in unpredictable ways' (Joos 1957:96)
and Chomsky's current position that there is only one language (cf. e.g.
Chomsky 1988c: 2).
The apparent irreconcilability of these two stands betrays a more
fundamental truth that contemporary linguistics, under Chomsky's endu-
ring leadership, has been labouring to unravel and articulate. Specifically,
the crucial discovery has been that phenomenon must be kept distinct
from noumenon, or, in plainer words, that underlying the obvious diversity
of languages there is a unity more essential to language than its surface
geographical variety.
Chomsky has thus shifted the focus of linguistics from language to man,
from manifestation to source. The central question has now become that
of accounting for the possession of language, that is, of an object which
has the precise characteristics that human languages are known to have.
Pursuing this line of logical investigation, it is reasonable to conclude that
if all human languages are cut to the same shape, this shape must be
imposed by the very organism in which such languages are contained,
that is to say, by man himself. Moreover, given the obvious fact that
language develops in man rather than from him, like, say, a physical limb
or body hair, the need for interaction between the organism and its
environment becomes more acutely obvious. Briefly, what psychological
(or, more accurately, biological) attributes must humans possess in order
for language learning to take place in early childhood, under the usual
conditions of spontaneity, rapidity, satisfactory completion, and so on?
In turn, what traits are necessary in the ambient language itself to make
such learning possible in spite of the apparent input variety which so struck
linguists of Joos's generation? Here we have in a nutshell what has come
to be known as the logical problem of language acquisition.
Chomsky's unashamedly nativist position is of course well-known.
Briefly, the surface complexity of language is such that no acquisition
could meaningfully take place unless the organism already comes equipped
with a sort of mental template designed to anticipate and match the ambient
data in some way. Given the reality of cross-linguistic surface variation,
however, such matching cannot be simplistically direct. Rather, the idea
is that the variation is built into the template in the form of a limited
range of values for each of a set of parameters. From this perspective,
therefore, the task of the child learner is one of elucidating from the data
which of the available values must be assigned to the language to which
he is being exposed. To a large degree, the acquisition of this language
consists in the setting of such parameters. Further to this, there will be
the (of course non-negligible) task of rote learning the idiosyncratic
properties of lexical items. Not unexpectedly, these two tasks are in fact
interdependent, in ways that are gradually becoming better known.
It is not my intention to review here the short but already hefty history
of the topic which inspires the title of this book. For most of the relevant
information, the curious reader can refer to such works as Wexler and
Culicover (1980), Baker and McCarthy (1981), Hornstein and Lightfoot
(1981), Atkinson (1982), Borer (1984), Pinker (1984), Berwick (1985),
Chomsky (1986a), Hyams (1986), Roeper and Williams (1987), and
Chomsky (1988a, 1988b, 1988c).
Focussing then on the contents of the present collection, a range of
interwoven themes are discernible, and we shall now go through them
briefly.
Granting the reality of Universal Grammar in the form of principles
and parameters, one obvious question concerns the chronology of its
availability. Specifically, are all such principles and parameters present
and accessible from the onset of the acquisition process or do they (or
at least some of them) emerge as development unfolds, as in Borer and
Wexler's (1987) Maturation Hypothesis? While Atkinson is decidedly
sympathetic to the maturational account, an important part of Hoekstra's
paper is aimed against Borer and Wexler's key argument for the hypothesis,
which is based on the claim that the non-occurrence of verbal passives
in the early stages is the result of the unavailability of A-chains at this
point of development. Hoekstra's alternative hinges on the characterisation
of language acquisition as growth in the system of grammatical knowledge,
the central theme of his paper. Importantly, such intensional accruement
need not result in extensional expansion, but is also consistent with
contraction of the output language, and in this light Hoekstra reinterprets
the Unique External Argument Principle of Borer and Wexler (1988).
The availability issue reemerges in the arena of L2 acquisition. Clearly,
here the learner arrives at the process with all the baggage of his learned
L1. The question therefore is - does Universal Grammar still play a role,
and, if so, exactly what form does it take? In Sharwood Smith's paper
a number of possibilities are presented and evaluated. Importantly, the
issue is clouded by the existence of several obvious differences between
mother tongue and L2 acquisition, in the areas of cognitive development,
social context, and target attainment, among others. The richness of such
extralinguistic factors creates falsifiability difficulties for simplistic claims
based on a naive, if commonly adopted, identity hypothesis. Consequently,
Sharwood Smith forcefully points out their methodological undesirability,
unless embedded in a research strategy embracing a range of alternatives
for the investigation of the developing perceptions of L2 learners.
I have thus far been using the terms 'acquisition' and 'learning' as
mutually substitutable alternatives. Behind such apparently harmless sty-
listic variation lurks however the substantive issue of the nature of language
development. In particular, is acquisition reducible to classical learning,
or does it possess characteristics all of its own? This is perhaps the question
of most general relevance and far-reaching consequences in the whole
domain of language. In his paper, Atkinson concludes after careful
discussion that, despite explicit claims to the contrary by practitioners
(e.g. Chomsky 1988a, Piatelli-Palmerini 1989), it is not possible to divorce
language acquisition from learning, given the central role afforded to
hypothesis selection and testing in both processes.
The book's spine, like that of the principles and parameters approach
to language and to language learnability, concerns the nature and identity
of the parameters themselves, and Atkinson warns against the dangers
of a new descriptivism cloaked in parametric terminology. He presents
and discusses several views on the matter, the adoption of which would
lead to a tightening of the range of parameters, and thus to a restriction
of the hypothesis space available to the child, with the obvious positive
consequences for language learnability.
A concrete aspect of the issue of parameter identity concerns the
assessment of causal relations between specific language phenomena and
the corresponding hypothesised parameter(s), and the papers by Manzini
and Nespor shed light on this matter from opposing ends.
A locality parameter for binding was proposed in Wexler and Manzini
(1987) and Manzini and Wexler (1987) in support of the influential Subset
Principle (Berwick 1985, Wexler and Manzini 1987). Pica (1987), however,
put forward an alternative account deriving the parametric effects in binding
by an appeal to the long vs. short-distance movement of anaphors in LF,
in accordance with their categorisation as phrases or heads. In her paper,
Manzini objects to such a binarity-based approach to locality on the grounds
that there is an independent need for two separate definitions of locality,
as in Manzini (1988, 1989) and Chomsky (1986b), respectively. She backs
up her argument with a detailed examination of the behaviour of the Italian
reciprocal l'un l'altro with regard to locality, and concludes that the
observable parameter-like effects are indeed best accounted for by means
of a specific binding parameter.
Confronted with a similar situation in the domain of rhythm, Nespor
nonetheless arrives at the opposite conclusion. A cluster of properties led
investigators such as Pike (1945) and Abercrombie (1967) into a syllable-
vs. stress-timing dichotomy as regards the rhythmic realisation of languages.
Nespor notes that a specific rhythm parameter would entail a rigid
separation of language types which ought to be observable in acquisition,
and reviews evidence of various types (acoustic, perceptual, phonological)
which contradicts the predictions of such a parameter. In particular,
languages with intermediate effects are attested, and children's development
goes through a compulsory 'syllable-timed' phase cross-linguistically.
Nespor consequently concludes that, contrary to what may appear plausible
at first sight, the observable effects are not the result of the different settings
of an independent rhythm parameter, but rather correspond to several
autonomous processes which, in turn, produce the impression of a different
rhythm.
A commonly held belief is that, in order to facilitate learnability, the
values of the parameters are ordered according to a hierarchy of markedness.
The use of the term 'markedness' in the literature is not, however, free
of ambiguity, and Hoekstra's paper contains a useful review of the several
notions available. Tackling directly the issue of parameter marking, Saleemi
presents the case for adopting an intensional approach. Focussing on the
specific case of Pro-drop, he maintains that the ranking hierarchy between
the (multivalued) settings of the parameter can be derived from the set
theoretical relations which exist between the corresponding grammars, thus
directly confronting the claims put forward by the proponents of the
extensionally-oriented Subset Principle. Saleemi further contends that, while
learning according to such an internal criterion can proceed exclusively
on the basis of positive evidence, the possibility of some inconsistency
between the marked parameter values and the corresponding languages
may remain. If so, the achievement of what he calls 'exact identification'
(i.e. of the ambient language) may have to involve some use of implicit
negative evidence.
The challenge presented to learnability by the Lexical Parameterisation
Hypothesis (Wexler and Manzini 1987) is taken up in Newson's contri-
bution. In particular, Newson attempts to solve the Undergeneralisation
Problem (Safir 1987) by establishing 'Lexical Dependencies' between the
settings of any one parameter for different categories. Thus, for instance,
the subset relations following from the values of the Governing Category
parameter and the Proper Antecedent parameter are inverted for anaphors
and pronominals, with the consequence that the corresponding markedness
hierarchies also ought to be reversed, a conclusion which conflicts with
the distribution of pronominals in the world's languages. According to
Newson, this difficulty is readily resolved if we assume that anaphors have
dominant status over pronominals as regards the setting of these parameters.
Consequently, in the unmarked situation the setting for pronominals will
be parasitic on that for anaphors, even if this runs counter to the predictions
of the Subset Principle. Note that Newson's lexical dependencies are
parameter-internal, and thus leave open the possibility that the settings
of different parameters are mutually unaffectable, as has been contended
by Wexler and Manzini (1987).
Undoubtedly, one of the most fruitful and debated dichotomies intro-
duced by Chomsky is that between competence and performance. Not
unexpectedly such a distinction is also found to permeate the area of
language acquisition. In particular, an important issue for learnability
theory concerns the trade-off between the (innate) principles of grammar
and the effects on learning of factors of performance, especially those
which impinge on the input data.
Cook's paper examines the pitfalls inherent to the investigation of
competence through performance, the common situation in acquisition
studies. In particular, he confronts the evidence provided by the grammatical
judgement of single sentences, which he regards as paradigmatic of studies
of adult competence, with the typical use of observational evidence in
child language. Cook alerts us to the possible dangers of misusing such
E-language data, and he proposes a range of specific methodological
safeguards. He further expresses his uneasiness about reading too much
into negative evidence, and suggests that, if child performance data are
indeed to be used, they must be compared with data of adult performance,
rather than competence.
Smith's contribution centres on the role of Pragmatics, as defined in
the context of Relevance Theory (Sperber and Wilson 1986), in the fixing
of parameters. The interaction between pragmatics and the acquisition
of grammar is paradoxical, in that each appears to presuppose the other:
pragmatic interpretation requires the use of grammatical knowledge, while
the acquisition of such knowledge must be grounded in the contextual
interpretation of the input utterances. Smith cuts the Gordian knot by
allowing a distinction in the mode of operation of pragmatic principles
between the adult and the developing child. In particular, he contends
that the operation of the child's pragmatic principles may not in fact
presuppose a total syntactic analysis.
Relevant to this issue is the detailed evidence presented by Radford
concerning the structure of early child grammars (20-24 months). These
grammars must be taken to lean heavily on Universal Grammar, given
the minimal amount of exposure to the ambient language by this time.
Obviously, thus, they constitute a privileged testing ground for claims
regarding the availability of grammatical devices to the developing child.
Radford's finding is that early grammars are lexical-thematic, that is to
say, they contain neither functional categories nor non-thematic con-
stituents. Correspondingly, all structures in these grammars are projections
of lexical categories and comprise networks of thematic relations. As a
consequence, child grammar will lack the functional properties associated
with functional categories, such as case and binding. Interestingly, the
lexical-thematic hypothesis can account for the absence of movement chains,
referred to above in connection with Hoekstra's contribution.
The topic of the interaction between Universal Grammar and the ambient
language is taken up again by Hammond, who draws a distinction along
standard lines between a default setting of a parameter, for which no external
evidence is required, and a marked setting, which can only be triggered
by positive evidence. He goes on to show that in the domain of word
stress the marked values cooccur with a maximum of seven syllables. Rather
than building the corresponding constraint into UG, Hammond opts for
the non-stipulative strategy of relating the observation to Miller's (1967)
magical number seven. In particular, he contends that the reason for the
7-syllable limit is derivable from the limitation of the storage capacity
of short-term memory to seven units. In this way, the statement of the
stress parameters is kept at its maximum level of generality, while still
being compatible with the facts.
Hurford explicitly sets out to reconcile nativist and (social or cognitive)
functional explanations of language acquisition, all too often incarnated
in the guise of two openly warring factions. He makes a general plea
for the integration of extra-grammatical factors into the domain of
learnability by drawing a distinction between the evolution of the species
and the evolution of particular languages, which he claims to be a function
of both innate and culturally transmitted factors. The central concept in
his theory is the 'Arena of Use', a performance-related abstraction pa-
ralleling Chomsky's competence-related Language Acquisition Device. By
jointly providing the input data for the next generation of learners, the
LAD and the AoU are both instrumental in the acquisition of competence.
Importantly, a model of this kind allows for such factors as statistical
frequency and distribution, discourse structure, and individual invention
and creativity to play a role in language development without needing
to build them directly into the competence. It moreover goes some way
towards accounting for such recalcitrant phenomena as the existence of
language drift or the survival of the phoneme through adverse theoretical
conditions.
As follows from the broad range of contributions, the book ought to
be readable by, and useful to, linguists with a variety of interests and
from a variety of backgrounds: child language researchers, learnability
theorists, syntacticians and phonologists with an interest in principles and
parameters, functionalists, language phylogenists, second language rese-
archers, and so on. Indeed, it is perhaps not unreasonable to hope that
the volume will make some contribution towards the integration of the
rich and varied field of language acquisition.
During the period leading up to publication, the papers were subject
to critical reviews and subsequent extensive revision, and I wish to make
public my gratitude to the anonymous referees who so generously con-
tributed their time and expertise. My editing task has been considerably
facilitated by the help and encouragement I received from the series editors,
Teun Hoekstra and Harry van der Hulst, and from the Essex colleagues
who participated in the project, in particular Martin Atkinson, whose idea
the collection originally was, and who made funds and facilities available
during his period as chairman of the Department of Language and
Linguistics.
In the interest of symmetry, I have taken the liberty of introducing
a modest degree of style harmonisation across the papers, which will
hopefully make the reader's task a more pleasurable one. The generic use
of he adopted here should obviously not mislead the pragmatically aware
reader into believing that children (or adults) are all of one sex. Owing
to practicalities and, especially, time pressure, the choice of a number
of typographic conventions has however been left to the individual initiative
of the contributors.
Throughout the two years which have elapsed between conception and
delivery, the contributors have at all times borne my periodic bombardments
with patience and good humour. I apologise to them for my countless
inefficiencies and thank them warmly for their enthusiasm and cooperation.
It is of course to the contributors that any merit of this collection must
ultimately revert.

REFERENCES

Abercrombie, D. 1967. Elements of General Phonetics. Edinburgh: Edinburgh University Press.
Atkinson, M. 1982. Explanations in the Study of Child Language Development. Cambridge:
Cambridge University Press.
Baker, C. and J. J. McCarthy. 1981. The Logical Problem of Language Acquisition. Cambridge,
Massachusetts: MIT Press.
Berwick, R. C. 1985. The Acquisition of Syntactic Knowledge. Cambridge, Massachusetts:
MIT Press.
Borer, H. 1984. Parametric Syntax. Dordrecht: Foris.
Borer, H. and K. Wexler. 1987. The Maturation of Syntax. In Roeper and Williams. 123-
172.
Borer, H. and K. Wexler. 1988. The Maturation of Grammatical Principles. Ms. University
of California, Irvine.
Chomsky, N. 1986a. Knowledge of Language: Its Nature, Origin and Use. New York: Praeger.
Chomsky, N. 1986b. Barriers. Cambridge, Massachusetts: MIT Press.
Chomsky, N. 1988a. Generative Grammar. Studies in English Linguistics and Literature. Kyoto
University of Foreign Studies.
Chomsky, N. 1988b. Language and Problems of Knowledge: the Managua Lectures. Cambridge,
Massachusetts: MIT Press.
Chomsky, N. 1988c. Some Notes on Economy of Derivation and Representation. Ms. MIT.
Hornstein, N. and D. Lightfoot. 1981. Explanation in Linguistics. London: Longman.
Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Joos, M. 1957. Readings in Linguistics. Chicago: University of Chicago Press.
Manzini, R. 1988. Constituent Structure and Locality. In A. Cardinaletti, G. Cinque and
G. Giusti (eds.), Constituent Structure. Papers from the 1987 GLOW Conference, Annali
di Ca' Foscari 27, IV.
Manzini, R. 1989. Locality. Ms. University College, London.
Manzini, R. and K. Wexler. 1987. Parameters, Binding Theory and Learnability. Linguistic
Inquiry 18. 413-444.
Miller, G.A. 1967. The Magical Number Seven, plus or minus two: Some Limits on our
Capacity to Process Information. In G. A. Miller (ed.) The Psychology of Communication.
New York: Basic Books Inc. 14-44.
Piatelli-Palmerini, M. 1989. Evolution, Selection and Cognition: from 'Learning' to Parameter
Setting in Biology and in the Study of Language. Cognition 31. 1-44.
Pica, P. 1987. On the Nature of the Reflexivization Cycle. In Proceedings of NELS 17, GLSA,
University of Massachusetts.
Pike, K. 1945. The Intonation of American English. Ann Arbor, Michigan: University of
Michigan Press.
Pinker, S. 1984. Language Learnability and Language Development. Cambridge, Massachusetts:
Harvard University Press.
Roeper, T. and E. Williams. 1987. Parameter Setting. Dordrecht: Reidel.
Safir, K. 1987. Comments on Wexler and Manzini. In Roeper and Williams. 77-89.
Sperber, D. and D. Wilson. 1986. Relevance: Communication and Cognition. Oxford: Blackwell.
Wexler, K. and P. Culicover. 1980. Formal Principles of Language Acquisition. Cambridge,
Massachusetts: MIT Press.
Wexler, K. and R. Manzini. 1987. Parameters and Learnability in Binding Theory. In Roeper
and Williams. 41-76.
The logical problem of language
acquisition: representational and
procedural issues
Martin Atkinson
University of Essex

I take it that the logical problem of language acquisition has come of
age, in the sense that an increasing number of researchers in linguistic
theory and the acquisition of language refer their speculations to this
problem and offer them as contributing to its solution. This is to be
contrasted with the situation a decade ago, when much of the linguistics
literature contained only token gestures towards the problem and that
devoted to empirical studies of child language largely ignored it.1 This
latter was particularly worrying, leading to a plethora of studies which
were rich in data of various kinds and designed according to the best
standards operative in the field, but which, lacking theoretical foundation,
failed to have any lasting significance.

The purpose of this paper is to offer an overview of the field. It will
not, however, constitute a review, since, for the most part, I shall presuppose
some familiarity with the primary literature which informs much of the
discussion. Rather, I shall seek to highlight some of the issues which seem
to me to be central and, perhaps more importantly, draw attention to
a series of questions where clarification appears to be called for. There
is, in my view, a new optimism abroad in language acquisition theory
at the moment, based on the belief that recent linguistic theorising is at
last providing the appropriate concepts for approaching a genuinely
explanatory account of the child's achievement in mastering his first
language. This is an optimism which I share and which I hope this paper
will convey. Inevitably, however, novel conceptualisations of problems
generate their own fundamental questions, and this is a sign of a burgeoning
research paradigm.

I have found it convenient in my own thinking to attempt to maintain
a distinction between what I refer to as representational and procedural
aspects of the problem. Just what is involved in this distinction will become
clear as the paper proceeds, but I should state at the outset that in utilising
this distinction, I do not wish to maintain that it will necessarily survive
as understanding of the issues deepens. For now, it should be treated
as an expository convenience. Accordingly, after an introduction in which
a number of background assumptions and issues are presented, the paper
consists of two major sections. The first is largely concerned with the
nature of parameters and parametric variation and the questions raised
are in the context of the idealisation to instantaneous acquisition; the second
focuses on mechanisms of development which might be supposed to operate
in real time.

1. BACKGROUND

Recognition of the existence of a problem of the type with which this
volume is concerned arises in the context of adopting an explicit framework
for thinking about language acquisition. Such a framework typically
contains at least three sets of assumptions, as in (1):2

(1) a. those concerning the space of hypotheses available to the
child;
b. those concerning the data available to the child;
c. those concerning the procedure(s) the child utilises in selecting
hypotheses on the basis of exposure to data.

A logical problem exists, modulo such a framework, when it can be argued
(informally in most linguistic theorising, but formally in formal learning
theory, e.g. Wexler and Culicover 1980; Osherson, Stob and Weinstein
1986) that there is no guarantee that the correct hypothesis will be selected
by the assumed procedure(s) on exposure to appropriate data; if we present
such a theory of language learning as a serious candidate for explaining
some aspect of the child's acquisition and it leads to this conclusion, we
can immediately infer that some aspect of it is wrong, since children do
in fact acquire their native language at least partially on the basis of exposure
to data.
Various responses can be contemplated to the conclusion that a logical
problem exists, but before looking briefly at these, it will be useful to
see how the framework of (1) can be applied in an abstract context (Wexler
and Culicover 1980, 43-46). Consider the infinite set of languages in (2),
each consisting of an infinite set of sentences using the single 'word' a:

(2) L₁ = {a, aa, aaa, ...}
    L₂ = {aa, aaa, ...}
    L₃ = {aaa, ...}
    etc.

This set of languages is to constitute the hypothesis space of (1a), i.e.
the learner's task is to be the identification of the language to which he
is being exposed from this antecedently given set. Exposure consists of
the presentation of sentences from the target language so that, for any
sentence in the language, there will be some finite time at which that sentence
will have been presented. Crucially, the learner is presented with no
information about non-sentences.3 For example, if the learner is being
exposed to L₃, at no point is he presented with the information in (3):

(3) *aa

With our assumptions about the data available to the learner explicit,
it is easy to see how to formulate a procedure which will guarantee successful
identification after a finite time. This procedure simply instructs the learner
to set i, in his current hypothesis Lᵢ, as the length of the shortest sentence
to which he has been exposed so far. Since, by our assumptions about
the data, this shortest string will be presented after some finite time, at
that time the procedure will select the correct language and no subsequent
datum will modify this selection. Adding this procedure, then, to the
assumptions about the space of hypotheses and those concerning available
data will yield a learning theory for the languages of (2) in which no
logical problem arises.
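To make the procedure concrete, a minimal sketch in Python may help (an illustration only, with sentences represented as strings of a's and the learner's conjecture given as the index i of its current hypothesis Lᵢ):

    def learner_for_family_2(text):
        """Identify a language from the family in (2) on positive data alone:
        conjecture L_i, where i is the length of the shortest sentence
        presented so far."""
        guess = None                      # no conjecture before any data
        for sentence in text:             # 'text' is a stream of sentences
            if guess is None or len(sentence) < guess:
                guess = len(sentence)     # current hypothesis is L_guess
            yield guess

    # A presentation of L_3 = {aaa, aaaa, aaaaa, ...}:
    print(list(learner_for_family_2(["aaaaa", "aaaa", "aaa", "aaaa"])))
    # -> [5, 4, 3, 3]: once 'aaa', the shortest sentence of L_3, has appeared,
    # the conjecture is 3 and no later datum can change it.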
But now consider a superficially similar problem which leads to a radically
different outcome. Suppose that the hypothesis space is defined by the
languages in (4), only one of which contains an infinite number of sentences,
and that our assumptions about data remain unaltered:

(4) L₁ = {a}
    L₂ = {a, aa}
    L₃ = {a, aa, aaa}
    ...
    L₀ = {a, aa, aaa, ...}

Now, it is easy to see that no procedure can be formulated which will
guarantee choice of the correct language in a finite time. For suppose
that we attempt to produce a procedure which is conservative and sensitive
to the length of input sentences: say, set i as the length of the longest
sentence in the data so far. This will be fine, so long as the target language
is one of the finite languages, but if the target happens to be L₀, this
language will never be guessed, so the required guarantee of correctness
is not obtained. Alternatively, the non-conservative strategy of selecting
L₀ in all circumstances will be successful precisely where the conservative
strategy fails; but it will also fail where the latter succeeds. Other strategies
that could be envisaged may be successful on occasions, but it should
be clear on the basis of the above that no such strategy could guarantee
success across the full set of target languages. Accordingly, a logical problem
exists for the languages in (4), no matter what procedure we invoke, and
it follows that correct identification of languages from this set is either
impossible or is going to involve changing assumptions about data.
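The failure of the conservative strategy can be made vivid with a small simulation, sketched here under the assumption that the sentences of L₀ are presented in order of increasing length: the conjecture is revised indefinitely and never settles on L₀.

    def conservative_learner(text):
        """Conjecture L_i, where i is the length of the longest sentence seen
        so far (the conservative strategy for the family in (4))."""
        guess = None
        for sentence in text:
            if guess is None or len(sentence) > guess:
                guess = len(sentence)
            yield guess

    # A presentation of the infinite language L_0 = {a, aa, aaa, ...}:
    presentation_of_L0 = ("a" * n for n in range(1, 10))
    print(list(conservative_learner(presentation_of_L0)))
    # -> [1, 2, 3, 4, 5, 6, 7, 8, 9]: each longer sentence forces a new,
    # larger conjecture, so L_0 itself is never selected at any finite time.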
It is instructive to see how this problem can be solved if the assumption
that the learner gets no information about non-sentences is changed. So
now assume that the learner's data consist of sentences and non-sentences
labelled as such and that for any datum (sentence or non-sentence), there
will be a finite time at which it will have been presented.4 With this modified
assumption, there is no difficulty in specifying a procedure that will
guarantee successful identification: set i initially at 0 and successively modify
it to the length of the shortest non-sentence presented minus 1.
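A sketch of this modified procedure, assuming that each datum now arrives as a pair consisting of a string and a label (True for a sentence, False for a non-sentence), and with i = 0 taken, as in (4), to index the infinite language L₀:

    def learner_with_negative_evidence(labelled_text):
        """Identify a language from the family in (4) given sentences and
        non-sentences labelled as such: start at i = 0 (i.e. L_0) and move i
        to the length of the shortest non-sentence presented, minus 1."""
        guess = 0
        shortest_non_sentence = None
        for string, is_sentence in labelled_text:
            if not is_sentence:
                if (shortest_non_sentence is None
                        or len(string) < shortest_non_sentence):
                    shortest_non_sentence = len(string)
                    guess = shortest_non_sentence - 1
            yield guess

    # Target L_2 = {a, aa}: once a non-sentence of length 3 has been presented,
    # the conjecture becomes 2 and no further datum can dislodge it. (If the
    # target were L_0, no non-sentence made of a's would ever appear and the
    # initial conjecture would simply stand.)
    print(list(learner_with_negative_evidence(
        [("a", True), ("aaaa", False), ("aa", True), ("aaa", False)])))
    # -> [0, 3, 3, 2]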
The above examples are some way removed from the arena of natural
language acquisition, but, in drawing attention to the fundamental changes
in the character of a problem which follow from the assumption that
information about the status of non-sentences is available to the learner,
they make contact with arguments which are constructed in more directly
relevant contexts.
The perspective I shall be adopting in what follows is that the child
develops a core grammar for a language on the basis of exposure to data
of a strictly limited kind, and by far the most important limitation
consistently advocated is that the data do not contain non-sentences labelled
as such.5 If this is so, we have only to consider the wide range of impressively
subtle judgements which appear to be readily available to native-speakers,
and which presumably arise because of what native-speakers know about
their language, to see how easy it is to construct informal tokens of the
logical problem in the natural language learning domain. To take well-
known examples, native speakers of English have little difficulty in agreeing
that (5) is considerably better than (6), although perhaps not perfect:

(5) ?Who did John wonder whether Mary kissed?

(6) *Who did John wonder whether kissed Mary?

Similarly, multiple wh-questions exhibit a subject-object asymmetry, with
the sentence with the subject in situ being considerably degraded in well-
formedness compared to that where the object is not moved6:

(7) Who did what?

(8) ?What did who do?

Or, to take a less widely discussed example, Baker (1988b) cites data from
Chichewa, showing that this language has applicative constructions cor-
responding to both instrumentals and benefactives:

(9)  Mavuto a-na-umb-ir-a mpeni mtsuko
     Mavuto SUBJ.PREF-PAST-mould-APPLIC-ASP knife waterpot
     'Mavuto moulded the waterpot with a knife'

(10) Mavuto a-na-umb-ir-a mfumu mtsuko
     Mavuto SUBJ.PREF-PAST-mould-APPLIC-ASP chief waterpot
     'Mavuto moulded the waterpot for the chief'

In (9) and (10) the instrumental NP mpeni and the benefactive NP mfumu
are 'promoted' to direct object position immediately following the verb,
creating a structure in which the verb appears to be followed by two objects.
However, these two objects behave rather differently in a number of respects
between the instrumental and benefactive cases. To take one such difference,
for the instrumental both 'objects' can appear as pronominal object prefixes
in front of the verb, as in (11):

(11) a. Mavuto a-na-u-umb-ir-a mitsuko
        Mavuto SUBJ.PREF-PAST-OBJ.PREF-mould-APPLIC-ASP waterpots
        'Mavuto moulded the waterpots with it'
     b. Mavuto a-na-i-umb-ir-a mpeni
        Mavuto SUBJ.PREF-PAST-OBJ.PREF-mould-APPLIC-ASP knife
        'Mavuto moulded them with a knife'

For the benefactive applicative, however, only the benefactive NP can be
replaced by the pronominal object prefix7:

(12) a. Mavuto a-na-wa-umb-ir-a mtsuko
        Mavuto SUBJ.PREF-PAST-OBJ.PREF-mould-APPLIC-ASP waterpot
        'Mavuto moulded the waterpot for them'
     b. *Mavuto a-na-u-umb-ir-a ana
        Mavuto SUBJ.PREF-PAST-OBJ.PREF-mould-APPLIC-ASP children
        'Mavuto moulded it for the children'

The point now is that to the extent that these judgements are reliable
and diagnostic of a uniform, internally represented grammar, we are obliged
to seek an account of how they arise. That native-speakers of English
are not consistently (or even exceptionally) told that (5) is odd but nothing
like so bad as (6), or that (7) is fine but (8) is less good is surely
uncontroversial. Furthermore, resorting to analogy has no attractions here,
as the relevant English judgements concern degrees of ill-formedness and,
by assumption, the child is provided with no information of this nature
which could form the basis for an analogical inference. Also in the Chichewa
case, if analogy were to be employed, it would presumably yield the
conclusion that (12b) is well-formed alongside (11b), since simple appli-
catives, lacking object prefixes, do not appear to differentiate between
instrumentals and benefactives. In these circumstances, we appear to be
driven to the conclusion that the judgements that are made regarding these
sentences must arise from an interaction of the data to which children
are exposed and knowledge which is brought to the acquisition task, this
knowledge amounting, within the framework of (1), to a substantive
constraint on the hypotheses considered by the child. Characterisation of
this knowledge is precisely the concern of linguists attempting to formulate
accounts of Universal Grammar and, as we shall see, is specifically aimed
at dealing with representational aspects of the logical problem.

Before being swept along with the current of opinion which claims that
information about non-sentences, or negative evidence as it is often called,
is not available to the learner, it is prudent to note that the above
observations claim only that acquirers of English (or Chichewa) do not
receive systematic explicit exposure to non-sentences together with infor-
mation about their status. This does not rule out the possibility of a causally
efficacious role for implicit negative evidence, as is noted in Chomsky
(1981). One way in which this suggestion could be given some substance
would be to equip the child with some mechanism which is sensitive to
non-occurring tokens which might be predicted as occurring on the basis
of an existing system, such non-occurrences, after exposure to a specified
amount of data, leading to modifications in the system. That it is possible
to formulate learning principles which have this sensitivity is demonstrated
by Oehrle (1985), and the extent to which profound changes in learnability
domains follow from assuming the existence of negative evidence (in any
form) is examined in detail by Osherson, Stob and Weinstein (1986). While
it is important to be aware of the magnitude of the effects of the no
negative data assumption, the fact remains that the vast majority of work
in this area adopts the assumption (but see Randall 1985; Lasnik 1985;
Saleemi 1988 and this volume).
Returning now to possible responses to the observation that a logical
problem exists, it is clear that all three components in the framework
of (1) will embody substantive claims and will be modifiable in principle.
The discussion of negative evidence above has exhibited one way in which
the information in the data available to the child might be enriched, albeit
in an indirect way, and such enrichment could well lead to the evaporation
of the perceived problem while leaving whatever assumptions we are
operating with under (1a) and (1c) intact. An alternative strategy, also
aimed at (1b), is to assume that the available data are structured in a
particularly appropriate way, constituting the ideal language learning
environment favoured by those who see an important causal role in the
features of the special register of Motherese (see papers in Snow and
Ferguson 1977, most notoriously Brown's introduction to the volume).
As things stand, there is little reason to be optimistic that this strategy
can be profitably pursued. Most directly, whatever features may be
characteristic of the Motherese register, they are surely irrelevant to the
type of judgement involved with (5) - (12) above, and this type of example
could be multiplied endlessly. In addition, the strategy has been effectively
challenged both empirically (Newport, Gleitman and Gleitman 1977;
Gleitman, Newport and Gleitman 1984) and conceptually (Wexler and
Culicover 1980; Wexler 1982).
Manipulations of assumptions under (1a) and (1c) are essentially what
the next two sections of this paper are concerned with, but a preliminary
word is in order here. Regarding (1a), the obvious move to contemplate
is that of restricting the hypothesis space, and, of course, this can be seen
as a fairly constant backdrop in the development of generative grammar.
For (1c), the issues are less straightforward. If we are subscribing to a
hypothesis selection and testing account of development, as (1) suggests
we are, the most powerful procedure, in the sense of that which will
guarantee success in identification if any can, will be one that allows the
child access to an a priori enumeration of hypotheses. Changes in the
current hypothesis will be occasioned by errors on a current datum and
the next hypothesis in the enumeration which is consistent with the current
datum and all previous data will be selected as an alternative. On the
assumption that the correct hypothesis is located in the enumeration, it
will eventually be selected and no further data will lead to its rejection.
But, as is argued extensively by Wexler and Culicover (1980), such a
procedure, requiring memory for all previous data, is empirically quite
implausible. However, any change in the procedure away from enumeration
will yield a 'weaker' procedure and any problems arising on the enumeration
assumption will remain unsolved. It follows, then, that certain changes
in the selection procedure will be quite ineffective in the context of a logical
problem and will in no way obviate the need for restrictions in the hypothesis
space under (1a).8
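The force of this point can be brought out with a sketch of such an enumeration procedure, here in Python and with hypotheses represented, purely for the sake of illustration, as an ordered list of membership predicates over strings: the learner must retain every datum it has ever seen, since any of them may need to be re-checked when it moves along the enumeration.

    def enumeration_learner(hypotheses, text):
        """Hypothesis selection by enumeration: on an error with the current
        datum, move to the next hypothesis in the enumeration consistent with
        that datum and with ALL previous data."""
        data_so_far = []                        # memory for every datum seen
        index = 0                               # start of the enumeration
        for sentence in text:
            data_so_far.append(sentence)
            if not hypotheses[index](sentence):                     # error
                while not all(hypotheses[index](d) for d in data_so_far):
                    index += 1                  # next consistent hypothesis
            yield index

    # The finite languages of (4), enumerated as L_1, L_2, L_3, ...:
    # hypothesis k accepts strings of a's of length at most k + 1.
    finite_family_4 = [lambda s, k=k: set(s) <= {"a"} and 1 <= len(s) <= k + 1
                       for k in range(50)]
    print(list(enumeration_learner(finite_family_4, ["a", "aaa", "aa"])))
    # -> [0, 2, 2]: the learner converges on index 2, i.e. L_3, but only by
    # keeping every earlier datum available; and since the infinite L_0 has no
    # place in this enumeration, the problem noted for (4) above remains.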
To close this introductory section, it is perhaps appropriate to consider
the three dominant paradigms in the linguistics of the last 50 years from
the point of view of the framework in (1). First, taking neo-Bloomfieldian
structuralist linguistics, the acquisition model that this approach gives rise
to might be schematised as in (13):

(13) a. any hypothesis compatible with the application of inductive
procedures to primary linguistic data;
b. 'objective' properties of utterances, most notably acoustic and
distributional properties;
c. hypothesis formulation and testing.

Since the neo-Bloomfieldians did not formulate a mentalistic acquisition
model, what we have in (13) must be approached with caution. Nevertheless,
it is very much in the spirit of the Chomskian re-construction of the position
advocated by his structuralist predecessors, taking the view that their way
of doing linguistics embodied an implicit theory of language acquisition.
Notable characteristics of it from the point of view of this paper include
the observation that the constraints on hypotheses are procedurally induced
and are not the product of any substantive linguistic principles, i.e. there
is no place for Universal Grammar in this schematisation. Furthermore,
hypotheses are not selected but are actually formulated via the application
of inductive procedures and whether this is even intelligible in a mentalistic
framework is highly debatable. However, explicitly restricting the data
available to the learner, albeit via a spurious notion of objectivity, is an
emphasis which has already cropped up and which will recur.

Inadequacies in the linguistic accounts produced by the structuralists led
to the classic theory of transformational grammar (Chomsky 1965) with
the characteristics qua a theory of language acquisition as in (14):

(14) a. any rule system compatible with Universal Grammar, this
being construed as a set of constraints on possible rule sy-
stems;
b. no clear statement, but with hindsight it appears that it was
necessary for the child to have access to the same data as the
linguist, including information about non-sentences, etc;
c. hypothesis selection, testing and evaluation via the operation
of an evaluation measure.

Compared to the structuralist account, there are major changes here. The
hypothesis space is now constrained by linguistic principles and the
formulation (or discovery) of hypotheses is replaced by selection from
an antecedently specified set of possibilities. However, the framework is
bedevilled by a number of problems, including the following: (i) Universal
Grammar, as a set of constraints on possible rule systems, makes available
a very rich set of descriptive options, many of which are not attested
and, indeed, are unlikely to be so; the descriptive poverty of structuralism
is replaced by profligacy; (ii) it is quite counterintuitive to assume that
the child has access to the same data as the linguist; yet without this
assumption, the problem raised under (i) takes on massive proportions,
as the child would then be required to pick his way through this forbiddingly
complex set of options on the basis of rudimentary data; (iii) the form
and operation of the evaluation measure, which was the mechanism enabling
the child to select the descriptively adequate grammar over one that was
merely observationally adequate, remained poorly understood and unde-
veloped.
It was against this background of descriptive largesse that the current
Principles and Parameters model emerged with the characteristics in (15):

(15) a. any core grammar which results from the interaction of a set
of universal principles and a set of parameters, the values of
which can vary;
b. subject to a criterion of 'epistemological priority';
c. triggering, parameter setting and maturation.

Of course, the switch from rule systems to principles does not in itself
guarantee the restrictiveness of descriptive options, but this must be seen
alongside an emphasis on the deductive structure of the theory, which
enables a particular principle to have effects, as far as the properties of
sentences are concerned, only at the end of a lengthy deduction, perhaps
involving complex interactions with other principles and parameters. The
intention, anyway, is that the number of principles, and perhaps also
parameters, will be fairly small and this is clearly a shift away from a
situation in which each construction type in each language merits its own
rule.
Under (15b), the reference to epistemological priority is a recognition
that the development of the system must take place in the context of data
which it is plausible to assume the child actually has access to. Thus,
alongside the familiar restriction on negative data, this approach prohibits
reliance on complex data in the fixing of parameter values (in this
connection, see Wexler and Culicover 1980 on degree-2 learnability, Morgan
1986 on degree-1 learnability if the child has access to constituent infor-
mation, the speculations of Lightfoot (1989) on degree-0 learnability, and
Elliott and Wexler 1988 on the emergence of a set of grammatical categories
from an epistemologically plausible perspective).
Finally, triggering, parameter-setting and maturation under (15c) are
intended to have a character which makes them quite distinct from learning,
even when the latter is construed mentalistically as in (14c), and the extent
to which this can be maintained will be examined in Section 3 below.
I now turn to a discussion of issues surrounding (15a).

2. REPRESENTATIONAL ISSUES

The Principles and Parameters framework can, in fact, be pursued in two
rather different ways, depending upon whether researchers are interested
in the detailed operation of whatever mechanisms are postulated in (15c)
or not. If not, questions are raised in the context of pure linguistic research,
and acquisitional issues are considered at a level of abstraction defined
by an idealisation to instantaneous acquisition. The view is that this
idealisation is innocent for the purposes of furthering linguistic unders-
tanding, coupled with the recognition that it does not address real-time
problems in the acquisition domain. 9 Schematically, this idealisation can
be represented as in (16):

(16)  S₀: P₁, ..., Pₙ; p₁(x), ..., pₘ(x)  →  Sₙ: P₁, ..., Pₙ; p₁(a₁), ..., pₘ(aₘ)

Here S₀ and Sₙ designate the initial and final states in the acquisition
process. P₁, P₂, ..., Pₙ is the set of universal principles, and p₁, p₂, ...,
pₘ is the set of parameters. The use of x in connection with the parameters
at S₀ is intended to indicate that at this stage their values are open, and
the aᵢ at Sₙ represent the values that are determined in the acquisition
process. Presumably, S₀ will also contain specifications of the ranges of
the different parameter values, but I am not concerned with such niceties
here.10 (16) contains nothing corresponding to the observed gradualness
of the acquisition process and no detailed information about how the
transition between S₀ and Sₙ is effected. Consideration of these questions
is set aside until Section 3.
Perhaps the most serious problem confronting this way of looking at
things is that of the nature of the principles and parameters, i.e. what
is needed is a general theory of what principles and parameters are
legitimate, and this section is largely concerned with examining a number
of perspectives on this problem. I shall have little to say about principles
here, although the issues I raise deserve consideration from this perspective
too (see Safir 1987).
As things stand, there is not a great deal of agreement among researchers
on the identity of more than a small number of parameters. There is pro-
drop, the unitary status of which is the subject of considerable debate
(see, for example, Safir 1985), bounding node for Subjacency, direction
of Case and θ-role assignment, governing category, again involving some
dispute, the set of proper governors for ECP, and perhaps a few others
which have been the subject of systematic discussion. Alongside these,
however, there is a large set of proposals in the literature which might
be viewed simply as parametric relabellings for aspects of linguistic
variation. To take one example, in Lasnik and Saito (1984), in the context
of a discussion of the position of wh-phrases in English and several other
languages, we meet the suggestion that whether complementiser positions
marked as [+wh] must contain a [+wh] element at S-structure is a parameter,
and they speculate on whether such a parameter is implicationally related
to whether languages have syntactic wh-movement, concluding that it is
and that the 'basic' parameter is one expressing the presence or absence
of such movement. But these observations do not proceed significantly
beyond the data that lead to them, and it is difficult to resist the suggestion
that we are being offered nothing more than a translation of an aspect
of linguistic variation into a fashionable mode.
Now, I do not wish to suggest that the Lasnik and Saito parameter
is illegitimate, but the view that the theory of parameters is itself in a
position similar to that of the theory of transformational rules in the late
1960s is not easy to put aside. Of course, there is an important difference
in that individual transformational rules were seen as having to be learned,
whereas parameters and their values are given as part of the solution to
the logical problem, but from a methodological perspective, there are
uncomfortable similarities; just as it was all too easy to formulate con-
struction-specific rules within a rule-based framework to take account of
constructional idiosyncracies, so it is straightforward to allude to the
existence of some parameter in coming to terms with some aspect of
linguistic variation. The risk of a new variant of descriptivism is very real. 11
How might we contemplate constraining the theory of possible para-
meters? It seems to me that there are a number of avenues worth exploring,
although none of them presents a clear way to proceed at the moment.
One is that such constraints will emerge out of procedural considerations
once we drop the instantaneous acquisition idealisation, and I shall come
back to this in Section 3.12 At least two types of possibility exist, however,
which are not primarily based on procedural considerations and it is
appropriate to discuss these here.
The first is difficult to formulate clearly and is broadly methodological.
Thus, one might maintain that a legitimate parameter will have a certain
amount of 'explanatory depth', a property we might expect to follow from
the deductive structure of the theory alluded to at the end of Section
1. So Baker (1988b), discussing why instrumental NPs do not incorporate
universally into verbs in languages which allow noun incorporation,
speculates that this may be due to parameterisation in Case theory. The
alternative of parameterising G-role assignments so that instruments are
assigned 0-roles directly by the verb in some languages, thereby allowing
incorporation on Baker's assumptions, but not in others is dismissed as
less attractive, presumably because Baker (1988a) has extensive observations
suggesting that Case-assigning properties need to be parameterised across
a variety of languages, this parameterisation having far-reaching conse-
quences, whereas θ-role assignment parameterisation would be an inno-
vation not linked to other aspects of the theory. That linguists do operate
with some such notion, then, is probably uncontroversial, but like all
methodological rules-of-thumb it is difficult to ascribe the sort of content
to it which would enable us to reliably categorise proposed parameters
as legitimate or not.
The second strategy is to impose some substantive constraint on
parameters and this admits two sub-cases: the constraints may directly
concern the form of parameters, or the location of parametric variation
within the theory. I shall briefly discuss each of these possibilities in turn.
Perhaps the most obvious formal property to consider is that of binarity.
Since the phonological speculations of Jakobson and his associates (Ja-
kobson, Fant and Halle 1952; Jakobson 1968), linguists have been attracted
by binarity, and, in a recent attempt to link the parameter-setting approach
in language learning to selective theories in biology, speaking of parameters,
Piattelli-Palmarini says (1989, 3): "... each can be 'set' on only one of
a small number of admissible values (for many linguistic parameters there
seem to be just two such possible values) ..." It might also be maintained
that binarity sits most comfortably with the switch-setting analogy offered
by Chomsky (1988a), an analogy which Piattelli-Palmarini uses extensively
(see further below), although nothing in principle rules out the possibility
of multiple switch-settings.
Unfortunately, attractive as binarity might be conceptually, in the current
state of enquiry we are forced to acknowledge the existence of multiple-
valued parameters even among those where fairly extensive justification
exists. Thus, for example, Wexler and Manzini (1987) and Manzini and
Wexler (1987) offer a 5-valued parameter for governing category, Saleemi
(this volume) considers a 4-valued parameter in connection with his
reanalysis of pro-drop phenomena in terms of the postponement of Case
assignment to LF, and Baker (1988a) suggests that verbs (or perhaps
languages) admit multiple possibilities for Case assignment, including the
option of assigning two structural cases and the option of one structural
and one inherent case, alongside the common situation of having only
a single structural case. Nor can we maintain the converse position that
a defining property of parameters is non-binarity, as there appear to be
some, most notably the directionality parameters, which by their very nature
are binary. Naturally, there is nothing unintelligible about sets of parameters
some of which are binary and others of which are not, particularly if
it transpired that the two sets clustered together with respect to other
properties, perhaps thereby constituting parametric natural 'kinds' (see
below), but this first attempt to impose a substantive constraint on
parameters would appear to require major re-evaluations of central parts
of the theory if it were to be adopted. 13
A related possibility is that of whether parameters come pre-set, resetting
being determined by positive evidence (see Hammond, this volume) or
whether they are simply unset, requiring positive evidence to be set in
one way or another. Pre-set values, if they exist, can then be referred
to as unmarked. This possibility, of course, raises procedural questions,
and it will arise again in Section 3, but for now, since we are assuming
that S0 contains a specification of the range of permissible parameter values
for each parameter, it is natural to wonder whether some a priori ordering
might not be imposed on this range. Again, unfortunately, what we have
on the ground is a mixed bag. Thus, taking the governing category parameter
in the work of Wexler and Manzini, the notion of a default, pre-set value
makes perfect sense and, indeed, is necessary from the set-theoretic
perspective they adopt (see p. 18 below, and Newson, this volume). As
is well-known, a default value has also been suggested by Hyams (1986)
for pro-drop, this being [+pro-drop] and motivated by some controversial
claims about early child speech (see, for example, Aldridge 1988). Already,
with these two cases, however, whatever the empirical status of the claims,
there is the uncomfortable observation that Wexler and Manzini's account
is grounded in learnability considerations (irrespective of whether mar-
kedness is viewed as integral to the initial state of Universal Grammar
or a consequence of the operation of a separate learning module) which
are not available for Hyams, the point being that ±pro-drop does not
yield set-theoretically nested languages because of the non-existence of
sentences with expletive subjects in many [+pro-drop] languages. 14 Clearly,
if we consider other examples such as the directionality parameters (head
direction and direction of Case and θ-role assignment) or whether wh-
movement is obligatory in syntax, we end up with overlapping or disjoint
languages for which no observations have been offered to suggest that
one value should be the default setting of the parameter (see Newson,
this volume).
Turning now to how we might constrain the location of parameterisation
within the theory, there is a variety of ways in which we might spell out
the sense in which the framework allows for parametric variation. Some
construals appear in (17):

(17) Universal Grammar consists of:
     a. a set of parameterised principles;
     b. (i) a set of universal principles;
        (ii) a set of parameterised principles.
     c. (i) a set of universal principles, stated in terms of a set of primitives;
        (ii) a parameterisation of the primitives.
     d. (i) a set of universal principles, stated in terms of a set of universal primitives;
        (ii) a set of universal principles, stated in terms of parameterised primitives;
        (iii) a parameterisation of the appropriate primitives.

To appreciate what is involved in these alternatives, we might consider the particular example of the Binding Theory. According to (17a), and
assuming that some parameterisation is justified here, we would say that
the Principles of the Binding Theory are themselves parameterised. Ac-
cording to (17c), however, following Wexler and Manzini (1987), the clauses
of the Binding Theory are formulated in terms of primitives, one of which
is governing category, and it is this notion which is parameterised. 15 On
this construal, then, the actual form of the Binding Theory is universal.
(17b) and (17d) merely represent mixed possibilities. Now, it seems
increasingly clear that when authors talk about parameterisation, they are,
like Manzini and Wexler, referring to primitives and not principles -
Subjacency would be another example, where we could talk about the
parameterisation of the Principle or of the notion of bounding node, one
of the primitives which enters into its formulation. This begins to look
like a fairly tidy constraint to impose on the location of parameterisation,
but, again, there are claimed instances of parameterisation to which it
is not clearly applicable. Thus, the directionality parameters do not enter
directly into the principles of X-bar theory, Case Theory or θ-theory and
we appear to have a situation where parameterisation can occur in the
primitives appearing in principles and elsewhere. It is, of course, notable
that the directionality parameters have clustered together with respect to
the properties of binarity and default values, being binary when other
parameters are not and not having default values when other parameters
do. This may be symptomatic of an interesting partition in the set of
possible parameters. 16
Another parametric constraint may also be construed as locational and
involves the claim that parametric variation is restricted to occur in the
lexicon. The suggestion seems to have been first made by Borer (1984)
and it has a clear intuitive appeal. Everyone agrees that the lexical items
of a language have to be learned along with their idiosyncratic properties.
It is also apparent that those features of languages which make them
different from each other have to be somehow acquired and cannot be
antecedently specified in the structure of S0. It is natural, therefore, to
identify the locus of variation with that aspect of grammar that has to
be learned, viz. the lexicon.
Support for this localisation of parametric variation has been supplied
by Wexler and Manzini (1987), who argue in detail that different values
of the governing category parameter cannot be associated with a language
once and for all, but have to be linked to specific anaphors and pronouns,
since it is possible to find two such items in a single language the syntactic
behaviour of which is regulated by different values of the parameter.
A more radical alternative is considered in a tentative way by Chomsky
(1988b), basing his discussion on Pollock (1987). Having stated the lexical
parameterisation view, he goes on to say (p. 44): "If substantive elements
(verbs, nouns, etc.) are drawn from an invariant universal vocabulary,
then only functional elements will be parameterised". His subsequent
discussion argues for just such a parameterisation of functional elements,
proposing that AGR is 'strong' in French but 'weak' in English, these
attributes being spelled out in terms of the ability or lack of it to transmit
θ-roles, and that [+finite] is 'strong' for both languages, whereas [-finite]
is 'weak'. These proposals enable Chomsky, again following Pollock, to
produce a comprehensive account of the behaviour of adverbials, quan-
tifiers, negation, etc. in simple clauses in English and French. 17
There are at least two reasons for being cautious about these proposals,
which clearly represent a significant attempt to localise parametric effects.
First, they do not bear at all on the nature of parameters, so the questions
with which I began this section stand unanswered, i.e. we are no nearer
an understanding of exactly what forms of lexical parameterisation are
legitimate and we have at best partially responded to the dangers of
descriptivism. Second, as Safir (1987) observes, restricting variation to the
lexicon runs the risk of losing generalisations, if it transpires that all, or
even most, lexical items of a particular category behave in a certain way.
As we shall see in the next section, Wexler and Manzini themselves confront
this sort of problem in connection with Binding Theory phenomena, but
their way of dealing with it is not entirely satisfactory.
Safir's specific worries again concern the directionality parameters and
could be met by extending the notion of lexical parameterisation to zero-
level categories. Thus, the claim that verbs in English uniformly assign
Case and θ-role to the right would fall under this extended notion of
lexical parameterisation. As far as the more radical version of lexical
parameterisation, restricting it to functional elements, is concerned, it is
perhaps premature to speculate on its plausibility. Suffice it to say that
pursuit of it would require the development and justification of an inventory
of functional categories and their properties (for an initial view on such
an inventory, see Abney 1987) and a re-analysis of the whole range of
linguistic variation in terms of these properties. An instance of how progress
might be achieved in this regard is Fassi-Fehri's (1988) discussion of Case
assignment in Arabic and English. Adopting Abney's (1987) DP analysis,
he argues that verbs in English and Arabic uniformly assign accusative
case to the right, a necessary consequence of restricting parameterisation
to functional elements. However, D and I, both functional elements, differ
in the two languages in that they assign genitive and nominative case to
the left in English and to the right in Arabic. This proposal enables Fassi-
Fehri to construct an interesting account of word-order differences in the
two languages.18
This section has surveyed some of the obvious and less obvious ways
in which a theory of parameters might be constrained within the instan-
taneous idealisation. I hope that the need for such constraints is self-evident,
but it is not clear which, if any, of the possibilities raised, is appropriate
to pursue. Some of these issues will arise in a different context as we
now shift away from the instantaneous idealisation and construe the system
as developing in real time.

3. PROCEDURAL ISSUES

Dropping the instantaneous idealisation of (16) gives us the schematisation of (18):

(18) S0 → S1 → ... → Sn
Here, again, S0 and Sn designate the initial and final states, but now we
recognise a succession of intermediate states. A large number of questions
arise in this context, but in this section I shall focus on aspects of just
two of these. What is the nature of the developmental process which
mediates between the various states in this sequence? And does S0 contain
a full inventory of the principles and parameters of Universal Grammar,
thereby implying that the same is true of the intermediate states, or do
some principles and parameters only become available as the child develops?
If this latter possibility is correct, an immediate further question arises:
what is responsible for the emergence of those principles and parameters
which are not available in the initial state?
Let us initially focus on the first of these questions, assuming for the
purposes of this discussion that, indeed, the full set of principles and
parameters is present from S0. An obvious way to view the offerings of
the instantaneous idealisation is in terms of it providing a restricted set
of hypotheses in line with (la), each hypothesis corresponding to a core
grammar; then the learner's task is seen as that of selecting and testing
hypotheses on the basis of exposure to data which are subject to the criterion
of'epistemological priority', and the job of the theorist, no longer operating
with the idealisation, is to provide a detailed account of exactly how
hypotheses are selected, what 'epistemological priority' amounts to, etc.
Presumably, there will be a relation of 'content' between selected
hypotheses and the data which occasion their selection. For example, we
would anticipate that the governing category parameter for a particular
anaphor will be set, or re-set, on the basis of exposure to data containing
that anaphor, represented as such by the child, in a relevant structural
configuration. From the perspective of Fodor (1981), such content-rela-
tedness is diagnostic of paradigmatic cases of learning, yet supporters of
the parameter-setting account often give the impression that they are
offering something quite distinct from a learning account, and Piattelli-
Palmarini (1989) suggests that the application of the label 'learning' to the
envisaged procedures is quite wrong and should be resisted. With the
rejection of learning comes the rejection of hypothesis selection and testing,
since this is the only coherent account of learning within a mentalistic
framework.
Whether something conceptually distinct from learning is going on here is a question of some importance. Exactly where does
the distinctiveness of development in this model reside?
First, and most obviously, the restrictedness of the hypothesis space
might be seen as contributing to this distinctiveness, but a moment's
reflection should persuade us that this is unlikely. In the typical concept
'learning' experiment, there is normally only a finite (and small) number
of obvious candidates for stimulus variation, and the subject's task is to
fix a value for each of these. As Fodor (1975, 1981) maintains, the only
remotely plausible story that has ever been told about what goes on in
such experiments has the subject selecting and testing hypotheses, the
hypotheses being related in 'content' to occasioning stimuli, and this is
a learning situation.
Furthermore, there are cases in the linguistics literature which make
it clear that something like this is seen to be going on. So, consider Huang's
(1982) discussion of English and Chinese word-order and recall that he
takes Greenberg to task for (i) failing to account for why word-order
properties cluster in the way they do, and (ii) failing to account for exceptions
to his statistical tendencies. Huang's alternative is to formulate a version
of the head-direction parameter and, indeed, this comes to terms with
(i) in a straightforward way. For (ii), however, Huang has to recognise
that the head-direction parameter is not set once and for all for all categories
and all bar levels, and he has to contemplate a learner refining hypotheses
in the light of additional experience with the language. More generally,
any account that admits of a parameter being wrongly set, thereby requiring
re-setting (and this applies to some of the best-known proposals in the
field, e.g. Hyams 1986, Wexler and Manzini 1987), has to have some
mechanism for achieving this re-setting, and, at this level of generality,
it is not clear that authors have anything other than hypothesis testing
in mind. Qualitatively, we have no difference between this sort of account
and standard views on learning, although quantitatively, particularly in
comparison to the account offered in classic transformational grammar
in (14) above, there may be major differences in terms of the size of the
hypothesis space (see Atkinson 1987, for more extended discussion).
If distinctiveness does not lie in a shift away from hypothesis testing
per se, perhaps it resides in properties of the mechanism by which hypotheses
are selected and tested. The Subset Principle, as developed by Wexler and
Manzini (1987), building on earlier suggestions of Berwick (1985), can
be viewed in this context.
The Subset Principle is designed to directly alleviate the difficulty arising
from the no negative data assumption by rendering the learner conservative
in a straightforward sense. If parameter values give rise to set-theoretically
nested languages, then the Subset Principle obliges the learner to select
the least inclusive language compatible with the data received so far and
the parameter value yielding this language is deemed to be less marked
than those giving rise to more inclusive languages. Modifications 'upwards'
will always be possible in the light of further positive data to justify them;
modifications 'downwards' will never occur, but, then, if things have gone
according to plan, they will never be needed. There are several points
to make about the Subset Principle.
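Before turning to those points, it may help to make the selection procedure concrete. The following minimal sketch, in Python, assumes that the values of the parameter at hand do generate nested languages, and models the languages themselves, quite artificially, as finite sets of sentences; the toy languages and data are invented for the example and carry no linguistic content.

# A minimal sketch of the Subset Principle for one parameter whose values
# generate set-theoretically nested languages.

def subset_select(languages, data):
    """Return the parameter value generating the least inclusive language
    compatible with the positive data seen so far, i.e. the least marked
    compatible value."""
    compatible = [(value, lang) for value, lang in languages.items()
                  if data <= lang]
    if not compatible:
        raise ValueError("no value of the parameter is compatible with the data")
    return min(compatible, key=lambda pair: len(pair[1]))[0]

# Three nested toy languages, one per parameter value.
LANG1 = {"a", "b"}
LANG2 = LANG1 | {"c"}
LANG3 = LANG2 | {"d"}
languages = {1: LANG1, 2: LANG2, 3: LANG3}

print(subset_select(languages, {"a"}))       # 1: the data force no move
print(subset_select(languages, {"a", "c"}))  # 2: 'c' justifies a move 'upwards'
# Moves 'downwards' are never made and, if all goes to plan, never needed.

Note that the markedness ordering is nowhere stipulated in the sketch; it falls out of the inclusion relations among the languages.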
First, Wexler and Manzini offer it as a principle of a learning module.
As such, it is not part of the initial state, S0. As a consequence, markedness
orderings do not have to be specified as part of S0 either (see p. 13 above),
since they arise naturally from the operation of a learning module which
is constrained by the Subset Principle. This is attractive, since the mar-
kedness orderings for values of the governing category parameter for
anaphors and pronouns are mirror-images; therefore, it would not be
possible to specify a single markedness ordering for this parameter as part
of the initial state.
Second, there is nothing in the postulation of the Subset Principle to
prevent the account including it from being a learning account; Wexler
and Manzini's location of the principle in a learning module is a clear
indication that this is not a view which they would find objectionable.
Again, consider the case of concept learning experiments. Here there is
ample evidence to suggest that subjects have available some sort of a priori
ordering of hypotheses (for example, conjunctive concepts defined on the
parameters of stimulus variation are more accessible to subjects than
disjunctive concepts in the same domain) and that this determines their
learning strategies. Of course, the situations are different in a number
of ways. The ordering of concepts is not determined by set-theoretic
inclusion and the learning process is therefore not deterministic, as it is
for values of the governing category parameter; in the concept learning
experiment the subject is standardly supplied with negative feedback; and
whereas the Subset Principle is intended to be instrumental in every language
learner's selection of hypotheses, there is no suggestion that every learner
in a concept learning experiment operates with exactly the same a priori
ordering of hypotheses. But there is a sense in which the notion of learning
transcends these differences, and it is with reference to this sense that
I submit that there are important similarities between the two situations.
Finally, the extensional character of the Subset Principle is worthy of
attention. Wexler and Manzini, being anxious to get markedness orderings
out of Universal Grammar for reasons outlined above, have to assume
that their learning mechanism is capable of computing extensional rela-
tionships between the languages determined by particular parameter
settings. Now, there is little value in speculating on the computational
resources of the learner's mind, but some (e.g. Safir 1987) have seen this
computational assumption as a somewhat implausible aspect of the account,
and it is debatable how it should be accommodated to the current orthodox
position emphasising the importance of I-language, as opposed to E-
language, in linguistic theorising (Chomsky 1986b). An alternative which
might be worth considering is to construe the Subset Principle intensionally
on definitions of primitives, and Saleemi (1988, this volume) does this
for pro-drop, suggesting that his framework is extendible to governing
category, but even if this proves feasible, it does not bear on the main
issue under consideration.
Overall, it seems that there is no compelling reason to view the Subset
Principle as requiring us to move away from a learning account. The learning
in question is 'special' in that it is governed by a domain-specific principle
for selecting hypotheses, but that selection and testing of hypotheses is
going on is surely incontestable.
The plausibility of the Subset Principle, a property of the learning module,
is seen as deriving from the Subset Condition, a constraint on possible
parameters, which might, therefore, be seen as responding to some of the
issues raised in Section 2 (cf. fn. 12) from a procedural perspective. The
Subset Condition is defined in Wexler and Manzini (1987, 60) as in (19):

(19) For every parameter p and every two values i, j of p, the languages
generated under the two values of the parameter are one a subset
of the other, that is, L(p(i)) ⊆ L(p(j)) or L(p(j)) ⊆ L(p(i))

I am puzzled by the status of this condition, and it is worth attempting to articulate this puzzlement. First, (19) has a universal quantifier in front
of it, ranging over the set of parameters. It is, however, readily apparent,
if we consider just the parameters which have been mentioned earlier in
this paper, that they do not all have values that yield nested sets of languages.
The directionality parameters constitute the most obvious instantiation
of this claim. Therefore, (19) is false of parameters as they are currently
understood. Are we then to read (19) as normative and as rendering
illegitimate any parameter which does not conform to it? This would
certainly constitute an interesting and strong constraint on the theory of
parameters, but since Wexler and Manzini acknowledge the existence of
such non-conforming parameters, this appears to be an unlikely inter-
pretation. Are there alternatives? Wexler and Manzini have this to say
(1987: 61): "What do we mean when we say that the Subset Condition
is necessary? We say that it is necessary in order for the Subset Principle
to be always applicable. In other words, if the values that the learning
function selects on the basis of data are determined by the Subset Principle
and by nothing else, then the values of a parameter must determine
languages which form a strict hierarchy of subsets". But, in my view,
this only succeeds in converting the Subset Condition from a false claim
to a vacuous claim. What this passage suggests is that the universal quantifier
at the front of (19) should be restricted to parameters to which the Subset
Principle is applicable. But, for the Subset Principle to be applicable,
parameter values must yield set-theoretically nested sets of languages.
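The point can be made mechanically. The sketch below, again with toy languages of my own devising, simply checks whether every pair of values for a parameter yields nested languages; a governing-category-style parameter passes the check, a directionality-style parameter does not, and it is only to parameters that pass that the restricted reading of (19) applies.

# A toy check of the Subset Condition in (19): every pair of values of a
# parameter must generate languages one of which includes the other.
from itertools import combinations

def satisfies_subset_condition(languages):
    """languages maps parameter values to (toy) sets of sentences."""
    return all(la <= lb or lb <= la
               for la, lb in combinations(languages.values(), 2))

# Nested languages, as with the governing category parameter: passes.
nested = {1: {"a"}, 2: {"a", "b"}, 3: {"a", "b", "c"}}

# Disjoint languages, as with a directionality parameter whose two values
# license opposite orders (the 'sentences' are just invented labels): fails.
directional = {"head-initial": {"V O", "P NP"},
               "head-final": {"O V", "NP P"}}

print(satisfies_subset_condition(nested))       # True
print(satisfies_subset_condition(directional))  # False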
There are possible extensional relations between the languages generated
by particular parameter values which might give rise to interesting con-
straints on the theory of parameters. For example, we could consider the
possibility that if for a particular parameter, one pair of values gives nested
languages, then any pair of values does, i.e. parameters come in two varieties,
those which uniformly produce nested languages and those which uniformly
do not; there are no mixed parameters. This is obviously true of the
governing category parameter, but other multi-valued parameters against
which it could be evaluated are not well understood, so I shall not pursue
the matter here. 19 I would, however, emphasise that I believe that the
Subset Condition should be laid to rest.
A matter which has been mentioned above, but of which, so far, nothing much has been made, is the deductive structure of the theory.
The idea, roughly, is that setting a parameter in one way or another can
have wide consequences throughout the system, and this immediately
suggests a way in which the real-time development of the system might
have distinctive characteristics when compared to paradigmatic learning
models; perhaps it is possible for some aspect of the system to be 'fixed'
by virtue of the 'fixing' of some other aspect. Again alluding to concept
learning, this is not a situation that obtains there, since the various
parameters of stimulus variation are assumed to be independent.
The immediate question to raise in this connection is: in the schema
'X leads deductively to Y', what are the legitimate substitutions for X
and Y? Standardly, it seems to me, X will be a parameter value and Y
will be some property of the grammatical representations of sentences.
The deduction may well employ a number of suppressed premises, relating
to other parameter values. But, of course, there is another possible substitute
for Y, namely the value of a distinct parameter. Thus, we are considering
statements of the form: P1(a) leads deductively to P2(a'), where P1 and
P2 are distinct parameters and a, a' are specified values of P1, P2 respectively.
This deductive relationship is presumably stipulated as a property of the
initial state, and to the extent that we can justify such statements, we
would appear to have something qualitatively different to learning going
on here, at least as far as the value of P2 is concerned. Indeed, it may
be appropriate to construe the setting of P2 in this scenario as 'triggering'. 20
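The sort of stipulation at issue can be pictured very simply. The sketch below is merely illustrative; the parameter names and values are placeholders standing in for whatever substantive cases might be justified, and the point is only that the value of P2 is fixed by the fixing of P1 rather than by data bearing on P2 itself.

# An illustrative rendering of stipulated implications between parameter
# values: fixing P1 at a value fixes ('triggers') a value for P2 without
# any data bearing directly on P2. Names and values are placeholders.

IMPLICATIONS = {("P1", "a"): ("P2", "a'")}  # P1(a) leads deductively to P2(a')

settings = {}

def fix(parameter, value):
    settings[parameter] = value
    consequence = IMPLICATIONS.get((parameter, value))
    if consequence is not None:
        # Set deductively, as a property of the initial state, not learned.
        settings[consequence[0]] = consequence[1]

fix("P1", "a")
print(settings)  # prints {'P1': 'a', 'P2': "a'"}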
Again, some caution is necessary. The possibility being contemplated
here is likely to lead to a response from the linguist which points us in
rather different directions to those we are envisaging. P1(a) will be justified
on the basis of certain data D (which may of course be deductively some
way distant from P1) and P2(a') on the basis of data D'. Thus, we shall
be considering a cluster of phenomena D ∪ D', and the implicational
statement amounts to the claim that we never have D without D'. 21 In
these circumstances, the linguist will typically search for a single parameter
responsible (in interaction with other parameters and principles) for both
D and D'. I conclude, therefore, that the linguistic tradition does not readily
accept stipulations of deductive relationships between parameter values.
The existence of implicational relationships between parameter values
comes under pressure from a different consideration in the learnability
context. Wexler and Manzini (1987), considering the operation of the Subset
Principle in the case where many parameters are to be set, formulate an
Independence Principle, the content of which is that the set-theoretic
relationships between languages generated by the values of one parameter
should not be disturbed by the values of other parameters. If these
relationships were not robust in this fashion, there would be no way for
the Subset Principle to function in a consistent way. At first glance, it
would appear that the Independence Principle rules out exactly the sort
of implicational relationships we are considering here.
Newson (this volume) argues that this is not necessarily the case, pointing
out that implicational relationships between the values of distinct para-
meters are not guaranteed to change set inclusion relations. What they
will do is make certain languages illegitimate, and it will follow that
markedness hierarchies for parameter values will not be calculable by the
learning module, since some of the languages which constitute the input
to the computation will not be available. 22 However, we have already seen
above that reservations about the extensional computations of the Wexler
and Manzini account have been expressed (Safir 1987), and it therefore
seems worthwhile to put these aside and give serious consideration to
including implicational statements as part of Universal Grammar. Newson
(1988 and this volume) pursues this course for Wexler and Manzini's
governing category parameter, arguing that the value of this parameter
for a pronominal is initially fixed on the basis of the value for a
corresponding anaphor. This enables Newson to produce coherent accounts
of two phenomena which are recalcitrant in the Wexler and Manzini
framework.
Manzini and Wexler (1987) note that it is never the case that the governing
category for pronominals in a language properly contains the governing
category for anaphors. If this were the case, there would be domains between
the pronominal and anaphor governing category boundary in which binding
relations would be inexpressible. To rule out this possibility, they formulate
the Spanning Hypothesis, as in (20):

(20) Any given grammar contains at least an anaphor and a pronominal that have complementary or overlapping distribution.

Commenting on the status of (20), they say (p.440): "... it seems plausible
that [it] expresses a proposition that happens to be true of natural languages
as they have actually evolved, but has no psychological necessity, either
as part of the theory of learnability or as part of the theory of grammar."
There might be justified suspicion about the basis of this remark; intuitions
about those properties which are true of all languages via psychological
(ultimately, biological) necessity versus those which merely happen to be
true are not guaranteed to travel well, and it is a virtue of Newson's approach
that he can account for (20) without relying on this difficult distinction.
Briefly, if a pronoun is required to initially take the value of its corre-
sponding anaphor, subsequent positive evidence can either leave this
situation unchanged, in which case anaphoric and pronominal governing
categories coincide or it can make the pronominal value more marked,
in which case the governing category for the anaphor will contain that
of the pronoun. These two situations are exactly those stipulated by Manzini
and Wexler's (20).
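The derivation can also be set out mechanically. The sketch below is my own rendering of the proposal as summarised here: the five values of the governing category parameter are idealised as the integers 1 to 5, with larger numbers standing for larger governing categories, markedness for anaphors increasing with the value, and markedness for pronouns running in the mirror-image order.

# A sketch of Newson's implicational proposal, with governing category
# values idealised as 1..5 (larger numbers = larger governing categories).

ANAPHOR_MARKEDNESS = [1, 2, 3, 4, 5]  # 1 is least marked for anaphors
PRONOUN_MARKEDNESS = [5, 4, 3, 2, 1]  # 5 is least marked for pronouns

def initial_pronoun_value(anaphor_value):
    # The pronoun's value is initially fixed to that of its corresponding anaphor.
    return anaphor_value

def update_pronoun(current, evidence_value):
    # Positive evidence can only move the pronoun to values that are more
    # marked for pronouns, i.e. to smaller governing categories.
    if PRONOUN_MARKEDNESS.index(evidence_value) > PRONOUN_MARKEDNESS.index(current):
        return evidence_value
    return current

# However the pronoun is subsequently updated, its governing category either
# coincides with or is contained in that of the anaphor, as (20) requires.
for anaphor_value in ANAPHOR_MARKEDNESS:
    pronoun_value = initial_pronoun_value(anaphor_value)
    for evidence in ANAPHOR_MARKEDNESS:
        pronoun_value = update_pronoun(pronoun_value, evidence)
        assert pronoun_value <= anaphor_value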
The second problem noted by Manzini and Wexler is the fact that
pronouns taking unmarked values of the governing category parameter
are remarkably infrequent. Furthermore, the majority of pronouns they
consider appear to take the maximally marked value of the parameter,
and, while there is no necessary positive correlation between unmarkedness
and distribution in the world's languages on their account, the existence
of a seemingly massive negative correlation has to be seen as worrying.
Again, on Newson's proposal, these distributional phenomena follow
naturally. The majority of anaphors, as expected, take the minimally marked
value of the parameter. If the values for pronouns are initially fixed by
reference to the values for anaphors, the majority of pronouns will be
acquired with the maximally marked (for pronouns) value. Furthermore,
positive evidence will not enable this situation to change, so most pronouns
will be 'stuck' with this maximally marked value. The only way for a
pronoun to take the maximally unmarked value will be on the basis of
being linked to an anaphor which takes the maximally marked (for
anaphors) value. This will be an unusual situation, and, even if it should
arise, positive data will still provide the pronoun with the opportunity
of taking a more marked (for pronouns) value. Of course, Newson's account
raises many questions and makes predictions about the distribution of
anaphor-pronoun pairs in languages and about the course of development
of such pairs (for relevant discussion of the latter, see Wexler and Chien
1985; Solan 1987). For the moment, it is sufficient to have established
that the possibility of Universal Grammar containing implicational sta-
tements linking parameter values should not be summarily dismissed.
To close this paper, I would like to briefly consider the second question
with which this section opened, that of whether all the principles and
parameters of Universal Grammar are available to the child from the onset
of acquisition (see Hoekstra, this volume, for specific discussion of this
issue). This claim, made, for example, by Hyams (1986), embodies what
Borer and Wexler (1987) and Wexler (1988), following Pinker (1984), refer
to as the Continuity Hypothesis.
The best-known arguments against the Continuity Hypothesis are set
out in Borer and Wexler (1987). Most obviously, they draw attention to
what they refer to as the Triggering Problem in connection with Hyams'
(1986) account of the re-setting of the pro-drop parameter for children
acquiring a [-pro-drop] language such as English. For Hyams, this re-
setting is 'triggered' by the presence of expletive subjects, but the question
that immediately arises is that of why this triggering does not occur earlier,
since the child is exposed to sentences containing expletive subjects from
an early age. Borer and Wexler do not offer an alternative theory of pro-
drop in their paper, but, to illustrate an area where they feel a non-continuity
account is insightful, they propose that the child's early 'passives' in English
and Hebrew are all adjectival and therefore do not involve movement.
Movement involves the representation of A-chains, and, they claim, this
aspect of Universal Grammar is not available to the child at the stage
at which the earliest 'passives' are produced. These 'passives', it is assumed,
are all lexical. This suggestion receives further support from a consideration
of causatives in English and Hebrew and also plays a role in accounting
for a range of control phenomena in Wexler (1988). It seems to me plausible
to consider similar proposals in connection with Radford's (1988) claim
that "small children speak small clauses", his explication of this being
in terms of children lacking an I-system at the relevant stage, and his
extension of this claim to include the C-system and D-system in Radford
(1990).
It is not my purpose here to submit such proposals to critical scrutiny
(for some remarks on the Borer and Wexler proposals, see Hoekstra, this
volume; also Weinberg 1987). Rather, I shall take the correctness of some
kind of non-continuity hypothesis for granted and briefly consider the
question of the developmental mechanisms it requires.
We might be tempted to think that a non-continuity hypothesis is
consistent with a learning emphasis and that the representation of A-chains,
I-constituents, etc. is somehow induced by the child on the basis of exposure
to the linguistic environment. But well-known arguments of Fodor (1975,
1980, 1981) militate against this approach. If learning is to be viewed in
terms of hypothesis testing, the hypotheses must be available to be tested,
and Fodor's conclusion that a 'more expressive' system cannot develop
out of a 'less expressive' one by this mechanism follows. An alternative,
advocated by Borer and Wexler, is that the relevant representational
capacities mature, coming on-line according to some genetically determined
schedule. This perspective raises a number of interesting issues which are
likely to be the subject of considerable debate in the near future.
First, and most obviously, there is clearly nothing unintelligible about
a maturationalist account of linguistic capacities, once one subscribes to
the view that significant aspects of linguistic development are genetically
determined, a position which is now surely standard throughout the field.
Other genetically determined capacities unfold as the child matures, so
the onus is on the detractors of maturational accounts to say why linguistic
capacities should be different in this respect.
Second, the extent to which a maturational account can provide insightful
analyses for a wide variety of data must be pursued. The case for A-
chains has a good deal of support (see, however, Hoekstra, this volume),
as, I believe, does that for the maturational emergence of the functional
categories I, C and D, but note that none of this impinges on our earlier
preoccupation with parameters. Indeed, it is not immediately clear whether
maturational considerations are going to be relevant to parametric va-
riation. To illustrate, if, say, it is a parametric property of I that it assigns
Case to the left or the right, then the development of I, perhaps according
to a maturational schedule, will be equivalent to the development of the
parameter. It will not, of course, be equivalent to the development of
the correct value of the parameter, but, as has been argued above, it seems
that some notion of learning will be the relevant developmental mechanism
in this respect.
Third, there is a clear sense in which the child language theorist's task
has to be reconstrued if maturational accounts prove to be generally
insightful for developmental phenomena. A question that has rightly
concerned many workers in the field over the years is that of why a specific
capacity emerges before another (see Atkinson 1982, for an extended defence
of the view that up to 1980, theorists interested in a wide range of
acquisitional phenomena had not been very successful in approaching this
question). But in a maturational theory, this question probably becomes
improper at the level at which linguists and psychologists speculate. For
example, Borer and Wexler note that at the time at which they wish to
maintain that the child does not control A-chains, he nevertheless controls
A'-chains, as witnessed by the productive use of wh-questions. A natural
question to ask is: why are A-chains later to develop than A'-chains? And
the answer might be: they just are. Of course, the biologist might at some
distant point in the future be able to tell us why this is, but at the level
of linguistic and psychological theorising, explanation stops here.23
The situation briefly described here is rather reminiscent of that favoured
by Fodor (1981) in his discussion of the ontogenesis of concepts. There,
Fodor distinguishes between rational causal processes which involve
projecting and testing hypotheses formulated in terms of some privileged
set of concepts and brute causal triggering which requires an essentially
arbitrary relationship between the occasioning stimulus and the resulting
concept. Facing up to the observation that concepts appear to be acquired
in a more or less fixed order, and convinced that this is not explicable
in terms of later-acquired concepts being defined in terms of earlier-acquired
ones, Fodor extends the notion of brute-causal triggering to embrace the
possibility that certain concepts, while not defined in terms of others,
nevertheless have others as causal antecedents. To the extent that this view
of the development of concepts can be maintained, again the psychologist's
task should involve looking rather than attempting to analyse concepts
in terms of others. Such looking will reveal the layered conceptual structure
of the mind, but this structure will ultimately only be rationally explicable
in biological terms.
The extent to which representational capacities germane to the devel-
opment of syntax can be defined in terms of more basic capacities is,
in my view, a question still on the agenda (cf. fn. 15). To the extent that
they can, we may, at least in principle, contemplate producing a 'rational'
account of linguistic development. To the extent that they cannot, there
would appear to be no alternative to looking.
The conclusion suggested by the considerations in this section is that
if the acquisition of syntax is to be seen as having characteristics which
take it clearly outside the domain of learning, these will result from the
correctness of the non-continuity view. For the development of various
formal operations and the principles formulated in terms of them, this
is a perspective well worth pursuing. For linguistic variation, encoded in
distinct parameter settings, however, prospects of this kind do not look
inviting. While it makes sense for a parameter to become available as
a result of maturational scheduling, there is little to be said for its values
entering the system at different times. To date, I feel that there is no
compelling evidence to suggest that learning, perhaps in a very attenuated
sense, has no role to play in this aspect of development.

FOOTNOTES

1. Arguably, the problem has always had a central role in Chomsky's theorising, particularly
in his less technical works, e.g. Chomsky (1975). Isolated examples in the linguistics literature,
such as Peters (1972), Baker (1979) also exist.
2. An additional component of formalised versions of this framework is usually an
assumption about what should count as acquisition. The most obvious candidate here is
that there should be some finite time at which the learner selects the correct hypothesis
and then retains this hypothesis as further data are presented. For discussion of other
possibilities, see Wexler and Culicover (1980), Osherson, Stob and Weinstein (1986). I shall
suppress reference to this component in my discussion.
3. Readers familiar with the literature will recognise this as an informal characterisation
of Gold's (1967) text presentation.
4. This is an informal characterisation of the condition of informant presentation in Gold
(1967).
5. The core-periphery distinction is not one on which I shall focus here, although its utilisation
in linguistic argument may in itself constitute an interesting area for reflection. Chomsky
(1988b: 70), briefly referring to this distinction, says: "... [it] should be regarded as an expository
device reflecting a level of understanding that should be superseded as clarification of the
nature of linguistic inquiry advances".
6. What is interesting about (5) and (6) is that both of them involve a Subjacency violation
(although, see Chomsky (1986a, 50) for the suggestion that this may not be the correct
account of extraction from whether-clauses). In addition, (6) includes an ECP violation,
since the empty subject position in the embedded clause is not properly governed. (7) and
(8) are also distinguished in terms of the ECP, with a violation of this principle occurring
in (8). For discussion of the relevant theoretical concepts, see Lasnik and Saito (1984).
7. Baker accounts for these differences in terms of the instrumental NP receiving its θ-role directly from the verb, whereas the benefactive NP is part of a PP at D-structure and receives its θ-role from the preposition. Readers are referred to Baker's discussion for extensive
justification of this asymmetric behaviour of different sorts of PP.
8. It is a matter of some contention whether the hypothesis selection and testing framework
adopted here is appropriate for the speculations we shall be considering below. Due notice
will be given to this issue at the appropriate time. For the purposes of this introduction,
I believe that adopting this mode of talk is harmless and quite useful.
9. The clearest argument for the innocence of the idealisation for linguistic theory is the
fact that the end state, Sn, is remarkably uniform and does not seem to be at the mercy
of vagaries in the order in which data are presented. Given this, questions to do with the
order in which parameters are set, for example, will be irrelevant to the primary concern
of the linguist.
10. Whether information about markedness for parameter values should be included in
S0 is an issue to which I shall return in Section 3.
11. Safir (1987: 77-8) puts the worry thus: "... our assumptions about what counts as a
"possible parameter" or a "leamable parameter" remain very weak. ... what is to prevent
us from describing any sort of language difference in terms of some ad hoc parameter?
In short, how are we to prevent S t a n d a r d ] Parameter] T[heory] from licensing mere
description?"
12. I refer here specifically to the Subset Principle and the Subset Condition of Manzini
and Wexler (1987) and Wexler and Manzini (1987). Another 'procedural' constraint might
be that any legitimate parameter must be capable of being set on the basis of the evidence
available to the child. Thus, a parameter requiring negative evidence or highly complex
evidence in order to be set would be illegitimate on this basis. See Lightfoot (1989) for
how some parameters could be set by degree-0 data.
13. This is not to suggest that such re-evaluation would be impossible, and it might be
interesting to consider the possibility of replacing non-binary parameters by several distinct
binary parameters which together conspire to yield the effects of the original.
14. More recently, Hyams (1987) has proposed a reanalysis of pro-drop phenomena in
terms of morphological uniformity (see Jaeggli and Safir 1988). Briefly, the idea is that
languages which have morphologically uniform verbal paradigms allow pro-drop. Thus,
Italian, in which all verbal forms are inflected, and Chinese, in which none are, are pro-
drop languages, but English, which admits both inflected and non-inflected forms, is not.
With this reanalysis, Hyams is able to maintain that [+morphologically uniform] is the
unmarked parameter setting on learnability grounds, since positive data in the form of inflected
and uninflected forms will serve to re-set the parameter. If the initial setting were to be
[-morphologically uniform], no such re-setting could occur. This simple and attractive idea
involves a number of complications concerning licensing and identification, and I shall not
discuss it further here.
15. To talk of primitives in this connection is merely to acknowledge that principles refer,
in their formulation, to a variety of configurational and non-configurational notions. These
could be viewed as 'primitive' modulo the statement of the principle. Whether there is a
fundamental primitive basis for the whole account and, if there is, how it relates to the
issue of epistemological priority mentioned earlier is an interesting issue which will not be
pursued here.
16. The directionality parameters, presumably formulated in terms of the predicates 'right
of' and 'left of' would appear to be readily relatable to a primitive epistemological basis.
Again, this raises issues beyond the scope of this paper.
17. 'Strong' and 'weak' are, of course, mere mnemonic labels for the distinction. It seems
to me reasonable to ask why these properties involve θ-role transmission in the way claimed,
i.e. it is not clear that reference to θ-roles is more than another labelling of the distinction.
Speculations such as those being considered here take on additional perspectives in the light
of Radford's (1988,1990) claims that children acquiring English pass through a pre-functional
stage during which they offer evidence of their control of lexical categories and their projections,
but give no indication of having mastered the functional systems based on I, C and D.
A number of very interesting questions emerge from a juxtaposition of Chomsky's speculations
and Radford's empirical claims, particularly if the latter are generalisable to the acquisition
of languages other than English. For example, it would appear to follow that any systematic
pre-functional variations in the speech of children, say in word order, must be referred
to factors which are not properly viewed as belonging to the language module. It would
not be appropriate to pursue the ramifications of this suggestion in this paper.
18. Fassi-Fehri's account also requires the assumption that subjects appear at D-structure
in some projection of V (or N) which occurs as complement to I (or D). Sportiche (1988),
who also adopts this proposal, suggests that languages are parameterised as to whether
this subject obligatorily moves to (Spec, I) at S-structure. This 'unconstrained' parameterisation
becomes principled on Fassi-Fehri's account; in languages like English, the subject has to
move in this way to get nominative case, assigned by I to its left. The movement of the
subject follows from a localised parameterisation and does not, in itself, constitute the
parameterisation.
19. Alternatively, the applicability of the Subset Principle might be relatable to binarity,
being restricted to multi-valued parameters, there then being two distinct types of development
countenanced by the model. One would fall under the Subset Principle and involve significant
learning; the other would conform more closely to the switch-setting analogy. It is premature
to speculate further in this respect.
20. In the literature, 'triggering' often appears to be identified with 'having consequences
beyond those immediately contained in the data' but this is obviously true of paradigmatic
learning situations, and it seems that some reference to 'content' and 'arbitrariness' is necessary
to distinguish these two notions (see Fodor 1978 for illuminating discussion). The appro-
priateness of the label 'triggering' in this scenario will depend on whether the data leading
to the fixing of P1 bear an opaque relationship to P2. To take an implausible, but relevant
case, it is conceivable that the fixing of the governing category parameter is implicationally
dependent on the fixing of the head direction parameter, and if this were so, we would
surely be justified in asserting that a phrase with a particular head-complement order triggers
a value of the governing category parameter. This is not learning.
21. We could, of course, have D' without D if the implication is not bilateral, but I will
set this possibility aside here as it does not bear centrally on the discussion.
22. It is not clear that even this is necessary. Given a strict separation between Universal
Grammar and the learning module, it is conceivable that the latter could have access to
'impossible' languages to facilitate its computations, e.g. those obtained by removing just
the implicationally induced constraints from Universal Grammar.
23. It is noteworthy that, if this view is largely correct, then the traditional concerns of
developmentalists in accounting for how stages develop out of their predecessors evaporate;
there is no such development.

REFERENCES

Abney, S. 1987. The English Noun Phrase in its Sentential Aspect. Doctoral dissertation,
MIT.
Aldridge, M. 1988. The Acquisition of INFL. Research Monographs in Linguistics, UCNW,
Bangor 1. (Reprinted by IULC).
Atkinson, M. 1982. Explanations in the Study of Child Language Acquisition. Cambridge:
Cambridge University Press.
Atkinson, M. 1987. Mechanisms for language acquisition: learning, parameter-setting and
triggering. First Language 7. 3-30.
Baker, C. L. 1979. Syntactic theory and the projection problem. Linguistic Inquiry 10. 533-
81.
Baker, M. 1988a. Incorporation: A Theory of Grammatical Function Changing. Chicago:
University of Chicago Press.
Baker, M. 1988b. Theta theory and the syntax of applicatives in Chichewa. Natural Language
and Linguistic Theory 6. 353-89.
Berwick, R. C. 1985. The Acquisition of Syntactic Knowledge. Cambridge, Massachusetts:
MIT Press.
Borer, H. 1984. Parametric Syntax. Dordrecht: Foris.
Borer, H. and K. Wexler. 1987. The maturation of syntax. In T. Roeper and E. Williams
(eds.).
Chomsky, A. N. 1965. Aspects of the Theory of Syntax. Cambridge, Massachusetts: MIT
Press.
Chomsky, A. N. 1975. Reflections on Language. New York: Pantheon.
Chomsky, A. N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, A. N. 1986a. Barriers. Cambridge, Massachusetts: MIT Press.
Chomsky, A. N. 1986b. Knowledge of Language. New York: Praeger.
Chomsky, A. N. 1988a. Generative Grammar. Studies in English Linguistics and Literature.
Kyoto University of Foreign Studies.
Chomsky, A. N. 1988b. Some notes on economy of derivation and representation. MIT
Working Papers in Linguistics 10. 43-74.
Elliott, W. N. and K. Wexler. 1988. Principles and computations in the acquisition of
grammatical categories. Ms. UC-Irvine.
Fassi-Fehri, A. 1988. Generalised IP structure, Case and VS word order. MIT Working Papers
in Linguistics 10. 75-112.
Fodor, J. A. 1975. The Language of Thought. New York: Thomas Y. Crowell.
Fodor, J. A. 1978. Computation and reduction. In C. W. Savage (ed.) Perception and Cognition:
Issues in the Foundations of Psychology, Minnesota Studies in the Philosophy of Science
9. Minneapolis: University of Minnesota Press.
Fodor, J. A. 1980. Contributions to M. Piattelli-Palmarini (ed.) Language and Learning:
The Debate Between Jean Piaget and Noam Chomsky. London: Routledge & Kegan Paul.
Fodor, J. 1981. The present status of the innateness controversy. In J. A. Fodor Representations.
Hassocks: Harvester.
Gleitman, L. R., E. Newport, and H. Gleitman. 1984. The current status of the Motherese
hypothesis. Journal of Child Language 11. 43-79.
Gold, E. M. 1967. Language identification in the limit. Information and Control 10. 447-74.
Huang, J. C.-T. 1982. Logical Relations in Chinese and the Theory of Grammar. Doctoral
dissertation MIT.
Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Hyams, N. 1987. The setting of the null subject parameter: a reanalysis. Paper presented
to Boston University Conference on Child Language Development.
Jaeggli, O. and K. Safir. 1988. The null subject parameter and parametric theory. Version
of a paper to appear in O. Jaeggli and K. Safir (eds.) The Null Subject Parameter. Dordrecht:
Reidel.
Jakobson, R. 1968. Child Language, Aphasia and Phonological Universals. The Hague: Mouton.
Jakobson, R., G. Fant, and M. Halle. 1952. Preliminaries to Speech Analysis. Cambridge,
Massachusetts: MIT Press.
Lasnik, H. 1985. On certain substitutes for negative data. Ms. University of Connecticut.
Lasnik, H. and M. Saito. 1984. On the nature of proper government. Linguistic Inquiry
15. 235-89.
Lightfoot, D. 1989. The child's trigger experience: 'degree-0' learnability. Behavioral and
Brain Sciences 12. 321-34.
Manzini, R. and K. Wexler. 1987. Parameters, Binding Theory, and learnability. Linguistic
Inquiry 18. 413-44.
Morgan, J. L. 1986. From Simple Input to Complex Grammar. Cambridge, Massachusetts:
MIT Press.
Newport, E., L. R. Gleitman, and H. Gleitman. 1977. Mother, I'd rather do it myself: some
effects and non-effects of maternal speech style. In C. Snow and C. A. Ferguson (eds.)
Talking to Children. Cambridge: Cambridge University Press.
Newson, M. 1988. Dependencies in the lexical setting of parameters: a solution to the
undergeneralisation problem. Ms. University of Essex.
Oehrle, R. 1985. Implicit negative evidence. Ms. University of Arizona.
Osherson, D., M. Stob, and S. Weinstein. 1986. Systems that Learn. Cambridge, Massachusetts:
MIT Press.
Peters, S. 1972. The projection problem: how is a grammar to be selected? In S. Peters,
(ed.) Goals of Linguistic Theory. Englewood Cliffs, N. J.: Prentice Hall.
Piattelli-Palmarini, M. 1989. Evolution, selection and cognition: from 'learning' to parameter
setting in biology and in the study of language. Cognition 31. 1-44.
Pinker, S. 1984. Language Learnability and Language Development. Cambridge, Massachusetts:
Harvard University Press.
Pollock, J. Y. 1987. Verb movement, UG and the structure of IP. Ms. Université de Haute
Bretagne, Rennes II.
Radford, A. 1988. Small children's small clauses. Transactions of the Philological Society
86. 1-43.
Radford, A. 1990. Syntactic Theory and the Acquisition of Syntax. Oxford: Blackwell.
Randall, J. 1985. Positive evidence from negative. In P. Fletcher and M. Garman (eds.)
Child Language Seminar Papers. University of Reading.
Roeper, T. and E. Williams (eds.). 1987. Parameter Setting. Dordrecht: Reidel.
Safir, K. 1985. Syntactic Chains. Cambridge: Cambridge University Press.
Safir, K. 1987. Comments on Wexler and Manzini. In T. Roeper and E. Williams (eds.).
Saleemi, A. 1988. Learnability and parameter-fixation: the problem of learning in the ontogeny
of grammar. Doctoral Dissertation, University of Essex.
Solan, L. 1987. Parameter setting and the development of pronouns and reflexives. In T.
Roeper and E. Williams (eds.).
Sportiche, D. 1988. A theory of floating quantifiers and its corollaries for constituent structure.
Linguistic Inquiry 19. 425-50.
Weinberg, A. 1987. Comments on Borer and Wexler. In T. Roeper and E. Williams (eds.).
Wexler, K. 1982. A principle theory for language acquisition. In E. Wanner and L. R. Gleitman
(eds.) Language Acquisition: The State of the Art. Cambridge: Cambridge University Press.
Wexler, K. 1988. Aspects of the acquisition of control. Paper presented to Boston University
Conference on Language Development.
Wexler, K. and Y. C. Chien 1985. The development of lexical anaphors and pronouns.
Papers and Reports on Child Language Development 24. 138-49.
Wexler, K. and P. W. Culicover. 1980. Formal Principles of Language Acquisition. Cambridge,
Massachusetts: MIT Press.
Wexler, K. and R. Manzini. 1987. Parameters and learnability in Binding Theory. In T.
Roeper and E. Williams (eds.).
Observational data and the UG theory of
language acquisition
Vivian Cook
University of Essex

The theory of Universal Grammar (UG), as proposed by Chomsky (e.g.
Chomsky, 1986a, 1988), sees the acquisition of the first language as a
process of setting the values of parameters in the system of language
principles with which the human mind is endowed, in response to "trig-
gering" evidence, these principles and parameters being couched in terms
of the Government/Binding (GB) theory of syntax (Chomsky, 1981a, 1986a,
1986b). The aim of this paper is to examine how observations of actual
children's language can be related to this Chomskyan model of UG. It
does not therefore concern the use of such observational data within
approaches that do not share the UG assumptions.

1. EVIDENCE IN THE UG MODEL

The claims that UG theory makes for language acquisition are largely
based on the "poverty of the stimulus" argument; given that the adult
knows X, and given that X is not acquirable from the normal language
input the child hears, then X must have been already present in the child's
mind. This crucial argument uses the comparison of the knowledge of
language that the adult possesses with the initial state of the child to establish
what could not have been acquired from the types of evidence available
and must therefore be innate. Chomskyan UG theory would not be
discomfited if other evidence from acquisition were not forthcoming. On
the one hand, such research is not of prime importance, given the reliance
on the poverty of the stimulus argument. On the other, evidence from
language development in the child is related with difficulty to acquisition
because of the other factors involved in language performance and
development - production and comprehension processes, situation and
use, the growth in other mental faculties, and so on - all of which are
"non-stationary" (Morgan, 1986) and liable to change as the child grows
older.
The attraction of the current model is that the aspects of language built-
in to the mind are precisely the principles of GB theory - the Projection
Principle, the Binding Principles, and so on; the aspects that have to be
learnt are the settings for parameters of variation, and the properties of
lexical items. Hence built-in principles of syntax can now be postulated
in a rigorous form that has testable consequences; it is possible to start
looking for evidence of the effects of UG in children's language devel-
opment. And also the reverse; it is possible to start phrasing research
into acquisition in ways that can affect issues of linguistic theory, the
prime example being the work of Hyams (1986) and Radford (1986).

2. I-LANGUAGE AND E-LANGUAGE THEORIES

A starting point is the distinction made in Chomsky (1986a) between I-
language (internalised language) and E-language (externalised language).
I-Language is "a system represented in the mind/brain of an individual
speaker" (Chomsky, 1986a, p.36); the task of I-language theories is to
describe this mental possession; hence I-language syntax is closely related
to the mind and to psychology. E-language is a collection of sentences
"understood independently of the properties of the mind" (Chomsky, 1986a,
p.20); it is the speech people have actually produced, the ways they use
language to interact, and the description of the statistical properties of
language events; E-language syntax is related to social interaction and
to sociology. In I-language theories, acquisition research is seen as ex-
plaining how the child acquires knowledge of language; in E-language
theories, it is seen partly as describing the regularities in a corpus of the
child's sentences, partly as describing how the child develops social
interaction. UG theory is concerned with I-language knowledge rather than
with the E-language regularities in a set of utterances. GB does not
necessarily confine itself to a single form of data: "In principle evidence
... could come from many different sources ... perceptual experiments,
the study of acquisition and deficit, or of partially invented languages
such as Creoles, or of literary usage or language change, neurology,
biochemistry, and so on" (Chomsky, 1986a, pp. 36-37); its chief evidence
is whether a sentence conforms to the speaker's knowledge of language.
In practice GB theory has concentrated almost exclusively on what can
be called "single-sentence" evidence; a single sentence, such as Is the man
who is here tall? (Chomsky, 1980, p. 39) or ¿Está el hombre, que está contento,
en la casa? 'Is the man, who is happy, at home?' (Chomsky, 1988, p. 42),
is sufficient to attest to the native speaker's language knowledge, and hence
to invoke the poverty of the stimulus argument.
An I-language theory takes Is the man who is here tall? to be a grammatical
sentence on the linguist's own say-so without hunting for observations
of native speakers using it in actual speech. But single-sentence evidence
is difficult to use in acquisition research since native children cannot be
asked directly whether they accept Is the man who is here tall? as their
answer would not be meaningful. Children are by and large not capable
of attesting unambiguously that a particular sentence is or isn't generated
by their grammar. Other than single-sentence evidence or the pure poverty
of the stimulus argument, what else can count as evidence of language
acquisition in a UG framework? One possibility is to use experimental
techniques and statistical procedures from the psycholinguistic tradition.
The research of the past decade has employed a wealth of techniques ranging
from the elicited imitation tasks used by Lust and her associates (1989)
to the comprehension tasks employed by Matthei (1981) and others; indeed
the specific case of structure dependency seen in Is the man who is here
tall? was investigated by Crain and Nakayama (1987) through a question
production task.
The validity of such forms of evidence is not the concern here; an account
of some of their merits and demerits is seen in Bennett-Kastor (1988).
Instead the discussion will be restricted to one type of evidence that has
been used within UG theory, namely the use of sentences observed in
actual children's speech, which can be called "observational data". If a
child is heard to say Slug coming, what status does this sentence have
as evidence for UG theory? The main argument here is that there is an
inherent paradox in using observational data to support a UG model that
needs to be aired, even if it cannot be resolved. Observational data belongs
in essence to E-language; the typical E-language study of acquisition looks
at statistically prominent features found in a substantial collection of
children's speech, say Brown (1973) or Wells (1985). The major problem
is how to argue from E-language descriptive data of children's actual speech
to their I-language knowledge, a problem first perhaps highlighted in
Chomsky (1965) as "a general tendency... to assume that the determination
of competence can be derived from description of a corpus by some sort
of sufficiently developed data-processing technique". While it is interesting
and instructive to use observational data to investigate the UG claims,
the chain of qualifications and inferences between such data and language
knowledge is long and tortuous.

3. OBSERVATIONAL DATA, PERFORMANCE AND DEVELOPMENT

There are two related dimensions to this within the UG theory - performance
and development. Any use of performance data by linguists faces the
problem of distinguishing grammatical competence from the effects of
production and comprehension processes, short term memory, or other
non-competence areas of the mind involved in actual speech production.
Single-sentence evidence is immune to all of these factors. In this sense
children's speech presents exactly the same problems for the I-language
analyst as the speech of adults. GB oriented linguists base their syntactic
analyses on single example sentences rather than on chunks of performance;
they too have problems with deriving the knowledge of the native speaker
from samples of raw performance.
But children's language also ties in with their development on other
fronts; the actual sentences they produce reflect their developing channel
capacity, that is to say a mixture of cognitive, social, and physical
development, from which the effects of language acquisition need to be
filtered out. The distortions that performance processes cause in actual
speech are doubly difficult to compensate for in language acquisition
research because they may be systematically, or nonsystematically, different
from those of adults - short term memory may be smaller in capacity
or organised in a different way, cognitive schemas may be different, and
so on; insofar as these are involved in language performance they affect
children differently from adults. "Much of the investigation of early
language development is concerned with matters that may not properly
belong to the language faculty ... but to other faculties of the mind that
interact in an intimate fashion with the language faculty in language use"
(Chomsky, 1981b). Cook (1988) distinguishes "acquisition" - the logical
problem of how the mind goes from S0 (zero state) to Ss (steady state)
- from "development" - the history of the intervening stages, S1, S2, and
so on. To argue from observation of children's development to the theory
of acquisition means carefully balancing all these possibilities. Linguists
are frequently struck by the child's presumed difficulties in dealing with
primary linguistic data; their own difficulties in deriving a representation
of grammatical knowledge from samples of children's performance are
not dissimilar, or indeed worse since children's sentences are more deficient
than the fully grammatical sentences spoken by caretakers (Newport, 1976).
So the child saying Slug coming may be suffering from particular production
difficulties shared by adults or from specific deficits in areas that have
not yet developed, say the articulatory loop in working memory (Baddeley,
1986). The apparent syntax of the sentence may be different from the
child's competence for all sorts of reasons.
Observational data thus raise two problems related to performance; one
is the distortion resulting from the systematic or accidental features of
psychological processes; the other is the compounding effect of the
development of the child's other faculties. For observational data to be
used in a UG context, eventually these distortions need to be accommodated
within a developmental framework that includes adequate accounts of the
other faculties involved in the child's language performance, which, needless
to say, does not yet exist. Furthermore, observational data of children's
speech are still only evidence for production rather than comprehension,
the two processes being arguably distinct in young children (Cairns, 1984).

4. REPRESENTATIVENESS OF OBSERVATIONAL DATA

Let us now turn to some methodological issues with observational data.
Taking the step from single-sentence evidence to E-language performance
evidence incurs several obligations. The first is to take a reasonable sample
of data. Any linguistics book or article in the I-language tradition assumes
that it is meaningful to discuss people's knowledge of John is easy to please
or That he won the prize amazed even John or John caused the book to
fall to the floor; it is beside the point whether such sentences have ever
been uttered, provided that they reflect the knowledge of a native speaker.
An actual child's sentence, however, is not concocted by a linguist to
illustrate a particular syntactic point and so removed from processing
constraints, discourse connections, and so on; nor can it be checked against
the speaker's intuitions about his grammatical knowledge. Because of the
deficient nature of children's language, an example or two can probably
be found of almost any syntactic possibility one cares to name, say Verb
Subject order as in Comes a mummy reported in Cook (in progress) or
apparent structure-dependency violations as in What does sheep make a
noise? (Cook, 1988). Such isolated examples are not sufficient for E-language
analysis, nor are arbitrary lists or collections of sentences; data need to
be typical of the child and of children in general if they are to be of
value, because of the many other causes that may be involved. As E-
language data, a single sentence such as Comes a mummy might be a
performance slip or a garbled attempt at a nursery rhyme or a genuine
reflection of competence. A sample of sentences is required to rule out
the accidental or non-accidental but non-linguistic effects of performance.
Again, for E-language data a reasonable sample of children needs to be
included; it prejudges the universality issue if arguments are based on
observational data from a few children or from one child. So transcripts
of actual children's speech must include a reasonable sample of children
and a reasonable sample of speech for each stage of each child. They
need to be based on a consistent analysis and on the same sample of
children, to eliminate possible regional, social, and individual differences,
as discussed in Bennett-Kastor (1988); a good example of such research
is seen in Stromswold's study of some 16000 examples of sentences with
wh-words from 12 children (Stromswold, 1988). Information on the relevant
makeup of the children should be explicitly stated, once the initial step
into observational data has been taken. All of these requirements are beside
the point for single-sentence evidence and for the poverty of the stimulus
argument. They are however perfectly plausible demands on E-language
evidence. A requirement for combining the I-language approach to know-
ledge with the use of E-language data is the quantification of aspects of
children's sentences; frequency of occurrence needs to be counted, cal-
culations need to be made, and statistical reliability becomes an issue.
As Bloom et al. (1975, p. 34) put it, "if structural features occur often enough
and are shared by a large enough number of different multi-word utterances,
then it is possible to ascribe the recurrence of such regular features to
the productivity of an underlying rule system ...". The key words when
handling E-language data are "often enough" and "large enough". A single
sentence could be highly revealing of the child's grammar; but, because
of the possibility that any single spoken sentence is an isolated freak due
to memory problems, rote repetition, discourse constraints, or any of a
host of other factors, the UG analyst using E-language data has a duty
to put them in the context of a broader picture of children's speech. This
is not to say that frequency of occurrence is crucial; it is clearly irrelevant
to single-sentence I-language evidence. But E-language evidence needs to
be safeguarded by showing that the data reflect some general property
of the child's knowledge rather than being one-off instances.
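To give the "often enough" and "large enough" requirements one concrete, deliberately simple reading, the sketch below counts how often a feature occurs in a sample of a child's multi-word utterances and attaches a confidence interval to the rate. The counts are invented purely for illustration, and the choice of a Wilson score interval is an assumption of mine; none of the studies discussed here is committed to this particular calculation.

    import math

    def wilson_interval(successes, n, z=1.96):
        """Approximate 95% Wilson score interval for a binomial proportion."""
        p = successes / n
        denom = 1 + z ** 2 / n
        centre = (p + z ** 2 / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
        return centre - half, centre + half

    # Hypothetical counts, for illustration only: 30 null-subject utterances
    # observed in a sample of 250 multi-word utterances from one child.
    k, n = 30, 250
    low, high = wilson_interval(k, n)
    print(f"observed rate = {k / n:.2f}, 95% CI = ({low:.2f}, {high:.2f})")

A wide interval from a small sample is itself a warning that the data cannot yet bear much theoretical weight; a narrow one at least shows that the recurrence is not a one-off accident of the corpus.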

5. OBSERVATIONAL DATA AND ADULT PERFORMANCE

A major point also concerns what data from children should be compared
with - adult competence or adult performance? The significant paper by
Radford (1986), for instance directly compares two sets of sentences, one
consisting of actual child performance such as That one go round, the other
of bracketed adult versions such as Let [that one go round], as if they
were the same type of data (pp. 10-11). It is difficult to offer children's
E-language data as evidence for their knowledge of language without
comparing them with E-language data from adults. A comparison of
observational data from children with single-sentence evidence from adult
competence begs many questions. Once it is conceded that adult perfor-
mance needs to be used, a range of phenomena must be taken into account
that GB syntax has mostly excluded. Let us take the pro-drop parameter
as an illustration. The main criterion for a pro-drop language is the absence
of certain subjects in declarative sentences. In her important work with
pro-drop Hyams (1986) found that children from three different language
backgrounds have null-subject sentences; she regarded this as confirmation
of an initial pro-drop setting, later rephrased as [+uniform] morphology
(Hyams, 1987). However, adult E-language data for English reveal that
subjects are often omitted in actual speech and writing, usually at the
beginning of the sentence. Taking a random selection of sources, Can't
buy me love and Flew in from Miami Beach come from well-known song
lyrics; the opening pages of the novel The Onion Eaters (Donleavy, 1971)
contain Wasn't a second before you came in, Must be ninety now, and Hasn't
been known to speak to a soul since anyone can remember, a column writer
in The Weekend Guardian with the pseudonym "Dulcie Domum" typically
uses sentences such as Drive to health food shop for takeaway, In fact might
be too exciting, and Replies that it's in my desk drawer (Domum, 1989)
- indeed in this article some 34 out of 68 sentences have at least one
null-subject; an anecdote in Preston (1989) concerns a prescriptively oriented
teacher denying that she uses gonna: "Ridiculous," she said; "Never did;
never will". Adult speakers of English appear to use null-subject sentences,
even if they only utilise them in certain registers and situations. So, if
the performance dimension of variation between styles of language is taken
into account, null-subjects may be expected to appear in children's
performance and children may also be expected to have encountered them
in some forms of adult speech.
But also, given the many ways in which children are different from
adults, an argument based on observational data has to explore the
alternative developmental explanations that might cause something to be
lacking from their speech. One explanation might indeed be a more frequent
use by children of some performance process that the initial elements or
elements in the sentence can be omitted - a clipping of the start of the
sentence - which creates the illusion of pro-drop among other effects.
A counterargument is that the null-subject is not always initial and hence
not a product of utterance-initial clipping; however in a sample of children's
language discussed in Cook (in progress) only 3 out of 59 null-subject
examples had a non-initial null-subject. Another explanation might be the
"recency" effect whereby children pay attention chiefly to the ends of
sentences (Cook, 1973), thus being more likely in SVO languages to omit
subjects than objects and hence giving the illusion of null subject sentences.
Hyams (1986) presents the counterargument that, at the same time as
children produce subjectless sentences, they also produce ones with subjects,
so that the lack of overt subjects is not a memory limitation; while this
may well be true, the use of null-subject sentences by English-speaking
adults is equally not a product of memory limitations. A further explanation
might be found in the type of subject that is missing. Children may leave
out some first person subjects because they feel they are not needed, and
this might be a cognitive universal; Sinclair and Bronckart (1971) suggested
that at a certain period children see themselves as the implicit subject
of the sentence; Halliday (1985) sees first person subjects as the most
prototypical form. According to Hyams (1986, p.69), however, "the referent
of the null-subject is not restricted to the child himself". Yet in the same
sample of sentences some 39 out of 59 null subjects were apparently first
person; a high proportion, though not all, of children's null-subject
sentences could be attributed to this source.


To decide between these conflicting explanations would entail crosslin-
guistic evidence based on large samples of non-first person null-subject
sentences. The general point is that sheer observation of forms in children's
language is susceptible to several explanations other than language know-
ledge per se. Other evidence is required to decide between the explanation
based on adult E-language data, the explanations based on performance
and cognitive development, and the linguistic UG explanation. The cor-
rectness of the pro-drop explanation cannot be uniquely shown from E-
language observational data without in some way examining the other
explanations within a larger framework of children's development and of
language use by adults. If E-language performance by children is compared
directly with adult E-language data rather than with adult I-language
knowledge, an apparent peculiarity of the child's language may be shown
to be a fact of performance shared by adults but not taken into account
in the standard descriptions of grammatical competence within GB theory.
Adult users of a language tend to be regarded as grammatically perfect
without seeing that they are subject to the same factors of performance
as children: some adults, on some occasions, do produce null-subject
sentences in English. In one sense this claims simply that alternative
explanations can be found for apparently linguistic phenomena; since the
alternative proposals are not precisely formulated, what is wrong with
accepting the linguist's account? But in the early stages of child speech
this must be faute de mieux; until there is an overall theoretical framework
that encompasses the diverse aspects of children's development, it will
not be possible to extract language acquisition from language development
by means of observational data alone.

6. EVIDENCE OF ABSENCE

The other problem is what counts as observational data. Sherlock Holmes
once drew attention to the behaviour of the household dog during a
burglary; Watson pointed out that there wasn't any behaviour since the
dog had not even barked; Holmes showed that the very absence of barking
implicated one of the people in the household since otherwise the dog
would have barked at a stranger. Absence of something that might be
expected to occur may in itself be relevant evidence. Both Hyams (1986,
1987) and Radford (1986, 1988) make extensive use of evidence of children's
non-performance. The major absence discussed by Hyams (1986) is the
subject of the sentence but she also talks about the absence of auxiliaries
and of inflections. Similarly there are twelve summary statements in Radford
(1986) to show that children's grammars lack Inflection and Complement
Phrases; eight claim that children "lack" or "may lack" Complementisers,
infinitival to, Modal Auxiliaries, Tense, Agreement, nominative subjects,
overt subjects, and VP; two statements suggest children "have no" inverted
Auxiliaries and preposed wh-phrases; one states that "Child independent
clauses may be nonfinite" (i.e. they may lack a finite verb); only one is
phrased positively - "Child clauses have particle negation" - which partly
implies that their sentences lack auxiliaries. All the arguments except one
come down to the absence of elements from children's sentences. Doubtless
there are problems of phrasing here; some facts could be equally expressed
positively or negatively. But, whatever the phrasing, the point applies to
the type of evidence that is used - evidence of positive occurrence or evidence
of absence.
Arguing from absence is intrinsically problematic. If the possibilities
of occurrence are circumscribed so that a precise prediction can be made
as to what will be missing, then it seems perfectly acceptable. Providing
a list of the objects that are on my desk at the moment is easy; compiling
a list of what is not on my desk is difficult and eventually means listing
all the absent objects in the universe, unless there are precise expectations
of what should be on desks. Evidence of absence is hard to interpret when
alternative explanations exist. After all the dog might have been drugged,
or chasing a rat, or just fast asleep. When the main evidence for acquisition
is what children can't do or the evidence for their knowledge of language
is what they don't produce, interpretation becomes difficult if the predictions
are not tightly constrained. In a sense the object of Universal Grammar
is to specify the constraints on what might be expected in human language,
to say what cannot occur. The difficulty in applying evidence of absence
to observational data is that at the early stages of children's language
many things are absent and the child is deficient in many areas: many
explanations in cognitive and performance terms could therefore be
advanced for things that are absent. Features that are observed to occur
can be used directly as evidence; features that are not found are ambiguous
if there are many of them and if they are susceptible to explanations other
than in terms of grammatical competence.
The child has sometimes been referred to as a "mini-linguist"; let us
reverse the metaphor for a moment by talking about the linguist as a
"maxi-child". The problem that the child faces is deriving a grammar
only from positive evidence; negative evidence in the shape of sentences
that don't occur plays a less prominent role. It has been argued, for example
by Saleemi (1988), that it should only figure as ancillary support for ideas
already vouched for by positive evidence. The linguist attempts to derive
the grammar of children from observational data that constitutes evidence
for what occurs. The strongest requirement is that the maxi-child should
keep to positive evidence alone; in this case what doesn't occur in children's
speech should only support ideas already vouched for by observational
data or by other types of evidence, such as comprehension tests. This
is not to deny that absence of a precisely defined property is a valid
form of observational data; the dog after all didn't bark. Evidence of
absence is however a tainted source in the early stages of children's
acquisition since virtually anything in adult grammar could be argued to
be missing; such indirect evidence needs support from other evidence of
one kind or another.
One counter-argument is that evidence of absence maintains the standard
generative use of what people don't say as a form of evidence; the use
of starred sentences is taken for granted as a normal and legitimate form
of argumentation and is shorthand for saying these sentences are not found.
The constraints of UG on the mind such as structure-dependency are
typically shown by demonstrating that people who have never heard an
example of a structure-dependent sentence nevertheless reject it on sight.
But this still constitutes single-sentence I-language evidence: is this sentence
generated by the grammar of the speaker? It can be backed up with
experimentally gained grammaticality judgements or comprehension tasks
for many sentences or many people, though this raises the difficulties of
experimental method, as seen in Carroll et al (1981). Shifting to obser-
vational data, there are indeed particular sentences that people do not
use; one might point to the total lack of structure-independent sentences
in some vast corpus. But is this the same as observing that there are aspects
of the sentence that children leave out? The prediction is of a different
kind; evidence of absence cannot be justified simply by the single-sentence
appeal to what people do not say.

7. CORRELATIONS WITHIN OBSERVATIONAL DATA

Finally, as has already been seen, analyses depending on observational
data frequently employ intuitive concepts of simultaneity and correlation;
because X occurs at the same time as Y it must be the same phenomenon
or indeed it must cause Y. Hyams (1987) for instance argues that "the
child who allows null-subjects must also be analysing his language as
morphologically uniform" based on evidence such as "the acquisition of
the present and past tense morphemes coincides with the end of the null-
subject stage in English". One problem is the matter of defining simultaneity
and, conversely, sequence: what counts as the same time? what counts
as a sequence? what does "coincide" actually mean? the same day, the
same month, the same stage, or what? Statements that forms coexist or
that they occur in a sequence mean little without an explicit framework
for measuring time and sequence: developmental research needs a clock.
Furthermore, if data from more than one child are being used, it is necessary
to define coexistence in terms of chronological age, mental age, MLU,
LARSP, grammatical stages or whatever developmental schedule one
prefers: developmental clocks need to be set to the same standard time
if comparisons are to be made.
Secondly, there is the question of how different aspects of behaviour
correlate within a stage. Ingram (1989) finds four meanings for "stage"
when considered in terms of a single behaviour and four more when
considered in terms of two behaviours; in his terms, much of the UG-related
research goes beyond the simple "succession" stage to the "co-occurrence"
stage where two behaviours occur during the same timespan or the
"principle" stage in which a single principle accounts for diverse forms
of behaviour. In terms of observational data, given that many forms occur
or don't occur at the same stage, how can it be shown which correlate
with each other and which don't? All the forms present at the same stage
correlate in the sense that they coexist and are part of the same grammar;
what are the grounds for believing some are more closely related than
others? There may be entirely independent reasons why two things happen
at the same time; a paradigm statistical example showing that correlation is
not causation is the clocks in a town all striking twelve simultaneously.
Correlation is more problematic when it relies on absence rather than
presence of forms. The Hyams (1986) analysis depends on a link between
null-subject sentences and lack of expletive subjects; the Hyams (1987)
analysis depends on a link between null-subject sentences and lack of
inflections. UG can predict grouping of precise features missing from
children's language and their absence can be correlated closely; but, if
their absence coincides with large numbers of other absent features, the
validity of such a correlation becomes hard to test; why should any two
pairs of missing bits of the sentence be more related than any two other
missing bits? Early children's language is difficult for observational data
because it is so deficient: arguing from absence provides too unconfined
a set of possibilities for correlation.

8. GENERAL REQUIREMENTS FOR OBSERVATIONAL DATA IN UG RESEARCH

Let us sum up some of the requirements for observational data in a series
of statements:

- development is not acquisition. Any data from observation of children
must be related to the other processes and faculties involved in speech
production, i.e. to performance. Sentences from actual children's speech
do not have a unique explanation in terms of linguistic competence, as
single-sentence evidence may have; alternative explanations from non-
linguistic sources have always to be taken into account. At the moment,
because of the unknown aspects of the developmental process, the linguistic
explanations have to be considered defaults to be kept or modified when
a broader framework is available.

- E-language data must be representative. Because of the distortions of
actual performance, isolated sentences or small numbers of sentences cannot
be trusted. Statements can never be based on isolated sentences, simply
because it is uncertain how accidental these may be. Ways of balancing
E-language corpora must be employed to ensure representativeness across
children in general and across the speech of one child in particular; tests
of statistical significance can be employed. Without this the data may
illustrate a point but cannot be trusted for firmer conclusions.

- like must be compared with like. The sentences produced by children
cannot be compared directly to the I-language single sentences used in
linguistic theories but should be compared with the equivalent adult
productions. That is to say, performance sentences should not be compared
with idealised example sentences, again a familiar point rephrasing
Chomsky (1965, 36), "... one can find out about competence only by
studying performance, but this study must be carried out in clever and
devious ways, if any serious result is to be obtained". Children's perfor-
mance should be compared with adults' performance rather than with
adults' competence as shown in single-sentence evidence.

- evidence of presence is preferable to evidence from absence. In an E-
language account the first responsibility is to what actually occurs. Evidence
from absence is intrinsically open to many interpretations; it loses some
of its strength in early children's speech because of the multiplicity of
plausible nonlinguistic reasons for children's deficiencies.

- correlations should chiefly correlate positive data. Correlations between
non-occurring data should be treated with caution in early children's
observational data, again because of the general deficiencies of the child's
speech.

The main conclusion to this paper is that the use of observational data
within the UG theory of language acquisition must always be qualified;
such data should be treated as showing the interaction of complex
performance processes that are themselves developing. The work with
observational data by Hyams, Radford, and others has provided a tre-
mendous revitalisation of the UG theory in recent years; greater discussion
should take place on the methodological status of observational data within
a basically I-language theory. Observational data are one of the many
sources of evidence that are available outside the bounds of the poverty
of the stimulus argument. This paper has concentrated on the pitfalls and
possibilities of this form of data. Other techniques of investigation have
their own pros and cons - grammaticality judgements, elicited imitation,
act-out comprehension techniques, and so on - some of which are described
in Bennett-Kastor (1988); many of them could complement observational
data; for example the pro-drop explanation might be preferred over its
alternative because of evidence from experiments with comprehension or
elicited imitation of VS or null subject sentences. We should not be
disappointed if one source of evidence is not in itself sufficient. As Fodor
(1981) pointed out, a scientific theory should not confine itself to a certain
set of facts but "any facts about the use of language, and about how
it is learnt... could in principle be relevant to the choice between competing
theories". Together these alternative forms of evidence complement the
basic appeal to the poverty of the stimulus used by the UG theory; in
the Feyerabend view of science multiple approaches to the same area should
be explored simultaneously (Feyerabend, 1975). Observational data is one
useful tool in our kit, provided it is used with appropriate caution and
supplemented with other tools when necessary.

REFERENCES

Baddeley, A. D. 1986. Working Memory. Oxford: Clarendon Press.
Bennett-Kastor, T. 1988. Analyzing Children's Language. Oxford: Blackwell.
Bloom, L., P. Lightbown and L. Hood. 1975. Structure and Variation in Child Language.
Monographs of the Society for Research in Child Development Serial No 160, Vol 40,
No 2.
Brown, R. 1973. A First Language: The Early Stages. London: Allen and Unwin.
Cairns, H. S. 1984. Current issues in research in language comprehension. In R. Naremore
(ed.) Recent Advances in Language Sciences. College Hill Press.
Carroll, J. M., T. G. Bever and C. R. Pollack. 1981. The non-uniqueness of linguistic intuitions.
Language 57. 368-383.
Chomsky, N. 1965. Formal discussion: the development of grammar in child language. In
U. Bellugi and R. Brown (eds.) The Acquisition of Language. Purdue University, Indiana.
Chomsky, N. 1980. On cognitive structures and their development. In M. Piattelli-Palmarini
(ed.) Language and Learning: The Debate between Jean Piaget and Noam Chomsky.
London: Routledge & Kegan Paul.
Chomsky, N. 1981a. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1981b. Principles and parameters in syntactic theory. In N. Hornstein and
D. Lightfoot (eds.) Explanations in Linguistics. London: Longman.
Chomsky, N. 1986a. Knowledge of Language: Its Nature, Origin and Use. New York: Praeger
Chomsky, N. 1986b. Barriers. Cambridge, Massachusetts: MIT Press.
Chomsky, N. 1988. Language and Problems of Knowledge: The Managua Lectures. Cambridge,
Massachusetts: MIT Press.
Cook, V. J. 1973. The comparison of language development in native children and foreign
adults. IRAL XI/1. 13-28.
Cook, V. J. 1988. Chomsky's Universal Grammar: An Introduction. Oxford: Blackwell.
Cook, V. J., in progress. Universal Grammar and the child's acquisition of word order
in phrases.
Crain, S. and M. Nakayama. 1987. Structure dependence in grammar formation. Language
63. 522-543.
Domum, D. 1989. Plumbing the bidet depths. The Weekend Guardian. 11th April, p. 11.
Donleavy, J. P. 1971. The Onion Eaters. London: Eyre and Spottiswode.
Dulay, H. C. and M. K. Burt. 1973. Should we teach children syntax? Language Learning
23. 245-258.
Feyerabend, P. 1975. Against Method. London: Verso.
Fodor, J. A. 1981. Some notes on what linguistics is about. In N. Block (ed.) Readings
in the Philosophy of Psychology. 197-207.
Halliday, M. A. K. 1985. An Introduction to Functional Grammar. London: Edward Arnold.
Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Hyams, N. 1987. The setting of the null subject parameter: a reanalysis. Paper presented
to the Boston University Conference on Child Language Development.
Ingram, D. 1989. First Language Acquisition. Cambridge: Cambridge University Press.
Lust, B., J. Eisele and N. Goss (in prep.). The development of pronouns and null arguments
in child language. Cornell University.
Matthei, E. 1981. Children's interpretation of sentences containing reciprocals. In Tavakolian
(ed.), 58-101.
Morgan, J. L. 1986. From Simple Input to Complex Grammar. Cambridge, Massachusetts:
MIT Press.
Newport, E. L. 1976. Motherese: the speech of mothers to young children. In N. Castellan,
D. Pisoni, and G. Potts (eds.) Cognitive Theory, vol 2. Hillsdale: Erlbaum.
Preston, D. R. 1989. Sociolinguistics and Second Language Acquisition. Oxford: Blackwell.
Radford, A. 1986. Small children's small clauses. Bangor Research Papers in Linguistics 1.
1-38.
Radford, A. 1988. Small children's small clauses. Transactions of the Philological Society
86. 1-43.
Saleemi, A. 1988. Learnability and Parameter Fixation. Doctoral Dissertation, University
of Essex.
Sinclair, H. and J. Bronckart. 1971. SVO a linguistic universal? Journal of Experimental
Child Psychology 14. 329-348.
Stromswold, K. 1988. Linguistic representations of children's wh-questions. Papers and Reports
in Child Language 27.
Wells, C. G. 1985. Language Development in the Preschool Years. Cambridge: Cambridge
University Press.
Parameters of Metrical Theory and
Learnability*
Michael Hammond
University of Arizona

Let us take Universal Grammar (UG) to denote the innate predisposition
a speaker has to deduce grammars within a certain range on presentation
of appropriate data. The criterion of learnability is the requirement that
there is an algorithm by which a learner can deduce any grammar in the
set licensed by UG on presentation of some finite and positive set of data
(Wexler & Culicover, 1980).
It is important to investigate this conception of UG because it is, in
principle, possible to construct theories of UG that do not satisfy the
criterion of learnability. Such theories of UG are unacceptable, as the
goal of the investigation of UG is a theory that can explain how it is
that people acquire adult grammars (i.e. learn languages).
The criterion of learnability has not often been applied to the domain
of phonology. 1 This is unfortunate as, in at least some domains of
phonology, the structure of UG is much clearer than in other areas of
grammar. The criterion of learnability can be applied successfully only
when we are relatively confident about the nature of the theory of UG.
Metrical theory is a domain of linguistic theory where the structure
of UG is relatively clear. While there is still considerable debate over a
number of issues in the theory, its general character is abundantly clear
from the consensus that can be gleaned from recent alternative formulations,
e.g. Halle (1989), Halle and Vergnaud (1987), Hammond (1984/1988, 1986,
1990b), Hayes (1981, 1987, 1989), Levin (1990) et cetera.
In this paper, the criterion of learnability will be applied to metrical
theory. Several surprising and valuable results follow from this exercise.
First, it is shown that metrical theory does NOT satisfy the criterion of
learnability per se. In other words, the grammars licensed by UG are not
all reachable by learners. Second, it is argued that a hitherto unexplained
fact about the parameters of metrical constituent construction finds
explanation only when the criterion of learnability is invoked. Third, it
is suggested that the metrical components of natural languages are a function
of the interaction of UG and a constraint on short-term memory.
The organization of this paper is as follows. First, the structure of metrical
theory is briefly reviewed. Second, the hypothesis is presented that metrical
systems are all learnable on the basis of words of seven syllables or less.
Third, an argument for the seven-syllable hypothesis is presented from
the asymmetric number of options employed in metrical constituency at
different levels of the metrical hierarchy. Finally, an explanation for this
asymmetry is presented and the consequences are discussed.

1. METRICAL THEORY

As noted above, there are a number of versions of metrical theory being
debated in the literature. However, all current versions share a number
of properties. First, all versions of metrical theory, past and present, share
the claim that stress should be represented hierarchically. In other words,
stress patterns should not be represented in terms of a linear feature [stress];
rather, stress is encoded by structural relationships in a treelike repre-
sentation. In (1), the stress pattern of Apalachicola is indicated using the
linear stress feature of Chomsky and Halle (1968) and in terms of a metrical
representation. 2
The linear feature encodes stress in terms of the numerical values
associated with the different vowels. A [1stress] indicates primary stress;
[2stress] indicates secondary stress; and [0stress] indicates stresslessness.
The metrical representation encodes degree of stress in terms of the height
of the different columns dominating the vowels. The metrical representation
also indicates grouping relationships with parentheses. The metrical re-
presentation claims that stressed syllables are grouped with neighbouring
stressless syllables in particular ways. This claim is not made by the linear
representation and is amply supported by the metrical literature.3

(1)                    x              level 2   word tree
    (x        x        x     )        level 1   feet
    (x    x) (x    x) (x    x)        level 0   syllables   (metrical)
     A    pa  la   chi co   la
     2    0   2    0   1    0         [stress]              (linear)

All current versions of metrical theory include analogues to each of the
components in (2):

(2) a. constituents,
b. directionality,
c. iterativity,
d. extrametricality,
e. destressing,
f. scansions/levels.

All theories include a set of constituents (2a). The constituents in (1) are
binary and left-headed. (The stress occurs on the left side of the disyllabic
unit.) However, there are other kinds of constituents as well. For example,
there are binary right-headed constituents as well (e.g. in Aklan per Hayes
1981). Theories differ in how many constituent types they allow and in
the precise properties of those constituents.
All theories include a parameter of directionality (2b). Are constituents
assigned from left to right or from right to left? The directionality parameter
has an effect in polysyllabic words with an odd number of syllables.
All theories include some mechanism to deal with superficial iterativity
(2c). Are constituents constructed iteratively, filling the span with stresses,
or is only a single constituent built, placing a stress at or near one of
the peripheries of the domain?4
All theories include some analogue to the mechanism of extrametricality
(2d). This allows a peripheral syllable or higher-level constituent to be
excluded from metrification.
Metrical theory also includes some subsequent operations that will be
included under the rubric of "destressing" here (2e). Such rules manipulate
the metrical structure assigned by the parameters discussed above. In this
paper, some of the results concerning destressing rules of Hammond (1984/
1988) will be assumed. First, destressing rules may only remove stresses.
Second, stresses may only be removed to resolve stress clashes. Last, the
main stress of a domain may not be removed.
Finally, all versions of metrical theory include levels and scansions (2f).
These are discussed in section 4 below.
There are other aspects of metrical theory which are not discussed here,
e.g. cyclicity, exceptions, and the relationship between segmental rules and
metrical structure. Space limitations preclude an adequate treatment of
these. It is expected that including them would not alter the results arrived
at here.
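As a rough illustration of how the components in (2a-d) amount to a small set of discrete choices, the sketch below records one possible encoding of the core constituency parameters and enumerates the space it defines. The encoding and the particular value sets are simplifying assumptions of mine, not a formalism proposed in the metrical literature, and destressing and scansions (2e-f) are left out.

    from dataclasses import dataclass
    from itertools import product

    # A toy record of the constituency parameters sketched in (2a-d).
    @dataclass(frozen=True)
    class MetricalSettings:
        foot_headedness: str      # 'left' (trochaic) or 'right' (iambic)
        directionality: str       # feet assigned 'left-to-right' or 'right-to-left'
        iterative: bool           # fill the span with feet, or build just one
        extrametrical_edge: str   # 'none', 'left' or 'right'

    space = [MetricalSettings(h, d, i, e)
             for h, d, i, e in product(('left', 'right'),
                                       ('left-to-right', 'right-to-left'),
                                       (True, False),
                                       ('none', 'left', 'right'))]
    print(len(space))  # 24 settings in this simplified space

Even a toy space of this kind makes the learnability question of the next section concrete: each setting is a candidate grammar that some finite body of data must be able to single out.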

2. LEARNABILITY

In this section, the criterion of learnability is reviewed. As noted above,
the criterion of learnability requires that any grammar licensed by UG
be reachable by some algorithm from data the child might be exposed
to. If no constraint is imposed on the data learning proceeds from, then
the criterion of learnability is vacuous. For example, if we hypothesise
that the set of grammars licensed by UG is enumerable, then it need only
be assumed that the data include an explicit statement about which grammar
is selected. This is obviously an unreasonable picture of the kind of data
children are exposed to.

There are two minimal assumptions about the kinds of data that children
are exposed to that are accepted here. The first is that learning proceeds
on the basis of positive evidence (but cf. Saleemi, this volume). That is,
it is normally assumed that children are not systematically corrected for
ill-formed utterances (Brown and Hanlon, 1970).

(3) Learning is based on positive evidence.

A second assumption that is often made and that will be adopted here
is that learning proceeds on the basis of a presentation of a finite set
of data. This is a natural consequence of the assumption that speakers
do actually come up with a grammar at some point and that the time
up to that point is finite.

(4) Learning is based on finite evidence.

From these two assumptions it is possible to show that certain possible
theories of UG satisfy the criterion of learnability and others do not. Wexler
and Culicover (1980) present the following schematic example of a theory
of UG that does not satisfy the learnability criterion. Imagine a theory
of phrase structure that licenses the following (infinite) set of grammars.
Assume, for simplicity, that "a" is the only terminal symbol, or word,
in all the candidate grammars. The grammar H0 licenses sentences of any
length. The grammars denoted by Hi license strings up to i words in length.

(5) i.   H0 = {a, aa, aaa, ...}
    ii.  H1 = {a}
    iii. H2 = {a, aa}
    iv.  H3 = {a, aa, aaa}
    v.   Hi = {a, aa, ..., aⁱ}

Such a theory of UG does not satisfy the criterion of learnability. Consider,
for example, what would happen if the learner were presented with the
following set of data: {aa, aaaa}. This set is consistent with any grammar
Hi where i > 3 or i = 0. If, in such a case, the learner selects H0, then
there is no context where the learner would ever be prompted to select
a grammar Hi where i ≠ 0. On the other hand, if the learner chooses
some grammar Hi where i ≠ 0, say H4, then there is no context in which
the learner would select H0. Either way, some grammar is unreachable
and the theory of UG does not satisfy the criterion of learnability.
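The argument can be made concrete with a few lines of code. The sketch below is only an illustration of the reasoning, not anything proposed by Wexler and Culicover: it checks which of the grammars in (5) are consistent with the finite positive sample {aa, aaaa}, and the closing comment records why positive data alone can never dislodge a learner who has settled on H0.

    # Grammars over the single word "a", as in (5): H0 accepts strings of
    # any length, Hi (i >= 1) accepts strings of at most i words.
    def accepts(i, sentence):
        """True if grammar Hi generates the sentence; i == 0 means no bound."""
        length = len(sentence)            # sentences are strings of 'a's
        return length >= 1 and (i == 0 or length <= i)

    data = ["aa", "aaaa"]                 # a finite, positive sample

    # Which of H0 .. H6 are consistent with the sample?
    consistent = [i for i in range(7) if all(accepts(i, s) for s in data)]
    print(consistent)                     # [0, 4, 5, 6], and so on for every i > 3

    # Every positive datum compatible with some bounded Hi is also
    # compatible with the unbounded H0, so no further positive evidence
    # can prompt a learner who has selected H0 to abandon it.
    assert all(accepts(0, s) for s in data)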
Wexler and Culicover (1980) deal with this problem by constraining
the theory of UG so that it does not have the crucial properties of the
example in (5).5 Specifically, they suggest that the theory of transformations,
which is the domain of grammar they are concerned with, is subject to
the constraints listed in (6) below.

(6) a. Binary Principle
b. Freezing Principle
c. Raising Principle
d. Principle of No Bottom Context
e. Principle of the Transparency of Untransformable Base Struc-
tures

This approach is a good one if it can be shown that constraints on UG
like those in (6) are desirable. In this paper, another tack will be taken.
The basic idea is that the criterion of learnability is too strong. It requires
that all grammars licensed by UG be learnable, which makes the implicit
assumption that all grammars licensed by UG can occur. Here, the
possibility is suggested that there are grammars licensed by UG that are
non-occurring and that their non-occurrence should be accounted for not
by restricting UG, but by constructing a learning algorithm that does not
allow the learner to reach the non-occurring grammars.
Schematically, the two approaches are compared in (7). In (7a), Wexler
and Culicover's approach is schematised. There is a theory of UG that
licenses three grammars and excludes a fourth and fifth. The learning
algorithm, applied to that theory of UG, allows all three grammars licensed
to be learned. In (7b), there is a theory of UG that licenses four grammars
and excludes a fifth. The learning algorithm, applied to this latter theory
of UG only allows three of the four grammars licensed to be learned.
Under (7a), the theory of UG does all the work; under (7b), the theory
of learning does some of the work.

(7) a. UG = {G1, G2, G3}        (*G4, *G5, ...)
       LA(UG) = {G1, G2, G3}
    b. UG = {G1, G2, G3, G4}    (*G5, ...)
       LA(UG) = {G1, G2, G3}

Where should the work be done? The answer depends on the character
of the theories of learning and UG that result. For example, if excluding
G4 in the learning algorithm would vastly complicate the learning algorithm,
but excluding G4 in UG would only slightly complicate UG, then G4 should
be excluded by UG. If, on the other hand, excluding G4 in UG would
overcomplicate UG, but excluding it in the learning algorithm would be
relatively minor, then G4 should be excluded by the learning algorithm.
In the next two sections, a case is presented that would seem to be best
accounted for in terms of the approach in (7b).
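
Before turning to that case, the division of labour in (7b) can be illustrated with a toy learner over the grammar family of (5) (again a sketch of my own, not the text's proposal): UG licenses H0 alongside every Hi, but a learning algorithm that always conjectures the smallest Hi consistent with its finite data can never arrive at H0, so H0 is excluded by the learning algorithm rather than by UG.

# A toy rendering (mine) of the (7b) division of labour, using the grammar
# family of (5): the learning algorithm, not UG, keeps H0 out of reach.

def consistent(i, data):
    """i = 0 encodes H0 (any length); i >= 1 encodes Hi (length <= i)."""
    return all(i == 0 or len(s) <= i for s in data)

def learner(data, bound=100):
    """Conjecture the smallest Hi (i >= 1) consistent with the finite data;
    any finite sample is consistent with some such Hi, so H0 is never chosen."""
    for i in range(1, bound + 1):
        if consistent(i, data):
            return f"H{i}"
    return "H0"   # unreachable for samples whose longest string is <= bound

print(learner({"aa", "aaaa"}))   # H4
print(learner({"a" * 7}))        # H7, never H0, even if H0 is the target

On the Gold-style criterion this learner fails whenever H0 is the target; on the revised criterion argued for here, that failure is harmless if H0-type grammars do not occur.
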

3. THE SEVEN-SYLLABLE HYPOTHESIS

This paper is an interim report on a larger project developed in Hammond
(1990b). Here, only the basic hypothesis is presented. The basic hypothesis
is that all occurring stress systems licensed by the theory of metrical
phonology can be learned on the basis of words of seven syllables or
less.

(8) Seven-syllable hypothesis:
All occurring metrical systems can be learned on the basis of
words of seven syllables or less.

The word "occurring" here is crucial to the argument for (7b) above.
It will be shown that while all occurring metrical systems are learnable
on the basis of words of seven syllables or less, the larger set of metrical
systems licensed by UG is not. This distinction will form the centrepiece
of the argument for (7b).
A proof can be constructed if any two existing metrical systems can
be distinguished on the basis of words with n syllables (where n < 8).
Compare, for example, the following two systems. In Language I, a
simplified version of English, trochaic feet insensitive to syllable weight
are constructed from right to left. The rightmost foot is elevated to main
stress and adjacent stresses are resolved by removing one of the clashing
stresses. In Language II, a simplified version of Lenakel (Hammond, 1986,
1990b), one trochee is built from the right and then as many as possible
are built from the left. Again, adjacent stresses are resolved by destressing.
In both languages, destressing operates in a familiar fashion. The second
of two adjacent stresses is removed unless it is the main stress. Otherwise,
the first is removed.
The patterns produced in words of different lengths are diagrammed
with schematic words in (9). Notice how the two patterns only become
distinct in examples of at least seven syllables in length.

(9)  language I     language II
     a              a
     aa             aa
     aaa            aaa
     aaaa           aaaa
     aaaaa          aaaaa
     aaaaaa         aaaaaa
     aaaaaaa        aaaaaaa

The comparison in (9) shows that the two systems considered require that
learners be exposed to words of at least seven syllables in length. Hammond
(1990b) shows that all occurring systems licensed by metrical theory can
be distinguished on the basis of words of no more than seven syllables.
Notice that the facts of (9) necessitate a seven-syllable minimum regardless
of how the learner traverses the search space. If both of the grammars
in (9) occur, and if they truly are indistinct for words of less than seven
syllables, then to reach both grammars, a learner must have access to
words that would distinguish the analyses. Thus the seven-syllable hy-
pothesis is independent of how learners actually learn. How learners set
parameters is irrelevant to the fact that words of at least seven syllables
are necessary to have access to the distinctions that are necessary to deduce
the correct grammar.
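
The comparison can also be checked mechanically. The sketch below (my own, not the procedure of Hammond 1990b) implements the two systems as described above; where the text leaves details open I have assumed that a leftover syllable receives a degenerate foot and that main stress in Language II falls on the foot built at the right edge. Under those assumptions the computed patterns coincide for words of one to six syllables and first diverge at seven.

# A minimal simulation (my own, not Hammond's 1990b implementation) of the
# Language I / Language II comparison. Syllables are numbered from 1.
# Assumptions beyond the text: a leftover single syllable gets a degenerate
# foot, and main stress in Language II falls on the foot built at the right
# edge (as in Lenakel, where main stress is penultimate).

def destress(heads, main):
    """Clash resolution as described in the text: of two adjacent stresses,
    remove the second unless it is the main stress, otherwise the first."""
    out = set(heads)
    for s in sorted(heads):
        if s in out and s + 1 in out:
            out.discard(s if s + 1 == main else s + 1)
    return out

def language_I(n):
    """Trochees built right to left; the rightmost foot carries main stress."""
    heads, right = [], n
    while right > 0:
        left = max(1, right - 1)       # binary foot, or a final degenerate one
        heads.append(left)             # trochee: stress on its first syllable
        right = left - 1
    heads.reverse()
    return destress(heads, main=heads[-1])

def language_II(n):
    """One trochee at the right edge, then trochees from the left;
    the right-edge foot carries main stress (my assumption)."""
    main = max(1, n - 1)               # head of the right-edge trochee
    heads, i = [], 1
    while i + 1 < main:                # binary trochees built from the left
        heads.append(i)
        i += 2
    if i < main:                       # leftover syllable: degenerate foot
        heads.append(i)
    heads.append(main)
    return destress(heads, main)

for n in range(1, 9):
    a, b = sorted(language_I(n)), sorted(language_II(n))
    note = "   <-- the systems first diverge here" if a != b else ""
    print(f"{n} syllables: I {a}  II {b}{note}")
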

4. LEVELS AND OPTIONS

As indicated in (2), all versions of metrical theory include some statement
about the number of options that occur in the metrical hierarchy. Here,
it is shown that the number of options available at different levels of
the hierarchy can be explained by an extralinguistic constraint on learning.
Metrical theory allows at most three levels of metrical constituency within
words: feet, cola, and word tree.

(10)   x                             level 3   word tree   W
      (x         x         x)        level 2   cola        C
      (x    x)  (x    x)  (x)        level 1   feet        F
      (x x)(x x)(x x)(x x)(x)        level 0   syllables   s
       aa   aa   aa   aa   a

In principle, one might expect to find the same options and parameters
available at each level of the hierarchy, but, in fact, that does not occur.
At each successively higher level, the number of options available decreases.
This fact finds an explanatory solution only when one looks to learnability
concerns.
In (11), the options for foot construction are diagrammed.6

(11) a. all licensed constituents;
b. directionality;
c. iterativity;
d. extrametricality;
e. destressing;
f. multiple scansions.

All the options presented in (2) are included in (11).
In (12), the possibilities for colon construction are given.

(12) a. only one constituent: [[F] F];
b. directionality;
c. iterativity;
d. F-extrametricality;
e. no F-destressing;
f. one scansion maximum.

Here, there are fewer possibilities. For example, there seems to be only
left-headed binary cola.7 All languages that exhibit cola exhibit left-headed
binary cola, e.g. in Tiberian Hebrew, Passamaquoddy, Hungarian, Odawa,
etc. Moreover, no language exhibiting cola exhibits more than one scansion
of cola. The fact that these additional options are not available at the
colon level has no explanation within any current version of metrical theory.
At the word tree level, the options are even more restricted.

(13) a. only two constituents: [[x]...] or [...[x]];
b. no directionality;
c. no iterativity;
d. no C-extrametricality;
e. no C-destressing;
f. one scansion maximum.

Again, this absence of the full power of metrical theory at the word tree
level is unexplained in all versions of metrical theory.8
There are two ways to go about rectifying this lack of explanation in
metrical theory. One possibility might be to alter the theory in some radical
fashion so as to preclude these options at higher levels of the hierarchy.
This approach is problematic in two ways.
First, it would result in a rather "numerological" version of metrical
theory. The options can only be excluded by brute force and the resulting
theory does not have a desirable character.
The second problem is that excluding these options would be unex-
planatory. That is, altering UG directly would miss an important gener-
alization about the nature of the restrictions outlined in (11), (12), and
(13). Specifically, the restrictions on options available at each level are
directly related to the fact that words of seven syllables or less are sufficient
to distinguish all occurring stress systems. If the same number of options
were available at each level, then the seven-syllable hypothesis could not
be maintained. As a demonstration of this, let us consider several possible
enrichments of the system in (11), (12), and (13).

Consider what would happen to the seven-syllable limit if cola could
be constructed bidirectionally (12f). In addition to systems that constructed
cola, e.g. from right to left, there would also be systems where one colon
was constructed from the right, and then as many as possible from the
left. Assuming that those cola were constructed over, e.g. trochaic feet,
more than seven syllables would be required to distinguish them, as
diagrammed in (14). Such systems would only be distinct in words of
at least nine syllables.

(14)  right-to-left:                     bidirectional:

       x    x         x                   x         x    x
      (x)  (x    x)  (x    x)            (x    x)  (x)  (x    x)
      (x x)(x x)(x x)(x x)(x)            (x x)(x x)(x x)(x x)(x)
       aa   aa   aa   aa   a              aa   aa   aa   aa   a

To explicitly exclude the possibility of bidirectional cola from UG would
be a mistake as it would not capture the generalization that this exclusion
is directly tied to the seven-syllable maximum.
As a second example, consider what would happen if the set of
constituents in (13a) were expanded to include, say, an iterated trochee.
As shown in (15), such a system would only become distinct from a left-
headed word tree when words of at least nine syllables are considered.

(15) a.   x
         (x    x         x)
         (x)  (x    x)  (x    x)
         (x x)(x x)(x x)(x x)(x)
          aa   aa   aa   aa   a          word tree = [[x]...]
     b.                       x
         (x    x         x)
         (x)  (x    x)  (x    x)
         (x x)(x x)(x x)(x x)(x)
          aa   aa   aa   aa   a          word tree = [...[x]]
     c.   x                   x
         (x    x)            (x)
         (x)  (x    x)  (x    x)
         (x x)(x x)(x x)(x x)(x)
          aa   aa   aa   aa   a          word tree = [[x]x] (R->L)

As a third and final example, consider the possibility of colon-extrametricality
(13d). If colon-extrametricality were allowed under word tree
construction, it would entail also that more than seven syllables be required
to distinguish different systems. This is shown in (16)/(17). First, trochees
are built from left to right. Then cola are built right to left. The two
grammars diverge at that point. In the first, the rightmost colon is
extrametrical and a right-headed word tree is built. In the other, a left-
headed word tree is built. Figure (16) shows how these systems are indistinct
with words of eight syllables (or less); (17) shows how they are distinct
in words of nine syllables (or more).

(16)   x                              x
      (x)        <x>                 (x         x)
      (x    x)  (x    x)             (x    x)  (x    x)
      (x x)(x x)(x x)(x x)           (x x)(x x)(x x)(x x)
       aa   aa   aa   aa              aa   aa   aa   aa

(17)        x                              x
      (x    x)        <x>            (x         x         x)
      (x)  (x    x)  (x    x)        (x)  (x    x)  (x    x)
      (x x)(x x)(x x)(x x)(x)        (x x)(x x)(x x)(x x)(x)
       aa   aa   aa   aa   a          aa   aa   aa   aa   a

Thus the seven-syllable hypothesis can explain why fewer options are
available at successively higher levels of the metrical hierarchy.
Notice that the particular options available at any level do not follow
from the seven-syllable restriction. For example, it was argued above that
the seven-syllable restriction accounts for why the set of word tree
constituents cannot be augmented with an iterated trochee. The seven-
syllable restriction does not explain why the word tree constituents are
as in (18a), and not as in (18b). In (18a), the actually occurring possibilities
are given. In (18b), the left-headed unbounded foot is replaced with an
iterated trochee. The number of choices in each system is the same; the
particular choices are different.

(18) a. actual word tree constituents
i. [[x]...]
ii. [...[x]]
b. possible word tree constituents
i. [[x]x], iterated
ii. [...[x]]

The constituents of (18b) can also be distinguished in words of seven
syllables. Figure (19) shows how (18bi) and (18bii) are distinguishable in
words of seven syllables. The examples in (19) are a worst-case scenario
where there are also cola.

(19) a.   x                        b.             x
         (x         x)                (x          x)
         (x    x)  (x    x)           (x    x)   (x    x)
         (x x)(x x)(x x)(x)           (x x)(x x) (x x)(x)
          aa   aa   aa   a             aa   aa    aa   a

Thus an explanation for the specific asymmetries of (11), (12), and (13)
in terms of the seven-syllable hypothesis has to be supplemented with
something else. That "something else" would appear to be some kind
of markedness. Iterated trochees are more marked than [[x]...]. The
particular options available at any level are the least marked. The specific
details remain to be worked out.
To summarise thus far, it has been hypothesized that metrical systems
are all distinguishable on the basis of words of seven syllables or less.
It has been shown that there is an asymmetric use of the parameters provided
by the theory at the different levels of the metrical hierarchy. It has been
argued that directly accounting for this asymmetry in UG would result in
an inelegant theory that does not explain the relationship between the restrictions and the
seven-syllable hypothesis. The seven-syllable hypothesis predicts that fewer
options should be available at higher levels of the metrical hierarchy.
Markedness accounts for what specific options are available at those higher
levels.

5. SHORT-TERM MEMORY CONSTRAINT

In this section, an explanation for the hierarchical asymmetry discussed
in the previous section is proposed. This explanation requires that the
criterion of learnability be revised as proposed above.
Let us suppose that learning must proceed on words of seven syllables
or less. Longer words are learnable, but cannot be used in extracting the
metrical system of a language.

(20) Seven-syllable constraint
Learners cannot make use of words longer than seven syllables to
extract metrical generalizations.

Possible support for this proposal comes from the psychological literature.
Miller (1967) discusses a number of psychological results that seem to
converge on the conclusion that human short-term memory is basically
limited to retaining seven elements (plus or minus two). The proposal
made here is that the seven-unit maximum on short-term memory applies
to language learning as well.

The idea is that forms can only be used to learn stress systems if they
can be held in short-term memory long enough for the learner to extract
the relevant generalizations. Words longer than seven syllables are learnable
because short-term memory does not constrain other aspects of acquisition.
The hypothesis is given in (21) below.

(21) Limit on short-term memory:
Learners cannot make use of forms longer than seven
syllables because of a general extralinguistic
constraint on the size of short-term memory.

The specific claim is that the nonlinguistic effects Miller discusses are
mirrored by a constraint on the learning algorithm for metrical systems.
This constraint prevents the learner from paying attention to words of
more than seven syllables.
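
Schematically, (20)/(21) amount to nothing more than a filter on the data fed to the metrical learner; the sketch below (mine, not a proposal from the text beyond the constraint itself) makes that explicit, leaving the parameter-setting routine entirely unspecified.

# A schematic sketch (mine) of the constraint in (20)/(21): the restriction
# applies to the input to metrical learning, not to the learning procedure.

MEMORY_LIMIT = 7   # syllables, following Miller's seven-unit limit

def metrical_learning_input(corpus, syllable_count):
    """Only words short enough to be held in short-term memory are passed
    on to whatever parameter-setting procedure one assumes."""
    return [word for word in corpus if syllable_count(word) <= MEMORY_LIMIT]

Whatever procedure then operates on the filtered corpus, the constraint itself says nothing about it; it restricts only what that procedure is allowed to see.
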
This proposal solves both of the problems mentioned above. First, UG
is not complicated needlessly. The theory of UG allows all options at
all three levels, and the restrictions at higher levels are a function of the
fact that the number of options available increases the number of syllables
necessary to distinguish the resulting systems. The particular options
available are a function of markedness as discussed above. This proposal
also solves the second problem. The asymmetry is directly tied to the seven-
syllable restriction expressed as (20) or (21). For example, the absence
of bidirectional cola follows from the constraint on short-term memory
and is explained by it. The alternative tactic of complicating UG does
not connect the absence of bidirectional cola with the seven-syllable limit
at all.
Finally, this proposal is more general in that the seven-unit effect is
expected to have extralinguistic consequences, just as Miller shows.
In order to maintain this explanation, several aspects of the proposal
must be fleshed out. First, unlike some of the experiments Miller discusses,
it looks like the restriction with respect to language learning refers to
precisely seven syllables. It does not allow for variation. This is taken
as progress in our understanding of short-term memory.
Second, unlike the effects Miller discusses, the restriction on short-term
memory as it affects language is specific to the unit syllable. In the
psychological literature, the particular unit restricted in short-term memory
can vary. This is not the case in metrical phonology. The seven-syllable
restriction is specific to syllables, and not some other phonological unit,
like cola or word trees. This is arguably a consequence of the fact that,
while a variety of factors influence how metrical structure is applied, it
is always applied to syllables. For example, while syllable weight in a
language like English affects metrical structure, that structure is still applied
to syllables.9
Third, it might be thought that the seven-syllable limit is excessive as
there are many languages, e.g. English, where words of seven syllables
or more are vanishingly rare. This is not a problem at all, however. The
seven-syllable restriction makes the strong prediction that languages where
children are exposed only to relatively short words, must opt for default
settings of parameters when contradictory data are impossible because
of the length of words. Contrast languages like English and Lenakel. In
English, there is a single scansion of right-to-left footing. Moreover, children
are exposed to relatively short words. Lenakel, on the other hand, exhibits
bidirectional footing (at least two scansions from different directions). The
demonstration in (9) requires that Lenakel children be exposed, at the
appropriate point of acquisition, to words of at least seven syllables. Our
approach predicts that learners not exposed to words of sufficient length
will have to opt for the default choice between one scansion and two
scansions: presumably one scansion (as in English).10 As a second example
of this sort, consider the possibility of foot extrametricality.11 Contrast
the following systems. All involve building trochees from left to right.
The first two build a right-headed word tree. The second system also makes
a final degenerate foot extrametrical. The third builds a left-headed word
tree. As shown in (22), the first two systems only become distinct in words
of three syllables or more. The latter two only become distinct when words
of four syllables are considered.

(22)   x               x               x
      (x)             (x)             (x)
      (x)             (x)             (x)
       a               a               a

       x               x               x
      (x)             (x)             (x)
      (x x)           (x x)           (x x)
       aa              aa              aa

           x           x               x
      (x   x)         (x)  <x>        (x   x)
      (x x)(x)        (x x)(x)        (x x)(x)
       aa   a          aa   a          aa   a

           x               x           x
      (x   x)         (x   x)         (x   x)
      (x x)(x x)      (x x)(x x)      (x x)(x x)
       aa   aa         aa   aa         aa   aa

Again, the system developed here requires that learners exposed to words
of insufficient length to distinguish these systems opt for the default
settings for the parameters that separate these systems.12
Finally, the approach taken here makes an extremely interesting pre-
diction about other components of grammar. If the hierarchical asymmetry
is truly a function of an extralinguistic constraint on the size of short-
term memory, then we would expect the same constraint to also affect
other domains of grammar, e.g. syntax, semantics, etc.
To summarise, it has been shown that there is an asymmetry in the
use of metrical parameters at different layers of the metrical hierarchy.
This asymmetry is most appropriately handled by imposing an extralin-
guistic constraint on the learning of stress systems. This forces us to revise
the criterion of learnability so that only occurring grammars need to be
learned. It also forces us to revise our understanding of the character
and relevance of short-term memory.
These conclusions are based on a comparison of the predictions made
by metrical theory and the stress systems of the world. If there are significant
flaws in our understanding of either of these, the results would have to
be reconsidered. This is not a problem by any means. The proposal made
here is easily falsified and thus provides clear directions for further
investigation.
Last, note that our results with respect to the learnability criterion are
independent of how learning actually takes place.13 The seven-syllable
hypothesis says nothing about how learning happens; what it does say concerns
what the input to learning must be.

FOOTNOTES

*Thanks for useful discussion to the participants in my Spring 1990 seminar at the University
of Arizona, Diana Archangeli, Andy Barss, Robin Clark, Dick Demers, Elan Dresher, Kerry
Green, Terry Langendoen, Adrienne Lehrer, John McCarthy, Cecile McKee, Shaun O'Connor,
Dick Oehrle, Doug Saddy, Paul Saka, and Sue Steele. Thanks also to the editor and two
anonymous reviewers. Some of this material was presented at GLOW (Hammond, 1990a).
All errors are my own.

1. See, however, Braine (1974), Dell (1981), Dresher (1981), Dresher and Kaye (1990), and
McCarthy (1981).
2. This particular representation is used for typographical convenience. In all respects, the
representation employed here is a notational variant of the "lollipop" representation used
by Hammond (1984/1988) etc. See Hammond (1987) for discussion.
3. See the references cited in the text.
4. Halle and Vergnaud (1987) accomplish this indirectly with the mechanism of conflation.
5. See Wexler and Culicover (1980) and Gold (1967) for a discussion of what these properties
are.
6. Theories differ with respect to the number of constituents allowed. For example, Hayes
(1987) has three, Halle and Vergnaud (1987) have five, Hammond (1990b) has nine, and
Hayes (1981) has twelve. All of these theories allow all possibilities at the foot level.
7. Beat addition of the sort that promotes the first stress of Apalachicola can be accomplished
with a binary left-headed colon. An unbounded colon is not necessary for cases like this.
8. As pointed out to me by Iggy Roca, some of the parameters in (13) are dependent in
an interesting sense. For example, from the fact that the only constituents available at this
level are unbounded, it follows that there is no directionality, no iterativity, and a one-
scansion maximum. While this accounts for some of the restrictions (13b,c,f), it does not
account for all of them (13a,d,e).
It might be possible to derive the feet in (13a) from the requirement that metrical trees
terminate in a single node. This requirement is a stipulation that is otherwise unmotivated.
9. This is shown by the fact that syllables can never be split into separate metrical constituents
(Hayes, 1981). There are languages like Southern Paiute where stress is arguably assigned
to the mora, rather than the syllable. In such a language, the seven-unit restriction may
apply to morae. The Southern Paiute stress system is consistent with either hypothesis.
10. Obviously, it is unethical to test this hypothesis experimentally. The hypothesis can be
verified observationally, however, if the language learner's experience can be assessed for
word length at the critical stage. Language acquisition research has not reached the point
where this information is available.
11. Foot-extrametricality can only apply to degenerate feet, e.g. in English, Aklan, Odawa,
etc. (Hammond, 1990b).
12. See Dresher and Kaye (1990) for one proposal regarding default parameters in metrical
theory.
13. For some interesting recent proposals, see Barss (1989) and Clark (1990).

REFERENCES

Barss, Andrew. 1989. Against the Subset Principle. Paper presented at WECOL, Phoenix.
Baker, C.L. and John J. McCarthy (eds.). 1981. The Logical Problem of Language Acquisition.
Cambridge, Massachusetts: MIT Press.
Braine, M. 1974. On what might constitute learnable phonology. Language 50. 270-299.
Brown, R. and C. Hanlon. 1970. Derivational complexity and the order of acquisition of
child speech. In J.R. Hayes (ed.) Cognition and the Development of Language, New York:
Wiley.
Chomsky, Noam and Morris Halle. 1968. The Sound Pattern of English, New York: Harper
& Row.
Clark, Robin. 1990. Some elements of a proof for language learnability. Ms. Université
de Geneve.
Dell, F. 1981. On the learnability of optional phonological rules. Linguistic Inquiry 12. 31-
38.
Dresher, Bezalel Elan. 1981. On the learnability of abstract phonology. In Baker and McCarthy
(eds.), 188-210.
Dresher, B. Elan and Jonathan D. Kaye. 1990. A computational learning model for metrical
phonology. Cognition 34. 137-195.
Gold, E.M. 1967. Language identification in the limit. Information and Control 10. 447-
474.
Halle, Morris. 1989. The exhaustivity condition, idiosyncratic constituent boundaries and
other issues in the theory of stress. Ms. MIT.
Halle, Morris and Jean-Roger Vergnaud. 1987. An Essay on Stress. Cambridge, Massachusetts:
MIT Press.
Hammond, Michael. 1984/1988. Constraining Metrical Theory: A Modular Theory of Rhythm
and Destressing, 1984 UCLA doctoral dissertation, revised version distributed by IULC,
1988, published by Garland, New York.
Hammond, Michael. 1986. The obligatory-branching parameter in metrical theory. Natural
Language and Linguistic Theory 4. 185-228.
Hammond, Michael. 1987. Accent, constituency, and lollipops. CLS 23/2. 149-166.
Hammond, Michael. 1990a. Degree-7 learnability. Paper presented at GLOW, Cambridge,
England.
Hammond, Michael. 1990b. Metrical Theory and Learnability. Ms. U. of Arizona.
Hayes, Bruce. 1981. A Metrical Theory of Stress Rules, 1980 MIT Doctoral Dissertation,
revised version available from IULC and Garland, New York.
Hayes, Bruce. 1987. A revised parametric metrical theory. NELS 17. 274-289.
Hayes, Bruce. 1989. Stress and syllabification in the Yupik languages. Ms. UCLA.
Levin, J. 1990. Alternatives to exhaustivity and conflation in metrical theory. Ms. University
of Texas, Austin.
McCarthy, John J. 1981. The role of the evaluation metric in the acquisition of phonology.
In Baker and McCarthy (eds.), 218-248.
Miller, George A. 1967. The magical number seven, plus or minus two: some limits on
our capacity for processing information. In G. A. Miller (ed.) The Psychology of Com-
munication. New York: Basic Books Inc. 14-44.
Wexler, Kenneth and Peter W. Culicover. 1980. Formal Principles of Language Acquisition.
Cambridge, Massachusetts: MIT Press.
Markedness and growth*
Teun Hoekstra
University of Leiden

Within generative-based acquisition studies two distinct implementations
of the innateness hypothesis may be distinguished. The traditional con-
ception is now commonly referred to as the continuity hypothesis (cf. Pinker
1984), which holds that all of UG is present at birth, while development
from initial state to steady state is determined by factors outside UG.
Various aspects can play a role in this development, such as general cognitive
growth, or memory growth in particular, in short factors that determine
learning. In contrast, the maturational hypothesis (cf. Borer and Wexler
1987) holds that UG itself develops gradually, in the sense that specific
principles of UG become available only after a certain period of maturation.
In this paper I want to discuss certain questions relating to the conception
of language acquisition in terms of parameter setting. An interesting issue
is the triggering problem (cf. Borer and Wexler 1987). If values of parameters
should meet a condition of learnability defined in terms of easily accessible
data to fix the value of a parameter, what delays the child in actually
fixing it? Evidently, the possible answers given by the two different
hypotheses are distinct: according to the maturational hypothesis (hen-
ceforth MH) the parameter P starts asking questions to the data only
after it has matured, while the continuity hypothesis (henceforth CH) should
provide an answer along different lines. A second question concerns the
way in which a particular stage of linguistic development relates to a
parameterised property if its value has not already been set. Suppose that
the actual output at stage n is consistent with one particular value (α)
of the parameter P, can we then say that P has (α) as its initial setting,
and that (α) is hence the unmarked value of P? If so, what does unmarked
mean, and does it tell us something about the two hypotheses mentioned
above?
Jakobson (1941) made the claim that crosslinguistic distribution and
developmental priority can be captured by the same notion of markedness.
This claim harbours a common research programme for traditional general
linguistics, taken as the study of linguistic systems, and the generative
paradigm as defined by its ultimate explanatory goal of language acqui-
sition. Yet, as I shall clarify below, there are several notions of markedness
in the current literature on language acquisition, which need to be kept
apart. Before getting into those matters, I shall start with a short description
of the parameters model.

1. PARAMETERS AND MARKEDNESS

Although solving the problem of language acquisition is central to the
generative research programme, actual work on language acquisition within
the framework has long been relatively scarce. The main problem with
early attempts to look at acquisition data from the generative perspective
was that too many interpretations were possible for the observations that
could be made. The situation for the study of adult systems was not
fundamentally different. The expressive power of earlier versions of
generative grammar was so rich as to hardly impose any restrictions on
the way in which a particular grammatical phenomenon could be analysed.
By way of illustration, consider the difference between English and French
regarding dative constructions. As is well-known, English allows two
complementation types for verbs of the give-class, where French, if we
disregard the clitic construction, allows only the prepositional variant (*Je
donne Jean un livre vs. I give John a book). Although particular proposals
were around, the theory as such would allow an account of this difference
in at least the following ways:

a. in terms of lexical subcategorisations:

(1) give:   {[ __ NP PP], [ __ NP NP]}
    donner: [ __ NP PP]

b. in terms of PS-rules:

(2) English: VP → V (NP) (NP) (PP)* (S')
    French:  VP → V (NP) (PP)* (S')

c. in terms of transformational rules:

(3) dative shift: X V NP to NP Y
                  1 2 3  4  5 6  ⇒  1 2 5 3 6
    optional in English, not available in French

Given such freedom of expression allowed by UG for the analysis of adult
systems, the child will have a hard time figuring out what the adequate
grammar for the language he is being exposed to should look like. Even
more difficult for the linguist is the interpretation of production data in
early stages of acquisition, as these data can also be analysed in a rich
variety of ways.
The first task facing generative theory was therefore to drastically reduce
the descriptive options made available by UG. Several changes in the theory
brought this goal within reach. Specifically, the abandonment of a con-
struction specific approach and/or its replacement by the modular con-
ception, according to which a particular construction can be seen as the
result of an interplay of several relatively simple modules. The reduction
of the transformational component to the move (α) format, available in
both French and English, made it impossible to express the difference
between these languages with respect to dative constructions in the manner
described in (3). The proposal to reduce the content of PS-rules to the
principle that the internal structure of a phrase is to be regarded as a
projection of lexical properties makes (2) unavailable. This leaves us with
(1) as a means to capture the difference between French and English,
but it will be clear that this can only be regarded as a description of
the difference, as it raises the question of why French should not have
lexical items with the properties of English give. In fact, the modular
approach leads us to ask even more general questions: is the fact that
French does not have such lexical items related to other properties of
French in which it differs from English, and can these sets of properties
follow from a single difference at a more abstract level? Could there be
a principle P-prep with two values, such that a positive value of the
parameter in P-prep yields a grammar of the French type, while a negative
value yields an English type system? For the case at hand, one might
think of a correlation of such properties as those in (4) (cf. Kayne 1981):

(4) preposition stranding:
a. English: Who did you vote for?
b. French: *Qui as-tu voté pour?

(5) particle constructions:
a. English: John called Bill up
John called up Bill
b. French: particle constructions are simply not available

(6) prepositional complementizer plus lexical subject:
a. English: We hoped for something nice to happen
b. French: *On espère de quelque chose passer

(7) believe-type with lexical subject
a. English: We believed John to have done it
b. French: *Nous croyons Jean avoir fait cela
Now that the expressive format of rule systems is replaced by a set of
more abstract principles, some parameterised, we can profitably undertake
the endeavour which is in a sense complementary to providing explanations
for the language acquisition problem, i.e. coming to grips with the variation
between languages. Clearly, this variation imposes a bound to the specificity
of the principles of UG in the sense that they must at least allow for
the attested variation. Hence, whereas the language acquisition problem
requires the principles of UG to determine the grammatical knowledge
of a particular grammar as closely as possible, where complete determi-
nation would constitute the optimal case, the complementary demand on
UG with respect to variation sets a limit to this fit. The study of
crosslinguistic variation from the perspective of UG suggests a notion of
markedness, which in turn becomes relevant to an understanding of
language acquisition. Several notions of markedness, or rather conside-
rations leading to the assumption that a certain option is marked, are
available in the literature. I shall first present these different notions of
markedness.
To start with, we can use the dative construction as an example. According
to one of the principles of UG, lexical NPs must be Case-marked. The
circumstances under which an NP is Case-marked may vary across
languages, within certain boundaries of locality, etc. Despite this variation,
at least two instances of Case-marking seem to be rather stable: assignment
of Nominative Case to the subject of full clauses and assignment of Objective
Case to the complement of verbs and prepositions. Let us assume that
these conventions are part of UG. The prepositional variant of the dative
construction is unmarked from this point of view, as both object NPs
are Case-marked in accordance with UG-determined conventions, whereas
the "prepositionless" dative construction poses a problem, i.e. it requires
a more specific mechanism. This is not to say that the mechanism involved
must be language specific or outside the scope of UG, but merely that
it is special and would hence require determination in a more specific
way. Assuming P-prep as above, the French value may be taken to represent
the unmarked case, while the English value is the marked value, which
must be fixed on the basis of positive evidence.
It should be noted that markedness considerations of this type are not
extensional, but intensional, as they pertain to the system generating
languages rather than to the number of languages having such a system,
or to the set of sentences generated by such a system. The reason to adopt
the English value of P-prep as marked does not necessarily mean that
English-type languages are necessarily less common than French-type
languages. What this means is that there is no a priori reason to assume
that what is marked with respect to language acquisition is also marked
in a distributional sense, although distributional factors may provide
motivation for specific markedness assumptions.
The assumption of markedness may just as well be determined on the
basis of acquisition data themselves. A case in point is the so-called pro-
drop parameter, which again may be taken to be binary, yielding the Italian-
type system with pro-drop, free-inversion and long subject extraction on
one value, and the French-type system, which lacks these three possibilities,
on the other value. I disregard at this moment the serious questions that
can be asked about the correctness of the parameter. It has been argued
by Hyams (1983) that early stages in the acquisition of English exhibit
pro-drop, unlike the target language, which should therefore be taken to
indicate that the English value is the more marked one, and thus requires
fixing, while an Italian-type system is stable in this sense, i.e. the initial
setting of the parameter is never changed in the course of the acquisition
process. Let us call this developmental markedness as opposed to distri-
butional markedness. There is no logical need that the system which is
said to be marked on the basis of such developmental considerations has
a more narrow distribution crosslinguistically: developmental and distri-
butional markedness would just not converge. If the developmentally prior
option is taken to derive from UG, distributional considerations become
irrelevant for the formulation of UG in principle. I return to this question
below.
At this point I would like to mention a further source of motivation
for particular markedness assumptions, which I shall call extensional
markedness (not to be confused with distributional markedness, which
is also external, but pertains to the number of languages, rather than to
the size of any particular language). In this case the markedness assumptions
derive from the no negative evidence hypothesis, i.e. the hypothesis that
children do not have access, at least not in a systematic way, to evidence
that something is impossible in the language they are exposed to. The
primary concern here is with the question of what prevents children from
constructing overly general grammars, which are not merely consistent
with the language to be acquired, but with a superset of that language.
In order to prevent this, the child is assumed to be conservative in the
sense of sticking to the least marked system possible, unless he is forced
by positive evidence to move over to the next least marked system, where
systems are ranked on a markedness hierarchy in terms of the extensions
of the sets generated by the different values or systems. The clearest
statement of this position is to be found in Wexler & Manzini (1987),
where a markedness hierarchy is provided for the notions governing
category and antecedent type for anaphors and pronouns.

As in the previous case, there is no intrinsic relation to distributional
markedness, i.e. most languages might, as a matter of fact, be pretty large
with respect to the governing category hierarchy for their anaphoric
elements. From the point of view of extensional markedness, such a
distributional fact would just be coincidental. There is of course by logical
necessity an intrinsic relation between developmental markedness and
extensional markedness, i.e. the system that the child starts off with should
be minimal from an extensional point of view. If Hyams' (1983) claim
concerning the pro-drop parameter is correct, therefore, pro-drop languages
should extensionally be smaller than non-pro-drop languages.
Although an ultimate assessment is rather complicated (see Saleemi,
this volume, for discussion), it would appear at first sight that rather the
opposite is true. Pro-drop languages allow both null and overt pronominal
subjects, preverbal and postverbal subjects, and long subject extraction,
while non-pro-drop languages only have overt pronominal subjects, subjects
occur on only one side of the verb, and they do not allow sentences with
long subject extraction. Hence, the two types of internal markedness
considerations (internal in the sense that they both have to do with language
acquisition itself, rather than with matters external to it) may also fail
to converge. This also raises the question of which of the two should
be given priority, if either of the two types is incorrect in principle.
To sum up, I have distinguished several notions of markedness, or rather
several types of considerations to determine which value of a parameter
P is marked:

1 distributional: α is unmarked relative to β if α is instantiated in a
larger number of languages than β
2 developmental: α is unmarked relative to β if α is developmentally
prior to β
3 extensional: α is unmarked relative to β if the set generated by P(α)
is a subset of the set generated by P(β)
4 intensional: α is unmarked relative to β if the system with P(α) is
"smaller" (in a sense to be made precise) than the system with P(β)

Considerations 1 and 2 are observational in the sense that the marked
character of β is determined purely in terms of what is actually found
in the data. The claim made by Jakobson suggests that there is an
observational convergence between these two domains. If this is not the
case, 2 should be given priority over 1, if considerations of this type are
correct to begin with. Similarly, if 2 does not square with 3 or 4, 2 should
be given priority. In the next section I shall argue that developmental
markedness considerations should be dismissed, as they are based on a
misconceived relation between output and system.

2. DEVELOPMENTAL MARKEDNESS

Let us take a closer look at considerations of type 2. This brings us to
the second issue mentioned in the introduction, viz. the relation between
UG and the stages of language acquisition. Much discussion within
generative grammar on the problem of language acquisition has focussed
on the logical rather than the developmental question, which is to say that
much of the discussion has taken place under the instantaneity assumption.
It is quite understandable that under this assumption considerations external
to the actual development, such as distributional markedness considera-
tions, play a prominent role. It is also true that parameters that are currently
being considered are parameters of variation rather than of systems as
they develop in the child. Considerations of extensional or intensional
markedness, on the other hand, do pertain to the developmental problem,
but in contrast to developmental considerations, their motivation is
independent of the actual process, and depends on the logical question.
What is remarkable is that children seem to be slow in some domains,
but extremely rapid in others, even though there is no sense in which
the one type of domain can be said to be more difficult, under whatever
extrinsic notion of complexity, than the other. A typical example of their
slowness is the acquisition of the correct past tenses of irregular verbs,
for which positive evidence is easily accessible, and quite often negative
evidence is provided as well. A more interesting case is the control over
disjoint reference of pronouns in a local domain, i.e. John washed him,
where children are slow in finding out that him and John must be different
persons. In order to account for delayed acquisition while the relevant
evidence had been available all the time, Borer & Wexler (1987) have
put forth the Maturational Hypothesis. As a specific example they suggest
that the notion of A-chain becomes available in a later stage of maturation,
which allows an explanation for the absence in earlier phases of a number
of properties that all seem to come in rapidly and simultaneously once
the A-chain capacity has matured. This specific hypothesis is discussed
at length in sections 4 and 5.
The Maturational Hypothesis leads us to reconsider both the question
of how to characterise early stages, and markedness considerations that
are based upon them. The absence of e.g. passive, a construction type
that is dependent on A-chain formation, at early stages could no longer
be taken as evidence for markedness considerations with respect to passive,
i.e. there would be no sense in which passive could be said to be marked
just because it is absent at a certain stage. By the same token, the fact
that early stages of both English and Italian children show pro-drop would
not need to have any bearing on the question about markedness of pro-
drop. More generally, considerations concerning developmental marked-
ness may be taken to be ill-conceived to begin with, if the Maturational
Hypothesis is adopted. The strong past tenses again provide a clear
illustration of what is wrong with the reasoning. It is well-known that
certain irregular past tenses are acquired very early. No one would accept
the conclusion that these forms are "unmarked" with respect to regularly
formed past tenses. The notion of markedness only makes sense relative
to a system, i.e. not in absolute terms. The irregular past tenses may be
considered marked relative to the regular system of past tense formation,
but before this system has become active, there is no sense in which such
forms are more marked than e.g. a form like horse.
The essential point is that whether phenomena displayed at some stage
of acquisition are determined by a system similar or identical to the adult
pro-drop system cannot be determined in an absolute sense. If that were
the case, markedness would be a mere taxonomy of observations. In Hyams'
view the grammars of Italian children are continuous with respect to the
pro-drop parameter, while English children have to change their initial
setting to the more marked value. However, if the setting of the pro-
drop parameter does not take place until a certain maturational stage
is reached, there is no obvious sense in which the grammar of Italian
children is continuous, even though its output may be unaffected.
To slightly elaborate on this, let us assume that the notion of pro-drop,
now restricted to the actual dropping of subject pronouns, has content
only relative to the interpretation of INFL-features. For example, let us
assume that the positive value of the pro-drop parameter is a function
of a pronominal interpretation of INFL-features, whereas a negative value
is consistent with an anaphoric status only, i.e. this set of features must
be licensed by entering into an agreement relation with an overt NP. Clearly,
then, the setting of the parameter value must be delayed until after the
acquisition of INFL-features, as the value is determined relative to these
features. It will be clear that before these features (or the node INFL
itself) are acquired, absence of overt subject pronouns cannot be interpreted
as resulting from the positive unmarked setting of the parameter, as there
is no sense internal to the system in which the notions relevant to this
parameter can play a part. There is a different sense in which the acquisition
of Italian is continuous as regards pro-drop, but this is an irrelevant
observational continuity under this construal, which cannot be explained
in terms of an identical setting of the parameter value. How the obser-
vational continuity is to be explained is again a different question. Here
I can only make a tentative suggestion.
One of the questions that still stands out in the assessment of early
child grammar is whether it is correct to assume that children drop subject
pronouns, or whether they drop pronouns more generally. It is certainly
not true that children do not "drop" object pronouns, but it is assumed
that object pro-drop is much less common than subject pro-drop. It seems
to me that in order to evaluate this quantitative distinction one has to
also take into account the relative distribution of subject and object
pronouns in adult speech. From this we know that the frequency of
pronominal subjects in transitive clauses is much higher than that of
pronominal objects. Looking at it as a dropping process, we must take
into account that from a discourse point of view the number of candidates
in subject position far exceeds the number of object candidates. We may
then assume that from a grammatical point of view there is initially no
asymmetry between subject and object drop, contrary to what Hyams
concludes.
Drawing on Rizzi (1986), Hoekstra & Roberts (1989) make a distinction
between content licensing and formal licensing, where content licensing
is interpreted as licensing in terms of θ-roles and formal licensing as licensing
in terms of "morphological" features. It is argued that the former can
be considered a form of D-structure licensing, while the latter is S-structure
licensing. Mechanisms of S-structure licensing have to do with the iden-
tification of the referent of an argument, e.g. through AGR-coindexation,
chain formation, or visibility in terms of phi-features or descriptive features.
Thus, the two arguments of a sentence like He kicked the boy are D-
structure licensed in terms of the argument roles assigned by the predicate
kick, while the agent argument is S-structure licensed through the phi-
features of the pronoun he and the patient argument is S-structure licensed
in terms of the descriptive content (plus quantification) in the NP the
boy.
I would now like to put forward the hypothesis that early child grammars
are characterised by the absence of an S-structure licensing requirement,
i.e. D-structure licensing suffices. Adopting a maturational perspective we
may interpret this hypothesis in terms of a maturational delay of S-structure
licensing. In Hoekstra & Roberts (1989) it is argued that under certain
conditions adult systems too allow arguments that are D-structure licensed
only, e.g. the null objects in the constructions discussed by Rizzi (1986)
and the null external arguments in middle constructions. In those cases,
the lack of S-structure licensing is compensated for by an additional form
of D-structure licensing (cf. Hoekstra & Roberts 1989 for details).
To make our hypothesis more specific, let us assume that S-structure
licensing is a function of Case marking. This seems to be quite reasonable
if we regard Case assignment as a way of providing visibility to arguments.
As we saw, there are two structural Case configurations, complements
of Verbs and Prepositions and the Specifier of tensed clauses. While P
and V assign Case to their complement under government, Nominative
Case is assigned under the mechanism of Head-Spec agreement. Given
this formal dissimilarity, we might expect an asymmetric growth of the
relevant assignment conventions, such that objective Case assignment is
acquired before Nominative Case assignment. If it is indeed correct that
the loss of "subject drop" takes place significantly later than the loss of
"object drop" this might be accounted for along these lines. I shall not
work out this hypothesis in this paper.
I want to conclude this section with a dismissal of developmental
markedness considerations. I have argued that output continuity cannot
be taken to reflect system continuity. If the value of P is set at stage
Sn, and stages Si prior to Sn exhibit phenomena that are characterisable
under P(α), this should not be taken as evidence that Si is generated with
a system including P(α), and that therefore P(α) is unmarked vis-à-vis
P(β), as P is not present in that system at all. The observational continuity
might be explained along different lines.

3. EXTENSION AND INTENSION

The notion of extensional markedness was developed by Wexler & Manzini
(1987) as part of a learning theory. The theory is rather minimal, but
it has a number of interesting theoretical consequences. Basically, what
the learning theory says is "assume the value x of P that generates the
smallest set, and let yourself be forced to y (x < y) only if some positive
data are outside the set generated by P(x)". In order to decide on a particular
value, the child has to consider the extension of what his system generates,
and compare it to the data he is being exposed to. A slightly different
conception would be that the system generating the smaller language is
"smaller" and that in order to adapt it so as to generate a larger language
if the input demands this, the system has to grow, e.g. by adding some
additional mechanism.
Let us make this difference somewhat more concrete by providing an
example. Take two languages, L1 having only local anaphors, as in English,
and L2 having long distance anaphors, as in Japanese. Under the extensional
perspective this variation is captured in terms of a different notion of
governing category relevant for the two languages. So, Japanese grammar
possesses the value GC2, while English has GC1, where extensionally L(GC1)
< L(GC2). Therefore, the Japanese child starts out assuming that he is
in a L(GC1) language, but being confronted with a long-distance bound
anaphor, he adopts GC2. Nothing in the grammar would suggest that
GC1 is simpler than GC2: their relative appearance is merely a function
of the differences in extensions. Taking an intensional perspective, we might
suggest that the grammar of L2 properly includes the grammar of L1 with
regard to anaphors, with the addition of some mechanism M, accounting
for the long distance binding phenomena, i.e. G(L2) = G(L1) + M. Spe-
cifically, M might be LF-movement of the anaphor, lacking in English,
but available in Japanese (cf. Pica 1987, Chomsky 1986).
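
The extensional procedure quoted above can be rendered as a simple conservative learner (my own sketch, not Wexler & Manzini's formalisation), using the GC1/GC2 case just described; the sentence labels below are placeholders standing in for locally and long-distance bound anaphor data.

# A minimal sketch (mine) of the extensional learning procedure: start with
# the parameter value generating the smallest language and move to the next,
# more marked value only when a positive datum falls outside the current
# extension. The data labels are placeholders, not real sentences.

def subset_learner(values, in_extension, data):
    """values: parameter values ordered by increasing extension.
    in_extension(v, s): True if sentence s is in the language generated
    by the grammar with value v."""
    current = 0
    for sentence in data:
        while not in_extension(values[current], sentence):
            current += 1                       # forced up the hierarchy
    return values[current]

def in_extension(value, sentence):
    local_only = {"locally-bound-anaphor"}
    with_long_distance = local_only | {"long-distance-anaphor"}
    return sentence in (local_only if value == "GC1" else with_long_distance)

# An English-type corpus never forces the learner off GC1; a Japanese-type
# corpus containing a long-distance bound anaphor forces the move to GC2.
print(subset_learner(["GC1", "GC2"], in_extension, ["locally-bound-anaphor"]))
print(subset_learner(["GC1", "GC2"], in_extension,
                     ["locally-bound-anaphor", "long-distance-anaphor"]))
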
This second conception boils down to saying that languages grow because
their grammars grow, where growing is either maturation or learning. While
a language may grow as a result of the growth of the system, it might
also shrink as a consequence of an addition to the system. If my suggestion
concerning the implementation of a Case requirement at a later stage is
correct, this would indeed occur. Notice that such shrinkings cannot be
accounted for under an extensional approach like the one advocated by
Wexler & Manzini (1987).
In the next two sections I shall discuss two recent proposals by Borer
and Wexler within the framework of UG constrained maturation to see
whether the evidence they put forward is consistent with the intensional
view of grammatical growth. I shall reject the relevance of the notion
of extensional markedness. A second question that I address in this
discussion is whether the growth responsible for the acquisitional progress
should be considered as resulting from maturation or from learning.

4. THE NOTION OF GROWTH: THE UNIQUE EXTERNAL ARGUMENT PRINCIPLE

In an interesting paper, Borer and Wexler (1988) make a claim which
is inconsistent with the notion of growth that I developed in the previous
section. They propose a principle, called the Unique External Argument
Principle (UEAP), which disappears in the course of maturation. This would
be an instance of ungrowing, i.e. of shrinking of the system. I want to
argue that the facts that motivate UEAP can be reinterpreted as resulting
from growth of the system, and suggest that this growth does not require
a maturational account. The second point is essentially independent of
the first, i.e. even if an intensional account can be argued to provide a
better alternative to UEAP, the growth might be triggered by learning
or maturation.
UEAP requires that each predicative element have its own subject. B&W
use this principle to explain the following situation. In Italian, participles
in the perfect do not normally agree with their nominal object (8a), although
there is agreement with clitic objects (8b), as well as with subjects in passive
constructions (8c), but not in unergative intransitive constructions (8d):

(8) a. Gianni ha letto i libri
Gianni has read[-AGR] the books
b. Gianni li ha letti
Gianni them has read[+AGR]
c. I libri sono stati letti
The books are been[+AGR] read[+AGR]
d. Gianni e Piero hanno corso
Gianni and Piero have run[-AGR]

Children diverge from this pattern in two respects: they uniformly have
agreement between the object and the participle in (8a), and there are
no occurrences of the perfect with intransitives of the type (8d). The question
is, how to capture the generalisation between non-occurrence of (8d) and
overgeneral agreement in (8a).
This is where UEAP comes in. With Borer & Wexler (1988) we must
make the basic assumption that agreement in early stages results from
the same mechanism that is operative in adult grammars, which is to say
that it results from a relation with a local subject (cf. Kayne 1986). UEAP
requires that every predicate element has its own unique subject. There
is no way in which this requirement can be met in (8d), as there are two
predicative elements (hanno and corso), but only one subject candidate.
UEAP can be met in (8a), however, if i libri is taken as the subject of
an adjectival participle letti, which must agree with its subject according
to the agreement rule. The overgeneralised agreement in (8a) is lost and
(8d) is let in as soon as UEAP disappears from the grammar. This way
the generalisation is captured.
Let us first turn to the epistemological status of UEAP. The interpretation
of UEAP given by B&W (1988) is a maturational one: "UEAP, we propose,
represents a maturational stage. While it constrains the early grammar,
it is, obviously, not a constraint on the grammar of adults" (1988:22).
Notice the implication of this for the hypothesis of UG-constrained
maturation. Not only are we to assume that certain portions of UG become
available at a certain maturational stage, other portions of UG become
unavailable at a certain maturational stage, since UEAP, a principle of
UG, is not characteristic of any adult system (by definition), but only
of certain stages of language acquisition, disappearing from the organism
in a way similar to the loss of the drowning reflex.
Notice that UEAP comes very close to a principle of adult-systems,
in effect one of the most basic principles of GB-theory, viz. the Projection
Principle. If a predicate has a role, it must be assigned to a unique argument.
Rather than taking UEAP as an independent principle, B&W suggest
looking upon it as a proto-principle that ultimately develops into this
principle. The difference between UEAP and the Projection Principle is
mainly a matter of scope, UEAP being wider in scope in the sense that
a predicate requires a subject independent of the assignment of an argument
role to it. The question we have to answer then is how this scope is narrowed
down, so as to capture the relevant generalisation, viz. that loss of agreement
in (8a) and the emergence of (8d) are simultaneous.
An interesting fact, to which B&W (1988) do not attach any significance,
is that Italian children do construct the perfect tense with intransitive verbs,
but only if these belong to the essere-selecting class, i.e. the ergative
intransitives (cf. Burzio 1981). In itself this is not problematic for UEAP,
as an exception is made as regards the application of UEAP precisely
for essere 'be'. Hence, a sentence like Maria è caduta 'Mary is fallen
(+AGR)' does not pose a problem for UEAP, because the only predicative
element subject to UEAP is caduta, the participle.
The notion of exception to UEAP, restricted by B&W to essere, provides
an alternative way of capturing the relevant generalisation. Once we state
that avere is an exception too, the generalisation is likewise captured:
avere in (8a) and (8b) would no longer require a subject of its own.
What does it mean to say that a (verbal) element is an exception to UEAP?
Formulated in terms of UEAP, an exception would be a verb that shares
its subject with another verb.
Clearly, for a verb to share its subject with another verb means that
it is an "auxiliary". This is not the place to enter into a discussion of
how to represent auxiliaries, either in adult or in child grammars. Various
options are currently available. The crucial property would appear to be
that these verbs do not have a thematic grid. Whatever its implementation,
such a notion must be made available by UG. The question then becomes
whether this notion is continuously available, or whether it comes in at
a certain maturational stage. The fact that essere is taken as an auxiliary
in the relevant sense at a very early stage suggests the former answer,
but this answer, as always, raises the triggering problem: why is the
acquisition of auxiliary avere (as well as of other auxiliaries) more delayed
than that of essere? If a satisfactory answer can be given to this question, a
maturational account is not called for.
It would seem that the primary data provide unambiguous evidence
for the auxiliary status of essere: it is an auxiliary in every sentence in
which it occurs, i.e. it always shares its subject with another predicative
element (with an adjectival predicate as a copula, with a PP as a locational
predicate, and with a participle as a temporal or passive auxiliary). Avere,
on the other hand, is like have in English: it is a main verb in the simplest
sentences in which it occurs (John has a bicycle), it may take small clausal
complements (John has his door open), also with participles as their predicate
(John has fugitives hidden). Under the latter structural analysis of (8a),
agreement is predicted. In short, most of the input with avere either requires
or is consistent with a non-auxiliary interpretation of avere. I further claim
that the child will assume that all occurrences of a particular form instantiate
the same lexical requirements. Analysing avere as a main verb will not only allow,
but in fact impose the relevant small clause interpretation of (8a). At the
same time (8d) cannot be formed. Only input of the type (8d) then forces
the child to revise his initial assumption concerning avere, so as to also
use it as an auxiliary verb. This explains the developmental delay of the
auxiliary avere.
Under this account of the developmental delay which is illustrated by
the deviations from the pattern in (8), the grammatical development can
be formulated in terms of growth. What grows is the system that at first
takes avere to be a main verb, selecting its own subject, but then allows
an interpretation of avere as an auxiliary. This process of growth does
not require a maturational account, as sensible explanations for the delay
in identifying the auxiliary status of avere are available. Hence, the only
shift is one of adding lexical information, which is compatible with learning.

5. A-CHAINS

The major motivation for Borer and Wexler's maturational approach
derives from the growth of passives. Verbal passives do not occur at early
stages. Borer and Wexler explain this developmental delay by assuming
that A-chains mature. To be sure, certain passives do occur at stages before
the alleged maturation of A-chains, but they analyse these as adjectival
passives. Unlike verbal passives, adjectival passives do not involve A-chains,
i.e. they are not created by syntactic movement, but are formed in the lexicon,
as proposed by Wasow (1977).
In this section I shall first argue against the claim that A-chain formation
is unavailable at the relevant stage. This argument is based on the analysis
of ergative verb constructions, which also involve the formation of A-
chains. In 5.2. I turn to passives, and argue that the distinction between
the two types of passives is insufficiently clear. I then claim that restrictions
on early passives do not result from the absence of A-chain formation,
but from independent factors.

5.1. Ergatives

Within the class of intransitives a distinction is made between unergatives,
which take their single argument as an external one, and ergatives, the
single argument of which is projected internally. This internal argument
is moved to the subject position for reasons of Case, as ergative verbs
do not assign Case to the NP they govern.
If NP-movement is unavailable in early stages of the acquisition process,
it follows that children cannot make the distinction between ergative and
unergative intransitives in the way GB-theory represents this distinction.
More specifically, children must represent all their intransitives as uner-
gatives. This in turn implies that generalisations which are formulated
in terms of the class distinction will either fail to hold or be captured
in other terms. A case in point is the selection of perfective auxiliaries,
which is sensitive to the (un)ergativity of the verb (cf. Burzio 1981 for
Italian, Hoekstra 1984 for Dutch). Dutch children are correct in this respect
very early, long before the purported emergence of A-chains. The same
appears to be true for Italian children. This implies that they are sensitive
to the distinction. If the distinction is not represented in the way it is
assumed to be in the adult grammar, the mechanism for auxiliary selection
should equally be different. This would raise the question of why children
would ever change their system.
In order to motivate the claim that children represent ergative and
unergative predicates in the same way, viz. as unergatives, B&W adduce
cases of overgeneralisation of lexical causativisation, reported for English
children by Bowerman (1982). So, apart from transitives such as John
broke the glass related to the ergative intransitive The glass broke, children
are reported to form, alongside unergative intransitives like I sneezed,
transitive causatives like Daddy's cigar sneezes me. To explain this, B&W
assume that, given the fact that they also have to represent intransitive
break as unergative, children are forced to formulate a causativisation
rule that is marked, while in adult English causatives are formed by an
unmarked rule. The marked rule requires the internalisation of an external
argument, while the unmarked one would solely add an external causer
argument to a verb that did not yet have an external argument. It is only
after the maturation of A-chain formation that the child realises that some
of the causative/inchoative patterns are consistent with the unmarked
instantiation of the rule. Once this is realised, the child stops overgene-
ralising, as he drops the assumption that the marked rule is operative
in the language he is learning.
There are several problems with this analysis. The most basic of these
is that the hypothesis lacks a perspective on the way in which the difference
between an ergative and an unergative representation of a particular item
is determined. It is unclear, therefore, how, after the A-chain mechanism
has come into the child's reach, he finds out which of his intransitive
verbs have an erroneous representation, given that an unergative repre-
sentation is still available after the emergence of A-chain formation. Related
to this is the observation that reported cases of overgeneralisation are
not random. I shall elaborate on this matter below.
First, however, I would like to consider the notion of marked causative
rule itself. From a crosslinguistic point of view, the notion of a marked
causative rule such as the one employed by B&W seems highly suspect. They
notice in passing that the English rule makes use of a zero-affix. Such
zero-causative formation of the English type occurs in many languages,
but the rule always seems to be restricted to ergative verbs. On the other
hand, causative formation that makes use of overt affixation usually is
not restricted in this way. The discussion of the Hebrew hifil by B&W
makes clear that this is true for Hebrew. While an exploration of this
correspondence is outside the scope of this paper, it can hardly be considered
a coincidence. The relevant generalisation comes close to UEAP in a certain
sense, in that what it boils down to is that no morphologically simple
verb can have two external arguments, while a morphologically complex
verb, part of which can be taken to represent the causative meaning, may
allow an external argument on top of the external causer argument. This
should follow from a principle of UG (cf. Hoekstra, forthcoming, for
discussion). If that is correct, the marked rule of causative formation would
fall outside the scope of UG, and would therefore be excluded by the
program of UG-constrained maturation advocated by B&W.
If the assumption that children can have no ergative representation is
given up, the overgeneralisation of the causative rule can be used to make
the opposite claim. Rather than saying that the overgeneralisations are
the result of an overly general rule, they might be attributed to an
overgeneralised use of ergative representation, i.e. the verb sneeze in the
above example might have been represented as an ergative verb, thus falling
within the reach of the unmarked causative rule, in effect the only rule
consistent with UG, if I am correct.
This brings us to the question mentioned above, namely how the choice
between an ergative and an unergative representation is determined. It
is clear that the notion of external argument used above is semantically
determined. Certain argument roles qualify as external, others as internal,
still others are perhaps less clear in this respect. We are referring here
to the idea that there is a universal basis for the alignment of participant
roles and their linguistic representation, known as the Universal Alignment
Hypothesis (UAH) (cf. Pesetsky 1987). The principle has been questioned
in the current literature, on the basis of variation between languages with
respect to the set of ergative verbs. However, such variation does not
militate against UAH: the meaning of translation equivalents may be
different in a subtle way, e.g. Italian ergative arrossire 'become red' is
used to denote the kind of event for which Dutch uses unergative blozen
'blush'. Yet, the fact that they constitute translation equivalents is not
sufficient ground for taking them to mean the same thing: arrossire denotes
a change of state, whereas blozen denotes a bodily function, behaving in
similar fashion to verbs like lachen 'laugh' etc. Hence, whereas certain
concepts determine a unique linguistic meaning, with a unique set of
argument roles, other concepts less clearly determine the argument roles
of the verb, and in those cases variation between languages is expected.
This does not affect the status of UAH: it stipulates for a given argument
role how it is projected onto a grammatical function. The point here is
that the fact that the participant roles are not always uniquely determined
does not mean that the choice of an ergative or unergative representation
is always arbitrary.
The essential ingredients of this hypothesis also underlie ideas such as
Pinker's (1984) semantic bootstrapping hypothesis. According to UAH,
agents are uniformly represented as external arguments, while themes are
taken as internal arguments. In dealing with concepts that determine the
participant roles less clearly, the child has the same hypothesis space as
languages have: in the absence of any grammatical indications, one may
wonder whether the sole argument of (adult) sneeze is an experiencer or
theme, undergoing a process, or whether it should qualify as an agent.
The child might have the same difficulty. Precisely under these circums-
tances, erroneous representations are to be expected. The non-random
character of the overgeneralisations which are reported follows from this
perspective on the nature of the determination of the external/internal
status of participant roles.
To sum up, overgeneralisations of the causative rule in English do not
provide sufficient motivation for the claim that all ergative verbs are initially
represented as unergatives. Such a claim would undercut the essence of
the UAH, in the absence of which the way in which arguments are linked
up with grammatical functions would be arbitrary in principle. Moreover,
the claim requires that children exploit mechanisms which should be
excluded as a matter of principle, such as a marked causative rule, as
well as mechanisms for e.g. auxiliary selection and agreement which are
quite different from the mechanisms assumed for adult systems. None of
this is needed if the claim that A-chain formation is unavailable is given
up.

5.2. Passives

If the argument in the previous subsection is correct, the maturational
account of the delay in passivisation in terms of A-chain formation cannot
be correct. As I mentioned above, passive sentences are not really absent
in early child language, but, as B&W note, the range of passives in early
grammars appears to be restricted in significant ways. They take this as
evidence that the child has the ability of constructing adjectival passives
at an early age, but not verbal passives. Unlike the latter, adjectival passives
do not involve movement.
The distinction between adjectival and verbal passives was originally
made in Wasow (1977) and supported with further evidence by Williams
(1982). However, the theoretical underpinnings of the distinction have not
gone unquestioned. The logic in Wasow's approach was to isolate a number
of "adjectival" contexts and to show that certain types of passives are
not found in these contexts. To give an example, the prenominal position
as in (10) is taken to be an adjectival context, i.e. if a participle occurs
there, it must be an adjective. The types of passives in (9) are all impossible
in this position. This would be explained if these types of passives are
impossible qua adjectival passives, i.e. if they could only be derived by
movement.

(9)  a. John was believed to be foolish (raising passive)
     b. Mary was given a book (indirect object passive)
     c. The war was prayed for (prepositional passive)
     d. John was elected president (small clause complement passive)

(10) a. *A [believed to be foolish] person
     b. *A [given a book] girl
     c. *A [prayed for] war
     d. *An [elected president] candidate

As argued in Hoekstra (1984) these criteria are non-explanatory in the
sense that they fail to make the correct crosslinguistic predictions. Consider
the examples in (11):

(11) a. het mij tijdens de lunch gegeven cadeau
        the me during lunch given present
     b. de door iedereen ongeschikt geachte kandidaat
        the by everyone unfit considered candidate

The participle gegeven in (11a) can take an NP complement, something
which adjectives do not. In (11b), the participle takes a SC-complement
(cf. (10d)). The reason for the ungrammaticality of the English examples
in (10) is the Head Final Filter (Williams 1982) or whatever explains this
filter. This filter states that the head of a prenominal modifier must occur
at the right periphery of its phrase, both in Dutch and in English. Due
to a difference in recursive side, English being head-initial, Dutch head-
final as far as VP is concerned, the range of possible prenominal participle
constructions in English is much smaller than in Dutch. This does not
constitute an argument in favour of an adjectival status, nor for a distinction
between two types of passive formation. Similar reasoning holds for the
other tests that Wasow proposes (cf. Hoekstra 1984 for detailed comments).
I should perhaps stress that I do not claim that there are no adjectival
participles. My point is that the way in which the distinction between
adjectival participles and verbal participles is made, as well as the con-
sequences for the analysis of passives based on it, are not sufficiently
motivated. In particular, the adjectival nature of a participle cannot simply
be taken as sufficient ground for the claim that no movement is involved.
B&W note that early passives are restricted to action verbs, i.e. passives
with non-actional verbs such as see, like etc. are not found. They relate
this to their claim that children only exploit adjectival passive formation,
noting that participles of such verbs do not easily occur in "adjectival"
contexts either (cf. *a liked man). Another property of early passives is
that they are usually truncated, i.e. without a by-phrase. They furthermore
claim that adjectival passives resist by-phrases also. Interestingly, com-
parative evidence again suggests that the relation they see between the
use of by-phrases and passivisability of non-actional verbs is spurious as
well. The range of application of passive in English is quite atypical. Many
non-actional verbs that may be passivised in English resist passivisation
in Dutch, but the use of the Dutch counterpart of the by-phrase is not
at all restricted. The same is true for many other languages. Conversely,
there are many languages in which passives are always truncated (cf.
Siewierska 1985).
However, even if we were to accept a qualitative difference of the type
that B&W argue to be characteristic of adjectival versus verbal passives,
we need not have recourse to an account in terms of adjectival passive
formation versus movement. Current analyses of the passive in GB-theory
claim that there is no elimination of the external argument role, but rather
that this role is assigned to the passive affix (cf. Roberts 1985, Hoekstra
1986, Jaeggli 1986). The unmarked assumption might be that the passive
affix is eligible only for agents, unmarked in the distributional sense. The
rise of the by-phrase may be an independent, perhaps maturationally delayed
step, which may prompt a more marked hypothesis with respect to the
argument roles for which the passive affix is eligible if the language provides
positive evidence for such a setting.
Summing up this part, we have seen that the hypothesis that A-chain
formation matures at a stage when particular types of passives come into
use meets with several problems. If we adopt some version of UAH, children
will postulate ergative representations for ergative verbs right from the
start, which thus prompts the use of A-movement. This is consistent with
the observation that Dutch and Italian children appear to be sensitive
to the distinction between ergative and unergative verbs if we look at
auxiliary selection in both languages, and participial agreement in Italian.
I have argued that this hypothesis might also derive some support from
the observation that Dutch children do not seem to overgeneralise the
causative rule in the way English children are reported to do. I argued
that these overgeneralisations in English are the result of an erroneous
interpretation of verbs as ergatives, which obviates the postulation of a
marked rule of causativisation. The late appearance of certain types of
passive constructions might be explained in maturational terms, not with
regard to A-chain formation, but perhaps in terms of the mechanism
involved in the use of the by-phrase.
A scenario of this type is consistent with the conception developed above,
according to which expansion of the language generated is a function of
an expansion at the level of the system. The development of passives can
be seen as starting with the initial assumption that the passive morpheme
may only receive the role of agent. This accounts for the absence of a
large class of passives available in adult English, but the notion itself can
be motivated with reference to other linguistic systems, where the use of
the passive construction is similarly restricted. If early passives are the
result of such a system, there is no reason to invoke the absence of A-
chains to explain the absence of certain types of passives.

6. CONCLUSION

In this paper I have discussed various notions of markedness that are
applied to the setting of parameter values in the course of language
acquisition. I have dismissed a particular type of markedness consideration,
viz. the type that I called developmental. I argued that the notion of
markedness can only be relevant with reference to a particular system.
Early stages of language acquisition may result from systems for which
particular parameters are not yet relevant.
I then addressed the question of what determines the expansion of early
systems. Two hypotheses have received a certain prominence, viz. the
continuity hypothesis and the maturational approach. While I am in
principle sympathetic to the maturational approach, two specific proposals
that I discussed appeared open to alternatives which are more in line with
a notion of growth. Growth of a language is a consequence of growth
of the system, which suggests a notion of complexity, rather than mar-
kedness per se. It is an empirical question whether this notion of complexity
has a direct relation to the notion of distributional markedness.

FOOTNOTE

*I would like to thank the following persons for conversations about the subject matter:
Harry van der Hulst, Hans Bennis, Jan Voskuil and Rene Mulder. A special thanks goes
to Hagit Borer, for giving comments which may have led to clarifications, although she
is bound to disagree on a number of points.
REFERENCES

Bowerman, M. 1982. Evaluating competing linguistic models with language acquisition data.
Semantica 3. 1-73.
Borer, H. and K. Wexler. 1987. The maturation of syntax. In T. Roeper and E. Williams
(eds) Parameter setting. 123-172. Dordrecht: Reidel.
Borer, H. and K. Wexler. 1988. The maturation of grammatical principles. Ms. UC at Irvine.
Burzio, L. 1981. Intransitive verbs and Italian auxiliaries. Doctoral dissertation, MIT.
Chomsky, N. 1986. Knowledge of language: its nature, origin and use. New York: Praeger.
Hoekstra, T. 1984. Transitivity. Dordrecht: Foris.
Hoekstra, T. 1986. Passives and participles. In F. Beukema and A. Hulk (eds) Linguistics
in the Netherlands 1986. 95-104. Dordrecht: Foris.
Hoekstra, T., forthcoming. Theta theory and aspectual classification.
Hoekstra, T. and I. Roberts. 1989. The mapping from lexicon to syntax: null arguments. Paper
delivered at the Groningen conference "Knowledge and language".
Hyams, N. 1983. The acquisition of parametrized grammars. Doctoral dissertation, CUNY.
Jaeggli, O. 1986. Passive. Linguistic Inquiry 17. 587-622.
Jakobson, R. 1941. Kindersprache, Aphasie und allgemeine Lautgesetze. Uppsala.
Kayne, R. 1981. On certain differences between French and English. Linguistic Inquiry 12.
349-372.
Kayne, R. 1986. Principles of participle agreement. Ms. University of Paris VIII.
Pesetsky, D. 1987. Psych predicates, universal alignment, and lexical decomposition. Ms. UMASS
at Amherst.
Pica, P. 1987. On the nature of the reflexivization cycle. NELS 17. 483-500.
Pinker, S. 1984. Language learnability and language learning. Cambridge, Massachusetts:
Harvard University Press.
Rizzi, L. 1986. Null objects in Italian and the theory of pro. Linguistic Inquiry 17. 501-
557.
Roberts, I. 1985. [1987] The representation of implicit and dethematized subjects. Dordrecht:
Foris.
Siewierska, A. 1985. The passive. London: Croom Helm.
Wasow, T. 1977. Transformations and the lexicon. In T. Wasow, P. Culicover and A. Akmajian
(eds) Formal syntax. 327-377. New York: Academic Press.
Wexler, K. and R. Manzini. 1987. Parameters and learnability in binding theory. In: T.
Roeper and E. Williams (eds), Parameter setting. 41-76. Dordrecht: Reidel.
Williams, E. 1982. Another argument that passive is transformational. Linguistic Inquiry
13. 160-163.
Nativist and Functional Explanations in
Language Acquisition
James R. Hurford
University of Edinburgh

1. PRELIMINARIES

1.1. Setting and Purpose

Current theories of language acquisition and of linguistic universals tend
to be polarised, adopting strong positions along dimensions such as the
following: formal (or nativist) versus functional; internal versus external
explanation; acquisition of language versus acquisition of communication
skills; specific faculté de langage versus general cognitive capacity.
As with many enduring intellectual debates, there is much that is
convincing and plausible to be said on each side. Some works are very
polemical, apparently conceding little merit in the opposing point of view.
Some so-called 'functional' explanations of language universals, which
appeal to properties of performance mechanisms, e.g. the human parser,
miss the important point that these mechanisms are themselves innate and
as much in need of explanation as the properties of the linguistic system.
Another class of proposed functional explanations for language universals,
which appeal to the grammaticalisation of discourse patterns, fail to locate
this mechanism in the life-cycle of individual language-knowers. On the
other hand, some nativist explanations imply that they are complete, having
finally wrapped up the business of explaining language acquisition, missing
the point that the demand for explanations never ceases, and that the
'solution' to any given puzzle immediately becomes the next puzzle.
The appearance of a direct confrontation between nativist and functional
styles of, or emphases in, explanations of language acquisition and linguistic
universals was greater in the 1970s than it has been recently, as Mallinson
(1987) emphasises. Golinkoff and Gordon (1983) give a witty, but fairly
accurate, historical account of the pendulum-swings and emphasis-shifts
in the debate since the inception of generative grammar. Regrettably, the
embattled spirit of the barricades survives in some quarters, as in New-
meyer's (1983) review of Givon (1979), itself a sharp polemic, and in the
exchange between Coopmans (1984) and Hawkins (1985).
In an area where polemic is so rife, the truth-seeker can be distracted
or misled by a number of false trails which it is as well to be able to
recognize in advance. The following are some types of distraction to be
ever vigilant for: (1) Unannounced theory-laden use of everyday terms,
such as 'language', or 'universal' (for instance, using 'language' to mean
just the unmarked core grammar, or excluding phonology); (2) The
assumption of a monolithic research enterprise, such that a criticism of
any single aspect of it is taken as a blanket attack on the whole; and
(3) Sheer mistaking of an opposing position, taking it to be something
other than (even the opposite of) what it really is (a distressingly frequent type
of mistaking involves elementary failure to distinguish between 'all' and
'some' in an opponent's exposition).
I assume that the readership of this book will not consist wholly, or
even largely, of convinced generative linguists, but will include people such
as psychologists studying language acquisition, linguists with a more
anthropological emphasis, philosophers who ponder issues of language
structure and use, sociolinguists, and theorists of historical language change,
to all of whose work logical issues in language acquisition are relevant.
Being concerned with outlining a synthesis of approaches accessible to
workers in these different areas, my points will typically be at a quite
general level, and I will often resort to quoting relevant work from the
various fields. The distinctions I discuss will tend to be broad distinctions
between domains of study, rather than the finer distinctions identified by
workers within domains. Seekers after very specific proposals about models
and mechanisms will not find them here. But, at this general level, I will
propose a model for the interaction of language use and language
acquisition, in which I believe all students of language, from psycholinguists
through 'core' linguists to sociolinguists and historical linguists, will be
able to identify a part which is theirs.
A colleague has likened this attempt at synthesis to waving a flag in
the no-man's-land between two entrenched armies shooting at each other,
with the consequent likelihood of finding oneself full of bullet-holes. But
the military metaphor is, one hopes, inappropriate to scholarly work.
Synthesizing, integrating work must be attempted. This is not to discourage
any individual researcher from trying to mount a strong case that such-
and-such an aspect of language should be attributed to the influence of
the innately structured LAD (or alternatively to what I shall call the Arena
of Use), nor to dissuade any rival researcher from trying to demolish such
a case, on theoretical or empirical grounds. Indeed such efforts, locally
partisan as they are, are the sine qua non of the growth of knowledge
in the field. What I am trying to discourage is a dismissive, globally partisan,
academically totalitarian kind of view, one that holds that explanations from
innateness (or, for the opposing partisan, from use) are simply not worth
serious consideration, on either theoretical or empirical grounds.
1.2. Glossogenetic and Phylogenetic mechanisms

The dimension of diachrony, only skimpily treated in previous discussions,
provides a coherent background within which function and innateness can
be consistently accommodated. Functional explanations of language ac-
quisition can be compatible with nativist explanations, provided one gets
the timescale right. The much-debated dichotomy, innate versus functional,
is a red herring. The basic dichotomy is, rather, phylogeny versus ontogeny,
and also the related nature versus culture. Function is not 'opposed' to
any elements in these dyads, but exerts its influence on all.
The issue of the relation between linguistic development and other
(cognitive, social, etc.) experience can be set in different timescales, short-
term or long-term. Such experiences may be directly involved with linguistic
development within the time-span of an individual's acquisition of his
language, a period of a few years; or, at the other extreme, the outcomes
of experiences of members of the species over an evolutionary timescale
lead to the natural selection of individuals innately equipped to acquire
systems with particular formal properties. The idea of short-term (onto-
genetic or glossogenetic) timescales versus long-term (phylogenetic) times-
cales in explanations for linguistic facts is important to an overall view
of the relation between function and innateness. The term 'glossogenetic'
reflects a focus on the development and history of individual particular
languages; language-histories are the rough cumulation, over many ge-
nerations, of the experiences of individual language acquirers. The biological
endowments of successive generations of language acquirers in the history
of a language do not differ significantly, and so linguistic ontogeny, and
its cumulation, language history, or glossogeny, are to be distinguished
from linguistic phylogeny, the chronologically vastly longer domain, in
which biological change, affecting the innate language faculty, takes place.
After the present section of preliminaries, the second and main section
of this paper will be devoted to the short-term, onto- or glossogenetic
mechanism of functional influence on language form.
A detailed exposition of the phylogenetic mechanism of functional
influence on language form is, unfortunately, too long to be included in
this collection of papers, and is to be published elsewhere (Hurford, 1991).
The phylogenetic mechanism is mentioned briefly by Chomsky and Lasnik
(1977:437), but although their note has been echoed by various subsequent
authors (e.g. Lightfoot, 1983:32, Newmeyer, 1983:113, Lasnik, 1981:14,
Wexler, 1981:40), it has not initiated an appropriate strand of research
into functional explanations of language universals at the level of evolution
of the species.
Despite acceptance of the premise that functional explanations for
linguistic universals do operate at the level of evolution of the species,
remarkably little further gets done about it. Contributions from linguists,
of whatever theoretical persuasion, (e.g. Lightfoot's section "Evolution
of Grammars in the Species" (Lightfoot, 1983:165-169) and Givon's chapter
"Language and Phylogeny" (Givon, 1979:271-309)) remain sketchy, su-
perficial, and anecdotal.
On the other hand, a more promising sign is Pinker and Bloom's (1990)
paper, in which they systematically address some of the major skeptical
positions (e.g. of Piattelli-Palmarini, 1989, Chomsky, and Gould) concer-
ning natural selection and the evolution of the language faculty. Several
other articles (Hurford, 1989, 1991a, 1991b; Newmeyer, forthcoming) make
a start on working out proposals about how quite specific properties of
the human language faculty could have emerged through natural selection.
To whet the reader's appetite, without, I hope, appearing too enigmatic
or provocative at this stage, I give here a short paragraph with a diagram
(Figure 1), sketching the phylogenetic mechanism, and a table (Table 1),
summarising the major differences between the glossogenetic and the
phylogenetic mechanisms. Deep aspects of the form of language are not
likely to be readily identifiable with obvious specific uses, and one cannot
suppose that it will be possible to attribute them directly to the recurring
short-term needs of successive generations in a community. Here, nativist
explanations for aspects of the form of language, appealing to an innate
LAD, seem appropriate. But use or function can also be appealed to on
the evolutionary timescale, to attempt to explain the structure of the LAD
itself.

The phylogenetic explanatory scheme I envisage is as follows:

Biological        FACTORS INVOLVED IN              Language
mutations   -->   SUCCESSFUL COMMUNICATION    -->  Acquisition
                  IN THE HUMAN ENVIRONMENT         Device
                  (THE ARENA OF USE)

Fig. 1.

Here biological mutations plus functional considerations constitute the
explanans, and the LAD itself constitutes the explanandum. The LAD is
part of the species' heredity, the result of mutations over a long period.
TWO TYPES OF FUNCTIONAL EXPLANATION

                          GLOSSOGENETIC              PHYLOGENETIC
                          (Sec. 2 of this paper)     (Hurford, 1989, 1991b)

Usefulness felt:          In short term              In long term
                          (every generation)         (evolutionary timespan)

Transmission:             Cultural                   Genetic

Knowledge determined      Typically, well            Typically, poorly
by data:                  determined                 determined

Innovation by:            Invention, creativity      Biological mutation
                          of individuals

Typical explanandum:      Language-specific          Universal

Competition in            Between languages          Between classes of
Arena of Use:             (Ln vs Ln+1)               languages

Motivating analogy:       Language as a TOOL         Language as an ORGAN

Table 1.

Much of the present paper will be an extended commentary on the rubrics
in this table, especially those in the 'Glossogenetic' column. Before getting
down to the details of the glossogenetic mechanism in Section 2, there are
a couple of general preliminary plots to be staked out, in the remainder
of this section.

1.3. Competence/performance, I-Language/E-language

Explanations differ according to what is being explained. This is a truism.
But much discussion of 'explaining linguistic phenomena' uses that phrase
to smother an important distinction, the distinction between grammaticality
and acceptability (competence and performance, I-language and E-
language). The distinction is central to the Chomskyan enterprise, and
has been a frequent target of attack, or source of misgivings. In the literature,
for instance, one finds widely-read authors writing:

"The distinction between competence and performance - or grammar and speaker's


behavior - is ... untenable, counterproductive, and nonexplanatory". (Givon, 1979:26)

"The borderline between the purely linguistic and the psychological aspects of language
... may not exist at all". (Clark and Haviland, 1974:91)
"There is a whole range of different objections from sociolinguists, sometimes querying


the legitimacy of drawing the [competence/performance] distinction at all". (Milroy,
1985:1)

Givon's book is still widely discussed, Herb Clark is an influential
psychologist, and Lesley Milroy speaks for a body of sociolinguists for
whom the competence/performance distinction itself is still a current issue.
In the context of their expressed doubts about competence/performance
(alternatively I-language/E-language), and concomitantly grammaticality/
acceptability, it is relevant to reassert this distinction. Despite such doubts
and attacks, I will maintain here that many clear cases of the distinction
exist, while conceding that there are borderline linguistic phenomena whose
classification as facts of grammar or facts of use is at present problematic.

Some early, perhaps overhasty, conclusions claiming to have explained
aspects of grammar in functional terms can now be reinterpreted as
explaining phenomena more peripheral to the grammatical system, such
as stylistic preference, or acceptability. For instance, this is how Newmeyer
(1980:223-226) depicts Kuno's various functional explanations: 'Kuno's
approach to discourse-based phenomena has gradually moved from a
syntactic one to one in which the generalisations are to be stated outside
of formal grammar' (Newmeyer, 1980:224). Such reinterpretation follows
shifting (and, one hopes, advancing) theories of the boundary between
grammatical phenomena proper and acceptability and style.

For concreteness, I will give some examples, all for Standard English,
of how I assume some relevant phenomena line up:

(1) GRAMMATICAL, BUT OF PROBLEMATIC ACCEPTABILITY
    Colourless green ideas sleep furiously.
    The mouse the cat the dog chased caught ate some cheese.
    The horse raced past the barn fell.

(2) UNGRAMMATICAL, AND OF PROBLEMATIC ACCEPTABILITY
    *He left is surprising.
    *The man was here is my friend

(3) UNGRAMMATICAL, BUT OFTEN ACCEPTABLE
    *He volunteered three students to approach the Chairman
    *She has disappeared the evidence from her office
A degree of relative agreement between individuals, and certainty within
individuals, about the above examples does not mean that there can't be
genuine borderline cases. There may well be slight differences between
individuals in their genetically inherited language faculties,1 and the input
data is certainly very variable from one individual to another, as is the
wider social context of language acquisition. And (any individual's ins-
tantiation of) the language acquisition device itself may not be structured
in such a way as to produce a classification of all possible wordstrings
with respect to their grammaticality.
This classification of patterns of linguistic facts as grammatical or
otherwise does not depend, circularly, on the kind of explanatory me-
chanism one can postulate for them, but rather primarily in practice (though
by no means wholly in principle) on that classical resource of generative
grammar, native speaker intuitions of grammaticality (themselves not
always easily accessible).
In fact, from a linguist's viewpoint, the sentences (1-3) constitute a
heterogeneous bunch, conflating much more interesting distinctions which
these very sentences, if aptly exploited, could well emphasise, for instance,
grammaticality versus parsability, grammaticality versus first-choice par-
sing strategies, semantically correct versus conceptually empty sentences
etc. But I am stressing here a more basic point. The grammaticality/
acceptability distinction, paralleling the competence/performance (I-
language/E-language) distinction, is an absolutely crucial foundation upon
which the further much more interesting distinctions can be elaborated.
Only if it is accepted can one progress to the more interesting distinctions.
In this paper my concern is to investigate the relationships obtaining between
the domain of grammar, on the one hand, and nongrammatical, e.g.
processing-psychological and social, domains, on the other hand. For my
purposes, as it turns out, these other domains can, at a broad general
level, be lumped together, so far as their role in potential functional
explanations for aspects of linguistic competence is concerned, although
obviously a study with a different focus of attention would immediately
separate and distinguish them. Sociolinguistics, pragmatics and discourse
analysis, and psycholinguistics are disciplines with highly divergent goals
and methodological styles. (Thus 'functional explanation' is likely to be
interpreted in different ways by sociolinguists and psycholinguists.)
Chomsky is entirely right in emphasising that a language (E-language)
is an artifact resulting from the interplay of many factors. Where I differ
from his judgement is in my belief that this artifact is of great interest,
that it is susceptible to systematic study (once its diverse component factors
are identified), and that it can in fact affect grammatical competence (I-
language).
Given the grammaticality/acceptability distinction, and a classification,
however tentative, of linguistic facts according to this distinction, the search
for explanations must provide appropriate explanatory mechanisms for
the different kinds of linguistic phenomena. The explanatory task for
grammaticality facts can be couched fairly naturally in terms of language
acquisition: 'How does a person acquire a particular set of intuitive
judgements about wordstrings?' But the explanatory tasks for the various
diverse classes of acceptability facts are not naturally couched in terms
of language acquisition.
Different kinds of questions require different kinds of answers, but this
does not mean that, for example, perceptual strategies can ultimately play
no part in explaining how a child acquires certain grammaticality jud-
gements. And, conversely, it does not mean that grammatical facts
(competence) can play no part in processing. (To the linguist convinced
of the grammaticality/acceptability distinction, processing necessarily in-
volves grammatical facts.) But as the mechanisms which give rise to
competence obviously differ in their 'end products' from the mechanisms
which give rise to acceptability facts (performance), the details of the two
kinds of mechanisms themselves must be different. The reasons for
distinguishing competence from performance are very well set out, in partly
Saussurean terminology, by Du Bois (1985).

"Saussure (1959:11-23, 191ff) demarcates sharply between what he calls internal lin-
guistics, the study of langue, and external linguistics, which encompasses such significant
fields of study as articulatory phonetics, ethnographic linguistics, sociolinguistics,
geographical linguistics and the study of utterances (discourse?), all of which deal with
positive facts.
Classical structuralism thus establishes a gulf between the two spheres, so that
structuring forces or organizing principles which operate in the one domain will not
affect the other. Though this formulation will be seen to be too one-sided, given its
assumption that langue is in principle independent of structuring forces originating outside
it, I will suggest that the distinction between internal linguistics and external linguistics
nevertheless remains useful and in fact necessary. I will draw on this distinction to
show how certain phenomena can be at the same time unmotivated from the generative
synchronic point of view and motivated from a genuinely metagrammatical viewpoint
which treats grammars as adaptive systems, i.e. both partially autonomous (hence systems)
and partially responsive to system-external pressures (hence adaptive). This will be fruitful
only if we recognise the existence of competing motivations, and further develop a
theoretical framework for describing and analysing their interaction within specified
contexts, and ultimately for predicting the resolution of their competition. This (pan-
chronic) approach to metagrammar is part of the developing theory of what has been
called the ecology of grammar (Du Bois, 1980:273)." (1985:343-344).

The ecological metaphor is also taken up, independently, in Hurford (1987).


While I am in sympathy with Du Bois's approach, and regard it as an
admirably clear statement of the system/use dilemma that modern lin-
guistics has forged for itself, I believe Du Bois has not gone as far as
he might in considering the ontology of grammar. That is, he still tends,
in a Saussurean way, to treat grammatical systems as abstractions, with
their own laws and principles, without locating them in the minds of
speakers. And he does not locate the mechanism of grammaticisation in
the Chomskyan LAD, which, I believe, is where it belongs.
Sociolinguists' difficulties with the competence/performance distinction
stem largely, according to Milroy, from the problem of language variation.
And several current models of language acquisition respond to the pervasive
fact of variation by proposing that the linguistic competence acquired is
itself variable. Thus Macken (1987) proposes that acquired grammars are
partly 'algebraic' and partly 'stochastic'. And the 'competition model' (Bates
and MacWhinney, 1987; MacWhinney, 1987a,b) assumes that:

"... the 'steady state' reached by adults also contains patterns of statistical variation
in the use of grammatical structures that cannot be captured by discrete rules". (Bates
and MacWhinney, 1987:158)

This echoes early attempts to reconcile sociolinguistic variation with
generative grammar's view of competence; cf. Labov's (1969) idea of
'variable rules', its development by Cedergren and Sankoff (1974), and
critical discussion by Romaine (1982:247-251).
The facts of linguistic variation and gradual linguistic change lead Kroch
(1989) to propose another possibility, distinct from both the 'single discrete
competence' and the 'probabilistic competence' views.

"If we ask ourselves why the various contexts of a linguistic alternation should, as
a general rule, be constrained to change in lock step, the only apparent answer consistent
with the facts of the matter is that speakers learning a language in the course of a
gradual change learn two sets of well-formedness principles for certain grammatical
subsystems and that over historic time pressures associated with usage (presumably
processing or discourse function based) drive out one of the alternatives". (Kroch,
1989:349)

This echoes a long tradition in linguistics (cf. Fries and Pike, 1949).
It is hard, perhaps impossible, to distinguish empirically between a
situation where a speaker knows two grammars or subsystems, correspon-
ding, say, to 'New Variety' and 'Old Variety', and a situation where a
speaker knows a single grammar or subsystem providing for a number
of options, where these options are associated with use-related labels, 'Old'
and 'New'. Plural competences would certainly be methodologically more
intractable to investigate, presenting a whole new, and more difficult, ball-
game for learnability theory, for instance. On the other hand, plural
competences do presumably arise in genuine cases of bilingualism, and
so the LAD is equipped to cope with internalizing more than one grammar
at a time. Perhaps plural competences are indeed the rule for the majority
of mankind, and the typical generative study of singular monolithic
competence is a product of concentrating on standardised languages (a
point made by Milroy). The question is forced on us by the pervasive
facts of statistical patterning in sociolinguistic variation, even in the usage
of single individuals, and language change. And the question is highly
relevant to language acquisition studies, as McCawley (1984:435) points
out: 'Do children possess only one grammar at a time? Or may they possess
multiple grammars, corresponding to either overlapping developmental
stages, or multiple styles and registers?'
In what follows I will simply assume that statistical facts belong to
the domain of performance and pragmatics (e.g. rules of stylistic preference
or, more globally, rules of 'code choice'), whereas facts of acquired adult
grammatical competence are not to be stated probabilistically. I do not
claim to have argued this assumption, or demonstrated that the variation
problem must be handled in this way. But one cannot explore all the
possibilities in one article, and I shall explore here how the interplay of
grammar and use might be envisaged, if one banishes probabilities from
the realm of competence. The research challenge then appears as the twin
questions: 'How does all-or-nothing competence give rise to phenomena
in which statistical distributions are apparent?' and 'How does exposure
to variable data result in all-or-nothing competence?' Possibly, these are
the wrong research questions to ask, but the only way to find out is by
seeing how fruitful theorising along these lines turns out to be. Other
researchers may pursue other assumptions in parallel. In a later subsection
(2.3), I will discuss the phenomenon of grammaticalisation, in which, over
time, a statistical pattern of use (as I assume it to be) gets fixed into
a nonstatistical fact of grammar.

1.4. The ambiguity of 'functional'

Opponents of nativist explanations for linguistic universals often contrast
the Chomskyan doctrine of an innate Language Acquisition Device with
a form of explanation labelled 'functionalist'. Such functionalist expla-
nations point to the use of language as accounting for the properties of
linguistic systems. But typically in such accounts, one of two distinct aspects
of 'use' is emphasised. Hyman identifies this ambiguity clearly:

"Unfortunately, there is disagreement on the meaning of 'functional' as applied in this


context. While everyone would agree that explanations in terms of communication and
the nature of discourse are functional, it became evident in different presentations at
this workshop that explanations in terms of cognition, the nature of the brain, etc.,
are considered functional by some but not by other linguists. The distinction appears
to be that cognitive or psycholinguistic explanations involve formal operations that
the human mind can vs. cannot accommodate or 'likes' vs. 'does not like', etc., while
pragmatic or sociolinguistic explanations involve (formal?) operations that a human
society or individual within a society can vs. cannot accommodate or likes vs. does
not like". (Hyman, 1984:67-8)

The same kind of distinction between types of functional explanation is
noted, but labelled differently, by Bever (1975):

"There have been two major kinds of attempts to explain linguistic structure as the
result of speech functions. One I shall call the 'behavioural context' approach, the other
the 'interactionist' approach. The 'behavioural context approach' argues that linguistic
patterns exist because of general properties of the way language is used and general
properties of the mind. The interactionist approach argues that particular mental
mechanisms guide and form certain aspects of linguistic structure". (Bever, 1975:585-
6)

And Atkinson (1982) makes approximately the same distinction between
alternative reductive explanations for language acquisition, which he labels
'cognitive reductions' and 'social reductions'.
The distinction between cognitive and social reductions (Atkinson's
terms), between explanations based on an interactionist approach and those
based on a behavioral context approach (Bever's terms) is by no means
clear-cut. All humans have cognition and all engage in social relations;
but social relations are experienced and managed via cognition (and
perception). Social relations not thus mediated by perception and cognition
are hard, if not impossible, to conceive. A good illustration of a 'social'
principle with substantive 'cognitive' content is the Gricean Maxim of
Manner, 'Be perspicuous'. This maxim is generally (by now even con-
ventionally!) held up as an example of the influence of social considerations
on language use. But 'Be perspicuous' clearly has psychological content.
What is perspicuous to one kind of organism may be opaque to an organism
with different cognitive structuring. As Grice's work is widely known, this
statement in terms of a Gricean maxim is adequate to make the point
of the interpenetration of cognitive and social 'functional' factors. Sperber
and Wilson's (1986) Relevance Theory, which claims to have supplanted
the Gricean model with a deeper, more general, more explanatory theory
of social communication through language, lays great stress on the
individual psychological factor of processing effort.2 Speakers' discourse
strategies are jointly motivated by what hearers find easy to understand
(a cognitive consideration) and by a desire to communicate efficiently (a
social consideration). Functional explanations can indeed have the different
emphases which Hyman, Bever, and Atkinson all identify, but cognitive
and social factors are often intermingled and not easy to separate.
An explanation of some aspect of language structure is functional to
the extent that it provides an account relating that aspect of structure
to some purpose for which language is used, or to some characteristic
of the users or manner of use facilitating achievement of that purpose.
The canonical form of a functional explanation is as in (4).

(4) X has form F because X is used by U and/or for purpose P.

where some clear connection between F (the putatively useful form) and
U (the user) and/or P (the purpose) is articulated. The connection between
form and user or purpose need not be immediate or direct but may be
mediated in some way, provided the plausibility of the connection is not
thereby lost. As a simple concrete example, consider a spade. Parts of
its form, e.g. the sharp metal blade, relate directly to the intended purpose,
digging into the earth, but other aspects of its form, e.g. its handle and
its manageable weight, relate more directly to the given (human) charac-
teristics of the user. Separating out which aspects of spade-design are
purpose-motivated and which user-motivated is not easy; likewise it can
also be difficult to separate out social (purpose-motivated) functional
explanations of language form from psychological (user-motivated) func-
tional explanations.
For the purpose of exploring the relationship between nativist and
functional explanations of linguistic phenomena, it will in fact be convenient
to continue to deal in terms of a single functional domain, which has
both cognitive and social components. This domain, which I will label
the 'Arena of Use' and discuss in the next section, is contrasted with the
'internal' domain, the domain of facts of grammar. The Arena of Language
Use must figure in any explanation of language form that can reasonably
be called a 'functional' explanation.

2. GLOSSOGENETIC MECHANISM OF FUNCTIONAL INFLUENCE ON LANGUAGE FORM

2.1. The Arena of Use

The familiar nativist scheme for explaining the form of grammatical
knowledge is shown in Figure 2.

Primary Linguistic Data  →  LANGUAGE ACQUISITION DEVICE  →  Individual Grammatical Competence

Fig. 2.

In this scheme, the grammatical competence acquired by every individual
who learns a language conforms to a pattern determined by innate
psychological properties of the acquirer. These innate characteristics are
influential enough to impose significant patterning, not obviously
discernible in the primary linguistic data, on the acquirer's internalized grammar.
Whatever the primary linguistic data (within the range normally experienced
by young humans) the competence acquired on exposure to it conforms
to the specifications built into the Language Acquisition Device. So, across
languages and cultures, adult language-knowers carry what they know in
significantly similar forms, studied under the heading of Universal Grammar
(UG).
The short-term functional mechanism by which nongrammatical factors
can in principle contribute to linguistic phenomena, and ultimately to
grammatical competence, can be represented by an extra component added
to the Chomskyan diagram (Figure 2), as in Figure 3 below.

Primary Linguistic Data          Individual Grammatical Competence

                ARENA OF USE

Fig. 3.

What is the Arena of Use? Well, it is non-grammatical, that is, it contains
no facts of grammar, although it relates to them. And some of it is non-
psychological, in the sense of being outside the domain of individual mental
processes, although it receives input from these, and provides material
for them. The Arena of Use does have some psychological ingredients,
including those directly involved in linguistic performance. The Arena of
Use is where utterances (not sentences) exist. The Arena of Use is a
generalisation for theoretical purposes of all the possible non-grammatical
aspects, physical, psychological, and social, of human linguistic interactions.
Any particular set of temporal, spatial, performance-psychological and
social coordinates for a human linguistic encounter is a point in the Arena
of Use. So, for example, an address or point in the Arena of Use that
I happen just to have visited might be approximately described by the
phrase: 'Jim Hurford, sitting in his living room at noon on January 6th,
having some cognitive trouble composing an elegant written sentence
(strictly an inscription) about the Arena of Use, for an unknown readership,
assumed to consist of assorted academic linguists, sociolinguists and
psycholinguists.' Another address in the Arena of Use might be 'Mrs Bloggs,
at the greengrocer's, asking loudly, since the grocer is a bit deaf, for 2 lbs
of leeks'. The Arena of Use is where communication takes place. It embraces
human relationships, the ways in which we organise our social lives, the
objects that it is important to us to communicate about, the kinds of
message it is important for us to transmit and receive. Other creatures,
built differently from ourselves, would conduct their communication in,
and have it shaped by, a different (though probably partly similar) Arena
of Use. So, note, the Arena of Use is itself partly, in fact very largely,
a product of our heredity (part of our 'extended phenotype', in Dawkins'
(1982) phrase).
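
Purely by way of illustration (a toy sketch carrying no theoretical weight; the field names and values below are invented placeholders), such an address can be rendered as a bundle of coordinates:

    # Toy sketch only: a point in the Arena of Use as a bundle of temporal,
    # spatial, performance-psychological and social coordinates.
    from dataclasses import dataclass

    @dataclass
    class ArenaPoint:
        time: str            # temporal coordinate
        place: str           # spatial coordinate
        psych_state: str     # performance-psychological coordinate
        social_setting: str  # social coordinate

    hurford_at_desk = ArenaPoint(
        "noon, January 6th", "his living room",
        "struggling to compose a written sentence",
        "addressing an unknown academic readership")
    mrs_bloggs = ArenaPoint(
        "shopping hours", "the greengrocer's",
        "speaking loudly to a slightly deaf grocer",
        "customer addressing shopkeeper")
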
The Arena of Use, like UG, has both absolute and statistical properties.
A full description of the Arena would specify a definite, obviously infinite,
range of possibilities, the coordinates of possible communicative interac-
tions between people using language; and, within this range, the likelihood
of the various possibilities being realised would be projected by various
principles, in a way analogous to the role played by a theory of markedness
within UG. Obviously, we are no nearer to a full description of the Arena
of Use than we are to a full description of UG, but central aspects of
the nature of the Arena are nevertheless relatively easily accessible for
hypothesis and consideration.
The Arena of Use is emphatically not 'everything there is (provided
it has no grammatical import)'. And it is certainly not equivalent to, or
coextensive with, 'the world' or 'the environment'. I will try to clarify.
In the first place, the world is 'out there', existing somehow outside our
perceptions. (I assume this, being a Realist and not an Idealist.) In knowing
the world, we impose categories on it that are to a great extent our own
constructs, though they presumably mesh in some way with the ways things
really are 'out there'. The Arena of Use is not populated by just whatever
exists out there, but (in part) by entities that exist-as-some-category. The
relevant idea is put thus by Edie (commenting, as it happens, on Husserl):

"For Husserl no 'object' is conceivable except as the correlate of an act of consciousness.


An 'object' is thus never a thing-in-the-world, but is rather something apprehended
about a thing; objects are things as intended, as meant, as taken by a subject". (Edie,
1976:5, his emphasis, quoted by Fraser, 1989:79)

And Fraser elaborates this eloquently:


"Many of the Objects that we encounter are presented to us as what they are through
a filter of our language and culture, rather than being constituted anew by each Subject
on the basis of individual experience". (Fraser, 1989:121)

The inclusion in the Arena of Use of abstract objects constituted through
language illustrates how it itself (or better, its instantiation in a particular
historic language community) is something dynamic and developing.
Traugott (1989) discusses a diachronic tendency for meanings of words
to develop from concrete denotations of objects and states of affairs to
more abstract denotations 'licensed by the function of language' (Traugott,
1989:35). Thus 'everything there is', 'the world', or 'the environment' is
quite different from the Life-worlds of individual subjects, speakers of
a language; the Life-worlds are in some ways richer, in some ways poorer,
than the actual world, although clearly there is a degree of correspondence.
The Arena of Use includes the sum of entities (and classes of entities)
in the Life-worlds of individual Subjects (speakers) that these subjects can
talk about. (This excludes strictly private psychological entities that might
be quite real for many individuals, but which they cannot talk about.
One reason for not being able to talk about some experience is the lack
of appropriate words and/or grammatical constructions, which is why
creative writers sometimes resort to novel forms of expression.)
The Arena of Use is not just a union of sets of (classes of) entities.
It has structure and texture (much of which remains to be articulated
by pragmatic theory). Some (but not all) of its structure is statistical, deriving
from the salience (or otherwise) for numbers of speakers of particular
classes of entities. Prominent classes of entities in the Arena of Use are
those that everyone talks about relatively frequently. Other aspects of the
Arena of Use are what Fraser, following Husserl and Heidegger, calls
'points of view' and what Lakoff (1986:49) calls 'motivation'. Humans
have purposes, and employ language to manipulate other speakers to help
them to achieve those purposes. There are ways in which this is typically
done, which gives rise to the taxonomies of Speech Act theory. In fact,
any theory of pragmatics contributes to a theory of the Arena of Use,
and the categories postulated by pragmatic theorists, such as speaker,
hearer, overhearer, deixis of various types, utterance, situation of utterance,
illocution, perlocution, implicature, etc., etc. are all theoretical categories
forming part of our (current) picture of the Arena of Use. The Arena
of Use is in part the subject matter of pragmatics, and it would clearly
be wrong to say that it is 'just everything there is', 'the world', or 'the
environment'. If this were so, nothing would distinguish pragmatics from,
say, a branch of physics.
As for the usefulness of coining the expression 'Arena of Use', my purpose
is to focus attention on a vital link in the transmission of language from
one generation to the next. Chomsky's similarly ambitious expression
'Language Acquisition Device' has played an enormously important role
in focussing theorists' attention on the other important link in the cycle.
Clearly, it would have been unimaginative and counterproductive several
decades ago to dismiss that expression on the grounds that it simply meant
'child'.
It should be clear that the role of the Arena of Use is complementary
to that of the LAD, not, of course, in any sense proposed as an alternative
to it. And, in fact, just because of this complementarity, studies of UG
actually need systematic information about the Arena of Use. Thus
Lightfoot (1989a: 326) is forced to resort to a 'hunch' about whether a
particular hypothetical social scenario is plausible or 'too exotic', when
conducting an argument about whether "The existence of N' might be
derived from a property of UG or ... might be triggered by the scenario
just sketched". (Grimshaw, 1989:340, complains about the lack of inde-
pendent evidence backing such hunches.) Obviously it would be too much
to expect a theory of the Arena of Use to give a direct answer to this
specific question, but, equally obviously, the more systematic a picture
of the Arena of Use we can build up, the less we will need to rely on
hunches about what the input data available to the child may be. For
instance, observation of actual caretaker behaviour is a necessary empirical
support to the axiom of 'no negative evidence' central to UG and learnability
theory (Lightfoot, 1989a:323-324, Grimshaw and Pinker, 1989:341-342,
inter alios but cf. Saleemi, this volume). And a pragmatic theory of why
caretakers give little or no negative evidence, if we could get such a theory,
would neatly complement the UG and learnability theories.
The products of an individual's linguistic competence are filtered by
the Arena of Use. In the Chomskyan scheme, the LAD acts partly as
a filter. The child in some sense disregards the properties of utterances
in the Primary Linguistic Data that do not conform to his innate
(unconscious) expectations, the characteristics that cannot be interpreted
in terms of the structure already possessed (a function recently emphasised
and elaborated on by Lightfoot, 1989a). Similarly, the Arena of Use acts
as a filter. Not all the products of an individual's competence serve any
useful purpose, and these are either simply not uttered, or uttered and
not taken up by interlocutors.
At the level of discourse, the filtering function of the Arena is accepted
as uncontroversial. A coherent discourse (monologue or dialogue) is not
just any sequence of sentences generated by a generative grammar. The
uses to which sentences are put when uttered determine the order in which
they may be strung together. With the usual reservations about performance
errors, interruptions, etc., sequences which do not serve useful purposes
in discourse do not occur in the Primary Linguistic Data to which the
child is exposed.
At the level of vocabulary, the filtering function of the Arena is also
uncontroversial. Words whose usefulness diminishes are uttered less fre-
quently, eventually falling out of use. When they fall out of use, they
are no longer present in the PLD and cannot pass into the competences
of new language acquirers. What words pass through the cycle in Figure
3, assuming their linguistic properties present no acquisition difficulties,
is almost entirely determined by considerations of use. I grant that the
relation between vocabulary and use is far from being as simple as academic
folk-tales about Eskimo words for snow (cf. Pullum, 1989, Martin, 1986) and
Arabic words for camel might lead the gullible to believe. But there is
a large body of scholarship, under the various titles of ethnographic
semantics, ethnoscience, and cognitive anthropology (cf. Brown, 1984, for
a recent example), building up a picture of the relation between the structure
of a community's vocabulary and its external environment. Clearly the
usefulness of words is one part of this picture. One example from Brown
is:

"The fact that warm hues cluster with white and cool hues with dark contributes to
the likelihood that languages will make a "macro-white"/"macro-black" distinction
in the initial encoding of basic color categories. A utilitarian factor may also contribute
to this development. Basic color categories become important when people develop
a need to refer to colors in a general manner. An initial "macro-white"/"macro-black"
contrast is highly apt and useful since it permits people to refer to virtually all colors
through use of general terms". (Brown, 1984:125)

At the level of a semantico/pragmatic typology of sentences, it also seems
plausible that the existence of universal types is perpetuated through the
mediation of the Arena of Use, rather than of the LAD. The three-way
distinction declarative/interrogative/imperative reflects the three most
salient types of speech act used in human interaction. This taxonomy and
its grammatical realisation is probably passed on to successive generations
via ample exemplification in the Arena of Use, necessitating no
extraordinary innate powers of extrapolation from skimpy data by the LAD.
A theory of the acquisition of grammatical competence, such as UG, makes
available a range of syntactic forms. Without reference to pragmatics, which
provides a classification of the uses to which sentences may be put, there
is no account of why three (as opposed to five or nineteen) types of syntactic
structure are salient and typically assigned to different uses. What UG
cannot account for, without recourse to a pragmatic theory, is this:

"There is a wealth of cross-language evidence showing the existence of three or four


syntactic structures which code prototypical speech acts in any language:

(a) Declarative
(b) Imperative
(c) Interrogative
(i) WH-question
(ii) Yes/No question.

It is hard to find a language in which some "norm" does not exist for (a), (b), (ci)
and (cii), i.e. some structural-syntactic means for keeping these four prototypes apart."
(Givon, 1986:94)

We can think of UG as providing a theory of the formal/structural
resources, or space, available to humans for the expression of useful
distinctions. Obviously, a theory of just what distinctions are useful
(pragmatic theory, theory of the Arena of Use) is also needed. That is,
"One must then strive to discover the underlying socio-psychological
parameters which define the multi-dimensional space within which speech-
act prototypes cluster". (Givon, 1986:98) Then, interesting discussion can
proceed on how specific features of use tend to select specific structural
features of sentence form for their expression. Givon's suggestion is that
there is an iconic relation between the syntactic forms and their functions,
but this clearly needs more fleshing out. Downes (1977) is an interesting
paper suggesting why the imperative construction, in particular, occupies
the area of syntactic space that it does, e.g. with base form of the verb
and suppressed subject. A theory of grammar, such as UG, can make
available sentences with null subjects and with base verb forms, but the
question arises: Why are these sentences, in particular, typically used to
get people to do one's bidding? My intention is not to dispense with
the theory of UG. But the allocation of individual aspects of a phenomenon
to a theory of grammar-acquisition or a theory of use must be considered
on its merits. Perhaps the assignment of 2nd person to the null subject
of imperatives, for example, is a blank that UG can afford to leave to
a theory of use. This is in fact what Beukema and Coopmans suggest:

"... the position is occupied by a case-marked empty element associated with an empty
topic, which receives the interpretation of addressee from the discourse". (Beukema
and Coopmans, 1989:435)

Beside the declarative/interrogative/imperative pragmatic typology, one
could also cite the categories of person and number, which recur in all
grammars, as motivated by factors in the Arena of Use. Hawkins puts
it concisely:

"Innateness is not the only factor to which one can appeal when explaining universals.
Certain linguistic properties may have a communicative/functional motivation. If every
grammar contains pronouns distinguishing at least three persons and two numbers (cf.
Greenberg 1966:96), then an explanation involving the referential distinctions that
speakers of all languages regularly need to draw is, a priori, highly plausible". (Hawkins,
1985:583)

The facts of grammatical person are not quite so simple. Foley (1986:66-
74) (while subscribing to the same functional explanation as Hawkins for
distinctions of grammatical person) mentions languages without 3rd person
pronouns, and Mühlhäusler and Harré (1990) claim that even 1st versus
2nd person, as usually understood, is not universal. Nevertheless Hawkins'
point stands; it is not surprising that 'the referential distinctions that
speakers of all languages regularly need to draw' cannot be described by
a simple list, but rather require description in statistical terms of significant
tendencies.
Hawkins gives a number of further plausible examples, which I will
not take the space to repeat. In a more recent, and important, contribution
the same author accounts for universal tendencies to grammaticalise certain
word orders in terms of certain (innate) parsing principles:

"The parser has shaped the grammars of the world's languages, with the result that
actual grammaticality distinctions, and not just acceptability intuitions, performance
frequencies and psycholinguistic experimental results, are ultimately explained by it.
This does not entail, however, that the parser must also be assumed to have influenced
innate grammatical knowledge, at the level of the evolution of the species, as in the
discussion of Chomsky and Lasnik (1977). Rather, I would argue that human beings
are equipped with innate processing mechanisms in addition to innate grammatical
knowledge, that the grammars of particular languages are shaped by the former as
well as by the latter, and that the cross-linguistic regularities of word order that we
have seen in this paper are a particularly striking reflection of such innate mechanisms
for processing. The evolution of these word order regularities could have come about
through the process of language change (or language acquisition): the most frequent
orderings in performance, responding to principles such as EIC [Early Immediate
Constituents, a parsing principle], will gradually become fixed by the grammar. One
can see the kinds of grammaticalization principles at work here in the interplay between
"free" word order and fixed word order within and across languages today. The rules
or principles that are fixed by a grammar in response to the parser must then be learned
by successive generations of speakers". (Hawkins, 1990:258)

Another case of the influence of phenomena in the Arena of Use on patterns
of grammar is discussed in detail by Du Bois (1987). This study attributes
the existence of ergative/absolutive grammatical patterning to preferences
in discourse structure. The study has the merit of providing substantial
statistics on these discourse preferences. The link between such discourse
preferences and ergative grammatical patterning is argued for very plausibly.
And Du Bois answers the obvious question 'Why are not all languages
ergative?' by appealing (again plausibly, I believe) to competing motivations,
discourse pressures in several directions.
As a final example here of the contribution of the Arena of Use to
the form of linguistic phenomena, I cite certain properties of numeral
systems, in particular the universal property of being organised on a base
number (often 10). There is no evidence that children somehow innately
prefer numeral expressions organised in the familiar way using as a base-
word the highest-valued available numeral word in the lexicon. Rather,
the modern streamlined systems have evolved over long historical periods
because of their practical usefulness, and they have to be deliberately
inculcated into children. (This argument concerning numerals is pursued
in detail in Hurford, 1987, where a computer simulation of the social
interactions leading to the emergence of the base-oriented structure of
numeral systems is presented.)
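
To make the structural point concrete, the following toy sketch (in Python, purely illustrative and vastly simpler than the simulation just mentioned) builds a numeral expression by repeatedly using the highest-valued simple numeral word available in the lexicon:

    # Toy illustration of base-oriented numeral structure: express a number
    # using the highest-valued simple numeral word available in the lexicon.
    LEXICON = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
               6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}

    def numeral(n):
        if n in LEXICON:
            return LEXICON[n]
        base = max(v for v in LEXICON if v <= n)   # highest available word
        multiple, remainder = divmod(n, base)
        expr = LEXICON[base] if multiple == 1 else numeral(multiple) + " " + LEXICON[base]
        return expr if remainder == 0 else expr + " " + numeral(remainder)

    print(numeral(47))        # 'four ten seven' (cf. English 'forty-seven')
    LEXICON[100] = "hundred"  # enriching the lexicon restructures the expressions
    print(numeral(347))       # 'three hundred four ten seven'
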
In summary, the Arena of Use is the domain in which socially useful
and cognitively usable expressions are selected to fit the worldly purposes
of hearers and speakers. The Arena contributes to the form of languages
in a way complementary to the contribution of the Language Acquisition
Device. Languages are artifacts resulting from the interplay of many factors.
One such factor is the LAD, another is the Arena of Use. The aspects
of languages accounted for by these two factors are complementary. As
a first approximation, one might guess that the aspects of languages due
to the LAD are relatively deep, or abstract, whereas the aspects due to
the Arena of Use are relatively superficial, in the sense in which the terms
'deep' and 'superficial' are typically used by generative grammarians. The
terms 'deep' and 'superficial' tend to be rhetorically loaded, and imply
triviality for superficial aspects of language. One need not accept such
a value judgement. The deep characteristics of languages most convincingly
attributed to the Language Acquisition Device are those to which the
'poverty of stimulus' argument applies, that is, characteristics which are
not likely to be encountered in a sampling of primary linguistic data. Such
deep characteristics are thus those which are actually least characteristic
of languages, in any normal pretheoretical sense, in the sense of being
least obvious. Thus the theoretical style typifying research into the
contribution of the Arena of Use is to be expected, in the first place at least,
to be more 'superficial' than research into UG and the LAD. But the
intrinsic interest of such a theory is not thereby diminished.
A full and helpful discussion of the uses of 'deep' by generative
grammarians and others, and of the misunderstandings which have arisen
over the term, is to be found in Chapter 8 of Chomsky (1979). Putting
aside the use of 'deep' as a possible technical term applied to a level of
structure (which I am not talking about here), the term 'deep' can be
applied either to theories and analyses or to phenomena and data considered
pretheoretically. Those aspects of languages due to the LAD seem, at first
pretheoretical blush, to be 'deep', to require theories of notable complexity
to account for them. These aspects of a language's structure are subtle;
they are not the most obvious facts about it, and, for instance, probably
get no attention in courses teaching the language, even at an advanced
level. Exactly this point is stated by Chomsky:

"We cannot expect that the phenomena that are easily and commonly observed will
prove to be of much significance in determining the nature of the operative principles.
Quite often, the study of exotic phenomena that are difficult to discover and identify
is much more revealing, as is true in the sciences generally. This is particularly likely
when our inquiry is guided by the considerations of Plato's problem, which directs
our attention precisely to facts that are known on the basis of meager and unspecific
evidence, these being the facts that are likely to provide the greatest insight concerning
the principles of UG". (Chomsky, 1986:149)

This subtlety in acquired knowledge after exposure to data in which the
subtlety is not obviously present accounts for the rise of complex theories
of language acquisition. On the other hand, those aspects of languages
due to the Arena of Use (many of which would be located in the periphery
of grammars by generativists, like irregular and suppletive morphological
forms) seem not to require anything so complex - they are much less
underdetermined by data, and thus require no invocation of special deep
principles to account for their acquisition.
My reservation about not necessarily accepting the value judgements
implicit in much current usage of 'deep', stems from the association that
has now become established between 'deep' and the language-acquisition
problem. In a theory of language cast as a theory of language acquisition,
or 'guided by the considerations of Plato's problem', the term 'deep' is
applied, naturally, to aspects of language whose acquisition apparently
necessitates deep analyses. In this sense, the question of how children acquire
irregular morphological forms, for example, is relatively trivial, not deep;
the child just observes each such irregularity individually and copies it.
(Well, let's say for the sake of argument that the right answer really is
as simple as that, which it isn't, clearly.) That's not a deep answer, so
the question, apparently, wasn't deep. But seen from another perspective,
the same aspects of language could well necessitate quite deep analyses.
If one casts a theory of language as a theory of communication systems 3
operating within human societies (systems transmitted from one generation
to the next), then the problem of acquisition is not the only problem one
faces. The kind of question one asks is, for instance: Why do these
communication systems (languages) have irregular morphological forms?,
Why do languages have words for certain classes of experience, but not
for others? And the answer to these questions may be quite deep, or at
least deeper than the answers to the corresponding acquisition questions.
(A similar argument is advanced in Ch.1 of Hurford, 1987.)
Figure 3, introducing the Arena of Use, is actually a version of a diagram
given by H. Andersen (1973). Andersen's diagram looks like this:

Fig. 4.

Andersen is interested in the mechanisms of linguistic change, and makes
the basic point that grammars do not beget grammars. Grammars give
rise to linguistic data, which are in turn taken and used as the basis for
the acquisition of grammars by succeeding generations. Lightfoot (1979)
argues on these grounds that there can be no theory of linguistic change
expressed as a theory directly relating one grammar to a successor grammar.
A theory attempting to predict the rise of new grammars from old grammars
purely on grounds internal to the grammars themselves would be attempting
to make the spurious direct 'horizontal' link between GRAMMAR n and
GRAMMAR n+1 in Figure 4.
The zigzag in Figure 4 could be extended indefinitely across the page,
representing the continuous cycle through acquired grammars and the data
they generate. The LAD belongs on the upward arrows between data and
grammars. The Arena of Use belongs on the downward arrows, between
grammars and data. In fact Figures 3 and 4 both represent exactly the
same diachronic spiral, merely differing in emphasis. Figure 3 is simply
Figure 4 rotated and viewed 'from one end'.
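
The cycle can be caricatured in a few lines of code (a toy sketch only; the filtering and acquisition functions below are drastic simplifications standing in for the Arena of Use and the LAD respectively, and carry no theoretical weight):

    # Toy sketch of the diachronic cycle of Figures 3 and 4: a grammar gives
    # rise to data, the Arena of Use filters what is actually uttered and taken
    # up, and the next generation acquires its grammar from the filtered data.
    import random

    def produce(grammar, n=100):
        return [random.choice(sorted(grammar)) for _ in range(n)]

    def arena_filter(utterances, useful):
        return [u for u in utterances if u in useful]

    def acquire(data):                  # crude stand-in for the LAD
        return set(data)

    grammar_n = {"A", "B", "C"}         # expression types of generation n
    useful = {"A", "B"}                 # C serves no purpose in the Arena
    grammar_n_plus_1 = acquire(arena_filter(produce(grammar_n), useful))
    print(grammar_n_plus_1)             # typically {'A', 'B'}: C drops out
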
Pateman (1985), also drawing on this work of Andersen's, expresses
very neatly the relationship I have in mind between grammars and social
or cultural facts:

"... through time the content of mentally represented grammars, which are not in my
view social objects, comes to contain a content which was in origin clearly social or
cultural in character". (Pateman, 1985:51)

George Miller also expresses the same thought concisely and persuasively:

"Probably no further organic evolution would have been required for Cro-Magnon
man to learn a modern language. But social evolution supplements the biological gift
of language. The vocabulary of any language is a repository for all those categories
and relations that previous generations deemed worthy of terminological recognition,
a cultural heritage of common sense passed on from each generation to the next and
slowly enriched from accumulated experience". (Miller, 1981:33)

It is worth asking whether the social evolution that Miller writes of affects
aspects of languages besides their vocabularies. An argument that it does
is presented in Hurford (1987), especially Ch.6.
It is clear that much of language structure can be explained by innate
characteristics of the LAD; I do not claim that all, or even 'central'
(according to some preconceived criterion of centrality) aspects of languages
can be explained by factors in the Arena of Use. Bates et al. (1988:235-
6) conclude: "we have found consistent evidence for 'intraorganismic'
correlations, i.e. nonlinguistic factors in the child that seem to vary
consistently with aspects of language development". Such factors belong
to the Arena of Use, as defined here, but so far as is yet known, affect
only development, and not the end product, the content of adult grammars.
On 'extraorganismic' correlations, Bates et al. conclude: "This search for
social correlates of language has been largely disappointing". (1988:236).
At a global level, one should not be 'disappointed' or otherwise at how
scientific results turn out. The question of interest is: 'What aspects of
language structure are attributable to the innate LAD, and what aspects
to the Arena of Use?' It seems likely that the search for influences of
the Arena of Use on acquired grammars will be least 'disappointing' in
the marked periphery of grammar, as opposed to the core, as the core/
periphery distinction is drawn by UG theorists.

2.2. Frequency, statistics and language acquisition

There is serious disagreement on the role to be played by statistical
considerations in the theory of language acquisition. The tradition of
learnability studies from Gold (1967), through Wexler and Culicover (1980),
to such discussions as Lightfoot (1989a), assumes, but of course does not
demonstrate, that statistical frequencies are totally alien to language
acquisition. Theorems are derived, within a formal system, from axioms,
whose truth may perhaps be taken for granted by the inventor of the
system, but which the system itself can in no way guarantee to be true.
The theorems of learnability theory are derived in systems which assume
a particular type of definition of 'language', in particular, languages are
assumed not to have stochastic properties. But, under a different definition
of 'language', different theorems are provable, showing that frequencies
in the input data can be relevant to language acquisition. See, for example,
Horning (1969), and comments by Macken (1987:391).
But, even with a nonstochastic definition of the adult competence
acquired, it is still easily conceivable that frequency factors in the input
should influence the process of acquisition. Pinker (1987), for example,
assumes that adult competence is nonprobabilistic, but proposes a model
of acquisition in which exposure to a piece of input data results in the
'strengths' of various elements of the grammar being adjusted, usually
being incremented. The point is that in Pinker's proposal one single example
of a particular structure in the input data does not automatically create
a corresponding all-or-nothing representation in the child's internal gram-
mar; it can take a number of exposures for the score on a given element
to accumulate to a total of 1. Presumably, if that number of exposures
isn't forthcoming in the input data, that element (rule, feature, whatever)
doesn't get into the adult grammar.
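
A minimal sketch of this kind of proposal (purely illustrative; the increment and threshold values are arbitrary placeholders, and nothing here is offered as a faithful rendering of Pinker's model) might look as follows:

    # Toy sketch: frequency-sensitive strengthening feeding an all-or-nothing
    # final grammar. Increment and threshold are arbitrary placeholders.
    from collections import defaultdict

    INCREMENT = 0.25
    THRESHOLD = 1.0

    def learn(input_structures):
        strength = defaultdict(float)
        for s in input_structures:
            strength[s] = min(THRESHOLD, strength[s] + INCREMENT)
        # The acquired competence is nonprobabilistic, although the learning
        # process was sensitive to frequency in the input.
        return {s for s, v in strength.items() if v >= THRESHOLD}

    print(learn(["aux-inversion"] * 5 + ["rare-structure"] * 2))
    # {'aux-inversion'}: the rare structure never accumulates a score of 1.
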
Learnability theory typically operates with an assumption that the
learning device is 'one-memory limited'. This is the assumption that

"the child has no memory for the input other than the current sentence-plus-inferred-
meaning and whatever information about past inputs is encoded into the grammar
at that point". (Pinker, 1984:31)

But the success of learnability theory does not depend on the assumption
that its 'one-memory inputs' correspond to single events in the experience
of a child. It is quite plausible that there is some pre-processing front
end to the device modelled by learnability theory, such that an accumulation
of experiences is required for the activation of each one-memory input.
Likewise it is easy to envisage that the setting of parameters in the GB/
UG account needs some threshold number (more than one) of exemplars.
If there were some theorem purporting to demonstrate that this is alien
to language acquisition, one would need to examine carefully the relevant
axioms and definitions of terms, to see if they made assumptions cor-
responding appropriately to data uncovered by real acquisition studies.
There are studies revealing relationships between acquired (albeit interim)
grammars and statistical properties of the input.

"One consistent and surprising characteristic of early phonological grammars is their
close relationship to frequency and distributional characteristics of not the whole language
being learned but the specific input. ... (see, for example, Ingram, 1979 on French;
Itkonen, 1977 on Finnish; Macken, 1980 on English and Spanish)". (Macken, 1987:385)

"... certain acquisition data in conjunction with an interpretation of the relevant evidence
and correlations show that there are stochastic aspects to language acquisition, like
sensitivity to frequency information". (Macken, 1987:393)

"... Gleitman et al. (1984) cite several studies showing that the development of verbal
auxiliaries is affected by the statistical distribution of auxiliaries in maternal speech.
In particular, mothers who produce a large number of sentence-initial auxiliaries ...
tend to have children who make greater progress in the use of sentence-internal auxiliaries
... Because this auxiliary system is a peculiar property of English, it cannot belong
to the stock of innate linguistic hypotheses. It follows that auxiliaries have to be picked
up by some kind of frequency-sensitive general learning mechanism". (Bates et al.,
1988:62)

There are several studies indicating the influence of word-frequency on
internalised phonological forms.

"Neu (1980) found that adults delete the / d / in 90 percent of their productions of
and, compared to a 32.4 per cent rate of / d / deletion in other monomorphemic clusters;
... Fidelholtz (1975) has observed less in the way of perceptible vowel reduction for
frequent words, and Koopmans-van Beinum and Harder (1982/3) have confirmed this
in the laboratory. The frequency-reducibility effect evidently holds even where syllabic
and phonemic length are equated (Coker, Umeda and Browman 1973; Wright 1979),
and as the effect has little to do with differences in the information content or predictability
of high and low frequency words (Thiemann 1982), their different reducibility suggests
that frequent (i.e. familiar) words may be stored in reduced form. [Footnote:-] Though
it is not my purpose here to deal with the child's role in phonological change, my
discussion here ... has an obvious bearing on this subject". (Locke, 1986:248; footnote,
524)

In the framework advanced here, either the rule deleting /d/ is not a
rule of phonological competence, but belongs to the Arena of Use, or,
if it is a rule of phonological competence, it is an optional rule, with
applicability sensitive to factors in the Arena of Use (e.g. speed of speech).
Further evidence of a relationship between word frequency and internalised
grammars is provided by Moder (1986):

"High frequency forms were found to be poorer primes of productive patterns than
medium frequency forms. Furthermore, the real verb classes which showed some
productivity were those with fewer high frequency forms. Because high frequency forms
are often rote-learned [Bybee and Brewer, 1980], they are less likely to be analysed
and related morphologically to the other members of their paradigm." (Moder, 1986:180)

Phillips (1984) discusses two distinct kinds of historical lexical phonological
change, both clearly correlated, in different ways, with word-frequency:

"Changes affecting the most frequent words first are motivated by physiological factors,
acting on surface phonetic forms; changes affecting the least frequent words first are
motivated by other, non-physiological factors, acting on underlying forms". (Phillips,
1984:320)

In a generative view of sound change, just as in the view I am advancing
here, a sound change cannot 'act on surface phonetic forms', since what
differs significantly from one generation to the next is speakers' grammars,
and these contain underlying phonological forms and phonological rules,
but no direct representation of surface phonetic forms. Phillips does not
discuss the micro-implementation of these sound changes at the level of
the individual's acquisition of language, but a straightforward interpretation
of her results is as follows. Physiological factors (in the Arena of Use)
produce phonetically modified forms, whose frequency gives rise, in the
language-acquiring generation, to internalised underlying forms closer to
the observed phonetic forms. On the other hand, non-physiologically
motivated changes arise from what Phillips, following Hooper (1976), calls
'conceptually motivated change', i.e. some kind of reorganisation of the
grammar for purposes of maximisation or achievement of some internal
property. But these changes, apparently, cannot fly in the face of strong
evidence on pronunciation coming from the Arena of Use. Only where
such evidence from the Arena is very slight, as with low-frequency words,
can the internal grammar reorganisation, for these cases, override the input
evidence. Thus, frequency factors from the Arena of Use affect the shape
of evolving languages, both positively (pressing for change) and negatively
(resisting change).
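
The interpretation just offered can be caricatured in a few lines (a toy sketch only; the cutoff figure is invented, and no claim is made about the actual course of acquisition):

    # Toy sketch: whether the acquired underlying form follows the surface
    # pronunciation or a grammar-internal reorganisation depends on how much
    # phonetic evidence the Arena of Use supplies. The cutoff is invented.
    FREQ_CUTOFF = 100

    def acquired_underlying(surface_form, reorganised_form, tokens_heard):
        if tokens_heard >= FREQ_CUTOFF:
            return surface_form       # strong evidence from the Arena wins
        return reorganised_form       # slight evidence: reorganisation wins

    print(acquired_underlying("reduced form", "restructured form", 5000))
    print(acquired_underlying("reduced form", "restructured form", 12))
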
The argument against the relevance of statistical considerations has
another strand, which contrasts the subtlety, speed and effortlessness of
our grammatical judgments with the poverty of our statistical intuitions,
even the most elementary ones (this argument might cite research by Amos
Tversky and Daniel Kahneman). There are several points here. Firstly,
it is possible to exaggerate the subtlety, speed, and effortlessness of our
grammatical judgments. Chomsky points out in many works how our
grammatical knowledge needs to be 'teased out' (in the phrase used in
Chomsky, 1965). For instance, "Often it is not immediately obvious what
our knowledge of language entails in particular cases" (Chomsky, 1986:9),
and "... it takes some thought or preparation to see that (13) has the
interpretation it does have, and thus to determine the consequences of
our knowledge in this case" (ibid: 11).
A second point is that the relevant human frequency monitoring abilities
are not poor, but quite the contrary, as a seminal publication in the
psychological literature shows.

"People of all ages and abilities are extremely sensitive to frequency of occurrence
information. ... [In] the domain of cognitive psychology ... we note that the major
conclusion of this area of research stands on a firm empirical base: The encoding of
frequency information is uninfluenced by most task and individual difference variables.
As a result, memory for frequency shows a level of invariance that is highly unusual
in memory research. This is probably not so because memory is unique but because
memory researchers have paid little attention to implicit, or automatic, information
acquisition processes. Here we demonstrated the existence of one such process. We
also showed its implications for the acquisition and utilisation of some important aspects
of knowledge". (Hasher and Zacks, 1984:1385)

Hasher and Zacks also briefly discuss the relation of their work to that
of Tversky and Kahneman; they conclude "... the conflict between our
view and that of Tversky and Kahneman is more apparent than real"
(p. 1383).
Thus far, my arguments have been that statistical patterns in the input
can and do affect the content of the acquired competence, perhaps especially
where the language changes from one generation to the next (i.e. where
the acquired competence differs from the competence(s) underlying the
PLD). There is another, powerful, argument indicating the necessity, for
language acquisition to take place at all, of a certain kind of statistical
patterning in the input data. This involves what has been called the 'Semantic
Bootstrapping Hypothesis', discussed in detail by Pinker (1984), but
advanced in various forms by several others.
Briefly, the Semantic Bootstrapping Hypothesis states that the child
makes use of certain rough correspondences between linguistic categories
(e.g. Noun, Verb) and nonlinguistic categories (e.g. discrete physical object,
action) in order to arrive at initial hypotheses about the structure of strings
he hears. Without assuming such correspondences, Pinker argues, the set
of possible hypotheses would be unmanageably large. This seems right.
It is common knowledge, of course, that there is no one-to-one
correspondence between conceptual categories and linguistic categories - any
such correspondence is statistical. Pinker (1984:41) lists 24 grammatical
elements that he assumes correspond to nonlinguistic elements. (In Pinker,
1989 the background to the hypothesis is modified somewhat, but not
in any way that endangers the main point.) Now, according to the Semantic
Bootstrapping Hypothesis, if these correspondences are not present in the
experience of the child, grammar acquisition cannot take place.
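
By way of illustration only (the categories and probabilities below are invented placeholders, not Pinker's figures), the statistical character of such correspondences can be sketched as a set of priors from which the child forms an initial guess:

    # Toy sketch of semantic bootstrapping: nonlinguistic categories give
    # statistical, not exceptionless, cues to syntactic categories.
    PRIORS = {
        "discrete physical object": {"Noun": 0.9, "Verb": 0.1},
        "action":                   {"Verb": 0.8, "Noun": 0.2},
    }

    def initial_guess(nonlinguistic_category):
        dist = PRIORS[nonlinguistic_category]
        return max(dist, key=dist.get)   # most probable category as first hypothesis

    print(initial_guess("discrete physical object"))   # 'Noun'
    print(initial_guess("action"))                     # 'Verb'
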
UG theory characterises a class of possible grammars. These grammars,
as specified by UG, make no mention of nonlinguistic categories. Of course,
for the grammars to be usable, nonlinguistic categories must be associable
with elements of a grammar. For instance, the lexical entry for table must,
if a speaker is to use the word appropriately, get associated with the
nonlinguistic, experiential concept of a table (or tablehood, or whatever).
But UG theory makes no claim about how the nonlinguistic categories
are related to elements of grammars. A possible grammar, in the UG
sense, might be considerably complex, and yet not contain any elements
that happened to be associated with concrete physical objects, or actions,
for example. And the sentences generated by such a grammar could in
fact still be usable, say for abstract discourse, if, miraculously, a speaker
had managed to learn it. Such a speaker could, for instance, produce and
interpret such sentences as Linguistic entities correspond roughly to non-
linguistic entities, or Revolutionary new ideas are boring.
But he could not talk about physical objects or actions. And if the
Semantic Bootstrapping Hypothesis is true, his speech could not constitute
viable input data for the next generation of learners. Thus a theory which
aims to account for the perpetuation of (universals of) language across
generations, via the innate LAD, actually requires specific conditions to
be met in the Arena of Use. These conditions are not, as it happens,
absolute, but are statistical.
Of course, I do not claim that statistical properties of input are the
only ones relevant to the acquisition of competence. I agree with Lightfoot's
position:

"It has long been known that not everything a child hears has a noticeable or long-
term effect on the emergent mature capacity; some sifting is involved. Some of the
sifting must surely be statistical, some is effected through the nature of the endowed
properties ..." (Lightfoot, 1989b:364)

Facts of grammar are likely to be distributed along a dimension according
to whether their acquisition is sensitive to frequency effects in the input
data. Some aspects of grammar may involve very rapid fixing (once the
child is 'ready') on the basis of very little triggering experience. Other
aspects of grammar may be harder to fix, requiring heavier pressure (in
the form of frequency, among other things) from the input experience.
This suggested dimension is a graded version of Chomsky's binary core/
periphery distinction.
Chomsky seems to acknowledge the greater role of input data for the
acquisition of the periphery of grammar:

"... we would expect phenomena that belong to the periphery to be supported by specific
evidence of sufficient 'density'..." (Chomsky, 1986:147)

Whether or not Chomsky intended frequency considerations to contribute
to this 'density', there is no principled reason why they should not. As
Pinker writes:

"Ultimately no comprehensive and predictive account of language development and


Nativist and Functional Explanations in Language Acquisition 113

language acquisition can avoid making quantitative commitments altogether. After all,
it may turn out to be true that one rule is learned more reliably than another only
because of the steepness of the relevant rule strengthening function or the perceptual
salience of its input triggers". (Pinker, 1984:357)

Pinker then states a methodological judgement that 'For now there is little
choice but to appeal to quantitative parameters sparingly'. I share his
apprehension about the possibility of 'injudicious appeals to quantitative
parameters in the absence of relevant data', but the solution lies in making
the effort to obtain the relevant data, rather than in prejudging the nature
(statistical or not) of the theories that are likely to be correct.

2.3. Grammaticalisation, syntacticisation, phonologisation

Previous work has identified a phenomenon of 'grammaticalisation', dealing
precisely with historical interactions between the Arena of Use and
individual linguistic competences. Some such work is vitiated by a misguided
attempt to abolish the competence/performance distinction.
Givon (1979:26-31) surveys a number of cases in which, on one view
of grammar (a view Givon appears emphatically not to hold) "... one
may view a grammatical phenomenon as belonging to the realm of
competence in one language and performance-text frequency in another"
(26). Givon's examples are: (i) the definiteness of subjects of declarative
clauses, obligatory in some languages, but merely preferred in others; (ii)
the definiteness of referential objects of negative sentences, obligatory in
some languages, but merely preferred in others; and (iii) the lack of an
overt agent phrase with passive constructions, obligatory in some languages,
but merely the preferred pattern in others. The preferences involved can
be very strong, but in the languages where the facts seem not to be a
matter of absolute rule, but of preference, one can find isolated examples
of the pattern that would be ungrammatical in the other language.
In precisely similar vein, though not sharing Givon's conclusions, Hyman
(1984) writes that he has been

"... intrigued by a puzzling recurrent pattern which can be summarized as in

(1) a. Language A has a [phonological, phrase-structure, transformational] rule
R which produces a discrete (often obligatory) property P;
b. Language B, on the other hand, does not have rule R, but has property P in
some (often nondiscrete, often nonobligatory) less structured sense". (Hyman 1984:68)

And Corbett (1983) in an impressively documented study gives many
instances where one Slavic language has an absolute rule which is paralleled
by a statistical tendency in some other Slavic language. One such case
is:

The agreement hierarchy

attributive - predicate - relative pronoun - personal pronoun

"In absolute terms, if semantic agreement is possible in a given position in the hierarchy,
it will also be possible in all positions to the right. In relative terms, if alternative
agreement forms are available in two positions, the likelihood of semantic agreement
will be as great or greater in the position to the right than in that to the left." (Corbett,
1983:10-11)
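
The absolute half of this generalisation amounts to a simple monotonicity condition over the hierarchy, which can be sketched as follows (a toy check with invented example data, not Corbett's):

    # Toy sketch of the absolute claim: if semantic agreement is possible at a
    # position, it is possible at every position to its right on the hierarchy.
    HIERARCHY = ["attributive", "predicate", "relative pronoun", "personal pronoun"]

    def respects_hierarchy(semantic_possible):
        values = [semantic_possible[pos] for pos in HIERARCHY]
        # once True appears, everything to its right must also be True
        return all(right or not left for left, right in zip(values, values[1:]))

    # invented illustration: semantic agreement from 'predicate' rightwards
    ok = {"attributive": False, "predicate": True,
          "relative pronoun": True, "personal pronoun": True}
    bad = {"attributive": False, "predicate": True,
           "relative pronoun": False, "personal pronoun": True}
    print(respects_hierarchy(ok), respects_hierarchy(bad))   # True False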

Givon offers an alternative view to the one quoted above:

"Or one may view the phenomenon in both languages in the context of 'communicative
function', as being essentially of the same kind. The obvious inference to be drawn
from the presentation is as follows: If indeed the phenomenon is of the same kind
in both languages, then the distinction between competence and performance - or
grammar and speaker's behaviour - is (at least for these particular cases) untenable,
counterproductive, and nonexplanatory." (Givon, 1979:26)

This passage, like other polemical passages in linguistics, is a curious mixture
of over- and understatement. It ends like a Beethoven symphony, with
repeated heavy chords, slightly varied, but united in their effect 'untenable,
counterproductive, and nonexplanatory'. But immediately before is the
weakening parenthetical caveat '(at least for these particular cases)', and
the whole conclusion is in fact embedded in a conditional, 'If indeed the
phenomenon is of the same kind in both languages' [emphasis added,
JRH]. So, if the condition is not met and the phenomena are not of the
same kind in both languages, the three big guns 'untenable,
counterproductive, and nonexplanatory' aimed at the competence/performance
distinction don't actually go off. And, even if the condition is met, they
may only be aimed at the distinction 'for these particular cases'. Much
of Givon's book reflects this kind of rhetorical mixture. The message,
if interpreted as urging alternative emphases in linguistic study, is entirely
reasonable; more work should be done on the relation between commu-
nicative, pragmatic, and discourse phenomena and grammar, and this,
thanks to the efforts of people like Givon, is beginning to happen. (Clark
and Haviland, 1974 is another work in which a reasonable argument for
emphasis on discourse study is in places rhetorically inflated to a claim
that "the borderline between the purely linguistic and the psychological
aspects of language... may not exist at all" (p.91).)
grammar on the one hand and discourse phenomena on the other cannot
be studied if the two sets of phenomena actually turn out to be the same
thing, as Givon appears in places to believe. There are good reasons to
maintain a distinction between facts of grammar and facts of discourse,
and Givon manages throughout his book to write convincingly as if the
distinction were valid. What is of great interest is the parallelism between
the two domains, illustrated by Givon, Hyman, and Corbett, as cited at
the beginning of this section.
Absolute grammatical rules in one language paralleled closely by
statistical discourse preferences in another language may seem something of
a puzzle. But the puzzle can be relatively easily resolved. Let me risk
giving a nonlinguistic analogy, asking the reader to make the usual mutatis
mutandis allowances necessary for all analogies.
Some people eat a variety of foods, but, without having made any decision
on the matter, happen hardly ever to eat meat; other people are vegetarians
by decision, though sometimes they may accidentally eat meat. Some people,
as a matter of habit, drink no alcohol; for others, this is not a matter
of habit, but of principle. Some people are pacific by nature; others are
pacifists on principle. The principled vegetarians, teetotallers and pacifists
have made conscious absolute decisions which are parallel to the statistical
behavioural tendencies of certain other people. But there is a valid
distinction to be made between the two categories. This distinction is not
particularly obvious from mere observation of behaviour. But as humans,
we have the benefit of (some) self-knowledge, and we know that there
is a difference between a principled vegetarian and a person who happens
hardly ever to eat meat, and between a principled pacifist and a pacific
person.
No analogy is entirely apt, however. This one suffers in at least two
ways. Firstly, speakers of a language do not normally make conscious
decisions, like the pacifist or the vegetarian, about their own rules of
grammar; and secondly, the vegetarian/teetotaller/pacifist analogy
distinguishes between individuals in the same community, whereas rules of
grammar tend to be shared by members of a speech-community. What
I hope this analogy demonstrates is that similar overt patterns of behaviour
can be attributed to different categories, such as fact of discourse, or fact
of grammar. The categories may also be historically related, as I believe
discourse and grammar are, but they are not now a single unified
phenomenon.
I assume, then, that there is factual content to the notion of following
an internalised rule. Chomsky (1986), in a lengthy and cogent discussion,
disposes quite satisfactorily, in my view, of the Wittgensteinian objections,
taken up by Kripke (1982), to attribution of rule-following by other
organisms, be they fellow-speakers of one's language, foreign humans, or
even other animals. Wittgensteinians (among whom one would include,
for instance, Itkonen, 1978) have often objected to the generativists'
interpretation of rules of language as essentially belonging to the psychology
of individuals, and a generativist response to this theme of Wittgenstein's
is now satisfactorily articulated. (Perhaps the delay in responding arose
from the enigmatic style of Wittgenstein's original presentation, and
Kripke's reformulation gave the clarity needed for a careful rebuttal.) I
uphold the view that rules of language belong to individual psychology,
and should not in any collective sense be attributed to communities (e.g.
as social norms). Thus far, I agree completely with Chomsky's position
on rules and rule-following. But now here is where we part company:
"reference to a community seems to add nothing substantive to the
discussion" (Chomsky, 1986:242). I maintain, on the contrary, that com-
munities play a role in determining what rules an individual acquires (which
is obvious), and, more generally, that general facts about human communal
life play a role in determining the kinds of rules that individuals born
into any human community acquire. Pateman expresses the idea so well
that his words are worth repeating:

"... through time the content of mentally represented grammars, which are not in my
view social objects, comes to contain a content which was in origin clearly social or
cultural in character." (Pateman, 1985:51)

The historical mechanism by which facts of discourse 'become' facts of
grammar is often labelled 'grammaticalisation'. To prevent confusion, it
should be stressed that the result of this process does not necessarily involve
a class of previously ungrammatical strings becoming grammatical. The
converse process can also occur. What gets grammaticalised is a pattern,
or configuration of facts, not some class of strings which happens to
participate in such a pattern. The following are the main interesting
possibilities, in terms of two classes of strings, A and B, which are in
some sense functionally equivalent (e.g. (partially) synonymous).

(5) A and B are both grammatical, but A is preferred in use.

        ↕  Diachronic change in either direction.

    A is grammatical, and B is ungrammatical, though B may occur in use.

Change in either direction involves a new fact of grammar emerging, which
is why such changes are aptly called 'grammaticalisation'. But only change
in one direction (upward in (5)) involves previously ungrammatical strings
becoming grammatical. Change in either direction would account for the
parallelisms noted by Givon, Hyman, and Corbett. Another possibility
is:

(6) A and B, both grammatical, are wholly equivalent in meaning and use.

        ↕  Diachronic change in either direction.

    A and B, both grammatical, but differ slightly in meaning and use.

As the relation of surface forms to their linguistic meanings is a matter
of grammatical competence, this is also a case of the emergence of a new
fact of grammar, and aptly called 'grammaticalisation'.
How does the mechanism of grammaticalisation work and how does
it relate to the question of nativist versus functional explanations? I beg
leave to quote myself.4

"In the model proposed, individual language learners respond in a discrete all-or-nothing
way to overwhelming frequency facts. Language learners do not merely adapt their
own usage to mimic the frequencies of the data they experience. Rather, they 'make
a decision' to use only certain types of expression once the frequency of those types
of expression goes beyond some threshold. At a certain point there is a last straw
which breaks the camel's back and language learners 'click' discretely to a decision
about what for them constitutes a fact of grammar. What I have in mind is similar
to Bally and Sechehaye's suggestion about Saussure's view of language change. 'It is
only when an innovation becomes engraved in the memory through frequent repetition
and enters the system that it effects a shift in the equilibrium of values and that language
[langue] changes, spontaneously and ipso facto' (Saussure, 1966:143n). Bever and
Langendoen (1971:433) make the same point nicely by quoting Hamlet: 'For use can
almost change the face of nature'". (Hurford, 1987:282-3, slightly adapted)
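
The threshold mechanism sketched in this quotation can be made a little more concrete. The following minimal sketch is my own illustration, not an implementation of any published model: the threshold value and the expression-type labels 'A' and 'B' are arbitrary placeholders. A learner tallies functionally equivalent expression types and 'clicks' to a categorical fact of grammar only when one type's share of the input passes the threshold.

from collections import Counter

def acquire_fact_of_grammar(observed_tokens, threshold=0.9):
    """Return the expression type adopted as a categorical fact of grammar,
    or None if no type is yet frequent enough for the learner to 'click'."""
    counts = Counter(observed_tokens)
    total = sum(counts.values())
    for expression_type, n in counts.items():
        if n / total >= threshold:
            return expression_type   # discrete, all-or-nothing decision
    return None                      # no type dominant enough: no new fact of grammar

# Toy input: type 'A' overwhelmingly preferred over its functional equivalent 'B'.
print(acquire_fact_of_grammar(['A'] * 95 + ['B'] * 5))   # -> 'A'
print(acquire_fact_of_grammar(['A'] * 60 + ['B'] * 40))  # -> None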

Beyond the kind of vague remarks cited above, no-one has much idea
of how grammaticalisation works. Givon's book documents a large number
of interesting cases, but his account serves mainly to reinforce the conclusion
that grammaticalisation happens, rather than telling us how it happens.
And of course the fact that it does happen, that aspects of performance
get transmuted into aspects of competence, reinforces, rather than un-
dermines, the competence/performance distinction. But one thing that is
clear about grammaticalisation is that the LAD plays a vital part. This
emerges from Givon's discussion of Pidgins and Creoles, in which the
discrete step from Pidgin to Creole coincides with language acquisition
by the first-generation offspring of Pidgin speakers.

"Briefly, it seems that Pidgin languages (or at least the most prevalent type of Plantation
Pidgins) exhibit an enormous amount of internal variation and inconsistency both within
the output of the same speaker and across the speech community. The variation is
massive to the point where one is indeed justified in asserting that the Pidgin has no
stable syntax. No consistent "grammatical" word-order can be shown in a Pidgin, and
little or no use of grammatical morphology. The rate of delivery is excruciatingly slow
and halting, with many pauses. Verbal clauses are small, normally exhibiting a one-
to-one ratio of nouns to verbs. While the subject-predicate structure is virtually
undeterminable, the topic-comment structure is transparent. Virtually no syntactic
subordination can be found, and verbal clauses are loosely concatenated, usually separated
by considerable pauses. In other words, the Pidgin speech exhibits almost an extreme
case of the pragmatic mode of communication.

In contrast, the Creole - apparently a synthesis di novo [sic] by the first generation
of native speakers who received the Pidgin as their data input and proceeded to "create
the grammar" - is very much like normal languages, in that it possesses a syntactic
mode with all the trimmings ... The amount of variation in the Creole speech is much
smaller than in the Pidgin, indistinguishable from the normal level found in "normal"
language communities. While Creoles exhibit certain uniform and highly universal
characteristics which distinguish them, in degree though not in kind, from other normal
languages, they certainly possess the entire range of grammatical signals used in the
syntax of natural languages, such as fixed word order, grammatical morphology,
intonation, embedding, and various constraints". (Givon, 1979:224)

This passage makes the case so eloquently for the existence of an innate
Language Acquisition Device playing a large part in determining the shape
of normal languages that one would not be surprised to find it verbatim
in the introduction to a text on orthodox Chomskyan generative grammar.
In my terms, the prototypical Pidgin is a hybrid monstrosity inhabiting
the Arena of Use, limping along on the basis of no particular shared core
of individual competences. The main unifying features it possesses arise
from its particular spatial/temporal/social range in the Arena of Use. When
a new generation is born into this range, and finds this mess, each newborn
brings his innate linguistic faculty to bear on it and helps create, in
interaction with other members of the community, the grammar of the
new Creole.
The picture just given is, by and large, that of Bickerton's Language
Bioprogram Hypothesis (Bickerton, 1981), and is probably correct in broad
outline, if no doubt an oversimplification of the actual facts. "Usually,
however, the trigger experience of original Creole speakers is shrouded
in the mists of history, and written records of early stages of Creole languages
are meagre." (Lightfoot, 1988:100) A vast amount of empirical research
into the creolisation process needs to be done before interesting details
become discernible, but clearly the focal point of the process is the point
where the innate LAD meets the products of the Arena of Use. The step
from a Pidgin to a Creole is an extreme case of many simultaneous
grammaticalisations across virtually the whole sweep of the (new) language.
Creolisation is massive grammaticalisation. But it is also, due to the
historical rootlessness of the Pidgin, grammaticalisation with a very free
hand. The LAD can impose its default values against weak opposition
from the Pidgin PLD. In discussing grammaticalisation, I do not presuppose
that the input to the process is necessarily some pattern evident in use.
My position is that grammaticalisation is the creation, by the LAD, of
new facts of grammar. Where the input is chaotic, the LAD has a very
free hand, and the new facts of grammar reflect the LAD's influence almost
solely. But where patterns of use exist in the input data, the new facts
of grammar may in certain instances reflect those patterns. We can call
these latter cases 'grammaticalisations of patterns of use', and the former
(dramatic creole) cases 'grammaticalisations from nothing'.
Creoles are in some sense more natural than languages with long histories.
Languages with long histories become encrusted with features that require
non-default setting from the LAD, and even rote-learning. These encru-
stations are due to innovations in the Arena of Use over many generations.
Many of these developments can be said to be functionally motivated.
I have already mentioned in passing several historical studies (Bever and
Langendoen, 1971, Phillips, 1984) which make at least prima facie cases
for the influence, across time, of use on structure. And in section 2.6,
I will add to the list of recent historical linguistic studies which point
to the role of the Arena of Use in determining, at least in part, the contents
of grammatical competence. In these cases, the languages have drifted,
due to pressures of use, to become, in some sense, historically more 'mature'
than a new creole.
It seems reasonable to suppose that sheer statistical frequency of
particular patterns in the Arena of Use will play some part in determining
what grammatical rules will be formed. This is one way in which a
parallelism between discourse patterns and grammatical rules would arise.
But of course the LAD is not merely quantitatively, but also qualitatively
selective. It is not the case that any, i.e. every, frequent pattern becomes
grammaticalised. If this were so, the most common performance errors,
hesitation markers and such like would always get grammaticalized, which
of course they often don't. (But note that hesitation markers do tend to
become fitted into the vowel system of the dialect in question, i.e. to become
phonologised. Cf. the various hesitation vowels in RP ([ɜ:]), Scots English
([e:]), and French ([ø:]).) I believe that Lightfoot, in his 1988 paper,
somewhat oversimplifies the relation between the qualitative and the
quantitative selectivity of the LAD in the following remarks:

"The most obvious point is that not everything that the child hears 'triggers' a device
in the emerging grammar. For example, so-called 'performance errors' and slips of
the tongue do not entail that the hearer's grammar be amended in such a way as to
generate such deviant expressions, presumably because a particular slip of the tongue
does not occur frequently enough to have this effect. This suggests that a trigger is
something that is robust in a child's experience, occurring frequently. Children are
typically exposed to a diverse and heterogeneous linguistic experience, consisting of
different styles of speech and dialects, but only those forms which occur frequently
for a given child will act as triggers, thus perpetuating themselves and being absorbed
into the productive system which is emerging in the child, the grammar." (Lightfoot,
1988:98)

This seems to equate 'potential trigger experience' with 'frequent expe-
rience'. Lightfoot has now developed his ideas on the child's trigger
experience further (Lightfoot, 1989), but he still holds that some statistical
considerations are relevant. While, with Lightfoot, I believe that frequency
in the Arena of Use is an important determinant of the grammars that
children acquire, there must also be substantial qualitative selectivity in
the LAD. Some aspects of competence can be picked up on the basis
of very few exemplars, while the LAD stubbornly resists acquiring other
aspects for which the positive examples are very frequent. The particular
qualitative selectivity of the LAD is what is studied under the heading
of grammatical universals, or UG.
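
The twofold selectivity just described can be schematised as a conjunction of a quantitative test and a qualitative one. In the sketch below, which is purely illustrative, both the threshold value and the predicate standing in for UG are placeholders of my own and make no claim about the actual content of the LAD.

def grammaticalised(pattern, frequency, permitted_by_ug, threshold=0.75):
    """A candidate pattern becomes a fact of grammar only if it is robust in
    the input (quantitative selectivity) and of a kind the LAD is prepared
    to acquire at all (qualitative selectivity)."""
    return frequency >= threshold and permitted_by_ug(pattern)

# Placeholder predicate standing in for UG: hesitation markers are simply
# stipulated to lie outside the space of acquirable grammatical rules.
def toy_ug(pattern):
    return pattern != 'hesitation marker'

print(grammaticalised('fixed SVO order', 0.9, toy_ug))     # True: frequent and acquirable
print(grammaticalised('hesitation marker', 0.9, toy_ug))   # False: frequent but not acquirable
print(grammaticalised('fixed SVO order', 0.1, toy_ug))     # False: acquirable but not robust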

2.4. The role of invention and individual creativity

Prototypical short-term functional explanations involve the usefulness of
some aspect of a language making itself felt within the time a single
individual takes to acquire his linguistic competence (although I shall later
mention a version of the same basic mechanism that happens to take
somewhat longer). This period may vary from a dozen years, for gram-
matical constructions, to a whole lifetime, for vocabulary. But, in the
prototypical case, a short-term functional explanation involves postulating
that each individual acquiring some language recognizes (perhaps uncons-
ciously) the usefulness of some linguistic element (word, construction, etc.)
and adds that item to his competence because it is useful. Some universal
facts of vocabulary, such as the fact that every human language has at
least one word with a designatum in the water/ice/sea/river area, can
be illuminated in this way, as can also many language-particular facts,
such as those of color, plant, and animal taxonomies worked on in detail
by the 'ethnographic semantics' movement (e.g. Brown, 1984). Thus, those
aspects of languages for which short-term functional explanations are
available are characteristically transmitted culturally. Individuals actually
learn these aspects of their language from other members of their com-
munity. They are not innate. Such aspects of languages, therefore, are
typically well-determined by the observable data of performance, since
they need to be sufficiently obvious to new generations to be noticed and
adopted.
Obviously, quite a lot is innate in the lexicon too. For instance, no
single verb can mean 'eat plenty of bread and...', 'persuade a woman that...',
'read many books but not...'. The constraints on possible lexical meanings
are strong and elaborate. My point is that, within such innately determined
constraints, the matter of what lexical items a language possesses is
influenced by factors of usefulness. Individual inventiveness cannot violate
the innately determined boundaries; see Hurford, 1987, Ch.2, Sec.5, for
a detailed discussion of the relation of individual inventiveness to the
capacity for language acquisition.
Aspects of languages transmitted culturally from one generation to the
next because of their usefulness have their origins in the inventiveness
and creativity (presumably in some sense innate) of the individuals who
first coined them and gave them currency. In the field of vocabulary again,
it is uncontroversial that new words are invented by individuals, or arise
somehow from small groups. Often it is not possible to trace who the
first user of a new word was, but nevertheless there must have been a
first user. In other parts of languages, such as their phonological, mor-
phological, syntactic, semantic and pragmatic rule components, it is difficult
to attribute the origins of particular rules to the creativity of individuals
or groups, but even here a kind of attenuated creativity in the use of
language, proceeding by small increments over many generations, seems
plausible. The approximate story would be of existing rules having their
domain of application gradually extended or diminished due to a myriad
of small individual choices motivated by considerations of usefulness. Very
few rules of syntax are completely general in the sense of having no lexical
exceptions. Such sets of lexical exceptions are augmented or lessened
continually throughout the history of languages. The specifically functional
considerations, that is considerations of usefulness, which motivate such
changes in the grammar of a language are of course usually impossible
to identify with accuracy, and will remain so until we have much subtler
theories and taxonomies of language use (which will help us to define
the notion of usefulness itself more precisely).
The historical role of invention and creativity that I have in mind is
envisaged by Gropen et al. (1989) and described by Mithun (1984):

"Instead, it could be that the historical processes which cause lexical rules to be defined
over some subclasses but not others seem to favour the addition or retention of narrow
classes of verbs whose meanings exemplify or echo the semantic structure created by
the rule most clearly. The full motivation for the dativisability of a narrow class may
come from the psychology of the first speakers creative enough or liberal enough to
extend the dative to an item in a new class, since such speakers are unlikely to make
such extensions at random. Thereafter speakers may add that narrow class to the list
of dativisable classes with varying degrees of attention to the motivation provided by
the broad-range rule - by recording that possibility as a brute memorised fact, by grasping
its motivation with the aid of a stroke of insight recapitulating that of the original
coiners, or by depending on some intermediate degree of appreciation of the rationale
to learn its components efficiently, depending on the speaker and the narrow class
involved". (Gropen et al., 1989:245)

"But in Mohawk, where NI [ = noun incorporation] of all types is highly productive,
speakers frequently report their pleasure at visiting someone from another Mohawk
community and hearing new NI's for the first time. They have no trouble understanding
the new words, but they recognise that they are not part of their own (vast) lexicon.
When they themselves form new combinations, they are conscious of creating 'new
words', and much discussion often surrounds such events." (Mithun, 1984:889)

The acts of individual speakers in responding creatively to considerations
of usefulness are analogous to micro-events at the level of molecules, and
the large movements of languages discernible to historical linguists are
analogous to macro-events, such as those described in geophysical terms
of plate tectonics (this analogy is Bob Ladd's). Whether or not we call
a language in which there has been one micro-change a different language
is a question of terminology. Let us adopt, temporarily and for argument's
sake, the rigid convention that any one change, however slight, in a language
L n produces a different language L n + 1 . This effectively equates 'language'
with some abstraction even lower in level than 'idiolect', and so is not
a generally useful convention in talking about language 5 . But, adopting
this usage, competition in the Arena of Use determines whether L n or
L n + 1 survives. These minimally differing languages may continue to co-
exist, because neither is significantly more useful than the other, or one
may replace the other because it is in some sense more useful. Adopting
a different terminological convention, wherein 'languages' are grosser
entities, distinguished by masses of detailed differences, it is still competition
in the Arena of Use which decides the survival of languages. The 'languages'
I have in mind in this paragraph are I-languages. But since they, existing
only inside speakers, can never come into contact with each other, the
competition between them is actually fought out through the medium of
their corresponding E-languages in the Arena of Use. (An approximate
analogy would be a tournament acted out by marionette puppets whose
behavioural repertoires (kick, punch, etc.) are specified by different pro-
grams of their robot operators, though the set of programs available in
principle to all robots is the same. When a puppet loses a match, the
program in the robot that was running it is eliminated. But remember
that no analogy is perfect.)
A schematic representation of the state of affairs postulated in a functional
explanation of the short-term type is given in Figure 5, below. Note that
the 'languages' mentioned in this diagram are E-languages, since they exist
in and through the Arena of Use; that is, they correspond to the competing
marionettes of the analogy of the previous paragraph.

SHORT-TERM MECHANISM OF FUNCTIONALLY MOTIVATED CHANGE:
ONTOGENETIC (OR GLOSSOGENETIC) MECHANISM

[Diagram: the mentally represented GRAMMARS G1, G2, G3, ... each give rise, via the Arena of Use (AoU), to the REALISED LANGUAGES L1, L2, L3, ..., which in turn feed the LAD that yields the next grammar; at the bottom, the UNREALISED LANGUAGES La, Lb, Lc, ..., Li, Lj, Lk, ..., Lx, Ly, Lz, ... compatible with G1, etc., represent alternative courses of history.]

Fig. 5.

The upper two levels in this diagram indicate the course of actual linguistic
history: the actually mentally represented grammars G1, G2, G3, ..., and
the actually realised languages L1, L2, L3, ... The bottom level in the
diagram represents alternative language histories - what languages might
have been realised if the pressures of the Arena of Use had been other
than what they actually were. These possible but unrealised languages can
be thought of as aborted due to competition in the Arena of Use from
a more successful rival language. Competition in the Arena of Use, in
the case of this short-term functional mechanism, is therefore between
possible languages defined by the same LAD. (Figure 5 is in fact another
variant of Andersen's scheme in Figure 4.) The unrealised languages are
possible but non-occurring aggregates of real speech events in the language
community, alternative courses of history, in effect.

The scheme shown in Figure 5 is obviously idealised in many ways. One
aspect of this idealisation worth mentioning is the fact that only one LAD
is represented at any transition, whereas in fact language change is mediated
by whole populations of LADs (tokens not types), all (1) exposed to different
(though partially intersecting) data, (2) possibly themselves subject to some
maturational change (see White, 1982:68-70, Borer and Wexler, 1987,1988),
and (3) perhaps even not originally completely uniform. In a real case,
some individuals would internalise grammars slightly different from those
internalised by others. This difference would be reflected by statistical
changes in the Arena of Use, which in turn might prompt a rather larger
proportion of language learners in the next generation to acquire grammars
with a certain property. In this way, it might take many generations for
a whole population to accomplish what with historical hindsight looks
like a single discrete change. The term 'ontogenetic mechanism' might
well be reserved for a case where a whole nonstatistical language change
is achieved in a single generation, rather like the Bickerton/Givon picture
of the leap from Pidgins to Creoles. That is, the new (version of the)
language grows, fast, in just the time it takes one generation of individuals
to acquire/create it. The slower version of the mechanism, which takes
more than one generation, could appropriately be called the 'glossogenetic
mechanism'. The only difference between the ontogenetic and the glos-
sogenetic mechanism is in the number of generations taken.
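
A toy simulation may help to make the glossogenetic version of the mechanism concrete. The sketch below is my own construction with arbitrary numbers, and the speakers' grammars stand in, crudely, for their output in the Arena of Use. Each learner samples the previous generation and adopts variant A categorically once its share of that sample passes a threshold; under these assumptions a change can run to completion over several generations even though no single generation completes it.

import random

def next_generation(speakers, sample_size=50, threshold=0.7, n_learners=100):
    """Each learner samples the previous generation and acquires grammar 'A'
    categorically if A's share of the sample reaches the threshold;
    otherwise the learner settles on grammar 'B'."""
    new_speakers = []
    for _ in range(n_learners):
        sample = random.choices(speakers, k=sample_size)
        if sample.count('A') / sample_size >= threshold:
            new_speakers.append('A')
        else:
            new_speakers.append('B')
    return new_speakers

random.seed(0)
population = ['A'] * 75 + ['B'] * 25    # initial mix of grammars in the community
for generation in range(6):
    share = population.count('A') / len(population)
    print(f"generation {generation}: {share:.0%} of speakers have grammar A")
    population = next_generation(population)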

2.5. The problem of identifying major functional forces

This picture of functionally motivated language change has its opponents.
One of the fiercest and most sustained critiques of this general point of
view that I am aware of is in Lass (1980:64-97). Lass's view (in which
he is not alone) is summed up in:

"Merely on the evidence provided so far, if my arguments are sound, the proponents
of any functional motivation whatever for linguistic change have to do one of two
things:

(i) Admit that the concept of function is ad hoc and particularistic and give up;
or
(ii) Develop a reasonably rigorous, non-particularistic theory with at least some
predictive power; not a theory based merely on post hoc identification plus a
modicum of strategies for weaseling out of attempted disconfirmations.

This is the picture as I see it: (i) is of course the easy way out, and (ii) seems to be
the minimum required if (i) is not acceptable. I am myself not entirely happy with
(i), and it should probably not be taken up - though failing a satisfactory response
to (ii) it seems inevitable." (Lass, 1980:79-80)

Lass discusses functional explanation under three subheadings: 'preser-
vation of contrast', 'minimization of allomorphy', and 'avoidance of
homophony', and convincingly demolishes claims by various scholars to
have explained particular historical linguistic changes in such 'functional'
terms. But in fact these attempted explanations are not genuinely functional
according to the spirit in which I have argued the term should be taken.
It is crucial to note that 'contrast', 'allomorphy', and 'homophony', as
Lass uses them, are terms describing a language system, and not language
use. In other words, quite clearly, these terms do not describe phenomena
in the Arena of Use. Instances of contrast, mean degree of allomorphy,
and pervasiveness of homophony can all be ascertained from inspection
of a grammar, without ever observing a single speaker in action. This
is of course what makes them attractive to many linguists. These are formal
properties, in the same way that the simplicity of a grammar, measured
in whatever way one chooses, is a formal property. Martinet's 'functional
load' is likewise a formal property of language systems, not of language
use, which may account for the failure of that concept to blossom as
a tool of functional explanation. Obviously, the presence of contrast makes
itself felt in the Arena of Use, but then so do most other aspects of grammars.
In fact, an old and important debate in the transition from post-
Bloomfieldian structuralist phonology to generative phonology sheds light
on the relation between contrast, competence, and functionally motivated
language change. The classical, taxonomic, or autonomous phoneme, whose
essence was that it was defined in terms of contrast, was the central concept
of pregenerative phonology. This was before the emergence of a better
understanding of the competence/performance, or I-language/E-language,
distinction, that came with the advent of generative linguistics. To the
surprise of some, it turned out that generative phonology, conceived as
a model of an individual's mentally represented knowledge of the sound
pattern of his language, had no place at all for the classical phoneme.
The classical phoneme simply did not correspond to any linguistically
significant level of representation in competence grammars. The phone-
micists who found this puzzling had no arguments against this conclusion,
yet puzzlement remained, in some quarters. And, in 1971, a postscript
to the debate appeared, an article by Schane (Schane, 1971), which pointed
the way to a resolution of the puzzle. But even 1971 was too close to
the events for matters to have become completely clear, and Schane's
postscript still leaves something rather unsettled; I now offer a post-
postscript, taking Schane's ideas, and showing how they can be well
accommodated within the picture of the interaction between the LAD
and the Arena of Use.
Schane points to attested or ongoing sound changes in a number of
languages (French nasalisation, Rumanian Palatalisation, Rumanian de-
labialisation, Nupe palatalisation and labialisation, and Japanese palata-
lisation). These changes conform to a pattern:

"If, on the surface, a feature is contrastive in some environments but not in others,
that feature is lost where there is no contrast". (Schane, 1971:505)

On the basis of these examples, Schane maintains that, for the speakers
involved, the (approximately) phonemic level of representation at which
these contrasts exist must have had some psychological validity. But he
has this problem:

"Transformational phonology rejects the phoneme as a unit of surface contrast, so
the theory has no way of identifying contrasts, and therefore no basis for identifying
alternations (cf. Schane 1971:514). No point in derivations exists where contrasts are
identified". (Hudson, 1980:116)

Faced with the problem of reconciling some kind of psychological validity
for the phoneme with the accepted conclusions of generative phonology,
Schane argues in detail that representations at the phonemic level can
be calculated from generative descriptions. The necessary calculations
involve a partition of the rules into two types (morphophonemic and
phonetic) and inspection of the derivations involving just rules of the former
type. Note that the partition of phonological rules into morphophonemic
and phonetic is also not directly represented in a generative grammar (of
the type Schane was assuming) and must itself be calculated. So though
a phonemic level may be accessible through a generative grammar, it is
certainly not retrievable in any simple way. Schane's dilemma was that
he, like others, "felt guilty about disinheriting the child [the phoneme]"
(520), but since linguistic theories at the time were only competence theories,
he had no obvious place to locate the phoneme.
The classical phoneme was never as well-behaved as its structuralist
proponents, some of whom wanted to build it into a bottom-up discovery
procedure for grammars, would have liked. Languages often use a contrast
distinctively in one environment, but ride roughshod over the distinction
in productive phonological rules elsewhere. An example is English /s/~/z/, a
phonemic contrast 'demonstrated' by the existence of many minimal pairs
(sue/zoo, bus/buzz, racer/razor), but neutralized in many environments
by some of the most productive phonological rules of English, the voicing
assimilation rules involving the plural, 3rd person singular present tense,
and possessive morphemes. Naturally, the phonemicists had a story to
tell about such problems, but they were typically epicyclic. What could
not be saved was the idea that the main thing a speaker knows about
the sounds of his language is a set of surface contrasts, which serve
everywhere to 'keep words apart' (Hockett's phrase).
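
The /s/~/z/ case can be given a small computational rendering. In the sketch below the word list and 'transcriptions' are crude ASCII stand-ins of my own, chosen only to display the two sides of the point: minimal pairs 'demonstrate' a surface contrast, while a productive voicing-assimilation rule makes the same distinction fully predictable, and hence non-contrastive, in plural-suffix position.

# Toy pseudo-transcriptions for sue, zoo, bus, buzz, racer, razor.
lexicon = ['su', 'zu', 'bVs', 'bVz', 'resR', 'rezR']

def is_s_z_minimal_pair(a, b):
    """True if a and b are identical except for a single s/z segment."""
    if len(a) != len(b):
        return False
    diffs = [(p, q) for p, q in zip(a, b) if p != q]
    return len(diffs) == 1 and set(diffs[0]) == {'s', 'z'}

minimal_pairs = [(a, b) for i, a in enumerate(lexicon)
                 for b in lexicon[i + 1:] if is_s_z_minimal_pair(a, b)]
print(minimal_pairs)   # the surface contrast: su/zu, bVs/bVz, resR/rezR

VOICELESS = set('ptkf')

def plural(stem):
    """Voicing assimilation: the suffix surfaces as -s after a voiceless
    consonant and as -z elsewhere, so s vs z carries no contrast here."""
    return stem + ('s' if stem[-1] in VOICELESS else 'z')

print(plural('kat'), plural('dog'))   # kats dogz: the suffix shape is fully predictable
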
But of course, by and large, in the rough and tumble of everyday
communication, enough words do get kept apart for decoding and successful
communication to take place, much of the time. If phonological rules
could obliterate all predictable distinctions between words, communication
would break down. Some neutralizations are clearly permissible; the typical
redundancy of language allows decoding in spite of them. But the situation
cannot get out of hand. This suggests that the proper place for something
like the 'Phonemic principle' is the Arena of Use. Speakers who allow
their phonetic performance to stray too far away from the surface contrasts
used as clues in reception by hearers are likely to be misunderstood. To
remain as (linguistically) successful members of the speech community,
they learn to respect, in a rough and ready way, a degree of surface
contrastivity.
I believe that Schane's basic account of the sound changes he discusses
does illuminate them. Something puzzling (e.g. denasalisation following
hard on the heels of nasalisation) is made to seem less puzzling by drawing
attention to the fact that this happened in an environment where no surface
contrast was lost. But Schane's principle is only explanatory in this weak
sense; it lacks the predictive power that Lass calls for, and falls into Lass's
category of 'a theory based merely on post hoc identification'. As Hock
(1976) points out:

"Though such changes undeniably occur, [Schane's] general claim is certainly too strong.
Note, first of all that the similar loss of u-umlaut before remaining u, referred to as
an 'Old Norse' change ..., is actually limited to Old Norwegian (cf. Benediktsson 1963)
- Old Icelandic does not participate in it:... Moreover, among such frequent conditioned
changes as palatalization and umlaut, examples of such a 'reversal' of change seem
extremely infrequent, suggesting that the phenomenon is quite rare". (Hock, 1976:208)

What is needed, to explain particular sound changes, is a demonstration
that particular contrasts are felt so important that actions occur in the
Arena of Use tending to prevent loss of such contrasts. Such demonstrations
are likely to be very difficult, because they involve delving into the very
messy data of the Arena of Use in search of clear indications involving
individual words, phonemes, etc. The confrontation with the messy data
of the Arena of Use is, however, far less daunting if one heeds the crucial
point made by Foley and Van Valin:

"It must be emphasised that functional theories are not performance theories. That
is, they seek to describe language in terms of the types of speech activities in which
language is used as well as the types of constructions which are used in speech activities.
They do not attempt to predict the actual tokens of speech events. ... They are theories
of systems, not of actual behavior". (Foley and Van Valin, 1984:15)

Unfortunately, this is expressed slightly inaccurately, in my terms. I would
rather have said: 'functional theories are theories of performance types,
and not of performance tokens'. The point is clear, however, and should
be invoked to protect functional theories from disappearing without trace
into the ultimate morass of particular events. But the warning may still
not be strong enough, because even functional hypotheses in terms of
particular construction types and speech activity types are likely to be
met by counterexamples.
To overcome this and Lass's correct criticism of the 'particularism' of
functional explanations, we somehow need to get a good statistical grip
on the functional factors that affect language change. It is to be hoped
that broad classes of events in the Arena of Use are susceptible to statistical
treatment, even though individual events may appear to be more or less
random. A theory of functional language change is, for the foreseeable
future, only likely to be successful in characterising the statistical distri-
bution of possible end-results of change. In this way it will be predictive
in the same sense as, say, cosmology is predictive. A given cosmological
theory may predict that background microwave radiation from all directions
in the universe varies only within very narrow limits (a statistical statement),
but it will make no predictions at all about the particular variations.
In starting to get to theoretical grips with phenomena in the Arena
of Use, it will be important to note Bever's guiding words:

"I have attempted to avoid vague reference to properties such as "mental effort"
"informativeness" "importance" "focus" "empathy" and so on. I do not mean that
these terms are empty in principle: however they are empty at the moment, and
consequently can have no clear explanatory force". (Bever, 1975:600-601)

Many well-intentioned attempts to establish foundations for functional
theories of language change, as, for instance, in Martinet (1961), fall foul
of this problem. But there are positive developments, too. The parsing
explanation for word order universals offered by Hawkins (1990) makes
precise a notion of economy in parsing that rescues a 'principle of least
effort' (Zipf, 1949), in this area at least, from vagueness and vacuity. And
I would add a reservation to Bever's warning. Terms and concepts acquire
explanatory force by being invoked in plausible explanations of wide ranges
of phenomena. We don't know in advance just where on the theoretical/
observational continuum notions like 'mental effort' and 'informativeness'
will fall. They may turn out to be relatively abstract notions, embedded
in a quite highly structured theory. In such a case, their explanatory force
would derive from the part they play in the explanatory success of the
theory as a whole: it will not be possible to evaluate their contribution
in isolation.
Lass's reluctance to take up his constructive second option, 'Develop
a reasonably rigorous, non-particularistic theory with at least some pre-
dictive power', is curious. We build theories, the best that the domains
concerned permit, to gain illumination about the world. As long as we
don't, we remain in the dark. Of course, we should also avoid building
theories only where the (usually mathematical) light is good, like the
proverbial man searching for his keys under the street lamp, rather than
where he had dropped them, because the light was better under the lamp.
But it is precisely because the light is (at present) dim in the area of functional
influences on language change that adequate functional theories have not
emerged.
Perhaps in some cases there are indeed no functional causes of language
change, and the changes merely come about by random drift such as one
may expect in any complex culturally transmitted system. But it would
be quite unreasonable to assert that in no cases does the factor of usefulness
exert a pressure for change. The fact that we are unable to pinpoint specific
instances should not be mistaken for an argument that changes caused
by factors of usefulness do not exist. We can't see black holes in space,
but we have good reasons to believe they exist. Does anyone really doubt
that languages are useful systems and that (some) changes in them are
brought about by factors of usefulness? The only (!) issue is of the precise
nature and extent of the mechanisms involved.

2.6. Language drift

A number of recent studies in diachronic linguistics have proposed
evolutionary tendencies in the histories of languages.
Bybee (1986), for example, argues for the universal origin of grammatical
morphemes in independent lexical items.

"... the types of change that create grammatical morphemes are universal, and the same
or similar material is worn down into grammatical material in the same manner in
languages time after time ..." (Bybee, 1986:26)

"... grammatical morphemes develop out of lexical morphemes by a gradual process
of phonological erosion and fusion, and a parallel process of semantic generalisation".
(Bybee, 1986:18)

Mithun (1984) proposes that noun incorporation (NI) develops diachro-
nically along a specific route:

"NI apparently arises as part of a general tendency in language for V's to coalesce
with their non-referential objects, as in Hungarian and Turkish. The drift may result
in a regular, productive word formation process, in which the NI reflects a reduction
of their individual salience within predicates (Stage I). Once such compounding has
become well established, its function may be extended in scope to background elements
within clauses (Stage II). In certain types of languages, the scope of NI may be extended
a third step, and be used as a device for backgrounding old or incidental information
within discourse (Stage III). Finally, it may evolve one step further into a classificatory
system in which generic NP's are systematically used to narrow the scope of V's with
and without external NP's which identify the arguments so implied (Stage IV)". (Mithun,
1984:891)

Mithun goes on to describe other tendencies for change that languages
may undergo, in cases where the evolutionary process is arrested at any
of these stages.
Traugott (1989) discusses 'paths of semantic change' in terms of the
following three closely related tendencies:

"Tendency I: Meanings based in the external described situation > meanings based
in the internal (evaluative/perceptual/cognitive) described situation.

Tendency II: Meanings based in the external or internal described situation > meanings
based in the textual and metalinguistic situation.

Tendency III: Meanings tend to become increasingly based in the speaker's subjective
belief state/attitude toward the proposition. ...

All three tendencies share one property: the later meanings presuppose a world not
only of objects and states of affairs, but of values and of linguistic relations that cannot
exist without language. In other words, the later meanings are licensed by the function
of language". (Traugott, 1989:34-35)

Naturally, the proposals of Bybee, Mithun, and Traugott are subject to
normal academic controversy, but it seems likely that some core of their
central ideas will stand the test of time. For my purpose, the crucial core
to all these proposals is the proposition that there exist specific identifiable
mechanisms affecting the histories of languages continuously over stretches
longer than a single generation. If this is true, which seems likely, then
there must be some identifiable property of the language acquirer's
experience which has the effect of inducing a competence different in some
way from that of the previous generation. If such patterning in the input
data were not possible, there could be no medium through which such
long-term diachronic mechanisms could manifest themselves; the diachronic
spiral through LAD and Arena of Use would not exist; languages would
be only reinvented with each generation, and they would contain no 'growth
marks', in the sense of Hurford (1987).

3. CONCLUSION

Language, in some broad sense, is equally an object of interest to biologists,
to students of language acquisition, of grammatical competence, and of
discourse and pragmatics, and to historical linguists. Each of these
disciplines has its own perspective on the object (e.g. focussing on E-
language or I-language), but the perspectives must ultimately be mutually
consistent and able to inform each other. The biological linguist is concerned
with the innate human properties giving rise to the acquisition of uniformly
structured systems across the species. The student of language acquisition
is concerned with the interplay between these innate properties of the
grammar representation system, other aspects of internal structure (e.g.
innate processing mechanisms), and the learner's experience of the physical
and social world. Students of discourse and pragmatics focus on, and hope
to be able to explain and predict, certain patterning in the social linguistic
intercourse which the learner experiences. Such patterning makes some
impact on the grammatical competence acquired, resulting in the gram-
maticalisation of discourse processes, at which point the phenomena engage
the attention of the student of competence. Frequency monitoring and
individual creativity play a part in this diachronic spiral through grammars
and use, by which languages develop, giving rise to the processes studied
by the historical linguist.

The LAD is born into, and lives in, the Arena of Use. The Arena does
not, in the short term, shape the Device, but, in conjunction with it, shapes
the learner's acquired competence. The interaction between this competence
and the enveloping Arena reconstructs the Arena in readiness for the entry
of the next wave of LADs.

FOOTNOTES

1. Pinker and Bloom mention some of the evidence for this:

"Bever, Carrithers, Cowart, and Townsend (1989) have extensive experimental data showing
that right-handers with a family history of left-handedness show less reliance on syntactic
analysis and more reliance on lexical association than do people without such a genetic
background.
Moreover, beyond the "normal" range there are documented genetically-transmitted
syndromes of grammatical deficits. Lenneberg (1967) notes that specific language disability
is a dominant partially sex-linked trait with almost complete penetrance (see also Ludlow
and Cooper, 1983, for a literature review). More strikingly, Gopnik, 1989, has found
a familial selective deficit in the use of morphological features (gender, number, tense,
etc.) that acts as if it is controlled by a dominant gene". (Pinker and Bloom, 1990)

2. Sperber and Wilson's theory is, however, still controversial. See the peer review in Behavioral
and Brain Sciences 10 (1987), also the exchange in Journal of Semantics 5 (1988), and Levinson
(1989).

3. This is how Fodor (1976) casts a theory of language:

"The fundamental question that a theory of language seeks to answer is: How is it
possible for speakers and hearers to communicate by the production of acoustic wave
forms?". (Fodor, 1976:103)

4. In this quotation, I have (with the author's approval) three times replaced an original
instance of 'speakers' with 'language learners' and (indicating a shift in my opinion about
certain numeral expressions) replaced 'preferred usage' with 'a fact of grammar'.
5. This convention is actually quite standard. Pinker, for example, adopts this usage: "What
the Uniqueness principle does is ensure that languages are generally not in proper inclusive
relationships. When the child hears an irregular form and consequently drives out its
productively generated counterpart, he or she is tacitly assuming that there exists a language
that contains the irregular form and lacks the regular form, and a language that contains
the regular form and lacks the irregular form, but no language that contains both". (Pinker,
1984:360)

REFERENCES

Andersen, H. 1973. Abductive and Deductive Change. Language 49. 765-793.
Atkinson, M. 1982. Explanations in the Study of Child Language Development. Cambridge:
Cambridge University Press.
Bates, E., I. Bretherton and L. Snyder. 1988. From First Words to Grammar: Individual
Differences and Dissociable Mechanisms. Cambridge: Cambridge University Press.
Bates, E. and B. MacWhinney. 1987. Competition, Variation, and Language Learning. In
B. MacWhinney (ed.) Mechanisms of Language Acquisition. 157-193. Hillsdale, New Jersey:
Erlbaum.
Benediktsson, H. 1963. Some Aspects of Nordic Umlaut and Breaking. Language 39. 409-
431.
Beukema, F. and P. Coopmans. 1989. A Government-Binding Perspective on the Imperative
in English. Journal of Linguistics 25. 417-436.
Bever, T. G. 1975. Functional Explanations Require Independently Motivated Functional
Theories. In R. E. Grossman, L. James San and T. J. Vance (eds.) Papers from the Parasession
on Functionalism. 580-609. Chicago: Chicago Linguistic Society.
Bever, T. G., and D. T. Langendoen. 1971. A Dynamic Model of the Evolution of Language.
Linguistic Inquiry 2. 433-463.
Bever, T. G., C. Carrithers, W. Cowart and D. J. Townsend. (in press). Tales of two sites:
The quasimodularity of language. In A. Galaburda (ed.) Neurology and Language.
Cambridge, Massachusetts: MIT Press.
Bickerton, D. 1981. Roots of Language. Ann Arbor, Michigan: Karoma.
Borer, H. and K. Wexler. 1987. The Maturation of Syntax. In T. Roeper and E. Williams
(eds.) Parameter Setting. 123-172. Dordrecht: Reidel.
Borer, H. and K. Wexler. 1988. The Maturation of Grammatical Principles. Ms. Department
of Cognitive Science, University of California at Irvine.
Brown, C. H. 1984. Language and Living Things: Uniformities in Folk Classification and Naming.
New Brunswick: Rutgers University Press.
Bybee, J. L., and M. A. Brewer. 1980. Explanation in Morphophonemics: Changes in Provençal
and Spanish Preterite Forms. Lingua 52. 271-312.
Bybee, J. L. 1986. On the Nature of Grammatical Categories. Proceedings of the Second
Eastern States Conference on Linguistics. 17-34. Ohio State University.
Cedergren, H. and D. Sankoff. 1974. Variable Rules: Performance as a Statistical Reflection
of Competence. Language 50. 333-355.
Chomsky, A. N. 1965. Aspects of the Theory of Syntax. Cambridge, Massachusetts: MIT
Press.
Chomsky, A. N. 1979. Language and Responsibility. Hassocks, Sussex: Harvester Press.
Chomsky, A. N. 1986. Knowledge of Language: its Nature, Origin, and Use. New York: Praeger.
Chomsky, A. N. and H. Lasnik. 1977. Filters and Control. Linguistic Inquiry 8. 425-504.
Clark, H., and S. E. Haviland. 1974. Psychological Processes as Linguistic Explanation.
In David Cohen (ed) Explaining Linguistic Phenomena. 91-124. Washington D. C.:
Hemisphere Publishing Corporation.
Coker, C. H., N. Umeda and C. P. Browman. 1973. Automatic Synthesis from Ordinary
English Text. IEEE Transactions on Audio and Electroacoustics AU-21. 293-8.
Coopmans, P. 1984. Surface Word-Order Typology and Universal Grammar. Language 60.
5-69.
Corbett, G. 1983. Hierarchies, Targets and Controllers: Agreement Patterns in Slavic. London:
Croom Helm.
Dawkins, R. 1982. The Extended Phenotype: the Gene as the Unit of Selection. Oxford: Oxford
University Press.
Downes, W. 1977. The Imperative and Pragmatics. Journal of Linguistics 13. 77-97.
Du Bois, J. W. 1980. Beyond Definiteness: The Trace of Identity in Discourse. In W. L.
Chafe (ed.) The Pear Stories: Cognitive Cultural and Linguistic Aspects of Narrative
Production. 207-274.
Du Bois, J. W. 1985. Competing Motivations. In John Haiman (ed.) Iconicity in Syntax.
343-365. Amsterdam: John Benjamins.
Du Bois, J. W. 1987. The Discourse Basis of Ergativity. Language 63. 805-855.
Edie, J. 1976. Speaking and Meaning: the Phenomenology of Language. Bloomington, Indiana:
Indiana University Press.
Fidelholtz, J. L. 1975. Word Frequency and Vowel Reduction in English. Papers from the
Eleventh Regional Meeting of the Chicago Linguistic Society. 200-213.
Fodor, J. A. 1976. The Language of Thought. Hassocks, Sussex: Harvester Press.
Foley, W. A. and R. D. Van Valin Jr. 1984. Functional Syntax and Universal Grammar.
Cambridge: Cambridge University Press.
Foley, W. A. 1986. The Papuan Languages of New Guinea. Cambridge: Cambridge University
Press.
Fries, C. C. and K. L. Pike. 1949. Coexistent Phonemic Systems. Language 25. 29-50.
Givon, T. 1979. On Understanding Grammar. New York: Academic Press.
Givon, T. 1986. Prototypes: Between Plato and Wittgenstein. In C. Craig (ed.) Noun Classes
and Categorization. 77-102. Amsterdam: John Benjamins.
Gleitman, L. R., E. Newport and H. Gleitman. 1984. The Current State of the Motherese
Hypothesis. Journal of Child Language 2. 43-81.
Gold, E. M. 1967. Language Identification in the Limit. Information and Control 10. 447-
474.
Golinkoff, R. M. and L. Gordon. 1983. In the Beginning was the Word: a History of the
Study of Language Acquisition. In R. M. Golinkoff (ed.) The Transition from Prelinguistic
to Linguistic Communication. 1-25. Hillsdale, New Jersey: Lawrence Erlbaum.
Gopnik, M. 1989. A Featureless Grammar in a Dysphasic Child. Ms. Department of
Linguistics, McGill University.
Greenberg, J. H. 1966. Some Universals of Grammar with Particular Reference to the Order
of Meaningful Elements. In J. H. Greenberg (ed.) Universals of Language. Cambridge,
Massachusetts: MIT Press.
Grimshaw, A. D. 1989. Infinitely Nested Chinese 'Black Boxes': Linguists and the Search
for Universal (Innate) Grammar. Behavioral and Brain Sciences 12. 339-340.
Grimshaw, J. and S. Pinker. 1989. Positive and Negative Evidence in Language Acquisition.
Behavioral and Brain Sciences 12. 341-342.
Gropen, J., S. Pinker, M. Hollander, R. Goldberg and R. Wilson. 1989. The Learnability
and Acquisition of the Dative Alternation in English. Language 65. 203-257.
Hasher, L. and R. T. Zacks. 1984. Automatic Processing of Fundamental Information: the
Case of Frequency of Occurrence. American Psychologist 39. 1372-1388.
Hawkins, J. A. 1990. A Parsing Theory of Word Order Universals. Linguistic Inquiry 21.
223-261.
Hock, H. H. 1976. Review article on Raimo Anttila 1972. An Introduction to Historical
and Comparative Linguistics. New York: Macmillan. Language 52. 202-220.
Hooper, J. 1976. Word Frequency in Lexical Diffusion and the Source of Morphophonological
Change. In W. M Christie, Jr. (ed.). Current Progress in Historical Linguistics. 95-105.
Amsterdam: North Holland.
Horning, J. J. 1969. A Study of Grammatical Inference. Doctoral Dissertation, Stanford
University.
Hudson, G. 1980. Automatic Alternations in Non-Transformational Phonology. Language
56. 94-125.
Hurford, J. R. 1987. Language and Number. Oxford: Basil Blackwell.
Hurford, J. R. 1989. Biological Evolution of the Saussurean Sign as a Component of the
Language Acquisition Device. Lingua 77. 245-280.
Hurford, J. R. 1991a. The Evolution of the Critical Period for Language Acquisition.
Cognition.
Hurford, J. R. 1991b. An Approach to the Phylogeny of the Language Faculty. In J. A.
Hawkins and M. Gell-Mann (eds.) The Evolution of Human Languages. Santa Fe Institute
Studies in the Sciences of Complexity, Proceedings vol. X. Addison Wesley.
Hyman, L. M. 1984. Form and Substance in Language Universals. In Brian Butterworth,
B. Comrie and O. Dahl (eds.) Explanations for Language Universals. 67-85. Berlin: Mouton.
Ingram, D. 1979. Cross-linguistic Evidence on the Extent and Limit of Individual Variation
in Phonological Development. Proceedings of the 9th International Congress of Phonetic
Sciences. Institute of Phonetics. University of Copenhagen.
Itkonen, E. 1978. Grammatical Theory and Metascience: a critical investigation into the
methodological and philosophical foundations of' autonomous' linguistics. Amsterdam: John
Benjamins.
Itkonen, T. 1977. Notes on the Acquisition of Phonology. English summary of: Huomiota
lapsen aanteiston kehitykseka. Virittaja. 279-308. (English summary 304-308).
Koopmans-van Beinum, F. J. and J. H. Harder. 1982/3. Word Classification, Word Frequency
and Vowel Reduction. Proceedings of the Institute of Phonetic Sciences of the University
of Amsterdam 7. 61-9.
Kripke, S. 1982. Wittgenstein on Rules and Private Language. Cambridge, Massachusetts:
Harvard University Press.
Kroch, A. 1989. Language Learning and Language Change. Behavioral and Brain Sciences
12. 348-349.
Labov, W. 1969. Contraction, Deletion and Inherent Variability of the English Copula.
Language 45. 716-762.
Lasnik, H. 1981. Learnability, Restrictiveness, and the Evaluation Metric. In C. L. Baker
and J. J. McCarthy (eds.) The Logical Problem of Language Acquisition. 1-21. Cambridge,
Massachusetts: MIT Press.
Lass, R. G. 1980. On Explaining Language Change. Cambridge: Cambridge University Press.
Lenneberg, E. H. 1967. Biological Foundations of Language. New York: John Wiley and Sons.
Levinson, S. C. 1989. A review of Relevance. Journal of Linguistics 25. 455-472.
Lightfoot, D. W. 1979. Principles of Diachronic Syntax. Cambridge: Cambridge University
Press.
Lightfoot, D. W. 1983. The Language Lottery: Toward a Biology of Grammars. Cambridge,
Massachusetts: MIT Press.
Lightfoot, D. W. 1988. Creoles, Triggers and Universal Grammar. In C. Duncan-Rose and
T. Vennemann (eds.) On Language: Rhetorica, Phonologica, Syntactica: A Festschrift for
R. P. Stockwell from his Friends and Colleagues. 97-105. London: Routledge.
Lightfoot, D. W. 1989a. The Child's Trigger Experience: Degree-0 Learnability. Behavioral
and Brain Sciences 12. 321-334.
Lightfoot, D. W. 1989b. Matching Parameters to Simple Triggers. Behavioral and Brain Sciences
12. 364-371.
Locke, J. L. 1986. Speech Perception and the Emergent Lexicon: an Ethological Approach.
In P. Fletcher and M. Garman (eds.) Language Acquisition: Studies in First Language
Development (2nd ed.). 240-250. Cambridge: Cambridge University Press.
Ludlow, C. L. and J. A. Cooper. 1983. Genetic Aspects of Speech and Language Disorders:
Current status and future directions. In Ludlow, C. L. and J. A. Cooper (eds.) Genetic
Aspects of Speech and Language Disorders. New York: Academic Press.
Macken, M. A. 1980. Aspects of the Acquisition of Stop Consonants. In Yeni-Komshian
et al (eds.) Child Phonology. New York: Academic Press.
Macken, M. A. 1987. Representation, Rules, and Overgeneralization in Phonology. In B.
MacWhinney (ed.) Mechanisms of Language Acquisition. 367-397. Hillsdale, New Jersey:
Erlbaum.
McCawley, J. 1984. Review of White (1982). Language 60. 431-436.
MacWhinney, B. 1987a. The Competition Model. In B. MacWhinney (ed.) Mechanisms of
Language Acquisition. 249-308. Hillsdale, New Jersey: Erlbaum.
MacWhinney, B. 1987b. Applying the Competition Model to Bilingualism. Applied Psycho-
linguistics 8. 315-327.
Mallinson, G. 1987. Review of B. Butterworth. B. Comrie and O. Dahl. (eds.) Explanations
for Language Universals. Berlin: Mouton. Australian Journal of Linguistics 7. 144-150.
Martin, L. 1986. 'Eskimo Words for Snow': A Case Study in the Genesis and Decay of
an Anthropological Example. American Anthropologist 88.2 (June). 418-423.
Martinet, A. 1961. A Functional View of Language. Oxford: Clarendon Press.
Miller, G. 1981. Language and Speech. San Francisco: Freeman.
Milroy, L. 1985. What a Performance! some Problems with the Competence-Performance
Distinction. Australian Journal of Linguistics 5. 1-17.
Mithun, M. 1984. The Evolution of Noun Incorporation. Language 60. 847-894.
Moder, C. L. 1986. Productivity and Frequency in Morphological Classes. Proceedings for
the Second Eastern States Conference on Linguistics. Columbus, Ohio: Ohio State University.
Muhlhausler, P. and R. Harre. 1990. Pronouns and People. Oxford: Basil Blackwell.
Neu, H. 1980. Ranking of Constraints on /t, d/ deletion in American English. In W. Labov
(ed.) Locating Language in Time and Space. 37-54. New York: Academic Press.
Newmeyer, F. J. 1980. Linguistic Theory in America: The First Quarter-Century of Trans-
formational Generative Grammar. New York: Academic Press.
Newmeyer, F. J., forthcoming. Functional Explanations in Linguistics and the Origin of
Language. Language and Communication.
Pateman, T. 1985. From Nativism to Sociolinguistics: Integrating a Theory of Language
Growth with a Theory of Speech Practices. Journal for the Theory of Social Behaviour
15. 38-59.
Phillips, B. 1984. Word Frequency and the Actuation of Sound Change. Language 60. 320-
342.
Piattelli-Palmarini, M. 1989. Evolution, Selection, and Cognition: From 'learning' to parameter
setting in Biology and the Study of Language. Cognition 31. 1-44.
Pinker, S. 1984. Language Learnability and Language Development. Cambridge, Massachusetts:
Harvard University Press.
Pinker, S. and P. Bloom. 1990. Natural Language and Natural Selection. Behavioral and
Brain Sciences 13.
Pullum, G. 1989. The Great Eskimo Vocabulary Hoax. Natural Language and Linguistic
Theory 7. 275-281.
Romaine, S. 1982. Socio-Historical Linguistics: its Status and Methodology. Cambridge:
Cambridge University Press.
Saussure, F. de 1966. Course in General Linguistics (translated by Wade Baskin). New York:
McGraw Hill.
Schane, S. A. 1971. The Phoneme Revisited. Language 47. 503-521.
Sperber, D. and D. Wilson. 1986. Relevance: Communication and Cognition. Oxford: Basil
Blackwell.
Thiemann

Traugott, E. C. 1989. On the Rise of Epistemic Meanings in English: an Example of


Subjectification in Semantic Change. Language 65. 31-55.
Wexler, K. and P. W. Culicover. 1980. Formal Principles of Language Acquisition. Cambridge,
Massachusetts: MIT Press.
Wexler, K. 1981. Some Issues in the Theory of Learnability. In C. L. Baker and J. J. McCarthy
(eds.) The Logical Problem of Language Acquisition. 30-52. Cambridge, Massachusetts: MIT
Press.
White, L. 1982. Grammatical Theory and Language Acquisition. Dordrecht: Foris.
Wright, C. W. 1979. Duration Differences between Rare and Common Words and their
Implications for the Interpretation of Word Frequency Effects. Memory and Cognition
7. 411-419.
Zipf, G. K. 1949. Human Behavior and the Principle of Least Effort. Cambridge, Massachusetts:
Addison Wesley.
Locality and Parameters again
Rita Manzini
University College London

A significant issue in the theory of parameterisation is whether there is
a parameter associated with the definition of locality, as proposed in
Manzini and Wexler (1987), Wexler and Manzini (1987), or whether the relevant
parameterisation effects are produced by other parameters, not associated
with the definition of locality, as proposed in Pica (1987). In this paper
I aim to show that the latter solution is inadequate, and hence the former
solution remains necessary on descriptive adequacy grounds. The conclu-
sion, if correct, is directly relevant to the psycholinguistic discussion
revolving around the Subset Principle.
Remember that according to the formulation of the Subset Principle
in Wexler and Manzini (1987) and Manzini and Wexler (1987), given a
parameter p with values p_i and p_j, and the two languages L_i and L_j generated
under p_i and p_j respectively, value p_i is selected by the language learner
just in case two conditions are verified: first, L_i is compatible with the
input data D; second, if L_j is also compatible with the data D, then L_i
is a subset of L_j. Obviously, under this formulation, the Subset Principle
is void unless subset relations hold between the languages generated under
the different values of a parameter. Furthermore, if the Subset Principle
is to determine the order of learning in all cases, it is necessary that subset
relations hold between the languages generated under any two values
of each parameter. This latter requirement corresponds to the Subset
Condition of Manzini and Wexler (1987), Wexler and Manzini (1987).
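
To see how this selection procedure works in practice, the following sketch treats each value of a parameter as generating a finite set of sentences and picks the value whose language covers the input data and is included in every other compatible language. The sketch is my own illustration, not part of Wexler and Manzini's formal system, and the toy languages are hypothetical.

```python
# A minimal sketch of Subset-Principle-driven value selection, assuming each
# parameter value generates a finite language represented as a Python set.
# The toy languages below are hypothetical stand-ins for nested binding domains.

def select_value(languages, data):
    """Return the value whose language contains the data and is a subset of
    every other compatible language (the Subset Principle)."""
    compatible = {v: lang for v, lang in languages.items() if data <= lang}
    if not compatible:
        raise ValueError("no value is compatible with the input data")
    for v, lang in compatible.items():
        if all(lang <= other for other in compatible.values()):
            return v
    raise ValueError("Subset Condition violated: no unique smallest language")

# Toy nested languages: value 1 is most restrictive, value 3 most liberal.
L1 = {"a", "b"}
L2 = L1 | {"c"}
L3 = L2 | {"d"}
languages = {1: L1, 2: L2, 3: L3}

print(select_value(languages, {"a"}))        # -> 1: the unmarked value suffices
print(select_value(languages, {"a", "d"}))   # -> 3: only the widest language fits
```

If the compatible languages are not nested, no unique smallest choice need exist; this is precisely the gap that the Subset Condition is intended to close.
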
In order to show that the Subset Principle can be a sufficient condition
for learning, it is necessary to show that the Subset Condition holds. This
result is not proved in Manzini and Wexler (1987), Wexler and Manzini
(1987); in fact, well established parameters such as the null subject
parameter, as discussed notably in Hyams (1986), or head ordering
parameters, violate the Subset Condition under any of their formulations.
However the question arises whether the Subset Condition holds of at
least some parameters; if so, the Subset Principle can be at least a necessary
condition for learning. This latter question is answered positively in Manzini
and Wexler (1987), Wexler and Manzini (1987) on the basis of the parameter
associated with the definition of locality for binding. If on the other hand
the alternative to this parameter provided in Pica (1987) is correct, then
there is no longer any argument for even the weak version of the Subset
Condition, and the Subset Principle remains completely unsupported.
In the present study, I do not set out to uncover new evidence in favour
of the subset theory of learning. What I will argue however is that at
least the original evidence for it stands, in that precisely a locality parameter
for binding of the type in Manzini and Wexler (1987), Wexler and Manzini
(1987) is necessary, and an approach of the type in Pica (1987) is insufficient.

1. LOCALITY

To begin with, I assume that the basic structure of an English sentence
is as in (1):

(1) [CP ... [IP Spec [I' I [VP NP [VP V ... ]]]]]    (tree diagram)

This is the structure proposed in Chomsky (1986a; b), except that the
subject is taken to move to the Spec of IP position, where it can be assigned
a Case, from a VP-adjoined position, where it can be assigned a theta-
role, as in Sportiche (1988).
It is generally assumed that the locality theory for movement is based
at least in part on a notion of government, which following Chomsky
(1986b) is formulated as in (2) in terms of a notion of barrier:

(2) β governs α iff
    there is no γ such that γ is a barrier for α and γ excludes β

In Manzini (1988; 1989) it is argued that the locality theory for movement
can be entirely based on the notion of government, if the notion of barrier
is in turn formulated as in (3)-(4). (3) defines a g-marker for α as a head
which is a sister to α or to a maximal projection that α agrees with:

(3) β g-marks α iff β is an X° and
    (i) β is a sister of α; or
    (ii) β is a sister of γ and γ agrees with α

(4) defines a barrier for α as a maximal projection that dominates α and,
if α has a g-marker, a g-marker for α:

(4) γ is a barrier for α iff
    γ is a maximal projection, γ dominates α, and
    if α is g-marked, γ dominates the g-marker for α

It is important to notice that (3)-(4) uses all and only the primitives used
in the definition of barrier (and minimality barrier) in Chomsky (1986b),
to the exclusion notably of the notion of subject. This in turn is the only
crucial property of (3)-(4) for the present discussion; hence the conclusions
that we will reach are essentially independent of the theory in Manzini
(1988; 1989).
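
For concreteness, the following sketch encodes definitions (3)-(4) over a toy constituent structure. The Node encoding, the simplified example VP and the (empty) agreement relation are my own illustrative assumptions rather than part of the theory; government in the sense of (2) would then amount to the absence of an intervening barrier.

```python
# A minimal sketch of definitions (3)-(4) over a toy constituent tree. The
# Node class, the example structure and the agreement relation are illustrative
# assumptions, not part of Manzini's formal system.

class Node:
    def __init__(self, label, head=False, maximal=False, parent=None):
        self.label, self.head, self.maximal = label, head, maximal
        self.parent, self.children = parent, []
        if parent is not None:
            parent.children.append(self)

def dominates(gamma, alpha):
    node = alpha.parent
    while node is not None:
        if node is gamma:
            return True
        node = node.parent
    return False

def sisters(a, b):
    return a is not b and a.parent is not None and a.parent is b.parent

def g_marker(alpha, nodes, agrees=lambda g, a: False):
    """(3): an X0 that is a sister of alpha, or of a maximal projection
    agreeing with alpha."""
    for beta in nodes:
        if beta.head and (sisters(beta, alpha) or
                          any(g.maximal and sisters(beta, g) and agrees(g, alpha)
                              for g in nodes)):
            return beta
    return None

def is_barrier(gamma, alpha, nodes):
    """(4): a maximal projection dominating alpha and, if alpha is g-marked,
    alpha's g-marker as well."""
    if not (gamma.maximal and dominates(gamma, alpha)):
        return False
    gm = g_marker(alpha, nodes)
    return gm is None or dominates(gamma, gm)

# Simplified structure for the VP of 'Peter likes himself': [VP [V likes] [NP himself]]
vp = Node("VP", maximal=True)
v = Node("V(likes)", head=True, parent=vp)
obj = Node("NP(himself)", maximal=True, parent=vp)
nodes = [vp, v, obj]

print(g_marker(obj, nodes).label)   # V(likes): the verb is a sister of the object
print(is_barrier(vp, obj, nodes))   # True: VP dominates the object and its g-marker
```
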
Consider then the locality theory for referential dependencies. The locality
condition on anaphors in Chomsky (1981), Binding Condition A, states
that an anaphor must have an antecedent in its governing category. A
governing category for α is defined in turn as a category that dominates
α, a governor for α, and a subject accessible to α. Let us compare this
definition of locality with (4). To begin with, there is no indication that
a governing category need ever be a non-maximal projection; thus in this
respect governing categories and barriers need never differ. Furthermore,
a governing category must dominate a governor for α, while a barrier
must dominate a g-marker for α, if α is g-marked. It is easy to check
however that in our theory the notion of g-marker reconstructs the notion
of governor in Chomsky (1981); thus in this respect the two definitions
of locality do not differ either. The only difference between the two remains
the notion of subject, which appears in the definition of governing category,
but not in the definition of barrier.
If so, the definition of governing category γ can be given as in (5),
which is the definition of barrier in (4) with the added requirement that
γ must dominate a subject accessible to α; in the first instance a subject
can be taken to be accessible to α just in case it c-commands α:
(5) γ is a governing category for α iff
    γ is a maximal projection, γ dominates α, γ dominates a g-marker for α,
    and γ dominates a subject accessible to α

As for Binding Condition A itself, it can require that an anaphor must
have an antecedent not excluded by its governing category, as in (6), thus
maximising the similarity between its requirement and a government
requirement:

(6) Given an anaphor α, there is an antecedent β for α such that no
    governing category for α excludes β

Consider English himself. If it is in object position, himself can refer no
further than the immediately superordinate subject, as in (7); if in subject
position, himself gives rise to illformedness, as in (8):

(7) John thinks that Peter likes himself

(8) * John thinks that heself/ himself likes Peter

The facts in (7)-(8) are predictable on the basis of (5)-(6), but also on
the basis of a government condition, stating that anaphors must have an
antecedent that governs them. Consider first the object position. VP is
a barrier for it, hence an anaphor in object position can only have an
antecedent that is not excluded by VP, if government is to be satisfied.
The VP-adjoined subject position in (1) satisfies this condition, and no
position higher than it does. Thus it correctly follows that himself in (7)
can only be bound by the embedded subject. Consider now the ultimate
subject position, in the Spec of IP, as in (1). CP is a barrier for the Spec
of IP, hence an anaphor in the Spec of IP must be bound internally to
CP, if government is to be satisfied. However, all available positions in
this domain are A'-positions. Thus A-binding must violate government,
and the ungrammaticality of (8) is correctly derived.
(7)-(8), then, do not necessitate recourse to the notion of subject, and
therefore lend no support to the theory in (5)-(6) as opposed to (4). The
notion of subject is in fact needed to account for this type of example
in Chomsky (1981; 1986a), but only because a single subject position is
postulated, the Spec of IP, a VP-external position; this has been noticed
also in Kitagawa (1986) and Sportiche (1988). Because of this, and because
under any definition of locality based only on the notion of maximal
projection (and governor/ g-marker) VP is a locality domain for the direct
object, the notion of subject must be referred to in order to allow the
locality domain of the direct object to include the immediately superordinate
subject as well.
With a pronoun such as him substituted for the anaphors in (7)-(8)
the grammaticality judgements are of course reversed, as in (9)-(10):

(9) John thinks that Peter likes him

(10) John thinks that he likes Peter

(10) is wellformed with John as the antecedent for him, while (9) is wellformed
with John again, but not Peter as the antecedent. According to Chomsky
(1981), this behaviour is again accounted for by a condition formulated
in terms of the notion of governing category in (5), Binding Condition
B; following the format of Binding Condition A, as in (6), Binding Condition
B can be rendered as in (11):

(11) Given a pronoun α, there is no antecedent β for α such that no
     governing category for α excludes β

As in the case of (5)-(6), the theory in (5) and (11) can account for the
data, in this case (9)-(10); but an account is equally possible in terms
of a government condition. Consider first a pronoun in object position,
as in (9). Its first barrier is VP, which under the theory of phrase structure
in (1) contains a subject position. It follows that the pronoun is correctly
predicted to be disjoint in reference from the immediately superordinate
subject, if the condition on it is that it cannot be governed by its antecedent.
Similarly, consider the subject pronoun in (10), ultimately in the Spec
of IP position. IP is not a barrier for the subject, if it is g-marked by
C; but CP is a barrier for it. Hence the superordinate subject is correctly
predicted to be a possible antecedent for the pronoun, since it does not
govern it.
As far as the object and subject position of a sentence are concerned,
or in general sentential positions, it appears then that Binding Conditions
A and B can be formulated in terms of the notion of government, as
in (12), and do not require reference to the notion of governing category
in (5):

(12) A. Given an anaphor α, there is an antecedent β for α such
        that β governs α
     B. Given a pronoun α, there is no antecedent β for α such that
        β governs α

There is however a type of data in favour of the Binding Theory in (5)-
(6) and (11) that has not been considered so far. These data, involving
NP-internal positions will be considered in the next section, where I will
conclude that anaphors are indeed associated with the notion of governing
category, though pronouns are associated with the notion of barrier. Thus
the notion of governing category cannot be reduced to the notion of barrier,
and vice versa.

2. ENGLISH ANAPHORS AND PRONOUNS

Suppose we assume that NP's have a structure of the type in (13), where
α is the NP's object, and β the NP's subject:

(13) [NP β [N' N α]]    (tree diagram)

In the light of recent discussions of the internal structure of NP's, notably
English NP's, as in Abney (1987), it is doubtful that (13) and not a more
complex structure is to be postulated. I choose (13) simply for convenience;
the results obtained for (13) should in turn be extendable to a structure
such as (14), where D(et) is also a head:

(14) [DP Spec [D' D [NP Spec [N' N α]]]]    (tree diagram)

This is especially true if in (14) the position of the subject is originally
in the Spec of NP, and Case reasons impel movement to the Spec of
DP. If so (13) represents not so much an alternative to (14), as a substructure
of (14).
If the subject in (13) is realised, then an anaphoric α must be bound
within NP, in (13) by β. This is seen, with the English reflexive himself,
in examples of the type of (15), where NP itself is in object position:
(15) John likes [Peter's pictures of himself]

It is not difficult to see that in this case the correct predictions follow
under both our definition of barrier and the definition of governing category.
Under the former, NP is a barrier for α because it is a maximal projection
that dominates α and the g-marker of α, namely N. Under the latter,
NP is a governing category for α for the same reasons and because it
also dominates a subject that c-commands α. Thus if an anaphor must
have a binder not excluded by its governing category, himself in (15) must
be bound by Peter's; the same result follows if an anaphor must have
a binder that governs it.
Consider now the case in which β in (13) is not realised, as in (16), which
exactly reproduces (15) but for the absence of the subject Peter's; crucially,
it is not necessary to the wellformedness of (16) that the subject of the
NP is interpreted as referentially dependent on John:

(16) John likes [pictures of himself]

Consider then the definitions of barrier and governing category. Under
the definition of barrier, the notion of subject, hence its presence or absence
in any given structure, is altogether irrelevant. Thus NP is a barrier for
α, whether β is present or not. On the basis of the principle that an anaphor
must have a binder that governs it, himself in (16) is then incorrectly barred
from being referentially dependent on the sentence's subject, John, which
is NP-external.
The correct predictions, on the other hand, follow under the definition
of governing category. If β is missing, NP is not a governing category
for α for the simple reason that it does not have a subject. Rather the
governing category for α is the first maximal projection that does have
a subject, namely VP or IP. The prediction then is that α can be bound
by this subject; hence concretely that himself in (16) can be bound by
John.
In short, our discussion so far indicates that if NP-internal positions
are considered, a correct account of anaphoric dependencies can only be
given under a definition of locality making reference to the notion of subject.
Let us then consider the remaining examples in the himself paradigm. In
(17)-(18), himself is again in the object position of an NP, in particular
of an NP with a subject in (17) and of a subjectless NP in (18); however
the NP itself is in subject position, rather than in object position:

(17) John thought that [Peter's pictures of himself] were on sale

(18) John thought that [pictures of himself] were on sale


The embedded NP in (17) is a governing category for himself, since it
is a barrier for it and furthermore it contains a subject accessible to it.
In (17) then himself is required to be bound within NP; this correctly
predicts that it can be bound by Peter, but not by John. In (18) on the
other hand the embedded NP does not contain a subject accessible to
himself, nor do the embedded IP and CP. Only the superordinate VP
contains a subject accessible to himself. Hence binding of himself by John
is correctly predicted to be wellformed. Remember that an accessible subject
is simply defined as a c-commanding subject. If c-command is not defined
between two positions one of which dominates the other, the subject of
the embedded IP in (18) does not c-command himself because it contains
it; hence it is not accessible to it.
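
Since accessibility is here being reduced to c-command, a small sketch of the relevant predicate may be useful; it is my own illustration, and the simplified node labels used to encode (18) are hypothetical.

```python
# A minimal sketch of c-command as used above, assuming a child -> parent
# dictionary encodes dominance. The node labels for (18) are simplified,
# hypothetical placeholders.

def dominates(a, b, parent):
    """True if a properly dominates b."""
    node = parent.get(b)
    while node is not None:
        if node == a:
            return True
        node = parent.get(node)
    return False

def ancestors(a, parent):
    node, out = parent.get(a), []
    while node is not None:
        out.append(node)
        node = parent.get(node)
    return out

def c_commands(a, b, parent):
    """a c-commands b iff neither dominates the other and every node properly
    dominating a also dominates b; as in the text, a position c-commands itself."""
    if a == b:
        return True
    if dominates(a, b, parent) or dominates(b, a, parent):
        return False
    return all(dominates(x, b, parent) for x in ancestors(a, parent))

# (18) John thought that [pictures of himself] were on sale (simplified)
parent = {"John": "IP_main", "VP_main": "IP_main", "CP": "VP_main",
          "IP_emb": "CP", "NP_pictures": "IP_emb", "himself": "NP_pictures"}

print(c_commands("NP_pictures", "himself", parent))  # False: it dominates himself
print(c_commands("John", "himself", parent))         # True: John is accessible
```

The two final lines mirror the point just made: the embedded subject of (18) contains himself and so fails to c-command it, while the matrix subject c-commands it and is therefore accessible to it.
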
Consider on the other hand an ungrammatical example of the type of
(8) again, where himself is in the subject position of a sentence:

(8) *John thinks that [heself/himself likes Peter]

Himself must be accessible to itself if the embedded IP is to be a governing
category for it and illformedness is to be correctly predicted. Suppose
then we assume that no position can dominate itself. It follows that any
position can c-command itself. Hence if an accessible subject is simply
a c-commanding subject the correct predictions follow. Examples of the
type in (7), where himself is in the object position of a sentence, are also
straightforwardly derived:

(7) John thinks that [Peter likes himself]

The embedded VP contains a subject accessible to himself, namely Peter,
hence it is correctly predicted that Peter but not John can bind himself.
Consider finally anaphors in the subject position of a nominal, as in
(19)-(20); each other is exemplified, rather than himself, because of the
lack of a genitive form for himself, which we can treat as purely accidental.
It is easily checked that each other otherwise has the same distribution
as himself.

(19) John and Peter like [each other's pictures]

(20) John and Peter thought that [each other's pictures] were on sale

By our definition each other in (19) is accessible to itself, exactly as in
(8). Notice however that in (19) there is an independent reason why the
matrix VP but not the embedded NP is a governing category for the
reciprocal. The reason is that in (19) the matrix verb is the g-marker for
each other in the Spec position of NP. NP then is not a governing category
for each other because it does not dominate its g-marker. Rather, the first
category that dominates the g-marker for each other is the matrix VP,
and this is its governing category. The correct predictions then follow,
in particular that each other can be bound by the matrix subject.
As for (20), our theory predicts its ungrammaticality. Notice that each
other does not have a g-marker in (20), since NP, being in subject position,
is not a sister to a head. NP itself then is a barrier for each other, and
since each other is accessible to itself by our definitions so far, NP is its
governing category. Since of course no antecedent is available for it within
NP, ungrammaticality is predicted to follow. In fact sentences of the type
in (20) appear not to be worse than their counterparts in (18).
However, leaving this problem aside, we have verified that the funda-
mental data relating to English anaphors, including himself and each other,
are correctly predicted by our theory under a subject-based definition of
locality. In doing so, we have also shown that at least in the cases considered
so far the notion of accessibility can be reduced to that of c-command.
Remember that in Chomsky (1981) accessibility is defined in terms of c-
command and of the i-within-i constraint; in particular, β is said to be
accessible to α in case it c-commands α and it can be coindexed with
it under the i-within-i constraint. If we are correct the second part of
this definition can be eliminated altogether. Similarly, in Manzini (1983)
γ is said to be a locality domain for α just in case two independent conditions
are satisfied, which can be expressed as follows: first, γ dominates a subject
that c-commands α, and second, this subject is accessible to α in the sense
that it does not violate the i-within-i constraint. In case the second condition
is not satisfied, no locality domain for α is defined. If we are correct
the whole definition of accessibility must reduce to the first of these two
conditions.
Nothing that I have said so far touches yet on pronouns. If an anaphor
and a pronoun in a language are associated with the same definition of
locality, and if locality theory is in fact a biconditional to the effect that
an element is anaphoric just in case it is bound within that locality domain,
we expect the pronoun and anaphor to have complementary distribution
in the language. This is the prediction in Chomsky (1981) for English,
and as is well known the prediction fails.
Consider first a pronoun in the object and subject position of a sentence,
as in (9) and (10) again:

(9) John thinks that [Peter likes him]

(10) John thinks that [he likes Peter]


In sentential positions there is in fact complementary distribution between
anaphors and pronouns in English; hence the correct predictions for
pronouns can be obtained under the subject-based definition of governing
category as for anaphors. In particular, the embedded VP is the governing
category for the pronouns in (9)-(10) and disjoint reference between the
pronoun and Peter in (9) is correctly predicted.
Consider then a pronoun in the subject position of an object NP, as
in (21); this is a clear case of noncomplementary distribution of pronouns
and anaphors in English:

(21) The boys saw [their pictures]

By the subject-based definition of locality, the embedded NP is not a
governing category for the pronoun in (21), for the same reasons for which
it is not for an anaphor. This of course yields the incorrect prediction
that their is disjoint in reference with the boys.
Suppose on the other hand the pronoun had no g-marker in (21). Then,
the embedded NP would be a barrier and a governing category for it,
their being a subject and accessible to itself, and the correct predictions
would follow. Thus we should be able to construct a theory under which
the g-marker for the Spec of NP is relevant if the Spec of NP is anaphoric,
but not if it is pronominal. The simplest way to achieve this result is
to assume that resort to a g-marker is optional in all cases. Concretely,
suppose the definition of barrier is to be modified as in (22), and the
definition of governing category is modified accordingly:

(22) γ is a barrier for α iff
     γ is a maximal projection, γ dominates α,
     (if α is g-marked, γ dominates a g-marker for α)

Within (22) reference to the notion of g-marker is optional. Making g-
markers optional amounts to saying that the first maximal projection that
dominates a position can always count as its barrier; however if the position
under consideration has a g-marker its barrier can be extended to include
this g-marker.
It is easy to see that the locality domain defined by reference to a g-
marker is always wider than the locality domain defined without reference
to it. If so, in the case of anaphors the availability of both locality domains
is equivalent to the availability of the wider one only. For, all referential
dependency links allowed under the narrower definition of locality are
allowed under the wider one as well, though the reverse does not hold.
In the case of pronouns conversely we expect the availability of both to
be equivalent to the availability of the narrower one. Indeed all referential
dependency links allowed under the wider definition are also allowed under
the narrower, and not the reverse.
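
The asymmetry just described can be stated set-theoretically: an anaphor's binding requirement need only be met in some available domain, and a pronoun's freedom requirement likewise, so the two effectively end up with the wider and the narrower domain respectively. The sketch below is my own illustration, with hypothetical position labels for the Spec-of-NP configuration of (19) and (21).

```python
# A minimal sketch of the effect of the optional g-marker in (22), with
# locality domains represented as sets of the positions they contain. The
# position labels are hypothetical.

def effective_anaphor_domain(domains):
    """An anaphor may be bound inside SOME available domain, so its effective
    binding domain is the union of the domains (here: the wider one)."""
    return set().union(*domains)

def effective_pronoun_domain(domains):
    """A pronoun must be free in SOME available domain, so it is only forced to
    be disjoint from positions inside EVERY domain (here: the narrower one)."""
    return set.intersection(*domains)

# Spec of an object NP, as in (19) 'each other's pictures' / (21) 'their pictures'.
narrow = {"N", "NP-internal material"}                # the NP itself, no g-marker considered
wide = narrow | {"V", "matrix subject (the boys)"}    # domain extended to include the g-marker V

print("matrix subject (the boys)" in effective_anaphor_domain([narrow, wide]))  # True: (19) can be bound by it
print("matrix subject (the boys)" in effective_pronoun_domain([narrow, wide]))  # False: (21) allows coreference
```
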
If the notion of g-marker is taken into account in (21) the matrix VP
is the governing category for the pronoun; if the notion of g-marker is
not taken into account, the embedded NP is. In the first case, coreference
between their and the boys is predicted to be impossible; but in the second
case coreference between their and the boys is correctly allowed. Consider
then a pronoun in the subject position of an NP which is itself in subject
position, as in (23):

(23) The boys thought that [their pictures] were on sale

If we are correct, the anaphoric counterpart to (23) is illformed. If so,
the wellformedness of (23) under any interpretation is correctly predicted
whether the notion of g-marker is taken into account or not. In the first
case the locality domain for the pronoun is the same as for the anaphor,
and complementary distribution is predicted. In the second case the locality
domain for the pronoun is simply the embedded NP, and no disjoint
reference is predicted to arise.
Consider finally a pronoun in the object position of an NP, as in (24):

(24) The boys thought that [pictures of them] were on sale

(24) and its anaphoric counterpart both appear to be wellformed; thus
this appears to be again a case of non-complementary distribution of
pronouns and anaphors. Furthermore, the optionality of the g-marker
requirement does not help in these cases, since the g-marker for the pronoun
or anaphor, N, is internal to the first maximal projection that dominates
them.
However, it is crucial to the wellformedness of the anaphoric counterparts
to (24) that the subject-based definition of locality is chosen. The reason
is that the first subject accessible to the object of the embedded NP is
the matrix subject. One way of deriving the noncomplementary distribution
of pronouns and anaphors in examples of the type in (24) is to have recourse
to our notion of barrier for pronouns, because NP is a barrier for the
pronoun in (24), and under it no disjoint reference patterns are predicted
to arise with them. The predictions for the examples in (21) and (23) remain
unchanged, as can be easily checked; as for sentential positions, we have
already seen that the subject-based definition of locality and our definition
of barrier are always equivalent.
The last example to be considered involves a pronoun in the object
position of an NP again, where the NP however is in object position,
as in (25):

(25) The boys saw [pictures of them]

Again the optionality of g-markers is irrelevant here, as in (24). If the
pronoun is associated with our definition of barrier, as required by (24),
its locality domain is the embedded NP, hence no disjoint reference patterns
are predicted to arise.
In summary, if what precedes is correct, once NP-internal positions are
taken into consideration, English anaphors must be associated with the
subject-based definition of governing category, pronouns with the definition
of locality domain corresponding to our notion of barrier.

3. ITALIAN RECIPROCAL CONSTRUCTIONS

The existence of two separate notions of locality domain corresponding
to our definition of barrier and to the definition of governing category
appears to be confirmed by the Italian reciprocal l'un l'altro. The Italian
reciprocal consists of two elements, l'uno ('the one'/'each') and l'altro ('the
other'), which occupy two different types of positions and enter two different
types of dependencies. L'altro behaves like a lexical anaphor, surfacing
in A-position and entering referential dependencies with other elements
in A-position; l'uno behaves like a floating quantifier. Schematically,
configurations of the type of (26) appear to be created, where R2 corresponds
to the referential dependency between l'altro and its antecedent NP; while
R1 expresses the dependency between l'uno and NP:

(26) NP ... l'uno ... l'altro
     (arc R1 links NP and l'uno; arc R2 links NP and l'altro)

It is important to stress at this point that our purpose is not to give a
full account of reciprocal constructions, either universally or for the Italian
type. Rather, what I am interested in is whether l'uno l'altro in NP-internal
position requires a subject-based definition of locality or rather our
definition of barrier. What I will conclude is that l'uno behaves according
to our definition of barrier, l'altro according to a subject-based definition
of locality.
An issue that can be largely disregarded here is whether there is indeed
a dependency corresponding to R1 in (26), or there is only a dependency
corresponding to R1* in (27), created at LF by l'uno moving to take
scope over NP. There is no doubt that (27) must be the LF for something
like (26); the question is whether R1 has also an existence of its own,
or only R1* does:

(27) l'uno_i (NP ... t_i ... l'altro)    (dependency R1*)

Two observations are in order before we dismiss the issue. First, if l'uno
at LF takes scope immediately over NP, the locality properties of R1*
are exactly the same as the locality properties of R1. Thus (26) and (27)
are equivalent in this respect. Second, accepting that something like R1*
characterises the quantifier part of a reciprocal in English as well, as argued
for instance in Heim et al. (1988), and that LF is not parameterised, the
only hope of accounting for the discrepancies that we will see exist between
Italian and English is at s-structure. Thus (26) and (27) are not equivalent
in this respect, and there is perhaps a reason why R1 must be postulated
as an s-structure dependency.
Given this background, consider an NP in the object position of a
sentence. L'uno can float either NP-externally or NP-internally. If l'uno
floats NP-externally, the sentence is wellformed, provided NP is otherwise
subjectless. Relevant examples are of the type in (28). Notice that in our
examples the NP containing (part of) the reciprocal is systematically made
into an accusative subject of a small clause, rather than into an object;
this is to avoid as much as possible readings with the reciprocal taken
as an argument of the verb:

(28) Quei pittori considerano l'uno [NP i ritratti dell'altro] ammirevoli
     Those painters consider each the portraits of the other admirable

(28) does not choose among locality domains for l'uno, which I assume
is VP-internal. By our definition of barrier its locality domain is VP, the
first maximal projection that dominates it. The subject quei pittori ('those
painters') is then predicted to be a possible antecedent for it, correctly.
The same correct prediction, that the subject of the sentence is a possible
antecedent for l'uno, follows however under a subject-based definition of
governing category, since in this case the subject itself defines the governing
category.
Consider now l'altro. Under a subject-based definition of locality, the
locality domain for l'altro is defined by the subject of the sentence, and
binding of l'altro from an NP-external position is correctly predicted to
be possible. Under our definition of barrier, however, the locality domain
for l'altro is NP; thus binding of l'altro by the subject of the sentence,
which is NP-external, is incorrectly predicted to be impossible. Examples
of the type of (28) seem then to argue in favour of a subject-based definition
of locality for l'altro.
The prediction is that adding a subject to the NP in (28) produces a
sentence where l'altro cannot refer NP-externally, since NP is now the
locality domain for l'altro under a subject-based definition as well. As
l'uno is still NP-external and can only have an NP-external antecedent,
this in turn should produce an ungrammatical sentence. The prediction
seems to be correct, as in (29), where the pronominal subject of NP can
indifferently be taken to be coreferential with the subject of the sentence,
or not; judgements of this type are confirmed in Belletti (1983):

(29) *Quei pittori considerano l'uno [NP i loro ritratti dell'altro] ammirevoli
     Those painters consider each their portraits of the other admirable

Consider now the cases, crucial to the determination of the locality domain
for l'uno, where this floats NP-internally. These are exemplified in (30)-
(31), where (30) differs from (31) in that an overt subject is present in
NP:

(30) Quei pittori considerano [NP i loro ritratti l'uno dell'altro] ammirevoli
     Those painters consider their portraits each of the other admirable

(31) Quei pittori considerano [NP i ritratti l'uno dell'altro] ammirevoli
     Those painters consider the portraits each of the other admirable

(30) represents by far the easier of the two examples, though again it
does not distinguish between locality domains for l'uno. The locality domain
for l'uno is NP, both under our definition of barrier, since NP is the maximal
projection that dominates l'uno, and under a subject-based definition of
governing category, since NP has a subject. Hence the only possible
antecedent for l'uno is the subject of NP, loro ('their'). This of course
is also true for l'altro, which we have just seen to be associated with the
subject-based definition of locality. The prediction correctly is that if the
subject of NP, which is pronominal, is interpreted as coreferential with
the subject of the sentence, so is the reciprocal; but not otherwise.
Consider now (31). Contrary to examples of the type of (28), which
are generally judged wellformed, examples of the type of (31) give rise
to contradictory judgements. Certainly (31) has a wellformed interpretation
under which an empty or implicit subject of NP binds l'uno and l'altro,
and this subject in turn can or not refer to the subject of the sentence.
Of course, this interpretation is irrelevant here, reducing essentially to that
in (30), with an empty category or an implicit argument substituted for
the lexical pronoun.
The relevant interpretation is that under which l'uno and l'altro are bound
by the subject of the sentence, but not by the subject of the NP; in other
words, the admiration is reciprocal, not the portraying. If we accept the
judgement in Belletti (1983), then under this interpretation examples of
the type in (31) are illformed. This in turn cannot be predicted if the
locality domain of l'uno is subject-based. For, under a subject-based
definition, if NP has no subject, or no subject distinct from l'uno l'altro,
the locality domain for l'uno is clearly the sentence. Hence binding of
l'uno by the sentential subject is predicted to be possible, incorrectly.
Suppose then we take our notion of barrier as defining the locality domain
for l'uno. The barrier for l'uno is of course NP under our definition, since
NP is a maximal projection that dominates it. L'uno must then be construed
with an antecedent within NP. Since in sentences of the type of (31) its
antecedent, the sentential subject, is NP-external, we correctly predict
ungrammaticality. Thus, the locality domain for l'uno must be defined
by our notion of barrier.
Another prediction which follows from our hypothesis that Italian
reciprocals are associated with our notion of barrier is that they can never
be found in NP's which are in (nominative) subject position unless they
are bound NP-internally. In this case, the facts are well established, as
in (32):

(32) *Quei pittori pensano che [NP lo stile l'uno dell'altro] sia ammirevole
Those painters think that the style each of the other is admirable

There is of course no prohibition against having English reciprocals or
reflexives in examples of the type in (32), which follows if they are associated
with a subject-based definition of locality. On the other hand, examples
of the type of (33) are predicted to be wellformed, if l'altro is associated
with a subject-based locality domain:

(33) Quei pittori pensano l'uno che [NP lo stile dell'altro] sia ammirevole
Those painters think each that the style of the other is admirable
The status of (33) is extremely difficult to assess. It appears however that
if there is a violation in (33) it does not give rise to uninterpretability
judgements as (32) does. Thus we can tentatively take (33) to confirm
our account.

4. PARAMETERS IN LOCALITY THEORY

If the conclusions in Manzini (1988; 1989), as summarised in section 1,
are correct, chain dependencies are associated with the definition of barrier
in (4). On the other hand, in section 2 I have argued that referential
dependencies involving English anaphors and pronouns can be associated
with our notion of barrier or with the subject-based definition of governing
category in (5). In section 3 the existence of two separate locality domains
corresponding to our notion of barrier and to the subject-based definition
of governing category has further been argued for on the basis of the
Italian reciprocal l'uno l'altro. Of course, the existence of more than one
definition of locality for referential dependencies leads us to the issue of
whether there is a locality parameter and whether the two notions of locality
considered so far are values of this parameter.
The parameters that we will be concerned with fall essentially into two
types. The first type of parameter is what we can refer to as an ad hoc
parameter, i.e. a parameter built into the theory of locality for the sole
purpose of accounting for locality effects. The second type of parameter
is a non-ad hoc one in the sense that though it derives locality effects,
it is not associated with the theory of locality itself.
The first type of parameter is found in Manzini and Wexler (1987),
Wexler and Manzini (1987) and, before that, in Yang (1984). The theory's
starting point is the subject-based definition of locality in (5). The idea
is that as the definition in (5) refers to the notion of subject, so other
definitions of locality can refer to other opacity creating elements, such
as I, finite Tns, referential (i.e. non-subjunctive) Tns, etc. Once the locality
theory for chains is taken into consideration, as in section 1, the picture
that emerges is that there is a basic definition of locality, as in (4), to
which various opacity elements are added for referential dependencies.
This is the picture arrived at in Koster (1986).
The values of the locality parameter, i.e. the various opacity creating
elements, are not associated with languages but with single lexical items
in a language. Thus the Icelandic reciprocal has much the same form and
locality properties as English each other, but the Icelandic reflexive sig
obeys altogether different locality constraints. This is consistent with the
general hypothesis, first put forward in Borer (1984), that parameter learning
is part of the learning of the lexicon of a language; a restrictive version
of it surfaces in Manzini and Wexler (1987), Wexler and Manzini (1987)
as the Lexical Parameterisation Hypothesis (see Newson, this volume, for
discussion). Thus the locality parameter is consistent with the one generally
accepted restriction on parameters so far proposed. The various values
of the locality parameters can indeed be conceived of as features, associated
with lexical items in the way other features are.
Furthermore, the values of the locality parameter define languages each
of which is a subset, or a superset, of each of the others; they are therefore
ordered in a markedness hierarchy by the Subset Principle. Notice that
if this is the case the result that locality is not parameterised for traces,
though it is for other elements, can be derived directly from the fact that
parameters are associated with lexical items. For, if the unmarked value
of a parameter is conceived of as the default value, empty categories are
necessarily associated with it. Thus the fact that no variation ever involves
traces appears to provide evidence that markedness hierarchies are present
at least to the extent that the unmarked setting of a parameter is
distinguished from the marked setting(s). In turn the definition of mar-
kedness on the basis of the Subset Principle is supported to the extent
that it in fact predicts the correct value to be the unmarked one.
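
The hierarchy appealed to here can be computed directly from the subset relations among the languages generated by the values; the sketch below is my own illustration, with hypothetical nested toy languages standing in for increasingly liberal locality values.

```python
# A minimal sketch: a markedness hierarchy read off the subset relations among
# the languages generated by a parameter's values, the smallest language being
# the unmarked (default) one. The toy languages are hypothetical.

def markedness_hierarchy(languages):
    """Order values from unmarked to most marked; require nested languages
    (the Subset Condition)."""
    ordered = sorted(languages, key=lambda value: len(languages[value]))
    for smaller, larger in zip(ordered, ordered[1:]):
        if not languages[smaller] <= languages[larger]:
            raise ValueError("Subset Condition violated: languages are not nested")
    return ordered

anaphor_langs = {"local": {"a"}, "medium": {"a", "b"}, "long-distance": {"a", "b", "c"}}
print(markedness_hierarchy(anaphor_langs))   # ['local', 'medium', 'long-distance']

# For pronominals the inclusions run in the opposite direction, so the same
# computation yields the mirror-image hierarchy, as noted further below.
```
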
Remember on the other hand that we ultimately intend to formulate
the locality parameter so that (4) becomes the basis of a definition of
locality to which the notion of subject, as in (5), and possibly other opacity
creating elements, can be added as different settings of a parameter. Under
this picture another obvious definition of markedness can be suggested
to apply, independent of the Subset Principle. Since all definitions of locality
include (4) as one of their subparts, (4) is in fact universal. If so, (4)
can be unmarked for the simple reason that it is part of universal grammar;
other values are then marked for the simple reason that they can only
be obtained by 'manual' alteration of the inbuilt programme, where the
alteration takes the restrictive form of addition, and not deletion, of
information. Again the association of chains with (4) as the unmarked
setting for locality follows.
Notice that the choice between the subset based definition of markedness
hierarchies and the one tentatively sketched here is an entirely empirical
one, since it is obvious that their predictions must be at variance with
one another. For instance, one of the striking properties of the markedness
hierarchies defined by the Subset Principle is that they differ for anaphors
and pronominals; in fact the markedness hierarchy for pronominals is
the mirror image of the markedness hierarchy for anaphors. Under a
definition of markedness of the type we are envisaging, (4) represents the
unmarked value across any other linguistic categorisations.
This, however, and other questions of a psycholinguistic nature, cannot
be settled within the limits of this investigation. Rather, the question that
I intend to settle is the purely linguistic one, concerning the adequacy
of ad hoc and non-ad hoc models of locality parameters. In short, if the
discussion that precedes is correct, there is a parameterised definition of
locality of the form in (34), where the requirements corresponding to the
definition of barrier must be invariably satisfied and various additional
opacity-creating elements form additional optional requirements:

(34) γ is a locality domain for α iff
     0. γ is a maximal projection, γ dominates α, (if α is g-marked, γ
        dominates the g-marker of α)
     and
     1. γ dominates a subject accessible to α; or
     2. etc.

If the notion of government is defined in terms of locality domain, as
in (34), rather than in terms of barrier, it follows that not only the conditions
on movement, but also the Binding Theory, as in (12), can be formulated
as government conditions, completing the unification of locality theory
with respect at least to the notion of locality referred to.
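
Read in this way, (34) amounts to a base clause shared by all values plus a parameter value consisting of additional opacity requirements. The sketch below is my own illustration; the candidate projections and their annotations, keyed to example (16), are hypothetical.

```python
# A minimal sketch of the parameterised locality definition in (34): every
# value imposes the barrier-style base requirements, and each additional
# opacity element adds one more requirement. The candidate projections and
# their annotations are hypothetical.

def locality_domain(candidates, opacity):
    """Return the smallest candidate maximal projection meeting the base
    requirements of (34) plus every requirement in the chosen opacity value."""
    requirements = {"dominates_alpha", "dominates_g_marker"} | set(opacity)
    for gamma in candidates:                     # candidates listed smallest first
        if requirements <= gamma["properties"]:
            return gamma["label"]
    return None

# Candidate projections above the anaphor in the subjectless NP of
# (16) 'John likes [pictures of himself]'.
candidates = [
    {"label": "NP", "properties": {"dominates_alpha", "dominates_g_marker"}},
    {"label": "VP", "properties": {"dominates_alpha", "dominates_g_marker",
                                   "accessible_subject"}},
]

print(locality_domain(candidates, opacity=set()))                   # NP: the bare, (4)-style value
print(locality_domain(candidates, opacity={"accessible_subject"}))  # VP: the subject-based value (5)
```
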
Let us then consider what we have referred to as the non-ad hoc approach
to the locality parameter, represented notably by Pica (1987). According
to Pica (1987), as well as Chomsky (1986a), lexical anaphors move at
LF; Binding Theory, or ECP, holds of the anaphor trace link, rather than
of the anaphor and its antecedent. According to Pica (1987), there are
essentially two types of anaphors, anaphors like each other or himself which
are descriptively associated with a subject-based definition of locality, and
anaphors of the type of Icelandic sig, that are descriptively associated
with a referential Tns opacity element. The long-distance binding effects
with anaphors such as Icelandic sig, whose opacity creating element in
terms of a definition like (34) is an indicative Tns, follow from the fact
that they are X°'s and they move at LF from head to head. This leaves
us then with anaphors of the type of himself or each other, which are
XP's and whose movement is strictly local. The strict locality of the
movement of these anaphors follows again from the fact that these anaphors
move according to their categorial type, if Pica (1987) is correct.
Unfortunately, the theory encounters serious execution problems. Con-
sider in particular XP anaphors. If XP anaphors move according to the
XP type, only two possibilities are open: either they move to A-position
or they move to A'-position; but both options are highly problematic. If
they move to A-position, it is indeed expected that they display strictly
local binding effects; but in general there will be no A-position for them
to move into. If they move to A'-position, then there appears to be absolutely
no reason why they couldn't move successive cyclically, thus producing
long-distance binding effects once more.
I do not want to imply that these problems are insoluble; only that
their solution will involve a complication of the grammar. If so, the potential
simplicity argument for what I have called non-ad hoc theories would
disappear. What is crucial to our argument however is that the parameter
derived in Pica (1987) is a two-way parameter, between short-distance and
long-distance dependencies. If Manzini and Wexler (1987), Wexler and
Manzini (1987) are correct, there are of course many more values to the
parameter; but leaving these aside, on the basis solely of the new evidence
presented here, we must conclude that the parameter that is needed for
observational adequacy is at least a three-way one, the short-distance value
of Pica (1987) splitting into a subject-based and a non-subject-based value.
Notice that a parameter to this effect can presumably be added to the
theory in Pica (1987); but this only illustrates our point further, namely
that a locality parameter is required in any case.
In fact, a desirable property of the theory in Pica (1987) is that it links
the locality domain of anaphors and pronouns with their ability or inability
to take antecedents other than subjects. In particular, it appears to be
a fact that anaphors whose opacity element is a subject, do not necessarily
have a subject as their antecedent; on the other hand, long-distance anaphors
are subject oriented. The link between long-distance binding and subject
orientation follows if the landing site for a long-distance anaphor, which
is an X° and moves head-to-head, is I. This still needs to be stipulated
within the theory, but it at least provides a natural basis for linking
antecedence and locality.
By contrast, the theory in Manzini and Wexler (1987), Wexler and
Manzini (1987) cannot derive the link between long-distance binding and
subject orientation. Rather, the subject orientation of certain anaphors
is treated as a second parameter, the antecedent parameter. Thus it is
likely that at least this aspect of the theory in Pica (1987) is correct. This
however leaves our present argument unchanged. Once more, our argument
is simply that a crucial feature of the theory in Manzini and Wexler (1987),
Wexler and Manzini (1987) must be retained, namely parameterised locality
domains.
Any modification of the original conception of the locality parameter
in order to take into account at least the link with the antecedent parameter
must in turn raise the question whether the argument in favour of the
Subset Principle is preserved. In the meantime, however, our provisional
conclusions are supportive of the Subset Principle. If the necessity of a
parameter of the type in (34) is demonstrated, the original argument in
favour of the Subset Principle stands at least for the time being.
REFERENCES

Abney, S. 1987. The English Noun Phrase in its Sentential Aspect. Doctoral Dissertation,
MIT.
Belletti, A. 1983. On the Anaphoric Status of the Reciprocal Construction in Italian. The
Linguistic Review 2.
Borer, H. 1984. Parametric Syntax. Dordrecht: Foris.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1986a. Knowledge of Language: its Nature, Origin and Use. New York: Praeger.
Chomsky, N. 1986b. Barriers. Cambridge, Massachusetts: MIT Press.
Heim, I., H. Lasnik and R. May, to appear. Reciprocity and Plurality. Ms. UCLA, University
of Connecticut and UC Irvine.
Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Kitagawa, Y. 1986. Subjects in Japanese and English. Doctoral Dissertation, University of
Massachusetts.
Koster, J. 1986. Domains and Dynasties. Dordrecht: Foris.
Manzini, M. R. 1983. On Control and Control Theory. Linguistic Inquiry 14. 421-446.
Manzini, M. R. 1988. Constituent Structure and Locality. In A. Cardinaletti, G. Cinque
and G. Giusti (eds.) Constituent Structure. Papers from the 1987 GLOW Conference, Annali
di Ca' Foscari 27, IV.
Manzini, M. R. 1989. Locality. Ms. University College London.
Manzini, M. R. and K. Wexler. 1987. Parameters, Binding Theory and Learnability. Linguistic
Inquiry 18. 413-444.
Pica, P. 1987. On the Nature of the Reflexivization Cycle. In Proceedings of NELS 17. GLSA.
University of Massachusetts.
Sportiche, D. 1988. A Theory of Floating Quantifiers and its Corollaries for Constituent
Structure. Linguistic Inquiry 19. 425-449.
Wexler, K. and M. R. Manzini 1987. Parameters and Learnability in Binding Theory. In
T. Roeper and E. Williams (eds.) Parameters in Linguistic Theory. Dordrecht: Reidel.
Yang, D.-W. 1984. The Extended Binding Theory of Anaphora. Theoretical Linguistic Research
1.
On the rhythm parameter in phonology*
Marina Nespor
University of Amsterdam

It is an observable fact that languages such as Spanish or Italian sound
very different from, say, English or Dutch. It is this difference that made
Lloyd James (1940) compare the sound of the first type of languages to
the sound of a machine-gun and that of the second type of languages
to a message in morse code. Lloyd James then went a step further by
attributing this difference in sound to different types of rhythm: "machine-
gun rhythm" and "morse code rhythm".
This dichotomy was taken over by Pike (1945) who renamed the two
types of rhythms syllable-timed and stress-timed. That is, Spanish would
have a temporal organisation based on the regular recurrence of syllables
and English one based on the regular recurrence of stresses.1 Abercrombie
(1967) went even further by claiming that the rhythm based on the isochrony
of syllables and that based on the isochrony of interstress intervals are
the only two rhythms available for the languages of the world.
The isochrony of syllables and that of interstress intervals are, moreover,
mutually exclusive. They would be compatible only in an ideal language
in which there were only one syllable type and in which stressed syllables
were maximally alternating with unstressed ones. In fact, the existence
of such a language has never been attested, so that, within this conception
of rhythm, a language belongs either to one category or to the other.
According to this view, having one type of rhythm rather than the other
has many consequences for the phonology of a language. That is, a stress-
timed language is characterised by a set of properties not present in syllable
timed languages, and vice versa. For example, if interstress intervals are
to be isochronous in English, the syllables contained in an interval must
be reduced in certain cases and stretched in other cases, to achieve the
desired result. These processes would not have any reason to exist in Spanish
where rhythm is supposedly not based on the regular recurrence of stress.
The question that will be addressed in this paper is whether there are
indeed two types of rhythm, that is, whether the machine-gun vs. morse-
code distinction is represented by the different settings of a single parameter
in the phonology of rhythm.
A very important prediction made by the postulation of such a parameter
is that no language exists that shares some of the phonological characteristics
typical of stress-timed languages and some typical of syllable-timed
languages. That is, within a theory in which one of the two mutually exclusive
types of rhythms is at the origin of a series of phonological processes,
it is impossible to have a system whose sound is neither that of a machine-
gun, nor that of a morse code, but rather intermediate between the two.
This view of rhythm has, in addition, implications for learnability: if
there is a parameter for rhythm, it should be possible to find evidence
that this parameter is set in one of two ways on the basis of primary
linguistic data. That is, a child would select one of the two types of rhythms
depending on the language he is exposed to and would subsequently develop
the set of phonological rules that belong to that particular system.
In this paper, following a suggestion by Dasher and Bolinger (1982),
I will take a view that is the opposite of the one discussed so far. The
basic idea is that the machine-gun vs. morse code distinction which
characterises the rhythm of Spanish on the one hand and that of English
on the other hand, is not the result of two different settings of one single
parameter. I will argue that the phonology of rhythm does not contain
a parameter that accounts for the machine-gun vs. morse-code distinction.
This distinction is the result of a series of nonrhythmic phonological
processes rather than the cause of these processes. If a specific set of
phonological processes coexist in a certain system, the language in question
gives the machine-gun impression; if another set of processes coexist, the
morse code effect is originated.
The problem we are confronted with is, first of all, how to empirically
distinguish the two alternative proposals. It is important to observe that
if there are independent reasons for the clustering together, in a given
phonological system, of the specific phonological processes that characterise
stress rhythm or of those that characterise syllable rhythm, then the task
of phonologically determining which of the two theories has greater
empirical adequacy would be a very hard one. If, however, no such
independent reasons exist, the nonparametric approach (henceforth "The-
ory 2" (T2)) makes a prediction whose verification represents a falsification
of the parametric approach ("Theory 1" (T1)). That is, the machine-
gun and the morse code types of languages would be extreme cases at
the two ends of a continuum in which there are languages that share some
nonrhythmic phonological properties with both and whose rhythm, con-
sequently, is perceived as neither stress-timed nor syllable-timed.
If this prediction of T2 is empirically confirmed, then the question arises
as to whether it is appropriate to speak of two different types of rhythmic
organisation. If rhythmic organisation follows general principles in all
languages then we would expect to have similar processes in the phonology
of rhythm of both so-called syllable-timed and stress-timed types of
languages. Alternatively, we would expect them to have two quite distinct
rhythmic subcomponents.
Additional empirical tests that would help us choose between T1 and
T2 come from the area of language acquisition: if T1 is on the right track,
the machine-gun vs. morse code differences should be audible in the speech
of children from the first stages of the acquisition of language. If on the
other hand, T2 is on the right track, the difference between the two should
be clear only after all the phonological processes that give rise to the
machine-gun vs. morse code effect have been developed.
In this paper, I will argue that T2 is preferable for reasons of empirical
adequacy.2 To this end, I will first present the results of phonetic experiments
carried out by a number of phoneticians, which indicate that the dichotomy
stress-timing vs. syllable-timing is neither based on any measurable physical
reality, nor confirmed by data on perception (cf. section 1). I will then
present phonological evidence that the relation of causality between a certain
type of "rhythm" and the existence of certain nonrhythmic phonological
processes is as predicted by T2 (cf. section 2). In section 3, I will argue
in favour of a unified rhythmic subcomponent of phonology for Italian
and English, two typical examples of syllable-timing and stress-timing,
respectively, for the supporters of T1. The conclusion will then be drawn
that the rhythmic organisational principles of the two groups of languages
are the same and that the element that regularly recurs to give the impression
of "order [...] in movement" (cf. Plato, The Laws, book II: 93) is stress.
No justification is thus left for the classification of languages into stress-
timed and syllable-timed (cf. also den Os, 1988).

1. PHONETIC EVIDENCE AGAINST TWO TYPES OF TIMING

The dichotomy stress-timed and syllable-timed has largely been taken for
granted since Pike (1945), although already from the early sixties many
studies devoted to the issue have put into question the physical basis of
this dichotomy. Shen and Peterson (1962), O'Connor (1965) and Lea (1974),
for example, have shown, with different types of experiments, that in
English, interstress intervals increase in duration in a manner that is directly
proportional to the number of syllables they contain. Bolinger (1965),
besides showing that the isochrony of interstress intervals in English is
not a physical reality, finds that the length of the intervals is influenced
not only by the number of syllables they contain, but, among other factors,
also by the structure of the syllables and the position of the interval within
the utterance.
More recently, Roach (1982) carried out some experiments to test two
claims made by Abercrombie (1967:98): first, that there is variation in
syllable length in a stress-timed language as opposed to a syllable-timed
language; second, that in the latter type of languages, stress pulses are
unevenly spaced. The languages that form the empirical basis of the
experiments are precisely those mentioned by Abercrombie: French, Telegu
and Yoruba as examples of syllable-timing, and English, Russian and Arabic
as examples of stress-timing. The first claim made by Abercrombie is not
supported by the results, since deviations in syllable duration are very
similar in all six languages. As far as the second claim is concerned, the
results even contradict it, in that deviations in interstress intervals are
higher in English than in the other languages.
Borzone de Manrique and Signorini (1983) investigated a "syllable-
timed" language, (Argentinian) Spanish, and showed a) that syllable
duration is not constant but varies depending on various factors and b)
that interstress intervals tend to cluster around an average duration. Their
conclusion is that Spanish has a tendency to stress alternation. In an article
that reports the results of experiments carried out both on "stress-timed"
and on "syllable-timed" languages (Dauer, 1983) it is shown that the
duration of interstress intervals is not significantly different in "stress-
timed" English, on the one hand, and "syllable-timed" Spanish, Italian
and Greek, on the other hand. Dauer thus suggests that the timing of
stresses reflects universal properties of language organisation (cf. also Allen,
1975).
den Os (1988) contains a comparative study of rhythm in Dutch and
Italian. She measures interstress intervals in the two languages and shows
that, given intervals with the same number of syllables and syllables with
the same number of phonemes, there is no difference in duration in the
two languages. That is, if the phonetic content of the two languages is
kept similar, then their rhythm is similar. This amounts to saying that
it is the phonetic material of a string rather than a particular timing strategy
that gives rise to the perception of a different temporal structure in Dutch
and Italian.
From all these studies 3 two important conclusions may be drawn. First,
the isochrony of interstress intervals and syllables does not exist in the
physical reality of "stress-timed" and "syllable-timed" languages, respec-
tively. Second, the two groups of languages do not show significant
differences in their temporal organisation. These conclusions indicate that
there is no acoustic support for the rhythmic nature of the dichotomy
stress-timed and syllable-timed.
Lehiste (1973, 1977), following a suggestion by Classe (1939), proposes
that isochrony, though not detectable in the physical message, could
characterise the way in which language is perceived. Specifically, the
intuition speakers have about the isochrony of interstress intervals in English
may be based on a perceptual illusion. That is, the tendency of listeners
to hear such intervals as more isochronous than they really are (cf. also
Donovan and Darwin, 1979, Darwin and Donovan, 1980) might suggest
the presence of an underlying rhythm that imposes itself on the phonetic
material. In other words, the (more or less) regular recurrence of stresses
would be part of the rhythmic competence of native speakers of English.
A similar conclusion is reached by Cutler (1980a) on the basis of syllable
omission errors. These speech errors tend to produce sequences whose
interstress intervals are more regular than they are in the original target
sentence (cf. also Cutler, 1980b).
These results are very interesting for the present discussion in that T1
and T2 make different predictions about perception as well. According
to T1, first, the tendency exhibited by native speakers of English to regularise
interstress intervals should be absent in native speakers of "syllable-
timed" languages, since stress would supposedly not play any role in their
rhythmic organisation; second, the native speakers of syllable-timed langua-
ges should have the tendency of perceiving syllables as more isochronous
than they really are.
As far as the latter prediction of T1 is concerned, there are, to my
knowledge, no perception experiments on syllable-timed languages that
would parallel those just mentioned for English. It is, however, interesting
to notice that most claims about Spanish, French, Italian or Yoruba having
syllables of similar length are made by native speakers of English, not
by native speakers of syllable-timed languages, the ones that supposedly
should most feel this type of regularity.
Concerning the first prediction of T1, important results have been reached
by Scott, Isard and de Boysson-Bardies (1985), who found that native
speakers of "syllable-timed" French and of "stress-timed" English behave
in the same way: they both hear the intervals in between stressed syllables
as more regular than they actually are. While the similar behaviour of
French and English listeners in the perception of linguistic rhythm con-
tradicts the first prediction of T1 mentioned above, it is just what T2 would
predict: since language is temporally organised according to universal
principles, these should have similar effects in the perception of all
languages. These results thus indicate that there is no perceptual support
for different underlying rhythmic systems for stress-timed and syllable-
timed languages.

2. PHONOLOGICAL EVIDENCE AGAINST TWO TYPES OF RHYTHM

2.1. Nonrhythmic characteristics of "stress-timed" and "syllable-timed" languages

If "stress-timed" and "syllable-timed" languages do not differ in their


underlying rhythmic organisation, a different explanation of the machine-
gun vs. morse code effect is called for. That is, if the isochrony of interstress
intervals and of syllables does not exist in the physical reality of the different
languages, the question to be asked is which other physical characteristics
of the two types of languages are responsible for the fact that they are
perceived as either "stress-timed" or "syllable-timed" (cf. Lehiste, 1977).
Dauer (1983), an important contribution to T2, indicates three factors
that would contribute to give the illusion of different temporal organi-
sations: syllabic structure (cf. also Bolinger, 1962), vowel reduction and
the various physical correlates of stress.
As far as the syllable is concerned, in stress-timed languages there is,
according to Dauer, a greater variation in syllable types and thus in their
length than there is in syllable-timed languages. In English, for example,
the most common syllables consist of a minimum of one and a maximum
of seven segments that result in 16 syllable types. 4 In Dutch, another
language classified as stress-timed, the same number of segments per syllable
yields 19 most common syllable types. In Spanish, on the other hand,
the most common syllables contain from 1 to 5 segments that result in
9 syllable types, and in both Italian and Greek there are up to 5 segments
in a syllable and a total of 8 most common types of syllable. A reduced
variation in syllable complexity is partly responsible for the fact that
Spanish, Italian and Greek give the impression of having more or less
isochronous syllables in comparison to languages with a much larger
variation in syllable complexity.
Dauer observes that, in addition, more than half of the Spanish and
French syllables are of the CV type. Similar results are reached by Bortolini
(1976) for Italian: over 60% of the syllables are CV. The fact that the
large majority of syllables in these three languages are open contrasts with
the situation in English and Dutch, where there is a greater distribution
of occurrence of the different types of syllables and where open syllables
are by no means the majority. These statistical observations provide a
second clue as to why Spanish, Italian or French give the impression of
having syllables of similar length when compared with English or Dutch.
In addition, it is observed by Dauer that "stress-timed" languages have
a strong tendency, opposed to only a slight tendency in "syllable-timed"
languages, for heavy syllables to be stressed and for light syllables to be
stressless. Since duration is one of the physical correlates of stress, syllable
weight and stress reinforce each other in some languages much more than
in others.
The second phonological factor that, according to Dauer, characterises
"stress-timed" English, Swedish and Russian as opposed to Spanish, Italian
or Greek, is the reduction of stressless vowels. A phenomenon that is
instead widespread in "syllable-timed" languages is the deletion of one
of two adjacent vowels. The important difference between the two processes
for the present discussion is that while a syllable whose vowel undergoes
reduction retains its syllabicity, a syllable whose vowel undergoes deletion
disappears. Very short syllables thus arise in "stress-timed"
languages but not in "syllable-timed" languages. Thus, the lack of vowel
reduction in Spanish, Italian and Greek also contributes to the impression
of syllable isochrony in these languages. The presence of it in English
or Dutch, instead, is partly responsible for the impression that the stressed
syllables recur at regular intervals. That is, the fact that stressless syllables
are reduced and thus shortened, together with the fact that they are shorter
than stressed syllables to begin with, makes them much less prominent
than the syllables that carry stress, and the impression is created that a
sequence of stressless syllables occupies a more or less constant amount
of time, independently of how many syllables it contains. 5
Finally, stress has a greater lengthening effect in English than it has
in Spanish (cf. Dauer, 1983). This is one more characteristic that makes
the difference in duration between stressed and stressless syllables much
greater in the former language than in the latter, thus reinforcing the illusion
of regular recurrence of stresses and syllables, respectively. Now that the
nonrhythmic processes have been identified that are present in the languages
most often used as examples of either stress-timing or syllable-timing, it
must be demonstrated that the causality relation between rhythmic and
nonrhythmic phonology supports T2. I turn to this task in the next section.

2.2. On the existence of intermediate systems

As was mentioned in the introduction, the prediction made by T2 is that
either there are independent reasons why different timing-related pho-
nological processes coexist in a given system, or else languages would exist
that are intermediate between "stress-timed" and "syllable-timed" langua-
ges as far as the perception of their temporal structure is concerned.
In the first case, the division of languages into two groups would be
justified independently of whether a certain type of rhythm triggers the
application of phonological rules or certain types of rules produce a certain
rhythmic effect. I am not aware, however, of any reason why vowel
reduction, a rich syllable structure and certain phonetic correlates of stress
should coexist in one phonological system. And, in fact, there are systems
that share some nonrhythmic phonological rules with so-called stress-timed
languages and some with so-called syllable-timed languages.
Catalan has such a phonological system: it has 12 most common syllable
types constituted by a minimum of 1 and a maximum of 6 segments.
It is thus, in this respect, intermediate between Italian, Greek and Spanish
on the one hand, with 8 to 9 syllable types and a maximum of 5 segments
per syllable, and Dutch and English on the other hand, with 16 to 19
syllable types and a maximum of 8 segments per syllable.
Catalan has, in addition, a rule that centralises unstressed vowels, thus
reducing their syllables (cf. Mascaró, 1976, Wheeler, 1979), a rule typical,
as we have seen, of stress-timed languages. In addition, the phonology
of Catalan contains a rule that deletes one of two adjacent vowels under
certain conditions (cf. Mascaro, 1989), a rule typical of "syllable-timed"
languages, according to Dauer (1983). As far as stress is concerned, there
is no strong tendency for it to fall on heavy or long syllables. In this
respect, Catalan is thus more similar to Italian or Spanish than it is to
English or Dutch. Not surprisingly, Catalan is neither machine-gun like,
nor morse code like.
A second language that has been described as neither stress nor syllable-
timed is Portuguese. Brazilian Portuguese, in particular, has a repertoire
of syllable types similar to that of "syllable-timed" languages, a minimum
of one and a maximum of 5 segments per syllable, but has several
simplifications of syllable structure when the syllable is not stressed (cf.
Major, 1985): for example, unstressed vowels are often raised and thus
shortened and diphthongs are reduced to monophthongs when unstressed.
It is because of these characteristics and because of a tendency to regularly
alternate strong and weak syllables (cf. Maia, 1981), that Portuguese has
been said to be a language whose rhythm is changing from syllable-timed
to stress-timed (cf. Major, 1981).
Polish appears also to be an intermediate case: it has a very complex
syllable structure as well as alternating rhythmic stress (cf. Rubach and
Booij, 1985), but no rule of vowel reduction at normal rates of speech.
Vowels are reduced in fast speech; this is, however, a phonetic process
that is not typical of "stress-timed" languages only, but takes place in
"syllable-timed" languages as well (cf. den Os, 1988). Again, it is not
surprising that Polish is considered stress-timed by some linguists (e.g.
Rubach and Booij, 1985) and syllable-timed by others (Hayes and Puppel,
1985).6
The existence of languages whose temporal structure is in between that
of "stress-rhythm" and of "syllable-rhythm" cannot be accounted for within
Tl. It is, instead, exactly what we expect, given T2.

2.3. On the development of rhythm

We have seen, in section 1, that the classification of languages into those
with a stress-based type of rhythm and those with a syllable-based rhythm
does not correspond to any physical reality. In section 2.1., we have seen
that there are several nonrhythmic phenomena typical of languages clas-
sified as stress-timed and others typical of languages classified as syllable-
timed that may very well be at the origin of the perception of different
rhythms in the two types of languages. In order to show that the
phonological processes are indeed the cause, and not the effect, of the
machine-gun and morse code effects, we have pointed to the existence
of languages that have some phonological processes in common with "stress-
timed" languages and some with "syllable-timed" languages and whose
rhythm is neither perceived as stress-timed nor as syllable-timed.
In this section we will examine another piece of evidence in favour of
T2 based on the acquisition of phonology. If two types of rhythmic
organisation were available for the languages of the world, we would expect
the different rhythms to be acquired quite early in the phonological
development of a child. That is, the rhythm parameter should be set before
the acquisition of the phonological rules triggered by one specific type
of rhythm. One such rule, for "stress-timed" languages, would be vowel
reduction, the intensity of which would have to be directly proportional
to the number of syllables contained in an interstress interval. If, on the
other hand, rhythm is not parametric, but rather a universal organising
element in language, we would expect that in the first stages of language
acquisition, when the phonological system of a language is not yet
completely developed, the surface temporal patterns of speech would be
more similar for speakers of "stress-timed" and of "syllable-timed"
languages than they are at later stages. That is, before the development of
the phonological rules and structures that give the illusion of different
temporal organisations in different languages, such an illusion should not
exist. One of the characteristics of the first stages of language acquisition,
for example, is a very uniform syllable structure. At this stage, one would
thus expect the difference in the number of occurring syllable types in
the speech of, say, English and Italian children not to be as large as it
is in the speech of adult speakers, and thus the temporal structures of
English and Italian children's speech to be more similar than at later stages.
It has, in fact, been observed in Allen and Hawkins (1975), an expe-
rimental study on the development of rhythm in native speakers of English,
that children's first utterances contain only heavy syllables, in the sense
that all vowels are fully articulated. The lack of syllable reduction makes
the rhythm at this stage of language acquisition sound syllable-timed
(cf. Allen and Hawkins, 1975). It is the acquisition of the reduction processes
that contributes to the development of adult rhythm.
Once more, we are confronted with data that are accounted for within
T2, but are not explainable within T1.
From the observations presented in sections 1 and 2, the conclusion
must be drawn that T2 is superior to T1 for both phonetic and phonological
reasons. That is, no motivation has been found in favour of different
temporal organisations in language, but rather against it.

3. THE PHONOLOGY OF RHYTHM: ARGUMENTS FOR A UNIFIED RHYTHMIC COMPONENT

3.1. The metrical grid in English and Italian

Given the conclusion of the previous sections, we expect the rhythmic
subcomponent of so-called stress-timed languages not to differ from that
of syllable-timed languages. The present section, as well as the following
three, is devoted to arguments in favour of a nonparametric rhythmic
subcomponent of phonology. The languages on which the discussion will
be based are English and Italian.
It has been suggested in Selkirk (1984) that the difference between stress-
timed and syllable-timed languages is incorporated at the basic level of
the metrical grid, the representation of rhythm. Specifically, while in the
grid of both English and Italian, to each syllable in the linguistic material
corresponds an x at the first grid level, at the second, or basic level, a
distinction is made between the two languages: in English, only those
syllables that have some degree of stress are assigned an x, while in Italian,
every syllable is assigned an x, independently of its being stressed or stressless
(cf. (1) a and b, respectively (Selkirk's examples)).

           x           x           x  x  x  x
       x   x  x  x     x           x  x  x  x
(1) a. the ma na ger's here     b. il po po lo

In this way, the observation that Italian syllables are more or less
isochronous is incorporated in the representation of rhythm.
Since, however, the length of a syllable depends crucially on the number
of segments it contains, both in Italian and in English, and since the number
of segments per syllable can vary in Italian, though less than in English,
the representation proposed by Selkirk for Italian is not a reflection of
physical reality. The results of an experiment described in den Os (1988)
indicate, in addition, that representing the timing of "stress-timed" and
"syllable-timed" languages in different ways does not reflect the way in
which the two types of languages are perceived either. The two languages
on which den Os's experiment is based are Italian and Dutch. A Dutch
and an Italian text, similar in syllable composition, were recorded and
then delexicalised by means of low-pass filtering (cf. den Os, 1988: 40).
These utterances formed the material for one experiment. The two texts
were then deprived of their melody. The utterances without intonation
formed the material for a parallel experiment. Both intonated and monotone
versions of the two texts were presented to native speakers of Dutch who
had to determine whether what they were hearing was originally Dutch
or Italian. While the subjects very often correctly identified the two
languages when confronted with the intonated texts, they were absolutely
unable to do so with the monotone versions. Since the rhythmic patterns
of the two languages were not modified in any way, the results of this
experiment show that Italian and Dutch do not differ as to their rhythmic
organisation. It is the segmental material that fills the syllables that gives
the illusion of different rhythms.
This experiment convincingly shows that the two different metrical grids
for Italian and English proposed by Selkirk are not a reflection of the
perception of rhythm. The conclusion must thus be drawn, I believe, that
if the metrical grid is to represent rhythm, it should not make any distinction
between stress-timed and syllable-timed languages. Therefore, the second
grid level will only contain one x for every prominent syllable in both
types of languages (cf. also Roca, 1986). The first two levels of the grid
of il popolo are thus as in (2).

       x
    x  x  x  x
(2) il po po lo
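
To make the unified representation concrete, here is a small illustrative sketch in Python (my own, not part of any of the proposals under discussion); it builds the two grid levels in the same way for any language: one x per syllable at the first level, one x per prominent syllable at the second. The syllabification and the prominence marks supplied in the example are assumptions made purely for exposition.

# Expository sketch: the first two grid levels under the unified,
# nonparametric view defended in the text.  Level 1 assigns an x to
# every syllable; level 2 assigns an x only to prominent syllables,
# in "stress-timed" and "syllable-timed" languages alike.

def grid(syllables):
    """syllables: list of (text, prominent) pairs."""
    widths = [len(text) for text, _ in syllables]
    level2 = " ".join(("x" if prominent else " ").ljust(w)
                      for (_, prominent), w in zip(syllables, widths))
    level1 = " ".join("x".ljust(w) for w in widths)
    texts = " ".join(text for text, _ in syllables)
    return "\n".join([level2, level1, texts])

# 'il popolo', with prominence on the antepenultimate syllable 'po'
print(grid([("il", False), ("po", True), ("po", False), ("lo", False)]))

Nothing in this construction refers to a rhythm-type parameter; the only language-particular information it consults is which syllables are prominent.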

I will now turn to some observations about certain rules of rhythm in
English and Italian, as well as about the structures that constitute arhythmic
configurations in the two languages.

3.2. The Rhythm Rule in English and Italian

The phonology of English includes a rhythmic process whose effect is
that of eliminating arhythmic configurations consisting of word primary
stresses on adjacent syllables, the so called stress clash (cf. Liberman and
Prince, 1977). The phenomenon is usually accounted for by a rule that
moves the leftmost of the two stresses to the next syllable with some
prominence. The application of the rule is illustrated in (3), where "´"
marks word primary stress. 7

(3) a. thirtéen vs. thírteen men
    b. Tennessée vs. Ténnessee air

The same process is present in the phonology of Italian, as Nespor and
Vogel (1979) have shown (cf. (4)).

(4) a. ventitré vs. véntitre gradi
       'twentythree' 'twentythree degrees'
    b. si presenterà vs. si présentera bène
       '(it) will be presented' '(it) will be well presented'

It has, in addition, been shown that the domain within which the rules
apply is identical in the two languages (cf. Selkirk, 1978, Nespor and Vogel,
1982, 1986). As shown in (5) and (6) for Italian and English, respectively,
this domain coincides with the phonological phrase (φ). In (7) and (8),
it is shown that the rule does not apply across phonological phrases. The
analysis in φ's is made according to the rules of phonological phrase
formation and restructuring proposed in Nespor and Vogel (1986).8

(5) a. [Le città nòrdiche]φ mi piacciono. (→ cìtta)
       '(I) like Nordic cities'
    b. [Pescherà grànchi]φ almeno, se non aragoste. (→ péschera)
       '(He) will fish crabs at least, if not lobsters'

(6) a. John [persevéres gladly]φ (→ pérseveres)
    b. Given the chance, rabbits [reproduce quickly]φ (→ réproduce)

(7) a. [Le città]φ [mólto nordiche]φ mi piacciono. (no change)
       '(I) like very Nordic cities'
    b. [Pescherà]φ [quàlche granchio]φ almeno, se non aragoste. (no change)
       '(He) will fish some crabs at least, if not lobsters'

(8) a. John [perseveres]φ [gladly]φ and diligently. (no change)
    b. Given the chance, rabbits [reproduce]φ [very quickly]φ (no change)

It is clear that the rule of English and the rule of Italian are very similar
and that they apply in order to create a more alternating pattern of stressed
and unstressed syllables. This motivation is a very natural one for a language
that, like English, is supposed to be stress-timed. However, if Italian
were indeed syllable-timed, that is, if it had a rhythm based on the
succession of identical syllables rather than on the alternation of stressed
and stressless syllables, the existence of the rhythm rule just discussed
would be quite unexpected. It is, in fact, difficult to find a reason for its existence,
but if one could imagine such a reason, it would still be surprising to
have one and the same rule triggered in two different ways and with different
motivations in the two languages.
It seems much more natural to assume that if one rule applies in the
same way in two languages, its motivation is also the same. In our specific
case, then, Italian would also have a tendency towards the alternation of
stresses. Since for a "syllable-timed" language there is no reason to have
alternating rather than adjacent stresses, we may once more draw the
conclusion that the fact that most Italian syllables are similar in length
has nothing to do with the language's temporal organisation. Rather,
rhythm in Italian, as well as in English, is an accentual phenomenon.
That is, the one object that must recur at regular intervals to establish
"order in movement" is stress.
The rhythm rule has, in addition, been proposed to account for similar
facts in German (Kiparsky, 1966), Dutch (Schultink, 1979, Kager and Visch,
1983), Finnish (Hayes, 1981), Polish (Hayes and Puppel, 1985), French
(Dell, 1984), Canadian French (Phinney, 1980), Brazilian Portuguese
(Major, 1985), Tiberian Hebrew (McCarthy, 1979), Dari (Bing, 1980),
Passamaquoddy (Stowell, 1979), Catalan (Nespor and Vogel, 1989). As
Hayes and Puppel (1985) suggest, it might very well be the exception for
stress languages not to have the rhythm rule (cf. also Nespor and Vogel,
1989, where it is proposed that the rules that ensure that stress clashes
are eliminated are part of universal grammar).

3.3. The definition of stress clash in Italian and English

In the previous section, I have underlined the similarities in the ways in
which English and Italian eliminate a stress clash. There are, however,
certain aspects of the English rule that have never been claimed to
characterise the Italian rule as well (cf., however, Nespor, 1990). It has
been shown in Liberman and Prince (1977), Prince (1983), Hayes (1984)
and Selkirk (1984), among others, that the context of application of the
rule is not exclusively that exemplified in (3), that is, strict adjacency of
the two accented syllables is not a requirement for the application of the
rule. Rather, the rule may also apply when an unstressed syllable intervenes
between the two stressed ones, as exemplified in (9).
(9) a. Mississíppi législature → Míssissippi législature
    b. good-loóking lifeguard → goód-looking lifeguard
    c. Apalachicóla Falls → Apalàchicola Falls

While with adjacent stresses the rule applies at all rates of speech, in
the case in which the two stresses are not adjacent the rule is gradient
in application; that is, its likelihood of applying increases as speech becomes
faster (cf. Hayes, 1984). On the basis of these data, it is proposed in Nespor
and Vogel (1989) that there is a parameter in the phonology of rhythm
that accounts for the different behaviour of stress-timed English and syllable-
timed Italian in the definition of the configuration that constitutes a stress
clash. In particular, it is proposed that what counts as adjacent differs
in the two groups of languages: strict adjacency would be required in
Italian and more generally in syllable-timed languages; instead, two stressed
syllables would be considered adjacent in stress-timed languages even
if one unstressed syllable intervenes to separate them.
In this section, I will argue, contra Nespor and Vogel (1989), that the
difference between English and Italian is to be found in their nonrhythmic
phonological systems, rather than in their rhythmic component (cf. also
Nespor, 1990). The data that reveal that rhythm functions in the two
languages in a way more similar than previously thought come from
Northern Italian. Besides the contexts of application of the rule exemplified
in (4), there are other cases in which it applies, although the syllables
bearing the clashing stresses are not linearly adjacent, as shown in (10).

(10) a. ventidùe bimbi → véntidue bimbi
        'twentytwo children'
     b. trentatré aèrei → tréntatre aèrei
        'thirtythree airplanes'

What these two examples have in common is that the (italicised) weak
syllable that separates the stressed syllables consists exclusively of one vowel.
Any longer intervening syllable blocks the application of the rule, as shown
in (11).

(11) a. ventisètte bimbi, (no change)
        'twentyseven children'
     b. trentatré matite, (no change)
        'thirtythree pencils'

These data are interesting in light of the observation made by Hayes that,
in English, a syllable intervening between two clashing prominences should
be short in order for a rhythmic readjustment to take place (1984:70).
Thus, while words with a short final syllable, such as Mississippi or
Apalachicola, are likely to undergo readjustment, words with phonetically
longer final syllables, such as Adirondack or Massapequod, are not.
I would like to propose that the fact that a stress readjustment takes
place in English even if there is an intervening syllable in between the
two clashing prominences is not due to the fact that English is stress-
timed, as seen from the fact that a similar situation arises in Italian. Rather,
"short" syllables may be considered extrarhythmic in both languages (cf.
also Nespor, 1990). The problem is how to define a "short" syllable.
Although I do not have a definition at this moment, I would like to propose
that what counts as a short syllable in a given language depends on the
syllable types that form the repertoire of that language. Thus in Italian,
a language with a quite restricted set of syllable types, only the minimal
syllable is considered short, that is, a syllable containing only one short
vowel. In English, however, a language with a large variety of syllable
types, a CV syllable, or in certain cases a CVC syllable, is also considered
short. It is thus, once more, nonrhythmic phonology that determines a
difference in the rhythmic patterns allowed in a given language. The
rhythmic subcomponent, instead, is identical for English and Italian also
in this respect: as proposed in Nespor (1990), it will have a rule that deletes
extrarhythmic xs at the first grid level. Strict adjacency of the syllables
bearing the clashing prominences is thus a requirement for the definition
of stress clash both in English and in Italian.
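
The following sketch (again my own, in Python) shows one way the proposal could be cashed out, under two assumptions that are mine rather than the text's: syllable shapes are reduced to a crude CV skeleton, and what counts as "short" is listed per language as a set of such shapes. Extrarhythmic syllables are removed before adjacency is checked, so that the same strict-adjacency definition of the clash serves for both languages.

# Expository sketch of the clash definition argued for above: delete
# extrarhythmic ("short") syllables at the first grid level and then
# require strict adjacency of the clashing prominences.  What counts
# as "short" is stated here as a set of syllable shapes per language;
# the sets below are illustrative assumptions, not established results.

SHORT = {
    "Italian": {"V"},        # only a syllable consisting of one short vowel
    "English": {"V", "CV"},  # CV (and in certain cases CVC) also counts
}

def shape(syllable):
    return "".join("V" if ch in "aeiou" else "C" for ch in syllable)

def has_clash(syllables, prominent, language):
    """syllables: list of strings; prominent: parallel list of booleans."""
    visible = [p for s, p in zip(syllables, prominent)
               if p or shape(s) not in SHORT[language]]
    return any(a and b for a, b in zip(visible, visible[1:]))

# (10a) ventidue bimbi: the weak syllable 'e' is a bare vowel, hence
# extrarhythmic, so the prominences on 'du' and 'bim' count as adjacent.
print(has_clash(["ven", "ti", "du", "e", "bim", "bi"],
                [False, False, True, False, True, False], "Italian"))  # True
# (11a) ventisette bimbi: 'te' is CV, not extrarhythmic in Italian.
print(has_clash(["ven", "ti", "set", "te", "bim", "bi"],
                [False, False, True, False, True, False], "Italian"))  # False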

3.4. Stress lapses in English and Italian

Stress clashes are not the only type of arhythmic configurations that may
arise when words are strung together in a sentence. Another type of
rhythmically ill-formed configuration is an "overlong" sequence of weak
positions, the so-called stress lapse (Selkirk, 1984:49). As is observed in
Selkirk (1984), a lapse is eliminated in both English and Italian (cf. also
Roca, 1986, Nespor and Vogel, 1989). Since English is stress-timed, it is
quite clear why its rhythmic component has a rule that adds a prominence
to a certain position in a lapse: its effect is that of producing alternation.
But what about the addition of a prominence in stress lapses in Italian,
exemplified in (13)?

(13) Non sanno se glielo dicono (either se or glie)
     '(They) don't know if (they) tell them'

Once more, if rhythm were parametric and based, in Italian, on the regular
succession of syllables rather than on the alternation of stressed and
stressless syllables, there would not be any reason why a sequence of
unstressed syllables like, for example, the ones italicised in (13), should
not be well-formed in Italian. If, however, rhythm is not parametric, but
is based on alternation in Italian, as well as in English, their similar
behaviour in the elimination of stress lapses is exactly what we expect.
Of course, this is not to say that the physical realisation of the added
prominence is the same: rather, it reflects the different nature of stress
in the two languages (cf. section 1).

4. CONCLUSIONS

In this paper, I have presented a number of arguments against a parameter
in the phonology of rhythm that would account for the dichotomy "stress-
timed" and "syllable-timed". The illusion of isochrony finds its origin
in phonological characteristics of the language and not in its temporal
organisation (cf. Dasher and Bolinger, 1982, Borzone de Manrique and
Signorini, 1983, Dauer, 1983, den Os, 1988). The rhythmic organisation
of language, instead, is the same for both groups of languages traditionally classified
as having one of the two types of rhythm, and possibly for speech in
general as well as for other rhythmic activities (cf. Allen, 1975).
If this conception of the relation between rhythm and nonrhythmic
phonology is on the right track, then the disposition to alternate more
and less prominent elements may very well be innate. What the child should
identify in the acquisition process is the physical realisation of prominence
in the language he is exposed to. This is not to say that the child does
not hear the machine-gun or the morse code or any other type of sound
the language he is exposed to might have. The claim is rather that he
will not project any phonological rule from this type of information.
Given this view of rhythm, the rhythmic subcomponent of phonology
does not contain a parameter whose different settings would characterise
either stress rhythm or syllable rhythm. Accordingly, I have argued that
certain rhythmic structures that have been proposed in order to account
for the dichotomy stress-timing vs. syllable-timing (cf. Selkirk, 1984, Nespor
and Vogel, 1988) should be eliminated from the phonology of rhythm
(cf. also Nespor, 1990). The machine-gun and the morse code effects
simply result from the application of different nonrhythmic phonological
rules.

FOOTNOTES

* I would like to thank Iggy Roca for his comments on some of the ideas presented in
this paper and Joan Mascaro for discussions on Catalan phonology and for commenting
on a previous version of this paper and offering suggestions for improvements.

1. Strictly speaking, while the analogy between the sound of a machine-gun and a rhythm
with isochronous recurrence of events of sorts is appropriate, the same cannot be said for
the analogy between a message in morse code and any type of isochrony. There is, in fact,
no regular recurrence of events in the sequence of dots and dashes of a message in morse
code.
2. The arguments I will present are against two different claims made by T1: a) that there
are only two types of rhythmic organisation in language and, b) that the type of rhythm
according to which a language is temporally organised triggers a number of nonrhythmic
phonological properties. These two claims are not logically related. Since, however, they
are treated as strictly connected within T1, I will not always separate the arguments against
one from the arguments against the other.
3. The studies mentioned in this paper by no means exhaust the literature on isochrony.
For a more complete survey of the literature, cf. den Os (1988).
4. I am using here the term syllable in a by now traditional way: all consonants that appear
on the surface are included in a syllable whose nucleus is also present on the surface. This
is by no means an unquestioned assumption (cf. Kaye, Lowenstamm and Vergnaud, 1987).
5. It must be noted that the number of stressless syllables that may occur in between two
stressed ones cannot vary very much, since if the sequence of stressless syllables is long
enough to constitute a stress lapse, a stress is added to remedy this arhythmic configuration
(cf. among others, Selkirk, 1984, and section 3.4 below).
6. Jerzy Rubach has pointed out to me that traditional Polish linguists and phoneticians
also consider Polish a stress-timed language.
7. Although at present, I believe that a rule that deletes a prominence (Beat Deletion) plus
a rule that adds a prominence (Beat Addition) account for this phenomenon better than
a rule that moves a prominence (cf. Nespor and Vogel, 1988 or Nespor and Vogel, 1989
for an extended proposal about the rules of rhythm), I omit a discussion of this proposal,
since it is not crucial to the point being made here. For different accounts of these facts,
the reader is referred to Liberman and Prince, 1977, Prince, 1983, Selkirk, 1984, Hayes,
1984, Nespor and Vogel, 1988, 1989, among others.
8. In grid terms, the domain of Beat Deletion is derivable from the definition of what
constitutes a minimal clash in a given language. For reasons of space, I will not discuss
this analysis here, although at present I believe it is the most adequate account of the facts.
The interested reader is referred to Nespor and Vogel, 1989.

REFERENCES

Abercrombie, D. 1967. Elements of General Phonetics. Edinburgh: University Press.
Allen, G. D. 1975. Speech rhythm: its relation to performance universals and articulatory
timing. Journal of Phonetics 3. 75-86.
Allen, G. D. and S. Hawkins. 1978. The Development of Phonological Rhythm. In A. Bell
and J.B. Hooper (eds.) Syllables and Segments. 173-185. Amsterdam: North-Holland.
Bing, J. M. 1980. Linguistic Rhythm and Grammatical Structure in Afghan Persian. Linguistic
Inquiry 11.437-463.
Bolinger, D. 1965. Pitch Accent and Sentence Rhythm. In Forms of English: Accent, Morpheme,
Order. Cambridge, Massachusetts: Harvard University Press.
Bortolini, U. 1976. Tipologia sillabica dell'italiano. Studio statistico. In R. Simone, U. Vignuzzi
and G. Ruggiero (eds.) Studi di Fonetica e Fonologia. 5-22. Roma: Bulzoni.
Borzone de Manrique, A. M. and A. Signorini. 1983. Segmental durations and the rhythm
in Spanish. Journal of Phonetics 11.117-128.
Classe, A. 1939. The Rhythm of English Prose. Oxford: Blackwell.
Cutler, A. 1980a. Syllable omission errors and isochrony. In H.W. Dechert and M. Raupach
(eds.) Temporal Variables in Speech. 183-190. The Hague: Mouton.
Cutler, A. 1980b. Errors of stress and intonation. In V.A. Fromkin (ed.) Errors in Linguistic
Performance: Slips of the Tongue, Ear, Pen and Hand. 67-80. New York: Academic Press.
Darwin, C. and A. Donovan. 1980. Perceptual studies of speech: isochrony and intonation.
In J. Simon (ed.) Proceedings of NATO ASI on spoken language generation and understanding.
77-85. Dordrecht: Reidel.
Dasher, R. and D. Bolinger. 1982. On pre-accentual lengthening. Journal of the International
Phonetic Association 12. 58-69.
Dauer, R. 1983. Stress-timing and syllable-timing reanalysed. Journal of Phonetics 11. 51-
62.
Dell, F. 1984. L'accentuation dans les phrases en français. In François Dell, Daniel Hirst
and Jean-Roger Vergnaud (eds.) Forme Sonore du Langage. Paris: Hermann.
den Os, E. 1988. Rhythm and Tempo in Dutch and Italian, a contrastive study. Doctoral
dissertation, Utrecht.
Donovan, A. and C. Darwin. 1979. The perceived rhythm of speech. In Proceedings of the
Ninth International Congress of Phonetic Sciences. 268-274. Copenhagen.
Hayes, B. 1980. A Metrical Theory of Stress Rules. Doctoral dissertation, MIT (IULC, 1981).
Hayes, B. 1984. The phonology of rhythm in English. Linguistic Inquiry 15. 33-74.
Hayes, B. to appear. The prosodic hierarchy in meter. In P. Kiparsky and G. Youmans
(eds.) Proceedings of the 1984 Stanford Conference on Meter. New York: Academic Press.
Hayes, B. and S. Puppel. 1985. On the Rhythm Rule in Polish. In H. van der Hulst and
N. Smith (eds.) Advances in Nonlinear Phonology. 59-81. Dordrecht: Foris.
Kaye, J., J. Lowenstamm and J.-R. Vergnaud. (to appear). Konstituentenrektion und Rektion
in der Phonologie. In H. Prinzhorn (ed.) Phonologie. Wiesbaden: Westdeutscher Verlag.
Kager, R. and E. Visch. 1983. Een Metrische Analyse van Ritmische Klemtoonverschijnselen.
M.A. Thesis, Utrecht.
Kiparsky, P. 1966. Über den deutschen Akzent. Studia Grammatica 7. 69-98.
Lea, W. A. 1974. Prosodic Aids to Speech Recognition: IV. A General Strategy for
Prosodically-guided Speech Understanding. Univac Report. No. PX10791, Sperry Univac,
DSD, St. Paul, Minnesota.
Lehiste, I. 1973. Rhythmic units and syntactic units in production and perception. Journal
of the Acoustical Society of America 54. 1228-34.
Lehiste, I. 1977. Isochrony Reconsidered. Journal of Phonetics 5. 253-263.
Liberman, M. and A. Prince. 1977. On stress and linguistic rhythm. Linguistic Inquiry 8.
249-336.
Lloyd James, A. 1940. Speech Signals in Telephony. London.
Maia, E. A. D. M. 1981. Hierarquias de constituentes en fonologia. Anais do V Encontro
Nacional de Lingüistica. 260-289. Pontificia Universidade Católica, Rio de Janeiro.
Major, R. C. 1981. Stress-timing in Brazilian Portuguese, Journal of Phonetics 9. 343-351.
Major, R. C. 1985. Stress and Rhythm in Brazilian Portuguese. Language 61. 259-282.
Mascaró, J. 1976. Catalan Phonology and the Phonological Cycle. Doctoral dissertation, MIT
(IULC, 1978).
Mascaró, J. 1989. On the Form of Segment Deletion and Insertion Rules. Probus 1. 31-
62.
McCarthy, J. 1979. Formal Problems in Semitic Phonology and Morphology. Doctoral
dissertation, MIT.
Nespor, M. 1990. On the Separation of Prosodic and Rhythmic Phonology. In S. Inkelas
and D. Zee (eds.) The Phonology-Syntax Connection. 243-258. CSLI. Chicago: The University
of Chicago Press.
Nespor, M. and I. Vogel. 1979. Clash Avoidance in Italian. Linguistic Inquiry 10. 467-482.
Nespor, M. and I. Vogel. 1982. Prosodic domains of external sandhi rules. In H. van der
Hulst and N. Smith (eds.) The Structure of Phonological Representations. Part I. 225-255.
Dordrecht: Foris.
Nespor, M. and I. Vogel. 1986. Prosodic Phonology. Dordrecht: Foris.
Nespor, M. and I. Vogel. 1988. Arhythmic sequences and their resolution in Italian and
Greek. Constituent structure. Papers from the 1987 GLOW Conference. Annali di
Ca'Foscari, University of Venezia.
Nespor, M. and I. Vogel. 1989. On clashes and lapses. Phonology 6. 69-116.
O'Connor, J. D. 1965. The Perception of Time Intervals. Progress Report 2. 11-15. Phonetics
Laboratory, University College, London.
Phinney, M. 1980. Evidence for a Rhythm Rule in Quebec French. NELS 9.
Pike, K. 1945. The Intonation of American English. Ann Arbor, Michigan: University of
Michigan Press.
Plato, The Laws. Loeb Classical Library. Cambridge, Massachusetts: Harvard University
Press, 1926.
Prince, A. 1983. Relating to the Grid. Linguistic Inquiry 14. 19-100.
Roach, P. 1982. On the distinction between "stress-timed" and "syllable-timed" languages.
In D. Crystal (ed.) Linguistic Controversies. London: Edward Arnold.
Roca, I. 1986. Secondary stress and metrical rhythm. Phonology Yearbook 3. 341-370.
Rubach, J. and G. E. Booij. 1985. A Grid Theory of Stress in Polish. Lingua 66. 281-319.
Schultink, H. 1979. Readies op "Stress Clash". Spektator 8.5. 195-208.
Scott, D. R., S. D. Isard and B. de Boysson-Bardies. 1985. Perceptual isochrony in English
and French. Journal of Phonetics 13. 155-162.
Selkirk, E. O. 1978. On Prosodic Structure and its Relation to Syntactic Structure. Paper
presented at the Conference on Mental Representation in Phonology. IULC, 1980.
Selkirk, E. O. 1984. Phonology and Syntax: the Relation between Sound and Structure.
Cambridge, Massachusetts: MIT Press.
Shen, Y. and G. G. Peterson. 1962. Isochronism in English. University of Buffalo Studies
in Linguistics. Occasional Papers 9. 1-36.
Stowell, T. 1979. Stress Systems of the World, Unite. MIT Working Papers in Linguistics
1. 51-76.
Wheeler, M. 1979. Phonology of Catalan. Oxford: Blackwell.
Dependencies in the Lexical Setting of
Parameters: a solution to the
undergeneralisation problem*
Mark Newson
University of Essex

1. THE LEXICAL PARAMETERISATION HYPOTHESIS AND ENSUING PROBLEMS

From within the Principles and Parameters framework, the "traditional"
view of Universal Grammar is that of a system of grammatical modules,
each of which consists of universal principles pertaining to a particular
grammatical phenomenon (such as Case assignment, Binding, the licensing
of empty categories, etc.) and a number of parameters allowing for
variability within the whole system. Language learning is viewed as a process
of "setting" the parameters to one or another of their values. This is
accomplished from "positive evidence" presented from the target language,
which favours the selection of one value over another. It is important
to note that, under this view, parameters are associated with grammars;
thus any particular parameter setting permeates the whole grammar and
it should not be possible for a language to conform to more than one
value of any parameter.
This view of parameters has recently altered from that of the traditional
perspective. The main cause of this shift has been the observation that
the situation predicted above does not hold of all parameters. In particular,
variations within binding phenomena can be seen within certain languages
as well as across languages. For example, Japanese has at least three different
types of anaphor: zibun, a long distance, subject oriented anaphor; zibun
zisin, which is non-long distance but still subject oriented; and kare zisin,
which is neither long distance nor subject oriented (on these see Katada
(1988)). If such variations can be put down to parameterisation, as has
been claimed in numerous places (e.g. Yang (1983), Harbert (1986), Wexler
and Manzini (1987) and Vikner (1985)) and as will be maintained in the
present paper, then we must conclude of these parameters that they are
not set for grammars as a whole, but for the individual lexical items within
a language.
Besides such empirical evidence indicating that a number of parameters
must be seen as "lexical" rather than "grammatical" there are some
theoretical considerations which make it an attractive idea to extend this
view to all parameters. First, it represents the minimal learning theory,
in that all other theories of language acquisition must suppose at least
some lexical learning, but this view claims that this is all there is. Second,
it is conceptually simpler to locate language variation in the lexicon. One
result of this is that in a suitably abstract sense there is only one language;
all language variation can be put down to mere lexical differences. Moreover,
if such variation represents idiosyncratic properties of individual languages
then the lexicon is the rightful place to store such information. Finally,
storing all information about variability in the behaviour of lexical items
in the same place may make parsing less complex and thus the proposal
that all parameters are lexical may have computational advantages too.
The view that all parameters are lexical has been termed by Wexler
and Manzini (1987) the Lexical Parameterisation Hypothesis.
However, despite the empirical and theoretical support that the Lexical
Parameterisation Hypothesis receives, its adoption does lead to some major
problems. Although the Lexical Parameterisation Hypothesis may be the
minimal learning theory, it is not in all cases the most obvious one and,
in certain instances, it is positively counter-intuitive. For instance, take
word order parameters, however these are to be construed.1 Rather than
these being set once and for all for the language as a whole, if these
parameters are lexical they will have to be set for each and every individual
lexical item of the language. Obviously, this increases the amount of learning
that we suppose a child must do, perhaps beyond tolerable limits if we
consider that all parameters must be thus set. Furthermore, it is highly
counter-intuitive to suppose that each token of, say, a verb must be presented
so that the learner will know that each is head initial, for example. This
does not sit well with the fact that if native speakers are presented with
a newly invented verb, they will automatically know that that verb is head
final or head initial, depending on how this parameter is set for their
language. With respect to such "creativity" the traditional view is more
intuitive, as one instance of a head initial verb would be sufficient to
allow the learner to generalise this information to all other lexical items.
This last point raises a second problem with the Lexical Parameterisation
Hypothesis. Consider, again, word order parameters. It appears to be the
case that languages in general either have head initial or head final verbs
(or, more generally, it may be possible to characterise a whole language
as being basically head initial or head final). If word order parameters
are set for each individual lexical item, we might expect to find some
languages in which some verbs are head initial and some are head final.
There are, to my knowledge, no such languages.2
In the main, it is very difficult to account for intra-language genera-
lisations if we assume that the mechanisms which were originally proposed
to capture cross-language variance are also responsible for variation within
languages. Safir (1987) has referred to this problem as the undergeneralisation problem.
The two problems identified above, the over-burdening of the learning
mechanism and the undergeneralisation problem, require answers if the
Lexical Parameterisation Hypothesis is to be maintained. It is the claim
of this paper that a rather simple solution exists to both problems, for
which there seems to be considerable empirical support, at least for one
particular module of the grammar. In what follows I shall first outline
this solution and then proceed to examine the empirical support for this
approach through investigation of generalisations concerning the binding
theory and its parameters.

2. A SOLUTION TO THE PROBLEMS

If parameters are lexical, then the information as to which value of a
certain parameter a lexical item conforms to must be stored in the lexical
entry for that item, like any other lexical feature. One problem we have
identified is that if such features have to be learned on the basis of direct
positive data, then this may be seen to overburden the learner. What is
needed is some way to relieve this burden so that the presentation of
a small amount of data would be sufficient to enable the setting of a
number of lexical parameters.
Let us attack this problem head on and suppose that there is a mechanism
which generalises learned information concerning a particular lexical item
to all other relevant lexical items. An example of such a mechanism would
be something which simply copied relevant features from one lexical entry
into others as a part of the learning of such lexical entries. We could
say that this sort of learning mechanism makes available to the child
"indirect positive evidence" in that such learning takes place on the basis
of implications derived from direct positive evidence. Let us refer to this
learning mechanism as a "Lexical Dependency" as the setting of the lexical
parameters of certain items are, under these assumptions, dependent on
those of others.
Obviously, we will have to allow for some individual lexical differences
within this system as not all lexical items in a language share all the same
features. All idiosyncrasies will, of course, have to be learned individually
and this includes any lexical difference in the parameter values conformed
to. This, presumably, takes place after the operation of the Lexical
Dependency and must be done on the basis of positive evidence indicating
that the relevant lexical item does not conform to the general setting enforced
by the Lexical Dependency. 3 However, it should be apparent that the
learning of idiosyncratic differences against a backdrop of a general
parameter setting places far less of a burden on the learner than would
the learning of each value individually; in the worst case, it involves, at
most, the same amount of learning.
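
Purely as an illustration of the kind of mechanism envisaged (the data structures, the parameter name and the example items below are mine and merely hypothetical), a Lexical Dependency can be thought of as copying a value learned for one item into every other entry of the relevant class, with item-specific positive evidence later able to override the generalised value:

# Expository sketch of a "Lexical Dependency": a parameter value learned
# for one lexical item from direct positive evidence is copied into all
# other entries of the same class (indirect positive evidence); any
# idiosyncratic value is learned afterwards, on an item-by-item basis.

lexicon = {
    "hit":  {"category": "V"},
    "see":  {"category": "V"},
    "give": {"category": "V"},
}

def set_parameter(item, parameter, value):
    category = lexicon[item]["category"]
    lexicon[item][parameter] = value            # direct positive evidence
    for entry in lexicon.values():              # the Lexical Dependency:
        if entry["category"] == category:       # generalise to the class,
            entry.setdefault(parameter, value)  # unless already set

def learn_idiosyncrasy(item, parameter, value):
    lexicon[item][parameter] = value            # overrides the general setting

set_parameter("hit", "head_position", "initial")
print(lexicon["see"]["head_position"])   # 'initial', though never observed
# A purely hypothetical exception, to show where idiosyncrasies would go:
learn_idiosyncrasy("give", "head_position", "final")
print(lexicon["give"]["head_position"])  # 'final'

On this picture a single datum about a single verb suffices to set the value for every verb, which is what restores the "creativity" noted above for newly invented items, while idiosyncrasies remain learnable one by one.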


It is also fairly obvious that if a learning mechanism such as a Lexical
Dependency did exist, then this would be the cause and therefore the
explanation of many generalisations concerning the setting of parameters
within a language's lexicon. It would immediately account for generalised
tendencies, such as the tendency of a language to be either head initial
or head final. But what about absolute universals? These might be capturable
if it could be shown that Lexical Dependencies interact with other learning
principles to the effect that certain combinations of parameter settings
are unlearnable; as what is unlearnable will never appear in any natural
language, it will be an absolute universal that this combination is never
apparent. In principle, therefore, Lexical Dependencies could provide an
answer to both the problem of the overburdening of the language learner
and the undergeneralisation problem.
Thus far, what I have proposed is purely speculative. In what follows,
I shall present an example of the way in which undergeneralisations
concerning the binding theory can be solved by the supposition of a Lexical
Dependency.

3. UNDERGENERALISATIONS AND THE BINDING THEORY

3.1. Background issues

To understand some of the undergeneralisations we shall be dealing with,
it is important to understand the learnability issues which underlie much
of what we shall be discussing and also to introduce the parameters with
which we will be concerned.
The major learnability issue of relevance here concerns the learning of
sets of languages which stand in subset or superset relationships to each
other. When such a situation holds of a set of languages there is a potential
learning problem; call it the subset problem. Making the usual learnability
assumption that children do not have access to data as to what is not
a possible structure in the languages they are engaged in learning (i.e.
negative evidence), the subset problem arises if the child hypothesises a
language which is bigger than the target language. The problem is that
once an overgeneral language has been hypothesised, there will never be
any data to enable the child to change the hypothesis; all the data presented
from the target language will be compatible with the hypothesised over-
general language, given that the target is a subset of the incorrectly
hypothesised language.
Obviously, the solution to this problem must involve preventing the child
from making an overgeneral hypothesis. This can be done
through a learning procedure whereby the selection of a language is
restricted to the smallest one compatible with the data presented. In this
way, it can be guaranteed that only an undergeneral language or the correct
one will ever be hypothesised. If an undergeneral language is selected,
there will be potential data to enable the learner to reject it in favour
of the correct one and hence learnability of the set of languages is
guaranteed. This learning procedure, known as the Subset Principle, was
first introduced by Berwick (1985).
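
The procedure can be rendered schematically as follows (a sketch of my own; the "languages" are modelled simply as sets of observable forms, an assumption made only for illustration): on each presentation the learner selects the least marked value whose associated language is compatible with all the positive evidence seen so far.

# Expository sketch of the Subset Principle: parameter values are ordered
# from the one generating the smallest language to the one generating the
# largest, and the learner always selects the least marked value that is
# compatible with the positive data observed so far.

def subset_select(values, languages, data):
    """values: ordered from unmarked to most marked;
    languages: value -> set of grammatical forms;
    data: the forms actually observed (positive evidence only)."""
    for value in values:                              # least marked first
        if all(form in languages[value] for form in data):
            return value
    raise ValueError("no value is compatible with the data")

values = ["a", "b", "c"]                              # L(a) < L(b) < L(c), nested
languages = {"a": {"s1"}, "b": {"s1", "s2"}, "c": {"s1", "s2", "s3"}}
print(subset_select(values, languages, {"s1"}))         # 'a'
print(subset_select(values, languages, {"s1", "s3"}))   # 'c'

Because the hypothesised language is never larger than it needs to be, an undergeneral guess remains correctable by further positive data, which is the learnability point just made.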
Wexler and Manzini (1987) have pointed out that the Subset Principle
can be said to define a markedness hierarchy over values of parameters
which are associated with subset languages, such that the value producing
the smallest language will be defined as unmarked and the bigger the
language associated with a value, the more marked that value will be.
Concentrating on parameters of the binding theory, Wexler and Manzini
demonstrate that at least two of these have values which produce subset
languages and hence have markedness hierarchies defined by the operation
of the Subset Principle. These parameters concern the definition of the
governing category, the domain within which anaphors must be bound
and pronominals free, and the definition of proper antecedents, the class
of items which can act as possible antecedents for anaphors and disjoint
targets for pronominals. 4
The respective parameters given by Wexler and Manzini are as follows:

(1) β is the governing category for α, iff β is the minimal category
    which contains α and
a. has a subject, or;
b. has an Inflection, or;
c. has a tense, or;
d. has an indicative tense, or;
e. has a root tense.5

(2) β is a proper antecedent for α, iff β is
a. a subject, or;
b. any item.

It is important to note that for both of these parameters the markedness
hierarchy, determined by the subset relations of the associated languages,
is different depending on whether the parameter is set for an anaphor
or a pronominal. In fact, the respective markedness hierarchies are in
opposite directions. Thus, for anaphors, the unmarked value of the
Governing Category Parameter (1) is value (a) and the most marked value
is (e), and the unmarked value of the Proper Antecedent Parameter (2)
is also value (a) and the most marked (b). For pronominals the opposite
is true: value (e) of the Governing Category Parameter is unmarked and
value (a) is most marked, and value (b) of the Proper Antecedent Parameter
is unmarked and (a) is most marked (for more on this, the reader is directed
to Wexler and Manzini (1987)).
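These opposed hierarchies can be made concrete with a small illustrative encoding (the Python representation is my own; only the value labels (a)-(e) come from (1) above):

# Hypothetical encoding of the Governing Category Parameter values and the
# two markedness hierarchies that the Subset Principle defines over them.
GC_VALUES = ["a", "b", "c", "d", "e"]       # (a) subject ... (e) root tense

# For anaphors, value (a) yields the smallest language, so (a) is unmarked;
# for pronominals the subset relations reverse, so (e) is unmarked.
ANAPHOR_HIERARCHY = list(GC_VALUES)                    # unmarked -> marked
PRONOMINAL_HIERARCHY = list(reversed(GC_VALUES))       # unmarked -> marked

def markedness(value, hierarchy):
    """Rank of a value on a hierarchy: 0 = unmarked, 4 = most marked."""
    return hierarchy.index(value)

assert markedness("a", ANAPHOR_HIERARCHY) == 0         # unmarked for anaphors
assert markedness("a", PRONOMINAL_HIERARCHY) == 4      # most marked for pronominals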
Wexler and Manzini (1987) make a proposal of direct relevance to the
present argument, which needs some comment here. They suggest that
if the setting of parameters cannot be done independently, then the
learnability of the whole system of parameters is threatened. Their argument
is that, given that UG contains many parameters, in order for it to make
sense to talk about the languages associated with the values of any parameter
being in subset relationships, it must be possible to vary parameter values
without affecting the setting of any other parameter. Thus, they claim,
the independence of a parameter is a necessary precondition for the
operation of the Subset Principle.
There are two relevant points to make concerning these claims. The
first is that the particular kind of parametric dependencies we are interested
in here involve dependencies within a single parameter rather than across
parameters and hence it is not clear how Wexler and Manzini's claims
affect the present proposals. The second point is that it is not perfectly
clear that the operation of the Subset Principle should be affected by all
dependent parameters anyway. What dependencies do is to rule out certain
possible languages as unlearnable. There is no reason why this function
should upset the subset hierarchies of the remaining languages on which
markedness hierarchies could be defined. As long as there are known
markedness hierarchies for the Subset Principle to operate on, and as long
as the dependency does not alter these hierarchies, it is not necessarily
the case that the Subset Principle must fail to work;6 given a piece of
data concerning any parameter all the Subset Principle does is to restrict
the selection of a parameter value to the least marked compatible one
and as long as either an undergeneral or the correct value is chosen, there
will be no learnability difficulties.
It seems very much an empirical issue as to whether or not parametric
dependencies exist, rather than, as Wexler and Manzini maintain, a matter
of logical necessity. This paper proposes to offer empirical support for
the existence of one type of dependency and hence, contrary to Wexler
and Manzini, it would appear that parameters do not necessarily have
to be independent.

3.2 Generalisations and the Lexical Dependency

The first generalisation we will discuss concerns the Governing Category
Parameter. Each value of this parameter, from (a) to (e), defines a domain
which, potentially, can be bigger than that defined by the previous value.
For example, (a) could define an NP as a governing category while (b)
defines only clauses as such. It is obvious that a clause, as a governing
category under (b) can include an NP defined as a governing category
under (a). However, it is impossible for an NP governing category to contain
a clausal governing category, as this clause will be the minimal category
which has a subject and hence be defined as the governing category under
both (a) and (b). Thus the governing category defined by (b) will either
be the same as or bigger than that defined by (a). The same is true for
all values from (a) to (e).
The Lexical Parameterisation Hypothesis would predict that languages
can have lexical items which select different values of this parameter and
indeed there is much evidence of this; Danish and Norwegian have
pronominals which conform to value (a) but some anaphors conforming
to value (c), Icelandic has pronominals conforming to (c) and anaphors
conforming to (d) and Japanese has a pronominal conforming to (a) and
an anaphor conforming to (e):

(3) Danish7
    a. ...at [Peter_i så [Johns_j fem billeder af ham_i/*j]]
       that P saw J's five pictures of him
    b. ...at [Peter_i bad John_j om [PRO_j at ringe til ham_i/*j]]
       that P asked J for to ring to him
    c. *...at [Peter_i fortalte Anne om ham_i]
       that P told A about him
    d. ...at [Peter_i så [Johns fem billeder af sig_i]]8
       that P saw J's five pictures of self
    e. ...at [Peter_i hørte [Anne omtale sig_i]]
       that P heard A mention self
    f. *John_i sagde at [Peter kritiserer sig_i meget ofte]
       J said that P criticises self very often

(4) Norwegian9
    a. De_i leste [mine klager mot dem_i]
       they read my complaints against them
    b. *De_i leste [klager mot dem_i]
       they read complaints against them
    c. Knut_i ba Ola_j [PRO_j korrigere seg_i]
       K asked O to-correct self
    d. *Ola_i vet [vi beundrer seg_i]
       O knows we admire self

(5) Icelandic10
    a. Jón_i segir að [María elski hann_i]
       J said that M love him
                       [subj]
    b. Jón_i segir að [María_j elskar hann_i]
       J said that M loves him
    c. *Jón_i skipaði mér_j að [PRO_j raka hann_i]
       J ordered me that to-shave him
    d. Jón_i segir að [María_j elski sig_i/j]
       J said that M love self
                       [subj]
    e. Jón_i segir að [María_j elskar sig_*i/j]
       J said that M loves self

(6) Japanese
    a. John_i-wa kare_i-ni tsite-no Bill_j-no hon-o yonda
       J TOP he DAT about B GEN book OB read
       'John read Bill's book about him'
    b. John_i-wa kare_*i-ni tsite-no hon-o yonda
       J TOP he DAT about book OB read
       'John read a book about him'
    c. John_i-wa zibun_i/j-ni tsite-no Bill_j-no hon-o yonda
       J TOP self DAT about B GEN book OB read
       'John read Bill's book about self'
    d. John_i-wa Bill_j-ga zibun_i/j-o semeta to itta
       J TOP B SUB self OB blamed that said
       'John said that Bill blamed self'

It is therefore the case that languages may have lexical items which take
governing categories which can include or be included within those of
other lexical items.
It is interesting to consider what this means for the distribution of
pronominals and anaphors in a language. There are three possible con-
ditions. If anaphors and pronominals select the same value of the Governing
Category Parameter, then they will be in complementary distribution, given
that anaphors must be bound and pronominals free within the same domain.
If the anaphors have a governing category which includes that of the
pronominals, then they will have overlapping distributions as the domain
within which the anaphors must be bound will extend beyond that within
which pronominals must be free, hence there will be a domain in which
either can be bound. Finally, if the pronominals have a governing category
which includes that of the anaphors, there will be a domain in which
neither can be bound; something which might be called an "inaccessible
domain" for binding. Schematically, this might be represented thus:

(7) i.   ...δ_i...[AP ...β_j...α...]               α_*i/j = anaphor
                                                   α_i/*j = pronominal
    ii.  ...δ_i...[A ...ε_j...[P ...β_k...α...]]   α_*i/j/k = anaphor
                                                   α_i/j/*k = pronominal
    iii. ...δ_i...[P ...ε_j...[A ...β_k...α...]]   α_*i/*j/k = anaphor
                                                   α_i/*j/*k = pronominal

While there is evidence of languages which have anaphors and pronominals
with complementary and overlapping distributions (see the examples in
(3)-(6) for languages with overlapping anaphor and pronominal distribu-
tions), it seems that there is no language which contains inaccessible
domains. The generalisation could therefore be made that no language's
pronominals ever select governing categories which include those of its
anaphors or, equivalently, all languages must be such that their anaphors
and pronominals have complementary or overlapping distributions. 11
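Assuming, for illustration only, that the ordering (a)-(e) tracks the (weak) nesting of governing categories for a single anaphor/pronominal pair, the three conditions can be summarised in a short sketch (all names below are hypothetical):

# Sketch: predicted distribution of an anaphor/pronominal pair, given their
# Governing Category Parameter values; later values define larger domains.
ORDER = "abcde"

def distribution(anaphor_value, pronominal_value):
    a, p = ORDER.index(anaphor_value), ORDER.index(pronominal_value)
    if a == p:
        return "complementary"              # same domain: bound vs free in it
    if a > p:
        return "overlapping"                # anaphor domain includes pronominal's
    return "inaccessible domain"            # pronominal domain includes anaphor's

print(distribution("c", "a"))               # a Danish-type pairing: overlapping
print(distribution("a", "e"))               # the unattested case: inaccessible domain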
From the perspective of the current paper, we shall want to account
for this generalisation in terms of a restriction placed on the values of
the Governing Category Parameter selected by a language's anaphors and
pronominals. In particular, we will want to make this generalisation follow
from a Lexical Dependency which states that if such and such a value
is selected for anaphors then such and such a value must be selected for
pronominals. However, there are a few considerations which must be borne
in mind before we propose such a restriction.
First, assuming the Lexical Parameterisation Hypothesis, there is no
well defined notion of a uniform anaphor or pronominal governing category
for any possible language; it could be the case that a language's anaphors,
for example, select different values of the Governing Category Parameter
and hence have different governing categories to each other. Indeed such
a thing is not uncommon, e.g. Danish, Norwegian, Italian, Japanese and
Greek all have anaphors with differing governing categories. 12 This means
that any restriction placed on the selection of parameter values will
have to be made for individual lexical items and not for the class of anaphors
or pronominals as a whole.
Second, consider what it means for there to be inaccessible domains
in a language. For a domain to be truly inaccessible, it must be the case
that for any possible antecedent situated in that domain, there can be
no anaphor or pronominal which has referential access to it. In a language
which differentiates its pronouns, it will be the case that the majority of
these will be unable to refer to a given antecedent, simply because there
will be an incompatibility between the φ-features of the antecedent and
the pronouns. Usually, there will only be two pronouns which could possibly
take any given element as their antecedent; one anaphor and one pron-
ominal. This indicates that whatever else may be, the most important
restriction we must make, if we are to prevent inaccessible domains, is
one which holds between anaphor and pronominal pairs which can take
the same antecedents; i.e. those for which there is no incompatibility of
their φ-features. The restriction will obviously have to maintain that for
all such pairs, at least one of them must have referential access to any
given domain.
Suppose we define a restriction on the values of the Governing Category
Parameter that a pronominal can select which depends upon the value
selected by the anaphor counterpart of that pronominal. In other words,
we claim there is a Lexical Dependency holding between anaphor/
pronominal pairs which share the same set of possible antecedents.
To see how such a device can prevent inaccessible domains, consider
what the restriction would have to be. Given that values (a) to (e) define
successively larger governing categories, and that the restriction we want
prevents pronominals from selecting a larger one than the relevant anaphor,
then whichever value the anaphor selects must define the largest governing
category that the pronominal can possibly select. The pronominal can,
of course, select a value producing a smaller governing category, i.e. one
nearer to value (a) than that selected by the anaphor, but not one producing
a larger governing category.
To achieve this is quite simple. We have a Lexical Dependency forcing
the pronominal to initially select exactly the same value as the relevant
anaphor. We then allow further learning to take place for the pronominal.
That such learning will never produce a pronominal governing category
which is bigger than the anaphor's is guaranteed by the pronominal's
markedness hierarchy which, as stated earlier, goes from value (e) to value
(a). Hence, whichever value is provided by the Lexical Dependency for
the pronominal will always serve as the least marked value that the
pronominal can select; the Subset Principle will prevent a less marked
value from being selected.
This Lexical Dependency can be summed up in the following diagram:
(8)   anaphor        pronominal

        a      →        a
        ↓                ↑
        b      →        b
        ↓                ↑
        c      →        c
        ↓                ↑
        d      →        d
        ↓                ↑
        e      →        e

In (8), the vertical arrows represent the markedness hierarchies for anaphors
and pronominals and the horizontal arrows represent the Lexical Depen-
dency operating from anaphor to pronominal.
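A minimal procedural rendering of the proposal might look as follows (the function names and the learning loop are my own illustrative assumptions; the substantive claim is only that the pronominal starts from the anaphor's learned value and can then move only toward (a)):

# Illustrative sketch of the Lexical Dependency; "a".."e" as in (1),
# with later values defining larger governing categories.
ORDER = "abcde"

def learn_anaphor(data=()):
    """Subset-Principle learning for an anaphor: start at unmarked (a) and
    enlarge the governing category only when a datum requires it."""
    value = "a"
    for required in data:                    # each datum names the smallest
        if ORDER.index(required) > ORDER.index(value):   # value licensing it
            value = required
    return value

def set_pronominal(anaphor_value, data=()):
    """Lexical Dependency: the paired pronominal starts from the anaphor's
    value; further learning can only shrink its governing category."""
    value = anaphor_value
    for required in data:
        if ORDER.index(required) < ORDER.index(value):
            value = required
    return value

anaphor = learn_anaphor(data=["c"])          # e.g. an anaphor settling on (c)
pronominal = set_pronominal(anaphor, data=["a"])
print(anaphor, pronominal)                   # -> c a: overlapping, never inaccessible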

4. SUPPORT FOR THE LEXICAL DEPENDENCY

What we have so far is a rather neat way of accounting for one generalisation
concerning the setting of the Governing Category Parameter. We do not,
as yet, have any other evidence to support this Lexical Dependency, nor
do we have any support for the idea that Lexical Dependencies are in
general operation for setting parameters, thereby offering a potential
solution to other undergeneralisations. However, if support can be found
for this particular Lexical Dependency, then it would be reasonable to
assume that Lexical Dependencies are generally available as parameter
setting devices, as it would be odd in the extreme for such a device to
be only available for the setting of the Governing Category Parameter.
In this section, support for this particular Lexical Dependency will be
presented.
The first piece of evidence concerns a generalisation about the values
of the Governing Category Parameter which both anaphors and pron-
ominals tend to select. If it is the case that markedness hierarchies of
parameters have any influence over which values are selected within
languages, 13 we might expect that there would be a tendency toward the
selection of unmarked values. Thus, as far as the Governing Category
Parameter is concerned, we might expect that anaphors, in general, would
favour the selection of value (a) and pronominals that of value (e). These
expectations are borne out for the case of anaphors, with the majority
of these selecting their unmarked value. However, the same is not true
for pronominals and in fact most pronominals tend to select value (a),
their most marked value. There are very few cases of pronominals which
select their unmarked value; Manzini and Wexler (1987) present only one.
Obviously, the question begged here is - why should this be so? Note
that we cannot take this to mean that anaphors and pronominals should
properly be considered as having the same markedness hierarchy. After
all, the markedness hierarchies that have been proposed for anaphors and
pronominals are based on learnability arguments under some fairly re-
asonable assumptions. If these hierarchies are not as proposed, the
Governing Category Parameter should be impossible to set for any lexical
item. Therefore we cannot simply reject the suggested pronominal mar-
kedness hierarchy, even though the data concerning the values they tend
to select seem to suggest otherwise.
There is a very simple solution to this puzzle, however, which follows
directly from the Lexical Dependency suggested above. If it is the case
that pronominals are dependent for their parameter values on anaphors,
then the markedness hierarchy for anaphors will impose a restriction on
the values that pronominals can select. Simply put, if most anaphors tend
to select value (a) of the Governing Category Parameter, the Lexical
Dependency will force most pronominals to select this value. Given that
value (a) is most marked for pronominals, no further learning can take
place for these and thus most pronominals will end up with this value.
Furthermore, for a pronominal to be able to select its least marked value,
i.e. (e), will be dependent on the relevant anaphor also selecting this value.
This value is most marked for anaphors and therefore, presumably, least
likely to be selected by them. Further still, even if an anaphor were to
select its most marked value, thereby enabling the counterpart pronominal
to select the same, there is still the possibility of further learning for the
pronominal, with any value from (e) to (a) available for selection. We
can see, then, that given the Lexical Dependency, it is entirely expected
that pronominals should tend to select their most marked value and not
their least marked one.
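The two boundary cases can be checked with a self-contained restatement of the sketch given after diagram (8) (names remain hypothetical):

# Boundary cases for the dependency, restated so the snippet runs on its own.
ORDER = "abcde"

def set_pronominal(anaphor_value, data=()):
    value = anaphor_value
    for required in data:
        if ORDER.index(required) < ORDER.index(value):
            value = required
    return value

# An anaphor at (a) hands (a) to the pronominal, which is then stuck there:
print(set_pronominal("a", data=["c", "e"]))      # -> a (no further learning possible)

# Only an anaphor at (e) leaves the pronominal's whole range open:
print(set_pronominal("e", data=["b"]))           # -> b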
There are a number of other places where the anaphor markedness
hierarchy seems to dominate pronominal behaviour quite contrary to
expectation. Each of these can be taken as empirical support for the Lexical
Dependency which would lead us to expect this situation. For example,
take the case of empty categories. Under standard assumptions, these are
seen as having pronominal and anaphoric features and thus come under
the restrictions of the binding theory. When we look into the question
of which values of the binding parameters empty categories conform to,
we find that most seem not to be parameterised at all; i.e. their behaviour
is the same in any language which has them. For example, the trace of
a moved NP, standardly considered as a pure anaphor, seems to conform
to value (a) of the Governing Category Parameter in all languages. There
is a question raised here; is it purely coincidental that NP traces should
conform to their unmarked value of the Governing Category Parameter
or is this a reflection of a wider restriction? Manzini and Wexler (1987)
claim that this is indeed a token of a wider phenomenon in which all
non-parameterised items conform to the unmarked values of relevant
parameters. 14 There are two pieces of evidence bearing on this. First, there
is the fact that NP traces are always subject bound and hence also conform
to the unmarked anaphor value of the Proper Antecedent Parameter.
Second, there are other non-parameterised anaphors, such as "phrasal"
anaphors, which also conform to the unmarked value of the Governing
Category Parameter (see Yang (1983)).15
Given the above discussion, it might be expected that a non-parameterised
pronominal should conform to its unmarked values of the binding pa-
rameters. Assuming that it, like other empty categories, is non-parame-
terised with respect to the binding theory parameters, a likely candidate
for a non-parameterised pronominal would be the null subject in pro-
drop languages. However, although matters are far from straightforward,
one thing that can be said about the item pro is that it does not conform
to value (e) of the Governing Category Parameter; it is perfectly possible
for this item to be bound in the matrix clause, as is shown in the following
Spanish example:

(9) Juan_i dice que pro_i telefoneó
    J says that telephoned
    'John says he telephoned'

It is rather more difficult to determine which value of the Governing
Category Parameter pro does conform to, mainly because restrictions placed
on its distribution from outside the binding theory tend to interfere; in
some languages it is restricted to subject positions, in others it can appear
in both subject and object positions and in others it cannot appear at
all.16 However, as we are assuming that pro conforms to the same Governing
Category Parameter value across all languages, then an example of a
language in which the null subject takes the smallest governing category
should give us the least marked value of the Governing Category Parameter
that empty pronominals conform to. As it happens, there are cases where
empty pronominals take NPs as their governing categories, as in the
following Hungarian example from Kiss (1987):

(10) János_i [NP az pro_i autójával] ment el
     J the car-Agr-with went away
     'John left in his car'
As it is only value (a) which defines NPs as possible governing categories,
we can conclude from the above that pronominal empty categories conform
to this value.
So, here we have another example of a pronominal conforming to the
unmarked anaphor value of a parameter where we might expect it to
conform to the unmarked pronominal value. It is clear that the Lexical
Dependency can offer some explanation for this phenomenon in that it
imposes the anaphor markedness hierarchy on top of that of pronominals
making the latter only secondary. Thus it might be claimed that in the
above situation, the anaphor markedness hierarchy is more dominant and
hence all empty categories, be they anaphor or pronominal, conform to
it.17
Yet another example of the dominance of the anaphor markedness
hierarchy can be seen in phenomena which seem to associate values of
the Governing Category Parameter with those of the Proper Antecedent
Parameter. This is usually noted in terms of anaphors; long distance
anaphors, which we can take to be those associated with any value of
the Governing Category Parameter except value (a), are always subject-
oriented and hence must be associated with value (a) of the Proper
Antecedent Parameter. However, it must also be noted that the same
phenomenon occurs with pronominals; pronominals which must be free
within a fairly wide governing category, i.e. one defined by any value
other than (a), can be bound by an object within this category and hence
are associated with value (a) of the Proper Antecedent Parameter. Thus
for both anaphors and pronominals there seems to be a link between values
other than (a) of the Governing Category Parameter and value (a) of the
Proper Antecedent Parameter.
However, this phenomenon is not as straightforward as it is often
assumed. To start with, there are non-long distance anaphors which are
also subject-oriented (e.g. Japanese zibun zisin) and also there are long
distance anaphors which, under certain circumstances can be bound by
objects (e.g. Italian se which can be bound by a "close" object). As Manzini
and Wexler (1987) point out, it would seem that the connection between
the Governing Category Parameter and the Proper Antecedent Parameter
is not one of an implicational linking between the selection of values of
the two; i.e. if a value of the Governing Category Parameter other than
(a) is selected then value (a) of the Proper Antecedent Parameter must
be selected. Rather, it would appear that this connection involves the
behaviour of individual anaphors within the confines of the values that
they actually select, such that they can only conform to at most one non-
(a) value in any one instance. The same also seems to be true of pronominals,
i.e. these too can only conform to at most one non-(a) value of these
parameters irrespective of what values they actually select.
However, it is obvious that we cannot capture such behaviour through
reference to unmarked values. For example, this behaviour is not captured
by a principle which states that any lexical item can conform to at most
only one marked value.18 Such a statement captures the generalisation
only as it applies to anaphors. The case for pronominals seems to be
described by the statement that they must conform to at least one most
marked value. 19 This restriction can only follow from a statement concerning
unmarked values, if value (a) of both binding parameters is taken to be
unmarked for anaphors and pronominals alike. It would again seem that
here pronominals behave as though their markedness hierarchies are the
same as those of anaphors, a situation which is explained if we assume
a dependency operating between anaphors and pronominals.
Up to this point, we have been concentrating on evidence for the Lexical
Dependency which concerns generalisations about which values of the
binding theory parameters anaphors and pronominals select. The final
piece of evidence we shall consider here concerns data from child language
acquisition studies. Since the introduction of the binding theory parameters,
there has been much work done to investigate the way in which children
seem to acquire knowledge about the permitted behaviours of anaphors
and pronominals. Perhaps one of the most overwhelming conclusions drawn
by most of these studies (see Grimshaw and Rosen (1988) for a critical
discussion) is that children seem to master control over anaphors far in
advance of pronominals. In Wexler and Chien (1985), replicated in Chien
and Wexler (1987), it is reported that by the age of 6;6 children have
perfect control over their use of anaphors, i.e. they will assign an appropriate
antecedent to an anaphor in every instance. However, it is not until after
this age that the child's performance with pronominals even starts to
improve; up to this point they average around just over chance levels
in their assignment of appropriate antecedents to pronominals. After the
age of 6;6, however, there is improvement in their performance with these
items (see Deutsch, Koster and Koster (1986)). Is it just coincidence that
children's performance with pronominals improves only after it can be
said that their learning concerning anaphors is complete? Obviously, we
need a better explanation than coincidence for this observation.
If it is the case that pronominals are dependent on anaphors for the
setting of their parameter values, then this situation is entirely predictable.
We would expect there to be a period of time in which the learning of
anaphors was undertaken and before which the operation of the Lexical
Dependency could not take place. Only after the learning of anaphors
was complete would we expect the Lexical Dependency to operate and
thereby initiate a period of learning for pronominals. Thus we can claim
that the behaviour of children in learning the binding principles is entirely
in accordance with what would be expected if pronominals were dependent
on anaphors.
To summarise this section, we have seen that there are a number of
instances where, contrary to expectation, pronominals seem to behave as
though their markedness hierarchies are exactly the same as those of
anaphors. However, we cannot take this to be an argument that pronominals
and anaphors do indeed have the same markedness hierarchies as, according
to the Subset Principle, pronominals would be unlearnable if they did.
Rather, we are forced to accept the markedness hierarchies as predicted
by the Subset Principle and find some other explanation for the behaviour
of pronominals. We have argued that the Lexical Dependency provides
such an explanation in that it predicts that the anaphor markedness
hierarchy will be dominant over that of pronominals and hence in situations
where there is pressure to conform to unmarked values, it might be expected
that pronominals conform to the unmarked value of anaphors. This, in
turn, can be taken as empirical support for the proposed Lexical Depen-
dency, as without assuming such a thing we would be at a loss to explain
pronominal behaviour. Moreover, the behaviour of children learning
anaphors and pronominals also lends support to the Lexical Dependency;
only if there is some link in the learning of anaphors and pronominals
can we explain why it is only after the time when anaphors seem to be
completely learned that learning of pronominals proceeds.

5. A FURTHER PREDICTED GENERALISATION

On top of the empirical support for the Lexical Dependency discussed
above, the mechanism itself is open to empirical testing through its
predictions. One fairly clear prediction that the Lexical Dependency makes
concerns the possibility of pronominals existing without anaphor coun-
terparts. If it is the case that pronominals are learned through their anaphor
counterparts, it might be claimed that if there is no anaphor there can
be no pronominal. Yet, the existence of an anaphor will not necessarily
guarantee that of a pronominal. Therefore, we might expect that the
inventory of a language's pronouns would reflect this with there being
necessarily an identifiable anaphor partner to each pronominal but not
necessarily the other way round.
However, this is perhaps a little too simplistic and while this situation
is the most straightforward one compatible with the predictions of the
Lexical Dependency, it is by no means the only one. For example, it could
be the case that a pronominal without an anaphor partner exists. The
Lexical Dependency would predict that such a pronominal could not be
learned; which is to say that it could not have its parameters set on the
basis of direct evidence. However, it would be perfectly possible for such
a pronominal to be given a "default" parameter value, which, given the
discussion about the dominance of the anaphor markedness hierarchy,
we might expect to be the unmarked anaphor value of any binding
parameter.
A second compatible situation might be where a single anaphor de-
termines the parameter values of a number of pronominals. If this was
the case, the Lexical Dependency would still predict that each pronominal
would have a governing category included in or exactly the same as that
of the anaphor. Moreover, given that the Lexical Dependency is supposed
to hold between anaphor/pronominal pairs which take the same set of
possible antecedents, the set of pronominal dependents on an anaphor
would have to be such that their defining features were not incompatible
with the anaphor; i.e. those features which distinguish between the
dependent pronominals would be undefined for the anaphor and hence
the feature set of the anaphor would still be compatible with that of each
of the dependent pronominals.
In the main, it would appear that this predicted generalisation holds
fairly straightforwardly and that there are very few cases of pronominals
which go without anaphor partners, although there are several cases where
we get partnerless anaphors. A clear example of this comes with reciprocals
which always appear to have anaphoric status. While it is true that this
could possibly reflect some basic semantic property of reciprocity, it is
difficult to identify what the semantic property would be which could
rule out the following constructions:

(11) the men_i think Bill likes recip_i
                                 [pron]

(12) recip met
     [pron]

(13) I introduced recip
                  [pron]

It seems that there is a reasonable interpretation for each of these sentences
and thus they cannot be ruled out on semantic grounds. If it is valid
to conclude from these observations that pronominal reciprocals are at
least logically possible, then obviously we need some explanation as to
their non-occurrence. While I do not intend to propose such an explanation,
the relevant point to note is that this situation is entirely in accordance
with the predictions of the Lexical Dependency.
Another, more straightforward example of an anaphor without a
pronominal counterpart can be found in Chinese. Chinese has a basic
reflexive form X ziji (where X represents any personal pronoun) which
is comparable in most ways with English reflexives. However, there is
another reflexive form, X ben ren. The difference between the ziji and
ben ren forms is extremely subtle but can perhaps be described as a difference
in a relationship of "personalness" between the reflexive and its antecedent.
However, whatever the difference between these two anaphoric forms, the
important point is that there is no corresponding difference with pron-
ominals; there is only one pronominal form.
As a counter-example to the predicted generalisation that there should
be no pronominals without anaphoric partners, it may be pointed out
that there are languages which seem to fail to have anaphors at all, e.g.
Old English and West Flemish (Burzio, 1988) and also some Polynesian
languages (Hale, personal communication). In these languages, however,
it would appear that there is one form which covers all of the cases of
anaphors and pronominals in that such items can be bound anywhere
in a sentence or be totally free. There are a number of responses one
could make to these data from the point of view of the Lexical Dependency.
One might be to claim that what we have here is a case of an unlearned
pronominal taking, as was predicted, the unmarked anaphor value of the
Governing Category Parameter. This argument would also involve the
postulation of a governing category which is smaller than that defined
by value (a) of this parameter, in order to allow these pronominals to
refer to close elements. Such a claim has been made elsewhere (in Koster
(1987) for example). It should be obvious that if there was such a governing
category, it would represent the unmarked case for anaphors as it is included
in all other possible governing categories.
A second possible response would be to claim that in these languages
there are no anaphors or pronominals. It is fairly clear that the pronouns
in these languages do not behave anything like those of languages where
the distinction between anaphors and pronominals is maintained. It could
therefore be that these pronouns are simply undefined for the features
[±anaphor], [±pronominal] and hence are not subject to any of the binding
constraints.
If these arguments can be maintained, it would appear that there are
no languages which provide counter-evidence to the prediction that all
pronominals must either have an anaphor counterpart or conform to the
unmarked anaphor values of the binding parameters. It would seem that
this generalisation is only explicable under the assumptions made here
concerning the Lexical Dependency and thus can be taken as support for
this learning device.
In conclusion, it would seem that there is much empirical evidence from
a variety of sources supporting the assumption that pronominals are
dependent on anaphors for the setting of their binding theory parameters.
It would be a strange situation indeed if a learning mechanism such as
the Lexical Dependency proposed here were to be restricted in application
to a small number of parameters in one module of the grammar. Thus,
on these grounds we can take support for this particular Lexical Dependency
as support for the notion that such learning devices are in general application
throughout the whole of UG. If this is the case, then it is clear that we
have a solution to the two problems raised by the Lexical Parameterisation
Hypothesis; those of the overburdening of the learner and the underge-
neralisation problem. Therefore, we can maintain a theory in which
parameters are said to be lexical and retain the empirical and theoretical
advantages of this.

FOOTNOTES

* I wish to thank Martin Atkinson, Annabel Cormack and Iggy Roca for invaluable comments
and suggestions concerning previous drafts of this paper.

1. One proposal concerning word order parameters, put forward by Huang (1982), is that
complements are ordered with respect to their heads by a head final/head initial parameter.
Another suggestion is that word order falls out from parameters determining the direction
of Case and theta role assignment (Koopman (1984), Travis (1984) and Fukui (1986)).
2. In actual fact, there are languages which have both prepositions and postpositions; for
example, German, which is basically prepositional, has at least two postpositions; entlang
'along' and gegenüber 'opposite' (thanks to Mike Jones for pointing these out to me) and
similarly Persian, another overwhelmingly prepositional language, has one postposition ra,
which is used for direct objects (Comrie (1981)). This, perhaps, indicates that word order
parameters for these items are set individually. However, a closer investigation of the properties
of these words would need to be undertaken before a strong claim about this can be made.
It is also interesting to find that such a phenomenon seems only to affect adpositions and
never verbs. This observation obviously warrants investigation.
3. This is similar to an idea proposed by Huang (1982) who claimed that an "unmarked"
setting of the head final/head initial parameter is one where all lexical categories conformed
to the same setting. More marked situations were possible where certain lexical categories
would have the value for this parameter changed on the basis of positive evidence.
4. The term "proper antecedent" is a little misleading in connection with pronominals given
that what is meant is that such elements cannot be co-referential with a pronominal within
its governing category and hence are not antecedents at all. However, the term is convenient
and far less awkward than Yang's (1983) more accurate term "disjoint reference target".
5. This is obviously a much simplified definition of the governing category than the one
that is required to capture the true distribution of anaphors and pronominals and also such
elements as PRO. For a more accurate but complex version of the parameter, the reader
is directed to Manzini and Wexler (1987) where such issues are addressed. However, for
the purpose of the present paper, the simplified parameter will suffice.
6. It is a separate question as to where the knowledge of the markedness hierarchies comes
from. If Manzini and Wexler are correct, these are computed, based on the subset relations
of the languages associated with the relevant values. However, this is not the only possibility.
It could be that knowledge of markedness hierarchies is part of the innate endowment and
thus need not be computed at all. See Newson (1989) for an argument which favours the
second of these proposals.
7. From Vikner (1985), apart from (f) which is from Jakubowicz and Olsen (1988).
8. It is to be noted that sig in this example cannot be bound by the close subject John
and thus that there is some other restriction operating on this anaphor. This does not mean,
however, that we cannot consider such items to have governing categories in which they
must be bound defined in exactly the same way as for other anaphors. The same applies
for the following Norwegian examples concerning the reflexive seg.
9. From Yang (1983).
10. From Manzini and Wexler (1987).
11. Manzini and Wexler (1987) call this generalisation the Spanning Hypothesis but do not
propose to account for it.
12. In general, this is the case with any language which has anaphors selecting any value
other than value (a) of the Governing Category Parameter. This is because it appears that
reciprocals across all languages select the unmarked value of this parameter and hence all
languages, if they have reciprocals, will have at least one anaphor which selects this value.
On this, see Yang (1983).
13. Why this should be so, however, is not at all clear.
14. Manzini and Wexler propose to derive this from the Lexical Parameterisation Hypothesis,
claiming that this predicts that only lexical items will be parameterised and empty categories
are non-lexical. However, while this may be true for NP traces, it is not necessarily so
for other empty categories such as pro and PRO which are present at D-structure and hence,
perhaps, inserted from the lexicon. There are, though, other reasons why such items may
appear to be non-parameterised; for example if PRO is viewed as a pronominal anaphor,
we would not expect it to be associated with any value of the Governing Category Parameter
given that it cannot have a governing category. The case of pro, as we shall see, is a little
more complex. If it can be claimed that this item is non-parameterised, as I assume, then
the Lexical Dependency itself could be the explanation for this; if pronominals are dependent
on anaphoric counterparts for their learning, then a pronominal without an anaphor
counterpart should be unlearnable (however, see later in the text for more discussion on
this). As pro is a pronominal without a lexical anaphor counterpart, it could be assumed
that this item must be universal and therefore non-parameterised.
15. Pica (1987) makes much of the fact that long distance anaphors tend to be those which
are morphologically simple in his theory of anaphor movement. However, it must be pointed
out that morphological complexity and phrasal status are not necessarily one and the same
thing, as seems to have been the assumption.
16. Indeed, if Rizzi (1986) is correct, the distribution of pro is dependent on a parameter
which determines a set of licensors which can be any of the major categories.
17. A stronger claim is made in Newson (1989), where it is supposed that only the markedness
hierarchy for anaphors is available to Universal Grammar. Given that it is a reflex of Universal
Grammar that non-parameterised elements conform to unmarked values, the only unmarked
value available is that of the anaphor. The reader is directed to this paper for a more detailed
discussion.
18. As Manzini and Wexler (1987), in fact, claim.
19. It may be true that the statement that pronominals must either be free in the root
clause (i.e. conforming to the unmarked pronominal value of the Governing Category
Parameter) or free from any item within their governing category (conforming to their
unmarked Proper Antecedent Parameter value) is not incompatible with pronominal be-
haviour, as Manzini and Wexler point out. However, it is also true that such a statement
does not capture the restriction placed on pronominals that they conform to at most only
one value (a).

REFERENCES

Berwick, R. C. 1985. The Acquisition of Syntactic Knowledge. Cambridge, Massachusetts:
MIT Press.
Burzio, L. 1988. On the Non-Existence of Disjoint Reference Principles. Ms. of a paper
presented at the LSA meeting.
Chien, Y-C and K. Wexler. 1987. Children's Acquisition of the Locality Condition for
Reflexives and Pronouns. Ms. University of California, Irvine.
Comrie, B. 1981. Language Universals and Linguistic Typology: syntax and morphology. Oxford:
Blackwell.
Deutsch, W., C. Koster and J. Koster. 1986. What Can We Learn From Children's Errors
in Understanding Anaphora? Linguistics 24. 203-225.
Fukui, N. 1986. A Theory of Category Projection and its Applications. Doctoral dissertation,
MIT.
Grimshaw, J. and S. T. Rosen. 1988. The Developmental Status of the Binding Theory
(or Knowledge and Obedience). Ms. Brandeis University, Waltham, Massachusetts.
Harbert, W. 1986. Markedness and Bindability of Subject of NP. In F. R. Eckman, E.
A. Moravcsik and J. R. Wirth (eds.) Markedness: Linguistic Symposium of the University
of Wisconsin - Milwaukee. 139-154.
Huang, J. C-T. 1982. Logical Relations in Chinese and the Theory of Grammar. Doctoral
dissertation, MIT, Cambridge, Massachusetts.
Jakubowicz, C. and L. Olsen. 1988. Reflexive Anaphors and Pronouns in Danish. Syntax
and Acquisition. Ms. of a paper presented at the Boston University Conference.
Katada, F. 1988. LF-Binding of Anaphors. Proceedings of WCCFL 7. 171-185.
Kiss, K. E. 1987. Configurationality in Hungarian. Budapest: Akademiai Kiado.
Koopman, H. 1984. The Syntax of Verbs: From Verb Movement Rules in the Kru Languages
to Universal Grammar. Dordrecht: Foris.
Koster, J. 1987. Domains and Dynasties: the radical autonomy of syntax. Dordrecht: Foris.
Manzini, M. R. and K. Wexler. 1987. Parameters, binding theory, and learnability. Linguistic
Inquiry 18. 413-444.
Newson, M. 1989. Capturing Generalisations With Dependencies in the Lexical Setting of
Parameters. In P. Branigan, J. Gaulding, M. Kubo and K. Murasugi (eds.) Student
Conference in Linguistics: MIT Working Papers in Linguistics 11. 183-198.
Pica, P. 1987. On the Nature of the Reflexivisation Cycle. Proceedings of NELS 17. 483-
499.
Rizzi, L. 1986. Null Objects in Italian and the Theory of pro. Linguistic Inquiry 17. 501-
557.
Safir, K. 1987. Comments on Wexler and Manzini. In T. Roeper and E. Williams (eds.)
Parameter Setting. Dordrecht: Reidel. 77-89.
Travis, L. 1984. Parameters and Effects of Word Order Variation. Doctoral dissertation, MIT.
Vikner, S. 1985. Parameters of Binder and of Binding Category in Danish. Working Papers
in Scandinavian Syntax 23. Dragvoll, Norway: University of Trondheim.

Wexler, K. and Y-C. Chien. 1985. The Development of Lexical Anaphors and Pronouns.
Papers and Reports on Child Language Development 24. Stanford University: Stanford University
Press.
Wexler, K. and M. R. Manzini. 1987. Parameters and Learnability in Binding Theory. In
T. Roeper and E. Williams (eds.) Parameter Setting. Dordrecht: Reidel. 41-76.
Yang, D-W. 1983. The Extended Binding Theory of Anaphors. Language Research 19. 169-
192.
The Nature of Children's Initial
Grammars of English
Andrew Radford
University of Essex

1. INTRODUCTION

It is widely held that we first have clear evidence that a child has developed
an initial grammar of his native language during the period of early patterned
speech, when the child shows evidence of being able to combine words
together productively to form systematic structures - a period which
typically lasts from around 20 to 24 (±20%) months of age (cf. e.g. Goodluck
1989). The nature of children's initial grammars is of particular interest
because this is the point at which the child has accumulated minimal
linguistic experience, and is thus the point at which the contribution made
by Universal Grammar to the child's linguistic development might therefore
seem to be most readily observable (albeit indirectly). In addition, children's
initial grammars provide an obvious testing-ground for maturational
theories of language acquisition (such as that proposed by Borer and Wexler
1987) which hold that different principles and parameters may come 'on
line' at different stages of linguistic maturation.
In this paper, I shall suggest that early child grammars of English differ
radically from adult grammars in two interesting and inter-related respects.
Firstly, whereas adult phrases and sentences are projections of both lexical
and functional categories, child phrase and sentence structures are pro-
jections of the four primary lexical categories (Noun, Verb, Adjective, and
Preposition), and thus lack functional categories altogether. Secondly,
whereas adult phrases and sentences contain both thematic and nonthematic
constituents, their child counterparts are purely thematic structures (in
the sense defined below). We can represent what I am saying in diagram-
matic terms by positing that all phrases and clauses produced by young
children will be lexical-thematic structures of the canonical form (1) below
(where X, Y, and Z are lexical categories):

(1) [XP [YP specifier (θ-marked by X')]
        [X' [X lexical θ-marking head] [ZP lexical complement (θ-marked by X)]]]

I shall henceforth define a lexical-thematic structure as being a structure
which comprises only lexical categories (N, V, P, or A and their projections),
and in which all non-maximal projections independently theta-mark any
sisters they have, and in which all maximal projections are theta-marked
by any sisters they have. These conditions are satisfied in (1), by virtue
of the fact that all categories in (1) are lexical, and by virtue of the fact
that all constituents with sisters in (1) are either theta-assigners or theta-
assignees (thus, X and X' are theta-assigners, and YP and ZP are theta-
assignees, since X θ-marks ZP, and X' θ-marks YP). Of course, the overall
XP itself is not theta-marked; but this is because it is a root constituent
which has no sister, and therefore does not need to be θ-marked. Thus,
(1) satisfies our definition of a lexical-thematic structure. I shall henceforth
say that a given word-level category X is thematic just in case X theta-
marks any complement it has, and its immediate X-bar projection theta-
marks any specifier it has.
Given the schema in (1), a typical child utterance such as Daddy read
book would have the simplified structure (2) below:

(2) [VP [NP Daddy (AGENT)] [V' [V read] [NP book (PATIENT)]]]

The whole structure would thus be a Verb Phrase (or verbal Small Clause):
it would be a lexical structure in that it comprises only projections of
the head lexical categories N and V. It would also be a thematic structure
in the sense that the V read theta-marks its sister constituent book, the
V-bar read book theta-marks its sister constituent Daddy, the NP book
is theta-marked by its sister V read, and the NP Daddy is theta-marked
by its sister V-bar read book (VP is not theta-marked, but is not required
to be as it is a root constituent and so has no sisters).
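The definition can be made concrete with a small checking routine over a toy encoding of (2) (the data structure and the theta-marking flags are my own illustrative simplifications, not part of the analysis itself):

# Illustrative check of the lexical-thematic definition against the tree in (2).
# A non-terminal node records its label, its children and whether it
# theta-marks its sister; terminal words are plain strings.
LEXICAL = {"N", "V", "A", "P", "N'", "V'", "A'", "P'", "NP", "VP", "AP", "PP"}
MAXIMAL = {"NP", "VP", "AP", "PP"}

def node(label, children=(), theta_marks_sister=False):
    return {"label": label, "children": list(children), "theta": theta_marks_sister}

def lexical_thematic(tree):
    """True if every category is lexical, every non-maximal projection with a
    sister theta-marks it, and every maximal projection with a sister is
    theta-marked by one (the root, having no sister, is exempt)."""
    if tree["label"] not in LEXICAL:
        return False
    kids = tree["children"]
    for i, child in enumerate(kids):
        if isinstance(child, str):                 # a terminal word: nothing to check
            continue
        sisters = [k for j, k in enumerate(kids) if j != i and not isinstance(k, str)]
        if sisters:
            if child["label"] in MAXIMAL:
                if not any(s["theta"] for s in sisters):
                    return False                   # maximal projection left unmarked
            elif not child["theta"]:
                return False                       # non-maximal projection fails to theta-mark
        if not lexical_thematic(child):
            return False
    return True

# (2): [VP [NP Daddy] [V' [V read] [NP book]]], with V and V' as theta-markers.
tree2 = node("VP", [node("NP", ["Daddy"]),
                    node("V'", [node("V", ["read"], theta_marks_sister=True),
                                node("NP", ["book"])],
                         theta_marks_sister=True)])
print(lexical_thematic(tree2))                     # -> True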
The twin facts that children's phrases and sentences contain (i) lexical
but not functional, and (ii) thematic but not nonthematic constituents are
clearly closely inter-related. Abney (1987: 54 ff.) posits that the essential
difference between lexical and functional categories lies in the fact that
lexical categories have thematic content (by which he presumably means
that non-maximal lexical projections theta-mark any sister constituents
which they have), whereas functional categories do not (hence he refers
to non-functional categories as 'thematic categories'). However, the inter-
relationship between categorial status and thematic status is more complex
than this implies. For instance, some lexical categories do not theta-mark
their sisters: e.g. a single-bar constituent headed by a raising predicate
like seem or likely, or by a passive participle like thought does not theta-
mark its sister (subject) constituent; hence, in a sentence such as:

(3) The enemy seems to be thought likely to destroy the city

the italicised subject the enemy is theta-marked by the bold-printed V-
bar destroy the city, not by (any projection of) the passive participle thought
or the raising predicates likely and seem. The converse also seems to be
true, namely that some functional categories may theta-mark their sisters:
for example, Chomsky suggests in Barriers (1986a: 20) that Modal I
constituents may theta-mark their complement VPs; and others have
suggested that an I-bar headed by a so-called root Modal may theta-mark
its sister (subject) constituent. If it is indeed the case that some lexical
constituents do not theta-mark their sisters, while conversely some func-
tional constituents do theta-mark their sisters, then clearly the relationship
between categorial and thematic properties is not a straightforward one.
Of course, what our lexical-thematic analysis in (1) predicts is that all
constituents in early child English will be lexical and thematic, and thus
that children will make no productive use of functional or nonthematic
constituents at all.
One might argue that the lexical-thematic analysis proposed here in-
corporates insights from two different traditions in child language research:
thus, our claim that early child structures are purely lexical in nature
incorporates the central insight of Brown and Fraser 1963 and Brown
and Bellugi 1964 that early child utterances contain contentives but lack
functors; and the claim that all child constituents have a thematic function
might be seen to build on some of the ideas put forward in Schlesinger
1971/1982, Bowerman 1973, Brown 1973, Braine 1976, Gleitman 1981,
and MacNamara 1982 (though there are important differences, discussed
in Radford 1990, chapter 2). Within the Government and Binding paradigm,
the suggestion that early child phrases and sentences are purely lexical-
thematic structures echoes earlier ideas in Radford 1986/1987/1988a/1990,
Abney 1987 (p. 64), Guilfoyle and Noonan 1989, Lebeaux 1987/1988,
Kazman 1988, Platzack 1989, and many others.
In the remainder of this paper, I shall present substantial empirical
evidence in support of the lexical-thematic analysis of early child English,
arguing that this provides a correct characterisation of the structure of
both phrasal and clausal structures produced by young children: in section
2, I shall look at typical nominal phrases produced by young children;
in section 3 I turn to examine children's clausal structures; and subsequently
(in section 4) I present an overview of the overall organisation of early
child grammars.

2. STRUCTURE OF NOMINALS IN EARLY CHILD ENGLISH

Before turning to consider the structure of children's nominals, it is useful
(for comparative purposes) to begin with a brief outline of the structure
of adult nominals. The analysis which I shall presuppose here is the so-
called D P analysis, under which Determiners ( = D) are analysed as the
head constituents of their containing nominals (slightly differing versions
of the D P analysis are outlined in Fukui 1986, Abney 1987, and Fassi
Fehri 1988). Under this analysis, all word categories (whether lexical or
functional) are taken to be projectable into the corresponding phrase
categories, so that just as N projects into NP, so too D projects into DP.
Accordingly, a nominal such as this picture of us is analysed as a DP
(= Determiner Phrase) constituent with the simplified constituent structure
indicated in (4) below:

(4) [DP [D this] [NP [N picture] [PP [P of] [DP [D us]]]]]

Thus the overall nominal this picture of us is a DP whose head is the
Determiner this; moreover, the pronoun us which functions as the com-
plement of the Preposition of is analysed as a pronominal Determiner
(if the is a third person Determiner in an expression such as the children,
then the pronoun we is plausibly analysed as a first person Determiner
in a structure like we children). It is interesting to note that the Preposition
of which introduces the complement of the Noun appears to be nonthematic
in nature (in the sense that it has no intrinsic thematic content of its own),
since the complement us would appear to be theta-marked by the Noun
picture. A structure such as (4) shows that adult English nominals can
contain both functional elements (e.g. prenominal Determiners like this
and pronominal Determiners like we), and nonthematic elements (e.g.
complement-introducing Prepositions like of).
However, if we are correct in assuming that early child grammars
comprise only lexical-thematic constituents, then we should expect to find
that early child nominals contain neither functional nor nonthematic
elements. The absence of nonthematic elements in early child grammars
is illustrated (inter alia) by the fact that children consistently omit the
nonthematic complement-introducing Preposition of before complements
of Nouns, as we see from examples such as the following (where the names
and numbers in parentheses identify the names and ages - in months -
of the children producing the utterances):

(5) a. [Cup tea] (= 'a cup of tea', Stefan 17)
b. [Bottle juice] (= 'a bottle of juice', Lucy 20)
c. [Picture Gia] ( = 'a picture of Gia', Gia 20 from Bloom 1970)
d. Want [piece bar] (= 'I want a piece of the chocolate bar',
Daniel 20)
e. Have [drink orange] ( = 'I want to have a drink of orange',
Jem 21)
f. Want [cup tea] (Jenny 21)
g. [Cup tea] ready (John 22)
h. [Picture Kendall]. [Picture water] ( = 'a picture of Kendall/
water', Kendall 23, from Bowerman 1973)
i. [Drink water] (= 'a drink of water', Paula 23)
j. [Colour crate]. [Colour new shoes] ( = 'The colour of the crate/
of my new shoes', Anna 24)

Instead, the (bracketed) child nominal in each case comprises only an
italicised (theta-marking) head Noun and a following (theta-marked)
complement Noun - i.e. the child nominal is a purely thematic structure.
For example, the nominal piece bar in (5)(d) is an NP with the categorial
structure indicated in (6) below (here, as elsewhere, I simplify structural
representations by omitting constituents not immediately relevant to the
discussion at hand, e.g. single-bar constituents):

(6) [NP [N piece] [NP [N bar]]]

The structure conforms to the lexical-thematic schema in (1) above in
that it comprises only projections of the lexical category Noun, and the
complement NP bar is theta-marked by its sister constituent piece. By
contrast, the adult counterpart of (6) would be the functional-nonthematic
structure (7) below:

(7) [DP [D a] [NP [N piece] [PP [P of] [DP [D the] [NP [N bar]]]]]]

This adult structure is a functional-nonthematic one in that it contains
the D constituents a and the which are both functional and nonthematic.
Moreover, although the P of belongs to a lexical category, it is nonthematic
in the sense that it assigns no independent theta-role to its complement
(as we can see from the impossibility of using of here as a predicate, in
structures like *It is of the bar). Furthermore, (7) fails to conform to the
schema (1) in that the complement the bar is not theta-marked by its
immediate sister of, but rather by its aunt piece. The fact that the child
uses the lexical-thematic structure (6) in a context where the adult would
require the functional-nonthematic structure (7) underlines our central
hypothesis that there are no functional or nonthematic constituents in
early child grammars.
As we can see by comparing the bracketed child nominal piece bar in
(6) with its adult counterpart a piece of the bar in (7), it is not just nonthematic
elements like complement-introducing of which are 'missing' from child
nominals, but also functional elements like the Determiners a and the.
There is abundant evidence that the absence of Determiners is not just
a contingent property of the structure in (6), but rather a systematic fact
about early child nominals. Some evidence in support of this claim comes
from experimental work conducted in the 1960s with children performing
sentence repetition tasks: I can illustrate the findings of this research in
terms of the following imitations of adult model sentences produced by
Eve at age 25 months (from Brown and Fraser 1963):

(8) MODEL SENTENCE CHILD'S IMITATION


a. Read the book Read book
b. I am drawing a dog Drawing dog
c. I will read the book Read book
d. I can see a cow See cow

In each case, Eve systematically omits the italicised Determiner in her
response. If we follow Slobin 1979 in positing that children can only
consistently imitate correctly items whose morphosyntax they have ac-
quired, then a natural conclusion to draw from data such as (8) is that
young children at Eve's stage of development have not yet acquired a
category of Determiners. In place of adult functional DPs such as the
book, a dog, and a cow, she systematically uses lexical NPs such as book,
cow, and dog - so reinforcing our hypothesis that children use lexical NPs
in contexts where adults require functional DPs.
Moreover, spontaneous speech data show the same pattern of children
using indeterminate nominals (i.e. nominals lacking Determiners) in con-
texts where adults would require determinate nominals (i.e. nominals with
a preceding Determiner). This can be illustrated by the spontaneous speech
data in (9) below:

(9) a. Got bee. Got bean. Open door (Stefan 17)


b. Ribbon off. Paula want open box (Paula 18)
c. Hayley draw boat. Turn page. Reading book. Want duck. Want
boot (Hayley 20)
d. Dog barking. Got lorry. Paper off. Want ball (Bethan 21)
e. Open can. Open box. Eat cookie. Get diaper. Build tower. Hurt
knee. Help cow in table. Horse tumble. Man drive truck. Pig
ride. Diaper out. Napkin out (Allison 22, from Bloom 1973)
f. Wayne in garden. Want tractor. Want sweet. Want chocolate
biscuit. Want orange. Want coat. Daddy want golf ball. Lady
get sweetie now. Where car? Where bike? Where tractor? Tractor
broken (Daniel 23)

In each of these examples, the italicised child nominal is headed by a
singular count Noun, and would therefore require a preceding Determiner
such as a, the, my, etc. in adult speech; but the child's nominal is
indeterminate in every case, thus suggesting that the children in question
have not yet developed a Determiner system. The wider significance of
this would seem to be that whereas adult nominals are functional DP
constituents, early child nominals are purely lexical NP constituents.
There is a further source of evidence which we can look to in order
to seek corroboration for our claim that early child nominals are inde-
terminate (in the sense that they lack Determiners). Recent work (cf. e.g.
Fukui 1986) has suggested that the genitive 's morpheme in English functions
as a head Determiner constituent. Given this assumption, a nominal such
as the president's policy on disarmament would be analysed as having the
(simplified) superficial structure indicated in (10) below:

(10) [DP [DP the president] [D 's] [NP policy on disarmament]]

in which the head of the overall DP structure is the genitive Determiner
's, the following NP [policy on disarmament] is the complement of the
Determiner, and the preceding DP [the president] functions as the specifier
of the Determiner.
If the genitive 's morpheme is indeed a head Determiner constituent,
and if we are correct in hypothesising that early child nominals lack a
D-system, then we should expect to find that the child's earliest counterpart
of adult genitive DP structures will show no evidence of acquisition of
the genitive 's morpheme. This prediction is borne out by the fact that
children at this stage do not attach the genitive 's suffix to possessor
nominals, as the data in (11) below illustrate (the (a) examples are from
Bloom 1970, and the (b) examples from Braine 1976):

(11) a. Mommy haircurl(er). Mommy cottage cheese. Mommy milk.
Mommy hangnail. Mommy vegetable. Mommy pigtail. Mommy
sock. Mommy slipper. Kathryn sock. Kathryn shoe. Wendy
cottage cheese. Baby cottage cheese. Cat cottage cheese. Jocelyn
cheek. Baby milk. Tiger tail. Sheep ear (Kathryn 21)
b. Daddy coffee. Daddy shell. Mommy shell. Andrew shoe. Daddy
hat. Elliot juice. Mommy mouth. Andrew book. Daddy car. Daddy
chair. Daddy cookie. Daddy tea. Mommy tea. Daddy door.
Daddy book. Mommy book. Daddy bread. Elliot cookie. Elliot
diaper. Elliot boat. Daddy eat ( = 'Daddy's food'). Daddy juice.
Mommy butter. Daddy butter (Jonathan 24)

The use of 's-less possessives is widely reported as a typical characteristic
of early child English in the acquisition literature: cf. e.g. Brown and Bellugi
1964, Cazden 1968, Bloom 1970, Brown 1973, Bowerman 1973, De Villiers
and De Villiers 1973a, Braine 1976, Hill 1983, Greenfield et al. 1985, etc.
Thus, it would seem that we have abundant empirical evidence that children
at this point in their development have not acquired the genitive Determiner
's. This in turn is consistent with our more general suggestion that early
child nominals are purely lexical NPs which lack functional categories
such as Determiners. 1
It is interesting to compare the superficial syntactic structure of a typical
child possessive like lady cup coffee with that of its adult counterpart the
lady's cup of coffee; the two arguably have the respective structures indicated
in (12) below (simplified, inter alia, by omitting single-bar constituents):

(12) a. [NP [NP lady] [N cup] [NP coffee]]


b. [DP [DP the lady] [D 's] [NP [N cup] [PP [P of] [DP coffee]]]]

The child nominal (12)(a) would seem to be a lexical-thematic structure
of the canonical form (1): it is lexical in that it comprises only lexical
categories (N and its projections); it is thematic in that the NP complement
coffee is theta-marked by its sister constituent (the N cup), and the NP
specifier lady likewise is theta-marked by its sister constituent (the N-bar
cup coffee). By contrast, the adult nominal (12)(b) is clearly a functional
structure (since it comprises the functional category D and its projections)
and a nonthematic one (in that it contains nonthematic constituents like
's and of); moreover, neither the complement coffee nor the specifier the
lady is directly theta-marked by its immediate sister constituent. Thus,
the differences between adult and child possessive structures support the
hypothesis that adult nominals are functional-nonthematic structures, while
their child counterparts are lexical-thematic structures.
There is a third piece of evidence which we can adduce in support of
the claim that early child nominals are indeterminate. As I noted earlier
in relation to examples such as (4) above, there are a number of reasons
for supposing that so-called 'personal pronouns' are Determiners (so that
we and you function as prenominal D constituents in structures such as
We linguists respect you psychologists, and as pronominal D constituents
in structures like We respect you). This being so, then one should expect
to find that early child English is characterised by the absence of personal
pronouns. In this connection, it is interesting to note the observation by
Bloom et al. (1978) that young children typically have a nominal style
of speech which is characterised by their nonuse of case-marked pronouns
such as I/you/he/she/we/they, etc. (cf. also the parallel remark by Bo-
werman (1973: 109) that in the first stage of their grammatical development
'Seppo and Kendall used no personal pronouns at all'). We can illustrate
this nominal speech style from the transcript of the speech of Allison Bloom
at age 22 months provided in the appendix to Bloom (1973: 233-57), since
Bloom et al. (1978: 237) report that Allison's NPs at this stage were
'exclusively nominal'. Of particular interest to us is the fact that Allison
used nominals in contexts where adults would use pronominals. For
example, in conversation with her mother, Allison uses the nominal
expressions baby, baby Allison, or Allison to refer to herself (where an
adult would use the first person pronouns I/me/my), as we see from the
examples in (13) below:

(13) a. Baby Allison comb hair. Baby eat cookies. Baby eat cookie.
Baby eat. Baby open door. Baby drive truck. Baby ride truck.
Baby down chair.
b. Help baby
c. Allison cookie. Put away Allison bag. Baby cookie. Baby diaper.
Baby back. Wiping baby chin. There baby cup (Allison 22)

The adult counterparts of Allison's sentences would contain (in place of
the italicised nominal) the nominative pronoun I in (13)(a), the objective
pronoun me in (13)(b), and the genitive pronoun my in (13)(c); but Allison's
utterances in each example contain only an uninflected nominal. Similarly,
Allison also uses the nominal Mommy to address her mother in contexts
where adults would use the second person pronouns you/your, as the
following examples illustrate:

(14) a. Mommy open. Mommy help. Mommy pull. Mommy eat cookie
b. Peeking Mommy. Get Mommy cookie. Pour Mommy juice
c. Eat Mommy cookie. Eating Mommy cookie. Mommy lap (Al-
lison 22)

Thus, in place of the italicised nominal mommy in (14), an adult would
use the nominative/objective pronoun you in the (a) and (b) examples,
and the genitive pronoun your in the (c) examples. But Allison systematically
avoids the use of personal pronouns. Now, if personal pronouns function
as pronominal Determiners, then the fact that children like Allison have
not acquired personal pronouns at this stage would clearly lend further
empirical support to the claim that such children have not yet acquired
a category of Determiners. The fact that children are using nominals in
(what in adult terms are) pronominal contexts would suggest that they
are using simple NPs in contexts where adults require DPs: and indeed,
the fact that the genitive Determiner's is systematically omitted in examples
such as (13)(c) and (14)(c) lends further credence to the claim that the
nominal constituents developed by young children at this stage have the
status of indeterminate NPs rather than determinate DPs.
Interestingly, it is not the case that children at this stage simply have
no pronouns at all. Although (as we have seen) they do not make productive
use of pronominal Determiners like I/you/he at this stage, they do on
the other hand make productive use of the pronoun one, as nominals
such as the following illustrate:

(15) a. Yellow one (Paula 18) b. Big one (Bethan 20)


c. Fat one (Lucy 20) d. Dirty one (Neil 20)
e. Yellow one (Angharad 23) f. Red one (Lucy 24)

However, the pronoun one arguably has a very different categorial status
from personal pronouns like we. Although (as we have seen) personal
pronouns have the status of pronominal Determiners, the pronoun one
by contrast has the status of a pronominal Noun; the analysis of one as
a pro-N would account for why it takes the Noun plural inflection
in the plural form ones, and why it can be premodified by Adjectives,
as we see from examples such as (15) (cf. Radford 1989 for evidence in
support of the claim that one is a pro-N constituent). Thus, more accurately,
we should say that our lexical-thematic analysis of early child English
predicts that children may have lexical pronouns but not functional pronouns
- e.g. they may have pro-N constituents like one, but not pro-D constituents
like he.2

3. STRUCTURE OF CLAUSES IN EARLY CHILD ENGLISH

Having argued that early child nominals have a lexical-thematic structure,
I now turn to look at the structure of the earliest clauses produced by
young children. In adult English, 'Ordinary Clauses' (a term made precise
in Radford 1988b) contain both functional and nonthematic constituents,
as we can illustrate in relation to a sentence like (16) below:

(16) There is no reason to think that it would be politic for the president
to back down

The pronouns there and it seem to be purely nonthematic, nonreferential
'dummy' constituents - hence the fact that they cannot have their reference
questioned by where?/what?. Moreover, there are a number of functional
categories in (16), including the I constituents would/to, the D constituents
there/it/no/a/the, and the C constituents that/for. Thus, it seems reasonable
to conclude that adult 'Ordinary Clauses' are functional-nonthematic struc-
tures. We can illustrate what is meant by this by looking rather more
closely at the structure of the italicised complement clause in (16) above:
this arguably has the simplified structure indicated in (17) below:

(17) [CP [C for] [IP [DP [D the] [NP president]] [I to] [VP back down]]]

The structure is a functional one in the sense that the overall clause for
the president to back down is a Complementiser Phrase ( = CP), i.e. a
projection of the head functional C constituent for: moreover, the com-
plement of for is itself a functional IP constituent ( = the Infinitive Phrase
the president to back down), headed by the functional Infinitiviser to; and
the subject of the Infinitive Phrase is itself a functional DP ( = the Determiner
Phrase the president), headed by the functional D constituent the. Thus,
clauses are clearly functional structures; moreover, it should be apparent
from (17) that they are also nonthematic structures (in the sense defined
above): for example, for, to, and the are nonthematic constituents; and the
subject DP the president is in a nonthematic position, in the sense that
it is not theta-marked by (i.e. is not a logical argument of) its immediate
sister constituent ( = the I-bar to back down), but rather by its grand niece
( = the V-bar back down).
Having looked briefly at the structure of adult clauses, we can now
turn to look at the structure of the earliest clause types produced by young
children. Typical examples of early child clauses are given in (18) below
(the (a) and (b) examples being from Bloom 1970):

(18) a. Man drive truck ( = 'The man drives the truck', Allison 22)
b. Baby eating cookies ( = 'I am eating the cookies', Allison 22)
c. Wayne taken bubble ( = 'Wayne has taken the bubble container',
Daniel 21)
d. Bear in chair ( = 'The bear is in the chair', Gerald 21)
e. Hand cold ( = 'My hand is cold', Elen 20)
f. Want lady get chocolate ( = 'I want the lady to get the chocolate',
Daniel 23)

What is immediately striking about the child clauses in (18) in comparison
with their adult counterparts (indicated by the adult paraphrases given
in single quotation marks) is that the child clauses lack all the functional
and nonthematic constituents which occur in the corresponding adult
sentences. For example, the child sentences lack the D constituents the/
I/my, and the I constituents am/has/is/to which are found in the adult
form (we assume that finite Auxiliaries are superficially positioned in I
in adult English, for reasons outlined in Pollock 1989). On the contrary,
children's clauses seem to be purely lexical-thematic structures of the
schematic form (1) above. Assuming this to be so, the clauses in (18)(c),
(d), and (e) will have the respective structures indicated in (19) below:

(19) a. [VP [NP Wayne] [V' [V taken] [NP bubble]]]
b. [PP [NP Bear] [P' [P in] [NP chair]]]
c. [AP [NP Hand] [A' [A cold]]]

If we define a 'clause' as a subject-predicate structure, then (19)(a) will


be a verbal clause, (19)(b) a prepositional clause, and (19)(c) an adjectival
clause. (19)(a) will have essentially the same lexical-thematic structure as
(2) above; (19)(b) will be similar, but will differ in that the structure is
headed by a P projected into P' and PP; and (19)(c) will again be similar,
but differ in that the head is an A (projected into A' and AP) which
lacks a sister complement constituent to theta-mark.
If our suggestions here are along the right lines, then it follows that
whereas adult clauses are functional-nonthematic structures which are
projections of the head functional categories C and I, child clauses by
contrast are lexical-thematic structures which are projections of head lexical
categories (V, P, and A in the case of (18) above). In order to defend
our lexical-thematic analysis, we clearly need to argue that child clauses
have neither an I-system, nor a C-system: in the remainder of this section,
I shall provide empirical substantiation for these claims; I shall begin by
arguing that early child clauses have no I-system.
As we saw in relation to (17) above, infinitive structures in adult English
are phrasal projections of the Infinitiviser to, and thus have the status
of IP ( = Infinitive Phrase) constituents. Since I is a functional-nonthematic
constituent, we should clearly expect that children would not have acquired
infinitive structures headed by to at this stage. Confirmation of this
suggestion comes from the fact that the child's counterparts of adult
infinitival IPs headed by the I constituent to appear to be simple VPs
lacking to. For example, we find that the child's counterparts of adult
infinitive complements of Verbs like want do not contain to at this stage;
this is true not only where the [bracketed] complement clause has an overt
subject, as in (20) below:

(20) a. Want [teddy drink] (Daniel 19)


b. Want [mummy come] (Jem 21)
c. Want [dolly talk] (Daniel 21)
d. Want [lady get chocolate] (Daniel 23)
e. Want [mummy do] (Anna 24)

but also where the bracketed complement clause lacks an overt subject,
as in (21) below (taken from a longitudinal study of a boy called Daniel):

(21) a. Want [have money] (Daniel 19)


b. Want [blow]. Want [put coat]. Want [open door] (Daniel 20)
c. Want [read]. Want [put car] (Daniel 21)
d. Want [go out]. Want [sit] Want [find bike] (Daniel 22)
e. Want [sit on knee] (Daniel 23)
f. Want [see handbag]. Want [get down]. Want [go up]. Want
[open door]. Want [smack Wayne]. Want [get down] (Daniel
24)

The adult counterpart of all of the bracketed complements in (20) and
(21) above would be an IP headed by an I constituent containing infinitival
to; in place of the adult IP, the child uses (what would appear to be)
a simple VP with an overt subject in (20), and with a 'missing' subject
in (21). For example, the adult counterpart of the bracketed complement
clause in (20)(d) would have the structure (22)(a) below, whereas the child's
utterance arguably has the structure (22)(b):

(22) a. [IP [DP the lady] [I' [I to] [VP [V' [V get] [DP the chocolate]]]]]
b. [VP [NP lady] [V' [V get] [NP chocolate]]]

There are two familiar differences between the adult and child structures.
Firstly, the adult structure is a functional one (since it contains the functional
categories I and D and their projections), whereas the child structure is
a lexical one (in that it contains only the lexical categories V and N and
their projections). Secondly, the adult structure is a nonthematic one (since
e.g. the lady occupies a nonthematic position), whereas the child structure
is a purely thematic one which conforms to the canonical schema in (1)
above (by virtue of the fact that the NP chocolate is theta-marked by
its sister V get, and the NP lady is theta-marked by its sister V-bar get
chocolate). Thus, data like (20) and (21) suggest that the child's counterpart
of adult functional-nonthematic infinitival IPs are purely lexical-thematic
VPs.
In addition to serving as the position in which infinitival to is base-
generated, I is also the position in which Modals are base-generated in
adult English (cf. Radford 1988b, section 6.5). This accounts (inter alia)
for the fact that Modals and infinitival to occupy the same position within
the clause, between subject and predicate - as illustrated by the bracketed
IPs in (23) below:

(23) a. I'm anxious that [IP you [I should] [VP do it]]
b. I'm anxious for [IP you [I to] [VP do it]]

This being so, then we should expect to find that if early child clauses
do indeed contain no I-system, then they will also lack Modals. Numerous
published studies have commented on the systematic absence of Modals
as a salient characteristic of early child speech: cf. e.g. Brown 1973, Wells
1979, and Hyams 1986. Indeed, this pattern was reported in studies of
imitative speech in the 1960s. For example, Brown and Fraser 1963, Brown
and Bellugi 1964 and Ervin-Tripp 1964 observed that children systematically
omit Modals when asked to repeat model sentences containing them, as
illustrated by the following examples which they provide (ibid.):

(24) ADULT MODEL SENTENCE CHILD'S IMITATION


a. Mr Miller will try Miller try (Susan 24)
b. I will read the book Read book (Eve 25)
c. I can see a cow See cow (Eve 25)
d. The doggy will bite Doggy bite (Adam)

It seems reasonable to suppose that whereas the adult model sentences
in (24) are functional-nonthematic IPs, their child counterparts are lexical-
thematic VPs: for example, the adult model sentence in (24)(d) is an IP
of the simplified form (25)(a) below, whereas the child's imitation by
contrast is a VP of the simplified form (25)(b):

(25) a. [IP [DP The doggy] [I' [I will] [VP [V' [V bite]]]]]
b. [VP [NP Doggy] [V' [V bite]]]

The systematic differences between the adult model sentences and their
child counterparts are directly predictable from our hypothesis that early
child grammars lack functional/nonthematic constituents, so that in place
of adult modal IPs children use nonmodal VPs (moreover, they replace
the functional DPs Mr Miller/the book/a cow/the doggy by the lexical
NPs Miller/book/cow/doggy, and likewise have a 'missing' argument in
place of the adult pronominal Determiner I). It should be immediately
apparent that (25)(b) is a lexical-thematic structure of the canonical form
(1) above (save for the fact that it is headed by an intransitive verb which
has no sister complement to theta-mark). The fact that children use
nonmodal VPs in contexts where adults require Modal IPs provides us
with further evidence that the earliest clauses produced by young children
are purely lexical-thematic structures.
Although we have concentrated on modal Auxiliaries here, there is parallel
evidence that children likewise make no productive use of nonmodal
Auxiliaries at this stage. We can illustrate this in terms of examples such
as the following:

(26) a. Kathryn no like celery ( = 'Kathryn doesn't like celery', Kathryn
22, from Bloom 1970) 3
b. Wayne taken bubble ( = 'Wayne has taken the bubble container',
Daniel 21)
c. Mummy doing dinner ( = 'Mummy is doing the dinner', Daniel
22)
d. Hair wet (= 'My hair is wet', Kendall 22, from Bowerman
1973)

The adult counterpart of such structures would be an IP headed by a
finite functional I constituent which is superficially filled by a nonthematic
Auxiliary, viz. dummy do in (a), perfective have in (b), progressive be in
(c), and copula be in (d). However, the child's counterpart in each case
would seem to be a lexical-thematic structure which lacks a functional
I-system and nonthematic Auxiliary constituents, and instead is a simple
VP (= verbal clause) structure in the case of (26)(a-c), and an AP ( =
adjectival clause) in the case of (26)(d). Thus, the more general conclusion
we reach is that early child clauses contain no Auxiliaries of any kind;
and since Auxiliaries are nonthematic constituents which superficially
occupy a functional I position, this conclusion clearly lends further empirical
support to our analysis of early child clauses as purely lexical-thematic
structures.
Thus far we have looked at the child's counterpart of adult infinitive
clauses on the one hand and adult finite clauses containing Auxiliaries
on the other. However, a further clause type found in English is clauses
containing a finite Nonauxiliary Verb. For reasons outlined in Radford
1988b (section 6.5), such clauses are arguably IPs headed by an empty
(i.e. phonologically null) finite I constituent which carries tense and
agreement properties. Thus, clauses such as those in (27) below:

(27) John adores/adored Mary

have the simplified superficial structure indicated in (28) below:

(28) [IP John [I e] [VP [V adores/adored] Mary]]

The head I constituent of IP is empty (= e) of lexical material, but carries
tense/agreement properties (third person singular present/past tense) which
must be discharged onto a verbal stem. Since there is no Auxiliary Verb
in I for these features to be discharged onto, they are discharged instead
onto the head V of VP, and there surface in the form of the +s/+d inflection
on the verb adore. We might therefore say that the present/past tense
inflections +s/+d encode abstract tense/agreement properties of the head
I of IP. However if (as I claim here) early child clauses have no I-system,
then we should expect to find that children have not yet acquired the
present/past inflections +s/+d (since these are surface reflexes of properties
of an adult I constituent not yet acquired). Empirical support for this
claim comes from the fact that children at this stage typically reply to
questions containing a Verb overtly marked for Tense with a sentence
containing a tenseless Verb: cf. the following dialogue:

(29) ADULT: What did you draw?


CHILD: Hayley draw boat (Hayley 20)

Likewise, when asked a question containing a Verb overtly inflected for
Agreement, children at this stage typically reply using an uninflected,
agreementless verb: cf. e.g.

(30) a. ADULT: What does Ashley do?


CHILD: Ashley do pee...Ashley do poo (Jem 23)
b. ADULT: What does the pig say?
CHILD: Pig say oink (Claire 25, from Hill 1983)

Since Tense and Agreement are properties of I, the fact that children at
this stage have not acquired the relevant Tense/Agreement inflections
provides further evidence in support of our claim that they have not yet
acquired I.4 Nonacquisition of finite verb inflections is a characteristic
property of early child English widely reported in the relevant literature
(cf. e.g. Brown and Fraser 1963, De Villiers and De Villiers 1973a, Brown
1973, etc.).
It is perhaps useful to pinpoint the exact nature of the differences between
auxiliariless finite clauses in adult English, and their child counterparts.
Given the assumptions made here, the child's reply Pig say oink in (30)(b)
would have the structure (31)(a) below, whereas its adult counterpart The
pig says oink would have the superficial structure (31)(b):

(31) a. [VP [NP Pig] [V say] oink]
b. [IP [DP The pig] [I e] [VP [V says] oink]]

The differences between the two structures reflect the familiar pattern that
adult clauses are functional-nonthematic structures, whereas their child
counterparts are lexical-thematic structures. Thus, the overall clause has
the status of a functional IP in adult English, but of a lexical VP in early
child English; the verb says carries the Tense/Agreement properties
discharged by I in the adult structure (31)(b), but the child's verb say
in (31)(a) carries no I-inflections for the obvious reason that the child's
grammar has no I-system; the adult subject the pig is a functional DP
which superficially occupies a functional position (as the specifier of the
functional category IP) in (31)(b), whereas its child counterpart is a lexical
NP which superficially occupies a lexical position (as the specifier of the
lexical category VP) in (31)(a).
Having argued that early child clauses lack an I-system, I shall now
turn to argue that early child clauses likewise lack the functional C-system
found in adult English Ordinary Clauses (the structure of which is discussed
in Radford 1988b, chapter 9). We can illustrate the nature of the adult
English C-system in terms of the bracketed complement clause in (32)
below:

(32) I wonder [how best for us to deal with the situation]

The head of the bracketed clause appears to be the Complementiser ( =
C) for, its complement appears to be the Infinitive Phrase ( = IP) us to
deal with the situation; and its specifier seems to be the wh-phrase how
best. Given these assumptions, (32) is a CP with the simplified structure
indicated in (33) below:

(33) [CP how best [C for] [IP us to deal with the situation]]

Given the arguments in Radford 1988b (section 6.4) that preposed Au-
xiliaries are superficially positioned in the head C position of CP, it follows
that a wh-question such as:

(34) How best can we deal with the situation?

will have a superficial structure essentially similar to that of (33), save
that the head C position of CP will be filled not by a Complementiser,
but rather by an Auxiliary ( = can) moved out of I into C: thus, we assume
that (34) has the simplified superficial structure (35) below:

(35) [CP How best [C can] [IP we deal with the situation]]?

Given the assumptions we are making here, it follows that the head C
position of CP can be filled either by a base-generated Complementiser
(for/that/whether/if), or by a finite Auxiliary (can/could/will/has/is/was/
does etc.) preposed into C (from I); and that the specifier position in CP
can be filled by a preposed constituent of some kind (e.g. a preposed
wh-phrase).
Given that Complementisers are both functional and nonthematic cons-
tituents, it follows that our lexical-thematic analysis of early child phrases
and sentences would predict that early child clauses will contain no C-
system whatever. Some evidence which supports this conclusion comes
from the fact that children's complement clauses at this stage are never
introduced by Complementisers like that/for/whether/if: on the contrary,
children's complement clauses at this stage are purely lexical-thematic
structures of the canonical form (1) above, as illustrated by the bracketed
clausal complements of want in (36) below:

(36) a. Want [teddy drink] (Daniel 19)


b. Want [baby talking] (Hayley 20)
c. Want [mummy come] (Jem 21)
d. Want [lady get chocolate] (Daniel 23)
e. Want [car out] (Daniel 21)
f. Want [hat on]. Want [monkey on bed]. Want [coat on] (Daniel
23)
g. Want [telly on]. Want [top off] (Leigh 24)
h. Want [sweet out]. Want [sweetie in bag out]. Want [telly on].
Want [Roland on telly] Want [shoes on]. Want [key in] (Daniel
24)

The head of the complement clause is usually either a nonfinite Verb (as
in the (a-d) examples), or a Preposition (as in the (e-h) examples). In no
case, however, is the complement clause ever introduced by a Comple-
mentiser - a fact which is clearly consistent with our view that early child
clauses entirely lack a C-system. Moreover, imitative speech data yield
much the same conclusion: Phinney 1981 argues that young children
consistently omit Complementisers on sentence repetition tasks.
Given that a second role of the C-constituent in adult speech is to act
as the landing-site for preposed Auxiliaries (e.g. in direct questions), we
should expect that children under two years of age will not show any
productive examples of 'subject-auxiliary inversion' in direct questions.
In fact, early child interrogative clauses show no evidence whatever of
Auxiliaries preposed into C, and more generally lack Auxiliaries altogether.
Typical examples of auxiliariless interrogatives found in early child speech
are given below (examples of yes-no questions from Klima and Bellugi
1966: 200):

(37) Fraser water? Mommy eggnog? See hole? Sit chair? Ball go?

A similar pattern emerges from the transcripts of the speech of Claire
at 24-25 months in Hill 1983, where we find questions such as the following:

(38) Chair go? Kitty go? Car go? Jane go home? Mommy gone?

As Klima and Bellugi remark (1966: 201) it is characteristic of early child
English at this stage that: 'There are no Auxiliaries, and there is no form
of Subject-Verb Inversion'. Once again, the absence of preposed Auxiliaries
is consistent with our hypothesis that early child clauses lack a syntactic
C-system (given that preposed Auxiliaries are superficially positioned in
C): cf. similar conclusions reached in Radford 1986/1987/1988a/1990, and
echoed in Guilfoyle and Noonan 1989.
On the assumption that children's clauses lack a C-system at this stage,
it follows that they will also lack a Specifier of C, and hence that young
children will not show any evidence of having acquired the adult English
rule of wh-movement which (in simple cases) moves wh-phrases out of
a thematic position within VP into the nonthematic C-specifier position
to the left of C. In this connection, it is interesting to note that Fukui
(1986: 234) argues that the absence of syntactic wh-movement in Japanese
is a direct consequence of the fact that Japanese is a language which has
no C-system, since this will mean that Japanese has no C-specifier position
to act as a landing-site for preposed wh-phrases. Thus, if early child English
resembles adult Japanese in respect of lacking a C-system, then we should
expect that at this stage we will not find examples of interrogatives showing
clear evidence of a wh-phrase having been moved into the precomple-
mentiser position within CP. Significantly, studies of wh-questions in child
speech over the past decade have generally agreed that children under
two years of age do not show any evidence of having acquired a productive
syntactic rule of wh-movement (cf. e.g. Klima and Bellugi 1966, Brown
1968, Bowerman 1973, Wells 1985). This finding is obviously consistent
with our hypothesis that early child clauses have no C-system (given the
assumption that preposed wh-phrases in adult English occupy the specifier
position within CP).
In contexts where they attempt to imitate an adult question containing
a preposed Auxiliary and a preposed wh-word, children typically omit
both the Auxiliary and the wh-word, as examples such as the following
illustrate (the last two examples being from Brown and Fraser 1963):

(39) ADULT MODEL SENTENCE CHILD'S IMITATION


Where does Daddy go? Daddy go? (Daniel 23)
Where shall I go? Go? (Eve 25)
Where does it go? Go? (Adam 28)

A similar pattern (no preposed wh-word or preposed Auxiliary) is found
in (what appear to be) child counterparts of adult wh-questions in
spontaneous speech, as examples such as those in (40) below illustrate:

(40) a. Bow-wow go? (' Where did the bow-wow go?', Louise 15)
b. Mummy doing? ('What is mummy doing?', Daniel 21)
c. Car going? ('Where is the car going?', Jem 21)
d. Doing there? ('What is he doing there?', John 22)
e. Mouse doing? ('What is the mouse doing?', Paula 23)

The omission of the italicised preposed wh-phrases and preposed Auxiliaries
is obviously consistent with the assumption that children have not yet
developed a syntactic C-system, and thus lack a landing site for preposed
Auxiliaries and preposed wh-phrases (omission of preposed wh-expressions
is also reported in the early stages of the acquisition of French by Guillaume,
1927: 241). It is also interesting to note the omission of the pronominal
Determiners I/it/he and the prenominal Determiner the in (39) and (40).5
Interestingly, comprehension data appear to reinforce our conclusion that
children have not developed a C-system at this stage. The relevant data
concern the fact that children at the lexical-thematic stage of development
appear to have considerable difficulty in understanding wh-questions which
show clear evidence of the preposing of a wh-phrase from a thematic
complement position within VP into the nonthematic specifier position
within CP. This was noted by Klima and Bellugi (1966: 201-2), who observe
that children at this stage frequently fail to understand questions involving
a preposed wh-word (for example a preposed wh-object like what in ' What
are you doingT). They remark (1966: 201) that 'If one looks at the set
of what-object questions which the mother asks the child in the course
of the samples of speech, one finds that at stage I the child generally
does not respond or responds inappropriately.' Among the examples they
provide (1966: 202) in support of this claim are the following, where the
question is asked by the mother, and the italicised expression is the child's
reply:

(41) a. What did you do? - Head


b. What do you want me to do with this shoe? - Cromer shoe
c. What are you doing? - No

Klima and Bellugi conclude of children at this stage that 'They do not
understand this construction when they hear it.' Why should this be? The
answer we suggest here is that early child clauses are purely lexical-thematic
structures which lack a C-system, with the result that children are unable
to parse (i.e. assign a proper syntactic and thematic analysis to) adult
CP constituents (They have to rely on pragmatic rather than syntactic
competence in order to attempt to assign an interpretation to adult CP
structures).

4. THE OVERALL ORGANISATION OF EARLY CHILD GRAMMARS

Thus far, I have argued that the earliest phrasal and clausal structures
produced by young children are lexical-thematic in nature, and so lack
the functional and nonthematic constituents found in adult English. In
this section, I shall argue that in consequence, children's grammars are
radically different in nature from their adult counterparts.
In section 2, I argued that early child nominals are lexical NP constituents
which lack a functional D-system. If this is so, then we should expect
that children's NPs would lack the functional properties carried in the
adult D-system. One such property which is carried by the D-system in
adult grammars is morphological case. For example, in a sentence such
as:

(42) They hate us Americans

the case properties of the italicised nominal arguments they and us Americans
are carried in the D-system, in the sense that they are overtly marked
on the nominative pronominal Determiner they, and the objective pre-
nominal Determiner us. Now, if (as I am arguing here) early child grammars
have no D-system, then we should expect to find that they lack the functional
case properties carried by the D-system: in other words, we should expect
to find that early child English is a caseless language. In this connection,
it is interesting that (as we have already noted), children typically have
no productive use of case-marked pronouns like I/me/my at this stage.


Moreover, there is no case contrast in Nouns in early child English: as
we see from examples like (14) above, a Noun like mommy remains
uninflected for case irrespective of whether it is used in (what in adult
terms would be) a nominative, objective, or genitive position. Thus, there
appears to be no evidence of the existence of morphological case in early
child grammars of English. Moreover, data from published studies on the
acquisition of other languages lead us to the same conclusion that the
earliest (pro)nominals developed by all children at the early patterned speech
stage are typically caseless. For example, Rom and Dgani 1985 report
that children under 24 months of age acquiring Hebrew as their first
language have no productive case-system. Likewise, Schieffelin 1981 argues
that children acquiring Kaluli as their first language do not begin to make
productive use of case-marking particles until after two years of age. Various
studies of the acquisition of German (cf. e.g. Park 1981, Clahsen 1984,
Tracy 1986) have concluded that at the earliest stage of development there
are 'No case markers present' (Tracy 1986: 54). If (as I have suggested
here), case is a property of the D-system, then the absence of morphological
case contrasts is a direct consequence of the absence of a D-system in
early child grammars.
The absence of case in early child grammars would be expected to have
important consequences not only for the morphology of nominals, but
also for their syntax. If nominals in adult English are DPs and must carry
morphological case, then it follows that they will be restricted to occurring
in correspondingly case-marked positions. For example, a nominative DP
will only be able to occur in a nominative position (i.e. a position where
it falls within the domain of a nominative case-assigner), an objective DP
in an objective position, and a genitive DP in a genitive position.
Consequently, there are case constraints on the distribution of DPs in adult
grammars: for example, a nominative DP cannot occur in an objective
position (hence the ungrammaticality of *John loves she); and (more
generally), no overt DP (whether nominative, objective, or genitive) can
occur in a caseless position - i.e. a position in which it does not fall within
the domain of any case assigner at all (hence the ungrammaticality of
*I am keen you to go there).
Thus, the overall situation is that adult nominals are case-marked DPs,
restricted to occurring in correspondingly case-marked positions. However,
if (as I claim here) child nominals are caseless NPs, and if there is no
case module (nonvacuously) operative in early child grammars of English,
then it follows that there can in principle be no case constraints on the
distribution of child nominals. We should thus expect to find children
using nominals (free of case constraints) in any argument position - even
in an argument position which (in adult terms) would be a caseless position.
In this connection, it is interesting to note that children at this stage freely
use nominals as the subject of nonfinite independent sentences:

(43) a. Mouse in window (Hayley 20)


b. Wayne in bedroom (Daniel 21)
c. Girl hungry (Kathryn 22, from Bloom 1970)
d. Birdie flying (Bethan 21)
e. Car gone (Angharad 22)
f. Man drive truck (Allison 22, from Bloom 1973)

The italicised argument position (viz. subject position in a nonfinite clause)
would be a caseless position in adult English, since only a finite verb licenses
a case-marked (nominative) subject: thus, the adult counterparts of (43)
would be ungrammatical, since they would involve a case-marked DP
occurring in a caseless position. The very fact that child sentences like
(43) are very frequent (and hence presumably grammatical) in early child
English provides us with strong evidence that there are no case constraints
on the distribution of nominals in child grammars.
Additional evidence in support of this conclusion comes from the fact
that young children at this stage use nominals as direct complements of
intransitive Verbs: we can illustrate this phenomenon in terms of the way
that children use the verb go:

(44) a. Wayne go river. Daddy gone van. Want go mummy (Daniel
22)
b. Go work (Jenny 23)
c. Go school. Gone school (Daniel 23)
d. Go daddy [wanting to go to her daddy] (Anne 24)
e. Going village. Roland going children (Daniel 24)

Such examples are a mere handful of the dozens of examples of structures
of this type in my corpus (involving not only go but also other intransitive
Verbs). Given that go is an intransitive Verb, it does not directly case-
mark its complement: one immediate consequence of this is that structures
such as those in (44) would be ungrammatical in adult English, since the
italicised nominals would be case-marked DPs occupying caseless positions.
Their grammatical counterparts in adult English require the use of a
transitive Preposition such as to in order to case-mark (or license) the
italicised complement, so that the adult counterpart of Wayne go river
would be Wayne went to the river, where the DP the river falls within
the case domain of the transitive Preposition to. The very fact that structures
like (44) occur in early child English provides empirical support for our
claim that there are no case constraints on the distribution of nominals
in early child English (sentences like (5) above provide additional evidence
in support of this claim, in that the complement Noun is in a caseless
position).
An interesting corollary of our claim that early child nominals are caseless
NPs free of case constraints on their distribution is that we should expect
this to be true not only of overt nominal arguments used by young children,
but also of any covert nominal arguments which they may use. In a number
of influential works, Nina Hyams 1986/1987a/1987b/1988/1989 has argued
that apparently subjectless child utterances have (syntactically present but
phonologically null) 'understood' empty subjects, so that (details aside)
a child utterance such as (45)(a) below would have the 'fuller' structure
(45)(b):

(45) a. Want Lisa (Hayley 20) b. e want Lisa

where e denotes an empty subject understood as designating the speaker
Hayley. Given the analysis of early child English presented here, the empty
subject e would be an empty caseless NP, and hence should be free of
case constraints on its distribution. This means that we should expect that
any argument position can be occupied by an empty NP in early child
English; and (as we shall see) there is strong empirical evidence that this
is so.
For example, we find frequent examples of seeming null-subject sentences:

(46) a. Want Lisa. Want baby talking (Hayley 20)


b. Want one. Gone out. Coming to rubbish (Bethan 20)
c. Want crayons. Want biscuit. Want mummy come. Pee in potty
(Jem 21)
d. Want tiger. Get tiger. Get daddy. Get trousers. Get Tina. Shoot
Tina. Shoot plant (Domenico 24)

But there are also numerous examples of what appear to be null object
sentences, such as:

(47) a. Lady do (Jem 21)


b. Wayne do (Daniel 21)
c. Wayne got (Daniel 22)
d. Mummy get. Man taking. Man take (Claire 23)
e. Jem put back [in reply to 'Please put that back']. Lady read
[handing book to recordist for her to read to him] (Jem 23)
f. Lady get (Daniel 23, wanting sound recordist to get him some
sweets)

g. Jem have (Jem 24, reaching for pot of yoghurt)


h. Want mummy do (Anna 24)

Not only can the first or second argument of a predicate be null, but
in addition the third argument of a three-place predicate can likewise be
null, as illustrated by examples such as the following (where I use e to
designate the empty third argument; the examples also have 'missing'
subjects):

(48) a. Bring mummy e (= 'I'm going to bring mummy one', Domenico 24)
b. Put car e (= 'I'm going to put the car there', Daniel 21)

In (48)(a), a secondary THEME complement (corresponding to adult one)
is missing, while in (48)(b), a secondary LOCATIVE complement (cor-
responding to adult there) is missing. Thus, the fact that the first, second,
or third argument of a predicate can be empty suggests that there are
no case constraints on the use of null NPs (It should be noted that our
discussion here presupposes that Hyams is correct in positing that 'missing
arguments' are projected into the syntax as empty categories in early child
speech; for an alternative analysis of 'missing arguments' as lexically
saturated syntactically unprojected arguments, cf. Radford 1990: 229-37).
If early child nominals are not subject to case constraints on their
distribution, the obvious question to ask is just what does constrain the
distribution of nominal arguments in early child English. The answer
provided by our lexical-thematic analysis is that nominal arguments are
subject to thematic constraints on their distribution, since (as we noted
in relation to our discussion of the canonical structural schema in (1)
above), every nominal argument is required to occur in a position where
it is theta-marked by a sister (lexical) constituent - i.e. where it is the
complement or specifier of a lexical category. What this means in more
concrete terms is that a typical transitive predicate like hit in early child
grammars licenses an AGENT NP to occur in the specifier position of
VP to the left of the head V hit, and a PATIENT NP to occur in the
complement position to the right of the head. If this is so, then it follows
that the relative ordering of arguments with respect to their heads will
be determined by the directionality of theta-role assignment (so that if
AGENT theta-role assignment is leftwards, then AGENT nominals will
precede their predicates): this would be in keeping with the suggestion
made by Koopman 1984 and Travis 1984 that theta-marking is directional
in adult grammars. We might therefore follow Lebeaux (1988: 39) in
concluding that 'Throughout stage I speech, it is direct theta-role assign-
ment, rather than assignment of (abstract) case, which is regulating the
appearance of arguments'. 6
If all specifier and complement positions are thematic in early child
English, then we should expect to find that children will not make productive
use of nonthematic nominals (i.e. nominals not theta-marked by any
predicate). The most familiar type of nonthematic nominal in English are
so-called 'expletive' pronouns like 'dummy' it/there in (16) above. In this
connection, it is interesting to note the observation made by Hyams (1986:
63) that early child English is characterised by 'a notable lack of expletive
pronouns'. Her observation would appear to be borne out by data such
as (49) below:

(49) a. Raining ( = 'It's raining', Jenny 22)


b. Outside cold ( = 'It's cold outside', Hyams 1986)
c. No morning ( = 'It's not morning', Hyams 1986)
d. Mouse in window ( = 'There is a mouse in the window', Hayley
20)
e. Bubble on dungaree ( = 'There is a bubble on my dungarees',
Daniel 21)
f. Mess on legs ( = 'There is a mess on my legs', Daniel 24)

In each of these examples, the natural adult counterpart would be a structure
involving an (italicised) expletive pronoun; but the children concerned
systematically avoid expletive structures, in keeping with our overall claim
that children have not yet developed a mechanism for licensing nonthematic
constituents. We might argue that the absence of nonthematic constituents
(like expletive it/there) is a direct consequence of the absence of functional
constituents in early child English, in that nonthematic constituents are
required in adult English to satisfy functional requirements: e.g. the
'dummy' it in It's raining is needed in order to satisfy the requirement
in adult English that the nominative case assigned by a finite Auxiliary
in I be discharged onto an overt (pro)nominal.
If (as I have suggested here) all specifier and complement positions in
early child English are theta-marked, then it follows from principles of
Universal Grammar that there will be no A-movement in early child English
- i.e. no movement of constituents from one A-position to another (an
A-position can be thought of informally as a 'subject or complement
position'). The reason is that since all A-positions in early child English
are thematic positions, any A-movement operation will result in multiple
theta-marking of the moved constituent (in violation of Chomsky's (1981:
36) theta criterion requirement that 'Each argument bears one and only
one θ-role'). Since 'passivisation' is an instance of movement from one
A-position to another, we should expect to find that passive structures
are not productive in early child grammars of English. There are two
sets of facts which would seem to bear out this prediction. The first is
that there is no evidence of passive structures in children's speech production
at this stage. The second is that data from comprehension experiments
(although complex to interpret) suggest that children at this stage are unable
to parse reversible passive structures correctly (cf. e.g. Fraser, Bellugi and
Brown 1963; Bever 1970; De Villiers and De Villiers 1973b; Maratsos 1974,
etc.). Indeed, young children frequently misparse sentences like 'The lion
was chased by the tiger' and misanalyse the superficial subject (the lion)
as the AGENT: this fact would seem to suggest that they treat passive
subject position as a thematic position (i.e. as the canonical AGENT
position), and this in turn would bear out our suggestion that all A-positions
are thematic positions in early child grammars of English. Further evidence
in support of the claim that there is no A-movement in early child grammars
of English comes from the fact that children at this stage do not produce
any structures which plausibly involve raising (hence we don't find structures
like It seems to be snowing).
The more general conclusion to be drawn from the fact that children
have not acquired 'passive' or 'raising' structures at this stage would seem
to be that there is no A-movement in early child grammars of English
(i.e. there are no structures in which a constituent moves from one A-
position into another A-position). This conclusion echoes that reached
(via a rather different route) by Borer and Wexler (1987: 147), namely
that one of the defining characteristics of early child grammars is 'the
absence from the early grammar of A-chains'. Within the framework
presented here, the absence of A-movement structures in the early patterned
speech of young children is a direct consequence of the assumption that
all A-positions in the child's syntax are thematic positions: movement from
one A-position to another would be blocked by the THETA CRITERION
(as we have seen).
However, we might go further and argue that our lexical-thematic analysis
also predicts the absence of A-bar movement in early child grammars -
i.e. movement into an A-bar position (We can define an A-bar position
informally as a position which is neither a 'subject' nor a 'complement'
position). We can illustrate A-bar movement in terms of familiar (simplified)
wh-structures such as that in (50) below:

(50) [CP What [C will] [IP you [I e] [VP [V do] —]]]?

The wh-pronoun what originates in a thematic A-position (indicated by


the elongated dash) within VP as the complement of do, but ultimately
ends up in the nonthematic specifier position in CP; since the specifier
position of CP is an A-bar position (in that what is neither the subject
nor complement of the head C of CP), it follows that wh-movement is
an instance of A-bar movement. Although A-bar movements like wh-
movement are licensed in adult English, there is no wh-movement in early
child English (as we saw in our earlier discussion of examples (39-41)
above); and we might argue that the lexical-thematic analysis correctly
predicts that wh-movement will be unlicensed in early child grammars -
for three reasons. Firstly, the empty category left behind at the original
extraction site when a wh-phrase moves is a variable; and since a variable
is a case-marked trace and children have no case-system in their grammars
at this stage, it cannot be that children have developed variables (or
movement rules involving variables) at this stage. Secondly, the (ultimate)
landing-site for a moved wh-phrase is a functional position, in the sense
that it is a position contained within a projection of the head functional
category C. Thirdly, the (ultimate) landing-site for a preposed wh-phrase
is a nonthematic position, in that the specifier position in CP is not theta-
marked by its (nonthematic) head C. Given that wh-movement involves
movement into a functional nonthematic position, it follows (from our
lexical-thematic analysis) that no such movement operation will be licensed
in early child grammars.
Given that we have argued that there is no A-movement or A-bar movement
in early child grammars of English, the obvious overall conclusion to reach
is that there are no nominal movement chains of any kind (i.e. no chains
formed by the movement of a nominal from one position to another)
in early child grammars of English. This is a conclusion which we can
also reach by a different route. We noted earlier that if children's nominals
are NPs which lack a functional D-system, then they will lack the functional
properties (e.g. case-marking) carried by the adult D-system. Now, we
might argue that a further property of nominals carried in their D-system
is their binding (i.e. coreference) indices, as we can illustrate by examples
such as the following (where subscripts are referential indices marking
possible binding relationships):

(51) a. Harry_i got electrocuted because the fool_i/some fool_j forgot to
        turn the electricity off
     b. Harry_i blames himself_i/him_j for what happened

In (51)(a), the italicised DP can be interpreted as bound by (i.e. referring
back to) Harry only if the head Determiner is the, not some - so suggesting
that the binding properties of nominals are determined by their D-system.
In (51)(b), the italicised expression is a DP containing a head pronominal
D constituent him/himself, but this can only be interpreted as bound by
Harry if the head D is the anaphor himself - again underlining the fact
that the binding properties of nominals are determined by their D-system.
Since the binding properties of nominals are determined by their D-system,
we might posit that binding indices (represented by the subscript letters
in (51) above) are assigned to DP. However, if we are correct in positing
that early child grammars lack the D-system which carries binding indices,
then it follows that early child nominals will not carry binding indices
at all: this in turn means that child nominals would be unindexed NPs
free of binding restrictions on their use (their referential properties being
determined purely pragmatically, and not constrained grammatically). Some
evidence that this is so comes from examples such as:

(52) a. Kendall see Kendall (= 'I can see myself', Kendall 23, from
        Bowerman 1973)
b. Betty touch head...touch Betty head (Betty 24)

In adult grammars, nominals such as Kendall are indexed DPs headed
by a Determiner which is null in English, but can surface as the counterpart
of the in other languages. Binding Theory would require such nominal
DPs to be assigned distinct indices in adult English (and thus be interpreted
as referring to two different individuals), since nominal DPs are required
to be free (i.e. referentially independent) - see Chomsky 1986a for an
attempted formal characterisation of this requirement. However, it is clear
from the context in which (52)(a) was uttered that there is no such
noncoreferentiality constraint operating in the child's grammar, and that
the two instances of Kendall are intended to be interpreted as coreferential.
Why should this be? If early child nominals are unindexed NPs which
lack the binding indices carried by adult DPs, then it follows that nominals
in early child English are free of binding (i.e. coreference) restrictions on
their use: rather, the reference of NPs like Kendall will be
determined purely pragmatically (so that both instances of Kendall will
be interpreted as denoting someone called Kendall who is in the immediate
domain of discourse; if there is only one such person, both instances of
Kendall will be interpreted as referring to that same individual).
Our conclusion that early child nominals do not carry binding indices
ties up in an interesting way with our earlier claim that nominal movement
chains are unlicensed in early child grammars. Current theories of movement
operations assume that a moved constituent leaves behind (in the position
out of which it moves) an empty trace; the moved constituent serves as
the antecedent of the trace, and is coindexed with it. Given these assump-
tions, our earlier wh-question What will you do? in (50) above would have
the (much simplified) superficial structure (53) below:

(53) What_i will you do t_i?

where the subscripts are binding indices which serve to indicate that the
trace t is bound by the pronominal wh-DP what (i.e. that what is the
antecedent of the trace). However, since binding is a property of a D-
system which children have not acquired at the lexical-thematic stage of
their development, it follows that nominal movement chains (because they
involve binding chains) will not be licensed in early child grammars. The
fact that two entirely different routes lead us to the same conclusion (viz.
that there are no nominal movement chains in early child grammars)
substantially reinforces the plausibility of the conclusion.
Given that movement of nominals from one position to another involves
movement of a maximal projection, an obvious question to ask is whether
the second type of movement which we find in adult grammars (viz.
movement of a head category into another head position, i.e. head-to-head
movement) is licensed in early child grammars. Typical instances of head-
to-head movement are movement from N to D in Arabic (cf. Fassi Fehri
1988), movement from V to I for Auxiliaries in English and Verbs in
French (cf. Pollock 1989), and movement from I to C for preposed
Auxiliaries in English. All instances of head-to-head movement that I am
familiar with involve movement into a head functional category position
(D, I, or C). However, if (as we have argued here), children have no
functional category systems in their initial grammars, then it follows that
there will be no head-to-head movement in early child grammars: empirical
support for this claim comes from our earlier observation that children
have not acquired direct questions with preposed Auxiliaries at this stage.
Thus, the fact that children have neither XP movement (i.e. movement
of maximal projections) nor X movement (i.e. movement of heads) in
their grammars leads us to the overall conclusion that there are no movement
chains of any kind in early child grammars of English. Our conclusion
thus echoes the words of McNeill 1966, who suggests that 'It is not
unreasonable to think of children "talking" base strings directly.'

5. SUMMARY

The overall conclusion to be drawn from this paper is that whereas adult
phrases and sentences are functional structures which may contain non-
thematic constituents, their child counterparts are purely lexical structures
which contain only thematic constituents, and thus conform to the category-
neutral schema (1) above. In consequence of the absence of functional
categories, early child nominals have no D-system, and early child clauses
have no I-system or C-system. Because case is a property of a D-system
not yet acquired, early child grammars are caseless systems, so that there
are no case constraints on the distribution of overt or covert nominal
arguments. In consequence of the absence of nonthematic constituents in
early child sentences, we find no productive use of 'expletive' nominals,
and no A-movement (i.e. no 'passive', or 'raising' structures). In conse-
quence of the absence of a functional/nonthematic C-system, we find no
A-bar movements like wh-movement. Since neither A-movement nor A-
bar movement is found in early child grammars, the more general conclusion
to be drawn is that no nominal movement chains of any kind are licensed
in early child grammars. This conclusion is given added plausibility by
the fact that early child nominals lack the binding properties carried in
the adult D-system, with the result that there are no binding chains (hence
no nominal movement chains) in early child grammars. Moreover, there
is no evidence of head-to-head movement in early child grammars: given
that this always involves movement into a head functional category position,
the absence of this type of movement operation is a direct consequence
of the absence of functional categories. The more general conclusion we
reach is thus that there are no movement operations of any kind in early
child grammars. Thus, a truly remarkable array of facts is accounted
for in a principled and maximally generalised fashion in terms of our
unifying hypothesis that all phrases and clauses in early child English are
lexical-thematic structures of the schematic form (1) above.

FOOTNOTES

1. There are cases reported in the acquisition literature of children at this stage using 's
pronominally, but not prenominally - cf. e.g. the following sequence produced by Gia at
age 20 months (from Bloom 1970):

(i) Mommy's. Mommy key

In Radford (1990: 106-8) I discuss such examples, suggesting that 's may be misanalysed
by the child as a nominal proform having much the same status as adult one (save for
the fact that 's encliticises onto a preceding NP).
2. It should be acknowledged that some children do make limited use of demonstrative,
interrogative, and even personal pronouns at this stage (i.e. use items such as this/that/
what/it). However, in Radford (1990: 99-105) I argue that such pronouns have the status
of pronominal Nouns in early child speech, not of pronominal Determiners. From the pro-
N analysis, it follows that while children may use an item like what in a typical Noun position
(viz. as subject or object of a verb), they will not use it in its adult prenominal Determiner
function at this stage, and hence will not produce interrogative nominals like What car?
3. It might be supposed that the negative particles no/not used by young children are functors,
and thus belong to a functional category of some kind (so challenging our assertion that
children have no functional category systems at this stage). However, such negative particles
differ from functional heads in a number of respects. For example, functional heads typically
assign or are assigned functional properties (e.g. D and I can assign case to their specifiers
and C can likewise case-mark its complement, and D can carry case), whereas negative
particles neither assign nor carry functional properties. Moreover, functional (and other)
heads have specific subcategorisation properties (so that e.g. the Complementiser for
subcategorises an IP headed by to, and to in turn subcategorises a VP headed by an infinitival
V), whereas no/not impose no such restrictions on their choice of complement in child English,
so that we find e.g. Mummy not go/going/gone shops. All in all, it seems more plausible
that prepredicate negative particles like no(t) have the structural function of adjuncts to
V-bar in early child English, and may have the categorial status of Adverbs. Preclausal
negatives (as in No Mummy go shops) are probably best analysed as clausal adjuncts (with
no adjoined to the VP Mummy go shops).
4. While children have not acquired the tense/agreement inflections +s/+d at this stage,
it is nonetheless true that they make productive use of progressive +ing, and limited (though
not productive) use of perfective +n - cf. examples (26)(b) and (c) in the text. I do not
take this to indicate that they must therefore have developed one or more functional heads
(e.g. PROG or PERF) marking aspect. Rather, I take the view (defended in Radford 1988b)
that there is a clear distinction between lexical and functional inflections - i.e. those inflections
associated with lexical heads, and those associated with functional heads. In these terms,
the tense/agreement inflections +s/+d are functional inflections associated with finite I
constituents, whereas +ing/+n are lexical inflections associated with V constituents (in much
the same way as plural +s is a lexical inflection associated with N constituents).
5. Although we find no productive wh-movement at this stage, we do find formulaic wh-
questions like Whasat? ( = 'What's that?'); however, there is general agreement in the literature
that such structures do not involve wh-movement. More problematic for the analysis proposed
in the text is the fact that some children at this stage develop semiformulaic wh-questions
like What NP do(ing)?, and/or Where NP go(ing)?. However, since these structures are item-
specific and clause-bound, no productive movement rule seems to be involved. It may be
that children who produce item-specific structures like What Daddy do(ing)? initially develop
a lexical entry for do which projects an interrogative THEME argument (or perhaps, more
specifically, a what THEME argument) as an adjunct to the VP containing do, in which
case What Daddy doing? would have the skeletal structure:

(i) [VP What [VP Daddy [V' [V doing]]]]?

On this account, no movement would be involved, since what would be directly projected
into the clausal adjunct position, not moved there from the complement position within
VP. It follows from this analysis that the interrogative word will always remain clause-
bound (i.e. positioned within its containing clause, so that there is no movement of the
wh-word out of one clause into another). For more detailed discussion of early wh-
questions, see Radford (1990: 122-36).
6. An interesting question which arises here is whether children master the directionality
of theta-marking from the very beginnings of early multiword speech. In this connection,
it is interesting to note that Bowerman's 1973 study of Kendall at age 22 and 23 months
showed Kendall producing not only AGENT+ACTION+PATIENT structures like (i) below,
but also PATIENT+ACTION+AGENT structures such as (ii):

(i) Mommy pick-up Kendall
(ii) Mommy hit Kendall (= 'Kendall hit Mommy')

While it is far from clear how to interpret the relevant data, one possibility would be to
posit that Kendall has not yet set the relevant parameter which determines the directionality
of theta-marking, and thus allows AGENT nominals to be positioned either to the left or
to the right of their associated predicates.

REFERENCES

Abney, S. P. 1987. The English Noun Phrase in its Sentential Aspect. Doctoral dissertation,
MIT.
Bever, T. 1970. The cognitive basis for linguistic structures. In J.R. Hayes (ed.) Cognition
and the Development of Language. 274-353. New York: Wiley.
Bloom, L. 1970. Language Development. Cambridge, Massachusetts: MIT Press.
Bloom, L. 1973. One Word at a Time. The Hague: Mouton.
Bloom, L., P. Lightbown and L. Hood. 1978. Pronominal-Nominal Variation in Child
Language. In L. Bloom (ed.) Readings in Language Development. 231-238. New York:
Wiley.
Borer, H. and K. Wexler. 1987. The Maturation of Syntax. In Roeper and Williams, 123-
172.
Bowerman, M. 1973. Early Syntactic Development. Cambridge: Cambridge University Press.
Braine, M. D. S. 1976. Children's first word combinations. Monographs of the Society for
Research in Child Development no. 41.
Brown, R. 1968. The Development of Wh Questions in Child Speech. Journal of Verbal
Learning and Verbal Behaviour 7. 279-290.
Brown, R. 1973. A First Language: the Early Stages. London: George Allen and Unwin.
Brown, R. and U. Bellugi. 1964. Three processes in the child's acquisition of syntax. Harvard
Educational Review 34. 133-51.
Brown, R. and C. Fraser. 1963. The Acquisition of Syntax. In C. Cofer and B. Musgrave
(eds.) Verbal behaviour and learning: problems and processes. 158-201. New York: McGraw-
Hill.
Cazden, C. B. 1968. The acquisition of noun and verb inflections. Child Development 39.
433-448.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, Massachusetts: MIT Press.
Chomsky, N. 1975. Reflections on Language. New York: Pantheon.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1986a. Knowledge of Language: Its Nature, Origin, and Use. New York: Praeger.
Chomsky, N. 1986b. Barriers. Cambridge, Massachusetts: MIT Press.
Clahsen, H. 1984. Der Erwerb von Kasusmarkierungen in der deutschen Kindersprache,
Linguistische Berichte 89. 1-31.
De Villiers, P.A. and J.G. 1973a. A cross-sectional study of the acquisition of grammatical
morphemes in child speech. Journal of Psycholinguistic Research 2. 267-78.
De Villiers, P.A. and J.G. 1973b. Development of the use of word order in comprehension.
Journal of Psycholinguistic Research 2. 331-341.
Ervin-Tripp, S.M. 1964. Imitation and structural change in children's language. In E.H.
Lenneberg (ed.) New Directions in the Study of Language. Cambridge, Massachusetts: MIT
Press. 163-89.
Fassi Fehri, A. 1988. Generalised IP structure, Case, and VS word order. In Fassi Fehri,
A. et al. (eds.) Proceedings of the First International Conference of the Linguistic Society
of Morocco. Rabat: Editions OKAD. 189-221.
Fraser, C., U. Bellugi, and R. Brown. 1963. Control of grammar in imitation, comprehension,
and production. Journal of Verbal Learning and Verbal Behaviour 2. 121-135.
Fukui, N. 1986. A Theory of Category Projection and its Applications. Doctoral dissertation,
MIT.
Gleitman, L. 1981. Maturational Determinants of Language Growth. Cognition 10. 103-
114.
Goodluck, H. 1989. The Acquisition of Syntax. Ms. University of Ottawa.
Greenfield, P. et al. 1985. The structural and functional status of single-word utterances
and their relationship to early multi-word speech. In M. D. Barrett (ed.) Children's Single-
Word Speech. New York: Wiley. 233-267.
Guilfoyle, E. and M. Noonan. 1989. Functional Categories and Language Acquisition. Ms.
McGill University.
Guillaume, P. 1927. Le dévelopement des éléments formels dans le langage de l'enfant. Journal
de Psychologie 24. 203-229. Translated as The development of formal elements in the
child's speech. In C.A. Ferguson and D. I. Slobin (eds.) 1973. Studies of Child Language
Development. New York: Holt Rinehart and Winston. 240-251.
Hill, J. A. C. 1983. A Computational Model of Language Acquisition in the Two Year Old.
Indiana University Linguistics Club.
Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Hyams, N. 1987a. The Theory of Parameters and Syntactic Development. In Roeper and
Williams, 1-22.
Hyams, N. 1987b. The Setting of the Null Subject Parameter: A reanalysis. Text of paper
presented at the Boston University Conference on Child Language Acquisition.
Hyams, N. 1988. The Acquisition of Inflection: a parameter-setting approach. Ms. UCLA.
Hyams, N. 1989. The Null Subject Parameter in Language Acquisition. In O. Jaeggli and
K. Safir (eds.) The Null Subject Parameter. Dordrecht: Kluwer. 215-238.
Kazman, R. 1988. Null Arguments and the Acquisition of Case and Infl. Text of paper
presented at the University of Boston Conference on Child Language Acquisition.
Klima, E. S. and U. Bellugi. 1966. Syntactic Regularities in the Speech of Children. In J.
Lyons and R. Wales (eds.) Psycholinguistic Papers. Edinburgh: Edinburgh University Press.
183-207.
Koopman, H. 1984. The Syntax of Verbs. Dordrecht: Foris.
Lebeaux, D. S. 1987. Comments on Hyams. In Roeper and Williams, 23-39.
Lebeaux, D. S. 1988. Language Acquisition and the Form of the Grammar. Doctoral Dissertation,
University of Massachusetts.
Macnamara, J. 1982. Names for things: a study of child language. Cambridge, Massachusetts:
MIT Press.
Maratsos, M. 1974. Children who get worse at understanding the passive: a replication of
Bever. Journal of Psycholinguistic Research 3. 65-74.
Park, T-Z. 1981. The Development of Syntax in the Child with special reference to German.
Innsbruck: AMOE.
Phinney, M. 1981. Syntactic Constraints and the Acquisition of Embedded Sentential Com-
plements. Doctoral Dissertation, University of Massachusetts.
Platzack, C. 1989. A grammar without functional categories: A syntactic study of Early
Swedish Child Language. Ms. Lund University.
Pollock, J-Y. 1989. Verb Movement, Universal Grammar, and the structure of IP. Linguistic
Inquiry 20. 365-424.
Radford, A. 1986. Small Children's Small Clauses, Research Papers in Linguistics 1. 1-38.
Bangor: UCNW.
Radford, A. 1987. The Acquisition of the Complementiser System. Research Papers in
Linguistics 2. 55-76. Bangor: UCNW.
Radford, A. 1988a. Small Children's Small Clauses. Transactions of the Philological Society
86. 1-46 (revised extended version of Radford 1986).
Radford, A. 1988b. Transformational Grammar. Cambridge: Cambridge University Press.
Radford, A. 1989. Profiling Proforms. Ms. University of Essex.
Radford, A. 1990. Syntactic Theory and the Acquisition of English Syntax. Oxford: Basil
Blackwell.
Roeper, T. and E. Williams (eds.) 1987. Parameter Setting. Dordrecht: Reidel.
Rom, A. and R. Dgani. 1985. Acquiring case-marked pronouns in Hebrew: the interaction
of linguistic factors. Journal of Child Language 12. 61-77.
Schieffelin, B. B. 1981. A developmental study of pragmatic appropriateness of word order
and casemarking in Kaluli. In W. Deutsch (ed.) The Child's Construction of Language.
New York: Academic Press. 105-120.
Schlesinger, I. M. 1971. The production of utterances and language acquisition. In D. I.
Slobin (ed.) The Ontogenesis of Grammar. New York: Academic Press. 63-101.
Schlesinger, I. M. 1982. Steps to Language: Toward a Theory of Native Language Acquisition.
Hillsdale: Erlbaum.
Slobin, D. I. 1979, 2nd edn. Psycholinguistics. London: Scott Foresman and Co.
Tracy, R. 1986. The acquisition of case morphology in German. Linguistics 24. 47-78.
Travis, L. 1984. Parameters and Effects of Word Order Variation. Doctoral Dissertation,
MIT.
Null Subjects, Markedness, and Implicit
Negative Evidence*
Anjum P. Saleemi
Allama Iqbal Open University

In this paper I shall attempt to investigate the null subject phenomenon


from the perspective of the selective or parametric theory of language
learning (e.g. Roeper and Williams (1987), Lightfoot (1989)), taking the
specific learnability model built around the Subset Principle, in particular
the view presented in Manzini and Wexler (1987) and Wexler and Manzini
(1987), as a first approximation to a viable theory of parameter fixation
(see also Berwick (1985)). The null subject parameter has been cited as
evidence both for and against the Subset Principle. It has been claimed
that, in the relevant respect, a language generated by the pro-drop option
includes a language generated by the non-pro-drop option, and therefore
the parameter is susceptible to the learnability logic underpinning the
principle (e.g. Berwick (1985), 291-293). The converse claim has been made
under the assumption that the two types of language partially intersect,
since a pro-drop language does not have sentences with overt expletive
(or pleonastic) subjects, whereas a non-pro-drop one does not contain
sentences with null subjects (see Hyams (1986), and the remarks in Wexler
and Manzini (1987)).
What follows is in part based on the view that neither of the two claims
appears to be valid, as the binary-valued formulation of the parameter
underlying them may be descriptively inadequate. As an alternative, I shall
tentatively suggest a multi-valued formulation. I further suggest that
markedness should be defined in terms of parameter values, rather than
in terms of their respective languages (i.e. the corresponding sets of
sentences).
Another major point demonstrated in this paper is that, owing to the
possibility of a lack of total consistency between parameter values and
the languages they are associated with, the learning of the correct values
may not be entirely coextensive with the learning of the corresponding
languages. Specifically, the latter process, which can be termed exact
identification, may involve some use of implicit negative evidence, whereas
the former, to be referred to as positive identification, is accomplished
exclusively on the basis of non-negative instances. Thus our learnability
logic incorporates substantive new assumptions regarding markedness, the
way a parameter and the languages generated under its different values
could be related, and the general nature of the language learning process.
(See Saleemi (1988a) for related discussion, and Saleemi (1988b) for a
detailed treatment of the syntactic and learnability issues).

1. SOME BACKGROUND ASSUMPTIONS

In the syntactic analysis described below I assume that pro-drop and
postverbal subjects originate from separate, if not entirely independent,
parameters (cf. Safir (1985), Hyams (1986)). Obviously, most alleged
consequences of pro-drop, e.g. lack of the that-trace effect and long-distance
movement of subjects, can be demonstrated to follow from the inversion
of subjects, rather than from pro-drop (Rizzi (1982), but see Picallo (1984)
for a contrary view). I further assume, following Lasnik and Saito (1984)
and Chomsky (1986b), that the requirement of proper government (i.e.
the ECP) does not hold of the empty category pro; under this approach
the ECP applies only to nonpronominal empty categories, namely the traces
of moved elements, and never to the pronominal elements pro and PRO.
A key feature of our analysis is that the licensing of null subjects by Infl
is held to primarily depend on its role as the assigner of nominative Case,
rather than on its association with Agr (cf. Bouchard (1984), Safir (1984),
(1985), Rizzi (1986)). The crucial assumption is that the formal licensing
and identification of null subjects are independent processes (see Rizzi
(1986), Adams (1987), Jaeggli and Safir (1989), for views along similar
lines; also Huang (1984)). The role of a rich Agr is hypothesised to vary
on a language-particular basis, and to be primarily confined to the process
of identification of the grammatical features of null subjects. Considering
the diversity of identification processes resorted to by various languages,
this seems to be a reasonable hypothesis. I would like to remind the reader
that the relationship between pro-drop and various identification mecha-
nisms, which range from identification through a rich Agr to identification
by means of a superordinate NP to identification through a discourse-
bound Topic, does not appear to be uniform or systematic cross-lingui-
stically. Further, it is often not totally consistent within a particular language
either, although admittedly in several cases a fairly strong correlation
between the two properties is known to exist (e.g. see Bennis and Haegeman
(1984) on West-Flemish, Borer (1984) on Hebrew, McCloskey and Hale
(1984) on Irish, and Rizzi (1986) on Italian; see also some evidence from
Pushto reported in Huang (1984)). In the remainder of this paper we shall
concentrate on the licensing and its learnability implications, assuming
that an appropriate identification mechanism would be available to the
child once the licensing process has been fixed.
Crucial to our view of the null subject parameter is a hierarchy of three
types of subject, originally due to Chomsky (1981), which distinguishes
among the types by their characteristic thematic and referential content.
The three types are as follows. In addition to the fully referential subject,
we further identify two nonreferential (i.e. pleonastic) types, namely
nonargument and quasi-argument (cf. Rizzi (1986)). A nonargument is
an expletive subject which is obligatorily construed with a postverbal NP
or clause. A quasi-argument is the expletive subject of atmospheric-temporal
predicates (e.g. a weather predicate). Nonarguments always occur in clauses
whose predicates do not assign an external θ-role, whereas the predicates
of quasi-arguments tend to assign a minimal external θ-role. The typology
of subjects suggested in the foregoing discussion is summarised in (1):1

(1)                           R-index    θ-role
    a. Referential argument      +          +
    b. Quasi-argument            -          +
    c. Nonargument               -          -

Below, (1a) and (1b) together are sometimes referred to as argumental
or thematic subjects, and (1b) and (1c) as nonreferential subjects.

2. THE LICENSING PARAMETER

It is an established fact that pro can occur only in those contexts in which
overt Case-governed NPs can appear. Thus, Rizzi (1986) asserts that pro
can only be licensed by a Case-assigning head, which is Infl in the case
of a subject pro. We can draw many different conclusions on the basis
of this generalisation. Firstly, we can hypothesise that pro is Case-marked
just like lexical NPs (e.g. Chomsky (1982), 86; Hyams (1986), 32-33).
Alternatively, it can be stipulated that nominative Case is absorbed by
Infl (Rizzi (1986)), or that it is simply not phonetically realised (Safir (1985)).
None of these assumptions is straightforwardly consistent with the view,
crucial to the present analysis, that Case, standardly considered to be
assigned at S-structure, renders an NP "visible" at PF, and thus requires
it to be phonetically realised at that level (Bouchard (1984), Fabb (1984)).
In our analysis we also adopt the broader Visibility Condition: under this
condition θ-role assignment at LF can occur only if an argument is Case-
marked (Aoun (1985), Chomsky (1986a)). Suppose that the Case Filter
can be subsumed under the Visibility Condition, as Chomsky (1986a)
suggests, then it would appear that Case is instrumental in ensuring visibility
at both PF and LF: just as arguments can have a phonetic matrix at
PF only if they are Case-marked, they can be assigned θ-roles at LF,
and thus be interpreted, just in case they are Case-marked. This means
that if a pro reached LF without Case, it would not be visible for θ-
role assignment at that level.
In order to meet the dual visibility requirements, I first propose that
an NP may be assigned Case at LF as well as at S-structure (cf. Fabb
(1984), 43). This should account for the assignment of Case to null and
overt NPs alike, the former acquiring Case only at LF. 2 Second, I adopt
the idea, due to Bouchard (1984), that in pro-drop languages Case
assignment to the subject can be delayed until LF. This idea presupposes,
in keeping with the above discussion, that obligatory Case at S-structure
requires an NP to be lexically realised at PF, whereas in the event of
optionality of syntactic Case an NP need not be so realised, unless some
other feature (e.g. focus) forces it to acquire Case in order to be overt.
Thus the null subject parameter may now be (provisionally) stated as
follows.

(2) The assignment of Case to a subject may optionally occur at LF.

To sum up, I hypothesise that null subjects are formally made possible
by optionality of syntactic Case. This licensing condition (together with
identification through language-particular means) determines whether a
language will allow null subjects. Much like the traditional approaches,
the licensing condition (2) differentiates null subject languages from non-
null subject ones on a binary basis. Considering (2) to be essentially correct,
I shall now explore the possibility of revising the null subject parameter
to encompass more than two types of language, as it appears that the
phenomenon under consideration is more diverse than can be captured
by a minimal binary parameter. It is notable that a multi-valued parameter,
such as the governing category parameter proposed by Wexler and Manzini, 3
makes it possible to directly capture a wider range of variation. On the
other hand, a binary formulation that is intended to account for the same
range of variation must somehow explain away part of the attested diversity.
Further, a many-valued analysis should be of greater advantage in de-
termining learnability if the corresponding binary analysis requires the
postulation of many additional grammatical mechanisms to the system,
the exact consequences of which may appear to necessitate some intricate
deductive reasoning on the part of the learner. Is the null subject
phenomenon diverse enough to warrant an expanded typological analysis?
The following crosslinguistic data suggest that such an analysis is quite
plausible.
First consider these German examples. 4
(3) a. *(Es) troff (Safir (1984))
       'It dripped'
    b. Heute regnet *(es). (Travis (1984))
       today rains it
       'It's raining today'
    c. Heute sind (*es) zwei Kinder gekommen
       today are two children come
       'Today there came two children'

Unlike Italian and Spanish, in German pro-drop is available only in a
very reduced range of contexts. Referential pro-drop is not allowed at
all (3a). A nonargument subject, ceteris paribus, is obligatorily omitted
in many constructions (3c), but a quasi-argument subject must always be
retained (3b).5 In other words, in German argumental subjects must never
be omitted (Safir (1984), (1985), Travis (1984)). (At this point I do not
comment on the apparently obligatory nature of the omission of pleonastics,
an issue that will be taken up in due course.)
According to Travis (1984), Yiddish represents still another type, as
these examples show.

(4) a. Haynt hot *(es) alts gegesn (Travis (1984))
       today has it all eaten
       'It has eaten everything else today?'
    b. Haynt geyt (*es) a regn
       today goes rain
       'It's raining today'

As in German, referential pronouns are never dropped in Yiddish (4a).
But no nonreferential pronouns, including quasi-arguments (4b), can appear
overtly. With respect to nonreferential drop, Malagasy (Travis (1984)) and
Insular Scandinavian languages, namely, Icelandic and Faroese, pattern
with Yiddish (Platzack (1987)).6 Finally, Italian permits referential pro-
drop as well as nonreferential pro-drop, as demonstrated below.

(5) a. (Io) vado al cinema (Hyams (1986))
       'I go to the movies'
    b. Sembra che Gianni sia molto infelice oggi
       seems that John is very unhappy today
       'It seems that John is very unhappy today'
    c. Piove molto durante il mese di febbraio
       rains a lot during the month of February
       'It rains a lot during the month of February'
In this respect Spanish, Portuguese, and many other null subject languages
resemble Italian.
Clearly, in view of these data a revision of the traditional binary view
of pro-drop is in order. While acknowledging this diversity, Rizzi (1986)
adheres to a binary-valued formulation of the parameter, suggesting that
the reduced range of pro-drop in some essentially pro-drop languages may
be a result of the language-particular interaction of the null subject
parameter with an independent parameter that regulates the recovery of
pronominal features in a piecemeal fashion. If operative in a given language,
the recovery mechanism overrides the consequences of the null subject
parameter whenever the retrieval of certain designated feature(s) of the
empty subject from overt affixation on Infl is not possible. 7 Rizzi's account
is incomplete or unsatisfactory for several reasons. Empirically, it does
not have much to say about languages without Agr which allow pro-drop,
e.g. Chinese (Huang (1984)) and Japanese (Hasegawa (1985)), or indeed
about any cases of pro-drop where the overt correspondence between pro-
drop and Agr is weak. Theory-internally, it implies that parameters can
massively nullify each other's triggering conditions, with one parameter
almost undoing the effect of another one, a supposition that is likely to
run up against considerable descriptive and learnability problems. It would
be tantamount to a conspiracy for the suppression of relevant evidence,
which might complicate the learning procedure whereby the language
learner is supposed to pick the correct value of a parameter. This underscores
the point that the major motivation behind a parameterised theory of
grammar is to guarantee learnability, a key but often unrecognised
assumption being that parameterisation is meant to enable the learner
to make a set of independent decisions. This obviously means that
parameters have to be independent from each other in such a way that
they can be fixed directly on the basis of relevant evidence. Note that
the view of independence being implied here is much weaker than that
embodied in Wexler and Manzini's Independence Principle, which spe-
cifically requires the subset relations between the languages generated by
the values of a parameter to hold irrespective of such relations between
the languages associated with the values of all other parameters. Neither
view, I believe, rules out the possibility that parameters may interact to
some extent, or that the degree of mutual functional compatibility (in
some sense that can be made precise) between different parameters may
be rather strong; hence the tendency in null subject languages with a rich
Agr to depend on identification through overtly realised φ-features (namely,
person, number and gender).
A more effective way to describe the null subject phenomenon, in part
following the typological variation noted by Rizzi (1986) (see also Travis
(1984)), is to posit a wider range of parameterisation. Therefore, under
our analysis the phenomenon is characterised as a multi-valued parameter,
which may be stated as follows, replacing (2).8

(6) The Null Subject Parameter
    The assignment of Case to α may be delayed until LF; where α, a
    subject, represents
    a. 0; or
    b. nonargument; or
    c. nonreferential argument; or
    d. any argument whatsoever.

English, French, and Swedish are associated with value (a) of the parameter,
allowing no null subjects. On the other hand, German takes value (b),
that permits only nonarguments to be omitted, requiring all argumental
subjects to be lexically expressed. Yiddish, Malagasy, Icelandic and Faroese
take value (c), that allows the omission of quasi-arguments as well as
nonarguments, i.e. all nonreferential subjects. Finally, languages like Italian
and Spanish (and possibly also those like Chinese and Japanese) are
associated with value (d), under which any subject, referential or non-
referential, may remain null.
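
For concreteness, the typology just outlined can be rendered as a small illustrative sketch (in Python; the encoding, the value labels and the helper names are assumptions of the sketch rather than part of the analysis), in which each value of (6) is identified with the set of subject types for which Case assignment may be delayed until LF, so that a null subject is licensed only if its type belongs to that set:

```python
# Illustrative sketch of the multi-valued parameter in (6): each value licenses
# null realisation of a progressively larger set of subject types (cf. (1)).
# The pragmatic gaps affecting overt pleonastics are deliberately ignored here.
NULL_LICENSED = {
    "a": set(),                                              # English, French, Swedish
    "b": {"nonargument"},                                    # German
    "c": {"nonargument", "quasi-argument"},                  # Yiddish, Icelandic, Faroese
    "d": {"nonargument", "quasi-argument", "referential"},   # Italian, Spanish
}

def licensed(value, subject_type, overt):
    """A subject may be null only if its type is one to which the chosen
    value allows Case assignment to be delayed until LF."""
    return overt or subject_type in NULL_LICENSED[value]

# A null quasi-argument (e.g. a weather subject) is licensed under (c) and (d) only:
print([v for v in "abcd" if licensed(v, "quasi-argument", overt=False)])   # ['c', 'd']
```
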
(6) is apparently problematic in one respect, though: it predicts that
pleonastic pro-drop will be optional, like the core cases of referential pro-
drop. However, whereas referential pro-drop is in general optional, non-
referential pro-drop seems to be mandatory in most pro-drop languages,
with some (possibly marked) exceptions. Practically, then, the pro-drop
option with respect to pleonastics might be no more than a Hobson's
choice. But there appears to be a simple solution to the problem. In the
spirit of Chomsky's (1981) Avoid Pronoun Principle, the absence of lexical
pleonastics in many pro-drop languages can be ultimately attributed to
their pragmatic infelicity (cf. Travis (1984), 229; Hyams (1986)). Briefly,
it can be assumed that since pleonastic subjects are nonreferential, they
would be superfluous, and thus might not, or might have ceased to, exist
in some null subject languages. What I am suggesting is that there could
be more or less fortuitous gaps in languages (though perhaps not in their
grammars) that might (at least in part) be ascribable to lack of functional
usefulness.
We can thus indirectly account for the lack of expletive subjects in most
pro-drop languages, and one can still claim (6), pragmatically qualified,
to be formally correct. This maximally general statement of the parameter
may in any case be required for some supposedly "marked" languages
- such as Welsh (Awbery (1976)), Irish (Travis (1984), 231ff.), substandard
Hebrew (Borer (1984), 216), Faroese (Platzack (1987)), and Urdu - in
which nonreferential pro-drop is in fact optional in many configurations;
see these examples from Urdu.

(7) a. (Ye) maloom hota hai ke wo ja chuka hai
       it seems is that he gone is
       'It seems that he has gone.'
    b. (Ye) wazeh hai ke us ne jhoot bola tha.
       it obvious is that he ERG lie spoken was
       'It is obvious that he had lied.'

Nevertheless, due to the irregular distribution of pleonastics in languages,
the shortfall in the data available to the learner leaves us with a learnability
problem, particularly in the total absence of any kind of negative evidence,
as is demonstrated in the following pages.

3. THE LEARNABILITY PROBLEM

The null subject parameter, as formulated in the last section, raises some
interesting questions about the relationship between parameters and the
languages they generate, and about the resulting implications for lear-
nability. It affords an example of the intricate connection between parameter
values and the corresponding languages, in particular of a mismatch between
the two, perhaps suggesting that parameters do not, strictly speaking,
generate languages, but only fix the maximal bounds within which languages
can be realised. The problem is analogous to the well-known ontological
problem of structures that are predicted to be well-formed by a grammar
but that do not exist. This gives rise to some inconsistency between the
training instances available to the learner and the relevant generalisation
in the grammar. The parameter (6) illustrates that this state of affairs
is possible in a parametric theory as well, resulting in a projection puzzle
that is discussed below. As the data the child will get may not exactly
be the data he will expect under the parameter, the problem to resolve
is how the child infers the presence of gaps in the ambient language in
the absence of any negative information. In more general terms, the question
is how the learner deals with partial generalisations in the target grammar.

4. POSITIVE IDENTIFICATION

It is plainly obvious that in principle the four values of the parameter
should generate languages which form a subset hierarchy, as, proceeding
from (a) to (d), each value potentially increases the set of well-formed
structures allowed by the parameter. That is, if lexical pleonastics were
in general optional, the situation obtained would be compatible with the
monotonic model of parameter fixation proposed by Wexler and Manzini,
indicating that the parameter could be fixed without any difficulty on
the basis of positive-only evidence. This indeed might be more or less
correct in relation to those languages in which the pro-drop of pleonastics
is optional to some degree; there should be no learnability problem insofar
as overt pleonastics exist. However, due to the nonappearance of overt
pleonastics in many null subject languages, we cannot be sure that each
of these languages will actually fall into a subset hierarchy in respect of
the parameter (6). Under such circumstances the Subset Principle, as
conceived by Wexler and Manzini, cannot be regarded as an effective
learning procedure. This points to the possibility of a projection problem,
relevant only to those languages that contain gaps resulting from the
nonexistence of overt pleonastics. The best way to spell out this problem
explicitly is to assume the "worst case", that is, the situation in which
overt pleonastics are absent from all types of null subject languages. In
such a state of maximal deviation from the parameter (6), the set-theoretical
relations between the languages will be as follows.
Let L(a) be the language related to value (a) of the parameter; likewise
L(b), L(c), and L(d). Then it is obvious that L(a) and L(b) will partially
intersect, as L(a) contains sentences with overt nonarguments, which L(b)
will not have, and L(b) contains sentences with null nonarguments that
are excluded by L(a). Next consider L(c). L(c) has sentences with null
quasi-arguments, not included in L(b), whereas L(b) has sentences with
overt quasi-arguments that will not be contained in L(c). Therefore L(b)
and L(c) will also intersect. Now consider L(d). L(d) will be coextensive
with L(c) with respect to the nonpresence of overt nonreferential subjects, but
it additionally contains referential null subjects. Consequently, L(c) ⊂ L(d).
In short, the set-theoretical profile of languages that emerges is rather
mixed, incorporating both subset and intersecting relations, quite unlike
what the Subset Principle and the related assumptions predict.
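
The set-theoretical profile just described can be checked mechanically. The following sketch (Python, purely illustrative; the representation of sentences as pairs of a subject type and its realisation is an assumption of the sketch, not part of the text) encodes the four languages under the "worst case" in which overt pleonastics are absent from all null subject languages, and reproduces the mixed pattern of intersecting and subset relations:

```python
# Illustrative sketch of the "worst case": each language is a set of schematic
# sentence frames, i.e. (subject type, realisation) pairs, with overt pleonastics
# absent from every null subject language.
def language(null_types, overt_types):
    return ({(s, "null") for s in null_types}
            | {(s, "overt") for s in overt_types})

L = {
    "a": language([], ["nonargument", "quasi-argument", "referential"]),
    "b": language(["nonargument"], ["quasi-argument", "referential"]),
    "c": language(["nonargument", "quasi-argument"], ["referential"]),
    "d": language(["nonargument", "quasi-argument", "referential"], ["referential"]),
}

def relation(x, y):
    if x < y:
        return "proper subset"
    if x & y and not (x <= y or y <= x):
        return "intersecting"
    return "other"

print(relation(L["a"], L["b"]))   # intersecting
print(relation(L["b"], L["c"]))   # intersecting
print(relation(L["c"], L["d"]))   # proper subset
```
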
Given the way the values of the parameter reflect a gradual decrease
in restrictiveness, and the fact that the relations among languages may
involve at least one subset-superset relation (i.e. that between L(c) and
L(d)), it is obvious that the order in which the values are supposed to
be inspected by the learner remains an important consideration. Note that
a total absence of any subset-superset relations would have made the
question of order irrelevant, as criterial evidence distinguishing all of the
four values from each other could have been available no matter in which
particular order the values were considered. However, in the situation under
review the dilemma remains as to how to impose an ordering on the set
of values associated with the parameter. Note that it is still possible to
consider that the values of the parameter are ordered in terms of markedness
just as dictated by a subset hierarchy, as in theory the parameter is
compatible with the Subset Condition. However, that may not be desirable,
since it has been shown that although the languages associated with different
values of the parameter can fall into a subset hierarchy, they do not do
so in relation to a significant number of cases. Alternatively, one can define
the inclusion relations among values, rather than extensionally (i.e. among
languages generated by these values), a possibility that follows naturally
from the internal structure of the parameter. The markedness hierarchy
and the learning procedure can accordingly be redefined.
Recall that the set of null subject types under the four values progressively
enlarges from value (a) to value (d): the set of null subjects under value
(a) of the parameter is 0; the set of possible null subjects under value
(b) consists of nonarguments only; the set of possible null subjects under
value (c) has as its members quasi-arguments as well as nonarguments;
and the set of possible null subjects under value (d) contains nonarguments,
quasi-arguments, and referential arguments. In other words, in this specific
sense value (d) includes value (c), value (c) includes value (b), and value
(b) includes value (a). The following condition is proposed to determine
markedness among parameter values that are so related.

(8) Markedness Condition
    Given a parameter P with values P_1, ..., P_n, for every P_i and P_j,
    1 ≤ i, j ≤ n,
    a. P_i includes P_j if the set of categories to which P_j applies is a
       subset of the set of categories to which P_i applies; and
    b. P_j is less marked than P_i if P_i includes P_j in the sense of (a).

The intuitive idea behind (8) is that markedness is a function of certain
internal properties of language, rather than of the external properties of
particular languages (cf. Chomsky's (1986a) distinction between I-language
and E-language). The chief criterion for markedness, accordingly, is subset
relations among sets of categories affected by the values, rather than among
the sets of strings they generate. Although in certain respects (8) and Wexler
and Manzini's Subset Condition are equivalent, (8) differs from the Subset
Condition and the related Subset Principle in one important respect: since
it is not conceived in terms of languages, psychologically it could be more
plausible. Wexler and Manzini's learning module appears to presuppose
quite complex computational abilities on the part of the learner, who may
have to compute different subset hierarchies for different lexical items of
the same type (the "Lexical Parameterization Hypothesis"), and who must
further establish distinct hierarchies for pronominals and anaphors (see
Safir (1987) for some discussion of this issue; also Newson, this volume).
Whether or not the child is endowed with such abilities is of course a
moot point, but it would seem that a learning module that depends rather
heavily on extensionally defined computational operations cannot be
considered realistic so long as a computationally less demanding alternative
is available.
Much as we may desire it, we cannot expect (8) to be the only criterion
for determining markedness. However, later in this paper we shall see
that (8) can be shown to be relevant to the binding parameters of Wexler
and Manzini as well.
Given that (8) applies in the case of at least some parameters, the learning
can be easily taken care of by a rather general procedure, defined in (9),
that takes (8) as one of the metrics of markedness, and correspondingly
inspects the parameter values in the ascending order of markedness. This
learning procedure will not, at least in the cases when (8) or a similar
condition applies, need to observe extensively the languages associated
with these values.

(9) Learning Procedure
    Given a parameter P with values P_1, ..., P_n, let L(P_i) be the
    language generated under value P_i of the parameter P, let f_P be the
    learning function for P, and let D be a set of data. Then for every
    P_i, 1 ≤ i ≤ n, f_P(D) = P_i iff
    a. D ⊆ L(P_i), and
    b. P_i is the least marked value which is consistent with D.

(9) formalises the learning procedure, which says that the learning function
f_P maps the set of data D onto a value P_i of parameter P if and only
if D is a subset of L(P_i), the language generated when P takes value P_i,
and P_i is the least marked value consistent with D. I consider (9) to be
a domain-specific learning procedure, comparable to the Subset Principle
in that respect; however, like the markedness condition (8) it possesses
greater likelihood of being computationally tractable.
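
A rough computational gloss on (8) and (9) may help to fix ideas. The following sketch (Python, illustrative only; it reuses the category sets of (6) and an assumed consistency check, and makes no claim about the child's actual computations) orders the values by inclusion of the categories they apply to and returns the least marked value consistent with the data:

```python
# Illustrative sketch of (8) and (9): values are compared by inclusion of the
# sets of categories they apply to, and the learner selects the least marked
# value consistent with the data D (a set of observed sentence frames).
NULL_LICENSED = {
    "a": set(),
    "b": {"nonargument"},
    "c": {"nonargument", "quasi-argument"},
    "d": {"nonargument", "quasi-argument", "referential"},
}

def less_marked(v1, v2):
    """(8): v1 is less marked than v2 iff v2 includes v1, i.e. the categories
    to which v1 applies are a subset of those to which v2 applies."""
    return NULL_LICENSED[v1] < NULL_LICENSED[v2]

def consistent(value, datum):
    """A (subject type, realisation) frame is consistent with a value iff the
    subject is overt or its type may be null under that value."""
    subject_type, realisation = datum
    return realisation == "overt" or subject_type in NULL_LICENSED[value]

def learn(D):
    """(9): map the data D onto the least marked value consistent with D."""
    for value in sorted(NULL_LICENSED, key=lambda v: len(NULL_LICENSED[v])):
        if all(consistent(value, d) for d in D):
            return value

print(less_marked("b", "c"))                          # True
print(learn({("referential", "overt")}))              # 'a'
print(learn({("nonargument", "null")}))               # 'b'
print(learn({("quasi-argument", "null")}))            # 'c'
print(learn({("referential", "null")}))               # 'd'
```
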
In present terms the markedness condition (8) defines the order in which
the parametric choices expressed in the null subject parameter are explored
by the child learner, and (9) the learning principle that can be used to
select the correct value of the parameter on the basis of positive-only data,
a process that may be termed positive identification.

(10) Positive Identification
     A parameter value is positively identified just in case all observed
     positive instances are consistent with that value.
However, learning is in part determined externally by the data presented
to the learner, and positive identification can be fully successful only if
the data are comprehensively consistent with one of the set of values
associated with the parameter, which, we already know, cannot be gu-
aranteed under (6). The following consequence immediately ensues: if the
correct language is any language other than L(a), the selection of each
one might lead to overgeneralisation within that language, as shown in
the following paragraphs.
Suppose L(b) is the ambient language. Recall that in L(b) all argumental
subjects must be overt. Then, when presented with sentences with null
nonarguments, the learner is bound to conjecture the parametric identity
of the language. But notice that, given the no-negative evidence assumption,
there is nothing that would prevent him from overgeneralising within the
correct language, producing sentences with overt nonarguments, as exem-
plified here with respect to German.

(11) *Heute sind es zwei Kinder gekommen
     'Today there came two children'

Now consider that L(c) is the language to be learned. The presence in
the data of sentences with both kinds of null nonreferential subjects should
be sufficient to rule out L(a) and L(b), pointing to L(c) as the most likely
choice. But the learner might still overgeneralise within L(c). That is, the
learner might erroneously regard sentences with overt quasi-argument
subjects, such as the Yiddish example (12), as well as those with overt
nonargument subjects, to be in L(c).

(12) *Haynt geyt es a regn
     'It's raining today'

Likewise in the case of L(d). The appearance in the data of null referential
subjects should straightaway rule out L(a), L(b), and L(c). But the problem
of possible overgeneralisation to overt pleonastics within L(d) is still there.
Considering that quite often expletives are homophonous with certain
referential pronouns or in some way semantically nonempty items (e.g.
Yiddish es and English it have referential counterparts; notably, Welsh
hi is 3rd person feminine singular), in principle overgeneralisation can occur
even though overt expletives are totally absent in the language being learned,
as they are in Italian. 9 This type of overgeneralisation may simply consist
of an expectation on the part of the learner that non-null pleonastic subjects
are possible, without any definite knowledge of the particular lexical form(s)
they would actually assume.
The upshot is that a learner armed solely with (6), (8) and (9) may
not be guaranteed to be entirely successful. Though absolutely central to
the process of parameter fixation, positive identification could well prove
to be insufficient, since, owing to the nonexistence of certain predicted
structures, the principle (9) will not ensure that the learner's language
as defined by the parameter (6) is extensionally identical to the ambient
language. To put it more succinctly, (9) may not be able to exactly identify
the ambient language. Recall that the learning procedure (9) is designed
to be driven solely by positive-only evidence. It seems then that although
such evidence is effective in positively identifying a language from among
the four possible ones, it is not effective in exactly identifying that language.
Beyond the point in linguistic development where a parameter is fixed,
say following the application of the learning procedure (9), the learner
might have to employ further inferencing strategies that are essentially
data-driven; that is to say, exact identification would require that in relation
to the missing forms the learning system must be completely guided by
the record of linguistic examples made available to him, rather than
exclusively by the specific innate entities modelled in the form of (6), (8)
and (9) above.

5. EXACT IDENTIFICATION

Patently, whatever exact identification may involve, under the present
approach it will presuppose positive identification. Further, it will require
that the learner should discover the exact extent of the difference between
the language predicted by the target parameter value and the "incomplete"
ambient language. Suppose that the language defined by the target value
(a) of a parameter is L(a), and the corresponding ambient language is
L′(a). Then exact identification may be defined in this manner.

(13) Exact Identification
     A parameter value (a) is exactly identified just in case
     a. it is positively identified; and
     b. the difference between L(a) and L′(a) is known.

Keeping this definition in mind, let us now try to ascertain the mechanisms
whereby exact identification can come about.
A solution to the projection problem described in the last section, which
could ensure exact identification, would be for the learner to undergeneralise
within the conjectured language. This can be accomplished by noticing
the nonoccurrence of the relevant types of overt pleonastic subjects in
the "incomplete" data, in other words by resorting to what Chomsky called
indirect negative evidence (Chomsky (1981), 8-9; see also Lasnik (to appear),
Oehrle (1985), Wexler (1987)). The basic idea is as follows. If there are
structures that are predicted to exist by the learner's grammar, and that
do not appear in the stream of data after n positive instances (where n
is a sufficiently large number indicating the size at a given time of the
ever expanding corpus of data), then the learner is capable of the negative
inference that these structures are in fact missing from the ambient language.
Under our approach exact identification is considered to come into
operation once the core learning process, i.e. positive identification, has
taken place, its task being to bridge any gaps between the predictions
made by the parameters of Universal Grammar and the linguistic data
actually exhibited. Whereas positive identification is a simple selective
process that can occur on the basis of a small number of triggering instances,
exact identification in addition consists of (relatively general) inferential
processes which involve much closer and extensive inspection of the data,
including the keeping of a record of certain examples. Clearly, the use
of indirect negative evidence in a manner akin to that outlined above
will indeed be sufficient to exactly identify the correct language from data
that are incomplete with respect to the parameter.
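
The inferential step involved in exact identification can likewise be sketched in outline (Python, illustrative only; the corpus-size threshold n and the bookkeeping are assumptions of the sketch, not claims about the actual learning mechanism): once a value has been positively identified, structure types it predicts that remain unattested after a sufficiently large number of positive instances are inferred to be gaps in the ambient language.

```python
# Illustrative sketch of exact identification via indirect negative evidence:
# structure types predicted by the positively identified value but unattested
# after n positive instances are inferred to be gaps in the ambient language.
def exact_identification(predicted_frames, corpus, n=5000):
    """Return (attested, inferred_gaps) once the corpus has reached size n."""
    if len(corpus) < n:
        return None                     # too little evidence to infer any gaps yet
    attested = {f for f in predicted_frames if f in corpus}
    return attested, set(predicted_frames) - attested

# e.g. an Italian-type learner whose value (d) predicts overt pleonastics,
# exposed to a corpus in which they never occur:
predicted = {("nonargument", "overt"), ("nonargument", "null"),
             ("referential", "overt"), ("referential", "null")}
corpus = ([("referential", "overt")] * 3000
          + [("referential", "null")] * 1500
          + [("nonargument", "null")] * 500)
attested, gaps = exact_identification(predicted, corpus)
print(gaps)   # {('nonargument', 'overt')} -- overt expletives inferred to be absent
```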

6. IS IMPLICIT NEGATIVE EVIDENCE REALLY NECESSARY?

Although the use of implicit negative evidence is logically feasible, it has
yet to be established whether it is empirically plausible. It is possible to argue
that indirect negative evidence is not really necessary, since alternative
accounts based on positive-only evidence are usually possible. In relation
to the null subject parameter (6), one such alternative may be somewhat
like this.
Let us presume that the learner intrinsically expects the data to be deficient
due to some quasi-linguistic considerations. To put it more concretely,
assume that the following constraint, which stipulates the absence of
redundant forms such as expletive subjects unless they are observed in
positive data, is part of the learner's a priori baggage (here "redundant"
means syntactically optional and functionally useless):

(14) Redundancy Constraint
Assume redundant forms to be absent unless they are exemplified
in positive data.

Suppose that the absence of overt expletives in null subject languages is
the norm, then this constraint, comparable but not identical to the Avoid
Pronoun Principle, would make it possible for the parameter to be correctly
fixed from positive-only evidence. Is the problem really resolved if we
assume a constraint like (14) to be part of the pre-existing linguistic and
learnability structures? I would like to argue that there are a number of
reasons why such a move could be ill-advised; instead, allowing some
measure of implicit negative evidence might be a better option.
By itself the constraint in question (or any other similar assumption)
is not independently motivated, its only justification being that it salvages
the no-negative evidence condition. Though ensuring a simple view of
evidence, it does not necessarily simplify the overall learning system, as
it will, for one thing, make the initial state more intricate. As a built-
in device for undergeneration, it will counteract the parameter (6) whose
primary task is to generate well-formed structures. Further, in terms of
markedness the constraint appears to be in conflict with the parameter.
Whereas the latter defines a language with overt pleonastics, e.g. L(a),
to be less marked, the constraint in question implies that, owing to some
pragmatically motivated component of Universal Grammar, overt pleon-
astics are less acceptable than the nonlexical ones. I think there are
compelling reasons for postulating that grammatical principles and prag-
matic tendencies should never be conflated, since a maximally simple theory
will be obtained by excluding pragmatic processes from the characterisation
of Universal Grammar (see Smith (1989), also this volume, for a discussion
of the role of pragmatics in acquisition). In sum, it is urged that the type
of evidence under discussion should be added to the class of possible
solutions to the learnability problems. 10

7. DEVELOPMENTAL IMPLICATIONS

In the foregoing pages I have discussed at some length the incompatibility
of the parameter stated in (6) with one of the two major current approaches
to parameter fixation, i.e. the set-theoretical approach. I have shown that
the Subset Principle does not yield desirable results in guaranteeing
learnability in this case. Let us now turn to an appraisal of the other
major viewpoint, namely Hyams' (1986, and elsewhere) developmental
approach.
The parameter (6) encapsulates the way the knowledge of the null subject
phenomenon could be determined from primary linguistic evidence, without
necessarily making clear predictions regarding the process of acquisition.
In general I do not regard a learnability theory as being a theory of
development. Nevertheless, such predictions as can be deduced from it
in the present context are as follows. In the simplest case each value of
the parameter in (6) can be fixed straightaway, provided the relevant
evidence is available. In the more complicated situations, limited to the
learning of the marked values, there is a possibility that one or more
values which are less marked than the correct one could be provisionally
chosen first. However, it is not predicted that any provisional choices will
necessarily be manifested in the form of temporal stages of acquisition,
as such choices, if made at all, may not last long enough to impinge on
the production data. Similarly, considering that early child grammars allow
even referential subjects to be null and that pleonastics are functionally
redundant anyway, we do not expect the effect of the possible overge-
neralisation to overt pleonastics to be clearly reflected in child language
data.
This is in contrast with the approach advocated by Hyams (1986), which
rests on the claim that not only can provisional wrong choices be made,
but that in some cases they must be made. In particular, Hyams contends
that the null subject parameter is first fixed at the null value regardless
of the nature of the language to be learned. Thus children learning languages
as diverse as Italian, German, and English are supposed to start out with
the assumption that the language they are learning is a null subject one,
and they stay with this assumption until restructuring, for whatever reason,
occurs. More importantly from the present point of view, Hyams' (1986)
account depends crucially on the assumption that the parameter yields
(partially) intersecting languages, so that crucial triggers (sentences con-
taining pleonastic subjects in the case of non-pro-drop languages, and
sentences without overt referential subjects in the case of pro-drop langua-
ges) are always available. However, as far as (6) is concerned, among
language types predicted by it there is at least one instance of proper
inclusion, i.e. that between L(c) and L(d), which is bound to create an
insurmountable "subset problem" (Manzini and Wexler (1987)) if, in
keeping with the spirit of Hyams' analysis, L(d) is considered to be the
initial choice by the child. I, therefore, conclude that, to the extent that
(6) is a correct model of parametric variation with regard to null subjects,
Hyams' (1986) view is indefensible on learnability-theoretic grounds.
In Hyams (1987) the analysis presented in Hyams (1986) is extensively
revised. The move is motivated by certain inadequacies of the hypothesis,
presented in Hyams (1986), that in null subject languages Agr = PRO
(see Guilfoyle (1984), Hyams (1987), Lebeaux (1987), Radford (1988,
forthcoming), for critical discussion). For example, the previous analysis
did not capture the fact that in the acquisition of non-null subject languages
obligatory lexical subjects and inflections for tense, etc., appear roughly
at the same time. It also failed to take into account the absence of inflections
in early null subject English, although early Italian, similar as it is in
respect of the setting of the null subject parameter, is known to have
inflected paradigms. This would imply that in early Italian, but not in
early English, identification of features of the null subject is possible through
overt agreement markings.
The current version adopts a new linguistic analysis based on the notion
of morphological uniformity (Jaeggli and Hyams (1988), Jaeggli and Safir
(1989)). Nonetheless, Hyams' basic claim regarding acquisition is the same:
the null subject option is preferred over the non-null subject option. The
former now corresponds to the hypothesis on the part of the child that
the language to be learned is characterised by morphologically uniform
paradigms, that is to say, is either consistently inflected (like Italian), or
consistently uninflected (like Chinese). If the ambient language is a language
like Italian, then positive data readily confirm the initial option. If, on
the other hand, the ambient language has a non-uniform inflectional system,
e.g. English, then also the child first assumes that the language is
morphologically uniform, but in the sense of Chinese rather than that of
Italian. (Recall that under the Agr = PRO hypothesis early English was
considered to be like Italian, not like Chinese!). Early English is further
supposed to be like Chinese in respect of the identification mechanism
involved, which is now considered to be the binding of a subject pro by
a null Topic, following Huang's (1984) analysis of Chinese, with the
difference that a variable is not admitted in the subject position. According
to Hyams, the new analysis is supported by the fact that in the acquisition
of richly and uniformly inflected languages like Italian, Polish and Japanese
the inflections are acquired quite early, in contrast with the rather late
acquisition of inflections in the case of morphologically poor and non-
uniform languages (e.g. English). Since in these latter obligatory lexical
subjects and morphological markings emerge almost concurrently, sug-
gesting a possible connection, it is proposed that when the child discovers
that English is morphologically non-uniform, he automatically infers that
null subjects are not available in this language. I decline to comment further
on this revised account until further research succeeds in establishing an
explicit causal link between inflectional uniformity and the availability
of null subjects.
Whatever version of Hyams' approach is adopted, the problem remains
that there is no explanation for the supposition that the null choice must
initially be made in all cases, although when the ambient language is a
non-null subject one the evidence confirming this fact, whether overt
expletives or inflections, is always available to the learner (see, however,
Cook, this volume). Even if we suppose that the parameter is indeed
binary-valued, and that it does not conform to the Subset Condition, as
long as there is a good explanation for why early speech tends to lack
overt subjects, such as the one offered by Radford (1988, 1990), there
is no conceivable reason to claim that one choice or the other is made
first, as in principle both choices should be equally accessible. (See also
Saleemi (in preparation), where it is argued that parameter fixation is
primarily a "learning" process, and maturation primarily a developmental
one).

8. BINDING PARAMETERS AND MARKEDNESS

In this paper I have proposed a metric for parameter values which regards
the range of grammatical categories affected by a value as being criterial
for evaluating markedness. The idea is to dispense with the specific
extensional measure of markedness put forward by Wexler and Manzini.
It is natural to ask if our approach is extendible to the binding parameters
as defined by these authors (also cf. Koster (1987), 319ff.). If it is, that
would provide additional support for our contention that markedness is
not a function of the application of certain set-theoretical constructs to
the languages generated by parameter values.
It is relatively easy to show that the proper antecedent parameter of
Wexler and Manzini lends itself to an intensional view of markedness rather
straightforwardly. Recall that the parameter has two values, (a) and (b).
Under value (a) the set of proper antecedents contains only subjects, and
under value (b) the set of proper antecedents includes both subjects and
objects. Thus the set of proper antecedents defined by value (b) is a proper
superset of the set of proper antecedents defined by value (a), and value
(b) includes value (a) exactly as the markedness condition (8) requires.
What about the governing category parameter? I believe that this parameter
too is compatible with the present approach. I begin by pointing out a
weakness in the formulation of the parameter, repeated in (15) in the form
given in Wexler and Manzini (1987).

(15) Governing Category Parameter
γ is a governing category for α iff γ is the minimal category that
contains α and
a. has a subject; or
b. has an Infl; or
c. has a Tense; or
d. has an indicative Tense; or
e. has a root Tense.

By way of demonstration I shall focus largely on anaphors, but the argument
is intended to apply to pronominals as well. To exemplify the weakness
mentioned above, I shall only refer to the Italian anaphor sè and the
Icelandic anaphor sig; it should be clear, though, that, mutatis mutandis,
the point is equally relevant to all other types of marked anaphors. Under
the parameter in (15), the correct definitions of governing category for
sè and sig require the presence of Infl and indicative tense respectively.
But consider the following pairs of Italian (16) and Icelandic (17) examples
(adapted from Manzini and Wexler (1987)).

(16) a. Alice guardò i [NP ritratti di sè_i di Mario_j]
        Alice looked at portraits of Refl of Mario
        'Alice looked at Mario's portraits of Refl'
     b. [NP ritratti di sè_i di Mario_j]
        'Mario's portraits of Refl'

(17) a. Jón heyrði [NP lýsingu Maríu_j af sér_i]
        Jon heard description Maria(gen) of Refl
        'Jon heard Maria's description of Refl'
     b. [NP lýsingu Maríu_j af sér_i]
        'Maria's description of Refl'

According to Wexler and Manzini the reflexives in (16a) and (17a) are
not bound in the minimal governing category containing them and their
antecedents, i.e. the NP, since it lacks Infl/tense, but in the maximal
governing category, i.e. the sentence containing the NP and Infl/tense.
I assume that it is legitimate to speak of the grammaticality of any kind
of maximal projection, not just sentences; this would be in keeping with
the well-established assumption that grammatical processes are essentially
category-neutral, and that the category S has no privileged status in the
theory of Universal Grammar. Now, if (16b) and (17b) are also grammatical
alongside (16a) and (17a), as I presume they are, then the relevant definitions
of governing category under the parameter (15) are inadequate for these
constructions, as under these definitions both (16b) and (17b) are wrongly
predicted to be ill-formed. Note that in their independent capacity the
NPs in (16b) and (17b) do not contain Infl or tense, respectively, as required
by (15), so they can be grammatical only by virtue of being a part of
an S. I have delineated the problem only with respect to NPs, but the
logic is equally relevant to small clauses - which, just like NPs, can be
governing categories in their own right under no value other than (a) -
and indeed to all those types of governing categories which can be embedded
in another. The point I wish to make is that, although there is no way
out of this dilemma under the Subset Condition, a solution is possible
precisely in terms of the markedness condition (8).
Suppose that the set of governing categories permitted in Universal
Grammar contains five types corresponding to the five values of the
parameter (15). For convenience I shall write a governing category with
a subject as GC(A), a governing category with an Infl as GC(B), a governing
category with a tense as GC(C), a governing category with an indicative
tense as GC(D), and a governing category with a root tense as GC(E).
One could argue that the parametric variation captured in (15) is not to
be explained by stipulating a different single governing category related
to each value, but by a gradual increase in the number of governing
categories allowed by the marked values. Thus, value (a) permits only
one type of governing category, which is GC(A). On the other hand under
value (b) both GC(A) and GC(B) are legitimate governing categories, and
so forth. The governing categories so permitted under each of the five
values are listed in (18).

(18) a. GC(A)
b. GC(A), GC(B)
c. GC(A), GC(B), GC(C)
d. GC(A), GC(B), GC(C), GC(D)
e. GC(A), GC(B), GC(C), GC(D), GC(E)
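
The nesting just proposed can be stated mechanically. The following Python sketch is offered purely as an illustration, under assumptions introduced here (governing-category types treated as atomic labels, and the markedness condition (8) rendered as proper inclusion of the permitted sets, which is how its effect is described in the text); it makes no claim about the exact formulation of (8).

# Governing category types permitted under each value of (15), as in (18).
PERMITTED = {
    "a": {"GC(A)"},
    "b": {"GC(A)", "GC(B)"},
    "c": {"GC(A)", "GC(B)", "GC(C)"},
    "d": {"GC(A)", "GC(B)", "GC(C)", "GC(D)"},
    "e": {"GC(A)", "GC(B)", "GC(C)", "GC(D)", "GC(E)"},
}

def more_marked(x, y):
    """x counts as more marked than y if the governing category types
    permitted under x properly include those permitted under y."""
    return PERMITTED[y] < PERMITTED[x]   # proper-subset test on Python sets

# The five values form the markedness hierarchy (a) < (b) < (c) < (d) < (e):
values = "abcde"
assert all(more_marked(values[i + 1], values[i]) for i in range(4))
print(more_marked("e", "a"))   # True: (e) is the most marked value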

The proposal just outlined would require a slight revision in the statement
of the parameter
(15), but the result would be in accord with the definition of markedness
in (8). If such a revision is justified, then one would be able to say that
the reflexives in all the examples listed in (16) and (17) are bound within
NPs, i.e. within their minimal governing categories. We can now define
a minimal governing category for anaphors as in (19a), and that for
pronominals as in (19b).

(19) Minimal Governing Category
a. γ is a minimal governing category for α, α an anaphor, if it is
   the minimal maximal projection containing α and its antecedent.
b. γ is a minimal governing category for α, α a pronominal, if it
   is the minimal maximal projection containing α and the antecedents
   from which α is free.

We can thus hold that an anaphor is always bound, and a pronominal
is always free, within its minimal governing category. In conclusion, I
believe that this revised view of the binding parameters is preferable, and
that it lends support to the notion of markedness already shown to be
relevant to the null subject parameter.
FOOTNOTES

*I am deeply indebted to Martin Atkinson and Vivian Cook for their constant help and
encouragement while the research in part reported here was in progress, and to Michael
Jones, Ken Safir and Ken Wexler for valuable comments on the earlier versions of this
paper.
I also wish to acknowledge the helpful comments I received from audiences at Stanford
and Essex.

1. A word of caution is necessary at this point. It is not certain that the distinctions between
the three types of subjects are neatly held across most languages. It is quite possible that
a class of predicates that appears in one category in one language appears in another category
in a different language. For instance, in some languages the atmospheric-temporal predicates
are expressed without recourse to quasi-arguments. Consider the following Urdu (i), Jordanian
(ii) and Standard (iii) Arabic examples, where the italicized subject is a referential NP.

(i) Barish ho rahi hai
rain happening is
'It is raining.'
(ii) (Iddinya) bi-tsatti.
the-world is raining
'It is raining.'
(iii) Assama?u tomtir.
the-sky is raining
'It is raining.'

This is, in a way, consistent with the tendency among the subjects of atmospheric-temporal
predicates in languages like English to behave as if they are somewhat like referential
arguments, in that they appear to be thematic. In the text I shall presume, with the proviso
regarding cross-linguistic lexical variation in mind, that the distinctions between referential
arguments, quasi-arguments, and nonarguments are in general well motivated.
2. Lack of syntactic Case, rather than government, may be held to be responsible for the
distribution of PRO as well as pro. If this is correct, then one can assert that, with the
exception of variables, all empty categories lack syntactic Case, and that they get Case
universally at LF. This would straightforwardly account for the visibility of PRO at LF.
3. This and all subsequent undated references are to both Manzini and Wexler (1987) and
Wexler and Manzini (1987).
4. Throughout this paper, parentheses in examples indicate optional pro-drop. Further, an
asterisk outside the parentheses suggests that pro-drop is not available, and an asterisk inside
the parentheses denotes obligatory pro-drop.
5. The German facts are not as simple as shown in the text, and therefore deserve some
comment. Specifically, it is not the case that lexical nonarguments can be freely dispensed
with. Consider the following examples.

(i) *(Es) wurde ein Mann getötet (Safir (1984))
there was a-NOM man killed
(ii) *(Es) scheint, daß er kommt.
it seems that he comes

In neither of these examples is the expletive subject es allowed to drop. I assume that in
German subject-initial sentences the subject must appear in order to fulfil the V2 constraint.
Thus, in German nonarguments can be omitted as long as they are not required by an
independent factor, such as the V2 rule; according to Safir (1984), "for some speakers this
prediction is roughly borne out" (p. 216). However, one should note that some further
restrictions, of a relatively less systematic nature, appear to exist that counteract the possibility
of a null subject in certain cases; see Safir (1984, 1985) and Travis (1984).
6. As in German, pleonastic pro-drop in Icelandic is not without exceptions; see Platzack
(1987).
7. A similar view is adopted in Jaeggli and Safir (1989), who for example maintain that
underlyingly German and Icelandic are null subject languages in the same sense as Italian
and Spanish, but that in them the presence of the V2 effect blocks the process of identification,
and therefore the availability of referential null subjects. Interestingly, for Adams (1987)
the V2 effect is one of the two mechanisms that make pro-drop possible, the other being
Romance inversion; on her view the loss of V2 in Old French was responsible for the change
in the language from a pro-drop to a non-pro-drop character.
8. I assume, following Rothstein (1983), that "subject" and "predicate" are syntactic terms,
and not merely derivative functional categories; hence the appearance of the term "subject"
in the definition (6) in the text.
9. That expletives have homophonous referential or otherwise meaningful analogues is of
course not purely accidental; Nishigauchi and Roeper (1987) adduce evidence suggesting
that expletives are bootstrapped via their meaningful counterparts.
10. The use of implicit negative evidence is also faulted on the grounds that it would rule
out certain highly complex well-formed structures that tend to be very rare, for example
the fully expanded auxiliary phrase in English (Wexler (1987)). However, Berwick and Pilato
(1987) show that a machine induction model designed to learn the English auxiliary system
can infer such rare examples from relatively simple ones likely to be encountered quite
frequently.

REFERENCES

Adams, M. 1987. From Old French to the Theory of Pro-Drop. Natural Language and Linguistic
Theory 5. 1-32.
Aoun, J. 1985. A Grammar of Anaphora. Cambridge, Massachusetts: MIT Press.
Awbery, G. M. 1976. The Syntax of Welsh: a Transformational Study of the Passive. Cambridge:
Cambridge University Press.
Bennis, H. and L. Haegeman. 1984. On the Status of Agreement and Relative Clauses in
West-Flemish. In W. de Geest and Y. Putseys (eds.) Sentential Complementation. Dordrecht:
Foris.
Berwick, R. C. 1985. The Acquisition of Syntactic Knowledge. Cambridge, Massachusetts:
MIT Press.
Berwick, R. C. and S. Pilato. 1987. Learning Syntax by Automata Induction. Machine Learning
2. 9-38.
Borer, H. 1984. Parametric Syntax: Case Studies in Semitic and Romance Languages. Dordrecht:
Foris.
Bouchard, D. 1984. On the Content of Empty Categories. Dordrecht: Foris.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1982. Some Concepts and Consequences of the Theory of Government and Binding.
Cambridge, Massachusetts: MIT Press.
Chomsky, N. 1986a. Knowledge of Language: its Nature, Origin and Use. New York: Praeger.
Chomsky, N. 1986b. Barriers. Cambridge, Massachusetts: MIT Press.
Fabb, N. 1984. Syntactic Affixation. Doctoral dissertation, MIT.
Guilfoyle, E. 1984. The Acquisition of Tense and the Emergence of Lexical Subjects in
Child Grammars of English. The McGill Working Papers in Linguistics 2. 20-30.
Hasegawa, N. 1985. On the So-called 'Zero Pronouns' in Japanese. The Linguistic Review
4. 289-341.
Huang, C.-T. J. 1984. On the Distribution and Reference of Empty Pronouns. Linguistic
Inquiry 15. 531-574.
Hyams, N. M. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Hyams, N. M. 1987. The Setting of the Null Subject Parameter: a Reanalysis. Paper presented
to the Boston University Conference on Child Language Development.
Jaeggli, O. and N. M. Hyams. 1988. Morphological Uniformity and the Setting of the Null
Subject Parameter. NELS 18.
Jaeggli, O. and K. Safir. 1989. The Null Subject Parameter and Parametric Theory. In O.
Jaeggli and K. Safir (eds.) The Null Subject Parameter. Norwell: Kluwer.
Koster, J. 1987. Domains and Dynasties: the Radical Autonomy of Syntax. Dordrecht: Foris.
Lasnik, H., to appear. On Certain Substitutes for Negative Data. In H. Lasnik, Essays on
Restrictiveness and Learnability. Dordrecht: Reidel.
Lasnik, H. and M. Saito. 1984. On the Nature of Proper Government. Linguistic Inquiry
15. 235-289.
Lebeaux, D. 1987. Comments on Hyams. In Roeper and Williams (1987).
Lightfoot, D. 1989. The Child's Trigger Experience: Degree-0 Learnability. Behavioral and
Brain Sciences 12. 321-375.
Manzini, M. R. and K. Wexler. 1987. Parameters, Binding Theory, and Learnability. Linguistic
Inquiry 18. 413-444.
McCloskey, J. and K. Hale. 1984. On the Syntax of Person-Number Inflection in Modern
Irish. Natural Language and Linguistic Theory 1. 487-533.
Nishigauchi, T. and T. Roeper. 1987. Deductive Parameters and the Growth of Empty
Categories. In Roeper and Williams (1987).
Oehrle, R. T. 1985. Implicit Negative Evidence. Ms. Department of Linguistics, University
of Arizona, Tucson.
Picallo, M. C. 1984. The Infl Node and the Null Subject Parameter. Linguistic Inquiry 15.
75-102.
Platzack, C. 1987. The Scandinavian Languages and the Null-Subject Parameter. Natural
Language and Linguistic Theory 5. 377-401.
Radford, A. 1988. Small Children's Small Clauses. Transactions of the Philological Society
86. 1-46.
Radford, A. 1990. Syntactic Theory and the Acquisition of Syntax: the Nature of Early Child
Grammars of English. Oxford: Basil Blackwell.
Rizzi, L. 1982. Issues in Italian Syntax. Dordrecht: Foris.
Rizzi, L. 1986. Null Objects in Italian and the Theory of pro. Linguistic Inquiry 17. 501-
557.
Roeper, T. and E. Williams (eds.) 1987. Parameter Setting. Dordrecht: Reidel.
Rothstein, S. D. 1983. The Syntactic Forms of Predication. Doctoral dissertation, MIT
(Reproduced by IULC, Bloomington, Indiana 1985).
Safir, K. 1984. Missing Subjects in German. In J. Toman (ed.) Studies in German Grammar.
Dordrecht: Foris.
Safir, K. 1985. Syntactic Chains. Cambridge: Cambridge University Press.
Safir, K. 1987. Comments on Wexler and Manzini. In Roeper and Williams (1987).
Saleemi, A. P. 1988a. Language Learnability and Empirical Plausibility: Null Subjects and
Indirect Negative Evidence. Papers and Reports in Child Language Development 27. 89-
96.
Saleemi, A. P. 1988b. Learnability and Parameter Fixation: the Problem of Learning in the
Ontogeny of Grammar. Doctoral dissertation, University of Essex. (To be published by
Cambridge University Press).
Saleemi, A. P., in preparation. Choice and Maturation in Language Learnability and
Development.
Travis, L. 1984. Parameters and Effects of Word Order Variation. Doctoral dissertation, MIT.
Wexler, K. 1987. On the Nonconcrete Relation between Evidence and Acquired Language.
In B. Lust (ed.) Studies in the Acquisition of Anaphora, Vol. II: Applying the Constraints.
Dordrecht: Reidel.
Wexler, K. and M. R. Manzini. 1987. Parameters and Learnability in Binding Theory. In
Roeper and Williams (1987).
Second Language Learnability
Michael Sharwood Smith
University of Utrecht

1. INTRODUCTION

The aim of this paper will be to examine second language (L2) research
in recent years with particular reference to how the subconscious devel-
opmental processes underlying non-native language development are cur-
rently viewed. Since second language acquisition has by now been shown
to be a highly complex and poorly understood process not depending simply
on habit formation or indeed on the deliberate learning of rules and
vocabulary items, it is natural to ask whether the Chomskyan notion
of learnability is relevant for L2 research (for earlier discussions of this
topic, see, for example, papers in Pankhurst et al. 1988).
In the preliminary sections, I will set the scene for a discussion of second
language learnability by giving a brief sketch of the development of ideas
in this field over the last two decades. I will then summarise the main
points of view on learnability and describe the kind of research that is
currently being done. Since there is an explosion of literature on this subject
(see, for example, papers in Flynn and O'Neil 1988, Pankhurst et al. 1989,
and the last five volumes of Second Language Research), the
aim here will simply be to give an informative impression of the situation
with references to some of the important work in the field. In my
illustrations, I will use the better known (rather than the most recent)
theoretical analyses of various aspects of grammar such as subjacency and
adjacency, the analysis itself being immaterial for present purposes.

1.1. The second language learner as a constructor of mental grammars

The notion of learner grammars as mental phenomena dates back to the
late sixties when the field of second language acquisition was just starting
up as an independent area of activity. Researchers pointed to the systematic
linguistic behaviour of learners of a second or other language (in other
words learners of non-native languages) and suggested that they should
be viewed as operating linguistic systems all their own. Because not just
beginners were looked at but also learners who possessed quite sophisticated
intermediate systems, the idea of learner grammars took hold more quickly
than it did in first language acquisition, where researchers have tended
to focus on the very early stages of grammatical development. People like
Corder and Selinker (Corder 1967, Selinker 1972) called our attention to
the possibility of viewing L2 learner language such as "the Dutch of English
learners of Dutch" for example, as possessing systematic features which
can be studied in their own right rather than as imperfect reflections of
some norm, in this particular case "educated native speaker Dutch". The
term "interlanguage" (IL) was adopted to cover this general
idea following a seminal paper by Larry Selinker (1972) in which he claimed
that IL was the outcome of a set of special strategies that characterised
second language learning and hence set it apart from the language produced
by first language learners. As the learning process unfolded so the learner
passed through a series of different IL systems.
For Selinker, and others more recently, the crucial fact in determining
the qualitative difference between first and second language acquisition
mechanisms was the fact of fossilisation in IL systems, namely the typical
cessation of development prior to the attainment of native-speaker norms
despite repeated exposure and practice (definition adapted from Selinker
1972). Other views include the idea that fossilisation is the result of various
external factors such as mother tongue influence, lack of motivation,
inhibitions (see, for example, Dulay et al. 1982, White 1985, 1988) or a
conflict between the learner's original language learning mechanisms and
more general cognitive learning strategies not available to the younger
child (see Felix 1985).
In the discussion that follows, IL grammars will be treated as a particular
type of 'developing grammar'. That is, in contradistinction to the position
adopted by Selinker, the possibility that child (LI) grammars and inter-
language (L2) grammars have something essential in common will be kept
open.

1.2. LI and L2 acquisition as special cases of the same process

The term coined in the seventies by Heidi Dulay and Marina Burt to
describe the organising principles that (by hypothesis) create LI or L2
grammars from primary linguistic input was "creative construction". The
idea was implicated in their successful attempt to discredit the view that
L2 learning was basically overcoming LI habits or, to put it in non-
behaviourist terms, a process of gradually transforming mother tongue
(i.e. Ll)-based cognitive structures into ones which conformed to the
information coming from the environment. Dulay and Burt, and later Steven
Krashen, pursued the line that L2 creative construction took place without
recourse to the mother tongue: this meant that all Ll-patterns observed
in L2 production ought to be attributable to performance constraints, i.e.
falling back on L1 resources out of sheer expediency rather than as
something which actually reflected representations of the L2 data in the
learner's mind (see Dulay and Burt 1974, Dulay et al. 1982, Krashen 1976,
1982, 1985).
It should be noted that the creative construction approach differed in
two important ways from the interlanguage approach advocated by Selinker
(1972). Firstly, as implied above, Selinker assumed that L2 acquisition
was qualitatively different from LI acquisition - this would explain
fossilisation in L2 development. Secondly, LI influence had a more
important status as one of the "central processes" in L2 development
(Selinker 1972). Unlike interlanguage, creative construction was a term
which specifically made reference to the Chomskyan approach to LI
acquisition. Nonetheless, until very recently, no attempt was made by the
proponents of this approach to actually apply Chomskyan models to
research questions in L2 research (cf. Schwartz 1986). Also, creative
construction research looked at developmental steps in particular areas
as though development progressed steadily towards the target within small
subsystems like "negation" or "wh-questions" and not as an interactive
phenomenon and did not therefore entertain the idea of larger-scale systems
developing in stages, as is implied in the interlanguage model. However,
Bialystok and Sharwood Smith (1986) did suggest that this interlanguage
approach is perfectly compatible with the idea that first and second language
acquisition are driven by the same basic processes despite variation in
the ultimate product.
The claim that was advanced by the behaviourists in the sixties was
that almost all learning difficulty originated in the difference between LI
and L2 and that this could be explained as a question of old, interfering
habits to be unlearned (see Lado 1967). In contrast to this, creative
construction theorists, as has just been indicated, relegated LI grammatical
'transfer' to the area of performance errors. They pointed to evidence
of common orders of development across learners with different LI
backgrounds - the well-known "morpheme order studies" based on Roger
Brown's early work on LI acquisition - and they claimed that this evidence
demonstrated the irrelevance of the LI in the creation of new grammars
(Brown 1973, Dulay et al. 1982). If LI background was important, they
argued, then the developmental order for the acquisition of such mor-
phosyntactic phenomena such as third person singular -s, copula be,
perfective have and the progressive V-ing in English would vary syste-
matically with the LI background of the second language learner. A
comparison between various observed orders for L2 acquisition and LI
(morpheme) orders observed by Brown and others revealed suggestive areas
of commonality and seemed to support the claim that LI and L2 acquisition
were the outcome of the same basic process, i.e. creative construction.
Since the structural areas chosen for investigation in L1 research ("child
language") were selected on the basis of high frequency (allowing for ease
of data collection) and focussed on the early and linguistically simpler
utterances of learners, the claims of L1/L2 identity rest on but a small
area of the language system. Even allowing that the results strongly
suggested interesting parallels between L I and L2 development, there is
still a great deal of evidence needed to properly support this hypothesis.
In fact, as a result of the failure of the creative construction theorists
to convincingly substantiate their claim concerning the irrelevance of L I
influence in grammatical development, received opinion has backtracked
somewhat on this. Many researchers now view L I influence as something
which certainly does not happen automatically and with all aspects of
interlanguage but, when it does, may have subtle, interesting and unpre-
dictable consequences by sheer virtue of the complexities involved, com-
plexities that a deeper understanding of the linguistic problems has made
us appreciate more fully (see Gass and Selinker 1983, Kellerman and
Sharwood Smith 1986, Odlin 1989).

2. LINGUISTIC THEORY AND SECOND LANGUAGE ACQUISITION

Since the failure of Bloomfieldian structural linguistics to help applied
linguists solve the problem of predicting learners' errors, researchers rather
shunned the fine detail of theoretical linguistics in the seventies. However,
the eighties has seen a return to this particular sister discipline, this time
using Chomsky's Extended Standard Theory and, later, Government and
Binding Theory (Chomsky 1981). This has allowed L2 researchers to frame
a number of specific, theoretical questions, like the one concerning the
supposed naturalness of I L grammars (see Adjemian 1976 for an earlier
discussion of this issue). Other questions include the possible role of
linguistic markedness1 in real-time development and the degree to which
the grammatical "parameters" of the learner's L I (selected from the total
repertoire available for natural languages and set in specific ways) influence
the shape of the emerging L2 grammar.

2.1. L2 learnability

The learnability issue in the context of second language research may be
expressed as follows: however rich the communicative context of utterances
addressed to the language learner and however helpful the native speakers
are or teachers may be, there are subtle and complex features of human
languages that cannot be provided by the usual kind of input. This is,
of course, assuming that the language learning mechanisms do not operate
by building hypotheses and testing them out: such mechanisms would
require the learner to have constant access to negative evidence, for example
in the form of corrective feedback whenever needed. It is by no means
clear that the second language learning situation is so radically different
from the situation that obtains in mother tongue acquisition, despite
appearances to the contrary.
Even where correction and explanation is provided, that is, even in a
formal classroom where there may be a lot of metalinguistic discussion
of grammar, teachers do not typically provide the kind of structural
information that linguists would recognise as crucial for the setting of
(native) L2 parameters. There is also considerable doubt about whether
this information, were it available, would actually help develop the relevant
subconscious processes of grammar formation. It is not clear, for example,
that if learners were informed (on the basis of a particular theoretical
linguistic analysis of the problem area) about the number of bounding
nodes that were appropriate for the L2 grammar and actually understood
what that meant, that they would be able to do anything more than be
able (fairly laboriously) to identify correct and incorrect utterances from
a list. By the same token, an apparently simpler piece of information such
as "English observes strict adjacency, not argument adjacency like French"
with appropriate examples, may not automatically enable learners to have
native-like gut feelings about the acceptability of utterances in which strict
adjacency is violated (as in he eats often apples). Again, they may only
be able to come to a native-like judgement by performing a conscious
analysis of the relevant strings as they would if parsing sentences from
a dead language. Where there is no explanation, but still negative evidence
in the form of samples of L2 labelled clearly as incorrect, it ought to
be possible, again, by inductive learning, to work out the rules of the
grammar. Evidence that such feedback solves the poverty of the stimulus
problem is also scanty. For these reasons, it is possible to see L2 input
of whatever kind in the same light as L1 input, that is, as structurally
impoverished: whatever it does provide in semantic or pragmatic terms,
it does not, by hypothesis, furnish the learner with enough relevant and
"perceptible" evidence to work out certain subtle principles and constraints
that characterise the native-speaker grammar, i.e. as far as what is made
visible to the learning device is concerned, the grammar is "underdetermined"
by information coming from the environment.
If the hypothesised deficiencies mentioned above exist for L2 learners,
then it may accordingly be concluded that acquisition of particular crucial
aspects of L2 grammar takes place via an interaction between the evidence
provided by the input and a set of grammatical principles available to
all normal language learners which forms part of our genetic endowment,
i.e. Universal Grammar (UG). In addition, as, for example, White (op.cit.)
has argued, the learner's L1 may also play an important role in the setting
of L2 parameters.
Naturally, what is understood by parameters of UG is theory bound.
For example, prepositional languages vary in the degree to which extraction
of the NP out of the prepositional phrase is permitted. Preposition stranding,
generally seen as a marked phenomenon and perhaps representing the
marked value of a parameter of which pied-piping is the unmarked value,
may be epiphenomenal, and be a reflection of the operation of other
principles and parameters involving Empty Category Movement, direction
of government, and so forth (see overview in Van Buren and Sharwood
Smith 1985). Clearly, developmental researchers are going to have to keep
changes in linguistic theory in mind when formulating their hypotheses.
However, the notion of learnability provides a constant motivation for
such research, in that it is always possible to say that direct positive evidence
disconfirming some parameter-setting carried over from the LI is not
available to the L2 learner. Whatever the theoretical analysis is, French
learners of English will not encounter direct positive evidence to tell them
that interruption of the verb and its direct object NP is unacceptable in
native English except via stylistically motivated movement (heavy NP shift).
And "marked" English sentences exhibiting this stylistic property will
actually reinforce the mistaken (subconscious) assumption they may have
that English works like French in this respect (as in il mange souvent les
pommes = he eats often apples). By the same token, English learners learning
French will not encounter information in the input that a stranded
preposition such as qui parle-je à? is not one of the options open to Standard
French native-speakers as it is in English (who am I speaking to?). If learners
manage to acquire the native property despite the apparent deficiency of
the data, it is a matter for researchers to determine how they did this.
Was it because they received indirect positive evidence from some other
source enabling them to infer the property (see Hilles 1986, Zobl 1988
and Van Buren 1988) or was it because, after all, negative evidence allowed
them to reset the relevant parameters in their IL (see Rutherford and
Sharwood Smith 1986)? In this way, examining the nature of the input
from a learnability perspective allows research to formulate interesting
questions about IL grammars.

2.2. The initial L2 state: logical possibilities

By analogy with Chomsky's views on what the child brings to the
development of the LI, his biological linguistic endowment, it is important
to consider how the L2 learner's initial state might be characterised. Various
logical possibilities concerning this vital aspect of IL research were outlined
in Sharwood Smith 1988, and these are reviewed below (with some changes
in terminology).
2.2.1. The "UG by proxy" view


One possibility may be that common principles typical of all languages,
i.e. "universal grammatical" principles (principles of UG) have been
activated during LI acquisition to make the learner's first natural grammar,
and that some or all of these principles are, solely by virtue of this fact,
transferred to the L2 in the form they have taken in the LI grammar.
In this way, the LI grammar serves as an initial template for the L2 system
and properties of natural grammars that were part and parcel of the LI
are passed on to the initial IL. Learners then have to restructure those
aspects of the LI-based IL grammar which turn out to be specific to the
LI in the light of new hypotheses about L2. This means that principles
of U G which, for example, rule out structure-independent operations, come
to be in the early IL grammar, as it were, by proxy. At the same time,
under this scenario, U G is no longer directly accessible and therefore does
not help the learner in creating new areas of the IL not based on L I .
Those features of U G that were not relevant for LI but are relevant for
L2 would therefore pose a serious problem for the L2 learner. It is interesting
to note in passing that L2 (IL) grammars are typically structure dependent
and there are errors that no L2 learner appears to make (see, for example,
Jenkins 1988).
Where the LI template no longer serves as the basis for the IL grammar,
the L2 learner has, under this scenario, to build an IL using some other
principles, perhaps just those principles of hypothesis-formation and testing
that are not allowed free rein in first language (grammatical) acquisition.
In Fodorian terms (Fodor 1983) these processes would be part and parcel
of the central processor and not the encapsulated language module (see
discussion in Schwartz 1986, Gregg 1988, Zobl forthc.). This would mean,
as far as learnability is concerned, that negative evidence made available
to the learner by others, or sought by the learners themselves, would
certainly play a crucial role in the construction of the L2 grammar. On
the other hand, since the constraints of U G would not be currently operative,
nothing would prevent the learner from developing rational rule systems
that do not resemble those of natural languages. In a similar vein, Helen Goodluck discusses
Under the UG-by-proxy scenario IL development is, to the extent that
it is influenced by UG, "parasitic" on the LI grammar. The consequence
of this must be that IL conforms to LI and U G until some adjustment
to the LI-based system in the learner's developing grammar (as a result
of L2 input calling the LI-based system into question) actually leads to
a violation of UG. The IL, taken as a whole, would become highly peripheral
in the Chomskyan sense of the word: it would have many elements that
constitute relaxations of UG or even violate it. Such violations would be
tolerated by the L2 learner precisely because U G would no longer be active
in the L2 acquisition process. In a similar vein, Helen Goodluck discusses
the potential development of what she calls "wild" grammars in first
language acquisition but in this case the temporary non-conformity to
UG is attributed to maturational factors typical of child language devel-
opment (Goodluck 1986).
The empirical consequences of the above "nonconformist" view are that
utterances reflecting non-natural (anti-UG) patterns ought then to appear
regularly in IL production and, more importantly, tests of grammatical
intuition (the normal technique for eliciting IL competence) should show
tolerance for such constructions. The possibility also exists that the learner's
unconstrained hypothesising will allow LI-based rules to be restructured
and change into rules that are not possible according to UG. The
nonconformist view in L2 acquisition has been adopted by a number of
researchers recently, in particular Bley-Vroman, Clahsen, Muysken and
Schachter (Bley-Vroman 1986, 1988, Clahsen, forthcoming, Clahsen and
Muysken 1986, and Schachter 1988). Such scholars have sought to show
that the predictions of UG accessibility simply do not work out in practice.
Clahsen and Muysken (1986) provide a useful illustration in terms of
the development of German word order. They point out that once LI
learners of German have established the fact that German is underlyingly
SOV and that the verb-second position in matrix/main clauses is a derived
position, when they eventually come to produce constructions with em-
bedded/subordinate clauses they always observe the newly acquired prin-
ciple from the start. This is in stark contrast to migrant workers learning
German as an L2, who, whatever the verb-placement situation is in their
own LI, seem typically to reiterate the history of main clause development
and produce verb-second constructions in both matrix and embedded
clauses. This would indicate that, unlike LI learners of German, they have
not perceived the evidence indicating the verb-final structure of the VP.
What is particularly important in this kind of investigation is to look
at the interaction between different properties in the grammar to see if
certain predictions about UG accessibility are reflected in real-time de-
velopment. Clahsen and Muysken (1986) claim to show that such interaction
does exist in LI data but not in the data they have on the acquisition
of German as an L2. For example, with regard to agreement markings,
the establishment of AGR features in LI German appears to coincide
with the establishment of the verb second rule (a feature of German SOV
syntax). This seems not to be the case in L2 German development. The
latter kind of acquisition they characterise as piecemeal and driven by
general learning principles. The claims made here are still highly contro-
versial. Duplessis, Solin, Travis and White (1987) as well as Schwartz and
Tomaselli (forthc.) offer explanations which do indicate UG-accessibility
in the L2 data. Clahsen and Muysken in a second paper attempt to counter
these arguments and the debate continues (Clahsen and Muysken 1988).
2.2.2. The "back-to-square-one" view


Another logical possibility is that L2 systems are acquired in the same
way as LI systems and that the constraints on possible shapes that the
L2 grammar may take are imposed directly (and not via transfer). We
might term this the recreative view. In other words, the learner goes back
to square one and "recreates" the L2 grammar as if he were a native
learner of the language, because UG is still active; and not only this: the
learner is insensitive to the structure of his LI. The L2 learner's native
language is ignored and plays no part in the developing IL grammar.
Using Adjemian's terminology, there would be no "permeability" in the
IL. This position was taken up by Mazurkewich who investigated dative
alternation in the IL of Inuit learners of English in Canada. Her results
suggested, to her at least, that Inuit learners initially adopted the unmarked
option offered by UG, as illustrated in I gave the ball to Mary (NP PP)
as opposed to I gave Mary the ball (NP NP). There is some confusion here,
as Kellerman (Kellerman 1985) pointed out, about the source of this initial
hypothesis, UG or LI, and there has not subsequently been much in the
way of research reporting support for this radical "Back-to-UG" position
(cf. Mazurkewich 1985).
There is also a problem in translating the markedness dimension into
real-time developmental terms. There is no obvious reason why first
or second language learners should have an unmarked phase in their grammars
if acquisition is truly input-driven, as implied by L1 learnability theory. The
moment the first piece of evidence for the L1 (or L2) being marked in some
respect comes along, the parameter will immediately be set correctly. And,
as Kean has pointed out, the evidence for the marked setting often comes
along at the same time as the evidence for the relevance of the given
parameter: English learners encountering prepositions in the input will
most probably encounter stranded prepositions at the same time (Kean
1986; see also Van Buren and Sharwood Smith 1986).

2.2.3. The UG-Reorganisation view


It may be seen from the two basic views sketched above, the parasitic
"UG by proxy" view and the recreative view, that L1 influence only plays
a role in the first view. But it is not the case that L1 influence automatically
implies an inactive UG. For example, there is the possibility that internal
UG-inspired reorganisation is held up because of LI influence - a version
of an idea put forward by a few researchers in the seventies (see Schumann
1978, Wode 1978 and Zobl 1978). It might, then, be the case the LI influence
(or influence from any other linguistic system possessed by the learner)
may exert a counter-force to UG. In particular, Lydia White has argued
for LI influence in a manner that suggests that UG is still active in L2
[Figure: two diagrams, each comprising Principles of UG, the L1 grammar
("initial template" in I, "ignored" in II), the IL grammar, and L2 input.
I. Parasitic development (UG not active in IL grammar). II. Recreative
development (UG active in L1 & IL grammar).]

Fig. 1: Parasitic versus recreative development (from Sharwood Smith 1989)

acquisition despite the fact that its operation is initially constrained by


certain instantiations of U G in LI carried over to L2 (see, for example,
White 1985, Liceras 1986). Mary-Louise Kean makes this point clearly
(in Kean 1988) by pointing out that the L2 learner brings a different U G
to the task of developing an L2, different, that is, only in the very specific
sense of being no longer unset: it has been set in LI terms. To take the
example of Jorge discussed by Hilles with reference to the setting of the
pro-drop parameter, Jorge brings to the acquisition of English a UG that
has had the pro-drop parameter set as [+PD]. This implies that, for Jorge,
a natural grammar should have pro-drop (null subject), his LI Spanish
being the prime (and only) example, and this includes all the associated
morphosyntactic features of that particular UG parameter, unless the
evidence disconfirms this assumption and, of course, assuming that Jorge
attends to this evidence (Hilles 1986). Implicit in the notion of resetting
is the idea that the LI values form the initial L2 state and that the learner
(Language Acquisition Device) will need positive evidence to unset incorrect
settings. Where the L2 requires an unmarked setting and the LI is marked,
then the learner has a learnability problem. Negative evidence or perhaps
some indirect positive evidence is required to force a change in perception
of the L2 system (see Zobl 1988).
What might be called the "UG-reorganisation" view involves three
developmental phases. The first phase involves the initial application of
L1 instantiations of UG. The second phase involves a recreative application
of U G in areas where LI parameters are not relevant, i.e. on the basis
of (perceived) positive evidence. This means that a learner with a con-
figurational LI having no movement rules and encountering an L2 with
movement rules begins to apply U G constraints on movement (move alpha)
while building the new grammar. Naturally, this process of applying as
yet unused aspects of U G only begins when the learner/learning device
recognises evidence pointing to the fact that the L2 is in fact configurational.
The third phase involves reorganisation, revising the LI settings of Phase
1, where the evidence demands it within the constraints imposed by L I .
The reorganisation view is illustrated in Fig 2 :

[Figure: Principles of UG (interacting with input) and L2 input; note in
figure: "UG ensures conformity within the grammar as it develops".]

Fig 2. A third view: the reorganisation view (UG active in L1 and IL grammar) (from Sharwood Smith 1988)

The three main views on the roles of LI and U G that have just been
outlined above can be re-expressed as three general (working) hypotheses
for IL investigations, i.e., respectively, the Parasitic Hypothesis (as advanced
by Bley-Vroman, Schachter and others; see also Clahsen and Muysken
1986), the Recreative Hypothesis (as advanced by Mazurkewich), and the
Reorganisation Hypothesis (as advanced by White, Liceras, Sharwood Smith
and Van Buren and others). 2
The Parasitic Hypothesis holds that UG is no longer active in second
language acquisition and that traces of conformity to UG in IL may be
traced back to features of LI carried over into the developing grammar.
This is UG-by-proxy. This hypothesis would predict that where L2 had
some UG principle or parameter that was irrelevant to the learner's LI
(e.g. subjacency, which is relevant only where the grammar has move-a),
the learner would not be able to develop L2 competence via the same
input-UG interaction that typified child LI acquisition. If negative evidence
was not forthcoming or ineffective, then L2 competence development would
be incomplete by LI standards. The Recreative Hypothesis holds that UG
is active in second language acquisition and that grammatical development
unfolds very much along the same lines as it does for first language acquirers.
In this case, UG principles and parameters will be established for L2 in
the same way as they were established for L1: partial development of
these aspects of L2 grammar, all other things being equal, is not predicted.
The Reorganisation Hypothesis holds that UG is still active but in a different
way in that the learner sets parameters shared by L1 and L2 in the way
that they have been set for L1: this entails complications where there is no
evidence in the input for resetting the IL parameters so that IL is aligned
with native-speaker L2. Hence, as was pointed out above, French learners,
who have opted for the wider, marked type of adjacency allowing some
material in between verb and direct object, will have difficulty recovering
from the resultant L2 error and will need negative evidence, if they can
profit by this, to make up for the deficiency of the input.

In actual fact, the literature arguing in favour of endemic nonconformism
to UG also implies the possibility of L2 (IL) development which does
not even make reference to relevant L1 parameters. In other words, the
L1 does not provide an initial template, and with it reflexes of the workings
of UG principles in LI development. Rather, the L2 learner is seen as
processing the L2 input independently albeit via the central processor using
inductive principles and not the principles of UG (see Fig 3, below). This
position, which is related to the UG-by-proxy (parasitic) view discussed
above, has been argued for by people such as Jordens (1987, 1988) and
Pienemann and Johnston (see discussion in Clahsen 1989). For example,
Zobl (forthcoming), in considering the developmental data from Japanese
and Spanish learners of English, entertains just this option, although he
ultimately argues that the interaction between properties of UG can indeed
be seen in his data. With Fodor's views in mind, Zobl dubs this the
"centralist" position as opposed to the "modularist" positions involving
UG accessibility.
The lack of crosslinguistic influence between L1 and L2 implied by the
centralist position would be predictable from the encapsulated nature of
the L1 system. In fact, we may well get crosslinguistic influence but it
would be in areas not included within the language module, or more
properly, the grammar module. That is to say that L1 may influence L2
with regard to various grammatical, lexical, phonetic and pragmatic
phenomena that may arguably be acquired by general learning principles,
whatever one's theory about the domain of language covered by UG (Foster
1985).

3. RESEARCH STRATEGIES

Figure 3 (below) shows different approaches to the construction of
interlanguage (developmental) grammars in L2 acquisition. That is to say,
it shows some of the possible ways in which L2 mental grammars can
be built. It is simplistic in one sense at least, namely that IL grammar
is conceived of as a singular entity. However, the preceding discussion
should have made clear the fact that different aspects of the grammar
may well be acquired using different principles, and this goes for L1
development as well, of course.

[Figure 3 diagram, two models: I. Principles of UG + L2 input, with UG reflected in IL grammars either actively or passively (by proxy); II. Inductive Principles + L2 input, with UG principles not reflected in IL grammar.]

Fig. 3: The role of UG in IL grammar: different perspectives


Shaping the course of research to focus on these possibilities allows
researchers to apply one or other of these models to particular areas of
L2 grammar taking advantage of relevant research in theoretical linguistics.
Such research is necessarily theory-dependent in an interdisciplinary sense
such that a shift in the received view on UG, for instance, will have
repercussions for the interpretation of experimental results. More important,
however, is the decision about what the Null Hypothesis should be. If
we are to take as the methodological starting point the claim that L1
and L2 processes are the same, it is going to be very difficult to disprove.
Partial attainment can always be attributed to attitude, fear of error,
inhibitions and so forth (see Dulay et al. 1982 for full discussion). Hence,
whatever one's preferred beliefs are, it seems much sounder to assume
that different processes drive L1 and L2 development and then try to put
this claim to the test.
A final methodological point would be that research should concentrate
on individuals since it is an observed fact that some L2 learners are extremely
successful while others are not. It may then be that one individual learner
may follow a course matching one of the models sketched above, while
another individual takes an alternative route. This would imply research
procedures that allow generalisations to be made from individuals as well
as groups.

4. CONCLUSION

It may be argued, somewhat controversially perhaps, that the field of second
language acquisition is in essence a cognitively richer and more complex
field than first language acquisition. In terms of learnability theory, the
fact that the young child is learning his first language in a state of cognitive
immaturity simplifies matters considerably, for the theorist at least. Mature
second language learners have at their disposal new potential sources of
evidence for discovering the properties of the target grammar. They ought
in principle to be able to gain knowledge about the language in a way
that the child cannot. A theory of second language learnability has to
take a stance on the status of these sources and the cognitive machinery
that handles the relevant information. In addition to this complication,
there is the fact that second language learners, by definition, are already
in possession of a separate linguistic system. A theory of learnability has
also to take account of how this affects the nature of L2 evidence. And
the backdrop to all this is that it does seem humanly possible to learn
a second or third language to a degree which makes the difference between
native and non-native either a quibble or an impossible problem to resolve
coherently. It is therefore reasonable, as indeed others have pointed out
(e.g. Rutherford 1989), to encourage more theoretical linguists and anyone
working in other areas of real-time language acquisition to consider second
language research data as being relevant for helping to unveil the mysteries
of the innate language learning capacity.

FOOTNOTES

1. Second language researchers have looked at various kinds of markedness although, here,
only markedness in the Chomskyan sense will be considered.
2. For various positions on this issue, see Mazurkewich 1984, White 1986, Flynn 1986, Liceras
1986, Bley-Vroman 1986, Schachter 1986, 1988, Van Buren and Sharwood Smith 1986.

REFERENCES

Adjemian, C. 1976. On the nature of Interlanguage systems. Language Learning 26. 297-
320.
Bialystok, E. and M. Sharwood Smith. 1985. Interlanguage is not a state of mind: an evaluation
of the construct for second language acquisition. Applied Linguistics 6. 101-107.
Bley-Vroman, R. 1986. Hypothesis testing in second language acquisition theory. Language
Learning 36. 353-376.
Bley-Vroman, R. 1988. The fundamental character of foreign language learning. In W.
Rutherford and M. Sharwood Smith (eds.) Grammar in the Classroom. New York: Harper
and Row.
Foster, S. 1985. Taking a modular approach to universals of language acquisition. Paper presented
at SLRF, Los Angeles, February 1985.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Clahsen, H. 1989. The comparative study of first and second language development. Ms.
University of Düsseldorf.
Clahsen, H. and P. Muysken 1988. The UG paradox in L2 acquisition. Second Language
Research 5. 1-30.
Clahsen, H. and P. Muysken 1986. The accessibility of Universal Grammar to adult and
child learners. A study of the acquisition of German word order. Second Language Research
2. 93-119.
Corder, S. Pit. 1967. The significance of learner's errors. International Review of Applied
Linguistics 5. 160-170.
Dulay, H., M. Burt and S. Krashen. 1982. Language Two. Oxford: Oxford University Press.
Dulay, H. and M. Burt. 1974. Natural sequences in child second language acquisition. Language
Learning 25. 37-53.
Duplessis, J., L. Solin, L. Travis and L. White. 1987. UG or not UG: that is the question.
Second Language Research 3. 56-75.
Felix, S. 1985. More evidence on competing cognitive systems. Second Language Research
1. 47-72.
Flynn, S. 1986. A Parameter-Setting Model of Second Language Acquisition. Dordrecht: Reidel.
Flynn, S. and W. O'Neil (eds.) 1988. Linguistic Theory and Second Language Acquisition.
Dordrecht: Reidel.
Fodor, J. 1983. Modularity of Mind. Cambridge, Massachusetts: MIT Press.
Gass, S. and L. Selinker (eds.) 1983. Language Transfer in Language Learning. Rowley,
Massachusetts: Newbury House.
Goodluck, H. 1986. Language acquisition and linguistic theory. In P. Fletcher and M. Garman
(eds.) Language Acquisition. Cambridge: Cambridge University Press.
Gregg, K. 1988. Epistemology without knowledge: Schwartz on Chomsky, Fodor and Krashen.
Second Language Research 4. 66-80.
Hilles, S. 1986. Interlanguage and the pro-drop parameter. Second Language Research 2.
33-52.
Jenkins, L. 1988. Second language acquisition: a biological perspective. In S. Flynn and
W. O'Neil (eds.) Linguistic Theory and Second Language Acquisition. Dordrecht: Reidel.
Jordens, P. 1986. Production rules in interlanguage: evidence from case errors in L2 German.
In E. Kellerman and M. Sharwood Smith (eds.) Crosslinguistic Influence in Second Language
Acquisition. Oxford: Pergamon.
Jordens, P. 1988. The acquisition of verb categories and word order in Dutch and German:
evidence from first and second language acquisition. In J. Pankhurst, M. Sharwood Smith
and P. Van Buren (eds.) Learnability and Second Languages. Dordrecht: Foris.
Kean, M-L. 1986. Core issues in transfer. In E. Kellerman and M. Sharwood Smith (eds.)
Crosslinguistic Influence in Second Language Acquisition. Oxford: Pergamon.
Kean, M-L. 1988. The relation between linguistic theory and second language acquisition:
a biological perspective. In J. Pankhurst, M. Sharwood Smith and P. Van Buren (eds.)
Learnability and Second Languages. Dordrecht: Foris.
Kellerman, E. 1985. Dative alternation and the analysis of data. Language Learning 35. 91-
101.
Kellerman, E. and M. Sharwood Smith. 1986. Crosslinguistic Influence in Second Language
Acquisition. Oxford: Pergamon.
Krashen, S. 1976. Formal and informal linguistic environments in language acquisition and
language learning. TESOL Quarterly 10. 157-168.
Krashen, S. 1982. Principles and Practice in Second Language Acquisition. Oxford:
Pergamon.
Krashen, S. 1985. The Input Hypothesis: Issues and Implications. London: Longmans.
Liceras, J. 1986. Linguistic Theory and Second Language Acquisition: The Spanish Non-native
Grammar of English Speakers. Tubingen: Narr.
Mazurkewich, I. 1984. The acquisition of the dative alternation by second language learners
and linguistic theory. Language Learning 34. 91-110.
Mazurkewich, I. 1985. In reply to Kellerman: a response from Mazurkewich. Language Learning
30. 103-106.
Odlin, T. 1989. Language Transfer: Cross-linguistic influence in Language Learning. Cambridge:
Cambridge University Press.
Pankhurst, J., M. Sharwood Smith and P. Van Buren (eds.) 1989. Learnability and Second
Languages. Dordrecht: Foris.
Rutherford, W. 1987. Learnability, SLA and explicit metalinguistic knowledge. Ms. University
of Southern California.
Rutherford, W. 1989. Linguistics and SLA: the two-way street phenomenon. Ms. University
of Southern California.
Schachter, J. 1988. Second language acquisition and its relationship to Universal Grammar.
Applied Linguistics 9. 219-235.
Schumann, J. 1978. The relationship of pidginization, creolization and decreolization to
second language acquisition. Language Learning 8. 367-388.
Schwartz, B. 1986. The epistemological status of second language acquisition. Second Language
Research 2. 120-159.
Schwartz, B. and S. Tomaselli, forthcoming. Analyzing the acquisition stages in L2: support
for UG in adult SLA. Second Language Research.
Selinker, L. 1972. Interlanguage. International Review of Applied Linguistics 10. 109-230.
Sharwood Smith, M. 1988. On the role of linguistic theory in explanations of second language
developmental grammars. In S. Flynn and W. O'Neil (eds.) Linguistic Theory and Second
Language Acquisition. Dordrecht: Reidel.
Van Buren, P. and M. Sharwood Smith 1985. The acquisition of preposition-stranding by
second language learners and parametric variation. Second Language Research 1. 18-46.
Van Buren, P. 1988. Some remarks on the subset principle in second language acquisition.
Second Language Research 4. 33-40.
White, L. 1985. The pro-drop parameter in adult second language acquisition. Language
Learning 30. 43-47.
White, L. 1986. The principle of adjacency in second language acquisition. In S. Gass (ed.)
Second Language Acquisition: a Linguistic Perspective. Cambridge: Cambridge University
Press.
Wode, H. 1978. Developmental sequences in naturalistic L2 acquisition. In E. Hatch (ed.)
Second Language Acquisition. London: Newbury House.
Zobl, H. 1978. The formal and developmental selectivity of L1 influence on L2 acquisition.
Language Learning 30. 43-57.
Zobl, H. 1988. Configurationality and the subset principle. In J. Pankhurst, M. Sharwood
Smith and P. Van Buren (eds.) Learnability and Second Languages. Dordrecht: Foris. 116-
131.
Zobl, H. (forthcoming) Evidence for parameter-sensitive acquisition: a contribution to the
domain-specific vs. central processes debate. Second Language Research 6.
Can Pragmatics fix Parameters?
N.V. Smith
University College London

1. INTRODUCTION

The appropriate response to the question in the title is moot, because
the simple answers "yes" and "no" both seem obviously right.* If the
domain of "pragmatics" includes "the interpretation of utterances in
context" the answer has to be "yes" as it seems reasonably clear that
the child learning his first language must be able, at least partially, to
interpret the utterances which constitute the primary linguistic data on
the basis of which the grammar is learned. But the (pragmatic) interpretation
of an utterance standardly presupposes the grammatical analysis of that
utterance, so the answer has to be "no" as it seems reasonably clear that
the child cannot use something which presupposes the grammar to learn
that grammar. In one of his recent compilations, Chomsky (1987:7) says
"These processes [of language acquisition] take place in different ways
depending on external events, but the basic lines of development are
internally determined". If "internal" means "internal to the language
module" and "external" refers to everything outside that module, in
particular to both central cognitive processes and to events external to
the individual altogether, then this also suggests a "no" answer. If, however,
"internal" is taken to mean only "internal to the mind-brain" i.e. including
central cognitive processes as well as those processes peculiar to the language
organ, and "external" refers simply to those differences in input to the
child which are dependent on the language of the community in which
he finds himself, then a "yes" answer is plausible. Given that parameter
settings, by definition, vary from language to language, one might ipso
facto expect their fixing to depend on external events in the latter sense;
it is less obvious what one should expect with regard to the role of external
events in the former - e.g. pragmatic - sense.
To make the question explicit enough to be answerable at all, I shall
assume the "pragmatics" to be that of Sperber & Wilson's "Relevance
Theory" (Sperber & Wilson, 1986), and the "parameter fixing" (equivalently
"setting") to be that of Chomsky's "Principles and Parameters Theory"
(e.g. Chomsky, 1981a, 1981b, 1986; Roeper & Williams, 1987). As a general
background I also presuppose the validity of Fodor's "Language of
Thought" hypothesis (Fodor, 1975).
This combination of choices (i.e. Relevance Theory, the Language of
Thought, and the Principles and Parameters framework) provides a more
reasoned basis for an answer to the question posed at the outset. For
Sperber & Wilson "pragmatics" is included within a general theory of
cognition, and for Chomsky, "the way in which the development of the
grammar takes place is ... independent of other kinds of social and even
cognitive interactions" (1982:115). If pragmatics is part of cognition, and
cognition is irrelevant to the development of the grammar, we appear
to have an unequivocal "no" answer to our question, one, moreover, which
I have previously explicitly endorsed (Smith, 1988a: 198). I wish to argue
here that that answer is overly simple, but before looking more closely
at the details of the theories involved, it is worth spelling out what possible
positions are excluded by the choice of a framework of ideas defined by
the union of the work of Chomsky, Fodor and Sperber & Wilson. I will
deal briefly with each of those writers whose work bears on the possible
explication of the question in the title.

2. EXCLUSIONS

Fodor's by now well-known position on first language acquisition is
encapsulated in the quotation: "learning a language presupposes the ability
to use expressions coextensive with each of the elementary predicates of
the language being learned" (Fodor, 1975:80). This leads directly to the
conclusion that there must be an innate language of thought with at least
the expressive power of any Natural Language. Although the detailed
implications of Fodor's claim are still a matter of contention, (cf. e.g.
Carey, 1982, esp. p.357) it is reasonably certain that his thesis renders
implausible the position of e.g. Halliday, who champions the view that
language is socially determined and claims that "in the very first instance,
he [the child] is learning that there is such a thing as language at all"
(1975:10). The only plausible construal of this remark in the current context
would be that the child is becoming aware that communication can be
effected by using a syntactically structured medium analogous to that he
uses to think with. Such a position is compatible with the possibility that,
given a few lexical items, the child can initially by-pass Natural Language
syntax by exploiting pragmatic processes to set up a representation in
the Language of Thought. (For discussion of Halliday's position, cf. Smith
1988b; for the position that the Language of Thought is the Natural
Language acquired, cf. Smith 1983).
The relevance of these remarks here is that for many people "pragmatics"
subsumes notions of social interaction and control, which are irrelevant
to the acquisition of grammar except insofar as they are concomitants
of the normal input of data the child needs as triggering devices. That
is, there is no evidence that differences of social environment determine
differences of grammatical development. As Dore (1979:360) put it: "...
while abstract linguistic structures can not be acquired by the child on
the basis of his communicative experience, a communicative environment
is necessary to provide the child with empirical sources against which to
assess his hypotheses about structure". Apart from the questionable
assumption that children test their nascent hypotheses, this remark seems
as valid now as a decade ago. Despite the cogency of Dore's observation,
the same volume contains typical examples of a not unusual confusion
between the acquisition of grammar and the acquisition of the ability to
participate in inter-personal interaction. For instance, Bates & MacWhinney
claim that "the child's acquisition of grammar is guided not by abstract
categories, but by pragmatic and semantic structures of communication
interacting with the performance constraints of the speech channel" (Bates
& MacWhinney, 1979:168). The nature of such "pragmatic and semantic
structures" and how they eventuate in a complex syntax is never disclosed;
and elsewhere in the same article (p.210) they talk of the child "encoding"
aspects of the language in a way which presupposes the existence of the
grammar which is putatively being acquired.
It is necessary to exclude from consideration two further possible
interpretations of the original question. First, a number of writers have
suggested that certain rules or principles of the grammar might be usurped
by pragmatic considerations. That is, what were previously deemed to
be bona fide grammatical rules may turn out not to need incorporating
into the grammar at all, as the phenomena concerned fall out automatically
from independently motivated pragmatic considerations. A typical example
is provided by Lust (1986) who discusses whether part of Binding Theory
can be reduced to pragmatics. Similarly, Kempson (e.g. 1988 and work
in progress) has embarked on a revisionist attempt to construct a grammar
in which Binding Theory, while articulated within the grammar, is im-
plemented outside it, with the appropriate generalisations captured by
Relevance Theory. Clearly, to the extent that such attempts are successful,
there will be in these domains simply no parameters to fix. For present
purposes I shall assume that in some domains (including Binding Theory),
there are parameters and that therefore the question of whether pragmatics
is causally involved in fixing them remains coherent.
Second, there is an extensive literature on the effect of "pragmatic
context" on the child's interpretation of the sentences to which he is exposed.
For instance, Lust (1986:82ff) discussed the effect of priming on children's
judgements of coreference. She showed that when children were primed
with the name of one of the characters mentioned in the test sentences,
the probability of their opting for coreference increased even when such
coreference was configurationally excluded. Again, the conclusion must
be that such considerations are irrelevant to the fixing of parameters, as
the experimental paradigm concerned presupposes that the relevant part
of the grammar has already been at least partly internalised, even though
certain of its constructs may be over-ridden. There is no coherent possibility
that the "pragmatic context" could determine the form of the grammatical
rules as opposed to the interpretation of individual sentences construed
by reference to those rules.

3. RELEVANCE

With these clarificatory preliminaries out of the way, we can turn to an
outline of the main features of the Pragmatic and Linguistic theories
involved. The following remarks are intended to act as priming devices
for the already initiated rather than as tutorial overviews for neophytes.
The latter are referred to Sperber & Wilson (1986), Chomsky (1986), or
the relevant chapters of Smith (1989).
The heart of Sperber & Wilson's theory is the Principle of Relevance
given in (1):

(1) "Every act of ostensive communication communicates the presumption of its own optimal relevance." (1986:158)

This somewhat opaque formulation can be taken for present purposes
as equivalent to: "Every utterance carries a guarantee of optimal relevance
to the hearer" which can be further interpreted as follows. An utterance
is relevant if, and only if, it has "contextual effects", that is, if it allows
the hearer to deduce conclusions that would follow neither from the
utterance alone nor from the context alone; it is optimally relevant if,
and only if, it achieves adequate contextual effects, and puts the hearer
to no unjustifiable effort in achieving them; it is consistent with the principle
of relevance, on a given interpretation, if, and only if, a rational speaker
might have expected it to be optimally relevant to the hearer on that
interpretation. All comprehension involves the (unconscious) use of the
criterion of consistency with the principle of relevance, as can be most
clearly seen in, for instance, processes of reference assignment and dis-
ambiguation.
Consider the utterance of (2):

(2) He's taken the collection

The grammar tells us that some male person has done something involving
a collection, but whether "he" refers to the churchwarden you were just
chatting to, or an unknown burglar; whether "take" is synonymous with
"solicit" or "steal"; and whether the "collection" is the money solicited
or the Meissen absconded with are pragmatically determined. If you have
just entered a ransacked room with someone who then says (2) to you,
you will interpret it as a comment on a theft rather than as a quotation
from the vicar, simply because that is the only construal that a rational
speaker might have thought worth your attention: the only reading that
is consistent with the principle of relevance. If you have just asked your
pew neighbour where Fred has disappeared to and he responds with (2),
you will take it as a comment on a normal part of church ritual. In neither
case is the other interpretation impossible, given additional contextual
assumptions, but the complexity of the contextualising legerdemain ne-
cessary to arrive at it makes it vanishingly unlikely.
Our ability to exploit contextual information in this way is automatic
and unconscious: so much so that we frequently fail to notice indeter-
minacies or ambiguities in utterances addressed to us. Even the child still
in the process of acquiring his first language can represent to himself
sufficient of the context to make some understanding possible, (cf. Smith
1988b, for further discussion), and it is not implausible that the tendency
to maximise the relevance of incoming stimuli, and the notion of optimal
relevance, are innate. If so, one might well imagine that considerations
of relevance could be exploited in the process of language development.

4. PARAMETERS

The Principles and Parameters framework argues that U(niversal) G(rammar)
is characterised by a number of principles which, despite their
universality, allow of a certain amount of parametric variation. The simplest
example is provided by the "Extended Projection Principle" according
to which all sentences in all languages have a subject. Manifestly, however,
not all sentences do have overt subjects, and languages can differ with
respect to the classes of sentence in which they allow the subject position
to be empty. One part of this variation is accommodated by the "pro-
drop parameter".
The pro-drop parameter, which is set differently for English and (for
example) Italian, accounts for a constellation of differences between the
languages of which the most obvious is the potential absence of one class
of subject pronouns in Italian and typologically similar languages, and
their obligatory presence in languages such as English. Thus, beside (3),
Italian also allows (4) whereas English allows only the former:

(3) Giovanni ha mangiato una mela - Giovanni has eaten an apple

(4) Ha mangiato una mela - *Has eaten an apple

Correlating with this difference is the existence in "pro-drop" languages
of so-called 'free inversion' of the kind exemplified in (5):

(5) Ha mangiato Giovanni - Giovanni has eaten

and a number of other phenomena, including the presence of expletives
such as it and there in non-pro-drop languages, and their absence from
pro-drop languages. A central task of current work on language acquisition
is to determine the precise developmental sequence in the emergence of
parametric phenomena and to discover what causes that sequence.

5. HYAMS

In recent work on the acquisition of the syntax of pro-drop, Hyams (1986)
argues that children assume that English is pro-drop and have to learn
that it is not, on the basis of their exposure to particular pieces of evidence.
This claim is of interest both inherently and because it is diametrically
opposed to the prediction of the "Subset Principle" according to which
precisely the reverse sequence of stages is gone through. That is, as the
sentences of a non-pro-drop language constitute a proper sub-set of those
of a pro-drop language, and as negative evidence is, by hypothesis,
unavailable, the child acquiring English will start by assuming that it is
non-pro-drop. (For discussion, cf. Wexler & Manzini, 1987; also Atkinson,
this volume).
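The learnability logic appealed to here can be made concrete with a small sketch (an editorial illustration only, not part of the original argument: the sentence patterns, the toy "grammars" and the learner below are invented stand-ins for the non-pro-drop and pro-drop settings). A learner restricted to positive evidence who starts from the superset value is never forced to retreat, whereas one who starts from the subset value can expand on encountering a subjectless sentence.

    # Toy illustration of the Subset Principle (hypothetical names and data).
    # The "grammars" are just sets of admissible sentence patterns; the
    # non-pro-drop grammar generates a proper subset of the pro-drop grammar.
    NON_PRO_DROP = {"SUBJ V OBJ"}               # overt subjects only
    PRO_DROP = {"SUBJ V OBJ", "V OBJ"}          # overt or null subjects

    def learn(initial, data):
        """Conservative learner: positive evidence only, expansion only."""
        grammar = set(initial)
        for sentence in data:
            if sentence not in grammar:
                grammar = set(PRO_DROP)         # expand to the superset setting
            # a sentence the current grammar already generates never forces retreat
        return grammar

    english_input = ["SUBJ V OBJ", "SUBJ V OBJ"]    # no null-subject sentences
    italian_input = ["SUBJ V OBJ", "V OBJ"]

    # Starting from the subset value succeeds for both target languages:
    assert learn(NON_PRO_DROP, english_input) == NON_PRO_DROP
    assert learn(NON_PRO_DROP, italian_input) == PRO_DROP
    # Starting from the superset value overgenerates for English, and with
    # positive evidence only, nothing in the input ever signals the error:
    assert learn(PRO_DROP, english_input) == PRO_DROP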

The evidence that Hyams claims children use in progressing from one
stage to the next is in part structural, e.g. whether the language being
learned contains expletives, and in part pragmatic: specifically, the ex-
ploitation of the Avoid Pronoun Principle which, in Chomsky's (1981a:65)
formulation is "interpreted as imposing a choice of PRO over an overt
pronoun where possible".1

The Avoid Pronoun Principle accounts for the choice of (6) rather than
(7) (taken from Chomsky, ibid.) where his is to be construed as coreferential
with John:

(6) John would much prefer going to the movie


(7) John would much prefer his going to the movie

According to Chomsky (1981a:227) this principle is one of those which
"interact with grammar but do not strictly speaking constitute part of
a distinct language faculty, or at least, are specific realisations in the
language faculty of much more general principles ...". Hyams herself
describes it as a "universal pragmatic principle" (1986:96), and it follows
logically from the Principle of Relevance with its requirement that pro-
cessing costs be minimised with respect to any intended effect. That is,
on the (perhaps questionable) assumption that an utterance containing
some piece of overt material is harder to process than one in which that
material is absent, (7) is more complex than (6) in virtue of the presence
of his. The presence of this overt item has to be interpreted as conveying
relevant information not recoverable from the empty category in (6);
specifically that its antecedent is not the obvious, linguistically present
one, John, but some other person. Similarly, any overt linguistic entity
must be interpreted as contributing to the interpretation of the utterance
containing it, and if no such contributory function is discernible, then
the item concerned should be avoided.
If Hyams is right in claiming both that the Avoid Pronoun Principle
is pragmatic and that it is causally implicated in the fixing of the pro-
drop parameter, then the question we started with has been answered in
the affirmative. Given that the Avoid Pronoun Principle follows from the
Principle of Relevance, the first clause seems to be uncontroversially true;
what is still problematic is its causal implication in the fixing of the pro-
drop parameter. Let us examine this claim a little more closely.

Hyams argues that by hypothesis the child

"operates under the Avoid Pronoun Principle, and hence, expects that subject pronouns
will be avoided except where required for contrast, emphasis, etc. In English contrastive
or emphatic elements are generally stressed. Once the child learns this, any subject
pronoun which is unstressed might be construed as infelicitous ... the child could then
deduce that if the referential pronoun is not needed for pragmatic reasons, it must
be necessary for grammatical reasons; i.e. a null pronominal is impossible, and hence,
AG ≠ PRO" (1986:94)

i.e. English is not pro-drop.


As pointed out by A. Smith (1988:245ff.), there are several problems
with this argument. First, there is some experimental evidence, summarised
in Solan (1983), to the effect that children older than those discussed by
Hyams have not mastered the role of contrastive stress. As Solan puts
it: "it's easier at first to talk loudly than it is to learn syntax" (1983:182),
so it is at best dubious to suggest that children can use their knowledge
of contrastive stress as a basis for learning other parts of the system. Second,
the assumption that pro-drop languages do not have expletives is suspect.
Welsh is (probably) a pro-drop language but normally manifests expletives
in sentences like that in (8) - where "e" is an expletive pronoun, not an
empty category:

(8) ci ydy e

dog is it - "It's a dog"

so the structural evidence the child can use is less clear-cut than Hyams'
argument requires. Third, while "general", it is not the case that stress
is a necessary concomitant of subject pronouns (in pro-drop or non-pro-
drop languages), so the evidence available to the child is of minimal salience.

Because of such considerations and in particular because of the need for
pragmatic principles of interpretation to have an antecedently cognised
syntactic structure to work on, I concluded previously that "it is in principle
impossible for a pragmatic principle to be ... implicated [in the fixing
of parameters]" (Smith, 1988a: 197).

6. FIXING

This position in turn, however, is not unproblematic. In particular, it is
not self-evident that the mode of operation of pragmatic principles, whereby
they operate over linguistically decoded strings of the grammar, carries
over unchanged from the synchronic analysis of adult speech to the
ontogenetic development of the knowledge the child ends up with. That
is, in order for the child to convert some linguistic input into a representation
in his Language of Thought, it may not be necessary for him to have
a (completely) syntactically analysed string for his pragmatic principles
to work on: not all the "primary linguistic data" that the child is exposed
to is part of his "triggering experience" (cf. Lightfoot, 1989:324f.). Con-
versely the question broached in this paper can be rephrased as querying
whether the relevant triggering experience has to be restricted to primary
linguistic data in the sense of morpho-syntactic representations, or whether
it should be taken to include semantic and conceptual representations
constructed in part on the basis of purely linguistic input but in part on
the basis of other perceptual and cognitive processes.

Let us see a little more closely how a parameter becomes fixed, working
on the assumption put forth in Chomsky (1987:61) that "the initial state
of the language faculty can be regarded as ... a deterministic input-output
system that takes presented data as its input and produces a cognitive
system as its 'output'". Consider how the child might fix the Head-first/
Head-last Parameter on the basis (in part) of exposure to an utterance
like that in (9):

(9) Fred ate beans

It is assumed that the child knows the meanings of the individual words
and that he perceives some relation between these words and the actions
associated with them on particular occasions. By hypothesis, moreover,
UG will provide him with categories like V and N, and X-bar theory
will give him the category VP. Accordingly, ate will be identified as a
V, beans as its internal argument, and ate beans will be automatically
analysed as VP. As V is the Head of VP it follows that the parameter
will be set to Head-first. That is, given the data and innately specified
knowledge about UG, in particular X-bar theory, the analysis is indeed
deterministic.
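The deterministic character of this step can be set out schematically (again an editorial sketch under simplifying assumptions, not the author's procedure: the pre-segmented input and the function name are invented for exposition). Once the verb and its internal argument have been identified, the value of the parameter simply falls out of their linear order.

    # Toy sketch of deterministic head-parameter fixing (hypothetical example).
    # The input is assumed to be already segmented, with the verb and its
    # internal argument identified by word meaning plus the UG categories
    # mentioned in the text.
    def set_head_parameter(words, verb, internal_argument):
        """Return 'Head-first' or 'Head-last' from the order of V and its complement."""
        if words.index(verb) < words.index(internal_argument):
            return "Head-first"
        return "Head-last"

    # (9) Fred ate beans: 'ate' is the V, 'beans' its internal argument,
    # so the VP [ate beans] is head-initial.
    print(set_head_parameter(["Fred", "ate", "beans"], "ate", "beans"))    # Head-first
    # A hypothetical OV utterance would fix the opposite value.
    print(set_head_parameter(["Fred", "beans", "ate"], "ate", "beans"))    # Head-last
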
Provided one accepts that the child's perception of the situation described
above allows him to identify beans rather than Fred as the internal argument
of ate, it is not difficult to see how this particular parameter can be fixed
deterministically in the absence of further pragmatic considerations. Is
it possible to provide an equally deterministic account of the fixing of
other parameters in the same autonomous fashion? Take as an example
the Subject Antecedent Parameter, according to which a proper antecedent
for an anaphor is either (a) a subject NP or (b) any NP. In English, the
setting for the parameter is (b); in most other languages it is (a); so (10)
is ambiguous in English - with himself able to refer to John or Bill, whereas
its congener in Hindi or Swedish is univocal - with only John as a possible
antecedent.

(10) John told Bill about himself

In adult conversation, as presumably in the case of a child, (10) would
typically be disambiguated by means of the criterion of consistency with
the Principle of Relevance, giving a mental representation in which either
John or Bill is identified as the person spoken about. It does not follow,
however, that the child has used a syntactic analysis in which that
interpretation is represented. That is, for the child, himself may be identified
merely as a referring expression, and the referent may be determined
independently of the syntax, and before the syntactic system attains its
adult steady state. 2 It is moreover quite plausible that the child should
have greater difficulty in arriving at an analysis involving the identification
of a category such as "anaphor" than one involving the identification
of a category such as "verb". The relation between syntactic verb/argument
complexes and logical predicate/argument complexes is more easily pre-
dictable than that between syntactic anaphors and their logical congeners,
precisely because of the potential confusion of anaphors with R-expressions.
That is, given the assumptions about word-meaning above, the linguistic
relation between the verb eat and the NP beans is transparently mapped
onto the parallel conceptual relation between the predicate EAT and its
logical argument BEANS. Determining the relation between himself and
its anaphoric or exophoric antecedent is less straightforward. Assume that
the child knows that himself is an NP and canonically therefore a referring
expression. Even if he also "knows" that R expressions must be free
(principle C of the Binding Theory), he cannot use that knowledge to
determine the status of himself as an anaphor or an R expression until
he has decided (pragmatically) whether its referent is the same as that
of "John" or "Bill" or of some third person. A fortiori he cannot determine
whether the language he is learning is like English or Hindi with respect
to the Subject Antecedent Parameter until he has made such a decision,
and for that he needs an indexed string, not just a labelled bracketing.
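The difference between the two kinds of representation can likewise be sketched (an editorial illustration only; the bracket notation and the particular index assignments are invented for exposition). The bare labelled bracketing of (10) is silent about the antecedent of himself, whereas an indexed version encodes just the information needed to evaluate the Subject Antecedent Parameter.

    # Toy contrast between a labelled bracketing and an indexed one (hypothetical).
    # (10) as a bare labelled bracketing: nothing marks what 'himself' refers to.
    labelled = "[S [NP John] [VP told [NP Bill] [PP about [NP himself]]]]"

    # The same string once (pragmatic) interpretation has supplied referential indices.
    subject_reading = "[S [NP-1 John] [VP told [NP-2 Bill] [PP about [NP-1 himself]]]]"
    object_reading = "[S [NP-1 John] [VP told [NP-2 Bill] [PP about [NP-2 himself]]]]"

    def himself_bound_by_subject(indexed):
        """True if 'himself' carries the same index as the subject NP-1."""
        return "[NP-1 himself]" in indexed

    print(himself_bound_by_subject(subject_reading))    # True: compatible with either setting
    print(himself_bound_by_subject(object_reading))     # False: only the 'any NP' setting allows it
    # The unindexed bracketing supports neither verdict, which is why an indexed
    # tree, and not just a labelled bracketing, is needed to fix this parameter.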

Despite the possibility of arriving at the correct interpretation of a particular
utterance without crucial resort to the syntax, it obviously remains the
case that the parameter finally does get fixed. Moreover, this can happen,
as a matter of logic, only after a (syntactic) analysis has been tried out,
and has been seen to provide a successful mapping from the Natural
Language to the Language of Thought. Further, the fixing of the parameter
is unlikely to be established on the basis of a single successful act of
interpretation, as the Natural Language syntax might be "by-passed" on
one such occasion. To fix the parameter, any (pragmatic) by-pass strategy
must be phased out: i.e. the possible neglect of syntax must be superseded
by a stage where, after suitable feedback, it is indeed causally involved
in providing more tightly constrained possibilities for pragmatic interpre-
tation.
On an alternative scenario there might perhaps be no "by-pass" stage,
with its attendant difficulty of requiring a (presumably maturational)
specification of when syntax does make an appearance. Rather, one might
assume that processes of pragmatic interpretation operate initially on the
basis of the configurations provided by Universal Grammar, with default
settings for the parameters being gradually replaced by those of the language
being learned. In either case the parameter is ultimately fixed as a result
of the interaction of linguistic and non-linguistic factors, with the latter
being particularly important for those parts of the grammar where
something beyond a simple labelled bracketing is required. That is, whereas
the Head-first/Head-last Parameter can be set on the basis of a simple
tree, an input sufficiently rich to fix the Subject Antecedent Parameter
(and presumably all parameters related to LF) requires not just a tree,
but an indexed tree.
Let us return briefly to the claim that language acquisition is socially
determined. The proposed interplay of syntactic and pragmatic principles
enabling the child to construct representations in the Language of Thought
can not only account for the germ of truth in the claim that (ostensive)
communication is prerequisite to language acquisition, but can also provide
a partial explanation for the perhaps surprising absence of the Evil
Neighbour Syndrome:3 the situation where some malicious being sneaks
into the infant's presence and whispers in its ear the misleading (9'):

(9') Fred beans ate

thereby mis-setting the Head-first/Head-last parameter with dire results
for the nascent system. Unless the Evil Neighbour correlates his inter-
ventions consistently with perceived regularities in the child's environment,
they are overwhelmingly unlikely to optimise relevance for the infant.
Accordingly, the possibility of such mis-setting is essentially eliminated
by the Principle of Relevance: the child simply ignores the irrelevant
stimulus. If the Neighbour is consistently malign, of course, the child will
grow up bilingual, with his linguistic systems differing precisely in the
setting of the relevant parameter.

7. CONCLUSION

Where does this leave us with our initial question? On the one hand, it
is clear that pragmatic factors are not directly causally involved in the
fixing of parameters in the way that principles of Universal Grammar
such as X-bar theory are, so Chomsky's claim that the development of
grammar is independent of cognitive considerations is partially vindicated.
On the other hand, Hyams' contention that pragmatic principles play a
role is indirectly correct, in that it seems necessary to assume that:

pragmatics (in the form of the Principle of Relevance) contributes to providing the data
which constitute the evidence for the analysis which, once arrived at, deterministically
sets the parameter.

Finally, this formulation suggests the need for greater care than is customary
in the use of the term "primary linguistic data". The linguistic data the
child uses in acquiring his first language are representations - phonological,
morphological and syntactic - in some canonical notation. He arrives at
these representations in part on the basis of exclusively linguistic principles
such as X-bar theory, but in part on the basis of pragmatic and cognitive
principles, specifically the Principle of Relevance. The causal involvement
of such principles in language acquisition does not make them part of
the linguistic data, primary or otherwise. In exploiting the work of Chomsky,
Fodor and Sperber & Wilson, all of whom subscribe to some form of
the modularity hypothesis, I hope to have shown how the combination
of these views has brought us within reach of understanding not only
the nature of language, but also some aspects of its acquisition and use.
To carry the enterprise further we need minimally to define our terms
so that simple questions such as that posed in the title can be coherently
(if not simply) answered. I hope that the answer suggested here and
highlighted above is at least coherent.

FOOTNOTES

* I am grateful to Iggy Roca for inviting me to present the predecessor of this paper at
the University of Essex, and for coercing me into resuscitating it subsequently. I am likewise
grateful to those who contributed to the discussion and forced me to revise my ideas (some
of which appeared in Smith, 1988a). I am particularly indebted to Michael Brody, Robyn
Carston, Annabel Cormack, Deirdre Wilson, and an anonymous referee, who have all plied
me with constructive suggestions and saved me from innumerable solecisms and stupidities.
I alone am to blame for remaining errors and infelicities in the paper. Iggy is to blame
for its appearing at all.

1. Note that to be able to "Avoid a Pronoun" presupposes the grammatical knowledge
of what a "Pronoun" is, though in the present case it might be sufficient to exploit the
difference between the presence and absence of phonological content, which would be
constrained by considerations of processing effort, cf. Fodor, Bever & Garrett (1974); Gleitman
et al. (1988) for discussion of this problem. For the notion of triggering implicit here, cf.
Davies (in prep.).
2. That the syntactic and pragmatic systems may be dissociated in this way is evident
from cases such as that of "John" (see Blank et al., 1978) or "Clive" (Smith, 1989).
3. I am grateful to Jonathan Kaye for this delightful locution.

REFERENCES

Bates, E. and B. MacWhinney. 1979. A functionalist approach to the acquisition of grammar.
In E. Ochs and B. Schieffelin (eds.) Developmental Pragmatics. 167-211. New York: Academic
Press.
Blank, M., M. Gessner and A. Esposito. 1978. Language without communication: a case
study. Journal of Child Language 6. 329-352.
Carey, S. 1982. Semantic development: The state of the art. In E. Wanner and L. Gleitman
(eds.) Language Acquisition: The State of the Art. 347-389. Cambridge: Cambridge University Press.
Chomsky, N. 1981a. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1981b. Principles and parameters in syntactic theory. In N. Hornstein and
D. Lightfoot (eds.) Explanation in Linguistics. 32-75. London: Longman.
Chomsky, N. 1982. The Generative Enterprise. A Discussion with Riny Huybregts and Henk
van Riemsdijk. Dordrecht: Foris.
Chomsky, N. 1986. Knowledge of Language: Its nature, Origin and Use. New York: Praeger.
Chomsky, N. 1987. Language in a Psycholinguistic Setting. Special issue of Sophia Linguistica:
Working Papers in Linguistics 22. Tokyo: Sophia University.
Davies, M., in preparation. Learning, growth and triggering in language acquisition. Ms.
Birkbeck College.
Dore, J. 1979. Conversational acts and the acquisition of language. In E. Ochs and B. Schieffelin
(eds.) Developmental Pragmatics. 339-361. New York: Academic Press.
Fodor, J. 1975. The language of thought. New York: Crowell.
Fodor, J., T. Bever and M. Garrett. 1974. The Psychology of language. New York: McGraw-
Hill.
Gleitman, L., H. Gleitman, B. Landau and E. Wanner. 1988. Where learning begins: initial
representations for language learning. In F. Newmeyer (ed.) Linguistics: The Cambridge
Survey, vol. 3. 15-193. Cambridge: Cambridge University Press.
Halliday, M. A. K. 1975. Learning How to Mean. London: Edward Arnold.
Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Kempson, R. 1988. Grammar and conversational principles. In F. Newmeyer (ed.) Linguistics:
The Cambridge Survey, vol. 2. 139-163.
Lightfoot, D. 1989. The child's trigger experience: Degree-0 learnability. Behavioral and Brain
Sciences 12. 321-375.
Lust, B. (ed.) 1986. Studies in the Acquisition of Anaphora, vol.1: Defining the Constraints.
Dordrecht: Reidel.
Roeper, T. and E. Williams, (eds.) 1987. Parameter Setting. Dordrecht: Foris.
Smith, A. 1988. Language acquisition: Learnability, Maturation, and the Fixing of Parameters.
Cognitive Neuropsychology 5. 235-265.
Smith, N. V. 1983. Speculative Linguistics. An inaugural lecture delivered at University College,
London. Published by the College.
Smith, N. V. 1988a. Principles, parameters and pragmatics. Journal of Linguistics 24. 189-
201.
Smith, N. V. 1988b. First language acquisition and relevance theory. Polyglot. Vol. 9, fiche
2, 1-29. (papers from the 1986 Cumberland Lodge Conference).
Smith, N. V. 1989. The Twitter Machine. Oxford: Blackwell.
Solan, L. 1983. Pronominal Reference: Child Language and the Theory of Grammar. Dordrecht:
Reidel.
Sperber, D. and D. Wilson. 1986. Relevance: Communication and Cognition. Oxford: Blackwell.
Wexler, K. and R. Manzini. 1987. Parameters and learnability in Binding Theory. In T.
Roeper and E. Williams (eds.) Parameter Setting. 41-76. Dordrecht: Reidel.
Author Index

Abercrombie, D. 157, 159, 160 217,218,225,261


Abney, S. 16, 142, 201, 202 Brown, C.H. 101, 120
Adams, M. 236 Browman, C.P. 109
Adjemian, C. 262, 267 Bronckart, J. 39
Aldridge, M. 13 Burt, M. 260, 261
Allen, G.D. 160, 165, 166, 172 Burzio, L. 75, 77, 194
Anderson, H. 106 Bybee, J.L. 109, 129, 130
Aoun, J. 237
Atkinson, M. 18, 25,95, 282 Cairns, H.S. 37
Awbery, G.M. 241 Carey, S. 278
Carroll, J.M. 42
Baddeley, A.D. 36 Cazden, C.B. 206
Baker, M. 5, 12, 13 Cedergren, H. 93
Bally, C. 117 Chien, J.C. 191
Bates, E. 93, 107, 109, 279 Chien, Y.-C. 23
Bellugi, U. 201, 206, 212, 217-219, 225 Chomsky, N. 6, 8, 13, 15, 19, 33-36, 44, 48,
Belletti, A. 150, 151 73, 87, 88, 91, 94, 97, 100, 103, 105, 110,
Benediktsson, H. 127 112, 115, 116, 118, 138, 139, 140, 141,
Bennis, H. 236 145, 154, 201, 224, 227, 236, 237, 241,
Bennett-Kastor, T. 35, 37,45 244, 247, 262, 277, 278, 280, 282, 283,
Berwick, R.C. 18, 181, 235 284, 287, 288
Beukema, F. 102 Clahsen, H. 220, 266, 269, 270
Bever, T.G. 95, 117, 119, 128, 225 Clark, H. 89, 90, 114
Bickerton, D. 118, 124 Classe, A. 160
Bing, J.M. 169 Coker, C.H. 109
Bialystok, E. 261 Cook, V.J. 36, 37, 39, 251
Bley-Vroman, R. 266, 269 Coopmans, P. 85, 102
Bloom, L. 38, 88, 206, 207, 209, 221 Corbett, G. 113-115, 117
Bolinger, D. 158, 159, 162, 172 Corder, S. 260
Booij, G.E. 164 Crain, S.C. 35
Borer, H. 15, 24, 25, 63, 69, 73-79, 81, 123, Culicover, P.W. 2, 7, 8, 10, 47, 50, 51, 107
152, 199, 225, 236, 241 Cutler, A. 161
Borzone de Manrique, A.M. 160, 172
Bortolini, U. 162 Darwin, C. 161
Bouchard, D. 236-238 Dasher, R. 158, 172
Bowerman, M. 77, 201, 206, 207, 218 Dauer, R. 160, 162-164, 172
Boysson Bardies, B. de 161 Dawkins, R. 98
Braine, M.D.S. 201, 206 De Villiers, P.A. 206, 215, 225
Brewer, M.A. 109 De Villiers, J.G. 206, 215, 225
Brown, R. 35, 50, 201, 204, 206, 212, 215, Dell, F.
Den Os, E. 159, 160, 164, 166, 167, 172 Hale, K. 194, 236
Deutsch, W. 191 Halle, 12, 47, 48
Dgani, R. 220 Halliday, M.A.K. 39, 278
Dornum, D. 39 Hammond, M. 13, 47, 49, 52, 53
Donleavy, J. P. 39 Hanlon, C. 50
Donovan, A. 161 Harbert, W. 177
Dore, J. 279 Harder, J.H. 109
Downes, W. 102 Harre, R. 103
Du Bois, J.W. 92, 104 Hasegawa, N. 246
Dulay, H. 260, 261, 270 Hasher, L. 111
Duplessis, J. 266 Haviland, S.E. 89, 114
Hawkins, J.A. 85, 103
Edie, J. 99 Hawkins, S. 1, 165, 166
Elliott, W.N. 10 Hayes, B. 47, 49, 128, 164, 169, 170
Ervin-Tripp, S.M. 212 Heidegger, 99
Hill, J.A.C. 206, 217
Fabb, N. 237, 238 Hilles, S. 264, 268
Fant, 12 Hock, H.H. 127
Fassi-Fehri, A. 16, 202, 228 Hockett, C. 126
Felix, S. 260 Hoekstra, T. 23-25, 71, 77, 78, 80, 81
Ferguson, C.A. 7 Hooper, J. 110
Feyerabend, P. 45 Horning, J.J. 108
Fidelholtz, J.L. 109 Huang, C.T.J. 18, 236, 240, 251
Flynn, S. 259 Hudson, G. 126
Fodor, J.A. 17, 18, 24-26, 45, 265, 270, 277, Hurford, J.R. 87, 88, 92, 98, 104, 106, 107,
278,288 117, 120, 130
Foley, W.A. 103, 127 Husserl, E. 99
Foster, 271 Hyams, N. 13, 14, 18, 23, 24, 34, 38, 39, 40,
Fräser, C. 99, 201, 204, 212, 215, 218, 225 42, 43, 44, 67, 68, 70, 71, 73, 137, 212,
Fries, C.C. 93 222-224, 235-237, 239,241, 249, 250, 251,
Fukui, N. 202, 205, 217 282, 283, 287
Hyman, L.M. 94, 95, 113, 115, 117
Gass, S. 262
Givon, T. 86, 88, 89, 90, 102, 113-115, 117, Ingram, D. 43, 109
118, 124 Isard, S.D. 161
Gleitman, L. 7, 109, 201 Itkonen, E. 109, 115
Gleitman, H. 7, 109
Golinkoff, R.M. 85 Jaeggli, O. 81, 236, 250
Gold, E.M. 88, 107 Jakobson, R. 12, 63, 68
Goodluck, H. 199, 266 Jenkins, P. 265
Gordon, L. 85 Johnston, 270
Gregg, K. 265 Jordens, 270
Greenberg, J.H. 18 Jorge, 268
Greenfield, P. 206
Grice, P. 95 Kager, R. 169
Grimshaw, J. 100, 191 Kahneman, D. 110, 111
Gropen, J. 121, 122 Katada, F. 177
Guilfoyle, E. 202, 217, 250 Kayne, R. 65, 74
Guillaume, P. 218 Kazman, R. 202
Kean, M. 267, 268
Haegeman, L. 236 Kellerman, E. 262, 267
Kempson, R. 279 McNeill, 228


Kiparsky, P. 169 Milroy, L. 90, 93, 94
Kiss, K.E. 189 Miller, G. 57, 58, 107
Kitagawa, Y. 140 Mithun, M. 121, 122, 129, 130
Klima, E.S. 217-219 Moder, C.L. 109
Koopman, H. 223 Morgan, J . L . 10, 33
Koopmans-van Beinum, F . J . 109 Mühlhäusler, P. 103
Koster, J . 152, 191, 194, 252 Muysken, P. 266, 269
Koster, C. 191
Krashen, S. 260, 261 Nakayama, I. 35
Kripke, S. 115, 116 Nespor, M. 168-172
Kroch, A. 93 Neu, H. 109
Newmeyer, F . J . 85, 87, 88, 90
Labov, W. 93 Newport, E.L. 7, 36
Lado, B. 122 Newson, M. 13, 14, 22, 23, 153
Ladd, R. 261 Noonan, M. 202, 217
Lakoff, G. 99
Langendoen, D.T. 117, 119 O'Connor, J . D . 159
Lasnik, H. 7, 11, 87, 103, 236, 247 O'Neil, W. 259
Lass, R . G . 124, 127, 128 Odlin, T. 262
Lea, W.A. 159 Oehrle, R.T. 7, 247
Lebeaux, D.S. 202, 223, 250 Osherson, D. 2, 7
Lehiste, I. 160, 162
Levin, J . 47 Pankhurst, J . 259
Liberman, M. 167, 169 Park, T.-Z. 220
Liceras, J . 268, 269 Pateman, T. 106, 107, 117
Lightfoot, D. 10, 87, 88, 100, 107, 112, 118, Pesetsky, D. 78
119, 120, 2 3 5 , 2 8 4 Peterson, G . G . 159
Lloyd, James A. 157 Phillips, B. 109, 110, 119
Locke, J . L . 109 Phinney, M. 217
Lust, B. 35, 279 Piattelli-Palmarini, M. 12, 17, 88
Pica, P. 73, 137, 138, 154, 155
Macnamara, J . 201 Picallo, 236
MacWhinney, B. 93, 279 Pienemann, 270
Macken, M.A. 93, 108, 109 Pike, K.L. 93, 157, 159
Maia, E.A.D.M. 164 Pinker, S. 24, 63, 79, 88, 100, 108, 111-113
Major, R.C. 164, 169 Platzack, C. 202, 239, 241
Mallinson, G . 85 Pollock, J . - Y . 15, 210, 228
Manzini, M.R. 13-16, 18-20, 22, 23, 67, 72, Preston, D . R . 39
137-139, 145,152, 153,155, 177, 178,181, Prince, A. 167, 169
182, 188-190, 2 3 5 , 2 3 8 , 2 4 0 , 2 4 3 - 2 4 5 , 250, Pullum, G. 101
252, 282 Puppel, S. 164, 169
Maratsos, M. 225
Martin, L. 101 Radford, A. 24, 34, 38, 4 0 , 4 4 , 201, 202, 208,
Martinet, A. 125, 128 209, 212, 214, 215, 217, 223, 250, 251
Mascarö, J . 164 Randall, J . 7
Matthei, E. 35 Rizzi, L. 71, 236, 237, 240
Mazurkewich, I. 267, 269 Roach, P. 159
McCarthy, J . J . 169 Roberts, I. 71, 81
McCawley, J . 94 Roca, I. 167, 171
McCloskey, J . 236 Roeper, T. 235, 277
Rom, A. 220 Thiemann, 109


Romaine, S. 93 Tomaselli, S. 266
Rosen, S.T. 191 Tracy, R. 220
Rubach, J. 164 Traugott, E.C. 99, 130
Rutherford, W. 264, 273 Travis, L. 223, 239, 240, 241, 266
Tversky, A. 110, 111
Safir, K. 11, 16, 19, 22, 178, 236, 237, 239,
244, 250 Umeda, N. 109
Saito, M. 11,236
Saleemi, A.P. 7, 13, 19, 41, 50, 68, 100, 236, Van Buren, P. 264, 267, 269
251 Van Valin, R.D., Jr 127
Sankoff, D. 93 Vergnaud, J.R. 47
Saussure, F. de 92, 117 Vikner, S. 177
Schachter, L. 266, 269 Visch, E. 169
Schane, S.A. 125-127 Vogel, I. 168-172
Schlesinger, I.M. 201
Schieflelin, B.B. 220 Wasow, T. 76, 79, 80
Schultink, H. 169 Weinberg, A. 24
Schumann, J. 267 Weinstein, S. 2, 7
Schwartz, B. 261, 265, 266 Wells, C.G. 35, 212, 218
Scott, D.R. 161 Wexler, K. 2, 7, 8, 10, 13-16, 18-20, 22-25,
Sechehaye, A. 117 47, 50, 51, 63, 67, 69, 72-79, 81, 87, 107,
Selinker, L. 260-262 123, 137, 138, 152, 153, 155, 177, 178,
Selkirk, E.O. 166-169, 171, 172 181,182,188-191,199,225,235,238,240,
Sharwood Smith, M. 262,264, 267, 269 243, 244, 245, 247, 250, 252, 282
Shen,Y. 159 Wheeler, M. 164
Siewierska, A. 81 White, L. 123, 260, 266-269
Signorini, A. 160, 172 Williams, E. 79, 80, 235, 277
Sinclair, H. 39 Wilson, D. 95, 277, 278, 280, 288
Slobin, D.I. 204 Wittgenstein, L. 115, 116
Smith, N.V. 249,261,262,278,280,281,283, Wode, N. 267
284 Wright, C.W. 109
Snow, C. 7
Solan, L. 23, 283 Yang, D.-W. 152, 177, 189
Solin, L. 266
Sperber, D. 95, 277, 278, 280, 288 Zacks, R.T. I l l
Sportiche, D. 138, 140 Zipf, G.K. 128
Stob, M. 2, 7 Zobl, H. 264, 265, 267, 268, 270
Stowell, T. 169
Stromswold, K. 37
Subject Index

A movement 224-226
A-bar movement 225-226
A-chains xvi, 24, 25, 69, 76-82
A'-chains 25
Acquisition xvi-xx, 1, 2, 6, 10-12, 23, 26, 33-36, 41, 42, 58, 59, 63-65, 67, 76, 85, 101, 106, 109, 111, 131, 172, 206, 249-251, 260, 261, 263, 265, 267, 270-273, 278, 279, 281, 282, 288
Adjectival passivisation 24, 76, 79-82
Agr 15, 41, 70, 71, 73-75, 79, 81, 214, 236, 240, 250, 251, 266
Ambient language 246-248, 251
Ambiguity 281
Anaphor xvii, xix, 15, 17, 19, 22, 23, 67, 72, 73, 139-148, 152-155, 177, 181, 183-194, 226, 244, 252, 254, 285, 286
Antecedent 26, 140, 141, 148, 149, 151, 154, 181, 186, 191, 283, 285, 286
Arena of Use xx, 85, 96-101, 103-107, 109, 110, 112, 118-120, 122-125, 127, 128, 130, 131
Argumental subjects (see also thematic subjects) 239, 241, 246
Auxiliary 40, 41, 75-77, 79, 81, 210, 213, 217
Avoid Pronoun Principle 241, 248, 282, 283
Barrier 138, 141, 143, 146, 148, 150-152, 154
Binarity xviii, 12, 13, 15, 49, 51, 54, 57
Binding xvii, xviii, xx, 22, 72, 139-141, 150, 185, 188, 193-195, 226-228
Binding Theory 14, 16, 139-141, 154, 178-181, 188, 189, 227, 279, 286
Bounding node parameter 14
By-pass 278, 286
C 25, 41, 215-218, 226
Case xx, 11, 13, 14, 16, 25, 66, 71-73, 76, 142
Case Theory 12, 15
Categories xix, xx, 10, 111, 112, 279, 285, 286
Chains xvi, xx, 24, 25, 69, 76-82, 227, 228
Child grammars (of English) xx, 75, 79, 199-228
Cognition 95, 278
Cognitive processes 36, 277, 284
Communication 126
Competence/performance xix, xx, xxi, 35-45, 87, 93, 113, 114, 117, 125, 279
Complementisers: see C
Comprehension 33, 35, 280
Concept Learning 19, 21
Continuity Hypothesis 24-26, 63, 72, 82
Coreference 226, 227
Correlations and stages 42-44
Covert nominals: see null arguments
CP: see C
Creative construction 261
Creativity xx
Crosslinguistic-influence 271
D 16, 25, 202, 204, 205, 209, 215, 219-221, 226-228
Defaults xx, 14, 15, 59, 60, 286
Determiners: see D
Developmental implications 249
Developmental mechanisms 2, 24, 25, 36, 40
Disambiguation 280
DP: see D
E-language xix, 34-45, 89-91, 122
E-Language/I-Language 19, 34-45, 89-91, 123, 125, 130
Empty categories 151, 189, 222, 223, 226, 236, 283, 284
Empty Category Principle 11, 154, 236
Enumeration 7, 8
Epistemological priority 9, 10, 17
Ergative verb 76-79, 81
Evidence from absence 40-42, 44
Evil Neighbour Syndrome 287
Exact identification 247
Expletive pronouns 282, 284
Expletive subjects (see also pleonastic subjects) 14, 24, 43, 235, 237, 241, 282
Expletives 282, 284
Extended Projection Principle 281
Finite evidence 50
Finite verb inflections: see I
Fossilisation 260, 261
Frequency 107-111, 117, 131
Functional categories 200, 201, 206, 209, 210
Functional mechanisms of language change 87
Glossogenetic mechanism of functional influence on language 87, 124
Governing category 11, 13, 19, 22, 23, 67, 68, 72, 138-141, 143-149, 152, 181, 183, 184, 186, 189, 190, 253, 254
Governing Category Parameter xix, 15, 17, 19, 21-23, 139-142, 182, 185-190, 194, 252
Government 138-141, 143
Grammaticalisation 116, 119
Head-direction parameter 18
Head-first parameter 285, 287
Head-to-head movement 228
Hypothesis selection and testing xvii
I 16, 24, 25, 40, 43, 210, 211, 213-215, 252, 253
Identification 251
I-language 34-45, 89-91, 122
I-Language and E-language 34-45, 89-91
Imitative speech 212
Implicit (or indirect) negative evidence 6, 235, 248, 249
Independence Principle 22, 240
Inflectional uniformity 251
INFL-features 70
Infinitive: see also I 210
Inflection: see also I 40, 43
Innateness xix, xx, 33, 47, 63, 278, 285
Input xvi, xix, xx, 3, 33, 60, 72, 75, 277, 279, 284, 285, 287
Inversion (of subject and auxiliary) 67, 282
IP: see I
L2 learnability 259, 262, 272
Language development 34, 42, 43, 87
Language acquisition xx, xxi, 2, 4, 8, 33, 35, 36, 40, 44, 63, 64, 66-69, 74, 82, 85, 92, 95, 107, 108, 159, 165, 177, 178, 199
Language Acquisition Device xx, 86, 88, 93, 94, 97, 100, 101, 104-107, 112, 117-120, 123, 125, 130, 131, 268, 277, 278, 281, 282, 287, 288
Learnability xvii-xxi, 7, 10, 14, 22, 23, 47, 49-51, 53, 57, 60, 100, 108, 158, 180, 182, 235, 238, 240, 242, 249, 250, 259, 264, 265, 268, 272
Learning xv-xix, 2, 7, 10, 12, 17-22, 25, 26, 49-51, 53, 57, 58, 60, 63, 73, 76, 277, 278, 284, 286
Learning procedure 247
Levels 18, 47, 53-57
Lexical Dependencies 179, 180, 182, 185-188, 190-192, 194, 195
Lexical Parameterisation Hypothesis xviii, 15, 16, 153, 177, 179, 183, 185, 195, 244
Lexical-thematic structures 219, 223-226, 228
LF xvii, 13, 73, 154, 237, 238, 240, 287
Licensing 71
Locality xvii, xviii, 137, 138, 143, 145-154
Logical Form (LF) xvii, 13, 73, 154, 237, 238, 240, 287
Logical problem of language acquisition xv, xvi, 1, 2-11, 69
Markedness xx, 19, 22, 23, 57, 58, 63-82, 245, 249, 252, 254, 267
Markedness condition 244, 245, 253
   of anaphors 67
   of pronominals 67
Markedness hierarchy xviii, xix
   for anaphors xix
   for pronominals xix
Maturation xvi, 9, 10, 24-26, 69, 73-79, 81, 82, 286
Maturational hypothesis xvi, 63, 69
Memory, short-term xx, 57-60
Methodology of acquisition research 34-45
Metrical grid 166
Metrical theory 47-60
Minimal governing category 253-254
Modal 212
Modularity 65, 288
Morphological Uniformity 251
Motherese 7
Movement xvii, xx, 269
Negation 261
Negative evidence xviii, xix, 6, 7, 10, 18, 19, 41, 67, 69, 242, 263-265, 270, 282
   explicit 6
   implicit 6
Nominals 202, 204, 205, 208, 209, 220, 221, 224, 226, 227
Nonarguments 237, 240, 243, 244, 246
Nonreferential subjects 241, 243, 246
Null arguments 11, 13, 14, 19, 24, 40, 42, 43, 45, 67, 69-71
Null subject parameter 11, 24, 38-40, 67-69, 137, 237, 238, 240, 242, 245, 248, 250, 254, 268
Observational data 33, 35-45, 68
Parameter xvii, xviii, xx, 10-16, 18-21, 25, 26, 47, 49, 53, 60, 63-70, 113, 137, 152, 153, 155, 157, 158, 172, 178, 180, 181, 190, 192-195, 199, 235, 237, 240-244, 246, 247, 249-251, 262, 264, 270, 279, 284-286
Parameter fixation/setting xix, 10, 177, 235, 243, 247, 249, 252, 264, 277-288
Parameter setting xvi, xviii-xx, 10, 12, 17, 21, 26, 33, 34, 53, 59, 60, 63, 65, 67, 70, 72, 81, 82, 277, 281, 285-287
Parameter value xvi, xviii, xix, 9-11, 13, 15, 18-23, 25, 26, 33, 63, 65-67, 70, 72, 82, 252
Parametric variation 2, 15, 25, 250, 254
Parsing 103
Passive 24, 69, 73, 76, 79-82
Passivisation 79, 224
Phylogenetic mechanism of functional influence on language 87
Pleonastic subjects (see also expletive subjects) 12, 24, 43, 235, 237, 241, 243, 246, 249, 250
Positive evidence xviii, 13, 23, 41, 47, 50, 66, 67, 69, 72, 81, 243, 264
Positive identification 235, 245-247
Pragmatic competence 219
Pragmatics xix, 91, 94, 95, 99, 100, 102, 118, 130, 131, 277-288
Primary linguistic data 284, 287, 288
Principle of Relevance 280, 281, 283, 285, 287, 288
Principles and parameters xvi, xvii, xxi, 9, 10-26, 33, 65, 66, 277, 278, 281
Pro-drop (see also null arguments) xviii, 11, 13, 14, 19, 24, 40, 42, 43, 45, 67, 69-71, 235, 236, 239-243, 282, 283
Pro-drop language 38, 67, 68, 235, 240, 250, 281, 282, 284
Pro-drop parameter 11, 24, 38-40, 67-69, 137, 237, 238, 240, 242, 245, 248, 250, 254, 268, 281, 283
Pronominals xix, 22, 150, 181, 183-185, 187, 188, 190-192, 194, 207, 244, 254
Pronoun 15, 19, 23, 67, 69, 70, 141, 142, 145-148, 151-153, 155, 209, 281-284
Proper Antecedent Parameter xix, 182, 189, 190, 252
Quasi-arguments 237, 239, 241, 243, 246
Questions: see interrogatives
R expressions 286
Reciprocal 145, 149-152
Recreation Hypothesis 267, 269, 270
Redundancy constraint 248
Referential arguments 237, 244
Reflexive 152, 194
Relevance Theory xix, 277-279
Rhythm xviii, 157-159, 161, 165-167, 169, 172
Semantics 60
Small clause 75
Spanning Hypothesis 22
Statistical distributions 128
Stress xx, 47-49, 52-60, 157, 159-164, 169, 283, 284
Stress clash 49, 52, 167, 169-171
Stress lapse 171
Stress-timed language xviii, 157-164, 172
Structuralist linguistics 8
Subject 38, 39, 41, 66, 68, 73, 139, 140, 143-145, 151-153, 155, 281, 283-285
Subject Antecedent Parameter 285-287
Subset Condition 20, 137, 138, 244, 251, 253
Subset Principle xvii-xix, 18-23, 137, 138, 153, 155, 181, 182, 192, 235, 243, 244, 249, 282
Subset Problem 250
Syllable-timed language 157-164, 172
Syntactic Case 219, 220
Thematic relations xx, 200
Thematic roles 11, 12, 14-16, 71, 74, 76, 78, 79, 81, 82, 223, 225
Theta criterion 225
Theta marking 200, 203, 206, 212, 223, 224
Theta theory 15
Transformations 11, 64
Triggering xx, 9, 10, 21, 24-26, 73, 279, 284
Triggering Problem 24, 63, 75
UG xvi, xvii, xx, 6, 8, 9, 14, 17, 19, 22-24, 33-45, 47, 49-52, 54, 55, 58, 63-67, 69, 73, 74, 78, 97, 98, 101, 105, 107, 108, 111, 112, 120, 153, 177, 182, 195, 199, 249, 253, 263, 265-271, 281, 285-287
Undergeneralisation Problem xviii, 177, 178, 180
Unergative verb 73, 76-78
Unique external argument principle xvii, 73-76, 78
Universal Alignment Hypothesis 78, 79, 81
Universal grammar xvi, xvii, xx, 6, 8, 9, 14, 17, 19, 22-24, 33-45, 47, 49-52, 54, 55, 58, 63-67, 69, 73, 74, 78, 97, 98, 101, 105, 107, 108, 111, 112, 120, 153, 177, 182, 195, 199, 249, 253, 263, 265-271, 281, 285-287
Universals 14, 39
Verbal passivisation xvi, 76, 79, 81
Visibility condition 237
Wh-movement 11, 14, 217, 218, 226
X-bar Theory 15, 285, 287, 288
