Biology & Philosophy Journal Sample
DOI 10.1007/s10539-017-9571-5
REVIEW ESSAY
Richard Moore
Robert C. Berwick and Noam Chomsky (2016). Why Only Us: Language and
Evolution. MIT Press, Cambridge.
One of the more heated academic debates of recent years concerns the extent to which the human ability to use and understand syntactically complex utterances is the result of adaptations for language. The dispute has (roughly) two sides. Nativists, led by Noam Chomsky, argue that language acquisition can be explained only on the assumption that, in addition to all of our general-purpose cognitive abilities (memory, social cognition, and the like), humans possess a hardwired and domain-specific faculty of language. This nativism is motivated by the poverty of the stimulus argument, according to which young children's exposure to language could never suffice for them to learn that many possible natural language grammars are wrong. Since this would make any natural language unlearnable, Chomsky argues that children must possess genetically inherited knowledge of language, or Universal Grammar (hereafter UG), "the genetic component of the faculty of language" (BC p. 90). UG constrains our judgements about which of the possible sentences of a language are syntactically well formed, and thereby makes the task of language acquisition computationally tractable.
Historically, some prominent nativists, not least Chomsky himself, have doubted that this faculty of language could have arisen under natural selection, and so argue that it likely emerged by other evolutionary mechanisms, like random mutation or exaptation. In contrast, proponents of the non-nativist Construction Grammar view, led by Michael Tomasello, Morten Christiansen, and Nick Chater, among others, have argued that language acquisition does not require any innate faculty of language. Moreover, they argue that the domain-general cognitive mechanisms exploited in our language acquisition and use have been shaped by natural selection in just the same manner as our other cognitive processes. Thus debate crystallises around two questions: first, the extent to which our syntactic abilities originate in brain functions that are specific to language, and second, the question of whether or not these abilities were the product of natural selection. In principle, at least, answers to these questions are independent.
On the face of it, both nativist and non-nativist views of the origins of syntactic structure are quite sane, and the merits of each have been defended by some of the world's best thinkers. Nonetheless, the debates have been characterised by mutual acrimony. Both sides have been guilty of what Pullum and Scholz (2005) call "irrational exuberance".
Especially for newcomers and the undecided (among whom I include myself), the recent publication of two books presents a good opportunity to assess the progress of the competing views in this debate. Creating Language: Integrating Evolution, Acquisition, and Processing by Morten H. Christiansen and Nick Chater (2016, MIT; hereafter CC) presents a constructionist approach to language understanding. Meanwhile, in Why Only Us: Language and Evolution (also 2016, MIT; hereafter BC), Robert C. Berwick and Noam Chomsky present the nativist account. The latter is particularly notable because it is Chomsky's long-awaited book-length treatment of the subject of evolution, a topic on which his previous remarks have been both spare and somewhat cryptic, and on which even relatively devout Chomskyans have sometimes departed from his views.1 Both books are
1 On this point, see Pinker and Bloom (1990), Jackendoff (2002), and Progovac (2015). Following Chomsky, these authors hold that humans do possess a hardwired faculty of language, but they argue that it likely emerged gradually, and as a product of natural selection.
The evolution of syntactic structure
among extant creatures, BC doubt that any possess the word-like concepts that humans do. Moreover, BC are sceptical that even Neanderthals possessed language, given that they left scant evidence of symbolic behaviour, which is taken to be a reliable indicator of language use. Summarising their view, BC write:
In some completely unknown way, our ancestors developed human concepts. At some time in the very recent past, apparently some time before 80,000 years ago, individuals in a small group of hominids in East Africa underwent a minor biological change that provided the operation Merge, an operation that takes human concepts as computational atoms and yields structured expressions that, systematically interpreted by the conceptual system, provide a rich language of thought. At some later stage, the internal language of thought was connected to the sensorimotor system[.] In the course of these events, the human capacity took shape[.] (BC p. 87)
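As a toy illustration (mine, not the authors'), the Merge operation described in this passage can be modelled as forming an unordered set from two syntactic objects; repeated application then yields hierarchical structure in which a whole constituent behaves as a single atom of the larger expression:

```python
def merge(x, y):
    """Merge two syntactic objects into an unordered set.
    Repeated application builds hierarchy without imposing linear order."""
    return frozenset([x, y])

# Build the hierarchically structured expression {will, {read, books}}:
vp = merge("read", "books")   # inner constituent
tp = merge("will", vp)        # the whole VP is embedded as a single unit

# Merge is symmetric: the internal structure carries no word order,
# consistent with BC's claim that linear order belongs to externalisation.
print(merge("read", "books") == merge("books", "read"))  # True
```

The order-free character of the operation is worth noting: on BC's picture, linear ordering is imposed only when the internal language of thought is connected to the sensorimotor system.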
The book presents a fascinating array of empirical data, but it is not particularly well organised or signposted. Partly because of this, and partly because BC engage in a lot of (justifiably) cautious speculation, their exact view is sometimes tricky to discern. Evidently, they take the human conceptual system to have emerged sometime before 80kya, and so prior to the "small rewiring of the brain" (BC p. 107) that created Merge. However, they think the circumstances surrounding the arrival of the conceptual system wholly mysterious. If I understand correctly, they also think that language proper emerged only with the last step of externalisation, which came just under 50kya, following a seemingly uniquely human change to one of the regulatory elements of the FOXP2 gene (Maricic et al. 2013). This final tweak to the sensorimotor interface enabled the rapid emergence of language. The change could be so rapid because it arose not as a consequence of genetic change, but due to changes in gene enhancers.
BC argue that the selection pressure that led to a genetic sweep for Merge would have been independent of any function it served for communication. In their words: "the modern doctrine that communication is somehow the function of language is mistaken ... a traditional conception of language as an instrument of thought is more nearly correct" (BC p. 107). Furthermore:
[I]nitially, at least, if there was no externalization, then Merge would be just
like any other internal trait that boosted selective advantage internally, by
means of better planning, inference, and the like. (BC p. 164)
This claim is just sketched, but it is, they say, supported by evidence that
language-using adults perform better in some reasoning tasks than pre-verbal infants
(BC p. 165). So on BCs view, even if some elements of externalisation were
subject to selection pressure for better communication, language itself was not.
BC is not an easy read, but it is a very good book. For a volume of 177 pages (including notes), it is impressively detailed, and should certainly banish the myth that Chomsky does not understand evolution. I found myself largely persuaded by their argument for the claim that Merge might have arisen through a very minor biological change. Though the move to the Minimalist Program (MP) is now over 20 years old, I suspect that some will see the very explicit retreat from the original vision of UG as a flight from the spirit of what made Chomsky's ideas distinctive. However, it's not an objection to a view that it has changed over time. This is particularly so when the later view is a carefully considered reformulation of the original one, made in light of compelling scientific arguments, like considerations about the evolvability of UG.
Nonetheless, not all of BC's arguments are persuasive. In particular, BC sometimes overstate the differences between human and non-human communication and cognition. In several places, they argue that even our nearest relatives, the non-human great apes, lack concepts. For example:
The minimal meaning-bearing elements of human languages, word-like, but not words, are radically different from anything known in animal communication systems. (BC p. 90)

[T]ogether with the word-like atomic objects, Merge is one key evolutionary innovation for human language. (BC p. 112)
BC's argument that apes lack any analogue of words is drawn from a single case, that of the enculturated chimpanzee Nim Chimpsky, who was famously taught sign language with very limited success. BC argue that Nim did not ever learn the names of objects (e.g., "apple"), but only loose bundles of associations (e.g., associating "apple" not just with apples, but with the knives used to cut them, and so on). Moreover, they cite an analysis (Yang 2013) that reports that Nim's utterances lacked the combinatorial structure of human children's utterances, suggesting that his gesture sequences were learned by rote.
While enculturated apes are known to produce gesture sequences (e.g., TICKLE THERE), the claim that these lack syntactic structure is (or should be) uncontroversial, and has been replicated in cautious analyses of other enculturated apes (Rivas 2005). However, while BC repeatedly claim that the emergence of concepts in humans is mysterious, the claim that non-human communication lacks word-like elements may be too quick. It is not implausible to claim that their utterances are meaningful, and even word-like, in the same respects as human utterances are. For example, recent analyses of the gestural repertoire of wild chimpanzees suggest that, while they do not use gestural signs to name objects, they do use them in ways consistent with their possessing consistent semantic properties (Hobaiter and Byrne 2014; see Moore 2014, 2016 for discussion). These
may cut the world at a cruder grain than do human concepts, but, contra Davidson (1982), this is no argument for thinking that they are not concepts at all. While Rivas (2005) found the utterances he analysed to be largely consistent with cautious analyses of Nim's dataset, his discussion supports the conclusion that enculturated apes (at least for the most part) use signs with relatively stable meanings. Given that, it's unclear that the lack of syntactic structure in ape communication is attributable to conceptual shortcomings, and not just to their lacking combinatorial abilities. Perhaps BC are after something more substantial here, for example, the idea that only human concepts are sufficiently word-like to be bound hierarchically by Merge.3 However, the argument for this claim is somewhat opaque, and would (at least) have benefitted from a more detailed elaboration.
3 Thanks to an anonymous reviewer for this point.
4 In discussion of Kanzi's abilities, one should always be wary of anthropomorphism, but a compelling video illustrating his comprehension of novel sentences can be found here: https://www.youtube.com/watch?v=2Dhc2zePJFE.
One final gripe: my first edition of the book is rather poorly edited. There are numerous missing references (e.g., Frank et al. 2012 is discussed on p. 115 but missing from the bibliography). Four of the diagrams in the book are also printed twice, once in black and white, and a second time in a set of handsome coloured plates. This is presumably because the black and white illustrations are unusable, since the accompanying descriptions refer to coloured aspects that are indiscernible in grey (e.g. Fig. 4.4 on p. 160). This suggests that the book was rushed to press. Hopefully for future editions MIT will employ a proofreader.
I turn now to CC.
The central premise of CC is that the argument that language is made possible by a hardwired Universal Grammar generates a logical problem of language evolution. The problem, as they see it, is that

it is mysterious how proto-language, which must have been, at least initially, a cultural product likely to be highly variable both over time and location, could ultimately have become genetically fixed as a highly elaborate biological structure. (CC p. 24)
As this passage suggests, the main target of CC's attack is the original formulation of UG that takes it to be a highly elaborate biological structure, and particularly the non-Chomskyan developments of this view (e.g., Pinker and Bloom 1990) that take this biological structure to be the result of natural selection. Occasionally, CC extend their argument against UG to include its more recent minimalist versions. However, when they do so, their general strategy is to hint that minimalist UG is so far removed from its older self that it is no longer recognisable, and has somehow betrayed the first principles of UG.
In what follows I will devote more time to elaborating the details of CC's view that shed light on possible ongoing disagreements between CC and BC. Since the precise targets of CC's criticisms are somewhat hazily defined, and since the bulk of CC is devoted to a positive argument that language acquisition can be explained without an innate faculty of language, adopting this strategy will not distort the contents of the book.
In response to the logical problem that they identify, CC reject the claims that
UG is genetically encoded, and that our brains are specialised for language. Rather,
they argue, language reflects pre-existing, and hence non-language specific, neural
constraints (ibid.). What makes language learnable is not UG, but the combination
of general learning mechanisms and cultural selection.
According to CC, understanding cultural selection is key to understanding how
we can get by without a UG, because it enables us to give up the claim that our
brains are adapted for language, and to replace it with the much less controversial
claim that the languages we speak have been adapted to our pre-existing brains.
In order for languages to be passed on from generation to generation, they
must adapt to properties of the human learning and processing mechanisms.
(CC p. 44)
A consequence of this is that languages that are difficult to learn will be weeded
out over generations, as speakers abandon or modify them in favour of alternatives
that are easier to master. Therefore, according to CC
the learnability of language is not a puzzle demanding the presence of
innate information, but rather an inevitable consequence of the process of the
incremental creation of language and culture more generally, by successive
generations. (p. 74)
Additionally, CC offer a range of short arguments against the various accounts of UG that have been developed. With respect to the original Chomskyan view that UG arose by non-adaptationist means, they throw up the following dilemma. As traditionally conceived, UG is a large and complex set of abstract rules. The probability of these having arisen by chance would be "infinitesimally low" (p. 39). However, if they emerged not by chance, but by virtue of being exapted from other cognitive processes, then there is no reason to think they should be language specific in the way that proponents of UG have claimed. In that case, "a plausible story about evolution cannot be told for anything remotely resembling the traditional picture of UG" (ibid.). Proponents of UG therefore face a dilemma: abandon UG orthodoxy, or give up on an evolutionarily plausible account.5
The bulk of CC is dedicated not to criticisms of the nativist paradigm, but to the
development of the positive view that language learning requires no language-
specific neural substrates. A central tenet of this is that the neural substrates of
language can be explained in terms of a process of neural recycling, which they
explain as follows:
Instead of viewing various brain regions as being dedicated to broad cognitive
domains such as language, vision, memory, or reasoning, it is proposed that
low-level neural circuits are redeployed as part of another neuronal network to
accommodate a new function. (p. 44)
The central goal of the second half of CC is to develop the details of this view, in
order to show that, even in the absence of any UG, a domain-general sequence
learner can acquire aspects of syntactic structure (p. 157) via the integration of
multiple linguistic cues and statistical predictions about the relative frequencies
with which different words are combined.
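As a minimal stand-in for the statistical prediction CC have in mind (a toy of my own devising, far simpler than the SRNs they cite), a learner can track the relative frequencies with which words follow one another in its input and use those counts to anticipate upcoming material:

```python
from collections import defaultdict, Counter

# A tiny corpus standing in for a child's linguistic input (illustrative only).
corpus = [
    "the cow jumps over the sheep".split(),
    "the goat jumps over the pig".split(),
    "the sheep bumps the goat".split(),
]

# Count how often each word follows each other word (bigram statistics).
following = defaultdict(Counter)
for sentence in corpus:
    for w1, w2 in zip(sentence, sentence[1:]):
        following[w1][w2] += 1

def predict_next(word):
    """Predict the continuation most frequently observed in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("jumps"))  # 'over' in this toy corpus
```

Even this crude device picks up distributional regularities (e.g., which words behave verb-like) without any built-in grammatical knowledge, which is the shape of the argument CC develop with far richer models.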
At the heart of CC's positive account of language processing lies a problem that they propose to solve. The problem, as CC conceive it, is that in both acquisition and later use, units of language must be processed in real time in order to be functional. This creates a "Now-or-Never bottleneck", which arises from general principles of perceptuo-motor processing and memory (CC p. 94) but provides strong constraints on the sorts of things that can function as units of language. Units of input that are too difficult to be processed quickly and efficiently will be forgotten.
5 CC also offer arguments against gradualist versions of UG, but for reasons of space I will not discuss them here.
However, children at the same age found it much harder to grasp centre-embedded (3) subject and (4) object relative clauses.
3. The cow that jumped over the pig bumps the sheep.
4. The sheep that the goat bumped pushes the pig.
This developmental lag would support both the existence of the Chunk-and-Pass
processing system, and the proposal that syntax mastery is bound up with
experience:
Only once chunking becomes faster and more efficient as a function of
repeated experience do children become better at processing centre-
embedded relative clauses. (CC p. 180)
This item-based approach to language learning might seem to threaten the possibility of learning for exactly the sorts of reasons that first led Chomsky to posit the existence of UG, namely the poverty of the stimulus argument. However, CC argue, this worry is groundless, because learning is facilitated in a number of different ways. First, much language comprehension is over-determined, because the proper interpretations of linguistic units are cued by multiple sources. For example, the differences between nouns and verbs are cued not only by syntactic properties, but by semantic and prosodic ones too, and consequently domain-general simple recurrent connectionist machine learning networks (SRNs) can learn some aspects of syntactic structure (CC p. 157). Second, because over time chunks of language are themselves subject to cultural evolution pressures, linguistic units that cannot easily be processed will be abandoned or refined, one item at a time. Over successive generations, language users will have selected and retained linguistic structures the comprehension of which is over-determined and which are consequently easily learned.
The final chapters of CC culminate with an argument that builds on two claims already developed, namely, that syntax processing involves domain-general abilities, and improves with experience. CC argue that recursion is also learned. Recursion is the ability that allows "the reuse of the same grammatical construction multiple times in a given sentence" (CC p. 197), and which enables us to produce and understand sentences like "Richard knows that Kim knows that Richard knows this essay is overdue". It is closely linked to the computation that BC call Merge.6 However, in contrast to UG accounts, according to CC

the ability to process recursive structure does not depend on a built-in property of a competence grammar, but, rather, is an acquired skill learned through experience with specific instances of recursive constructions and limited generalizations over these. (CC p. 203)
According to CC, what makes recursion possible is general-purpose mechanisms for sequence learning. Whereas BC argue that the evolutionary significance of FOXP2 is limited to externalisation, CC maintain that changes to the human variant made possible sequence-learning skills that our ancestors had lacked. In support of this claim (and among other things) they point to evidence showing both that mice genetically modified with a human variant of the gene show improved sequence
6 In an earlier version of Chomsky's minimalist UG (Hauser et al. 2002), recursion was thought to play the same role that Merge does for BC. Despite the obvious continuity between Hauser et al. (2002) and BC, in BC it is not entirely clear whether Merge is identical to recursion, or just a necessary precondition for it.
learning of actions (e.g., Schreiweis et al. 2014), and that FOXP2 translocations in
humans have been associated with both language and sequence learning deficits.
Thus, they suggest, FOXP2 was selected for general-purpose sequence learning in
humans, and was only later recruited for language.
Two final points of disagreement with BC are worth mentioning. First, on UG approaches to language comprehension, the limited ability of language users to process recursively complex sentences is taken to be the consequence of working memory constraints that limit the function of the innate recursion mechanism. Thus failures are attributed to performance issues, and not any problem of underlying competence. By contrast, CC deny the performance-competence distinction, as Chomsky developed it. For them, and in both ontogeny and phylogeny, linguistic competence develops only as a function of performance. Thus, CC make their case for a domain-general account of language acquisition.
I found a great deal to admire in CC, but it was often a frustrating read. This is not because I am unsympathetic to their project (on the contrary), but because the book suffers from a number of flaws. First, as I mentioned previously, the precise targets of many of their arguments can be poorly defined. To give one illustration, the logical problem that CC identify with Chomsky's UG targets the evolution of syntax in natural languages. However, since BC's argument for nativism concerns a language of thought, CC's objection does not obviously engage with it. Additionally, arguments against the various versions of UG (non-adaptationist, adaptationist, and MP versions) are also not always distinguished carefully, such that what CC offer is less a considered criticism of the details of rival views than a fairly fast and loose rejection of a whole program of thought. This is a shame, both because they make some very important points (for example, about the role of cultural evolution), and because many of the points that they raise have already been conceded by those whose views they are criticising. For example, BC would simply agree with the claim that the original version of UG could not have evolved under natural selection pressure. Since CC was being written before BC was published, CC cannot be held accountable for not having responded to Chomsky's most recent formulations of his view. However, the views defended in BC are not new. They build on foundations that have been in place for over ten (Hauser et al. 2002) and twenty (Chomsky 1995) years, respectively. While these older works are discussed in the text, the bulk of argument is directed against much earlier formulations of the nature of UG.
Second, CC's positive account is built on metaphors that are superficially compelling but that make the details of their positive claims very difficult to discern. For example, the Chunk-and-Pass processing system combines elements that, on BC's view, would be handled by both the sensorimotor system (for example, in the computations that lead groups of phonemes to be recognised as words) and Merge (as when words are bound into phrases and sentences). Thus, for CC the operation of Chunking-and-Passing is doing an extraordinary amount of work. However, the nature of Chunk-and-Pass processing is only ever sketched, and beyond the claim that it recruits only domain-general mechanisms, we are told next to nothing about how it works. In that case, their central claim turns out to be largely a statement of faith: there will turn out to be no syntax-specific neural processes involved in chunking basic units of linguistic input into syntactically structured discourse. This is a bold claim. However, while the data CC present suggest that domain-general processes might achieve far more than was once thought, they are a long way from demonstrating that no language-specific functions are needed.
It's also hard to escape the feeling that in some parts of the book CC are tacitly appealing to something that looks a lot like Merge, built into the structure of their Chunk-and-Pass processor. One feature of Merge that CC seem not to want to adopt is its internal structure: elements bound by Merge are bound hierarchically, and not merely additively. However, the possibility of a Chunk-and-Pass processing system that was not organised in this way is difficult to explain, because if there is no hierarchical structure at work in basic chunking functions, it's unclear why some of the elements of a sentence should be chunked together and not others. BC make this point forcefully, arguing that non-hierarchical comprehension cannot explain simple and uncontroversial cases of binding. They give the example of the following sentences:

5. Birds that fly instinctively swim.
6. Instinctively birds that fly swim.
BC argue that (5) is ambiguous, since "instinctively" could modify either of the verbs. By contrast, (6) is unambiguous. Here "instinctively" clearly modifies "swim" and not "fly". Their view can explain this on the grounds that "instinctively" is hierarchically closer to "swim" than to "fly", as shown in the tree diagram of Fig. 4.1 (BC p. 117). They argue that a non-hierarchical binding system cannot explain why sentence (6) is not ambiguous in the way that they describe.
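BC's point can be made concrete with a toy representation (mine, not theirs) of sentence (6), in which constituents are nested tuples and distance is measured over the hierarchy rather than the word string:

```python
def path_to(tree, leaf, path=()):
    """Return the sequence of branch choices leading from the root to a leaf."""
    if tree == leaf:
        return path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree):
            found = path_to(child, leaf, path + (i,))
            if found is not None:
                return found
    return None

def tree_distance(tree, a, b):
    """Number of edges between two leaves in the hierarchical structure."""
    pa, pb = path_to(tree, a), path_to(tree, b)
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return (len(pa) - shared) + (len(pb) - shared)

# Sentence (6), "Instinctively birds that fly swim", nested hierarchically:
sentence6 = ("instinctively", (("birds", ("that", "fly")), "swim"))

# "instinctively" is structurally closer to "swim" than to "fly",
# even though "fly" is the nearer verb in the linear word string.
print(tree_distance(sentence6, "instinctively", "swim"))  # 3
print(tree_distance(sentence6, "instinctively", "fly"))   # 5
```

A purely linear, additive chunker has only string distance to go on, and string distance favours "fly"; it is the hierarchical measure that delivers the attested reading.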
There may be a rejoinder that could be developed in line with CC's view. However, CC's anticipation of the objection is weak. In a very short explanatory text box that is not part of the main body of the text (p. 112), we are told that real-time chatbots like Apple's Siri do not use syntactic trees to parse commands, but rely on probabilistic pattern matching with respect to individual words, and that consequently language processing in the here-and-now can get by without hierarchical syntactic structure (ibid.; see also Frank et al. 2012 for further discussion). This aside provides no grounds for responding to BC. However, BC's argument is fundamental to their rejection of the plausibility of domain-general sequence learning processes. A better response, consistent with CC's argument
7 I owe this point to an anonymous reviewer.
syntax is grounded in language-specific abilities, and (2) the question of whether or not these abilities were the product of natural selection. With respect to (1), at least if language is understood in terms of natural language, then neither BC nor CC hold that syntax is language-specific. Further, with respect to (2), BC and CC agree that syntactic abilities underwent natural selection, and that they did so for functions independent of any benefit for communication. Where they differ is that for BC it was thought and planning that drove the selective sweep for Merge; for CC it was general sequence learning abilities. These alternatives may not be mutually exclusive, though.
The major difference between BC and CC now concerns only the question of whether general-purpose, non-hierarchical sequence learning mechanisms can perform the single binding function that BC think is performed by Merge. CC don't offer a detailed argument for thinking that they can. However, their book offers a compelling illustration of the growing power of non-specialised machine learning tools, and it is possible that they will be vindicated in time. In that case, this disagreement should be settled by empirical developments.
In the meantime, I hope to have shown that the constructivist and UG approaches
to language development have a great deal in common. If that is right, then perhaps
the time of dichotomising rhetoric can be over.
Acknowledgements For helpful comments on the first draft of this essay, I would like to thank Cameron Buckner, Bryce Huebner, and one anonymous referee.
References
Moore R (2016) Meaning and ostension in great ape gestural communication. Anim Cogn 19(1):223–231
Pinker S, Bloom P (1990) Natural language and natural selection. Behav Brain Sci 13(4):707–727
Progovac L (2015) Evolutionary syntax. OUP, Oxford
Pullum GK, Scholz BC (2005) Contrasting applications of logic in natural language syntactic description. In: Logic, methodology and philosophy of science: proceedings of the twelfth international congress, pp 481–503
Rivas E (2005) Recent use of signs by chimpanzees (Pan troglodytes) in interactions with humans. J Comp Psychol 119(4):404
Savage-Rumbaugh ES, Shanker S, Taylor TJ (1998) Apes, language, and the human mind. OUP, Oxford
Schreiweis C, Bornschein U, Burguiere E, Kerimoglu C, Schreiter S, Dannemann M, Goyal S, Rea E, French CA, Puliyadi R, Groszer M (2014) Humanized Foxp2 accelerates learning by enhancing transitions from declarative to procedural performance. PNAS 111(39):14253–14258
Truswell R (in press) Dendrophobia in bonobo comprehension of spoken English. Mind & Lang
Yang C (2013) Ontogeny and phylogeny of language. PNAS 110(16):6324–6327