
Running head: FEELING FOR THE PHENOTYPE 1

A Feeling for the Phenotype


Robert C. Berwick
Massachusetts Institute of Technology

1. Origins: the search for the human language phenotype


It is a truth universally unacknowledged that the birth of modern generative grammar circa 1950
also marked the beginning of modern biological and evolutionary thinking about language. While
it has been sometimes claimed that generative grammar and evolutionary biology have been at
odds from this beginning–see, e.g., Pinker and Bloom (1989) or Aitchison (1998)–the truth turns
out to be quite the opposite. The entire 60-year narrative arc of generative grammar has been a
search to characterize the human “language capacity,” or LC, as a constrained, evolvable trait or
phenotype, the physical “appearance” of an individual. Additionally, we would like to know the
genetic basis of the LC, what biologists call its genotype, but what is more commonly called
“Universal Grammar.” As Chomsky (2016) observes, it is the LC genotype and phenotype that have evolved, uniquely in our species, analogous to the distinct mammalian visual system as opposed
to the insect visual system.
Any account of language evolution, then, ought to begin with the clearest possible account
of what exactly is the LC phenotype. Here, Berwick and Chomsky (2016) provide a full picture.
This chapter revisits their account from a narrower, biological point of view. Specifically, it
contrasts generative grammar’s focus on internal representations and computations–language
as internal thought–with an “externalist” perspective, here using as a foil a series of recent
reviews by Fisher and colleagues that are, tellingly, couched almost exclusively in terms of the
genetics of speech and communication, with our rich knowledge of the syntax of generative
grammar going virtually unmentioned (Fisher 2016; Fisher and Vernes 2015; Graham and Fisher
2015; Fisher, Deriziotis, and Vernes 2013)–language as speech.
Which is more appropriate as a description of the language phenotype, language as thought
or language as speech? This chapter concludes that the “internalist” perspective–language as
thought–remains the more valuable one for an evolutionary explanation of the qualitative
discontinuity between us and other animals. Other animals just don’t have language like us–not
simply speech, but language as an open-ended “inner mental tool.” In contrast, attention to just
speech or sign conflates language with speech, resulting in a misplaced emphasis on inessential
differences in the ways in which internal language surfaces in various external forms. In short, the
internalist perspective suggests the right phenotype for language and its corresponding genotype
while the externalist one does not. Echoing the geneticist Barbara McClintock’s dictum that proper
biological insight demands “a feeling for the organism,” generative grammar’s biological approach
has developed the proper “feeling for the phenotype.”
The modern search for the language phenotype began just a few blocks from Harvard
Square in the early 1950s, when two young Harvard graduate students, Morris Halle and Eric
Lenneberg, met with an even younger Harvard Junior Fellow, Noam Chomsky, and read Niko
Tinbergen’s just-published volume, The Study of Instinct (1951) along with Konrad Lorenz’s
German articles from the preceding decade. As described in a recent historical review by Fitch
(2013), all three were immediately struck by the stark contrast between Lorenz and Tinbergen’s
rich views of complex innate behaviors underpinned by species-specific genetics, often triggered
by simple environmental cues as in the ritualized fighting of male stickleback fish, and the staple
experience-driven psychological framework of the day, behaviorism.
The attraction is understandable. Tinbergen’s chapters read like a generative grammar
primer regarding the hallmarks of internally driven learning: growth and maturation shape the
trajectory of external behaviors; isolation leads to behavioral deficits; and critical periods both start
and stop learning plasticity. Last but not least, it was understood that Darwinian evolution played
a large hand in explaining why species differ–sticklebacks don’t react like baby grey geese to
maternal geese models–each species comes equipped with a sophisticated, distinctive behavioral
repertoire released by different environmental triggers.
For Chomsky, Halle, and Lenneberg, language seemed no different. They set out on their
now familiar research program aiming to ferret out the “innate schemata” of human biology that,
along with experiential events, enabled human children to acquire language. Their aim was nothing
less than the characterization of the “faculty of language” as a true biological trait or phenotype–
the result of the human genome in conjunction with environmental interaction, just as the
Tinbergen-Lorenz program would have it. Their notion of language as quintessentially biological was, of
course, quite different from the prevailing conception of the time, which considered “language” to
be essentially a collection of social and cultural dispositions, ultimately grounded on
communication. With only slight exaggeration, one might label the Chomsky-Halle-Lenneberg
approach to characterizing the language phenotype “internalist,” and the contrasting behavioral-
communicative approach “externalist”–echoing Chomsky’s I-language/E-language distinction.
If LC is ‘just another’ phenotypic trait, then what about its genomics and evolution? Two
explanatory hurdles were noted from the very outset, as expressed in Lenneberg’s early writings
(1964, 1967). The first was a paradox familiar from neuroscience: Just as the genome cannot
possibly fix the detailed inter-neuron wiring of the human brain, there cannot be enough
information storage capacity in the human genome to fix the detailed rules for each human
language. To be sure, some of the required information might be tapped from
environmental/experiential regularities, but to push this all the way to the cultural-communicative
position also seemed implausible, given the presumed scenario of acquisition with a relatively
impoverished input, as well as the observed regularities in human language. Rather, it might be
more reasonable to presume that the environment supplies low-information density experiential
“triggers,” each encoding just a few bits of information. All this was clearly understood sixty
years ago by Lenneberg and colleagues.
It therefore comes as something of a surprise to find this point revived decades later, pitched
in 2015 as a novel discovery and presented as a potent objection to the possibility of any
substantive theory of an “internalist” language phenotype. Yet this is exactly what
Graham, Deriziotis, and Fisher (2015) write: “The enormous diversity in the grammatical rule
systems of the world’s languages indicate that the finer details of these rules system are not
genetically encoded” (2015, 5). Indeed, the “finer details” of language are not, and cannot be,
genetically encoded. But this position appears to be overly “externalist.” As has been described
elsewhere, “we know that all normal people, unlike any cats or fish, uniformly grow up speaking
some language, just like having 5 fingers on each hand, so language must be part of what is unique
to the human genome. However, if one is born in Beijing one winds up speaking a very different
language than if one is born in Mumbai, so the number-of-fingers analogy is not quite correct. The
right answer must therefore strike the right balance between sufficient genomic constraint
guaranteeing that every child will successfully develop an adult language and sufficient lability
guaranteeing that no matter where they are born each child will acquire the language of its
caretakers” (Berwick 2009). The Fisher position does not seem to advance us very much towards
this goal.
The second problem Lenneberg, Chomsky, and Halle immediately faced was this:
assuming that the appearance of human language was relatively recent in evolutionary history, the
required evolutionary change must be genetically small–yet, paradoxically, at the same time
phenotypically large, given the apparent discontinuity between humans and other animal species.
At the start of the generative era, neither of these two problems was solvable. It was clear
that the rule systems posited by generative grammar were far too complex to tackle on evolutionary
grounds. Similarly, we did not have any substantial understanding of genomics. As a result,
through the 1950s and 1960s little was said about language, genetics, and evolution. What little
was written was, however, often prescient in many respects. For example, it was clear from the
start that although language must have some genetic basis, this could not be the result of some
single gene, but, more likely, the result of a complex developmental process involving many genes
that are regulatory in nature. Lenneberg (1967) describes the situation in this way:
“The problem is…what is known about the specific action of genes…this puzzle is, of
course, not peculiar to the problems of the genetic basis of language but to the relationship between
genic action and the inheritance of traits in general” (Lenneberg 1967, 239). “These considerations
make it clear that it is not strictly correct to speak of genes…for the capacity for language” but that
“the route through which genes affect the over-all patterns of structure and function …. show that
it is possible to talk about language in connection with genetics without having to make shaky
assumptions about “genes for language.” (Lenneberg 1967, 243, 244). Despite Lenneberg’s clear
pronouncements, the biological-generative approach has been widely misconstrued as something
entirely different. Again, in one of the most recent assessments of the same situation, Simon Fisher,
along with his colleague Sonja Vernes, remarks that “such theories [of generative grammar]
routinely assume that a human gene (or set of genes) somehow encodes the necessary information,
they do so without reference to what genes are and how they work, neglecting the complex indirect
routes by which genomes contribute to brain development and function …. that helped feed the
myth of so-called grammar genes” (2015, 290).
Fisher and Vernes’ assessment appears wildly off the mark. More than a half-century ago,
Lenneberg and the other biological generativists observed that the genetic family pedigrees of
language impairment–what is today studied as specific language impairment–along with much
other genetic evidence pointed to a clear genomic underpinning for human language. But as we
have just seen, Lenneberg explicitly disavowed “grammar genes” and had a clear-eyed
appreciation of the subtlety of the genotype-phenotype link. What is more important, Lenneberg’s
prescient emphasis on genomic regulatory systems has now borne fruit–many of the genes
discussed in connection with language, such as FOXP2 and others–have turned out to be regulatory
or ontogenetic in nature.
More importantly, however, the early biological generativists really did pursue the question
of more precisely defining a biologically-oriented language phenotype, one that was as narrow as
possible. The genomic encoding and discontinuity problems left the biological generativists with
just one avenue to pursue: reduce the complexity of the internal rule systems they hypothesized so
as to satisfy the dual demands of learnability and evolvability.1 The simpler and narrower
one can make the “language phenotype,” the smaller the gap between human and nonhuman
animals with respect to the LC trait, and the easier the explanatory burden. To those familiar with
the history of modern linguistics, this drive to “shrink the phenotype” will be familiar, since
the research program in generative grammar from the 1950s has been exactly this. To see how the
“shrink the phenotype” approach has fared sixty years later, we consider the answers to what
Berwick and Chomsky (2016) dub the six key “mystery questions” about the evolution of
language: What, Who, Where, When, How, and Why. Our main business will then be to show

1 There is also a clear third, functional constraint involving the use of language, efficient parseability and
efficient generation, that we will set to one side in this chapter. For some discussion on the overlap between the two,
see Berwick (1982, 1985).
how the internal LC phenotype serves much better for research than a focus on the external
properties of speech or sign.
What evolved boils down to the species-specific properties of the LC trait and of human
language generally. Most evidently, human languages can consist of a discrete infinity
of structured expressions, interpretable at two interfaces, the first interface yielding a
representation for thought and action, and the second a sensory-motor representation for external
speech or sign–in short, meanings paired with sounds (or gestures). Since the foundational work
in the 1930s by Turing, Gödel, Church, and others, we know that to do this, to compute a discrete
infinity of structured expressions for language (any language) requires some combinatorial
generative procedure that assembles smaller elements (starting with elements of a lexicon, word-
like elements and others) into ever larger expressions. In Chomsky’s current Minimalist Program,
as we detail in the next section, perhaps the key advance is that one can show that the new
biological machinery required for this human computational ability is very small indeed, almost
as small as is conceivable without granting full human language to all vertebrates–a single, new,
small “algebraic twist” on existing vertebrate biological traits. This is just what a conventional
Darwinian would have expected. Famously, Darwin likened language to upright posture, and once
again here Darwin seems to have been extraordinarily prescient. In another respect, however–his
apparent position that we differ from other animals only quantitatively rather than
qualitatively–Darwin appears to have been quite in error.
This new combinatorial ability is called Merge. Merge takes any two syntactic objects,
perhaps previously computed by Merge, and outputs a single new syntactic object, their labeled
composition, a hierarchical structure. Who is just modern humans, not songbirds, chimpanzees, or
dolphins. That is, only our species has the LC trait, including Merge and a (conceptual) lexicon,
yielding our inner thought, and other animals do not, just as spiders, but not people, spin webs.
Where is sub-Saharan Africa, between 200,000 and 100,000 years ago. How comprises two sub-
questions–the computational implementation of the Basic Property (BP) just described, and the
evolutionary genomic and neurological changes that enabled it. Finally, Why is language’s purpose–the role LC plays as the
engine running the “language of thought,” an “inner mental tool” in the words of the neurologist
Harry Jerison (1977). We turn in the next section to the questions of what evolved and who had it.

2. What and Who? The linear vs. hierarchical human innovation


We contend that the novel, simple algebraic twist that distinguishes the human LC trait from the
capacities of other animals can be caricatured, at least in terms of syntax, as in the schematic
“cartoon” depicted in Figure 1, which also serves as a summary of the computational core of Chomsky’s current syntactic
theory. The top third of the figure displays a system based on linear precedence, that is, a
representation like beads on a string, dictated by one predicate of left-to-right order–a
concatenation operation for strings. Informally, one can tell only whether there is an element that
is to the “left” or the “right” of another. So for example, given the Campbell’s monkey call boom-
boom (with a “meaning” of ‘come here’) one can concatenate krak-oo to yield boom-boom-
krak-oo (‘fallen branches’). Following much recent research in animal behavior, apparently all
vertebrates possess this computational ability–including us, of course.
The middle and bottom third of the figure display the human innovation: linear precedence
has been replaced by hierarchical precedence–an operation called Merge that can combine any
two syntactic objects into a single new syntactic hierarchical object. (Recall that these
representations are built internal to the organism.) Informally, given Merge, one can now only
determine whether one element or structure is “above” or “below” another in terms of structural
distance. These resulting nonlinear representations resemble open “triangles”–with one key
difference, namely that the left-to-right order of the individual elements consequently becomes
irrelevant, in contrast to the linear precedence representation above. This shift from linear to
nonlinear representations enabled by Merge encapsulates the innovation for human language
syntax that, in conjunction with our apparently open-ended system of word-like elements, gives
us corresponding open-ended thought.
As a concrete example, the middle third of the figure illustrates how the word-
like atomic elements ate and apples can be combined into the hierarchical structure ate apples,
and then, following another Merge operation with John, into John ate apples. This is the internal
mental syntactic representation for the generated sentence, without linear precedence. (In this
example we have ignored many interesting linguistic details irrelevant to this point.)
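The two Merge steps just described can be given a minimal computational sketch. The set-based encoding below is our own illustrative choice, not the chapter’s formalism (it ignores labeling and the other linguistic details set aside above); it simply shows how two applications of a single binary operation build John ate apples as an order-free hierarchical object:

```python
# A minimal sketch of Merge as an order-free, binary structure builder.
# Encoding syntactic objects as frozensets makes left-to-right order
# literally invisible to the representation: {a, b} == {b, a}.

def merge(x, y):
    """Combine two syntactic objects into a single new syntactic object."""
    return frozenset([x, y])

# Build "John ate apples" by two applications of Merge:
vp = merge("ate", "apples")   # {ate, apples}
s = merge("John", vp)         # {John, {ate, apples}}

# The same internal object underlies English "John ate apples" and
# Japanese "Jon wa ringo o tabemashita"; order is fixed only at
# externalization, not in the internal representation.
assert merge("ate", "apples") == merge("apples", "ate")
assert s == merge(vp, "John")
```

Note that nothing in the resulting object says which element is pronounced first; that decision belongs to externalization, exactly the division of labor the text describes.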
The bottom third of the figure illustrates that this internal hierarchical syntactic
representation is indeed left-to-right order free in humans; it surfaces as the familiar “word order
variation” in individual human languages. In the view set out here, such differences are inessential
to the internal LC phenotype. To be sure, they are important for the external use of language, but
that is not our focus, and, further, appears to be irrelevant for what we want to study. For instance,
in Japanese, the same sentence would be expressed with a different left-to-right order but with the
identical hierarchical relationships, “Jon wa ringo o tabemashita” (‘John apples ate’). In
conjunction with some initial set of word-like atomic elements, it is claimed under the Strong
Minimalist Thesis that Merge and the structures it builds characterize the human LC phenotype.
We turn now to examine these linear and hierarchical systems in a bit more detail, taking note
of how they exemplify the “externalist” vs. “internalist” phenotypic viewpoints.

-----------------Insert Figure 1 about here------------------------


Figure 1. A cartoon-like representation of the key distinction between nonhuman and human syntactic
computational abilities. Top: a “beads-on-a-string” model of left-to-right precedence relations,
characterizing the computational power assumed available to all vertebrates. Middle: the presumed human-
unique innovation of hierarchical precedence, as created by the Merge operation, illustrating that ate and
apples can be combined into the open “triangular” hierarchical structure ate apples and then by another
Merge operation, into John ate apples. Bottom: Note that this internal hierarchical syntactic representation
is left-to-right order free; in Japanese, the same sentence would be expressed with a different left-to-right
order but with the identical hierarchical relationships, “Jon wa ringo o tabemashita” (‘John apples ate’).
(Irrelevant syntactic details have been omitted in this figure.)

Simple examples reveal that it is structure and structural (hierarchical) “distance,” not
linear order, that is essential for both syntax and interpretation. Berwick and Chomsky (2016)
point out pairs such as these:

(1) Birds that fly instinctively swim


(2) Instinctively birds that fly swim

Sentence (1) is ambiguous. The adverb “instinctively” can modify either how birds fly, or how
they swim. From this one might conclude, erroneously, that adverbial modification is based on
linear distance to a verb. Sentence (2), however, reveals that the key constraint is structural
distance: now the sentence is unambiguous and “instinctively” can modify only the linearly “more
distant” verb “swim,” rather than the apparently closer “fly.” Figure 2 shows that in terms of
hierarchical structural distance, “instinctively” is closer to “swim” than “fly.” Despite the apparent
computational simplicity of linear order, it does not even seem to be part of the core internal
syntactic phenotype, and in this sense is not even “seen” by the internal system, though plainly
linear order is imposed on externalization, that is, speech or sign. Everaert et al. (2016) provide
additional examples, drawn from human language sound systems, syntax, and semantics, of the
role of structure, not strings, in human language.

----------------------------- Insert Figure 2 about here ----------------------------------------------


Figure 2. The interpretation of “instinctively birds that fly swim” is based on hierarchical structural distance,
not linear distance. The internal structural representation of the sentence as constructed by the basic
combinatorial operator, depicted in the figure, shows that “instinctively” is closer to “swim” in terms of
hierarchical structure; linear structure is not “visible” to the system and is irrelevant.
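The caption’s claim about structural distance can be checked mechanically. The tree encoding and helper function below are our own (the clause labels S1, S2, S3 are hypothetical stand-ins for the nodes of Figure 2), but the distances they compute track its hierarchy:

```python
# A sketch of structural (hierarchical) distance for sentence (2),
# "instinctively birds that fly swim". Distance between two leaves is
# the number of tree edges on the path connecting them.

tree = ("S1",
        ("Adv", "instinctively"),
        ("S2",
         ("NP", "birds",
          ("S3", "that", "fly")),
         ("VP", "swim")))

def depth(node, word, d=0):
    """Depth of `word` below `node`, or None if it does not occur."""
    if node == word:
        return d
    if isinstance(node, tuple):
        for child in node[1:]:          # node[0] is the label
            found = depth(child, word, d + 1)
            if found is not None:
                return found
    return None

# Both paths to "instinctively" pass through the root, so the leaf-to-leaf
# path length is depth(instinctively) + depth(other word).
dist_swim = depth(tree, "instinctively") + depth(tree, "swim")
dist_fly = depth(tree, "instinctively") + depth(tree, "fly")
assert dist_swim < dist_fly   # hierarchically closer, though linearly farther
```

Linearly, “instinctively” is adjacent to the clause containing “fly”; structurally, the embedded clause puts “fly” deeper in the tree, so “swim” wins, just as the interpretation requires.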

Chomsky (2016) provides further evidence that internal semantic interpretation is also based on
the same sort of hierarchical structural representation. He considers this sentence:

(3) Which girls and boys did the men expect to like each other
The anaphor “each other” is interpreted as associated with “which girls and boys,” so that the
sentence has a meaning roughly like (4) below:

(4) for which girls and boys, the men expected those girls and boys to like each other
Once again, the interpretation is not based on local linear distance, with “each other” linked to the
closest antecedent, “the men.” Rather, it is based on the hierarchical structure formed by successive
Merge operations, resulting in a structure as depicted in (5), with two copies of “which girls and
boys”:

(5) [which girls and boys] did the men expect [which girls and boys] to like each other
The semantic interpretation we give to such examples provides good evidence that this is indeed
the representation used by the internal system of thought and inference of the mind.

2.1 Linear externalization and animal sensory input-output systems


From comparative work over the past few decades, it has become increasingly evident that all
vertebrates apparently share the ability described above to process sequences of elements that are
based on linear precedence. We argue that the signaling/communication systems of songbirds,
dolphins, and whales, as well as human language sound systems, all fall under this analytical umbrella,
which for shorthand we will call linear; see Berwick et al. (2011).2
Formally, it appears that all such systems can be described by regular grammars (Idsardi
2015), equivalently, via finite-state transition networks as depicted in Figure 3, where the top half
of the figure displays the description of a Bengalese finch song as such a network, and the bottom
half displays one constraint on Navajo sound sequences, namely that if there is an ‘s’ sound, it can
only be (eventually) followed by another ‘s’, not a ‘ʃ,’ and vice-versa. Though such networks can
yield a discrete infinity of expressions–the existence of “loops” in such finite-transition networks
means that a sequence can be arbitrarily long–the resulting expressions are not “hierarchically
structured” as required by human language syntax. In this context, “process” here means the
ability to detect (segment), acquire (learn), and produce such linear sequences, where “segment”
includes categorical perception, that is, the ability to “cluster” acoustic signals into discrete
equivalence classes, similar to human sound systems. Specifically, there seems to be a widespread,
species-shared ability to statistically analyze linear strings. As demonstrated by Saffran, Aslin, and
Newport (1996), 8-month-old human infants can extract and apparently use linearly adjacent
regularities in the form of syllabic patterns via bigram-like analysis–presumably in decoding the
acoustic stream for word segmentation (but see Yang 2004). Subsequently, other researchers have
established similar abilities in other species, as well as extending from adjacent to nonadjacent

2 It has sometimes been argued that the birdsong of certain species, like that of canaries, and the apparent
acoustic signaling of whales, dolphins, and even Drosophila, is not linear in this sense, that is, cannot be described by
the finite-state transition networks as outlined in the next paragraph and displayed in Figure 3. Such demonstrations
often take the form of arguing that only hidden Markov models (one form of non-linear representation) best model
the acoustic output of such creatures. See, e.g., Markowitz et al. (2013). On dolphin “syntactic comprehension,” see
Herman and Uyeyama (1999). None of these studies provides evidence for non-linear syntactic analysis. For example,
the dolphin’s comprehension of gestures appears limited to sequences such as ‘basket ball fetch’ and ‘basket ball in’
(where ‘in’ is in fact a verbal operator, contrary to the authors’ contention). This is all still a regular language. The
canary song analysis is at least suggestive of ‘phrases,’ that is, repeated syllable sequences that are manipulated as
“chunks,” but these do not seem to follow a combinatorial merge-like pattern, and it is still possible to describe the
patterns via finite-state transition networks. The dolphin ability at mimicry is impressive, in both the visual and motor
domains. But its syntactic abilities are not.
pattern analysis, so long as the patterns are regular in the sense defined above. Additionally,
genomic work suggests that there may well be a common neural and genomic “substrate” exploited
by the entire group of vertebrates for vocal learning and production, some as phylogenetically
distant as songbirds and humans, separated by more than 600 million years of evolutionary time
(Pfenning et al. 2014). There also seems to be a large shared neural component for sequence
learning generally, especially amongst primates (Bornkessel-Schlesewsky et al. 2015).
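The bigram-like analysis attributed to infants above can be sketched with transitional probabilities over a syllable stream. The stream and the recurring “word” go-la-bu below are invented for illustration and are not Saffran, Aslin, and Newport’s actual stimuli:

```python
# A toy sketch of Saffran-style segmentation via transitional probabilities.
# TP(a -> b) = count(a b) / count(a); candidate word boundaries fall where
# the transitional probability dips.

from collections import Counter

stream = "go la bu ti bi do go la bu pa do ti ti bi do go la bu".split()

bigrams = Counter(zip(stream, stream[1:]))
unigrams = Counter(stream[:-1])

def tp(a, b):
    """Transitional probability of syllable b immediately following a."""
    return bigrams[(a, b)] / unigrams[a]

# Within the recurring "word" go-la-bu the TPs are high; across its right
# edge (bu -> whatever comes next) they are lower, signaling a boundary.
assert tp("go", "la") == 1.0 and tp("la", "bu") == 1.0
assert tp("bu", "ti") < 1.0
```

Note that this computation consults only adjacent linear order, which is exactly why, on the chapter’s account, it falls within the species-shared linear abilities rather than the human-specific hierarchical ones.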
Taken together, this biological evidence supports the contention that apparently all animal
“externalization” systems involved in communication and signaling–including songbird, dolphin,
monkey, and human sound systems–seem to be cast from the same finite-state transition network
mold, describable by linear precedence systems in which left-to-right order plays a key role. This
could be due to shared descent from a common ancestor, to convergent evolution under the
exigencies of temporal patterning, or to some combination of these. This greatly simplifies
evolutionary explanation.3
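A finite-state transition network of the kind just invoked can be sketched in a few lines. The toy song grammar below is our own invention, not actual Bengalese finch data; the loop shows how such a network yields a discrete infinity of sequences while remaining purely linear:

```python
# A sketch of a finite-state transition network (acceptor) for a toy
# "song": states map each permissible syllable to a next state. Loops
# allow arbitrarily long songs (discrete infinity) with no hierarchy.

TRANSITIONS = {
    0: {"a": 1},
    1: {"b": 2},
    2: {"b": 2, "c": 3},   # the "b" loop: ab c, abb c, abbb c, ...
    3: {},
}
ACCEPT = {3}

def accepts(song):
    """Run the network over a syllable sequence; linear order is all it sees."""
    state = 0
    for syllable in song:
        if syllable not in TRANSITIONS[state]:
            return False
        state = TRANSITIONS[state][syllable]
    return state in ACCEPT

assert accepts(["a", "b", "c"])
assert accepts(["a", "b", "b", "b", "c"])   # the loop yields unbounded length
assert not accepts(["b", "a", "c"])         # only left-to-right order matters
```

The only predicate available to this machine is “what may come next,” which is the formal content of the chapter’s claim that such systems encode linear precedence and nothing more.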

2.2 Emergence of language in the human lineage and its subsequent variation
The upshot of the previous sections is that even though language seems to be a trait unique to our
lineage–a claim contested by others, as we shall see below–all agree that some of its antecedents
were already in place before it appeared: a standard Darwinian expectation. But just how many of
its antecedents? It is uncontroversial that “the human capacity for language evolved from a
genomic substrate present in the last common ancestor of humans and chimpanzees and bonobos,
through the gradual accumulation of genetic changes of the intervening 6 million years,” as Graham,
Deriziotis, and Fisher (2015, 4) note.
But these authors extend their argument about evolutionary antecedents much further than
this. They claim, first of all, that what has to be accounted for is (human) communicative capacity,
which they equate with language; and, second of all, that there is essentially no qualitative
difference between us and these other species–it’s simply a matter of degree of communication,
not kind: “although chimpanzees and bonobos cannot remotely match the linguistic capabilities of
a human child, careful observation and experiment has shown that they have greater
communicative abilities than previously thought…. Chimpanzees and bonobos in captivity are
able to map meanings onto arbitrary symbols and use these for communication” (2015, 4). This,
of course, is what Lenneberg referred to 60 years ago when he noted that this was Darwin’s view
of language evolution–the continuity view–and that the standard account of the evolution of
language embraces continuity in part because it then becomes much more straightforward to use other
animals for classical evolutionary comparative analysis.
However, for one thing, it’s not even clear that other primates “are able to map meanings
onto arbitrary symbols” or even what these “meanings” are, as compared to human notions, as
Berwick and Chomsky (2016) observe, citing Petitto (2005). Even granting some symbolic ability
with concepts (which remain undefined), as Yang (2013) has demonstrated, the statement that other
primates “cannot remotely match the linguistic capabilities of a human child,” though accurate when read
narrowly, is still misleading. Other primates cannot match children’s linguistic abilities at all: Yang

3 One might suppose that this commonality is simply the logical consequence of having to project some
(perhaps arbitrarily complex) internal representation (linear or not) onto a precedence-oriented left-to-right temporal
external signal stream–and that all such ordered sequences must therefore be representable via transition network
diagrams like those in Figure 3. This intuition proves to be false. As was proved in the 1950s, there are sequential
patterns not describable by such networks, namely, the strictly context-free string patterns or string sets, that can be
generated by strictly context-free grammars but not by finite-state transition networks, as shown by Chomsky (1956).
shows that the ASL-taught Nim had, in effect, none of the linguistic capabilities of human 2-year-old
children. The children acquired rules for two-word sentences/utterances, according to Yang’s
objective information-theoretic measure; Nim merely memorized sign sequences. Conclusion: Nim
lacked Merge, and so the crucial bridge from nonhuman animal linear to human hierarchical
structure; and, if Petitto is correct, Nim also lacked human concepts and word-like atomic elements,
communicative abilities or not. (Note that this does not assert that Nim could not associate spatio-
temporally contiguous objects with particular signs.)
Graham, Deriziotis, and Fisher’s claim is even more sweeping than this–that nonhuman
animals have all the qualitative ingredients for human language, and that perhaps the only “gap”
separating us from other primates is the demand for communicative sophistication pushed by
greater sociality in the human lineage. “The reason why the ancestors of humans went on to evolve
full-fledged language but the ancestors of chimpanzees and bonobos did not may relate to prior
genetic changes pertaining to sociality in the human lineage” (2015, 4). There may have been such
changes, but as we have just noted, the other primates do not appear to have even a quasi-human
“fledged” language, let alone a “full-fledged” one. Evidently, chimpanzees cannot even master
two-item combinations. (Of course, it is possible that their ancestors did have this ability and lost
it.) So here the focus on communicative externalization leads one astray. Below we point out that
the neural evidence also points to a “gap” and major circuitry that is simply missing in other
primates.
What then of human syntax’s role in the evolution of language? Graham, Deriziotis, and Fisher do admit that other primates “have an extremely limited ability to combine…items” (2015, 4). This is an understatement, as we have seen. Our ability to combine even two items differs markedly from that of other primates–hierarchically rather than sequentially–just as one would expect if these other primates lack Merge. Neuropsychological experiments briefly described below confirm
this. These authors go on to say that even this human combinatorial ability might have arisen
without conventional evolution: “it is unclear to what extent this complexity has arisen due to
genetically grounded cognitive developments versus cultural processes not requiring any attendant
genetic change (Kirby et al. 2014).”
There are three important counters to this claim. First, the reference to “cultural processes”
here points to a long line of research on “iterated cultural evolution” from Kirby and colleagues
running back to at least 1997 (Kirby 2000) asserting that Merge-like compositional operations can
“emerge” from the need to form communicatively coherent symbol-external sign maps. However,
these studies show less than has sometimes been thought. They demonstrate only that if an animal already possesses a compositional operator like Merge, that operator may be recruited to assemble a systematic, culturally transmittable (learnable) connection between externalized strings and internalized representations of “meanings.” In other words, these studies presume that Merge is antecedently present, and that the observed discontinuity in language-like abilities is a consequence of communicative complexity. On this account, Merge must have evolved even
earlier, which simply pushes one of the key mysteries further back in evolutionary time. That’s a
very different assumption than the one made by the biological generativists, who point to the
absence of Merge in nonhuman primates in both behavior and brain circuits, as we describe below.
A similar story emerges from all the attempts at training songbirds to recognize patterns that can
only be generated by Merge. As Marc Hauser has observed, this same gap pervades all aspects of
nonhuman animal cognition: while many animals make tools, no other animal makes a
combinatorial tool or uses a tool made for one purpose for another task. That is, other animals
lack the human combinatorial “promiscuity” of form and function. Only humans appear to possess
an open-ended generative capacity that applies to all domains, including tool-making. We ascribe
this to Merge. While both “externalists” like Fisher and colleagues who focus on speech and the
generative “internalists” insist that an abundance of atomic word-like “concepts” played a key role
in shaping human language, only the generativists single out the combinatorial syntactic operation
Merge as uniquely human, operating together with word-like atomic elements to give us the rich
internal resource for thought that we call here language.
Second, neurological experimental evidence undermines the view that other primates
possess an antecedent combinatorial system. For example, Amunts and Zilles (2012) note that while other primates have an analog of Broca’s area (so-called BA44), they lack its human cytoarchitectonic structure. Further, Zaccarella and Friederici (2015) have found that the signs of
two-item (determiner plus pseudo-word noun) Merge can be pinpointed to a very particular part
of Broca’s area, the most ventral anterior portion, a phylogenetically younger region. The
processing of two-word sequences, whether or not syntax was present, was located in the frontal operculum, a phylogenetically older brain region than BA44 itself. Further, the frontal
operculum seems to be activated in the processing of artificial finite state transition network
sequences–those common to all vertebrates. Recall also that the Bornkessel-Schlesewsky (2015)
results cited above on sequential processing in primates actually indicate that other primates lack
this ability (so they must argue, incorrectly, that sequence processing suffices for human language).
Finally, we now have developmental and comparative evidence that lines up with the linear vs.
nonlinear distinction, as described in Berwick and Chomsky (2016). Briefly, human infants start
acquiring the particular sound system of their language in utero, before birth, but they don’t speak
in two or more word sentences that early. Why not? Part of the answer, it seems, is that white-
matter fiber tracts have not yet fully myelinated to properly connect the regions of BA44 that carry
out Merge with the system of word-like elements, lexical-argument structure, and so forth. As soon as that has matured, sentences with syntax follow. Further, nonhuman primates don’t seem to have
these connections at all–they are absent in macaques. All the parts of this picture fit what we
know.
Third, nonhuman primates show no definitive signs of being able to learn artificial
languages that are non-regular, that is, the “nonlinear” systems as described above–even after
extensive operant training (Friederici 2004). Further, while it might be true that the deliberative
planning observed in some animals is underpinned by a computational system that reflects the
architecture sometimes speculatively posited for Merge (some functional equivalent of a push-
down stack), this too has not really been confirmed, as Fitch (2014) observes. For the time being
then, the neurobiological evidence we have points to “only us” possessing Merge, not other
animals. Note that this in no way says that language must be localized to the “traditional” Broca’s
and Wernicke’s areas, a criticism (improperly) leveled at Berwick and Chomsky (2016). Rather,
it is clear that words and Merge are tied up in a much more far-flung cortical web, with at least four separate dorsal and ventral fiber-tract paths, that includes access to lexical-semantic information and conceptual structure along with Merge-driven syntax (Friederici et al. 2016). As
Rilling (2014) writes, “the projections of the human arcuate fasciculus [a white matter fiber tract]
reach beyond Wernicke’s area to a region of expanded association cortex in the middle and inferior
temporal cortex that appears to be involved in processing word meaning” (2014, 13).
Summarizing: all the detailed behavioral and neurological evidence so far points to a “gap”
between human and nonhuman computational abilities, exactly the one defined by the abstract
linear vs. nonlinear distinction described at the beginning of this section, and exactly the one that
is accounted for by the appearance of Merge in us, but not other species.
This “feeling for the phenotype” as internal syntactic computation rather than the
externalization devoted to speech or sign has further ramifications as to one’s view of language
variation. For externalists like Fisher and colleagues, there is apparently an “enormous diversity
of the grammatical rule systems of the world languages” (2015, 4), seemingly without any rhyme
or reason. However, for internalists, this variation is simply a reflection of different ways of
projecting hierarchical structure onto a linear speech stream. Perhaps as a result, many who view
language simply as speech appear to have focused on what amounts to superficial variation in the
external appearance of one language as opposed to another, while the genomic variation of the
internal LC phenotype is in fact minimal. We take up a few examples here, illustrating that, as in the other sciences, once the underlying principles governing a phenomenon are better understood, apparently bewildering and unruly variation disappears.
After discussing the appearance of sign languages in congenitally deaf communities as a
“genetic” factor in language change–as it indeed must be, though an obviously genetically-driven
one again related to externalization modality–Graham, Deriziotis, and Fisher (2015) have this to
say about language change: “a more widespread scenario may be language change through the
process of genetic biasing, in which small inter-population differences in language production,
perception, or processing abilities with a genetic basis are amplified over generations, leading to a
shift in some aspect of language (Dediu 2011)” (2015, 4). While such genomic factors are certainly
logically conceivable, in truth no gene-language link like this has ever been confirmed, despite
repeated attempts to do so. One prominent example was advanced in the middle of the last century,
when it was argued that certain human language sound systems were associated with certain
geographic groups as genetically defined via blood types. Lenneberg and several prominent
evolutionary biologists, including Ernst Mayr and Dobzhansky, quickly demonstrated that this
suggested genetic-language link was not tenable (see also Berwick and Chomsky 2016, chapter 1).
Taking up the example that Fisher et al. cite–their only example–Dediu and Ladd (2007) claimed to find a (purely associative) relationship between genetic variants of two brain-growth related genes, ASPM and Microcephalin (MCPH1), and tonal vs. non-tonal languages. The idea is that the ancestral forms of both genes appeared earlier in the hominid lineage, the derived variants are more recently evolved, and the derived variants appear in greater proportions only in modern humans. This association between the genomic variation and languages is said to be revealed by a two-way x-y scatterplot of the percentage of the two gene variants as associated with tone-speaking and non-tone-speaking language populations, where each axis denotes the ancestral/derived percentage of one gene variant.
gene variant. What one appears to see is that tonal languages are associated with particular levels
of ancestral/derived forms, given both genes. However, this putative association, which in fact has
never been confirmed via any functional causal link between the genes themselves and any language phenotype, vanishes as soon as one factors out an unsurprising confound: geographic location.
Figure 4 illustrates the clear lack of any gene-tone language association once geography is
brought in as a potential explanatory factor for the association. It displays an x-y scatterplot of each
gene type, % derived MCPH1 on the y-axis and % derived ASPM on the x-axis. Each plotted point,
either an open or closed circle, gives the proportion of derived genomic variants for a population
speaking a particular language, with tonal language populations denoted by closed shaded circles,
and non-tonal by open circles. Here we have taken the original data from Dediu and Ladd and
restricted the plot to just Asian tonal/non-tonal languages for which measurements were available.
While Dediu and Ladd (2007) noted that all tonal or non-tonal regions clustered in just a single part of the 2-dimensional plots, it is easy to see in Figure 4 that this is not true when geographic regions are considered: here, the tonal and non-tonal languages are mixed together, even though they have about the same proportions of the ancestral and derived forms of both gene variants. The tonal and non-tonal languages within a region are clearly not partitioned according to the brain-development gene variants; rather, tonal and non-tonal languages with the same allele proportions are intermixed. The same holds for the American, South-East Asian, and European regions (figures not shown here). In other words, there is no real association between gene variant and tonal language type once we localize the data to a particular geographical area, probably as a result of contact effects. At least in this domain, geography trumps genealogy.
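The confounding pattern at work here can be illustrated with deliberately invented numbers (nothing below comes from Dediu and Ladd’s dataset): an allele-tone “association” that looks strong in pooled data can shrink to nearly nothing once populations are compared only within a region.

```python
# Synthetic illustration with made-up numbers (NOT Dediu & Ladd's data):
# pooling populations across regions manufactures a gene-tone
# "association" that largely disappears within a single region.

# (region, % derived allele, is_tonal) for hypothetical populations
pops = [
    ("Asia",   30, True),  ("Asia",   28, True),
    ("Asia",   32, False), ("Asia",   31, False),
    ("Europe", 80, False), ("Europe", 82, False),
]

def mean(xs):
    return sum(xs) / len(xs)

def tonal_gap(data):
    """Mean allele % in tonal minus non-tonal populations
    (None if either group is empty and no contrast exists)."""
    tonal = [pct for (_, pct, t) in data if t]
    atonal = [pct for (_, pct, t) in data if not t]
    if not tonal or not atonal:
        return None
    return mean(tonal) - mean(atonal)

pooled_gap = tonal_gap(pops)                                # large apparent effect
asia_gap = tonal_gap([p for p in pops if p[0] == "Asia"])   # near zero
```

Pooled across regions, tonality looks strongly tied to the allele; within the one region containing both language types, the contrast almost vanishes, because geography drives both allele frequency and tonality.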
Graham, Deriziotis, and Fisher apparently go much further than this, though, claiming that
this effect of gene biasing is even more general: “for example, inter-population differences in the
anatomy of the vocal tract could influence the phonemic inventories of different languages” (2015,
4). However, they offer no evidence for this assertion. In fact, models based on the learnability
of the clusters of distinctive features that make up human sound systems do a far better job of
accounting for the observed sound systems of human languages, as described in Berwick (1982,
1985). Such variation is external, driven by exposure to experience along with constraints on what
can be learned from explicit, positive examples (as with birdsong). As one might expect, the capacity to acquire any human language is equipotential. As far as can be determined, the possible inventory of distinctive features, and so possible human language sound systems, has remained fixed as far back as ancient Sumer. There has been no genomic change here either.

Figure 4. Lack of association between tonal languages and gene (allele) variation, as illustrated by a geographically restricted scatterplot of tonal (closed circle) and non-tonal (open circle) language populations plotted with their derived/ancestral percentage of ASPM (abnormal spindle-like microcephaly-associated gene), x-axis, and MCPH1 (microcephalin gene), y-axis, for Asian languages only. Note the lack of any clustered association between derived allele types and tonal vs. non-tonal languages: aside from two outliers, the open and shaded circles are intermixed. Original data replotted from Dediu and Ladd (2007).

To be sure, languages do change over time. But this variation operates within certain
envelopes and is not unconstrained. This has often been formulated by models that mirror the
evolutionary change of populations, within a variety of theoretical frameworks, with change driven
by slight “mistransmission” errors like those in DNA replication or inheritance generally. One
result is the familiar variety of “word orders” such as Subject-Verb-Object, Subject-Object-Verb,
and so forth. From an externalist perspective, this variety leads to the appearance of great variation:
“the enormous diversity in the grammatical rule systems of the world’s languages” (Graham et al.
2015, 4). But this is only a surface appearance; the variety is in fact not all that great–pronouncing
the Object before the Verb is simply a choice in externalization drawn from a restricted set of
possibilities, that the Verb appear either first or not.
Particular word orders can be readily accounted for by the joint effect of learners’ exposure to examples along with the given, invariant topological forms generated by Merge. Using this
framework, Niyogi and Berwick (1995, 2009) show how a “Verb final” language such as Old
French might shift to become Modern French, with Subject-Verb-Object order, under the influence
of just a few speakers who slightly mislearn an initially Verb-final language. Over several
generations, the resulting dynamical system moves the externalization pattern from Verb-final to Subject-Verb-Object. A more detailed analysis of the same effect has been given by Yang (2000). Note
that this mislearning itself might, in fact, be due to some underlying genomic variation, a
possibility that remains to be explored. Similar dynamical system effects can account for much of
the apparent “surface flux” in one language as compared to the next, a point examined by Niyogi
(2006) in several language areas from syntax to phonology and Niyogi and Sonderegger (2010)
with respect to stress systems in language, another dimension of apparent language variation that
is readily accounted for without any change in the genomic properties of the LC. In phonology
generally, work by researchers such as Blevins (2006) demonstrates how such “mistransmission”
effects can account for observed changes. Similarly, changing patterns of word use in language
over time can be treated via simple population dynamic models with “selection” for more or less
“fit” words, as carried out by Pagel and colleagues (Pagel et al. 2012). Space here does not permit
a full examination of the wide range of such efforts, but the basic point is that most of what is
commonly observed as “variation” from language to language can be accounted for by the external
selection from a relatively small set of “menu choices.”4 Taken as a whole, this work shows that
the basic underlying machinery of Merge-based human syntax can remain fixed, and still account for the
observed variation from human language to language, where “language” means a system for
externalization.
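The flavor of such population-dynamic models can be conveyed by a toy iterated map (our simplification; the actual Niyogi-Berwick learning dynamics are considerably richer): a small, asymmetric mislearning bias shifts the whole population’s externalization pattern over generations, with no genomic change assumed.

```python
# Toy dynamical-systems sketch in the spirit of Niyogi-Berwick style
# models (our simplification, not their actual equations): each
# generation acquires Verb-final grammar G1 or Subject-Verb-Object
# grammar G2, with a small mislearning bias toward G2.

def next_generation(p_svo, bias=0.02):
    """p_svo: fraction of the population externalizing as SVO.
    Learners mostly match the population, but a small bias converts
    some Verb-final learners to SVO (mistransmission)."""
    return p_svo + bias * (1.0 - p_svo)

def simulate(p0=0.01, generations=300):
    """Iterate the map from an almost entirely Verb-final population."""
    p = p0
    history = [p]
    for _ in range(generations):
        p = next_generation(p)
        history.append(p)
    return history

traj = simulate()
# The population drifts from p ~ 0.01 (nearly all Verb-final) toward
# the SVO fixed point p -> 1 over the simulated generations.
```

The fixed point p = 1 plays the role of the new stable externalization pattern; the underlying Merge machinery never changes, only the learned “menu choice.”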
Taking all this current information together then, the human capacity for language, LC,
appears to have been genomically fixed since the Ethiopian exodus from Africa approximately
60,000 years ago. Given the required constraints on language acquisition–sufficient lability to acquire any language (largely externalization variants), and sufficient constraint to acquire language under severe input restrictions, as with deaf and blind children–this invariance seems unsurprising.
To be sure, there is clinical variation (such as that exposed by the FOXP2 haplo-insufficiency),
and there is also individual variation that is not pathological, but the significance of this individual

4. For other models that account for language change in terms of “iterated replication” from quite different
perspectives see Griffiths and Kalish (2007) and Pearl and Weinberg (2007). Nevertheless, both studies demonstrate
how apparent “external variation” arises naturally without the need for any genomic change.
variation is less clear. In some cases, a few differences have been found. As Hoogman et al. (2014) note, in a study of a larger population carried out with Fisher, “Despite using a sample that is
more than 10 times that used for prior studies of FOXP2 variation we found no evidence for effects
of SNPs [single nucleotide polymorphisms – genetic variants] on variability in neuroanatomy in
the general population….the impact of this gene…may be largely limited to extreme cases of rare
disruptive alleles” (2014, 473). In another recent study, Fisher and colleagues (Whitehouse et al.
2011) found that common variation in another gene that is a downstream target of the FOXP2
transcription factor, CNTNAP2, was linked to proficiency at early language, measured at age 24
months by parents in an Australian sample of 1149 children. For example, children differed in the
exact age at which they produced their first words–a familiar fact for most parents. Similarly,
Bates et al. (2011) found individual genetic variation in the gene ROBO1 linked to individual
differences in “phonological buffering,” and facility at non-word repetition–potentially related to
language acquisition. (It makes sense that the better an individual can process the language-related
sounds it receives, the better its learning.) Note, however, that phonological buffering is again
patently linked to the externalization system.
In at least a few instances, when discussing the genomic changes on the path to language,
Fisher and colleagues do seem to be aware that there might be more to language than its
externalized use as speech. Fisher and Ridley (2013) take the position advocated by Lenneberg
and Berwick and Chomsky (2016) when considering the FOXP2 transcription factor that Fisher
himself has been so thoroughly investigating. They note that it is unlikely that “FOXP2 triggered
the appearance of spoken language in a nonspeaking ancestor. It is more plausible that altered
versions of this gene were able to spread through the populations in which they arose because the
species was already using a communication system requiring high fidelity and high variety. If, for
instance, humanized FOXP2 confers more sophisticated control of vocal sequences, this would
most benefit an animal already capable of speech. Alternatively, the spread of the relevant changes
may have had nothing to do with emergence of spoken language, but may have conferred selective
advantages in another domain” (Fisher and Ridley, 929-930). We believe their last point more
accurately reflects what took place.

3. Where and When?


Darwin appears to have been largely correct about the origin point of modern humans and
language–out of Africa. Berwick and Chomsky (2016) advance a tentative time interval for the
appearance of the Basic Property: no earlier than 200,000 years ago, the appearance of the first
anatomically modern humans (AMH) in Africa; and no later than 60,000 years ago, the time of the
last major human exodus out of Africa through Ethiopia, though here this date can be refined to
probably no later than 80,000-90,000 years ago, given the suggestive evidence of unambiguous
symbolic activity associated with the Blombos cave stone ochre geometric engravings and other
artifacts from this point in time.5
We can also considerably sharpen the origin of language estimate by drawing on the most
recent results from comparative genomics via the sequencing of ancient and modern DNA.
(Gronau et al. 2011; Kuhlwilm et al. 2016). Siepel and colleagues (Gronau et al. 2011) analyzed
the whole-genome variation diversity patterns of six individuals from several contemporary
subpopulations of Africa and the world (European, Yoruban, Han Chinese, Korean, Bantu, and an
African subpopulation, Khoisan-speaking San). They found that the San African population most

5. Note that AMH might have appeared earlier than this point in time, as discussed below; this date
corresponds to the earliest known AMH fossils.
likely had completely branched off from the rest of the human population approximately 108-157
thousand years ago, remaining as a genomically isolated subpopulation ever since. Since the
modern San evidently possess human language, this implies that the human LC was present at least
this long ago. This narrows the estimated time of language’s emergence to between 100,000 and 200,000 years ago. Below we show how to push this date back even earlier.
The genomic isolation of the contemporary San population dating back to 200,000 years
(see just below), who clearly have the LC, also leads to an intriguing proposal made by Huijbregts
(2016). If one examines the linguistic evidence carefully, it suggests that the Khoisan group, but
not other human populations, use “clicks” as part of their sound system–that is, for externalization.
As Huijbregts notes, “click consonants occur only in Khoisan (or Khoisan language families).”
They don’t occur outside Africa.6 Why might this be? Huijbregts notes that this pattern makes
sense if we posit that possession of the internal language faculty preceded externalized language–
i.e., speech. In this case, the San ancestral population, or perhaps one just before it, had Merge for the internal construction of thought, but the difficult task of externalizing this took longer, just as
posited by Chomsky (Berwick and Chomsky 2011). The genetic isolation of the San population
leads to the possibility of a similarly “isolated” solution to the problem of externalization involving
non-pulmonary airstream manipulation–i.e., clicks–that were not “discovered” by other human
populations to solve the problem of externalization, but remained restricted to this earliest, isolated
anatomically modern human subgroup. Note that if we assume that “speech=language” along
with associated externalized language behavior, then there would be no reason for the independent
re-invention of language externalization with different properties (i.e., without clicks), at least not
without extraneous assumptions. Thus the singular appearance of clicks along with the genomic
evidence fits the “internal language phenotype” view.
But we can now do even better than this with respect to dating. Recently, Kuhlwilm et al. (2016) have examined in more detail the DNA of a Neandertal individual from the Altai mountains
in Siberia as compared to the DNA of modern African populations. Their results shed additional
light on the question of gene flow between Neandertals and modern humans, as well as refining the estimated date for the origin of language. Using computer simulations along with a comparative
analysis of the Neandertal-African human DNA differences, Kuhlwilm and colleagues arrive at two relevant conclusions: (1) some modern human genomic regions appear to have introgressed into one particular Altai Neandertal individual’s DNA; and (2) in particular,
a general gene flow from anatomically modern humans into this Neandertal individual’s lineage
seems to have occurred just about 200,000 years ago, at the time of the split between the ancestors
of the human San population and the rest of modern humans, or perhaps a bit earlier, with gene
flow between the immediate ancestors of the San and other modern humans: “We conclude that
the introgressing population diverged from other modern human populations before or shortly after
the split between the ancestors of San and other Africans, which occurred approximately 200,000
years ago” (2016, 4). Figure 5, reproduced from Kuhlwilm et al. 2016, depicts the presumed gene
flow between the Neandertal-human groups.
Turning to the first result, Kuhlwilm et al. found that a 150kb segment of the FOXP2 transcription factor gene appears to have introgressed from AMH to the Altai Neandertal individual. This introgression does not appear to involve the two putatively key nucleotide
changes coding for amino acid differences distinguishing modern humans or Neandertals (Enard

6. It has sometimes been suggested that one register of the (extinct) Australian language Lardil, Damin, used
clicks, but as Huijbregts says, “these are a few phonemic nasal clicks, probably introduced from the paralinguistic use
of click sounds” (i.e., like the “clucking” noises made for horses).
et al. 2002; Krause et al. 2007)–it appears well before these presumptively critical regions. (The
regulatory regions also suggested as possibly important occur even further along the FOXP2 gene.)
Whether this introgression was functionally significant is thus unclear. However, the fact that such
introgression occurred with FOXP2 means that the discovery of modern FOXP2 sequences in
Neandertals by Krause might require re-examination.
Second, as Figure 5 shows, statistical analysis reveals a presumptive gene flow from AMH to the Altai Neandertal individual, dated back to about 200,000 years ago. (That is, the Altai
individual retains the signal of this gene flow from its ancestors.) Note that the shaded circle at 200,000 years ago indicates that the statistical analysis cannot resolve whether this flow was from just
before the separation of the San population from the root AMH ancestors, or from some other
AMH group that separated at roughly this time or slightly before from the rest of the AMH group.
Since all these AMH groups presumably possessed Merge and the LC–as the modern San population currently does–that may push the date for the origin of the LC back to about 200,000 years ago.

Figure 5. Estimated percentage gene flow between ancestral human (light green) and Neandertal (red)
groups. Chimpanzee is shown for comparison. From Kuhlwilm et al. 2016, Ancient gene flow from early
modern humans into Eastern Neandertals, Nature. Reprinted with permission from Nature Publishing,
MacMillan Ltd.

What about the origin of language–and presumably Merge–approximately 200,000 years ago? Could the LC have appeared before this time? That would place language’s origin before the
appearance of anatomically modern humans, at a minimum somewhere along the line of the
ancestral Homo species that led to Homo sapiens. A hint of this is given in Figure 5, which shows gene flow from some (unknown) ancestor of AMH, or an early AMH group. These more ancient, extinct Homo
clades have even sometimes been suggested to be either Homo heidelbergensis or Homo antecessor, dated at anywhere between 600,000 and 800,000 years ago.
This picture has also been recently clarified by the sequencing of ancient nuclear DNA
from Homo specimens dated at 300,000-400,000 years old, found at a Spanish site, the Sima de
los Huesos, or ‘pit of bones,’ and sequenced by Matthias Meyer (Gibbons, 2015). This ancient
DNA points to a phylogenetic tree of family genetic resemblances as depicted in Figure 6. The
Sima fossil DNA is genetically closer to Neandertal DNA than Denisovan, and ancestral to
Neandertal. From this one can infer that the Neandertal-Denisovan split occurred farther back in
time than had been generally assumed, which in turn pushes back the time of the split between the
common ancestor of Homo sapiens and H. neandertal and H. denisova to about 565,000-700,000
years ago. If this finding is correct, then modern humans have been separated from Neandertals
and Denisovans by about 1.4 million years of evolutionary time–a long divorce with sufficient
time for differences to emerge.7

[Figure 6 here: a phylogenetic tree with branches for Neandertals, Denisovans, and modern humans; the Sima fossils (300,000-400,000 years ago) lie on the Neandertal branch, with splits at roughly 500,000 and 700,000 years ago and Homo ergaster at the root.]

Figure 6. Phylogenetic picture of Homo species, as suggested by the ancient nuclear DNA analysis of the Sima “pit of bones” fossils.

Is there any evidence that Neandertals/Denisovans possessed the (human) LC? As with
AMH, any such evidence can only be via proxy. In this case, we don’t see the same unambiguous
signs of symbolic behavior as with modern humans, so one must turn to other symbolic proxies
for the LC. One of these is toolmaking. It has been argued that the Mousterian-type Neandertal
tool making serves as a kind of motor reflection of the (internal) language phenotype: “precise tool
replication provides ample evidence for the necessary cognitive capacity in another modality”
(Dediu and Levinson 2013, 7). But some serious paleontologists disagree. In particular, Tattersall

7. Despite appearances, this date of 700,000 years ago does not contradict the date given earlier for the first
AMH fossils, at 200,000 years ago. We don’t know the actual species that was the direct ancestor of modern humans,
which might lie anywhere along the line from 200,000-700,000 years ago. Or it may be that the AMH line originated
earlier than 200,000 years ago. As discoveries accumulate, the Homo lineage continues to expand into a biologically
more familiar “bushy” form with many origin and extinction points as opposed to a single inexorable march from
early forms to Homo sapiens. See Schwartz and Tattersall (2015).
(2012) takes the Mousterian technology as a sign that the toolmakers were not behaviorally
modern–even when the manufacturers were anatomically modern.
In particular, it has been suggested that the motor sequences required for certain toolmaking, such as handaxes, must involve hierarchical action or tacit planning sequences that quite
directly reflect Merge. Viewing toolmaking as a kind of grammatical system has a long pedigree,
stretching back to at least Holloway (1969), who argued for a general “grammar of action.” The
idea runs as follows. Consider a motor action sequence required to forge a hand-axe by chipping
off flakes from a stone, what is called “knapping” (Stout 2011). Detaching a flake from a stone
requires first that one select a target position on the stone to strike, and second that one strike the
stone. Target selection in turn may be broken down into a sequence of separate grasping and
rotating actions (Stout, 2011). Stout arranges the entire required set of motor actions into a
hierarchical tree as depicted in Figure 8, the motor sequence of identify-geometry, rotate-core (of
the stone), turn-core, tilt-core, and finally strike. Advocates of hierarchical motor sequencing call
attention to the apparent similarity between this kind of tree and the syntactic trees in linguistics
as evidence that motor combinatory operations and language combinatory operations are alike.
Therefore, they argue, evidence that Neandertals could make tools requiring organized motor
patterns of this sort shows that they also had the same combinatory abilities as modern humans in
the area of syntax–i.e., Merge.
[Figure 8 here: a tree whose root node "Flake" dominates the leaf sequence identify-geometry, rotate-core, turn-core, tilt-core, strike.]

Figure 8. Action sequence for a basic "flake" operation in tool knapping, displayed in terms of a tree-like
organization. After Greenfield (1991).

But is this analysis correct? While the hierarchical structures for motor sequences and
human syntax might at first glance appear very much alike, on closer inspection the connection
dissolves. There are two key problems.
First, the putative connection to "language" once again simply assumes that
language=speech–that is, that tool-making is parasitic on language only insofar as the motor
coordination involved in speech parallels that of tool-making. This claim was made explicitly in
Holloway (1969), who drew an analogy between the "hierarchically structured" aspects of
language and those of tool-making. However, at least initially the notion of "hierarchically
structured" was not that of Merge-built syntax, but rather the broader organizational division into
sound structure (phonology), syntax, and meaning (however defined). This obvious but unhelpful
organizational parallel is quite readily dismissed.
But Holloway and Stout ascribe much more to tool-making, as in Figure 8. As Stout and
Chaminade (2009) have argued, "it is not clear that linguistic recursion is really so different from
the hierarchy of behavioural chunks seen in stone tool-making or any other motor behavior,"
also noting the "overlap" in neural circuitry between language processing and motor areas. And
that leads to the second problem with using action sequences as a proxy for Merge: the parallel
does not in fact hold. Recall that Merge combines any two syntactic objects into a single new,
combined syntactic object along with a label–the way a verb like "ate" and a Noun Phrase (NP)
"apples" combine to form a new, complex object with the label Verb Phrase (VP). Further, Merge
applies recursively to its own output, so that syntactic objects of the same sort are often embedded
inside one another, as in [John [VP ate [NP apples [that [VP were old]]]]], where the VP contains
a second VP.
However, the motor actions in knapping do not display these crucial properties. In fact, it
is unclear whether the grouping implied by the tree structure even makes sense. What are the
appropriate labels? In the case of language–and Merge–a verb and an NP, as in "eat apples," are
grouped as a single constituent because the combination serves as a distributional equivalence
class for which one has independent evidence. But what does it mean to group together "rotate"
and "turn" as a natural syntactic class apart from "tilt"? Why not simply list the leaves of this tree,
without any implied hierarchical structure? In syntax, we have matched beginnings and ends of
hierarchical structures like Verb Phrases. While it has been claimed that motor actions correspond
to language syntax of this sort–like the beginning and end of some action, beginning to knap and
finishing–this latter correspondence is simply the ineluctable result of an imposed causality; it is
not necessary. As Moro (2014) observes, while it is certainly true that there can be an apparently
nested sequence of motor actions–open a door, then open a bottle, then close the bottle, then close
the door–we all know that this is not a necessity, as legions of parents closing doors left open by
children will attest.
Finally, there is certainly no indication in motor actions of copied actions that are not "pronounced,"
as in the example "guess what John ate (what)," where the second occurrence of "what" is not
externalized. It might make some sense to execute exactly the same action twice, but how (or
why) would one copy an action and then not externalize it?
Taken together, these discrepancies demonstrate that tool-making action sequences simply
don’t follow what one would expect from the output of Merge, except in a metaphorical sense. As
Moro (2014, 221) summarizes, “the idea of ‘a syntax of actions’ remains a metaphor if compared
with the syntax of any human language. Perhaps…if we shift from actions to action planning some
analogies with syntax could be explored, although it remains to be proved that this activity is not
essentially parasitic on language.” If all this is so, then we really have no evidence for any material
technology that points to language-like human behavior in Neandertals or any earlier species–no
evidence of language-and-thought in the sense we have described it here.
In this way, the early generativists’ “feeling for the phenotype” as an internal generative
computational system seems to have been largely borne out. Language appears to be an essentially
internal system, and focus on external language-as-speech has, in the end, proved to be something
of a distraction–not the phenotype we were looking for. Focus on the internal structure of language,
in contrast, has yielded the insights due to modern generative grammar on locality, learnability,
neurological realization, and the evolution of this computational system. We appear to have
evolved two unique capabilities compared to other animals: the first, described in detail here,
Merge; the second, a system of mind-independent word-like elements, quite different from the
apparently “associative” symbolic systems of other animals, as Chomsky (2013) has noted. We
have as yet no clear understanding of how this latter system of elements evolved. Operating over
these as yet little understood elements, Merge yields an open-ended system of conceptual
structures, sometimes called a language of thought. Here we have come to understand that Merge
itself seems to have appeared quite late in evolutionary history, with “accumulating evidence that
human brain development was fundamentally reshaped through several genetic events within the
short time space between the human–Neanderthal split and the emergence of modern humans”
(Somel et al., 2013) coinciding with the appearance of modern humans, and their unique capacity
for thought, planning, reflection, and creativity.

References

Amunts, Katrin and Karl Zilles. 2012. Architecture and organizational principles of Broca's region.
Trends in Cognitive Sciences 16(8):418-426.
Bates, Timothy C., Michelle Luciano, Sarah Medland, Grant W. Montgomery, Margaret Wright,
and Nicholas G. Martin. 2011. Genetic variance in a component of the language acquisition
device: ROBO1 polymorphisms associated with phonological buffer deficits. Behavior
Genetics 41:50-57.
Berwick, Robert C. 1982. Locality Principles and the Acquisition of Syntactic Knowledge. Ph.D.
thesis, Department of EECS, MIT, Cambridge, MA.
Berwick, Robert C. 1985. The Acquisition of Syntactic Knowledge. Cambridge, MA: MIT Press.
Berwick, Robert C. 2009. What genes cannot tell us about language. Proceedings of the National
Academy of Sciences, 106(6):1685-1686.
Berwick, Robert C., Gabriel Beckers, Kazuo Okanoya, and Johan Bolhuis. 2011. Songs to syntax.
Trends in Cognitive Sciences 15(3):112-121.
Berwick, Robert C. and Noam Chomsky. 2016. Why Only Us. Cambridge, MA: MIT Press.
Blevins, Juliette. 2006. A theoretical synopsis of evolutionary phonology. Theoretical Linguistics
32(2):117-166.
Bornkessel-Schlesewsky, Ina, Matthias Schlesewsky, Steven L. Small, and Josef P. Rauschecker.
2015. Neurobiological roots of language in primate audition: common computational
properties. Trends in Cognitive Sciences 19:142-150.
Chomsky, Noam. 2013. Notes on denotation and denoting. In I. Caponigro and C. Cecchetto (eds.),
From Grammar to Meaning: The Spontaneous Logicality of Language. New York:
Cambridge University Press.
Chomsky, Noam. 2016. What evolved and how it might have happened. In William Tecumseh
Fitch (ed.), special issue of Psychonomic Bulletin & Review, in press.
Dediu, Daniel and D. Robert Ladd. 2007. Linguistic tone is related to the population frequency of
the adaptive haplogroups of two brain size genes, ASPM and Microcephalin. Proceedings of
the National Academy of Sciences 104(26): 10944-10949.
Dediu, Daniel and Steven Levinson. 2013. On the antiquity of language. Frontiers in Psychology 4:397.
Ding, Nai, Lucia Melloni, Hang Zhang, Xing Tian, and David Poeppel. 2016. Cortical tracking of
hierarchical linguistic structures in connected speech. Nature Neuroscience 19(1):158-164.
doi:10.1038/nn.4186.
Everaert, Martin, Riny Huijbregts, Noam Chomsky, Robert C. Berwick, and Johan J. Bolhuis.
2015. Structure not strings: linguistics as part of the cognitive sciences. Trends in Cognitive
Sciences 19(12):729-743.
Fisher, Simon E. 2016. A molecular genetic perspective on speech and language. In Gregory
Hickok and Steven Small (eds.), Neurobiology of Language. Amsterdam: Elsevier, 13-24.
Fisher, Simon E. and Matthew Ridley. 2013. Evolution, culture, genes, and the human revolution.
Science 340(6135):929-930.
Fisher, Simon E. and Sonja Vernes. 2015. Genetics and the language sciences. Annual Review of
Linguistics 1:289-310.
Fitch, William Tecumseh. 2013. Noam Chomsky and the biology of language. In Oren Harman
and Michael R. Dietrich (eds.), Outsider Scientists: Routes to Innovation in Biology. Chicago:
University of Chicago Press, 201-222.
Fitch, William Tecumseh. 2014. Toward a computational framework for cognitive biology:
unifying approaches from cognitive neuroscience and comparative cognition. Physics of
Life Reviews 11(3):329-364.
Friederici, Angela D. 2004. Processing local transitions versus long-distance syntactic hierarchies.
Trends in Cognitive Sciences 8:245-247.
Friederici, Angela D., Andrea Moro, Noam Chomsky, Robert C. Berwick, and Johan Bolhuis. 2016.
Language in the brain. Nature Reviews Neuroscience, in press.
Graham, Sarah and Simon E. Fisher. 2015. Understanding language from a genomic perspective.
Annual Review of Genetics 49:131-160.
Graham, Sarah, Pelagia Deriziotis, and Simon E. Fisher. 2015. Insights into the genetic
foundations of human communication. Neuropsychology Review 25:3-26.
Greenfield, Patricia M. 1991. Language, tools and brain: the ontogeny and phylogeny of
hierarchically organized sequential behavior. Behavioral and Brain Sciences 14(4):531-551.
Griffiths, Thomas L. and Mark L. Kalish. 2007. Language evolution by iterated learning with
Bayesian agents. Cognitive Science, 31(3):441–480.
Herman, Louis M. and Robert K. Uyeyama. 1999. The dolphin's grammatical competency.
Animal Learning & Behavior 27(1):18-23.
Holloway, Ralph. 1969. Culture: a human domain. Current Anthropology 10:395-412.
Hoogman, Martine, Tulio Guadalupe, Marcel P. Zwiers, Patricia Klarenbeek, Clyde Francks,
and Simon E. Fisher. 2014. Assessing the effects of common variation in the FOXP2 gene
on human brain structure. Frontiers in Human Neuroscience 8, July.
doi:10.3389/fnhum.2014.00473.
Huijbregts, Riny. 2016. Clicks and evolution. Unpublished ms. Utrecht University.
Idsardi, William James. 2015. What’s different about phonology? No recursion. Ms. University of
Maryland.
Kirby, Simon. 1998. Learning bottlenecks and the evolution of recursive syntax. Edinburgh:
Edinburgh University.
Kirby, Simon. 2000. Syntax without natural selection: how compositionality emerges from
vocabulary in a population of learners. In Christopher Knight (ed.), The Evolutionary
Emergence of Language: Social Function and the Origins of Linguistic Form, 303-323.
Cambridge: Cambridge University Press.
Lenneberg, Eric (ed.). 1964. New Directions in the Study of Language. Cambridge, MA: MIT Press.
Lenneberg, Eric. 1967. Biological Foundations of Language. New York: John Wiley and Sons.
Markowitz, Jeffrey E., Lizabeth Ivie, Laura Kligler, and Timothy J. Gardner. 2013. Long-range
order in canary song. PLoS Computational Biology. doi:10.1371/journal.pcbi.1003052.
Moro, Andrea. 2014. Response to Pulvermüller: the syntax of actions and other metaphors. Trends
in Cognitive Sciences 18(5):221.
Niyogi, Partha, and Robert C. Berwick. 1995. The logical problem of language change. AI Memo
1516, MIT. Cambridge, MA: MIT.
Niyogi, Partha. 2006. The Computational Nature of Language Learning and Evolution. Cambridge,
MA: MIT Press.
Pagel, Mark, Quentin D. Atkinson, and Andrew Meade. 2007. Frequency of word-use predicts
rates of lexical evolution throughout Indo-European history. Nature 449:717-720.
Pearl, Lisa and Amy Weinberg. 2007. Input filtering in syntactic acquisition: Answers from
language change modeling. Language Learning and Development, 3(1):43–72.
Petitto, Laura Anne. 2005. How the brain begets language. In James McGilvray (ed.), The
Cambridge Companion to Chomsky, 85-101. Cambridge: Cambridge University Press.

Pfenning, Andreas R., Erina Hara, Osceola Whitney, Miriam V. Rivas, Rui Wang, Petra L. Roulhac,
Jason T. Howard, Morgan Wirthlin, Peter V. Lovell, Ganeshkumar Ganapathy, Jacquelyn
Mountcastle, M. Arthur Moseley, J. Will Thompson, Erik J. Soderblom, Atsushi Iriki,
Masaki Kato, M. Thomas P. Gilbert, Guojie Zhang, Trygve Bakken, Angie Bongaarts, Amy
Bernard, Ed Lein, Claudio V. Mello, Alexander J. Hartemink, Erich D. Jarvis. 2014.
Convergent transcriptional specializations in the brains of humans and song-learning birds.
Science 346: (6215), 1256846.
Rilling, James K. 2014. Comparative primate neurobiology and the evolution of brain language
systems. Current Opinion in Neurobiology 28C: 10-14.
Saffran, Jenny R., Richard N. Aslin, and Elissa L. Newport. 1996. Statistical learning by
8-month-old infants. Science 274:1926-1928.
Schwartz, Jeffrey H. and Ian Tattersall. 2015. Defining the genus Homo. Science 349(6251):931-932.
Somel, Mehmet, Xiling Liu, and Philipp Khaitovich. 2013. Human brain evolution: transcripts,
metabolites and their regulators. Nature Reviews Neuroscience. 14: 112-127.
Sonderegger, Morgan and Partha Niyogi. 2010. Combining data and mathematical models of
language change. In Proceedings of the 48th Annual Meeting of the Association for
Computational Linguistics, 1019-1029. Uppsala, Sweden: Association for Computational
Linguistics.
Stout, Dietrich. 2011. Stone toolmaking and the evolution of human culture and cognition.
Philosophical Transactions of the Royal Society of London Series B, Biological Sciences
366(1567):1050-1059.
Stout, Dietrich and Thierry Chaminade. 2009. Making tools and making sense: complex,
intentional behaviour in human evolution. Cambridge Archaeological Journal 19(1):85-96.
Tattersall, Ian. 2012. Masters of the Planet: The Search for Our Human Origins. New York:
Palgrave Macmillan.
Tinbergen, Niko, 1951. The Study of Instinct. New York: Oxford University Press.
Whitehouse, Andrew, Dorothy V. M. Bishop, Qi W. Ang, Craig E. Pennell, and Simon E. Fisher.
2011. CNTNAP2 variants affect early language development in the general population.
Genes, Brain and Behavior 10:451-456.
Yang, Charles D. 2000. Internal and external forces in language change. Language Variation
and Change 12:231-250.
Yang, Charles D. 2004. Universal Grammar, statistics, or both? Trends in Cognitive Sciences
8(10):451-456.
Zaccarella, Emiliano and Angela D. Friederici. 2015. Reflections of word processing in the insular
cortex: a sub-regional parcellation based functional assessment. Brain and Language 142:1-7.