Indo-European and the Ancient DNA Revolution
Paul Heggarty
Dept of Linguistic and Cultural Evolution,
Max Planck Institute for the Science of Human History, Jena

In the fast-moving world of ancient DNA, a string of important
new publications has appeared since this text had to be submitted
in final form in July 2016. These latest findings do not change the
basic argumentation set out here, which remains valid. In fact,
they reconfirm many of the arguments, although they do alter
the balance of plausibilities on some of the questions left open in
this paper. I therefore thank the editors for allowing a brief
update to be added to the end of this paper, at the final proofing
stage.

1. Revolutionising the Indo-European Question


The birth of linguistics is often dated to the realisation, in
the late eighteenth century, that many languages of India and
of Europe were “sprung from some common source”. For a
good century thereafter, Indo-European studies dominated and
all but defined the entire discipline. In recent decades, however,
it is above all archaeologists who have formulated and
articulated the leading hypotheses on Indo-European origins
(see Heggarty, this volume). In particular, they have framed
hypotheses within the archaeological contexts of the spread of
pastoralism based on the domestication of the horse, or of
farming, and the corresponding cultural and/or demographic
contrasts in prehistory.
In the last few years, however, those debates have been
revolutionised by what already rank among the most significant
developments for decades on the Indo-European question. The
novelty comes neither from linguistics nor from archaeology.
Rather, it comes from the twin revolutions now sweeping
human genetics: full genome analysis, with its much enhanced
resolution; and above all the ability to extract viable ancient
DNA (aDNA). A string of prominent papers, not least in Science
and Nature, has begun to build up a sequence of aDNA samples
from Europe, and northern Eurasia more widely. Although this
aDNA record remains patchy and uneven, both across
geographical space and through time periods, coming years
should rapidly fill in many of the gaps. And the pieces of the
puzzle already in place are beginning to allow much firmer
inferences on certain key aspects of population prehistory,
particularly of northern Europe, and most recently of all, of the
Near and Middle East.
The latest papers bring contributions on all three levels of
the when, where and why of Indo-European. Arguably the
biggest novelty, though, is on the why question, and specifically
on the theme of this special issue: farming vis-à-vis pastoralism,
especially their respective demographic impacts. On the one
hand, the new data support a demic diffusion with farming
eastwards from the Fertile Crescent through Iran and as far as
India (Broushaki et al. 2016). Westwards into Europe, aDNA
confirms a transformational impact of farming, with large-scale
genetic replacement by mass population expansion, in a ‘wave
of advance’ spreading westwards out of Anatolia across Europe
over the course of several millennia. On the other hand, aDNA
also reveals that in Europe that genetic picture has since been
quite heavily rewritten, at least in some regions. Specifically,
and as foreshadowed in Brandt et al. (2013), Haak et al. (2015)
report evidence for a (very) strong demographic impact from
the Pontic-Caspian Steppe into north-central and north-eastern
Europe c. 4500 BP. In archaeological terms, a population from
the Yamnaya Steppe culture appears to have formed much of
the genetic make-up of populations of the north European
Corded Ware culture.
The scale of this reported impact (around three-quarters
of the Corded Ware aDNA sample is claimed as derivable from
the Steppe) has come as a surprise to much
archaeological thinking. Both main hypotheses on Indo-
European origins seem to have underestimated the
demographic dimension of this east-to-west movement, long
pointed to by some archaeologists, but not in such demographic
strength. An accompanying paper (Heggarty, this volume) gives
full citations on this question of demography. To summarise
here, Renfrew’s (1987: 265) main objection to the Steppe
hypothesis was the lack of archaeological evidence for any
“profound population and language changes across the whole
of Europe at the beginning of the bronze age”. The new genetic
data, however, now do seem to indicate “profound population
change [and] demographic consequences” in this period. Or
rather, at least in some parts of Europe they do, so strictly
Renfrew’s point may still hold, given his qualification “across
the whole of Europe”.
In fact, even advocates of the Steppe hypothesis have
hitherto seen Indo-European incursion into Europe primarily as
a cultural rather than a demographic expansion. For Anthony
(2013), unlike the Anatolian hypothesis “driven by demographic
advantages”, the Steppe hypothesis was driven by a mechanism that was
“not obvious” and that “must have depended little if at all on
demographic advantages, as no obvious demographic advantage
can be assigned to any particular region or culture in the
Copper and Bronze Ages”. Even texts as recent as Anthony &
Ringe (2015: 210) repeat the same message: “steppe herders
certainly held no demographic advantage over the Old
European population”. All of this may have to be swiftly
rethought, then. For if the new genetic data are as clear-cut and
as incontrovertible as claimed, then there was after all a
significant demographic replacement, in north-central and
north-eastern Europe in the Bronze Age, and it came from the
Steppe.
This paper aims to provide a thorough review of the
results and interpretations in a series of articles on ancient DNA
from Eurasia, published in 2015 and 2016, specifically as they
contribute to the debate on Indo-European origins. Many of
those articles, and the discussion here, turn on the theme of this
special issue: the origins and dispersal of farming in the Fertile
Crescent, and horse-based pastoralism on the Pontic Steppe.
Ancient DNA turns out to offer full support to neither of the
main rival Indo-European hypotheses, in the most clear-cut
formulation of each. So this paper also outlines an alternative
‘A2’ hypothesis, which combines some elements of both, to
propose a better fit with the aDNA results, for both the eastward
and westward dispersals of Indo-European.

1.1 Genes and Language: Provisos and Clarifications


Before starting out on any consideration of possible
associations between genes and language, one very big proviso
is in order: genetic and linguistic lineages do not necessarily
match. Obviously, language shift happens. Over many
millennia, humanity built up enormous linguistic diversity, but
in the Modern Era that is now calamitously collapsing.
Language death is proceeding at an accelerating rate, in favour
of a very small number of national or supra-national linguistic
juggernauts like English, Spanish and Mandarin. The scale of
shift is apparent even from how many native English-speakers
in the USA bear surnames that conspicuously originate in other
languages (Obama, Kennedy, Roosevelt, Eisenhower …).
The range of populations that speak the Indo-European
‘linguistic lineage’ certainly makes for no straightforward match
to some single genetic profile. The main ancestries in its
speaker populations from northern India to the Atlantic are
estimated to have diverged from each other tens of millennia
before Indo-European linguistic divergence could ever have
begun (see §3.3 below). Conversely, in many regions Indo-
European-speakers are genetically all but indistinguishable
from neighbours whose linguistic lineages are not Indo-
European at all. Uralic speakers in north-eastern Europe are
genetically very close to their immediate neighbours speaking
Indo-European languages of the Baltic, Slavic and (especially
Scandinavian) Germanic branches. In particular, all bear
similarly high proportions of the genetic components identified
in Haak et al. (2015) with the Yamnaya → Corded Ware
incursion into north-eastern Europe.
It may seem disconcerting, then, that so much recent
discussion of ancient DNA results tries to associate particular
population movements with particular language lineages,
especially Indo-European. A first clarification, however, is that
there is certainly no expectation of, or aspiration to, any
simplistic one-to-one match between linguistic and genetic
lineages. Rather, across populations that overall are clearly
genetically very distinct, the search is just for some partial
components that may be common to them all, albeit in very
different proportions. A second clarification is that the
associations need not be sought so directly between languages
and genes. Rather, one can search for corresponding patterns
found in both data-sets, that can both be linked back to the
same underlying processes in prehistory, especially major
cultural and/or demographic expansions. This focus on such
processes, and their parallel impacts on different levels
(archaeological, genetic, linguistic), is a much more plausible
methodology for linking the disciplines than outdated,
simplistic equations of “cultures = languages = genes” (see
Heggarty 2014: 599-602).
Thirdly, the mere fact that language shift can happen does
not invalidate a basic connection between language spreads and
demography. All hangs on the demographic (and social)
contexts — balances or imbalances — that tend to push that
shift towards one language lineage or another. Does language
shift tend to operate counter to a demographic gradient, or in
line with it? Before the radically changed contexts of the
Modern Era, did mass language shift happen in contexts of elite
dominance, towards the language of a distinct minority
population? Or did shift instead work more by pushing
minority populations progressively to shift to a majority
language with demographic ‘critical mass’ (in something of an
analogy to drift to fixation in genetics)? (For more on this, see
Heggarty 2014: 262-263, and Heggarty, this volume.)
One other proviso concerns genetics alone. For all its
spectacular advances, ancient DNA analysis still faces the
challenges of a young, novel and very highly technical field. As
vast quantities of raw genetic data are produced, much still
hangs on the methods and models used to get from those data
to interpretations of actual population history, deep in the past.
Among the most popular techniques is Principal Components
Analysis: most papers reproduce similar versions of a PCA plot
for ancient and modern populations of Eurasia. To judge how
much to read into it, however, one must bear in mind that the
first two principal components in fact represent but a tiny
proportion of the total, very complex signal: just 1.48% and
0.59% in Broushaki et al. (2016: Fig. 2), for instance. Most papers
also rely heavily on ADMIXTURE analysis, but its outputs can
result from a range of different histories (Falush et al. 2016),
while alternative approaches, more explicit models of descent,
can come to significantly different results (as we shall see).
Those models rely on certain presumptions and choices,
however, which are in turn a function of a (non-random)
‘sampling’ of those regions and periods that have so far yielded
viable aDNA, usually for just very small numbers of ancient
individuals. In sum, the messages from aDNA are not necessarily
so clear-cut as they may first appear, especially to those outside
the discipline and less aware of the methodological issues
within it.
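For readers outside genetics, the share of the total signal captured by a given principal component is conventionally defined as its eigenvalue divided by the sum of all the eigenvalues of the genotype covariance matrix. The display below gives that general textbook definition, not a reconstruction of the exact procedure in any of the papers cited; the symbols $\lambda_k$ (variance along the $k$-th component) and $m$ (number of components) are introduced here for illustration, and the arithmetic simply adds the two figures quoted above for Broushaki et al. (2016: Fig. 2):

```latex
\mathrm{PVE}_k = \frac{\lambda_k}{\sum_{j=1}^{m} \lambda_j},
\qquad
\mathrm{PVE}_1 + \mathrm{PVE}_2 = 1.48\% + 0.59\% = 2.07\%
```

On that reading, some 98% of the variance in the data lies along components that a standard two-dimensional PCA plot does not show.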
With these provisos clarified, we can now more
safely turn to the ancient DNA data, and how they bear on the
Indo-European question, particularly with respect to farming
and pastoralism.

1.2 What About the Indo-?


Haak et al.’s (2015) ancient DNA data can certainly appear
as support for the Steppe hypothesis, at least at a first glance,
and at least for parts of Europe, above all the north(-east). Still,
believers in that hypothesis might rein back their enthusiasm
with a reality-check from the name of the family itself: for
Indo-European is not all about Europe. The focus of the first
main batch of recent papers was on Europe (and the Steppe),
because that is where almost all of the new ancient DNA
samples were first emerging from.
Then, just as this paper was to be submitted in final form,
three new articles (two still only in pre-print releases) published
aDNA results for a total of 52 new ancient samples: 44 in
Lazaridis et al. (2016), 7 in Broushaki et al. (2016), and 1 in
Gallego-Llorente et al. (2016). These bear more directly on the
origins of Indo-European outside Europe, for the samples are
from ancient Iran, Anatolia and the Levant, spanning the last
ten millennia or so. These paint a scenario very unlike the
clarity and scale of Steppe impact in Corded Ware Europe, and
have led to conflicting interpretations for the Indo-European
debate. These very latest articles are assessed here particularly
in an addendum towards the end of this paper (§3).
Returning to Haak et al. (2015), what the paper actually
attributes to the Steppe is the origin of “at least some of the
Indo-European languages of Europe”. Every qualification
counts here, not least “of Europe”. Some of the most relevant
data on those qualifications are only to be found deep in the
supplementary information of Haak et al. (2015: si Fig. S6.3), in
the genetic profiles of present-day speakers of Iranic and Indic
languages, by comparison with the ancient DNA samples. The
newer aDNA data have since confirmed and clarified things
further: first Jones et al. (2015) for hunter-gatherers from the
Caucasus, now the latest three papers sampling early farmers
across the Fertile Crescent.
Furthest east in India, modern populations show a merger
of two basic genetic ancestries: a local ‘Ancestral South Indian’
one, into which came another input especially strong in the
north, and dubbed ‘Ancestral North Indian’, ANI (since Reich et
al. 2009). A large part of this incoming profile is a component
found at very high proportions in Iranic-speakers too, where on
a straightforward analysis it seems not derivable from the
Yamnaya steppe population to any significant degree (although
see also §3.1 below).
Jones et al. (2015) were the first to report aDNA samples
that were clear ancient representatives of this component,
dominant both in Iranic speakers and within the ANI input to
Indic-speaking areas. Geographically, these ancient individuals
were from western Georgia, i.e. the southern reaches of the
Caucasus ‘isthmus’ between the Black and Caspian Seas,
immediately north of eastern Turkey and Armenia. In other
words, they are rather far from the Steppe, but not from Eastern
Anatolia, i.e. the northern arc of the Fertile Crescent.
Chronologically, these samples are pre-Neolithic, when all
populations were still hunter-gatherers, hence their
denomination as CHG (Caucasus Hunter-Gatherer) in Jones et
al. (2015). There is considerable continuity to modern
populations south of the Caucasus, however, and the very latest
aDNA has confirmed that early farmers there were also
genetically similar. Certainly for our debate on farming and
Indo-European hypotheses, more relevant is not which samples
happened to be reported first, but who were the first farmers, so
this component can be more helpfully referred to as ‘Eastern
Fertile Crescent’ (EFC) rather than CHG — although for
consistency with the papers cited here, we continue with CHG
for now.
It is this CHG/EFC component that eventually moved north
to make up c. 40% of the Yamnaya pastoralist samples, but not
until 7000-5000 BP (Haak et al. 2015: 209), by which time they
would indeed more plausibly have been farmers than hunter-
gatherers. The remaining 60% of the Yamnaya ancestry mix, the
‘large half’ local to the steppe, is dubbed Eastern (European)
Hunter-Gatherer (EHG). By the time of the Yamnaya culture and
its expansions (after 5000 BP), its population had already
become an indissoluble mix of those two components. Any
dispersal out of this region in the date-range of the Steppe
hypothesis, then, would necessarily have carried a population
profile roughly half EHG and half CHG, and such is indeed what
entered Corded Ware north-eastern Europe.
Indic and Iranic-speakers, however, show no such even
mix (e.g. in Haak et al. 2015: Fig. S6-3). Instead, the incoming
western ancestry is far more heavily made up of the Eastern
Fertile Crescent component, with little if any of the other ‘large
half’ EHG component in the Yamnaya Steppe population. So if
one is looking for a common genetic component to tie to both
Corded Ware and Indo-Iranic speakers, there is one — but it is
not the CHG-EHG mix of the Steppe. Only CHG (EFC) alone, from
the Eastern Fertile Crescent, without a major EHG component
from the Steppe, fits as a dominant source for speakers of
Iranic, and a large part of the ANI mix that spread into India. As
Jones et al. (2015: 5) put it: “CHG ancestry was also carried east
to become a major contributor to the Ancestral North Indian
component”.
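The constraint can be made concrete with a little arithmetic on nested mixtures. The Python sketch below multiplies out the rounded proportions quoted in this section (Yamnaya c. 60% EHG and 40% CHG; Corded Ware c. three-quarters derivable from the Steppe). It is a simplified illustration only: the function name `expand_mixture` is hypothetical, real admixture models estimate such proportions jointly and with uncertainty, and treating the remaining quarter of Corded Ware ancestry as ‘first farmer’ is an assumption made here purely for the example.

```python
def expand_mixture(
    proximate: dict[str, float],
    sub_mixtures: dict[str, dict[str, float]],
) -> dict[str, float]:
    """Express a population's ancestry in terms of deeper components.

    `proximate` gives the share of each immediate source population.
    `sub_mixtures` gives, for any source that is itself a mixture, the
    shares of its own components; other sources are treated as unmixed.
    """
    deep: dict[str, float] = {}
    for source, weight in proximate.items():
        for component, share in sub_mixtures.get(source, {source: 1.0}).items():
            # Each deep component inherits the source's weight times its share.
            deep[component] = deep.get(component, 0.0) + weight * share
    return deep


# Rounded proportions quoted in the text (after Haak et al. 2015).
yamnaya = {"EHG": 0.60, "CHG": 0.40}
# Assumption for illustration only: the non-Steppe quarter is first-farmer ancestry.
corded_ware = {"Yamnaya": 0.75, "first farmer": 0.25}

profile = expand_mixture(corded_ware, {"Yamnaya": yamnaya})
for component, share in sorted(profile.items()):
    print(f"{component}: {share:.0%}")
# CHG: 30%
# EHG: 45%
# first farmer: 25%
```

In this simple model, whatever share of Steppe ancestry is assumed, the EHG and CHG it carries stay in the Yamnaya ratio of 60:40. Scaling the Steppe contribution up or down therefore cannot yield a western ancestry made up overwhelmingly of CHG with little EHG, as reported for Indic and Iranic speakers.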
As for “Exactly when the eastwards movement occurred”,
Jones et al. go on to underline that it is not yet known, but that
“movements associated with … cereal farming … are also
plausible.” This recalls Renfrew’s farming hypothesis (version
A, not B), by which Indo-Iranic spread eastwards with farming,
out of the Fertile Crescent to the Indus. On the other
chronological question, of when this western ancestry began to
admix with the Ancestral South Indian genetic component,
Jones et al. (2015) estimate a time-frame from 4200 BP onwards.
This compares with the fifth millennium BP as the
archaeological dating for when farming, after a long pause on
the Indus, crossed the highland watershed and began spreading
down the Ganges (Heggarty & Renfrew 2014: 540-548). Jones et
al. (2015: 1) themselves explicitly suggest the eastward spread
of CHG as “possibly marking the arrival of Indo-Aryan
languages”. They fail to follow through, however, on all the
limitations and qualifications that their own suggestion
necessarily entails for the claims regarding Yamnaya and Indo-
European, which they instead largely just repeat from Haak et
al. (2015).
In short, the more one trusts in the ancient DNA profile of
the Yamnaya Steppe population, and associates it with bringing
at least some Indo-European languages to Europe, then the
more — by that same insistence — one actually implies that the
Steppe is not a plausible origin for Proto-Indo-European as a
whole. For the roughly even mix of EHG and CHG/EFC in the
Yamnaya profile cannot be the basic genetic source of Iranic
and Indic speakers, in whom any EHG is generally dwarfed by
CHG/EFC, in Haak et al.’s (2015: Fig. S6-3) own results. That
CHG/EFC component neither originated on the Steppe, nor did it
pass through the Steppe on its eastward expansion through Iran
and towards India. On the contrary, as confirmed by the latest
aDNA, it was the dominant ancestry component of the first
farmers from the Eastern Fertile Crescent.
We return to the eastward story of Indo-European at the
end of this paper in §3, an update on those very latest aDNA
findings. Until they had emerged, Europe and Yamnaya had
necessarily been the focus of the ancient DNA debate, and thus
remain the focus of the assessment here, of how farming,
pastoralism and Indo-European fit into that debate. But the
Indo- side of the story already serves as a powerful cautionary
tale. It should give pause for thought before anyone concludes that
the latest ancient DNA tells a simplistic story unequivocally in
favour of the Steppe and against the farming hypothesis.

2. Europe and the Steppe


Indeed in Europe too, on closer inspection the picture
turns out much more complex than the prima facie impression
of support for the Steppe hypothesis, and open to a very
different interpretation. That hinges above all on how the two
rival answers to the why question relate to each other — how
pastoralism relates to farming — so we must first set that
relationship into its contexts in time and space.

2.1 Chronology: Indo-European Divergence in Just 4500 Years?

A first concern for the Steppe hypothesis is on the level of
chronology. The Yamnaya → Corded Ware incursion from the
Steppe is set at c. 4500 BP, from the radiocarbon dates of the
skeletons from which the aDNA was extracted (Haak et al. 2015).
Even allowing a few more centuries’ leeway, back to the
4800 BP date usually given as the beginnings of Corded Ware,
this remains uncomfortably late for the Steppe hypothesis as a
whole, usually conceived of within a time-frame of at least
5500 BP, if not up to 6500 BP. If this movement out of the Steppe
is to explain all Indo-European in Europe, then it not only had
to reach north-central Europe c. 4500 BP, but also to spread
(subsequently?) into southern and western Europe, to account
for the other branches of Indo-European found there. A time-
span of just 4500 years or so would thus need to be sufficient to
allow for all of the divergence between all European branches
of Indo-European: Slavic vs. Baltic vs. Germanic vs. Celtic vs.
Italic vs. Greek vs. Albanian. In other words, we are left with
little more than two millennia to take us, for example, from the
early Latin of the last few centuries BC back to Italic, then
further back to Italo-Celtic (if one accepts that clade), and then
back again to allow for a sufficiently deep split from other
branches such as Greek. Early Latin and Greek texts document
what were, already by 2500 years ago, sub-lineages very far
diverged from each other into fully-fledged, mutually
unintelligible languages. And even amongst dialects all
unquestionably identifiable as Greek, within that single clade
their own divergence already takes us back to at least 3000 BP,
on standard thinking. That leaves just 1500 years for divergence
vis-à-vis all other branches of Indo-European in Europe.
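Set out as simple subtraction, using the round dates given above (all in years BP, and taking c. 2500 BP for the earliest Latin and Greek texts), the time budget left by a dispersal c. 4500 BP is:

```latex
\begin{aligned}
4500 - 2500 &= 2000 \text{ years for Latin and Greek to diverge as far as they had by c. 2500 BP},\\
4500 - 3000 &= 1500 \text{ years for Greek to diverge from all other European branches}.
\end{aligned}
```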
As an informal but informative yardstick, consider how
similar modern Italian and Spanish remain, some 2200 years
after Roman expansion to Iberia. From that perspective, an
expansion at barely double that time-depth, 4500 BP, looks
suspiciously shallow for the entire, far greater diversity of Indo-
European within Europe. Obviously, one can alternatively pick
example languages that have changed abnormally quickly, such
as French and English, especially well known and prominent to
many historical linguists, and they can duly create different
impressions of how much change and divergence seem ‘normal’
over particular time-frames. Yet other individual languages can
give an impression at the opposite extreme, of just how little a
linguistic system can change over time, as in the famous
example of Lithuanian case-endings vis-à-vis those of Proto-
Indo-European itself. This all serves to underline how
important it is firstly to work towards objective measures of
rates of change over time (and how widely they can vary), and
secondly, in the meantime, to remain open to a wide range of
possible, plausible time-spans over which given degrees of
language divergence may have arisen.
For their “Late Proto-Indo-European” ancestor of all Indo-
European branches in Europe, plus Indo-Iranic, Anthony &
Ringe (2015: Fig. 2) propose 5000 BP. Some branches emerge
only later still, including the differentiation between Germanic,
Balto-Slavic and even Indo-Iranic. They explain their branches
by not one but two migrations (their 3a and 3b) from the
Steppe, more or less contemporaneously around 5000 BP. With
some chronological leeway, their 3b could correspond to the
Steppe → Corded Ware movement c. 4500 BP, but Haak et al.
(2015) contains no genetic evidence for 3a. Nor does it indicate
predominant Steppe input to the Indo-European-speaking
populations of southern Europe, which retain to this day a
primarily ‘first farmer’ signal (Haak et al. 2015: Fig. 3).

2.2 Geography: All Indo-European — or Some?


That brings us to the second level, geography, where again
the new genetic data are far from straightforward for the
Steppe hypothesis. This is already hinted at in how Haak et al.
(2015) phrase their title, writing of “a” rather than “the (only)”
source of Indo-European languages, and adding the limitation
“in Europe”. The closing sentence of their abstract is similarly
hedged: “These results provide support for the theory of a
steppe origin of at least some of the Indo-European languages of
Europe” (emphasis added).
On the one hand, these qualifications seem reasonable.
The new aDNA data the authors report on — already significant
enough, of course — come mostly from parts of the Steppe and
northern Europe, and coverage is still too limited to make
claims for all of the continent in all periods. The abstract is
being careful to clarify those limits. And the basic message is
indeed the first powerful confirmation in genetics of a
significant ancient Steppe impact on any part of Europe. On the
other hand, the phrasing is crafted to lean towards the Steppe
hypothesis in a way that is potentially deceptive within the
Indo-European debate. It borders on the misleading to talk of
support for “the theory”, clearly intending this to be identified
with the Steppe hypothesis. For the authors choose not to state
simply that their results “provide support for a steppe origin of
…”, but specifically of “the theory” to that effect. This,
particularly the use of the definite article in “the theory”,
implies a specific known, established theory. And among the
main theories, the one that entails “a steppe origin” is of course
the Steppe hypothesis. But what then follows — “the theory of a
steppe origin of at least some of the Indo-European languages of
Europe” — is not at all an accurate description of what the
hypothesis claims, nor of what the Indo-European question
actually is. It is precisely because the abstract is so guarded and
limited that it is misleading for it nonetheless to suggest that its
results support the Steppe hypothesis, when that is far more
ambitious, and limited neither to just some Indo-European
languages, nor to those of Europe. The wording seems intended
to redirect us to ask the wrong Indo-European question, as if it
boiled down to just whether or not there was a significant
Steppe → Corded Ware population movement. If that were the
extent of the Indo-European question, then there would be little
to dispute, for the farming hypothesis too can just as well
accommodate such a migration as a secondary movement
within it, as we shall see.
‘The’ Steppe hypothesis has never been so limited.
Like all proposals, it aims to be a complete answer to the Indo-
European question, which is much less qualified, and about far
more than a Steppe → Corded Ware movement. The Steppe
hypothesis is the theory that the Steppe is the homeland of
Indo-European languages not just in (Corded Ware) northern
Europe, but everywhere else too: western and southern Europe,
Armenia, Iran, India, (formerly) Anatolia, and so on. So if only
some branches in some regions go back to the Steppe, then
strictly the Steppe hypothesis in fact fails as an answer to the
real, full Indo-European question.
This brings us back to Renfrew’s (1987) repeated
clarifications on the level of geography, which led him to doubt
the Steppe hypothesis in the first place. The scale of Indo-
European reflects impacts “so widespread geographically” as to
apply almost “across the whole of Europe”, and far beyond.
From that perspective, the latest aDNA includes results that can
even seem positively problematic for the Steppe hypothesis.
These are most usefully summarised in Haak et al. (2015: Fig. 3),
where it is important to distinguish both between modern and
ancient populations (the top and bottom halves of Fig. 3), and
within each, between populations of (broadly) northern and
southern Europe.
Firstly, among the modern populations, Steppe ancestry is
at high levels in northern Europe. In southern, Mediterranean
Europe, although present to some degree in most populations, it
is far from pervasive. The contribution from Europe’s first
farmers continues to far outweigh it, in most cases by two to
one, or more.
Secondly, ancient DNA can in principle reveal at which
stage in European (pre)history Steppe ancestry is first detected
in any given region. To do so, however, requires a transect
through time: aDNA samples from multiple different time depths
in that region. This is a luxury we do not yet enjoy, but the
picture is gradually being filled out. In northern Europe, there is
no steppe impact in samples from the Early and Middle
Neolithic periods, but it then does appear in Late
Neolithic/Early Bronze Age samples from c. 4500 BP onwards
(Haak et al. 2015), and especially strongly in Corded Ware
north-central Europe. From then on, this signal remains fairly
high in what becomes the genetic make-up of north-central and
north-east Europe. In the first ancient DNA results from Ireland,
steppe ancestry is again not found in the one Middle Neolithic
sample, but does appear in three Bronze Age samples, in a
proportion of up to about one third (Cassidy et al. 2016: 370),
although different analyses in fact give significantly different
values, as we shall see (§3.1).
For southern, Mediterranean Europe, a few samples are
available from the Early and Middle Neolithic, and up to the
Copper Age, and point to predominant first farmer ancestry,
and (as in northern Europe) no Steppe ancestry yet. But there is
then a long gap in the record, with almost no aDNA samples yet
reported in Mediterranean Europe from any date after 4500 BP,
until our plentiful modern data. The latter do by now show
some proportion (although lower than in northern Europe) of
Steppe/Corded Ware/north European ancestry. But when did it
arrive in the south? Without aDNA from Mediterranean Europe
after 4500 BP, we cannot yet tell: at the same time as in the
north, or much more recently?
From history, however, we do know of a series of major
population inputs from northern into southern Europe. Within
the Roman Empire, slaves and ultimately many legionaries too
were drawn from Germanic populations. Thereafter, it was in
large part Germanic and Slavic incursions that hastened the fall
of Rome. For centuries Germanic-speakers then ruled much of
southern Europe: the Franks and Burgundians in France,
Visigoths in Spain, Lombards in much of Italy, and so on (see
also Heggarty, this volume: §6.2). Slavs, meanwhile, came to
dominate most of the Balkans, with incursions as far south as
what is today Greece.
So the limited Steppe/north European genetic component
visible in modern populations of southern Europe could be
the result of these movements in historical times, not the
Bronze Age. That would suit the farming hypothesis, since that
timeframe is obviously far too late for the first entry of Indo-
European languages into southern Europe, where they had
been widely spoken since long before the Roman Empire. There
were of course non-Indo-European languages there too, not
least Etruscan and Basque. But the issue is not whether one can
cite a few language names, but how significant those languages
were in demographic and territorial scale across Europe. Yes,
Etruscan covered a good fraction of Italy, but still much less
of Italy than Italic, Celtic, Greek and other Indo-European
languages. Yes, there are gaps and uncertainties about whether
some poorly-attested languages are Indo-European or not, but
that does not authorise an ‘if in doubt, presume not’ bias.
Traditional Steppe hypothesis accounts tend to overplay the
extent of non-Indo-European languages, whereas in geographical
coverage it was in fact Indo-European that dominated the
Balkans and Gaul, a majority of Italy, and large parts of Iberia,
as these regions emerge into history and we can first reliably
identify the languages spoken there.
So if Yamnaya ancestry is taken to mark the first spread of
Indo-European to Europe, then that (limited) ancestry in
southern Europe today has to go back to an incursion there
significantly before historical times. It must have come either
directly out of the Steppe (but more weakly than in the north),
or indirectly out of northern Europe as a relatively short-term
staging post, either during the Corded Ware period too, or soon
thereafter.
Such would be the different predictions of the competing
Indo-European hypotheses. To assess which best fits, we can
but wait with bated breath for the aDNA record to be filled out
for the relevant time-periods in southern Europe, from the
Bronze Age to the present. From a cultural perspective, at least,
it would entail significant re-interpretation of the
archaeological record to presume, in Bronze Age Europe, an
impact from north to south so powerful as to oust native
cultures and languages far across the heartlands of
Mediterranean civilisation that gave rise to Greece and Rome,
for example.
It is worth noting that in the supplementary material to
Haak et al. (2015: si11, 135-140), section 11 on the “Relevance of
ancient DNA to the problem of Indo-European language
dispersals” is rather more balanced and non-committal than the
title and abstract of the paper itself. An alternative way of
viewing the data, indeed, is that in the very act of confirming
significant impact out of the Steppe into Corded Ware Europe,
Haak et al. (2015) simultaneously confirm how relatively weak
that impact was in other Indo-European-speaking regions of
Europe, particularly the south, where first farmer genes remain
dominant to this day.

2.3 All or Part? A Primary or Secondary Expansion?


How might the farming hypothesis itself, then,
incorporate the latest genetic data within its wider, deeper-time
scope? That a post-Neolithic Steppe → Corded Ware
incursion may accord well with one of the predictions of the
Steppe hypothesis hardly precludes it from also fitting as a
secondary expansion within the farming hypothesis. Indeed,
the nature of the Indo-European family also requires
explanations for the major branches or sub-lineages within it,
their own respective ‘sub-expansions’, and the patterns of
linguistic characteristics that some of them share with each
other. Within the longer chronology of the farming hypothesis,
a Steppe incursion into parts of Europe at c. 4500 BP, long after
farming had first spread out of the Near East, would have to
correspond not to Indo-European as a whole, but to one or
more of its main sub-branches.
To work out which, Haak et al.’s (2015) figure 3, presented
only with geographical labels, needs to be seen also in terms of
linguistic (sub-)lineages, to identify which have the highest and
lowest proportions of Steppe input. (Necessarily, this entails
focusing on the modern populations, those whose linguistic
affiliation we can be sure of.) As already noted, Steppe ancestry
is high only in populations that speak some, not all, branches of
Indo-European in Europe: Baltic, Slavic and Germanic.
Obviously, the Corded Ware time-depth of 4500 BP long
predates the separate individual expansions of each branch.
That of Germanic, out of its presumed homeland in southern
Scandinavia, is usually set at c. 2500 BP. Likewise, at 4500 BP the
expansion that would give rise to the Slavic family proper still
lay far in the future. Balto-Slavic is a wider lineage, however,
and 4500 BP or soon thereafter would fit well with estimates for
the divergence between Baltic and Slavic within a farming
hypothesis chronology. An initially more easterly origin for the
lineage deeply ancestral to Balto-Slavic might also explain the
occasional linguistic characteristics that it has been argued to
hold in common with Indo-Iranic, and that set it apart from
most or all other branches in Europe.

2.4 Not Just Indo-European: Uralic


It is striking, too, that these three sub-lineages of Baltic,
Slavic and Germanic by no means exhaust the list of European
populations high in Steppe ancestry. Some of the very highest
proportions are found in populations that speak languages not
Indo-European at all, but Uralic: Finnish, Estonian and Saami.
So if these new genetic data are to explain the presence in
Europe of (some branches of) Indo-European, then that same
logic holds just as validly for Uralic-speaking populations that
have similar or even higher Steppe ancestry. The population
movement from the Steppe into north-eastern Europe may
therefore equally well reflect the first incursion here of
members of the Uralic family too. Chronologically, 4500 BP falls
within the range of estimates for when Uralic’s own divergence
began (Kallio 2006), and this movement into Europe would have
been precisely what ensured divergence from those speakers of
Uralic who remained further east. (Amongst those were the
linguistic ancestors of Hungarian, who arrived in Europe just
before AD 900.) This would thus set Proto-Uralic on or near the
Steppe, and in contact there with some already ‘eastern’ form of
Indo-European, plausibly explaining the presence and
particular nature of Indo-European loanwords into early Uralic.
It is perhaps not surprising, although rather convenient,
that Haak et al. (2015) do not mention linguistic affiliations
within their Figure 3. For when one does take them into
account, the data once more emerge as anything but clear-cut
support for the Steppe hypothesis. Several key Indo-European
lineages are spoken by populations that do not show high
proportions of Steppe ancestry, whereas that is found in all
Uralic-speaking populations in Europe.

2.5 Back to Why: Secondary Phases within Farming Expansions

Haak et al.’s aDNA provides pointers as to what happened,
and helps pinpoint when and where: a major population
movement from the Yamnaya Steppe region into Corded Ware
northern Europe. But what real processes in prehistory could
explain how and why this movement should have happened at
all? What would have given some such population both the
incentive and wherewithal to embark upon such an expansion,
with such far-reaching effects? On this question, the main news
for both sides in the debate is the sheer scale of that
demographic impact: two thirds to three quarters of the
ancestry of much of northern Europe (at least according to the
model of Haak et al. 2015, although see also §3.1). Where did
this demographic advantage stem from? In particular, given the
relationship between demography and the productive success
of a subsistence regime, how does this fit with the contrasting
theories of Indo-European origins that set horse-based
pastoralism against farming?
The general farming/language hypothesis in fact comes
with a string of qualifications, limitations and refinements
(Heggarty & Beresford-Jones 2010, 2014). Amongst these is that
in any given part of the world, the first impact of farming can
be given further impetus, and even be partly overwritten, by
later, ‘secondary’ expansions. These can also be conceived of as
(re-)expansions of farming, but with new components not
present in the initial spread. New plant and animal species, or
new farming technologies, can bring about a step-change in
farming intensification, and thereby also in demographic
potential. Moreover, the relative success of one form of farming
(or any other subsistence mode) over another is heavily
contingent on environmental context. (Note the cautionary tale
of the mediaeval Norse in Greenland: farming settlements that
collapsed and vanished, in the very same context where hunter-
gatherer speakers of Eskimo-Aleut enjoyed expansive linguistic
‘success’.) A particular technological package can open up a
previously under-exploited environmental niche, and release
further demographic potential.
The main cases worldwide where a farming-language
dispersal is most widely accepted repeatedly illustrate this
possibility of secondary expansions in such contexts. Farming
did not develop independently on Taiwan, but spread there
with a population presumed to have spoken a language
ancestral to Austronesian. Later, farming and just one form of
Austronesian spread together out of Taiwan, into and
throughout Island South-East Asia. In linguistic, phylogenetic
terms this was technically but a secondary expansion, of the
Malayo-Polynesian branch alone. A millennium or so later, just
one sub-lineage of Malayo-Polynesian (the Oceanic languages)
spread in another (‘tertiary’) phase, through Central and
Eastern Polynesia. The explanation for both expansion stages
was not just the demographic potential of farming, but most
likely also significant advances in technology (in this case, for
seafaring: the outrigger, and then the double-hulled canoe?). In
Africa, similarly, Bantu spread with farming, but owed its
expansive success also to iron-working technology (not least
for agricultural tools). Linguistically too, Bantu represents a
secondary spread within the wider Niger-Congo family. The
initial dispersal of Sino-Tibetan has likewise been widely
attributed to farming, but much of it was later overwritten by a
huge secondary expansion, when Sinitic (‘Chinese’) seems to
have crossed some further demographic threshold: an
intensification of farming productivity, but one based on major
cultural and socio-political developments (Heggarty &
Beresford-Jones 2014).
In fact, for Indo-European too, similar proposals have
already been mooted, as in Sherratt & Sherratt’s (1988) appeal
to a “secondary products revolution”, even if the specific
scenario they suggested has not enjoyed any great support. The
latest genetic findings now call for a reconsideration of
hypotheses on Indo-European origins, with an eye to the
precedents in other great language families for both a primary
spread with farming and a secondary intensification and
expansion phase. This can even bring together certain key
insights of the competing Indo-European hypotheses, rather
than necessarily opposing them.

2.6 Secondary: How Pastoralism Relates to Farming


The farming hypothesis for Indo-European has generally
stressed the importance of a primary demographic ‘wave of
advance’ out of the Near East. As if in deliberate opposition to
that fundamentally demographic logic, the Steppe hypothesis
has generally been couched in terms of elite dominance and
language shift, and pointedly downplayed mass population
movements (see Heggarty, this volume: §3, §6.2). For both
hypotheses, then, but in very different ways, there is something
of an irony that Haak et al. (2015) uncover an unexpectedly
powerful demographic signal accompanying the Steppe →
Corded Ware movement. For it implies that here too, behind
any associated language expansion was a basically demographic
logic, and thus presumably some process of subsistence
intensification. (For one other possibility, epidemic disease, see
also Heggarty, this volume: §6.3.)
But in what sense could Steppe pastoralism be seen as a
development secondary to farming, or even an intensification of
it? In popular perception (and still in much archaeological
thinking when the Steppe hypothesis was first proposed),
pastoralism and crop farming tend to be set against each other
as radically distinct modes of subsistence, complete with
different implications for culture and society, not least
nomadism vs. sedentism. But this is a widespread
misconception of how farming and pastoralism actually relate
to each other, especially in their origins.
So to see how the two hypotheses for Indo-European
might engage with each other, we must first stress the
clarification outlined in Heggarty (this volume: §1) on the
theme of this special issue, and specifically on the Neolithic
‘Revolution’ — or rather, the extended and gradual ‘transition’
to farming. What really defines this fundamental
transformation in human prehistory is the shift from just
procuring one’s food (by hunting and gathering) to actively
producing and managing it, whether in the form of crops,
livestock, or both together. Indeed, the main cereal crops and
livestock animals were domesticated roughly contemporaneously
across the same broad region of the Fertile Crescent.
Here, farming or agriculture (taking those synonymously,
irrespective of etymology) emerged as the production of food in
both plant and animal form. At this early stage, many animals
were reared mostly for their meat, rather than for secondary
products like milk or wool.
In this context, the term that best captures the nature of
pastoralism and how it arose is specialisation (e.g. Khazanov
1984). Pastoralism generally emerged not ex novo by a separate
domestication of animals alone, but out of farming, as a
specialisation of it away from crops, as livestock provided an
ever greater share of the diet. So chronologically too,
pastoralism is secondary to — i.e. arose later than — agriculture
in general. Farming had begun as a combination of both plant
and animal domesticates; only thereafter did the specialisation
towards livestock arise. Moreover, to recall the importance of
the environmental context for any subsistence mode, this
specialisation was most likely in those ecologies where
pastoralism was most viable and advantageous with respect to
mixed crop and livestock farming. The vast grasslands of the
Steppe offer just such an environment. (As pastoralism itself
spread, sometimes beyond where farming had yet reached, in
those areas it could constitute the first known form of food
production. That is no evidence that it had developed there ex
novo, however, rather than just arriving from elsewhere, having
first developed out of farming.)
The Indo-European debate has all too rarely asked the
question of where the pastoralism that emerged on the Steppes
came from in the first place. Where did its ultimate origins lie?
The earliest archaeological dates for farming across different
regions piece together into maps (e.g. Balaresque et al. 2010:
Fig. 1A, Bellwood 2005: Fig. 4.1, Broushaki et al. 2016: Fig. 1)
that show it spreading, over several millennia, in multiple
directions out of its origins in the Fertile Crescent. This
included farming first reaching the Steppe. Whether it did so
primarily or uniquely via either the Balkans or Caucasus
remains debated, although Haak et al.’s (2015) latest genetic
data incline to the latter (see §2.8 below). In either case, the
ultimate origin is not in doubt: by whichever route(s), it was
out of the Fertile Crescent that food production first arrived on
the (Pontic) Steppe.
Once it reached that ecological zone, so well suited to
grazing, it is hardly remarkable that farming gradually
specialised towards pastoralism, and proved so successful there,
especially with the domestication of the horse. Ultimately, the
specialisation went so far that many Steppe populations became
almost purely pastoralist — although that was not yet entirely
the case for the region and time-depth of the Steppe → Corded
Ware movement. As Anthony (2013: 16) puts it: “Yamnaya
herding communities west of the Don River … were
occasionally tethered to small fortified settlements where some
agriculture has been found. It was probably these western
Yamnaya communities that migrated into the Danube valley
and central Europe.”

2.7 Subsistence in North-Eastern Europe: Niches and Intensification?

That description of a subsistence regime also fits the
Corded Ware culture of northern Europe, for although
traditionally presented as predominantly pastoralist, recent
findings report crop farming and settlements too. Haak et al.’s
(2015) genetic results in effect suggest that around 4500 BP this
part-crop, (large) part-pastoralist economy had become highly
successful in demographic terms, and not just in the Steppe
ecology but also in the steppe-like Great Hungarian Plain, and
across the North European Plain. Nor is it any great surprise
that this net impact came in the cooler climes of north-central
and north-eastern Europe, where early farming had hardly been
uniquely successful and dominant in any case, certainly far less
so than in the Mediterranean. Recalling one of the key
qualifications to the farming/language dispersal logic, relative
demographic success depends very much on the species and
technologies that make up any particular subsistence package,
and their suitability or otherwise to local ecology. In the north-
easternmost regions of Europe, farming was in fact not so well
established until the Corded Ware period itself (Balaresque et
al. 2010: Fig. 1A, Bellwood 2005: Fig. 4.1, Broushaki et al. 2016:
Fig. 1). Recent aDNA and isotope analyses by Bollongino et al.
(2013) document that in north-central Europe, the first farmers
had by no means replaced hunter-gatherer communities; rather,
the two continued to co-exist for over two millennia.
Estimating population densities in prehistory is a
notoriously difficult and imprecise art, but these multiple lines
of evidence together suggest that first farmers were by no
means especially numerous across northern Europe, and
considerably less densely settled than in Mediterranean Europe.
This offers a plausible explanation for how, into this context in
northern Europe, migrants intruding from the Steppe could
have achieved considerable demographic success relative to the
existing population, around 4500 BP. The incomers’ mixed but
predominantly pastoral subsistence package, supported notably
by the domesticated horse, was new to this region, and thus
perhaps well placed to profit from environmental contexts
hitherto under-exploited here. Along with Bollongino et
al.’s (2013) findings on farmers and foragers, the archaeology of
the Globular Amphorae and Corded Ware cultures alongside
each other has been taken to attest to distinct farmer and
pastoralist communities co-existing for centuries, too.
In short, the arrival of a heavily (but not exclusively)
pastoralist subsistence regime can indeed be seen as a form of
intensification, with significant demographic impact, in these
parts of northern Europe. More widely, though, that still defines
it as a secondary expansion within the overall spread of food
production, ultimately out of the Fertile Crescent.

2.8 Yamnaya: Genetic and Linguistic Lineages


In what sense could this secondary relationship be
reflected on the genetic level, though? Who were these
Yamnaya Steppe populations, and what were their own origins?
We have already seen how Jones et al. (2015) reconfirm Haak et
al.’s (2015: 207) findings that they “descended not only from the
preceding eastern European hunter-gatherers [EHG], but also
from a population of Near Eastern ancestry”. This Near Eastern
CHG/EFC component is, however, highly distinct from the one that
spread westwards into Europe (EEF), hence the presumption
that the CHG/EFC component spread to the Steppe via the
Caucasus, not the Balkans. On reaching the Steppe, this
incoming Near Eastern component encountered an original
local EHG (“eastern European hunter-gatherer”) ancestry, and
admixed with it to produce the Yamnaya population. The
chronology also fits with the archaeological date-maps for
when farming first reached the Steppe, out of the Near East,
and the route via the Caucasus suggests a correspondence with
the early farming culture of Maikop, on the plains to the north
of the Caucasus mountain ranges.
Linguistically, too, who were these people, and what
would a ‘secondary’ expansion mean? Within the farming
hypothesis, the Steppe → Corded Ware migration would
correspond not to all Indo-European in Europe, but only to a
subset of it. The family’s ultimate homeland would be where
farming began, and its expansion from there would remain the
first, primary movement responsible for the initial dispersal of
Indo-European. This would include some branch(es) spreading
into southern Europe, not least those ancestral to Greek and
other early Balkan lineages, plausibly also Italic and perhaps
Celtic. Another spur, meanwhile, would be the one that
ultimately took farming to the Steppe. As just noted, for this
spur Haak et al.’s (2015) data support a route not via the
Balkans but via the Caucasus. This does not accord, then, with
Renfrew’s (1987: ch. 8) “Hypothesis B” of a single major
expansion route, first through the Balkans and then with a
(Balto-Slavic and Indo-Iranic) branch looping back eastwards
via the Steppe. Rather, a Caucasus route is more in line with the
original “Hypothesis A” formulation, in which Indo-European
spreads both westwards and eastwards out of the northern arc
of the Fertile Crescent.

2.9 A New ‘Hypothesis A2’ of Indo-European Origins


Still, that Hypothesis A now needs explicitly to add a
particular secondary movement, to give what might thus be
dubbed the ‘A2’ scenario. As well as the main westward and
eastward movements originally proposed, this new scenario has
an additional spur that spread northwards, just as farming did
(Balaresque et al. 2010: Fig. 1A, Bellwood 2005: Fig. 4.1), via the
Caucasus. This took it first through the ‘Armenian highlands’
of north-eastern Anatolia (Armenian being a primary branch of
Indo-European, of course) and the nearby region home to the
CHG samples. (Geographically, this recalls parts of a third, ‘Near
Eastern hypothesis’ of Indo-European origins, but set into a
more recent time-frame, after the spread of farming: see
Gamkrelidze & Ivanov 1995.) Linguistically, hypothesis A2
needs to presume that these Near Eastern farming populations
effectively skirted around the most mountainous
parts of the Caucasus, to explain why those were left to the
three small, non-Indo-European language families still spoken
there to this day. The archaeological and genetic data do seem
to offer clear indications that such a route was indeed taken,
and by enough people to make a significant impact on the
Steppe, both culturally and genetically.
The A2 spur would thus have brought its form of Indo-
European speech north onto the Steppe, where the farming
practised by its speakers in time specialised into a primarily
pastoralist subsistence package. Once fully developed, thanks
not least to the domesticated horse and wheel technology, this
successful new subsistence regime facilitated the later,
secondary expansion of the A2 form(s) of Indo-European into
the Corded Ware region of northern Europe. That is, the speakers
of this A2 spur, the first Indo-European speakers on the Steppe,
would have spoken sub-lineage(s) of Indo-European deeply ancestral to
those whose speaker populations today register the highest
proportions of Steppe ancestry: above all, Balto-Slavic.
Not that Indo-European speakers are alone in showing
such high Steppe input (§2.4), just as the first incoming farmers
(ultimately from the Near East) were not the only population on
the Steppe. Rather, on arriving here they encountered a local
“eastern European hunter-gatherer” (EHG) population, with
reference samples as far east as the Urals. These two
components came near to balance in the admixed genetic
profile of the Yamnaya Steppe populations (and archaeological
data support a continuing major role in the diet for fishing).
What did the local EHG groups originally speak? The obvious
candidate would be the other language lineage spoken by
populations with extremely high proportions of Steppe ancestry
in Haak et al.’s (2015) figure 3: Uralic.
So to paraphrase Haak et al. (2015), we have here precisely
“a theory of a steppe origin of some of the Indo-European
languages of Europe” (and of the Uralic ones too), that is
compatible with their aDNA data. Only … it is not the Steppe
hypothesis at all, but the A2 variant of the farming hypothesis.

2.10 What Place for Germanic and Celtic?


Within Indo-European, meanwhile, alongside Baltic and
Slavic the other branch with high Steppe ancestry is Germanic.
A prima facie interpretation would thus be that the sub-lineage
ancestral to Germanic also reached northern Europe not
directly out of Anatolia through the Balkans, but as part of the
A2 spur and then the Steppe → Corded Ware movement. Again,
however, the details are less clear-cut. Of the Germanic-
speaking populations in Haak et al.’s (2015) figure 3, none
(curiously) are from continental Europe. Rather, in order of
declining Steppe ancestry, they are: Norwegian, Icelandic,
Scottish, English and Orcadian. That is, all are either from
Scandinavia itself, or from regions heavily settled from there.
Indeed the highest Steppe ancestry rating of all is for
Norwegians, followed by populations from the Baltic coast
facing Scandinavia: the (Baltic-speaking) Lithuanians and
(Uralic-speaking) Estonians.
On standard thinking, the expansion of Germanic itself
began only c. 2500 BP, out of a homeland then limited to
southern mainland Scandinavia. Until then, Germanic proper
would not have been spoken elsewhere. Germanic expansion
through the rest of Scandinavia was, at least in part, into
regions previously Uralic-speaking. In continental northern
Europe, meanwhile, it was only as recently as the mediaeval
Ostsiedlung that eastern parts of modern Germany, and former
German-speaking regions further east, became Germanic-
speaking, having previously spoken Slavic (as does the Sorb
minority to this day) and Baltic (e.g. Old Prussian). Historical
records, and sources such as toponymy and surnames, suggest
that a good part of the spread of Germanic around the Baltic
proceeded also by local populations switching to Germanic,
having hitherto spoken Baltic, Slavic and even Uralic languages.
An alternative interpretation is possible, then, of the high
Steppe ancestry in modern Germanic-speaking populations.
The Indo-European lineage ancestral to Germanic might have
been not itself part of the incoming Steppe → Corded Ware
movement, but the speech of a population already established
in southern Scandinavia. These would have descended in part
from early farmers, but were doubtless also in part
foragers/fishers (adapting to the best subsistence balance
‘on offer’ from the local ecology in these cooler regions). From
c. 2500 BP onwards, the Germanic language lineage spread
extensively into regions speaking Slavic, Baltic and Uralic, in
part by local populations shifting language to Germanic, but
retaining their steppe ancestry signal. That shift is at least part
of the genetic story, either way, as evident from the eastward
German-speaking expansion in historical times.
A linguistic perspective on this alternative scenario would
note that Germanic does not share in the putative ‘eastern’
Indo-European features (such as the ‘ruki’ rule) found in Balto-
Slavic and Indo-Iranic. Indeed, the position of Germanic within
the Indo-European family tree is disputed and problematic, not
least for being in some senses intermediate between Balto-
Slavic, Italic and Celtic (Ringe et al. 2002: esp. 85-92). Germanic
even betrays characteristics that seem otherwise foreign to all
Indo-European (although see Salmons, this volume). In short,
the prehistory of the Germanic branch continues to pose a
puzzle. The linguistic data do not yet seem clear enough, and
the new genetic data still not complete enough, to resolve it.
They do, though, pinpoint a new, specific question within that
puzzle: was the Indo-European sub-lineage ancestral to
Germanic part of the Steppe → Corded Ware migration, or was
it not?
A similar question arises for the Celtic branch. In Haak et
al.’s (2015: Fig. 3) populations with high steppe ancestry today,
‘Scottish’ and ‘Orcadian’ may in part reflect populations once
linguistically Celtic, although by no means exclusively
(alongside Norse, Pictish and Anglo-Saxon components). More
informative are Cassidy et al.’s (2016) even more recent results
from three Bronze Age samples from Rathlin Island off
Northern Ireland (in the North Channel to Scotland). These
show only 6-13% of the CHG/EFC component that makes up
about 40% of the Yamnaya, and 20% of the Corded Ware
samples. For the authors this somewhat tentatively “invites the
possibility of accompanying introduction of Indo-European,
perhaps early Celtic, language” (Cassidy et al. 2016: 368). But
the proportion of Steppe ancestry, although “substantial”, is still
a clear minority (at most a third; see §3.1 below), so it
hardly settles the issue of whether this would have been
enough to effect language spread. The samples date from 4000
to 3500 BP, earlier than the traditional chronology for when
Celtic reached Britain and Ireland, and Brythonic/Goidelic
divergence began. The one (late) Neolithic Irish sample
reported is predominantly first farmer, meanwhile.
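As a rough arithmetic check on that upper bound, consider a simple two-source dilution model. This is an assumption introduced here for illustration, not necessarily the method of Cassidy et al. (2016) or of §3.1: suppose that all of the CHG/EFC component in the Rathlin samples arrived through a Yamnaya-like source carrying about 40% CHG/EFC, and that the other (first farmer) source contributed none. Writing c_R for the CHG/EFC share in the Rathlin samples, c_Y for that share in Yamnaya, and s for the implied Steppe share (symbols chosen here, not taken from the source):

```latex
s \approx \frac{c_R}{c_Y}, \qquad
\frac{0.06}{0.40} = 0.15, \qquad
\frac{0.13}{0.40} = 0.325 \approx \tfrac{1}{3}
```

On these assumptions the implied Steppe share would fall roughly between 15% and a third, in line with the ‘at most a third’ figure above; any CHG/EFC contribution from other sources would lower the estimate further.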
In sum, Germanic and Celtic well illustrate the two main
levels on which a series of outstanding questions remain.
Firstly, how do genetic data fit with the (disputed) branch
structure of Indo-European? And secondly, exactly how
significant was the demographic impact of the Yamnaya →
Corded Ware movement?

2.11 Genes, Migrations, and the Structure of the Indo-European Family Tree

In judging whether the Celtic branch should be associated
with the Irish Bronze Age samples, for instance, and whether
the limited Steppe input in them is enough to take that
association ultimately all the way back to Yamnaya, it makes a
difference whether Celtic does or does not form a clade with
Italic. If it does, then assessing the Celtic question needs to
consider the degree of any Steppe input into putative Italic-
speaking populations too. If not, then the origins of each branch
can be considered much more independently of each other.
Anthony & Ringe (2015: Fig. 2) take a very strong
interpretation of a sequence of migrations, each of which
corresponds to an individual branch in one putative tree
structure for Indo-European. Given the scale of uncertainty and
disagreement about the higher-order branching of Indo-
European, however, it seems wise to remain open to a range of
possibilities, with an eye to the scenarios more plausible given
the genetic and archaeological data too. Anthony & Ringe’s tree
is in fact partly in conflict even with Ringe et al. (2002: Fig. 7)
and Nakhleh et al. (2005: Fig. 12), for Greco-Armenian,
Germanic and Albanian. And within their tree, the last node to
break up is one that unites Balto-Slavic and Germanic with
Indo-Iranic. That is controversial enough linguistically, and
curious also in the light of the genetic data, for uniting
populations with, respectively, the highest Steppe ancestry and
almost none. There is no wide agreement on any of these
specific tree configurations for Indo-European, in any case;
many alternative proposals have Indo-Iranic branching off
much earlier.
The place of Tocharian is another perennial conundrum,
to which aDNA emerging from the central and eastern Steppe
adds an intriguing new perspective. Allentoft et al. (2015: 169,
171) report aDNA from “the enigmatic Sintashta culture near the
Urals” that actually includes a (limited) “Early European
Farmer” component, i.e. one taken to have originated far to the
west, and to have leapfrogged eastwards over the Yamnaya
samples, which lack it. They thus suggest another migration, all
the way from Europe to the eastern Steppe, and associated
there with Tocharian. Ever since the surprise of its discovery,
and of its ‘western’ centum rather than ‘eastern’ satem status,
Tocharian has always appeared so far out on a limb, both
geographically and linguistically, as to call for some long-range
migration in any case, which opens up a range of possible origin
points for the (western) first farmer component in Sintashta.
Allentoft et al. (2015) also adopt the presumption of the
Steppe hypothesis, but if anything that sits uncomfortably with
their own data, and with their interpretation that eastern Steppe cultures
and genes represent Tocharian-speakers. For the Steppe
hypothesis has traditionally looked to these cultures as the
source of the Indo-Iranic branch, and that branch bears no close
relationship to Tocharian within the Indo-European tree,
despite their common status as the family’s easternmost
branches. Nor does there seem to be any close relationship in
genetic terms between the aDNA of these (presumed) speakers
of Tocharian and modern Indo-Iranic populations (see §3). This
fits more plausibly with these two branches having followed
two very different routes eastwards out of the Indo-European
homeland. Put simply, Tocharian would have taken a route
north of the Caspian and Himalayas (i.e. via the Steppe), and
Indo-Iranic to the south — as per hypothesis A2.

2.12 Just How Big a Steppe Impact? Just How Widespread?


The big novelty in the ancient DNA results from Europe is
that they point to a second demographic wave in the Bronze
Age more significant than hitherto imagined by supporters of
all hypotheses. Not surprisingly, Haak et al. (2015) are keen to
stress just how significant this second wave was. As we have
seen, though, the picture they present, apparently so clear, in
fact turns out to be much more complex, less clear-cut and more
ambivalent between the two hypotheses. The high proportional
impact refers above all to just the Corded Ware samples. So far
these are relatively few, and it is not entirely clear how
representative they are. There is significantly less Steppe
impact in other nearby and roughly contemporaneous
populations, including those associated with the Globular
Amphora culture. Nor do all genetic data give the same
signals, or with the same strength. In mitochondrial DNA (passed
down only in the female line), farming remains clearly the
bigger impact demographically (Brandt et al. 2013). Above all,
when Allentoft et al. (2015: Figs. 2b, 3b) report a collection of
samples overlapping significantly with those of Haak et al.
(2015: Fig. 2b), and conduct the same ADMIXTURE analysis of
ancestry components, at the same K value of 16, they
nonetheless find some quite different results, not least a
generally lower measure of Yamnaya impact on northern
Europe. In short, we are still rather far from the full coverage of
all regions in all periods that is needed to provide greater clarity
on even just the genetic story of the population history of
Europe, let alone all other regions central to the Indo-European
question.

3. Update: From the Fertile Crescent to India


Indeed, to go beyond Europe and update the Indo-
European question with the very latest aDNA findings, an
addendum section is now needed here. Just as this paper was to
be submitted in final form, a batch of three articles appeared:
Lazaridis et al. (2016), Broushaki et al. (2016) and Gallego-
Llorente et al. (2016). Together they present new aDNA results
from the broad region of the Fertile Crescent, particularly its
eastern arm in what is today western Iran, and across a wide
time-transect. The papers concentrate heavily on the genetic
data, and limit discussion of language lineages to just one or
two short paragraphs. Nonetheless, the data and analyses do
bear directly on Indo-European origins, especially as regards
our theme here, of farming (vis-à-vis Steppe pastoralism).
Firstly, the new papers all report that at the time when
farming first arose, populations in different parts of the Fertile
Crescent still retained markedly distinct genetic signatures,
reflecting minimal interaction since their ancestries had first
diverged, many millennia earlier. This will come as little
surprise to archaeologists, many of whom have long argued
that farming here had not a single origin, but came together out
of several components from different, smaller source areas
within the broad arc of the Fertile Crescent. The latest aDNA
data seem highly compatible with that scenario — which
already has significant implications for the farming hypothesis
for Indo-European, as we shall see in §3.3 below.
Secondly, the new aDNA data feed into further analyses of
what role past populations, both south of the Caucasus and
north of it on the Steppe, may have had in shaping the genetic
make-up of modern populations not just in Iran but further east
all the way to India, i.e. across the full span of the Indo-Iranic
branch of Indo-European. Not that there are yet any ancient DNA
results to report from India itself; these continue to elude
us. Much still has to be left to modelling, which opens up scope
for different starting assumptions and selections of candidate
source populations (and how many to include). This is how,
even from largely similar aDNA data, these most recent papers
come to some contrasting calculations and interpretations on
the issues central to the Indo-European question in the east.
A first basic result is the identification of which modern
populations are genetically closest to the ancient DNA of the
eastern Fertile Crescent. As Broushaki et al. (2016: 3) put it:
“Early Neolithic genomes from the Zagros region of Iran
(eastern Fertile Crescent) … show affinities to modern day
Pakistani and Afghan populations”, before adding “but
particularly to Iranian Zoroastrians”. More generally, they find
“a strong Neolithic component in [many] modern South Asian
populations”. From this they take it as “probable that the
Zagros region was the source of an eastern expansion of the
South-West Asian domestic plant and animal economy”, and
that there was “a strong demic component to this expansion”.
In linguistic terms, then, modern speakers of Iranic
languages turn out to be genetically close to ancient first farmer
populations of the eastern (but not the western) arm of the
Fertile Crescent. (And Zoroastrianism, since the authors
highlight it, was inseparably associated with the Avesta, its
primary corpus of religious texts, written in the earliest attested
Iranic language, ‘Avestan’.) Indic speakers generally also have a
significant proportion of the same ancestry.
Broushaki et al. (2016) do recognise another ancestry
source, namely the Steppe: “modern Middle Easterners and
South Asians appear to possess mixed ancestry from ancient
Iranian and Steppe populations”. Nonetheless, they immediately
clarify that within that mix, the Steppe input is the minor and
less fundamental one, compared to that from the eastern Fertile
Crescent, which they see as primary. “However, Steppe-related
ancestry may also have been acquired indirectly from other
sources and it is not clear if this is sufficient to explain the
spread of Indo-European languages from a hypothesized Steppe
homeland to the region where Indo-Iranian languages are
spoken today. On the other hand, the affinities of Zagros
Neolithic individuals to modern populations of Pakistan,
Afghanistan, Iran, and India [are] consistent with a spread of
Indo-Iranian languages … from the Zagros into southern Asia,
in association with farming.”
At least for how Indo-Iranic spread eastwards, then, this
follows Renfrew’s original Hypothesis A: together with
farming, by a route south of the Caucasus and the Caspian Sea,
through Persia and to the Indus. (In other words, these new
aDNA data reinforce and expand upon the suggestion already
advanced by Jones et al. 2015.) The farming hypothesis would
still have to address the deep genetic contrast between the first
farmers of the Eastern Fertile Crescent and those of western
Anatolia and the Levant, of course. Indeed that contrast
underlines how the popular name of the “Anatolian” hypothesis
is unhelpfully vague, given that it was always intended to refer
to the Central and Eastern areas of Anatolia, where the Fertile
Crescent arcs through. Meanwhile, Broushaki et al. (2016) also
toy with the claim that even the Dravidian family might
have taken this route, a yet more controversial aspect of the
most sweeping visions of the farming/language dispersal
hypothesis (Bellwood 2005).
Another of the new papers, however, comes to a very
different interpretation. Lazaridis et al. (2016), from the Reich
group at Harvard, touch on languages only briefly, in a single
paragraph. And as in Haak et al. (2015), they again incline to a
presentation generally favourable to the Steppe hypothesis,
writing of “significant” and “substantial” Steppe impact, as if
roughly in balance with the impact from the eastern Fertile
Crescent. So although there is consensus that both sources had
at least some input to the population histories of the Middle
East and South Asia, there is no agreement on their respective
proportions, nor on their timing. Those are precisely the issues
on which the Indo-European question hangs, however, so we
now take each in turn, to see in a little more detail how the
different interpretations arise.

3.1 Sources of Inputs: Steppe and/or Fertile Crescent


To start from another point of general agreement, and
recap from §1.2 above, the Yamnaya Steppe population was
itself essentially a combination of two main genetic
components, as shown in all the main published admixture
analyses. (Unfortunately, the same components are assigned
different colours from one paper to the next.) In pre-farming
times, i.e. among hunter-gatherers, one of these two
components had been dominant in northern Eurasia, the other
in the Caucasus. In line with the first clear reference samples,
these components were first dubbed EHG and CHG respectively.
The latter is now confirmed as dominant also immediately
south of the Caucasus, in the samples from Neolithic Iran, alias
the eastern Fertile Crescent. As noted above, then, for our
specific purposes here, to contrast hypotheses on Indo-
European origins, it is more relevant to refer to this as an
eastern Fertile Crescent (‘EFC’) component.
The standard interpretation of the admixture in the
Yamnaya Steppe population is that it had come about when the
component from the eastern Fertile Crescent spread (via the
Caucasus) onto the Steppe, to admix into the north Eurasian
(EHG) component hitherto all but exclusive there. By the Early
Bronze Age, Yamnaya aDNA had thus acquired about 40%
eastern Fertile Crescent ancestry, with the remaining 60% still
north Eurasian.
These are the two components that both appear strongly
in aDNA samples from the Corded Ware culture in north-east
Europe. Not only that, but they resurface in proportions relative
to each other that closely reproduce their ratio on the Steppe:
only slightly less eastern Fertile Crescent than north Eurasian,
despite the latter having already been present in north-eastern
Europe. It is this that underpins the strong interpretation of a major
incursion, and indeed replacement of populations here, by
newcomers from the Steppe.
As foreshadowed in §1.2, however, in modern populations
from Iran to India, the Steppe ratios are not reproduced at all.
Whereas Yamnaya had c. 60% north Eurasian to 40% eastern
Fertile Crescent components, in the Middle East the latter,
‘local’ component remains strongly dominant over any limited
traces of the former.
Further east, particularly in India, the earlier ‘local’
component is a different one, at proportions that end up at well
over half, especially in southern, Dravidian-speaking regions.
All analyses seem to agree that the prima facie candidate for
having brought Indo-European languages into India, from
further west, is instead the ‘non-local’ remainder, generally
found at higher proportions in northern, Indo-European-
speaking India. This is the so-called ‘Ancestral North Indian’
signature — although it is best seen as intrusive to India,
through the north. The critical question for the Indo-European
debate is how much of this signature can be attributed to the
respective sources proposed by the competing hypotheses: how
much to the eastern Fertile Crescent, and how much to the
Bronze Age Steppe?
As in Iran, so too within this intrusive ‘remainder’ in
India, the eastern Fertile Crescent component is not smaller
than the north Eurasian one (as it is on the Steppe, by about
40:60), but several multiples greater, generally between 3 and 6
times greater in the admixture analyses. That is, despite it being
the main component in the Yamnaya Steppe samples, the north
Eurasian component is found only in low to very low
proportions from Iran to India. All of this lies behind Broushaki
et al.’s (2016) interpretations against a primary Steppe source,
and in favour of Indo-Iranic languages having been carried
eastwards with farming, and with people from the eastern
Fertile Crescent.
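The ratio argument can be made concrete with a deliberately simplified calculation, offered only as an illustrative sketch and not as a model used in any of the papers cited. It assumes that the intrusive remainder derives from just two sources: a Yamnaya-like one with the approximate 40:60 eastern Fertile Crescent to north Eurasian profile given above, and one that is entirely eastern Fertile Crescent. Writing f for the (hypothetical) share from the Yamnaya-like source and r for the observed ratio of eastern Fertile Crescent to north Eurasian ancestry:

```latex
% Illustrative two-source model (assumption, not from the cited papers):
%   Yamnaya-like source: 0.4 EFC + 0.6 north Eurasian; other source: 1.0 EFC.
%   f = hypothetical share of the intrusive remainder from the Yamnaya-like source.
\[
r \;=\; \frac{\text{EFC}}{\text{north Eurasian}}
  \;=\; \frac{0.4f + (1 - f)}{0.6f}
  \;=\; \frac{1 - 0.6f}{0.6f}
  \quad\Longrightarrow\quad
  f \;=\; \frac{1}{0.6\,(r + 1)}
\]
% Plugging in the 3--6 range of observed ratios, and the pure-Yamnaya case f = 1:
\[
r = 3 \;\Rightarrow\; f \approx 0.42, \qquad
r = 6 \;\Rightarrow\; f \approx 0.24, \qquad
f = 1 \;\Rightarrow\; r = \tfrac{0.4}{0.6} \approx 0.67
\]
```

On these assumptions, observed ratios of 3 to 6 would cap the Yamnaya-like share of the intrusive remainder at roughly two-fifths to one-quarter, and lower still if some north Eurasian ancestry arrived by other routes or at other times (§3.2); a remainder derived wholly from a Yamnaya-like source would instead show a ratio of only about 0.67. Real admixture modelling involves more sources and wider uncertainties, so such figures indicate orders of magnitude only.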
Lazaridis et al. (2016) work from aDNA data that are largely
the same, and agree that ‘Ancestral North Indian’ includes
components from both “early farmers of western Iran and …
people of the Bronze Age Eurasian steppe”. How is it that they
nonetheless come to a very different interpretation, of a much
more even balance between the two components, and a much
stronger role for the Bronze Age Steppe as source? As Lazaridis
et al. (2016: 9) put it: “all sampled South Asian groups are
inferred to have significant amounts of both ancestral types …
The demographic impact of steppe related populations on South
Asia was substantial”. (The phrasing “steppe related
populations” is not quite “steppe populations” proper, however,
so it is unclear quite what to make of this.) They also cite
figures for selected individual modern populations for which
their analysis gives a particularly high proportion of Steppe
origin, considerably higher than those implied by the admixture
analyses. They prominently mention the Kalash of Pakistan, yet
the Kalash are well known as “an isolated population” (Falush et al.
2016) whose genetic origins are hotly debated, and they are not safely
representative of the region (they are also culturally highly
distinctive).
A first crucial difference in presentation and interpretation
is that Lazaridis et al. (2016) focus on inputs to South Asia. They
do not mention any substantial Steppe input to most
populations in the more westerly, Iranic half of Indo-Iranic, for
which Broushaki et al. (2016) report such strong continuity with
first farmers in the Eastern Fertile Crescent. The other main
differences are in the analyses used. Lazaridis et al. (2016: 9)
switch to admixture results different from those of most previous papers
(K=11, rather than K=16), and to new modelling based now on
four selected source populations. This difference between
admixture results and modelling based on selected candidate
source populations is reminiscent of contrasting analyses of the
Steppe impact in Europe (§2.12), including the first aDNA from
Ireland (§2.10). There, Cassidy et al. (2016: 370) take three
different approaches to estimating the “proportion of Yamnaya
to Middle Neolithic ancestry in each Irish Bronze Age sample”,
and they yield significantly different results. Comparing
proportions of ancestry components in admixture gave
relatively low estimates of “14–33% Yamnaya ancestry” vs. a
heavy majority (67–86%) of Middle Neolithic ancestry. Yet
taking the Reich group’s methods as applied by Haak et al.
(2015), to estimate admixture proportions from selected
candidate source populations, gave Cassidy et al. a much higher
result for Yamnaya ancestry: 31–47%, whose lower bound
only just overlaps with the upper bound from the admixture
analysis. Similar differences now seem to emerge from
alternative analyses of the new aDNA data from India, hence the
divergent views on the relative significance of the Steppe input
to populations there.

3.2 Timing of Inputs: First Farming, Bronze Age, or Historical Times?

The other main dimension on which to judge the
plausibility of the rival hypotheses is that of chronology.
Notwithstanding the disagreements on its scale, at least some
north Eurasian component is found in modern populations even
as far as India. And if not from the Steppe in the Early Bronze
Age, or not uniquely then, when and how else could the north
Eurasian component have come south into modern populations
from Iran to India?
An interesting new perspective, not highlighted by the
latest aDNA papers, but significant for the Indo-European
question, emerges in the admixture analysis by Gallego-
Llorente et al. (2016: Fig. 1c). Here, the north Eurasian
component does in fact appear at a low proportion (12%) far
south of the Steppe, in the Satsurblia and Kotias samples from
Georgia, which date to before even the Neolithic, let alone any
Bronze Age Steppe expansion. This in principle diminishes the
role that a later Steppe source would need to play, to account
for low proportions of this component this far south. More
clarity is needed here, however, for the same component is not
found significantly in the few samples of early farmers from
south of the Caucasus available thus far.
There is in any case a more obvious timeframe for the
main southward incursion of any genetic input from the
Steppe: not before the Yamnaya Early Bronze Age, but long
after. From around AD 1000, the
history of Iran, India and all lands in between is one of
invasions and conquest from the Central Asian Steppe, followed
by centuries of rule by their ‘Turko-Mongol’ dynasties, not least
the Timurid and Mughal Empires. This necessarily brought at
least some genetic input from the Steppe. It was far too late to
have brought Indo-European languages, however, which by this stage
had long been established as ‘indigenous’ from Iran to India. The
incoming Steppe peoples originally spoke non-Indo-European
languages, particularly of the Turkic and Mongolic families
(‘Mughal’=‘Mongol’). Ultimately, they adopted much of the
culture of Persia, and its language: another of the precedents
for how historical pastoralist invaders out of the Steppe generally
lost their non-Indo-European native languages in favour of the
Indo-European ones spoken by the majority farming
populations that they conquered and ruled (see Heggarty, this
volume).
For the farming hypothesis (A), these relatively late Steppe
incomers would be the bearers of much of the limited north
Eurasian genetic component in populations from Iran to
India. They failed to replace the local Indo-European speech,
brought much earlier by the population that spread here with
farming out of the eastern Fertile Crescent, along with the
corresponding EFC genetic component. That component is still dominant in
Iranian populations, as it was within the Ancestral North Indian (ANI) mix that entered
India when farming eventually crossed the watershed from the
Indus, into and along the Ganges valley, beginning in the fifth
millennium BP.
This chronological question recalls the similar one in
Europe (§2.2). There too, a rich historical record documents
impacts (Rome, the Great Migrations) that can be presumed to
have brought a genetic input from northern Europe — and thus
in part ultimately Yamnaya — into Mediterranean Europe. Indo-
European languages, however, had already been established
there long before. So the question is whether much earlier
movements, in the Early Bronze Age, had already had such
impacts on Mediterranean Europe as to explain Indo-European
languages there. The same question now needs to be asked for
the lands from Iran to India, too.
As for how we might tell, the solution is also similar. We
must await aDNA from a full time transect of all periods, from
before the Neolithic up to the present day, and with a much
denser geographical spread. Unfortunately, the challenge is
similar, too: conditions for the preservation of aDNA are
generally poorer in the hotter, drier, more southerly latitudes
than they are further north. As these obstacles are overcome,
and the still patchy coverage of ancient samples is
progressively filled out, that should help resolve the apparently
contradictory interpretations from broadly the same aDNA data.
But until then we are left with modelling and analyses that rely
on different assumptions and choices of putative source
populations, hence the competing interpretations. We have had
to sketch those differences in analysis only superficially here,
and summarily across various papers by different research
groups with aDNA samples that do not fully overlap. Often
the differences come down to technical questions that are in any case
to be argued out not here, but among geneticists and specialists
in computational modelling.

3.3 Genes, Languages and Culture among the First Farmers


For our purposes here, we now need instead to step back and
sum up the overall picture emerging from
ancient DNA. At the broadest level, that picture is of a basic
contrast. There is an agreed, clear signal of Yamnaya Steppe
impact westwards into Corded Ware north-eastern Europe. But
to the east, in the Indo-Iranic sphere, no such signal is seen
with remotely the same clarity or scale. That contrast in
outcomes allows no prima facie presumption that the same
source, in the same period (i.e. the Steppe in the Early Bronze
Age) could account for both.
On the other hand, the other basic finding from ancient
DNA is just how genetically distinct the first farmer populations
were in different parts of the Fertile Crescent, populations that then
carried broadly the same farming package further afield, in
multiple directions. That hardly fits well with a simplistic view
of a monolithic farming/language dispersal either.
In short, the most general lesson from ancient DNA data is
that they do not fully support either of the main
hypotheses, in the simplest and most straightforward terms in
which they are often presented. If a definitive answer to the
Indo-European question has eluded us for so long, it may be
precisely because the reality was more complex than either
‘clean’ hypothesis, neither of which is uniquely right. Certainly,
the ancient DNA revolution is beginning to make it seem so.
Genetics is not the only game in town, however. How
might its fast-changing picture best fit with the other
disciplines, into a coherent and comprehensive Indo-European
prehistory? This brings us back to the provisos in §1.1. They are
needed above all to dispel two superficial but common
confusions, which are sometimes taken to undermine more complex
hypotheses that combine different parts of the two main opposing
ones (e.g. the A2 version sketched out above, or other
possible ‘hybrid’ scenarios).
The first confusion presumes that two deeply distinct
genetic components could not plausibly be associated with the
same linguistic lineage, not even with geographically very distant
branches of it. Specifically, this logic would have it that the EEF
and EFC components (dominant in Greece and Iran respectively,
for example) could not both be associated with Indo-European,
because they diverged genetically tens of millennia before
Proto-Indo-European did, as if a genetic split date
necessarily implied the same time-depth for divergence in
linguistic lineages. The second confusion presumes, conversely,
that the EEF component cannot be associated with Indo-
European, because it is also dominant in (Mediterranean)
populations whose ancestors putatively spoke other, now
extinct languages that were possibly, probably or definitely not
Indo-European.
These objections do not stand up, however, in the face of
some genetic and linguistic facts of life. As already noted (§1.1),
genetic and linguistic lineages do not by any means necessarily
match one to one in practice. Populations of north-eastern
Europe who share near-identical genetic profiles (those highest
in Yamnaya Steppe ancestry) nonetheless speak languages of
more than one linguistic lineage, both Indo-European and not
(i.e. Uralic). Meanwhile, speakers of ‘the same’ Indo-European
language lineage correspond to populations with genetic
components that are different, or in very different proportions,
from northern Europe to southern Europe to Iran to India.
To repeat from §1.1, then, there is no expectation of any
exclusive, one-to-one association between linguistic lineages
and genetic ones. So neither the spread of farming, nor the
Yamnaya → Corded Ware incursion, need have carried just a
single language lineage. And a single language lineage like
Indo-European need not have a unique, clear-cut genetic profile
— as it clearly does not. So hypotheses do not stand or fall on
simplistic one-to-one associations between languages and
genes. On the contrary, they stand or fall on how well they
bring together all aspects of a more complex reality: those in
which linguistic and genetic lineages do match, along with
those in which they do not (see Heggarty 2014). And they stand
or fall on how plausibly those patterns of (dis)association reflect
any contrasting demographic and socio-cultural contexts visible
in the archaeological record.
It may at first sight seem surprising that such a
transformative phenomenon as farming could have arisen
among, and then been spread by, populations that genetically
were so distinct. But so it was — the evidence now seems
incontrovertible. Indeed on closer inspection of how farming
developed in the Near East, any story of a single, monolithic
cultural, linguistic and genetic entity does not seem plausible.
That has long been widespread thinking in archaeology, which
the very latest aDNA results only reaffirm. Domestication began
independently in at least five different clusters of sites
throughout the Fertile Crescent, where “the crops were
different and … cultural and agricultural developments …
differed strongly from one cluster to another” (Willcox 2013:
39). The same goes for animal domestication (Zeder 2008). The
“hilly flanks” of south-eastern Anatolia were one of these
geographical clusters, but only one, and the Zagros in Iran were
another. The rise of farming in the Fertile Crescent was a
gradual overlapping and coming-together, drawn out over a
few millennia, of different domesticated species and practices
from multiple sources.
What does this mean for how language might fit into the
puzzle? Again, it entails no expectation that farming, with its
own disparate origins, should be associated with a single
cultural or linguistic identity. (Early writing in the Near East
attests several distinct linguistic lineages from the outset:
Sumerian, Semitic, Elamite, etc.) But how could that fit with
the farming/language dispersal hypothesis, and its basic
“subsistence/demography” model of language spread? That
follows the logic that since farming can support much higher
population densities from the same land area than can hunting
and gathering, farmers should simply outpopulate hunter-
gatherers, and spread by ‘demic diffusion’. Broushaki et al.
(2016: 3) support this for the eastward spread of the eastern
Fertile Crescent component, for example. But then why would
the same component not have spread demically with farming in
all other directions, too?
The answer to the apparent contradiction is that the net
demographic advantage applies powerfully only in prototypical
contrasts where already developed farming comes up against
hunting and gathering. The first transition towards incipient
food production was so protracted, taking tens of centuries to
accumulate only piecemeal out of multiple sources, that no one
such source would have enjoyed particular, immediate
demographic advantage over the others. Between rough
demographic equals, this initial coalescence stage presumably
proceeded largely by cultural not demic diffusion — hence the
aDNA evidence of how distinct their genetic signatures
remained. And this largely cultural process of diffusion may also
have involved some exchange of language.
That is: of the various genetically distinct populations that
came to share farming, some might have come to share one of
their languages too.
At length, the multiple sources did come together into a
fuller, richer farming package. Indeed it was only this
coalescence that endowed farming with its full expansive
potential, on both the geographical and demographic levels.
The more diverse, flexible and productive the overall package
became, the more viably it could spread out across wider
geographical and ecological ranges, beyond its original clusters.
As this full demographic advantage did eventually take hold, it
benefited each of the emerging first farmer groups, not just one
of them. And just as they all carried broadly the same farming
package, irrespective of their genetic differences, any of them
that during the coalescence phase had come to share a language
would then have spread that same language too, in their
respective different directions out of the Fertile Crescent.
The Fertile Crescent arcs widely from the Mediterranean
through Anatolia to Mesopotamia, and there is no presumption
that farming spread out of just one of its initial clusters within
that span. Leading advocates of farming/language dispersals
have explicitly proposed separate spreads out of different parts
of the Near East, i.e. not just Indo-European westwards and
eastwards, but also Afro-Asiatic from the Levant into Africa.
Lazaridis et al.’s (2016) aDNA from the Levant reveals another
genetically distinct first farmer population here too. Into
Europe, meanwhile, archaeological evidence suggests a
dispersal not just overland, but also, largely independently,
through the Mediterranean. (One early cluster of domestication
was on Cyprus.) The overland route could have taken
predominantly one language lineage (Indo-European?), the sea
route one or more others, perhaps ancestral to the non-Indo-
European languages that survived into fragmented inscriptional
records around the Mediterranean.
In sum, various distinct genetic ancestries all go back to
(different parts of) the Fertile Crescent, and spread from there
in the same general directions as food production did.
Westwards, the predominance of ‘Early European Farmer’
ancestry (EEF) remains high in Mediterranean Europe to this
day. Eastwards in Iranic and Indic speakers, there is instead a
high proportion of CHG/EFC. (That component also spread
northwards to the Caucasus and onto the Steppe, admixing
there with EHG before that mix was later carried to northern
Europe.) To underline just how distinct the EEF and CHG/EFC
lineages were, the mid-range estimate for the split between
them is set by Jones et al. (2015: 2, Fig. 2b) at c. 24,000 BP, i.e. the
Last Glacial Maximum, thus hinting at what may have isolated
these populations from each other. Confidence intervals on
such analyses are very broad, however, and extend to as late as
11,400 BP, when steps towards the first domestications were
already underway as conditions changed at the start of the
Holocene. Broushaki et al. (2016), meanwhile, estimate an even
earlier time-range of 44,000–77,000 BP, although not from
exactly the same samples. Obviously, such genetic split dates
are far deeper than the linguistic divergence of Indo-European,
but it is naive to presume that the two have to match in order
for the farming hypothesis for Indo-European to hold. That
requires only a fit with the date of dispersal of farming, not that
its speakers must have been genetically homogeneous at that
time. As already observed, significant mismatches between
genetic and linguistic lineages, not least across the diverse
speakers of Indo-European, are interdisciplinary facts of life
that all hypotheses have to live with — and account for.

4. Compromise and Progress?


To close, let us sum up what the latest aDNA data imply for
the Indo-European question, specifically in the light of how
food production first arose in the Fertile Crescent and then
spread further afield — including onto the Steppe, where it
specialised into horse-based pastoralism. That summary starts,
then, with a recognition that Fertile Crescent farmers and
Yamnaya pastoralists are not polar opposites. Rather, both were
food producers, with pastoralism being largely derived out of
more general farming of both crops and animals. For Indo-
European, this means in principle a possible compromise
between parts, at least, of the logic and argumentation for both
main hypotheses. They are not mutually exclusive in all of their
aspects.
Certainly, as the ancient genetic picture is gradually filled
in and clarified, neither hypothesis now appears well supported
in its most clear-cut version: neither the original farming
hypothesis for Europe, nor the Steppe hypothesis for Indo-
Iranic. There seems no support for any monolithic match across
the dispersals of languages, genes and (agri)culture. Instead, the
most plausible overall answer to the Indo-European puzzle is
already looking more complex than either leading hypothesis in
its basic form. One such variant could be a full hybrid that
keeps to first farming as what spread the Indo-Iranic branches
eastwards, but not directly to Europe. Rather, Indo-European
would have spread from the Eastern Fertile Crescent first onto
the Steppe, and from there only later westwards to Europe.
Another variant would be the A2 hypothesis sketched out here
(§2.9).
Indeed, some such combination has the potential to
explain more, and more coherently, than either hypothesis in
its original, exclusive form. The A2 hypothesis associates Indo-
European with the spread of food production (in the widest
sense), but not exclusively so. It does link Indo-European with
expansion out of parts of the Fertile Crescent, but the farming
package had come together piecemeal, and its early dispersals
would have spread other language lineages too, in different
directions. Moreover, A2 associates Indo-European also with a
major secondary development derived out of that first spread,
the predominantly pastoralist subsistence regime on the Pontic
Steppe. This recalls the pattern found in others of the world’s
biggest language families, widely taken to have spread not just
in a single phase, but also in significant secondary phases,
typically intensifications and specialisations supported by new
technologies. In Europe, two main demographic impacts are
observed through prehistory, in the Neolithic and the Bronze
Age, and under the A2 hypothesis both brought some branches
of Indo-European. As for which were which, Greek and other
Balkan branches seem the clearest ‘first farmer’ candidates,
Balto-Slavic the clearest ‘steppe pastoralist’, but for the other
European branches the case remains less clear for now. A2 also
neatly explains the other half of the Indo-European story, by
providing a close fit with the eastern Fertile Crescent
component in Iranic and Indic populations, which is not compatible with
an origin on the Steppe. And A2 explains even more, in that it
no longer leaves unanswered the question of how and when the
Uralic lineage spread to north-eastern parts of Europe: also as
part of the Yamnaya → Corded Ware movement.
More than two centuries have elapsed since the Indo-
European conundrum was first posed, appropriately enough, by
a European in India: Sir William Jones in Calcutta in 1786. And
although it effectively marked the foundation of linguistics, that
discipline has itself still not found methods to settle the
question conclusively. Today, the prospects have never been
brighter for a resolution at last, thanks not least to huge strides
in archaeology, and now above all to the revolutions sweeping
through population genetics. As ancient DNA data emerge thick
and fast over the coming years, it is tempting to hope that this
revolution may at last be bringing the quest for Indo-European
origins into the end-game — provided that we all strive for a
broader, deeper, more coherent prehistory across our
complementary disciplines.

Since the above text was finalised, the blizzard of major
new publications in ancient DNA has continued apace. Many of
the new papers bear very directly on the Indo-European
question, so I consider them in the final update section (to
June 2018) that follows.
On the one hand, the latest findings reconfirm the pattern
of the key discovery from Haak et al. (2015): a powerful genetic
impact from the Yamnaya culture of the Pontic-Caspian Steppe
into the Corded Ware culture of north-central and north-
eastern Europe, during the Early Bronze Age. As for which
branches of Indo-European were likely involved (§2.10), this
reconfirmation points increasingly also to the lineage ancestral
to Germanic, as well as Balto-Slavic. Furthermore, Olalde et al.
(2018) now report that when the succeeding Bell Beaker culture
spread to Britain c. 4450 BP, that coincided with a genomic
transformation so far-reaching as to seem a prima facie
candidate for having introduced a new language lineage there.
Geographically, this might seem an obvious fit with the
dispersal of Celtic — but only if one steps back from established
thinking on the chronology of Celtic and its associations with
archaeological cultures. The traditional model has long assumed
that Celtic dispersal was driven by the Hallstatt and La Tène
cultures, i.e. not until the Iron Age, and that the split between
Brythonic and Goidelic dates to as late as 2000 BP. To associate
Celtic instead with the incoming ancient DNA signal now
detected in Britain from 4450 BP would entail a significantly
earlier chronology for Celtic, then — or at least for a ‘para-
Celtic’, potentially including the much debated Pictish. That
said, Olalde et al.’s (2018) findings do not necessarily exclude a
later, ‘repeat’ migration, if the source population again came
from the same regions of the continent immediately facing
Britain — even if that would be a less parsimonious hypothesis.
On whether the Yamnaya → Corded Ware movement may
also have included speakers of an early Uralic language lineage
(§2.4), Mittnik et al. (2018) detect in modern Uralic speakers an
additional ancestry component of East Asian origin, one which is
absent from the ancient DNA of the samples they report from
Corded Ware contexts. However, none of their samples from
that period actually came from areas where (in the present-day,
at least) Uralic languages are spoken, so we await better
coverage in time and space. Moreover, the additional
component they detect is only a minor one.
Outside northern Europe, on the other hand, the Steppe
hypothesis continues to find much less support in the latest
ancient DNA, and in ways already anticipated in this paper. The
new data tend to reinforce the overall picture that an origin in
the Pontic-Caspian Steppe cannot straightforwardly and
successfully explain all Indo-European. In southern Europe,
new samples continue to show much less impact attributable to
Yamnaya at time-depths early enough to explain the depth of
divergence between the main European branches. In Iberia, the
Bell Beaker phenomenon is associated with some in-migration,
but not remotely as heavy as in Britain (Olalde et al. 2018).
Input possibly attributable to Yamnaya is limited in scale in the
Balkans, too (Mathieson et al. 2018), and even more limited in
samples from Mycenaean Greece (Lazaridis et al. 2017). The
language spoken there was by then not only already Indo-
European, but distinctly of the Greek branch, of course. Yet
these Mycenaeans “had at least three-quarters of their ancestry
from the first Neolithic farmers of western Anatolia and the
Aegean, and most of the remainder from ancient populations
related to those of the Caucasus and Iran”. This leaves only a
very small share as possibly from the Steppe, although Armenia
is seen as a better-fitting candidate source in any case.
Indeed, for the ancient Caucasus itself, i.e. the key region
that lies directly between the homelands of the two main
hypotheses, we now have much fuller ancient DNA coverage,
thanks to Wang et al. (2018). Their basic finding is that all the
way to the northern edge of the Caucasus ranges, where they
abut onto the Steppe, genetic ancestry is predominantly akin to
that found south of the Caucasus, not on the Steppe. This
includes samples from contexts of the Maykop culture. The
basic direction of movement through the Caucasus is not out of
the Steppe, then, but northwards out of the northern Fertile
Crescent. Wang et al.’s (2018: 4) discussion of the Maykop
culture lists, incidentally, several of the very attributes that
advocates of linguistic palaeontology have long imagined
support an origin of Indo-European on the Steppe, but not
further south. Also relevant is the recent finding from ancient
horse DNA that the Bronze Age and modern horse stock does
not in fact descend from the early domestication event in the
Botai region of Kazakhstan, and must presumably derive from a
separate domestication elsewhere (Gaunitz et al. 2018). There
are several candidate regions, particularly now that it has been
confirmed that the wild horse had a distribution wider than
often claimed in support of the Steppe hypothesis, and notably
did include Anatolia and the northern periphery of the Fertile
Crescent (Shev 2016: 129).
The steady stream of ever more human ancient DNA
samples continues to adjust past calculations, models and
interpretations, in any case. This applies especially to precisely
the case foreshadowed in §3.3, on the potential complexities
behind the apparently stark genetic contrasts originally
reported among the first farmers in the Near East. The picture
now is that the originally deeply split ancestries of different
parts of the Fertile Crescent (Lazaridis et al. 2016) had
progressively admixed, by the Chalcolithic period at the latest,
but potentially even earlier. The ‘Anatolian’ farmer ancestry so
dominant in early European farmers is now also reported, at 33%,
in eastern Iran (its share declining progressively eastwards).
Narasimhan et al. (2018: 7) see this as consistent with the
eastward “spread of wheat and barley agriculture … in the 7th
and 6th millennia BCE”. In early farmers from the Levant too,
Anatolian ancestry is revised higher. Meanwhile, many Early
Bronze Age samples from the Pontic-Caspian Steppe, including
notably Yamnaya samples, are themselves now (re-)analysed as
having 10–20% of this same ‘Anatolian’ ancestry (Wang et al.
2018: 10).
Two new ancient DNA papers also bear directly on the
long-standing claim, required by the Steppe hypothesis, of
a migration in the reverse direction from the Steppe into
Anatolia, usually envisaged via the Balkans (e.g. Anthony &
Ringe 2015: 208). Mathieson et al. (2018: 201) straightforwardly
conclude “No steppe migration to Anatolia via southeast
Europe”, and Damgaard et al. (2018a) have now published the
long-awaited first ancient DNA from periods and regions where
languages of the Anatolian branch were spoken. The ancestries
of these samples do not seem derivable from the Steppe to any
extent. Rather, Anatolia fits into the broader pattern of
progressive admixture between the originally deeply-split ‘first
farmer’ ancestries. That is, almost half of the ancestry profile
in Hittite-speaking regions is made up of the same CHG/EFC
component that also spread separately to make up even higher
proportions — half or more — of the ancestry in ancient
samples from Yamnaya (by c. 6500 BP) and from Iran (already
by the Neolithic), as well as in most modern Indic-speaking
populations.
Indeed further east, as noted in §3.1, although modern
populations of South Asia do show some ancestry from the
Steppe, it is much more limited in scale, especially relative to
the Eastern Fertile Crescent (EFC) component. In fact, the first
ancient DNA samples have now emerged from cultures
originally assumed by some advocates of a Steppe origin of
Proto-Indo-European (see Mallory 1989, Bryant 2001) as
candidates for having brought Indo-Iranic into South Asia,
namely BMAC and/or the Gandhara Grave culture. Narasimhan
et al. (2018) report that the BMAC samples in fact show
negligible Steppe ancestry and cannot be the source of this in
South Asia. They therefore push the chronology later, in a
model of migrations in which a contribution of Steppe ancestry
is first detected as late as 3200 BP in Iron Age samples from the
Swat Valley, and builds up there only progressively.
Independently, Damgaard et al. (2018a) bring further new data
to confirm that this ancestry in South Asia cannot stem from
Yamnaya directly (contra Narasimhan et al. 2018: line 484), nor
from its Early Bronze Age time-frame, but only from a later
phase.
For a correspondence with the linguistics, it is above all on
the chronological level that these findings from Central and
South Asia have consequences for the Indo-European origins
debate. Many other supporters of a Steppe origin have of course
long argued (on the basis of ‘chariotry’ vocabulary, for
example) that Indo-Iranic expansion south of the steppe does
indeed date only to as late as the Middle and Late Bronze Age.
That Steppe ancestry is not found in BMAC, and entered South
Asia only in more recent periods still, would seem coherent
with that. Yet it only highlights the linguistic counter-
argument, too: that this brings us forward into a time-frame too
recent to sit well alongside the scale of language divergence
across and even within the Iranic, Nuristani and Indic branches.
Moreover, the positions of Iranic and Indic within the PCA
plot in Damgaard et al.’s (2018a) figure 2b continue to appear
straightforwardly compatible with the alternative scenario
proposed in §3: the branch ancestral to Indo-Iranic spread
eastwards with farming from Neolithic Iran. The later impacts
detected from the Steppe via Central Asia would then not have
succeeded in introducing a major new language lineage in
Persia and South Asia. Their demographic impacts seem too
small and too gradual to have plausibly brought about the near
total replacement of all native languages across the huge region
of Indo-Iranic speech. The BMAC culture, then, far from acting
as a conduit for Steppe influences southwards into Iran and
South Asia, as was suggested early on, may instead
have been what brought ancient Iranian ancestry (mostly EFC)
to spill over northwards into Central Asia. There it admixed (in
roughly balanced proportions) with local populations with
significant East Asian ancestry, to give rise to the ancestry
profiles of many of the “Scythian” groups in Damgaard et al.’s
(2018b) extended data figures 1d and 1e. This scenario is
likewise broadly compatible with the new ancient DNA findings
and analyses in Narasimhan et al. (2018).
For South Asia itself, a great deal remains essentially at
the level of modelling. For aside from the 65 samples reported
by Narasimhan et al. (2018) from far northern Pakistan, ancient
DNA is still lacking from South Asia. Particularly in that
context, what seems strangest in Narasimhan et al. (2018) is
their confidence in going so far as to claim to have identified
“the populations that almost certainly were responsible for
spreading Indo-European languages across much of Eurasia”.
Not least in the light of the latest findings from Anatolia, that
seems decidedly premature. Narasimhan et al. (2018) may have
identified a population that brought a (limited) contribution of
Steppe ancestry into South Asia. But those data prove nothing
on whether Indo-Iranic languages came south with that
population. Indeed, by confirming such a late chronology for
when Steppe ancestry entered South Asia, in a time-frame even
more recent than BMAC, it is their own findings that make an
alternative scenario seem more compatible with the linguistic
diversity and time-depth of Indic, Nuristani and Iranic: these
languages had already been long-established and begun
diverging in Persia and South Asia, and were not dislodged by
the late incomers from the Steppe.
This only highlights the limitation that continues to unite
these latest studies: in effect they still explore only alternative
models of when Steppe and Central Asian ancestry, however
limited, entered South Asia. Neither Damgaard et al. (2018a) nor
Narasimhan et al. (2018) question whether this demographically
limited impact actually brought Indo-Iranic at all, or test those
models against the alternative scenario advanced here, that it
did not. For Indo-Iranic, these papers still constrain their own
explorations of prehistory by presuming the Steppe hypothesis
in the first place. And this despite Damgaard et al. (2018a)
themselves presenting new evidence against certain
aspects of that hypothesis. As the senior author himself puts it:
“What we see does not support a classical way of looking at the
steppe hypothesis” (Eske Willerslev, quoted in Price 2018).
This has broader consequences, too, for how much
confidence to place in some of the traditional
methodologies that certain Indo-Europeanists have considered
to have already convincingly demonstrated the Steppe
hypothesis in its “classical” form. In effect, the ancient DNA
evidence is now beginning to suggest that those methodologies
may not have been so precise and infallible after all. Indeed,
other linguists of Indo-European have long since countered that
the linguistic data, including loanwords and linguistic
palaeontology, can just as well be taken to support a radically
different hypothesis instead. Gamkrelidze & Ivanov (1995) also
appealed to linguistic palaeontology and ancient loanwords, but
in support of their alternative scenario: that the Steppe was only
a secondary staging post for the main European branches, that the
ultimate homeland was “Near Eastern”, and that Indo-Iranic spread
eastwards by a route south of the Steppe.
In sum, the latest ancient DNA findings fail to dovetail with
either of the main hypotheses on Indo-European origin, in its
full original form. Initial findings did give a widespread
first impression of strong support for a steppe origin, at least of
major branches of Indo-European in Europe (especially in the
east and north). Elsewhere, however, the very latest
publications in ancient DNA continue to lay down a challenge to
start thinking outside the Steppe hypothesis box again — and
not just for Anatolian, Greek and Armenian, but for Indo-Iranic
too.

References
Allentoft, M.E., Sikora, M., Sjögren, K.-G., Rasmussen, S., et al.
2015 Population genomics of Bronze Age Eurasia. Nature 522(7555):
p.167–172. http://dx.doi.org/10.1038/nature14507

Anthony, D.W.
2013 Two IE phylogenies, three PIE migrations, and four kinds of
steppe pastoralism. Journal of Language Relationship 9: p.1–21.
http://jolr.ru/article.php?id=104

Anthony, D.W., & Ringe, D.
2015 The Indo-European homeland from linguistic and archaeological
perspectives. Annual Review of Linguistics 1(1): p.199–219.
http://dx.doi.org/10.1146/annurev-linguist-030514-124812

Balaresque, P., Bowden, G.R., Adams, S.M., Leung, H.-Y., et al.
2010 A predominantly Neolithic origin for European paternal lineages.
PLoS Biology 8(1): p.e1000285.
http://dx.doi.org/10.1371/journal.pbio.1000285

Bellwood, P.
2005 First Farmers: The Origins of Agricultural Societies. Oxford:
Blackwell.

Bollongino, R., Nehlich, O., Richards, M.P., Orschiedt, J., et al.
2013 2000 years of parallel societies in Stone Age Central Europe.
Science: p.1245049. http://dx.doi.org/10.1126/science.1245049

Brandt, G., Haak, W., Adler, C.J., Roth, C., et al.
2013 Ancient DNA reveals key stages in the formation of Central
European mitochondrial genetic diversity. Science 342(6155):
p.257–261. http://dx.doi.org/10.1126/science.1241844

Broushaki, F., Thomas, M.G., Link, V., López, S., et al.
2016 Early Neolithic genomes from the eastern Fertile Crescent.
Science: p.aaf7943. http://dx.doi.org/10.1126/science.aaf7943

Bryant, E.
2001 The Quest for the Origins of Vedic Culture: The Indo-Aryan
Migration Debate. Oxford: Oxford University Press.

Cassidy, L.M., Martiniano, R., Murphy, E.M., Teasdale, M.D., et al.
2016 Neolithic and Bronze Age migration to Ireland and establishment
of the insular Atlantic genome. Proceedings of the National
Academy of Sciences 113(2): p.368–373.
http://dx.doi.org/10.1073/pnas.1518445113

Damgaard, P. de B., Martiniano, R., et al.
2018a The first horse herders and the impact of early Bronze Age steppe
expansions into Asia. Science: p.eaar7711.
http://doi.org/10.1126/science.aar7711

Damgaard, P. de B., Marchi, N., et al.
2018b 137 ancient human genomes from across the Eurasian steppes.
Nature: p.1.
http://doi.org/10.1038/s41586-018-0094-2

Falush, D., Dorp, L. van, & Lawson, D.
2016 A tutorial on how (not) to over-interpret
STRUCTURE/ADMIXTURE bar plots. bioRxiv: p.66431.
http://dx.doi.org/10.1101/066431

Gallego Llorente, M., Connell, S., Jones, E.R., Merrett, D., et al.
2016 The genetics of an early Neolithic pastoralist from the Zagros,
Iran. Scientific Reports 6: p.31326. http://doi.org/10.1038/srep31326

Gamkrelidze, T.V., & Ivanov, V.V.
1995 Indo-European and the Indo-Europeans: A Reconstruction and
Historical Analysis of a Proto-Language and a Proto-Culture.
Berlin: Mouton de Gruyter.

Gaunitz, C. et al.
2018 Ancient genomes revisit the ancestry of domestic and
Przewalski’s horses. Science 360(6384): p.111–114.
http://doi.org/10.1126/science.aao3297

Haak, W., Lazaridis, I., Patterson, N., Rohland, N., et al.
2015 Massive migration from the steppe was a source for Indo-
European languages in Europe. Nature 522(7555): p.207–211.
http://dx.doi.org/10.1038/nature14317

Heggarty, P.
2014 Prehistory through language and archaeology. In C. Bowern & B.
Evans (eds) Routledge Handbook of Historical Linguistics, 598–626.
London: Routledge. https://www.academia.edu/3687718

Heggarty, P., & Beresford-Jones, D.G.
2010 Agriculture and language dispersals: limitations, refinements, and
an Andean exception? Current Anthropology 51(2): p.163–191.
http://dx.doi.org/10.1086/650533
2014 Farming-language dispersals (1): principles. In C. Smith (ed)
Encyclopedia of Global Archaeology, 2739–2749. New York:
Springer. http://dx.doi.org/10.1007/978-1-4419-0465-2_2415

Heggarty, P., & Renfrew, C.
2014 South and Island South-East Asia: Languages. In C. Renfrew & P.
Bahn (eds) The Cambridge World Prehistory, 534–558. Cambridge:
Cambridge University Press.

Jones, E.R., Gonzalez-Fortes, G., Connell, S., Siska, V., et al.
2015 Upper Palaeolithic genomes reveal deep roots of modern
Eurasians. Nature Communications 6: p.8912.
http://dx.doi.org/10.1038/ncomms9912

Kallio, P.
2006 Suomen kantakielten absoluuttista kronologiaa. [On the absolute
chronology of the proto-languages of Finnish.] Virittäjä 110(1):
p.2–25.
www.kotikielenseura.fi/virittaja/hakemistot/jutut/2006_2.pdf

Khazanov, A.M.
1984 Nomads and the Outside World. Madison: University of Wisconsin
Press.

Lazaridis, I. et al.
2017 Genetic origins of the Minoans and Mycenaeans. Nature 548(7666):
p.214–218.
http://doi.org/10.1038/nature23310

Lazaridis, I., Nadel, D., Rollefson, G., Merrett, D.C., et al.
2016 Genomic insights into the origin of farming in the ancient Near
East. Nature 536(7617): p.419–424.
http://doi.org/10.1038/nature19310

Mathieson, I. et al.
2018 The genomic history of southeastern Europe. Nature 555(7695):
p.197.
http://doi.org/10.1038/nature25778

Mittnik, A. et al.
2018 The genetic prehistory of the Baltic Sea region. Nature
Communications 9(1): p.442.
http://doi.org/10.1038/s41467-018-02825-9

Nakhleh, L., Ringe, D., & Warnow, T.
2005 Perfect phylogenetic networks: a new methodology for
reconstructing the evolutionary history of natural languages.
Language 81(2): p.382–420. www.jstor.org/stable/4489897

Narasimhan, V.M. et al.
2018 The Genomic Formation of South and Central Asia. bioRxiv:
p.292581.
http://doi.org/10.1101/292581

Olalde, I. et al.
2018 The Beaker phenomenon and the genomic transformation of
northwest Europe. Nature 555(7695): p.190.
http://doi.org/10.1038/nature25738

Price, M.
2018 Finding the first horse tamers. Science 360(6389): p.587.
http://science.sciencemag.org/content/360/6389/587

Reich, D., Thangaraj, K., Patterson, N., Price, A.L., et al.
2009 Reconstructing Indian population history. Nature 461(7263):
p.489–494. http://dx.doi.org/10.1038/nature08365

Renfrew, C.
1987 Archaeology and Language: The Puzzle of Indo-European Origins.
London: Jonathan Cape.

Ringe, D.A., Warnow, T., & Taylor, A.
2002 Indo-European and computational cladistics. Transactions of the
Philological Society 100(1): p.59–129.
http://dx.doi.org/10.1111/1467-968X.00091

Sherratt, A., & Sherratt, S.
1988 The archaeology of Indo-European: an alternative view. Antiquity
62(236): p.584–595.
www.antiquity.ac.uk/Ant/062/Ant0620584.htm

Shev, E.T.
2016 The introduction of the domesticated horse in Southwest Asia.
Archaeology, Ethnology & Anthropology of Eurasia 44(1): p.123–136.
https://www.researchgate.net/publication/304711332_The_introduction_of_the_domesticated_horse_in_Southwest_Asia

Wang, C.-C. et al.
2018 The genetic prehistory of the Greater Caucasus. bioRxiv: p.322347.
http://doi.org/10.1101/322347

Willcox, G.
2013 The roots of cultivation in Southwestern Asia. Science 341(6141):
p.39–40. http://dx.doi.org/10.1126/science.1240496

Zeder, M.A.
2008 Domestication and early agriculture in the Mediterranean Basin:
Origins, diffusion, and impact. Proceedings of the National
Academy of Sciences 105(33): p.11597–11604.
http://dx.doi.org/10.1073/pnas.0801317105
