Quantifying Language Dynamics On The Cutting Edge of Areal and Phylogenetic Linguistics 1st Edition Soren Wichmann
Quantifying Language Dynamics
On the Cutting Edge
of Areal and Phylogenetic Linguistics
Edited by
Søren Wichmann
Jeff Good
Leiden | Boston
The following articles published in this paperback originally appeared in Brill's journal Language Dynamics
and Change:
Bentz, Christian and Bodo Winter. 2013. Languages with More Second Language Learners Tend to Lose
Nominal Case. Language Dynamics and Change 3: 1-27.
Hammarström, Harald and Tom Güldemann. 2014. Quantifying Geographical Determinants of Large-Scale
Distributions of Linguistic Features. Language Dynamics and Change 4: 87-115.
Jäger, Gerhard. 2014. Phylogenetic Inference from Word Lists Using Weighted Alignment with Empirically
Determined Weights. Language Dynamics and Change 3: 245–291.
Michael, Lev, Will Chang, and Tammy Stark. 2014. Exploring Phonological Areality in the Circum-Andean
Region Using a Naive Bayes Classifier. Language Dynamics and Change 4: 27-86.
This publication has been typeset in the multilingual “Brill” typeface. With over 5,100 characters covering
Latin, IPA, Greek, and Cyrillic, this typeface is especially suitable for use in the humanities. For more
information, please see www.brill.com/brill-typeface.
Introduction
Søren Wichmann and Jeff Good
Christian Bentz
studied Germanistics, Macroeconomics, and Philosophy in Heidelberg and
Rome. He was a visiting researcher at the Cognitive Neuroscience Lab at Cor-
nell University and at the Max Planck Institute for Evolutionary Anthropology.
He received his MPhil in English and Applied Linguistics from the University
of Cambridge, where he is currently a Ph.D. student in Computation, Cognition
and Language.
Will Chang
is a graduate student at the University of California, Berkeley. He works on
phylogenetic models and Polynesian languages.
Hans Geisler
completed his Ph.D. on typological evolution from Latin to French at the
University of Munich in 1980. In 1987 he became Private Lecturer with a disser-
tation (Habilitationsschrift) on sound change in Romance languages. In 1996
he received an appointment as Professor at Heinrich Heine University Düssel-
dorf where he is currently Chair at the Department of Romance Languages and
Literatures.
Tom Güldemann
is Professor of African linguistics at the Humboldt University Berlin and is
also associated with the Max Planck Institute for Evolutionary Anthropology in
Leipzig. He specializes in language typology, historical linguistics, and language
documentation and description, with a field research focus on Khoisan and
Bantu languages.
Harald Hammarström
Ph.D. (2009), Chalmers University, is Research Staff at the Max Planck Institute
of Psycholinguistics, Nijmegen. He has published papers and monographs in
computational linguistics and linguistic typology.
Gerhard Jäger
Dr. phil. (1996 at Humboldt University Berlin), is Professor of General
Linguistics at Tübingen University. He has published on formal semantics, optimality
theory, game-theoretic linguistics and language evolution. Since 2013
he has held an ERC Advanced Grant, “Language Evolution: The Empirical Turn
(EVOLAEMP)”.
viii notes on contributors
Johann-Mattis List
Ph.D. (2013), Heinrich Heine University Düsseldorf, is Post-Doctoral Researcher
at Philipps-University Marburg. He has published several articles on compu-
tational methods in historical linguistics, including “Networks of lexical bor-
rowing and lateral gene transfer in language and genome evolution” (Bioessays,
2014).
William Martin
completed his Ph.D. in 1988 in Cologne with Heinz Saedler on molecular genet-
ics and plant evolution. He then joined Rüdiger Cerff at the University of
Braunschweig to work on molecular evolution and endosymbiosis. In 1999 he
received an appointment as Professor at Heinrich Heine University Düsseldorf
where he is currently head of the Institute for Molecular Evolution.
Lev Michael
Ph.D. (2008), University of Texas at Austin, is Associate Professor of Linguistics
at the University of California, Berkeley. He has carried out fieldwork on sev-
eral Amazonian languages, including Iquito (Zaparoan), Nanti (Arawak), and
Máíh̃ki (Tukanoan), and has published on the anthropological, comparative,
and areal linguistics of Amazonian languages.
Shijulal Nelson-Sathi
Ph.D. (2013), Heinrich Heine University Düsseldorf, is a Post-Doctoral Re-
searcher at Heinrich Heine University Düsseldorf under William F. Martin on
Molecular Evolution.
Frank Seifart
works at Max Planck Institute for Evolutionary Anthropology and the Uni-
versity of Amsterdam and coordinates a project on frequencies of nouns and
verbs cross-linguistically. His main research interests are linguistic typology,
language history and contact, and documentation and description of Bora-
Miraña and Resígaro (North West Amazon).
Tammy Stark
is a graduate student at the University of California, Berkeley. Her research
focuses on morphosyntactic variation in the Northern Arawak languages of
South and Central America.
Bodo Winter
Ph.D. candidate at the Cognitive and Information Sciences group, University of
California, Merced, does research within the domain of experimental cognitive
linguistics, language evolution and statistical methods.
Editor Biographies
Jeff Good
Ph.D. (2003), University of California, Berkeley, is Associate Professor of Lin-
guistics at the University at Buffalo. His research interests include comparative
Niger-Congo linguistics, morphosyntactic typology, and language documen-
tation. He has published in Language, Diachronica, and Morphology, among
others, and serves as General Editor of Language Dynamics and Change.
Søren Wichmann
Ph.D. (1996), University of Copenhagen, is Senior Scientist at Max Planck Insti-
tute for Evolutionary Anthropology and works on historical linguistics, typol-
ogy, and Mesoamerican languages, often applying quantitative, computational
methods. He is founder and General Editor of the journal Language Dynamics
and Change.
Introduction
Søren Wichmann
Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Jeff Good
University at Buffalo, Buffalo, New York, USA
With this book, which contains selected papers that are published or forthcoming
in the journal Language Dynamics and Change (LDC), we wish to celebrate
the fact that the journal, launched in 2011, will soon enter its fifth year. When the
first author of this introduction was originally approached in February 2010 by
a representative from Brill, the idea of the latter was to start a new journal for
historical linguistics. The former was interested in this idea, but he had some
additional motivations, which the second author also supported. One motiva-
tion was that, at the time, there was no obvious outlet for papers addressing
questions of historical linguistic relevance through proper statistical hypothe-
sis testing. Another motivation was to create a forum for the wider field of lan-
guage dynamics discussed in Wichmann (2008) and Loreto et al. (2011). Here we
reproduce the tentative definition of language dynamics by Wichmann (2014:
303) with some minor modifications:
group can be added to the dataset under investigation. Next, Bayesian logic is
applied in order to estimate the probability that a given language belongs to
the core area group given its inventory of features in some linguistic domain.
This probability is not interpreted as some sort of evidence for the linguistic
area or for the membership therein of individual languages. Rather, it serves as
a descriptive means of distinguishing between more or less focal members. The
paper applies this method using a dataset of phonological inventories of South
American languages, with the aim of delineating a linguistic area in the Andean
highlands and adjacent regions. The method was not only able to identify such
an area but was also able to further delineate two major subareas within the
larger Andean area, in both cases confirming linguists’ intuitions.
Drawing upon data from thousands of languages, Hammarström and Gül-
demann set out to test geographical correlates of the distribution of linguis-
tic features in two domains: numeral systems and basic word order. Testing
whether climatic zones have an effect on their distribution, they find that
homogeneity of numeral systems tends to be higher within different climatic
zones than within zones picked randomly, but the same is not the case for basic
word order. The difference in the two domains is tentatively ascribed to the
possibility that the link between climate and numeral systems is mediated by
subsistence strategies, which have some correlation with numeral systems (as
per earlier, unpublished work by Hammarström). They further test whether
there is a preference for similar features to be distributed along an east-west
axis, rather than a north-south one, as might be expected from the work of Dia-
mond (1997), and find some evidence for this distribution in both domains. A
final test is whether areas defined in terms of languages having identical val-
ues for the linguistic features investigated tend to be bounded by coastlines
and mountains/valleys to a greater extent than randomly-picked, spatially-
coherent geographic areas. The authors do not find such an effect. The paper
takes great care to present every step of its analyses to facilitate replication and
presents neat solutions to practical problems when dealing with the messy real-
ity of geography.
Trudgill (2011) and others have argued that language change induced by
adult L2 learners is expected to lead to reduced grammatical complexity. Bentz
and Winter set out to test this claim statistically. They look specifically
at case distinctions. The authors were able to obtain information on both the
number of cases and the proportion of L2 speakers for 66 languages. The
statistically well-controlled analysis revealed that languages with more L2 speakers
tend to have fewer cases and also that there is an inverse relation between the
proportion of L2 speakers and the number of nominal case markers. Because of
the care with which the analyses are carried out and the thorough discussion of
possible confounding factors, this paper serves as a good model for future stud-
ies testing hypotheses concerning the linguistic outcomes of different language
contact situations.
There is a widespread belief among students of language contact, expressed
explicitly by Weinreich (1953) and several others, that structural features of a
borrowing language constrain the kinds of morphemes that can be borrowed.
Seifart’s contribution is the first systematic test of this claim. Using his own
database of morphological borrowing and information on structural features
from Dryer and Haspelmath (2011) supplemented by some additional data
points, the author inspects 78 pairs of donor and borrowing languages to see
whether there is a correlation between structural similarity in general and the
amount of affix borrowing, counted as the number of broad morphological
categories expressed by the borrowed affixes. The straightforward and unam-
biguous result of this test is that there is no such relationship to be found:
the number of affix categories borrowed cannot be predicted from the struc-
tural distance between the donor and the borrowing language. This paper thus
serves as an exemplary case study demonstrating how the availability of large-
scale typological databases combined with quantitative methods allows for the
rigorous examination of long-standing hypotheses previously supported pri-
marily by impressionistic evidence.
List, Nelson-Sathi, Martin, and Geisler present a phylogenetic method for
characterizing relationships among languages that builds both vertical and
lateral transmission (inheritance and borrowing) into its basic design, and they
illustrate this method using Chinese data. The beauty of the method is its small
number of assumptions. Initially no decision is made about what is a borrowing
and what is not. It is only decided which words are homologs—i.e., whether
they are related in one or the other way. For instance, English mountain is
homologous with French montagne and Spanish montaña, but only the latter
two are inherited vertically. If mountain were inherited in the same way that
montagne and montaña are, the proto-language from which Germanic and
Romance both descend would have to have had two synonyms for mountain,
one for the proto-form giving rise to mountain, montagne, etc., and one for the
proto-form giving rise to German Berg, Danish bjerg, etc. With the additional
assumption that a proto-language should not have many more synonyms than
its descendant languages, forms such as mountain, which would contribute
to the proliferation of proto-synonyms and whose distributions also have a
poor fit with the structure of a reference tree (developed independently), are
singled out as candidates for being loanwords. In a network of languages where
the skeleton is the reference tree, such lateral connections can be depicted
as edges connecting languages or intermediate proto-languages criss-crossing
the reference tree. This method thus extends popular existing phylogenetic
techniques which are more strongly oriented towards modeling vertical over
lateral transmission.
Jäger’s paper represents a significant step forward in the application of
distance-based methods to lexical language data for the purpose of producing
phylogenies. In dialectological studies, it has repeatedly been found that simple
Levenshtein distance (LD) as a tool for producing an overall measure of
distinctiveness between forms is reasonably accurate, even if the LD simply counts
the number of operations needed to transform one string into another without
out taking into account differences in the phonetic interpretations of aligned
symbols via some sort of system weighting some changes as more likely than
others. A possible reason why elaborate weighting schemes have not been very
influential is that assumptions about how to define weights are likely to be
controversial if they are based solely on theoretical criteria such as the nature
of phonetic features involved in a change. Instead, Jäger determines weights
empirically, first by developing a conservative criterion for cognacy and then,
after aligning cognates, estimating the weights based on the alignments found.
A large-scale test of the results shows that weights, when properly assigned, do
in fact improve the accuracy of classifications.
What unites the papers in this volume is that they bring large datasets and
sophisticated statistical arguments to bear on simple and fundamental ques-
tions in language dynamics. The advantage of hypothesis-driven statistical
inquiry over common-sensical reasoning supported merely by cherry-picked
examples is that the former greatly helps us to distinguish between produc-
tive avenues of investigation and dead ends, whereas the latter only sets up a
kind of magnetic field where opinions will fluctuate between opposite poles
without any detectable progress. For instance, the results of the paper by Ham-
marström and Güldemann favor a model of population interaction on a large
scale where geographic axes should be taken into account, while other possi-
ble geographic parameters, such as the presence of geographical boundaries,
can largely be ignored. Bentz and Winter show that the hypothesis that sim-
plification of grammatical systems can be a consequence of interference from
L2 speakers is worth pursuing further through studies on other domains of
grammars and with an extended empirical base. In contrast, Seifart’s results
strongly suggest that another popular idea, that there are grammatical con-
straints on borrowing, is basically a dead end. There is progress in uncovering
the lack of productivity in a particular avenue of investigation, just as there
is in suggesting new methodologies with provable advantages—with the lat-
ter eminently embodied in the contributions by Michael et al., List et al., and
Jäger.
References
Diamond, Jared. 1997. Guns, Germs and Steel: The Fates of Human Societies. London:
Cape.
Dryer, Matthew S. and Martin Haspelmath (eds.) 2011. The World Atlas of Language
Structures Online. Max Planck Digital Library. Accessible at http://wals.info/.
Loreto, Vittorio, Andrea Baronchelli, Animesh Mukherjee, Andrea Puglisi, and
Francesca Tria. 2011. Statistical physics of language dynamics. Journal of Statistical
Mechanics: Theory and Experiment P04006 (doi: 10.1088/1742-5468/2011/04/P04006).
Trudgill, Peter. 2011. Sociolinguistic Typology: Social Determinants of Linguistic Complex-
ity. Oxford: Oxford University Press.
Weinreich, Uriel. 1953. Languages in Contact. New York: Linguistic Circle of New York.
Wichmann, Søren. 2008. The emerging field of language dynamics. Language and
Linguistics Compass 2: 442–455.
Wichmann, Søren. 2014. The challenges of language dynamics. Comment on “Mod-
elling language evolution: Examples and predictions” by Gong, Shuai & Zhang.
Physics of Life Reviews 11: 303–304.
Exploring Phonological Areality
in the Circum-Andean Region
Using a Naive Bayes Classifier
Lev Michael, Will Chang and Tammy Stark
University of California, Berkeley, USA
Corresponding author:
levmichael@berkeley.edu
Abstract
This paper describes the Core and Periphery technique: a quantitative method for
exploring areality that uses a naive Bayes classifier, a statistical tool for inferring class
membership based on training sets assembled from members of the classes in ques-
tion. The Core and Periphery technique is applied to the exploration of phonological
areality in the Andes and surrounding lowland regions, based on the South American
Phonological Inventory Database (SAPhon 1.1.3; Michael et al., 2013). Evidence is found
for a phonological area centering on the Andean highlands, and extending to parts of
the northern and central Andean foothills regions, the Chaco, and Patagonia. Evidence
is also found for Southern and North-Central phonological sub-areas within this larger
phonological area.
Keywords
1 Introduction
The goals of this paper are twofold: first, to describe the Core and Periphery
technique, an intuitively appealing quantitative method for exploring large
linguistic datasets for evidence of linguistic areality; and second, to illustrate
the utility of this technique by applying it to a dataset of South American
phonological inventories, focusing on the evidence of phonological areality in
the Andes and surrounding lowland areas.
Core and Periphery is a method that uses as a starting point linguists’ knowl-
edge of the languages and history of a region to generate initial hypotheses
regarding ‘cores’: sets of languages that constitute possible linguistic areas
(Campbell et al., 1986; Thomason, 2000; Muysken, 2008), or parts of such areas.
These hypotheses serve as the seed for the application of a statistical technique,
naive Bayes classification (NBC), which determines what features, if any,
distinguish the core languages from other languages in the region, and also to
what degree languages outside the proposed core resemble the core languages.
Those languages deemed core-like, together with the proposed core, constitute
a candidate linguistic area, to be evaluated against pertinent sociohistorical
and geographical facts. If the languages deemed core-like fail to make sense
geographically, then the Core and Periphery technique has failed to identify a
linguistic area around the proposed core.
The Core and Periphery technique improves on conventional practices of
‘eyeballing’ areas in three ways. First, it provides a quantitative evaluation of the
degree to which the languages of a proposed area in fact exhibit features that
distinguish them from the languages of the larger region containing the pro-
posed area. Second, it provides a quantitative measure of similarity between
languages that can be applied to large datasets, allowing linguists to locate
unexpected similarities that help identify new areas or redefine accepted ones.
And third, quantitative measures of similarity also make it possible to visualize
and cogently discuss the structure of linguistic areas whose boundaries are gra-
dient in nature. Note, however, that Core and Periphery is not strictly speaking
a statistical test of areality, a point we return to in Section 6.
In this paper, we carry out two different Core and Periphery explorations
of phonological areality in the circum-Andean region, first treating the entire
Andean highlands from northern Chile to northern Ecuador as a single core,
and then treating the Andean highlands as consisting of two cores, a Southern
Andean core and a North-Central Andean core. The dividing line between the
latter two cores runs through the southern Peruvian Andes, grouping Cuzco-
Collao Quechua and Jaqaru with the Southern Andean core, while the remain-
ing Quechuan languages constitute the North-Central core. This dual core anal-
ysis is motivated by the qualitative observation that the Southern Andean lan-
guages, delimited in this way, share a number of phonological characteristics
otherwise rare in South America, including a three-way contrast between plain,
aspirated, and ejective stops.
The single core analysis reveals several clusters of languages in the Andean
foothills and adjacent lowland regions that pattern more strongly with the
languages of the Andean core than other lowland languages, including an
Ecuadorean Andean foothills cluster, a Huallaga River valley cluster, a cluster
1 As one reviewer suggested, even languages on another continent could serve as control
languages.
2 The Core and Periphery results actually suggest that in most cases, the range of phonological
influence of the Andes into the surrounding lowlands does not exceed a few hundred kilo-
meters, but by choosing so distant a control class, we allow for the possibility of more distant
influence.
3.1 SAPhon
The quantitative exploration of phonological areality presented in this paper
is based on the analysis of the phonological inventories found in the South
American Phonological Inventory Database, version 1.1.3 (SAPhon 1.1.3; Michael
et al., 2013).3 In this section we briefly describe the structure of the database,
and discuss particular decisions that we made in populating the database and
preparing it for quantitative analysis.
SAPhon 1.1.3 incorporates 359 phonological inventories that have been har-
vested from published sources, or contributed by linguists currently working on
the languages in question. This represents over 95 % coverage of South Amer-
ican languages for which phonological descriptions are known to exist in one
form or another.4 The vast majority of inventories in the SAPhon database
belong to living languages, but SAPhon also includes inventories from recently
extinct languages, such as Chamicuro (Parker, 1991), as well as inventories based
on the careful interpretation and re-analysis of older resources, as in the case
of Cholón (Alexander-Bakkerus, 2005).
To facilitate quantitative analysis, the phonological inventory of each lan-
guage is coded in a comprehensive phonological feature matrix, with lan-
guages along the y-axis and features along the x-axis,5 with a column for every
phoneme and contrastive suprasegmental feature (e.g. nasal harmony) attested
in a South American language. Each phonological inventory is coded as a row
of ones and zeros in the table, where the presence of a given segment for a
given language is coded as 1 in the appropriate column, and absence coded
as 0. Exhaustively coding the inventories in this fashion relieves us of having to
decide in advance which segments or contrasts are relevant to the exploration
of areality.6
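The coding scheme described above can be illustrated with a minimal Python sketch. The three inventories, the language names, and the function name `build_feature_matrix` are hypothetical placeholders introduced for illustration only; they are not drawn from SAPhon.

```python
# Hypothetical toy inventories; names and segments are placeholders,
# not data taken from SAPhon.
inventories = {
    "LangA": {"p", "t", "k", "a", "i"},
    "LangB": {"p", "t", "k", "kʼ", "a", "i", "u"},
    "LangC": {"t", "k", "a", "u", "nasal harmony"},
}

def build_feature_matrix(inventories):
    """Return (languages, features, matrix) with one 0/1 row per language."""
    languages = sorted(inventories)
    # One column for every segment or suprasegmental attested in any language.
    features = sorted(set().union(*inventories.values()))
    matrix = [
        [1 if feature in inventories[lang] else 0 for feature in features]
        for lang in languages
    ]
    return languages, features, matrix

languages, features, matrix = build_feature_matrix(inventories)
for lang, row in zip(languages, matrix):
    print(lang, row)
```

Because every attested segment receives its own column, no decision about which contrasts matter has to be made before the analysis begins.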
We now turn to a number of methodological and analytical issues posed
by the nature of the data on which SAPhon is based. Since SAPhon draws
data from a considerable range of published and unpublished sources, issues
of heterogeneity in those sources pose challenges for the development of the
database, and for the analytical purposes to which we put that data.
The first type of heterogeneity we must contend with is the existence of mul-
tiple, sometimes incompatible, phonological descriptions for a given language.
Since allowing multiple inventories for a given language poses significant ana-
lytical difficulties, we typically select one inventory from among the various
proposed for a given language, preferring those given in works that present
considerable supporting data and analytical detail, and prepared by authors
with substantial linguistic training. We also typically prefer inventories based
on more recent work, on the grounds that recent work takes into account both
previous analyses and new data. To improve the quality of our judgments in
evaluating conflicting analyses, we also consulted specialists in particular lan-
guages, language families, and known linguistic areas in South America. In
cases where there is compelling evidence that the differences between inven-
tories proposed for a given language are due to dialectal differences, we include
both dialects in the database.
The second type of heterogeneity stems from the divergent ways in which
different linguists treat the same empirical phenomena. In particular, different
5 In this article, feature always refers to a feature of a language as a whole (such as the
presence or absence of a particular phoneme in the phonological inventory) rather than to
phonological features such as labial or unrounded.
6 We thank Mark Donohue for sharing this very useful coding technique with us.
4.1 Overview
A naive Bayes classifier is a probabilistic model that classifies objects into K
classes. Such a classifier is first trained on many examples, each labeled by a
human expert with the class to which it belongs. Thereafter, when presented
with a novel object, the classifier will report with what probability the object
belongs to each of the K classes.7
A common application of this technology is spam filtering. An e-mail
account may receive dozens of unwanted messages every day, but a typical clas-
sifier is smart enough to put almost all of them into a spam folder, saving the
user the trouble of ever having to look at them. In this application there are
two classes: spam and non-spam. The classifier is trained on messages that it
knows to be spam (such as those the user manually flags) and those it knows
to be non-spam (such as those that the user does not flag after reading). This
continuously-trained classifier is applied to incoming messages, and usually
works very well.8
A naive Bayes classifier analyzes each object in terms of features that char-
acterize it. In the case of e-mail, the features are the words that a message
contains. When an incoming message is analyzed, each word will push the
classification toward spam or non-spam, depending on how strongly the word
is associated with spam or non-spam in the messages on which the classifier
has been trained. A word such as Viagra is a strong indicator of spam, whereas
most low-frequency words (such as analysis or linguistics) are weak indicators
of non-spam. The classifier combines the evidence from each word to reach a
verdict about the message as a whole.
7 The origin of the naive Bayes classifier is obscure. It is a straightforward but non-trivial
application of Bayes’ Theorem, which dates from the 18th century. Widely-used texts such as
Mitchell (1997), Manning and Schütze (1999), Bishop (2007), and Jurafsky and Martin (2009)
discuss it without commenting on its origin. Gale et al. (1992), cited in Manning and Schütze
(1999), apply a naive Bayes classifier to the problem of word-sense disambiguation in natural
language processing, without referring to it as such. That paper, in turn, cites Mosteller and
Wallace (1963), a famous paper that used a naive Bayes classifier (also not referred to as such)
to determine the authorship of twelve of the Federalist Papers. We suspect that naive Bayes
classifiers were used in diverse settings before the name itself caught on.
8 The first academic papers to discuss Bayesian spam classifiers appeared in 1998 (Pantel and
Lin, 1998; Sahami et al., 1998). However, it was an essay from 2002 titled A Plan for Spam that
popularized the concept and made specific proposals to lower the rate of false positives to
the point where the technology became usable (Graham, 2008).
$$u_l = \log\left(\frac{N_{1l}}{N_{2l}} \div \frac{N_1}{N_2}\right). \qquad \text{[provisional]}$$

$N_{1l}$ is the number of training languages in class 1 that have feature $l$, and $N_1$
is the total number of training languages in class 1. $N_{2l}$ and $N_2$ are analogous
quantities for class 2. The first ratio $N_{1l}/N_{2l}$ is a comparison of the counts
9 When we were devising the Core and Periphery technique, we tried using other kinds of
classifiers besides NBC, such as support vector machines and logistic regression. The latter two
are most often presented as classifying objects into two classes, but multiclass versions exist.
All three classifiers are supervised learners, in that they classify based on examples provided
by the analyst. In practice, NBC worked better than the other two methods, perhaps because
it is a generative model, whereas the other two are discriminative models. Generative models
tend to work better when the number of data points in the training data is relatively small
and the dimensionality of the data is large (Ng and Jordan, 2001).
As for unsupervised analyses such as principal components analysis or multidimensional
scaling, these are certainly useful as exploratory data analyses, and they may even identify
potentially interesting linguistic areas. But since they are unsupervised, they cannot be
directed by an analyst to examine an areal hypothesis that the analyst is specifically interested
in. We thus omit mention of these analyses in discussing the Core and Periphery technique.
of feature $l$ in the two classes. This is counterweighted by the second ratio
$N_1/N_2$, which expresses the relative sizes of the two classes. The logarithm
has the effect of causing the weight to be zero when the feature is neutral,
positive when it is associated with class 1, and negative when associated with
class 2.
One problem with this formula is that when any of the counts are zero, the
feature weight ul ends up at either positive or negative infinity. To prevent this,
we inflate the counts by a small amount in order to regularize the result:
$$u_l = \log\left(\frac{\alpha + N_{1l}}{\alpha + N_{2l}} \div \frac{\alpha + \beta + N_1}{\alpha + \beta + N_2}\right).$$
For many applications it suffices to set 𝛼 = β = 1/2, but in our analyses we fit
these parameters to the data, as explained in Appendix b.3.
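To make the effect of the regularization concrete, the smoothed weight can be sketched in a few lines of Python. This is only an illustration: the function name `presence_weight` and the counts are invented for the example, and 𝛼 = β = 1/2 is the default mentioned above rather than the fitted values.

```python
import math

def presence_weight(n1l, n1, n2l, n2, alpha=0.5, beta=0.5):
    """Smoothed weight u_l for the presence of feature l (two-class case)."""
    count_ratio = (alpha + n1l) / (alpha + n2l)
    size_ratio = (alpha + beta + n1) / (alpha + beta + n2)
    return math.log(count_ratio / size_ratio)

# Hypothetical counts: 20 training languages in each class.
u_neutral = presence_weight(10, 20, 10, 20)  # equally frequent in both classes
u_class1 = presence_weight(15, 20, 0, 20)    # unattested in class 2, yet finite
```

With these invented counts the neutral feature receives weight 0, and the feature that never occurs in class 2 receives log 31 (about 3.43) instead of positive infinity.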
Strictly speaking, the above expression gives the feature weight for the pres-
ence of a feature. It is also necessary to calculate weights for the absence of a
feature, via
$$v_l = \log\left(\frac{\beta + N_1 - N_{1l}}{\beta + N_2 - N_{2l}} \div \frac{\alpha + \beta + N_1}{\alpha + \beta + N_2}\right).$$
The main difference is that counts for the presence of a feature N1l and N2l
have been replaced by counts for the absence of the feature N1 − N1l and
N2 − N2l . Once feature weights (for both present and absent features) have
been calculated, the classifier is ready to classify.
For the test language, the classifier produces a score
$$s = \sum_{l=1}^{L} \begin{cases} u_l & \text{if feature } l \text{ is present in the test language,} \\ v_l & \text{if feature } l \text{ is absent in the test language.} \end{cases} \tag{1}$$
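The scoring step can be illustrated with a minimal Python sketch. The helper names `weights` and `score`, the three segments, and all counts are hypothetical; 𝛼 = β = 1/2 is assumed instead of fitted values.

```python
import math

def weights(n1l, n1, n2l, n2, alpha=0.5, beta=0.5):
    """Return (u_l, v_l): smoothed weights for presence and absence of a feature."""
    size_ratio = (alpha + beta + n1) / (alpha + beta + n2)
    u = math.log((alpha + n1l) / (alpha + n2l) / size_ratio)
    v = math.log((beta + n1 - n1l) / (beta + n2 - n2l) / size_ratio)
    return u, v

def score(test_features, counts1, counts2, n1, n2):
    """Eq. 1: add u_l for each feature present in the test language, v_l otherwise."""
    total = 0.0
    for feature in counts1:
        u, v = weights(counts1[feature], n1, counts2[feature], n2)
        total += u if feature in test_features else v
    return total

# Invented counts: training languages per class that have each segment.
counts_core = {"q": 9, "kʷ": 1, "ʎ": 8}
counts_control = {"q": 1, "kʷ": 7, "ʎ": 2}
s_core_like = score({"q", "ʎ"}, counts_core, counts_control, n1=10, n2=10)
s_control_like = score({"kʷ"}, counts_core, counts_control, n1=10, n2=10)
```

In this toy setting the first test language scores positive (class 1) and the second negative (class 2), because every feature contributes with the sign of its association.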
the procedures. The model posits that our data, which comprise the training
languages, the test language, and the labels for the training languages, were
generated via a set of random events, which are as follows.10
– Randomly generate a feature frequency θkl for each feature l and each class
k. This is the probability that a language in class k will have feature l. Feature
frequencies are unobserved.
– Assign each language, including the test language, to one of K classes with
probability 1/K.11 The assignments of the training languages are observed.
The assignment of the test language is unobserved.
– For each language, endow it with feature l with probability θkl , where k is the
class of the language. Each feature is generated independently of the others,
conditional on k. The features that a language has are all observed.
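The three generative steps above can be simulated directly. The following Python sketch is an illustration under stated assumptions: the function name `generate_data` is hypothetical, and drawing the feature frequencies from a Beta(𝛼, β) distribution is one possible choice for the first step.

```python
import random

def generate_data(num_classes, num_features, num_languages, rng, alpha=0.5, beta=0.5):
    """Simulate the generative story: frequencies, class labels, then features."""
    # Step 1: a feature frequency for every class and feature (unobserved in practice).
    theta = [[rng.betavariate(alpha, beta) for _ in range(num_features)]
             for _ in range(num_classes)]
    languages = []
    for _ in range(num_languages):
        # Step 2: each language joins one of the K classes with probability 1/K.
        k = rng.randrange(num_classes)
        # Step 3: each feature is drawn independently, conditional on the class.
        features = [rng.random() < theta[k][l] for l in range(num_features)]
        languages.append((k, features))
    return theta, languages

theta, languages = generate_data(2, 4, 50, random.Random(0))
```

In the model, the class label of the test language and all of `theta` would be hidden; the classifier works only from the feature lists and the training labels.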
With this as the premise, the classifier seeks to infer the class of the test
language. It calculates, for each class k, the probability f(k) that the test language
would be generated by the feature frequencies θk1 , …, θkL of class k. From this
it obtains the probability pk that the test language belongs to class k:
$$p_k = \frac{f(k)}{f(1) + f(2) + \cdots + f(K)}. \tag{2}$$
If the feature frequencies were known, the formula for f(k) would be straight-
forward:
$$f(k) = \prod_{l=1}^{L} \begin{cases} \theta_{kl} & \text{if feature } l \text{ is in the test language,} \\ 1 - \theta_{kl} & \text{if feature } l \text{ is not in the test language.} \end{cases}$$
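For known feature frequencies, the likelihood and its normalization into class probabilities (Eq. 2) can be sketched as follows. The function names and the frequencies are invented for illustration.

```python
import math

def likelihood(test_features, theta_k):
    """f(k): product of theta_kl for present features and 1 - theta_kl for absent ones."""
    return math.prod(p if present else 1.0 - p
                     for present, p in zip(test_features, theta_k))

def class_probabilities(test_features, theta):
    """Eq. 2: normalize the likelihoods so that they sum to one."""
    f = [likelihood(test_features, theta_k) for theta_k in theta]
    total = sum(f)
    return [value / total for value in f]

theta = [[0.9, 0.1], [0.2, 0.8]]  # invented frequencies, K = 2 classes, L = 2 features
p = class_probabilities([True, False], theta)
```

Here f(1) = 0.9 × 0.9 = 0.81 and f(2) = 0.2 × 0.2 = 0.04, so the test language is assigned to class 1 with probability 0.81/0.85.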
10 When thinking about such models, W.C. finds it helpful to imagine a deity generating the
data according to the procedure given, with some of the deity’s choices hidden from view.
What is not hidden comprises the data. On the basis of this data, we infer some of the
hidden things.
11 In a more sophisticated variant of this model, each language is assigned to class k with
some probability πk . The random variable πk is not observed, and must be inferred from
the data. In two-way classification, this adds a term such as log[N1 /N2 ] to the score of the
test language. When the number of training languages is fixed (as in our analyses), this
term moves all scores up or down by a fixed amount and does not alter any conclusions.
or absent) in the test language. We do not know what these feature frequencies
are, but we can obtain some insight (albeit not exactly the right answer)
by estimating the feature frequencies directly from the data via the formula
θkl = Nkl /Nk , where Nkl is the number of times feature l exists among training
languages in class k and Nk is the total number of training languages in class k:
$$f(k) = \prod_{l=1}^{L} \begin{cases} \dfrac{N_{kl}}{N_k} & \text{if feature } l \text{ is in the test language,} \\[1ex] \dfrac{N_k - N_{kl}}{N_k} & \text{if feature } l \text{ is not in the test language.} \end{cases}$$
The correct equation, obtained by integrating over all possible values for all
feature frequencies, is similar. If we posit a beta distribution prior for each
feature frequency θkl ∼ Beta(𝛼, β), we get the following expression for the
likelihood:
$$f(k) = \prod_{l=1}^{L} \begin{cases} \dfrac{\alpha + N_{kl}}{\alpha + \beta + N_k} & \text{if feature } l \text{ is in the test language,} \\[1ex] \dfrac{\beta + N_k - N_{kl}}{\alpha + \beta + N_k} & \text{if feature } l \text{ is not in the test language.} \end{cases} \tag{3}$$
This, along with Eq. 2, yields the probabilities p1 , …, pK for K-way classification.
Appendices b.1 and b.2 restate the contents of this section more formally and
expand on it.
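As a rough illustration of K-way classification, Eq. 3 and Eq. 2 can be combined as in the Python sketch below. The names `log_likelihood` and `classify` and all counts are hypothetical. Accumulating in log space is a numerical convenience assumed here, not something the text prescribes.

```python
import math

def log_likelihood(test_features, counts_k, n_k, alpha=0.5, beta=0.5):
    """Logarithm of Eq. 3 for one class, summed feature by feature."""
    log_f = 0.0
    for present, n_kl in zip(test_features, counts_k):
        numerator = alpha + n_kl if present else beta + n_k - n_kl
        log_f += math.log(numerator / (alpha + beta + n_k))
    return log_f

def classify(test_features, counts, sizes):
    """Eq. 2 applied to the likelihoods of Eq. 3, for any number of classes."""
    log_f = [log_likelihood(test_features, c, n) for c, n in zip(counts, sizes)]
    peak = max(log_f)  # subtract the maximum before exponentiating to avoid underflow
    f = [math.exp(value - peak) for value in log_f]
    total = sum(f)
    return [value / total for value in f]

# Invented counts: counts[k][l] = training languages in class k that have feature l.
counts = [[9, 1], [1, 7], [5, 5]]
sizes = [10, 10, 10]
p = classify([True, False], counts, sizes)
```

With these counts the test language, which has the first feature and lacks the second, is most probable under the first class and least probable under the second.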
the score s and p1 (the probability that a test language belongs in class 1). They
are related by the function $S(o) = 1/(1 + e^{-o})$. This function is plotted here:
[plot of the function S not reproduced]
S(s) = p1 . Conversely we can apply the inverse function $S^{-1}(p) = \log[p/(1 - p)]$
$$s = \sum_{l=1}^{L} \log \begin{cases} \dfrac{\alpha + N_{1l}}{\alpha + \beta + N_1} \div \dfrac{\alpha + N_{2l}}{\alpha + \beta + N_2} & \text{if feature } l \text{ is in the test language,} \\[1ex] \dfrac{\beta + N_1 - N_{1l}}{\alpha + \beta + N_1} \div \dfrac{\beta + N_2 - N_{2l}}{\alpha + \beta + N_2} & \text{if feature } l \text{ is not in the test language.} \end{cases}$$
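The correspondence between the score and the class-1 probability can be checked numerically. The sketch below uses invented counts and the default 𝛼 = β = 1/2; the helper name `likelihood` is hypothetical.

```python
import math

ALPHA = BETA = 0.5

def likelihood(test, counts, n):
    """Eq. 3 likelihood of the test language under one class."""
    f = 1.0
    for present, n_l in zip(test, counts):
        f *= ((ALPHA + n_l) if present else (BETA + n - n_l)) / (ALPHA + BETA + n)
    return f

test = [True, False, True]
counts1, n1 = [9, 1, 8], 10  # invented counts for class 1
counts2, n2 = [1, 7, 2], 12  # invented counts for class 2

f1, f2 = likelihood(test, counts1, n1), likelihood(test, counts2, n2)
p1 = f1 / (f1 + f2)                             # Eq. 2 with K = 2
score = math.log(f1) - math.log(f2)             # sum of the per-feature log ratios
p1_from_score = 1.0 / (1.0 + math.exp(-score))  # S(s)
score_from_p1 = math.log(p1 / (1.0 - p1))       # inverse of S applied to p1
```

Up to floating-point rounding, `p1_from_score` equals `p1` and `score_from_p1` equals `score`, which is the sense in which the score is the log-odds of class 1.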
[…] were derived.
When K > 2, the structure of the computation in Section 4.3 does not result in a
distinct training stage. Also, since the classification results in more than two
probabilities, it is no longer possible to indicate the classification of the test
language with a single score. We can, however, convert each pk into a log-odds
and indicate the classification with K scores. When reporting the results of
3-way classification in Appendix c.2, this is what we do.
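Converting K class probabilities into K log-odds scores is a one-line transformation. The sketch below is illustrative only; the function name and the probabilities are invented.

```python
import math

def log_odds_scores(probabilities):
    """Convert each class probability p_k into the log-odds log(p_k / (1 - p_k))."""
    # Probabilities of exactly 0 or 1 would need special handling (infinite log-odds).
    return [math.log(p / (1.0 - p)) for p in probabilities]

scores = log_odds_scores([0.7, 0.2, 0.1])  # hypothetical 3-way result
```

A class with probability above one half receives a positive score and the others negative scores, so the sign convention of the two-way score carries over to each class separately.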
classifying a language. All languages will suffer from this effect to some extent
when undergoing classification, since feature non-independence (or, more col-
loquially, feature clumping) occurs frequently. Vowels of a given height, nasal
vowels, long vowels, voiced stops, aspirated stops, ejective stops, etc.: each of
these classes of sounds tends to be a clump. The presence or absence, in a test
language, of any of these clumps exaggerates classification probabilities, ren-
dering a literal probabilistic interpretation problematic. In our analyses, we
sidestep this problem by disregarding the literal interpretation of the classifica-
tion probabilities and reinterpreting them as measures of linguistic admixture.
This interpretive leap calls for a careful explanation of admixture and how it is
that admixture is not directly modeled by a naive Bayes classifier, to which we
now turn.
By the term ‘admixture’ we refer to the phenomenon where the features of
a language derive from two or more sources. This is analogous on some level
to genetic admixture, where a person inherits certain genes from one parent
and certain genes from the other; or, more abstractly, where a person inherits
features from each of the K distinct ancestral populations in his or her ancestry.
If we were to posit admixture for circum-Andean languages, one way to do this
would be to posit two sources, one for the Andean core and one for the control
class, described in Section 2. Each source is a hypothetical ancestral population
in which there is a certain amount of linguistic diversity. A source does not
have to be an actual set of precursor languages, though this is a good way to
conceptualize it.12 Each modern language descends from one or more sources.
A pure language derives its features from just one source. If, for example, all
of the languages in the ancestral population have /p/, then a descendant of that
source will also have /p/. If 60% of the languages in the ancestral population
have /x/, then a descendant of that source will have /x/ with 60% probability. In
general, the probability that a descendant has a feature matches the probability
that a randomly-chosen constituent of the ancestral population has it.13 Since
there is some diversity in any ancestral population, one pure descendant does
not have to be identical to another, but it will in almost all cases be classified
as descending from that population with little ambiguity, when all features are
taken into account.
12 Formally, a source is represented by a bank of feature frequencies, one for each feature.
Source k is represented by feature frequencies (θk1 , …, θkL ), where θkl is the frequency of
feature l among the languages of ancestral population k. This is formally identical to how
a class is modeled in nbc; see first bullet in Section 4.3.
13 This is formally identical to how languages are generated in nbc; see third bullet in Section
4.3.
A mixed language derives its features from more than one source. If, for
example, two ancestral populations are involved, then a certain fraction of the
mixed language’s features may derive from one, while the rest derive from the
other.14 It is often much more reasonable to posit that a language is mixed
rather than pure. For instance, if a language has many distinctively Andean
features and also many distinctively non-Andean features, then it is, on an
intuitive level, best to posit admixture. (Just as, if a dog has many poodle
features and many labrador features, one surmises that it is a mixed breed.)
When a language is mixed, it is often possible to infer the extent to which it
drew from each ancestral population. For circum-Andean languages, such a
statistic would indicate how core-like or control-like a language is.
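The contrast between pure and mixed languages can be made concrete with a small simulation. This is a sketch under an assumption: each feature independently picks its source, with a fixed probability of drawing from source A. The function name and the frequencies are hypothetical.

```python
import random

def mixed_language(theta_a, theta_b, fraction_a, rng):
    """Draw each feature from source A with probability fraction_a, else from source B."""
    features = []
    for p_a, p_b in zip(theta_a, theta_b):
        p = p_a if rng.random() < fraction_a else p_b  # choose the source of this feature
        features.append(rng.random() < p)              # then draw the feature from it
    return features

rng = random.Random(0)
source_a = [1.0] * 5  # every language of ancestral population A has these features
source_b = [0.0] * 5  # no language of ancestral population B has them
pure_a = mixed_language(source_a, source_b, 1.0, rng)
pure_b = mixed_language(source_a, source_b, 0.0, rng)
mixed = mixed_language(source_a, source_b, 0.5, rng)
```

Setting `fraction_a` to 1 or 0 recovers a pure descendant of one source; intermediate values give a language whose features come partly from each.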
However, as previously mentioned, the naive Bayes classifier is not a model
of admixture. Rather unrealistically, every test language is assumed to be a pure
language. Classification involves determining not to what extent the language
descended from each ancestral population, but with what probability. Our inter-
pretive leap is to use the latter as an indicator of the former. Unfortunately,
the coarseness of this method of interpretation does not allow us to infer the
absolute proportions of admixture in a language. If the model reports that a
language belongs to class k with probability 0.7, that is by no means the same
as indicating that 70% of the phonemes of the language are from the source
identified with class k. We can only conclude that, if pk is higher for language
X than for language Y, then X probably derives more of its phonemes from the
source corresponding to class k than Y. This relativistic interpretive strategy,
whatever its drawbacks, has the benefit that it allows us to work around the
fact that feature clumping exaggerates classification probabilities and deprives
them of their usual interpretation.
14 For an example of a model that implements admixture in exactly this way, see Pritchard
et al. (2000).
from our analyses all features that occur in five or fewer training languages. This
amounted to discarding 225 of the 304 features in the dataset, leaving 79.
To be consistent with culling rare features, we have also culled near-universal
features on the theory that, when absences are rare, the absences can clump
together just like rare features. Thus, we discarded any feature that is present
in all but five or fewer training languages. This resulted in discarding /t/, /k/,
/i/, and /a/ from our analyses, leaving 75 features.
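The two culling rules amount to a simple filter on the training counts. In the Python sketch below, the function name, the segments, and the total of 100 training languages are invented for illustration.

```python
def cull_features(feature_counts, num_training, threshold=5):
    """Keep features that are neither rare nor near-universal in the training set."""
    return {
        feature: count
        for feature, count in feature_counts.items()
        # Drop features present in `threshold` or fewer languages, and features
        # absent from `threshold` or fewer languages.
        if count > threshold and num_training - count > threshold
    }

# Invented counts out of 100 hypothetical training languages.
counts = {"t": 98, "q": 30, "ɓ": 2, "ʎ": 41}
kept = cull_features(counts, num_training=100)
```

Here /t/ is dropped as near-universal and /ɓ/ as rare, leaving /q/ and /ʎ/.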
$$\delta_l = u_l - v_l = \log\left(\frac{\alpha + N_{1l}}{\beta + N_1 - N_{1l}}\right) - \log\left(\frac{\alpha + N_{2l}}{\beta + N_2 - N_{2l}}\right).$$
This measure is zero if the feature is neutral, positive if it is associated with class
1, and negative if it is associated with class 2. We can generalize delta to K-way
classification by defining a set of K deltas for each feature:
$$\delta_{kl} = \log\left(\frac{h_{kl}}{1 - h_{kl}}\right) - \log\left(\frac{\sum_{j \neq k} h_{jl}}{\sum_{j \neq k} (1 - h_{jl})}\right),$$
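The two-class delta can be computed directly from the counts, since the class-size ratio cancels when v_l is subtracted from u_l. The following Python sketch uses a hypothetical function name and invented counts, with 𝛼 = β = 1/2 assumed.

```python
import math

def delta(n1l, n1, n2l, n2, alpha=0.5, beta=0.5):
    """Two-class feature delta: smoothed log-odds of the feature in class 1 minus class 2."""
    return (math.log((alpha + n1l) / (beta + n1 - n1l))
            - math.log((alpha + n2l) / (beta + n2 - n2l)))

d_neutral = delta(10, 20, 10, 20)  # same frequency in both classes
d_class1 = delta(15, 20, 5, 20)    # more frequent in class 1
d_class2 = delta(5, 20, 15, 20)    # more frequent in class 2
```

Swapping the two classes flips the sign of the delta, which matches its reading as a signed measure of association.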
5 Results
strongly characteristic of the Andean core, and deltas between 1 and 2 (0.73 <
p < 0.88) and −1 and −2 as the range for segments whose presence or absence,
respectively, are moderately characteristic of the Andean core. Strongly char-
acteristic segments are printed in bold, while moderately characteristic ones
are printed in normal weight.
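The probabilities quoted for deltas of 1 and 2 follow from applying the function S(o) = 1/(1 + e^(−o)) to the delta. A short Python check, with a hypothetical function name:

```python
import math

def delta_to_probability(delta):
    """Map a log-odds value to a probability with S(o) = 1 / (1 + exp(-o))."""
    return 1.0 / (1.0 + math.exp(-delta))

thresholds = {d: round(delta_to_probability(d), 2) for d in (1, 2)}
```

This gives 0.73 for a delta of 1 and 0.88 for a delta of 2, the bounds cited in the text.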
The distinctive phonological profile of the Andean core languages, i.e. the set
of segments that distinguish the Andean core languages from control languages
in terms of either their presence or their absence, is large. The size of this
distinctive phonological profile strongly suggests that the chosen core forms
part of a phonological area distinguishable from the set of control languages.
The distinctive Andean consonantal profile can be positively characterized
as exhibiting contrastive aspirated and ejective stops (a contrast found also in
the postalveolar affricate), as well as a comparatively large number of affricates,
fricatives, and liquids. Less common places of articulation that contribute
positively to the profile include palatal (nasal and liquid) and uvular (stop and
fricative). The consonantal profile can be negatively characterized as excluding
the voiced alveolar stop and affricate, the labialized velar voiceless stop and
nasal, voiced bilabial and voiceless labiodental fricatives, and the glottal stop
and fricative. The distinctive Andean vocalic profile is positively characterized
by /u/ and /iː, uː, aː/, but negatively by the absence of mid vowels, non-low
central vowels, nasal vowels, and long versions of many of these vowels.
The nbc score of each language is given in Appendix c.1 and is plotted on a
map in Fig. 2, where the orange line is a smoothed version of the 2000-meter
elevation contour. Languages with nbc scores near zero, and hence, difficult
to classify as either Andean or non-Andean, appear in light gray. Higher nbc
scores for a language correspond to greater red saturation, while the lower (i.e.
negative) nbc scores correspond to greater blue saturation.
26 michael et al.
table 1 Distinctive features of the Andean core languages. Left: distinctive phonemes
(positive feature deltas). Right: distinctive absences (negative feature deltas).
Distinctive phonemes         | Distinctive absences
pʰ p’ tʰ t’ kʰ k’ q qʰ q’    | d kʷ ʔ
tʃ tʃʰ tʃ’ ʈʂ                | dʒ
s ʃ x χ                      | β f h
ɲ                            | ŋʷ
l ɾ ʎ                        |
Figure 2 also shows that the nbc score tapers gradually with distance from
the Andean core. The periphery of this phonological area is thus diffuse, lack-
ing a clear boundary separating peripheral languages that are unambiguously
members of the phonological area, such as Yanesha’ [ame], from those that
are clearly not, such as Aguaruna [agr]. If we consider any language with an
nbc score greater than zero to be a candidate for membership in the area, and
(somewhat arbitrarily) any language with an nbc score in the 95th percentile
or greater to be a strong candidate for membership in the area, we obtain a
partitioning of the periphery into ‘strong’ and ‘weak’ members of the linguistic
area. These peripheral members of the Andean core mostly cluster geograph-
ically, as indicated below, and are displayed in the more detailed maps in Figs
3–5.
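The partition into strong and weak candidates can be sketched as follows. Several details are assumptions made for this example and are not specified in the text: the percentile is taken over all scored languages, the nearest-rank definition of a percentile is used, and the language labels and scores are invented.

```python
def partition_candidates(scores, percentile=95):
    """Label each language 'strong', 'weak', or None from its NBC score."""
    ordered = sorted(scores.values())
    # Nearest-rank percentile, computed with integer arithmetic (ceiling division).
    rank = -(-percentile * len(ordered) // 100)
    cutoff = ordered[rank - 1]
    labels = {}
    for language, s in scores.items():
        if s <= 0:
            labels[language] = None      # not a candidate for membership
        elif s >= cutoff:
            labels[language] = "strong"  # at or above the 95th percentile
        else:
            labels[language] = "weak"    # positive score below the cutoff
    return labels

# Invented scores for five hypothetical languages.
scores = {"A": 4.0, "B": 1.2, "C": 0.3, "D": -0.5, "E": -2.0}
labels = partition_candidates(scores)
```

With only five languages the cutoff coincides with the highest score, so only language A is labeled strong; with a realistic sample the cutoff would separate roughly the top twentieth of the scores.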
ecuadorean foothills
Strong: Cha’palaa [cbi] (Barbacoan)
Weak: Kamsá [kbh] (isolate)
huallaga valley
Strong: Chamicuro [ccc] (Arawak), Cholón [cht] (isolate)
Weak: Shiwilu [jeb] (Cahuapanan), Candoshi [cbu] (isolate)
southern peruvian foothills
Strong: Yanesha’ [ame] (Arawak)
Weak: Ashéninka (Apurucayali [cpc] and Pichis [cpu] dialects) (Arawak)
chaco
Strong: Vilela [vil] (isolate), Maká [mca], Chulupí [cag] (both Matacoan)
Weak: Wichí [mtp] (Matacoan), Toba Takshek [tob_tks], Toba Lañagashik
[tob_lng], Mocoví [moc] (all three Guaicuruan)
patagonia
Strong: Ona [ona], Haush [ona_mtr], Puelche [pue], Tehuelche [teh] (all
Chon)
Weak: Northern Alacalufan [alc_nth], Central Alacalufan [alc_cnt], and
Southern Alacalufan [alc_sth] (Alacalufan)
miscellaneous
Weak: Arabela [arl] (Zaparoan), Leko [lec] (isolate)
lowland quechuan languages
Strong: Ferreñafe Quechua [quf], Inga (Jungle dialect) [inj], Napo Quichua
[qvo], San Martín Quechua [qvs], Santiago del Estero Quechua [qus]
figure 3 Languages of the North Andes and Circum-Andean regions (two-way nbc scores).
See Fig. 9 for language names.
languages is either known to have taken place (see, e.g. Adelaar and Muysken,
2004, 411–413; Payne, 1990, 1–10), or such contact is generally plausible, due to
geographical proximity and the ubiquity of trade between adjacent highland
and lowland regions.
Somewhat more surprising is the fact that Patagonia and the Chaco con-
stitute an essentially contiguous phonological area with the southern Andes.
figure 4 Languages of the Central Andes and Circum-Andean regions (two-way nbc scores).
See Fig. 10 for language names.
Although there is evidence of trade between the Tiwanaku polity and the
inhabitants of the Chaco between approximately ad 100 and ad 1100 (Angelo
and Capriles, 2000; Lecoq, 1991; Torres and Repke, 2006), it is unclear whether
those relations would have been sufficiently intense to produce the kind of con-
vergence we see between the southern Andean languages. Nevertheless, one
Chacoan linguistic isolate (Vilela) and several Chacoan languages of the Mata-
coan and Guaicuruan families exhibit features strongly statistically associated
with the Andean highlands, including ejectives, uvular consonants, and the
palatal lateral. Evidence of contact between Patagonian and southern Andean
peoples is even sparser, but the former languages likewise exhibit features
characteristic of the Andean core languages. It should be noted that in
pre-Columbian times, the territory occupied by speakers of Patagonian languages
was contiguous with that occupied by Chacoan peoples (Viegas, 2005: 30),
raising the possibility that the similarity between Andean and Patagonian
figure 5 Languages of Patagonia (two-way nbc scores). See Fig. 11 for language names.
languages arose not from direct contact between the languages of these two
regions, but was mediated by Chacoan languages.
Admixture between circum-Andean languages and more northern lan-
guages of the Andean core appears to involve relatively local and recent con-
vergence of these peripheral languages to Andean core ones, but the pho-
nological convergence evident among Chacoan, Patagonian, and southern
Andean languages does not exhibit clear directionality. The circumstances
that led to this broader areal convergence are less clear, suggesting that much
older, possibly multilateral, processes of phonological borrowing are respon-
sible for the large-scale phonological areality we see in the South American
Cone.
In addition to the languages enumerated above, which comprise an essen-
tially contiguous region with the Andean highlands, we find three other lan-
guages with positive nbc scores whose participation in the Andean and circum-
Andean phonological area is dubious. These languages, listed below as out-
liers, obtain their high nbc scores due, in large part, to having aspirated stops
and/or a palatal lateral in their phonological inventories. Given the probabilis-
tic nature of nbc results and the great distance of these languages from the
Andean core, which renders historical contact with the Andean core languages
extremely unlikely, we conclude that these languages simply bear a chance
resemblance to the languages of the Andean core.
outliers:
Strong: Yawalapití [yaw] (Arawak)
Weak: Yucuna [ycn] (Arawak), Yaathe [fun] (Macro-Ge)
15 This line was chosen to group together the Andean languages with a three-way contrast
between plain, aspirated, and ejective stops.
table 2 Distinctive features of the Southern Andean core languages. Left: distinctive
phonemes (positive feature deltas). Right: distinctive absences (negative feature
deltas).
ph p’ th t’ kh k’ q qhq’ d ɡ ʔ
tʃh tʃ’ dʒ ʈʂ
s x χ ɸβf z ʃ
lɬ ʎ ŋw
ɲ w
table 3 Distinctive features of the Northern-Central Andean core languages. Left: distinctive
phonemes (positive feature deltas). Right: distinctive absences (negative feature
deltas).
ts tʃ ʈʂ ɡ ph p’ th t’ kh k’ kw q qh q’ ʔ
sz ʃ tʃh tʃ’
βf ɣ
l ʎ ɬ
ɲ ŋw
iː uː ĩ ĩː ɨ ɨː ̃ ̃ː ũ ũ ː
e eː ẽ ẽː ɛ ɛ̃ ə əː ə̃ ə̃ ː o oː õ õ ː ɔ ɔ̃ ɤ
aː ã ã ː
nbc scores are generally closer to the Southern Core than to the North-Central
Core, and conversely for languages with high North-Central nbc scores. The
fact that Andean-like languages in the peripheral region pattern with the near-
est core, rather than being randomly associated with either sub-core, indicates
that convergence between circum-Andean languages and Andean languages is
a relatively local effect, attributable to language contact between the Andean
languages of each sub-core and their circum-Andean neighbors.
Another random document with
no related content on Scribd:
steun en de schutsengel van Nederland moet blijven, en wie anders
denkt, is mij een vijand!...”
Eenigen der Geuzen stonden verbaasd en sprakeloos; de
meesten nogtans luisterden met geklemde tanden en met eene
uitdrukking van misprijzen.
“De wind is wat spoedig gekeerd!” riep Van der Voort, “gisteren
Geus, heden Paapsch!”
“Neen, neen,” riep Lodewijk, “ik ben nooit veranderd. Ik heb
gezworen met u tegen de Spanjaarden samen te spannen; dit was
onder de voorwaarde, dat men niets van mij tegen den godsdienst
vergen zou, en ik hadde hem niet gedaan dien eed, die mij zoo
zwaar op het hart gelegen heeft, ware het niet geweest om aan de
begeerte van Godmaert te voldoen. Gij zijt het, mijne heeren, die
veranderd zijt; gij hebt het geloof uwer voorvaderen verzaakt om
eene nieuwe gezindheid aan te kleven.”
“Dit is niet waar,” viel Van Halen hem in de rede. “Ik ben getrouw
aan den godsdienst.”
“Wat zult gij morgen dan doen?” vroeg de jonkheer.
“Morgen,” antwoordde Van Halen, Lodewijks hand drukkende,
“morgen zal ik aan uwe zijde staan, en ik zal strijden met u tegen de
scheurders.”
Een algemeene schreeuw van verontwaardiging ging op onder de
Geuzen:
“Nog een lafaard! nog een verrader! Gebannen, de dwepers! Weg
met de Spaanschgezinden! De deur uit!”
De geheele vergadering stond in rep en roer. Dolken werden
vooruitgebracht, en men ging de bedreiging van “de deur uit!”
werkstellig maken, wanneer moeder Schrikkel, vol benauwdheid en
met de armen opgeheven, binnen de zaal kwam geloopen en huilde:
“Gauw, gauw, mijne heeren, vlucht weg! op den zolder, in de goot,
— in den kelder! De wacht is dáár, — het huis is omringd van
gewapende mannen! Gauw, gauw!”
De Geuzen wierpen eenen gloeienden blik op Lodewijk, alsof zij
hem nu van een waar verraad beschuldigden; geen van hen deed
wat moeder Schrikkel zoo angstig aangeraden had. Integendeel, zij
schaarden zich allen in een halfrond, bereidden hunne pistolen,
trokken hunne degens of dolken, en bleven staan met het
voornemen om zich dapper te verweren.
De deur der kamer ging open. Een man van uitnemende lengte en
sterkte trad binnen. Zware knevels daalden hem langs de wangen,
wapenen van allerhanden aard hingen aan zijnen gordel.
“Wolfangh!” riepen de Geuzen verbaasd uit, terwijl zij hunne
degens en dolken weder instaken.
“Heeren,” sprak Wolfangh, zijnen hoed afnemende, “wat is dit?
waartoe die krijgsorde?... Komt op dan!” riep hij, zich naar de
trappen keerende, “komt op, mannen!”
Een twintigtal roovers drongen de zaal in en bevonden zich te
midden der Geuzen, die zich met afkeer van hen verwijderden.
De lastige stappen van menschen, welke iets zwaars geladen
hadden, deden zich nog op de trap hooren.
“Wat brengt gij ons dan, Wolfangh?” vroeg Lodewijk.
“Wat ik u breng, jonkheer? — Godmaert.”
“Godmaert!!” riepen allen met verwondering.
Vier mannen droegen den grijzen Geus op een vederen bed, en
plaatsten hem zachtjes op den vloer neder.
“Vrienden!” sprak hij, “het verheugt mij, dat ik u nogmaals
wederzie. Wie wil mij de hand drukken?”
Lodewijk had deze reeds vast en kuste ze met liefde. De Geuzen
kwamen, de een na den ander, den grijsaard met medelijden in
hunne armen drukken. Allen stonden stilzwijgend en met verbaasde
blikken op hem te staren.
“Wolfangh,” vroeg Schuermans, “hoe hebt gij toch onzen meester
verlost?”
“Heeren,” antwoordde de roover, “dit heeft weinig moeite gekost.
Ik had het gisteren al in den zin, en wilde u eene aangename
verrassing toebrengen. Ik dacht nogtans, dat wij Godmaert in eenen
beteren toestand zouden gevonden hebben.... Nu dan, ik kwam met
mijne makkers zachtjes aan het Steen. Wie is daar?” riep een
schutter, die met vele anderen bij de poort stond. “Wolfangh!”
antwoordde ik met eene donderende stem; en eer ik bij het Steen
naderde, waren zij allen de Palingbrug over en den Vischberg
afgeloopen. De Steenwarer wilde niet opendoen, doch wanneer hij
de poort onder de slagen onzer voorhamers en onder het geweld
onzer hefboomen zag waggelen, liet hij ons ras binnen en smeekte
om zijn leven. Wij gingen dan, door hem vergezeld, tot in de
moordenaarsputten, waar wij Godmaert vonden liggen. Voorts
hebben wij den edelen gevangene van zijn stroo opgelicht en, het
bed van den Steenwarer tot draagbaar nemende, hebben wij hem op
zijne vraag tot hier gebracht.”
Wolfangh keerde zich naar Lodewijk en vroeg met stille stem:
“Jonkheer, hoe heet de priester, die bij Godmaert was?”
“Pater Franciscus uit het Predikheerenklooster.”
De roover bracht den vinger aan zijn voorhoofd, als iemand, die
een woord in zijne hersens wil drukken om het niet te vergeten.
“Oh, wist de dochter van Godmaert, dat haar vader uit de
gevangenis geraakt is, wat vreugde zou het haar zijn!....” zuchtte
Lodewijk.
“Pater Franciscus heeft zich met deze boodschap belast,”
antwoordde Wolfangh. “Mannen!” ging hij voort zich tot zijne
makkers keerende, “ieder ga naar zijne legerplaats. Morgen te acht
uren! Gij blijft hier,” sprak hij tot de vier, die het bed gedragen
hadden.
De roovers ruimden de zaal en, na de Geuzen Godmaert vele
teekens van vriendschap en medelijden gegeven hadden, werd er
gevraagd of men beginnen zou. De stoelen werden binnengebracht
en zoo wel geplaatst, dat allen zich om den grijsaard konden
nederzetten. Deze, door de rust en het bijzijn zijner vrienden een
weinig krachtiger geworden, kon zijne armen reeds verroeren, en
Lodewijk bemerkte met uiterste blijdschap, dat de dood hem niet
treffen zou. Zijn hart vloog naar zijne beminde Geertruid. Nijdig was
hij, dat dit nieuws haar door een ander was gedragen geworden.
“Mijne heeren,” sprak Godmaert, na met een teeken der hand de
stilzwijgendheid gevorderd te hebben, “ik heb mij naar deze
vergadering doen brengen, om met u te beraadslagen over hetgeen
er moet gedaan worden. Hebt gij reeds over de zaak gehandeld?”
Houtappel bezag Lodewijk met eene spottende uitdrukking en
kwam vooruit tot bij Godmaert, dan sprak hij:
“Morgen zullen wij om acht uren ons op de Groote Markt
bevinden. Dit is vastgesteld. Het volk zullen wij door den kreet:
Leven de Geuzen! tot woelen opmaken; het sermoen van Herman in
de hoofdkerk zal eene groote beroerte in de stad verwekken; wij
zullen deze ten onzen voordeele wenden. Dan naar het stadhuis;
alwat Spaansch of Spaanschgezind is, gevangen; de stad met
gewapende mannen bezet, en onzen vrienden van Brussel en van
de Noordergewesten kennis gegeven van den goeden uitslag. Dan
nieuwe wethouders benoemd, het volk uitgezonden om de steden en
vlekken van het markgraafschap te doorloopen en de Spanjaarden
overal te verdrijven. Ik ben zeker, dat dit ontwerp uwe goedkeuring
zal bekomen.”
Godmaert bleef een oogenblik in diep gepeins. Terwijl wachtten de
Geuzen op een antwoord, alhoewel zij niet twijfelden of de oude
krijgsman zou hunne onderneming toejuichen.
Maar hoe stonden zij verslagen, wanneer Godmaert hun zeide:
“Neen, ik kan dit ontwerp niet goedkeuren. De tijd is niet gekomen.
Wij mogen nu tegen de Spanjaarden niet strijden.”
“Hij ook!” riep Houtappel, als vervoerd door razenden toorn.
“Welaan, broederen, wij zijn verraden, maar niet geleverd. Laat ons,
zonder die lafaards langer te kennen, ons werk voortzetten. Zij
mogen alleen met de Spanjaarden, nonnen en papen naar den
hemel gaan!”
Die scherts ontroerde Godmaert; een lichte gloed van gramschap
kleurde zijn bleek voorhoofd, en hij sprak met een streng gelaat:
“Dank moogt gij zeggen, Houtappel, dat mijn lichaam door lijden
uitgeput is, of ik zou uwe goddelooze spotternij op uwen mond doen
sterven. Stil, Lodewijk, word bedaard, mijn zoon.”
Houtappel dorst den grijsaard niet meer hoonen, en ging voort met
tusschen zijne makkers in stilte de verwijtingen en den haat uit te
strooien.
“Ha, nu begrijp ik het!” sprak Godmaert in zich zelven, “nu ken ik
u. — Het is waar, wat pater Franciscus mij zeide: er zijn ketters
onder ons. — Mijne heeren,” ging hij met meer kracht voort, “aan u,
die mijne vrienden zijt, ben ik de uitlegging van mijn gedrag
verschuldigd. Wij haten altemaal de Spanjaarden, eenigen om
persoonlijke redenen, allen omdat zij vreemdelingen zijn en ons
hoonen. Ik heb veel bijgebracht om dien haat onder u aan te stoken;
doch nu betreur ik het.... Mijne oogen zijn opengegaan, en ik heb
met pijn bevonden, dat al onze pogingen, zonder dat ik en velen
onder ons het wisten, tegen onzen godsdienst gericht waren. Dan,
hoe vurig ook mijn haat tegen de Spanjaarden zij, nimmer zal ik met
de vijanden van mijn geloof samenspannen.”
“Wat heeft de biecht gemeens met de omwenteling van morgen?”
schreeuwde Houtappel van uit eenen hoek der kamer.
“Wat zij er mede gemeens heeft, weet gij best,” hernam Godmaert.
“Gij weet, dat Herman Stuyck en zijne aanhangers de kerk van Onze
Lieve Vrouw willen ontheiligen: gij weet, dat de scheurders eene
gelegenheid zoeken om al onze tempels te verwoesten en de
beelden te breken; en gij hoopt, dat de beroerten van morgen die
gelegenheid van zelf zullen doen geboren worden. Ik beklaag mij,
dat ik machteloos ben.... want anders zou ik u misschien kunnen
ontmoeten en bestrijden, in uwe goddelooze aanvallen. En gij, mijne
vrienden, die mij altijd met achting aangehoord hebt, ik bezweer u,
helpt de ketters niet; stelt de omwenteling uit. Verlaat de zijde
dergenen, die zich niet schamen, in deze vergadering zelve met
spotternij te spreken van voor ons heilige zaken.”
Eene merkbare scheuring was er onder de Geuzen gebeurd. In
het diepe der kamer, rond Houtappel en Van der Voort, stonden die,
welke van geen uitstel wilden hooren. Omtrent Godmaert bevonden
zich Lodewijk, Van Halen, De Eydt en bijna de eene helft der
Geuzen. Schuermans liep over en weer, en wist niet bij wat gedeelte
hij zich voegen zou, terwijl Wolfangh zich als een vreemdeling in
deze onderhandeling gedroeg.
Nadat Houtappel met eenigen zijner makkers gesproken had,
kwam hij in het midden der kamer staan, als iemand, die eene
uitdaging gaat doen, en, de hand in de hoogte heffende, riep hij:
“Wij scheiden ons af van de bevreesden! Al wie den naam van
Geus liefheeft, al wie met ons tegen de Spanjaarden strijden wil, dat
hij ons volge.... Wij gaan in eene andere plaats onze
beraadslagingen voortzetten! Verraders mogen ons niet hooren!”
Omtrent de helft gingen de deur uit en verlieten vloekend de
kamer. Houtappel vond zich niet weinig bedrogen, wanneer hij zag,
dat Wolfangh geene beweging deed om met hem te gaan.
“Kom aan, Wolfangh,” riep hij. “Wat kont gij bij deze vreedzame
menschen doen? Gij behoort er bij als een hond in een kegelspel!”
De roover sloeg zijne hand aan een pistool en wilde Houtappel die
scherts met het leven doen betalen; maar Lodewijk belette hem dit
met een teeken.
“Gij zijt gelukkig,” riep Wolfangh. “Ga, ik heb met u niets gemeens,
en laat mij met vrede, of ik zal u leeren spotten!”
Houtappel ging morrend de trappen af. Er bleef dan in de kamer
nog één Geus, die niet wist wat hij doen zou; hij sloeg zich met de
handen tegen het hoofd om een besluit er uit te krijgen; eindelijk riep
hij:
“Zult gijlieden morgen niet vechten?”
“Ja, Schuermans,” antwoordde Van Halen, “tegen de ketters zullen
wij strijden.”
“Ha, dan blijf ik nog liever met u.”
“Ik versta de vreeze van den edelen Godmaert zeer wel,” sprak De
Rydt. “Die vervloekte predikers hebben den haat van een deel des
volks tot hun voordeel gekeerd en hen tot beeldenstormen
opgemaakt. Daar zij in ’t eerst, evenals wij, de Spanjaarden alleen
als vijanden aanzagen, hebben die aanbrengers eener nieuwe leer
het volk haat voor den godsdienst ingeboezemd, en nu denkt het,
dat beelden en Spanjaarden één zijn.”
“Ik heb gehoord,” sprak Van Halen, “dat zij morgen iets tegen
Onze-Lieve-Vrouwekerk willen ondernemen. Zij spreken niet meer
dan van branden en verwoesten. Hoe gaan wij die heiligschenderij
beletten?”
“Ik heb twintig uitgelezene mannen,” zei Wolfangh; “dezen zullen
uwe bevelen stiptelijk ten uitvoer brengen.”
“Master,” one of the four robbers interrupted him, “if we may
steal nothing, those Geuzen gentlemen will also have to keep their
promises, or....”
“Silence, fellow!” cried Wolfangh.
The robber fell silent and gave his features a very distrustful
expression. Many Geuzen were astonished at his words, for they
knew nothing of these promises. Godmaert alone knew of them,
since he had made them.
“Our cause,” said the sick man, “has become too noble and too
exalted to employ paid men for it any longer. I shall have the
promised wages given to you. But from now on you are released.
Return to Zoersel, if you wish.”
“They shall stay!” cried Wolfangh with a flashing glance.
“I shall force them to do good.... Not another word, fellow!”
The robber cast his eyes down before his master’s
threat.
“Listen, gentlemen,” Godmaert resumed. “Here is what you could
do: there are still enough loyal citizens in our city; we know
many who are against the heretics. Call them together tomorrow,
and use them to prevent all disturbance and to protect the
churches. Let Schuermans bring the people of the Klapdorp with him;
De Rydt, you the loyal citizens of the Nieuwstad; Lodewijk, our
friends of the Kipdorp; Van Halen, the boatmen of the Burcht; and
so on, each of you those who are devoted to him. You will then be
on the Groote Markt tomorrow and help the brothers in arms, if
need be. On the spot itself you will perhaps devise better
measures. All will go well.”
Godmaert had twice drained a bowl of wine to the bottom, and
this had strengthened him wonderfully, for his cheeks were already
softly coloured. Lodewijk saw with delight the old man’s improved
condition: he did not leave him for a moment and seemed utterly
concerned for him; at the slightest sign he anticipated Godmaert’s
wishes, lifted his head, covered his limbs, or handed him the
drinking vessel so that he could return his friends’ toasts.
Now the front door was heard opening, and the rustle of a silk
dress sounded on the stairs. A few moments later Geertruid lay
weeping on her father’s breast, not from sorrow, but from
rapture and joy.
“Father, father!” she cried, “do you see that you will recover?
Oh, you are already flushed! And your arms can clasp my neck;
let me kiss you; you know well that your daughter’s kisses are
warm and strong. Father, dear father, you are smiling at me!...”
And her hands lay flat on the old man’s cheeks. He
savoured with rapture his daughter’s love.
“Dear child!” he sighed, “you are a blessing from heaven to me!”
He pressed her tenderly to his breast.
The bystanders looked on this scene in devout silence. Warm
tears trickled down the cheeks of Schuermans and many others.
Wolfangh, who now tasted the reward of a good deed, had covered
his eyes with his hands and stood withdrawn into a corner of the
hall. Lodewijk, who had not received a single glance from his
Geertruid, was half sad; but that feeling was brief, for Geertruid
took his hand and pressed it tenderly. The young man understood
the girl; a bright smile rose over his face.
“Wolfangh, where are you?” cried Geertruid, looking around the
room. “Ha, there you are, my father’s deliverer! You must have
thanks; I shall pray for you....”
The robber’s eyes shone with emotion.
“I am unworthy of your gratitude, noble lady,” he said.
“Nevertheless I count myself happy to have been able to do
something that pleases you. Your joy is a sweet reward to me.”
“Sir Wolfangh,” Geertruid resumed with a sad yet friendly
expression, “oh, I am sorry that a brave man like
you....”
“I understand you, my lady,” answered the robber, “but all hope
is not lost.... Remember me in your prayers.”
While Geertruid went on speaking with Wolfangh, old Theresia,
who had come in with the young lady, stood weeping beside her
grey-haired master. A thousand exclamations came from her mouth,
and she filled the room with cries of grief; for she was seeing
him for the first time and could not understand the girl’s joy.
Had she seen him as near the grave as his daughter had, she too
would surely have been glad. At Lodewijk’s command she fell
silent, but went on weeping with muffled sobs.
“Father,” said Geertruid, “let me bring you to our home, so
that you may rest and wake tomorrow in good spirits under my
kisses.”
“Gentlemen,” cried Godmaert, “I am leaving you. See to it that
tomorrow witnesses no atrocities.... Come, let me press your hands
once more, my friends, and God be with you!”
All came in turn to press his hand and bid him a respectful
farewell.
Wolfangh had the litter brought forward.
“Men,” he said to his comrades, “let the noble Godmaert be
carried to his home! All of you shall stay and keep watch at the
house, and answer to me with your lives for whatever may befall
him.”
“I thank you, Sir Wolfangh,” said Geertruid, bowing before him.
The old man was carefully lifted by the four robbers and
left the hall amid the cheers of his friends.
“Lodewijk, as agreed, today at eight o’clock!” cried Schuermans.
In less than a moment the room was empty; the footsteps of
those departing echoed on the stairs, and the front door
was closed behind them.
“Jesus, Jesus! What more will happen today!” sighed Mother
Schrikkel.
And she shot the last bolt home.
IX
...ignoble populace,
What bitter spite consumes the marrow in your bones?
What madness carries you away?