
Analysis Methods in Neural Language Processing: A Survey

Yonatan Belinkov (1, 2) and James Glass (1)

(1) MIT Computer Science and Artificial Intelligence Laboratory
(2) Harvard School of Engineering and Applied Sciences
Cambridge, MA, USA
{belinkov, glass}@mit.edu

arXiv:1812.08951v1 [cs.CL] 21 Dec 2018

Abstract

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.

1 Introduction

The rise of deep learning has transformed the field of natural language processing (NLP) in recent years. Models based on neural networks have obtained impressive improvements in various tasks, including language modeling (Mikolov et al., 2010; Jozefowicz et al., 2016), syntactic parsing (Kiperwasser and Goldberg, 2016), machine translation (MT) (Bahdanau et al., 2014; Sutskever et al., 2014), and many other tasks; see Goldberg (2017) for example success stories.

This progress has been accompanied by a myriad of new neural network architectures. In many cases, traditional feature-rich systems are being replaced by end-to-end neural networks that aim to map input text to some output prediction. As end-to-end systems are gaining prevalence, one may point to two trends. First, some push back against the abandonment of linguistic knowledge and call for incorporating it inside the networks in different ways.[1] Others strive to better understand how neural language processing models work. This theme of analyzing neural networks has connections to the broader work on interpretability in machine learning, along with specific characteristics of the NLP field.

[1] See, for instance, Noah Smith’s invited talk at ACL 2017: vimeo.com/234958746. See also a recent debate on this matter by Chris Manning and Yann LeCun: www.youtube.com/watch?v=fKk9KhGRBdI. (Videos accessed on December 11, 2018.)

Why should we analyze our neural NLP models? To some extent, this question falls into the larger question of interpretability in machine learning, which has been the subject of much debate in recent years.[2] Arguments in favor of interpretability in machine learning usually mention goals like accountability, trust, fairness, safety, and reliability (Doshi-Velez and Kim, 2017; Lipton, 2016). Arguments against typically stress performance as the most important desideratum. All these arguments naturally apply to machine learning applications in NLP.

[2] See, for example, the NIPS 2017 debate: www.youtube.com/watch?v=2hW05ZfsUUo. (Accessed on December 11, 2018.)

In the context of NLP, this question needs to be understood in light of earlier NLP work, often referred to as feature-rich or feature-engineered systems. In some of these systems, features are more easily understood by humans: they can be morphological properties, lexical classes, syntactic categories, semantic relations, etc. In theory, one could observe the importance assigned by statistical NLP models to such features in order to gain a better understanding of the model.[3] In contrast, it is more difficult to understand what happens in an end-to-end neural network model that takes input (say, word embeddings) and generates an output (say, a sentence classification). Much of the analysis work thus aims to understand how linguistic concepts that were common as features in NLP systems are captured in neural networks.

[3] Nevertheless, one could question how feasible such an analysis is; consider for example interpreting support vectors in high-dimensional support vector machines (SVMs).

As the analysis of neural networks for language
is becoming more and more prevalent, neural networks in various NLP tasks are being analyzed; different network architectures and components are being compared; and a variety of new analysis methods are being developed. This survey aims to review and summarize this body of work, highlight current trends, and point to existing lacunae. It organizes the literature into several themes. Section 2 reviews work that targets a fundamental question: what kind of linguistic information is captured in neural networks? We also point to limitations in current methods for answering this question. Section 3 discusses visualization methods, and emphasizes the difficulty in evaluating visualization work. In Section 4 we discuss the compilation of challenge sets, or test suites, for fine-grained evaluation, a methodology that has old roots in NLP. Section 5 deals with the generation and use of adversarial examples to probe weaknesses of neural networks. We point to unique characteristics of dealing with text as a discrete input and how different studies handle them. Section 6 summarizes work on explaining model predictions, an important goal of interpretability research. This is a relatively under-explored area, and we call for more work in this direction. Section 7 mentions a few other methods that do not fall neatly into one of the above themes. In the conclusion, we summarize the main gaps and potential research directions for the field.

The paper is accompanied by online supplementary materials that contain detailed references for studies corresponding to Sections 2, 4, and 5 (Tables SM1, SM2, and SM3, respectively), available at boknilev.github.io/nlp-analysis-methods.

Before proceeding, we briefly mention some earlier work of a similar spirit.

A historical note. Reviewing the vast literature on neural networks for language is beyond our scope.[4] However, we mention here a few representative studies that focused on analyzing such networks, in order to illustrate how recent trends have roots that go back to before the recent deep learning revival.

[4] For instance, a neural network that learns distributed representations of words was developed already in Miikkulainen and Dyer (1991). See Goodfellow et al. (2016, chapter 12.4) for references to other important milestones.

Rumelhart and McClelland (1986) built a feed-forward neural network for learning the English past tense and analyzed its performance on a variety of examples and conditions. They were especially concerned with the performance over the course of training, as their goal was to model the past form acquisition in children. They also analyzed a scaled-down version having 8 input units and 8 output units, which allowed them to describe it exhaustively and examine how certain rules manifest in network weights.

In his seminal work on recurrent neural networks (RNNs), Elman trained networks on synthetic sentences in a language prediction task (Elman, 1989, 1990, 1991). Through extensive analyses, he showed how networks discover the notion of a word when predicting characters; capture syntactic structures like number agreement; and acquire word representations that reflect lexical and syntactic categories. Similar analyses were later applied to other networks and tasks (Harris, 1990; Niklasson and Linåker, 2000; Pollack, 1990; Frank et al., 2013).

While Elman’s work was limited in some ways, such as evaluating generalization or various linguistic phenomena, as Elman (1989) himself recognized, it introduced methods that are still relevant today: from visualizing network activations in time, through clustering words by hidden state activations, to projecting representations to dimensions that emerge as capturing properties like sentence number or verb valency. The sections on visualization (Section 3) and identifying linguistic information (Section 2) contain many examples for these kinds of analysis.

2 What linguistic information is captured in neural networks

Neural network models in NLP are typically trained in an end-to-end manner on input-output pairs, without explicitly encoding linguistic features. Thus a primary question is the following: what linguistic information is captured in neural networks? When examining answers to this question, it is convenient to consider three dimensions: which methods are used for conducting the analysis, what kind of linguistic information is sought, and which objects in the neural network are being investigated. Table SM1 (in the supplementary materials) categorizes relevant analysis work according to these criteria. In the next sub-sections, we discuss trends in analysis work along these lines, followed by a discussion of limitations of current approaches.
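One concrete way to pose this question is the classification approach discussed under Methods (Section 2.1): freeze a trained model, record its representations on annotated data, and train a separate classifier to predict a linguistic property from them. The following is a minimal sketch of that idea, not the procedure of any cited study. The function `toy_representations` is a hypothetical stand-in for recorded hidden states (random vectors in which one dimension weakly encodes a binary property), and the probe is a plain logistic regression fitted by gradient descent.

```python
import math
import random


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a binary logistic-regression probe on fixed feature vectors."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss w.r.t. z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, features, labels):
    """Fraction of examples whose property the probe predicts correctly."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (y == 1))
    return correct / len(features)


def toy_representations(n, seed):
    """Hypothetical stand-in for frozen hidden states.

    Dimension 0 is shifted by the (binary) linguistic property;
    the remaining dimensions are noise.
    """
    rng = random.Random(seed)
    feats, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0, 1) for _ in range(4)]
        x[0] += 2.0 if y == 1 else -2.0
        feats.append(x)
        labels.append(y)
    return feats, labels


# The "model" is never updated: only the probe's weights are trained.
train_x, train_y = toy_representations(200, seed=0)
test_x, test_y = toy_representations(100, seed=1)
w, b = train_probe(train_x, train_y)
acc = probe_accuracy(w, b, test_x, test_y)
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy on held-out data is then read as evidence that the property is recoverable from the representations; as Section 2.4 stresses, it does not show that the original model uses that information.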
2.1 Methods

The most common approach for associating neural network components with linguistic properties is to predict such properties from activations of the neural network. Typically, in this approach a neural network model is trained on some task (say, MT) and its weights are frozen. Then, the trained model is used for generating feature representations for another task by running it on a corpus with linguistic annotations and recording the representations (say, hidden state activations). Another classifier is then used for predicting the property of interest (say, part-of-speech (POS) tags). The performance of this classifier is used for evaluating the quality of the generated representations, and by proxy that of the original model. This kind of approach has been used in numerous papers in recent years; see Table SM1 for references.[5] It is referred to by various names, including “auxiliary prediction tasks” (Adi et al., 2017b), “diagnostic classifiers” (Veldhoen et al., 2016), and “probing tasks” (Conneau et al., 2018).

[5] A similar method has been used to analyze hierarchical structure in neural networks trained on arithmetic expressions (Veldhoen et al., 2016; Hupkes et al., 2018).

As an example of this approach, let us walk through an application to analyzing syntax in neural machine translation (NMT) by Shi et al. (2016b). In this work, two NMT models were trained on standard parallel data (English→French and English→German). The trained models (specifically, the encoders) were run on an annotated corpus and their hidden states were used for training a logistic regression classifier that predicts different syntactic properties. The authors concluded that the NMT encoders learn significant syntactic information at both word-level and sentence-level. They also compared representations at different encoding layers and found that “local features are somehow preserved in the lower layer whereas more global, abstract information tends to be stored in the upper layer.” These results demonstrate the kind of insights that the classification analysis may lead to, especially when comparing different models or model components.

Other methods for finding correspondences between parts of the neural network and certain properties include counting how often attention weights agree with a linguistic property like anaphora resolution (Voita et al., 2018) or directly computing correlations between neural network activations and some property, for example, correlating RNN state activations with depth in a syntactic tree (Qian et al., 2016a) or with Mel-frequency cepstral coefficient (MFCC) acoustic features (Wu and King, 2016). Such correspondence may also be computed indirectly. For instance, Alishahi et al. (2017) defined an ABX discrimination task to evaluate how a neural model of speech (grounded in vision) encoded phonology. Given phoneme representations from different layers in their model, and three phonemes, A, B, and X, they compared whether the model representation for X is closer to A or B. This discrimination task enabled them to draw conclusions about which layers encode phonology better, observing that lower layers generally encode more phonological information.

2.2 Linguistic phenomena

Different kinds of linguistic information have been analyzed, ranging from basic properties like sentence length, word position, word presence, or simple word order, to morphological, syntactic, and semantic information. Phonetic/phonemic information, speaker information, and style and accent information have been studied in neural network models for speech, or in joint audio-visual models. See Table SM1 for references.

While it is difficult to synthesize a holistic picture from this diverse body of work, it appears that neural networks are able to learn a substantial amount of information on various linguistic phenomena. These models are especially successful at capturing frequent properties, while some rare properties are more difficult to learn. Linzen et al. (2016), for instance, found that long short-term memory (LSTM) language models are able to capture subject-verb agreement in many common cases, while direct supervision is required for solving harder cases.

Another theme that emerges in several studies is the hierarchical nature of the learned representations. We have already mentioned such findings regarding NMT (Shi et al., 2016b) and a visually grounded speech model (Alishahi et al., 2017). Hierarchical representations of syntax were also reported to emerge in other RNN models (Blevins et al., 2018).

Finally, a couple of papers discovered that models trained with latent trees perform better on natural language inference (NLI) (Williams et al., 2018; Maillard and Clark, 2018) than ones trained with linguistically-annotated trees. Moreover, the trees in these models do not resemble syntactic trees corresponding to known linguistic theories, which casts doubt on the importance of syntax-learning in the underlying neural network.[6]

[6] Others found that even simple binary trees may work well in MT (Wang et al., 2018b) and sentence classification (Chen et al., 2015).

2.3 Neural network components

In terms of the object of study, various neural network components were investigated, including word embeddings, RNN hidden states or gate activations, sentence embeddings, and attention weights in sequence-to-sequence (seq2seq) models. Generally less work has analyzed convolutional neural networks (CNNs) in NLP, but see Jacovi et al. (2018) for a recent exception. In speech processing, researchers have analyzed layers in deep neural networks for speech recognition and different speaker embeddings. Some analysis has also been devoted to joint language-vision or audio-vision models, or to similarities between word embeddings and convolutional image representations. Table SM1 provides detailed references.

2.4 Limitations

The classification approach may find that a certain amount of linguistic information is captured in the neural network. However, this does not necessarily mean that the information is used by the network. For example, Vanmassenhove et al. (2017) investigated aspect in NMT (and in phrase-based statistical MT). They trained a classifier on NMT sentence encoding vectors and found that they can accurately predict tense about 90% of the time. However, when evaluating the output translations, they found them to have the correct tense only 79% of the time. They interpreted this result to mean that “part of the aspectual information is lost during decoding”. Relatedly, Cífka and Bojar (2018) compared the performance of various NMT models in terms of translation quality (BLEU) and representation quality (classification tasks). They found a negative correlation between the two, suggesting that high-quality systems may not be learning certain sentence meanings. In contrast, Artetxe et al. (2018) showed that word embeddings contain divergent linguistic information, which can be uncovered by applying a linear transformation on the learned embeddings. Their results suggest an alternative explanation, showing that “embedding models are able to encode divergent linguistic information but have limits on how this information is surfaced.”

From a methodological point of view, most of the relevant analysis work is concerned with correlation: how correlated are neural network components with linguistic properties? What may be lacking is a measure of causation: how does the encoding of linguistic properties affect the system output? Giulianelli et al. (2018) make some headway on this question. They predicted number agreement from RNN hidden states and gates at different time steps. They then intervened in how the model processes the sentence by changing a hidden activation based on the difference between the prediction and the correct label. This improved agreement prediction accuracy, and the effect persisted over the course of the sentence, indicating that this information has an effect on the model. However, they did not report the effect on overall model quality, for example by measuring perplexity. Methods from causal inference may shed new light on some of these questions.

Finally, the predictor for the auxiliary task is usually a simple classifier, such as logistic regression. A few studies compared different classifiers and found that deeper classifiers lead to overall better results, but do not alter the respective trends when comparing different models or components (Qian et al., 2016b; Belinkov, 2018). Interestingly, Conneau et al. (2018) found that tasks requiring more nuanced linguistic knowledge (e.g., tree depth, coordination inversion) gain the most from using a deeper classifier. However, the approach is usually taken for granted; given its prevalence, it appears that better theoretical or empirical foundations are in order.

3 Visualization

Visualization is a valuable tool for analyzing neural networks in the language domain and beyond. Early work visualized hidden unit activations in RNNs trained on an artificial language modeling task, and observed how they correspond to certain grammatical relations such as agreement (Elman, 1991). Much recent work has focused on visualizing activations on specific examples in modern neural networks for language (Karpathy et al.,
2015; Kádár et al., 2017; Qian et al., 2016a; Liu et al., 2018) and speech (Wu and King, 2016; Nagamine et al., 2015; Wang et al., 2017b). Figure 1 shows an example visualization of a neuron that captures position of words in a sentence. The heatmap uses blue and red colors for negative and positive activation values, respectively, enabling the user to quickly grasp the function of this neuron.

Figure 1: A heatmap visualizing neuron activations. In this case, the activations capture position in the sentence.

The attention mechanism that originated in work on NMT (Bahdanau et al., 2014) also lends itself to a natural visualization. The alignments obtained via different attention mechanisms have produced visualizations for tasks ranging from NLI (Rocktäschel et al., 2016; Yin et al., 2016), summarization (Rush et al., 2015), MT post-editing (Jauregi Unanue et al., 2018), and morphological inflection (Aharoni and Goldberg, 2017), to matching users on social media (Tay et al., 2018). Figure 2 reproduces a visualization of attention alignments from the original work by Bahdanau et al. Here grayscale values correspond to the weight of the attention between words in an English source sentence (columns) and its French translation (rows). As Bahdanau et al. explain, this visualization demonstrates that the NMT model learned a soft alignment between source and target words. Some aspects of word order may also be noticed, as in the reordering of noun and adjective when translating the phrase “European Economic Area”.

Figure 2: A visualization of attention weights, showing soft alignment between source and target sentences in an NMT model. Reproduced from Bahdanau et al. (2014), with permission.

Another line of work computes various saliency measures to attribute predictions to input features. The important or salient features can then be visualized in selected examples (Li et al., 2016a; Aubakirova and Bansal, 2016; Sundararajan et al., 2017; Arras et al., 2017a,b; Ding et al., 2017; Murdoch et al., 2018; Mudrakarta et al., 2018; Montavon et al., 2018; Godin et al., 2018). Saliency can also be computed with respect to intermediate values, rather than input features (Ghaeini et al., 2018).[7]

[7] Generally, many of the visualization methods are adapted from the vision domain, where they have been extremely popular; see Zhang and Zhu (2018) for a survey.

An instructive visualization technique is to cluster neural network activations and compare them to some linguistic property. Early work clustered RNN activations, showing that they organize in lexical categories (Elman, 1989, 1990). Similar techniques have been followed by others. Recent examples include clustering of sentence embeddings in an RNN encoder trained in a multi-task learning scenario (Brunner et al., 2017), and phoneme clusters in a joint audio-visual RNN model (Alishahi et al., 2017).

A few online tools for visualizing neural networks have recently become available. LSTMVis (Strobelt et al., 2018b) visualizes RNN activations, focusing on tracing hidden state dynamics.[8] Seq2Seq-Vis (Strobelt et al., 2018a) visualizes different modules in attention-based seq2seq models, with the goal of examining model decisions and testing alternative decisions. Another tool focused on comparing attention alignments was proposed by Rikters (2018). It also provides translation confidence scores based on the distribution of attention weights. NeuroX (Dalvi et al., 2019b) is a tool for finding and analyzing individual neurons, focusing on machine translation.

[8] RNNVis (Ming et al., 2017) is a similar tool, but its online demo does not seem to be available at the time of writing.

Evaluation. As in much work on interpretability, evaluating visualization quality is difficult and often limited to qualitative examples. A few notable
exceptions report human evaluations of visualization quality. Singh et al. (2018) showed humans hierarchical clusterings of input words generated by two interpretation methods, and asked them to evaluate which method is more accurate, or in which method they trust more. Others reported human evaluations for attention visualization in conversation modeling (Freeman et al., 2018) and medical code prediction tasks (Mullenbach et al., 2018).

The availability of open-source tools of the sort described above will hopefully encourage users to utilize visualization in their regular research and development cycle. However, it remains to be seen how useful visualizations turn out to be.

4 Challenge sets

The majority of benchmark datasets in NLP are drawn from text corpora, reflecting a natural frequency distribution of language phenomena. While useful in practice for evaluating system performance in the average case, such datasets may fail to capture a wide range of phenomena. An alternative evaluation framework consists of challenge sets, also known as test suites, which have been used in NLP for a long time (Lehmann et al., 1996), especially for evaluating MT systems (King and Falkedal, 1990; Isahara, 1995; Koh et al., 2001). Lehmann et al. (1996) noted several key properties of test suites: systematicity, control over data, inclusion of negative data, and exhaustivity. They contrasted such datasets with test corpora, “whose main advantage is that they reflect naturally occurring data.” This idea underlies much of the work on challenge sets and is echoed in more recent work (Wang et al., 2018a). For instance, Cooper et al. (1996) constructed a semantic test suite that targets phenomena as diverse as quantifiers, plurals, anaphora, ellipsis, adjectival properties, and so on.

After a hiatus of a couple of decades,[9] challenge sets have recently gained renewed popularity in the NLP community. In this section, we include datasets used for evaluating neural network models that diverge from the common average-case evaluation. Many of them share some of the properties noted by Lehmann et al. (1996), although negative examples (ill-formed data) are typically less utilized. The challenge datasets can be categorized along the following criteria: the task they seek to evaluate, the linguistic phenomena they aim to study, the language(s) they target, their size, their method of construction, and how performance is evaluated.[10] Table SM2 (in the supplementary materials) categorizes many recent challenge sets along these criteria. Below we discuss common trends along these lines.

[9] One could speculate that their decrease in popularity can be attributed to the rise of large-scale quantitative evaluation of statistical NLP systems.

[10] Another typology of evaluation protocols was put forth by Burlot and Yvon (2017). Their criteria are partially overlapping with ours, although they did not provide a comprehensive categorization as the one compiled here.

4.1 Task

By far, the most targeted tasks in challenge sets are NLI and MT. This can partly be explained by the popularity of these tasks and the prevalence of neural models proposed for solving them. Perhaps more importantly, tasks like NLI and MT arguably require inferences at various linguistic levels, making the challenge set evaluation especially attractive. Still, other high-level tasks like reading comprehension or question answering have not received as much attention, and may also benefit from the careful construction of challenge sets.

A significant body of work aims to evaluate the quality of embedding models by correlating the similarity they induce on word or sentence pairs with human similarity judgments. Datasets containing such similarity scores are often used to evaluate word embeddings (Finkelstein et al., 2002; Bruni et al., 2012; Hill et al., 2015, inter alia) or sentence embeddings; see the many shared tasks on semantic textual similarity in SemEval (Cer et al., 2017, and previous editions). Many of these datasets evaluate similarity at a coarse-grained level, but some provide a more fine-grained evaluation of similarity or relatedness. For example, some datasets are dedicated to specific word classes such as verbs (Gerz et al., 2016) or rare words (Luong et al., 2013), or to evaluating compositional knowledge in sentence embeddings (Marelli et al., 2014). Multilingual and cross-lingual versions have also been collected (Leviant and Reichart, 2015; Cer et al., 2017). Although these datasets are widely used, this kind of evaluation has been criticized for its subjectivity and questionable correlation with downstream performance (Faruqui et al., 2016).
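The similarity-based evaluation of embedding models described in Section 4.1 can be sketched in a few lines. The example below is only illustrative: the two-dimensional vectors in `embeddings` and the ratings in `human_scores` are made-up values, not taken from any of the cited datasets, and the choice of cosine similarity with Spearman rank correlation is an assumption about a typical setup rather than a protocol prescribed by the survey.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical embeddings and human similarity ratings (illustration only).
embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
    "truck": [0.3, 0.7],
    "banana": [0.5, 0.5],
}
human_scores = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("dog", "car", 2.0),
    ("cat", "banana", 1.5),
    ("cat", "car", 1.0),
]

model_sims = [cosine(embeddings[a], embeddings[b]) for a, b, _ in human_scores]
gold = [score for _, _, score in human_scores]
rho = spearman(model_sims, gold)
print(f"Spearman correlation with human judgments: {rho:.2f}")
```

A rank correlation is used here because human ratings and cosine values live on different scales; only the ordering of pairs is compared.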
4.2 Linguistic phenomena

One of the primary goals of challenge sets is to evaluate models on their ability to handle specific linguistic phenomena. While earlier studies emphasized exhaustivity (Cooper et al., 1996; Lehmann et al., 1996), recent ones tend to focus on a few properties of interest. For example, Sennrich (2017) introduced a challenge set for MT evaluation focusing on 5 properties: subject-verb agreement, noun phrase agreement, verb-particle constructions, polarity, and transliteration. Slightly more elaborate is an MT challenge set for morphology, including 14 morphological properties (Burlot and Yvon, 2017). See Table SM2 for references to datasets targeting other phenomena.

Other challenge sets cover a more diverse range of linguistic properties, in the spirit of some of the earlier work. For instance, extending the categories in Cooper et al. (1996), the GLUE analysis set for NLI covers more than 30 phenomena in four coarse categories (lexical semantics, predicate-argument structure, logic, and knowledge). In MT evaluation, Burchardt et al. (2017) reported results using a large test suite covering 120 phenomena, partly based on Lehmann et al. (1996).[11] Isabelle et al. (2017) and Isabelle and Kuhn (2018) prepared challenge sets for MT evaluation covering fine-grained phenomena at morpho-syntactic, syntactic, and lexical levels.

[11] Their dataset does not seem to be available yet, but more details are promised to appear in a future publication.

Generally, datasets that are constructed programmatically tend to cover less fine-grained linguistic properties, while manually constructed datasets represent more diverse phenomena.

4.3 Languages

As unfortunately usual in much NLP work, especially neural NLP, the vast majority of challenge sets are in English. This situation is slightly better in MT evaluation, where naturally all datasets feature other languages (see Table SM2). A notable exception is the work by Gulordava et al. (2018), who constructed examples for evaluating number agreement in language modeling in English, Russian, Hebrew, and Italian. Clearly, there is room for more challenge sets in non-English languages. However, perhaps more pressing is the need for large-scale non-English datasets (besides MT) to develop neural models for popular NLP tasks.

4.4 Scale

The size of proposed challenge sets varies greatly (Table SM2). As expected, datasets constructed by hand are smaller, with typical sizes in the hundreds. Automatically-built datasets are much larger, ranging from several thousands to close to a hundred thousand (Sennrich, 2017), or even more than one million examples (Linzen et al., 2016). In the latter case, the authors argue that such a large test set is needed for obtaining a sufficient representation of rare cases. A few manually-constructed datasets contain a fairly large number of examples, up to 10k (Burchardt et al., 2017).

4.5 Construction method

Challenge sets are usually created either programmatically or manually, by hand-crafting specific examples. Often, semi-automatic methods are used to compile an initial list of examples that is manually verified by annotators. The specific method also affects the kind of language use and how natural or artificial/synthetic the examples are. We describe here some trends in dataset construction methods in the hope that they may be useful for researchers contemplating new datasets.

Several datasets were constructed by modifying or extracting examples from existing datasets. For instance, Sanchez et al. (2018) and Glockner et al. (2018) extracted examples from SNLI (Bowman et al., 2015) and replaced specific words such as hypernyms, synonyms, and antonyms, followed by manual verification. Linzen et al. (2016), on the other hand, extracted examples of subject-verb agreement from raw texts using heuristics, resulting in a large-scale dataset. Gulordava et al. (2018) extended this to other agreement phenomena, but they relied on syntactic information available in treebanks, resulting in a smaller dataset.

Several challenge sets utilize existing test suites, either as a direct source of examples (Burchardt et al., 2017) or for searching similar naturally occurring examples (Wang et al., 2018a).[12]

[12] Wang et al. (2018a) also verified that their examples do not contain annotation artifacts, a potential problem noted in recent studies (Gururangan et al., 2018; Poliak et al., 2018b).

Sennrich (2017) introduced a method for evaluating NMT systems via contrastive translation pairs, where the system is asked to estimate the probability of two candidate translations that are designed to reflect specific linguistic properties. Sennrich generated such pairs programmatically
by applying simple heuristics, such as changing gender and number to induce agreement errors, resulting in a large-scale challenge set of close to 100 thousand examples. This framework was extended to evaluate other properties, but this often required more sophisticated generation methods, such as morphological analyzers/generators (Burlot and Yvon, 2017), or more manual involvement in generation (Bawden et al., 2018) or verification (Rios Gonzales et al., 2017).

Finally, a few studies define templates that capture certain linguistic properties and instantiate them with word lists (Dasgupta et al., 2018; Rudinger et al., 2018; Zhao et al., 2018a). Template-based generation has the advantage of providing more control, for example for obtaining a specific vocabulary distribution, but this comes at the expense of how natural the examples are.

4.6 Evaluation

Systems are typically evaluated by their performance on the challenge set examples, either with the same metric used for evaluating the system in the first place, or via a proxy, as in the contrastive pairs evaluation of Sennrich (2017). Automatic evaluation metrics are cheap to obtain and can be calculated on a large scale. However, they may miss certain aspects. Thus a few studies report human evaluation on their challenge sets, such as in MT (Isabelle et al., 2017; Burchardt et al., 2017).

We note here also that judging the quality of a model by its performance on a challenge set can be tricky. Some authors emphasize their wish to test systems on extreme or difficult cases, “beyond normal operational capacity” (Naik et al., 2018). However, whether or not one should expect systems to perform well on specially chosen cases (as opposed to the average case) may depend on one’s goals. To put results in perspective, one may compare model performance to human performance on the same task (Gulordava et al., 2018).

5 Adversarial examples

Understanding a model also requires an understanding of its failures. Despite their success in many tasks, machine learning systems can also be very sensitive to malicious attacks or adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2015). In the vision domain, small changes to the input image can lead to misclassification, even if such changes are indistinguishable to humans.

The basic setup in work on adversarial examples can be described as follows.13 Given a neural network model f and an input example x, we seek to generate an adversarial example $x'$ that will have a minimal distance from x, while being assigned a different label by f:

$$\min_{x'} \|x - x'\| \quad \text{s.t.} \quad f(x) = l,\ f(x') = l',\ l \neq l'$$

13 The notation here follows Yuan et al. (2017).

In the vision domain, x can be the input image pixels, resulting in a fairly intuitive interpretation of this optimization problem: measuring the distance $\|x - x'\|$ is straightforward, and finding $x'$ can be done by computing gradients with respect to the input, since all quantities are continuous.

In the text domain, the input is discrete (for example, a sequence of words), which poses two problems. First, it is not clear how to measure the distance between the original and adversarial examples, x and $x'$, which are two discrete objects (say, two words or sentences). Second, minimizing this distance cannot be easily formulated as an optimization problem, as this requires computing gradients with respect to a discrete input.

In the following, we review methods for handling these difficulties according to several criteria: the adversary’s knowledge, the specificity of the attack, the linguistic unit being modified, and the task on which the attacked model was trained.14 Table SM3 (in the supplementary materials) categorizes work on adversarial examples in NLP according to these criteria.

14 These criteria are partly taken from Yuan et al. (2017), where a more elaborate taxonomy is laid out. At present, though, the work on adversarial examples in NLP is more limited than in computer vision, so our criteria will suffice.

5.1 Adversary’s knowledge

Adversarial examples can be generated using access to model parameters, also known as white-box attacks, or without such access, with black-box attacks (Papernot et al., 2016a, 2017; Narodytska and Kasiviswanathan, 2017; Liu et al., 2017).

White-box attacks are difficult to adapt to the text world as they typically require computing gradients with respect to the input, which would be discrete in the text case. One option is to compute gradients with respect to the input word embeddings, and perturb the embeddings. Since this may result in a vector that does not correspond to
any word, one could search for the closest word embedding in a given dictionary (Papernot et al., 2016b); Cheng et al. (2018) extended this idea to seq2seq models. Others computed gradients with respect to input word embeddings to identify and rank words to be modified (Samanta and Mehta, 2017; Liang et al., 2018). Ebrahimi et al. (2018b) developed an alternative method by representing text edit operations in vector space (e.g., a binary vector specifying which characters in a word would be changed) and approximating the change in loss with the derivative along this vector.

Given the difficulty in generating white-box adversarial examples for text, much research has been devoted to black-box examples. Often, the adversarial examples are inspired by text edits that are thought to be natural or commonly generated by humans, such as typos, misspellings, and so on (Sakaguchi et al., 2017; Heigold et al., 2018; Belinkov and Bisk, 2018). Gao et al. (2018) defined scoring functions to identify tokens to modify. Their functions do not require access to model internals, but they do require the model prediction score. After identifying the important tokens, they modify characters with common edit operations.

Zhao et al. (2018c) used generative adversarial networks (GANs) (Goodfellow et al., 2014) to minimize the distance between latent representations of input and adversarial examples, and performed perturbations in latent space. Since the latent representations do not need to come from the attacked model, this is a black-box attack.

Finally, Alzantot et al. (2018) developed an interesting population-based genetic algorithm for crafting adversarial examples for text classification, by maintaining a population of modifications of the original sentence and evaluating fitness of modifications at each generation. They do not require access to model parameters, but do use prediction scores. A similar idea was proposed by Kuleshov et al. (2018).

5.2 Attack specificity

Adversarial attacks can be classified into targeted vs. non-targeted attacks (Yuan et al., 2017). A targeted attack specifies a specific false class, $l'$, while a non-targeted attack only cares that the predicted class is wrong, $l' \neq l$. Targeted attacks are more difficult to generate, as they typically require knowledge of model parameters, i.e., they are white-box attacks. This might explain why the majority of adversarial examples in NLP are non-targeted (see Table SM3). A few targeted attacks include Liang et al. (2018), which specified a desired class to fool a text classifier, and Chen et al. (2018a), which specified words or captions to generate in an image captioning model. Others targeted specific words to omit, replace, or include when attacking seq2seq models (Cheng et al., 2018; Ebrahimi et al., 2018a).

Methods for generating targeted attacks in NLP could possibly take more inspiration from adversarial attacks in other fields. For instance, in attacking malware detection systems, several studies developed targeted attacks in a black-box scenario (Yuan et al., 2017). A black-box targeted attack for MT was proposed by Zhao et al. (2018c), who used GANs to search for attacks on Google’s MT system after mapping sentences into continuous space with adversarially regularized autoencoders (Zhao et al., 2018b).

5.3 Linguistic unit

Most of the work on adversarial text examples involves modifications at the character- and/or word-level; see Table SM3 for specific references. Other transformations include adding sentences or text chunks (Jia and Liang, 2017) or generating paraphrases with desired syntactic structures (Iyyer et al., 2018). In image captioning, Chen et al. (2018a) modified pixels in the input image to generate targeted attacks on the caption text.

5.4 Task

Generally, most work on adversarial examples in NLP concentrates on relatively high-level language understanding tasks, such as text classification (including sentiment analysis) and reading comprehension, while work on text generation focuses mainly on MT. See Table SM3 for references. There is relatively little work on adversarial examples for more low-level language processing tasks, although one can mention morphological tagging (Heigold et al., 2018) and spelling correction (Sakaguchi et al., 2017).

5.5 Coherence & perturbation measurement

In adversarial image examples, it is fairly straightforward to measure the perturbation, either by measuring distance in pixel space, say $\|x - x'\|$ under some norm, or with alternative measures that are better correlated with human perception (Rozsa et al., 2016). It is also visually compelling to present an adversarial image with imperceptible difference from its source image. In the text domain, measuring distance is not as straightforward and even small changes to the text may be perceptible to humans. Thus, evaluation of attacks is fairly tricky. Some studies imposed constraints on adversarial examples to have a small number of edit operations (Gao et al., 2018). Others ensured syntactic or semantic coherence in different ways, such as filtering replacements by word similarity or sentence similarity (Alzantot et al., 2018; Kuleshov et al., 2018), or by using synonyms and other word lists (Samanta and Mehta, 2017; Yang et al., 2018).

Some reported whether a human can classify the adversarial example correctly (Yang et al., 2018), but this does not indicate how perceptible the changes are. More informative human studies evaluate grammaticality or similarity of the adversarial examples to the original ones (Zhao et al., 2018c; Alzantot et al., 2018). Given the inherent difficulty in generating imperceptible changes in text, more such evaluations are needed.

6 Explaining predictions

Explaining specific predictions is recognized as a desideratum in interpretability work (Lipton, 2016), argued to increase the accountability of machine learning systems (Doshi-Velez et al., 2017). However, explaining why a deep, highly non-linear neural network makes a certain prediction is not trivial. One solution is to ask the model to generate explanations along with its primary prediction (Zaidan et al., 2007; Zhang et al., 2016),15 but this approach requires manual annotations of explanations, which may be hard to collect.

15 Other work considered learning textual-visual explanations from multi-modal annotations (Park et al., 2018).

An alternative approach is to use parts of the input as explanations. For example, Lei et al. (2016) defined a generator that learns a distribution over text fragments as candidate rationales for justifying predictions, evaluated on sentiment analysis. Alvarez-Melis and Jaakkola (2017) discovered input-output associations in a sequence-to-sequence learning scenario, by perturbing the input and finding the most relevant associations. Gupta and Schütze (2018) inspected how information is accumulated in RNNs towards a prediction, and associated peaks in prediction scores with important input segments. As these methods use input segments to explain predictions, they do not shed much light on the internal computations that take place in the network.

At present, despite the recognized importance of interpretability, our ability to explain predictions of neural networks in NLP is still limited.

7 Other methods

We briefly mention here several analysis methods that do not fall neatly into the previous sections.

A number of studies evaluated the effect of erasing or masking certain neural network components, such as word embedding dimensions, hidden units, or even full words (Li et al., 2016b; Feng et al., 2018; Khandelwal et al., 2018; Bau et al., 2018). For example, Li et al. (2016b) erased specific dimensions in word embeddings or hidden states and computed the change in probability assigned to different labels. Their experiments revealed interesting differences between word embedding models, where in some models information is more focused in individual dimensions. They also found that information is more distributed in hidden layers than in the input layer, and erased entire words to find important words in a sentiment analysis task.

Several studies conducted behavioral experiments to interpret word embeddings by defining intrusion tasks, where humans need to identify an intruder word, chosen based on difference in word embedding dimensions (Murphy et al., 2012; Fyshe et al., 2015; Faruqui et al., 2015).16 In this kind of work, a word embedding model may be deemed more interpretable if humans are better able to identify the intruding words. Since the evaluation is costly for high-dimensional representations, alternative automatic metrics were considered (Park et al., 2017; Senel et al., 2018).

16 The methodology follows earlier work on evaluating the interpretability of probabilistic topic models with intrusion tasks (Chang et al., 2009).

A long tradition in work on neural networks is to evaluate and analyze their ability to learn different formal languages (Das et al., 1992; Casey, 1996; Gers and Schmidhuber, 2001; Bodén and Wiles, 2002; Chalup and Blair, 2003). This trend continues today, with research into modern architectures and what formal languages they can learn (Weiss et al., 2018; Bernardy, 2018; Suzgun et al., 2019), or the formal properties they possess (Chen et al., 2018b).
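As a rough illustration of the erasure idea discussed in Section 7 (erasing a dimension of a representation and measuring the change in probability assigned to a label), the sketch below zeroes one dimension at a time and records the drop in log-probability of the gold label. It is a minimal sketch under stated assumptions, not the exact procedure or importance measure of Li et al. (2016b): the linear-softmax classifier, the toy weights, and the names `label_log_prob` and `erasure_importance` are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_log_prob(weights, hidden, label):
    """Log-probability of `label` under a toy linear-softmax classifier.

    `weights` has one row per label and one column per hidden dimension.
    This classifier is a stand-in (assumption) for whatever model is analyzed.
    """
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return math.log(softmax(scores)[label])


def erasure_importance(weights, hidden, label):
    """Drop in log-probability of `label` when each dimension is set to zero.

    A large positive value means the dimension supported the label;
    a negative value means erasing it actually helped the label.
    """
    base = label_log_prob(weights, hidden, label)
    drops = []
    for d in range(len(hidden)):
        erased = list(hidden)
        erased[d] = 0.0  # "erase" dimension d
        drops.append(base - label_log_prob(weights, erased, label))
    return drops


# Toy example: 2 labels, 3 hidden dimensions (all values are made up).
toy_weights = [[2.0, 0.0, 0.1],
               [0.0, 1.0, 0.1]]
toy_hidden = [1.0, 1.0, 1.0]
toy_drops = erasure_importance(toy_weights, toy_hidden, label=0)

# Rank dimensions from most to least important for label 0.
toy_ranking = sorted(range(len(toy_drops)), key=lambda d: toy_drops[d], reverse=True)
print(toy_ranking, toy_drops)
```

In this toy setup the third dimension contributes equally to both labels, so erasing it leaves the prediction essentially unchanged, while erasing the first dimension hurts label 0 the most. The same loop could be applied to whole words instead of dimensions by replacing the zeroing step with word removal, at the cost of one forward pass per erased unit.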
8 Conclusion

Analyzing neural networks has become a hot topic in NLP research. This survey attempted to review and summarize as much of the current research as possible, while organizing it along several prominent themes. We have emphasized aspects in analysis that are specific to language – namely, what linguistic information is captured in neural networks, which phenomena they are successful at capturing, and where they fail. Many of the analysis methods are general techniques from the larger machine learning community, such as visualization via saliency measures, or evaluation by adversarial examples. But even those sometimes require non-trivial adaptations to work with text input. Some methods are more specific to the field, but may prove useful in other domains. Challenge sets or test suites are such a case.

Throughout this survey, we have identified several limitations or gaps in current analysis work:

• The use of auxiliary classification tasks for identifying which linguistic properties neural networks capture has become standard practice (Section 2), while lacking both a theoretical foundation and a better empirical consideration of the link between the auxiliary tasks and the original task.

• Evaluation of analysis work is often limited or qualitative, especially in visualization techniques (Section 3). Newer forms of evaluation are needed for determining the success of different methods.

• Relatively little work has been done on explaining predictions of neural network models, apart from providing visualizations (Section 6). With the increasing public demand for explaining algorithmic choices in machine learning systems (Doshi-Velez and Kim, 2017; Doshi-Velez et al., 2017), there is a pressing need for progress in this direction.

• Much of the analysis work is focused on the English language, especially in constructing challenge sets for various tasks (Section 4), with the exception of MT due to its inherent multilingual character. Developing resources and evaluating methods on other languages is important as the field grows and matures.

• More challenge sets for evaluating other tasks besides NLI and MT are needed.

Finally, as with any survey in a rapidly evolving field, this paper is likely to omit relevant recent work by the time of publication. While we intend to continue updating the online appendix with newer publications, we hope that our summarization of prominent analysis work and its categorization into several themes will be a useful guide for scholars interested in analyzing and understanding neural networks for NLP.

Acknowledgments

We would like to thank the anonymous reviewers and the Action Editor for their very helpful comments. This work was supported by the Qatar Computing Research Institute. Y.B. is also supported by the Harvard Mind, Brain, Behavior Initiative.

References

Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017a. Analysis of sentence embedding models using prediction tasks in natural language processing. IBM Journal of Research and Development, 61(4):3–9.

Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017b. Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks. In International Conference on Learning Representations (ICLR).

Roee Aharoni and Yoav Goldberg. 2017. Morphological Inflection Generation with Hard Monotonic Attention. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2004–2015. Association for Computational Linguistics.

Wasi Uddin Ahmad, Xueying Bai, Zhechao Huang, Chao Jiang, Nanyun Peng, and Kai-Wei Chang. 2018. Multi-task Learning for Universal Sentence Embeddings: A Thorough Evaluation using Transfer and Auxiliary Tasks. arXiv preprint arXiv:1804.07911v2.

Afra Alishahi, Marie Barking, and Grzegorz Chrupała. 2017. Encoding of phonology in a recurrent neural model of grounded speech. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL
2017), pages 368–378. Association for Computational Linguistics.

David Alvarez-Melis and Tommi Jaakkola. 2017. A causal framework for explaining the predictions of black-box sequence-to-sequence models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 412–421. Association for Computational Linguistics.

Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating Natural Language Adversarial Examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2890–2896. Association for Computational Linguistics.

Leila Arras, Franziska Horn, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. 2017a. "What is relevant in a text document?": An interpretable machine learning approach. PLOS ONE, 12(8):1–23.

Leila Arras, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. 2017b. Explaining Recurrent Neural Network Predictions in Sentiment Analysis. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 159–168. Association for Computational Linguistics.

Mikel Artetxe, Gorka Labaka, Inigo Lopez-Gazpio, and Eneko Agirre. 2018. Uncovering Divergent Linguistic Information in Word Embeddings with Lessons for Intrinsic and Extrinsic Evaluation. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 282–291. Association for Computational Linguistics.

Malika Aubakirova and Mohit Bansal. 2016. Interpreting Neural Networks to Improve Politeness Comprehension. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2035–2041. Association for Computational Linguistics.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv preprint arXiv:1409.0473v7.

Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2018. Identifying and Controlling Important Neurons in Neural Machine Translation. arXiv preprint arXiv:1811.01157v1.

Rachel Bawden, Rico Sennrich, Alexandra Birch, and Barry Haddow. 2018. Evaluating Discourse Phenomena in Neural Machine Translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1304–1313. Association for Computational Linguistics.

Yonatan Belinkov. 2018. On Internal Language Representations in Deep Learning: An Analysis of Machine Translation and Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology.

Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and Natural Noise Both Break Neural Machine Translation. In International Conference on Learning Representations (ICLR).

Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. 2017a. What do Neural Machine Translation Models Learn about Morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 861–872. Association for Computational Linguistics.

Yonatan Belinkov and James Glass. 2017. Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2441–2451. Curran Associates, Inc.

Yonatan Belinkov, Lluís Màrquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2017b. Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1–10. Asian Federation of Natural Language Processing.
Jean-Philippe Bernardy. 2018. Can Recurrent Neural Networks Learn Nested Recursion? LiLT (Linguistic Issues in Language Technology), 16(1).

Arianna Bisazza and Clara Tump. 2018. The Lazy Encoder: A Fine-Grained Analysis of the Role of Morphology in Neural Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2871–2876. Association for Computational Linguistics.

Terra Blevins, Omer Levy, and Luke Zettlemoyer. 2018. Deep RNNs Encode Soft Hierarchical Syntax. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 14–19. Association for Computational Linguistics.

Mikael Bodén and Janet Wiles. 2002. On learning context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 13(2):491–493.

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642. Association for Computational Linguistics.

Elia Bruni, Gemma Boleda, Marco Baroni, and Nam Khanh Tran. 2012. Distributional Semantics in Technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 136–145. Association for Computational Linguistics.

Gino Brunner, Yuyi Wang, Roger Wattenhofer, and Michael Weigelt. 2017. Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations. The 31st Annual Conference on Neural Information Processing (NIPS) - Workshop on Learning Disentangled Features: from Perception to Control.

Aljoscha Burchardt, Vivien Macketanz, Jon Dehdari, Georg Heigold, Jan-Thorsten Peter, and Philip Williams. 2017. A Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines. The Prague Bulletin of Mathematical Linguistics, 108(1):159–170.

Franck Burlot and François Yvon. 2017. Evaluating the morphological competence of Machine Translation Systems. In Proceedings of the Second Conference on Machine Translation, pages 43–55. Association for Computational Linguistics.

Mike Casey. 1996. The Dynamics of Discrete-Time Computation, with Application to Recurrent Neural Networks and Finite State Machine Extraction. Neural computation, 8(6):1135–1178.

Daniel Cer, Mona Diab, Eneko Agirre, Inigo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1–14. Association for Computational Linguistics.

Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, and Emmanuel Dupoux. 2017. Learning weakly supervised multimodal phoneme embeddings. In Interspeech 2017.

Stephan K. Chalup and Alan D. Blair. 2003. Incremental Training of First Order Recurrent Neural Networks to Predict a Context-sensitive Language. Neural Networks, 16(7):955–972.

Jonathan Chang, Sean Gerrish, Chong Wang, Jordan L. Boyd-graber, and David M. Blei. 2009. Reading Tea Leaves: How Humans Interpret Topic Models. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 288–296. Curran Associates, Inc.

Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, and Cho-Jui Hsieh. 2018a. Attacking visual language grounding with adversarial examples: A case study on neural image captioning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2587–2597. Association for Computational Linguistics.

Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu, and Xuanjing Huang. 2015. Sentence Modeling with Gated Recursive Neural Network. In
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 793–798. Association for Computational Linguistics.

Yining Chen, Sorcha Gilroy, Andreas Maletti, Jonathan May, and Kevin Knight. 2018b. Recurrent Neural Networks as Weighted Language Recognizers. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2261–2271. Association for Computational Linguistics.

Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Chen, and Cho-Jui Hsieh. 2018. Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples. arXiv preprint arXiv:1803.01128v1.

Grzegorz Chrupała, Lieke Gelderloos, and Afra Alishahi. 2017. Representations of language in a model of visually grounded speech signal. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 613–622. Association for Computational Linguistics.

Ondřej Cífka and Ondřej Bojar. 2018. Are BLEU and Meaning Representation in Opposition? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1362–1371. Association for Computational Linguistics.

Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. 2018. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126–2136. Association for Computational Linguistics.

Robin Cooper, Dick Crouch, Jan van Eijck, Chris Fox, Josef van Genabith, Jan Jaspars, Hans Kamp, David Milward, Manfred Pinkal, Massimo Poesio, Steve Pulman, Ted Briscoe, Holger Maier, and Karsten Konrad. 1996. Using the framework. Technical report, The FraCaS Consortium.

Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, D. Anthony Bau, and James Glass. 2019a. What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI).

Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, and Stephan Vogel. 2017. Understanding and Improving Morphological Learning in the Neural Machine Translation Decoder. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 142–151. Asian Federation of Natural Language Processing.

Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, and James Glass. 2019b. NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI): Demonstrations Track.

Sreerupa Das, C. Lee Giles, and Guo-Zheng Sun. 1992. Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory. In Proceedings of The Fourteenth Annual Conference of Cognitive Science Society. Indiana University, page 14.

Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, and Noah D. Goodman. 2018. Evaluating Compositionality in Sentence Embeddings. arXiv preprint arXiv:1802.04302v2.

Dhanush Dharmaretnam and Alona Fyshe. 2018. The Emergence of Semantics in Neural Network Representations of Visual Information. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 776–780. Association for Computational Linguistics.

Yanzhuo Ding, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Visualizing and Understanding Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1150–1159. Association for Computational Linguistics.

Finale Doshi-Velez and Been Kim. 2017. Towards A Rigorous Science of Interpretable Machine Learning. In arXiv preprint arXiv:1702.08608v2.

Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O’Brien, Stuart Shieber, James Waldo, David Weinberger, and Alexandra Wood. 2017. Accountability of AI Under the Law: The Role of Explanation. Berkman Center Publication Forthcoming.

Jennifer Drexler and James Glass. 2017. Analysis of Audio-Visual Features for Unsupervised Speech Recognition. In International Workshop on Grounding Language Understanding.

Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018a. On Adversarial Examples for Character-Level Neural Machine Translation. In Proceedings of the 27th International Conference on Computational Linguistics, pages 653–663. Association for Computational Linguistics.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018b. HotFlip: White-Box Adversarial Examples for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36. Association for Computational Linguistics.

Ali Elkahky, Kellie Webster, Daniel Andor, and Emily Pitler. 2018. A Challenge Set and Methods for Noun-Verb Ambiguity. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2562–2572. Association for Computational Linguistics.

Zied Elloumi, Laurent Besacier, Olivier Galibert, and Benjamin Lecouteux. 2018. Analyzing Learned Representations of a Deep ASR Performance Prediction Model. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 9–15. Association for Computational Linguistics.

Jeffrey L. Elman. 1989. Representation and Structure in Connectionist Models. Technical report, University of California, San Diego, Center for Research in Language.

Jeffrey L. Elman. 1990. Finding Structure in Time. Cognitive science, 14(2):179–211.

Jeffrey L. Elman. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine learning, 7(2-3):195–225.

Allyson Ettinger, Ahmed Elgohary, and Philip Resnik. 2016. Probing for semantic evidence of composition by means of simple classification tasks. In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, pages 134–139. Association for Computational Linguistics.

Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer. 2016. Problems With Evaluation of Word Embeddings Using Word Similarity Tasks. In Proc. of the 1st Workshop on Evaluating Vector Space Representations for NLP.

Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah A. Smith. 2015. Sparse Overcomplete Word Vector Representations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1491–1500. Association for Computational Linguistics.

Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, and Jordan Boyd-Graber. 2018. Pathologies of Neural Models Make Interpretations Difficult. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3719–3728. Association for Computational Linguistics.

Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing Search in Context: The Concept Revisited. ACM Transactions on information systems, 20(1):116–131.

Robert Frank, Donald Mathis, and William Badecker. 2013. The Acquisition of Anaphora
by Simple Recurrent Networks. Language Acquisition, 20(3):181–227.

Cynthia Freeman, Jonathan Merriman, Abhinav Aggarwal, Ian Beaver, and Abdullah Mueen. 2018. Paying Attention to Attention: Highlighting Influential Samples in Sequential Analysis. arXiv preprint arXiv:1808.02113v1.

Alona Fyshe, Leila Wehbe, Partha P. Talukdar, Brian Murphy, and Tom M. Mitchell. 2015. A Compositional and Interpretable Semantic Space. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 32–41. Association for Computational Linguistics.

David Gaddy, Mitchell Stern, and Dan Klein. 2018. What’s Going On in Neural Constituency Parsers? An Analysis. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 999–1010. Association for Computational Linguistics.

J. Ganesh, Manish Gupta, and Vasudeva Varma. 2017. Interpretation of Semantic Tweet Representations. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ASONAM ’17, pages 95–102, New York, NY, USA. ACM.

Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. arXiv preprint arXiv:1801.04354v5.

Lieke Gelderloos and Grzegorz Chrupała. 2016. From phonemes to images: Levels of representation in a recurrent neural model of visually-grounded language learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1309–1319, Osaka, Japan. The COLING 2016 Organizing Committee.

Felix A. Gers and Jürgen Schmidhuber. 2001. LSTM Recurrent Networks Learn Simple Context-Free and Context-Sensitive Languages. IEEE Transactions on Neural Networks, 12(6):1333–1340.

Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart, and Anna Korhonen. 2016. SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2173–2182. Association for Computational Linguistics.

Hamidreza Ghader and Christof Monz. 2017. What does Attention in Neural Machine Translation Pay Attention to? In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 30–39. Asian Federation of Natural Language Processing.

Reza Ghaeini, Xiaoli Fern, and Prasad Tadepalli. 2018. Interpreting Recurrent and Attention-Based Neural Models: A Case Study on Natural Language Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4952–4957. Association for Computational Linguistics.

Mario Giulianelli, Jack Harding, Florian Mohnert, Dieuwke Hupkes, and Willem Zuidema. 2018. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 240–248. Association for Computational Linguistics.

Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–655. Association for Computational Linguistics.

Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve, and Thomas Demeester. 2018. Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules? In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3275–3284. Association for Computational Linguistics.

Yoav Goldberg. 2017. Neural Network methods for Natural Language Processing, volume 10 of
Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in neural information processing systems, pages 2672–2680.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations (ICLR).

Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni. 2018. Colorless Green Recurrent Networks Dream Hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1195–1205. Association for Computational Linguistics.

Abhijeet Gupta, Gemma Boleda, Marco Baroni, and Sebastian Padó. 2015. Distributional vectors encode referential attributes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 12–21. Association for Computational Linguistics.

Pankaj Gupta and Hinrich Schütze. 2018. LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 154–164. Association for Computational Linguistics.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. 2018. Annotation Artifacts in Natural Language Inference Data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112. Association for Computational Linguistics.

Catherine L. Harris. 1990. Connectionism and Cognitive Linguistics. Connection Science, 2(1-2):7–33.

David Harwath and James Glass. 2017. Learning Word-Like Units from Joint Audio-Visual Analysis. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 506–517. Association for Computational Linguistics.

Georg Heigold, Günter Neumann, and Josef van Genabith. 2018. How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse? In Proceedings of the 13th Conference of The Association for Machine Translation in the Americas (Volume 1: Research Track), pages 68–79.

Felix Hill, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation. Computational Linguistics, 41(4):665–695.

Dieuwke Hupkes, Sara Veldhoen, and Willem Zuidema. 2018. Visualisation and ‘diagnostic classifiers’ reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research, 61:907–926.

Pierre Isabelle, Colin Cherry, and George Foster. 2017. A Challenge Set Approach to Evaluating Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2486–2496. Association for Computational Linguistics.

Pierre Isabelle and Roland Kuhn. 2018. A Challenge Set for French–> English Machine Translation. arXiv preprint arXiv:1806.02725v2.

Hitoshi Isahara. 1995. JEIDA’s test-sets for quality evaluation of MT systems-technical evaluation from the developer’s point of view. In Proceedings of MT Summit V.

Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. In Proceedings of the
2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1875–1885. Association for Computational Linguistics.

Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. 2018. Understanding Convolutional Neural Networks for Text Classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65. Association for Computational Linguistics.

Inigo Jauregi Unanue, Ehsan Zare Borzeshi, and Massimo Piccardi. 2018. A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 11–17. Association for Computational Linguistics.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031. Association for Computational Linguistics.

Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. 2016. Exploring the Limits of Language Modeling. arXiv preprint arXiv:1602.02410v2.

Akos Kádár, Grzegorz Chrupała, and Afra Alishahi. 2017. Representation of Linguistic Form and Function in Recurrent Neural Networks. Computational Linguistics, 43(4):761–780.

Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2015. Visualizing and Understanding Recurrent Networks. arXiv preprint arXiv:1506.02078v2.

Urvashi Khandelwal, He He, Peng Qi, and Dan Jurafsky. 2018. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 284–294. Association for Computational Linguistics.

Margaret King and Kirsten Falkedal. 1990. Using Test Suites in Evaluation of Machine Translation Systems. In COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics.

Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Transactions of the Association for Computational Linguistics, 4:313–327.

Sungryong Koh, Jinee Maeng, Ji-Young Lee, Young-Sook Chae, and Key-Sun Choi. 2001. A test suite for evaluation of English-to-Korean machine translation systems. In MT Summit Conference.

Arne Köhn. 2015. What’s in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2067–2073, Lisbon, Portugal. Association for Computational Linguistics.

Volodymyr Kuleshov, Shantanu Thakoor, Tingfung Lau, and Stefano Ermon. 2018. Adversarial Examples for Natural Language Classification Problems.

Brenden Lake and Marco Baroni. 2018. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2873–2882, Stockholmsmässan, Stockholm, Sweden. PMLR.

Sabine Lehmann, Stephan Oepen, Sylvie Regnier-Prost, Klaus Netter, Veronika Lux, Judith Klein, Kirsten Falkedal, Frederik Fouvry, Dominique Estival, Eva Dauphin, Herve Compagnion, Judith Baur, Lorna Balkan, and Doug Arnold. 1996. TSNLP - Test Suites for Natural Language Processing. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.

Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing,
pages 107–117. Association for Computational Linguistics.

Ira Leviant and Roi Reichart. 2015. Separated by an Un-common Language: Towards Judgment Language Informed Vector Space Modeling. arXiv preprint arXiv:1508.00106v5.

Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016a. Visualizing and Understanding Neural Models in NLP. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 681–691. Association for Computational Linguistics.

Jiwei Li, Will Monroe, and Dan Jurafsky. 2016b. Understanding Neural Networks through Representation Erasure. arXiv preprint arXiv:1612.08220v3.

Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. 2018. Deep Text Classification Can be Fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4208–4215. International Joint Conferences on Artificial Intelligence Organization.

Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.

Zachary C. Lipton. 2016. The Mythos of Model Interpretability. In ICML Workshop on Human Interpretability of Machine Learning.

Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, and Noah A. Smith. 2018. LSTMs Exploit Linguistic Attributes of Data. In Proceedings of The Third Workshop on Representation Learning for NLP, pages 180–186. Association for Computational Linguistics.

Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2017. Delving into Transferable Adversarial Examples and Black-box Attacks. In International Conference on Learning Representations (ICLR).

Thang Luong, Richard Socher, and Christopher Manning. 2013. Better Word Representations with Recursive Neural Networks for Morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 104–113. Association for Computational Linguistics.

Jean Maillard and Stephen Clark. 2018. Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing. In Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP, pages 13–18. Association for Computational Linguistics.

Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. 2014. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 1–8. Association for Computational Linguistics.

R. Thomas McCoy, Robert Frank, and Tal Linzen. 2018. Revisiting the poverty of the stimulus: Hierarchical generalization without a hierarchical bias in recurrent neural networks. In Proceedings of the 40th Annual Conference of the Cognitive Science Society.

Risto Miikkulainen and Michael G. Dyer. 1991. Natural Language Processing With Modular PDP Networks and Distributed Lexicon. Cognitive Science, 15(3):343–399.

Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.

Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, and Huamin Qu. 2017. Understanding Hidden Memories of Recurrent Neural Networks. In IEEE Conference on Visual Analytics Science and Technology (IEEE VAST 2017).

Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. Methods for interpreting
and understanding deep neural networks. Digital Signal Processing, 73:1–15.

Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere. 2018. Did the Model Understand the Question? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1896–1906. Association for Computational Linguistics.

James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, and Jacob Eisenstein. 2018. Explainable Prediction of Medical Codes from Clinical Text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1101–1111. Association for Computational Linguistics.

W. James Murdoch, Peter J. Liu, and Bin Yu. 2018. Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs. In International Conference on Learning Representations.

Brian Murphy, Partha Talukdar, and Tom Mitchell. 2012. Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding. In Proceedings of COLING 2012, pages 1933–1950. The COLING 2012 Organizing Committee.

Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2015. Exploring How Deep Neural Networks Form Phonemic Categories. In Interspeech 2015.

Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani. 2016. On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models. In Interspeech 2016, pages 803–807.

Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. 2018. Stress Test Evaluation for Natural Language Inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353. Association for Computational Linguistics.

Nina Narodytska and Shiva Kasiviswanathan. 2017. Simple Black-Box Adversarial Attacks on Deep Neural Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1310–1318.

Lars Niklasson and Fredrik Linåker. 2000. Distributed representations for extended syntactic transformation. Connection Science, 12(3-4):299–314.

Tong Niu and Mohit Bansal. 2018. Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 486–496. Association for Computational Linguistics.

Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016a. Transferability in Machine Learning: From Phenomena to Black-Box Attacks using Adversarial Samples. arXiv preprint arXiv:1605.07277v1.

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical Black-Box Attacks Against Machine Learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS ’17, pages 506–519, New York, NY, USA. ACM.

Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016b. Crafting Adversarial Input Sequences for Recurrent Neural Networks. In Military Communications Conference, MILCOM 2016-2016 IEEE, pages 49–54. IEEE.

Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. 2018. Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Sungjoon Park, JinYeong Bak, and Alice Oh. 2017. Rotated Word Vector Representations and their Interpretability. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 401–411. Association for Computational Linguistics.

Matthew Peters, Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. 2018. Dissecting
Contextual Word Embeddings: Architecture and Representation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509. Association for Computational Linguistics.

Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018a. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 67–81. Association for Computational Linguistics.

Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018b. Hypothesis Only Baselines in Natural Language Inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191. Association for Computational Linguistics.

Jordan B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46(1):77–105.

Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016a. Analyzing Linguistic Knowledge in Sequential Model of Sentence. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 826–835, Austin, Texas. Association for Computational Linguistics.

Peng Qian, Xipeng Qiu, and Xuanjing Huang. 2016b. Investigating Language Universal and Specific Properties in Word Embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1478–1488, Berlin, Germany. Association for Computational Linguistics.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically Equivalent Adversarial Rules for Debugging NLP models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 856–865. Association for Computational Linguistics.

Matīss Rikters. 2018. Debugging Neural Machine Translations. arXiv preprint arXiv:1808.02733v1.

Annette Rios Gonzales, Laura Mascarell, and Rico Sennrich. 2017. Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In Proceedings of the Second Conference on Machine Translation, pages 11–19. Association for Computational Linguistics.

Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. 2016. Reasoning about Entailment with Neural Attention. In International Conference on Learning Representations (ICLR).

Andras Rozsa, Ethan M. Rudd, and Terrance E. Boult. 2016. Adversarial Diversity and Hard Positive Generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 25–32.

Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender Bias in Coreference Resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14. Association for Computational Linguistics.

D. E. Rumelhart and J. L. McClelland. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. volume 2, chapter On Learning the Past Tenses of English Verbs, pages 216–271. MIT Press, Cambridge, MA, USA.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A Neural Attention Model for Abstractive Sentence Summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389. Association for Computational Linguistics.

Keisuke Sakaguchi, Kevin Duh, Matt Post, and Benjamin Van Durme. 2017. Robsut Wrod Reocginiton via Semi-Character Recurrent Neural Network. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 3281–3287. AAAI Press.

Suranjana Samanta and Sameep Mehta. 2017. Towards Crafting Text Adversarial Samples. arXiv preprint arXiv:1707.02812v1.

Ivan Sanchez, Jeff Mitchell, and Sebastian Riedel. 2018. Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1975–1985. Association for Computational Linguistics.

Motoki Sato, Jun Suzuki, Hiroyuki Shindo, and Yuji Matsumoto. 2018. Interpretable Adversarial Perturbation in Input Embedding Space for Text. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 4323–4330. International Joint Conferences on Artificial Intelligence Organization.

Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, and Tolga Cukur. 2018. Semantic Structure and Interpretability of Word Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Rico Sennrich. 2017. How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 376–382. Association for Computational Linguistics.

Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, and Jian Sun. 2018. Learning Visually-Grounded Semantics from Contrastive Adversarial Samples. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3715–3727. Association for Computational Linguistics.

Xing Shi, Kevin Knight, and Deniz Yuret. 2016a. Why Neural Translations are the Right Length. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2278–2282. Association for Computational Linguistics.

Xing Shi, Inkit Padhi, and Kevin Knight. 2016b. Does String-Based Neural MT Learn Source Syntax? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1526–1534, Austin, Texas. Association for Computational Linguistics.

Chandan Singh, W. James Murdoch, and Bin Yu. 2018. Hierarchical interpretations for neural network predictions. arXiv preprint arXiv:1806.05337v1.

Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, and Alexander M. Rush. 2018a. Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. arXiv preprint arXiv:1804.09299v1.

Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, and Alexander M. Rush. 2018b. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE transactions on visualization and computer graphics, 24(1):667–676.

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3319–3328, International Convention Centre, Sydney, Australia. PMLR.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in neural information processing systems, pages 3104–3112.

Mirac Suzgun, Yonatan Belinkov, and Stuart M. Shieber. 2019. On Evaluating the Generalization of LSTM Models in Formal Languages. In Proceedings of the Society for Computation in Linguistics (SCiL).

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).

Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2018. An Analysis of Attention Mechanisms:
The Case of Word Sense Disambiguation in Neural Machine Translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 26–35. Association for Computational Linguistics.

Yi Tay, Anh Tuan Luu, and Siu Cheung Hui. 2018. CoupleNet: Paying Attention to Couples with Coupled Attention for Relationship Recommendation. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM).

Ke Tran, Arianna Bisazza, and Christof Monz. 2018. The Importance of Being Recurrent for Modeling Hierarchical Structure. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4731–4736. Association for Computational Linguistics.

Eva Vanmassenhove, Jinhua Du, and Andy Way. 2017. Investigating ‘Aspect’ in NMT and SMT: Translating the English Simple Past and Present Perfect. Computational Linguistics in the Netherlands Journal, 7:109–128.

Sara Veldhoen, Dieuwke Hupkes, and Willem Zuidema. 2016. Diagnostic Classifiers: Revealing how Neural Networks Process Hierarchical Structure. In CEUR Workshop Proceedings.

Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. 2018. Context-Aware Neural Machine Translation Learns Anaphora Resolution. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1264–1274. Association for Computational Linguistics.

Ekaterina Vylomova, Trevor Cohn, Xuanli He, and Gholamreza Haffari. 2016. Word Representation Models for Morphologically Rich Languages in Neural Machine Translation. arXiv preprint arXiv:1606.04217v1.

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018a. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv preprint arXiv:1804.07461v1.

Shuai Wang, Yanmin Qian, and Kai Yu. 2017a. What Does the Speaker Embedding Encode? In Interspeech 2017, pages 1497–1501.

Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018b. A Tree-based Decoder for Neural Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium.

Yu-Hsuan Wang, Cheng-Tao Chung, and Hung-yi Lee. 2017b. Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries. In Interspeech 2017.

Gail Weiss, Yoav Goldberg, and Eran Yahav. 2018. On the Practical Computational Power of Finite Precision RNNs for Language Recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 740–745. Association for Computational Linguistics.

Adina Williams, Andrew Drozdov, and Samuel R. Bowman. 2018. Do latent tree learning models identify meaningful structure in sentences? Transactions of the Association for Computational Linguistics, 6:253–267.

Zhizheng Wu and Simon King. 2016. Investigating gated recurrent networks for speech synthesis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5140–5144. IEEE.

Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, and Michael I. Jordan. 2018. Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data. arXiv preprint arXiv:1805.12316v1.

Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2016. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. Transactions of the Association for Computational Linguistics, 4:259–272.

Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. 2017. Adversarial Examples: Attacks and Defenses for Deep Learning. arXiv preprint arXiv:1712.07107v3.

Omar Zaidan, Jason Eisner, and Christine Piatko. 2007. Using “Annotator Rationales” to Improve Machine Learning for Text Categorization. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 260–267. Association for Computational Linguistics.

Quan-shi Zhang and Song-chun Zhu. 2018. Visual interpretability for deep learning: A survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39.

Ye Zhang, Iain Marshall, and Byron C. Wallace. 2016. Rationale-Augmented Convolutional Neural Networks for Text Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 795–804. Association for Computational Linguistics.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018a. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20. Association for Computational Linguistics.

Junbo Zhao, Yoon Kim, Kelly Zhang, Alexander Rush, and Yann LeCun. 2018b. Adversarially Regularized Autoencoders. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5902–5911, Stockholmsmässan, Stockholm, Sweden. PMLR.

Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018c. Generating Natural Adversarial Examples. In International Conference on Learning Representations.
