
NeuroNER: an easy-to-use program for named-entity recognition based on neural networks

Franck Dernoncourt∗        Ji Young Lee∗        Peter Szolovits
MIT                        MIT                  MIT
francky@mit.edu            jjylee@mit.edu       psz@mit.edu

arXiv:1705.05487v1 [cs.CL] 16 May 2017

Abstract

Named-entity recognition (NER) aims at identifying entities of interest in a text. Artificial neural networks (ANNs) have recently been shown to outperform existing NER systems. However, ANNs remain challenging to use for non-expert users. In this paper, we present NeuroNER, an easy-to-use named-entity recognition tool based on ANNs. Users can annotate entities using a graphical web-based user interface (BRAT): the annotations are then used to train an ANN, which in turn predicts entities' locations and categories in new texts. NeuroNER makes this annotation-training-prediction flow smooth and accessible to anyone.

1 Introduction

Named-entity recognition (NER) aims at identifying entities of interest in text, such as locations, organizations, and temporal expressions. For example, in the sentence "MIT is located in Cambridge", an NER system should identify "MIT" as an organization and "Cambridge" as a location. Identified entities can be used in various downstream applications such as patient note de-identification and information extraction systems. They can also be used as features for machine learning systems for other natural language processing tasks.

Early systems for NER relied on rules defined by humans. Rule-based systems are time-consuming to develop and cannot be easily transferred to new types of texts or entities. To address these issues, researchers have developed machine-learning-based algorithms for NER, using a variety of learning approaches, such as fully supervised learning, semi-supervised learning, unsupervised learning, and active learning. NeuroNER is based on a fully supervised learning algorithm, which is the most studied approach (Nadeau and Sekine, 2007).

Fully supervised approaches to NER include support vector machines (SVMs) (Asahara and Matsumoto, 2003), maximum entropy models (Borthwick et al., 1998), and decision trees (Sekine et al., 1998), as well as sequential tagging methods such as hidden Markov models (Bikel et al., 1997), Markov maximum entropy models (Kumar and Bhattacharyya, 2006), and conditional random fields (CRFs) (McCallum and Li, 2003; Tsai et al., 2006; Benajiba and Rosso, 2008; Filannino et al., 2013). Like rule-based systems, these approaches rely on handcrafted features, which are challenging and time-consuming to develop and may not generalize well to new datasets.

More recently, artificial neural networks (ANNs) have been shown to outperform other supervised algorithms for NER (Collobert et al., 2011; Lample et al., 2016; Lee et al., 2016; Labeau et al., 2015; Dernoncourt et al., 2016). The effectiveness of ANNs can be attributed to their ability to learn effective features jointly with the model parameters directly from the training dataset, instead of relying on handcrafted features developed for a specific dataset. However, ANNs remain challenging to use for non-expert users.

Contributions NeuroNER makes state-of-the-art named-entity recognition based on ANNs available to anyone, by focusing on usability. To enable users to create or modify annotations for a new or existing corpus, NeuroNER interfaces with the web-based annotation program BRAT (Stenetorp et al., 2012). NeuroNER makes the annotation-training-prediction flow smooth and accessible to anyone, while leveraging the state-of-the-art prediction capabilities of ANNs. NeuroNER is open source and freely available online.¹

∗These authors contributed equally to this work.
¹NeuroNER can be found online at: http://neuroner.com
2 Related Work

Existing publicly available NER systems geared toward non-experts do not use ANNs. For example, Stanford NER (Finkel et al., 2005), ABNER (Settles, 2005), the MITRE Identification Scrubber Toolkit (MIST) (Aberdeen et al., 2010), CliNER (Boag et al., 2015), BANNER (Leaman et al., 2008) and NERsuite (Cho et al., 2010) rely on CRFs. GAPSCORE uses SVMs (Chang et al., 2004). Apache cTAKES (Savova et al., 2010) and GATE's ANNIE (Cunningham et al., 1996; Maynard and Cunningham, 2003) use mostly rules. NeuroNER, the first ANN-based NER system for non-experts, generalizes better to new corpora due to the ANNs' capability to learn effective features jointly with model parameters.

Furthermore, in many cases, these NER systems assume that the user already has an annotated corpus formatted in a specific data format. As a result, users often have to connect their annotation tool with the NER system by reformatting annotated data, which can be time-consuming and error-prone. Moreover, if users want to manually improve the annotations predicted by the NER system (e.g., if they use the NER system to accelerate human annotation), they have to perform additional data conversion. NeuroNER streamlines this process by incorporating BRAT, a widely-used and easy-to-use annotation tool.

3 System Description

NeuroNER comprises two main components: an NER engine and an interface with BRAT. NeuroNER also comes with real-time monitoring tools for training, and pre-trained models that can be loaded into the NER engine in case the user does not have access to labelled training data. Figure 1 presents an overview of the system.

3.1 NER engine

The NER engine takes as input three sets of data with gold labels: the training set, the validation set, and the test set. Additionally, it can also take as input the deployment set, which refers to any new text without gold labels that the user wishes to label. The files that comprise each set of data should be in the same format as used by the annotation tool BRAT or the CoNLL-2003 NER shared task dataset (Tjong Kim Sang and De Meulder, 2003), and organized in the corresponding folder (illustrative examples of both formats are shown after Figure 1).

The NER engine's ANN contains three layers:

• Character-enhanced token-embedding layer,

• Label prediction layer,

• Label sequence optimization layer.

The character-enhanced token-embedding layer maps each token to a vector representation. The sequence of vector representations corresponding to a sequence of tokens is then input to the label prediction layer, which outputs the sequence of vectors containing the probability of each label for each corresponding token. Lastly, the label sequence optimization layer outputs the most likely sequence of predicted labels based on the sequence of probability vectors from the previous layer. All layers are learned jointly.

The ANN as well as the training process have several hyperparameters, such as the character embedding dimension, the character-based token-embedding LSTM dimension, the token embedding dimension, and the dropout probability. All hyperparameters may be specified in a human-readable configuration file, so that the user does not have to dive into any code. Listing 1 presents an excerpt of the configuration file.

[dataset]
dataset_folder = dat/conll

[character_lstm]
using_character_lstm = True
char_embedding_dimension = 25
char_lstm_dimension = 50

[token_lstm]
token_emb_pretrained_file = glove.txt
token_embedding_dimension = 200
token_lstm_dimension = 300

[crf]
using_crf = True
random_initial_transitions = True

[training]
dropout = 0.5
patience = 10
maximum_number_of_epochs = 100
maximum_training_time = 10
number_of_cpu_threads = 8

Listing 1: Excerpt of the configuration file used to define the ANN as well as the training process. Only the dataset_folder parameter needs to be changed by the user: the other parameters have reasonable default values, which the user may optionally tune.
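Because the configuration file uses the standard INI layout shown in Listing 1, it is also easy to read programmatically. The following is a minimal sketch using Python's standard-library configparser; the file name parameters.ini is our assumption, and the way NeuroNER itself loads the file may differ.

# Minimal sketch: reading an INI-style configuration like Listing 1
# with Python's standard-library configparser. The file name
# "parameters.ini" is assumed, not taken from the paper, and NeuroNER's
# own loading code may differ.
import configparser

config = configparser.ConfigParser()
config.read("parameters.ini")  # assumed to exist alongside the script

# Section and key names mirror Listing 1.
dataset_folder = config["dataset"]["dataset_folder"]
dropout = config["training"].getfloat("dropout")
use_crf = config["crf"].getboolean("using_crf")
char_dim = config["character_lstm"].getint("char_embedding_dimension")

print(dataset_folder, dropout, use_crf, char_dim)

The typed accessors (getfloat, getboolean, getint) convert the plain-text values of Listing 1 into the types the training process needs, which is one reason an INI file is a convenient human-readable format for hyperparameters.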
[Figure 1 (diagram): the train and validation sets feed the NeuroNER engine for training and monitoring (learning curve, TensorBoard graphs); the test set feeds it for prediction and evaluation (test set with predicted entities, confusion matrix, classification report); the deployment set feeds it for deployment (deployment set with predicted entities).]

Figure 1: NeuroNER system overview. In the NeuroNER engine, the training set is used to train the parameters of the ANN, and the validation set is used to determine when to stop training. The user can monitor the training process in real time via the learning curve and TensorBoard. To evaluate the trained ANN, the labels are predicted for the test set: the performance metrics can be calculated and plotted by comparing the predicted labels with the gold labels. The evaluation can be done at the same time as the training if the test set is provided along with the training and validation sets, or separately after the training or using a pre-trained model. Lastly, the NeuroNER engine can label the deployment set, i.e. any new text without gold labels.
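To make the input formats of Section 3.1 concrete, below is a small sketch of the sentence from the introduction in the two accepted formats; the entity types, character offsets, and part-of-speech tags are our own illustrative choices, not taken from the paper. In the BRAT format, a plain-text .txt file is paired with a standoff .ann file that lists, for each entity, an identifier, a type, character offsets, and the surface text:

doc.txt:
MIT is located in Cambridge.

doc.ann:
T1	Organization 0 3	MIT
T2	Location 18 27	Cambridge

In the CoNLL-2003 format, each token appears on its own line together with its part-of-speech tag, chunk tag, and IOB-encoded NER tag, and sentences are separated by blank lines:

MIT NNP I-NP I-ORG
is VBZ I-VP O
located VBN I-VP O
in IN I-PP O
Cambridge NNP I-NP I-LOC
. . O O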
3.2 Real-time monitoring for training

As training an ANN may take many hours, or even a few days on very large datasets, NeuroNER provides the user with real-time feedback during training for monitoring purposes. Feedback is given through two different means: plots generated by NeuroNER, and TensorBoard.

Plots NeuroNER generates several plots showing the training progress and outcome at each epoch. Plots include the evolution of the overall F1-score over time, confusion matrices visualizing the number of correct versus incorrect predictions for each class, and classification reports showing the F1-score, precision, and recall for each class.

TensorBoard As NeuroNER is based on TensorFlow, it leverages the functionalities of TensorBoard. TensorBoard is a suite of web applications for inspecting and understanding TensorFlow runs and graphs. It allows the user to view in real time the performance achieved by the ANN being trained. Moreover, since it is web-based, this performance can be conveniently shared with anyone remotely. Lastly, since the graphs generated by TensorBoard are interactive, the user may gain further insights into the ANN's performance.

3.3 Pre-trained models

Some users may prefer not to train any ANN model, either due to time constraints or unavailable gold labels. For example, if the user wants to tag protected health information, they might not have access to a labeled identifiable dataset. To address this need, NeuroNER provides a set of pre-trained models. Users are encouraged to contribute by uploading their own trained models. NeuroNER also comes with several pre-trained token embeddings, learned with either word2vec (Mikolov et al., 2013a,b,c) or GloVe (Pennington et al., 2014), which the NeuroNER engine can load easily once they are specified in the configuration file.

3.4 Annotations

NeuroNER is designed to smoothly integrate with the freely available web-based annotation tool BRAT, so that non-expert users may create or improve annotations. Specifically, NeuroNER addresses two main use cases:

• creating new annotations from scratch, e.g. if the goal is to annotate a dataset for which no gold label is available,

• improving the annotations of an already labeled dataset: the annotations may have been done by another human or by a previous run of NeuroNER.

In the latter case, the user may use NeuroNER interactively, by iterating between manually improving the annotations and running the NeuroNER engine with the new annotations to obtain more accurate annotations.

NeuroNER can take as input datasets in the BRAT format, and outputs BRAT-formatted predictions, which makes it easy to start training directly from the annotations as well as to visualize and analyze the predictions. We chose BRAT for two main reasons: it is easy to use, and it can be deployed as a web application, which allows crowdsourcing. As a result, the user may quickly gather a vast amount of annotations by using crowdsourcing marketplaces such as Amazon Mechanical Turk (Buhrmester et al., 2011) and CrowdFlower (Finin et al., 2010).

3.5 System requirements

NeuroNER runs on Linux, Mac OS X, and Microsoft Windows. It requires Python 3.5, TensorFlow 1.0 (Abadi et al., 2016), scikit-learn (Pedregosa et al., 2011), and BRAT. A setup script is provided to make the installation straightforward. NeuroNER can use the GPU if available, and the number of CPU threads and GPUs to use can be specified in the configuration file.

3.6 Performance

To assess the quality of NeuroNER's predictions, we use two publicly and freely available datasets for named-entity recognition: CoNLL 2003 and i2b2 2014. CoNLL 2003 (Tjong Kim Sang and De Meulder, 2003) is a widely studied dataset with four usual types of entities: persons, organizations, locations, and miscellaneous names. We use the English version.

Model            CoNLL 2003   i2b2 2014
Best published   90.9         97.9
NeuroNER         90.5         97.7

Table 1: F1-scores (%) on the test set comparing NeuroNER with the best published methods in the literature, viz. (Passos et al., 2014) for CoNLL 2003 and (Dernoncourt et al., 2016) for i2b2 2014.
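The classification reports mentioned in Section 3.2 contain per-class precision, recall, and F1-score. As a minimal sketch of how such token-level reports can be produced with scikit-learn, one of NeuroNER's dependencies (note that the CoNLL scores in Table 1 are computed at the entity level, and NeuroNER's own evaluation code may differ):

# Minimal sketch: token-level classification report and confusion matrix
# with scikit-learn. The gold and predicted label sequences are invented
# for illustration; NeuroNER's own (entity-level) evaluation may differ.
from sklearn.metrics import classification_report, confusion_matrix

gold = ["I-ORG", "O", "O", "O", "I-LOC", "O"]  # e.g. "MIT is located in Cambridge ."
pred = ["I-ORG", "O", "O", "O", "O", "O"]      # one missed entity token

labels = ["I-ORG", "I-LOC", "O"]
print(classification_report(gold, pred, labels=labels))
print(confusion_matrix(gold, pred, labels=labels))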
The i2b2 2014 dataset (Stubbs et al., 2015) was released as part of the 2014 i2b2/UTHealth shared task Track 1. It is the largest publicly available dataset for de-identification, which is a form of named-entity recognition where the entities are protected health information, such as patients' names and patients' phone numbers. 22 systems were submitted for this shared task.

Table 1 compares NeuroNER with state-of-the-art systems on CoNLL 2003 and i2b2 2014. Although the hyperparameters of NeuroNER were not optimized for these datasets (the default hyperparameters were used), the performance of NeuroNER is on par with the state-of-the-art systems.

4 Conclusions

In this article we have presented NeuroNER, an ANN-based NER tool that is accessible to non-expert users and yields state-of-the-art results. Addressing the need of many users who want to create or improve annotations, NeuroNER smoothly integrates with the web-based annotation tool BRAT.

References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.

John Aberdeen, Samuel Bayer, Reyyan Yeniterzi, Ben Wellner, Cheryl Clark, David Hanauer, Bradley Malin, and Lynette Hirschman. 2010. The MITRE Identification Scrubber Toolkit: design, training, and assessment. International Journal of Medical Informatics 79(12):849–859.

Masayuki Asahara and Yuji Matsumoto. 2003. Japanese named entity extraction with redundant morphological analysis. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1. Association for Computational Linguistics, pages 8–15.

Yassine Benajiba and Paolo Rosso. 2008. Arabic named entity recognition using conditional random fields. In Proc. of Workshop on HLT & NLP within the Arabic World, LREC. Citeseer, volume 8, pages 143–153.

Daniel M Bikel, Scott Miller, Richard Schwartz, and Ralph Weischedel. 1997. Nymble: a high-performance learning name-finder. In Proceedings of the Fifth Conference on Applied Natural Language Processing. Association for Computational Linguistics, pages 194–201.

William Boag, Kevin Wacome, Tristan Naumann, and Anna Rumshisky. 2015. CliNER: A lightweight tool for clinical named entity recognition. American Medical Informatics Association (AMIA) Joint Summits on Clinical Research Informatics (poster).

Andrew Borthwick, John Sterling, Eugene Agichtein, and Ralph Grishman. 1998. NYU: Description of the MENE named entity system as used in MUC-7. In Proceedings of the Seventh Message Understanding Conference (MUC-7).

Michael Buhrmester, Tracy Kwang, and Samuel D Gosling. 2011. Amazon's Mechanical Turk: a new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science 6(1):3–5.

Jeffrey T Chang, Hinrich Schütze, and Russ B Altman. 2004. GAPSCORE: finding gene and protein names one word at a time. Bioinformatics 20(2):216–225.

HC Cho, N Okazaki, M Miwa, and J Tsujii. 2010. NERsuite: a named entity recognition toolkit. Tsujii Laboratory, Department of Information Science, University of Tokyo, Tokyo, Japan.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research 12:2493–2537.

Hamish Cunningham, Yorick Wilks, and Robert J Gaizauskas. 1996. GATE: a general architecture for text engineering. In Proceedings of the 16th Conference on Computational Linguistics-Volume 2. Association for Computational Linguistics, pages 1057–1060.

Franck Dernoncourt, Ji Young Lee, Ozlem Uzuner, and Peter Szolovits. 2016. De-identification of patient notes with recurrent neural networks. Journal of the American Medical Informatics Association.

Michele Filannino, Gavin Brown, and Goran Nenadic. 2013. ManTIME: Temporal expression identification and normalization in the TempEval-3 challenge. arXiv preprint arXiv:1304.7942.

Tim Finin, Will Murnane, Anand Karandikar, Nicholas Keller, Justin Martineau, and Mark Dredze. 2010. Annotating named entities in Twitter data with crowdsourcing. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. Association for Computational Linguistics, pages 80–88.

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, pages 363–370.

N Kumar and Pushpak Bhattacharyya. 2006. Named entity recognition in Hindi using MEMM. Technical Report, IIT Mumbai.

Matthieu Labeau, Kevin Löser, and Alexandre Allauzen. 2015. Non-lexical neural architecture for fine-grained POS tagging. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pages 232–237. http://aclweb.org/anthology/D15-1025.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.

Robert Leaman, Graciela Gonzalez, et al. 2008. BANNER: an executable survey of advances in biomedical named entity recognition. In Pacific Symposium on Biocomputing. volume 13, pages 652–663.

Ji Young Lee, Franck Dernoncourt, Ozlem Uzuner, and Peter Szolovits. 2016. Feature-augmented neural networks for patient note de-identification. COLING Clinical NLP.

Diana Maynard and Hamish Cunningham. 2003. Multilingual adaptations of ANNIE, a reusable information extraction tool. In Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics-Volume 2. Association for Computational Linguistics, pages 219–222.

Andrew McCallum and Wei Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003-Volume 4. Association for Computational Linguistics, pages 188–191.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. pages 3111–3119.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013c. Linguistic regularities in continuous space word representations. In HLT-NAACL. pages 746–751.

David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1):3–26.

Alexandre Passos, Vineet Kumar, and Andrew McCallum. 2014. Lexicon infused phrase embeddings for named entity resolution. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, Ann Arbor, Michigan, pages 78–86. http://www.aclweb.org/anthology/W14-1609.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12(Oct):2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: global vectors for word representation. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014) 12:1532–1543.

Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, and Christopher G Chute. 2010. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association 17(5):507–513.

Satoshi Sekine et al. 1998. NYU: Description of the Japanese NE system used for MET-2. In Proc. Message Understanding Conference.

Burr Settles. 2005. ABNER: An open source tool for automatically tagging genes, proteins, and other entity names in text. Bioinformatics 21(14):3191–3192.

Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii. 2012. BRAT: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pages 102–107.

Amber Stubbs, Christopher Kotfila, and Özlem Uzuner. 2015. Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1. Journal of Biomedical Informatics 58:S11–S19.

Erik F Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003-Volume 4. Association for Computational Linguistics, pages 142–147.

Richard Tzong-Han Tsai, Cheng-Lung Sung, Hong-Jie Dai, Hsieh-Chuan Hung, Ting-Yi Sung, and Wen-Lian Hsu. 2006. NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition. BMC Bioinformatics 7(5):S11.
