
Named Entity Recognizer for English Language

Shahan Ali Memon and Muhammad Taimur Rizwan
December 11, 2014
Carnegie Mellon University, Doha, Qatar
{samemon,mrizwan}@andrew.cmu.edu

Abstract

There is a lot of noisy and informal data present around the web and beyond. This data contains a great deal of useful information, such as emotions, named entities, and translations, but information extraction is not an easy task. Named Entity Recognition (NER) is one of the subtopics that fall under the heading of information extraction. Recognition of named entities such as people, locations, and organizations has become an essential task in many natural language processing applications. Hence there are many tools available to help solve the NER problem. These include the Python-based NLTK library, which ships with an NE tagger; Stanford's Java-based NE tagger is also one of the widely used tools. As a final project for an introductory text processing course at Carnegie Mellon University, we built a Naive Bayes named-entity classifier using NLTK and evaluated its precision, recall, and F-measure. This project report addresses the methods and the features used to solve the problem, as well as the accuracy measures we calculated via our evaluator. Finally, the report compares the results of the Naive Bayes NE classifier with NLTK's built-in tagger as well as Stanford's NE tagger.

1 Introduction

As a final project for our 15-383 course at Carnegie Mellon University, we used the Python programming language along with the NLTK library to build a Naive Bayes named-entity classifier. The classifier is trained on a training set provided by the instructor and is then tested on the development and test sets, also provided by the instructor. The classifier identifies and tags each word in the sentences of the test data with one of the tags I-ORG, I-LOC, I-PER, or O, standing for organization, location, person, or other, respectively. The classifier is built using four different methods: without context, with lexical context, with lexical and phrasal context, and finally with lexical, phrasal, and historical context. At each level we record the precision, recall, and F-measure of the classifier by comparing the NER tags given by the classifier with the correct tags for each word.
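As a rough illustration of the overall pipeline (a sketch under our own assumptions, not the exact project code), an NLTK Naive Bayes classifier is trained on one feature dictionary per token; the feature extractor here is a placeholder that the Methods section fleshes out level by level:

```python
import nltk

def ner_features(tokens, i):
    # Placeholder per-token feature extractor; the actual features
    # used at each level are listed in the Methods section.
    word, pos = tokens[i]
    return {"word": word.lower(), "pos": pos}

def train_ner(train_data):
    # train_data: list of (tokens, tags) pairs, where tokens is a list
    # of (word, POS) tuples and tags holds the gold NER labels
    # (I-ORG, I-LOC, I-PER, O) for the same sentence.
    examples = [(ner_features(tokens, i), tags[i])
                for tokens, tags in train_data
                for i in range(len(tokens))]
    return nltk.NaiveBayesClassifier.train(examples)
```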

2 Previous Work

2.1 GENIA Tagger: part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text

2.1.1 Summary

The GENIA tagger analyzes English sentences and outputs the base forms,
part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. If you need to
extract information from biomedical documents, this tagger might be a useful
preprocessing tool.

2.1.2 Performance

The named entity tagger is trained on the NLPBA data set. The features and parameters were tuned using the training data, and the final performance was measured on a held-out evaluation set.

2.2 A New State-Of-The-Art Czech Named Entity Recognizer

2.2.1 Summary

The recognizer is based on a Maximum Entropy Markov Model: a Viterbi algorithm decodes an optimal sequence labeling using probabilities estimated by a maximum entropy classifier. The classification features utilize morphological analysis, two-stage prediction, word clustering, and gazetteers. Their methodology is described in detail at the following link: http://ufal.mff.cuni.cz/~straka/papers/2013tsd_ner.pdf
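To make the decoding step concrete, here is a minimal sketch of Viterbi decoding over per-position tag probabilities; tag_prob is a hypothetical stand-in for the trained maximum entropy classifier, not the authors' actual interface:

```python
def viterbi(words, tags, tag_prob):
    # tag_prob(word, prev_tag, tag) -> P(tag | word, prev_tag),
    # a hypothetical stand-in for the maximum entropy classifier.
    # best[t] = (probability of best path ending in tag t, that path)
    best = {t: (tag_prob(words[0], None, t), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            # extend the highest-scoring previous path with tag t
            prev = max(tags, key=lambda p: best[p][0] * tag_prob(word, p, t))
            score, path = best[prev]
            new_best[t] = (score * tag_prob(word, prev, t), path + [t])
        best = new_best
    # return the tag sequence of the overall best path
    return max(best.values(), key=lambda sp: sp[0])[1]
```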

2.2.2 Performance

They call "baseline" the simplest model, where they used the common set of classification features in a maximum entropy model, then decoded the probability distribution given by the classifier with dynamic programming and, in Czech, post-edited the result with three automatically discovered rules. Table 1 of their paper shows the effect of more sophisticated classification features or processing: (A) new tagging, lemmatization and chunking, (B) two-stage prediction, (C) gazetteers, (D) Brown clusters, (E) linear combination with the Stanford NER. The experiments (A), (B), (C), (D), and (E) show the system improvement after adding the respective feature to the baseline. The last line of the table shows results after combining all features. All new features and preprocessing steps improved the system performance over the baseline, and the gains were similar in both languages. In the Czech language, most of the impact of adding gazetteers (C) is due to the manually annotated proper names.

2.3 Maximum Entropy Models for Named Entity Recognition

2.3.1 Summary

They present an approach for extracting the named entities (NE) of natural language inputs which uses the maximum entropy (ME) framework (Berger et al., 1996). The objective can be described as follows. Given a natural input sequence $w_1^N = w_1 \ldots w_n \ldots w_N$, they choose the NE tag sequence $c_1^N = c_1 \ldots c_n \ldots c_N$ with the highest probability among all possible tag sequences:

$$\hat{c}_1^N = \operatorname*{argmax}_{c_1^N} \Pr(c_1^N \mid w_1^N)$$

The argmax operation denotes the search problem, i.e. the generation of the sequence of named entities. According to the CoNLL-2003 competition, they concentrate on four types of named entities: persons (PER), locations (LOC), organizations (ORG), and names of miscellaneous entities (MISC) that do not belong to the previous three.

2.3.2 Performance

For the English test set:


Label     Precision   Recall    F-Measure
LOC       86.44%      89.81%    88.09%
MISC      78.35%      73.22%    75.70%
ORG       80.27%      76.16%    78.16%
PER       89.77%      87.88%    88.81%
OVERALL   84.68%      83.18%    83.92%

3 Methods

3.1 Data availability

The data that was available to us for our project was very limited and narrow. We were given three sets of files: a training set, a development set, and a testing set. Each set consisted of a collection of sentences from different contexts. The classifier was trained only on the training set and was then tested on the untouched development and test sets.

3.2 Features

As discussed in the introduction, the classifier was built in four different levels. The description, methods, and features of each level are as follows:

3.2.1 No Context

In this level, the Naive Bayes NE classifier did sequence classification and assigned labeled tags to identify which parts of a sentence constituted a named entity. This task was done without any context. The following features were used to train the classifier for this task (a sketch of such a feature extractor follows the list):

- The last letter of the word.
- The POS tag.
- Capitalization of the first letter of the word.
- Capitalization of all letters of the word.
- The word itself.
- Presence of a hyphen (-) in the word.
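A minimal sketch of such a feature extractor, assuming each token arrives as a (word, POS-tag) pair (the feature names are illustrative, not the project's exact ones):

```python
def no_context_features(word, pos):
    # Per-token features only; no surrounding context is used.
    return {
        "last-letter": word[-1],
        "pos-tag": pos,
        "first-letter-capitalized": word[0].isupper(),
        "all-capitalized": word.isupper(),
        "word": word.lower(),
        "has-hyphen": "-" in word,
    }
```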

3.2.2 Incorporating Lexical Context

In this level, the classifier was made to incorporate the context of the other words in the sentence, and new features were added. The additional features added in this level are as follows (see the sketch after the list):

- Whether the POS tag of the next word in the sentence is a verb.
- The previous word along with its tag.
- The next word along with its tag.
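A sketch of how such context features could extend the per-token extractor above; the sentence-boundary padding and the Penn Treebank verb-tag check are our assumptions:

```python
def lexical_context_features(tokens, i):
    # tokens: list of (word, POS) pairs for a single sentence
    word, pos = tokens[i]
    features = no_context_features(word, pos)
    prev_word, prev_pos = tokens[i - 1] if i > 0 else ("<START>", "<START>")
    next_word, next_pos = tokens[i + 1] if i + 1 < len(tokens) else ("<END>", "<END>")
    # Penn Treebank verb tags (VB, VBD, VBZ, ...) all start with "VB"
    features["next-is-verb"] = next_pos.startswith("VB")
    features["prev-word-and-tag"] = (prev_word.lower(), prev_pos)
    features["next-word-and-tag"] = (next_word.lower(), next_pos)
    return features
```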

3.2.3 Incorporating Lexical and Phrasal Context

In this level, phrasal heads were made part of the context along with the lexical context. The inclusion of the phrasal context did not contribute much towards the improvement of the accuracy measures; in fact, it decreased them. Hence we incorporated only a very limited set of features related to phrasal context. It is as follows:

- The phrasal head of the previous word.

3.2.4 Incorporating Historical Context along with Lexical and Phrasal Context

In the final level, the classifier was trained incorporating the historical context, i.e. whenever the classifier tagged a word, it took the NER tags of the previous words in the sentence into consideration. This boosted the accuracy measures on the test data by around 2%. This was apparently because there were distinct patterns in the training data relating the NER tags; for example, one of the patterns we found was that many named entities were followed by another named entity. Hence the following feature was introduced in this level (a sketch follows):

- The NER tags of up to the last 4 tagged words in the sentence.
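Since gold tags are not available at test time, a history feature like this has to be fed the classifier's own earlier predictions. A minimal sketch, assuming tagging proceeds left to right through the sentence:

```python
def tag_sentence(classifier, tokens):
    # Tag left to right, giving each token's feature set the classifier's
    # own predictions for up to the 4 preceding words (historical context).
    history, tags = [], []
    for i in range(len(tokens)):
        features = lexical_context_features(tokens, i)
        features["history"] = tuple(history[-4:])
        tag = classifier.classify(features)
        tags.append(tag)
        history.append(tag)
    return tags
```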

4 Results

The results include the precision, recall, and F-measure of the classifier. They also include the precision and recall of the I/O tags, i.e. whether a word is inside or outside a named entity, ignoring the type of the named entity. The results for each level are given below.
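For reference, the per-label measures are computed in the standard way. A small sketch of that computation (not necessarily our evaluator's exact code), where gold and predicted are parallel lists of tags:

```python
def scores(gold, predicted, label):
    # Count true positives, false positives, and false negatives
    # for one label over parallel lists of gold and predicted tags.
    tp = sum(g == label and p == label for g, p in zip(gold, predicted))
    fp = sum(g != label and p == label for g, p in zip(gold, predicted))
    fn = sum(g == label and p != label for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```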

4.1 Classifier with no context

4.1.1 Development Set

Label   Precision   Recall   F-Measure
LOC     0.72        0.82     0.77
PER     0.78        0.90     0.84
ORG     0.65        0.69     0.67
I/O     0.73        0.82     0.77

4.1.2 Test Set

Label   Precision   Recall   F-Measure
LOC     0.59        0.79     0.67
PER     0.70        0.84     0.76
ORG     0.62        0.64     0.63
I/O     0.64        0.75     0.70

4.2 Classifier with lexical context

4.2.1 Development Set

Label   Precision   Recall   F-Measure
LOC     0.70        0.87     0.78
PER     0.88        0.90     0.89
ORG     0.62        0.76     0.68
I/O     0.74        0.85     0.79

4.2.2 Test Set

Label   Precision   Recall   F-Measure
LOC     0.64        0.84     0.72
PER     0.83        0.85     0.84
ORG     0.56        0.74     0.63
I/O     0.66        0.81     0.73

4.3 Classifier with lexical and phrasal context

4.3.1 Development Set

Label   Precision   Recall   F-Measure
LOC     0.69        0.86     0.76
PER     0.88        0.89     0.89
ORG     0.61        0.76     0.68
I/O     0.73        0.84     0.79

4.3.2 Test Set

Label   Precision   Recall   F-Measure
LOC     0.63        0.81     0.71
PER     0.84        0.85     0.84
ORG     0.60        0.75     0.63
I/O     0.66        0.80     0.72

4.4 Classifier with lexical, phrasal and historical context

4.4.1 Development Set

Label   Precision   Recall   F-Measure
LOC     0.73        0.87     0.80
PER     0.90        0.91     0.91
ORG     0.67        0.82     0.73
I/O     0.77        0.87     0.82

4.4.2 Test Set

Label   Precision   Recall   F-Measure
LOC     0.67        0.86     0.76
PER     0.87        0.89     0.88
ORG     0.61        0.81     0.70
I/O     0.71        0.85     0.77

4.5 NLTK NE Tagger

4.5.1 Test Set

Label   Precision   Recall   F-Measure
LOC     0.63        0.56     0.59
PER     0.72        0.75     0.73
ORG     0.47        0.32     0.38
I/O     0.63        0.55     0.59

4.6 Stanford Tagger

4.6.1 Test Set

Label   Precision   Recall   F-Measure
LOC     0.82        0.68     0.73
PER     0.88        0.78     0.84
ORG     0.61        0.82     0.70
I/O     0.93        0.93     0.93

5 Analysis

5.1 Comparison between the Naive Bayes NE classifier and the NLTK NE tagger

Comparing and analyzing the data in the tables above, we can conclude that the precision, recall, and F-measure for all the labels increased by incorporating the lexical, phrasal, and historical context, and that our final classifier outperforms the NLTK NE tagger on every label. The difference is very significant.

5.2 Comparison between the Naive Bayes NE classifier and Stanford's NE tagger

Comparing and analyzing the data, we can conclude that Stanford's NE tagger performs better overall than our classifier. However, for the organization label, our classifier's performance is almost the same. For I/O tags, Stanford's NE tagger is significantly better. Still, the difference between the performance of our classifier and Stanford's NE tagger is not very large, which is commendable.

6 Conclusion

In our paper we have analyzed a comprehensive set of features used in NER, and we have considered the impact of each feature on the precision, recall, and F-measure. The problem of named entity recognition is a difficult one. Even after incorporating four different methods, we could only achieve a performance of around 0.7 or 0.8 for all the measures. Some features, like the historical context, boosted the accuracy; however, the phrasal context did not help much and in fact decreased the accuracy to some extent. The interesting fact was that jumping from 0.80 to 0.90 for each measure was not as difficult as jumping from 0.90 to 0.92. In all, we plan to incorporate more features and improve the accuracy of the NE tagger in the future.

References

GENIA tagger. Tsuruoka, Y. (2006). Available at: http://www.nactem.ac.uk/tsujii/GENIA/tagger/ as of 12/11/2014.

A New State-Of-The-Art Czech Named Entity Recognizer. Straková, J., Straka, M., Hajič, J. Available at: http://ufal.mff.cuni.cz/~straka/papers/2013tsd_ner.pdf as of 12/11/2014.

Maximum Entropy Models for Named Entity Recognition. Bender, O., Och, F. J., Ney, H.
