inputs, which are hand-crafted features such as POS tags. Support Vector Machines (Isozaki and Kazawa, 2002), maximum entropy models (Bender et al., 2003), Hidden Markov Models (Zhou and Su, 2002), and CRFs (Klinger, 2011; Chen et al., 2006; Marcinczuk, 2015) were applied.
CNN Layer: This layer is aimed at extracting sub-word information. Its inputs are the character embeddings of a word. The layer consists of a convolution layer and a pooling layer. The convolution layer produces a matrix for a word that reflects its sub-word structure. The pooling layer compresses this matrix into a single fixed-length vector for the word.
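As a concrete illustration, the following is a minimal sketch of such a layer; the framework (PyTorch), the class name, and all dimensions are our assumptions for illustration, not details taken from the paper.

    import torch
    import torch.nn as nn

    class CharCNN(nn.Module):
        """Character embeddings of a word -> convolution -> max-pooling -> one vector."""

        def __init__(self, n_chars=4000, char_dim=30, n_filters=50, kernel_size=3):
            super().__init__()
            self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
            # The convolution produces a matrix for the word:
            # one feature vector per character position.
            self.conv = nn.Conv1d(char_dim, n_filters, kernel_size,
                                  padding=kernel_size // 2)

        def forward(self, char_ids):              # (n_words, max_word_len)
            x = self.embed(char_ids)              # (n_words, max_word_len, char_dim)
            x = self.conv(x.transpose(1, 2))      # (n_words, n_filters, max_word_len)
            # Max-pooling compresses the matrix into one vector per word.
            return x.max(dim=2).values            # (n_words, n_filters)

    # Example: sub-word feature vectors for 8 words of up to 6 characters each.
    features = CharCNN()(torch.randint(1, 4000, (8, 6)))   # shape (8, 50)

Max-pooling keeps, for each filter, its strongest activation anywhere in the word, which is how a fixed-length vector is obtained from words of varying length.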
Model                            Input           Output      Product   Location   Organization   Time     Average
BLSTM-CRF                        word            word        †83.89    †88.57      85.87         †94.39   †87.16
BLSTM-CNNs                       word+character  word         80.83     87.52      82.07          92.50    84.59
BLSTM-CNNs-CRF                   word+character  word         83.82     88.46     †*85.99         93.51    86.98
CRF                              word            word         80.72     87.54      79.62          93.00    84.25
Char-BLSTM-CRF w/o word          character       character    78.12     86.77      77.72          93.35    82.81
Char-BLSTM-CRF w/o char          word            character    84.28     91.30      85.26          94.03    87.80
Char-BLSTM-CRF w/o pretraining   word+character  character    80.13     87.63      80.79          93.55    84.33
Char-BLSTM-CRF                   word+character  character   *84.46    *91.47      85.56         *94.44   *88.06

Table 2: F1 score of each model. Average is a weighted average. † marks the best result among the word-based models for each entity category; * marks the best result among all models (bold in the original). "Output" is the unit of prediction; "input" is the information used as input.
Entity Category         Pro.   Loc.   Org.   Time
Word Length in Entity   1.99   2.07   2.51   1.20

Table 3: Averaged word length in entity.

Entity Category      Pro.   Loc.   Org.   Time
Total Conflicts      66     75     23     7
Extracted Entities   25     68     8      3

Table 4: Number of entities with boundary conflicts and that of entities extracted by Char-BLSTM-CRF.
Word-based Neural Models: Among word-based models, BLSTM-CNNs-CRF is the best model for Organization. Also, BLSTM-CRF is the best model for Product, Location, and Time. We confirm that cutting-edge neural models are suitable for Japanese because each of them outperforms CRF.
When comparing BLSTM-CNNs and BLSTM-CNNs-CRF, the CRF layer contributes an improvement of 2.39 pt. When comparing BLSTM-CRF and BLSTM-CNNs-CRF, the CNN layer makes the result worse by 0.18 pt. In English, the CNN layer and the CRF layer improve performance by 2.36 pt and 1.75 pt, respectively (Ma and Hovy, 2016). Therefore, the CRF layer behaves similarly in the two languages, but the CNN layer behaves differently. The CNN layer enhances the flexibility of the model. Nevertheless, it can scarcely extract information from characters, because Japanese words are shorter than English ones, as described in Section 3.
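To make the CRF layer's contribution concrete: unlike per-word softmax prediction, a CRF layer scores whole tag sequences with learned transition scores and decodes them jointly. The sketch below shows only the decoding side, as a hypothetical stand-alone NumPy function rather than the implementation used in the experiments.

    import numpy as np

    def viterbi(emissions, transitions):
        """Best tag sequence given emission scores (seq_len, n_tags) and
        transition scores (n_tags, n_tags)."""
        seq_len, n_tags = emissions.shape
        score = emissions[0].copy()               # best score ending in each tag
        back = np.zeros((seq_len, n_tags), dtype=int)
        for t in range(1, seq_len):
            # total[i, j]: best score of reaching tag j at step t from tag i.
            total = score[:, None] + transitions + emissions[t][None, :]
            back[t] = total.argmax(axis=0)
            score = total.max(axis=0)
        path = [int(score.argmax())]
        for t in range(seq_len - 1, 0, -1):       # follow the back-pointers
            path.append(int(back[t][path[-1]]))
        return path[::-1]

A strongly negative transition score can rule out invalid tag bigrams such as O followed by I-Location, which independent per-token prediction cannot do.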
Table 3 shows the averaged word length after splitting an entity into words for each entity category. The words composing Time entities are the shortest; therefore, information is scarce, especially for Time. In contrast, the words composing Organization entities are long, so the CNN can extract information from the characters in a word for Organization. This is the reason why BLSTM-CNNs-CRF performs better than BLSTM-CRF for Organization.

Character-based Neural Models: The averaged F1 scores show that Char-BLSTM-CRF is more suitable for Japanese than the word-based models. When comparing the four configurations of Char-BLSTM-CRF, pre-training is critically important for performance. Character input also contributes to the performance improvement of Char-BLSTM-CRF, although the same input degrades performance in a word-based model.
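For concreteness, the following sketch shows one plausible way to combine the word and character inputs in a character-based BLSTM encoder whose emissions would feed a CRF layer. The framework (PyTorch), all layer sizes, and the wiring are our assumptions, not the paper's exact implementation.

    import torch
    import torch.nn as nn

    class CharBLSTMEncoder(nn.Module):
        """Per-character input: the character embedding concatenated with the
        embedding of the word containing that character (the "word+character"
        input row of Table 2). Predictions are made per character."""

        def __init__(self, n_chars=4000, n_words=50000, char_dim=30,
                     word_dim=100, hidden=100, n_tags=9):
            super().__init__()
            self.char_embed = nn.Embedding(n_chars, char_dim)
            self.word_embed = nn.Embedding(n_words, word_dim)  # pre-trained embeddings
            self.blstm = nn.LSTM(char_dim + word_dim, hidden,
                                 bidirectional=True, batch_first=True)
            self.emit = nn.Linear(2 * hidden, n_tags)

        def forward(self, char_ids, word_ids_per_char):
            # Both inputs have shape (batch, seq_len); word_ids_per_char repeats
            # the id of the containing word at every character position.
            x = torch.cat([self.char_embed(char_ids),
                           self.word_embed(word_ids_per_char)], dim=-1)
            h, _ = self.blstm(x)
            return self.emit(h)   # per-character emissions for a CRF layer

Under this reading, dropping word_ids_per_char gives the "w/o word" configuration, and randomly initializing the embeddings instead of pre-training them gives "w/o pretraining".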
Total Conflicts in Table 4 is the total number of entities with boundary conflicts in the test data; Extracted Entities is the number of those entities that Char-BLSTM-CRF extracts. The results show that the model extracts entities with boundary conflicts, which cannot be extracted by word-based models. The number of such entities extracted by Char-BLSTM-CRF is largest for Location. When comparing the performance of Char-BLSTM-CRF and BLSTM-CRF for each entity category, the largest improvement, 2.90 pt, is achieved for Location. By extracting 68 entities with boundary conflicts in Location, Char-BLSTM-CRF gains about 2 pt of the total 2.90 pt. It can be said that almost all of the improvement of Char-BLSTM-CRF comes from extracting these entities.
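A boundary conflict can be stated precisely: an entity's character span conflicts with the word segmentation when either of its endpoints falls inside a segmented word, so no sequence of whole words covers exactly the entity. A small illustrative check (a hypothetical helper and example, not from the paper):

    def has_boundary_conflict(words, start, end):
        """True if the entity span [start, end) in character offsets does not
        align with the boundaries of the segmented words."""
        boundaries, offset = {0}, 0
        for w in words:
            offset += len(w)
            boundaries.add(offset)
        return start not in boundaries or end not in boundaries

    # Hypothetical example: the segmenter emits the four characters of
    # "Tokyo Metropolitan Government Building" as one word, but the Location
    # entity is only the first three characters ("Tokyo Metropolis").
    print(has_boundary_conflict(["東京都庁"], 0, 3))        # True: conflict
    print(has_boundary_conflict(["東京都", "庁"], 0, 3))    # False: aligned

A word-based model can only label whole words, so it can never output the conflicting span; a character-based model labels each character and is unaffected.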
In contrast, Char-BLSTM-CRF is inappropriate for Organization. The averaged word length of entities that are not extracted accurately by BLSTM-CNNs-CRF is 4.07; that by Char-BLSTM-CRF is 4.87. It can be said that Char-BLSTM-CRF is unsuitable for extracting entities composed of long words. We infer that the inputs to the LSTM become redundant and that the LSTM therefore does not work efficiently. In particular, the averaged word length of Organization is long, according to Table 3. For that reason, the character-based model is inappropriate for Organization.
6 Conclusions

As described in this paper, we verified the effectiveness of the state-of-the-art neural NER model for Japanese. The experimental results show that the model outperforms a conventional CRF in Japanese. The results also show that the CNN layer works poorly for Japanese because words of the Japanese language are short.

We proposed a character-based neural model incorporating words and characters: Char-BLSTM-CRF. This model outperforms a state-of-the-art neural NER model in Japanese, especially for the entity category consisting of short words. Our future work will examine reducing the redundancy in the character-based model by preparing and combining different LSTMs for the word and character inputs. Examining the effects of pre-training characters in Char-BLSTM-CRF is also left for future work.

References

Taiichi Hashimoto, Takashi Inui, and Koji Murakami. 2008. Constructing extended named entity annotated corpora (in Japanese). The Special Interest Group Technical Reports of the Information Processing Society of Japan, 113:113–120.

Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.

Hideki Isozaki and Hideto Kazawa. 2002. Efficient support vector classifiers for named entity recognition. In Proceedings of the 19th International Conference on Computational Linguistics - Volume 1, pages 1–7.

Tomoya Iwakura. 2011. A named entity recognition method using rules acquired from unlabeled data. In Proceedings of the 8th International Conference on Recent Advances in Natural Language Processing, pages 170–177.

Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Cicero Nogueira dos Santos and Victor Guimaraes. 2015. Boosting named entity recognition with neural character embeddings. arXiv preprint arXiv:1505.05008.

Ryohei Sasano and Sadao Kurohashi. 2008. Japanese named entity recognition using structural natural language processing. In Proceedings of the 3rd International Joint Conference on Natural Language Processing, pages 607–612.

Manabu Sassano and Takehito Utsuro. 2000. Named entity chunking techniques in supervised learning for Japanese named entity recognition. In Proceedings of the 18th Conference on Computational Linguistics - Volume 2, pages 705–711.

Satoshi Sekine, Kiyoshi Sudo, and Chikashi Nobata. 2002. Extended named entity hierarchy. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, pages 1818–1824.

Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the 7th Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 4, pages 142–147.