they suffer from either limited coverage or limited accuracy. The authors decided to supplement this limited evaluation with a manual evaluation of literals that do not exist in WOLF, but it was done on only 120 (literal, synset) pairs and by a single annotator. The accuracy of JAWS is evaluated at 67.1%, which is lower than WOLF 0.1.4 and significantly lower than the accuracy of WOLF 1.0b. Furthermore, this score should be taken with caution because of the size of the test sample: the confidence interval is approximately 25%.
3 WoNeF: JAWS improved and extended to other parts of speech

This section presents three key enhancements made to JAWS and its extension to cover verbs, adjectives and adverbs. A change not detailed here is the one that led to a dramatically higher execution speed: JAWS was built in several hours versus less than a minute for WoNeF, which helped us run many more experiments.
3.1 Initial selectors

The initial JAWS selectors are not optimal. While we keep the monosemy and uniqueness selectors, we changed the other ones. The direct translation selector is deleted because its precision was very low, even for nouns. A new selector considers candidate translations coming from several different English words in a given synset: the multiple sources selector, a variant of the variant criterion of (Atserias et al., 1997). For example, in the synset line, railway line, rail line, the French literals ligne de chemin de fer and voie are translations of both line and railway line and are therefore chosen as translations.
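A minimal Python sketch of such a selector (the function name and data layout are illustrative, not WoNeF's code):

    from collections import defaultdict

    def multiple_sources(translations):
        """translations: dict mapping each English literal of a synset to
        the set of French candidates proposed for it by the bilingual
        dictionaries. Keep the French candidates proposed for at least
        two distinct English literals of the synset."""
        sources = defaultdict(set)
        for english, candidates in translations.items():
            for french in candidates:
                sources[french].add(english)
        return {fr for fr, eng in sources.items() if len(eng) >= 2}

    # Example from the paper: the synset {line, railway line, rail line}.
    synset = {
        "line": {"ligne", "ligne de chemin de fer", "voie"},
        "railway line": {"ligne de chemin de fer", "voie"},
        "rail line": set(),
    }
    print(multiple_sources(synset))
    # {'ligne de chemin de fer', 'voie'}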
Finally, the Levenshtein distance selector has been improved. 28% of the English vocabulary is of French origin (Finkenstaedt and Wolff, 1973), and anglicization produced predictable changes. It is possible to apply the same changes to the French candidate literal before computing the Levenshtein distance, bringing related words closer. We remove diacritics before applying several operations to word tails (Table 1). For example, reversing the letters "r" and "e" takes into account (ordre/order) and (tigre/tiger).² As before, false friends are not taken into account.

    -que   -k     banque → bank
    -aire  -ary   tertiaire → tertiary
    -eur   -er    chercheur → researcher
    -ie    -y     cajolerie → cajolery
    -té    -ty    extremité → extremity
    -re    -er    tigre → tiger
    -ais   -ese   libanais → lebanese
    -ant   -ing   changeant → changing

Table 1: Changes to French word tails before applying the Levenshtein distance.

² The Damerau-Levenshtein distance, which takes into account transpositions anywhere in a word (Damerau, 1964), led to poorer results.
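The following is a minimal Python sketch of this anglicizing comparison, assuming the Table 1 rules are written in deaccented form and tried longest suffix first (the exact ordering in WoNeF is not specified):

    import unicodedata

    # Suffix rewrites from Table 1, in deaccented form since diacritics
    # are removed first; tried longest suffix first.
    TAIL_RULES = sorted(
        [("que", "k"), ("aire", "ary"), ("eur", "er"), ("ie", "y"),
         ("te", "ty"), ("re", "er"), ("ais", "ese"), ("ant", "ing")],
        key=lambda rule: len(rule[0]), reverse=True)

    def deaccent(word: str) -> str:
        """Strip diacritics: extremité -> extremite."""
        return "".join(c for c in unicodedata.normalize("NFD", word)
                       if unicodedata.category(c) != "Mn")

    def anglicize(word: str) -> str:
        """Apply the first matching word-tail rewrite of Table 1."""
        w = deaccent(word.lower())
        for fr_tail, en_tail in TAIL_RULES:
            if w.endswith(fr_tail):
                return w[:-len(fr_tail)] + en_tail
        return w

    def levenshtein(a: str, b: str) -> int:
        """Plain edit distance (insertions, deletions, substitutions)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    # The anglicized candidate is compared to the English literal:
    print(levenshtein(anglicize("tigre"), "tiger"))   # 0 instead of 2
    print(levenshtein(anglicize("banque"), "bank"))   # 0 instead of 3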
3.2 Learning thresholds

In JAWS, each English literal can only correspond to the highest scoring French translation, regardless of the scores of lower-rated translations. This rejects valid candidates and accepts wrong ones. For example, JAWS does not include particulier in the human being synset because personne is already included with a higher score.

In WoNeF, we learned a threshold for each part of speech and selector. We first generated scores for all (literal, synset) candidate pairs, then sorted these pairs by score. The 12 399 pairs present in the WOLF 1.0b manual evaluation (our training set) were considered correct, while the pairs outside this set were not. We then calculated the thresholds maximizing precision and F-score. Once these thresholds are defined, the selectors choose all candidates above the new threshold. This has two positive effects: valid candidates are no longer rejected simply because a higher-scored candidate has already been selected (improving both recall and coverage), and invalid candidates that were previously accepted are now rejected thanks to a stricter threshold (increasing precision).
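A minimal Python sketch of this threshold sweep, with a data layout of our own (not WoNeF's code); in practice one would likely also require a minimum number of selected pairs so that the precision-maximizing threshold does not degenerate to the single best-scored candidate:

    def learn_thresholds(candidates, gold):
        """candidates: list of ((literal, synset), score) pairs produced
        by one selector for one part of speech; gold: set of pairs known
        to be correct (here, the WOLF 1.0b manual evaluation).
        Returns the score thresholds maximizing precision and F-score."""
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        total_correct = sum(1 for pair, _ in ranked if pair in gold)
        tp = 0
        best_precision = (0.0, None)  # (precision, threshold)
        best_fscore = (0.0, None)     # (F-score, threshold)
        for rank, (pair, score) in enumerate(ranked, start=1):
            tp += pair in gold
            precision = tp / rank
            recall = tp / total_correct if total_correct else 0.0
            f_score = (2 * precision * recall / (precision + recall)
                       if precision + recall else 0.0)
            # Keeping every candidate scoring >= score yields these metrics.
            if precision > best_precision[0]:
                best_precision = (precision, score)
            if f_score > best_fscore[0]:
                best_fscore = (f_score, score)
        return best_precision[1], best_fscore[1]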
3.3 Vote

After applying all selectors, our WordNet is large but contains some noisy synsets. In WoNeF, noise comes from several factors: selectors try to infer semantic information from a syntactic analysis without taking into account the full complexity of the syntax-semantics interface; the parser itself produces some noisy results; the syntactic language model is generated from a noisy corpus extracted from the Web (poorly written text, non-text content, non-French sentences); and translations selected in one step are considered valid in the following steps, while this is not always the case.
For the high-precision resource, we only keep literals for which the selectors were more confident. Since multiple selectors can now choose a given translation (section 3.2), our solution is simple and effective: translations proposed by multiple selectors are kept while the others are deleted. This voting principle is inspired by ensemble learning in machine learning. It is also similar to the combination method used in (Atserias et al., 1997), but thanks to our gold standard we can avoid their manual inspection of samples of each method.
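As a sketch, again with illustrative names rather than the actual WoNeF implementation, the vote amounts to keeping the pairs proposed by at least two distinct selectors:

    from collections import defaultdict

    def vote(selector_outputs):
        """selector_outputs: dict mapping a selector name to the set of
        (literal, synset) pairs it proposed. Keep the pairs proposed by
        at least two distinct selectors; drop the rest."""
        votes = defaultdict(set)
        for selector, pairs in selector_outputs.items():
            for pair in pairs:
                votes[pair].add(selector)
        return {pair for pair, voters in votes.items() if len(voters) >= 2}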
This cleaning operation retains only 18% of the translations (from 87 757 (literal, synset) pairs down to 15 625), but the accuracy increases from 68.4% to 93.3%. This high-precision resource can be used as training data for other French WordNets. A typical problem of voting methods is that they keep only the easier and less interesting examples, but the resource obtained here is well balanced between synsets containing only monosemic words and synsets containing polysemous words that are more difficult to disambiguate (section 5.2).
3.4 Extension to verbs, adjectives and adverbs

The work on JAWS began with nouns because they represent 70% of the synsets in WordNet. We continued this work on all the other parts of speech: verbs, adjectives and adverbs. Here, the generic selectors have been modified, but in the future we will develop selectors taking into account the characteristics of the different parts of speech in WordNet.

Verbs The selectors chosen for verbs are the uniqueness and monosemy selectors. Indeed, the Levenshtein distance gave poor results for verbs: only 25% of the verbs chosen by this selector were correct translations. Among the syntactic selectors, only the selector by synonymy gave good results, while the selector by hyponymy performed like a random classifier.

Adjectives For adjectives, all initial selectors are retained, and the selected syntactic selector is the selector by synonymy.

Adverbs The configuration is the same as for adjectives. We have no gold standard for adverbs, which explains why they are not included in our evaluation. However, the comparison with WOLF (section 5.4) shows that results for adverbs are better than for the other parts of speech.

4 WoNeF: an evaluated JAWS

4.1 Gold standard development

The evaluation of JAWS suffers from a number of limitations (section 2.2). To evaluate WoNeF rigorously, we produced a gold standard. For nouns, verbs and adjectives, 300 synsets have been annotated by two authors of this paper, both computational linguists, both native French speakers, with backgrounds in computer science and linguistics respectively. For each candidate provided by our translation dictionaries, they had to decide whether or not it belonged to the synset. They used the WordNet synsets to examine their neighbors, the Merriam-Webster dictionary, the French electronic dictionary TLFi and search engines to check the use of the different senses of the words in question. Because the dictionaries do not provide candidates for all synsets and some synsets have no suitable candidate, the actual number of non-empty synsets is less than 300 (section 4.2).
During the manual annotation, we encountered difficulties arising from the attempt to translate the Princeton WordNet into French. Most problems come from verbs and adjectives appearing in a collocation. In WordNet, they can be grouped in a way that makes sense in English but is not reflected directly in another language. For example, the adjective pointed is the only element of a synset defined as Direct and obvious in meaning or reference; often unpleasant, "a pointed critique", "a pointed allusion to what was going on", "another pointed look in their direction". These three examples would result in three different translations in French: une critique dure, une allusion claire and un regard appuyé. There is no satisfactory way to translate such a synset: the resulting synset contains either too many or too few translations. We view this issue as a mainly linguistic one, rooted in the way WordNet has grouped these three usages of pointed. We marked the synsets concerned and will handle them in future work, either manually or with other approaches. These granularity problems concern 3% of nominal synsets, 8% of verbal synsets and 6% of adjectival synsets.
The other main difficulty stems from the translations in our bilingual dictionaries. Rare meanings of a word are sometimes missing. For example, there is a WordNet synset containing the verb egg in its coat with beaten egg sense. Our dictionaries only consider egg as a noun: neither our gold standard nor JAWS can translate this synset. This case appeared rarely in practice, and none of these senses are in the most polysemous synsets (the BCS synsets defined in section 5.2), confirming that it does not affect the quality of our gold standard for the most important synsets. Yet WoNeF could be improved by using specialized dictionaries for species (as in (Sagot and Fišer, 2008) with WikiSpecies), medical terms, etc. Unwanted translations are another issue. Our dictionaries translate unkindly as sans aménité (without amenity), which is a compositional phrase. While such a translation is expected in a bilingual dictionary, it should not be integrated into a lexical resource. The last difficulty lies in judgment adjectives: for example, there is no good translation of weird in French. Although most dictionaries provide bizarre as a translation, it does not convey the stupid aspect of weird. There is no translation that fits all contexts: the synset meaning is not fully preserved after translation.
                  Nouns   Verbs   Adj.
    Fleiss kappa  0.715   0.711   0.663
    Synsets       270     222     267
    Candidates    6.22    14.50   7.27

Table 2: Gold standard inter-annotator agreement.

Table 2 shows the inter-annotator agreement for the three parts of speech. Even if the Fleiss kappa is a debated metric (Powers, 2012), all existing evaluation scales consider these scores high enough to describe the inter-annotator agreement as "good" (Gwet, 2001), which allows us to say that our gold standard is reliable. The expand approach to the translation of WordNets is also validated: it is possible to produce a useful resource in spite of the difficulties mentioned in section 4.1.
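For reference, a small Python sketch of the Fleiss kappa for this binary task (two annotators deciding whether each candidate belongs to its synset); the data layout is our own illustration:

    def fleiss_kappa(ratings):
        """ratings: one row per (literal, synset) candidate, giving the
        number of annotators who chose each category, e.g. [2, 0] when
        both said "belongs" and [1, 1] when they disagreed. Rows must
        sum to the number of annotators (two in our case)."""
        n_items = len(ratings)
        n_raters = sum(ratings[0])
        # Observed agreement: mean per-item agreement P_i.
        p_bar = sum((sum(c * c for c in row) - n_raters)
                    / (n_raters * (n_raters - 1))
                    for row in ratings) / n_items
        # Chance agreement from the marginal category proportions.
        n_cats = len(ratings[0])
        p_cats = [sum(row[j] for row in ratings) / (n_items * n_raters)
                  for j in range(n_cats)]
        p_e = sum(p * p for p in p_cats)
        return (p_bar - p_e) / (1 - p_e)

    # Two annotators, four candidates: three agreements, one disagreement.
    print(round(fleiss_kappa([[2, 0], [2, 0], [1, 1], [0, 2]]), 3))  # 0.467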
5 Results

We present in this section the results of WoNeF. We first describe the results of the initial selectors, then proceed to the full resource. Our gold standard is divided into two parts: 10% of the literals form the validation set used to choose the selectors that apply to the different versions of WoNeF, while the remaining 90% form the evaluation set. No training was performed on our gold standard. Precision and recall are based on the intersection of the synsets present in WoNeF and in our gold standard. Precision is the fraction of correct (literal, synset) pairs in the intersection, while recall is the fraction of gold pairs that are correctly retrieved.
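One way to formalize these definitions (our notation, not the paper's): restrict both resources to the synsets they share, and let W be the resulting set of WoNeF (literal, synset) pairs and G that of the gold standard. Then:

    precision = |W ∩ G| / |W|        recall = |W ∩ G| / |G|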
5.1 Initial selectors

For nouns, verbs and adjectives, we calculated the efficiency of each initial selector on our development set, and used this data to determine which ones should be included in the high-precision version, the high F-score version and the large-coverage one. Scores are reported on the test set.

Table 3 shows the results of this operation. Coverage gives an idea of the size of the resource. Depending on the objectives of each resource, the selected initial selectors are different. Since different selectors can choose the same translation, the sum of the coverages is greater than the coverage of the high-coverage resource.

References

Jordi Atserias, Salvador Climent, Xavier Farreres, German Rigau, and Horacio Rodríguez. 1997. Combining Multiple Methods for the Automatic Construction of Multilingual WordNets. In RANLP'97, September.

Laura Benítez, Sergi Cervell, Gerard Escudero, Mònica López, German Rigau, and Mariona Taulé. 1998. Methods and Tools for Building the Catalan WordNet. In ELRA Workshop on Language Resources for European Minority Languages, May.

Romaric Besançon, Gaël de Chalendar, Olivier Ferret, Faiza Gara, and Nasredine Semmar. 2010. LIMA: A Multilingual Framework for Linguistic Analysis and Linguistic Resources Development and Evaluation. In LREC 2010, May.

Fred J. Damerau. 1964. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171–176, March.

Gerard de Melo and Gerhard Weikum. 2008. On the Utility of Automatically Generated Wordnets. In GWC 2008, January.

Gerard de Melo and Gerhard Weikum. 2009. Towards a universal wordnet by learning from combined evidence. In CIKM 2009, November.

Helge Dyvik. 2002. Translations as Semantic Mirrors: From Parallel Corpus to WordNet. In ICAME 23, May.

Javier Farreres, Karina Gibert, Horacio Rodríguez, and Charnyote Pluempitiwiriyawej. 2010. Inference of lexical ontologies. The LeOnI methodology. Artificial Intelligence, 174(1):1–19, January.

Christiane Fellbaum and Piek Vossen. 2007. Connecting the Universal to the Specific: Towards the Global Grid. In IWIC 2007, January.

Christine Jacquin, Emmanuel Desmontils, and Laura Monceaux. 2007. French EuroWordNet lexical database improvements. In CICLing 2007, February.

Hai-Son Le, Alexandre Allauzen, and François Yvon. 2012. Continuous Space Translation Models with Neural Networks. In NAACL-HLT 2012, June.

Alessandro Lenci and Giulia Benotto. 2012. Identifying hypernyms in distributional semantic spaces. In *SEM 2012, June.

Claire Mouton and Gaël de Chalendar. 2010. JAWS: Just Another WordNet Subset. In TALN 2010, June.

Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In ACL 2010, July.

David M. W. Powers. 2012. The Problem with Kappa. In EACL 2012, April.

German Rigau and Eneko Agirre. 1995. Disambiguating bilingual nominal entries against WordNet. In Workshop "The Computational Lexicon", ESSLLI'95, August.

Benoît Sagot and Darja Fišer. 2008. Building a free French wordnet from multilingual resources. In Ontolex 2008 Workshop, May.

Benoît Sagot and Darja Fišer. 2012. Automatic Extension of WOLF. In GWC 2012, January.

Dan Tufiş, Dan Cristea, and Sofia Stamou. 2004. BalkaNet: Aims, methods, results and perspectives. A general overview. Romanian Journal of Information Science and Technology, 7(1-2):9–43.

Piek Vossen. 1998. EuroWordNet: a multilingual database with lexical semantic networks. Kluwer Academic, October.