A Graph Based Approach To WSD Using Rele
A. Gelbukh (Ed.): CICLing 2012, Part I, LNCS 7181, pp. 225–237, 2012.
© Springer-Verlag Berlin Heidelberg 2012
226 Y. Gutiérrez, S. Vázquez, and A. Montoyo
2 ISR-WN Resource
In this section we describe a semantic network that links different resources: WN,
WND, SUMO and WNA. This resource is called ISR-WN [9, 11]. It was developed
because we need to extract and relate information from different lexical resources
in order to obtain a multi-conceptual network.
To find the right environment for our graph-based approach, we analyzed several
resources that align different kinds of semantic information. Many efforts have
focused on building semantic networks such as MultiWordNet (MWN) [24],
EuroWordNet (EWN) [6] and the Multilingual Central Repository (MCR) [2]. For
example, MWN aligns the Italian and English lexical dictionaries, conceptualized by
Domain labels. The MultiWordNet browser also gives access to the Spanish,
Portuguese, Hebrew, Romanian and Latin WordNets, but these wordnets are not part
of the MultiWordNet distribution. EWN was developed to align Dutch, Italian,
Spanish, German, French, Czech, English and other lexical dictionaries. MCR
integrates into the EWN framework an upgraded version of the EWN Top Concept
ontology, the MWN Domains, SUMO and hundreds of thousands of new semantic
relations and properties automatically acquired from corpora.
1 http://www.cogsci.princeton.edu/~wn/
2 http://www.ontologyportal.org
3 http://wndomains.fbk.eu
4 http://wndomains.fbk.eu/wnaffect.html
A graph-Based Approach to WSD Using Relevant Semantic Trees 227
ISR-WN takes into account different kinds of labels linked to WN: upper-level
concepts (SUMO), Domains and Emotion labels. In this work our purpose is to use a
semantic network that links different semantic resources aligned to WN. After
several tests we decided to apply ISR-WN. Although each resource presented above
provides different semantic relations, ISR-WN has the largest number of aligned
semantic dimensions, so it is a suitable resource for running our algorithm. Using
ISR-WN we are able to extract important information from the interrelations of four
ontological resources: WN, WND, WNA and SUMO. The ISR-WN resource is based on
WN 1.6 or WN 2.0. In its latest version, Semantic Classes and SentiWordNet were
also included, but these new dimensions are not taken into account in this work.
Furthermore, ISR-WN provides a tool for navigating across its internal links. With
it, we can discover the multidimensionality of the concepts that exist in each
sentence. To establish the concepts associated with each sentence we apply the
Relevant Semantic Trees (RST) approach [10, 12] using the links provided by
ISR-WN.
Since the Clique Partitioning Technique of our approach takes an initial graph as
input, we use ISR-WN to extract an initial graph from each sentence. Each graph is
built from the minimal paths between the Relevant Concepts and the senses of all the
words in the sentence. The next section describes the algorithm of our graph-based
approach and the method that obtains the Relevant Concepts.
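The construction of the initial graph can be sketched as follows. This is only a minimal illustration under stated assumptions: the network is given as a plain adjacency dictionary, every link is treated as undirected and unweighted, one shortest path per pair is kept, and the node labels and the function names `shortest_path` and `initial_graph` are hypothetical and not part of ISR-WN.

```python
from collections import deque

def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

def initial_graph(adj, senses, relevant_concepts):
    """Union of the edges on one shortest path between each sense and each concept."""
    edges = set()
    for sense in senses:
        for concept in relevant_concepts:
            path = shortest_path(adj, sense, concept)
            if path:
                edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return edges

# Toy network with hypothetical labels (not taken from ISR-WN).
toy = {
    "bank#1": ["institution"],
    "bank#2": ["slope"],
    "institution": ["bank#1", "Economy"],
    "slope": ["bank#2", "Geography"],
    "Economy": ["institution"],
    "Geography": ["slope"],
}
graph = initial_graph(toy, ["bank#1", "bank#2"], ["Economy"])
```

In this toy network only the sense "bank#1" can reach the concept "Economy", so the resulting edge set contains just the two links on that path.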
The Clique model was formally defined by Luce and Perry [18], who provided this
statement: “A Clique is a set of more than two people if they are all mutual
friends of one another”. As this shows, the model originated in Social Network
studies.
To understand what a Clique is in graph terms, we present the following definition
by Cavique et al. [4]: “Given an undirected graph $G = (V, E)$, where $V$ denotes
the set of vertices and $E$ the set of edges, the graph $G_1 = (V_1, E_1)$ is called
a sub-graph of $G$ if $V_1 \subseteq V$, $E_1 \subseteq E$ and for every edge
$(v_i, v_j) \in E_1$ the vertices $v_i, v_j \in V_1$. A sub-graph $G_1$ is said to
be complete if there is an edge for each pair of vertices”. Each complete sub-graph
is also called a Clique.
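The definition of a complete sub-graph can be checked directly. The sketch below is only an illustration: it assumes the graph is given as a list of undirected edges, and the name `is_clique` is hypothetical.

```python
from itertools import combinations

def is_clique(edges, nodes):
    """True if every pair of the given nodes is joined by an edge."""
    edge_set = {frozenset(edge) for edge in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(nodes, 2))

# Toy graph: a triangle a-b-c with an extra node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
triangle_ok = is_clique(toy_edges, ["a", "b", "c"])
with_d_ok = is_clique(toy_edges, ["a", "b", "d"])
```

Here the triangle is a Clique, while the set containing d is not, because d is linked only to c.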
As can be seen, the Clique model yields complete sub-graphs in which the maximal
distance between vertices is one edge. Because the semantic network to which we
apply the Partitioning Technique comprises thousands of vertices, the maximal
distance between vertices must be increased. To decide which model to use in
modifying the original algorithm, we studied several proposals: N-Cliques
[3, 8, 14, 17], K-Plex [3], and Clubs and Clans [20].
After studying these models we chose the N-Cliques model for the Partitioning
Technique, since it is very similar to the Cliques model; the difference is that we
allow a distance of up to N edges among the vertices of each sub-graph. This
partitioning idea was introduced to WSD by Gutiérrez et al. [14], and we adopt it
here. In applying the technique, we create only one N-Clique, and the remaining
complete sub-graphs are Cliques. Our goal is to concentrate the largest amount of
semantic information in a single N-Clique (explained in more detail in [14]). To
illustrate the algorithm, Fig. 1 shows an example using N = 2 as the edge distance.
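The distance condition that separates an N-Clique from a Clique can be illustrated with a small check. This is a sketch under assumptions: distances are measured by breadth-first search in the whole graph, the graph is an undirected adjacency dictionary, and the names `distances_from` and `is_n_clique` are hypothetical.

```python
from collections import deque
from itertools import combinations

def distances_from(adj, source, limit):
    """Breadth-first distances from source, exploring at most `limit` edges away."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue  # do not expand beyond the allowed distance
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def is_n_clique(adj, nodes, n):
    """True if every pair of the given nodes lies within n edges of each other."""
    nodes = list(nodes)
    reach = {v: distances_from(adj, v, n) for v in nodes}
    return all(b in reach[a] for a, b in combinations(nodes, 2))

# Toy path graph a - b - c - d.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
abc_with_n2 = is_n_clique(path_graph, ["a", "b", "c"], 2)
abc_with_n1 = is_n_clique(path_graph, ["a", "b", "c"], 1)
```

With N = 2 the set {a, b, c} qualifies because a and c are two edges apart; with N = 1 the same set fails, which is the ordinary Clique condition.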
[Fig. 1: example of the partitioning over nodes s1 to s9, shown in six panels
(Iteration 1 to Iteration 6). Iteration 1 uses N=2 and the later iterations use
N=1. Each panel lists the sets S(si, sj) computed for pairs of nodes; merged nodes
appear as s26, s261, s78 and s789.]
As we can see, this heuristic algorithm obtains one set of nodes in which the
maximal distance among all nodes is 2 edges, because $N = 2$. It then continues
obtaining Cliques with $N = 1$ in each subsequent iteration. As described in the
previous section, the ISR-WN resource is used to apply this algorithm.
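A greedy partitioning of this kind can be sketched as below. This is not the authors' exact procedure: it is a simplified illustration, loosely in the spirit of the clique-partitioning heuristic of Tseng and Siewiorek [29], that handles only the Clique case (N = 1). It assumes that the pair of adjacent groups sharing the most common neighbours is merged first, with ties broken alphabetically; the function name `greedy_clique_partition` is hypothetical.

```python
def greedy_clique_partition(adj):
    """Partition the nodes into cliques by repeatedly merging the adjacent
    pair of groups that shares the most common neighbours."""
    # Each group starts as a single node; neighbours are stored as sets of groups.
    groups = {frozenset([v]): {frozenset([u]) for u in nbrs}
              for v, nbrs in adj.items()}
    while True:
        best = None
        for a in groups:
            for b in groups[a]:
                common = groups[a] & groups[b]
                key = (len(common), sorted(a | b))  # deterministic tie-break
                if best is None or key > best[0]:
                    best = (key, a, b)
        if best is None:
            break  # no edges left: every remaining group is a finished clique
        _, a, b = best
        common = groups[a] & groups[b]
        # Remove both groups and every edge that touched them.
        for g in (a, b):
            for other in groups.pop(g):
                if other in groups:
                    groups[other].discard(g)
        # The merged group stays linked only to the common neighbours,
        # so it can later absorb only nodes adjacent to all of its members.
        merged = a | b
        groups[merged] = set(common)
        for other in common:
            groups[other].add(merged)
    return sorted(sorted(g) for g in groups)

# Toy graph: a triangle a-b-c with an extra node d attached to c.
toy_adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
partition = greedy_clique_partition(toy_adj)
```

On the toy graph the triangle is collected into one group and d is left on its own. For the N-Clique stage, one possible adaptation (again an assumption) is to run the same procedure on a graph in which nodes within N edges of each other are linked.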
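[Equation (1): formula not recoverable from the source text]
where
[Equation (2): formula not recoverable from the source text; only a logarithmic
term survives]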
After obtaining the Initial Concept Vector of Domains, we apply Equation (3) to
build the Relevant Semantic Tree related to the sentence.
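[Equation (3): formula not recoverable from the source text]
where
[Equation (4): formula not recoverable from the source text]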
Root_Domain
As a result, an RST is obtained for each resource included (WN, WND, WNA and
SUMO). The global WSD method is explained next.
Fig. 3. Initial graph creation with all senses vs. the ten most relevant concepts
5 Evaluation
Our method has been evaluated on the “English All Words” task corpus of the
Senseval-2 [5] competition. The analysis is divided into eight experiments, each
analyzed in detail to evaluate the influence of different combinations of
resources. The distance used in the Partitioning Technique was N = 3. This distance
is more effective and faster than larger distances, while smaller distances produce
worse results. The experiments are described next; each applies the proposal with a
graph composed of:
1. WN, WND, WNA and SUMO, using RST of Domains.
2. WN, WND, WNA and SUMO, using RST of SUMO.
3. WN, WND, WNA and SUMO, using RST of Affects.
5 http://nlp.lsi.upc.edu/freeling/
6 After several experiments, the ten most relevant concepts proved to be the best
choice.
7 From an analysis of WordNet 1.6 and 2.0, the average polysemy for each grammatical
category is, respectively: verbs (≈2.138, ≈2.179), adjectives (≈1.481, ≈1.447),
adverbs (≈1.249, ≈1.246) and nouns (≈1.231, ≈1.236).
The most relevant experiment was the one that uses all the mentioned resources
(creating the initial graph with the RST of Domains), where recall reached 42.6%.
This indicates that WND is semantically more influential than the other resources
when all semantic dimensions make up the knowledge base. This result would place
our proposal 11th in the Senseval-2 ranking. It is important to note that using the
WNA RST on ISR-WN improved all the precision results obtained by our system.
Moreover, the affective dimension could be more effective if the evaluated context
were related to emotions.
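For readers unfamiliar with how such figures are usually computed, the following sketch shows the common convention for all-words WSD scoring: precision is measured over the instances the system attempted, and recall over all instances. The counts used here are hypothetical and are not results from this work; the function name `precision_recall` is likewise an assumption.

```python
def precision_recall(correct, attempted, total):
    """Precision over attempted instances and recall over all instances."""
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    return precision, recall

# Hypothetical counts: 100 target words, 80 answered, 50 answered correctly.
precision, recall = precision_recall(correct=50, attempted=80, total=100)
```

With these made-up counts, precision is 0.625 and recall is 0.5; the two values coincide only when the system answers every instance.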
References
1. Agirre, E., Soroa, A.: Personalizing PageRank for Word Sense Disambiguation. In:
Proceedings of the 12th Conference of the European Chapter of the Association for
Computational Linguistics (EACL 2009), Athens, Greece (2009)
2. Atserias, J., Villarejo, L., Rigau, G., Agirre, E., Carroll, J., Magnini, B., Vossen, P.: The
MEANING Multilingual Central Repository. In: Proceedings of the Second International
Global WordNet Conference (GWC 2004), Brno, Czech Republic (2004)
3. Balasundaram, B., Butenko, S., Hicks, I.V., Sachdeva, S.: Clique Relaxations in Social
Network Analysis: The Maximum k-plex Problem (2006)
4. Cavique, L., Mendes, A.B., Santos, J.M.A.: An Algorithm to Discover the k-Clique Cover
in Networks. In: Lopes, L.S., Lau, N., Mariano, P., Rocha, L.M. (eds.) EPIA 2009. LNCS,
vol. 5816, pp. 363–373. Springer, Heidelberg (2009)
5. Cotton, S., Edmonds, P., Kilgarriff, A., Palmer, M.: English All word. In: SENSEVAL-2:
Second International Workshop on Evaluating Word Sense Disambiguation Systems.
Association for Computational Linguistics, Toulouse (2001)
6. Dorr, B.J., Castellón, M.A.M.: Spanish EuroWordNet and LCS-Based Interlingual MT. In:
AMTA/SIG-IL First Workshop on Interlinguas, San Diego, CA (1997)
7. Fellbaum, C.: WordNet. An Electronic Lexical Database. The MIT Press,
Cambridge (1998)
8. Friedkin, N.E.: Structural Cohesion and Equivalence Explanations of Social Homogeneity.
Sociological Methods & Research 12, 235–261 (1984)
9. Gutiérrez, Y., Fernández, A., Montoyo, A., Vázquez, S.: Integration of semantic resources
based on WordNet. In: XXVI Congreso de la Sociedad Española para el Procesamiento del
Lenguaje Natural, SEPLN 2010, vol. 45, pp. 161–168. Universidad Politécnica de
Valencia, Valencia (2010)
10. Gutiérrez, Y., Fernández, A., Montoyo, A., Vázquez, S.: UMCC-DLSI: Integrative
resource for disambiguation task. In: Proceedings of the 5th International Workshop on
Semantic Evaluation, pp. 427–432. Association for Computational Linguistics, Uppsala
(2010)
11. Gutiérrez, Y., Fernández, A., Montoyo, A., Vázquez, S.: Enriching the Integration of
Semantic Resources based on WordNet. Procesamiento del Lenguaje Natural 47, 249–257
(2011)
12. Gutiérrez, Y., Vázquez, S., Montoyo, A.: Improving WSD using ISR-WN with Relevant
Semantic Trees and SemCor Senses Frequency. In: Proceedings of the International
Conference Recent Advances in Natural Language Processing 2011, RANLP 2011
Organising Committee, Hissar, Bulgaria, pp. 233–239 (2011)
13. Gutiérrez, Y., Vázquez, S., Montoyo, A.: Sentiment Classification Using Semantic
Features Extracted from WordNet-based Resources. In: Proceedings of the 2nd Workshop
on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2011), pp.
139–145. Association for Computational Linguistics, Portland (2011)
14. Gutiérrez, Y., Vázquez, S., Montoyo, A.: Word Sense Disambiguation: A Graph-Based
Approach Using N-Cliques Partitioning Technique. In: Muñoz, R., Montoyo, A., Métais,
E. (eds.) NLDB 2011. LNCS, vol. 6716, pp. 112–124. Springer, Heidelberg (2011)
15. Ide, N., Véronis, J.: Introduction to the Special Issue on Word Sense Disambiguation: The
State of the Art. Computational Linguistics 24, 2–40 (1998)
16. Laparra, E., Rigau, G., Cuadros, M.: Exploring the integration of WordNet and FrameNet.
In: Proceedings of the 5th Global WordNet Conference (GWC 2010), Mumbai, India
(2010)
17. Luce, R.D.: Connectivity and generalized cliques in sociometric group structure.
Psychometrika 15, 159–190 (1950)
18. Luce, R.D., Perry, A.D.: A Method of Matrix Analysis of Group Structure.
Psychometrika 14, 95–116 (1949)
19. Magnini, B., Strapparava, C., Pezzulo, G., Gliozzo, A.: Comparing Ontology-Based and
Corpus-Based Domain Annotations in WordNet. In: Proceedings of the First International
WordNet Conference, Mysore, India, pp. 21–25 (2002)
20. Mokken, R.J.: Cliques, Clubs and Clans, vol. 13. Elsevier Scientific Publishing Company,
Amsterdam (1979)
21. Moldovan, D.I., Rus, V.: Explaining Answers with Extended WordNet. ACL (2001)
22. Navigli, R.: Word sense disambiguation: A survey. ACM Comput. Surv. 41, 10:11–10:69
(2009)
23. Navigli, R., Velardi, P.: Structural Semantic Interconnections: a Knowledge-Based
Approach to Word Sense Disambiguation. IEEE Transactions on Pattern Analysis and
Machine Intelligence (TPAMI) 27 (2005)
24. Pianta, E., Bentivogli, L., Girardi, C.: MultiWordNet. Developing an aligned multilingual
database. In: Proceedings of the 1st International WordNet Conference, Mysore, India, pp.
293–302 (2002)
25. Ponzetto, S.P., Navigli, R.: Knowledge-Rich Word Sense Disambiguation Rivaling
Supervised Systems. In: ACL, pp. 1522–1531 (2010)
26. Reddy, S., Inumella, A., McCarthy, D., Stevenson, M.: IIITH: Domain Specific Word
Sense Disambiguation. In: Proceedings of the 5th International Workshop on Semantic
Evaluation. Association for Computational Linguistics, Uppsala (2010)
27. Sinha, R., Mihalcea, R.: Unsupervised Graph-based Word Sense Disambiguation Using
Measures of Word Semantic Similarity. In: Proceedings of the IEEE International
Conference on Semantic Computing (ICSC 2007), Irvine, CA (2007)
28. Soroa, A., Agirre, E., de Lacalle, O.L., Bosma, W., Vossen, P., Monachini, M., Lo, J.,
Hsieh, S.-K.: Kyoto: An Integrated System for Specific Domain WSD. In: Proceedings of
the 5th International Workshop on Semantic Evaluation. Association for Computational
Linguistics, Uppsala (2010)
29. Tseng, C.-J., Siewiorek, D.P.: Automated Synthesis of Data Paths in Digital Systems.
IEEE Trans. on CAD of Integrated Circuits and Systems 5, 379–395 (1986)