2lexical Semantics and WSD - 080115

Lexical Semantics
• What is lexical? Lexical means relating to the words or vocabulary of a language. A lexical item can be a single word, a part of a word, or a chain of words.
• What is semantics?
– Semantics is the study of the meaning of linguistic expressions, such as morphemes, words, phrases, clauses, and sentences.
– Semantics is concerned with denotation (the primary, objective meaning), not with connotation (the ideas or feelings a word evokes).
• There are two general types of semantics:
a. Lexical semantics, which deals with the meaning of words and the relations among them.
b. Structural semantics, which deals with the meaning of utterances larger than words.
Cont…
• What is a word?
– Definitions we have used in class: types, tokens, stems, roots, uninflected forms, etc.
• Lexeme: an entry in the lexicon that pairs a form with a single meaning representation. It includes:
– an orthographic representation
– a phonological form
– a symbolic meaning representation, or sense
• Lexicon: a collection of lexemes (a mental dictionary).
• Dictionary: a kind of lexicon in which meanings are expressed through definitions and examples:
Red (‘red) n: the color of blood
Blood (‘bluhd) n: the red liquid that circulates in the heart, arteries and veins of animals
Right (‘right) adj: located nearer the right hand, esp. being on the right when facing the same direction as the observer
Left (‘left) adj: located nearer to this side of the body than the right
• What can we learn from dictionaries?
– Relations between words:
• Oppositions, similarities, hierarchies
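The lexeme and dictionary definitions above can be sketched as a small data structure. This is only an illustration: the class name `Lexeme`, the field names, and the helper `words_in_gloss` are assumptions, and the "meaning representation" is reduced to the plain-text gloss from the entries above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lexeme:
    orth: str   # orthographic representation
    phon: str   # phonological form (informal respelling, as in the entries above)
    pos: str    # part of speech
    sense: str  # meaning representation, here simply the dictionary gloss

# The lexicon: a collection of lexemes, indexed by orthographic form.
LEXICON = {}
for lx in [
    Lexeme("red", "red", "n", "the color of blood"),
    Lexeme("blood", "bluhd", "n",
           "the red liquid that circulates in the heart, arteries and veins of animals"),
    Lexeme("right", "right", "adj",
           "located nearer the right hand esp. being on the right when facing "
           "the same direction as the observer"),
    Lexeme("left", "left", "adj",
           "located nearer to this side of the body than the right"),
]:
    LEXICON.setdefault(lx.orth, []).append(lx)

def words_in_gloss(orth):
    """Return the other lexicon entries mentioned in the definitions of `orth`."""
    found = set()
    for lx in LEXICON.get(orth, []):
        for token in lx.sense.replace(",", " ").split():
            if token in LEXICON and token != orth:
                found.add(token)
    return sorted(found)

print(words_in_gloss("red"))    # ['blood']
print(words_in_gloss("blood"))  # ['red']
print(words_in_gloss("left"))   # ['right']
```

Even this toy lookup exposes relations between words: "red" and "blood" are defined in terms of each other, and "left" is defined by opposition to "right".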
Relationships between word meanings (lexical relations)
• Homonyms: words with the same form but different, unrelated meanings, or senses (multiple lexemes).
– Homonymy is a relation between words that have the same form and the same PoS but unrelated meanings:
A bank holds investments in a custodial account in the client’s name.
As agriculture is burgeoning on the east bank, the river will shrink even more.
– It makes the interpretation of a sentence ambiguous, since a single orthographic form corresponds to a set of different lexemes (bank1, bank2, ...).
– Related properties are homophony (same pronunciation but different orthography, e.g. be/bee) and homography (same orthography but different meaning, and possibly different pronunciation, e.g. lead/lead).
Cont…
• Homonymy causes problems for NLP applications:
– General semantic interpretation
– Machine translation
– Spelling correction
– Speech recognition
– Text to speech: same orthographic form but different phonological form (bass vs bass, bow vs bow, record vs record)
– Information retrieval: different meanings, same orthographic form
Cont…
• Polysemy: words with multiple but related meanings (same lexeme).
– It occurs when a lexeme has several related meanings.
– When two senses are semantically related, we call it polysemy (rather than homonymy).
– The distinction often depends on the word’s etymology (unrelated meanings usually have different origins), e.g. bank/data bank/blood bank.
They rarely serve red meat.
He served as U.S. ambassador.
He might have served his time in prison.
• What is the difference between polysemy and homonymy?
– Homonymy: distinct, unrelated meanings
– Polysemy: distinct but related meanings (idea bank, blood bank, bank bank)
Cont…
• Synonymy: different lexemes with the same meaning (substitutability).
– It is a relationship between two distinct lexemes with the same meaning: they can be substituted for one another in a given context without changing its meaning or correctness, e.g. I received a gift/present.
How big is that plane?
How large is that plane?
How big are you? Big brother is watching.
– Substitutability may not hold in every context because of small semantic differences (e.g. price/fare of a service: the bus fare vs. the ticket price).
– In general, substitutability depends on the “semantic intersection” of the senses of the two lexemes and, in some cases, also on social factors (father/dad).
Cont…
• Hyponymy: a relationship between two lexemes (more precisely, two senses) such that one denotes a subclass of the other.
– car, vehicle; shark, fish; apple, fruit
– The relationship is not symmetric:
  • The more specialized concept is the hyponym of the more general one.
  • The more general concept is the hypernym of the more specialized one.
– Hyponymy (hypernymy) is the basis for defining a taxonomy (a tree structure that defines inclusion relationships in an object ontology), even though it is not strictly a taxonomy itself.
  • A formal taxonomy would require a more uniform and rigorous interpretation of the inclusion relationship.
  • However, the relationship does define an inheritance mechanism: a concept inherits the properties of its ancestors in the hierarchy.
Cont…
• General: hypernym (superordinate)
dog is a hypernym of poodle
• Specific: hyponym (underneath)
poodle is a hyponym of dog
• What is an ontology? A set of objects in some domain.
• What is a taxonomy? A structuring of those objects.
• What is an object hierarchy? A structured hierarchy that supports feature inheritance.
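A minimal sketch of hypernym links and property inheritance, assuming a toy hierarchy. The concepts, the links, and the properties below are made up for illustration and are not taken from WordNet.

```python
# child -> parent (hypernym) links in a toy hierarchy
HYPERNYM = {"poodle": "dog", "dog": "animal", "shark": "fish", "fish": "animal"}

# properties attached directly to each concept (hypothetical features)
PROPERTIES = {
    "animal": {"alive"},
    "dog": {"barks"},
    "fish": {"swims"},
    "poodle": {"curly coat"},
}

def hypernym_chain(concept):
    """Return all ancestors of a concept, from most specific to most general."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain

def is_hyponym_of(specific, general):
    """True if `general` is an ancestor of `specific` (the relation is not symmetric)."""
    return general in hypernym_chain(specific)

def inherited_properties(concept):
    """Own properties plus everything inherited from the ancestors."""
    props = set(PROPERTIES.get(concept, set()))
    for ancestor in hypernym_chain(concept):
        props |= PROPERTIES.get(ancestor, set())
    return props

print(hypernym_chain("poodle"))          # ['dog', 'animal']
print(is_hyponym_of("poodle", "dog"))    # True
print(is_hyponym_of("dog", "poodle"))    # False
print(sorted(inherited_properties("poodle")))
```

Storing only the direct parent keeps the structure a tree; a concept with several hypernyms would need a list of parents instead.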
Cont…
• Antonyms: senses that are opposite with respect to one feature of their meaning; otherwise, they are very similar.
dark / light
short / long
hot / cold
up / down
in / out
• More formally, antonyms can:
– define a binary opposition or lie at opposite ends of a scale (long/short, fast/slow)
– be reversives (describe change or movement in opposite directions): rise/fall, up/down
WordNet
• It is a lexical database for English (versions for other languages are available), organized as a semantic network of senses.
– It represents nouns, verbs, adjectives, and adverbs, but it does not include function words in the closed classes (prepositions, conjunctions, etc.).
– Lexemes are grouped into sets of cognitive synonyms (synsets), each representing a distinct concept.
– A set of senses (synsets) is associated with each lexeme (unique orthographic form).
– Synsets are linked by conceptual-semantic and lexical relationships.
– WordNet consists of lexicographer files, an application that loads these files into a database, and a library of search and browsing functions to visualize and access the database contents.
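The synset organization can be pictured with a toy structure. This is not the WordNet API: the synset identifiers, member lemmas, and glosses below are invented for illustration only.

```python
# synset id -> (member lemmas, gloss); all entries are made up for this sketch
SYNSETS = {
    "bank.n.01": ({"bank", "depository"}, "a financial institution that accepts deposits"),
    "bank.n.02": ({"bank", "riverbank"}, "sloping land beside a body of water"),
    "gift.n.01": ({"gift", "present"}, "something given to someone"),
}

def synsets_of(lemma):
    """All synsets (senses) associated with one orthographic form."""
    return sorted(sid for sid, (lemmas, _) in SYNSETS.items() if lemma in lemmas)

def synonyms_of(lemma):
    """All other lemmas that share at least one synset with `lemma`."""
    out = set()
    for sid in synsets_of(lemma):
        out |= SYNSETS[sid][0]
    out.discard(lemma)
    return sorted(out)

print(synsets_of("bank"))    # ['bank.n.01', 'bank.n.02']
print(synonyms_of("gift"))   # ['present']
```

One lemma mapping to several synsets is exactly the ambiguity that word sense disambiguation has to resolve.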
WordNet Statistics
Word sense disambiguation
• Word sense disambiguation (WSD) is the task of selecting the correct sense for a word in a given sentence.
• The problem arises for words that have more than one meaning.
• It requires a dictionary listing all the possible senses of each word.
• It can be tackled for each word independently or jointly for all the words in the sentence (in which case all the combinations of meanings have to be considered).
• Examples: I ate a cold dish. I washed a dirty dish.
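To see why joint disambiguation is expensive, the sketch below enumerates every sense combination for a few words. The sense inventory is hypothetical and far smaller than a real dictionary.

```python
from itertools import product

# Hypothetical sense inventory for the words of "I ate a cold dish"
SENSES = {
    "ate": ["consumed"],
    "cold": ["low-temperature", "unfriendly"],
    "dish": ["plate", "food"],
}

def sense_combinations(words):
    """Every joint assignment of one sense per word."""
    options = [SENSES.get(w, [None]) for w in words]
    return list(product(*options))

combos = sense_combinations(["ate", "cold", "dish"])
print(len(combos))  # 1 * 2 * 2 = 4
for combo in combos:
    print(combo)
```

The number of combinations is the product of the sense counts, so it grows exponentially with the number of ambiguous words in the sentence.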
Cont…
• One of the central challenges in NLP.
• Ubiquitous across all languages.
• Needed in:
– Machine translation: for correct lexical choice.
– Information retrieval: for resolving ambiguity in queries.
– Information extraction: for accurate analysis of text.
• WSD means computationally determining which sense of a word is activated by its use in a particular context.
E.g. I am going to withdraw money from the bank.
Approaches
• Several approaches to WSD have been proposed:
– Knowledge-based approaches
  • WSD using selectional preferences (or restrictions)
  • Overlap-based approaches
– Machine-learning-based approaches
  • Supervised approaches
  • Semi-supervised approaches
  • Unsupervised approaches
– Hybrid approaches
Cont…
• Knowledge-based approaches
– Rely on knowledge resources such as WordNet, thesauri, etc.
– May use grammar rules for disambiguation.
– May use hand-coded rules for disambiguation.
• Machine-learning-based approaches
– Rely on corpus evidence.
– Train a model on a tagged or untagged corpus.
– Use probabilistic/statistical models.
• Hybrid approaches
– Use corpus evidence as well as semantic relations from WordNet.
WSD using selectional preferences
Sense 1: This airline serves dinner on the evening flight.
– serve (verb): agent; object – edible
Sense 2: This airline serves the sector between Jima & AA.
– serve (verb): agent; object – sector
• Requires exhaustive enumeration of:
– the argument structure of verbs;
– the selectional preferences of arguments;
– a description of the properties of words, such that meeting the selectional preference criteria can be decided.
• E.g. in “This flight serves the region between Mumbai and Delhi”, the word “region” has to be known to be a sector-like object rather than an edible one.
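A minimal sketch of disambiguation by selectional preferences, assuming hand-written tables. The sense labels, the semantic types, and the noun-to-type mapping are all hypothetical; a real system would need them enumerated for the whole vocabulary, which is the main cost noted above.

```python
# Hypothetical semantic types for nouns (the "description of properties of words")
NOUN_TYPE = {
    "dinner": "edible",
    "lunch": "edible",
    "sector": "sector",
    "region": "sector",
}

# verb -> {sense label: semantic type required of the object}
VERB_SENSES = {
    "serve": {
        "serve/provide-food": "edible",
        "serve/operate-in-area": "sector",
    }
}

def disambiguate_verb(verb, obj):
    """Return the senses of `verb` whose object preference matches the type of `obj`."""
    obj_type = NOUN_TYPE.get(obj)
    return [
        sense
        for sense, required in VERB_SENSES.get(verb, {}).items()
        if required == obj_type
    ]

print(disambiguate_verb("serve", "dinner"))  # ['serve/provide-food']
print(disambiguate_verb("serve", "region"))  # ['serve/operate-in-area']
print(disambiguate_verb("serve", "time"))    # [] (object type unknown)
```

An empty result shows the weakness of the method: an object that is missing from the type table leaves the verb undisambiguated.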
Overlap Based Approaches
• Require a machine-readable dictionary (MRD).
• Find the overlap between the features of the different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag).
• These features could be sense definitions, example sentences, hypernyms, etc.
• The features could also be given weights.
• The sense with the maximum overlap is selected as the contextually appropriate sense.
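The overlap idea can be sketched in the style of the simplified Lesk algorithm. The two glosses, the stopword list, and the sense labels are assumptions made for this example; features are unweighted and limited to gloss words.

```python
STOPWORDS = {"a", "an", "the", "of", "in", "to", "from", "that",
             "and", "or", "is", "am", "i", "for", "as", "on"}

# Hypothetical machine-readable dictionary: word -> {sense label: gloss}
MRD = {
    "bank": {
        "bank/finance": "an institution that accepts deposits of money and lends money",
        "bank/river": "sloping land beside a river or lake",
    }
}

def bag(text):
    """Lowercased content words of a text, as a set."""
    cleaned = text.lower().replace(".", " ").replace(",", " ")
    return {w for w in cleaned.split() if w not in STOPWORDS}

def simplified_lesk(word, sentence):
    """Pick the sense whose gloss (sense bag) overlaps most with the context bag."""
    context = bag(sentence) - {word}
    best, best_overlap = None, -1
    for sense, gloss in MRD[word].items():
        overlap = len(bag(gloss) & context)
        if overlap > best_overlap:  # ties keep the sense listed first
            best, best_overlap = sense, overlap
    return best, best_overlap

print(simplified_lesk("bank", "I am going to withdraw money from the bank."))
# ('bank/finance', 1)
print(simplified_lesk(
    "bank",
    "As agriculture is burgeoning on the east bank, the river will shrink even more"))
# ('bank/river', 1)
```

With overlaps this small, a single shared word decides the outcome, which is why richer sense bags (example sentences, hypernyms) and feature weights are often added.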
Supervised Approaches
• WSD can be approached as a classification task:
– The correct sense is the class to be predicted.
– The word is represented by a set (vector) of features that forms the classifier input.
  • Usually the features include a representation of the word to be disambiguated (the target) and of its context (a given number of words to the left and right of the target word).
  • The word itself, its stem, and its PoS can be used as features.
– The classifier can be learned from examples, given a labeled dataset.
– Different models can be used to implement the classifier (Naïve Bayes, neural networks, decision trees, ...).
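As one possible instance of the classification view, here is a small Naïve Bayes sketch with add-one smoothing. The four training sentences and their sense labels are invented, and only context words within a fixed window are used as features.

```python
import math
from collections import Counter, defaultdict

def context_features(tokens, target, window=3):
    """Words within `window` positions to the left and right of the target."""
    i = tokens.index(target)
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]

class NaiveBayesWSD:
    def fit(self, examples):
        """examples: list of (feature list, sense label) pairs."""
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        """Return the sense maximizing log P(sense) + sum of log P(feature | sense)."""
        total = sum(self.sense_counts.values())
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)
            denom = sum(self.feature_counts[sense].values()) + len(self.vocab)
            for f in feats:
                score += math.log((self.feature_counts[sense][f] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best

# Tiny hand-labeled dataset (made up for this sketch)
train = [
    ("i deposited money in the bank yesterday".split(), "finance"),
    ("the bank approved my loan".split(), "finance"),
    ("we sat on the bank of the river".split(), "river"),
    ("the river bank was muddy".split(), "river"),
]
examples = [(context_features(toks, "bank"), sense) for toks, sense in train]
model = NaiveBayesWSD().fit(examples)

test = "she took a loan from the bank".split()
print(model.predict(context_features(test, "bank")))   # finance
test = "fish swim near the river bank".split()
print(model.predict(context_features(test, "bank")))   # river
```

Stems or PoS tags could be appended to the feature list in `context_features` without changing the classifier.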
Semi-supervised Approaches
• Step 1: Train the decision list algorithm on a small amount of seed data.
• Step 2: Classify the entire sample set using the trained classifier.
• Step 3: Create new seed data by adding those members that are tagged as Sense-A or Sense-B with high probability.
• Step 4: Retrain the classifier on the enlarged seed data.
• Steps 2 to 4 are typically repeated until the labeled set stops growing.
• Exploits the “one sense per discourse” property:
– Identify words that are tagged with low confidence and label them with the sense that is dominant in that document.
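The four steps can be sketched as a self-training loop. Several details are assumptions, since the slides do not specify them: rules are scored by a smoothed log-likelihood ratio, `alpha` and `threshold` are arbitrary values, the senses are named "A" and "B", and the toy contexts are invented. The "one sense per discourse" step is left out.

```python
import math
from collections import Counter, defaultdict

def train_decision_list(labeled, alpha=0.1):
    """labeled: list of (set of context words, sense). Returns rules, strongest first."""
    counts = defaultdict(Counter)
    for feats, sense in labeled:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        a, b = c["A"] + alpha, c["B"] + alpha
        rules.append((abs(math.log(a / b)), f, "A" if a > b else "B"))
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules

def classify(rules, feats):
    """Apply the strongest rule whose feature occurs in the context."""
    for score, f, sense in rules:
        if f in feats:
            return sense, score
    return None, 0.0

def bootstrap(seed, unlabeled, threshold=1.0, max_iter=10):
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(max_iter):
        rules = train_decision_list(labeled)      # Step 1, and Step 4 on later passes
        confident, rest = [], []
        for feats in pool:                        # Step 2: classify the sample set
            sense, score = classify(rules, feats)
            if sense is not None and score >= threshold:
                confident.append((feats, sense))
            else:
                rest.append(feats)
        if not confident:                         # nothing new to add: stop
            break
        labeled.extend(confident)                 # Step 3: enlarge the seed data
        pool = rest
    return train_decision_list(labeled), labeled, pool

# Toy contexts of an ambiguous word: sense A = living thing, sense B = factory
seed = [({"life", "species"}, "A"), ({"manufacturing", "car"}, "B")]
unlabeled = [{"species", "animal"}, {"animal", "growth"},
             {"car", "workers"}, {"workers", "equipment"}, {"unrelated"}]

rules, labeled, pool = bootstrap(seed, unlabeled)
print(len(labeled))                 # 6: two seeds plus four newly labeled contexts
print(pool)                         # [{'unrelated'}] is never labeled
print(classify(rules, {"growth"}))  # sense 'A', learned only through bootstrapping
```

The context {"animal", "growth"} is labeled only in the second pass, after "animal" has been learned from a first-pass example; this is how the seed set grows.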
Unsupervised Approaches
• Unsupervised approaches to sense disambiguation avoid the use of sense-tagged data of any kind during training.
• Feature-vector representations of unlabeled instances are taken as input and grouped into clusters according to a similarity metric.
• These clusters can then be represented as the average of their constituent feature vectors and labeled by hand with known word senses.
• Unseen feature-encoded instances can be classified by assigning them the word sense of the cluster to which they are closest according to the similarity metric.
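A small sketch of the last two points: clusters are summarized by the average of their vectors, and a new instance receives the sense of the nearest centroid. The feature layout, the vectors, the hand-assigned labels, and the choice of cosine similarity are assumptions for illustration.

```python
import math

def centroid(vectors):
    """Component-wise average of the feature vectors in a cluster."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical features: counts of ["money", "loan", "river", "water"] near "bank".
# The clusters are assumed to come from a clustering step and to be labeled by hand.
clusters = {
    "bank/finance": [[2, 1, 0, 0], [1, 2, 0, 0]],
    "bank/river": [[0, 0, 2, 1], [0, 0, 1, 1]],
}
centroids = {label: centroid(vs) for label, vs in clusters.items()}

def assign_sense(vector):
    """Sense of the cluster whose centroid is most similar to the instance."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))

print(centroids["bank/finance"])   # [1.5, 1.5, 0.0, 0.0]
print(assign_sense([1, 0, 0, 0]))  # bank/finance
print(assign_sense([0, 0, 1, 3]))  # bank/river
```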
Cont…
• Clustering is a well-studied problem, with many standard algorithms that can be applied to inputs structured as vectors of numerical values.
• The technique most frequently used in language applications is agglomerative clustering.
– Each of the N training instances is initially assigned to its own cluster.
– New clusters are then formed bottom-up by successively merging the two most similar clusters.
– The process continues until either a specified number of clusters is reached or some global goodness measure among the clusters is achieved.
• When the number of training instances makes this method too expensive, random sampling of the original training set can be used to achieve similar results.
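The bottom-up procedure can be sketched in a few lines. Two choices here are assumptions rather than part of the description above: cluster similarity is measured as the Euclidean distance between centroids, and the only stopping criterion implemented is a target number of clusters `k`.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(cluster):
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def agglomerative(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[p] for p in points]  # each instance starts in its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

points = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
result = agglomerative(points, k=3)
print(result)
# [[[0, 0], [0, 1]], [[5, 5], [5, 6]], [[10, 0]]]
```

Every pass compares all pairs of clusters, which is why the method becomes expensive for large N and why sampling the training set is suggested.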
Cont…
• Because these unsupervised methods do not use hand-labeled data, evaluating the goodness of a clustering result poses a number of challenges.
• The following problems are among the most important ones to address in unsupervised approaches:
– The correct senses of the instances in the training data may not be known.
– The clusters are almost certainly heterogeneous with respect to the senses of the training instances they contain.
– The number of clusters is almost always different from the number of senses of the target word being disambiguated.
Hybrid Approaches
• Use semantic relations (synonymy and hypernymy) from WordNet.
• Extract collocational and contextual information from WordNet (glosses) and a small amount of tagged data.
• Monosemous words in the context serve as a seed set of disambiguated words.
• In each iteration, new words are disambiguated based on their semantic distance from already disambiguated words.
• It would be interesting to exploit other semantic relations available in WordNet.
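A rough sketch of the iterative idea only: monosemous words seed the process, and in each iteration the pending sense closest to an already chosen sense is fixed. The sense inventory and the distance table are invented; in the approach described above, the distances would be derived from WordNet relations and glosses.

```python
# Hypothetical sense inventory; "money" is monosemous here
SENSES = {
    "money": ["money"],
    "bank": ["bank/finance", "bank/river"],
    "interest": ["interest/finance", "interest/curiosity"],
}

# Hypothetical symmetric semantic distances between senses (smaller = closer)
DISTANCE = {
    frozenset(["money", "bank/finance"]): 1,
    frozenset(["money", "bank/river"]): 5,
    frozenset(["money", "interest/finance"]): 2,
    frozenset(["money", "interest/curiosity"]): 7,
    frozenset(["bank/finance", "interest/finance"]): 1,
    frozenset(["bank/finance", "interest/curiosity"]): 6,
}

def distance(a, b):
    return DISTANCE.get(frozenset([a, b]), 10)  # unknown pairs count as far apart

def disambiguate(words):
    # Seed set: words with exactly one sense
    resolved = {w: SENSES[w][0] for w in words if len(SENSES[w]) == 1}
    pending = [w for w in words if w not in resolved]
    while pending and resolved:
        # Fix the pending (word, sense) pair closest to any already resolved sense
        _, word, sense = min(
            (distance(s, r), w, s)
            for w in pending
            for s in SENSES[w]
            for r in resolved.values()
        )
        resolved[word] = sense
        pending.remove(word)
    return resolved

print(disambiguate(["money", "bank", "interest"]))
# {'money': 'money', 'bank': 'bank/finance', 'interest': 'interest/finance'}
```

If the context contains no monosemous word, the loop never starts, so some fallback seed (for example, tagged data) is needed.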
