
NLP UNIT-V

Semantics Vector Semantics; Words and Vector; Measuring Similarity; Semantics with dense
vectors; SVD and Latent Semantic Analysis; Embeddings from prediction: Skip-gram and
CBOW; Concept of Word Sense; Introduction to WordNet.

Semantics Vector Semantics

Semantics refers to the study of the meaning of language. In natural language processing
(NLP), understanding the meaning of words and phrases is crucial for various
language-related tasks. Vector Semantics, also known as distributional semantics, is an
approach to understanding this meaning by representing words as vectors in a
high-dimensional space. This approach is based on the idea that words that appear in similar
contexts tend to have similar meanings.

Here's a clear explanation with an example:

Representation of Word Meaning: In Vector Semantics, each word or phrase in a language is represented as a vector in a high-dimensional space. Think of this space as having many dimensions (possibly hundreds or even thousands), with each dimension representing a different aspect of context or meaning.
Contextual Information: The key insight is that words derive their meaning from the
contexts in which they appear. Words that often appear together in similar contexts are likely
to have related meanings. For instance, consider two words: "cat" and "dog." In many
contexts, they might appear together or in similar surroundings like "pet," "animal," or
"furry." This similarity in context implies that "cat" and "dog" are related in meaning.

Example: Let's create a simplified example. Suppose we have a small corpus of text (a
collection of sentences) with the words "cat," "dog," "pet," and "animal":

● "I have a cat."


● "She has a dog."
● "A pet can be a joy."
● "An animal is a living being."

We can represent the words in a vector space based on how often they appear together:

● The vector for "cat" might have high values in dimensions related to "pet" and
"animal."
● Similarly, the vector for "dog" will also have high values in those dimensions.
● The vectors for "pet" and "animal" might have overlapping values in the same
dimensions.
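
To make this concrete, here is a minimal Python sketch (an illustration only, not part of any standard library) that builds simple co-occurrence count vectors from the toy corpus above. With such a tiny corpus most counts are zero; in practice the same counting is done over millions of sentences, which is what produces the overlapping values described above.

python
from collections import Counter

# Toy corpus from the example above
docs = ["I have a cat", "She has a dog", "A pet can be a joy", "An animal is a living being"]

# Words we want vectors for, and the context words that define the dimensions
targets = ["cat", "dog", "pet", "animal"]
contexts = ["pet", "animal", "joy", "living"]

vectors = {}
for target in targets:
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        if target in tokens:
            # Count every chosen context word appearing in the same sentence
            counts.update(w for w in tokens if w in contexts and w != target)
    # One dimension per context word
    vectors[target] = [counts[c] for c in contexts]

for word, vector in vectors.items():
    print(word, vector)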
Vector Operations: Once words are represented as vectors, you can perform various
mathematical operations on these vectors to derive meaning relationships. For instance, you
can calculate the cosine similarity between two word vectors. Words with vectors that are
close in the vector space (i.e., have a high cosine similarity) are considered to be similar in
meaning.

Applications: Vector semantics has practical applications in NLP, including:

● Word Similarity: It can determine how similar two words are in meaning. Words
with vectors close in space are similar.
● Document Retrieval: It helps match search queries to relevant documents.
● Sentiment Analysis: It can analyze the sentiment of text by considering the meanings
of words.
● Machine Translation: In translation systems, it helps find equivalent words or
phrases in different languages.

Training: To create these word vectors, large text corpora are used for training. Techniques
like Word2Vec, GloVe, and fastText are commonly used for generating word vectors based on
co-occurrence statistics in text.

In summary, Vector Semantics is a powerful approach to representing and understanding word and phrase meanings in natural language. It's based on the idea that words with similar contexts have similar meanings and is fundamental to many modern NLP applications and language understanding systems.

Words and Vector

Words: Words are fundamental units of language that carry meaning. In NLP, text is
composed of words, and understanding the meaning and relationships between
words is a key challenge. Words can represent entities, actions, emotions, and
more, and they are the building blocks of language.

Vectors: Vectors are mathematical objects used to represent quantities or entities in a multi-dimensional space. In the context of NLP, vectors are used to represent words in a high-dimensional space. Each word is associated with a unique vector, and these vectors are constructed in such a way that they capture semantic information about the words.

Word Vectors (Word Embeddings): Word vectors, often referred to as word embeddings or word representations, are vector-based representations of words in NLP. These representations are designed to capture the meaning and relationships between words. Here's how they work:

● High-Dimensional Space: Each word is represented as a vector in a high-dimensional space. The dimensions of this space can range from a few hundred to thousands.
● Contextual Information: The key idea is that words used in similar contexts
have similar vector representations. Words that often appear together in
sentences will have vectors that are close to each other in this
high-dimensional space.
● Learned from Data: Word vectors are learned from large amounts of text data.
Algorithms like Word2Vec, GloVe, and fastText use unsupervised learning to
train these word embeddings. They do so by observing patterns in the
co-occurrence of words in sentences.
● Semantic Relationships: Word vectors capture semantic relationships. For
example, in a well-trained word embedding model, the vectors for "king" and
"queen" will be closer together than the vectors for "king" and "apple" because
"king" and "queen" often appear in similar contexts.

Measuring Similarity
1. Cosine Similarity:

● Definition: Cosine similarity measures the cosine of the angle between two
vectors in a high-dimensional space.
● Example: Suppose we have word embeddings for two words, "king" and
"queen," represented as vectors in a high-dimensional space. Cosine similarity
calculates how similar their directions are in this space. If the cosine similarity
is close to 1, it indicates a high degree of similarity.
● Calculation: Cosine similarity between vectors A and B is calculated as (A
dot B) / (||A|| * ||B||).
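
A minimal NumPy sketch of this calculation, using made-up 4-dimensional vectors as stand-ins for real embeddings of "king," "queen," and "apple":

python
import numpy as np

def cosine_similarity(a, b):
    # (A dot B) / (||A|| * ||B||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 4-dimensional vectors standing in for real word embeddings
king = np.array([0.8, 0.6, 0.1, 0.3])
queen = np.array([0.7, 0.7, 0.2, 0.3])
apple = np.array([0.1, 0.2, 0.9, 0.8])

print(cosine_similarity(king, queen))  # close to 1 -> very similar directions
print(cosine_similarity(king, apple))  # noticeably lower -> less similar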

2. Jaccard Similarity:

● Definition: Jaccard similarity measures the similarity between two sets by comparing their intersection to their union.
● Example: Consider two sets of words: Set A = {"cat", "dog", "pet"} and Set B =
{"dog", "pet", "bird"}. Jaccard similarity calculates the ratio of the number of
common words (in this case, 2: "dog" and "pet") to the total number of unique
words in both sets.
● Calculation: Jaccard similarity = (Intersection of A and B) / (Union of A and B)
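
A short Python sketch of the same calculation using the two sets from the example:

python
def jaccard_similarity(set_a, set_b):
    # |A intersection B| / |A union B|
    return len(set_a & set_b) / len(set_a | set_b)

a = {"cat", "dog", "pet"}
b = {"dog", "pet", "bird"}
print(jaccard_similarity(a, b))  # 2 common words / 4 unique words = 0.5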

3. Edit Distance:

● Definition: Edit distance, also known as Levenshtein distance, measures the number of operations (insertions, deletions, and substitutions) required to transform one word into another.
● Example: Comparing the words "kitten" and "sitting," the edit distance is 3
because you can transform "kitten" into "sitting" with three operations: "k" ->
"s," "e" -> "i," and appending "g" at the end.

4. WordNet Similarity:

● Definition: WordNet is a lexical database that groups words into sets of synonyms called synsets. WordNet similarity measures the similarity between two words based on their position in the WordNet hierarchy.
● Example: If you compare "cat" and "dog" in WordNet, you can find that they
belong to different synsets but share a common hypernym (a more general
term) like "animal," indicating a certain degree of similarity.
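
Using NLTK's WordNet interface, this comparison can be sketched as follows (assuming NLTK is installed and the WordNet data has been downloaded with nltk.download('wordnet')):

python
from nltk.corpus import wordnet as wn

cat = wn.synsets("cat")[0]   # first (most common) sense of "cat"
dog = wn.synsets("dog")[0]   # first sense of "dog"

# Shared ancestor (hypernym) of the two senses in the WordNet hierarchy
print(cat.lowest_common_hypernyms(dog))

# Path-based similarity score between 0 and 1 (higher = closer in the hierarchy)
print(cat.path_similarity(dog))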

5. Embedding-Based Similarity:
● Definition: Embedding-based similarity measures the similarity between
words or phrases by calculating the cosine similarity between their respective
word embeddings.
● Example: Using word embeddings, you can calculate the similarity between
"apple" and "orange" by computing the cosine similarity between their vectors.
High cosine similarity indicates that these words are related in meaning.
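
As a hedged sketch, pretrained GloVe vectors can be fetched through gensim's downloader and compared directly; the model name "glove-wiki-gigaword-50" comes from the gensim-data catalogue, and the first call downloads the vectors:

python
import gensim.downloader as api

# Downloads a small set of pretrained GloVe vectors on first use
vectors = api.load("glove-wiki-gigaword-50")

# Cosine similarity between the stored word vectors
print(vectors.similarity("apple", "orange"))
print(vectors.similarity("apple", "car"))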

Choice of Measure: The choice of similarity measure depends on the specific NLP
task and the nature of the data. Cosine similarity is commonly used when working
with word embeddings. Jaccard similarity is suitable for comparing sets of words,
while edit distance quantifies the similarity in terms of character-level operations.
WordNet similarity is useful for capturing hierarchical relationships, and
embedding-based similarity leverages word vectors for semantic similarity.

Semantics with dense vectors

Semantics with Dense Vectors: Semantics with dense vectors, also known as
distributed or dense vector semantics, is an approach in NLP that represents words
as dense vectors in a high-dimensional space. These dense vectors are designed to
capture the meaning and relationships between words, and they are particularly
well-suited for computational efficiency and effectiveness.

Key Points:

Representation as Dense Vectors: In this approach, words from a large corpus of text are represented as dense vectors. Unlike sparse representations (e.g., one-hot encoding), where most elements are zeros, dense vectors have values in nearly all dimensions.
​ Low-Dimensional Vectors: The dimensions of these dense vectors are
relatively low, typically ranging from 100 to 300 dimensions. Despite the lower
dimensionality, they effectively capture semantic information.
​ Learning Process: The process of learning these word vectors involves
training a neural network. The network is exposed to pairs of words that are
either similar or dissimilar in meaning. By adjusting the weights of the
network, it learns to produce similar vectors for similar words and dissimilar
vectors for dissimilar words.
​ Semantic Relationships: Dense vectors capture semantic relationships
between words. Words that have similar meanings or are used in similar
contexts will have vectors that are closer together in the high-dimensional
space, while words with different meanings will have more distant vectors.
Applications: Once word vectors have been learned through this approach, they can
be applied to various NLP tasks:

● Word Similarity: Word vectors enable the measurement of similarity between two words or phrases. Similarity can be calculated using methods like cosine similarity on the vectors.
● Text Classification: Dense vector representations of words or documents can
improve text classification tasks, as they encode semantic information.
● Machine Translation: Word vectors enhance machine translation models by
capturing the meaning of words in different languages, aiding in accurate
translations.
● Information Retrieval: In information retrieval systems, word vectors can be
used to rank and retrieve documents that are most relevant to a user's query,
considering semantic meaning.

Advantages:

● Dense vectors are computationally efficient, especially in comparison to high-dimensional one-hot encodings.
● They capture subtle semantic relationships between words, even with
low-dimensional representations.
● Dense vectors are useful for various NLP tasks and can be transferred
between different applications.

Popular Algorithms: Word2Vec, GloVe, and fastText are popular algorithms that
learn dense vector representations of words from large text corpora.

SVD and Latent Semantic Analysis


Latent Semantic Analysis (LSA) is a natural language processing method that uses a statistical approach to identify the associations among the words in a collection of documents. LSA deals with the following kind of issue:
Example: "mobile," "phone," "cell phone," and "telephone" are all similar, but if we pose a query like "The cell phone has been ringing," only the documents containing "cell phone" are retrieved, while documents containing "mobile," "phone," or "telephone" are not.
Assumptions of LSA:
1. The words which are used in the same context are analogous to each other.
2. The hidden semantic structure of the data is unclear due to the ambiguity of the
words chosen.
Singular Value Decomposition:
Singular Value Decomposition (SVD) is the statistical method used to find the latent (hidden) semantic structure of the words spread across the documents.
Let
C = the collection of documents
d = the number of documents
n = the number of unique words in the whole collection
M = the n x d word-to-document (term-document) matrix
SVD decomposes the word-to-document matrix M into three matrices as follows:
M = U Σ V^T
where
U = the distribution of words across the latent contexts
Σ = a diagonal matrix whose entries give the strength of each context
V^T = the distribution of contexts across the documents
A very significant feature of SVD is that it allows us to truncate the contexts that are not needed. The diagonal values of Σ rank the significance of the contexts from highest to lowest, so by keeping only the largest values we can reduce the number of dimensions; SVD can therefore also be used as a dimensionality-reduction technique.
If we keep only the k largest diagonal values of Σ, we obtain
M_k = U_k Σ_k V_k^T
where
M_k = the rank-k approximation of M
U_k, Σ_k, V_k^T = the matrices containing only the k retained contexts from U, Σ, V^T respectively
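
As an illustration, truncated SVD for LSA can be sketched with scikit-learn as follows. Note that CountVectorizer builds a documents x words matrix (the transpose of M above); TruncatedSVD then keeps only the k largest singular values, exactly as in the M_k approximation.

python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the cell phone has been ringing",
    "my mobile is ringing",
    "the telephone rang twice",
    "he bought a new phone",
]

# Documents x words count matrix (the transpose of M above)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Keep only the k = 2 largest singular values, i.e. two latent contexts
svd = TruncatedSVD(n_components=2)
doc_vectors = svd.fit_transform(X)   # each document expressed in the 2-dimensional latent space
word_loadings = svd.components_      # each latent context's weights over the vocabulary words

print(vectorizer.get_feature_names_out())
print(doc_vectors)
print(word_loadings)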

Embeddings from prediction

Embeddings from Prediction:


Embeddings from prediction is a technique used in natural language processing (NLP) to
learn word embeddings by training a neural network to predict a certain property of a word
(such as its context or co-occurrence with other words) from its embedding. Here's how it
works:
Objective: The primary objective of this technique is to create word embeddings that capture
the semantic relationships between words based on their contextual usage.
Neural Network Model: A shallow feedforward neural network, often called a neural
language model, is used for this task. The neural network takes the embedding of a word as
input and produces a probability distribution over the vocabulary as output.
Training Data: The training data consists of a large corpus of text. For each word in the
corpus, the model is trained to predict a target word based on its context (or vice versa). This
means that the neural network learns to associate words with their neighboring words in
sentences.
Embedding Learning: During training, the word embeddings are learned as a part of the
neural network's weights. The embeddings are essentially the weights of the input layer of the
neural network. Backpropagation is used to adjust these weights during training to minimize
the prediction error.
Prediction Task: The neural network's prediction task helps it understand the relationships
between words in the context of the training data. For example, if the network is trained to
predict the context words for a given target word, it learns to place similar words closer
together in the embedding space.
Embedding Space: Once the training is complete, the learned word embeddings reside in a
lower-dimensional vector space. Each word is represented as a dense vector in this space,
capturing its semantic meaning and contextual relationships with other words.
Applications: These word embeddings can be used for various NLP tasks. For instance, they
can be used to measure the similarity between words or phrases by calculating the cosine
similarity between their respective embeddings. They are also valuable for tasks like
sentiment analysis, machine translation, and text classification.
Example:
Here's a simple Python example using TensorFlow and Keras to demonstrate word
embeddings:
python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

# Sample text data
docs = ["Data Science is a popular field.",
        "Data Science is useful in predicting financial performance of economies",
        "Performance is important in data science"]

# Tokenize the text data and build the vocabulary
tokenizer = Tokenizer()
tokenizer.fit_on_texts(docs)
vocab_size = len(tokenizer.word_index) + 1

# Create a neural network model with an embedding layer
# (output_dim=3 means each word is mapped to a 3-dimensional vector)
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=3, input_length=1))
model.compile(optimizer='rmsprop', loss='mse')

# Look up the (randomly initialised, untrained) embedding of the word "Data"
input_array = np.array(tokenizer.texts_to_sequences(["Data"]))
output_array = model.predict(input_array)

print(output_array)

In this example, we create a simple model whose embedding layer maps each word to a 3-dimensional vector. The word "Data" is passed through the model to obtain its embedding vector. Note that the layer here is only initialised, not trained: in a real prediction-based setup the embedding weights are adjusted during training, and only then do similar words end up with similar embeddings that capture their contextual relationships.

Skip-gram and CBOW

Objective: The Skip-gram model aims to learn word embeddings by predicting the
context words surrounding a given target word.

Data Preparation:
​ Text Corpus: Take the example sentence: "the quick brown fox jumps over the
lazy dog."
​ Context Window: Define a context window size. In this example, we'll use a
context window of 1, meaning we consider one word to the left and one word
to the right of the target word.

Training Data Creation:

​ Pairs Creation: For each word in the corpus, create training pairs. Pair the
target word with each word within its context window. This forms a dataset of
(target word, context word) pairs. Here are some pairs for our example:
● ("quick", "the")
● ("quick", "brown")
● ("brown", "quick")
● ("brown", "fox")
● ("fox", "brown")
● ("fox", "jumps")
● ("jumps", "fox")
● ("jumps", "over")
● ("over", "jumps")
● ("over", "the")
● ("the", "over")
● ("the", "lazy")
● ("lazy", "the")
● ("lazy", "dog")
● ("dog", "lazy")

Neural Network Model:


​ Neural Network Setup: Create a neural network with an embedding layer that
will learn word vectors. In the Skip-gram model, the target word is used as
input, and the model is trained to predict the context words within the window.
● Input: The target word (e.g., "quick").
● Output: Probability distribution over the vocabulary for context words
("the," "brown," "fox," etc.).

Learning Word Vectors:

​ Training: During the training process, the neural network adjusts its weights
(word vectors) to maximize the probability of predicting the correct context
words for a given target word. The objective is to make the predicted context
words as close as possible to the actual context words observed in the
training data.

Embedding Space:

​ Word Vectors: After training is complete, the word vectors learned by the
model reside in a lower-dimensional vector space. Each word in the
vocabulary now has an associated vector that represents its semantic
meaning based on the contextual usage observed in the training corpus.
These word vectors can be used for various natural language processing tasks, such
as measuring word similarity, text classification, and machine translation. The
Skip-gram model captures semantic relationships between words by learning from
their contextual usage within the corpus.

Please note that in practice, the training data is usually much larger, and the
dimensionality of the word vectors is often set to a relatively small number, e.g.,
100-300 dimensions, to balance computational efficiency and meaningful semantic
representation.
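
For reference, here is a minimal end-to-end sketch using gensim's Word2Vec, where sg=1 selects the Skip-gram architecture; the single-sentence corpus is far too small for meaningful vectors and only illustrates the API:

python
from gensim.models import Word2Vec

# A tiny tokenized corpus; real training uses far more text
sentences = [["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]]

# sg=1 selects the Skip-gram architecture; window=1 matches the example above
model = Word2Vec(sentences, vector_size=50, window=1, sg=1, min_count=1, epochs=50)

print(model.wv["fox"])              # the learned 50-dimensional vector for "fox"
print(model.wv.most_similar("fox")) # nearest neighbours by cosine similarity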

Continuous Bag of Words model (CBOW)

Objective: The CBOW model aims to learn word embeddings by predicting a target
word based on the context words surrounding it.

Data Preparation:
​ Text Corpus: Use the example sentence: "It would be sad memory to watch it
would be unhappy memory to watch."


​ Context Window: Define a context window size. In this example, we'll use a
context window of 2, meaning we consider two words to the left and two
words to the right of the target word.

Training Data Creation:

Pairs Creation: For each target word, collect the context words that fall within its window and pair them with the target word. With a context window of 2, the training instances for the targets "sad" and "unhappy" are:
● For "sad": context words ("would", "be", "memory", "to") -> target "sad"
● For "unhappy": context words ("would", "be", "memory", "to") -> target "unhappy"
Note that both target words appear with exactly the same context words, which is what lets the model learn similar vectors for them.

Neural Network Model:

​ Neural Network Setup: Create a neural network with an embedding layer that
learns word vectors. In the CBOW model, the context words are used as input,
and the model is trained to predict the target word.
● Input: Context words (e.g., ["would", "be", "memory", "to"] for predicting "sad").
● Output: Probability distribution over the vocabulary for the target word
("sad").

Learning Word Vectors:

​ Training: During the training process, the neural network adjusts its weights
(word vectors) to maximize the probability of predicting the correct target
word based on the context words.

Embedding Space:

​ Word Vectors: After training, the word vectors learned by the model reside in a
lower-dimensional vector space. Each word in the vocabulary has an
associated vector representing its semantic meaning based on the contextual
usage observed in the training corpus.

These word vectors can be used for various NLP tasks. In this case, the word vectors
for "sad" and "unhappy" have been learned based on their contextual usage in the
given sentence.
The CBOW model is particularly useful for capturing semantic relationships between
words when the training data is limited or noisy, as it considers multiple context
words to predict the target word.
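
As with Skip-gram, a minimal CBOW sketch can be written with gensim's Word2Vec by setting sg=0 and window=2 to mirror the example above; the single-sentence corpus is only meant to illustrate the API:

python
from gensim.models import Word2Vec

# A single tokenized sentence, purely for illustration; real training needs a large corpus
sentence = "it would be sad memory to watch it would be unhappy memory to watch"
sentences = [sentence.split()]

# sg=0 selects the CBOW architecture; window=2 matches the example above
model = Word2Vec(sentences, vector_size=50, window=2, sg=0, min_count=1, epochs=100)

# "sad" and "unhappy" share the same context words, so CBOW tends to pull their vectors together
print(model.wv.similarity("sad", "unhappy"))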

Concept of Word Sense

Word Sense: In linguistics and natural language processing (NLP), the term "word
sense" refers to the multiple meanings or interpretations that a single word can have
in different contexts. It acknowledges that words can carry distinct senses or
nuances based on their usage, and these senses are the specific ways in which a
word conveys meaning.

Importance of Word Sense: Word sense is a crucial concept in NLP for several reasons:

Disambiguation: Words often have multiple senses, and identifying the correct sense of a word in a particular context is essential to understand the overall meaning of a sentence. For example, consider the word "bank," which can refer to a financial institution or the side of a river. Determining which sense is intended in a given sentence is critical for comprehension.
● Example 1: "I went to the bank to deposit my paycheck." (Financial
institution)
● Example 2: "I sat by the bank and watched the river flow." (Side of a
river)
​ Improved Accuracy: NLP applications such as machine translation, text
classification, and sentiment analysis rely on understanding the meaning of
words. Accurate word sense disambiguation can significantly enhance the
performance of these applications.

Approaches to Representing Word Sense:

Sense Inventory: One approach involves creating a sense inventory, which is essentially a catalog of all the possible senses or meanings of a word. Each sense is associated with a unique identifier or definition. For instance, the word "bank" may have entries in the sense inventory representing "financial institution" and "side of a river."
● Sense Inventory for "bank":
● Sense 1: "Financial institution"
● Sense 2: "Side of a river"
​ Context-Based Representations: Another approach relies on the context in
which a word appears to infer its sense. The sense of a word is determined
based on the surrounding words in a given text. Context-based methods use
the idea that words that frequently appear together in similar contexts are
likely to share the same sense.

Example: Let's consider the word "run," which has multiple senses:

● Sense 1: "Physical exercise" - "I go for a run every morning."
● Sense 2: "Managing a business" - "She can run the company efficiently."

In the first sentence, "run" refers to physical exercise, while in the second sentence, it
has a different sense, meaning managing a business. The context of the word helps
us disambiguate its meaning.
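
One classic context-based disambiguation method is the (simplified) Lesk algorithm, which picks the sense whose dictionary gloss overlaps most with the surrounding words. NLTK ships an implementation, sketched below; it needs the WordNet data and often, but not always, picks the intended sense:

python
from nltk.wsd import lesk

sent1 = "I went to the bank to deposit my paycheck".lower().split()
sent2 = "I sat by the bank and watched the river flow".lower().split()

# lesk() returns the WordNet synset whose gloss overlaps most with the context
sense1 = lesk(sent1, "bank")
sense2 = lesk(sent2, "bank")

print(sense1, "-", sense1.definition())
print(sense2, "-", sense2.definition())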
Introduction to WordNet

WordNet is a comprehensive lexical database for the English language. Developed at Princeton University beginning in 1985, it serves as a valuable resource for natural language processing (NLP) and linguistics. WordNet organizes words into groups of synonyms, known as "synsets," and illustrates semantic relationships between these words.

Key Elements of WordNet:

Synsets: The fundamental building blocks of WordNet are synsets. A synset is a collection of words that are synonymous or closely related in meaning. These words are grouped together based on their shared semantic sense.

Example:

● Synset for "Car":
● Words: "automobile," "machine," "motorcar"
● Explanation: These words belong to the same synset because they share the common meaning of a "car."
​ Semantic Relationships: WordNet establishes various semantic relationships
between words using specific types of links. These relationships help capture
the nuances of word meanings and associations.
● Hypernymy: This relationship describes a hierarchy where a more
general term is linked to a more specific term. For instance:
● "Animal" is a hypernym of "Dog." Here, "animal" is the more
general term that encompasses "dog."
● Hyponymy: In contrast to hypernymy, hyponymy denotes a relationship
where a more specific term is associated with a more general term.
● "Dog" is a hyponym of "Animal." In this case, "dog" is the specific
term that falls under the broader category of "animal."
● Meronymy and Holonymy: These relationships describe parts and
wholes. Meronymy specifies a part-of relationship, while holonymy
indicates a whole-of relationship.
● Example: In the context of a "car," "engine" is a meronym (part
of), and "car" is the holonym (the whole entity).

Key Distinctions: WordNet has some important distinctions compared to a thesaurus:

​ Sense Disambiguation: WordNet links not only word forms but specific
senses of words. This means that words are linked based on their distinct
meanings in different contexts, enabling precise sense disambiguation.
​ Semantic Relations: WordNet labels the semantic relationships among words
explicitly. These relationships provide deeper insights into word connections
beyond mere synonymy.

WordNet in Practice: The Natural Language Toolkit (NLTK), an open-source Python library for NLP, includes the English WordNet. This resource contains a vast vocabulary with 155,287 words and 117,659 synonym sets. NLTK's WordNet module allows developers and researchers to access and leverage WordNet data for various NLP tasks, including word sense disambiguation, synonym detection, and semantic analysis.
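
A short sketch of exploring these synsets and relations with NLTK's WordNet module (assuming the WordNet data has been downloaded with nltk.download('wordnet')):

python
from nltk.corpus import wordnet as wn

# All synsets (senses) for "car"
for syn in wn.synsets("car"):
    print(syn.name(), "-", syn.definition())

car = wn.synset("car.n.01")
print(car.lemma_names())        # synonyms in this synset, e.g. 'automobile', 'machine', 'motorcar'
print(car.hypernyms())          # more general terms (hypernymy)
print(car.hyponyms()[:5])       # more specific terms (hyponymy)
print(car.part_meronyms()[:5])  # parts of a car (meronymy)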
In summary, WordNet is a powerful linguistic resource that categorizes words into
synsets and describes the intricate web of semantic relationships between them. It
plays a vital role in NLP by aiding in the understanding of word meanings, promoting
accurate text analysis, and facilitating the development of intelligent language
processing systems.
