
Word Vector Representation:

word2vec

made by: Mladen Tasevski 171524


1. Introduction

We use words every single day. Yet we almost never ponder how much information they
convey, and how easily we understand each other using what would seem to be mere
sequences of characters. Every single word we hear activates our associative machinery in such
a way that we know the word's meaning instantly. There are so many words in each of our
vocabularies. A question arises:

What are words, and how do we represent their meaning?

According to the Oxford Dictionary, a word is:

• a single distinct meaningful element of speech or writing, used with others
(or sometimes alone) to form a sentence and typically shown with a space on either side when
written or printed.
According to the Webster dictionary, meaning can be:
• the idea that is represented by a word, phrase, etc.
• the idea that a person wants to express by using words, signs, etc.
• the idea that is expressed in a work of writing, art, etc.
In linguistics, meaning is the information or concepts that a sender intends to convey, or does
convey, in communication with a receiver.
But what is most useful to us, when trying to represent words as vectors, is the semantic
meaning: the relationship between words and what they refer to. Meaning is conveyed through
language and signs.

An example of word representation in WordNet

The most common way that meaning has been represented in computers is by using taxonomic
resources. An example of such a resource is WordNet. It is very popular among linguists and people
who work with NLP. It is free to use and to copy. It contains a great deal of taxonomic information, and it
has been, and still is, very useful when solving language processing problems.
It encodes hypernym (is-a) relationships and synonym sets, and it is the largest such representation.

Yet WordNet still fails to capture some relationships. For example, it labels the following words
as synonyms: adept, expert, good, practiced, proficient, skillful. We can see how this would become a
problem. It also commonly lacks new words: if we wanted to process tweets, for example, many of
the words would be unknown to WordNet. It also requires human labor to create and adapt.
In general, it is hard to compute accurate word similarity with it.
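
As a minimal sketch of how these taxonomic relations can be queried (assuming the NLTK library with its WordNet corpus downloaded; neither is mentioned in the text above), the following prints the synonym sets and hypernyms that WordNet returns:

    # Sketch of querying WordNet through NLTK; assumes `pip install nltk`
    # and downloads the WordNet corpus on first run.
    import nltk
    nltk.download("wordnet", quiet=True)
    from nltk.corpus import wordnet as wn

    # Synonym sets ("synsets") for "good" illustrate the coarse synonymy
    # described above: several loosely related lemmas share one synset.
    for synset in wn.synsets("good")[:5]:
        print(synset.name(), "->", synset.lemma_names())

    # Hypernym (is-a) relationships for one sense of "dog".
    dog = wn.synset("dog.n.01")
    print(dog.hypernyms())  # e.g. a synset for "canine"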
Most NLP work regards words as atomic symbols, sometimes using statistical models over
words (assigning probabilities to words appearing alongside other words). The vectors used for such
atomic representations grow enormous as the vocabulary grows. Also, when
we use that kind of representation in neural networks, we do not capture the relationships between
words (perhaps a little with the statistical models, but far from enough). For example, if we
implemented these representations in a search engine, there would be no way to capture the
meaning of the words. Having search results that only match the exact words is not sufficient. We would want
the results to contain websites with words that relate to the words we typed in. For
example, if we search for "food near me", we would want results for all kinds of foods, like
pizza, pasta, sandwiches, hamburgers, etc. All of this being said, we should understand the need for
some other, more useful word representation.
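
To make the problem concrete, here is a small sketch of the atomic, one-hot style of representation (the toy vocabulary and code are illustrative assumptions, not taken from the text): every word gets a vector as long as the vocabulary, and any two distinct words look equally unrelated.

    import numpy as np

    # Hypothetical toy vocabulary; a realistic one has hundreds of
    # thousands of entries, so every vector becomes huge and sparse.
    vocab = ["pizza", "pasta", "hamburger", "car"]
    index = {word: i for i, word in enumerate(vocab)}

    def one_hot(word):
        # A vector of zeros with a single 1 at the word's index.
        vec = np.zeros(len(vocab))
        vec[index[word]] = 1.0
        return vec

    # The dot product between any two distinct words is always 0, so
    # "pizza" is exactly as unrelated to "pasta" as it is to "car".
    print(one_hot("pizza") @ one_hot("pasta"))  # 0.0
    print(one_hot("pizza") @ one_hot("car"))    # 0.0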

2. Word embedding

Word embedding is a kind of word representation. It can be used for all kinds of text,
depending on what we train it on. Words that are similar have similar representations. It is
this specific approach to representing words that solves the problem stated above, and it is
currently the most widely used approach. There are many benefits to it.
One is computational: word embedding techniques produce dense,
low-dimensional vectors, which suit neural networks well; neural networks are known not to work
well with sparse, high-dimensional vectors. Perhaps the greatest benefit we get from these
techniques is the generalization we achieve precisely because the vectors are so dense.
It works as follows: words are represented as real-valued vectors in a predefined
vector space. Each word (the vector that represents it) is assigned initial values and then goes through a
learning process similar to that of neural networks. It is for this reason that word embeddings are so
often combined with deep learning models.
Often the real-valued vectors have tens or hundreds of dimensions. In contrast to other
methods such as one-hot encoding, this is a very small number of dimensions; one-hot encoded
words can have thousands or even millions of dimensions.
In a sense, each dimension of a word embedding vector is a kind of feature, so the value
in each dimension is associated with some aspect of the word's semantic meaning. Words are seen as
points in the vector space.
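
As an illustration of words as nearby or distant points (the three-dimensional vectors below are made up for the example; real embeddings are learned and have far more dimensions), similarity between such points is typically measured with the cosine of the angle between their vectors:

    import numpy as np

    # Hypothetical 3-dimensional "embeddings"; real ones typically
    # have 50-300 learned dimensions.
    pizza = np.array([0.9, 0.1, 0.3])
    pasta = np.array([0.8, 0.2, 0.4])
    car   = np.array([0.1, 0.9, 0.7])

    def cosine(a, b):
        # Cosine similarity: close to 1 for vectors pointing the same
        # way, lower for unrelated directions.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(pizza, pasta))  # high: the two points lie close together
    print(cosine(pizza, car))    # noticeably lower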
The values of the vectors are tweaked, that is, learned, from the usage of the words. This is exactly
what enables these methods to capture the semantic meaning of the words. When contrasted with a
method like bag of words, the meaning that is captured is substantial: in BOW, unless managed in
some way, different words have totally different representations, regardless of how they are
used.
There is a linguistic theory behind approaches of the word-embedding kind: the
"distributional hypothesis", formulated by Zellig Harris. It states that words that occur in similar
contexts have similar meanings. We can see how that makes sense. Just think of some word; you can
immediately conjure up words that are similar to it. Our personal neural network, the associative
machine in our head, does it without breaking a sweat. Just take a moment to appreciate that.
This notion of letting the usage of a word define its meaning can be summarized by an oft-repeated
quip by John Firth:
You shall know a word by the company it keeps!
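
word2vec puts this hypothesis to work by learning vectors from the contexts in which words appear. The sketch below is illustrative only (the toy corpus, the hyperparameter values, and the use of the gensim library are assumptions, not taken from the text); it trains a small skip-gram model and asks for the neighbours of a word:

    from gensim.models import Word2Vec

    # A hypothetical, tiny tokenized corpus; useful vectors require
    # training on millions of sentences.
    sentences = [
        ["i", "ate", "pizza", "for", "dinner"],
        ["we", "ordered", "pasta", "for", "dinner"],
        ["she", "drove", "the", "car", "to", "work"],
        ["he", "ate", "a", "hamburger", "for", "lunch"],
    ]

    # Skip-gram (sg=1) word2vec with small, illustrative hyperparameters.
    model = Word2Vec(sentences, vector_size=50, window=2,
                     min_count=1, sg=1, epochs=200, seed=42)

    print(model.wv["pizza"][:5])           # first few learned dimensions
    print(model.wv.most_similar("pizza"))  # neighbours ranked by cosine similarity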
