Natural Language Processing
Pushpak Bhattacharya
Aditya Joshi
Chapter 7
Machine Translation
• Translation has served as the vehicle for making ideas expressed in one language
accessible in other languages
• Every layer in the NLP stack sends signals into the translation process
• The NLP stack helps reduce the amount of data needed for training an MT system
using only raw data
Machine learning (ML) based MT, which means statistical machine translation (SMT) and
neural machine translation (NMT), relegates the responsibility of ambiguity resolution to data and ML
• All these MT paradigms have an ‘A’ word as the essence of the paradigm
i. Analysis in RBMT
ii. Alignment in SMT
iii. Attention in NMT
• One of the ideals of MT has always been the extraction of meaning completely and
correctly from the source text
• Then the production of the target language text from the extracted meaning
i. Lexico-semantic divergence:
Here, languages differ in the manner in which they arrange words and
phrases in a sentence
• The top of the triangle represents the complete disambiguated meaning of the source
sentence
• On the way down from the top, we begin to generate the target language sentence
• Descending down the right side of the triangle takes us through different stages of
natural language generation (NLG)
• The broad stages of NLG are root word determination, target root substitution, and
morphology generation on target roots
• A simplified Vauquois triangle is depicted next, with source and target languages at the
bottom of the triangle and a transfer happening somewhere between the top and bottom
• The left side of the triangle is the analysis side and the right side is the generation side
• Any transition into the generation side below the top gives rise to transfer-based
machine translation (TBMT)
• So, the responsibility of correctly and completely capturing language and translation
phenomena, and formulating rules therefrom, lies with a human system designer
• The pipeline shown next is the typical architecture for Indian language to Indian
language machine translation (ILILMT)
i. Scale and diversity: There are 22 scheduled languages in India, written in 13 different
scripts, with over 720 dialects
ii. Code mixing: Owing to India’s multilingual culture, people in India routinely and
seamlessly use at least two languages in their day-to-day communication resulting in
code-mixing
iii. Absence of basic NLP tools and resources: Most Indian languages lack these
tools and resources, so MT may have to rely on low-quality tools or do without them
v. Script complexity and non-standard input mechanism: The QWERTY keyboard for
Roman scripts is non-optimal for Indian languages
vii. Non-standard storage: Many organizations in India use proprietary fonts that
do not follow the Unicode format
viii. Challenging language phenomena: Compound verbs in Indian languages are one
such phenomenon
• In 2014, MeitY of India funded the creation of parallel corpora in many Indian languages
• About 100,000 parallel sentences were created for languages from the Indo-Aryan
and Dravidian families
• Leveraging the created parallel corpora, SMT systems were created for pairs of
Indian languages
• BLEU scores serve as performance measures for translating to and fro between different
pairs of languages
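For reference, BLEU combines modified n-gram precisions $p_n$ with a brevity penalty; this is the standard formulation of Papineni et al. (2002), reproduced here for clarity:

$$\text{BLEU} = \text{BP} \cdot \exp\!\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad \text{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}$$

where $c$ is the candidate translation length, $r$ the reference length, and typically $N = 4$ with uniform weights $w_n = 1/4$.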
• Translation involving Dravidian languages requires looking inside words and
mapping morphemes to obtain proper translations
We have only a handful of methods for mitigating the resource problem in MT:
i. Subwords:
• Subword-based MT involves breaking words into their parts, making use of
characters, syllables, orthographic syllables, and byte pair encoding (BPE)
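As an illustration, here is a minimal BPE-merge learner in Python in the style of Sennrich et al. (2016); the toy corpus and merge count are invented:

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    # Count adjacent symbol pairs, weighted by word frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    bigram = re.escape(' '.join(pair))
    # Match the pair only on whole-symbol boundaries
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), w): f for w, f in vocab.items()}

def learn_bpe(vocab, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab

# Invented toy corpus: space-separated symbols with an end-of-word marker
corpus = {'l o w </w>': 5, 'l o w e r </w>': 2,
          'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
merges, vocab = learn_bpe(corpus, num_merges=5)
print(merges)   # learned merge operations, most frequent first
```

Frequent character sequences thus become single subword units, so rare and morphologically complex words can still be represented from known pieces.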
• This approach aims to take help from another language, which can happen in two ways:
When translating a sentence from one language to another, a simple approach may
be to translate it word by word:
• Such an approach will require a dictionary that maps words of the source
language to those of the target language
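A minimal sketch of this dictionary-based substitution, with an invented English-Hindi toy dictionary (transliterated), shows both its simplicity and its limits:

```python
# Hypothetical toy dictionary, for illustration only
en_hi = {'book': 'kitaab', 'is': 'hai', 'on': 'upar', 'table': 'mej'}

def word_for_word(sentence):
    # Substitute each source word independently; no reordering, no agreement
    return [en_hi.get(w, w) for w in sentence.lower().split()]

print(word_for_word('The book is on the table'))
# ['the', 'kitaab', 'hai', 'upar', 'the', 'mej'] -- words are translated,
# but the correct Hindi order 'mej ke upar kitaab hai' is not produced
```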
• Instead of words, if we allow word groups to align, the modelling becomes much
simpler
Alignments are marked with ‘X’. For English-Hindi, the alignment is:
‘The play is on’ ↔ ‘khel chal rahaa hai’ (gloss: play continue <progressive auxiliary>
<auxiliary>)
• thus depriving the more deserving candidates like ‘par’ and ‘upar’
• This may lead to the strange translation of ‘The book is on the table’ as:
‘mej ke rahaa kitaab hai’ instead of the correct ‘mej ke upar kitaab hai’!
• Unless the source and target languages are extremely close linguistically and
culturally, word-to-word correspondence between parallel sentences cannot be taken for granted
• Example:
• Not a single word in the Bengali sentence above has an equivalent translation in
the parallel English sentence
• Phrases in PBSMT are not necessarily linguistic phrases but are sequences of
words
• Some of these word sequences can be linguistic phrases, but that is not
necessarily so
• Even when the two languages are close to each other, phrases aligned can be non-
linguistic
• We have to work with the aligned phrases in the phrase table along with their
probability values
• The probability value of a phrase translation indicates how good the pair
formed by the phrase and its translation is
• When a new sentence needs to be translated, we have to match parts of the input
sentence in the phrase table, pick up the translations, combine them, and finally
score the resulting sentences
• Everything starts with finding and matching parts of the input sentence in the
phrase table
• The size of the phrase table is, thus, an important factor in the translation process
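To make the lookup-and-combine loop concrete, here is a toy greedy, monotone phrase-table decoder in Python; the table, scores, and greedy strategy are illustrative assumptions, not how a full SMT decoder such as Moses works:

```python
import math

# Hypothetical toy phrase table: source phrase -> (target phrase, score);
# entries and probabilities are invented for illustration
phrase_table = {
    ('the', 'book'): ('kitaab', 0.8),
    ('is', 'on'): ('ke upar hai', 0.6),
    ('the', 'table'): ('mej', 0.7),
    ('book',): ('kitaab', 0.9),
}

def greedy_decode(sentence):
    """Greedily cover the input left to right with the longest phrase
    found in the table, summing log phrase scores; a real decoder also
    uses a language model, reordering, and beam search."""
    words = sentence.lower().split()
    i, output, log_score = 0, [], 0.0
    while i < len(words):
        for j in range(len(words), i, -1):      # longest match first
            src = tuple(words[i:j])
            if src in phrase_table:
                tgt, p = phrase_table[src]
                output.append(tgt)
                log_score += math.log(p)
                i = j
                break
        else:                                   # no phrase matched
            output.append(words[i])             # pass word through
            i += 1
    return ' '.join(output), log_score

print(greedy_decode('the book is on the table'))
# ('kitaab ke upar hai mej', ...) -- monotone output; producing
# 'mej ke upar kitaab hai' would need a reordering model
```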
• Here, e and f have their usual meaning of output and input, respectively
• P(f|e) and P_LM(e) are the translation model and language model, respectively
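Put together, this is the standard noisy-channel formulation of SMT (rendered here in LaTeX for clarity):

$$\hat{e} = \operatorname*{argmax}_{e} P(e \mid f) = \operatorname*{argmax}_{e} \frac{P(f \mid e)\, P_{LM}(e)}{P(f)} = \operatorname*{argmax}_{e} P(f \mid e)\, P_{LM}(e)$$

since $P(f)$ is constant for a given input sentence.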
• Hindi grammar rules say that the agent should get the ergative marker ‘ne’ if the
verb is transitive (‘sakarmak kriya’) and in the past tense
• This rule holds even for ellipsis wherein lexemes are implicit
• If the word ‘aam’ is dropped, the translation will still be ‘mei_ne khaa_yaa’
• Here, ê = argmax_e P(f|e) P_LM(e) is the highest-probability output sentence, as per the
argmax over e, given the input sentence f
Fig 7.17 BLEU scores for MC→EN translation with and without pivot; Grey bars are with FR as pivot
• Potentially, the language objects are parts of a continuum, and the whole power
of geometry, algebra, and calculus can be harnessed for doing NLP
• Let us enumerate the essential steps through the now ‘classic-in-NMT’ encoder-
decoder (a minimal code sketch follows this list):
i. The input sentence passes through what is called the encoder as a sequence of
word vectors
iii. This encoder output vector is processed by the decoder to output the target
language sentence
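As a concrete illustration of these steps, here is a minimal PyTorch sketch of an encoder-decoder; the vocabulary sizes, dimensions, and <bos> index are made-up assumptions, and attention is deliberately omitted at this stage:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                 # src: (batch, src_len)
        _, h = self.rnn(self.emb(src))      # h: (1, batch, hid_dim)
        return h                            # final state summarizes the sentence

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, h):       # autoregressive: one step at a time
        o, h = self.rnn(self.emb(prev_token), h)
        return self.out(o), h               # logits over target vocabulary

# Usage: encode once, then decode token by token from a <bos> symbol
enc, dec = Encoder(1000), Decoder(1200)
src = torch.randint(0, 1000, (1, 7))        # a dummy 7-token source sentence
h = enc(src)
tok = torch.tensor([[1]])                   # assume index 1 is <bos>
for _ in range(5):
    logits, h = dec(tok, h)
    tok = logits.argmax(-1)                 # greedy choice of next token
```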
• Generating word forms conforming to agreement rules has to grapple with the
challenge of long-distance dependency
• This problem of attenuation of memory brings on stage two key ideas: ‘context
vector’ and ‘attention’
• After every token, the encoder output is tapped and sent to the decoder; this
output is called the context vector
• The decoder combines the context vectors from the encoder at every token with
autoregression
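In symbols (the standard attention formulation, e.g. Bahdanau et al., 2015; notation mine): given encoder outputs $h_1, \ldots, h_S$ and decoder state $s_t$ at step $t$,

$$\alpha_{ts} = \frac{\exp(\text{score}(s_t, h_s))}{\sum_{s'=1}^{S} \exp(\text{score}(s_t, h_{s'}))}, \qquad c_t = \sum_{s=1}^{S} \alpha_{ts}\, h_s$$

so the context vector $c_t$ is recomputed at every decoding step as a weighted combination of all encoder outputs, relieving a single fixed vector of having to remember the whole sentence.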
• Most recently, Transformers set new performance records in the WMT 14 English-to-German and English-to-French tasks in 2017
• The new performance figures forced the community to take these techniques
seriously
• Subsequent sustained interest and strong new performance figures in diverse
applications cemented the position of Transformers
• The training phase teaches the Transformer to condition the output by paying
attention not only to the input words but also to their positions
• Let POS denote the position vector of dimension d. Each position t in the input
sentence has a position vector associated with it
• Let us call this POS_t. Let the i-th component of the t-th position vector be
denoted pos(t, i), with i varying from 0 to (d/2) − 1
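Under this notation, the standard sinusoidal scheme of Vaswani et al. (2017) assigns each index $i$ a sine/cosine pair:

$$\text{POS}_t[2i] = \sin\!\left(\frac{t}{10000^{2i/d}}\right), \qquad \text{POS}_t[2i+1] = \cos\!\left(\frac{t}{10000^{2i/d}}\right), \qquad i = 0, \ldots, \frac{d}{2} - 1$$

Each component is a sinusoid whose wavelength grows geometrically with $i$, which is exactly the repeating-pattern property discussed next.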
Foundational Observation 1:
• Let S be a set of symbols. Let P be the set of patterns the symbols create
• If |P| > |S|, then there must exist patterns in P that have repeated symbols
• If the patterns can be arranged in a series with an equal difference of values between
every consecutive pair, then the symbols at any given position of the pattern strings
must REPEAT
• For example, counting in binary, the least significant bit repeats with period 2, the
next bit with period 4, and so on
• The power of the Transformer comes from positional embeddings and self- and cross-
attention
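To make that concrete, here is a minimal NumPy sketch of scaled dot-product self-attention; dimensions and weights are made-up, and multi-head projection and masking are omitted:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape
    (seq_len, d_model); Wq/Wk/Wv project to queries, keys, values."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # each position mixes all others

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 8, 4, 5
X = rng.normal(size=(seq_len, d_model))              # dummy token vectors
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 4)
```

Cross-attention is the same computation with queries coming from the decoder and keys/values from the encoder outputs.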