
Machine Translation

SNLP 2014
CSE, IIT Kharagpur

November 17, 2014

SNLP 2014 (IIT Kharagpur)

Machine Translation

November 17, 2014

1 / 27

Machine Translation

Automatically translate one natural language into another.


Ambiguity Resolution is Required

An early MT system, when translating from English to Russian and then back to English:
"The spirit is willing but the flesh is weak." → "The liquor is good but the meat is spoiled."
"Out of sight, out of mind." → "Invisible idiot."


Linguistic Issues Making MT Difficult

Morphological issues, especially with agglutinative languages
Syntactic variation between SVO (e.g. English), SOV (e.g. Hindi), and VSO (e.g. Arabic) languages
    SVO languages use prepositions
    SOV languages use postpositions


Problems

Translation Divergence
It is running → Wah bhaag raha hai
It is raining → Baarish ho rahi hai

Structural Divergence
Ram will attend the meeting → Ram sabha mein jayega
Ram will go to school → Ram school jayega


Problems (continued)

Other Divergence
The fan is on [adverb] → Pankha chal [verb] raha hai
The fan is good [adjective] → Pankha achcha [adjective] hai

Conflational Divergence (to make bigger)
X killed Y → X ne Y ko mara
X stabbed Y → X ne Y ko chaku se mara


Vauquois Triangle for MT


IBM Model 1

First model proposed as part of CANDIDE, the first complete SMT system.
Assumes a simple generative model of producing $F$ from $E = e_1, e_2, \ldots, e_I$.

Generative model
Choose the length, $J$, of the $F$ sentence: $F = f_1, f_2, \ldots, f_J$
Choose a one-to-many alignment $A = a_1, a_2, \ldots, a_J$, where $a_j$ is the position in $E$ that generates $f_j$
For each position $j$ in $F$, generate a word $f_j$ from the aligned word in $E$: $e_{a_j}$
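
The three generative steps above can be sketched as a small sampler. This is only an illustration under stated assumptions: the translation table `T`, its probabilities, the candidate lengths, and the function name `generate` are all hypothetical, and position 0 of E is assumed to hold a NULL word.

```python
import random

# Hypothetical toy table: T[e][f] = probability of generating f from e.
T = {
    "NULL":    {"hai": 1.0},
    "it":      {"wah": 1.0},
    "is":      {"hai": 0.8, "raha": 0.2},
    "running": {"bhaag": 0.6, "raha": 0.4},
}

def generate(e_words, length_choices, t, rng):
    """Sample (F, A) from E by following the three generative steps."""
    e = ["NULL"] + list(e_words)       # assumption: position 0 is a NULL word
    J = rng.choice(length_choices)     # step 1: choose the length J of F
    # Step 2: choose an alignment; every a_j in 0..I is equally likely.
    A = [rng.randrange(len(e)) for _ in range(J)]
    F = []
    for a_j in A:                      # step 3: generate f_j from e_{a_j}
        words, probs = zip(*t[e[a_j]].items())
        F.append(rng.choices(words, weights=probs, k=1)[0])
    return F, A

E = ["it", "is", "running"]
F, A = generate(E, [3, 4], T, random.Random(0))
print(F, A)
```

Sampling like this is rarely useful in practice, but it makes clear that the alignment is chosen before any word is generated.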


Generative Model
[Figure sequence: illustration of the generative process]


Computing P(F|E) in IBM Model 1

Assume some length distribution $P(J \mid E)$.
Assume all alignments are equally likely. Number of possible alignments: $(I + 1)^J$

$$P(A \mid E) = P(A \mid E, J)\, P(J \mid E) = \frac{P(J \mid E)}{(I + 1)^J}$$

Assume $t(f_x, e_y)$ is the probability of translating $e_y$ as $f_x$, therefore:

$$P(F \mid E, A) = \prod_{j=1}^{J} t(f_j, e_{a_j})$$

Determine $P(F \mid E)$ by summing over all alignments:

$$P(F \mid E) = \sum_{A} P(F \mid E, A)\, P(A \mid E) = \sum_{A} \frac{P(J \mid E)}{(I + 1)^J} \prod_{j=1}^{J} t(f_j, e_{a_j})$$
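
The sum over alignments can be checked by brute force on a tiny example. The sketch below enumerates all $(I + 1)^J$ alignments directly; the table `T`, its values, and the function name `p_f_given_e` are hypothetical, $P(J \mid E)$ is passed in as a plain number, and position 0 of E is assumed to be a NULL word.

```python
from itertools import product
from math import prod

# Hypothetical toy table: T[e][f] stands for t(f, e).
T = {
    "NULL":    {"hai": 1.0},
    "it":      {"wah": 1.0},
    "is":      {"hai": 0.8, "raha": 0.2},
    "running": {"bhaag": 0.6, "raha": 0.4},
}

def p_f_given_e(f_words, e_words, t, p_length=1.0):
    """P(F|E) under IBM Model 1, summing explicitly over every alignment."""
    e = ["NULL"] + list(e_words)          # assumption: position 0 is NULL
    I, J = len(e_words), len(f_words)
    p_alignment = p_length / (I + 1) ** J  # P(A|E), identical for every A
    total = 0.0
    for A in product(range(I + 1), repeat=J):
        # P(F|E,A): product of t(f_j, e_{a_j}); unseen pairs count as 0.
        p_f_given_ea = prod(t[e[a_j]].get(f_j, 0.0)
                            for f_j, a_j in zip(f_words, A))
        total += p_f_given_ea * p_alignment
    return total

p = p_f_given_e(["wah", "bhaag", "raha", "hai"], ["it", "is", "running"], T)
print(p)
```

Enumeration is exponential in $J$, so this is only practical for very short sentences; it is meant to mirror the formula term by term.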


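Decoding for IBM Model 1

Find the most probable alignment given a parameterized model:

$$\hat{A} = \arg\max_{A} P(F, A \mid E) = \arg\max_{A} \frac{P(J \mid E)}{(I + 1)^J} \prod_{j=1}^{J} t(f_j, e_{a_j}) = \arg\max_{A} \prod_{j=1}^{J} t(f_j, e_{a_j})$$

The factor $P(J \mid E) / (I + 1)^J$ does not depend on $A$, so it can be dropped from the maximization.

Since the translation choice for each position $j$ is independent, the product is maximized by maximizing each term:

$$a_j = \arg\max_{0 \le i \le I} t(f_j, e_i), \quad 1 \le j \le J$$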
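
Because each position is maximized on its own, the best alignment needs only one pass over F. A minimal sketch, assuming a hypothetical table `T` with made-up values, a hypothetical function name `best_alignment`, and a NULL word at position 0:

```python
# Hypothetical toy table: T[e][f] stands for t(f, e).
T = {
    "NULL":    {"hai": 1.0},
    "it":      {"wah": 1.0},
    "is":      {"hai": 0.8, "raha": 0.2},
    "running": {"bhaag": 0.6, "raha": 0.4},
}

def best_alignment(f_words, e_words, t):
    """For each f_j, pick the position i in 0..I that maximizes t(f_j, e_i)."""
    e = ["NULL"] + list(e_words)   # assumption: position 0 is NULL
    return [
        max(range(len(e)), key=lambda i: t[e[i]].get(f_j, 0.0))
        for f_j in f_words
    ]

alignment = best_alignment(["wah", "bhaag", "raha", "hai"],
                           ["it", "is", "running"], T)
print(alignment)  # -> [1, 3, 3, 0]
```

With these toy values "hai" aligns to NULL (1.0 beats 0.8 for "is"), which shows how strongly the result depends on the table.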


MT Evaluation

Collect one or more human reference translations of the source.
Compare the MT output to these reference translations.
Score the result based on its similarity to the reference translations.
BLEU is a very common metric for evaluation.

BLEU
Determine the number of n-grams of various sizes that the MT output shares with the reference translations.
Compute a modified precision measure of the n-grams in the MT result.


BLEU Example
[Figures: unigram and bigram matches for Candidate 1; unigram and bigram matches for Candidate 2]

Modified N-gram Precision [figure]


Brevity Penalty

It is not easy to compute recall to complement precision, since there are multiple alternative gold-standard references.
Instead, use a penalty for translations that are shorter than the reference translations.
Define the effective reference length, $r$, for each sentence as the length of the reference sentence with the largest number of n-gram matches.
Let $c$ be the candidate sentence length.

$$BP = \begin{cases} e^{(1 - r/c)} & \text{if } c \le r \\ 1 & \text{otherwise} \end{cases}$$
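
The penalty is a direct transcription of the formula. In the sketch below the function name `brevity_penalty` is hypothetical, and returning 0 for an empty candidate is an added assumption to avoid dividing by zero.

```python
import math

def brevity_penalty(c, r):
    """BP = e^(1 - r/c) if c <= r, else 1 (c: candidate length, r: effective reference length)."""
    if c == 0:
        return 0.0          # assumption: an empty candidate gets no credit
    if c <= r:
        return math.exp(1 - r / c)
    return 1.0

bp_equal = brevity_penalty(12, 12)   # same length: exp(0)
bp_short = brevity_penalty(6, 12)    # half as long: exp(-1)
bp_long = brevity_penalty(15, 12)    # longer than the reference: no penalty
print(bp_equal, bp_short, bp_long)
```

A candidate half the reference length keeps only about 37% of its precision score, while a candidate longer than the reference is left unpenalized.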


BLEU Example

Reference 1 has the largest number of n-gram matches with Candidate 1, while Reference 2 has the largest number of n-gram matches with Candidate 2.


Final BLEU Score

$$\text{BLEU} = BP \times p$$

where $BP$ is the brevity penalty and $p$ is the modified n-gram precision.
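
Putting the two factors together might look like the sketch below. The surviving slide text does not say how $p$ combines the different n-gram sizes, so the code assumes the standard BLEU choice, a geometric mean of the per-size modified precisions; the function name `bleu` and the input numbers are made up for illustration.

```python
import math

def bleu(precisions, c, r):
    """BLEU = BP * p, with p assumed to be the geometric mean of the
    modified n-gram precisions (one value per n-gram size)."""
    if c == 0 or min(precisions) == 0:
        return 0.0          # a zero precision makes the geometric mean zero
    bp = math.exp(1 - r / c) if c <= r else 1.0
    p = math.exp(sum(math.log(x) for x in precisions) / len(precisions))
    return bp * p

# Hypothetical unigram and bigram precisions for a candidate of length 8
# whose effective reference length is 10.
score = bleu([0.8, 0.2], c=8, r=10)
print(score)
```

Because of the geometric mean, a single n-gram size with no matches drives the whole score to zero, which is why BLEU is usually reported over a corpus rather than per sentence.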
