Thesis Presentation


Neural Machine Translation

Quinn Lanners¹
Advisor: Dr. Thomas Laurent¹

¹Department of Mathematics, Loyola Marymount University
Outline
• What is Neural Machine Translation?
• Encoder-Decoder Structure
• Objective: Improve PyTorch Tutorial
• Results
• Future Enhancements
What is Neural Machine Translation (NMT)?

the cat likes to eat pizza → [Algorithm] → el gato le gusta comer pizza
(e.g., Google Translate)
Training a Neural Network
• Large labeled dataset
• WMT'15 English-Czech: 15.8 million sentence pairs¹

Train Set:
  Input                        Target
  the cat likes to eat pizza   el gato le gusta comer pizza
  the dog likes to run         el perro le gusta correr
  ...                          ...
  cats hate water              gatos odian el agua

Test Set:
  Input                        Target
  the bird can fly             ?
  the lion likes to hunt       ?
  cows live on farms           ?
  ...                          ?

¹Manning et al.
The Encoder-Decoder Structure

the cat likes to eat pizza → [Encoder] → [Decoder] → el gato le gusta comer pizza
Input Language Vocabulary:

  Word    Index
  cat     1
  dog     2
  the     3
  pizza   4
  water   5
  likes   6
  hates   7
  eat     8
  run     9
  to      10

One-Hot Encoding Vectors for Input Sentence:
Each word is represented as a length-10 vector with a 1 at the word's
vocabulary index and 0 everywhere else (e.g., "the" has a 1 in position 3,
"cat" a 1 in position 1, and so on for "likes", "to", "eat", and "pizza").
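As a rough illustration, this encoding step can be reproduced in a few lines of PyTorch; the vocabulary and sentence are the ones from the slide, and torch.nn.functional.one_hot does the index-to-vector conversion:

```python
import torch

# Vocabulary table from the slide (word -> 1-based index)
vocab = {"cat": 1, "dog": 2, "the": 3, "pizza": 4, "water": 5,
         "likes": 6, "hates": 7, "eat": 8, "run": 9, "to": 10}

sentence = "the cat likes to eat pizza".split()
indices = torch.tensor([vocab[w] - 1 for w in sentence])  # shift to 0-based

# Each word becomes a length-10 vector with a single 1 at its index
one_hot = torch.nn.functional.one_hot(indices, num_classes=len(vocab))
print(one_hot)  # shape: (6, 10), one row per word
```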
[Figure: Unrolled encoder-decoder network translating "the cat likes to eat
pizza". Each one-hot input vector is embedded (W^E) and fed through the
encoder's recurrent unit (R^E), updating hidden states h^E(t=0) through
h^E(t=6). The final encoder hidden state initializes the decoder hidden state
h^D(t=0). At each decoder step, the previous output word (starting from <SOS>)
is embedded (W^D) and passed through the decoder's recurrent unit (R^D) to
update hidden states h^D(t=1) through h^D(t=7); the output layer (U^D) then
maps each decoder state to a probability distribution over the target
vocabulary (el, gato, le, gusta, comer, pizza, ...), producing "el gato le
gusta comer pizza <EOS>".]
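A minimal sketch of the loop the figure depicts, assuming GRU cells for the recurrent units R (the cell type the PyTorch tutorial starts from) and made-up dimensions; variable names mirror the figure's symbols:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
src_vocab, tgt_vocab, hidden = 10, 8, 16

embed_src = nn.Embedding(src_vocab, hidden)  # W^E: input word embedding
embed_tgt = nn.Embedding(tgt_vocab, hidden)  # W^D: output word embedding
encoder = nn.GRU(hidden, hidden)             # R^E: encoder recurrent unit
decoder = nn.GRU(hidden, hidden)             # R^D: decoder recurrent unit
to_vocab = nn.Linear(hidden, tgt_vocab)      # U^D: hidden state -> vocab scores

src = torch.randint(0, src_vocab, (6, 1))    # indices of a 6-word source sentence
_, h = encoder(embed_src(src))               # h: final encoder hidden state h^E(t=6)

token = torch.zeros(1, 1, dtype=torch.long)  # assume index 0 = <SOS>
for _ in range(10):                          # greedy decoding, one word at a time
    out, h = decoder(embed_tgt(token), h)    # h^D(t) from previous word and state
    scores = to_vocab(out[-1])               # distribution over target vocabulary
    token = scores.argmax(dim=-1).unsqueeze(0)  # feed the predicted word back in
```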
Calculating Loss (Error)

Cross-Entropy Loss:

  Loss = −Σ_w Σ_e y_{w,e} · log(x_{w,e})

where

  x_{w,e} = predicted probability of vocab entry e on word w
  y_{w,e} = 1 when the vocabulary entry is the correct word
  y_{w,e} = 0 when the vocabulary entry is not the correct word
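In PyTorch this is what nn.CrossEntropyLoss computes; a toy check with made-up scores shows it equals −log of the probability assigned to the correct entry:

```python
import torch
import torch.nn as nn

# Toy scores over a 6-entry target vocabulary for a single predicted word
logits = torch.tensor([[2.0, 0.5, 0.1, 0.3, 0.2, 0.1]])
target = torch.tensor([0])  # index of the correct vocabulary entry

loss = nn.CrossEntropyLoss()(logits, target)

# Same value by hand: -log of the predicted probability of the correct
# entry (every other y_{w,e} term in the sum is 0)
manual = -torch.log_softmax(logits, dim=1)[0, 0]
print(loss.item(), manual.item())  # identical
```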


Attention Mechanism

• Decoder "looks back" at Encoder to help make predictions

[Figure: At decoder step t=1, the decoder hidden state h^D(t=1) is scored
against the stored encoder hidden states h^E(t=0) through h^E(t=6) (the
"Memory") to form an attention vector m^D(t=1), which is combined with the
decoder state before the output layer U^D produces the distribution over the
target vocabulary (el, gato, le, gusta, comer, pizza).]
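A minimal sketch of one common scoring scheme, dot-product (Luong-style) attention, with made-up dimensions; the tutorial's own attention variant differs, but the "look back at memory" idea is the same:

```python
import torch
import torch.nn.functional as F

hidden = 16
memory = torch.randn(7, hidden)     # encoder hidden states h^E(t=0..6)
dec_state = torch.randn(1, hidden)  # current decoder hidden state h^D(t=1)

# Score each memory entry against the decoder state, normalize to weights
scores = memory @ dec_state.T            # (7, 1) dot-product scores
weights = F.softmax(scores, dim=0)       # how much to "look back" at each step
context = (weights * memory).sum(dim=0)  # m^D(t=1): weighted sum of memory

# The context vector is combined with the decoder state before U^D
combined = torch.cat([dec_state.squeeze(0), context])  # length 2*hidden
```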
Objective: Improve PyTorch Tutorial

• Two main Python machine learning packages:
  • PyTorch (Facebook)
  • TensorFlow (Google)
• PyTorch neural machine translation tutorial: "Translation with a Sequence
  to Sequence Network and Attention"
• Shortcomings (see the sketch below):
  • No test set
  • Does not batch the training data
  • GRU rather than LSTM

https://pytorch.org/
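A sketch of two of these fixes, with made-up dimensions. Swapping the tutorial's nn.GRU for nn.LSTM is nearly a one-line change, except that an LSTM also carries a cell state; batching means feeding a padded batch of sentences through the network at once instead of one sentence at a time:

```python
import torch
import torch.nn as nn

hidden = 16

# nn.LSTM in place of the tutorial's nn.GRU; the LSTM returns a cell
# state c alongside the hidden state h
encoder = nn.LSTM(hidden, hidden)

# Batching: a whole batch of (already embedded, padded) sentences at once
batch = torch.randn(12, 32, hidden)  # (seq_len=12, batch=32, features)
outputs, (h, c) = encoder(batch)
print(outputs.shape, h.shape, c.shape)
```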
Data Set for PyTorch Tutorial
• French → English
• Training Set: 10,853 sentence pairs
• Test Set: 0 sentence pairs
• Vocabulary Sizes:
  • French: 4,489
  • English: 2,925
Sample Translations: Evaluation on Training Pairs
• Input: je suis en bequilles pour un mois .
• Target: i m on crutches for the next month .
• Predictions:
• i m on business to go this . . <EOS>
• i m on crutches for the next month . <EOS>
• Input: elles sont dingues de jazz .
• Target: they are crazy about jazz .
• Predictions:
• they re crazy about it . <EOS>
• they are crazy about jazz . <EOS>
Downfalls of the PyTorch Tutorial Model
• Does not generalize well to sentences outside the training set
• Input: j espere que tu as une journee incroyable .
• Target: i hope you have an incredible day .
• Predictions:
• i m not telling you some . . <EOS>
• i m looking for you another wrong . <EOS>

• Cannot Handle Unknown Words or Longer Sentences


• Input: j'ai fait très attention à ne pas réveiller le chien quand je suis parti tôt ce matin
• Target: i was very careful to not wake the dog when i left early this morning .
• Predictions:
• i am very not around the morning . <EOS>
Improvements to the Model

• Enhanced ability to make predictions on data it was not trained on
• Better able to handle longer sentences

[Plot: training loss vs. time (hours)]

• Input: j'ai fait très attention à ne pas réveiller le chien quand je suis
  parti tôt ce matin
• Target: i was very careful to not wake the dog when i left early this
  morning .
• Prediction:
  • i had a very careful not to wake up the dog when i left early this
    morning . <EOS>
Further Advancements

• Scale Up: train the model on an even larger dataset
• Enhance Model: implement more advanced and effective attention mechanisms
• Quantify Results: implement functions to better analyze the accuracy of the
  translations (e.g., BLEU score; see the sketch below)
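As one option for quantifying results, NLTK ships a BLEU implementation; here it scores the improved model's output from the earlier slide (with the <EOS> marker dropped) against its target:

```python
from nltk.translate.bleu_score import sentence_bleu

reference = ["i was very careful to not wake the dog when i left early this morning .".split()]
candidate = "i had a very careful not to wake up the dog when i left early this morning .".split()

# BLEU measures n-gram overlap between the model output and the reference;
# identical sentences would score 1.0
print(sentence_bleu(reference, candidate))
```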
Translation Gaffes

  Input Sentence                   Correct Translation          Model Translation
  tu es incroyablement stupide     you're unbelievably stupid   you're my son
  tom ist wellenreiter             tom is a surfer              tom is a moron
  c est un pauvre type sans c ur   he's a cold-hearted jerk     he is an an artist
  il est fauche                    he's broke                   he is a teacher
Sources
• Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015, June). An empirical
exploration of recurrent network architectures. In International Conference
on Machine Learning (pp. 2342-2350).
• Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to
attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
• Manning, Luong, See, & Pham. Neural Machine Translation. The Stanford
Natural Language Processing Group. Retrieved on April 15, 2019 from
https://nlp.stanford.edu/projects/nmt/
• Zhang, Z., & Sabuncu, M. (2018). Generalized cross entropy loss for training
deep neural networks with noisy labels. In Advances in Neural Information
Processing Systems (pp. 8778-8788).
Thank You!

Dr. Thomas Laurent, LMU Mathematics Department


Questions?
