Thesis Presentation


Neural Machine Translation

Quinn Lanners¹
Advisor: Dr. Thomas Laurent¹

¹Department of Mathematics, Loyola Marymount University
Outline
• What is Neural Machine Translation?
• Encoder-Decoder Structure
• Objective: Improve PyTorch Tutorial
• Results
• Future Enhancements
What is Neural Machine Translation (NMT)?

the cat likes to eat pizza → [Algorithm] → el gato le gusta comer pizza
(e.g., Google Translate)
Training a Neural Network
• Large labeled dataset
• WMT'15 English-Czech: 15.8 million sentence pairs¹

Train Set:
  Input                        Target
  the cat likes to eat pizza   el gato le gusta comer pizza
  the dog likes to run         el perro le gusta correr
  ...                          ...
  cats hate water              gatos odian el agua

Test Set:
  Input                        Target
  the bird can fly             ?
  the lion likes to hunt       ?
  cows live on farms           ?
  ...                          ?

¹Manning et al.
The Encoder-Decoder Structure

the cat likes to eat pizza → [Encoder] → [Decoder] → el gato le gusta comer pizza
Input Language Vocabulary:

  Word    Index
  cat     1
  dog     2
  the     3
  pizza   4
  water   5
  likes   6
  hates   7
  eat     8
  run     9
  to      10

One-Hot Encoding Vectors for Input Sentence:
Each word is represented as a length-10 vector with a 1 at the word's
vocabulary index and 0 everywhere else (e.g., "the" has a 1 in position 3,
"cat" a 1 in position 1, and so on for "likes", "to", "eat", and "pizza").
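As a rough illustration, this encoding step can be reproduced in a few lines of PyTorch; the vocabulary and sentence are the ones from the slide, and torch.nn.functional.one_hot does the index-to-vector conversion:

```python
import torch

# Vocabulary table from the slide (word -> 1-based index)
vocab = {"cat": 1, "dog": 2, "the": 3, "pizza": 4, "water": 5,
         "likes": 6, "hates": 7, "eat": 8, "run": 9, "to": 10}

sentence = "the cat likes to eat pizza".split()
indices = torch.tensor([vocab[w] - 1 for w in sentence])  # shift to 0-based

# Each word becomes a length-10 vector with a single 1 at its index
one_hot = torch.nn.functional.one_hot(indices, num_classes=len(vocab))
print(one_hot)  # shape: (6, 10), one row per word
```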
[Figure: Unrolled encoder-decoder network translating "the cat likes to eat
pizza". Each one-hot input vector is embedded (W^E) and fed through the
encoder's recurrent unit (R^E), updating hidden states h^E(t=0) through
h^E(t=6). The final encoder hidden state initializes the decoder hidden state
h^D(t=0). At each decoder step, the previous output word (starting from <SOS>)
is embedded (W^D) and passed through the decoder's recurrent unit (R^D) to
update hidden states h^D(t=1) through h^D(t=7); the output layer (U^D) then
maps each decoder state to a probability distribution over the target
vocabulary (el, gato, le, gusta, comer, pizza, ...), producing "el gato le
gusta comer pizza <EOS>".]
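A minimal sketch of the loop the figure depicts, assuming GRU cells for the recurrent units R (the cell type the PyTorch tutorial starts from) and made-up dimensions; variable names mirror the figure's symbols:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
src_vocab, tgt_vocab, hidden = 10, 8, 16

embed_src = nn.Embedding(src_vocab, hidden)  # W^E: input word embedding
embed_tgt = nn.Embedding(tgt_vocab, hidden)  # W^D: output word embedding
encoder = nn.GRU(hidden, hidden)             # R^E: encoder recurrent unit
decoder = nn.GRU(hidden, hidden)             # R^D: decoder recurrent unit
to_vocab = nn.Linear(hidden, tgt_vocab)      # U^D: hidden state -> vocab scores

src = torch.randint(0, src_vocab, (6, 1))    # indices of a 6-word source sentence
_, h = encoder(embed_src(src))               # h: final encoder hidden state h^E(t=6)

token = torch.zeros(1, 1, dtype=torch.long)  # assume index 0 = <SOS>
for _ in range(10):                          # greedy decoding, one word at a time
    out, h = decoder(embed_tgt(token), h)    # h^D(t) from previous word and state
    scores = to_vocab(out[-1])               # distribution over target vocabulary
    token = scores.argmax(dim=-1).unsqueeze(0)  # feed the predicted word back in
```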
Calculating Loss (Error)

Cross-Entropy Loss:

  Loss = −Σ_w Σ_e y_{w,e} · log(x_{w,e})

where

  x_{w,e} = predicted probability of vocab entry e on word w
  y_{w,e} = 1 when the vocabulary entry is the correct word
  y_{w,e} = 0 when the vocabulary entry is not the correct word
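In PyTorch this is what nn.CrossEntropyLoss computes; a toy check with made-up scores shows it equals −log of the probability assigned to the correct entry:

```python
import torch
import torch.nn as nn

# Toy scores over a 6-entry target vocabulary for a single predicted word
logits = torch.tensor([[2.0, 0.5, 0.1, 0.3, 0.2, 0.1]])
target = torch.tensor([0])  # index of the correct vocabulary entry

loss = nn.CrossEntropyLoss()(logits, target)

# Same value by hand: -log of the predicted probability of the correct
# entry (every other y_{w,e} term in the sum is 0)
manual = -torch.log_softmax(logits, dim=1)[0, 0]
print(loss.item(), manual.item())  # identical
```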


Attention Mechanism

• Decoder "looks back" at Encoder to help make predictions

[Figure: At decoder step t=1, the decoder hidden state h^D(t=1) is scored
against the stored encoder hidden states h^E(t=0) through h^E(t=6) (the
"Memory") to form an attention vector m^D(t=1), which is combined with the
decoder state before the output layer U^D produces the distribution over the
target vocabulary (el, gato, le, gusta, comer, pizza).]
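A minimal sketch of one common scoring scheme, dot-product (Luong-style) attention, with made-up dimensions; the tutorial's own attention variant differs, but the "look back at memory" idea is the same:

```python
import torch
import torch.nn.functional as F

hidden = 16
memory = torch.randn(7, hidden)     # encoder hidden states h^E(t=0..6)
dec_state = torch.randn(1, hidden)  # current decoder hidden state h^D(t=1)

# Score each memory entry against the decoder state, normalize to weights
scores = memory @ dec_state.T            # (7, 1) dot-product scores
weights = F.softmax(scores, dim=0)       # how much to "look back" at each step
context = (weights * memory).sum(dim=0)  # m^D(t=1): weighted sum of memory

# The context vector is combined with the decoder state before U^D
combined = torch.cat([dec_state.squeeze(0), context])  # length 2*hidden
```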
Objective: Improve PyTorch Tutorial

• Two main Python machine learning packages:
  • PyTorch (Facebook)
  • TensorFlow (Google)
• PyTorch neural machine translation tutorial: "Translation with a Sequence
  to Sequence Network and Attention"
• Shortcomings (see the sketch below):
  • No test set
  • Does not batch the training data
  • GRU rather than LSTM

https://pytorch.org/
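A sketch of two of these fixes, with made-up dimensions. Swapping the tutorial's nn.GRU for nn.LSTM is nearly a one-line change, except that an LSTM also carries a cell state; batching means feeding a padded batch of sentences through the network at once instead of one sentence at a time:

```python
import torch
import torch.nn as nn

hidden = 16

# nn.LSTM in place of the tutorial's nn.GRU; the LSTM returns a cell
# state c alongside the hidden state h
encoder = nn.LSTM(hidden, hidden)

# Batching: a whole batch of (already embedded, padded) sentences at once
batch = torch.randn(12, 32, hidden)  # (seq_len=12, batch=32, features)
outputs, (h, c) = encoder(batch)
print(outputs.shape, h.shape, c.shape)
```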
Data Set for PyTorch Tutorial
• French → English
• Training Set: 10,853 sentence pairs
• Test Set: 0 sentence pairs
• Vocabulary Sizes:
  • French: 4,489
  • English: 2,925
Sample Translations: Evaluation on Training Pairs
• Input: je suis en bequilles pour un mois .
• Target: i m on crutches for the next month .
• Predictions:
• i m on business to go this . . <EOS>
• i m on crutches for the next month . <EOS>
• Input: elles sont dingues de jazz .
• Target: they are crazy about jazz .
• Predictions:
• they re crazy about it . <EOS>
• they are crazy about jazz . <EOS>
Downfalls of the PyTorch Tutorial Model
• Does not generalize well to sentences outside the training set
• Input: j espere que tu as une journee incroyable .
• Target: i hope you have an incredible day .
• Predictions:
• i m not telling you some . . <EOS>
• i m looking for you another wrong . <EOS>

• Cannot Handle Unknown Words or Longer Sentences


• Input: j'ai fait très attention à ne pas réveiller le chien quand je suis parti tôt ce matin
• Target: i was very careful to not wake the dog when i left early this morning .
• Predictions:
• i am very not around the morning . <EOS>
Improvements to the Model

• Enhanced ability to make predictions on data it was not trained on
• Better able to handle longer sentences

[Plot: training loss vs. time (hours)]

• Input: j'ai fait très attention à ne pas réveiller le chien quand je suis
  parti tôt ce matin
• Target: i was very careful to not wake the dog when i left early this
  morning .
• Prediction:
  • i had a very careful not to wake up the dog when i left early this
    morning . <EOS>
Further Advancements

• Scale Up: train the model on an even larger dataset
• Enhance Model: implement more advanced and effective attention mechanisms
• Quantify Results: implement functions to better analyze the accuracy of the
  translations (e.g., BLEU score; see the sketch below)
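As one option for quantifying results, NLTK ships a BLEU implementation; here it scores the improved model's output from the earlier slide (with the <EOS> marker dropped) against its target:

```python
from nltk.translate.bleu_score import sentence_bleu

reference = ["i was very careful to not wake the dog when i left early this morning .".split()]
candidate = "i had a very careful not to wake up the dog when i left early this morning .".split()

# BLEU measures n-gram overlap between the model output and the reference;
# identical sentences would score 1.0
print(sentence_bleu(reference, candidate))
```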
Translation Gaffes

  Input Sentence                   Correct Translation          Model Translation
  tu es incroyablement stupide     you're unbelievably stupid   you're my son
  tom ist wellenreiter             tom is a surfer              tom is a moron
  c est un pauvre type sans c ur   he's a cold-hearted jerk     he is an an artist
  il est fauche                    he's broke                   he is a teacher
Sources
• Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015, June). An empirical
exploration of recurrent network architectures. In International Conference
on Machine Learning (pp. 2342-2350).
• Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to
attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
• Manning, Luong, See, & Pham. Neural Machine Translation. The Stanford
Natural Language Processing Group. Retrieved on April 15, 2019 from
https://nlp.stanford.edu/projects/nmt/
• Zhang, Z., & Sabuncu, M. (2018). Generalized cross entropy loss for training
deep neural networks with noisy labels. In Advances in Neural Information
Processing Systems (pp. 8778-8788).
Thank You!

Dr. Thomas Laurent, LMU Mathematics Department


Questions?
