
EE782 Advanced Topics in Machine Learning

06 Neural Networks for NLP


Amit Sethi
Electrical Engineering, IIT Bombay
Module objectives

• Design LSTM-based networks for various NLP tasks


• Articulate the information bottleneck in encoder-decoder
architectures
• Explain how attention alleviates the information bottleneck
• Understand how attention can do almost all tasks done by the
other layers
Pre-processing for NLP
• The most basic pre-processing is to convert words into
embeddings using Word2Vec or GloVe

• Otherwise, a one-hot input vector can be too long and
sparse, and require lots of input weights
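A rough sketch of this step (assuming PyTorch; the `glove_vectors` matrix below is a random stand-in for real pre-trained GloVe weights): an embedding layer maps word indices to dense vectors instead of long one-hot inputs.

```python
import torch
import torch.nn as nn

# Stand-in for a real pre-trained matrix: one 300-d GloVe vector per vocabulary word
vocab_size, embed_dim = 10000, 300
glove_vectors = torch.randn(vocab_size, embed_dim)

# Embedding layer initialized from the pre-trained vectors
embed = nn.Embedding.from_pretrained(glove_vectors, freeze=False)

# A sentence is a sequence of word indices, not one-hot vectors
word_ids = torch.tensor([[12, 456, 7890]])   # batch of 1 sentence, 3 words
dense = embed(word_ids)                      # shape: (1, 3, 300)
```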
Pre-training LSTMs
• Learning to predict the next word can imprint powerful
language models in LSTMs
• This captures the grammar and syntax
• Usually, LSTMs are pre-trained on corpora
[Diagram: an unrolled LSTM language model; each word of "A boy swimming in water" is embedded and fed in, and the LSTM is trained to output the next word, "boy swimming in water <END>"]
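The sketch below illustrates next-word pre-training (PyTorch; the vocabulary, batch sizes, and random tensors are toy stand-ins for a real corpus).

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 300, 512

class LSTMLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)   # scores for the next word

    def forward(self, word_ids):
        h, _ = self.lstm(self.embed(word_ids))
        return self.out(h)                             # (batch, seq_len, vocab)

model = LSTMLanguageModel()
loss_fn = nn.CrossEntropyLoss()

# e.g. "A boy swimming in water" -> predict "boy swimming in water <END>"
inputs  = torch.randint(0, vocab_size, (8, 5))   # toy batch of word indices
targets = torch.randint(0, vocab_size, (8, 5))   # inputs shifted by one position
loss = loss_fn(model(inputs).reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
```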


Sentiment analysis
• Very common for customer review or news article
analysis
• Output before the end can be discarded (not
used for backpropagation)
• This is a many-to-one task
[Diagram: a many-to-one LSTM; the embedded words of "Very pleased with their service <END>" are read in sequence and only the final output predicts "Positive"]
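A minimal sketch of this many-to-one setup (PyTorch; the sizes and two-class output are illustrative):

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim, num_classes)

    def forward(self, word_ids):
        outputs, (h_n, _) = self.lstm(self.embed(word_ids))
        # Only the final hidden state is used; earlier outputs are discarded
        return self.classify(h_n[-1])                  # (batch, num_classes)

model = SentimentLSTM()
review = torch.randint(0, 10000, (1, 6))   # e.g. "Very pleased with their service <END>"
print(model(review).shape)                 # torch.Size([1, 2])
```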


Multi-layer LSTM
• More than one hidden layer can be used

[Diagram: two stacked LSTM layers; inputs xn-3 … xn feed the first hidden layer (LSTM1), whose outputs feed the second hidden layer (LSTM2), which produces yn-3 … yn]
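In PyTorch this is just the `num_layers` argument of `nn.LSTM`; a quick sketch with toy sizes:

```python
import torch
import torch.nn as nn

# Two stacked LSTM layers: outputs of the first layer feed the second
stacked = nn.LSTM(input_size=300, hidden_size=512, num_layers=2, batch_first=True)
x = torch.randn(4, 10, 300)            # (batch, time steps, input features)
y, (h_n, c_n) = stacked(x)
print(y.shape, h_n.shape)              # (4, 10, 512) and (2, 4, 512): one state per layer
```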


Machine translation
• A naïve model would be to use a many-to-
many network and directly train it

[Diagram: a single unrolled LSTM reads the embedded source "I am going <END>" and then emits the target "<START> मैं जा रहा हूँ <END>" ("I am going")]
Machine translation
• One could also feed the output back as the input at the
next time step, so that the generated sequence is coherent

[Diagram: the same unrolled network, but each generated target word is re-embedded (Embed2) and fed as the input to the next step]
Machine translation
• In practice, one would use separate encoder and decoder
LSTMs, pre-trained on the two different languages

[Diagram: an encoder LSTM reads the embedded (Embed) source "I am going <END>"; a separate decoder LSTM2 with its own embedding (Embed2) then generates "मैं जा रहा हूँ <END>"]
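A compressed sketch of the encoder-decoder idea with greedy decoding (PyTorch; the vocabulary sizes and the <START>/<END> token ids are illustrative):

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 10000, 12000, 300, 512
START, END = 1, 2                                  # special token ids (illustrative)

embed_src = nn.Embedding(SRC_VOCAB, EMB)           # "Embed"  (source language)
encoder   = nn.LSTM(EMB, HID, batch_first=True)    # "LSTM"   (encoder network)
embed_tgt = nn.Embedding(TGT_VOCAB, EMB)           # "Embed2" (target language)
decoder   = nn.LSTM(EMB, HID, batch_first=True)    # "LSTM2"  (decoder network)
project   = nn.Linear(HID, TGT_VOCAB)

src = torch.randint(0, SRC_VOCAB, (1, 4))          # e.g. "I am going <END>"
_, state = encoder(embed_src(src))                 # final (h, c) acts as the context

word = torch.tensor([[START]])
for _ in range(10):                                # greedy decoding, at most 10 words
    out, state = decoder(embed_tgt(word), state)
    word = project(out[:, -1]).argmax(dim=-1, keepdim=True)
    if word.item() == END:
        break
```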
Machine translation using encoder-decoder

Source: “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation”, Cho et al., ArXiv 2014
Bi-directional LSTM
• Many problems require a reverse flow of information as well
• For example, POS tagging may require context from future words

[Diagram: a bi-directional LSTM; a forward layer (LSTM1) and a backward layer (LSTM2) both read xn-3 … xn, and their combined outputs produce yn-3 … yn]
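A quick sketch (PyTorch; the 45-tag output is an illustrative POS tag set size):

```python
import torch
import torch.nn as nn

# Forward and backward passes over the sequence, outputs concatenated per step
bilstm = nn.LSTM(input_size=300, hidden_size=256, batch_first=True, bidirectional=True)
tagger = nn.Linear(2 * 256, 45)        # each output position sees both directions
x = torch.randn(4, 10, 300)
y, _ = bilstm(x)                       # (4, 10, 512): forward and backward halves
tags = tagger(y)                       # (4, 10, 45)
```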


Sentence generation
• Very common for image captioning
• Input is given only in the beginning
• This is a one-to-many task

[Diagram: a CNN encodes the image and its features are given to the LSTM at the first step only; the LSTM then generates "A boy swimming in water <END>"]
Sentence generation
• Very common for image captioning
• The image features enter only at the first step; the embedding of
each generated word is fed back as the input to the next step
• This is a one-to-many task

[Diagram: the same captioning setup, but each generated word is embedded and fed as the input to the next LSTM step after the initial CNN features]
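A rough sketch of such a captioning decoder (PyTorch; the CNN features are a random stand-in and the token ids are illustrative):

```python
import torch
import torch.nn as nn

EMB, HID, VOCAB = 300, 512, 10000
cnn_feat = torch.randn(1, HID)                     # image features from a CNN (stand-in)

embed = nn.Embedding(VOCAB, EMB)
lstm  = nn.LSTMCell(EMB, HID)
out   = nn.Linear(HID, VOCAB)

# CNN features initialize the hidden state; generated words are fed back step by step
h, c = cnn_feat, torch.zeros(1, HID)
word = torch.tensor([1])                           # <START> id (illustrative)
caption = []
for _ in range(20):
    h, c = lstm(embed(word), (h, c))
    word = out(h).argmax(dim=-1)                   # greedy choice of the next word
    caption.append(word.item())
    if word.item() == 2:                           # <END> id (illustrative)
        break
```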


Video Caption Generation

Source: “Translating Videos to Natural Language Using Deep Recurrent Neural Networks”, Venugopalan et al., ArXiv 2014
Information bottleneck in the context vector

[Diagram: the encoder-decoder translation network; the entire information about the source sentence "I am going <END>" has to flow to the decoder through a single context vector c before "मैं जा रहा हूँ <END>" can be generated]
Attention between the encoder and decoder allows each
generated word to get context from all input words

Source: “Neural Machine Translation by Jointly Learning to Align and Translate,” by Bahdanau, Cho, Bengio, 2014
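A minimal sketch of the additive attention score used by Bahdanau et al. (PyTorch; the tensors are toy stand-ins for real encoder outputs and a decoder state):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

HID = 512
W_enc, W_dec, v = nn.Linear(HID, HID), nn.Linear(HID, HID), nn.Linear(HID, 1)

enc_outputs = torch.randn(1, 4, HID)       # one encoder state per source word
dec_state   = torch.randn(1, HID)          # current decoder state

# Additive ("concat") score for every source position, then a softmax
scores  = v(torch.tanh(W_enc(enc_outputs) + W_dec(dec_state).unsqueeze(1)))  # (1, 4, 1)
weights = F.softmax(scores, dim=1)                                           # attention weights
context = (weights * enc_outputs).sum(dim=1)                                 # (1, HID) context vector
```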
How attention changes information flow
[Two-panel figure: information flow previously (through a single context vector) vs. with attention]

Source: “Neural Machine Translation by Jointly Learning to Align and Translate,” by Bahdanau, Cho, Bengio, 2014
Global attention

Source: “Effective Approaches to Attention-based Neural Machine Translation,” by Luong, Pham, Manning, 2015
Transformer

Source: “Attention Is All You Need,” by Vaswani et al., 2017


Attention in Transformer networks

Source: “Attention Is All You Need,” by Vaswani et al., 2017


Details of the transformer by Vaswani et al. (2017)

Source: “Attention Is All You Need,” by Vaswani et al., 2017
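The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; a minimal sketch in PyTorch (the tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al. (2017)
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ V

Q = torch.randn(2, 5, 64)    # (batch, query positions, d_k)
K = torch.randn(2, 7, 64)    # (batch, key positions,   d_k)
V = torch.randn(2, 7, 64)
print(scaled_dot_product_attention(Q, K, V).shape)   # torch.Size([2, 5, 64])
```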


BERT

Source: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” by Devlin et al., 2018
BERT

Source: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” by Devlin et al., 2018
BERT

Source: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” by Devlin et al., 2018
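A toy sketch of the masked-language-modelling idea behind BERT's pre-training (the token ids and tensors below are illustrative stand-ins; roughly 15% of positions are masked and the model is trained to recover them):

```python
import torch

MASK_ID = 103                                    # id of the [MASK] token (illustrative)
tokens = torch.randint(1000, 30000, (1, 12))     # a tokenized sentence (stand-in)

# Mask ~15% of the positions; the model must recover the original tokens
mask = torch.rand(tokens.shape) < 0.15
corrupted = tokens.clone()
corrupted[mask] = MASK_ID
# A bidirectional Transformer encoder reads `corrupted` and predicts `tokens[mask]`
```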
Vision Transformer

“An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” Dosovitskiy et al. ICLR 2021 https://arxiv.org/pdf/2010.11929.pdf
Vision Transformer

“An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” Dosovitskiy et al. ICLR 2021 https://arxiv.org/pdf/2010.11929.pdf
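A rough sketch of how an image becomes a sequence of "16x16 words" (PyTorch; the 224-pixel input and 768-dimensional embedding follow the ViT-Base configuration, but the tensors are illustrative):

```python
import torch
import torch.nn as nn

# Split a 224x224 image into 16x16 patches and project each to a token embedding
img = torch.randn(1, 3, 224, 224)
patchify = nn.Conv2d(3, 768, kernel_size=16, stride=16)    # one "word" per patch
tokens = patchify(img).flatten(2).transpose(1, 2)          # (1, 196, 768)
# Add a class token and position embeddings, then feed a standard Transformer encoder
```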
