
Recurrent neural networks
AZMI HAIDER
MUHAMMAD SALAMAH
RNN: Process sequences
One to one: “vanilla” neural network

• Inputs are unrelated to one another
• Inputs/outputs are of fixed size
RNN: Process sequences
One to many: image captioning
RNN: Process sequences
Many to one: sentiment classification
RNN: Process sequences
Many to many: machine translation
RNN: Process sequences
Many to many: video classification on frame level
Recurrent neural network
RNN has a hidden state updated with each input x:
1. Receive input x
2. Update the hidden state
3. Produce output y
Recurrent neural network
The new state is a function of the old state and the current input:
(Vanilla) Recurrent neural network
The hidden state is a single vector h:
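A minimal NumPy sketch of that vanilla step (not the authors' code), with weight names following the W_xh / W_hh / W_hy convention used later in the deck:

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One vanilla-RNN time step: new state from old state and input."""
    h = np.tanh(W_xh @ x + W_hh @ h_prev + b_h)  # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
    y = W_hy @ h + b_y                           # y_t = W_hy h_t
    return h, y
```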
RNN: computational graph
Another way to look at an RNN is as a computational graph unrolled over time:
RNN: computational graph
Many to many:
RNN: computational graph
Many to one:
RNN: computational graph
One to many:
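Across these layouts the underlying loop is the same; here is a minimal sketch of the unrolled graph, reusing the vanilla step from the earlier sketch (shapes and names are assumptions, not from the slides):

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Unroll the recurrence over a whole sequence (many-to-many layout)."""
    h, hs, ys = h0, [], []
    for x in xs:                      # one graph node per time step, shared weights
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)
    # many-to-one: keep only ys[-1]; one-to-many: feed x only at the first step
    return hs, ys
```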
Example:
Character-level language model:

Vocabulary = [h, e, l, o]

Example:
Training the sequence “hello”
Example:
Each input is a character from the training sequence
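One way to encode this (the details are an assumption, not taken from the slides) is to turn each character into a one-hot vector over the four-character vocabulary and use the next character of “hello” as the training target:

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[char_to_ix[ch]] = 1.0
    return v

text = "hello"
inputs  = [one_hot(ch) for ch in text[:-1]]        # h, e, l, l
targets = [char_to_ix[ch] for ch in text[1:]]      # e, l, l, o (indices)
```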
Example: test time
At test time, the output at each time step is fed back as the input to the next step.
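A minimal sketch of that test-time loop, reusing the hypothetical rnn_step, one_hot and vocab helpers from the earlier sketches:

```python
import numpy as np

def sample(seed_char, h, params, n_chars):
    """Generate characters by feeding each sampled output back in as the next input."""
    W_xh, W_hh, W_hy, b_h, b_y = params
    x, out = one_hot(seed_char), [seed_char]
    for _ in range(n_chars):
        h, y = rnn_step(x, h, W_xh, W_hh, W_hy, b_h, b_y)
        p = np.exp(y) / np.sum(np.exp(y))        # softmax over the 4-char vocabulary
        ix = np.random.choice(len(vocab), p=p)   # sample the next character
        x = one_hot(vocab[ix])
        out.append(vocab[ix])
    return ''.join(out)
```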
Image captioning
An example from the computer vision world:
Image captioning at test time
We’ve seen Wxh, Whh before… v is the output of the CNN:
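A sketch of the modified recurrence the slide is pointing at: the CNN feature v enters the hidden-state update through an extra weight matrix (named W_ih here purely for illustration):

```python
import numpy as np

def caption_rnn_step(x, h_prev, v, W_xh, W_hh, W_ih, W_hy, b_h, b_y):
    """Vanilla step conditioned on the CNN image feature v."""
    h = np.tanh(W_xh @ x + W_hh @ h_prev + W_ih @ v + b_h)
    y = W_hy @ h + b_y        # scores over the word vocabulary
    return h, y
```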
Image captioning: in action
Image captioning: failure
Image captioning with attention
Focus on different parts of the image:
The CNN outputs L feature vectors, one for each spatial location in the image,
instead of a single vector for the entire image.
Image captioning with attention
In addition to the distribution over the vocabulary, the RNN now produces an output
that indicates where to give more ATTENTION in the image (that is, which of the
L locations to weight most heavily).
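A sketch of soft attention over the L locations; the dot-product scoring function below is an assumption, since the slides do not spell out how the attention weights are computed:

```python
import numpy as np

def soft_attention(h, features):
    """features: (L, D) array, one D-dim CNN vector per spatial location."""
    scores = features @ h                          # (L,) one score per location
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax: where to attend
    context = weights @ features                   # (D,) weighted sum over locations
    return context, weights
```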
Image captioning with attention
You can see the attention weights over the image locations change at each step.
Another use of RNNs with attention
Visual question answering (http://www.visualqa.org):
Vanilla RNN Gradient Flow
Backpropagating from h_t to h_{t-1} multiplies the gradient by W_hh (transposed), so flowing back through many time steps means repeated multiplication by the same matrix: the gradient tends to explode when the largest singular value of W_hh is greater than 1 and to vanish when it is less than 1.
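A toy illustration (not from the slides) of that behaviour: after T steps the gradient norm grows or shrinks roughly like the largest singular value of W_hh raised to the power T:

```python
import numpy as np

rng = np.random.default_rng(0)
H, T = 64, 50
grad = rng.normal(size=H)

for scale in (0.5, 1.5):                       # largest singular value below / above 1
    W_hh = scale * np.linalg.qr(rng.normal(size=(H, H)))[0]  # scaled orthogonal matrix
    g = grad.copy()
    for _ in range(T):
        g = W_hh.T @ g                         # ignoring the tanh factor for clarity
    print(scale, np.linalg.norm(g))            # norm ~ scale**50: vanishes vs explodes
```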
Long Short-Term Memory (LSTM)
• Forget gate layer
• Input gate layer
• The cell state update
• Output gate layer
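A sketch of one LSTM step wiring those four pieces together, in the standard formulation (variable names here are illustrative, not taken from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """W maps [h_prev; x] to the stacked gate pre-activations (4H rows)."""
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])          # forget gate: what to erase from the cell
    i = sigmoid(z[H:2*H])        # input gate: what to write
    g = np.tanh(z[2*H:3*H])      # candidate cell values
    o = sigmoid(z[3*H:4*H])      # output gate: what to reveal
    c = f * c_prev + i * g       # additive cell-state update
    h = o * np.tanh(c)
    return h, c
```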
Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM): Gradient Flow
The additive update of the cell state gives the gradient an uninterrupted path backwards through time, which is what improves gradient flow compared to the vanilla RNN.
Summary
• RNNs allow a lot of flexibility in architecture design
• Vanilla RNNs are simple but don’t work very well
• Common to use LSTM or GRU: their additive interactions improve gradient flow
• Backward flow of gradients in an RNN can explode or vanish. Exploding is controlled with gradient clipping; vanishing is controlled with additive interactions (LSTM)
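A minimal sketch of the gradient clipping mentioned above: rescale the gradient whenever its global norm exceeds a chosen threshold (the threshold value here is arbitrary):

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```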
