Recurrent Neural Networks: Azmi Haider Muhammad Salamah
RNN: Process sequences
One to one: “Vanilla” Neural network
Vocabulary = [ h, e, l, o]
Example: training on the sequence “hello”.
Each input is a character from the training sequence.
Example: test time.
At test time, the output at each time step is fed back as the input to the next step.
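The test-time loop above can be sketched as a minimal character-level vanilla RNN sampler. This is illustrative only: the weights are random here (in practice they come from training on sequences like “hello”), and the hidden size is an assumption.

```python
import numpy as np

# Minimal sketch: sampling from a character-level vanilla RNN, where the
# output at each step is fed back as the next input. Weights are random
# for illustration; H (hidden size) is an assumed value.
vocab = ['h', 'e', 'l', 'o']
V, H = len(vocab), 8

rng = np.random.default_rng(0)
Wxh = rng.normal(0, 0.1, (H, V))   # input  -> hidden
Whh = rng.normal(0, 0.1, (H, H))   # hidden -> hidden
Why = rng.normal(0, 0.1, (V, H))   # hidden -> output scores

def sample(seed_char, n_steps):
    """Generate n_steps characters, feeding each output back as input."""
    x = np.zeros(V); x[vocab.index(seed_char)] = 1.0   # one-hot input
    h = np.zeros(H)
    out = [seed_char]
    for _ in range(n_steps):
        h = np.tanh(Wxh @ x + Whh @ h)                 # recurrence
        scores = Why @ h
        p = np.exp(scores) / np.exp(scores).sum()      # softmax over vocab
        idx = rng.choice(V, p=p)                       # sample next character
        out.append(vocab[idx])
        x = np.zeros(V); x[idx] = 1.0                  # output becomes next input
    return ''.join(out)

print(sample('h', 5))
```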
Image captioning
An example from the computer vision world:
Image captioning at test time
We’ve seen Wxh , Whh before… v is the output of the CNN:
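One common way to condition the recurrence on the image is to add a third term that injects the CNN feature v at each step. A sketch under that assumption (the matrix name Wih and all dimensions are illustrative, not from the slides):

```python
import numpy as np

# Sketch: the vanilla-RNN update h = tanh(Wxh x + Whh h) gains an extra
# term for the CNN image feature v. Wih is an assumed name for that
# projection; V, H, D are illustrative sizes.
rng = np.random.default_rng(0)
V, H, D = 10, 16, 512            # vocab size, hidden size, CNN feature size
Wxh = rng.normal(0, 0.1, (H, V))
Whh = rng.normal(0, 0.1, (H, H))
Wih = rng.normal(0, 0.1, (H, D)) # projects the image feature v

v = rng.normal(size=D)           # output of the CNN for one image
h = np.zeros(H)
x = np.zeros(V); x[0] = 1.0      # one-hot start token (assumed)

h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)   # image-conditioned recurrence
print(h.shape)
```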
Image captioning: in action
Image captioning: failure
Image captioning with attention
Focus on different parts of the image:
Instead of one vector for the entire image, the CNN outputs a series of L vectors, one for each spatial location in the image.
In addition to the vocabulary output, the network now produces an output that indicates where to pay more attention in the image (i.e., which of the L locations to weight).
At each step you can see the location (attention) weights over the image.
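The attention step can be sketched as follows: score each of the L location vectors against the current hidden state, softmax the scores into a distribution over locations, and take the weighted sum as a context vector. All names and sizes here are assumptions for illustration.

```python
import numpy as np

# Minimal soft-attention sketch (names and sizes assumed): pick which of
# the L spatial locations to attend to, given the current hidden state.
rng = np.random.default_rng(0)
L, D, H = 49, 512, 256            # 7x7 grid of D-dim features, hidden size H
feats = rng.normal(size=(L, D))   # one CNN vector per spatial location
h = rng.normal(size=H)            # current RNN hidden state
Wa = rng.normal(0, 0.01, (D, H))  # projects h into feature space (assumed)

scores = feats @ (Wa @ h)          # (L,) one score per location
weights = np.exp(scores - scores.max())
weights /= weights.sum()           # softmax: the "where to look" distribution
context = weights @ feats          # (D,) attention-weighted image summary
```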
Another use of RNN with
attention
Visual question answering (http://www.visualqa.org):
Vanilla RNN Gradient Flow
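The problem these slides illustrate: backpropagating through h_t = tanh(Whh h_{t-1} + Wxh x_t) multiplies the upstream gradient by Whh^T at every time step, so its norm scales geometrically with the largest singular value of Whh. A small numeric sketch (tanh' omitted for clarity; the matrix is built orthogonal-times-scale so the effect is exact):

```python
import numpy as np

# Gradient flow through a vanilla RNN: repeated multiplication by Whh^T
# makes the gradient norm shrink or grow geometrically over T steps.
rng = np.random.default_rng(0)
H, T = 64, 50
g0 = rng.normal(size=H); g0 /= np.linalg.norm(g0)  # unit upstream gradient
Q = np.linalg.qr(rng.normal(size=(H, H)))[0]       # orthogonal basis

norms = {}
for scale in (0.9, 1.1):
    W = scale * Q            # all singular values equal `scale`
    grad = g0.copy()
    for _ in range(T):
        grad = W.T @ grad    # one backprop step through the recurrence
    norms[scale] = np.linalg.norm(grad)

print(norms)  # scale 0.9 -> ~0.005 (vanishes), scale 1.1 -> ~117 (explodes)
```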
Long Short-Term Memory (LSTM)
Forget gate layer
Input gate layer
The current state
Output layer
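The four pieces above fit together in one LSTM step. A sketch under common conventions (the stacked-weight layout and gate ordering are assumptions; they vary across implementations):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step (a sketch; stacked-weight layout is an assumption).
    i: input gate, f: forget gate, o: output gate, g: candidate values."""
    H = h.shape[0]
    z = W @ np.concatenate([x, h]) + b   # all four gates in one matmul
    i = sigmoid(z[0*H:1*H])              # input gate: how much of g to write
    f = sigmoid(z[1*H:2*H])              # forget gate: how much of c to keep
    o = sigmoid(z[2*H:3*H])              # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])              # candidate cell values
    c_next = f * c + i * g               # current state: forget old + write new
    h_next = o * np.tanh(c_next)         # output
    return h_next, c_next

# toy usage with random weights
rng = np.random.default_rng(0)
D, H = 4, 8
W = rng.normal(0, 0.1, (4 * H, D + H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```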
Long Short-Term Memory (LSTM): Gradient Flow
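The key point of the LSTM gradient-flow slides: through c_t = f * c_{t-1} + i * g, the local gradient dc_t/dc_{t-1} is just the forget gate f applied elementwise, with no repeated multiplication by a full weight matrix. A tiny numeric sketch (the forget-gate value 0.95 is illustrative):

```python
import numpy as np

# Gradient through the LSTM cell-state path: each backward step multiplies
# elementwise by the forget gate f, not by a weight matrix. With f near 1,
# gradients survive many time steps instead of vanishing.
f = np.full(4, 0.95)   # forget gate near 1 (illustrative value)
grad = np.ones(4)      # upstream gradient on the cell state
for _ in range(50):
    grad = f * grad    # 50 steps back through the cell-state path
print(grad[0])         # 0.95**50 ~ 0.077: decays slowly, doesn't vanish
```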
RNNs allow a lot of flexibility in architecture design.
Vanilla RNNs are simple but don’t work very well.
It is common to use LSTM or GRU: their additive interactions improve gradient flow.
The backward flow of gradients in an RNN can explode or vanish. Exploding is controlled with gradient clipping; vanishing is controlled with additive interactions (LSTM).
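Gradient clipping is a one-liner in practice: if the gradient's norm exceeds a threshold, rescale it to that threshold. A sketch (the threshold 5.0 is an assumed, commonly used value, not from the slides):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale grad to L2 norm max_norm if it is larger (threshold assumed)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.full(100, 10.0)                    # exploding gradient, norm 100
print(np.linalg.norm(clip_gradient(g)))   # clipped down to 5.0
```

Clipping controls explosion but not vanishing, which is why the additive LSTM/GRU interactions are still needed.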