RNN & LSTM: Nguyen Van Vinh, Computer Science Department, UET, VNU Hanoi
Why not standard NN?
What is RNN?
Applications:
+ Language translation.
+ Natural language processing (NLP).
+ Speech recognition.
+ Image captioning.
Types of RNN
Recurrent Neural Network Cell
[Figure: an RNN cell takes the previous hidden state h_0 and the input x_1, and produces a new hidden state h_1 and an output y_1.]
h_1 = tanh(W_hh h_0 + W_hx x_1)
y_1 = softmax(W_hy h_1)
Example: over the vocabulary (a, b, c, d, e), the one-hot input x_1 = [0 0 1 0 0] encodes "c". The output y_1 = [0.1, 0.05, 0.05, 0.1, 0.7] is a distribution over the vocabulary that assigns the highest probability (0.7) to "e".
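To make the cell concrete, here is a minimal NumPy sketch of the two equations above. The sizes (a 5-symbol vocabulary and a 3-dimensional hidden state) and the weight initialization are illustrative assumptions, not values from the slides.

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_hx, W_hy):
    """One RNN cell step: h = tanh(W_hh h_prev + W_hx x), y = softmax(W_hy h)."""
    h = np.tanh(W_hh @ h_prev + W_hx @ x)
    logits = W_hy @ h
    y = np.exp(logits - logits.max())   # subtract max for numerical stability
    y /= y.sum()                        # softmax over the vocabulary
    return h, y

vocab_size, hidden_size = 5, 3          # vocabulary (a, b, c, d, e), as on the slide
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hx = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_size))

h0 = np.zeros(hidden_size)
x1 = np.array([0, 0, 1, 0, 0], dtype=float)   # one-hot encoding of "c"
h1, y1 = rnn_step(h0, x1, W_hh, W_hx, W_hy)
print(y1)                                     # distribution over (a, b, c, d, e)
```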
(Unrolled) Recurrent Neural Network
[Figure: the RNN unrolled over three time steps. The inputs x_1, x_2, x_3 are the characters "c", "a", "t"; the hidden states h_1, h_2, h_3 are passed from step to step; the outputs y_1, y_2, y_3 predict the next characters "a", "t", "<space>".]
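Unrolling adds no new parameters: the same cell (same W_hh, W_hx, W_hy) is applied at every step while the hidden state is carried forward. A minimal self-contained sketch of the character example above, with an illustrative vocabulary and hidden size:

```python
import numpy as np

# Hypothetical character vocabulary for the "cat" example on the slide.
vocab = ['c', 'a', 't', ' ']
V, H = len(vocab), 8                 # vocabulary and hidden sizes (illustrative)

rng = np.random.default_rng(1)
W_hh = rng.normal(scale=0.1, size=(H, H))
W_hx = rng.normal(scale=0.1, size=(H, V))
W_hy = rng.normal(scale=0.1, size=(V, H))

def one_hot(ch):
    x = np.zeros(V)
    x[vocab.index(ch)] = 1.0
    return x

h = np.zeros(H)                      # h_0
for t, ch in enumerate("cat"):
    # Same weights at every step; only the hidden state h is carried along.
    h = np.tanh(W_hh @ h + W_hx @ one_hot(ch))
    logits = W_hy @ h
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # y_t is a distribution over the next character ("a", "t", " " are the targets).
    print(f"step {t + 1}: input '{ch}' ->", dict(zip(vocab, probs.round(2))))
```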
(Unrolled) Recurrent Neural Network
[Figure: the same unrolling at the word level; the outputs y_1, y_2, y_3 predict the words "cat", "likes", "eating".]
Bidirectional Recurrent Neural Network
[Figure: a bidirectional RNN (BRNN) over the input characters "c", "a", "t". One chain of hidden states h_1, h_2, h_3 runs forward over x_1, x_2, x_3 and a second chain runs backward; the two states at each step are combined to produce the outputs y_1, y_2, y_3.]
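A minimal sketch of the bidirectional idea (illustrative sizes and weights, not the slides' exact construction): run one state chain forward and one backward, then combine the two states at each time step.

```python
import numpy as np

V, H, T = 4, 3, 3
rng = np.random.default_rng(2)
Wf_hh, Wb_hh = rng.normal(scale=0.1, size=(2, H, H))   # forward / backward weights
Wf_hx, Wb_hx = rng.normal(scale=0.1, size=(2, H, V))
xs = [rng.normal(size=V) for _ in range(T)]            # x_1 .. x_3

hf, hb = np.zeros(H), np.zeros(H)
forward, backward = [], []
for t in range(T):                                     # forward pass over x_1 .. x_T
    hf = np.tanh(Wf_hh @ hf + Wf_hx @ xs[t])
    forward.append(hf)
for t in reversed(range(T)):                           # backward pass over x_T .. x_1
    hb = np.tanh(Wb_hh @ hb + Wb_hx @ xs[t])
    backward.append(hb)
backward.reverse()                                     # align backward states with time

# Each y_t would be computed from the combined state [h_forward; h_backward].
combined = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
print(combined[0].shape)                               # (2 * H,) = (6,)
```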
Training RNNs is hard
The same matrix W_hh is multiplied in at every time step during forward propagation.
Ideally, inputs from many time steps ago should still be able to modify the output y.
As an illustration, take an RNN unrolled over just two time steps.
The vanishing/exploding gradient problem
The same matrix is also multiplied in at every time step during backpropagation: by the chain rule, the gradient that reaches time step t - k contains a product of k Jacobians dh_j/dh_{j-1}, each of which includes a factor of W_hh.
Remember: if the singular values of W_hh are mostly below 1, this product shrinks exponentially with k (vanishing gradient); if they are mostly above 1, it grows exponentially (exploding gradient). Either way, a plain RNN struggles to learn long-distance dependencies.
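A small numeric illustration of this (my own sketch, not from the slides): treating the per-step Jacobian as a fixed matrix with singular values s, and ignoring the tanh derivative for clarity, backpropagation through k steps scales the gradient norm by s^k.

```python
import numpy as np

H, k = 10, 50                       # hidden size and number of time steps

for s in (0.9, 1.0, 1.1):
    W_hh = s * np.eye(H)            # toy recurrent matrix with all singular values = s
    grad = np.ones(H)               # stand-in for dLoss/dh_T
    for _ in range(k):
        grad = W_hh.T @ grad        # one factor of W_hh per backprop step
    print(f"s={s}: |grad| after {k} steps = {np.linalg.norm(grad):.3e}")

# s=0.9 -> 1.6e-02 (vanishing), s=1.0 -> 3.2e+00, s=1.1 -> 3.7e+02 (exploding)
```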
Architecture of LSTM cell
[Figure: the LSTM cell, built up gate by gate over several slides. A forget gate, an input gate, and an output gate regulate a separate cell state c_t that runs alongside the hidden state h_t from step to step.]
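The slide figures are not reproduced here, so for reference this is a minimal NumPy sketch of the standard LSTM cell equations (forget gate f, input gate i, candidate content g, output gate o). The layout of the stacked weight matrix and all sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W maps the concatenated [h_prev; x] to the four gate pre-activations,
    stacked as (f, i, g, o); b is the matching bias vector.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0 * H:1 * H])      # forget gate: what to keep of c_prev
    i = sigmoid(z[1 * H:2 * H])      # input gate: how much new content to write
    g = np.tanh(z[2 * H:3 * H])      # candidate cell content
    o = sigmoid(z[3 * H:4 * H])      # output gate: what to expose as h
    c = f * c_prev + i * g           # additive cell-state update
    h = o * np.tanh(c)
    return h, c

# Illustrative sizes: 4-dimensional input, 3-dimensional hidden/cell state.
X, H = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, H + X))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=X), np.zeros(H), np.zeros(H), W, b)
```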
How can the LSTM solve the vanishing gradient problem?
The cell state is updated additively, c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t. The gradient flowing back along the cell state is therefore scaled by the forget gate f_t at each step rather than repeatedly multiplied by W_hh, and while the forget gate stays close to 1, gradients can pass through many time steps almost unchanged.
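As a worked comparison (a standard derivation, not taken from the slides), the per-step Jacobians along the two recurrence paths are:

```latex
% Plain RNN, h_t = tanh(W_hh h_{t-1} + W_hx x_t):
\frac{\partial h_t}{\partial h_{t-k}}
  = \prod_{j=t-k+1}^{t} \operatorname{diag}\bigl(1 - h_j \odot h_j\bigr)\, W_{hh}
  \qquad \text{one factor of } W_{hh} \text{ per step}

% LSTM cell state, c_t = f_t \odot c_{t-1} + i_t \odot g_t,
% ignoring the gates' own dependence on h_{t-1}:
\frac{\partial c_t}{\partial c_{t-k}}
  \approx \prod_{j=t-k+1}^{t} \operatorname{diag}(f_j)
  \qquad \text{close to } I \text{ while the forget gates stay near } 1
```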
LSTM Variations (GRU)
● Gated Recurrent Unit (GRU)
- Combines the forget and input gates into a single "update gate"
- Merges the cell state and the hidden state
- Simpler than the LSTM (see the sketch below).
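For comparison with the LSTM sketch above, a minimal NumPy sketch of the standard GRU update (update gate z, reset gate r); names and sizes are again illustrative assumptions.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h_prev, Wz, Wr, Wh):
    """One step of a standard GRU cell; each W maps [h_prev; x] to size H."""
    hx = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ hx)                        # update gate (merged forget + input)
    r = sigmoid(Wr @ hx)                        # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x]))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde     # single state: no separate cell

# Illustrative sizes: 4-dimensional input, 3-dimensional hidden state.
X, H = 4, 3
rng = np.random.default_rng(0)
Wz, Wr, Wh = (rng.normal(scale=0.1, size=(H, H + X)) for _ in range(3))
h1 = gru_step(rng.normal(size=X), np.zeros(H), Wz, Wr, Wh)
```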
Compare LSTM vs. GRU
- GRUs train faster and perform better than LSTMs with less training data when doing language modeling (not sure about other tasks).
- GRUs are simpler and thus easier to modify, for example by adding new gates when the network receives additional input. It is just less code in general.
- LSTMs should in theory remember longer sequences than GRUs and outperform them in tasks requiring the modeling of long-distance relations.
Successful Applications of LSTMs
- Speech recognition: language and acoustic modeling
- Sequence labeling
  + POS tagging: https://www.aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art)
  + NER
  + Phrase chunking
- Neural syntactic and semantic parsing
- Image captioning: CNN output vector to sequence
- Sequence to sequence
  + Machine translation (Sutskever, Vinyals, & Le, 2014)
  + Video captioning (input: sequence of CNN frame outputs)
Summary
- Recurrent Neural Networks are one of the best deep NLP model families.
- The most important and powerful RNN extensions are the LSTM and the GRU.
Questions and Discussion!