Understanding LSTM Networks
[Figure: an RNN unrolled over inputs x_0 … x_4, illustrating long-term dependencies]
Example: a language model trying to predict the next word based on the previous ones.
[Figure: a standard RNN unrolled through time, with inputs x_0 … x_t, repeated cell A, and hidden states h_0 … h_t]
Standard RNN
Backpropagation Through Time (BPTT)
RNN forward pass:
$s_t = \tanh(U x_t + W s_{t-1})$
$\hat{y}_t = \mathrm{softmax}(V s_t)$
[Figure: unrolled RNN with the weight matrices U, V, W shared across every time step]
The total loss sums the per-step cross-entropy losses:
$E(y, \hat{y}) = \sum_t E_t(y_t, \hat{y}_t)$, with $E_t(y_t, \hat{y}_t) = -y_t \log \hat{y}_t$
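A minimal sketch of this forward pass and loss (NumPy; the vocabulary size, hidden size, initialization, and toy sequence are all made up for illustration):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical sizes: vocabulary of 8 one-hot words, hidden state of 4.
rng = np.random.default_rng(0)
vocab, hidden = 8, 4
U = rng.normal(scale=0.1, size=(hidden, vocab))   # input  -> hidden
W = rng.normal(scale=0.1, size=(hidden, hidden))  # hidden -> hidden
V = rng.normal(scale=0.1, size=(vocab, hidden))   # hidden -> output

def forward(xs, ys):
    """xs, ys: word indices (inputs and next-word targets)."""
    s, loss = np.zeros(hidden), 0.0               # s_{-1} = 0
    for x_t, y_t in zip(xs, ys):
        x = np.zeros(vocab)
        x[x_t] = 1.0                              # one-hot input
        s = np.tanh(U @ x + W @ s)                # s_t = tanh(U x_t + W s_{t-1})
        y_hat = softmax(V @ s)                    # y^_t = softmax(V s_t)
        loss += -np.log(y_hat[y_t])               # E_t = -y_t log y^_t
    return loss

print(forward([0, 3, 1], [3, 1, 2]))              # total loss E(y, y^)
```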
Backpropagation Through Time
$\frac{\partial E}{\partial W} = \sum_t \frac{\partial E_t}{\partial W}$
Applying the chain rule at, for example, $t = 3$:
$\frac{\partial E_3}{\partial W} = \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial s_3} \frac{\partial s_3}{\partial W}$
But $s_3 = \tanh(U x_3 + W s_2)$ also depends on $W$ through $s_2$ (and, recursively, through every earlier state), so:
$\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial s_3} \frac{\partial s_3}{\partial s_k} \frac{\partial s_k}{\partial W}$
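Each factor $\frac{\partial s_j}{\partial s_{j-1}}$ in that sum can be written out from the recurrence (a standard step, filled in here): since $s_j = \tanh(U x_j + W s_{j-1})$,

$\frac{\partial s_j}{\partial s_{j-1}} = \mathrm{diag}(1 - s_j^2)\, W$

where $1 - s_j^2$ is the tanh derivative, which is at most 1. This is what drives the vanishing gradient problem on the next slide.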
The Vanishing Gradient Problem
$\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial s_3} \frac{\partial s_3}{\partial s_k} \frac{\partial s_k}{\partial W}$
Expanding $\frac{\partial s_3}{\partial s_k}$ as a product of step-to-step Jacobians:
$\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial s_3} \left( \prod_{j=k+1}^{3} \frac{\partial s_j}{\partial s_{j-1}} \right) \frac{\partial s_k}{\partial W}$
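A small numeric sketch of the effect (illustrative: $W$ is scaled so its spectral norm is below 1, which together with $|\tanh'| \le 1$ forces the product of Jacobians toward zero; a large $W$ can instead make it explode):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4
W = rng.normal(size=(hidden, hidden))
W *= 0.9 / np.linalg.norm(W, 2)         # rescale: spectral norm of W is 0.9
s = rng.uniform(-1, 1, size=hidden)     # a generic hidden state

J = np.eye(hidden)                      # running product of Jacobians
for t in range(1, 21):
    s = np.tanh(W @ s)                  # evolve the state (inputs omitted)
    J = np.diag(1 - s**2) @ W @ J       # chain one more ds_j / ds_{j-1}
    if t % 5 == 0:
        print(t, np.linalg.norm(J, 2))  # the norm decays toward zero
```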
[Figure: LSTM memory cell. The self-recurrent edge carrying $C_{t-1}$ to the next time step has its weight fixed at 1; the cell's input and output connections are each multiplied (Π) by a sigmoid (σ) gate, $i_t$ and $o_t$, computed from the previous time step and the current input. Cell update: $C_t = \tilde{C}_t \cdot i_t + C_{t-1}$, where $\tilde{C}_t$ is the candidate input; cell output: $C_t \cdot o_t$]
Input gate
● Uses contextual information to decide whether to store the input into memory
● Protects the memory from being overwritten by irrelevant inputs
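A minimal sketch of this gated write (NumPy; the weight names W_i, U_i, b_i, W_c, U_c, b_c are the conventional ones, not taken from the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# i_t in (0, 1) is computed from the current input and previous state, and
# decides how much of the candidate C~_t gets written into memory;
# irrelevant inputs are gated down toward 0 and leave C_{t-1} untouched.
def input_gate_write(x_t, h_prev, C_prev, W_i, U_i, b_i, W_c, U_c, b_c):
    i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)      # input gate
    C_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)  # candidate values
    return C_tilde * i_t + C_prev                      # C_t = C~_t·i_t + C_{t-1}
```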
Output gate
● Uses contextual information to decide whether to access the information in memory
● Blocks irrelevant information
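The matching gated read (same conventions; the diagram emits the plain product $C_t \cdot o_t$, though many formulations squash the cell state with tanh first):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# o_t in (0, 1) decides how much of the memory to expose downstream,
# blocking whatever is irrelevant to the current step.
def output_gate_read(x_t, h_prev, C_t, W_o, U_o, b_o):
    o_t = sigmoid(W_o @ x_t + U_o @ h_prev + b_o)  # output gate
    return C_t * o_t          # gated read; variants use tanh(C_t) * o_t
```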
Forget or reset gate
[Figure: the same memory cell with a forget gate $f_t$ (σ) added on the self-recurrent edge (weight fixed at 1), letting the cell reset its own state. Cell update: $C_t = \tilde{C}_t \cdot i_t + C_{t-1} \cdot f_t$; cell output: $C_t \cdot o_t$]
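Written out with all three gates, the standard equations (the weight names $W_*$, $U_*$, $b_*$ are conventional, not taken from the slides; the forget gate is the addition of the Gers, Schmidhuber & Cummins paper cited below) are:

$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$
$C_t = \tilde{C}_t \cdot i_t + C_{t-1} \cdot f_t$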
LSTM with four interacting layers
● The cell state: the memory that runs straight through the chain, modified only by a few gate-controlled pointwise interactions
● Gates: each consists of a sigmoid layer and a pointwise multiplication; the sigmoid outputs values between 0 and 1 that control how much of each component to let through
Step-by-Step LSTM Walk Through
● Forget gate layer: decide what information to throw away from the cell state
● Input gate layer: decide which new values to store in the cell state
● The current state: combine the old state $C_{t-1}$ and the candidate values into $C_t$
● Output layer: output a filtered version of the cell state
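Putting the four steps together, a minimal single-step sketch in NumPy, following the equations of the colah post cited below (all parameter names and sizes here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM step; p maps names to weight matrices and biases."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])      # 1. forget gate layer
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])      # 2. input gate layer
    C_tilde = np.tanh(p["W_c"] @ z + p["b_c"])  #    candidate values
    C_t = f_t * C_prev + i_t * C_tilde          # 3. update the cell state
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])      # 4. output layer
    h_t = o_t * np.tanh(C_t)                    #    filtered cell state
    return h_t, C_t

# Toy usage: 3-dim inputs, 5-dim hidden/cell state, random weights.
rng = np.random.default_rng(0)
n_in, n_h = 3, 5
p = {f"W_{g}": rng.normal(scale=0.1, size=(n_h, n_h + n_in)) for g in "fico"}
p |= {f"b_{g}": np.zeros(n_h) for g in "fico"}
h, C = np.zeros(n_h), np.zeros(n_h)
for _ in range(4):                              # run a short sequence
    h, C = lstm_step(rng.normal(size=n_in), h, C, p)
```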
References
● http://colah.github.io/posts/2015-08-Understanding-LSTMs/
● http://www.wildml.com/
● http://nikhilbuduma.com/2015/01/11/a-deep-dive-into-recurrent-neural-networks/
● http://deeplearning.net/tutorial/lstm.html
● https://theclevermachine.files.wordpress.com/2014/09/act-funs.png
● http://blog.terminal.com/demistifying-long-short-term-memory-lstm-recurrent-neural-networks/
● Lipton, Zachary C. and Berkowitz, John, 'A Critical Review of Recurrent Neural Networks for Sequence Learning'
● Hochreiter, Sepp and Schmidhuber, Jürgen (1997), 'Long Short-Term Memory'
● Gers, F. A.; Schmidhuber, J. and Cummins, F. A. (2000), 'Learning to Forget: Continual Prediction with LSTM', Neural Computation 12(10), 2451-2471