RNN & LSTM: Vamsi Krishna B19ME023
Vamsi Krishna
B19ME023
What is RNN?
• RNNs are a class of neural networks that deal with sequential data.
• An RNN is recurrent in nature: it performs the same function for every input in the sequence, while the output for the current input depends on the previous computation.
• To make a decision, it considers the current input together with the output it has learned from the previous inputs.
• RNNs can use their internal state (memory) to process sequences of inputs.
• They are applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
• In other neural networks, all the inputs are independent of each other; in an RNN, the inputs are related to each other.
How does an RNN work?
• First, the network takes X0 from the input sequence and outputs h0, which together with X1 forms the input for the next step. Similarly, h1 together with X2 forms the input for the step after that, and so on. In this way, the network keeps remembering the context while training.
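The loop above can be sketched in a few lines of numpy. This is a minimal illustration under assumed names and sizes (W_xh, W_hh, hidden_size, etc. are not from the slides, and the weights are random, not trained); it only shows how the same function is applied at every step while the hidden state carries context forward.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden
b_h  = np.zeros(hidden_size)

xs = rng.standard_normal((seq_len, input_size))  # X0, X1, ..., X4
h = np.zeros(hidden_size)                        # initial hidden state h_{-1}

for x_t in xs:
    # h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b): the same function at
    # every step; h is both this step's output and the next step's input.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h.shape)  # (4,)
```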
Where the total error/loss is the sum of the losses at each time step:
L = Σt Lt = L1 + L2 + … + LT
Gradient Descent
• The gradient of the total loss can be calculated as the summation of the gradients at each time step: ∂L/∂W = Σt ∂Lt/∂W.
• Using the chain rule of calculus, and the fact that the output at a time step t is a function of the current hidden state of the recurrent unit (which in turn depends on all earlier hidden states), we finally get:
∂Lt/∂W = Σk=1..t (∂Lt/∂yt)(∂yt/∂ht)(∂ht/∂hk)(∂hk/∂W)
Exploding and Vanishing Gradients
• Although the basic Recurrent Neural Network is fairly effective, it can suffer from a significant problem: for deep networks (or long sequences), back-propagation through time can lead to the following issues.
• Exploding Gradients: the gradients become too large during back-propagation, so the updates assign unjustifiably high importance to some weights. This problem can be handled by clipping (truncating) the gradients.
• Vanishing Gradients: the gradients become very small and tend towards zero. When a gradient tends towards zero, the network stops learning, or takes a very long time to train for deeper networks. This limits a plain RNN to fairly short-range dependencies.
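Gradient clipping, mentioned above as the fix for exploding gradients, can be sketched as follows. This is an illustrative implementation of clipping by global norm (the function name, max_norm value, and toy gradients are assumptions, not from the slides):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    # If the combined norm of all gradients exceeds max_norm,
    # scale every gradient down by the same factor.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

grads = [np.full(3, 100.0), np.full(2, -50.0)]  # pretend "exploded" gradients
clipped = clip_by_global_norm(grads, max_norm=5.0)
total = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(round(total, 4))  # 5.0
```

Scaling all gradients by one shared factor (rather than clipping each element) preserves the direction of the update while bounding its size.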
The problem of long-term Dependencies
• Compare "the clouds are in the sky" (the relevant context, "clouds", is only a few words back) with "I grew up in France… I speak fluent French" (the relevant context, "France", may be many sentences back). Plain RNNs handle the first case well but struggle with the second.
• LSTMs are an upgraded version of RNNs that work well even in very deep networks, and also work for events separated by large time gaps.
LSTM
• Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN,
capable of learning long-term dependencies.
• LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information
for long periods of time is practically their default behavior, not something they struggle to learn.
• All standard RNNs have the form of a chain of repeating modules of a neural network. Each repeating module has a very simple structure, such as a single tanh layer.
• In an LSTM, the repeating module has a more complicated structure, which is what makes it capable of maintaining long-term memory.
Structure of LSTM
• Instead of a single neural-network layer, the repeating module of an LSTM contains four interacting layers, organised into three gates:
1. Forget Gate
2. Input Gate
3. Output Gate
1) Forget Gate
• This gate decides how much of the past information the network should remember.
• It looks at the previous hidden state (ht-1) and the current input xt. The sigmoid function outputs a vector with values ranging from 0 to 1, one value corresponding to each number in the cell state.
Here, the forget gate compares the previous hidden state with the current input. Since the current input is only talking about Ajay, we no longer need the previous information about Ravi; so Ravi is dropped from memory by the forget gate.
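The forget-gate computation described above can be sketched as follows. This is an illustrative snippet under assumed shapes and random weights (W_f, b_f, and the sizes are not from the slides): the sigmoid produces one value in (0, 1) per cell-state entry, and multiplying the previous cell state by it scales each memory entry down accordingly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)
W_f = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
b_f = np.zeros(hidden_size)

h_prev = rng.standard_normal(hidden_size)   # previous hidden state h_{t-1}
x_t    = rng.standard_normal(input_size)    # current input x_t
c_prev = rng.standard_normal(hidden_size)   # previous cell state c_{t-1}

# f_t = sigmoid(W_f @ [h_{t-1}, x_t] + b_f): one keep-fraction per entry
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
c_kept = f_t * c_prev   # entries with f_t near 0 are "forgotten"

print(f_t.shape)  # (4,)
```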
2) Input Gate
• This gate decides how much of the new information should be added to the current cell state.
Ex: Ajay is good at coding. Yesterday I came to know that he is the university topper.
• The input gate analyses the incoming sentence and picks out the important information (that Ajay is the university topper) to store in the cell state.
• There could be many choices for the empty dash; using the stored state, the output gate finally replaces it with Ajay.
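Putting the gates together, one full LSTM cell step can be sketched as below. This is a minimal illustration with assumed names and random, untrained weights (W, b, lstm_step are not from the slides); it shows how the forget, input, and output gates combine to update the cell state c and hidden state h.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W maps [h_{t-1}, x_t] to the stacked pre-activations of all four layers.
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f_t = sigmoid(z[0:H])           # forget gate: how much old memory to keep
    i_t = sigmoid(z[H:2*H])         # input gate: how much new info to admit
    o_t = sigmoid(z[2*H:3*H])       # output gate: how much state to expose
    g_t = np.tanh(z[3*H:4*H])       # candidate values for the cell state
    c_t = f_t * c_prev + i_t * g_t  # updated long-term memory
    h_t = o_t * np.tanh(c_t)        # updated hidden state / output
    return h_t, c_t

rng = np.random.default_rng(2)
H, D = 4, 3
W = rng.standard_normal((4 * H, H + D)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((5, D)):   # run over a short input sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Note how the cell state c is only ever scaled (by f_t) and added to (by i_t * g_t), which is what lets gradients flow across large time gaps.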