Long Short-Term Memory (LSTM) : Kazi Shah Nawaz Ripon - Faculty of Computer Sciences, 20.03.2021
For the next step, we feed the word “time” together with the hidden state from the
previous step.
The RNN now has information on both the words “What” and “time.”
By the final step, the RNN has encoded information from all the words in the previous
steps.
Since the final output was produced from the entire sequence, it is possible to take
that output and pass it to a feed-forward layer to classify an intent.
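The recurrence described above can be sketched in a few lines of NumPy. This is an illustrative toy (the weight names, sizes, and the two-class "intent" head are assumptions, not from the slides): each step mixes the previous hidden state with the current word, so the final hidden state carries information from the whole sequence.

```python
import numpy as np

np.random.seed(0)
hidden_size, embed_size = 4, 3
W_h = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
W_x = np.random.randn(hidden_size, embed_size) * 0.1   # input-to-hidden weights

# Toy embeddings standing in for the words "What" and "time"
sequence = {"What": np.random.randn(embed_size),
            "time": np.random.randn(embed_size)}

h = np.zeros(hidden_size)  # initial hidden state
for word, x in sequence.items():
    # Each step combines the previous hidden state with the current word
    h = np.tanh(W_h @ h + W_x @ x)

# The final hidden state can be fed to a feed-forward layer; here a single
# linear layer over 2 hypothetical intent classes stands in for it.
W_out = np.random.randn(2, hidden_size)
logits = W_out @ h
print(logits.shape)  # (2,)
```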
If you were able to recall all the items, then you have a pretty good working memory.
Ø It is very unlikely that you will remember all these items till tomorrow.
Ø Consider this sentence — The bus, which went to Paris, was full.
Ø To construct this sentence correctly, we need to remember that the subject (the
bus) is singular.
Ø Therefore, towards the end of the sentence, we have to use was.
Ø The same sentence with a plural subject becomes — The buses, which went to
Paris, were full.
Ø We saw how RNNs are good at sequence tasks like language modelling.
Ø The network then has to make the best guess with “is it?”
tanh squishes values to be between -1 and 1; sigmoid squishes values to be between 0 and 1.
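The two squashing ranges can be checked directly in NumPy (sigmoid is written out by hand here, since NumPy has no built-in sigmoid):

```python
import numpy as np

x = np.linspace(-10, 10, 101)
tanh_vals = np.tanh(x)                      # stays strictly inside (-1, 1)
sigmoid_vals = 1.0 / (1.0 + np.exp(-x))     # stays strictly inside (0, 1)

print(tanh_vals.min(), tanh_vals.max())
print(sigmoid_vals.min(), sigmoid_vals.max())
```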
Ø The core concepts of LSTMs are the cell state and its various gates.
1. Forget Gate.
2. Input Gate.
3. Output Gate.
A tanh function ensures that the values stay between -1 and 1, thus regulating the
output of the NN.
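A single LSTM step with the three gates named above can be sketched in NumPy. This is a minimal illustration, not the Keras implementation: the function name, the weight-packing order (forget, input, output, candidate), and the sizes are all assumptions. Note how the output gate and the final tanh regulate what the cell exposes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*units, units + input_dim); b has
    shape (4*units,). Gate packing order is an assumption."""
    units = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b  # one layer over [h_prev, x]
    f = sigmoid(z[:units])                   # forget gate: what to drop from c
    i = sigmoid(z[units:2*units])            # input gate: what new info to add
    o = sigmoid(z[2*units:3*units])          # output gate: what to expose
    g = np.tanh(z[3*units:])                 # candidate values, kept in (-1, 1)
    c = f * c_prev + i * g                   # new cell state
    h = o * np.tanh(c)                       # new hidden state, regulated by tanh
    return h, c

# Toy usage with random weights
rng = np.random.default_rng(0)
units, input_dim = 3, 10
W = rng.standard_normal((4 * units, units + input_dim)) * 0.1
b = np.zeros(4 * units)
h, c = lstm_cell(rng.standard_normal(input_dim),
                 np.zeros(units), np.zeros(units), W, b)
print(h.shape, c.shape)  # (3,) (3,)
```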
Ø Difference: with a fixed batch size of 8, the input array shape becomes (8, 2, 10).
Ø Units in LSTM: the dimension of the hidden state (or the output).
Ø Here, the hidden state (the red circles) has length 2.
Ø The number of units is the number of neurons
connected to the layer holding the concatenated
vector of hidden state and input (the layer holding both
red and green circles below).
Ø Here, there are 2 neurons connected to that layer.
Ø return_sequences tells whether to return the output at each time step instead of only the
final time step.
Ø Here, with return_sequences set to True, the output becomes a 3D array instead of a 2D array.
Ø Now the shape of the output is (8, 2, 3).
Ø The extra dimension in the middle represents the number of time steps.
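The shapes above can be reproduced with a minimal NumPy sketch of an LSTM layer (an illustration of the shape semantics, not Keras itself; the weight initialisation and gate packing are assumptions). With a batch of 8, 2 time steps, 10 features, and 3 units, return_sequences=True yields (8, 2, 3) and return_sequences=False yields (8, 3):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(inputs, units, return_sequences=True):
    """Run a toy LSTM over inputs shaped (batch, timesteps, features)."""
    batch, timesteps, features = inputs.shape
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4 * units, units + features)) * 0.1
    b = np.zeros(4 * units)
    h = np.zeros((batch, units))
    c = np.zeros((batch, units))
    outputs = []
    for t in range(timesteps):
        z = np.concatenate([h, inputs[:, t, :]], axis=1) @ W.T + b
        f, i, o = (sigmoid(z[:, k*units:(k+1)*units]) for k in range(3))
        g = np.tanh(z[:, 3*units:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return h                              # (batch, units): final step only

x = np.random.rand(8, 2, 10)
print(lstm_layer(x, units=3, return_sequences=True).shape)   # (8, 2, 3)
print(lstm_layer(x, units=3, return_sequences=False).shape)  # (8, 3)
```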
Ø The reshape() function in NumPy can be used to reshape the above one-dimensional array
into a three-dimensional array with 1 sample, 10 time steps, and 1 feature.
Ø This function takes one argument: a tuple that defines the new shape of the array.
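The reshape step above looks like this in NumPy (a 1-D array of 10 values is used here as a stand-in for the data on the slide):

```python
import numpy as np

data = np.arange(10)                  # 1-D array of 10 values
reshaped = data.reshape((1, 10, 1))   # (samples, time steps, features)
print(reshaped.shape)  # (1, 10, 1)
```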