
RECURRENT NEURAL NETWORK (RNN)
Presented By: Muhammad Mohsin Zafar
Agenda

 Neural Network
 Common Neural Networks
 Introduction to RNN
 Working of RNN
 Types of RNN
 Advantages
 Disadvantages
Neural Network (NN)

 Also known as artificial neural networks (ANNs) or simulated neural networks (SNNs),
they mimic the way that biological neurons signal to one another
 ANNs are composed of node layers: an input layer, one or more hidden layers, and an
output layer. Each artificial neuron connects to others and has an associated weight and
threshold. If the output of an individual node is above the specified threshold value, that
node is activated, sending data to the next layer of the network

Figure: Neural Network to predict dog breed [2]

[1]
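As a concrete illustration of the weighted sum and threshold described above, here is a minimal sketch of a single artificial neuron in Python; the inputs, weights, bias and threshold are made-up values for the example, not taken from the slides:

    import numpy as np

    def neuron(x, w, b, threshold=0.0):
        # Weighted sum of the inputs plus a bias term
        z = np.dot(w, x) + b
        # The node "fires" (outputs 1) only if the activation exceeds the threshold
        return 1.0 if z > threshold else 0.0

    # Hypothetical example: three inputs with illustrative weights
    x = np.array([0.5, 0.1, 0.9])
    w = np.array([0.4, -0.2, 0.7])
    print(neuron(x, w, b=-0.3))   # prints 1.0, since 0.2 - 0.02 + 0.63 - 0.3 = 0.51 > 0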
Common Neural Networks

 Feed-Forward Neural Network: Used for general regression and classification problems
 Convolutional Neural Network: Used for object detection and image classification
 RNN: Used for speech recognition, voice recognition, time series prediction, and natural language processing

[2]
Introduction to RNN

 A type of Artificial Neural Network that is good at modeling sequential data.
Traditional deep neural networks assume that inputs and outputs are independent
of each other, whereas the output of a Recurrent Neural Network depends on the prior
elements within the sequence. RNNs have an inherent “memory”, as they take
information from prior inputs to influence the current input and output. [3]
 RNNs are called recurrent because they perform the same task for every element
of a sequence, with the output being dependent on the previous computations. [5]
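The “memory” described above can be written as a single recurrence: the hidden state at each time step is a function of the current input and the previous hidden state. This compact form is only a summary, not one of the deck's numbered equations:

    h^{(t)} = f\big(h^{(t-1)},\, x^{(t)};\, \theta\big)

where θ denotes the parameters that are shared across all time steps.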
Introduction to RNN

Why RNNs? Because none of the existing models can handle sequence data
(e.g. natural language data) effectively.
In sequential data:
 Elements can be repeated
 Order of elements matters
 Sequences are of variable (potentially infinite) length
Modeling sequential data requires knowledge of the context of the data, as in time series data.
RNNs solve this problem by memorizing previous inputs with their internal memory
(they remember the context).

Figure: Sequential data [4]

[4]

Figure: Simple RNN [6]

Working of RNN
Architecture
 In Figure 1, x, h, o, L and y represent the input, hidden
state, output, loss, and target value respectively
 The RNN maps an input sequence of x values to a
corresponding sequence of output o values. A loss
L measures the difference between the target output
y and the predicted output o. The RNN also has
input-to-hidden connections parametrized by a
weight matrix U, hidden-to-hidden connections
parametrized by a weight matrix W, and hidden-to-
output connections parametrized by a weight
matrix V [7]

Figure 1 Architecture of RNN [7]


Working of RNN
Forward propagation
 Equations 1, 2, 3 and 4 are derived from Figure 1; they give the forward
propagation for each time step from t = 1 to t = τ:

    a^{(t)} = b + W h^{(t-1)} + U x^{(t)}        → Eq 1
    h^{(t)} = tanh( a^{(t)} )                    → Eq 2
    o^{(t)} = c + V h^{(t)}                      → Eq 3
    ŷ^{(t)} = softmax( o^{(t)} )                 → Eq 4

 h is the hidden layer output and o is the output at time step t
 The hyperbolic tangent tanh() is the activation function for the hidden units (Eq 2)
 The softmax activation function is applied as a post-processing step to obtain the
vector ŷ of normalized probabilities over the output o (Eq 4)
 Parameters are the bias vectors b and c along with the weight matrices U, V
and W, respectively, for input-to-hidden, hidden-to-output and hidden-to-
hidden connections
[8]
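A minimal NumPy sketch of the forward pass in Eqs 1–4; the layer sizes, sequence length and random initialization are illustrative assumptions, not values from the slides:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out, T = 4, 8, 3, 5            # illustrative sizes
    U = rng.normal(0, 0.1, (n_hidden, n_in))         # input-to-hidden weights
    W = rng.normal(0, 0.1, (n_hidden, n_hidden))     # hidden-to-hidden weights
    V = rng.normal(0, 0.1, (n_out, n_hidden))        # hidden-to-output weights
    b, c = np.zeros(n_hidden), np.zeros(n_out)       # bias vectors

    x = rng.normal(size=(T, n_in))                   # an input sequence x^(1)..x^(T)
    h = np.zeros(n_hidden)                           # initial hidden state h^(0)
    for t in range(T):
        a = b + W @ h + U @ x[t]                     # Eq 1
        h = np.tanh(a)                               # Eq 2
        o = c + V @ h                                # Eq 3
        y_hat = softmax(o)                           # Eq 4: probabilities over outputs

Note that the same U, W, V, b and c are reused at every time step, which is the parameter sharing discussed in the Advantages slides.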
Working of RNN
Loss Function
 The total loss for a sequence of x values paired with a sequence of y values is the
sum of the losses over all the time steps.
 If L^{(t)} is the negative log-likelihood of y^{(t)} given x^{(1)}, . . . , x^{(t)}, then

    L( {x^{(1)}, ..., x^{(τ)}}, {y^{(1)}, ..., y^{(τ)}} )            → Eq 5
      = Σ_t L^{(t)}                                                  → Eq 6
      = − Σ_t log p_model( y^{(t)} | x^{(1)}, ..., x^{(t)} )         → Eq 7

 In Eq 7, p_model( y^{(t)} | x^{(1)}, ..., x^{(t)} ) is given by reading the entry for
y^{(t)} from the output vector ŷ^{(t)}

[8]
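A small, self-contained sketch of the loss in Eqs 5–7: the total loss is the sum over time steps of the negative log probability that the softmax output assigns to the true target. The probability vectors and target indices below are hypothetical example values:

    import numpy as np

    def sequence_nll(y_hat_seq, y_seq):
        # Eqs 5-7: sum over time steps of -log p_model(y^(t) | x^(1..t)),
        # where the probability is read from the softmax output vector y_hat^(t)
        return -sum(np.log(y_hat[y]) for y_hat, y in zip(y_hat_seq, y_seq))

    # Hypothetical example: three time steps, three output classes
    y_hat_seq = [np.array([0.7, 0.2, 0.1]),
                 np.array([0.1, 0.8, 0.1]),
                 np.array([0.3, 0.3, 0.4])]
    y_seq = [0, 1, 2]
    print(sequence_nll(y_hat_seq, y_seq))   # -(log 0.7 + log 0.8 + log 0.4) ≈ 1.50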
Working of RNN
Back Propagation for parameter optimization
 The back-propagation algorithm is applied to the RNN
 Backpropagation is done at each point in time. At timestep T, the
derivative of the loss L with respect to the weight matrices (W, V and
U) is expressed as follows (as sketched below):
For the hidden-to-hidden weight W
For the hidden-to-output weight V

[9] Figure 1 Architecture of RNN [7]
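One standard way to write these gradients for a tanh hidden unit, following the notation of Eqs 1–4 and the treatment in [8]; this is a reconstruction under that assumption, not necessarily the exact form shown on the slide:

    % Hidden-to-output weight V
    \nabla_V L = \sum_{t} \big(\nabla_{o^{(t)}} L\big)\, {h^{(t)}}^{\top}

    % Hidden-to-hidden weight W
    \nabla_W L = \sum_{t} \operatorname{diag}\!\big(1 - (h^{(t)})^{2}\big)\,
                 \big(\nabla_{h^{(t)}} L\big)\, {h^{(t-1)}}^{\top}

The diag(1 − (h^{(t)})²) factor is the derivative of the tanh activation in Eq 2.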


Working of RNN
Back Propagation for parameter optimization
 Backpropagation is done at each point in time. At timestep T, the
derivative of the loss L with respect to the weight matrices (W, V and
U) and the biases (b and c) is expressed as follows (as sketched below):
For the input-to-hidden weight U
For the biases b and c (from equations 1 and 3)

Figure 1 Architecture of RNN [7]

[9]
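Again following the notation of Eqs 1–4 and [8], one standard form of these gradients (a reconstruction, not necessarily the slide's exact notation) is:

    % Input-to-hidden weight U
    \nabla_U L = \sum_{t} \operatorname{diag}\!\big(1 - (h^{(t)})^{2}\big)\,
                 \big(\nabla_{h^{(t)}} L\big)\, {x^{(t)}}^{\top}

    % Bias b of Eq 1 (hidden units) and bias c of Eq 3 (output units)
    \nabla_b L = \sum_{t} \operatorname{diag}\!\big(1 - (h^{(t)})^{2}\big)\, \nabla_{h^{(t)}} L
    \qquad
    \nabla_c L = \sum_{t} \nabla_{o^{(t)}} L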
Types of RNN

One to One RNN

 Also known as the Vanilla Neural Network
 Used for general machine learning problems that have a single input and a single output

[2]

Figure: One to One [2]
One to Many RNN

 A single input and multiple outputs
 Image captioning is an example of this

[2]

Figure: One to Many [2]


Many to One RNN

 Takes a sequence of inputs and generates a single output
 Sentiment analysis is a good example of this kind of network, where a given
sentence can be classified as expressing positive or negative sentiment

[2]

Figure: Many to One [2]


Many to Many RNN

 Takes a sequence of inputs and generates a sequence of outputs
 Machine translation is one example

[2]

Figure: Many to Many [2]
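To summarize the four patterns, a small illustrative Python table of their input/output shapes; the symbols T_in, T_out and d (time steps in, time steps out, feature size) and the task pairings are assumptions that simply restate the examples above:

    # Illustrative input/output shapes for the four RNN patterns
    one_to_one   = {"input": "(1, d)",    "output": "(1, d)"}      # e.g. plain classification
    one_to_many  = {"input": "(1, d)",    "output": "(T_out, d)"}  # e.g. image captioning
    many_to_one  = {"input": "(T_in, d)", "output": "(1, d)"}      # e.g. sentiment analysis
    many_to_many = {"input": "(T_in, d)", "output": "(T_out, d)"}  # e.g. machine translation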


Advantages
Handles Variable-Length Sequences
 RNNs are designed to handle input sequences of variable length, which makes them well-
suited for tasks such as speech recognition, natural language processing, and time series
analysis
Memory Of Past Inputs
 Have a memory of past inputs, which allows them to capture information about the context
of the input sequence. This makes them useful for tasks such as language modeling, where
the meaning of a word depends on the context in which it appears.
[2]
Advantages
Parameter Sharing
 RNNs share the same set of parameters across all time steps, which reduces the number of
parameters that need to be learned and can lead to better generalization
Non-Linear Mapping
 RNNs use non-linear activation functions, which allows them to learn complex, non-linear
mappings between inputs and outputs
Sequential Processing
 RNNs process input sequences step by step, which matches the natural structure of
time-ordered data such as speech and text
[2]
Advantages
Flexibility
 RNNs can be adapted to a wide range of tasks and input types, including text, speech, and
image sequences
Improved Accuracy
 RNNs have been shown to achieve state-of-the-art performance on a variety of sequence
modeling tasks, including language modeling, speech recognition, and machine translation
[2]
Disadvantages
Vanishing And Exploding Gradients
 RNNs can suffer from the problem of vanishing or exploding gradients, which can make it
difficult to train the network effectively. This occurs when the gradients of the loss
function with respect to the parameters become very small or very large as they propagate
through time
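A small numerical sketch of why this happens: during backpropagation through time the gradient is repeatedly multiplied by the transpose of the recurrent weight matrix, so its norm shrinks or grows geometrically with the number of time steps. The matrices below are toy examples, and the activation-derivative terms are omitted for simplicity:

    import numpy as np

    def gradient_norms(W, steps=50):
        # Repeatedly multiply a gradient vector by W^T, as backpropagation
        # through time does once per earlier time step
        g = np.ones(W.shape[0])
        norms = []
        for _ in range(steps):
            g = W.T @ g
            norms.append(np.linalg.norm(g))
        return norms

    small = 0.5 * np.eye(2)   # largest singular value < 1 -> gradient vanishes
    large = 1.5 * np.eye(2)   # largest singular value > 1 -> gradient explodes
    print(gradient_norms(small)[-1])   # ~1.3e-15
    print(gradient_norms(large)[-1])   # ~9.0e+08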
Computational Complexity
 Can be computationally expensive to train, especially when dealing with long sequences.
This is because the network must process each input in sequence, which can be slow
[2]
Disadvantages
Difficulty In Capturing Long-Term Dependencies
 Although RNNs are designed to capture information about past inputs, they can struggle to
capture long-term dependencies in the input sequence. This is because the gradients can
become very small as they propagate through time, which can cause the network to forget
important information
Lack Of Parallelism
 RNNs are inherently sequential, which makes it difficult to parallelize the computation.
This can limit the speed and scalability of the network
[2]
References
1. https://www.ibm.com/topics/neural-networks
2. https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn
3. https://www.v7labs.com/blog/recurrent-neural-networks-guide
4. https://storage.googleapis.com/deepmind-media/UCLxDeepMind_2020/L6%20-%20UCLxDeepMind%20DL2020.pdf
5. https://towardsdatascience.com/recurrent-neural-networks-rnns-3f06d7653a85
6. https://www.datacamp.com/tutorial/tutorial-for-recurrent-neural-network
7. https://www.analyticsvidhya.com/blog/2021/07/in-depth-explanation-of-recurrent-neural-network
8. https://www.deeplearningbook.org/contents/rnn.html#pf7
9. https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
