ML (Cs-601) Unit 4 Complete


MACHINE LEARNING

UNIT 4
Recurrent Neural Network (RNN)
•A recurrent neural network (RNN) is any network whose neurons
send feedback signals to each other.
•A Recurrent Neural Network is a type of neural network that
contains loops, allowing information to be stored within the network.
•To achieve this, the RNN builds loops into the network, which
allows it to persist information across time steps.

This loop structure allows the neural network to take a sequence of
inputs.
•Thus the RNN came into existence, solving the problem of sequential
input (which a plain feed-forward network cannot handle) with the help
of a Hidden Layer. The main and most important feature of an RNN is the
Hidden state, which remembers some information about a sequence.
•RNNs have a “memory” which remembers information about what has been
calculated so far. The same parameters are used for each input, since
the same task is performed on all the inputs or hidden layers to
produce the output.

•Formula for calculating the current state:

ht = f(ht-1, xt)

where:
ht -> current state
ht-1 -> previous state
xt -> input state
(f is typically a tanh non-linearity applied to a weighted combination
of ht-1 and xt.)
Working concept of RNN
The layers are the input layer, the recurrent hidden layer (which
carries the hidden state from one time step to the next), and the
output layer.
Training through RNN
• A single time step of the input is provided to the network.
• Then its current state is calculated using the current input
and the previous state.
• The current ht becomes ht-1 for the next time step.
• One can go through as many time steps as the problem requires
and join the information from all the previous states.
• Once all the time steps are completed, the final current
state is used to calculate the output.
• The output is then compared to the actual output, i.e. the
target output, and the error is generated.
• The error is then back-propagated through the network to
update the weights, and hence the network (RNN) is
trained.
Long Short Term Memory (LSTM)
• Long Short Term Memory is a kind of recurrent neural network.
• An RNN remembers information through time. This is useful in time
series prediction because of its ability to remember previous inputs.
But the simple RNN architecture is not capable of discarding
unusable information.
• The long term memory refers to the learned weights and the short
term memory refers to the gated cell state values that change with
each step through time. This is called Long Short Term Memory
(LSTM).
Structure Of LSTM:
LSTM has a chain structure that contains four neural networks and
different memory blocks called cells.
Information is retained by the cells and the memory manipulations are
done by the gates. There are three gates:
• Forget gate: The information from the current input (xt)
and the previous hidden state (ht-1) is passed through the
sigmoid activation function. Output values closer to 0 mean
forget, and values closer to 1 mean retain.
• Input gate: It works as an input to the cell state. It consists
of two parts: first, we pass the previous hidden state (ht-1)
and the current input (xt) into a sigmoid function to decide
which values will be updated. Then, we pass the same two
inputs into the tanh activation to regulate the network.
Finally, the tanh output (the candidate values C̃t) is multiplied
with the sigmoid output (it) to decide which information is
important for updating the cell state.
• Cell state: The input from the previous cell state (Ct-1) is
point-wise multiplied with the forget gate output. If the
forget output is 0, the previous cell state (Ct-1) is discarded.
This result is point-wise added to the input gate output to
obtain the new cell state (Ct). The present cell state becomes
the input to the next LSTM unit.
• Output gate: The hidden state contains information on
previous inputs and is used for prediction. The output gate
regulates the present hidden state (ht). The previous hidden
state (ht-1) and the current input (xt) are passed to the sigmoid
function. This output is multiplied with the output of the
tanh function (applied to the cell state) to obtain the present
hidden state. The current cell state (Ct) and the present hidden
state (ht) are the final outputs of a classic LSTM unit.
1. Forget gate: the information that is no longer useful in the
cell state is removed with the forget gate.
2. Input gate: the addition of useful information to the cell state
is done by the input gate.
3. Output gate: the task of extracting useful information from the
current cell state, to be presented as the output, is done by the
output gate.
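The gate equations above can be summarised in a short sketch. The
weight dictionaries W, U, b and their gate keys are illustrative
assumptions; a real implementation would normally use a deep-learning
library rather than raw NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b are dicts of weights keyed by gate name."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate cell values
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    c_t = f_t * c_prev + i_t * g_t                           # update the cell state
    h_t = o_t * np.tanh(c_t)                                 # present hidden state
    return h_t, c_t

# toy usage: 3-dim input, 5-dim hidden/cell state
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(5, 3)) for k in "figo"}
U = {k: rng.normal(size=(5, 5)) for k in "figo"}
b = {k: np.zeros(5) for k in "figo"}
h, c = lstm_step(rng.normal(size=3), np.zeros(5), np.zeros(5), W, U, b)
```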
Gated Recurrent Unit (GRU)
• One of the most famous variations of the RNN is the Long Short Term
Memory Network (LSTM). One of the lesser known but equally effective
variations is the Gated Recurrent Unit Network (GRU).
• It consists of two gates and does not maintain an internal cell state.
• The information which is stored in the internal cell state (Ct) of an
LSTM recurrent unit is incorporated into the hidden state of the Gated
Recurrent Unit. This collective information is passed on to the next Gated
Recurrent Unit.
The different gates of a GRU are as described below:
I. Update Gate (z): It determines how much of the past knowledge
needs to be passed along to the future. It is analogous to the
Output Gate in an LSTM recurrent unit.
II. Reset Gate (r): It determines how much of the past knowledge to
forget.
III. Current Memory Gate (ht): It is incorporated into the Reset Gate,
just as the Input Modulation Gate is a sub-part of the Input Gate. It
is used to introduce some non-linearity into the input and to make
the input zero-mean. Another reason to make it a sub-part of the
Reset Gate is to reduce the effect that previous information has
on the current information being passed into the future.
• GRUs are very similar to Long Short Term Memory (LSTM) networks.
Just like an LSTM, a GRU uses gates to control the flow of
information. GRUs are relatively new compared to LSTMs, offering
some improvements over the LSTM with a simpler architecture.
• A GRU is less complex than an LSTM and is significantly faster to
compute.
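A matching GRU sketch, with the same caveats as the LSTM example
(weight names and dictionary layout are assumptions). Note that there
is no separate internal cell state, as stated above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step; the hidden state alone carries the memory."""
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])             # update gate
    r_t = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])             # reset gate
    h_cand = np.tanh(W["h"] @ x_t + U["h"] @ (r_t * h_prev) + b["h"])  # current memory content
    return (1 - z_t) * h_prev + z_t * h_cand                           # blended hidden state

# toy usage: 3-dim input, 5-dim hidden state
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(5, 3)) for k in "zrh"}
U = {k: rng.normal(size=(5, 5)) for k in "zrh"}
b = {k: np.zeros(5) for k in "zrh"}
h = gru_step(rng.normal(size=3), np.zeros(5), W, U, b)
```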
Beam Search and Width
• Beam search is a heuristic search algorithm that uses breadth-first
search to build its search tree and reduces the search space by
eliminating candidate nodes, thereby reducing the memory and time
requirements.
• Beam Search is an approximate search strategy. It is a restricted,
or modified, version of either a breadth-first search or a best-first
search.
• It is restricted in the sense that the amount of memory available for
storing the set of alternative search nodes is limited, and in the sense
that non-promising nodes can be pruned at any step in the search.
Breadth First Search
[Figures 1–8: step-by-step illustration of breadth-first search
expanding the search tree level by level.]
Without the beam restriction, the worst-case time and space complexity
of best-first search would be O(b^m), where b is the branching factor
and m is the maximum depth.

Beam width
The beam width, or beam size, is a parameter of the beam search
algorithm which determines how many of the best partial solutions /
adjacent nodes to evaluate at each step.
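A small generic sketch of beam search with a configurable beam width.
The successors, score, and is_goal callables are assumed to be supplied
by the caller; in machine translation, score would typically return the
cumulative log-probability of the partial output sequence:

```python
import heapq

def beam_search(start, successors, score, is_goal, beam_width=3, max_depth=10):
    """Keep only the `beam_width` best partial paths at each depth."""
    beam = [(score([start]), [start])]
    for _ in range(max_depth):
        candidates = []
        for _, path in beam:
            if is_goal(path[-1]):
                return path
            for nxt in successors(path[-1]):
                new_path = path + [nxt]
                candidates.append((score(new_path), new_path))
        # prune: retain only the `beam_width` most promising candidates
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if not beam:
            break
    return max(beam, key=lambda c: c[0])[1] if beam else None
```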

Bleu Score
The BLEU (Bi-Lingual Evaluation Understudy) score is a string-matching
algorithm that provides basic output quality metrics. BLEU is actually
nothing more than a method to measure the similarity between two text
strings.
• A fundamental problem/limitation with BLEU is that it DOES NOT EVEN TRY
to measure “translation quality”, but rather focuses on STRING SIMILARITY.

Scoring process
• The BLEU algorithm compares consecutive phrases of the automatic
translation with the consecutive phrases it finds in the reference
translation, and counts the number of matches.
• These matches are position independent.
• A higher match degree indicates a higher degree of similarity with the
reference translation, and a higher score.
• A comparison between BLEU scores is only justifiable when the BLEU
results are computed on the same test set, the same language pair, and
the same MT engine.
• A value of 0 means that the machine-translated output has no overlap
with the reference translation (low quality), while a value of 1 means
there is perfect overlap with the reference translations (high quality).
• Interpretation: the closer the score is to 1, the more closely the
output overlaps with the reference translations.
Calculating the BLEU score
To compute the BLEU score for each translation, we compute the
following statistics.
• N-Gram Precisions
The n-gram overlap counts how many unigrams, bigrams, trigrams, and
four-grams (i=1,2,3,4) match their n-gram counterpart in the reference
translations.
• Brevity Penalty
BP stands for brevity penalty. Since BLEU is a kind of precision, short
outputs would score highly without BP. This penalty is defined simply as:

BP = 1              if c > r
BP = exp(1 - r/c)   if c <= r

where r and c are the number of tokens in the reference translation and
the candidate translation, respectively.

The i-gram precision pi is (almost always) the number of i-grams in the
candidate translation which are confirmed by the reference, divided by
the total number of i-grams in the candidate translation. The overall
score is BP multiplied by the geometric mean of p1, p2, p3, p4.
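A simplified sentence-level BLEU sketch following the statistics above
(clipped n-gram precisions, brevity penalty, geometric mean with uniform
weights). The tiny smoothing constant used to avoid log(0) on toy inputs
is an assumption; production work would rely on an established
implementation rather than this sketch:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified BLEU: clipped n-gram precisions + brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # an i-gram is "confirmed" only up to the number of times it appears in the reference
        confirmed = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(max(confirmed, 1e-9) / total)     # avoid log(0) on toy inputs
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))      # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat is on the mat".split(), "there is a cat on the mat".split()))
```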
Attention Model
• Attention is a mechanism incorporated into the RNN allowing it to
focus on certain parts of the input sequence when predicting a certain
part of the output sequence, enabling easier learning and higher
quality predictions.
• Improved RNN models such as Long Short-Term Memory networks
(LSTMs) enable training on long sequences, overcoming problems like
vanishing gradients. However, even the more advanced models have
their limitations, and researchers had a hard time developing high-
quality models when working with long data sequences.
• The RNN encoder has an input sequence x1, x2, x3, x4. We denote the
encoder states by c1, c2, c3. The encoder outputs a single output
vector c which is passed as input to the decoder. Like the encoder,
the decoder is also a single-layered RNN; we denote the decoder
states by s1, s2, s3 and the network’s output by y1, y2, y3, y4.

• A problem with this architecture lies in the fact that the encoder
needs to represent the entire input sequence x1, x2, x3, x4 as a single
vector c, which can cause information loss. Moreover, the decoder
needs to decipher the passed information from this single vector, a
complex task in itself.

• A potential issue with this encoder–decoder approach is that the
neural network needs to be able to compress all the necessary
information of a source sentence into a fixed-length vector. This may
make it difficult for the neural network to cope with long sentences.
RNNs with an Attention Mechanism
An attention RNN looks like this:

[Figure: RNN encoder-decoder architecture with Attention module]
• Our attention model has a single-layer RNN encoder, again with 4
time steps. We denote the encoder’s input vectors by x1, x2, x3, x4
and the output vectors by h1, h2, h3, h4.
• The attention mechanism is located between the encoder and the
decoder. Its input is composed of the encoder’s output vectors h1,
h2, h3, h4 and the states of the decoder s0, s1, s2, s3; the attention’s
output is a sequence of vectors called context vectors, denoted by
c1, c2, c3, c4.
• The context vectors
The context vectors enable the decoder to focus on certain parts of
the input when predicting its output.
• Each context vector is a weighted sum of the encoder’s output
vectors h1, h2, h3, h4: ci = Σj αij · hj. Each vector hi contains
information about the whole input sequence with a strong focus on
the parts surrounding the i-th vector of the input sequence. The
vectors h1, h2, h3, h4 are scaled by weights αij capturing the degree
of relevance of input xj to the output yi at time i.
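A sketch of how the context vectors could be formed as weighted sums of
the encoder outputs. Dot-product scoring is assumed here purely for
simplicity; in practice the weights αij come from a small learned
alignment network:

```python
import numpy as np

def context_vectors(encoder_outputs, decoder_states):
    """Dot-product attention: one context vector per decoder state."""
    H = np.stack(encoder_outputs)            # shape (T_enc, d): h1..hT stacked
    contexts = []
    for s in decoder_states:
        scores = H @ s                       # relevance of each h_j to this decoder state
        alphas = np.exp(scores - scores.max())
        alphas = alphas / alphas.sum()       # softmax -> attention weights a_ij
        contexts.append(alphas @ H)          # c_i = sum_j a_ij * h_j
    return contexts

# toy usage with 4 encoder outputs and 4 decoder states (s0..s3)
rng = np.random.default_rng(0)
hs = [rng.normal(size=8) for _ in range(4)]
ss = [rng.normal(size=8) for _ in range(4)]
cs = context_vectors(hs, ss)                 # c1..c4
```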
Reinforcement Learning
• Reinforcement Learning is defined as a Machine Learning method that is concerned with how software agents should take actions in an environment.
• Reinforcement Learning is a feedback-based Machine Learning technique in which an agent learns to behave in an environment by performing actions and seeing
the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
• There is no supervisor, only a real number or reward signal.
• In Reinforcement Learning, the agent learns automatically using feedback, without any labelled data.
• The agent learns through trial and error and, based on experience, it learns to perform the task in a better way. Hence, we can say that "Reinforcement
learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act."
The agent learns which actions lead to positive feedback or rewards and which
actions lead to negative feedback or a penalty. As a positive reward, the agent
gets a positive point, and as a penalty, it gets a negative point.
Here are some important terms used in Reinforcement Learning:
• Agent: An assumed entity which performs actions in an environment to gain
some reward.
• Environment (e): A scenario that an agent has to face.
• Reward (R): An immediate return given to an agent when it performs a
specific action or task.
• State (s): The current situation returned by the environment.
• Policy (π): A strategy applied by the agent to decide the next action
based on the current state.
• Value (V): The expected long-term return with discount, as compared to
the short-term reward.
• Value Function: It specifies the value of a state, that is, the total
amount of reward an agent can expect to accumulate starting from that state.
• Model of the environment: Used for planning, instead of trial and error.
Types of Reinforcement Learning
Two kinds of reinforcement learning methods are:
I. Positive:
• Positive reinforcement learning means adding something to increase the
tendency that the expected behaviour will occur again. It impacts the
behaviour of the agent positively and increases the strength of the behaviour.
• This type of reinforcement can sustain changes for a long time.
II. Negative:
• Negative reinforcement learning is the opposite of positive reinforcement,
as it increases the tendency that the specific behaviour will occur again by
avoiding a negative condition.
• It can be more effective than positive reinforcement depending on the
situation and behaviour, but it provides reinforcement only enough to meet
the minimum behaviour.

Applications of Reinforcement Learning
• Robotics for industrial automation, business strategy planning, data
processing, aircraft control, robot motion control, etc.
Markov Decision Process (MDP)
• Reinforcement Learning is a type of Machine Learning. It
allows machines and software agents to automatically
determine the ideal behaviour within a specific context, in
order to maximize performance. Simple reward feedback
is required for the agent to learn its behaviour; this is known
as the reinforcement signal.
• Reinforcement Learning is defined by a specific type of
problem, and all its solutions are classed as Reinforcement
Learning algorithms. MDP is a framework that can be used to
solve most Reinforcement Learning problems. In such a problem,
an agent is supposed to decide the best action to select based
on its current state. When this step is repeated, the problem
is known as a Markov Decision Process (MDP).
A Markov Decision Process (MDP) model contains:
• A set of possible world states S.
• A set of Models.
• A set of possible actions A.
• A real-valued reward function R(s, a).
• A policy π, the solution of the Markov Decision Process.

• A State is a set of tokens that represent every state that the agent
can be in.
• A Model (sometimes called a Transition Model) gives an action’s effect
in a state.
• An Action A is the set of all possible actions. A(s) defines the set of
actions that can be taken in state s.
• A Reward is a real-valued reward function.
• A Policy is a solution to the Markov Decision Process.
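The components listed above can be gathered into a small container. The
dataclass below is purely illustrative (nothing in the notes prescribes
this layout), and the discount factor gamma is a standard extra
ingredient not listed on the slide:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Purely illustrative container mirroring the MDP components listed above.
@dataclass
class MDP:
    states: List[str]                                  # S: set of possible world states
    actions: Callable[[str], List[str]]                # A(s): actions available in state s
    model: Callable[[str, str], Dict[str, float]]      # transition model: P(s' | s, a)
    reward: Callable[[str, str], float]                # R(s, a): real-valued reward
    gamma: float = 0.9                                 # discount factor (standard addition)

# A policy -- the solution of the MDP -- maps each state to an action:
policy = {"s0": "move", "s1": "stay"}
```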
Bellman Equation
• The Bellman equation is the basic building block for solving
reinforcement learning and is omnipresent in RL. It helps us
to solve the MDP. To solve means finding the optimal policy
and value functions.
• The optimal value function V*(s) is the one that yields the
maximum value.
• The value of a given state is equal to the maximum over actions
(the action which maximizes the value) of the reward of that
action in the given state, plus a discount factor multiplied by
the next state’s value from the Bellman Equation.
Bellman Equation: V(s) = max over a [ R(s, a) + γ·V(s') ],
where γ is the discount factor.
Value Iteration and Policy Iteration
We solve the Bellman equation using two powerful
algorithms:
i. Value Iteration ii. Policy Iteration
•In value iteration, we start off with a random value
function. As the randomly initialized value table is not
optimal, we optimize it iteratively (as sketched in the
example below).
•In policy iteration, the actions which the agent needs
to take are decided or initialized first, and the value
table is created according to that policy.
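A compact value-iteration sketch on a hypothetical two-state MDP (all
transition probabilities and rewards are made up for illustration);
policy iteration would instead alternate policy evaluation and policy
improvement steps:

```python
# Minimal value iteration on a toy 2-state MDP (all numbers illustrative).
GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]
# P[s][a] = list of (probability, next_state); R[s][a] = immediate reward
P = {"s0": {"stay": [(1.0, "s0")], "move": [(1.0, "s1")]},
     "s1": {"stay": [(1.0, "s1")], "move": [(1.0, "s0")]}}
R = {"s0": {"stay": 0.0, "move": 1.0},
     "s1": {"stay": 2.0, "move": 0.0}}

V = {s: 0.0 for s in STATES}                 # start from an arbitrary value function
for _ in range(100):                         # repeat the Bellman update until it converges
    V = {s: max(R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])
                for a in ACTIONS)
         for s in STATES}

# greedy policy extracted from the converged values
policy = {s: max(ACTIONS, key=lambda a: R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a]))
          for s in STATES}
print(V, policy)
```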
Actor Critic Model
• As an agent takes actions and moves through an
environment, it learns to map the observed state of
the environment to two possible outputs:
• Recommended action: A probability value for each
action in the action space. The part of the agent
responsible for this output is called the actor.
• Estimated rewards in the future: Sum of all
rewards it expects to receive in the future. The part
of the agent responsible for this output is the critic.
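A toy sketch of the two outputs described above, using one shared
feature layer with an actor head and a critic head. All sizes and
weights are illustrative, untrained assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W_shared = rng.normal(size=(16, 4))   # maps a 4-dim observed state to shared features
W_actor = rng.normal(size=(2, 16))    # actor head: one score per action
W_critic = rng.normal(size=(1, 16))   # critic head: estimated future rewards

def act_and_evaluate(state):
    features = np.tanh(W_shared @ state)
    action_probs = softmax(W_actor @ features)   # recommended action (probability per action)
    value = float(W_critic @ features)           # critic's estimate of future rewards
    return action_probs, value

probs, value = act_and_evaluate(np.array([0.1, -0.2, 0.0, 0.3]))
```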
Q-Learning
• Q-Learning is a basic form of Reinforcement Learning which uses Q-
values (also called action values) to iteratively improve the behaviour
of the learning agent.
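A hedged sketch of tabular Q-learning with an epsilon-greedy behaviour
policy; the hyperparameters ALPHA, GAMMA, and EPSILON are illustrative
choices, and the environment itself is left abstract:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)                     # Q[(state, action)] -> action value

def choose_action(state, actions):
    """Epsilon-greedy behaviour policy."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(s, a, reward, s_next, actions):
    """Off-policy update: bootstrap from the best next action, whatever is actually taken."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
```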
SARSA
• The State–Action–Reward–State–Action (SARSA) algorithm is a slight
variation of the popular Q-Learning algorithm.
• SARSA is an algorithm for learning a Markov decision process policy,
used in the reinforcement learning area of machine learning.
• The policy of a learning agent in any Reinforcement Learning
algorithm can be of two types:
• On Policy: In this, the learning agent learns the value function
according to the current action derived from the policy currently
being used.
• Off Policy: In this, the learning agent learns the value function
according to the action derived from another policy.
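For comparison with the Q-learning sketch above, a minimal on-policy
SARSA update (again with illustrative hyperparameters): the only
difference is that it bootstraps from the action actually selected
next, rather than from the greedy maximum:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)

def sarsa_update(s, a, reward, s_next, a_next):
    """On-policy update: uses the next action a_next chosen by the current policy."""
    Q[(s, a)] += ALPHA * (reward + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])
```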
