
Genetic Algorithms

 A genetic algorithm is a search heuristic inspired by Charles Darwin’s theory of natural evolution.
 This algorithm reflects the process of natural selection where
the fittest individuals are selected in order to produce offspring
of the next generation.
 Notion of Natural Selection
 The process of natural selection starts with the selection of
fittest individuals from a population.
 They produce offspring which inherit the characteristics of the
parents and will be added to the next generation.
 If parents have better fitness, their offspring will be better than
parents and have a better chance at surviving. This process
keeps on iterating and at the end, a generation with the fittest
individuals will be found.
 This notion can be applied to a search problem.
Basic Terminologies
 Population − It is a subset of all the possible
(encoded) solutions to the given problem. The
population in a GA is analogous to a human
population, except that its members are candidate
solutions rather than human beings.
 Chromosomes − A chromosome is one such solution
to the given problem.
 Gene − A gene is one element position of a
chromosome.
 Allele − It is the value a gene takes for a particular
chromosome.
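To make these terms concrete, here is a minimal Python sketch, assuming a simple bit-string encoding and a toy "count the 1-bits" fitness function (both are illustrative assumptions, not tied to any particular problem):

```python
import random

# A chromosome: one candidate solution, encoded here as a bit string.
# Each position is a gene; the bit value (0 or 1) at that position is its allele.
CHROMOSOME_LENGTH = 8

def random_chromosome():
    return [random.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]

# A population: a subset of all possible encoded solutions.
population = [random_chromosome() for _ in range(10)]

# Example fitness: number of 1-bits (the classic "OneMax" toy problem).
def fitness(chromosome):
    return sum(chromosome)
```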
Basic Terminologies
GA Algorithm
Operators of Genetic Algorithms
Once the initial generation is created, the algorithm evolves
it using the following operators:
1) Selection Operator: The idea is to give preference to
the individuals with good fitness scores and allow them to
pass their genes to successive generations.
2) Crossover Operator: This represents mating between
individuals. Two individuals are selected using the selection
operator and crossover sites are chosen randomly. The genes at
these crossover sites are then exchanged, creating completely
new individuals (offspring). For example, see the sketch below –
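A minimal Python sketch of selection and crossover, assuming bit-string chromosomes, tournament selection and a single random crossover site (all common but illustrative choices):

```python
import random

def tournament_selection(population, fitness, k=3):
    # Pick k individuals at random and return the fittest one.
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

def single_point_crossover(parent1, parent2):
    # Choose a crossover site at random and swap the tails of the two parents.
    site = random.randint(1, len(parent1) - 1)
    child1 = parent1[:site] + parent2[site:]
    child2 = parent2[:site] + parent1[site:]
    return child1, child2
```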
Operators of Genetic Algorithms
3) Mutation Operator: The key idea is to insert random
genes in the offspring to maintain diversity in the
population and avoid premature convergence. For
example, see the sketch below –
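A minimal Python sketch of mutation, assuming bit-string chromosomes and a small per-gene flip probability (an illustrative choice):

```python
import random

def mutate(chromosome, mutation_rate=0.01):
    # Flip each bit with a small probability to keep diversity in the population.
    return [(1 - gene) if random.random() < mutation_rate else gene
            for gene in chromosome]
```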
GA Algorithm
1) Randomly initialize population p
2) Determine fitness of population
3) Until convergence repeat:
a) Select parents from population
b) Crossover and generate new population
c) Perform mutation on new population
d) Calculate fitness for new population
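Putting the steps together, here is a minimal Python sketch of the whole loop, assuming bit-string chromosomes, truncation selection of the fittest half, single-point crossover and bit-flip mutation (all illustrative simplifications):

```python
import random

def genetic_algorithm(fitness, chromosome_length=8, pop_size=20, generations=50):
    # 1) Randomly initialize population p
    population = [[random.randint(0, 1) for _ in range(chromosome_length)]
                  for _ in range(pop_size)]
    for _ in range(generations):          # 3) repeat (fixed budget instead of a convergence test)
        # 2)/d) Determine fitness of population
        scored = sorted(population, key=fitness, reverse=True)
        # a) Select parents from population (fittest half, for simplicity)
        parents = scored[: pop_size // 2]
        # b) Crossover and generate new population
        children = []
        while len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            site = random.randint(1, chromosome_length - 1)
            children.append(p1[:site] + p2[site:])
        # c) Perform mutation on new population
        population = [[1 - g if random.random() < 0.01 else g for g in c]
                      for c in children]
    return max(population, key=fitness)

best = genetic_algorithm(fitness=sum)   # OneMax: maximize the number of 1-bits
```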
Crowding in GA
 Crowding is a technique used in genetic algorithms to
preserve diversity in the population and to prevent
premature convergence to local optima.
 It consists of pairing each offspring with a similar
individual in the current population (pairing phase)
and deciding which of the two will remain in the
population (replacement phase).
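A minimal Python sketch of such a crowding scheme, assuming bit-string individuals and Hamming distance as the similarity measure (illustrative choices):

```python
def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def crowding_replacement(population, offspring, fitness):
    # Pairing phase: match each offspring with the most similar current individual.
    # Replacement phase: keep the fitter of the two in the population.
    for child in offspring:
        idx = min(range(len(population)),
                  key=lambda i: hamming_distance(population[i], child))
        if fitness(child) > fitness(population[idx]):
            population[idx] = child
    return population
```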
Difference between Genetic
Algorithms and Traditional Algorithms
 A search space is the set of all possible solutions to the problem.
A traditional algorithm maintains only one solution at a time,
whereas a genetic algorithm maintains a whole set of candidate
solutions in the search space.
 Traditional algorithms need more information in order to
perform a search, whereas genetic algorithms need only one
objective function to calculate the fitness of an individual.
 Traditional Algorithms cannot easily run in parallel, whereas
Genetic Algorithms can (the fitness evaluations of the
individuals are independent of one another).
 Traditional Algorithms produce only one result in the end,
whereas Genetic Algorithms can return multiple good
solutions from different generations.
Reinforcement Learning
 Reinforcement Learning is a feedback-based ML technique
in which an agent learns to behave in an environment by
performing actions and observing the results of those actions.
 For each good action, the agent gets positive feedback, and
for each bad action, the agent gets negative feedback or
penalty.
 In Reinforcement Learning, the agent learns automatically
from this feedback, without any labeled data,
unlike supervised learning.
 RL solves a specific type of problem where decision making
is sequential, and the goal is long-term, such as game-
playing, robotics, etc.
Reinforcement Learning
Elements of Reinforcement
Learning
 Agent: An entity that can perceive/explore the environment
and act upon it.
 Environment: The situation in which the agent is present or
by which it is surrounded. In RL, the environment is usually
assumed to be stochastic, which means it is random in nature.
 Action: Actions are the moves taken by an agent within the
environment.
 State: State is a situation returned by the environment after
each action taken by the agent.
 Reward : A feedback returned to the agent from the
environment to evaluate the action of the agent.
Elements of Reinforcement Learning
 Policy: A policy is the strategy the agent applies to choose the
next action based on the current state.
 A policy π is a function that takes a state “S” as input (S2 in
our example) and returns an action “a” (a6 in our example),
i.e. π(s) → a
or, in our example, π(S2) → a6.
 Value Function : Whereas the reward signal indicates what’s
good for the agent at the moment (e.g. taking action “a6” right
now when the agent is on S2), a value function specifies what
is good in the long run.
 The value of a state is the total amount of reward an agent can
expect to accumulate over the future, starting from that state.
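As a toy Python sketch, a policy and a value function can both be thought of as lookup tables over states; the state and action names below follow the slide’s example and the numbers are purely illustrative:

```python
# A policy maps states to actions; here it is a simple lookup table.
policy = {"S1": "a2", "S2": "a6", "S3": "a1"}   # illustrative state/action names

def pi(state):
    return policy[state]

print(pi("S2"))   # -> "a6", i.e. π(S2) → a6

# A value function maps each state to the long-run return expected from it.
value = {"S1": 0.81, "S2": 0.9, "S3": 1.0}      # illustrative values
```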
How does Reinforcement Learning
Work?
 Let's take an example of a maze environment that the
agent needs to explore. Consider the below image:
 In the above image, the agent is at the very first block
of the maze. The maze consists of an S6 block, which is a
wall, an S8 block, which is a fire pit, and an S4 block, which
is a diamond.
 If the agent reaches the S4 block, it gets a +1 reward; if it
reaches the fire pit, it gets a -1 reward. It can take four
actions: move up, move down, move left, and move right.
 Suppose the agent follows the path S9-S5-S1-S2-S3; it will
then get the +1 reward.
 The agent will try to remember the preceding steps it has
taken to reach the final block. To memorize the steps, it assigns
a value of 1 to each preceding block. Consider the step below:
 Now, the agent has successfully stored the previous steps by
assigning the value 1 to each previous block.
 But what will the agent do if it starts from a block that has a
value of 1 on both sides?
 Consider the below diagram:

 Hence, to solve this problem, we will use the Bellman equation,
which is the main concept behind reinforcement learning.
Bellman Equation
 The Bellman equation was introduced by the
mathematician Richard Ernest Bellman in 1953.
 It is associated with dynamic programming and is used to
calculate the values of a decision problem at a certain point by
including the values of previous states.
Key-elements used in Bellman equations are :
 Action performed by the agent is referred to as "a"
 The state reached by performing the action is "s."
 The reward/feedback obtained for each good and bad action is
"R."
 A discount factor is Gamma "γ."
 The Bellman equation can be written as:
 V(s) = max [R(s,a) + γV(s`)]
Where,
V(s) = the value calculated at a particular point (state).
R(s,a) = the reward obtained at state s by performing action a.
γ = Discount factor
 If the discount rate is close to 0, future rewards won’t count for
much in comparison to immediate rewards.
 In contrast, if the discount rate is close to 1, rewards that are far
in the future will be almost as important as immediate
rewards.
V(s`) = the value of the next state s`.
 So now, using the Bellman equation, we will find the value of
each state of the given environment.
 We will start from the block that is next to the target block.
 For 1st block: V(s3) = max [R(s,a) + γV(s`)], here V(s')= 0
because there is no further state to move.
 V(s3)= max[R(s,a)]=> V(s3)= max[1]=> V(s3)= 1.
 For 2nd block:
 V(s2) = max [R(s,a) + γV(s`)], here γ= 0.9(say), V(s')= 1,
and R(s, a)= 0, because there is no reward at this state.
 V(s2)= max[0.9(1)]=> V(s2)= max[0.9]=> V(s2) =0.9
 For 3rd block:
 V(s1) = max [R(s,a) + γV(s`)], here γ= 0.9, V(s')= 0.9, and
R(s, a)= 0, because there is no reward at this state also.
 V(s1)= max[0.9(0.9)] => V(s1)= max[0.81] => V(s1) = 0.81
For 4th block:
 V(s5) = max [R(s,a) + γV(s`)], here γ= 0.9, V(s')= 0.81, and
R(s, a)= 0, because there is no reward at this state also.
 V(s5)= max[0.9(0.81)] => V(s5)= max[0.729] => V(s5) ≈ 0.73
For 5th block:
 V(s9) = max [R(s,a) + γV(s`)], here γ= 0.9, V(s')= 0.73, and
R(s, a)= 0, because there is no reward at this state also.
 V(s9)= max[0.9(0.73)] => V(s9)= max[0.657] => V(s9) ≈ 0.66
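The same backup can be reproduced in a few lines of Python; the state names, the single reward of +1 at the end of the path, and γ = 0.9 all follow the example above:

```python
# Back up values along the path S3 <- S2 <- S1 <- S5 <- S9 with gamma = 0.9.
# Rewards are 0 everywhere except the +1 obtained at the last block.
gamma = 0.9

V = {}
V["s3"] = max([1 + gamma * 0])        # 1.0  (reward +1, no further state)
V["s2"] = max([0 + gamma * V["s3"]])  # 0.9
V["s1"] = max([0 + gamma * V["s2"]])  # 0.81
V["s5"] = max([0 + gamma * V["s1"]])  # 0.729  ≈ 0.73
V["s9"] = max([0 + gamma * V["s5"]])  # 0.6561 ≈ 0.66
```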
Learning Models in RL

 Markov Decision Process
 Q-Learning Algorithm
 Deep Q-Learning
Markov Decision Process
 MDP is a mathematical framework used to model most
Reinforcement Learning problems with discrete actions.
 With the Markov Decision Process, an agent can arrive at an
optimal policy for maximum rewards over time.
The Markov Property
 The Markov Property states that :
“Future is Independent of the past given the present”
E.g : In a Chess game, the players only focus on the current
state and do not need to remember past actions or states.
 Mathematically, we can express this statement as:
P[S[t+1] | S[t]] = P[S[t+1] | S[1], S[2], …, S[t]]
 S[t] denotes the current state of the agent and S[t+1] denotes
the next state.
Markov Property
 It says that "If the agent is present in the current
state S1, performs an action a1 and moves to the
state S2, then the state transition from S1 to S2
depends only on the current state; future actions
and states do not depend on past actions,
rewards, or states."
 Or, in other words, as per the Markov Property, the current
state transition does not depend on any past action or
state.
Markov Decision Process
Q-Learning Algorithm
 Q-learning is a popular model-free reinforcement learning
algorithm based on the Bellman equation.
 Model-free means that the agent does not build or use a
model of the environment’s dynamics. Instead of relying on
such predictions, it learns directly by trial and error, from the
rewards it receives.
 The main objective of Q-learning is to learn a policy that tells
the agent which action to take, under which circumstances, to
maximize the reward.
 The goal of the agent in Q-learning is to maximize the value of Q.
 Q stands for quality in Q-learning, which means it specifies the
quality of an action taken by the agent.
Q-Learning algorithm
Q-Learning algorithm works like this:
 Initialize all Q-values, e.g., with zeros
 Choose an action a in the current state s based on the
current best Q-value
 Perform this action a and observe the outcome (new
state s’).
 Measure the reward R after this action
 Update Q with an update formula that is called
the Bellman Equation.
Repeat steps 2 to 5 until the learning no longer improves
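A minimal Python sketch of these steps for a tabular Q-learning agent; the env.reset()/env.step() interface, the hyperparameters, and the ε-greedy action choice are assumptions made for illustration, not part of the slides:

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch for a generic environment with
#   env.reset() -> state   and   env.step(action) -> (next_state, reward, done).
def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                       # 1) initialize all Q-values to zero
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # 2) choose an action: explore with prob. epsilon, otherwise exploit best Q
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            # 3)/4) perform the action, observe the new state s' and the reward R
            next_state, reward, done = env.step(action)
            # 5) Bellman update of Q(s, a)
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```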
Q-Learning Algorithm
 An example of Q-learning is an Advertisement
recommendation system.
Deep Q-Learning
 The Q-Learning approach is practical for very small
environments and quickly becomes infeasible as the
number of states and actions in the environment increases.
 The solution to this leads us to Deep Q-Learning,
which uses a deep neural network to approximate
the Q-values.
 Deep Q Learning uses the Q-learning idea and takes it one
step further.
 Instead of using a Q-table, we use a Neural Network that
takes a state and approximates the Q-values for each action
based on that state.
 The basic working step for Deep Q-Learning is that the
initial state is fed into the neural network and it returns the
Q-value of all possible actions as an output.
 The difference between Q-Learning and Deep Q-Learning
can be illustrated as follows:
Deep Q-Learning
 We do this because using a classic Q-table is not very scalable.
 It might work for simple games, but in a more complex game
with dozens of possible actions and game states, the Q-table
quickly becomes too large to fill and search efficiently.
 So now we use a Deep Neural Network that gets the state
as input, and produces different Q values for each action.
 Then again we choose the action with the highest Q-value.
 The learning process is still the same with the iterative update
approach, but instead of updating the Q-Table, here we
update the weights in the neural network so that the outputs
are improved.
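A minimal sketch of such a network and one weight update, written here with PyTorch; the framework choice, layer sizes, state/action dimensions and hyperparameters are illustrative assumptions, not part of the slides:

```python
import torch
import torch.nn as nn

# The state goes in, one Q-value per action comes out (instead of a Q-table row).
STATE_DIM, N_ACTIONS = 4, 2                      # illustrative sizes

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),                    # one Q-value per possible action
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_step(state, action, reward, next_state, done, gamma=0.99):
    # state / next_state: float tensors of shape (STATE_DIM,); action: int;
    # reward: float; done: 1.0 for terminal transitions, else 0.0.
    with torch.no_grad():
        # Bellman target: R + gamma * max_a' Q(s', a'), zeroed at terminal states.
        target = reward + gamma * q_net(next_state).max() * (1.0 - done)
    prediction = q_net(state)[action]            # Q-value of the action actually taken
    loss = nn.functional.mse_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()                              # update the network weights, not a Q-table
    optimizer.step()
    return loss.item()
```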
