Deep Q-Network
LEARNING DEEP Q-NETWORKS
Presented by Group-8:
21BAI10156 AISHIKA RANJAN
WHAT IS A DEEP Q-NETWORK?
[Diagram: state → Q-Network → Q-values → action]
UNDERSTANDING DEEP Q-NETWORK
[Diagram: state → Q-Network → Q-values → action]

Qπ(s, a) = how good it is to perform action a in state s while following policy π
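The mapping a DQN implements (state in, one Q-value per action out, act greedily) can be sketched as follows; the toy network and its values here are made up for illustration:

```python
def select_action(q_network, state):
    """Greedy action: feed the state through the Q-network and pick
    the action with the highest predicted Q-value."""
    q_values = q_network(state)  # one Q-value per action
    return max(range(len(q_values)), key=q_values.__getitem__)

# Hypothetical stand-in for a trained network:
toy_q_network = lambda state: [0.1, 0.7, 0.2]
print(select_action(toy_q_network, state=[0.0, 1.0]))  # → 1
```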
UNDERSTANDING DEEP Q-NETWORK
[Diagram: transitions sampled from Replay Memory feed the Q-Network; its loss function drives backpropagation]
The use of replay memory ensures that the agent sees each data point multiple times before the data point is removed from memory. This is especially useful for environments where data samples are costly to collect.
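A minimal replay memory along these lines can be sketched with a fixed-size deque (the capacity and tuple layout here are assumptions, not part of the slides):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, new_state) tuples.
    Once capacity is reached, the oldest transition is dropped, so each
    data point can be sampled multiple times before leaving memory."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, new_state):
        self.buffer.append((state, action, reward, new_state))

    def sample(self, batch_size):
        # Uniform random minibatch, as in standard DQN training.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```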
2 PHASES:
1. DATA COLLECTION — gather transitions of the form (state, action, reward, new state) and store them in replay memory.
2. TRAINING — update the Q-Network on batches of stored transitions.
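The data-collection phase can be sketched as a loop that records each (state, action, reward, new state) tuple. The `env_step(state, action) -> (reward, new_state)` interface below is a hypothetical stand-in for a real environment:

```python
def collect(env_step, policy, memory, initial_state, steps):
    """Phase 1 (data collection): act with the current policy and
    store every (state, action, reward, new_state) transition."""
    state = initial_state
    for _ in range(steps):
        action = policy(state)
        reward, new_state = env_step(state, action)  # assumed interface
        memory.append((state, action, reward, new_state))
        state = new_state
    return memory

# Toy usage with a made-up counting environment:
transitions = collect(env_step=lambda s, a: (1.0, s + 1),
                      policy=lambda s: 0,
                      memory=[], initial_state=0, steps=3)
print(transitions)  # → [(0, 0, 1.0, 1), (1, 0, 1.0, 2), (2, 0, 1.0, 3)]
```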
[Diagram: a sampled batch is fed to the Q-Network (predicted Q-value) and to the Target Network, the "ideal conscience" (target Q-value); the mean squared error between them (here 1.8821) is minimized via backpropagation through the Q-Network]

Repeat for a few more batches, and our Q-Network gets trained.
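The training step above can be sketched in plain Python: the frozen target network supplies the target value, and the mean squared error between predicted and target Q-values is what backpropagation minimizes. The discount factor `gamma` and the toy numbers are assumptions for illustration:

```python
def td_targets(batch, target_q, gamma=0.99):
    """Target Q-values from the target network ('ideal conscience'):
    reward + gamma * max over actions of Q_target(new_state)."""
    return [r + gamma * max(target_q(s2)) for (_s, _a, r, s2) in batch]

def mse_loss(predicted, target):
    """Mean squared error between predicted and target Q-values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

# Toy numbers (hypothetical): one transition, target net says max Q(s') = 2.0.
batch = [("s", 0, 1.0, "s2")]
targets = td_targets(batch, target_q=lambda s: [0.0, 2.0], gamma=0.5)
print(mse_loss([1.0], targets))  # (1.0 - 2.0)**2 → 1.0
```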
EPSILON-GREEDY POLICY
The epsilon-greedy approach selects the action with the highest estimated reward most of the time, and a random action otherwise. The aim is to strike a balance between exploration and exploitation.
r = random.random()
if r < epsilon:
    perform a random action (explore)
else:
    perform the action with the highest Q-value (exploit)
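A runnable version of this policy might look like the sketch below; the function name and Q-value list are assumptions:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon explore (random action);
    otherwise exploit (action with the highest estimated Q-value)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

# epsilon=0 always exploits; epsilon=1 always explores.
```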
Deep Q-Networks have been used, for example, to train agents that can play games such as Atari and Go, and to control robots for tasks such as grasping and navigation.
THANK YOU!