
REINFORCEMENT AND REPRESENTATION LEARNING

DEEP Q-NETWORKS
Presented by Group-8:
21BAI10156 AISHIKA RANJAN
WHAT IS A DEEP Q-NETWORK?

DQN (Deep Q-Network) is a Reinforcement Learning technique that uses a powerful tool called a Neural Network to learn the best action to take in each situation.

This method allows an AI agent to learn how to navigate an environment through trial and error, using deep learning to analyze the environment and take actions that lead to positive rewards.
Deep Q-Networks (DQN) combine deep learning and reinforcement learning techniques to approximate optimal action-selection policies.

DQN is particularly effective in environments with large, high-dimensional state spaces, such as video games (e.g., Atari) and robotic control tasks.

By leveraging neural networks to approximate Q-values, DQN enables agents to learn complex decision-making strategies.
BUT, WHAT IS REINFORCEMENT LEARNING?

RL is a method by which an AI agent learns through interacting with its environment.

The agent receives rewards for actions that bring it closer to a goal and learns to avoid actions that do not.
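As a rough sketch of this interaction loop (using the Gymnasium library and its CartPole-v1 environment purely as an illustrative assumption, with a random policy standing in for the learned one):

import gymnasium as gym

env = gym.make("CartPole-v1")
state, info = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # placeholder policy: act at random
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # the reward signals progress toward the goal
    done = terminated or truncated
env.close()
print("episode return:", total_reward)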
UNDERSTANDING DEEP Q-NETWORK

The AI (agent) uses a Q-table to navigate through the environment.

A Q-table is a data structure that contains sets of actions and states. It's a lookup table that gives the maximum expected future reward for each action at each state.
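A minimal sketch of a Q-table in Python, with the tabular Q-learning update that fills it in (the sizes and hyperparameters are illustrative assumptions, not from the slides):

import numpy as np

n_states, n_actions = 16, 4          # assumed sizes for illustration
Q = np.zeros((n_states, n_actions))  # one row per state, one column per action
alpha, gamma = 0.1, 0.99             # learning rate and discount (assumed values)

def q_update(s, a, r, s_next):
    # Standard Q-learning update: move Q(s, a) toward the observed reward
    # plus the best expected future reward from the next state.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def best_action(s):
    return Q[s].argmax()  # look up the action with the highest expected reward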
UNDERSTANDING DEEP Q-NETWORK

But if the Q-table gets too big ... with more states and actions, our AI can't handle it.

So, we can have a function instead of a large Q-table as the conscience of the AI:

[Diagram: the state goes into a function, which outputs a Q-value for each action]
UNDERSTANDING DEEP Q-NETWORK

[Diagram: the state goes into the function, which outputs a Q-value for each action]

This function is nothing but a Neural Network.

Qπ(s, a) = how good it is to perform action a in state s while following policy π
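A minimal sketch of such a network in PyTorch: the state goes in, and one Q-value per action comes out (the architecture and layer sizes are assumptions for illustration):

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        # A small fully connected network; the hidden size is an assumption.
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),  # one Q-value per action
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork(state_dim=4, n_actions=2)
q_values = q_net(torch.randn(1, 4))  # tensor of shape (1, n_actions)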
UNDERSTANDING DEEP Q-NETWORK

[Diagram: the Q-Network and the Target Network (the "Ideal Conscience") each produce Q-values; a loss function compares them, and backpropagation updates the Q-Network]

Q-Learning is to learn the values in the Q-table; Deep Q-Learning is to learn the parameters of the Q-Network, i.e. a Neural Network.
REPLAY MEMORY
Replay Memory is used to store each of the interactions that the agent has with the environment.

It has a fixed size of N.

As the memory fills up, older experiences are removed to make space for newer ones.

REPLAY MEMORY

The use of replay memory ensures that the agent sees each data point multiple times before the data point is removed from memory.

This is especially good for environments where data samples are costly to collect.
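A minimal replay-memory sketch using a fixed-size deque (the capacity and batch size are assumed values):

import random
from collections import deque

N = 10_000                       # fixed capacity (assumed value)
replay_memory = deque(maxlen=N)  # the oldest experience is dropped once full

def store(s, a, r, s_next):
    replay_memory.append((s, a, r, s_next))

def sample_batch(batch_size=32):
    # Each stored transition can be sampled many times before it is evicted.
    return random.sample(replay_memory, batch_size)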
2 PHASES:
1. DATA COLLECTION
2. TRAINING

1. DATA COLLECTION
The (state, action, reward, new state) tuple, i.e. the (s, a, r, s') values, gets stored in the REPLAY MEMORY.

Then s' -> s (the new state becomes the current state), and the process of Data Collection repeats.
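Sketched as code, reusing env from the earlier interaction-loop sketch and store from the replay-memory sketch; select_action is a hypothetical helper (e.g., the epsilon-greedy policy described later):

s, info = env.reset()
for step in range(10_000):  # number of collection steps is an assumed value
    a = select_action(s)    # hypothetical helper, e.g. epsilon-greedy
    s_next, r, terminated, truncated, info = env.step(a)
    store(s, a, r, s_next)  # (s, a, r, s') goes into Replay Memory
    if terminated or truncated:
        s, info = env.reset()  # episode over: start a new one
    else:
        s = s_next             # s' -> s, then repeat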


2. TRAINING
Data has already been collected in the REPLAY MEMORY.
Take a batch of data from the Replay Memory.
Train the Q-Network using backpropagation.
[Diagram: a batch from the Replay Memory goes through both the Q-Network and the Target Network (the "Ideal Conscience"); the mean squared error between their Q-values (e.g. 1.8821) is backpropagated to update the Q-Network]

Repeat for a few more batches, and our Q-Network gets trained.
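A sketch of one such training step under these assumptions (PyTorch, the q_net and QNetwork from the earlier sketch, and a separate target network whose weights are periodically copied from the Q-Network):

import torch
import torch.nn.functional as F

target_net = QNetwork(state_dim=4, n_actions=2)  # the "Ideal Conscience"
target_net.load_state_dict(q_net.state_dict())   # starts as a copy of the Q-Network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)  # assumed optimizer and lr
gamma = 0.99

def train_step(s, a, r, s_next):
    # s, a, r, s_next: tensors stacked from a batch sampled out of Replay Memory.
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q-value of the action taken
    with torch.no_grad():                                   # targets are held fixed
        q_target = r + gamma * target_net(s_next).max(dim=1).values
    loss = F.mse_loss(q_pred, q_target)  # mean squared error between the two Q-values
    optimizer.zero_grad()
    loss.backward()   # backpropagation updates the Q-Network's parameters only
    optimizer.step()
    # Terminal-state masking is omitted for brevity; every so often the target
    # network is refreshed: target_net.load_state_dict(q_net.state_dict()).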
EPSILON-GREEDY POLICY
The epsilon-greedy approach selects the action with the highest estimated reward most
of the time. The aim is to have a balance between exploration and exploitation.

import random

epsilon = 0.5  # threshold value
r = random.random()
if r < epsilon:
    action = random_action()     # explore: perform a random action (hypothetical helper)
else:
    action = qnet_action(state)  # exploit: perform the QNet's best action (hypothetical helper)
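A common refinement, not shown on the slide, is to start with a large epsilon and decay it over time, so the agent explores early on and exploits more as it learns, e.g.:

epsilon = max(0.05, epsilon * 0.995)  # decay each episode down to a floor (assumed values)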


APPLICATIONS
Deep Q-Learning has been applied to a wide range of problems, including game playing, robotics,
and autonomous vehicles.

For example, it has been used to train agents that play Atari games, related deep reinforcement learning methods have mastered Go, and it has been used to control robots for tasks such as grasping and navigation.
THANK YOU!
