
REINFORCEMENT LEARNING

This model card presents a collection of reinforcement learning algorithms that have been
implemented on various OpenAI Gym environments in order to compare them. The algorithms
compared are A2C, Q-learning, Double Q-learning, DQN, Double DQN, Duelling Double DQN,
SARSA, Expected SARSA, Proximal Policy Optimisation (PPO) and Soft Actor-Critic (SAC).

MARKOV DECISION PROCESS


Reinforcement learning models are state-based models built on the Markov Decision Process
(MDP). A Markov decision process is a model for predicting outcomes. Like a Markov chain,
the model attempts to predict an outcome given only the information provided by the current
state. However, the Markov decision process also incorporates actions and rewards: at each
step of the process, the decision maker may choose an action available in the current state,
which moves the model to the next state and offers the decision maker a reward. A Markov
decision process framed as a reinforcement learning problem is depicted in the diagram below.
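
As a toy illustration of these definitions, the following Python sketch defines a small
two-state Markov decision process and samples a few transitions from it. The states, actions,
transition probabilities and rewards are invented purely for illustration and do not belong to
any environment used in this comparison.

import random

# transition model: (state, action) -> list of (probability, next_state, reward)
P = {
    ("s0", "stay"): [(1.0, "s0", 0.0)],
    ("s0", "go"):   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    ("s1", "stay"): [(1.0, "s1", 0.0)],
    ("s1", "go"):   [(1.0, "s0", 2.0)],
}

def step(state, action):
    """Sample the next state and reward using only the current state (Markov property)."""
    outcomes = P[(state, action)]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = random.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward

# at each step the decision maker picks an available action and receives a reward
state = "s0"
for t in range(5):
    action = random.choice(["stay", "go"])
    state, reward = step(state, action)
    print(f"t={t}  action={action}  next_state={state}  reward={reward}")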

OPENAI GYM ENVIRONMENT


OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
The gym open-source library gives access to a standardized set of environments. Gym makes
no assumptions about the structure of the agent and is compatible with any numerical
computation library such as TensorFlow, Theano or PySpark. The interfacing with gym
environments is coded in Python and uses the following env methods; a minimal usage sketch
follows the list.

•	reset(self): Reset the environment's state. Returns observation.
•	step(self, action): Step the environment by one timestep. Returns observation, reward, done, info.
•	render(self, mode='human'): Render one frame of the environment. The default mode will do something human friendly, such as pop up a window.
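
As a brief illustration of how these methods fit together, here is a minimal interaction-loop
sketch in Python. It assumes the classic gym API, in which step() returns (observation,
reward, done, info), and uses CartPole-v0, one of the environments listed below, with a
random placeholder policy rather than any of the compared algorithms.

import gym

env = gym.make("CartPole-v0")

for episode in range(3):
    observation = env.reset()               # reset the environment's state
    done = False
    episode_return = 0.0
    while not done:
        env.render()                        # draw one frame in a pop-up window
        action = env.action_space.sample()  # random placeholder policy
        observation, reward, done, info = env.step(action)
        episode_return += reward
    print(f"episode {episode}: return = {episode_return}")

env.close()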

THE PROBLEMS
•	BipedalWalker-v2
•	Taxi-v3
•	CartPole-v0
•	BankHeist-v0
•	Breakout-v0
•	Kangaroo-v0
•	Pong-v2
•	Seaquest-v4
•	SpaceInvaders-v2
•	Pendulum-v0
•	Ant-v2
•	HalfCheetah-v2
•	Hopper-v2
•	Walker2d-v2
•	Alien-v4
•	BeamRider-v4
•	Frostbite-v4
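
To show how one of the compared algorithms interfaces with one of these problems, the sketch
below applies tabular Q-learning to Taxi-v3, whose state and action spaces are both discrete.
The hyperparameters (alpha, gamma, epsilon, number of episodes) are illustrative defaults
rather than the values used in this comparison, and the classic gym API is assumed.

import gym
import numpy as np

env = gym.make("Taxi-v3")

# one Q-value per (state, action) pair; Taxi-v3 has discrete spaces
q_table = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, done, info = env.step(action)

        # Q-learning update: bootstrap from the greedy value of the next state
        target = reward + gamma * np.max(q_table[next_state]) * (not done)
        q_table[state, action] += alpha * (target - q_table[state, action])
        state = next_state

env.close()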

MODEL COMPARISON
