Reinforcement Learning - OpenAI Gym

Reinforcement learning – OpenAI Gym
Jakub Senčák, Pavel Podlužanský, Martin Pospísil,
Viet Anh Phan, Dinh Thao Le
Content

• Assignment
• Motivation
• Reinforcement learning
• The chosen problem
• Approach to the problem
• Created solution to the problem
• Results
• Conclusion
Assignment

• Get acquainted with the topic of reinforcement learning.
• Choose any environment from https://gym.openai.com/.
• Create a model that is able to play the game.
Motivation

• Gaming
• Resource management
• Personalized recommendations
• Robotics
Reinforcement learning

• Learning from interaction with an environment to achieve a long-term goal that is related to the state of the environment.
• The goal is defined by a reward signal, which the agent must maximize.
• The agent must be able to sense the environment state, fully or partially, and take actions that influence that state.
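The interaction loop described above can be sketched in a few lines of Python. This is only an illustration: `LineWorld`, `run_episode`, `discounted_return`, the reward values and the discount factor `gamma` are all hypothetical and are not taken from our project. The agent here acts randomly and does not learn yet.

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk along positions 0..4; reaching 4 ends the episode."""

    def __init__(self, size=5):
        self.size = size
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left (clipped to the line).
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.size - 1)
        done = self.position == self.size - 1
        reward = 1.0 if done else -0.1  # small penalty per step, bonus at the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=1000):
    """Let the agent sense the state, act, and collect the reward signal."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


def discounted_return(rewards, gamma=0.99):
    """Long-term objective: sum of rewards, each discounted by gamma per time step."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rng = random.Random(0)
rewards = run_episode(LineWorld(), lambda state: rng.choice([0, 1]))
print(len(rewards), discounted_return(rewards))
```

A learning agent would replace the random policy with one that is adjusted to increase the discounted return.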
The chosen problem

• Lunar Lander – The goal is to get the lander to land on the landing pad.
• Moving from the top of the screen to the landing pad and coming to rest there is worth about +100 to +140 points.
• If the lander moves away from the landing pad, it loses that reward again.
• The episode finishes if the lander crashes or comes to rest (an additional -100 or +100 points, respectively).
• The problem is solved if we get at least 200 points.
• Four discrete actions are available: do nothing, fire left orientation engine, fire main engine, fire right orientation engine.
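The action set and the success criterion above can be written down as a small sketch. The names `LanderAction` and `is_solved` are hypothetical. The integer indices follow the order in which the actions are listed above, which to our knowledge matches the Gym environment. Averaging the score over the last 100 episodes is an assumption, a common convention rather than something stated in the task.

```python
from enum import IntEnum
from statistics import mean


class LanderAction(IntEnum):
    """The four discrete actions, indexed in the order listed above."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


SOLVED_SCORE = 200  # threshold from the task description


def is_solved(episode_scores, window=100):
    """Assumption: 'solved' means the mean score of the last `window` episodes reaches 200."""
    if len(episode_scores) < window:
        return False
    return mean(episode_scores[-window:]) >= SOLVED_SCORE


print(list(LanderAction))
print(is_solved([250] * 100))
```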
Approach to the problem

• Chosen method of RL:
  • Deep Q-learning
• Used libraries:
  • NumPy
  • TensorFlow
  • Keras
• The code is executed in a Google Colab notebook.
Q-learning

• The AI agent attempts to construct an optimal policy directly by interacting with the environment.
• It uses a trial-and-error approach: the agent repeatedly tries to solve the problem using varied approaches and continuously updates its policy as it learns more about the environment.
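To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration on a hypothetical five-position line world (goal at position 4). The environment, the function names and the hyperparameters `alpha`, `gamma` and `epsilon` are illustrative assumptions, not the settings of our Lunar Lander model.

```python
import random
from collections import defaultdict

N_STATES = 5      # positions 0..4, goal at position 4 (toy environment)
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_table[state])
    return rng.choice([a for a in ACTIONS if q_table[state][a] == best])


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q_table = defaultdict(lambda: [0.0, 0.0])  # one Q-value per action
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q_table, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q_table[next_state])
            q_table[state][action] += alpha * (target - q_table[state][action])
            state = next_state
    return q_table


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The policy is never stored explicitly; it is read off the table by picking the action with the highest Q-value in each state.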
Deep Q-learning

• Q-Learning: A table maps each state-action pair to its corresponding Q-value.
• Deep Q-Learning: A neural network maps input states to (action, Q-value) pairs.
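The difference can be illustrated with a tiny network that takes a state vector and returns one Q-value per action. This is a plain-Python sketch with untrained random weights, not the Keras model used in the project. `TinyQNetwork`, the hidden size and the weight range are hypothetical, and the state length of 8 is our assumption about the Lunar Lander observation vector.

```python
import random

STATE_SIZE = 8  # assumed length of the Lunar Lander observation vector
N_ACTIONS = 4   # the four discrete actions


class TinyQNetwork:
    """One hidden ReLU layer; outputs one Q-value per action (weights are untrained)."""

    def __init__(self, hidden_size=16, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w1 = matrix(hidden_size, STATE_SIZE)
        self.b1 = [0.0] * hidden_size
        self.w2 = matrix(N_ACTIONS, hidden_size)
        self.b2 = [0.0] * N_ACTIONS

    def q_values(self, state):
        # Hidden layer: weighted sum of the state followed by ReLU.
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, state)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        # Output layer: one linear unit per action.
        return [
            sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(self.w2, self.b2)
        ]

    def greedy_action(self, state):
        q = self.q_values(state)
        return max(range(N_ACTIONS), key=q.__getitem__)


network = TinyQNetwork()
example_state = [0.1] * STATE_SIZE
print(network.q_values(example_state), network.greedy_action(example_state))
```

Unlike a table, the network does not need one entry per state, so it can handle continuous state values such as positions and velocities.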
Created solution to the problem

• Code and explanation to be added here
Results

• Screenshot of the scores
• Maybe one or two GIFs or videos
Conclusion
• We became acquainted with reinforcement learning, Q-learning, and deep Q-learning.
• We created a model that can play the Lunar Lander game.
• The result of the game is xxxxx after xxxxx episodes. Based on that, we consider the model a success.
Thank you for your attention