
REINFORCEMENT LEARNING

GameMindsDT
OUR DECISION TRANSFORMER

Pol Fernández
Omar Aguilera
Shuang Long
Edgar Planell
Alex Barrachina

Documentation on GitHub!
Motivation
● We want to succeed at building an agent that
is able to learn and make its own decisions in
a complex environment.

● RL is an interesting field to dive into.

● Transformers are really popular nowadays.

● The fusion of both technologies is an innovative idea,
since it offers another approach to RL.

Framework & Tech Requirements

Decision Transformer
Definition:
● Based on a transformer architecture
● Reduces the RL problem to a sequence
modeling problem
● Uses a GPT-like model to make predictions

Decision Transformer
Implementation
Input embedding
Position embedding
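The two embedding steps above can be sketched as follows. This is a toy illustration, not our implementation: all dimensions and weight matrices are random stand-ins for learned parameters. Each timestep contributes three tokens (return-to-go, state, action), and each token gets its timestep's position embedding added.

```python
import numpy as np

# Hypothetical toy dimensions (not from the slides).
ctx_len, state_dim, act_dim, d_model = 4, 3, 2, 8
rng = np.random.default_rng(0)

# Stand-ins for learned linear input projections, one per modality.
W_rtg = rng.normal(size=(1, d_model))
W_state = rng.normal(size=(state_dim, d_model))
W_act = rng.normal(size=(act_dim, d_model))
# One learned embedding per timestep (shared by the three tokens of that step).
E_time = rng.normal(size=(ctx_len, d_model))

def embed_trajectory(rtg, states, actions, timesteps):
    """Interleave (return-to-go, state, action) tokens, each with its
    timestep embedding added, into one sequence of 3*ctx_len tokens."""
    tok_r = rtg @ W_rtg + E_time[timesteps]        # (ctx_len, d_model)
    tok_s = states @ W_state + E_time[timesteps]
    tok_a = actions @ W_act + E_time[timesteps]
    seq = np.stack([tok_r, tok_s, tok_a], axis=1)  # (ctx_len, 3, d_model)
    return seq.reshape(-1, d_model)                # (3*ctx_len, d_model)

seq = embed_trajectory(
    rtg=rng.normal(size=(ctx_len, 1)),
    states=rng.normal(size=(ctx_len, state_dim)),
    actions=rng.normal(size=(ctx_len, act_dim)),
    timesteps=np.arange(ctx_len),
)
print(seq.shape)  # (12, 8): three tokens per timestep
```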

Decision Transformer
Implementation

               Continuous                         Discrete
Architecture   Values inside range [-1, 1];      _
               Hyperbolic Tangent is used
Training       Mean Squared Error is used        Cross-Entropy is used
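A minimal sketch of the two output heads just described, with random stand-in weights: tanh keeps continuous actions in [-1, 1] and is trained with mean squared error, while the discrete head produces logits trained with cross-entropy.

```python
import numpy as np

# Toy dimensions and random stand-ins for learned parameters.
rng = np.random.default_rng(1)
d_model, act_dim, n_actions = 8, 2, 4
h = rng.normal(size=(d_model,))       # hidden state of the last state token

# Continuous head: linear layer + tanh squashes actions into [-1, 1];
# trained with mean squared error against the dataset action.
W_cont = rng.normal(size=(d_model, act_dim))
a_pred = np.tanh(h @ W_cont)
a_target = np.array([0.5, -0.2])      # invented target action
mse_loss = np.mean((a_pred - a_target) ** 2)

# Discrete head: linear layer produces logits over actions;
# trained with cross-entropy against the dataset action index.
W_disc = rng.normal(size=(d_model, n_actions))
logits = h @ W_disc
log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
ce_loss = -log_probs[1]               # invented target action index 1
```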

Atari environment

Why:
○ Complexity
○ Benchmarking

Implementation:
○ States are images
○ Discrete actions
○ Tanh → LayerNorm
○ Global position encoder

DT-Atari
Atari environment
Results:
○ Learned to play well!
○ Better scores than other offline RL techniques
○ Better with expert data
○ Initial return-to-go set to 5× the maximum return in the dataset
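The return-to-go conditioning in the last bullet can be sketched like this; `model.predict_action`, `env.reset`, and `env.step` are hypothetical names, and the dataset returns are invented toy values.

```python
# Invented toy dataset returns, for illustration only.
dataset_returns = [120.0, 340.0, 95.0]
target_return = 5 * max(dataset_returns)   # initial R-t-g = 5x dataset max

def run_episode(env, model, target_return):
    """Roll out one episode, decrementing the return-to-go as reward
    arrives (hypothetical env/model interfaces)."""
    rtg = target_return
    obs, done, total = env.reset(), False, 0.0
    while not done:
        action = model.predict_action(obs, rtg)
        obs, reward, done = env.step(action)
        total += reward
        rtg -= reward                      # condition on remaining return
    return total

print(target_return)  # 1700.0
```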

Atari environment
Future explorations:
○ Better data > lives
○ Split images into patches
○ Position encoding strategies → ALiBi, RoPE
○ Multi-Game-DT

Decision Transformer
applied to

Pybullet Environments-V0

Hopper Walker2D Halfcheetah Ant


Link to GitHub Readme
Pybullet - Datasets
(Sidebar: Expert policy? Continuous space?)

● Random: Datasets sampled with a randomly initialized policy.

● Medium: Datasets sampled with a medium-level policy.

● Mixed: Datasets collected during policy training.
Pybullet - Datasets
(Sidebar: Samples reduced)

● Normalization of episode lengths:

Excluded episodes shorter than the mean duration to avoid introducing noise into training.

● Normalization of the observation space:

Applied standard-score (z-score) normalization, using the mean and standard deviation computed from the data.

● Generation of additional model inputs:

Manually computed the return-to-go and timestep arrays.
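The preprocessing steps above can be sketched on toy data (the rewards and observations below are invented for illustration):

```python
import numpy as np

rewards = np.array([1.0, 0.0, 2.0, 3.0])            # one episode's rewards
observations = np.array([[0.1, 2.0], [0.3, 4.0],
                         [0.2, 6.0], [0.4, 8.0]])

# Return-to-go: at each timestep, the sum of all future rewards.
rtg = np.cumsum(rewards[::-1])[::-1]
# -> [6., 5., 5., 3.]

# Standard-score (z-score) normalization of the observation space,
# using the mean/std computed from the dataset.
mean, std = observations.mean(axis=0), observations.std(axis=0)
obs_norm = (observations - mean) / (std + 1e-8)

# Timestep array, the other manually generated model input.
timesteps = np.arange(len(rewards))
```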
Pybullet - Hopper Environment

Root cause?
Normalization of the observation space during inference was missing!

Results: predicted actions were off by a factor of ~1000.

(~ 20 Train/Test runs )
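A sketch of the fix, assuming the training-set statistics were saved during preprocessing (all numbers here are invented): at inference, the raw observation must pass through the same standard-score normalization before being fed to the model.

```python
import numpy as np

# Stand-ins for statistics saved during dataset preprocessing.
obs_mean = np.array([0.25, 5.0])
obs_std = np.array([0.11, 2.23])

def normalize_obs(obs):
    """Apply the same standard-score normalization used during training."""
    return (obs - obs_mean) / (obs_std + 1e-8)

raw_obs = np.array([0.36, 7.23])
model_input = normalize_obs(raw_obs)   # feed this, not raw_obs, to the DT
```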
Pybullet - Walker 2D Environment
Training

Results:

859,26 : Test Run [Attempt 1]

1110,386 : Test Run [Attempt 2]

733,893 : Test Run [Attempt 3]

(~ 20 Train/Test runs )
Pybullet - Halfcheetah Environment
Changes implemented:
● Training with more computationally demanding
hyperparameters (more powerful hardware)
● Global Position Encoding

● Constant Return-to-go (∞ / 0):
the "exploration-exploitation trade-off"

Results: not successful.

(~ 100 Train/Test runs )
Pybullet - Ant Environment
● Environment with the highest dimensionality (↑ difficulty)

● Similar changes implemented as in the Halfcheetah environment
(more powerful hardware)

Training highlights: not successful results.

(~ 40 Train/Test runs )
Pybullet - Conclusions & Future Work
● The Decision Transformer succeeded in 2/4 Pybullet Environments-V0

● Train the Decision Transformer with a larger dataset (expert policy)

● Implement other positional encoding strategies:

✓ Global Position Encoding
✓ Rotary Position Embedding (RoPE)
✓ Attention with Linear Biases (ALiBi)
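As an illustration of one of the strategies listed above, here is a toy numpy sketch of RoPE. This is not our implementation, just the core idea: each pair of embedding dimensions is rotated by a position-dependent angle.

```python
import numpy as np

def rope(x, t):
    """Apply Rotary Position Embedding to one token embedding x at position t."""
    d = x.shape[0]
    theta = 10000.0 ** (-np.arange(d // 2) * 2 / d)  # per-pair frequencies
    ang = t * theta
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    # Rotate each (even, odd) dimension pair by its angle.
    out[0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

x = np.ones(4)
print(rope(x, 0))  # position 0 leaves the embedding unchanged
```

Because each pair is rotated rather than shifted, the embedding norm is preserved and relative positions show up in attention dot products.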

● Try other physics environments (MuJoCo/Gymnasium):

✓ More up-to-date documentation (easier troubleshooting)
✓ Fewer deprecated dependencies & more compatibility

● Benchmark Decision Transformer performance vs state-of-the-art RL algorithms
MineRL “Find Cave” - From theory to practice

MineRL - Our DT Agent Finds Caves

[Diagram: VPT encodes episode frames into 1024-dimensional embeddings, paired with actions]
Results
● 50% success is not just luck
● Good explorer
● The other tasks fail

Key insights

● Training is quite simple
● VPT is powerful
● Not enough data for complex tasks
● Enough data for simple tasks

DT-MineRL

Project Milestones: Our Journey

Conclusions

📈 Deeper understanding of DTs, harnessing their potential

💡 Improvements to the original Decision Transformer
🐋 The versatility and importance of Docker

THANK YOU FOR YOUR
ATTENTION

Q&A TIME!