GameMindsDT - Final Report
GameMindsDT
OUR DECISION TRANSFORMER
Pol Fernández
Omar Aguilera
Shuang Long
Edgar Planell
Alex Barrachina
Documentation on GitHub!
Motivation
● We want to succeed in building an agent that
can learn and make its own decisions in
a complex environment.
Framework & Tech Requirements
Decision Transformer
Definition:
● Based on a transformer architecture
● Reduces the RL problem to a sequence-modeling
problem
● Uses a GPT-like model to make predictions
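The reduction to a sequence problem can be sketched as follows. This is a minimal illustration, not the project's code: the helper names `returns_to_go` and `interleave` are hypothetical, and the toy trajectory is made up. It assumes the usual Decision Transformer layout, in which each timestep contributes three tokens (return-to-go, state, action) and the return-to-go is the sum of rewards from that step to the end of the episode.

```python
def returns_to_go(rewards):
    """Return-to-go at step t = sum of rewards from t to the end of the episode."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def interleave(rtg, states, actions):
    """Flatten a trajectory into the token order (R_1, s_1, a_1, R_2, s_2, a_2, ...)."""
    tokens = []
    for R, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", R), ("state", s), ("action", a)])
    return tokens


# Toy trajectory (made-up values, for illustration only).
rewards = [1.0, 0.0, 2.0]
states = ["s1", "s2", "s3"]
actions = ["a1", "a2", "a3"]

rtg = returns_to_go(rewards)
sequence = interleave(rtg, states, actions)
```

A GPT-like model trained on such sequences is asked to predict each action token from the tokens that precede it.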
Decision Transformer
Implementation
Input embedding
Position embedding
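As a rough sketch of how the two embeddings combine, the snippet below follows the original Decision Transformer design, where the three tokens of one timestep share the same position (timestep) embedding. Everything here is an assumption for illustration: `EMBED_DIM`, `MAX_TIMESTEP`, `embed_input` and `embed_timestep_tokens` are hypothetical names, and random numbers stand in for learned weights. The project's actual layers may differ.

```python
import random

random.seed(0)
EMBED_DIM = 4      # hypothetical embedding size
MAX_TIMESTEP = 10  # hypothetical maximum episode length

# A learned lookup table in a real model; random numbers here.
timestep_table = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)]
                  for _ in range(MAX_TIMESTEP)]


def embed_input(value):
    """Stand-in for a learned input embedding (a linear layer in a real model)."""
    return [value * (i + 1) for i in range(EMBED_DIM)]


def embed_timestep_tokens(t, rtg, state, action):
    """Embed the three tokens of timestep t, adding the same position embedding to each."""
    pos = timestep_table[t]
    return [[x + p for x, p in zip(embed_input(v), pos)]
            for v in (rtg, state, action)]


tokens = embed_timestep_tokens(t=2, rtg=3.0, state=0.5, action=1.0)
```

Indexing the position table by timestep rather than by token index is what lets the model tell that a return-to-go, a state and an action belong to the same step.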
Decision Transformer
Implementation
Continuous Discrete
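One common way the continuous and discrete cases differ is the training loss on the predicted action: mean squared error for continuous action vectors, cross-entropy over action logits for discrete actions. The sketch below assumes that convention (it is the one used in the original Decision Transformer) and is not taken from the project's code. The function names and numbers are illustrative only.

```python
import math


def mse_loss(predicted, target):
    """Continuous actions: mean squared error between action vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def cross_entropy_loss(logits, target_index):
    """Discrete actions: negative log-probability of the target action."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


# Made-up predictions and targets.
continuous_loss = mse_loss([0.5, -0.5], [1.0, 0.0])
discrete_loss = cross_entropy_loss([2.0, 0.0, 0.0], target_index=0)
```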
Atari environment
Why:
○ Complexity
○ Benchmarking
Implementation:
○ States are images
○ Discrete
○ Tanh > LayerNorm
○ Global position encoder
DT-Atari
Atari environment
Results:
○ Learned to play well!
○ Better scores than other offline RL techniques
○ Better with expert data
○ Initial return-to-go (R-t-g) set to 5× the maximum in the dataset
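The return-to-go conditioning at evaluation time can be sketched as below. This assumes that "maximum" refers to the best episode return in the dataset and that the target is decremented by each reward received, as in the standard Decision Transformer evaluation loop. The function names and numbers are hypothetical.

```python
def initial_return_to_go(dataset_returns, scale=5.0):
    """Target return used to condition the agent: scale x best return in the dataset."""
    return scale * max(dataset_returns)


def rollout_targets(initial_rtg, rewards):
    """Return-to-go fed to the model at each step, decremented by the reward just received."""
    rtg = initial_rtg
    history = [rtg]
    for r in rewards:
        rtg -= r
        history.append(rtg)
    return history


dataset_returns = [12.0, 30.0, 21.0]                # made-up episode returns
target = initial_return_to_go(dataset_returns)      # 5 x 30.0
history = rollout_targets(target, [1.0, 0.0, 4.0])  # made-up rewards
```

Asking for more return than any episode in the dataset is a way of prompting the model for its best behaviour rather than for an average trajectory.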
Atari environment
Future explorations:
○ Better data > lives
○ Split images in patches
○ Position encoding strategies > ALiBi, RoPE
○ Multi-Game-DT
Decision Transformer applied to Pybullet Environments-V0
Pybullet - Datasets
Samples reduced
Predicted action
x1000
(~20 train/test runs)
Pybullet - Walker 2D Environment
Results:
Training
Test
859.26: Test run [Attempt 1]
(~20 train/test runs)
Pybullet - Halfcheetah Environment
Changes Implemented?
● Training with more computationally demanding hyperparameters (more powerful hardware)
● Global position encoding
● Constant return-to-go (∞ / 0): “Exploration-exploitation trade-off”
Results(?):
● Not successful
(~100 train/test runs)
Pybullet - Ant Environment
● Environment with the highest dimensionality (↑ Difficulty)
(~40 train/test runs)
Pybullet - Conclusions & Future Work
● The Decision Transformer succeeded in 2 of the 4 Pybullet Environments-V0
MineRL “Find Cave” - From theory to practice
MineRL - Our DT Agent Finds Caves
[Diagram labels: Episode, VPT, Episodes, 1024 x Frame embeddings, actions]
Results
● 50% is not only luck
● Good Explorer
● The other tasks fail
Key insights
DT-MineRL
Project Milestones: Our Journey
Conclusions
THANK YOU FOR YOUR ATTENTION
Q&A TIME!