Professional Documents
Culture Documents
DRLAD ExploringApplicationsTalk2019 CognitiveVehicles
DRLAD ExploringApplicationsTalk2019 CognitiveVehicles
Victor Talpaert(1), Ibrahim Sobh(2), B Ravi Kiran(3), Patrick Mannion(4), Senthil Yogamani(5), Ahmad El-Sallab(2), Patrick Perez(7)
(1) U2IS, ENSTA ParisTech, Palaiseau, AKKA Technologies (2) Valeo Egypt, Cairo (3) Navya Labs, Paris (4) Galway-Mayo Institute of Technology,
Ireland (5) Valeo Vision Systems, Ireland (7) Valeo.ai, France
ravi.kiran@navya.tech
100%Autonomous 100%Driverless 100%Electric
OVERVIEW
Conclusion
Current solutions in deployment in industry
Summary and open questions
2
A U TO N O M O U S D R I V I N G
state
RL Agent Environment
Actions Sensor stream
States-to-Actions (Policy)
Cameras/Lidars/Radars
5
S TAT E S P A C E , A C T I O N S A N D R E W A R D S
7
T E R M I N O LO G I E S
assumption
Reinforcement Learning
Exploration
* On policy methods
Prediction Learning (P, R, Policy) using
Learning using current policy
Evaluation of Policy current policy
Exploitation
Control
Off policy methods Learn Policy given
Infer optimal policy
Learn from observations (P, R, Value function)
(policy/value iteration)
Issues :
Agent mimics expert behavior at the danger of not recovering 1986 2015
from unseen scenarios such as unseen driver behavior, vehicle
orientations, adversarial behavior of agents (overfits expert)
Poorly defined reward functions cause poor exploration
Requires huge No. (>30M samples) of human expert samples
Improvements :
Heuristics to improve data collection in corner cases (Dagger)
Imitation is efficient in practice and still an alternative
DAGGER : A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Hierarchical Imitation and Reinforcement Learning 9
S I M U L ATO R S : E N V I R O N M E N T F O R R L Motion planning & traffic simulators
TORCS SUMO
DEEPDRIVE CarRacing-v0
10
Audi partners with Israel's autonomous vehicle simulation startup Cognata
M O D E R N D AY R E I N F O R C E M E N T L E A R N I N G A P P L I C AT I O N S
Temporal abstraction
Credit assignment and Exploration-Exploitation
Dilemma
Cobra effect : RL algorithms are blind maximizers of
expected reward
Hierarchy of tasks
Decompose the problem in multilple
subproblems which are easier.
Combining learnt policies
https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/
14
Composing Meta-Policies for Autonomous Driving Using Hierarchical Deep Reinforcement Learning
L O N G TA I L D R I V E R P O L I C Y
https://nv-tlabs.github.io/meta-sim/#
16
Carla Challenge 2019
C H A L L E N G E S S I M U L AT I O N - R E A L I T Y G A P
Handling domain transfer
https://worldmodels.github.io/
18
CHALLENGES: SAFET Y AND REPRODUCIBILIT Y
19
Carla Challenge 2019
D R L A D : M O D E R N D AY D E P L OY M E N T S
Options Graph :
21
WHERE IS THE HYPE ON DEEP RL
22
WHERE IS THE HYPE ON DEEP RL
24
REFERENCES
World Models Ha, Schimdhuber NeurIPS 2018 . Shai Shalev-Shwartz, Shaked Shammah, and Amnon
Shashua. Safe, multi-agent, reinforcement learning for
Jianyu Chen, Zining Wang, and Masayoshi Tomizuka. autonomous driving 2016.
“Deep Hierarchical Reinforcement Learning for
Autonomous Driving with Distinct Behaviors”. 2018 Learning to Drive in a Day, Alex Kendall et al. 2018
IEEE Intelligent Vehicles Symposium (IV). Wayve
Peter Henderson et al. “Deep Reinforcement ChauffeurNet: Learning to Drive by Imitating the Best
Learning That Matters”. In: (AAAI-18), 2018. and Synthesizing the Worst
Andrew Y Ng, Stuart J Russell, et al. “Algorithms for StreetLearn, Deepmind google
inverse reinforcement learning
Daniel Chi Kit Ngai and Nelson Hon Ching Yung. “A A Systematic Review of Perception System and
multiple-goal reinforcement learning method for Simulators for Autonomous Vehicles Research [pdf]
complex vehicle overtaking maneuvers”
Meta-Sim: Learning to Generate Synthetic Datasets
Stephane Ross and Drew Bagnell. “Efficient [link][pdf] Sanja Fidler et al.
reductions for imitation learning”. In: Proceedings of the
thirteenth international conference on artificial Deep Reinforcement Learning in the Enterprise:
intelligence and statistics. 2010
Bridging the Gap from Games to Industry 2017 [link]
Learning to Drive using Inverse Reinforcement
Learning and Deep Q-Networks, Sahand Sharifzadeh
et al.
25
DEEP Q LEARNING
Auto nom o us dr iving agent in TORCS
26
M A R KOV D E C I SI O N PRO C E SS
Markovian assumptio n on reward structur e
27
C H A L L E N G E S I M I TAT I O N L E A R N I N G
Source
improving
diversity of
steering angle