Lec 2 Supervised Learning Behaviors
CS 285
Instructor: Sergey Levine
UC Berkeley
Terminology & notation
1. run away
2. ignore
3. pet
Aside: notation
control (Russian: управление)
training data → supervised learning
behavioral cloning
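As a minimal sketch of the idea above, behavioral cloning treats expert (observation, action) pairs as training data and fits a policy to them with ordinary supervised learning. Everything in this example is an assumption made for illustration: the one-dimensional observation, the hypothetical expert rule `a = 2 * o + 1`, the linear policy class, and the function names. None of it comes from the lecture.

```python
import random


def collect_demonstrations(n, seed=0):
    """Return n (observation, action) pairs from a hypothetical expert.

    The expert rule a = 2 * o + 1 is invented for this sketch.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        o = rng.uniform(-1.0, 1.0)
        data.append((o, 2.0 * o + 1.0))
    return data


def fit_linear_policy(data):
    """Least-squares fit of a = w * o + b to the demonstration pairs."""
    n = len(data)
    mean_o = sum(o for o, _ in data) / n
    mean_a = sum(a for _, a in data) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in data)
    var = sum((o - mean_o) ** 2 for o, _ in data)
    w = cov / var
    b = mean_a - w * mean_o
    return w, b


# Training data -> supervised learning -> policy.
demos = collect_demonstrations(200)
w, b = fit_linear_policy(demos)


def policy(o):
    """Learned policy: maps an observation to an action."""
    return w * o + b
```

The fit is only checked on observations drawn from the expert's own data distribution; the sketch says nothing about what happens once the learned policy drifts to observations the expert never visited.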
Images: Bojarski et al. ‘16, NVIDIA
The original deep imitation learning system
ALVINN: Autonomous Land Vehicle In a Neural Network
1989
Does it work? No!
Does it work? Yes!
cost
RNN state
1. Output mixture of Gaussians
2. Latent variable models
3. Autoregressive discretization
[Figure labels: (discretized) distribution over dimension 1 only; discrete sampling; dim 1 value]
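The third option in the list above can be sketched as follows: each action dimension is discretized into bins, and the dimensions are sampled one at a time, each conditioned on the bins already chosen. This is an illustration under stated assumptions, not the lecture's implementation. The bin centers in `BINS` and the function `dim_probs`, which stands in for a learned network head, are hypothetical, and the probabilities are hand-written rather than learned.

```python
import random

# Bin centers for every action dimension (assumed for illustration).
BINS = [-1.0, 0.0, 1.0]


def dim_probs(prev_bins):
    """Hypothetical stand-in for a network head.

    Returns a distribution over bins for the next action dimension,
    given the bin indices already chosen for earlier dimensions.
    """
    if not prev_bins:
        # First dimension: two modes (left or right), never the middle bin.
        return [0.5, 0.0, 0.5]
    # Later dimensions: mostly repeat the previous dimension's bin.
    probs = [0.05] * len(BINS)
    probs[prev_bins[-1]] = 0.9
    return probs


def sample_action(num_dims, rng):
    """Sample one action, one discretized dimension at a time."""
    chosen = []
    for _ in range(num_dims):
        probs = dim_probs(chosen)
        idx = rng.choices(range(len(BINS)), weights=probs, k=1)[0]
        chosen.append(idx)
    return [BINS[i] for i in chosen]


rng = random.Random(0)
actions = [sample_action(3, rng) for _ in range(100)]
```

Because each dimension gets its own categorical distribution, the number of outputs grows linearly with the number of dimensions instead of exponentially, while the conditioning still lets the joint distribution have several modes.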
Imitation learning: recap
training data → supervised learning
For more analysis, see Ross et al. “A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”
More general analysis
Another way to imitate
Another imitation idea
Goal-conditioned behavioral cloning
1. Collect data
2. Train goal-conditioned policy
3. Reach goals
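The three steps above can be sketched on a toy problem. This is a hedged illustration with assumptions throughout: states are integers on a line, actions are +1 or -1, each collected trajectory is labeled with the state it actually ended in as its goal, and the "policy" is a lookup table rather than a neural network. The names `relabel`, `train_tabular_policy`, and `rollout` are hypothetical.

```python
from collections import Counter, defaultdict


def relabel(states, actions):
    """Label every step of a trajectory with the state it finally reached."""
    goal = states[-1]
    return [(s, goal, a) for s, a in zip(states, actions)]


def train_tabular_policy(examples):
    """Map each (state, goal) pair to its most frequent action in the data."""
    counts = defaultdict(Counter)
    for s, g, a in examples:
        counts[(s, g)][a] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def rollout(policy, start, goal, max_steps=10):
    """Follow the policy toward the goal; stop on unseen (state, goal) pairs."""
    s = start
    path = [s]
    for _ in range(max_steps):
        if s == goal:
            break
        a = policy.get((s, goal))
        if a is None:
            break
        s = s + a
        path.append(s)
    return path


# 1. Collect data: two toy trajectories that end in different places.
trajectories = [
    ([0, 1, 2, 3], [1, 1, 1]),
    ([0, -1, -2], [-1, -1]),
]
examples = [ex for states, actions in trajectories for ex in relabel(states, actions)]

# 2. Train goal-conditioned policy.
policy = train_tabular_policy(examples)

# 3. Reach goals.
path_right = rollout(policy, 0, 3)
path_left = rollout(policy, 0, -2)
```

A table cannot generalize to goals that never appeared in the data, so `rollout(policy, 0, 5)` stops immediately; a function approximator conditioned on the goal is what would be needed to reach unseen goals.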
Going beyond just imitation?
➢ Repeat