Robots Without Romance: Leslie Pack Kaelbling Artificial Intelligence Laboratory MIT

Robots without Romance
Leslie Pack Kaelbling

Artificial Intelligence Laboratory
MIT
DARPA MARS PI Meeting 02/02
Human-Level Driving
This is incredibly hard:
21.23 Observes traffic ahead, both parked and moving vehicles, to
include cycles possibly obscured by larger vehicles
21.262 Notes drivers who drive with frequent changes in speed
36.1151 Notes vehicles with exhaust smoke coming from them
41.1321 Signals as soon as possible without causing confusion
No way to just program this directly

Could we learn to do it? How? What parts?
Learning to Drive
How long does it take a human to learn to drive?
5 hours per week for a year = (about) 250 hours
How many experiences?
At 1/sec, about 1,000,000
This is
a lot of wall-clock time
not very much experience, from the reinforcement-learning perspective
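The arithmetic behind these figures can be checked in a few lines. This is only a sketch: it assumes 50 practice weeks in the year (which gives the quoted 250 hours) and treats one second of driving as one experience, as the slide does. The constant names are illustrative.

```python
HOURS_PER_WEEK = 5
PRACTICE_WEEKS = 50          # assumption: chosen so the total matches "about 250 hours"
EXPERIENCES_PER_SECOND = 1   # the slide's rate of one experience per second
SECONDS_PER_HOUR = 3600

hours = HOURS_PER_WEEK * PRACTICE_WEEKS
experiences = hours * SECONDS_PER_HOUR * EXPERIENCES_PER_SECOND

print(f"{hours} hours, {experiences:,} experiences")
```

The result, 900,000, is the figure the slide rounds to "about 1,000,000".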
Romantic View of Learning
Figure out the one true learning algorithm
Don't pollute it with your preconceptions
Let it run for a long time
Come back and find a robot that's smarter than you
The Death of Romance

Making Learning Practical

Run robot training camps
What to Learn?
Current state: estimate the current state of the world
localization, tracking, mapping
Policy: how should the robot behave?
world state → action
World dynamics: how does the world change as a
function of the robot's actions?
world state × action → world state
use with a planner
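The two mappings above can be written as function types, and a dynamics model becomes a policy once it is paired with a planner. The sketch below is a toy illustration only: the 1-D world, `toy_dynamics`, `plan`, and `planner_policy` are hypothetical names that do not come from the slides, and the model is hand-written rather than learned.

```python
from typing import Callable, Sequence

State = int    # hypothetical world state: a position on a 1-D line
Action = int   # hypothetical action: a step of -1, 0, or +1

Policy = Callable[[State], Action]            # world state -> action
Dynamics = Callable[[State, Action], State]   # world state x action -> world state


def toy_dynamics(state: State, action: Action) -> State:
    """Stand-in for a learned model: the action shifts the position."""
    return state + action


def plan(model: Dynamics, state: State, goal: State,
         actions: Sequence[Action] = (-1, 0, 1), horizon: int = 3) -> Action:
    """Exhaustive lookahead: return the first action of the cheapest sequence."""
    def cost(s: State, depth: int) -> int:
        distance = abs(goal - s)
        if depth == 0:
            return distance
        return distance + min(cost(model(s, a), depth - 1) for a in actions)

    return min(actions, key=lambda a: cost(model(state, a), horizon - 1))


def planner_policy(goal: State) -> Policy:
    """Turn the dynamics model plus planner into a policy."""
    return lambda state: plan(toy_dynamics, state, goal)


policy = planner_policy(goal=5)
state = 0
for _ in range(5):
    state = toy_dynamics(state, policy(state))
print(state)
```

Learning the policy directly skips the planner at run time; learning the dynamics instead lets the same model serve different goals.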
How to Train a Robot to behave like you
Full supervision: human provides input/output pairs
Evolutionary elimination of operator intervention:
The Man Behind the Curtain
Indirect supervision: human guides robot through a
behavior using different sensori-motor modalities
Using human vision + joystick to train mobile robot
with laser rangefinder
ALVINN
Imitation: watch someone else do the task
Like supervised learning, but hard to transfer
How to Train a Robot to be better than you
Reinforcement: tell the robot when its behavior is good or bad
Reinforcement with human guidance
RL with Human Guidance

Joint work with Bill Smart
Humans are bad at writing good robot programs,
but great at writing bad ones!
Supply an example policy
Need not be optimal
Explicitly coded or using direct control
Used to generate experience
Follow example policy and learn about the world
Shows learner interesting parts of the space
Bad initial policies might be better
Learning Phase One
[Diagram: Environment, Supplied Control Policy, and Learning System blocks, with signals labeled R and O]
Learning Phase Two
[Diagram: Environment, Supplied Control Policy, and Learning System blocks, with signals labeled R and O]
What Does This Give Us?

Natural way to insert human knowledge
Keeps the robot safe in early stages of learning
Bootstraps information into the Q-function
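One way to picture the two phases is with tabular Q-learning on a toy corridor. This is a minimal sketch under stated assumptions, not the system used in the experiments: the slides do not specify the learner, so the tabular learner, the 10-cell corridor, the learning rate, the discount, the 70% example policy, and every function name here are illustrative choices.

```python
import random

N_STATES = 10            # hypothetical corridor cells 0..9; the goal is the last cell
ACTIONS = (-1, +1)       # move left or move right
GOAL_REWARD = 10.0       # mirrors the +10 label on the corridor-following slide
ALPHA, GAMMA = 0.5, 0.9  # assumed learning rate and discount factor


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (GOAL_REWARD if done else 0.0), done


def supplied_policy(state, rng):
    """Deliberately mediocre example policy: heads right only 70% of the time."""
    return +1 if rng.random() < 0.7 else -1


def greedy(q, state):
    """Pick the action with the highest Q-value (ties go to the first action)."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def run_episode(q, choose_action, max_steps=200):
    """Run one episode; Q is updated from every transition, whoever chose the action."""
    state, steps = 0, 0
    while steps < max_steps:
        action = choose_action(state)
        nxt, reward, done = step(state, action)
        target = reward if done else reward + GAMMA * max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (target - q[(state, action)])
        state, steps = nxt, steps + 1
        if done:
            break
    return steps


def train(phase1_runs=10, phase2_runs=20, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(phase1_runs):   # phase one: the supplied policy drives, the learner watches
        run_episode(q, lambda s: supplied_policy(s, rng))
    # phase two: the learned Q-function drives and keeps learning
    phase2_steps = [run_episode(q, lambda s: greedy(q, s)) for _ in range(phase2_runs)]
    return q, phase2_steps


q, phase2_steps = train()
_, cold_start = train(phase1_runs=0, phase2_runs=1)
print(phase2_steps[:3], cold_start)
```

In this toy, the phase-two learner walks straight to the goal in 9 steps even though the example policy was far from optimal. A purely greedy learner with no phase one never sees a reward: its Q-values stay at zero and it runs out its 200 steps.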
Corridor Following
[Figure: corridor-following task, with a reward label of +10]
Look Ma! No zeros!

[Plot: average steps to goal (y-axis) against training runs (x-axis), in Phase 1 and Phase 2 panels, with reference levels marked "average training" and "optimal"]
Obstacle Avoidance
[Figure: obstacle-avoidance task, with reward labels +1, -1, and -1]
Obstacle Avoidance
[Plot: % successful runs (y-axis, 0 to 100) against training runs (x-axis), in Phase 1 and Phase 2 panels]
Obstacle Avoidance
[Plot: steps to goal (y-axis) against training runs (x-axis), with reference levels marked "best example" and "optimal"]
After 5 Phase 1 Runs
After 40 Phase 2 Runs
How Long?
Each task took about 2 hours
Phase 1 and phase 2 training
Not including evaluation runs
Much faster than hand-coding and tuning
Obstacle avoidance simulation with examples
No obstacles
Time to reach goal state with arbitrary actions
28.5% reach goal in less than one week
Average time for successful runs is 6 hours

Build in bias
direct policy search: fix controller with just a few
free parameters
factor: many small instances of a learning or
reasoning problem
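As a hedged illustration of direct policy search, the sketch below fixes the controller's structure (steering proportional to lateral offset) and searches over its single free parameter by random hill climbing. The toy offset model, the cost function, and the names `rollout_cost` and `hill_climb` are assumptions made for the example; none of them come from the slides.

```python
import random


def rollout_cost(gain, steps=50, dt=0.1):
    """Simulate a toy lateral-offset model and return the accumulated squared offset."""
    offset, cost = 1.0, 0.0
    for _ in range(steps):
        steer = -gain * offset      # fixed controller structure; only `gain` is free
        offset += steer * dt
        cost += offset * offset * dt
    return cost


def hill_climb(initial_gain=0.1, iterations=200, sigma=0.5, seed=0):
    """Perturb the gain at random and keep a candidate only if it lowers the cost."""
    rng = random.Random(seed)
    best_gain, best_cost = initial_gain, rollout_cost(initial_gain)
    for _ in range(iterations):
        candidate = best_gain + rng.gauss(0.0, sigma)
        cost = rollout_cost(candidate)
        if cost < best_cost:
            best_gain, best_cost = candidate, cost
    return best_gain, best_cost


gain, cost = hill_climb()
print(f"gain={gain:.2f} cost={cost:.4f}")
```

The bias is in the controller's form: the search only has to tune one number, so it needs far fewer trials than learning an unconstrained policy.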
Do brain transplants!
Driving is AI Complete!
In order to do a good job at driving you need
Naïve physics
Will it damage my car if I run over that thing?
Folk psychology
Will that person be mad if I dart into that small
space in front of him?
Humans know a lot about the world before they start to
learn to drive
We need to understand fundamental commonsense AI
If I Were King
Two high-level parallel efforts:
Bottom-up driving systems starting in highly
restricted domains and gradually relaxing
restrictions
Basic research in how to acquire and use common
sense in everyday tasks, such as driving
The Solipsist Driver

The solipsist driver imagines he's all alone:
Stays (mostly) in his lane
Executes left and right turns correctly (ignoring
traffic)
Avoids most static obstacles in his path
Can park in most reasonable spaces
The Moron Driver

The moron driver
Has the skills of the solipsist driver
Takes traffic into account during lane-changes
and turns
Predicts traffic motions using simple extrapolation
Tries to avoid pedestrians if it sees them, but
doesn't make clever predictions or
actively look for them
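"Simple extrapolation" could be as plain as a constant-velocity prediction. The sketch below assumes each tracked vehicle is summarized by a 2-D position and velocity; the `Track` class, `extrapolate`, `min_gap`, and the 3-second horizon are hypothetical choices for illustration, not details from the slides.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical tracked vehicle: position in meters, velocity in meters per second."""
    x: float
    y: float
    vx: float
    vy: float


def extrapolate(track, dt):
    """Predict the position dt seconds ahead, assuming velocity stays constant."""
    return (track.x + track.vx * dt, track.y + track.vy * dt)


def min_gap(ego, other, horizon=3.0, step=0.5):
    """Smallest predicted distance between two tracks over the horizon."""
    gaps = []
    for i in range(int(horizon / step) + 1):
        t = i * step
        ex, ey = extrapolate(ego, t)
        ox, oy = extrapolate(other, t)
        gaps.append(math.hypot(ex - ox, ey - oy))
    return min(gaps)


ego = Track(x=0.0, y=0.0, vx=10.0, vy=0.0)
stopped_car = Track(x=30.0, y=0.0, vx=0.0, vy=0.0)
print(extrapolate(ego, 2.0), min_gap(ego, stopped_car))
```

A lane change or turn would then be allowed only if the predicted gap stays above some safety margin, which is about the level of foresight the moron driver is meant to have.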