
Robots without Romance

Leslie Pack Kaelbling


Artificial Intelligence Laboratory
MIT

Leslie Pack Kaelbling

DARPA MARS PI Meeting 02/02

Human-Level Driving
This is incredibly hard:
21.23 Observes traffic ahead, both parked and moving vehicles, to include cycles possibly obscured by larger vehicles
21.262 Notes drivers who drive with frequent changes in speed
36.1151 Notes vehicles with exhaust smoke coming from them
41.1321 Signals as soon as possible without causing confusion

No way to just program this directly


Could we learn to do it? How? What parts?


Learning to Drive
How long does it take a human to learn to drive?
5 hours per week for a year = (about) 250 hours
How many experiences?
At 1/sec, about 1,000,000
This is
a lot of wall-clock time
not very much experience, from the reinforcement-learning perspective
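The arithmetic behind these estimates can be checked in a few lines. This is a minimal sketch: the figure of 50 practice weeks per year is an assumption chosen to match the slide's "about 250 hours", and the variable names are illustrative.

```python
# Rough experience budget for a human learning to drive.
hours_per_week = 5
practice_weeks = 50           # assumption: about 50 practice weeks in a year
experiences_per_second = 1    # the slide's rate of 1 experience per second

practice_hours = hours_per_week * practice_weeks
experiences = practice_hours * 3600 * experiences_per_second

print(practice_hours)  # 250
print(experiences)     # 900000, i.e. "about 1,000,000"
```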

Romantic View of Learning

Figure out the one true learning algorithm

Don't pollute it with your preconceptions

Let it run for a long time

Come back and find a robot that's smarter than you


The Death of Romance

Figure out the one true learning algorithm

Don't pollute it with your preconceptions

Let it run for a long time

Come back and find a robot that's smarter than you


Making Learning Practical


Run robot training camps


What to Learn?
Current state: estimate the current state of the world
localization, tracking, mapping
Policy: how should the robot behave?
world state → action
World dynamics: how does the world change as a function of the robot's actions?
world state × action → world state
use with a planner
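The mappings above can be written as type signatures, and a dynamics model can be handed to a planner. The following is a minimal sketch, not the method from the talk: the breadth-first planner, the six-cell corridor, and all names are hypothetical, and the dynamics function is hand-written here where a learned model would go.

```python
from collections import deque
from typing import Callable, Hashable, List, Optional, Sequence

State = Hashable
Action = Hashable
Policy = Callable[[State], Action]           # world state -> action
Dynamics = Callable[[State, Action], State]  # world state x action -> world state


def plan(start: State, goal: State, actions: Sequence[Action],
         dynamics: Dynamics, max_depth: int = 50) -> Optional[List[Action]]:
    """Breadth-first search through the states predicted by a dynamics model."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:
            continue
        for action in actions:
            nxt = dynamics(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # goal not reachable under this model


def corridor_dynamics(state: int, action: int) -> int:
    """Toy stand-in for a learned model: a corridor with cells 0..5."""
    return max(0, min(5, state + action))


path = plan(0, 3, actions=(-1, +1), dynamics=corridor_dynamics)
unreachable = plan(0, 9, actions=(-1, +1), dynamics=corridor_dynamics)
print(path)  # [1, 1, 1]
```

A learned policy would skip the search and map each state straight to an action; a learned dynamics model trades that speed for the ability to plan toward new goals.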


How to Train a Robot


to behave like you
Full supervision: human provides input/output pairs
Evolutionary elimination of operator intervention:
The Man Behind the Curtain
Indirect supervision: human guides robot through a
behavior using different sensori-motor modalities
Using human vision + joystick to train mobile robot
with laser rangefinder
ALVINN
Imitation: watch someone else do the task
Like supervised learning, but hard to transfer
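Full supervision from input/output pairs can be illustrated with a very small learner. This sketch uses a 1-nearest-neighbor lookup purely for illustration; it is not the learner used in ALVINN or in the talk, and the two-element observation (left and right range readings) and the action names are hypothetical.

```python
from math import dist


def nearest_neighbor_policy(demonstrations):
    """Build a policy that copies the action of the closest demonstrated observation."""
    def policy(observation):
        _, action = min(demonstrations,
                        key=lambda pair: dist(pair[0], observation))
        return action
    return policy


# Hypothetical demonstrations recorded while a human drives with a joystick:
# (left_range_m, right_range_m) -> action
demonstrations = [
    ((0.2, 1.0), "turn_right"),  # wall close on the left
    ((1.0, 1.0), "forward"),     # corridor clear on both sides
    ((1.0, 0.2), "turn_left"),   # wall close on the right
]

policy = nearest_neighbor_policy(demonstrations)
print(policy((0.3, 0.9)))  # turn_right
print(policy((0.9, 0.9)))  # forward
```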

How to Train a Robot


to be better than you

Reinforcement: tell the robot when its behavior is good or bad

Reinforcement with human guidance


RL with Human Guidance


Joint work with Bill Smart
Humans are bad at writing good robot programs,
but great at writing bad ones!
Supply an example policy
Need not be optimal
Explicitly coded or using direct control
Used to generate experience
Follow example policy and learn about the world
Shows learner interesting parts of the space
Bad initial policies might be better

Learning Phase One

[Diagram: Environment, Supplied Control Policy, and Learning System, connected by signals labeled R and O]

Learning Phase Two

[Diagram: Environment, Supplied Control Policy, and Learning System, connected by signals labeled R and O]


What Does This Give Us?


Natural way to insert human knowledge
Keeps the robot safe in early stages of learning
Bootstraps information into the Q-function
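The two phases and the bootstrapping of the Q-function can be sketched with tabular Q-learning on a toy corridor. This is an illustration under stated assumptions, not the system from the talk: the six-cell corridor, the +10 goal reward, the step sizes, the 70%-right example policy, and the choice to hand control to an epsilon-greedy learner in phase two are all hypothetical.

```python
import random

N_STATES, GOAL = 6, 5       # toy corridor: cells 0..5, goal at the right end
LEFT, RIGHT = 0, 1
ALPHA, GAMMA = 0.5, 0.9     # assumed learning rate and discount factor


def step(state, action):
    """Deterministic corridor dynamics with a reward of +10 on reaching the goal."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == RIGHT else -1)))
    return nxt, (10.0 if nxt == GOAL else 0.0), nxt == GOAL


def greedy(q, state):
    return RIGHT if q[state][RIGHT] > q[state][LEFT] else LEFT


def run_episode(q, choose_action, max_steps=10_000):
    """Run one episode from cell 0; the Q-learning update is applied in both phases."""
    state = 0
    for steps in range(1, max_steps + 1):
        action = choose_action(state)
        nxt, reward, done = step(state, action)
        target = reward if done else reward + GAMMA * max(q[nxt])
        q[state][action] += ALPHA * (target - q[state][action])
        state = nxt
        if done:
            return steps
    return max_steps


rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]


def example_policy(state):
    """Phase one: a supplied, deliberately imperfect policy drives the robot."""
    return RIGHT if rng.random() < 0.7 else LEFT


def learned_policy(state):
    """Phase two (assumed): the learner drives, mostly greedy on its Q-values."""
    return rng.choice((LEFT, RIGHT)) if rng.random() < 0.1 else greedy(q, state)


phase1_steps = [run_episode(q, example_policy) for _ in range(50)]
phase2_steps = [run_episode(q, learned_policy) for _ in range(50)]
greedy_actions = [greedy(q, s) for s in range(GOAL)]
```

Because the update is off-policy, the learner never has to act during phase one: the Q-table is already populated by the time it takes control, so it does not start from all-zero values.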


Corridor Following

[Figure: corridor-following task, with a reward of +10 marked]


Look Ma! No zeros!


[Plot: average steps to goal versus training runs, for Phase 1 and Phase 2, with reference lines for average training and optimal performance]


Obstacle Avoidance

[Figure: obstacle-avoidance task, with rewards of +1 and -1 marked]


Obstacle Avoidance

[Plot: percentage of successful runs versus training runs, for Phase 1 and Phase 2]


Obstacle Avoidance
[Plot: steps to goal versus training runs, with reference lines for the best example and optimal performance]

After 5 Phase 1 Runs


After 40 Phase 2 Runs


How Long?
Each task took about 2 hours
Phase 1 and phase 2 training
Not including evaluation runs
Much faster than hand-coding and tuning
Obstacle avoidance simulation with examples
No obstacles
Time to reach goal state with arbitrary actions
28.5% reach goal in less than one week
Average time for successful runs is 6 hours


Making Learning Practical


Run robot training camps
Build in bias
direct policy search: fix controller with just a few
free parameters
factor: many small instances of a learning or
reasoning problem
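Direct policy search with a fixed controller can be sketched as follows. This is an illustration only: the one-parameter proportional controller, the toy lateral-offset dynamics, the quadratic cost, and the grid of candidate gains are all assumptions, and a plain grid search stands in for whatever search procedure is actually used.

```python
def rollout_cost(gain, offset=1.0, steps=20):
    """Simulate the fixed controller u = -gain * offset and sum the squared offsets."""
    cost = 0.0
    for _ in range(steps):
        offset += -gain * offset  # toy dynamics: the command shifts the offset directly
        cost += offset ** 2
    return cost


# The structure of the controller is fixed; only one free parameter is searched.
candidate_gains = [0.25 * i for i in range(9)]  # 0.0, 0.25, ..., 2.0
best_gain = min(candidate_gains, key=rollout_cost)
print(best_gain)  # 1.0
```

With so few free parameters, every candidate can be evaluated by a handful of rollouts, which is the sense in which the built-in bias makes learning cheaper.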


Making Learning Practical


Run robot training camps
Build in bias
Do brain transplants!


Driving is AI Complete!
In order to do a good job at driving you need
Naïve physics
Will it damage my car if I run over that thing?
Folk psychology
Will that person be mad if I dart into that small
space in front of him?
Humans know a lot about the world before they start to
learn to drive
we need to understand fundamental commonsense
AI

If I Were King
Two high-level parallel efforts:
Bottom-up driving systems starting in highly
restricted domains and gradually relaxing
restrictions
Basic research in how to acquire and use common
sense in everyday tasks, such as driving


The Solipsist Driver


The solipsist driver imagines he's all alone:
Stays (mostly) in his lane
Executes left and right turns correctly (ignoring
traffic)
Avoids most static obstacles in his path
Can park in most reasonable spaces


The Moron Driver


The moron driver
Has the skills of the solipsist driver
Takes traffic into account during lane-changes
and turns
Predicts traffic motions using simple extrapolation
Tries to avoid pedestrians if it sees them, but doesn't make clever predictions or actively look for them
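Predicting traffic motion by simple extrapolation might look like the sketch below. It assumes constant-velocity motion along a single lane coordinate; the Vehicle fields, the 3-second horizon, and the 10-metre gap threshold are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    position: float  # metres along the target lane
    speed: float     # metres per second


def extrapolate(vehicle, dt):
    """Constant-velocity prediction of a vehicle's position after dt seconds."""
    return vehicle.position + vehicle.speed * dt


def lane_change_is_clear(ego, others, horizon=3.0, min_gap=10.0):
    """Require the minimum gap both now and at the end of the horizon."""
    for t in (0.0, horizon):
        ego_pos = extrapolate(ego, t)
        if any(abs(extrapolate(other, t) - ego_pos) < min_gap for other in others):
            return False
    return True


ego = Vehicle(position=0.0, speed=20.0)
far_behind = Vehicle(position=-30.0, speed=25.0)    # 15 m behind after 3 s
closing_fast = Vehicle(position=-20.0, speed=25.0)  # 5 m behind after 3 s

print(lane_change_is_clear(ego, [far_behind]))                # True
print(lane_change_is_clear(ego, [far_behind, closing_fast]))  # False
```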
