Learning Planning Operators by Observation and Practice

Xuemei Wang
Abstract
The work described in this paper addresses learning planning operators by observing expert agents and subsequent knowledge refinement in a learning-by-doing paradigm. The observations of the expert agent consist of: 1) the sequence of actions being executed, 2) the state in which each action is executed, and 3) the state resulting from the execution of each action. Planning operators are learned from these observation sequences in an incremental fashion, utilizing a conservative specific-to-general inductive generalization process. In order to refine the new operators to make them correct and complete, the system uses the new operators to solve practice problems, analyzing and learning from the execution traces of the resulting solutions or execution failures. We describe techniques for planning and plan repair with incorrect and incomplete domain knowledge, and for operator refinement through a process which integrates planning, execution, and plan repair. Our learning method is implemented on top of the PRODIGY architecture (Carbonell, Knoblock, & Minton 1990; Carbonell et al. 1992) and is demonstrated in the extended-strips domain (Minton 1988) and a subset of the process planning domain (Gil 1991).
The Challenge
There is considerable interest in learning for planning systems. Most of this work concentrates on learning search control knowledge (Fikes, Hart, & Nilsson 1972; Mitchell, Utgoff, & Banerji 1983; Minton 1988; Veloso 1992). Previous work on acquiring planning operators refines incomplete operators by active experimentation (Gil 1992). This paper describes a framework for learning planning operators by observing expert agents and subsequent knowledge refinement in a learning-by-doing paradigm (Anzai & Simon 1979).
*This research is sponsored by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant number F33615-93-1-1330. Views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of Wright Laboratory or the United States Government.
From: AIPS 1994 Proceedings. Copyright 1994, AAAI (www.aaai.org). All rights reserved.
The Architecture for Learning Planning Operators

[Figure 1: The architecture for learning planning operators. The system observes the expert agent's actions and the environment; the learning module forms tentative representations of the operators, which are used during planning and plan execution.]

The Learning Module
The learning module formulates and refines the representations of the operators in the domain from the observations of the expert agent and the execution traces of the system's own actions. Operators are learned in an incremental fashion utilizing a conservative specific-to-general inductive generalization process and a set of domain-independent heuristics. At any point of time during learning, any learned operator always has all the correct preconditions, but it may also have some unnecessary preconditions. These unnecessary preconditions can be removed with more observations and practice. The learning module also learns heuristics for subgoal ordering from the observed episodes of the expert agent.
Learning Operators from Observations  Given an observation, if the action is observed for the first time, we create the corresponding operator such that its precondition is the parameterized pre-state and its effect is the parameterized delta-state. The type for each variable in the operator is the most specific type in the type hierarchy that the corresponding parameterized object has. This is the most conservative constant-to-variable generalization. Generalizing the observation in Figure 2 yields the operator shown in Figure 3.
Pre-state:
(connects dr13 rm1 rm3) (connects dr13 rm3 rm1)
(dr-to-rm dr13 rm1) (dr-to-rm dr13 rm3)
(inroom robot rm1) (inroom box1 rm3) (pushable box1)
(dr-closed dr13) (unlocked dr13) (arm-empty)
Delta-state: add: (next-to robot dr13)

Figure 2: An observation of the state before and after the application of the operator GOTO-DR
(operator goto-dr
  (preconds ((<v1> door) (<v2> object) (<v3> room) (<v4> room))
    (and (inroom robot <v4>)
         (connects <v1> <v3> <v4>)
         (connects <v1> <v4> <v3>)
         (dr-to-rm <v1> <v3>)
         (dr-to-rm <v1> <v4>)
         (unlocked <v1>)
         (dr-closed <v1>)
         (arm-empty)
         (inroom <v2> <v3>)
         (pushable <v2>)))
  (effects nil ((add (next-to robot <v1>)))))

Figure 3: The operator GOTO-DR learned from the observation in Figure 2
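To make the generalization step concrete, here is a minimal Python sketch of the conservative constant-to-variable parameterization, together with the refinement applied when the same action is observed again (preconditions are intersected, effects augmented). The representation (literals as tuples, operators as plain dictionaries) and all function names are our own illustrative assumptions, not the paper's PRODIGY implementation; type-hierarchy inference and the object-to-variable correspondence across observations are assumed given.

```python
# A minimal sketch (not the paper's PRODIGY code) of conservative
# constant-to-variable generalization and incremental refinement.

def parameterize(literals, binding):
    """Replace object constants with variables; predicate names and
    unmapped constants (e.g. 'robot') are left unchanged."""
    return {tuple(binding.get(term, term) for term in lit) for lit in literals}

def learn_operator(objects, pre_state, delta_add, delta_del):
    """First observation of an action: the precondition is the
    parameterized pre-state, the effect the parameterized delta-state."""
    binding = {obj: "<v%d>" % (i + 1) for i, obj in enumerate(objects)}
    return {"preconds": parameterize(pre_state, binding),
            "adds": parameterize(delta_add, binding),
            "dels": parameterize(delta_del, binding)}

def refine_operator(op, objects, pre_state, delta_add, delta_del):
    """Later observation of the same action: drop preconditions that do
    not hold in the new pre-state, and add newly observed effects."""
    binding = {obj: "<v%d>" % (i + 1) for i, obj in enumerate(objects)}
    op["preconds"] &= parameterize(pre_state, binding)
    op["adds"] |= parameterize(delta_add, binding)
    op["dels"] |= parameterize(delta_del, binding)
    return op
```

Applied to the Figure 2 observation, learn_operator produces preconditions including (dr-closed <v1>); a later observation in which the door is open causes refine_operator to drop that precondition, mirroring the GOTO-DR example.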
Given a new observation of an action whose operator has already been created, the operator is refined: preconditions that do not hold in the pre-state of the new observation are removed, and the effects are augmented by adding facts that are in the delta-state of the observation. For example, given the observation in Figure 4, the operator GOTO-DR is refined: (dr-closed <v1>) is removed from the preconditions, and the effects become:

(effects ((<v5> object))
  ((add (next-to robot <v1>)) (del (next-to robot <v5>))))

Pre-state:
(connects dr12 rm2 rm1) (connects dr12 rm1 rm2)
(dr-to-rm dr12 rm1) (dr-to-rm dr12 rm2)
(next-to robot a) (dr-open dr12) (unlocked dr12)
(inroom robot rm1) (inroom c rm2) (inroom b rm2)
(inroom a rm1) (pushable c) (pushable b) (arm-empty)
Delta-state: add: (next-to robot dr12) del: (next-to robot a)

Figure 4: The second observation of the operator GOTO-DR

Refining Operators with Practice  Because of the incorrectness and incompleteness of the learned operators, the plans produced by the planning module may be incorrect. This will be explained in the next section. While executing these plans in the environment, one of the following two outcomes is possible for each operator:

(1) The operator is executed if the state changes predicted by the effects of the operator happen after the operator execution. In this case, the execution is treated as an observation, and the preconditions and the effects of the operator are modified as described in the previous section.

(2) The operator is un-executed if the state does not change as predicted, because of unmet necessary preconditions of the operator in the environment. When an operator is un-executed, the learning module computes the unmet preconditions of the operator. If there is only one such precondition, we mark this precondition as a necessary precondition of the operator. Otherwise, none of the preconditions is marked as necessary, because of insufficient information. When all the preconditions of an operator are marked as necessary, the system has learned the correct representation of the operator.

Learning Negated Preconditions  The learned operators are used to solve practice problems. During practice, if an operator is un-executed even when all the preconditions of the operator are satisfied, the learning module discovers that some negated preconditions are missing. The negations of all the predicates that are true in the current state when the operator is un-executed, but are not true in the observations of this operator, are conjectured as potential negated preconditions, and are added to the preconditions of the operator. Some of the added negated preconditions may be unnecessary, and they will be removed in the same way as all other preconditions with more observations and practice.

Learning Subgoal Ordering Heuristics  The system assumes that the expert agent is rational in carrying out the observed actions, based on the "rational agent" assumption (Newell 1982). Inferring what the top-level goals are and how the preconditions of each operator in the observations are achieved provides the system heuristics for planning. For example, the system can prefer to subgoal on the preconditions of an operator that were subgoaled on by the observed agent.

The system infers how the preconditions of each operator are achieved by creating an Inferred Subgoal Graph (ISG). An ISG is a directed acyclic graph (DAG) whose nodes are either operators or goals. An edge in an ISG connects an operator to the goal it achieves, or connects a goal to the operator that requires it as a precondition. The root of an ISG is the virtual operator *finish* that is connected to the partially inferred goals.

The system initializes the ISG by setting its root to the virtual operator *finish*. The last operator in the episode must be achieving some top-level goals. Since everything in the delta-state of the observation can be a potential top-level goal, and since it is difficult to distinguish the true top-level goals from subgoals or side effects, the complete delta-state is presumed to be part of the top-level goals. A node is created for these goals and is connected to the root of the ISG, i.e. to the operator *finish*. A node is also created for the operator in the observation, and is connected to the top-level goals it achieves. The system then infers how each precondition of the operator is achieved. A precondition is inferred to be achieved by an observation if it is added in the delta-state of that observation. The system then recursively processes the episode backwards, determining how the preconditions of each operator are achieved. In an ISG, the preconditions of an operator that are connected to some operator that achieves them are considered to have been subgoaled on by the observed agent. These preconditions are called the subgoaled-preconds of the operator.

The Planning Module

The operators acquired by the learning module are themselves incorrect or incomplete. To plan with such operators, PRODIGY's search algorithm is augmented with some domain-independent heuristics to generate plans that are possibly incomplete or incorrect.

Given a practice problem, the system first generates a tentative plan. Then the plan is executed in a simulated environment, and repaired if some operators in the plan cannot be executed because of unmet necessary preconditions in the environment. The following sections describe our approach to planning with incorrect and incomplete operators and to repairing a failed plan.

Planning with Incomplete or Incorrect Domain Knowledge  PRODIGY's standard planner assumes a correct action model. The learned operators violate this assumption in two ways: (a) over-specific preconditions: when a learned operator has unnecessary preconditions. In this case, the system has to achieve its unnecessary
preconditions during planning. Not only does this force the system to do unnecessary search, but it can also make many solvable problems unsolvable, because it may not be possible to achieve some unnecessary preconditions. (b) incomplete effects: when some effects of an operator are not learned because they are present in the pre-state of the operator. When, while searching for a plan, PRODIGY applies in its internal model an operator that is missing some effects, the state becomes incorrect.
These problems make it very difficult to generate a correct plan for a problem. Therefore, we relax the assumption that all the plans generated by the planner are correct. Over-specific preconditions prevent the system from generating any plan at all; they are dealt with by using the applicable-threshold heuristic and the subgoal-ordering heuristic to generate plans that include operators with unmet preconditions. Incomplete effects are not dealt with explicitly in our planner.
When an operator cannot be executed because of unmet necessary preconditions, the plan repair mechanism is invoked to generate a modified plan. PRODIGY's basic search algorithm is modified by using the following two heuristics to generate plans that are possibly incomplete or incorrect:
Applicable threshold heuristic: While computing the applicable operators, a threshold variable *applicable-threshold* is introduced such that an operator is considered applicable in the planner's internal model during planning when all the subgoaled-preconds and the necessary preconds are satisfied, and when the fraction of the satisfied preconditions exceeds *applicable-threshold*. Of course, this introduces the possibility of incomplete or incorrect plans in that a necessary precondition may be unsatisfied, necessitating plan repair (see the next section).
Subgoal ordering heuristic: While choosing a goal to work on next, goals that are subgoaled-preconds are preferred over those that are not.
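As an illustration, the applicable-threshold test might be sketched as follows. This is a hypothetical Python rendering: the set-based operator representation and the names are ours, and the paper does not specify whether the fraction comparison is strict or inclusive.

```python
# Sketch of the applicable-threshold heuristic: an operator counts as
# applicable in the planner's internal model when every subgoaled-precond
# and necessary precond holds, and the fraction of satisfied
# preconditions exceeds *applicable-threshold*.

APPLICABLE_THRESHOLD = 0.7  # set manually in the reported experiments

def is_applicable(op, state, threshold=APPLICABLE_THRESHOLD):
    # Every subgoaled or known-necessary precondition must be satisfied.
    required = op["subgoaled_preconds"] | op["necessary_preconds"]
    if not required <= state:
        return False
    # The remaining (possibly unnecessary) preconditions only need to be
    # satisfied in sufficient proportion.
    satisfied = len(op["preconds"] & state)
    return satisfied / len(op["preconds"]) > threshold
```

An operator admitted this way may still fail in the environment if one of its unsatisfied preconditions turns out to be necessary, which is exactly what triggers plan repair.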
For example, Figure 5 illustrates an example problem and the solution generated using the above heuristics with a set of learned operators that are still incorrect and incomplete, and with *applicable-threshold* set to 0.7. This plan is compared to a correct plan.
Plan Repair  Plan repair is necessary when a planned operator fails to execute in the environment because of unmet necessary preconditions. Since there can be multiple unmet preconditions, and since some of them may be unnecessary preconditions, it is not necessary to generate a plan to achieve all of them. Therefore, the plan repair mechanism generates a plan fragment to achieve one of them at a time. The plan repair algorithm is illustrated in Figure 6. In step 1, heuristics from variable bindings are used to determine the order in which the unmet preconditions are achieved. An unmet precondition of the highest priority is chosen.
*The *applicable-threshold* is currently set manually; how to set it automatically is a part of future work.
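The overall execute-and-repair cycle described above might be sketched as follows. This is a hypothetical simplification in our own notation: it does not reproduce the paper's Figure 6 algorithm, and it assumes the repair planner always returns an executable plan fragment.

```python
# Sketch of the execute/repair loop: when an operator is un-executed,
# plan a fragment for one unmet precondition at a time (highest-priority
# first), splice it in, and retry the failed operator.

def execute_and_repair(plan, state, apply_op, unmet_preconds, plan_for, rank):
    """apply_op simulates executing an operator, returning the new state,
    or None when the operator is un-executed in the environment."""
    queue = list(plan)
    while queue:
        op = queue[0]
        result = apply_op(op, state)
        if result is not None:              # executed: carry on
            state = result
            queue.pop(0)
            continue
        unmet = unmet_preconds(op, state)   # un-executed: repair
        goal = max(unmet, key=rank)         # highest-priority precondition
        fragment = plan_for(goal, state)    # plan fragment for one goal only
        queue = fragment + queue            # retry op after the fragment
    return state
```

Repairing one precondition at a time matches the rationale above: some unmet preconditions may be unnecessary, so planning for all of them at once would waste effort.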
Empirical Results
The learning algorithm described in this paper has been tested in the extended-STRIPS domain (Minton 1988) and a subset of the process planning domain (Gil 1991). It also easily learns operators in simpler domains such as the blocksworld domain and the logistics domain. As already discussed, the learned operators always have all the correct preconditions (except for negated preconditions), but they may also have unnecessary preconditions that can be removed with more observations and practice. Therefore, we use the percentage of unnecessary preconditions in the learned operators to measure the effectiveness of the learning system. We also measure the number of problems solved with the learned operators, because the goal of learning is to solve problems in the domains.
The following table shows the system's performance in the extended-strips and process planning domains. Both the training and the practice problems in these domains are randomly generated. In the extended-strips domain, these problems have up to 3 goals and solution lengths of 5 to 10 operators. In the process planning domain, these problems have up to 3 goals and an average solution length of 20. We are currently performing more extensive tests in the complete process planning domain, which has a total of 76 operators. In both domains, we see that the learned operators can be used to solve the practice problems effectively, and the unnecessary preconditions of the learned operators are removed with practice, so that the operators will eventually converge to the correct ones.
                                  extended-strips   process planning
# of training probs                      7                 20
# of learned ops                        21                 12
% of unnecessary preconds               45%                43%
# of practice probs                     32                 10
# of practice probs solved              26                 10
% of practice probs solved              81%               100%
% of unnecessary preconds
  (after practice)                      25%                38%
Related Work

There is considerable interest in learning for planning systems. Most of this work concentrates on learning search control knowledge (Fikes, Hart, & Nilsson 1972; Carbonell, …
Acknowledgments
Thanks to Jaime Carbonell for many discussions and insights in this work. This manuscript is submitted for publication with the understanding that the U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes, notwithstanding any copyright notation thereon.