Learning Planning Operators by Observation and Practice

Xuemei Wang
Abstract
The work described in this paper addresses learning planning operators by observing expert agents and subsequent knowledge refinement in a learning-by-doing paradigm. The observations of the expert agent consist of: 1) the sequence of actions being executed, 2) the state in which each action is executed, and 3) the state resulting from the execution of each action. Planning operators are learned from these observation sequences in an incremental fashion, utilizing a conservative specific-to-general inductive generalization process. In order to refine the new operators to make them correct and complete, the system uses the new operators to solve practice problems, analyzing and learning from the execution traces of the resulting solutions or execution failures. We describe techniques for planning and plan repair with incorrect and incomplete domain knowledge, and for operator refinement through a process which integrates planning, execution, and plan repair. Our learning method is implemented on top of the PRODIGY architecture (Carbonell, Knoblock, & Minton 1990; Carbonell et al. 1992) and is demonstrated in the extended-strips domain (Minton 1988) and a subset of the process planning domain (Gil 1991).
The Challenge
There is considerable interest in learning for planning systems. Most of this work concentrates on learning search control knowledge (Fikes, Hart, & Nilsson 1972; Mitchell, Utgoff, & Banerji 1983; Minton 1988; Veloso 1992). Previous work on acquiring planning operators refines incomplete operators by active experimentation (Gil 1992). This paper describes a framework for learning planning operators by observing expert agents and subsequent knowledge refinement in a learning-by-doing paradigm (Anzai & Simon 1979).
*This research is sponsored by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant number F33615-93-1-1330. Views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of Wright Laboratory or the United States Government.
From: AIPS 1994 Proceedings. Copyright 1994, AAAI (www.aaai.org). All rights reserved.
The Architecture for Learning Planning Operators

[Figure 1: The architecture for learning planning operators. The system observes the expert agent's actions and the environment; the learning module forms tentative representations of the operators, which are used during planning and plan execution.]

The Learning Module
The learning module formulates and refines the representations of the operators in the domain from the observations of the expert agent and the execution traces of the system's own actions. Operators are learned in an incremental fashion utilizing a conservative specific-to-general inductive generalization process and a set of domain-independent heuristics. At any point of time during learning, any learned operator always has all the correct preconditions, but it may also have some unnecessary preconditions. These unnecessary preconditions can be removed with more observations and practice. The learning module also learns heuristics for subgoal ordering from the observed episodes of the expert agent.
Learning Operators from Observations  Given an observation, if the action is observed for the first time, we create the corresponding operator such that its precondition is the parameterized pre-state and its effect is the parameterized delta-state. The type for each variable in the operator is the most specific type in the type hierarchy that the corresponding parameterized object has. This is the most conservative constant-to-variable generalization. Generalizing the observation in Figure 2 yields the operator shown in Figure 3.
Pre-state:
(connects dr13 rm1 rm3) (connects dr13 rm3 rm1)
(dr-to-rm dr13 rm1) (dr-to-rm dr13 rm3)
(inroom robot rm1) (inroom box1 rm3) (pushable box1)
(dr-closed dr13) (unlocked dr13) (arm-empty)
Delta-state: add: (next-to robot dr13)

Figure 2: An observation of the state before and after the application of the operator GOTO-DR
(operator goto-dr
  (preconds ((<v1> door) (<v2> object) (<v3> room) (<v4> room))
    (and (inroom robot <v4>)
         (connects <v1> <v3> <v4>)
         (connects <v1> <v4> <v3>)
         (dr-to-rm <v1> <v3>)
         (dr-to-rm <v1> <v4>)
         (unlocked <v1>)
         (dr-closed <v1>)
         (arm-empty)
         (inroom <v2> <v3>)
         (pushable <v2>)))
  (effects nil ((add (next-to robot <v1>)))))

Figure 3: The operator GOTO-DR learned from the observation in Figure 2
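To make the generalization step concrete, here is a minimal Python sketch of the conservative constant-to-variable parameterization, together with the refinement applied when the same action is observed again (preconditions are intersected, effects augmented). The representation (literals as tuples, operators as plain dictionaries) and all function names are our own illustrative assumptions, not the paper's PRODIGY implementation; type-hierarchy inference and the object-to-variable correspondence across observations are assumed given.

```python
# A minimal sketch (not the paper's PRODIGY code) of conservative
# constant-to-variable generalization and incremental refinement.

def parameterize(literals, binding):
    """Replace object constants with variables; predicate names and
    unmapped constants (e.g. 'robot') are left unchanged."""
    return {tuple(binding.get(term, term) for term in lit) for lit in literals}

def learn_operator(objects, pre_state, delta_add, delta_del):
    """First observation of an action: the precondition is the
    parameterized pre-state, the effect the parameterized delta-state."""
    binding = {obj: "<v%d>" % (i + 1) for i, obj in enumerate(objects)}
    return {"preconds": parameterize(pre_state, binding),
            "adds": parameterize(delta_add, binding),
            "dels": parameterize(delta_del, binding)}

def refine_operator(op, objects, pre_state, delta_add, delta_del):
    """Later observation of the same action: drop preconditions that do
    not hold in the new pre-state, and add newly observed effects."""
    binding = {obj: "<v%d>" % (i + 1) for i, obj in enumerate(objects)}
    op["preconds"] &= parameterize(pre_state, binding)
    op["adds"] |= parameterize(delta_add, binding)
    op["dels"] |= parameterize(delta_del, binding)
    return op
```

Applied to the Figure 2 observation, learn_operator produces preconditions including (dr-closed <v1>); a later observation in which the door is open causes refine_operator to drop that precondition, mirroring the GOTO-DR example.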
Given a new observation of an action whose operator has already been created, the operator is refined: preconditions that do not hold in the pre-state of the new observation are removed, and the effects are augmented by adding facts that are in the delta-state of the observation. For example, given the observation in Figure 4, the operator GOTO-DR is refined: (dr-closed <v1>) is removed from the preconditions, and the effects become:

(effects ((<v5> object))
  ((add (next-to robot <v1>)) (del (next-to robot <v5>))))

Pre-state:
(connects dr12 rm2 rm1) (connects dr12 rm1 rm2)
(dr-to-rm dr12 rm1) (dr-to-rm dr12 rm2)
(next-to robot a) (dr-open dr12) (unlocked dr12)
(inroom robot rm1) (inroom c rm2) (inroom b rm2)
(inroom a rm1) (pushable c) (pushable b) (arm-empty)
Delta-state: add: (next-to robot dr12) del: (next-to robot a)

Figure 4: The second observation of the operator GOTO-DR

Refining Operators with Practice  Because of the incorrectness and incompleteness of the learned operators, the plans produced by the planning module may be incorrect. This will be explained in the next section. While executing these plans in the environment, one of the following two outcomes is possible for each operator:

(1) The operator is executed if the state changes predicted by the effects of the operator happen after the operator execution. In this case, the execution is treated as an observation, and the preconditions and the effects of the operator are modified as described in the previous section.

(2) The operator is un-executed if the state does not change as predicted, because of unmet necessary preconditions of the operator in the environment. When an operator is un-executed, the learning module computes the unmet preconditions of the operator. If there is only one such precondition, we mark this precondition as a necessary precondition of the operator. Otherwise, none of the preconditions is marked as necessary, because of insufficient information. When all the preconditions of an operator are marked as necessary, the system has learned the correct representation of the operator.

Learning Negated Preconditions  The learned operators are used to solve practice problems. During practice, if an operator is un-executed even when all the preconditions of the operator are satisfied, the learning module discovers that some negated preconditions are missing. The negations of all the predicates that are true in the current state when the operator is un-executed, but are not true in the observations of this operator, are conjectured as potential negated preconditions, and are added to the preconditions of the operator. Some of the added negated preconditions may be unnecessary, and they will be removed in the same way as all other preconditions with more observations and practice.

Learning Subgoal Ordering Heuristics  The system assumes that the expert agent is rational in carrying out the observed actions, based on the "rational agent" assumption (Newell 1982). Inferring what the top-level goals are and how the preconditions of each operator in the observations are achieved provides the system heuristics for planning. For example, the system can prefer to subgoal on the preconditions of an operator that were subgoaled on by the observed agent.

The system infers how the preconditions of each operator are achieved by creating an Inferred Subgoal Graph (ISG). An ISG is a directed acyclic graph (DAG) whose nodes are either operators or goals. An edge in an ISG connects an operator to the goal it achieves, or connects a goal to the operator that requires it as a precondition. The root of an ISG is the virtual operator *finish* that is connected to the partially inferred goals.

The system initializes the ISG by setting its root to the virtual operator *finish*. The last operator in the episode must be achieving some top-level goals. Since everything in the delta-state of the observation can be a potential top-level goal, and since it is difficult to distinguish the true top-level goals from subgoals or side effects, the complete delta-state is presumed to be part of the top-level goals. A node is created for these goals and is connected to the root of the ISG, i.e. to the operator *finish*. A node is also created for the operator in the observation, and is connected to the top-level goals it achieves. The system then infers how each precondition of the operator is achieved. A precondition is inferred to be achieved by an observation if it is added in the delta-state of that observation. The system then recursively processes the episode backwards, determining how the preconditions of each operator are achieved. In an ISG, the preconditions of an operator that are connected to some operator that achieves them are considered to have been subgoaled on by the observed agent. These preconditions are called the subgoaled-preconds of the operator.

The Planning Module

The operators acquired by the learning module are themselves incorrect or incomplete. To plan with such operators, PRODIGY's search algorithm is augmented with some domain-independent heuristics to generate plans that are possibly incomplete or incorrect.

Given a practice problem, the system first generates a tentative plan. Then the plan is executed in a simulated environment, and repaired if some operators in the plan cannot be executed because of unmet necessary preconditions in the environment. The following sections describe our approach to planning with incorrect and incomplete operators and to repairing a failed plan.

Planning with Incomplete or Incorrect Domain Knowledge  PRODIGY's standard planner assumes a correct action model. The learned operators violate this assumption in two ways: (a) over-specific preconditions: when a learned operator has unnecessary preconditions. In this case, the system has to achieve its unnecessary
preconditions during planning. Not only does this force the system to do unnecessary search, but it can also make many solvable problems unsolvable, because it may not be possible to achieve some unnecessary preconditions. (b) incomplete effects: when some effects of an operator are not learned because they are present in the pre-state of the operator. When, while searching for a plan, PRODIGY applies in its internal model an operator that is missing some effects, the state becomes incorrect.
These problems make it very difficult to generate a correct plan for a problem. Therefore, we relax the assumption that all the plans generated by the planner are correct. Over-specific preconditions prevent the system from generating any plan at all; they are dealt with by using the applicable-threshold heuristic and the subgoal-ordering heuristic to generate plans that include operators with unmet preconditions. Incomplete effects are not dealt with explicitly in our planner.
When an operator cannot be executed because of unmet necessary preconditions, the plan repair mechanism is invoked to generate a modified plan. PRODIGY's basic search algorithm is modified by using the following two heuristics to generate plans that are possibly incomplete or incorrect:
Applicable threshold heuristic: While computing the applicable operators, a threshold variable *applicable-threshold* is introduced such that an operator is considered applicable in the planner's internal model during planning when all the subgoaled-preconds and the necessary preconds are satisfied, and when the fraction of the satisfied preconditions exceeds *applicable-threshold*. Of course, this introduces the possibility of incomplete or incorrect plans in that a necessary precondition may be unsatisfied, necessitating plan repair (see the next section).
Subgoal ordering heuristic: While choosing a goal to work on next, goals that are subgoaled-preconds are preferred over those that are not.
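As an illustration, the applicable-threshold test might be sketched as follows. This is a hypothetical Python rendering: the set-based operator representation and the names are ours, and the paper does not specify whether the fraction comparison is strict or inclusive.

```python
# Sketch of the applicable-threshold heuristic: an operator counts as
# applicable in the planner's internal model when every subgoaled-precond
# and necessary precond holds, and the fraction of satisfied
# preconditions exceeds *applicable-threshold*.

APPLICABLE_THRESHOLD = 0.7  # set manually in the reported experiments

def is_applicable(op, state, threshold=APPLICABLE_THRESHOLD):
    # Every subgoaled or known-necessary precondition must be satisfied.
    required = op["subgoaled_preconds"] | op["necessary_preconds"]
    if not required <= state:
        return False
    # The remaining (possibly unnecessary) preconditions only need to be
    # satisfied in sufficient proportion.
    satisfied = len(op["preconds"] & state)
    return satisfied / len(op["preconds"]) > threshold
```

An operator admitted this way may still fail in the environment if one of its unsatisfied preconditions turns out to be necessary, which is exactly what triggers plan repair.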
For example, Figure 5 illustrates an example problem and the solution generated using the above heuristics with a set of learned operators that are still incorrect and incomplete, and with *applicable-threshold* set to 0.7. This plan is compared to a correct plan.
Plan Repair  Plan repair is necessary when a planned operator fails to execute in the environment because of unmet necessary preconditions. Since there can be multiple unmet preconditions, and since some of them may be unnecessary preconditions, it is not necessary to generate a plan to achieve all of them. Therefore, the plan repair mechanism generates a plan fragment to achieve one of them at a time. The plan repair algorithm is illustrated in Figure 6. In step 1, heuristics from variable bindings are used to determine the order in which the unmet preconditions are achieved. An unmet precondition of the highest priority is chosen.
*The *applicable-threshold* is currently set manually; how to set it automatically is a part of future work.
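The overall execute-and-repair cycle described above might be sketched as follows. This is a hypothetical simplification in our own notation: it does not reproduce the paper's Figure 6 algorithm, and it assumes the repair planner always returns an executable plan fragment.

```python
# Sketch of the execute/repair loop: when an operator is un-executed,
# plan a fragment for one unmet precondition at a time (highest-priority
# first), splice it in, and retry the failed operator.

def execute_and_repair(plan, state, apply_op, unmet_preconds, plan_for, rank):
    """apply_op simulates executing an operator, returning the new state,
    or None when the operator is un-executed in the environment."""
    queue = list(plan)
    while queue:
        op = queue[0]
        result = apply_op(op, state)
        if result is not None:              # executed: carry on
            state = result
            queue.pop(0)
            continue
        unmet = unmet_preconds(op, state)   # un-executed: repair
        goal = max(unmet, key=rank)         # highest-priority precondition
        fragment = plan_for(goal, state)    # plan fragment for one goal only
        queue = fragment + queue            # retry op after the fragment
    return state
```

Repairing one precondition at a time matches the rationale above: some unmet preconditions may be unnecessary, so planning for all of them at once would waste effort.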
Empirical Results
The learning algorithm described in this paper has been tested in the extended-STRIPS domain (Minton 1988) and a subset of the process planning domain (Gil 1991). It also easily learns operators in simpler domains such as the blocksworld domain and the logistics domain. As already discussed, the learned operators always have all the correct preconditions (except for negated preconditions), but they may also have unnecessary preconditions that can be removed with more observations and practice. Therefore, we use the percentage of unnecessary preconditions in the learned operators to measure the effectiveness of the learning system. We also measure the number of problems solved with the learned operators, because the goal of learning is to solve problems in the domains.
The following table shows the system's performance in the extended-strips and process planning domains. Both the training and the practice problems in these domains are randomly generated. In the extended-strips domain, these problems have up to 3 goals and solution lengths of 5 to 10 operators. In the process planning domain, these problems have up to 3 goals and an average solution length of 20. We are currently performing more extensive tests in the complete process planning domain, which has a total of 76 operators. In both domains, we see that the learned operators can be used to solve the practice problems effectively, and the unnecessary preconditions of the learned operators are removed with practice, so that the operators will eventually converge to the correct ones.
                                  extended-strips   process planning
# of training probs                      7                 20
# of learned ops                        21                 12
% of unnecessary preconds               45%                43%
# of practice probs                     32                 10
# of practice probs solved              26                 10
% of practice probs solved              81%               100%
% of unnecessary preconds
  (after practice)                      25%                38%
Related Work

There is considerable interest in learning for planning systems. Most of this work concentrates on learning search control knowledge (Fikes, Hart, & Nilsson 1972; Carbonell, …
Acknowledgments
Thanks to Jaime Carbonell for many discussions and insights in this work. This manuscript is submitted for publication with the understanding that the U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes, notwithstanding any copyright notation thereon.