
Advanced Engineering Informatics 16 (2002) 265–275

www.elsevier.com/locate/aei

Learning in multi-agent systems: a case study of construction claims negotiation
Z. Ren*, C.J. Anumba
Centre for Innovative Construction Engineering (CICE), Loughborough University, Loughborough LE11 3TU, UK
Received 13 December 2002; revised 3 April 2003; accepted 19 April 2003
* Corresponding author. E-mail address: z.ren@lboro.ac.uk (Z. Ren).
doi:10.1016/S1474-0346(03)00015-6

Abstract
The ability of agents to learn is of growing importance in multi-agent systems. It is considered essential to improving the quality of peer-to-peer negotiation in these systems. This paper reviews various aspects of agent learning, and presents the particular learning approach, Bayesian learning, adopted in the MASCOT system (multi-agent system for construction claims negotiation). The core objective of the MASCOT system is to facilitate construction claims negotiation among different project participants. Agent learning is an integral part of the negotiation mechanism. The paper demonstrates that the ability to learn greatly enhances agents' negotiation power and speeds up the rate of convergence between agents. In this case, learning is essential for the success of peer-to-peer agent negotiation systems.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Bayesian learning; Multi-agent systems; Negotiation

1. Introduction

Negotiations in industries are often inefficient due to the diversity of intellectual backgrounds of the negotiating parties, the many variables involved, the complex interactions, and the inadequate negotiation knowledge of project participants. Multi-agent systems (MAS) offer an innovative approach towards reducing the tremendous time and human resources invested in negotiations since they are particularly suitable for resolving fragmented problems. Although various agent negotiation mechanisms [3,5,10,11] have been developed, agents' abilities in dealing with the changing environments during negotiation are still very limited and need to be studied further. This is particularly important for negotiations involving complex and dynamic industrial problems. In this regard, the ability of agents to learn from their previous interactions with other agents or their environment is very important.

This paper presents an agent learning approach integrated in a multi-agent system for construction claims negotiation. Although this system is particularly designed for construction claims negotiation, it could also be useful for other industrial problems in terms of the learning mechanism adopted, the integrative approach of the learning and negotiation mechanism, and the development methodology. Details of the nature and problems in construction claims negotiation have been addressed in other publications [15-17]. This paper first examines the various aspects of agent learning, then describes the Bayesian learning approach adopted in the MASCOT system, and presents an example showing how the learning approach is implemented. Finally, it draws conclusions and makes recommendations for the further development of agent learning in construction.

2. Learning in multi-agent systems

A major problem in the development of multi-agent systems is the difficulty of foreseeing all the potential situations that an agent could encounter and specifying agent behaviour optimally in advance. Therefore, it is widely recognised that one of the most important features of agents is their ability to adapt, to learn, and to modify their behaviours. Learning in MAS can be defined operationally to mean the ability to perform new tasks that could not be performed before, or to perform old tasks better as a result of changes produced by the learning process [19]. Four important aspects of agent learning are discussed below.

2.1. Motivations for learning

As a group, agents work in an open, complex and dynamic environment, which results from a number of factors such as:

• varying beliefs, goals, abilities, preferences, skills and levels of knowledge of individual agents;
• environmental uncertainty and dynamics: agents often work in complex environments; it is impossible to define all conditions before the systems start to work. Furthermore, agents may exist in environments which often vary over time;
• complexity of interactions between agents: an agent's activities might be influenced by other agents, or lead to a change in the other agents' decisions;
• pool of solutions: each agent has a number of planning options available. The acceptability of these solutions is different; potential optimum plans need to be selected; and
• time stress: the time for decision-making is not infinite; especially in real-time systems, the response time is vital.

Agents in such a system face uncertainties due to their partial views of other agents or the environment. Incomplete information about the progress, characteristics, expectations, or preferences of other agents, and generated partial results may lead to global incoherence and degradation in system performance [19]. To effectively utilise opportunities, agents need to learn about other agents or to adapt their local behaviour based on group composition and dynamics.

2.2. Objectives of learning

The learning objectives are highly dependent on the application domains and the goals of each individual agent. Agents can either learn for their group benefit (i.e. agents collectively pursue a common learning goal), or learn for their individual benefit (i.e. each agent pursues its own learning goals). Therefore, agent learning objectives and the approaches adopted are often very different. The things which an agent expects to learn from others could be the preferences, utility functions, risk attitudes, tasks, strategies, actions or plans, specific domain knowledge, prediction of decisions, and types of conflicts. For example, in a MAS negotiation domain, an agent may expect to achieve the following objectives through learning:

2.2.1. Changing own beliefs and learning about others' beliefs
Agents can hold different beliefs about the same fact. By exchanging knowledge and information during negotiations, an agent can change its beliefs if the new belief is supported by more concrete evidence. The change of beliefs through learning can lead to significant changes to an agent's proposal during negotiation. As beliefs are closely related to preferences, learning can influence the preferred order of an agent's decisions in a multiple-solution situation. Also, an agent's knowledge acquired from the external environment can influence other agents to revise their beliefs as a result of negotiation, and thus propagate the external influence [14]. Furthermore, an agent can make inferences about the other agents' beliefs, and analyse their intentions and further negotiation strategies so that it can determine its negotiation strategies accordingly. This is particularly important for any negotiation between agents. Generally, the key negotiation features such as an agent's utility function and risk attitude represent the agent's beliefs.

2.2.2. Learning negotiation strategies
Depending on different negotiating situations, a negotiating agent not only needs to understand other participants' beliefs and intentions, but may also need to know their negotiation strategies. In the situation when all the parties' objectives are relatively apparent, negotiation results often depend on the negotiation strategies adopted by the negotiating agents. However, understanding an opponent's negotiation strategies is one of the most difficult problems even if an agent has some knowledge about the opponent's beliefs. By analysing the difference between the predicted response and the actual response, an agent can adjust its perceptions about an opponent's strategies. Furthermore, the results of negotiation could be used to evaluate the quality of the specific negotiation strategy that an agent adopts. It could reinforce the agent's drive to use the same sequence of actions in a similar situation or, alternatively, it could weaken that drive. More generally, a negotiation history can be used to extract and to compile particularly useful sequences of negotiation actions. Agents may recognise the applicability of a strategy in situations where the existing commitments prevent the execution of the initial steps of that strategy. Otherwise, the strategy might prove to be inefficient if used continuously during negotiation. In addition, an agent's observation of the other agents' actions, guided by its own strategies, may eventually lead to the synthesis of new strategies.

Besides the above learning objectives, the learning objectives in a MAS negotiation system could vary depending on different application domains and issues such as the essential features of negotiation items, the opponent's domain knowledge, or other environmental considerations. For example, an agent may learn to recognise and classify different types of conflict. This implies learning about the context leading to conflicts and learning the characteristics of conflicts. The first factor is important for avoiding conflicts, the second one for taking negotiation decisions.

2.3. Key elements in learning

Learning in a multi-agent environment could be a very complex process, considering that the environment changes continuously, agents learn mutually, and agents' actions and strategies are often not directly observable. However, no matter how complex the learning process is, there are always several essential components of learning on which agents build their inference, such as agents' expectations, feedback information, and credit assignment to evaluate the feedback information and decision-making.

2.3.1. Expectations
Expectations are the bases for agent learning. They represent an agent's beliefs that events will occur in a pre-defined way. Expectations encode the agent's current knowledge of an event and the global environment in which it operates, and represent the basis for action in a partially observable and partially computable world [7]. The expectations of an agent guide its decision-making. In a negotiation domain, an agent's expectations determine what and how much the agent expects to get from the others, or what the others will do. On the other hand, an agent's expectations are limited in their anticipatory power due to the constraints imposed on perceiving the other agents and the environment. An agent may fail to perform a task planned based on its expectations because an event does not occur or because other agents respond in an unexpected manner. This means the agent's expectations are violated, either in a good or a bad sense. The violation of an expectation indicates that the agent's knowledge about other agents, events, or the environment is limited. If this is not noticed and taken into consideration, the agent will repeatedly fail in outlining and implementing its actions. A continuous learning process makes it possible for an agent to modify its expectations to be more realistic during the negotiation process.

2.3.2. Feedback
The availability of information is a primary requirement for the learning process. Sound and unbiased feedback information provides a learning agent with resources about the other agents' perceptions, the properties of events, and the working environment. Feedback can originate from direct communication with other agents, or indirectly, mediated by intermediary agents, or without communication, directly through the learning agent's observations of the effects of its decisions and other agents' actions. Feedback can be biased by the path to the receiver. Feedback may also contain different information or be from several sources, and therefore, its effect depends on how these sources are used: filtered, independently or in a combined manner. Feedback can also be affected and reduced by processes such as conflicts and backtracking, or hidden through social decision-making schemes.

2.3.3. Evaluation criteria
Evaluation criteria are closely related to the expectations of the learning agent. A basic problem that any learning system is confronted with is how the agent evaluates the feedback from the others as a response to the agent's last decisions or actions. In a more general sense, it is a problem of properly assigning credit for overall performance changes to each of the system activities that contributed to those changes [19]. The selection of performance criteria shapes the direction in which the system will evolve. The evaluation problem can be usefully decomposed into two sub-problems: the assignment of credit for an overall performance change to external actions, and the assignment of credit for an action to the corresponding internal decisions.

2.4. Methods of learning

The approach to learning is the most essential problem for any MAS learning problem. Researchers such as Alonso et al. [1], Carbonell [4], Weiss [19], Winston [20], and Grecu and Brown [6,7] have studied various agent learning methods from different perspectives. Most of these are extended from machine learning methods (e.g. learning by analysing differences, explaining experience, correcting mistakes, recording cases, managing multiple models, building identification trees, training neural nets, training perceptrons, training approximation nets, and simulating evolution [20]).

Weiss [19] discusses Mitchell's [12] machine learning approaches in multi-agent systems. Here, agents' learning can be analysed based on the learning method and the feedback. According to the learning method or strategy, the following methods are usually distinguished, where the learning effort increases from top to bottom:

• Rote learning: direct implantation of knowledge and skills without requiring further inference or transformation from the learner;
• Learning from instruction and by advice taking: operationalisation (transformation into an internal representation and integration with prior knowledge and skills) of new information, such as an instruction, that is not directly executable by the learner;
• Learning from examples and by practice: extraction and refinement of knowledge and skills, such as a general concept or a standardised pattern of motion, from positive and negative examples or from practical experience;
• Learning by analogy: solution-preserving transformation of knowledge and skills from a solved to a similar but unsolved problem; and
• Learning by discovery: gathering new knowledge and skills by making observations, conducting experiments, and generating and testing hypotheses or theories on the basis of the observational and experimental results.

According to the learning feedback, learning can be distinguished as:

• Supervised learning: the feedback specifies the desired activity of the learner, and the objective of learning is to match this desired action as closely as possible;
• Reinforcement learning: an action should be reinforced if it produces favourable results, and weakened if it produces unfavourable results. This approach requires limited computational resources but a large number of examples; and
• Unsupervised learning: no explicit feedback is provided and the objective is to find out useful and desired activities on the basis of trial and error and self-organised processes.

In all three cases, the learning feedback is assumed to be provided by the system environment or the agents themselves. This means that the environment or an agent providing feedback acts as a teacher in the case of supervised learning and as a 'critic' in the case of reinforcement learning; in unsupervised learning, the environment and the agents just act as passive 'observers'.

Furthermore, learning in multi-agent systems can also be analysed according to other criteria such as:

• the purpose and goal of learning (i.e. to improve a single agent's skills and abilities, or to improve the agent system's coherence and co-ordination);
• the categories of learning (i.e. only one of the agents gets involved in the learning process, or all available agents are involved); and
• an agent's ability to learn (i.e. whether the learning capability is essential for an agent to achieve its goal).

These learning paradigms emerged from different scientific roots, employ different computational methods, and often rely on different ways of evaluating success. Some of the learning approaches are generic to any type of agent system; some could only be applied to a particular system or work domain, whilst others will be used only when interacting with some specific agents. Furthermore, different agents do not necessarily adopt the same learning method or the same type of learning feedback. In the course of learning, an agent may employ different learning methods and types of learning feedback. Finally, learning can occur not only during the negotiation process, but also afterwards. The negotiation history, together with some evaluation techniques for the agent's past actions, can be used to classify agents as more or less successful.

3. Learning in the MASCOT model

There are three negotiating agents in the MASCOT system (Fig. 1). The contractor agent and the engineer agent are involved in the direct claims negotiation whilst the client agent may step into the negotiation if the negotiation between the contractor agent and the engineer agent falls into deadlock. Negotiations are always conducted pair by pair. Each agent, on behalf of its owner, negotiates with others to achieve its expected result. A modified monotonic concession protocol and the related negotiation strategies were developed to suit the major characteristics of claims negotiation. The focus is on the integration of Zeuthen's negotiation strategy with the Bayesian updating approach [15-17].

Fig. 1. Agent structure in MASCOT.

According to Zeuthen's negotiation strategy [21], an agent makes its decision of concession based on how much it has to lose by running into conflict at that time. If an agent has already made many concessions, it will have less to lose from a conflict, and will be less willing to concede. Thus, it has a high acceptability to risk conflict. If each agent's willingness to risk conflict can be measured, the agent with less willingness to risk will make a concession. The criteria for risk evaluation can be formulated into the following equations [18,21]:

Risk_1^t = (the utility agent 1 loses by conceding and accepting agent 2's offer) / (the utility agent 1 loses by not conceding and causing a conflict)

or

P_c max = (U_cc^t - U_ce^t) / (U_cc^t - U_c(C));   P_e max = (U_ee^t - U_ec^t) / (U_ee^t - U_e(C))      (1)

where

P_c max (P_e max): the contractor's (engineer's) maximum likelihood of risk acceptability;
U_cc^t (U_ee^t): the contractor (engineer) agent's utility given its own offer in iteration t;
U_ce^t: the contractor agent's utility given the engineer agent's offer in iteration t;
U_ec^t: the engineer agent's utility given the contractor agent's offer in iteration t;
U_c(C) (U_e(C)): the contractor (engineer) agent's utility for a conflict deal; it is assumed to be 0 in this case.

At every step, each agent calculates and compares the Risk_i^t (or P_max) for itself and its opponent. If agent 1's Risk_i^t (or P_max) is higher than that of agent 2, agent 1 will have less to lose from a conflict, will be less willing to concede, and will risk reaching a conflict. Therefore, agent 2 (with the smaller risk acceptability) will make the next concession. The concession rate should be the minimum sufficient to make its opponent's maximum risk acceptability (P_max) smaller than or equal to its own. Otherwise, the agent will offer the same deal as the previous one [18]. By following this approach, agents will concede alternately until the maximum risks of conflict for both parties are zero.
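To make the mechanics of Eq. (1) concrete, the short sketch below shows one possible way the risk comparison and the resulting concession decision could be coded. It is an illustrative reconstruction rather than the MASCOT implementation: the function names are introduced here, the conflict utility is fixed at 0 as assumed above, and the sample utility values are taken from the worked example in Section 3.3.

```python
def max_risk(u_own_offer, u_opponent_offer, u_conflict=0.0):
    """Maximum likelihood of risk acceptability, as in Eq. (1).

    u_own_offer      : the agent's utility for its own current offer (U_cc or U_ee)
    u_opponent_offer : the agent's utility for the opponent's current offer (U_ce or U_ec)
    u_conflict       : the agent's utility of a conflict deal (assumed 0 in MASCOT)
    """
    return (u_own_offer - u_opponent_offer) / (u_own_offer - u_conflict)


def who_concedes(p_max_contractor, p_max_engineer):
    """Zeuthen's rule: the agent with the smaller risk acceptability concedes next."""
    if p_max_contractor < p_max_engineer:
        return "contractor"
    if p_max_engineer < p_max_contractor:
        return "engineer"
    return "both"  # equal willingness to risk conflict


# Illustrative values from the first exchange of the worked example (offers 11,000 vs 7,000):
p_c = max_risk(u_own_offer=1.0, u_opponent_offer=0.2)   # 0.8
p_e = max_risk(u_own_offer=1.0, u_opponent_offer=0.11)  # about 0.89
print(who_concedes(p_c, p_e))                            # -> "contractor"
```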

3.1. Why learning?

Unlike negotiations in other businesses, where one party may simply leave a negotiation if it falls into a deadlock, nobody can easily walk away from construction claims negotiation. First, claims negotiation participants are legally obliged by the project contract not to walk away; negotiations are conducted within the framework of the contract. Second, if a negotiation ends in conflict, the negotiating parties may be forced into an arbitration or litigation that they can barely afford. Thus, both parties will try to avoid a conflict outcome. On the other hand, project participants are from different organisations. Each participant will try to maximise its own benefit as long as it does not break the co-operative relationship. Therefore, it is important for each participant to learn of others' negotiation objectives to gain more benefits and avoid conflicts.

Agent learning in the MASCOT system is also strengthened by other factors, which include:

Role-dependent information. Due to their different roles, participants have different perspectives on a project. For example, the client knows clearly the final functional requirements, budget and financial status of the project, whilst the engineer understands well the client's requirements, the contract documents and the contractor's progress and site work. The contractor has detailed information about schedule, progress, and the circumstances that led to a claim. Furthermore, each party also has different expertise. Each will try to use its specific information and expertise to explain, argue, and persuade the other party to accept its offer. Each party will, therefore, try to learn the opponent's key negotiation features through offers and counteroffers in the course of claims negotiations.

Strategy-influenced process. Incomplete information and different strategies will influence the payoffs of negotiation. Given the contractually obliged self-interested nature and the role-dependent information, the contractor and the engineer often adopt a number of strategies (e.g. inflating the opening demands, misrepresenting positions or adopting threatening behaviour) in an attempt to draw the settlement point from the middle towards their expected outcomes. Learning about the opponent's negotiation strategy and the underlying rationale is essential for the negotiating agents.

Time: an important factor. Claims negotiation is a time-consuming process. This not only represents the time spent in planning negotiation or gathering negotiators, but also reflects the considerable time consumed during the negotiation process. A party may adopt a time-consuming strategy so as to benefit from the opponent's time pressure or emotional exhaustion. Moreover, claims can often be settled only through several meetings, which allows both parties to make offers and counteroffers. Learning will enable agents to reduce the time spent during negotiation.

Stability: a system requirement. Besides these factors, another important reason for the adoption of agent learning is that Zeuthen's negotiation strategy might not be stable in a construction claims negotiation environment because the perfect information assumption in Zeuthen's model is violated [16]. An essential requirement for any agent negotiation mechanism is that the system should be stable (i.e. no agent should have an incentive to deviate from the agreed-upon strategies). This is the notion of strategies in equilibrium. Two strategies S, T are in Nash equilibrium if, assuming that one agent is using S, the other agent cannot do better using some strategy other than T, and vice versa [13]. If Nash equilibrium is reached, agents will have no incentive to deviate from the agreed negotiation strategies. However, Nash equilibrium is based on the complete information assumption, which is not true in the claims negotiation environment. To overcome this problem, the concept of the Bayesian-Nash equilibrium is introduced. This equilibrium includes a set of beliefs (one for each agent) and a set of strategies. A strategy combination and a set of beliefs form a Bayesian-Nash equilibrium if the strategies are in Nash equilibrium given the set of beliefs, and the agents update their beliefs according to Bayes' rule [8].

3.2. Learning approach: Bayesian learning mechanism

The Bayesian learning approach is introduced in the MASCOT model for agents to estimate their opponents' key negotiation features. Bayesian inference has a long history of being used as a simple but powerful learning approach, and it has been developed in various AI research projects such as Harsanyi [8], Iversen [9], Zeng and Sycara [22] and Bui [2]. This study adopted the Bayesian learning approach of [8,22].

Based on the Bayesian learning mechanism, when an agent receives an offer (or counteroffer) from its opponent, the agent analyses the offer, modifies its beliefs about the opponent, and makes a counteroffer accordingly. The updated belief then becomes the agent's prior knowledge in the next updating process. An agent can finally obtain a relatively accurate belief about the opponent even if its initial domain knowledge is not so accurate (Fig. 2).

Such beliefs are normally about the opponent's key negotiation features, such as reservation value, risk attitude, payoff functions, or negotiation strategy. This study focuses on the 'reservation value', which is the maximum amount that the engineer agent can offer to the contractor agent, and vice versa.

Since reservation values are private information, it is impossible for an agent to know its opponent's exact reservation value. Nevertheless, an agent can update its beliefs about its opponent's reservation value based on its interactions with the opponent and its domain knowledge by using Bayesian inference. Therefore, an agent can gain a more accurate expectation about its opponent's utility and maximum risk acceptability, and make a counteroffer based on the information available at this stage. Section 3.3 discusses how the contractor agent updates its beliefs about the engineer agent's reservation value. Some important terms used in MASCOT are defined as follows:

R: the engineer agent's reservation value;
E_i: the engineer agent's offer at encounter i;
R_i: a set of the contractor agent's partial beliefs (hypotheses) about the engineer agent's reservation value R, e.g. R_1 = 100, R_2 = 150, etc. (i = 1, 2, …, n);
P(R_i): the probabilistic evaluation over the set of hypotheses {R_i}, which represents the contractor agent's prior knowledge, e.g. P(R_1) = 0.75, P(R_2) = 0.60, etc. (i = 1, 2, …, n);
Σ R_i P(R_i): the current estimate of R, calculated as a mean.

The Bayesian learning mechanism is applied when the contractor agent receives a new offer from the engineer agent. Based on its prior knowledge about the engineer agent, a new offer enables the contractor agent to acquire new insights into the engineer agent's reservation value in the form of a posterior subjective evaluation over R_i. The contractor agent's prior knowledge about the engineer agent's strategy can be expressed as 'usually the engineer agent will offer an amount which is 20% lower than its reservation value' [22]. Such a relationship can be represented by a set of conditional statements, for example: P(e_2|R_2) = 0.95, where e_2 represents offer_engineer = 120, and R_2 = 150.

Given the contractor agent's domain knowledge encoded in the form of conditional statements and the engineer agent's offer, the contractor agent can use the standard Bayesian rule to revise its beliefs about the engineer agent's reservation value R:

P(R_i|e) = P(R_i) P(e|R_i) / P(e) = P(R_i) P(e|R_i) / Σ_{k=1}^{n} P(e|R_k) P(R_k)      (2)

where

P(R_i|e): the probability that the engineer agent's reservation value is R_i under the condition that its offer is e;
P(R_i): the probability that the engineer agent's reservation value is a certain R_i;
P(e|R_i): the probability that the engineer agent's offer is a certain e under the given reservation value R_i;
P(e): the probability that the engineer agent's offer is e.

Fig. 2. Bayesian updating mechanism.
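The updating rule of Eq. (2) over a discrete set of reservation-value hypotheses can be written as a single function, as sketched below. This is a minimal illustration under the assumptions stated above (a finite hypothesis set and encoded conditional probabilities P(e|R_i)); the function and variable names are introduced here for illustration only and do not describe the MASCOT code itself.

```python
def bayesian_update(prior, likelihood, offer):
    """One application of Eq. (2).

    prior      : {hypothesis R_i: P(R_i)}, the agent's current beliefs about the
                 opponent's reservation value
    likelihood : {hypothesis R_i: {offer e: P(e | R_i)}}, the agent's encoded domain
                 knowledge about the opponent's offer strategy
    offer      : the offer e just received from the opponent
    Returns the posterior {R_i: P(R_i | e)}.
    """
    # Unnormalised posterior: P(R_i) * P(e | R_i)
    unnorm = {r: prior[r] * likelihood[r].get(offer, 0.0) for r in prior}
    evidence = sum(unnorm.values())  # P(e) = sum over k of P(e | R_k) P(R_k)
    if evidence == 0.0:
        return dict(prior)  # an offer the model assigns zero probability carries no information
    return {r: p / evidence for r, p in unnorm.items()}


def expected_reservation(beliefs):
    """Point estimate of the opponent's reservation value: sum of R_i * P(R_i)."""
    return sum(r * p for r, p in beliefs.items())
```

In MASCOT terms, the posterior returned by one call becomes the prior for the next call when a further offer arrives, which is exactly the repeated updating cycle shown in Fig. 2.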



3.3. Example

In a water supply project, the construction work was seriously delayed due to wrong geotechnical information being provided at the design stage. The contractor was therefore entitled to claim for a time extension and extra costs. Negotiations were conducted regarding the amount of compensation. This example presents the application of the MASCOT system to the claim for loss of productivity, with a focus on the learning mechanism.

3.3.1. Negotiation preparation
Before negotiation, the contractor estimates that his real loss of productivity is £9000. Considering his current situation and the importance of the claim, the contractor decides his critical negotiation figures as in Table 1. Meanwhile, the contractor also tries to estimate the engineer's reservation amount based on his domain knowledge. Table 2 shows the contractor's estimate of the possible distribution of the engineer's reservation amount. Table 3 shows the contractor's estimate of the conditional probabilities of the engineer's offer given each hypothesis. These are based on the contractor's perception of the engineer's negotiation strategy; for example, 'the engineer will normally make his offer 10% lower than what he really wants'.

Table 1
The contractor's major negotiation figures

Reservation amount   Optimum amount
£9000                £11,000

Table 2
The contractor's prior knowledge about the probabilities of the engineer's reservation amount

Hypothesis           R_1, £7000   R_2, £8000   R_3, £8500   R_4, £9000   R_5, £10,000   R_6, £11,000
Probability P(R_i)   0            0            0.25         0.25         0.25           0.25

Table 3
The conditional probabilities of the engineer's offer given the contractor's hypothesis

Hypothesis   e_0, £6000   e_1, £7000   e_2, £8000   e_3, £8500   e_4, £9000   e_5, £9500   e_6, £10,000   e_7, £11,000
£7000        0.35         0.35         0.25         0.04         0.01         0            0              0
£8000        0.10         0.40         0.35         0.12         0.03         0            0              0
£8500        0            0.14         0.50         0.30         0.05         0.01         0              0
£9000        0            0.10         0.15         0.45         0.25         0.05         0              0
£10,000      0            0.03         0.10         0.20         0.40         0.22         0.05           0
£11,000      0            0            0            0.05         0.10         0.25         0.40           0.20

3.3.2. Negotiation process
(1) The initial offer and counteroffer. The contractor agent makes its initial offer of £11,000, which is assumed to be its optimum claim amount. After receiving the contractor agent's initial offer, the engineer agent makes a counteroffer of £7000 to the contractor agent, which is also assumed to be the engineer agent's optimum amount.

(2) The contractor agent's offer in the second iteration.

Updating the belief of the probability of the engineer agent's reservation amount. Based on the engineer agent's counteroffer and the contractor's prior knowledge about the engineer agent (Tables 2 and 3), the contractor agent updates its belief about the probability of the engineer agent's reservation amount being R according to the Bayesian rule, Eq. (2). In this case, the engineer's offer is £7000 (e_1 = 7000), thus

P(R_3|e_1) = P(R_3) P(e_1|R_3) / Σ_{k=1}^{6} P(e_1|R_k) P(R_k) = (0.25 × 0.14) / ((0.25 × 0.14) + (0.25 × 0.1) + (0.25 × 0.03)) = 0.518

P(R_4|e_1) = P(R_4) P(e_1|R_4) / Σ_{k=1}^{6} P(e_1|R_k) P(R_k) = (0.25 × 0.1) / ((0.25 × 0.14) + (0.25 × 0.1) + (0.25 × 0.03)) = 0.370

P(R_5|e_1) = P(R_5) P(e_1|R_5) / Σ_{k=1}^{6} P(e_1|R_k) P(R_k) = (0.25 × 0.03) / ((0.25 × 0.14) + (0.25 × 0.1) + (0.25 × 0.03)) = 0.111

From Tables 2 and 3, P(R_1|e_1) = P(R_2|e_1) = 0 because P(R_1) = P(R_2) = 0, and P(R_6|e_1) = 0 because P(e_1|R_6) = 0.

Estimating the engineer agent's reservation amount. Prior to receiving the engineer agent's offer (£7000), the contractor agent's belief about the engineer agent's reservation amount is:

R = Σ P(R_i) R_i = 0.25 × 8500 + 0.25 × 9000 + 0.25 × 10,000 + 0.25 × 11,000 = 9625

After receiving the counteroffer, the contractor agent's estimation of the engineer agent's reservation amount is updated as follows:

R = Σ P(R_i) R_i = 0.518 × 8500 + 0.37 × 9000 + 0.111 × 10,000 = 8843

Utility functions.

(a) The contractor agent's utility function. In this example, agents' utility functions are assumed to be linear, which can be expressed as u_c = kx + b. Also, the two key points in the utility function are known, i.e. optimum point (11,000, 1) and reservation point (9000, 0.6). Thus, the contractor agent's utility function can be calculated to be u_c = 2 × 10^-4 x − 1.2.

(b) The contractor agent's estimate of the engineer agent's utility function. The engineer agent's utility function can be expressed as u_e = kx + b, where the contractor agent knows two points along this line based on its updated beliefs: optimum point (7000, 1) and reservation point (8707, 0.6). Thus, the contractor agent estimates the engineer agent's utility function to be u_e = −2.3 × 10^-4 x + 2.64.

(c) The combined utility function. Since the contractor agent's and the engineer agent's utility functions are u_c = 2 × 10^-4 x − 1.2 and u_e = −2.3 × 10^-4 x + 2.64, the correlation between the contractor agent's and the engineer agent's utility functions can be calculated as u_e = −1.17 u_c + 1.234.

Risk evaluation. According to Zeuthen's model, the maximum likelihood of risk acceptable to the contractor agent (P_c max) and the engineer agent (P_e max) can be calculated as:

P_c max = (U_cc^t − U_ce^t) / (U_cc^t − U_c(C));   P_e max = (U_ee^t − U_ec^t) / (U_ee^t − U_e(C))

In this iteration, the offers of the contractor agent and the engineer agent are (11,000, 7000). Thus, the maximum likelihoods of risk acceptability to the contractor and the engineer are:

P_c max = (U_cc^t − U_ce^t) / (U_cc^t − U_c(C)) = (1 − 0.2) / 1 = 0.8

P_e max = (U_ee^t − U_ec^t) / (U_ee^t − U_e(C)) = (1 − 0.11) / 1 = 0.89

Concession. Since P_c max < P_e max (i.e. the contractor's maximum risk acceptability is less than that of the engineer agent), the contractor agent knows that it should make a concession in the next iteration. In this example, a simple concession approach is adopted to calculate the concession rate (i.e. the contractor agent will make the minimum concession sufficient to make the engineer agent's maximum acceptable risk smaller than its own in the next iteration). The concession step can be calculated as:

P_e max = (U(w_e) − U(D_c)) / (U(w_e) − U(e)) = |1 − u_ec| / (1 − 0) = 0.8

u_ec = 0.2 ⇒ u_c = 0.8837 ⇒ x = 10,418

Thus, the contractor agent's new offer will be equal to, or lower than, £10,418.

(3) The engineer agent's counteroffer in the second iteration. By following the same procedure as outlined above for the contractor agent, the engineer agent is required to make a concession and makes a counteroffer of £8508.

The contractor agent's offer in the third iteration. On receiving the engineer agent's new counteroffer (£8508), the contractor agent repeats the evaluation process, as in the second iteration, in order to decide its concession amount.

Updating the belief of the probability of the engineer agent's reservation amount. After the last iteration, the contractor agent updates its belief about the probability of the engineer agent's reservation amount as given in Table 4.

Table 4
The contractor agent's belief of the probability after the 2nd iteration

Hypothesis               1 (£7000)   2 (£8000)   3 (£8500)   4 (£9000)   5 (£10,000)   6 (£11,000)
Probability P(R_i|e_1)   0           0           0.518       0.370       0.111         0
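As a cross-check on the second-iteration figures above, the short script below reproduces the contractor agent's belief update from Tables 2 and 3, the revised estimate of the engineer agent's reservation amount, the two risk values, and the Zeuthen concession bound of roughly £10,418. It is illustrative only: the variable names are introduced here, the utility coefficients are the rounded ones quoted in the example, and it is not the MASCOT implementation.

```python
# Illustrative cross-check of the second-iteration figures in Section 3.3.2.

# Belief update after the engineer's first offer e1 = 7000 (Tables 2 and 3).
prior = {8500: 0.25, 9000: 0.25, 10000: 0.25, 11000: 0.25}        # Table 2 (zero-probability hypotheses omitted)
p_e1_given_R = {8500: 0.14, 9000: 0.10, 10000: 0.03, 11000: 0.0}  # Table 3, column e1 = £7000

unnorm = {r: prior[r] * p_e1_given_R[r] for r in prior}
evidence = sum(unnorm.values())
posterior = {r: v / evidence for r, v in unnorm.items()}          # about 0.518, 0.370, 0.111, 0

estimate = sum(r * p for r, p in posterior.items())               # about 8852 (8843 in the text, which rounds the posteriors)

# Linear utility functions assumed in the example (coefficients rounded as in the text).
u_c = lambda x: 2e-4 * x - 1.2          # contractor utility fitted to (11000, 1) and (9000, 0.6)
u_e = lambda x: -2.3e-4 * x + 2.64      # estimated engineer utility fitted to (7000, 1) and (8707, 0.6)

p_c_max = (1.0 - u_c(7000)) / 1.0       # U_cc = 1 at the contractor's own offer -> 0.8
p_e_max = (1.0 - u_e(11000)) / 1.0      # U_ee = 1 at the engineer's own offer  -> about 0.89

# Since p_c_max < p_e_max the contractor concedes: it looks for the offer at which the
# engineer's utility would reach 0.2, using the example's rounded combined relation
# u_e = -1.17 * u_c + 1.234 to map back into its own utility scale.
u_e_target = 1.0 - p_c_max                   # 0.2
u_c_target = (1.234 - u_e_target) / 1.17     # about 0.8838
new_offer = (u_c_target + 1.2) / 2e-4        # about 10,419 (10,418 in the text)

print(round(estimate), round(p_c_max, 2), round(p_e_max, 2), round(new_offer))
```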

Based on this prior knowledge (the beliefs updated after the £7000 offer) and the engineer agent's new counteroffer (£8508), the contractor agent updates its belief as follows:

P(R_i|e_1, e_2, e_3) = P(R_i|e_1) P(e_1, e_2, e_3|R_i) / Σ_{k=1}^{6} P(e_1, e_2, e_3|R_k) P(R_k)

where

P(e_1, e_2, …, e_n|R_i) = P(e_1|R_i) P(e_2|R_i) ··· P(e_n|R_i)

P(e_1, e_2, e_3|R_i) = P(e_1|R_i) P(e_2|R_i) P(e_3|R_i); for R_3, R_4 and R_5 this gives (0.14, 0.1, 0.03) × (0.30, 0.45, 0.20) = (0.042, 0.045, 0.006)

The above equations are based on the assumption that the events e_1, e_2, … are independent [14]. In this iteration, the contractor agent's new conditional probabilities become:

P(R_3|e_1, e_2, e_3) = P(R_3|e_1) P(e_1, e_2, e_3|R_3) / Σ_{k=1}^{6} P(e_1, e_2, e_3|R_k) P(R_k) = (0.518 × 0.042) / ((0.518 × 0.042) + (0.37 × 0.045) + (0.111 × 0.006)) = 0.556

P(R_4|e_1, e_2, e_3) = P(R_4|e_1) P(e_1, e_2, e_3|R_4) / Σ_{k=1}^{6} P(e_1, e_2, e_3|R_k) P(R_k) = (0.37 × 0.045) / ((0.518 × 0.042) + (0.37 × 0.045) + (0.111 × 0.006)) = 0.426

P(R_5|e_1, e_2, e_3) = P(R_5|e_1) P(e_1, e_2, e_3|R_5) / Σ_{k=1}^{6} P(e_1, e_2, e_3|R_k) P(R_k) = (0.111 × 0.006) / ((0.518 × 0.042) + (0.37 × 0.045) + (0.111 × 0.006)) = 0.017

Thus, after receiving the engineer agent's two offers, the contractor agent's current estimation of the engineer agent's reservation amount is:

R = 0.556 × 8500 + 0.426 × 9000 + 0.017 × 10,000 = 8738

Making concession according to Zeuthen's strategy. Through the process of estimating the opponent's utility function, maximum risk acceptability and concession amount based on Zeuthen's strategy, the contractor agent identifies that the new offer should be equal to or lower than £10,073.

Due to space constraints, this example only presents the first two iterations in detail. Further iterations of the negotiation follow the same procedure. Table 5 and Fig. 3 show the outcomes of the full negotiation process in each iteration.

Table 5
The negotiation process

Negotiation iteration                                                 First    Second   Third    Fourth   Fifth    Sixth
The contractor agent's offer                                          11,000   10,418   10,073   9814     9624     9548
The contractor agent's estimate of the engineer's reservation value   9625     8843     8738     8952     9394     9576
The engineer agent's offer                                            7000     8508     8818     9232     9429     9500
The engineer agent's estimate of the contractor's reservation value   8600     9464     9520     9528     9532     9496

Fig. 3. The negotiation process (negotiation converged at the eighth/ninth iteration).

3.3.3. Discussion
The MASCOT prototype has shown that the Bayesian learning approach could be a simple and effective learning approach if agents can gain enough prior knowledge. However, there are some limitations with this learning approach. As a result, a few extra rules have to be added to ensure that the system always generates reasonable results if the negotiating agents do not have adequate capability to learn about the opponent's reservation amount.

This study adopts the Bayesian learning approach based on the fact that construction claims negotiation parties may have adequate knowledge about their opponents' negotiation preferences. However, there are several limitations in the adoption of the Bayesian learning approach. For example, there is one extreme case where agents cannot learn correctly in the prototype. This occurs when one negotiation party does not have adequate information about the negotiation item, or lacks enough domain knowledge about its opponent, but still believes that its estimate of the opponent's key negotiation features is reliable. As a result, its input of the opponent's reservation value or negotiation habit could be very far from the opponent's real value, although the specified confidence level is high. Consequently, its updated reservation value in the first iteration will be quite different from the opponent's real value. Thus, it may concede too much in the first iteration or may not concede at all. Consequently, the negotiation will converge at an unreasonably fast rate or never converge. This could be a general drawback of the Bayesian learning approach in this kind of negotiation.

In the MASCOT prototype, it has been found that an agent may adopt a misleading strategy to benefit from its negotiation opponent because of the Bayesian learning approach the opponent adopts (i.e. it purposely makes an extremely high or low initial offer). In this case, if the opponent holds its current position (or makes very little concession as a gesture) for several iterations, it is expected that the agent will adopt a more realistic approach to the negotiation.

Since the Bayesian learning approach is vital for the adoption of Zeuthen's negotiation strategy, the MASCOT prototype did not explore the case where one agent adopted Bayesian learning whilst the other agent could not learn. Nevertheless, this study considered other cases where agents have different learning abilities (e.g. agent A has enough prior knowledge of agent B, whilst agent B lacks such knowledge before the negotiation). The result shows that the benefits of learning mainly depend on who has more reliable prior information and whether or not the opponent adopts a cheating strategy, rather than who the opponent is. Also, it has been discussed that if an agent does not have enough prior knowledge, it is possible for the agent to adopt another learning approach such as reinforcement learning. In this way, an agent may apply different learning approaches depending on its prior knowledge. In such cases, since agents adopt different learning approaches, the effectiveness of the learning approaches will differ, and the result will be that one agent gains more than another.

This study also analyses the impacts of the Bayesian learning approach in cases where agents adopt other negotiation strategies (i.e. a simple gradient descent strategy and a binary divisive strategy). Since the agents do not adopt Zeuthen's strategy, it is not essential for them to adopt any learning approach to keep the negotiation strategy stable. Thus, situations such as one agent learning while another does not have been studied. The results show that a learning agent is able to draw the negotiation result towards its preferred end, given the condition that the learning agent can obtain enough prior negotiation information.

4. Conclusions and recommendations

This paper has discussed the general characteristics of agent learning, described the Bayesian learning approach adopted in the MASCOT system, and presented the implementation of the learning approach using a practical example. The Bayesian learning approach integrated with Zeuthen's negotiation strategy has been evaluated through both theoretical analysis and prototype assessment. The learning approach keeps the MAS negotiation mechanism stable on the one hand, and addresses the human negotiator's need to learn about the opponent's negotiation features on the other hand [17].

Given the advantages and flexibility of MAS, there is considerable potential for MAS to be further applied to other fields (e.g. collaborative design, project planning and scheduling, and materials management) to address the fragmentation problem of the industry. Agent learning capability greatly enhances the efficiency of a multi-agent system and makes these systems suitable for complex and dynamic environments. For example, a design agent's inference ability allows it to learn about other specialist agents' domain knowledge, and allows the entire system to work efficiently in a changing design environment. More importantly, in cases such as the MASCOT system,
the agents’ learning ability is not only an approach to [5] Conry SE, Meyer RA, Lesser VR. Multistage negotiation in
improving agents’ working efficiency, but it is also a distributed planning. In: Bond AH, Gasser L, editors. Readings in
distributed artificial intelligence. San Mateo: Morgan Kaufmann;
functional requirement for system stability.
1988. p. 367–84.
However, how to develop an appropriate learning [6] Grecu DL, Brown DC. Learning by design agents during negotiation.
approach to suit a particular engineering problem is an Proceedings of the Third International Conference on AI in Design—
essential question which a system developer needs to Workshop on Machine Learning in Design, Lausanne, Switzerland;
answer. Although it is difficult to develop a general learning 1994.
approach for different application scenarios, several general [7] Grecu DL, Brown DC. Guiding agent learning in design. Proceedings
of the Third IFIP Working Group 5.2 Workshop on Knowledge
principles adopted in this study will be helpful for working Intensive CAD, Tokyo, Japan; 1998. p. 237–50.
out an appropriate learning approach. For example, it is [8] Harsanyi JC. Games with incomplete information played by Bayesian
important for a developer to answer the following questions: players. Mgmt Sci 1967–1968;14:159– 82. see also pages 320– 34,
486–502.
† Why should learning be included? [9] Iversen GR. Bayesian statistical inference. Beverly: Sage University
Paper; 1984.
† Who should learn?
[10] Kraus S. Negotiation and cooperation in multi-agent environments.
† From whom should learning take place? Artif Intell J, Spec Issue Econ Principles Multi-Agent Syst 1997;94(1/
† What should be learnt, and what are the objectives of 2):79– 98.
learning? [11] Matos N, Sierra C, Jennings NR. Determing successful negotiation
† What is the expected result of learning? and most strategies: an evolutionary approach. Proceedings of the Third
importantly International Conference on Multi-Agent Systems, IEEE Computer
Society; 1998.
† Which kind of learning method should be adopted?
[12] Mitchell TM. Machine learning. New York: McGraw-Hill; 1997.
† How should the results of the learning approach be [13] Nash JF. The bargaining problem. Econometrica 1950;28:155–62.
evaluated? [14] Pearl J. Probabilistic reasoning in intelligent systems: networks of
plausible inference. San Mateo: Morgan Kaufmann; 1988.
These questions, particularly the learning approach, [15] Ren Z, Anumba CJ, Ugwu OO. Construction claims management:
could only be answered by understanding all the key towards an agent-based approach. Engng Construct Architect Mgmt
aspects of agent learning and analysing the major charac- 2001;8(3):185–97.
[16] Ren Z, Anumba CJ, Ugwu OO. Negotiation in a multi-agent system
teristics of the particular problem domain.
for construction claims negotiation. Appl Artif Intell 2002;16(5):
359–94.
[17] Ren Z, Anumba CJ, Ugwu OO. A multi-agent system for construction
References claims negotiation. ASCE J Comput Civil Engng 2002;17(3).
[18] Rosenschein JS, Zlotkin G. Rules of encounter. Cambridge, MA: MIT
[1] Alonso E, d’Inverno M, Kudenko F, Luck M, Noble J. Learning in Press; 1994.
multi-agent systems. Technical Report of the Third Workshop of the [19] Weiss G. Adaptation and learning in multi-agent systems: some
UK’s Special Interest Group on Multi-Agent Systems; 2001. remarks and a bibliography. In: Weiß G, Sen S, editors. Adaption and
[2] Bui HH. Learning other agents’ preferences in multi-agent negotiation learning in multi-agent systems. Lecture notes in artificial intelli-
using the Bayesian classifier. Proc Natl Conf Artif Intell (AAAI-96) gence, vol. 1042. Berlin: Springer; 1996. p. 1 –21.
1996;114–9. [20] WinSton PH. Artificial intelligence, 3rd ed. New York: Addison-
[3] Bussmann S, Muller HJ. A negotiation framework for co-operating Wesley; 1992.
agents. Proceedings of CKBS-SIG, Dark Centre, University of Keele; [21] Young OR, editor. Bargaining: formal theories of negotiation.
1992. p. 1–17. Urbana: University of Illinois Press; 1975.
[4] Carbonell JM. Introduction: paradigms for machine learning. Artif [22] Zeng D, Sycara K. Bayesian learning in negotiation. Int J Human–
Intell 1989;40(1 –3):1–9. Comput Stud 1998;48:125 –41.
