
Understanding the Application of Utility Theory in Robotics and Artificial Intelligence: A Survey

Qin Yang1∗ and Rui Liu2

Abstract – As a unifying concept in economics, game theory, and operations research, and even in the Robotics and AI field, utility is used to evaluate the level of individual needs, preferences, and interests. Especially for decision-making and learning in multi-agent/robot systems (MAS/MRS), a suitable utility model can guide agents in choosing reasonable strategies to achieve their current needs and in learning to cooperate and organize their behaviors, optimizing the system's utility, building stable and reliable relationships, and guaranteeing each group member's sustainable development, similar to human society. Although these systems' complex, large-scale, and long-term behaviors are strongly determined by the fundamental characteristics of the underlying relationships, there has been little discussion of the theoretical aspects of the mechanisms and the fields of application in Robotics and AI. This paper introduces a utility-oriented needs paradigm to describe and evaluate inter and outer relationships among agents' interactions. Then, we survey existing literature in relevant fields to support it and propose several promising research directions along with some open problems deemed necessary for further investigation.

Fig. 1: Rewards (Cumulative Utilities) Driven Behaviors. [Figure: the agent-environment loop; at each step the environment provides a state S_t and a reward r_t, the agent's policy π selects an action a_t, and the environment returns S_{t+1} and r_{t+1}.]

I. INTRODUCTION

When people study, analyze, and design robotic or artificial intelligence (AI) systems, they like to compare them with how natural systems work. For example, many natural systems (e.g., brains, immune systems, ecologies, societies) are characterized by apparently complex behaviors that emerge as a result of often nonlinear spatiotemporal interactions among a large number of component systems at different levels of organization [1]. From the single-robot perspective, cognitive developmental robotics (CDR) aims to provide a new understanding of how human higher cognitive functions develop by using a synthetic approach that developmentally constructs cognitive functions [2]. Multi-agent systems (MAS)1 potentially share the properties of swarm intelligence [3] in practical applications (like search, rescue, mining, map construction, exploration, etc.), representing the collective behavior of distributed and self-organized systems. Many research fields relate to them, especially the building of so-called artificial social systems, such as drones and self-driving cars. Specifically, in multi-robot systems (MRS), each robot can internally estimate the utility of executing an action, and robots' utility estimates will be inexact for several reasons, including mission cost, strategy rewards, sensor noise, general uncertainty, and environmental change [4], [5].

Furthermore, in MAS cooperation, trust is an essential component for describing the agent's purposes. In other words, there are complex relationships between utility, transparency, and trust, which depend on individual purposes [6]. Especially when humans and robots collaborate as a team, recognizing the utility of robots will increase interest in group understanding and improve Human-Robot Interaction (HRI) performance in tasks [7]. In this paper, we introduce a utility-oriented needs paradigm and survey existing work on robotics and AI through the lens of utility. We organize this review by the relationships between single-agent systems, multi-agent systems, trust among agents, and HRI to underscore the effects of utility on interaction outcomes.

1∗ Qin Yang is the corresponding author and is with the Intelligent Social Systems and Swarm Robotics Lab (IS3R), Computer Science and Information Systems Department, Bradley University, Peoria, IL 61625, USA. Email: is3rlab@gmail.com
2 Rui Liu is with the Cognitive Robotics and AI Lab (CRAI), College of Aeronautics and Engineering, Kent State University, Kent, OH 44242, USA.
1 In this paper, we use the term agent to mean autonomous intelligent entities, such as robots.

A. Scope and Contributions

One of the significant challenges in designing robotic and AI systems is how to appropriately describe the relationships of their interconnected elements and natures. It is natural to introduce a unifying concept expressing various components and evaluating diverse factors on the same ground. As a result, Utility Theory is a wide cross-area of research that lies at the intersection of economics, operations research, robotics, AI, cognitive science, and psychology. To the best of our knowledge, this survey is the first to formalize a comprehensive model of the utility paradigm in agents' interaction from the agents' needs perspective and to situate prior and ongoing research regarding how utility can influence inter and outer relationships among agents and affect systems' performance. The main contributions of this paper are listed as follows:

• We introduce a utility paradigm based on agents' needs to describe and evaluate inter and outer relationships among agents' interactions, integrating and summarizing insights from robotics and AI studies.
• We review existing literature in relevant fields that supports the proposed utility-oriented needs paradigm of agents' interaction.
• We propose several open problems for future research based on utility theory, which will support researchers in building robust, reliable, and efficient robotic and AI systems, especially in the MAS and HRI domains.

Fig. 2: Illustration of the utility-oriented needs paradigm based on agents' motivations among single-agent systems, MAS, and HRI. [Figure: a single agent (executor/carrier) satisfies its individual utility (collision detection, battery level, data communication, sensor range, etc.) and adapts to tasks and environments; a MAS team (supplier, observer, executor, carrier) satisfies group utilities (individual capabilities, winning rate, execution time, cost, trust, etc.) with trust among agents; in HRI, human needs and utility (human safety, human needs, adapting to human behaviors) combine with agents' utilities into a combination utility (effectiveness, efficiency, reliability, compatibility, robustness, stability, trust, etc.) with trust between human and agent.]

Fig. 2 outlines the vital relationships of the utility-oriented needs paradigm based on agents' motivations among single-agent systems, MAS, and HRI, which affect systems' abilities to meet the requirements of specific tasks and adapt to human needs and uncertain environments. We survey how these utility-oriented relationships affect the performance of different systems, especially evaluating the trust level among agents through various types of utilities and integrating human needs and agents' utilities to build a harmonious team.

II. SYNOPSIS OF UTILITY FORMALISMS

This section summarizes the evolution of formalisms that use the concept of utilities to improve agents' performance. This evolution can be classified into two main stages. Mathematical conceptualizations characterize the first stage as extensions of theories from various fields. The second stage corresponds to formalisms that focus on optimizing systems' capabilities by unifying different concepts based on Utility Theory.

A. Utility Theory

Utility theory is postulated in economics to explain the behavior of individuals based on the premise that individuals can consistently rank the order of their choices depending on their needs or preferences. This theoretical approach aims to quantify an agent's degree of preference across a set of available alternatives and to understand how these preferences change when an agent faces uncertainty about which alternative it will receive [8].

To describe the interactions between multiple utility-theoretic agents, we use a specific utility function to analyze their preferences and rational action. The utility function is a mapping from states of the world to real numbers, which are interpreted as measures of an agent's level of happiness (needs) in the given states. If the agent is uncertain about its current state, the utility is defined as the expected value (Eq. (1)) of its utility function under the appropriate probability distribution over the states [8]. From the perspective of the connection between a decision-maker and its preferences, a decision-maker would rather implement a more preferred alternative (act, course of action, strategy) than one that is less preferred [9].

    E[u(X)] = Σ_i u(x_i) · P(x_i)    (1)
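To make Eq. (1) concrete, the short Python sketch below computes the expected utility of two candidate actions over a discrete distribution of outcome states and picks the preferred one. It is our own minimal illustration; the state names, utilities, and probabilities are hypothetical.

    # Expected utility over a discrete outcome distribution, following Eq. (1):
    # E[u(X)] = sum_i u(x_i) * P(x_i).
    def expected_utility(utilities: dict[str, float], probs: dict[str, float]) -> float:
        assert abs(sum(probs.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
        return sum(utilities[s] * probs[s] for s in probs)

    # Hypothetical example: a robot choosing between two navigation strategies.
    u_shortcut = {"clear": 10.0, "blocked": -5.0}   # utility of each outcome state
    p_shortcut = {"clear": 0.6, "blocked": 0.4}     # outcome distribution
    u_detour = {"clear": 6.0, "blocked": 4.0}
    p_detour = {"clear": 0.9, "blocked": 0.1}

    actions = {"shortcut": expected_utility(u_shortcut, p_shortcut),
               "detour": expected_utility(u_detour, p_detour)}
    print(max(actions, key=actions.get))   # the rational agent prefers the higher E[u]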
Generally, utility serves as a unified concept in value systems to describe the capacity of an action, behavior, or strategy to satisfy the agents' desires or needs in their decision-making. Furthermore, modern utility theory has its origins in the theory of expected value: the value of any course of action can be determined by multiplying the gain realized from that action by the likelihood of receiving that gain [10].

B. Basic Concepts in Agent's Motivation

Suppose AI agents can sense various environments, monitor their behaviors, and get internal and external information through a perception P(t) at a given time. The data can generally be categorized into two classes: different variables in the environments (like light, sound, color, etc.), and the agent's physical structure and operation (such as arm extension, contact in a leg, etc.) together with the operation of the cognitive processes within the robot (learning error, perceptual novelty, etc.) [11]. Then, the set of values is transmitted to the agent's actuators, making up the actuation vector A(t) at a given time. Furthermore, we can define a series of specific concepts as follows:

• Goal: An agent obtains utilities in a perceptual state P(t).
• Innate Value/Needs/Motivation/Drive: An agent has the plasticity to continuously encode the feedback received from its interactive environments, and the encoding of experience should facilitate the emergence of adaptive behaviors [12].
• Strategy: A strategy describes the general plan of an agent for achieving short-term or long-term goals under uncertainty, which involves setting sub-goals and priorities, determining action sequences to fulfill the tasks, and mobilizing resources to execute the actions [13].
• Utility: The benefit of using a strategy to achieve a goal satisfying a specific innate value and need. The utility is determined by the system's interaction with the environment and is not known a priori; it is traditionally called the Reward in the reinforcement learning field [11].
• Expected Utility (e_u): The probability of obtaining utility starting from a given perceptual state P(t), modulated by the amount of utility obtained.
• Utility Model: A function providing the expected utility for any point in state space, which establishes the relationship between the real utility obtained from the interaction and how to achieve the goal.
• Value Function (VF): A specific utility model expressing the expected utility for any perceptual state P(t):

    e_u(t+1) = VF(P(t+1))    (2)

• Episode: It describes one cycle of the interaction between agents and environments:

    episode = {P(t); A(t); P(t+1); e_u(t+1)}    (3)
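The concepts above can be stated compactly in code. The following Python sketch is our own minimal illustration (not an implementation from [14]); it models a value function VF mapping a perceptual state P(t) to an expected utility e_u(t+1) as in Eq. (2) and records one interaction cycle as an episode tuple as in Eq. (3). The toy policy, environment, and goal are hypothetical.

    from dataclasses import dataclass
    from typing import Callable, Sequence

    Perception = Sequence[float]   # perceptual state P(t)
    Actuation = Sequence[float]    # actuation vector A(t)

    @dataclass
    class Episode:
        """One interaction cycle: episode = {P(t); A(t); P(t+1); e_u(t+1)} (Eq. (3))."""
        p_t: Perception
        a_t: Actuation
        p_next: Perception
        eu_next: float

    def run_cycle(p_t: Perception,
                  policy: Callable[[Perception], Actuation],
                  environment: Callable[[Perception, Actuation], Perception],
                  value_function: Callable[[Perception], float]) -> Episode:
        """Sense, act, and evaluate: e_u(t+1) = VF(P(t+1)) (Eq. (2))."""
        a_t = policy(p_t)                 # behavioral components choose an actuation
        p_next = environment(p_t, a_t)    # actuators change the external state
        eu_next = value_function(p_next)  # utility model scores the new percept
        return Episode(p_t, a_t, p_next, eu_next)

    # Hypothetical toy instantiation: one-dimensional state, goal at x = 1.0.
    episode = run_cycle(
        p_t=[0.0],
        policy=lambda p: [0.5],                     # always move +0.5
        environment=lambda p, a: [p[0] + a[0]],
        value_function=lambda p: -abs(1.0 - p[0]),  # higher utility near the goal
    )
    print(episode)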
Especially, [14] presents a model for a generic intrinsically motivated agent, shown in Fig. 3.2 Through sensors generating an environment state at each time step, the agent will form various goals and corresponding expected utilities in its motivation components. Then, the behavioral components will make decisions based on specific goals to modify the agent's internal state and generate diverse strategies and actions to achieve goals by triggering actuators to interact with the environment, changing the external state.

Fig. 3: The illustration of the model for a generic intrinsically motivated agent. [Figure: the environment feeds sensors 1..n and perception components, producing the state; intrinsic motivation components provide goals and utilities to the behavioral (decision-making) components, whose actions and strategies drive the actuators back into the environment.]

2 Here, intrinsic motivation components modulate behavioral components to connect perception components (sensors) and actuators.

C. Needs-driven Behaviors

In nature, from cells to humans, all intelligent agents present different kinds of hierarchical needs, such as the low-level physiological needs (food and water) in microbes and animals and the high-level self-actualization needs (creative activities) in human beings [15]. Different levels of needs stimulate all agents to adopt diverse behaviors to achieve certain goals or purposes and satisfy their various needs in the natural world [16]. They can also be regarded as self-organizing behaviors [17], which keep interacting with the environments and exchanging information and energy to support the system's survival and development.

Specifically, the needs describe the necessities for a self-organizing system to survive and evolve; a need arouses an agent to act toward a goal, giving purpose and direction to behavior and strategy. The dominant approach to modeling an agent's interests or needs is Utility Theory. [18] introduced self-sufficient robots' decision-making based on a basic work cycle – find fuel – refuel. They utilized utility criterion needs, such as utility maximization, to guide robots in performing opportunistic behaviors [19].

From the utility theory perspective, needs are regarded as innate values driving agents to interact with environments. Especially, the ultimate payoff for animals is to maximize inclusive fitness utilizing some evolutionary strategy, and for humans it is to maximize profitability through some marketing strategy [20], [21]. If we consider the ecological status of AI agents, like robots, accomplishing tasks in the real world, their situation is similar to our ecological situation [22].

Especially, reinforcement learning (RL) is an excellent model to describe the needs-driven behaviors of AI agents. The essence of RL is learning from interaction based on reward-driven (such as utilities and needs) behaviors, much like natural agents. When an RL agent interacts with the environment, it can observe the consequences of its actions and learn to change its behaviors based on the corresponding rewards received. The theoretical foundation of RL is the paradigm of trial-and-error learning rooted in behaviorist psychology [23]. In RL, the policy dictates the actions that the agent takes as a function of its state and the environment, and the goal of the agent is to learn a policy maximizing the expected cumulative rewards/utilities in the process. Fig. 1 illustrates the RL process of rewards-driven behaviors [24].
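Because this reward-driven loop is central to the needs-driven view, the following tabular Q-learning sketch shows how interaction alone shapes a policy that maximizes expected cumulative utility. It is a standard textbook construction in Python, not code from the cited works, and the five-state chain environment is hypothetical.

    import random

    # A tiny deterministic chain: states 0..4, actions 0 (left) / 1 (right);
    # reaching state 4 yields utility 1.0, and every step costs -0.01.
    N_STATES, ACTIONS, GOAL = 5, (0, 1), 4

    def step(s, a):
        s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
        return s2, (1.0 if s2 == GOAL else -0.01), s2 == GOAL

    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma, eps = 0.5, 0.9, 0.1    # learning rate, discount, exploration

    for _ in range(500):                 # episodes of trial-and-error learning
        s, done = 0, False
        while not done:
            a = random.choice(ACTIONS) if random.random() < eps \
                else max(ACTIONS, key=lambda a_: Q[(s, a_)])
            s2, r, done = step(s, a)
            # TD update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, a_)] for a_ in ACTIONS) - Q[(s, a)])
            s = s2

    print({s: max(ACTIONS, key=lambda a_: Q[(s, a_)]) for s in range(N_STATES)})
    # learned policy: move right toward the rewarding state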
D. Utility-oriented Needs Paradigm Systems

Fig. 2 represents the relationships among single agents, multi-agent systems, and humans based on the paradigm of utility-oriented needs. More specifically, before an individual agent cooperates with group members, it needs to satisfy some basic needs, such as having enough energy and guaranteeing other agents' safety.
Then it requires corresponding capabilities to collaborate with other agents, satisfy task requirements, and adapt to dynamically changing environments (Fig. 2 shows four categories of robots, carrier, supplier, observer, and executor, working as a team in the middle graph). Especially, trust among agents can be evaluated based on their current status and needs, performance, knowledge, experience, motivations, etc.

Furthermore, when AI agents work with humans, they should satisfy humans' needs and assist humans in fulfilling their missions efficiently. Therefore, it is necessary to build a unified utility paradigm describing the needs of AI agents and humans on common ground for evaluating their relationships, like trust, which is also the precondition of safety, reliability, stability, and sustainability in cooperation. In the following sections, we review existing literature from the perspectives of single-agent systems, multi-agent systems (MAS), trust among agents, and human-robot interaction (HRI) to support the proposed utility-oriented needs paradigm systems.

Fig. 4: The illustration of the functions of the brain. [Figure: frontal lobe: ability to concentrate, judgment, analysis, problem solving, planning, personality, etc.; parietal lobe: integrates information from several senses, including speech, pain, and touch sensation; occipital lobe: the visual processing center of the brain, mostly what is referred to as the "visual cortex"; temporal lobe: high-level visual processing (faces and scenes); cerebellum: performs some cognitive functions, like attention and language, and controls voluntary movement; brain stem: serves as the brain's warning system and sets alertness.]

III. SINGLE-AGENT SYSTEMS

For single-agent systems, cognitive robotics is the most typical research area for implementing Utility Theory. It studies the mechanisms, architectures, and constraints that allow lifelong and open-ended improvement of perceptual, reasoning, planning, social, knowledge acquisition, and decision-making skills in embodied machines [12]. Especially, building a value3 system to mimic the "brain" of an AI agent, mapping behavioral responses to sensed external phenomena, is the core component of cognitive robotics, which is also an emerging and specialized sub-field in neurorobotics, robotics, and artificial cognitive systems research.

3 In this section, we use the terms value and utility interchangeably.

Here, the value measures the effort an agent will expend to obtain a reward or avoid punishment. It is not hard-wired for an AI agent, or even a biological entity, and the specific value system achieved through experience reflects an agent's subjective evaluation of the sensory space. The value mechanisms usually have been defined as expected values, particularly in uncertain environments. Moreover, the innate value reflects an agent's subjective evaluation of the sensory space, but the acquired value is shaped through experience during its development.

[25] defines the artificial value system for the autonomous robot as a mechanism that can reflect the effect of its experience in future behaviors. In the life of an agent, the agent starts from inherent values, and its acquired values are modified through interaction with various environments and knowledge accumulation in learning. The acquired value is thus activity-dependent and allows the value system to become sensitive to stimuli that are not able to trigger a value-related response by themselves [26].

According to the development of cognitive robotics, the existing value systems can mainly be classified into three categories. Neuroanatomical Systems discuss explainable, biologically inspired value system designs from neuroanatomy and physiology perspectives [27]; Neural Networks Systems build more abstract models through mathematical approaches to mimic the agent's value systems; Motivational Systems consider models in which agents interact with environments to satisfy their innate values, and the typical mechanism is reinforcement learning (RL). We review the three categories as follows:

A. Neuroanatomical Systems

In studies of the theory of primate nervous systems in the brain, researchers usually used robotic systems as a test-bed [28], figuring out how neurons influence each other, how long neurons activate, and which brain regions are affected [29]. Fig. 4 illustrates the functions of the brain from anatomical and physiological perspectives. Following the goals of synthetic neural modeling, they aim to forge links between crucial anatomical and physiological properties of nervous systems and their overall function during autonomous behavior in an environment [12].

For example, [27] designed a value system in the NOMAD mobile robot, driven by the object – "taste" – to modulate changes in connections between visual and motor neurons, thus linking specific visual responses to appropriate motor outputs. [30] described a robotic architecture embedding a high-level model of the basal ganglia and related nuclei based on varying motivational and sensory inputs, such as fear and hunger, to generate coherent sequences of robot behavior.

Especially based on the principles of neuromodulation in the mammalian brain, [31] presents a strategy for controlling autonomous robots. They used a cognitive robot, CARL-1, to test the hypothesis that neuromodulatory activity can shape learning, drive attention, and select actions. The experiments demonstrated that the robot could learn to approach and move away from stimuli by predicting positive or negative value.

Recently, [32] showed the putative functions ascribed to dopamine (DA) emerging from the combination of a standard computational mechanism coupled with differential sensitivity to the presence of DA across the striatum, through testing on the simulated humanoid robot iCub interacting with a mechatronic board.
However, it is worth mentioning that the current trend tries to blur the line between designing value systems that follow the neuroanatomy and biology of the brain and artificial neural networks (ANN) that are biologically inspired but less anatomically accurate [12].

B. Artificial Neural Networks Systems

Artificial Neural Networks (ANN) are an information-processing model inspired by the functioning of the biological nervous system of the human brain. Similarly, Fig. 4 demonstrates the connection of different functional modules within the human brain working as a neural network to perform various reasoning activities. Like the human brain adjusting the synaptic relationships between and among neurons in learning, an ANN also needs to modify the parameters of its "nodes" (neurons) to adapt to a specific scenario through a learning process. Fig. 5 illustrates a two-layered feedforward neural network.

Fig. 5: The illustration of a two-layered feedforward neural network. [Figure: an input layer, a hidden layer, and an output layer; each neuron applies an input function Σ over weighted inputs (weights W_ni), an activation function g, and an output function a_j, passing its output along links (synapses).]

There are several typical ANN models, such as restricted Boltzmann machines (RBMs) [33], spiking neural networks [34], adaptive resonance theory (ART) networks [35], autoencoders (AE) [36], convolutional neural networks (CNNs) [37], growing neural gases (GNGs) [38], etc. In an ANN, nodes working as neurons are non-linear processing units receiving various signals and performing different processing steps, as depicted graphically in Fig. 5. Moreover, the AI agent's sensor states represent its properties at the input nodes, and the output nodes express corresponding actions and behaviors in the current status. The agent needs to learn a function that can describe its future expected reward (utility) based on a specific action or strategy in a given state. Because the reward is inversely proportional to the learning error, the agent is more motivated to learn the actions with higher learning error until it acquires the skill. Then, it will switch to learning something else based on its needs.
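The learning scheme just described can be sketched with a tiny two-layered feedforward network like the one in Fig. 5. The NumPy code below is our own illustration; the layer sizes, training data, and squared-error objective are hypothetical. It maps sensor inputs to a predicted expected reward (utility) and adjusts the node parameters by gradient descent.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two-layer feedforward network: sensor inputs -> hidden layer -> scalar utility.
    n_in, n_hidden = 3, 8
    W1, b1 = rng.normal(0, 0.5, (n_hidden, n_in)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0, 0.5, n_hidden), 0.0

    def forward(x):
        h = np.tanh(W1 @ x + b1)   # hidden activation function g
        return W2 @ h + b2, h      # output: predicted expected reward

    # Hypothetical data: the utility grows with the first sensor reading.
    X = rng.uniform(-1, 1, (200, n_in))
    y = 2.0 * X[:, 0]

    lr = 0.05
    for _ in range(300):           # plain stochastic gradient descent on squared error
        for x, t in zip(X, y):
            pred, h = forward(x)
            err = pred - t
            gh = err * W2 * (1 - h ** 2)   # backpropagate through tanh
            W2 -= lr * err * h; b2 -= lr * err
            W1 -= lr * np.outer(gh, x); b1 -= lr * gh

    print(forward(np.array([0.9, 0.0, 0.0]))[0])   # approximately 2.0 * 0.9 = 1.8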
Especially, [39] proposes a neural network architecture for integrating different value systems with reinforcement learning, presenting an empirical evaluation and comparison of four value systems for motivating exploration by a Lego Mindstorms NXT robot. [40] introduces a model of hierarchical curiosity loops for an autonomous active learning agent, selecting the optimal action that maximizes the agent's learning of sensory-motor correlations. From the information theory perspective, [41], [42] present a method linking information-theoretic quantities on the behavioral level (sensor values) to explicit dynamical rules on the internal level (synaptic weights) in a systematic way. They studied an intrinsic motivation system for behavioral self-exploration based on the maximization of predictive information using a range of real robots, such as the humanoid and Stumpy robots.
[43] recently proposed the value system of a developmental robot using the RBM as the data structure. They use simulation to demonstrate the mechanism allowing the agent to accumulate knowledge in an organized sequence with gradually increasing complexity while hardly learning from purely random areas. From another angle, RL has gradually become a popular and effective model for non-specific reward tasks [12], especially combined with deep learning methods.

C. Motivational Systems

To clarify agents' motivation, providing an expected utility for each perceptual state in the motivation evaluation is the precondition of their decision-making, especially in discovering goals, achieving them automatically, and determining the priority of drives [11]. Many machine learning algorithms share the properties of value systems and provide a way for AI agents to improve their performance in tasks while respecting their innate values [12]. The most typical and general model is reinforcement learning (RL).

The essence of RL is learning from interaction. When an RL agent interacts with the environment, it can observe the consequences of its actions and learn to change its behaviors based on the corresponding rewards received. Moreover, the theoretical foundation of RL is the paradigm of trial-and-error learning rooted in behaviorist psychology [23]. In this process, the best action sequences of the agent are determined by the rewards received through interaction, and the goal of the agent is to learn a policy π maximizing its expected return (cumulative utilities with discount). In a word, given a state, a policy returns an action, and the agent learns an optimal policy from the consequences of actions through trial and error to maximize the expected return in the environment. Figure 1 illustrates this perception-action-learning loop.

Furthermore, the flexibility of RL permits it to serve as an effective utility system to model agents' motivation through designing task-nonspecific reward signals derived from agents' experiences [39]. [44] introduced one of the earliest self-motivated value systems integrated with RL, demonstrating that the SAIL robot could learn to pay attention to salient visual stimuli while neglecting unimportant input. To understand the origins of reward and affective systems, [45] built artificial agents sharing the same intrinsic constraints as natural agents: self-preservation and self-reproduction. [46] uses table-based Q-learning as a baseline compared with other function approximation techniques in motivated reinforcement learning.

From the mechanism of intelligent adaptive curiosity perspective, [47] introduced an intrinsic motivation system pushing a robot toward situations that maximize its learning progress. [48] presented a curiosity-driven reinforcement learning approach that explores the iCub's state-action space through information gain maximization, learning a world model from experience, and controlling the actual iCub hardware in real time. To help the iCub robot be intrinsically motivated to acquire, store, and reuse skills, [49] introduced Continual Curiosity-driven Skill Acquisition (CCSA), a set of compact low-dimensional representations of the streams of high-dimensional visual information learned through incremental slow feature analysis.
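The learning-progress idea behind these curiosity mechanisms can be sketched in a few lines. The Python snippet below is a schematic paraphrase, not the algorithm of [47]-[49]; the window size and error sequence are hypothetical. The intrinsic reward is the recent decrease in prediction error, so the agent is drawn to situations where its world model is improving fastest.

    from collections import deque

    class LearningProgressReward:
        """Intrinsic reward = drop in mean prediction error between two
        consecutive sliding windows (positive while the model improves)."""

        def __init__(self, window: int = 10):
            self.window = window
            self.errors = deque(maxlen=2 * window)

        def reward(self, prediction_error: float) -> float:
            self.errors.append(prediction_error)
            if len(self.errors) < 2 * self.window:
                return 0.0                      # not enough history yet
            old = list(self.errors)[:self.window]
            new = list(self.errors)[self.window:]
            return sum(old) / self.window - sum(new) / self.window

    # Hypothetical usage: errors shrink as a world model learns, yielding reward.
    lp = LearningProgressReward(window=3)
    for err in [1.0, 0.8, 0.6, 0.45, 0.3, 0.2, 0.15]:
        print(round(lp.reward(err), 3))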
Especially, [50] proposed an RL model combined with the agent's basic needs satisfaction and intrinsic motivations. It was tested in a simulated survival domain, where a robot was engaged in survival tasks such as finding food or water while avoiding dangerous situations. Furthermore, [51], [52] recently proposed the first formalized utility paradigm, the Agent Needs Hierarchy, for the needs of an AI agent, describing its motivations from the system theory and psychological perspectives. The Agent Needs Hierarchy is similar to Maslow's human needs pyramid [15]. An agent's abstract needs for a given task are prioritized and distributed into multiple levels, each of them preconditioned on its lower levels. At each level, it expresses the needs as an expectation over the corresponding factors/features' distribution for the specific group [52]. Fig. 6 illustrates the Agent Needs Hierarchy as five different modules in the proposed Self-Adaptive Swarm Systems (SASS) [53].

Fig. 6: The illustration of the Agent Needs Hierarchy as five different modules. [Figure: A. introduce SASS with the Robot Needs Hierarchy, a negotiation-agreement mechanism, and atomic operations; B. discuss needs-driven task assignments for MAS/MRS cooperation in USAR missions; C. define the Relative Needs Entropy (RNE) as the trust assessment model in MAS/MRS cooperation, optimizing agents' grouping strategy and achieving better performance in uncertain and adversarial environments; D. build the Game-Theoretic Utility Tree (GUT), organizing more complex relationships for cooperative MAS decision-making in the Explore domain; E. create Bayesian Strategy Networks (BSN) through Deep Reinforcement Learning (DRL) to get the optimal strategy combination.]

Specifically, the lowest (first) level covers the safety features of the agent (e.g., features such as collision detection, fault detection, etc., that assure safety to the agent, humans, and other friendly agents in the environment). The safety needs can be calculated through each safety feature's value and the corresponding safety feature's probability based on the current state of the agent. After satisfying safety needs, the agent considers its basic needs, which include features such as energy levels and data communication levels that help maintain the basic operations of that agent. Only after fitting the safety and basic needs can an agent consider its capability needs, which are composed of features such as its health level, computing (e.g., storage, performance), physical functionalities (e.g., resources, manipulation), etc.

At the next higher level, the agent can identify its teaming needs, which account for the contributions of this agent to its team through several factors (e.g., heterogeneity, trust, actions) that the team needs, so that the agents can form a reliable and robust team to successfully perform a given mission.

Ultimately, at the highest level, the agent learns skills or features to improve its capabilities and performance in achieving a given task through RL. Later work by [54] presented a strategy-oriented Bayesian soft actor-critic (BSAC) model based on the agent needs, which integrates an agent strategy composition approach termed Bayesian Strategy Networks (BSN) with the soft actor-critic (SAC) method to achieve efficient deep reinforcement learning (DRL).
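Following this description, each needs level can be scored as an expectation over its features' values and probabilities, with higher levels gated on the lower ones. The sketch below is our own schematic reading of the Agent Needs Hierarchy [51], [52]; the feature names, numbers, threshold, and gating rule are hypothetical illustrations rather than the authors' formulas.

    # Score one needs level as an expected utility over its (value, probability)
    # features, and find the lowest unmet level that dominates the motivation.
    LEVELS = ["safety", "basic", "capability", "teaming", "learning"]

    def level_score(features):
        """features: list of (value, probability) pairs for one needs level."""
        return sum(v * p for v, p in features)

    def active_level(needs, threshold=0.5):
        """Return the first level whose expected satisfaction is below threshold."""
        for level in LEVELS:
            if level_score(needs[level]) < threshold:
                return level          # this unmet need drives current behavior
        return LEVELS[-1]             # all satisfied: keep learning new skills

    # Hypothetical robot: safety and basic needs met, capability needs not yet.
    needs = {
        "safety":     [(1.0, 0.9)],              # e.g., collision detection healthy
        "basic":      [(0.8, 0.9), (0.9, 0.8)],  # energy level, data communication
        "capability": [(0.6, 0.4)],              # health/computing still weak
        "teaming":    [(0.7, 0.6)],
        "learning":   [(0.5, 0.5)],
    }
    print(active_level(needs))   # -> "capability"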
Generally, the set of agent needs reflects its innate values and motivations. It is described as the union of needs at all the levels in the needs hierarchy. Each category's needs level combines various similar needs (expected utilities) presented as a set, consisting of individual hierarchical and compound needs matrices [55]. It drives the agent to develop various strategies and diverse skills to satisfy its intrinsic needs and values, much like a human does.

IV. MULTI-AGENT SYSTEMS

A multi-agent system (MAS) consists of several agents interacting with each other, which may act cooperatively, competitively, or exhibit a mixture of these behaviors [56]. In the most general case, agents will act on behalf of users with different goals and motivations to cooperate, coordinate, and negotiate with each other, achieving successful interaction, such as in human society [57]. Most MAS implementations aim to optimize the system's policies with respect to individual needs and intrinsic values, even though many real-world problems are inherently multi-objective [58]. Therefore, many conflicts and complex trade-offs in the MAS need to be managed, and compromises among agents should be based on a utility mapping of the innate values of a compromise solution – how to measure and what to optimize [59].

However, in the MAS setting, situations become much more complex when we consider that individual utility reflects each agent's own needs and preferences. For example, although we assume each group member receives the same team reward in fully cooperative MAS, the benefits received by an individual agent usually differ significantly according to its contributions and innate values in real-world scenarios or general multi-agent settings. Especially in distributed intelligent systems, such as multi-robot systems (MRS), self-driving cars, and delivery drones, there is no single decision-maker with full information and authority.
Instead, the system performance greatly depends on the decisions made by interacting entities with local information and limited communication capabilities [60]. This section reviews works on decision-makers' compromises in MAS from the game-theoretic utility cooperative decision-making perspective based on their needs and innate values. Then, we discuss the design and influence of individual utility functions to induce desirable system behaviors and optimization criteria.

A. Game-theoretic Utility Systems

From a game-theoretic control perspective, most of the research in game theory has been focused on single-stage games with fixed, known agent utilities [61], such as distributed control in communication [62] and task allocation [63]. Especially, recent MAS research domains focus on solving path planning problems for avoiding static or dynamical obstacles [64] and formation control [65] from the unintentional adversary perspective. For intentional adversaries, the Pursuit Domain [66], [67] primarily deals with how to guide pursuers to catch evaders efficiently [68]. Similarly, the robot soccer domain [69] deals with how one group of robots wins over another group of robots on a typical game element.

Furthermore, optimal control of MAS via game theory assumes a system-level objective is given, and the utility functions for individual agents are designed to convert a multi-agent system into a potential game [70]. Although game-theoretic approaches have received intensive attention in the control community, they are still a promising new approach to distributed control of MAS. Especially, dynamic non-cooperative game theory using distributed optimization in MAS has been studied in [71]. In [72], the authors investigated MAS consensus, and the work in [73] provided an algorithm for large-scale MAS optimization. However, designing utility functions, learning from global goals for potential game-based optimization of control systems, and converting a local optimization problem of an original optimization problem into a potential networked game are still open challenges [74].

It is worth mentioning that [75] and [76] introduced an architectural overview from the perspectives of decomposing the problem into utility function design and building the learning component. Specifically, the utility function design concerns the satisfaction of the results of emergent behaviors from the standpoint of the system designer, and the learning component design focuses on developing distributed algorithms coordinating agents toward one such equilibrium configuration. Recently, [77], [78] solved the utility function design of separable resource allocation problems through a systematic methodology optimizing the price of anarchy. [79] discussed the learning in games of rational agents' convergence to an equilibrium allocation by revising their choices over time in a dynamic process.

More recently, [80], [81] proposed a new model called the Game-Theoretic Utility Tree (GUT), combining the core principles of game theory and utility theory to achieve cooperative decision-making for MAS in adversarial environments. It is combined with a new payoff measure based on agent needs for real-time strategy games. By calculating the Game-Theoretic Utility Computation Unit distributed at each level, the individual can decompose high-level strategies into executable low-level actions. They demonstrated the GUT in the proposed Explore Game domain and verified its effectiveness in the real robot testbed, Robotarium [82], indicating that GUT can organize more complex relationships among MAS cooperation and help the group achieve challenging tasks with lower costs and higher winning rates. Fig. 7 illustrates a MAS (UAV) working in adversarial environments with several typical scenarios.

Fig. 7: Illustration of a MAS (UAV) working in adversarial environments with several typical scenarios.

B. Cooperative Decision-making and Learning

Cooperative decision-making among agents is essential to address the threats posed by intentional physical adversaries or to determine tradeoffs in tactical networks [83]. Although cooperative MAS decision-making used to be studied in many separate communities, such as evolutionary computation, complex systems, game theory, graph theory, and control theory [84], these problems are either episodic or sequential [85]. Agents' actions or behaviors are usually generated from a sequence of actions or policies, and the decision-making algorithms are evaluated based on utility-oriented policy optimality, search completeness, time complexity, and space complexity.

Furthermore, existing cooperative decision-making models use the Markov decision process (MDP) and its variants [86], game-theoretic methods, and swarm intelligence [84]. They mostly involve using Reinforcement Learning (RL) and Recurrent Neural Networks (RNN) to find optimal or suboptimal action sequences based on current and previous states for achieving independent or transferred learning of decision-making policies [87], [88].
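Since MDP-based models recur throughout this section, the compact value-iteration sketch below shows how a utility-oriented optimal policy is computed when the model is known. The algorithm is standard; the three-state transition table is hypothetical.

    # Value iteration: V(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s')).
    # transitions[s][a] = list of (prob, next_state, reward); toy model.
    transitions = {
        0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
        1: {"stay": [(1.0, 1, 0.0)], "go": [(0.9, 2, 1.0), (0.1, 1, 0.0)]},
        2: {"stay": [(1.0, 2, 0.0)]},   # absorbing goal state
    }
    gamma = 0.9
    V = {s: 0.0 for s in transitions}

    for _ in range(100):   # iterate the Bellman optimality backup to a fixed point
        V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                    for outs in acts.values())
             for s, acts in transitions.items()}

    policy = {s: max(acts, key=lambda a: sum(p * (r + gamma * V[s2])
                                             for p, s2, r in acts[a]))
              for s, acts in transitions.items()}
    print(V, policy)   # "go" is optimal in states 0 and 1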
Considering decision-making in the context of cooperative MAS, the learning process can be centralized or decentralized. [89] divides it into two categories: team learning and concurrent learning.

In team learning, only one learner is involved in the learning process and represents a set of behaviors for the group of agents, which is a simple approach for cooperative MAS learning since standard single-agent machine learning techniques can handle it. Because multiple learning processes can improve the team's performance, concurrent learning is the most common approach in cooperative MAS. Since concurrent learning projects the large joint team search space onto separate, smaller subset search spaces, [90] argue that it is suitable for domains that can be decomposed into independent problems that benefit from such decomposition. Especially for individual behaviors that are relatively disjoint, it can dramatically reduce the search space and computational complexity. Furthermore, breaking the learning process into smaller chunks provides more flexibility for the individual learning process in using computational resources.

However, [89] argue that the central challenge for concurrent learning is that each learner adapts its behaviors in the context of other co-adapting learners over which it has no control, and there are three main thrusts in the area of concurrent learning: credit assignment, the dynamics of learning, and teammate modeling.

The credit assignment problem focuses on appropriately apportioning the group rewards (utilities) to the individual learners. The most typical solution is to divide the team rewards among the learners equally, or so that the reward trends of individual learners are the same. The approach of divvying rewards up among the learners, received through joint actions or strategies, is usually termed global reward. [91] argue that global reward does not scale well to increasingly difficult problems because the learners do not have sufficient feedback tailored to their own specific actions.

In contrast, if we do not divide the team rewards equally, each agent's performance is evaluated based on its individual behaviors, which means agents do not have the motivation to cooperate and tend toward greedy behaviors; these methods are the so-called local reward. Through studying different credit assignment policies, [92], [93] argue that local reward leads to faster learning rates, but not necessarily to better results than global reward. Because local rewards increase the homogeneity of the learning group, they suggest that credit assignment selection relies on the desired degree of specialization. Furthermore, [94] argues that individual learning processes can be improved by combining separate local reinforcement with types of social reinforcement.
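The global-versus-local choice can be made concrete with a small sketch. This is purely illustrative Python under our own assumptions; the per-agent contribution numbers are hypothetical.

    # Credit assignment of a team reward among n learners.
    contributions = [4.0, 1.0, 0.0]     # hypothetical per-agent contributions
    R = sum(contributions)              # team reward earned by the joint action

    # Global reward: every learner receives the same signal. It preserves the
    # incentive to cooperate but gives weak feedback on individual actions.
    global_credit = [R / len(contributions)] * len(contributions)

    # Local reward: each learner is scored on its own behavior. It speeds up
    # individual learning but can encourage greedy, non-cooperative behavior.
    local_credit = list(contributions)

    print(global_credit)   # [1.67, 1.67, 1.67] (rounded)
    print(local_credit)    # [4.0, 1.0, 0.0]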
The dynamics of learning consider the impact of co-adaptation on the learning processes. Assuming agents work in dynamically changing and unstructured environments, they need to constantly track the shifting optimal behavior or strategy adapting to various situations, especially considering that the agents may change other group members' learning environments. Evolutionary game theory provides a common framework for cooperative MAS learning. [95], [96] studied the properties of cooperative coevolution, and [97] visualized basins of attraction to Nash equilibria for cooperative coevolution.

Moreover, much research in concurrent learning involves game theory to investigate MAS problems [98], [8]. Especially for Nash equilibria, it provides a joint strategy for each group member such that no individual has the motivation to shift its strategy or action in the current situation in terms of a better reward. In the fully cooperative scenario with a global reward, the reward affects the benefits of each agent, and we only need to consider the globally optimal Nash equilibrium. However, the relationships among agents are not as clear in more general situations, which combine cooperative and non-cooperative scenarios. In other words, individual rewards do not directly relate to the team reward, so the general-sum game [8] is the cooperative learning paradigm.
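To ground the equilibrium notion used here, the sketch below enumerates the pure-strategy Nash equilibria of a small general-sum bimatrix game by checking mutual best responses. This is a textbook construction, not code from the cited works, and the payoff numbers are hypothetical.

    # Pure-strategy Nash equilibria of a two-player general-sum game: a joint
    # action is an equilibrium iff neither agent can gain by deviating alone.
    A = [[3, 0],   # row player's payoffs (a cooperate/defect-style dilemma)
         [5, 1]]
    B = [[3, 5],   # column player's payoffs
         [0, 1]]

    def pure_nash(A, B):
        eqs = []
        for i in range(2):
            for j in range(2):
                row_best = all(A[i][j] >= A[k][j] for k in range(2))
                col_best = all(B[i][j] >= B[i][k] for k in range(2))
                if row_best and col_best:
                    eqs.append((i, j))
        return eqs

    print(pure_nash(A, B))   # [(1, 1)]: individually rational, though (0, 0) pays both more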
C. From Individual Utility to Social Welfare

As we discussed above, the reward, the same as the utility, reflects the agents' diverse intrinsic motivations and needs. Individual and team rewards also directly describe the relationships between each group member and the team. Take MAS football and the robot-soccer domain as an example. When considering multi-agent system (MAS) cooperation as a team, an individual agent must first master essential skills to satisfy low-level needs (safety, basic, and capability needs) [51]. Then, it will develop effective strategies fitting middle-level needs (like teaming and collaboration) to guarantee the system's utilities [52]. Through learning from interaction, MAS can optimize group behaviors and present complex strategies adapting to various scenarios, achieving the highest-level needs and fulfilling evolution. By cooperating to achieve a specific task, gaining expected utility (reward), or acting against adversaries to decrease the threat, intelligent agents can benefit the entire group's development or utilities and guarantee individual needs and interests. It is worth mentioning that [99] developed end-to-end learning implemented in the MuJoCo multi-agent soccer environment [100], which combines low-level imitation learning with mid-level and high-level reinforcement learning, using transferable representations of behavior for decision-making at different levels of abstraction. Figure 8 represents the training process.

Fig. 8: End-to-end learning of coordinated 2 vs 2 humanoid football in the MuJoCo multi-agent soccer environment. [Figure: (1) imitation of mocap data into policy priors (single player; low-level skills: running, getting up); (2) reinforcement learning in drill tasks (mid-level skills: dribbling, shooting); (3) distillation of drill experts; (4) reinforcement learning in 2 vs 2 matches (multi-player; high-level skills: coordination, passing, blocking).]

However, suppose we extend the scalar to the entire system, like a community or society, and consider optimizing the overall social welfare across all agents. In that case, we have to introduce the social choice utility. In this setting, agents have different agendas driven by their needs, leading to complex dynamic behaviors when interacting, which makes the utility hard to predict, let alone optimize. To solve this problem, we usually assume that agents are self-interested and optimize their utilities from the perspective of socially favorable desires and needs [58]. By defining a social choice function representing social welfare that reflects each agent's needs and desired outcome, we can design a system of payments making the joint policy converge to the desired outcome [56].

For example, the government controls the parameters of tenders to balance the different requirements of projects for social welfare. Also, [101] discussed computing a convex coverage set for a cooperative multi-objective multi-agent MDP balancing traffic delays and costs a posteriori in traffic network maintenance planning.

From another perspective, considering individual rewards and utilities from the standpoint of social choice, they are weighted and summed up, constituting a social welfare function. In other words, the social welfare function relies on individual values or returned rewards and on its utility computing model or mechanism. For example, [102] discussed which properties a multi-criteria function must satisfy so that it can be used to determine the winner of a multi-attribute auction. Here, social welfare may depend on attributes of the winning bids, as well as on a fair outcome in terms of payments to the individual agents, which, together with the costs the agents need to bear to execute their bids, if chosen, typically determine the individual utilities [58].

Generally speaking, the design of the social welfare function aims to find a mechanism for making agents' utility functions transparently reflect their innate values and needs, which can optimize the joint policy concerning sustainable social welfare and guarantee individual development.
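A social choice utility can be sketched as a weighted social welfare function over individual utilities. The snippet below is a minimal illustration under our own assumptions; the weights and utilities are hypothetical, and a real mechanism would also have to handle incentives and payments as discussed above.

    # Utilitarian social welfare: a weighted sum of individual utilities, where
    # the weights encode how the designer trades agents' needs against each other.
    def social_welfare(utilities, weights):
        return sum(w * u for w, u in zip(weights, utilities))

    # Compare two candidate joint policies by the welfare they induce.
    policies = {
        "policy_a": [5.0, 1.0, 1.0],   # great for agent 0, poor for the others
        "policy_b": [3.0, 3.0, 2.5],   # more balanced outcome
    }
    weights = [1.0, 1.0, 1.0]          # egalitarian designer

    best = max(policies, key=lambda p: social_welfare(policies[p], weights))
    print(best)   # -> "policy_b" (welfare 8.5 vs. 7.0)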
V. TRUST AMONG AGENTS

Trust describes the interdependent relationship between agents [103], which can help us better understand the dynamics of cooperation and competition, the resolution of conflicts, and the facilitation of economic exchange [104]. From the economic angle, [105] regarded trust as an expectation or a subjective probability and defined it using expected utility theory combined with concepts such as the betrayal premium.

In Automation, the authors in [106] describe trust in human-automation relationships as the attitude that an agent will help achieve an individual's goals in a situation characterized by uncertainty and vulnerability. The trustor here is a human, and the trustee is usually the automation system or an intelligent agent like a robot. Those systems' primary purpose is to assess and calibrate the trustor's trust beliefs reflecting the automation system's ability to achieve a specific task.

In Computing and Networking, the concept of trust involves many research fields, such as artificial intelligence (AI), human-machine interaction, and communication networks [107], [108], [109]. They regard an agent's trust as a subjective belief which represents the reliability of the other agents' behaviors in a particular situation with potential risks. Their decisions are based on learning from experience to maximize their interest (or utility) and minimize risk [110]. Especially in the social IoT domain [111], [112], trust is measured based on the information accuracy and agents' intentions (friendly or malicious) according to the current environment to avoid attacks on the system. [113] proposes a decision-making model named MATE, which applies the expected-utility decision rule to decide whether or not it is in pervasive devices' node interests to load a particular component.

In MAS, trust is defined in mainly two major ways. One way is as reliable relations between human operators and the MAS/MRS. In this context, trust reflects human expectations and team performance assessments [114], [115]. The primary research includes using trust to measure the satisfaction degree of robot team performance, jointly considering in-process behaviors and final mission accomplishments. In this process, the visible and invisible behaviors regarding humans supervising robots and robots' behaviors are investigated to decide the most influential factors. Especially, investigating human trust in robots will help the robots to make decisions or react to emergencies, such as obstacles, when human supervision is unavailable [116], [117].

The second type of trust is defined as reliable relations among agents, reflecting the trusting agents' mission satisfaction, organizational group member behaviors and consensus, and task performance at the individual level. Moreover, trust has been associated with system-level analysis, such as reward and function at the organized group structure in a specific situation, the conflict-solving of heterogeneous and diverse needs, and the system's evolution through learning from interaction and adaptation.
On the other hand, the act of trusting by an agent can be conceptualized as consisting of two main steps: 1) trust evaluation, in which the agent assesses the trustworthiness of potential interaction partners; 2) trust-aware decision-making, in which the agent selects interaction partners based on their trust values [118]. Also, assessing the performance of agent trust models is of concern to agent trust research [119].

It is worth noticing that reputation and trust mechanisms have gradually become critical elements in MAS design since agents depend on them to evaluate the behavior of potential partners in open environments [120]. As an essential component of reputation, trust generates the reputation from agents' previous behaviors and experiences, which becomes an implicit social control artifact [121] when agents select partners or exclude them due to social rejection (bad reputation).

Furthermore, considering the interactions between human agents and artificial intelligence agents, as in human-robot interaction (HRI), building stable and reliable relationships is of utmost importance in MRS cooperation, especially in adversarial environments and rescue missions [52]. In such multi-agent missions, appropriate trust in robotic collaborators is one of the leading factors influencing HRI performance [122]. In HRI, factors affecting trust are classified into three categories [123]: 1) human-related factors (i.e., including ability-based factors and characteristics); 2) robot-related factors (i.e., including performance-based factors and attribute-based factors); 3) environmental factors (i.e., including team collaboration, tasking, etc.). Although there is no unified definition of trust in the literature, researchers take a utilitarian approach to defining trust for HRI, adopting a trust definition that gives robots practical benefits in developing appropriate behaviors through planning and control [124], [125], [126].

Moreover, with the development of the explanation of machine learning, such as explainable AI, although explanation and trust have been intertwined for historical reasons, the utility of machine learning explanations will become more general and consequential [127]. Especially, the utility of an explanation must be tied to an appropriate trust model.

Although modeling trust has been studied from various perspectives and involves many different factors, it is still challenging to develop a conclusive trust model that incorporates all of these factors. Therefore, future research into trust modeling needs to be more focused on developing general trust models based on measures other than the countless factors affecting trust [122]. Specifically, few works in the literature have considered trust in socio-intelligent systems from an AI agent perspective, modeling trust between agents having similar needs and intrinsic values.

Recently, [128] proposed a novel trust model termed relative needs entropy (RNE) by combining utility theory and the concept of relative entropy based on the agent's needs hierarchy [51], [52] and innate values. Similar to information entropy, it defines the entropy of needs as the difference, or distance, of needs distributions between agents in a specific scenario, for an individual or a group.

From a statistical perspective, the RNE can be regarded as calculating the similarity of high-dimensional samples from the agents' needs vectors. A lower RNE value means that the trust level between agents or groups is higher because their needs are well-aligned and there is a low difference (distance) in their needs distributions. Similarly, a higher RNE value means that the needs distributions are diverse and there is a low trust level between the agents or groups because of the misalignment in their motivations.
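The RNE idea can be sketched with a symmetrized relative-entropy distance between two agents' needs distributions. This is a schematic reading of [128], not the authors' exact formula, and the distributions below are hypothetical: a small distance means well-aligned needs and hence a higher trust estimate.

    import math

    def kl(p, q, eps=1e-12):
        """Relative entropy D(p || q) between discrete needs distributions."""
        return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

    def relative_needs_distance(p, q):
        """Symmetrized relative entropy; lower distance ~ better-aligned needs."""
        return 0.5 * (kl(p, q) + kl(q, p))

    # Hypothetical needs distributions over (safety, basic, capability, teaming).
    agent_a = [0.40, 0.30, 0.20, 0.10]
    agent_b = [0.38, 0.32, 0.20, 0.10]   # closely aligned with agent_a
    agent_c = [0.10, 0.10, 0.20, 0.60]   # very different motivation

    print(relative_needs_distance(agent_a, agent_b))  # small -> high trust estimate
    print(relative_needs_distance(agent_a, agent_c))  # large -> low trust estimate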
As discussed above, trust is the basis for decision-making in many contexts and the motivation for maintaining long-term relationships based on cooperation and collaboration [110]. Although trust is a subjective belief, it is a rational decision based on learning from previous experience to maximize interest (or utility) and minimize risk by choosing the best compromise between risk and possible utility from cooperation. Trust assessment involves various factors, particularly utility and risk analysis under dynamic situations, which rely on context and on balancing key tradeoffs between conflicting goals to maximize decision effectiveness and task performance.

VI. HUMAN-ROBOT INTERACTION

As higher-level intelligent creatures, humans present more complex and diversified needs, such as personal security, health, friendship, love, respect, recognition, and so forth. In this regard, humans are highly flexible resources due to the high degree of motion of the physical structure and the high level of intelligence and recognition [129]. When we consider humans and robots working as a team, organizing their needs and getting a common ground is a precondition for human-robot collaboration in complex missions and dynamically changing environments. Considering building more stable and reliable relationships between humans and robots, [130] depicts an updated model in HRI termed Human-Robot Team Trust (HRTT). It considers the human as a resource with specific abilities that can gain or lose trust with training, and it treats the robots as resources with known performance attributes included as needs in the design phase of the collaborative process.

Due to the constantly increasing need for robots to interact with, collaborate with, and assist humans, Human-Robot Interaction (HRI) poses new challenges, such as safety, autonomy, and acceptance issues [131]. From a robot needs perspective [51], a robot first needs to guarantee human security and health, such as by avoiding collisions with humans and protecting them from risks. Moreover, at the higher-level teaming needs, robots should consider human team members' specialties and capabilities to form a corresponding heterogeneous human-robot team adapted to specific missions automatically [52]. Furthermore, efficient and reliable assistance is essential to the entire mission process in HRI. More importantly, designing an interruption mechanism can help humans interrupt robots' current actions and re-organize them to fulfill specific emergency tasks or execute some crucial operations manually.
current actions and re-organize them to fulfill specific emer- evolution of their expectations of the robot’s capabilities.
gency tasks or execute some crucial operations manually. In the HRI, adaptability is a crucial property provided
Especially humans also expect robots to provide safety and by the SMM, which intrinsically relates to performance and
a stable working environment in missions. [132] supports is objectively measured by an adaptable controller as an
[132] supports the utility of people's attitudes toward and anxiety about robots in explaining and predicting behavior in human-robot interaction, and suggests further investigation of which aspects of robots evoke which types of emotions and how this influences the overall evaluation of robots. From the psychophysiology perspective, [133] combines physiological measurements with subjective research methods, such as behavioral and self-report measures, to achieve a more complete and reliable knowledge of a person's experience; self-reports alone can be biased by mood and social desirability when people interact with AI agents such as robots. However, psychophysiological techniques face their own challenges in data acquisition and interpretation: measurement quality is influenced both by confounding environmental factors, such as noise or lighting, and by the individual's internal psychological state during the evaluation, which might undermine the reliability and the correct interpretation of the gathered data [134].
From another angle, recognizing the potential utility of human affect (e.g., via facial expressions, voice, gestures, etc.) and responding to humans is essential in a human-robot team's collaborative tasks, and it makes robots more believable to human observers [135]. [136] proposed an architecture that includes natural language processing and higher-level deliberative functions and implements "joint intention theory". [137] demonstrated that expressing affect and responding to human affect with affect expressions can significantly improve team performance in a joint human-robot task.
Multi-agent HRI systems require more complex control strategies: they coordinate several agents that may be dissimilar to one another (in roles, communication modes, etc.), can involve interactions that connect more than two agents at once, and need special attention to resolve any conflicts that may arise from these interactions [138]. Specifically, the human may give high-level commands to the robot group or control the robots individually using a screen-based interface, and the robots need to adapt to the operator's actions or optimize some utility function over computational characteristics such as task requirements and teaming needs (see Fig. 2).
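As an illustrative sketch of this adaptation, the robot below arbitrates between its own task utility and alignment with the operator's command through a weighted sum; the weights, actions, and utility values are assumptions for illustration only, not a scheme prescribed by the surveyed work.

    def choose_action(actions, task_utility, operator_alignment,
                      w_task=0.6, w_operator=0.4):
        """Return the action maximizing a weighted sum of task utility
        and alignment with the operator's high-level command."""
        return max(actions, key=lambda a: w_task * task_utility(a)
                                          + w_operator * operator_alignment(a))

    actions = ["explore", "follow_operator", "hold_position"]
    task_u = {"explore": 0.8, "follow_operator": 0.5, "hold_position": 0.2}
    op_align = {"explore": 0.1, "follow_operator": 0.9, "hold_position": 0.6}
    print(choose_action(actions, task_u.get, op_align.get))
    # -> "follow_operator": operator alignment outweighs raw task utility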
Moreover, building shared mental models (SMM) enables team members to draw on their own well-structured common knowledge to select actions that are consistent and coordinated with those of their teammates, which is strongly correlated with team performance [139]. [140] designs human-robot cross-training, inspired by SMM, to compute a robot policy aligned with human preferences (needs) by iteratively switching roles between the human and the robot to learn a shared plan for a collaborative task. Furthermore, considering that humans infer the robot's capabilities and only partially adapt to it, [141] presents a game-theoretic model of human partial adaptation to the robot, where the human responds to the robot's actions by maximizing a reward function that changes stochastically over time, capturing the evolution of their expectations of the robot's capabilities.
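A toy rendering of this idea is sketched below: the human best-responds to the robot's action under a reward whose capability-belief term drifts stochastically over time; the payoff table and the drift model are illustrative assumptions, not the actual formulation of [141].

    import random

    HUMAN_ACTIONS = ["lead", "follow"]
    ROBOT_ACTIONS = ["assist", "act_alone"]

    def human_reward(h, r, capability_belief):
        base = {("lead", "assist"): 2.0, ("lead", "act_alone"): 0.5,
                ("follow", "assist"): 1.0, ("follow", "act_alone"): 1.5}[(h, r)]
        # A stronger belief in the robot's capability makes following
        # it relatively more attractive to the human.
        return base + (capability_belief if h == "follow" else 0.0)

    belief = 0.0  # human's current belief in the robot's capability
    for step in range(5):
        robot_action = random.choice(ROBOT_ACTIONS)
        human_action = max(HUMAN_ACTIONS,
                           key=lambda h: human_reward(h, robot_action, belief))
        # The belief evolves stochastically as the human observes the robot.
        drift = 0.2 if robot_action == "assist" else -0.1
        belief = max(0.0, belief + random.gauss(drift, 0.1))
        print(step, robot_action, human_action, round(belief, 2))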
In HRI, adaptability is a crucial property supported by the SMM; it intrinsically relates to performance and can be measured objectively by treating an adaptable controller as an independent variable compared against other controllers [139]. Especially in complex, uncertain, and dynamically changing environments, robots need to adapt efficiently to humans' behaviors and strategies, which are driven by their frequently changing needs and motivations. Additionally, [142] showed that a robot adapting to differences in humans' preferences and needs is positively correlated with trust between them. Moreover, an SMM improves trust and reliability between humans and robots by alleviating uncertainty about roles, responsibilities, and capabilities in the interaction. By integrating research on trust in automation and describing the dynamics of trust, the role of context, and the influence of display characteristics, [106] argues that trust involves humans' experience and knowledge of the needs, motivations, functional processing, and performance of robots [123], such as whether the robots' capabilities meet the expectations of humans and tasks [143], as well as minimizing system faults and improving system predictability and transparency [144].
Generally speaking, by formalizing the motivations and needs of humans and robots and describing their complex, diverse, and dynamic relationships on a common ground such as trust, utility theory provides a unified model to evaluate the performance of interactions between humans and AI agents effectively, and it makes a fundamental contribution to safety in the interaction.

VII. INSIGHTS AND CHALLENGES

Currently, with the tremendous growth of AI technology, robotics, the IoT, and high-speed wireless sensor networks (such as 5G), an artificial ecosystem is gradually forming, termed artificial social systems, that involves AI agents ranging from software entities to hardware devices [53]. How to integrate artificial social systems into human society and have them coexist harmoniously is a critical issue for the sustainable development of human beings, in domains such as assistive and healthcare robotics [145], social path planning and navigation [146], search and rescue [147], [52], and autonomous driving [148]. In particular, the future factory is likely to utilize robots for a much broader range of tasks in a much more collaborative manner with humans, which intrinsically requires operating in proximity to humans and raises safety and efficiency issues [139].
At this point, building a robust, stable, and reliable trust network between humans and AI agents to evaluate agents' performance and status on a common ground is the precondition for efficient and safe interaction in their collaboration. Here, utility is the key concept representing an individual agent's innate values, needs, and motivations over states of the world, and it is described as expected rewards. It measures sensitivity to the impact of actions on trust and long-term cooperation and is efficient enough to allow AI agents such as robots to make real-time decisions [149]. For example, self-driving cars might be the first widespread case of trustworthy robots designed to earn trust by demonstrating how well they follow social norms.
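The view of utility as expected reward can be made concrete with a short sketch; the candidate actions, outcome probabilities, and reward values below are illustrative assumptions rather than a model taken from the surveyed literature.

    # Utility as expected reward over uncertain outcomes: the agent
    # picks the action with the highest expected utility.
    outcomes = {
        "yield_to_pedestrian": [(0.99, 1.0), (0.01, -0.5)],   # (prob, reward)
        "proceed":             [(0.90, 1.2), (0.10, -10.0)],  # rare severe penalty
    }

    def expected_utility(action):
        return sum(p * r for p, r in outcomes[action])

    best = max(outcomes, key=expected_utility)
    print(best, {a: round(expected_utility(a), 2) for a in outcomes})
    # The safe, norm-following action wins despite a lower best-case reward.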
From the learning perspective, an individual learning model can be regarded as constructing models of the other agents: such a model takes some portion of the observed interaction history as input and returns a prediction of some property of interest regarding the modeled agent [150]. Taking human society as an example, human beings present more complex and diversified needs, such as personal security, health, friendship, love, respect, recognition, and self-actualization. These diverse needs can be expressed as corresponding reward or utility mechanisms that drive people to adopt various strategies and skills to achieve their requirements. In particular, the multi-layered and diverse needs of people are the foundation for describing their complex and dynamic relationships, such as trust.
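A minimal sketch of such an agent model, in the sense of [150], is shown below: it consumes a window of the observed interaction history and returns a prediction of the modeled agent's next action via a simple frequency estimate; the windowing and the prediction rule are illustrative assumptions.

    from collections import Counter

    class FrequencyAgentModel:
        """Predict a property of the modeled agent (here, its next
        action) from a window of observed interaction history."""

        def __init__(self, window=50):
            self.window = window
            self.history = []

        def observe(self, action):
            self.history.append(action)

        def predict_next_action(self):
            recent = self.history[-self.window:]
            if not recent:
                return None
            return Counter(recent).most_common(1)[0][0]

    model = FrequencyAgentModel()
    for a in ["cooperate", "cooperate", "defect", "cooperate"]:
        model.observe(a)
    print(model.predict_next_action())  # -> "cooperate"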
Similarly, how to design a general and standard utility mechanism that integrates the needs of humans and AI agents, and how to learn to build reliable and stable trust relationships between agents and humans efficiently, are essential open problems [128]. In particular, a suitable knowledge graph combined with efficient DRL can help agents learn to adopt appropriate behaviors or strategies that benefit the group's utility, adapt to human members' needs, and guarantee their development in various situations [54]. Another interesting direction is behavior manipulation, also known as policy elicitation, which refers to problems in human-robot collaboration wherein agents must guide humans toward an optimal policy to fulfill a task through implicit or explicit communication [151], [152]. For example, it has been used to achieve effective personalized learning in AI agents' teaching and coaching [153].
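As an illustrative sketch of policy elicitation, the robot below chooses which state to give advice about by estimating where the human's current policy loses the most value relative to the optimal action; the Q-values, the human policy estimate, and the advice model are assumptions for illustration, not the formulation of [151] or [152].

    # Q(state, action) known to the robot, and its estimate of the
    # human's current (possibly suboptimal) policy.
    q_values = {
        "s1": {"a": 1.0, "b": 0.2},
        "s2": {"a": 0.4, "b": 0.9},
    }
    human_policy = {"s1": "b", "s2": "b"}

    def advice_gain(state):
        """Expected value gain if the human adopts the optimal action."""
        best = max(q_values[state].values())
        current = q_values[state][human_policy[state]]
        return best - current

    state_to_coach = max(q_values, key=advice_gain)
    print(state_to_coach)  # -> "s1": largest gap between optimal and current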
in AI agents’ teaching and coaching [153]. [7] M. S. Prewett, R. C. Johnson, K. N. Saboe, L. R. Elliott, and M. D.
For future MAS research, AI agents learn and adapt to Coovert, “Managing workload in human–robot interaction: A review
human needs and maintain trust and rapport among agents, of empirical studies,” Computers in Human Behavior, vol. 26, no. 5,
pp. 840–856, 2010.
which are critical for task efficiency and safety improvement [8] Y. Shoham and K. Leyton-Brown, Multiagent systems: Algorithmic,
[147]. And we list potential directions as follow: game-theoretic, and logical foundations. Cambridge University
• Adopting suitable formation to perceive and survey Press, 2008.
[9] P. C. Fishburn, “Utility theory for decision making,” Research
environments predicting threats warning human team analysis corp McLean VA, Tech. Rep., 1970.
members. [10] P. W. Glimcher, M. C. Dorris, and H. M. Bayer, “Physiological utility
• Reasonable path planning adaptation in various scenar- theory and the neuroeconomics of choice,” Games and economic
behavior, vol. 52, no. 2, pp. 213–256, 2005.
ios avoids collision guaranteeing human security and [11] A. Romero, A. Prieto, F. Bellas, and R. J. Duro, “Simplifying the
decreasing human working environment interference. creation and management of utility models in continuous domains for
• Combining the specific capabilities and needs of agents cognitive robotics,” Neurocomputing, vol. 353, pp. 106–118, 2019.
[12] K. Merrick, “Value systems for developmental cognitive robotics: A
and humans, calculating sensible strategies to efficiently survey,” Cognitive Systems Research, vol. 41, pp. 38–55, 2017.
organize the entire group collaboration and fulfill the [13] L. Freedman, Strategy: A history. Oxford University Press, 2015.
corresponding mission. [14] K. E. Merrick, “Novelty and beyond: Towards combined motivation
models and integrated learning architectures,” Intrinsically motivated
As discussed above, Agent Needs Hierarchy is the foun- learning in natural and artificial systems, pp. 209–233, 2013.
dation for MAS learning from interaction [55]. It surveys [15] A. H. Maslow, “A theory of human motivation.” Psychological
the system’s utility from individual needs. Balancing the review, vol. 50, no. 4, p. 370, 1943.
[16] F. E. Yates, Self-organizing systems: The emergence of order.
needs and rewards between agents and groups for MAS Springer Science & Business Media, 2012.
through interaction and adaptation in cooperation optimizes [17] W. Banzhaf, “Self-organizing systems.” Encyclopedia of complexity
the system’s utility and guarantees sustainable development and systems science, vol. 14, p. 589, 2009.
for each group member, much like human society does. [18] D. Mcfarland, “Autonomy and self-sufficiency in robots,” in The
Artificial Life Route to Artificial Intelligence. Routledge, 2018, pp.
Furthermore, by designing suitable reward mechanisms and 187–214.
developing efficient DRL methods, MAS can effectively [19] D. McFarland and E. Spier, “Basic cycles, utility and opportunism
represent various group behaviors, skills, and strategies to in self-sufficient robots,” Robotics and Autonomous Systems, vol. 20,
no. 2-4, pp. 179–190, 1997.
adapt to uncertain, adversarial, and dynamically changing [20] J.-A. Meyer and S. W. Wilson, “From animals to animats,” Cam-
environments [53]. bridge MA, 1991.
[21] D. McFarland and T. Bösser, Intelligent Behavior in Animals and Robots. MIT Press, 1993.
[22] D. McFarland, "Towards robot cooperation," From Animals to Animats, vol. 3, pp. 440–444, 1994.
[23] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[24] S. Wallkötter, "The hidden potential of nao and pepper – custom robot postures in naoqi v2.4," 2019.
[25] M. Begum and F. Karray, "Computational intelligence techniques in bio-inspired robotics," in Design and Control of Intelligent Robotic Systems. Springer, 2009, pp. 1–28.
[26] O. Sporns and E. Körner, "Value and self-referential control: necessary ingredients for the autonomous development of flexible intelligence," Towards a Theory of Thinking, pp. 323–335, 2010.
[27] O. Sporns, N. Almássy, and G. M. Edelman, "Plasticity in value systems and its role in adaptive behavior," Adaptive Behavior, vol. 8, no. 2, pp. 129–148, 2000.
[28] J. L. Krichmar and G. M. Edelman, "Brain-based devices: Intelligent systems based on principles of the nervous system," in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), vol. 1. IEEE, 2003, pp. 940–945.
[29] N. Almássy, G. Edelman, and O. Sporns, "Behavioral constraints in the development of neuronal properties: a cortical model embedded in a real-world device," Cerebral Cortex (New York, NY: 1991), vol. 8, no. 4, pp. 346–361, 1998.
[30] T. J. Prescott, F. M. M. González, K. Gurney, M. D. Humphries, and P. Redgrave, "A robot model of the basal ganglia: behavior and intrinsic processing," Neural Networks, vol. 19, no. 1, pp. 31–61, 2006.
[31] B. R. Cox and J. L. Krichmar, "Neuromodulation as a robot controller," IEEE Robotics & Automation Magazine, vol. 16, no. 3, pp. 72–80, 2009.
[32] V. G. Fiore, V. Sperati, F. Mannella, M. Mirolli, K. Gurney, K. Friston, R. J. Dolan, and G. Baldassarre, "Keep focussing: striatal dopamine multiple functions resolved in a single mechanism tested in a simulated humanoid robot," Frontiers in Psychology, vol. 5, p. 124, 2014.
[33] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[34] W. Maass, "Networks of spiking neurons: the third generation of neural network models," Neural Networks, vol. 10, no. 9, pp. 1659–1671, 1997.
[35] S. Grossberg, "Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors," Biological Cybernetics, vol. 23, no. 3, pp. 121–134, 1976.
[36] H. Bourlard and Y. Kamp, "Auto-association by multilayer perceptrons and singular value decomposition," Biological Cybernetics, vol. 59, no. 4, pp. 291–294, 1988.
[37] K. Fukushima and S. Miyake, "Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition," in Competition and Cooperation in Neural Nets. Springer, 1982, pp. 267–285.
[38] B. Fritzke, "A growing neural gas network learns topologies," Advances in Neural Information Processing Systems, vol. 7, 1994.
[39] K. E. Merrick, "A comparative study of value systems for self-motivated exploration and learning by robots," IEEE Transactions on Autonomous Mental Development, vol. 2, no. 2, pp. 119–131, 2010.
[40] G. Gordon and E. Ahissar, "Hierarchical curiosity loops and active sensing," Neural Networks, vol. 32, pp. 119–129, 2012.
[41] G. Martius, R. Der, and N. Ay, "Information driven self-organization of complex robotic behaviors," PLoS ONE, vol. 8, no. 5, p. e63400, 2013.
[42] G. Martius, L. Jahn, H. Hauser, and V. V. Hafner, "Self-exploration of the stumpy robot with predictive information maximization," in International Conference on Simulation of Adaptive Behavior. Springer, 2014, pp. 32–42.
[43] K. Makukhin and S. Bolland, "Exploring the periphery of knowledge by intrinsically motivated systems," in Australasian Conference on Artificial Life and Computational Intelligence. Springer, 2015, pp. 49–61.
[44] X. Huang and J. Weng, "Novelty and reinforcement learning in the value system of developmental robots," in Proceedings of the 2nd International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, vol. 74, 2002, p. 55.
[45] K. Doya and E. Uchibe, "The cyber rodent project: Exploration of adaptive mechanisms for self-preservation and self-reproduction," Adaptive Behavior, vol. 13, no. 2, pp. 149–160, 2005.
[46] K. E. Merrick, "Modeling behavior cycles as a value system for developmental robots," Adaptive Behavior, vol. 18, no. 3–4, pp. 237–257, 2010.
[47] P.-Y. Oudeyer, F. Kaplan, and V. V. Hafner, "Intrinsic motivation systems for autonomous mental development," IEEE Transactions on Evolutionary Computation, vol. 11, no. 2, pp. 265–286, 2007.
[48] M. Frank, J. Leitner, M. Stollenga, A. Förster, and J. Schmidhuber, "Curiosity driven reinforcement learning for motion planning on humanoids," Frontiers in Neurorobotics, vol. 7, p. 25, 2014.
[49] V. R. Kompella, M. Stollenga, M. Luciw, and J. Schmidhuber, "Continual curiosity-driven skill acquisition from high-dimensional video inputs for humanoid robots," Artificial Intelligence, vol. 247, pp. 313–335, 2017.
[50] D. Di Nocera, A. Finzi, S. Rossi, and M. Staffa, "The role of intrinsic motivations in attention allocation and shifting," Frontiers in Psychology, vol. 5, p. 273, 2014.
[51] Q. Yang and R. Parasuraman, "Hierarchical needs based self-adaptive framework for cooperative multi-robot system," in 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2020, pp. 2991–2998.
[52] Q. Yang and R. Parasuraman, "Needs-driven heterogeneous multi-robot cooperation in rescue missions," in 2020 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR). IEEE, 2020, pp. 252–259.
[53] Q. Yang, "Self-adaptive swarm system," Ph.D. dissertation, University of Georgia, 2022.
[54] Q. Yang and R. Parasuraman, "A strategy-oriented bayesian soft actor-critic model," Procedia Computer Science, vol. 220, pp. 561–566, 2023.
[55] Q. Yang, "Self-adaptive swarm system (sass)," in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, 2021, pp. 5040–5041, Doctoral Consortium.
[56] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence. Morgan & Claypool Publishers, 2007, no. 2.
[57] M. Wooldridge, An Introduction to Multiagent Systems. John Wiley & Sons, 2009.
[58] R. Rădulescu, P. Mannion, D. M. Roijers, and A. Nowé, "Multi-objective multi-agent decision making: a utility-based analysis and survey," Autonomous Agents and Multi-Agent Systems, vol. 34, no. 1, p. 10, 2020.
[59] L. M. Zintgraf, T. V. Kanters, D. M. Roijers, F. Oliehoek, and P. Beau, "Quality assessment of morl algorithms: A utility-based approach," in Benelearn 2015: Proceedings of the 24th Annual Machine Learning Conference of Belgium and the Netherlands, 2015.
[60] D. Paccagnan, R. Chandan, and J. R. Marden, "Utility and mechanism design in multi-agent systems: An overview," Annual Reviews in Control, 2022.
[61] B. Browning, J. Bruce, M. Bowling, and M. Veloso, "Stp: Skills, tactics, and plays for multi-robot control in adversarial environments," Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, vol. 219, no. 1, pp. 33–52, 2005.
[62] J.-L. Zhang, D.-L. Qi, and M. Yu, "A game theoretic approach for the distributed control of multi-agent systems under directed and time-varying topology," International Journal of Control, Automation and Systems, vol. 12, no. 4, pp. 749–758, 2014.
[63] E. Bakolas and Y. Lee, "Decentralized game-theoretic control for dynamic task allocation problems for multi-agent systems," in 2021 American Control Conference (ACC). IEEE, 2021, pp. 3228–3233.
[64] N. Agmon, G. A. Kaminka, and S. Kraus, "Multi-robot adversarial patrolling: facing a full-knowledge opponent," Journal of Artificial Intelligence Research, vol. 42, pp. 887–916, 2011.
[65] Y. Shapira and N. Agmon, "Path planning for optimizing survivability of multi-robot formation in adversarial environments," in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015, pp. 4544–4549.
[66] T. H. Chung, G. A. Hollinger, and V. Isler, "Search and pursuit-evasion in mobile robotics," Autonomous Robots, vol. 31, no. 4, p. 299, 2011.
[67] M. Kothari, J. G. Manathara, and I. Postlethwaite, "A cooperative pursuit-evasion game for non-holonomic systems," IFAC Proceedings Volumes, vol. 47, no. 3, pp. 1977–1984, 2014.
[68] V. R. Makkapati and P. Tsiotras, "Optimal evading strategies and task allocation in multi-player pursuit–evasion problems," Dynamic Games and Applications, pp. 1–20, 2019.
[69] S. Nadarajah and K. Sundaraj, "A survey on team strategies in robot soccer: team strategies and role description," Artificial Intelligence Review, vol. 40, no. 3, pp. 271–304, 2013.
[70] T. Liu, J. Wang, X. Zhang, and D. Cheng, "Game theoretic control of multiagent systems," SIAM Journal on Control and Optimization, vol. 57, no. 3, pp. 1691–1709, 2019.
[71] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory. SIAM, 1998.
[72] J. R. Marden, G. Arslan, and J. S. Shamma, "Cooperative control and potential games," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 6, pp. 1393–1407, 2009.
[73] S. Xu and H. Chen, "Nash game based efficient global optimization for large-scale design problems," Journal of Global Optimization, vol. 71, no. 2, pp. 361–381, 2018.
[74] M. I. Abouheaf, F. L. Lewis, K. G. Vamvoudakis, S. Haesaert, and R. Babuska, "Multi-agent discrete-time graphical games and reinforcement learning solutions," Automatica, vol. 50, no. 12, pp. 3038–3053, 2014.
[75] R. Gopalakrishnan, J. R. Marden, and A. Wierman, "An architectural view of game theoretic control," ACM SIGMETRICS Performance Evaluation Review, vol. 38, no. 3, pp. 31–36, 2011.
[76] J. R. Marden and J. S. Shamma, "Game-theoretic learning in distributed control," in Handbook of Dynamic Game Theory. Springer, 2017, pp. 1–36.
[77] D. Paccagnan, R. Chandan, and J. R. Marden, "Utility design for distributed resource allocation—part I: Characterizing and optimizing the exact price of anarchy," IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4616–4631, 2019.
[78] D. Paccagnan and J. R. Marden, "Utility design for distributed resource allocation—part II: Applications to submodular, covering, and supermodular problems," IEEE Transactions on Automatic Control, vol. 67, no. 2, pp. 618–632, 2021.
[79] J. R. Marden and J. S. Shamma, "Game theory and control," Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, pp. 105–134, 2018.
[80] Q. Yang and R. Parasuraman, "A hierarchical game-theoretic decision-making for cooperative multiagent systems under the presence of adversarial agents," in Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, 2023, pp. 773–782.
[81] Q. Yang and R. Parasuraman, "Game-theoretic utility tree for multi-robot cooperative pursuit strategy," in ISR Europe 2022; 54th International Symposium on Robotics, 2022, pp. 1–7.
[82] D. Pickem, P. Glotfelter, L. Wang, M. Mote, A. Ames, E. Feron, and M. Egerstedt, "The robotarium: A remotely accessible swarm robotics research testbed," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 1699–1706.
[83] J.-H. Cho, "Tradeoffs between trust and survivability for mission effectiveness in tactical networks," IEEE Transactions on Cybernetics, vol. 45, no. 4, pp. 754–766, 2014.
[84] Y. Rizk, M. Awad, and E. W. Tunstel, "Decision making in multiagent systems: A survey," IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 3, pp. 514–529, 2018.
[85] S. Russell and P. Norvig, "Artificial intelligence: a modern approach," 2002.
[86] L. Yu, J. Song, and S. Ermon, "Multi-agent adversarial inverse reinforcement learning," in Proceedings of the 36th International Conference on Machine Learning. PMLR, 2019.
[87] Z. Zhang, Y.-S. Ong, D. Wang, and B. Xue, "A collaborative multiagent reinforcement learning method based on policy gradient potential," IEEE Transactions on Cybernetics, 2019.
[88] T. Rashid, M. Samvelyan, C. Schroeder, G. Farquhar, J. Foerster, and S. Whiteson, "QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning," in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research. PMLR, 2018.
[89] L. Panait and S. Luke, "Cooperative multi-agent learning: The state of the art," Autonomous Agents and Multi-Agent Systems, vol. 11, no. 3, pp. 387–434, 2005.
[90] T. Jansen and R. P. Wiegand, "Exploring the explorative advantage of the cooperative coevolutionary (1+1) EA," in Genetic and Evolutionary Computation Conference. Springer, 2003, pp. 310–321.
[91] D. H. Wolpert and K. Tumer, "Optimal payoff functions for members of collectives," in Modeling Complexity in Economic and Social Systems. World Scientific, 2002, pp. 355–369.
[92] T. Balch et al., "Learning roles: Behavioral diversity in robot teams," in AAAI Workshop on Multiagent Learning, 1997.
[93] T. Balch, "Reward and diversity in multirobot foraging," in IJCAI-99 Workshop on Agents Learning About, From and With Other Agents, 1999.
[94] M. J. Mataric, "Learning to behave socially," From Animals to Animats, vol. 3, pp. 453–462, 1994.
[95] S. G. Ficici and J. B. Pollack, "A game-theoretic approach to the simple coevolutionary algorithm," in International Conference on Parallel Problem Solving from Nature. Springer, 2000, pp. 467–476.
[96] R. P. Wiegand, An Analysis of Cooperative Coevolutionary Algorithms. George Mason University, 2004.
[97] L. Panait, R. P. Wiegand, and S. Luke, "A visual demonstration of convergence properties of cooperative coevolution," in International Conference on Parallel Problem Solving from Nature. Springer, 2004, pp. 892–901.
[98] D. Fudenberg and D. K. Levine, The Theory of Learning in Games. MIT Press, 1998, vol. 2.
[99] S. Liu, G. Lever, Z. Wang, J. Merel, S. Eslami, D. Hennes, W. M. Czarnecki, Y. Tassa, S. Omidshafiei, A. Abdolmaleki et al., "From motor control to team play in simulated humanoid football," arXiv preprint arXiv:2105.12196, 2021.
[100] S. Tunyasuvunakool, A. Muldal, Y. Doron, S. Liu, S. Bohez, J. Merel, T. Erez, T. Lillicrap, N. Heess, and Y. Tassa, "dm_control: Software and tasks for continuous control," Software Impacts, vol. 6, p. 100022, 2020.
[101] J. Scharpff, M. Spaan, L. Volker, and M. de Weerdt, "Coordinating stochastic multiagent planning in a private values setting," Distributed and Multi-Agent Planning, p. 17, 2013.
[102] A. Pla, B. Lopez, and J. Murillo, "Multi criteria operators for multi-attribute auctions," in Modeling Decisions for Artificial Intelligence: 9th International Conference, MDAI 2012, Girona, Catalonia, Spain, November 21-23, 2012. Proceedings 9. Springer, 2012, pp. 318–328.
[103] R. L. Swinth, "The establishment of the trust relationship," Journal of Conflict Resolution, vol. 11, no. 3, pp. 335–344, 1967.
[104] R. J. Lewicki, E. C. Tomlinson, and N. Gillespie, "Models of interpersonal trust development: Theoretical approaches, empirical evidence, and future directions," Journal of Management, vol. 32, no. 6, pp. 991–1022, 2006.
[105] K. Arai, "Defining trust using expected utility theory," Hitotsubashi Journal of Economics, pp. 205–224, 2009.
[106] J. D. Lee and K. A. See, "Trust in automation: Designing for appropriate reliance," Human Factors, vol. 46, no. 1, pp. 50–80, 2004.
[107] J.-H. Cho, A. Swami, and R. Chen, "A survey on trust management for mobile ad hoc networks," IEEE Communications Surveys & Tutorials, vol. 13, no. 4, pp. 562–583, 2010.
[108] W. Sherchan, S. Nepal, and C. Paris, "A survey of trust in social networks," ACM Computing Surveys (CSUR), vol. 45, no. 4, pp. 1–33, 2013.
[109] H. Wang and D.-Y. Yeung, "A survey on bayesian deep learning," ACM Computing Surveys (CSUR), vol. 53, no. 5, pp. 1–37, 2020.
[110] J.-H. Cho, K. Chan, and S. Adali, "A survey on trust modeling," ACM Computing Surveys (CSUR), vol. 48, no. 2, pp. 1–40, 2015.
[111] R. Chen, F. Bao, and J. Guo, "Trust-based service management for social internet of things systems," IEEE Transactions on Dependable and Secure Computing, vol. 13, no. 6, pp. 684–696, 2015.
[112] V. Mohammadi, A. M. Rahmani, A. M. Darwesh, and A. Sahafi, "Trust-based recommendation systems in internet of things: a systematic literature review," Human-centric Computing and Information Sciences, vol. 9, no. 1, pp. 1–61, 2019.
[113] D. Quercia and S. Hailes, "Mate: Mobility and adaptation with trust and expected-utility," International Journal of Internet Technology and Secured Transactions (IJITST), vol. 1, pp. 43–53, 2007.
[114] R. Liu, Z. Cai, M. Lewis, J. Lyons, and K. Sycara, "Trust repair in human-swarm teams+," in 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). IEEE, 2019, pp. 1–6.
[115] R. Liu, F. Jia, W. Luo, M. Chandarana, C. Nam, M. Lewis, and K. Sycara, "Trust-aware behavior reflection for robot swarm self-healing," in Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, ser. AAMAS '19. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems, 2019, pp. 122–130.
[116] Y. Pang and R. Liu, "Trust-aware emergency response for a resilient human-swarm cooperative system," in 2021 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR). IEEE, 2021, pp. 15–20.
[117] R. Luo, C. Huang, Y. Peng, B. Song, and R. Liu, "Repairing human trust by promptly correcting robot mistakes with an attention transfer model," in 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE). IEEE, 2021, pp. 1928–1933.
[118] H. Yu, Z. Shen, C. Leung, C. Miao, and V. R. Lesser, "A survey of multi-agent trust management systems," IEEE Access, vol. 1, pp. 35–50, 2013.
[119] K. K. Fullam, T. B. Klos, G. Muller, J. Sabater, A. Schlosser, Z. Topol, K. S. Barber, J. S. Rosenschein, L. Vercouter, and M. Voss, "A specification of the agent reputation and trust (art) testbed: experimentation and competition for trust in agent societies," in Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, 2005, pp. 512–518.
[120] I. Pinyol and J. Sabater-Mir, "Computational trust and reputation models for open multi-agent systems: a review," Artificial Intelligence Review, vol. 40, no. 1, pp. 1–25, 2013.
[121] R. Conte and M. Paolucci, Reputation in Artificial Societies: Social Beliefs for Social Order. Springer Science & Business Media, 2002, vol. 6.
[122] Z. R. Khavas, S. R. Ahmadzadeh, and P. Robinette, "Modeling trust in human-robot interaction: A survey," in International Conference on Social Robotics. Springer, 2020, pp. 529–541.
[123] P. A. Hancock, D. R. Billings, K. E. Schaefer, J. Y. Chen, E. J. De Visser, and R. Parasuraman, "A meta-analysis of factors affecting trust in human-robot interaction," Human Factors, vol. 53, no. 5, pp. 517–527, 2011.
[124] M. Chen, S. Nikolaidis, H. Soh, D. Hsu, and S. Srinivasa, "Trust-aware decision making for human-robot collaboration: Model learning and planning," ACM Transactions on Human-Robot Interaction (THRI), vol. 9, no. 2, pp. 1–23, 2020.
[125] Y. Wang, L. R. Humphrey, Z. Liao, and H. Zheng, "Trust-based multi-robot symbolic motion planning with a human-in-the-loop," ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 8, no. 4, pp. 1–33, 2018.
[126] B. C. Kok and H. Soh, "Trust in robots: Challenges and opportunities," Current Robotics Reports, vol. 1, pp. 297–309, 2020.
[127] B. Davis, M. Glenski, W. Sealy, and D. Arendt, "Measure utility, gain trust: practical advice for xai researchers," in 2020 IEEE Workshop on TRust and EXpertise in Visual Analytics (TREX). IEEE, 2020, pp. 1–8.
[128] Q. Yang and R. Parasuraman, "How can robots trust each other for better cooperation? a relative needs entropy based robot-robot trust assessment model," in 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2021, pp. 2656–2663.
[129] A. Toichoa Eyam, W. M. Mohammed, and J. L. Martinez Lastra, "Emotion-driven analysis and control of human-robot interactions in collaborative applications," Sensors, vol. 21, no. 14, p. 4626, 2021.
[130] T. Sanders, K. E. Oleson, D. R. Billings, J. Y. Chen, and P. A. Hancock, "A model of human-robot trust: Theoretical model development," in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 55, no. 1. SAGE Publications Sage CA: Los Angeles, CA, 2011, pp. 1432–1436.
[131] A. Zacharaki, I. Kostavelis, A. Gasteratos, and I. Dokas, "Safety bounds in human robot interaction: A survey," Safety Science, vol. 127, p. 104667, 2020.
[132] M. M. de Graaf and S. B. Allouch, "The relation between people's attitude and anxiety towards robots in human-robot interaction," in 2013 IEEE RO-MAN. IEEE, 2013, pp. 632–637.
[133] C. L. Bethel, K. Salomon, R. R. Murphy, and J. L. Burke, "Survey of psychophysiology measurements applied to human-robot interaction," in RO-MAN 2007 – The 16th IEEE International Symposium on Robot and Human Interactive Communication. IEEE, 2007, pp. 732–737.
[134] L. Tiberio, A. Cesta, and M. Olivetti Belardinelli, "Psychophysiological methods to evaluate user's response in human robot interaction: a review and feasibility study," Robotics, vol. 2, no. 2, pp. 92–121, 2013.
[135] C. Breazeal and B. Scassellati, "How to build robots that make friends and influence people," in Proceedings 1999 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human and Environment Friendly Robots with High Intelligence and Emotional Quotients (Cat. No. 99CH36289), vol. 2. IEEE, 1999, pp. 858–863.
[136] C. Breazeal, G. Hoffman, and A. Lockerd, "Teaching and working with robots as a collaboration," in AAMAS, vol. 4, 2004, pp. 1030–1037.
[137] M. Scheutz, P. Schermerhorn, and J. Kramer, "The utility of affect expression in natural language interactions in joint human-robot tasks," in Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, 2006, pp. 226–233.
[138] A. Dahiya, A. M. Aroyo, K. Dautenhahn, and S. L. Smith, "A survey of multi-agent human–robot interaction systems," Robotics and Autonomous Systems, vol. 161, p. 104335, 2023.
[139] A. Tabrez, M. B. Luebbers, and B. Hayes, "A survey of mental modeling techniques in human–robot teaming," Current Robotics Reports, vol. 1, pp. 259–267, 2020.
[140] S. Nikolaidis and J. Shah, "Human-robot cross-training: computational formulation, modeling and evaluation of a human team training strategy," in 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2013, pp. 33–40.
[141] S. Nikolaidis, S. Nath, A. D. Procaccia, and S. Srinivasa, "Game-theoretic modeling of human adaptation in human-robot collaboration," in Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, 2017, pp. 323–331.
[142] S. Nikolaidis, Y. X. Zhu, D. Hsu, and S. Srinivasa, "Human-robot mutual adaptation in shared autonomy," in Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, 2017, pp. 294–302.
[143] M. Kwon, M. F. Jung, and R. A. Knepper, "Human expectations of social robots," in 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2016, pp. 463–464.
[144] M. Lewis, K. Sycara, and P. Walker, "The role of trust in human-robot interaction," Foundations of Trusted Autonomy, pp. 135–159, 2018.
[145] A. D. Dragan and S. S. Srinivasa, Formalizing Assistive Teleoperation. MIT Press, July 2012, vol. 376.
[146] J. Rios-Martinez, A. Spalanzani, and C. Laugier, "From proxemics theory to socially-aware navigation: A survey," International Journal of Social Robotics, vol. 7, pp. 137–153, 2015.
[147] I. R. Nourbakhsh, K. Sycara, M. Koes, M. Yong, M. Lewis, and S. Burion, "Human-robot teaming for search and rescue," IEEE Pervasive Computing, vol. 4, no. 1, pp. 72–79, 2005.
[148] D. Sadigh, N. Landolfi, S. S. Sastry, S. A. Seshia, and A. D. Dragan, "Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state," Autonomous Robots, vol. 42, pp. 1405–1426, 2018.
[149] B. Kuipers, "How can we trust a robot?" Communications of the ACM, vol. 61, no. 3, pp. 86–95, 2018.
[150] S. V. Albrecht and P. Stone, "Autonomous agents modelling other agents: A comprehensive survey and open problems," Artificial Intelligence, vol. 258, pp. 66–95, 2018.
[151] A. Tabrez, S. Agrawal, and B. Hayes, "Explanation-based reward coaching to improve human performance via reinforcement learning," in 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2019, pp. 249–257.
[152] T. Chakraborti, S. Sreedharan, Y. Zhang, and S. Kambhampati, "Plan explanations as model reconciliation: Moving beyond explanation as soliloquy," arXiv preprint arXiv:1701.08317, 2017.
[153] D. Leyzberg, A. Ramachandran, and B. Scassellati, "The effect of personalization in longer-term robot tutoring," ACM Transactions on Human-Robot Interaction (THRI), vol. 7, no. 3, pp. 1–19, 2018.
