Distributed Autonomous Robotic Systems
16th International Symposium
Springer Proceedings in Advanced Robotics 28
Series Editors
Bruno Siciliano, Dipartimento di Ingegneria Elettrica e Tecnologie dell’Informazione,
Università degli Studi di Napoli Federico II, Napoli, Italy
Oussama Khatib, Robotics Laboratory Department of Computer Science, Stanford
University, Stanford, CA, USA
Advisory Editors
Gianluca Antonelli, Department of Electrical and Information Engineering, University
of Cassino and Southern Lazio, Cassino, Italy
Dieter Fox, Department of Computer Science and Engineering, University of
Washington, Seattle, WA, USA
Kensuke Harada, Engineering Science, Osaka University Engineering Science,
Toyonaka, Japan
M. Ani Hsieh, GRASP Laboratory, University of Pennsylvania, Philadelphia, PA, USA
Torsten Kröger, Karlsruhe Institute of Technology, Karlsruhe, Germany
Dana Kulic, University of Waterloo, Waterloo, ON, Canada
Jaeheung Park, Department of Transdisciplinary Studies, Seoul National University,
Suwon, Korea (Republic of)
The Springer Proceedings in Advanced Robotics (SPAR) publishes new developments
and advances in the fields of robotics research, rapidly and informally but with a high
quality.
The intent is to cover all the technical contents, applications, and multidisciplinary
aspects of robotics, embedded in the fields of Mechanical Engineering, Computer Sci-
ence, Electrical Engineering, Mechatronics, Control, and Life Sciences, as well as the
methodologies behind them.
The publications within the “Springer Proceedings in Advanced Robotics” are pri-
marily proceedings and post-proceedings of important conferences, symposia and con-
gresses. They cover significant recent developments in the field, both of a founda-
tional and applicable character. Also considered for publication are edited monographs,
contributed volumes and lecture notes of exceptionally high quality and interest.
An important characteristic feature of the series is the short publication time and
world-wide distribution. This permits a rapid and broad dissemination of research results.
Indexed by SCOPUS, SCIMAGO, WTI Frankfurt eG, zbMATH.
All books published in the series are submitted for consideration in Web of Science.
Julien Bourgeois · Jamie Paik · Benoît Piranda ·
Justin Werfel · Sabine Hauert · Alyssa Pierson ·
Heiko Hamann · Tin Lun Lam ·
Fumitoshi Matsuno · Negar Mehr ·
Abdallah Makhoul
Editors
Distributed Autonomous
Robotic Systems
16th International Symposium
Editors
Julien Bourgeois
University of Franche-Comté
Montbéliard, France

Jamie Paik
École Polytechnique Fédérale de Lausanne
Lausanne, Switzerland

Abdallah Makhoul
University of Franche-Comté
Montbéliard, France
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
At the dawn of the century’s third decade, robotics is reaching an elevated level of
maturity and continues to benefit from the advances and innovations in its enabling
technologies. All of these are contributing to an unprecedented effort to bring robots into
human environments: in hospitals and homes, factories and schools; in the field for robots
fighting fires, making goods and products, picking fruits and watering the farmland, and
saving time and lives. Robots today hold the promise for making a considerable impact
in a wide range of real-world applications from industrial manufacturing to healthcare,
transportation, and exploration of the deep space and sea. Tomorrow, robots will become
pervasive and touch upon many aspects of modern life.
The Springer Tracts in Advanced Robotics (STAR) was launched in 2002 with the
goal of bringing to the research community the latest advances in the robotics field
based on their significance and quality. During the last fifteen years, the STAR series has
featured the publication of both monographs and edited collections. Among the latter,
the proceedings of thematic symposia devoted to excellence in robotics research, such
as ISRR, ISER, FSR, and WAFR, have been regularly included in STAR.
The expansion of our field as well as the emergence of new research areas has
motivated us to enlarge the pool of proceedings in the STAR series in the past few years.
This has ultimately led to launching a sister series in parallel to STAR. The Springer
Proceedings in Advanced Robotics (SPAR) is dedicated to the timely dissemination of
the latest research results presented in selected symposia and workshops.
This volume of the SPAR series brings the proceedings of the sixteenth edition of the
International Symposium on Distributed Autonomous Robotic Systems (DARS), whose
proceedings have been published within SPAR since the thirteenth edition. The symposium
took place in Montbéliard, France, from November 28 to 30, 2022. The volume edited by Julien
Bourgeois, Justin Werfel, Jamie Paik, Benoît Piranda, Sabine Hauert, Alyssa Pierson,
Fumitoshi Matsuno, Negar Mehr, Heiko Hamann, Tin Lun Lam, and Abdallah Makhoul
contains 39 scientific contributions cutting across mobile sensor networks, unmanned
aerial vehicles, multi-agent systems, algorithms for multi-robot systems, modular robots,
swarm robotics, and reinforcement learning or deep learning, all united through the
common thread of distributed robotic systems.
With its excellent technical program, DARS culminates with this unique reference on
the current developments and new advances in distributed autonomous robotic systems—
a genuine tribute to its contributors and organizers!
Distributed robotics is an interdisciplinary and rapidly growing area, with research and
applications spanning fields including computer science, communication and control
systems, electrical and mechanical engineering, life sciences, and humanities. This book
brings together advanced works in this field presented at the 16th International Sympo-
sium on Distributed Autonomous Robotic Systems (DARS), November 28–30, 2022, in
Montbéliard, France.
During this symposium, 39 papers were presented covering a broad scope of topics
within distributed robotics including mobile sensor networks, unmanned aerial vehi-
cles, multi-agent systems, algorithms for multi-robot systems, modular robots, swarm
robotics, and reinforcement learning or deep learning applied to multi-robot systems.
Three keynote talks featured renowned experts in the field: Alcherio Martinoli
(EPFL), on “Gas and Particle Sensing in Air: Challenges, Achievements, and Lessons”;
Jessica Flack (SFI), on “Hourglass Emergence”; and Kirstin Petersen (Cornell), on
“Designing Robotic Systems with Collective Embodied Intelligence.”
The welcome reception featured an artwork made from Blinky Blocks, created by
the artist duo Scenocosme. They have been working for several years with this hard-
ware developed at Carnegie Mellon University and the FEMTO-ST Institute and have
exhibited this work in many museums.
The program committee awarded the Best Paper Award to Shoma Abe, Jun
Ogawa, Yosuke Watanabe, MD Nahin Islam Shiblee, Masaru Kawakami, and Hidemitsu
Furukawa for their work on “Application of 3D Printed Vacuum-actuated module with
Multi-Soft Material to Support Handwork Rehabilitation,” and the Best Student Paper
Award to Yuhong Cao, Zhanhong Sun and Guillaume Sartoretti for their work on “DAN:
Decentralized Attention-based Neural Network for the MinMax Multiple Traveling
Salesman Problem.”
Finally, we would like to thank all those who contributed to the success of DARS
2022, including Pays de Montbéliard Agglomération, EIPHI Graduate School, MA
Scène Nationale, Région Bourgogne Franche-Comté for sponsoring the event, the DARS
advisory committee for their useful advice, the program committee for their high-quality
scientific work, and the organizing committee for serving the attendees during the
conference.
How Can We Understand Multi-Robot Systems?

1 Introduction
Along with the diffusion of robotic platforms, multi-robot systems are expected to be
deployed in a variety of applications in our everyday lives, including precision agricul-
ture, logistics, environmental monitoring, and surveillance [1]. Their spread is enabled
not only by the availability of efficient and safe coordination strategies, but also by
intuitive communication modalities. Specifically, given the complexity associated with the
coordination among numerous individual robots, it is important that, on the one hand,
users can exert control over the multi-robot system in a straightforward and intuitive
manner and, on the other hand, users can easily understand what robots are doing or
going to do. This second problem is the focus of our work and refers to the availability
of communication modalities that allow robots to give information about their status in
a manner that is easily perceived and understood by human users.
Feedback from the robotic platform to the user can be communicated either explicitly
or implicitly [2]. Implicit communication consists of defining a robot behavior that
enables the collaborator to quickly and confidently infer the goal [3–6]. This idea is
related to the notion of legibility and requires the use of implicit cues, such as robot
motion. In explicit communication, by contrast, explicit cues are used to provide
feedback from the robot to the user. Classical approaches in this regard resort to audio
L. Sabattini—This work was supported by the COLLABORATION Project through the Italian
Ministry of Foreign Affairs and International Cooperation.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 1–13, 2024.
https://doi.org/10.1007/978-3-031-51497-5_1
2 V. Villani et al.
2 Methods
In this section, we describe the experiments carried out to compare implicit and explicit
communication in human-multi-robot system interaction. The experiment, which is
described in detail in Sect. 2.2, consists of having two groups of mobile robots move
towards a goal while asking participants to infer the goal by relying on explicit or implicit
cues.
of the jerk, which is the rate of change of the acceleration. Previous works have shown
that this kind of trajectory is usually used by humans to grasp objects and improves the
interaction of a human with a robot manipulator [13, 14]. Moreover, in [5, 6] it has been
shown that this trajectory improves the legibility of one or several multi-robot systems
that move in a shared environment with the user.
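A minimum-jerk point-to-point profile of the kind referenced above can be sketched as follows. This is a generic illustration under the standard rest-to-rest boundary conditions, not the authors' implementation; the function name and parameters are our own.

```python
def minimum_jerk(x0, xf, T, t):
    """Minimum-jerk position at time t for a point-to-point move.

    With zero velocity and acceleration at t = 0 and t = T, minimizing the
    integral of the squared jerk yields the classic quintic profile
    s(tau) = 10*tau^3 - 15*tau^4 + 6*tau^5 with tau = t / T.
    """
    tau = max(0.0, min(1.0, t / T))  # normalized time, clamped to [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * s

# The trajectory starts at x0, ends at xf, and by symmetry passes through
# the midpoint exactly halfway through the motion.
print(minimum_jerk(0.0, 1.0, 2.0, 0.0))  # 0.0
print(minimum_jerk(0.0, 1.0, 2.0, 1.0))  # 0.5
print(minimum_jerk(0.0, 1.0, 2.0, 2.0))  # 1.0
```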
As regards explicit communication, vocal commands consisted of a sentence such as
“The group of robots number 1 is moving towards the (e.g.) red goal”. Color cues
consisted of highlighting the group of robots and their goal with the same color, as
shown in Fig. 1(b). Arrows, also shown in Fig. 1(b), simply consist of an arrow starting
from a group of robots and ending at the associated goal.
It is worth noting that the considered explicit communication modalities are
very intuitive and do not require any processing or intuition by the user, as opposed
to implicit communication. Nevertheless, implementing the explicit communication
modalities requires specific hardware (i.e., a speaker, lights, or a projector), which is not
needed in the case of implicit communication.
Participants were asked to select the goal for each group of robots as soon as they
were confident in it. The task was repeated four times, each time with a different
communication modality (either the implicit one or one of the explicit ones). The order in
which the communication modalities were used varied among participants. The second part of the
experiment was organized like the first one, with the same objectives. In addition, partic-
ipants were involved in a primary task consisting of a puzzle. Hence, they were asked
to focus on the puzzle while the robots were moving towards their goals. As in the first part,
they were asked to select the goal of each group of robots as soon as they were confident in
it. Figure 2 shows the experimental setting in the second part of the task. As a result, the
experimental task of each participant consisted of eight trials, in which all the considered
communication modalities were tested twice in random order: in the first part, focus
was exclusively on inferring the robots' goal; in the second part, participants had to solve a
puzzle and, at the same time, infer the robots' goal.
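The randomized design described above (each modality once per part, order varied across participants) can be sketched as follows. This is illustrative only; the paper does not specify the actual randomization procedure, and the modality labels and seed are assumptions.

```python
import random

MODALITIES = ["implicit", "audio", "colors", "arrows"]

def trial_order(seed):
    """Eight trials for one participant: each of the four modalities appears
    once in part 1 (goal inference only) and once in part 2 (with the puzzle
    as primary task), with an independently shuffled order in each part."""
    rng = random.Random(seed)
    part1 = rng.sample(MODALITIES, k=4)  # random permutation for part 1
    part2 = rng.sample(MODALITIES, k=4)  # independent permutation for part 2
    return part1 + part2

trials = trial_order(seed=42)
print(trials)  # eight trials, four per part
```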
Before the beginning of the experiment, participants were introduced to the objectives
of the experiment and its setup. Moreover, they were instructed on how to interact
with the system (e.g., how to select the objects in the environment), and they were
informed that clues about the goal of the groups might appear during the experiment.
At the end of the experiment, they were administered a questionnaire investigating
their preference for the tested communication modalities.
The environment was developed in Unity and the experiments were carried out
with the Oculus Rift headset. Goals were rendered as colored boxes, while robots were
rendered as elliptical cylinders (width equal to 0.05 m, length equal to 0.1 m, height
equal to 0.01 m). The two groups of robots were identified by a text box “Group 1”
or “Group 2” moving with the group. Next to the goals, two buttons were placed to let
participants select the goal inferred for each group. Interaction was enabled through the
Oculus Remote controller. Participants could select pieces of the
puzzle or the goal of a group of robots through the central button of the controller.
2.4 Participants
A total of 30 volunteer subjects were enrolled in the experiment. The subjects were
students and researchers from different research fields working at our engineering
department: 12 females and 18 males, spread across an age range of 20–65 years,
with a mean age of 25.5 years. All the participants were completely naïve to
the experimental task and goals. Ten of them had never used virtual reality before the
experiment, while eleven reported they had used it once or twice. The remaining nine
participants reported they had used it more than twice before this experiment. More-
over, three reported they had never played videogames before, twelve reported they
play less than once a month, five play a few times per month, and ten play at least
once a week. The study protocol followed the Helsinki Declaration and the
participants signed the informed consent form before starting the experiment. All the
data were analyzed and reported anonymously.
During each experiment, we collected data about the participants’ performance in terms
of the goal selected for each robot group, the time taken from the beginning of the robots'
movements to the selection of the goal for each group and, for the second experimental
task, the time required to complete the puzzle. A total of 480 values were measured for
these quantities (240 for the time required to complete the puzzle), since each subject
participated in the two parts composing the experiment (without and with the puzzle),
and in each part they experienced the four considered communication modalities with
two groups of robots. Data were analyzed using MATLAB R2021a and its Statistics and
Machine Learning Toolbox.
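The totals quoted above follow directly from the design factors. As a sanity check (our own arithmetic, assuming the puzzle-completion time was recorded once per trial of the second part only):

```python
# Factor counts from the experimental design described above.
participants = 30
parts = 2          # part 1: goal inference only; part 2: with the puzzle
modalities = 4     # implicit, audio, colors, arrows
robot_groups = 2

goal_selections = participants * parts * modalities * robot_groups
puzzle_times = participants * modalities * robot_groups  # puzzle only in part 2

print(goal_selections, puzzle_times)  # 480 240
```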
The final questionnaire included four questions addressing comfort during the
experiment, effectiveness of the considered communication modalities in the first and
the second part of the experiment, and overall preference for such modalities.
3 Results
3.1 Correctness of Responses
The first analysis we carried out investigated whether the use of implicit or explicit com-
munication modalities influences the legibility of robot groups. Specifically,
Table 2. Correctness of participants' responses for the different robots' goals, as shown in Fig. 1.

Goal                       Incorrect (0)  Correct (1)
Leftmost (yellow)           8              86
Second from left (green)   19              77
Middle (red)                4              74
Second from right (blue)   31              72
Rightmost (cyan)            8             101
Table 3. Correctness of participants' responses with respect to the presence of puzzle solving,
intended as an additional confounding task.
blue goals were often confused with the leftmost and rightmost goals, respectively. Con-
versely, the latter were rarely confused, since the corresponding trajectories exhibited
a large deviation to the left or right. These considerations are corroborated by the results
of the chi-squared test, which returned a p-value p = 4.72 × 10⁻⁷ for H1, thus rejecting
the null hypothesis and confirming that the position of the goals influences the accuracy
of the estimation.
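The chi-squared statistic behind this result can be reproduced from the counts in Table 2. The sketch below is in Python rather than the authors' MATLAB; the closed-form p-value used here is specific to 4 degrees of freedom.

```python
import math

# Observed (incorrect, correct) counts per goal, taken from Table 2.
observed = [
    (8, 86),    # leftmost (yellow)
    (19, 77),   # second from left (green)
    (4, 74),    # middle (red)
    (31, 72),   # second from right (blue)
    (8, 101),   # rightmost (cyan)
]

col_totals = [sum(row[j] for row in observed) for j in (0, 1)]
n = sum(col_totals)

# Pearson chi-squared statistic: sum of (O - E)^2 / E over all cells,
# with expected counts from the row and column marginals.
chi2 = 0.0
for row in observed:
    row_total = sum(row)
    for j, obs in enumerate(row):
        expected = row_total * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (2 - 1)  # (5 goals - 1) x (2 outcomes - 1) = 4

# For dof = 4 the chi-squared survival function has the closed form
# P(X > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(round(chi2, 2), p_value)  # statistic ~ 34.96, p ~ 4.72e-7 as in the text
```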
As regards the presence of a confounding task that participants had to focus on while
the robots were moving towards their goals, Table 3 shows that more errors were made when no
confounding task was included. This could be due to the fact that this condition was
always administered first to all participants, to let them become familiar with
the communication modalities, which were the main focus of our experiments. Thus,
the results in Table 3 might reflect a learning effect. The chi-squared test returned a
p-value p = 0.02 and, hence, the null hypothesis H2 can be rejected. However, a deeper
analysis in this regard should be carried out by randomizing the order of the two tasks
among test participants.
Table 4. P-values for the three-way ANOVA: C denotes the communication modality, G the robots'
goal, and P the presence of puzzle solving as confounding task.

Variable  p-value
C         < 10⁻⁷
G         < 10⁻⁷
P         0.73
CG        < 10⁻⁷
CP        0.34
PG        0.76
Table 4 reports the results of the three-way analysis of variance (ANOVA) in terms of the
p-values of all the main effects C, G and P, as well as of the interaction variables.
The results show that communication modality (C) and goal (G) are the main effects
that influence response time in a statistically significant manner. The effect of their
interaction (CG) is also statistically significant. As a result, we can reject the null
hypothesis for the variables C and G, since they influence response time. Conversely,
the presence of a confounding task, such as puzzle solving, did not significantly affect
participants' response time. It should be clarified that no priority was given to test
participants regarding the two tasks, puzzle solving and guessing the robots' goal: they
could stop puzzle solving as soon as they had a reliable intuition about the goal of
either of the two groups of robots.
To further investigate the differences among the experimental conditions, we
carried out a post-hoc analysis based on multiple pairwise comparisons. Indeed, the
results in Table 4 suggest that for the variables C, G and CG, on average, at least one
of the considered experimental conditions differs from the others. However, this analysis
does not reveal which conditions differ from each other. To answer this question, we
performed Tukey's honestly significant difference test [17]. As regards the variable C,
that is, the effect of communication modality on participants' response time, Fig. 3(a)
shows the mean response time marginalized over the considered communication
modalities. The figure shows that implicit communication caused longer response times
than the explicit modalities, which all required comparable response times.
Multiple comparisons with Tukey's test confirmed this result, since the only pairs with
a statistically significant difference are those involving implicit communication (p-values
p < 10⁻¹⁰ for all the pairs (implicit, arrows), (implicit, colors), (implicit, audio)). Con-
versely, for all possible pairs of two explicit communication modalities, there was no
statistically significant difference between the mean response times (p-value
p = 0.99 for the pair (arrows, colors), p = 0.16 for (arrows, audio) and p = 0.30 for
(colors, audio)).
As regards the variable G, which refers to the robots' goal, Fig. 3(b) reports the marginal
mean response times. The figure shows that the green and blue targets required longer
response times, thus confirming that they were the most critical, as anticipated in
Sect. 3.1. In particular, the green one returned statistically significantly different
response times with respect to the yellow (p = 0.0017), red (p = 0.0027) and cyan
(p = 10⁻⁴) targets, while the response time for the blue one was statistically significantly
different from the cyan (p = 0.0162) goal only.
Fig. 3. Mean response time for the different communication modalities and robots' goals.
The combined effect of communication modality and robots' goal (variable CG
in Table 4) is shown in Fig. 4. The figure confirms that, for each goal, implicit
cues were the slowest communication modality. However, within a goal, the difference
between the mean response time for implicit communication and that for any
explicit modality was not statistically significant only in the case of the yellow and cyan
goals: for the yellow goal (first four rows in Fig. 4), p-value p = 0.9917 for (implicit,
audio), p = 0.9481 for (implicit, colors) and p = 0.9139 for (implicit, arrows); for the
cyan goal (last four rows in Fig. 4), p-value p = 1.000 for (implicit, audio), p = 0.9758
for (implicit, colors) and p = 1.000 for (implicit, arrows). This result is interesting since
the yellow and cyan goals are the only ones having one neighbor, hence they are less
ambiguous than the other goals: under these circumstances, the delay introduced by
implicit communication is negligible, as confirmed by the p-values and Fig. 4. From this
result, we can conclude that implicit communication is a viable communication modality
when the robots' goals are sufficiently distinct and unambiguous.
Fig. 4. Mean response time marginalized over communication modality C and robots' goal G.
Fig. 5. Mean response time marginalized over communication modality C and involvement in the
confounding task P.
As a last remark, in Fig. 5 we summarize the effect of communication modality and
involvement in a confounding task on response time. The figure shows that the longest
response times occur in the case of implicit cues, no matter whether the subject
was involved in puzzle solving or focused on the robots' movements. Moreover, as regards
explicit communication, no significant differences are found between the presence of
puzzle (P = 1) and its absence (P = 0). This is also confirmed by Tukey’s test: the only
p-values that are less than 0.01 are those of pairs involving any explicit communication
and implicit cues, with or without the puzzle. This result suggests that if the only
reason for using implicit communication is to avoid overwhelming users who are busy
with other tasks occupying their perceptive channels (i.e., who need to look at something
or pay attention to sounds), then implicit communication might not be as advantageous
as expected. In this regard, Fig. 6 shows that the use of different communication
modalities (and goals) has no effect at all on the time required to complete the
confounding task, namely puzzle solving.
Fig. 6. Mean time required for puzzle solving marginalized over communication modality C and
robots' goal G.
never) to 4 (yes, always), 80% of them reported that they had always been at ease
during the experiment and 20% almost always; 60% reported they had always been
comfortable with the interaction system and 40% almost always.
Then, they were asked to rate each communication modality on a scale from 1 (very
bad) to 5 (very good). The average scores are as follows: 3.3 for implicit communica-
tion, 4.3 for explicit communication with color cues, 4.3 for explicit communication
with vocal commands and 3.6 for explicit communication with arrows.
Regarding the second part of the experiment, that is the one including puzzle solv-
ing, we asked how much each communication modality was perceived as disturbing.
Scores were on the scale from 1 (not disturbing) to 5 (very disturbing). On average,
implicit communication was rated 1.9, color cues 1.6, arrows 2.6 and audio messages
1.8. Arrows were found to be the most disturbing communication modality and, interest-
ingly, implicit cues were found almost as disturbing as vocal synthesis.
Finally, the following questions were asked:
Replies to these questions are reported in Table 5. Participants' replies show that they
did not appreciate implicit communication and, given the choice, would prefer not to
use it. However, it is noteworthy that it was not the most disturbing communication
modality while participants were involved in the additional confounding task. Among the
considered explicit communication modalities, color cues and arrows were preferred over
audio, which was found almost as disturbing as implicit cues.
4 Conclusion
In this paper, we investigated the use of different communication modalities for a multi-
robot system. Specifically, the aim is to make a human subject aware of the goal the
robots are moving towards. To this end, we considered implicit and explicit communication
modalities. In implicit communication, users infer the robots' goal from their trajectory,
whereas for explicit communication we considered color cues, where the robots and
their goal exhibit the same color, arrows pointing from the center of the multi-robot
system to the goal, and audio messages. We designed an experiment to assess
which communication modality is the most effective and how each is perceived by
human subjects.
Results have shown that, on average, implicit cues cause the highest number of
errors and require the longest response time. While this result is not surprising, the
experiments have highlighted that implicit communication is comparable to explicit
communication modalities when robots' goals are unambiguous and sufficiently distinct.
Moreover, implicit communication does not require any specific hardware on the robots,
which is a practical limitation of explicit communication. Conversely, according to our
results, implicit communication does not provide any advantage with respect to the user's
involvement in other tasks that engage their perceptive channels, such as looking at
something requiring attention.
References
1. Schranz, M., Umlauft, M., Sende, M., Elmenreich, W.: Swarm robotic behaviors and current
applications. Front. Robot. AI 7, 36 (2020)
2. Lasota, P.A., Fong, T., Shah, J.A., et al.: A survey of methods for safe human-robot interac-
tion. Found. Trends Robot. 5(4), 261–349 (2017)
3. Dragan, A.D., Lee, K.C.T., Srinivasa, S.S.: Legibility and predictability of robot motion. In:
Proceedings of 8th Annual ACM/IEEE International Conference Human-Robot Interaction,
pp. 301–308. IEEE Press (2013)
4. St-Onge, D., Levillain, F., Zibetti, E., Beltrame, G.: Collective expression: how robotic
swarms convey information with group motion. Paladyn, J. Behav. Robot. 10(1), 418–435
(2019)
5. Capelli, B., Villani, V., Secchi, C., Sabattini, L.: Understanding multi-robot systems: on the
concept of legibility. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS), pp. 7355–7361. IEEE (2019)
6. Capelli, B., Villani, V., Secchi, C., Sabattini, L.: Communication through motion: Legibility
of multi-robot systems. In: 2019 International Symposium on Multi-Robot and Multi-Agent
Systems (MRS), pp. 126–132. IEEE (2019)
7. Clair, A.S., Mataric, M.: How robot verbal feedback can improve team performance in
human-robot task collaborations. In: Proceedings of 10th Annual ACM/IEEE International
Conference Human-Robot Interaction, pp. 213–220. ACM (2015)
8. Johnson, T., Tang, G., Fletcher, S.R., Webb, P.: Investigating the effects of signal light posi-
tion on human workload and reaction time in human-robot collaboration tasks. In: Advances
in Ergonomics of Manufacturing: Managing the Enterprise of the Future, pp. 207–215.
Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41697-7_19
9. May, A.D., Dondrup, C., Hanheide, M.: Show me your moves! conveying navigation inten-
tion of a mobile robot to humans. In: European Conference Mobile Robots (ECMR), pp.
1–6. IEEE (2015)
10. Szafir, D., Mutlu, B., Fong, T.: Communicating directionality in flying robots. In: Proceed-
ings of 10th Annual ACM/IEEE International Conference Human-Robot Interaction, pp.
19–26. IEEE (2015)
11. De Graaf, M.M.A., Malle, B.F.: People’s explanations of robot behavior subtly reveal men-
tal state inferences. In: 2019 14th ACM/IEEE International Conference on Human-Robot
Interaction (HRI), pp. 239–248. IEEE (2019)
12. Kim, L.H., Follmer, S.: Generating legible and glanceable swarm robot motion through tra-
jectory, collective behavior, and pre-attentive processing features. ACM Trans. Hum.-Robot
Interact. (THRI) 10(3), 1–25 (2021)
13. Flash, T., Hogan, N.: The coordination of arm movements: an experimentally confirmed
mathematical model. J. Neurosci. 5(7), 1688–1703 (1985)
14. Glasauer, S., Huber, M., Basili, P., Knoll, A., Brandt, T.: Interacting in time and space: inves-
tigating human-human and human-robot joint action. In: 19th International Symposium in
Robot and Human Interactive Communication, pp. 252–257. IEEE (2010)
15. Villani, V., Capelli, B., Sabattini, L.: Use of virtual reality for the evaluation of human-robot
interaction systems in complex scenarios. In: 2018 27th IEEE International Symposium on
Robot and Human Interactive Communication (RO-MAN), pp. 422–427. IEEE (2018)
16. McHugh, M.L.: The Chi-square test of independence. Biochemia Medica 23(2), 143–149
(2013)
17. Miller, R.G.: Simultaneous Statistical Inference. Springer-Verlag, New York (1981).
https://doi.org/10.1007/978-1-4613-8122-8
The Benefits of Interaction Constraints
in Distributed Autonomous Systems
1 Introduction
robotics using platforms such as the Kilobots with infrared transceivers that are
limited to a physical communication radius of 10 cm [16,20]. However, in this
paper we argue that we ought to consider which agents should be allowed to
interact, rather than merely considering how those agents interact.
A common modelling assumption in distributed multi-agent and multi-robot
systems is the “well-stirred system” assumption: In a well-stirred system each
agent is equally likely to encounter any other agent in the system and, therefore,
each interaction is regarded as an independent event [15]. This equates to a sys-
tem in which the interaction network of the agents is totally connected and edges
are selected at random to simulate the random interactions of different pairs of
agents. However, this assumption does not consider that intentional constraints
placed on agent interactions may yield better performance than if the interac-
tion network is assumed to be totally connected and interaction depends only
upon the physical proximity of the agents. More recently, [18] found that “less is
more” when it comes to communication in a robot swarm and that, by reducing
the physical communication radius of the robots (or the density of the swarm),
the system-level performance improves as a result of there being less communi-
cation overall. While this is the most recent study to highlight the problem of
too much communication, [9, Sect. 3.2] previously demonstrated a similar prob-
lem in which agents became “too cooperative”. This general problem can be
traced further back to [14] in which the authors examined the effect of network
structure on system-level performance. A general survey of recent literature that
considers the connectivity of a multi-agent/robot system conducting collective
exploration is provided in [13].
In previous work we have studied the impact of applying constraints to the
interaction network in a non-physical multi-agent system within a collective
learning setting [5]. We showed that the “less is more” hypothesis holds true
for collective learning when the underlying interaction network’s topology is a
small-world network and that interaction networks with low connectivity lead
to optimal performance for collective learning. However, this study was limited
to studying interaction networks in abstract simulations, without the ability to
consider the physical network layer. Therefore, in this work we propose to study
both the physical and interaction networks in combination, to determine which
layer of connectivity is more important for performance in the development of
distributed autonomous systems.
An outline of the remainder of this paper is as follows: We begin by describing
collective learning as the context of our investigation, before defining a three-
valued model for collective learning. We present results for a totally-connected
interaction network but where the physical network connectivity is varied, so
as to understand the extent to which the physical network has an impact on
system performance. Then, we introduce constraints over the interaction network
by restricting its topology to that of a regular lattice network, before studying
system performance with respect to different levels of connectivity. Finally, we
conclude our findings with some discussions and highlight our main contributions
before detailing directions for future work.
16 M. Crosscombe and J. Lawry
Table 1. The three-valued fusion operator ⊙, applied element-wise to pairs of beliefs.

B(pi) \ B′(pi) |  0  | 1/2 |  1
---------------|-----|-----|-----
      0        |  0  |  0  | 1/2
     1/2       |  0  | 1/2 |  1
      1        | 1/2 |  1  |  1
where ε ∈ [0, 0.5] is a noise parameter denoting the probability that the evidence
is erroneous.
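As a simple illustration of this noise model (a sketch only; the paper's Eq. (2) is not reproduced in this excerpt, and the function name is our own), direct evidence returns the true truth value of a proposition with probability 1 − ε and the flipped, erroneous value with probability ε:

```python
import random

def observe(true_value, eps):
    """Sample direct evidence for a single proposition: returns the
    true truth value (0 or 1) with probability 1 - eps and the
    erroneous (flipped) value with probability eps, eps in [0, 0.5].
    Illustrative sketch; name and signature are our own."""
    assert 0 <= eps <= 0.5
    return 1 - true_value if random.random() < eps else true_value

# With eps = 0 the evidence is always correct.
print(observe(1, 0.0))  # -> 1
```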
Belief Fusion. In collective learning the agents do not exist in isolation. Instead,
individual agents can benefit from being able to interact with, and learn from,
other agents in a shared environment. To this end, we now describe the indirect
learning process by which pairs of agents fuse their beliefs in order to improve
the system’s ability to learn the state of the world.
The belief fusion process provides two crucial benefits: Firstly, agents are able
to propagate information they obtain directly from the environment through the
system by fusing their beliefs with those of other agents. We propose to combine
beliefs using the three-valued fusion operator in Table 1, which is applied element-
wise to all of the propositional variables pi ∈ P. Given two beliefs B and B′,
corresponding to the beliefs of two agents, the fused belief B ⊙ B′ is then given by
the element-wise application of this operator (Eq. (3)).
Secondly, the belief fusion operator helps to correct for errors that occur when
the evidence obtained by the agents is subject to noise. This error-correcting
effect arises when two agents have gathered evidence about the same proposition
pi , independently of one another, and encounter a disagreement. When a pair
of agents differ in their belief about the truth value of pi , then the belief fusion
process leads them both to become uncertain with regards to pi . As an example,
suppose that two agents hold the beliefs B1(pi) = 1 and B2(pi) = 0; then,
upon the agents fusing their beliefs such that B1 ⊙ B2(pi) = 1/2, both agents
will attempt to seek additional evidence about proposition pi , either directly
from the environment or indirectly via fusion with other agents, having become
uncertain about the truth value of pi . Notice that the belief fusion process can
lead to agents becoming less certain when the fusing agents disagree about the
truth value of a given proposition, while evidential updating only leads to agents
becoming more certain.
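The operator in Table 1 and its error-correcting behaviour can be sketched in code as follows (an illustrative implementation; the function names and the use of Python's `fractions` are our own):

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # the "uncertain" truth value

def fuse_value(b1, b2):
    """Fuse two truth values in {0, 1/2, 1} according to Table 1."""
    if b1 == b2:
        return b1                        # agreement is preserved
    if HALF in (b1, b2):
        # 1/2 defers to the more certain value (0 or 1)
        return b1 if b2 == HALF else b2
    return HALF                          # direct conflict (0 vs 1) -> 1/2

def fuse_beliefs(B1, B2):
    """Element-wise fusion over all propositions p_i in P."""
    return [fuse_value(a, b) for a, b in zip(B1, B2)]

# Two agents disagreeing on p_0 both become uncertain about it:
B1 = [Fraction(1), Fraction(0), HALF]
B2 = [Fraction(0), Fraction(0), Fraction(1)]
print(fuse_beliefs(B1, B2))  # conflict -> 1/2; agreement kept; 1/2 defers to 1
```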
4 Simulation Environment
Fig. 2. A snapshot of the simulation during an experiment. In this model two different
types of networks are formed which govern the agent interactions in the system. (a)
A snapshot of the simulation environment during runtime. (b) A zoomed-in view of 7
agents with a system-wide communication radius Cr = 20. (c) A representation of the
physical network that forms from the agents’ communication radii. (d) A representation
of the pre-assigned interaction network between the visible agents. (e) The resulting
communication links that are available according to edges which are present in both
the physical and interaction networks.
world, which can be averaged across the system to give us a general view of
performance. To show this, the colours of the hexagons increase in intensity rep-
resenting the collective certainty about each location as contained in the belief
of the average agent. The different colours are used to represent ocean (blue),
beach (yellow), and low and high terrain (light and dark green), but are purely
visual in this context. The brown hexagon is the initial launch location from
which all agents begin exploring. In the case that the system collectively adopts
the incorrect truth value for a given proposition, the hexagon at its associated
location is coloured red, with its intensity still representing the average certainty
with which the agents have (incorrectly) determined its truth value. At the end
of the simulation (Fig. 1c) the agents have reached a consensus about the state
of the world and the map is in full view, albeit with some errors depicted by red
hexagons.
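The "average agent" view above suggests a simple formalisation. A minimal sketch follows, under our own assumptions: the collective certainty at a location is taken as the mean belief across agents, and the average error as the mean absolute deviation of the agents' beliefs from the true state (the paper's exact error definition is not reproduced in this excerpt):

```python
from fractions import Fraction

def average_belief(beliefs, i):
    """Belief of the 'average agent' about proposition p_i: the mean
    of all agents' beliefs, used here as the collective certainty
    rendered by a hexagon's colour intensity."""
    return sum(B[i] for B in beliefs) / len(beliefs)

def average_error(beliefs, truth):
    """Mean absolute deviation of the agents' beliefs from the true
    state of the world (our assumed formalisation of 'average error')."""
    m, n = len(beliefs), len(truth)
    return sum(abs(B[i] - truth[i]) for B in beliefs for i in range(n)) / (m * n)

beliefs = [[Fraction(1), Fraction(0)], [Fraction(1, 2), Fraction(0)]]
print(average_belief(beliefs, 0))      # -> 3/4
print(average_error(beliefs, [1, 0]))  # -> 1/8
```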
single time step t. Assuming each agent acts as a node in both a physical network
and an interaction network, then the red edges connect pairs of agents which
are physically capable of communicating, based on the agents being within their
communication range with radius Cr . Edges in the physical network are added
or removed during each time step t of the simulation according to which pairs
of agents are within communication range of one another. The yellow edges,
meanwhile, depict the connectivity of the interaction network as determined at
the start of each experiment. Unlike in the physical network, the pairs of agents
which are allowed to interact are immutable, such that no new edges are added,
nor existing edges removed after the network is initialised. The green edges
depict which edges exist in both networks at a given time step t. Each edge then
corresponds to a pair of agents which are both able and allowed to interact with
one another.
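The intersection of the two network layers described above can be sketched as follows (the data layout is assumed: `positions` maps agent ids to coordinates, and the fixed interaction network is a set of frozenset pairs):

```python
import math

def active_links(positions, interaction_edges, Cr):
    """Links usable at the current time step: a pair of agents may
    exchange beliefs only if they are within communication radius Cr
    (physical network, recomputed every step) AND joined by an edge
    of the fixed interaction network."""
    return {
        edge for edge in interaction_edges
        if math.dist(*(positions[a] for a in edge)) <= Cr
    }

positions = {0: (0, 0), 1: (10, 0), 2: (100, 0)}
interaction = {frozenset((0, 1)), frozenset((0, 2))}
print(active_links(positions, interaction, 20))  # only the (0, 1) link survives
```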
An interaction network can be constructed in whatever manner leads to opti-
mal system performance, provided that the agents remain capable of interacting
physically in the environment during their task. The default interaction network
in many multi-agent and multi-robot systems is a totally-connected network. In
this paper we seek to constrain interactions by reducing the connectivity of the
network whilst improving system performance. To do this we shall use regular
lattice networks, having found in previous work that such a topology outperforms
even small-world networks in this setting [5,22].
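A regular (ring) lattice with connectivity k can be constructed as follows (a sketch in the style of the Watts-Strogatz construction [22]; k is assumed even and smaller than the number of agents m):

```python
def ring_lattice(m, k):
    """Regular ring lattice on m nodes: each node is connected to its
    k nearest neighbours (k/2 on either side).  The totally-connected
    case (k = m - 1 with m even) must be built directly instead."""
    edges = set()
    for i in range(m):
        for j in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + j) % m)))
    return edges

print(len(ring_lattice(20, 2)))  # 20 edges: a simple cycle
print(len(ring_lattice(20, 4)))  # 40 edges
```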
For each experimental condition, the system proceeds in the following way: We
initialise m = 20 agents at the same starting location and with total uncertainty
about the n = 126 locations, i.e., each agent begins with the belief B(pi) = 1/2
for pi ∈ P. Every agent is also initialised with the same communication radius
Cr ∈ {20, . . . , 100} and a communication frequency Cf ∈ [0, 1].
After initialisation, each agent selects a proposition pi at random about which
they are uncertain and begins travelling to the associated location in search of
evidence about pi . When they arrive at the location the agent will observe their
environment and obtain evidence E according to Eq. (2), before updating its
belief by Eq. (1). The agent will then select another uncertain proposition pi
to investigate and begin to travel in the direction of the new location. At this
moment, the agent will also enter a communicating state with probability Cf .
If the agent does not enter the communicating state then it continues to its
destination to obtain evidence and the process repeats. If the agent does enter
the communicating state, however, then the agent begins broadcasting its belief
to any agents that come within its communication radius Cr as it travels to its
destination. The agent remains in the communicating state until it encounters
one or more other communicating agents within communication range. Upon
receiving the belief(s) of the other agent(s), the agent will cease communicating
and update its belief B by selecting one of the received beliefs B′ at random before
adopting the fused belief B ⊙ B′ by Eq. (3). If the agent remains uncertain about the
Fig. 3. Average error at steady state against communication radius Cr for increasing
communication frequency Cf ∈ [0, 1]. Each dotted line shows the noise ε ∈ {0, 0.1, 0.3}.
The shaded regions show a 95% confidence interval.
Fig. 4. Network diagrams with different levels of connectivity. (a) Regular lattice net-
work with k = 2. (b) Regular lattice network with k = 4. (c) Totally-connected network.
Fig. 5. Average error at steady state against communication radius Cr for increasing
communication frequency Cf ∈ [0, 1]. Each dotted line shows a different lattice network
with connectivity k ∈ {2, 4, 8, 16}. The shaded regions show a 95% confidence interval.
Fig. 6. Average error against time t for Cr = 20 and Cf = 0.2 for different regular
lattice networks with connectivity k ∈ {2, 4, 8, 16} and a totally-connected network (i.e.
k = 19). Each dotted line shows the noise ε ∈ {0, 0.1, 0.3}. The shaded regions show a
95% confidence interval.
In this paper we have proposed to study the effects of both the physical and
interaction networks on the collective learning performance of a multi-agent sys-
tem. We have demonstrated that, when the underlying interaction network is
totally connected, the physical network has minimal impact on performance in
our model. However, when the connectivity of the interaction network is greatly
reduced the system performs better, achieving lower average error when com-
pared with highly-connected interaction networks under the same experimental
conditions. Limiting the interactions between agents also results in improved
robustness to noise, where increased connectivity at the physical layer only wors-
ens performance for the same limited interaction network. Overall, performance
is best when the connectivity of both the physical and interaction networks is
severely reduced, e.g., for a connectivity of k = 2 and a communication radius
Cr = 20.
To our knowledge, this is the first such study explicitly investigating the
impact of both the physical and interaction networks in combination in a swarm
setting. While it is difficult to draw strong, broad conclusions from this study, we
hope that this will encourage other researchers working on collective behaviours
to consider the importance of the underlying interaction network and its level
of connectivity. Of course, this study is limited to simulation studies of the
interplay between the physical and interaction networks, and a more theoretical
study of this interplay is needed to understand the reasons behind the observed
dynamics.
Up to now we have only considered regular lattice networks and small-world
networks as the basis for interaction networks. These networks possess certain
desirable properties which may be more conducive to the processes involved in
collective learning than other types of networks. Furthermore, we
have only considered the effect of network connectivity in the context of collective
learning. This effect may not necessarily be observed for other kinds of collective
behaviours. As such, we intend to further explore how network connectivity,
and information sharing more generally, impacts different kinds of collective
behaviours in multi-agent systems. We are also interested in understanding how
a more dynamic environment might change the optimal level of connectivity
in the networks, as less connectivity may lead to outdated information being
retained by the population which is likely to affect performance over time.
References
1. Bechon, P., Slotine, J.J.: Synchronization and quorum sensing in a swarm of
humanoid robots. arXiv preprint arXiv:1205.2952 (2012)
2. Brambilla, M., Ferrante, E., Birattari, M., Dorigo, M.: Swarm robotics: a review
from the swarm engineering perspective. Swarm Intell. 7(1), 1–41 (2013)
3. Crosscombe, M., Lawry, J.: A model of multi-agent consensus for vague and uncer-
tain beliefs. Adapt. Behav. 24(4), 249–260 (2016)
4. Crosscombe, M., Lawry, J.: Collective preference learning in the best-of-n problem.
Swarm Intell. 15(1), 145–170 (2021)
5. Crosscombe, M., Lawry, J.: The impact of network connectivity on collective learn-
ing. In: Matsuno, F., Azuma, S., Yamamoto, M. (eds.) DARS 2021. SPAR, vol. 22,
pp. 82–94. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-92790-5_7
6. Douven, I.: Optimizing group learning: an evolutionary computing approach. Artif.
Intell. 275, 235–251 (2019)
7. Douven, I., Kelp, C.: Truth approximation, social epistemology, and opinion
dynamics. Erkenntnis 75(2), 271–283 (2011)
8. Franks, N.R., Richardson, T.: Teaching in tandem-running ants. Nature 439(7073),
153–153 (2006)
9. Hamann, H.: Superlinear scalability in parallel computing and multi-robot systems:
shared resources, collaboration, and network topology. In: Berekovic, M., Buchty,
R., Hamann, H., Koch, D., Pionteck, T. (eds.) ARCS 2018. LNCS, vol. 10793, pp.
31–42. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77610-1_3
10. Heyes, C.M.: Social learning in animals: categories and mechanisms. Biol. Rev.
69(2), 207–231 (1994)
11. Hogg, E., Harvey, D., Hauert, S., Richards, A.: Evolving robust supervisors for
robot swarms in uncertain complex environments. In: Matsuno, F., Azuma, S.,
Yamamoto, M. (eds.) DARS 2021. SPAR, vol. 22, pp. 120–133. Springer, Cham
(2022). https://doi.org/10.1007/978-3-030-92790-5_10
12. Innocente, M.S., Grasso, P.: Self-organising swarms of firefighting drones: Har-
nessing the power of collective intelligence in decentralised multi-robot systems. J.
Comput. Sci. 34, 80–101 (2019)
13. Kwa, H.L., Leong Kit, J., Bouffanais, R.: Balancing collective exploration and
exploitation in multi-agent and multi-robot systems: a review. Front. Robot. AI 8,
771520 (2022)
14. Lazer, D., Friedman, A.: The network structure of exploration and exploitation.
Adm. Sci. Q. 52(4), 667–694 (2007)
15. Parker, C.A., Zhang, H.: Cooperative decision-making in decentralized multiple-
robot systems: the best-of-n problem. IEEE/ASME Trans. Mechatron. 14(2), 240–
251 (2009)
16. Rubenstein, M., Ahler, C., Hoff, N., Cabrera, A., Nagpal, R.: Kilobot: a low cost
robot with scalable operations designed for collective behaviors. Robot. Auton.
Syst. 62(7), 966–975 (2014). Reconfigurable Modular Robotics
17. Schranz, M., Umlauft, M., Sende, M., Elmenreich, W.: Swarm robotic behaviors
and current applications. Front. Robot. AI 7, 36 (2020)
18. Talamali, M.S., Saha, A., Marshall, J.A., Reina, A.: When less is more: robot
swarms adapt better to changes with constrained communication. Sci. Robot.
6(56), eabf1416 (2021)
19. Valentini, G.: Achieving Consensus in Robot Swarms. SCI, vol. 706. Springer,
Cham (2017). https://doi.org/10.1007/978-3-319-53609-5
20. Valentini, G., Ferrante, E., Dorigo, M.: The best-of-n problem in robot swarms:
formalization, state of the art, and novel perspectives. Front. Robot. AI 4, 9 (2017)
21. de Vries, J.: Image processing and noise reduction techniques for thermographic
images from large-scale industrial fires. In: Proceedings of the 12th International
Conference on Quantitative InfraRed Thermography (QIRT) (2014)
22. Watts, D., Strogatz, S.: Collective dynamics of ‘small-world’ networks. Nature 393,
440–442 (1998)
Outlining the Design Space of eXplainable
Swarm (xSwarm): Experts’ Perspective
1 Introduction
Swarm robotics consists of multiple robots that interact with each other to
accomplish a collective task that is beyond the ability of an individual robot, or
that can be performed more effectively by a group of robots [11]. Swarm robotics
promises various applications in many scenarios, such as pollution control,
surveillance, package delivery and firefighting. Although swarms can perform tasks autonomously,
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 28–41, 2024.
https://doi.org/10.1007/978-3-031-51497-5_3
Explainable Swarm 29
human intervention is still crucial for safe and effective deployment in real-
world scenarios [1]. For example, swarm operators may need to manage swarm
tasks that might be beyond swarm capability or intervene in cases of failures
and errors. This intervention could be more complex when it requires multiple
humans to work together in a team to operate remote swarms [10]. For instance,
in a search and rescue scenario where the swarm is responsible for searching and
identifying casualties, the system may consist of two roles: (a) an operator
tasked with scheduling clusters of robots to search areas and (b) an analyst
tasked with analysing images of casualties.
Human-Swarm Interaction (HSI) is an emerging field of research that studies
human factors and swarm robotics to better utilise both humans and swarm
capabilities in a Human-Swarm environment. An essential part of HSI is com-
municating, to humans, the reasons behind swarms’ decisions and behaviour
[25]. As swarms become more advanced, explaining their behaviour (individual
and collective) is buried in increasingly complex communication between swarm
agents and underlying black-box machine learning models.
Explainable Artificial Intelligence (XAI) is a response to this direction of re-
search. XAI is a domain of research that is interested in making the behaviour or
the output of intelligent systems more intelligible to humans by providing expla-
nations [18]. While the concept of XAI is gaining increasing attention in several
research domains, such as Agent Planning [3], Machine Learning [22] and Motion
Planning [6], there is a clear gap in the literature when it comes to integrating
swarm robotics into the research and development of XAI. In line with other XAI
fields of research,
we argue that eXplainable Swarm (xSwarm) should address that an explanation
is an answer to an unexpected event or a failure of the system [18]. However,
compared to different XAI domains, xSwarm can be significantly different since
the decision made by a swarm significantly depends on the communication and
collective contribution of its agents [25]. Further, the explainability of swarms
requires explaining the emergent collective behaviour, which is not necessarily
the sum of the individual robots' behaviours [5].
Many studies have discussed the need for XAI methods and models for Human-
Swarm environments to improve trust and maintain situational awareness, e.g.,
[1,20,25]. However, at the current stage of research, it remains unclear what kinds
of explanations a swarm system should be expected to generate and what questions
an explainable swarm is expected to answer. This paper seeks to conceptualise
the concept of xSwarm and introduce a new taxonomy of explainability require-
ments for HSI. Our results can guide the requirements elicitation for explanation
methods and models for Human-Swarm teaming. These insights contribute to
operationalising the concept of xSwarm in reality and can serve as a resource
for researchers and developers interested in designing explainable Human-Swarm
interfaces.
In this paper, we limit our research to swarm robotics, i.e., this paper does
not address multi-robot systems that have explicit goals and whose agents execute
both individual and group plans. We follow a similar approach proposed in recent
publications to outline explainability requirements in a given problem [6,15]. Our
30 M. Naiseh et al.
3.1 Participants
Thirty-two participants voluntarily accessed our online questionnaire. Partici-
pants were approached through two main mailing lists: multi-agent1 and TAS-
Hub2 mailing lists. Four participants were excluded because they identified
themselves as having less than one year of experience working in swarm robotics.
Two more participants were excluded because they chose not to complete the
questionnaire, leaving us with twenty-six experts. 46.2% of our participants were
researchers, 7.7% were lecturers, 23.1% were PhD students, and 22.9% held
industrial roles. All experts had more than one year of experience working in swarm
robotics: 61.5% of participants had 1–5 years of experience, 23.1% had 5–10 years,
7.7% had 10–15 years and 7.7% exceeded 15 years of experience in the field.
experts (researchers from inside and outside the research group) to ensure
that the question formulation could achieve the research goal and be understandable
to study participants. The questionnaire starts by asking participants to read
the participant information sheet, which outlines the motivation of the study,
provides background material about the research goal and problem, and describes
the questionnaire structure. The information sheet
also contains background material about explainable AI to minimise any avail-
ability bias [19]. Then participants were directed toward the main questionnaire.
The questionnaire has four main sections. Table 1 provides a brief description of
the questionnaire structure and probe types used in each section.
3.3 Analysis
We followed the content analysis method [12]. Our data set included participants’
answers to open-ended questions with themes extracted across perspectives. The
analysis was conducted by the first author. To increase the trustworthiness of our
qualitative analysis, we followed an iterative process across the research team.
This iterative procedure was implemented to check and confirm that the taxon-
omy is reflected in the responses of the participants. The authors met multiple
times during each stage of the research to ensure the accurate interpretation
of each category and the supporting evidence. These discussions and meetings
4 Results
Our results are divided into two parts. We start by discussing the general expla-
nation categories emergent in participants’ answers, which outline the design
space of xSwarm. We then discuss the main challenges faced by our experts to
integrate explainability in swarm systems. Table 2 shows a sample of partici-
pants’ answers. More examples can be found in Appendix A - Table A.2.
Table 2. Explanation categories for swarm robotics. The Explain column refers to
participants' explanations of swarm failures or unexpected behaviour; the Debug
column presents participants' questions to debug swarm failures or unexpected
behaviour; Freq. refers to the number of times each category appeared in the data set.
3 For anonymity, individual participants are referred to as Pn (e.g. P18) in this chapter.
Hardware. Even though the quality and robustness of the hardware are increas-
ing, hardware failure is still quite common. Participants often explained the failure
of the swarm with reference to a hardware problem. For instance, when the swarm
was not able to detect all the casualties in the area, P17 commented: “because
of faulty in the sensors”. Our participants also discussed that explanations of
hardware are necessary to give human operators actionable explanations to either
better utilise the swarm or to improve the swarm performance, e.g., send more
swarm agents to the location. There was also a common pattern among participants'
feedback of explaining swarm behaviour or failure in terms of hardware boundaries
and limitations, as well as environmental and training-data limitations. For
instance, P11 explained unexpected behaviour in Fig. 4.1: “Because
some agents have limited speed capabilities, so they are waiting for each other”,
and similarly P14 explained Fig. 4.2 failure, “the swarm sensors have low image
input”.
Architecture and Design. Participants also found many reasons for swarm
failures based on the swarm architecture or design decisions of the system. This
category did not appear frequently in the data, and participants mentioned five
explanations and six questions. Participants recognised that potential failure or
unexpected behaviour can be related to initial design decisions made by the
systems engineers, e.g., P12 answered: “Because the swarm has not got enough
training in such an environment” and similarly, P22 commented: “Because the
transition probability between states is fixed”. Participants also took a broader
view of swarm explainability when they discussed questions that can be answered
through descriptive information about the swarm system design. For instance, P1
and P4 asked: “How is the swarm size selected?” and “What is the design method?
Is it behaviour-based or automatic design?”. Currently, these kinds of explanations
are discussed in the XAI literature as an on-boarding activity where human
operators are introduced to the system design architecture to understand its
capability and limitation [8]. These global properties of the system could help
inform the interpretation of swarm behaviour and predictions during critical
decision-making scenarios.
ical studies discussed this issue and showed that erroneous actions taken by the
human operator to progress toward the overall goal will accumulate if those actions
are based on outdated swarm explanations or information [27]. Future xSwarm
research may need to consider discovering mitigation strategies to break the error
propagation cycle that may emerge from outdated explanations.
5 Conclusion
Acknowledgments. This work was supported by the Engineering and Physical Sci-
ences Research Council (EP/V00784X/1).
References
1. Agrawal, A., Cleland-Huang, J.: Explaining autonomous decisions in swarms of
human-on-the-loop small unmanned aerial systems. In: Proceedings of the AAAI
Conference on Human Computation and Crowdsourcing, vol. 9, pp. 15–26 (2021)
2. Almagor, S., Lahijanian, M.: Explainable multi agent path finding. In: AAMAS
(2020)
3. Anjomshoae, S., Najjar, A., Calvaresi, D., Främling, K.: Explainable agents and
robots: results from a systematic literature review. In: 18th International Con-
ference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Mon-
treal, Canada, 13–17 May 2019, pp. 1078–1088. International Foundation for
Autonomous Agents and Multiagent Systems (2019)
4. Atkinson, K., Bench-Capon, T., Bollegala, D.: Explanation in AI and law: past,
present and future. Artif. Intell. 289, 103387 (2020)
5. Brambilla, M., Ferrante, E., Birattari, M., Dorigo, M.: Swarm robotics: a review
from the swarm engineering perspective. Swarm Intell. 7(1), 1–41 (2013)
6. Brandao, M., Canal, G., Krivić, S., Luff, P., Coles, A.: How experts explain motion
planner output: a preliminary user-study to inform the design of explainable plan-
ners. In: 2021 30th IEEE International Conference on Robot & Human Interactive
Communication (RO-MAN), pp. 299–306. IEEE (2021)
7. Buçinca, Z., Malaya, M.B., Gajos, K.Z.: To trust or to think: cognitive forcing
functions can reduce overreliance on AI in AI-assisted decision-making. Proc. ACM
Hum.-Comput. Interact. 5(CSCW1), 1–21 (2021)
8. Cai, C.J., Winter, S., Steiner, D., Wilcox, L., Terry, M.: “hello AI”: uncovering the
onboarding needs of medical practitioners for human-AI collaborative decision-
making. Proc. ACM Hum.-Comput. Interact. 3(CSCW), 1–24 (2019)
9. Chakraborti, T., Sreedharan, S., Zhang, Y., Kambhampati, S.: Plan explanations
as model reconciliation: moving beyond explanation as soliloquy. arXiv preprint
arXiv:1701.08317 (2017)
10. Clark, J.R., et al.: Industry led use-case development for human-swarm operations.
arXiv preprint arXiv:2207.09543 (2022)
11. Divband Soorati, M., Clark, J., Ghofrani, J., Tarapore, D., Ramchurn, S.D.:
Designing a user-centered interaction interface for human-swarm teaming. Drones
5(4), 131 (2021)
12. Elo, S., Kyngäs, H.: The qualitative content analysis process. J. Adv. Nurs. 62(1),
107–115 (2008)
13. Klein, G.A., Calderwood, R., Macgregor, D.: Critical decision method for eliciting
knowledge. IEEE Trans. Syst. Man Cybern. 19(3), 462–472 (1989)
14. Kraus, S., et al.: AI for explaining decisions in multi-agent environments. In: Pro-
ceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 13534–13538
(2020)
15. Liao, Q.V., Gruen, D., Miller, S.: Questioning the AI: informing design practices
for explainable AI user experiences. In: Proceedings of the 2020 CHI Conference
on Human Factors in Computing Systems, pp. 1–15 (2020)
16. Lundberg, S.M., et al.: From local explanations to global understanding with
explainable AI for trees. Nat. Mach. Intell. 2(1), 56–67 (2020)
17. McGuirk, P.M., O’Neill, P.: Using questionnaires in qualitative human geography
(2016)
18. Miller, T.: Explanation in artificial intelligence: insights from the social sciences.
Artif. Intell. 267, 1–38 (2019)
19. Morgan, M.G.: Use (and abuse) of expert elicitation in support of decision making
for public policy. Proc. Natl. Acad. Sci. 111(20), 7176–7184 (2014)
20. Mualla, Y., et al.: The quest of parsimonious XAI: a human-agent architecture for
explanation formulation. Artif. Intell. 302, 103573 (2022)
21. Mueller, S.T., Hoffman, R.R., Clancey, W., Emrey, A., Klein, G.: Explanation in
human-AI systems: a literature meta-review, synopsis of key ideas and publications,
and bibliography for explainable AI. arXiv preprint arXiv:1902.01876 (2019)
22. Naiseh, M., Cemiloglu, D., Al Thani, D., Jiang, N., Ali, R.: Explainable recommen-
dations and calibrated trust: two systematic user errors. Computer 54(10), 28–37
(2021)
23. Naiseh, M., Jiang, N., Ma, J., Ali, R.: Personalising explainable recommendations:
literature and conceptualisation. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo,
S., Orovic, I., Moreira, F. (eds.) WorldCIST 2020. AISC, vol. 1160, pp. 518–533.
Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45691-7_49
24. Poli, R., Kennedy, J., Blackwell, T.: Particle swarm optimization. Swarm Intell.
1(1), 33–57 (2007)
25. Roundtree, K.A., Goodrich, M.A., Adams, J.A.: Transparency: transitioning from
human-machine systems to human-swarm systems. J. Cogn. Eng. Decis. Making
13(3), 171–195 (2019)
26. Sokol, K., Flach, P.: Explainability fact sheets: a framework for systematic assess-
ment of explainable approaches. In: Proceedings of the 2020 Conference on Fair-
ness, Accountability, and Transparency, pp. 56–67 (2020)
27. Walker, P., Nunnally, S., Lewis, M., Kolling, A., Chakraborty, N., Sycara, K.:
Neglect benevolence in human control of swarms in the presence of latency. In:
2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC),
pp. 3009–3014. IEEE (2012)
VMAS: A Vectorized Multi-agent
Simulator for Collective Robot Learning
1 Introduction
Many real-world problems require the coordination of multiple robots to be solved.
However, coordination problems are commonly computationally hard. Examples
include path-planning [14], task assignment [25], and area coverage [39]. While
exact solutions exist, their complexity grows exponentially in the number of
robots [3]. Metaheuristics [6] provide fast and scalable solutions, but lack
optimality guarantees. Multi-Agent Reinforcement Learning (MARL) can be used as a
scalable approach to find near-optimal solutions to these problems [34]. In MARL,
agents trained in simulation collect experiences by interacting with the environment
and train their policies (typically represented by deep neural networks)
through a reward signal.
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 42–56, 2024.
https://doi.org/10.1007/978-3-031-51497-5_4
However, current MARL approaches present several issues. Firstly, the training
phase can require significant time to converge to optimal behavior. This is
partly due to the low sample efficiency of the algorithms, and partly to the
computational cost of the simulator. Secondly, current benchmarks are
specific to a predefined task and mostly tackle unrealistic videogame-like
scenarios [28,31], far from real-world multi-robot problems. This makes research in
this area fragmented, with a new simulation framework being implemented for
each new task introduced. Multi-robot simulators, on the other hand, are
more general, but their high fidelity and full-stack simulation result in slow
performance, preventing their applicability to MARL. Full-stack learning can
significantly hinder training performance. Learning can be made more sample-
efficient if simulation is used to solve high-level multi-robot coordination prob-
lems, while leaving low-level robotic control to first-principles-based methods.
Motivated by these reasons, we introduce VMAS, a vectorized multi-agent
simulator. VMAS is a vectorized 2D physics simulator written in PyTorch [22],
designed for efficient MARL benchmarking. It simulates agents and landmarks
44 M. Bettini et al.
of different shapes and supports torque, elastic collisions and custom gravity.
Holonomic motion models are used for the agents to simplify simulation. Vec-
torization in PyTorch allows VMAS to perform simulations in a batch, seamlessly
scaling to tens of thousands of parallel environments on accelerated hardware.
With the term GPU vectorization, we refer to the Single Instruction Multiple
Data (SIMD) execution paradigm available inside a GPU warp. This paradigm
permits executing the same instruction on a set of parallel simulations in a
batch. VMAS has an interface compatible with OpenAI Gym [7] and with the
RLlib library [15], enabling out-of-the-box integration with a wide range of RL
algorithms. VMAS also provides a framework to easily implement custom multi-
robot scenarios. Using this framework, we introduce a set of 12 multi-robot
scenarios representing difficult learning problems. Additional scenarios can be
implemented through a simple and modular interface. We vectorize and port all
scenarios from OpenAI MPE [16] in VMAS. We benchmark four of VMAS’s new
scenarios using three MARL algorithms based on Proximal Policy Optimization
(PPO) [29]. We show the benefits of vectorization by benchmarking our scenar-
ios in the RLlib [15] library. Our scenarios prove to challenge state-of-the-art
MARL algorithms in complementary ways.
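The batched SIMD-style stepping described above can be sketched with plain PyTorch tensor operations. The holonomic update below, and the `dt` and `drag` constants, are illustrative stand-ins, not VMAS's actual integrator code:

```python
import torch

def step_batch(pos, vel, actions, dt=0.1, drag=0.25):
    """Advance every environment in the batch by one step.

    pos, vel, actions have shape (num_envs, num_agents, 2), so a single
    tensor expression steps all parallel environments at once: the same
    instruction applied to batched data, as in SIMD execution.
    """
    vel = (1.0 - drag) * vel + actions * dt  # damped holonomic velocity update
    pos = pos + vel * dt                     # Euler position integration
    return pos, vel

# Thousands of environments step in one call, on CPU or GPU alike.
num_envs, num_agents = 4096, 3
pos = torch.zeros(num_envs, num_agents, 2)
vel = torch.zeros(num_envs, num_agents, 2)
actions = torch.ones(num_envs, num_agents, 2)
pos, vel = step_batch(pos, vel, actions)
```

Moving the same tensors to accelerated hardware scales the batch without any code change, which is the kind of speed-up measured later in the paper.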
Contributions. We now list the main contributions of this work:
– We introduce the VMAS framework, a vectorized multi-agent simulator
that enables MARL training at scale. VMAS supports inter-agent com-
munication and customizable sensors, such as LIDARs.
– We implement a set of twelve multi-robot scenarios in VMAS, which focus on
testing different collective learning challenges including: behavioural hetero-
geneity, coordination through communication, and adversarial interaction.
– We port and vectorize all scenarios from OpenAI MPE [16] into VMAS and
run a performance comparison between the two simulators. We demonstrate
the benefits of vectorization in terms of simulation speed, showing that VMAS
is up to 100× faster than MPE.
The VMAS codebase is available at https://github.com/proroklab/VectorizedMultiAgentParticleSimulator.
2 Related Work
In this section, we review the related literature in the fields of multi-agent and
multi-robot simulation, highlighting the core gaps of each field. Furthermore, we
compare the most relevant simulation frameworks with VMAS in Table 1.
Multi-agent Reinforcement Learning Environments. A significant
amount of work exists in the context of MARL to address the issues of
multi-robot simulation for learning hard coordination strategies. Realistic GPU-
accelerated simulators and engines have been proposed. Isaac [17] is a proprietary
NVIDIA simulator used for realistic robotic simulation in reinforcement learning.
Instead of using environment vectorization to accelerate learning, it uses concur-
rent execution of multiple training environments in the same simulation instance.
Table 1. Comparison of the most relevant simulation frameworks with VMAS.

Vector^a State^b Comm^c Action^d PhysEng^e #Agents^f Gen^g Ext^h MRob^i MARL^j RLlib^k
Brax [8] ✓ C ✗ C 3D < 10 ✓ ✓ ✗ ✗ ✗
MPE [16] ✗ C C+D C+D 2D < 100 ✓ ✓ ✗ ✓ ✓
MAgent [38] ✗ D ✗ D ✗ > 1000 ✗ ✗ ✗ ✓ ✓
MA-Learning-Environments [10] ✗ D ✗ D ✗ < 10 ✓ ✗ ✓ ✓ ✗
MA-Emergence-Environments [2] ✗ C ✗ C+D 3D < 10 ✗ ✗ ✗ ✓ ✗
Pommerman [26] ✗ D ✗ D ✗ < 10 ✗ ✗ ✗ ✓ ✗
SMAC [28] ✗ C ✗ D ✗ < 100 ✗ ✓ ✗ ✓ ✓
Neural-MMO [31] ✗ C ✗ C+D ✗ < 1000 ✗ ✓ ✗ ✓ ✓
Google research football [12] ✗ C ✗ D 2D < 100 ✗ ✓ ✗ ✓ ✓
gym-pybullet-drones [21] ✗ C ✗ C 3D < 100 ✗ ✓ ✓ ✓ ✓
Particle robots simulator [30] ✗ C ✗ C+D 2D < 100 ✗ ✓ ✓ ✓ ✗
MAMujoco [23] ✗ C ✗ C 3D < 10 ✗ ✗ ✗ ✓ ✗
Gazebo [11] ✗ C C+D C+D 3D < 10 ✓ ✓ ✓ ✗ ✗
Webots [18] ✗ C C+D C+D 3D < 10 ✓ ✓ ✓ ✗ ✗
ARGOS [24] ✗ C C+D C+D 2D&3D < 1000 ✓ ✓ ✓ ✗ ✗
VMAS ✓ C C+D C+D 2D < 100 ✓ ✓ ✓ ✓ ✓

^a Vectorized
^b Continuous state (C) or discrete state/grid world (D)
^c Continuous communication (C) or discrete communication (D) inside the simulator
^d Continuous actions (C) or discrete actions (D)
^e Type of physics engine
^f Number of agents supported
^g General purpose simulator: any type of task can be created
^h Extensibility (API for creating new scenarios)
^i Contains multi-robot tasks
^j Made for Multi-Agent Reinforcement Learning (MARL)
^k Compatible with RLlib framework [15]
Each MARL training iteration is composed of a phase of simulated team rollouts and a policy update.
iteration k, simulations are performed to collect experiences from the agents’
interactions with the environment according to their policy πk . The collected
experiences are then used to update the team policy. The new policy πk+1 will
be employed in the rollout phase of the next training iteration. The rollout phase
usually constitutes the bottleneck of this process. Vectorization allows parallel
simulation and helps alleviate this issue.
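The rollout/update alternation can be sketched as follows. The policy, environment, and "update" here are toy stand-ins chosen only to show the structure of a training iteration (rollouts under a frozen policy πk, then an update producing πk+1), not a real learner:

```python
def rollout(policy, env_step, state, horizon):
    """Rollout phase: collect experiences with the frozen policy pi_k."""
    experiences = []
    for _ in range(horizon):
        action = policy(state)
        state, reward = env_step(state, action)
        experiences.append((state, action, reward))
    return experiences, state

def train(num_iterations=3, horizon=5):
    """Alternate rollouts and policy updates over training iterations."""
    theta, state = 0.0, 0.0
    for _ in range(num_iterations):
        policy = lambda s, th=theta: th - s          # pi_k, frozen for the rollout
        env_step = lambda s, a: (s + a, -abs(s + a - 1.0))  # toy environment
        experiences, state = rollout(policy, env_step, state, horizon)
        theta += 0.5 * (1.0 - theta)                 # toy "update" giving pi_{k+1}
    return theta
```

Because the rollout loop dominates wall-clock time, replacing the single `env_step` with a batched, vectorized step is where the speed-up comes from.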
Inspired by the modularity of some existing solutions, like MPE [16], we cre-
ated our framework as a new scalable platform for running and creating MARL
benchmarks. With this goal in mind, we developed VMAS following a set of
tenets:
– Vectorized. VMAS vectorization can step any number of environments in
parallel. This significantly reduces the time needed to collect rollouts for train-
ing in MARL.
– Simple. Complex vectorized physics engines exist (e.g., Brax [8]), but they
do not scale efficiently when dealing with multiple agents. This defeats the
computational speed goal set by vectorization. VMAS uses a simple custom
2D dynamics engine written in PyTorch to provide fast simulation.
– General. The core of VMAS is structured so that it can be used to implement
general high-level multi-robot problems in 2D. It can support adversarial as
well as cooperative scenarios. Holonomic robot simulation shifts focus to high-
level coordination, obviating the need to learn low-level controls using MARL.
– Extensible. VMAS is not just a simulator with a set of environments. It
is a framework that can be used to create new multi-agent scenarios in a
format that is usable by the whole MARL community. For this purpose, we
have modularized our framework to enable new task creation and introduced
interactive rendering to debug scenarios.
– Compatible. VMAS has multiple wrappers which make it directly compati-
ble with different MARL interfaces, including RLlib [15] and Gym [7]. RLlib
has a large number of already implemented RL algorithms.
Let us break down VMAS’s structure in depth.
Interface. The structure of VMAS is illustrated in Fig. 2. It has a vector-
ized interface, which means that an arbitrary number of environments can be
stepped in parallel in a batch. In Sec. 5, we demonstrate how vectorization grants
important speed-ups on the CPU and seamless scaling on the GPU. While the
standard simulator interface uses PyTorch [22] to enable feeding tensors directly
as input/output, we provide wrappers for the standard non-vectorized OpenAI
Gym [7] interface and for the vectorized interface of the RLlib [15] framework.
This enables users to effortlessly access the range of RL training algorithms
already available in RLlib. Actions for all environments and agents are fed to
VMAS for every simulation step. VMAS supports movement and inter-agent
communication actions, both of which can be either continuous or discrete. The
interface of VMAS provides rendering through Pyglet [1].
Scenario. Scenarios encode the multi-agent task that the team is trying to
solve. Custom scenarios can be implemented in a few hours and debugged using interactive rendering.
Fig. 2. VMAS structure. VMAS has a vectorized MARL interface (left) with wrappers
for compatibility with OpenAI Gym [7] and the RLlib RL library [15]. The default
VMAS interface uses PyTorch [22] and can be used for feeding input already on the
GPU. Multi-agent tasks in VMAS are defined as scenarios (center). To define a scenario,
it is sufficient to implement the listed functions. Scenarios access the VMAS core (right),
where agents and landmarks are simulated in the world using a 2D custom written
physics module.
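A scenario skeleton along the lines of Fig. 2 might look as follows. The method names and bodies are illustrative stand-ins, not the exact VMAS interface (see the repository for the real one):

```python
class MyScenario:
    """Sketch of a VMAS-style scenario; method names are illustrative."""

    def make_world(self, batch_dim):
        # Create agents and landmarks once; batch_dim is the number of
        # parallel environments the world is vectorized over.
        self.batch_dim = batch_dim
        self.goal = 1.0
        return self

    def reset_world(self):
        # Re-sample initial states for all environments in the batch.
        self.agent_pos = [0.0 for _ in range(self.batch_dim)]

    def observation(self, env_index):
        # Minimal observation: agent position plus the goal position.
        return [self.agent_pos[env_index], self.goal]

    def reward(self, env_index):
        # Negative distance to the goal.
        return -abs(self.agent_pos[env_index] - self.goal)
```

In VMAS itself these quantities are batched tensors rather than Python lists, so a single call serves all parallel environments.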
$$
f_{ij}(t) =
\begin{cases}
c\,\dfrac{x_{ij}(t)}{\|x_{ij}(t)\|}\,k \log\!\left(1 + e^{-\left(\|x_{ij}(t)\| - d_{\min}\right)/k}\right) & \text{if } \|x_{ij}(t)\| \le d_{\min} \\
0 & \text{otherwise}
\end{cases}
\quad (2)
$$
Here, c is a parameter regulating the force intensity. xij is the relative position
between the closest points on the shapes of the two entities. dmin is the minimum
distance allowable between them. The term inside the logarithm computes a
scalar proportional to the penetration of the two entities, parameterized by a
coefficient k. This term is then multiplied by the normalized relative position
vector. Collision intensity and penetration can be tuned by regulating c and k.
This is the same collision system used in OpenAI MPE [16].
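A scalar sketch of the collision force of Eq. (2), written in plain Python for readability (VMAS computes it over batched tensors); the default constants are illustrative, not VMAS's actual values:

```python
import math

def collision_force(x_ij, d_min=0.2, c=1.0, k=0.1):
    """Soft collision force between two entities, following Eq. (2).

    x_ij: 2D relative position between the closest points of the two shapes.
    Returns a force along the normalized x_ij whose magnitude is a
    logarithmic function of the penetration depth, scaled by c.
    """
    dist = math.hypot(x_ij[0], x_ij[1])
    if dist > d_min:
        return (0.0, 0.0)  # entities not in contact: no force
    # Penetration term: k * log(1 + exp(-(dist - d_min) / k))
    penetration = k * math.log1p(math.exp(-(dist - d_min) / k))
    scale = c * penetration / max(dist, 1e-8)  # c times the unit direction
    return (scale * x_ij[0], scale * x_ij[1])
```

Increasing `c` stiffens the contact, while `k` controls how sharply the force grows with penetration, matching the tuning knobs described above.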
The simulation step used for the linear state is also applied to the angular
state:
$$
\tau_i(t) = \sum_{j \in N \setminus \{i\}} r^{e}_{ij}(t) \times f_{ij}(t), \qquad
\dot{\theta}_i(t+1) = (1 - \zeta)\,\dot{\theta}_i(t) + \frac{\tau_i(t)}{I_i}\,\delta t. \quad (3)
$$
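The angular update of Eq. (3) in scalar form: each torque contribution is the 2D cross product of a contact lever arm with its collision force. Again, this is a plain-Python sketch of the batched computation, with illustrative constants:

```python
def cross2d(r, f):
    """Scalar 2D cross product r x f: the torque of force f applied
    at lever arm r."""
    return r[0] * f[1] - r[1] * f[0]

def angular_step(theta_dot, torques, inertia=1.0, zeta=0.25, dt=0.1):
    """One Euler step of Eq. (3): damped angular velocity plus the net
    torque tau_i (summed over colliding neighbors) divided by I_i."""
    tau = sum(torques)
    return (1.0 - zeta) * theta_dot + (tau / inertia) * dt
```

Here `zeta` plays the same damping role for the angular state that drag plays for the linear one.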
4 Multi-robot Scenarios
Alongside VMAS, we introduce a set of 12 multi-robot scenarios. These scenar-
ios contain various multi-robot problems, which require complex coordination—
like leveraging heterogeneous behaviour and inter-agent communication—to be
solved. While the ability to send communication actions is not used in these
scenarios, communication can be used in the policy to improve performance.
For example, Graph Neural Networks (GNNs) can be used to overcome partial
observability through information sharing [4].
Each scenario delimits the agents’ input by defining the set of their obser-
vations. This set typically contains the minimum observation needed to solve
the task (e.g., position, velocity, sensory input, goal position). Scenarios can be
made arbitrarily harder or easier by modifying these observations. For example,
if the agents are trying to transport a package, the precise relative distance to
the package can be removed from the agent inputs and replaced with LIDAR
measurements. Removing global observations from a scenario is a good incentive
for inter-agent communication.
All tasks contain numerous parametrizable components. Every scenario
comes with a set of tests, which run a local heuristic on all agents. Furthermore,
we vectorize and port all 9 scenarios from MPE [16] to VMAS. In this section,
we give a brief overview of our new scenarios. In-depth descriptions (e.g., observation
space, reward) can be found in the VMAS repository.
Transport (Fig. 1a). N agents have to push M packages to a goal. Packages
have a customizable mass and shape. Single agents are not able to move a high-
mass package by themselves. Cooperation with teammates is thus needed to
solve the task.
Wheel (Fig. 1b). N agents have to collectively rotate a line. The line is
anchored to the origin and has a parametrizable mass and length. The team’s
goal is to bring the line to a desired angular velocity. Lines with a high mass are
impossible to push for single agents. Therefore, the team has to organize with
agents on both sides to increase and reduce the line’s velocity.
Balance (Fig. 1c). N agents are spawned at the bottom of a world with ver-
tical gravity. A line is spawned on top of them. The agents have to transport a
spherical package, positioned randomly on top of the line, to a given goal at the
top. The package has a parametrizable mass and the line can rotate.
Give Way (Fig. 1d). Two agents start in front of each other’s goals in a
symmetric environment. To solve the task, one agent has to give way to the
other by using a narrow space in the middle of the environment.
Football (Fig. 1e). A team of N blue agents competes against a team of M
red agents to score a goal. By default, red agents are controlled by a heuristic
AI, but self-play is also possible. Cooperation among teammates is required to
coordinate attacking and defensive maneuvers. Agents need to communicate and
assume different behavioural roles in order to solve the task.
Passage (Fig. 1f). 5 agents, starting in a cross formation, have to reproduce
the same formation on the other side of a barrier. The barrier has M passages
(M = 1 in the figure). Agents are penalized for colliding with each other and
with the barrier. This scenario is a generalization of the one considered in [4].
Reverse Transport (Fig. 1g). This task is the same as Transport, except only
one package is present. Agents are spawned inside of it and need to push it to
the goal.
Dispersion (Fig. 1h). There are N agents and N food particles. Agents start
in the same position and need to cooperatively eat all food. Most MARL algo-
rithms cannot solve this task (without communication or observations from
other agents) as they are constrained by the behavioural homogeneity deriving from
parameter sharing.
Fig. 3. Comparison of the scalability of VMAS and MPE [16] in the number of parallel
environments. In this plot, we show the execution time of the “simple spread” scenario
for 100 steps. MPE does not support vectorization and thus cannot be run on a GPU.
Experiments are run in RLlib [15] using the vectorized interface. We run all
algorithms for 400 training iterations. Each training iteration is performed over
60,000 environment interactions. We plot the mean and standard deviation of the
mean episode reward² over 10 runs with different seeds. The model used for all
critics and actors is a two layer Multi Layer Perceptron (MLP) with hyperbolic
tangent activations. A video of the learned policies is available at https://youtu.be/aaDRYfiesAY. In the following, we discuss the results for the trained
scenarios.
Transport (Fig. 4a). In the Transport environment, only IPPO is able to learn
the optimal policy. This is because the other models, which have centralized com-
ponents, have an input space consisting of the concatenation of all the agents’
observations. Consequently, centralized architectures fail to generalize in envi-
ronments requiring a high initial exploration like this one, where there is a high
variance in possible joint states (and therefore there is a low probability that a
similar state will be encountered).
² The episode reward mean is the mean of the total rewards of the episodes contained in the training iteration.
Wheel (Fig. 4b). The Wheel environment proved to be a hard task for MARL
algorithms: none of the models was able to solve the task, and all performed worse
than the heuristic.
Balance (Fig. 4c). In Balance, all models were able to solve the task and out-
perform the heuristic. However, this is largely due to the use of a large observation
space containing global information. The task can be made arbitrarily harder by
removing part of the observation space, thus increasing partial observability.
Give Way (Fig. 4d). The Give Way scenario shows that only algorithms
able to develop heterogeneous agent behaviour can solve the task. In
fact, IPPO and MAPPO, which use parameter sharing and decentralized actors,
fail in this scenario. On the other hand, the scenario can be solved
either through a centralized actor (CPPO) or by disabling parameter sharing
and allowing agent policies to be heterogeneous (HetIPPO).
The experimental results confirm that VMAS offers a selection of scenarios
that challenge current state-of-the-art MARL algorithms in orthogonal ways.
We show that no one-size-fits-all solution exists and that
our scenarios can provide a valuable benchmark for new MARL algorithms. In
addition, vectorization enables faster training, which is key to a wider adoption
of multi-agent learning in the robotics community.
7 Conclusion
References
1. Pyglet. https://pyglet.org/
2. Baker, B., et al.: Emergent tool use from multi-agent autocurricula. In: Interna-
tional Conference on Learning Representations (2019)
3. Bernstein, D.S., Givan, R., Immerman, N., Zilberstein, S.: The complexity of decen-
tralized control of markov decision processes. Math. Oper. Res. 27(4), 819–840
(2002)
4. Blumenkamp, J., Morad, S., Gielis, J., Li, Q., Prorok, A.: A framework for real-
world multi-robot systems running decentralized gnn-based policies. arXiv preprint
arXiv:2111.01777 (2021)
5. Bradbury, J., et al.: JAX: composable transformations of Python+NumPy pro-
grams (2018). http://github.com/google/jax
6. Bräysy, O., Gendreau, M.: Vehicle routing problem with time windows, Part II:
Metaheuristics. Transp. Sci. 39(1), 119–139 (2005)
7. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
8. Freeman, C.D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I., Bachem, O.: Brax -
a differentiable physics engine for large scale rigid body simulation (2021). http://
github.com/google/brax
9. Ijspeert, A.J., Martinoli, A., Billard, A., Gambardella, L.M.: Collaboration through
the exploitation of local interactions in autonomous collective robotics: the stick
pulling experiment. Auton. Robot. 11(2), 149–171 (2001)
10. Jiang, S., Amato, C.: Multi-agent reinforcement learning with directed exploration
and selective memory reuse. In: Proceedings of the 36th Annual ACM Symposium
on Applied Computing, pp. 777–784 (2021)
11. Koenig, N., Howard, A.: Design and use paradigms for gazebo, an open-source
multi-robot simulator. In: 2004 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), pp. 2149–2154. IEEE
(2004)
12. Kurach, K., et al.: Google research football: a novel reinforcement learning envi-
ronment. Proc. AAAI Conf. Artif. Intell. 34, 4501–4510 (2020)
13. Lange, R.T.: gymnax: A JAX-based reinforcement learning environment library
(2022). http://github.com/RobertTLange/gymnax
14. Li, Q., Gama, F., Ribeiro, A., Prorok, A.: Graph neural networks for decentral-
ized multi-robot path planning. In: 2020 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), pp. 11785–11792. IEEE (2020)
15. Liang, E., et al.: RLlib: abstractions for distributed reinforcement learning. In:
International Conference on Machine Learning, pp. 3053–3062. PMLR (2018)
16. Lowe, R., Wu, Y.I., Tamar, A., Harb, J., Pieter Abbeel, O., Mordatch, I.: Multi-
agent actor-critic for mixed cooperative-competitive environments. Adv. Neural
Inf. Process. Syst. (2017)
17. Makoviychuk, V., et al.: Isaac Gym: high performance GPU based physics simulation
for robot learning. In: Thirty-Fifth Conference on Neural Information Process-
ing Systems Datasets and Benchmarks Track (Round 2) (2021)
18. Michel, O.: Cyberbotics Ltd. Webots™: professional mobile robot simulation. Int.
J. Adv. Robot. Syst. 1, 5 (2004)
19. Niiranen, J.: Fast and accurate symmetric Euler algorithm for electromechanical
simulations. In: Electrimacs 99 (Modelling and Simulation of Electric Machines,
Converters and Systems), pp. I–71 (1999)
20. Noori, F.M., Portugal, D., Rocha, R.P., Couceiro, M.S.: On 3D simulators for multi-
robot systems in ROS: MORSE or Gazebo? In: 2017 IEEE International Symposium
on Safety, Security and Rescue Robotics (SSRR), pp. 19–24 (2017)
21. Panerati, J., Zheng, H., Zhou, S., Xu, J., Prorok, A., Schoellig, A.P.: Learning to fly-
a gym environment with pybullet physics for reinforcement learning of multi-agent
quadcopter control. In: 2021 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), pp. 7512–7519. IEEE (2021)
22. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning
library. Adv. Neural Inf. Process. Syst. (2019)
23. Peng, B., et al.: Facmac: factored multi-agent centralised policy gradients. Adv.
Neural Inf. Process. Syst. 12208–12221 (2021)
24. Pinciroli, C., et al.: ARGoS: a modular, parallel, multi-engine simulator for multi-
robot systems. Swarm Intell. 6, 271–295 (2012)
25. Prorok, A.: Robust assignment using redundant robots on transport networks with
uncertain travel time. IEEE Trans. Automat. Sci. Eng. 17, 2025–2037 (2020)
26. Resnick, C., et al.: Pommerman: a multi-agent playground. CoRR (2018)
27. Reynolds, C.W.: Flocks, herds and schools: a distributed behavioral model. In:
Proceedings of the 14th Annual Conference on Computer Graphics and Interactive
Techniques, pp. 25–34 (1987)
28. Samvelyan, M., et al.: The StarCraft Multi-agent Challenge. CoRR (2019)
29. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy
optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
30. Shen, J., Xiao, E., Liu, Y., Feng, C.: A deep reinforcement learning envi-
ronment for particle robot navigation and object manipulation. arXiv preprint
arXiv:2203.06464 (2022)
31. Suarez, J., Du, Y., Isola, P., Mordatch, I.: Neural MMO: a massively multiagent
game environment for training and evaluating intelligent agents. arXiv preprint
arXiv:1903.00784 (2019)
32. Todorov, E., Erez, T., Tassa, Y.: Mujoco: a physics engine for model-based control.
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems,
pp. 5026–5033. IEEE (2012)
33. Tolstaya, E., Gama, F., Paulos, J., Pappas, G., Kumar, V., Ribeiro, A.: Learning
decentralized controllers for robot swarms with graph neural networks. In: Kael-
bling, L.P., Kragic, D., Sugiura, K. (eds.) Proceedings of the Conference on Robot
Learning, Proceedings of Machine Learning Research, vol. 100, pp. 671–682. PMLR
(2020). https://proceedings.mlr.press/v100/tolstaya20a.html
34. Wang, B., Liu, Z., Li, Q., Prorok, A.: Mobile robot path planning in dynamic envi-
ronments through globally guided reinforcement learning. IEEE Robot. Automat.
Lett. 5, 6932–6939 (2020)
35. Weng, J., et al.: Envpool: a highly parallel reinforcement learning environment
execution engine. arXiv preprint arXiv:2206.10558 (2022)
36. de Witt, C.S., et al.: Is independent learning all you need in the starcraft multi-
agent challenge? arXiv preprint arXiv:2011.09533 (2020)
37. Yu, C., Velu, A., Vinitsky, E., Wang, Y., Bayen, A., Wu, Y.: The surpris-
ing effectiveness of PPO in cooperative, multi-agent games. arXiv preprint
arXiv:2103.01955 (2021)
38. Zheng, L., et al.: Magent: a many-agent reinforcement learning platform for arti-
ficial collective intelligence. In: Proceedings of the AAAI Conference on Artificial
Intelligence (2018)
39. Zheng, X., Koenig, S., Kempe, D., Jain, S.: Multirobot forest coverage for weighted
and unweighted terrain. IEEE Trans. Robot. 26, 1018–1031 (2010)
FLAM: Fault Localization and Mapping
1 Introduction
Using a team of robots, as opposed to a single one, to carry out a given task
can result in significant performance gains, especially if the task
requires terrain coverage. Indeed, with proper coordination between the robots,
the speed at which the terrain is covered should increase proportionally
with the number of robots in the team [1]. Time-critical applications like search-
and-rescue scenarios and forest fire reckoning are therefore well-suited for multi-
robot applications. However, in most cases, the environment in which the task is
carried out is unknown and, unfortunately, the use of robotic systems in such
settings remains a challenge. Robots are prone to exhibiting failures when facing
the random hazards and constantly changing conditions of such envi-
ronments. Additionally, the agents forming a robot swarm are generally simple
and inefficient on their own [2] and, as a result, are not particularly tolerant
to faults. While in theory the inherent redundancy of robot swarms should provide
increased robustness, it has been demonstrated that, in practice, robot swarms
can even exhibit lower reliability levels than traditional centralized robot sys-
tems [3]. Individual failures often lead to a complete system failure, challenging
the assumption that swarm systems are robust by nature [4]. Developing a tool
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 57–68, 2024.
https://doi.org/10.1007/978-3-031-51497-5_5
58 G. Ricard et al.
been proposed in the past, and the three main families are presented in the
same survey [6]. (i) Data-driven approaches use sensor data and compare it to
past normal/abnormal behaviors or to the behavior of neighboring agents to detect
faults. (ii) Model-based approaches use an explicit a priori model of the system
to identify faults. (iii) Knowledge-based approaches are similar to the way a
human would perform fault detection: faults are typically represented in a tree
structure where a fault can be linked backwards to where it originated. The
main drawbacks of the last two approaches are the need for an a priori model of the
system and their inefficiency in detecting faults in dynamic environments.
Additionally, two strategies can be applied to fault detection: endogenous
fault detection and exogenous fault detection [7,8]. Endogenous fault detection
refers to the actions taken by an individual to perform fault detection on itself,
on its own: it is the robot's responsibility to determine whether it is in a faulty
state. In contrast, exogenous fault detection refers to the actions taken by
neighboring robots to detect faults on an observed entity. This approach relies
on the observations of external robots and requires the collaboration of neighboring
agents. Exogenous data-driven fault detection methods are the best suited for
robot swarms. Exogenous strategies leverage the power of swarms: the collaboration
of all team members. Data-driven methods allow the detection of previously
unseen faults and, as a result, enable the deployment of the swarm in unknown
and dynamic environments.
An exogenous data-driven fault detection method inspired by the human
immune system has been proposed by Tarapore et al. [9–11]. Binary feature
vectors that characterize agents’ behavior are periodically shared among robots
of the swarm. Then, using the mathematical formulation of the cross-regulation
model along with the feature vectors, outliers are detected. The method detects
faults by assuming that the behavioral abnormality is the result of a fault. Non-
binary feature vectors could be used to perform fault detection using more com-
mon outlier detection methods like K-nearest neighbor (KNN) and local outlier
factor [12]. A neural network fault detection approach was proposed in [13],
where faults are presumed to cause changes in the data flow of the system. By
monitoring the flow, the presence of a fault can be inferred. Of course, such a
method requires learning, which can be challenging where failures are concerned.
However, because outlier detection identifies abnormal behaviors and not faults
directly, it is hard to perform diagnosis and identify the adequate recovery pro-
cedure. In [14,15], the authors address this problem by adding a fault diagnosis
layer, which uses the feature vectors, after the fault detection one.
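Outlier detection on non-binary feature vectors can be as simple as a K-nearest-neighbor distance score. The sketch below scores each robot's behavioural feature vector against its peers; it is a minimal stand-in for the KNN/LOF methods cited above, not the cross-regulation method of [9–11]:

```python
def knn_outlier_scores(vectors, k=2):
    """Exogenous, data-driven outlier scoring.

    Each robot's behavioural feature vector is scored by the mean
    distance to the k nearest neighbouring robots' vectors; a large
    score suggests abnormal behaviour, i.e. a candidate fault.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    scores = []
    for i, v in enumerate(vectors):
        ds = sorted(dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(sum(ds[:k]) / k)
    return scores
```

A robot whose score exceeds a threshold (or the swarm's consensus) would then be flagged for diagnosis.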
Ono et al. and Vitus et al. introduced the concept of a "risk budget" [17,18] and
leverage it to both maximize reward and optimize the allocation of risk. Unfortu-
nately, global knowledge of the environment is required, making these methods
ill-suited for tasks performed in unknown environments. Vielfaure et al. proposed DORA-
Explorer [19], a risk-aware exploration algorithm that maximizes terrain coverage
while minimizing exposure to risk in a fully decentralized fashion. The method
assumes that the risk associated with the environment can be directly sensed
by the agents using an on-board sensor. Risk levels are then used to compute a
direction that provides meaningful exploration gains while minimizing expo-
sure to risk. In all these methods, the nature of the risk is known by the robots: in
SPIDER [16] risk is connectivity, in Ono et al. and Vitus et al. [17,18] a global state
of the environment is needed, and in DORA-Explorer [19] risk can be measured
by the agents. However, in some cases the nature of the risk might not be known,
especially if the mission is being carried out in dynamic and unknown environments.
In FLAM, we instead use a fault detector and leverage the identified failures to
build a map of the environmental hazards. Risk is never directly sensed; it is
inferred from the locations of previous failures.
Fig. 1. FLAM intuition. Figure 1a: Robots of the swarm start exploring an unknown
and hazardous environment. Figure 1b: Some robots start experiencing failures due to
environmental hazards (in this example a mud pit causing wheel slippage and moun-
tains causing falling damage). Failures are detected using a data-driven exogenous
fault detector and their respective locations are recorded in the distributed belief map.
Figure 1c: A map of the environmental hazards that embodies the "risk beliefs" is built
and can be leveraged to provide risk-awareness to the swarm in performing the given
mission.
3 System Model
Taking inspiration from previous works, we build FLAM, an exogenous data-
driven fault detection mechanism that is able to identify interaction faults, in
other words, faults caused by exogenous events in the environment modeled as
a grid E ⊂ Z2 . Then, using the detected faults and their respective locations,
FLAM builds a belief map of where the hazards are located in the environment
and shares it among the swarm using the virtual stigmergy. The hazard map pro-
vides risk-awareness to agents of the swarm and could, for example, be later used
for planning purposes to help avoiding environmental pitfalls during subsequent
exploration efforts. This intuition is summarized in Fig. 1.
Several triggers are taken into account for risk sources. Some hazards pertain to
fixed areas matching specific terrain or obstacles in the environment: robots are
subject to such faults immediately as they enter the corresponding zone (e.g.
slippage).
Conversely, more diffuse areas of danger may affect the agents with a given
probability as they draw closer, causing interference that may result in robot
death or communication jamming. We model these as point radiation sources of
varying intensities (I_j)_{j∈S} ∼ U(0, 1), akin to those used in DORA-Explorer [19].
The radiation induced for robot i at position xi ∈ E by a single source located
at sj ∈ S is expressed as:
r_{s_j}(x_i) = I_j / (1 + λ‖x_i − s_j‖²)    (1)
62 G. Ricard et al.
with λ as a decay constant. To determine whether a robot fails due to its radi-
ation exposure, we sum all individual source radiation levels and add Gaussian
noise b ∼ N(0, 0.005) to account for environment and source variability.
r(x_i) = b + Σ_{s_j ∈ S} r_{s_j}(x_i)    (2)
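As a concrete illustration, the exposure model of Eqs. (1) and (2) can be sketched in Python; the function name and the source representation are our assumptions, not the authors' code.

```python
import random

def radiation(x, sources, lam=1.0, noise_std=0.005):
    """Total radiation r(x_i) at position x: Gaussian noise b plus the
    sum of per-source levels I_j / (1 + lam * ||x - s_j||^2)."""
    total = random.gauss(0.0, noise_std)   # b ~ N(0, noise_std)
    for sx, sy, intensity in sources:      # each source: position and I_j
        d2 = (x[0] - sx) ** 2 + (x[1] - sy) ** 2
        total += intensity / (1.0 + lam * d2)
    return total
```

With the noise disabled, the exposure at a source's own location is exactly that source's intensity, which makes the decay term easy to check.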
We may consider all types of fault equal in terms of risk, as FLAM leaves the
task of leveraging these maps to the exploration control software.
Thus, we store the detected failure count for each discrete cell in the envi-
ronment modeled as a grid. This information can be used with the current time
step to compute an estimate of the failure probability at position (x, y):
p_t(x, y) = N_f / t    (3)
As the fault likelihood decreases with time, entries in the stigmergy older than
5 epochs are evicted after each update.
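A minimal sketch of the per-cell bookkeeping described above; the class and method names are hypothetical, and the real map lives in the distributed virtual stigmergy rather than a local dictionary.

```python
class BeliefMap:
    """Failure counts per grid cell; entries older than MAX_AGE epochs
    are evicted after each update, reflecting decaying fault likelihood."""
    MAX_AGE = 5

    def __init__(self):
        self.cells = {}  # (x, y) -> [failure count N_f, last update epoch]

    def record_failure(self, x, y, epoch):
        count, _ = self.cells.get((x, y), (0, epoch))
        self.cells[(x, y)] = [count + 1, epoch]

    def failure_probability(self, x, y, t):
        # Eq. (3): p_t(x, y) = N_f / t
        entry = self.cells.get((x, y))
        return entry[0] / t if entry else 0.0

    def evict_stale(self, epoch):
        self.cells = {c: e for c, e in self.cells.items()
                      if epoch - e[1] <= self.MAX_AGE}
```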
In a more realistic application, performing fault diagnosis lets us prioritize
which risks we want to avoid by applying additional weights when constructing
the global risk belief map.
3.5 Implementation
With the steps put forward, the FLAM implementation may be summarized as
the following algorithm:
Failed robots or robots that were voted against by the majority of the swarm
become inactive and stop running FLAM as soon as they are aware that they
no longer seem fit for duty. Updates to the stigmergy are one step apart to let
the information spread through the shared storage.
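Putting these pieces together, one FLAM iteration per robot might look like the following sketch; the detector and stigmergy interfaces are assumptions standing in for the components described above.

```python
class Robot:
    """Minimal robot state for the sketch below."""
    def __init__(self):
        self.active = True      # inactive robots stop running FLAM
        self.voted_out = False  # majority of the swarm voted against it

def flam_step(robot, detect_fault, record, epoch):
    """One FLAM update: classify faults with the exogenous data-driven
    detector, record the fault location in the shared belief map, and
    deactivate robots no longer fit for duty."""
    if not robot.active:
        return
    cell = detect_fault(robot)  # None, or the grid cell of the detected fault
    if cell is not None:
        record(cell, epoch)     # stigmergy write; updates are one step apart
    if robot.voted_out:
        robot.active = False
```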
4 Simulations
4.1 Experimental Setup
We evaluated FLAM with different kinds of faults on the open-source ARGoS
robot swarm simulator [29], performing 50 simulation instances over 1200 time steps.
Fig. 2. Simulation setup. Figure 2a: Fault-generating areas are present in the top-left
(slippage) and bottom-right (sensor saturation) corners highlighted with rectangles.
Red stars represent radiation sources fixed across simulation instances to achieve consistent belief maps. Figure 2b: Robots inside ARGoS; cylinders indicate radiation sources.
Simulation logs are collected for each robot and combined to assess fault
detection performance with simple counting rules for determining true positives
(TP), false positives (FP) and false negatives (FN).
4.2 Results
Table 1. Overall fault detection performance by method. The scores capture the
retained counting rules for TP, FP, FN. Left: N = 10, Right: N = 20
Fig. 4. Generated risk-belief map with slippage (blue rectangle) and sensor satura-
tion (green rectangle) areas as well as radiation sources (red stars). Darker intensity
indicates more robot failures at that discrete location. Left: N = 10, Right: N = 20
5 Conclusion
We introduced FLAM, a multiclass bio-inspired distributed fault detection
method that builds risk belief maps from identified robot failures. We expected
that applying CRM [10] would outperform basic KNN [12] classification for fault
detection, and we verified that KNN [12] outputs more false positives. CRM [10]
avoids this sensitivity pitfall through its inherently conservative design, although
its precision is lower in sparse swarms (N = 10) than in denser swarms (N = 20),
a consequence of the challenging small-scale swarm context FLAM operates in to
reflect practical applications. Both approaches cease to detect faults when too
few robots are left (stagnancy in reported faults, Fig. 3).
Independently of fault detection accuracy, the obtained failure maps are
consistent with the simulation setups: they highlight the frontiers of the error
zones and the surroundings of the radiation sources (Fig. 4).
The generated risk belief maps may be used to guide navigation for subse-
quent tasks in the environment. Future work may involve building risk-aware
exploration at the same time as FLAM-based risk assessment is conducted, tak-
ing immediate action to alleviate faults (e.g. by increasing obstacle avoidance
sensitivity and radius). Such robot responses can be explored either as automatic
countermeasures or as interactive visualizations presenting risk and terrain
information to a remote operator as it is gathered.
References
1. Burgard, W., Moors, M., Stachniss, C., Schneider, F.E.: Coordinated multi-robot
exploration. IEEE Trans. Robot. 21(3), 376–386 (2005). https://ieeexplore.ieee.
org/abstract/document/1435481
2. Şahin, E.: Swarm robotics: from sources of inspiration to domains of application.
In: Şahin, E., Spears, W.M. (eds.) SR 2004. LNCS, vol. 3342, pp. 10–20. Springer,
Heidelberg (2005). https://doi.org/10.1007/978-3-540-30552-1_2
3. Winfield, A.F., Nembrini, J.: Safety in numbers: fault-tolerance in robot swarms.
Int. J. Model. Identif. Control 1(1), 30–37 (2006). https://www.inderscienceonline.
com/doi/abs/10.1504/IJMIC.2006.008645
4. Bjerknes, J.D., Winfield, A.F.T.: On fault tolerance and scalability of swarm
robotic systems. In: Martinoli, A., et al. (eds.) Distributed Autonomous Robotic
Systems. STAR, vol. 83, pp. 431–444. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-32723-0_31
5. Yang, G.-Z., et al.: The grand challenges of Science Robotics. Sci. Robot. 3(14),
eaar7650 (2018). https://www.science.org/doi/abs/10.1126/scirobotics.aar7650
6. Khalastchi, E., Kalech, M.: On fault detection and diagnosis in robotic systems.
ACM Comput. Surv. 51(1), 1–24 (2018). https://doi.org/10.1145/3146389
7. Christensen, A.L.: Fault detection in autonomous robots. Ph.D., Université Libre
de Bruxelles (2008)
8. Lau, H.K.: Error detection in swarm robotics: a focus on adaptivity to dynamic
environments. Ph.D. dissertation, University of York (2012)
9. Tarapore, D., Lima, P.U., Carneiro, J., Christensen, A.L.: To err is robotic, to toler-
ate immunological: fault detection in multirobot systems. Bioinspiration Biomimet-
ics 10(1), 016014 (2015)
FLAM: Fault Localization and Mapping 67
10. Tarapore, D., Christensen, A.L., Timmis, J.: Generic, scalable and decentralized
fault detection for robot swarms. PLoS ONE 12(8), e0182058 (2017)
11. Tarapore, D., Timmis, J., Christensen, A.L.: Fault detection in a swarm of physical
robots based on behavioral outlier detection. IEEE Trans. Robot. 35(6), 1516–1522
(2019)
12. Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LOF: identifying density-based
local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference
on Management of Data, pp. 93–104 (2000)
13. Christensen, A.L., O’Grady, R., Birattari, M., Dorigo, M.: Fault detection in
autonomous robots based on fault injection and learning. Auton. Robot. 24(1),
49–67 (2008). https://doi.org/10.1007/s10514-007-9060-9
14. O’Keeffe, J., Tarapore, D., Millard, A.G., Timmis, J.: Towards fault diagnosis in
robot swarms: an online behaviour characterisation approach. In: Gao, Y., Fallah,
S., Jin, Y., Lekakou, C. (eds.) TAROS 2017. LNCS (LNAI), vol. 10454, pp. 393–
407. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64107-2_31
15. O’Keeffe, J., Tarapore, D., Millard, A.G., Timmis, J.: Adaptive online fault diag-
nosis in autonomous robot swarms. Front. Robot. AI 5, 131 (2018)
16. Hunt, E.R., Jenkinson, G., Wilsher, M., Dettmann, C.P., Hauert, S.: SPIDER: a
bioinspired swarm algorithm for adaptive risk-taking. In: Artificial Life Conference
Proceedings, pp. 44–51. MIT Press, Cambridge (2020)
17. Ono, M., Williams, B.C.: An efficient motion planning algorithm for stochastic
dynamic systems with constraints on probability of failure. In: AAAI, pp. 1376–
1382 (2008)
18. Vitus, M.P., Tomlin, C.J.: On feedback design and risk allocation in chance con-
strained control. In: 2011 50th IEEE Conference on Decision and Control and
European Control Conference, pp. 734–739. IEEE (2011)
19. Vielfaure, D., Arseneault, S., Lajoie, P.-Y., Beltrame, G.: DORA: distributed online
risk-aware explorer (2021)
20. Amigoni, F., Banfi, J., Basilico, N.: Multirobot exploration of communication-
restricted environments: a survey. IEEE Intell. Syst. 32(6), 48–57 (2017). https://
ieeexplore.ieee.org/abstract/document/8267592
21. Otte, M.: An emergent group mind across a swarm of robots: collective cognition
and distributed sensing via a shared wireless neural network. Int. J. Robot. Res.
37(9), 1017–1061 (2018)
22. Pinciroli, C., Lee-Brown, A., Beltrame, G.: A tuple space for data sharing in robot
swarms. In: Proceedings of the 9th EAI International Conference on Bio-Inspired
Information and Communications Technologies (Formerly BIONETICS), pp. 287–
294 (2016). https://carlo.pinciroli.net/pdf/Pinciroli:BICT2015.pdf
23. Pinciroli, C., Beltrame, G.: Buzz: a programming language for robot swarms. IEEE
Softw. 33(4), 97–100 (2016)
24. Majcherczyk, N., Pinciroli, C.: SwarmMesh: a distributed data structure for
cooperative multi-robot applications. In: 2020 IEEE International Conference
on Robotics and Automation (ICRA), pp. 4059–4065. IEEE (2020). https://
ieeexplore.ieee.org/abstract/document/9197403
25. Kobayashi, F., Sakai, S., Kojima, F.: Sharing of exploring information using belief
measure for multi robot exploration. In: 2002 IEEE World Congress on Computa-
tional Intelligence. 2002 IEEE International Conference on Fuzzy Systems. FUZZ-
IEEE 2002. Proceedings (Cat. No. 02CH37291), vol. 2, pp. 1544–1549 (2002)
26. Kobayashi, F., Sakai, S., Kojima, F.: Determination of exploration target based
on belief measure in multi-robot exploration. In: Proceedings 2003 IEEE Inter-
national Symposium on Computational Intelligence in Robotics and Automation.
Computational Intelligence in Robotics and Automation for the New Millennium
(Cat. No. 03EX694), vol. 3, pp. 1545–1550 (2003)
27. Indelman, V.: Cooperative multi-robot belief space planning for autonomous nav-
igation in unknown environments. Auton. Robot. 42(2), 353–373 (2018). https://
doi.org/10.1007/s10514-017-9620-6
28. Stachniss, C., Burgard, W.: Mapping and exploration with mobile robots using
coverage maps. In: Proceedings 2003 IEEE/RSJ International Conference on Intel-
ligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), Las Vegas, Nevada,
USA, vol. 1, pp. 467–472. IEEE (2003)
29. Pinciroli, C., et al.: ARGoS: a modular, multi-engine simulator for heterogeneous
swarm robotics. In: 2011 IEEE/RSJ International Conference on Intelligent Robots
and Systems, pp. 5027–5034 (2011)
30. K-Team, “Khepera IV” (2021). https://www.k-team.com/khepera-iv
Social Exploration in Robot Swarms
1 Introduction
Swarm robotics studies how to design large teams of robots which can work
together to achieve some goal. This takes inspiration from swarms found in
nature, particularly, the ability to produce complex emergent behaviours from
simple individual swarm agents reacting to their local environment [1,2]. Swarm
robotics aims to understand how to design the local interaction between neigh-
bours in the swarm to generate these complex behaviours, without any defined
leader [3]. Some of the key benefits of swarm systems are their fully decentralized
nature, which reduces the risk of a single point of failure, and the redundancy
they provide in the case of lost agents [4]. Swarm robotics has been investigated and applied
to a large range of applications, including exploration, collective assembly, and
collective decision making [5,6]. Here, we focus on the problem of exploration of
unknown environments with swarms.
Exploration in robot swarms has been investigated widely, considering how
to design swarm systems to explore different types of environments. Swarms
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 69–82, 2024.
https://doi.org/10.1007/978-3-031-51497-5_6
70 E. Hogg et al.
are well suited to exploration as we can use large teams of simple robots which
can explore different areas in parallel. There are many existing dispersion-based
algorithms, where agents simply repel from their local neighbours, allowing
swarms to cover large areas of different shapes [7,8]. This has also been demon-
strated with minimal sensory requirements for real world applications [9,10].
Other approaches such as random walk behaviours have also been shown to
be effective solutions [11,12]. Exploration is also a fundamental requirement in
foraging based problems [13]. Foraging algorithms can use virtual pheromones
placed by agents to guide other swarm agents in the correct direction [14]. Whilst
each of these proposed swarm algorithms has been shown to be effective, each
is suited to different types of environments. When faced with uncertainty over
the nature of the environment, it is not clear how to select suitable behaviours.
Therefore, there is a need for a universal swarm search strategy.
In this paper we explore the design of a social exploration algorithm which
allows swarms to explore complex, uncertain environments by giving agents the
option to individually choose between different swarm behaviours. Without any
prior knowledge, agents trial different behaviours to try and find the best current
option. Agents use a “happiness” measure to assess their degree of free movement
when exploring to determine whether a behaviour is effective. Crucially, agents
work together by sharing opinions of different behaviours to discover the most
suitable behaviours to use (Fig. 1). This naturally allows the swarm to adapt
its search to the configuration of the environment. We discuss our algorithm in
greater detail in Sect. 3. In the following section we highlight similar work to
our presented algorithm and finally we discuss the performance of our algorithm
when compared to other swarm behaviours in Sect. 4.
Fig. 1. Swarm agents reason over suitable swarm behaviours to explore environments
using a happiness measure, indicating their ability to explore freely. Agents locally
compare happiness to find suitable behaviours and improve exploration.
2 Related Work
The social exploration algorithm discussed in this paper is inspired from other
examples of collective decision making in robot swarms [15]. Examples of collec-
tive decision making can be found in nature such as site selection in ant swarms
[16]. Some examples applied to robotics include the “best of n” problem, tasking
the swarm with comparing the qualities of different sites, aiming to reach con-
sensus on the best option [17]. In these problems, agents develop opinions over
different options and through different communication mechanisms, share this
evidence with their neighbours. Through information sharing, agents compare
beliefs until the swarm can form consensus on one shared belief or decision [18].
Our work relates to the decision making mechanisms used within this field but
crucially differs in that we do not want to achieve global consensus in the swarm.
Agents will share their opinions of different options of swarm behaviours and
whilst we may observe small groups of agents which adopt the same behaviour,
this should remain highly localized. Talamali et al. [19] have also explored
collective decision making with short communication ranges, highlighting the
benefit of localized information sharing.
Global consensus is not a priority in our work, but we find a similar example
where swarm agents reason over possible actions in a foraging task [20]. Here
it is shown that the swarm can form consensus over the best of two actions to
find the shortest path between a food and nest site. Whilst agents don’t need to
directly measure the reward of each action, only two options need to be consid-
ered given two possible paths to take. In exploration of complex environments,
agents are required to make many correct decisions in sequence when many
possible complex paths are present.
We also highlight themes in Particle Swarm Optimization (PSO) which over-
lap with our algorithm. In PSO, swarm agents search an N dimensional param-
eter space for optimal solutions, recording their best individual state and states
of their neighbours [21]. The particles use this shared historical information to
determine their velocity headings, aiming to search for optimal solutions [22,23].
There are similarities here where agents share their states with some value asso-
ciated to that state. Our problem differs from PSO as there is no sense of an
optimal state that we want to reach, rather promoting continuous exploration.
Additionally, agents do not use a sense of a best previous position or state,
instead agents use an internalized sense of performance based only on their
choice of a certain option.
Our implementation of agent happiness also links to themes in reinforcement
learning (RL) using reward functions. In multi-agent RL, agents aim to deter-
mine optimal policies to solve some problem, consisting of a specific sequence
of actions. This is achieved using a reward function associated with the different
actions an agent can take in different possible states [24,25]. Whilst in
RL, the reward function is used to construct policies over many iterations, here
we employ a real-time sense of reward to make a decision at a given point in
time. Our agents do not construct policies or decide upon sequences of actions
to perform.
3 Methodology
3.1 Simulation and Local Swarm Behaviours
We investigate the exploration of complex environments in a custom simulator
built in Python. Swarm agents are modelled as particles which can move in
any direction with a degree of random motion. Agents aim to represent either
quadcopter or omni-directional wheel based robots. Potential field forces allow
agents to avoid obstacles and walls in the environment. The agents travel at a
constant speed of 0.5 m/s. In our investigations we aim to maximise coverage
of environments which we measure based on the detection of a set of objectives
which are evenly distributed over the environment. An objective is detected
when the Euclidean distance between an agent and an objective is less than 2.5
m. Objectives are distributed evenly 1 m apart over the environment, giving a
granular measurement of coverage. The swarm agents have no access to coverage
information.
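The coverage measurement described above can be sketched as follows; the function and argument names are our own, not the simulator's API.

```python
import math

def coverage(agents, objectives, r_detect=2.5):
    """Fraction of objectives within detection range (2.5 m) of at
    least one agent; objectives are laid out 1 m apart over the map."""
    detected = 0
    for ox, oy in objectives:
        if any(math.hypot(ox - ax, oy - ay) < r_detect for ax, ay in agents):
            detected += 1
    return detected / len(objectives)
```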
In this investigation, individual agents can choose to follow different local
rules or swarm behaviours. In this case we implement a directed field behaviour
which allows the swarm to disperse in a specific direction. We separate this into
eight discrete behaviours allowing the swarm to disperse North, East, South,
and West, as well as Northeast, Northwest, Southeast, and Southwest. We
explored the application of these behaviours in a previous study using central-
ized behaviour selection [26]. These behaviours allow the swarm to disperse in
specific directions, and in this case, individual agents can choose between which
directions to disperse rather than all agents following the same global rules.
Agents should then work together to determine which direction is best for them
to travel to explore an environment. We also give agents the option of selecting a
random walk behaviour in addition to the directed field behaviours. The process
of selecting between behaviours is discussed in detail in Sect. 3.4.
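For illustration, the nine selectable behaviours can be represented as headings that the agent's dispersion field is biased toward; the mapping to angles is our assumption, a sketch rather than the paper's implementation.

```python
import math
import random

# Eight directed-field behaviours as compass headings in degrees
# (East = 0, counter-clockwise positive), plus a random walk option.
DIRECTED = {"E": 0, "NE": 45, "N": 90, "NW": 135,
            "W": 180, "SW": -135, "S": -90, "SE": -45}

def behaviour_heading(name):
    """Heading in radians for a chosen behaviour; the random walk
    draws a fresh direction on every call."""
    if name == "random":
        return random.uniform(-math.pi, math.pi)
    return math.radians(DIRECTED[name])
```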
Key parameters which control the types of environments that are generated
include bounds for the segment lengths within corridors, c_len, the probability
of creating sub-corridors, p_sub, and the minimum area which corridors cover in
the environment, c_prop. All environments in this study were generated with the
same bounding dimensions of 80 × 80 m.
α = ‖x_t − x_{t−τ}‖ / (τ v_c)    (1)
The second factor included is agent density. When agents update their hap-
piness measure, at that moment in time, they check how many other agents
are within their communication range. To promote coverage, agents should not
be densely packed together. Therefore we define β, where n is how many neighbours
an agent has and n_max is the limit on the maximum number of neighbours
an agent should have. In this case, n_max = 10. This means agents with fewer
neighbours are happier.
β = (n_max − n) / n_max  if n < n_max,  1 / n_max  if n ≥ n_max    (2)

happiness = α · β    (3)
Agent happiness is then given by Eq. 3 which gives a value between 0 for
lowest happiness, and 1 for highest happiness. Given a defined happiness measure
which agents can use to assess different behaviours, agents can then reason over
which behaviours are best to use.
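The two factors combine as in Eqs. (1)–(3). A sketch under the stated parameters (τ is the update interval, v_c the constant speed of 0.5 m/s, n_max = 10); the function signature is our own.

```python
import math

def happiness(pos_now, pos_prev, tau, v_c, n_neighbours, n_max=10):
    """happiness = alpha * beta: alpha is the displacement over the
    interval relative to the maximum possible distance tau * v_c;
    beta rewards having few neighbours, dropping to 1/n_max when crowded."""
    alpha = math.dist(pos_now, pos_prev) / (tau * v_c)
    if n_neighbours < n_max:
        beta = (n_max - n_neighbours) / n_max
    else:
        beta = 1.0 / n_max
    return alpha * beta
```

An agent moving in a straight line at full speed with no neighbours scores the maximum happiness of 1; a stationary agent scores 0 regardless of crowding.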
Unhappy Agents. If the agent’s happiness is low and has not improved since
the last state check, then the agent needs to switch to some other behaviour.
When agents are unhappy, they need to communicate with their neighbours to
find a better option. In this case the agent will first look for agents with higher
happiness than its own. If a neighbour does have a higher happiness, the agent
will switch their behaviour to match their neighbour. This allows unhappy agents
to use positive evidence from their neighbours to adopt suitable behaviours.
However, if a neighbour is using the same behaviour, the unhappy agent will not
switch their behaviour as it conflicts with their own evidence.
When all an agent’s neighbours have equal or lower happiness, the agent
will then select a behaviour which none of its neighbours are using. Here the
agent uses the negative evidence of its neighbours to eliminate bad options. In
addition, there is a probability that upon selecting a new behaviour, the agent
will in turn signal to each neighbour to also switch to this new behaviour. This
allows agents to help unhappy neighbours, promoting co-operation rather than
individual gain.
If an unhappy agent has no neighbours, it will randomly switch to a new
behaviour. With no external evidence, agents use trial and error to discover
more suitable behaviours.
Happy Agents. When an agent has high happiness, it does not make any
changes to its chosen behaviour, but it can influence agents with low happiness.
At each time-step, happy agents have a probability of broadcasting their chosen
behaviour to their neighbours. This probability is set equal to their happiness
level. If a broadcast is successful, a neighbour which receives the broadcast will
switch its behaviour to match the happy agent's, but only if the broadcaster's
happiness is significantly higher than its own. It is important that happy agents help other agents to
improve performance of the swarm as a whole.
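The switching rules for an unhappy agent can be summarized in a short sketch; the data structure is hypothetical, and the happy-agent broadcast mechanism is omitted.

```python
import random
from collections import namedtuple

Peer = namedtuple("Peer", "happiness behaviour")

def choose_behaviour(agent, neighbours, behaviours):
    """Behaviour selection for an unhappy agent: copy a happier
    neighbour, fall back on negative evidence, or use trial and error."""
    if not neighbours:
        return random.choice(behaviours)       # no evidence: random trial
    best = max(neighbours, key=lambda n: n.happiness)
    if best.happiness > agent.happiness:
        # Positive evidence, unless the neighbour already uses our
        # behaviour, which conflicts with our own evidence.
        return best.behaviour if best.behaviour != agent.behaviour \
            else agent.behaviour
    # All neighbours equal or lower: eliminate the options they use.
    unused = [b for b in behaviours
              if b not in {n.behaviour for n in neighbours}]
    return random.choice(unused) if unused else agent.behaviour
```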
4 Results
In this section we discuss the emergent behaviour produced by the social explo-
ration algorithm and performance across large sets of random environments
(videos of simulations are presented here). For all experiments shown here we
selected a short update interval of 20 s to allow agents to frequently check their
happiness. We also set the communication range between agents to 5 m such
that evidence is only shared between agents in close proximity.
Fig. 2. Emergent behaviour of the swarm using happiness to select between different
options of exploration.
hits a wall and becomes unhappy. Without any neighbours present, an agent
simply picks a random behaviour to try. The agent briefly attempts to head
Southwest, with no success, before then switching to heading Northeast. This
then allows the agent to head away from the corner and we observe a sharp
improvement in happiness. Whilst the agent has high happiness, no change of
behaviour is needed. As navigation continues, we see the agent repeat this pro-
cess each time their happiness drops. This trial and error approach allows agents
to fix problems when they become unhappy, allowing them to navigate out of
positions where they become stuck. This creates agents which can adapt to their
surroundings without complex reasoning over why they have become stuck.
Social Exploration. Whilst with only 1 agent we have to rely on trial and
error, with many agents, this approach becomes more powerful where agents can
collaborate and share knowledge. Figure 2b shows another environment with 30
deployed agents. By utilizing the mechanisms discussed in the previous section,
happy and unhappy agents work together to try and promote the selection of
effective behaviours in their surroundings. Through the use of happy agents
which broadcast their behaviours, and unhappy agents which collaborate with
Fig. 3. Trails of 50 agents deployed in a maze environment. The colour of the trails
in the environment indicates the level of happiness of agents: red trails show positions
where agents had happiness lower than 0.5, and green if greater than 0.5. The coverage
achieved over time is presented in comparison to 50 agents performing dispersion or
random walk.
and similarly, using random walk is not suited to specific paths that need to be
travelled.
Fig. 4. Performance comparison of social exploration algorithm and random walk and
dispersion behaviours with 25 agents. Each data point shows the comparison in perfor-
mance of two algorithms for one environment. Each data set contains 250 environments.
Points shown above the red line indicate higher performance with the social exploration
algorithm.
To systematically test the efficacy of our algorithm we test the performance over
a large set of randomly generated environments. The developed algorithm should
enable the swarm to adapt to search a variety of environments and agents will
work together to promote exploration. We test the performance of a swarm of
25 agents over a set of 250 randomly generated environments. For each envi-
ronment, the swarm has 1000 s to explore and we take the mean performance
of each algorithm over 10 separate trials. The input parameters for the gener-
ated environments (as discussed in Sect. 3.2) are also randomly selected within
controlled bounds to generate a broad spectrum of environments. We measure
performance based on the integral of coverage achieved over time, λ, as shown
by Eq. 4. Here, C(t) is the function of coverage achieved over time and, T , is the
total search duration. Rather than only measuring overall coverage, we also want
to measure the rate of coverage, where λ is higher for faster exploration. Fast
exploration is important in time critical scenarios such as search and rescue.
λ = (1/T) ∫₀^T C(t) dt    (4)
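In a discrete-time simulation, Eq. (4) reduces to a Riemann sum over the sampled coverage values; the sampling interval dt and the function name are our assumptions.

```python
def coverage_score(coverage_samples, dt, T):
    """lambda = (1/T) * integral of C(t) dt, approximated from coverage
    values C(t) sampled every dt seconds over a run of duration T."""
    return sum(c * dt for c in coverage_samples) / T
```

A swarm that reaches full coverage instantly and holds it scores λ = 1, while slower exploration lowers λ even if the same final coverage is reached.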
Acknowledgement. This work was funded and delivered in partnership between the
Thales Group and the University of Bristol, and with the support of the UK Engineering
and Physical Sciences Research Council Grant Award EP/R004757/1 entitled “Thales-
Bristol Partnership in Hybrid Autonomous Systems Engineering (T-B PHASE)”.
References
1. Meng, X.B., Gao, X.Z., Lu, L., Liu, Y., Zhang, H.: A new bio-inspired optimisation
algorithm: bird swarm Algorithm. J. Exp. Theor. Artif. Intell. 28(4), 673–687
(2016)
2. Mirjalili, S., Gandomi, A.H., Mirjalili, S.Z., Saremi, S., Faris, H., Mirjalili, S.M.:
Salp swarm algorithm: a bio-inspired optimizer for engineering design problems.
Adv. Eng. Softw. 114, 163–191 (2017)
3. Dorigo, M., Theraulaz, G., Trianni, V.: Swarm robotics: past, present, and future.
Proc. IEEE 109(7), 1152–1165 (2021)
4. Zhu, B., Xie, L., Han, D., Meng, X., Teo, R.: A survey on recent progress in control
of swarm systems. Sci. China Inf. Sci. 60(7), 1–24 (2017)
5. Hanay, Y.S., Ilter, M.: Aggregation, Foraging, and Formation Control of Swarms
with Non-Holonomic Agents Using Potential Functions and Sliding Mode Techniques,
vol. 15, no. 2, pp. 149–168 (2007)
6. Cardona, G.A., Calderon, J.M.: Robot swarm navigation and victim detection
using rendezvous consensus in search and rescue operations. Appl. Sci. (Switzer-
land) 9(8) (2019)
7. McLurkin, J., Smith, J.: Distributed algorithms for dispersion in indoor environ-
ments using a swarm of autonomous mobile robots. Distrib. Auton. Robot. Syst.
6, 399–408 (2008)
8. Novischi, D.M., Florea, A.M.: Decentralized swarm aggregation and dispersion
with inter-member collision avoidance for non-holonomic multi-robot systems. In:
Proceedings - 2018 IEEE 14th International Conference on Intelligent Computer
Communication and Processing, ICCP 2018, pp. 89–95 (2018)
9. Bayert, J., Khorbotly, S.: Robotic swarm dispersion using gradient descent algo-
rithm. In: IEEE International Symposium on Robotic and Sensors Environments,
ROSE 2019 - Proceedings (2019)
10. Ludwig, L., Gini, M.: Robotic swarm dispersion using wireless intensity signals.
Distrib. Auton. Robot. Syst. 7, 135–144 (2006)
11. Kegeleirs, M., Garzón Ramos, D., Birattari, M.: Random walk exploration for
swarm mapping. In: Althoefer, K., Konstantinova, J., Zhang, K. (eds.) TAROS
2019. LNCS (LNAI), vol. 11650, pp. 211–222. Springer, Cham (2019). https://doi.
org/10.1007/978-3-030-25332-5_19
12. Pang, B., Song, Y., Zhang, C., Wang, H., Yang, R.: A swarm robotic exploration
strategy based on an improved random walk method. J. Robot. 2019(i) (2019)
13. Lu, Q., Fricke, G.M., Ericksen, J.C., Moses, M.E.: Swarm foraging review: closing
the gap between proof and practice. Curr. Robot. Rep. 1(4), 215–225 (2020)
14. Song, Y., Fang, X., Liu, B., Li, C., Li, Y., Yang, S.X.: A novel foraging algorithm
for swarm robotics based on virtual pheromones and neural network. Appl. Soft
Comput. J. 90, 106156 (2020)
15. Valentini, G., Brambilla, D., Hamann, H., Dorigo, M.: Collective perception of
environmental features in a robot swarm. In: Dorigo, M., et al. (eds.) ANTS 2016.
LNCS, vol. 9882, pp. 65–76. Springer, Cham (2016). https://doi.org/10.1007/978-
3-319-44427-7_6
16. Mallon, E.B., Pratt, S.C., Franks, N.R.: Individual and collective decision-making
during nest site selection by the ant Leptothorax albipennis. Behav. Ecol. Sociobiol.
50(4), 352–359 (2001)
17. Valentini, G., Ferrante, E., Dorigo, M.: The best-of-n problem in robot swarms:
formalization, state of the art, and novel perspectives. Front. Robot. AI 4(MAR)
(2017)
18. Valentini, G., Hamann, H., Dorigo, M.: Efficient decision-making in a self-organizing
swarm of simple robots: on the speed versus accuracy trade-off, pp. 1305–1314 (2014)
19. Talamali, M.S., Saha, A., Marshall, J.A., Reina, A.: When less is more: robot
swarms adapt better to changes with constrained communication. Sci. Robot. 6(56)
(2021)
20. Scheidler, A., Brutschy, A., Ferrante, E., Dorigo, M.: The k-unanimity rule for
self-organized decision-making in swarms of robots. IEEE Trans. Cybern. 46(5),
1175–1188 (2016)
21. Zhao, S.Z., Liang, J.J., Suganthan, P.N., Tasgetiren, M.F.: Dynamic multi-swarm
particle swarm optimizer with local search for large scale global optimization. In:
2008 IEEE Congress on Evolutionary Computation. CEC 2008, pp. 3845–3852
(2008)
22. Poli, R., Kennedy, J., Blackwell, T.: Particle swarm optimization: an overview.
Swarm Intell. 1(1), 33–57 (2007)
23. Tian, D., Shi, Z.: MPSO: modified particle swarm optimization and its applications.
Swarm Evol. Comput. 41(August 2017), 49–68 (2018)
24. Althamary, I., Huang, C.W., Lin, P.: A survey on multi-agent reinforcement learn-
ing methods for vehicular networks. In: 2019 15th International Wireless Commu-
nications and Mobile Computing Conference. IWCMC 2019, pp. 1154–1159 (2019)
25. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforce-
ment learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017)
26. Hogg, E., Hauert, S., Harvey, D., Richards, A.: Evolving behaviour trees for super-
visory control of robot swarms. Artif. Life Robot. 25(4), 569–577 (2020)
27. Cimino, M.G.C.A., Lazzeri, A., Vaglini, G.: Combining stigmergic and flocking
behaviors to coordinate swarms of drones performing target search. In: 2015 6th
International Conference on Information, Intelligence, Systems and Applications
(IISA), pp. 1–6, IEEE, July 2015
28. Hunt, E.R., Jones, S., Hauert, S.: Testing the limits of pheromone stigmergy in
high-density robot swarms. R. Soc. Open Sci. 6(11) (2019)
Stochastic Nonlinear Ensemble Modeling
and Control for Robot Team
Environmental Monitoring
1 Introduction
Robot teams performing long-term environmental monitoring require methods
to manage data collection across the team's varied sensing modalities. Since
environments can be dynamic and uncertain, it is often necessary to vary the
distribution of robots within the workspace over time as they perform their
monitoring tasks. Existing strategies are mostly bottom-up, synthesizing individual
robot strategies [14], which makes it difficult to ensure the desired global
distribution of the team over time. We present an approach to model and control
time-varying distributions of robots with multi-modal sensing capabilities that
extends existing top-down methods [6] by modeling the robot interactions that
govern changes in sensing behavior.
To illustrate the importance of our approach, consider the long-term deploy-
ment of a robot team to monitor marine life in a littoral ocean environment.
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 83–99, 2024.
https://doi.org/10.1007/978-3-031-51497-5_7
84 V. Edwards et al.
robots can also change their behaviors according to overall environmental
conditions, e.g., tidal changes; collaboration refers to a robot's ability to adjust
its short-term goal to attain a global objective. To achieve a representation that
exhibits those complex behavioral modes, we model the macroscopic continuous
dynamics of the team using the Replicator Equations, which are commonly used
in Biology to study natural selection and evolutionary dynamics [25]. Alternative
models have been considered to understand evolutionary dynamics with potential
applications to robotics, but existing work primarily focuses on the continuous
analysis of discrete solutions, overlooking the fundamentally stochastic nature
of the microscopic multirobot system [5,16,21]. Instead, we use the Replicator
Equations to model a population of robots interacting in a collaborative fashion,
where the collaboration necessitates a time-varying distribution of robots across
possible behavioral modes and/or specific regions in the workspace.
The contributions in this paper are: 1) the development of a nonlinear macroscopic
model for a collaborative robot team based on the Replicator Equations, 2) the
design of a feedback control strategy, based on the proposed nonlinear macroscopic
model, to handle uncertainties during team deployment, and 3) a method to
distribute the feedback control strategy derived from the macroscopic model across
the team. Through multi-level fidelity simulations and experimental results, we
show that our proposed framework works well for environmental monitoring tasks
that require robots with several sensing capabilities to work collaboratively over
time.
2 Problem Formulation
Consider a set of M tasks where for each task a robot must exhibit a specific set
of behaviors, for example sensing a specific phenomenon. We build an undirected
graph to represent the tasks and the potential switching between tasks. The graph
G = (V, E) has a node for each task, vj ∈ V, where j = 1, ..., M, and edges eij ∈ E.
If eij = 0, then it is not possible for robots to switch from task i to task j, whereas
if eij = 1 it is possible. We assume the graph G is strongly connected, meaning
there is a path from each task to every other task in the graph. Note that the
existence of an edge does not reflect the distance to a spatially distributed task,
as in [19,22], but instead the potential for robot collaboration to occur.
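The connectivity assumption above can be checked directly. The following sketch builds a small task graph as an adjacency matrix and verifies that every task can reach every other; the 3-task adjacency is an illustrative example, not one from the paper:

```python
from collections import deque

# Task-switching graph from Sect. 2: nodes are tasks, e_ij = 1 means
# robots may switch between task i and task j. Toy 3-task example.
M = 3
E = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]  # symmetric: the graph is undirected

def is_connected(adj):
    """BFS from task 0; for an undirected graph, reaching every node
    from one node implies a path between every pair of tasks."""
    n = len(adj)
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if adj[i][j] and j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n

print(is_connected(E))
```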
Example 1: Consider the canonical example of the Lotka-Volterra model, written
as the Replicator Equations in (2), as detailed by Hofbauer [11]. This example
has three tasks, M = 3, and the payoff matrix is
$$K = \begin{bmatrix} 0 & 0 & 0 \\ k_{10} & 0 & -k_{12} \\ -k_{20} & k_{21} & 0 \end{bmatrix}, \qquad (3)$$
which corresponds to the following differential equations that describe how the
processes evolve in time.
The parameters k10, k12, k20, and k21 are the collaboration rates, and define
the result of an interaction between two robots performing task i or task j.
The possible outcomes based on the collaboration rate are: both robots keep
doing the same tasks, or one of the robots changes task.
This model is of interest to us because it exhibits time-varying population
distributions. Consider the following equilibrium for positive robot populations,
$Y_d = \left(0,\ \frac{k_{12}}{k_{12}-k_{21}},\ \frac{k_{21}}{k_{21}-k_{12}}\right)^T$. If we linearize Eq. (5) about $Y_d$ and compute the
eigenvalues of the linearization, we get $\lambda_0 = 0$, $\lambda_1 = -\,i\,k_{21}\frac{\sqrt{k_{10}}}{\sqrt{k_{20}}}$, and $\lambda_2 = i\,k_{21}\frac{\sqrt{k_{10}}}{\sqrt{k_{20}}}$.
From dynamical systems theory we know that the system is classified as a center.
(See Guckenheimer and Holmes Ch 1 [9] for a detailed analysis.) This means that
for the same set of parameters, K, and different initial positions, Y (0), a unique
non-intersecting orbit will occur around the fixed point. The oscillations that
result from a center equilibrium produce time-varying distributions of robots.
In our littoral ocean monitoring case, we can select parameters that match
the tidal charts so that the desired sensing measurements are performed for
representative amounts of time, without wasting storage resources on unnecessary
data or compute resources on replanning.
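To illustrate how the payoff matrix of Eq. (3) produces oscillating task fractions, the sketch below integrates the Replicator Equations, $\dot{Y}_i = Y_i\big((KY)_i - Y^T K Y\big)$, with a simple forward Euler step; the parameter values are the Example 1 initial conditions from Sect. 5, and the step size and horizon are assumptions:

```python
import numpy as np

# Replicator dynamics with the Lotka-Volterra payoff matrix of Eq. (3).
# Parameters from the Example 1 initial conditions in Sect. 5.
k10, k12, k20, k21 = 2.0, 0.2, 1.5, 0.4
K = np.array([[0.0,  0.0,   0.0],
              [k10,  0.0,  -k12],
              [-k20, k21,   0.0]])

def replicator_step(Y, K, dt=1e-3):
    f = K @ Y                  # per-task fitness (K Y)_i
    phi = Y @ f                # average fitness Y^T K Y
    return Y + dt * Y * (f - phi)

Y = np.array([0.2, 0.2, 0.6])  # initial population fractions
for _ in range(20000):
    Y = replicator_step(Y, K)
print(Y, Y.sum())  # fractions oscillate; the simplex constraint is preserved
```

The center-type equilibrium means the fractions orbit rather than settle, which is exactly the time-varying distribution the text describes.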
The collaboration definitions can be easily achieved by using Eq. (1), and the
resulting dynamical system follows from Eq. (2). Notice that for kij = 1 an
interaction results in a change from task j to task i, and for kij = −μ the result
is a change from task i to task j.
Again, this model is of interest because of its equilibrium behavior, which provides
alternative time-varying population distributions. Schuster et al. proposed the
following equilibrium $Y_d = \left(\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}\right)^T$; if we linearize Eq. (7) about $Y_d$
and compute the eigenvalues of the linearization, we get $\lambda_0 = \frac{-1+\mu}{4}$, $\lambda_1 = \frac{-i+\mu}{4}$,
$\lambda_2 = \frac{-1-\mu}{4}$, and $\lambda_3 = \frac{i+\mu}{4}$. The analysis that Schuster et al. performed con-
with other agents influences the switch to different behaviors. The macroscopic
continuous solutions to our proposed models give deterministic behavior, and
result in constant oscillations, either around the center equilibrium in Example 1
or the limit cycle in Example 2. However, because our models are representations
of stochastic processes, we know that, due to variability in the robot-robot and
robot-environment interactions, our desired distributions may not be achieved over
time. This is important because, if the environment we are monitoring has specific
windows in which phenomena can be observed, it is necessary that we stay near
the desired distributions.
$$k_{ij} = \alpha_{ij}\left(\frac{Y_i^*(t)}{Y_i(t)} - 1\right), \qquad (8)$$
where αij is a tunable parameter, Yi*(t) is the desired trajectory and Yi(t) is the
current population fraction at time t. If we plug this new collaboration rate into
Eq. (1) we get:
$$Y_i + Y_j \;\xrightarrow{\;\alpha_{ij}\left(\frac{Y_i^*}{Y_i}-1\right)\;}\; 2Y_i. \qquad (9)$$
After substituting Eq. (8) into the Replicator Equations in (2) we get:
$$\dot{Y}_i = \sum_{j}^{M} Y_j\left[\alpha_{ij}\left(\frac{Y_i^*}{Y_i} - 1\right)Y_i - \sum_{p}^{M}\sum_{l}^{M} Y_l\,\alpha_{pl}\left(\frac{Y_p^*}{Y_p} - 1\right)Y_p\right]
= \sum_{j}^{M} Y_j\left[\alpha_{ij}(Y_i^* - Y_i) - \sum_{p}^{M}\sum_{l}^{M}\alpha_{pl}Y_l(Y_p^* - Y_p)\right]. \qquad (10)$$
The resulting (Yi* − Yi) acts as a proportional error term which regulates the
population based on the desired trajectory. Notice that, as before, some
collaboration rates will be negative, which switches the resulting collaboration
outcome. The same substitution can be replicated for both Example 1 and
Example 2.
monitoring. The goal is to find the area of interaction, aij, and corresponding
radius of interaction, rij, which are microscopic instantiations of the macroscopic
collaboration rate kij. If there is a nonzero collaboration rate, kij ≠ 0, then we
know that when robots interact within a certain area a task switch will occur.
Notice that if the collaboration rate is zero, then the robots may interact, but
this interaction will not result in collaboration. The area of interaction is directly
related to the collaboration rate and is a measure of how close two agents need
to be to one another for a task switch to happen. Let Ŷi be the population count
for robots performing task i; then the area of interaction, aij, is computed as
follows
$$a_{ij} = \frac{A\,k_{ij}}{v_{ij}\,\hat{Y}_i\,\hat{Y}_j}, \qquad (11)$$
where A is the area of the environment to monitor, kij is the collaboration rate,
vij is the relative speed of robot i and robot j, and Ŷi Ŷj counts the disjoint pairs
of collaborations possible given the populations of robots performing these tasks.
The robot radius of interaction is $r_{ij} = \sqrt{a_{ij}/\pi}$.
Incorporating the proposed feedback controller, Eq. (8), consider
$$a_{ij} = \frac{A\,k_{ij}}{v_{ij}\,\hat{Y}_j\,\frac{1}{\hat{Y}_i^* - \hat{Y}_i}}, \qquad (12)$$
where Ŷi* = Yi* N is the re-scaling of the desired trajectory population fraction
by the total number of robots in the system. The intuition for the feedback term
is that when Ŷi is far from Ŷi* the resulting area of interaction increases; likewise,
when Ŷi is close to Ŷi* the area of interaction decreases. The interaction area is
updated in time as the microscopic simulations run.
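The interaction-area computation can be sketched as follows. The environment area is the mCoSTe tank size from Sect. 5; the relative speed, rates, and counts are illustrative, and the feedback variant relies on the reconstruction of Eq. (12) above, so treat it as an assumption:

```python
import math

# Microscopic instantiation of the collaboration rate, Eqs. (11)-(12).
A = 2.3 * 1.3          # environment area (mCoSTe tank, m^2)
v_ij = 0.05            # relative speed of two robots (m/s), assumed

def interaction_radius_open_loop(k_ij, Yhat_i, Yhat_j):
    """Eq. (11): a_ij = A k_ij / (v_ij * Yhat_i * Yhat_j), r = sqrt(a/pi)."""
    a_ij = A * k_ij / (v_ij * Yhat_i * Yhat_j)
    return math.sqrt(a_ij / math.pi)

def interaction_radius_feedback(k_ij, Yhat_j, Yhat_i, Yhat_i_star):
    """Eq. (12) as reconstructed: the (Yhat_i* - Yhat_i) term grows the
    area when the population is far from the desired one."""
    a_ij = A * k_ij * (Yhat_i_star - Yhat_i) / (v_ij * Yhat_j)
    return math.sqrt(abs(a_ij) / math.pi)

print(interaction_radius_open_loop(0.004, 2, 5))
print(interaction_radius_feedback(0.004, 5, 2, 4))
```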
We extend our controller to consider distributed estimation of the population
dynamics. Similar to other methods [3,15], we propose a local estimation method
of the robots performing each task. Consider that each robot has a fixed sensing
radius, Ri, and the ability to determine which robots within its radius are
performing which behaviors. The neighborhood of the robot, Ni, is every robot
within Ri, and the population estimate for robots performing task i within Ni
is Ỹi. This method assumes that within the sensing radius, Ri, there will be a
sufficient density of robots to act as a proxy for the overall distribution of robot
behaviors. Substituting Ỹi for Ŷi in Eq. (12) gives a new locally computed
interaction area. Note that the desired trajectory is scaled as Ỹi* = Yi* Ni. Using
local estimates and an appropriately scaled desired trajectory, we have eliminated
the need for global information in our microscopic experiments.
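The local estimation step can be sketched directly: each robot counts, per task, the neighbors inside its sensing radius and uses those counts as a proxy for the global distribution. Positions, task assignments, and the workspace size below are randomly generated assumptions:

```python
import numpy as np

# Local population estimation within a fixed sensing radius R_i.
rng = np.random.default_rng(0)
N, M, R_i = 30, 3, 5.0
pos = rng.uniform(0, 20, size=(N, 2))   # robot positions in a 20x20 area
task = rng.integers(0, M, size=N)       # task index of each robot

def local_estimate(i):
    """Counts of robots per task inside robot i's sensing radius."""
    dists = np.linalg.norm(pos - pos[i], axis=1)
    nbrs = np.flatnonzero(dists <= R_i)  # includes robot i itself
    counts = np.bincount(task[nbrs], minlength=M)
    return counts, len(nbrs)

counts, n_nbrs = local_estimate(0)
print(counts, n_nbrs)
```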
trials. Parameters for all results were selected to be near our equilibria of interest
for both Example 1 and Example 2, which ensure that we see time-varying
distributions of robots performing tasks.
Our macro-discrete simulations used the Stochastic Simulation Algorithm
(SSA) from Gillespie [8]. SSA takes in initial populations of robots Ŷ (0) and
collaboration rates K. To determine the next time and type of event, SSA uses
a pseudo-random number generator. While the total time and total number
of transitions are not exceeded, the following two steps are iteratively taken.
First, the next event is determined by considering the likelihood of each task
switch based on the current population counts, Ŷi, and the collaboration rates,
kij; each collaboration type has a corresponding scaled collaboration rate.
Second, the time of the next event is drawn as
$$t_{p+1} = t_p + \frac{1}{b_n}\log\frac{1}{r_2}. \qquad (14)$$
Fig. 1. The multi-robot Coherent Structure Testbed (mCoSTe) water tank where
the miniature Autonomous Surface Vehicles (mASVs) perform different tasks.
The process continues until the maximum time or the maximum number of
transitions is reached. An important observation is that SSA ignores robot
dynamics and considers only the stochastic jump process, making it mathematically
equivalent to microscopic simulations.
5 Results
Fig. 2. Example 1 has three time-varying tasks; plots include five SSA trials.
(a) Parameters: Ŷ(0) = [2, 50, 50]^T, with k10 = 0.004, k12 = 0.0004, k20 = 0.003,
and k21 = 0.0008. (b) Parameters: Ŷ(0) = [2, 50, 50]^T, with α10 = 0.03,
α12 = 0.003, α20 = 0.0225, and α21 = 0.006.
Our results use multiple levels of fidelity to demonstrate how interactions can
induce time varying distributions of robot behaviors. When possible the same
initial conditions were used for both Example 1 and Example 2 to help draw com-
parisons between different types of results. Specifically, the desired trajectories
for Example 1 and Example 2 were the same for all fidelity results. The initial
conditions for Example 1 were a reference trajectory of Y*(0) = [0.2, 0.2, 0.6]^T
with parameters k10 = 2.0, k12 = 0.2, k20 = 1.5, and k21 = 0.4. The initial
conditions for Example 2 were a reference trajectory of Y*(0) = [0.1, 0.2, 0.4, 0.3]^T
and parameter μ = 0.01. Figures contain reference trajectories Y* as solid lines,
and resulting population fractions as dots. Further details on the parameters used
for each level of fidelity simulation and experiment are in the corresponding
figure captions. For both Example 1 and Example 2 we selected parameters near
the equilibria presented in Sect. 2.2 so that time-varying distributions of robot
behaviors were observed.
Fig. 3. Example 2 has four time-varying tasks; plots include five SSA trials.
(a) Parameters: Ŷ(0) = [5, 10, 15, 20]^T, with k12 = k23 = k34 = k41 = 0.01 and
μ = 0.0001. (b) Parameters: Ŷ(0) = [5, 10, 15, 20]^T, with α12 = α23 = α34 =
α41 = 0.1, and μ = 0.001.
Five macro-discrete (SSA) trial results are shown in Figs. 2 and 3. In the case
without control, Figs. 2a and 3a, each trial roughly follows the desired time-varying
trajectory, but the variability is significant. This variability can be attributed
to microscopic behaviors not being accounted for during SSA. In the case with
control, Figs. 2b and 3b, the trials follow the trajectory closely, as we anticipated.
The gains αij were tuned so that the rate of interactions was sufficiently high;
with an insufficient interaction rate, the SSA trials would lag behind the reference
trajectory.
Fig. 4. Controlled microscopic results of Example 1. Figure (a) uses global
population estimation, and Figure (b) uses distributed population estimation with
Ri = 5. Both trials use Ŷ(0) = [2, 25, 25]^T, α10 = 2.0, α12 = 0.2, α20 = 1.5, and
α21 = 0.4.
Fig. 6. Figure (a) shows a centralized mixed reality experiment, and Figure (b)
shows a distributed mixed reality experiment with Ri = 0.25. The parameters used
are the same as those in Fig. 4. The environment area is 2.3 m × 1.3 m and
Ŷ(0) = [2, 4, 4]^T.
6 Discussion
Across multiple types of experimental results, we have demonstrated that we
can control time-varying distributions of robots. Our results consider the most
ideal case, where robots move randomly and consistently throughout the envi-
ronment with parameters that were selected near equilibria of interest to ensure
that our desired behavior was observed. There are several directions of future
work which need to be addressed to further enhance this approach. For example,
more realistic robot behavior should be considered, where there may be prolonged
stops to complete a task or complex non-random motion, either of which would
change the reaction dynamics of our macroscopic model. Likewise, we have so far
selected parameters to show that we can achieve time-varying distributions of
robots; in practice, however, we want the team to evolve based on information
sensed in the environment. One way to address these nonidealities is to
consider feedback from the microscopic team to the macroscopic model. This
would require taking data collected from the individual robot and transforming
it into a form that is compatible with the macroscopic model, for example by
determining collaboration rate parameters kij .
A direction of great interest to the authors is to consider more broadly the
classes of replicator dynamics that achieve our task objective, and potentially
other, more diverse task objectives. An important property of the replicator
dynamics that we have identified with our two examples is the need for equilibria
with either limit-cycle or center behavior. For example, the resulting oscillations
would not exist for parameters or equilibria yielding a payoff matrix with only
real eigenvalues, which we know from dynamical systems theory results in stable
solutions [9].
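This eigenvalue-based classification can be checked numerically. As a concrete stand-in (the paper's Eq. (5) is not reproduced in this text), the sketch below linearizes the classic two-species Lotka-Volterra system at its interior equilibrium, where the eigenvalues are purely imaginary and the equilibrium is a center; the rate values reuse the Example 1 parameters from Sect. 5:

```python
import numpy as np

# Purely imaginary eigenvalues of the linearization -> center -> oscillations.
k10, k12, k20, k21 = 2.0, 0.2, 1.5, 0.4

def f(z):
    """Classic Lotka-Volterra vector field (a stand-in for Eq. (5))."""
    x, y = z
    return np.array([x * (k10 - k12 * y), y * (-k20 + k21 * x)])

z_eq = np.array([k20 / k21, k10 / k12])   # interior equilibrium

def jacobian(f, z, eps=1e-6):
    """Jacobian by central finite differences."""
    n = len(z)
    J = np.zeros((n, n))
    for k in range(n):
        zp = z.copy(); zp[k] += eps
        zm = z.copy(); zm[k] -= eps
        J[:, k] = (f(zp) - f(zm)) / (2 * eps)
    return J

lam = np.linalg.eigvals(jacobian(f, z_eq))
print(lam)   # ~ +/- i*sqrt(k10*k20): zero real part, so a center
```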
Currently, we enforce that at least one agent must be performing a task at any
given time to avoid extinction events. Extinction can occur when the selected
parameters drive the populations to extreme shifts in size, for example, all agents
performing one task. Without our stated assumption, there is no way for the
team to recover the other tasks once extinction happens. One solution would be
to consider equilibria which do not cause such extreme population fluctuations.
Alternatively, there are different nonlinear systems, for example the
Replicator-Mutator equations [21], which introduce variability into the model to
help ensure that only strong traits persist.
Finally, during the mixed reality experimental trials, robot collisions can occur;
they are non-catastrophic due to the small size and low speed of the mASVs.
Adding collision avoidance requires an additional scaling term in the computation
of the interaction radius and is a direction for future work. Another direction of
future work is to expand the experimental results to further improve the
distributed estimation using robot-robot communication. A potential approach,
similar to [17], would produce more stable results by considering a history of
observations made about neighbors.
7 Conclusion
In this work, we address the limitations of existing macroscopic modeling
techniques for modeling and controlling time-varying distributions of robot
behaviors. This is of particular importance for environmental monitoring tasks,
where natural phenomena, for example tidal shifts, might require the populations
of robots to change behavior. Our proposed methods use robot-robot interactions,
which are explicitly ignored by existing linear mean-field models. The result is a
rich set of time-varying robot behavior distributions that were not previously
achievable. In addition, to counteract known uncertainty during robot team
deployment, we propose a feedback control strategy which tracks a specified
desired distribution. Our macro-discrete results show improved control over the
model, and the idealized microscopic examples support that this is a fruitful
avenue for further study. Future work includes understanding different types of
nonlinear models, and further characterizing the uncertainty by seeking upper
and lower bounds on the macroscopic model.
References
1. Almadhoun, R., Taha, T., Seneviratne, L., Zweiri, Y.: A survey on multi-robot
coverage path planning for model reconstruction and mapping. SN Appl. Sci. 1(8),
1–24 (2019)
2. Berman, S., Halász, A., Hsieh, M.A., Kumar, V.: Optimized stochastic policies for
task allocation in swarms of robots. IEEE Trans. Rob. 25(4), 927–937 (2009)
3. Biswal, S., Elamvazhuthi, K., Berman, S.: Decentralized control of multi-agent
systems using local density feedback. IEEE Trans. Autom. Control 67(8), 3920–
3932 (2021)
4. Deshmukh, V., Elamvazhuthi, K., Biswal, S., Kakish, Z., Berman, S.: Mean-field
stabilization of Markov chain models for robotic swarms: computational approaches
and experimental results. IEEE Robot. Autom. Lett. 3(3), 1985–1992 (2018).
https://doi.org/10.1109/LRA.2018.2792696
5. Dey, B., Franci, A., Özcimder, K., Leonard, N.E.: Feedback controlled bifurca-
tion of evolutionary dynamics with generalized fitness. In: 2018 Annual American
Control Conference (ACC), pp. 6049–6054. IEEE (2018)
6. Elamvazhuthi, K., Berman, S.: Mean-field models in swarm robotics: a survey.
Bioinspiration Biomimetics 15(1), 015001 (2019). https://doi.org/10.1088/1748-
3190/ab49a4
7. Gerkey, B.P., Mataric, M.J.: Multi-robot task allocation: analyzing the complexity
and optimality of key architectures. In: 2003 IEEE International Conference on
Robotics and Automation (ICRA), vol. 3, pp. 3862–3868. IEEE (2003)
8. Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys.
Chem. 81(25), 2340–2361 (1977)
9. Guckenheimer, J., Holmes, P.: Nonlinear Oscillations, Dynamical Systems, and
Bifurcations of Vector Fields, vol. 42. Springer, Cham (2013). https://doi.org/10.
1007/978-1-4612-1140-2
10. Harwell, J., Sylvester, A., Gini, M.: Characterizing the limits of linear modeling of
non-linear swarm behaviors. arXiv preprint arXiv:2110.12307 (2021)
11. Hofbauer, J.: On the occurrence of limit cycles in the Volterra-Lotka equation.
Nonlinear Anal. Theory Methods Appl. 5(9), 1003–1007 (1981)
12. Hsieh, M.A., Halász, Á., Berman, S., Kumar, V.: Biologically inspired redistri-
bution of a swarm of robots among multiple sites. Swarm Intell. 2(2), 121–141
(2008)
13. Hsieh, M.A., Halasz, A., Cubuk, E.D., Schoenholz, S., Martinoli, A.: Specialization
as an optimal strategy under varying external conditions. In: 2009 IEEE Interna-
tional Conference on Robotics and Automation (ICRA), pp. 1941–1946 (2009).
https://doi.org/10.1109/ROBOT.2009.5152798
14. Khamis, A., Hussein, A., Elmogy, A.: Multi-robot task allocation: a review of the
state-of-the-art. Coop. Robots Sens. Netw. 2015, 31–51 (2015)
15. Lee, W., Kim, D.: Adaptive approach to regulate task distribution in swarm robotic
systems. Swarm Evol. Comput. 44, 1108–1118 (2019)
16. Leonard, N.E.: Multi-agent system dynamics: bifurcation and behavior of animal
groups. Annu. Rev. Control. 38(2), 171–183 (2014)
17. Lerman, K., Jones, C., Galstyan, A., Matarić, M.J.: Analysis of dynamic task allo-
cation in multi-robot systems. Int. J. Robot. Res. 25(3), 225–241 (2006). https://
doi.org/10.1177/0278364906063426
18. Lerman, K., Martinoli, A., Galstyan, A.: A review of probabilistic macroscopic
models for swarm robotic systems. In: Şahin, E., Spears, W.M. (eds.) Swarm
Robotics, pp. 143–152. Springer, Heidelberg (2005). https://doi.org/10.1007/978-
3-540-30552-1 12
19. Mather, T.W., Hsieh, M.A.: Distributed robot ensemble control for deployment to
multiple sites. In: Robotics: Science and Systems VII (2011)
20. Nam, C., Shell, D.A.: Analyzing the sensitivity of the optimal assignment in prob-
abilistic multi-robot task allocation. IEEE Robot. Autom. Lett. 2(1), 193–200
(2017). https://doi.org/10.1109/LRA.2016.2588138
21. Pais, D., Caicedo-Nunez, C.H., Leonard, N.E.: Hopf bifurcations and limit cycles
in evolutionary network dynamics. SIAM J. Appl. Dyn. Syst. 11(4), 1754–1784
(2012)
22. Prorok, A., Hsieh, M.A., Kumar, V.: The impact of diversity on optimal control
policies for heterogeneous robot swarms. IEEE Trans. Rob. 33(2), 346–358 (2017)
23. Ravichandar, H., Shaw, K., Chernova, S.: STRATA: unified framework for task
assignments in large teams of heterogeneous agents. Auton. Agents Multi Agent
Syst. 34(2), 38 (2020)
24. Schuster, P., Sigmund, K., Hofbauer, J., Gottlieb, R., Merz, P.: Self-regulation of
behaviour in animal societies. Biol. Cybern. 40(1), 17–25 (1981)
25. Sigmund, K.: A survey of replicator equations. In: Casti, J.L., Karlqvist, A. (eds.)
Complexity, Language, and Life: Mathematical Approaches. Biomathematics, vol.
16, pp. 88–104. Springer, Heidelberg (1986). https://doi.org/10.1007/978-3-642-
70953-1 4
A Decentralized Cooperative Approach
to Gentle Human Transportation
with Mobile Robots Based on Tactile
Feedback
1 Introduction
have adopted the method of pushing and pulling on the ground for pushing-only
and caging approaches. Among the studies dealing with human transportation
[9,10], in [10] the human is pulled and transported along the ground by swarm
robots, which is not gentle to the human. For systems that carry adult-sized
people on top of the robots, [9] uses about three modular robots, and the large
size of each modular robot makes it difficult to accommodate different body sizes.
Thus, though there exists a demand for robots to transport people at disaster
sites, there are few studies of cooperative transport systems in which multiple
robots carry a human on top and that are suitable for various body types.
When we consider small robots carrying a fragile object on top, the contact
surface materials that cover their tops should be soft. If the robot parts are made
of hard materials, a substantial force will occur on the contact surface between
the robot and the object to be transported, which may damage the object. Safe
interaction is necessary, and objects such as humans need to be treated gently,
which can be achieved by using flexible materials [14,15].
Another advantage of using soft materials for the contact surface is that, by
introducing a sensor to measure the deformation of the contact surface, each
robot can determine its movement so that no extra force is applied to the object.
According to the action-reaction law, if we move the robot so that the deformation
of the contact surface is reduced, the force applied to the object is also reduced.
In most previous studies, the force information is measured at the joints rather
than at the surface that handles the object.
Here, we consider the implementation of flexible tactile sensors that measure
force through deformation on the contact surface of all robots in a leader-follower
system. Leader-follower is a typical cooperative transportation method in which
the leader decides the direction of transport, and the followers move the object to
assist the leader's effort [16–20]. We consider the method suitable for transporting
humans, who may change their position during transport, because the followers
can keep following each body part based on the force information even when the
person changes position. However, the transportation is not gentle if the leader
robot runs forward without paying attention to the overall transportation
situation. In [19,20], the control law is designed so that the leader tries to run at
the desired velocity at all times. This lets the leader exert excessive force on the
object when the desired velocity changes suddenly. Such situations include the
beginning of the transport and emergencies where the leader detects a danger,
such as a soon-to-collapse house, and needs to evacuate as soon as possible. This
is why the control of the leader robot also needs to include feedback from its
force sensor.
Therefore, in this study, we introduce force feedback into the control law for
the leader, as a major difference from the conventional leader-follower based
method. Using the improved leader-follower method, in situations where the
leader issues a speed command far from the current system speed, the designed
control law restricts the force the leader applies to the object by moving toward
the direction in which force is applied, i.e., slowing down when force
102 Y. Zhang et al.
is applied in the direction opposite to the desired velocity. Since each robot's
control is completed by its own sensors and controller, there is no need for
communication between robots; the system is fully autonomous and decentralized.
Thus, this study adopts an approach to gentle transportation and safe interaction,
without applying excessive force, by improving both the hardware and the
software. We define gentle as the property of having a surface that enables safe
interaction with humans and of not applying undue force to the transported
subject. This is achieved by adopting a robot equipped with a thick and flexible
tri-axis tactile sensor and an improved leader-follower based control law which
includes force feedback in the leader's control.
In this paper, firstly, an autonomous decentralized controller for cooperative
transport is designed based on the force information from the flexible sensors.
Then, a model including the flexible sensor's dynamics is constructed. Speed
convergence, consensus-speed characteristics, and gentleness to the object of the
designed autonomous decentralized control are analyzed through simulations and
experiments using a rigid object. In addition, we discuss the performance of the
designed controller extended to multijoint transport through transport
experiments with a replica of a half-size skeleton.
Contributions: I. An approach to gentle transport without applying excessive
force by improving both the hardware and the control laws; II. A modeling method
for the system that accounts for the flexible tactile sensor; III. Verification of the
extensibility of the controller to multijoint objects.
2 Methodology
In this section, we discuss the mechanism of the flexible tactile sensor and the
control law for gentle cooperative transportation, which uses the flexible tactile
sensor's readings.
In recent years, various flexible tactile sensors have been developed [11–13] and
their design methods have been established [11]. In this study, a force sensor
with a flexible, thick contact layer that can measure triaxial force, developed by
Kawasetsu et al. [12], was adopted for the part that interacts with the object.
The force exerted on the object during transport is measured at the contact area,
enabling transport without exerting excessive force on the object. The flexible
tactile sensor consists of two main parts: the flexible contact part and the circuit
board part, which includes a 4-channel inductance-to-digital converter
(LDC1614EVM, Texas Instruments). The flexible contact part has a disk-shaped
magnetorheological elastomer (ferromagnetic marker) in the center that deforms
according to the applied force. In [12], the silicone rubber
Fig. 1. (a) shows the mechanism of measurement using the flexible tactile sensor's
deformation. (b) shows the coil position indexing used in calculating the
force-related values.
ECOFLEX 00-30 (Smooth-On) was used, whereas in this study ECOFLEX 00-20
(Smooth-On), a softer silicone rubber, was used in consideration of handling
objects more gently and increasing sensor sensitivity. The magnetorheological
elastomer in the center is made of elastomer containing iron powder. The
deformation of the magnetorheological elastomer changes the inductance values
Li ∈ R, i = {1, 2, 3, 4}, measured with the four coils as shown in Fig. 1, which can
be used to determine Lx, Ly and Lz, as in Equations (1)–(3), which are known to
correspond linearly to the forces along the three axes. We calibrate the sensor by
centralizing each Li to zero when no load is on the sensor. During the experiment,
when force is applied to the sensor, we obtain three values that are proportional
to the force as follows:
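Equations (1)–(3) themselves are not reproduced in this text, so the read-out pipeline can only be sketched under assumptions: the differential pairing of the four coils and the coil numbering below are both hypothetical, chosen purely to illustrate the calibrate-then-combine flow described above:

```python
# Sketch of the sensor read-out. The coil combinations of Eqs. (1)-(3)
# are NOT from the paper; the pairings below are assumptions.
def calibrate(baseline):
    """Return a reader that centralizes each L_i to zero at no load."""
    def read(L):
        return [li - bi for li, bi in zip(L, baseline)]
    return read

def force_values(L):
    L1, L2, L3, L4 = L
    Lx = (L1 + L4) - (L2 + L3)   # assumed x-axis differential pair
    Ly = (L1 + L2) - (L3 + L4)   # assumed y-axis differential pair
    Lz = L1 + L2 + L3 + L4       # common-mode term for the normal force
    return Lx, Ly, Lz

read = calibrate([10.0, 10.0, 10.0, 10.0])   # no-load inductances (assumed)
print(force_values(read([10.2, 10.1, 10.0, 10.1])))
```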
Consider a situation where not all robots know the path along which the object
should move. In this case, the robot that knows the path to the goal can be the
leader and bring the object forward. The remaining follower robots can sense the
direction in which the leader robot wants to go based on the force information
from the object, and then follow the leader robot. There is no need for
communication between robots. However, the transportation is not gentle if the
leader robot runs forward without paying attention to the overall transportation
situation or to movements made by the human, such as stretching a limb.
Therefore, the control of the leader robot also needs to include feedback from the
force sensor. The system schema is shown in Fig. 2.
For the leader robot's control law, in addition to the desired velocity v_d ∈ R²
for the leader along the predetermined path, [L_x^L, L_y^L]^T obtained from the
flexible sensor is used.
The control law is designed so that the leader decreases its speed when force
is applied in the direction opposite to the desired velocity. Due to this feature,
the system's velocity does not converge to v_d but to some velocity v_c where
|v_c| < |v_d|.
3.1 Modeling
In this model, we consider the transportation of a rigid body in a one-dimensional
path for simplicity. Since silicone rubber generally has elasticity and damping
elements, we model the flexible tactile sensor as a spring and a damper with the
modulus of elasticity and damping coefficient denoted by k and c, respectively.
Gentle Human Transportation Based on Tactile Feedback 105
Fig. 3. Modeling schematic diagram. All the flexible sensors mounted on the robots
are assumed to have the same elasticity and damping coefficient. The initial value of
each robot’s and object’s position and speed is set to be 0.
When the object is too heavy for the leader to move, the followers are not able
to sense any force from the object in the first place, so we consider multiple
leaders in our model. Thus, consider a system with p leaders, q followers and
an object, where the positions of the leaders, the object and the followers are
denoted by {x_0, x_1, ..., x_{p-1}}, x_p and {x_{p+1}, x_{p+2}, ..., x_{p+q}}
respectively. The mass of each robot is m and that of the object is M. In the
initial state, the object is placed on the robots so that the forces are balanced
among all robots. Let x_i ∈ R, i ∈ {0, ..., p+q}, be defined as zero when all
robots are in the initial state, i.e., at the position where the sensor springs are
at their natural length. The model schema is shown in Fig. 3. There is no direct
interaction between the robots, and we assume there is no sliding between the
object and each robot.
For the dynamics model, we only consider the dynamics of the object for
simplicity. For wheel resistance we consider rolling resistance, which is the main
resistance in the low-speed region. Rolling resistance is a force applied directly
to the robots, but we treat it as hypothetically applied to the object, since it is
assumed that there is no sliding between the object and the robots. The resistance
coefficient is denoted U_r. Thus, the dynamics of the leaders, followers and
object can be expressed as in Equations (6)–(8): the leaders' and followers'
speeds change according to the control law, and the object moves according to
the sum of the forces it receives from all robots.
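Equations (6)–(8) are not reproduced in this excerpt, but the qualitative behavior can be sketched with a first-order model. Everything below is an illustrative assumption: the robots are velocity-controlled, the leader command is v^d plus gain K_L times its sensed force, the follower command is K_F times its sensed force, and rolling resistance is modeled as a constant force U_r·M opposing the object's motion. The numerical parameters are placeholders, not the paper's calibrated values:

```python
import numpy as np

# Hypothetical parameters (not the paper's values)
k, c = 50.0, 0.5             # sensor spring stiffness and damping
M, Ur = 5.0, 0.01            # object mass and rolling-resistance coefficient
vd, KL, KF = 0.10, 0.5, 1.0  # leader desired speed and control gains
dt, steps = 1e-3, 40000

xL = xO = xF = 0.0           # leader, object, follower positions
vL = vO = vF = 0.0           # and velocities

for _ in range(steps):
    # Spring-damper force the object exerts on each robot's sensor
    fL = -(k * (xL - xO) + c * (vL - vO))   # opposes the leader's lead
    fF = k * (xO - xF) + c * (vO - vF)      # drags the follower along
    # Velocity-level control laws (illustrative)
    vL_new = vd + KL * fL
    vF_new = KF * fF
    # Object dynamics: reaction forces from both sensors minus rolling resistance
    aO = (k * (xL - xO) + c * (vL - vO)
          + k * (xF - xO) + c * (vF - vO)
          - Ur * M * np.sign(vO)) / M
    vO += aO * dt
    xL += vL_new * dt
    xO += vO * dt
    xF += vF_new * dt
    vL, vF = vL_new, vF_new

print(vL, vO, vF)  # all three converge to a common consensus speed below vd
```

Consistent with the text, the leader slows down when the sensed force opposes v^d, so the consensus speed settles strictly below the leader's desired speed.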
Fig. 4. (a) shows the force applied to each robot and the object, (b) shows the speed
of the leader, follower and object during the base case simulation.
3.2 Simulation
In this section, we use the model constructed in the previous section to inves-
tigate the convergence and trends of consensus speed for different parameters.
We define the consensus speed as the common speed after the leaders and
followers reach consensus. The governing equations and initial conditions are
identical for all leader robots, so their solutions are identical; the same holds
for the followers. Therefore, only one leader and one follower are considered in
the simulation and in the following graphs: the leader's state represents the
state of all leaders, and likewise for the followers.
We consider a base case that has a certain configuration and investigate the
influence of parameters by comparing other cases with the base case. The param-
eters include the object’s weight, leaders’ number, followers’ number, leader’s
gain and follower’s gain in the control law, sensor’s elasticity, damping effect and
leader’s desired speed. The parameter value settings for each case are described
in Table 1. We set Ur to be 0.01 in all cases.
We show the graphs of speed and force in Fig. 4. The speeds of the leader and
follower reached a consensus at a common value. In both the speed and force
graphs, we can observe the gentleness toward the object in the first two seconds:
when the leader tries to run too fast or applies too much force to the object, it
restricts its speed according to the designed control law.
The remaining cases also showed speed consensus; the consensus speed of each
case is shown in Table 1. When the number of leaders, the follower gain, or the
leader's desired speed increases, the consensus speed is higher. Conversely, if
the weight of the object, the number of followers, or the leader gain increases,
the consensus speed becomes lower. The elasticity and damping coefficient of the
sensor do not affect the consensus speed: elasticity values from 1 to 400 and
damping coefficients from 0.001 to 5 resulted in the same consensus speed with
the other parameters fixed, though they affect the length of
time for the system’s speed to converge. With elasticity larger than the range,
convergence still can be achieved by adjusting the gain. Conditions for stability
will be explored in future work.
4 Experiments
$$\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & r \\ -1/2 & -\sqrt{3}/2 & r \\ -1/2 & \sqrt{3}/2 & r \end{pmatrix} \begin{pmatrix} v_x \\ v_y \\ v_\theta \end{pmatrix} \qquad (9)$$
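Equation (9) maps a body-frame velocity command (v_x, v_y, v_θ) to the three omni-wheel speeds. A direct implementation, where the value of r (the wheel-placement radius) is a placeholder:

```python
import numpy as np

def wheel_speeds(vx, vy, vtheta, r=0.1):
    """Convert a body-frame velocity (vx, vy, vtheta) into the three
    omni-wheel speeds (v1, v2, v3) per Eq. (9). r = 0.1 m is a
    placeholder for the robot's wheel-placement radius."""
    J = np.array([
        [1.0,           0.0,        r],
        [-0.5, -np.sqrt(3) / 2.0,   r],
        [-0.5,  np.sqrt(3) / 2.0,   r],
    ])
    return J @ np.array([vx, vy, vtheta])

# Pure rotation drives all three wheels equally:
print(wheel_speeds(0.0, 0.0, 1.0))   # -> [0.1 0.1 0.1]
```

Pure translation along x drives wheel 1 forward and wheels 2 and 3 backward at half speed, matching the 120°-spaced wheel layout in Fig. 5(c).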
In the experiments, the validity of the system is first verified through the
transportation of a rigid object. We show that the system is able to transport an
object cooperatively even if the weight of the object and the numbers of leaders
and followers vary.
Fig. 5. (a): The mechanical structure of the developed robot. Vstone's omni-robot
platform, which comes with three omni-wheels, 12 V DC coreless motors (gearbox
ratio 80:1), and optical encoders (4 CPR), is used for the experiments. (b): The
microcontroller (ESP32-DevKitC V4), motor driver (Cytron MDD3A), bi-directional
logic level converter (TXB0108), and a LiPo battery are mounted. (c): Velocity
indexing of the omnidirectional robot.
Cases           p  q  M (kg)  K_L (M^-1 T)  K_F (M^-1 T)
Base case       1  2  5       15            30
Heavier object  1  2  9       15            30
More leaders    2  2  5       15            30
More followers  1  3  5       15            30
Fig. 6. (a,b,c) show the speed of the robots during the base case experiment,
calculated with the encoders. (d,e,f) show the flexible force sensor's readings in
the direction of movement.
The elastomer was deformed when the object was removed, resulting in a residual
sensor reading.
In the other cases, we also observed speed consensus during the experiments;
their time-series graphs are omitted. The average speed after the observation of
speed consensus in each case is shown in Table 3. The consensus speed is
calculated using the motion-capture software Kinovea. The trends of consensus
speed under parameter changes stated in the simulation section also appear in the
experiments, except in the case where a heavier object is used. This might be due
to differences in the actual v^d in each experiment: in all cases the leader's
desired input was set to the same value, but the actual desired speeds differ
slightly due to differences in LiPo battery voltage.
Table 4. Experimental consensus speed of each robot calculated using Kinovea,
and the leader's desired speed in each case measured with the encoders.
Each robot's average speed after consensus, calculated using Kinovea, is shown
in Table 4. We observed consensus on speed and successful transportation. A
video of the experiment is shown in [22].
In another experiment, we applied force to the arms and legs in directions
within the controlled plane, to mimic a human moving its limbs. A video of the
experiment is shown in [23]. The followers moved in the direction of the applied
force, following the designed control law, and the robots successfully tracked
the limb's movement. This demonstrates the compatibility of leader-follower
based control with the transportation of multijoint objects.
little thought has been given to the transport of multijoint objects or to damage
and discomfort to the objects being transported. The system in this paper is an
example of research that considers extensions to multijoint object transport and
gentleness toward the transported objects.
The control law designed in this study includes feedback from the force sensor
to the leader, so the consensus speed is not simply the desired speed given to the
leader. Modeling the system allowed us to qualitatively confirm the differences
in consensus speed. In the future, determining the consensus speed analytically
will clarify which parameters influence its value and how, allowing the consensus
speed to be designed. In addition, we plan to reduce the instability caused by
the roundness of the body by making the current robots smaller, increasing the
number of robots and increasing the contact area.
References
1. Cao, Y., Fukunaga, A., Kahng, A.: Cooperative mobile robotics: antecedents and
directions. Auton. Robot. 4, 7–27 (1997). https://doi.org/10.1023/A:1008855018923
2. Tuci, E., Alkilabi, M.H., Akanyeti, O.: Cooperative object transport in multi-robot
systems: a review of the state-of-the-art. Front. Robot. AI 5 (2018).
https://doi.org/10.3389/frobt.2018.00059
3. Schranz, M., Umlauft, M., Sende, M., Elmenreich, W.: Swarm robotic behaviors
and current applications. Front. Robot. AI (2020)
4. Fukuda, T., Ueyama, T., Kawauchi, Y., Arai, F.: Concept of cellular robotic system
(CEBOT) and basic strategies for its realization. Comput. Electr. Eng. 18, 11–39
(1992). https://www.sciencedirect.com/science/article/pii/004579069290029D
5. Parker, L.: Current state of the art in distributed autonomous mobile robotics.
Distrib. Auton. Robot. Syst. 4, 3–12 (2000)
6. Yan, Z., Jouandeau, N., Cherif, A.: A survey and analysis of multi-robot coordi-
nation. Int. J. Adv. Robot. Syst. 10 (2013). https://doi.org/10.5772/57313
7. Brambilla, M., Ferrante, E., Birattari, M., Dorigo, M.: Swarm robotics: a review
from the swarm engineering perspective. Swarm Intell. 7, 1–41 (2013)
8. Nazarova, A., Zhai, M.: The application of multi-agent robotic systems for earth-
quake rescue. Robot.: Indus. 4.0 Issues New Intell. Control Paradigms, pp. 133–146
(2020). https://doi.org/10.1007/978-3-030-37841-7_11
9. Osuka, K., Isayama, S.: Motion control of multi-linked stretcher robot DUCKS.
In: Proceedings Of SICE Annual Conference 2010, pp. 873–874 (2010)
10. Dorigo, M., et al.: The SWARM-BOTS project
11. Wang, H., et al.: Design methodology for magnetic field-based soft tri-axis tactile
sensors. Sensors (Switzerland) 16 (2016)
12. Kawasetsu, T., Horii, T., Ishihara, H., Asada, M.: Flexible tri-axis tactile sensor
using spiral inductor and magnetorheological elastomer. IEEE Sens. J. 18, 5834–
5841 (2018)
13. Wang, H., et al.: Design and characterization of tri-axis soft inductive tactile sen-
sors. IEEE Sens. J. 18, 7793–7801 (2018)
14. Hughes, J., Culha, U., Giardina, F., Guenther, F., Rosendo, A., Iida, F.: Soft
manipulators and grippers: a review. Front. Robot. AI 3 (2016)
15. Kim, S., Laschi, C., Trimmer, B.: Soft robotics: a bioinspired evolution in robotics.
Trends Biotechnol. 31, 287–294 (2013)
16. Asama, H., Matsumoto, A., Ishida, Y.: Design of an autonomous and distributed
robot system: ACTRESS. In: Proceedings. IEEE/RSJ International Workshop on
Intelligent Robots and Systems. (IROS ’89) the Autonomous Mobile Robots and
its Applications, pp. 283–290 (1989)
17. Kosuge, K., Oosumi, T., Chiba, K.: Decentralized control of multiple mobile robots
handling a single object in coordination. J. Robot. Soc. Jpn. 16, 87–95 (1998)
18. Osumi, H., Nojiri, H., Kuribayashi, Y., Okazaki, T.: Cooperative control for three
mobile robots transporting a large object. J. Robot. Soc. Jpn. 19, 744–752 (2001)
19. Wang, Z., Schwager, M.: Kinematic multi-robot manipulation with no communi-
cation using force feedback. In: 2016 IEEE International Conference On Robotics
And Automation (ICRA), pp. 427–432 (2016)
20. Wang, Z., Yang, G., Su, X., Schwager, M.: OuijaBots: omnidirectional robots for
cooperative object transport with rotation control using no communication. In:
Groß, R., et al. (eds.) Distributed Autonomous Robotic Systems. SPAR, vol. 6, pp.
117–131. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73008-0_9
21. Zhang, Y.: Transportation experiment with mobile robots based on tactile feedback
- object. https://youtu.be/i5yjvAJ5VRE
22. Zhang, Y.: Transportation experiment with mobile robots based on tactile feedback
- half-sized skeleton. https://youtu.be/AtM5e7bqSZQ
23. Zhang, Y.: Transportation experiment with mobile robots based on tactile feedback
- moving limb. https://youtu.be/NBYaup4oivA
Sparse Sensing in Ergodic Optimization
1 Introduction
Current robotic systems lack the capability to explore and search for information
in resource-limited settings. This problem occurs in several applications,
including planetary-science robots with expensive sensors and agricultural robots
with a limited number of available measurements. Currently, arbitrary heuristics
are used to prioritize resources, but this approach does not consider the
dependence on what information the robot has already acquired or on its mobility
capabilities. There is a need for methods that can jointly optimize how robots
search and when they take measurements.
Prior works in coverage, especially those based on ergodic search processes,
assume that the sensor being used is consistently taking measurements, and
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 113–126, 2024.
https://doi.org/10.1007/978-3-031-51497-5_9
114 A. Rao et al.
2 Prior Work
Search and Coverage Methods: Search and coverage methods have wide-
spread uses in a number of important applications, including search and rescue,
cleaning robots, and automated agriculture techniques. Both object search and
coverage involve motion planning, typically based on an a priori information
distribution in a domain. Most search and coverage techniques involve the con-
struction and use of an information map which encodes the utility of the robot
exploring specific regions of the search space.
Broadly, area search and coverage methods fall into three categories: geo-
metric, gradient-based and trajectory optimization-based approaches. Geomet-
ric methods, like lawnmower search, are good techniques for uniformly covering
a search domain [3,4]. Such approaches are useful when no a priori informa-
tion is provided, or when the a priori information is presumed to be a uniform
probability distribution over the search domain. On the other hand, since these
approaches do not utilize any a priori information about regions of interest, the
efficacy of exploration and object search is upper bounded.
When a priori information is available, a representation of how information
is distributed across a search domain (i.e. an a priori information map or infor-
mation distribution) can be leveraged to develop better search and coverage
techniques. In gradient-based methods [5–7], the short-term information gain is
greedily maximized by guiding the search agent in the direction of the derivative
of the information map around its position. Since the agent is always driven in the
direction of greatest information gain, it is led to areas with higher likelihood of
finding objects. However, these approaches can leave areas unexplored, because
they generally do not take into account the uncertainty associated with the a
priori information distribution, which can help differentiate between unexplored
areas with low information and areas with no information. These approaches are
very sensitive to noise in the information map because noise leads to inaccurate
gradient estimations, which in turn leads to agents greedily over-exploiting local
information maxima.
Trajectory optimization-based methods formulate search as an information
gathering maximization problem, which is then solved by planning a path for
the search agent [8]. The cost function that drives the optimization in many of
these approaches combines both the information distribution and its associated
uncertainty, thereby reducing the likelihood of leaving areas unexplored.
Sparse Sensing: Many real-world robotic applications are plagued by resource
limitations. Sparse sensing techniques are useful in these scenarios. Most prior
work on sparse sensing focuses on using sparse sensor measurements (and therefore
sparse data) to accomplish tasks like localization and SLAM [1], depth
reconstruction [2] and wall-following [9]. These works mostly
explore methods to better use limited data that has already been acquired, or
that is actively being acquired. However, in all of these approaches, the robot
still has to acquire the measurements, and post-optimizing for sparse data points
does not help reduce costs for limited onboard resources. While intelligently using
limited data does help improve the performance of resource-limited robotic sys-
tems, further improvements can be made by deciding where to take these limited
measurements.
Ergodic Search: Most information-based search and coverage methods view
the information gathering problem through one of two lenses: exploration or
exploitation. Through the exploratory lens, the information acquisition problem
is framed as the wide space search for diffused information, for applications like
localization or coverage [10,11]. On the other hand, through the exploitative
lens, the information gathering problem is framed as the direct search for highly
structured information peaks, such as in object search [11,12]. Ergodic search is
able to balance both exploration and exploitation goals by accounting for both
diffused information densities and highly focused points of information [8,13,
14]. By doing this, ergodic control trajectories are able to conduct both broad-
stroke coverage for diffused information densities and localized search for more
focused high information points, thereby balancing exploration and exploitation.
Specifically, an ergodic path will drive a robot to spend more time in regions of
higher expected information in an a priori information map, and less time in
regions of low information.
Ergodic coverage [8] produces trajectories for agents such that they spend
time in areas of the domain proportional to the expected amount of information
present in that area. In order to control agents to accomplish this behavior, we
pose an optimization problem that minimizes the distance between the time-average
statistics of the agent (Eq. 1) and the underlying information map,
$$C^t(x, \gamma(t)) = \frac{1}{t} \sum_{\tau=0}^{t-1} \delta(x - \gamma(\tau)), \qquad (1)$$
where α_k are the weights of each coefficient difference. In practice,
α_k = (1 + ||k||^2)^{-(d+1)} is usually used to place higher weights on
lower-frequency components, which correspond to larger spatial-scale variations
in the information distribution.
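As a concrete illustration, the metric can be evaluated numerically. The sketch below uses a 1-D domain [0, 1] with an orthonormal cosine basis and the weights α_k from above (with d = 1); the basis normalization and the Riemann-sum computation of ξ_k are standard choices for this sketch, not specifics from this paper:

```python
import numpy as np

def fourier_coeffs(points, m):
    """Cosine-basis coefficients of the empirical (time-average) distribution."""
    k = np.arange(m + 1)[:, None]                    # (m+1, 1)
    basis = np.cos(np.pi * k * points[None, :])      # (m+1, T)
    basis[1:] *= np.sqrt(2.0)                        # orthonormal on [0, 1]
    return basis.mean(axis=1)

def ergodic_metric(traj, phi_grid, x_grid, m=20, d=1):
    """Phi = sum_k alpha_k |c_k - xi_k|^2 with alpha_k = (1 + k^2)^-(d+1)."""
    k = np.arange(m + 1)
    alpha = (1.0 + k.astype(float) ** 2) ** (-(d + 1))
    ck = fourier_coeffs(np.asarray(traj, dtype=float), m)
    # xi_k: coefficients of the information map, via a Riemann sum
    basis = np.cos(np.pi * k[:, None] * x_grid[None, :])
    basis[1:] *= np.sqrt(2.0)
    dx = x_grid[1] - x_grid[0]
    xi = (basis * phi_grid[None, :]).sum(axis=1) * dx
    return float(np.sum(alpha * (ck - xi) ** 2))

x = np.linspace(0.0, 1.0, 501)
phi = np.ones_like(x)                  # uniform information map
spread = np.linspace(0.0, 1.0, 100)    # trajectory that covers [0, 1]
parked = np.full(100, 0.5)             # trajectory parked at one point
assert ergodic_metric(spread, phi, x) < ergodic_metric(parked, phi, x)
```

A trajectory that spreads its time over the domain in proportion to a uniform map scores much lower (better) than one parked at a single point, which is exactly the behavior the metric is designed to reward.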
The goal of ergodic coverage is to generate optimal controls u*(t) for an
agent, whose dynamics are described by a function f : Q × U → TQ, such that
the ergodic metric is minimized (Eq. 3). Note that in the optimization in Eq. 3, the
locations at which sensor measurements are obtained are uniform in time, that
is, each measurement along the trajectory is treated as being equally important
and as contributing equally to minimizing the ergodic metric. In many scenarios,
due both to the non-uniform distribution of information in the region being
covered and to the dynamic constraints of the robotic agents being used, this
can lead to extraneous measurements that do not contribute to reconstructing
the information prior.
For example, if there are few regions of high information, which are sepa-
rated by larger regions of low expected information, uniform sensor measure-
ments would result in many measurements that provide low information gain
(see Fig. 2). In this case, ergodic performance would be improved by only taking
sensor measurements near the areas of higher information.
In this work we extend ergodic optimization to find this set of optimal
measurements for a given search scenario. We optimize for the set of optimal
sensor measurements by posing the following ergodic optimization (Eq. 4),
where q ∈ Q is the state, u ∈ U denotes the set of controls, and λ(t) ∈ {0, 1}
is the decision variable for choosing whether or not to take a sensor measurement
at a given location in the search domain. We promote sparsity in the sampled
measurements by regularizing λ with an L1 penalty, which is known to promote
sparsity [16].
We augment the ergodic metric in Eq. 2 in the following manner:

$$\Phi(\gamma(t)) = \sum_{k=0}^{m} \alpha_k \left|c_k(\gamma(t), \lambda(t)) - \xi_k\right|^{2} + |\lambda_k|, \qquad (5)$$
$$C^t(x, \gamma(t)) = \frac{1}{\sum_{\tau=0}^{t} \lambda(\tau)} \sum_{\tau=0}^{t} \lambda(\tau)\, \delta(x - \gamma(\tau)), \qquad (6)$$
We relax the decision variables to λ(t) ∈ [0, 1] and optimize over this new
domain. After optimization, we project λ from the continuous domain to the
nearest integer value while adhering to the sensing budget. This allows us to
continuously optimize Eq. 4 and then map the resulting continuous values of λ(t)
to discrete values in {0, 1}. The procedure for optimizing the sparse ergodic
problem (Eq. 4) is depicted in Fig. 3.
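One simple way to realize the projection step, rounding the relaxed weights to {0, 1} while never exceeding a sensing budget B, is to keep the B largest weights when rounding alone would overspend. This thresholding rule is our illustration; the paper's exact projection may differ:

```python
import numpy as np

def project_lambda(lam, budget):
    """Map relaxed weights lam in [0, 1] to binary decisions in {0, 1},
    taking at most `budget` measurements: round to the nearest integer,
    then, if the budget is exceeded, keep only the largest weights."""
    lam = np.asarray(lam, dtype=float)
    binary = (lam >= 0.5).astype(int)
    if binary.sum() > budget:
        keep = np.argsort(lam)[::-1][:budget]   # indices of largest weights
        binary = np.zeros_like(binary)
        binary[keep] = 1
    return binary

lam = np.array([0.9, 0.2, 0.7, 0.6, 0.1])
print(project_lambda(lam, budget=2))   # -> [1 0 1 0 0]
```

The largest relaxed weights sit at the information peaks described above, so the budgeted projection concentrates the binary measurement decisions exactly there.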
When we investigate the numerical values of the decision variables λ(t)
calculated in the sparse ergodic optimization (Eq. 4), we see that peaks form
that correspond to peaks of information in the a priori information distribution
(Fig. 3). When these continuous values are mapped back to λ(t) ∈ {0, 1},
λ(t) = 1 becomes more likely in areas of higher information and λ(t) = 0 in
areas of lower information. This shows that the sparse ergodic optimization
raises the likelihood of taking a sensor measurement in areas of higher expected
information, which follows intuition. Additionally, we also see some peaks in
regions of lower information; these typically correspond to areas between closely
placed regions of high information, or to regions of low information where no
sensor measurement has been taken for a considerable amount of time.
For a multi-agent team covering a given information prior, the limited number
of measurements required to fully cover the region can be distributed among the
agents:

$$C^t(x, \gamma(t)) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\sum_{\tau=0}^{t} \lambda_i(\tau)} \sum_{\tau=0}^{t} \lambda_i(\tau)\, \delta(x - \gamma_i(\tau)), \qquad (7)$$
(Figure: the ergodic value of the heuristic is much better than that of standard
ergodic optimization with uniformly distributed measurements.)
The selected measurements also differ across trajectories (Fig. 5), implying
that trajectory optimization and measurement choice
impact each other. Jointly optimizing the trajectory and the sensing measurements
leads to lower control cost, since the cost of moving between the chosen sensing
locations is taken into account directly. For a single agent in the simple
coverage problems considered here, the differences in control cost are small,
leading to very similar performance between the probabilistic heuristic and
sparse ergodic optimization.
Further, we compare the performance statistics and computation cost of our
approach to those of solving the mixed-integer programming formulation of the
sparse ergodic optimization problem. We see in Fig. 7 that our relaxation of the
sparse ergodic optimization problem leads to much lower computational cost
(i.e., lower run times) while retaining comparable coverage performance in terms
of the ergodic metric.
(Fig. 7: ergodic value and computation time in seconds for sparse ergodic
optimization versus the mixed-integer formulation. A further figure shows the
ergodic value for a multi-agent team using the sparse ergodic optimization
approach.)
5 Conclusion
In this paper we investigate the idea that there is an optimal set of sensing
measurements that can be taken during coverage in order to fully characterize
the problem, reduce sensing costs and avoid local sensing optima. To this end,
we formulate a novel approach to sensing-resource-limited coverage by modifying
ergodic optimization to jointly optimize both the sensing trajectory and the
decision of where to take sensing measurements. Specifically, the set of optimal
sensor measurements is posed as a sparse ergodic optimization problem, where the
choice to take a measurement is encoded in a vector of sample weights. Our
experiments show that there exists an optimal set of sensing measurements for a
given coverage scenario, in both single- and multi-agent cases, using ergodicity,
a state-of-the-art coverage metric. We also infer that there is a cross-effect
between an agent's trajectory and its sensing decisions, which makes it important
to optimize the two jointly in cases with limited sensing measurements. This
effect is stronger in multi-agent scenarios.
This work experimentally shows that there are improvements to be made in
resource-limited coverage scenarios through sparse sensing. Specifically, we show
that there exists an optimal set of measurements required to solve a coverage
problem. Future work will focus on proving theoretical guarantees regarding
the existence of this optimal set of sensing measurements. This work assumes
the availability of accurate a priori information maps, which is not the case
for many real-world coverage applications. Future work will seek to use sparse
ergodic optimization in order to identify and account for inaccurate information
priors with minimum sensor measurements, and take into account sensor noise.
References
1. Marín, L., Vallés, M., Soriano, A., Valera, A., Albertos, P.: Multi sensor fusion
framework for indoor-outdoor localization of limited resource mobile robots.
Sensors 13(10), 14133–14160 (2013). https://doi.org/10.3390/s131014133
2. Ma, F., Carlone, L., Ayaz, U., Karaman, S.: Sparse sensing for resource-constrained
depth reconstruction. In: 2016 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), pp. 96–103 (2016). https://doi.org/10.1109/IROS.2016.7759040
3. Ablavsky, V., Snorrason, M.: Optimal search for a moving target - a geometric
approach. In: AIAA Guidance, Navigation, and Control Conference and Exhibit.
AIAA (2000)
4. Choset, H.: Coverage for robotics-a survey of recent results. Ann. Math. Artif.
Intell. 31(1), 113–126 (2001)
5. Lanillos, P., Gan, S.K., Besada-Portas, E., Pajares, G., Sukkarieh, S.: Multi-UAV
target search using decentralized gradient-based negotiation with expected obser-
vation. Inf. Sci. 282, 92–110 (2014)
6. Baxter, J.L., Burke, E.K., Garibaldi, J.M., Norman, M.: Multi-robot search and
rescue: a potential field based approach. In: Mukhopadhyay, S.C., Gupta, G.S.
(eds.) Autonomous Robots and Agents. Studies in Computational Intelligence,
vol. 76, pp. 9–16. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73424-6_2
7. Wong, E.M., Bourgault, F., Furukawa, T.: Multi-vehicle Bayesian search for
multiple lost targets. In: International Conference on Robotics and Automation, pp.
3169–3174. IEEE (2005)
8. Mathew, G., Mezić, I.: Metrics for ergodicity and design of ergodic dynamics for
multi-agent systems. Physica D 240(4), 432–442 (2011)
9. Wu, G.D., Zhu, Z.W., Chien, C.W.: Sparse-sensing-based wall-following control
design for a mobile-robot. In: 2016 IEEE International Conference on Control and
Robotics Engineering (ICCRE), pp. 1–5 (2016). https://doi.org/10.1109/ICCRE.2016.7476144
10. Miller, L.M., Silverman, Y., MacIver, M.A., Murphey, T.D.: Ergodic exploration
of distributed information. IEEE Trans. Rob. 32(1), 36–52 (2015)
11. Abraham, I., Mavrommati, A., Murphey, T.: Data-driven measurement models for
active localization in sparse environments. In: Robotics: Science and Systems XIV.
Robotics: Science and Systems Foundation (2018). https://doi.org/10.15607/RSS.2018.XIV.045
12. Abraham, I., Prabhakar, A., Murphey, T.D.: An ergodic measure for active learning
from equilibrium. IEEE Trans. Autom. Sci. Eng. 18(3), 917–931 (2021).
https://doi.org/10.1109/TASE.2020.3043636
13. Ayvali, E., Salman, H., Choset, H.: Ergodic coverage in constrained environments
using stochastic trajectory optimization. In: International Conference on Intelligent
Robots and Systems, pp. 5204–5210. IEEE (2017)
14. Miller, L.M., Murphey, T.D.: Trajectory optimization for continuous ergodic explo-
ration. In: American Control Conference (ACC) 2013, pp. 4196–4201. IEEE (2013)
15. Kobilarov, M.: Cross-entropy motion planning. The Int. J. Robot. Res. 31(7), 855–
871 (2012)
16. Schmidt, M., Niculescu-Mizil, A., Murphy, K., et al.: Learning graphical model
structure using l1-regularization paths. In: AAAI, vol. 7, pp. 1278–1283 (2007)
17. Wolsey, L.A.: Mixed Integer Programming. Wiley Encyclopedia of Computer Sci-
ence and Engineering, pp. 1–10. Wiley, Hoboken (2007)
Distributed Multi-robot Tracking
of Unknown Clustered Targets with Noisy
Measurements
1 Introduction
Multi-target tracking using distributed multi-robot systems (MRSs) has drawn
increasing attention over the past decades as robots become more powerful and
low-cost. In a large number of real-world scenarios, targets are likely to be
distributed in clusters despite a lack of prior knowledge about their exact
locations; examples include social animals, certain species of plants, and trash
distributed in inhabited places. In such cases, detecting a target indicates that
other targets are likely to appear nearby. Given no prior knowledge, most
existing multi-robot multi-target tracking (MR-MTT) algorithms underperform in
trading off between having robots search for undetected targets and tracking
detected targets when targets are not evenly distributed across the search space.
This paper aims at developing effective distributed tracking algorithms for
unknown targets that
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 127–139, 2024.
https://doi.org/10.1007/978-3-031-51497-5_10
128 J. Chen et al.
2 Problem Formulation
A set of nt targets with states X = {x1 , . . . , xnt } are located within a convex
open task space denoted by E ⊂ R2 . A team of nr (possibly heterogeneous)
robots R = {r1 , . . . , rnr } are tasked with determining nt and X, both of which
are unknown and may vary over time. We assume that each robot ri knows its
location qi in a global reference frame (e.g. from GPS), though our proposed
method can be immediately extended to handle localization uncertainty using
the algorithms from our previous work [3]. At each time step, a robot ri receives
a set of |Zi | noisy measurements Zi = {zi , z2 , . . .} of targets within the field of
view (FoV) Fi of its onboard sensor. Note that the sensor may experience false
negative or false positive detections so the number of detections may not match
the true number of targets.
where W_i is the dominance region of robot r_i (i.e., the region that robot r_i
is responsible for), ||·|| is the Euclidean norm, x ∈ E, φ(x) is the weighting
function for all x ∈ E, and f(·) is a monotonically increasing function. The role
of f is to quantify the cost of sensing due to the degradation of a robot's
ability to measure events with increasing distance. The dominance regions W_i
form a partition of E, meaning the regions have disjoint interiors and the union
of all regions is E [5].
The goal is for the team to minimize the functional (1) with respect to both the
partition set W and the robot positions Q. Minimizing H with respect to W induces
a partition of the environment V_i = {x | i = arg min_{k=1,...,n} ||x − q_k||}.
In other words, V_i is the collection of all points whose nearest robot is r_i.
This is the Voronoi partition, the V_i are the Voronoi cells, which are convex
by construction, and we call q_i the generator point of V_i. Minimizing H with
respect to Q drives each sensor to the weighted centroid of its Voronoi cell [5],
that is
$$q_i^* = \frac{\int_{V_i} x\, \phi(x)\, dx}{\int_{V_i} \phi(x)\, dx}. \qquad (2)$$
Lloyd’s algorithm sets the control input for robot ri to ur (qi∗ ), where
g − qi
ui (g) = min dstep , g − qi , (3)
g − qi
g is an arbitrary goal location, and dstep > 0 is the maximum distance a robot can
move during one time step. Robots follow this control law online, i.e., they recursively
move toward the current weighted centroids of their Voronoi cells, reconstruct their
cells based on their new positions, and compute the new weighted centroids to move
to. As a result, the robots asymptotically converge to the weighted centroids
of their Voronoi cells, causing the team to reach a local minimum of (1).
This still holds true when φ(x) varies with time. Note that the original Lloyd’s
algorithm works in a centralized manner.
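A minimal sketch of this recursion (illustrative Python under the assumption of a discretized task space; the function and parameter names are ours, not the authors' implementation):

```python
import numpy as np

def lloyd_step(positions, phi, pts, d_step):
    """One synchronous Lloyd iteration: assign grid points to their nearest
    robot (Voronoi cells), compute each cell's weighted centroid (Eq. 2),
    and move each robot toward its centroid by at most d_step (Eq. 3)."""
    w = phi(pts)                                   # weighting phi(x) > 0 on the grid
    # Voronoi assignment: each grid point belongs to its nearest generator
    d2 = ((pts[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)
    new_positions = positions.copy()
    for i in range(len(positions)):
        cell = owner == i
        if not cell.any():
            continue                               # empty cell: robot stays put
        wi = w[cell]
        centroid = (wi[:, None] * pts[cell]).sum(axis=0) / wi.sum()  # Eq. (2)
        g = centroid - positions[i]
        dist = np.linalg.norm(g)
        if dist > 1e-12:                           # Eq. (3): step capped at d_step
            new_positions[i] = positions[i] + min(d_step, dist) * g / dist
    return new_positions
```

Iterating this step (with each robot recomputing its cell from the new positions) reproduces the online behavior described above.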
The sets X and Zi from above contain a random number of random elements, and
thus are realizations of random finite sets (RFSs) [13]. The first-order moment of
an RFS is known as the Probability Hypothesis Density (PHD) (which we denote
v(x)) and takes the form of a density function over the state space of a single tar-
get or measurement. By assuming that the RFSs are Poisson distributed where
the number of targets follows a Poisson distribution and the spatial distribution
of targets is i.i.d., the PHD filter recursively updates this target density function
in order to track the distribution over target sets [12].
The PHD filter uses three models to describe the motion of targets: 1) The
motion model, f (x | ξ), describes the likelihood of an individual target tran-
sitioning from an initial state ξ to a new state x. 2) The survival probability
model, ps (x), describes the likelihood that a target with state x will continue to
exist from one time step to the next. 3) The birth PHD, b(x), encodes both the
number and locations of the new targets that may appear in the environment.
The PHD filter also uses three models to describe the ability of robots to
detect targets: 1) The detection model, pd (x | q), gives the probability of a
robot with state q successfully detecting a target with state x. Note that the
probability of detection is identically zero for all x outside the sensor FoV. 2)
The measurement model, g(z | x, q), gives the likelihood of a robot with state q
receiving a measurement z from a target with state x. 3) The false positive (or
clutter) PHD, c(z | q), describes both the number and locations of the clutter
measurements.
Distributed Tracking of Clustered Targets 131
Using these target and sensor models, the PHD filter prediction and update
equations are:
v̄t (x) = b(x) + ∫E f (x | ξ) ps (ξ) vt−1 (ξ) dξ   (4)

vt (x) = (1 − pd (x | q)) v̄t (x) + Σz∈Zt ψz,q (x) v̄t (x) / ηz (v̄t )   (5)

ηz (v) = c(z | q) + ∫E ψz,q (x) v(x) dx   (6)

ψz,q (x) = g(z | x, q) pd (x | q),   (7)
where ψz,q (x) is the probability of a sensor at q receiving measurement z from
a target with state x.
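On a discretized state space, the prediction and update steps (4)–(7) reduce to array operations. The following sketch assumes a unit cell area and is illustrative, not the authors' implementation:

```python
import numpy as np

def phd_predict(v, birth, p_s, F):
    """PHD prediction (Eq. 4) on a discrete grid. v, birth: PHD values per
    cell; p_s: survival probability per cell; F[j, i] approximates
    f(x_j | xi_i) times the cell area (so columns sum to ~1)."""
    return birth + F @ (p_s * v)

def phd_update(v_bar, p_d, measurements):
    """PHD update (Eqs. 5-7). p_d: detection probability per cell (zero
    outside the FoV); measurements: list of (g_z, c_z) pairs, where g_z is
    the per-cell likelihood g(z | x, q) and c_z the clutter PHD c(z | q)."""
    v = (1.0 - p_d) * v_bar                 # missed-detection term of Eq. (5)
    for g_z, c_z in measurements:
        psi = g_z * p_d                     # Eq. (7)
        eta = c_z + (psi * v_bar).sum()     # Eq. (6), unit cell area assumed
        v = v + psi * v_bar / eta           # measurement term of Eq. (5)
    return v
```

With no clutter, the correction added by a single measurement integrates to one expected target, which is a useful sanity check on any implementation.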
The PHD represents the best estimate of the target states at the current time step
and is utilized as the instantaneous estimate of the target density, denoted wi(x),
i.e., wi(x) = v(x), x ∈ E. In [6] a distributed PHD filter is formulated, with
each robot maintaining the PHD within a unique subset, Vi , of the environ-
ment. Three algorithms then account for motion of the robots (to update the
subsets Vi ), motion of the targets (in (4)), and measurement updates (in (5)).
As a result, each robot recursively reconstructs its Voronoi cell online based
on the current relative locations of neighboring robots and maintains the PHD locally
by communicating with neighbors to estimate target states, yielding results
identical to running a centralized PHD filter over the task space. In this paper,
the same strategies are applied to propagate wi(x) over time in a distributed
manner.
where α(x) is the shape parameter, which describes the total rewards obtained
from the observations at location x, and β(x) is the inverse rate parameter, which
describes the inverse of the number of historical observations sensors conduct at
location x. Initially, α0 (x) = 0 and β0 (x) = ∞. When a robot ri receives a
measurement set Zi at time t > 0, it updates α(x), β(x) for all x ∈ Fi by
α(x) ← α(x) + rw,
β(x) ← (β(x)−1 + 1)−1 ,   (9)

where

rw = 0 if |Zi | = 0, and rw = 1 otherwise.   (10)
Then, for each x ∈ E, the cumulative state estimate is given by the mode of
the distribution Γ(α(x), β(x)):

wc(x) = (α(x) − 1)/β(x) if α(x) ≥ 1, and wc(x) = 0 if α(x) < 1.   (11)
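A concrete sketch of this estimator (a minimal Python illustration, not the authors' code; the class and parameter names are ours). Tracking β through its inverse makes the initial β0 (x) = ∞ representable:

```python
import numpy as np

class CumulativeEstimator:
    """Per-cell Gamma(alpha, beta) statistics: alpha(x) accumulates detection
    rewards (Eqs. 9-10), 1/beta(x) counts observations, and wc(x) is the mode
    of Gamma(alpha(x), beta(x)) (Eq. 11)."""
    def __init__(self, n_cells):
        self.alpha = np.zeros(n_cells)      # alpha_0(x) = 0
        self.inv_beta = np.zeros(n_cells)   # beta_0(x) = infinity

    def update(self, fov, n_detections):
        rw = 0.0 if n_detections == 0 else 1.0   # Eq. (10)
        self.alpha[fov] += rw                    # Eq. (9)
        self.inv_beta[fov] += 1.0                # beta <- (beta^-1 + 1)^-1

    def wc(self):
        # Eq. (11): (alpha - 1)/beta = (alpha - 1) * (1/beta) when alpha >= 1
        return np.where(self.alpha >= 1.0,
                        (self.alpha - 1.0) * self.inv_beta, 0.0)
```

Here `fov` is a boolean mask over the grid cells inside the robot's field of view at the current step.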
with Dames’ method [6], since wc is close to uniform across the task space and
W is similar to a standard Voronoi diagram in that case. However, the advantages
of our method are pronounced when targets are not uniformly distributed in the
space but are instead grouped together within small regions. Under these cir-
cumstances, idle robots are especially helpful in learning the differences in target
density among sub-regions and optimizing the assignment of tracking effort across
sub-regions.
4 Simulations
We test our proposed algorithms via Matlab simulations. The task space is a
100 m × 100 m square. Targets are distributed in clusters: 30 are located
in a 33 m × 33 m square sub-region at the lower-left corner of E, and another 30
targets in a 33 m × 33 m square sub-region at the top-right corner. Targets move
within the sub-regions following a Gaussian random walk at a maximum speed
of 3 m/s. All robots begin each trial at randomized locations within a 20 m × 10
m box at the bottom center of the environment. Robots have a maximum speed
of 5 m/s and are equipped with isotropic sensors with a sensing radius of 5 m.
Both the instantaneous and the cumulative estimates are approximated on a uniform
grid with cells of 0.1 m × 0.1 m. In the PHD filter, the
robots use a Gaussian random walk (with σ = 0.35 m/s) for the motion model
f , set the survival probability to 1, and set the birth PHD to 0, since no prior
knowledge of the targets is available. We use the same measurement model for homo-
geneous noisy sensors as [6]. Note that our proposed method is compatible with
heterogeneous sensing networks [4]; we make this choice to simplify the tests.
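The target motion just described can be sketched as follows (illustrative Python; the function signature and the clip-to-sub-region behavior are our assumptions about how targets remain within their clusters):

```python
import numpy as np

def step_targets(targets, lo, hi, sigma=0.35, v_max=3.0, dt=1.0, rng=None):
    """Gaussian random walk for simulated targets: each target takes a
    Gaussian step, capped at speed v_max, and is clipped to stay inside its
    square sub-region [lo, hi]^2."""
    rng = np.random.default_rng() if rng is None else rng
    step = rng.normal(0.0, sigma, size=targets.shape) * dt
    speed = np.linalg.norm(step, axis=1, keepdims=True) / dt
    step = step * np.minimum(1.0, v_max / np.maximum(speed, 1e-12))
    return np.clip(targets + step, lo, hi)
```

Calling this once per time step for each cluster (e.g., `lo=0, hi=33` for the lower-left sub-region) reproduces the bounded random-walk motion used in the trials.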
Fig. 1. Distribution of 70 robots and 60 targets in the 100 m × 100 m square
task space after 300 s of tracking, using Dames’ method and our method, respectively.
Orange diamonds plot the locations of targets. Green squares and circles show the locations
and fields of view of the robots, respectively. Red crosses plot the robots’ temporary goals.
Dashed lines plot the boundaries of the robots’ currently assigned spaces.
We first show how our proposed algorithms improve multi-target tracking using
a single trial with 70 robots. Figure 1 shows the locations of robots and targets
after 300 s using both Dames’ method [6] and our method. When targets gather
within only a small portion of the environment, the previous method can hardly
optimize robot locations for more accurate tracking of targets, since only a few
robots have found targets and are tracking them. As a result, most of the robots
are idle and move erratically to cover areas where no targets are distributed,
which demonstrates a weakness of the previous algorithm: idle robots do
not actively search for targets, causing an inefficient use of the total sensing
capability of the team while searching for un-tracked targets.

Fig. 2. Surfaces of wi and wc over the 100 m × 100 m task space after 300 s.

Such a phenomenon
is improved by our method, which drives a larger number of robots to gather at
the clusters of targets while a handful of other robots continue to search unex-
plored areas, as the exact locations and motion models of the targets are unknown.
The result indicates the efficacy of our method: the distribution of the
clusters is learned over time and the power diagram distributes a more balanced
workload across individuals.
To illustrate the different characteristics of the instantaneous and cumulative
estimates of the targets, we plot in Fig. 2 the values of wi(x) and
wc(x) for all x ∈ E after 300 s of the task using our method.
The instantaneous estimate, i.e., the PHD, provides the best guess of the exact
target locations at the current time through a set of sharp peaks, as shown in Fig. 2a.
The PHD wi(x) returns to near zero rapidly once targets are no longer
found at x, given the low expected target birth rate,
exhibiting high accuracy in estimating exact target locations but a short “memory”
of the historical target distribution. In contrast, the cumulative estimate wc
presents a relatively smooth and continuous distribution over the task space,
with higher values over the entire target clusters and near-zero values
over the rest of the space, as shown in Fig. 2b. Therefore, the instantaneous and
cumulative estimates are used to drive busy and idle robots, respectively:
the former robots require exact target locations for accurate tracking, while
the latter robots need only a coarse distribution of the clusters for optimized
deployment.
and 3) our method (“O” method) which uses both instantaneous and cumulative
estimation, outlined in Algorithm 1. For each team size we run 10 trials.
OSPA. To quantify the performance, we will use the first order Optimal Sub-
Pattern Assignment (OSPA) metric [14], a commonly-used approach in MTT.
The error between two sets X, Y , where |X| = m ≤ |Y | = n without loss of
generality, is
d(X, Y ) = ( (1/n) [ minπ∈Πn Σi=1..m dc (xi , yπ(i) )^p + c^p (n − m) ] )^{1/p} ,   (15)

where c is a cutoff distance, dc (x, y) = min(c, ‖x − y‖), and Πn is the set of all
permutations of the set {1, 2, . . . , n}. This gives the average error in matched
targets, where OSPA considers all possible assignments between elements x ∈ X
and y ∈ Y that are within distance c of each other. This can be efficiently
computed in polynomial time using the Hungarian algorithm [10]. We use c =
10 m, p = 1, and measure the error between the true and estimated target sets.
Note that a lower OSPA value indicates a more accurate tracking of the target
set.
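For small sets, Eq. (15) can be computed directly by enumerating assignments (a brute-force sketch for clarity; the Hungarian algorithm [10] is the efficient choice in practice):

```python
import itertools
import math

def ospa(X, Y, c=10.0, p=1):
    """First-order OSPA distance (Eq. 15) between two finite point sets.
    X, Y are lists of coordinate tuples; c is the cutoff, p the order."""
    if len(X) > len(Y):
        X, Y = Y, X                      # ensure |X| = m <= |Y| = n
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0                       # both sets empty
    def d_c(a, b):
        return min(c, math.dist(a, b))   # cutoff distance
    # best assignment cost over all ways to match X into Y
    best = min(
        sum(d_c(X[i], perm[i]) ** p for i in range(m))
        for perm in itertools.permutations(Y, m)
    )
    # cardinality mismatch penalized at the cutoff c per unmatched element
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)
```

For example, with `c = 10` and `p = 1`, a lone estimated target 3–4 m off gives its Euclidean error, while each spurious or missed target contributes the full cutoff before averaging.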
Results. In Fig. 3a, we report the median OSPA value to measure tracking
accuracy over the final 700 s of each trial, allowing the team to reach or approach
a steady state and smoothing out the effects of spurious measurements that
cause the OSPA to fluctuate; the results are aggregated into boxplots. The “D”
method shows the worst performance in tracking clustered targets regardless
of team size. As depicted by the results of the “P” method, this performance is
improved by assigning robots optimized task spaces that contain equivalent instanta-
neous estimated numbers of targets, which drives the robots
to gather where targets are currently tracked. However, robots are still not
Fig. 3. OSPA errors across groups of trials. Figure 3a displays boxplots of
average OSPA errors from 300 s to 1000 s using the three methods (“D”, “P”, and
“O”) with four team sizes, each over ten trials. |R| denotes the number of robots.
Figure 3b plots the average OSPA errors of all 40 trials (10 for each team size)
using the three methods over the entire 1000 s.
5 Conclusions
Acknowledgements. This work was supported by funding from King Abdullah Uni-
versity of Science and Technology (KAUST).
References
1. Ahmad, A., Tipaldi, G.D., Lima, P., Burgard, W.: Cooperative robot localization
and target tracking based on least squares minimization. In: 2013 IEEE Interna-
tional Conference on Robotics and Automation, pp. 5696–5701. IEEE (2013)
2. Blackman, S.S.: Multiple hypothesis tracking for multiple target tracking. IEEE
Aerosp. Electron. Syst. Mag. 19(1), 5–18 (2004)
3. Chen, J., Dames, P.: Collision-free distributed multi-target tracking using teams of
mobile robots with localization uncertainty. In: IEEE/RSJ International Conference
on Intelligent Robots and Systems (2020)
4. Chen, J., Dames, P.: Distributed multi-target tracking for heterogeneous mobile
sensing networks with limited field of views. In: IEEE International Conference
on Robotics and Automation (2021), under review
5. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing
networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004)
6. Dames, P.M.: Distributed multi-target search and tracking using the PHD filter.
Auton. Robot. 44(3), 673–689 (2020)
7. Doucet, A., Vo, B.N., Andrieu, C., Davy, M.: Particle filtering for multi-target
tracking and sensor management. In: Proceedings of the Fifth International Con-
ference on Information Fusion, vol. 1, pp. 474–481. IEEE (2002)
8. Hamid Rezatofighi, S., Milan, A., Zhang, Z., Shi, Q., Dick, A., Reid, I.: Joint
probabilistic data association revisited. In: Proceedings of the IEEE International
Conference on Computer Vision, pp. 3047–3055 (2015)
9. Konstantinova, P., Udvarev, A., Semerdjiev, T.: A study of a target tracking algo-
rithm using global nearest neighbor approach. In: Proceedings of the International
Conference on Computer Systems and Technologies (CompSysTech 2003), pp. 290–
295 (2003)
10. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logis-
tics Q. 2(1–2), 83–97 (1955)
11. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2),
129–137 (1982)
12. Mahler, R.P.: Multitarget Bayes filtering via first-order multitarget moments. IEEE
Trans. Aerosp. Electron. Syst. 39(4), 1152–1178 (2003)
13. Mahler, R.P.: Statistical multisource-multitarget information fusion, vol. 685.
Artech House Norwood, MA (2007)
14. Schuhmacher, D., Vo, B.T., Vo, B.N.: A consistent metric for performance evalu-
ation of multi-object filters. IEEE Trans. Signal Process. 56(8), 3447–3457 (2008)
15. Schwager, M., Rus, D., Slotine, J.J.: Decentralized, adaptive coverage control for
networked robots. Inter. J. Robot. Res. 28(3), 357–375 (2009)
16. Sung, Y., Budhiraja, A.K., Williams, R.K., Tokekar, P.: Distributed assignment
with limited communication for multi-robot multi-target tracking. Auton. Robot.
44(1), 57–73 (2020)
A Force-Mediated Controller
for Cooperative Object Manipulation
with Independent Autonomous Robots
1 Background
Cooperative (or collaborative) manipulation is the task where two or more
agents, which may be any combination of robots and humans, must work together
to manipulate an object too large or unwieldy for one agent to handle alone.
Often one agent may act as a leader, seeking to move the object along a privately
known trajectory through space, aided by an arbitrary number of independent
helper agents who have no direct knowledge of the path or destination.
Challenges associated with this task can include the following: (1) The details
of the environment (topography of the ground, locations of obstacles, etc.) may
not be known by some or all agents, and may change dynamically. (2) The
number and locations of the agents participating in the task may not be known.
(3) The properties of the object (mass, geometry, etc.) may not be known. (4)
There may be unexpected disruptions (e.g., the object or agents may bump into
obstacles). (5) Communication may be problematic, e.g., unreliable, undesirable,
or unscalable.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 140–155, 2024.
https://doi.org/10.1007/978-3-031-51497-5_11
We are interested in the case of an unanticipated need to move an object,
with no opportunity for advance preparation. That is, the properties of the
environment, object, set of helpers, and desired manipulation trajectory are all
unknown, and some of these elements may also be changeable and dynamic.
Further, if the leader is a human, then communicating the details of a desired
trajectory to the helper robots is unlikely to be quick, easy, or detailed and
accurate. From a usability standpoint, the ideal way for the robots to give help
would be for the human to try to move the object in the desired way directly, by
exerting forces on it, and for the robots to supplement the leader’s force to enable
the manipulation, without requiring any explicit instruction. An approach that
works under these restrictive conditions could also be used in a scenario where
more information or communication is available.
2 Related Work
Work on collective manipulation typically assumes that all robots share knowl-
edge about the intended trajectory, inertial parameters of the payload, and/or
number and locations of teammates, and that explicit robot-to-robot commu-
nication is available [1,7,15,17,19]. Some work has considered the problem of
obtaining such information in partially restricted cases, e.g., estimating inertial
parameters using carefully coordinated actions and knowledge about the team
[15]. Some studies rely on specialized gripper hardware that constrains freedom
of movement in specific ways [10]. A subcategory of cooperative manipulation
is collective transport, where movement is limited to a plane [4,9,10]. Here we
consider cases where only limited payload information (local mass and relative
center of mass (CM) location) can be obtained, no other information or explicit
communication is available, movement in any dimension may be desired, and
off-the-shelf hardware can be used.
and prevent movement out of the plane. That is, the contextual framework that
robots used to interpret forces was the a priori knowledge that only movement
in the plane would be intended.
In this paper, we modify our previously presented controller to permit more
general collaborative manipulation tasks, encompassing movement in any direc-
tion. The central principle robots use to determine context is to interpret tran-
sient or weak forces as unintended (e.g., perturbations due to collisions), and sus-
tained stronger forces as intentional control signals. We show that if each robot
can estimate the payload’s mass and relative CM location, then it can respond
to control signals in any direction, switching between dimensional contexts with-
out loss of controller stability or compliance. We demonstrate this framework in
physical collaborative experiments using a compliant 7-DoF manipulator.
The key assumptions of the approach are: (1) Each agent can obtain an
estimate of the load’s relative CM location and its share of the mass (the latter
may typically be m/na , for a system of na agents and total mass m, with both m
and na unknown to the agents). (See §4.2 for a discussion of how this assumption
can be realized.) (2) There are enough agents to handle the load (na is large
enough that m/na is significantly less than the carrying capacity of any single
agent). (3) The grasp or connection between each agent and the shared load is
rigid, with no slippage or in-grasp rotation. (4) Agents have some compliance,
permitting a degree of conflict between forces exerted by different agents without
immediately causing instability. (5) Each agent knows the orientation of its base
with respect to a shared reference vector (e.g., robots can sense the direction of
gravity while moving over uneven terrain). We do not assume other knowledge
about the environment, task, agent numbers or locations, nor direct explicit
communication.
We note that compliance can be helpful in multiple ways. For instance, it
facilitates movement of the object in response to the guiding force, providing
displacement as well as force cues that can be used by helper robots to infer the
intended motion (Table 1).
4 Control Methodology
4.1 Closed Loop Control Dynamics
The system dynamics for each agent take the standard form
T = M q̈ + C(q, q̇)q̇ + g(q) + T_ext   (1)
144 N. E. Carey and J. Werfel
where the load mass and inertia are modeled as connected rigid components
of the end-effector, incorporated into the mass/Coriolis/gravitational matrices
accordingly.
Each agent’s (dynamic) context for the intended manipulation is encoded as
a binary 6 × 1 vector d, expressed in the robot’s base frame. di = 1 indicates
the agent should follow a guiding force along dimension i; di = 0 indicates a
dimension where any external force is likely to be disturbance-related.
We can use d to decouple the instantaneous control torque applied in the
robot’s joint space into two components: a task controller, and a null or residual
controller. Using d as a dimensional filter on the full manipulator Jacobian J
gives a task Jacobian Jt such that ẋ = Jt q̇. The null space Jacobian Zν can
be derived as in [18]. This allows us to develop a task-specific control schema,
and use the complementary space to enact a different form of control on the
remaining null dimensions.
In a previous study [4], we considered a framework in which the task space
was the set of dimensions in which external forces should be rejected, and the null
space allowed free movement along other dimensions. Here we invert that division,
so that instead of assigning a high priority task of stabilisation against unknown
inertial dynamics (with the force-following component relegated to a lower prior-
ity null space), we assign the collaborative force-following goal to the high prior-
ity task space, while disturbance rejection of out-of-task forces is accomplished in
the null space. This assignment allows the force-following behaviour of each agent
(and hence the collective) to be more sensitive and responsive.
The task space controller takes the following form (a task passivity-based
control):
ẍc = Λx^{−1} ( wt − (μx + Dx )ẋ − γKe x̃ )   (2)
where wt is an estimate of the guiding external wrench in the chosen task domain,
expressed in the robot’s base frame, and measured by the robot using either
joint torque sensors or a wrist-based force/torque sensor; (x, ẋ) describes the
trajectory of the end-effector in the task-space dimensions only (i.e., x is a subset
of the full 6×1 end-effector pose vector p); and Λx , μx are the task-space dynamic
inertia and Coriolis matrices, respectively [20]. The error-correcting stiffness term
Ke is explored in Sect. 4.2.
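Eq. (2) can be evaluated numerically as follows (a sketch; the matrix names follow the text, while the function name and the use of a linear solve instead of an explicit inverse are ours):

```python
import numpy as np

def task_acceleration(Lambda_x, mu_x, D_x, K_e, gamma, w_t, x_dot, x_err):
    """Commanded task-space acceleration of Eq. (2):
    xddot_c = Lambda_x^{-1} (w_t - (mu_x + D_x) x_dot - gamma K_e x_err).
    Solving the linear system avoids explicitly inverting Lambda_x."""
    rhs = w_t - (mu_x + D_x) @ x_dot - gamma * (K_e @ x_err)
    return np.linalg.solve(Lambda_x, rhs)
```

With zero velocity and zero pose error, the commanded acceleration reduces to the guiding wrench scaled by the inverse task-space inertia, which matches the force-following intent of the controller.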
In a cluttered dynamic environment, it is likely that disturbance forces will
be due to environmental collisions. Rigidly maintaining null-space target posi-
tioning during a collision risks potential damage to the robot or the colliding
object/person. However, the control space of the null component is defined by
the residual degrees of freedom that remain unused by the task controller (i.e.,
it is thus both task- and robot-dependent). Noting this, we can take advantage
of the greater number of degrees of freedom offered by redundant manipulators,
and use a redundant joint space to accommodate collisions compliantly while
minimally impacting the task space behavior. (If, instead, a high rigidity null
space is desired, or the robot has insufficient DOFs to accommodate all possi-
ble task space configurations while rejecting error, a high stiffness impedance
controller based on joint space error could be used.)
where μν , Λν are the null Coriolis and inertial matrices [20], Kν is a stiffness term
acting on offsets in non-task-space end-effector dimensions, and Bν , Dν represent
damping gains. The null position error state variable η̃ can be calculated using
D̄b , the binary matrix derived from the negation of d, assuming that for small
changes in η and q, we can use the expanded (full-rank) Jacobian JE to transform
errors in joint space to the (non-task-constrained) end-effector frame dimensions:
δη = D̄b JE δq (4)
(In fact for x = 0, where the goal joint configuration qd is governed solely by
null space control input, η̃ is the 6 × 1 end-effector pose projection of the joint
error q̃.)
The input torque corresponding to the task control components can be writ-
ten
Tt = Jt^T [ −Λx (J̇ q̇) + (wt − (μx + D)ẋ − γKe x̃) ]   (5)
and our total input torque becomes
ΔM^i = M^i − M^i_est .   (7)
When a follower agent senses an external force, we can assume that this force
is predominantly composed of a wrench imparted by the guiding agent, and any
residual uncompensated inertial elements of the load:
where Mmax (q) is the mass matrix for a load of mass mmax at joint configuration
q. Based on this minimum guiding wrench input, and the current system state,
we can project a minimum expected end-effector pose estimate xg , given a dis-
crete time-window Δt. The difference between the actual pose and this forward
projection can be written as an error term in the task domain, x̃ = xg − x.
This allows us to introduce an error correction term into our control acceler-
ation equation, using a spring stiffness gain matrix (Ke ) (see Eq. 2), and our
challenge then becomes choosing a suitable Ke so that it fully compensates for
model error.
From Eq. 9, suppose f (Mmax , q) is the inertial force applied by the maximum
load the robot could support (calculated given the joint configuration q). Then a
possible choice of the elements ke of Ke is |kej | ≥ max(f (Mmax , q))j , i.e., we
estimate the maximum possible force exerted by the inertial mass matrix upper
bound Mmax in each task dimension j, and choose Ke to guarantee a sufficient
compensation force. Note that, given Eq. 9, the ratio of the maximum force it is
reasonable for the guiding agent to exert to the upper bound of the load estimate
represents a metric for the system’s responsiveness.
Ke need not be constant: an adaptive stiffness term can be used without loss
of stability [14] (provided K̇e is bounded), as shown in [4], to create regions or
dimensions of high and low responsiveness based on the expected impact of errors
in the inertial estimate. Adaptive stiffness can also be increased temporarily in
response to unexpected motions of the load, to prevent instability that could
otherwise result if load inertia estimates are poor [4].
Other work has shown approaches for multiple robots to collaboratively esti-
mate load parameters [8,15]. In future work we plan to extend these approaches
to the case without explicit communication. In the reductive case of a single
guiding agent and a single follower (as in §5.1), an upper bound on ΔM i can be
found by examining the force residuals Δf (q) at the extremes of inertial offset,
while only the follower agent is carrying the load (call this a stiffness tuning
phase). Since in this case it is also relatively trivial to obtain a high degree of
accuracy in inertial load estimation [16], in practice these residual force errors
are minimal, and a small constant Ke suffices to fully compensate for model
error.
In this section, we consider the stability of the full operational space controller,
including the model-error correction component described above. Let our state
variable be z = {q̃, ν, T̃ , f̃ , x̃}. From [11], we can demonstrate asymptotic sta-
bility for the system if there exists a function V (z) such that in a neighborhood
Ω of an equilibrium point z = 0, the following three conditions are satisfied:
The total wrench acting on the system along the task dimensions can be writ-
ten f = wt + f̃ , where wt is the estimated input guiding input wrench as before,
and f̃ is the total force estimation error, now including not only inaccuracies in
the system model as above, but also other sources of error such as sensor noise.
(Note that we are not considering the constrained null dimensions, so this error
term is not confounded by disturbance rejection). At equilibrium, f = f̃ , and we
can introduce a disturbance observer to track error in the force domain. Using
the task state error defined previously, we desire ẋg = 0, so f̃˙ = −Γf^{−1} ẋ, where
Γf is symmetric and positive definite.
Let V (x, f ) be a candidate Lyapunov sub-function that considers only task-
related energy terms:
V (x, f ) = x̃T Ke x̃ + (1/2) ẋT Λx ẋ + (1/2) f̃ T Γf f̃   (10)
This function is positive definite along (x, f ) and positive semi-definite on the
full state z (condition 1 above). Differentiating, we find
V̇ = 2x̃T Ke ẋ + ẋT Λx ẍ + (1/2) ẋT Λ̇x ẋ + f̃ T Γf f̃˙   (11)
Substitute the closed form dynamics from Eq. 2 to obtain
V̇ = 2x̃T Ke ẋ + ẋT Λx Λx^{−1} (wt − (μx + D)ẋ − γKe x̃) + (1/2) ẋT Λ̇x ẋ + f̃ T Γf f̃˙   (12)
We can substitute the disturbance observer defined in Eq. 4.3 and obtain an esti-
mate of the task dimension force error by examining the position error in com-
bination with the environmental stiffness. Then, considering the skew-symmetry
of Λ̇x − 2μx [20], we find condition 2 above on V̇ is satisfied provided
and since D is constant and positive definite, we require only that the controller
gain γ > 2 to ensure conditional stability.
Furthermore, under the set condition x = 0, our null state error term
Z T Kν J η̃ reduces to Z T Kν q̃, and the null space control torque is analogous
to that used in [20]. By using a similar subset function candidate VL on the
set L = {q̃, ν, T̃ , f̃ = 0, x̃ = 0}, it can be shown that the system is asymptoti-
cally stable conditional to L (condition 3 above). Hence all conditions for overall
system stability are fulfilled.
The guiding agent can indicate a change in d (e.g., switching from lifting to
horizontal carrying) to the other agents using the payload as the medium for a
physical signal. For this paper, we note that in the absence of slow changes in
uncompensated force driven by unknown load inertia (and with the assumption
5 Experimental Validation
The approach requires a force-sensitive manipulator with (ideally) more than 6
degrees of joint freedom. The Franka Emika Panda cobot meets these needs (7 DOF)
and is readily available commercially, making it an ideal platform for
proof-of-concept experiments. In §5.1 we discuss hardware test-
ing/demonstration of the approach, using the single physical Panda available to
us to assist a human leader. In §5.2 we discuss simulations with four Pandas
assisting a leader. All code is available at [2].
Fig. 3. Robot guidance without and with contextual information. The human guide
applies a torque to rotate the end-effector through 90 degrees around the roll axis. In the
first sequence of images (top), the robot control torque compensates for the gravitational
load but otherwise is compliant in all dimensions. In the second sequence (bottom), we
implement our decoupled force/impedance algorithm with the task constraint vector
d = {0, 0, 0, 1, 1, 0}, rejecting wrenches in any direction other than roll and pitch. X’s
show the position of the end of the object across successive frames. Video: [5].
6 Results
6.1 Controller Performance Within Static Task Domain
To investigate the performance of the two control elements (force following vs.
impedance-control), we defined our task domain as d = {1, 1, 0, 0, 0, 1}, applied a
set of short disturbance forces to a load held by a single robot, and examined the
resulting end-effector trajectory. A sample output is shown in Fig. 2, with motion
and force expressed relative to the robot’s base frame of reference. Disturbances
in the pitch or vertical motion are rejected by the null controller, while those
around the yaw axis or along the lateral Y axis result in motion for as long as
the wrench is applied, and do not return to the origin position afterwards.
To demonstrate the behavioural difference between our chosen decoupled
control schema and a more conventional compliant control that might be used
for, e.g., a learning-by-demonstration context, we consider a task whose goal is a
90-degree rotation of a manipulated object around the base’s roll axis. Figure 3
compares the motion of the robot and object for each of the two control schemes
in response to the same user input wrench. In the top set of images, the robot
is under a free-floating (mass-compensated) control without contextual informa-
tion about which wrench dimensions should be followed. Since the user input—as
expressed through the shared load—has wrench components in multiple dimen-
sions, the robot’s end-effector position not only rotates, but also moves in the
Y and Z dimensions. In the bottom image series, the robot is controlled by our
decoupled controller and supplied with a dimensional context which restricts
motion in Y and Z (d = {0, 0, 0, 1, 1, 0}). This controller decouples the wrenches
imparted by human manipulation of the load and responds only to those along
or around the task dimensions, resulting in an end-effector pose that changes
only along the roll axis, as desired. In other words, with this control framework
and given a suitable context, we can shape the robot’s resultant trajectory with
much greater precision. See video at [5] (part 1).
Fig. 4. Performing a collaborative task where force following and rejection dimensions
change, under the guidance of a human operator. The human guides the robot to first
insert a key into a slot, then turn it 90 degrees and slide it along the slot. Left: Still
image after insertion, during the turning phase. Right: The force and motion mapping
for one dimension (roll). Force following vs force rejection phases are indicated through
colour on the torque input (blue = force follow, red = force reject). Controller gains: Dx = 5.0 I3, Ke = 0.2 I3, γ = 2.5, Kν = diag(240, 120, 120, 120, 120, 120, 120), Bν = 0.2, Dν = 4.0 I4. Task-switch thresholding: t_th = 1.5 s, w_th = (5 N, 0.5 N·m). Video: [5].
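The force-based task switching summarized in the caption can be sketched as a dwell-time detector: a switch fires only once the input wrench magnitude has stayed above the thresholds w_th continuously for t_th seconds. This is a hedged reading of the thresholding rule; the names and the reset-after-trigger behaviour are assumptions.

```python
import numpy as np

def make_switch_detector(t_th=1.5, f_th=5.0, tau_th=0.5, dt=0.01):
    """Return a stateful step function that signals a task switch once the
    wrench has exceeded the force/torque thresholds continuously for t_th
    seconds (values taken from the caption; reset logic is an assumption)."""
    state = {"t_above": 0.0}
    def step(wrench):
        force, torque = np.asarray(wrench[:3]), np.asarray(wrench[3:])
        above = np.linalg.norm(force) > f_th or np.linalg.norm(torque) > tau_th
        state["t_above"] = state["t_above"] + dt if above else 0.0
        if state["t_above"] >= t_th:
            state["t_above"] = 0.0   # reset after triggering one switch
            return True              # toggle the follow/reject dimension sets
        return False
    return step

detect = make_switch_detector()
# A 6 N push held for 2 s (200 steps at 10 ms) triggers exactly one switch.
fired = [detect([6, 0, 0, 0, 0, 0]) for _ in range(200)]
```

The dwell time filters out the brief contact spikes that occur during normal compliant interaction, so only a deliberate sustained push changes the task context.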
Fig. 5. Behaviour of a four robot collective using our controller, following a two-stage
guiding force. Top left: stable state counteracting inertial load, before external force
application. Top right: An external force is applied along the global X axis. Bottom
left: An external torque is applied around the global X axis. Bottom right: The cor-
responding local Y and local pitch pose of one robot during this process. Video: [5].
Figure 5 shows that using only an external force input and a pre-calculated
knowledge of load inertial parameters (note that the robots are not aware of the
number, locations, or behaviour of other agents co-manipulating the object),
the robot collective can ensure the load follows the guidance force. Video: [5]
(part 3).
7 Discussion
A fundamental reason for deploying decoupled operational space control, in this
task of robots performing collective manipulation with limited information, is to
address disturbances imposed by the lack of shared knowledge. Without aware-
ness of the others’ poses or intentions, agents cannot disambiguate between inter-
pretations of sensed forces, and thus in general will apply conflicting torques.
This research demonstrates that (a) given a task context, these kinds of conflicts
between agents can be resolved by using a compliant error-rejection framework;
(b) if the agents are further provided with an estimation of load inertia, this
task context can incorporate out-of-plane motion; and (c) force can be used as a
control signal to enable contextual switching between control dimension subsets.
The use case for such a system is likely to be in collaborative transport or
manipulation in complex, unknown, or dynamic environments. As §6.3 demon-
strates, this kind of spontaneous collaboration can be effectively deployed with
very little foreknowledge or preparation, requiring only one knowledgeable guid-
ing agent. This simplifies the navigational challenges of collaborative transport
considerably, and minimizes the shared knowledge base required. In particular,
using compliant control for both elements of the decoupled controller eliminates
any need for strict global kinematic alignment between agents; hence such sys-
tems should be robust to perturbations such as rough terrain or minor collisions.
To fully realize the goal of collective manipulation without prior knowledge
or communication, further work is needed to let agents estimate inertial
154 N. E. Carey and J. Werfel
References
1. Bai, H., Wen, J.T.: Cooperative load transport: a formation-control perspective.
IEEE Trans. Rob. 26(4), 742–750 (2010)
2. Carey, N.: Decoupled Controller for the Franka Panda (Version 1.08) [Computer
software] (2022). https://github.com/niccarey/panda_decoupled_controller
3. Carey, N.: Unity implementation of the Franka Panda platform (Version 1.0)
[Computer software] (2020). https://github.com/niccarey/FrankaPanda_Unity
4. Carey, N.E., Werfel, J.: Collective transport of unconstrained objects via implicit
coordination and adaptive compliance. In: 2021 IEEE International Conference on
Robotics and Automation (ICRA), pp. 12603–12609. IEEE (2021)
5. Carey, N.E., Werfel, J.: A force-mediated controller for cooperative object manip-
ulation [Media file, video] (2022). https://youtu.be/VIoj6v_a-Gw
6. Ćehajić, D., Hirche, S.: Estimating unknown object dynamics in human-robot
manipulation tasks. In: 2017 IEEE International Conference on Robotics and
Automation (ICRA), pp. 1730–1737. IEEE (2017)
7. Elwin, M.L., Strong, B., Freeman, R.A., Lynch, K.M.: Human-multirobot col-
laborative mobile manipulation: the Omnid Mocobots. arXiv preprint
arXiv:2206.14293 (2022)
8. Franchi, A., Petitti, A., Rizzo, A.: Decentralized parameter estimation and obser-
vation for cooperative mobile manipulation of an unknown load using noisy mea-
surements. In: 2015 IEEE International Conference on Robotics and Automation
(ICRA), (pp. 5517–5522). IEEE (2015)
9. Habibi, G., Kingston, Z., Xie, W., Jellins, M., McLurkin, J.: Distributed centroid
estimation and motion controllers for collective transport by multi-robot systems.
In: 2015 IEEE International Conference on Robotics and Automation (ICRA), (pp.
1282–1288). IEEE (May 2015)
10. Hichri, B., Fauroux, J.C., Adouane, L., Doroftei, I., Mezouar, Y.: Design of cooper-
ative mobile robots for co-manipulation and transportation tasks. Robot. Comput.-
Integr. Manufact. 57, 412–421 (2019)
11. Iggidr, A., Sallet, G.: On the stability of nonautonomous systems. Automatica
39(1), 167–171 (2003)
12. Khatib, O.: A unified approach for motion and force control of robot manipulators:
the operational space formulation. IEEE J. Robot. Autom. 3(1), 43–53 (1987)
13. Kubus, D., Kroger, T., Wahl, F.M.: On-line rigid object recognition and pose esti-
mation based on inertial parameters. In: 2007 IEEE/RSJ International Conference
on Intelligent Robots and Systems, (pp. 1402–1408). IEEE (Oct 2007)
14. Liang, X., et al.: An adaptive time-varying impedance controller for manipulators.
Front. Neurorobot. 16 (2022)
15. Marino, A., Muscio, G., Pierri, F.: Distributed cooperative object parameter esti-
mation and manipulation without explicit communication. In: 2017 IEEE Interna-
tional Conference on Robotics and Automation (ICRA), (pp. 2110–2116). IEEE
(May 2017)
16. Mavrakis, N., Stolkin, R.: Estimation and exploitation of objects’ inertial param-
eters in robotic grasping and manipulation: a survey. Robot. Auton. Syst. 124,
103374 (2020)
17. Montemayor, G., Wen, J.T.: Decentralized collaborative load transport by multiple
robots. In: Proceedings of the 2005 IEEE International Conference on Robotics and
Automation, (pp. 372–377). IEEE (Apr 2005)
18. Ott, C.: Cartesian impedance control of redundant and flexible-joint robots.
Springer (2008)
19. Petitti, A., Franchi, A., Di Paola, D., Rizzo, A.: Decentralized motion control for
cooperative manipulation with a team of networked mobile manipulators. In: 2016
IEEE International Conference on Robotics and Automation (ICRA), (pp. 441–
446). IEEE (May 2016)
20. Sadeghian, H., Villani, L., Keshmiri, M., Siciliano, B.: Task-space control of robot
manipulators with null-space compliance. IEEE Trans. Rob. 30(2), 493–506 (2013)
21. Sariyildiz, E., Sekiguchi, H., Nozaki, T., Ugurlu, B., Ohnishi, K.: A stability anal-
ysis for the acceleration-based robust position control of robot manipulators via
disturbance observer. IEEE/ASME Trans. Mechatron. 23(5), 2369–2378 (2018)
22. Wang, Z., Schwager, M.: Kinematic multi-robot manipulation with no communi-
cation using force feedback. In: 2016 IEEE International Conference on Robotics
and Automation (ICRA), (pp. 427–432). IEEE (May 2016)
A Distributed Architecture for Onboard
Tightly-Coupled Estimation
and Predictive Control of Micro Aerial
Vehicle Formations
1 Introduction
Recent years have been characterized by various improvements in the coordina-
tion and cooperation strategies of autonomous robots. This is because loosely
connected systems controlled by distributed or decentralized strategies have
inherent advantages over centrally governed systems, such as additional flexi-
bility, robustness, and scalability [1]. Among these strategies, formation control,
usually aimed at achieving prescribed geometric constraints between vehicles, is
one of the most actively investigated topics [2]. A promising method to optimally
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 156–172, 2024.
https://doi.org/10.1007/978-3-031-51497-5_12
A Distributed Architecture for Onboard Estimation and Control of MAVs 157
achieve this goal while satisfying diverse constraints is Nonlinear Model Predic-
tive Control (NMPC) due to its architectural heterogeneity, simple reconfigura-
bility, and hierarchical flexibility [3]. Furthermore, to improve the performance,
the distributed version of this algorithm is a prominent candidate because it
approximates the global objective more accurately than its decentralized coun-
terpart by including inter-robot communication.
However, NMPC-based algorithms require an accurate model of the system,
correct state estimates, and sufficient computational resources. Furthermore, the
pure distributed version of the algorithm requires a robust communication frame-
work. These requirements might be challenging to satisfy primarily on resource-
constrained Micro Aerial Vehicles (MAVs). For example, system identification
techniques might not yield a faithful model, the system might be time-varying
due to the intrinsic nature of multiple vehicles, and only the partial state infor-
mation and limited computational power might be available because of hardware
and software constraints. As a result, these problems should be addressed in a
tightly-coupled estimation and control architecture that considers the various
constraints in order to be deployed successfully on real robots. Another critical
challenge for deploying such systems is the dependency of robotic platforms on
global localization systems, such as Motion Capture Systems (MCSs) or Global
Navigation Satellite Systems (GNSSs), for indoor and outdoor scenarios, respec-
tively.
There are various successful real-world implementations of model predictive
approaches for the formations of multi-robot systems. Among those, we focus
on decentralized and distributed approaches with real-world implementations.
In [4], the authors propose a trajectory planner that considers various task require-
ments, such as swarm coordination, flight efficiency, and obstacle avoidance, for a
swarm of flying robots, and demonstrate the method in cluttered environments.
Although the strategy seems highly effective and scalable for flocking navigation,
visual-inertial odometry (VIO) coupled with Ultra-Wide Band (UWB) distance
measurement (without orientation information) might not be accurate enough
for formation control applications. In [5], the authors present a decentralized strat-
egy for compact swarms of unmanned aerial vehicles without communication
and external localization. The approach is well-constructed for outdoor flock-
ing applications; however, the accuracy of the ultra-violet (UV) direction and
ranging technique is limited for indoor formation applications. Furthermore,
since predictive information is not shared among vehicles, it might yield low
performance in dynamic scenarios. A distributed methodology is studied for
the formation control and obstacle avoidance of homogeneous robots employ-
ing multi-robot consensus in [6]. The method is scalable and efficient with the
number of robots; however, the implementation is not carried out onboard, and
a motion capture system is used to obtain state feedback. In [7], a distributed
predictive approach is presented for drone swarms in cluttered environments
and with high sensor noise. Although the real-world experiments with sixteen
quadrotors show the scalability of the strategy, the computations are not carried
out onboard, and the localization solution is global. In [8], the authors introduce a
158 I. K. Erunsal et al.
2 Problem Formulation
Similar to [16], the following notation for vectors and rotation matrices will be
adopted assuming that {a}, {b}, {c} are the coordinate frames and x is the name
of the vector:
x^c_{a/b}: Vector quantity x ∈ R^n of the origin of {a} with respect to the origin
of {b}, expressed in {c}.
R^a_b: Rotation matrix R ∈ SO(3) that transforms a vector expression from {a}
to {b}, with the Z-Y-X convention.
‖·‖ denotes the Euclidean norm of a vector or the spectral norm of a real
matrix, and ‖x‖_P := √(x^T P x) (with P ∈ R^{n×n} and P ≻ 0) stands for the weighted
Euclidean norm of x ∈ R^n.
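The weighted Euclidean norm defined above can be checked numerically; the helper name is illustrative:

```python
import numpy as np

def weighted_norm(x, P):
    """Weighted Euclidean norm ||x||_P = sqrt(x^T P x), defined for
    symmetric positive-definite P; with P = I it reduces to ||x||."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(x @ P @ x))

P = np.diag([4.0, 1.0, 1.0])   # example weight matrix with P > 0
x = np.array([1.0, 2.0, 2.0])
# ||x||_P = sqrt(4*1 + 4 + 4) = sqrt(12); ||x||_I = sqrt(1 + 4 + 4) = 3.
```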
Consider a formation network N that consists of K +1 aerial robots. Assume
that one of them is assigned as a leader L and the rest are followers Fi so that
N = {L, F1 , ..., FK }. All robots operate in a workspace W ⊂ R3 . The state of
each robot i ∈ N is defined as follows:
χ_i = [x^n_{b/n}^T  v^n_{b/n}^T  t^n_{b/n}^T  w^b_{b/n}^T]^T    (1)

where {n} is the inertial frame and {b} is the body-fixed frame. Then, for instance, the vector
x^n_{b/n} would represent the position of the origin of {b} with respect to {n} and
expressed in the latter coordinate system.
Problem 1. Consider one leader and K follower robots under the input, percep-
tion, communication, and state constraints u^b_{b/n} ∈ U_c, z^b_{b/n} ∈ Z_c and χ ∈ X_c.
Here, the leader can be considered as a separate and independent trajectory
tracking robot. Under the assumptions explained in the previous paragraph,
design a control and estimation strategy for followers so that the robotic net-
work will follow a predefined reference trajectory rp ∈ R3 while maintaining the
formation and connectivity defined by the (possibly time-varying) relative position
references Δx̄_ij and relative yaw angle references Δψ̄_i, where i ∈ N \ {L} and
j ∈ N_i.
3 Methodology
The components of our distributed architecture are selected and designed to take
into account resource constraints. In particular, we have paid attention to dis-
tribute the architecture over the network while preserving its strong integration
at the robot level. The former feature allows us to obtain scalable behavior, while
the latter brings high performance and robustness due to the tight coupling of
the perception, communication, estimation, and control components.
The proposed architecture is represented in Fig. 3. Here, the neighborhood
consists exclusively of followers, but the leader could also be part of the sub-
network. One can observe the tight-coupling of components on the vehicle level
and their distribution over the network. The details of the estimation and for-
mation control components and their relationship with perception and commu-
nication will be explained in the next section.
To comprehend the details of the estimation and control architecture, we
should define the coordinate frames employed here. The first frame is the body-
fixed reference frame {b}, defined for the real-time horizon; the second is
the body-fixed MPC reference frame {d}, anchored to the body throughout any
MPC prediction horizon. As a result, {b} and {d} coincide at the beginning of
each prediction horizon. Furthermore, {m} is defined as the inertial MPC-EKF
frame, which corresponds to the zero vector at the beginning of the motion. The
final reference frame is introduced as the MPC control frame {c}, a roll-
and pitch-free frame anchored to {d} and adopted to define the control and
reference variables. These frames are represented in Fig. 2.
Fig. 3. Distributed architecture for a sub-network; the tight coupling of components at the agent level
is observable.
In the following, we will present the process and perception models and highlight the important
points.
Consider a new extended state definition for each robot i ∈ N \ {L}:
where Δxij , Δvij and Δψij are the relative positions, velocities, and yaw angles
of the robot i with respect to j ∈ Ni ; vi , vj , ψ̇i and ψ̇j are the linear and
yaw velocities of the robot i and j ∈ Ni , respectively. Finally, di,a stands for
the slowly varying disturbances in the acceleration dynamics of the agent i.
Note that all states are expressed in the inertial frame {m}. Let us concatenate
the defined states so that we obtain an extended state for an individual robot
i ∈ N \ {L}:
ξ_i = [ξ_ii^T  ξ_ij1^T  ξ_ij2^T  ...  ξ_ijN^T]^T    (3)
where j1 , j2 , ..., jN ∈ Ni .
Consider a generic nonlinear time-discrete model of an individual agent i:
where Fi is the input, Ri stands for the rotation matrix, g is the gravitational
acceleration vector, Δt is the sample time, w’s represent the Gaussian noise, d’s
are the slowly varying disturbance (or uncertainty) on the corresponding states.
Note that these lumped disturbance estimates accounting for model uncertainty,
environmental disturbance, and time-varying characteristics of the vehicle are
crucial for the robust behavior of the predictive formation controller. In addition,
some definitions in the model can be written as follows: wij,Δa := wj,a − wi,a
and wij,Δψ := wj,ψ − wi,ψ . Note that all equations are written with respect to
the {m} frame and ∀j ∈ Ni . Furthermore, ∀j ∈ Ni , the perception model is
given as follows:
where z represents the measurement, ν stands for perception noise, and the
function g(.) maps the relative position to the range and the bearing information.
Furthermore, it is assumed that the robot has an accurate attitude estimator
that generates the roll and pitch estimates φi [k] and θi [k] so that the rotation
matrix Ri is fully defined.
Remark 1. (Neighboring vehicle states): Eq. (19) is critical in the perception
model. vj represents the velocity of any neighboring vehicle j ∈ Ni received
by communication and must be assigned to the reference frame of vehicle i by
Rj . However, if it is not available due to various communication flaws, the EKF
provides a continuous estimation and tracking of the velocity of neighboring
vehicles.
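As an illustration of the perception model, g(.) can be realized as a range/bearing/elevation parameterization of the relative position. The paper's exact mapping may differ, so treat this as a sketch under that assumption:

```python
import numpy as np

def range_bearing(dx):
    """Map a relative position vector (in the observer's frame) to range,
    bearing (azimuth) and elevation -- one common choice for the g(.) that
    maps relative position to range and bearing information."""
    dx = np.asarray(dx, dtype=float)
    rng = np.linalg.norm(dx)
    bearing = np.arctan2(dx[1], dx[0])   # azimuth angle in the x-y plane
    elevation = np.arcsin(dx[2] / rng)   # angle above the x-y plane
    return rng, bearing, elevation

# A neighbor 3 m ahead and 3 m to the left: range 3*sqrt(2), bearing 45 deg.
rng, bearing, elevation = range_bearing([3.0, 3.0, 0.0])
```

Because the mapping is nonlinear, it is linearized inside the EKF update, which is one reason an extended (rather than linear) Kalman filter is used here.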
min_{ξ_i[k|k],...,ξ_i[k+N|k], u_i[k|k],...,u_i[k+N−1|k]}  Σ_{n=0}^{N−1} J_k(ξ_i[k + n|k], u_i[k + n|k], r_i[k + n|k]) +
p_j = K C Δx_ij    (25)

where C is the camera matrix, which is a function of the focal length f of the
camera, and K is the gain.
Now, the stage and terminal cost functions can be expressed as follows:

J_k(ξ_i[k + n|k], u_i[k + n|k], r_i[k + n|k]) =
  Σ_{j∈N_i} ‖Δx̄_ij[k + n|k] − Δx_ij[k + n|k]‖_{QΔx} +
  Σ_{j∈N_i} ‖v_j[k + n|k] − v_i[k + n|k]‖_{Qv} +
  ‖u_i[k + n|k]‖_{Qu} + ‖Δu_i[k + n|k]‖_{QΔu} + Σ_{j∈N_i} ‖p_j‖_{Qp}    (26)
where Δx̄_ij represents the geometric formation reference and Δu_i stands for
the rate of change of inputs. As a result, in the stage cost function, the first
term minimizes the formation errors, the second term stabilizes the relative
velocities, the third and fourth terms optimize the control effort, and the last
one is responsible for satisfying camera FOV constraints.
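The stage cost of Eq. (26) can be sketched numerically for one robot and its neighbor set. The weight values and function names here are assumptions, and the squared weighted norm is used since MPC cost terms are typically quadratic:

```python
import numpy as np

def wn2(x, Q):
    """Squared weighted norm ||x||_Q^2 = x^T Q x."""
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)

def stage_cost(dx, dx_ref, v_i, v_js, u, du, p_js, Q):
    """Evaluate a stage cost shaped like Eq. (26): formation error +
    relative velocity + control effort + input rate + camera-FOV
    (image-plane offset) penalty, summed over the neighbor set."""
    J = wn2(u, Q["u"]) + wn2(du, Q["du"])          # control effort terms
    for dxj, dxj_ref, vj, pj in zip(dx, dx_ref, v_js, p_js):
        J += wn2(dxj_ref - dxj, Q["dx"])           # formation error
        J += wn2(vj - v_i, Q["v"])                 # relative velocity
        J += wn2(pj, Q["p"])                       # FOV penalty
    return J

# Illustrative weights (assumptions, not the tuned values from the paper).
Q = {"dx": np.eye(3), "v": 0.1 * np.eye(3), "u": 0.01 * np.eye(4),
     "du": 0.01 * np.eye(4), "p": 0.5 * np.eye(2)}
# One neighbor, 0.1 m formation error in x, everything else at rest.
J = stage_cost(dx=[np.array([1.9, 0.0, 0.0])], dx_ref=[np.array([2.0, 0.0, 0.0])],
               v_i=np.zeros(3), v_js=[np.zeros(3)], u=np.zeros(4),
               du=np.zeros(4), p_js=[np.zeros(2)], Q=Q)
```

Only the formation-error term is nonzero in this configuration, which makes it easy to verify each term's contribution in isolation.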
The continuous version of the prediction function f̂, expressed in Eq. (21), includes not only ego
but also neighboring vehicle motion characteristics; it can be discretized with
any efficient method, such as the implicit Runge-Kutta (i-RK) scheme, and can
be written as follows:
Δẋ^m_{d/m,ij} = v^m_{d/m,j} − v^m_{d/m,i}
v̇^m_{d/m,i} = (R^m_d F^d_{d/m,i}) / m_b + g + C_d v^m_{d/m,i} + d_{i,a}
ṫ^m_{d/m,i} = (1/τ_i)(k_i t_ref,i − t^m_{d/m,i})
ψ̇^m_{d/m,j} = 0, ∀j ∈ N_i    (28)
where the inputs are the thrust F^d_{d/m,i} and the attitude angle references t_ref,i. Finally, g_n
and h_n include constraints to impose the safety conditions/limits such as the
maximum velocity, thrust, and Euler angle that the robotic agent is allowed to
achieve. The following remarks will be helpful to gain more insight about the
OCP:
Remark 2. (Information sharing): At each sample instant, v^m_{d/m,j}, the predicted
velocity of neighboring vehicles, should be received, and v^m_{d/m,i}, the vehicle’s own
predicted velocity, should be transmitted over the communication channel for the
whole horizon. If the received information is not available for some reason, the D-
NMPC leverages the estimates sent by the local EKF for the prediction. In this
case, the formation controller assumes a constant velocity over the prediction
horizon.
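Remark 2's fallback and the first line of the prediction model in Eq. (28) can be sketched as follows; forward Euler stands in for the implicit Runge-Kutta scheme mentioned in the text, and all names are illustrative:

```python
import numpy as np

def predict_neighbor_velocity(received, ekf_estimate, N):
    """If the neighbor's predicted velocity trajectory arrived over the
    channel, use it; otherwise fall back to the local EKF estimate held
    constant over the N-step horizon (the constant-velocity assumption)."""
    if received is not None:
        return received                   # shape (N, 3): one sample per step
    return np.tile(ekf_estimate, (N, 1))  # constant velocity over the horizon

def euler_relative_position(dx0, v_j_traj, v_i_traj, dt):
    """Forward-Euler rollout of the relative-position dynamics
    d(Dx_ij)/dt = v_j - v_i from Eq. (28)."""
    dx = [np.asarray(dx0, dtype=float)]
    for vj, vi in zip(v_j_traj, v_i_traj):
        dx.append(dx[-1] + dt * (vj - vi))
    return np.array(dx)

N, dt = 10, 0.1
# Communication dropout: fall back to the EKF velocity estimate (1 m/s in x).
v_j = predict_neighbor_velocity(None, np.array([1.0, 0.0, 0.0]), N)
v_i = np.zeros((N, 3))
traj = euler_relative_position([2.0, 0.0, 0.0], v_j, v_i, dt)
```

After one second of horizon the predicted separation has grown from 2 m to 3 m, exactly what the constant-velocity fallback implies.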
Fig. 4. The Helipal Quadrotor and its components.
Fig. 5. Webots simulation consisting of three drones equipped with April-bundles and an RGB camera.
Fig. 6. Experiment arena with two drones.
Fig. 7. Leader’s trajectory for the navigation experiment.
Fig. 8. Formation errors for the navigation experiment: x, y, z and norm error; dashed red lines indicate the averages over time.
Fig. 9. Solver metrics: cost function value, KKT value, solver time and the number of iterations; dashed red lines indicate the averages over time.
disturbance estimations are mostly steady, indicating that the main disturbance
source in this experiment is the time-invariant model uncertainty.
Fig. 10. Estimation errors (relative position, velocity) and disturbance estimations in 3D; dashed red lines indicate the averages over time.
Fig. 11. Follower’s reference during the reconfiguration experiment. 15-30-30 corresponds to the angles of relative position vectors.
For these experiments, the position of the leader is kept fixed and the formation
reference of the follower is changed over a prescribed duration to reconfigure the
formation geometry. Three distinct relative position references for the follower
are displayed in Fig. 11. The total duration of the experiment is around 23 s and
the scenario is realized five times.
The mean and standard deviations of the formation errors are given in Fig. 12.
We can see that the oscillatory behavior of the errors is due to the fact that the
reference commands are sent separately over the course of the experiment. As
a result, the mean error is around 0.21 m ≈ 7% of the total range. However,
the errors reach steady state after some time and, on average, stabilize around
0.17 m.
Solver metrics are shown in Fig. 13. We can make similar observations for
the solver metrics, except for the cost value. It can be seen that the cost value
is overall higher but reaches the same low levels at the end of the experiment.
This is because the experiment is more dynamic compared to the formation
navigation experiment. Furthermore, the observed plateaus are possibly due to
the transient compromise between the individual terms in the cost function.
Fig. 12. The formation errors for the reconfiguration experiment: x, y, z and norm errors; dashed red lines indicate the averages over time.
Fig. 13. Solver metrics: cost function value, KKT value, solver time and the number of iterations; dashed red lines indicate the averages over time.
4.3 Discussion
In order for the reader to compare the quality of the results, we present a brief
benchmark on relative position estimation and formation errors. Recently, [28]
introduced a decentralized VIO-UWB perception and estimation scheme for
aerial swarms. For an indoor trajectory of 27.8 m, they obtained about 0.08
m average relative position estimation error, which is comparable to our vision-
based solution. Another state-of-the-art localization method presented by [5]
targets outdoor applications with their ultraviolet (UV) markers and camera,
and they achieved 1.16 m relative localization RMSE. Finally, [21] employed
a distributed infrared-based relative localization system for the formations of
quadrotors. The system achieved an error of less than 0.20 m in range, 5◦ in
bearing and 10◦ in elevation for the sensor-to-target distance of 3 m. Consider-
ing formation control errors, [21] adopted a distributed graph-based formation
control method that obtained a 0.27 m range and 5◦ elevation error for the
desired range of 2.26 m, which corresponds to 12% of the full range. In our pre-
vious work [13], we proposed a decentralized predictive approach and achieved
an average formation error of 0.44 m. The high error observed in this work was
due to the exclusion of aerodynamic disturbances occurring between vehicles and
the assumption of constant neighbor velocity in the prediction model.
The low average estimation (≤0.1 m) and formation (≤0.17 m) errors
obtained in two different formation control scenarios reveal the high perfor-
mance of our strategy. By leveraging the tightly-coupled distributed MPC and
EKF scheme, we could simultaneously comply with perception, communication,
and vehicle constraints. Although the method is scalable in terms of algorith-
mic properties, a couple of issues should be addressed for real-world scalability.
First, at least three wide-lens cameras should be used to perceive the neighbor
vehicles in all directions. Next, the April-bundle pose detection algorithm is not
lightweight, and in our current implementation, it runs on the CPU. There are
5 Conclusion
Obtaining a robust, autonomous, performant and scalable control and estimation
architecture for multi-robot systems is a significant challenge. In this work, we
present a unified, predictive, and distributed architecture to carry out leader-
follower formations with MAVs. Our approach employs onboard resources and
integrates various physical constraints, such as camera FOV, actuation, velocity,
and acceleration limits, into the architecture. After a parameter tuning process
in a high-fidelity simulator with three vehicles, we validated and showed the
performance of our approach with two real quadrotors, one serving as leader
and the other as follower, for two common formation tasks.
In the future, we intend to validate the approach with more following vehicles,
improve the autonomy of the leader, analyze the robustness in the presence of various
communication problems and compare our solution with alternative approaches
in detail.
Acknowledgment. This work has been partially sponsored by the FCT grant
[PD/BD/135151/2017], the FCT doctoral program RBCog and the FCT project
[UIDB/50009/2013].
References
1. Ebel, H., Ardakani, E., Eberhard, P.: Distributed model predictive formation con-
trol with discretization-free path planning for transporting a load. Robot. Auton.
Syst. 96, 211–223 (2017)
2. Oh, K., Park, M., Ahn, H.: A survey of multi-agent formation control. Automatica
53, 424–440 (2015)
3. Eren, U., Prach, A., Koçer, B., Raković, S., Kayacan, E., Açıkmeşe, B.: MPC in
aerospace systems: current state and opportunities. J. Guid. Control. Dyn. 40,
1541–1566 (2017)
4. Zhou, X., et al.: Swarm of micro flying robots in the wild. Sci. Robot. 7, eabm5954
(2022)
5. Petráček, P., Walter, V., Báča, T., Saska, M.: Bio-inspired compact swarms of
unmanned aerial vehicles without communication and external localization. Bioin-
spir. Biomim. 16, 026009 (2020)
6. Alonso-Mora, J., Montijano, E., Nägeli, T., Hilliges, O., Schwager, M., Rus, D.:
Distributed multi-robot formation control in dynamic environments. Auton. Robot.
43, 1079–1100 (2019)
7. Soria, E., Schiano, F., Floreano, D.: Distributed Predictive Drone Swarms in Clut-
tered Environments. IEEE Robot. Autom. Lett. 7, 73–80 (2022)
8. Allamraju, R., et al.: Active perception based formation control for multiple aerial
vehicles. IEEE Robot. Autom. Lett. 4, 4491–4498 (2019)
9. Hafez, A., Marasco, A., Givigi, S., Iskandarani, M., Yousefi, S., Rabbath, C.: Solv-
ing multi-UAV dynamic encirclement via model predictive control. IEEE Trans.
Control Syst. Technol. 23, 2251–2265 (2015)
10. Yuan, Q., Zhan, J., Li, X.: Outdoor flocking of quadcopter drones with decentral-
ized model predictive control. Int. Soc. Autom. Trans. 71, 84–92 (2017)
11. Van Parys, R., Pipeleers, G.: Distributed MPC for multi-vehicle systems moving
in formation. IEEE Robot. Autonom. Syst. 97, 144–152 (2017)
12. Abichandani, P., Levin, K., Bucci, D.: Decentralized formation coordination of
multiple quadcopters under communication constraints. In: IEEE International
Conference on Robotics and Automation, pp. 3326–3332 (2019)
13. Erunsal, I., Ventura, R., Martinoli, A.: NMPC for formations of multi-rotor micro
aerial vehicles: an experimental approach. In: International Symposium on Exper-
imental Robotics, pp. 449–461 (2020)
14. Thakur, D., Tao, Y., Li, R., Zhou, A., Kushleyev, A., Kumar, V.: Swarm of inexpen-
sive heterogeneous micro aerial vehicles. In: International Symposium on Experi-
mental Robotics, pp. 413–423 (2020)
15. Olson, E.: AprilTag: a robust and flexible visual fiducial system. In: IEEE Inter-
national Conference on Robotics and Automation, pp. 3400–3407 (2011)
16. Fossen, T.: Handbook of marine craft hydrodynamics and motion control. John
Wiley and Sons (2011)
17. Mahony, R., Kumar, V., Corke, P.: Multirotor aerial vehicles: modeling, estimation,
and control of quadrotor. IEEE Robot. Autom. Mag. 19, 20–32 (2012)
18. Omari, S., Hua, M., Ducard, G., Hamel, T.: Nonlinear control of VTOL UAVs
incorporating flapping dynamics. In: IEEE/RSJ International Conference on Intel-
ligent Robots and Systems, pp. 2419–2425 (2013)
19. Jazwinski, A.H.: Stochastic Processes and Filtering Theory. Academic Press (1970)
20. Quan, Q.: Introduction to multicopter design and control. Springer (2017)
21. Dias, D.: Distributed State Estimation and Control of Autonomous Quadrotor
Formations Using Exclusively Onboard Resources. (EPFL-IST PhD Thesis, No.
9224) (2019)
22. https://docs.px4.io/. Autopilot control. Accessed July 2022
23. Magni, L., Scattolini, R.: Stabilizing decentralized model predictive control of non-
linear systems. Automatica 42, 1231–1236 (2006)
24. Szeliski, R.: Computer vision: algorithms and applications. Springer Science &
Business Media (2010)
25. Gros, S., Zanon, M., Quirynen, R., Bemporad, A., Diehl, M.: From linear to non-
linear MPC: bridging the gap via the real-time iteration. Int. J. Control 93, 62–80
(2020)
26. Ferreau, H., Kraus, T., Vukov, M., Saeys, W., Diehl, M.: High-speed moving hori-
zon estimation based on automatic code generation. In: IEEE Conference On Deci-
sion And Control, pp. 687–692 (2012)
27. Michel, O.: Webots: professional mobile robot simulation. Int. J. Adv. Rob. Syst.
1, 39–42 (2004)
28. Xu, H., et al.: Omni-swarm: a decentralized omnidirectional visual-inertial-uwb
state estimation system for aerial swarms. IEEE Trans. Robot. (2022)
29. https://github.com/NVIDIA-AI-IOT/isaac_ros_apriltag. Accessed Sept 2022
Search Space Illumination of Robot
Swarm Parameters for Trustworthy
Interaction
1 Introduction
Robot swarms create promising opportunities within a wide range of applica-
tions, from intralogistics for small businesses, to space exploration [24]. How-
ever, to create a swarm robotic system that can be deemed trustworthy, and
thus enable widespread use, swarm operators will need to rapidly simulate,
deploy, monitor, and control swarms through intuitive cyber-physical infrastruc-
ture, making use of methodologies that do not detract from the swarm’s inherent
beneficial qualities.
Swarms of robots present an interesting challenge when it comes to human
control. Human input can benefit the swarm’s performance by providing context
or instructions from the human operator that would not typically be available to
the swarm [13]. However, the high number of agents within a swarm can present
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 173–186, 2024.
https://doi.org/10.1007/978-3-031-51497-5_13
174 J. Wilson and S. Hauert
difficulty: the more agents there are acting within a system, the more difficult manually controlling the entire swarm becomes, potentially surpassing the cognitive limit of the operator [15]. Additionally, by centralising the swarm around a human operator, it is possible to create a single point of failure in the system. As a result, the robot system may lose its swarm-like properties, i.e. the adaptability, scalability, and robustness swarms gain from their innate redundancy and distributed operation [9].
The level of control an operator has over a swarm is a key consideration when
producing an effective swarm system [1]. Too much influence from a human, espe-
cially from an inexperienced operator, can lead to significantly reduced swarm
performance as the collective behaviours of the swarm are overridden by human
control. This effect has been referred to as ‘Neglect Benevolence’ [28]: in some cases, swarms can be observed to operate more effectively as interference from a human operator is reduced. Nonetheless, there is currently a demand for human operator supervision of autonomous systems [20] and swarms [4]. Not
only can operators provide the aforementioned context to swarm tasks, but they
can also provide an element of trust in the swarm system, giving the public reas-
surance in their interactions with the swarm. Operators are able to make ethical
considerations when presented with scenarios that current levels of autonomy
would struggle to provide adequate resolutions to. For example: directing agents
to areas of importance in a wild fire scenario, or selecting a preferable behaviour
in a shared human-robot warehouse as humans enter/exit the environment. Even
in less critical scenarios, operators themselves may exhibit more trust in a system
they are able to influence towards their preference for operation.
To address the issue of effective human swarm interaction, many techniques
have been investigated. Some studies have explored approaches for direction
control of agents including gesture, voice and touch [14,22]. While others have
presented approaches for monitoring and providing information on many agents
to an operator [6,29]. In addition to this, there have also been investigations into
the appropriate levels of control over swarms [1,27].
Recently, evolutionary techniques have been deployed to automatically generate swarm behaviour, overcoming the difficulty of collective behaviour design [2]. Within this field, methods have been explored that generate several swarm
behaviour options that are human understandable and human readable [8]. Oper-
ators with contextual awareness can then use these options to send commands
to a swarm, which then behaves autonomously and in a distributed manner to
execute the desired swarm behaviour. This allows the operator to modify the
behaviour of the swarm as a task or environment varies, while still preserving
swarm qualities.
We look to expand upon the idea of operator behaviour selection and hope to add an element of trustworthiness to swarm systems by providing operators with characteristic options defined in simple terms, options which have been measured and tested based on sets of generated, low-level swarm parameters.
When it comes to deployment, this means that an operator can simply express
their use-case’s characteristic priorities, and select a set of parameters for their
Search Space Illumination for Trustworthy Swarms 175
Fig. 1. Illustration of target interface displaying the behaviour of the swarm within
an arena (left side of interface). Users are able to list their priority of characteristic
traits. Based on this priority, the top sets of available characteristic combinations are
displayed in a radar chart to the right. Operators can then select the suggestion they
feel best fits their use case or can re-specify their preference order to receive additional
recommendations.
2 Methodology
2.1 Scenario
Simulated Agents. The agents within our simulation are based on DOTS, an
open access swarm robot solution designed for industrial swarm experimentation
[10]. While this experimentation platform has a high fidelity simulation platform
available, we chose to create a simple 2D simulation for the purpose of executing
the illumination algorithm. This was due to the large amount of computing
power that would be required to simulate 5–40 agents in high fidelity across
several thousand iterations.
In our simulation, agents measure 30 cm in diameter and are able to pick up boxes instantaneously once within 25 cm of the box's centre coordinate. Agents have been simulated with simple 2D physics and, should their avoidance systems fail, are able to collide elastically with other agents, boxes, and the walls of the environment. The maximum velocity, the size of the swarm, the maximum amount of time an agent will carry a box, and the distance at which agents begin to avoid one another are all controlled by the Map-Elites mutations. The limitations of our
swarm parameters and the magnitude of mutations applied on each iteration of
Map-Elites are shown in Table 1.
[Flow chart summary: Start → check timer: have 10 seconds passed? → is an agent within emergency avoidance range? → is an agent within avoidance range? (Yes/No branches)]
Fig. 2. Flow chart illustrating the behaviour executed by every swarm agent.
Energy Consumption. Our first swarm characteristic is the total energy con-
sumed by the swarm (E) (displayed in Eq. 2). Energy consumption is a key
ethical consideration when creating modern products and, depending on the
use case, may be a factor in the trustworthiness of a system [21,25]. E was pro-
duced by summing the energy used by robots (Er ) across the experiment length.
Energy consumption in this case was approximated through the magnitude of
acceleration each agent experienced at any given time step. This creates a score
capturing where the majority of energy would be consumed in the equivalent
real-world system.
As with the performance metric, to compensate for early completion of the
task and to make parameter set comparison fair, the total score was multiplied
by maximum experiment length (T ) over actual time taken to complete the task
(t).
E = (T / t) Σ_{r=1}^{n_r} E_r    (2)
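As a sketch of Eq. 2 (the names and data layout are illustrative, not the paper's simulator), the score can be computed from per-robot logs of acceleration magnitude:

```python
def energy_score(accel_magnitudes, T, t):
    """Approximate E = (T / t) * sum of per-robot energy E_r, where E_r is
    the sum of |acceleration| over the robot's time steps (Eq. 2)."""
    per_robot = [sum(a) for a in accel_magnitudes]  # E_r for each robot
    return (T / t) * sum(per_robot)

# A run finishing at t = 50 of T = 100 has its raw score doubled.
E = energy_score([[1.0, 2.0], [0.5, 0.5]], T=100, t=50)  # -> 8.0
```

The T/t factor is what makes parameter sets that finish early comparable to those that run the full experiment length.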
Robot Danger. The next characteristic was the average amount of time each agent spent in the emergency avoidance mode (shown in Fig. 2). This characteristic was chosen to give an indication of how safe a system might be. Equation 3 shows that this was calculated by summing the amount of time each agent spent performing emergency avoidance (D_t) throughout the experiment and dividing by the total number of agents in the swarm (n_r). This was once more multiplied by a factor of T/t to give a value representative of how long agents would be expected to have spent in emergency avoidance should the simulation have extended to the maximum time limit.
D = (T / t) (Σ_{r=1}^{n_r} D_t) / n_r    (3)
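A minimal sketch of Eq. 3, assuming each agent's cumulative emergency-avoidance time has been logged (names are illustrative):

```python
def danger_score(emergency_times, T, t):
    """D = (T / t) * (sum of per-agent emergency-avoidance times) / n_r:
    the mean time per agent, scaled to the full experiment length (Eq. 3)."""
    n_r = len(emergency_times)
    return (T / t) * sum(emergency_times) / n_r
```

As with the energy metric, the T/t factor extrapolates runs that complete the task early.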
Robustness. Our final characteristic, robustness, comes from the recent work
on quantifying swarm traits conducted by Milner et al. [18]. This metric is pro-
duced from the deviation from optimal performance due to faults (dP ) over the
deviation from optimal swarm size due to loss of agents (dN ). Using this metric,
values where R ≥ 1 are considered robust while R < 1 are not. This means that a
system is considered robust when, despite failures or faults, the swarm produces
a performance equal to or greater than the same system with the faulty agents
entirely removed from the swarm. We show the equations comprising this metric
in Eqs. 4–6.

R = dP / dN    (4)

dP = 1 − (P_o − P) / P_o    (5)

dN = 1 − N_f / N    (6)
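Eqs. 4–6 combine into a small function. This sketch follows the interpretation given in the text: R ≥ 1 exactly when the faulty swarm performs at least as well as the same swarm with the faulty agents removed.

```python
def robustness(P, Po, Nf, N):
    """Swarm robustness metric from Milner et al. [18], as described above."""
    dP = 1 - (Po - P) / Po   # deviation from optimal performance (Eq. 5)
    dN = 1 - Nf / N          # deviation from optimal swarm size (Eq. 6)
    return dP / dN           # R >= 1 is considered robust (Eq. 4)

# If 2 of 10 agents are faulty and performance drops to 80% of optimal,
# R = 0.8 / 0.8 = 1.0: exactly as robust as removing the faulty agents.
```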
To calculate robustness in these experiments, faults must be injected into the
system. We inject two faults into comparison trials to gauge the performance of
the system when compromised. To simulate these robot faults we recreated a
common wheel fault, in which faulty agents have a constant directional bias.
We achieved this in simulation by creating an orthogonal vector to the right of
the agent at one tenth the magnitude of the current velocity. This caused faulty
agents to consistently spiral to the right, compromising avoidance and direction
of travel.
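The wheel fault can be emulated with a one-line velocity update (a sketch assuming a 2D, y-up convention, in which the right-orthogonal of (vx, vy) is (vy, −vx)):

```python
def apply_wheel_fault(vx, vy):
    """Add a rightward bias vector, orthogonal to the velocity, at one
    tenth of the current velocity's magnitude (constant directional bias)."""
    return vx + 0.1 * vy, vy - 0.1 * vx

# An agent moving along +x is nudged to its right (-y) at each step,
# so repeated application makes it spiral rightwards.
```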
As a result of fault injection, we are able to gain values for the swarm’s
performance with no faults injected (Po ) and performance with faults injected
(P ). As mentioned before, across these trials the number of faults injected to
the swarm for testing (Nf ) was always 2, while N , the number of agents in the
swarm, would act as a variable parameter in our search space illumination.
Fig. 3. Illustration of the Map-Elites algorithm process with example parameters, characteristic values and mutations.
In our use of Map-Elites we began with an empty feature space with charac-
teristics divided equally into 10 bins, using maximum and minimum values iden-
tified through initial testing of the simulation. Starting with a single unevaluated
parameter set, on each iteration thereafter we randomly selected 7 genomes cur-
rently filling the feature space to mutate. We then tested all 7 of the offspring in
parallel, repeating each test 4 times, taking the mean results from the repetitions
as the genome’s final characteristic and performance evaluation scores.
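The loop just described can be sketched as follows. This is a simplified, serial version with illustrative names; `evaluate` is assumed to return characteristics already normalised to [0, 1) and to average the 4 repetitions internally.

```python
import random

def map_elites(evaluate, mutate, initial, iterations, n_parents=7, bins=10):
    """Sketch of the Map-Elites loop described in the text.

    The archive maps a tuple of characteristic bin indices to the best
    (genome, performance) found so far for that cell."""
    archive = {}

    def insert(genome):
        characteristics, performance = evaluate(genome)
        cell = tuple(min(int(c * bins), bins - 1) for c in characteristics)
        if cell not in archive or performance > archive[cell][1]:
            archive[cell] = (genome, performance)

    insert(initial)  # start from a single parameter set
    for _ in range(iterations):
        # Select 7 genomes currently filling the feature space and mutate them.
        for _ in range(n_parents):
            parent = random.choice(list(archive.values()))[0]
            insert(mutate(parent))
    return archive
```

In the paper the 7 offspring of each iteration are evaluated in parallel; here they are evaluated in turn for clarity.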
3 Results
The results of our application of Map-Elites can be seen in Fig. 4. These results,
displaying 28000 total parameter mutations (4000 repetitions of our 7 parallel
parameter mutation batches, executed until coverage of the feature space was
deemed adequate), show patterns forming in the relations between characteristics
and performance. To begin, there is a clear gradient towards the top right of each graph. This shows, somewhat intuitively, that there is a link between power consumption, safety, and performance. This is explained by fast-moving agents, or large numbers of agents, consuming large amounts of energy to quickly move boxes from one side of the arena to the other. At these higher speeds, agents also struggle to effectively maintain distance from one another, relying more on emergency avoidance to prevent collisions. Interestingly, robustness does not seem to have
Looking more closely at the results from our application of Map-Elites, we con-
sider a case study in which users have specified a priority list for swarm charac-
teristics. We then take a closer look at the parameters that could be selected for
the representative genomes, highlighting how Map-Elites generated characteris-
tics can create an effective tool for swarm control.
Taking the feature space generated through our implementation of Map-
Elites, we produce radar charts to illustrate characteristic values. To display
the information intuitively, characteristics have been normalised to take values
between 0 and 1. Energy consumption and danger have been used to produce
values for energy efficiency (1 − E) and safety (1 − D) respectively. The resulting
characteristics thus constitute their greatest fitness at a value of 1, in line with
performance and robustness.
In our example case study, a user requires a swarm for a cluttered warehouse
in which humans will frequently interact with swarm agents during the day. At
night there will be no humans present in the warehouse and agents will rely on
energy stored via solar panels during the day. As a result, the operator specifies
priority lists 1 and 2 for day and night operation respectively:
Priority List 1
1. Safety 2. Performance 3. Energy Efficiency 4. Robustness
Priority List 2
1. Energy Efficiency 2. Performance 3. Safety 4. Robustness
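One simple way such a priority list could be turned into a ranking, shown here as an illustrative sketch (the paper does not specify the interface's exact ranking rule), is a weighted sum in which earlier priorities carry larger weights:

```python
def recommend(genomes, priority, top_k=3):
    """Rank genomes (dicts mapping characteristic name to a normalised
    value in [0, 1]) against an ordered priority list; earlier traits
    receive larger weights."""
    weights = {trait: len(priority) - i for i, trait in enumerate(priority)}

    def score(g):
        return sum(weights[t] * g[t] for t in priority)

    return sorted(genomes, key=score, reverse=True)[:top_k]

day_priority = ["safety", "performance", "energy_efficiency", "robustness"]
night_priority = ["energy_efficiency", "performance", "safety", "robustness"]
```

Returning the top few candidates, rather than a single winner, matches the interface in Fig. 1, which lets the operator inspect the trade-offs before choosing.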
In Fig. 5 we illustrate two radar charts showing the genomes that would be
presented to the operator for each of the stated priority lists. In the illustration
for priority list 1 we can see that example 1 shows the option with the highest
possible safety rating. However, this safety comes at a cost to all of the other
swarm features. By selecting other options with slightly lower safety scores, we see improvements to both robustness and performance. Similarly, for the radar chart given for priority list 2, we can see that example 4, the genome best representing the operator's priorities, gives very good results for energy efficiency,
safety and robustness but lacks performance. As energy efficiency is reduced in
examples 5 and 6 we see a notable increase in performance. However, we also
see a drop in robustness and safety.
Using these graphs to display swarm information in simple terms, we can
see how the search space illumination can be beneficial to a user. Operators
can make their selection, clearly see the trade-offs between the features they
have a primary interest in, and use this information to make informed swarm
characteristic selection.
Fig. 5. Radar diagrams each displaying the characteristics and performance of the
three example parameter sets that may be recommended for the specified operator
preferences.
trees [11], instead of simply mutating swarm parameters. Through the use of
this system, swarms could be equipped with vast repertoires of behaviour types,
deployed based on human understanding of characteristic requirements.
Another interesting implementation for this system would be to create
diversity amongst a swarm, taking clusters of neighbouring genomes to be
spread amongst agents. These selections may lead to more robust and effec-
tive behaviours with agents specialising and dividing labour to complement each
other’s strengths and weaknesses. This would reflect the benefits demonstrated
by natural examples of increased swarm diversity in social insects [23].
We also hope to investigate the viability of this system for user interaction
by performing a human study. This study would examine whether a system
similar to that presented in this paper would be perceived as beneficial and
more trustworthy than other high-level command techniques, or direct agent
control.
Finally, the generation of discrete sets of parameters seen in this paper could
enable rigorous validation of swarms. After the initial search space illumination,
the resultant parameter sets or behaviours could be taken through a series of
tests, evaluating the viability of the genomes, removing genomes from the fea-
ture space that are deemed unacceptable. Such a process would increase user
confidence in the system, adding an additional layer of safety to enforce the
trustworthiness of any behaviours or parameter sets selected via the user’s char-
acteristic requirements.
References
1. Ashcraft, C.C., Goodrich, M.A., Crandall, J.W.: Moderating operator influence in
human-swarm systems. In: 2019 IEEE International Conference on Systems, Man
and Cybernetics (SMC), pp. 4275–4282. IEEE (2019)
2. Birattari, M., et al.: Automatic off-line design of robot swarms: a manifesto. Front.
Robot. AI 6, 59 (2019)
3. Bossens, D.M., Tarapore, D.: Rapidly adapting robot swarms with swarm map-
based Bayesian optimisation. In: 2021 IEEE International Conference on Robotics
and Automation (ICRA), pp. 9848–9854. IEEE (2021)
4. Carrillo-Zapata, D., et al.: Mutual shaping in swarm robotics: user studies in fire
and rescue, storage organization, and bridge inspection. Front. Robot. AI 7, 53
(2020)
5. Cully, A., Clune, J., Tarapore, D., Mouret, J.B.: Robots that can adapt like ani-
mals. Nature 521(7553), 503–507 (2015)
6. Divband Soorati, M., Clark, J., Ghofrani, J., Tarapore, D., Ramchurn, S.D.:
Designing a user-centered interaction interface for human-swarm teaming. Drones
5(4), 131 (2021)
7. Engebraaten, S.A., Moen, J., Yakimenko, O.A., Glette, K.: A framework for auto-
matic behavior generation in multi-function swarms. Front. Robot. AI, 175 (2020)
8. Hogg, E., Hauert, S., Harvey, D., Richards, A.: Evolving behaviour trees for super-
visory control of robot swarms. Artif. Life Robot. 25(4), 569–577 (2020)
9. Jones, S., Milner, E., Sooriyabandara, M., Hauert, S.: Distributed situational
awareness in robot swarms. Adv. Intell. Syst. 2(11), 2000110 (2020)
10. Jones, S., Milner, E., Sooriyabandara, M., Hauert, S.: DOTS: an open testbed for
industrial swarm robotic solutions. arXiv preprint arXiv:2203.13809 (2022)
11. Jones, S., Studley, M., Hauert, S., Winfield, A.: Evolving behaviour trees for swarm
robotics. In: Groß, R., et al. (eds.) Distributed Autonomous Robotic Systems.
SPAR, vol. 6, pp. 487–501. Springer, Cham (2018). https://doi.org/10.1007/978-
3-319-73008-0 34
12. Kaiser, T.K., Hamann, H.: Evolution of diverse swarm behaviors with minimal
surprise. In: ALIFE 2020: The 2020 Conference on Artificial Life, pp. 384–392.
MIT Press (2020)
13. Kapellmann-Zafra, G., Salomons, N., Kolling, A., Groß, R.: Human-robot swarm
interaction with limited situational awareness. In: Dorigo, M., et al. (eds.) ANTS
2016. LNCS, vol. 9882, pp. 125–136. Springer, Cham (2016). https://doi.org/10.
1007/978-3-319-44427-7 11
14. Kim, L.H., Drew, D.S., Domova, V., Follmer, S.: User-defined swarm robot control.
In: Proceedings of the 2020 CHI Conference on Human Factors in Computing
Systems, pp. 1–13 (2020)
15. Lewis, M.: Human interaction with multiple remote robots. Rev. Hum. Factors
Ergon. 9(1), 131–174 (2013)
16. Milner, E., Sooriyabandara, M., Hauert, S.: Stochastic behaviours for retrieval of
storage items using simulated robot swarms. Artif. Life Robot. 27, 1–8 (2022)
17. Milner, E., Sooriyabandara, M., Hauert, S.: Swarm diffusion-taxis: transport of
spatial information for cooperative gradient-based navigation. In: AROB SWARM
(2022)
18. Milner, E., Sooriyabandara, M., Hauert, S.: Swarm performance indicators (SPIs). In preparation (2022)
19. Mouret, J.B., Clune, J.: Illuminating search spaces by mapping elites. arXiv
preprint arXiv:1504.04909 (2015)
20. Nahavandi, S.: Trusted autonomy between humans and robots: toward human-on-
the-loop in robotics and autonomous systems. IEEE Syst. Man Cybern. Magaz.
3(1), 10–17 (2017)
21. Partridge, A., et al.: ReRobot: recycled materials for trustworthy soft robots. In:
5th IEEE-RAS International Conference on Soft Robotics (2022)
22. Podevijn, G., O’Grady, R., Nashed, Y.S.G., Dorigo, M.: Gesturing at subswarms:
towards direct human control of robot swarms. In: Natraj, A., Cameron, S., Mel-
huish, C., Witkowski, M. (eds.) TAROS 2013. LNCS (LNAI), vol. 8069, pp. 390–
403. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43645-5 41
23. Psalti, M.N., Gohlke, D., Libbrecht, R.: Experimental increase of worker diversity
benefits brood production in ants. BMC Ecol. Evol. 21(1), 1–10 (2021)
24. Schranz, M., Umlauft, M., Sende, M., Elmenreich, W.: Swarm robotic behaviors
and current applications. Front. Robot. AI 7, 36 (2020)
25. Thiebes, S., Lins, S., Sunyaev, A.: Trustworthy artificial intelligence. Electron.
Mark. 31(2), 447–464 (2021)
26. Urquhart, N., Guckert, M., Powers, S.: Increasing trust in meta-heuristics by using
map-elites. In: Proceedings of the Genetic and Evolutionary Computation Confer-
ence Companion, pp. 1345–1348 (2019)
27. Walker, P., Nunnally, S., Lewis, M., Chakraborty, N., Sycara, K.: Levels of automa-
tion for human influence of robot swarms. In: Proceedings of the Human Factors
and Ergonomics Society Annual Meeting, vol. 57, pp. 429–433. SAGE Publications
Sage CA: Los Angeles, CA (2013)
28. Walker, P., Nunnally, S., Lewis, M., Kolling, A., Chakraborty, N., Sycara, K.:
Neglect benevolence in human control of swarms in the presence of latency. In:
2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC),
pp. 3009–3014. IEEE (2012)
29. Wilson, J., Hauert, S.: Information transport in communication limited swarms.
In: 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics
(2022)
Collective Gradient Following
with Sensory Heterogeneous UAV Swarm
1 Introduction
Coordinated and ordered collective motion of entities can be encountered in different species, from bacteria to large mammals [17]. They gain various advantages from this distributed organization. For some species, the advantage is an improved search for food sources [6]. For others, it can be safety and enhanced survival [11]. Among these advantages, an important one is emergent sensing. Emergent
sensing can be defined as the collective behavior arising from social interactions
that enable a swarm to perceive and react to a global feature in the environment
while individuals alone perceive insufficient information about this feature and
would not be able to react alone. A special case, considered in this paper, is
the ability to achieve collective gradient following with individuals having the
measurements limited to the scalar values of the gradient [3]. When individu-
als have the right decision and evaluation mechanisms, emergent sensing can be
achieved without direct information exchange of the environmental feature. Fish
schools couple this behavior with collective motion [2,13]. An individual fish
cannot estimate the light gradient in the environment and, hence, cannot follow
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 187–201, 2024.
https://doi.org/10.1007/978-3-031-51497-5_14
188 T. A. Karagüzel et al.
proportion of non-sensory agents in the swarm, for a wide range of swarm sizes. In real-life experiments, we emulate peer and environmental sensing to show that similar behaviors are achievable with real aerial vehicle dynamics, in a way that does not violate the aforementioned constraints.
2 Methodology
In the equation above, the relative distance of a neighbor peer m is denoted by d_i^m, and the bearing angle of neighbor m is represented by ∠e^{jφ_i^m}. In ADM, the desired distance, i.e. the distance at which neighbor m produces no virtual force on the focal agent, is given by d_i^des = σ_i. Finally, α stands for the proximal weight coefficient. The contribution of each perceived peer to the magnitude of p_i is:
p_i^m(d_i^m, σ_i) = −ε [ (σ_i / d_i^m)^2 − (σ_i / d_i^m)^{0.5} ]    (2)
In the above equation, ε is the coefficient that tunes the strength of the proximal vector. In addition to the peers' positions, sensory agents can use the gradient's local value to compute the proximal vector. These scalar measurements modulate d_i^des through σ_i. In ADM, the desired distance of the focal agent (d_i^des) takes the same value as the coefficient σ_i.
The value of σi of a sensory agent i is computed according to the scalar
perception of the gradient as follows:
σ_i = σ_c + σ_c ( G◦ / (G_max − G◦) )^{0.2}    (3)
The scalar gradient perception of the focal agent is depicted by the term G◦
and the sensor threshold by Gmax . σc is a parameter which is kept constant
and simply determines upper and lower boundaries of σi on sensory agents. On
non-sensory agents, σi is always equal to σc . In other words, non-sensory agents
always sense G◦ as zero.
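Eqs. 2 and 3 translate directly into code. This is a sketch using the forms reconstructed above; ε, σ_c and the readings are illustrative values, and a non-sensory agent simply passes g = 0.

```python
def adm_magnitude(d, sigma, eps):
    """ADM contribution of one neighbor at distance d (Eq. 2): negative
    values repel (d < sigma), positive values attract (d > sigma)."""
    return -eps * ((sigma / d) ** 2 - (sigma / d) ** 0.5)

def adm_sigma(g, g_max, sigma_c):
    """Modulate the desired distance via sigma (Eq. 3). A non-sensory
    agent always reads g = 0 and therefore keeps sigma = sigma_c."""
    return sigma_c + sigma_c * (g / (g_max - g)) ** 0.2
```

At d = σ both bracketed terms equal 1 and the force vanishes, matching d_i^des = σ_i.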
To calculate the linear (U_i) and angular (ω_i) speeds, first the proximal control vector (p_i) is projected onto the agent's local frame. The local frame of the agent is placed such that the x-axis of this orthogonal, right-handed frame always coincides with the heading direction. Next, the speeds are calculated as follows, where K_1 and K_2 are the respective speed gains, and p_x and p_y are the components of p_i in the agent's local frame:

U_i = K_1 p_x ,    ω_i = K_2 p_y    (4)
Finally, the linear speed (U_i) is clamped between 0 and U_max, whereas the angular speed (ω_i) is clamped between −ω_max and ω_max. This clamping implies that agents cannot move backwards. In the case of a sharp repulsion/attraction, an agent changes heading first and slowly increases its linear speed once its heading has converged enough. In addition, the finite rate of change of heading implies that an agent changes heading over time, not instantaneously.
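The projection, gain, and clamping steps of Eq. 4 can be sketched as follows (the default gains and limits are the simulation values quoted in Sect. 3):

```python
def compute_speeds(px, py, K1=0.2, K2=0.1, U_max=0.08, w_max=3.14):
    """Map the local-frame proximal components to clamped speed commands
    (Eq. 4): U in [0, U_max] (no backward motion), w in [-w_max, w_max]."""
    U = min(max(K1 * px, 0.0), U_max)
    w = min(max(K2 * py, -w_max), w_max)
    return U, w
```

A proximal vector pointing behind the agent (p_x < 0) yields U = 0: the agent turns in place instead of reversing, as described above.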
Our previously proposed method is called Desired Distance Modulation (DM)
[9]. The major differences between DM and ADM are the calculation of the
proximal control vector magnitude and the modulation of the desired distance.
In DM, the contribution of each neighbor peer to the magnitude of proximal
control vector of i is calculated as follows:
p_i^m(d_i^m, σ_i) = −ε [ 2σ_i^4 / (d_i^m)^5 − σ_i^2 / (d_i^m)^3 ]    (5)
Fig. 1. Proximal control vector magnitude vs neighbor distance. Positive values repre-
sent attraction, while negative repulsion.
Differently from ADM, in DM σ_i correlates with the desired distance by a factor of √2 (d_i^des = √2 σ_i).
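Eq. 5 in code form (a sketch; ε is an illustrative value), with the zero crossing at d = √2 σ visible numerically:

```python
def dm_magnitude(d, sigma, eps):
    """DM contribution of one neighbor (Eq. 5). The force vanishes at
    the desired distance d_des = sqrt(2) * sigma: setting
    2*sigma**4/d**5 = sigma**2/d**3 gives d**2 = 2*sigma**2."""
    return -eps * (2 * sigma ** 4 / d ** 5 - sigma ** 2 / d ** 3)
```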
Figure 1a and Fig. 1b show the difference between the two methods of calculating, respectively, p_i^m and σ_i. In Fig. 1a, positive values produce attraction and negative ones repulsion. In addition, the points where the curves intersect the red dashed line (no force) indicate the desired distances (d_i^des). As seen in Fig. 1a, ADM produces attraction for neighbors which are further than d_i^des, up to the sensing range, which is set to 3.0. On the contrary, DM produces attraction only for a short distance beyond d_i^des, after which the attraction starts to decrease. Although the form of DM's proximal control vector helps in maintaining stable collective motion in the swarm [8], the form of ADM becomes advantageous when there are non-sensory agents, since sensory agents can sufficiently influence the whole swarm. To maintain stable collective motion, the curve of ADM intersects the red dashed line at a shorter distance than DM, implying a shorter desired distance. In Fig. 1b, higher σ_i stands for larger d_i^des in both methods. In ADM, σ_i changes drastically at high locally perceived values to produce an impulse within the swarm via the sensory agents, which guides the swarm towards increasing gradient more successfully.
3 Experimental Setup
with a random error in range of [−eg , eg ]. All agents including non-sensory ones
can perceive relative peer distances and bearings within a limited interaction
range (Di = 3.0 m). Random error is added to the relative range and bearing
measurements within the ranges of [−er , er ] and [−eb , eb ] respectively. These
ranges are set to: er = 0.025 m, eb = 0.09 radians and eg = 2.5. We also tested
the following swarm sizes: 10, 20, 50, 100, 150, 200, 250. For a fair comparison,
the grid is modeled in a way that each cell edge measures 0.04 m. In addition,
larger swarms require larger gradient grid to counterbalance the effect of them
covering a larger area. To achieve that, the grid is scaled in correlation with the
swarm size while still preserving cell edge length. To assess the influence of non-
sensory agents on the swarm, the ratio of sensory agents is incremented by 10%
steps between 10% and 100% for each swarm size. Sensory agents are selected
randomly.
For each run of each setting, the swarm is initialized around a random point
chosen on the isoline (which is a circle because of our gradient design) carrying
values which are 25% of the maximum. This value was chosen to be challenging
while starting with the whole swarm inside the gradient, since starting from out-
side of the gradient would make our methods unusable. The initialization is done
within a circle large enough to host all agents with a safe distance among them
when uniformly scattered. Random perturbations are applied to the positions
before each run. The heading is randomly chosen for each agent. For each setting,
24 experiments are conducted. To assess gradient following performance (GFP) for the swarms, the local values perceived by all agents are averaged over the swarm and recorded every 2 simulated seconds, together with agent states. Since larger swarms move in larger gradients, they require correspondingly longer times. Hence,
the experiment duration increases proportionally to the swarm sizes (1620, 2292,
3623, 5123, 6275, 7245, 8100 simulated seconds). In all experiments of ADM and DM, the speed gains were chosen as K1 = 0.2 and K2 = 0.1, whereas σc was 0.6 for ADM and 0.5 for DM. ε and α were chosen as 12.0 and 3.0 respectively, for both methods. The maximum speed was 0.08 m/s and the maximum angular speed was 3.14 rad/s.
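Per run, the GFP recording reduces to a swarm-wide mean of locally perceived values at each sample time (a sketch with illustrative names):

```python
def gfp_series(local_values_per_sample):
    """GFP at each recorded sample: the mean local gradient value
    perceived across the swarm (recorded every 2 simulated seconds)."""
    return [sum(vals) / len(vals) for vals in local_values_per_sample]

def time_averaged_gfp(local_values_per_sample):
    """Time-averaged GFP of a single run, the quantity whose medians
    over repetitions are reported later."""
    series = gfp_series(local_values_per_sample)
    return sum(series) / len(series)
```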
To test our methods on agents which are under the constraints of aerial vehicle dynamics, we chose Crazyflie 2.1 nano-drones1. A swarm of 7 Crazyflies is used for the experiments. In our setup, we use an Ultra-Wideband positioning system developed for the Crazyflie2. In this setup, each drone calculates its position onboard in 3D. Since an off-the-shelf framework for onboard peer sensing on the Crazyflie does not yet exist, we use the Crazyswarm project [12] and parallelized controllers on a central computer. Nevertheless, on the central computer we guarantee that the constraints on peer (D_i) and environmental sensing (defined in
Sect. 2) are never violated. During the experiment, each drone sends its position
to the central computer. Then, the control software calculates linear and angular
speeds on parallel controllers. Local value sensing is achieved in the same way
1 https://www.bitcraze.io/products/crazyflie-2-1/
2 https://www.bitcraze.io/documentation/system/positioning/loco-positioning-system/
Collective Gradient Following with Sensory Heterogeneous UAV Swarm 193
with a simulated gradient grid. The control software detects which virtual cell
the real agent is in and the value of that cell is used in the velocity command
calculations. Finally, linear and angular speed commands are sent back to each
drone. Drones and control software exchange information 20 times a second.
Here, it is important to note that the heading does not correspond to the yaw angle of the drone in our setup. Instead, the yaw is kept constant. The heading only updates the instantaneous direction of motion. By doing so, we avoid the extra control overhead of the yaw angle and still have an effective notion of heading.
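The virtual-cell lookup used for local value sensing can be sketched as follows (grid origin and indexing are assumptions; the default cell edge matches the 0.04 m used in simulation):

```python
def cell_value(x, y, grid, origin=(0.0, 0.0), cell_edge=0.04):
    """Return the simulated gradient value under a real drone at (x, y):
    find which virtual cell the position falls in and read that cell."""
    ix = int((x - origin[0]) // cell_edge)
    iy = int((y - origin[1]) // cell_edge)
    return grid[iy][ix]
```

The returned value then feeds the velocity command computed for that drone, with drones and control software exchanging information 20 times a second.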
Fig. 2. Performance comparisons of DM and ADM with 50 agents and 50% sensory
agent ratio.
For each method, we tested the GFP for two cases: all agents are sensory, and only 4 out of 7 agents are sensory. To test each setting, flights are repeated
5 times, once for each region selected around 4 corners in addition to the center
point of our flight arena. Initial positions of the drones are chosen randomly
in the selected region. The dimensions of the flight arena are 6.5 m by 4.0 m. Each flight test has a duration of 270 s. Finally, the variable parameters
of ADM in the flight tests were: K1 = 0.4, K2 = 0.2 and σc = 0.7. In DM, they were chosen as: K1 = 0.2, K2 = 0.1 and σc = 0.5. The maximum speeds, α, and ε were exactly the same as in the simulations. The parameter values for the simulation and flight experiments were manually tuned for the best performance by considering sensor ranges and maximum allowed speeds at the same time. Some parameter values differ between the flight and simulation experiments since the motion dynamics of point agents and nano-drones are different from each other.
the minimum distance where the balance is achieved is smaller when compared
to DM.
When the violins are observed (which show the distribution at a time instant), we naturally see very similar properties to the over-time distributions. Yet, it can be seen that distances are clustered around several values, in a multimodal distribution. This indicates that neighbors are located in "neighborhood rings" around the focal agents, with empty spaces between the rings.
In Fig. 4, we present the trajectory of the swarm centroids (i.e., the center
of mass of the swarm in each experiment, not the individual agent trajectories)
and snapshots from simulations. For both methods, we can clearly see “neigh-
borhood rings” with aligned directions of motion when the swarm is following
the gradient. Centroid trajectories also indicate that ADM ascends the gradi-
ent successfully with a decisive direction of motion while DM trajectories show
disturbances and curves. Also, in the top-center snapshot, ADM forms a unique
emergent and dynamic formation whereby the swarm rotates around the max-
imum. In DM, the swarm wanders around the maximum, covering a larger
region without any emergent formation.
Next, we focus on scalability and look at the performances of the methods
for changing swarm sizes. For each swarm size, all possible sensory agent ratios
are observed. In addition, the interaction between agents within a larger swarm
is also analyzed. Medians of the time-averaged GFP distributions are presented in
Fig. 5.
With this new metric, we have an overall look at the performance for all
swarm sizes and sensory agent ratios. We observe that performance with ADM is
196 T. A. Karagüzel et al.
Fig. 4. Centroid trajectories of the swarms for 24 repetitions and snapshots from phases
of a simulation run. DM in top row, ADM in bottom. Non-sensory agents are colored
in blue and sensory agents are colored in orange. (Color figure online)
Fig. 5. Median of time-averaged GFP of two methods for 24 repetitions. Left block
shows values for DM and the right block for ADM. Blue box boundaries point out
a group separation during at least one of the experiments.
Collective Gradient Following with Sensory Heterogeneous UAV Swarm 197
consistently better than DM. Still, we can pinpoint the cases where ADM is not
at its best, specifically with a low sensory agent ratio combined with a small swarm
size or a very large swarm size. When the swarm size is small, the sensory agents'
total influence becomes insufficient to guide the whole swarm. On the other hand,
when the swarm size is larger, visual inspection has shown reduced order. This results in
a lower speed of the swarm centroid, which makes the given time limit insufficient
for the swarm to reach higher values of the gradient. For DM, performance suffers
comparatively more from a decreasing sensory agent ratio and never reaches high
values for ratios lower than 50%. In addition, the blue box boundaries indicate
a group separation (NoG > 1). It is another indicator of low performance
for DM, especially for larger swarm sizes.
Experiments with increased swarm size give us the opportunity to analyze
the scalability of the neighbor distances since we observed potential issues with
smaller swarms. Figure 6a and 6b show this information for 50% sensory agent
ratio. For ADM with 250 agents, the desired and median neighbor distances are
very close to each other, similarly to 50 agents. Yet the minimum neighbor distance
is clearly shorter. It can even be considered dangerous, since it reaches near-zero
values, which indicates possible collisions for real robots. When those distances
are observed for DM, we can see that the minimum neighbor distance never falls
below 0.5 m. Despite that, sensory agents with DM still show a difference for 50
and 250 agents: the difference between the desired distance and the minimum distance is
significantly higher for 250 agents. Especially in larger swarms with DM, agents
fail to obtain the emergent isolation between sensory and non-sensory agents.
Non-sensory agents with low desired distances become trapped among sensory
ones. Hence, they produce minimum agent-to-agent distances for sensory agents
similar to those of non-sensory ones.
Finally, we assess the performances of ADM and DM with aerial vehicle
dynamics. We observe the centroid trajectory of the nano-drone swarm (each
trajectory corresponds to the swarm centroid in one experiment) in addition
to calculating GFP. Figure 7 presents these centroid trajectories in addition to
mean time-averaged GFP of 5 experiments. Start points are labeled with S, while
last recorded points are labeled with F.
What we can observe from the centroid trajectories is that when all agents
are sensory, both methods manage to follow the ascending gradient and stay
around the maximum for the rest of the experiment. However, when only 4
agents are sensory out of 7, DM struggles to follow the gradient. Even if the
swarm manages to reach the maximum, it does not stay there. In addition, since
we did not introduce any boundaries, swarms with DM leave the gradient (and,
hence, our flight arena) in 2 out of 5 experiments. For safety reasons, these
experiments are terminated earlier than the time limit and the last recorded
point is labeled with E. On the other hand, swarms with ADM show satisfactory
performances with only 4 sensory agents. Although the centroid wanders around
the maximum instead of staying there, we do not observe any behavior where
the swarm abandons it or outright leaves the gradient. In addition, the formation
we observed on simulation experiments also emerges in real flights with ADM.
Fig. 6. Distributions of neighbor distances for sensory and non-sensory agents in a
swarm of 250 agents with a 50% sensory ratio
As a final remark, for both methods and for all settings, there were no crashes
or group separations. Drones always maintained a safe distance to each other
and maintained a cohesive group. Shots of drone swarms can be seen in Fig. 7.
Videos can be accessed at https://youtu.be/migbwjt_EW0.
We now discuss the reported results at a higher level. Altogether, when over-
all performances of both methods are compared, with various swarm sizes and
sensory agent ratios, ADM clearly stands out. It shows better performances with
larger swarms and lower proportions of sensory agents, even down to 10% of the
whole swarm. Yet, for both methods, when the swarm is too small, the influence of
sensory agents becomes insufficient to guide the rest of the swarm. Conversely,
when the swarm becomes too large with low proportions of sensory agents, order
cannot be obtained and maintained. Thus, although the swarm moves towards
the ascending gradient, the performance ends up unsatisfactory within a given
time limit.
In the neighbor distances data, we can observe the different effect of each method
on the sensory and non-sensory agents' neighbor distances. In ADM, the gap is larger,
and hence its performance is better. To understand this, we need to compre-
hend how both methods work in the first place. When sensory agents perceive
higher values, they increase their desired distances to others. In other words,
they become more “asocial”. Since non-sensory agents still have shorter desired
Fig. 7. The swarm centroid trajectories of the nano-drone swarm and shots from flights
distances, they move to be closer to the others; in other words, they become
more “social”. The “social” agents chasing the “asocial” ones results in a
collective motion following the increasing gradient. When the distance gap
between sensory and non-sensory agents is more significant, gradient following gets better. Yet,
we could not manually tune DM to obtain a larger difference in distances. In
that case, group separation becomes a serious problem. This is expected, because
DM has a smaller attractive region in its proximal control vector (see Fig. 1a).
ADM does not suffer from that but, as a drawback, it cannot guarantee a safe dis-
tance between agents, especially in larger swarms. DM guarantees a safe distance
but suffers from low performance. An optimal trade-off likely lies between them,
but it remains to be discovered.
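The mechanism just described can be illustrated with a toy sketch. The linear scaling and every name below are our assumptions for illustration, not the actual ADM update rule:

```python
def desired_distance(base_sigma, perceived_value, k_gain, is_sensory):
    """Hypothetical sketch of the 'asocial' mechanism: a sensory agent
    enlarges its desired inter-agent distance with the scalar value it
    perceives, while non-sensory agents keep the shorter baseline.
    The linear scaling is an assumption made for illustration only."""
    if not is_sensory:
        return base_sigma
    return base_sigma * (1.0 + k_gain * perceived_value)

# A sensory agent perceiving a high value becomes more "asocial" than a
# non-sensory neighbor, creating the distance gap discussed above.
print(desired_distance(0.5, 1.0, 1.0, True))    # → 1.0 (enlarged)
print(desired_distance(0.5, 1.0, 1.0, False))   # → 0.5 (baseline)
```

The "social" agents then drift toward the "asocial" ones, which is what produces the collective motion up the gradient.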
We do not see the drawback of ADM in real flights since the number of
agents is lower than in simulations. Hence ADM performs better than DM in
nano-drone experiments as it does in simulation experiments.
5 Conclusion
1 Introduction
that visits all other nodes exactly once with minimal path length. The TSP can
be further extended to the multiple traveling salesman problem (mTSP), where
multiple agents collaborate with each other to visit all cities from a common
starting node with minimal cost. In particular, MinMax (minimizing the max
tour length among agents, i.e., the total task duration) and MinSum (minimizing
the total tour length) are the two most common objectives for mTSP [1–3]. Although
simple in nature, the TSP and mTSP are ubiquitous in robotics problems that
need to address agent distribution, task allocation, and/or path planning, such
as multi-robot patrolling, last-mile delivery, or distributed search/coverage. For
example, Faigl et al. [4], Oßwald et al. [5] and Cao et al. [6] proposed methods for
robot/multi-robot exploration tasks, at the core of which is such a TSP/mTSP
instance, whose solution quality will influence the overall performance. More gen-
erally, dynamic environments are involved in many robotic deployments, where
the underlying graph may change with time and thus require the TSP/mTSP
solution to be recomputed frequently and rapidly (within seconds or tens of seconds
if possible). While state-of-the-art exact algorithms (e.g., CPLEX [7], LKH3 [8],
and Gurobi [9]) can find near-optimal solutions to TSP instances in a few sec-
onds, exact algorithms become unusable for mTSP instances if computing time
is limited [2]. As an alternative, meta-heuristic algorithms like OR Tools [10]
were proposed to balance solution quality and computing time. In recent years,
neural-based methods have been developed to solve TSP instances [11–13] and
showed promising advantages over heuristic algorithms both in terms of com-
puting time and solution quality. However, neural-based methods for mTSP are
scarce [2, 3]. In this work, we introduce DAN, a decentralized attention-based
neural approach for the MinMax mTSP, which is able to quickly and distribut-
edly generate tours that minimize the time needed for the whole team to visit
all cities and return to the depot (i.e., the makespan).
We focus on solving mTSP as a decentralized coop-
eration problem, where agents each construct their
own tour towards a common objective. To this end, we
rely on a threefold approach: first, we formulate mTSP
as a sequential decision-making problem where agents
make decisions asynchronously towards enhanced col-
laboration. Second, we propose an attention based
neural network to allow agents to make individual
decisions according to their own observations, which
provides agents with the ability to implicitly predict
other agents’ future decisions, by modeling the depen-
dencies of all the agents and cities. Third, we train our
model using multi-agent reinforcement learning with
parameter sharing, which provides our model with
natural scalability to arbitrary team sizes (Fig. 1).
Fig. 1. DAN's final solution to an example mTSP problem.
We present test results on randomized mTSP
instances involving 50 to 1000 cities and 5 to 20 agents, and compare DAN’s
performance with that of exact, meta-heuristic, and dRL methods. Our results
204 Y. Cao et al.
2 Prior Works
For TSP, exact algorithms like dynamic programming and integer program-
ming can theoretically guarantee optimal solutions. However, these algorithms do
not scale well. Nevertheless, exact algorithms with handcrafted heuristics (e.g.,
CPLEX [7]) remain state-of-the-art, since they can reduce the search space effi-
ciently. Neural network methods for TSP became competitive after the recent
advancements in dRL. Vinyals et al. [11] first built the connection between deep
learning and TSP by proposing the Pointer network, a sequence-to-sequence
model with a modified attention mechanism, which allows one neural network to
solve TSP instances composed of an arbitrary number of cities. Kool et al. [13]
replaced the recurrent unit in the Pointer Network and proposed a model based
on a Transformer unit [14]. Taking advantage of self-attention to model
the dependencies among cities, Kool et al.'s model achieved a significant improvement in
terms of solution quality.
In mTSP, cities need to be partitioned and allocated to each agent in addi-
tion to finding optimal sequences/tours. This added complexity makes state-of-
the-art exact TSP solvers impractical for larger mTSP instances. OR
Tools [10], developed by Google, is a highly optimized meta-heuristic algorithm.
Although it does not guarantee optimal solutions, OR Tools is one of the most
popular mTSP solvers because it effectively balances the trade-off between solu-
tion quality and computing time. A few recent neural-based methods have also
approached the mTSP in a decentralized manner. Notably, Hu et al. [2] proposed
a model based on a shared graph neural network and distributed attention mech-
anism networks to first allocate cities to agents. OR Tools can then be used to
quickly generate the tour associated with each agent. Park et al. [3] presented an
end-to-end decentralized model based on a graph attention network, which can
solve mTSP instances with arbitrary numbers of agents and cities. They used a
type-aware graph attention mechanism to learn the dependencies between cities
and agents, where the extracted agents' and cities' features are then con-
catenated during the embedding procedure before outputting the final policy.
3 Problem Formulation
The mTSP is defined on a graph G = (V, E), where V = {1, ..., n} is a set of n
nodes (cities), and E is a set of edges. In this work, we consider G to be complete,
i.e., (i, j) ∈ E for all i ≠ j. Node 1 is defined as the depot, where all m agents
DAN: Decentralized Attention-based Neural Network for the MinMax mTSP 205
are initially placed, and the remaining nodes as cities to be visited. The cities
must be visited exactly once by any agent. After all cities are visited, all agents
return to the depot to finish their tour. Following the usual mTSP statement
in robotics [15], we use the Euclidean distance between cities as edge weights,
i.e., this work addresses the Euclidean mTSP. We define a solution to mTSP,
$\pi = \{\pi^1, \ldots, \pi^m\}$, as a set of agent tours. Each agent tour
$\pi^i = (\pi^i_1, \pi^i_2, \ldots, \pi^i_{n_i})$ is an ordered set of the cities
visited by this agent, where $\pi^i_t \in V$ and $\pi^i_1 = \pi^i_{n_i}$ is the
depot. $n_i$ denotes the number of cities in this agent's tour, so
$\sum_{i=1}^{m} n_i = n + 2m - 1$, since all agent tours involve the depot twice.
Denoting the Euclidean distance between cities $i$ and $j$ as $c_{ij}$, the cost
of agent $i$'s tour reads $L(\pi^i) = \sum_{j=1}^{n_i - 1} c_{\pi^i_j \pi^i_{j+1}}$.
As discussed in Sect. 1, we consider MinMax, i.e., minimizing
$\max_{i \in \{1, \ldots, m\}} L(\pi^i)$, as the objective of our model.
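The cost definitions above translate directly into code; a minimal sketch (with the depot indexed as node 0 and all helper names ours):

```python
import math

def tour_length(tour, coords):
    """Euclidean length of one agent tour, an ordered list of node ids
    starting and ending at the depot."""
    return sum(math.dist(coords[tour[j]], coords[tour[j + 1]])
               for j in range(len(tour) - 1))

def minmax_cost(tours, coords):
    """MinMax mTSP objective: the longest tour length among all agents."""
    return max(tour_length(t, coords) for t in tours)

coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0)}
# Two agents, each visiting one city and returning to the depot (node 0).
print(minmax_cost([[0, 1, 0], [0, 2, 0]], coords))   # → 2.0
```

The global reward used in the RL formulation of the next section is simply the negative of this quantity.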
4 mTSP as an RL Problem
In this section, we cast mTSP into a decentralized multi-agent reinforcement
learning (MARL) framework. In our work, we consider mTSP as a cooperative
task instead of an optimization task. We first formulate mTSP as a sequential
decision making problem, which allows RL to tackle mTSP. We then detail the
agents’ state and action spaces, and the reward structure used in our work.
Observation. We consider a fully observable world where each agent can access
the states of all cities and all agents. Although a partial observation is more
In this work, the reward is simply the negative of the max tour length among
agents, $R(\pi) = -\max_{i \in \{1, \ldots, m\}} L(\pi^i)$, and all agents share it
as a global reward.
This reward structure is sparse, i.e., agents only get rewarded after all agents
finish their tours.
Attention Layer. The Transformer attention layer [14] is used as the funda-
mental building block in our model. The input of such an attention layer con-
sists of the query source hq and the key-and-value source hk,v , which are both
vectors with the same dimension. The attention layer updates the query source
using the weighted sum of the value, where the attention weight depends on the
similarity between query and key. We compute the query qi , key ki and value vi
as $q_i = W^Q h^q_i$, $k_i = W^K h^{k,v}_i$, $v_i = W^V h^{k,v}_i$, where $W^Q$,
$W^K$, $W^V$ are all learnable matrices of size $d \times d$. Next, we compute the
similarity $u_{ij}$ between the query $q_i$ and the key $k_j$ using a scaled dot
product: $u_{ij} = q_i^T k_j / \sqrt{d}$. Then we calculate the attention weights
$a_{ij}$ using a softmax: $a_{ij} = e^{u_{ij}} / \sum_{j'=1}^{n} e^{u_{ij'}}$.
Finally, we compute a weighted sum of these values as the output embedding from
this attention layer: $h'_i = \sum_{j=1}^{n} a_{ij} v_j$. The embedding content
is then passed through a feed-forward sublayer (containing two linear layers and
a ReLU activation). Layer normalization and residual connections are used within
these two sublayers, as in [14].
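A sketch of the attention layer described above, in NumPy (single head only; the feed-forward sublayer, layer normalization, and residual connections are omitted, and all variable names are ours):

```python
import numpy as np

def attention_layer(h_q, h_kv, Wq, Wk, Wv):
    """Scaled dot-product attention: queries come from h_q (shape [m, d]),
    keys and values from h_kv (shape [n, d])."""
    d = Wq.shape[1]
    q, k, v = h_q @ Wq, h_kv @ Wk, h_kv @ Wv
    u = q @ k.T / np.sqrt(d)                   # similarities u_ij
    u -= u.max(axis=1, keepdims=True)          # numerical stability
    a = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)  # weights a_ij
    return a @ v                               # weighted sum of values

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
h = rng.normal(size=(5, d))
out = attention_layer(h, h, Wq, Wk, Wv)        # self-attention: h_q = h_kv
print(out.shape)                               # → (5, 8)
```

The encoders below apply this with h_q = h_kv (self-attention), while the decoder layers mix query and key/value sources as described in the text.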
City Encoder. The city encoder is used to extract features from the cities
state sci and model their dependencies. The city encoder first embeds the relative
Euclidean coordinates xci of city i, i ∈ {2, ..., n} into a d-dimensional (d = 128
Fig. 2. DAN consists of a city encoder, an agent encoder, a city-agent encoder and
a final decoder, which allows each agent to individually process its inputs (the cities
states, and the agents states), to finally obtain its own city selection policy. In partic-
ular, the agent and city-agent encoders are introduced in this work to endow agents
with the ability to predict each other's future decisions and improve the decentralized
distribution of agents.
in practice) initial city embedding hci using a linear layer. Similarly, the depot’s
Euclidean coordinates xc1 are embedded by another linear layer to hc1 . The initial
city embedding is then passed through an attention layer. Here hq = hk,v = hc ,
as is commonly done in self-attention mechanisms. Self-attention achieved good
performance to model the dependencies of cities in single TSP approaches [13],
and we propose to rely on the same fundamental idea to model the dependencies
in mTSP. We term the output of the city encoder, $h'^c$, the city embedding. It
contains the dependencies between each city $i$ and all other cities.
Agent Encoder. The agent encoder is used to extract features from the agents
state sai and model their dependencies. A linear layer is used to separately embed
each (3-dimensional) component of sai into the initial agent embedding hai . This
embedding is then passed through an attention layer, where $h^q = h^{k,v} = h^a$.
We term the output of this encoder, $h'^a$, the agent embedding. It contains the
dependencies between each agent $i$ and all other agents.
Decoder. The decoder is used to decode the different embeddings into a policy
for selecting the next city to visit. The decoder starts with encoding the deciding
agent’s current state. We choose to express the current agent state implicitly by
computing an aggregated embedding $h^s$, which is the mean of the city embedding.
This operation is similar to the graph embedding used in [13].
The first attention layer then adds the agent embedding to the aggregated
embedding. In doing so, it relates the state of the deciding agent to that of all
other agents. Here $h^q = h^s$ and $h^{k,v} = h'^a$. This layer outputs the current
state embedding $h'^s$. After that, a second attention layer is used to compute the
final candidate embedding $h^f$, where $h^q = h'^s$ and $h^{k,v} = h^{ca}$. This
layer serves as a glimpse, which is commonly used to improve attention mechanisms
[12]. There, when computing the similarity, we rely on the global mask $M$ to
manually set the similarity $u_i = -\infty$ if the corresponding city $i$ has
already been visited, ensuring that the attention weights of visited cities are 0.
The final candidate embedding $h^f$ then passes through a third and final
attention layer. The query source is the final candidate embedding $h^f$, and the
key source is the city-agent embedding $h^{ca}$. For this final layer only,
following [11], we directly use the vector of attention weights as the final policy
for the deciding agent. The same masking operation is also applied in this layer
to satisfy the mTSP constraint. These similarities are normalized using a softmax
operation, to finally yield the probability distribution $p$ over the next city to
visit: $p_i = p_\theta(\pi^j_t = i \mid s^c_i, s^a_i, M) = e^{u_i} / \sum_{i'=1}^{n} e^{u_{i'}}$.
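The masking and softmax of this final step can be sketched as a stand-alone function (a simplification, not the authors' implementation; the clipping constant mentioned later is omitted):

```python
import numpy as np

def masked_policy(u, visited):
    """Set the similarity of already-visited cities to -inf, then apply a
    softmax, so visited cities receive exactly zero probability."""
    u = np.where(visited, -np.inf, u.astype(float))
    u -= u[~visited].max()                 # numerical stability
    e = np.exp(u)                          # exp(-inf) = 0
    return e / e.sum()

p = masked_policy(np.array([1.0, 2.0, 3.0]), np.array([False, True, False]))
print(p[1])   # → 0.0 (the visited city can never be selected)
```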
6 Training
In this section, we describe how DAN is trained, including the choice of hyper-
parameters and hardware used.
of a common neural network, thus making the training more efficient by relying
on the sum of experience from all agents. Our model is trained on a worksta-
tion equipped with a i9-10980XE CPU and four NVIDIA GeForce RTX 3090
GPUs. We train our model utilizing Ray, a distributed framework for machine
learning [18], to accelerate training by parallelizing the code. With Ray, we run
8 mTSP instances in parallel and pass gradients to be applied to the global
network under the A2C structure [19]. At each training episode, the positions
of cities are generated uniformly at random in the unit square [0, 1]² and the
velocity of agents is set to Δg = 0.1. The number of agents is randomized within
[5, 10] and the number of cities is randomized within [20, 100] during early train-
ing, which needs 96 h to converge. After initial convergence of the policy, the
number of cities is randomized within [20, 200] for further refinement, requiring
24 h of training. We formulate one training batch after 8 mTSP instances are
solved, and perform one gradient update for each agent. We train the model with
the Adam optimizer [20] and use an initial learning rate of 10−5 and decay every
1024 steps by a factor of 0.96. Every 2048 steps we update the baseline model if
the current training model is significantly better than the baseline model accord-
ing to a t-test. Our full training and testing code is available at https://bit.ly/DAN_mTSP.
7 Experiments
We test our decentralized attention-based neural network (DAN) on numerous
sets of 500 mTSP instances each, generated uniformly at random in the unit
square [0, 1]². We test two different variants of our model, which utilize the same
trained policy differently:
– Greedy: each agent always selects the action with highest activation in its
policy.
– Sampling: each agent selects the city stochastically according to its policy.
For our sampling variant, we run our model multiple times on the same instance
and report the solution with the highest quality. While [13] sample 1280 solutions
for single TSP, we only sample 64 solutions (denoted as s.64) for each mTSP
instance to balance the trade-off between computing time and solution quality.
In the test, the velocity of agents is set to Δg = 0.01, which improves the performance
of our model by allowing more finely-grained asynchronous action selection.
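The sampling variant amounts to a best-of-64 procedure; a minimal sketch, where `solve_once` is a hypothetical stand-in for one stochastic rollout of the trained policy:

```python
import random

def solve_sampling(solve_once, instance, n_samples=64, seed=0):
    """Run the stochastic policy n_samples times on the same instance and
    keep the solution with the lowest MinMax cost. `solve_once(instance, rng)`
    is a hypothetical stand-in returning (tours, cost) for one rollout."""
    rng = random.Random(seed)
    best_tours, best_cost = None, float("inf")
    for _ in range(n_samples):
        tours, cost = solve_once(instance, rng)
        if cost < best_cost:
            best_tours, best_cost = tours, cost
    return best_tours, best_cost

# Dummy rollout standing in for the policy: cost drawn at random, so the
# best of 64 samples lands near the lower end of the cost range.
dummy = lambda inst, rng: (None, rng.uniform(2.0, 3.0))
best, cost = solve_sampling(dummy, None, n_samples=64)
print(cost)
```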
7.1 Results
We compare the performance of our model with both conventional and neural-
based methods. For conventional methods, we test OR Tools, evolutionary algo-
rithm (EA), and self-organizing maps (SOM) [21] on the same test set. OR Tools
initially gets a solution using meta-heuristic algorithms (path-cheapest-arc) and
then further improves it by local search (2-opt) [22] (denoted as OR). We allow
OR Tools to run for a similar amount of time as our sampling variant, for fair
Table 1. Results on the random mTSP set (500 instances each). n denotes the number of
cities and m denotes the number of agents; “Max.” is the MinMax cost and T(s) the
computing time (a dash indicates a value not reported).

Method       | n=50, m=5   | n=50, m=7   | n=50, m=10  | n=100, m=5  | n=100, m=10
             | Max.   T(s) | Max.   T(s) | Max.   T(s) | Max.   T(s) | Max.   T(s)
EA           | 2.35   7.82 | 2.08   9.58 | 1.96  11.50 | 3.55  12.80 | 2.75  17.52
SOM          | 2.57   0.76 | 2.30   0.78 | 2.16   0.76 | 3.10   1.58 | 2.41   1.58
OR           | 2.04  12.00 | 1.96  12.00 | 1.96  12.00 | 2.36  18.00 | 2.29  18.00
Gurobi       | 2.54   3600 | 2.42   3600 | –      –    | 7.29   3600 | 7.17   3600
Kool et al.  | 2.84   0.20 | 2.64   0.22 | 2.52   0.25 | 3.28   0.37 | 2.77   0.41
Park et al.  | 2.37   –    | 2.18   –    | 2.10   –    | 2.88   –    | 2.23   –
Hu et al.    | 2.12   0.01 | –      –    | 1.95   0.02 | 2.48   0.04 | 2.09   0.04
DAN (g.)     | 2.29   0.25 | 2.11   0.26 | 2.03   0.30 | 2.72   0.43 | 2.17   0.48
DAN (s.64)   | 2.12   7.87 | 1.99   9.38 | 1.95  11.26 | 2.55  12.18 | 2.05  14.81
Kool et al. 2.64 0.46 3.27 0.78 2.92 0.83 2.77 0.89
Park et al. 2.16 2.50 2.38 2.44
Table 2. Results on the large-scale mTSP set (500 instances each) where the number
of agents is fixed to 10.
by simply replacing our neural network structure with theirs in our distributed
RL framework.
Table 2 shows the average MinMax cost for large-scale mTSP instances (from
500 to 1000 cities), where the number of agents is fixed to 10 (due to the limita-
tion of Hu et al.’s model). When testing our sampling variant, we set C = 100 in
the third decoder layer for efficient exploration (since the tour is much longer).
Except for DAN and Hu et al.'s model, no method can handle such large-scale
mTSP instances within reasonable time, but we still report the results of OR Tools from
Hu et al.'s paper, as well as SOM results, as the best-performing meta-heuristic
algorithm, for completeness.
Table 3 shows the test results on TSPlib [23] instances, a well-known bench-
mark library, where city distributions come from real-world data. There, we
extend the computing time of OR Tools to 600 s and increase the sample size of
DAN to 256, to yield solutions as close to optimal as possible. Note that the computing
time of DAN never exceeds 100 s. LKH3 results are reported from [8], where long
enough computing time was allowed to yield exact solutions.
7.2 Discussion
8 Conclusion
References
1. Kaempfer, Y., Wolf, L.: Learning the multiple traveling salesmen problem with
permutation invariant pooling networks. arXiv preprint arXiv:1803.09621 (2018)
2. Hu, Y., Yao, Y., Lee, W.S.: A reinforcement learning approach for optimizing
multiple traveling salesman problems over graphs. Knowl.-Based Syst. 204, 106244
(2020)
3. Park, J., Bakhtiyar, S., Park, J.: ScheduleNet: learn to solve multi-agent scheduling
problems with reinforcement learning. arXiv preprint arXiv:2106.03051 (2021)
4. Faigl, J., Kulich, M., Přeučil, L.: Goal assignment using distance cost in multi-robot
exploration. In: 2012 IEEE/RSJ International Conference on Intelligent Robots
and Systems, pp. 3741–3746. IEEE (2012)
5. Oßwald, S., Bennewitz, M., Burgard, W., Stachniss, C.: Speeding-up robot explo-
ration by exploiting background information. IEEE Robot. Autom. Lett. 1(2),
716–723 (2016)
6. Cao, C., Zhu, H., Choset, H., Zhang, J.: TARE: a hierarchical framework for
efficiently exploring complex 3D environments. In: Robotics: Science and Systems
Conference (RSS), Virtual (2021)
7. IBM: CPLEX Optimizer (2018). https://www.ibm.com/analytics/cplex-optimizer
8. Helsgaun, K.: An extension of the Lin-Kernighan-Helsgaun TSP solver for con-
strained traveling salesman and vehicle routing problems. Roskilde University,
Roskilde (2017)
1 Introduction
that have the ith node as a child node is denoted by $N_i = \{j \in V : (j, i) \in E\}$,
and it is called the neighborhood of the ith node. In accordance with this nomenclature,
we call a node $j \in N_i$ a neighbor of node $i$. The adjacency matrix $A = [a_{ij}]$
associated with the graph $G$ is an object that maps the edge set into a matrix
as
$$a_{ij} = \begin{cases} 0, & \text{if } i = j \text{ or } (j, i) \notin E, \\ \omega_{ij}(t), & \text{if } (j, i) \in E, \end{cases} \qquad (1)$$
where $\omega_{ij}(t) > 0$ is a positive number representing the transmission rate of infor-
mation from node $j$ to node $i$, which we will examine in more detail below.
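Equation (1) translates directly into code; a small sketch (the `rates` container and all names are our assumptions, and the rates are taken at a fixed time):

```python
import numpy as np

def adjacency(n, edges, rates):
    """Build A = [a_ij] from Eq. (1): a_ij = w_ij(t) when the edge (j, i)
    exists (information flows from j to i); 0 otherwise and on the diagonal."""
    A = np.zeros((n, n))
    for (j, i) in edges:
        if i != j:
            A[i, j] = rates[(j, i)]
    return A

# Three robots: robot 0 informs robots 1 and 2, robot 1 informs robot 2.
edges = [(0, 1), (0, 2), (1, 2)]                   # (j, i): j -> i
rates = {(0, 1): 0.5, (0, 2): 0.3, (1, 2): 0.8}
A = adjacency(3, edges, rates)
print(A[1, 0], A[2, 1])   # → 0.5 0.8
```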
data transactions, for example in data fusion tasks. Also, estimation techniques
might help to alleviate such an assumption.
We model the current state of a robot as an indicator random variable Xi (t) ∈
{0, 1}, such that Xi (t) = 1 if the ith robot is informed at time t, and Xi (t) = 0
otherwise. Note that Xi (t) is a Bernoulli random variable, therefore the evolution
of the network expectation of broadcast of information can be determined by the
evolution of the probability Pr{Xi (t) = 1 : Xj (0) = 1} for any pair (i, j), over
time t > 0. The joint state of the network maps the set of agents that hold the
information in a given instant. Since Xi (t) ∈ {0, 1} for each i ∈ V, it is clear that
the joint states of the network evolve in a $2^n$ state-space. Inspired by [19,24] in
epidemiological studies, we define a time-dependent state-space vector for the
whole network as
Fig. 1. Representation of all possible states of a network with three agents. The arrows
represent possible switching between states.
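For the three-agent network of Fig. 1, the joint state-space and its admissible switchings can be enumerated explicitly (a sketch; under the broadcast process, informed robots never become uninformed, so a state can only gain informed robots):

```python
from itertools import product

# Joint states of a 3-robot network: each robot is informed (1) or not (0),
# giving the 2^n = 8 joint states depicted in Fig. 1.
n = 3
states = list(product([0, 1], repeat=n))
print(len(states))            # → 8

def successors(x):
    """States reachable from x by one switching: exactly one additional
    robot becomes informed."""
    return [tuple(1 if k == i else x[k] for k in range(n))
            for i in range(n) if x[i] == 0]

print(successors((1, 0, 0)))  # → [(1, 1, 0), (1, 0, 1)]
```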
One could try to model the states of the whole network directly as
$X(t) = [X_1(t)\ X_2(t) \cdots X_n(t)]$, since that would reduce the dimension of the
representation from $\mathbb{R}^{2^n}$ to $\mathbb{R}^n$. However, as shown in [19],
the evolution of $E[X(t)]$ represents only a marginal probability distribution of the
states of the network, and this information is not enough to describe the evolution
of $\Pr\{X_i(t) = 1 : X_j(0) = 1\}$
for any t > 0. Therefore, if we are interested in computing the expectation of
future states given a current distribution, using X(t) alone is not enough. Alter-
native methods to reduce the dimension of the state-space include mean-field
type approximations and moment-closure methods (please, see [19] and refer-
ences therein). Typically those methods are valid under the assumption of large
Receding Horizon Control on the Broadcast of Information 221
numbers and offer reduced precision about the underlying stochastic process.
Indeed, as pointed out in [25], in general, mean-field approximations do not
consistently provide either an upper or a lower bound for general spreading
processes.
Since Y (t) is an exact characterization of the states of the network, an
expression for Pr{Yi (t) = 1 : Y (0)} can be obtained using a continuous-time
Markov chain with $2^n$ states, determined by the following infinitesimal generator
$Q(t) = [q_{ij}(t)] \in \mathbb{R}^{2^n \times 2^n}$ [24],
$$q_{ij}(t) = \begin{cases} \sum_{k=1}^{n} \omega_{mk} X_k, & \text{for } m = 1, \ldots, n,\ X_m = 1, \text{ if } i = j - 2^{m-1}, \\ -\sum_{k=1}^{n} q_{kj}, & \text{if } i = j, \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$
for all $X_k \in \{0, 1\}$, in which the corresponding Markov chain evolves according
to the following Kolmogorov forward equation
$$\frac{dv(t)}{dt} = v(t)\, Q(t), \qquad (3)$$
where the state vector is defined as $v(t) = [v_0(t) \cdots v_{2^n - 1}(t)]$, with
elements $v_i(t) = \Pr\{Y_i(t) = 1\}$, and satisfies $\sum_i v_i(t) = 1$ for all $t > 0$.
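Equations (2)-(3) can be made concrete for the broadcast process with a small numerical sketch: we build the $2^n \times 2^n$ generator from pairwise transmission rates and integrate the forward equation with explicit Euler. Indexing joint states by their bit patterns is our own convention (it need not match the paper's ordering), and constant rates are assumed:

```python
import numpy as np

def si_generator(W):
    """Generator of the 2^n-state broadcast chain: from joint state X,
    robot m becomes informed at rate sum_k W[m, k] * X_k, moving the chain
    to the state whose m-th bit is set. W[m, k] is the rate k -> m."""
    n = W.shape[0]
    Q = np.zeros((2 ** n, 2 ** n))
    for i in range(2 ** n):
        X = [(i >> m) & 1 for m in range(n)]
        for m in range(n):
            if X[m] == 0:
                Q[i, i + (1 << m)] = sum(W[m, k] * X[k] for k in range(n))
        Q[i, i] = -Q[i].sum()          # rows of a generator sum to zero
    return Q

def forward_euler(v0, Q, T, dt=1e-3):
    """Explicit-Euler integration of the forward equation dv/dt = v Q."""
    v = v0.copy()
    for _ in range(int(T / dt)):
        v = v + dt * (v @ Q)
    return v

# Robot 0 informs 1 and 2; robot 1 also informs 2.
W = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.3, 0.8, 0.0]])
Q = si_generator(W)
v0 = np.zeros(8)
v0[0b001] = 1.0                        # only robot 0 informed at t = 0
v = forward_euler(v0, Q, T=10.0)
print(v[0b111])                        # probability all robots informed by t = 10
```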
In this paper, we are interested in enforcing a desired evolution for the propa-
gation of information on the network (i.e., Pr{Xi (t) → 1 : Xj (0) = 1}) exponen-
tially fast, by closing the loop of the network with a feedback control strategy,
which chooses the control action from a finite set of possible actions and mini-
mizes a cost of available resources. Specifically, the problem investigated can be
stated as:
3 Methodology
3.1 Control Strategy
To focus on the transmission of information over the network and understand the
ability to shape the broadcast of information, we make the following assumption:
Assumption 1. The communication graph has a directed spanning tree with a root
corresponding to the robot that is the origin of the information to be transmitted.
This assumption implies that each edge of the spanning tree has a positive
transmission rate (i.e., the information is propagated with some positive base-
line transmission). In this scenario, it is clear that the broadcast of information
will eventually reach every robot on the network with probability 1. Hence, the
222 T. C. Silva et al.
main objective of our control problem can be viewed as an attempt to shape the
statistics of the broadcast of information on the network to satisfy a given con-
vergence criterion. It is worth noticing that most works on contact processes are
concerned with regular lattices or mean-field models [17,18]. Also, usually, the
desired convergence rate is enforced through static strategies, targeting central
nodes, and using heuristics [18,21]. With our control strategy, the network can
adapt to the stochastic realization of the propagation of information and only
expend resources to actuate during realizations that might interfere with the
team’s desired behavior.
With the broadcast model in (3), we control the transmission rates ωij (t)
by applying allowable actions based on the current state of the network and on
predicted future states, sampled at predefined times, following a sampled-data
predictive controller [22]. The controller we study in this paper is defined by the
following optimization problem:
Define the broadcast time as $t_b = \inf\{t \ge 0 : S(X(t)) = n\}$; its expectation can then be computed as
$$\mathbb{E}[t_b : X(0)] = \int_0^{\infty} \left(1 - \Pr(t_b \le \sigma)\right) d\sigma \qquad (6)$$
$$\le \tau_1 + (n - S(X(0)))\, \Delta t\, \frac{e^{-r\tau_1}}{1 - e^{-r\Delta t}}, \qquad (7)$$
with $\tau_1 = \inf\{t \ge 0 : (n - S(X(t)))\, e^{-r \lfloor t/\Delta t \rfloor \Delta t} \le 1\}$.
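The upper bound in (7) can be evaluated numerically. The sketch below is a minimal illustration; the values of $n$, the initially informed count, $r$, and $\Delta t$ are illustrative, and $\tau_1$ is computed conservatively with $S(X(t))$ held at its initial value:

```python
import math

def broadcast_time_upper_bound(n, s0, r, dt, horizon=1000.0):
    """Upper bound (7) on the expected broadcast time E[t_b].

    n  : number of robots, s0 : initially informed robots,
    r  : desired exponential convergence rate, dt : sampling period.
    tau1 is the first sampled time at which the quantity
    (n - S) * exp(-r * floor(t/dt) * dt) drops to 1 or below; here we
    conservatively keep S(X(t)) = s0 throughout.
    """
    tau1 = 0.0
    while (n - s0) * math.exp(-r * math.floor(tau1 / dt) * dt) > 1.0:
        tau1 += dt
        if tau1 > horizon:
            break
    return tau1 + (n - s0) * dt * math.exp(-r * tau1) / (1.0 - math.exp(-r * dt))

bound = broadcast_time_upper_bound(n=10, s0=1, r=0.22, dt=1.0)
```

As expected from (7), increasing the desired rate $r$ tightens the bound, and an already fully informed network gives a bound of zero.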
which are lower and upper bounds for the joint probability $\Pr\{A, B\}$, respectively. Its main advantage over mean-field approximations is that it gives meaningful bounds for the evolution of the true mean of the underlying stochastic process (see [25] for a detailed discussion).
Initially, we need to reformulate our representation to consider the marginal
probabilities of each robot receiving the information, since we are interested in
drawing bounds on their evolution. For our problem, the marginal probabilities
can be written as the expectation of a robot becoming informed as a function of
its neighbors as,
$$\frac{d\,\mathbb{E}[X_i(t)]}{dt} = \sum_{j \in \mathcal{N}_i} \mathbb{E}[\bar{X}_i(t) X_j(t)]\, \omega_{ij}(t), \qquad (10)$$
for all i ∈ V, where X̄i (t) is the complement of Xi (t) (i.e., X̄i (t) = 1 if Xi (t) = 0).
This equation captures the infinitesimal probability of $X_i$ receiving the information given that some neighbors $X_j$, $j \in \mathcal{N}_i$, have the message. Notice that (10) is not closed; therefore, we cannot solve it for $\mathbb{E}[X_i(t)]$ without equations for $\mathbb{E}[\bar{X}_i(t) X_j(t)]$. Central to the result in [25] is bounding joint probabilities of the form $\mathbb{E}[\bar{X}_i X_j]$. Next, we show the optimal approximations for these expectations, which can be expressed as closed-form functions.
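The expectations in (10) can also be estimated empirically by simulating the underlying continuous-time Markov chain. Below is a minimal Gillespie-style sketch of the broadcast process; the line-graph topology, uniform rates, horizon, and number of runs are all illustrative choices, not from the paper:

```python
import random

def simulate_broadcast(neighbors, omega, t_end, seed=0):
    """One realization of the broadcast contact process: an uninformed
    node i becomes informed at a rate equal to the sum of omega[(i, j)]
    over its informed neighbors j. Returns the final informed set."""
    rng = random.Random(seed)
    informed = {0}  # node 0 is the information source
    t = 0.0
    while True:
        # total informing rate for each still-uninformed node
        rates = {}
        for i in neighbors:
            if i not in informed:
                rate = sum(omega[(i, j)] for j in neighbors[i] if j in informed)
                if rate > 0:
                    rates[i] = rate
        total = sum(rates.values())
        if total == 0:          # everyone reachable is informed
            return informed
        t += rng.expovariate(total)
        if t > t_end:
            return informed
        # pick which node becomes informed, proportionally to its rate
        u, acc = rng.random() * total, 0.0
        for i, rate in rates.items():
            acc += rate
            if u <= acc:
                informed.add(i)
                break

# line graph 0-1-2-3 with uniform rate 1.0 on every directed edge
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
w = {(i, j): 1.0 for i in nbrs for j in nbrs[i]}
runs = [len(simulate_broadcast(nbrs, w, t_end=50.0, seed=s)) for s in range(200)]
mean_informed = sum(runs) / len(runs)
```

Averaging over realizations gives a Monte Carlo estimate of $\mathbb{E}[S(X(t))]$, against which the closed-form bounds of the next lemma can be checked.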
Receding Horizon Control on the Broadcast of Information 225
Lemma 1 (Watkins et al. [25]). Let $\underline{x}(0) = X(0) = \bar{x}(0)$, and define $S(X)$ as the number of informed nodes in $X$. It holds that
$$\mathbb{E}[S(X(t)) : X(0)] \le \sum_{i \in \mathcal{V}} \min\left\{\bar{x}^{\mathrm{inf}}_i(t),\, 1 - \underline{x}^{\mathrm{non}}_i(t)\right\}, \qquad (11)$$
$$\mathbb{E}[S(X(t)) : X(0)] \ge \sum_{i \in \mathcal{V}} \max\left\{\underline{x}^{\mathrm{inf}}_i(t),\, 1 - \bar{x}^{\mathrm{non}}_i(t)\right\}, \qquad (12)$$
where the variables $\bar{x}^{\ell}_i(t)$ and $\underline{x}^{\ell}_i(t)$, for each $\ell \in \mathcal{L} = \{\mathrm{inf}, \mathrm{non}\}$, are the solutions of the following ordinary differential equations,
$$\dot{\bar{x}}^{\ell}_i = \sum_{j \in \mathcal{N}_i} \sum_{\ell \in \mathcal{L}} F_{\mathrm{upr}}\big(B_{\underline{x}_i}(\bar{x}^{\ell}_i), \bar{x}^{\ell}_j\big)\, \omega_{ij}(t), \qquad (13)$$
$$\dot{\underline{x}}^{\ell}_i = \sum_{j \in \mathcal{N}_i} \sum_{\ell \in \mathcal{L}} F_{\mathrm{lwr}}\big(\underline{x}^{\ell}_i, \underline{x}^{\ell}_j\big)\, \omega_{ij}(t), \qquad (14)$$
and $B_{\underline{x}_i}(\bar{x}_i) = \min\{1 - \underline{x}_i, \bar{x}_i\}$ is the upper-bounding operator. In addition, the following inclusion is always satisfied:
$$\mathbb{E}[X_i(t) : X(0)] \in [\underline{x}_i(t), \bar{x}_i(t)] \subseteq [0, 1],$$
for all $t \ge 0$.
The result in Lemma 1 allows us to compute bounds for the evolution of the probabilities of the broadcast of information in the network using $4n$ differential equations (equations (13) and (14) for each node and each label $\ell \in \{\mathrm{inf}, \mathrm{non}\}$), instead of $2^n$. This enables us to apply the receding horizon strategy in equation (4) to larger networks.
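The bound equations can be integrated with any standard ODE scheme. The sketch below uses forward Euler on a simplified upper-bound dynamics $\dot{\bar{x}}_i = \sum_{j} \min(1 - \bar{x}_i, \bar{x}_j)\,\omega_{ij}$; this right-hand side is only a stand-in for $F_{\mathrm{upr}}$ in (13) (the exact closed forms are given in [25]), and the topology and rates are illustrative:

```python
def integrate_upper_bound(x0, neighbors, omega, dt=0.01, t_end=10.0):
    """Forward-Euler integration of a simplified upper-bound ODE for the
    marginal probabilities x_i(t) = Pr{node i informed}. The term
    min(1 - x_i, x_j) * omega_ij is a stand-in for F_upr in (13)."""
    x = list(x0)
    steps = int(t_end / dt)
    for _ in range(steps):
        dx = [sum(min(1.0 - x[i], x[j]) * omega[(i, j)] for j in neighbors[i])
              for i in range(len(x))]
        # clip at 1.0 so the iterates stay valid probabilities
        x = [min(1.0, xi + dt * dxi) for xi, dxi in zip(x, dx)]
    return x

# line graph 0-1-2-3, node 0 informed with probability 1 at t = 0
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
w = {(i, j): 1.0 for i in nbrs for j in nbrs[i]}
x = integrate_upper_bound([1.0, 0.0, 0.0, 0.0], nbrs, w)
```

Integrating $n$ such scalar equations per label is what reduces the cost from the $2^n$ states of the exact chain to the $4n$ equations mentioned above.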
myriad of cost functions that align with different design purposes. Here we chose $C(X_i(t)) = \sum_{j \in \mathcal{N}_i} \omega_{ij}^2$. If we were not seeking an exponential propagation, this objective function, together with the constraints on the transmission resource $\omega_{ij}(t) \in (0, \mu)$ and $\sum_{j \in \mathcal{N}_i} \omega_{ij} = \mu$ for each $i \in \mathcal{V}$, would push for an even distribution of every robot's resources among its neighbors. We chose $\Delta t = 1$ and $r = 0.22$ for both the CTMC and RMC models, and simulated our proposed controller, which guarantees an exponential propagation while minimizing the chosen cost. The original exact model is also simulated without any control method in the loop. We ran all three models 10 times each and took the average as the performance result. The performance comparison is shown in Fig. 2, and the numerical results at different stages of the propagation process are in Table 1.
As shown in Fig. 2a, the CTMC model with control outperforms the two other models; the RMC model with control is slightly slower than the CTMC but still faster than the original model without control. The CTMC model was on average 39.6% faster than the open-loop model in informing all robots, and the RMC model was 27.3% faster than the open-loop completion time. Notice that the two patterns shown as black and red lines in Fig. 2b are similar; the biggest difference between CTMC and RMC is 3 robots, occurring around 2.5 s.
References
1. Bar-Noy, A., Schieber, B.: The Canadian traveller problem. In: Proceedings of
the Second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 261–270
(1991)
2. Brauer, F.: Compartmental models in epidemiology. In: Brauer, F., van den Driess-
che, P., Wu, J. (eds.) Mathematical Epidemiology, pp. 19–79. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-78911-6_2
3. Breitenmoser, A., Schwager, M., Metzger, J.C., Siegwart, R., Rus, D.: Voronoi cov-
erage of non-convex environments with a group of networked robots. In: 2010 IEEE
International Conference on Robotics and Automation, pp. 4982–4989 (2010).
https://doi.org/10.1109/ROBOT.2010.5509696
4. Cassandras, C.G.: Smart cities as cyber-physical social systems. Engineering 2(2),
156–158 (2016)
5. Cator, E., Van Mieghem, P.: Second-order mean-field susceptible-infected-
susceptible epidemic threshold. Phys. Rev. E 85, 056111 (2012). https://doi.org/
10.1103/PhysRevE.85.056111
6. Ellis, M., Durand, H., Christofides, P.D.: A tutorial review of economic model
predictive control methods. J. Process Control 24(8), 1156–1178 (2014)
7. Gusrialdi, A., Hirche, S., Hatanaka, T., Fujita, M.: Voronoi based coverage control
with anisotropic sensors. In: Proceedings of the IEEE American Control Confer-
ence, pp. 736–741 (2008)
8. Hollinger, G.A., Singh, S.: Multirobot coordination with periodic connectivity:
theory and experiments. IEEE Trans. Rob. 28(4), 967–973 (2012)
9. Hsieh, M.A., Kumar, V., Chaimowicz, L.: Decentralized controllers for shape gen-
eration with robotic swarms. Robotica 26(5), 691–701 (2008)
10. Huang, L., Park, K., Lai, Y.C.: Information propagation on modular networks.
Phys. Rev. E 73(3), 035103 (2006)
11. Khelil, A., Becker, C., Tian, J., Rothermel, K.: An epidemic model for information
diffusion in manets. In: Proceedings of the 5th ACM International Workshop on
Modeling Analysis and Simulation of Wireless and Mobile Systems, pp. 54–60
(2002)
12. Khodayi-Mehr, R., Kantaros, Y., Zavlanos, M.M.: Distributed state estimation
using intermittently connected robot networks. IEEE Trans. Robot. 35(3), 709–
724 (2019). https://doi.org/10.1109/TRO.2019.2897865
13. Knizhnik, G., Li, P., Yu, X., Hsieh, M.A.: Flow-based control of marine robots in
gyre-like environments. In: 2022 IEEE International Conference on Robotics and
Automation (ICRA). IEEE (2022)
14. Moarref, S., Kress-Gazit, H.: Automated synthesis of decentralized controllers for
robot swarms from high-level temporal logic specifications. Auton. Robot. 44(3),
585–600 (2020)
15. Nikolova, E., Karger, D.R.: Route planning under uncertainty: the Canadian trav-
eller problem. In: Proceedings of the 23rd National Conference on Artificial Intel-
ligence (AAAI 2008), pp. 969–974. AAAI Press (2008)
16. Øksendal, B., Sulem, A.: Applied Stochastic Control of Jump Diffusions, 2nd edn.
Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-69826-5
17. Paré, P.E., Beck, C.L., Nedić, A.: Epidemic processes over time-varying networks.
IEEE Trans. Control Netw. Syst. 5(3), 1322–1334 (2018). https://doi.org/10.1109/
TCNS.2017.2706138
18. Preciado, V.M., Zargham, M., Enyioha, C., Jadbabaie, A., Pappas, G.J.: Opti-
mal resource allocation for network protection against spreading processes. IEEE
Trans. Control Netw. Syst. 1(1), 99–108 (2014). https://doi.org/10.1109/TCNS.
2014.2310911
19. Sahneh, D.F., Scoglio, C., Van Mieghem, P.: Generalized epidemic mean-field
model for spreading processes over multilayer complex networks. IEEE/ACM
Trans. Netw. 21(5), 1609–1620 (2013). https://doi.org/10.1109/TNET.2013.
2239658
20. Schnoerr, D., Sanguinetti, G., Grima, R.: Comparison of different moment-closure
approximations for stochastic chemical kinetics. J. Chem. Phys. 143(18), 185101
(2015). https://doi.org/10.1063/1.4934990
21. Shen, L., Yu, X., Hsieh, M.A.: Topology control of a periodic time-varying com-
munication network with stochastic temporal links. In: 2022 American Control
Conference (ACC), pp. 4211–4217. IEEE (2022)
22. Sopasakis, P., Patrinos, P., Sarimveis, H.: MPC for sampled-data linear systems:
guaranteeing constraint satisfaction in continuous-time. IEEE Trans. Autom. Con-
trol 59(4), 1088–1093 (2014). https://doi.org/10.1109/TAC.2013.2285786
23. Twigg, J., Dagefu, F., Chopra, N., Sadler, B.M.: Robotic parasitic array optimiza-
tion in outdoor environments. In: 2019 IEEE International Symposium on Safety,
Security, and Rescue Robotics (SSRR), pp. 1–8 (2019). https://doi.org/10.1109/
SSRR.2019.8848974
24. Van Mieghem, P., Cator, E.: Epidemics in networks with nodal self-infection and
the epidemic threshold. Phys. Rev. E 86, 016116 (2012). https://doi.org/10.1103/
PhysRevE.86.016116
25. Watkins, N.J., Nowzari, C., Pappas, G.J.: A robust moment closure for general
continuous-time epidemic processes. In: 2018 IEEE Conference on Decision and
Control (CDC), pp. 244–249 (2018). https://doi.org/10.1109/CDC.2018.8618664
26. Watkins, N.J., Nowzari, C., Pappas, G.J.: Robust economic model predictive con-
trol of continuous-time epidemic processes. IEEE Trans. Autom. Control 65(3),
1116–1131 (2020). https://doi.org/10.1109/TAC.2019.2919136
27. Wei, C., Yu, X., Tanner, H.G., Hsieh, M.A.: Synchronous rendezvous for networks
of active drifters in gyre flows. In: Distributed Autonomous Robotic Systems, pp.
413–425. Springer, Cham (2019)
28. Yu, X., Hsieh, M.A.: Synthesis of a time-varying communication network by robot
teams with information propagation guarantees. IEEE Robot. Automat. Lett. 5(2),
1413–1420 (2020)
29. Yu, X., Hsieh, M.A., Wei, C., Tanner, H.G.: Synchronous rendezvous for networks
of marine robots in large scale ocean monitoring. Front. Robot. AI 6, 76 (2019)
30. Yu, X., Saldaña, D., Shishika, D., Hsieh, M.A.: Resilient consensus in robot swarms
with periodic motion and intermittent communication. IEEE Trans. Robot. 38,
110–125 (2021). https://doi.org/10.1109/TRO.2021.3088765
31. Zanette, D.H.: Dynamics of rumor propagation on small-world networks. Phys.
Rev. E 65(4), 041908 (2002)
Adaptation Strategy for a Distributed
Autonomous UAV Formation in Case
of Aircraft Loss
Tagir Muslimov(B)
1 Introduction
For a decentralized formation of unmanned aerial vehicles (UAVs), the forma-
tion needs to be capable of reconfiguration should one of the UAVs fail. Some
papers approach this problem by developing algorithms to optimize the recovery
or maintenance of connectivity. For instance, paper [1] investigates reconfigur-
ing the communication topology to minimize the communication cost in case of
the UAVs in the formation losing their links. When considering a UAV forma-
tion as a multiagent system, existing approaches become applicable, see, e.g.,
[2] for an overview. Paper [3] investigates the possibility of reaching consensus
should agents stop communicating. Work [4] covers the effectiveness of a swarm
intelligence in a multiagent system that loses its agents. Article [5] investigates
controlling a multiagent system when links between agents are lost. Paper [6]
proposes a novel approach that relies on a Negative Imaginary (NI) theory-based
control algorithm that keeps a decentralized system of single integrators stable
when several agents are lost. Work [7] covers an agent loss-tolerant multiagent
system. One of the approaches covered therein consists in adding redundant links
to make the group robust to agent loss.
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 231–242, 2024.
https://doi.org/10.1007/978-3-031-51497-5_17
232 T. Muslimov
agents in the formation. In other words, the UAV agent dynamics contains the
state variables for the limited set of adjacent UAV agents. Interaction refers
herein to receiving the state readings of the adjacent UAV agents. Let N (qi ) =
{ql |ql is interacting with qi } be the set of such adjacent UAVs.
Each pair of interacting UAVs, e.g., $S_i$ and $S_j$, has a linear functional dependency relationship $p_k = p_k(q_i, q_j)$, where $k = 1, \ldots, n$ is the ordinal number of the interaction. The difference between the respective subsystem states is an example of such a functional dependency. Consider the vector $P = [p_1, \ldots, p_n]^T$. In the case of a decentralized UAV formation, this vector can be regarded as a vector of current phase shifts in circular motion around the target object. The functional dependencies in the formation can be written as the equation $P = LQ$, where $L \in \mathbb{R}^{n \times m}$.
Control consists in finding subsystem dynamics $\dot{q}_i = f_i(N(q_i))$ such that the functional dependencies $P$ converge to the desired values $P_d = [p_{d1}, \ldots, p_{dn}]^T$. In the case of a decentralized UAV formation, the vector $P_d$ can be considered a vector of desired phase shifts in circular motion around the target object. For brevity, let us refer to the desired geometric shape of the UAV formation (which depends on $P_d$) as the pattern. An important assumption here is that the desired values of the functional dependencies $P_d$ are attainable, i.e., there is a vector $Q$ such that the condition $LQ = P_d$ holds. This condition can be written as $(I - LL^{+})\, P_d = 0$, where $L^{+}$ is the pseudoinverse of the matrix $L$, and $I$ is the identity matrix.
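The attainability condition $(I - LL^{+})P_d = 0$ is straightforward to check numerically: it states that $P_d$ lies in the range of $L$. A minimal sketch (the incidence matrix below is the $3 \times 4$ chain used later in the example; the rank-deficient case is an added illustration):

```python
import numpy as np

L = np.array([[-1.0,  1.0,  0.0, 0.0],
              [ 0.0, -1.0,  1.0, 0.0],
              [ 0.0,  0.0, -1.0, 1.0]])

def is_attainable(L, Pd, tol=1e-9):
    """Pd is attainable iff (I - L L+) Pd = 0, i.e. Pd lies in range(L),
    so some Q with L Q = Pd exists."""
    residual = Pd - L @ np.linalg.pinv(L) @ Pd
    return bool(np.linalg.norm(residual) < tol)

Pd = np.array([2 * np.pi / 3, 9 * np.pi / 13, 18 * np.pi / 29])
attainable = is_attainable(L, Pd)
```

For this full-row-rank $L$ every $P_d$ is attainable; duplicating a row of $L$ makes inconsistent targets detectable by the same test.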
UAV formation control is achieved when the vector $P$ converges to the vector $P_d$, i.e., each element of $P$ converges to the correspondingly numbered element of $P_d$. However, the vector $P_d$ is not always attainable, for a variety of reasons. For instance, it is not attainable if one or more UAVs in the formation fail. Thus, creating an adaptation (or adaptive reconfiguration) algorithm boils down to developing an algorithm for tuning the vector $P_d$.
Yuasa and Ito published a series of papers [19,20] in which they solved the above-mentioned control problem of determining the dynamics of the subsystems.
Suppose that the subsystem dynamics can be described as $\dot{q}_i = \bar{f}_i + \tilde{f}_i(x_i)$, where $[\bar{f}_1, \ldots, \bar{f}_m]^T = \bar{F} \in \ker L$ and $[\tilde{f}_1, \ldots, \tilde{f}_m]^T = \tilde{F} \in (\ker L)^{\perp}$. The dynamics of $P$ is determined only by the summand $\tilde{f}_i(x_i)$, since $L\bar{F} = 0$. Given this, write the dynamics of $P$ as $\dot{P} = L\tilde{F}$. Paper [18] also proves the following theorem.
Theorem 3 [18]. If $\tilde{f}_i(x_i)$ satisfies the following conditions: 1. $\tilde{f}_i(x_i) = 0$ at the point $x_i = x_{di}$; 2. $\left.\frac{\partial \tilde{f}_i(x_i)}{\partial x_i}\right|_{x_i = x_{di}} > 0$; then $X = X_d$ (here $X_d = -L^T P_d$) will be one of the equilibrium states. Moreover, if the following also holds: 3. $\tilde{f}_i(x_i) \cdot (x_i - x_{di}) > 0$ for $x_i \ne x_{di}$, then $X = X_d$ becomes the only equilibrium state.
The authors of [19] prove this theorem under the assumption that the dynamics of $P$ form a gradient system. When all three conditions of this theorem are satisfied, $X = X_d$ is the global minimum of the potential function $V(X) = \sum_{i=1}^{m} \int \tilde{f}_i(x_i)\, dx_i$. Below is a variant of dynamics satisfying this theorem:
$$\dot{q}_i = \bar{f}_i + f_i^{+}(x_i - x_{di}), \qquad (1)$$
where $f_i^{+}(\cdot)$ is an odd increasing function.
$P_d$ being unattainable means there is no vector $Q$ that satisfies the equation $LQ = P_d$. From the equations $X_d = -L^T P_d$ and $X = -L^T P$, derive $X - X_d = -L^T (P - P_d)$. Clearly, $X - X_d = 0$ is necessary but not sufficient for $P - P_d = 0$: if $P - P_d \ne 0$ but $P - P_d \in \ker L^T$, then $X - X_d = 0$ nonetheless.
Thus, $P_d$ needs to be adaptively tuned. Such tuning involves estimating the current system state. From the engineering standpoint, it is reasonable to use the function
$$E = \frac{1}{2} \sum_{i=1}^{n} (p_i - p_{di})^2. \qquad (2)$$
Figure 2 shows that the minimum of this function is attained when the desired pattern $P_d$ is formed; the local interactions between adjacent UAVs are then null. If the pattern $P_d$ cannot be attained due to the loss of a UAV from the formation, the interactions persist, and a new pattern emerges as a result of balancing these interactions; see Fig. 2b. To reduce subsystem interaction while preserving the newly emerged pattern, $P_d$ needs to be tuned; see Fig. 2c.
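One hypothetical tuning law consistent with this goal is to relax each desired value $p_{di}$ toward the currently realized $p_i$, i.e., gradient descent on (2) with respect to $P_d$. This is only a sketch of the idea, not the concrete adaptation algorithm of the paper; the gain, step size, and the numeric values of $P$ and $P_d$ are illustrative:

```python
def tune_pd(p, pd, gamma=0.5, dt=0.1, steps=100):
    """Relax the desired values Pd toward the realized pattern P.
    This performs gradient descent on E = 0.5 * sum((p_i - pd_i)^2)
    with respect to Pd (a hypothetical tuning law)."""
    pd = list(pd)
    for _ in range(steps):
        pd = [pdi + dt * gamma * (pi - pdi) for pi, pdi in zip(p, pd)]
    return pd

def energy(p, pd):
    """The interaction measure E of Eq. (2)."""
    return 0.5 * sum((pi - pdi) ** 2 for pi, pdi in zip(p, pd))

p   = [2.1, 1.9, 2.0]        # emergent pattern after losing a UAV (illustrative)
pd0 = [2.094, 2.175, 1.950]  # original desired phase shifts
pd1 = tune_pd(p, pd0)
```

By construction, each tuning step can only decrease $E$, so the residual interaction shrinks while the emerged pattern is preserved.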
As noted in [18], the interaction of a UAV agent $S_i$ as a subsystem can be defined by the equation
$$I_k^i = -L_{ki}\,(p_k - p_{dk}). \qquad (3)$$
Here, $k$ is the ordinal number of the interaction; i.e., if the elements $L_{ki}$ and $L_{kj}$ of the matrix $L$ are non-zero, the UAVs numbered $i$ and $j$ interact as subsystems in the formation.
Based on (3), the second term of Eq. (1) can be written as [18]
$$f_i^{+}(x_i - x_{di}) = f_i^{+}\Big(\sum_{k=1}^{n} I_k^i\Big).$$
Using (3), the objective function (2) can be transformed as follows [18]:
$$E = \frac{1}{4} \sum_{i=1}^{m} \sum_{\substack{k=1 \\ L_{ki} \ne 0}}^{n} \frac{1}{L_{ki}^2}\, (I_k^i)^2.$$
where $\varphi_*$ is the phase angle of a UAV orbiting the target, with the subscript denoting the UAV's ordinal number in the formation. As this is a four-UAV group, $i = 1, 2, 3, 4$. Phase shifts should converge to the desired values determined by the pattern vector $P_d = [D_1, D_2, D_3]^T = [2\pi/3,\; 9\pi/13,\; 18\pi/29]^T$. Then we can obtain
$$P = LQ, \qquad L = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix},$$
where $P = [p_{12}\;\, p_{23}\;\, p_{34}]^T$ and $Q = [\varphi_1\;\, \varphi_2\;\, \varphi_3\;\, \varphi_4]^T$. The kernel of the matrix $L$ in this case is spanned by $[1\;\, 1\;\, 1\;\, 1]^T$, and from the relation $X = -L^T L Q$ it can be found that
$$X = \begin{bmatrix} p_{12} \\ -p_{12} + p_{23} \\ -p_{23} + p_{34} \\ -p_{34} \end{bmatrix} = \begin{bmatrix} -\varphi_1 + \varphi_2 \\ \varphi_1 - 2\varphi_2 + \varphi_3 \\ \varphi_2 - 2\varphi_3 + \varphi_4 \\ \varphi_3 - \varphi_4 \end{bmatrix}.$$
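These identities are easy to verify numerically for arbitrary phase angles (the values of $Q$ below are illustrative):

```python
import numpy as np

L = np.array([[-1.0,  1.0,  0.0, 0.0],
              [ 0.0, -1.0,  1.0, 0.0],
              [ 0.0,  0.0, -1.0, 1.0]])
Q = np.array([0.3, 1.1, 2.4, 4.0])  # illustrative phase angles phi_1..phi_4
P = L @ Q                            # relative phases [p12, p23, p34]
X = -L.T @ P                         # X = -L^T L Q

# componentwise check against the expansion in the text
expected = np.array([P[0],
                     -P[0] + P[1],
                     -P[1] + P[2],
                     -P[2]])
```

The kernel claim can be checked the same way: $L$ annihilates the all-ones vector, reflecting that a common rotation of all four UAVs leaves the relative phases unchanged.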
The UAV formation dynamics in the case of encircling a target is defined as
$$\dot{Q} = \begin{bmatrix} \dot{\varphi}_1 \\ \dot{\varphi}_2 \\ \dot{\varphi}_3 \\ \dot{\varphi}_4 \end{bmatrix} = \begin{bmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \\ \omega_4 \end{bmatrix} + \begin{bmatrix} v_f \frac{2}{\pi\rho_1} \arctan\big(k_\theta (p_{12} - D_1)\big) \\ v_f \frac{2}{\pi\rho_2} \arctan\big(k_\theta (-p_{12} + p_{23} + D_1 - D_2)\big) \\ v_f \frac{2}{\pi\rho_3} \arctan\big(k_\theta (-p_{23} + p_{34} + D_2 - D_3)\big) \\ v_f \frac{2}{\pi\rho_4} \arctan\big(k_\theta (-p_{34} + D_3)\big) \end{bmatrix},$$
where ω∗ and ρ∗ are the angular velocity of UAVs orbiting the target, and the
radius of such orbit, respectively; subscripts match the UAVs’ ordinal numbers
in the formation. Applying a nonlinear arctan (·) function limits the minima
and maxima of the speed control signal. Thus, it avoids the undesirable integral
windup that might cause the UAV formation to go unstable. Here vf is a positive
constant that determines the maximum added speed; kθ is a positive constant
that determines the smoothness of how the UAV reaches its orbit.
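These phase dynamics can be simulated directly with a forward-Euler scheme. The sketch below assumes equal angular rates $\omega_i$ (so the common rotation cancels in the relative phases) and uses illustrative values for $v_f$, $\rho$, and $k_\theta$:

```python
import math

def simulate_phases(D, vf=1.0, rho=1.0, k_theta=1.0, dt=0.01, t_end=200.0):
    """Euler integration of the 4-UAV relative-phase dynamics. With equal
    omega_i, the relative phases p = (p12, p23, p34) converge to D."""
    c = vf * 2.0 / (math.pi * rho)
    p12, p23, p34 = 0.0, 0.0, 0.0  # initial relative phases (illustrative)
    for _ in range(int(t_end / dt)):
        # speed-correction term of each UAV (the arctan column above)
        u1 = c * math.atan(k_theta * (p12 - D[0]))
        u2 = c * math.atan(k_theta * (-p12 + p23 + D[0] - D[1]))
        u3 = c * math.atan(k_theta * (-p23 + p34 + D[1] - D[2]))
        u4 = c * math.atan(k_theta * (-p34 + D[2]))
        # p12 = phi2 - phi1 etc., so pdot is a difference of correction terms
        p12 += dt * (u2 - u1)
        p23 += dt * (u3 - u2)
        p34 += dt * (u4 - u3)
    return p12, p23, p34

D = (2 * math.pi / 3, 9 * math.pi / 13, 18 * math.pi / 29)
p = simulate_phases(D)
```

Since $\arctan(\cdot)$ is odd and increasing, the conditions of Theorem 3 hold and the relative phases converge to the desired pattern $D$.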
Once the uniform radius $\rho$ is attained, $\omega_1 = \omega_2 = \omega_3 = \omega_4 = v/\rho$, where $v$ is the final cruising linear speed of the formation, and the dynamics of the pattern $\dot{P}$ is written as
$$\dot{P} = \begin{bmatrix} \dot{p}_{12} \\ \dot{p}_{23} \\ \dot{p}_{34} \end{bmatrix} = \begin{bmatrix} -v_f \frac{2}{\pi\rho} \arctan\big(k_\theta (p_{12} - D_1)\big) + v_f \frac{2}{\pi\rho} \arctan\big(k_\theta (-p_{12} + p_{23} + D_1 - D_2)\big) \\ -v_f \frac{2}{\pi\rho} \arctan\big(k_\theta (-p_{12} + p_{23} + D_1 - D_2)\big) + v_f \frac{2}{\pi\rho} \arctan\big(k_\theta (-p_{23} + p_{34} + D_2 - D_3)\big) \\ -v_f \frac{2}{\pi\rho} \arctan\big(k_\theta (-p_{23} + p_{34} + D_2 - D_3)\big) + v_f \frac{2}{\pi\rho} \arctan\big(k_\theta (-p_{34} + D_3)\big) \end{bmatrix}.$$
We propose the following adaptation algorithm:
shift between UAV No. 1 and UAV No. 2, UAV No. 2 and UAV No. 4. The new
state of equilibrium is different from that of the 4-UAV formation.
To further illustrate the performance of the adaptation algorithm, we conducted simulations with other initial positions of the UAV agents in the formation. This time, we used the following initial positions in the inertial coordinate system: $[230, 750, 100]^T$, $[150, 360, 100]^T$, $[810, 770, 100]^T$. The order in which the initial position vectors are listed corresponds to the ordinal numbers of the UAVs in the formation. The desired relative phase shift angles and the other simulation parameters were the same as in the first scenario described above. Figure 6a shows how the speeds of the UAVs changed in the absence of the adaptation algorithm, and Fig. 6b shows how they changed when the adaptation algorithm was running. The graphs show that the adaptation eliminates the increase in the final cruising speed of the formation.
Fig. 3. (a) 4-UAV formation trajectories; (b) relative phase errors in a 4-UAV formation
Fig. 4. (a) Phase shift errors in the resulting 3-UAV formation after losing 3rd UAV;
(b) UAV speeds in the resulting 3-UAV formation after losing 3rd UAV
Fig. 6. (a) UAV speeds in the resulting 3-UAV formation after losing 3rd UAV; (b)
adaptation algorithm performance in case of adaptively reconfiguring a UAV formation
after losing a single UAV
4 Conclusions
The adaptation strategy presented herein eliminates a negative effect that completely decentralized UAV formations tend to exhibit when losing one or more UAVs: in such a situation, the final cruising speed of the formation increases. If a faulty UAV can be taken out of the formation in advance through fault diagnosis, the proposed approach allows the overall mission to continue while maintaining high safety. The formation reconfiguration approach shown here could be applicable in more diverse UAV formation control scenarios, and the adaptation algorithms can be further modified. This adaptation scenario also necessitates designing a diagnostics module for the UAVs; future work will present its models.
Acknowledgements. This work was supported by the Ministry of Science and Higher
Education of the Russian Federation (Agreement No. 075-15-2021-1016).
References
1. Wang, G., Luo, H., Hu, X., Ma, H., Yang, S.: Fault-tolerant communication topol-
ogy management based on minimum cost arborescence for leader-follower UAV
formation under communication faults. Int. J. Adv. Robot. Syst. 14, 1–17 (2017)
2. Yang, H., et al.: Fault-tolerant cooperative control of multiagent systems: a survey
of trends and methodologies. IEEE Trans. Ind. Inform. 16, 4–17 (2020)
3. Rezaee, H., Parisini, T., Polycarpou, M.M.: Almost sure resilient consensus under
stochastic interaction: links failure and noisy channels. IEEE Trans. Autom. Con-
trol 66, 5727–5741 (2021)
4. Mann, V., Sivaram, A., Das, L., Venkatasubramanian, V.: Robust and efficient
swarm communication topologies for hostile environments. Swarm Evol. Comput.
62, 100848 (2021)
5. Abel, R.O., Dasgupta, S., Kuhl, J.G.: The relation between redundancy and con-
vergence rate in distributed multi-agent formation control. In: 2008 47th IEEE
Conference on Decision and Control, pp. 3977–3982. IEEE (2008)
6. Hu, J., Lennox, B., Arvin, F.: Robust formation control for networked robotic
systems using negative imaginary dynamics. Automatica 140, 110235 (2022)
7. Alireza Motevallian, S., Yu, C., Anderson, B.D.O.: On the robustness to multiple
agent losses in 2D and 3D formations. Int. J. Robust Nonlinear Control 25, 1654–
1687 (2015)
8. Slim, M., Saied, M., Mazeh, H., Shraim, H., Francis, C.: Fault-tolerant control
design for multirotor UAVs formation flight. Gyroscopy Navig. 12, 166–177 (2021)
9. Huang, S., Teo, R.S.H., Kwan, J.L.P., Liu, W., Dymkou, S.M.: Distributed UAV
loss detection and auto-replacement protocol with guaranteed properties. J. Intell.
Robot. Syst. 93, 303–316 (2019)
10. Aminzadeh, A., Khoshnood, A.: A novel distributed fault-tolerant cooperative con-
trol approach for auto-reconfiguration of a multi agent system in natural disasters.
In: 2021 9th RSI International Conference on Robotics and Mechatronics (ICRoM),
pp. 163–170. IEEE (2021)
11. Ghamry, K.A., Zhang, Y.: Fault-tolerant cooperative control of multiple UAVs for forest fire detection and tracking mission. In: 2016 3rd Conference on Control and Fault-Tolerant Systems (SysTol), pp. 133–138 (2016)
12. Yu, Z., Zhang, Y., Jiang, B., Yu, X.: Fault-tolerant time-varying elliptical formation
control of multiple fixed-wing UAVs for cooperative forest fire monitoring. J. Intell.
Robot. Syst. 101, 48 (2021)
13. Han, B., Jiang, J., Yu, C.: Distributed fault-tolerant formation control for multiple
unmanned aerial vehicles under actuator fault and intermittent communication
interrupt. Proc. Inst. Mech. Eng. Part I J. Syst. Control Eng. 235, 1064–1083
(2021)
14. Yamaguchi, H., Arai, T.: Distributed and autonomous control method for gen-
erating shape of multiple mobile robot group. In: Proceedings of the IEEE/RSJ
International Conference on Intelligent Robots and Systems, vol. 2, pp. 800–807
(1994)
15. Yamaguchi, H., Beni, G.: Distributed autonomous formation control of mobile
robot groups by swarm-based pattern generation. Distrib. Auton. Robot. Syst. 2,
141–155 (1996)
16. Yamaguchi, H.: A distributed motion coordination strategy for multiple nonholo-
nomic mobile robots in cooperative hunting operations. Robot. Auton. Syst. 43,
257–282 (2003)
17. Yamaguchi, H., Arai, T., Beni, G.: A distributed control scheme for multiple robotic
vehicles to make group formations. Robot. Auton. Syst. 36, 125–147 (2001)
18. Ito, S., Yuasa, H., Ito, M., Hosoe, S.: On an adaptation in distributed system
based on a gradient dynamics. In: IEEE SMC’99 Conference Proceedings. 1999
IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.
99CH37028), pp. 200–205. IEEE (1999)
19. Yuasa, H., Ito, M.: Coordination of many oscillators and generation of locomotory
patterns. Biol. Cybern. 63, 177–184 (1990)
20. Ito, M.: Trends and perspective of researches on control system theory. Distrib.
Auton. Robot. Syst. 2, 3–11 (1996)
21. Ito, S., Sasaki, M., Fujita, Y., Yuasa, H.: A circularly coupled oscillator system for
relative phase regulation and its application to timing control of a multicylinder
engine. Commun. Nonlinear Sci. Numer. Simul. 15, 3100–3112 (2010)
22. Beard, R.W., McLain, T.W.: Small Unmanned Aircraft: Theory and Practice.
Princeton University Press, Princeton (2012)
23. Muslimov, T.Z., Munasypov, R.A.: Coordinated UAV standoff tracking of moving
target based on Lyapunov vector fields. In: 2020 International Conference Nonlin-
earity, Information and Robotics (NIR), pp. 1–5. IEEE (2020)
DGORL: Distributed Graph Optimization
Based Relative Localization of Multi-robot
Systems
1 Introduction
The estimation of relative pose, including position and orientation [1], for multi-robot systems (MRS) [2] is the foundation for higher-level tasks like collision avoidance, cooperative transportation, and object manipulation. Motion capture systems [3], ultra-wideband (UWB) systems with anchors, and RTK-GPS systems are a few examples of established multi-robot relative positioning solutions that currently rely on deploying physical anchors or base stations in the application. These schemes, however, are not suitable for large areas or indoor settings where it is difficult to adapt the infrastructure, which limits the overall performance and applicability of multi-robot systems and makes their use more difficult and expensive. Furthermore, the extraction of range and bearing measurements from cameras and visual markers, while
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 243–256, 2024.
https://doi.org/10.1007/978-3-031-51497-5_18
244 E. Latif and R. Parasuraman
another practical approach, has the drawbacks of a small field of view, short range, occlusion by nearby objects, and potentially high computational cost. On the other hand, the use of distance measurements from sensors such as radars, Lidars, and UWB to achieve relative localization has recently attracted significant interest.
The multi-robot relative localization (MRL) problem, which refers to detecting and
locating the relative configurations of mobile agents (typically with fewer sensor data
such as relative range or bearing) with respect to other agents or landmarks, is critical in
MRS because it is required for robot teaming and swarming [4, 5]. As a result, many
applications are frequently confronted with the relative localization problem, includ-
ing formation control, cooperative transportation, perimeter surveillance, area cover-
age, and situational awareness. The relative localization and mapping (aka multi-robot
SLAM) is an extension of the MRL problem. While several researchers have proposed
novel solutions to the multi-robot map merging problem using pose graph matching
and optimization techniques, they rely on extensive sensor data inputs (such as point
clouds or Lidar scans) [6–8]. Therefore, solving the MRL problem with relative range
or bearing in a distributed manner is desirable and scalable in MRS [9].
Fig. 1. Overview of the configuration space of a multi-robot system, sharing their pose ($x_i$) and relative range ($R_{ij}$) measurements in our DGORL solution.
2 Related Work
Most recent solutions to the simultaneous localization and mapping (SLAM) and MRL
problem are based on graph optimization (i.e., all robot poses and landmark positions
compose the graph’s nodes, while each edge encodes a measurement constraint) [18].
A conventional graph formulation, on the other hand, may suffer from unbounded pro-
cessing and memory complexity, which might constantly expand over time. This is
because new robot poses (and new landmarks in the case of feature-based SLAM) are
constantly being added to the graph, resulting in an increase in the number of nodes
over time; additionally, if frequent loop-closing events occur in SLAM, loop-closure
constraints (edges) can significantly increase the graph density [20]. For example, this
could be the case if a service robot works for an extended time inside an office building.
Particularly, graph optimization and factoring have been recently proposed in the
literature to solve different variants of the MRL problem [21–23]. Even though the issue
of reducing the complexity of graph optimization has recently been addressed [24, 25],
to the best of our knowledge, little work has yet explicitly taken into account estimation
consistency (i.e., unbiased estimates and an estimated covariance larger than or equal to the actual covariance [26]) in the design of graph reduction (sparsification)
schemes. This is a critical flaw because if an estimator is inconsistent, the accuracy
of the derived state estimates is unclear, making the estimator untrustworthy [27]. Moreover, existing approaches to the localization problem in dynamic environments trade off performance against efficiency significantly.
Most cooperative localization methods entail robot communication and observation, which makes any step prone to inaccuracy. In a recent attempt at multi-robot localization, many robots can locate themselves jointly using Terrain Relative Navigation (TRN) [19]. The localization estimation utilizing shared-information fusion has been improved by using an estimator structure that takes advantage of covariance intersection (CI) to reduce one source of measurement correlation while properly including others. Similarly, the work in [28] developed a CI-based localization method with an explicit communication update and guaranteed estimation consistency, which would increase the robustness of multi-robot cooperative localization methods in a dispersed setting. However, the robots keep a system-wide estimate in their algorithm, which relative observations can instantly update. Consequently, this increases the computational complexity at a centralized server and over-burdens each robot with accurately tracking the dynamics for global positioning. Unlike the explicit CI modeling methods, our objective is to localize each robot relative to the other robots in a distributed fashion by utilizing a polynomial-time graph optimization technique with high accuracy and efficiency.
1 http://github.com/herolab-uga/DGORL.
Therefore, this paper proposes a graph-based optimization algorithm to address the
gaps mentioned above. To form a graph using shared RSSI information among robots,
we employ a Relative Pose Measurement Graph (RPMG) using observability analysis
[22]. Once we have created a connected, reliable graph, we exploit particle filtering over
the motion model to expand the graph based on mobility constraints. Finally, we construct k possible graph combinations for each robot, which then need to be optimized. We use the polynomial-time graph optimizer g2o [18] for graph optimization with distributed constraint optimization.
Multi-robot System. Robotic members of an MRS are divided into disjoint (iso-
lated/disconnected from others) and connected (operating collaboratively) robots,
which can be either ground-based or aerial. The robot that enters the measuring range
of the monitoring robot at any given moment is referred to as the observed robot. A
robot that takes measurements of another robot is the observation robot. The following properties are assumed for the robotic members of the considered MRS:
aij > 0, if (vi , vj ) ∈ E and aij = 0 otherwise. An undirected edge eij in the graph G
is denoted by the unordered pair of robots (vi , vj ), which means that robots vi and vj
can exchange information with each other.
Here, we only consider the undirected graphs, indicating that the robots’ communi-
cations are all bidirectional. Then, the connection weight between robots vi and vj in
graph G satisfies aij = aji > 0 if they are connected; otherwise, aij = aji = 0. With-
out loss of generality, it is noted that aii = 0 indicates no self-connection in the graph.
The degree of robot vi is defined by d(vi) = Σj aij with j ≠ i and i = 1, 2, ..., n.
The Laplacian matrix of the graph G is defined as Ln = D − A, where D is the diagonal degree matrix D = diag{d(v1), d(v2), ..., d(vn)}. If the graph G is undirected, Ln is symmetric
and positive semi-definite. A path between robots vi and vj in a graph G is a sequence
of edges (vi , vi1 ), (vi1 , vi2 ), ..., (vik , vj ) in the graph with distinct robots vil ∈ V . An
undirected graph G is connected if a path exists between any pair of distinct robots vi
and vj where (i, j = 1, ..., n).
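These spectral definitions can be sketched numerically. Below is a minimal illustration (function names are ours), using the standard fact that an undirected graph is connected if and only if the second-smallest eigenvalue of Ln = D − A is positive:

```python
import numpy as np

def laplacian(A):
    """L = D - A, with D = diag of row sums (the robot degrees d(v_i))."""
    D = np.diag(A.sum(axis=1))
    return D - A

def is_connected(A, tol=1e-9):
    """An undirected graph is connected iff the algebraic connectivity
    (second-smallest eigenvalue of L) is positive."""
    L = laplacian(A)
    eig = np.sort(np.linalg.eigvalsh(L))  # symmetric -> real eigenvalues
    return bool(eig[1] > tol)

# Three robots in a line: v1 - v2 - v3 (a12 = a23 = 1, a13 = 0, aii = 0)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
print(is_connected(A))                 # True
print(is_connected(np.zeros((3, 3))))  # False: no edges
```

Note that the symmetry requirement aij = aji and the no-self-connection convention aii = 0 from the text are reflected in the example adjacency matrix.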
Fig. 2. The Distributed Graph Optimization for Relative Localization of Multi-Robot Systems
(DGORL) system architecture shows input (Received connected robot’s motion information and
RSSI) and output (relative pose estimation) for robot i. The robot generates and expands graphs based on observability for optimization, along with the constraints set up through local and received inertial measurements, which are further fed into the optimization function. The optimized
graph yields relative pose estimates for all connected robots.
An undirected graph G = (V, E), where (i, j) ∈ E if and only if (i, j) and (j, i)
both are in E, is called the underlying undirected graph of G. G is connected if its
underlying undirected graph is connected. An isolated and connected subgraph of G
is called a component. Suppose that an undirected graph G has n nodes and m edges.
rank(A(G)) = n − λ ≤ n − 1
The second step (see Algorithm 2) is to examine the observability of the n-robot system according to the ERPMG GE generated by Algorithm 1. We initialize the incidence matrix A(GE ) and the diagonal matrix Ln to be zero matrices, with their
dimensions determined by the number of nodes and edges in GE . Then construct the
incidence matrix A(GE ) and the diagonal matrix Ln . The incidence matrix A(GE )
describes which nodes are connected by which edges and is constructed based on the
topology of GE . The diagonal matrix Ln describes the weight of the edges in GE with
Distributed Graph Optimization Based Relative Localization 249
the weight vector of each edge. The edges in A(GE ) and Ln take the same order. Then,
we can obtain the spectral matrix C(GE ) based on A(GE ) and Ln . The observability
of the n-robot system can be determined by evaluating the rank of C(GE ).
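As an illustration of the rank test, the sketch below checks the incidence-matrix rank property rank(A(G)) = n − λ, where λ denotes the number of connected components (the standard result). The construction of the full spectral matrix C(GE) follows [22] and is not reproduced here; the function name and example edges are ours.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Oriented incidence matrix: one column per edge (v_i, v_j),
    with +1 at row i and -1 at row j."""
    A = np.zeros((n, len(edges)))
    for col, (i, j) in enumerate(edges):
        A[i, col], A[j, col] = 1.0, -1.0
    return A

# 4 robots with edges forming one connected cycle:
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
A = incidence_matrix(4, edges)
lam = 4 - np.linalg.matrix_rank(A)   # number of connected components
print(lam)  # 1 -> connected, so rank = n - 1

# Two disconnected pairs:
A2 = incidence_matrix(4, [(0, 1), (2, 3)])
print(4 - np.linalg.matrix_rank(A2))  # 2 components
```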
Initial position and velocity constraints are known to each robot; hence locally con-
structed ERPMG tends to change based on relative motion in the network. We have
limited the number of possible positions of an individual robot to k by exploiting the
concept of particle filtering. The motion process is carried out in the 2-D state space Xn,t, which includes the position (xn,t, yn,t) and orientation φn,t.
The robot model f (∗) can be written as:
xn,t+1 = xn,t + vn,t Δt cos(φn,t)
yn,t+1 = yn,t + vn,t Δt sin(φn,t)    (1)
φn,t+1 = φn,t + ωn,t Δt
In Eq. (1), vn,t and ωn,t are the linear and angular velocities of robot n at time t, respectively. Δt represents the time interval between two control inputs. Based
on the robot model f (∗) , the control input of the robot n at time t is defined as:
un,t = [vn,t , ωn,t ] (2)
It is worth mentioning that Eq. (2) is the ideal kinematic model of the robot. Under
pure rolling conditions, the robot’s motion follows this equation. However, imperfec-
tions are unavoidable in real devices. Complex wheel-ground interactions and system
noises cause disturbances to the robots. Moreover, these disturbances are modeled as
Gaussian random variables, characterized by their mean and covariance matrices. Thus,
the model of a given robot n at time t is defined as:
xn,t+1 = f(xn,t, un,t) + Nn,t    (3)
Here, xn,t ∈ R^nx and un,t ∈ R^nu are the state and control input of robot n at time t, respectively. Nn,t ∈ R^nx is an unknown disturbance with a Gaussian probability distribution. Considering the disturbance level in a real environment, the noise Nn,t in simulations is set to diag(0.1 m, 0.1 m, 0.5°).
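A minimal sketch of the disturbed kinematic update of Eq. (1) with the additive Gaussian noise term described above; the function and variable names are ours:

```python
import numpy as np

def step(state, u, dt, rng=None):
    """Unicycle update per Eq. (1): state = (x, y, phi), u = (v, omega).
    Adds a zero-mean Gaussian disturbance N_{n,t} when an rng is given."""
    x, y, phi = state
    v, omega = u
    nxt = np.array([x + v * dt * np.cos(phi),
                    y + v * dt * np.sin(phi),
                    phi + omega * dt])
    if rng is not None:
        # diag(0.1 m, 0.1 m, 0.5 deg) as in the simulations
        sigma = np.array([0.1, 0.1, np.deg2rad(0.5)])
        nxt += rng.normal(0.0, sigma)
    return nxt

s = step(np.array([0.0, 0.0, 0.0]), (1.0, 0.0), dt=1.0)
print(s)  # [1. 0. 0.]: moves 1 m along the heading, no noise
```

Passing `rng=np.random.default_rng(seed)` yields a reproducible noisy rollout.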
As each robot also receives RSSI from the other robots, we can find the intersecting region as an area of interest onto which the estimated relative position of each robot is mapped using the model, and select the k most likely (soft-max) candidates. Once we have k possible positions for each robot, we can generate fewer than nk solvable graphs for optimization.
3.3 Optimization
minimize f(x) = Σ_{i=1}^{N} fi(x)  subject to  x ∈ D = {x ∈ R^k : c(x) ≤ 0}    (4)
We also assume that there exists a point x̂ such that the inequality constraint is strictly
feasible, i.e., c(x̂) < 0. We introduce a regularized Lagrangian function to deal with the
inequality constraint c(x).
L(x, λ) = Σ_{i=1}^{N} fi(x) + λ N c(x) − (Nγ/2) λ² = Σ_{i=1}^{N} [ fi(x) + λ c(x) − (γ/2) λ² ]
= Σ_{i=1}^{N} Li(x, λ),    (6)
where we have replaced the inequality constraint c(x) ≤ 0 with N c(x) ≤ 0, and γ > 0 is some parameter. It is noted that Li(x, λ) is known only by robot i.
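The per-robot decomposition in Eq. (6) can be verified numerically. In the sketch below, the local costs fi and the constraint c are placeholders, not the actual DGORL objectives:

```python
import numpy as np

N, gamma = 5, 0.5
f = [lambda x, a=a: (x - a) ** 2 for a in range(N)]  # placeholder local costs f_i
c = lambda x: x - 10.0                               # placeholder constraint c(x) <= 0

def L_global(x, lam):
    """Left-hand side of Eq. (6): global regularized Lagrangian."""
    return sum(fi(x) for fi in f) + lam * N * c(x) - (gamma * N / 2) * lam ** 2

def L_i(x, lam, i):
    """Per-robot term L_i(x, lam), known only by robot i."""
    return f[i](x) + lam * c(x) - (gamma / 2) * lam ** 2

x, lam = 2.0, 0.3
assert np.isclose(L_global(x, lam), sum(L_i(x, lam, i) for i in range(N)))
print("decomposition holds")
```

This is what makes the distributed solution possible: each robot can evaluate its own Li without knowing the other robots' local costs.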
Model. We consider a time-varying network model that has been widely considered in
the literature [18, 20, 29]. The robots’ connectivity at time t can be represented by an
undirected graph G(t) = (V, E(t), A(t)), where E(t) is the set of activated edges at
time t, i.e., edge (i, j) ∈ E(t) if and only if robot i can receive data from robot j, and
we assign a weight [A(t)]ij > 0 to the data on edge (i, j) at time t. Note that the set E(t) includes self-edges (i, i) for all i. We make the following standard assumption on
the graph G(t).
Assumptions: The graph G(t) = (V, E(t), A(t)) satisfies the following.
Once robot i models the network and constraints, g2o [18] optimizes the provided k-ERPMG to find the best possible position estimates for the n connected robots in the form of an optimized graph Go = (Vo, Eo, Ao). Vo contains the possible node positions with respect to i, Eo contains the possible distance ranges to other nodes as weighted edges, and Ao is an optimized adjacency matrix from i to every other node. Algorithm 3 describes the entire localization strategy for a given MRS, with the graph-based optimizer denoting the algorithm's three main iterative steps.
4 Theoretical Analysis
Graph optimization-based relative pose localization has three submodules: formation, expansion, and optimization. Formation depends on the number of robots n, which drives the generation of the ERPMG G = (V, E, A); expansion relies on the number of possible position instances k, which produces nk potential graphs to be optimized. As both initial steps can be done serially in polynomial time, one can deduce that the algorithm is polynomial and scalable in the number of robots. However, the optimization of the nk possible graphs using the g2o optimizer needs to be analyzed for NP-hardness.
NP-hardness: We will show that the problem of finding optimizers is NP-hard in gen-
eral. An example shows that L1-optimality is NP-hard for non-submodular energies.
Recall that a set U of vertices of a graph G = (V, E, A) is independent if no two vertices in U are connected by an edge. The problem of finding a maximum independent set of vertices of an arbitrary graph is known to be NP-hard [30]. Consider the following local costs as an example:
– Assign each vertex v with label 0 the cost 1.
– For each edge with both endpoints labeled 1, let the cost be N = |V | + 1.
– Assign the cost 0 to every other edge.
Note that the maximum atom cost of a labeling l is N unless the set U := l⁻¹(1) is independent; all labelings l with an independent U have a maximum atom cost of 1. Furthermore, a labeling l is a strict minimizer when the number of cost-1 atoms for U, which is |V | − |U |, is minimal, i.e., when the size of U is maximal. In other words, with the local cost assignments above for a graph G, l is a strict minimizer if and only if U := l⁻¹(1) is a maximum independent set of vertices. As a result, our problem, like the maximum independent set problem, is NP-hard. Thus, the optimization problem is proved to be NP-hard; since solutions to such graph problems can be verified in polynomial time [30], the problem is NP-complete.
Fig. 3. Trajectories and the Optimized Graph. Left: Simulation with the ground truth (green)
trajectory along with the colored predicted trajectory of a robot; Right: Robots as colored vertices
and colored edges between connected robots with the respective color for optimized graphs.
Fig. 4. DGORL performance of five robots in terms of Root Mean Squared Error (RMSE) in
the 60 m × 60 m simulation workspace (Left). Comparison with Collaborative Terrain Relative
Navigation (TRN) based on Covariance Intersection (Right).
Sect. 3, where each robot shares inertial measurement and RSSI with connected robots.
Each robot in the MRS performs relative localization based on its frame of reference.
We can calculate the weight of an edge eij in G = (V, E) as the range d between Ri and Rj using a path-loss model on the perceived RSSI:
d = 10^((A − RSSI) / (10 n))    (7)
Here, n denotes the signal propagation exponent, which varies between 2 (free space)
and 6 (complex indoor environment), d denotes the distance between robots i and j, and
A denotes the received signal strength at a reference distance of one meter. On every iteration, ri generates the ERPMG based on its observations and performs the observability analysis (at least four nodes connected, per the observability check in Algorithm 2).
Later, it will expand the ERPMG and generate input for the g2o optimizer.
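Eq. (7) inverts the standard log-distance path-loss model to recover range from RSSI. A small sketch (the reference strength A = −40 dBm and the function name are our assumptions):

```python
def rssi_to_distance(rssi_dbm, ref_dbm=-40.0, n=2.0):
    """Log-distance path-loss inversion, per Eq. (7):
    d = 10 ** ((A - RSSI) / (10 n)),
    where A is the RSSI at a 1 m reference distance and n is the
    propagation exponent (2 in free space, up to 6 indoors)."""
    return 10 ** ((ref_dbm - rssi_dbm) / (10.0 * n))

print(rssi_to_distance(-40.0))  # 1.0 m: at the reference distance
print(rssi_to_distance(-60.0))  # 10.0 m in free space (n = 2)
```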
The input to g2o comprises the initial global measurement pose, the local relative pose constraints, and the corresponding information matrix. Once the optimized graph is obtained, ri estimates relative positions from the vertex poses; in the MRS, each robot performs this localization in a distributed manner. To measure error, we captured initial positions on a global scale and
compared predicted trajectories with ground truth as RMSE in meters. We used range
sensors to compare our strategy to Terrain Relative Navigation (TRN) from [19].
Computational Demand. The mean optimization time (referring to the time taken by
the solver) and the mean CPU usage of the whole process are 10.2 ± 2.7 ms and
143 ± 49%, respectively. The results are evaluated for five robots from 10 runs over
100 iterations. The computational complexity of covariance intersection-based TRN is
close to DGORL, which further highlights the efficiency of DGORL. It is worth notic-
ing that the performance of DGORL provides evidence for its practical use in small
resource-constrained robots (i.e., equipped with tiny computers, e.g., Raspberry Pi zero
with Quad Core running at 1 GHz). Hence, DGORL can be used as an efficient relative
localization algorithm for most multi-robot systems, including swarm robotics.
6 Conclusion
Many estimation, planning, and optimal control challenges in robotics are grounded in optimization problems. In most optimization problems, the objective to be maximized or minimized is made up of multiple local components or terms; that is, each term relies on only a small subset of the variables. We use graph theory to solve a well-defined
multi-robot relative localization problem. Graph generation, expansion through transi-
tion, and optimization are the three fundamental components of our novel technique.
We first estimated a relative pose measurement graph using received signal intensity
between connected robots. Later, we used a motion model to build graph expansions
and a polynomial-time graph optimizer to find an optimized pose estimation graph from
which a robot extracts relative positions for connected robots. Finally, we analyzed
the algorithm through simulations to validate its practicality and accuracy in a large
workspace. Furthermore, DGORL achieves higher localization accuracy than comparable methods while maintaining a comparable optimization time, which further supports the efficiency and accuracy of DGORL. In the future, we will perform hardware experiments to substantiate the applicability of graph optimization in the real world.
References
1. Islam, M.J., Mo, J., Sattar, J.: Robot-to-robot relative pose estimation using humans as mark-
ers. Auton. Robot. 45(4), 579–593 (2021)
2. Xianjia, Y., Qingqing, L., Queralta, J.P., Heikkonen, J., Westerlund, T.: Applications of
UWB networks and positioning to autonomous robots and industrial systems. In: 2021 10th
Mediterranean Conference on Embedded Computing (MECO), pp. 1–6. IEEE (2021)
3. Najafi, M., Nadealian, Z., Rahmanian, S., Ghafarinia, V.: An adaptive distributed approach
for the real-time vision-based navigation system. Measurement 145, 14–21 (2019)
4. Guo, K., Qiu, Z., Meng, W., Xie, L., Teo, R.: Ultra-wideband based cooperative relative
localization algorithm and experiments for multiple unmanned aerial vehicles in GPS denied
environments. Int. J. Micro Air Veh. 9(3), 169–186 (2017)
5. Fink, J., Michael, N., Kim, S., Kumar, V.: Distributed pursuit-evasion without global localization via local frontiers. Auton. Robot. 32(1), 81–95 (2012)
6. Dubé, R., Gawel, A., Sommer, H., Nieto, J., Siegwart, R., Cadena, C.: An online multi-robot
slam system for 3D lidars. In: 2017 IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), pp. 1004–1011. IEEE (2017)
7. Mangelson, J.G., Dominic, D., Eustice, R.M., Vasudevan, R.: Pairwise consistent measure-
ment set maximization for robust multi-robot map merging. In: IEEE International Confer-
ence on Robotics and Automation (ICRA), pp. 2916–2923. IEEE (2018)
8. Tian, Y., Chang, Y., Arias, F.H., Nieto-Granda, C., How, J.P., Carlone, L.: Kimera-Multi:
robust, distributed, dense metric-semantic slam for multi-robot systems. IEEE Trans. Robot.
38, 2022–2038 (2022)
9. Latif, E., Parasuraman, R.: Multi-robot synergistic localization in dynamic environments. In:
ISR Europe 2022; 54th International Symposium on Robotics, pp. 1–8 (2022)
10. Shorinwa, O., Yu, J., Halsted, T., Koufos, A., Schwager, M.: Distributed multi-target tracking
for autonomous vehicle fleets. In: IEEE International Conference on Robotics and Automa-
tion (ICRA) 2020, pp. 3495–3501 (2020)
11. Wai, H.-T., Yang, Z., Wang, Z., Hong, M.: Multi-agent reinforcement learning via double
averaging primal-dual optimization. In: Advances in Neural Information Processing Sys-
tems, vol. 31 (2018)
12. Wang, W., Jadhav, N., Vohs, P., Hughes, N., Mazumder, M., Gil, S.: Active rendezvous for
multi-robot pose graph optimization using sensing over Wi-Fi. In: Asfour, T., Yoshida, E.,
Park, J., Christensen, H., Khatib, O. (eds.) Robotics Research. ISRR 2019. Springer Proceed-
ings in Advanced Robotics, vol. 20, pp. 832–849. Springer, Cham (2019). https://doi.org/10.
1007/978-3-030-95459-8_51
13. Parashar, R., Parasuraman, R.: Particle filter based localization of access points using
direction of arrival on mobile robots. In: IEEE 92nd Vehicular Technology Conference
(VTC2020-Fall), pp. 1–6. IEEE (2020)
14. Latif, E., Parasuraman, R.: Online indoor localization using DOA of wireless signals. arXiv
preprint arXiv:2201.05105 (2022)
15. Luo, S., Kim, J., Parasuraman, R., Bae, J.H., Matson, E.T., Min, B.-C.: Multi-robot ren-
dezvous based on bearing-aided hierarchical tracking of network topology. Ad Hoc Netw.
86, 131–143 (2019)
16. Parasuraman, R., Min, B.-C.: Consensus control of distributed robots using direction of
arrival of wireless signals. In: Correll, N., Schwager, M., Otte, M. (eds.) Distributed
Autonomous Robotic Systems. SPAR, vol. 9, pp. 17–34. Springer, Cham (2019). https://
doi.org/10.1007/978-3-030-05816-6_2
17. Parasuraman, R., Oegren, P., Min, B.-C.: Kalman filter based spatial prediction of wireless
connectivity for autonomous robots and connected vehicles. In: IEEE 88th Vehicular Tech-
nology Conference (VTC-fall), pp. 1–5. IEEE (2018)
18. Kümmerle, R., Grisetti, G., Strasdat, H., Konolige, K., Burgard, W.: g2o: a general framework for graph optimization. In: 2011 IEEE International Conference on Robotics and
Automation, pp. 3607–3613. IEEE (2011)
19. Wiktor, A., Rock, S.: Collaborative multi-robot localization in natural terrain*. In: IEEE
International Conference on Robotics and Automation (ICRA), pp. 4529–4535 (2020)
20. Huang, G., Kaess, M., Leonard, J.J.: Consistent sparsification for graph optimization. In:
European Conference on Mobile Robots, pp. 150–157 (2013)
21. Zheng, S., et al.: Multi-robot relative positioning and orientation system based on UWB
range and graph optimization. Measurement 195, 111068 (2022)
22. Hao, N., He, F., Hou, Y., Yao, Y.: Graph-based observability analysis for mutual localization
in multi-robot systems. Syst. Control Lett. 161, 105152 (2022)
23. Sahawneh, L.R., Brink, K.M.: Factor graphs-based multi-robot cooperative localization: a
study of shared information influence on optimization accuracy and consistency. In: Pro-
ceedings of the 2017 International Technical Meeting of the Institute of Navigation (ION
ITM 2017), Monterey, California, pp. 819–838 (2017)
24. Carlevaris-Bianco, N., Eustice, R.M.: Generic factor-based node marginalization and edge
sparsification for pose-graph SLAM. In: 2013 IEEE International Conference on Robotics
and Automation, pp. 5748–5755. IEEE (2013)
25. Johannsson, H., Kaess, M., Fallon, M., Leonard, J.J.: Temporally scalable visual slam using
a reduced pose graph. In: 2013 IEEE International Conference on Robotics and Automation,
pp. 54–61. IEEE (2013)
26. Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation with Applications to Tracking and
Navigation: Theory Algorithms and Software. Wiley, New York (2004)
27. Indelman, V., Nelson, E., Michael, N., Dellaert, F.: Multi-robot pose graph localization and
data association from unknown initial relative poses via expectation maximization. In: 2014
IEEE International Conference on Robotics and Automation (ICRA), pp. 593–600. IEEE
(2014)
28. Chang, T.-K., Chen, K., Mehta, A.: Resilient and consistent multirobot cooperative localiza-
tion with covariance intersection. IEEE Trans. Rob. 38(1), 197–208 (2021)
29. Jiang, X., Zeng, X., Sun, J., Chen, J.: Distributed solver for discrete-time Lyapunov equations
over dynamic networks with linear convergence rate. IEEE Trans. Cybern. 52(2), 937–946
(2022)
30. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT
Press, Cambridge (2022)
Characterization of the Design Space
of Collective Braitenberg Vehicles
1 Introduction
Fig. 1. (a) Overview of how the original BVs react to environmental stimuli. (b) Overview of local and global coordinates referenced in this article.
delivery and minimally invasive surgery [13]. Robotics techniques at this scale
are rapidly becoming a reality, with memristor-based circuits capable of sensing
chemical and light signals [5] and read/write memory of temporal events [34].
However, the agents are too small to use traditional onboard state-estimation
and control algorithms, instead requiring careful design of primarily reactive
strategies, uniform off-board controls for the entire collective [30], or a combina-
tion of the two approaches [9]. Reactive approaches that use local information
and embodied intelligence are crucial for success at the micro-scale, as biological
collectives show [10]. At the human scale, as robotic collectives begin assisting
us with personal care, transportation, and environmental monitoring, the design
of reactive collectives is appealing. Decentralized methods are more scalable,
and could even help side-step some privacy and surveillance concerns if we can
implement systems that do not require cameras, full scene reconstruction, or
global positioning [8,29]. To our knowledge, the dynamics of multiple BVs with
onboard stimuli have not been a first-class object of study in the robotics, control
or autonomous systems literature.
Our contributions include a more formal definition of the BV model, an open-
source Python implementation for simulating and characterizing the system, and
an interpretation of the results and next steps toward application. We extend the
BV model to include on-board stimuli and define two stimuli models: a symmetric
omni-directional stimulus such as that enabled by radio frequency localization
[17], and a directional stimulus modeled off a generic LED’s spatial intensity.
We also define two sensor models, an omnidirectional sensor model and a sensor
model with an angular dependency, such that the observed stimuli intensity
decreases with misalignment. In this paper, we examine three combinations of
these models, and characterize emergent behaviors with respect to metrics over
the nearest-neighbor distances and kinetic energy of the collective. We examine
homogeneous collectives while varying three main design parameters:
– Influence, γ: scale of the reactive control law, updates heading and velocity
– Influence angle, θmax : angular extent of constrained sensors and/or stimuli
– Actuation noise: Gaussian noise on heading and velocity execution
The premise that collectives of simple, reactive agents exhibit robust behaviors
has been extensively validated in simulated and empirical work, from boid mod-
els of swarms [25] and particle-based optimization algorithms [7], to consensus
in biological collectives [28]. However, the effort to design robust and efficient
artificial collectives is still very much underway, with current systems falling far
short of the performance seen in biological collectives [6]. It is still difficult to
design for robust behavior, especially when behaviors are learned or evolved, or when environmental disturbances are present. Recent works in formalizing reactive plans
or controllers have focused on problems such as co-design of individual robotic
hardware and software with dynamic effects [36]; task-driven single-robot plan-
ning and design problems [22]; or top-down synthesis problems whose solutions
require significant sensing, computation, and communication between agents [3].
Biological systems, such as schooling fish, are a proof-of-concept that parsimo-
nious local strategies can lead to robust behavior, especially when combined
with embodied agent-environment interactions [19]. In this work we contribute
a dynamical system definition and characterization of the BV model as a step
toward more formal understanding of the dynamics and design of reactive col-
lectives.
The BV model has not often been a direct object of study by engineers,
apart from use as a demonstration in education [14] and graphics [33]. Similar
underlying themes can be found in the reactive multi-robot systems literature,
often focused on finding minimal requirements of a single behavior, while we aim
to generalize many behaviors in a single model. One fruitful line of work shows
that local tactile information, such as the frequency of robot-robot encounters, is
sufficient for tasks such as localization and density regulation in structured envi-
ronments [21]. In [11], the authors perform a parameter sweep over the control
space to find an optimal reactive controller for aggregation with a one-bit laser-
type occupancy sensor. Similarly, segregative behavior is achieved by a reactive
controller on a sensor reporting only whether another agent is nearby and if it is
the same or a different type [23]. Interestingly, neither of these systems [11, 23] includes the ability to sense whether nearby agents are to the left or right, as is
fundamental to the BV model. However, in both systems the optimal motion
pattern in the absence of stimulus is a circular arc, and the agents are able to
effectively search the space and aggregate when they detect nearby agents. This
“sweep-until-find” strategy seems to simulate the taxis-kinesis strategy of love
with a more minimalist active sensing strategy that avoids the need to compute
direction of taxis. However, these systems are not designed to switch modes to
other flocking or spatial coverage behaviors, as the BV model allows.
Another promising approach for safe emergent behavior design is the Gibbs
Random Fields (GRF) model, demonstrating segregative and flocking behaviors
[26] as well as static shape formation [27]. The GRF approach achieves stronger
guarantees on performance for specific behaviors, but assumes the ability to
estimate neighboring agent positions, velocities, and type (in the heterogeneous
swarm case). In this work, we focus our efforts on the BV model with onboard
260 J. A. Defay et al.
stimulus for homogeneous collectives and characterize the behavior space. The BV model has minimal sensing requirements and yet still demonstrates multiple robust, switchable behavior modes.
Researchers in control theory and applied mathematics have also shown for-
mal stability proofs for emergent collective motion. The Vicsek model of constant
speed agents in the plane with a nearest-neighbor alignment feedback rule is a
stable switched linear system [16, 32]; the Dubins car model has been shown to reliably rendezvous given a simple reactive steering control law [35]. Discrete
stochastic methods can give formal guarantees on performance of collectives of
minimal agents tasked with static shape formation, aggregation and dispersal
[4,20], and the strategies show promising applicability to continuous systems. So
while we have strong mathematical intuition that reactive swarms can perform
robust behaviors, there is a lack of understanding of dynamic system behav-
ior, especially considering different sensor models and sensor-actuator couplings.
One exception is [12], where the dynamics of schooling fish were investigated as a
function of both the perception angle and manoeuvrability of the fish, identifying
three possible stable states (schooling, swarming and milling). In a similar spirit,
we address the BV model as a starting point to unify these insights and advance
the understanding of minimalist, mode-switching autonomous collectives.
Here, we present a thorough characterization of the BV model as a first step
toward more formal understanding of the system’s stable states. A similar and
more developed effort is the swarmalator model [24]; the model leverages the
physics of oscillators to formally show five stable states in collective phase and
position, but is limited in its applicability to systems with directional sensing
constraints or the inability to sense neighboring phase. BVs have recently been
used as a testbed for embodied artificial intelligence research in developmen-
tal neurosimulation [1] and structured Bayesian techniques [31]. However, these
approaches generally lack a formal model or an accessible implementation. To
fill this gap, we implement an open-source BV model and swarm characteriza-
tion metrics, allowing for comparison between approaches to reactive collective
design.
considerations of morphology and collisions to future work, and assume that each
agent is small compared to the workspace. Time progresses in discrete stages,
where agents update their heading and speed at each time step as a memoryless,
reactive function of their observations in the previous time step. We omit the
time variable in our notation where operations occur within a single time step.
We implement directional and omnidirectional sensing and stimulus models,
inspired by common geometric sensing constraints for multi-robot systems, such
as those found with radio communications or LED-photodiode implementations.
Consider two agents (a leader and a follower ) with positions and orientations
xL , θL and xF , θF , respectively. Let θC be the global orientation of the vector
from xF to xL . Let α ∈ [−π, π] be the angular coordinate of the follower in the
leader’s reference frame. If the follower is to the right as viewed by the leader,
α is negative. Let β be the angular coordinate of the leader in the follower’s
reference frame (β = 0 when θC = θF ). If the leader is to the right as viewed by
the follower, β is negative.
In both stimulus models, intensity drops off proportional to the square of
the distance from the stimulus source, as would be the case with light-based
stimulus. If an observer is distance d from a stimulus source, the fraction of
stimulus intensity remaining after distance is taken into account is
Id = 1 − (d / dmax)²,    (1)
where dmax is the distance cutoff beyond which the stimulus is undetectable.
The directional source is modelled off a generic LED intensity profile, with
the stimulus pointing “behind” the agent and visible if the follower is within
the stimulus “cone” (α > π − θmax or α < −π + θmax ). We assume perceived
intensity does not depend on α within this angular range.
In the omnidirectional sensing case, the sensed area is the entire circle around
an agent defined by radius dmax .
In the directional sensing case, the field of view is limited by the angle constraint θmax such that |β| < θmax. In addition, the apparent intensity is scaled by a multiplicative factor Iβ, since photodiodes are often flat and thus reflect more incoming photons when misaligned. We use the relation
Iβ = 1 − (β / θmax)²    (2)
to model a smooth dropoff in sensor sensitivity that reaches zero at a mis-
alignment of β = θmax . See Fig. 1(B) for an illustration of the directional stimulus
intensity profile for θmax = π/3, θF = θL = π/2.
Inspired by Braitenberg’s two “forward-facing” sensors, we define a “wind-
shield” sensor model, splitting the field of view into the left and right half of the
“windshield,” centered on the agent’s orientation. Each agent then has two high
level sensors, each of which reports the total intensity of stimuli observed in that
half. We let yi,r and yi,l denote the right-hand and left-hand sensor readings for
agent i. To compute each of these sensor readings, we compute the sum of each
visible agent’s intensity (Imax = 1 for all agents in our simulations), scaled by
Id and Iβ as applicable.
Next, we define reactive, proportional control laws according to Braitenberg’s
schema, where we vary kinesis (k) to move faster or slower in response to stimuli,
and vary taxis (t) to turn toward or away from stimuli. For agent i, we control
the speed vi and the orientation θi according to
where yi,l and yi,r are the left and right sensor response magnitudes, respectively. For readability, we define Σi = yi,l + yi,r for the total stimulus observed by an agent, and δi = yi,l − yi,r for the difference between the two sensing regimes, with a positive δi indicating more stimulus on the left-hand side of the agent than on the right.
The basic reactive “behaviors” that we consider here are love, fear, aggression, and curiosity; each corresponds to a different simple control-law implementation for k and t. For example, in the love behavior, agents turn toward
stimuli and slow down as intensity increases, leading to reliable aggregations.
The discretized control laws for love are then
where γ is a scaling factor and design parameter we call the influence, gov-
erning how strongly the agents actuate in each time step. In our discrete model
and simulation, it combines the design parameters of gain in the control law and
the discretization of time.
Likewise, we define fear as the emergent behavior when the agents turn away
from stimuli and slow down in response; agents “fear” the stimuli and wish to avoid them. Aggression emerges when agents turn toward stimuli and speed
up (“attacking”), and curiosity emerges when agents turn away from stimuli and
speed up (on their way to find the next stimulus signal). The control laws are
summarized in Table 1.
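The discretized control laws themselves are not reproduced in this excerpt, so the following is one plausible reading of the taxis-kinesis schema in Table 1, using the sign conventions stated in the text. The variable names and the exact update form are our assumptions, not the paper's equations.

```python
# Sign conventions per Table 1: taxis t = +1 turns toward stimulus,
# kinesis k = +1 speeds up with stimulus.
BEHAVIORS = {
    "love":       {"t": +1, "k": -1},
    "fear":       {"t": -1, "k": -1},
    "aggression": {"t": +1, "k": +1},
    "curiosity":  {"t": -1, "k": +1},
}

def reactive_update(theta, y_l, y_r, behavior, gamma, v_max):
    """One proportional control step: the heading turns with the left/right
    difference, the speed scales with the total stimulus, and the influence
    gamma governs how strongly the agent actuates per time step."""
    b = BEHAVIORS[behavior]
    sigma = y_l + y_r          # total stimulus observed
    delta = y_l - y_r          # > 0 means more stimulus on the left
    theta_new = theta + gamma * b["t"] * delta
    if b["k"] > 0:             # speed up in response to stimulus
        v_new = min(v_max, gamma * sigma)
    else:                      # slow down as stimulus increases
        v_new = max(0.0, v_max - gamma * sigma)
    return theta_new, v_new
```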
We implement two independent sources of noise, on the actuation direction
and speed. During each timestep, the agent updates its control inputs (head-
ing and speed) according to the rules in Table 1. To simulate actuation noise,
an additive Gaussian noise term is applied to both of the control inputs, inde-
pendently for each agent, before the simulation takes a step. The default noise
has a mean of zero for both heading and speed, and a standard deviation of σθ = 2π/72 = 5° for the heading and σv = vmax/72 for the speed. In later sections, we
increase these standard deviations to study system response to actuation noise.
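A sketch of the actuation-noise step under the defaults stated above; the function name and the `noise_scale` knob are our additions for illustration.

```python
import math, random

def add_actuation_noise(theta_cmd, v_cmd, v_max, noise_scale=1.0, rng=random):
    """Apply additive zero-mean Gaussian noise to both control inputs,
    independently, before the simulation steps.  Defaults follow the text:
    sigma_theta = 2*pi/72 = 5 degrees, sigma_v = v_max/72."""
    sigma_theta = noise_scale * 2.0 * math.pi / 72.0
    sigma_v = noise_scale * v_max / 72.0
    return (theta_cmd + rng.gauss(0.0, sigma_theta),
            v_cmd + rng.gauss(0.0, sigma_v))
```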
Design Space of Braitenberg Vehicles 263
Table 1. The reaction rules for each of the four regimes of interest.
2.2 World
2.3 Metrics
We define two metrics for evaluating the emergent behavior in terms of spatial-
and velocity distribution.
The Nearest Neighbor Distance (NND) metric is defined as the population average of the distance from each agent to its closest neighbor, or

NNDpop = (1/N) · Σ_{i=1}^{N} min_{j≠i} d(ai, aj)    (7)

where d(ai, aj) is the minimum Euclidean distance along the surface of the
torus world between agents i and j.
Using an approximation based on uniformly distributed points in a unit square [15], we estimate NNDpop ≈ 60 in the case where 20 agents are distributed uniformly; however, this is an underestimate due to both the approximation and our torus geometry. In practice, we see a maximum NNDpop ≈ 75 in cases where the behavior is observed to create a more nearly uniform spatial distribution.
We also compute the population average for a kinetic energy metric, such that

KEpop = (1/N) Σ_{i=1}^{N} vi²    (8)

normalizing the mass of each agent to 1.
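The two metrics can be sketched directly from Eqs. (7) and (8), assuming a rectangular torus world; the width and height are parameters here because the world description (Sect. 2.2) is not reproduced in this excerpt.

```python
import math

def torus_distance(p, q, width, height):
    """Shortest Euclidean distance between two points on a wrap-around torus."""
    dx = abs(p[0] - q[0]); dx = min(dx, width - dx)
    dy = abs(p[1] - q[1]); dy = min(dy, height - dy)
    return math.hypot(dx, dy)

def nnd_pop(positions, width, height):
    """Population-average nearest-neighbor distance, Eq. (7)."""
    return sum(
        min(torus_distance(p, q, width, height)
            for j, q in enumerate(positions) if j != i)
        for i, p in enumerate(positions)
    ) / len(positions)

def ke_pop(speeds):
    """Population-average kinetic energy metric, Eq. (8), with unit mass."""
    return sum(v * v for v in speeds) / len(speeds)
```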
3 Emergent Behaviors
3.1 Behavior Characterization
For the models with directional sensing, we characterize each of the four emergent
behaviors, with respect to the nearest neighbor distance and energy metrics, by
performing parameter sweeps across the influence angle θmax and the influence
magnitude γ. For the influence angle, we vary θmax ∈ [π/8, π/2], and we vary the
influence scale in [0.25, 2.5], with each range discretized into ten evenly spaced
values. For each pair of angle/influence variables, we simulated 20 randomly
initialized collectives of 20 agents, collecting the metrics at each timestep after
a convergence period Tconv = 990 for a duration of Trecord = 10 timesteps, for
a total simulation runtime of 1000 timesteps. We then average each metric over
time during the converged behavior, and report the value on the heat maps in
Figs. 2, 3, 4, and 5. We also show a representative time-lapse of agent positions
for the two directional models over the last 100 frames of the simulation. Finally,
we show the system response to noise for all three models. For each model, we
fix γ = 1.5 and θmax = π/2, and compute the mean and standard deviation of
NND and KE over 20 iterations with Tconv = 990 and Trecord = 10, as in the
heat maps.
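The sweep protocol above can be sketched as follows. `run_collective` is a hypothetical stand-in for the simulator (its signature is our assumption), returning a per-timestep list of (NND, KE) pairs; the heat-map cell value averages the metrics over the post-convergence window.

```python
import itertools, math, statistics

def sweep(run_collective, n_seeds=20, t_conv=990, t_record=10, grid=10):
    """Sweep theta_max in [pi/8, pi/2] and influence gamma in [0.25, 2.5],
    each discretized into `grid` evenly spaced values, averaging the metrics
    over the converged window of each randomly initialized collective."""
    thetas = [math.pi / 8 + k * (math.pi / 2 - math.pi / 8) / (grid - 1) for k in range(grid)]
    gammas = [0.25 + k * (2.5 - 0.25) / (grid - 1) for k in range(grid)]
    heat = {}
    for theta_max, gamma in itertools.product(thetas, gammas):
        window = []                                   # metrics after convergence
        for seed in range(n_seeds):
            series = run_collective(theta_max, gamma, seed, t_conv + t_record)
            window.extend(series[t_conv:])            # last t_record steps only
        heat[(theta_max, gamma)] = (statistics.mean(m[0] for m in window),
                                    statistics.mean(m[1] for m in window))
    return heat
```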
but all behaviors are of technical interest for multi-modal artificial collectives.
In the following sections, we will first characterize the balanced behavior regime,
then identify under- and over-stimulated modes if they exist.
3.3 Love
“Loving” agents turn toward stimulus and slow down as local stimulus increases.
For artificial collectives, this behavior is primarily useful for aggregation into a
low-velocity collective, perhaps to await a new task or to establish a static net-
work for exchange of information. Figure 2 presents an overview of the emergent
behaviors of this taxis-kinesis combination.
Fig. 2. Characterization of love behavior (positive taxis, negative kinesis). (a) First
row: NND and kinetic energy profiles for the directional stimulus and sensing model,
varying the angle θmax identically for both stimulus and sensing, and the influence γ.
The right-most figure is a snapshot of the behavior for the last 100 frames of a random
simulation with γ = 1 and θ = π/2. Second row, (b) results for omni-directional
stimulus and directional sensing model, varying the angle θmax (now only affecting
angular sensing range) and the influence γ. Third row, (c) results for each sensor-stimuli
model of NND and kinetic energy, while varying noise scaling factor as a percent of
the maximum value π for rotational noise and vmax = 4px for linear actuation noise.
All data points have θmax = π/2 where applicable and γ = 1.5.
Overall this is one of the most consistent quadrants, with almost all parame-
ter choices leading to consistent aggregative behaviors. Aggregation is not a new
behavior in the decentralized reactive agent literature, and our results confirm
that minimal sensing and reactive strategies lead to stable, low kinetic energy,
low NND states regardless of noise. Of interest is that the three sensor/stimulus
models have notably different mean NND as the noise magnitude increases, but
the mean kinetic energy of all three implementations is unaffected by noise.
In the case of directional stimulus and sensing, the converged steady state has
higher NND because the agents cannot see nearby agents behind them, and thus
an aligned following behavior is stable in the torus environment. In the cases
with omnidirectional stimulus and/or sensing, the agents have sufficient mutual
visibility such that some agents will halt completely and form more traditional
aggregates.
3.4 Aggression
Aggressive agents turn toward stimuli and speed up. This behavior is useful for
clustering and flocking in spaces where agents wish to keep a high momentum for
quick dispersion after aggregation, or when agents must come together quickly
or follow stimulus sources closely. As seen in Fig. 3(a) and (b), the traditional
behavior of aggressive following is found to emerge for higher values of influence
and θmax , and the behavior is not as robust as love.
At lower levels of stimulation, the agents tend to become more uniformly dis-
tributed, and qualitatively their behavior consists of short sprints toward stim-
ulus before losing the signal and slowing down. Quantitatively we also observe a
lower average kinetic energy. As we increase the power of the sensing and stimuli
models, mean kinetic energy increases, but mean NND stays relatively constant.
We observe that for narrow θmax , agents form small slow clusters, but as θmax
and influence increase, the collective forms chains of agents following each other
around the torus. Agents are closely following each other, so mean NND stays
low while collective kinetic energy increases.
Noise has relatively little effect on the behavior of the system, apart from
preventing a deadlocked stable state at zero noise for directional stimulus and
sensing. Increasing noise also tends to increase the kinetic energy of the sys-
tem for directional constraints, increasing the chance of sighting another agent
through random walk coverage and re-orientations. Finally, we observe that the
standard deviation for kinetic energy is high in the directional cases, supporting
our observation of bi-modal velocity distributions with this taxis-kinesis combi-
nation.
3.5 Fear
Fearful agents turn away from stimuli and slow down, and the resulting emergent
behavior is a space-filling formation. This behavior lends itself well to applica-
tions that need automatic organization of a collective into a mostly static network
of agents monitoring the environment or passing messages.
We see from Fig. 4 that overall, the NND metric is consistently high with
this behavior regardless of sensor/stimulus model, nearly maximized in much of
the design space. As sensing and stimulus production become more powerful,
each agent is able to detect more stimulus around it, and becomes more “afraid”
and moves more slowly.
The under-stimulated case is apparent, dramatically so in the case of direc-
tional stimulus and sensing, where kinetic energy is consistently high across all
noise values. The system is in a stable state of agents detecting a low amount
of stimulus most of the time, and stopping and reorienting when they happen
to cross paths. The resulting behavior is reminiscent of run-and-tumble models
in active matter systems. The taxis-kinesis combination of fear is notably more
robust with respect to noise than the previous two behaviors.
Fig. 4. Characterization of fear behavior (negative taxis, negative kinesis). (a) First
row: NND and kinetic energy profiles for the directional stimulus and sensing model,
varying the angle θmax identically for both stimulus and sensing, and the influence γ.
The right-most figure is a snapshot of the behavior for the last 100 frames of a random
simulation with γ = 1 and θ = π/2. Second row, (b) results for omni-directional
stimulus and directional sensing model, varying the angle θmax (now only affecting
angular sensing range) and the influence γ. Third row, (c) results for each sensor-stimuli
model of NND and kinetic energy, while varying noise scaling factor as a percent of
the maximum value π for rotational noise and vmax = 4px for linear actuation noise.
All data points have θmax = π/2 where applicable and γ = 1.5.
3.6 Curiosity
Finally, in the curiosity mode, agents turn away from stimulus and speed up
when more stimulus is detected. Intuitively, they are “seeking” regions with
no stimulus in which to stop. The agents tend to disperse, similarly to fear, but
display a greater sensitivity to noise and system design parameters. The curiosity
behavior is applicable to the formation of space-filling static formations, as well
as dynamic space-filling trajectories that have a higher average energy, depending
on the sensing and stimulus model and design parameters. In Fig. 5, we see that
counterintuitively, more directional models are better at maintaining a uniform
spatial distribution. As θmax increases, agents are able to see more neighbors,
and a stable state occurs where agents align and form a “search party” style
formation, as seen in Fig. 5(a).
At high θmax in the omnidirectional stimulus and directed sensing case, the
system becomes overstimulated, as it also does in the fully omnidirectional case.
In this mode, agents cannot escape stimuli, and maintain a consistently high
speed. Interestingly, this behavior does not maximize the NND metric, indicating
some additional collective structure, especially at lower noise values.
and high kinetic energy modes for curiosity. We also analyzed for response to
noise, and found the emergent collective dynamics to be quite stable, giving hope
for continued formalization of multi-agent system design. It is also interesting to
note that noise often aids stability or can be used to tune the emergent behavior.
We also identify certain “deadlock” states, such as aggression with directional
stimulus and sensing; when agents cannot adequately detect others, the system
can become “stuck” in a motionless state without noise.
We identified that constraints on angular range of sensing and stimulus have
a key effect on behavior in much of the design space, supporting findings in other
works on embodied system dynamics [12]. Other interesting design parameters
that we have not considered here include agent density in the space, environment
geometries and strategies for agent-environment interactions, heterogeneous col-
lectives, and the influence of controllable or unreliable agents. In the supplemen-
tal video, available at https://youtu.be/9ngfH0vkOfc, we show a demonstration
of how a single controllable agent can be used along with global mode-switching
to guide a swarm through narrow passageways and explore a multi-room envi-
ronment.
Many of these behaviors that we have shown are amenable to more formal
analysis. We also aim to use this platform for comparison of reactive, stateful,
and learned strategies under minimal sensing, as well as develop an embodied
representation of the agents that allows for design of agent-agent and agent-
environment interactions. The Braitenberg Vehicle model’s strength lies in lever-
aging the ability to switch a collective between several useful modes with minimal
requirements in hardware and software.
Acknowledgments. This project was funded by the Packard Fellowship for Science
and Engineering, GETTYLABS, and the National Science Foundation (NSF) Grant
#2042411.
References
1. Alicea, B., Dvoretskii, S., Felder, S., Gong, Z., Gupta, A., Parent, J.: Developmental
embodied agents as meta-brain models. In: DevoNN Workshop (2020)
2. Braitenberg, V.: Vehicles: Experiments in Synthetic Psychology. MIT Press, Cam-
bridge (1986)
3. Chen, J., Sun, R., Kress-Gazit, H.: Distributed control of robotic swarms from
reactive high-level specifications. In: 2021 IEEE 17th International Conference on
Automation Science and Engineering (CASE), pp. 1247–1254. IEEE (2021)
4. Coppola, M., Guo, J., Gill, E., de Croon, G.C.: Provable self-organizing pattern
formation by a swarm of robots with limited knowledge. Swarm Intell. 13(1), 59–94
(2019)
5. Ding, T., et al.: Light-induced actuating nanotransducers. Proc. Natl. Acad. Sci.
113(20), 5503–5507 (2016)
6. Dorigo, M., Theraulaz, G., Trianni, V.: Swarm robotics: past, present, and future
[point of view]. Proc. IEEE 109(7), 1152–1165 (2021)
7. Eberhart, R.C., Shi, Y., Kennedy, J.: Swarm Intelligence. Elsevier, Amsterdam
(2001)
8. Frank, S., Kuijper, A.: Privacy by design: survey on capacitive proximity sensing
as system of choice for driver vehicle interfaces. In: Computer Science in Cars
Symposium, pp. 1–9 (2020)
9. Gardi, G., Ceron, S., Wang, W., Petersen, K., Sitti, M.: Microrobot collectives
with reconfigurable morphologies, behaviors, and functions. Nat. Commun. 13(1),
1–14 (2022)
10. Garnier, S., et al.: The embodiment of cockroach aggregation behavior in a group
of micro-robots. Artif. Life 14(4), 387–408 (2008)
11. Gauci, M., Chen, J., Li, W., Dodd, T.J., Groß, R.: Self-organized aggregation
without computation. Int. J. Robot. Res. 33(8), 1145–1161 (2014)
12. Gautrais, J., Jost, C., Theraulaz, G.: Key behavioural factors in a self-organised
fish school model. In: Annales Zoologici Fennici, vol. 45, pp. 415–428. BioOne
(2008)
13. Hauert, S.: Swarm engineering across scales: from robots to nanomedicine. In:
ECAL 2017, The Fourteenth European Conference on Artificial Life, pp. 11–12.
MIT Press (2017)
14. Hogg, D.W., Martin, F., Resnick, M.: Braitenberg Creatures. Epistemology and
Learning Group, MIT Media Laboratory Cambridge (1991)
15. achille hui (https://math.stackexchange.com/users/59379/achille hui): Average distance between n randomly distributed points on a square with their nearest neighbors. Mathematics Stack Exchange. https://math.stackexchange.com/q/2565546. Accessed 11 Nov 2018
16. Jadbabaie, A., Lin, J., Morse, A.S.: Coordination of groups of mobile autonomous
agents using nearest neighbor rules. IEEE Trans. Autom. Control 48(6), 988–1001
(2003)
17. Lanzisera, S., Zats, D., Pister, K.S.: Radio frequency time-of-flight distance mea-
surement for low-cost wireless sensor localization. IEEE Sens. J. 11(3), 837–845
(2011)
18. LaViers, A., et al.: Choreographic and somatic approaches for the development of
expressive robotic systems. In: Arts, vol. 7, p. 11. MDPI (2018)
19. Lei, L., Escobedo, R., Sire, C., Theraulaz, G.: Computational and robotic modeling
reveal parsimonious combinations of interactions between individuals in schooling
fish. PLoS Comput. Biol. 16(3), e1007194 (2020)
20. Li, S., et al.: Programming active cohesive granular matter with mechanically
induced phase changes. Sci. Adv. 7(17), eabe8494 (2021)
21. Mayya, S.: Local encounters in robot swarms: from localization to density regula-
tion. Ph.D. thesis, Georgia Institute of Technology (2019)
22. McFassel, G., Shell, D.A.: Reactivity and statefulness: action-based sensors, plans,
and necessary state. Int. J. Robot. Res., 02783649221078874 (2021)
23. Mitrano, P., Burklund, J., Giancola, M., Pinciroli, C.: A minimalistic approach to
segregation in robot swarms. In: 2019 International Symposium on Multi-Robot
and Multi-Agent Systems (MRS), pp. 105–111. IEEE (2019)
24. O’Keeffe, K.P., Hong, H., Strogatz, S.H.: Oscillators that sync and swarm. Nat.
Commun. 8(1), 1–13 (2017)
25. Reynolds, C.W.: Flocks, herds and schools: a distributed behavioral model. In:
Proceedings of the 14th Annual Conference on Computer Graphics and Interactive
Techniques, pp. 25–34 (1987)
26. Rezeck, P., Assunção, R.M., Chaimowicz, L.: Flocking-segregative swarming
behaviors using Gibbs random fields. In: 2021 IEEE International Conference on
Robotics and Automation (ICRA), pp. 8757–8763. IEEE (2021)
27. Rezeck, P., Chaimowicz, L.: Chemistry-inspired pattern formation with robotic
swarms. arXiv preprint: arXiv:2206.03388 (2022)
28. Rosenthal, S.B., Twomey, C.R., Hartnett, A.T., Wu, H.S., Couzin, I.D.: Revealing
the hidden networks of interaction in mobile animal groups allows prediction of
complex behavioral contagion. Proc. Natl. Acad. Sci. 112(15), 4690–4695 (2015)
29. Rueben, M., et al.: Themes and research directions in privacy-sensitive robotics.
In: 2018 IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO),
pp. 77–84. IEEE (2018)
30. Shahrokhi, S., Lin, L., Ertel, C., Wan, M., Becker, A.T.: Steering a swarm of
particles using global inputs and swarm statistics. IEEE Trans. Rob. 34(1), 207–
219 (2017)
31. Thornton, C.: Predictive processing simplified: the infotropic machine. Brain Cogn.
112, 13–24 (2017)
32. Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I., Shochet, O.: Novel type of phase
transition in a system of self-driven particles. Phys. Rev. Lett. 75(6), 1226 (1995)
33. Wiseman, J.: Braitenberg vehicles (1999). http://people.cs.uchicago.edu/wiseman/vehicles/
34. Yang, J.F., et al.: Memristor circuits for colloidal robotics: temporal access to
memory, sensing, and actuation. Adv. Intell. Syst. 4(4), 2100205 (2022)
35. Yu, J., LaValle, S.M., Liberzon, D.: Rendezvous without coordinates. IEEE Trans.
Autom. Control 57(2), 421–434 (2011)
36. Zardini, G., Censi, A., Frazzoli, E.: Co-design of autonomous systems: from hard-
ware selection to control synthesis. In: 2021 European Control Conference (ECC),
pp. 682–689. IEEE (2021)
Decision-Making Among Bounded
Rational Agents
Abstract. When robots share the same workspace with other intelli-
gent agents (e.g., other robots or humans), they must be able to reason
about the behaviors of their neighboring agents while accomplishing the
designated tasks. In practice, frequently, agents do not exhibit abso-
lutely rational behavior due to their limited computational resources.
Thus, predicting the optimal agent behaviors is undesirable (because it
demands prohibitive computational resources) and undesirable (because
the prediction may be wrong). Motivated by this observation, we remove
the assumption of perfectly rational agents and propose incorporating
the concept of bounded rationality from an information-theoretic view
into the game-theoretic framework. This allows the robots to reason about other agents’ sub-optimal behaviors and act accordingly under their computa-
tional constraints. Specifically, bounded rationality directly models the
agent’s information processing ability, which is represented as the KL-
divergence between nominal and optimized stochastic policies, and the
solution to the bounded-optimal policy can be obtained by an efficient
importance sampling approach. Using both simulated and real-world
experiments in multi-robot navigation tasks, we demonstrate that the
resulting framework allows the robots to reason about different levels
of rational behaviors of other agents and compute a reasonable strategy under their computational constraints. (A preliminary version of this work appeared as a poster at the 2021 NeurIPS Workshop on Learning and
Decision-Making with Strategic Feedback.
The video of the real-world experiments can be found at https://
youtu.be/hzCitSSuWiI.
We gratefully acknowledge the support of NSF grants No. 2006886
and 2047169.)
1 Introduction
robot’s motion trajectory depends on and affects the trajectories of others. Thus
they need to anticipate how others will respond to their decisions. The game-
theoretic framework provides an appealing model choice to describe this com-
plex decision-making problem among agents [16] and has been applied to various
robotics applications, e.g., drone racing [24] and swarm coordination [1]. In an
ideal situation, where all the agents are perfectly rational (i.e., they have unlim-
ited computational resources), they can select the motion trajectories that reach
the Nash equilibrium (if one exists). However, since these trajectories live in a contin-
uous space, agents need to evaluate infinitely many trajectories and the interac-
tion among them, which is intractable. As a remedy, most of the works consider
constraining the search space of the multi-robot problem via sampling [25] or
locally perturbing the solution [24] to find a good trajectory within a reasonable
amount of time.
Most game-theoretic planners mentioned above do not explicitly consider the
agents’ computational limitations in their modeling process, i.e., they assume
each agent is perfectly rational. They consider these limitations only externally,
e.g., by truncating the number of iterations during optimization. In contrast, we
propose a novel and more principled treatment for modeling agents with limited
computational resources by directly modeling the agents being only bounded-
rational [6] in the game-theoretic framework. Bounded Rationality (BR) has
been developed in economics [22] and cognitive science [7] to describe behav-
iors of humans (or other intelligent agents), who have limited computational
resources and partial information about the world but still need to make deci-
sions from an enormous number of choices, and has been applied to analyze the
robustness of controllers in the single-agent setting [17]. In this work, we use
the information-theoretic view of BR [6], which states that the agent optimizes
its strategy under an information-theoretic constraint (e.g., KL-divergence),
describing the cost of transforming its a priori strategy into an optimized one.
This problem can be solved efficiently by evaluating a finite number of trajectory
choices from its prior trajectory distribution. Since BR explicitly considers the
computational constraints during the modeling process, the resulting solution
provides an explainable way to trade-off computational efficiency and perfor-
mance. Furthermore, by incorporating BR into the game-theoretic framework,
robots can naturally reason about other agents’ sub-optimal behaviors and use
these predictions to take advantage of (or avoid) other agents with lower (or higher)
computational resources.
2 Related Work
3 Problem Formulation
In this section, we define multi-agent decision-making using the formulation of
Multi-Agent Markov Decision Processes (MMDPs) or Markov Games (MGs) [11],
and provide the resulting Nash Equilibrium (NE) solution concept under the
perfect rationality assumption. In the subsequent sections, we show how this
assumption can be relaxed using the BR concept and derive a sampling-based
method to find a Nash Equilibrium strategy profile under bounded rationality.
U^i(S_t, Π_t^{−i,∗}, π_t^{i,∗}) ≥ U^i(S_t, Π_t^{−i,∗}, π_t^i), ∀i ∈ {1, ..., N}, for any π_t^i,    (2)
Intuitively, if the agents satisfy the NE, no agent can improve its utility by
unilaterally changing its strategy. To compute NE, we can apply a numerical
procedure called Iterative Best Response (IBR) [19]. Starting from an initial
guess of the strategy profiles of all the agents, we update each agent’s strategy
to the best response to the current strategies of all other agents. The above
procedure is applied iteratively for each agent until the strategy profile does
not change. If every agent is perfectly rational, the robot can use the NE strategy profile to predict other agents’ behaviors and act accordingly. However, there is a gap between this perfectly rational assumption and the real world, as
most existing methods can only search the strategy profile in a neighborhood of
the initial guess [23,24] due to computational limits. In the following section, we
fill this gap by explicitly formulating this bounded rational solution.
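The IBR procedure above can be illustrated on a toy game with finite action sets. The paper's game is over continuous trajectory spaces, so the `payoffs` interface below is purely illustrative.

```python
def iterative_best_response(payoffs, n_actions, max_iters=100):
    """Iterative Best Response for an N-player game with finite action sets.

    `payoffs(i, profile)` returns player i's utility for the joint action
    profile.  Starting from an initial guess, each player's action is updated
    to the best response to the others' current actions; a fixed point is a
    (pure-strategy) Nash equilibrium."""
    profile = [0] * len(n_actions)                 # initial guess
    for _ in range(max_iters):
        changed = False
        for i, k in enumerate(n_actions):
            best = max(range(k),
                       key=lambda a: payoffs(i, profile[:i] + [a] + profile[i + 1:]))
            if best != profile[i]:
                profile[i] = best
                changed = True
        if not changed:                            # no player wants to deviate
            return profile
    return profile
```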
4 Methodology
This section first provides details on the formulation of bounded rationality and
integrates it into the game-theoretic framework. Then, we propose a sampling-
based approach to generate bounded-rational stochastic policies.
where ψ^i(a^i | s) ∝ q^i(a^i) e^{β·U^i(s,a)}. Since the KL-divergence is non-negative, the maximum of −(1/β) KL(π^i ‖ ψ^i) is obtained only when KL(π^i ‖ ψ^i) = 0, which means π^i = ψ^i. Therefore, the optimal action sequence distribution of agent i (while keeping other agents’ strategies fixed) under the bounded rationality constraint is

π^{i,∗}(a^i | S, Π^{−i}) = (1/Z) q^i(a^i) e^{β·U^i(S,Π)},    (5)

where Z = ∫ q^i(a^i) e^{β·U^i(S,Π)} da^i is a normalization constant. This bounded-optimal strategy provides an intuitive and explainable way to trade off between
computation and performance. When β increases from 0 to infinity, the agent
becomes more rational and requires more time to compute the optimal behavior.
When βi = 0, the bounded-rational policy becomes the prior, meaning agent i
has no computational resources to leverage. On the other hand, when βi →
∞, the agent becomes entirely rational and selects the optimal action sequence
deterministically. The rationality parameter β allows us to model agents with
different amounts of computational resources.
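The bounded-optimal policy of Eq. (5) can be approximated with a simple self-normalized importance-sampling step: draw candidate action sequences from the prior q^i and select one with probability proportional to exp(β·U). The names below are illustrative, not from the authors' implementation.

```python
import math, random

def bounded_rational_choice(sample_prior, utility, beta, n_samples=1000, rng=random):
    """Sample one action sequence from the bounded-optimal policy of Eq. (5).

    Candidates are drawn from the prior, then resampled with weights
    proportional to exp(beta * U); beta = 0 reduces to the prior, and large
    beta approaches the deterministic (fully rational) maximizer."""
    candidates = [sample_prior(rng) for _ in range(n_samples)]
    utils = [utility(a) for a in candidates]
    m = max(utils)                                   # stabilize the exponent
    weights = [math.exp(beta * (u - m)) for u in utils]
    total = sum(weights)
    r = rng.random() * total                         # roulette-wheel resampling
    acc = 0.0
    for a, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            return a
    return candidates[-1]
```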
278 J. Xu et al.
Similar to the rational case, our goal is to find the Nash Equilibrium strategy
profile for a group of bounded-rational agents whose bounded-optimal policies
are defined in Eq. (5). This can be done using the IBR procedure analogous to
Sect. 3. Instead of optimizing the policy in Eq. (1), each bounded-rational agent
finds the optimal strategy distribution defined in Eq. (5) while keeping other
agents’ strategies fixed. This procedure is carried out for each agent for a fixed
number of iterations or until no agent’s stochastic policy changes.
5 Simulated Experiments
We conduct extensive simulation experiments to demonstrate that integrating
bounded rationality with the game theory framework (1) allows each agent to
reason about other agents’ rationality levels to exploit (or avoid) others with lower (or higher) computational capabilities and (2) explicitly trades off between the
performance and computation by varying the rationality level β and the number
of sampled trajectories. We also show qualitatively that our method can generate
group behaviors that approximately reach a bounded rational Nash Equilibrium
strategy profile for a varying number of agents.
5.2 Results
We first show that using the proposed framework agents can naturally reason
about the behaviors of others with the same physical capabilities but different
Fig. 1. (a) Comparison of the performance by increasing the ego’s rationality level from
β = 0.01 to β = 0.13 while keeping other agents’ rationality levels fixed in six (solid
lines) and eight (dashed lines) agent environments. The x-axis and y-axis indicate the
ego’s β values and traveled distances. The green lines are the average travel distance
of other agents and the red lines indicate the ego’s travel distance. (b)(c) Show the
agents’ trajectories in six-agent environment with β = 0.01 and β = 0.13, respectively.
The ego’s trajectories are depicted in red.
rationality levels β. Since we want to examine how varying β affects the perfor-
mance, we need to ensure that the policy converges to the “optimal” one under
the computational constraints. Thus, we sample a large number of trajectories,
5 × 10⁵, for policy computation for each agent. In this simulation, we fix other
agents’ β = 0.05 and vary one agent’s (called ego agent) β from 0.01 to 0.13 and
compare the travel distances between the robot and other agents (the distance of
other agents is averaged). Figure 1(a) shows the performance comparison in six
and eight agent environments. In general, when the ego has a lower rationality
level β than other agents, it avoids them by taking a longer path. When all the
agents have the same β, the ego and other agents have a similar performance.
As β increases, the ego begins to exploit its advantage in computing more ratio-
nal decisions and taking shorter paths. We also notice that the ego generally
performs better with a large β when there are more agents in the scene. The
trajectories of the six-agent environment for β = 0.01 and β = 0.13 are plotted
in Fig. 1(b) and Fig. 1(c), respectively. When the ego’s β = 0.01 (in the red tra-
jectory), it takes a detour by elevating to a higher altitude to avoid other agents.
In contrast, when its β = 0.13, it pushes other agents aside and takes an almost
straight path. These results are aligned with Fig. 1(a). We omit the trajecto-
ries of eight agent environments to avoid clutter. The readers are encouraged to
watch the videos at https://youtu.be/hzCitSSuWiI.
Next, we evaluate the performance of a group of agents with the same β
to show that the trade-off between the performance and computation can be
directly controlled by the rationality level. Since most of the computation occurs
when evaluating a large number of sampled trajectories in the proposed impor-
tance sampling-based method, we can use the number of evaluated trajectories as
a proxy to measure the amount of computation an agent possesses. In Fig. 2(c),
we analyze the relationship between β and the computation constraint in the
six-agent environment. We can observe that when the computation is limited,
Fig. 2. (a) Compares the performance (average traveled distance of the group) of dif-
ferent β values using a different number of trajectories in the six-agent environment.
The x and y axes are the number of trajectories and the average traveled distance
of the group. (b) Shows the number of trajectories required to converge for different β
in four, six, and eight agent environments.
Fig. 3. Minimum distances of each agent to other agents at every timestep in (a)
four-agent (b) six-agent (c) ten-agent environments. The x-axis and y-axis are the
timesteps and agents’ minimum distances, respectively. The colored solid lines repre-
sent the statistics of each agent, which is the same as their trajectory color in Fig. 4.
The dashed grey line shows the minimum safety distance (0.25m) that needs to be
maintained to avoid collisions.
i.e., only a small number of action sequences can be evaluated, a larger β (more
rational) actually hurts the performance. When the more rational agents have
the resources to evaluate more trajectories, they travel less distance on average
than the less rational groups. This result demonstrates that, by controlling the
rationality parameter, the bounded rationality framework can effectively trade off
between optimality and limited computation. In Fig. 2(b), we also evaluate the
number of trajectories that must be sampled for the method to converge at
different rationality levels in the four-, six-, and eight-agent environments. The result
aligns with the previous observation: in general, when β is larger, more trajectories
need to be evaluated to converge to the "optimal" policy under the bounded
rationality constraints. Furthermore, when more agents are present in the environment,
the method requires more trajectories to converge.
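The β-controlled trade-off above can be sketched as a soft-max weighting over the costs of a budget-limited set of sampled trajectories. This is a minimal illustration, not the paper's full importance-sampling method; the cost values and the budget of 50 trajectories are arbitrary assumptions.

```python
import numpy as np

def action_distribution(costs, beta):
    """Bounded-rational soft-max over sampled trajectory costs:
    beta = 0 recovers the uniform sampling prior, while a large
    beta concentrates on the minimum-cost trajectory."""
    logits = -beta * np.asarray(costs, dtype=float)
    logits -= logits.max()                     # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

rng = np.random.default_rng(0)
costs = rng.uniform(0.0, 10.0, size=50)        # 50 trajectories = computation budget
p_low_beta = action_distribution(costs, beta=0.01)    # nearly uniform choice
p_high_beta = action_distribution(costs, beta=100.0)  # nearly greedy choice
```

Under this sketch, a small β agent barely distinguishes among the evaluated trajectories, while a large β agent needs enough samples for the soft-max to find a genuinely low-cost one, mirroring the convergence behavior in Fig. 2(b).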
282 J. Xu et al.
Fig. 4. Agents’ trajectories in environments with four, six, and ten agents. The three
columns show the snapshots at t = 20, 60, 80, respectively. Each agent, its trajectory,
and its goal are assigned the same unique color. We only show the last 10 steps of
trajectories to avoid clutter.
6 Physical Experiments
We use the Crazyflie 2.1 nano-drones under a motion capture system to validate
our method in the real world. For this hardware experiment, we consider two
types of tasks with a varying number of agents. The first task is to navigate
a group of drones to a designated goal region while avoiding static obstacles
and inter-drone collisions. The second task is position swapping similar to the
previous section. The size of the workspace considered in all the experiments
is 4.2 m × 5.4 m × 2 m in the x, y, and z axes, and the size of the drone is
Fig. 5. (a) The experimental setup with 4 Crazyflie drones. The green arrow points
to the goal position; all agents navigate to the same goal point. (b) Snapshot
of the experiment, showing the path followed by the agents to avoid the obstacles.
Fig. 6. The experimental setup with 6 Crazyflie drones, divided into two groups
(red and green). (a) The initial positions of the drones; the task is to swap
positions, and black lines show the assigned swapping tasks among agents. (b)
Snapshot of the experiment during swapping, showing the paths followed by the agents
to avoid collisions with each other.
of the drone (0.5m) in the z-axis. This event is captured in the snapshot shown
in Fig. 5(b). Figure 6 shows the position swapping scenario. We use the same
dynamics model as the previous section to generate the trajectories. We observe
that the outcomes of the physical experiments are consistent with the results
obtained in the simulation.
7 Conclusion
This paper considers the problem of making sequential decisions for agents with
finite computational resources, where they need to interact with each other
to complete their designated tasks. This problem is challenging because each
agent needs to evaluate an infinite number of candidate decisions (e.g., waypoints or
actuator commands) and reason about how others will respond to its behavior. While the
game-theoretic formulation provides an elegant way to describe this problem,
it is based on the unrealistic assumption that agents are perfectly rational and
able to evaluate the large decision space. We propose a formulation
that replaces this rationality assumption with the bounded rationality concept and
present a sampling-based approach to computing agents' policies under their
computational constraints. As shown in the experiments, by removing the perfect
rationality assumption, the proposed formulation allows agents to take
advantage of those with less computational power or to avoid those that are more
computationally capable. Additionally, when all agents are similarly capable
computationally, they exhibit behaviors that avoid being taken advantage of by
others.
Distributed Multi-robot Information
Gathering Using Path-Based Sensors
in Entropy-Weighted Voronoi Regions
1 Introduction
Fig. 1. Information gathering with two robots where the search space is decomposed
into two entropy-weighted Voronoi partitions. The dotted lines represent the paths that
the agents traverse in the three search rounds. Robot 1 efficiently traverses the path
of round 1, reporting a potential target at the location shown in green. However, it
is destroyed in rounds 2 and 3, and thus we only know that a hazard exists along
each of these two paths. The overlapping areas of rounds 2 and 3 for robot 1 have a higher
probability of hazard existence (darker gray). Robot 2 is destroyed in round 1 and
survives rounds 2 and 3. Since robot 2 reports a target at the same location in rounds 2
and 3, the probability of target existence is higher (darker green).
2 Problem Formulation
Consider a spatial environment divided into a × b discrete cells composing
the search space S ⊂ R^2. A team of M agents is deployed in the environment,
where each agent is denoted by i = 1, . . . , M. The search space includes n base
stations D = {d_1, d_2, . . . , d_n} ⊂ S, where n ≤ M. Note that in this paper we
consider the case of n = M. Each agent i is located at its corresponding base
station d_i to explore a specific bounded region S_i ⊂ S. The union of the regions of
exploration constitutes the search space, ∪_{i=1}^{M} S_i = S, with no overlap,
S_i ∩ S_j = ∅ for all i ≠ j, where i, j ∈ {1, 2, . . . , M}. Each agent i can visit up to l_i cells c in its
search space, forming the path ζ_i = (c_1, c_2, . . . , c_{l_i}) ⊆ S_i, such that the exploration
starts and ends at the agent's corresponding base station d_i, i.e., c_1 = c_{l_i} = d_i.
When all the surviving agents reach their respective base stations at the end of
their exploration, we terminate one search round r ∈ Z_{>0}. We define the time
taken by an agent i to move from cell c_j to c_{j+1} as one timestep t. Thus, the
duration of a search round can be defined as t_r := max_{i ∈ {1,...,M}} l_i t.
The search space includes hazardous elements Z = {0, 1} that may destroy
the agent, where Z = 1 indicates the presence of a hazard, while Z = 0 denotes
the absence of a hazard in a particular cell. If the agent i is destroyed anywhere
along the path ζi then the path-based sensor (Definition 1) is triggered (Θ = 1),
and if the agent i survives the path ζi then we consider that the path-based
sensor is not triggered (Θ = 0). We assume that the path-based sensor may
report false positive and false negative triggerings. A false positive accounts for
faulty or malfunctioning robots that get destroyed regardless of the presence of
a hazard, whereas a false negative accounts for the chance of a robot surviving
a cell despite it containing a hazard. The search space also includes some elements of
interest, hereafter referred to as targets, X = {0, 1}, where X = 1 indicates the
presence of a target, while X = 0 denotes the absence of a target in a particular
cell. The presence of a target is recorded by a noisy sensor that may also report
a false positive or a false negative observation of the target. The sensor used for
the detection of targets is not a path-based sensor. In other words, if a robot
survives a path, the target sensor reports the exact whereabouts of the target
observation along the survived path.
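The path-based sensor semantics above can be summarized in a short simulation sketch. The false-positive and false-negative rates below are illustrative placeholders, not values from the paper.

```python
import random

def path_sensor_reading(path, hazard_cells, p_fp=0.05, p_fn=0.1, rng=random):
    """Binary path-based sensor: Theta = 1 iff the robot is destroyed
    somewhere along the path. A false positive models a faulty robot
    destroyed without any hazard; a false negative models the robot
    surviving a hazardous cell."""
    destroyed = any(c in hazard_cells and rng.random() > p_fn for c in path)
    faulty = rng.random() < p_fp
    return 1 if (destroyed or faulty) else 0

rng = random.Random(0)
# Path through three cells, with a hazard in the middle one.
theta = path_sensor_reading([(0, 0), (1, 0), (2, 0)], {(1, 0)}, rng=rng)
```

Note that, unlike the target sensor, this reading carries no information about *where* along the path the triggering occurred, which is exactly the ambiguity the partitioning strategy of this paper addresses.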
290 A. K. Srivastava et al.
Remark 1. The use of the centralized server to partition the environment and
update shared belief maps is convenient, but not required. If a completely decen-
tralized and distributed algorithm is desired, then the use of the centralized
server can be eliminated, assuming that base stations are able to synchronize
their data after each search round. Each base station can perform the same
(deterministic) weighted Voronoi partitioning algorithm in parallel such that all
base stations achieve the same partitioning.
To address the ambiguous information obtained from a path-based sensor, [1] proposed
integrating over all potential locations at which a path-based sensor could have been triggered.
However, this increases the computational complexity of optimal path solutions
beyond short path lengths. Therefore, we propose a distributed strategy with
multiple agents that partitions the environment in a way that maximizes the
expected information gain over the entire search space S without task duplication.
The structure of the proposed methodology is presented in Fig. 2.
Multi-robot Information Gathering, Path-Based Sensors, Voronoi Regions 291
Fig. 2. The block diagram illustrates a decoupled two-step scheme of distributed local
planning with centralized partitioning. In the leftmost block, the base stations d_i
communicate their belief maps X_i^{(r)}, Z_i^{(r)} to the central server σ. Next, the central
server partitions the environment S into sub-search spaces S_i and then transmits this
information to each base station d_i. The sub-search spaces S_i are used locally for
information-theoretic planning.
S_i = { p_i ∈ S | w(X_{p_i}, X_{S_j}) g(p_i, p_{d_j}) ≤ w(X_{p_i}, X_{S_k}) g(p_i, p_{d_k}), ∀ j ≠ k }. (3)
Note that for the initial decomposition of the environment with no prior information,
the entropy-weighted Voronoi partitioning (3) is identical to the geodesic
Voronoi partitioning (1), i.e., w(X_{p_i}, X_{S_j}) = w(X_{p_i}, X_{S_k}) for all j ≠ k.
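A discrete sketch of the entropy-weighted assignment in (3) follows. It uses Euclidean distance in place of the geodesic g(·,·) and a single scalar weight per region; both are simplifying assumptions of this sketch, not the paper's exact formulation.

```python
import numpy as np

def bernoulli_entropy(p):
    """Entropy (nats) of a Bernoulli belief cell, used to weight regions."""
    p = np.clip(p, 1e-9, 1.0 - 1e-9)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def entropy_weighted_partition(cells, stations, weights):
    """Assign each cell to the station minimizing weight * distance;
    uniform weights reduce this to the ordinary Voronoi partition."""
    dist = np.linalg.norm(cells[:, None, :] - stations[None, :, :], axis=2)
    return np.argmin(weights[None, :] * dist, axis=1)

cells = np.array([[x, y] for x in range(10) for y in range(10)], float)
stations = np.array([[1.0, 5.0], [8.0, 5.0]])
uniform = entropy_weighted_partition(cells, stations, np.ones(2))
```

Lowering a region's weight (e.g., because its residual entropy is small) enlarges the area assigned to its station, which matches the intuition of steering agents toward more uncertain regions.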
now assigned to find a path ζ_i^* in its corresponding search space S_i that
maximizes the expected information gained about targets X and hazards Z, given
the observed target sensor measurements Y and path-based sensor measurements
Θ_{ζ_i} along a path ζ_i at exploration round r,

ζ_i^{(r)*} = arg max_{ζ_i ∈ Ω_i^{(r)}} I(X_i^{(r−1)}; Y_i^{(r)} | Θ_{ζ_i}^{(r)}, S_i^{(r)}) + I(Z_i^{(r−1)}; Θ_{ζ_i}^{(r)} | S_i^{(r)}), (4)

where Ω_i^{(r)} is the space of all possible paths of agent i in its search space S_i^{(r)}, Y_i^{(r)} =
[Y_{i,1}, . . . , Y_{i,l}] ∈ R^l denotes all l observations collected at search round r by
agent i, and Θ_{ζ_i}^{(r)} is the observation of agent i about the path-based sensor
triggering. This local path planning (4) is addressed using the methodology in [1].
Fig. 3. The block diagram illustrates distributed local planning with a centralized
sequential Bayesian filter (SBF). In the leftmost block, all base stations d_i communicate
their local observations Y_i^{(r)}, Θ_{ζ_i}^{(r)} to the central server σ. Next, the central server
updates the belief map sequentially, where the posterior belief map of agent i is used
as the prior for the i + 1 update. The posterior belief map is broadcast to the base
stations, which use it for local information-theoretic planning.
indicates action from the central server σ, and blue indicates local action from a base
station d_i. After the environment has been partitioned successfully at the central
entity, each agent i plans a path using the calculatePath algorithm introduced in
[1]. DEVPP terminates after a number of rounds r_max.
Unlike DEVPP, this approach does not perform any partitioning of the search
space. Thus, each agent i explores the entire search space S and plans its path
based on the prior belief of hazards and targets in the search space S at every
search round r. In other words, the planning is global in the entire search space
S and not local as in Sect. 3.1. After a search round, each agent i transmits the
target observations Y_i^{(r)} and the path-based sensor measurement Θ_{ζ_i}^{(r)} to its base
station d_i, which subsequently uploads this information to the central server σ.
The central server then receives the observations from all base stations, {Θ_{ζ_i}^{(r)}}_{i=1}^M
and {Y_i^{(r)}}_{i=1}^M, and updates the belief maps using sequential Bayesian filtering (SBF).
Next, the central server broadcasts the posterior belief maps to the base stations.
Lastly, each base station d_i computes an information-theoretic path ζ_i^{(r+1)*} in
the search space S, and the robots are assigned to traverse the paths and collect
measurements. The proposed MA-DITP method is presented in Algorithm 2,
where brown color indicates action from the central entity, and blue color indicates local
action from a base station d_i. The block diagram is illustrated in Fig. 3.
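The centralized SBF step can be sketched as a per-cell binary Bayes update applied sequentially over the agents' reports, each posterior becoming the next prior. The detection and false-alarm rates below are assumed values, not taken from the paper.

```python
def bayes_update(prior, observed, p_det=0.9, p_fa=0.1):
    """One binary Bayes update of a target-presence belief for a cell.
    p_det: true-positive rate, p_fa: false-positive rate (assumed)."""
    if observed:
        num, den = p_det * prior, p_det * prior + p_fa * (1.0 - prior)
    else:
        num = (1.0 - p_det) * prior
        den = num + (1.0 - p_fa) * (1.0 - prior)
    return num / den

# Sequential fusion: agent i's posterior becomes agent i+1's prior.
belief = 0.5
for report in [1, 1, 0]:          # three agents report on the same cell
    belief = bayes_update(belief, report)
```

Because the update is order-independent in the likelihood-ratio sense, sequencing the agents' observations at the server yields the same posterior as batch fusion, which is what makes the sequential scheme in Fig. 3 sound.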
Fig. 4. Information entropy of the environment with hazard lethality of 50%, 70%,
90% using the DEVPP method. As the fleet size increases the information entropy of
the environment decreases faster.
Fig. 5. (Left) Computational time required for each agent to perform local computations
for r = 150 search rounds, using DEVPP for M ≥ 2 and [1] for M = 1. As
the fleet size increases, the computation is distributed among agents. (Right) Computational
time required by the central server σ to partition the environment using (3).
As the fleet size increases, the time required to compute the partitioned search
spaces S_i ⊂ S increases.
Fig. 6. Information entropy of the environment with hazard lethality of 90% for all
proposed methods. We consider three fleet sizes M = 2, M = 4, and M = 5 agents,
while all cases are identical to [1] when M = 1 agent.
Fig. 7. The boxplots illustrate the execution time per agent in seconds with a hazard
lethality of 90% for fleet sizes of M = 2, M = 4, and M = 5 agents. For M = 1
agent, all methods report execution times identical to [1].
To this end, not only is the information gathering using DEVPP more efficient, but
it is also executed significantly faster. The other proposed methods (MA-DITP
and MA-GITP) execute in similar or slightly more time than [1].
Main Result: The numerical experiments reveal that the proposed DEVPP
method with multiple robots is more efficient in terms of information gathered
compared to the case of a single robot [1]. This result is attributed to the shorter
paths produced for each agent on the assigned subspaces, which subsequently
reduces the likelihood of robot destruction and significantly reduces execution
time. To this end, simultaneous deployment of multiple robots leads to efficient
information gathering in all proposed methodologies.
5 Conclusion
information gathered for hazards and targets using DEVPP outperforms the
other approaches in all cases. In addition, DEVPP reduces the computational
cost, thereby accelerating the overall execution time of the information-theoretic
planner. Although MA-GITP runs slower than the other methods, it requires no
communication and is thus resilient to malicious attacks on the communication
system. All proposed multi-robot methodologies reduce the entropy
of the environment faster than the case where only a single robot is deployed.
Acknowledgments. This study was conducted at the Motion and Teaming Labo-
ratory, University of Maryland (UMD), and was funded by the Maryland Robotics
Center (MRC) and the Office of Naval Research (ONR). Alkesh K. Srivastava and
George P. Kontoudis were partially supported by the MRC to conduct this research
with “Pathway to the Ph.D.” and “MRC Postdoctoral Fellowship” programs, respec-
tively. This work has been possible with the support of ONR under the grant “Exper-
imentally Verified Autonomous Path Planning for Information Gathering in Lethally
Hostile Environments with Severely Limited Communication” (N0001420WX01827).
The views, positions, and conclusions contained in this document are solely those of
the authors and do not explicitly represent those of ONR.
References
1. Otte, M., Sofge, D.: Path-based sensors: paths as sensors, Bayesian updates, and
Shannon information gathering. IEEE Trans. Autom. Sci. Eng. 18, 946–967 (2021)
2. Okabe, A., Boots, B., Sugihara, K.: Nearest neighbourhood operations with gen-
eralized Voronoi diagrams: a review. Int. J. Geogr. Inf. Syst. 8(1), 43–71 (1994)
3. Choset, H.: Coverage for robotics - a survey of recent results. Ann. Math. Artif.
Intell. 31, 113–126 (2001)
4. Nair, V.G., Guruprasad, K.R.: GM-VPC: an algorithm for multi-robot coverage of
known spaces using generalized Voronoi partition. Robotica 38(5), 845–860 (2020)
5. Jager, M., Nebel, B.: Dynamic decentralized area partitioning for cooperating
cleaning robots. In: IEEE International Conference on Robotics and Automation,
vol. 4, pp. 3577–3582 (2002)
6. Rankin, E.S., Rekleitis, I., New, A.P., Choset, H.: Efficient Boustrophedon multi-
robot coverage: an algorithmic approach. Ann. Math. Artif. Intell. 52, 109–142
(2008)
7. Hazon, N., Kaminka, G.A.: On redundancy, efficiency, and robustness in coverage
for multiple robots. Robot. Auton. Syst. 56, 1102–1114 (2008)
8. Agmon, N., Hazon, N., Kaminka, G.A.: The giving tree: constructing trees for
efficient offline and online multi-robot coverage. Ann. Math. Artif. Intell. 52(2–4),
43–168 (2009)
9. Zheng, X., Koenig, S., Kempe, D., Jain, S.: Multirobot forest coverage for weighted
and unweighted terrain. IEEE Trans. Rob. 26(6), 1018–1031 (2010)
10. Wilson, Z., Whipple, T., Dasgupta, P.: Multi-robot coverage with dynamic coverage
information compression. In: International Conference on Informatics in Control,
Automation and Robotics, pp. 236–241 (2011)
11. Michel, D., McIsaac, K.: New path planning scheme for complete coverage of
mapped areas by single and multiple robots. In: IEEE International Conference
on Mechatronics and Automation, pp. 1233–1240 (2012)
12. Macharet, D., Azpurua, H., Freitas, G., Campos, M.: Multi-robot coverage path
planning using hexagonal segmentation for geophysical surveys. Robotica 36(8),
1144–1166 (2018)
13. Guruprasad, K.R., Wilson, Z., Dasgupta, P.: Complete coverage of an initially
unknown environment by multiple robots using Voronoi partition. In: International
Conference on Advances in Control and Optimization in Dynamical Systems, Ban-
galore, India (2012)
14. Kurabayashi, D., Ota, J., Arai, T., Yoshida, E.: Cooperative sweeping by multiple
mobile robots. In: IEEE International Conference on Robotics and Automation,
vol. 2, pp. 1744–1749 (1996)
15. Bhattacharya, S., Michael, N., Kumar, V.: Distributed coverage and exploration
in unknown non-convex environments. In: Martinoli, A., et al. (eds.) Distributed
Autonomous Robotic Systems. Springer Tracts in Advanced Robotics, vol. 83, pp.
61–75. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-32723-0_5
16. Otte, M., Sofge, D.: Path planning for information gathering with lethal hazards
and no communication. In: International Workshop on the Algorithmic Founda-
tions of Robotics, pp. 389–405 (2018)
17. Schwager, M., Dames, P., Rus, D., Kumar, V.: A multi-robot control policy for
information gathering in the presence of unknown hazards. In: Christensen, H.I.,
Khatib, O. (eds.) Robotics Research. STAR, vol. 100, pp. 455–472. Springer, Cham
(2017). https://doi.org/10.1007/978-3-319-29363-9_26
18. Julian, B.J., Angermann, M., Schwager, M., Rus, D.: Distributed robotic sensor
networks: an information-theoretic approach. Int. J. Robot. Res. 31(10), 1134–1154
(2012)
19. Dames, P., Schwager, M., Kumar, V., Rus, D.: A decentralized control policy for
adaptive information gathering in hazardous environments. In: IEEE Conference
on Decision and Control, pp. 2807–2813 (2012)
20. Dames, P., Schwager, M., Rus, D., Kumar, V.: Active magnetic anomaly detection
using multiple micro aerial vehicles. IEEE Robot. Autom. Lett. 1, 153–160 (2016)
Distributed Multiple Hypothesis Tracker
for Mobile Sensor Networks
1 Introduction
[33]. We base our solution on the track-oriented MHT (TOMHT) [16] as this
offers a conceptually simple and computationally efficient solution.
Active search with teams of robots is another well-studied problem. The
primary distinction in this space is the use of centralized [27,34] versus distributed
strategies [11,19,29]. One of the most successful distributed algorithms
is Voronoi-based control [7], where the basic idea is to divide the search area
with a Voronoi partition [12] and iteratively move each robot to the centroid of its
Voronoi cell, a process known as Lloyd's algorithm. In recent years, Multi-Agent
Deep Reinforcement Learning (MADRL) has offered a new method to generate
search policies through repeated interactions between other agents and the envi-
ronment in an attempt to maximize a reward function [15]. However, much of
the work based on MADRL requires global information and uses a centralized
architecture [14,25]. Zhou et al. [35] propose a decentralized MADRL method
using the maximum reciprocal reward to track cooperatively with UAV swarms.
However, one downside of learning-based frameworks is that they often have
trouble generalizing to unseen scenarios.
There are few works that specifically address the MR-MTT problem. One of
the first solutions was Cooperative Multi-robot Observation of Multiple Moving
Targets (CMOMMT) from Parker [22], which assigns robots to targets to maxi-
mize the time such that each target is observed by at least one robot. Related to
CMOMMT, Pierson and Rus [23] proposed a distributed method for capturing
multiple evaders with multiple pursuers in 2D and 3D environments by assigning
pursuers to evaders and then using gradient descent. Pimenta et al. [24] propose
Simultaneous Coverage and Tracking (SCAT) where they use Lloyd’s algorithm
to create a control law with guaranteed exponential convergence to a local min-
imum of the objective function. Our previous work [9] built upon SCAT by
allowing the robots to learn the target distribution online rather than needing
that information a priori. We did this by formulating a distributed PHD filter
and using this as the importance weighting function in Lloyd’s algorithm to drive
target search. This paper will use the same control framework but formulate a
new distributed MHT.
The primary contribution of this work is creating a new distributed TOMHT
algorithm. Our approach leverages information about the relative locations of
sensors (i.e., robots) to decrease the memory and computational requirements
relative to a naïve implementation. We then demonstrate the utility of this for-
mulation to solve the MR-MTT problem by using the resulting target estimates
to drive robot search. Our search strategy naturally and effectively drives the
robots to follow previously detected targets and to explore unknown areas that
may contain new targets. To demonstrate our result is correct and efficient, we
compare our result to that of the PHD filter [9] through a series of simulated
experiments with static and moving targets.
302 P. Xin and P. Dames
2 Background
In this section we provide the mathematical formulation of the standard Multiple
Hypothesis Tracker (MHT) for multi-target tracking and of Lloyd’s algorithm
for distributed coverage control.
d_t^2(x̂_t^i, z_t^j) = (μ̂_t^i − z_t^j)^T (Σ_t^i)^{−1} (μ̂_t^i − z_t^j) ≤ d^2, (1)
where μ̂_t^i, Σ_t^i are the mean and covariance of the estimated track x̂_t^i, and d
is the allowable distance threshold. Figure 1a shows the gating process for one
Distributed Multiple Hypothesis Tracker 303
time step. This comparison determines the full set of potential associations
between all measurements Z_t and all tracks X̂_t at time t.
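A minimal version of the gating test in (1) follows; the gate value 9.21 (the chi-square 99% threshold for 2-D measurements) is an assumed concrete choice for d², not a value stated in the text.

```python
import numpy as np

def in_gate(track_mean, track_cov, measurement, d2_max=9.21):
    """Mahalanobis gating test: keep a (track, measurement) pair as a
    candidate association iff the squared distance is within the gate."""
    diff = measurement - track_mean
    d2 = diff @ np.linalg.inv(track_cov) @ diff
    return bool(d2 <= d2_max)

# A nearby measurement passes the gate; a distant one is discarded.
mean, cov = np.zeros(2), np.eye(2)
candidates = [z for z in [np.array([1.0, 1.0]), np.array([5.0, 5.0])]
              if in_gate(mean, cov, z)]
```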
The TOMHT then considers a set of potential associations, as Fig. 1b shows.
For example, if measurements z_t^2 and z_t^3 could be associated with a track x̂_t^1,
then two copies of that track are propagated into the future, one using each
hypothetical association, as Fig. 1c shows. All these distinct hypotheses for a
track are maintained using a tree structure, where each branch is a unique series
of associations and each node has a distinct mean and covariance. Additionally,
the TOMHT considers the possibility of starting a new tree using each measure-
ment, which represents the possibility that a detection represents an entirely
new object.
Track Scoring. Based on the above structure, there will be many distinct
hypotheses for an individual track based on different association histories. There-
fore, the MHT uses a scoring system to determine the likelihood of individual
tracks. This score can be used both to find the maximum likelihood track as well
as to prune branches within the tree to avoid an exponential growth in the set
of hypotheses over time. A standard way to formulate the score S uses the
log-likelihood ratio [3,16]

S(A_{1:t}^i) = ln [ p(z_{1:t}^{k_{1:t}} | z_i^{k_i} ⊆ Z_i ∀i) / p(z_{1:t}^{k_{1:t}} | z_i^{k_i} ⊆ ∅) ]
            = ln ∏_{i=1:t} [ p(z_i^{k_i} | z_{1:i−1}^{k_{1:i−1}}, z_i^{k_i} ⊆ Z_i) / p(z_i^{k_i} | z_{1:i−1}^{k_{1:i−1}}, z_i^{k_i} ⊆ ∅) ] (2)
In other words, we want to find the maximum total score such that each measurement
is associated with at most one target. This optimization problem is
solved in [16]. Then the N-scan pruning approach traces back to t − N and
prunes the subtrees which diverge from the global optimal hypothesis, i.e., the
solution to (4).
304 P. Xin and P. Dames
This is a distributed control algorithm because, as long as each robot knows the
positions of its neighbors, it can compute its Voronoi cell and
move toward the weighted centroid of that cell. Note that in practice,
Voronoi neighbors can potentially be far apart. We make the assumption that
the density of robots and/or the communication range of robots is sufficiently
large to allow robots to communicate with all of their neighbors. The problem
of formulating a consistent partition subject to communication range limits will
be the subject of future work.
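One iteration of Lloyd's algorithm over a discretized workspace might look as follows; the grid discretization and the importance weights φ being given as an array are assumptions of this sketch.

```python
import numpy as np

def lloyd_step(robots, cells, phi):
    """One Lloyd iteration: assign each cell to its nearest robot
    (a discrete Voronoi partition), then move every robot to the
    phi-weighted centroid of its own cell."""
    dist = np.linalg.norm(cells[:, None, :] - robots[None, :, :], axis=2)
    owner = dist.argmin(axis=1)
    updated = robots.copy()
    for r in range(len(robots)):
        mask = owner == r
        if mask.any():
            w = phi[mask]
            updated[r] = (w[:, None] * cells[mask]).sum(axis=0) / w.sum()
    return updated

robots = np.array([[2.0, 2.0], [7.0, 7.0]])
cells = np.array([[x, y] for x in range(10) for y in range(10)], float)
phi = np.ones(len(cells))            # uniform importance weighting
robots = lloyd_step(robots, cells, phi)
```

Iterating this step drives each robot toward the weighted centroid of its cell; in the tracking setting of [9], φ would be the estimated target density rather than a uniform weight.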
This simple iterative, myopic control scheme is widely used in coverage prob-
lems. For example, Lee and Egerstedt [18] use Lloyd's algorithm to track a time-varying
density function with a team of robots. Bhattacharya et al. [1] expand
the search space to non-convex and non-Euclidean spaces. Suresh et al. [31] propose
a method with low complexity and communication workload in the distributed
case using Lloyd's algorithm. Our previous work [9] repurposes this coverage
controller to perform target tracking by choosing the estimated target density
as the time-varying importance weighting function φ(x).
The formulation above was for the standard single-sensor case. However, two
new considerations arise when there are multiple robots. First, multiple robots
may observe the same target at the same time (and consequently may each have
a local copy of a track for that target). Second, targets may leave the FoV of
one robot and enter the FoV of another robot. So long as those two robots are
within communication range (which is typically much larger than the sensing
range), then the first robot can provide some information to the second robot
to improve its initial estimation. This section outlines our novel formulation to
address these issues.
Note that there are existing distributed MHT solutions for multi-sensor
surveillance in radar systems [5,6]. However, in that case, the sensors are static
and the topology of the communication network is fixed. In our case, the sen-
sors are mobile, have a smaller FoV than that of radar, and the communication
topology changes online. Our new formulation accounts for all of these effects.
3.1 Assumptions
We make three main assumptions. First, each robot knows its own pose. This is
a strong assumption, but robots operating in indoor environments with a high-
quality prior map [26] or in outdoor environments with GPS receivers are able to
navigate for long periods of time with minimal drift in their pose estimates. We
also showed in our past work that sufficiently small drift has negligible effects
on distributed target tracking accuracy [9]. Second, we assume that each robot
is capable of communicating with its Voronoi neighbors and all the robots with
overlapping sensor FoVs. This assumption was also discussed above in Sect. 2.2.
Third, each robot has a unique ID, which they will use to determine precedence
to avoid conflicts.
Fig. 2. Example of track fusion. There are three robots, whose FoV are shown by the
dashed circles. Each robot has a unique color that is shared by its tracks. There are
4 targets, located at (10.5, 11), (5.5, 6.5), (5.5, 9), and (16, 16), and 6 tracks (colored
ellipses), 2 from each robot. Here r2 is the only robot with a nonempty set of private
tracks, P2 = {(16, 16)}. All other tracks are in the union of multiple areas. Since r3 has
the largest ID, R3 = {(5.5, 9), (10.5, 11)} and F3 = {(10, 11), (11, 11), (5.5, 9)}, while
R1, R2, F1, F2 = ∅. Then r3 calculates the KL divergence between tracks in F3 and
R3 and fuses tracks with low KL divergence using Algorithm 1.
distribute information across the team, i.e., robot r maintains the tracks that
are located within its Voronoi cell Vr . The key is to create a set of rules for data
exchange to ensure consistency across the team as cell boundaries shift due to
robot motion and as targets move.
When robots are well spaced out with their FoVs not overlapping, then each
robot can run its own separate MHT. However, when multiple robots view the
same region then each may have its own separate track for the same target.
These duplicate tracks not only drive several robots to track the same target,
wasting system resources, but also increase the false positive rate. Algorithm 1
outlines our approach, which is also illustrated in Fig. 2.
The process starts by inflating a robot’s FoV to account for effects such as
measurement noise (line 1), typically by twice the standard deviation of the
measurement model, 2σ. Robots then exchange these inflated regions to determine
their local grouping NF (line 2). Robots then exchange track estimates amongst
these local clusters (lines 3–14), using the relative positions of the track and
other robots as well as the IDs of each robot to sort each track into one of three
categories: 1) private (i.e., only in F̂r ), 2) fused (i.e., in multiple regions but r
has the highest ID amongst these), or 3) received (i.e., in multiple regions but
r is not the highest ID).
After this data exchange process, robot r measures the Kullback-Leibler
divergence between all tracks in its fused and received sets F and R (lines 15–20).
Robot r then uses the Munkres algorithm [17] to find the optimal assignment of
tracks (line 21), where a valid assignment (i.e., with cost less than dmax ) means
that a fused and received track are the same object and the estimates are com-
bined and added to the private list (lines 26–27). Unassigned tracks are added
directly to the private list (lines 23–24 and 30–34).
where v(x) accounts for undetected targets (i.e., driving exploration) and the
right term accounts for detected targets (i.e., exploitation). We create a simple
ad hoc rule to approximate the number of undetected targets:

v(x) ← (1 − pd(x)) v(x),   if x ∈ FoV
v(x) ← max(v(x) + Δv, δ),  else.    (8)
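A sketch of the update rule (8) on a discretized environment might look as follows (the grid representation and the values of p_d, Δv, and δ here are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def update_undetected_weight(v, cells, robot_pos, fov_radius,
                             p_d=0.8, dv=0.01, delta=0.1):
    """One step of the ad hoc weight update (8) on a grid of cell centers.

    v:     (N,) current weights approximating the undetected-target density
    cells: (N, 2) cell-center coordinates

    Inside the FoV the weight decays by the miss probability (1 - p_d);
    outside it grows by dv but never drops below the floor delta.
    """
    inside = np.linalg.norm(cells - robot_pos, axis=1) <= fov_radius
    return np.where(inside, (1.0 - p_d) * v, np.maximum(v + dv, delta))
```

For instance, a cell inside the FoV with weight 1.0 decays to 0.2 under p_d = 0.8, while a far-away cell with weight 0.05 is lifted to the floor δ = 0.1.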
Algorithm 1 (lines 18–34):
18:   d_{i,j} = D_KL(t_i^R ‖ t_j^F), if D_KL(t_i^R ‖ t_j^F) < K_DL; d_max, else
19:  end for
20: end for
21: Find assignment A : R → F
22: for t_i^R ∈ R do
23:   if A(i) = ∅ then      ▷ Track t_i^R not assigned
24:     P = P ∪ {t_i^R}
25:   else
26:     Make new track t with μ = (μ_i^R + μ_{A(i)}^F)/2, Σ = (Σ_i^R + Σ_{A(i)}^F)/2
27:     P = P ∪ {t}
28:   end if
29: end for
30: for t_j^F ∈ F do
31:   if (∄ i s.t. A(i) = j) and (d_{i,j} = d_max ∀i) then   ▷ Track t_j^F not assigned
32:     P = P ∪ {t_j^F}
33:   end if
34: end for
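The exchange-and-fuse step above (lines 15–34) can be sketched for Gaussian tracks using the closed-form KL divergence between Gaussians and SciPy's implementation of the Munkres/Hungarian assignment. This is a simplified single-robot sketch; the track representation, thresholds, and fusion-by-averaging details are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def kl_gauss(mu0, S0, mu1, S1):
    """KL divergence D(N(mu0, S0) || N(mu1, S1)) between Gaussian tracks."""
    k = len(mu0)
    S1inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + d @ S1inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def fuse_tracks(received, fused, kl_max=1.0, d_max=1e6):
    """Sketch of Algorithm 1, lines 15-34: match received to fused tracks and
    average matched pairs. Tracks are (mu, Sigma) tuples; kl_max and d_max
    are illustrative thresholds."""
    if not received or not fused:
        return received + fused
    cost = np.full((len(received), len(fused)), d_max)
    for i, (mu_r, S_r) in enumerate(received):
        for j, (mu_f, S_f) in enumerate(fused):
            d = kl_gauss(mu_r, S_r, mu_f, S_f)
            if d < kl_max:
                cost[i, j] = d
    rows, cols = linear_sum_assignment(cost)   # Munkres / Hungarian algorithm
    private, matched_f = [], set()
    for i, j in zip(rows, cols):
        if cost[i, j] < d_max:                 # valid assignment -> fuse
            private.append((0.5 * (received[i][0] + fused[j][0]),
                            0.5 * (received[i][1] + fused[j][1])))
            matched_f.add(j)
        else:                                  # received track unmatched
            private.append(received[i])
    for i in range(len(received)):             # received tracks with no row
        if i not in rows:
            private.append(received[i])
    for j, t in enumerate(fused):              # fused tracks left unmatched
        if j not in matched_f:
            private.append(t)
    return private
```

With one received track near one of two fused tracks, the close pair is averaged into a single private track and the far fused track is carried over unchanged.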
use this to approximate the integrals in (6) by summations. We use the same
algorithms as our previous work [9, Algorithm 1] to ensure the data is properly
exchanged as Voronoi boundaries shift.
4 Results
In this section, we test our approach against our previous results using the
distributed PHD filter [9], as this is the most closely related MR-MTT solution
in the literature. First, we isolate the differences due to the tracker by using the
same measurement and pose data from [9] and running this data through our
new MHT. Then, we examine the benefits of the new weighting function (7) to
show that unifying the estimation and control is helpful.
All trials take place in an open 100 × 100 m area. We run trials with different
numbers of targets, with target positions drawn uniformly from a 120 × 120 m
area; any targets initialized outside the map are discarded. The robots
begin at random locations within a small box at the bottom center of the map.
The robots are holonomic with a maximum velocity of 2 m/s. Each robot is
equipped with an onboard sensor with a circular FoV of radius 5 m, pfn = 1−pd =
0.2, and pfp = 3.66 · 10−3 . Robots measure the relative position of a target
z ∼ N (x − q, 0.25I2 ). The sensors collect measurements at 2 Hz. Note, these
values do not represent any specific real-world situation, but could represent a
team of drones equipped with downward facing cameras. For a real situation, one
could follow the procedures outlined in our other work to carefully characterize
the sensor models [4,8].
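Under the stated sensor parameters, one scan of simulated measurements might be generated as follows (the clutter model, the discretization into n_cells, and the helper name are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def simulate_detections(robot_q, targets, rng,
                        fov_radius=5.0, p_d=0.8, noise_var=0.25,
                        p_fp=3.66e-3, n_cells=100):
    """Generate one scan of relative-position measurements z ~ N(x - q, 0.25 I2)
    for targets inside a circular FoV, with missed detections (p_fn = 1 - p_d)
    and uniform false positives (this clutter model is an illustrative choice)."""
    z = []
    for x in targets:
        # detect a target in the FoV with probability p_d, plus Gaussian noise
        if np.linalg.norm(x - robot_q) <= fov_radius and rng.random() < p_d:
            z.append(x - robot_q + rng.normal(0.0, np.sqrt(noise_var), 2))
    # false positives: probability p_fp in each of n_cells discretized FoV cells
    for _ in range(rng.binomial(n_cells, p_fp)):
        r = fov_radius * np.sqrt(rng.random())      # uniform over the disc
        th = 2 * np.pi * rng.random()
        z.append(np.array([r * np.cos(th), r * np.sin(th)]))
    return z
```

Setting p_d = 1, zero noise, and zero clutter recovers exactly the relative positions of the in-FoV targets, which is a convenient way to sanity-check the geometry.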
Fig. 3. Robot trajectories. (a) shows the trajectories of robots during a single trial
with 20 robots and 5 static targets. (b) shows the trajectories of robots during a single
trial with 20 robots and dynamic targets.
Fig. 4. Boxplots show the median OSPA statistics over 10 runs for teams of 10–100
robots and 10, 30, and 50 static targets for the distributed MHT, the MHT tracker
using the PHD-based control (MHT-PHD), and the distributed PHD, respectively.
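The OSPA metric [28] reported in these figures combines localization error with a cardinality penalty. A minimal sketch for order p = 1 and cutoff c follows (implementation details are assumed, not taken from the paper):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=10.0):
    """OSPA distance (order p = 1, cutoff c) between two 2D point sets [28].

    Pads the optimal assignment cost with c for each cardinality mismatch
    and averages over the larger set size."""
    if len(X) == 0 and len(Y) == 0:
        return 0.0
    if len(X) == 0 or len(Y) == 0:
        return c
    if len(X) > len(Y):                      # ensure |X| <= |Y|
        X, Y = Y, X
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    D = np.minimum(D, c)                     # cut off large distances
    rows, cols = linear_sum_assignment(D)    # optimal point-to-point matching
    n, m = len(Y), len(X)
    return (D[rows, cols].sum() + c * (n - m)) / n
```

For example, comparing one estimated point at the true target location against a ground truth of two targets yields (0 + c)/2, i.e., half the cutoff.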
In the static case, the number of targets and their positions remain constant.
We run a batch of trials with 10, 30, and 50 targets, respectively. The robots
begin by distributing over the environment. Once a robot detects a target, it will
typically continue to track that target unless the track gets assigned to another
robot, as Fig. 3a shows. As the robots explore the environment, v(x) decreases in
the detected area, thus the weighted centroid in one Voronoi cell keeps shifting
away from regions without targets.
310 P. Xin and P. Dames
Fig. 5. Boxplots show the median of the standard deviation of the OSPA statistics over
10 runs for teams of 10–100 robots and 10, 30, and 50 static targets for the distributed
MHT, the MHT tracker using the PHD-based control (MHT-PHD), and the distributed
PHD, respectively.
Figure 4 shows the median OSPA over the last 100 s of a 550 s experiment
(long enough to reach a steady state). We see that the median OSPA decreases
significantly when the number of robots increases for all three methods, leveling
out when the number of robots is near the number of targets. We see that in
general, the performance is about the same using all three methods, with the
mixed MHT-PHD case having a higher average OSPA, as the control was not
directly tied to the tracker. This is also borne out in the standard deviation of
the OSPA data in Fig. 5, where the MHT-PHD case tends to have the highest
standard deviation. This also explains the increase in the standard deviation as
the number of robots increases.
We also see that the MHT data tends to be between the mixed case and the
PHD. This is likely due to the MHT making hard decisions about the presence
of a target (i.e., a track exists or is pruned) compared to the PHD, which uses
a target density function that more gracefully degrades. We expect the MHT
to perform better when there are fewer false negative detections, as repeated
missed detections often cause tracks to be pruned.
Fig. 6. Boxplots show the median OSPA error statistics over 10 runs for teams of 10–
100 robots and dynamic targets for the distributed MHT, the MHT tracker using the
PHD-based control (MHT-PHD), and the distributed PHD, respectively.
number. The emergent behavior in this case differs from the static case: some
robots spread out along the map boundary to detect new targets, as Fig. 3b
shows. To measure the steady-state tracking performance, we extend the
experiment to 800 s and evaluate the OSPA error over the last 200 s.
Figure 6 shows the median OSPA of the distributed MHT, MHT-PHD and
PHD. We see that the OSPA decreases as the number of robots increases, and
the variance of the OSPA follows the same trend as the static case. The MHT
with PHD-based control (MHT-PHD) still performs slightly worse than the
distributed PHD and has the largest variance among the three methods. The
most likely reason is the same as in the static case: the control policy based on
the distributed PHD disturbs the track assignment in the MHT estimation.
However, unlike the static case, the MHT has a consistently lower OSPA value
than the PHD data, especially in smaller teams. It also has a smaller variation
in larger teams, meaning it provides a more consistent result.
We do not analyze the standard deviation of the OSPA in the last 200 s of each
trial because the appearance and disappearance of targets is the principal
source of variation in the standard deviation.
4.4 Discussion
Beyond the qualitative and quantitative tracking performance in the above sce-
nario, there are a number of other factors to consider with MR-MTT algorithms.
Failure Cases. The most challenging scenario for MTT is when targets are
tightly clustered together (relative to the measurement covariance), as this is
when data association is most difficult. Due to their different approaches, the
MHT and PHD filter will fail in different ways. The MHT makes multiple hard
decisions about data association. Thus, tightly clustered targets lead to a
large branching factor at each time step, resulting in a large number of branches
that are difficult to prune. On the other hand, the PHD filter represents targets
using one target density function. Thus, the PHD filter will have a single large
peak in the density function, making it difficult to extract the precise location
of any individual object within the cluster.
Another challenging scenario is when there are multiple missed detections in
a row. In the MHT, this causes a track to be pruned prematurely, as there is
typically a maximum timeout for a track to ensure that old branches get pruned.
Increasing this timeout will make the MHT more robust to missed detections,
but also increases computation by increasing the number of tracks. The PHD
filter fails in a different manner. In the PHD filter update equation [9, Eq. (4)],
the target density function decreases in areas where the robot expects to detect
a target and increases in areas where the robot actually detects a target. Thus,
repeated missed detections lower the density and make it difficult to recover
quickly.
Adding multiple robots adds another challenge. Due to the design of our
distributed MTT algorithms, when a target lies right at the boundary between
two Voronoi cells then the ownership of that track will be ambiguous. Thus, two
robots may repeatedly exchange ownership of a track simply due to measurement
noise pushing the mean position across the Voronoi boundary. This does not
affect tracking quality but increases the communication load.
5 Conclusion
Acknowledgment. This work was funded by NSF grants IIS-1830419 and CNS-
2143312.
References
1. Bhattacharya, S., Ghrist, R., Kumar, V.: Multi-robot coverage and exploration on
Riemannian manifolds with boundaries. Int. J. Robot. Res. 33(1), 113–137 (2014).
https://doi.org/10.1177/0278364913507324
2. Blackman, S.: Multiple hypothesis tracking for multiple target tracking. IEEE
Aerosp. Electron. Syst. Mag. 19(1), 5–18 (2004). https://doi.org/10.1109/MAES.
2004.1263228
3. Blackman, S., Popoli, R.: Design and Analysis of Modern Tracking Systems. Artech
House Radar Library. Artech House (1999)
4. Chen, J., Xie, Z., Dames, P.: The semantic PHD filter for multi-class target track-
ing: from theory to practice. Robot. Auton. Syst. 149 (2022). https://doi.org/10.
1016/j.robot.2021.103947
5. Chong, C.Y., Mori, S., Reid, D.B.: Forty years of multiple hypothesis tracking -
a review of key developments. In: 2018 21st International Conference on Informa-
tion Fusion (FUSION), pp. 452–459 (2018). https://doi.org/10.23919/ICIF.2018.
8455386
6. Coraluppi, S., et al.: Distributed MHT with active and passive sensors. In: 2015
18th International Conference on Information Fusion (Fusion), pp. 2065–2072
(2015)
7. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing
networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004). https://doi.org/10.
1109/TRA.2004.824698
8. Dames, P., Kumar, V.: Experimental characterization of a bearing-only sensor for
use with the PHD filter (2015)
9. Dames, P.M.: Distributed multi-target search and tracking using the PHD filter.
Auton. Robot. 44(3), 673–689 (2020). https://doi.org/10.1007/s10514-019-09840-
9
10. Deutsch, I., Liu, M., Siegwart, R.: A framework for multi-robot pose graph SLAM.
In: 2016 IEEE International Conference on Real-time Computing and Robotics
(RCAR), pp. 567–572 (2016). https://doi.org/10.1109/RCAR.2016.7784092
11. Doitsidis, L., et al.: Optimal surveillance coverage for teams of micro aerial vehicles
in GPS-denied environments using onboard vision. Auton. Robot. 33(1), 173–188
(2012). https://doi.org/10.1007/s10514-012-9292-1
12. Du, Q., Faber, V., Gunzburger, M.: Centroidal Voronoi tessellations: applica-
tions and algorithms. SIAM Rev. 41(4), 637–676 (1999). https://doi.org/10.1137/
S0036144599352836
13. Fortmann, T., Bar-Shalom, Y., Scheffe, M.: Sonar tracking of multiple targets using
joint probabilistic data association. IEEE J. Oceanic Eng. 8(3), 173–184 (1983).
https://doi.org/10.1109/JOE.1983.1145560
14. Goldhoorn, A., Garrell, A., Alquézar, R., Sanfeliu, A.: Searching and tracking peo-
ple with cooperative mobile robots. Auton. Robot. 42(4), 739–759 (2018). https://
doi.org/10.1007/s10514-017-9681-6
15. Gronauer, S., Diepold, K.: Multi-agent deep reinforcement learning: a survey. Artif.
Intell. Rev. 55(2), 895–943 (2022). https://doi.org/10.1007/s10462-021-09996-w
16. Kim, C., Li, F., Ciptadi, A., Rehg, J.M.: Multiple hypothesis tracking revisited.
In: Proceedings of the IEEE International Conference on Computer Vision, pp.
4696–4704 (2015)
17. Konstantinova, P., Udvarev, A., Semerdjiev, T.: A study of a target tracking algo-
rithm using global nearest neighbor approach. In: Proceedings of the 4th Interna-
tional Conference Conference on Computer Systems and Technologies: E-Learning,
CompSysTech 2003, pp. 290–295 (2003). https://doi.org/10.1145/973620.973668
18. Lee, S.G., Egerstedt, M.: Controlled coverage using time-varying density functions.
IFAC Proc. Vol. 46(27), 220–226 (2013). https://doi.org/10.3182/20130925-2-DE-
4044.00030
19. Leonard, M.R., Zoubir, A.M.: Multi-target tracking in distributed sensor networks
using particle PHD filters. Signal Process. 159, 130–146 (2019)
20. Mahler, R.: Multitarget Bayes filtering via first-order multitarget moments. IEEE
Trans. Aerosp. Electron. Syst. 39(4), 1152–1178 (2003). https://doi.org/10.1109/
TAES.2003.1261119
21. Murphy, R.: Human-robot interaction in rescue robotics. IEEE Trans. Syst. Man
Cybern. Part C (Appl. Rev.) 34(2), 138–153 (2004). https://doi.org/10.1109/
TSMCC.2004.826267
22. Parker, L.E.: Cooperative robotics for multi-target observation. Intell. Autom. Soft
Comput. 5(1), 5–19 (1999)
23. Pierson, A., Rus, D.: Distributed target tracking in cluttered environments with
guaranteed collision avoidance. In: 2017 International Symposium on Multi-Robot
and Multi-Agent Systems (MRS), pp. 83–89 (2017). https://doi.org/10.1109/MRS.
2017.8250935
24. Pimenta, L.C.A., et al.: Simultaneous coverage and tracking (SCAT) of moving tar-
gets with robot networks. In: Chirikjian, G.S., Choset, H., Morales, M., Murphey,
T. (eds.) Algorithmic Foundation of Robotics VIII. Springer Tracts in Advanced
Robotics, vol. 57, pp. 85–99. Springer, Heidelberg (2010). https://doi.org/10.1007/
978-3-642-00312-7 6
25. Qie, H., Shi, D., Shen, T., Xu, X., Li, Y., Wang, L.: Joint optimization of multi-UAV
target assignment and path planning based on multi-agent reinforcement learning.
IEEE Access 7, 146264–146272 (2019). https://doi.org/10.1109/ACCESS.2019.
2943253
26. Ribas, D., Ridao, P., Neira, J.: Localization with an a priori Map. In: Ribas,
D., Ridao, P., Neira, J. (eds.) Underwater SLAM for Structured Environments
Using an Imaging Sonar. Springer Tracts in Advanced Robotics, vol. 65, pp. 47–
75. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14040-2 5
27. Robin, C., Lacroix, S.: Multi-robot target detection and tracking: taxonomy and
survey. Auton. Robot. 40(4), 729–760 (2016)
28. Schuhmacher, D., Vo, B.T., Vo, B.N.: A consistent metric for performance evalua-
tion of multi-object filters. IEEE Trans. Signal Process. 56(8), 3447–3457 (2008).
https://doi.org/10.1109/TSP.2008.920469
29. Schwager, M., Dames, P., Rus, D., Kumar, V.: A multi-robot control policy for
information gathering in the presence of unknown hazards. In: Christensen, H.I.,
Khatib, O. (eds.) Robotics Research. STAR, vol. 100, pp. 455–472. Springer, Cham
(2017). https://doi.org/10.1007/978-3-319-29363-9 26
30. Stone, L.D., Streit, R.L., Corwin, T.L., Bell, K.L.: Bayesian Multiple Target Track-
ing. Artech House (2013)
31. Suresh, A.T., Yu, F.X., McMahan, H.B., Kumar, S.: Distributed mean estimation
with limited communication. CoRR abs/1611.00429 (2016)
32. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. MIT Press, Cambridge
(2005)
33. Vo, B.T., Vo, B.N., Cantoni, A.: The cardinality balanced multi-target multi-
Bernoulli filter and its implementations. IEEE Trans. Signal Process. 57(2), 409–
423 (2009). https://doi.org/10.1109/TSP.2008.2007924
34. Yan, Z., Jouandeau, N., Cherif, A.A.: A survey and analysis of multi-robot coordi-
nation. Int. J. Adv. Rob. Syst. 10(12), 399 (2013). https://doi.org/10.5772/57313
35. Zhou, W., Li, J., Liu, Z., Shen, L.: Improving multi-target cooperative tracking
guidance for UAV swarms using multi-agent reinforcement learning. Chin. J. Aero-
naut. 35(7), 100–112 (2022). https://doi.org/10.1016/j.cja.2021.09.008
Distributed Multirobot Control
for Non-cooperative Herding
1 Introduction
Recent developments in robotics and sensing have created significant interest
among researchers in deploying multiple robots that operate cooperatively towards
achieving a common goal. Many works have developed techniques to tackle real-
world problems using multi-robot systems (MRS), like conducting surveys or
automating warehouses [1–3]. The major developments in MRS for enabling
multiple robots to behave cooperatively have been based on interactions within
a single team, i.e., a robot interacts with other robots in its group to achieve a
given objective [4,5]. The main features of these types of algorithms are a) local
N. Mohanty and J. Grover—These authors contributed equally to this work.
This research is supported by AFOSR FA9550-18-1-0097 and AFRL/AFOSR FA9550-
18-1-0251.
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 317–332, 2024.
https://doi.org/10.1007/978-3-031-51497-5_23
318 N. Mohanty et al.
1
Accepted in IEEE Conference on Decision and Control 2022.
Distributed Herding 319
2 Prior Work
The framework of multi-group interaction within MRS has many applications
beyond adversarial problem statements. The shepherding problem is one such
example. In [12,13], the authors have proposed methods to
enable multiple shepherd agents to influence a flock of sheep by modeling the
interaction as repulsion forces. The Robot Sheepdog Project [14,15] conducted
a real-world demonstration of a shepherding algorithm where a group of mobile
ground robots cooperatively herded a flock of ducks to a given goal location.
In the literature, there are several works on non-cooperative shepherding as
an example of a multi-group interaction problem. Works like [13,16–20] deal
with problems where the sheep robots do not exhibit adversarial behavior: they
do not have any goals of their own, but they experience a repulsive force from
the dog robots, which is exploited to produce the desired behavior in the sheep,
for example, collecting all the sheep at some location and then driving them to
a target goal.
Unlike prior work, our sheep may or may not be adversarial: we call them
adversarial if their goal lies inside the protected zone and non-adversarial
otherwise. Our safe control synthesis approach remains the same regardless. The
dog robots observe the sheep and generate their control commands considering
the cohesion between the sheep robots, the sheep's attraction to their goal
location, and the repulsion the sheep experience from the dog robots. Because
we use control barrier functions to generate the constraints on the velocity of
the dog robots, our method only requires the dynamics of the sheep to be
represented as a symbolic function, allowing the sheep to experience any kind
of attractive or repulsive forces.
3 Problem Formulation
Consider a scenario with m sheep agents flocking towards a common goal loca-
tion. One commonly assumed model for flocking is the Reynolds-Boids dynamics
[6] that considers inter-sheep cohesive forces, inter-sheep repulsive forces, and
attraction to a common goal. In the presence of dog agents, each sheep's
dynamics would also include repulsive forces from each dog robot. While en
route to their goal, the sheep, having no knowledge of high-value regions in the
workspace (protected zones), risk breaching them. Thus, our problem is to
orchestrate the motions of the dog robots, capitalizing on the repulsion that
the sheep experience from the dogs, to prevent this breach. Next, we pose this problem
in formal terms.
Consider the protected zone P ⊂ R² as a disc centered at x_P with radius
R_p, i.e., P := {x ∈ R² | ‖x − x_P‖ ≤ R_p}. We denote the flock of sheep as S and
the position of the i-th sheep as x_Si ∈ R². The collective positions of all sheep
are denoted as x_S^all := (x_S1, x_S2, ..., x_Sm). Similarly, we denote the set of all
dogs as D. The position of the k-th dog is x_Dk ∈ R², and the collective positions
of all dogs are x_D^all := (x_D1, x_D2, ..., x_Dn). Each sheep follows single-integrator
dynamics ẋ_Si := f_i(x_S1, ..., x_Sm, x_D1, ..., x_Dn), given by
ẋ_Si = u_Si = kS Σ_{j∈S\i} (1 − R_S³ / ‖x_Sj − x_Si‖³)(x_Sj − x_Si)   [inter-sheep cohesion and repulsion]
            + kG (x_G − x_Si)   [attraction to goal]
            + kD Σ_{l∈D} (x_Si − x_Dl) / ‖x_Si − x_Dl‖³   [repulsion from dogs]   (1)
Here, RS is a safety margin that each sheep tends to maintain with every other
sheep, xG is the sheep’s desired goal and kS , kG and kD are proportional gains
corresponding to the attractive and repulsive forces. We model each dog as a
velocity-controlled robot with single-integrator dynamics ẋ_Dk = u_Dk.
Before posing the problem, we state some assumptions on the dogs’ knowledge:
Assumption 1. The dog robots have knowledge of the sheep's dynamics, i.e.,
(1), and can measure the sheep's positions accurately.
Assumption 2. Each dog robot can measure the velocities of other dog robots
(by using numerical differentiation, for example).
Problem 1. Assuming that the initial positions of the sheep satisfy x_Si(0) ∉ P
∀i ∈ S, the dog robots' problem is to synthesize controls {u_D1, · · · , u_Dn} such
that x_Si(t) ∉ P ∀t ≥ 0, ∀i ∈ S.
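The sheep dynamics (1) can be sketched directly. The gains below follow the simulation settings reported later in Sect. 5; the value of R_S and the function signature are illustrative assumptions:

```python
import numpy as np

def sheep_velocity(i, xS, xD, xG, kS=0.5, kG=1.0, kD=0.1, RS=0.5):
    """Reynolds-Boids sheep dynamics (1): cohesion/repulsion among sheep,
    attraction to the goal, and inverse-square repulsion from the dogs."""
    xi = xS[i]
    u = kG * (xG - xi)                               # attraction to goal
    for j in range(len(xS)):                         # inter-sheep cohesion/repulsion
        if j == i:
            continue
        d = xS[j] - xi
        r = np.linalg.norm(d)
        u += kS * (1.0 - RS**3 / r**3) * d
    for xd in xD:                                    # repulsion from dogs
        d = xi - xd
        u += kD * d / np.linalg.norm(d)**3
    return u
```

As a sanity check, a lone sheep with no dogs moves straight toward the goal, and the inter-sheep term vanishes exactly when two sheep are separated by the safety margin R_S.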
4 Controller Design
In this section, we show two approaches to solve Problem 1, building on our
previously proposed centralized algorithm [9]. Define a safety index h(·) : R² → R
that quantifies the distance of S_i from P. The centralized approach constrains
the dogs' collective velocity u_C^all via
A_i^C := (x_P − x_Si)^T [J_1i^D, J_2i^D, · · · , J_ni^D]

b_i^C := f_i^T f_i + (x_Si − x_P)^T Σ_{j∈S} J_ji^S f_j + α (x_Si − x_P)^T f_i + (β/2) h

subject to A_i^C u_C^all ≤ b_i^C ∀i ∈ S. (12)
Building on this centralized approach, in this paper, we develop two distributed
approaches wherein we allow each dog to compute its velocity locally. The com-
puted velocities will make the dog herd the sheep away from P.
In this approach, we assume that we have an equal number of dogs and sheep. By
exploiting this equality, we assign a unique sheep Si for i ∈ {1, · · · , n} to a unique
dog Dk for k ∈ {1, · · · , n} and make Dk responsible for herding Si away from
P. In other words, Dk computes a velocity uDk that repels Si from P thereby
ensuring that xSi (t) ∈ / P ∀t ≥ 0. The premise is that owing to the equality,
each sheep will end up being herded by a unique dog, therefore, no sheep will
breach the protected zone2 . Now while this strategy necessitates having an equal
number of dogs and sheep, the benefit of this approach stems from the feasibility
guarantee (that we prove shortly), which the centralized approach lacks. Simple
algebraic manipulation of constraint (11) yields a constraint on the velocity of
Dk as follows
A_i^H u_Dk ≤ b_i^H, where (13)

A_i^H := (x_P − x_Si)^T J_ki^D

b_i^H := f_i^T f_i + (β/2) h + (x_Si − x_P)^T [ Σ_{j∈S} J_ji^S f_j + α f_i + Σ_{l∈D\k} J_li^D u_Dl ]

Here A_i^H ∈ R^{1×2} and b_i^H ∈ R. The term u_Dl in the expression of b_i^H is computed
by numerical differentiation of the positions x_Dl. We pose a QP to obtain
the min-norm velocity for D_k as follows

u*_Dk = argmin ‖u_Dk‖² subject to A_i^H u_Dk ≤ b_i^H (14)
The obtained velocity u∗Dk guarantees that the protected zone P will not be
breached by sheep S_i, by ensuring that h(x_Si(t)) ≥ 0 ∀t ≥ 0. Since each dog
in D is in charge of herding exactly one sheep in S, feasibility of (13) ∀k ∈ D
ensures that no sheep breaches P. Next, we show the conditions under which
(14) remains feasible, but first state some assumptions.
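Because (14) minimizes ‖u_Dk‖ subject to a single linear inequality, it admits a closed-form solution, which a sketch can exploit instead of calling a general QP solver (the helper name is illustrative):

```python
import numpy as np

def min_norm_velocity(A_H, b_H):
    """Closed-form solution of the min-norm QP (14):
        min ||u||^2  s.t.  A_H @ u <= b_H
    With a single linear constraint, u* = 0 when the constraint is inactive
    (b_H >= 0); otherwise u* is the projection of the origin onto the
    constraint boundary A_H @ u = b_H."""
    A_H = np.asarray(A_H, dtype=float).ravel()
    if b_H >= 0.0:
        return np.zeros_like(A_H)            # unconstrained minimum is feasible
    return (b_H / (A_H @ A_H)) * A_H         # smallest u with A_H @ u = b_H
```

When the constraint is active, the result satisfies A_H · u* = b_H exactly, with the smallest possible norm |b_H|/‖A_H‖; when b_H ≥ 0 the dog simply does not move.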
Assumption 3. We make the following assumptions on the distances between
pairs of agents:
1. There exist lower and upper bounds on the distance between any pair
of sheep, i.e., L_S ≤ ‖x_Si − x_Sj‖ ≤ M_S, ∀i, j ∈ S and i ≠ j.
2. There exists a lower bound on the distance between every sheep and dog, i.e.,
‖x_Si − x_Dk‖ ≥ L_D ∀i ∈ S and k ∈ D.
² Note that although S_i is assigned to D_k, the positions of the remaining dogs
{1, · · · , n}\k and the remaining sheep {1, · · · , n}\i do influence D_k's constraint
parameters (A_i^H, b_i^H), and in turn, its computed velocity u*_Dk.
3. There exists an upper bound on the distance between each sheep and its goal,
i.e., ‖x_Si − x_G‖ ≤ M_G, and between each sheep and the center of the protected
zone, i.e., ‖x_Si − x_P‖ ≤ M_P.
Theorem 1. In a scenario with ‘n’ dogs and ‘n’ sheep, with each dog assigned
a unique sheep, the herding constraint (13) for a given dog is always feasible,
provided Assumption 3 is met.
5 Results
We first validate the first distributed algorithm and the feasibility proof given
in Sect. 4.1. For this, we model the sheep with the Reynolds-Boids dynamics (1)
with gains kS = 0.5, kG = 1 and kD = 0.1. The dogs use (14) to compute their
velocities, where hyperparameters α and β are computed following (5) and (7).
We chose a circular protected zone of radius Rp = 0.6 m and center xP at origin.
The sheep are initialized outside of the protected zone, and their goal location xG
is chosen such that their nominal trajectory would make them breach the zone,
thus necessitating intervention from dogs. The positions of dogs are initialized
randomly within a certain range of the protected zone. In Figs. 1(a) and 1(b),
Fig. 1. Preventing the breaching of the protected zone using our proposed distributed
algorithm in Sect. 4.1. Here dogs are shown in blue and sheep in red. The green disc
represents the protected zone. The nominal task of the sheep is to go straight towards
goal xG . However, since this would result in infiltration of the protected zone, the dog
intervenes using the control algorithm presented in (14). In Fig. 1(c), we defend two
protected zones from four sheep.
we show two examples involving a) two dog robots vs. two sheep robots and b)
three dog robots vs. three sheep robots. To demonstrate the compositionality of
our approach, we consider two protected zones in Fig. 1(c) where we have four
dogs defending both zones from four sheep. In all these simulations, none of the
sheep breach any zone, thus demonstrating the correctness of our approach. In
the interest of space, we skip the simulation results for the algorithm in Sect. 4.2
but do provide experimental results.
Fig. 2. Experiments for Centralized Control: Two dogs defending the protected
zone from four sheep using centralized control algorithm (12) from our prior work [9].
Video at https://bit.ly/3OTAnOu.
Fig. 4. Experiment for the distributed algorithm in Sect. 4.1: Four dogs (green-
tailed robots) defending two protected zones from four sheep (orange-tailed robots).
The goal position x_G (red disc) is at the extreme left, which would encourage the
sheep to breach both zones. However, our proposed algorithm moves the dogs so that
neither zone gets breached. Video at https://bit.ly/3yo9ziC.
Fig. 5. Experiment for the distributed algorithm in Sect. 4.1: Five dogs (green-
tailed robots) defending the protected zone from five sheep (orange-tailed robots). The
sheep’s goal (red disc) is in the center of the protected zone. Eventually, in this scenario
a deadlock occurs where all sheep come to a stop outside the protected zone. Video at
https://bit.ly/3o51Cu1.
Fig. 6. Experiment for the distributed algorithm in Sect. 4.2: Two dogs (green-
tailed robots) defending the protected zone from three sheep (orange-tailed robots).
The goal position xG (red disc) is at the center of the zone. Video at https://youtu.
be/IbCjkR1ye0c.
Fig. 7. Experiment for the distributed algorithm in Sect. 4.2: Two dogs (green-
tailed robots) defending the protected zone from four sheep (orange-tailed robots). This
case is similar to the one shown in Fig. 2. Video at https://youtu.be/51FoHZWFYC4.
6 Conclusions
In this paper, we developed novel optimization-based distributed control techniques
to enable multiple dog robots to prevent the breaching of protected zones by
sheep agents. We provided a proof of feasibility of the controller when n dog robots
face an equal number of sheep robots. Additionally, we developed another distributed
algorithm that iteratively computes a solution that agrees with the solution returned
by the centralized problem without requiring an equal number of dogs and sheep. We
experimentally validated both distributed algorithms in addition to validating our
previously developed centralized control, showing that multiple dog robots can
prevent the breaching of protected zones in both simulation and real-world experiments.
In future work, we aim for the dog robots to learn the dynamics of the sheep robots
online while preventing them from breaching.
Theorem 1. In a scenario with ‘n’ dogs and ‘n’ sheep, with each dog assigned
a unique sheep, the herding constraint (13) for a given dog is always feasible,
provided Assumptions 3 are met.
Proof. The herding constraint (13) can be infeasible
– either when A_i^H = 0 and b_i^H < 0 (possibility 1),
– or when b_i^H = −∞ (possibility 2).
det(J_ki^D) = −2 k_D² / ‖x_Dk − x_Si‖⁶

The determinant det(J_ki^D) is nonzero as long as the distance between dog D_k and
sheep S_i is finite. Therefore, J_ki^D has no null space, implying that A_i^H ≠ 0
for all x_Si ∈ R², x_Dk ∈ R². This rules out possibility 1 for infeasibility.
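As a sanity check on this determinant, one can evaluate the Jacobian of the dog-repulsion term in (1) analytically and compare (the gain and positions below are arbitrary illustrative values):

```python
import numpy as np

def dog_jacobian(xS, xD, kD=0.1):
    """Jacobian of the dog-repulsion term kD * (xS - xD)/||xS - xD||^3
    in (1) with respect to the dog position xD."""
    d = xS - xD
    r = np.linalg.norm(d)
    return kD * (-np.eye(2) / r**3 + 3.0 * np.outer(d, d) / r**5)

# Check: det(J) = -2 kD^2 / r^6, which is nonzero for any finite distance r
kD, xS, xD = 0.1, np.array([1.0, 2.0]), np.array([4.0, 6.0])
r = np.linalg.norm(xS - xD)                       # here r = 5
assert np.isclose(np.linalg.det(dog_jacobian(xS, xD, kD)), -2 * kD**2 / r**6)
```

The two eigenvalues of the scaled matrix (−I + 3 d̂d̂ᵀ) are 2 (along d̂) and −1 (perpendicular to d̂), which is where the factor −2 in the determinant comes from.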
To rule out possibility 2, we lower bound b_i^H term by term using ‖Mx‖ ≤ σ_max(M)‖x‖,
where σ_max is the largest singular value of a matrix. Further, using the fact that
the largest singular value of a matrix (σ_max) is upper bounded by its Frobenius
norm (σ_F), we obtain
b_i^H ≥ − Σ_{j∈S\i} σ_F(J_ji^S) ‖x_Si − x_P‖ ‖f_j‖ − σ_F(J_ii^S) ‖x_Si − x_P‖ ‖f_i‖
      − Σ_{l∈D\k} σ_F(J_li^D) ‖x_Si − x_P‖ ‖u_Dl‖ − α ‖x_Si − x_P‖ ‖f_i‖   (3)
Now, to compute this lower bound, we make use of Assumption 3. We use the
dynamics in (1) to compute J_ii^S, obtain the upper bound on σ_F(J_ii^S), and use
the bounds on distances from Assumption 3 to get the following upper bound:

σ_F(J_ii^S) ≤ Σ_{j∈S\i} kS (√2 + (3 + √2) R³ / ‖x_Si − x_Sj‖³) + √2 kG + Σ_{l∈D} (3 + √2) kD / ‖x_Si − x_Dl‖³
           ≤ (n − 1) kS (√2 + (3 + √2) R³ / L_S³) + √2 kG + n (3 + √2) kD / L_D³ := λ_M
We omit the proof of this computation in the interest of space. Similarly, using
the dynamics in (1), we compute an expression for J_ji^S and obtain an upper
bound on σ_F(J_ji^S) as follows:

σ_F(J_ji^S) ≤ √2 kS + (3 + √2) kS R³ / ‖x_Si − x_Sj‖³ ≤ √2 kS + (3 + √2) kS R³ / L_S³ := λ_S
Likewise, an upper bound on σ_F(J_li^D) is given by

σ_F(J_li^D) ≤ (3 + √2) kD / ‖x_Si − x_Dl‖³ ≤ (3 + √2) kD / L_D³ := λ_D
Lastly, we obtain an upper bound on the dynamics of each sheep, ‖f_i‖, as:

‖f_i‖ ≤ kS Σ_{j∈S\i} (‖x_Si − x_Sj‖ + R³ / ‖x_Si − x_Sj‖²) + kG ‖x_G − x_Si‖
      + kD Σ_{l∈D} ‖x_Si − x_Dl‖ / ‖x_Si − x_Dl‖³   (4)
Distributed Herding 331
Now we need to compute the maximum possible value of the RHS to get the
upper bound on the sheep dynamics. The first term has a local minimum at $\|x_{S_i} - x_{S_j}\| = 2^{1/3}R$;
therefore, its maximum can occur only at the lower or upper bound of $\|x_{S_i} - x_{S_j}\|$.
Thus, the maximum value of the first term is $F_{\max} := \max\big(k_S L_S + k_S \frac{R^3}{L_S^2},\; k_S M_S + k_S \frac{R^3}{M_S^2}\big)$.
The second term is maximum when $\|x_G - x_{S_i}\| = M_G$. The last term is maximum when the distance
of the sheep to the dogs is minimum, $\|x_{S_i} - x_{D_k}\| = L_D$. Using these, the upper
bound on the sheep dynamics is computed as:
$$\|f_i\| \leq (n-1)F_{\max} + k_G M_G + n\,k_D\,\frac{1}{L_D^2}$$
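As a sanity check, the bound above can be evaluated numerically; all parameter values used in the sketch below are illustrative assumptions, not taken from the paper:

```python
# Numerical evaluation of the upper bound on the sheep dynamics derived
# above. All symbols (k_S, k_G, k_D, R, L_S, M_S, L_D, M_G, n) follow the
# derivation; the values passed in are illustrative, not the paper's data.
def f_i_upper_bound(kS, kG, kD, R, LS, MS, LD, MG, n):
    # F_max: maximum of the first term, attained at a distance bound
    F_max = max(kS * LS + kS * R**3 / LS**2,
                kS * MS + kS * R**3 / MS**2)
    # (n-1)*F_max + k_G*M_G + n*k_D/L_D^2
    return (n - 1) * F_max + kG * MG + n * kD / LD**2
```

As expected, the bound grows with the number of agents n, since both the flocking and the dog-repulsion terms accumulate over agents.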
Assuming that the velocity of the dog robots has an upper bound $u_{D,\max}$, and
taking the upper bound on the dynamics of all the sheep to be equal, the lower
bound on $b^H_i$ from (3) is (taking $\gamma = -(\alpha + \lambda_M + (n-1)\lambda_S)M_P$)

$$b^H_i \geq \gamma\Big((n-1)F_{\max} + k_G M_G + \frac{n\,k_D}{L_D^2}\Big) - (n-1)\lambda_D M_P\,u_{D,\max}$$
This shows that $b^H_i$ has a finite lower bound, thus ruling out possibility 2. Thus,
the herding constraint (13) for one dog to repel one sheep from the protected
zone is always feasible. Since each sheep in S is allocated to one unique dog in
D, extending this feasibility result to all sheep ensures that none of them will
breach the protected zone.
332 N. Mohanty et al.
On Limited-Range Coverage Control
for Large-Scale Teams of Aerial Drones:
Deployment and Study
1 Introduction
Addressing a persistent and unpredictable service demand with a multi-robot
system can provide higher efficiency, thanks to parallel operation, and larger
flexibility by modulating the operational costs in terms of robots employed,
as well as by deploying robots only where needed. Concrete solutions can be
obtained by addressing two concurrent problems. In recent years, significant
improvements have been made by researchers in the field of multi-robot
systems [4]. In particular, these systems are gaining relevance in real-world
applications where, due to the unpredictability of the environments and the presence
of external factors unknown a priori, tasks often require the distributed
capabilities provided by a team of cooperating robots [13]. At the same time,
a distributed and decentralized robotic system proves scalable and fault-tolerant
and, due to the absence of a central point of failure, ensures robust, continuous,
and efficient operation. Among the many applications that could benefit from
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 333–346, 2024.
https://doi.org/10.1007/978-3-031-51497-5_24
334 F. Bertoncelli et al.
Organization of the Paper. The rest of the paper is organized as follows. The
methodology considered for our experiment is described in Sect. 2. The analysis
of the experiments is extensively presented in Sect. 3: quantitative
measurements are reported along with a description of the simulator and the
platform used for the real-world experiments, continuing with a discussion
of the observed results. Finally, Sect. 4 closes the manuscript with
some concluding remarks.
2 Methods
The aim of our experiments is to evaluate the performance of a team of drones
controlled by a decentralized, limited-range, coverage control strategy, deployed
in a large outdoor environment. For this purpose, field trials and high-fidelity
dynamic simulations were set up, to achieve a statistically relevant characteri-
zation. Details of the methods are provided in the following paragraphs.
Each robot is then controlled to move towards the centroid of its cell by the
proportional law in Eq. (1). The experimental parameters are, respectively: the number
of drones n ∈ N = {8, 32, 64}; the size of the operational area a ∈ A = {50 m, 100 m, 200 m};
and the sensing range r ∈ R = {10 m, 50 m, 100 m, 500 m}.
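The limited-range Lloyd iteration described above can be sketched as follows. This is a minimal grid-based approximation assuming a uniform density φ(q), a square area, and a proportional gain `k_prop`; all names are illustrative and not taken from the authors' code:

```python
import numpy as np

def coverage_step(positions, k_prop, r, area, res=1.0):
    """One limited-range Lloyd step: each robot moves toward the centroid
    of its range-limited Voronoi cell (uniform density assumed)."""
    xs = np.arange(0.0, area, res)
    grid = np.array([[x, y] for x in xs for y in xs])  # sample points q in Q
    vels = []
    for i, p in enumerate(positions):
        d = np.linalg.norm(grid - p, axis=1)
        # cell V_i: points within sensing range r of robot i and closer to
        # robot i than to any other robot
        mask = d <= r
        for j, q in enumerate(positions):
            if j != i:
                mask &= d <= np.linalg.norm(grid - q, axis=1)
        cell = grid[mask]
        centroid = cell.mean(axis=0) if len(cell) else p
        vels.append(k_prop * (centroid - p))  # proportional law toward centroid
    return np.array(vels)
```

With a single robot in the middle of the area, the cell is symmetric around the robot and the commanded velocity is zero; with two nearby robots, each is pushed away from its neighbor, which is the spreading behavior the paper evaluates.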
Fig. 1. Figure of the MRS F450, the quad-rotor platform selected for the experiments
The robotic unit is equipped with several sensors among which, for the specific
use case we analyze here, we highlight a Global Positioning System module
for self-localization (such as the one coming with the Pixhawk autopilot bundle)
and XBee devices for communication and information sharing among the units.
The latter make use of an ad-hoc peer-to-peer proprietary mesh network called
DigiMesh, which we limit to be single-hop to enforce the communication-range
constraint. We employed omni-directional low-power antennas on board all drones,
each operating at 900 MHz, empirically estimated to guarantee good connectivity
among all the units in the group. The vehicle controller [12] generates the
desired orientation, collective thrust command, body rates, angular
accelerations, and angular jerk based on the coverage control. The experiments
were carried out using 7 vehicles in an area of 50×50 m; as previously described,
the communication devices used offer full connectivity but, to evaluate the behavior
of the control strategy, we manually limit the sensing range r
with the same values used for the simulations, i.e., r ∈ {10 m, 50 m, 100 m}.
Three runs were performed for each value of the sensing range.
As in the simulations, the control algorithm proposed in Sect. 2 runs in a
distributed way on board each drone. The position information of neighboring
units needed by the coverage control is solely provided through radio
communication and only by the aerial units, i.e., no external or central computer is
used. Figure 2 shows the trajectories of the UAVs in one of the experiments
carried out; following the takeoff procedure to bring the vehicles to the same
altitude, the coverage control function is performed to determine the new positions
of the vehicles.
The speed command of Eq. 1 is limited to 2 m/s for safety reasons, taking into
account the number of vehicles involved in the experiment and the freedom
the drones have to move in various directions.
2.4 Metrics
In order to evaluate the overall performance of the system, the same metrics
were collected both in simulations and in field tests. In particular, we consider
the following three main aspects: (i) optimality of the final configuration, (ii)
efficiency in the coverage of the environment, and (iii) effectiveness in the use
of the robots.
As discussed in [6,15], such a definition quantifies how close the configuration
of the robots is to the centroidal Voronoi configuration, defined considering,
for each point q ∈ Q, the value of the pdf φ(q).
Overall Efficiency. Considering the fact that each robot is able to individually
cover a circular area of radius r, we measure the effectiveness in the use of
resources by computing the ratio between the actual area covered by the team of
drones and the upper bound on the area that could theoretically be covered,
namely:

$$E(P) = \frac{\sum_{i=1}^{n} \int_{V_i} dq}{n\,\pi r^2} \qquad (7)$$
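Eq. (7) can be approximated numerically; the sketch below assumes a square area Q discretized on a regular grid, with the numerator taken as the area of the union of the robots' covered disks (function and parameter names are ours):

```python
import numpy as np

def coverage_efficiency(positions, r, area, res=0.5):
    """Approximate E(P) from Eq. (7): ratio of the area actually covered
    (union of the range-limited disks) to the theoretical maximum n*pi*r^2."""
    xs = np.arange(res / 2, area, res)
    qx, qy = np.meshgrid(xs, xs)
    pts = np.stack([qx.ravel(), qy.ravel()], axis=1)
    covered = np.zeros(len(pts), dtype=bool)
    for p in positions:
        covered |= np.linalg.norm(pts - p, axis=1) <= r  # inside disk of robot
    covered_area = covered.sum() * res**2   # numerator: total covered area
    return covered_area / (len(positions) * np.pi * r**2)
```

A single robot far from the boundary gives E ≈ 1, while fully overlapping robots waste coverage capacity and drive E down, matching the paper's observation that fewer robots are more efficient.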
We hereby present the results of the simulations. Figure 3 displays the average
value of H(P) across the set of repetitions. The blue line shows the
value for 8 robots, the red line for 32 robots, and the yellow line for 64 robots.
The plots are grouped according to the size of the area to cover and the sensing range.
Following the same color scheme, Figs. 4 and 5 display the average values of A(P)
and E(P), respectively.
Figure 3 clearly shows that increasing the number of robots increases the
performance with respect to the coverage metric. This effect is intuitively
attributable to a greater cumulative covered area. It is also possible to note
that the rate of convergence decreases as the area size increases, increases as
the sensing range grows, and decreases as more robots are employed. The oscillations
present on the blue line for the cases with a large Q and a large sensing range are
caused by the dynamics of the drones. Inherently, each drone possesses motion
dynamics different from the single-integrator model $\dot{p}_i = u_i$. The velocity controller that generates
Fig. 3. Average value of H(P) across all repetitions. The plots on the same line refer
to the same total area to cover, while the plots on the same column refer to cases with
the same sensing range. The values for 8, 32 and 64 robots are displayed with blue, red
and yellow lines respectively.
Limited-Range Coverage Control of Aerial Drones 341
Fig. 4. Average value of A(P) across all repetitions. The plots on the same line refer
to the same total area to cover, while the plots on the same column refer to cases with
same sensing range. The values for 8, 32 and 64 robots are displayed with blue, red
and yellow lines respectively.
Fig. 5. Average value of E(P) across all repetitions. The plots on the same line refer
to the same total area to cover, while the plots on the same column refer to cases with
same sensing range. The values for 8, 32 and 64 robots are displayed with blue, red
and yellow lines respectively.
the motor commands controls the drone so that its motion is as similar to (1)
as possible. However, if the requested velocity is too different from the current
velocity, the controller encounters acceleration limits that render it impossible
to closely track the requested velocity profile. With large Q and large sensing
range, the resulting Vi can be very large and their centroid can be far away
from the drone. Therefore, the input generated by the control law (3) is very
large, requiring accelerations outside the limitations of the drone, causing the
oscillating behavior depicted by the blue line. With a larger number of drones,
this behavior is limited or totally absent since each drone has a larger number
of neighbors, and therefore its resulting cell $V_i$ is smaller. This effect can be reduced
by reducing the value of $k_{prop}$ in (3) or by applying a saturation on the value
of $u_i$. Nevertheless, the coverage algorithm is still capable of spreading the drones
across the area.
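The saturation suggested above can be as simple as a norm clip on the commanded velocity; a minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def saturate(u, u_max):
    """Clip the commanded velocity to a maximum norm while preserving its
    direction, to avoid requesting accelerations beyond the drone's limits."""
    n = np.linalg.norm(u)
    return u if n <= u_max else u * (u_max / n)
```

Clipping the norm rather than each component keeps the drone heading toward the cell centroid, only slower, which is what the 2 m/s cap used in the field tests effectively does.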
Figure 4 shows how well the area is covered by the drones. When the sensing
range is small, the robots often struggle to cover the totality of Q. The cov-
ered area increases with the increase of the number of drones employed. This is
expected since, obviously, more robots imply more cells Vi and the behavior for
each drone is to distance itself from its neighbors, enlarging its cell Vi .
Figure 5 depicts the efficiency of the coverage. The plots clearly show that
fewer robots are more efficient. A value of 1 for E means that each robot has
reached the configuration where its cell Vi is the largest possible. With few robots
there are fewer neighbors to interact with, ultimately allowing for the maximum
enlargement of each cell. The condition for a team of robots not to reach maximum
efficiency is that the total area Q cannot accommodate all the robots' cells.
These considerations ultimately highlight some design choices to be made
when deploying a coverage application. A larger number of robots is not necessarily
better in all scenarios, even if it increases the coverage performance. Depending on
the sensing range, which is often dictated by external factors, a larger number
of robots can imply better area coverage, but a trade-off has to be considered
in order not to overly reduce efficiency.
We continue our analysis of the results with real field experiments involving
a group of seven drones. Results are presented in Fig. 6 where all the metrics
detailed in Sect. 2.4 are utilized. The first line of graphs describes the metric
H (P) for the different sensitivity range values taken into consideration. The
second and third lines describe A(P) and E(P) respectively.
Fig. 6. Average value of Eqs. (5), (6), (7) across all repetitions. The plots on the same
line display the same metric, while the plots on the same column refer to cases with
same sensing range.
Parallel to the simulated case, it can be seen how the metric H(P), first
row, blue line, increases as the value of the range increases. Unlike Fig. 5, it can
be seen that the convergence times of the optimization function in Eq. 5 are
longer. This is due to the cap on the maximum velocity of 2 m/s, introduced for
safety reasons and to avoid large overshoots with the real platform. Results for
values of r equal to 10, 50 and 100 show different convergence times. For r = 50
and r = 100 we note how the velocity cap is particularly detrimental, since the
vehicles are unable to reach the speed output by the proportional control law
and the optimization function requires more time to reach the steady state.
Figure 7 shows the initial and final conditions of the whole control process.
Coloured dots show the position of each drone in the group at the start (leftmost
figure) and at the end (rightmost figure) of the experiment. The circles surrounding
the dots represent the sensing range of the robots. In particular, by visual anal-
ysis of Fig. 7b, it can be noted that the final group configuration does not show
a compact and close formation. Indeed, while in perfect conditions the final
graph should have adjacent but non-overlapping sensing ranges—circles—this is
not the case here. By analysis of the data, it has been noticed that, during the
experiment, the GPS covariance matrix entries $\sigma^2_{x,x}$ and $\sigma^2_{y,y}$ ranged in the interval
[1.0, 1.4] m, while the values of the EPH, the standard deviation of the horizontal
position error, and the EPV, the standard deviation of the vertical position error,
oscillated between 0.6 and 0.95 m. These values, and the position uncertainty that derives
from them, justify the distance between the sensitivity ranges.
Fig. 7. On image (a) the initial configuration of the vehicles during a real experiment
with sensitivity range R = 50; on image (b), the final configuration of the drones in the
same experiment. On image (c) an aerial photo of the convergence process is shown,
with a different orientation of the axes.
4 Conclusions
effect on the final configuration but overall do not compromise the behavior of the
coverage control function.
Future work includes a more extensive study of the real experiments and
the investigation of an alternative control law. Indeed, in this work, the final
output velocity is generated by the proportional law in Eq. (1). In some
situations, as previously discussed, the generated velocity is too high (e.g., when
the point to reach is far from the drone's actual position) and does not respect
the physical constraints of the platform. This motivates a future investigation
of control strategies that take into account the dynamics of the drone, through
a more elaborate kinematic model.
References
1. Baca, T., et al.: The MRS UAV system: pushing the frontiers of reproducible
research, real-world deployment, and education with autonomous unmanned aerial
vehicles. J. Intell. Robot. Syst. 102(1), 1–28 (2021)
2. Bullo, F., Cortés, J., Martinez, S.: Distributed Control of Robotic Networks: A
Mathematical Approach to Motion Coordination Algorithms. Princeton University
Press, Princeton (2009)
3. Cao, M., Hadjicostis, C.: Distributed algorithms for Voronoi diagrams and appli-
cation in ad-hoc networks. Technical report UILU-ENG-03-2222, DC-210, UIUC
Coordinated Science Laboratory (2003)
4. Cortés, J., Egerstedt, M.: Coordinated control of multi-robot systems: a survey.
SICE J. Control Meas. Syst. Integr. 10(6), 495–503 (2017)
5. Cortes, J., Martinez, S., Bullo, F.: Spatially-distributed coverage optimization and
control with limited-range interactions. ESAIM Control Optim. Calculus Varia-
tions 11(4), 691–719 (2005)
6. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing
networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004)
7. Du, Q., Emelianenko, M., Ju, L.: Convergence of the Lloyd algorithm for computing
centroidal Voronoi tessellations. SIAM J. Numer. Anal. 44(1), 102–119 (2006)
8. Faessler, M., Franchi, A., Scaramuzza, D.: Differential flatness of quadrotor dynam-
ics subject to rotor drag for accurate tracking of high-speed trajectories. IEEE
Robot. Autom. Lett. 3(2), 620–626 (2018). https://doi.org/10.1109/LRA.2017.
2776353
9. Guruprasad, K., Dasgupta, P.: Distributed Voronoi partitioning for multi-robot
systems with limited range sensors. In: 2012 IEEE/RSJ International Conference
on Intelligent Robots and Systems, pp. 3546–3552. IEEE (2012)
10. He, C., Feng, Z., Ren, Z.: Distributed algorithm for Voronoi partition of wireless
sensor networks with a limited sensing range. Sensors 18(2), 446 (2018)
11. Laventall, K., Cortés, J.: Coverage control by multi-robot networks with limited-
range anisotropic sensory. Int. J. Control 82(6), 1113–1121 (2009)
12. Lee, T., Leok, M., McClamroch, N.H.: Geometric tracking control of a quadrotor
UAV on SE(3). In: 49th IEEE Conference on Decision and Control (CDC), pp.
5420–5425. IEEE (2010)
13. Parker, L.E.: Multi-robot team design for real-world applications. In: Asama, H.,
Fukuda, T., Arai, T., Endo, I. (eds.) Distributed Autonomous Robotic Systems
2, pp. 91–102. Springer, Tokyo (1996). https://doi.org/10.1007/978-4-431-66942-5_10
14. Pratissoli, F., Capelli, B., Sabattini, L.: On coverage control for limited range multi-
robot systems. In: IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS) (2022)
15. Schwager, M., McLurkin, J., Rus, D.: Distributed coverage control with sensory
feedback for networked robots. In: Robotics: Science and Systems, pp. 49–56 (2006)
16. Song, Y., Naji, S., Kaufmann, E., Loquercio, A., Scaramuzza, D.: Flightmare: a
flexible quadrotor simulator. In: Conference on Robot Learning (2020)
17. Teruel, E., Aragues, R., López-Nicolás, G.: A practical method to cover evenly a
dynamic region with a swarm. IEEE Robot. Autom. Lett. 6(2), 1359–1366 (2021)
MARLAS: Multi Agent Reinforcement
Learning for Cooperated Adaptive
Sampling
lishuo pan@brown.edu
3 Plaksha University, Mohali, India
1 Introduction
We show that our method is scalable with both the size of the robot team and the
size of the region being sampled, and achieves equitable task distribution among
robots. We further demonstrate that our approach is robust to both communication
failures and individual robot failures.
2 Problem Formulation
Fig. 1. (a) Overview of MARLAS algorithm, (b) coverage paths generated by one of
the area division approaches (DARP [15]), and (c) an example sampling run of the
proposed approach (MARLAS).
task is to solve the optimization problem to find a set of parameters θ ∗ for the
neural network.
Multi-robot Learning Formulation. In our decentralized multi-agent system
setting, individual agents do not have the direct access to the ground truth
information, and need to trade-off between exploiting the current information
and communicating with neighbors to solve the task in a cooperative manner. We
frame the task as a decentralized partially observable Markov Decision Process
(Dec-POMDP) represented by the tuple (I, S, {A i }, {Z i }, T, R, γ). Here, I is the
set of robots, S is a set of states, and A i is the set of discrete actions of robot r i. Z i is a set of
possible observations of robot r i. T models the transition function; R and γ are
the reward function and the discount factor. Optimal solution of the Dec-POMDP
using dynamic programming is provably intractable as the time horizon, the number
of robots, and the size of the region increase [17]. In this work, we train a neural
network in a multi-agent reinforcement learning (MARL) setting to approximate
the underlying optimal decentralized policy.
3 Approach
where $\phi(s^i_t)$ represents the spatial feature aggregation vector. At each time step,
robots sample actions from the stochastic policy and move to their next positions
following the transition function T.
To avoid collisions with other robots in their neighborhood, robots employ
a potential-field-based obstacle avoidance strategy similar to [18]. It is assumed
that collision avoidance is only activated within a distance d0 from the closest
neighboring robot. We further quantize the repulsive force synthesized by the
collision avoidance strategy by maximizing its alignment with an action from
the discrete set A. For robots whose collision avoidance is active, the actions
generated by the learned policy are replaced with this repulsive action. Additionally,
robots generate random actions when a collision occurs.
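One possible reading of this avoidance scheme, under the assumption of an 8-direction grid action set A, is sketched below; the action set, the threshold handling, and all names are illustrative, not the paper's implementation:

```python
import numpy as np

# Hypothetical discrete action set A: 8 compass directions on the grid.
ACTIONS = [np.array(a) for a in
           [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]]

def quantized_repulsion(pos, neighbors, d0):
    """If the closest neighbor is within d0, build a repulsive force away
    from it and return the discrete action in A best aligned with that
    force; otherwise return None (avoidance not activated)."""
    if not neighbors:
        return None
    dists = [np.linalg.norm(pos - n) for n in neighbors]
    j = int(np.argmin(dists))
    if dists[j] >= d0:
        return None
    force = (pos - neighbors[j]) / (dists[j] + 1e-9)   # unit push-away vector
    # quantization: pick the action with maximum alignment (dot product)
    scores = [force @ (a / np.linalg.norm(a)) for a in ACTIONS]
    return ACTIONS[int(np.argmax(scores))]
```

Quantizing by maximum alignment keeps the override consistent with the policy's discrete action space, so the avoidance action can simply replace the sampled one.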
Spatial Feature Aggregation. In our previous work [3,19], we proposed and
evaluated various feature aggregation methods for adaptive sampling. Multires-
olution feature aggregation [3] has been empirically proven to outperform other
methods, as it reduces the state dimension and utilizes the geometry induced
bias for the sampling task. In multiresolution feature aggregation, we divide
the current belief distribution of the reward map $\hat{F}^i_t$ into regions with different
resolutions according to their distance to each robot. Feature regions closer to
a robot have higher resolution, and resolution decreases farther away from the
robot. Since rewards at locations visited by other robots are removed, each robot
must estimate its neighbors' positions at any given time. We utilize each robot's
belief estimates of the other robots' positions, $\hat{x}^i_j$, and the belief estimate of the
previous reward map $\hat{F}^i_{t-l}$ when updating the belief of the current reward map $\hat{F}^i_t$,
in the form

$$\hat{F}^i_t(x_1, x_2) = \hat{F}^i_{t-l}(x_1, x_2)\Big[J_{h,w} - \sum_{k=t-l+1}^{t}\sum_{j} \hat{x}^i_{j,k}(x_1, x_2)\Big]_+, \quad \text{for all } (x_1, x_2).$$

Here, $J_{h,w}$ is an all-ones matrix of the size of the reward map, $\hat{x}^i_{j,k}$ is the discrete
belief distribution of the neighbor positions, with values between 0 and 1 in
each grid cell, and the operation $[\cdot]_+$ sets negative values to 0. Notice that
direct communication between robots updates $\hat{x}^i_{j,t-l+1:t}$; thus we recompute
the summation over the last l time steps of the belief distributions of the neighbor
positions. Here, the belief distributions of the reward maps $\hat{F}^i_{(t-l+1):t}$ follow the Markov
property.
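Assuming grid-shaped numpy arrays for the stale reward map and for the (already summed, per step and per neighbor) position beliefs, the update above might look like the following sketch:

```python
import numpy as np

def update_reward_belief(F_prev, neighbor_beliefs):
    """Gate the stale reward map with an all-ones mask minus the summed
    neighbor-position belief mass over the last l steps, clipping negative
    entries to zero (the [.]_+ operation), as in the belief update above."""
    J = np.ones_like(F_prev)            # J_{h,w}, all-ones matrix
    visited = sum(neighbor_beliefs)     # sum over time steps k and neighbors j
    return F_prev * np.clip(J - visited, 0.0, None)
```

Cells a neighbor has likely visited get their remaining belief mass suppressed toward zero, reflecting that the reward there has probably been consumed.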
Multi-robot Learning in Adaptive Sampling. As mentioned in Sect. 2, solv-
ing a Dec-POMDP optimally is provably intractable in our task. Therefore, we
use the belief model to maintain belief distributions of the ground truth and
employ a neural network to approximate the optimal decentralized policy π ∗
and we use policy gradient methods to search the optimal policy on the neu-
ral network parameter space. To encourage cooperation, we adapt the central-
ized training with decentralized execution (CTDE) technique commonly used for
multi-agent policy training. More specifically, we compute a centralized reward
as the average of all the robots’ individual rewards, using global reward func-
tion R(·), state s and action a information. Thus, the centralized expected return
MARLAS: Multi Agent Reinforcement Learning 353
is given by $J_{tot}(\pi_\theta) = \mathbb{E}_{\tau\sim\pi_\theta}\big[\frac{1}{N}\sum_{t=1}^{H}\gamma^t\sum_{i=1}^{N} R(s^i_t, a^i_t) \,\big|\, s_0\big]$. The control policy $\pi_\theta$ is
decentralized in both training and execution.
To optimize θ directly, we use the policy gradient, which gives the gradient of
the expected return in the form $\nabla J(\pi_\theta) \propto \mathbb{E}_{\tau\sim\pi_\theta}[G_t \cdot \nabla \log \pi_\theta(a_t|s_t)]$, where
$G_t = \sum_{k=t+1}^{T} \gamma^{k-t-1} R(s^i_k, a^i_k)$. Action $a_t$ and state $s_t$ are drawn from τ. To estimate
$G_t$, we adapt the Monte Carlo based REINFORCE method by generating
trajectories τ following $\pi_\theta$. For each training episode of MARLAS, we initialize
robots randomly in E. We assume each robot knows the initial positions of all
the other robots. Each robot is given the control policy with the most updated
parameters θ. At time step t, each robot $r^i$ randomly samples action $a^i_t$ from
$\pi_\theta(a^i_t|s^i_t)$. Robot $r^i$ receives a reward $F(x^i)$ by sampling data at location $x^i$. Once
the grid cell $x^i$ is visited, $F(x^i)$ is set to a negative value indicating the data
has been consumed at this location. If multiple robots enter the same grid cell
simultaneously, the reward is evenly distributed among the robots in the cell,
and there is a collision penalty for each robot in the same cell. The reward
function is given by
$$R(s^i, a^i) = \frac{F(x^i)}{c} + \beta_{col} \cdot \mathbb{I}(x^i = x^j), \quad \forall j \in I, \qquad (4)$$

where c is the total number of robots in the grid cell of $x^i$, $\mathbb{I}$ is the indicator function,
which indicates a collision between robots, and $\beta_{col} < 0$ is the collision penalty.
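A direct transcription of the reward in Eq. (4) for a single robot, with c the number of robots in the same cell, is sketched below (function name and default penalty are illustrative):

```python
def sampling_reward(F_x, n_in_cell, beta_col=-1.0):
    """Reward of Eq. (4) for one robot: the cell reward F(x) is split
    evenly among the c robots in the cell, and a collision penalty
    beta_col < 0 is added whenever more than one robot occupies it."""
    r = F_x / n_in_cell
    if n_in_cell > 1:          # indicator I(x^i = x^j) for some j != i
        r += beta_col
    return r
```

Splitting the reward while penalizing co-location is what discourages robots from chasing the same information-rich cell.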
Each robot generates τ by executing the control policy up to the time horizon H.
Furthermore, we generate M trajectories for each robot as one training batch.
We use REINFORCE with a baseline to update the parameters θ as follows:

$$\theta_{k+1} = \theta_k + \alpha \cdot \frac{1}{N \cdot M} \sum_{i=1}^{N}\sum_{m=1}^{M}\sum_{t=0}^{H-1} (G^{i,m}_t - b^i_t)\,\nabla \log \pi_\theta(a^{i,m}_t | s^{i,m}_t), \qquad (5)$$

with the baseline defined as $b^i_t = \frac{1}{M}\sum_{m=1}^{M} G^{i,m}_t$. Superscripts i, m are the indices
of the robots and the trajectories. Note that we draw actions according to the
policy distribution during training to enhance exploration. During deployment,
we draw the action that has the maximum probability.
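The update in Eq. (5) can be sketched for a single robot (N = 1) in numpy, assuming precomputed per-step rewards and score-function gradients ∇log π_θ; the array shapes and names are our assumption, not the paper's code:

```python
import numpy as np

def reinforce_update(theta, rewards, grad_logp, alpha, gamma):
    """One REINFORCE-with-baseline step, Eq. (5), sketched for one robot.
    rewards: (M, H) and grad_logp: (M, H, dim_theta) for M sampled
    trajectories of horizon H; baseline b_t is the per-step mean return."""
    M, H = rewards.shape
    G = np.zeros((M, H))
    running = np.zeros(M)
    for t in range(H - 1, -1, -1):          # discounted return-to-go G_t
        running = rewards[:, t] + gamma * running
        G[:, t] = running
    b = G.mean(axis=0)                      # baseline b_t = (1/M) sum_m G_t^m
    adv = G - b                             # (G_t^{m} - b_t)
    step = (adv[:, :, None] * grad_logp).sum(axis=(0, 1)) / M
    return theta + alpha * step
```

When all sampled trajectories earn identical returns, the advantage is zero and the parameters are left unchanged, which is exactly the variance-reduction role of the baseline.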
Fig. 2. Test datasets we used in evaluation, (a) real bathymetry data from Barba-
dos, (b) synthetic cshape distribution, (c) reaction diffusion simulation [21], and (d) a
satellite image of a coral reef.
learned policies can adapt to diverse fields during the test phase as presented in
our results. The results presented in this Section show that the learned control
policies outperform existing baseline methods. We further evaluate scalability of
MARLAS for different robot team and workspace sizes, and the robustness of
MARLAS to communication failures and robot failures.
Testing Datasets: For our testing phase, we consider a set of diverse environ-
ments generated both by real-world data and synthetic simulations, as illustrated
in Fig. 2.
Performance Metrics: Many applications in the domain of robotic sampling
need to sample information-rich locations as early as possible, due to the limited
endurance of the robots carrying out the survey and because the environmental
processes being sampled are quasi-static. Hence, we measure the discounted
accumulated reward as one of the performance metrics. The standard deviation of the
discounted accumulated reward provides a measure of task sharing between the robots:
a lower value implies that the robots contribute more equally to the overall task.
Average pair-wise overlap measures the total pair-wise overlap between the robot
trajectories averaged by the number of all possible neighbors; it reflects the
performance of a multi-robot sampling approach in terms of coordination and task
allocation. Average overlap percentage measures the percentage of the averaged
pair-wise overlap in the robot trajectories and is computed as the ratio between
the average pair-wise overlap and the number of steps the robots have traveled.
Coverage measures the completeness of the active sampling mission, calculated
as the ratio between the accumulated reward and the maximum total reward
possible in the given time horizon H. Average communication volume is a measure
of the amount of communication between robots, given by $\frac{1}{N}\sum_i\sum_t |N^i_t|$,
where $N^i_t$ is the set of neighboring robots of robot $r^i$ at time t.
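The average communication volume reduces to a simple count; the sketch below assumes the neighbor sets are stored per robot and per time step (the data layout is our choice):

```python
def avg_comm_volume(neighbor_sets):
    """Average communication volume (1/N) * sum_i sum_t |N_t^i|, where
    neighbor_sets[i][t] is the set of neighbors of robot i at time t."""
    N = len(neighbor_sets)
    return sum(len(s) for per_robot in neighbor_sets for s in per_robot) / N
```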
Fig. 3. Comparison with baseline methods: Column (a) illustrates trajectories of the
deployed robot team. Circles are the deployment location, and triangles are the stop
location. Column (b) presents comparisons of discounted accumulated reward. Column
(c) presents comparisons of average pair-wise overlap. The 95% confidence intervals are
over 40 trials.
non-overlapping utility-maximizing paths. Even though the average overlap for
MARLAS with l = 2 is high, the neighbor-position-estimate-embedded learning
helps it achieve higher discounted rewards than the baselines. MARLAS is
stochastic, because the robots are initialized in the same grid cell and select
random actions to avoid collisions; nevertheless, its metrics have a tight confidence
interval.
Fig. 4. Deployment of robot teams over reef data for 150 time duration. The first row
presents robot trajectories (in different colors) as the team size is increased from 2 to
15. Circles are the deployment location, and triangles are the stop location. Second row
presents sequential progress of a sampling run with 15 robots at time steps 10, 40, 80
and 150. Circles represent the robot current positions.
Scalability with the Team Size: In this scenario, we increase the number of
robots from 2 to 20 and fix the sensing and communication radius to 10 grid cells.
We scale the map side by a factor of √n to maintain a constant density of robots
for different sizes of robot teams. For example, a map of size 26 × 26 is used for a
2-robot experiment, while a map of size 40 × 40 is used for a 5-robot experiment.
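This scaling rule can be sketched as follows; the function name, base values, and rounding convention are assumptions for illustration (the paper reports a 40 × 40 map for 5 robots, close to the 41 this rounding gives):

```python
import math

def scaled_map_side(n_robots, base_side=26, base_team=2):
    """Scale the map side by sqrt(n / base_team) so that robot density
    (robots per cell) stays roughly constant across team sizes.
    base_side and base_team follow the 26 x 26, 2-robot reference case."""
    return round(base_side * math.sqrt(n_robots / base_team))
```

With this rule, 2 robots keep the 26 × 26 reference map, 5 robots get a side of about 41, and 20 robots a side of about 82, on the order of the 80 × 80 map used in the largest experiment.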
Robots start exploring from the upper left corner of the map, as illustrated in
Fig. 4 so that they have the same opportunity to accumulate rewards. Qualitative
results presented in the first row of Fig. 4 show the robot trajectories as the team
size is increased from 2 to 15. The plots in the second row of Fig. 4 illustrate the
sequential progress of a sampling run with 15 robots at time steps 10, 40, 80 and
150. We observe that the robots quickly spread out, identify, and cover the dense
reward regions. Furthermore, the robots spread to different regions and are able
to perform the sampling task in a coordinated manner.
The quantitative results in Fig. 5 illustrate that the standard deviation of
the discounted accumulated rewards starts close to 0 and increases sub-linearly
as the robot team size increases. The increase in the standard deviation values
is expected, as the discounted accumulated rewards also increase with team
size. We further notice that all the standard deviation values are negligible
compared to the discounted accumulated rewards. This indicates that the
collected rewards are distributed evenly among the individual robots. Together
with the qualitative results, we conclude that each robot continues to contribute
equally to the sampling task as we scale up the team size. The majority of the
coverage metrics in Fig. 5(b) remain above 70% even when the number of robots
is scaled up to 20 on an 80 × 80 map, which is far bigger than the training setup.
The decrease in coverage is not surprising, as the robots need to travel longer
distances between the dense reward regions as the map size increases.
Fig. 5. Quantitative results for scaling the team size. The dots represent each of the
40 trials.
Fig. 6. (a) and (b) Robot team with 5 robots carrying out the sampling task with dcr =
0%Dmax and dcr = 100%Dmax using the learned policy. Circles are the deployment
location, and triangles are the stop location. (c) Quantitative results include plots of
discounted accumulated reward, average pair-wise overlap, and average communication
volume over different communication radii. The 95% confidence intervals are computed
over 40 trials.
The plots in Fig. 6(c) illustrate that the discounted accumulated reward
increases and the average pair-wise overlap decreases, and both quickly plateau
at dcr = 30%Dmax . The average communication volume increases linearly and
slows its rate after dcr = 60%Dmax . We observed consistent behaviors over all
the other test datasets. We conclude that, with a moderate range of sensing and
communication, the cooperative behavior is achieved by the team, and increas-
ing the sensing and communication radius yields no significant contribution to
its performance.
scenario with the global communication. This means that the estimates of the
neighbor positions serve as good substitutes for the ground truth information.
The scenario with both the communication failure and the neighbor position
estimation failure performs poorly compared to all the other scenarios, showing
that the proposed belief model compensates for complete communication fail-
ures, thus increasing the robustness of MARLAS. We observed consistent results
over all the test datasets.
Fig. 7. Robustness to failures: The failures happen at time step 20, denoted as the
dashed line. The 95% confidence intervals are computed over 40 trials. (c) Snapshots
of robot failure experiments.
Fig. 8. Online adaptation: Three sequential time steps of a changing field and MAR-
LAS generating robot paths that adaptively cover the underlying distribution. Circles
represent the start locations, and triangles represent the stop locations.
360 L. Pan et al.
Fig. 9. (a) Indoor water tank experimental setup; the embedded picture shows the
internals of a mini robotic boat. (b) MARLAS policies run on mini robotic boats to
sample from the spatial field that is projected onto the water surface.
Proportional Control for Stochastic
Regulation on Allocation of Multi-robots
1 Introduction
Modeling an ensemble of robots as an aggregate dynamical system offers flexi-
bility in task assignment and is an alternative to traditional bottom-up task
allocation methods, which are usually computationally expensive (see [3,5] and
references therein). Given a desired allocation of robots to tasks, each robot
must navigate, handle dynamic constraints, and interact with the environment
to achieve the desired allocation while meeting some desired team-level perfor-
mance specifications. It is well understood that uncertainties resulting from the
robot’s interactions with the environment, execution of its assigned tasks, and
noise from its own sensors and actuators might lead to several issues during task
allocation (e.g., performance loss and inaccuracy) [13,14]. This is because alloca-
tions are often computed prior to execution and do not account for uncertainties
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 363–377, 2024.
https://doi.org/10.1007/978-3-031-51497-5_26
364 T. C. Silva et al.
While they highlighted the importance of considering variance control, since the
mean can be a sensitive control objective, they do not propose a methodology
to adjust the variance.
In this paper, we present a strategy to govern the mean and variance of
the distribution of robots across a set of tasks. The contributions of this work
include 1) a decoupled formulation of the first- and second-order moments for
an ensemble model of robot teams that allows their adjustment individually,
and 2) systematic methods to determine the control gains to achieve the desired
distributions of robots over tasks. Our experimental results indicate that our
proposed macroscopic techniques allow for improved control of teams with a
small number of robots.
2 Problem Formulation
Assumption 1. The graph that maps the relationship between tasks, G(V, E),
is connected and undirected.
Assumption 1 allows the robots to sequentially move back and forth between
tasks. For some task topologies, it is possible to adjust the transition parameters
such that directionality in the task graph is obtained [2].
To derive the macroscopic ensemble model for the N-robot team simultane-
ously executing M tasks, we assume each robot can execute only one task at a
given time and must switch between adjacent tasks to complete the team-wide
objective. The model needs to capture the global stochastic characteristics
that emerge from a robot's interaction with other robots and the environ-
ment, such as variations in terrain [12,17], collision avoidance [11], and delays
[1]. Thus, the state of our system is given by a vector of random variables
X(t) = [X1 (t) · · · XM (t)]′, where the prime symbol on a vector, a′, denotes its
transpose, and each element of X(t) describes the number of robots executing
the respective task at time t; therefore X1 (t) + · · · + XM (t) = N for any t ≥ 0.
We aim to characterize the evolution of the random variable Xi (t), for each
i ∈ {1, . . . , M } and, ultimately, regulate its value. However, the stochastic nature
of these variables might lead to convergence only to a vicinity of the desired
value, where the size of the convergence region depends on robot-level deviations
from the desired performance [9]. In principle, the control of systems subject to
random variations could be approached by applying robust control techniques
and defining input-to-state stability criteria. However, in our scenario, defining
bounds for variations of robot-level behavior for a general number of tasks and
robots can be difficult. Therefore, we approach the problem by controlling the
mean value of the state variables Xi (t) and the dynamics of their second-order
moments. Observe that the vicinity size of the desired objective is directly asso-
ciated with the second-order moment.
The states of our system characterize pure jump processes Xi (t) : [0, ∞) → D,
where D is a finite discrete set. In particular, let ψ : D × [0, ∞) → R be a real-
valued function; it is a known result that the dynamics of the expected value of
ψ are described by the following relation (see Chapter 1 in [16]),

d/dt E[ψ(Xi (t))] = E[Lψ(Xi (t))],
in which L is the infinitesimal generator of the stochastic process, defined as

Lψ(Xi (t)) = Σ_ℓ [ψ(φ_ℓ (Xi (t))) − ψ(Xi (t))] λ_ℓ (Xi (t)), (1)

where λ_ℓ (Xi (t)) is a function of the number of robots at task i and in its neigh-
borhood which describes the rate of transitions of type ℓ, i.e., λ_ℓ (Xi (t))dt represents
the probability of a type-ℓ transition in Xi (t) occurring in dt time, and φ_ℓ (Xi (t))
maps the size of the change in Xi (t) given that a transition happened. In our case,
we assume two possible unitary changes in dt: a transition of type ℓ = 1, in which
one agent leaves, defined by φ1 (Xi (t)) = Xi (t) − 1, and of type ℓ = 2, in which one
agent arrives, defined by φ2 (Xi (t)) = Xi (t) + 1. Hence, ℓ ∈ {1, 2} indexes the type
of transition, i.e., one robot leaving or one robot arriving at task i, respectively.
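As a short worked step, applying the generator (1) with the identity test function ψ(Xi) = Xi collapses the sum over the two transition types to a difference of rates; this is the step used later to obtain the first-order moment dynamics:

```latex
% Test function \psi(x) = x; transition types \ell \in \{1, 2\}:
\mathcal{L} X_i = \sum_{\ell \in \{1,2\}} \left[\psi(\phi_\ell(X_i)) - \psi(X_i)\right] \lambda_\ell(X_i)
              = (X_i - 1 - X_i)\,\lambda_1(X_i) + (X_i + 1 - X_i)\,\lambda_2(X_i)
              = \lambda_2(X_i) - \lambda_1(X_i),
% hence  \frac{d}{dt} E[X_i(t)] = E[\lambda_2(X_i(t))] - E[\lambda_1(X_i(t))].
```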
In this paper, we are interested in enforcing desired first- and second-order
moments of the distribution of robots across tasks. Specifically, the problem
investigated can be stated as:
[10], the proposed control strategy relies on a feedback linearization action, which
leads to a closed-loop system of the form (see equation (9) in [11])
d/dt E[X(t)] = (Kα + Kβ )E[X(t)],
where Kα is a matrix that maps the transition rates of the system and Kβ is a
matrix with new parameters to attain the desired variance. The main drawback
of this approach is twofold: i) by changing Kβ the stationary distribution of
the equivalent Markov chain also changes, which makes it necessary to readjust
the gains for the whole system, and ii) addressing the regulation of second-
order moments using Kβ is equivalent to changing the transition rate of the
whole process during the entire evolution. In comparison, our proposed strategy
overcomes these limitations by simply manipulating the graph structure and
defining a new structure for λ_ℓ (·), in such a way that the new feedback strategy
does not directly modify the stationary distribution of the equivalent Markov
chain and changes the transition rates according to the current state of the
system. We describe our methodology in the following section.
3 Methodology
3.1 Control Strategy
We model the arrival and departure of robots by pure stochastic jump processes.
For example, for two tasks,

X1 (t) ⟷ X2 (t), with rates λ_ℓ (·), (2)

the random variables X1 (t) and X2 (t) represent the number of robots in each task
at time t, and λ_ℓ (Xi ) for ℓ ∈ {1, 2} maps the transition rate of agents between
tasks. Motivated by Klavins [10] on gene regulatory networks, we approach Prob-
lem 1 by defining the transition rate functions with a term that is proportional
to the number of agents on a given site and in its neighborhood. In Eq. (2)
this choice means that switching from task 1 to 2 and from 2 to 1 will depend
on the number of agents at both tasks. In particular, for Eq. (1) we define the
transition rates as

λ1 (Xi ) = Σ_{j∈Ni} (kij Xi − βi Xi Xj ), (3)

λ2 (Xi ) = Σ_{j∈Ni} (kji Xj − βi Xi Xj ). (4)
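A minimal Gillespie-style simulation of the two-task process (2), with rates patterned on (3) and (4), could look like the sketch below; all parameter values, and the clipping of negative rates (mirroring the non-negativity caveat of Remark 1), are illustrative assumptions:

```python
import random

def simulate_two_tasks(N=20, k12=1.0, k21=1.0, beta=0.02, t_end=10.0, seed=0):
    """Gillespie-style simulation of a two-task jump process: each rate has a
    linear term proportional to the local population minus a beta-weighted
    product term, as in (3)-(4). Parameters are illustrative, not the paper's."""
    rng = random.Random(seed)
    x1, t = N, 0.0                # all robots start at task 1
    history = [(t, x1)]
    while t < t_end:
        x2 = N - x1               # population is conserved
        # Rates clipped at zero so they remain valid transition rates.
        rate_12 = max(k12 * x1 - beta * x1 * x2, 0.0)  # one robot leaves task 1
        rate_21 = max(k21 * x2 - beta * x1 * x2, 0.0)  # one robot arrives at task 1
        total = rate_12 + rate_21
        if total == 0.0:
            break
        t += rng.expovariate(total)               # exponential waiting time
        if rng.random() < rate_12 / total:
            x1 -= 1
        else:
            x1 += 1
        history.append((t, x1))
    return history
```

Because a rate vanishes when its source population hits zero, the sampled trajectory stays within 0 ≤ X1(t) ≤ N by construction.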
Remark 1. A necessary condition for the equations in (3) and (4) to be valid
transition rates is that they be non-negative. While it is possible to guarantee
positivity during the whole evolution for some well-designed initial conditions
with careful choices of gains, we instead let robots transition back to previous
tasks before the team reaches steady state by mapping the equivalent non-
negative transition with reversed direction. This flexibility is the justification
for Assumption 1.
It is worth noticing that our control strategy is distinct from the one presented
in [10]. Here, we only require the transition rate to be adjusted according to the
current number of agents in the tasks and in their neighborhood, while in [10]
it is assumed that each node in the network has a source and a sink, which
implies that the value of the random variable Xi can increase or decrease
independently of the variation in its neighbors, in addition to the switching
between adjacent nodes.
In the following, we show that with our choice for the transition rates λ_ℓ (Xi ),
the first-order moment depends only on the parameters kij , while the variables βi
appear only in the second-order moments. This introduces a free variable, allowing
the variance to be adjusted independently.
Defining ψ(Xi ) = Xi and applying the infinitesimal generator (1) with the
transition rates in (3) and (4), we get the following first-order moment for a
general arrangement,
d/dt E[Xi ] = E[ Σ_{j∈Ni} (kji Xj − βi Xi Xj ) − Σ_{j∈Ni} (kij Xi − βi Xj Xi ) ]
            = Σ_{j∈Ni} (kji E[Xj ] − kij E[Xi ]),
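The resulting linear mean dynamics can be checked numerically with a simple forward-Euler sketch; the function name, edge encoding, and parameter values are illustrative assumptions:

```python
def mean_dynamics(x0, k, t_end=20.0, dt=1e-3):
    """Forward-Euler integration of d/dt E[X_i] = sum_j (k_ji E[X_j] - k_ij E[X_i]).
    k maps (i, j) -> k_ij over the directed edges of the task graph."""
    M = len(x0)
    x = list(x0)
    for _ in range(int(t_end / dt)):
        dx = [0.0] * M
        for (i, j), kij in k.items():
            dx[i] -= kij * x[i]   # outflow term  -k_ij E[X_i]
            dx[j] += kij * x[i]   # matching inflow at task j (total is conserved)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x
```

For two tasks with rates k01 = 2 and k10 = 1 and 30 robots starting at task 0, the mean settles at the stationary split [10, 20] dictated by the rate ratio, while the total stays at N.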
where ei , for i = 1, . . . , M, are the canonical basis vectors of R^M and ⊗ denotes
the Kronecker product. Eqs. (6) and (7) model the dynamics of the first- and
second-order moments of a group of N robots executing M tasks. There are two
important underlying assumptions in this representation. First, we assume that
the times at which robots leave and arrive at a task follow a Poisson process. We
believe this assumption is not too restrictive since, as discussed above, robot-
level deviations from desired performance will impact the scheduling, leading to
stochastic jump processes. The second assumption, and arguably the hardest to
justify for general systems, is that we can actuate on the system's transition
rates. To do this in our case, given the structure of λ_ℓ (·) in (3) and (4), we need
to monitor the current number of agents at each task and its neighborhood
and then communicate this value to each agent at these sites to compute the
switching time. Dynamically defining the transition rates has an intricate rela-
tionship with the allocated time and microscopic deviations. We expect that the
feedback nature of our approach accounts for microscopic deviations from the
desired performance in the long run.
initial distribution, with transition rates defined in (3) and (4), if and only if
K E[Xd ] = 0.
Proof. The demonstration follows from standard Lyapunov analysis. Note that,
by definition, the eigenvalues of matrix K have negative real part, except for
one which is zero, that is, 0 = σ1 (K) > σ2 (K) ≥ · · · ≥ σM (K), where σi (K)
denotes the ith eigenvalue of K. Hence, define a simple Lyapunov function
candidate V = E[X]′PE[X], where P is a symmetric positive definite matrix.
The time derivative of V along the trajectories of the first-order moments yields

dV/dt = E[X]′(K′P + PK)E[X];

because every eigenvalue of K has negative or zero real part, we have that

dV/dt ≤ 0.

Consider the invariant set S = {E[X] ∈ R^M : dV/dt = 0}. Since the robots will
always be in one of the graph's nodes, i.e., Σ_{i=1}^{M} Xi = N, we have that
E[X(t)] ≠ 0 for all t > 0. Therefore, no solution can stay in S other than E[Xd ];
in other words, the equilibrium point E[Xd ] is globally asymptotically stable.
Notice that we assumed that each robot is always assigned to one task at any
instant; to consider robots in transit between nodes, it is possible to extend the
task graph as in [1]. In addition, by construction, K results in a stable ensemble.
The compelling aspect of Theorem 1 with respect to our model for the first- and
second-order moments, Eqs. (6) and (7), is that we can use the condition provided
in the theorem to design the gains to attain a desired distribution, while still
having free variables βi to adjust the second-order terms.
We compute the gain matrix K by solving the following optimization:

K = argmin_Q ‖Q E[Xd ]‖ (8)
Since every eigenvalue of K has negative real part except for one which is equal
to zero, and E[Xd ] ≠ 0, we have a unique solution for C [9,10]. Therefore,
the additional tuning variables βi , for i ∈ V, are inversely proportional to the
covariance at steady state: the bigger the values of βi , while (3) and (4) remain
positive during steady state (recalling Remark 1), the smaller the covariance.
4 Results
4.1 Numerical Simulations
Example 1. We use the stochastic simulation algorithm proposed by Gillespie [6]
to evaluate the proposed control strategy. We present a four-task example, M = 4,
with the topology represented in Fig. 1 (a). We run two different tests to evaluate
the effectiveness of our method, one with parameters βi = 0, for i = 1, . . . , 4, and
another with nonzero βi parameters (values given below). In both cases, we aim
to achieve the same final distribution. The initial populations at each task are
given by X 0 = [5 15 5 5], and the desired distribution is E[X d ] = [13 9 6 2].
The gain matrix K used to reach this distribution was computed by solving the
optimization problem (8) with K = {K ∈ R4×4 : K ≤ −1.5I}; this constraint was
imposed to avoid excessively long convergence times. The resulting parameters from
the optimization are k12 = 2.1, k14 = 1.4, k21 = 1.5, k23 = 1.3, k32 = 0.9, k34 = 1.2,
k41 = 0.1, k43 = 0.6, and 0 elsewhere off the main diagonal. We computed
the mean and variance from the simulation considering 130 data points sampled
at t > 2.0 s in each simulation. This number was chosen to match the
sample size between the two experiments. The simulation with βi = 0 for all i =
1, . . . , 4 in Fig. 1 (b) has the following mean, E[X] = [12.16 9.76 5.18 2.77], and
Fig. 1. The graph topology is depicted in (a). In (b) and (c), a realization of each of
the stochastic simulations is shown, where the dots represent the number of agents and
the solid lines represent the numerical solution of (6). The tasks are identified by the
color caption. The realization in (b) uses βi = 0, ∀i ∈ V, while (c) uses β1 = 0.05,
β2 = 0.20, β3 = 0.11, and β4 = 0.052.
variances diag(E[XX′]) = [5.78 6.83 4.20 1.44]. The variances were improved
by greedily modifying the βi values: if the variance was reduced and did not actively
disturb another task's population variance, then that value was kept. Figure 1 (c)
used parameters β1 = 0.05, β2 = 0.20, β3 = 0.11, and β4 = 0.052, where the
resulting means are E[X] = [12.26 9.30 5.83 2.60] and the variances are
diag(E[XX′]) = [1.06 1.12 1.15 0.45].
These results highlight the impact of variability on small teams (N = 30).
Visually, the control reduces the variance throughout the experiment, especially
once steady state is reached. It is worth mentioning that we have tested our
strategy on larger teams and the results agreed with our hypothesis that it is
possible to shape the variance using (3) and (4); however, due to space limitations
of the venue, we chose not to show those results.
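As a sanity check of the design condition K E[Xd] = 0 against the gains reported in Example 1, the sketch below assembles K under one assumed convention (the off-diagonal entry kij placed in row i, column j, with diagonals making each column sum to zero, a generator-matrix convention); with the gains rounded to one decimal in the text, the residual is small but nonzero:

```python
# Gains reported in Example 1 (rounded to one decimal in the text).
gains = {(1, 2): 2.1, (1, 4): 1.4, (2, 1): 1.5, (2, 3): 1.3,
         (3, 2): 0.9, (3, 4): 1.2, (4, 1): 0.1, (4, 3): 0.6}
xd = [13.0, 9.0, 6.0, 2.0]        # desired distribution E[X_d]

M = 4
K = [[0.0] * M for _ in range(M)]
for (i, j), kij in gains.items():
    K[i - 1][j - 1] = kij          # assumed convention: k_ij in row i, column j
    K[j - 1][j - 1] -= kij         # diagonal makes each column sum to zero

# Residual of the stationarity condition; small because the gains are rounded.
residual = [sum(K[r][c] * xd[c] for c in range(M)) for r in range(M)]
print(residual)
```

The subscript convention is an assumption made here for illustration; under it, the infinity norm of the residual stays below 1 for the one-decimal gains, consistent with K E[Xd] ≈ 0 up to rounding.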
Example 2. In this example, we run 6 different trials with varying team sizes to
numerically evaluate the variation in accuracy for teams of different sizes. To
this end, we consider four tasks with the topology represented in Fig. 1 (a), and we
computed the variance considering 160 data points sampled from each simulation
at t > 2.0 s. In addition, we vary the total number of robots between simulations;
hence the initial and desired distributions for each simulation, given as a percentage
of the total number of robots in the team, are X 0 = [25% 25% 0% 50%] and
E[X d ] = [50% 50% 0% 0%], respectively. In these simulations, we used the same
values for the parameters βi as in Example 1, and we do not change them between
simulations. To provide an intuition of the level of spread in each scenario, we
compute the Relative Variance (RV) as the quotient of the variance by the mean
of the respective task. The results are presented in Table 1. We notice that for
this scenario, the RV for the multi-robot system
Table 1.

           βi = 0 for i = 1, . . . , 4      βi ≠ 0 for i = 1, . . . , 4
           N = 52   N = 26   N = 16        N = 52   N = 26   N = 16
E[X 1 ]     24.8     12.8      8.2          25.2     13.3      7.5
E[X 2 ]     26.5     12.7      7.7          25.6     14.6      8.3
E[X 3 ]      0.5      0.0      0.0           0.6      0.0      0.1
E[X 4 ]      0.2      0.0      0.0           0.4      0.0      0.0
RV(X 1 )    0.49     0.62     0.61          0.21     0.36     0.53
RV(X 2 )    0.54     0.63     0.69          0.21     0.23     0.47
RV(X 3 )    0.97     0.1      0.0           0.88     0.00     1.56
RV(X 4 )    0.75     0.1      0.0           1.38     0.00     0.00
without the βi terms and with 52 robots is similar to the RV of a team with only
16 robots and nonzero βi values. This suggests that our methodology improves the
accuracy of the allocation for relatively small teams. Figures 2 (a) and (b) provide
a realization for an instance with βi = 0 and N = 52 and with βi ≠ 0 and N = 16.
Fig. 2. In (a) and (b), a realization of each of the stochastic simulations with N = 52
and N = 16, respectively, is shown, where the dots represent the number of agents
and the solid lines represent the numerical solution of (6). The figure axes are the same
to aid visualization of the relative spread. The tasks are identified by the color
caption, and the graph topology is depicted in Fig. 1 (a). The realization in (a) uses
βi = 0, ∀i ∈ V, while (b) uses β1 = 0.05, β2 = 0.20, β3 = 0.11, and β4 = 0.052.
5 Discussion
Fig. 3. The mCoSTe environment and mASV platform used for experimental results.
Fig. 4. Experimental results from a 4-minute, 4-mASV run, with the task graph in
Fig. 1 (a). The desired distribution is one agent at each task. The test in Fig. 4 (a)
had 28 switches, while that in Fig. 4 (b) had 13 switches. The colors' intensity reflects
time: the lighter the color, the earlier that path was traversed in the trial.
robot teams. In general, the intuition for these models is that they are more
accurate as the team size increases. However, when using our proposed method
and directly adjusting the team variance, it is possible to increase the accuracy
of the top-down stochastic model for smaller teams. We argue that such refinement
is a stride towards sparse swarms, even though we are technically approaching the
team size problem and ignoring the task space size.
Although we have formally shown the impact of the additional tuning vari-
ables, as well as the decoupling between first- and second-order moments, and
have experimentally and numerically investigated the influence of such variables,
we did not derive general mathematical expressions to compute their values at
the design stage. In our investigations, we used a greedy algorithm that increased
the variable values based on the desired variance and final distribution. In
addition, our optimization problem to compute the gain matrix K for a desired
distribution incorporates designer knowledge about the robot capabilities through
the set K, which directly affects the robots' mean transition time. We plan to
formalize this open problem in future investigations.
We have provided a new structure for the transition rates of ensemble robot
distributions modeled from a top-down perspective. As a first result, we
demonstrated that this structure leads to uncoupled parameters for adjusting
the mean and the variance of the desired team distribution. Then, based on this
finding, we examined simple design strategies to compute the necessary gains.
This approach provides efficient ensemble behavior for relatively small groups.
Finally, physical and numerical experiments were conducted, illustrating the
effectiveness of the method. Possible future work includes the extension of the
References
1. Berman, S., Halász, A., Hsieh, M.A., Kumar, V.: Optimized stochastic policies for
task allocation in swarms of robots. IEEE Trans. Rob. 25(4), 927–937 (2009)
2. Deshmukh, V., Elamvazhuthi, K., Biswal, S., Kakish, Z., Berman, S.: Mean-
field stabilization of Markov chain models for robotic swarms: computational
approaches and experimental results. IEEE Rob. Autom. Lett. 3(3), 1985–1992
(2018). https://doi.org/10.1109/LRA.2018.2792696
3. Elamvazhuthi, K., Berman, S.: Mean-field models in swarm robotics: a survey.
Bioinspiration Biomimetics 15(1), 015001 (2019). https://doi.org/10.1088/1748-
3190/ab49a4
4. Elamvazhuthi, K., Biswal, S., Berman, S.: Controllability and decentralized stabi-
lization of the Kolmogorov forward equation for Markov chains. Automatica 124,
109351 (2021). https://doi.org/10.1016/j.automatica.2020.109351
5. Gerkey, B.P., Matarić, M.J.: A formal analysis and taxonomy of task allocation in
multi-robot systems. Int. J. Rob. Res. 23(9), 939–954 (2004)
6. Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys.
Chem. 81(25), 2340–2361 (1977)
7. Hamann, H.: Towards swarm calculus: universal properties of swarm performance
and collective decisions. In: Dorigo, M., et al. (eds.) ANTS 2012. LNCS, vol.
7461, pp. 168–179. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-
642-32650-9_15
8. Hsieh, M.A., Halász, Á., Berman, S., Kumar, V.: Biologically inspired redistri-
bution of a swarm of robots among multiple sites. Swarm Intell. 2(2), 121–141
(2008)
9. Khalil, H.K.: Nonlinear Systems, vol. 10, 3rd edn. Prentice Hall, Hoboken (2002)
10. Klavins, E.: Proportional-integral control of stochastic gene regulatory networks.
In: 49th IEEE Conference on Decision and Control (CDC), pp. 2547–2553. IEEE
(2010)
11. Mather, T.W., Hsieh, M.A.: Distributed robot ensemble control for deployment to
multiple sites. In: Robotics: Science and Systems VII (2011)
12. Mather, T.W., Hsieh, M.A.: Ensemble modeling and control for congestion man-
agement in automated warehouses. In: 2012 IEEE International Conference on
Automation Science and Engineering (CASE), pp. 390–395 (2012). https://doi.
org/10.1109/CoASE.2012.6386498
13. Nam, C., Shell, D.A.: When to do your own thing: analysis of cost uncertainties in
multi-robot task allocation at run-time. In: 2015 IEEE International Conference
on Robotics and Automation (ICRA), pp. 1249–1254 (2015). https://doi.org/10.
1109/ICRA.2015.7139351
14. Nam, C., Shell, D.A.: Analyzing the sensitivity of the optimal assignment in proba-
bilistic multi-robot task allocation. IEEE Rob. Autom. Lett. 2(1), 193–200 (2017).
https://doi.org/10.1109/LRA.2016.2588138
15. Napp, N., Burden, S., Klavins, E.: Setpoint regulation for stochastically interacting
robots. Auton. Robot. 30(1), 57–71 (2011). https://doi.org/10.1007/s10514-010-
9203-2
16. Øksendal, B., Sulem, A.: Applied Stochastic Control of Jump Diffusions, 2nd edn.
Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-69826-5
17. Prorok, A., Hsieh, M.A., Kumar, V.: The impact of diversity on optimal control
policies for heterogeneous robot swarms. IEEE Trans. Rob. 33(2), 346–358 (2017)
18. Ravichandar, H., Shaw, K., Chernova, S.: Strata: unified framework for task assign-
ments in large teams of heterogeneous agents. Auton. Agents Multi Agent Syst.
34(2), 38 (2020)
19. Tarapore, D., Groß, R., Zauner, K.P.: Sparse robot swarms: moving swarms to
real-world applications. Front. Robot. AI 7 (2020). https://doi.org/10.3389/frobt.
2020.00083
Comparing Stochastic Optimization
Methods for Multi-robot, Multi-target
Tracking
1 Introduction
Multi-robot, multi-target tracking (MR-MTT) encompasses many traditional
robotics problems, including search and rescue [19], surveillance [10], and map-
ping [9]. All of these scenarios require a team of robots to search across an
environment to detect, identify, and track individual objects. A generic solution
to the MR-MTT problem consists of two parts: 1) an estimation algorithm to
determine the number and states of all targets and 2) a control algorithm to
drive the robots to seek out undetected targets as well as track detected targets.
These problems are extensively studied in multi-robot systems [29].
Often the number of targets is unknown a priori and in many cases the
targets are not uniquely identifiable. One of the main challenges in MTT is
the data association problem, i.e., matching measurements to potential targets.
Stone et al. [25] discuss this problem in their book, and survey existing solutions,
such as the Multiple Hypothesis Tracker (MHT) [1], Joint Probabilistic Data
Association (JPDA) [12], and the Probability Hypothesis Density (PHD) filter
[17]. Each of these methods solves data association in a different way.
We pick the PHD filter due to its relative simplicity and good performance.
Multi-robot search is another well-studied problem in the literature, with many existing methods, as discussed in the survey [22]. In recent years, coordination policies using Multi-Agent Deep Reinforcement Learning (MADRL) have become popular [21,30], with some examples using partially observable Monte-Carlo planning [14] while others perform a joint optimization of target assignment and path planning [20]. However, many of these methods rely on a centralized policy and global information, which is a main challenge in MADRL. One of the most widely used classical techniques is Lloyd's algorithm [11], a decentralized algorithm that iteratively partitions the environment and drives the robots using a gradient descent controller. Due to its wide popularity, we will use Lloyd's algorithm as our baseline approach for comparison.
In this paper, we will examine the use of different numerical optimiza-
tion algorithms to drive MR-MTT. These algorithms fall into two broad cat-
egories: deterministic and stochastic. Deterministic methods often suffer from
local optima entrapment, particularly for the gradient-based methods which
require the knowledge of the entire search space [18]. However, in our setting,
the target locations are unknown a priori, which makes the use of gradient-based
methods challenging. In contrast, stochastic optimization methods do not rely on gradient information, which makes them particularly well suited to solving problems with unknown search spaces. Based on the number of candidate solutions, they can be divided into individual-based and population-based algorithms. In the context of
numerical optimization, these techniques are typically used to find a single global
extremum of a function. However, in the MR-MTT problem, we aim to find all
local maxima (i.e., all target locations), which requires us to reformulate these
techniques to suit our problem.
In the family of individual-based methods, there are Tabu Search [13], Hill Climbing [7], and simulated annealing (SA) [16]. The SA algorithm is widely used for solving unconstrained and bound-constrained optimization problems. The method models the physical process of heating and slowly cooling a material. It is likely to accept worse solutions at a rate proportional to the temperature, which enables SA to explore a broad search space and escape from local optima; in our setting, this allows the robots to better explore the environment. Several versions of distributed control based on SA have been proposed for a variety of objective functions. One similar work uses SA to minimize the total travel distance under a Boolean specification [24]. Their scenario resembles the Travelling Salesman Problem (TSP), enumerating candidate trajectories formed as combinations of different nodes.
380 P. Xin and P. Dames
1.2 Contributions
2 Problem Formulation
We have a team of R robots exploring a convex environment E ⊆ ℝ² to search for an unknown number of targets. The set of targets is X^t = {x_1^t, x_2^t, . . .}, where x_i^t is the state of the ith target at time t. At each time step, robot r receives a set of measurements Z_r^t = {z_{1,r}^t, z_{2,r}^t, . . .}. The number of measurements depends upon the number of targets within the robot's field of view (FoV) as well as the chance of receiving a false negative or false positive detection.
The PHD filter [17] models the sets X and Z above as realizations of Poisson
random finite sets (RFSs), which are sets composed of independently and identi-
cally distributed (i.i.d.) elements where the number of elements follows a Poisson
distribution. The likelihood of such an RFS X is
p(X) = e^{−λ} ∏_{x∈X} v(x)   (1)

where v(x) is the PHD and λ = ∫_E v(x) dx. The PHD is a density function over
the state space of the targets, and the expected cardinality of X is the integral
of the PHD over E.
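Concretely, Eq. (1) can be evaluated as follows. This is a minimal sketch: the 1-D Gaussian-bump PHD `phd_value` is a toy stand-in for illustration, not the paper's model.

```python
import math

def phd_value(x, peaks, weight=1.0):
    # Toy 1-D PHD: a sum of unit-variance Gaussian bumps at known peaks.
    # `peaks` and `weight` are illustrative, not taken from the paper.
    return sum(weight * math.exp(-0.5 * (x - p) ** 2) / math.sqrt(2 * math.pi)
               for p in peaks)

def poisson_rfs_likelihood(X, v, lam):
    # Likelihood of a Poisson RFS, Eq. (1): p(X) = exp(-lambda) * prod v(x).
    p = math.exp(-lam)
    for x in X:
        p *= v(x)
    return p
```

Note that the empty set has likelihood exp(−λ), consistent with a Poisson cardinality distribution.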
Stochastic Optimization for Multi-robot Tracking 381
The PHD filter recursively updates v(x), which completely encodes the dis-
tribution over sets, using the measurement sets collected by the robots and
the motion model of targets. The PHD filter uses six models. 1) The target motion model f(x^t | x^{t−1}) characterizes the state transition of x. 2) The birth PHD b(x^t) describes both the number of newly appearing targets and their locations at time t. 3) The survival probability p_s(x^{t−1}) models the probability that a target with state x^{t−1} continues to exist. 4) The detection model p_d(x^t | q) gives the probability of detecting a target with state x^t using a sensor with state q. 5) The measurement model g(z | x^t; q) describes the probability of a robot with state q receiving measurement z from a detected target with state x^t. Combining this with the detection model yields ψ_{z,q}(x^t) = g(z | x^t, q) p_d(x^t | q), which characterizes the probability of a sensor located at q receiving measurement z from a target with state x^t. 6) Finally,
the clutter model c(z) is another PHD describing the number and states of false
positive detections. With these targets and sensor models, the prediction and
update of the PHD filter are:
v̄^t(x^t) = b(x^t) + ∫_E f(x^t | x^{t−1}) p_s(x^{t−1}) v^{t−1}(x^{t−1}) dx^{t−1}   (2a)

v^t(x^t) = (1 − p_d(x^t | q)) v̄^t(x^t) + Σ_{z∈Z^t} ψ_{z,q}(x^t) v̄^t(x^t) / η_z(v̄^t)   (2b)

η_z(v) = c(z | q) + ∫_E ψ_{z,q}(x^t) v(x^t) dx^t   (2c)
Here, η_z(v) is a normalization factor, though note that ∫_E v(x) dx = λ, i.e., the
expected number of targets in the search space, rather than 1, which is standard
in single-target tracking.
Since the PHD v(x) is a density function over the state space of targets, we need to make an additional assumption to represent it parametrically in a manner that is amenable to computation. The two traditional approaches are a set of weighted particles [28] or a mixture of Gaussians [27]. We use the former, as it allows for arbitrarily non-linear sensing and target models.
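For the particle representation, the measurement update (2b)–(2c) reduces to reweighting the particles, since the integral in (2c) becomes a weighted sum. A minimal sketch, where the arguments `p_d`, `g`, and `clutter` are placeholders for the models defined above (the constant-clutter simplification is ours):

```python
import numpy as np

def phd_update(particles, weights, Z, p_d, g, clutter):
    # One PHD measurement update, Eqs. (2b)-(2c), on a weighted-particle PHD.
    # p_d(x): detection probability; g(z, x): measurement likelihood;
    # clutter: clutter intensity c(z), taken constant here for simplicity.
    pd = np.array([p_d(x) for x in particles])
    new_w = (1.0 - pd) * weights                 # missed-detection term
    for z in Z:
        psi = pd * np.array([g(z, x) for x in particles])
        eta = clutter + np.sum(psi * weights)    # Eq. (2c) as a particle sum
        new_w = new_w + psi * weights / eta      # Eq. (2b), one measurement
    return new_w
```

With no measurements and p_d = 0, the weights pass through unchanged, matching the missed-detection term of (2b).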
Lloyd's algorithm [3] iteratively partitions the environment and navigates the robots toward the weighted centroids of their Voronoi cells. This process locally minimizes the function:

H(q_1, q_2, ..., q_R) = ∫_E min_{r∈{1,...,R}} f(d(x, q_r)) φ(x) dx,   (3)
This is a distributed control algorithm since if each robot knows the positions
of its neighbors, the robot is able to compute its Voronoi cell and move to the
weighted centroid of its Voronoi cell. In our case, the PHD serves as weighting
function φ(x), converting the traditional coverage problem into a search and
tracking problem.
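One iteration of Lloyd's algorithm on a discretized environment can be sketched as follows. This centralized version is for illustration only; as noted above, each robot can compute its own cell locally from its neighbors' positions.

```python
import numpy as np

def lloyd_step(robots, grid, phi):
    # One Lloyd iteration on a discretized environment.
    # robots: (R,2) positions; grid: (N,2) sample points; phi: (N,) weights
    # (here, the PHD evaluated at the grid points).
    d = np.linalg.norm(grid[:, None, :] - robots[None, :, :], axis=2)  # (N,R)
    owner = np.argmin(d, axis=1)          # Voronoi assignment of grid points
    goals = robots.copy()
    for r in range(len(robots)):
        cell = owner == r
        mass = phi[cell].sum()
        if mass > 0:                      # move to the weighted centroid
            goals[r] = (phi[cell][:, None] * grid[cell]).sum(axis=0) / mass
    return goals
```

Each robot would then drive toward its entry of `goals` with a simple gradient-descent-style controller.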
2.4 Assumptions
The first assumption we make is that each robot knows its own position. While
this is a strong assumption, it can be easily achieved using a localization algo-
rithm in indoor environments [4] or GPS receivers in outdoor environments.
Additionally, we have shown that the distributed PHD filter can perform well
even with some bounded uncertainty in the pose estimates [6].
Another assumption we make is that all robots can communicate with all of their Voronoi neighbors; this is a necessary assumption for Lloyd's algorithm (and for our distributed PHD filter formulation). In a practical setting, this requirement could be addressed by using robots as communication relays or by using multi-hop communication strategies.
3 Distributed Control
As mentioned in the introduction, stochastic optimization algorithms can be divided into population-based and individual-based algorithms according to the number of candidate solutions. In this paper, we pick one typical method from each family and compare these methods with the standard baseline of Lloyd's algorithm. Specifically, we use SA as an example of individual-based algorithms and PSO as an example of population-based algorithms.
which allows the solution to escape from local optima and hopefully reach a
global optimum, with the probability of a jump decreasing over time as the
temperature cools.
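The acceptance rule at the heart of SA can be sketched as follows. This is a generic Metropolis rule with a geometric cooling schedule; the paper does not state its exact schedule or parameters, so these are illustrative choices.

```python
import math
import random

def sa_accept(delta, temperature, rng=random.random):
    # Metropolis acceptance rule for a maximization problem:
    # always accept improvements (delta >= 0); accept a worse candidate
    # with probability exp(delta / T), which shrinks as T cools.
    if delta >= 0:
        return True
    return rng() < math.exp(delta / temperature)

def cool(T0, alpha, k):
    # Geometric cooling schedule T_k = T0 * alpha**k (a common choice).
    return T0 * alpha ** k
```

As the temperature drops, the probability of accepting a worse candidate vanishes, which is exactly the shrinking oscillation range discussed in the results.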
In PSO, a collection of particles P moves through the search space, and the algorithm evaluates the objective value at each particle. The particles are the candidate solutions, with particle i located at q_i^t at time t. The position and velocity v_i^t of particle i are updated as:
q_i^t = q_i^{t−1} + v_i^t
v_i^t = w v_i^{t−1} + c_1 r_1 (q*_{i,b} − q_i^t) + c_2 r_2 (q*_{i,g} − q_i^t)   (6)
where r_1 and r_2 are random numbers between 0 and 1, and c_1 and c_2 are predefined constants. The coefficient w is called the inertia term. Here, q*_{i,b} is the position of particle i where the objective function achieves its best value so far (i.e., a local best), and q*_{i,g} is the position with the best objective function value in the history of the swarm (i.e., a global best).
PSO is a metaheuristic method that generates multiple candidate solutions
in the search space at the same time. This has a natural analog in multi-robot
search, where each robot represents a single particle. However, slightly different
from standard PSO, a robot can measure the value of points everywhere in
its FoV instead of a single point. Additionally, our objective is not to find a
single global best value for the objective function, but rather to find all local
maxima (i.e., all targets). Thus, we need to redfine the meaning of q∗·,g and q∗·,b .
In Algorithm 2, we set q∗·,g to be the weighted center of Voronoi cell (i.e., the
location from Lloyd’s algorithm). And if a robot obtains the highest weight in its
history at its current location, it will set q as the q∗·,b . The highest weight in the
history, denoted as whistory , is initialized as the obtained weight at t = 1, and it
is updated with the current weight wt if wt ≥ whistory . Since r1 , r2 ∼ rand(0, 1),
the robot moves with random trade off between q∗·,b and q∗·,g .
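One update of this PSO variant can be sketched as below. The gains w, c_1, c_2 are illustrative defaults, not the paper's values; per the redefinition above, `q_best_global` would be the weighted Voronoi centroid and `q_best_local` the robot's highest-weight position.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(q, v, q_best_local, q_best_global, w=0.7, c1=1.5, c2=1.5):
    # One PSO velocity/position update following Eq. (6).
    r1, r2 = rng.random(), rng.random()
    v_new = w * v + c1 * r1 * (q_best_local - q) + c2 * r2 * (q_best_global - q)
    q_new = q + v_new
    return q_new, v_new
```

If a robot sits at both its local and global best with zero velocity, it stays put, which is why the new local-best term (and not Lloyd's goal alone) is what drives extra exploration.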
All teams use the distributed PHD filter for MTT regardless of the search strat-
egy. This is a limiting factor in terms of both computational complexity and
communication bandwidth. It requires robots to exchange ownership of por-
tions of the PHD density function as the Voronoi partitions shift [6, Algorithm
1], exchange information about targets that may move across the boundaries of
Voronoi cells [6, Algorithm 2], and exchange measurements about targets viewed
in neighboring Voronoi cells [6, Algorithm 3]. The standard PHD filter equations
scale linearly in complexity with the number of targets (i.e., measurements) [6,
eq. (4)]. The complexity and communication load of the distributed PHD filter
depend on the situation. For example, the particle exchange step [6, Algorithm
1] depends on the amount of area changing ownership, which can be large when
the entire team is tightly packed into one area (yielding very large Voronoi cells
for boundary robots). The PHD update [6, Algorithm 3] scales linearly with the
number of robots that can see into one another's Voronoi cells, again making
the tightly packed case difficult. When all robots are evenly distributed in the
space, the complexity and communication bandwidth will be minimal. Note also that the distributed PHD filter assumes all robots can communicate with their Voronoi neighbors and that communication channels are lossless, and it ignores the effects of latency. Addressing these real-world factors will be the subject of future work.
The different control policies (Lloyd’s algorithm, SA, and PSO) all utilize
the resulting PHD estimate to find the goal location. Since robots compute their
goal locations using only the PHD within their own Voronoi cell, no additional
communication is required for the control computations.¹ Additionally, the control computations are all quite straightforward and have constant complexity
with respect to team size and target count.
Lastly, it should be noted that the “particles” used within our SA and PSO
algorithms are actually individual robotic agents in the team. Thus, while it is
well known that the performance of SA and PSO is dependent upon the number
of particles, in our case this simply means it is a function of team size.
¹ This assumes that, per the note in Algorithm 1, line 2, the Voronoi computations are combined with the distributed PHD filter step.
4 Experimental Results
In this section, we conduct a set of simulated experiments in Matlab to 1) evaluate the correctness and 2) demonstrate the efficacy of the proposed distributed estimation and control algorithms by comparing them with the distributed methods used in our previous work [6].
The environment is an open 100 × 100 m area with no obstacles. For simplicity, we represent the PHD using a uniform grid of particles with 1 m spacing. The initial weight of each particle is w_i = 1 · 10⁻⁴, so the initial number of expected targets in the environment is 1. The robots are holonomic with a maximum velocity of 2 m/s. The sensor collects data at 2 Hz, and the sensor specifications are:
p_d(x | q) = 0.8 if ‖x − q‖ ≤ 5 m, and 0 otherwise
g(z | x, q) = N(z | x − q, 0.25 I₂)   (7)
c(z | q) = 3.66 · 10⁻³

The expected number of clutter detections per scan is ∫_FoV c(z | q) dz ≈ 0.287.
Note, these values do not represent any specific real-world situation, but could
represent a team of drones equipped with downward facing cameras. For a real
situation, one could follow the procedures outlined in our other work to carefully
characterize the sensor models [2,5].
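As a quick sanity check on the clutter model, the expected number of clutter detections per scan is just the constant clutter intensity integrated over the circular FoV of radius 5 m from Eq. (7):

```python
import math

# Constant clutter intensity c(z) = 3.66e-3 over a circular FoV of
# radius 5 m (values from Eq. 7).
fov_radius = 5.0
clutter_intensity = 3.66e-3

fov_area = math.pi * fov_radius ** 2
expected_clutter = clutter_intensity * fov_area  # integral of c(z) over FoV
print(round(expected_clutter, 3))  # 0.287, matching the value quoted above
```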
uses Lloyd’s algorithm alone [6]. The target positions are drawn uniformly at
random, while the robots all begin in a 20 × 10 m rectangle at the bottom center
of the environment. For each configuration of target number, robot number, and
control algorithm, we run 10 trials. Note that to ensure consistency in the results,
we use identical target distributions across all team/control methods, i.e., there
are 10 different randomized target distributions for each configuration.
Tracking Accuracy. Figure 1 shows the statistics of the median OSPA error
over the ten trials using our original function (PHD), PSO, and SA, respectively.
We can see that in all cases, as the size of the team increases and surpasses the number of targets, the OSPA error reaches a minimum and does not decrease further, even with
the addition of more robots. The OSPA error of around 0.8 indicates that almost
all targets have been detected and tracked with higher accuracy than the PHD
grid (1 m). This empirical minimum is based on the accuracy of the sensors, the
particle spacing, and our method for extracting target locations (i.e., finding
peaks in the PHD density function [6]).
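The OSPA error reported here can be computed as follows. This is a brute-force sketch of the metric of Schuhmacher et al. [23] (suitable only for small sets); the cutoff c = 10 matches the maximum quoted below.

```python
import numpy as np
from itertools import permutations

def ospa(X, Y, c=10.0, p=1):
    # OSPA distance between finite point sets X (m,2) and Y (n,2):
    # optimal assignment under a cutoff c, plus a cardinality penalty.
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m > n:                          # ensure m <= n
        X, Y, m, n = Y, X, n, m
    D = np.minimum(np.linalg.norm(X[:, None] - Y[None, :], axis=2), c) ** p
    if m == 0:
        best = 0.0
    else:                              # brute-force optimal assignment
        best = min(sum(D[i, perm[i]] for i in range(m))
                   for perm in permutations(range(n), m))
    return ((best + c ** p * (n - m)) / n) ** (1.0 / p)
```

Each unmatched target contributes the full cutoff c, which is why missing half the targets lands the OSPA roughly halfway between the 0.8 floor and c = 10.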
We see that the PSO- and SA-based algorithms both have a much lower OSPA error when the number of robots is smaller than the number of targets, with the PSO algorithm having a lower average value than SA or PHD. The difference between the new methods and PHD is especially evident in the 10-robot, 10-target case, where the original method fails to find about half of the targets (i.e., the OSPA is about halfway between the minimum of 0.8 and the maximum of c = 10).
We also see that the spread in the OSPA values is smallest for the SA case,
indicating that it offers the most reliable performance across all three approaches.
In summary, PSO offers the lowest average OSPA, SA offers the most consistent
results, and the original is not the best in any category.
Coverage. We also wish to examine how well each algorithm explores the environment, as this is another indicator of success. Figure 2 shows the robots' trajectories in the 10-robot, 10-target case, where the orange diamonds are the target locations and the colored lines are the robots' trajectories. We can observe that in all cases most of the robots eventually find targets and oscillate around them. However, the paths they take to get there are different. Lloyd's algorithm, as shown in Fig. 2a, spends the least time exploring the space, which contributes to it having the highest OSPA error. In this figure, two targets are not detected. The robots move directly to their weighted Voronoi centroids and become idle once they have arrived at the goal point, wasting search effort. We see in Fig. 2b that the robots cover much more of the search space in the same amount of time. Thus, the addition of the new local best term (since the global best is identical to Lloyd's) causes the robots to explore more. We also see that the robot paths are relatively smooth, a result of the inertia term in (6) causing the heading direction to remain somewhat consistent. We see in Fig. 2c that these robots take more of a random walk. This is because the annealing process drives the robots to oscillate between the best point
Fig. 1. Boxplots show the median OSPA statistics over 10 runs for teams of 10–100 robots and 10, 30, and 50 static targets for the original PHD, PSO, and SA methods, respectively.
Fig. 2. Robots’ trajectories using different control methods. Robots all begin in the
dashed box at the bottom of the environment.
Fig. 3. Percent of undetected area over time over 10 runs for 10-robot, 10-target case,
where the solid curves show the mean value and the shaded areas show the range.
Fig. 4. Boxplots show the percent of undetected area at the end of the experiment over
10 runs for teams of 10–100 robots and 50 static targets for the original PHD, PSO,
and SA search strategies.
and the potential point. However, since the temperature keeps decreasing, the
oscillation range is smaller than that of PSO.
We can examine these trends in a quantitative manner by plotting the per-
centage of unexplored area over time for each method in Fig. 3. We observe that
the average exploration speed is similar across all methods over the first 100 s,
with SA showing a larger range. After that, we observe that the exploration rate
of Lloyd’s algorithm levels off around 0.35 as robots become trapped. The other
methods both have ways of escaping local minima, with SA yielding the highest
and most consistent exploration rate as time increases.
To further explain, the unexplored area metric in Fig. 4 measures the per-
centage of the total area that was not explored by any robot at the end of the
search period. We see that in the 50 target case, all methods are able to cover
a significant portion of the environment, but SA has the highest median and
maximum unexplored area. This is likely due to the fact that the robots fre-
quently get stuck in local minima due to the higher density of targets in the
environment. Once the number of robots exceeds the number of targets, all yield
similar results and cover more than 99% of the environment for ≥60 robots in
the allotted time. This is because there are free robots to continue to explore
the remainder of the space even when each target has a robot assigned to track
it.
Fig. 5. Boxplots show the median OSPA statistics over 10 runs for teams of 10–100 robots tracking moving targets for the original PHD, PSO, and SA methods, respectively.
This model expects one new target to appear every 10 s. Based on the appearance and disappearance of targets, the number of targets reaches a steady state of around 30 regardless of the initial number of targets.
Figure 5 shows the median steady state OSPA value, as measured by the
average over the final 200 out of 1000 time steps. We see that in all cases, the
OSPA decreases as the number of robots increases and appears to reach an
asymptote. However, the PHD method yields the highest OSPA error when the
number of robots is small, but has the lowest asymptote when the number of
robots is high. We see that the PSO and SA methods are quite similar with 10–40
robots. However, above that we see that PSO reaches a higher asymptote and has
a higher variance in the OSPA, meaning it is both less accurate and less reliable
than SA. This is similar to the static target case, where Lloyd's algorithm gets stuck tracking one target while PSO and SA can both escape these local minima
to switch to another target. This escape behavior is very beneficial when there
are fewer robots than targets but slightly harmful to tracking performance when
there are more robots than targets.
5 Conclusion
In this paper, we examined the performance of different search strategies to
solve the MR-MTT problem. All methods share a common structure with two
components: 1) an estimation algorithm based on the distributed PHD filter
and 2) a control strategy combining the stochastic optimization algorithm with
Voronoi-based control. The robots use the estimate from the distributed PHD
filter to weight the relative importance of the area within their Voronoi cell
and select an action. We compare the standard Lloyd’s algorithm with two new
methods based of standard stochastic optimization algorithms, namely simulated
annealing (SA) and particle swarm optimization (PSO). Both these new methods
allows robots to trade-off between a global goal and other local areas known to
contain targets, unlike Lloyd’s algorithm which only uses local information. We
see that the new stochastic methods enable the robot team to explore more
search area and yield better tracking results when there are fewer robots than
targets. In other words, the robots make a better trade-off between exploration
and exploitation.
Acknowledgement. This work was funded by NSF grants IIS-1830419 and CNS-
2143312.
References
1. Blackman, S.: Multiple hypothesis tracking for multiple target tracking. IEEE
Aerosp. Electron. Syst. Mag. 19(1), 5–18 (2004). https://doi.org/10.1109/MAES.
2004.1263228
2. Chen, J., Xie, Z., Dames, P.: The semantic PHD filter for multi-class target tracking: from theory to practice. Robot. Auton. Syst. 149 (2022). https://doi.org/10.1016/j.robot.2021.103947
3. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing
networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004). https://doi.org/10.
1109/TRA.2004.824698
4. Dames, P., Kumar, V.: Autonomous localization of an unknown number of targets
without data association using teams of mobile sensors. IEEE Trans. Autom. Sci.
Eng. 12(3), 850–864 (2015). https://doi.org/10.1109/tase.2015.2425212
5. Dames, P., Kumar, V.: Experimental characterization of a bearing-only sensor for use with the PHD filter (2015). https://doi.org/10.48550/arXiv.1502.04661
6. Dames, P.M.: Distributed multi-target search and tracking using the PHD filter.
Auton. Robot. 44, 673–689 (2020). https://doi.org/10.1007/s10514-019-09840-9
7. Davis, L.: Bit-climbing, representational bias, and test suite design. In: Proceedings of the International Conference on Genetic Algorithms, pp. 18–23 (1991)
8. Derr, K., Manic, M.: Multi-robot, multi-target particle swarm optimization search
in noisy wireless environments. In: 2009 2nd Conference on Human System Inter-
actions, pp. 81–86 (2009). https://doi.org/10.1109/HSI.2009.5090958
9. Deutsch, I., Liu, M., Siegwart, R.: A framework for multi-robot pose graph SLAM.
In: 2016 IEEE International Conference on Real-time Computing and Robotics
(RCAR), pp. 567–572 (2016). https://doi.org/10.1109/RCAR.2016.7784092
10. Doitsidis, L., et al.: Optimal surveillance coverage for teams of micro aerial vehicles in GPS-denied environments using onboard vision. Auton. Robot. 33(1), 173–188 (2012). https://doi.org/10.1007/s10514-012-9292-1
11. Du, Q., Faber, V., Gunzburger, M.: Centroidal Voronoi tessellations: applica-
tions and algorithms. SIAM Rev. 41(4), 637–676 (1999). https://doi.org/10.1137/
S0036144599352836
12. Fortmann, T., Bar-Shalom, Y., Scheffe, M.: Sonar tracking of multiple targets using
joint probabilistic data association. IEEE J. Oceanic Eng. 8(3), 173–184 (1983).
https://doi.org/10.1109/JOE.1983.1145560
13. Glover, F.: Tabu search, Part I. ORSA J. Comput. 1(3), 190–206 (1989)
14. Goldhoorn, A., Garrell, A., Alquézar, R., Sanfeliu, A.: Searching and tracking peo-
ple with cooperative mobile robots. Auton. Robot. 42(4), 739–759 (2018). https://
doi.org/10.1007/s10514-017-9681-6
15. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN'95 - International Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995). https://doi.org/10.1109/ICNN.1995.488968
16. Kirkpatrick, S., Gelatt, C.D., Jr., Vecchi, M.P.: Optimization by simulated anneal-
ing. Science 220(4598), 671–680 (1983)
17. Mahler, R.: Multitarget Bayes filtering via first-order multitarget moments. IEEE
Trans. Aerosp. Electron. Syst. 39(4), 1152–1178 (2003). https://doi.org/10.1109/
TAES.2003.1261119
18. Mirjalili, S.: Moth-flame optimization algorithm: A novel nature-inspired heuris-
tic paradigm. Knowl.-Based Syst. 89, 228–249 (2015). https://doi.org/10.1016/j.
knosys.2015.07.006
19. Murphy, R.: Human-robot interaction in rescue robotics. IEEE Trans. Syst. Man
Cybern. Part C (Applications and Reviews) 34(2), 138–153 (2004). https://doi.
org/10.1109/TSMCC.2004.826267
20. Qie, H., Shi, D., Shen, T., Xu, X., Li, Y., Wang, L.: Joint optimization of multi-uav
target assignment and path planning based on multi-agent reinforcement learning.
IEEE Access 7, 146264–146272 (2019). https://doi.org/10.1109/ACCESS.2019.
2943253
21. Rizk, Y., Awad, M., Tunstel, E.W.: Decision making in multiagent systems: a sur-
vey. IEEE Trans. Cognitive Developmental Syst. 10(3), 514–529 (2018). https://
doi.org/10.1109/TCDS.2018.2840971
22. Robin, C., Lacroix, S.: Multi-robot target detection and tracking: taxonomy and
survey. Auton. Robot. 40(4), 729–760 (2016)
23. Schuhmacher, D., Vo, B.T., Vo, B.N.: A consistent metric for performance evalua-
tion of multi-object filters. IEEE Trans. Signal Process. 56(8), 3447–3457 (2008).
https://doi.org/10.1109/TSP.2008.920469
24. Shi, W., He, Z., Tang, W., Liu, W., Ma, Z.: Path planning of multi-robot systems
with boolean specifications based on simulated annealing. IEEE Robot. Autom.
Lett. 7, 6091–6098 (2022). https://doi.org/10.1109/LRA.2022.3165184
25. Stone, L.D., Streit, R.L., Corwin, T.L., Bell, K.L.: Bayesian multiple target track-
ing. Artech House (2013)
26. Tang, Q., Yu, F., Xu, Z., Eberhard, P.: Swarm robots search for multiple tar-
gets. IEEE Access 8, 92814–92826 (2020). https://doi.org/10.1109/ACCESS.2020.
2994151
27. Vo, B.N., Ma, W.K.: The Gaussian mixture probability hypothesis density filter.
IEEE Trans. Signal Process. 54(11), 4091–4104 (2006)
28. Vo, B.N., Singh, S., Doucet, A.: Sequential Monte Carlo methods for multitarget
filtering with random finite sets. IEEE Trans. Aerosp. Electron. Syst. 41(4), 1224–
1245 (2005)
29. Wang, B.: Coverage problems in sensor networks: A survey. ACM Comput. Surv.
43(4) (2011). https://doi.org/10.1145/1978802.1978811
30. Zhang, C., Lesser, V.: Coordinating multi-agent reinforcement learning with lim-
ited communication. In: Proceedings of the 2013 International Conference on
Autonomous Agents and Multi-Agent Systems, AAMAS ’13, pp. 1101–1108. Inter-
national Foundation for Autonomous Agents and Multiagent Systems, Richland,
SC (2013)
Multi-agent Deep Reinforcement Learning for Countering Uncrewed Aerial Systems
1 Introduction
Counter Uncrewed Aerial Systems (C-UAS) is an active area of research. A paper
by the Center for the Study of Drones identified 235 counter-drone products on the market or in development as of 2018 [1]. The rise in C-UAS research is largely due to the proliferation of uncrewed aerial systems (UAS). Although UAS have shown great potential in solving various critical and difficult tasks, their growing autonomy poses a threat to the security of individuals and organizations alike. A growing concern is the intrusion of UAS into guarded or protected spaces, as any mishandled or malicious UAS in a protected space poses a safety risk to the affected entities, such as aircraft near or in airports and groups of people at large gatherings. In addition, malicious UAS can be
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 394–407, 2024.
https://doi.org/10.1007/978-3-031-51497-5_28
Multi-agent Deep RL for C-UAS 395
among pursuers, but they do not consider, for example, the existence of moving obstacles or collisions between pursuers [7]; such simplifications may significantly reduce the state space and complexity of the problem but render the methods impractical.
As compared to the existing works, we study a more complicated but realistic C-UAS scenario, where there is one evader and multiple pursuers in an alarmed area. As shown in Fig. 1, the pursuers have to work together to intercept the evader before it enters the target area, while avoiding collisions with other pursuers and with non-cooperating interacting entities (NCIEs) in the alarmed area. We design a novel multi-agent DRL algorithm inspired by [16,18], where different pursuers derive their actions from their local actor networks, which are trained simultaneously on the shared experiences of all pursuers using Proximal Policy Optimization (PPO). The contributions of the paper are summarized as follows. 1) We consider a complicated C-UAS scenario, where multiple pursuers, NCIEs, and one evader exist in an alarmed area. We tackle this challenging problem by designing a Proximal Policy Optimization-based Advantage Actor-Critic for parameter-sharing multi-agent DRL. More specifically, the pursuers share their observations with a centralized controller, which trains the actor and critic networks and then distributes the actor network to the pursuers to locally derive their actions. This design allows the pursuers to quickly intercept the evader while avoiding collisions and model divergence. 2) We conduct extensive simulations to show the convergence of the method and demonstrate its performance by comparing it with existing baseline methods.
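The centralized-training, decentralized-execution structure described above can be sketched schematically as follows. The class and method names are purely illustrative, and the actual PPO update and network architectures are abstracted behind `update_fn`.

```python
class SharedPolicyTrainer:
    # Schematic parameter-sharing trainer: one actor shared by all pursuers,
    # trained on their pooled experiences (e.g., by a PPO update).
    def __init__(self, policy, update_fn):
        self.policy = policy          # single shared actor
        self.update_fn = update_fn    # training step on the pooled batch
        self.buffer = []

    def act(self, observations):
        # Decentralized execution: each pursuer applies the same policy
        # to its own local observation.
        return [self.policy(obs) for obs in observations]

    def store(self, transitions):
        # Centralized training: experiences from all pursuers share a buffer.
        self.buffer.extend(transitions)

    def update(self):
        # Train on the pooled batch, then redistribute the updated actor.
        self.policy = self.update_fn(self.policy, self.buffer)
        self.buffer = []
        return self.policy
```

Because every pursuer queries the same policy, the number of trainable parameters is independent of the team size, which is one way parameter sharing addresses scalability.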
The rest of the paper is structured as follows. We formulate the C-UAS
problem in Sect. 2 as a multi-agent Markov Decision Process. Section 3 describes
our designed multi-agent DRL algorithm, Sect. 4 discusses simulation results,
and Sect. 5 provides concluding remarks and future research directions.
catching an evader. The main focus of that work, though, is on using two pursuers, which may not be adequate to capture a superior evader. Another variation is described in [3], where the designed policy herds a group of swarming evaders to a safe area. The authors developed a stringent algorithm to herd the attackers to a safe area away from a target space.
Different from using the traditional control theory, multi-agent DRL is
another solution to solve the pursuit-evasion problem. Research in multi-agent
reinforcement learning has been plentiful over the years. The authors in [21]
highlighted the current state of multi-agent DRL and its challenges. Multi-agent DRL can address not only pursuit-evasion games but also, for example, swarm control [9,21] and cooperation. Some of the challenges in applying
multi-agent DRL include nonstationarity and scalability. The nonstationarity
issue arises because an agent’s action derived from its policy may affect other
agents’ actions, thus leading to divergence during model training. The scalability
issue arises because the action and state spaces grow exponentially with the
number of agents. To
address the scalability issue, the authors in [9] propose Mean Embedding for
defining the agent’s observation so that training is invariant to the number of
agents, whereas the authors in [8] used a self-attention-based model to aid scalability. In
[22], a target prediction network (TP Net) is developed to predict an evader's
location using long short-term memory (LSTM). In [4], which is the most
relevant work as compared to our paper, the authors provided a framework for
training eight pursuers to capture one evader. The evader’s goal is to avoid
capture while moving around within a specific area. This is different from our
scenario, where the evaders’ goal is to reach the predefined target area. Their
main contribution is the evaluation of different factors that affect the pursuers'
probability of capture, such as the evader's relative speed, the number of
pursuers, and the radii of the pursuers and the evader. They also used curriculum learning
to train their agents. With curriculum learning, the agents learn to achieve successively harder tasks by first completing, and building on, easier ones. This method has
been shown to have good training stability but does not address inter-agent
collisions or obstacle avoidance. Our approach builds on this research
and addresses inter-agent collision and obstacle avoidance.
To train the deep neural networks in terms of policies in multi-agent reinforce-
ment learning, several methods have been proposed. For example, [6] applied
trust region policy optimization (TRPO) to train the deep neural networks, whose
parameters are shared among agents. In [12], the authors created a variant of
DDPG for multi-agent systems, where the agents share the action-value function
for learning and act in a decentralized fashion during execution. To achieve superior training performance, the agents must be provided with a reward function
that maps their states to rewards; [20] uses a sparse reward to enable
pursuers to reach evaders. At each successful capture, the pursuers receive a reward
based on the number of pursuers within the capture distance of the
evader.
398 J.-E. Pierre et al.
2 Problem Formulation
As shown in Fig. 1, there is a group of N pursuers tasked with protecting a
target area T from a group of M evaders in an alarmed area W, where T ⊂ W.
The positions of the pursuers and evaders at each time step are denoted as
P_p^i(t) ∈ R^2 (i = 1, ..., N) and P_e^j(t) ∈ R^2 (j = 1, ..., M), respectively, where i and
j are the indices of the pursuers and evaders, respectively, and t is the current
time step; P_p^i(t) and θ_p^i(t) are the 2D position and heading of pursuer i, and
P_e^j(t) and θ_e^j(t) are the 2D position and heading of evader j. Define x_p^i(t) and
x_e^j(t) as the state vectors of pursuer i and evader j, where x_p^i(t) = [P_p^i(t), θ_p^i(t)]
and x_e^j(t) = [P_e^j(t), θ_e^j(t)]. If the unicycle model is applied, then the dynamics of
the pursuer and evader are given by

$$\dot{P}_p^i(t) = \begin{bmatrix} v_p^i(t)\cos\theta_p^i(t) \\ v_p^i(t)\sin\theta_p^i(t) \end{bmatrix}, \quad \dot{\theta}_p^i(t) = \omega_p^i(t), \tag{1}$$

with analogous equations for evader j with inputs v_e^j(t) and ω_e^j(t), while NCIE k moves with constant velocity,

$$\dot{P}_o^k(t) = \left[ c_x^k, \; c_y^k \right]^\top, \tag{2}$$

where c_x^k and c_y^k represent the constant velocity of NCIE k in the x and y direction, respectively. Here, the constant-speed setup for an NCIE is used to emulate
the trajectory of the NCIE. Although these NCIEs, such as aircraft flying near busy
airports or a flock of birds, may have variable speeds that would require much more
sophisticated control of the pursuers to avoid collisions than the constant-speed setup would, we adopt a simplified but reasonable strategy:
each pursuer should keep a safe distance from each NCIE to avoid collisions. Here, the safety distance is calculated based on the worst-case scenario
in which the two entities fly towards each other at their maximum velocities.
Hence, the constant/variable speed setup for an NCIE does not affect the proposed method described later. Note that in the following, we
drop the indices (i, j, k) and t when they are not needed. In this paper, we assume that all
entities are at the same height, and therefore we focus only on the 2-dimensional
space; we leave 3-dimensional C-UAS for future research. Also,
restricting the UASs to nonholonomic constraints is reasonable for fixed-wing UAVs
operating in the planar space W.
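The worst-case safety reasoning above can be sketched in a few lines; the function name and the 0.1 s control step are illustrative assumptions, not values from the paper:

```python
def worst_case_safety_distance(v_max_a, v_max_b, delta, dt):
    """Smallest separation guaranteeing that two entities flying head-on
    at their maximum speeds cannot close the gap below delta within one
    control step of length dt."""
    return delta + (v_max_a + v_max_b) * dt

# Pursuer (10 m/s) vs. NCIE (1 m/s), collision radius 1 m, 0.1 s step:
d_safe = worst_case_safety_distance(10.0, 1.0, 1.0, 0.1)  # 2.1 m
```

Because the bound assumes both entities move at maximum speed, it holds regardless of whether the NCIE's actual speed is constant or variable.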
Multi-agent Deep RL for C-UAS 399
The goal of the evaders is to enter the target area T while evading capture.
Evader j has successfully penetrated the target space if ||P_e^j(t) − P_T|| ≤ δ_T, where
P_T and δ_T are the center position and radius of the target area T, respectively,
and ||·|| is the Euclidean norm. The goal of the pursuers is to capture the
evaders before the evaders enter the target area. Evader j is successfully captured
by pursuer i if ||P_p^i(t) − P_e^j(t)|| ≤ δ_c, where δ_c is the capture
radius around the evader. Moreover, the pursuers cannot enter the
target area T, which is motivated by the fact that in the real world, some agents
cannot enter crowded closed areas to avoid complicated events. In addition,
a collision between two pursuers occurs once ||P_p^i(t) − P_p^{i'}(t)|| ≤ δ_p, where i
and i' are the indices of the two pursuers and δ_p is the safety distance to avoid
a collision between two pursuers. Similarly, a collision between pursuer i and
NCIE k occurs if ||P_p^i(t) − P_o^k(t)|| ≤ δ_o, where δ_o is the safety distance to avoid a
collision between a pursuer and an NCIE. We disregard collisions between
two NCIEs as well as between two evaders, as we assume that the evaders/NCIEs are well-coordinated to prevent collisions. Finally, the initial positions of the pursuers and
evaders are always within and outside the alarmed area W, respectively.
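The capture, penetration, and collision conditions above are simple Euclidean-norm checks; a minimal sketch (the function names are our own):

```python
import numpy as np

def captured(p_p, p_e, delta_c):
    """Pursuer at p_p captures the evader at p_e."""
    return np.linalg.norm(np.subtract(p_p, p_e)) <= delta_c

def penetrated(p_e, p_T, delta_T):
    """Evader at p_e has entered the target area centered at p_T."""
    return np.linalg.norm(np.subtract(p_e, p_T)) <= delta_T

def collided(p_a, p_b, delta):
    """Two entities are within the safety distance delta of each other."""
    return np.linalg.norm(np.subtract(p_a, p_b)) <= delta
```

The same `collided` predicate covers both pursuer-pursuer (δ_p) and pursuer-NCIE (δ_o) cases by passing the appropriate threshold.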
In our scenario, the evader's action, i.e., a_e^j(t) = [v_e^j(t), ω_e^j(t)], is dictated by
a virtual attractive force defined in [23]. To define the virtual attractive force, we
first convert the action of the evader from a unicycle model to a single integrator
using [5]. The integrator model takes as input

$$v_e^j(t) = \left[ v_{x,e}^j(t), \; v_{y,e}^j(t) \right], \tag{3}$$

with v_{x,e}^j and v_{y,e}^j being the current speed of the evader in the x and y direction,
respectively. Alternatively, the velocity vector for the evader can be defined as

$$v_e^j(t) = \bar{v}_e \hat{d}_e^j(t), \tag{4}$$

where \bar{v}_e is the maximum permissible speed of the evader and \hat{d}_e^j(t) is a unit vector
pointing in the desired direction of the evader. Since the evader's goal is to reach
the target, we can derive a virtual attractive force that pulls the evader towards
the target while simultaneously pushing the evader away from the NCIEs and the
pursuers. The desired driving direction is given as

$$d_e^j(t) = -k_r \left[ \sum_{i=0}^{N} \frac{\hat{P}_{p,e}^i(t)}{\left\| P_p^i(t) - P_e^j(t) \right\|^\alpha} + \sum_{k=0}^{K} \frac{\hat{P}_{o,e}^k(t)}{\left\| P_o^k(t) - P_e^j(t) \right\|^\alpha} \right] + k_a \frac{\hat{P}_{T,e}^j(t)}{\left\| P_e^j(t) - P_T \right\|}, \tag{5}$$

where the first two summations represent the repulsive forces pushing the evader
away from the pursuers and NCIEs, and the last term represents an attractive
force that pulls the evader towards the target area T. Here, k_r and k_a
are the repulsive and attractive force gains, respectively, and α determines whether the
forces are distance-dependent. In our simulation, we set α equal to 1. Since v_e^j(t)
is the action for the single-integrator model, we use [5] to convert it to the action
a_e^j(t) of the unicycle model. The values of k_r and k_a are tuned by extensive
evaluation to maximize the evader's chance of crossing the alarmed space while avoiding
collision with the pursuers and NCIEs.
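The desired-direction computation of Eq. (5) can be sketched as follows, assuming the unit vectors \hat{P} point from the evader toward the respective entity; the gains and function names are illustrative:

```python
import numpy as np

def desired_direction(p_e, pursuers, ncies, p_T, k_r=1.0, k_a=1.0, alpha=1.0):
    """Virtual-force direction of Eq. (5): repulsion from pursuers and
    NCIEs plus attraction toward the target center p_T, normalized to a
    unit vector."""
    p_e, p_T = np.asarray(p_e, float), np.asarray(p_T, float)
    d = np.zeros(2)
    for p in list(pursuers) + list(ncies):          # repulsive terms
        diff = np.asarray(p, float) - p_e
        dist = np.linalg.norm(diff)
        d -= k_r * (diff / dist) / dist**alpha      # unit vector / dist^alpha
    diff_T = p_T - p_e                              # attractive term
    d += k_a * (diff_T / np.linalg.norm(diff_T)) / np.linalg.norm(diff_T)
    return d / np.linalg.norm(d)
```

With a pursuer directly ahead and the target directly behind, both terms agree and the evader is driven straight toward the target.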
4. R is the reward function. The goal of each agent is to maximize its cumulative
reward R_i = \sum_{t=0}^{T} γ^t r_t^i, where γ ∈ [0, 1] is the discount factor, T is the length
of one episode, and r_t^i is the reward of agent i at time step t, i.e.,

$$r_t^i = \begin{cases} 1, & \|P_p^i(t) - P_e^j(t)\| \le \delta_c, \\ -1, & \|P_e^j(t) - P_T\| \le \delta_T, \\ -10, & \|P_p^i(t) - P_T\| \le \delta_T, \\ -0.10, & \|P_p^i(t) - P_p^{i'}(t)\| \le \delta_p, \\ -0.15, & \|P_p^i(t) - P_o^k(t)\| \le \delta_o, \\ -\beta \frac{d_{i,j}}{d_{\max}}, & \text{otherwise}, \end{cases} \tag{6}$$

where r_t^i = 1 if the evader is captured, r_t^i = −1 if the evader reaches the
target space, r_t^i = −10 if pursuer i enters the target space, r_t^i = −0.10 if
two pursuers collide with each other, and r_t^i = −0.15 if the pursuer collides with
an NCIE in the environment. −β d_{i,j}/d_max is a negative reward calculated based
on the distance between the pursuer and the evader: β determines the weight
of the penalty, d_{i,j} is the Euclidean distance between pursuer i and evader j, and d_max is the
maximum possible distance in the environment.
5. F : S × A → S defines the state transition probability density function that
maps the current states and actions into the next states.
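The reward in Eq. (6) can be sketched as a cascade of checks on precomputed distances; this sketch applies the cases in the order listed (the paper does not specify a precedence when several conditions hold), the threshold defaults follow the simulation setup in Sect. 4, and β and d_max are illustrative:

```python
def pursuer_reward(d_evader, d_target_p, d_target_e, d_pursuers, d_ncies,
                   delta_c=1.0, delta_T=5.0, delta_p=1.0, delta_o=1.0,
                   beta=0.1, d_max=100.0):
    """Per-step reward of Eq. (6) for one pursuer, given Euclidean
    distances to the evader, the target center (from the pursuer and
    from the evader), the other pursuers, and the NCIEs."""
    if d_evader <= delta_c:                        # evader captured
        return 1.0
    if d_target_e <= delta_T:                      # evader reached the target
        return -1.0
    if d_target_p <= delta_T:                      # pursuer entered the target
        return -10.0
    if any(d <= delta_p for d in d_pursuers):      # pursuer-pursuer collision
        return -0.10
    if any(d <= delta_o for d in d_ncies):         # pursuer-NCIE collision
        return -0.15
    return -beta * d_evader / d_max                # dense distance penalty
```

The final dense term gives a gradient toward the evader even when no terminal or collision event occurs.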
where γ ∈ [0, 1] is the discount factor and s_0 is the initial state of the
environment. In policy-based RL algorithms, we can choose a parameterized
policy π_φ(a_t|s_t) to maximize the cumulative discounted reward

$$G(\phi) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t r(s_t, a_t; \phi) \right].$$

Many solutions have been designed to efficiently solve the
MAMDP problem [13], and motivated by [16], we apply the Advantage Actor-Critic (A2C) method to the problem. A2C is a DRL method combining policy-based
and value-based reinforcement learning. In A2C, there are two neural
networks, i.e., the actor and critic networks. The actor network provides the
stochastic policy to choose the actions such that the expected cumulative reward
G is maximized. Here, we apply Proximal Policy Optimization (PPO) [16], which
is reported to outperform other policy-based methods in continuous control environments [16]. Denote π_φ(a_t|s_t) as the parameterized policy for PPO, where φ is the
set of weights. The loss function in PPO is

$$L^{CLIP}(\phi) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\phi)\hat{A}_t, \; \mathrm{clip}\left( r_t(\phi), 1-\epsilon, 1+\epsilon \right)\hat{A}_t \right) \right], \tag{8}$$

where ε is the clipping ratio and r_t(φ) is the ratio between the current policy π_φ(a_t|s_t)
and the old policy π_{φ_old}(a_t|s_t), i.e., r_t(φ) = π_φ(a_t|s_t) / π_{φ_old}(a_t|s_t). Â_t in Eq. (8) is the advantage function, computed via generalized advantage estimation (GAE) [15],

$$\hat{A}_t = \sum_{l=0}^{\infty} (\gamma\lambda)^l \delta_{t+l}^{V}.$$

Here, δ_{t+l}^V is the temporal difference (TD) residual error at time step t + l, i.e.,

$$\delta_{t+l}^{V} = r_{t+l} + \gamma V_\nu(s_{t+l+1}) - V_\nu(s_{t+l}),$$

where V_ν is the state value, which is estimated by the critic network, and ν
denotes the set of weights of the critic network. The objective of the critic
network is to minimize the loss function L(ν), which is defined as the mean square
error between the estimated state value and the expected cumulative reward, i.e.,

$$L(\nu) = \hat{\mathbb{E}}_t\left[ \left( V_\nu(s_t) - R_t \right)^2 \right].$$

The actor and critic parameters are updated by stochastic gradient steps on L^{CLIP}(φ) and L(ν),

$$\phi \leftarrow \phi + \beta_a \nabla_\phi L^{CLIP}(\phi), \qquad \nu \leftarrow \nu - \beta_c \nabla_\nu L(\nu). \tag{12}$$

Here, β_a and β_c are the learning rates for the actor and critic networks, respectively. The updated actor network is then forwarded to all the agents to
derive their actions based on their observations.
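The clipped surrogate of Eq. (8) and the GAE advantage estimate can be illustrated with a small NumPy sketch; this is a batch-level illustration under the assumption of a terminated episode (value bootstrapped to zero after the last step), not the RLlib implementation used in the paper:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation: discounted sum of TD residuals
    delta_t = r_t + gamma*V(s_{t+1}) - V(s_t), with V = 0 after the
    final step of the episode."""
    deltas = rewards + gamma * np.append(values[1:], 0.0) - values
    adv, acc = np.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * lam * acc
        adv[t] = acc
    return adv

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped surrogate of Eq. (8), a quantity to be maximized."""
    ratio = np.exp(logp_new - logp_old)
    return np.mean(np.minimum(ratio * adv,
                              np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv))
```

When the new and old policies coincide, the ratio is 1 and the surrogate reduces to the mean advantage, which makes the clipping behavior easy to sanity-check.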
Algorithm 1 summarizes the Proximal Policy Optimization-based Advantage
Actor-Critic (PPO-A2C) algorithm, which is implemented in RLlib [11], an RL application
programming interface (API) for training multi-agent systems with reinforcement learning.
4 Simulation Results
We conduct extensive simulations to validate the proposed PPO-A2C algorithm. The environment has an arena of size 80 m × 60 m, in which four
pursuers try to capture one evader while five NCIEs navigate the space. The captured and
collision radii for pursuers/evader/NCIEs (i.e., δc /δ p /δo ) are set to 1 m. The
maximum speeds of the pursuers and the evader are the same (10 m/s), while the
NCIEs move with a constant speed of 1 m/s. At the start of each episode, the pursuers are
initialized in a random location inside the circular alarmed area with a radius
of 30 m but outside the circular target area with a radius of 5 m. Meanwhile,
the evader is randomly initialized outside the alarmed area and the NCIEs are
initialized with a random location inside the environment and heading θ. The
length of each episode is 40 s of simulation time, and an episode ends early if the
evader is captured by any pursuer or if any pursuer enters the protected space.
Figure 3 shows a still frame of the pursuers capturing the evader. For additional
full-length videos, please visit our shared folder on Google Drive1. The blue trace
lines show the trajectories taken by the pursuers to reach the evader. The hyperparameters for PPO-A2C are listed in Table 1. Also, the simulation is conducted
on a computer with a 12-core Intel Core i7 processor, 16 GiB of memory,
and an Nvidia GeForce RTX 2070 GPU, for 20 hours. Figure 4 shows
the learning curves for the pursuers trained with the proposed PPO-A2C and for the
evader driven by Eq. (5), respectively. From Fig. 4 we can see that the A2C
Algorithm 1. PPO-A2C
Input: initial policy parameters φ0 , initial value function parameters ν0
for iteration = 1, 2, · · · do
for each agent i = 1, 2, · · · , N do
Run actor network to derive the related action ati based on the observation;
Interact with the environment to obtain r_t^i and s_{t+1}^i;
Transmit (s_t^i, a_t^i, r_t^i, s_{t+1}^i) to the centralized server;
end for
Update φ and ν based on Eq. (12) via mini-batch SGD;
Forward the updated actor network to all the agents;
end for
1 https://drive.google.com/drive/folders/1GNMzj1GrD3Ov8oOhKUZCorw5D4Xpg2ul?
usp=sharing.
models of the pursuers eventually converge. In contrast, the evader's normalized
reward decreases as the pursuers learn an optimal policy.
We compare the performance of the proposed PPO-A2C to a baseline method,
i.e., artificial potential function pursuers (APFP), where each pursuer uses the
artificial potential field function to determine its action to capture the evader.
We also apply Multi-Agent Reinforcement Learning Pursuers (MARLP), which
also uses PPO-A2C, but the pursuers neglect the existence of the NCIEs and there
is no penalty for pursuer-to-pursuer collisions.

Table 1. Hyperparameters

Parameter        Value
γ                0.99
λ                0.95
Minibatch size   4096
ε                0.2
β_a, β_c         5 × 10^-5

Note that for all the methods,
the evader uses APF (i.e., Eq. (5)) to reach the
target area, while trying to avoid the collisions. The metrics to evaluate the pol-
icy performance of PPO-A2C, APFP and MARLP are the capture probability
of evaders, target collision rate, NCIE collision rate, and pursuer collision rate.
The capture probability of evaders is defined as the number of episodes in which
the evader was captured divided by the total number of episodes. The target collision rate is
the number of episodes when the pursuer enters the protected space divided by
the total number of episodes. The NCIE collision rate is the number of collisions
between pursuers and NCIEs divided by the total number of time steps. Lastly,
the pursuer collision rate is defined as the number of collisions between pursuers
divided by the total number of time steps.
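The four metrics can be computed directly from episode logs; the log format (dictionary keys) below is an assumption of this sketch, not the paper's data structure:

```python
def evaluate_metrics(episodes):
    """Compute capture probability, target collision rate, NCIE collision
    rate, and pursuer collision rate from a list of episode logs. Each
    episode is a dict with boolean 'captured' and 'target_breach', and
    integer 'ncie_collisions', 'pursuer_collisions', and 'steps'."""
    n_ep = len(episodes)
    total_steps = sum(e["steps"] for e in episodes)
    return {
        "capture_probability":
            sum(e["captured"] for e in episodes) / n_ep,
        "target_collision_rate":
            sum(e["target_breach"] for e in episodes) / n_ep,
        "ncie_collision_rate":
            sum(e["ncie_collisions"] for e in episodes) / total_steps,
        "pursuer_collision_rate":
            sum(e["pursuer_collisions"] for e in episodes) / total_steps,
    }
```

Note that the first two metrics are normalized per episode while the collision rates are normalized per time step, matching the definitions above.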
Fig. 3. An example to show the trajectories of the evader and pursuers.
Fig. 4. Normalized rewards for the pursuer and evader during training.
Figure 5 shows the mean capture probability of the three methods over
episodes. PPO-A2C has the highest mean capture probability, which demonstrates its effectiveness. MARLP has the lowest mean capture probability
because it is trained in an environment without NCIEs and is thus unable
to adapt to the environment with NCIEs, leading to many collisions. Figure 6
illustrates the mean capture probability of the three methods when the number
of pursuers increases. From the figure, we can see that PPO-A2C can gener-
ate efficient policies for the pursuers to achieve the highest capture probability
when the number of pursuers is less than seven but achieves a similar capture
probability to APFP when the number of pursuers is no less than seven.
We also observe the target collision rate, pursuer collision rate, and NCIE col-
lision rate for the three methods by varying the number of pursuers. As shown in
Fig. 7a, PPO-A2C always maintains the lowest target collision rate as compared
to the other two methods. In Fig. 7b, we can find that PPO-A2C has a similar
collision rate to APFP. Interestingly, MARLP has the lowest pursuer collision
rate. Lastly, in Fig. 7c, APFP maintains the lowest NCIE collision rate even as
the number of pursuers increases, and PPO-A2C has slightly worse performance
but not as bad as MARLP. Based on the above figures, we can conclude that
PPO-A2C achieves a low collision rate comparable to the repulsive (APFP) algorithm
while deriving a better policy that captures the evaders with a higher probability.
Fig. 7. Probability of different collisions. (a) Target collision rate. (b) Pursuer collision
rate. (c) NCIE collision rate.
5 Conclusion
In this paper, we proposed PPO-A2C to tackle the counter-UAS (C-UAS) scenario. PPO-A2C has been demonstrated to outperform the traditional APFP method against
a smart evader. In addition, each agent can avoid obstacles and maintain a safe
distance around the protected space. One limitation of our method is the
lack of cooperative behaviors among the pursuers: during the simulations, the
pursuers were observed to behave greedily when catching the evader. One possible
way to overcome this issue is to design reward values that foster collaboration
among the pursuers. We will tackle this limitation in future research. Furthermore, we plan to improve the proposed method so that each pursuer does not
need the complete state information of the environment to train the
models and derive its action. Lastly, we will extend the proposed PPO-A2C
method to capture multiple evaders.
References
1. Michel, A.H.: Counter-drone systems. http://dronecenter.bard.edu/counter-drone-
systems/ (2018)
2. Çetin, E., Barrado, C., Pastor, E.: Counter a drone in a complex neighborhood
area by deep reinforcement learning. Sensors 20(8), 2320 (2020)
3. Chipade, V.S., Marella, V.S.A., Panagou, D.: Aerial swarm defense by StringNet
herding: theory and experiments. Front. Robot. AI 8, 81 (2021)
4. De Souza, C., Newbury, R., Cosgun, A., Castillo, P., Vidolov, B., Kulić, D.:
Decentralized multi-agent pursuit using deep reinforcement learning. IEEE Robot.
Autom. Lett. 6(3), 4552–4559 (2021)
5. Glotfelter, P., Egerstedt, M.: A parametric MPC approach to balancing the cost of
abstraction for differential-drive mobile robots. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 732–737. IEEE (2018)
6. Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using
deep reinforcement learning. In: Sukthankar, G., Rodriguez-Aguilar, J.A. (eds.)
AAMAS 2017. LNCS (LNAI), vol. 10642, pp. 66–83. Springer, Cham (2017).
https://doi.org/10.1007/978-3-319-71682-4 5
7. Hasan, Y.A., Garg, A., Sugaya, S., Tapia, L.: Defensive escort teams for navigation
in crowds via multi-agent deep reinforcement learning. IEEE Robot. Autom. Lett.
5(4), 5645–5652 (2020)
8. Hasan, Y.A., Garg, A., Sugaya, S., Tapia, L.: Scalable reinforcement learning poli-
cies for multi-agent control. In: 2021 IEEE/RSJ International Conference on Intel-
ligent Robots and Systems (IROS)
9. Hüttenrauch, M., Adrian, S., Neumann, G., et al.: Deep reinforcement learning for
swarm systems. J. Mach. Learn. Res. 20(54), 1–31 (2019)
10. Kamimura, A., Ohira, T.: Group chase and escape. New J. Phys. 12(5), 053013
(2010)
11. Liang, E., et al.: RLlib: abstractions for distributed reinforcement learning. In: Proceedings of the 35th International Conference on Machine Learning (ICML) (2018)
12. Lowe, R., Wu, Y.I., Tamar, A., Harb, J., Pieter Abbeel, O., Mordatch,
I.: Multi-agent actor-critic for mixed cooperative-competitive environments.
arXiv:1706.02275 (2017)
13. Oroojlooy, A., Hajinezhad, D.: A review of cooperative multi-agent deep reinforce-
ment learning. arXiv preprint arXiv:1908.03963 (2019). https://doi.org/10.1007/
s10489-022-04105-y
14. Pierson, A., Wang, Z., Schwager, M.: Intercepting rogue robots: an algorithm for
capturing multiple evaders with multiple pursuers. IEEE Robot. Autom. Lett. 2(2),
530–537 (2016)
15. Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional con-
tinuous control using generalized advantage estimation. arXiv:1506.02438 (2015)
16. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy
optimization algorithms. arXiv:1707.06347 (2017)
17. Shishika, D., Paulos, J., Kumar, V.: Cooperative team strategies for multi-player
perimeter-defense games. IEEE Robot. Autom. Lett. 5(2), 2738–2745 (2020)
18. Terry, J.K., Grammel, N., Hari, A., Santos, L., Black, B.: Revisiting parameter
sharing in multi-agent deep reinforcement learning (2021)
19. Wang, J., Liu, Y., Song, H.: Counter-unmanned aircraft system(s) (C-UAS): state
of the art, challenges, and future trends. IEEE Aerosp. Electron. Syst. Mag. 36(3),
4–29 (2021)
20. Yu, C., Dong, Y., Li, Y., Chen, Y.: Distributed multi-agent deep reinforcement
learning for cooperative multi-robot pursuit. J. Eng. 2020(13), 499–504 (2020)
21. Zhang, K., Yang, Z., Başar, T.: Multi-agent reinforcement learning: a selective
overview of theories and algorithms. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L.,
Cansever, D. (eds.) Handbook of Reinforcement Learning and Control. SSDC,
vol. 325, pp. 321–384. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-
60990-0 12
22. Zhang, R., Zong, Q., Zhang, X., Dou, L., Tian, B.: Game of drones: multi-UAV
pursuit-evasion game with online motion planning by deep reinforcement learning.
IEEE Trans. Neural Netw. Learn. Syst. (2022)
23. Zhang, S., Liu, M., Lei, X., Yang, P., Huang, Y., Clark, R.: Synchronous intercept
strategies for a robotic defense-intrusion game with two pursuers. Auton. Robot.
45(1), 15–30 (2021)
24. Zhou, Z., et al.: Cooperative pursuit with Voronoi partitions. Automatica 72, 64–72 (2016)
Decentralized Risk-Aware Tracking
of Multiple Targets
1 Introduction
Tracking multiple targets using a decentralized robot team finds applications in
many settings, e.g., tracking biochemical or nuclear hazards [2] or hot-spots in a
fire [3]. In these settings, approaching the targets closely generally produces more
accurate state estimates, but this comes at a cost: the closer the team of robots
comes to the targets, the more likely they are to experience sensor failures or other
system faults due to, e.g., radiation or heat emitted by the targets. In certain
cases, the targets may even be actively hostile. The challenge is to maximize
tracking accuracy while avoiding risky behavior on the part of the robot team. A
common strategy for achieving team coordination is to devise a central controller
G.S. Sukhatme holds concurrent appointments as a Professor at USC and as an Ama-
zon Scholar. This paper describes work performed at USC and is not associated with
Amazon.
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 408–423, 2024.
https://doi.org/10.1007/978-3-031-51497-5_29
Decentralized Risk-Aware Tracking of Multiple Targets 409
to make team-level decisions for all robots [4]. This relies on the assumption that
all robots have a sufficiently large communication range to stay connected with
a central station. In real applications, robots may have a limited communication
range. A decentralized approach – where robots only communicate with nearby
team members – is a natural choice [5,6]. In a decentralized strategy, each robot
makes its own decisions according to local information gathered by itself and its
peers within a certain communication range. If such a decentralized strategy is
to achieve the same level of task execution as a centralized system, it is essential
for it to maintain network connectivity, such that robots could share information
mutually and make the most use of their resources.
Motivated by these issues, we present a decentralized risk-aware multi-target
tracking framework. Our contributions are as follows:
1. Our system is risk aware – it automatically trades off tracking quality and
safety;
2. Our system is fully decentralized – each robot makes decisions relying
on information sharing between neighbors within a communication range.
We apply the control barrier function (CBF) in a decentralized manner to
guarantee network connectivity maintenance and collision avoidance;
3. Our system produces comparable results to its centralized coun-
terpart – we extensively evaluate the performance of our system through
simulated experiments in the Gazebo environment under various settings, and
compare it to its centralized counterpart from our previous work [1]. We
present qualitative demonstrations to provide an intuitive illustration. Quantitative results on metrics including tracking error and risk level are also
reported.
2 Related Work
Multi-robot multi-target tracking is usually formulated as an iterative state esti-
mation problem [7–12]. Various methods have been proposed for this; most of
them design different flavors of filters to reduce the uncertainty in target pose
estimation during the tracking process [1,7,13,14]. In the more challenging sce-
nario where the targets may adversarially cause damage to the robots, safety
becomes a major concern when the robot team plans its motion. To incorporate
risk-awareness into a multi-robot system, [15] uses safety barrier certificates to
ensure safety. With a special emphasis on tracking and covering wildfire, [3]
encodes hazards from fire using an artificial potential field. Mutual information
is used in [2] to build a probabilistic model of the distribution of hazards in the
environment.
Considering that a multi-robot system controlled by a central planner does
not scale well [1], efforts have been made to decentralize the multi-target tracking
task. For example, [16] adopts the local algorithm for solving the max-min prob-
lem to solve multi-target tracking in a distributed manner. In [17], a decentralized
MPC-based strategy is proposed that generates collision-free trajectories. In [18],
the general problem of multi-robot coordination while preserving connectivity is
410 J. Liu et al.
3 Problem Formulation
3.1 Notation
We use capital letters in boldface (e.g., A) to denote matrices, lowercase letters
in boldface (e.g., x) to denote vectors, and lowercase letters in regular font (e.g.,
q) to denote scalars. 1_{M×N} denotes the all-ones matrix of shape M × N, and I_M is
the identity matrix of shape M × M. M_1 ⊗ M_2 gives the Kronecker product of
the two matrices M_1 and M_2. [v_1; v_2; ···; v_n] represents vertical vector concatenation; the vertical concatenation of matrices or scalars is defined in the same
fashion. Diag([M_1, M_2, ···, M_n]) represents block-diagonal matrix construction with
the constituents M_1, M_2, ···, M_n placed along the diagonal. Tr(M) is the trace of
matrix M. We use ||·|| to denote the 2-norm of a vector. For a set S, |S| returns
its cardinality. [z] denotes the set {1, 2, ···, z} for an arbitrary integer z ∈ Z^+.
N(0, Q) represents the Gaussian distribution with zero mean and covariance
matrix Q.
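For readers implementing this notation, the conventions map directly onto NumPy primitives; a quick illustration (the values below are arbitrary):

```python
import numpy as np

ones_23 = np.ones((2, 3))   # 1_{2x3}: all-ones matrix
I2 = np.eye(2)              # I_2: identity matrix

# Kronecker product I_2 (x) 1_{2x3} has shape (4, 6)
K = np.kron(I2, ones_23)

# Vertical concatenation [v1; v2]
v = np.concatenate([np.array([1.0, 2.0]), np.array([3.0])])

# Diag([I_2, 2*I_2]): block-diagonal construction
D = np.zeros((4, 4))
D[:2, :2], D[2:, 2:] = I2, 2 * I2

print(K.shape, np.trace(D), np.linalg.norm(v))
```

Here `np.trace` and `np.linalg.norm` correspond to Tr(·) and ||·||, respectively.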
Based on this, Γ_t denotes the sensor status of the whole robot team at time t. The
set of sensors available at time t for robot i is denoted as γ_{i,t} = {k ∈ [U] | Γ^k_{i,t} = 1}.
Combining the sensor status with the measurement characteristic of each sensor,
the measurement matrix for robot i taking measurements of target j is
H_{ij,t} = [h_{γ_{i,t}(1)}; h_{γ_{i,t}(2)}; ···; h_{γ_{i,t}(|γ_{i,t}|)}], and the measurement matrix for robot i
measuring all M targets is H_{i,t} = I_M ⊗ H_{ij,t}.
Without loss of generality, we assume that all the robots have a homoge-
neous maximum communication range Rcomm , as heterogeneity in the robots’
communication ability is beyond the scope of this paper and is left for future
exploration. For robot i, we define the region in which other robots can exchange
information with it as χ_i := {x ∈ X | ||x − x_i|| ≤ R_comm}. We call the
robots within χ_i robot i's neighbors; a robot is only able to communicate with its neighbors. The set of indices of robot i's neighbors is defined as
N_i := {j ∈ [N], j ≠ i | x_j ∈ χ_i}. Note that N_i is time-varying for each robot i ∈ R
as the robots actively reconfigure their positions to track the targets.
Let the M targets be indexed by T := [M]. Denote the true state of target
j as z j,t ∈ X ⊆ R p , and its control input as v j,t ∈ V ⊆ Rq , both at time step t.
The dynamics of the targets follow

$$z_{t+1} = A z_t + B v_t + w_t,$$

where A and B are the process and control matrices, respectively, and z_t and v_t
are obtained by stacking z_{j,t} and v_{j,t}, ∀j ∈ T, vertically. w_t ∼ N(0, Q) is the
process Gaussian noise with covariance matrix Q. A, B and Q are assumed to
be known to the robots. Robot i’s measurement of the targets with noise ζ i,t =
[ζi1,t ; ζi2,t ; · · · ; ζiM,t ] is modeled by:
We let ζ_{ij,t} ∼ N(0, R_{ij,t}), ∀j ∈ T, where the measurement noise covariance matrix
R_{ij,t} grows exponentially as the distance between robot i and target j increases:

$$R_{ij,t}^{-1} = \mathrm{Diag}\left( \left[ R_{ij1,t}^{-1}, R_{ij2,t}^{-1}, \cdots, R_{ij|\gamma_{i,t}|,t}^{-1} \right] \right), \tag{5}$$

$$R_{ijk,t}^{-1} = w_k \exp\left( -\lambda_k \left\| x_{i,t} - z_{j,t} \right\| \right), \tag{6}$$

where w_k and λ_k are parameters characterizing the noise of sensor k ∈ γ_{i,t}. The
target risk field φ, which encodes the probability that a robot experiences sensor
failures induced by the targets, inherits the same definition as our previous
work [1, Sect. III-E, V]. In particular, the risk from target j acting on all robots
is

$$\phi_j(x) = \frac{c_j}{2\pi |\Sigma_j|} \exp\left( -\frac{1}{2} (x - z_j)^\top \Sigma_j (x - z_j) \right), \tag{7}$$

where c_j and Σ_j denote the peak value and covariance for the risk field of target j,
respectively. Correspondingly, the safety field is defined as π_j(x) = 1 − φ_j(x), ∀j ∈
T.
We focus on planning the motion of each robot in the team, in a decentralized
manner at each time step t throughout the tracking process, to obtain an accurate
estimate of the targets' positions while at the same time avoiding risk.
Specifically, for each robot i ∈ R, the goal is to solve the following optimization
program:
$$\begin{aligned}
\min_{x_i, \delta_{i1}, \delta_{i2}} \quad & q_1 \eta_i \|\delta_{i1}\|^2 + q_2 \frac{1}{1 + \eta_i} \delta_{i2}^2 && \text{(8a)} \\
\text{s.t.} \quad & \mathrm{Tr}(P_{ij}) \le \rho_{i1,j} + \delta_{i1,j}, \;\; \delta_{i1,j} \ge 0, \;\; \forall j \in T; && \text{(8b)} \\
& \mathrm{Tr}(O_{i,\Pi}^{-1}) \le \rho_{i2} + \delta_{i2}, \;\; \delta_{i2} \ge 0; && \text{(8c)} \\
& |N_i| > 0; && \text{(8d)} \\
& \|x_i - \bar{x}_i\| \le d_{\max}; && \text{(8e)} \\
& \|x_i - x_l\| \ge d_{\min}, \;\; \forall l \in R, \; l \ne i. && \text{(8f)}
\end{aligned}$$
The time index t is dropped here for the brevity of notation. xi is the optimization
variable representing robot i’s position at the next step, while x̄i refers to robot
i’s current position. The cost function Eq. (8a) balances competing objectives of
high tracking quality (first term) and safety (second term), using the sensor mar-
gin ηi which encodes the abundance of sensors available for robot i. The larger
the value of ηi , the more powerful the sensor suite on robot i is for observing
the targets, and therefore the first term in the objective is more stressed. On the
contrary, when the sensors on robot i are limited, η_i becomes small, causing
the second term in Eq. (8a) to be emphasized, thereby prioritizing safety. Here, q_1 and q_2 are constants
weighing the two objectives. The tracking error is represented using the trace
of target estimation covariance matrix Pi j and is upper bounded by threshold
ρ_{i1,j}, j ∈ T, to ensure tracking accuracy, as in Eq. (8b). Meanwhile, the risk of
sensor failure is encoded by Tr(O_{i,Π}^{-1}), where O_{i,Π} is the risk-aware observability
Gramian for robot i, defined similarly to the ensemble form in [1, Eq. (14)]. The
risk level is upper bounded by a pre-specified threshold ρ_{i2} to discourage highly
risky behavior, as in Eq. (8c). δ_{i1,j}, ∀j ∈ T, and δ_{i2} are slack variables, which
enable “bend but not break” behavior when sensor failures render it impossible
to meet the strict constraints ρ_{i1,j}, ∀j ∈ T, and ρ_{i2}. See [1, Sect. IV-B, C,
D] for more details on the design of sensor margin and observability Gramian.
Moreover, according to Eq. (8d), each robot should have at least one neighbor,
so that the communication network formed by the robots remains connected.
Note that the connectivity relationship is determined using xi rather than x̄i ,
to ensure that the robots are still connected at the next time step. Constraint
Eq. (8e) specifies the maximum distance dmax that robot i could travel between
two consecutive steps. Finally, Eq. (8f) is the collision avoidance constraint to
guarantee the minimum safety distance dmin between two robots. Notice that
the optimization program in Eq. (8) is a non-convex problem that is computa-
tionally difficult to solve. We discuss an approach to solve it in two stages in the
next section.
Decentralized Risk-Aware Tracking of Multiple Targets 413
4 Approach
In the proposed decentralized system, every robot relies on four major modules
to perform the target tracking task, as illustrated in Fig. 1. At each time step
t, every robot first generates its ideal goal state that it wants to reach without
considering constraints including network connectivity and collision avoidance.
Its decisions are only based on trading off tracking quality and risk aversion. This
is performed by the Ideal Action Generation module, as introduced in Sect. 4.1.
Sect. 4.2 corresponds to the Connectivity and Collision Avoidance module, which
employs CBF to ensure network connectivity and to avoid inter-robot collisions
when the robots move towards their respective goal positions. After each robot
moves to its new position, the Decentralized Estimation module generates a new
estimate of target positions and maintains common knowledge of such estimate
among neighbors using the Kalman-Consensus Filtering algorithm in [22]. We
introduce it in Sect. 4.3. Note that the Local Information Exchange module is
responsible for information sharing within the neighborhood and all the other
three modules rely on it.
Fig. 1. The proposed decentralized framework. Each robot performs the decentralized
target tracking task using 4 modules: Local Information Exchange, Decentralized Ideal
Action Generation, Decentralized Connectivity and Collision Avoidance, and Decen-
tralized Estimation. The Local Information Exchange module takes charge of all com-
munications between neighbors.
4.1 Decentralized Ideal Action Generation

In this section, we focus on computing the “ideal” positions that robots want to
reach at every time step, only out of the desire for higher tracking accuracy and
aversion to overly risky behavior, without considering the connectivity mainte-
nance and collision avoidance constraints. To this end, each robot i ∈ R solves a
non-convex optimization program as follows:
414 J. Liu et al.
$$\begin{aligned}
\min_{x_i,\,\delta_i^1,\,\delta_i^2}\quad & q_1\,\eta_i\,\|\delta_i^1\|^2 + \frac{q_2}{1+\eta_i}\,(\delta_i^2)^2 &\text{(9a)}\\
\text{s.t.}\quad & \operatorname{Tr}(P_{ij}) \le \rho_{i,j}^1 + \delta_{i,j}^1,\ \ \delta_{i,j}^1 \ge 0,\ \forall j \in \mathcal{T}; &\text{(9b)}\\
& \operatorname{Tr}(O_{i,\Pi}^{-1}) \le \rho_i^2 + \delta_i^2,\ \ \delta_i^2 \ge 0. &\text{(9c)}
\end{aligned}$$
This is a less constrained version of the optimization program in Eq. (8). Note
that to achieve team-level coordination, the computation of P_{ij} and O_{i,Π}^{-1} requires
not only robot i’s information, but also that from its neighbors acquired through
the Local Information Exchange module. Since each robot runs its own opti-
mization program, all robots in the team are required to solve the optimization
program in Eq. (9) over multiple rounds, until their ideal goal positions converge.
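As an illustration of this multi-round scheme, the coordination loop can be sketched in Python. The local solver below is a toy stand-in (each robot moves halfway toward the target centroid), not the actual nonconvex program of Eq. (9); only the round structure and the convergence test reflect the text.

```python
# Sketch of the multi-round coordination loop: every robot re-solves its
# local program using the latest shared information until the ideal goal
# positions stop changing. `solve_local_program` is a hypothetical
# placeholder, NOT the actual nonconvex solver of Eq. (9).

def solve_local_program(x_i, targets):
    """Stand-in for one robot's solve of Eq. (9); returns an ideal goal."""
    cx = sum(t[0] for t in targets) / len(targets)
    cy = sum(t[1] for t in targets) / len(targets)
    return ((x_i[0] + cx) / 2.0, (x_i[1] + cy) / 2.0)

def ideal_action_rounds(positions, targets, tol=1e-8, max_rounds=200):
    """Repeat until all robots' ideal goals converge (max change < tol)."""
    for _ in range(max_rounds):
        new = [solve_local_program(x, targets) for x in positions]
        change = max(abs(a - b) for p, q in zip(new, positions)
                     for a, b in zip(p, q))
        positions = new
        if change < tol:
            break
    return positions

goals = ideal_action_rounds([(0.0, 0.0), (4.0, 0.0)],
                            [(2.0, 2.0), (2.0, 4.0)])
```

With the toy solver, both robots' ideal goals converge to the centroid of the two targets.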
4.2 Decentralized Connectivity and Collision Avoidance

Each robot i ∈ R is steered towards its desired position by solving the following
quadratically constrained quadratic program:
$$\begin{aligned}
\min_{u_i}\quad & \frac{1}{2}\,\|u_i - u_{\mathrm{des},i}\|^2 &\text{(13a)}\\
\text{s.t.}\quad & \|u_i\|\,\Delta t \le d_{\max}; &\text{(13b)}\\
& \frac{\partial \lambda_2}{\partial \bar{x}_i}\,u_i + \lambda_2 - \epsilon \ge 0; &\text{(13c)}\\
& \frac{\partial d_{il}^2}{\partial \bar{x}_i}\,u_i + d_{il}^2 - d_{\min}^2 \ge 0,\ \forall l \in \mathcal{N}_i, &\text{(13d)}
\end{aligned}$$
where udes,i denotes the ideal control input which could be computed easily using
the desired ideal goal position generated in Sect. 4.1. Δt is the time interval
between two consecutive steps. x̄i is robot i’s state at the current step. dil is the
distance between robot i and l calculated using their current states x̄i and x̄l . The
cost function in Eq. (13a) minimizes the discrepancy between robot i’s actual
control input and its ideal one. Constraint Eq. (13b) specifies the maximum
distance dmax that each robot could move between consecutive steps. Moreover,
Eqs. (13c) and (13d) are the connectivity maintenance and collision avoidance
constraints, respectively, as introduced above.
In the decentralized system, every robot solves the optimization program
in Eq. (13) to generate its actual control input. Similar to Sect. 4.1, we adopt
an iterative procedure where all the robots exchange information and solve the
optimization program in Eq. (13) repeatedly, until their solutions ui, ∀i ∈ R
converge.
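A hedged sketch of the program in Eq. (13): the CBF constraints (13c)–(13d) are linear in u_i, of the common form g·u + b ≥ 0, so together with the speed bound (13b) they can be handed to a generic constrained solver. SciPy's SLSQP is used here purely for illustration; the text does not specify which QP solver the robots run.

```python
# Sketch: project the ideal input u_des onto the feasible set of Eq. (13).
# `grads`/`offsets` encode linear constraints g_k . u + b_k >= 0, covering
# both the connectivity constraint (13c) and the collision constraints (13d).
import numpy as np
from scipy.optimize import minimize

def cbf_qp(u_des, dt, d_max, grads, offsets):
    cons = [{"type": "ineq", "fun": lambda u, g=g, b=b: g @ u + b}
            for g, b in zip(grads, offsets)]
    # Speed bound (13b): ||u_i|| * dt <= d_max
    cons.append({"type": "ineq",
                 "fun": lambda u: d_max - dt * np.linalg.norm(u)})
    res = minimize(lambda u: 0.5 * np.sum((u - u_des) ** 2),
                   x0=np.zeros_like(u_des), constraints=cons,
                   method="SLSQP")
    return res.x

# Example: one half-space constraint  -u_x + 1 >= 0  clips u_des = (2, 0)
u = cbf_qp(np.array([2.0, 0.0]), dt=0.1, d_max=10.0,
           grads=[np.array([-1.0, 0.0])], offsets=[1.0])
```

The solution is the closest feasible input to the ideal one, which is exactly the "minimal deviation" behavior described by the cost in Eq. (13a).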
From the computational perspective, each robot i ∈ R needs to know λ2 and ∂λ2/∂x̄i to evaluate Eq. (13c). In a centralized system, a central controller has access to the information of every robot and can directly compute λ2 and ∂λ2/∂x̄i. However, in the decentralized system, the robots need to compute them using local information. Inspired by [23, Sect. III-IV], we utilize the Decentralized Power Iteration (PI) algorithm, which enables each robot to compute a local approximation of λ2 and ∂λ2/∂x̄i. Such approximation is achieved through repeated local information exchange. Specifically, in each round of the information exchange, the message passed between neighbors is:

$$E_i = \{\nu_i,\ y_{i,1},\ w_{i,1},\ y_{i,2},\ w_{i,2}\} \tag{14}$$
where ν_i approximates the i-th element of the eigenvector ν corresponding to λ2 of the weighted Laplacian matrix L. y_{i,1} is robot i's estimate of Ave({ν_i}), ∀i ∈ R, i.e., the average of all entries in ν. Similarly, y_{i,2} is robot i's estimate of Ave({ν_i²}), ∀i ∈ R, i.e., the average of the squares of all entries in ν. w_{i,1} and w_{i,2} are auxiliary variables introduced for computation purposes.
[23] shows that the decentralized PI algorithm converges in a few iterations.
Once it converges, every robot computes its local estimate of the algebraic con-
nectivity λ2 as:
$$\lambda_2^i = -\,\frac{\sum_{l \in \mathcal{N}_i} L_{il}\,\nu_l}{\nu_i} \tag{15}$$
Additionally, from [19, Eq. (8)], ∂λ2/∂x̄i is computed as:

$$\frac{\partial \lambda_2}{\partial \bar{x}_i} = \sum_{l \in \mathcal{N}_i} \frac{\partial a_{il}}{\partial \bar{x}_i}\,(\nu_i - \nu_l)^2 \tag{16}$$
where a_il is the weighted adjacency element introduced in Eq. (12). In this way, ∂λ2/∂x̄i is computed locally using the values of ν_i, ν_l obtained from the decentralized PI algorithm. Substituting the approximations of λ2 and ∂λ2/∂x̄i into the optimization program in Eq. (13), each robot i ∈ R can solve Eq. (13) to obtain its actual control input.
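The decentralized PI algorithm itself is only summarized above, but its centralized analogue is easy to sketch: power iteration on cI − L, with the all-ones null vector of the Laplacian deflated each round, converges to the Fiedler vector ν, from which λ2 follows by a Rayleigh quotient. The shift c and the starting vector below are illustrative assumptions; in [23] the averaging and deflation steps are realized with neighbor-only exchanges.

```python
# Centralized sketch of what the decentralized PI approximates: estimate the
# Fiedler vector nu and lambda_2 of a graph Laplacian L by power iteration
# on (c*I - L), deflating the trivial eigenvector 1 every round.
import math

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def fiedler_power_iteration(L, iters=500):
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n)) + 1.0   # > lambda_max(L)
    v = [math.sin(i + 1.0) for i in range(n)]         # arbitrary start
    for _ in range(iters):
        mean = sum(v) / n
        v = [vi - mean for vi in v]                    # deflate span{1}
        Lv = matvec(L, v)
        w = [c * v[i] - Lv[i] for i in range(n)]       # apply c*I - L
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    Lv = matvec(L, v)
    return sum(v[i] * Lv[i] for i in range(n)), v      # (lambda_2, nu)

# Path graph on 3 nodes: Laplacian eigenvalues are 0, 1, 3, so lambda_2 = 1
L = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
lam2, nu = fiedler_power_iteration(L)
```

Deflating the all-ones direction is what isolates the second-smallest eigenvalue: on the subspace orthogonal to 1, the dominant eigenvector of cI − L is exactly the Fiedler vector.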
4.3 Decentralized Estimation

Whenever the robot team reaches a new state, each robot obtains new measure-
ments of the targets and runs a Kalman Filter (KF) independently to generate
new estimates of all targets’ positions. Specifically, each robot i ∈ R computes the a priori estimate ẑ_{i,t|t−1} and covariance matrix P_{i,t|t−1} during the prediction step as:

$$\hat{z}_{i,t|t-1} = A\,\hat{z}_{i,t-1|t-1}, \qquad P_{i,t|t-1} = A\,P_{i,t-1|t-1}\,A^{\top} + Q, \tag{17}$$
where ẑi,t−1 |t−1 and Pi,t−1 |t−1 represent robot i’s estimate of the targets’ positions
and the corresponding covariance matrix at the previous time step, respectively.
A is the process matrix for targets, and Q is the targets’ process noise covariance
matrix, both introduced in Sect. 3.2.
After robot i ∈ R obtains a new measurement yi,t of the targets, the mea-
surement residual ỹi,t and Kalman gain Ki,t are calculated as:
$$\tilde{y}_{i,t} = y_{i,t} - H_{i,t}\,\hat{z}_{i,t|t-1}, \qquad K_{i,t} = P_{i,t|t-1}\,H_{i,t}^{\top}\,S_{i,t}^{-1}, \tag{18}$$

where S_{i,t} = H_{i,t} P_{i,t|t−1} H_{i,t}^⊤ + R_{i,t}, with R_{i,t} being the measurement noise covariance matrix as introduced in Sect. 3.2. The new measurement is incorporated into robot i’s estimation during the update step of the KF:

$$\hat{z}_{i,t} = \hat{z}_{i,t|t-1} + K_{i,t}\,\tilde{y}_{i,t}, \qquad P_{i,t} = P_{i,t|t-1} - K_{i,t}\,S_{i,t}\,K_{i,t}^{\top}. \tag{19}$$
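The prediction step and the update equations (18)–(19) can be sketched as follows; the static target and the two scalar measurement rows are made-up numbers for illustration, not the authors' setup.

```python
# Minimal sketch of one robot's KF cycle; A, Q, H, R as in the text.
import numpy as np

def kf_step(z_hat, P, y, A, Q, H, R):
    # Prediction step
    z_pred = A @ z_hat
    P_pred = A @ P @ A.T + Q
    # Measurement residual and Kalman gain (Eq. 18)
    y_res = y - H @ z_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # Update step (Eq. 19)
    return z_pred + K @ y_res, P_pred - K @ S @ K.T

# Static 2-D target at (2, 3) observed through two scalar sensor rows
A, Q = np.eye(2), 1e-6 * np.eye(2)
H = np.array([[1.0, 0.0], [1.0, 1.0]])   # x-direction and combined sensing
R = 1e-2 * np.eye(2)
z, P = np.zeros(2), np.eye(2)
for _ in range(50):
    z, P = kf_step(z, P, H @ np.array([2.0, 3.0]), A, Q, H, R)
```

With noiseless measurements the estimate converges to the true target position and the covariance trace shrinks toward the process-noise floor.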
Since each robot runs its own KF, their estimates may be different. To
maintain a common estimate of the targets’ positions, we utilize the Kalman-
Consensus Filtering algorithm [22, Table I]. This is an iterative approach for the
robots to reach an agreement on their estimates of the targets’ positions. Each
robot first exchanges the information matrix Ω_i^t(0) = (P_{i,t})^{-1} and information vector q_i^t(0) = (P_{i,t})^{-1} ẑ_{i,t} with its neighbors. Then, each robot i ∈ R performs multiple rounds of the following convex combination:

$$\Omega_i^t(\tau+1) = \sum_{l \in \mathcal{N}_i} \tilde{\pi}_{i,l}\,\Omega_l^t(\tau), \qquad q_i^t(\tau+1) = \sum_{l \in \mathcal{N}_i} \tilde{\pi}_{i,l}\,q_l^t(\tau), \tag{20}$$
until the values of Ω_i^t and q_i^t converge. Note that π̃_{i,l} is the weight between robot i and l, which is assigned using the method in [24]. Then, each robot uses Ω_i^t and q_i^t from the last round of the consensus algorithm to compute the
updated estimates of target positions and the covariance matrix. Based on [24],
it is guaranteed that once the consensus algorithm converges, each robot has the
same estimates of targets’ positions as its neighbors.
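The consensus rounds of Eq. (20) can be sketched with scalar information quantities for brevity. The doubly stochastic weights below are an assumption standing in for the weight design of [24]; in practice π̃_{i,l} is nonzero only within each neighborhood.

```python
# Sketch of the Kalman-Consensus averaging of Eq. (20): robots iteratively
# take convex combinations of their information matrices/vectors, then each
# recovers the estimate z = q / Omega. Scalars stand in for matrices.

def kcf_consensus(Omega, q, pi, rounds=100):
    n = len(Omega)
    for _ in range(rounds):
        Omega = [sum(pi[i][l] * Omega[l] for l in range(n)) for i in range(n)]
        q = [sum(pi[i][l] * q[l] for l in range(n)) for i in range(n)]
    return Omega, q

# Two fully connected robots with local estimates 1.0 and 3.0
Omega0, q0 = [2.0, 4.0], [2.0, 12.0]      # q = Omega * z_hat
pi = [[0.5, 0.5], [0.5, 0.5]]             # assumed doubly stochastic weights
Omega, q = kcf_consensus(Omega0, q0, pi)
estimates = [q[i] / Omega[i] for i in range(2)]
```

Because the weights are doubly stochastic, every robot converges to the same fused information pair and hence the same target estimate, which is the agreement property the text attributes to [24].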
5 Experiments
Through simulated experiments in the Gazebo environment, we show that the
proposed system can track multiple targets in a decentralized, risk-aware, and
adaptive fashion, while achieving performance comparable to its centralized
counterpart.
At the start of the target tracking process, each robot is equipped with three
heterogeneous sensors which have the following measurement vectors:
$$h_1 = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 0 & 1 \end{bmatrix}, \quad h_3 = \begin{bmatrix} 1 & 1 \end{bmatrix} \tag{21}$$
i.e., the first type of sensor has sensing ability in the x direction, the second
type of sensor has sensing ability in the y direction, and the third type has
both. Sensor failures are simulated randomly by flipping a biased coin, where
the probability of failure is proportional to the total risk at each robot’s posi-
tion induced by all the targets. Each robot’s random sensor failure is generated
independently. For an illustration of the system performance, a demo of the Gazebo simulations is available online¹, where five robots track three targets.
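The sensor-failure model (a biased coin whose failure probability is proportional to the total risk at the robot's position) can be sketched as follows; the proportionality constant k and the cap at one are our assumptions for illustration, not values from the text.

```python
# Sketch of the random sensor-failure model: at every step, each surviving
# sensor on a robot fails independently with probability proportional to the
# total risk induced at the robot's position by all targets.
import random

def simulate_failures(sensor_status, total_risk, k=0.1, rng=None):
    rng = rng or random.Random()
    p_fail = min(1.0, k * total_risk)          # biased-coin probability
    return [alive and (rng.random() >= p_fail) for alive in sensor_status]

# Zero risk keeps the full suite; overwhelming risk destroys it
suite = [True, True, True]                      # three sensors, all working
```

Each robot would draw its own coins, matching the statement that failures are generated independently per robot.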
Here, we present qualitative results for the setting where five robots track four
targets. The initial sensor status as described in Eq. (2) is Γ0 = 1_{5×3}. For each robot i ∈ {1, 2, · · · , 5}, we let the parameters trading off tracking quality maximization against risk minimization in Eq. (9a) be q1 = 3.0, q2 = 20.0. The maximum tracking error in Eq. (9b) is chosen as ρ_i^1 = [0.001; 0.001; 0.001; 0.001], and the maximum risk in Eq. (9c) is ρ_i^2 = 0.15. The connectivity threshold in Eq. (13c) is chosen to be ε = 0.25.
Figure 2 compares the behavior of robots under three different settings: risk-
agnostic with communication range Rcomm = 11.0 (Fig. 2a), risk-aware with
1 https://www.youtube.com/watch?v=AAOKDGqkuz8.
Rcomm = 11.0 (Fig. 2b), and risk-aware with Rcomm = 7.0 (Fig. 2c). To simu-
late the risk-agnostic case, we remove constraint Eq. (9c) and the second term
in the cost function Eq. (9a) for all robots. All robots are initially placed around
the origin, and the targets move in concentric circles. In all three cases, the
robots move closer to the targets so that they can obtain more accurate estimates of the target positions. Meanwhile, these five robots spread out to track
different targets.
Figure 2a shows that without risk awareness, the robots directly approach the
targets to get better estimates of the targets’ positions. In contrast, the robots
keep a certain distance from the targets if the risk-aware constraint is added,
as shown in Fig. 2b. Such risk-aware behavior helps the robot team preserve its
sensor suite for a longer time. Figure 2c shows the behavior of the robot team
when the communication range is relatively small. Compared to Fig. 2b, the
robot team is more sparsely connected in this case.
Fig. 2. Comparison of the target tracking behaviors under three different settings. Grey
circles represent the robots. We draw the targets’ true positions and their trajectories over the latest 20 steps using red squares and red dotted lines. Correspondingly, blue squares and blue dotted lines represent the estimated target positions and trajectories. The purple-shaded regions around the targets indicate the distance-dependent risk field. A green edge denotes that the robots at its two endpoints are connected. (Color figure online)
We compare the proposed decentralized target tracking system with its central-
ized counterpart in terms of tracking error and sensor margin. The centralized
tracking system used for comparison is from our previous work [1]. For a fair
comparison, we use the same CBFs (Eq. (13c) and (13d)) for network connec-
tivity maintenance and collision avoidance.
Fig. 3. Comparison of the tracking error (Fig. 3a) and the sensor margin (Fig. 3b) when the risk-aware constraint is vs. is not added to our decentralized target tracking framework, with Rcomm = 18.0.
Fig. 4. The trade-off between tracking error and risk level with respect to the sensor
margin. When η is small, the system prioritizes safety; as η increases, the system
prioritizes higher tracking accuracy.
Fig. 5. Comparison of the centralized and decentralized systems under the same set-
ting. We enforce the same sequence of sensor failures on both of them as denoted by the
grey crosses. For the decentralized system, the Tr(P) and η shown are both obtained
from averaging across all robots.
We consider the scenario where 4 robots are tasked to track 4 targets. In both
the centralized and decentralized systems, we let Rcomm = 18.0, q1 = 1.0, q2 = 10.0 in Eq. (9a), and the connectivity threshold in Eq. (13c) be ε = 0.25. The maximum tracking error in Eq. (9b) is chosen as ρ_{i,j}^1 = 0.01, ∀j ∈ {1, 2, 3, 4}, and the maximum risk level in Eq. (9c) is chosen as ρ_i^2 = 0.33, for each robot indexed i ∈ {1, 2, 3, 4}. Initially, each robot is equipped with three sensors with measurement matrices h1, h2, h3, respectively, i.e., the initial sensor configuration as described in Eq. (2) is Γ0 = 1_{4×3}. To conveniently compare their performance,
we enforce the same sequence of pre-specified sensor failures instead of inducing
sensor failures randomly. In a total of 50 time steps, a sensor whose measure-
ment matrix is equal to h3 is damaged at time step 20; another sensor whose
measurement matrix is h2 is damaged at step 35. The comparison of the track-
ing error and sensor margin is illustrated in Fig. 5. When both systems have the
same changes in their sensor margin as shown in Fig. 5b, the tracking error of
the decentralized system is similar to that of the centralized one as shown in
Fig. 5a.
References
1. Mayya, S., et al.: Adaptive and risk-aware target tracking for robot teams with
heterogeneous sensors. IEEE Robot. Autom. Lett. 7(2), 5615–5622 (2022)
2. Schwager, M., Dames, P., Rus, D., Kumar, V.: A multi-robot control policy for
information gathering in the presence of unknown hazards. In: Christensen, H.I.,
Khatib, O. (eds.) Robotics Research, pp. 455–472. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-29363-9_26
3. Pham, H.X., La, H.M., Feil-Seifer, D., Deans, M.: A distributed control framework
for a team of unmanned aerial vehicles for dynamic wildfire tracking. In: 2017
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
pp. 6648–6653. IEEE (2017)
4. Robin, C., Lacroix, S.: Multi-robot target detection and tracking: taxonomy and
survey. Auton. Robot. 40(4), 729–760 (2016)
5. Sung, Y., Budhiraja, A.K., Williams, R.K., Tokekar, P.: Distributed assign-
ment with limited communication for multi-robot multi-target tracking. Auton.
Robots 44(1), 57–73 (2020)
6. Zhao, Y., Wang, X., Wang, C., Cong, Y., Shen, L.: Systemic design of distributed
multi-UAV cooperative decision-making for multi-target tracking. Auton. Agent.
Multi-Agent Syst. 33(1), 132–158 (2019)
7. Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation with Applications to Track-
ing and Navigation: Theory Algorithms and Software. John Wiley & Sons (2004)
8. Zhou, K., Roumeliotis, S.I.: Multirobot active target tracking with combinations
of relative observations. IEEE Trans. Robot. 27(4), 678–695 (2011)
9. Dames, P., Tokekar, P., Kumar, V.: Detecting, localizing, and tracking an unknown
number of moving targets using a team of mobile robots. Int. J. Robot. Res. 36(13–
14), 1540–1553 (2017)
10. Zhou, L., Tokekar, P.: Active target tracking with self-triggered communications
in multi-robot teams. IEEE Trans. Autom. Sci. Eng. 16(3), 1085–1096 (2018)
11. Zhou, L., Tokekar, P.: Sensor assignment algorithms to improve observability while
tracking targets. IEEE Trans. Rob. 35(5), 1206–1219 (2019)
12. Dames, P.M.: Distributed multi-target search and tracking using the PHD filter.
Auton. Robots 44(3), 673–689 (2020)
13. Dias, S.S., Bruno, M.G.S.: Cooperative target tracking using decentralized particle
filtering and RSS sensors. IEEE Trans. Signal Process. 61(14), 3632–3646 (2013)
14. Wakulicz, J., Kong, H., Sukkarieh, S.: Active information acquisition under arbi-
trary unknown disturbances. In: 2021 IEEE International Conference on Robotics
and Automation (ICRA), pp. 8429–8435. IEEE (2021)
15. Wang, L., Ames, A.D., Egerstedt, M.: Safety barrier certificates for collisions-free
multirobot systems. IEEE Trans. Robot. 33(3), 661–674 (2017)
16. Sung, Y., Budhiraja, A.K., Williams, R.K., Tokekar, P.: Distributed simultaneous
action and target assignment for multi-robot multi-target tracking. In: 2018 IEEE
International Conference on Robotics and Automation (ICRA), pp. 3724–3729.
IEEE (2018)
17. Tallamraju, R., Rajappa, S., Black, M.J., Karlapalem, K., Ahmad, A.: Decen-
tralized MPC based obstacle avoidance for multi-robot target tracking scenarios.
In: 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics
(SSRR), pp. 1–8. IEEE (2018)
18. Ji, M., Egerstedt, M.: Distributed coordination control of multiagent systems while
preserving connectedness. IEEE Trans. Robot. 23(4), 693–703 (2007)
19. Capelli, B., Fouad, H., Beltrame, G., Sabattini, L.: Decentralized connectivity
maintenance with time delays using control barrier functions. In: 2021 IEEE Inter-
national Conference on Robotics and Automation (ICRA), pp. 1586–1592. IEEE
(2021)
20. Ames, A.D., Coogan, S., Egerstedt, M., Notomista, G., Sreenath, K., Tabuada, P.:
Control barrier functions: theory and applications. In: 2019 18th European Control
Conference (ECC), pp. 3420–3431. IEEE (2019)
21. Sabattini, L., Secchi, C., Chopra, N., Gasparri, A.: Distributed control of multi-
robot systems with global connectivity maintenance. IEEE Trans. Robot. 29(5),
1326–1332 (2013)
22. Liu, Q., Wang, Z., He, X., Zhou, D.H.: On Kalman-consensus filtering with random
link failures over sensor networks. IEEE Trans. Automat. Control. 63(8), 2701–
2708 (2017)
23. Yang, P., Freeman, R.A., Gordon, G.J., Lynch, K.M., Srinivasa, S.S., Sukthankar,
R.: Decentralized estimation and control of graph connectivity for mobile sensor
networks. Automatica 46(2), 390–396 (2010)
24. Boyd, S., Diaconis, P., Xiao, L.: Fastest mixing Markov chain on a graph. SIAM
Rev. 46(4), 667–689 (2004)
25. Atanasov, N., Ny, J.L., Daniilidis, K., Pappas, G.J.: Decentralized active infor-
mation acquisition: theory and application to multi-robot SLAM. In: 2015 IEEE
International Conference on Robotics and Automation (ICRA), pp. 4775–4782.
IEEE (2015)
26. Zhou, L., Tzoumas, V., Pappas, G.J., Tokekar, P.: Resilient active target tracking
with multiple robots. IEEE Robot. Automat. Lett. 4(1), 129–136 (2018)
27. Shi, G., Zhou, L., Tokekar, P.: Robust multiple-path orienteering problem: securing
against adversarial attacks. In: 2020 Robotics: Science and Systems (RSS) (2020)
28. Ramachandran, R.K., Zhou, L., Preiss, J.A., Sukhatme, G.S.: Resilient coverage:
exploring the local-to-global trade-off. In: 2020 IEEE/RSJ International Confer-
ence on Intelligent Robots and Systems (IROS), pp. 11740–11747. IEEE (2020)
29. Liu, J., Zhou, L., Tokekar, P., Williams, R.K.: Distributed resilient submodular
action selection in adversarial environments. IEEE Robot. Autom. Lett. 6(3),
5832–5839 (2021)
30. Zhou, L., Tzoumas, V., Pappas, G.J., Tokekar, P.: Distributed attack-robust sub-
modular maximization for multirobot planning. IEEE Trans. Robot. (2022)
31. Zhou, L., Kumar, V.: Robust multi-robot active target tracking against sensing
and communication attacks. In: 2022 American Control Conference (ACC), pp.
4443–4450. IEEE (2022)
Application of 3D Printed
Vacuum-Actuated Module
with Multi-soft Material to Support
Handwork Rehabilitation
1 Introduction
Soft modular robotics is a research field that pursues body flexibility while main-
taining reconfigurability by employing flowable or soft materials as the main con-
stituent materials of modular robots. In soft modular robotics, a hollow structure
made of silicone is created, and the inside of the structure is pressurized to gen-
erate stretching and bending motions as an actuator module [1,2]. In general,
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 424–437, 2024.
https://doi.org/10.1007/978-3-031-51497-5_30
Application of 3D Vacuum-Actuated Module to Support Handwork 425
This section describes the procedure for creating the MORI-A FleX module.
426 S. Abe et al.
The silicone case is made of Ecoflex™ 00-10 (Smooth-On, Inc.), a stretchable silicone. The shape is a hollow cube of 19.0 mm square with a wall thickness of 2.0 mm. Since the goal here is to develop a wearable device for handwork rehabilitation, the dimensions are adjusted to the average finger width of an adult. The MORI-A FleX module is not limited to this size, as it can be scaled as shown in Fig. 4 and Fig. 5. The shrinking, bending, and shearing deformation structures are 3D printed with TPU filaments because they require flexibility to allow deformation. The no-deformation structure instead uses poly-lactic acid (PLA), whose stiffness prevents bending even when the silicone case is vacuumed; hence, a flexible structural form is not necessary for the “undeformed” module, since the material itself governs the deformation. These parallel cross structures were fabricated using an FDM 3D printer (X-Pro, QIDI Tech, Inc.).
The MORI-A FleX module also has a parametric design for the connector that
closes the opening. The basic shape of the connector is a cylinder with a diameter
slightly larger than the silicone case opening. These connectors can be fabricated
on a 3D printer using the Ninja FleX (Ninja Tek) TPU filament or the UV light-
curable resin (Clear/Black Resin, Wanhao). The list of connectors to be molded
with this material and their roles are shown in Fig. 6. There are four types of
MORI-A FleX connectors: (1) module-to-module connector; (2) module air plug;
(3) texture plug; and (4) tubing connector. (1) The module-to-module connectors
are used to connect two vacuum-actuated modules with flow paths secured. (2)
The module air plug is a connector that seals the opening of a module without
connecting it to another module. The thickness of the cylindrical part is the same
as the wall thickness of the silicone case, so it can be sealed without allowing
air to enter. (3) The texture plug is used to change the surface texture of the
module. (4) A tube connector is a connector that can connect a silicone tube to
a module to allow air to flow from the silicone tube to the module. It is mainly
used to connect an air compressor.
Fig. 8. Comparison of load carrying capacity of MORI-A FleX in the tensile direction
depending on the number of connections
pile to give the desired texture to the module. The results of a comparison of
the processed surface textures are shown in Fig. 9. These connectors coated with
UV-curable resin have the flexibility of the Ninja FleX (NinjaTek), but have a
smooth, glossy surface texture. Next, connectors coated with SMG developed
Fig. 10. Comparison of friction coefficients due to materials coated on texture plugs
Fig. 11. Schematic of wearable device with MORI-A FleX assembly supported by
handwork
Fig. 12. Grasping a PET bottle of 200 ml water and a half-boiled egg (subject’s hand
is in a state of weakness)
4 Applications
MORI-A FleX will be applied to a wearable device for the rehabilitation of human hand grasping movements. A schematic diagram of the wearable device to be developed is shown in Fig. 11. To secure MORI-A FleX, which is scaled to the width of a human finger, to the finger, a ring connector is formed by gluing a binding band to one side of a UV-curable resin sealing plug. The modules on the five
fingers are connected to each other by a silicone tube and a six-port connector of UV-curable resin developed in-house. To represent the movement along
the fingers with MORI-A FleX, a shear deformation module is attached to the
tip of the finger, while the rest of the finger area is attached to a module that
Fig. 14. Block diagram showing the wearable device developed in this study and the
control of the sixth finger
Application of 3D Vacuum-Actuated Module to Support Handwork 435
can stretch and deform. Figure 12 shows the actual grasping of a boiled egg and
a PET bottle containing 200 ml of water with this wearable device. The results
are evaluated from two points of view: rehabilitation and assistance. First, in
terms of rehabilitation, since the finger flexes softly and without strain, it is
expected to gradually restore the sense of finger bending for those who have
peripheral nerve problems by repeatedly applying and decreasing pressure. From
the viewpoint of power assist, even when the subject’s hand is in a state of weakness, attaching a highly tacky SMG texture-change plug to the tip of the little finger of MORI-A FleX makes it easier to grasp a PET bottle and successfully carry an object with a load of 200 g. Next, in the grasping of a
half-boiled egg, which was conducted to verify whether an object with a smooth
texture could be grasped, a boiled egg was wrapped in the palm of the hand,
and the hand was able to carry the boiled egg stably without slipping under
reduced pressure. The deformability, contact texture, and changeable shape of
the wearable device can be considered to enable simple power assistance even
when the finger muscles are completely inoperable. In addition, the MORI-A FleX attached to the finger was unable to withstand the weight of an object weighing more than 200 g. This was due to misalignment between the module and the finger, air leakage inside the module, and misalignment between the adhesive surface and the object, which made it impossible to grasp the object properly. However, although the PET bottle was grasped from the side, for objects that can be wrapped by the hand, such as a half-boiled egg, grasping weights of 200 g or more can be expected. Adding two shear modules next to the fingertips is expected to allow objects to be carried more stably. In addition, as shown in Fig. 13, the flexion module is expected to assist in grasping by creating a sixth finger composed only of flexion modules. Furthermore, if this can be combined with the wearable device developed in this work, it should become possible to grasp heavier and more difficult-to-grasp objects.

Figure 14 shows
a block diagram representing the wearable device developed in this study and
the control of the sixth finger. Compressed air from the compressor is connected
to the vacuum generator to reduce pressure. A solenoid valve is inserted just
before the vacuum generator, and when the valve is open, the module attached
to the finger is depressurized and driven. On the other hand, when the valve is
closed, air flows back and the module returns to its original shape. Pneumatic
pressure can be controlled by opening and closing the valve with a relay circuit
controlled by an Arduino.
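The relay-driven schedule in this block diagram can be sketched as a simple on/off cycle; `set_valve` is a hypothetical callback standing in for the relay channel, and the hold/release timings are assumed values, not parameters from the text.

```python
# Sketch of the valve schedule in Fig. 14: opening the solenoid valve routes
# compressed air to the vacuum generator and depressurizes (drives) the
# finger modules; closing it lets air flow back so the modules recover.
import time

def pressure_cycles(set_valve, hold_s=2.0, release_s=2.0, cycles=3,
                    sleep=time.sleep):
    states = []
    for _ in range(cycles):
        set_valve(True)                 # valve open -> modules depressurize
        states.append("open")
        sleep(hold_s)
        set_valve(False)                # valve closed -> modules recover
        states.append("closed")
        sleep(release_s)
    return states

# Dry run with the hardware callback and the sleep stubbed out
log = pressure_cycles(set_valve=lambda on: None, sleep=lambda s: None)
```

Repeatedly applying and releasing the reduced pressure in this way corresponds to the pressure cycling described for the rehabilitation use case.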
5 Conclusion
We have developed a versatile vacuum-actuated module, MORI-A FleX, which
allows for high reconfigurability, diverse deformability, and changes in contact
surface texture. By applying the morphological adaptability and texture adjustment capabilities of this module, we have succeeded in developing a wearable device for simple grasp rehabilitation and assistance that supports soft movements along with hand and finger movements. Since this module does not require electrical wiring for the actuator, there is no risk of wire breakage or electric shock,
and it can be viewed as a safe rehabilitation tool. Future prospects include the
development of MORI-A FleX wearable devices for infants and toddlers through
morphological changes, scaling, various combinations and refinements of different
textures, automatic calculation of module configurations to fit each individual,
and motion assist for different body parts.
References
1. Cheney, N., MacCurdy, R., Clune, J., Lipson, H.: Unshackling evolution: evolv-
ing soft robots with multiple materials and a powerful generative encoding. ACM
SIGEVOlution 7(1), 11–23 (2014)
2. Kriegman, S., Walker, S., Shah, D., Levin, M., Kramer-Bottiglio, R., Bon-
gard, J.: Automated shapeshifting for function recovery in damaged robots. In:
arXiv:1905.09264 (2019)
3. Vergara, A., Lau, Y., Mendoza-Garcia, R.F., Zagal, J.C.: Soft modular robotic
cubes: toward replicating morphogenetic movements of the embryo. PLoS ONE
12(1), e0169179 (2017)
4. Lee, J., Kim, W., Choi, W., Cho, K.: Soft robotic blocks: introducing SoBL, a
fast-build modularized design block. IEEE Robot. Autom. Magaz. 23(3), 30–41
(2016)
5. Nemitz, M.P., Mihaylov, P., Barraclough, T.W., Ross, D., Stokes, A.A.: Soft robotic
blocks: using voice coils to actuate modular soft robots: wormbot, an example. Soft
Rob. 3(4), 198–204 (2016)
6. Zhang, Y., et al.: A mechatronics-embedded pneumatic soft modular robot powered
via single air tube. Appl. Sci. 9(11), 2260 (2019)
7. Zou, J., Lin, Y., Ji, C., Yang, H.: A reconfigurable omnidirectional soft robot based
on caterpillar locomotion. Soft Rob. 5(2), 164–174 (2018)
8. Sui, X., Cai, H., Bie, D., Zhang, Y., Zhao, J., Zhu, Y.: Automatic generation of
locomotion patterns for soft modular reconfigurable robots. Appl. Sci. 10(1), 294
(2020)
9. Morin, S.A., et al.: Using “Click-e-Bricks” to make 3D elastomeric structures. Adv.
Mater. 26(34), 5991–5999 (2014)
10. Morin, S.A., et al.: Elastomeric tiles for the fabrication of inflatable structures.
Adv. Func. Mater. 24(35), 5541–5549 (2014)
11. Ogawa, J., Mori, T., Watanabe, Y., Kawakami, M., Shiblee, M.D.N.I., Furukawa,
H.: MORI-A: soft vacuum-actuated module with 3D-printable deformation struc-
ture. IEEE Robot. Autom. Lett. 7(2), 2495–2502 (2022)
12. Hironari, T., Shuichi, W., Koichi, S.: Development of hand rehabilitation system to
prevent contracture for finger joints based on the therapy of occupational therapists
(Massage hand and range of motion exercises using pneumatic soft actuators).
Trans. JSME 80(820), 348-348 (2014). (in Japanese)
13. Robertson, M.A., Kara, O.C., Paik, J.: Soft pneumatic actuator-driven origami-
inspired modular robotic “pneumagami”. Int. J. Robot. Res. 40(1), 72–85 (2021)
14. Luo, Y., et al.: Digital fabrication of pneumatic actuators with integrated sensing
by machine knitting. In: CHI Conference on Human Factors in Computing Systems,
pp. 1-13 (2022)
15. Al-Fahaam, H., Davis, S., Nefti-Meziani, S., Theodoridis, T.: Novel soft bending
actuator-based power augmentation hand exoskeleton controlled by human inten-
tion. Intell. Serv. Robot. 11(3), 247–268 (2018)
16. Xie, D., Ma, Z., Liu, J., Zuo, S.: Pneumatic artificial muscle based on novel winding
method. Actuators 10(5), 100 (2021)
17. Herr, H.M., Kornbluh, R.D.: New horizons for orthotic and prosthetic technol-
ogy: artificial muscle for ambulation. In: Smart Structures and Materials 2004.
Electroactive Polymer Actuators and Devices (EAPAD), vol. 5385, pp. 1-9 (2004)
18. Dong, T., Zhang, X., Liu, T.: Artificial muscles for wearable assistance and reha-
bilitation. Front. Inf. Technol. Electron. Eng. 19(11), 1303–1315 (2018)
19. Al-Fahaam, H., Davis, S., Nefti-Meziani, S.: The design and mathematical mod-
elling of novel extensor bending pneumatic artificial muscles (EBPAMs) for soft
exoskeletons. Robot. Auton. Syst. 99, 63–74 (2018)
20. Pan, M., et al.: Soft actuators and robotic devices for rehabilitation and assistance.
Adv. Intell. Syst. 4(4), 2100140 (2022)
21. Mendoza, M.J., et al.: A vacuum-powered artificial muscle designed for infant reha-
bilitation. Micromachines 12(8), 971 (2021)
22. Touma, T.: Complex of Parallel crosses structure. J. Imaging Soc. Japan 58, 406–
414 (2019)
23. Yokoo, T., Hidema, R., Furukawa, H.: Smart lenses developed with high-strength
and shape memory gels. e-J. Surf. Sci. Nanotechnol. 10, 243–247 (2012)
Combining Hierarchical MILP-MPC
and Artificial Potential Fields
for Multi-robot Priority-Based
Sanitization of Railway Stations
prefer cars instead of trains [6]. In this regard, disinfection robot technologies have
proven very useful in fighting global pandemics [30] by reducing the number of people
involved in the cleaning process and by optimizing the sterilization. In this work, we
propose a multi-robot sanitization framework designed for teams of robots capable of
cleaning human-populated environments [16] such as railway stations. Our main goal
is to design a framework in which a team of autonomous indoor cleaning robots
executes sanitization activities exploiting information about human presence retrieved
from the WiFi infrastructure, so as to maximize the cleaning performance in large
indoor environments such as railway stations without stopping the railway/commercial
activities. In particular, we exploit the station's WiFi network to identify the most
populated areas of the environment by retrieving information about the positions of
the users' smartphones [24]. In our solution, a server (MPC-Server) receives from the
WiFi network the information about the presence of people in the station and estimates
the positions of the contaminated areas, which are mapped into a heatmap. Here, we
deploy a distributed MPC approach that adapts the sanitization strategy to the
evolution of the contamination due to variations in the position and number of people
in the station.

Several approaches have been proposed for multi-robot cleaning of indoor
environments. One possible approach is to formulate the cleaning of the environment
as a coverage path planning (CPP) problem, where the objective is to find a strategy
to reach every part of the environment while minimizing the execution time. An early
work relies on the random walk method [4], where robots perform random movements
inside the environment. In this case, the execution time is affected by position
overlapping and path repetitions. Other solutions consider the environment as a grid
map that is partitioned into several parts and assigned to different robots executing
fixed-shape cleaning paths (spirals, triangles, rectangles, etc.) [9–12,14,17,18,20].
These methods are effective at providing a continuous cleaning service that maximizes
the coverage and minimizes the idleness of the agents. Additional works on the CPP
methodology consider the presence of zones of the environment with persistent, static,
higher priority that need to be cleaned/sanitized with a pre-specified higher frequency
[5,13,15,21,28]. Those approaches often consider a graph-based representation of the
environment with only a limited number of nodes. In contrast, we are interested in
finding a solution that handles high resolution and dynamic priorities in very large
railway stations. An interesting work is described in [5], where the authors solve the
multi-robot persistent coverage control problem over a grid environment using Mixed
Integer Linear Programming (MILP) to maximize the coverage level of the cells over
a finite horizon. In their study, the authors defined terminal constraints to ensure that
the agents are always able to terminate their plans in predetermined closed trajectories.
In our framework we also consider persistent priorities derived from the historical data
recorded by the Meraki WiFi network but, in contrast, we are more interested in
finding a solution that can adapt to different circumstances, following unexpected
changes in the priorities of the environment, without fixed trajectories or fixed nodes.
Moreover, we chose to design a Hierarchical MILP (HMILP) [3,7] where the problem
of finding differ-
440 R. Caccavale et al.
2 Architecture
Fig. 1. Distributed client-server architecture: the MILP server sends the MPC-
Heatmap and the destination nodes to the robots, which decide their route thanks to
the APF motion-path technique.
As shown in Fig. 1, we consider that the MPC-Server receives the Meraki
heatmap as visualized in the Meraki Dashboard, with an updating period of 1 h.
In the Meraki heatmap, the colors represent the current distribution of people
in the station. This image is converted into a new MPC-Heatmap formatted as a
grid map of dimensions 100 × 172 pixels, in which the colors now indicate
the level of contamination of the areas in the environment, as shown in Fig. 2(a).
More formally, we define M as the grid map of the station, X as the set of
all obstacle-free cells in the map, S as the set of possible heatmaps (i.e., priority
distributions) on the map M, and A as the set of actions available for a single
agent, where a_i ∈ A drives the agent i from the current cell to an adjacent one
(Fig. 2(b)). Each element of the MPC-Heatmap is a real number in the interval
[0, 1], where 1 is the maximum priority and 0 means that no cleaning is needed.
This matrix can be displayed as a color-coded image (see MPC-Heatmap in Fig. 2(b)),
where black pixels are associated with zero-priority areas, while colors from red to
yellow are used for increasingly higher priorities. We have also modeled the tendency
of the contaminants to spread and attenuate over time. The update of the heatmap
(cleaning priority) is computed from the positions of people (clusters) by modeling
the possible spreading of viruses or bacteria using a Gaussian dispersion model as
in [8]. Specifically, we apply the periodic convolution of a Gaussian filter at every
step, where μ and σ are suitable parameters that can be tuned depending on the
meters/pixel ratio, the timestep, and the considered typology of spreading (in this
work we assume a setting inspired by the aerial diffusion of Covid-19 [26]). In our
case, we use the values of μ and σ as in [25]. The effect of spreading and attenuation
of the priorities at every timestep on the MPC-Heatmap is shown in Fig. 2(c) after
100 steps. This spreading process is also used as a model in our MPC method in
order to estimate the evolution of the priority over time.
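The spreading-and-attenuation update described above can be sketched as a separable Gaussian convolution followed by a decay factor. This is a minimal numpy-only illustration; the kernel width, truncation radius, and decay rate are placeholder values, not the actual μ and σ taken from [25]:

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian kernel."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def spread_step(heatmap, sigma=1.0, decay=0.99):
    """One timestep of contaminant evolution: separable Gaussian
    convolution (spreading) followed by attenuation (decay)."""
    k = gaussian_kernel(sigma, radius=3)
    out = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, heatmap)
    out = np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, out)
    return np.clip(out * decay, 0.0, 1.0)

# Toy 100 x 172 MPC-Heatmap with a single fully contaminated cell.
h = np.zeros((100, 172))
h[50, 86] = 1.0
for _ in range(100):
    h = spread_step(h)
# After 100 steps the priority has diffused and attenuated around (50, 86).
```

Because the kernel is normalized, the total priority is conserved by the convolution and reduced only by the decay factor, matching the spread-and-attenuate behavior described in the text.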
Fig. 2. Heatmap representation of the Roma Termini station: (a) Meraki heatmap;
(b) MPC-Heatmap; (c) MPC-Heatmap after 100 timesteps; (d) MPC-Heatmap with
the associated graph.
(Fig. 2) with different colored zones. The MPC-Server extracts information about
the positioning of the people in the station from the current Meraki heatmap; it
receives the positions of every robot and their chosen paths, and it clears the
corresponding zones in the MPC-Heatmap to prepare the new updated map. The
MPC-Server provides a partitioning of the current MPC-Heatmap into a discrete
and limited number of areas with a fixed sectioning step (in our case we choose
20 pixels); every resulting subarea is represented by a single value in an ordered
list of partitions G1. Considering i ∈ N = {0, . . . , n}, where n is the total number
of partitions of the MPC-Heatmap, the value of the element i of the list G1 is q_i,
defined as the sum of the priority risk of the corresponding portion of the MPC-Heatmap.
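The construction of the ordered list G1 amounts to block sums over the grid map. A minimal sketch assuming the 100 × 172 map and the 20-pixel sectioning step from the text; the function name is illustrative:

```python
import numpy as np

def partition_priorities(heatmap, step=20):
    """Ordered list of partition priorities: element i is q_i, the sum
    of the priorities in the i-th step x step block of the heatmap
    (blocks ordered row by row; border blocks may be smaller)."""
    rows, cols = heatmap.shape
    return [float(heatmap[r:r + step, c:c + step].sum())
            for r in range(0, rows, step)
            for c in range(0, cols, step)]

h = np.zeros((100, 172))
h[0:20, 0:20] = 0.5          # first partition uniformly contaminated
g1 = partition_priorities(h)
# 100 x 172 with step 20 -> 5 x 9 = 45 partitions; g1[0] == 0.5 * 400
```

The same routine applied to MPC-Heatmap 2 yields the second list G2 described next.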
The MPC-Server applies the same procedure to compute a second ordered list G2,
using a second heatmap, MPC-Heatmap 2, which is used to estimate the future
expected distribution of people. MPC-Heatmap 2 is built considering the knowledge
about the transportation service of the station and the average behavior of people
in the historical data. RFI defines four categories of days to organize the
transportation service in the Roma Termini station: the working day, from Tuesday
to Thursday; the Friday, i.e., the day before a holiday, including Saturday; the
holiday, including Sunday; and the day after a holiday, such as Monday.
Fig. 3. The four main steps of execution of the sanitization strategy presented in
this study.
historical Meraki heatmaps that represent the conditions of the station in the days
that belong to the same category of day as the current MPC-Heatmap used to build
G1, but one hour later. Considering i ∈ N = {0, . . . , n}, where n is the total
number of partitions of MPC-Heatmap 2, the value of the element i of the list G2
is p_i, defined as the sum of the priority risk of the corresponding portion of
MPC-Heatmap 2. Starting from the last received position of every robot in the
MPC-Heatmap, the MPC-Server finds for every robot its current starting node in
the graph. Then, it calculates the next nodes for every robot, considering the
contribution of the values of the nodes in the two ordered lists G1 and G2. Every
robot receives from the MPC-Server the updated MPC-Heatmap and a list of
destination nodes that it must visit, following the indications of the HMILP
results. Every node represents a specific portion of the MPC-Heatmap. Every robot,
acting independently and autonomously, finds the positions of the priority peaks
in the portions of the MPC-Heatmap corresponding to the nodes of its path in the
graph. It sorts the found peaks from the highest value to the lowest. In the case
of peaks with the same value, they are ordered in the list from the closest to the
robot's current position to the farthest one. Finally, every robot autonomously
uses the APF method to reach the peaks in the list. We assume that the team of
robots is constantly connected to the common MPC-Server thanks to the WiFi railway
service of the station. We consider that the robots communicate with each other
and know the APF paths of the other robots; in this way we avoid path repetition.
The robots, acting independently, are also responsible for obstacle avoidance.
When a robot meets an obstacle, it computes the shortest path around it: it counts
the number of steps necessary to avoid the obstacle in both directions, then it
chooses the shorter route. When a priority peak is reached, the robot starts to
clean following a spiral-shaped path centered at the coordinates of the found peak
in the grid map, as in [11,12]. We consider that each robot sanitizes an area of
the grid map of 36 pixels at every timestep, where the timestep has a duration of
1 min as in [25]. Model Predictive Control (MPC) consists of solving an on-line,
finite-horizon, open-loop optimization problem using the current state as the
initial state for the team of robots. The solution to the problem gives a sequence
of control inputs for the entire control horizon; however, only some elements of
the optimal input sequence are applied to the team of robots. In our case, the
number of applied inputs depends on the number of priority peaks present in the
MPC-Heatmap and on the horizon time. In our context, the Meraki server sends an
update of the Meraki heatmap with a periodicity of 1 h, so our control horizon is
also 1 h.
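The per-robot peak ordering described above (highest priority first, ties broken by proximity to the robot) can be sketched as follows. Treating every cell above a threshold as a peak is a simplification of the actual peak detection, and the function name is illustrative:

```python
import numpy as np

def sorted_peaks(heatmap, robot_pos, threshold=0.5):
    """Order priority peaks from the highest value to the lowest;
    peaks of equal value are ordered from the closest to the robot's
    current position to the farthest. Cells above `threshold` stand
    in for the detected peaks (a simplification)."""
    peaks = [(int(r), int(c)) for r, c in zip(*np.where(heatmap > threshold))]
    def key(p):
        dist = np.hypot(p[0] - robot_pos[0], p[1] - robot_pos[1])
        return (-heatmap[p], dist)
    return sorted(peaks, key=key)

h = np.zeros((10, 10))
h[2, 2] = 0.9
h[8, 8] = 0.9                 # same priority, farther from the robot
h[5, 5] = 0.7
order = sorted_peaks(h, robot_pos=(0, 0))
# order == [(2, 2), (8, 8), (5, 5)]
```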
HMILP-MPC Multi-robot Priority-Based Sanitization of Railway Stations 445
Fig. 4. Comparison between the RFI categories of day for the transportation service
in the Roma Termini station. On the left, the different numbers of train departures
at different hours for the different categories. On the right, the different
typologies of services for two categories: holiday and working day.
The graph representation used to describe the problem in this study is similar
to the ones proposed in [19,31]. We define the graph G = (N, A), where N =
{0, 1, . . . , n} is the set of vertexes and A = {(i, j) : i, j ∈ N, i ≠ j} is the
set of arcs. Vertex 0 denotes the depot of the robots, while the other vertexes
correspond to different areas of the partitioned grid map MPC-Heatmap. Each area
i must be sanitized, so every node has a non-negative priority number q_i + p_i
that represents the total quantity of risk priority in the corresponding area of
the MPC-Heatmap added to the predicted value from MPC-Heatmap 2. Each arc between
two vertexes i and j is associated with a value d_{i,j} (in our solution it
represents the reward for the robot that travels between nodes i and j). The value
associated with an arc also takes into account the length of the path in the grid
map to reach the destination node and the presence of fixed obstacles. The reward
assigned to the arc is lower in the case of movements diagonal with respect to the
axes of the grid map (as shown in Fig. 2(d)), or in case the robot may meet a wall
or other fixed obstacles to be avoided while traveling along the arc. The weight
of the edge is calculated by the equation d_{i,j} = q_j + p_j, (i, j) ∈ A. In the
HMILP path decision, it is important to also take into account the cost of the
distance that a robot covers to reach a node. In general, a greater distance, due
for example to the presence of an obstacle, may reduce the total percentage of
cleaning achieved within the horizon time, because our robots also clean the path
traveled to reach their target, as in [25]. For this reason, with reference to
Fig. 2(d), we define three kinds of arcs: the straight arcs, which connect two
nodes on the same x-axis or y-axis of the MPC-Heatmap (examples of straight arcs
in Fig. 2(d) are the connections between nodes 0 and 1, or 0 and 3); the
Table 1. Notation used in the model.

Notation   Description
N          set of all nodes, N = {1, . . . , n}
N0         N plus the depot, N0 = N ∪ {0}
K          total number of robots
Q          maximum sanitization value of a single robot
d_{i,j}    benefit assigned to the arc between nodes i, j ∈ N0
x_{i,j}    1 if the k-th robot travels the arc (i, j), 0 otherwise
u_i        total priority sanitized by the k-th robot
q_i        priority of partition i in G1
p_i        priority of partition i in G2
the diagonal arcs, such as the arcs between nodes 0 and 4 or 1 and 5; and a third
type of arc, those along which the robot meets an obstacle in the grid-map path
corresponding to the arc. d_{i,j} in G represents a benefit for the robot that
traverses the corresponding arc. For this reason, the arcs associated with a
greater distance, as in the presence of obstacles, are also associated with a
lower benefit. In particular, we consider the following formula to define the
weights of the three defined typologies of arcs:
d_{i,j} = ⎧ d_{i,j},    straight arcs,
          ⎨ d_{i,j}/2,  diagonal arcs,               (1)
          ⎩ d_{i,j}/2,  obstacles in the path.
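Equation (1) can be sketched as a small helper. The base weight q_j + p_j follows the edge-weight definition given in the text, while the function name and the arc-kind labels are illustrative:

```python
def arc_weight(q_j, p_j, kind):
    """Arc benefit following Eq. (1): base weight q_j + p_j for a
    straight arc, halved for diagonal arcs and for arcs whose
    grid-map path crosses an obstacle. Labels are illustrative."""
    base = q_j + p_j
    if kind == "straight":
        return base
    if kind in ("diagonal", "obstacle"):
        return base / 2.0
    raise ValueError(f"unknown arc kind: {kind}")

# arc_weight(3.0, 1.0, "straight") -> 4.0
# arc_weight(3.0, 1.0, "diagonal") -> 2.0
```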
At the beginning of the horizon time, each robot starts its new path in the graph
from the last node reached in the previous hour. The first action of the MPC-Server
is to position each robot from the depot to its current starting node, i.e., the
node from which the robot has sent its actual position. This is done by considering
a dummy arc with a very high gain. The first step of the path of every robot, from
the depot to the starting node, is removed from the final cleaning path sent to
every robot at the beginning of its decentralized operations. The notation,
including the sets, parameters, and variables used in the model, is listed in
Table 1. The MILP constraints for our multi-robot sanitization problem are the
following, as in [22]:
max z                                                          (2)

Σ_{i∈N0} Σ_{j∈N0} x_{i,j} d_{i,j} ≥ z                          (3)

Σ_{j∈N0} x_{i,j} = 1,   i ∈ N                                  (4)

Σ_{i∈N0} x_{i,l} − Σ_{j∈N0} x_{l,j} = 0,   l ∈ N, i ≠ j, i ≠ l, j ≠ l   (5)
The objective function of the model is given in (2), which maximizes the quantity
of priority that is removed from the graph G. Here, z represents the sum of the
benefits d along a path selected on the graph G. Constraint set (3) indicates
that the objective may not exceed the sum of the benefits of all the arcs of G.
Constraint set (4) ensures that each section of the MPC-Heatmap is visited
exactly once. Constraint set (5) implies that every robot that leaves a node will
reach a new, different node. Constraint set (6) represents a limit on the quantity
of sanitization done in a node i, which cannot be less than the priority assigned
to node i of the graph G. Our approach to solving the multi-robot optimization
problem is the hierarchical MILP, as in [3,7]. The MPC-Server repeats the MILP
computation for every single robot, in sequence. At every step, it finds the path
for one robot, sets to zero the nodes of the graph G belonging to the path of the
previous robot, and calculates the new path for the following robot. The MPC-Server
repeats this procedure for every following robot. In this way, the MPC-Server
considers a different graph for every robot, preventing two robots from selecting
the same path even when they start from the same node. The MPC-Server calculates
the total reward of the whole team as the sum of the single rewards gained by every
robot, for each permutation of the sequence of the 4 robots, and chooses the
sequence with the highest result. The MPC-Server also considers a different depot
for every robot, positioned in different places of the graph G and fixed at the
beginning of the simulation. The assignment of a different depot to each robot can
be changed on-line by human intervention from a remote control room to correct the
cleaning operation. In Fig. 5 we see the cleaning action of 4 robots, represented
by the white square dots. The cold zones, in black, are the paths that they leave
behind when they clean a zone of the environment. The environment is represented
by the MPC-Heatmap of Roma Termini, where 1 pixel corresponds to 1 square meter.
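The hierarchical, sequential decision scheme described above can be sketched as follows. A greedy path chooser stands in for the per-robot MILP (which the framework solves with an actual MILP over the arc benefits), but the sequential zeroing of visited nodes and the search over robot orderings follow the text; the node values and the distance penalty are toy numbers:

```python
from itertools import permutations

DIST_COST = 0.1  # toy per-unit travel penalty (illustrative value)

def greedy_path(avail_in, start, length):
    """Greedy stand-in for the per-robot MILP: repeatedly move to the
    node with the best priority-minus-distance trade-off."""
    avail, node, path, reward = dict(avail_in), start, [], 0.0
    for _ in range(length):
        nxt = max(avail, key=lambda n: avail[n] - DIST_COST * abs(n - node))
        reward += avail[nxt] - DIST_COST * abs(nxt - node)
        avail.pop(nxt)
        path.append(nxt)
        node = nxt
    return path, reward

def hierarchical_assign(priorities, starts, length=2):
    """Hierarchical decomposition as described in the text: solve the
    robots one at a time, zeroing the nodes used by earlier robots,
    then try every robot ordering and keep the best total reward."""
    best = None
    for order in permutations(range(len(starts))):
        avail, paths, total = dict(priorities), {}, 0.0
        for k in order:
            path, reward = greedy_path(avail, starts[k], length)
            for n in path:
                avail[n] = 0.0      # later robots see these nodes as cleaned
            paths[k] = path
            total += reward
        if best is None or total > best[0]:
            best = (total, paths, order)
    return best

prio = {1: 5.0, 2: 3.0, 3: 1.0, 4: 0.5}
total, paths, order = hierarchical_assign(prio, starts=[1, 4])
# order == (0, 1): robot 0 takes the high-priority nodes [1, 2],
# robot 1 is left with [3, 4].
```

Zeroing the nodes between the per-robot solves is what guarantees that two robots never select the same path, even from the same starting node.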
Fig. 5. On the left, the distribution of priorities at 6:00 a.m. on 6 September
2021 in our simulation. On the right, the cleaning action of 4 robots driven by
the HMILP-MPC control, moving in the environment that represents the Roma Termini
station, thanks to the APF method.
where x_tot = |X| is the number of obstacle-free cells of the map, while s_curr =
Σ_{(i,j)} s(i, j) is the sum of the current priorities over the cells of the
heatmap. The value c_perc is 100% if the map is totally clean (i.e., each cell has
0 priority) and 0% if every cell of the map has maximum priority. We compared the
quality of cleaning for the day 6 September 2021, using for the HMILP the
mean values of the heatmaps of the days 31 August, 1 September, and 2 September
2021, considered working days in the RFI category definitions. As shown in Fig. 6,
the two methods have comparable results and approximately the same shape of the
curve; they may be considered valid alternatives for the proposed problem.
Fig. 6. Simulation of the sanitization on 6 September 2021. The first two curves
represent c_perc for the two cases: MARL and HMILP-MPC. The yellow curve represents
the spreading of the contaminants without any robot action.
Our solution has an important advantage with respect to the MARL method presented
in [25]: it can be controlled from a remote site by technical staff in a control
room. This represents an important innovation in solving the problem proposed by
RFI, because Roma Termini is a challenging environment where exceptional events
may occur and form new aggregations of people. For this reason, a solution is
required that permits correcting the cleaning activity of the robots by assigning
greater priority to a specific area. This feature is enabled by exploiting the
graph-based representation of the environment. We can select on-line, for every
robot, a different depot where it concludes its mission, choosing among those
available in the station. It is also possible to assign a gain to the nodes inside
the subarea that we want to select. We verified this possibility by simulating a
situation in which a rapid increase of priority risk occurred in a zone Z involving
the nodes 7, 10, and 13. We selected that zone Z by multiplying the weight of these
nodes by a gain equal to 10. We also assigned to the robots the depots in the nodes
10 and 13, while in normal conditions their depots are assigned to the nodes 2, 7, 9,
4 Conclusions
References
1. Andersson, O., Ljungqvist, O., Tiger, M., Axehill, D., Heintz, F.: Receding-horizon
lattice-based motion planning with dynamic obstacle avoidance. In: 2018 IEEE
Conference on Decision and Control (CDC), pp. 4467–4474 (2018)
2. Andersson, O., Wzorek, M., Rudol, P., Doherty, P.: Model-predictive control with
stochastic collision avoidance using Bayesian policy optimization. In: 2016 IEEE
International Conference on Robotics and Automation (ICRA), pp. 4597–4604
(2016)
3. Branca, C., Fierro, R.: A hierarchical optimization algorithm for cooperative vehi-
cle networks. In: 2006 American Control Conference, p. 6 (2006)
4. Cao, Z.L., Huang, Y., Hall, E.L.: Region filling operations with random obstacle
avoidance for mobile robots. J. Field Robot. 5, 87–102 (1988)
5. Charitidou, M., Keviczky, T.: An MILP approach for persistent coverage tasks with
multiple robots and performance guarantees. Eur. J. Control 64, 100610 (2022)
6. Ciuffini, F., Tengattini, S., Bigazzi, A.Y.: Mitigating increased driving after the
Covid-19 pandemic: an analysis on mode share, travel demand, and public trans-
port capacity. Transp. Res. Rec. 03611981211037884 (2023)
7. Earl, M.G., D’Andrea, R.: A decomposition approach to multi-vehicle cooperative
control. Robot. Auton. Syst. 55(4), 276–291 (2007)
8. Hanna, S.: Transport and dispersion of tracers simulating Covid-19 aerosols in
passenger aircraft. Indoor Air 32(1), e12974 (2022)
9. Lakshmanan, A.K., et al.: Complete coverage path planning using reinforcement
learning for tetromino based cleaning and maintenance robot. Autom. Constr. 112,
103078 (2020)
10. Lee, T.-K., Baek, S., Oh, S.-Y.: Sector-based maximal online coverage of unknown
environments for cleaning robots with limited sensing. Robot. Auton. Syst. 59(10),
698–710 (2011)
11. Lee, T.-K., Baek, S.-H., Choi, Y.-H., Oh, S.-Y.: Smooth coverage path planning
and control of mobile robots based on high-resolution grid map representation.
Robot. Auton. Syst. 59(10), 801–812 (2011)
12. Lee, T.-K., Baek, S.-H., Oh, S.-Y., Choi, Y.-H.: Complete coverage algorithm based
on linked smooth spiral paths for mobile robots. In: 2010 11th International Con-
ference on Control Automation Robotics Vision, pp. 609–614 (2010)
13. Mallya, D., Kandala, S., Vachhani, L., Sinha, A.: Priority patrolling using multi-
ple agents. In: 2021 IEEE International Conference on Robotics and Automation
(ICRA), pp. 8692–8698 (2021)
14. Miao, X., Lee, J., Kang, B.-Y.: Scalable coverage path planning for cleaning robots
using rectangular map decomposition on large environments. IEEE Access 6,
38200–38215 (2018)
15. Murtaza, G., Kanhere, S., Jha, S.: Priority-based coverage path planning for aerial
wireless sensor networks. In: 2013 IEEE Eighth International Conference on Intel-
ligent Sensors, Sensor Networks and Information Processing, pp. 219–224 (2013)
16. Narang, M., et al.: Fighting Covid: an autonomous indoor cleaning robot (AICR)
supported by artificial intelligence and vision for dynamic air disinfection. In: 2021
14th IEEE International Conference on Industry Applications (INDUSCON), pp.
1146–1153 (2021)
17. Nasirian, B., Mehrandezh, M., Janabi-Sharifi, F.: Efficient coverage path planning
for mobile disinfecting robots using graph-based representation of environment.
Front. Robot. AI 8, 4 (2021)
18. Nasirian, B., Mehrandezh, M., Janabi-Sharifi, F.: Efficient coverage path planning
for mobile disinfecting robots using graph-based representation of environment.
Front. Robot. AI 8 (2021)
19. Madridano, Á., Al-Kaff, A., Martín, D., de la Escalera, A.: Trajectory planning for
multi-robot systems: methods and applications. Expert Syst. Appl. 173, 114660
(2021)
20. Oh, J.S., Choi, Y.H., Park, J.B., Zheng, Y.: Complete coverage navigation of clean-
ing robots using triangular-cell-based map. IEEE Trans. Industr. Electron. 51(3),
718–726 (2004)
21. Pasqualetti, F., Durham, J.W., Bullo, F.: Cooperative patrolling via weighted
tours: performance analysis and distributed algorithms. IEEE Trans. Robot. 28(5),
1181–1188 (2012)
22. Qin, W., Zhuang, Z., Huang, Z., Huang, H.: A novel reinforcement learning-based
hyper-heuristic for heterogeneous vehicle routing problem. Comput. Ind. Eng. 156,
107252 (2021)
23. Rasekhipour, Y., Khajepour, A., Chen, S.-K., Litkouhi, B.: A potential field-based
model predictive path-planning controller for autonomous road vehicles. IEEE
Trans. Intell. Transp. Syst. 18(5), 1255–1267 (2017)
24. Ren, Y., et al.: D-log: a WIFI log-based differential scheme for enhanced indoor
localization with single RSSI source and infrequent sampling rate. Pervasive Mob.
Comput. 37, 94–114 (2017)
25. Caccavale, R., Calà, V., Ermini, M., Finzi, A., Lippiello, V., Tavano, F.: Multi-
robot sanitization of railway stations based on deep Q-learning. In: AIRO 2021: 8th
Italian Workshop on Artificial Intelligence and Robotics of the 20th International
Conference of the Italian Association for Artificial Intelligence (AI*IA 2021) (2021)
26. Setti, L., et al.: Airborne transmission route of Covid-19: why 2 meters/6 feet of
inter-personal distance could not be enough, April 2020
27. Song, P., Kumar, V.: A potential field based approach to multi-robot manipulation.
In: Proceedings 2002 IEEE International Conference on Robotics and Automation
(Cat. No.02CH37292), vol. 2, pp. 1217–1222 (2002)
28. Stump, E., Michael, N.: Multi-robot persistent surveillance planning as a vehicle
routing problem. In: 2011 IEEE International Conference on Automation Science
and Engineering, pp. 569–575 (2011)
29. Tardivo, A., Zanuy, A.C., Martı́n, C.S.: Covid-19 impact on transport: a paper
from the railways’ systems research perspective. Transp. Res. Rec. 2675(5), 367–
378 (2021)
30. Tavakoli, M., Carriere, J., Torabi, A.: Robotics, smart wearable technologies, and
autonomous intelligent systems for healthcare during the Covid-19 pandemic: an
analysis of the state of the art and future vision. Adv. Intell. Syst. 2(7), 2000071
(2020)
31. Zhou, X., Wang, H., Ding, B., Hu, T., Shang, S.: Balanced connected task alloca-
tions for multi-robot systems: an exact flow-based integer program and an approx-
imate tree-based genetic algorithm. Expert Syst. Appl. 116, 10–20 (2019)
Exploration System for Distributed
Swarm Robots Using Probabilistic Action
Decisions
1 Introduction
In recent years, various studies have been conducted on multi-robot agent systems,
or on hierarchical forms such as swarm robots, parent-child robots, and so on
[1–4]. These systems have been considered for applications such as environmental
measurement on the ground, disaster relief, and planetary exploration. However,
the size of such robots, for example a planetary rover, is relatively large because
a single robot must perform a variety of tasks [5]. As a result, strict safety is
required of the robot, the development period and cost become large, and the risk
of loss due to failure is high. In addition, the traveling speed is extremely slow
because careful operation is required to maintain safety, and only limited
environments, such as those along the path, can be measured. Therefore, the
efficiency of exploration by large robots is low. In order to solve these problems,
small robots have been developed by many institutes, such as JAXA [6–10]. The
advantages of small robots are as follows:
1. the development period and cost can be reduced
2. the robots can be mounted with other payloads.
The details of extreme environments, such as a planetary surface, are largely
unknown. Therefore, it is hard for the robots to localize themselves.
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 453–465, 2024.
https://doi.org/10.1007/978-3-031-51497-5_32
454 T. Sato et al.
In this section, we present the proposed exploration system for wide-area
observation. We also present the concept of our proposed algorithm and a
description of the algorithm.
This study focuses on a small swarm robot system without path planning. The
schematic diagram is shown in Fig. 1 and the system structure in Fig. 2. We
consider a simple configuration for the exploration system, without assuming that
the ground robots have abundant functions and performance in terms of sensing and
communication capabilities. The proposed exploration system assumes a heterogeneous
small robot system in which a small hopping leader robot exists in the upper layer
and small wheeled swarm robots are in the lower layer. We do not consider a large
robot as the leader robot, since this study focuses on the small swarm robot
system.
The wheeled swarm robots in the lower tier are basically attached to a single
leader robot and explore the surrounding area widely, with the leader as the
dispersion center. It is generally difficult for small ground robots to observe
the environment and create a map by their own sensing, due to their restricted
field of view. In addition, the small ground robots do not plan paths, for the
reasons mentioned above. The ground robots need to be aware of their positional
relationship with the leader robot, and for this purpose we employ a previous
study [11]. Since there are existing studies on global and local path planning by
leader robots, this paper does not focus on the motion planning or control of the
leader robot [12].
The leader robot’s position in this study is to serve as a reference point
for the ground robots in this study, which cannot sense the environment. The
ground robot needs to periodically observe the leader robot at a higher altitude
than the narrowing of the field of view, and the leader robot will leap directly
above the ground robot at the same position until the ground robot finishes
its exploration, for example. At this time, the leader robot also repeats the
observation to determine the next moving point and searches for the next moving
position.
This paper focuses only on the exploration method around this distributed
reference point and shows how to design an algorithm for ground robots that act
independently and autonomously, based only on their position information relative
to the leader robot.
In addition, the loss of some robots does not significantly affect this system,
because the total number of robots is much larger than the number of losses. Thus,
the system has an advantage in terms of risk diversification [13].
Exploration System for Distributed Swarm Robots 455
Fig. 1. The overview of the proposed system.
Fig. 2. The structure of the proposed system.
In this method, we adopt a random walk as the basic behavioral model of the robots
and suppress their actions in a stochastic manner [14]. This section explains the
proposed stochastic distributed exploration algorithm for wide-area observation.
In the assumed exploration system, the objective is that the lower-layer robots
perform wide-area distributed exploration around a dispersion center, such as the
leader, without commands from the upper layer.

There are many sources of randomness, such as the existence of obstacles and
topographical features, in natural environments or disaster areas, so it is
difficult to predict the robot positions accurately. Considering the low
intelligence, low functionality, and low performance of the robots, the algorithm
should be as simple as possible. Formation control by animal mimicry has been the
mainstay of conventional swarm intelligence studies; however, these methods are
not suited to the kind of exploration this study focuses on [15]. Therefore, we
adopt a stochastic and simple random walk model as the basic algorithm. The random
walk model is generally used to describe the phenomenon of particle dispersion
from an arbitrary point, which is known to gradually expand spatially with the
passage of time. Therefore, we consider the random-walk-based method suitable for
wide-area exploration. However, if the spread distance becomes too large, many
robots move away from the dispersion center and become isolated. Therefore, it is
necessary to moderately restrain the probabilistic movement away from the center
of dispersion.
456 T. Sato et al.
r = p(α′) / p(α)    (3)
(4) A uniform random number R (0 < R < 1) is generated and compared with r
to evaluate the behavior. If R < r, the destination (x′, y′) is adopted, and
the action is actually taken. If R > r, the generated move is rejected and the
rejection action is taken instead. Although various patterns of “rejection
action” could be considered, such as re-selection of the direction of movement
or movement toward the center of dispersion, in this study a left-right
movement toward the center of dispersion is used, which can be observed widely
within the region. The “rejection action” induces the robots to rotate within
the dispersed area and reduces the bias of the observation.
With the stochastic control method described above, a move toward the
dispersion center is always accepted, since r > 1 in that case, while a move
away from the dispersion center is probabilistically rejected. The proposed
method can control a group of robots within a 3σ range of the proposed
distribution and perform diffuse exploration. The internal
parameters of the proposed distribution can be used for stochastic control of
robots, and the size of the search area around the center of dispersion can be
controlled.
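The accept/reject scheme above is essentially a Metropolis-style rule. A minimal sketch, assuming a Gaussian-in-distance target density with the paper's parameters μ = 5.0 and σ = 2.0 and the dispersion center at the origin; the function names and the step length are illustrative:

```python
import math
import random

def target_density(dist, mu=5.0, sigma=2.0):
    """Unnormalized target density over distance from the dispersion center;
    a Gaussian with mu > 0 yields the torus-like exploration region."""
    return math.exp(-((dist - mu) ** 2) / (2.0 * sigma ** 2))

def step(x, y, step_len=0.3, mu=5.0, sigma=2.0):
    """One stochastic exploration step around a dispersion center at the origin."""
    theta = random.uniform(0.0, 2.0 * math.pi)        # random-walk proposal
    nx = x + step_len * math.cos(theta)
    ny = y + step_len * math.sin(theta)
    r = target_density(math.hypot(nx, ny), mu, sigma) / \
        target_density(math.hypot(x, y), mu, sigma)   # ratio of Eq. (3)
    if random.random() < r:                           # accept when R < r
        return nx, ny
    # "rejection action": a left/right move around the center at constant
    # radius, which rotates the robot within the dispersed area
    d = math.hypot(x, y)
    phi = math.atan2(y, x) + random.choice((-1.0, 1.0)) * step_len / max(d, 1e-9)
    return d * math.cos(phi), d * math.sin(phi)
```

Iterating `step` keeps a robot's distance from the center concentrated around μ, since moves toward the high-density band are always accepted and moves away from it are increasingly likely to trigger the rejection action.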
3 Evaluation by Simulation
In this section, we show how the control algorithm with the stochastic action
decision proposed in the previous section behaves in simulation.

It can be said that the robots are controlled within a range in accordance
with the control parameters. This is also true in comparison with the ideal
distribution shown in Fig. 4. In Fig. 3, all cells are colored except for the
unexplored cells, and the blue cells outside them are light-colored according
to the specification. It can be confirmed that this area is within a radius of
6.0 m from the dispersion center.
We also examined whether the proposed method produced exploration results
in accordance with the proposed distribution. A histogram of the x-axis of Fig. 3
is shown in Fig. 5. It can be seen that a high density of robots is formed around
the center of dispersion. The histogram was analyzed by approximation, and the
standard deviation σ0 of the normal distribution obtained was 1.7. Although the
analyzed value is not exactly equal to the set value σ0 = 2.0, the results are close
in tendency and numerical value, and it is confirmed that the control is based
on the proposed distribution.
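The analysis step above can be reproduced in a few lines: given logged robot x-positions, fit a normal distribution and compare the recovered σ with the set value. The data below is a synthetic stand-in, since the simulation logs are not part of the text:

```python
import numpy as np

rng = np.random.default_rng(42)
# stand-in for the logged x-positions of explored cells (set value sigma = 2.0)
x_positions = rng.normal(loc=0.0, scale=2.0, size=5000)

mu_hat = x_positions.mean()            # fitted mean of the normal distribution
sigma_hat = x_positions.std(ddof=1)    # fitted standard deviation
```

With real logs, the gap between `sigma_hat` and the set value σ₀ quantifies how closely the swarm tracks the proposed distribution.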
Fig. 8. A robot named RED: there is a hole in the top surface through which a
camera faces the ceiling to recognize LED markers. The robot is equipped with
a distance sensor, a camera, and LED lights in the front.
Fig. 9. Structure of the robot system. Fig. 10. The RED robot explores the area around one or more LED markers as a distributed center.
in which direction the LED markers are located and how far away they are. The
measurements are used to run the proposed algorithm. The LED markers are
placed in the exploration space as shown in Fig. 10, and the RED robot is
allowed to explore around them. The height (z-component) of the LED markers is
given to the robots as a parameter in advance, and the distance to a marker in
the x-y plane can be calculated by detecting it with the camera on the top
surface.
The software configuration is shown in Fig. 11, where the information is
integrated by a Git server and a UDP server on the PC. The software on each
RED robot is kept at the latest version by referring to the Git server. In
addition, we developed the GUI application shown in Fig. 12, which integrates
the UDP server. The RED robots send their individual addresses, acquired
images, and their own states to the application, and the application sends the
parameters necessary for each robot to operate. The application sends the
following parameters to each robot.
The RED robots can act autonomously with the proposed algorithm based on
the parameters and the positional relationship with the LED markers, which are
the dispersion centers. They send the following information to this application.
1. Estimated distance and direction from the center of dispersion at each step
number
2. Camera pictures in the floor and ceiling directions at each step number
3. Status of each robot, including connection status
The application can analyze the information received from the robots to
produce estimated distributions and parameter estimates for the robots. In
addition, by observing the exploration of the robots with an external camera,
we can identify explored and unexplored locations. By determining the
percentage of explored and unexplored locations within the exploration area,
we can estimate the coverage of space relative to time.
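The coverage computation amounts to counting, on the heat-map grid, the fraction of in-area cells visited at least once. A sketch with a hypothetical grid size and visit pattern:

```python
import numpy as np

def coverage(heat_map, mask):
    """Fraction of cells inside the exploration area visited at least once.
    heat_map[i, j]: visit count of cell (i, j); mask: True inside the area."""
    return ((heat_map > 0) & mask).sum() / mask.sum()

# hypothetical 16 m x 16 m field discretized into 0.5 m cells (radius 8 m)
n = 32
yy, xx = np.mgrid[:n, :n]
center = (n - 1) / 2.0
mask = (xx - center) ** 2 + (yy - center) ** 2 <= (n / 2.0) ** 2
heat = np.zeros((n, n), dtype=int)
heat[8:24, 8:24] = 1             # pretend the swarm explored the central block
partial = coverage(heat, mask)   # fraction of the circular area covered
```

Evaluating `coverage` on heat maps accumulated at successive timestamps yields the coverage-versus-time curve used in the experiments.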
Fig. 12. Applications for Management and Analysis of Swarm Robot Systems
The exploration area was a concrete field within a black circle, as shown in
Fig. 13. Ten robots were operated in the exploration area. The control
parameters of the proposed algorithm were set to a mean value of μ = 5.0 and a
standard deviation of σ = 2.0. The radius of the black circle was set to
8.0 m, so the exploration area is 64π m². The exploration was performed for
1200 s. The RED robots were observed by an external camera. The trajectories
of their movements were extracted from the video and are highlighted in white
in Fig. 13. The LED marker posts were placed in the center of the circular
area in Fig. 13 but were excluded from the figure for the sake of analysis.
In surface exploration, the size of the area and the number of overlaps
(uniformity) affect the efficiency of exploration. To validate the exploration
efficiency, the experiments are evaluated in terms of the “exploration time”
required to reach a certain level of coverage in an area.
We describe a heat map created to evaluate exploration efficiency, shown in
Fig. 14. The numbers in the color map indicate the number of overlapping
explorations of each location by the robot swarm, increasing as the color
changes from blue to red. The robots estimate the direction and distance of
the dispersion center at each step, and a graph summarizing the estimated
distances is shown in Fig. 15. A graph of the coverage of the exploration area
versus time was created from the information on the unexplored and explored
squares obtained from the heat map. This graph is shown in Fig. 16.
Fig. 13. Robot movement trajectory (painted in white) obtained from video analysis of the results of the wide-area exploration experiment. Fig. 14. Heat map of the robot's movement trajectory.
Fig. 15. Histogram of estimated distance between robot and dispersion center. Fig. 16. Graph of spatial coverage of the exploration radius 8.0 m versus exploration time (in seconds).
From Fig. 6 and Fig. 14, it is confirmed that the torus-like region control
works in the actual experiment as well as in the simulation, depending on the
control parameters. Torus-like region control sets the mean μ larger than 0 to
prevent the robots' stochastic behavior from concentrating around the center
of dispersion. Figure 15 shows that the frequency of robot movement is higher
around the position 5.0 m from the dispersion center; the distribution of
robots thus follows the control-parameter mean μ = 5.0. This data confirms
that torus-like area control was achieved in the actual experiment. Figure 16
shows that in 1200 s, the robots were able to explore 78% of the area within a
radius of 8.0 m from the LED marker, which is the center of dispersion.
6 Conclusion
Acknowledgment. This research was carried out in collaboration with
the Space Exploration Innovation Hub of the National Aerospace Laboratory of Japan
Aerospace Exploration Agency (JAXA), Takenaka Corporation, and Chuo University.
We would like to take this opportunity to express our deepest gratitude.
References
1. Zungeru, A.M., Ang, L.M., Seng, K.P.: Classical and swarm intelligence based
routing protocols for wireless sensor networks: a survey and comparison. J. Netw.
Comput. Appl. 35, 1508–1536 (2012)
2. Ichikawa, S., Hara, F.: Characteristics on swarm intelligence generated in multi-
robot system. J. Robot. Soc. Japan 13(8), 1138–1144 (1995)
3. Nouyan, S., Grob, R., Bonani, M.: Teamwork in self-organized robot colonies. IEEE
Trans. Evol. Comput. 13(4), 695–711 (2009)
4. Hamann, H.: Swarm Robotics: A Formal Approach. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-74528-2
5. Lindemann, R.A., Voorhees, C.J.: Mars exploration rover mobility assembly
design, test and performance. In 2005 IEEE International Conference on Systems,
Man and Cybernetics, pp. 450–455 (2005)
6. Tao, J., Deng, Z., Hu, M., Liu, J., Bi, Z.: A small wheeled robotic rover for planetary
exploration. In: 2006 1st International Symposium on Systems and Control in
Aerospace and Astronautics (2006)
7. Middleton, A., Paschall, S., Cohanim, B.: Small lunar lander/hopper performance
analysis. In: Proceedings of IEEE International Conference on Systems, Man and
Cybernetics (2010)
8. Yoshimitsu, T., Kubota, T., Nakatani, I.: MINERVA rover which became a small
artificial solar satellite. In: 20th Annual AIAA/USU Conference on Small Satellites
(2006)
9. Rubenstein, M., Ahler, C., Nagpal, R.: Kilobot: a low cost scalable robot system
for collective behaviors. In: IEEE International Conference on Robotics and Automation (ICRA) (2012)
10. Kazuya, Y., Yoichi, N., Takeshi, M.: Sampling and surface exploration strategies in
MUSES-C and future asteroid missions. In: 20th Annual AIAA/USU Conference
on Small Satellites (2003)
11. Matsumoto, R., Sato, T., Maeda, T., Kunii, Y., Kida, H.: Position and attitude
estimation for distributed exploration of small rovers using flash light from leader
agent. In: 2019 9th International Conference on Recent Advances in Space Tech-
nologies (RAST) (2019)
12. Ushijima, M., Kunii, Y., Maeda, T., Yoshimitsu, T., Otsuki, M.: Path planning with
risk consideration on hopping mobility. In: IEEE/SICE International Symposium
on System Integration (2017)
13. Zhu, Y.F., Tang, X.M.: Overview of swarm intelligence. In: International Confer-
ence on Computer Application and System Modeling, ICCASM 2010 (2010)
14. Nurzaman, S.G., Matsumoto, Y., Nakamura, Y., Koizumi, S., Ishiguro, H.: Yuragi-
based adaptive searching behavior in mobile robot: from bacterial chemotaxis to
levy walk. In: The 2008 IEEE International Conference on Robotics and Biomimet-
ics, Bangkok, Thailand, 21–26 February (2009)
15. Reynolds, C.W.: Flocks, herds, and schools: a distributed behavioral model. In:
ACM SIGGRAPH 1987 Conference Proceedings. Anaheim, California (1987)
16. Milutinovic, D., Lima, P.: Modeling and optimal centralized control of a large-size
robotic population. IEEE Trans. Rob. 22(6), 1280–1285 (2006). https://doi.org/
10.1109/TRO.2006.882941
Distributed Estimation of Scalar Fields
with Implicit Coordination
1 Introduction
In this work we consider the problem of estimating a scalar field using a team of
collaborating robots. This problem is motivated by our ongoing research in pre-
cision agriculture, where it is often necessary to estimate the spatial distribution
of parameters such as soil moisture, nitrates, or carbon dioxide flux that can be
modeled as spatially-varying scalars. To model this underlying field, we use a
Gaussian process (GP) [6]. GPs are not only elegant and efficient
computational tools for solving regression problems, but are also the model of
choice in geostatistics and agroecological modeling, where GP regression is
known as kriging [9]. A key feature of GP regression is that the model makes
it easy to estimate the uncertainty of the predictions, thus enabling
iterative methods where new sensor measurements are collected in regions of
high uncertainty to refine accuracy.
In agricultural applications one is often faced with the problem of estimating
quantities over very large domains, therefore the use of multiple robots allows for
quicker coverage of the region of interest. In these conditions, robots have to plan
their motions with multiple objectives. When an exhaustive, systematic coverage
of the entire region is not feasible, it is important to collect samples at the most
informative places. It is also necessary to be aware of the limited distance that
can be covered before the robot must be refueled or recharged. Therefore robots
have to plan paths that will eventually terminate at a designated end point before
they run out of energy or fuel. Likewise, it is also important to consider that
in real world applications the energy consumed to move between two locations
is not deterministically known upfront; rather, it is a random variable whose
realization is only known at run-time. For example, a robot may need to take a
detour to reach a certain place, or it may move through a muddy area causing
wheel splippage, etc. Finally, in rural regions communication infrastructures are
often lacking or limited and therefore robots can not assume the availability of
broadband communication channels.
With these motivations in mind, in this paper we present an algorithm to
solve this estimation problem with a team of robots. Coordination between the
agents is obtained with minimal information exchange and by leveraging the
mathematical properties of GPs to promote dispersion. The approach is fully
distributed and each robot just broadcasts to the rest of the team the limited
information consisting of the locations where it has collected data, the value it
measured, and its unique identifier—no more than a handful of bytes at a very
low frequency. No other communication is required and robots never exchange
their individual plans or models. In addition, each robot uses a refinement of our
recently developed planner for stochastic orienteering to ensure that it reaches
the final location before it runs out of energy. Through extensive simulations
we demonstrate that this approach ensures robots collect samples in areas
leading to a more accurate reconstruction of the underlying unknown scalar field.
The rest of the paper is organized as follows. Selected related work is pre-
sented in Sect. 2. The problem formulation is introduced in Sect. 3 and our meth-
ods are discussed in Sect. 4. In Sect. 5 we present extensive simulations to eval-
uate our proposal and in Sect. 6 we draw the conclusions.
2 Related Work
The problem considered in this contribution is related to the orienteering combi-
natorial optimization problem where one has to plan a path through a weighted
graph to gather the maximum sum of vertex rewards while ensuring that the
length of the path does not exceed a preassigned travel budget. Recently, we
have extensively studied this problem in agricultural settings, both for single
agents [7,13] and multiple agents [14], and we also considered its stochastic vari-
ants [12]. In all these works, however, rewards associated with vertices were
static and assigned upfront, while in this work rewards associated with ver-
tices are iteratively re-estimated based on the gathered samples. Moreover, our
former multi-robot solution [14] was centralized, while here we propose a fully
distributed approach.
The use of Gaussian Processes for estimating scalar fields with robots has
also been explored in the past. Suryan and Tokekar [11] proposed an algorithm
468 L. Booth and S. Carpin
to reduce, in minimal time, the variance of the scalar field being estimated with a
GP. Their solution uses a team of robots but does not consider travel budgets,
i.e., robots can travel as much as needed. Similarly, Di Caro et al. [2] propose
an estimation algorithm for GPs aiming at reducing uncertainty under time and
communication constraints. However, their solution does not consider a travel
budget.
Our work is also related to informative path planning (IPP), where the
emphasis is on planning paths to collect the most informative samples [3,4].
In IPP, however, the set of candidate sample points is not given but is rather
determined by the algorithm, and the travel budget is typically not explicitly
considered.
3 Problem Statement
in Sect. 4. At the end of the mission, after the robots have converged at the goal
vertex vg and are in proximity, they can exchange all the data they have gathered
during the mission. However, at that point data collection is concluded and they
cannot return to the field and acquire more data to improve the estimate. These
communication assumptions are consistent with contemporary technology used
by robots in the agricultural domain. In particular, LoRa [10] offers the
capability of streaming limited amounts of data over long distances and is
compatible with the assumptions we make while robots are out in the field.
When robots terminate their mission and are in close proximity, data can
instead be exchanged using onboard WiFi.
To reconstruct the scalar field we use Gaussian process (GP) regression, as
detailed in Sect. 4. Throughout the mission, using the available data (either
collected or communicated) robots can make predictions about the value of f at
arbitrary locations in X. We indicate such predictions as f̂. The overall objective
is to collect the set of observations providing the most accurate reconstruction
of the underlying scalar field. As common in estimation literature, in this work
our metric for accuracy is the mean squared error (MSE), defined as

MSE = (1/|X|) ∫_X (f(ψ) − f̂(ψ))² dψ.
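On a discretized domain the normalized integral reduces to a grid average; a small numpy sketch, with hypothetical toy fields standing in for f and f̂:

```python
import numpy as np

def mse(f_true, f_hat, xs, ys):
    """Grid approximation of MSE = (1/|X|) ∫_X (f(ψ) − f̂(ψ))² dψ:
    on a uniform grid the normalized integral is just a plain mean."""
    X, Y = np.meshgrid(xs, ys)
    return np.mean((f_true(X, Y) - f_hat(X, Y)) ** 2)

# toy ground-truth field and a biased estimate (both hypothetical)
f_true = lambda x, y: np.sin(x) * np.cos(y)
f_hat = lambda x, y: np.sin(x) * np.cos(y) + 0.1
grid = np.linspace(0.0, 5.0, 100)
error = mse(f_true, f_hat, grid, grid)   # constant 0.1 offset -> MSE = 0.01
```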
4 Methods
4.1 Gaussian Process Regression
We provide the necessary notation and background for GP regression, and we
refer the reader to [6] for a more comprehensive introduction. As per our
problem definition, we describe the spatial distribution of an unknown
parameter (moisture, nitrates, etc.) as a function f : X → R that is
continuous over the 2-D environment X ⊂ R² where measurements are taken. The
measurement noise σ_m is modeled as an i.i.d. zero-mean Gaussian random
variable with variance σ². The Gaussian process assumption is to model f as a
random probability distribution over a set of functions, such that the values
of f at arbitrary inputs x and x′ (f(x) and f(x′), respectively) have a
jointly Gaussian distribution. We assume that f(x) is a realization of a
Gaussian process, which is completely defined by a mean function m(x) and a
covariance function k(x, x′) with input vector x:
σ∗² = V[f∗] = k(x∗, x∗) − k(x∗, X) [k(X, X) + σ_n² I_n]⁻¹ k(X, x∗)    (4)
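Equation (4) can be evaluated numerically with a Cholesky solve instead of an explicit matrix inverse; a numpy sketch, where a squared-exponential kernel stands in for k purely for illustration:

```python
import numpy as np

def predictive_variance(kernel, X, x_star, sigma_n2):
    """Eq. (4): V[f*] = k(x*,x*) - k(x*,X) [k(X,X) + sigma_n^2 I]^{-1} k(X,x*),
    computed via a Cholesky factorization for numerical stability."""
    K = kernel(X, X) + sigma_n2 * np.eye(len(X))
    k_star = kernel(X, x_star[None, :]).ravel()
    v = np.linalg.solve(np.linalg.cholesky(K), k_star)  # v = L^{-1} k(X, x*)
    return float(kernel(x_star[None, :], x_star[None, :])) - v @ v

def rbf(A, B, length=1.0):
    """Squared-exponential kernel (illustrative stand-in for k)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # sampled locations
var_near = predictive_variance(rbf, X, np.array([0.1, 0.1]), 1e-4)
var_far = predictive_variance(rbf, X, np.array([4.0, 4.0]), 1e-4)
# uncertainty is low near collected samples and grows away from them
```

This variance is exactly the quantity that makes GP regression attractive for iterative sampling: regions of high `var` are the candidates for the next measurement.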
The kernel k establishes a prior likelihood over the space of functions that can
fit observed data in the regression task. Kernel selection and tuning is a key
component in GP regression tasks. In machine learning the radial basis function
(RBF) kernel is often used. However, in this paper, we use the Matérn kernel
with ν = 3/2 which is a finitely-differentiable function. Our choice of this kernel
is motivated by its broad use in the geostatistical literature for modeling physical
processes [9] like those motivating this research. The Matérn covariance function
takes the form:
K_Matérn(X, X′) = σ² (2^(1−ν) / Γ(ν)) (√(2ν) r / l)^ν K_ν(√(2ν) r / l)    (5)
where K_ν is a modified Bessel function, Γ(·) is the Gamma function, and r is
the Euclidean distance between input points X and X′. The hyperparameters
ν > 0, l > 0, and σ² > 0 represent smoothness, lengthscale, and observation
variance, respectively. As is common in GP inference, to account for the
measurement noise,
the kernel we use is the sum of two kernels, namely the Matérn kernel and a
noise term:

k(X, X′) = K_Matérn(X, X′) + σ_n² I

where I is the identity matrix and the term σ_n² models the measurement noise
σ_m². While we keep ν fixed at 3/2, the other hyperparameters θ = {σ², σ_n², l}
can be trained using various optimization methods, using the marginal
likelihood to match the properties of the environment and the sampled data
[6]. In particular,
the length scale l is related to the lag parameter of the variogram, a function
used in geostatistics that establishes how quickly the variance increases as a
function of separation distance between pairs of observations in space [9]. In the
GP kernel, smaller values of l imply that variance quickly grows with distance,
while with larger values the variance grows less. As we will see in the next
subsection, by putting constraints on the range of possible values of l one can
implicitly encourage dispersion between the robots, thus promoting the collection
of samples in different areas of the environment.
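The dispersion effect of the length scale can be made concrete. The sketch below implements the ν = 3/2 Matérn kernel in its closed form, k(r) = σ²(1 + √3 r/l) exp(−√3 r/l), and compares the predictive variance around a single collected sample for a small and a large length scale. The paper fits its GPs with scikit-learn; this standalone numpy version is only expository:

```python
import numpy as np

def matern32(A, B, sigma2=1.0, l=1.0):
    """Matérn kernel with nu = 3/2, closed form of Eq. (5):
    k(r) = sigma^2 (1 + sqrt(3) r / l) exp(-sqrt(3) r / l)."""
    r = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    s = np.sqrt(3.0) * r / l
    return sigma2 * (1.0 + s) * np.exp(-s)

def gp_variance(X, Xq, l=1.0, sigma2=1.0, sigma_n2=1e-3):
    """Predictive variance at query points Xq given samples at X."""
    K = matern32(X, X, sigma2, l) + sigma_n2 * np.eye(len(X))
    Ks = matern32(X, Xq, sigma2, l)                    # shape (n, m)
    return sigma2 - np.einsum('im,ij,jm->m', Ks, np.linalg.inv(K), Ks)

# one sample at the origin; query points at increasing distance from it
samples = np.array([[0.0, 0.0]])
queries = np.array([[d, 0.0] for d in np.linspace(0.1, 5.0, 50)])
var_small_l = gp_variance(samples, queries, l=0.3)
var_large_l = gp_variance(samples, queries, l=2.0)
# with the larger length scale the variance stays suppressed farther out,
# so a variance-seeking robot is pushed away from teammates' sample sites
```

A larger lower bound l0 therefore widens the "already informed" neighborhood around each broadcast sample, which is what implicitly disperses the team.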
4.3 Exploration
In this section we present the planning algorithm executed by each robot in the
team. No global data structure is shared among the agents, and all the quantities
described in the following are local to each robot. Let G = (V, E) be the graph
of possible sampling locations and let D = {(v_i, y_i)}, i = 1, …, n, be the set of collected
samples (vertices and values). All robots start from the same start vertex vs and
must end at the same goal vertex vg . D is initialized as an empty set, but then
grows as the robot collects more data or receives data from other agents. Each
robot is given a unique numerical identifier ni , but the robots need not know
how many agents are in the team. At each iteration the robot assigns a reward
function to each of the vertices in V , i.e., it computes a function r : V → R
assigning a value to each possible sampling location. Different options for the
function r will be discussed in the next subsection.
Once the function r has been computed, the robot is faced with an instance
of the stochastic orienteering problem, i.e., it has a graph G = (V, E) with known
deterministic rewards r associated to vertices and stochastic costs c associated
to edges, as well as a residual budget B. At this point the robot executes the
algorithm presented in [12] to solve the stochastic orienteering problem (SOP).
Because of the intrinsic computational complexity of the orienteering problem,
the SOP algorithm uses a Monte Carlo tree search informed by a heuristic
aiming at identifying vertices with high value r, low travel cost, and from which
the robot can still reach vg with high probability (the reader is referred to [12] for
all details). The SOP algorithm returns the next vertex va to visit. The robot
then moves to va , collects an observation ya , updates D, and broadcasts the
packet (va , ya , ni ) to all other agents, where ni is the unique id of the robot.
This process continues until the SOP algorithm returns vg , in which case the
agent moves to the goal vertex vg and terminates. Throughout the process the
robot keeps listening for possible packets broadcast by other agents, and when
they are received the location and sampled values are added to D.
As the SOP algorithm was developed for the single robot case, in this work
we added a minor modification to account for the presence of multiple robots.
The change is as follows: when considering the reward of a vertex r(v), the
robot considers, for all other agents, the last packet they transmitted (if it
exists). Then, if it determines that another agent is closer to v than itself,
it discounts the reward r(v). More precisely, let v be the vertex whose
utility is being evaluated and r(v) its reward. Let v_c be the location of the
current robot, and assume that it determines that robot j has broadcast a
packet indicating it collected a sample at vertex v_j. If v_j is closer to v
than v_c, then r(v) is updated as r′(v) = r(v) · d(v_j, v)/d(v_c, v), where d
is the Euclidean distance between the vertices. The
rationale for this update is that if another robot is closer to v, then it is more
likely to reach v than the former robot, so the utility of v is decreased for
the former robot to prevent having both robots visiting v, as this would be a
replicated effort wasting resources. However, the utility is not set to zero because
robots do not communicate with each other and do not know their individual
intentions. Also, since each robot maintains its own set of GP hyperparameters
(see discussion below) and these will be different from each other, robots cannot
make absolute predictions about the intentions of other robots in the team.
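The discounting rule can be sketched in a few lines. Applying the update once per closer teammate is one reading of the text; the function name and the dictionary of last broadcasts are illustrative assumptions:

```python
import math

def discounted_reward(r_v, v, v_c, last_broadcasts):
    """Discount r(v) for every teammate whose last broadcast sample site v_j
    is closer to v than this robot is: r'(v) = r(v) * d(v_j, v) / d(v_c, v).
    (Applying the rule once per closer teammate is one possible reading.)"""
    d_self = math.dist(v_c, v)
    for v_j in last_broadcasts.values():   # last sampled vertex per robot id
        d_other = math.dist(v_j, v)
        if d_other < d_self:
            r_v *= d_other / d_self
    return r_v

# teammate 2 sampled at (1, 1), closer to v = (2, 2) than we are at (5, 5):
# the reward is scaled by sqrt(2) / sqrt(18) = 1/3
r = discounted_reward(10.0, (2.0, 2.0), (5.0, 5.0), {2: (1.0, 1.0)})
```

Note the reward is scaled rather than zeroed, matching the paper's rationale that a closer teammate is only *likely*, not guaranteed, to reach v first.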
Remark: one could imagine that after a robot has determined which vertex v
it will visit next, it could broadcast this information to other agents so that they
do not consider it anymore among their choices. However, this is not done for two
reasons. First, such additional communication would almost double the amount of
transmitted data, thus going against our effort to keep exchanged information at
a minimum. Second, because of the stochastic nature of the environment there is
no guarantee that a robot electing to visit a certain vertex will eventually reach
it and collect a sample. Hence we opt for the current approach where robots
share measured data only after they have reached and sampled a location.
Fig. 1. Benchmark scalar fields to be estimated. With reference to the travel budget
B, the length of the side edge is 5. In both instances the start vertex vs is in the lower
left corner and the goal vertex vg is in the top right corner.
– The random waypoint selection algorithm (RWP), which selects the next ver-
tex to visit at random among those still to be visited. Due to the nature of the
selection process, the ability to communicate during the sampling process is
immaterial. The RWP algorithm is often used as a baseline comparison
in this type of task (see, e.g., [2]).
– A non-coordinated (NC) approach, which selects the next sampling point
using the same criteria used by our proposed algorithm, but does not exchange
any information during the sample process, i.e., during the selection process
each agent only considers the samples it collected, but not those collected by
the other agents.
At the end of the mission, when all robots have reached vg , both RWP and
NC share all collected sensor observations and the MSE is computed after fitting
the GP using all data collected by all robots. This step is not necessary for Coord
because data is exchanged on the go, but it ensures that the MSE evaluation is
done fairly among the three algorithms. After the algorithm has selected the next
point to visit, all algorithms, including Coord, use the same planning engine, i.e.,
the one we proposed in [12]. Finally, both NC and Coord refit the GP and
update the parameters θ before computing r, while RWP does not perform this step
because it does not use the current estimate to select the next point to visit.
All procedures were executed single-threaded in Python running on an Apple
M1 processor. All computations related to GP fitting and processing use the
scikit-learn library [5]. For both scalar fields considered in the tests, we varied
the number of agents (3,5,7,9), the budget B (10,15,20), and the parameter l0
(0.1,0.5,1). For each combination of parameters, twenty independent simulations
were performed, for a total of about 3500 executions.
Figure 2 shows the average MSE as a function of the number of robots for all
algorithms. As expected, the trend is decreasing (i.e., improved prediction
accuracy) with diminishing returns as the number of robots grows. We can
observe that the proposed algorithm outperforms the others. Note that the
range of MSE values in the two test cases differs because of the different values in
Fig. 2. Average final map MSE for n=20 trials per algorithm. Error bars show ± one
standard deviation.
the underlying scalar fields. Next, in Fig. 3 we show the reconstructed scalar
field for the three algorithms with a budget of 20 and 5 robots. The red dots
show the locations where the samples were collected. Due to the random
selection process, RWP ends up collecting fewer samples before exhausting the
budget, and this leads to an inferior estimate. NC and Coord, instead, collect
more samples, but we can see how Coord spreads them more uniformly and
ultimately yields a more accurate estimate (see Fig. 1b for the ground truth).
Similar results are observed for the synthetic map but are not displayed for
lack of space. Table 1 provides a more detailed numeric analysis of the
performance of the three algorithms.
Specifically, we look at the number of unvisited locations as well as the number
of locations visited by more than one robot. These are two proxies for the MSE
metric, and lower values are desired in both cases. When the travel budget is
10, the number of unvisited locations is similar for the three algorithms,
because with a limited budget the set of available choices is small. As the
budget grows, Coord emerges as the algorithm with the fewest unvisited
vertices, substantiating the claim that agents spread through the environment
in a more coordinated fashion. For revisited locations, RWP (as expected)
always has the lowest count, due to the completely random nature of its
selection. However, when comparing NC with Coord, the latter always has a
lower number, again showing that the agents spread better through the
environment while avoiding revisits, and thus that coordination leads to a
better use of the robots as mobile sensors.
Finally, in Fig. 4 we display how the choice of l0, the lower bound for the
length scale parameter l, impacts the value of the MSE metric. The two panels
correspond to a budget of 15 and 20 respectively, and group the results for
different numbers of surveying robots. For 9 robots the impact is marginal, and
this is explained by the fact that with this many agents the team manages to
cover most of the environment during the mission. However, for budget of 15 and
Fig. 3. Reconstructed scalar field soil moisture map with 5 robots and B = 20.
6 Conclusion
We have presented an approach to reconstruct a scalar field using a team of
robots performing distributed GP estimation. Through limited communication,
by exploiting the underlying properties of GPs, robots implicitly manage to
disperse throughout the domain and collect samples at locations leading to a more
accurate estimation. Our proposed approach has been extensively evaluated in
simulation and outperforms competing approaches.
A Constrained-Optimization Approach
to the Execution of Prioritized Stacks
of Learned Multi-robot Tasks
Gennaro Notomista
1 Introduction
Learning complex robotic tasks can be challenging for several reasons. The
nature of compound tasks, made up of several simpler subtasks, makes it difficult
to simultaneously capture and combine all features of the subtasks to be
learned. Another limiting factor of the learning process of compound tasks is
the computational complexity of the machine learning algorithms employed in the
learning phase. This can make the training phase prohibitive, especially when
the representation of the tasks comprises a large number of parameters, as is
generally the case when dealing with complex tasks made up of several subtasks,
or in the case of high-dimensional state space representations.
For these reasons, when there is an effective way of combining the execution
of multiple subtasks, it is useful to break down complex tasks into building blocks
that can be independently learned in a more efficient fashion. Besides the reduced
computational complexity stemming from the simpler nature of the subtasks to
be learned, this approach has the benefit of increasing the modularity of the task
execution framework, by allowing for a reuse of the subtasks as building blocks
for the execution of different complex tasks. Discussions and analyses of such
advantages can be found, for instance, in [9,16,26,32].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 479–493, 2024.
https://doi.org/10.1007/978-3-031-51497-5_34
480 G. Notomista
Fig. 1. Pictorial representation of the strategy adopted in this paper for the execution
of prioritized stacks of learned tasks. (Figure labels: optimal control, value functions
[2,20]; prioritized execution, Lyapunov functions [17].)
The prioritized execution framework for learned tasks proposed in this paper
can be related to approaches devised for multi-task learning, a machine learning
paradigm which aims at leveraging useful information contained in multiple
related tasks to help improve the generalization performance of all the
tasks [35]. The learning of multiple tasks can happen in parallel (independently)
or in sequence for naturally sequential tasks [10,29], and a number
of computational frameworks have been proposed to learn multiple tasks (see,
e.g., [14,24,35], and references therein). It is worth noting that, owing to its
constrained-optimization nature, the approach proposed in this paper is dual
to multi-objective optimization frameworks, such as [5,27], or to the
Riemannian motion policies [15,22,23].
Several works have focused on the composition and hierarchy of deep reinforcement
learning policies. The seminal work [33] shows compositionality for
a specific class of value functions. More general value functions are considered
in [12], where, however, there are no guarantees on the policy resulting from
the multi-task learning process. Boolean and weighted compositions of reward
functions, (Q-)value functions, or policies are considered in [11,19,34]. While these
works have shown their effectiveness on complex systems and tasks, our proposed
approach differs from them in two main aspects: (i) it separates the task learning
from the task composition; (ii) it allows for (possibly time-varying and state-
dependent) task prioritization, with task stacks that are enforced at runtime.
where Lf0 Vi(x) and Lf1 Vi(x) are the Lie derivatives of Vi along the vector fields
f0 and f1, respectively. The components of δ = [δ1, . . . , δM]^T are slack
variables employed to prioritize the different tasks; γ : R → R is a Lipschitz
continuous extended class K function, i.e. a continuous, monotonically increasing
function with γ(0) = 0; κ > 0 is an optimization parameter; and K is the
prioritization matrix, known a priori, which enforces relative constraints between
components of δ of the following type: δi ≤ l δj, for l ≪ 1, which encodes the
fact that task i is executed at higher priority than task j.
In the following, Sect. 2.3 will be devoted to showing the connection between
dynamic programming and optimization-based controllers. In Sect. 3, this con-
nection will allow us to execute tasks learned using the RL paradigm by means
of a formulation akin to (2).
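A formulation of this kind can be sketched numerically. The program below is not the paper's exact (2), which is not reproduced in this excerpt; it is a minimal CLF-QP of the same family for a hypothetical single-integrator robot with two conflicting go-to-goal tasks, where the dynamics, γ, κ, and l are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical setup: single integrator x_dot = u in R^2 (f0 = 0, f1 = I),
# two go-to-goal tasks encoded by CLFs V_i(x) = 0.5 * ||x - g_i||^2.
x  = np.array([1.0, 0.0])
g1 = np.array([0.0, 0.0])     # task 1 (high priority)
g2 = np.array([2.0, 0.0])     # task 2 (low priority), conflicting with task 1
l, kappa = 1e-3, 10.0         # prioritization gain and slack weight (illustrative)
gamma = lambda v: v           # a Lipschitz extended class-K function

V   = lambda g: 0.5 * np.dot(x - g, x - g)
LfV = lambda g: x - g         # L_{f1} V_i(x); L_{f0} V_i(x) = 0 here

# Decision variable z = [u_1, u_2, delta_1, delta_2]: minimize control effort
# plus weighted slacks, subject to one CLF constraint per task and the
# prioritization constraint delta_1 <= l * delta_2 (task 1 first).
cost = lambda z: z[:2] @ z[:2] + kappa * (z[2:] @ z[2:])
cons = [
    {"type": "ineq", "fun": lambda z: -gamma(V(g1)) + z[2] - LfV(g1) @ z[:2]},
    {"type": "ineq", "fun": lambda z: -gamma(V(g2)) + z[3] - LfV(g2) @ z[:2]},
    {"type": "ineq", "fun": lambda z: l * z[3] - z[2]},
]
sol = minimize(cost, np.zeros(4), constraints=cons, method="SLSQP")
u, delta = sol.x[:2], sol.x[2:]
# Since the goals pull in opposite directions, the optimizer serves task 1
# (u points towards g1) and relaxes task 2 through its slack delta_2.
```

Because the two CLF constraints cannot both be satisfied with small slacks, the prioritization constraint forces δ1 toward zero, so the control input serves the high-priority task while the low-priority one is relaxed.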
with J0(x0) = 0, where x0 is the initial state, and gk(xk, uk) is the cost incurred
at time k. The total cost accumulated along the system trajectory is given by

J(x0) = lim_{N→∞} ∑_{k=0}^{N−1} α^k gk(xk, uk).    (5)
In this paper, we will consider α = 1 and we will assume there exists a cost-free
termination state.1
By Proposition 4.2.1 in [2], the value iteration algorithm (4) converges to J* satisfying

J*(x) = min_{u∈U(x)} [ g(x, u) + J*(f(x, u)) ].    (6)

¹ Problems of this class are referred to as shortest path problems in [2].
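The fixed-point property (6) can be illustrated with a small executable sketch; the shortest-path instance below (a 1-D chain with unit stage costs and a cost-free termination state) is an illustrative choice of ours, not an example from [2]:

```python
# A toy, undiscounted (alpha = 1) shortest-path instance with a cost-free
# termination state: a 1-D chain of states 0..4 where state 0 is terminal.
STATES = range(5)

def f(x, u):          # deterministic transitions: move left or stay
    return max(x - 1, 0) if u == "left" else x

def g(x, u):          # unit cost per stage, free at the terminal state
    return 0.0 if x == 0 else 1.0

def U(x):             # admissible controls
    return ["left", "stay"]

# Value iteration: J <- min_u [ g(x, u) + J(f(x, u)) ]
J = {x: 0.0 for x in STATES}
for _ in range(50):
    J = {x: min(g(x, u) + J[f(x, u)] for u in U(x)) for x in STATES}

# At convergence J is the distance to the terminal state and satisfies the
# Bellman equation (6) at every state.
```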
The optimal policy corresponding to the optimal value function (8) can be evaluated
as follows [4]:

u*(x) = −(1/2) (Lf1 J*(x))^T.    (9)
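To see (9) at work on a case that can be checked by hand, consider a worked scalar example of our own (not taken from the paper): for ẋ = u, i.e. f0 = 0 and f1 = 1, with running cost g(x, u) = x^2 + u^2, the optimal value function is J*(x) = x^2, and (9) returns u*(x) = −x:

```python
# Worked scalar example (ours): x_dot = u, f0 = 0, f1 = 1, g(x,u) = x^2 + u^2.
# The optimal value function is J*(x) = x^2, so L_{f1} J*(x) = dJ*/dx = 2x.
def J_star(x):  return x * x
def dJ_star(x): return 2 * x               # L_{f1} J* since f1 = 1

def u_star(x):  return -0.5 * dJ_star(x)   # policy (9): u* = -(1/2) L_{f1} J*

# The continuous-time HJB equation 0 = min_u [ g(x,u) + dJ*/dx * u ] holds at
# u = u*(x):  x^2 + u*^2 + 2x * u*  =  x^2 + x^2 - 2x^2  =  0.
for xv in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    uv = u_star(xv)
    assert abs((xv * xv + uv * uv) + dJ_star(xv) * uv) < 1e-12
```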
In order to show how the optimal policy u in (9) can be obtained using an
optimization-based formulation, we now recall the concept of control Lyapunov
functions.
Definition 1 (Control Lyapunov function [30]). A continuously differentiable,
positive definite function V : R^n → R is a control Lyapunov function
(CLF) for the system (1) if, for all x ≠ 0,

inf_u [ Lf0 V(x) + Lf1 V(x) u ] < 0.    (10)
where σ(x) = (Lf0 V(x))^2 + q(x) Lf1 V(x) (Lf1 V(x))^T. This formulation shares
the same optimization structure as the one introduced in (2) in Sect. 2, and in
the next section we will provide a formulation which strengthens the connection
with approximate dynamic programming.
Multi-robot Multi-learned-Tasks 485
where I denotes an identity matrix of appropriate size. Substituting (16) in (15),
we get u = −[f̂1(x)^T 0] A1^{−1} b0 and δ = κ^{−1} [I −K^T] A1^{−1} b0.
To show the claimed stability property, we will proceed by a Lyapunov
argument. Let us consider the Lyapunov function candidate V(x) = (1/2) J̃(x)^T K^T K J̃(x),
where J̃(x) = [J̃1(x) . . . J̃M(x)]^T. The time derivative of V evaluates to:

V̇ = (∂V/∂x) ẋ = J̃(x)^T K^T K (∂J̃/∂x) ẋ = J̃(x)^T K^T K (∂J̃/∂x) f1(x) u (by Assumption 2),

where the term (∂J̃/∂x) f1(x) is denoted by f̂1,λ(x); notice that
f̂1,λ(x) = Λ(x) f̂1(x) ∈ R^M and Λ(x) = diag([λ1(x), . . . , λM(x)]).
By Assumption 2, λi(x) ≥ 0 for all i, and therefore Λ(x) ⪰ 0, i.e. Λ(x)
is positive semidefinite. Then, V̇ = J̃(x)^T K^T K Λ(x) f̂1(x) u = −J̃(x)^T K^T K Λ(x) Â [σ(x)^T 0]^T, where

Â = [f̂1(x) f̂1(x)^T  0] [ I + f̂1(x) f̂1(x)^T   −K^T ; K   κ^{−1} K K^T ]^{−1}    (17)
Remark 2. The proof of Proposition 1 can be carried out even in the case of a
time-varying and state-dependent prioritization matrix K. Under the assumption
that K is bounded and continuously differentiable for all x and uniformly in
time, the norm and the gradient of K can be bounded in order to obtain an
upper bound for V̇.
Remark 3. Even when the prioritization stack specified through the matrix K in
(14) is not physically realizable—due to the fact that, for instance, the functions
encoding the tasks cannot achieve the relative values prescribed by the prioriti-
zation matrix—the optimization program will still be feasible. Nevertheless, the
tasks will not be executed with the desired priorities and even the execution of
high-priority tasks might be degraded.
4 Experimental Results
In this section, the proposed framework for the execution of prioritized stacks
of tasks is showcased in simulation using a team of mobile robots. Owing to the
multitude of robotic units that compose them, multi-robot systems are often
highly redundant with respect to the tasks they have to execute. Therefore,
they lend themselves perfectly to the concurrent execution of multiple prioritized
tasks.
Fig. 2. Snapshots (a–l) and plot of J̃1, J̃2, J̃3, and J̃4 (m) corresponding to the
hexagonal formation control task and 3 go-to-goal tasks for robots 1, 2, and 3, respectively,
recorded during the course of a simulated experiment with a multi-robot system.
Robots are gray dots, connection edges between the robots used to assemble the desired
formation are depicted in blue, and goal points are shown as red dots.
not be feasible to form a hexagonal formation while two robots are constrained
to be in two pre-specified arbitrary locations.
Figure 2 reports a sequence of snapshots and the graph of the value functions
encoding the four tasks recorded during the course of the experiment. Denoting
by Ti ≺ Tj the condition under which task Ti has a priority higher than Tj, the
sequence of prioritized stacks tested in the experiment is the following:
T2, T3, T4 ≺ T1    for 0 s ≤ t < 15 s
T1 ≺ T2, T3, T4    for 15 s ≤ t < 30 s        (18)
T1 ≺ T2 ≺ T3, T4   for 30 s ≤ t ≤ 45 s.
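The switching schedule (18) can be encoded directly as a function of time. The list-of-levels representation below is our own (tasks in the same inner list share a priority level, ordered from highest to lowest):

```python
# Time-indexed prioritization schedule of (18), for t in [0, 45] seconds.
def stack(t):
    if 0 <= t < 15:
        return [["T2", "T3", "T4"], ["T1"]]   # go-to-goal tasks first
    if 15 <= t < 30:
        return [["T1"], ["T2", "T3", "T4"]]   # formation control first
    if 30 <= t <= 45:
        return [["T1"], ["T2"], ["T3", "T4"]]
    raise ValueError("t outside the experiment horizon")
```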
The plot of the value functions in Fig. 2l shows how, for 0 s ≤ t < 15 s, since
the hexagonal formation control task has lower priority compared to the
three go-to-goal tasks, its value function J̃1 is allowed to grow while the other
three value functions are driven to 0 by the velocity control input, solution of
(14), supplied to the robots. For 15 s ≤ t < 30 s, the situation is reversed: the
hexagonal formation control is executed with highest priority while the value
functions encoding the three go-to-goal tasks are allowed to grow, a condition
which corresponds to the non-execution of those tasks. Finally, for 30 s ≤ t ≤ 45 s,
task T2, i.e. the go-to-goal task for robot 1 to goal point 1, is added at higher priority
with respect to tasks T3 and T4. Since this task is independent of task T1, it can
be executed at the same time. As a result, as can be seen from the snapshots,
the formation translates towards the red point marked with 1. Tasks T1 and T2
are successfully executed, while tasks T3 and T4 are not executed since they are
not independent of the first two and have lower priority.
4.2 Discussion
5 Conclusion
B Implementation Details
The results reported in Sect. 4 have been obtained using a custom value function
learning algorithm written in Python. The details of each multi-robot task are
given in the following.
Fig. 3. Comparison between the optimal controller (given in (9)), the RL policy (based
on the approximate value function J̃ (8)), and the optimization-based controller (solution
of (13) with V = J̃) employed to stabilize a double-integrator system to the origin
of its state space, i.e. driving both x1 and x2 to 0. As can be seen, when trained for
a sufficiently long time, the RL policy recovers the optimal controller, which is also
equivalent to the optimization-based controller.
References
1. Antonelli, G.: Stability analysis for prioritized closed-loop inverse kinematic algo-
rithms for redundant robotic systems. IEEE Trans. Rob. 25(5), 985–994 (2009).
https://doi.org/10.1109/TRO.2009.2017135
2. Bertsekas, D.P.: Reinforcement Learning and Optimal Control. Athena Scientific,
Belmont, MA (2019)
3. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press,
Cambridge (2004)
4. Bryson, A.E., Ho, Y.C.: Applied Optimal Control: Optimization, Estimation, and
Control. Routledge, New York (2018)
5. Bylard, A., Bonalli, R., Pavone, M.: Composable geometric motion policies using
multi-task pullback bundle dynamical systems. arXiv preprint arXiv:2101.01297
(2021)
6. Dulac-Arnold, G., Mankowitz, D., Hester, T.: Challenges of real-world reinforce-
ment learning. arXiv preprint arXiv:1904.12901 (2019)
7. Freeman, R.A., Primbs, J.A.: Control Lyapunov functions: new ideas from an old
source. In: Proceedings of 35th IEEE Conference on Decision and Control, vol. 4,
pp. 3926–3931. IEEE (1996)
8. Gerkey, B.P., Matarić, M.J.: A formal analysis and taxonomy of task allocation in
multi-robot systems. Int. J. Robot. Res. 23(9), 939–954 (2004)
9. Ghosh, D., Singh, A., Rajeswaran, A., Kumar, V., Levine, S.: Divide-and-conquer
reinforcement learning. arXiv preprint arXiv:1711.09874 (2017)
10. Gupta, A., et al.: Reset-free reinforcement learning via multi-task learning: learn-
ing dexterous manipulation behaviors without human intervention. arXiv preprint
arXiv:2104.11203 (2021)
11. Haarnoja, T., Pong, V., Zhou, A., Dalal, M., Abbeel, P., Levine, S.: Composable
deep reinforcement learning for robotic manipulation. In: 2018 IEEE International
Conference on Robotics and Automation (ICRA), pp. 6244–6251. IEEE (2018)
12. Haarnoja, T., Tang, H., Abbeel, P., Levine, S.: Reinforcement learning with deep
energy-based policies. In: International Conference on Machine Learning, pp. 1352–
1361. PMLR (2017)
13. Kaelbling, L.P.: The foundation of efficient robot learning. Science 369(6506), 915–
916 (2020)
14. Micchelli, C.A., Pontil, M.: Kernels for multi-task learning. In: NIPS, vol. 86, p. 89.
Citeseer (2004)
15. Mukadam, M., Cheng, C.A., Fox, D., Boots, B., Ratliff, N.: Riemannian motion
policy fusion through learnable Lyapunov function reshaping. In: Conference on
Robot Learning, pp. 204–219. PMLR (2020)
16. Nachum, O., Gu, S., Lee, H., Levine, S.: Data-efficient hierarchical reinforcement
learning. arXiv preprint arXiv:1805.08296 (2018)
17. Notomista, G., Mayya, S., Hutchinson, S., Egerstedt, M.: An optimal task allo-
cation strategy for heterogeneous multi-robot systems. In: 2019 18th European
Control Conference (ECC), pp. 2071–2076. IEEE (2019)
18. Notomista, G., Mayya, S., Selvaggio, M., Santos, M., Secchi, C.: A set-theoretic
approach to multi-task execution and prioritization. In: 2020 IEEE International
Conference on Robotics and Automation (ICRA), pp. 9873–9879. IEEE (2020)
19. Peng, X.B., Chang, M., Zhang, G., Abbeel, P., Levine, S.: MCP: learning compos-
able hierarchical control with multiplicative compositional policies. arXiv preprint
arXiv:1905.09808 (2019)
20. Primbs, J.A., Nevistić, V., Doyle, J.C.: Nonlinear optimal control: a control Lya-
punov function and receding horizon perspective. Asian J. Control 1(1), 14–24
(1999)
21. Qureshi, A.H., Johnson, J.J., Qin, Y., Henderson, T., Boots, B., Yip, M.C.: Com-
posing task-agnostic policies with deep reinforcement learning. arXiv preprint
arXiv:1905.10681 (2019)
22. Rana, M.A., et al.: Learning reactive motion policies in multiple task spaces from
human demonstrations. In: Conference on Robot Learning, pp. 1457–1468. PMLR
(2020)
23. Ratliff, N.D., Issac, J., Kappler, D., Birchfield, S., Fox, D.: Riemannian motion
policies. arXiv preprint arXiv:1801.02854 (2018)
24. Ruder, S.: An overview of multi-task learning in deep neural networks. arXiv
preprint arXiv:1706.05098 (2017)
25. Sahni, H., Kumar, S., Tejani, F., Isbell, C.: Learning to compose skills. arXiv
preprint arXiv:1711.11289 (2017)
26. Schwartz, A., Thrun, S.: Finding structure in reinforcement learning. Adv. Neural.
Inf. Process. Syst. 7, 385–392 (1995)
27. Sener, O., Koltun, V.: Multi-task learning as multi-objective optimization. arXiv
preprint arXiv:1810.04650 (2018)
28. Singh, S.P.: Transfer of learning by composing solutions of elemental sequential
tasks. Mach. Learn. 8(3), 323–339 (1992)
29. Smith, V., Chiang, C.K., Sanjabi, M., Talwalkar, A.: Federated multi-task learning.
arXiv preprint arXiv:1705.10467 (2017)
30. Sontag, E.D.: A Lyapunov-like characterization of asymptotic controllability. SIAM
J. Control. Optim. 21(3), 462–471 (1983)
31. Sontag, E.D.: A “universal” construction of Artstein’s theorem on nonlinear sta-
bilization. Syst. Control Lett. 13(2), 117–123 (1989)
32. Teh, Y.W., et al.: Distral: robust multitask reinforcement learning. arXiv preprint
arXiv:1707.04175 (2017)
33. Todorov, E.: Compositionality of optimal control laws. Adv. Neural. Inf. Process.
Syst. 22, 1856–1864 (2009)
34. Van Niekerk, B., James, S., Earle, A., Rosman, B.: Composing value functions
in reinforcement learning. In: International Conference on Machine Learning, pp.
6401–6409. PMLR (2019)
35. Zhang, Y., Yang, Q.: A survey on multi-task learning. IEEE Trans. Knowl. Data
Eng. 34, 5586–5609 (2021)
A Comparison of Two Decoupled
Methods for Simultaneous Multiple
Robots Path Planning
1 Introduction
Coordinated path planning of a group of mobile robots is a major requirement for
many cooperative tasks in interaction with a complex environment (e.g. unknown,
cluttered, dynamic). A high-level description of the problems related to either
collaborative or competitive robot coordination can be found in [1]. This coordination
implies an inherent conflict of resources: in path planning the conflict is
mostly a space conflict because the robots move around each other, but it may
also be a conflict due to limited communication channels, or a conflict of payload
if several robots manipulate the same object. Task allocation is a problem in itself
too, as is the choice between leaving the fleet management to a centralized authority
and allowing every robot to take decisions on its own. It is considered here that
the allocation of goal positions has already been performed and that the problem
consists in coordinating the robots' simultaneous motion in the workspace.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 494–508, 2024.
https://doi.org/10.1007/978-3-031-51497-5_35
Comparison of Decoupled Methods 495
When managing more than one robot, a solution can consist in applying
motion coordination [2], i.e. coordinating the robots online, as they are moving,
to prevent collisions from occurring. This can be achieved either through a
centralized authority sending commands to the robots (centralized approach) or with
the robots communicating with each other based on their future displacements
(decentralized approach). Motion coordination can result in various solutions: (i)
a system of traffic rules; (ii) reactive approaches, which can be based on potential
fields with attractive forces towards goals and repulsive ones to stay away
from obstacles; (iii) the management of a swarm or fleet with a common objective.
For example, coordinated control approaches have been previously used for
several applications that can be addressed by a fleet of autonomous aerial or
terrestrial robots [3]. To limit computational complexity, the online search of a
model predictive control input for each vehicle can be reduced to a discretized
set while still taking into account the predicted motion of the other robots, as
in [4]. However, this strategy can be limited for autonomous robots moving freely
in a very cluttered environment, where in some configurations no feasible solution
can be found. This is why graph-based and sampling-based path planning
methods should also be considered for simultaneous robot coordination [5,6]. In
the latter framework, a simple approach could consist in considering only the
current positions of the other robots when computing the path of each one, but
collision avoidance with the other robots is then not guaranteed. The objective
of the present work is to study path planning methods that can be applied to
multiple robots moving simultaneously in the same cluttered environment, with
an explicit handling of their motion interaction. For this purpose, two decoupled
sampling-based path planning algorithms derived from the A* procedure
are described in Sect. 3, and their performances are evaluated in a Monte Carlo
setup with a combination of map sizes, rates of static obstacles, and numbers of
robots (see Sect. 4).
2 Related Work
Simultaneous motion planning for multiple robots comes with various difficulties,
such as the numerical capacity of planning for a high number of robots or large
workspaces, the consideration of robot dynamics as well as sensing constraints,
or the transition from numerical simulation to real-robot experiments [2]. To
deal with this variety of possible issues, several motion planning paradigms have
been derived, as described in the reference textbook [5], where in particular
algorithms are classified as either sampling-based or combinatorial methods. In
sampling-based motion planning, the idea is to sample the free space to conduct
the search for a path relying on a collision detection module; hence the explicit
construction of the obstacle space is avoided. In combinatorial motion planning,
also referred to as exact methods, paths are computed in the continuous
configuration space without resorting to sampling or discretization. Two main
categories emerge from the literature to address simultaneous path planning,
categories emerge from the literature to address simultaneous path planning,
namely coupled approaches [6–8] and decoupled approaches [8–11]. In coupled
496 B. Bouvier and J. Marzat
planning, the problem can be seen as handling a single robot which concatenates
the states of the actually different robots. This increases the dimension of the
configuration space, and complexity is known to rise exponentially in this number.
Classical algorithms may then fail to find a solution or require too much
time (this has been verified in our evaluations, see Sect. 4.1). In decoupled path
planning, two subcategories can be considered [12]:
1. Robots are treated one by one. For every one of them, a path is calculated
considering all other robots as moving obstacles on top of the already-existing
static obstacles. Two steps are required, namely: a planning algorithm and a
prioritization rule. The planning algorithm is applied to obtain the path of
each robot to its respective goal, and then prioritization should favor those
with the highest cost (depending on time, distance, energy, etc.). For instance,
robots with the least cost could be penalized by having to bypass or make
detours.
2. Each robot has its own path, calculated independently of the others with only
the static obstacles taken into account. Then a coordination diagram should
plan the displacements to avoid collisions. Some authors mention a second
phase of velocity tuning (e.g. in [13]) to coordinate robots in parallel: the
relative speed between two robots can thus be tuned in the portion of path
subject to a risk of collision.
in [9,10]. The main ideas for coordinating the speed of robots having independent
goals along pre-defined paths have been presented in [8]. The so-called
coordination diagram, based on a bounding box representation of the obstacles,
was then introduced in [17] to tackle the motion coordination of several
robots moving along fixed independent paths while avoiding mutual collisions.
Multi-robot path planning continues to be an active area of research, with many
different approaches proposed in the past few years, based e.g. on reinforcement
learning [18,19], particle swarm optimization [20,21], graph neural networks [22],
and on the design of novel heuristics [23].
Several decoupled multi-robot coordination strategies have been defined and
evaluated in these previous works, relying either on priority-based successive
planning or on velocity adjustment along fixed paths. Two such decoupled
algorithms are defined in the present paper, one from each category, namely
Prioritized Planning (PP) and Fixed-Path Coordination (FPC). They are both
derived from the same A* framework with practical implementations, which are
detailed in pseudocode. This allows a fair comparison of the two approaches in the
same simulation setup using extensive Monte Carlo evaluations on randomly-
defined workspace configurations and, as a result, provides recommendations for
the motion coordination of multiple robots having independent goals in cluttered
environments.
3 Planning Algorithms
3.1 Problem Formulation
Fig. 1. Multiple robots path planning: robots A1 and A2 drive from their initial to final
configurations while avoiding two static obstacles and ensuring they do not collide.
The path planning problem consists in finding, for every robot Ai, a continuous
path τi : [0; 1] → Cfree, s ↦ τi(s), where s parameterizes the path, such
that τi(0) = qI^i and τi(1) = qG^i, and with an empty intersection with both
Cobs and Cmovobs^{i,j} for all j ≠ i. The notion of moving obstacles requires a
parameter that can be used to determine where a given robot is located at some instant.
If all robots have the same speed, the cumulative distance could be used and s
would be sufficient. However, for robots with different speeds vi, time should be
considered explicitly in the problem definition. Therefore, a date is associated with
every configuration of the path. Instead of looking for the parameterized path τi
defined previously, we now look for a trajectory φi = τi ◦ σi considering a timing
function defined as σi : T → [0; 1], t ↦ s, where t denotes time.
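As a minimal sketch of our own of such a timed trajectory φi = τi ◦ σi, consider a piecewise-linear path through dated waypoints, with the position at date t obtained by interpolating between the two bracketing dates:

```python
from bisect import bisect_right

def position_at(waypoints, dates, t):
    """Interpolate the robot position along its dated piecewise-linear path."""
    if t <= dates[0]:
        return waypoints[0]
    if t >= dates[-1]:
        return waypoints[-1]          # robot parked at its final configuration
    k = bisect_right(dates, t) - 1    # index of the segment containing t
    a = (t - dates[k]) / (dates[k + 1] - dates[k])
    (x0, y0), (x1, y1) = waypoints[k], waypoints[k + 1]
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))

path  = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
dates = [0.0, 1.0, 2.0]               # unit nominal velocity v_i = 1
```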
Each robot is assigned a vector of dates associated with its A* path, given its
nominal velocity vi specified by the user. For robot Ai, this makes it possible to
interpolate the positions of all higher-priority robots Aj, 1 ≤ j < i, to detect possible
collisions. Collisions are defined as follows: the current robot Ai is considered as
a point, while all the robots with higher priorities Aj, 1 ≤ j < i, are considered
to occupy a disk of parameterized safety radius centered at the robot position
at the current date. If Ai finds itself inside or on the outer circle of any disk
representing Aj, a collision is detected and the corresponding node is declared
unfeasible. The pseudocode of the algorithm is given in Algorithm 1. It globally
consists in a loop with a modified A* procedure applied for each robot (modifications
are indicated in blue). The dates in the list time vec (which is of the same
length as the heuristic scores gscores) are defined for the nodes present either in
the closedSet (which progressively contains the path) or in the openSet (which
contains nodes considered to be part of the path but eventually not selected).
The considered heuristic dist(nbg, n) is based on the Euclidean distance. In the
end, a trajectory (pathi; datesi) is obtained for each robot i. If the while loop
is exited with success = false, the path is extracted from the node of minimal
heuristic value instead of qG^i; therefore an incomplete path is obtained to
determine how far the robot can go before having to stop.
The core of the multi-robot layer lies in the robot-robot collision detection.
When managing static obstacles, it is sufficient and straightforward to check if
the intended configuration is inside or outside Cobs . When it comes to moving
obstacles (higher-priority robots), one has to consider the relative speed between
the robot and the obstacle as well as the obstacle width. Collision checks are
performed every dT seconds, given the relative speed vector and the obstacle
width. When moving from a current node position n to the intended one ngb,
the robot does not change its direction but the moving obstacle might. As a
consequence, care should be taken in selecting the dates and in calculating the
relative speed vector. Moreover, collision checks are performed with all robots of
higher priority (the sweep is nevertheless stopped as soon as a collision is detected,
because one is sufficient). All these additional calculations can be costly, so they
are run only if a static obstacle has not already been detected. Finally, a post-
check is conducted on the obtained paths and dates, such that the coordination
is considered to have failed in the particular case where a lower-priority robot
has reached its destination and is blocking the path of a higher-priority robot.
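The disk-based robot-robot test just described can be sketched as follows; the safety radius value and the linear interpolation between consecutive dated waypoints are illustrative assumptions of ours:

```python
import math

SAFETY_RADIUS = 0.5   # illustrative value; the paper's radius is a parameter

def interp(pos_a, pos_b, t_a, t_b, t):
    """Linear interpolation between two dated waypoints."""
    a = (t - t_a) / (t_b - t_a)
    return (pos_a[0] + a * (pos_b[0] - pos_a[0]),
            pos_a[1] + a * (pos_b[1] - pos_a[1]))

def collides(p_i, higher_priority_trajs, t):
    """p_i: position of robot A_i (a point) at date t; each higher-priority
    robot occupies a disk centered at its interpolated position at date t."""
    for path, dates in higher_priority_trajs:
        # clamp t into the robot's schedule (it parks at its last waypoint)
        t_c = min(max(t, dates[0]), dates[-1])
        k = max(j for j in range(len(dates)) if dates[j] <= t_c)
        k = min(k, len(dates) - 2)
        p_j = interp(path[k], path[k + 1], dates[k], dates[k + 1], t_c)
        if math.dist(p_i, p_j) <= SAFETY_RADIUS:
            return True               # one collision is sufficient
    return False

traj_j = ([(0.0, 0.0), (2.0, 0.0)], [0.0, 2.0])   # A_j moving right at speed 1
```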
20 else
21 Get incomplete pathi and vector datesi from start to best reachable node;
22 end
23 Return pathi and datesi ;
24 end
Algorithm 1: Prioritized Planning (PP) pseudocode
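Algorithm 1 above can be given a compact executable counterpart. The sketch below is our own simplification, not the paper's implementation: a time-annotated A* on a unit-speed grid, where a successor node is unfeasible if, at its arrival date, it coincides with the cell of a higher-priority robot (clamped to its last waypoint once arrived). The paper's disk-based test is replaced by this cell-level check, and edge (swap) conflicts are not detected.

```python
import heapq

def astar_pp(grid, start, goal, higher_paths):
    """Time-annotated A*: grid is a set of free (x, y) cells; a higher-priority
    robot j occupies higher_paths[j][min(t, len-1)] at integer date t."""
    def occupied(cell, t):
        return any(p[min(t, len(p) - 1)] == cell for p in higher_paths)
    def h(c):                                  # admissible Manhattan heuristic
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])
    horizon = 4 * len(grid)                    # crude bound to ensure termination
    open_set = [(h(start), 0, start, [start])]
    seen = set()
    while open_set:
        _, t, cell, path = heapq.heappop(open_set)
        if cell == goal:
            return path                        # arrival dates are indices 0..len-1
        if (cell, t) in seen or t > horizon:
            continue
        seen.add((cell, t))
        for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]:  # (0, 0) = wait
            nxt = (cell[0] + dx, cell[1] + dy)
            if nxt in grid and not occupied(nxt, t + 1):
                heapq.heappush(open_set, (t + 1 + h(nxt), t + 1, nxt, path + [nxt]))
    return None                                # coordination failed

def prioritized_planning(grid, starts, goals):
    paths = []                                 # robots are treated one by one
    for s, g in zip(starts, goals):
        p = astar_pp(grid, s, g, paths)
        if p is None:
            return None
        paths.append(p)
    return paths

# Two robots exchanging the ends of a corridor: the second (lower-priority)
# robot is delayed by one step so that it never occupies the same cell as the
# first one at the same date.
corridor = {(x, 0) for x in range(5)} | {(2, 1)}
plans = prioritized_planning(corridor, [(0, 0), (4, 0)], [(4, 0), (0, 0)])
```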
4 Numerical Experiments
Extensive simulations have been run based on the following setup. The workspace
is a bounded subset of R2 with translational degrees of freedom. As shown in
Fig. 4a, the free space is sampled at regularly-spaced points (in green) which
form a grid with at most eight neighbors per node. Robots move along the edges
with the aim of traveling from their starting positions (black) to their final ones
(red). The number indicated next to every node is the date at which the robot
reaches the corresponding position.
Fig. 4. Illustrative test case with 3 robots successfully solved by PP (b) and FPC (c-d).
Arrival times are indicated around the nodes and motion command profiles (d) show
the stop-and-go behaviour of FPC.
Table 1. Monte-Carlo comparison (mean results on 1000 runs) for PP and FPC algo-
rithms
Fig. 5. PP and FPC success rates show little dependency on occupancy rate
unchanged. This is why Table 1 only mentions travel duration increase. Path
lengthening would be zero for FPC and equal to the duration increase for PP
because all speeds are equal. The metric is computed only on multi-robot
configurations for which the paths were successfully modified by the corresponding
algorithm. The results show that FPC delays the robots much more than PP,
with 18 to 57% time increase on average, compared to less than 6.1% for the
smaller map and always less than 1% on the bigger one. In many cases, bypassing
a robot has a limited impact on arrival time because various slightly different
paths (and sometimes ones of the exact same length) lead to the goal. If two
robots drive in opposite directions, one may simply shift to a parallel lane with
PP, whereas in FPC the lower-priority agent will have to stop to give way.
5 Conclusion
Two path planning algorithms dedicated to multiple robots driving simultaneously
in a cluttered workspace have been defined as variations of A* procedures
and compared in this paper. Both are based on a predefined order of priority
and assumptions of constant nominal velocities. The first one, named Prioritized
Planning (PP), consists in bypassing higher-priority agents while continuously
driving. The second one, named Fixed-Path Coordination (FPC), handles collision
risks along fixed paths by inserting pauses in order to let higher-priority robots
move past. The Monte Carlo simulation setup showed that directly superimposing
single-robot planning paths can be successful in a number of configurations
with limited occupancy rates and fleet sizes. This could thus be a first step when
addressing a multi-robot coordination problem, with the new strategies called
upon when this straightforward approach has failed. PP showed good performance
thanks to its flexibility in adapting the paths by following alternative nodes.
Satisfactory success rates and very limited travel duration and distance increases
were obtained. The proposed FPC version proved less efficient because of its
inability to divert from the initial single-robot paths. This strategy may only
prove more efficient in extremely cluttered environments where alternative nodes
would be harder to find for PP. Most of the FPC failed cases were due to conflicts
between initial and final positions, as these may prevent other robots from reaching
their goals. Moreover, although distance remains minimal with FPC, travel
duration grows significantly more than with PP. Maximizing the free space
sampling is of interest to improve the success rate of both single- and multi-robot
algorithms.
Future work directions include the evaluation of priority shuffles for both
the PP and FPC algorithms and their extension to higher-dimensional workspaces,
which will make it possible to handle heterogeneous teams of robots. Additional comparison
criteria can also be considered, such as energy efficiency. Finally, deployment
on the embedded computers of real-world robots, interacting with actual
perception and control modules, would also make it possible to estimate the
computational load and performance in full autonomy loops in the field.
508 B. Bouvier and J. Marzat
Smarticle 2.0: Design of Scalable,
Entangled Smart Matter
1 Introduction
Smart robotic matter consists of large aggregates of programmable units capa-
ble of actuation, sensing, and intelligent response to their surroundings [1]. It
promises applications both as a physical test platform to gather further insights
on materials, colloids, and biological swarms [2,3], and as a taskable, modular
robot that can morph between shapes and change properties as needed [4–6].
Demonstrations currently span swarms of carefully linked, individually capable
modules [1,5] to loose aggregates of modules with little-to-no individual mobility,
sensing, or computation [2,7,8]. The latter are of
special interest in this article because system-wide complexity and functionality
can emerge from local interactions between large numbers of individually limited
components. However, it also involves significant challenges related to scalable
fabrication and operation, and methods for coordination and reliable autonomy.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 509–522, 2024.
https://doi.org/10.1007/978-3-031-51497-5_36
510 D. Ma et al.
Fig. 1. A) Smarticle 2.0 modules, capable of transitioning between “I” and “U” shapes.
B) Wireless charging circuit. C) Entangled staples hanging from a ruler. D) 20 entan-
gled modules.
This paper presents the newest iteration of a Smarticle (smart active matter)
robot platform [2], specifically designed to support affordable and large-scale
operation, 3D entanglement, a variety of sensing modalities, and a low barrier
to entry for novice users across scientific fields (Fig. 1). A Smarticle 2.0 module has
a long slender profile that can alternate between a staple-like shape ("U") and
an oblong shape ("I") using a single actuator, and has the ability to sense and
communicate acoustically, optically, and mechanically.
This article focuses on the details of the Smarticle 2.0 design (Sect. 3), fabrication
(Sect. 4), and mechanical and sensor characterization (Sect. 5). However,
we also show some exploratory multi-robot experiments with up to 10
robots (Sect. 6). To clarify the results presented, we recommend that the reader
watch the accompanying video (https://youtu.be/KrcyYz2ccdg). To the best
of our knowledge, this is the first demonstration of a smart matter platform
capable of large-scale, 3D entangled operation.
3 Electro-mechanical Design
Compared to other modular and swarming robots in the literature (Table 1), the
Smarticle 2.0 modules have a relatively small form factor (200 × 20 × 17 mm³), a very
light weight (20 g), a low cost (<71 USD/pc), and a short assembly time (∼15 min),
making it feasible to acquire and operate many of them in traditional research labs.
Equally important, the modules have an optimal aspect ratio, which is key to
3D entanglement (l/w = 0.4–0.75 [19]). Smarticle 2.0 can transition between "I"
and "U" shapes leveraging a non-backdrivable four-bar crank and slider mechanism.
Modules also have the ability to sense and communicate optically, acoustically,
and mechanically. Finally, modules are powered by a Venom Fly 180 mAh 3.7 V
Lithium Polymer battery, which can support reasonably long experiments (up to
200 appendage cycles).
The cost of a module breaks down as follows for a 15-piece order: the raw
PCB costs 22.10 USD, the electronic components 25.53 USD, the motor 14.99 USD,
and the battery 6.49 USD. (Note that the price of the 3D printed parts is negligible,
as they total 2.24 g of material, and that the module price will drop with bulk
ordering.)
The crank and slider mechanism dominates the module design, volume,
weight, and cost. It works by rotating a custom worm gear with two opposite
thread angles, whose rotation is translated into linear motion through two gear racks
that attach to raise or lower the appendages. To support this mechanism we chose a
commercially available, affordable, strong, small, and lightweight motor; specifically,
the 700:1 Sub-Micro Plastic Planetary Gearmotor (6 mm diameter, 21 mm
length, 1.3 g) from Pololu.
Beyond the off-the-shelf motor and the battery, modules consist almost
entirely of FR4 and polyimide material, commonly known as rigid and flexible
printed circuit boards (PCBs). We order these pre-soldered from Sandnwave
for ease; because these fabrication houses are located in China, we also circumvent
long lead times due to the ongoing shipping crisis. Beyond the PCBs, the
module consists of a 3D printed bearing and motor holder, which are snap-fit into
holes in the PCB. We use the Formlabs 2.0 and Carbon 3D printers, with
Tough1500 and UMA90 material respectively, to produce the bearing, motor
holder, and worm gear. While the resolution of the prints is important, the
material itself only affects how quickly the parts wear down.
Figure 2 shows the breakdown of the mechanical parts as well as the top and
side views of a robot. The top of the PCB mounts the worm gear and motor
pair, the bearing and motor holder, as well as the left and right sliders. The
bearing and motor holder not only secure the worm gear and motor during
operation, but also help to guide the sliding motion of the gear racks. These
parts are fitted on the center body, which is connected to the left and right appendages
by four flexible PCB link pieces. The battery is mounted on the back of the module.
4 Fabrication
Fabrication time and ease are both critical to support user adoption of swarm
modular robot platforms. All electronic components can be mounted on the rigid
PCB either in-house or out of house. The latter requires less in-house expertise and
labor, and is in fact economically advantageous due to the current shipping crisis.
5 Characterization
In this section we characterize essential properties of the Smarticle 2.0 mod-
ules, including their actuation profile (accuracy/longevity, lifting strength, and
holding strength) and sensor specifications (acoustic, orientation).
First, to test the repeatability and durability of the mechanical components,
a module was programmed to raise, pause, lower, and pause the appendages
open-loop for 2 s, 3 s, 2 s, and 3 s, respectively. We repeated this behavior
continuously for 67 min, amounting to more than 400 cycles. We found that even
without sensor feedback, the appendage motion is fairly symmetrical, accurate
within a standard deviation of less than 2°, and persists over many cycles. Specifically,
the left and right appendage minimum angles were found to be 30.2±1.0° and
35.8±1.2° respectively, and the left and right appendage maximum angles were
found to be 88.0±0.9° and 85.4±2.0° respectively.
Second, we analyzed the appendage lifting force, by combining the governing
equations for the crank and slider mechanism [22] and the worm gear force
translation [23]. These parameters are also shown in Fig. 4A. Geometrically, the
distance from the end of the connecting rod to the crank axle, x, and the angle
between the slider and the connecting rod, β, can be calculated as follows:
x = r\cos(\alpha) + \sqrt{l^2 - r^2\sin^2(\alpha)}    (1)

\beta = \cos^{-1}\left(\frac{x + r\cos(\pi - \alpha)}{l}\right)    (2)
where α is the angle of the crank, r is the rotating crank length, and l is the length
of the connecting rod. The torque from the crank and slider on the appendage,
τ , depends on F , the linear force pulling on the slider:
\tau = F\, r \sin(\alpha + \beta)    (3)
Conversely, the motor stall torque, τ_stall, relates to the maximum linear force
the crank and slider mechanism is capable of delivering, estimated as:

F = \frac{2\tau_{stall}}{d_{mean}} \left( \frac{\pi d_{mean} - f p \sec(\zeta)}{p + \pi f d_{mean} \sec(\zeta)} \right)    (4)
Fig. 4. A) Left: Kinematic forces acting on the worm gear; Right: Single-appendage
view of the crank and slider mechanism, with design parameters. B) Lifting force of
a single appendage. The blue line is the theoretical prediction, red data points show
experimentally obtained values. C) Alternative design with a stronger motor. (Color
figure online)
where d_mean and p are the mean thread diameter and pitch of the worm gear, and
ζ and f are the angle and amplitude of the normal force between the teeth and
the grooves in the (FR4-based) rack, respectively. We now combine Eqs. 4 and
3 and iterate over all appendage angles to estimate the lifting force (Fig. 4B).
Note that the model is inaccurate, largely due to the loss of efficiency in the worm-
rack transmission and to friction between the sliders and center body PCBs. We
partially make up for this inaccuracy by tuning the value of f to two sets of real
data points at low and high appendage angles gathered from the real modules.
We see that the mechanism is strongest (can lift the highest load) when the
appendage is near vertical, and vice versa. Practically, this means that when the
appendage is at a 90° angle, it can lift 3.42 modules placed 40 mm
out from the center body PCB; at 45° it can lift 1.7 modules; and at 10° it can
lift 0.33 modules.
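The chain of Eqs. (1)-(4) can be sketched numerically. The snippet below is a minimal Python rendition; all parameter values used in the comments and examples (crank length, rod length, worm dimensions, stall torque) are illustrative placeholders, not the module's actual dimensions.

```python
import math

def slider_geometry(alpha, r, l):
    """Eqs. (1)-(2): slider position x and connecting-rod angle beta for
    crank angle alpha (rad), crank length r and connecting-rod length l."""
    x = r * math.cos(alpha) + math.sqrt(l**2 - r**2 * math.sin(alpha)**2)
    cos_beta = (x + r * math.cos(math.pi - alpha)) / l
    beta = math.acos(max(-1.0, min(1.0, cos_beta)))  # clamp against round-off
    return x, beta

def worm_linear_force(tau_stall, d_mean, p, f, zeta):
    """Eq. (4): maximum linear force the worm-rack pair can deliver
    from the motor stall torque."""
    sec_zeta = 1.0 / math.cos(zeta)
    return (2.0 * tau_stall / d_mean) * (
        (math.pi * d_mean - f * p * sec_zeta)
        / (p + math.pi * f * d_mean * sec_zeta))

def appendage_torque(alpha, r, l, F):
    """Eq. (3): torque exerted on the appendage by the crank and slider."""
    _, beta = slider_geometry(alpha, r, l)
    return F * r * math.sin(alpha + beta)
```

Sweeping alpha from 0 to 90° qualitatively reproduces the trend of Fig. 4B: the available torque, and hence the liftable load, grows as the appendage approaches vertical.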
The real strength of the chosen mechanism is that it is non-backdrivable.
This feature allows robots to maintain a joint angle without energy consumption,
independent of load size (provided it does not exceed the breaking point). As
a rule of thumb, this property holds for worm gears with a lead angle smaller than
arctan(μ), where μ is the coefficient of friction. Estimating that μ is comparable
to that of two glass plates sliding against each other (0.4), we get that the lead angle λ
must be smaller than 21°, well above the 5.6° of our design. Practically, this
means that the module's holding force is only limited by the bonding strength of
the solder joints that attach the appendages to the main PCB. In other words,
once in position, the module can hold a significant amount of weight without
requiring continuous power. In the spirit of keeping our robots intact, we
have yet to perform destructive testing, but in preliminary tests we have loaded
appendages with over 15 modules without mechanical failure.
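The rule of thumb above can be checked with the standard lead-angle formula λ = arctan(lead / (π d_mean)). A small sketch; the lead and mean-diameter values in the test are assumed placeholders (only λ = 5.6° is given in the text):

```python
import math

def is_self_locking(lead, d_mean, mu=0.4):
    """Rule-of-thumb non-backdrivability check: a worm gear is
    self-locking when its lead angle atan(lead / (pi * d_mean))
    stays below the friction angle atan(mu)."""
    lead_angle = math.atan(lead / (math.pi * d_mean))
    return lead_angle < math.atan(mu)
```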
The Smarticle 2.0 design is also compatible with other off-the-shelf motors,
such as the Micro Metal Gearmotors (cross-section 10 × 12 mm) from Pololu.
Taking a Pololu micro motor with gear ratio 298:1 (stall torque 20 oz·in) as an
example, it uses the same PCB design and only requires some adjustments to the
3D printed parts to fit the motor, as shown in Fig. 4C. Assembling a Smarticle
2.0 with this motor follows the same steps as the Plastic Planetary Gearmotor
design and takes a similar assembly time. The overall weight of such a Smarticle 2.0 is
25 g and its total cost is slightly higher at 72.47 USD. With this motor, when the
appendage is at a 90° angle, it can lift 41.6 modules placed 40 mm
out from the center body PCB; at a 10° angle, it can lift 2.04 modules. We
will explore the functionalities of such heterogeneous collectives in the future.
Finally, Fig. 5A-B shows a characterization of the onboard accelerometer and
buzzer-microphone pair. We see that rotation around the accelerometer x-axis
(roll) and y-axis (pitch) is accurately detected. We further see that at 50%
power, using the onboard microphone and buzzer pair, two modules can signal
over a 300 mm range, equivalent to 1.5 times the module body length. This value can
be tuned by adjusting the average buzzer power (the pulse width) of the carrier
signal.
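Roll and pitch detection from a quasi-static accelerometer typically uses the standard gravity-vector formulas. The sketch below illustrates that approach; it is not the module's actual firmware, and the function name is ours:

```python
import math

def roll_pitch_from_accel(ax, ay, az):
    """Standard gravity-vector tilt estimate, valid when the module is
    quasi-static: roll about the x-axis, pitch about the y-axis, degrees."""
    roll = math.degrees(math.atan2(ay, az))
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    return roll, pitch
```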
6 Multi-robot Demonstrations
While this paper is focused mainly on describing the design of and characterizing
individual modules, there are many avenues for extension of this work. In this
Fig. 7. A-B) Smarticles sequentially activate each other through tactile cues.
period, thereby avoiding self-activation and only flapping the appendages once
upon external stimulus.
7 Summary
We presented and characterized a design, optimized for scalable fabrication and
operation by a range of scientific users, of robotic modules capable of 3D entanglement.
This relatively new subfield of 3D entangled robots presents unique opportunities
related to active matter and taskable modular robots, as evidenced by
entangled social organisms in nature, but also unique research challenges related
to communication and consensus algorithms, reconfiguration, and mobility. In
future work, we aim to scale up the size of our collective and demonstrate more
complex multi-agent behaviors in both hardware and simulation. The design files
for these robots are available upon request from the authors.
The authors would like to acknowledge many fruitful discussions with Prof. Goldman at the Georgia Institute of Technology.
References
1. Goldstein, S.C., Campbell, J.D., Mowry, T.C.: Programmable matter. Computer
38(6), 99–101 (2005)
2. Savoie, W., Pazouki, A., Negrut, D., Goldman, D.: Smarticles: design and construc-
tion of smart particles to aid discovery of principles of smart, active granular mat-
ter. In: The First International Symposium on Swarm Behavior and Bio-Inspired
Robotics (2015)
3. Chvykov, P., et al.: Low rattling: a predictive principle for self-organization in
active collectives. Science 371(6524), 90–95 (2021)
4. Gardi, G., Ceron, S., Wang, W., Petersen, K., Sitti, M.: Microrobot collectives
with reconfigurable morphologies, behaviors, and functions. Nat. Commun. 13(1),
1–14 (2022)
5. Daudelin, J., Jing, G., Tosun, T., Yim, M., Kress-Gazit, H., Campbell, M.: An inte-
grated system for perception-driven autonomy with modular robots. Sci. Robot.
3(23), eaat4983 (2018)
6. Gilpin, K., Rus, D.: Modular robot systems. IEEE Robot. Autom. Mag. 17(3),
38–55 (2010)
7. Haghighat, B., Droz, E., Martinoli, A.: Lily: a miniature floating robotic platform
for programmable stochastic self-assembly. In: 2015 IEEE International Conference
on Robotics and Automation (ICRA), pp. 1941–1948. IEEE (2015)
8. Ceron, S., Kimmel, M.A., Nilles, A., Petersen, K.: Soft robotic oscillators with
strain-based coordination. IEEE Robot. Autom. Lett. 6(4), 7557–7563 (2021)
9. Wilson, N.J., Ceron, S., Horowitz, L., Petersen, K.: Scalable and robust fabrication,
operation, and control of compliant modular robots. Front. Robot. AI 7, 44 (2020)
10. Li, S., et al.: Particle robotics based on statistical mechanics of loosely coupled
components. Nature 567(7748), 361–365 (2019)
11. Warkentin, R., Savoie, W., Goldman, D.I.: Locomoting robots composed of immo-
bile robots. In: 2018 Second IEEE International Conference on Robotic Computing
(IRC), pp. 224–227. IEEE (2018)
12. Savoie, W., et al.: A robot made of robots: emergent transport and control of a
smarticle ensemble. Sci. Robot. 4(34), eaax4316 (2019)
13. Nilles, A., Ceron, S., Napp, N., Petersen, K.: Strain-based consensus in soft,
inflatable robots. In: 2022 IEEE 5th International Conference on Soft Robotics
(RoboSoft), pp. 789–794. IEEE (2022)
14. Gilpin, K., Knaian, A., Rus, D.: Robot pebbles: one centimeter modules for pro-
grammable matter through self-disassembly. In: 2010 IEEE International Confer-
ence on Robotics and Automation, pp. 2485–2492. IEEE (2010)
15. Swissler, P., Rubenstein, M.: Fireant3D: a 3D self-climbing robot towards non-
latticed robotic self-assembly. In: 2020 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), pp. 3340–3347. IEEE (2020)
16. Savoie, W., et al.: Phototactic supersmarticles. Artif. Life Robot. 23(4), 459–468
(2018)
17. Zhou, W., Gravish, N.: Density dependent synchronization in contact-coupled oscil-
lators. arXiv preprint arXiv:2012.07124 (2020)
18. Ozkan-Aydin, Y., Goldman, D.I., Bhamla, M.S.: Collective dynamics in entangled
worm and robot blobs. Proc. Natl. Acad. Sci. 118(6), e2010542118 (2021)
19. Gravish, N., Franklin, S.V., Hu, D.L., Goldman, D.I.: Entangled granular media.
Phys. Rev. Lett. 108(20), 208001 (2012)
20. Woern, H., Szymanski, M., Seyfried, J.: The i-swarm project. In: ROMAN 2006-
The 15th IEEE International Symposium on Robot and Human Interactive Com-
munication, pp. 492–496. IEEE (2006)
21. Rubenstein, M., Ahler, C., Nagpal, R.: Kilobot: a low cost scalable robot system
for collective behaviors. In: 2012 IEEE International Conference on Robotics and
Automation, pp. 3293–3298. IEEE (2012)
22. Hartenberg, R., Denavit, J.: Kinematic Synthesis of Linkages. McGraw-Hill, New
York (1964)
23. Budynas, R., Nisbett, K.: Shigley's Mechanical Engineering Design, 11th edn. in
SI Units. McGraw-Hill Education (Asia) (2020)
24. Hu, D., Phonekeo, S., Altshuler, E., Brochard-Wyart, F.: Entangled active matter:
from cells to ants. Eur. Phys. J. Spec. Top. 225(4), 629–649 (2016)
25. Acebrón, J.A., Bonilla, L.L., Vicente, C.J.P., Ritort, F., Spigler, R.: The Kuramoto
model: a simple paradigm for synchronization phenomena. Rev. Mod. Phys. 77(1),
137 (2005)
Hybrid Flock - Formation Control
Algorithms
1 Introduction
In recent years, Multi-Robot Systems (MRS) have gained importance, as coor-
dinated teams are able to perform tasks which would be unfeasible for single
robots. Additionally, their inherent redundancy introduces robustness to the
system, allowing it to successfully complete the overall mission objectives even
when some individual agents fail. Spatial coordination of MRS is crucial in a
wide range of applications, ranging from search and rescue missions over surveil-
lance or escorting missions to spacecraft formations. Existing algorithms that
tackle this challenge can be split into two main categories: those that manage
tight spatial topologies, most commonly referred to as formation control, and
those that manage loose ones, usually called flocking.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 523–536, 2024.
https://doi.org/10.1007/978-3-031-51497-5_37
524 C. Baumann et al.
Flocking has its origin in nature, where such behavior achieves multiple
functionalities (e.g., energy benefits, navigation accuracy, protection from predators)
and is present in all media (e.g., schools of fish and flocks of birds). This
behavior was first engineered in [1] for the purpose of modeling the behavior
of birds, fish, etc. in computer animation. It has since been improved upon, but
the main principles have remained the same. The behavior of each individual
agent can be expressed as the vector sum of three complementary components:
separation, cohesion, and velocity matching. The result is a swarm-like behavior
with (in steady state) almost equal distances among the flock members, but
without any precise spatial formation. Additionally, formal methods for designing
flocking behavior have also been proposed, for example in [2], where three
different kinds of lattices are leveraged to form triangle-like flocks, allowing for
splitting and rejoining, as well as squeezing maneuvers to avoid obstacles.
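The three components of the classic flocking rule can be sketched as below; the gains, perception radius, and function name are illustrative assumptions, not values from [1]:

```python
import numpy as np

def reynolds_step(pos, vel, r_percept=2.0, k_sep=1.5, k_coh=1.0, k_align=0.5):
    """One flocking update: for each agent, sum the separation, cohesion,
    and velocity-matching vectors computed over perceived neighbors."""
    n = len(pos)
    acc = np.zeros_like(pos)
    for i in range(n):
        d = np.linalg.norm(pos - pos[i], axis=1)
        nb = (d > 0) & (d < r_percept)  # neighbors within perception range
        if not nb.any():
            continue
        sep = np.sum((pos[i] - pos[nb]) / d[nb, None]**2, axis=0)  # repel close neighbors
        coh = pos[nb].mean(axis=0) - pos[i]                        # steer toward local center
        ali = vel[nb].mean(axis=0) - vel[i]                        # match neighbor velocity
        acc[i] = k_sep * sep + k_coh * coh + k_align * ali
    return acc
```

With two close, stationary agents, the separation term dominates and pushes them apart symmetrically, which matches the near-equal steady-state spacing described above.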
Formation control is an actively studied topic with a large amount of prior
work, including recent surveys such as [3]. There exist three main approaches
for addressing formation control: Leader-Follower (LF), Virtual Structure (VS), and
Behavior-Based (BB) methods. In the LF approach, one agent is designated as
leader. All other agents, called followers, are then tasked with maintaining a
predefined relative position to the leader, resulting in formation shape retention
[4]. The VS approach [5] consists of three steps. First, the virtual structure (corresponding
to the predefined formation) is aligned to the current positions of the
agents in order to minimize the distance from every agent to its corresponding
spot within the formation. Then the virtual structure is moved in the desired
direction of movement. In the third step, the agents are controlled towards their
corresponding positions in the shifted virtual structure. The BB approach is
based on defining several desired behaviors for every agent. The control is then
either a weighted average of the resulting motion vectors created by the individual
behaviors [6], a Finite State Machine (FSM) switching between them [7],
or a more elaborate combination of behaviors [8]. A BB approach results in a
collective response that is fairly adaptive to different environmental situations. While
this represents the main advantage of the approach, one major drawback is that
the resulting spatial topology emerges from local interactions and is therefore
difficult to control precisely.
In multiple related works, formation control has subsequently been loosened
up, usually to account for obstacle avoidance. In [9] the formation is capable
of splitting and rejoining. In [10] and [11] dynamic formation changes are
leveraged. In [12] a potential field augmented with formation transformation is
used. Deformable formations are also leveraged in [13].
On the flocking side, it is worth noting that, in comparison to flocking
based on Reynolds' rules, flocking based on lattices (as in [2]) can be considered
significantly more rigid and thus closer to formation control.
However, the above approaches all remain faithful to their respective cate-
gories of algorithms. More specifically, to the best of our knowledge, no work
exists which is capable of shifting gradually from flocking to formation con-
trol or vice-versa. Yet, this gradual shift would enable a smooth transition
between the characteristics of both types of algorithm, with an infinite number of
2 Methodology
We consider a homogeneous group of N differential wheeled robots, tasked with
following a predefined trajectory while forming and maintaining spatial coordination
between them. This spatial coordination varies from flocking to formation
control depending on the scalar W.
The robot control is designed following the framework introduced and
described in detail in [16]. The control algorithm consists of the following seven
functional elements:
1. Pose estimation: each robot is equipped with both a local range-and-bearing
system and a receiver exploiting data from an emulated Global Navigation
Satellite System (GNSS). Global communication is then used to transmit
(and receive) GNSS-based position data. This effectively results in a global,
occlusion-less sensing of neighboring robots. All information is fused using an
Extended Kalman Filter.
2. Consensus on orientation: the formation orientation is not established a priori
but obtained online using Laplacian consensus.
3. Position allocation: a Hungarian algorithm [17] is used to assign each robot
the optimal position within the formation (minimizing the overall distances
between the robots and their respective target positions).
4. Coordination algorithm: the estimated relative positions of neighboring
robots (Xj with j ∈ [1, N ]) are used by both coordination algorithms. They
are specified in Sects. 2.1 and 2.2, respectively.
5. Group movement: the common goal is given as a temporal trajectory in a
global reference frame.
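The position allocation of element 3 can be illustrated with a brute-force optimal assignment; this is a stand-in for the Hungarian algorithm [17] used in the paper (which scales polynomially rather than factorially), and the function name is ours:

```python
from itertools import permutations
import math

def allocate_positions(robot_xy, target_xy):
    """Assign each robot a formation slot so that the summed
    robot-to-slot distances are minimal (exhaustive search; fine
    for small N, the Hungarian algorithm handles larger teams)."""
    n = len(robot_xy)

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(dist(robot_xy[i], target_xy[perm[i]]) for i in range(n))
        if c < best_cost:
            best, best_cost = perm, c
    return list(best), best_cost
```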
It is further worth noting that the motion vectors of elements 4, 5 and 6 (Xcoord ,
Xtraj and Xcol respectively) are summed to obtain the final motion vector.
Both coordination algorithms, Pot2Lapl and Flock2Lapl, use a graph-based
Laplacian control algorithm as the formation control part of their respective
schemes. Using the standard feedback method for a single integrator kinematic
model [18], we obtain for robot i:
\dot{z}_i = -\frac{1}{N}\sum_{j=1}^{N}\left(L_{i,j}\, z_{i,j} + b_{i,j}\right)    (1)
with L the Laplacian matrix based on the underlying connectivity graph (which
is fully connected in this work), zi,j the position of robot j and bi,j the desired
position of robot j, both defined in the local coordinates of robot i. bi,j can
be calculated from the desired positions in global coordinates BN for a given
number of robots N . Following the reasoning in [16], we can express the motion
vector of the Laplacian control algorithm as:
with BN , dt numerical parameters. The numerical values used in this work are
summarized in Table 1. The resulting V-shape formation for N = 5 is depicted
in Fig. 3b.
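As a sketch of the Laplacian feedback, the following uses the common global-frame variant ż = -L(z - b) on a fully connected graph, rather than the paper's local-frame formulation of Eq. (1); dt and the coordinates in the example are illustrative:

```python
import numpy as np

def laplacian_formation_step(z, b, dt=0.05):
    """One Euler step of z_dot = -L (z - b) for N robots on a fully
    connected graph: positions z (N x 2) converge to the bias pattern b
    up to a common translation, since the complete-graph Laplacian L has
    a zero eigenvalue along the all-ones vector."""
    n = len(z)
    L = n * np.eye(n) - np.ones((n, n))  # complete-graph Laplacian
    return z + dt * (-L @ (z - b))
```

Iterating this step drives the inter-robot offsets z_j - z_i toward the desired offsets b_j - b_i, which is the behavior the local-frame form achieves without a shared global origin.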
2.1 Pot2Lapl
With ψ the potential function as defined below, dref the desired inter-robot
distance and Xivirt the position of the virtual agent associated with robot i
following a Laplacian-based control. That is, Xivirt = Xilapl .
To define ψ, we first define a base potential function ψ_base(z) similar to [2]:

\psi_{base}(z) = \frac{a+b}{2}\left(\sqrt{1+(z+c)^2} - \sqrt{1+c^2}\right) + \frac{a-b}{2}\, z    (4)

using c = \frac{|a-b|}{2\sqrt{ab}} and a, b numerical parameters.
Similarly to [2], we further define a cut-off function ρ(z), in order to limit
the influence of distant neighbors:

\rho(z) = \begin{cases} 0, & \text{if } z < \frac{d_{ref}}{r} + \delta \\ 1 - \frac{1}{2}\left(1 + \cos\left(\pi \, \frac{z - \frac{d_{ref}}{r} - \delta}{1 - \frac{d_{ref}}{r} - \delta}\right)\right), & \text{if } \frac{d_{ref}}{r} + \delta \le z \le 1 \\ 1, & \text{otherwise} \end{cases}    (5)

B(z) = \begin{cases} K_{rep}, & \text{if } z < 0 \\ K_{att}, & \text{otherwise} \end{cases}    (6)
with d_ref, δ, K_rep, K_att and r_cut numerical parameters. Figure 1 shows the resulting
ψ(z − d_ref) and the corresponding gradient dψ/dz (z − d_ref) with the numerical
parameters used in this work.
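Eqs. (4)-(5) translate directly into code. A minimal sketch, with default parameter values taken from Table 1 (a = 1, b = 7, d_ref = 0.4, r_cut = 2, δ = 0); the minimum of ψ_base sits at z = 0, i.e. at the reference distance when evaluating ψ(z − d_ref):

```python
import math

def psi_base(z, a=1.0, b=7.0):
    """Eq. (4): uneven sigmoidal base potential,
    with c = |a - b| / (2 * sqrt(a * b)); its minimum is at z = 0."""
    c = abs(a - b) / (2.0 * math.sqrt(a * b))
    return ((a + b) / 2.0) * (math.sqrt(1.0 + (z + c)**2)
                              - math.sqrt(1.0 + c**2)) \
        + ((a - b) / 2.0) * z

def rho(z, d_ref=0.4, r=2.0, delta=0.0):
    """Eq. (5): smooth cut-off limiting the influence of distant
    neighbors, rising from 0 to 1 between d_ref/r + delta and 1."""
    h = d_ref / r + delta
    if z < h:
        return 0.0
    if z <= 1.0:
        return 1.0 - 0.5 * (1.0 + math.cos(math.pi * (z - h) / (1.0 - h)))
    return 1.0
```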
2.2 Flock2Lapl
Flock2Lapl is based on the original Reynolds rules [1]. Intuitively, it uses non-
linear cohesion and separation vectors, extended with an attraction to a virtual
agent following a Laplacian formation control. Formally, the cohesion component
of the target motion vector X_i^{coord} of robot i can be written as

X_i^{coh} = \frac{1}{N}\sum_{j=1}^{N} z_{i,j}    (9)
Table 1. Numerical parameter values used in this work.

B5 = {(0, 0), (0, 1.2), (0.4, 0.3), (0.4, 0.9), (0.8, 0.6)} [m]
B10 = {(0.8, 0), (0.8, 0.4), (0.8, 0.8), (0.8, 1.2), (1.2, 0.2), (1.2, 0.6), (1.2, 1), (1.6, 0.4), (1.6, 0.8), (2, 0.6)} [m]
dt = 50 [ms]; a = 1 [-]; b = 7 [-]; d_ref = 0.4 [-]; δ = 0 [m]; r_cut = 2 [m]
K_rep = 10 [-]; K_att = 0.9 [-]; K_coh = 1 [-]; K_sep = 0.15 [-]
s = 0.45 [m]; r_s = 0.3 [m]; W0 = 4 [-]
3 Experimental Evaluation
This section introduces the robotic platform used to demonstrate the algorithms,
the experimental setup for both simulated and real robot experiments, as well
as the evaluation metrics used.
and a maximum speed of ∼ 81 cm/s. They are further equipped with eight infrared
(IR) proximity sensors with a range of 20 cm that are used for obstacle avoidance.
As illustrated in Fig. 2, the robots are enhanced with a custom range-and-bearing
module [20] for relative localization and an active marker module featuring two
LEDs that allow accurate tracking through the SwisTrack software [21]. Communication
between the robots leverages UDP/IP and is assumed to be error-free,
with a range larger than the experimental arena.
The performance of the coordination algorithms is evaluated using four metrics,
computed at every time instant t from the robots' positions Z_i, i ∈ [1, N], in a
global reference frame.
– Φdensity evaluates the density of the group with respect to an ideal, lattice-
like flock with dref between all robots. Based on circle packing, it compares
Ar , the enclosing circle of all robots, with Ao , the optimal enclosing circle for
the given number of robots.
\Phi_{density} = \frac{1}{1 + \frac{1}{N}\,\lVert A_o - A_r \rVert} \qquad (11)
– Φform captures the error between the positions of the robots and their respective attributed positions (ZA) within the ideal formation (of shape as defined by BN), centered on ZF and using optimal position assignments.
530 C. Baumann et al.
\Phi_{form} = \operatorname*{argmin}_{Z_F}\; \frac{1}{1 + \frac{1}{N}\sum_{i=1}^{N} \lVert Z_i - (Z_A \,|\, Z_F)\rVert} \qquad (12)
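The inner term of Eq. (12) requires an optimal robot-to-slot assignment. The sketch below assumes a fixed formation center Z_F and brute-forces the assignment over permutations, which is tractable for the N = 5 teams used here; the Hungarian method [17] would scale better. Function and argument names are our own:

```python
from itertools import permutations
import math

def phi_form(robot_pos, formation_pos):
    """Formation error of Eq. (12) for a fixed formation center:
    1 / (1 + mean distance under the best robot-to-slot assignment).
    Brute-force enumeration; fine for N = 5, too slow for large teams."""
    n = len(robot_pos)
    best = math.inf
    for perm in permutations(range(n)):
        cost = sum(math.dist(robot_pos[i], formation_pos[perm[i]])
                   for i in range(n))
        best = min(best, cost)
    return 1.0 / (1.0 + best / n)
```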
– Φorder (adopted from [15]) evaluates the heading alignment of the robots:
\Phi_{order} = \frac{1}{N(N-1)} \sum_{i \ne j} \frac{\dot{Z}_i \cdot \dot{Z}_j}{\lVert\dot{Z}_i\rVert\,\lVert\dot{Z}_j\rVert} \qquad (13)
– Φsafe (adopted from [15]) measures the risk of collision among robots:
\Phi_{safe} = 1 - \frac{n_s}{N(N-1)} \qquad (14)
with ns the number of occurrences where two robots are "too close", that is, closer than rs, a numerical parameter: ns = |{(i, j) s.t. j ≠ i and ||Zi − Zj|| < rs}|
We further denote Φ the respective average performances over the entire duration
of the experiment.
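Eqs. (13) and (14) can be sketched directly in plain Python for the 2-D case; function names are our own:

```python
import math

def phi_order(velocities):
    """Heading alignment of Eq. (13): mean cosine similarity of the
    velocity directions over all ordered pairs i != j."""
    n = len(velocities)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            vi, vj = velocities[i], velocities[j]
            dot = vi[0] * vj[0] + vi[1] * vj[1]
            total += dot / (math.hypot(*vi) * math.hypot(*vj))
    return total / (n * (n - 1))

def phi_safe(positions, r_s=0.3):
    """Collision-risk metric of Eq. (14): 1 minus the fraction of
    ordered robot pairs closer than the safety radius r_s."""
    n = len(positions)
    n_close = sum(1 for i in range(n) for j in range(n)
                  if i != j and math.dist(positions[i], positions[j]) < r_s)
    return 1.0 - n_close / (n * (n - 1))
```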
An 8-shaped trajectory has been chosen as group movement for all experi-
ments. For the experiments with obstacles, six square obstacles of 20 × 20 cm
are placed arbitrarily in the arena. We note that the positions of the obstacles
are kept constant over all experiments to increase comparability among them.
Figure 3a shows the corresponding obstacle positions used for both real and sim-
ulated experiments.
Fig. 3. Experimental arenas used in this work. The required V-shaped formation is
identical for both sets of experiments.
Environment W N
Webots without obstacles [0, 1, 2, 5] [5, 10]
Webots with obstacles [0, 1, 2, 5] 5
Real robots [0, 1, 2, 5] 5
Fig. 4. Experimental results for the Flock2Lapl algorithm, using both 5 and 10 robots
in simulation without obstacles (WB), using 5 robots in simulation with obstacles (WB)
and using 5 robots in reality without obstacles (RR). Lines show mean values with the
shaded areas representing standard deviation.
Fig. 5. Experimental results for the Pot2Lapl algorithm, using both 5 and 10 robots in
simulation without obstacles (WB), using 5 robots in simulation with obstacles (WB)
and using 5 robots in reality without obstacles (RR). Lines show mean values with the
shaded areas representing standard deviation.
Fig. 6. Experimental results for the narrow passage experiment using five robots in
simulation (WB) and reality (RR).
Figure 6 shows the results of the narrow passage experiments. For all metrics,
a one-way ANOVA revealed that there was a statistically significant difference
in the respective metric between at least two groups. However, Tukey’s HSD
test for multiple comparisons found that the mean value of the metrics differs
only between the simulated and real experiments, with the following exceptions:
for Flock2Lapl, statistical differences (p < 0.05) can be observed for φform for
real and φorder for simulated experiments. For Pot2Lapl, φdensity is statistically
different for real experiments (p < 0.01). For both algorithms, the traversing
time is statistically different for both simulated and real experiments (p < 0.05).
This indicates that our proposed algorithms successfully decreased traversing
time, demonstrating one possible use case. We note that the impact on φform is limited by the averaging over the entire experiment. However, given a more sensitive metric, we would expect this to be a trade-off between traversing time and φform.
For all metrics in Fig. 6, relatively high standard deviations can be observed
for both simulated and real experiments. This is due to the high amount of
stochasticity present, mostly introduced by the random initialization of the
robots within the starting area, but also by sensor noise.
We also acknowledge that since this work focuses on the coordination algo-
rithms, the sensing capabilities of the robots are maximized. Naturally, further
work will be required to analyze the performances of the algorithms under more
realistic situations considering exclusively local sensing affected by occlusions.
5 Conclusion
In this work we introduced two hybrid flock - formation control algorithms. By
varying a single parameter W, both algorithms are able to change gradually from a pure flocking behavior to a formation behavior and vice versa. We then
characterized the behavior of the two algorithms using four dedicated metrics for
different numerical values of W in both simulated and real robot experiments.
We further introduced a simple method to compute W on-board, based on the
number of infrared sensors that detect an obstacle. By comparing this adaptive
algorithm to a non-adaptive formation control algorithm for a narrow passage
scenario, we find that the adaptive algorithm achieves a significant traversing
time reduction.
Appendix
A video of the experiments performed for this contribution can be found on our
research web page: www.epfl.ch/labs/disal/research/mrsmodelingcontrol.
Hybrid Flock - Formation Control Algorithms 535
References
1. Reynolds, C.W.: Flocks, herds and schools: a distributed behavioral model. In:
ACM Annual Conference on Computer Graphics and Interactive Techniques,
pp. 25–34 (1987)
2. Olfati-Saber, R.: Flocking for multi-agent dynamic systems: algorithms and theory.
IEEE Trans. Autom. Control 51, 401–420 (2006)
3. Liu, Y., Bucknall, R.: A survey of formation control and motion planning of mul-
tiple unmanned vehicles. Robotica 36(7), 1019–1047 (2018)
4. Wang, P.K.C.: Navigation strategies for multiple autonomous mobile robots mov-
ing in formation. J. Robot. Syst. 8(2), 177–195 (1991)
5. Tan, K.-H., Lewis, M.A.: Virtual structures for high-precision cooperative mobile
robotic control. In: Proceedings of IEEE/RSJ International Conference on Intelli-
gent Robots and Systems, vol. 1, pp. 132–139 (1996)
6. Balch, T., Arkin, R.: Behavior-based formation control for multirobot teams. IEEE
Trans. Robot. Autom. 14(6), 926–939 (1998)
7. Wang, H., Rubenstein, M.: Shape formation in homogeneous swarms using local
task swapping. IEEE Trans. Rob. 36(3), 597–612 (2020)
8. Antonelli, G., Arrichiello, F., Chiaverini, S.: The NSB control: a behavior-based
approach for multi-robot systems. Paladyn, J. Behav. Robot. 1, 48–56 (2010)
9. Monteiro, S., Bicho, E.: Attractor dynamics approach to formation control: theory
and application. Auton. Robots 29, 331–355 (2010)
10. Vilca, J., Adouane, L., Mezouar, Y.: Adaptive leader-follower formation in clut-
tered environment using dynamic target reconfiguration. In: Chong, N.-Y., Cho,
Y.-J. (eds.) Distributed Autonomous Robotic Systems. STAR, vol. 112, pp. 237–
254. Springer, Tokyo (2016). https://doi.org/10.1007/978-4-431-55879-8_17
11. Wasik, A., Pereira, J.N., Ventura, R., Lima, P.U., Martinoli, A.: Graph-based dis-
tributed control for adaptive multi-robot patrolling through local formation trans-
formation. In: IEEE/RSJ International Conference on Intelligent Robots and Sys-
tems, pp. 1721–1728 (2016)
12. Seng, W.L., Barca, J.C., Şekercioğlu, Y.A.: Distributed formation control of net-
worked mobile robots in environments with obstacles. Robotica 34(6), 1403–1415
(2016)
13. Soares, J.M., Aguiar, A.P., Pascoal, A.M., Martinoli, A.: A distributed formation-
based odor source localization algorithm - design, implementation, and wind tun-
nel evaluation. In: IEEE International Conference on Robotics and Automation,
pp. 1830–1836 (2015)
14. Falconi, R., Gowal, S., Martinoli, A.: Graph based distributed control of non-
holonomic vehicles endowed with local positioning information engaged in escort-
ing missions. In: IEEE International Conference on Robotics and Automation,
pp. 3207–3214 (2010)
15. Soria, E., Schiano, F., Floreano, D.: The influence of limited visual sensing on the
reynolds flocking algorithm. In: Third IEEE International Conference on Robotic
Computing, pp. 138–145 (2019)
16. Baumann, C., Martinoli, A.: A modular functional framework for the design and
evaluation of multi-robot navigation. Robot. Auton. Syst. 144, 103849 (2021)
17. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res.
Logist. Q. 2(1–2), 83–97 (1955)
18. Mesbahi, M., Egerstedt, M.: Graph Theoretic Methods in Multiagent Networks.
Princeton Series in Applied Mathematics (2010)
19. Soares, J.M., Navarro, I., Martinoli, A.: The Khepera IV mobile robot: performance
evaluation, sensory data and software toolbox. In: Robot 2015: Second Iberian
Robotics Conference. AISC, vol. 417, pp. 767–781. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-27146-0_59
20. Pugh, J., Raemy, X., Favre, C., Falconi, R., Martinoli, A.: A fast onboard relative
positioning module for multirobot systems. IEEE/ASME Trans. Mechatron. 14(2),
151–162 (2009)
21. Lochmatter, T., Roduit, P., Cianci, C., Correll, N., Jacot, J., Martinoli, A.: Swis-
Track - a flexible open source tracking software for multi-agent systems. In:
IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4004–
4010 (2008)
22. Michel, O.: Webots™: professional mobile robot simulation. Int. J. Adv. Rob.
Syst. 1, 39–42 (2004)
23. Tsai, R.: A versatile camera calibration technique for high-accuracy 3D machine
vision metrology using off-the-shelf TV cameras and lenses. IEEE J. Robot. Autom.
3(4), 323–344 (1987)
Multi-Robot Systems Research: A
Data-Driven Trend Analysis
1 Introduction
In the late 1940s, Walter [1] built two autonomous robot tortoises that displayed
not only impressive individual capabilities, including moving towards a light
source or into a station for self-charging, but also collective capabilities, described
by the author as “mutual recognition” or forming a “sort of community”. These
robots are one of the first physical realisations of multi-robot systems (MRSs).
Despite significant developments, especially over the past 35 years, MRSs
are only beginning to become commonplace in real-world applications. Numer-
ous surveys have been published on the topic of MRSs [2–5], and the reader is
referred to these for a comprehensive overview of the field. This paper seeks to
complement previous studies by providing the first data-driven analysis of the
trends in the field. It reports:
1. the quantitative findings of an exhaustive search of the MRS studies published in leading robotics journals (between 2010 and 2020); and
2. an analysis of the views of leading MRS experts.
The findings may provide insights to guide future research, for example, by
determining areas of development that require significant effort and the short-
term to long-term prospects for specific applications.
2 Exhaustive Search
This section presents the methods and findings from the exhaustive search.
All authors contributed equally to this work.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 537–549, 2024.
https://doi.org/10.1007/978-3-031-51497-5_38
538 J. V. A. Marques et al.
2.1 Methodology
We use robot to refer to a machine that perceives, and acts in, an environment
in physically plausible ways. The environment can be understood as the set of
characteristics that define the space in which the robot resides. Studies employing
abstract environment/robot models are considered within scope where clearly of
relevance to real-world MRSs. For the purpose of the exhaustive search, we
restrict the scope to systems composed of mobile robots, that is, robots that are
able to change their position and orientation in their environment.
A multi-robot system can be defined as a group of robots that operate in the
same environment while exhibiting some form of interaction. When part of an
MRS, robots usually work collaboratively towards a common goal. This allows
them to address a more complex task, or perform a task more efficiently, than
when on their own. MRSs may have different levels of autonomy when deployed,
from being teleoperated to being fully autonomous.
We use an exhaustive search to target all documents published in years 2010–
2020 in 27 of the 28 “robotics” journals that are included in the 2020 Journal
Citations Report of the Web of Science Group.1 The documents include original
research articles, perspectives, and surveys, but exclude editorials, corrections
and other communications. Conference papers, although valuable to the robotics
field, were not considered due to the sheer volume of works and the lack of clear
criteria for which conferences to consider. For each document, we determine
whether it is within scope. Where a document is not within scope, no further
analysis is performed. For each document within scope, it is first rated in terms of
(i) the originality (up to 6 points) of the work that is reported, (ii) its rigour (up
to 4 points) and (iii) its significance (up to 5 points). The individual ratings are
then added to produce a relevance score. Although detailed criteria and training
are used to guide the assessors (two of the authors) and improve consistency in
evaluation, the ratings remain ultimately subjective.
To reduce the impact of possible assessor bias (i.e. one assessor tending to
provide higher values than the other), samples of documents were marked by
both of the assessors—once at the beginning and twice during the exhaustive
search. From the results of these double-marking rounds, a consistent but small
bias can be observed. To counter this bias, the relevance scores that we report
have been adjusted by -0.15 and +0.20 respectively, such that the top 25% of
documents within scope for either assessor have a score of 6.00 or above. In the
following, these works are referred to as highly relevant.
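The scoring scheme above can be sketched as follows; the assessor labels and function names are hypothetical, while the rating ranges, the bias offsets and the 6.00 threshold come from the text:

```python
# Each document is rated for originality (0-6), rigour (0-4) and
# significance (0-5); the sum is the raw relevance score, shifted by a
# per-assessor offset to counter the observed marking bias.
ASSESSOR_OFFSET = {"A": -0.15, "B": +0.20}   # hypothetical labels
HIGHLY_RELEVANT = 6.00                       # top ~25% of in-scope docs

def relevance(originality, rigour, significance, assessor):
    assert 0 <= originality <= 6 and 0 <= rigour <= 4 and 0 <= significance <= 5
    return originality + rigour + significance + ASSESSOR_OFFSET[assessor]

def is_highly_relevant(score):
    return score >= HIGHLY_RELEVANT
```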
1 The documents of one of the journals (International Journal of Robotics & Automation) could not be accessed in full and were hence excluded.
Fig. 1. Percentage of documents within scope (per journal). Portions in red correspond
to papers regarded as highly relevant. The numbers represent the total number of
papers identified to be within scope.
Figure 1 shows the percentage of documents that are within scope for each journal. For eight journals, more than 10% of documents are within scope. The journals Swarm Intelligence (SI) and Autonomous Robots (AURO) have the strongest focus on MRS research.
Fig. 2. Distribution of the technology readiness levels for the documents within scope
(per journal).
Figure 2 shows the distribution of the technology readiness levels (TRL) that we estimated for the systems reported in all documents within scope for each journal (for a definition, see [6]). Most of the journals primarily report findings with a TRL of up to 4, considered as laboratory validation by the assessors. The Journal of Field Robotics (JFR) and Revista Iberoamericana de Automática e Informática Industrial (RIAI) stand out: 75% and 46% of documents report work of TRL 5 or above, owing to these journals primarily focusing on studies that have been validated via field experiments. It should be noted that the journals RIAI, Intelligent Service Robotics (ISR), JFR, The International Journal of Robotics Research (IJRR), AURO and Robotica include some documents evaluated with a TRL of 6, where the system was demonstrated in a relevant environment.
Figure 3 shows the percentage of documents within scope that validated their
research via laboratory and/or real-world experiments, as well as the percent-
age of documents that report formal, theoretical results. As expected, the JFR
presents the greatest proportion of documents with real-world validation. In
terms of theoretical validation, the journals IEEE Transactions on Robotics (T-
RO), IJRR, AURO, IEEE Robotics and Automation Letters (RA-L), Robotica,
and Journal of Intelligent & Robotic Systems (JINT) contain the largest propor-
tions of works reporting theoretical findings (journals with fewer than 10 works
being omitted from consideration of proportions).
This section presents a quantitative analysis of the trends observed based on the
data collected in the exhaustive search.
Figure 4 presents the number and percentage of documents that were within
scope for each year. Dashed lines show the trends, obtained through linear regres-
sion using the least squares method. The trend in number of documents on MRSs
suggests a substantial growth in the number of documents within scope, whereas
the trend in percentage of documents suggests that the fraction of documents
devoted to MRSs has decreased. A possible explanation is that robotics journals
tend to report on an increasingly diverse range of topics, some of which attracted
significant attention in recent years (e.g. soft-bodied and learning robots).
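The dashed trend lines of Figs. 4 and 5 can be reproduced with a standard closed-form least-squares fit; this is a generic sketch, not the authors' code:

```python
def linear_trend(years, values):
    """Ordinary least-squares fit y = a + b*x, as used for the dashed
    trend lines (closed-form slope and intercept)."""
    n = len(years)
    mx = sum(years) / n
    my = sum(values) / n
    b = sum((x - mx) * (y - my) for x, y in zip(years, values)) / \
        sum((x - mx) ** 2 for x in years)
    a = my - b * mx
    return a, b
```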
Figure 5 presents the evolution of the percentage of studies within scope that
perform theoretical, laboratory and real-world validation. Note that it is pos-
sible for a single study to feature for multiple types of validation. Overall, the
trends suggest that the percentage of theoretical studies is increasing, while laboratory validation is decreasing and real-world validation remains relatively constant over the years. Note, however, that the number of documents within scope is increasing over the years. As a result, the total amount
of validation is increasing as well, for both laboratory and real-world experi-
ments. Although theoretical validation is often based on strong assumptions, it
Fig. 4. Number and percentage of documents within scope (per year), as identified
by the exhaustive search. The dashed lines represent the trends obtained by linear
regression.
Fig. 5. Percentage of documents within scope using theoretical, laboratory and real-
world validation (per year), as identified by the exhaustive search. The dashed lines
represent the trends obtained by linear regression.
can offer a detailed understanding of the underlying mechanisms and allow for
a basic demonstration of the validity of the proposed solutions. The decreased
focus on laboratory validation may be attributed to increases in computational
capabilities, allowing more trustworthy and realistic simulations to be performed,
which may help substitute for laboratory validations.
3 Expert Survey
The expert survey is conducted to help better understand the state of the art and
possible future developments of MRSs. It captures the views of a representative
sample of internationally leading experts in the field of MRSs.
3.1 Methodology
The expert survey received ethics approval following a review by The University
of Sheffield (application reference 034028). It uses a questionnaire that includes
3.2 Findings
Overall, 179 experts were invited to participate in the survey, including 119 men
(66.5%) and 60 women (33.5%).2 The invited experts were from the following
broad, geographic regions: Africa (10), America (62), Asia (35), Europe (66),
and Oceania (6), as defined by the Statistics Division of the United Nations3 .
A total of 68 participants completed the survey, corresponding to a partic-
ipation of 38.0% of the invited experts. Each participant provided consent to
participate in the survey. In this section, we analyse the findings.
Asia. The remaining participants were from Australia and New Zealand (2.9%)
or did not disclose their geographic region (1.5%). Although the survey succeeded
in capturing the views of experts from a wide range of regions, some bias can
be observed, in particular towards participants from Europe. It should also be
noted that none of the experts reported Africa as their geographic region.
Most participants work in academia (77.9%), followed by industry (14.7%).
Participants who work in both academia and industry, or in other sectors, are
in the minority (2.9% for either case).
Three quarters of the participants have up to 20 years of relevant profes-
sional experience with half reporting 10 to 20 years (50.0%) and a quarter less
than 10 years (25.0%). Most of the remaining participants have between 20 and
30 years of relevant professional experience (19.1%). Two participants have more
than 30 years of relevant professional experience (2.9%).
Fig. 6. Distribution of the expert responses to the question of whether the following
are considered an MRS: (a) A group of robots stored on the same shelf, (b) Any robot
of which more than two copies exist, (c) A group of robots that can be stacked to ease
transportation, (d) A group of passive beads residing in a same environment, and being
externally controlled through changes in a magnetic field, (e) A robot transporting
another robot, (f ) A group of robots operating in different environments, and that do
not interact; each obtains positional information from the same global infrastructure,
(g) A group of conventional manipulators along a factory assembly line, working on the
same products though at different stages, (h) A group of robots operating in a same
environment, each being teleoperated by a dedicated operator; the operators may have
conflicting goals, though they always attempt to prevent collisions between robots,
(i) A robot composed of several modules that can autonomously rearrange the way
they are connected to assume a new shape, (j) A group of robots operating in a same
environment, each having its own goal, including not to collide with others, (k) A group
of robots operating in different environments and that learn from each other through
a cloud service, (l) A group of robots operating in a same environment and having a
common purpose. Statements ranked by weighted mean results.
The second question concerns the future deployments of MRSs. The experts
were asked the time frame in which they expect MRSs to be widely deployed
within their geographic region in various application scenarios. Figure 7 illus-
trates the distribution of the expected times for each of the application scenar-
ios. Almost all experts consider warehouse logistics as an achievable target in the
short term (0–5 years). The majority of experts consider surveillance and secu-
rity, manufacturing and entertainment as achievable targets in the short term
(0–5 years). These application scenarios have already fully embraced robotics
in multiple geographic regions. Transport and delivery, inspection and mainte-
nance, agriculture and forestry, education, mining, first response, and domestic
applications are considered achievable targets in the medium term (5–10 years)
by many of the experts. Space, medical applications, rehabilitation and care and
construction are expected to face a longer wait (> 10 years). Additionally, we note that for education, a moderate number of experts suggested that MRSs may never be widely deployed or may not be applicable.
Fig. 7. Distribution of the responses by experts to the question in which time frame
they expect MRSs to be widely deployed within their geographic region in the specified
applications. Applications ranked by weighted mean results.
Deploying MRSs in the real world involves solving some problems that are
equally present when deploying single robot systems. The third question specifi-
cally focuses on the transition from systems of single robots to systems of multi-
ple robots. It asks the experts to gauge how much additional effort this transition requires of the robotics community with respect to a number of areas of develop-
ment. Figure 8 shows the distribution of the additional effort the experts consider
necessary for each of the suggested areas of development. In general, we observe
that all aspects would demand additional effort. Particularly, experimentation
under realistic conditions, autonomy/resilience, system integration, trustworthi-
ness, and end-user acceptance can be highlighted as the most difficult areas of
development, with the highest percentage of experts choosing ratings of either 5
or 6. On the contrary, issues such as materials, modelling, ethical concerns, and
simulation are considered less demanding in terms of additional effort, with the
highest percentage of experts choosing ratings of either 1 or 2. Regarding legal
aspects, the opinions are divided. Diversity and inclusion has been appraised as
“Undecided” by 29.4% of experts, possibly as the challenge is viewed by some
as the same irrespective of whether the system to be designed comprises one or
multiple robots.
The last question asks the experts how fast they expect the field of MRSs
to have grown in ten years from the time of the survey compared to the field
of robotics as a whole. Figure 9 presents the responses, suggesting a bimodal
distribution with a small peak at “slower” and a larger one at “faster”.
Fig. 8. Distribution of the expert responses to the question of how much additional
effort is required by the robotics community to move from single-robot system to MRS
in real-world applications with respect to various areas of development. The scale is
such that 1 corresponds to “insignificant”, whereas 6 corresponds to “unprecedented
(e.g. paradigm change required)”. Areas ranked by weighted mean results.
Fig. 9. Distribution of the expert responses to the question of how fast they expect
the field of MRSs to have grown in ten years from the time of the survey compared to
the field of robotics as a whole.
their attitude had not changed. The remaining participants presented a multi-
tude of aspects that had changed when considering their position towards MRSs.
The most frequently mentioned aspects include the focus of their work (27.3%),
in particular a shift from theoretical work to more practical applications, the
definition of MRS they used, to either include or exclude specific control archi-
tectures (centralised vs. decentralised), system size, etc., and the consideration of
new capabilities or applications to MRSs, such as machine learning, cloud com-
puting, and warehouse logistics. Although the outlook of most responses to this
question were positive, a small number of participants (6.8% of the participants
that reported a change in attitude) reported a negative position, questioning the
extent to which real-world applications will be impacted by MRSs.
The next question asked participants to give examples of commercial applica-
tions other than warehouse logistics where MRSs are currently being deployed.
A total of 63.2% participants gave examples, with the most common being (i)
entertainment, for example, through the use of drones such as in light shows
and for filming, (ii) agriculture, for seeding, tending and harvesting of crops,
4 Conclusions
This paper reported a data-driven trend analysis of MRSs.
An exhaustive search of all papers published in 27 leading robotics journals
from 2010 to 2020 revealed (i) a substantial growth in the number of papers
on MRSs albeit the fraction of papers devoted to the topic decreased; (ii) for
half of the journals, the fraction of papers devoted to MRSs exceeded 5%; (iii)
six journals reported MRS research at TRL 6 or above; and (iv) a decline, in
relative terms, in the use of laboratory demonstrations while theoretical work
and real-world demonstrations remain in focus.
A survey of 68 leading experts revealed (i) that there is no consensus regard-
ing what constitutes an MRS; (ii) the applications where MRSs are expected to
be widely deployed in the short-, medium- and long-term; and (iii) the areas of
development that require the most effort to move from single-robot systems to
MRSs in real-world applications, including experimentation under realistic con-
ditions, autonomy/resilience, system integration, trustworthiness and end-user
acceptance. There was no consensus regarding how rapidly the field will develop over the next decade, with some expecting a faster development while others expected a slower one.
References
1. Walter, W.G.: The Living Brain. WW Norton (1953)
2. Dudek, G., Jenkin, M.R.M., Milios, E., Wilkes, D.: A taxonomy for multi-agent
robotics. Auton. Robot. 3(4), 375–397 (1996)
3. Farinelli, A., Iocchi, L., Nardi, D.: Multirobot systems: a classification focused on coordination. IEEE Trans. Syst. Man Cybern. Part B 34(5), 2015–2028 (2004)
4. Parker, L.E., Rus, D., Sukhatme, G.S.: Multiple mobile robot systems. In: Springer Handbook of Robotics, pp. 1335–1384. Springer (2016)
5. Rossi, F., Bandyopadhyay, S., Wolf, M., Pavone, M.: Review of multi-agent algo-
rithms for collective behavior: a structural taxonomy. IFAC-PapersOnLine 51(12),
112–117 (2018)
6. UK House of Commons, Science and Technology Committee: Technology and Innovation Centres, Annex 1: Technology Readiness Levels (2010). https://publications.parliament.uk/pa/cm201011/cmselect/cmsctech/619/61913.htm#note221
7. American Society for Engineering Education: Engineering and Engineering Technology by the Numbers. Washington, DC (2019)
Coverage Control for Exploration
of Unknown Non-convex Environments
with Limited Range Multi-robot Systems
1 Introduction
The problem of coordinating multiple robots to explore (and map) an unknown environment has been widely studied in recent years. One of the most widely adopted strategies, first described in [11], exploits potential fields to generate collision-free trajectories. Every robot is treated as a positive ion, while obstacles are modeled as positively charged and the goal as negatively charged, resulting in every robot being driven to the goal while being repelled by both other agents and obstacles. Other control strategies can be derived from the well-known random walk approach, such as the ones proposed in [10], where robots move randomly within the environment. An alternative approach is the one proposed in [18], where the authors study the problem of multi-robot exploration and coverage with limited communication. They propose a decentralized algorithm based on leaving marks on the unknown terrain, which can be sensed by the robots and allow them to cover the terrain. The limited communication issue has also been considered in [6], where the control of the multi-robot system relies on communication with a base station. In order to assist the robot coverage of the environment, [2] proposes an algorithm that deploys radio beacons and, similarly, [3] proposes robots that, in the process of moving around, deploy network nodes into the environment.
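A minimal sketch of the potential-field idea of [11] described above; the gains, influence radius and function names are illustrative assumptions, not the cited formulation:

```python
import math

def potential_step(robot, goal, obstacles, others,
                   k_att=1.0, k_rep=0.5, influence=1.0):
    """One gradient step of a classic potential field: attraction to the
    goal plus bounded-range repulsion from obstacles and other robots."""
    fx = k_att * (goal[0] - robot[0])
    fy = k_att * (goal[1] - robot[1])
    for p in list(obstacles) + list(others):
        dx, dy = robot[0] - p[0], robot[1] - p[1]
        d = math.hypot(dx, dy)
        if 1e-9 < d < influence:
            # repulsion grows as the robot approaches the charge
            mag = k_rep * (1.0 / d - 1.0 / influence) / d**2
            fx += mag * dx
            fy += mag * dy
    return fx, fy
```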
One of the most relevant and studied exploration methodologies is the frontier-based coordination approach [19]. The environment is discretized into standard occupancy grids, and a frontier cell is an already explored cell that is an immediate neighbor of an unknown, unexplored cell. A mapping of the environment is required, usually generated exploiting grid maps or graphs, to keep the explored portion of the environment up to date and to localize the frontier node positions [8].
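The frontier definition above can be sketched on a small occupancy grid; the cell encoding and function name are our own assumptions:

```python
UNKNOWN, FREE, OCCUPIED = -1, 0, 1

def frontier_cells(grid):
    """Frontier cells in an occupancy grid: explored free cells with at
    least one 4-connected unknown neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers
```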
Coverage-based control strategies [5] coordinate a group of autonomous robots with the aim of maximizing the coverage of the environment and generally exploit a Voronoi partitioning of the environment. The tools behind these methodologies can be exploited to deal with the exploration of unknown environments, as in [1], where the Voronoi diagram is used to generate an obstacle-free convex neighbourhood among strongly convex obstacles. A significant work is presented in [9], where a frontier-based exploration methodology is integrated with a Voronoi partitioning of the environment to avoid duplicated exploration areas. In both of these works, the boundary of the environment and the obstacles need to be fully or partially known. A recent study presented in [17] deals with the problem caused by updating a single global grid map while managing only local information. In [16], an exploration and navigation framework has been proposed to explore GPS-denied environments using the cooperation between UAVs and UGVs to increase the accuracy of the SLAM algorithm.
The studies mentioned above, and the majority of approaches in multi-robot
exploration, are based on a discretization of the environment, typically using
occupancy grids. This approach is not suitable for large environments,
where the number of cells grows significantly, leading to severe computational
and memory costs. Moreover, these approaches require a reliable
communication network to keep the shared map updated among the
robots [12]. A methodology that deals with communication issues is presented
in [14], where the total area is partitioned based on its topology, exploiting
Generalized Voronoi Graphs for exploration. However, the robots
are constrained to move on the edges of the Voronoi diagram. Moreover, this
strategy is mostly effective in environments structured in a way that naturally
guides robots during exploration, e.g., with corridors, and loses effectiveness in
more generic spaces. An alternative and relevant approach for mapping and
exploring large environments is presented in [4], where a Gaussian mixture model
is used for global mapping of a complex environment. This approach, developed
and tested for single-robot monitoring, has the benefits of a small
memory footprint and a low communication volume. Following this strategy,
in this work we propose a methodology that exploits a mixture of Gaussian
distributions to coordinate a group of robots exploring an unknown environment.
Since each Gaussian function is defined by only a few parameters, and no
grid discretization of the environment is required, the strategy achieves
efficient memory usage and low communication load among the robots.
552 M. Catellani et al.
2 Problem Statement
Consider a multi-robot system constituted by n holonomic robots able to move
in two dimensions, controlled with the objective of monitoring and exploring
an unknown environment. We assume each robot to be modeled as a single
integrator system,1 whose position pi ∈ R2 evolves according to ṗi = ui, where
ui ∈ R2 is the control input, ∀i = 1, . . . , n.
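For intuition, the single-integrator model can be simulated with a forward-Euler step; the time step and speed limit below are illustrative values, not parameters from the paper:

```python
import numpy as np

def step(p, u, dt=0.1, v_max=0.5):
    """Forward-Euler update of a single-integrator robot, p_dot = u,
    with the control input saturated to a maximum speed v_max."""
    speed = np.linalg.norm(u)
    if speed > v_max:
        u = u * (v_max / speed)  # preserve direction, cap magnitude
    return p + dt * u

p = np.array([0.0, 0.0])
p = step(p, np.array([1.0, 0.0]))  # requested 1 m/s, capped to 0.5 m/s
print(p)  # moves 0.05 m along x (0.5 m/s x 0.1 s)
```

The saturation mirrors how such a model generates bounded velocity references for a real platform's tracking controller.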
We consider the following assumptions:
1. Localization with respect to a global reference frame: each robot is able to
localize itself with respect to a global reference frame, which is shared among
the robots in the team;2
2. Limited sensing capabilities: each robot is able to measure the position of
neighboring robots and objects (including boundaries of the environment),
within its limited sensing range (defined by a circle with radius rsens ∈ R>0 ).
Based on these assumptions, the problem addressed in this paper is then for-
malized as follows:
1 We would like to remark that, even though the single integrator is a very simplified
model, it can still effectively be exploited to control real mobile robots: using a
sufficiently good trajectory tracking controller, the single integrator model can be
used to generate velocity references for widely used mobile robotic platforms,
such as wheeled mobile robots and unmanned aerial vehicles.
2 This can be achieved, e.g., by means of the Global Positioning System (GPS) or
cooperative localization techniques, which are however beyond the scope of this
paper and will not be addressed.
Coverage Control for Exploration of Unknown Non-convex Environments 553
Namely, the i-th cell contains all the points in the environment that are closer
to the i-th robot than to other robots, and that are contained within the sensing
radius of the i-th robot.
The centroid of the Voronoi cell, denoted by CVi, is then computed according
to the probability density distribution φ(q), that is:
$$C_{V_i} = \frac{\int_{V_i} q\,\phi(q)\,dq}{\int_{V_i} \phi(q)\,dq}. \quad (3)$$
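Since φ(q) is generally not integrable in closed form, the centroid in (3) can be approximated numerically. A minimal sketch, assuming the cell Vi is represented by a set of uniformly drawn sample points:

```python
import numpy as np

def voronoi_centroid(samples, phi):
    """Approximate C_Vi = (integral of q*phi(q) over Vi) / (integral of phi(q)
    over Vi), given uniform samples of the cell and a density callable phi."""
    w = np.array([phi(q) for q in samples])          # phi(q) at each sample
    return (samples * w[:, None]).sum(axis=0) / w.sum()

# With a uniform density, the weighted centroid reduces to the sample mean.
square = np.array([[x, y] for x in np.linspace(0, 1, 11)
                          for y in np.linspace(0, 1, 11)])
print(voronoi_centroid(square, lambda q: 1.0))  # approx [0.5, 0.5]
```

A non-uniform φ shifts the centroid towards high-density regions, which is what drives the robots in the coverage controller.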
Every agent in the environment must be able to access this data when q lies inside
its sensing range, thus the exchange of information can be achieved exploiting a
central storage unit or physically placing a marker or a beacon that each robot
can easily localize. During the motion and the exploration, every robot places
several Gaussian functions in the environment. The combination of all the shared
Gaussian functions determines the motion of the multi-robot system. Every time
a new area is explored by a robot, a new Gaussian function ϕ(q), defined as in (4),
is added by the robot to the shared density distribution following the algorithm
described later (Algorithm 1), affecting the motion of the overall multi-robot
system. During the exploration of an unknown environment, we consider two
possible scenarios the robot can face: (1) the robot explores areas that need
to be crossed only once during the navigation, and (2) areas that need to be
crossed more than once to complete the exploration without deadlocks. Thus,
following the definition in (4), we define two types of Gaussian functions to be
added by the robots: a time-constant Gaussian function and a time-varying
Gaussian function. The first remains constant throughout the exploration:
$$f_{F_i}(q) = \exp\left(-\frac{\|q - q_i\|^2}{2\sigma_i^2}\right). \quad (5)$$
Conversely, the time-varying Gaussian function decreases over time: for this
purpose, we make use of a forgetting factor $k_j(t) \in [0, 1]$, which gradually
decreases the intensity of the function over time and is defined as
$$k_j(t) = 1 - \frac{t - T_{0_j}}{T_{forget}}, \quad (6)$$
where $T_{0_j}$ is the time instant at which the Gaussian component is added and
$T_{forget}$ represents how long the component should remain active. Thus, the
time-varying Gaussian function can be defined as
$$f_{V_j}(q, t) = \min\left\{\exp\left(-\frac{\|q - q_j\|^2}{2\sigma_j^2}\right),\; k_j(t)\right\}, \quad (7)$$
where the forgetting factor $k_j(t)$, decreasing over time, lowers the peak of the
probability function until completely nullifying it at $t = T_{0_j} + T_{forget}$.
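Equations (5)–(7) translate directly into code. A minimal sketch, with illustrative values for σ and Tforget:

```python
import numpy as np

def f_F(q, q_i, sigma):
    """Time-constant Gaussian (5): peak 1 at q_i, width set by sigma."""
    return np.exp(-np.linalg.norm(q - q_i) ** 2 / (2 * sigma ** 2))

def k(t, t0, T_forget):
    """Forgetting factor (6), clipped to stay in [0, 1]."""
    return np.clip(1.0 - (t - t0) / T_forget, 0.0, 1.0)

def f_V(q, t, q_j, sigma, t0, T_forget):
    """Time-varying Gaussian (7): the Gaussian peak capped by k(t)."""
    return min(f_F(q, q_j, sigma), k(t, t0, T_forget))

center = np.array([0.0, 0.0])
print(f_V(center, 0.0, center, 1.0, 0.0, 10.0))   # 1.0 (just added)
print(f_V(center, 5.0, center, 1.0, 0.0, 10.0))   # 0.5 (half forgotten)
print(f_V(center, 10.0, center, 1.0, 0.0, 10.0))  # 0.0 (fully forgotten)
```

At the center of the component the value decays linearly with time until the area is treated as unexplored again.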
The definition of two different functions makes it possible to specify whether an
explored area needs to be crossed multiple times or visited only once, depending
on the specific application. This differentiation is carried out by marking
explored areas that do not need to be visited again with the function $f_{F_i}(q)$,
while areas required to be crossed multiple times are marked with the function
$f_{V_j}(q, t)$: the time variation decreases the intensity of the Gaussian function
that models the area, allowing a specific region to be visited again in the future.
In this way, the robots continue monitoring the environment even once the
exploration is completed, keeping the gathered information up to date. Moreover,
the time variation helps avoid local minima, which would cause the robots to get
stuck and make a complete exploration of the environment difficult to achieve.
Every robot stores two lists of Gaussian functions, time-varying and
time-constant, whose combination generates
the density distribution perceived by the robot and exploited to compute the
control input. During the exploration process, the lists of Gaussian functions are
shared across the robots' network: each robot, during its motion, computes
new Gaussian components and communicates their parameters to the network.
Every robot's control input is continuously updated following the changes in the
shared density distribution. The combination of the various Gaussian
components is computed as follows:
$$\phi(q) = c + m + 1 - \sum_{i=1}^{c} f_{F_i}(q) - \sum_{j=1}^{m} f_{V_j}(q, t), \quad (8)$$
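The shared density (8) can then be evaluated from the two lists of component parameters. A minimal sketch, assuming each component is stored as a (center, σ) or (center, σ, T0, Tforget) tuple:

```python
import numpy as np

def gaussian(q, center, sigma):
    """Unit-peak Gaussian bump centered at `center`."""
    return np.exp(-np.linalg.norm(q - center) ** 2 / (2 * sigma ** 2))

def phi(q, t, constant, varying):
    """Density (8): c + m + 1 minus the sum of all shared components.
    `constant` holds (center, sigma); `varying` holds (center, sigma, t0, T_forget)."""
    c, m = len(constant), len(varying)
    total = c + m + 1.0
    for center, sigma in constant:
        total -= gaussian(q, center, sigma)
    for center, sigma, t0, T_forget in varying:
        kj = np.clip(1.0 - (t - t0) / T_forget, 0.0, 1.0)  # forgetting factor (6)
        total -= min(gaussian(q, center, sigma), kj)        # time-varying term (7)
    return total

# One constant component at the origin: phi is lowest over explored ground
# and higher far away, steering the robots towards unexplored areas.
comp = [(np.array([0.0, 0.0]), 1.0)]
print(phi(np.array([0.0, 0.0]), 0.0, comp, []))    # 1.0 at the explored spot
print(phi(np.array([100.0, 0.0]), 0.0, comp, []))  # approx 2.0 far away
```

The offset c + m + 1 keeps φ strictly positive, so the centroid computation in (3) remains well defined everywhere.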
The proposed algorithm has been implemented using ROS2 [13], and has been
validated by means of extensive simulations and real-world experiments. In
particular, several simulation environments have been created using the Gazebo
simulator, representing wide areas populated by a large number of robots. Two
series of simulations have been conducted: the first aimed at a statistical
analysis of the performance of the proposed control strategy, the second at a
comparison with other strategies from the literature. For the first series of
simulations, the number of robots was chosen randomly, as well as their starting
locations in the environment. The simulations include configurations with 2, 4,
6 and 8 robots, with a sensing range rsens = 4 m and a maximum linear velocity
vmax = 0.5 m/s. Moreover, various environment dimensions have been tested,
along with different environment configurations and obstacle locations. In
particular, we considered non-convex polygonal environments with areas of
200 m², 400 m², 900 m² and 2500 m². Two of the scenarios tested in simulation
are shown in Fig. 2: they differ in number of robots, environment dimension and
complexity. For each experimental scenario, we ran 20 tests of 5 min of
simulation, for an overall total of 320 tests. For each experiment we logged the
robot locations over time, which are used to compute the portion of the surface
covered.
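The covered portion of the surface can be estimated by rasterizing the union of the sensing disks swept along the logged trajectories. A minimal sketch over a rectangular environment; the grid resolution is an arbitrary choice, not a parameter from the paper:

```python
import numpy as np

def covered_fraction(positions, r_sens, width, height, res=0.1):
    """Fraction of a width x height rectangle covered by the union of sensing
    disks of radius r_sens centered at the logged robot positions."""
    xs = np.arange(res / 2, width, res)   # cell-center x coordinates
    ys = np.arange(res / 2, height, res)  # cell-center y coordinates
    gx, gy = np.meshgrid(xs, ys)
    covered = np.zeros(gx.shape, dtype=bool)
    for px, py in positions:
        covered |= (gx - px) ** 2 + (gy - py) ** 2 <= r_sens ** 2
    return covered.mean()

# A single stationary robot covers roughly pi * r^2 over the total area.
frac = covered_fraction([(5.0, 5.0)], r_sens=2.0, width=10.0, height=10.0)
print(frac)  # close to pi * 4 / 100 = 0.1257
```

Feeding in the full position log of all robots yields the coverage-over-time curves used in the statistical analysis.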
The same statistical approach was followed for the comparison with other
strategies. In this case, our solution was compared with the classic potential
field navigation algorithm [11] and the random walk solution presented in [7],
where robots drive with a constant straight motion, changing direction only
when an obstacle is detected. As before, the same robots were used in teams of
6, 8, 10 and 12 agents, placed in random starting positions within the same
2500 m² environment already described. 20 simulations were run for every
team configuration, for a total of 80 simulations per method.
We also conducted experiments on real robotic platforms exploiting the ROS2
framework. The tests took place in a limited-size environment with polygonal
obstacles, using three Turtlebot3 Burger differential-drive robots (see Fig. 3).
The robots' localization with respect to a global reference frame was obtained
through an Optitrack motion capture system. Every robot, following the control
algorithm, shares the locally added Gaussian functions with the other robots.
We conducted several tests, randomly choosing the initial locations and
orientations of the robots.
Both in simulations and in real-world experiments, the parameters defining the
Gaussian functions were shared among the agents using a central storage unit,
where each robot is able to both save and retrieve data. In addition, we
considered non-circulating areas, such as corners or dead-end streets, as places
that need to be visited only once, marking them with the time-constant Gaussian
functions $f_{F_i}(q)$ defined in (5). Assuming that every robot is capable of
detecting corners in the environment by adopting a state-of-the-art strategy such
as the one proposed in [20], it is possible for the agents to recognize when a
non-circulating area has been visited and consequently define a new
time-constant Gaussian function $f_{F_i}(q)$.
Fig. 2. Two of the scenarios tested in simulation using the Gazebo environment. The
two experiments differ in number of robots, environment dimension and complexity.
The blue circles represent the sensing areas of the robots.
Fig. 3. The experimental set-up of one of the experiments conducted with real robotic
platforms. Three Turtlebot3 Burger robots were tasked to explore an unknown
non-convex area.
navigation are clear: this solution performed poorly because of the large
environment, which results in a great number of local minima where robots get
stuck. The comparison with the random walk approach also shows that our
solution achieves better performance: the multi-robot system was able to explore
a larger region of the environment thanks to its ability to coordinate the agents
effectively.
Finally, real-world experiments have also shown very good exploration of the
environment: in the example in Fig. 6, where 3 robots were employed, the covered
area reached 90% of the total. This value is in line with expectations from the
simulated trials: the total area is much smaller than in the simulated
environments, but fewer robots were employed.
6 Conclusions
In this paper we presented a distributed coverage-based control strategy to
coordinate a group of autonomous robots with limited sensing capabilities, with
the aim of exploring an unknown environment. The proposed control strategy
exploits a density distribution to drive the multi-robot system towards areas of
the environment still to be explored. The density distribution is built from the
combination of multiple Gaussian functions placed by the robots in the
environment during their motion. We showed how full exploration of the unknown
environment can be achieved without discretizing the environment through graphs
or grid cells, differently from the majority of studies in the literature.
Moreover, the robots only need to store and share the Gaussian parameters across
the network, which requires a low communication volume and low memory
consumption. Several experiments and simulations were performed on a group of
mobile robots to validate the performance of the proposed control method and to
compare it with other solutions from the literature. As future work, we aim to
further investigate the proposed approach in comparison with frontier-based
exploration methodologies.
References
1. Arslan, O., Koditschek, D.E.: Sensor-based reactive navigation in unknown convex
sphere worlds. Int. J. Robot. Res. 38(2–3), 196–223 (2019)
2. Batalin, M.A., Sukhatme, G.S.: Coverage, exploration and deployment by a mobile
robot and communication network. Telecommun. Syst. 26(2), 181–196 (2004)
3. Batalin, M.A., Sukhatme, G.S.: The analysis of an efficient algorithm for robot
coverage and exploration based on sensor network deployment. In: Proceedings of
the 2005 IEEE International Conference on Robotics and Automation, pp. 3478–
3485. IEEE (2005)
4. Corah, M., O’Meadhra, C., Goel, K., Michael, N.: Communication-efficient plan-
ning and mapping for multi-robot exploration in large environments. IEEE Robot.
Autom. Lett. 4(2), 1715–1721 (2019)
5. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage control for mobile sensing
networks. IEEE Trans. Robot. Autom. 20(2), 243–255 (2004)
6. De Hoog, J., Cameron, S., Visser, A.: Dynamic team hierarchies in communication-
limited multi-robot exploration. In: 2010 IEEE Safety Security and Rescue
Robotics, pp. 1–7. IEEE (2010)
7. Francesca, G., Brambilla, M., Brutschy, A., Trianni, V., Birattari, M.: AutoMoDe:
a novel approach to the automatic design of control software for robot swarms.
Swarm Intell. 8, 1–24 (2014)
8. Franchi, A., Freda, L., Oriolo, G., Vendittelli, M.: The sensor-based random graph
method for cooperative robot exploration. IEEE/ASME Trans. Mechatron. 14(2),
163–175 (2009)
9. Hu, J., Niu, H., Carrasco, J., Lennox, B., Arvin, F.: Voronoi-based multi-robot
autonomous exploration in unknown environments via deep reinforcement learning.
IEEE Trans. Veh. Technol. 69(12), 14413–14423 (2020)
10. Kegeleirs, M., Garzón Ramos, D., Birattari, M.: Random walk exploration for
swarm mapping. In: Althoefer, K., Konstantinova, J., Zhang, K. (eds.) TAROS
2019. LNCS (LNAI), vol. 11650, pp. 211–222. Springer, Cham (2019).
https://doi.org/10.1007/978-3-030-25332-5_19
11. Khatib, O.: Real-time obstacle avoidance for manipulators and mobile robots. In:
Cox, I.J., Wilfong, G.T. (eds.) Autonomous Robot Vehicles, pp. 396–404. Springer,
New York, NY (1986). https://doi.org/10.1007/978-1-4613-8997-2_29
12. Kim, J.: Cooperative exploration and networking while preserving collision avoid-
ance. IEEE Trans. Cybern. 47(12), 4038–4048 (2016)
13. Macenski, S., Foote, T., Gerkey, B., Lalancette, C., Woodall, W.: Robot operating
system 2: design, architecture, and uses in the wild. Sci. Robot. 7(66), eabm6074
(2022)
14. Masaba, K., Quattrini Li, A.: GVGExp: communication-constrained multi-robot
exploration system based on generalized Voronoi graphs. In: 2021 International
Symposium on Multi-Robot and Multi-Agent Systems (MRS). IEEE (2021)
15. Pratissoli, F., Capelli, B., Sabattini, L.: On coverage control for limited range multi-
robot systems. In: IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS) (2022)
16. Qin, H., et al.: Autonomous exploration and mapping system using heterogeneous
UAVs and UGVs in GPS-denied environments. IEEE Trans. Veh. Technol. 68(2),
1339–1350 (2019)
17. Ryu, H.: Graph search-based exploration method using a frontier-graph structure
for mobile robots. Sensors 20(21), 6270 (2020)
18. Senthilkumar, K., Bharadwaj, K.K.: Multi-robot exploration and terrain coverage
in an unknown environment. Robot. Auton. Syst. 60(1), 123–132 (2012)
19. Stachniss, C.: Coordinated Multi-Robot Exploration. In: Robotic Mapping and
Exploration. Springer Tracts in Advanced Robotics, vol. 55, pp. 43–71. Springer,
Berlin, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01097-2_4
20. Yan, C., Shao, B., Zhao, H., Ning, R., Zhang, Y., Xu, F.: 3d room layout estimation
from a single RGB image. IEEE Trans. Multimed. 22(11), 3014–3024 (2020)
Author Index

B
Baumann, Cyrill 523
Belal, Mehdi 333
Beltrame, Giovanni 57
Bertoncelli, Filippo 333, 550
Bettini, Matteo 42
Blumenkamp, Jan 42
Booth, Lorenzo 466
Bouvier, Benjamin 494

C
Caccavale, Riccardo 438
Cambier, Nicolas 187
Cao, Yuhong 202
Carey, Nicole E. 140
Carpin, Stefano 466
Catellani, Mattia 550
Chen, Jiahe 509
Chen, Jun 127
Choset, Howie 113
Crosscombe, Michael 14
Cutler, Sadie 509

D
Dames, Philip 127, 300, 378
Defay, Jack A. 257

E
Edwards, Victoria 83, 363
Eiben, A. E. 187
Ermini, Mirko 438
Erunsal, I. Kagan 156

F
Fedeli, Eugenio 438

G
Garone, Emanuele 438
Groß, Roderich 537
Grover, Jaskaran 317

H
Harvey, David 69
Hauert, Sabine 69, 173
Hogg, Elliott 69
Hsieh, M. Ani 83, 216, 347, 363

I
Ishihara, Hisashi 100

K
Karagüzel, Tugay Alperen 187
Kawakami, Masaru 424
Kontoudis, George P. 286
Kortvelesy, Ryan 42
Kumar, Vijay 408
Kunii, Yasuharu 453

L
Latif, Ehsan 243
Lawry, Jonathan 14
Lippiello, Vincenzo 438
Liu, Changliu 317
Liu, Jiazhen 408
Liu, Lantao 273
Lorente, María-Teresa 537

M
Ma, Danna 509
Maeda, Takao 453
Manjanna, Sandeep 347

O
Ogawa, Jun 424
Osuka, Koichi 100
Otte, Michael 286

P
Pan, Lishuo 347
Parasuraman, Ramviyas 243
Park, Shinkyu 127
Perolini, Jonas 523
Petersen, Kirstin 257, 509
Pierre, Jean-Elie 394
Pratissoli, Federico 333, 550
Prorok, Amanda 42
Pushp, Durgakant 273

R
Ramachandran, Ragesh 408
Ramchurn, Sarvapali 28
Rao, Ananya 113
Ricard, Guillaume 57
Richards, Arthur 69

S
Sabattini, Lorenzo 1, 333, 550
Sakamoto, Kosuke 453
Sartoretti, Guillaume 113, 202
Sato, Toui 453

T
Tavano, Fabrizio 438
Tourki, Emna 523
Tsunoda, Yusuke 100

V
Ventura, Rodrigo 156
Vercellino, Cristina 1
Vielfaure, David 57
Villani, Valeria 1

W
Watanabe, Yosuke 424
Werfel, Justin 140
Wilson, James 173

X
Xin, Pujie 300, 378
Xu, Junhong 273

Y
Yin, Kai 273
Yu, Xi 216

Z
Zhang, Yi 100
Zhou, Lifeng 408

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2024
J. Bourgeois et al. (Eds.): DARS 2022, SPAR 28, pp. 563–564, 2024.
https://doi.org/10.1007/978-3-031-51497-5