Download as pdf or txt
Download as pdf or txt
You are on page 1of 11

Information Fusion 58 (2020) 13–23

Contents lists available at ScienceDirect

Information Fusion
journal homepage: www.elsevier.com/locate/inffus

Decentralised multi-platform search for a hazardous source in a turbulent


flow
Branko Ristic a,∗, Christopher Gilliam a, William Moran b, Jennifer L. Palmer c
a
RMIT University, GPO Box 2476, Melbourne, VIC 3001, Austalia
b
The University of Melbourne, Australia
c
Defence Science and Technology Group, Australia

a r t i c l e i n f o a b s t r a c t

Keywords: The paper presents a cognitive strategy that enables an interconnected group of autonomous vehicles (moving
Autonomous search robots) to search and localise a source of hazardous emissions (gas, biochemical particles) in a coordinated
Decentralised multi-sensor fusion manner. Dispersion of the emitted substance is assumed to be affected by turbulence, resulting in the absence of
Sequential Monte Carlo estimation
concentration gradients. The key feature of the proposed search strategy is that it can be applied in a completely
Sensor control
decentralised manner as long as the communication network of autonomous vehicles forms a connected graph.
Infotaxi
By decentralised operation we mean that each moving robot performs computations (i.e. source estimation and
robot motion control) locally. Coordination is achieved by exchanging the data with the neighbours only, in a
manner which does not require global knowledge of the communication network topology.

1. Introduction Both the up-flow motion methods and the concentration gradient
methods are simple in that they require only a limited level of spa-
The use of autonomous vehicles for environmental monitoring is of tial perception [14]. Their limitations manifest in the presence of tur-
great importance, especially when it involves dangerous missions, such bulent flows, due to the absence of concentration gradients, when the
as the search and localisation of toxic gas releases. Some recent sur- plume typically consists of time-varying disconnected patches (the ef-
veys containing a plethora of relevant references can be found in [1– fect known as intermittency). The information gain-based methods, re-
3]. Practical implementation of miniature airborne platforms for gas ferred to as infotaxis [15], have been developed specifically for search-
sensing tasks are explored in [4,5]. Existing theoretical approaches to ing in turbulent flows. In the absence of a smooth distribution of con-
the problem of searching and localisation of emitting sources, can be centration (e.g. due to turbulence), this strategy directs the searching
loosely divided into three categories: (i) up-flow motion methods, (ii) robot(s) towards the highest information gain. As a theoretically prin-
concentration gradient-based methods and (iii) information gain-based cipled approach, where the source-parameter estimation is carried out
methods. The first category mimics the behaviour of bacteria and insects in the Bayesian framework and the searching platform motion control
in their search for food and mates. For example, upon sensing an odour is based on the information-theoretic principles, the infotaxic (or cogni-
signal, male moths surge upwind in the direction of the flow, but when tive) search strategies have attracted a great deal of interest [16–27].
the odour information vanishes, they exhibit random cross-wind casting This paper focuses on an infotaxic coordinated search by a group of
or zigzagging until the plume is reacquired [6]. This class of methods autonomous vehicles (platforms) for an emitting source of hazardous
inspired robotic searches by various groups, e.g. [7–9]. substance dispersed by turbulent flow in open terrain environment. The
Concentration gradient-based methods, also referred to as chemo- search platforms are equipped with the appropriate sensors for sequen-
taxis, seek for the emitting source by following the positive local gra- tial measuring of: (a) the pollutant concentration and (b) the platform
dient of the chemical concentration [10]. These strategies are effective location. Due to the turbulent transport of the emitted substance, the
close to the source where the odour plume can be considered as a con- concentration measurements are typically sporadic and fluctuating. The
tinuous field. Since the source is at the maximum of the concentration searching platforms form a moving sensor network thus enabling the ex-
field, sometimes these methods are referred to as the extremum seeking change of data and a cooperative behaviour. The multi-robot infotaxis
algorithms [11–13]. have already been studied in [16,22,24,28,29]. However, all mentioned


Corresponding author.
E-mail addresses: branko.ristic@rmit.edu.au (B. Ristic), christopher.gilliam@rmit.edu.au (C. Gilliam), wmoran@unimelb.edu.au (W. Moran),
jennifer.palmer@dst.defence.gov.au (J.L. Palmer).

https://doi.org/10.1016/j.inffus.2019.12.011
Received 18 May 2018; Received in revised form 10 December 2019; Accepted 22 December 2019
Available online 26 December 2019
1566-2535/© 2019 Elsevier B.V. All rights reserved.
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

references assumed all-to-all (i.e. fully connected) communication net- of encounters can be modelled as follows [15]:
work with a centralised fusion and control of the searching group. [ ] ( 𝑖 )
𝑖 𝑄0 (𝑋0 − 𝑥𝑖𝑘 )𝑈 𝑑𝑘 (𝐫0 , 𝐫𝑘𝑖 )
In this paper we propose a fully decentralised infotaxic coordinated 𝑅(𝜼0 , 𝐫𝑘 ) = ( ) exp ⋅ 𝐾0 (1)
ln 𝜆𝑎 2𝐷 𝜆
search. By decentralised operation we mean that each searching sensor
platform performs the computations (i.e. source estimation and platform
motion control) locally and independently of other platforms. Group where D, 𝜏 and U are known environmental parameters,

coordination for the sake of achieving the (common) task mission is
𝑑𝑘𝑖 (𝐫0 , 𝐫𝑘𝑖 ) = (𝑥𝑖𝑘 − 𝑋0 )2 + (𝑦𝑖𝑘 − 𝑌0 )2 (2)
carried out by exchanging the data only with neighbours, in a manner
which does not require global knowledge of the communication net- is the distance between the source and the ith sensor platform, K0 is the
work topology. Hence, the proposed approach is scalable in the sense modified Bessel function of the second kind of order zero, and
that the complexities for sensing, communication, and computing per √
𝐷𝜏
sensor platform are independent of the sensor network size. In addition, 𝜆= , (3)
1 + 𝑈4𝐷𝜏
2
because all sensor platforms are treated equally (no leader-follower hi-
erarchy), this approach is robust to the failure of any of the searching
depends on environmental parameters only.
agents. The only requirement for avoiding the break-up of the search-
The stochastic process of sensor encounters with the dispersed parti-
ing formation is that the communication graph of the sensor network
cles is modelled by a Poisson distribution. The probability that a sensor
remains connected at all times. Source parameter estimation is carried
at location 𝐫𝑘𝑖 encounters 𝑧 ∈ ℤ+ ∪ {0} particles (z is a non-negative in-
out sequentially, and on each platform independently, using a newly
teger) during a time interval t0 is then:
developed Rao-Blackwellised particle filter. Platform motion control, in
the spirit of infotaxis, is based on entropy-reduction and is also carried (𝜇𝑘𝑖 )𝑧 𝑖
(𝑧; 𝜇𝑘𝑖 ) = 𝑒 − 𝜇𝑘 (4)
out independently on every platform. The mathematical models of con- 𝑧!
centration measurements and platform motion are similar to those in where 𝜇𝑘𝑖 = 𝑡0 ⋅ 𝑅(𝜼0 , 𝐫𝑘𝑖 ) is the mean number of particles expected to
[28], which developed a search algorithm assuming the centralised ar- reach the sensor at location 𝐫𝑘𝑖 during interval t0 . The likelihood func-
chitecture and all-to-all communication network. In addition, the source tion of a concentration measurement 𝑧𝑖𝑘 collected by ith sensor is then
estimation algorithm in [28] was a batch algorithm - which would be 𝓁(𝑧𝑖𝑘 |𝜼0 ) = (𝑧𝑖𝑘 ; 𝜇𝑘𝑖 ).
impossible to use in a decentralised architecture. At the time of pub-
lication of this manuscript, another decentralised infotaxic search was
2.2. Motion model
reported in [30].
The paper is organised as follows. Mathematical models are pre-
Let the pose vector of the ith robot platform (𝑖 = 1, … , 𝑁) at time
sented in Section 2. The method for decentralised sequential estimation
tk be denoted 𝜽𝑖𝑘 = [(𝐫𝑘𝑖 )⊺ , 𝜙𝑖𝑘 ]⊺ , where 𝐫𝑘𝑖 = [𝑥𝑖𝑘 , 𝑦𝑖𝑘 ]⊺ has already been
of the source parameters is described in Section 3. Next, decentralised
introduced and 𝜙𝑖𝑘 is the vehicle heading. The group of searching vehi-
formation control of the robotic platforms is explained in Section 4. Our
cles moves in a formation. The centroid of the formation at time tk is
proposed approach is then evaluated in Section 5 using simulated and
specified by coordinates:
experimental data. Finally, conclusions are drawn in Section 6.
𝑁 𝑁
1 ∑ 𝑖 1 ∑ 𝑖
𝑥𝑐𝑘 = 𝑥 , 𝑦𝑐𝑘 = 𝑦 . (5)
2. Mathematical models 𝑁 𝑖=1 𝑘 𝑁 𝑖=1 𝑘

For each platform 𝑖 = 1, … , 𝑁, the offset (Δxi , Δyi ) from the centroid
Search can be described as a repetitive cycle of sensing, estimation, de-
(𝑥𝑐𝑘 , 𝑦𝑐𝑘 ) is predefined and known to it (i.e. 𝑥𝑖𝑘 = 𝑥𝑐𝑘 + Δ𝑥𝑖 , 𝑦𝑖𝑘 = 𝑦𝑐𝑘 + Δ𝑦𝑖 ).
cision making for motion control and actuation (the execution of motion
The measurements of concentration are taken at time instants tk ,
control). This section introduces the mathematical models of sensing, ve-
𝑘 = 1, 2, …. Between two consecutive sensing instants, each platform is
hicle motion and communication, necessary for the autonomy enabling
moving. Let the duration of this interval (referred to as the travel time) for
functions of estimation and motion control.
the ith platform be 𝑇𝑘𝑖 ≥ 0. The assumption is that sensing is suppressed
during the travel time.
2.1. Measurement model Motion of the ith platform during interval 𝑇𝑘𝑖 is controlled by linear
velocity 𝑉𝑘𝑖 and angular velocity Ω𝑖𝑘 . Given that the motion control vec-
We assume that each vehicle is equipped with a sensor for measur- tor 𝐮𝑖𝑘 = [𝑉𝑘𝑖 , Ω𝑖𝑘 , 𝑇𝑘𝑖 ]⊺ is applied to the ith platform, its dynamics during a
ing the concentration of the emitted substance. Let us adopt an open short integration time interval 𝛿 ≪ 𝑇𝑘𝑖 can be modelled by a Markov pro-
field terrain and a two-dimensional geometry, where the ith vehicle po- cess whose transitional density is 𝜋(𝜽𝑖𝑡 |𝜽𝑖𝑡−𝛿 , 𝐮𝑖𝑘 ) =  (𝜽𝑖𝑡 ; 𝛽(𝜽𝑖𝑡−𝛿 , 𝐮𝑖𝑘 ), 𝐐).
sition at time tk is denoted by 𝐫𝑘𝑖 ∈ ℝ2 . The concentration measurement The process noise covariance matrix Q captures the uncertainty in mo-
is modelled using a Lagrange encounters model developed in [15]. Sup- tion due to the unforseen disturbances. The vehicle motion function
pose that the emitting source is located at coordinates specified by the 𝛽(𝜽𝑖𝑡−𝛿 , 𝐮𝑖𝑘 ) is:
vector 𝐫0 = [𝑋0 , 𝑌0 ]⊺ and its release rate, or strength, is Q0 . The goal of

search is to estimate the source parameter vector 𝜼0 = [𝐫0 𝑄0 ]⊺ in the ⎡𝑉𝑘𝑖 cos(𝜙𝑖𝑘−1 )⎤
⎢ 𝑖 ⎥
shortest possible time. The particles released from the source propagate 𝛽(𝜽𝑖𝑡−𝛿 , 𝐮𝑖𝑘 )
= 𝜽𝑖𝑡−𝛿
+ 𝛿 ⎢ 𝑉𝑘 sin(𝜙𝑖𝑘−1 ) ⎥ + 𝐁𝑖𝑘−1 , (6)
with combined molecular and turbulent isotropic diffusivity D, but can ⎢ 𝑖 ⎥
⎣ Ω 𝑘 ⎦
also be advected by wind. The released particles have an average life-
[ 𝛿 ]⊺
time 𝜏 before being absorbed. Let the average wind characteristics be the 𝑖
where vector 𝐁𝑖𝑘−1 = 𝜖𝑥 𝑇 𝑖
𝑖
𝜖𝑦 𝑖𝛿
0 is introduced to compensate
𝑇 𝑘 𝑘
speed U and direction, which by convention, coincides with the direc-
for a distortion of the formation due to process noise with parameters:
tion of the x axis. Suppose a spherical concentration measuring sensor
of small radius a is mounted on the ith robot, whose position at time 𝜖𝑥𝑖 = 𝑥̄ 𝑖𝑘−1 − (𝑥𝑖𝑘−1 − Δ𝑥𝑖 )
k is1 𝐫𝑘𝑖 = [𝑥𝑖𝑘 , 𝑦𝑖𝑘 ]⊺ . This sensor will experience a series of encounters
𝜖𝑦𝑖 = 𝑦̄𝑖𝑘−1 − (𝑦𝑖𝑘−1 − Δ𝑥𝑖 ).
with the particles released from the emitting source. The average rate
Here 𝑥̄ 𝑖𝑘−1 and 𝑦̄𝑖𝑘−1 are the estimates of the coordinates of the forma-
1
Robot locations are assumed to be non-coincidental with the source location tion centroid at 𝑘 − 1 (that is of 𝑥𝑐𝑘−1 and 𝑦𝑐𝑘−1 , respectively) available
r0 . to the ith platform. Coordinates 𝑥𝑖𝑘−1 and 𝑦𝑖𝑘−1 refer to the known ith

14
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

the communication graph is connected, after a sufficient number of iter-


ations (which depends on the topology of the graph), a complete list of
measurement trios (data) from all platforms in the formation, denoted
𝑘 = {(𝑥𝑖𝑘 , 𝑦𝑖𝑘 , 𝑧𝑖𝑘 )}1≤𝑖≤𝑁 , will be available at each platform.
Suppose the posterior density function of the source at discrete-
time 𝑘 − 1 and platform i be denoted 𝑝𝑖 (𝜼0 |1∶𝑘−1 ), where 1∶𝑘−1 ≡
1 , 2 , … , 𝑘−1 . Given 𝑝𝑖 (𝜼0 |1∶𝑘−1 ) and a new dataset 𝑘 , the prob-
lem of sequential estimation is to compute the posterior at time k, i.e.
𝑝𝑖 (𝜼0 |1∶𝑘 ). Using the Bayes rule, the posterior is
𝑔(𝑘 |𝜼0 ) 𝑝𝑖 (𝜼0 |1∶𝑘−1 )
𝑝𝑖 (𝜼0 |1∶𝑘 ) = (7)
∫ 𝑔(𝑘 |𝜼0 ) 𝑝𝑖 (𝜼0 |1∶𝑘−1 )𝑑 𝜼0

where 𝑔(𝑘 |𝜼0 ) is the likelihood function. Assuming that individual plat-
form measurements are conditionally independent, 𝑔(𝑘 |𝜼0 ) can be ex-
pressed as
𝑁
∏ 𝑁
∏ ( )
𝑔(𝑘 |𝜼0 ) = 𝓁(𝑧𝑖𝑘 |𝜼0 ) =  𝑧𝑖𝑘 ; 𝑄0 𝜌(𝐫0 , 𝐫𝑘𝑖 ) (8)
𝑖=1 𝑖=1

where

𝜌(𝐫0 , 𝐫𝑘𝑖 ) = 𝑡0 𝑅(𝜼0 , 𝐫𝑘𝑖 )∕𝑄0 (9)


Fig. 1. An example of a path of a formation of 𝑁 = 7 searching platforms at [ ] ( 𝑖 )
𝑡0 (𝑋0 − 𝑥𝑖𝑘 )𝑈 𝑑𝑘 (𝐫0 , 𝐫𝑘𝑖 )
𝑘 = 1, 2, 3. The communication graphs (based on established links between the = ( ) exp ⋅ 𝐾0 (10)
platforms) are indicated with green lines. (For interpretation of the references ln 𝜆𝑎 2𝐷 𝜆
to colour in this figure legend, the reader is referred to the web version of this
article.) is independent of Q0 .
We will compute the posterior density 𝑝𝑖 (𝜼0 |1∶𝑘 ) using the Rao-
vehicle position at 𝑘 − 1. Fig. 1 illustrates the trajectories of 𝑁 = 7 au- Blackwell dimension reduction scheme [32]. Using the chain rule, the
tonomous vehicles in a formation using the described transitional den- posterior can be expressed as:
sity 𝜋(𝜽𝑖𝑡 |𝜽𝑖𝑡−𝛿 , 𝐮𝑖𝑘 ). 𝑝𝑖 (𝜼0 |1∶𝑘 ) = 𝑝𝑖 (𝑄0 |𝐫0 , 1∶𝑘 ) ⋅ 𝑝𝑖 (𝐫0 |1∶𝑘 ) (11)
In the absence of process noise (i.e. 𝐐 = 0), the vehicles would move
in a perfect formation if (a) all control vectors are identical (i.e. 𝐮1𝑘 = where the posterior of source strength 𝑝𝑖 (𝑄0 |𝐫0 , 1∶𝑘 ) will be worked
𝐮2𝑘 = … = 𝐮𝑁 𝑘
), and (b) all headings are identical (i.e. 𝜙1𝑘−1 = 𝜙2𝑘−1 = … = out analytically, while the posterior of source position 𝑝𝑖 (𝐫0 |1∶𝑘 ) will
𝜙𝑖𝑘−1 ). In this case each platform would know the true coordinates of the be computed a using a particle filter.
formation centroid (i.e. 𝑥̄ 𝑖𝑘 = 𝑥𝑐𝑘 , 𝑦̄𝑖𝑘 = 𝑦𝑐𝑘 , for 𝑖 = 1, … , 𝑁) and hence the Following [33], we express the posterior 𝑝𝑖 (𝑄0 |𝐫0 , 1∶𝑘−1 ) with the
correction vectors 𝐁𝑖𝑘−1 would be zero. Gamma distribution whose shape and scale parameters are 𝜅𝑘−1 and
𝜗𝑘−1 , respectively. That is
2.3. Communication model
𝑝𝑖 (𝑄0 |𝐫0 , 1∶𝑘−1 ) = (𝑄0 ; 𝜅𝑘−1 , 𝜗𝑘−1 )
(𝜅 −1) −𝑄 ∕𝜗
Let us assume that a vehicle can communicate with another vehicle 𝑄0 𝑘−1 𝑒 0 𝑘−1
in the formation if their mutual distance is smaller than Rmax . Because = 𝜅 . (12)
𝜗𝑘𝑘−1
−1
Γ(𝜅𝑘−1 )
of process noise, the distance between the vehicles in the formation will
vary and hence the topology of the communication network graph can Since the conjugate prior of the Poisson distribution is the Gamma dis-
also vary. For simplicity we will assume that communication links (when tribution [34], the posterior 𝑝(𝑄0 |𝐫0 , 1∶𝑘 ) is also a Gamma distribution
established) are error free. Fig. 1 illustrates the communication graphs with updated parameters 𝜅 k and ϑk , i.e. 𝑝(𝑄0 |𝐫0 , 1∶𝑘 ) = (𝑄0 ; 𝜅𝑘 , 𝜗𝑘 ).
of a formation of platforms at three consecutive time instants. All three The computation of 𝜅 k and ϑk can be carried out analytically as a func-
communication graphs are connected2 , although the topology is time- tion of r0 and the measurement set 𝑘 = {(𝐫𝑘𝑖 , 𝑧𝑖𝑘 )}1≤𝑖≤𝑁 [33]:
varying.
𝑁

𝜅𝑘 = 𝜅𝑘−1 + 𝑧𝑖𝑘 , (13)
3. Decentralised sequential estimation
𝑖=1

We adopt the measurement dissemination based decentralised archi- 𝜗𝑘−1


𝜗𝑘 = ∑𝑁 . (14)
tecture [31], where the measurement locations3 and the corresponding 1 + 𝜗𝑘−1 𝜌(𝐫0 , 𝐫𝑘𝑖 )
𝑖=1
measured concentration values, i.e. the measurement trios (𝑥𝑖𝑘 , 𝑦𝑖𝑘 , 𝑧𝑖𝑘 ),
are exchanged via the communication network. The protocol is iter- The parameters of the prior for source strength, 𝑝(𝑄0 ) = (𝜅0 , 𝜗0 ) are
ative. In the first iteration, platform i broadcasts its trio to its neigh- chosen so that this density covers a large span of possible values of Q0 .
bours and receives from them their measurement trios. In the second, Next we turn our attention to the posterior of source position
third and all subsequent iteration, platform i broadcasts its newly ac- 𝑝𝑖 (𝐫0 |1∶𝑘 ) in the factorised form (11). Given 𝑝(𝐫0 |1∶𝑘−1 ), the update
quired trios to the neighbours, and accepts from them only the trios step of the particle filter using 𝑘 applies the Bayes rule:
that this platform has not seen before (newly acquired). Providing that
𝑔(𝑘 |𝐫0 , 1∶𝑘−1 ) 𝑝(𝐫0 |1∶𝑘−1 )
𝑝(𝐫0 |1∶𝑘 ) = (15)
𝑓 (𝑘 |1∶𝑘−1 )
2
A communication graph is connected if there exists at least one path between
any two nodes in the graph [31]. where 𝑓 (𝑘 |1∶𝑘−1 ) = ∫ 𝑔(𝑘 |𝐫0 , 1∶𝑘−1 ) 𝑝(𝐫0 |1∶𝑘−1 )𝑑𝐫0 is a normalisa-
3
Because the measurement locations are assumed to be known exactly, they tion constant. The problem in using (15) is that the likelihood function
will not be treated as random variables. 𝑔(𝑘 |𝐫0 , 1∶𝑘−1 ) is unknown; only 𝑔(𝑘 |𝜼0 ) of (8) is known. Fortunately,

15
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

it is possible to derive an analytic expression for 𝑔(𝑘 |𝐫0 , 1∶𝑘−1 ) (see 4.1. Selection of individual control vectors
Appendix):
𝜅 𝑖 Platform 𝑖 = 1, … , 𝑁 autonomously decides on 𝐮𝑖𝑘 using the infotaxis
𝜗𝑘𝑘 Γ(𝜅𝑘 ) 𝑁
∏ 𝜌(𝐫0 , 𝐫𝑘𝑖 )𝑧𝑘 strategy [15], which can be formulated as a partially observed Markov
𝑔(𝑘 |𝐫0 , 1∶𝑘−1 ) = 𝜅 (16)
𝜗𝑘𝑘−1
−1
Γ(𝜅𝑘−1 ) 𝑖=1 𝑧𝑖𝑘 ! decision process (POMDP) [35]. The elements of POMDP are (i) the in-
formation state, (ii) the set of admissible actions and (iii) the reward
The Rao-Blackwellised particle filter (RBPF) fully describes the pos- function. The information state at time 𝑡𝑘−1 is the posterior density
terior 𝑝𝑖 (𝜼0 |1∶𝑘 ) by a particle system 𝑝𝑖 (𝜼0 |1∶𝑘−1 ); it accurately specifies the ith platform current knowledge
about the source position and its release-rate. Admissible actions can be
𝑘𝑖 ≡ {𝑤𝑚,𝑖
𝑘
, 𝐫0𝑚,𝑖 , 𝜅 𝑖 , 𝜗𝑚,𝑖 }
,𝑘 𝑘 𝑘 1≤𝑚≤𝑀
.
formed with one or multiple steps ahead. A decision in the context of
Here M is the number of particles, 𝑤𝑚,𝑖 is a (normalised) weight as- search is the selection of a control vector 𝐮𝑖𝑘 ∈  which will maximise
𝑘
sociated with the source position sample 𝐫0𝑚,𝑖 , while 𝜅𝑘𝑖 and 𝜗𝑚,𝑖 are the reward function. According to Section. 2.2, the space of admissi-
,𝑘 𝑘
the parameters of the corresponding Gamma distribution for the source ble actions  is continuous with dimensions: linear velocity V, angular
strength. Initially, at time 𝑘 = 0, the weights are uniform (and equal to velocity Ω and duration of motion T. In order to reduce the computa-
𝑚,𝑖
1/M), {𝐫𝑘, } are the points on a regular grid covering a specified search tional complexity of numerical optimisation,  is adopted as a discrete
0
set with only myopic (one step ahead) controls. In addition,  is time-
area, while 𝜅0𝑖 = 𝜅0 and 𝜗𝑚,𝑖 = 𝜗0 .
0 invariant and identical for all platforms. If 𝕍 , 𝕆 and 𝕋 denote the sets
Sequential computation of the posterior 𝑝𝑖 (𝜼0 |1∶𝑘 ) using the RBPF
of possible discrete-values of V, Ω and T, respectively, then  is the
is carried out by a recursive update of the particle system 𝑘𝑖 over time.
Cartesian product 𝕍 × 𝕆 × 𝕋 . The myopic selection of the control vector
While the steps of a single cycle of the RBPF presented in Algorithm 1 are
at time tk on platform i is expressed as:
{ [ ]}
Algorithm 1 A single cycle of the RBPF on platform i. 𝐮𝑖𝑘 = arg max 𝔼  𝑝𝑖 (𝜼0 |1∶𝑘−1 ), 𝑧𝑖𝑘 (𝐯) (17)
𝐯∈
1: Inputs:
where  is the reward function and 𝑧𝑖𝑘 is the future concentration mea-
2: ∙ 𝑘𝑖 −1 ≡ {𝑤𝑚,𝑖 , 𝐫 𝑚,𝑖 , 𝜅 𝑖 , 𝜗𝑚,𝑖 }
𝑘−1 0,𝑘−1 𝑘−1 𝑘−1 1≤𝑚≤𝑀
,
surement collected by the ith platform if the platform moved under the
3: ∙ 𝑘 = {(𝑥𝑘 , 𝑦𝑘 , 𝑧𝑖𝑘 )}1≤𝑖≤𝑁
𝑖 𝑖
control 𝐯 ∈  to position (𝑥𝑖𝑘 , 𝑦𝑖𝑘 ). In reality, this future measurement is
4: Compute 𝜅 𝑖 using (13) not available (the decision has to be made at time 𝑡𝑘−1 ), and therefore
𝑘
5: for 𝑚 = 1, … , 𝑀 do the expectation operator 𝔼 with respect to the prior measurement PDF
6: Compute 𝜗𝑚,𝑖 𝑘
using (14) and 𝐫0𝑚,𝑖 ,𝑘−1 features in (17).
7: Compute 𝑔(𝑘 |𝐫0𝑚,𝑖
,𝑘−1
, 1∶𝑘−1 ) using (16) and 𝜗𝑚,𝑖
𝑘 Previous studies of search strategies [16,23] found that the reward
8: Update weight: 𝑤̃ 𝑚,𝑖𝑘
= 𝑤𝑚,𝑖
𝑘−1
⋅ 𝑔(𝑘 |𝐫0𝑚,𝑖
,𝑘−1
, 1∶𝑘−1 ) function defined as the entropy reduction, results in the most efficient
9: end for search. Hence we adopt the expected reward defined as
∑ { [ ]}
10: Norm. 𝑤
𝑚,𝑖
= 𝑤̃ 𝑚,𝑖 ∕ 𝑀 𝑤̃ 𝑚,𝑖 , for 𝑚 = 1, … , 𝑀
𝑘 𝑘
(∑ 𝑚=1 𝑘 )−1 𝑖 = 𝔼  𝑝𝑖 (𝜼0 |1∶𝑘−1 ), 𝑧𝑖𝑘 (𝐯)
𝑀 𝑚,𝑖 2 ( )
11: Compute 𝑀eff = 𝑚=1 (𝑤𝑘 ) = 𝐻𝑘𝑖 −1 − 𝔼{𝐻𝑘𝑖 𝑧𝑖𝑘 (𝐯) } (18)
12: for 𝑚 = 1, … , 𝑀 do
where 𝐻𝑘−1 is the current differential entropy, defined as
13: if 𝑀eff < 𝑀∕2 then
14: Draw 𝑗 𝑚 ∈ {1, …, 𝑀} based on {𝑤𝑚,𝑖 }
𝑘 𝐻𝑘𝑖 −1 = − 𝑝𝑖 (𝜼0 |1∶𝑘−1 ) ln 𝑝𝑖 (𝜼0 |1∶𝑘−1 )𝑑 𝜼0 , (19)
15: 𝑤𝑚,𝑖 = 1∕ 𝑀 ∫
𝑘
𝑗 ,𝑖
16: 𝐫0𝑚,𝑖
,𝑘
← 𝐫0𝑚,𝑘−1 + 𝜀𝑚 while 𝐻𝑘𝑖 (𝑧𝑖𝑘 (𝐯)) ≤ 𝐻𝑘−1 is the future differential entropy (after a hypo-
17: Recompute 𝜗𝑚,𝑖 using (14) and 𝐫0𝑚,𝑖 thetical control vector v has been applied to collect 𝑧𝑖𝑘 ):
𝑘 ,𝑘
18: else
19: 𝐫0𝑚,𝑖 ← 𝐫0𝑚,𝑖 𝐻𝑘𝑖 (𝑧𝑖𝑘 (𝐯)) = − 𝑝𝑖 (𝜼0 |1∶𝑘−1 , 𝑖𝑘 (𝐯)) ln 𝑝𝑖 (𝜼0 |1∶𝑘−1 , 𝑖𝑘 (𝐯))𝑑 𝜼0 , (20)
,𝑘 ,𝑘−1 ∫
20: end if
21: end for where 𝑖𝑘 = (𝑥𝑖𝑘 , 𝑦𝑖𝑘 , 𝑧𝑖𝑘 ). The expectation operator 𝔼 in (18) is
22: Output: 𝑘𝑖 ≡ {𝑤𝑚,𝑖 , 𝐫0𝑚,𝑖 , 𝜅 𝑖 , 𝜗𝑚,𝑖 } with respect to the probability mass function 𝑃 {𝑧𝑖𝑘 |1∶𝑘−1 } =
𝑘 ,𝑘 𝑘 𝑘 1≤𝑚≤𝑀
∫ 𝓁(𝑧𝑖𝑘 |𝜼0 ) 𝑝𝑖 (𝜼0 |1∶𝑘−1 )𝑑 𝜼0 , that is:
( ) ∑ ( )
𝔼{𝐻𝑘𝑖 𝑧𝑖𝑘 (𝐯) } = 𝑃 {𝑧𝑖𝑘 |1∶𝑘−1 } ⋅ 𝐻𝑘 𝑧𝑖𝑘 (𝐯) . (21)
self explanatory, we only point out that Lines 14–17 perform resampling
𝑧𝑖𝑘
of particles. In order to diversify particles, a small jitter 𝜀m is introduced
in Line 16 to the positional particles. Given that 𝑝𝑖 (𝜼0 |1∶𝑘−1 ) is approximated by a particle system 𝑘𝑖 −1 ,
one can approximately compute 𝐻𝑘𝑖 −1 , which features in (18), as
4. Decentralised formation control 𝑀

𝐻𝑘𝑖 −1 ≈ − 𝑤𝑚,𝑖
𝑘−1
ln 𝑤𝑚,𝑖
𝑘−1
. (22)
In decentralised multi-robot search, each platform autonomously 𝑚=1
makes a decision at time 𝑡𝑘−1 about its next control vector (or action) ( )
In order to compute 𝔼{𝐻𝑘𝑖 𝑧𝑖𝑘 (𝐯) } of (21), first note that 𝑃 {𝑧𝑖𝑘 |1∶𝑘−1 } =
𝐮𝑖𝑘 . Selection of individual actions will be discussed in Section 4.1. Al- 𝑖 𝑖 𝑖
(𝑧𝑘 ; 𝜇̂𝑘−1 ), where 𝜇̂𝑘−1 is the predicted mean rate of chemical particle
though the same RBPF code is meant to be executed in parallel on every
encounters at location 𝐫𝑘𝑖 (where the platform i would move after apply-
platform, even if the same dataset 1∶𝑘 is available to all of them, the
ing a hypothetical control v), computed based on 1∶𝑘−1 . According to
particle systems on individual platforms 𝑘𝑖 , 𝑖 = 1, … , 𝑁, may not be the
Section 2.1,
same, because the local pseudo-number generators (used by particle fil-
𝑀

ters) could be different. As a result, in general, actions {𝐮𝑖𝑘 }1≤𝑖≤𝑁 can
𝜇̂𝑘𝑖 −1 ≈ 𝑤𝑚,𝑖 𝜅 𝑖 𝜗𝑚,𝑖 𝜌(𝐫0𝑚,𝑖
𝑘−1 𝑘−1 𝑘−1
, 𝐫𝑖 )
,𝑘−1 𝑘
(23)
be different. There is a need, therefore, to impose some form of coor-
𝑚=1
dination between the platforms in order to collectively maintain a pre-
scribed geometric shape of the multi-platform formation and thus avoid where the product 𝜅𝑘𝑖 −1 𝜗𝑚,𝑖
𝑘−1
approximates the source release rate as the
its break-up. Coordination will be discussed in Section. 4.2. mean of the Gamma distribution with parameters (𝜅𝑘𝑖 −1 , 𝜗𝑚,𝑖
𝑘−1
). Next we

16
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

find the value of zmax as the minimum value of z′ such that the cu- Let us denote the scalar value of interest by bi , that is
∑′
mulative distribution function 𝑧𝑧=0 (𝑧; 𝜇̂𝑘𝑖 −1 ) is greater than a certain
𝑏𝑖 ∈ {𝑉𝑘𝑖 , Ω𝑖𝑘 , 𝑇𝑘𝑖 , 𝑥̄ 𝑖𝑘−1 , 𝑦̄𝑖𝑘−1 , 𝜙𝑖𝑘−1 }.
threshold 1 − 𝜖, where 𝜖 ≪ 1. The summation (21) is then computed only
( )
for 𝑧𝑖𝑘 = 0, 1, … , 𝑧max . Computation of 𝐻𝑘 𝑧𝑖𝑘 (𝐯) is carried out accord- Ideally we want every platform in the formation to compute the mean

ing to (22), except that 𝑤𝑚,𝑖 is replaced with 𝑤𝑚,𝑖 = 𝑤𝑚,𝑖 ⋅ (𝑧𝑖𝑘 ; 𝜇𝑘𝑚,𝑖 value 𝑏̄ = 𝑁1 𝑁 𝑖=1 𝑏𝑖 . If all platforms in the formation were to use identi-
𝑘−1 𝑘 𝑘−1 −1
)
where cal average values for motion control, centroid coordinates and heading,
then their motion would be coordinated (except for process noise, which
𝜇𝑘𝑚,𝑖 = 𝜅𝑘𝑖 −1 𝜗𝑚,𝑖 𝜌(𝐫0𝑚,𝑖 , 𝐫 𝑖 ).
−1 𝑘−1 ,𝑘−1 𝑘 will be taken care of through vector 𝐁𝑖𝑘−1 in (6)) and the shape of the
Thus (21) is approximated with formation would be maintained (provided Rmax is adequate).
[ 𝑀 ] Average consensus is an iterative algorithm. At iteration 𝑠 = 0, the
𝑧max
( ) ∑ ∑ 𝑚,𝑖 𝑚,𝑖 node in the communication graph (the robot platform) will initialise
𝑖 𝑖 𝑖
𝔼{𝐻𝑘 𝑧𝑘 (𝐯) } ≈ (𝑧; 𝜇̂𝑘−1 ) − 𝑤𝑘 ln 𝑤𝑘 (24)
𝑧=0 𝑚=1 its state bi (0) using either a component of vector 𝐮𝑖𝑘 (if bi is a motion
control parameter) or the platform pose 𝜽𝑖𝑘−1 (if bi is a formation centroid
Pseudo-code of the routine for the computation of control vector on coordinate or heading angle). This value is locally available. The initial
platform i is given by Algorithm 2. values of centroid coordinates are the actual ith platform coordinates,
i.e. 𝑥̄ 𝑖𝑘−1 (0) = 𝑥𝑖𝑘−1 and 𝑦̄𝑖𝑘−1 (0) = 𝑦𝑖𝑘−1 . At each following iteration 𝑠 =
Algorithm 2 Computation of 𝐮𝑖𝑘 . 1, 2, … , each platform updates its state with a linear combination of its
own state and the states of its current neighbours. Let us denote the set
1: Input: 𝑘𝑖 −1 ≡ {𝑤𝑚,𝑖 , 𝐫 𝑚,𝑖 , 𝜅 𝑖 , 𝜗𝑚,𝑖 }
𝑘−1 0,𝑘−1 𝑘−1 𝑘−1 1≤𝑚≤𝑀
,
of current neighbours of platform i by 𝑖 . Then [36]:
2: Compute 𝐻𝑘−1 using (22)
( )
3: Create admissible set  = 𝕍 × 𝕆 × 𝕋 | | 1 ∑
for every 𝐯 ∈  do 𝑏𝑖 (𝑠) = 1 − 𝑖 𝑏𝑖 (𝑠 − 1) + 𝑏 (𝑠 − 1) (25)
4: 𝑁 𝑁 𝑗∈ 𝑗
5: Compute the future platform location 𝐫𝑘𝑖 (𝐯) using (6) with 𝐁𝑖𝑘−1 = 0 𝑖

6: Compute 𝜇̂𝑘𝑖 −1 using (23) where |𝑖 | is the number of neighbours of platform i. This particular lin-
∑𝑧max
7: Determine 𝑧max s.t. 𝑧=0 (𝑧; 𝜇̂𝑘𝑖 −1 ) > 1 − 𝜖 ear combination is based on the so-called maximum degree weights [36].
𝑖
( 𝑖 )
8: Compute 𝔼{𝐻𝑘 𝑧𝑘 (𝐯) } using (24) Other weights can be also used. It can be shown that if the communica-
9: Calculate the expected reward 𝑖 using (18) tion graph is connected, the values bi (s) after many iterations converge
10: end for to the mean 𝑏̄ [36].
11: Find 𝐮𝑖𝑘 using (17) The search continues until the global stopping criterion is satisfied.
12: Output: 𝐮𝑖𝑘 The local stopping criterion is calculated on each platform indepen-
dently based on the spread of the local positional particles {𝐫0𝑚,𝑖
,𝑘
}, mea-
sured by the square-root of the trace of its sample covariance matrix
Ck . For example, if the spread of particles on platform i is smaller than
4.2. Cooperative control through consensus a certain threshold ϖ, then the local stopping criterion is satisfied and
is given a value of one, otherwise it is zero. This local stopping crite-
So far we have explained how platform i would independently of rion value (zero or one) becomes the initial state of the global stopping
the other platforms in the formation determine the best action for it- criterion on platform i, denoted 𝜎 i (0):
self, i.e. 𝐮𝑖𝑘 . In general, individual platforms may disagree on the best { √
1 if tr[𝐂𝑘 ] < 𝜛
action, and in the extreme 𝐮1𝑘 ≠ 𝐮2𝑘 ≠ … , ≠ 𝐮𝑁 𝑘
. This disagreement is un- 𝜎𝑖 (0) = (26)
0 otherwise
desirable because it can lead to the break-up of the multi-platform for-
mation. The break-up can have detrimental effects for two reasons: it The global stopping criterion is computed on each platform using the
can cause some platforms to crash into each other, and it can cause the average consensus algorithm, using (25), but with bi replace by 𝜎 i . After
loss of connectivity in the communication graph. Initially, at 𝑘 = 0, the a sufficient number of iterations, S, platform i decides to stop the search
formation is created in such a manner that its communication graph is if at least one of the platforms in the formation has reached the local
connected. The goal of cooperative control is to maintain the shape of stopping criterion, that is, if 𝜎 i (S) > 0.
the formation during the mission and thereby keep the communication
graph connected. For this to be achieved, during each motion period Remark. Both estimation and control are based on the consensus algo-
(from time 𝑡𝑘−1 to tk ), the platforms need to reach an agreement on the rithm. While the cooperative control is using the average consensus (25),
common action uk to be applied to all of them. But this is not sufficient - the decentralised measurement dissemination of Section 3 achieves the
according to the motion model in Section 2.2, the platforms also need to consensus on the set of measurements at time k. The consensus algo-
compute the formation centroid coordinates and agree on the common rithm is iterative and hence its convergence properties are very impor-
heading angle 𝜙𝑘−1 to be applied in (6). tant. First note that, although the network topology changes with time
We apply decentralised cooperative control based on the average (as the robots move while searching for the source), during the short in-
consensus [36,37] in order to achieve this goal. In a network of col- terval of time when the exchange of information takes place, the topol-
laborating agents, consensus is an iterative protocol designed to reach ogy can be considered as time-invariant. Furthermore, assuming bidi-
an agreement regarding a certain quantity of interest, by computing its rectional communication between the robots in formation, the network
global mean value via local computations. Suppose that every platform, topology can be represented by an undirected graph, The convergence of
as a node in the communication network, initially has an individual the consensus algorithm for a time-invariant undirected communication
scalar value. The goal of average consensus is for every node in the net- topology is guaranteed if the graph is connected [37–39]. Note that this
work to compute the average of initial scalar values, in a completely theoretical result is valid for an infinite number of iterations. In prac-
decentralised manner: by communicating only with the neighbours in tice, if the communication graph at some point of time is not connected,
the communication graph (without knowing the topology of the commu- or if an insufficient number of consensus iterations is performed, it may
nication graph). In our case, there is not only a single individual scalar happen that one or more robots are lost (they could re-join the forma-
value, but six of them. They include three motion control parameters, tion only by coincidence). This event, however, does not mean that the
i.e. for platform i, 𝑉𝑘𝑖 , Ω𝑖𝑘 and 𝑇𝑘𝑖 , two formation centroid coordinates, search mission has failed: the emitting source will be found eventually,
i.e. 𝑥̄ 𝑖𝑘−1 , 𝑦̄𝑖𝑘−1 and the heading angle of each platform 𝜙𝑖𝑘−1 . albeit by a smaller formation in possibly longer interval of time.

17
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

Table 1
Source location estimates by in-
dividual robots at the end of
search. The true coordinates are
𝑋0 = −166, 𝑌0 = −147

robot i ̂0
𝑋 𝑌̂0

1 −167.59 −146.24
2 −167.53 −146.53
3 −168.74 −147.24
4 −167.49 −146.06
5 −168.02 −146.52
6 −167.07 −146.51
7 −167.90 −147.77
Fig. 3. An illustrative run of the decentralised multi-robot search: the prior (red
line) and the final posterior density (blue line) of the source strength parameter.
(For interpretation of the references to colour in this figure legend, the reader
is referred to the web version of this article.)

Fig. 4. The number of links in the communication graph over time.

5. Numerical results

5.1. An illustrative run

In order to illustrate the proposed decentralised multi-robot search,


let us consider the following scenario. All physical quantities are in
arbitrary units (a.u.). The search domain is a square, whose area is
500 × 500, centered at coordinates (0,0). The true source parameters
are 𝑋0 = −166, 𝑌0 = −147 and 𝑄0 = 5. The centroid of the multi-robot
formation is initially at coordinates 𝑥𝑐0 = 225, 𝑦𝑐0 = 240. The formation
is initially in the shape of a regular hexagon, whose side length is
20. There are 𝑁 = 7 robot platforms in the formation, one in the cen-
tre and six on the vertices of the hexagon. The initial heading angles
of the seven platforms are random, with mean −130◦ and standard
deviation of 2∘ . The process noise covariance matrix is diagonal, i.e.
𝐐 = diag[2.5 ⋅ 10−4 𝛿, 2.5 × 10−4 𝛿, 2.5 × 10−5 ], where the motion integra-
tion interval 𝛿 = 0.25. The communication distance is 𝑅max = 25, result-
ing in the communication network graph shown in Fig. 1 at 𝑘 = 1. Envi-
ronmental and sensor parameters are chosen as follows: 𝐷 = 1, 𝜏 = 250,
𝑈 = 0.25, 𝑎 = 1 and 𝑡0 = 1.
Algorithm parameters are selected as follows: 𝜅0 = 3, 𝜗0 = 5.2, 𝑀 =
252 , 𝕍 = {1}, 𝕆 = {−3, −2, −1, 0, 1, 2, 3} degrees per unit of time, 𝕋 =
{0.5, 1, 2, 4, 8, 16, 32, 64}. The number of iterations, both for the exchange
of measurement trios and in the consensus algorithm, is fixed to 30. The
local search stopping threshold is 𝜛 = 3.
The results obtained by an illustrative run of decentralised multi-
Fig. 2. An illustrative run of the decentralised multi-robot search - the source robot search are discussed next. Fig. 2 is the top-down view of the search
position particles {𝐫0𝑚,𝑖
,𝑘
} for platform 𝑖 = 1 at discrete-time (a) 𝑘 = 16; (b) 𝑘 = 43. area, displaying the trajectories of all seven searching robots as well as
the the source position particles {𝐫0𝑚,𝑖 ,𝑘
} for platform 𝑖 = 1 at step index
(a) 𝑘 = 16 and (b) 𝑘 = 43. The source location is marked by an asterisk,

18
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

Fig. 7. Simulation results for the decentralised multi-robot search when using
Fig. 5. Simulation results for the decentralised multi-robot search when using 𝑁 = 1, 7, 19 and 37 searching platforms. Graph (a) shows the mean search time
𝑀 = 152 , 202 , 252 , 302 and 502 particles in the searching algorithm. Graph (a) and (b) shows the mean absolute error in the estimation of the source position.
shows the mean search time and (b) shows the mean absolute error in the esti- The error bars correspond to the 5th and 95th quantiles and all results are ob-
mation of the source’s position. The error bars correspond to the 5th and 95th tained over 200 Monte Carlo simulations.
quantiles and all results are obtained over 200 Monte Carlo simulations.

decided simultaneously to terminate the search at 𝑘 = 53. The estimates


while the resulting mean plume is indicated by a contour plot. Note that of source location coordinates, computed by all platforms in the team
the plume size is much smaller than the search area. Fig. 2a shows the are given in Table 1. The estimates are obtained as the weighted sample
particles before resampling. The particles are placed on a regular grid means of {𝑤𝑚,𝑖 , 𝐫 𝑚,𝑖 }, 𝑖 = 1, … , 𝑁 = 7. While the estimates at indi-
𝑘=53 0,𝑘=53
thus mimicking a grid-based approach, with the value of particle weights vidual platforms are different, their standard deviations are small: 0.53
indicated by a gray-scale intensity plot (white means a zero weight). for 𝑋̂0 and 0.6 for 𝑌̂0 .
Since the wind direction coincides with the x axis, and all concentra- Fig. 3 shows the prior and the final posterior density of source
tion measurements until 𝑘 = 16 were zero, the weights of the particles strength, p(Q0 ) and 𝑝(𝑄0 |𝐫0 , 1∶𝑘=53 ), respectively. The support of the
left from robot trajectories are small (almost zero). This provides a good posterior includes the true value of Q0 and is more concentrated than
visual representation of the posterior 𝑝(𝐫0 |1∶𝑘 ). Fig. 2b shows the situ- the prior.
ation after a non-zero concentration measurement was collected by the Continuing with the same illustrative run, Fig. 4 shows the number
multi-robot team (at 𝑘 = 43). The positional particles have been resam- of links in the communication network graph, over time. Initially, for
pled at this point of time and moved closer to the true source location. the regular hexagon shaped formation, the number of links is 12 (see
The total duration of the search on this run was 𝑘 = 53. All six platforms Fig. 1 at 𝑘 = 1). Because of deformations of the shape of formation as

Fig. 6. Examples of platform formations. In (a) 𝑁 = 7


platforms, in (b) 𝑁 = 19 platforms and in (c) 𝑁 = 37
platforms. The communication graphs, detailing the
communication links between the platforms, are indi-
cated with green lines. (For interpretation of the ref-
erences to colour in this figure legend, the reader is
referred to the web version of this article.)

19
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

robots in the team is lost. If the communication range Rmax is reduced,


it is possible that some of the nodes in the graph become at some stage
disconnected, leading to the formation break-up. The search mission
may be successfully accomplished even if this happens, however, some
of the platforms are then lost.

5.2. Monte Carlo simulations

We now use Monte Carlo simulations to characterise the perfor-


mance of the search algorithm. Unless otherwise specified, the number
of Monte Carlo simulations is 200, and the parameters describing the
scenario and algorithm are the same as specified in the previous sec-
tion. The exception is that we now consider a search area of 750 × 750.
Fig. 8. Graph showing the probability that a platform is lost during the search
We start by analysing the performance of the algorithm as the
task as the communication distance Rmax increases. The solid blue line relates to
number of particles M increases. Specifically, we consider 𝑀 =
using 20 iterations during consensus, the red dashed line to using 30 iterations
152 , 202 , 252 , 302 and 502 particles, which correspond respectively to an
and the green starred line to using 40 iterations. The results correspond to 𝑁 = 7
platforms. (For interpretation of the references to colour in this figure legend, average of one particle per 50 × 50, 37.5 × 37.5, 30 × 30, 25 × 25 and
the reader is referred to the web version of this article.) 15 × 15 a.u.2 in the search area. The results of this analysis are shown
in Fig. 5. Graph (a) shows the mean search time and graph (b) shows
the mean absolute error in the estimation of the source’s position. From
the robots move, the topology of the network varies. This is reflected in the first graph, we observe that using 𝑀 = 502 particles results in the
occasional drops in the number of links. Nevertheless, throughout the smallest mean search time however there is no obvious relationship be-
search, the communication graph remains connected and none of the tween the search time and M. In contrast, the second graph indicates that

Fig. 9. An example of the decentralised multi-robot search using 𝑁 = 37 platforms. Graphs (a)–(d) show the positions and trajectories of the platforms at step indices
𝑘 = 0, 10, 18 and 24, respectively. The concentration of the plume is represented by the blue contours and the position of the particles for 𝑖 = 1 platform by the black
dots.

20
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

Fig. 10. An illustrative run of the decentralised multi-robot search on the experimental dataset using 𝑁 = 7 platforms. Graphs (a)–(d) show the positions and
trajectories of the platforms at step indices 𝑘 = 0, 12, 22 and 32, respectively. The concentration of the plume is represented in grayscale (darker colours represent
higher concentration).

increasing the number of particles reduces the error in the estimation of number of platforms increases; the mean search time decreases from
the source’s position; the mean absolute error decreases from 4.5 a.u to 18275 a.u. to 2059 a.u. as the number of platforms increases from 1
2.5 a.u. as M increases from 152 to 502 particles. However, of course, to 37. Similarly, the second graph indicates that the mean absolute er-
using more particles increases the computational overhead for a single ror in the estimation of the source’s position decreases with the num-
platform. ber of platforms; the mean absolute of error when using 1 platform is
Next, we analyse the performance of the algorithm as the number of 3.5 a.u. and then reduces to 2.7 a.u. when using 37 platforms. These
platforms increases as follows: 𝑁 = 1, 7, 19 and 37 platforms. The geo- results are not unexpected, we would hope that using more search
metric formations for 𝑁 = 7, 19 and 37 platforms are illustrated in Fig. 6. platforms would decrease the search time and locate the source with
Now, increasing the size of the formations also increases the size of the greater accuracy. In terms of the communications between platforms,
communication network, thus more iterations will be required in order the mean number of links in the communication graphs were 11.35 for
to reach consensus. Accordingly, the number of iterations performed in 𝑁 = 7 platforms, 35.64 for 𝑁 = 19 platforms and 67.05 for 𝑁 = 37 plat-
the consensus algorithm is set to 30, 220 and 920 for 𝑁 = 7, 19 and 37 forms. Relative to the number of links the formations initially started
platform formations, respectively. These iteration numbers are based with, these values equate to a percentage decrease of 5.42% for 𝑁 = 7
on calculating the theoretical convergence time of the initial communi- platforms, 15.14% for 𝑁 = 19 platforms and 14.04% for 𝑁 = 37 plat-
cation graphs using the formula in [39]. Note that we return to using forms. Thus, these results indicate that the larger platform formations
𝑀 = 252 in the search algorithm. were more dispersed during the searching task. An example of one sim-
The results of increasing the number of platforms are shown in Fig. 7: ulation for the 𝑁 = 37 platform formation is shown in Fig. 9. Note
graph (a) shows the mean search time and graph (b) shows the mean that the figure shows that one of the platforms becomes separated
absolute error in the estimation of the source’s position. From the first from the formation however the remaining formation still locates the
graph, we observe a significant decrease in mean search time as the source.

21
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

Finally, we examine the stability of the platform formation whilst using a dataset collected experimentally by releasing a fluorescein dye
searching for the source. Specifically, we vary the communication dis- in a large recirculating water channel.
tance Rmax from 20 to 30 and calculate the probability a formation of There many avenues for future work. One possibility is to incorpo-
size 𝑁 = 7 breaks up when using 20, 30 and 40 iterations to reach a rate a feedback term when making movement decisions with the aim
consensus on moving. The resulting probabilities, as a function of com- of keeping the platforms within communication distance of each other.
munication distance Rmax , are shown in Fig. 8. The solid blue line corre- Such an approach could allow a wider range of formation topologies
sponds to using 20 iterations in the consensus algorithm, the red dashed that evolve in time as the searching task progresses. Another possibility
line to 30 iterations and the green starred line to 40 iterations. Note that would be to consider performing consensus on the posterior pdf held lo-
these probabilities are calculated using 100 Monte Carlo simulations. cally by each platform, e.g. via covariance intersection [40]. Finally, we
The graph shows that the probability that the formation breaks apart could investigate methods to improve the convergence of the distributed
increases when using fewer iterations to reach consensus and when the consensus algorithm using different weights, allowing the platforms to
communication distance between platforms is decreased. The first trend have a local memory [41] or by using the distributed Alternating Direc-
is due to the changing topology of the communication graph, as the num- tion Multipliers Method [42,43].
ber of links in the graph decreases more iterations will be required to
reach consensus on movement–failure to reach a consensus will result Declaration of Competing Interest
in the formation breaking apart. The second trend is due to the pro-
cess noise in the movement of the platforms; errors in movement of the The authors declare that they have no known competing financial
platforms become more critical if the communication distance Rmax is interests or personal relationships that could have appeared to influence
smaller. the work reported in this paper.

5.3. Experimental dataset CRediT authorship contribution statement

In this section, we evaluate the proposed search algorithm on an Branko Ristic: Conceptualization, Methodology, Writing - original
experimental dataset. The dataset is provided by COANDA Research draft. Christopher Gilliam: Software, Validation. William Moran: Su-
& Development Corporation and corresponds to finding the source of pervision, Writing - review & editing. Jennifer L. Palmer: Writing -
a fluorescein dye. The dye is released at a constant rate from a nar- review & editing.
row tube in a large recirculating water channel. The data comprises a
sequence of 340 frames of instantaneous concentration field measure- Acknowledgement
ments in the vertical plane and is sampled at every 10/23 s. The size of
a frame is 49 × 49 pixels, where a pixel corresponds to a square area of This research is supported in part by the Defence Science and Tech-
2.935 × 2.935 mm2 . nology Group through its Strategic Research Initiative on Trusted Au-
As the size of the data is relatively small, we follow the approach used tonomous Systems.
in [28]: upscale each frame by a factor of 3 using bicubic interpolation
and place the result in the top left corner of a 500 × 500 search area. Appendix A. Derivation of the likelihood
A measurement obtained by a platform is thus the integer value of the
concentration of the dye taken from the closest spatial and temporal Here we derive (16). Using the Bayes rule we can write:
sample from the experimental data. 𝑔(𝑘 |𝑄0 , 𝐫0 ) 𝑝(𝑄0 |𝐫0 , 1∶𝑘−1 )
An example of the search algorithm running on the experimental 𝑝(𝑄0 |𝐫0 , 1∶𝑘 ) = (A.1)
𝑔(𝑘 |𝐫0 , 1∶𝑘−1 )
data is shown in Fig. 10. The figure shows the progress of 7 platforms
at step indices 𝑘 = 0, 12, 22, 32. Note that the algorithm terminated at We are after 𝑔(𝑘 |𝐫0 , 1∶𝑘−1 ) and hence from (A.1) we have
𝐾 = 33. With 200 Monte Carlo simulations, the mean search time for 𝑔 (𝑘 |𝑄0 , 𝐫0 ) 𝑝(𝑄0 |𝐫0 , 1∶𝑘−1 )
𝑔 (𝑘 |𝐫0 , 1∶𝑘−1 ) = (A.2)
the algorithm was 2525 a.u., with a 5th and 95th quantile of 1840 a.u. 𝑝(𝑄0 |𝐫0 , 1∶𝑘 )
and 3445 a.u., respectively. Note that in all simulations the formation
The expressions for the terms on the right-hand side of (A.2) are avail-
started from the bottom right hand corner indicated in Fig. 10(a).
able in the analytic form. Based on (8)
𝑧𝑖
𝑁 𝑄 𝑘 𝜌(𝐫 , 𝐫 𝑖 )𝑧𝑘 𝑖
6. Conclusions ∏ 0 𝑘 𝑖
𝑔(𝑘 |𝑄0 , 𝐫0 ) = 0
𝑒−𝑄0 𝜌(𝐫0 ,𝐫𝑘 ) , (A.3)
𝑖=1 𝑧𝑖𝑘 !
In this paper, we have proposed the first decentralised infotaxic
while 𝑝(𝑄0 |𝐫0 , 1∶𝑘−1 ) and 𝑝(𝑄0 |𝐫0 , 1∶𝑘 ) are Gamma distributions, with
search algorithm for a group of autonomous robotic platforms. The al-
parameter pairs (𝜅𝑘−1 , 𝜗𝑘−1 ) and (𝜅 k , ϑk ), respectively, see (12). Upon
gorithm allows the platforms to search and locate a source of hazardous
the substitution of (A.3) and the expressions for 𝑝(𝑄0 |𝐫0 , 1∶𝑘−1 ) and
emissions in a coordinated manner without the need for a centralised
𝑝(𝑄0 |𝐫0 , 1∶𝑘 ) into (A.2), and using the following identities:
fusion and control system. More precisely, this distributed coordination
∑𝑁 𝑖
is achieved by local exchange of measurement data between neighbour- 𝑖=1 𝑧𝑘 𝜅 −1
𝑄0 𝑄0𝑘−1
ing platforms. Similarly, the movement decisions taken by the platforms =1 (A.4)
𝜅𝑘 −1
were reached using a distributed average consensus algorithm over the 𝑄0
whole formation. The key aspect is that individual platforms only re- ( )

quire knowledge of their neighbours; the global knowledge of the com- exp −𝑄0 𝑁 𝑖
𝑖=1 𝜌(𝐫0 , 𝐫𝑘 ) exp(−𝑄0 ∕𝜗𝑘−1 )
munication network topology is not required. An advantage of adopted =1 (A.5)
exp(−𝑄0 ∕𝜗𝑘 )
distributed framework is that all platforms are treated equally, making
the proposed search algorithm robust to the failure of a single platform. we obtain (16).
Numerical studies of the algorithm focused on are analysis of the search
time as a function of the size of the multi-robot formation. Furthermore, Supplementary material
the stability of the formation was analysed as a function of the number
of consensus iterations and the maximum communication distance be- Supplementary material associated with this article can be found, in
tween platforms. Finally, the search algorithm has been demonstrated the online version, at doi:10.1016/j.inffus.2019.12.011.

22
B. Ristic, C. Gilliam and W. Moran et al. Information Fusion 58 (2020) 13–23

References [23] B. Ristic, A. Skvortsov, A. Gunatilaka, A study of cognitive strategies for an au-
tonomous search, Inf. Fusion 28 (2016) 1–9.
[1] M. Dunbabin, L. Marques, Robots for environmental monitoring: significant ad- [24] H. Hajieghrary, M.A. Hsieh, I.B. Schwartz, Multi-agent search for source localization
vancements and applications, IEEE Robot. Autom. Mag. 19 (1) (2012) 24–39. in a turbulent medium, Phys. Lett. A 380 (20) (2016) 1698–1705.
[2] H. Ishida, Y. Wada, H. Matsukura, Chemical sensing in robotic applications: areview, [25] N. Fatès, Collective infotaxis with reactive amoebae: a note on a simple bio-inspired
IEEE Sens. J. 12 (11) (2012) 3163–3173. mechanism, in: Intern. Conf. on Cellular Automata, 2016, pp. 157–165.
[3] B. Bayat, N. Crasta, A. Crespi, A.M. Pascoal, A. Ijspeert, Environmental monitoring [26] M. Hutchinson, H. Oh, W.-H. Chen, Entrotaxis as a strategy for autonomous search
using autonomous vehicles: a survey of recent searching techniques, Curr. Opin. and source reconstruction in turbulent conditions, Inf. Fusion 42 (2018) 179–189.
Biotechnol. 45 (2017) 76–84. [27] M. Hutchinson, C. Liu, W. Chen, Information-based search for an atmospheric release
[4] M. Rossi, D. Brunelli, Gas sensing on unmanned vehicles: challenges and oppor- using a mobile robot: algorithm and experiments, IEEE Trans. Control Syst. Technol.
tunities, in: 2017 New Generation of CAS (NGCAS), IEEE, Genova, Italy, 2017, 27 (6) (2019) 2388–2402.
pp. 117–120. [28] B. Ristic, D. Angley, B. Moran, J.L. Palmer, Autonomous multi-robot search for a
[5] J. Burgués, V. Hernández, A.J. Lilienthal, S. Marco, Smelling nano aerial vehicle for hazardous source in a turbulent environment, Sensors 17 (4) (2017) 918.
gas source localization and mapping, Sensors 19 (3) (2019) 478. [29] C. Song, Y. He, B. Ristic, L. Li, X. Lei, Multi-agent collaborative infotaxis search based
[6] J. Kennedy, Zigzagging and casting as a programmed response to wind-borne odour: on cognition difference, J. Phys. A 52 (48) (2019).
a review, Physiol. Entomol. 8 (2) (1983) 109–120. [30] M. Park, H. Oh, Cooperative information-driven source search and estimation for
[7] A.J. Rutkowski, S. Edwards, M.A. Willis, R.D. Quinn, G.C. Causey, A robotic platform multiple agents, Inf. Fusion 54 (2020) 72–84.
for testing moth-inspired plume tracking strategies, in: Proc. IEEE Intern. Conf. on [31] O. Hlinka, F. Hlawatsch, P.M. Djuric, Distributed particle filtering in agent networks:
Robotics and Automation (ICRA), 4, 2004, pp. 3319–3324. A survey, classification, and comparison, IEEE Signal Process. Mag. 30 (1) (2013)
[8] W. Li, J.A. Farrell, S. Pang, R.M. Arrieta, Moth-inspired chemical plume tracing on 61–81.
an autonomous underwater vehicle, IEEE Trans. Robot. 22 (2) (2006) 292–307. [32] A. Doucet, N. De Freitas, K. Murphy, S. Russell, Rao-Blackwellised particle filtering
[9] X. Kang, W. Li, Moth-inspired plume tracing via multiple autonomous vehicles under for dynamic Bayesian networks, in: Proc. of the Sixteenth Conference on Uncertainty
formation control, Adapt. Behav. 20 (2) (2012) 131–142. in Artificial Intelligence, 2000, pp. 176–183.
[10] R.A. Russell, A. Bab-Hadiashar, R.L. Shepherd, G.G. Wallace, A comparison of reac- [33] B. Ristic, A. Gunatilaka, Y. Wang, Rao-Blackwell dimension reduction applied to
tive robot chemotaxis algorithms, Robot. Auton. Syst. 45 (2) (2003) 83–97. hazardous source parameter estimation, Signal Process. 132 (2017) 177–182.
[11] S.I. Azuma, M.S. Sakar, G.J. Pappas, Stochastic source seeking by mobile robots, [34] A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin, Bayesian Data Analysis, third ed.,
IEEE Trans. Autom. Control 57 (9) (2012) 2308–2321. CRC Press, 2003.
[12] S.-J. Liu, M. Krstic, Stochastic Averaging and Stochastic Extremum Seeking, Springer [35] E.K.P. Chong, C. Kreucher, A.O. Hero, POMDP Approximation Using Simulation and
Science & Business Media, 2012. Heuristics, Springer, chapter 8.
[13] S. Li, R. Kong, Y. Guo, Cooperative distributed source seeking by multiple robots: Al- [36] L. Xiao, S. Boyd, S. Lall, A scheme for robust distributed sensor fusion based on
gorithms and experiments, IEEE/ASME Trans. Mechatron. 19 (6) (2014) 1810–1820. average consensus, in: Proc. 4th Intern. Symp. on Information Processing in Sensor
[14] J.-B. Masson, Olfactory searches with limited space perception, Proc. Natl. Acad. Sci. Networks, 2005, p. 9.
(PNAS) 110 (28) (2013) 11261–11266. [37] W. Ren, R.W. Beard, E.M. Atkins, Information consensus in multivehicle cooperative
[15] M. Vergassola, E. Villermaux, B.I. Shraiman, ‘Infotaxis’ as a strategy for searching control, IEEE Control Syst. 27 (2) (2007) 71–82.
without gradients, Nature 445 (25) (2007) 406–409. [38] R. Olfati-Saber, J.A. Fax, R.M. Murray, Consensus and cooperation in networked
[16] J.-B. Masson, M. Bailly-Bachet, M. Vergassola, Chasing information to search in ran- multi-agent systems, Proc. IEEE 95 (1) (2007) 215–233.
dom environments, Jour. Phys. A 42 (2009). [39] L. Xiao, S. Boyd, Fast linear iterations for distributed averaging, Syst. Control Lett.
[17] E.M. Moraud, D. Martinez, Effectiveness and robustness of robot infotaxis for search- 53 (1) (2004) 65–78.
ing in dilute conditions, Front. Neurorobot. 4 (2010) 1–8. [40] S.J. Julier, J.K. Uhlmann, A non-divergent estimation algorithm in the presence of
[18] C. Barbieri, S. Cocco, R. Monasson, On the trajectories and performance of infotaxis, unknown correlations, in: Proceedings of the American Control Conference, 4, 1997,
an information-based greedy search algorithm, Europhys. Lett. 94 (2) (2011) 20005. pp. 2369–2373.
[19] A.M. Hein, S.A. McKinley, Sensing and decision-making in random search, Proc. [41] E. Kokiopoulou, P. Frossard, Polynomial filtering for fast convergence in distributed
Natl. Acad. Sci. USA (July 2012), doi:10.1073/pnas.1202686109. consensus, IEEE Trans. Signal Process. 57 (1) (2009) 342–354.
[20] N. Voges, A. Chaffiol, P. Lucas, D. Martinez, Reactive searching and infotaxis in odor [42] I.D. Schizas, A. Ribeiro, G.B. Giannakis, Consensus in ad hoc WSNs with noisy links—
source localization, PLoS Comput. Biol. 10 (10) (2014). part I: Distributed estimation of deterministic signals, IEEE Trans. Signal Process. 56
[21] B. Ristic, A. Skvortsov, A. Walker, Autonomous search for a diffusive source in an (1) (2008) 350–364.
unknown structured environment, Entropy 16 (2) (2014) 789–813. [43] I.D. Schizas, G.B. Giannakis, S.I. Roumeliotis, A. Ribeiro, Consensus in ad hoc WSNs
[22] H. Hajieghrary, A.F. Tomas, M.A. Hsieh, An information theoretic source seeking with noisy links–part II: Distributed estimation and smoothing of random signals,
strategy for plume tracking in 3D turbulent fields, in: Proc. IEEE Intern. Symp. on IEEE Trans. Signal Process. 56 (4) (2008) 1650–1666.
Safety, Security, and Rescue Robotics (SSRR), 2015, pp. 1–8.

23

You might also like