
The 12th International Conference on Wireless Communications and Signal Processing

Research on Dynamic Probability Mechanism of Rebroadcasting for UAV Swarm

Zhen Qin∗, Tingjun Du†, Fei Xiong‡, Hai Wang∗§, Aijing Li∗, Yajie Chen∗
∗College of Communications Engineering, Army Engineering University of PLA, Nanjing, China
†Unit 78102 of PLA, Chengdu, China
‡CMC Political and Law Commission, Beijing, China
§Corresponding author: Hai Wang
Email: qzqzla912@163.com; 1874404772@qq.com; xbibid@126.com; hai.wang@gmail.com;
lishan_wh@126.com; chenyajie0705@gmail.com

Abstract—An unmanned aerial vehicle (UAV) swarm needs to broadcast routing messages among its members, accept unified control commands, and return information about discovered targets to the ground control station in time. Most of these messages are broadcast within the swarm, so the efficiency of broadcasting and rebroadcasting largely determines the efficiency of the entire network. Aiming at the broadcasting problem of UAV networks, and in order to adapt to the dynamic nature of the network and improve broadcasting capability and efficiency, this paper focuses on a rebroadcasting mechanism based on dynamic probability. A method for predicting the collision probability of rebroadcasting behavior and a multi-agent cooperative rebroadcasting method based on virtual actions and reinforcement learning are proposed. Simulation results show that, after pre-learning, the dynamic probability rebroadcasting mechanism has obvious performance advantages over fixed-probability flooding and simple flooding in dynamic scenarios.

Index Terms—UAV swarm, rebroadcast, dynamic probability, reinforcement learning.

I. INTRODUCTION

Unmanned aerial vehicles (UAVs) can cooperate with each other to form a network of hop-by-hop transmission, much like a traditional mobile ad hoc network (MANET) [1], [2]. UAVs can form connections without the help of fixed facilities such as base stations and access points, and each UAV communicates with its surrounding nodes to realize routing. If two UAVs are within communication range, they can communicate directly; otherwise, data transmission between them requires the assistance of other UAVs [3]. A UAV swarm network can therefore be considered a new application of MANET.

Broadcasting is a basic operation with extensive applications in MANET. Basic flooding, or blind flooding, is the simplest way of broadcasting a packet to all nodes in a network [4]: each node rebroadcasts a packet to its neighbors once it receives a new packet, and this rebroadcasting continues until all nodes in the network have received it. The main advantage of basic flooding is that it always finds the shortest path between sources and destinations, since packets are broadcast along every possible path in parallel. However, basic flooding may trigger a large number of packet forwardings in a MANET, eventually overwhelming the network. The result is overly redundant rebroadcasting (some nodes receive the same packet more than once), contention, and collision, which are collectively referred to as the Broadcast Storm Problem (BSP) [5]. When UAVs carry out a swarm mission, network broadcasting is also in demand, for example the broadcasting of safe-flight information between UAVs, of group formation status, and of command-and-control and perception information. The BSP is therefore the core problem that must be solved first to achieve UAV swarm control.

To achieve efficient broadcasting and solve the BSP, many methods have been proposed [5]–[13]. The main approach to the BSP in MANET focuses on reducing the number of redundant retransmissions, which can be achieved by preventing a subset of the network nodes from retransmitting. In general, these broadcast protocols fall into three categories. (i) Probability-based methods are similar to basic flooding, except that each node rebroadcasts packets with a predetermined probability. These mechanisms may work in dense networks, where multiple nodes have similar neighbor coverage, but they have no significant effect in sparse networks. (ii) Area-based methods let a node rebroadcast a packet based on the distance between itself and the node from which the packet was received; a rebroadcast occurs only when this distance exceeds a predefined threshold, so that a larger additional area can be covered. However, area-based methods do not consider whether any nodes actually exist within that additional area, which can lead to inefficient broadcasting. (iii) Neighbor-knowledge methods are further classified into neighbor-designated methods, in which a transmitting node specifies which of its one-hop neighbors should forward the packet, and self-pruning methods, in which a node receiving a packet decides by itself whether or not to retransmit. However, most existing broadcast methods assume that the underlying network topology is static or quasi-static during the broadcast process, i.e., that neighborhood information can be updated in a timely manner. The results in [14] show that most broadcast protocols suffer from a low delivery ratio in highly mobile networks. Therefore, these broadcast protocols are not suitable for a UAV swarm, whose topology is highly dynamic.
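To make category (i) concrete, the relay rule of probability-based flooding amounts to a few lines of code. The sketch below is our illustration, not code from the paper; the values 0.33, 0.5, and 0.75 correspond to the fixed-probability baselines B–D evaluated in Section IV.

```python
import random

def should_rebroadcast(packet_id, seen_ids, p=0.5):
    """Fixed-probability flooding: relay a packet the first time it is
    seen, with a predetermined probability p (basic flooding is p = 1)."""
    if packet_id in seen_ids:       # duplicate packet: never rebroadcast
        return False
    seen_ids.add(packet_id)         # remember this packet id
    return random.random() < p      # relay with fixed probability p
```

A fixed p cannot track topology changes, which is exactly what motivates making the probability dynamic in the rest of this paper.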

In this paper, a dynamic probability-based rebroadcasting method is formulated, in which each node dynamically forecasts the probability of packet contention and collision according to its neighbors' positions, and then evaluates the rebroadcasting actions to form a dynamic probability strategy. To model the rebroadcasting actions evaluation problem, a collision probability prediction method and a multi-agent cooperative rebroadcasting method based on virtual actions and reinforcement learning (RL) are proposed. Simulation results validate that the dynamic probability-based rebroadcasting method is better suited to dynamic networks than conventional probability-based methods. The main contributions of this paper are summarized as follows:

• The rebroadcasting actions evaluation problem is investigated in multi-UAV scenarios with coordination among UAVs taken into account, and a collision probability prediction method for rebroadcasting behavior is proposed.

• We propose a multi-agent cooperative rebroadcasting method based on virtual actions and RL to obtain a dynamic probability strategy for rebroadcasting.

The rest of this paper is organized as follows. The system model and problem formulation are presented in Section II. The multi-agent cooperative rebroadcasting method based on virtual actions and RL is presented in Section III. Simulation results are provided and analyzed in Section IV. Finally, we conclude the paper in Section V.

II. THE MODEL OF THE REBROADCASTING ACTIONS EVALUATION PROBLEM

A. An example

Fig. 1: Sketch map of rebroadcasting behavior. (The figure shows the previous-hop node R0, the current-hop nodes R11–R17, the next-hop nodes R21–R28, and the coverage overlap areas A1–A4.)

As illustrated in Fig. 1, broadcasting is in progress in the UAV swarm. The maximum communication distance of a UAV is $C_m$. Node $R_0$ is a source or relay node that is broadcasting a packet, and $N_{R_0} = \{R_{11}, R_{12}, \ldots, R_{17}\}$ is the neighbor set of $R_0$. The nodes in $N_{R_0}$ will rebroadcast the packet received from $R_0$, but nodes located in the communication overlap areas of other nodes may lose the rebroadcast packet because of contention and collision. To improve the rebroadcasting performance under dynamic conditions, a dynamic probability strategy is needed. Such a strategy forecasts the rebroadcasting actions of neighbors and calculates the probability of contention and collision; following it, a node chooses to forward or drop a packet after receiving it.

Let us analyse the rebroadcasting effect of node $R_{11}$. We assume that hello messages are only transmitted within one-hop range; therefore $R_{11}$ cannot know of the existence of nodes $R_{14}$ and $R_{17}$, and only considers contention and collision with $R_{12}$ and $R_{13}$ when preparing to rebroadcast. We characterize $N_{R_{11} \cup R_{12} \cup R_{13}} = \{R_{21}, R_{22}, \ldots, R_{28}\}$ as the set of next-hop nodes of $R_{11}$, $R_{12}$, and $R_{13}$. According to the possible contention and collision for rebroadcasting, four areas can be distinguished in Fig. 1:

$A_1:\ N_{A_1} = N_{R_{11} \cap R_{13}} = \{R_{27}\} \notin N_{R_0 \cup R_{12}},$ (1)

$A_2:\ N_{A_2} = N_{R_{11} \cap R_{12} \cap R_{13}} = \{R_{26}\} \notin N_{R_0},$ (2)

$A_3:\ N_{A_3} = N_{R_{11}} = \{R_{25}\} \notin N_{R_0 \cup R_{12} \cup R_{13}},$ (3)

$A_4:\ N_{A_4} = N_{R_{11} \cap R_{12}} = \{R_{23}, R_{24}\} \notin N_{R_0 \cup R_{13}}.$ (4)
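The partition in Eqs. (1)–(4) can be computed mechanically from node positions. The following sketch is our own illustration (the node ids, the position table pos, and the helper names are assumptions; delta mirrors the communication relation formalized later in Eq. (30)):

```python
import math

Cm = 168.0  # maximum communication distance in metres (value from Section IV)

def delta(p, q, pos):
    """Communication relation: 1 if p and q are within range Cm (cf. Eq. (30))."""
    return 1 if math.dist(pos[p], pos[q]) <= Cm else 0

def partition_areas(relays, next_hops, pos):
    """Group next-hop nodes by the exact subset of candidate relays that
    covers them, mirroring Eqs. (1)-(4): the key {R11, R13} identifies A1,
    {R11, R12, R13} identifies A2, {R11} identifies A3, {R11, R12} A4."""
    areas = {}
    for n in next_hops:
        covering = frozenset(r for r in relays if delta(r, n, pos))
        if covering:                      # ignore nodes no relay can reach
            areas.setdefault(covering, []).append(n)
    return areas
```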
When $R_{11}$ receives a packet, it should select between forwarding and dropping based on the rebroadcasting probability. To evaluate a rebroadcasting action, we propose an evaluation method with two parts: (i) the profit of the action and (ii) the loss of the action. The action should be rewarded when the profit is greater than the loss, and vice versa.

The Forward Profit (FP) is defined as the expected number of next-hop nodes that successfully receive a new packet from this forwarding. For example, the FP of $R_{11}$ is the expected number of nodes that successfully receive a new packet from $R_{11}$'s forwarding; these nodes should be in set $N_{R_{11} \cup R_{12} \cup R_{13}}$ and not in set $N_{R_0}$.

The Forward Loss (FL) is defined as the expected number of next-hop nodes that fail to receive a new packet from this forwarding due to contention and collision. For example, the FL of $R_{11}$ is the expected number of nodes that fail to receive the new packet from $R_{11}$'s forwarding; these nodes should be in set $N_{R_{11} \cup R_{12} \cup R_{13}}$ and not in set $N_{R_0}$. The contention and collision are caused by the simultaneous forwarding of $R_{11}$ and other nodes in set $N_{R_0}$.

The Drop Profit (DP) is defined as the expected number of next-hop nodes that successfully receive a new packet thanks to this dropping. For example, the DP of $R_{11}$ is the expected number of nodes that successfully receive a new packet from the forwarding of other nodes when $R_{11}$ drops; these nodes should be in set $N_{R_{11} \cup R_{12} \cup R_{13}}$ and not in set $N_{R_0}$.

The Drop Loss (DL) is defined as the expected number of next-hop nodes that fail to receive a new packet due to this dropping. For example, the DL of $R_{11}$ is the expected number of nodes that fail to receive a new packet due to $R_{11}$'s dropping; these nodes should be in set $N_{R_{11} \cup R_{12} \cup R_{13}}$ and not in set $N_{R_0}$. The loss is caused by the simultaneous dropping of $R_{11}$ and other nodes in set $N_{R_0}$.

We characterize $P_{ji}$ as the probability that node $i$ forwards broadcast packets received from node $j$. The calculations of profit and loss are as follows.

1) The FP of $R_{11}$: There is one node in set $N_{A_1}$. It can receive the packet from $R_{11}$ when $R_{11}$ forwards and $R_{13}$ drops, so the FP of $R_{11}$ in set $N_{A_1}$ is

$FPr_{R_{11}A_1} = P_{R_0R_{11}}(1 - P_{R_0R_{13}}) \times 1.$ (5)

There is one node in set $N_{A_2}$. It can receive the packet from $R_{11}$ when $R_{11}$ forwards and both $R_{12}$ and $R_{13}$ drop, so the FP of $R_{11}$ in set $N_{A_2}$ is

$FPr_{R_{11}A_2} = P_{R_0R_{11}}(1 - P_{R_0R_{12}})(1 - P_{R_0R_{13}}) \times 1.$ (6)

There is one node in set $N_{A_3}$. It can receive the packet whenever $R_{11}$ forwards, so the FP of $R_{11}$ in set $N_{A_3}$ is

$FPr_{R_{11}A_3} = P_{R_0R_{11}} \times 1.$ (7)

There are two nodes in set $N_{A_4}$. They can receive the packet from $R_{11}$ when $R_{11}$ forwards and $R_{12}$ drops, so the FP of $R_{11}$ in set $N_{A_4}$ is

$FPr_{R_{11}A_4} = P_{R_0R_{11}}(1 - P_{R_0R_{12}}) \times 2.$ (8)

In summary, the FP of $R_{11}$ is

$FPr_{R_{11}} = FPr_{R_{11}A_1} + FPr_{R_{11}A_2} + FPr_{R_{11}A_3} + FPr_{R_{11}A_4} = P_{R_0R_{11}}(1 - P_{R_0R_{13}}) + P_{R_0R_{11}}(1 - P_{R_0R_{12}})(1 - P_{R_0R_{13}}) + P_{R_0R_{11}} + 2P_{R_0R_{11}}(1 - P_{R_0R_{12}}).$ (9)

2) The FL of $R_{11}$: The node in set $N_{A_1}$ fails to receive the packet from $R_{11}$ when $R_{11}$ and $R_{13}$ forward at the same time, so the FL of $R_{11}$ in set $N_{A_1}$ is

$FLr_{R_{11}A_1} = P_{R_0R_{11}}P_{R_0R_{13}} \times 1.$ (10)

The node in set $N_{A_2}$ fails to receive the packet from $R_{11}$ when $R_{11}$ forwards while at least one of $R_{12}$ and $R_{13}$ also forwards, so the FL of $R_{11}$ in set $N_{A_2}$ is

$FLr_{R_{11}A_2} = P_{R_0R_{11}}[P_{R_0R_{12}}(1 - P_{R_0R_{13}}) + (1 - P_{R_0R_{12}})P_{R_0R_{13}} + P_{R_0R_{12}}P_{R_0R_{13}}] \times 1.$ (11)

The node in set $N_{A_3}$ can always receive the packet when $R_{11}$ forwards, so the FL of $R_{11}$ in set $N_{A_3}$ is

$FLr_{R_{11}A_3} = 0.$ (12)

The two nodes in set $N_{A_4}$ fail to receive the packet from $R_{11}$ when $R_{11}$ and $R_{12}$ forward at the same time, so the FL of $R_{11}$ in set $N_{A_4}$ is

$FLr_{R_{11}A_4} = P_{R_0R_{11}}P_{R_0R_{12}} \times 2.$ (13)

In summary, the FL of $R_{11}$ is

$FLr_{R_{11}} = FLr_{R_{11}A_1} + FLr_{R_{11}A_2} + FLr_{R_{11}A_3} + FLr_{R_{11}A_4} = P_{R_0R_{11}}P_{R_0R_{13}} + P_{R_0R_{11}}[P_{R_0R_{12}}(1 - P_{R_0R_{13}}) + (1 - P_{R_0R_{12}})P_{R_0R_{13}} + P_{R_0R_{12}}P_{R_0R_{13}}] + 2P_{R_0R_{11}}P_{R_0R_{12}}.$ (14)

The reward of the forwarding action of $R_{11}$ is

$Fr_{R_{11}} = FPr_{R_{11}} - FLr_{R_{11}}.$ (15)

Forwarding has more advantages than disadvantages when $Fr_{R_{11}} > 0$, and the opposite otherwise.
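As a quick sanity check (our worked example, not the paper's), set all three forwarding probabilities in Eqs. (9), (14), and (15) equal, $P_{R_0R_{11}} = P_{R_0R_{12}} = P_{R_0R_{13}} = 1/2$:

```latex
% Numeric check of Eqs. (9), (14), (15) with all probabilities p = 1/2
\begin{align*}
FPr_{R_{11}} &= \tfrac14 + \tfrac18 + \tfrac12 + \tfrac12 = 1.375,\\
FLr_{R_{11}} &= \tfrac14 + \tfrac12\left(\tfrac14+\tfrac14+\tfrac14\right) + \tfrac12 = 1.125,\\
Fr_{R_{11}}  &= FPr_{R_{11}} - FLr_{R_{11}} = 0.25 > 0.
\end{align*}
```

Forwarding is thus slightly favoured here. A useful consistency check is $FPr_{R_{11}} + FLr_{R_{11}} = 5p = 2.5$: when $R_{11}$ forwards, each of the five next-hop nodes either receives the packet (profit) or loses it to contention (loss).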
3) The DP of $R_{11}$: The node in set $N_{A_1}$ can receive the packet when $R_{11}$ drops and $R_{13}$ forwards, so the DP of $R_{11}$ in set $N_{A_1}$ is

$DPr_{R_{11}A_1} = (1 - P_{R_0R_{11}})P_{R_0R_{13}} \times 1.$ (16)

The node in set $N_{A_2}$ can receive the packet when $R_{11}$ drops while exactly one of $R_{12}$ and $R_{13}$ forwards, so the DP of $R_{11}$ in set $N_{A_2}$ is

$DPr_{R_{11}A_2} = (1 - P_{R_0R_{11}})[P_{R_0R_{12}}(1 - P_{R_0R_{13}}) + (1 - P_{R_0R_{12}})P_{R_0R_{13}}] \times 1.$ (17)

The node in set $N_{A_3}$ fails to receive the packet when $R_{11}$ drops, so the DP of $R_{11}$ in set $N_{A_3}$ is

$DPr_{R_{11}A_3} = 0.$ (18)

The two nodes in set $N_{A_4}$ can receive the packet when $R_{11}$ drops and $R_{12}$ forwards, so the DP of $R_{11}$ in set $N_{A_4}$ is

$DPr_{R_{11}A_4} = (1 - P_{R_0R_{11}})P_{R_0R_{12}} \times 2.$ (19)

In summary, the DP of $R_{11}$ is

$DPr_{R_{11}} = DPr_{R_{11}A_1} + DPr_{R_{11}A_2} + DPr_{R_{11}A_3} + DPr_{R_{11}A_4} = (1 - P_{R_0R_{11}})P_{R_0R_{13}} + (1 - P_{R_0R_{11}})[P_{R_0R_{12}}(1 - P_{R_0R_{13}}) + (1 - P_{R_0R_{12}})P_{R_0R_{13}}] + 2(1 - P_{R_0R_{11}})P_{R_0R_{12}}.$ (20)

4) The DL of $R_{11}$: The node in set $N_{A_1}$ fails to receive the packet when $R_{11}$ and $R_{13}$ drop at the same time, so the DL of $R_{11}$ in set $N_{A_1}$ is

$DLr_{R_{11}A_1} = (1 - P_{R_0R_{11}})(1 - P_{R_0R_{13}}) \times 1.$ (21)

The node in set $N_{A_2}$ fails to receive the packet when $R_{11}$ drops while $R_{12}$ and $R_{13}$ both forward or both drop, so the DL of $R_{11}$ in set $N_{A_2}$ is

$DLr_{R_{11}A_2} = (1 - P_{R_0R_{11}})[P_{R_0R_{12}}P_{R_0R_{13}} + (1 - P_{R_0R_{12}})(1 - P_{R_0R_{13}})] \times 1.$ (22)

The node in set $N_{A_3}$ fails to receive the packet whenever $R_{11}$ drops, so the DL of $R_{11}$ in set $N_{A_3}$ is

$DLr_{R_{11}A_3} = (1 - P_{R_0R_{11}}) \times 1.$ (23)

The two nodes in set $N_{A_4}$ fail to receive the packet when $R_{11}$ and $R_{12}$ both drop, so the DL of $R_{11}$ in set $N_{A_4}$ is

$DLr_{R_{11}A_4} = (1 - P_{R_0R_{11}})(1 - P_{R_0R_{12}}) \times 2.$ (24)

In summary, the DL of $R_{11}$ is

$DLr_{R_{11}} = DLr_{R_{11}A_1} + DLr_{R_{11}A_2} + DLr_{R_{11}A_3} + DLr_{R_{11}A_4} = (1 - P_{R_0R_{11}})(1 - P_{R_0R_{13}}) + (1 - P_{R_0R_{11}})[P_{R_0R_{12}}P_{R_0R_{13}} + (1 - P_{R_0R_{12}})(1 - P_{R_0R_{13}})] + 1 - P_{R_0R_{11}} + 2(1 - P_{R_0R_{11}})(1 - P_{R_0R_{12}}).$ (25)

The reward of the dropping action of $R_{11}$ is

$Dr_{R_{11}} = DPr_{R_{11}} - DLr_{R_{11}}.$ (26)

Dropping has more advantages than disadvantages when $Dr_{R_{11}} > 0$, and the opposite otherwise.

B. The evaluation model for the rebroadcasting actions

Define $N_i = \{1, 2, \ldots, N\}$ as the neighbor set of UAV $i$. When UAV $j \in N_i$ broadcasts first, $N_i$ can be divided into three subsets according to the positions of the UAVs.

The set of previous-hop nodes:

$N_i^p = \{j\}.$ (27)

The set of current-hop nodes:

$N_i^c = \{n \mid 0 < \|G_n - G_j\|_3 \le C_m,\ n \in N_i\}.$ (28)

The set of next-hop nodes:

$N_i^n = \{n \mid \|G_n - G_j\|_3 > C_m,\ n \in N_i\}.$ (29)

The communication relation function can be expressed as

$\delta_{pq} = \begin{cases} 0, & \|G_p - G_q\|_3 > C_m, \\ 1, & \|G_p - G_q\|_3 \le C_m, \end{cases}$ (30)

where UAV $p$ cannot communicate with UAV $q$ when $\delta_{pq} = 0$ and can communicate when $\delta_{pq} = 1$.

The reward function is composed of the profit and loss contributed by each UAV in set $N_i^n$. The general forms of the reward functions for rebroadcasting actions are as follows.

The FP of UAV $i$:

$FPr_i = P_{ji} \sum_{b \in N_i^n} \prod_{a \in N_i^c} (1 - P_{ja})^{\delta_{ab}}.$ (31)

The FL of UAV $i$:

$FLr_i = P_{ji} \sum_{b \in N_i^n} \Bigl[1 - \prod_{a \in N_i^c} (1 - P_{ja})^{\delta_{ab}}\Bigr].$ (32)

The DP of UAV $i$:

$DPr_i = (1 - P_{ji}) \sum_{b \in N_i^n} \sum_{a \in N_i^c} \Bigl[\delta_{ab} P_{ja} \prod_{c \in N_i^c \setminus a} (1 - P_{jc})^{\delta_{bc}}\Bigr].$ (33)

The DL of UAV $i$:

$DLr_i = (1 - P_{ji}) \sum_{b \in N_i^n} \Bigl[1 - \sum_{a \in N_i^c} \delta_{ab} P_{ja} \prod_{c \in N_i^c \setminus a} (1 - P_{jc})^{\delta_{bc}}\Bigr].$ (34)

The reward of the forwarding action of UAV $i$ is

$Fr_i = FPr_i - FLr_i = P_{ji} \sum_{b \in N_i^n} \Bigl[2 \prod_{a \in N_i^c} (1 - P_{ja})^{\delta_{ab}} - 1\Bigr].$ (35)

The reward of the dropping action of UAV $i$ is

$Dr_i = DPr_i - DLr_i = (1 - P_{ji}) \sum_{b \in N_i^n} \Bigl\{2 \sum_{a \in N_i^c} \Bigl[\delta_{ab} P_{ja} \prod_{c \in N_i^c \setminus a} (1 - P_{jc})^{\delta_{bc}}\Bigr] - 1\Bigr\}.$ (36)
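These general forms translate directly into code. The sketch below is our transcription of Eqs. (35) and (36) under assumed data structures: P_j maps each current-hop neighbor $a \in N_i^c$ to $P_{ja}$, and delta(a, b) is the 0/1 relation of Eq. (30).

```python
import math

def forward_reward(P_ji, P_j, Ni_c, Ni_n, delta):
    """Fr_i of Eq. (35): reward of forwarding a packet received from j."""
    return P_ji * sum(
        2.0 * math.prod((1.0 - P_j[a]) ** delta(a, b) for a in Ni_c) - 1.0
        for b in Ni_n
    )

def drop_reward(P_ji, P_j, Ni_c, Ni_n, delta):
    """Dr_i of Eq. (36): reward of dropping and relying on peers in N_i^c."""
    total = 0.0
    for b in Ni_n:
        # probability that some current-hop neighbour a covering b forwards
        # while the other neighbours covering b stay silent
        covered = sum(
            delta(a, b) * P_j[a]
            * math.prod((1.0 - P_j[c]) ** delta(b, c)
                        for c in Ni_c if c != a)
            for a in Ni_c
        )
        total += 2.0 * covered - 1.0
    return (1.0 - P_ji) * total
```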
III. A MULTI-AGENT COOPERATIVE REBROADCASTING METHOD

In the UAV swarm network, because the UAVs move, the proposed rebroadcasting method needs to adjust the probability of rebroadcasting according to the changes of the UAVs' positions. Furthermore, since position changes continuously, the adjustment is a process of learning from experience. Based on the evaluation model for the rebroadcasting actions, a multi-agent collaborative Q-learning algorithm is proposed, similar to [15].

Each UAV sends a hello packet periodically within one-hop range, and its neighbors use a Q-learning algorithm to learn independently from the information carried in the hello packets. When a UAV receives a hello packet, it performs a virtual action, i.e., it assumes that a rebroadcasting action has been carried out; it then evaluates and learns the reward of this virtual action and finally updates its Q value table.

A. Q value table

TABLE I: Q value table. For state $S_{ji}$, the table stores the action values $F_{ji}$ (forward) and $D_{ji}$ (drop), the GPS position $G_j$, and the hello time $t_j$.

As shown in Table I, UAV $i$ creates a Q value table $Q_{ji}$ for UAV $j$ when $i$ receives a hello packet from $j$ for the first time. $Q_{ji}$ has one state $S_{ji}$ (meaning that $i$ receives a packet from $j$), two actions (forward $F_{ji}$ and drop $D_{ji}$), and two pieces of information (the GPS position $G_j$ and the hello time $t_j$). Every time UAV $i$ receives a hello packet from UAV $j$, this information is updated. UAV $i$ should delete $Q_{ji}$ once it has received no hello packet from UAV $j$ by time $t_j + \Delta$, which means the distance between UAV $i$ and UAV $j$ exceeds $C_m$. For example, node $R_{11}$ in Fig. 1 should create seven Q value tables, one for each node in $N_{R_{11}}$.

UAV $i$ should select a rebroadcasting action when it receives a broadcast packet from UAV $j$. The selection probability of forwarding is calculated according to the Boltzmann formula

$P_{ji} = \dfrac{e^{F_{ji}/T}}{e^{F_{ji}/T} + e^{D_{ji}/T}},$ (37)

where $T$ is a positive parameter called the temperature [16]. High temperatures make the actions all (nearly) equiprobable; low temperatures cause a greater difference in selection probability for actions that differ in their value estimates.
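A compact sketch of one table entry and of the selection rule in Eq. (37) follows (our code; the field names follow Table I, while the class name and default values are assumptions):

```python
import math
import random
from dataclasses import dataclass

@dataclass
class QTable:
    """One Q value table Q_ji kept by UAV i for neighbour j (Table I)."""
    F: float = 1.0               # Q value of forwarding, F_ji
    D: float = 1.0               # Q value of dropping, D_ji
    G: tuple = (0.0, 0.0, 0.0)   # last reported GPS position G_j
    t: float = 0.0               # time of the last hello packet, t_j

def forward_probability(q, T=1.0):
    """Boltzmann selection of Eq. (37); T is the temperature."""
    eF, eD = math.exp(q.F / T), math.exp(q.D / T)
    return eF / (eF + eD)

def choose_action(q, T=1.0):
    return "forward" if random.random() < forward_probability(q, T) else "drop"
```

An implementation would also delete an entry once no hello packet arrives by $t_j + \Delta$, as described above.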

B. Q-learning algorithm

All Q value tables are included in the hello packet when UAV $j$ broadcasts the hello message. When UAV $i$ receives the hello packet from $j$, it performs a virtual action, assuming that a rebroadcasting action has been carried out: UAV $i$ decides to forward or drop the packet, evaluates the decision, then learns the reward and updates its Q value table. UAV $i$ updates its Q values according to the Bellman equation, which can be expressed as

$Q_{t+1}(st, ac) = Q_t(st, ac) + \alpha\bigl[r_t(st, ac) + \gamma \max_{ac'} Q_t(st', ac') - Q_t(st, ac)\bigr],$ (38)

where $Q_t(st, ac)$ is the Q value of rebroadcasting decision $ac$ in state $st$ at time $t$, $r_t(st, ac)$ is the reward when the transition is made from $st$ with decision $ac$, and $\max_{ac'} Q_t(st', ac')$ is the maximum possible Q value after the transition, taken over the next possible actions $ac'$. $\alpha$ is the learning rate and controls the learning speed, and $\gamma \in [0, 1]$ is a discount factor that maintains stability. Since the state space contains only one state and the rebroadcasting decision has only two actions (forward and drop), the equation can be simplified to

$Q_{t+1}(ac) = Q_t(ac) + \alpha\bigl[r_t(ac) + \gamma \max\{F, D\} - Q_t(ac)\bigr].$ (39)

According to the evaluation model for the rebroadcasting actions presented earlier, the reward function is defined as

$r(ac) = \begin{cases} P_{ji} \sum_{b \in N_i^n} \bigl[2 \prod_{a \in N_i^c} (1 - P_{ja})^{\delta_{ab}} - 1\bigr], & ac = Forward, \\ (1 - P_{ji}) \sum_{b \in N_i^n} \bigl\{2 \sum_{a \in N_i^c} \bigl[\delta_{ab} P_{ja} \prod_{c \in N_i^c \setminus a} (1 - P_{jc})^{\delta_{bc}}\bigr] - 1\bigr\}, & ac = Drop. \end{cases}$ (40)

Based on the above analysis, a Q-learning algorithm is proposed; the implementation procedure of one UAV is shown in Algorithm 1. All UAVs use the Q-learning algorithm to make decisions independently.

Algorithm 1 Q-learning algorithm of UAV $i$
1: UAV $i$ keeps listening, and receives a hello packet from UAV $j$;
2: If this is the first hello message received from UAV $j$, a Q value table $Q_{ji}$ is constructed, with $F_{ji}$ and $D_{ji}$ set to 1. If $Q_{ji}$ already exists, the location and time information of the Q value table are updated;
3: UAV $i$ selects the forward action with probability $P_{ji} = e^{F_{ji}/T} / (e^{F_{ji}/T} + e^{D_{ji}/T})$;
4: The reward of the action is calculated by formula (40);
5: UAV $i$ updates the Q value table $Q_{ji}$:
$F_{ji} = F_{ji} + \alpha[r(ac) + \gamma \max\{F_{ji}, D_{ji}\} - F_{ji}]$, if $ac = Forward$,
$D_{ji} = D_{ji} + \alpha[r(ac) + \gamma \max\{F_{ji}, D_{ji}\} - D_{ji}]$, if $ac = Drop$. (41)
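Steps 2–5 of Algorithm 1 can be sketched as follows, reusing the QTable and choose_action helpers from the sketch after Eq. (37). The learning rate, discount factor, and temperature values are arbitrary assumptions, and reward_fn stands for an evaluation of Eq. (40) from UAV i's current neighbor knowledge.

```python
ALPHA, GAMMA, T = 0.1, 0.9, 1.0   # learning rate, discount factor, temperature

def on_hello(j, q_tables, reward_fn, G_j, t_j):
    """Virtual-action update run by UAV i when it hears a hello from UAV j."""
    q = q_tables.setdefault(j, QTable())   # step 2: create table on first hello
    q.G, q.t = G_j, t_j                    #         or refresh position and time
    action = choose_action(q, T)           # step 3: Boltzmann selection, Eq. (37)
    r = reward_fn(action)                  # step 4: virtual reward via Eq. (40)
    best = max(q.F, q.D)                   # step 5: Q update, Eq. (41)
    if action == "forward":
        q.F += ALPHA * (r + GAMMA * best - q.F)
    else:
        q.D += ALPHA * (r + GAMMA * best - q.D)
```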
IV. NUMERICAL RESULTS AND DISCUSSIONS

The proposed method is simulated with EXata. For our simulations, we assume that 100 nodes are randomly distributed to form a UAV swarm network in a square area with a side length of 800 m. The maximum communication distance of each UAV is 168 m, and the bandwidth is 2 Mbit/s. The UAVs move randomly with a maximum speed of 10 m/s and a pause interval of 30 s. Each UAV sends a hello message every 2 s and sends broadcast packets at a fixed interval. The size of a broadcast packet is 300 kbit, the TTL is 7, and the data transmission time is 1800 s. Experiments are carried out in two scenarios: (i) each UAV sends a broadcast packet every 5 s; (ii) each UAV sends a broadcast packet every 10 s.

The performance of six broadcast mechanisms is evaluated, and 10 groups of experiments are conducted for each mechanism in each scenario. The six broadcast mechanisms are:
• A: Flooding
• B: 33% probability flooding
• C: 50% probability flooding
• D: 75% probability flooding
• E: Q-learning
• F: Q-learning with delay (1000 s of pre-learning)

In this setting the amount of broadcast data is very large, and serious collisions are inevitable: it is difficult for data packets to be transmitted effectively, and the transmission performance of the whole network is poor. Under such extreme conditions, the normal delivery of every packet cannot be guaranteed; packets can only be delivered when network conditions permit. Since the number of broadcast packets is fixed, network conditions can only be improved by adjusting the rebroadcasting behavior: when conditions permit, the rebroadcasting probability should increase correspondingly, and when conditions are bad, it should decrease. The network data transmission can therefore be evaluated by the packet receiving and sending ratio, defined as the number of packets correctly received by all UAVs divided by the number of packets successfully sent by UAVs in the form of broadcasting and rebroadcasting.

Fig. 2: Packet receiving and sending ratio (mechanisms A–F, scenarios i and ii).

Fig. 3: Standard deviation of the packet receiving and sending ratio (mechanisms B–F, scenarios i and ii).

As shown in Fig. 2 and Fig. 3, the flooding mechanism has a lower packet receiving and sending ratio than the other methods, which indicates that probability-based mechanisms improve broadcast performance; the improvement is about 40%. Comparing the probability-based methods across the two scenarios, none of them has an obvious advantage in packet receiving and sending ratio. However, judging by the standard deviation of this ratio, the performance of the three fixed-probability methods varies greatly because of the movement of the UAVs. In contrast, the two Q-learning methods fluctuate less and perform more stably; in particular, the method based on delayed Q-learning is the most stable.

Through further analysis of the experimental phenomena, the following conclusions are drawn:

(i) For scenarios with a large amount of transmitted data, probability-based rebroadcasting methods outperform the simple flooding method because of packet conflicts. That is, a reasonable reduction of transmitted data can improve the overall performance of the network.

(ii) For dynamic network scenarios, because the network topology changes frequently, the performance of fixed-probability rebroadcasting methods is greatly affected by the topology and shows strong randomness. The rebroadcasting methods based on Q-learning, however, can adapt to the dynamics and have better stability.

(iii) To realize its full performance advantage, a rebroadcasting method based on Q-learning needs a learning process to adapt to the dynamic nature of the network.

V. CONCLUSION

In this paper, a rebroadcasting method based on dynamic probability is proposed and analyzed. With this method, each UAV predicts the probability of collision during rebroadcasting according to the dynamic locations of its neighbors, and then evaluates the value of the rebroadcasting behavior, forming a dynamic broadcast probability. To model the evaluation of rebroadcasting behavior, a collision probability prediction method and a multi-agent cooperative rebroadcasting method based on virtual actions and RL are proposed. Simulation results show that the dynamic probability mechanism based on Q-learning has obvious performance advantages over fixed-probability flooding and basic flooding in dynamic network scenarios after a pre-learning process. In the future, this dynamic probability method can be combined with other broadcast mechanisms, which will be conducive to further study of UAV network broadcasting.

VI. ACKNOWLEDGMENT

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61702545.

REFERENCES

[1] P. K. Sharma and D. I. Kim, "Random 3D Mobile UAV Networks: Mobility Modeling and Coverage Probability," IEEE Transactions on Wireless Communications, vol. 18, no. 5, pp. 2527–2538, 2019.
[2] D. Wu, D. I. Arkhipov, M. Kim, C. L. Talcott, A. C. Regan, J. A. McCann, and N. Venkatasubramanian, "ADDSEN: Adaptive Data Processing and Dissemination for Drone Swarms in Urban Sensing," IEEE Transactions on Computers, vol. 66, no. 2, pp. 183–198, 2017.
[3] T. Lu and J. Zhu, "Genetic Algorithm for Energy-Efficient QoS Multicast Routing," IEEE Communications Letters, vol. 17, no. 1, pp. 31–34, 2013.
[4] C. Ho, K. Obraczka, G. Tsudik, and K. Viswanath, "Flooding for Reliable Multicast in Multi-Hop Ad Hoc Networks," in Proceedings of the 3rd International Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications (DIALM), pp. 64–71, August 1999.
[5] Y. C. Tseng, S. Y. Ni, and Y. S. Chen, "The Broadcast Storm Problem in a Mobile Ad Hoc Network," Wireless Networks, vol. 8, no. 2–3, pp. 153–167, 2002.
[6] B. Williams and T. Camp, "Comparison of Broadcasting Techniques for Mobile Ad Hoc Networks," in Proceedings of the International Symposium on Mobile Ad Hoc Networking and Computing (MOBIHOC), pp. 194–205, March 2002.
[7] J. Wu and F. Dai, "A Generic Distributed Broadcast Scheme in Ad Hoc Wireless Networks," IEEE Transactions on Computers, vol. 53, no. 10, pp. 1343–1354, 2004.
[8] Y. C. Tseng, S. Y. Ni, and E. Y. Shih, "Adaptive Approaches to Relieving Broadcast Storms in a Wireless Multihop Mobile Ad Hoc Network," IEEE Transactions on Computers, vol. 52, no. 5, pp. 545–557, 2003.
[9] M. Gerla and Y. Yi, "Team Communications Among Autonomous Sensor Swarms," ACM SIGMOD Record, vol. 33, no. 1, pp. 20–25, 2004.
[10] X. Han, L. F. Rossi, and C. Shen, "Autonomous Navigation of Wireless Robot Swarms with Covert Leaders," in Proceedings of the 1st International Conference on Robot Communication and Coordination (RoboComm), pp. 1–8, October 2007.
[11] N. Wisitpongphan, O. K. Tonguz, J. S. Parikh, P. Mudalige, F. Bai, and V. Sadekar, "Broadcast Storm Mitigation Techniques in Vehicular Ad Hoc Networks," IEEE Wireless Communications, vol. 14, no. 6, pp. 84–94, 2007.
[12] E. Yanmaz, R. Kuschnig, and C. Bettstetter, "Achieving Air-Ground Communications in 802.11 Networks with Three-Dimensional Aerial Mobility," in Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), pp. 120–124, April 2013.
[13] R. M. Pires, A. S. R. Pinto, and K. R. L. J. C. Branco, "The Broadcast Storm Problem in FANETs and the Dynamic Neighborhood-Based Algorithm as a Countermeasure," IEEE Access, vol. 7, pp. 59737–59757, 2019.
[14] F. Dai and J. Wu, "Performance Analysis of Broadcast Protocols in Ad Hoc Networks Based on Self-Pruning," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 11, pp. 1027–1040, 2004.
[15] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence, Morgan & Claypool, 2007.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

