Research On Dynamic Probability Mechanism of Rebroadcasting For UAV Swarm
Zhen Qin∗ , Tingjun Du† , Fei Xiong‡ , Hai Wang∗§ , Aijing Li∗ , Yajie Chen∗
∗ College of Communications Engineering, Army Engineering University of PLA, Nanjing, China
† Unit 78102 of PLA, Chengdu, China
‡ CMC Political and Law Commission, Beijing, China
§ Corresponding author: Hai Wang
Email: qzqzla912@163.com; 1874404772@qq.com; xbibid@126.com; hai.wang@gmail.com;
lishan wh@126.com; chenyajie0705@gmail.com
Abstract—An unmanned aerial vehicle (UAV) swarm needs to broadcast routing messages among its members, accept unified control commands, and return the discovered targets' information to the ground control station in time. Most of these messages are broadcast within the UAV swarm. Therefore, in the UAV swarm network, the efficiency of broadcasting and rebroadcasting determines the efficiency of the entire network. Aiming at the broadcasting problem of UAV networks, and in order to adapt to the dynamic nature of the network and improve broadcasting ability and efficiency, this paper focuses on a rebroadcasting mechanism based on dynamic probability. A prediction method for the collision probability of rebroadcasting behavior and a multi-agent cooperative rebroadcasting method based on virtual actions and reinforcement learning are proposed. Simulation results show that, after pre-learning, the dynamic probability rebroadcasting mechanism has obvious performance advantages over fixed-probability flooding and simple flooding in dynamic scenarios.

Index Terms—UAV swarm, rebroadcast, dynamic probability, reinforcement learning.

I. INTRODUCTION

… overwhelm the network. Such overload results in overly redundant rebroadcasting (some nodes receive the same packet more than once), contention, and collisions, which are referred to as the Broadcast Storm Problem (BSP) [5]. When UAVs participate in a swarm mission, there is also a demand for network broadcasting, such as the broadcasting of safe-flight information between UAVs, of group formation status information, and of command-and-control and perception information. Therefore, the BSP is the core problem to be solved first for achieving UAV swarm control.

To achieve efficient broadcasting and solve the BSP, many methods have been proposed [5]–[13]. The main approach to solving the BSP in MANETs focuses on reducing the amount of redundant retransmissions, which can be achieved by preventing a subset of the network nodes from retransmitting. In general, these broadcast protocols are categorized into: (i) probability-based methods, which are …
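The probability-based family can be sketched minimally: each node that hears a new packet rebroadcasts it with a fixed probability p rather than always flooding, and suppresses duplicates. The sketch below is illustrative only; the probability value and RNG seed are arbitrary choices, not parameters from this paper.

```python
import random

def should_rebroadcast(packet_id, seen, p=0.5, rng=random.Random(1)):
    """Fixed-probability flooding: rebroadcast an unseen packet with
    probability p; packets already seen are always suppressed."""
    if packet_id in seen:
        return False
    seen.add(packet_id)
    return rng.random() < p

seen = set()
first = should_rebroadcast("pkt-7", seen)   # coin flip with probability p
dup = should_rebroadcast("pkt-7", seen)     # always False (duplicate)
```

The fixed p is exactly what the dynamic-probability mechanism of this paper replaces with a learned, per-neighbor value.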
Authorized licensed use limited to: ULAKBIM UASL ISTANBUL TEKNIK UNIV. Downloaded on January 15,2024 at 10:32:01 UTC from IEEE Xplore. Restrictions apply.
N_{R11∪R12∪R13}, and not in set N_{R0}. The loss is caused by the simultaneous dropping of R11 and other nodes in set N_{R0}.

We characterize P_{ji} as the probability of node i forwarding broadcast packets from node j. The calculations of profit and loss are as follows:

1) The FP of R11: There is one node in set N_{A1}. This node can receive the packet from R11 when R11 forwards and R13 drops. As a result, the FP of R11 in set N_{A1} is

FPr_{R11A1} = P_{R0R11}(1 − P_{R0R13}) × 1.  (5)

There is one node in set N_{A2}. This node can receive the packet from R11 when R11 forwards and both R12 and R13 drop. As a result, the FP of R11 in set N_{A2} is

FPr_{R11A2} = P_{R0R11}(1 − P_{R0R12})(1 − P_{R0R13}) × 1.  (6)

There is one node in set N_{A3}. This node can receive the packet whenever R11 forwards. As a result, the FP of R11 in set N_{A3} is

FPr_{R11A3} = P_{R0R11} × 1.  (7)

There are two nodes in set N_{A4}. These two nodes can receive the packet from R11 when R11 forwards and R12 drops. As a result, the FP of R11 in set N_{A4} is

FPr_{R11A4} = P_{R0R11}(1 − P_{R0R12}) × 2.  (8)

In summary, the FP of R11 is

FPr_{R11} = FPr_{R11A1} + FPr_{R11A2} + FPr_{R11A3} + FPr_{R11A4}
          = P_{R0R11}(1 − P_{R0R13}) + P_{R0R11}(1 − P_{R0R12})(1 − P_{R0R13}) + P_{R0R11} + 2P_{R0R11}(1 − P_{R0R12}).  (9)

2) The FL of R11: The node in set N_{A1} fails to receive the packet from R11 when R11 and R13 forward at the same time. As a result, the FL of R11 in set N_{A1} is

FLr_{R11A1} = P_{R0R11}P_{R0R13} × 1.  (10)

The node in set N_{A2} fails to receive the packet from R11 when R11 forwards while at least one of R12 and R13 also forwards. As a result, the FL of R11 in set N_{A2} is

FLr_{R11A2} = P_{R0R11}[P_{R0R12}(1 − P_{R0R13}) + (1 − P_{R0R12})P_{R0R13} + P_{R0R12}P_{R0R13}] × 1.  (11)

The node in set N_{A3} can receive the packet whenever R11 forwards. As a result, the FL of R11 in set N_{A3} is

FLr_{R11A3} = 0.  (12)

The two nodes in set N_{A4} fail to receive the packet from R11 when R11 and R12 forward at the same time. As a result, the FL of R11 in set N_{A4} is

FLr_{R11A4} = P_{R0R11}P_{R0R12} × 2.  (13)

In summary, the FL of R11 is

FLr_{R11} = FLr_{R11A1} + FLr_{R11A2} + FLr_{R11A3} + FLr_{R11A4}
          = P_{R0R11}P_{R0R13} + P_{R0R11}[P_{R0R12}(1 − P_{R0R13}) + (1 − P_{R0R12})P_{R0R13} + P_{R0R12}P_{R0R13}] + 2P_{R0R11}P_{R0R12}.  (14)

The reward of the forwarding action of R11 is

Fr_{R11} = FPr_{R11} − FLr_{R11}.  (15)

Forwarding has more advantages than disadvantages when Fr_{R11} > 0, and the opposite otherwise.

3) The DP of R11: The node in set N_{A1} can receive the packet when R11 drops and R13 forwards. As a result, the DP of R11 in set N_{A1} is

DPr_{R11A1} = (1 − P_{R0R11})P_{R0R13} × 1.  (16)

The node in set N_{A2} can receive the packet when R11 drops while exactly one of R12 and R13 forwards. As a result, the DP of R11 in set N_{A2} is

DPr_{R11A2} = (1 − P_{R0R11})[P_{R0R12}(1 − P_{R0R13}) + (1 − P_{R0R12})P_{R0R13}] × 1.  (17)

The node in set N_{A3} fails to receive the packet when R11 drops. As a result, the DP of R11 in set N_{A3} is

DPr_{R11A3} = 0.  (18)

The two nodes in set N_{A4} can receive the packet when R11 drops and R12 forwards. As a result, the DP of R11 in set N_{A4} is

DPr_{R11A4} = (1 − P_{R0R11})P_{R0R12} × 2.  (19)

In summary, the DP of R11 is

DPr_{R11} = DPr_{R11A1} + DPr_{R11A2} + DPr_{R11A3} + DPr_{R11A4}
          = (1 − P_{R0R11})P_{R0R13} + (1 − P_{R0R11})[P_{R0R12}(1 − P_{R0R13}) + (1 − P_{R0R12})P_{R0R13}] + 2(1 − P_{R0R11})P_{R0R12}.  (20)

4) The DL of R11: The node in set N_{A1} fails to receive the packet when R11 and R13 drop at the same time. As a result, the DL of R11 in set N_{A1} is

DLr_{R11A1} = (1 − P_{R0R11})(1 − P_{R0R13}) × 1.  (21)

The node in set N_{A2} fails to receive the packet when R11 drops while R12 and R13 both forward or both drop. As a result, the DL of R11 in set N_{A2} is

DLr_{R11A2} = (1 − P_{R0R11})[P_{R0R12}P_{R0R13} + (1 − P_{R0R12})(1 − P_{R0R13})] × 1.  (22)
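The per-set accounting above can be checked numerically. The sketch below evaluates the forward profit (9), forward loss (14), and forward reward (15) for hypothetical values of the forwarding probabilities (illustrative only, not taken from the paper), and cross-checks the result against the general per-target form of eq. (35).

```python
# Hypothetical forwarding probabilities (illustrative values only):
# p11 = P_R0R11, p12 = P_R0R12, p13 = P_R0R13.
p11, p12, p13 = 0.6, 0.5, 0.4

# Forward profit, eq. (9): one node in each of N_A1..N_A3, two nodes in N_A4.
fp = (p11 * (1 - p13)                    # A1: R11 forwards, R13 drops
      + p11 * (1 - p12) * (1 - p13)      # A2: R11 forwards, R12 and R13 drop
      + p11                              # A3: reachable only through R11
      + 2 * p11 * (1 - p12))             # A4: R11 forwards, R12 drops

# Forward loss, eq. (14): collisions when other relays forward simultaneously.
fl = (p11 * p13                                                # A1
      + p11 * (p12 * (1 - p13) + (1 - p12) * p13 + p12 * p13)  # A2
      + 2 * p11 * p12)                                         # A4 (A3 term is 0)

f_reward = fp - fl                       # eq. (15)

# Cross-check against the general form of eq. (35):
# Fr = P_ji * sum_b [2 * prod_a (1 - P_ja)^delta_ab - 1]
f_check = p11 * ((2 * (1 - p13) - 1)                  # A1: also covered by R13
                 + (2 * (1 - p12) * (1 - p13) - 1)    # A2: covered by R12 and R13
                 + (2 - 1)                            # A3: covered by no other relay
                 + 2 * (2 * (1 - p12) - 1))           # A4: two nodes covered by R12
assert abs(f_reward - f_check) < 1e-12
print(round(f_reward, 4))   # 0.48
```

With these values the forward reward is positive, so forwarding would be the advantageous action for R11.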
The node in set N_{A3} fails to receive the packet when R11 drops. As a result, the DL of R11 in set N_{A3} is

DLr_{R11A3} = (1 − P_{R0R11}) × 1.  (23)

The two nodes in set N_{A4} fail to receive the packet when R11 and R12 drop. As a result, the DL of R11 in set N_{A4} is

DLr_{R11A4} = (1 − P_{R0R11})(1 − P_{R0R12}) × 2.  (24)

In summary, the DL of R11 is

DLr_{R11} = DLr_{R11A1} + DLr_{R11A2} + DLr_{R11A3} + DLr_{R11A4}
          = (1 − P_{R0R11})(1 − P_{R0R13}) + (1 − P_{R0R11})[P_{R0R12}P_{R0R13} + (1 − P_{R0R12})(1 − P_{R0R13})] + (1 − P_{R0R11}) + 2(1 − P_{R0R11})(1 − P_{R0R12}).  (25)

The reward of the dropping action of R11 is

Dr_{R11} = DPr_{R11} − DLr_{R11}.  (26)

Dropping has more advantages than disadvantages when Dr_{R11} > 0, and the opposite otherwise.

B. The evaluation model for the rebroadcasting actions

Define N_i = {1, 2, ..., N} as the neighbor set of UAV i. Set N_i can be divided into three subsets according to the positions of the UAVs when UAV j ∈ N_i broadcasts first.

The set of the previous-hop node:

N_i^p = {j}.  (27)

The set of current-hop nodes:

N_i^c = {n | 0 < ||G_n − G_j|| ≤ C_m, n ∈ N_i}.  (28)

The set of next-hop nodes:

N_i^n = {n | ||G_n − G_j|| > C_m, n ∈ N_i}.  (29)

The communication relation function can be expressed as

δ_pq = 0, ||G_p − G_q|| > C_m;  δ_pq = 1, ||G_p − G_q|| ≤ C_m,  (30)

where UAV p cannot communicate with UAV q when δ_pq = 0, and can communicate with UAV q when δ_pq = 1.

The reward function is composed of the profit and loss of each UAV in set N_i^n. The general forms of the reward function for rebroadcasting actions are proposed as follows.

The FP of UAV i:

FPr_i = P_{ji} Σ_{b∈N_i^n} Π_{a∈N_i^c} (1 − P_{ja})^{δ_ab}.  (31)

The FL of UAV i:

FLr_i = P_{ji} Σ_{b∈N_i^n} [1 − Π_{a∈N_i^c} (1 − P_{ja})^{δ_ab}].  (32)

The DP of UAV i:

DPr_i = (1 − P_{ji}) Σ_{b∈N_i^n} Σ_{a∈N_i^c} [δ_ab P_{ja} Π_{c∈N_i^c\{a}} (1 − P_{jc})^{δ_bc}].  (33)

The DL of UAV i:

DLr_i = (1 − P_{ji}) Σ_{b∈N_i^n} [1 − Σ_{a∈N_i^c} δ_ab P_{ja} Π_{c∈N_i^c\{a}} (1 − P_{jc})^{δ_bc}].  (34)

The reward of the forwarding action of UAV i is

Fr_i = FPr_i − FLr_i = P_{ji} Σ_{b∈N_i^n} [2 Π_{a∈N_i^c} (1 − P_{ja})^{δ_ab} − 1].  (35)

The reward of the dropping action of UAV i is

Dr_i = DPr_i − DLr_i = (1 − P_{ji}) Σ_{b∈N_i^n} {2 Σ_{a∈N_i^c} [δ_ab P_{ja} Π_{c∈N_i^c\{a}} (1 − P_{jc})^{δ_bc}] − 1}.  (36)

III. A MULTI-AGENT COOPERATIVE REBROADCASTING METHOD

In the UAV swarm network, because the UAVs move, the proposed rebroadcasting method needs to adjust the probability of the rebroadcasting behavior according to the changes of the UAVs' positions. Furthermore, the change of position is a continuous process, so the adjustment is a process of experience learning. Based on the evaluation model for the rebroadcasting actions, a multi-agent collaborative Q-learning algorithm is proposed, similar to [15].

Each UAV sends a hello packet periodically within one-hop range, and its neighbors independently use the Q-learning algorithm to learn from the information received in hello packets. When a UAV receives a hello packet, it performs a virtual action, i.e., it assumes that a rebroadcasting action has been done. The UAV then evaluates and learns the reward of this virtual action, and finally updates its Q value table.

A. Q value table

TABLE I: Q value table
S_ji | F_ji  D_ji  G_j  t_j

As shown in Table I, UAV i creates a Q value table Q_ji for UAV j when i receives a hello packet from j for the first time. Q_ji has one state S_ji (meaning that i receives a packet from j), two actions (forward F_ji and drop D_ji), and two pieces of information (the GPS position G_j and the hello time t_j). Every time UAV i receives a hello packet from UAV j, this information is updated. UAV i deletes Q_ji once it has not received a hello packet from UAV j by time t_j + ∆, which means that the distance between UAV i and UAV j exceeds C_m. For example, node R11 in Fig. 1 should create seven Q value tables for N_{R11}.

UAV i selects a rebroadcasting action when it receives a broadcast packet from UAV j. The selection probability of forwarding is calculated according to the Boltzmann formula:

P_{ji} = e^{F_ji/T} / (e^{F_ji/T} + e^{D_ji/T}),  (37)

where T is a positive parameter called the temperature [16]. High temperatures cause the actions to be all (nearly) equiprobable. Low temperatures cause a greater difference in selection probability for actions that differ in their value estimates.
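As an executable reading of eq. (37), the sketch below shows how the temperature T trades exploration against exploitation; the Q values and temperature settings are illustrative assumptions, not values from the paper.

```python
import math

def forward_probability(f_ji, d_ji, temperature):
    """Boltzmann (softmax) rule of eq. (37): probability that UAV i
    forwards, given the action values F_ji and D_ji."""
    ef = math.exp(f_ji / temperature)
    ed = math.exp(d_ji / temperature)
    return ef / (ef + ed)

# Same action values, different temperatures:
p_hot = forward_probability(1.0, 0.5, temperature=10.0)   # ~0.512: near-equiprobable
p_cold = forward_probability(1.0, 0.5, temperature=0.05)  # ~1.000: near-greedy
```

At high T the two actions are chosen almost uniformly, preserving exploration; at low T the higher-valued action dominates.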
B. Q-learning algorithm

All Q value tables are included in the hello packet when UAV j broadcasts the hello message. UAV i performs a virtual action, i.e., it assumes that a rebroadcasting action has been done, when it receives the hello packet from j. UAV i decides to forward or drop the packet, evaluates the decision, then learns the reward and updates its Q value table. UAV i updates its Q values according to the Bellman equation, which can be expressed as

Q_{t+1}(st, ac) = Q_t(st, ac) + α[r_t(st, ac) + γ max_{ac'} Q_t(st', ac') − Q_t(st, ac)],  (38)

where Q_t(st, ac) is the Q value of rebroadcasting decision ac in state st at time t, r_t(st, ac) is the reward function when the transition is made from st with decision ac, and max_{ac'} Q_t(st', ac') is the maximum possible Q value after the transition with the next possible action ac'. α is the learning rate and controls the learning speed, and γ ∈ [0, 1] is a discount factor that maintains stability. Since the state space of st has only one state and the rebroadcasting decision has two actions (forward and drop), the equation can be simplified as

Q_{t+1}(ac) = Q_t(ac) + α[r_t(ac) + γ max{F, D} − Q_t(ac)].  (39)

According to the evaluation model for the rebroadcasting actions mentioned earlier, the reward function can be defined as

r(ac) = P_{ji} Σ_{b∈N_i^n} [2 Π_{a∈N_i^c} (1 − P_{ja})^{δ_ab} − 1],  ac = Forward;
r(ac) = (1 − P_{ji}) Σ_{b∈N_i^n} {2 Σ_{a∈N_i^c} [δ_ab P_{ja} Π_{c∈N_i^c\{a}} (1 − P_{jc})^{δ_bc}] − 1},  ac = Drop.  (40)

Based on the above analysis, a Q-learning algorithm is proposed. The implementation procedure of one UAV is shown in Algorithm 1. All UAVs use the Q-learning algorithm to make decisions independently.

Algorithm 1 Q-learning algorithm of UAV i
1: UAV i keeps listening, and receives the hello packet from UAV j;
2: If this is the first time the hello message is received from UAV j, a Q value table Q_ji is constructed, and F_ji and D_ji are set to 1. If Q_ji has already been constructed, the location and time information of the Q value table are updated;
3: UAV i selects the forward action according to the probability P_ji = e^{F_ji/T} / (e^{F_ji/T} + e^{D_ji/T});
4: The reward of the actions is calculated by formula (40);
5: UAV i updates the Q value table Q_ji:
   F_ji = F_ji + α[r(ac) + γ max{F_ji, D_ji} − F_ji],  ac = Forward,
   D_ji = D_ji + α[r(ac) + γ max{F_ji, D_ji} − D_ji],  ac = Drop.  (41)

IV. NUMERICAL RESULTS AND DISCUSSIONS

The proposed method is simulated in EXata. For our simulations, we assume that 100 nodes are randomly distributed to form a UAV swarm network in a square area with a side length of 800 m. The maximum communication distance of each UAV is 168 m, and the bandwidth is 2 Mbit/s. The UAVs move randomly with a maximum speed of 10 m/s and a pause interval of 30 seconds. Each UAV sends a hello message every 2 seconds and sends broadcast packets at a fixed interval. The size of a broadcast packet is 300 kbit, the TTL is 7, and the data transmission time is 1800 seconds. Experiments are carried out in two scenarios: (i) each UAV sends a broadcast packet at an interval of 5 seconds; (ii) each UAV sends a broadcast packet at an interval of 10 seconds.

The performance of six broadcast mechanisms is evaluated, and 10 groups of experiments are conducted for each broadcast mechanism in the two scenarios. The six broadcast mechanisms are as follows:
• A: Flooding
• B: 33% probability flooding
• C: 50% probability flooding
• D: 75% probability flooding
• E: Q-learning
• F: Q-learning with delay (1000 s pre-learning)

In this scenario, the amount of broadcast data is very large, and serious collisions are inevitable. It is difficult for data packets to be transmitted effectively, resulting in poor transmission performance for the whole network. Under this extreme condition, it is impossible to ensure the normal transmission of every packet; only when network conditions permit can part of the packets be transmitted as far as possible.

Considering that the number of broadcast packets is fixed, whether the network conditions can be improved can only be controlled by adjusting the rebroadcasting behavior. That is, when the network conditions permit, the probability of rebroadcasting increases correspondingly; when the network conditions are bad, the probability of rebroadcasting decreases correspondingly. Therefore, the network data transmission situation can be evaluated by the packet receiving and sending ratio: the number of packets correctly received by all UAVs divided by the number of packets successfully sent by UAVs in the form of broadcasting and rebroadcasting.

As shown in Fig. 2 and Fig. 3, compared with the other methods, the flooding broadcast mechanism has a lower packet receiving and sending ratio, which indicates that the probability-based mechanisms can improve broadcast performance; the improvement is about 40%. Comparing the probability-based methods in the two scenarios, none of them has an obvious advantage in terms of the packet receiving and sending ratio. However, when the standard deviation of the packet receiving and sending ratio is also considered, it is found that the performance of the three fixed-probability methods varies greatly due to the movement of the UAVs. In contrast, the fluctuation of the two Q-learning methods is smaller and their performance is more stable. In particular, the method based on delayed Q-learning has better stability.

Through further analysis of the experimental phenomena, the following conclusions are drawn:
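The adaptive behavior just described follows from Algorithm 1's update rule: repeated rewards shift the action values, and hence the Boltzmann forwarding probability of eq. (37). The sketch below applies the simplified update of eqs. (39)/(41); the values of α, γ, and the reward are illustrative assumptions, not simulation parameters.

```python
ALPHA, GAMMA = 0.1, 0.9   # illustrative learning rate and discount factor

class QRecord:
    """Per-neighbor record of Table I: action values F_ji, D_ji plus GPS/time."""
    def __init__(self, gps, t):
        self.F = 1.0           # forward value, initialized to 1 (Algorithm 1, step 2)
        self.D = 1.0           # drop value
        self.gps, self.t = gps, t

    def update(self, action, reward):
        """Simplified single-state Q update of eqs. (39)/(41)."""
        best = max(self.F, self.D)
        if action == "Forward":
            self.F += ALPHA * (reward + GAMMA * best - self.F)
        else:
            self.D += ALPHA * (reward + GAMMA * best - self.D)

q = QRecord(gps=(0.0, 0.0, 0.0), t=0.0)
for _ in range(50):
    q.update("Forward", reward=-0.5)   # congested conditions: negative forward reward
print(q.F < q.D)   # True: forwarding has become the less attractive action
```

Persistently negative forward rewards (bad network conditions) drive F_ji below D_ji, so eq. (37) lowers the rebroadcasting probability, matching the intended adaptive behavior.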
Fig. 2: Packet receiving and sending ratio (bar chart over mechanisms A–F, Scenario i and Scenario ii).

Fig. 3: Standard deviation of the packet receiving and sending ratio (Scenario i and Scenario ii).

… It then evaluates the value of the rebroadcasting behavior while forming a dynamic broadcast probability. To model the evaluation of rebroadcasting behavior, a collision probability prediction method for rebroadcasting behavior and a multi-agent cooperative rebroadcasting method based on virtual actions and RL are proposed. Simulation results show that the dynamic probability mechanism based on Q-learning has obvious performance advantages over fixed-probability flooding and the basic flooding mechanism in dynamic network scenarios after the pre-learning process. In the future, this dynamic probability method can be combined with other broadcast mechanisms for further improvement, which will be conducive to the further study of UAV network broadcasting.

VI. ACKNOWLEDGMENT

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61702545.

REFERENCES

[1] P. K. Sharma and D. I. Kim, "Random 3D Mobile UAV Networks: Mobility Modeling and Coverage Probability," IEEE Transactions on Wireless Communications, vol. 18, no. 5, pp. 2527–2538, 2019.
[2] D. Wu, D. I. Arkhipov, M. Kim, C. L. Talcott, A. C. Regan, J. A. McCann, and N. Venkatasubramanian, "ADDSEN: Adaptive Data Processing and Dissemination for Drone Swarms in Urban Sensing," IEEE Transactions on Computers, vol. 66, no. 2, pp. 183–198, 2017.