A_Successive_Deep_Q-Learning_Based_Distributed_Handover_Scheme_for_Large-Scale_LEO_Satellite_Networks

A Successive Deep Q-Learning Based Distributed Handover
Scheme for Large-Scale LEO Satellite Networks

Haotian Liu, Yichen Wang, and Yixin Wang
2022 IEEE 95th Vehicular Technology Conference (VTC2022-Spring) | 978-1-6654-8243-1/22/$31.00 ©2022 IEEE | DOI: 10.1109/VTC2022-Spring54318.2022.9860376
School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
Email: {lht1998@stu.xjtu.edu.cn, wangyichen0819@mail.xjtu.edu.cn, wangyixin6475@stu.xjtu.edu.cn}
Abstract: With the rapidly increasing number of deployed satellites, the handover strategy becomes more challenging for large-scale low-earth orbit (LEO) constellations. In this paper, a distributed satellite handover scheme for large-scale LEO constellations is proposed, which not only takes the handover delay, handover failure, quality-of-service (QoS) requirements of users, and inter-satellite traffic balancing into consideration, but also enables each user to dynamically perform the handover process only with the local information. Specifically, we adopt a shadowed Rice model to characterize the user-satellite channel, which is determined by the elevation angle between the user and satellite. Then, the user utility function is designed, where the user transmission rate requirement and the number of available channels of visible satellites are jointly considered. An overall long-term utility maximization problem is further formulated. By exploiting the independence feature of different satellites and the fact that each user only has a finite number of visible satellites, a low-complexity successive deep Q-learning algorithm is developed, which can significantly reduce the dimensions of state spaces and efficiently solve the formulated problem in a distributed manner. Simulation results show that the proposed scheme can achieve better performance than existing methods.

Index Terms: Large-scale LEO satellite networks, distributed satellite handover, low-complexity deep reinforcement learning.

I. INTRODUCTION

The low-earth orbit (LEO) satellite network is recognized as one of the most promising ways to achieve seamless global communications and will play an important role in the future sixth-generation (6G) networks due to its superiority in providing low-latency and broadband communications [1]. However, the high mobility of the LEO satellites leads to the very limited coverage time of a single satellite and thus the handover has to be frequently implemented during the user's service duration [2]. Consequently, the handover scheme design is critically important to the LEO satellite networks.

Early research towards the handover schemes for LEO satellite networks mainly concentrated on the beam handover scheme designs [3]–[5]. In particular, most beam handover schemes focused on designing the channel allocation strategies without considering the user's quality-of-service (QoS) requirements. In recent years, with the launch of the large-scale LEO satellite networks and the application of spotbeam technologies, the satellite handover schemes have attracted much research attention from both industry and academia. Specifically, the graph theory was used to design the satellite handover schemes, where the handover process was viewed as finding an optimal path in the constructed graph [6]–[8]. Moreover, the machine learning based handover schemes were also developed in which the users' QoS requirements can be efficiently integrated into the scheme designs [9]–[11]. Besides the abovementioned research, a multi-layer handover management framework and a software-defined networking based handover mechanism were proposed in [12] and [13], respectively.

Although many handover schemes have been proposed for LEO satellite networks, it is still very challenging to implement the existing schemes in the large-scale LEO constellations. Specifically, on the one hand, the large number of satellites will significantly increase the computation complexity and signaling overhead, especially for the centralized handover schemes. On the other hand, the existing schemes rely on the global information of the entire LEO satellite networks. However, the huge number of satellites in the large-scale LEO constellations puts heavy burdens on the capacity-limited user terminals to acquire the global information. Moreover, the highly dynamic environment of large-scale LEO constellations is not sufficiently captured by the existing handover schemes, which may degrade the network performance. Consequently, there is an urgent need to design a distributed low-complexity satellite handover scheme for large-scale LEO constellations, which enables each user to independently perform the handover process only based on the local information.

To achieve the above goal, we propose a distributed satellite handover scheme for large-scale LEO constellations, which allows each user to adaptively perform the handover without the global network information. Specifically, a shadowed Rice model is adopted to characterize the user-satellite link, which is determined by the elevation angle between the user and satellite. Then we consider the user transmission rate requirement and the number of available channels of visible satellites jointly and design the user utility function. An overall long-term utility maximization problem is further formulated. By utilizing the independence of different satellites and the fact that the visible satellites of each user are limited, a successive deep Q-learning (SDQL) algorithm is developed, which can significantly reduce the dimensions of state spaces and solve the formulated problem effectively in a distributed manner. Simulation results show that the proposed scheme can achieve better performance than existing methods.

The rest of this paper is organized as follows. Section II presents the system model. In Section III, the distributed satellite handover scheme is proposed. Section IV provides the simulation results and Section V concludes the paper.

This work (corresponding author: Yichen Wang) was supported in part by the National Natural Science Foundation of China under Grant 61871314 and in part by the Key Research and Development Program of Shaanxi Province under Grant 2019ZDLGY07-04.
II. SYSTEM MODEL

We consider a downlink transmission scenario in a large-scale LEO satellite network which consists of $M$ LEO satellites and $N$ ground users. The index sets of users and LEO satellites are denoted by $\mathcal{N} = \{1, 2, \cdots, N\}$ and $\mathcal{M} = \{1, 2, \cdots, M\}$, respectively. Each satellite is assumed to have $C_{\max}$ channels. The time is divided into slots with the duration $t_s$. Then, if user $i$ ($i \in \mathcal{N}$) is located in the coverage area of satellite $j$ ($j \in \mathcal{M}$), i.e., satellite $j$ is visible to user $i$, the overall channel power gain between user $i$ and satellite $j$ in slot $t$ can be written as [14]

$$ Q_{i,j}(t) = L_{i,j}(t) \cdot G_S(\varphi_{i,j}) \cdot G_T \cdot h_{i,j}(t) \tag{1} $$

where $L_{i,j}(t)$, $G_S(\varphi_{i,j})$, $G_T$, and $h_{i,j}(t)$ denote the free space loss, satellite antenna gain, user antenna gain, and small-scale channel power gain between user $i$ and satellite $j$ in slot $t$, respectively. Moreover, $h_{i,j}(t)$ is assumed to follow the shadowed Rice fading model with the probability density function (PDF) [15]

$$ f_{h_{i,j}|\theta_{i,j}(t)}(x) = \left( \frac{2 b_{i,j} m_{i,j}}{2 b_{i,j} m_{i,j} + \Omega_{i,j}} \right)^{m_{i,j}} \frac{1}{2 b_{i,j}} \exp\left( -\frac{x}{2 b_{i,j}} \right) {}_1F_1\left( m_{i,j}, 1, \frac{\Omega_{i,j} x}{2 b_{i,j} (2 b_{i,j} m_{i,j} + \Omega_{i,j})} \right) \tag{2} $$

where $2b_{i,j}$ and $\Omega_{i,j}$ denote the average power of the multi-path and the line-of-sight (LoS) components between user $i$ and satellite $j$, respectively, $m_{i,j}$ is the Nakagami-m fading parameter, and ${}_1F_1(\cdot, \cdot, \cdot)$ is the confluent hypergeometric function. Note that the parameters $b_{i,j}$, $\Omega_{i,j}$, and $m_{i,j}$ in (2) are determined by the elevation angle $\theta_{i,j}(t)$, which is defined as the angle between the local horizontal plane of user $i$ and the direction towards satellite $j$ in slot $t$. To be specific, the relationships between the elevation angle $\theta_{i,j}(t)$ and $b_{i,j}$, $\Omega_{i,j}$, and $m_{i,j}$ are given by [15]

$$ b_{i,j} = -4.7943 \times 10^{-8} \theta_{i,j}^3 + 5.5784 \times 10^{-6} \theta_{i,j}^2 - 2.1344 \times 10^{-4} \theta_{i,j} + 3.2710 \times 10^{-2} \tag{3} $$

$$ \Omega_{i,j} = 1.4428 \times 10^{-5} \theta_{i,j}^3 - 2.3798 \times 10^{-3} \theta_{i,j}^2 + 1.2702 \times 10^{-1} \theta_{i,j} - 1.4864 \tag{4} $$

$$ m_{i,j} = 6.3739 \times 10^{-5} \theta_{i,j}^3 + 5.8533 \times 10^{-4} \theta_{i,j}^2 - 1.5973 \times 10^{-1} \theta_{i,j} + 3.5156. \tag{5} $$

We assume that $h_{i,j}(t)$ ($i \in \mathcal{N}$, $j \in \mathcal{M}$) follows the quasi-static fading model. As the positions of users and satellites will not change significantly in each slot, i.e., the elevation angle $\theta_{i,j}(t)$ can be viewed as static in each slot, it is reasonable to assume that $Q_{i,j}(t)$ remains unchanged in each slot, but varies independently from one slot to another. We can clearly observe from (1)-(5) that, different from the widely used channel model, where the channel power gain is described by a single PDF, the channel model employed in this paper is characterized by a time-varying PDF that is determined by the elevation angle $\theta_{i,j}(t)$. Although the adopted time-varying elevation angle based channel model can more accurately characterize the ground-satellite channel, it increases the handover complexity and thus makes the distributed adaptive handover scheme design more challenging.

III. THE PROPOSED DISTRIBUTED SATELLITE HANDOVER SCHEME FOR LARGE-SCALE LEO CONSTELLATIONS

A. Handover Mechanism Descriptions

As the large-scale LEO constellation contains a huge number of satellites, each user is usually covered by multiple satellites. Moreover, due to the high mobility of LEO satellites, the available satellites for each user change dynamically. To enable users to track the highly dynamic LEO satellite network and avoid the transmission outage, we define the handover frame that includes $T_H$ slots. In this way, during the data transmission period, users have to decide whether to perform the satellite handover every $T_H$ slots.

[Figure 1: timeline of one handover frame of $T_H$ slots for a user over time $t$, showing the handover decision, one or more handover stages of $T_A$ slots each (handover to a new satellite, handover fail, handover succeed), and the data transmission stage.]
Fig. 1. The handover mechanism of the proposed scheme.

To be more specific, as shown in Fig. 1, each handover frame can be divided into two parts, namely the handover stage and the data transmission stage. The durations of the handover and transmission stages for each user change dynamically from one frame to another. Each handover stage includes $T_A$ slots used for the signaling exchanges and onboard processing. Different from the existing works, we assume that the handover might fail if the channel quality between the user and the newly switched satellite causes an outage. In that case, the user will re-perform the handover process during the next $T_A$ slots until the end of the current frame. Moreover, if the user fails to access one satellite at the end of the current handover frame or all channels of the newly switched satellite have been occupied, the user's service will be terminated.

In slot $t$, the transmission rate between user $i$ and satellite $j$ is given by

$$ R_{i,j}(t) = B \log_2\left( 1 + \frac{P_K L_{i,j}(t) G_S(\varphi_{i,j}) G_T h_{i,j}(t)}{\sigma^2} \right) \tag{6} $$

where $B$ is the bandwidth of each channel, $P_K$ is the transmit power, and $\sigma^2$ is the average power of additive white Gaussian noise (AWGN). Note that, thanks to the hybrid wide-spot beam coverage technique that enables the narrow bandwidth spot beams to steer to users [1], the interference between different users can be ignored. Then, the transmission outage probability is determined by

$$ P^f_{i,j} = \Pr\left\{ R_{i,j}(t) < R_{\min} \right\} = \int_0^{\phi_{i,j}(t)} f_{h_{i,j}|\theta_{i,j}(t)}(x) \, dx \tag{7} $$

where $\phi_{i,j}(t) = \frac{\sigma^2 \left( 2^{R_{\min}/B} - 1 \right)}{P_K L_{i,j}(t) G_S(\varphi_{i,j}) G_T}$ and $R_{\min}$ denotes the minimum transmission rate requirement.

Based on the above descriptions, the proposed handover mechanism can be summarized as follows:

• If the user decides not to perform the handover in the current frame, then the user will continue to access the currently connected satellite and the entire $T_H$ slots can be used for data transmissions.
• If the user decides to perform the handover, then $T_A$ slots will be consumed.
• If one handover process fails, another handover stage including $T_A$ slots will be initiated.
• The maximum allowed number of handover stages in each handover frame is $\lfloor T_H / T_A \rfloor$, where $\lfloor \cdot \rfloor$ is the floor function.
• If the handover process succeeds after $K$ ($1 \le K \le \lfloor T_H / T_A \rfloor$) handover stages, the data transmission stage of the current frame will include $(T_H - K T_A)$ slots.
• If all $\lfloor T_H / T_A \rfloor$ handover processes fail, the user's service will be terminated.

B. Problem Formulation

Due to the time-varying network environment and dynamically changing visible satellites of users, the proposed handover scheme faces the following two challenges: 1) For each single user, the handover decision has to be adaptively performed to avoid the transmission outage; 2) For the entire large-scale LEO constellation, the network traffic load should be balanced among the available satellites. To solve the above two issues, we will design the utility function for each user, which takes the user's transmission requirement and the inter-satellite traffic balancing into consideration.

We assume that user $i$ ($i \in \mathcal{N}$) is going to initiate the transmission in slot $t_a$ and will finish the transmission in slot $t_e$. Then, the index set of slots in which user $i$ makes handover decisions is denoted by $\mathcal{D}_i = \{t_a, t_a + T_H, \cdots, t_a + L T_H\}$, where $L$ is the maximum integer that satisfies $t_a + L T_H \le t_e$. Denote by $\beta_{i,j}(t) \in \{0, 1\}$ the handover decision of user $i$ for satellite $j$ ($j \in \mathcal{M}$), where $\beta_{i,j}(t) = 1$ means that user $i$ performs the handover operation in slot $t$ ($t \in \mathcal{D}_i$) and decides to switch to satellite $j$, and $\beta_{i,j}(t) = 0$ denotes that user $i$ will not connect to satellite $j$. As the handover failure might occur, we further denote by $\gamma_{i,j}(t) \in \{0, 1\}$ the connection status between user $i$ and satellite $j$, where $\gamma_{i,j}(t) = 1$ means that user $i$ is connected to satellite $j$ in slot $t$ and $\gamma_{i,j}(t) = 0$ implies that the connection between user $i$ and satellite $j$ has not been established.

Suppose that the size of each data packet is $S_p$ and users can obtain the reward denoted by $B_P$ if one data packet is successfully transmitted. Then, the reward that user $i$ obtains from satellite $j$ in slot $t$ can be written as

$$ G^P_{i,j}(t) = \begin{cases} 0, & R_{i,j}(t) < R_{\min} \\ B_P \cdot \left\lfloor \dfrac{R_{i,j}(t) \cdot t_s}{S_p} \right\rfloor, & R_{i,j}(t) \ge R_{\min} \end{cases} \tag{8} $$

where $\lfloor \cdot \rfloor$ is the floor function and $R_{\min}$ denotes the minimum transmission rate requirement.

Once satellite $j$ receives the handover request from user $i$, it will reserve one channel for user $i$ no matter whether user $i$ has switched to it successfully. Therefore, user $i$ has to pay the cost in both the handover stage and the data transmission stage. The cost is given by

$$ G^C_{i,j}(t) = G^C_{i,j}(t^0_{i,j}) = U^C_{i,j}\left( x_j(t^0_{i,j}) \right) \cdot B_C, \quad t^0_{i,j} \in \mathcal{D}_i \tag{9} $$

where $B_C$ is the minimum cost for occupying one channel, $t^0_{i,j}$ denotes the slot in which user $i$ decides to switch to satellite $j$, $x_j(t)$ represents the number of occupied channels of satellite $j$, and $U^C_{i,j}(\cdot)$ is the cost coefficient. To efficiently realize the inter-satellite load balancing and guarantee the fairness for user transmissions, we expect $U^C_{i,j}(\cdot)$ to be an increasing function of the number of occupied channels of the satellite in the current slot. Then, $U^C_{i,j}(x)$ is designed as a Sigmoid-like function and is given by

$$ U^C_{i,j}(x) = 1 + \frac{K_C - 1}{1 + \exp\left( -\dfrac{10}{C_{\max}} \left( x - \dfrac{C_{\max}}{2} \right) \right)} \tag{10} $$

where $K_C$ is the upper limit of $U^C_{i,j}(x)$ and $C_{\max}$ is the maximal number of available channels per satellite. We can observe from (9) and (10) that, if user $i$ performs handover to satellite $j$, the cost that user $i$ needs to pay in slot $t$ is determined by the number of idle channels that satellite $j$ has in slot $t^0_{i,j}$, i.e., the slot in which user $i$ decides to switch to satellite $j$. If the connection between user $i$ and satellite $j$ does not change, the cost that user $i$ needs to pay in each slot remains unchanged.

Based on (8)-(10), the utility function of user $i$ with satellite $j$ in slot $t$ is given by

$$ G_{i,j}(t) = \gamma_{i,j}(t) G^P_{i,j}(t) - \beta_{i,j}(t) G^C_{i,j}(t), \quad i \in \mathcal{N}, \; j \in \mathcal{M}. \tag{11} $$

Then, we can formulate the optimization problem that aims to maximize the overall long-term network utility under the handover decision constraint as well as the satellite capacity and user-satellite connection requirements. Specifically, the formulated optimization problem can be mathematically written as

$$ \text{(OP1)} \quad \max_{\beta(t)} \; \sum_{t=0}^{\infty} \sum_{i=1}^{N} \sum_{j=1}^{M} G_{i,j}(t) \tag{12} $$
$$ \text{s.t.} \quad \sum_{j=1}^{M} \beta_{i,j}(t) \le 1, \quad \forall t, \; \forall i \in \mathcal{N} \tag{13} $$
$$ \sum_{i=1}^{N} \beta_{i,j}(t) \le C_{\max}, \quad \forall t, \; \forall j \in \mathcal{M} \tag{14} $$
$$ \beta_{i,j}(t) = \beta_{i,j}(t - 1), \quad \forall t \notin \mathcal{D}_i, \; \forall i \in \mathcal{N} \tag{15} $$
$$ \beta_{i,j}(t) \in \{0, 1\}, \quad \forall t, \; \forall i \in \mathcal{N}, \; \forall j \in \mathcal{M} \tag{16} $$
$$ \gamma_{i,j}(t) \in \{0, 1\}, \quad \forall t, \; \forall i \in \mathcal{N}, \; \forall j \in \mathcal{M} \tag{17} $$

where $\beta(t) = [\beta_{i,j}(t)]_{N \times M}$ ($i \in \mathcal{N}$, $j \in \mathcal{M}$) denotes the handover decision matrix. Constraints (13) and (14) indicate that each user can only choose one satellite for handover and each satellite can serve at most $C_{\max}$ users in each slot, respectively. Constraint (15) represents that the handover is only triggered at the beginning of handover frames. We can observe from the formulated problem (OP1) that the optimal handover decision $\beta(t)$ is time-varying and should adapt to the dynamic network status. Moreover, the handover decision of each user must consider the possibly obtained reward and the network traffic load balancing simultaneously. As it is very difficult to solve the formulated problem, we will establish
the Markov decision process (MDP) framework and develop a low-complexity successive deep Q-learning algorithm in the following section, which can efficiently address the formulated problem in a distributed manner.

C. The Proposed Successive Deep Q-Learning Algorithm

Deep reinforcement learning (DRL), which can achieve long-term optimization through dynamically interacting with the environment, is an efficient approach to solve the formulated optimization problem (OP1). Recall that the index set of slots in which user $i$ ($i \in \mathcal{N}$) makes the handover decisions is denoted by $\mathcal{D}_i = \{t_a, t_a + T_H, \cdots, t_a + L T_H\}$. Then, we can establish the MDP framework of the formulated problem as follows.

1) Agent: Each user is an agent, which independently makes the handover decisions.

2) State: At the beginning of slot $t$ ($t \in \mathcal{D}_i$), users will update the corresponding state. For user $i$ ($i \in \mathcal{N}$), the state is given by

$$ s_i(t) = \begin{bmatrix} \theta_{i,1}(t) & \cdots & \theta_{i,j}(t) & \cdots & \theta_{i,M}(t) \\ \theta^v_{i,1}(t) & \cdots & \theta^v_{i,j}(t) & \cdots & \theta^v_{i,M}(t) \\ \theta^a_{i,1}(t) & \cdots & \theta^a_{i,j}(t) & \cdots & \theta^a_{i,M}(t) \\ \hat{G}^C_{i,1}(t) & \cdots & \hat{G}^C_{i,j}(t) & \cdots & \hat{G}^C_{i,M}(t) \\ \gamma_{i,1}(t) & \cdots & \gamma_{i,j}(t) & \cdots & \gamma_{i,M}(t) \end{bmatrix} \tag{18} $$

where $\theta_{i,j}(t)$ ($j \in \mathcal{M}$) is the elevation angle between user $i$ and satellite $j$, $\theta^v_{i,j}(t) = \theta_{i,j}(t) - \theta_{i,j}(t - T_H)$, $\theta^a_{i,j}(t) = \theta^v_{i,j}(t) - \theta^v_{i,j}(t - T_H)$, $\hat{G}^C_{i,j}(t)$ denotes the expected cost for user $i$ to occupy the channel of satellite $j$ in slot $t$, and $\gamma_{i,j}(t) \in \{0, 1\}$ is the connection status between user $i$ and satellite $j$. To be more specific, the expected cost $\hat{G}^C_{i,j}(t)$ is determined by

$$ \hat{G}^C_{i,j}(t) = \begin{cases} G^C_{i,j}(t), & \text{if } \beta_{i,j}(t) = 1 \\ U^C_{i,j}\left( x_j(t) \right) \cdot B_C, & \text{otherwise} \end{cases} \tag{19} $$

where $x_j(t)$ denotes the number of occupied channels of satellite $j$ in slot $t$. Moreover, $\theta^v_{i,j}(t)$ and $\theta^a_{i,j}(t)$ are used to characterize the mobility of satellite $j$.

3) Action: The action of user $i$ in slot $t$ is denoted by

$$ a_i(t) = \left[ \beta_{i,1}(t), \cdots, \beta_{i,j}(t), \cdots, \beta_{i,M}(t) \right], \quad i \in \mathcal{N} \tag{20} $$

which implies the handover decision of user $i$. Moreover, the action $a_i(t)$ ($i \in \mathcal{N}$) needs to satisfy the corresponding constraints of the problem (OP1).

4) Reward: The reward of user $i$ is designed as the average utility obtained in the last handover frame, which is given by

$$ r_i(t) = \frac{1}{T_H} \sum_{n = t - T_H}^{t} \sum_{j=1}^{M} G_{i,j}(n). \tag{21} $$

Based on the above discussions, we have established the general MDP framework for the formulated problem, and the deep Q-learning technique is a promising approach to solve the MDP problem. However, the conventional deep Q-learning method cannot be applied directly to solve the formulated problem due to the following reasons.

• The global information: As shown in (18), the global information of the entire constellation is required for user $i$ to update the state $s_i(t)$, which incurs heavy information exchange overheads.
• The huge state space: The state space dimension is very large, which makes it very challenging for users to adaptively make the handover decisions in a distributed manner.

To address the above two issues, the following two characteristics of the large-scale LEO constellation are exploited.

• Limited number of visible satellites: Although the number of satellites in the constellation is huge, the number of visible satellites for each user is limited. As the handover decisions are affected only by the visible satellites, the state of each user only needs to contain the information of the visible satellites.
• Weak correlation between satellites: As given by (1)-(5), although each user has multiple visible satellites, the PDFs of the channel power gains of different satellites, which are determined by the elevation angle $\theta_{i,j}$, are distinct. Thus, the state of each user can be decoupled into a number of independent substates.

By utilizing the above two characteristics, a successive deep Q-learning (SDQL) algorithm based on a neural network is developed. Fig. 2 shows the interaction between the agent and the environment in the SDQL algorithm. A neural network called the Q-network is used to map the action $a$ in a certain state $s$ to its value, namely $(s, a) \to Q(s, a|\omega)$, where $\omega$ is the vector of weights and biases of the Q-network and the value $Q(s, a|\omega)$ represents the estimated long-term reward if action $a$ is taken in state $s$. The mapping process is completely implemented by the network without any artificial restrictions. The algorithm can be divided into the decision phase and the training phase.

1) The decision phase: In this phase, the handover decision is made based on the current network with fixed parameters. As we have mentioned above, when we evaluate the value of a satellite, the states of other satellites have little influence on it. Consequently, we convert the original state of user $i$ into a list $S_i(t)$ containing the substates of all visible satellites. Each substate represents the state of a visible satellite, and the list can be expressed as

$$ S_i(t) = \left[ s_{i,1}(t), \cdots, s_{i,j}(t), \cdots, s_{i,L_i^t}(t) \right] \tag{22} $$

where

$$ s_{i,j}(t) = \left[ \theta_{i,j}(t) \;\; \theta^v_{i,j}(t) \;\; \theta^a_{i,j}(t) \;\; \hat{G}^C_{i,j}(t) \;\; \gamma_{i,j}(t) \right]. \tag{23} $$

$\mathcal{L}_i^t = \{1, \cdots, L_i^t\}$ is the index set of visible satellites for user $i$ in slot $t$ and $L_i^t$ is the number of visible satellites of user $i$.

Since the motion patterns of satellites in the constellation are the same, we can evaluate the value of different satellites with only one neural network. A fully connected network $Q$ is constructed as shown in Fig. 2. There is only one action $a_{i,j}$ corresponding to each substate $s_{i,j}$, thus the output of network $Q$ is completely determined by the substate $s_{i,j}$ when the parameters in $\omega$ are fixed. We denote the output by $Q(s_{i,j}|\omega)$. By successively inputting the substates in $S_i(t)$ into the network, we can obtain the Q-value list which can
be expressed as

$$ Q_i(t) = \left[ Q\left( s_{i,1}(t)|\omega \right), \cdots, Q\left( s_{i,L_i^t}(t)|\omega \right) \right]. \tag{24} $$

Based on $Q_i(t)$, the best satellite to perform handover for user $i$ is given by

$$ \hat{j} = \arg\max_{j \in \mathcal{L}_i^t} Q\left( s_{i,j}(t)|\omega \right). \tag{25} $$

[Figure 2: block diagram of the agent-environment loop. The state $s_i$ is reformulated into the substate list $[s_{i,1}, \cdots, s_{i,L_i^t}]$, which is input successively into the deep network; the Q-values $[Q(s_{i,1}), \cdots, Q(s_{i,L_i^t})]$ are output successively and determine the action $a_i$; the environment returns the reward $r_i$.]
Fig. 2. The MDP framework for the proposed SDQL algorithm.

2) The training phase: In this phase, users update the parameter vector $\omega$ of network $Q$ based on the feedback from the environment. It is assumed that user $i$ chooses the satellite $\hat{j}$ in slot $t$ and records $S = s_{i,\hat{j}}(t)$ and $A = \hat{j}$. Then, at the beginning slot of the next handover frame, $t' = t + T_H$, user $i$ records $R$ and $S'$ as

$$ R = \frac{1}{T_H} \sum_{n=t}^{t + T_H} G_{i,\hat{j}}(n) \tag{26} $$

and

$$ S' = \begin{cases} s_{i,\hat{j}}(t'), & \text{if } \hat{j} \in \mathcal{L}_i^{t'} \\ \emptyset, & \text{if } \hat{j} \notin \mathcal{L}_i^{t'} \end{cases} \tag{27} $$

respectively, where $S' = \emptyset$ represents the special case that the satellite $\hat{j}$ is not visible in the next frame. Now we obtain the experience tuple $\{S, A, R, S'\}$. User $i$ updates the network parameters in $\omega$ with a batch of experience tuples to minimize the loss function, which is defined as

$$ L(\omega) = \mathbb{E}\left[ \left( R + \gamma Q_t - Q(S|\omega) \right)^2 \right] \tag{28} $$

where $\gamma$ is the discount factor and $Q_t$ is the target Q-value. Moreover, $Q_t$ is determined by

$$ Q_t = \begin{cases} 0, & \text{if } S' = \emptyset \\ Q'(S'|\omega'), & \text{otherwise} \end{cases} \tag{29} $$

where $Q'$ is the target network with parameter vector $\omega'$ and has the same architecture as the network $Q$. The detailed algorithm is given in Algorithm 1.

Algorithm 1: SDQL Algorithm for user $i$
Initialization:
    Initialize learning rate $\alpha$, discount factor $\gamma$, evaluated network $Q$ and target network $Q'$, memory pool $D$, termination time $T$
for k = 1 : K do
    Reset the environment and set $t \leftarrow 0$, $S \leftarrow \emptyset$
    while $t < T$ do
        if handover triggered then
            Obtain $S_i(t)$ based on (22), (23)
            Calculate $Q_i(t)$ based on (24)
            Select the satellite $\hat{j} \in \mathcal{L}_i^t$ based on (25) with the $\epsilon$-greedy strategy
            if $S \ne \emptyset$ then
                Obtain $R$ and $S'$ based on (26) and (27), respectively
                Store $\{S, A, R, S'\}$ in memory pool $D$
                Sample a batch of experience tuples from memory pool $D$
                Train the evaluated network $Q$ based on (28), (29)
                Every few steps, update the target network $Q'$
            end
            $S \leftarrow s_{i,\hat{j}}$, $A \leftarrow \hat{j}$
        end
        $t \leftarrow t + 1$
    end
end

The advantages of the proposed SDQL algorithm are summarized as follows.

• Lower complexity: The developed SDQL algorithm greatly reduces the dimensions of the state space such that only one simple neural network is required and the computation complexity is significantly reduced. Thus, the capacity-limited terminals can implement the proposed handover scheme in a distributed manner.
• Better adaptability: The scale of the network is not related to the number of satellites in the constellation, which enables the proposed scheme to be applied to various large-scale LEO constellations without redesigning the neural network.
• Easy to train: Users do not have to obtain the global information to train the network. Moreover, the experience from any satellite can be used to train the network and improve the handover strategy for all satellites.

IV. SIMULATION RESULTS

In this section, we evaluate the proposed distributed handover scheme through simulations. A OneWeb-like constellation is constructed, which consists of 18 planes with 40 satellites deployed in each plane. The altitude and inclination of each plane are 1200 km and 90°, respectively. The minimum elevation of visible satellites is 20°. Users are uniformly distributed in a square hot spot area with a side length of 220 km, centered on (40°N, 116°E). Users' service arrivals are modeled as a Poisson process with arrival rate $\lambda$. The duration of services is modeled as an exponential distribution with mean $T_m$. Since the speed of the satellite is much faster than that of ground users, we assume that users are stationary. The earth's rotation is taken into consideration in the simulation. A fully connected network which contains two hidden layers is adopted for each user, and the numbers of neurons in the hidden layers are 240 and 250, respectively. The ReLU function is used as the activation function. The learning rate is $\alpha = 5 \times 10^{-5}$ and the discount factor is $\gamma = 0.9$. The exploration rate is set to 1 at the beginning and gradually decreased to 0.1. The remaining parameters are provided in Table I.

To illustrate the superiority of the proposed scheme, we compare its performance with that of the maximum elevation (ME) handover scheme and the maximum number of free channels (MNFC) handover scheme [16]. Specifically, in the ME handover scheme, users always select the satellite with the maximum elevation angle to perform handovers. In the MNFC handover scheme, users select the satellite with the most available channels to access.
[Figure 3: three panels comparing SDQL, ME, and MNFC with 50 and 100 users; horizontal axis: number of channels per satellite (10 to 50); vertical axes: (a) average reward, (b) average throughput (Mbps), (c) forced termination times.]

Fig. 3. Performance evaluations versus the number of channels per satellite Cmax. (a) Average reward. (b) Average throughput. (c) Forced termination times.
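The average reward in Fig. 3(a) is built from the per-slot utility defined in (8)-(11). As a rough sketch of how that quantity could be evaluated, the snippet below implements the packet reward (8), the Sigmoid-like cost coefficient (10), and the utility (11) for a single user-satellite pair. The function and argument names are assumptions made for illustration, and the default constants are the Table I values (B_P = 1, B_C = 5, K_C = 5, packet size 1000 bits, slot duration 10 ms, R_min = 2 Mbps).

```python
import math

# Table I values (units: bits, seconds, bits per second).
B_P = 1.0            # reward per successfully delivered packet
B_C = 5.0            # minimum cost for occupying one channel
K_C = 5.0            # upper limit of the cost coefficient
PACKET_BITS = 1000   # packet size S_p
SLOT_S = 0.01        # slot duration t_s
R_MIN = 2e6          # minimum transmission rate requirement


def cost_coefficient(occupied: float, c_max: int, k_c: float = K_C) -> float:
    """Sigmoid-like coefficient of (10): rises from about 1 to about k_c with load."""
    steepness = 10.0 / c_max
    return 1.0 + (k_c - 1.0) / (1.0 + math.exp(-steepness * (occupied - c_max / 2.0)))


def packet_reward(rate_bps: float) -> float:
    """Reward of (8): zero below R_MIN, otherwise B_P per whole packet sent in a slot."""
    if rate_bps < R_MIN:
        return 0.0
    return B_P * math.floor(rate_bps * SLOT_S / PACKET_BITS)


def slot_utility(rate_bps: float, connected: bool, handed_over: bool,
                 occupied_at_decision: int, c_max: int) -> float:
    """Utility of (11): gamma * reward - beta * cost for one user-satellite pair.

    `connected` plays the role of gamma and `handed_over` the role of beta;
    the cost is frozen at the channel occupancy seen in the decision slot, as in (9).
    """
    reward = packet_reward(rate_bps) if connected else 0.0
    cost = cost_coefficient(occupied_at_decision, c_max) * B_C if handed_over else 0.0
    return reward - cost


# A half-loaded satellite (10 of 20 channels occupied) and a 2 Mbps link.
print(cost_coefficient(10, 20), packet_reward(2e6), slot_utility(2e6, True, True, 10, 20))
```

Because the coefficient grows with the number of occupied channels, a user that requests a heavily loaded satellite pays more per slot, which is how the utility steers traffic towards lightly loaded satellites.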
TABLE I
SIMULATION PARAMETERS

Parameter                                Notation        Value
Maximum antenna gain of satellite        G_max           30 dBi
Antenna gain of users                    G_T             0 dBi
Spotbeam angle of users                  ϕ_{i,j}         0.01°
3 dB beamwidth                           ϕ_3dB           0.4°
Transmit power of spotbeam               P_K             16 dBW
Bandwidth of users                       B               2 MHz
Minimal transmission rate                R_min           2 Mbps
Carrier frequency                        f_I             20 GHz
Noise power spectral density                             -173 dBm/Hz
Arrival rate                             λ               0.1 s^-1
Average call duration                    T_m             3 min
Duration of a slot                       t_s             10 ms
Duration of an accessing process         T_A             100 ms
Duration of a handover frame             T_H             1 s
Size of a packet                         S_p             1000 bits
Minimum paid fee                         B_P             1
Minimum charging fee                     B_C             5
Upper limit of charging coefficient      K_C             5
Simulation time                                          10 min

Figure 3 shows the average reward, average throughput, and forced termination times for the proposed scheme, the ME scheme, and the MNFC scheme. We can observe from Fig. 3 that the proposed scheme achieves the best performance. Specifically, as shown in Fig. 3(a), the higher average reward mainly comes from the fact that the proposed scheme can adjust the handover strategies based on the dynamically changing environment to maximize the long-term utility. In this way, it ensures that the users can always obtain the most reward. As for the average throughput and forced termination times shown in Fig. 3(b) and Fig. 3(c), since the transmission rate requirement of users and the inter-satellite traffic balancing are exploited and integrated into the scheme design, the average throughput and forced termination times of the proposed scheme are improved as compared to the ME and MNFC schemes.

V. CONCLUSION

In this paper, a low-complexity distributed handover scheme was proposed for large-scale LEO satellite networks, where a shadowed Rice model described by the elevation angle was employed to characterize the user-satellite channels. The user utility function was designed, where the user transmission rate requirement and the number of available channels of visible satellites are jointly considered. An overall long-term utility maximization problem was further formulated. By exploiting the characteristics of the constellation, a low-complexity SDQL algorithm was developed, which can significantly reduce the dimensions of state spaces and efficiently solve the formulated problem in a distributed manner.

REFERENCES

[1] Y. Su, Y. Liu, Y. Zhou, J. Yuan, H. Cao and J. Shi, “Broadband LEO satellite communications: Architectures and key technologies,” IEEE Wireless Commun., vol. 26, no. 2, pp. 55-61, April 2019.
[2] A. Al-Hourani, “Session duration between handovers in dense LEO satellite networks,” IEEE Wireless Commun. Lett., vol. 10, no. 12, pp. 2810-2814, Dec. 2021.
[3] E. Del Re, R. Fantacci and G. Giambene, “Efficient dynamic channel allocation techniques with handover queuing for mobile satellite networks,” IEEE J. Sel. Areas Commun., vol. 13, no. 2, pp. 397-405, Feb. 1995.
[4] G. Maral, J. Restrepo, E. del Re, R. Fantacci and G. Giambene, “Performance analysis for a guaranteed handover service in an LEO constellation with a ‘satellite-fixed cell’ system,” IEEE Trans. Veh. Technol., vol. 47, no. 4, pp. 1200-1214, Nov. 1998.
[5] E. Del Re, R. Fantacci and G. Giambene, “Handover queuing strategies with dynamic and fixed channel allocation techniques in low Earth orbit mobile satellite systems,” IEEE Trans. Commun., vol. 47, no. 1, pp. 89-102, Jan. 1999.
[6] Z. Wu, F. Jin, J. Luo, Y. Fu, J. Shan and G. Hu, “A graph-based satellite handover framework for LEO satellite communication networks,” IEEE Commun. Lett., vol. 20, no. 8, pp. 1547-1550, Aug. 2016.
[7] L. Feng, Y. Liu, L. Wu, Z. Zhang and J. Dang, “A satellite handover strategy based on MIMO technology in LEO satellite networks,” IEEE Commun. Lett., vol. 24, no. 7, pp. 1505-1509, July 2020.
[8] C.-Q. Dai, Y. Liu, S. Fu, J. Wu and Q. Chen, “Dynamic handover in satellite-terrestrial integrated networks,” in Proc. IEEE Globecom Workshops (GC Wkshps), pp. 1-6, 2019.
[9] H. Xu, D. Li, M. Liu, G. Han, W. Huang and C. Xu, “QoE-driven intelligent handover for user-centric mobile satellite networks,” IEEE Trans. Veh. Technol., vol. 69, no. 9, pp. 10127-10139, Sept. 2020.
[10] C. Qiu, H. Yao, F. R. Yu, F. Xu and C. Zhao, “Deep Q-learning aided networking, caching, and computing resources allocation in software-defined satellite-terrestrial networks,” IEEE Trans. Veh. Technol., vol. 68, no. 6, pp. 5871-5883, June 2019.
[11] Y. Cao, S.-Y. Lien and Y.-C. Liang, “Deep reinforcement learning for multi-user access control in non-terrestrial networks,” IEEE Trans. Commun., vol. 69, no. 3, pp. 1605-1619, March 2021.
[12] Y. Li, W. Zhou and S. Zhou, “Forecast based handover in an extensible multi-layer LEO mobile satellite system,” IEEE Access, vol. 8, pp. 42768-42783, 2020.
[13] B. Yang, Y. Wu, X. Chu and G. Song, “Seamless handover in software-defined satellite networking,” IEEE Commun. Lett., vol. 20, no. 9, pp. 1768-1771, Sept. 2016.
[14] X. Yan, H. Xiao, K. An, G. Zheng and S. Chatzinotas, “Ergodic capacity of NOMA-based uplink satellite networks with randomly deployed users,” IEEE Syst. J., vol. 14, no. 3, pp. 3343-3350, Sept. 2020.
[15] A. Abdi, W. C. Lau, M.-S. Alouini and M. Kaveh, “A new simple model for land mobile satellite channels: First- and second-order statistics,” IEEE Trans. Wireless Commun., vol. 2, no. 3, pp. 519-528, May 2003.
[16] P. K. Chowdhury, M. Atiquzzaman and W. Ivancic, “Handover schemes in satellite networks: State-of-the-art and future research directions,” IEEE Commun. Surv. Tutor., vol. 8, no. 4, pp. 2-14, Fourth Quarter 2006.
