
2023 20th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON)

FloodSFCP: Quality and Latency Balanced Service Function Chain Placement for Remote Sensing in LEO Satellite Network

979-8-3503-0052-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/SECON58729.2023.10287471

Ruoyi Zhang†, Chao Zhu†, Xiao Chen†, Qingyuan Gong∗, Xinlei Xie‡, Xiangyuan Bu‡
† School of Cyberspace Science and Technology, Beijing Institute of Technology, China
∗ School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, China
‡ School of Information and Electronics, Beijing Institute of Technology, China

Abstract—Prompted by the significant advancements in image processing technologies and their diverse range of applications, remote sensing satellites are poised for rapid expansion. Nonetheless, offloading the vast amount of remote sensing satellite images to the ground gateway station is inefficient due to the exorbitant costs induced by satellite links, while the limited resources of individual satellites hinder local task processing. With the advancement of network function virtualization (NFV) technology, a new paradigm, the service function chain (SFC), has emerged; it can significantly improve the flexibility and resource utilization of network services and alleviate resource conflicts by dividing large services into smaller ones organized in the form of SFCs. As mega-constellations (e.g., Starlink) develop, the number of low Earth orbit (LEO) satellites is increasing. By dividing services into small sub-services and organizing them into SFCs throughout the LEO network, services that cannot be completed by a single satellite can be accomplished through multi-satellite cooperation. However, the quality of a remote sensing service is positively correlated with its latency, and the rapidly changing topology of LEO networks adds further complexity to SFC placement. Hence, how to select appropriate satellites to place the SFC and modulate service levels, in order to obtain better remote sensing results within an acceptable latency, remains an open question. To address these issues, this paper proposes FloodSFCP, an SFC placement method that aims to increase service quality and decrease latency through offline training and online optimization via deep reinforcement learning, taking into account the variation in LEO network topology. By introducing NoisyNet, Dueling, and N-step learning, we improve the model's generalization ability and reduce the state space, thus enhancing convergence speed while reducing decision and training time. Experimental results demonstrate that FloodSFCP significantly improves service quality while reducing total decision costs.

Index Terms—Service function chain, deep reinforcement learning, quality-latency balance, satellite network.

I. INTRODUCTION

With significant advancements in image processing technologies, intelligent remote sensing satellites are rapidly expanding into diverse areas, including military, forestry, and agriculture. By utilizing advanced techniques like image sharpening and recognition, the high-definition camera on a satellite can capture a broad range of scenes, enabling the identification of various targets such as buildings, ships, and vehicles within the images.

This work was supported by the National Key Research and Development Program of China under grant No. 2020YFB1806000.

Typically, remote sensing services require transmitting a large number of high-resolution images to ground stations. However, this process results in high bandwidth costs and transmission delays due to the scarcity of satellite resources and the vast distance between the satellites and Earth. On the other hand, processing a large number of images consumes a huge amount of computing resources, making it difficult for satellites with limited resources to handle onboard processing.

Since 2021, thousands of low Earth orbit (LEO) satellites have been launched to form mega-constellations (e.g., Starlink [1]), aiming to provide low-latency Internet access services for global users. By adopting network functions virtualization (NFV) [2] technologies in mega-constellations, a new computing paradigm called satellite edge computing (SEC) has emerged. In SEC, orchestration platforms (e.g., Kubernetes and KubeEdge) are applied to integrate separate satellite resources across the mega-constellation and form a large resource pool to support data-intensive and latency-sensitive satellite services. Besides, by employing the methodology of the service function chain (SFC) [3], a large service can be divided into several small sub-services that are sequentially distributed to resource-limited satellites.

By applying VNF and SFC, a remote sensing service can be divided into several sub-services. For example, a simple detection service can be divided into image pre-processing and object recognition. By placing the sub-services on different satellites and executing them in a specific order, the original function can be maintained. Compared with transmitting the visual data to a ground server, the shorter links between satellites and the lower noise in the vacuum facilitate a low-latency, low-noise network connection through the mega-constellation. On the other hand, sub-services demand fewer computing resources than the original service, enabling their execution on satellites with limited resources.

Although dividing remote sensing services into sub-services and running them in a distributed mode improves the utilization of satellite resources and decreases the latency of data transmission, forming the sub-service chain and making those sub-services work in a particular order increase the complexity and instability of the remote sensing system. The uneven distribution of satellite resources may incur the preemption of adequate nodes and the resource fragmentation

Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on January 13,2024 at 10:54:11 UTC from IEEE Xplore. Restrictions apply.

problem, and eventually diminish the utilization of resources, while the continuous movement of satellites may induce communication disruption and break the sub-service chain. Additionally, the sub-service quality is positively correlated with processing latency, and the overall effectiveness of a remote sensing service is tightly connected with the quality of its sub-services; therefore, the trade-off between service quality and latency also needs to be taken into account.

To address these challenges, we propose FloodSFCP, an SFC placement method that decides where to offload each sub-service and selects its service quality level. FloodSFCP aims at increasing service quality while decreasing latency through offline training and online optimization via deep reinforcement learning, taking into account the dynamics of satellite topology and the equilibrium of quality and latency. Moreover, to improve the generalization ability of the model, we introduce NoisyNet [4], Dueling [5], and N-step learning [6] to DQN [7] to further reduce the training and convergence time. To evaluate the effectiveness of FloodSFCP, we perform evaluations in both program-simulated and real-world environments. In the program-simulated environment, FloodSFCP outperforms three other algorithms (i.e., NormDQN, MADDPG [8], and DDPG [9]) in both training speed and final performance. Furthermore, we set up a testbed using a Kubernetes cluster to simulate 20 satellite nodes in LEO and design a greedy algorithm for comparison. The performance of FloodSFCP in the testbed is 410% higher than that of the greedy algorithm.

Fig. 1: (a) Remote sensing image example. (b) Comparison between divided and undivided modes. (c) The average processing time at different resolutions. (d) The average number of identifiable targets at different resolutions.
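Experiment 2 (Figs. 1c and 1d) sweeps every pairing of the five pre-processing and five recognition quality levels. A minimal sketch of that grid, where the level-to-resolution mapping follows Section II and the function name is ours, not the paper's code:

```python
from itertools import product

# Quality levels 1-5 map to the five image resolutions used in Section II.
QUALITY_TO_RESOLUTION = {1: "240p", 2: "320p", 3: "480p", 4: "720p", 5: "1080p"}

def quality_combinations():
    """All (pre-processing level, recognition level) pairs swept in Experiment 2."""
    return list(product(QUALITY_TO_RESOLUTION, repeat=2))
```

Iterating over `quality_combinations()` yields the 25 deployments whose processing times and recognized-target counts are plotted in Figs. 1c and 1d.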
The key contributions of this work are summarized below:
• We conduct two experiments with a Kubernetes cluster: one explores the impact of applying SFC to remote sensing services, and the other explores the correlation between service quality and service latency.
• We develop FloodSFCP, a novel SFC placement method that decides the offload position and quality level for each sub-service, aiming at increasing service quality and decreasing latency while considering the variation of the satellite topology.
• We introduce NoisyNet, Dueling, and N-step learning into DQN to improve the model's generalization ability and reduce the state space, and therefore handle the balanced placement of services more efficiently.
• We prove the effectiveness of FloodSFCP through simulation under both program-simulated and real-world environments.

The rest of this paper is organized as follows. Section II gives a detailed motivation and two testbed-based experiments. The problem description and formulation are given in Section III. The MDP-based FloodSFCP placement algorithm is demonstrated in Section IV. Section V records the evaluation of FloodSFCP. We discuss the related work in Section VI before we conclude in Section VII.

II. MOTIVATION

Remote sensing services are typically resource-intensive. For example, a Net Primary Productivity (NPP) application requires approximately 40 CPU cores and 160 GB of memory [10], which is difficult to place on a single satellite. On the other hand, a higher image resolution usually leads to a better quality of service, such as target recognition. However, larger images also mean larger data volumes and longer processing times, resulting in greater service latency. By using NFV and SFC, satellite remote sensing services can be divided into multiple lightweight sequential sub-services that have lower resource requirements and can therefore be placed more flexibly on resource-constrained satellites. Additionally, by adjusting the visual resolution of sub-services, the latency and the quality of the remote sensing service can be accommodated to various environments. For example, the resolution of a sub-service would be decreased for a shorter processing time in latency-sensitive services.

To scrutinize how service division impacts processing time, the following experiment was designed to characterize the correlation between service quality and service latency under various placement methods for a satellite remote sensing service. The procedure of the experiment is listed below.

• We use Google Maps Pro to capture 100 remote-sensing images with various backgrounds at a resolution of 1080×1080; these images are the input of the experiment. We use YOLO, an acclaimed real-time object detection and image segmentation model, to sense the input images. Fig. 1a shows one of the remote sensing images and its recognition results; all the recognized cars
are highlighted with orange boxes.
• We divide the remote sensing service into two sub-services, image pre-processing and image recognition, according to the essential sensing procedure of YOLO. In addition, we define five service quality levels that correspond to five resolutions: 240p, 320p, 480p, 720p, and 1080p. Subsequently, the sub-services were packaged into Docker images according to the respective service quality level.
• We set up a Kubernetes cluster composed of 2 nodes to host the images of the remote sensing service; each node was deployed in a VMware virtual machine configured with 4 CPU cores and 8 GB of RAM.
• Finally, we designed two placement methods, and each method was executed 100 times to minimize fluctuations in the experimental results. 1) We randomly assigned 2 to 10 services with a fixed level-4 service quality to the two Kubernetes nodes and recorded the processing time both with and without service division, to investigate the impact of service division on processing time. 2) We deployed all 25 quality-level combinations (i.e., combinations of 5 levels of pre-processing and 5 levels of recognition) of the sub-services to one Kubernetes node and recorded the total processing time and the number of recognized targets for each combination, in order to characterize the relationship between service quality and service delay.

The results of Experiment 1 are presented in Fig. 1b, which plots the number of randomly incoming services versus the processing time. The median processing time for the divided mode ranged from 1.34 s to 1.63 s, whereas that of the undivided mode ranged from 1.46 s to 3.37 s. We can observe that as the number of services increased, the processing time in the divided mode increased by an insignificant 21%. In contrast, the undivided mode exhibited an evident increase of up to 230%. Furthermore, the processing time of the undivided mode consistently remained greater than that of the divided mode under different numbers of tasks.

Fig. 1b shows that when the number of services was 8 and 10, the expected value and standard deviation for the divided mode were (1.51, 0.98) and (2.19, 6.08), respectively. In contrast, those of the undivided mode were (2.43, 5.32) and (3.28, 7.69), respectively. This is because each sub-service in the divided mode needs few resources, so the sub-services are more flexible to place and can make full use of resources.

Fig. 1c describes the impact of service quality levels on the overall processing time in Experiment 2. When the image resolutions of pre-processing and YOLO recognition are both set at level 1, the average overall processing time is 0.68 seconds. As the recognition resolution increases step by step while the pre-processing resolution remains unchanged, the average processing time increases from 0.71 s to 1.03 s. Conversely, as the pre-processing resolution increases step by step while the recognition resolution remains unchanged, the average processing time increases from 0.81 s to 1.87 s. Notably, when both sub-services are at level 5, the processing time reaches 2.18 s, which is 320% longer than at the lowest level.

Fig. 1d of Experiment 2 illustrates the relationship between the service quality level and the average number of recognized objects. When both sub-services have a service quality level of 1, the number of recognized targets is only 0.01. When the service levels are 1-5 (pre-processing at level 1 and recognition at level 5) and 5-1, respectively, the numbers of recognized targets are 1.4 and 1.88. However, when both sub-service qualities are improved synchronously, the numbers of recognized targets are 1.6, 3.3, 8.5, and 15.1 for levels 2-2, 3-3, 4-4, and 5-5, respectively. These observations suggest that the impact of improving only one sub-service quality level on the final result is limited compared to a synchronous improvement of both sub-services.

Based on the results of the two experiments, it can be concluded that dividing services enables more flexible deployment, effectively improves resource utilization, and reduces service latency on the satellite nodes. Additionally, increasing the service quality level can significantly improve service quality metrics, but it also significantly increases service latency. In the real-world LEO environment, where satellite onboard resources vary and the topology changes rapidly, the SFC placement method, i.e., the choice of deployment locations for the sub-services and the selection of appropriate service quality levels, needs to be carefully designed.

III. PROBLEM DESCRIPTION AND FORMULATION

In this section, we present the system model of FloodSFCP and the optimal sub-service sequence deployment problem.

A. System Model

1) Service Model: We represent the set of all services as K. For a given service k ∈ K, we use a binary variable r_{n,k} ∈ R to indicate whether it is initiated by satellite node n ∈ N, where N is the set of satellites. To further break down each service k into its constituent sub-services, we employ business logic to obtain a sequence of sub-services, represented as k = [f_1, f_2, ..., f_{|F|}], where |F| is the number of sub-services after division. We introduce a binary variable x_{n,k}^f ∈ X to denote whether satellite node n executes the f-th sub-service of service k. Specifically, we define x_{n,k}^0 = r_{n,k}. In addition, for each sub-service f, we define a quality index

    q_k^f ∈ {1, 2, 3, 4, 5},  ∀k ∈ K, ∀f ∈ k,    (1)

to denote the selection range of quality levels of the f-th sub-service of service k.

2) Time Slot Model: The topology of the satellites is time-varying and the completion time of a service is uncertain. However, we can assume that the satellite topology and the service completion status remain unchanged over a sufficiently short period. Therefore, we introduce the concept of a time slot to facilitate analysis.

We determine a constant value, denoted by τ, and set the total time required for a single placement process to L · τ. At every interval of τ, the system places sub-services once. Then we define a binary variable h_{k,f}^l ∈ H_l as an indicator of


whether the sub-service f of service k is placed during time slot l ∈ L. Sub-services have a sequential order; hence, for any f,

    h_{k,f}^{l_1} = h_{k,f+1}^{l_2} = 1  ⟹  l_1 < l_2.    (2)

3) Satellite Network Topology Model: Based on the definition of the time slot, we define the satellite network as an undirected graph G_l = (N, E), where E represents all links between satellites in time slot l. To simplify the problem, we assume that there is a logical link (n_1, n_2) between any two satellites n_1 and n_2 (n_1, n_2 ∈ N). The logical link is composed of physical links in E, and the actual hop count of the link is the shortest hop count between n_1 and n_2 (computed with the Dijkstra algorithm). To uniformly allocate the traffic over the real links and minimize link utilization, we can use the algorithm proposed in [11], which helps to avoid data-flow queuing and blocking due to high link utilization. By doing this, we exclude the impact of actual path selection on transmission latency and only consider the end-to-end bandwidth.

Furthermore, since the wireless transmission rate in the vacuum is not constant and is affected by the number of hops and the distance, we define the function W(n_1, n_2) to represent the average data transmission rate of the link (n_1, n_2). We also define the function D(n_1, n_2) to represent the propagation latency of data transmitted through the link (n_1, n_2).

4) Latency Model: We define the predicted time from the start of the total service to the completion of a certain sub-service f as t_k^f. There is

    t_k^f = t_start^{k,f} + t_td^{k,f} + t_pro^{k,f},    (3)

where t_start^{k,f} is the sub-service start time,

    t_start^{k,f} = { t_k^{f-1},  f ≥ 2;   0,  f = 1, }    (4)

and t_td^{k,f} is the data transmission latency and propagation latency from the service initiation satellite (or the execution satellite of the previous sub-service) to the current execution satellite, which is expressed as

    t_td^{k,f} = Σ_{n_1,n_2∈N} x_{n_2,k}^f · x_{n_1,k}^{f-1} · ( V(q_k^f, f) / W(n_1, n_2) + D(n_1, n_2) ).    (5)

Among them, the function V(q, f) describes the amount of data corresponding to different service qualities. Furthermore, W(n_1, n_2) is the internal port transmission rate in the l-th time slot when n_1 = n_2; otherwise, W(n_1, n_2) is the channel transmission rate. When solving for x_{n,k}^f, it is clear that two sub-services cannot be deployed or executed simultaneously in the same time slot; therefore, x_{n,k}^{f-1} is a known variable. Besides, t_pro^{k,f} is the processing latency, expressed as

    t_pro^{k,f} = Σ_{n∈N} x_{n,k}^f · P(q_k^f, f, o_n),    (6)

where the variable o_n represents the CPU computing resources of the satellite node n, and the function P(q_k^f, f, o) characterizes the processing time required to execute sub-service f with quality level q when the CPU computing resource is o.

Finally, the amount of data returned by a service is usually small and can be ignored; hence only the propagation latency of the backhaul is considered. The total latency T_k for the service k is

    T_k = t_k^{|F|} + Σ_{n_1,n_2∈N} x_{n_2,k}^{|F|} · r_{n_1,k} · D(n_1, n_2).    (7)

B. Problem Formulation

In this section, our objective is to minimize the overall service latency and maximize the overall service quality. The overall service latency for a specific service k has been defined above, and the service revenue is represented by U(q_k). As the two objectives are interconnected and depend on the coupling of q_k, to achieve a balance between them we define a joint function φ_q · U(q_k) − φ_t · T_k, where φ_q, φ_t ∈ [0, 1] are two control weights. Furthermore, X̃ = {x_{n,k}^f} is used to denote the set of selected satellite nodes, and Q̃ = {q_k} is defined as the set of selected service qualities. The objective function can then be defined as

    P1:  max_{X̃, Q̃}  Σ_{k∈K} Σ_{l∈L} [ φ_q · U(q_k) − φ_t · T_k ]

    s.t.  x_{n,k}^f ∈ {0, 1},  q_k ∈ {1, 2, 3, 4, 5},  ∀n ∈ N, k ∈ K, f ∈ k,    (8a)
          Σ_{n∈N} x_{n,k}^f = h_{k,f}^l,  ∀f ∈ k,    (8b)
          Σ_{n∈N} r_{n,k} = 1,  ∀k ∈ K,    (8c)
          Σ_{k∈K, f∈k} x_{n_1,k}^f · x_{n_2,k}^{f-1} · W(n_1, n_2) ≤ b(n_1, n_2),  ∀n_1, n_2 ∈ N,    (8d)
          T_k ≤ R_k,  ∀k ∈ K.    (8e)

Constraints (8b) and (8c) ensure that each service is initiated on exactly one satellite node and that, when h_{k,f}^l = 1, the sub-service is deployed on one and only one node; otherwise, the sub-service f is not deployed in time slot l. Constraint (8d) indicates that the bandwidth consumed on any end-to-end virtual link cannot exceed the upper limit b(n_1, n_2); offloading the service to a distant satellite would significantly increase the latency of data transmission. Constraint (8e) requires that the total time spent on any service cannot exceed the upper tolerance limit R_k.

IV. MDP-BASED SUB-SERVICES PLACEMENT FOR LEO SATELLITE NETWORK

In this section, we propose an MDP-based sub-services placement method with deep reinforcement learning and a framework of online decision-making and offline training.
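The latency model of Section III (eqs. (3)-(7)) composes per-hop transmission, propagation, and processing terms along the sub-service chain. The sketch below mirrors that recursion; V, P, W, and D are toy placeholders for the paper's data-volume, processing-time, rate, and propagation functions, and the chain is given as ordered (satellite, quality) pairs:

```python
def total_latency(chain, initiator, V, P, W, D, cpu):
    """Evaluate T_k per eqs. (3)-(7) for one service.

    chain: ordered list of (satellite, quality) pairs, one per sub-service.
    cpu:   mapping from satellite to its CPU computing resource o_n.
    """
    t = 0.0
    prev = initiator                      # sub-service 1 receives data from the initiator
    for f, (n, q) in enumerate(chain, start=1):
        t += V(q, f) / W(prev, n) + D(prev, n)   # eq. (5): transmission + propagation
        t += P(q, f, cpu[n])                     # eq. (6): processing on node n
        prev = n                                 # eq. (4): next start time is t_k^{f-1}
    return t + D(prev, initiator)                # eq. (7): backhaul propagation only
```

Because the placement of sub-service f-1 is fixed when f is decided, the loop only ever needs the previous node, matching the observation that x_{n,k}^{f-1} is a known variable.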


Fig. 2: The service deployment framework.
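The loop structure of Fig. 2 — an offline agent trained in simulation whose parameters are periodically copied to the online agent — can be sketched as below. Everything here is schematic: the agent, the training loop body, and the update period are stand-ins, not the paper's implementation.

```python
class Agent:
    """Schematic agent: 'params' stands in for the DQN weights theta."""
    def __init__(self):
        self.params = {"step": 0}

    def copy_from(self, other):
        self.params = dict(other.params)

def offline_training(agent, episodes):
    # Placeholder for the simulated-environment training loop of Fig. 2.
    for _ in range(episodes):
        agent.params["step"] += 1

def sync_online(online, offline, now, period, last_sync):
    # The online agent only pulls offline parameters at the update time.
    if now - last_sync >= period:
        online.copy_from(offline)
        return now
    return last_sync
```

Keeping synchronization periodic rather than continuous matches the framework's goal of avoiding frequent interactions that could disrupt the real environment.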

A. MDP Formulation

As service deployment depends solely on the current state of the environment and not on previous environmental states, we use an MDP to solve the model in Section III. When the number of satellite nodes is large (Starlink has 12,000 satellites [1]), this would result in a significant increase in computation costs and service scheduling delays. Therefore, when modeling, we limit the set of satellites allowed for offloading to the four satellites around the target satellite and the target satellite itself.

1) States: In our model, each service in each time slot is initiated sequentially according to the order of arrival, thus ensuring constraints (8a), (8b), and (8c). We define s_l^f = (N_l^f, C_l^f, O_l^f, D_{1,l}^f, D_{2,l+1}^k, B_l^f, f, q_last^f, r_l^k) as the state when the f-th sub-service starts in time slot l. Among them, N_l^f is the node number vector of the candidate satellites. The vector C_l^f = (C_{l,q}^{f,n}) describes the number of sub-services f with different service quality q that are running on each candidate satellite, with n ∈ N_l^f. The vector O_l^f represents the computing resources of each candidate satellite. The vector D_{1,l}^f represents the distance from the initiator satellite of the current sub-service f to each candidate satellite, while the vector D_{2,l+1}^k refers to the distance from each candidate satellite to the initiator satellite of the service k. The vector B_l^f defines the remaining bandwidth of each candidate node. The variable f indicates the type of the current sub-service, the variable q_last^f indicates the service quality level of the previous sub-service in the SFC, and the variable r_l^k indicates the remaining tolerance time of the service.

2) Actions: We define the action space by combining the variables x and q in the objective function (8) into a one-hot vector whose length is |N_l^f| × |q|, to describe the service quality of sub-service f deployed to a certain satellite.

3) Rewards: Since the multiple sub-services on an SFC are interrelated, the final reward depends on each sub-service and is only obtained at the end, so the overall optimization objective P1 is divided into two parts: the income of the first |F| − 1 sub-services is −φ_t · t_k^f, and the reward for the last sub-service is φ_q · U(q_k) − φ_t · ( t_k^{|F|} + Σ_{n_1∈N, n_2∈N} x_{n_2,k}^{|F|} · r_{n_1,k} · D(n_1, n_2) ). For constraint (8d), if the bandwidth of the scheduling node of the current sub-service is insufficient, the sub-service is postponed to the next time slot, and the waiting time is included in T_k. Finally, for constraint (8e), when the execution time of a service exceeds the tolerance time, the SFC is terminated, and the final benefit is a penalty value.

B. Service Deployment Framework

Since the training is performed based on a real environment, there are issues such as a lack of training samples and the low efficiency of the initial algorithm. To address these issues, we have designed a system that trains the decision-making model offline in a simulated environment and then updates the service deployment framework in the real-world environment based on the offline model.

As shown in Fig. 2, the entire service deployment framework consists of two parts: offline training (green part) and online decision-making (blue part).

1) Offline Training Module: We design a simulation environment based on satellite configuration and time-varying information, including a deep neural network (DNN) that fits the processing time of simulated services (without deploying VNFs) from real service information. We randomly generate a set of simulated services and decompose them into sub-service queues. The offline training module retrieves information on sub-services from the queue and satellite observation information from the simulation environment to create a state. Then, we input the state into a network to obtain an action and interact with the simulation environment to obtain a reward. Through backpropagation, the offline model is trained. Finally, the trained model is replicated to the online decision-making network.

2) Online Decision-making Module: In the real environment, the decision-making process is similar to that in the simulation environment. We design a decision-making layer responsible for creating a state by combining the sub-service information


Algorithm 1 MDP-based SFC Deployment NoisyNet, Dueling, N-step Learning, and DQN. The specific
Input: observation state s, reward r algorithm is shown in Algorithm 1.
Output: action a DQN forms the foundation of the entire algorithm. In DQN,
1: Initialize all agents network parameters the agent interacts with the environment to obtain the state s
2: Setting replay buffer memory(rbm) size and batch size and reward value r, then fits the action-value function Q based
3: Loading or updating sub-service execution data on r, and finally selects the most valuable action a based on
4: Offline Training Stage: Q. The optimal action value Q∗ can be expressed as:
5: for each episode = 1, 2, 3 . . . do
h i
Q∗ (s, a) = Es′ ∼ε r + λ max Q ∗ ′ ′
(s , a )|s, a , (9)
6: for each timeslot = 1, 2, 3 . . . do ′ a
7: for each subservice = 1, 2, 3 . . . do where ε represents the interaction environment, λ represents
8: Obtain simulation environment observation state s the discount factor for each step. s′ and a′ represent the next
9: Choose an optimal action a state and action. To fit Q, we use a four-layer fully connected
10: Obtain the reward value corresponding to action a, neural network and adopt mean squared error and gradient
and update the simulation environment descent as optimization methods. We define the network
11: Obtain environmental observation of next sub-service on the same chain as s′
12: if subservice step ≥ n then
13:     Put n-step experience into the rbm
    end if
14: …
15: end for
16: end for
17: Randomly fetch data from the rbm by batch size
18: Update offline agent parameters with the above data
19: Update simulation environment
20: end for
21: Online Decision-making Stage:
22: for each timeslot = 1, 2, 3 . . . do
23: for each subservice = 1, 2, 3 . . . do
24:     Obtain real environment observation state s
25:     Choose an optimal action a and deploy sub-service into satellite network
26:     Save the execution data of sub-service
27: end for
28: end for
29: if Reaching the update time then
30:     Update the parameters of the online agent by the offline agent
31: end if

and observation data from the real environment and obtaining an action to interact with the cloudification layer. The cloudification layer is a Kubernetes cluster deployed on the real environment that is responsible for deploying the VNFs of sub-services. Additionally, we designed a status management layer to collect real environment data in a centralized manner and interact with the cloudification layer to avoid frequent interactions that could disrupt the environment. Besides, the collected service information is provided to the simulation environment to better fit the real environment.

C. Generalization Reinforcement Learning Approach

Reinforcement learning (RL) has been proven to be an effective method in the fields of SFC deployment and resource balancing [12]. In this paper, we design FloodSFCP to address the issues of SFC placement by combining four RL paradigms: DQN, NoisyNet, Dueling, and n-step learning. Denoting the network parameters as θ, the gradient of the loss function for the i-th iteration is

∇_{θi} Li(θi) = E_{s,a∼ρ(·); s′∼ε} [ (r + λ max_{a′} Q(s′, a′; θi−1) − Q(s, a; θi)) ∇_{θi} Q(s, a; θi) ],   (10)

where ρ(s, a) represents the probability of taking each action a in state s.

To normalize the input to each neuron, reduce the differences between samples, and improve the model's generalization ability and training speed, we add a layer normalization (LN) after each fully connected layer. Assuming the output of the fully connected layer is x̂, the calculation formula for LN is

LN(x̂) = (x̂ − E(x̂)) / √(Var(x̂) + ϵ) · γ(θ) + β(θ),   (11)

where ϵ is a small constant to avoid division by zero, and γ and β are the learnable parameters of the network.

In real-world environments, the processing time of a service is subject to external factors such as power and temperature, which have certain randomness. In addition, the deployment of services based on local environment data may not be able to achieve global optimality. Therefore, we introduce NoisyNet into the network model, which adds noise to θ to regularize the network and improve its robustness. The definition of NoisyNet is as follows

y = (E(γ) + Var(γ) · ϕγ) x̂ + E(β) + Var(β) · ϕβ,   (12)

where ϕ represents the random noise, which follows Factorised Gaussian noise [4].

To further enhance the stability and effectiveness of our algorithm, we introduce the Dueling architecture. By decomposing the Q-function into the state-value function and the advantage function, Dueling can improve the estimation ability of the action. The state-value function computes the maximum expected reward that the agent can obtain in a particular state, while the advantage function calculates the advantage of each action relative to the mean action in that state. Specifically, the state-value function V and the advantage function A are

V(s) = E_{s,a∼ρ(·)}[Q(s, a)],   (13)

A(s, a) = Q(s, a) − V(s).   (14)
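To make the one-step update behind (10) concrete, the following NumPy sketch runs temporal-difference steps against frozen target parameters. It is an illustrative toy, not the paper's implementation: the linear Q-function `theta[a] @ s`, the one-hot states, the reward, and the learning rate are all assumptions made for the example.

```python
import numpy as np

# Toy linear Q-function: Q(s, a; theta) = theta[a] @ s.
# theta_old stands in for the frozen parameters theta_{i-1} in (10);
# lam is the discount factor written as lambda in the paper.
rng = np.random.default_rng(0)
n_actions, n_features = 3, 4
theta = rng.normal(size=(n_actions, n_features))
theta_old = theta.copy()
lam, lr = 0.9, 0.1

def td_step(theta, theta_old, s, a, r, s_next):
    """One gradient step of (10): delta * grad_theta Q(s, a; theta)."""
    target = r + lam * (theta_old @ s_next).max()  # r + lam * max_a' Q(s', a'; theta_old)
    delta = target - theta[a] @ s                  # TD error
    theta[a] += lr * delta * s                     # gradient of theta[a] @ s w.r.t. theta[a] is s
    return delta

s = np.array([1.0, 0.0, 0.0, 0.0])
s_next = np.array([0.0, 1.0, 0.0, 0.0])
delta_first = abs(td_step(theta, theta_old, s, a=0, r=1.0, s_next=s_next))
for _ in range(50):                                # repeated updates against the fixed target
    td_step(theta, theta_old, s, a=0, r=1.0, s_next=s_next)
delta_last = abs(td_step(theta, theta_old, s, a=0, r=1.0, s_next=s_next))
```

Because the target is computed from the frozen θi−1, repeated updates shrink the TD error (`delta_last < delta_first`); FloodSFCP's actual networks are fully connected layers rather than this linear stand-in.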
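The LN and noisy-layer components in (11) and (12) can likewise be sketched in a few lines of NumPy. The μ/σ weight parameterisation and the sgn(x)·√|x| noise shaping follow the factorised-Gaussian formulation of [4]; the layer sizes and noise scales here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """(11): normalise a layer's output, then scale and shift with learnable gamma, beta."""
    return (x - x.mean()) / np.sqrt(x.var() + eps) * gamma + beta

def f(x):
    """Noise-shaping function sgn(x) * sqrt(|x|) used by factorised Gaussian noise [4]."""
    return np.sign(x) * np.sqrt(np.abs(x))

def noisy_linear(x, w_mu, w_sigma, b_mu, b_sigma, rng):
    """(12)-style layer: perturb mean weights/biases with scaled factorised noise."""
    eps_in = f(rng.normal(size=w_mu.shape[1]))
    eps_out = f(rng.normal(size=w_mu.shape[0]))
    w = w_mu + w_sigma * np.outer(eps_out, eps_in)  # noisy weights
    b = b_mu + b_sigma * eps_out                    # noisy biases
    return w @ x + b

rng = np.random.default_rng(1)
x = layer_norm(rng.normal(size=8))                  # LN applied after a fully connected layer
w_mu, b_mu = rng.normal(size=(3, 8)), rng.normal(size=3)
y_noisy = noisy_linear(x, w_mu, 0.1 * np.ones((3, 8)), b_mu, 0.1 * np.ones(3), rng)
y_plain = noisy_linear(x, w_mu, np.zeros((3, 8)), b_mu, np.zeros(3), rng)
```

With the noise scales set to zero the layer reduces to an ordinary fully connected layer (`y_plain` equals `w_mu @ x + b_mu`), a convenient sanity check when wiring such layers into a DQN.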
Due to the lack of identifiability and poor performance of Q-values [5] computed by (14), we rewrite Q as

Q(s, a; θ1, θ2) = V(s; θ2) + ( A(s, a; θ1) − (1/|A|) Σ_{a′} A(s, a′; θ1) ),   (15)

where θ1 and θ2 are parameters of two fully-connected layers, and a′ represents the advantage values in the advantage function A.

Since the reward for SFC is only generated after the entire chain is deployed, and each service affects the final result, single-step learning may be difficult to converge or achieve stable learning effects. Therefore, we introduce n-step learning, which modifies the loss function in (10) as follows

Li(θi) = E_{s,a∼ρ(·); sn∼ε} [ ( Σ_{k=0}^{n−1} γ^k r_{k+1} + γ^n max_{a′} Q(sn, a′; θi−1) − Q(s, a; θi) )² ].   (16)

By accumulating the consecutive n-step rewards and using the sum of these rewards as the new reward signal to update Q values, n-step learning can better handle the issue of delayed rewards and guide the decision-making process of the agent more effectively. In addition, we incorporate both single-step and n-step learning information into the experience pool [7] to improve accuracy.

Fig. 3: (a) Decision time of different algorithms. (b) Single-period training time for different networks.

Fig. 4: (a) Trends in training reward for each algorithm. (b) Total reward of each algorithm under different numbers of services.

V. EVALUATION

In this section, we evaluate the performance of the proposed algorithm in both the real-world environment (VM-simulated) and the program-simulated environment.

A. Evaluation Configuration

We evaluated our algorithm using 20 nodes. In the beginning, all nodes were divided into two parts, with no direct links between them. As the satellites began to move, nodes from the two parts gradually established direct links due to the different angular velocities of satellites in different orbital planes. Each node was equipped with 8 GB of memory, 4 CPUs, and 100 Mb/s downstream. Each sub-service was fixed at 20 Mb/s downstream. Furthermore, we set the propagation speed to 200,000 km/s and the delay of each routing hop to 1 ms. We tested a remote sensing service and divided it into two sub-services organized in a chain. Each service is randomly initiated on a satellite and has a random arrival time and a tolerance time. In addition, we set each time slot τ to 2 seconds and the duration of each experiment to 10τ.

B. Simulated Environment Evaluation

Firstly, we compared the execution time and training time of FloodSFCP with Norm-DQN (using all nodes' data as the state) and two classic Actor-Critic (AC) [13] algorithms (i.e., DDPG and MADDPG). As shown in Fig. 3, FloodSFCP significantly outperforms Norm-DQN in decision time and training time, with respective improvements of 51% and 78%, since FloodSFCP only uses local environment data. Additionally, we can observe that FloodSFCP also outperforms DDPG and MADDPG in decision time and training time, with an advantage in training time of up to 129% over MADDPG.

Next, we compared the performance of the four algorithms in terms of average reward per training epoch and cumulative reward under different numbers of services in the simulation environment. As shown in Fig. 4(a), FloodSFCP outperforms the other three algorithms in both training speed and final performance. Norm-DQN has a slower training speed and lower final performance compared to FloodSFCP, and both AC-based algorithms perform poorly due to the discrete action space. MADDPG has a faster training speed, but its final performance is still lower than FloodSFCP's. DDPG has the worst performance in both training speed and final performance. As shown in Fig. 4(b), when the number of randomly arrived services ranges from 100 to 350, FloodSFCP maintains the best performance. Especially when there are 350 services, the performance of Norm-DQN significantly drops, whereas our algorithm still maintains good performance, with a total reward that is 44% higher than that of Norm-DQN.

C. Real-world Environment Evaluation

To verify the performance of our algorithm in a real environment, we tested it using the same neural network parameters in both real and simulated environments.

We define services finishing within the tolerance time as complete, and Fig. 5a shows the completion rate under different service numbers. FloodSFCP performed similarly in both
environments when the service numbers were 100 and 150. As the number increased, the completion rate in the real environment was slightly lower than that in simulation, with a maximum difference of only 2.3% when the number was 350.

Fig. 5: Comparison between real and simulated environments. (a) Completion rate. (b) Total reward.

Fig. 6: Results of various preferences. (a) Sparse. (b) Dense.

Fig. 7: Distribution of service quality level with different deployment strategies. (a) Sparse. (b) Dense.

Fig. 5b shows the overall reward under different service numbers in both environments. The total reward of FloodSFCP in the real environment was lower than that in the simulated environment, with a minimum difference of 1% when the service number was 100 and a maximum difference of 26% when the number was 350. For comparison, we also designed a greedy algorithm that selected the node with the smallest load for deployment, with randomly selected quality levels. We can observe that FloodSFCP outperforms the greedy algorithm in all situations; in particular, the performance of FloodSFCP was up to 410% higher than that of the greedy algorithm with 300 services.

Fig. 5 illustrates that when the number of services is small, the network parameters trained in the offline simulation environment can perform well in a real environment; however, when the number of services increases, the performance degrades. These results are caused by the underestimation of service processing time in the simulated environment. In the real-world environment, the processing time is limited by the hardware performance, resulting in a lower completion rate and total reward. In addition, FloodSFCP is more flexible and effective in handling service deployment tasks than fixed-strategy algorithms such as the greedy algorithm.

D. Policy Performance Evaluation

We analyzed the relationship between service latency and service quality under different latency-quality preferences and service quantities, and the impact on algorithm decisions. We defined five strategy preferences with φq and φt values of (1, 0.25), (1, 0.5), (1, 1), (0.5, 1), and (0.25, 1), and trained different models for each preference. We evaluated these strategies under sparse (100 services) and dense (350 services) scenarios; the results are shown in Fig. 6 and Fig. 7.

Fig. 6 shows the relationship between latency and quality under sparse and dense scenarios. As the ratio of φq to φt decreases from 1:0.25 to 0.25:1, the quality and latency decrease synchronously. Divided by the boundary of φq = 1 and φt = 1, the strategies with quality priority are on the left side and those with latency priority on the right side. We observe that an increase in latency brings only a slight improvement in quality under the quality-priority strategy, because the quality is limited by the computing capacity in both scenarios. On the other hand, an increase in latency can effectively improve the service quality synchronously under the latency-priority strategy, which is more pronounced in dense scenarios.

Fig. 7 compares the KDE of quality levels for different φq and φt. As the ratio of φq to φt decreases, the mean service quality level gradually decreases. In the sparse scenario, level 5 is the most commonly chosen option for all preferences, with mean values greater than 4.59. In the dense scenario, although level 5 is still the most commonly chosen option, level 1 was also frequently chosen, and the mean values for φq and φt of (0.25, 1) and (0.5, 1) are 3.29 and 3.89, which are lower than the 4.59 and 4.77 of the sparse service scenario.

These experiments demonstrate that our algorithm can flexibly balance service quality and latency under different service load scenarios and effectively adjust the balance between quality and latency according to different strategy preferences.

VI. RELATED WORK

At present, the industry has been carrying out extensive research on SFC placement mechanisms. In order to solve the delay problem caused by the increasing length of SFCs, Baek et al. proposed an order dependency-aware placement scheme [14]. Ko et al. optimized the service delay requirements along with the network condition requirements [15]. To improve resource utilization, Wang et al. combined a graph convolution network, which extracted the features of the physical network, with a sequence-to-sequence model, which captured the ordered information of the SFC request
by adopting deep reinforcement learning [16]. Sekine et al. used Docker and Kubernetes to design an automatic Internet of Things (IoT) networking SFC deployment framework with high resource utilization efficiency [17]. In addition, there are also studies related to the visit order [18], cloud placement [19], etc. These analyses show that existing studies rarely consider the classification of service quality levels, which is critical for optimizing the SFC placement problem.

In addition, studies on SFC placement in satellite networks are relatively few. Gao et al. proposed a location-aware resource allocation algorithm for SFC placement and routing traffic problems in satellite ground station networks to minimize link resource utilization and the number of servers used [20], and discussed the energy optimization problem of SFC placement, converting it into an integer nonlinear programming problem [21]. To improve the flexibility and scalability of satellite networks, Zhang et al. proposed an intent-driven SFC deployment scheme, which can generate the optimal service function path [22]. Ye et al. considered resource constraints, delay requirements, and intermittent spatial link conditions to ensure the reliability of SFC in the satellite-ground network, and designed a heuristic decoupling algorithm to solve the problem [23].

VII. CONCLUSION

This paper proposes FloodSFCP, an SFC placement method for remote sensing in the LEO satellite network. To improve the utilization of satellite resources, large remote sensing services are divided into smaller sub-services that are placed on different satellites and executed in a specific order. FloodSFCP aims at finding quality- and latency-balanced service placement strategies, including the selection of the offload position and quality level of each sub-service. Given the dynamics of satellite topology and the equilibrium of quality and service, FloodSFCP seeks the sub-service placement method via formulation of a DQN approach and introduces NoisyNet, Dueling, and n-step learning to further improve the model's generalization ability. FloodSFCP outperforms all the baselines in both training speed and final performance in the program-simulated and real-world environments.

REFERENCES

[1] J. C. McDowell, "The low earth orbit satellite population and impacts of the SpaceX Starlink constellation," The Astrophysical Journal Letters, vol. 892, no. 2, p. L36, 2020.
[2] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and R. Boutaba, "Network function virtualization: State-of-the-art and research challenges," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 236–262, 2015.
[3] D. Bhamare, R. Jain, M. Samaka, and A. Erbad, "A survey on service function chaining," Journal of Network and Computer Applications, vol. 75, pp. 138–155, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1084804516301989
[4] M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg, "Noisy networks for exploration." [Online]. Available: http://arxiv.org/abs/1706.10295
[5] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, "Dueling network architectures for deep reinforcement learning." [Online]. Available: http://arxiv.org/abs/1511.06581
[6] R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, no. 1, pp. 9–44, 1988. [Online]. Available: http://link.springer.com/10.1007/BF00115009
[7] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning." [Online]. Available: http://arxiv.org/abs/1312.5602
[8] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments," Advances in Neural Information Processing Systems, vol. 30, 2017.
[9] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," in International Conference on Machine Learning. PMLR, 2014, pp. 387–395.
[10] J. Yan, Y. Ma, L. Wang, K.-K. R. Choo, and W. Jie, "A cloud-based remote sensing data production system," Future Generation Computer Systems, vol. 86, pp. 1154–1166, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X17303035
[11] "Flylisl: Traffic balance awared routing for large-scale mixed-reality telepresence over reconfigurable mega-constellation," in 2022 IEEE 42nd International Conference on Distributed Computing Systems Workshops (ICDCSW). IEEE, 2022, pp. 260–265.
[12] Y. Zhu, H. Yao, T. Mai, W. He, N. Zhang, and M. Guizani, "Multiagent reinforcement-learning-aided service function chain deployment for internet of things," vol. 9, no. 17, pp. 15674–15684. [Online]. Available: https://ieeexplore.ieee.org/document/9712642/
[13] V. Konda and J. Tsitsiklis, "Actor-critic algorithms," Advances in Neural Information Processing Systems, vol. 12, 1999.
[14] H. Baek, I. Jang, H. Ko, and S. Pack, "Order dependency-aware service function placement in service function chaining," in 2017 International Conference on Information and Communication Technology Convergence (ICTC). IEEE, Oct. 2017, pp. 193–195.
[15] H. Ko, D. Suh, H. Baek, S. Pack, and J. Kwak, "Optimal placement of service function in service function chaining," in 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN). IEEE, Jul. 2016, pp. 102–105.
[16] T. Wang, Q. Fan, X. Li, X. Zhang, Q. Xiong, S. Fu, and M. Gao, "DRL-SFCP: Adaptive service function chains placement with deep reinforcement learning," in ICC 2021 - IEEE International Conference on Communications. IEEE, Jun. 2021, pp. 1–6.
[17] H. Sekine, K. Kanai, J. Katto, H. Kanemitsu, and H. Nakazato, "IoT-centric service function chaining orchestration and its performance validation," in 2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC). IEEE, Jan. 2021, pp. 1–4.
[18] N. Hyodo, T. Sato, R. Shinkuma, and E. Oki, "Virtual network function placement model for service chaining to relax visit order and routing constraints," in 2018 IEEE 7th International Conference on Cloud Networking (CloudNet). IEEE, Oct. 2018, pp. 1–3.
[19] T. Menouer, A. Khedimi, C. Cérin, and M. Chahbar, "Scheduling service function chains with dependencies in the cloud," in 2020 IEEE 9th International Conference on Cloud Networking (CloudNet). IEEE, Nov. 2020, pp. 1–3.
[20] X. Gao, R. Liu, and A. Kaushik, "An energy efficient approach for service chaining placement in satellite ground station networks," in 2021 International Wireless Communications and Mobile Computing (IWCMC). IEEE, Jun. 2021, pp. 217–222.
[21] X. Gao, R. Liu, and A. Kaushik, "Service chaining placement based on satellite mission planning in ground station networks," IEEE Transactions on Network and Service Management, vol. 18, no. 3, pp. 3049–3063, Sep. 2021.
[22] L. Zhang, C. Yang, Y. Ouyang, T. Li, and A. Anpalagan, "ISFC: Intent-driven service function chaining for satellite networks," in 2022 27th Asia Pacific Conference on Communications (APCC). IEEE, Oct. 2022, pp. 544–549.
[23] T. Ye, J. Zhang, C. Zhao, Y. Tang, and C. Zhu, "Service function chain orchestration in 6G software defined satellite-ground integrated networks," in 2022 6th International Conference on Communication and Information Systems (ICCIS). IEEE, Oct. 2022, pp. 71–76.