FloodSFCP Quality and Latency Balanced Service Function Chain Placement For Remote Sensing in LEO Satellite Network
Abstract—Prompted by the significant advancements in image processing technologies and their diverse range of applications, remote sensing satellites are poised for rapid expansion. Nonetheless, offloading the vast amount of remote sensing satellite images to the ground gateway station is inefficient because satellite links are very costly, while the limited resources of individual satellites hinder local task processing. With the advancement of network function virtualization (NFV) technology, the service function chain (SFC) paradigm has emerged. It can significantly improve the flexibility and resource utilization of network services and alleviate resource conflicts by dividing large services into smaller ones organized in the form of SFCs. As mega-constellations (e.g., Starlink) develop, the number of low Earth orbit (LEO) satellites is increasing. By dividing services into small sub-services and organizing them into SFCs throughout the LEO network, services that cannot be completed by a single satellite can be accomplished through multi-satellite cooperation. However, the quality of the remote sensing service is positively correlated with its latency, and the rapidly changing topology of LEO networks adds further complexity to SFC placement. Hence, how to select appropriate satellites on which to place the SFC and how to modulate service levels, in order to obtain better remote sensing results within an acceptable latency, remains an open question. To address these issues, this paper proposes FloodSFCP, an SFC placement method that aims to increase service quality and decrease latency through offline training and online optimization via deep reinforcement learning, taking into account the variation in LEO network topology. By introducing NoisyNet, Dueling, and N-step learning, we improve the model's generalization ability and reduce the state space, thus enhancing convergence speed while reducing decision and training time. Experimental results demonstrate that FloodSFCP significantly improves service quality while reducing total decision costs.

Index Terms—Service function chain, Deep Reinforcement Learning, quality-latency balance, satellite network.

I. INTRODUCTION

With significant advancements in image processing technologies, intelligent remote sensing satellites are rapidly expanding into diverse areas, including military, forestry, and agriculture. Advanced techniques like image sharpening and recognition allow the high-definition camera on the satellite to capture a broad range of scenes and to identify various targets within the images, such as buildings, ships, and vehicles.

This work was supported by the National Key Research and Development Program of China under grant No. 2020YFB1806000.

Typically, remote sensing services require transmitting a large number of high-resolution images to ground stations. However, this process results in high bandwidth costs and transmission delays due to the scarcity of satellite resources and the vast distance between the satellites and Earth. On the other hand, processing a large number of images consumes a huge amount of computing resources, making it difficult for satellites with limited resources to handle onboard processing.

Since 2021, thousands of low Earth orbit (LEO) satellites have been launched to form mega-constellations (e.g., Starlink [1]), aiming to provide low-latency Internet access services for global users. Bringing network functions virtualization (NFV) [2] technologies into mega-constellations has given rise to a new computing paradigm called satellite edge computing (SEC). In SEC, orchestration platforms (e.g., Kubernetes and KubeEdge) integrate separate satellite resources on the mega-constellations into a large resource pool that supports data-intensive and latency-sensitive satellite services. Besides, by employing the methodology of the service function chain (SFC) [3], a large service can be divided into several small sub-services that are sequentially distributed to resource-limited satellites.

By applying NFV and SFC, a remote sensing service can be divided into several sub-services. For example, a simple detection service can be divided into image pre-processing and object recognition. By placing the sub-services on different satellites and executing them in a specific order, the original function is maintained. Compared with transmitting the visual data to a ground server, the shorter links between satellites and the lower noise in the vacuum facilitate a low-latency, low-noise network connection through the mega-constellations. On the other hand, sub-services demand fewer computing resources than the original service, enabling their execution on satellites with limited resources.

Dividing remote sensing services into sub-services and running them in a distributed mode improves the utilization of satellite resources and decreases data transmission latency. However, formulating the sub-service chain and making the sub-services work in a particular order increase the complexity and instability of the remote sensing system. The uneven distribution of satellite resources may incur the preemption of adequate nodes and resource fragmentation.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on January 13,2024 at 10:54:11 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-0052-9/23/$31.00 ©2023 IEEE 276
2023 20th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON)
Additionally, sub-service quality is positively correlated with processing latency, and the overall effectiveness of a remote sensing service is tightly connected to the quality of its sub-services; therefore, the trade-off between the … deep reinforcement learning, taking into account the dynamics of satellite topology and the equilibrium of quality and latency. Moreover, to improve the generalization ability of the model, we introduce NoisyNet [4], Dueling [5], and N-step learning [6] into DQN [7] to further reduce the training and convergence time. To evaluate the effectiveness of FloodSFCP, we perform evaluations in both program-simulated and real-world environments. In the program-simulated environment, FloodSFCP outperforms the other three algorithms (i.e., NormDQN, MADDPG [8], and DDPG [9]) in both training speed and final performance. Furthermore, we set up a testbed using a Kubernetes cluster to simulate 20 satellite nodes in LEO and design a greedy algorithm for comparison. The performance of FloodSFCP in the testbed is 410% higher than that of the greedy algorithm.

Fig. 1: (a) Remote sensing image example. (b) Comparison between divided and undivided modes. (c) The average processing time at different resolutions. (d) The average number of identifiable targets at different resolutions. [Panels (c) and (d) plot Processing Time (s) and Identified Targets against the Preprocess Resolution and Yolo4 Resolution levels 1 to 5.]
The key contributions of this work are summarized below:
• We conduct two experiments on a Kubernetes cluster. One explores the impact of applying SFC to remote sensing services; the other explores the correlation between service quality and service latency.
• We develop FloodSFCP, a novel SFC placement method that decides the offloading position of sub-services and their quality levels. It aims to increase service quality and decrease latency while considering the variation of satellite topology.
• We introduce NoisyNet, Dueling, and N-step learning into DQN to improve the model's generalization ability and reduce the state space, and therefore handle the balanced placement of services more efficiently.
• We demonstrate the effectiveness of FloodSFCP through evaluations in both program-simulated and real-world environments.

The rest of this paper is organized as follows. Section II gives a detailed motivation and two testbed-based experiments. The problem description and formulation are given in Section III. The MDP-based FloodSFCP placement algorithm is presented in Section IV. Section V reports the evaluation of FloodSFCP. We discuss related work in Section VI before concluding in Section VII.

II. MOTIVATION

Remote sensing services are typically resource-intensive. For example, a Net Primary Productivity (NPP) application requires approximately 40 CPU cores and 160GB of memory [10], which is difficult to place on a single satellite. On the other hand, a higher image resolution usually leads to a better quality of service, such as target recognition. However, larger images also mean larger data volumes and longer processing times, resulting in greater service latency. By using NFV and SFC, satellite remote sensing services can be divided into multiple lightweight sequential sub-services. These sub-services have lower resource requirements and can therefore be placed more flexibly on resource-constrained satellites. Additionally, by adjusting the visual resolution of sub-services, the latency and the quality of the remote sensing service can be adapted to various environments. For example, the resolution of a sub-service can be decreased to shorten the processing time of latency-sensitive services.

We designed the following experiments for two purposes: to examine how service division impacts processing time, and to characterize the correlation between service quality and service latency under various placement methods for satellite remote sensing services.
• We use Google Maps Pro to capture 100 remote sensing images with various backgrounds at a resolution of 1080×1080; these images are the input of the experiment. We use YOLO, a well-known real-time object detection and image segmentation model, to sense the input images. Fig. 1a shows one of the remote sensing images and its recognition results, where all the recognized cars
are highlighted with orange boxes.
• We divide the remote sensing service into two sub-services, image pre-processing and image recognition, according to the essential sensing procedure of YOLO. In addition, we define five service quality levels that correspond to five resolutions: 240p, 320p, 480p, 720p, and 1080p. The sub-services are then packaged into Docker images according to their service quality levels.
• We set up a Kubernetes cluster composed of 2 nodes to host the images of the remote sensing service. Each node is deployed in a VMware virtual machine configured with 4 CPU cores and 8GB of RAM.
• Finally, we designed two placement methods, and each method was executed 100 times to minimize fluctuations in the experimental results. 1) We randomly assigned 2 to 10 services with a fixed service quality level of 4 to the two Kubernetes nodes. We recorded the processing time both with and without service division to investigate the impact of service division on processing time. 2) We deployed all 25 quality-level combinations of the sub-services (i.e., 5 levels of pre-processing × 5 levels of recognition) to one Kubernetes node. For each combination, we recorded the total processing time and the number of recognized targets in order to characterize the relationship between service quality and service delay.

The results of Experiment 1 are presented in Fig. 1b, which plots processing time against the number of randomly incoming services. The median processing time of the divided mode ranged from 1.34s to 1.63s, whereas that of the undivided mode ranged from 1.46s to 3.37s. As the number of services increased, the processing time in the divided mode increased by an insignificant 21%. In contrast, the undivided mode exhibited an evident increase of up to about 130% (i.e., to about 2.3 times its initial value). Furthermore, the processing time of the undivided mode consistently remained greater than that of the divided mode under different numbers of tasks.

Fig. 1b shows that when the number of services was 8 and 10, the expected value and standard deviation of the divided mode were (1.51, 0.98) and (2.19, 6.08), respectively. In contrast, those of the undivided mode were (2.43, 5.32) and (3.28, 7.69), respectively. This is because each sub-service in the divided mode needs few resources, so sub-services are more flexible to place and can make full use of the available resources.

Fig. 1c describes the impact of service quality levels on the overall processing time in Experiment 2. When the image resolutions of pre-processing and YOLO recognition are both set to level 1, the average total processing time is 0.68 seconds. As the recognition resolution increases step by step while the pre-processing resolution remains unchanged, the average processing time increases from 0.71s to 1.03s. Conversely, as the pre-processing resolution increases step by step while the recognition resolution remains unchanged, the average processing time increases from 0.81s to 1.87s. Notably, when both sub-services are at level 5, the processing time reaches 2.18s. This is about 3.2 times the time at the lowest level, or roughly 220% longer.

Fig. 1d illustrates, for Experiment 2, the relationship between service quality level and the average number of recognized objects. When both sub-services have a service quality level of 1, the number of recognized targets is only 0.01. When the service levels are 1-5 (pre-processing at level 1 and recognition at level 5) and 5-1, the numbers of recognized targets are 1.4 and 1.88, respectively. However, when both sub-service qualities are improved synchronously, the numbers of recognized targets are 1.6, 3.3, 8.5, and 15.1 for levels 2-2, 3-3, 4-4, and 5-5, respectively. These observations suggest that improving the quality level of only one sub-service has a limited impact on the final result compared with synchronously improving both sub-services.

Based on the results of the two experiments, we conclude that dividing services enables more flexible deployment, effectively improves resource utilization, and reduces service latency on the satellite nodes. Additionally, increasing the service quality level can significantly improve service quality metrics, but it also significantly increases service latency. In a real-world LEO environment, satellite onboard resources vary and the topology changes rapidly. The SFC placement method therefore needs to be carefully designed, covering both the choice of deployment locations for the sub-services and the selection of appropriate service quality levels.

III. PROBLEM DESCRIPTION AND FORMULATION

In this section, we present the system model of FloodSFCP and the optimal sub-service sequence deployment problem.

A. System Model

1) Service Model: We represent the set of all services as $K$. For a given service $k \in K$, we use a binary variable $r_{n,k} \in R$ to indicate whether it is initiated by satellite node $n \in N$, where $N$ is the set of satellites. To further break down each service $k$ into its constituent sub-services, we employ business logic to obtain a sequence of sub-services, represented as $k = [f_1, f_2, \ldots, f_{|F|}]$, where $|F|$ is the number of sub-services into which the service is divided. We introduce a binary variable $x_{n,k}^f \in X$ to denote whether satellite node $n$ executes the $f$-th sub-service of service $k$. Specifically, we define $x_{n,k}^0 = r_{n,k}$. In addition, for each sub-service $f$, we define a quality index

$$\forall k \in K,\ \forall f \in k:\quad q_k^f \in \{1, 2, 3, 4, 5\} \tag{1}$$

to denote the selection range of quality levels of the $f$-th sub-service of service $k$.

2) Time Slot Model: The topology of satellites is time-varying and the completion time of a service is uncertain. However, we can assume that the satellite topology and the service completion status remain unchanged over a sufficiently short period. Therefore, we introduce the concept of a time slot to facilitate analysis. We fix a constant $\tau$ and set the total time required for a single placement process to $L \cdot \tau$. At every interval of $\tau$, the system places sub-services once. We then define a binary variable $h_{k,f}^l \in H^l$ as an indicator of
whether sub-service $f$ of service $k$ is placed during time slot $l \in L$. Since sub-services have a sequential order, we have

$$h_{k,f}^{l_1} = h_{k,f+1}^{l_2} = 1 \;\Rightarrow\; l_1 < l_2. \tag{2}$$

3) Satellite Network Topology Model: Based on the definition of the time slot, we define the satellite network as an undirected graph $G^l = (N, E)$, where $E$ represents all links between satellites in time slot $l$. To simplify the problem, we assume that there is a logical link $(n_1, n_2)$ between any two satellites $n_1, n_2 \in N$. The logical link is composed of physical links in $E$, and its actual hop count is the shortest hop count between $n_1$ and $n_2$ (computed with the Dijkstra algorithm). To allocate traffic uniformly over the real links and minimize link utilization, we can use the algorithm proposed in [11], which helps to avoid data flow queuing and blocking caused by high link utilization. By doing this, we exclude the impact of actual path selection on transmission latency and only consider the end-to-end bandwidth. Furthermore, the wireless transmission rate in the vacuum is not constant and is affected by the number of hops and the distance. We therefore define the function $W(n_1, n_2)$ to represent the average data transmission rate of the link $(n_1, n_2)$. We also define the function $D(n_1, n_2)$ to represent the propagation latency of data transmitted through the link $(n_1, n_2)$.

4) Latency Model: We define the predicted time from the start of the whole service to the completion of sub-service $f$ as $t_k^f$:

$$t_k^f = t_{start}^{k,f} + t_{td}^{k,f} + t_{pro}^{k,f}, \tag{3}$$

where $t_{start}^{k,f}$ is the sub-service start time, given by

$$t_{start}^{k,f} = \begin{cases} t_k^{f-1}, & f \ge 2, \\ 0, & f = 1, \end{cases} \tag{4}$$

and $t_{td}^{k,f}$ is the data transmission and propagation latency from the service initiation satellite (or the execution satellite of the previous sub-service) to the current execution satellite, expressed as

$$t_{td}^{k,f} = \sum_{n_1, n_2 \in N} x_{n_2,k}^{f-1} \cdot x_{n_1,k}^{f} \cdot \left( \frac{V(q_k^f, f)}{W(n_1, n_2)} + D(n_1, n_2) \right). \tag{5}$$

Here, the function $V(q, f)$ describes the amount of data corresponding to different service qualities. $W(n_1, n_2)$ is the internal port transmission rate in the $l$-th time slot when $n_1 = n_2$; otherwise, it is the channel transmission rate. By (2), sub-service $f-1$ is placed in an earlier time slot than sub-service $f$, so the two cannot be deployed or executed simultaneously; hence, when solving for $x_{n,k}^f$, $x_{n,k}^{f-1}$ is already known. Besides, $t_{pro}^{k,f}$ is the processing latency, expressed as

$$t_{pro}^{k,f} = \sum_{n \in N} x_{n,k}^{f} \cdot P(q_k^f, f, o_n), \tag{6}$$

where $o_n$ represents the CPU computing resources of satellite node $n$. The function $P(q_k^f, f, o)$ characterizes the processing time required to execute sub-service $f$ with quality level $q_k^f$ when the CPU computing resource is $o$.

Finally, the amount of data returned by a service is usually small and can be ignored, so only the propagation latency of the backhaul is considered. The total latency $T_k$ of service $k$ is

$$T_k = t_k^{|F|} + \sum_{n_1, n_2 \in N} x_{n_2,k}^{|F|} \cdot r_{n_1,k} \cdot D(n_1, n_2). \tag{7}$$

B. Problem Formulation

In this section, our objective is to minimize the overall service latency and maximize the overall service quality. The overall latency of a specific service $k$ has been defined above, and the service revenue is represented by $U(q_k)$. The two objectives are interconnected through their common dependence on $q_k$. To balance them, we define the joint function $\phi_q \cdot U(q_k) - \phi_t \cdot T_k$, where $\phi_q, \phi_t \in [0, 1]$ are two control weights. Let $\tilde{X} = \{x_{n,k}^f\}$ denote the set of selected satellite nodes and $\tilde{Q} = \{q_k\}$ the set of selected service qualities. The objective function is then

$$\mathbf{P1}:\quad \max_{\tilde{X}, \tilde{Q}} \sum_{k \in K} \sum_{l \in L} \left[ \phi_q \cdot U(q_k) - \phi_t \cdot T_k \right]$$

s.t.

$$x_{n,k}^f \in \{0, 1\},\ q_k \in \{1, 2, 3, 4, 5\},\quad \forall n \in N, k \in K, f \in k, \tag{8a}$$

$$\sum_{n \in N} x_{n,k}^f = h_{k,f}^l,\quad \forall f \in k, \tag{8b}$$

$$\sum_{n \in N} r_{n,k} = 1,\quad \forall k \in K, \tag{8c}$$

$$\sum_{k \in K,\, f \in k} x_{n_1,k}^f \cdot x_{n_2,k}^{f-1} \cdot W(n_1, n_2) \le b(n_1, n_2),\quad \forall n_1, n_2 \in N, \tag{8d}$$

$$T_k \le R_k,\quad \forall k \in K. \tag{8e}$$

Constraints (8b) and (8c) ensure that each service is initiated on only one satellite node. They also ensure that, when $h_{k,f}^l = 1$, the sub-service is deployed on one and only one node; otherwise, sub-service $f$ is not deployed in time slot $l$. Constraint (8d) ensures that the bandwidth of any end-to-end virtual link does not exceed the upper limit $b(n_1, n_2)$, since offloading a service to a distant satellite would significantly increase the data transmission latency. Constraint (8e) requires that the total time spent on any service not exceed its upper tolerance limit $R_k$.

IV. MDP-BASED SUB-SERVICES PLACEMENT FOR LEO SATELLITE NETWORK

In this section, we propose an MDP-based sub-service placement method built on deep reinforcement learning, together with a framework of online decision-making and offline training.
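Returning to the latency model of Section III-A, the Python below is a minimal sketch of how (3)–(7) compose for a single service whose sub-services have already been assigned to satellites. It also computes the per-service term of P1. It assumes that $W$, $D$, $V$ and $P$ are supplied as plain Python callables and that $U(q_k)$ is supplied as a number. The function names and the toy functions in the usage note are hypothetical and are not taken from FloodSFCP.

```python
from typing import Callable, Dict, List

Rate = Callable[[int, int], float]             # W(n1, n2): average data rate of link (n1, n2)
Delay = Callable[[int, int], float]            # D(n1, n2): propagation latency of link (n1, n2)
Volume = Callable[[int, int], float]           # V(q, f): data volume of sub-service f at quality q
ProcTime = Callable[[int, int, float], float]  # P(q, f, o): processing time with CPU resource o


def service_latency(origin: int, placement: List[int], quality: List[int],
                    W: Rate, D: Delay, V: Volume, P: ProcTime,
                    cpu: Dict[int, float]) -> float:
    """Total latency T_k of one service, following (3)-(7).

    origin    -- satellite n with r_{n,k} = 1 (x^0_{n,k} in the model)
    placement -- placement[f-1] is the satellite executing sub-service f
    quality   -- quality[f-1] is the quality level q_k^f in {1, ..., 5}
    cpu       -- o_n, the CPU resource of each satellite
    """
    t_prev = 0.0    # t_k^{f-1}; eq. (4) gives t_start = 0 for f = 1
    prev = origin   # satellite holding the input of the current sub-service
    for f, (node, q) in enumerate(zip(placement, quality), start=1):
        t_start = t_prev
        # eq. (5): transmission V/W plus propagation D from the previous satellite
        t_td = V(q, f) / W(node, prev) + D(node, prev)
        # eq. (6): processing time of sub-service f on the chosen satellite
        t_pro = P(q, f, cpu[node])
        t_prev = t_start + t_td + t_pro   # eq. (3)
        prev = node
    # eq. (7): only the backhaul propagation to the origin is added
    return t_prev + D(origin, prev)


def joint_term(u_value: float, latency: float, phi_q: float, phi_t: float) -> float:
    """Per-service term phi_q * U(q_k) - phi_t * T_k of the objective P1."""
    return phi_q * u_value - phi_t * latency
```

Consider a hypothetical setting: W returns 100 when n1 = n2 (internal port) and 10 otherwise, D returns 0 or 0.01 s, V(q, f) = 2q and P(q, f, o) = 0.1q/o. A chain initiated on satellite 0 and placed on satellites [0, 1] at quality levels [2, 4] then gives T_k = 0.24 + 1.01 + 0.01 = 1.26 s. Constraint (8e) reduces to checking `service_latency(...) <= R_k` for each service.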
[Figure: framework of FloodSFCP. Recoverable labels: "Offline Model Training" (generate random services, service split into a sub-service queue vnf1 ... vnfF, simulation environment, back propagation, update network parameters); "Online decision-making" (real services arrival, request deployment); "The Central Decision-making Layer"; "The Status Management Layer" (update service execution data, provide service and satellite information status); "The Cloudification Layer".]
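The framework of online decision-making and offline training introduced at the start of this section, and formalized in Algorithm 1, can be sketched as below. This is a hedged illustration and not the authors' implementation. A tabular agent stands in for the Q-network. The `train_step` and `observe` callables and the `sync_every` period are hypothetical placeholders for simulation-side training and for the observations provided by the status management layer.

```python
from typing import Callable, List

import numpy as np


class TabularAgent:
    """Hypothetical stand-in for the Q-network: a table Q[s, a] over discrete states."""

    def __init__(self, n_states: int, n_actions: int):
        self.q = np.zeros((n_states, n_actions))

    def act(self, s: int) -> int:
        # greedy action ("choose an optimal action a" in Algorithm 1)
        return int(np.argmax(self.q[s]))

    def load_from(self, other: "TabularAgent") -> None:
        # "update the parameters of the online agent by the offline agent"
        self.q = other.q.copy()


def run_slots(num_slots: int, sync_every: int,
              offline: TabularAgent, online: TabularAgent,
              train_step: Callable[[TabularAgent], None],
              observe: Callable[[int], int]) -> List[int]:
    """Online agent decides every slot; offline agent keeps training; periodic sync."""
    actions = []
    for slot in range(1, num_slots + 1):
        s = observe(slot)                 # real-environment observation
        actions.append(online.act(s))     # online decision, no learning here
        train_step(offline)               # offline training on simulated data
        if slot % sync_every == 0:        # "reaching the update time"
            online.load_from(offline)
    return actions
```

The online agent never updates itself. It only copies the offline agent's parameters at each update time, so placement decisions stay fast and deterministic between synchronizations. Interleaving training with each time slot is a simplification: Algorithm 1 lists the offline and online stages separately.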
Algorithm 1 MDP-based SFC Deployment
Input: observation state s, reward r
Output: action a
  Initialize all agents' network parameters
  Set the replay buffer memory (rbm) size and batch size
  Load or update sub-service execution data
  Offline Training Stage:
  for each episode = 1, 2, 3, ... do
    for each timeslot = 1, 2, 3, ... do
      for each subservice = 1, 2, 3, ... do
        Obtain the observation state s of the simulation environment
        Choose an optimal action a
        Obtain the reward corresponding to action a, and update the simulation environment
        Obtain the environmental observation of the next sub-service on the same chain as s′
        if subservice step ≥ n then
          Put the n-step experience into the rbm
        end if
      end for
    end for
    Randomly fetch a batch of data from the rbm
    Update the offline agent parameters with the fetched data
    Update the simulation environment
  end for
  Online Decision-making Stage:
  for each timeslot = 1, 2, 3, ... do
    for each subservice = 1, 2, 3, ... do
      Obtain the observation state s of the real environment
      Choose an optimal action a and deploy the sub-service into the satellite network
      Save the execution data of the sub-service
    end for
  end for
  if reaching the update time then
    Update the parameters of the online agent with those of the offline agent
  end if

… and observation data from the real environment and obtaining an action to interact with the cloudification layer. The cloudification layer is a Kubernetes cluster deployed in the real environment that is responsible for deploying the VNFs of sub-services. Additionally, we designed a status management layer that collects real environment data in a centralized manner and interacts with the cloudification layer. This avoids frequent interactions that could disrupt the environment. Besides, the collected service information is provided to the simulation environment so that it better fits the real environment.

C. Generalization Reinforcement Learning Approach

Reinforcement learning (RL) has been proven to be an effective method for SFC deployment and resource balancing [12]. In this paper, we design FloodSFCP to address SFC placement by combining four RL paradigms: NoisyNet, Dueling, N-step learning, and DQN. The specific algorithm is shown in Algorithm 1.

DQN forms the foundation of the entire algorithm. In DQN, the agent interacts with the environment to obtain the state $s$ and reward $r$. It then fits the action-value function $Q$ based on $r$ and finally selects the most valuable action $a$ based on $Q$. The optimal action value $Q^*$ can be expressed as

$$Q^*(s, a) = \mathbb{E}_{s' \sim \varepsilon}\left[ r + \lambda \max_{a'} Q^*(s', a') \,\middle|\, s, a \right], \tag{9}$$

where $\varepsilon$ represents the interaction environment, $\lambda$ is the discount factor for each step, and $s'$ and $a'$ are the next state and action. To fit $Q$, we use a four-layer fully connected neural network trained by gradient descent on a mean squared error loss. Denoting the network parameters by $\theta$, the gradient of the loss function at the $i$-th iteration is

$$\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{s,a \sim \rho(\cdot);\, s' \sim \varepsilon}\left[ \left( r + \lambda \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i) \right) \nabla_{\theta_i} Q(s, a; \theta_i) \right], \tag{10}$$

where $\rho(s, a)$ represents the probability of taking action $a$ in state $s$.

We add a layer normalization (LN) layer after each fully connected layer. LN normalizes the input to each neuron, reduces the differences between samples, and improves the model's generalization ability and training speed. Assuming the output of a fully connected layer is $\hat{x}$, LN is computed as

$$LN(\hat{x}) = \frac{\hat{x} - \mathbb{E}(\hat{x})}{\sqrt{Var(\hat{x}) + \epsilon}} \cdot \gamma(\theta) + \beta(\theta), \tag{11}$$

where $\epsilon$ is a small constant that avoids division by zero, and $\gamma$ and $\beta$ are learnable parameters of the network.

In real-world environments, the processing time of a service is subject to external factors such as power and temperature, which have a certain randomness. In addition, deploying services based on local environment data may not achieve global optimality. Therefore, we introduce NoisyNet into the network model, which adds noise to $\theta$ to regularize the network and improve its robustness. NoisyNet is defined as

$$y = \left( \mathbb{E}(\gamma) + Var(\gamma) \cdot \varphi_\gamma \right) \hat{x} + \mathbb{E}(\beta) + Var(\beta) \cdot \varphi_\beta, \tag{12}$$

where $\varphi$ represents random noise that follows Factorised Gaussian noise [4].

To further enhance the stability and effectiveness of our algorithm, we introduce the Dueling architecture. Dueling decomposes the Q-function into a state-value function and an advantage function, which improves the estimation of action values. The state-value function computes the expected reward that the agent can obtain in a particular state. The advantage function calculates the advantage of each action relative to the average action in that state. Specifically, the state-value function $V$ and the advantage function $A$ are

$$V(s) = \mathbb{E}_{s,a \sim \rho(\cdot)}\left[ Q(s, a) \right], \tag{13}$$

$$A(s, a) = Q(s, a) - V(s). \tag{14}$$
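A minimal NumPy sketch of how (11)–(14) fit together in one forward pass follows. It illustrates the structure under stated assumptions and is not the FloodSFCP network. It uses a single hidden layer instead of four. It writes the noisy layer as mean weights plus a learnable scale times factorised Gaussian noise, following [4], which corresponds to the mean-plus-scaled-noise form of (12). It combines $V$ and $A$ with the mean-subtracted aggregation $Q = V + A - \mathrm{mean}_a A$ proposed in [5]. The layer sizes, the initialization and `sigma0 = 0.5` are assumptions.

```python
import numpy as np


def layer_norm(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray,
               eps: float = 1e-5) -> np.ndarray:
    """Eq. (11): normalise over the feature axis, then scale by gamma and shift by beta."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta


def _scale_noise(x: np.ndarray) -> np.ndarray:
    # f(x) = sgn(x) * sqrt(|x|), used to build factorised Gaussian noise
    return np.sign(x) * np.sqrt(np.abs(x))


class NoisyLinear:
    """Linear layer whose weights are mean + scale * factorised Gaussian noise (cf. eq. (12))."""

    def __init__(self, n_in: int, n_out: int, rng: np.random.Generator,
                 sigma0: float = 0.5):
        bound = 1.0 / np.sqrt(n_in)
        self.mu_w = rng.uniform(-bound, bound, size=(n_in, n_out))
        self.mu_b = rng.uniform(-bound, bound, size=n_out)
        self.sigma_w = np.full((n_in, n_out), sigma0 / np.sqrt(n_in))
        self.sigma_b = np.full(n_out, sigma0 / np.sqrt(n_in))
        self.rng = rng

    def __call__(self, x: np.ndarray, noisy: bool = True) -> np.ndarray:
        if not noisy:                      # deterministic pass with the mean weights
            return x @ self.mu_w + self.mu_b
        n_in, n_out = self.mu_w.shape
        eps_in = _scale_noise(self.rng.standard_normal(n_in))
        eps_out = _scale_noise(self.rng.standard_normal(n_out))
        w = self.mu_w + self.sigma_w * np.outer(eps_in, eps_out)  # factorised noise
        b = self.mu_b + self.sigma_b * eps_out
        return x @ w + b


class DuelingQNet:
    """Hidden layer + LN + ReLU, then noisy value and advantage heads (eqs. (11)-(14))."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=1.0 / np.sqrt(state_dim), size=(state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.ln_gamma = np.ones(hidden)    # learnable LN scale (gamma in eq. (11))
        self.ln_beta = np.zeros(hidden)    # learnable LN shift (beta in eq. (11))
        self.value = NoisyLinear(hidden, 1, rng)              # V(s)
        self.advantage = NoisyLinear(hidden, n_actions, rng)  # A(s, a)

    def features(self, s: np.ndarray) -> np.ndarray:
        h = layer_norm(s @ self.w1 + self.b1, self.ln_gamma, self.ln_beta)
        return np.maximum(h, 0.0)

    def __call__(self, s: np.ndarray, noisy: bool = True) -> np.ndarray:
        h = self.features(s)
        v = self.value(h, noisy)           # shape (batch, 1)
        a = self.advantage(h, noisy)       # shape (batch, n_actions)
        # aggregation of [5]: Q = V + (A - mean_a A), so mean_a Q equals V
        return v + a - a.mean(axis=-1, keepdims=True)
```

Calling the network with `noisy=True` draws fresh noise on every forward pass, which is how NoisyNet-style exploration replaces epsilon-greedy action selection. `noisy=False` evaluates the mean weights only.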
[Figure: The Reward vs. The Epoch of Training (0 to 5000), legend includes DDPG; a second panel plotted against The Num of Services (100 to 350).]

Due to the lack of identifiability and the poor performance of Q- …

$$L_i(\theta_i) = \mathbb{E}_{s,a \sim \rho(\cdot);\, s_n \sim \varepsilon}\left[ \left( \sum_{k=0}^{n-1} \lambda^k r_{k+1} + \lambda^n \max_{a'} Q(s_n, a'; \theta_{i-1}) - Q(s, a; \theta_i) \right)^2 \right]. \tag{16}$$
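The n-step target inside (16) can be sketched as below, together with the way Algorithm 1 stores an n-step experience once a chain has advanced at least n sub-services. The discount factor of (9)–(10) is written as `lam`. With n = 1 the target reduces to the one-step DQN target in (10). The sliding-window helper `NStepWindow`, its tuple layout and the terminal handling (dropping the bootstrap term when a chain ends) are hypothetical choices made for illustration.

```python
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


def n_step_target(rewards: List[float], max_next_q: float, lam: float,
                  terminal: bool = False) -> float:
    """sum_{k=0}^{n-1} lam^k * r_{k+1} + lam^n * max_a' Q(s_n, a'; theta_{i-1})."""
    g = sum((lam ** k) * r for k, r in enumerate(rewards))
    if not terminal:                  # assumed: no bootstrap after the chain ends
        g += (lam ** len(rewards)) * max_next_q
    return g


def n_step_loss(q_sa: float, target: float) -> float:
    """Squared TD error inside the expectation of (16)."""
    return (target - q_sa) ** 2


Transition = Tuple[Any, int, float, Any]        # (s, a, r, s')
Experience = Tuple[Any, int, List[float], Any]  # (s, a, [r_1, ..., r_n], s_n)


class NStepWindow:
    """Hypothetical helper: emits an n-step experience once n steps are available."""

    def __init__(self, n: int):
        self.n = n
        self.window: Deque[Transition] = deque(maxlen=n)

    def push(self, s: Any, a: int, r: float, s_next: Any) -> Optional[Experience]:
        self.window.append((s, a, r, s_next))
        if len(self.window) < self.n:     # "subservice step >= n" not yet reached
            return None
        s0, a0, _, _ = self.window[0]
        rewards = [t[2] for t in self.window]
        s_n = self.window[-1][3]
        return (s0, a0, rewards, s_n)
```

For example, `n_step_target([1.0, 1.0], 10.0, 0.5)` evaluates 1 + 0.5 + 0.25 × 10 = 4.0. A fresh `NStepWindow` would be used for each service chain, so that rewards from different chains are never mixed.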
Fig. 5: Comparison between real and simulated environments. (a) Completion rate. (b) Total reward. [Both panels are plotted against the number of services (100 to 350); legend: Real Environment, Simulation Environment.]

Fig. 6: Results of various preferences. (a) Sparse. (b) Dense. [Each panel shows Quality and Latency (s) for (φq, φt) = (1, 0.25), (1, 0.5), (1, 1), (0.5, 1), and (0.25, 1).]

[Figure: KDE plots, including the preference φq = 0.25, φt = 1.0.]

… was slightly lower than that in simulation, with a maximum difference of only 2.3% when the number of services was 350.
… by adopting deep reinforcement learning [16]. Sekine et al. used Docker and Kubernetes to design an automatic SFC deployment framework for Internet of Things (IoT) networking with high resource utilization efficiency [17]. In addition, there are studies on the visit order [18], cloud placement [19], etc. Our analysis shows that existing studies rarely consider the classification of service quality levels, which is critical for optimizing the SFC placement problem.

In addition, studies on SFC placement in satellite networks are relatively few. Gao et al. proposed a location-aware resource allocation algorithm for the SFC placement and traffic routing problems in satellite ground station networks. The algorithm minimizes link resource utilization and the number of servers used [20]. In [21], they discussed the energy optimization problem of SFC placement and converted it into an integer nonlinear programming problem. To improve the flexibility and scalability of satellite networks, Zhang et al. proposed an intent-driven SFC deployment scheme that can generate the optimal service function path [22]. Ye et al. considered resource constraints, delay requirements, and intermittent spatial link conditions to ensure the reliability of SFCs in the satellite-ground network, and designed a heuristic decoupling algorithm to solve the problem [23].

VII. CONCLUSION

This paper proposes FloodSFCP, an SFC placement method for remote sensing in the LEO satellite network. To improve the utilization of satellite resources, large remote sensing services are divided into smaller sub-services, which are then placed on different satellites and executed in a specific order.

[6] R. S. Sutton, "Learning to predict by the methods of temporal differences," vol. 3, no. 1, pp. 9–44. [Online]. Available: http://link.springer.com/10.1007/BF00115009
[7] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning." [Online]. Available: http://arxiv.org/abs/1312.5602
[8] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments," Advances in Neural Information Processing Systems, vol. 30, 2017.
[9] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," in International Conference on Machine Learning. PMLR, 2014, pp. 387–395.
[10] J. Yan, Y. Ma, L. Wang, K.-K. R. Choo, and W. Jie, "A cloud-based remote sensing data production system," Future Generation Computer Systems, vol. 86, pp. 1154–1166, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X17303035
[11] "Flylisl: Traffic balance awared routing for large-scale mixed-reality telepresence over reconfigurable mega-constellation," in 2022 IEEE 42nd International Conference on Distributed Computing Systems Workshops (ICDCSW). IEEE, 2022, pp. 260–265.
[12] Y. Zhu, H. Yao, T. Mai, W. He, N. Zhang, and M. Guizani, "Multiagent reinforcement-learning-aided service function chain deployment for internet of things," vol. 9, no. 17, pp. 15674–15684. [Online]. Available: https://ieeexplore.ieee.org/document/9712642/
[13] V. Konda and J. Tsitsiklis, "Actor-critic algorithms," Advances in Neural Information Processing Systems, vol. 12, 1999.
[14] H. Baek, I. Jang, H. Ko, and S. Pack, "Order dependency-aware service function placement in service function chaining," in 2017 International Conference on Information and Communication Technology Convergence (ICTC). IEEE, Oct. 2017, pp. 193–195.
[15] H. Ko, D. Suh, H. Baek, S. Pack, and J. Kwak, "Optimal placement of service function in service function chaining," in 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN). IEEE, Jul. 2016, pp. 102–105.
[16] T. Wang, Q. Fan, X. Li, X. Zhang, Q. Xiong, S. Fu, and M. Gao, "DRL-SFCP: Adaptive service function chains placement with deep reinforcement learning," in ICC 2021 - IEEE International Conference on Communications. IEEE, Jun. 2021, pp. 1–6.
[17] H. Sekine, K. Kanai, J. Katto, H. Kanemitsu, and H. Nakazato, "Iot-
FloodSFCP aims at finding quality and latency balanced centric service function chainingorchestration and its performance val-
idation,” in 2021 IEEE 18th Annual Consumer Communications &
service placement strategies, including the selection of the Networking Conference (CCNC). IEEE, Jan. 2021, pp. 1–4.
offload position and quality level of sub-service. Given the [18] N. Hyodo, T. Sato, R. Shinkuma, and E. Oki, “Virtual network function
dynamics of satellite topology and equilibrium of quality and placement model for service chaining to relax visit order and routing
constraints,” in 2018 IEEE 7th International Conference on Cloud
service, FloodSFCP seeks the sub-service placement method Networking (CloudNet). IEEE, Oct. 2018, pp. 1–3.
via formulation of a DQN approach and introduces NoisyNet, [19] T. Menouer, A. Khedimi, C. Cérin, and M. Chahbar, “Scheduling service
Dueling, and N-step learning to improve the model’s gener- function chains with dependencies in the cloud,” in 2020 IEEE 9th
International Conference on Cloud Networking (CloudNet). IEEE, Nov.
alization ability further. The FloodSFCP outperforms all the 2020, pp. 1–3.
others both in training speed and final performance in the [20] X. Gao, R. Liu, and A. Kaushik, “An energy efficient approach for
program-simulated environment and real-world environment. service chaining placement in satellite ground station networks,” in
2021 International Wireless Communications and Mobile Computing
(IWCMC). IEEE, Jun. 2021, pp. 217–222.
R EFERENCES [21] X. Gao, R. Liu, and A. Kaushik, “Service chaining placement based on
[1] J. C. McDowell, “The low earth orbit satellite population and impacts satellite mission planning in ground station networks,” IEEE Transac-
of the spacex starlink constellation,” The Astrophysical Journal Letters, tions on Network and Service Management, vol. 18, no. 3, pp. 3049–
vol. 892, no. 2, p. L36, 2020. 3063, Sep. 2021.
[2] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and [22] L. Zhang, C. Yang, Y. Ouyang, T. Li, and A. Anpalagan, “Isfc: Intent-
R. Boutaba, “Network function virtualization: State-of-the-art and re- driven service function chaining for satellite networks,” in 2022 27th
search challenges,” IEEE Communications surveys & tutorials, vol. 18, Asia Pacific Conference on Communications (APCC). IEEE, Oct. 2022,
no. 1, pp. 236–262, 2015. pp. 544–549.
[3] D. Bhamare, R. Jain, M. Samaka, and A. Erbad, “A survey [23] T. Ye, J. Zhang, C. Zhao, Y. Tang, and C. Zhu, “Service function
on service function chaining,” Journal of Network and Computer chain orchestration in 6g software defined satellite-ground integrated
Applications, vol. 75, pp. 138–155, 2016. [Online]. Available: networks,” in 2022 6th International Conference on Communication and
https://www.sciencedirect.com/science/article/pii/S1084804516301989 Information Systems (ICCIS). IEEE, Oct. 2022, pp. 71–76.
[4] M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves,
V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and
S. Legg, “Noisy networks for exploration.” [Online]. Available:
http://arxiv.org/abs/1706.10295
[5] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and
N. de Freitas, “Dueling network architectures for deep reinforcement
learning.” [Online]. Available: http://arxiv.org/abs/1511.06581