
System Model and Formulation

İdil Bensu Çılbır (Student)


March 2022

1 Software Defined Networking


SDN (software-defined networking) is a networking architecture that allows the
control and data planes of a network to be separated. SDN does this by isolating
control plane functions from forwarding devices, such as switches and routers, and
centralizing them on an SDN controller.
The controller in an SDN network is in charge of deciding which actions
should be performed on packets, such as forwarding or dropping, and of installing
these rules in forwarding devices such as switches. The southbound interface
is where the SDN controller connects with the forwarding devices, and the
communication protocol utilized there is the OpenFlow protocol. The simple architecture
of SDN can be seen in Fig. 1.

Figure 1: Simple SDN architecture
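
To make this match-action idea concrete, here is a minimal, self-contained sketch of an OpenFlow-style table lookup. It is schematic Python, not tied to any real controller framework; the rule fields and port numbers are illustrative assumptions.

```python
# Schematic sketch of a switch flow table; field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class FlowRule:
    match: dict          # e.g. {"ip_dst": "10.0.0.2"}
    action: str          # "forward" or "drop"
    out_port: int = 0    # used when action == "forward"

@dataclass
class Switch:
    flow_table: list = field(default_factory=list)

    def handle_packet(self, pkt: dict):
        # First matching rule wins, as in an OpenFlow table lookup.
        for rule in self.flow_table:
            if all(pkt.get(k) == v for k, v in rule.match.items()):
                return (rule.action, rule.out_port)
        return ("send_to_controller", None)   # table miss -> ask controller

# The controller installs rules on the switch over the southbound interface.
sw = Switch()
sw.flow_table.append(FlowRule({"ip_dst": "10.0.0.2"}, "forward", out_port=3))
print(sw.handle_packet({"ip_dst": "10.0.0.2"}))   # ('forward', 3)
print(sw.handle_packet({"ip_dst": "10.0.0.9"}))   # ('send_to_controller', None)
```

A table miss (no matching rule) is what triggers the switch to consult the controller, which then installs a new rule.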

2 5G Architecture
5G has a complex architecture, and many entities function together to
enable 5G technology. This complex architecture can be seen in Fig. 2, and each
entity has distinct properties.

Figure 2: 5G Service-based architecture

Access control, registration, and mobility management are some of the features
provided by the Access and Mobility Management Function (AMF). The Session
Management Function (SMF) provides session management functions such as
creating, updating, and releasing sessions. The SMF also assists the User Equipment
(UE) with IP address allocation and configures routing decisions at the User
Plane Function (UPF) to facilitate traffic routing to the proper destinations. It
can also support the user plane's quality of service (QoS) requirements. Another
essential feature of the UPF is that it links to the data network; the UPF
that connects to the data network is referred to as the Protocol Data Unit
(PDU) Session Anchor. Unified Data Management (UDM) provides user identification
functionality and creates authentication credentials, whereas the Policy
Control Function (PCF) provides a policy framework to guide network activity.
SMS and subscription management are also supported by UDM. The Authentication
Server Function (AUSF) handles authentication, whereas the Network
Slice Selection Function (NSSF) chooses the network slice that will serve the
UE. The NSSF is usually associated with the AMF. [1]

3 Proposed Architecture
Consider a 5G core architecture based on SDN concepts, as illustrated in Fig.
3. This architecture will serve as the environment of the Reinforcement
Learning model.

Figure 3: SDN-based 5G Network Architecture

All control plane network functions are moved on top of the SDN controller
in this architecture. All of these network functions are implemented as apps,
which connect with the SDN controller via its northbound interface. The data
plane of the 5G network, referred to as the UPF, is implemented as an SDN switch.
The OpenFlow protocol is used by the UPF to connect with the controller on its
southbound interface. The access network remains unchanged; however, each
base station is assumed to be connected to an SDN switch so that the base
stations become OpenFlow compatible. The SDN controller
is in charge of managing the data plane and installing flow rules in data
plane switches, such as the UPF and the base station switches.
The implementation of this architecture is shown in Fig. 4. The way
this architecture works is described as follows (a schematic sketch of the control loop follows the list):

• The SDN controller gets data traffic information and routes data to base stations
according to the routing table, which is formed from the data traffic flows.
• When unusual traffic information arrives at the SDN controller, the
data is rerouted to base stations, and the routing table is updated by
adding the new data traffic flow to the routing table.
• The SDN controller tries to satisfy the URLLC requirements.
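
Here is a minimal sketch of this control loop. The "unusual traffic" heuristic and the compute_path helper are illustrative assumptions, not part of the proposed design.

```python
# A schematic sketch of the controller's routing loop described above.
routing_table = {}   # flow id -> path (list of switch / base station ids)

def is_unusual(flow_stats: dict, threshold: float = 0.8) -> bool:
    # Assumed heuristic: a flow counts as unusual if its load is high.
    return flow_stats["load"] > threshold

def controller_step(flow_id: str, flow_stats: dict, compute_path):
    """One decision step of the SDN controller."""
    if flow_id not in routing_table or is_unusual(flow_stats):
        # Reroute and update the routing table with the new flow entry.
        routing_table[flow_id] = compute_path(flow_id, flow_stats)
    return routing_table[flow_id]

# Example usage with a stub path computation.
path = controller_step("ue7->bs2", {"load": 0.9},
                       compute_path=lambda f, s: ["upf", "sw1", "bs2"])
print(path)   # ['upf', 'sw1', 'bs2']
```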

This architecture is the environment of the Reinforcement Learning model.
The next section will clarify this architecture quantitatively.

4 Mathematical Formulation of 5G Architecture


The 5G core and backhaul network is modeled as a directed graph G(N, L), where N
is the set of forwarding nodes and L is the set of links, with link capacity c(e), e ∈ L.

Figure 4: SDN-based 5G Network Implementation

Among the forwarding nodes, B denotes the set of BSs, N_c denotes the core
network nodes, and D denotes the set of SDN switches. In the network, a flow
is denoted by k, and all the flows constitute a set K. A feasible path p that
serves flow k is a loop-free path with source s(k) and destination t(k), and p
must traverse at least one SDN switch. Thus the set of feasible paths that may
serve flow k is defined as

P_k = \{\, p \mid p = (s(k), \ldots, i, \ldots, t(k)),\ i \in D \,\}

and P_k(i) = {p | i ∈ p, p ∈ P_k} denotes the paths that serve flow k and
traverse SDN switch i. Considering only the single-source scenario, the set of all
feasible paths is defined as

P = \bigcup_{k \in K} P_k

and the set of paths that include edge e is defined as

P_e = \{\, p \in P \mid e \in p \,\}, \quad e \in L

The multi-source scenario can also be expressed with the same mathematical
formulation, which can be easily integrated into traffic engineering.
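
To illustrate the path sets defined above, here is a minimal sketch assuming the networkx library; the toy topology and node names are made up for the example.

```python
# Enumerate the feasible paths P_k: loop-free paths that traverse at least
# one SDN switch. Topology and names are illustrative assumptions.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("s", "d1"), ("d1", "b1"), ("s", "d2"), ("d2", "b1")],
                 capacity=10)
sdn_switches = {"d1", "d2"}          # the set D of SDN switches

def feasible_paths(G, src, dst, switches):
    # Loop-free (simple) paths from src to dst containing some i in D.
    return [p for p in nx.all_simple_paths(G, src, dst)
            if switches.intersection(p)]

print(feasible_paths(G, "s", "b1", sdn_switches))
# [['s', 'd1', 'b1'], ['s', 'd2', 'b1']]
```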
The previous sections described the structure of the environment of the Reinforcement
Learning model both theoretically and mathematically. The next
sections will describe the overall RL model that is aimed to be designed and
what the other objects in the model are going to do. [3]
Before getting into that, the definition and the mathematical formulation of
reinforcement learning are given in the next section.

5 Reinforcement Learning
Reinforcement Learning (RL) is a method for solving problems by means of rewards.
RL is the process through which machines learn to attain a goal through
interactions with their surroundings. Mathematically, RL is a sequential
decision-making and control problem.
In RL, the purpose is to take actions over time in order to maximize the
expected value of the return, that is, to adopt the optimal policy. Return and policy
can be defined as follows. The return G_t is the total discounted reward from
time-step t:

G_t = R_{t+1} + \gamma R_{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

The discounted future rewards can be interpreted as the current value of
the future rewards. The discount factor values the immediate reward above the
delayed reward: γ close to 0 leads to "myopic" evaluation, while γ close to 1 leads to
"far-sighted" evaluation.
A policy π is a distribution over actions given states,

\pi(a \mid s) = P[A_t = a \mid S_t = s]

It is also time-independent. A policy guides the choice of action at a given
state.
There are two value functions, the state-value function and the action-value function.
Both contribute to finding the optimal policy. The state-value function
v_π is the expected return starting from state s and then following policy π:

v_\pi(s) = E_\pi[G_t \mid S_t = s]

The action-value function q_π(s, a) is the expected return starting from state
s, taking action a, and then following policy π:

q_\pi(s, a) = E_\pi[G_t \mid S_t = s, A_t = a]


Expressing q_π(s, a) in terms of v_π(s) in the expression of v_π(s), the Bellman
equation comes out for v_π:

v_\pi(s) = \sum_{a \in A} \pi(a \mid s) \Big( R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \, v_\pi(s') \Big)

One use for the Bellman equation is to compute the value function for a
given policy. [2]
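
As a worked illustration, the following sketch performs iterative policy evaluation with the Bellman equation above on a toy two-state MDP; the transition probabilities and rewards are made-up numbers, not from the paper.

```python
# Iterative policy evaluation: repeatedly apply the Bellman equation until
# the state values converge. The MDP below is a toy assumption.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
pi = np.full((n_states, n_actions), 0.5)          # pi(a|s): uniform policy
P = np.zeros((n_states, n_actions, n_states))     # P[s, a, s']
P[0, 0, 0], P[0, 0, 1] = 0.7, 0.3
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0                                  # state 1 is absorbing
R = np.array([[1.0, 0.0], [0.0, 0.0]])            # R[s, a]

v = np.zeros(n_states)
for _ in range(1000):                             # sweep until convergence
    v_new = np.zeros(n_states)
    for s in range(n_states):
        for a in range(n_actions):
            v_new[s] += pi[s, a] * (R[s, a] + gamma * P[s, a] @ v)
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new
print(v)   # value of each state under the uniform policy
```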
This is the mathematical formulation of RL, as described above. The proposed
RL model for data traffic control in the 5G architecture, aimed at satisfying Ultra-Reliable
and Low-Latency Communications (URLLC) requirements, is clarified
in detail in the next section.

6 System Model
The model is briefly shown in Fig. 5, where every object in the model is
defined. Those objects can be expressed in detail as follows (a minimal environment sketch in code follows the list):

• Environment: The SDN-based 5G architecture, which was clarified in detail in
the previous sections.
• State: The model will get states consisting of the current traffic and the latency
performance. The current traffic can be defined as the data traffic occurring
at the time of data flow in the base stations, and it can be obtained by
collecting the traffic information. For the latency performance, the delay,
which is the result of the data traffic, can be measured.
• Action: Routing the data traffic flows to base stations is the action
in this model. If there are in total N(N − 1) flows in a network with
N nodes, the RL model might require a large action space of size C^K_{N(N−1)}.
If the action space is defined as {0, 1, ..., N(N − 1) − 1}, then the agent
has to sample K different actions at each time step. In other words,
the agent has to single out the critical flows among the data flows.
• Reward: If the total latency satisfies the URLLC requirements, then the
model receives the reward. A QoS-aware reward function would be a
suitable approach, because real-time traffic is highly QoS-sensitive
even though it is inelastic in adapting its packet transmission rate. In particular,
the agent uses reinforcement learning to select the routing path with the
highest QoS-aware reward, taking into account the traffic kinds and the users'
apps.
• Policy: The data traffic flows in the SDN controller are distributed to base stations
via switches. In order to control the data traffic flows so that they satisfy the URLLC
requirements, the best data traffic flows, i.e., the critical flows, can be found
via different actions. The distribution of data traffic flows from the SDN
controller to the base stations is the policy of the model.
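
The Gym-style sketch below shows how these objects could fit together as an environment interface. The state encoding, the random traffic generator, the latency proxy, and the rerouting effect are all placeholder assumptions, not the actual 5G environment.

```python
# A minimal Gym-style interface for the objects listed above; every model
# detail here is a placeholder assumption for illustration.
import numpy as np

class Sdn5GEnv:
    def __init__(self, n_flows: int, k_critical: int):
        self.n_flows, self.k = n_flows, k_critical

    def reset(self) -> np.ndarray:
        # State: traffic matrix (flattened) plus a latency measurement.
        self.traffic = np.random.rand(self.n_flows)
        return np.append(self.traffic, self._latency())

    def step(self, action: np.ndarray):
        # Action: indices of the K flows selected as critical and rerouted.
        assert len(action) == self.k
        self._reroute(action)
        latency_ok = self._latency() < 1.0        # placeholder URLLC check
        reward = 1.0 if latency_ok else 0.0
        return np.append(self.traffic, self._latency()), reward, False, {}

    def _latency(self) -> float:
        return float(self.traffic.max())          # crude proxy for delay

    def _reroute(self, critical):
        self.traffic[critical] *= 0.5             # rerouting relieves load

env = Sdn5GEnv(n_flows=12, k_critical=3)
state = env.reset()
state, r, done, info = env.step(np.array([0, 1, 2]))
```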

The next parts will focus on the aim of this model, and it will be described
mathematically.

6.1 Objective
The objective is to find the best explicit routing ratios for each critical flow so
that the maximum link utilization is minimized. When the link utilization is
minimized, the total latency can be decreased to meet URLLC requirements.

6.2 Mathematical Formulation


Figure 5: System RL model Block Diagram

Given a network G(N, L), the 5G core and backhaul model clarified in Section
4, with the set of traffic demands D_{s,d} for the critical flows ⟨s, d⟩ ∈ f_K and the
background link load l̂_{i,j} contributed by the remaining flows under the default
ECMP routing, the rerouting problem can be formulated as an optimization
that finds viable under-utilized paths for the chosen critical flows.
The formulation goes as follows.

\begin{aligned}
\text{minimize}\quad & U + \epsilon \sum_{\langle i,j \rangle \in L} \ \sum_{\langle s,d \rangle \in f_K} \sigma_{i,j}^{s,d} \\
\text{subject to}\quad & l_{i,j} = \sum_{\langle s,d \rangle \in f_K} \sigma_{i,j}^{s,d} \, D_{s,d} + \hat{l}_{i,j}, && \langle i,j \rangle \in L \\
& l_{i,j} \le c_{i,j} \, U, && \langle i,j \rangle \in L \\
& \sum_{k:\langle k,i \rangle \in L} \sigma_{k,i}^{s,d} \;-\; \sum_{k:\langle i,k \rangle \in L} \sigma_{i,k}^{s,d} =
\begin{cases} -1 & \text{if } i = s \\ 1 & \text{if } i = d \\ 0 & \text{otherwise} \end{cases}
&& i \in N,\ \langle s,d \rangle \in f_K \qquad (1) \\
& 0 \le \sigma_{i,j}^{s,d} \le 1, && \langle s,d \rangle \in f_K,\ \langle i,j \rangle \in L
\end{aligned}

Notations:
• G(N, L): network with nodes N and links L
• c_{i,j}: the capacity of link ⟨i, j⟩, ⟨i, j⟩ ∈ L
• l_{i,j}: the traffic load on link ⟨i, j⟩, ⟨i, j⟩ ∈ L
• l̂_{i,j}: the background load on link ⟨i, j⟩ contributed by the ECMP-routed flows, ⟨i, j⟩ ∈ L
• D_{s,d}: the traffic demand from source s to destination d, s, d ∈ N, s ≠ d
• σ_{i,j}^{s,d}: the percentage of the traffic demand from source s to destination d
routed on link ⟨i, j⟩, s, d ∈ N, s ≠ d, ⟨i, j⟩ ∈ L, ⟨s, d⟩ ∈ f_K
The term ε Σ_{⟨i,j⟩∈L} Σ_{⟨s,d⟩∈f_K} σ_{i,j}^{s,d} in the first equation is required because otherwise
the optimal solution may incorporate unnecessarily long paths as long as they
avoid the most congested link; here ε (ε > 0) is a constant small enough to
ensure that the minimization of U takes precedence. The second equation expresses
the traffic load on link ⟨i, j⟩ as the traffic demands routed by the
explicit routing plus the traffic demands routed by the default ECMP routing.
The third equation is the link capacity utilization constraint. The fourth equation
is the flow conservation constraint for the critical flows.
The optimal explicit routing solution for critical flows can be obtained by
solving the above linear programming (LP) problem. Then the SDN controller
installs and updates flow entries at the switches accordingly.
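
Here is a minimal sketch of this LP on a toy four-node topology, assuming the PuLP library; all capacities, background loads, and demands are illustrative numbers.

```python
# Sketch of the critical-flow rerouting LP above; topology and data are
# made-up assumptions for illustration.
import pulp

nodes = ["s", "a", "b", "d"]
links = [("s","a"), ("s","b"), ("a","d"), ("b","d")]
cap   = {e: 10.0 for e in links}                 # c_{i,j}
bg    = {e: 2.0 for e in links}                  # background ECMP load
flows = {("s","d"): 4.0}                         # critical flows D_{s,d}
eps   = 1e-3

prob = pulp.LpProblem("rerouting", pulp.LpMinimize)
U = pulp.LpVariable("U", lowBound=0)
sig = {(f, e): pulp.LpVariable(f"sig_{fi}_{ei}", 0, 1)
       for fi, f in enumerate(flows)
       for ei, e in enumerate(links)}

prob += U + eps * pulp.lpSum(sig.values())       # objective

for e in links:                                  # load and capacity
    load = bg[e] + pulp.lpSum(D * sig[(f, e)] for f, D in flows.items())
    prob += load <= cap[e] * U

for f in flows:                                  # flow conservation (1)
    s, d = f
    for i in nodes:
        rhs = -1 if i == s else (1 if i == d else 0)
        prob += (pulp.lpSum(sig[(f, e)] for e in links if e[1] == i)
                 - pulp.lpSum(sig[(f, e)] for e in links if e[0] == i)) == rhs

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("U =", pulp.value(U))
print({(f, e): round(v.value(), 3) for (f, e), v in sig.items()})
```

On this toy instance the solver splits the single critical flow evenly across the two disjoint paths, giving U = 0.4.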

Another point that should be discussed is how this system model's formulation
is going to be trained in the RL model. The subsection below discusses this.

6.2.1 RL model
When the system decides the routing of the data traffic, it should also decide
whether the latency and unreliability caused by the data traffic satisfy the URLLC
requirements or not.
The state at a time step can be represented by the set S_t = {S_t^{(u)}, TM_t}, where
S_t^{(u)} collects the URLLC information at step t and TM_t is the traffic matrix at time
step t that contains the traffic demand of each flow. In particular, the
2-dimensional state S_t^{(u)} is

S_t^{(u)} = \{ Q_t, \delta_t \}

where Q_t represents the length of the URLLC queue at step t, while δ_t = d_max^u −
d_t^old represents the difference between the tolerable latency and the latency of
the oldest packet in the queue at step t.
The reward r is defined as 1/U, which is set to reflect the URLLC requirements
after rerouting the critical flows to balance link utilization. The smaller U (i.e., the
greater the reward r), the closer the system is to meeting the URLLC requirements.
In particular, in order to satisfy the URLLC requirements, the two conditions from
[4], written below, should hold.

R_i^{URLLC} \ge \frac{L_i^{packet}}{W \, T_{max}} \left[ F_i - f_{-1}\!\left( p_{max}^{Lat} \, F_i \, e^{F_i} \right) \right] \triangleq R_{i,min}^{URLLC}

p_{i,n}^{outage} = \Pr\left\{ SINR_{i,n} < SINR_{i,n}^{min} \right\} \le p_{max}^{outage}

Notations:
• R_i^{URLLC}: the data rate of each URLLC service of the i-th communication
link
• L_i^{packet}: the mean packet size in bits of the i-th communication link
(packet sizes follow an exponential distribution)
• W: the bandwidth of each channel
• T_max: the maximum tolerable latency threshold
• F_i = λ_i T_max / (1 − e^{λ_i T_max})
• p_max^{Lat}: the maximum tolerable latency-violation probability
• f_{−1}(·): [−e^{−1}, 0) → (−∞, −1]: the lower branch of the Lambert W function,
satisfying f_{−1}(y e^y) = y
• R_{i,min}^{URLLC}: the minimum data rate that ensures the latency constraint
• p_{i,n}^{outage}: the outage probability
• SINR_{i,n}^{min}: the minimum SINR threshold of communication link i on the
n-th subchannel
• p_max^{outage}: the maximum violation probability
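
As a numerical illustration of the first condition, the following sketch evaluates R_{i,min}^{URLLC} using SciPy's lambertw for the lower branch f_{−1}; every parameter value below is a made-up assumption.

```python
# Evaluate the minimum-rate condition with illustrative numbers.
import numpy as np
from scipy.special import lambertw

W_hz  = 180e3        # channel bandwidth W
T_max = 1e-3         # maximum tolerable latency (1 ms)
L_pkt = 256.0        # mean packet size in bits
lam   = 500.0        # assumed packet arrival rate lambda_i (packets/s)
p_lat = 1e-5         # maximum latency-violation probability

F = lam * T_max / (1 - np.exp(lam * T_max))
# lambertw(z, k=-1) is the lower branch; it returns a complex value whose
# imaginary part is zero on the real branch, hence .real.
arg = p_lat * F * np.exp(F)
R_min = L_pkt / (W_hz * T_max) * (F - lambertw(arg, -1).real)
print(f"minimum rate (normalized by bandwidth): {R_min:.3f}")
```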

According to the above two conditions, the reward function of the model
includes the link utilization as well as reliability and latency, and it is expressed as

r = \frac{1}{U} - c_1 X_i^{URLLC}

where

X_i^{URLLC} = \begin{cases} 1, & \text{if the URLLC requirements are not satisfied,} \\ 0, & \text{otherwise} \end{cases} \qquad (2)

The parameter c_1 is a constant that weights the latter term; it is adopted to
balance the utility and the cost.
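
A minimal sketch of this reward, assuming a boolean URLLC check built from the two conditions of the previous subsection; the constant c1 and the example numbers are illustrative.

```python
# Reward r = 1/U - c1 * X, where X flags a URLLC violation.
def urllc_violated(rate: float, rate_min: float,
                   p_outage: float, p_outage_max: float) -> bool:
    # X_i^URLLC = 1 when either URLLC condition fails.
    return rate < rate_min or p_outage > p_outage_max

def reward(max_link_util: float, rate: float, rate_min: float,
           p_outage: float, p_outage_max: float, c1: float = 1.0) -> float:
    x = 1.0 if urllc_violated(rate, rate_min, p_outage, p_outage_max) else 0.0
    return 1.0 / max_link_util - c1 * x

print(reward(0.4, rate=25.0, rate_min=20.3, p_outage=1e-6, p_outage_max=1e-5))
# 2.5: low utilization with satisfied URLLC requirements gives a high reward
```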

7 Conclusion
An introduction to the SDN structure and to 5G is given first, and then the proposed
SDN-based 5G architecture and its implementation are described. After that, this
model is explained mathematically with its formulation. The described architecture
is the environment of the reinforcement learning model. Reinforcement
learning is then briefly clarified together with its formulation. Afterwards, the whole system model
is explained with a block diagram, and the objects of this RL model are defined. Finally,
the system and each object in this RL model are formulated.

References
[1] Abdulaziz Abdulghaffar et al. “Modeling and Evaluation of Software Defined
Networking Based 5G Core Network Architecture”. In: IEEE Access
9 (2021), pp. 10179–10198. doi: 10.1109/ACCESS.2021.3049945.
[2] Fabio Saggese et al. Deep Reinforcement Learning for URLLC data management
on top of scheduled eMBB traffic. 2021. arXiv: 2103.01801 [eess.SP].
[3] Gang Wang et al. “Efficient Traffic Engineering for 5G Core and Backhaul
Networks”. In: Journal of Communications and Networks 19 (Aug. 2016).
doi: 10.1109/JCN.2017.000010.
[4] Helin Yang et al. Deep Reinforcement Learning Based Massive Access Management
for Ultra-Reliable Low-Latency Communications. 2020. arXiv: 2002.08743 [eess.SP].
