
ML-based User-Base Station Association for high mobility scenarios

Submitted in partial fulfillment of the requirements of the degree of Bachelor of Technology

By

Manda Abhinay Reddy - 204242


Pammi Vijay Hemanth - 204255
Suhel Faraz Siddiqui - 204268

Supervisor

Dr. Chayan Bhar


Assistant Professor
Department of Electronics and Communication Engineering
Approval Sheet
This project entitled

“ML-based User-Base Station Association for high mobility scenarios”

Done by
Manda Abhinay Reddy - 204242
Pammi Vijay Hemanth - 204255
Suhel Faraz Siddiqui - 204268

is approved for the degree of Bachelor of Technology.


Examiners

Supervisor
Dr. Chayan Bhar
Assistant Professor
Department of Electronics and Communication Engineering

Head of the Department


Prof. D. Vakula
Department of Electronics and Communication Engineering
Date: 31-10-2023
Place: Warangal

Declaration
I declare that this written submission represents my ideas in my own words
and where others' ideas or words have been included, I have adequately cited
and referenced the original sources. I also declare that I have adhered to all
principles of academic honesty and integrity and have not misrepresented,
fabricated or falsified any idea/data/fact/source in my submission. I
understand that any violation of the above will be cause for disciplinary action
by the Institute and can also evoke penal action from the sources that have
thus not been properly cited or from whom proper permission has not been
taken when needed.

Manda Abhinay Reddy - 204242

Pammi Vijay Hemanth - 204255

Suhel Faraz Siddiqui - 204268

Date: 30-10-2023

Certificate

This is to certify that the dissertation work entitled “ML BASED USER-BASE
STATION ASSOCIATION FOR HIGH MOBILITY SCENARIOS” is a bona fide
record of work carried out by

Manda Abhinay Reddy - 204242
Pammi Vijay Hemanth - 204255
Suhel Faraz Siddiqui - 204268

submitted to the faculty of “Department of Electronics and Communication
Engineering”, in partial fulfillment of the requirements for the award of the
degree of Bachelor of Technology in “Electronics and Communication
Engineering” at National Institute of Technology, Warangal during the
academic year 2023-2024.

Prof. D. Vakula
Head of the Department
Department of Electronics and Communication Engineering
NIT Warangal

Dr. Chayan Bhar
Assistant Professor
Department of Electronics and Communication Engineering
NIT Warangal

Contents

Approval Sheet
Declaration
Certificate
List of Acronyms

1. Introduction
1.1 Problem Statement
1.2 Theory
1.3 Applications, Merits, and Demerits
2. Literature Survey
2.1 Human Mobility
2.2 Poisson Point Process (PPP)
2.3 Reinforcement Learning
2.3.1 Proximal Policy Optimization
2.4 Path loss, Shadowing and Fading
3. Workflow
4. Work Done and Results
4.1 3D Contour
4.2 Poisson Point Process (PPP)
4.3 Base Stations
4.4 RL Agent

References

List of Acronyms

1. ML - Machine Learning
2. DL - Deep Learning
3. RL - Reinforcement Learning
4. BS - Base Station
5. UE-BS - User-Base Station
6. UE - User Equipment
7. PPP - Poisson Point Process
8. PPO - Proximal Policy Optimization
9. UAV - Unmanned Aerial Vehicle
10. dB - Decibels
11. QoS - Quality of Service
12. MIMO - Multiple-Input Multiple-Output
13. GPU - Graphics Processing Unit
14. CTRW - Continuous-Time Random-Walk
15. MSD - Mean Square Displacement
16. PMF - Probability Mass Function
17. NB - Negative Binomial

Chapter 1

Introduction

In high-mobility scenarios, such as a vehicular network or a fast-moving
device, traditional methods of user-base station (BS) association may not be
suitable because network conditions change rapidly. This leads to several
mobility management issues that require optimization techniques to avoid
performance degradation.

Machine learning (ML) concerns the development of algorithms that
converge to an optimal solution and improve system performance without
human intervention. By embracing ML, wireless networks can achieve
dynamic and adaptive UE-BS associations, ultimately improving network
performance and enhancing the user experience in high-mobility scenarios.
In particular, this work focuses on mitigating mobility-related concerns and
networking issues at different mobility levels by employing diverse heuristic
as well as reinforcement learning (RL) methods.

The applications are numerous; a few of them are enhanced connectivity,
latency reduction, resource management, Quality of Service (QoS)
improvement, handover optimization, 5G and beyond, and energy efficiency.

1.1 Problem Statement
● User and Base Station distribution is modeled as a Poisson point
process (PPP), where they are distributed as individual points.
● Dynamic and immersive 3D contour-based environment for
simulating realistic scenarios.
● Integrating real-world factors like shadowing, signal fading, and
path loss to determine the optimal Base Station association.
● A model of human mobility guides the actions of the users.

1.2 Theory

1.2.1 Machine learning

Machine learning is a subset of artificial intelligence (AI) that focuses on
the development of algorithms and models that enable computer systems to
learn and make predictions or decisions based on data. It is a field of study
that has seen significant advancements and applications in various domains.
Machine learning algorithms are mathematical models that are trained on
data. They can be categorized into three main types: supervised learning,
unsupervised learning, and reinforcement learning.

(i) Supervised learning: Algorithms learn from labeled data, making
predictions or decisions based on known outcomes. Common algorithms
include linear regression, decision trees, and neural networks.

Figure 1.1(a),(b)

(ii) Unsupervised learning: Algorithms work with unlabeled data,
identifying patterns, clusters, or structures within the data. Examples
include k-means clustering and principal component analysis (PCA).

Figure 1.2

(iii) Reinforcement learning: These algorithms learn by interacting with an
environment and receiving feedback in the form of rewards. The goal of
reinforcement learning is to train an agent to complete a task within an
uncertain environment. At each time interval, the agent receives
observations and a reward from the environment and sends an action to the
environment.

Figure 1.3

1.2.2 Users and Base Stations

Base stations are a critical component of wireless communication
networks, and their design and deployment can vary based on the network's
requirements, technology (e.g., 2G, 3G, 4G, 5G), and environmental
conditions. They provide the necessary infrastructure for mobile and
wireless communication, ensuring that users can connect and communicate
within the network's coverage area. A base station is generally a transceiver,
capable of sending and receiving wireless signals. Base stations in cellular
telephone networks are more commonly referred to as cell towers. Each
cellphone connects to the cell tower, which in turn connects it to the wired
public switched telephone network (PSTN), the internet, or other cellphones
within the cell. The size of the base station depends on the size of the area
covered, the number of clients supported, and the local geography.

1.3 Applications, Merits, and Demerits

● ML algorithms can dynamically associate users with the most appropriate
base stations as they move, ensuring seamless and high-quality connectivity
in fast-paced environments. This can be used in ultra-dense wireless
networks, Unmanned Aerial Vehicle (UAV) networks, heterogeneous
networks (HetNets), and massive MIMO systems.
● ML models can predict user movements and help reduce latency by
preemptively connecting them to the best-suited base station.
● This project uses realistic scenarios. In mobile scenarios, ML can
optimize the power consumption of base stations by turning them off or
adjusting their transmit power when they are not needed because users have
moved away.
● Demerits:
○ ML models require continuous training and updates to adapt to
changing network conditions and user behaviors, which can be
resource-intensive and time-consuming.
○ Implementing ML in network infrastructure can be costly and may
require GPUs and cloud storage.

Chapter 2

Literature Survey

2.1 Human Mobility

Understanding human mobility patterns is a fundamental aspect of urban
planning, transportation management, and the optimization of various
services, from healthcare to marketing. This section reviews the key concepts
related to human mobility patterns: the significance of analyzing mobility
data, the methodologies and data sources commonly used, and the
implications for both public and private sectors. Human mobility patterns
have gained increased attention in recent years, driven by advances in
technology and data analysis. This knowledge can lead to more efficient and
sustainable urban environments, improved public services, and enhanced
user experiences in a variety of applications. As shown in [1], individual
human trajectories are characterized by fat-tailed distributions of jump sizes
and waiting times, suggesting the relevance of continuous-time random-walk
(CTRW) models for human mobility.

In the mobile phone data studied in [1], the user’s location was recorded
with a resolution determined by the local tower density. The reception area
of a tower varies from as little as a few hundred meters in metropolitan areas
to a few kilometers in rural regions, which controls the uncertainty about the
user’s precise location.

Δr denotes the distance covered by an individual between consecutive
sightings and Δt is the time spent by an individual at the same location. Both
follow fat-tailed distributions:

P(Δr) ∼ |Δr|^(−1−α) and P(Δt) ∼ |Δt|^(−1−β), with 0 < α ≤ 2 and 0 < β ≤ 1

Three empirical observations indicate that human trajectories follow
reproducible scaling laws, but also illustrate the shortcomings of the CTRW
model in capturing the observed scaling properties.

● The number of distinct locations S(t) visited by a randomly moving
object is expected to follow

S(t) ∼ t^µ

where µ = 1 for Lévy flights and µ = β for CTRW.

● Visitation frequency: the probability f of a user visiting a given location
is expected to be asymptotically (t → ∞) uniform everywhere (f ∼ const.)
for both Lévy flights and CTRWs. In contrast, the visitation patterns of
humans are rather uneven, so that the frequency f of the kth most visited
location follows Zipf’s law,

f_k ∼ k^(−ζ), with ζ ≈ 1.2 ± 0.1

This suggests that the visitation frequency distribution follows

P(f) ∼ f^(−(1+1/ζ))

● Ultraslow diffusion: the CTRW model predicts that the mean square
displacement (MSD) asymptotically follows

⟨Δx²(t)⟩ ∼ t^ν

with ν = 2β/α ≈ 3.1. As both P(Δr) and P(Δt) have cutoffs, asymptotically
the MSD should converge to a Brownian behavior with ν = 1. However, this
convergence is too slow to be relevant in the observational time frame. Either
way, CTRW predicts that the longer we follow a human trajectory, the
further it will drift from its initial position. Yet humans have a tendency to
return home on a daily basis, suggesting that simple diffusive processes,
which are not recurrent in two dimensions, do not offer a suitable
description of human mobility. Measurements indicate an ultraslow diffusive
process, in which the MSD seems to follow a slower than logarithmic growth.

Figure 2.1

After a waiting time is chosen from the P(Δt) distribution, the individual
will change his/her location. We assume that the individual has two choices.
(i) Exploration: with probability

Pnew = ρ S^(−γ)

(where ρ and γ are model parameters) the individual moves to a new location
(different from the S locations he/she visited before). The distance Δr that
he/she covers during this exploratory jump is chosen from the P(Δr)
distribution and his/her direction is selected to be random. As the individual
moves to this new position, the number of previously visited locations
increases from S to S + 1.

(ii) Preferential return: with the complementary probability Pret = 1 − Pnew
the individual returns to one of the S previously visited locations. In this case,
the probability 𝚷i to visit location i is chosen to be proportional to the
number of visits the user previously had to that location. That is, we assume
that 𝚷i = fi.

Figure 2.2 Schematic description of the individual-mobility model.
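
The exploration and preferential-return rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this project: the function name `simulate_epr` is hypothetical, the default values of `rho`, `gamma` and `alpha` are illustrative assumptions, jump lengths are drawn from a Pareto distribution as one possible fat-tailed choice for P(Δr), and waiting times are ignored so that each step is one move.

```python
import numpy as np

def simulate_epr(n_steps, rho=0.6, gamma=0.21, alpha=0.55, seed=0):
    """Sketch of exploration and preferential return on a plane.

    Returns the visited locations (S x 2 array) and the visit count of each.
    All parameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    locations = [np.zeros(2)]  # start at a single "home" location
    visits = [1]
    current = 0
    for _ in range(n_steps):
        S = len(locations)
        p_new = rho * S ** (-gamma)  # exploration probability Pnew
        if rng.random() < p_new:
            # Exploration: fat-tailed jump length with P(r) ~ r^(-1-alpha),
            # uniformly random direction
            jump = rng.pareto(alpha) + 1.0
            angle = rng.uniform(0.0, 2.0 * np.pi)
            offset = jump * np.array([np.cos(angle), np.sin(angle)])
            locations.append(locations[current] + offset)
            visits.append(1)
            current = S
        else:
            # Preferential return: pick a known location with probability
            # proportional to its past visit count
            probs = np.array(visits) / sum(visits)
            current = int(rng.choice(S, p=probs))
            visits[current] += 1
    return np.array(locations), np.array(visits)

locs, visits = simulate_epr(1000)
print("distinct locations:", len(locs), "most visited count:", visits.max())
```

Because returns favour locations that were already visited often, the visit counts become strongly uneven, which is the qualitative behaviour the Zipf-like visitation law describes.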

2.2 Poisson Point Process (PPP)

As discussed in [3], cellular systems are becoming more heterogeneous with
the introduction of low-power nodes including femtocells, relays, and
distributed antennas. Unfortunately, the resulting interference environment
is also becoming more complicated, making evaluation of different
communication strategies challenging in both analysis and simulation.
Leveraging recent applications of stochastic geometry to analyze cellular
systems, the authors of [3] propose to analyze downlink performance in a
fixed-size cell, which is inscribed within a weighted Voronoi cell in a Poisson
field of interferers. A nearest out-of-cell interferer, out-of-cell interferers
outside a guard region, and cross-tier interferers are included in the
interference calculations.
By bounding the interference power as a function of distance from the cell
center, the total interference is characterized through its Laplace transform.
Simulations show that the proposed model provides a flexible way to
characterize outage probability and rate as a function of the distance to the
cell edge.
PPPs have been used to model co-channel interference from macro
cellular base stations, cross-tier interference from femtocells, co-channel
interference in ad hoc networks, and as a generic source of interference.
Modeling co-channel interference from other base stations is a good starting
point for developing insights into heterogeneous network interference. PPPs
are used to model various components of a telecommunications network,
including subscriber locations, base station locations, and network
infrastructure, leveraging results on Voronoi tessellation. A cellular system
with a PPP distribution of base stations, called a shotgun cellular system, is
shown to lower bound a hexagonal cellular system in terms of certain
performance metrics and to be a good approximation in the presence of
shadow fading.

Figure 2.3

(a) Common fixed geometry model with hexagonal cells and multiple tiers of interference. (b)
Stochastic geometric model where all base stations are distributed according to some 2D
random process. (c) A proposed hybrid approach where there is a fixed cell of a fixed size
surrounded by base stations distributed according to some 2D random process, with an
exclusion region around the cell and a dominant interferer at the boundary of the guard region.

As in [4], the number of users per cell is an important quantity in
calculating various performance metrics in a cellular network. For a
non-shadowing environment, assuming a homogeneous Poisson point
process for base station locations, the distribution of the number of users per
cell has been derived using approximations for the distribution of the cell
area. This approach does not extend to path-loss models that include
shadowing, where users do not necessarily connect to the spatially nearest
base station. The purpose of that study is to find a good model for the
distribution of the number of users per cell in a shadowing environment.

Figure 2.4 One realization of the BS and UE locations

In order to mitigate frequently selecting different BS in different time slots,
we assume that the UE calculates average power over a certain number of
time slots, L
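
The effect of averaging over L slots can be illustrated with a short sketch. It is not taken from the cited work: the function name `select_bs_by_average_power` is hypothetical, the input is assumed to be received power in dBm per slot and per BS, and the averaging is assumed to be done in linear scale (mW).

```python
import numpy as np

def select_bs_by_average_power(power_dbm, L):
    """Pick the BS with the highest average received power over the last L slots.

    power_dbm: array of shape (num_slots, num_bs), received power in dBm.
    Averaging in linear scale is an assumption of this sketch.
    """
    recent = np.asarray(power_dbm, dtype=float)[-L:]
    linear_mw = 10.0 ** (recent / 10.0)  # dBm -> mW
    return int(np.argmax(linear_mw.mean(axis=0)))

# BS 0 was much stronger three slots ago, BS 1 is stronger in the latest slots
history = [[-70.0, -90.0],
           [-95.0, -85.0],
           [-95.0, -85.0]]
print(select_bs_by_average_power(history, L=1))  # decision on the latest slot only
print(select_bs_by_average_power(history, L=3))  # decision on the 3-slot average
```

A larger L smooths out short-lived power fluctuations and so reduces how often the selected BS changes, at the cost of reacting more slowly to genuine changes.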

We can write the distribution of number of users per cell, denoted as Dist(n),
as

NB is negative binomial

Note that the probability mass function (PMF) of a negative binomial (NB)
random variable T is

Figure 2.5 Negative binomial fit and simulation for s = 0 (no shadowing), s = 5, and s = 10

2.3 Reinforcement Learning

As in [6] and [7], appropriate exploitation of heuristic information through
reinforcement learning for user association in ultra-dense dynamic vehicular
environments can significantly reduce the handover rate whilst delivering a
guaranteed network quality of service. Reinforcement learning (RL) is
foreseen to adapt to the spatial-temporal irregularities of urban traffic flow
in ultra-dense small cell scenarios, enabling improvement in network
reliability, quality, and latency by utilizing the heuristic information. The
network adaptability and the improvement obtained with the RL approaches
are evaluated through the reduction in the number of handovers per
transmission while delivering a guaranteed Quality of Service (QoS) at
different vehicle speeds.

In RL, an agent interacts with an environment and learns to take actions
to maximize a cumulative reward over time. It draws inspiration from
behavioral psychology and is often used in applications where an agent must
learn to make a sequence of decisions, such as robotics, gaming,
recommendation systems, and autonomous vehicles. The agent is the learner
or decision-maker that interacts with the environment. It makes decisions
based on a policy to achieve a certain goal. A reward (R) is a scalar value that
the environment provides to the agent after each action. It quantifies how
good or bad the action was in the given state. The agent's objective is to
maximize the cumulative reward over time.

Unmanned Aerial Vehicle (UAV) networks are an emerging technology,
useful not only for the military, but also for public and civil purposes. Their
versatility provides advantages in situations where an existing network
cannot support all requirements of its users, either because of an
exceptionally large number of users, or because of the failure of one or more
ground base stations. Networks of UAVs can reinforce these cellular
networks where needed, redirecting the traffic to available ground stations.
In [5], machine learning algorithms are used to predict overloaded traffic
areas, and a UAV positioning algorithm determines suitable positions for the
UAVs, with the objective of a more balanced redistribution of traffic, to
avoid saturated base stations and decrease the number of users without a
connection. The tests performed with real data of user connections through
base stations show that, in less restrictive network conditions, the algorithm
that dynamically places the UAVs performs significantly better than in more
restrictive conditions, significantly reducing the number of users without a
connection.

2.3.1 Proximal Policy Optimization(PPO) algorithm

The Proximal Policy Optimization (PPO) algorithm is a popular
reinforcement learning algorithm used to train agents to perform well in
complex environments and sequential decision-making tasks. It is known for
its stability and sample efficiency, making it a go-to choice for various
applications. PPO is an on-policy algorithm, which means it learns from data
collected by the current policy and actively collects new data for training.
The paper [9] seeks to improve the current state of affairs by introducing an
algorithm that attains the data efficiency and reliable performance of trust
region policy optimization (TRPO), while using only first-order
optimization. It proposes a novel objective with clipped probability ratios,
which forms a pessimistic estimate (i.e., lower bound) of the performance of
the policy. To optimize policies, the algorithm alternates between sampling
data from the policy and performing several epochs of optimization on the
sampled data.
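The main objective proposed in this paper is

L^CLIP(θ) = Ê_t[ min( r_t(θ) Â_t, clip(r_t(θ), 1 − ϵ, 1 + ϵ) Â_t ) ]

where r_t(θ) = π_θ(a_t | s_t) / π_θold(a_t | s_t) is the probability ratio, Â_t is
the advantage estimate at time step t, and ϵ is a hyperparameter.

Figure 2.6

The combined objective is

L_t^(CLIP+VF+S)(θ) = Ê_t[ L_t^CLIP(θ) − c1 L_t^VF(θ) + c2 S[π_θ](s_t) ]

where c1, c2 are coefficients, S denotes an entropy bonus, and L_t^VF is a squared-error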
loss.
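
To make the effect of clipped probability ratios concrete, the following sketch evaluates the clipped surrogate on a batch of samples with NumPy. The function name `ppo_clip_objective` is an assumption of this sketch, and the ratios and advantages are made-up numbers rather than results from this project.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Mean clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the minimum makes the estimate pessimistic (a lower bound)
    return float(np.mean(np.minimum(unclipped, clipped)))

# Positive advantage: gains from pushing the ratio above 1 + eps are cut off
print(ppo_clip_objective([1.5], [1.0]))
# Negative advantage: moving the ratio the wrong way is still fully penalized
print(ppo_clip_objective([1.5], [-1.0]))
```

The asymmetry is the point of the design: the policy gets no extra credit for moving far from the old policy in a helpful direction, but is not shielded from the cost of moving in a harmful one.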

2.4 Path loss, Shadowing, and Fading

The wireless channel is susceptible to noise, interference, and other channel
impediments, and these impediments change over time in unpredictable ways
as a result of user movement and environment dynamics.

● Path loss: Path loss refers to the reduction in signal strength as a radio
wave travels through the air or any other medium. It is a consequence of the
signal spreading out as it propagates. In free space, the received power is
inversely proportional to the square of the distance between the transmitter
and receiver, so the free-space path loss grows with the square of the
distance:

PL = 20 log10(4πd / λ)

PL is the path loss in decibels (dB).
d is the distance between the transmitter and receiver.
λ is the wavelength of the signal.

● Shadowing: Shadowing is the random variation in signal strength due
to obstructions and obstacles in the propagation path. These
obstructions cause fluctuations in the received signal strength, and
these fluctuations are often modeled as a log-normal distribution.
Shadowing can be caused by buildings, trees, terrain, and other physical
obstacles that partially block or reflect radio waves. Even small changes
in the environment can lead to significant variations in received signal
strength. The received signal power with the combined effect of path
loss (power falloff model) and shadowing is, in dB, given by

Empirical measurements support the log-normal distribution for ψ:

● Fading: Fading refers to the rapid fluctuations in signal strength over
short time scales due to the superposition of multiple reflected and
refracted paths between the transmitter and receiver. It is primarily caused
by multipath propagation, where signals take different paths and arrive
at the receiver with different phases, leading to constructive (signal
reinforcement) or destructive (signal cancellation) interference. There are
two main types of fading:
○ Small-Scale Fading: Occurs over a short distance and time scale
and is responsible for the fast variations in signal strength.
○ Large-Scale Fading: Occurs over a larger distance and time scale
and is often influenced by path loss and shadowing.
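
The three effects above can be combined into a simple received-power model. The sketch below is illustrative only: it uses free-space path loss, zero-mean Gaussian shadowing in dB (log-normal in linear scale), and Rayleigh fading (exponentially distributed power gain). The function names, the 8 dB shadowing standard deviation, and the choice of Rayleigh fading are assumptions, not values taken from this project.

```python
import numpy as np

def free_space_path_loss_db(d, wavelength):
    """Free-space path loss in dB for distance d and wavelength (same units)."""
    return 20.0 * np.log10(4.0 * np.pi * np.asarray(d, dtype=float) / wavelength)

def received_power_dbm(tx_power_dbm, d, wavelength, shadow_sigma_db=8.0, rng=None):
    """Received power in dBm with path loss, shadowing and fading (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d, dtype=float)
    # Shadowing: Gaussian in dB, i.e. log-normal in linear scale
    shadowing_db = rng.normal(0.0, shadow_sigma_db, size=d.shape)
    # Small-scale fading: Rayleigh amplitude means unit-mean exponential power gain
    fading_db = 10.0 * np.log10(rng.exponential(1.0, size=d.shape))
    return tx_power_dbm - free_space_path_loss_db(d, wavelength) - shadowing_db + fading_db

distances = np.array([50.0, 100.0, 200.0])  # metres
wavelength = 0.125                          # metres, roughly a 2.4 GHz carrier
print(free_space_path_loss_db(distances, wavelength))
print(received_power_dbm(30.0, distances, wavelength, rng=np.random.default_rng(0)))
```

Doubling the distance adds about 6 dB of free-space path loss, while the shadowing and fading terms scatter individual samples around that deterministic trend.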

Chapter 3

Workflow

3D Contour:
A realistic 3D surface is created for the environment to align with
real-world scenarios. A continuous terrain is generated in Python using
Perlin noise, with adjustable parameters like scale, octaves, persistence,
lacunarity, and seed for controlling the appearance of the landscape. The
resulting height map is normalized to fit within a specified height range,
ensuring it can be visualized accurately.

User Association in the Environment:


This Python code simulates and visualizes the spatial distribution of
users within a predefined area using a Poisson point process. It begins by
setting user density parameters and defining the size of the spatial area and
individual cells. The expected number of users in the entire space is
calculated based on the user density. Next, a Poisson point process is used to
generate random user counts for each cell in the grid. The code then proceeds
to assign actual user coordinates within each cell based on these counts.

User Mobility:
User mobility is random, and modeling it is an active research area. In
the Python code we use the approximations from a specific research paper,
“Modelling the scaling properties of human mobility” [1], which gives a
close approximation of human mobility. Parameters such as the waiting
time, the step size, and the probabilities of exploring a new location and of
returning to a previously visited one are taken from this paper.

Base station Association & UAV:
Python code is written so that the base stations are distributed among
the users using a Poisson Point Process. There can also be mobile base
stations (i.e. UAVs) whose positions change dynamically over time in the
environment. The Python code should return the optimum base
station for a user by considering all the parameters.
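
One simple baseline for choosing a base station is to pick, for each user, the BS with the highest received power. The sketch below illustrates this under stated assumptions: the function name `associate_users`, the 30 dBm transmit power, the path-loss exponent of 3.5, and the optional per-link shadowing matrix are all hypothetical choices, not this project's implementation. Mobile base stations are handled simply by calling the function again with updated BS positions.

```python
import numpy as np

def associate_users(user_xy, bs_xy, tx_power_dbm=30.0, path_loss_exponent=3.5,
                    shadowing_db=None):
    """Return the index of the strongest BS for each user and the power matrix.

    user_xy: (num_users, 2), bs_xy: (num_bs, 2), positions in metres.
    shadowing_db: optional (num_users, num_bs) attenuation in dB.
    """
    user_xy = np.asarray(user_xy, dtype=float)
    bs_xy = np.asarray(bs_xy, dtype=float)
    # Pairwise user-to-BS distances, floored at 1 m to keep the log finite
    dist = np.linalg.norm(user_xy[:, None, :] - bs_xy[None, :, :], axis=2)
    dist = np.maximum(dist, 1.0)
    rx_dbm = tx_power_dbm - 10.0 * path_loss_exponent * np.log10(dist)
    if shadowing_db is not None:
        rx_dbm = rx_dbm - np.asarray(shadowing_db, dtype=float)
    return np.argmax(rx_dbm, axis=1), rx_dbm

users = [[0.0, 0.0], [100.0, 0.0]]
stations = [[10.0, 0.0], [90.0, 0.0]]
best, _ = associate_users(users, stations)
print(best)  # each user picks its nearest BS when there is no shadowing

# Heavy shadowing on the link between user 0 and BS 0 changes the decision
best_shadowed, _ = associate_users(users, stations, shadowing_db=[[40.0, 0.0], [0.0, 0.0]])
print(best_shadowed)
```

The second call shows why nearest-BS association is not sufficient once shadowing is included: the strongest BS is no longer necessarily the closest one.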

Path Loss, Shadowing & Fading effects:


The wireless channel is susceptible to noise, interference, and other
channel impediments, and these impediments change over time in
unpredictable ways as a result of user movement and environment dynamics.
The training of the environment takes into consideration various factors,
including path loss, shadowing, and fading effects, to ensure that the output
closely resembles real-world conditions.

RL Agent:
The goal of reinforcement learning is to train an agent to complete a
task within an uncertain environment. At each time interval, the agent
receives observations and a reward from the environment and sends an action
to the environment. We train the agent in this environment using the
Proximal Policy Optimization algorithm to find the best base station for the
user.

Chapter 4

Work Done & Results

4.1 3D Contour
To create a more realistic scenario we used a noise function in Python.
The code generates synthetic terrain using Perlin noise and visualizes it from
different perspectives.

● Initialize the parameters for the environment (width = 1 km and
height = 1 km) and the block size.
environment_width = 1000 # 1 km in meters
environment_height = 1000 # 1 km in meters
# Define the size of each block
block_size = 1 # 1 m x 1 m

● Create a mesh grid of X and Y coordinates for the terrain


● Perlin noise is used to generate a continuous terrain, with parameters
like scale, octaves, persistence, lacunarity, and seed. Perlin noise is a
gradient noise function used for creating natural-looking patterns.
import numpy as np
import noise  # "noise" package: snoise2 is its 2D simplex variant, pnoise2 the classic Perlin one
height_map = np.zeros((num_blocks_height, num_blocks_width))
for i in range(num_blocks_height):
    for j in range(num_blocks_width):
        # Sample fractal noise at the scaled grid coordinates
        height_map[i][j] = noise.snoise2(X[i][j] / scale, Y[i][j] / scale,
                                         octaves=octaves, persistence=persistence, lacunarity=lacunarity,
                                         repeatx=1024, repeaty=1024, base=seed)

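● The height map is then normalized to a specific height range.
min_height = np.min(height_map)  # target minimum height; set this to the desired value
max_height = np.max(height_map)  # target maximum height; set this to the desired value
# Rescale linearly to [min_height, max_height]; with the defaults above the map is unchanged
raw_min, raw_max = np.min(height_map), np.max(height_map)
height_map = min_height + (max_height - min_height) * (height_map - raw_min) / (raw_max - raw_min)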

● Parameters:
○ Scale: The scale parameter determines the "zoom" or granularity
of the noise. A larger scale value results in a smoother, broader
pattern, while a smaller scale value produces finer, more detailed
noise. Adjusting the scale can change the overall appearance of
the generated terrain.

○ Octaves: Octaves control the level of detail in the noise. Each
additional octave adds more fine-grained detail to the noise.
Higher octave values lead to a more complex and detailed
pattern. However, too many octaves can lead to noisy or overly
complex results, so it's often useful to strike a balance based on
the desired level of detail.

○ Persistence: Persistence controls the contribution of each octave
to the final noise pattern. A value between 0 and 1 determines
how much each successive octave influences the noise. A higher
persistence emphasizes the details in the noise, while a lower
persistence makes the noise smoother by reducing the influence
of high-frequency details.

○ Lacunarity: Lacunarity is another parameter that influences the
detail in the noise. It controls the frequency of detail changes in
the noise. Higher lacunarity values increase the spatial frequency
of features, resulting in a more "clumpy" or irregular pattern.
Lower lacunarity values create a smoother and more
homogeneous pattern.

○ Seed: The seed is a random number that initializes the noise
generation process. Using a different seed value will result in a
different random pattern. This is useful for generating varied
noise patterns while keeping the other parameters constant.
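
How octaves, persistence, and lacunarity interact can be seen in a small self-contained sketch. It deliberately does not use the `noise` package and is not Perlin noise: it sums layers of simpler value noise (random values on a coarse lattice, smoothly interpolated), which is enough to show the role of each parameter. The function names and default values are assumptions of this sketch.

```python
import numpy as np

def value_noise_layer(size, frequency, rng):
    """One smooth noise layer: random lattice values, bilinearly interpolated."""
    n = int(frequency) + 2                       # lattice points per axis
    lattice = rng.random((n, n))
    coords = np.linspace(0.0, frequency, size, endpoint=False)
    i0 = coords.astype(int)                      # lower lattice index
    t = coords - i0
    t = t * t * (3.0 - 2.0 * t)                  # smoothstep easing
    v00 = lattice[np.ix_(i0, i0)]
    v01 = lattice[np.ix_(i0, i0 + 1)]
    v10 = lattice[np.ix_(i0 + 1, i0)]
    v11 = lattice[np.ix_(i0 + 1, i0 + 1)]
    tx, ty = t[None, :], t[:, None]
    top = v00 * (1.0 - tx) + v01 * tx
    bottom = v10 * (1.0 - tx) + v11 * tx
    return top * (1.0 - ty) + bottom * ty

def fractal_terrain(size=64, base_frequency=4, octaves=4, persistence=0.5,
                    lacunarity=2.0, seed=0):
    """Sum several noise layers and normalize the result to [0, 1]."""
    rng = np.random.default_rng(seed)
    height = np.zeros((size, size))
    amplitude, frequency = 1.0, float(base_frequency)
    for _ in range(octaves):
        height += amplitude * value_noise_layer(size, frequency, rng)
        amplitude *= persistence   # each octave contributes less
        frequency *= lacunarity    # each octave has finer features
    height -= height.min()
    return height / height.max()

terrain = fractal_terrain()
print(terrain.shape, terrain.min(), terrain.max())
```

With `persistence=0.5` and `lacunarity=2.0`, every additional octave has half the amplitude and twice the spatial frequency of the previous one, so later octaves add fine detail without changing the broad shape of the terrain.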

4.2 Poisson Point Process (PPP)

The users and Base Stations are distributed on the contour using the
Poisson Point Process. The code generates random user locations within a
grid of cells and visualizes the user distribution.
● Calculate the expected number of users in the entire space based on the
user density. Generate a Poisson point process to determine the
number of users in each cell of the grid.
import numpy as np
# lambda_value is the mean number of users per cell
user_counts = np.random.poisson(lambda_value, (space_size // cell_size, space_size // cell_size))
● Find the total number of users.
val = int(user_counts.sum())  # sum of the per-cell counts
print("Total Users Count from poisson distribution : ", val)

● Initialize an empty list to store user coordinates.
user_coordinates = []

● Assign user locations within each cell based on the Poisson-distributed
user counts.
# Assign user locations within each cell based on user counts
# (x and y are assumed to hold the cell-centre coordinates along each axis)
for i in range(space_size // cell_size):
    for j in range(space_size // cell_size):
        cell_x = x[i]
        cell_y = y[j]
        num_users_in_cell = user_counts[i][j]
        # Generate random user locations within the cell
        user_x = np.random.uniform(cell_x - cell_size / 2, cell_x + cell_size / 2, num_users_in_cell)
        user_y = np.random.uniform(cell_y - cell_size / 2, cell_y + cell_size / 2, num_users_in_cell)
        # Append user coordinates to the list
        user_coordinates.extend(list(zip(user_x, user_y)))
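
For a homogeneous PPP, the same distribution can also be sampled without a grid: draw the total number of points from a Poisson distribution with mean equal to intensity times area, then place the points uniformly in the region. The sketch below shows this alternative; the function name `sample_ppp` and the example intensity are assumptions, not part of the project code.

```python
import numpy as np

def sample_ppp(intensity, width, height, rng=None):
    """Sample a homogeneous Poisson point process on a width x height rectangle.

    intensity: expected number of points per unit area.
    Returns an (n, 2) array of (x, y) coordinates.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = rng.poisson(intensity * width * height)  # total count is Poisson
    x = rng.uniform(0.0, width, n)               # positions are uniform
    y = rng.uniform(0.0, height, n)
    return np.column_stack((x, y))

# Example: about 1000 users expected on a 1 km x 1 km area
users = sample_ppp(1e-3, 1000.0, 1000.0, rng=np.random.default_rng(0))
print(users.shape)
```

The per-cell approach and this one are statistically equivalent for a constant density, because counts of a PPP in disjoint cells are independent Poisson variables; the grid version becomes useful when the density should vary from cell to cell.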

users_count:
4.3 Base Stations

Base Stations are also distributed using the Poisson Point Process. Here
we included a few mobile Base Stations.

● The code defines a class for creating and manipulating a graph in
three-dimensional space. This graph has nodes, edges with weights, and
probabilities associated with those edges. The probabilities are updated
using linear regression.
● The graph evolves over multiple epochs, dynamically updating edge
probabilities based on factors such as existing edge weights, distances
between nodes, and predefined regression coefficients.
● New edge probabilities are calculated from the existing edge weights,
current edge probabilities, and node distances.

● The edge weights and edge rewards are initially assigned at random. We
train the reward of each edge by updating its value according to three
parameters: the current edge reward, the Euclidean distance between the
nodes, and the edge weight.

● After training the edge rewards, we randomly add a new node to the
graph, together with new edges between the new node and the original
nodes. The weights of the new edges are also assigned randomly. Training
then continues to find the best path from source to destination,
considering all the newly added paths.

● For each episode we add a random node to the graph and train for 10
epochs.

4.4 RL Agent

PPO is a policy gradient algorithm. There are two main variants of
PPO (PPO-Penalty and PPO-Clip), and we use PPO-Clip for training in the
environment. It usually outperforms the penalty-based variant and is simpler
to implement. Rather than bothering with changing penalties over time, we
simply restrict the range within which the policy can change, by clipping the
probability ratio between the new and old policies to [1 − ϵ, 1 + ϵ].

Empirically, ϵ=0.1 and ϵ=0.2 are values that work well.

The project utilizes the Proximal Policy Optimization (PPO) algorithm from
the Stable Baselines3 library to train an agent to make power allocation
decisions.
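
The agent-environment loop that such training relies on can be sketched without any RL library. The class below is a hypothetical toy, not the environment used in this project: one user moves randomly among fixed base stations, the observation is the distance to each BS, the action is the index of the chosen BS, and the reward is the negative path loss minus an assumed handover penalty. It only mirrors the reset/step pattern; Stable Baselines3 itself expects a Gymnasium-compatible environment, which this toy class is not.

```python
import numpy as np

class AssociationEnv:
    """Hypothetical toy environment for user-BS association (sketch only)."""

    def __init__(self, bs_xy, area=1000.0, handover_penalty=1.0, seed=0):
        self.bs_xy = np.asarray(bs_xy, dtype=float)
        self.area = area
        self.handover_penalty = handover_penalty
        self.rng = np.random.default_rng(seed)
        self.user_xy = None
        self.serving = None

    def _observe(self):
        # Observation: distance from the user to every base station
        return np.linalg.norm(self.bs_xy - self.user_xy, axis=1)

    def reset(self):
        self.user_xy = self.rng.uniform(0.0, self.area, 2)
        self.serving = None
        return self._observe()

    def step(self, action):
        dist = np.maximum(self._observe(), 1.0)
        reward = -20.0 * np.log10(dist[action])  # lower path loss, higher reward
        if self.serving is not None and action != self.serving:
            reward -= self.handover_penalty      # discourage frequent handovers
        self.serving = action
        # Simple random movement, kept inside the area
        self.user_xy = np.clip(self.user_xy + self.rng.normal(0.0, 10.0, 2),
                               0.0, self.area)
        return self._observe(), float(reward)

env = AssociationEnv(bs_xy=[[0.0, 0.0], [1000.0, 1000.0], [500.0, 500.0]])
obs = env.reset()
total = 0.0
for _ in range(100):
    action = int(np.argmin(obs))  # greedy baseline: always pick the nearest BS
    obs, reward = env.step(action)
    total += reward
print("return of the nearest-BS baseline:", total)
```

A learned policy is worth the effort only if it beats simple baselines like this nearest-BS rule, for example by avoiding handovers that the greedy rule triggers near cell boundaries.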

References

1. Song, C., Koren, T., Wang, P. et al. Modelling the scaling properties of
human mobility. Nature Phys 6, 818–823 (2010).
https://doi.org/10.1038/nphys1760
2. González, M., Hidalgo, C. & Barabási, AL. Understanding individual
human mobility patterns. Nature 453, 779–782 (2008).
https://doi.org/10.1038/nature06958
3. R. W. Heath, M. Kountouris and T. Bai, "Modeling Heterogeneous
Network Interference Using Poisson Point Processes," in IEEE
Transactions on Signal Processing, vol. 61, no. 16, pp. 4114-4126,
Aug.15, 2013, doi: 10.1109/TSP.2013.2262679.
4. B. Pilanawithana, S. Atapattu and J. Evans, "Distribution of number of
users per cell in a poisson wireless network with shadowing," 2016
Australian Communications Theory Workshop (AusCTW),
Melbourne, VIC, Australia, 2016, pp. 153-156, doi:
10.1109/AusCTW.2016.7433666.
5. Oliveira, F.; Luís, M.; Sargento, S. Machine Learning for the Dynamic
Positioning of UAVs for Extended Connectivity. Sensors 2021, 21,
4618. https://doi.org/10.3390/s21134618
6. F. Zishen, X. Xianzhong, S. Zhaoyuan and C. Xiping, "Proximal Policy
Optimization Based Continuous Intelligent Power Control in
Cognitive Radio Network," 2020 IEEE 6th International Conference
on Computer and Communications (ICCC), Chengdu, China, 2020,
pp. 820-824, doi: 10.1109/ICCC51575.2020.9345062.
7. Y. Meng, S. Kuppannagari, R. Kannan and V. Prasanna, "PPOAccel: A
High-Throughput Acceleration Framework for Proximal Policy
Optimization," in IEEE Transactions on Parallel and Distributed
Systems, vol. 33, no. 9, pp. 2066-2078, 1 Sept. 2022, doi:
10.1109/TPDS.2021.3134709.
8. S. Sun, T. A. Thomas, T. S. Rappaport, H. Nguyen, I. Z. Kovacs and I.
Rodriguez, "Path Loss, Shadow Fading, and Line-of-Sight Probability
Models for 5G Urban Macro-Cellular Scenarios," 2015 IEEE
Globecom Workshops (GC Wkshps), San Diego, CA, USA, 2015, pp.
1-7, doi: 10.1109/GLOCOMW.2015.7414036.
