Professional Documents
Culture Documents
Joint Sensing and Communications For Deep Reinforcement Learning-Based Beam Management in 6G
Joint Sensing and Communications For Deep Reinforcement Learning-Based Beam Management in 6G
1
Accepted by 2022 IEEE Globecom conference, ©2022 IEEE
2
Accepted by 2022 IEEE Globecom conference, ©2022 IEEE
color model, each pixel is defined by three values representing where dist(pm , pn ) is a distance measure between pm and
red, green, and blue ranging from 0 to 255. pn . The uncertain distance is the expected value of the
The satellite images used in the localization process are distance measurement between two uncertain objects.
RGB images with the same resolution of 256 × 256, and Then, the medoid of a cluster is computed as the data point
each image is paired with a location sample. Therefore, we in the cluster with the least average dissimilarity to all other
count the number of pixels that represent one specific object points in the cluster. The update method of a cluster center is:
to compute the proportion of this object in the image. Since
there are some common objects in the satellite images such
X
cj = arg min 0
δ(p, p0 ) (2)
as grass and buildings. We can use this pixel characteristic- p∈Cj p ∈Cj
based feature to characterize one image. In this work, we In this paper, we assume the uncertainty region of all
selected three objects. As shown in Fig. 3, the three selected locations is specified as a uniform circle with center p and
objects are grass, gray building, and road. Then we counted the radius R. Then the PDF for a uniformly distributed circle is
total number of pixels representing three objects respectively shown as below in the polar coordinate system:
as three new features, which will be used to improve the r
localization accuracy. f (r, θ) = (3)
πR2
3
Accepted by 2022 IEEE Globecom conference, ©2022 IEEE
where r ∈ [0, R] and π ∈ [0, 2π]. into account. URLLC users need minimal latency and high
In this way, equation (1) can be rewritten as: reliability, whereas eMBB users require a higher data rate. As
Z R Z 2π Z R Z 2π such, the reward function has to consider the requirements of
δ(pm , pn ) = dist(pm , pn ) different types of users. The reward function is given by:
0 0 0 0 (4) (
2 2 s
f (r, θ) r dr1 dθ1 dr2 dθ2 , sigm( sTi,b
ar ), eMBB user
r= si,b τ T ar (5)
which is the expansion of the previous equation (1) from sigm( sT ar τ q ), URLLC user
the rectangular coordinate system to the polar coordinate
system. In our work, we use Euclidean distance as our distance where si,b is the SINR of the transmission link between ith
measurement. RBG to a user in the bth beam, sT ar is the target SINR
The UK-medoids algorithm is summarized by Algorithm 1. of eMBB users, τ T ar is the target latency of URLLC users,
First, all the uncertain distances between every two objects and τ q is the queuing delay. Here we use a target SINR to
are computed. Note that these distances only need to be represent the high-reliability requirements of URLLC users.
computed once during the whole algorithm, which will reduce sigm denotes the sigmoid function, which will normalize the
the computation complexity. Then, the initial cluster centers reward:
1
are chosen, and it comes to the main loop. The main loop of sigm(x) = (6)
1 + e−x
UK-medoids contains two phases: the first phase is to allocate
objects to a cluster, the second phase is to recompute cluster Finally, we deploy Long Short-Term Memory (LSTM)
centers according to equation (2). The loop continues until network for Q-value prediction in DRL. As a special recurrent
there is no change in the cluster centers. neural network, the LSTM network can better capture the long-
term data dependency, which makes it an ideal application for
Algorithm 1 UK-Medoids complicated wireless environment.
Input: A set of uncertain location P = {p1 , ..., pM }.
Output: A set of clusters C = {C1 , ..., CN }. VI. S IMULATION
1: Computing uncertain distance δ(pm , pn ), ∀pm , pn ∈ P A. Simulation Settings
2: Selecting initial cluster centers c = {c1 , ..., cN }.
1) Localization settings and dataset: We divide the dataset
3: while c 6= c0 do
into 4 regions and each region has its own FFNN to predict the
4: c0 ← c
UE locations. The FFNN architecture is 128×64×32×16×2.
5: c←∅
Sigmoid activation functions are employed for hidden layers,
6: C = {C1 , ..., Cn } ← {∅, ..., ∅}.
whereas linear activation functions are used for output layers.
7: for Each object p ∈ P do
2) Resource Allocation Settings: We perform the simula-
8: Allocating each object to the nearest cluster center
tion on MATLAB 5G Toolbox. The mmWave network consists
according to the uncertain distance.
of 1 gNB, and the clustering is implemented every 10 TTIs
9: end for
based on the UE trajectories. The traffic of users follows
10: for Each cluster Cj ∈ C do
Poisson distribution and the packet size is 32 bytes. Three
11: Recomputing the cluster center cj .
beams with 60◦ angle are predefined. The total simulation
12: c ← c ∪ cj
time is 1400 TTIs, and each TTI is 1/7 ms. The simulation is
13: end for
repeated 5 times, and the confidence interval is 95%.
14: end while
For UEs’ positions, the location provided by the dataset is
considered as exact location, while the localization results are
B. Deep Reinforcement Learning considered as location with error for comparison. We include
UK-Medoids will return a set of clusters, and we assume four scenarios:
each cluster will be served by one beam [6]. Then DRL • Scenario 1: We applied K-means for clustering and DRL
is performed for intra-beam radio resource allocation, which for resource allocation. The scenario is summarized by
aims to offer high QoS requirements for both ultra-reliable K-DRL+Location with Error.
low latency communications (URLLC) and enhanced mobile • Scenario 2: Compared with scenario 1, the only difference
broadband (eMBB) users. The Markov decision process of is that we use UK-means for clustering. UK-means is
DRL is defined as follows: another uncertain clustering method that is developed
1) Actions: In each beam, the action is to allocate ith RBG from K-means [18]. It updates centers as the expected
to j th user. value over PDF and the distance between objects are also
2) States: The UE’s channel quality indicator (CQI) feed- the expected distance. The scenario is named by UK-
back is defined as states, which represents the channel condi- DRL+Location with Error.
tion between UEs and the BS. • Scenario 3: We deploy our proposed UK-medoids based
3) Reward: URLLC and eMBB users have different QoS clustering method and DRL for resource allocation. The
requirements, the reward function must take both requirements scenario is named by UKM-DRL+Location with Error.
4
Accepted by 2022 IEEE Globecom conference, ©2022 IEEE
TABLE III
M EASUREMENTS OF L OCALIZATION R ESULTS
5
Accepted by 2022 IEEE Globecom conference, ©2022 IEEE
ACKNOWLEDGMENT
This work is supported by Ontario Centers of Excel-
lence(OCE) 5G ENCQOR program and Ciena, the NSERC
Collaborative Research and Training Experience Program
(CREATE) under Grant 497981, and Canada Research Chairs
Program. We would like to thank Medhat Elsayed and Hind
Mukhtar for their help in the earlier versions.
R EFERENCES
[1] J. Cui, Z. Ding and P. Fan, ”The Application of Machine Learning in
mmWave-NOMA Systems,” in IEEE Vehicular Technology Conference
(VTC Spring), pp. 1-6, June 2018.
(a) Delay comparison when error is 17.11m [2] T. Nishio, Y. Koda, J. Park, M. Bennis and K. Doppler, ”When
Wireless Communications Meet Computer Vision in Beyond 5G,” in
IEEE Communications Standards Magazine, vol. 5, no. 2, pp. 76-83,
June 2021.
[3] S. Nilwong, D. Hossain, S. Kaneko, and G. Capi, ”Deep Learning-Based
Landmark Detection for Mobile Robot Outdoor Localization,” Machines,
Vol.7, no.2, pp.1-14, Apr. 2019.
[4] S. Ramakrishna, and E. Mary, ”A Survey of Clustering Uncertain Data
Based Probability Distribution Similarity”, in International Journal of
Computer Science and Network Security, vol. 14, no. 9, pp. 77-81,
September 2014.
[5] H. Zhou, M. Elsayed, and M. Erol-Kantarci, “RAN Resource Slicing in
5G Using Multi-Agent Correlated Q-Learning,” in Proceedings of 2021
IEEE PIMRC, pp.1-6, September 2021.
[6] M. Elsayed and M. Erol-Kantarci, ”Radio Resource and Beam Manage-
ment in 5G mmWave Using Clustering and Deep Reinforcement Learn-
ing,” in IEEE Global Communications Conference (GLOBECOM), pp.
(b) Delay comparison when error is 3.62m 1-6, December 2020.
Fig. 6. Delay comparison under various traffic loads [7] R. Di Taranto, S. Muppirisetty, R. Raulefs, D. Slock, T. Svensson and H.
Wymeersch, ”Location-Aware Communications for 5G Networks: How
location information can improve scalability, latency, and robustness of
by a higher sum rate. It further demonstrates the capability of 5G,” in IEEE Signal Processing Magazine, vol. 31, no. 6, pp. 102-112,
our proposed vision-aided localization method. November 2014.
Similarly, Fig. 6 presents the delay under different traffic [8] J. Choi, Y. Park, J. Song, and I. Kweon, ”Localization using GPS and
VISION aided INS with an image database and a network of a ground-
loads when localization error is 17.11 meters and 3.62 meters. based reference station in outdoor environments”, in International Jour-
The result is consistent with Fig. 5. The optimal baseline nal of Control, Automation and Systems, May 2011.
presents the lowest latency. With distorted location, UKM- [9] J. Biswas and M. Veloso, ”Depth camera based indoor mobile robot
localization and navigation,” in IEEE International Conference on
DRL has lower latency than UK-DRL. At last, since K-means Robotics and Automation, pp. 1697-1702, May 2012.
has no ability to handle localization error, it has the highest [10] S. Nilwong, D. Hossain, S.-I. Kaneko, and G. Capi, “Deep learn-
latency. When the error is 3.62 meters, the latency of both UK- ing based landmark detection for mobile robot outdoor localization,”
Machines, vol. 7, no. 2, pp. 25-38, 2019. [Online]. Available: https:
DRL and UKM-DRL are very close to the optical baseline. //www.mdpi.com/2075-1702/7/2/25
Compared with the results in Fig. 6(a), the latency of scenarios [11] J. Cui, Z. Ding, P. Fan, and N. Al-Dhahir, “Unsupervised Ma-
1 to 3 are greatly reduced. When the traffic load is 4 Mbps, chine Learning-Based User Clustering in Millimeter-Wave-NOMA Sys-
tems,” IEEE Transactions on Wireless Communications, vol. 17, pp.
UKM-DRL has an improvement of 32.4% over UK-DRL 7425–7440, November 2018.
under 17.11 meters error and 7.7% under 3.62 meters error. [12] B. Kimy et al., ”Non-orthogonal Multiple Access in a Downlink
Multiuser Beamforming System,” in IEEE Military Communications
VII. C ONCLUSION Conference (MILCOM), pp. 1278-1283, November 2013.
[13] D. Marasinghe, N. Jayaweera, N. Rajatheva, and M. Latva-Aho, “Hier-
Machine learning is considered as a promising technique archical User Clustering for mmWave-NOMA Systems,” in 6G Wireless
Summit (6G SUMMIT), pp. 1–5, February 2020.
to solve the pressing challenges in future 6G networks. [14] J. T. H. L. Christiansen, “Mobile communication system measurements
In this paper, we investigate the joint vision-aided sensing and satellite images,” 2019. [Online]. Available: https://dx.doi.org/10.
and communications for beam management of 6G Networks. 21227/1xf4-eg98
[15] H. Mukhtar and M. Erol-Kantarci, ”Satellite Image and Received
We first introduce a novel vision-aided localization method Signal-based Outdoor Localization using Deep Neural Networks,” in
to improve localization accuracy. Then we propose a UK- IEEE Canadian Conference on Electrical and Computer Engineering
medoids based clustering method to handle the localization (CCECE), pp. 1-6, September 2021.
[16] S. Chowdhury, B. Verma, M. Tom and M. Zhang, ”Pixel characteris-
uncertainty, and apply the deep reinforcement learning for tics based feature extraction approach for roadside object detection,“
intra-beam resource allocation. The simulations demonstrate International Joint Conference on Neural Networks, pp. 1-8, July 2015.
that our proposed method can greatly reduce the localization [17] F. Gullo, G. Ponti and A. Tagarelli, “Clustering Uncertain Data Via
K-Medoids”, in Scalable Uncertainty Management, pp. 229–242, 2008.
error, and it achieves lower delay and higher data rate than [18] M. Chau, R. Cheng, B. Kao, and J. Ng, “Uncertain data mining: An
other baseline algorithms. In the future, we will include more example in clustering location data,” in Pacific-Asia conference on
vision-based methods for localization. knowledge discovery and data mining, vol.2, pp. 199-204, April 2006.