Abstract—Drone-cell technology is emerging as a solution to support and back up the cellular network architecture. Drone-cells are flexible and provide a more dynamic solution for resource allocation on both spatial and geographic scales. They allow bandwidth availability to be increased anytime and everywhere according to the continuous rate demands. Their fast deployment provides network operators with a reliable solution to face sudden network overload or peak data demands during mass events, without interrupting services and while guaranteeing better QoS for users. Despite these advantages, drone-cell network management is still a complex task. We propose in this paper a multi-agent reinforcement learning approach for dynamic drone-cell management. Our approach is based on an enhanced joint action selection. Results show that our model speeds up network learning and provides better network performance.

I. INTRODUCTION

The continuous growth of data demand and the increase in wireless traffic rates in new mobile networks call for intelligent and dynamic technologies for telecommunication management. Recent studies predict that new-generation cellular standards (like 5G) will rely much more heavily on dense and less power-consuming networks to serve dynamically requested data rates to the user [1]. Small cells mounted on unmanned aerial vehicles (UAVs) or drones (we call them drone-cells hereafter) are proposed, as an alternative to fixed femto-cells, to support the existing macro-cell infrastructure. The deployment of these mobile small cells consists in moving them toward target positions (usually within the range of the macro-cell to support) based on the decision made by a mobile network operator. Since drone operation costs money, the drone-cells' movement toward adequate positions must be correlated with the amount of data requests, i.e., drone-cells should support overloaded cells (Figure 1). Hence, an intelligent entity should be added to the network that instantly monitors the network state and finds the optimal decision to control drone-cells. Several studies have discussed the theoretical ability of cellular networks to use drone-cells to support their existing macro-cells [2], [3], but the readiness of those networks to integrate such dynamic entities has not been discussed. For example, drone-cells require coordinated insertion into the network infrastructure while serving subscribers and smooth de-association at the end of their activity (low battery, low data demand, ...). This requires efficient network configuration and management flexibility; the networks need self-organizing capabilities. Massive amounts of information about user behavior and network load must be continuously collected and analyzed by intelligent algorithms. This should trigger the usage of UAVs in the field.

On the other hand, the promising development of artificial intelligence techniques and machine learning tools may speed up the development of cellular networks and lead to dynamic and self-adaptive tools that help network operators better manage their infrastructure and integrate new techniques. Machine learning techniques have the advantage of exploiting the plethora of data generated by cellular networks to improve these dynamic deployment techniques [4].

In this work, we propose a dynamic reinforcement learning solution for drone-cell networks that exploits real traces of demand profiles and adapts the deployment of drone-cells in real time according to these demands. We choose an efficient machine learning paradigm instead of classical linear programming models. Our solution is based on a multi-agent reinforcement learning (MARL) approach which is applied in two steps: an off-line exploration step and an on-line one. The off-line step consists in a learning phase where the drone-cells of the multi-agent topology learn, in a cooperative reinforcement learning context, how to react and move according to different scenarios of bandwidth demands. Real traces of network time-series load are used for this phase. In the on-line phase, the pre-trained agents demonstrate how they adapt instantly to a real load-peak scenario signaled by the STAD framework. The cooperative reinforcement learning is based on a joint action selection that aims to choose the optimal set of agent actions. We propose a joint action selection algorithm and compare it against the classical grid selection algorithm.

The paper is organized as follows. In Section 2 we present related work, while in Section 3 we give an overview of the multi-agent reinforcement learning approach. Section 4 develops our MARL-based model. Section 5 presents simulation use cases and results, and we conclude in Section 6.

II. RELATED WORK

Smart deployment of drone-cells has been a topic of recent research; examples of such works include [5], where the authors use binary integer linear programming (BILP) to selectively place drone-cells in networks and integrate them into an existing cellular network infrastructure when temporal increases in user demand occur. They design the air-to-ground channel for drone-cells as the combination of two
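The off-line learning phase outlined in the introduction can be sketched as independent per-agent Q-learning over replayed demand traces. This is an illustrative sketch only: the class and parameter values (`ALPHA`, `GAMMA`, `EPSILON`) are assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict

# Assumed hyperparameters for the sketch, not values from the paper.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

class DroneAgent:
    """One drone-cell agent holding its own Q-table (hypothetical sketch)."""

    def __init__(self, actions):
        self.actions = actions          # e.g. moves toward candidate cells
        self.q = defaultdict(float)     # Q[(state, action)] -> learned value

    def choose(self, state):
        # epsilon-greedy action choice during the off-line exploration step
        if random.random() < EPSILON:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # standard one-step Q-learning update
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + GAMMA * best_next
        self.q[(state, action)] += ALPHA * (td_target - self.q[(state, action)])
```

During the on-line phase, the same agents would simply keep calling `choose` on the observed network state, with exploration reduced or disabled.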
Algorithm 1 Centralized Joint Action Selection
1: Input: (Action, State) space, Q-tables, ServedCellList
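A minimal illustrative sketch of such a centralized selector, with assumed data structures (the function name, the dict layout of the Q-tables, and the greedy tie-breaking are our assumptions, not the paper's exact procedure): each agent in turn takes its highest-valued action whose target cell is not already covered, mirroring the ServedCellList input above.

```python
def joint_action_selection(agents, state, q_tables, served_cells):
    """Greedy centralized joint action selection (illustrative sketch).

    agents:       list of agent identifiers
    q_tables:     q_tables[agent][(state, action)] -> learned Q-value
    served_cells: cells already covered (the ServedCellList input)
    Returns one action per agent; an action is the target cell to serve.
    """
    joint_action = {}
    covered = set(served_cells)
    for agent in agents:
        # rank this agent's actions for the current state by Q-value
        candidates = sorted(
            ((q, a) for (s, a), q in q_tables[agent].items() if s == state),
            reverse=True,
        )
        # take the best action that does not duplicate coverage
        for q, cell in candidates:
            if cell not in covered:
                joint_action[agent] = cell
                covered.add(cell)
                break
    return joint_action
```

Pruning already-covered cells is what keeps this selector cheaper than exhaustively scoring every combination of agent actions.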
both models. We can notice that after finishing the learning period (after 300 steps) the network service ratio converges to 1: all cells are fully served. We also notice that 8 drones are sufficient for our eHCS-based model to serve the network, whereas the concurrent model based on HCS performs worse than our model. We notice that in this scenario, and after the exploration phase, the HCS model fails to converge to 1: the network service ratio converges to 0.9 and 0.6 at 6pm and 7pm respectively. Hence, the HCS-based model performs 10% lower than our model at 6pm and 40% lower at 7pm, when demand is much higher. So, we can say that our model performs much better than the other model during peak hours. Moreover, the exploration period was not sufficient for the HCS-based model to achieve optimality, and we conclude that our model is faster with 8 drones for the period between 6pm and 7pm. Figures 6-10 show simulation and comparison results using 14 drones. We notice that for all time periods, the network service ratio converges to 1 after the learning period for our model (the learning is very fast in this situation), whereas the HCS-based model succeeds in converging to 1 only for three time periods, at 6pm, 7pm and 11pm, while its network service ratio stays lower than 1 for the rest of the periods. Furthermore, for the scenarios at 6pm, 7pm and 11pm, even though the HCS-based model converges to 1 after the exploration period, it is slower than our model.

Figure 5: Network service ratio evolution as a function of execution steps using 8 drones (MARL using e-HCS). (a) Network QoS evolution at 6pm; (b) network QoS evolution at 7pm.

Figure 6: Network QoS evolution as a function of execution steps with 14 drones at 6pm.

Figure 7: Network QoS evolution as a function of execution steps with 14 drones at 8pm.

Globally, we notice that the HCS model fails to reach an optimal state in several periods: the exploration period was not sufficient to learn and to find the optimal joint actions within these periods. The other model also does not manage battery in a performant way: most of the drone batteries are depleted (drones need one time slot to recharge). That is why the other model reaches optimality at 11pm again (after recharge). Hence, in order to satisfy the global demand during the exploration period with the same period size, the HCS-based model needs more drones to serve all demands. MARL learns these constraints and succeeds in serving the network efficiently. With this configuration, our enhanced model needs only 11 drones in total to cover the whole area during the football match event. These 11 drones alternate to serve the requested amount of data according to their battery life level.

Figure 11 depicts the deployment of drone-cells using our enhanced model at 9pm, a time slot corresponding to heavy demand. The colors represent the demand intensity at each cell. We can notice that 9 drones out of 14 are deployed at cell locations that guarantee serving the total required bandwidth. Note that at this time 4 drones are out of charge while 1 drone is not serving at all.
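The network service ratio plotted in these figures can be computed as the fraction of requested bandwidth that is actually served. The helper below is a sketch under that assumption (the paper's exact definition may differ), with per-cell served bandwidth capped at the cell's demand:

```python
def network_service_ratio(demand, served):
    """Fraction of requested bandwidth that is served, in [0, 1].

    demand: dict cell -> requested bandwidth
    served: dict cell -> bandwidth actually provided
    A cell cannot contribute more than its own demand.
    """
    total_demand = sum(demand.values())
    if total_demand == 0:
        return 1.0  # nothing requested: the network is trivially served
    total_served = sum(min(served.get(cell, 0.0), need)
                       for cell, need in demand.items())
    return total_served / total_demand
```

Under this definition, a ratio of 0.9 or 0.6 corresponds directly to the 10% and 40% shortfalls reported for the HCS-based model at 6pm and 7pm.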
Figure 8: Network QoS evolution as a function of execution steps with 14 drones at 9pm.

Figure 9: Network QoS evolution as a function of execution steps with 14 drones at 10pm.

Figure 10: Network QoS evolution as a function of execution steps with 14 drones at 11pm.

Figure 11: Illustration of drone-cells deployment at 9pm.

VI. CONCLUSION

Drone-cell management is still complex and needs advanced and intelligent algorithms. Even with their fast deployment, drone-cell networks suffer from coordination issues. On the other hand, battery technology is still limited and, since its energy is split between navigation and antenna beaming/transmission, energy presents a major constraint for drone-cell deployment. We proposed in this paper a multi-agent reinforcement learning approach for this dynamic network deployment. We also proposed an enhanced joint action selection algorithm to alleviate the coordination complexity between drone-cell agents and also to speed up the search phase for the optimal joint action. Our model takes into consideration the battery life constraints while aiming to maximize the network service ratio. Our solution is validated with real network traces and we provide a benchmarking analysis. Our model, based on the enhanced joint action selection that we propose, is compared with a model based on a hill climbing search algorithm. Results show that our model outperforms the second model not only when the rate demand is lower but especially at peak time service.

Our model hence offers a better solution for network operators to dynamically manage their network and to efficiently provide a better QoS for users during mass events. A future perspective will introduce the deep aspect that can add classification and memory of earlier situations to the multi-agent system.

REFERENCES

[1] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang, "What will 5G be?" IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065-1082, 2014.
[2] A. Alsharoa, H. Ghazzai, A. Kadri, and A. E. Kamal, "Energy management in cellular hetnets assisted by solar powered drone small cells," in Wireless Communications and Networking Conference (WCNC). IEEE, 2017, pp. 1-6.
[3] E. Kalantari, H. Yanikomeroglu, and A. Yongacoglu, "On the number and 3D placement of drone base stations in wireless cellular networks," in Vehicular Technology Conference (VTC-Fall). IEEE, 2016, pp. 1-6.
[4] S. E. Hammami, H. Afifi, M. Marot, and V. Gauthier, "Network planning tool based on network classification and load prediction," in Wireless Communications and Networking Conference. IEEE, 2016, pp. 1-6.
[5] E. Kalantari, M. Z. Shakir, H. Yanikomeroglu, and A. Yongacoglu, "Backhaul-aware robust 3D drone placement in 5G+ wireless networks," arXiv preprint arXiv:1702.08395, 2017.
[6] S. Park, H. Kim, K. Kim, and H. Kim, "Drone formation algorithm on 3D space for a drone-based network infrastructure," in Personal, Indoor, and Mobile Radio Communications (PIMRC), 2016 IEEE 27th Annual International Symposium on. IEEE, 2016, pp. 1-6.
[7] M. Jun and R. D'Andrea, "Path planning for unmanned aerial vehicles in uncertain and adversarial environments," in Cooperative Control: Models, Applications and Algorithms. Springer, 2003, pp. 95-110.
[8] A. Soua and H. Afifi, "Adaptive data collection protocol using reinforcement learning for VANETs," in 9th International Wireless Communications and Mobile Computing Conference. IEEE, 2013, pp. 1040-1045.
[9] B. Nour, K. Sharif, F. Li, H. Moungla, and Y. Liu, "M2HAV: A standardized ICN naming scheme for wireless devices in internet of things," in Data Driven Control and Learning Systems (DDCLS), 2017 6th. Springer, 2017, pp. 727-732.
[10] S. Proper and P. Tadepalli, "Scaling model-based average-reward reinforcement learning for product delivery," in European Conference on Machine Learning. Springer, 2006, pp. 735-742.
[11] C. Cheng, Z. Zhu, B. Xin, and C. Chen, "A multi-agent reinforcement learning algorithm based on Stackelberg game," in International Conference on Wireless Algorithms, Systems, and Applications. IEEE, 2017, pp. 289-301.