
A Tutorial of Ultra-Reliable and Low-Latency Communications in 6G: Integrating Theoretical Knowledge into Deep Learning

Changyang She, Member, IEEE, Chengjian Sun, Student Member, IEEE, Zhouyou Gu, Yonghui Li, Fellow, IEEE, Chenyang Yang, Senior Member, IEEE, H. Vincent Poor, Fellow, IEEE, and Branka Vucetic, Fellow, IEEE

arXiv:2009.06010v1 [eess.SP] 13 Sep 2020

Abstract—As one of the key communication scenarios in the 5th and also the 6th generation (6G) cellular networks, ultra-reliable and low-latency communications (URLLC) will be central for the development of various emerging mission-critical applications. The state-of-the-art mobile communication systems do not fulfill the end-to-end delay and overall reliability requirements of URLLC. A holistic framework that takes into account latency, reliability, availability, scalability, and decision-making under uncertainty is lacking. Driven by recent breakthroughs in deep neural networks, deep learning algorithms have been considered as promising ways of developing enabling technologies for URLLC in future 6G networks. This tutorial illustrates how to integrate theoretical knowledge (models, analysis tools, and optimization frameworks) of wireless communications into different kinds of deep learning algorithms for URLLC. We first introduce the background of URLLC and review promising network architectures and deep learning frameworks in 6G. To better illustrate how to improve learning algorithms with theoretical knowledge, we revisit model-based analysis tools and cross-layer optimization frameworks for URLLC. Following that, we examine the potential of applying supervised/unsupervised deep learning and deep reinforcement learning in URLLC and summarize related open problems. Finally, we provide simulation and experimental results to validate the effectiveness of different learning algorithms and discuss future directions.

Benefits | Methods | Open issues
Ensuring 1 ms delay in RANs | State-of-the-art: 5G NR | No E2E delay guarantee
Ensuring E2E performance | Cross-layer design | High-complexity algorithms
Fast inference for real-time implementation | Deep learning | Long training phase
Improving learning efficiency | Knowledge + deep learning | Model mismatch
Good initial performance & fast adaptation | Fine-tuning in real-world systems | —

Fig. 1. Methods to achieve URLLC [11–13].

Index Terms—Ultra-reliable and low-latency communications, 6G, cross-layer optimization, supervised deep learning, unsupervised deep learning, deep reinforcement learning.

I. INTRODUCTION

As one of the new communication scenarios in the 5th Generation (5G) wireless communications, Ultra-Reliable and Low-Latency Communications (URLLC) are crucial for enabling a wide range of future emerging applications [1], including industry automation, intelligent transportation, telemedicine, Tactile Internet, and Virtual/Augmented Reality (VR/AR) [2–6]. Since these applications lie in the scope of the Industrial Internet-of-Things (IoT) [7], Germany's Industrie 4.0 [8], and Made in China 2025 [9], URLLC has the potential to improve our everyday life and bring huge economic growth in the future.

A. Motivation

According to the requirements in 5G standards [1], to support emerging mission-critical applications, the End-to-End (E2E) delay cannot exceed 1 ms and the packet loss probability should be 10^−5 ∼ 10^−7. Compared with the existing cellular networks, the delay and reliability require significant improvements, by at least two orders of magnitude, for 5G networks. This capability gap cannot be fully resolved by the 5G New Radio (NR), i.e., the physical-layer technology for 5G [10], even though the transmission delay in Radio Access Networks (RANs) achieves the 1 ms target. Transmission delay contributes only a small fraction of the E2E delay; the stochastic delays in upper layers, such as queueing delay, processing delay, and access delay, are the key bottlenecks for achieving URLLC. Beyond-5G systems, or so-called Sixth Generation (6G) systems, should guarantee the E2E delay bound with high reliability. The stringent requirements bring unprecedented research challenges to wireless network design. The open issues and the methods to handle them are illustrated in Fig. 1. In the sequel, we illustrate the motivation of applying these methods in URLLC.

C. She, Z. Gu, Y. Li, and B. Vucetic are with the School of Electrical and Information Engineering, University of Sydney, Sydney, NSW 2006, Australia (e-mail: shechangyang@gmail.com, {zhouyou.gu, yonghui.li, branka.vucetic}@sydney.edu.au).
C. Sun and C. Yang are with the School of Electronics and Information Engineering, Beihang University, Beijing 100191, China (e-mail: {sunchengjian,cyyang}@buaa.edu.cn).
H. V. Poor is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA (e-mail: poor@princeton.edu).
1) Cross-layer Design: Existing approaches to communication system design divide the systems into multiple layers according to the Open Systems Interconnection model [14]. Communication technologies in each layer are developed without considering the impacts of other layers, despite that the interactions across different layers are known to have significant impacts on the E2E delay and reliability. Existing approaches do not reflect such interactions; this leads to suboptimal solutions, and thus we are not yet able to meet the stringent requirements of URLLC. To guarantee the E2E delay and the reliability of the communication system, we need accurate and analytically tractable cross-layer models to reflect the interactions across different layers.

Fig. 2. Delay and reliability requirements of URLLC and eMBB [24, 27, 28]. (Figure: packet loss probability, from 10^−2 down to 10^−5, versus E2E delay, from 1 ms to ~60 s; URLLC and the Tactile Internet sit in the low-delay, low-loss region, while traditional real-time services, eMBB, video-on-demand, and file download tolerate longer delays; the delay axis is annotated with the channel coherence time, shadowing decorrelation, and handover timescales.)
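A cross-layer model ultimately has to close a joint budget: the per-layer delays add up against the 1 ms E2E target, and, for rare events, the per-layer loss probabilities approximately add up against the reliability target. A minimal sketch of such a budget check, with purely illustrative numbers (none of them come from this tutorial):

```python
# Illustrative cross-layer budget check for a URLLC link.
# All numeric values below are hypothetical examples.

# Per-layer delay components (seconds)
delays = {
    "transmission": 0.125e-3,   # one short TTI in 5G NR
    "queueing":     0.4e-3,
    "processing":   0.2e-3,
    "access":       0.2e-3,
}

# Per-layer packet loss contributions (decoding error, queueing-delay
# violation, proactive packet dropping).  For rare events the overall
# loss probability is well approximated by the sum of the components.
losses = {"decoding": 1e-6, "queue_violation": 1e-6, "drop": 1e-6}

e2e_delay = sum(delays.values())
overall_loss = sum(losses.values())

assert e2e_delay <= 1e-3        # 1 ms E2E delay target
assert overall_loss <= 1e-5     # URLLC reliability target
print(f"E2E delay {e2e_delay * 1e3:.3f} ms, overall loss {overall_loss:.1e}")
```

The point of the sketch is that optimizing any single layer in isolation says nothing about whether the joint budget closes, which is exactly why analytically tractable cross-layer models are needed.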

2) Deep Learning: With 5G NR, the radio resources are allocated in each Transmission Time Interval (TTI) with a duration of 0.125 ∼ 1 ms [10]. To implement optimization algorithms in 5G systems, the processing delay should be less than the duration of one TTI. Since cross-layer models are complicated, the related optimization problems are non-convex in general (see some examples on cross-layer optimization in [15–19]). The optimization algorithms in these papers require high computing overheads, and hence can hardly be implemented in 5G systems.

According to some survey papers [20–22], deep learning has shown great potential to address the above issue in beyond-5G/6G networks. The basic idea is to approximate the optimal policy with a deep neural network (DNN). After the training phase, a near-optimal solution of an optimization problem can be obtained from the output of the DNN in each TTI. Essentially, by using deep learning, we are trading the online processing time for the computing resources spent on off-line training.

3) Integrating Knowledge into Learning Algorithms: Although deep learning algorithms have shown great potential, the application of deep learning in URLLC is not straightforward. As shown in [11], deep learning algorithms converge slowly in the training phase and need a large number of training samples to evaluate or improve the E2E delay and reliability. If some knowledge of the environment is available, such as the estimated packet loss probability of a certain decision, the system can exploit the knowledge to improve the learning efficiency [23]. Theoretical knowledge of wireless communications, including models, analysis tools, and optimization frameworks, has been extensively studied in the existing literature [24]. How to exploit it to improve deep learning algorithms for URLLC has drawn significant attention from the academic community, e.g., [21, 25, 26].

4) Fine-tuning in Real-world Systems: Communication environments in wireless networks are non-stationary in general. The theoretical models used in off-line training may not match the non-stationary network. As a result, the DNN trained off-line cannot guarantee the Quality-of-Service (QoS) constraints of URLLC. Such an issue is referred to as the model-mismatch problem in [13]. To handle the model mismatch, wireless networks should be intelligent enough to adjust themselves in dynamic environments, explore unknown optimal policies, and transfer theoretical knowledge to practical networks.

B. Scope of This Paper

This paper provides a review of theoretical knowledge (i.e., analysis tools and optimization frameworks) and deep learning algorithms (i.e., supervised, unsupervised, and reinforcement learning). Then, we illustrate how to combine them to optimize URLLC in a cross-layer manner. The scope of this paper is summarized as follows:

1) In Section II, we introduce the background of URLLC, including research challenges, methodologies, and a road-map toward URLLC.
2) In Section III, we review promising network architectures and deep learning frameworks for URLLC and summarize the design principles of deep learning frameworks in 6G networks.
3) We revisit analysis tools in information theory, queueing theory, and communication theory in Section IV, and show how to apply them in cross-layer optimization for URLLC in Section V.
4) Then, we examine the potential of applying supervised/unsupervised deep learning and Deep Reinforcement Learning (DRL) in URLLC in Sections VI, VII, and VIII, respectively. The related open problems are summarized in each section.
5) In Section IX, we provide some simulation and experimental results to illustrate how to integrate theoretical knowledge into deep learning algorithms for URLLC.
6) Finally, we highlight some future directions in Section X and conclude this paper in Section XI.

II. BACKGROUND OF URLLC

The requirements on E2E delay and overall packet loss probability of URLLC and enhanced Mobile BroadBand (eMBB) services are illustrated in Fig. 2. For eMBB services, communication systems can trade reliability for delay via retransmissions. For URLLC services, the requirements on the E2E delay and reliability are much more stringent than for eMBB services. In addition, wireless channel fading, shadowing, and handovers will have significant impacts on the reliability of URLLC services (see some existing results in [29–31]). Therefore, ensuring the QoS requirements of URLLC is more challenging than for eMBB services.
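The trade described above, solving the optimization off-line and reusing the result through a cheap forward pass in each TTI, can be sketched with a toy NumPy multilayer perceptron. The layer sizes, the state semantics (channel gains and queue lengths of 8 users), and the softmax action head are illustrative assumptions, not details from this tutorial:

```python
import numpy as np

# Toy DNN mapping an observed network state (e.g., channel gains and
# queue lengths of 8 users) to bandwidth shares.  After off-line
# training, only this forward pass needs to run in each TTI.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 8)), np.zeros(16)
W2, b2 = rng.standard_normal((8, 16)), np.zeros(8)

def policy(state):
    """Forward propagation: state -> near-optimal bandwidth shares."""
    h = np.maximum(0.0, W1 @ state + b1)   # ReLU hidden layer
    z = W2 @ h + b2
    e = np.exp(z - z.max())
    return e / e.sum()                     # softmax: shares sum to one

state = rng.standard_normal(8)             # one observed state
action = policy(state)
assert action.shape == (8,) and abs(action.sum() - 1.0) < 1e-9
```

The forward pass is a handful of matrix-vector products, which is why its complexity is far below that of rerunning a non-convex solver within a 0.125 ms TTI; the cost has been shifted to the off-line training phase.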
TABLE I
KPIs AND RESEARCH CHALLENGES OF DIFFERENT URLLC APPLICATIONS [2, 27, 30, 32–40]

Indoor large-scale scenarios
Applications | KPIs (except E2E delay & reliability) | Research Challenges
Factory automation | SE, EE & AoI | Scalability & network congestions
VR/AR applications | SE & throughput | Processing/transmission of 3D videos

Indoor wide-area scenarios
Applications | KPIs (except E2E delay & reliability) | Research Challenges
Tele-surgery | Round-trip delay, throughput & jitter | Propagation delay & high data rate
eHealth monitoring | EE & network availability | Propagation delay & localization

Outdoor large-scale scenarios
Applications | KPIs (except E2E delay & reliability) | Research Challenges
Vehicle safety | AoI, SE, security & network availability | High mobility, scalability & network congestions

Outdoor wide-area scenarios
Applications | KPIs (except E2E delay & reliability) | Research Challenges
Smart grid | SE | Propagation delay & scalability
Tele-robotic control | SE, security, network availability & jitter | Propagation delay & high data rate
UAV control | EE, security, network availability & AoI | Propagation delay & high mobility

A. Other Key Performance Indicators (KPIs) and Challenges

In the existing literature, the application scenarios, KPIs, and research challenges of URLLC have been extensively investigated. In Table I, we classify these applications into four categories according to the communication scenarios: indoor large-scale networks, indoor wide-area networks, outdoor large-scale networks, and outdoor wide-area networks.

1) Indoor large-scale networks: Typical applications in indoor large-scale networks include factory automation and VR/AR applications such as immersive VR games [6, 8, 33]. As the density of devices increases, improving the spectrum efficiency, i.e., the number of services that can be supported with a given total bandwidth, becomes an urgent task. Since the battery capacities of mobile devices are limited, energy efficiency is another KPI in this scenario. In factory automation, a large number of sensors keep updating their status to the controller and actuator [40], where the freshness of information is one of the KPIs, measured by the Age of Information (AoI) [41].

In wireless communications, the numbers of constraints and optimization variables increase with the number of devices. Since most of the optimization algorithms in [42] only work well for small- and medium-scale problems, as the density of devices grows, scalability becomes the most challenging issue in indoor large-scale networks. In addition, stochastic packet arrival processes from a large number of users will lead to strong interference in the air interface. The burstiness and correlation of packet arrivals will result in severe network congestion in queueing systems and computing systems. How to alleviate network congestion in large-scale networks remains an open problem.

2) Indoor wide-area networks: For long-distance applications in the Tactile Internet, like telesurgery and remote training, the system stability and teleoperation quality are very sensitive to the round-trip delay [36, 43]. Thus, the round-trip delay is a KPI in bilateral teleoperation systems [6]. On the other hand, to maintain stability and transparency, the signals from a haptic sensor are sampled, packetized, and transmitted at 1000 Hz or even higher [6]. As a result, the required throughput is very high. Meanwhile, the jitter of the delay is crucial for the stability of control systems, as stated in [32]. If the jitter is larger than the inter-arrival time between two packets (less than 1 ms in telesurgery), the order of the control commands arriving at the receiver can differ from the order of commands generated by the controller. In this case, the control system is unstable.

Due to the long communication distance, the propagation delay in core networks might be much longer than the required E2E delay, and thus will become the bottleneck of URLLC. To handle this issue, prediction and communication co-design is a promising solution, which, however, will introduce extra prediction errors [44]. To guarantee the round-trip delay, overall reliability, and throughput, we need to analyze the fundamental tradeoffs among them and design practical solutions that can approach the performance limit [45].

3) Outdoor large-scale networks: One of the most important applications in outdoor large-scale networks is the vehicle safety application [35]. With the current technologies for autonomous vehicles, no information is shared among vehicles. Thus, vehicles are not aware of potential threats from the blind areas of their image-based sensors and distance sensing devices [46]. URLLC can enhance road safety by sharing street maps and safety messages among vehicles. Like indoor large-scale networks, the spectrum efficiency is one of the major KPIs in outdoor large-scale networks with high user density. To maintain current state information from nearby vehicles, minimizing AoI is helpful for improving road safety in vehicular networks [47]. In addition, outdoor wireless networks are more vulnerable than indoor networks. Potential eavesdroppers can receive signals and crack the
information with high probabilities. Therefore, security should be considered for URLLC in outdoor wireless networks [48].

Current cellular networks can achieve a 95% coverage probability, which is satisfactory for most of the existing services. In URLLC, the required network availability can be up to 99.999% [49]. As a result, we need to improve the network availability by several orders of magnitude. Due to high mobility, service interruptions caused by frequent handovers and network congestion are the bottlenecks for achieving high availability. Besides, existing tools for analyzing availability are only applicable in small-scale networks [50, 51]. How to analyze and improve network availability in large-scale networks remains unclear.

4) Outdoor wide-area networks: Unmanned Aerial Vehicle (UAV) control, telerobotic control, and smart grids are typical applications in outdoor wide-area networks [2, 30, 38]. Establishing secure, reliable, and real-time control and non-payload communication (CNPC) links between UAVs and ground control stations/devices or satellites over a wide area has been considered one of the major goals in future space-air-ground integrated networks [52]. Thus, the KPIs in outdoor wide-area networks include security, network availability, round-trip delay, AoI, and jitter.

In UAV control, there are two ways to maintain CNPC links: ground-to-air communications and satellite communications [53, 54]. Nevertheless, ground-to-air links may not be available in rural areas, where the density of ground control stations is low. If packets are sent via satellites, they suffer from long propagation delays and coding delays. In smart grids, although the E2E delay and overall reliability requirements are less stringent than in factory automation, the long communication distance leads to long propagation delay [2]. As a result, achieving the target KPIs in outdoor wide-area networks is still very challenging with current communication technologies.

B. Tools and Methodologies for URLLC

To achieve the target KPIs, we need to revisit analysis tools and design methodologies in wireless networks.

1) Analysis Tools: To analyze the performance of a wireless network, a variety of theoretical tools have been developed in the existing literature [24, 55]. For example, to reduce transmission delay, several kinds of channel coding schemes with short blocklength have been developed in the existing literature [27]. To obtain the achievable rate in the short blocklength regime, a new approximation was derived by Y. Polyanskiy et al. [56]. It indicates that the decoding error probability does not vanish for any finite Signal-to-Noise Ratio (SNR) when the blocklength is short. Such an approximation can characterize the relationship between data rate and decoding error probability, and is fundamentally different from Shannon's capacity, which was widely used to design wireless networks.

If we can derive closed-form expressions of the KPIs in Table I, then it is possible to predict the performance of a solution or decision. The disadvantage is that theoretical analysis tools are based on some assumptions and simplified models that may not be accurate enough for URLLC applications. Moreover, to analyze the E2E performance in URLLC, the models may be very complicated, and closed-form results may not be available.

Another approach is to evaluate the performance of the solution with simulations or experiments. The advantage is that they do not rely on any theoretical assumptions. On the negative side, the evaluation procedure is time-consuming, especially for URLLC. For example, to validate the packet loss probability (around 10^−5 ∼ 10^−7 in URLLC) of a user, the system needs to send 10^8 ∼ 10^10 packets.

2) Cross-layer Optimization: The E2E delay and the overall packet loss probability depend on the solutions in different layers [14]. To optimize the performance of the whole system, we need cross-layer optimization frameworks, from which it is possible to reveal some new technical issues by analyzing the interactions among different layers. For example, as shown in [57], when the required queueing delay is shorter than the channel coherence time, the power allocation policy is a type of channel-inversion policy, which can be unbounded over typical wireless channels, such as the Nakagami-m fading channel. One way to analyze the reliability of a wireless link is to analyze the outage probability [58]. However, such a performance metric cannot characterize the queueing delay violation probability in the link layer. To keep the decoding error probability and the queueing delay violation probability below the required thresholds, some packets should be dropped proactively when the channel is in deep fading [16]. By cross-layer design, one can take the delay components and packet losses in different layers into account, and hence it is possible to achieve the target E2E delay and overall packet loss probability in URLLC.

Although cross-layer optimization has the potential to achieve better E2E performance than dividing communication systems into separate layers, to implement cross-layer optimization algorithms in practical systems, one should address the following issues:

• High computing overheads: Wireless networks are highly dynamic due to channel fading and traffic load fluctuations. As a result, the system needs to adjust resource allocation frequently according to these time-varying factors. Without a closed-form expression of the optimal policy, the system needs to solve the optimization problem in each TTI. This will bring very high computing overheads. Even if the problem is convex and can be solved by existing search algorithms, such as the interior-point method [42], the algorithms are still too complicated to be implemented in real time.

• Intractable problems: Owing to complicated models from different layers, cross-layer optimization problems are usually non-convex or Non-deterministic Polynomial-time (NP)-hard. Solving NP-hard optimization problems usually takes a long time, and the resulting optimization algorithms can hardly be implemented in real time. Low-complexity numerical algorithms that can be executed in practical systems for URLLC are still missing.

• Model mismatch: Since the available models in different layers are not exactly the same as practical systems, the model mismatch may lead to severe QoS violations in
real-world networks. How to implement the solutions and policies obtained from the analysis and optimization in practical systems remains an open problem.

3) Deep Learning for URLLC: Unlike optimization algorithms, deep learning approaches can be model-based or model-free [26, 59, 60], and have the potential to be implemented in real-world communication systems [61]. The advantages of deep learning algorithms can be summarized as follows.

• Real-time implementation: After off-line training, a DNN serves as a mapping from the observed state to a near-optimal action in communication systems, where the forward propagation algorithm is used to compute the output for a given input. According to the analysis in [13], the computational complexity of the forward propagation algorithm is much lower than that of search algorithms for cross-layer optimization. Therefore, by using a DNN in 5G NR, it is possible to obtain a near-optimal action within each TTI.

• Searching for optimal policies numerically: When optimal solutions are not available, there are no labeled samples for supervised learning. To handle this issue, the authors of [25, 62, 63] proposed to use unsupervised deep learning algorithms to search for the optimal policy. Unlike optimization algorithms that find an optimal solution for a given state of the system, unsupervised deep learning algorithms find the form of the optimal policy numerically. The basic idea is to approximate the optimal policy with a DNN and optimize the parameters of the DNN against a loss function reflecting the design goal.

• Exploring optimal policies in real-world networks: When there is no labeled sample or theoretical model to formulate an optimization problem, we can use DRL to explore optimal policies in the real-world network [60]. For example, an actor-critic algorithm uses two DNNs to approximate the optimal policy and the value function, respectively. From the feedback of the network, the two DNNs are updated until convergence [64].

Although deep learning algorithms have the above advantages, to apply them in URLLC, the following issues remain unclear.

• QoS guarantee: When designing wireless networks for URLLC, the stringent QoS requirements should be satisfied. When using a DNN to approximate the optimal policy, the approximation should be accurate enough to guarantee the QoS constraints. Although the universal approximation theorem of DNNs indicates that a DNN can be arbitrarily accurate when approximating a continuous function [65, 66], how to design structures and hyper-parameters of DNNs to achieve a satisfactory accuracy remains unclear.

• Learning in non-stationary networks: When applying supervised deep learning in URLLC, the DNN is trained off-line with a large number of training samples [12]. When using unsupervised deep learning to find the optimal policy, we need to formulate an optimization problem by assuming the system is stationary [67]. Both approaches can perform very well in stationary networks [12, 67]. However, when the environment changes, the DNN trained off-line can no longer guarantee the QoS constraints of URLLC. To handle this issue, the system needs to adjust the DNN in non-stationary environments with few training samples [13].

• Exploration safety of DRL: To improve the policy in an unknown environment, a DRL algorithm will try some actions randomly to estimate the rewards of different actions [68]. During exploration, the DRL algorithm may try some bad actions, which will deteriorate the QoS significantly and may lead to unexpected accidents in URLLC systems. Thus, exploration safety will become a bottleneck for applying DRL in URLLC.

C. Road-map Toward URLLC in 6G Networks

Based on the KPIs and challenges of different applications in Section II-A and the tools and methodologies in Section II-B, the road-map toward URLLC in 6G networks is summarized in Fig. 3. As indicated in [11], by integrating theoretical knowledge into deep learning algorithms, it is possible to provide in-depth understanding and practical solutions for URLLC.

Step 1 (Knowledge-based analysis and optimization): To satisfy the QoS requirements of URLLC in real-world systems, one should first take advantage of the knowledge in information theory, queueing theory, and communication theory [69]. From theoretical analysis tools and cross-layer optimization frameworks, we can obtain the performance limits on the tradeoffs among different KPIs [16, 56, 70].

Step 2 (Knowledge-assisted training of deep learning): Based on analysis tools and optimization frameworks, one can build a simulation platform to train deep learning algorithms [26, 59]. In supervised deep learning, we can obtain labeled training samples from the simulation platform [12]. In unsupervised deep learning, the knowledge of the communication systems can be used to formulate the loss function [67]. In DRL, by initializing algorithms off-line, the exploration safety and convergence time of DRL algorithms can be improved significantly [71].

Step 3 (Fine-tuning deep learning in real-world networks): From the previous step, DNNs are pre-trained with the help of models, analysis tools, and optimization frameworks that capture some key features of real-world systems. Since the models are not exactly the same as the real-world systems, we need to fine-tune the DNNs in non-stationary networks. As shown in [60], data-driven deep learning updates the initialized DNNs by exploiting real-world data and feedback. After fine-tuning, we can obtain practical solutions from the outputs of the DNNs in real time.

D. Related Works

1) Survey and Tutorial of URLLC: URLLC was included in the specification of 3GPP in 2016. Since then, great efforts have been devoted to this area [14, 72–77]. A review from the information theory perspective was published in 2016 [72], where the analyses of the impacts of short blocklength channel
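The exploration-safety concern can be made concrete with a simple action-masking sketch: before an ε-greedy DRL agent explores, candidate actions whose model-predicted packet loss probability exceeds the URLLC target are removed from the action set. Everything below (the action values, the predicted-loss model, the thresholds) is a hypothetical toy, not an algorithm from this paper:

```python
import random

random.seed(1)

# Toy setting: 4 candidate resource-allocation actions in one state.
q_values = [0.2, 0.5, 0.1, 0.4]             # learned action values
predicted_loss = [2e-6, 3e-4, 1e-6, 8e-6]   # model-based loss estimates
TARGET = 1e-5                               # URLLC reliability target
EPSILON = 0.1                               # exploration rate

def safe_action():
    """Epsilon-greedy action selection restricted to actions that the
    theoretical model predicts to satisfy the QoS target; action 1
    (predicted loss 3e-4) is therefore never tried, even randomly."""
    safe = [a for a in range(len(q_values)) if predicted_loss[a] <= TARGET]
    if random.random() < EPSILON:
        return random.choice(safe)                  # explore, but safely
    return max(safe, key=lambda a: q_values[a])     # exploit

a = safe_action()
assert predicted_loss[a] <= TARGET
```

This illustrates how theoretical knowledge (here, a model-based loss estimate) can bound the damage of random exploration; in a real URLLC system the loss estimates would come from the analysis tools revisited in Section IV.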
Fig. 3. Road map toward URLLC in 6G networks [11, 12]. (Figure: KPIs, application scenarios, and research challenges motivate theoretical knowledge, which drives cross-layer optimization and performance analysis on a numerical platform; the platform supplies labeled training samples, problem formulations, and safe exploration environments for knowledge-assisted supervised deep learning, unsupervised deep learning, and deep reinforcement learning, which are then fine-tuned in real-world URLLC systems.)

codes on the latency and reliability were discussed. With a focus on standards activities, A. Nasrallah et al. [74] provided a comprehensive overview of the IEEE 802.1 Time-Sensitive Networking (TSN) standard and the Internet Engineering Task Force Deterministic Networking standards as the key solutions in the link and network layers, respectively. X. Jiang et al. [14] proposed a holistic analysis and classification of the main design principles and technologies that will enable low-latency wireless networks, where the technologies from different layers and cross-layer design were summarized. With a focus on the physical layer, K. S. Kim introduced solutions for spectrally efficient URLLC techniques [75]. G. Sutton et al. reviewed the technologies for URLLC in the physical layer and the medium access control layer [78]. K. Antonakoglou et al. [73] introduced haptic communication as the third media stream that will complement audio and vision over the Internet. The evaluation methodologies and necessary infrastructures for haptic communications were discussed in [73]. I. Parvez et al. [76] presented a general view on how to meet latency and other 5G requirements with Software-Defined Networking (SDN), network function virtualization, caching, and Mobile Edge Computing (MEC). M. Bennis et al. noticed that the existing design approaches mainly focus on average performance metrics (e.g., average throughput, average delay, and average response time) and hence are not applicable in URLLC [77]. They summarized the tools and methodologies to characterize the tail distribution of the delay, service quality, and network scalability.

2) Survey and Tutorial of Deep Learning in Wireless Networks: Deep learning has been considered one of the key enabling technologies in intelligent 5G [79] and beyond-5G cellular networks [80, 81]. To combine deep learning with wireless networks, several comprehensive surveys and reviews have been carried out [20, 21, 60, 82–86]. J. Wang et al. presented a thorough survey of machine learning algorithms and their use in next-generation wireless networks in [82]. Q. Mao et al. [83] reviewed how to use deep learning in different layers of the OSI model and in cross-layer design. With a focus on the smart radio environment, A. Zappone et al. proposed to integrate deep learning with traditional model-based technologies in future wireless networks [21]. A comprehensive survey of the crossovers between deep learning and wireless networks was presented by C. Zhang et al. in [20], where the authors illustrated how to tailor deep learning to mobile environments. N. C. Luong et al. presented a tutorial on DRL and reviewed the applications of DRL in wireless communications and networking [60]. A tutorial on how to use deep/reinforcement learning for wireless resource allocation in vehicular networks was presented by L. Liang et al. in [84]. Y. Sun et al. summarized some key technologies and open issues of applying machine learning in wireless networks in [85]. J. Park et al. provided a tutorial on how to enable wireless network intelligence at the edge [86].

The above two branches of existing works either investigated model-based approaches for URLLC or reviewed deep learning in wireless networks. None of them discussed how to optimize wireless networks for URLLC by integrating theoretical knowledge of wireless communications into deep learning, which is the central goal of our work. Different from existing papers, we revisit the theoretical models, analysis tools, and optimization frameworks, and review different deep learning algorithms for URLLC. Furthermore, we illustrate how to combine them to achieve URLLC.
[Figure: a three-level architecture. A central server handles network-level tasks (routing, handovers, network slicing) and is reached through core networks with high latency under a (partially) centralized control plane; BSs with caches and edge servers handle BS-level tasks (access control, scheduler design, resource management) and form ultra-reliable and low-latency radio access networks; smart devices with local caches and processing units handle user-level tasks (traffic and mobility prediction, anomaly detection) and support applications such as VR/AR, autonomous vehicles, and factory automation.]

Fig. 4. A general architecture in 6G networks [11, 88, 89].
III. NETWORK ARCHITECTURES AND DEEP LEARNING FRAMEWORKS IN 6G

The application of machine learning in cellular networks has gained tremendous attention in both academia and industry. Recently, how to operate cellular networks with machine learning algorithms has been considered as one of the 3GPP work items [87]. Since computing and storage resources will be deployed at both edge and central servers of 6G networks, 6G will have the capability to train deep learning algorithms [86]. In this section, we review some network architectures and illustrate how to develop deep learning frameworks.

A. Network Architectures

A general network architecture for various application scenarios (such as IoT networks [88] and vehicular networks [89]) in 6G networks is illustrated in Fig. 4. As shown in [11], the communication, computing, and caching resources are integrated into a multi-level system to support tasks at the user level, the Base Station (BS) level, and the network level. Such an architecture can be considered as a ramification of the following architectures.

1) MEC: As a new paradigm of computing, MEC is believed to be a promising architecture for URLLC [90]. Unlike centralized mobile cloud computing with routing delay and propagation delay in backhauls and core networks, the E2E delay in MEC systems consists of Uplink (UL) and Downlink (DL) transmission delays, queueing delays in the buffers of users and BSs, and the processing delay in the MEC [91]. The results in [91] indicate that it is possible to achieve 1 ms E2E delay in MEC systems by optimizing task offloading and resource allocation in RANs.

To analyze the E2E delay and overall packet loss probability of URLLC, Y. Hu et al. considered both the decoding error probability in the short blocklength regime and the queueing delay violation probability [92]. C.-F. Liu et al. minimized the average power consumption of users subject to a constraint on the delay bound violation probability [93]. In multi-cell MEC networks, J. Liu et al. investigated how to improve the tradeoff between latency and reliability [94], where a typical user is considered. To further exploit the computing resources of the central server, a multi-level MEC system was studied in heterogeneous networks, where computing-intensive tasks can be offloaded to the central server [95].

To implement the existing solutions in real-world MEC systems, some problems remain open:
• Optimization complexity: Optimization problems in MEC systems are generally non-convex. To find the optimal solution, the computational complexity of executing search algorithms is too high for real-time implementation.
• Mobility of users: In vehicular networks, users move at high speed. The key issue here is adjusting task offloading during frequent handovers. How to optimize task offloading for high-mobility users remains unclear.
• High overhead for exchanging information: Current task offloading and resource allocation algorithms are executed at the central controller. This leads to high overhead for exchanging information between each BS and the central controller. To avoid this, we need distributed algorithms for URLLC, which are still unavailable in the existing literature.
• Serving hybrid services: VR/AR applications are expected to provide immersive experiences to users [96]. To achieve this goal, the system needs to send 360° video as well as tactile feedback to each user. How to optimize task offloading for both video and tactile services in VR/AR applications needs further research.

2) Multi-Connectivity and Anticipatory Networking: To achieve high network availability, an effective approach is to serve each user with multiple links, e.g., multi-connectivity [97]. Such an approach brings two new research challenges: 1) improving the fundamental tradeoff between spectrum efficiency and network availability [98] and 2) providing seamless service for moving users [99].

As illustrated in [50], one way to improve network availability without sacrificing spectrum efficiency is to serve each user with multiple BSs over the same subchannel (or subcarriers). Such an approach is referred to as intra-frequency multi-connectivity. Its disadvantage is that the failures of different links are highly correlated. For example, if there is strong interference on a subchannel, then the signal-to-interference-plus-noise ratios of all the links are low. To alleviate the cross-correlation among different links, different nodes can connect to one user on different subchannels or even with different communication interfaces [98]. To achieve a better tradeoff between spectrum efficiency and availability, network coding is a viable solution [100]. Most of the existing network coding schemes are too complicated to be implemented in URLLC. To reduce the complexity of signal processing, configurable templates for network coding were developed in [101].

To provide seamless service for moving users, anticipatory networking is a promising solution [102]. The basic idea of anticipatory networking is to predict the mobility of users according to their mobility patterns and to proactively reserve resources before handovers. N. Bui et al. validated that anticipatory networking allows network operators to save half of the resources in Long Term Evolution systems [103]. More recently, K. Chen et al. proposed a proactive network association scheme, where each user can proactively associate with multiple BSs [99]. As proved in [104], the requirements on the queueing delay and the queueing delay violation probability can be satisfied with this scheme.

3) SDN and Network Slicing: SDN architectures have been adopted by the 5G Infrastructure Public Private Partnership (5G-PPP) architecture working group [105]. By splitting the control plane and the user plane, the controller can manage radio resources and data flows in fully centralized, partially centralized, and fully decentralized manners. Essentially, there is a tradeoff between control-plane overheads and user-plane QoS. To achieve a better tradeoff, 5G-PPP structured network functions into the following three parts according to their position in the protocol stack [105]: high layer, medium layer, and low layer. The high-layer control plane manages resource coordination and long-term load balancing for QoS guarantees and network slicing. The medium-layer control plane handles mobility and admission control. The low-layer control plane deals with short-term scheduling and physical-layer resource management.

Since the control plane is deployed at different layers, there is no need to collect all the network state information at the central controller. Network resources (e.g., computing resources [106], storage resources [107], and physical resource blocks [108]) are allocated to different slices according to the slowly varying traffic loads and QoS requirements. To guarantee the QoS of different services, the system needs to map the QoS requirements to network resources [109]. However, the E2E delay and overall packet loss probability depend on physical-layer resource allocation, link-layer transmission protocols and schedulers, as well as network architectures. Thus, how to quantify the network resources required by URLLC remains unclear. Most of the existing papers on network slicing either assumed that the resources required to guarantee the QoS of each service are known at the central controller [107] or considered a simplified model to formulate QoS constraints [110]. To fully address this issue, a cross-layer model for E2E optimization is needed.

B. Deep Learning Frameworks

To cope with the tasks at different levels in Fig. 4, a deep learning framework should exploit computing and storage resources at different entities [11]. A general deep learning framework is illustrated in Fig. 5, which is built upon the following subframeworks.

1) Federated Learning: In the multi-level computing system in Fig. 4, we can hardly obtain well-trained DNNs with local resources and data, since the computing resources and the number of training samples at a user or a BS are not sufficient. Federated learning is capable of exploiting local DNNs trained with local data to learn global DNNs [112]. To obtain a global DNN, the basic idea is to compute the weighted sum of the parameters of the local DNNs. In this way, each user or BS only needs to upload its local or edge DNN to the central server and does not need to share its data. Such a framework can avoid high communication overheads and the security issues caused by collecting private training data from all the users [113].

Federated learning was used to evaluate the tail probability of the delay in URLLC [114], where the basic idea is to collect the estimated parameters from all the users and perform a global average. Such an approach relies heavily on the assumption that the local training data at different users are Independent and Identically Distributed (IID). To implement federated learning with non-IID data sets, solutions have been proposed in two recent papers [115, 116]. However, neither of these works focused on URLLC. How to guarantee the QoS requirements of users with non-IID data deserves further research.

2) Deep Transfer Learning: Transfer learning aims to apply knowledge learned in previous tasks to novel tasks [117]. Recently, transfer learning was combined with DNNs, and the combined approach is referred to as deep transfer learning [118]. Different from training a new DNN from scratch, deep transfer learning only needs a few training samples to update the DNN trained for the previous task. Such a feature brings several advantages in wireless networks:
• Handling the model mismatch problem: In wireless networks, the models for performance analysis and optimization are not exactly the same as real-world systems. To implement a DNN that is trained from theoretical models, we can use deep transfer learning to fine-tune the DNN [12].
• Deep learning in non-stationary networks: In non-stationary wireless networks, the pre-trained DNN needs to be updated according to the non-stationary environments.
[Figure: (a) Sharing models and DNNs from bottom to top. At the user level, traffic and motion models, simulation data, packet arrivals, trajectories, and anomalous events are used to initialize and fine-tune local DNNs for real-time inference; at the BS level, traffic and channel models, theoretical formulas, and measurements from real-world BSs are used to initialize and fine-tune edge DNNs; at the network level, models of the behavior of BSs and users, QoS constraints, network topologies, and user requests feed a central library with global DNNs, where lifelong deep learning transfers knowledge from tasks 1, 2, ..., n to task (n+1). (b) Sharing storage and computing resources from top to bottom. DNNs trained at edge servers are aggregated into global DNNs at the central server, and well-trained DNNs are sent back to devices as local DNNs for real-time inference.]

Fig. 5. A general deep learning framework [11, 111].
Since it is difficult to obtain a large number of new training samples within a short time in the new environment, deep transfer learning should be applied.
• Applying global DNNs at local devices: Considering that the data sets at different devices or BSs follow different distributions [115, 116], if a global DNN is directly used by a device or a BS for inference, it can neither guarantee the QoS requirements nor achieve optimal resource utilization efficiency. To avoid this issue, deep transfer learning should be used to fine-tune a global DNN with local training samples.

3) Lifelong Deep Learning: As shown in [119], lifelong learning is a kind of cumulative learning algorithm that consists of multiple steps. In each step, transfer learning is applied to improve the learning efficiency of the new task. As illustrated in Fig. 5, the knowledge learned from the previous (n − 1) tasks is reused to initialize the n-th new task. After the training stage, the n-th task becomes a source task and is reused to train new tasks in the future. For example, by combining the federated learning framework with lifelong learning, a lifelong federated reinforcement learning framework for cloud robotic systems was proposed by B. Liu et al. [111]. Such a framework can be adopted in the network architecture in Fig. 4, where the DNNs for different tasks are saved in a central library at the central server. Each device or BS fetches the global DNN of the relevant task and fine-tunes it according to its environment. Finally, a local DNN can be obtained for real-time inference.

4) Distributed Deep Learning: To avoid high overheads for exchanging information, the control plane of a cellular network should not be fully centralized [105]. In other words, each control plane makes its own decisions according to the available information. To guarantee the QoS requirements of URLLC with a partially centralized or a fully decentralized control plane, a distributed algorithm is needed [120].

In a system where users and BSs make decisions in a distributed way, the problem can be formulated as a fully cooperative game to improve the global network performance [121]. To find stationary solutions of the game, multi-agent (deep) reinforcement learning turns out to be a promising approach, as shown in [122, 123].

C. Summary of Design Principles

The principles for designing deep learning frameworks in 6G networks are summarized as follows.

1) Sharing Models and DNNs from Bottom to Top: As illustrated in Fig. 5(a), the tasks at the network level and the BS level depend on the behaviors of BSs and users, respectively. When optimizing policies at the higher levels, the system needs to acquire the models at the lower levels [12]. Alternatively, the system can share DNNs that mimic the behaviors of BSs and users with the centralized control plane. For example, when the theoretical models are inaccurate and the number of real-world data samples is limited, Generative Adversarial Networks (GANs) can be used to approximate the distributions of packet arrival processes or wireless channels [124]. When the generator networks are available at the centralized control plane, there is no need to share traffic and channel models.

2) Sharing Storage and Computing Resources from Top to Bottom: To guarantee the data-plane latency, the control plane needs to make decisions according to dynamic traffic states and channel states in real time [105].
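The parameter-aggregation step at the heart of the federated learning subframework discussed above can be sketched in a few lines. The snippet below is our own minimal illustration, assuming each local model is a flat list of parameters and aggregation weights proportional to local sample counts; it is not tied to any specific scheme in the cited works.

```python
def federated_average(local_weights, sample_counts):
    """Weighted average of local model parameters.

    local_weights: one flat parameter list per user/BS.
    sample_counts: number of local training samples per participant;
    the aggregation weight of each participant is its share of the data.
    """
    total = float(sum(sample_counts))
    global_w = [0.0] * len(local_weights[0])
    for weights, count in zip(local_weights, sample_counts):
        alpha = count / total            # aggregation weight of this participant
        for i, w in enumerate(weights):
            global_w[i] += alpha * w
    return global_w

# Three participants; the third holds twice as much local data,
# so its parameters dominate the global model.
w_global = federated_average(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    sample_counts=[100, 100, 200],
)  # -> [0.75, 0.75]
```

Note that only parameter vectors are exchanged, never the raw training data, which is exactly the privacy argument made for federated learning in the text.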
With well-trained DNNs, it is possible for BSs and mobile devices to obtain inference results within a short time. However, the training phase of a deep learning algorithm usually needs huge storage and computing resources. As a result, a BS or a device can hardly obtain a well-trained DNN with local training samples and computing resources. As shown in Fig. 5(b), to enable deep learning at the lower levels, the central server needs to share storage and computing resources with BSs and devices via federated learning.

3) Tradeoffs among Communications, Computing, and Caching: Sharing information among different levels helps to save computing and caching resources at edge servers and mobile devices, at the cost of introducing extra communication overheads. Meanwhile, communication delay and reliability affect the performance of learning algorithms [125, 126]. Thus, when developing a deep learning framework in a wireless network, the tradeoffs among communications, computing, and caching should be examined carefully. Novel approaches that can improve these tradeoffs are urgently needed [113, 126].

IV. TOOLS FOR PERFORMANCE ANALYSIS IN URLLC

To illustrate how to improve deep learning with theoretical tools and models, we revisit tools in information theory, queueing theory, and communication theory for performance analysis in URLLC.

A. Information Theory Tools

The fundamental relationship between the achievable rate and radio resources is crucial for formulating optimization problems in communication systems. In 1948, the maximal error-free data rate that can be achieved in a communication system was derived in [127], which is known as Shannon's capacity. For example, over an Additive White Gaussian Noise (AWGN) channel, Shannon's capacity can be expressed as follows,

R = W log₂(1 + γ)  (bits/s),   (1)

where W is the bandwidth of the channel and γ is the SNR. In the past decades, researchers have tried to develop a variety of coding schemes to approach this performance limit. Meanwhile, Shannon's capacity has been widely applied in communication system design.

It is well known that Shannon's capacity can be approached when the blocklength of the channel codes goes to infinity. However, to avoid a long transmission delay in URLLC, the blocklength should be short. As a result, Shannon's capacity cannot be achieved, and the decoding error probability is non-zero for arbitrarily high SNRs.

There are two branches of research in the finite blocklength regime. The first one analyzes the delay (i.e., coding blocklength) and reliability tradeoffs that can be achieved with different coding schemes [128–135]. For example, a lower bound on the bit error rate of finite-length turbo codes was derived in [128]. The performance of finite-length Low-Density Parity-Check codes was analyzed in [129]. The authors of [131] investigated how to adjust the blocklength of polar codes to keep the BLock Error Rate (BLER) constant.

The second branch of research aims to derive the performance limit on the achievable rate in the short blocklength regime [56, 70, 72, 136–138]. The milestones of the achievable rate in the short blocklength regime are summarized in Table II.

TABLE II
ACHIEVABLE RATE IN THE SHORT BLOCKLENGTH REGIME

Year | Reference | Channel Model
2010 | [56] | AWGN channel
2011 | [139] | Gilbert-Elliott channel
2011 | [136] | Scalar coherent fading channel
2014 | [70] | Quasi-static Multiple-Input-Multiple-Output (MIMO) channel
2014 | [137] | Coherent Multiple-Input-Single-Output (MISO) channel
2016 | [138] | Multi-antenna Rayleigh fading channel
2019 | [140] | Coherent multiple-antenna block-fading channels
2019 | [141] | Saddlepoint approximations, Rayleigh block-fading channels

To show the difference between the achievable rate in the short blocklength regime and Shannon's capacity, we rephrase some results of [56] here. The achievable rate in the short blocklength regime over an AWGN channel can be accurately approximated by the normal approximation, i.e.,

R ≈ (W / ln 2) [ ln(1 + γ) − √(V / LB) fQ⁻¹(εc) ]  (bits/s),   (2)

where LB is the blocklength, εc is the decoding error probability, fQ⁻¹(·) is the inverse of the Q-function, and V is the channel dispersion, which can be expressed as V = 1 − 1/(1 + γ)² over the AWGN channel. The blocklength LB is the number of symbols in each block. To transmit LB symbols, the amount of time and frequency resources can be obtained from Dt·W = LB, where Dt is the transmission duration of the LB symbols.

The difference between (1) and (2) lies in the second term in (2). When the blocklength LB goes to infinity, the achievable rate in (2) approaches Shannon's capacity. To transmit b bits in one block with duration Dt, the decoding error probability can be derived as follows,

εc ≈ fQ( √(Dt·W / V) [ ln(1 + γ) − b·ln 2 / (Dt·W) ] ),   (3)

which is obtained by substituting (2) into Dt·R = b. From (3) we can see that, to keep the decoding error probability constant, the required bandwidth decreases with the transmission duration. Essentially, there are tradeoffs among the transmission delay, the spectrum efficiency, and the decoding error probability.

Considering that the result in (3) is an approximation that cannot guarantee the required decoding error probability, lower and upper bounds on the achievable rate were provided in [141]. The authors further illustrated that the saddlepoint approximation derived in that paper lies between the lower and upper bounds, and is more accurate than the normal approximation over the Rayleigh fading channel.
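As a quick numerical illustration of (3), the sketch below evaluates the normal-approximation decoding error probability over an AWGN channel, with the dispersion V = 1 − 1/(1 + γ)² as in the text (the function and variable names are ours):

```python
import math

def q_function(x):
    """Gaussian Q-function, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def decoding_error_prob(snr, bandwidth_hz, duration_s, bits):
    """Normal approximation (3): probability of decoding error when
    sending `bits` in one block of blocklength LB = Dt * W over AWGN."""
    blocklength = bandwidth_hz * duration_s        # LB = Dt * W (symbols)
    dispersion = 1.0 - 1.0 / (1.0 + snr) ** 2      # channel dispersion V
    arg = math.sqrt(blocklength / dispersion) * (
        math.log(1.0 + snr) - bits * math.log(2.0) / blocklength
    )
    return q_function(arg)

# Sending 100 bits within 0.1 ms: doubling the bandwidth (and hence
# the blocklength) sharply reduces the decoding error probability.
eps_narrow = decoding_error_prob(snr=1.0, bandwidth_hz=1.0e6,
                                 duration_s=1e-4, bits=100)
eps_wide = decoding_error_prob(snr=1.0, bandwidth_hz=2.0e6,
                               duration_s=1e-4, bits=100)
```

The example reproduces the tradeoff stated after (3): for a fixed payload and transmission duration, more bandwidth buys reliability.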
However, the saddlepoint approximation has no closed-form expression in general, and hence we cannot obtain closed-form expressions of the QoS constraints by using it. As shown in [27], when the BLER is 10⁻⁷, the gap between the normal approximation and practical channel coding schemes, such as the extended Bose-Chaudhuri-Hocquenghem (eBCH) code, is less than 0.1 dB. Thus, (2) and (3) are still good choices in practical system design.

B. Queueing Theory Tools

Depending on the delay metric, the existing queueing theory research papers can be classified into three categories, i.e., average delay [142–145], hard deadline [15, 146–149], and statistical QoS [16, 150–156].

1) Average Queueing Delay: Let us consider a queueing system with average packet arrival rate ā(t), where a(t) is the instantaneous data arrival rate at time t. Then, the average queueing delay D̄(t) and the average queue length Q̄(t) satisfy a simple and exact relation [55], i.e.,

Q̄(t) = D̄(t) ā(t),

which is the famous Little's Law. It is a very general conclusion that does not depend on the service order of the queueing system, the length of the buffer, or the distributions of the inter-arrival time between packets and the service time of each packet. The only assumption is that the arrival processes and service processes are stationary.

The average delay metric is suitable for services with no strict delay requirement, such as file download. However, it is not applicable to URLLC. This is because the average delay metric cannot reflect jitter or the packet loss probability caused by delay violations. Even if the average delay is shorter than 1 ms, the delay experienced by a certain packet could be much longer than 1 ms.

2) Hard Deadline: If a service requires a hard deadline Td, the packets should be transmitted to the receiver before the deadline with probability one. To illustrate how to guarantee a hard deadline, we consider a fluid model in a First-Come-First-Serve (FCFS) system as in [147]. The total amount of data arriving at the buffer during the time interval [0, t] is defined as A(t) = ∫₀ᵗ a(τ)dτ. The total amount of data leaving the buffer during the time interval [0, t] is denoted by S(t) = ∫₀ᵗ s(τ)dτ, where s(t) is the service rate at time t. To satisfy the hard deadline Td, the following constraints should be satisfied,

S(t) ≤ A(t), ∀t ∈ [0, T],   (4)
S(t) ≥ A(t − Td), ∀t ∈ [Td, T + Td],   (5)

where T is the total service time. Constraint (4) ensures that the amount of data transmitted by the system does not exceed the amount of data that has arrived at the buffer. Constraint (5) ensures the hard deadline requirement.

The hard deadline is widely applied in wireline communications, where the channel is deterministic [147, 157]. However, constraint (5) cannot be satisfied in most wireless communication systems [15].

3) Statistical QoS: For most real-time services and for URLLC, statistical QoS is the best choice among the three kinds of metrics. Statistical QoS is characterized by a delay bound and a threshold of the maximal tolerable delay bound violation probability, denoted by (Dqmax, εq). For Voice over Internet Protocol services, the requirement in radio access networks is (50 ms, 2%) [158]. For URLLC, the queueing delay bound should be less than the E2E delay, and the queueing delay violation probability should be less than the overall packet loss probability, e.g., (1 ms, 10⁻⁷) [1].

The existing publications on statistical QoS or the distribution of the queueing delay mainly focus on specific arrival models or service models. Some useful results with FCFS servers and Processor-Sharing (PS) servers are summarized in Table III, where Kendall's notation indicates arrival types, service types, number of servers, and scheduling principles. Here, "M" and "D" represent memoryless processes and deterministic processes, respectively. "G" means that the inter-arrival time between packets or the service time of each packet may follow any general distribution. The abbreviation "Geo[X]" in [159] means that the packets arriving in a time slot constitute a bulk, and the bulk arrival process is stationary and memoryless with a Binomial marginal distribution. Considering that PS servers are widely deployed in computing systems [55], A. P. Zwart et al. derived the distribution of the delay in the large-delay regime, which is referred to as a tail probability [160]. Since the delay experienced by short packets in the short delay regime is more relevant to URLLC, C. She et al. derived an approximation of the delay experienced by short packets in a PS server [91].

TABLE III
STATISTICAL QOS OF SPECIFIC QUEUEING MODELS

Reference | Kendall's Notation | Result
[161] | M/D/1/FCFS | Distribution of delay
[162] | M/M/1/FCFS | Distribution of delay
[159] | Geo[X]/G/1/FCFS | Upper bound of delay violation probability
[160] | M/G/1/PS | Tail probability of delay
[91] | M/G/1/PS | Delay experienced by short packets

There are two principal tools for analyzing statistical QoS in more general queueing systems, i.e., network calculus and effective bandwidth/capacity [150, 151, 163].

The basic idea of network calculus is to convert the accumulated arrival and service rates from the bit domain to the SNR domain, and then derive the upper bounds of the Mellin transforms of the arrival and service processes [164]. From the upper bounds of the Mellin transforms, an upper bound on the delay violation probability can be obtained [155].

Network calculus provides an upper bound on the Complementary Cumulative Distribution Function (CCDF) of the steady-state queueing delay. However, network calculus is not convenient in wireless network design, since it is very challenging to derive the relation between (Dqmax, εq) and the required radio resources in a closed form.
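Returning to the hard-deadline fluid model above, constraints (4) and (5) can be checked numerically on sampled cumulative curves. The sketch below is our own discrete-time illustration (per-slot arrival and service amounts), not an algorithm from the cited works:

```python
def meets_hard_deadline(arrivals, services, deadline):
    """Check the fluid-model constraints (4)-(5) in discrete time.

    arrivals[t] / services[t]: data arriving / served in slot t.
    deadline: hard deadline Td, in slots.
    """
    cum_a, cum_s, history = 0.0, 0.0, []
    for t, (a, s) in enumerate(zip(arrivals, services)):
        cum_a += a
        cum_s += s
        history.append(cum_a)                 # A(0), ..., A(t)
        if cum_s > cum_a + 1e-9:              # (4): cannot serve data not yet arrived
            return False
        lagged = history[t - deadline] if t >= deadline else 0.0
        if cum_s < lagged - 1e-9:             # (5): data older than Td is still queued
            return False
    return True

# One unit of data arrives per slot; a unit-rate server meets a 2-slot
# deadline, while a rate-0.4 server eventually misses it.
ok = meets_hard_deadline([1] * 10, [1] * 10, deadline=2)
late = meets_hard_deadline([1] * 10, [0.4] * 10, deadline=2)
```

As the text notes, in wireless systems the service rate is random, so constraint (5) generally cannot be guaranteed with probability one; this check only makes sense for deterministic (e.g., wireline) service curves.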
In URLLC, we are interested in the tail probability of the queueing delay, i.e., the case where εq is very small. Thus, there is no need to derive the upper bound of the CCDF for all values of εq.

To analyze the asymptotic case where εq is very small, effective bandwidth and effective capacity can be used [150, 151]. Based on the Mellin transforms in network calculus, we can derive the effective bandwidth of arrival processes and the effective capacity of service processes with the Gärtner-Ellis theorem [150, 151, 164].

Effective bandwidth is defined as the minimal constant service rate that is required to guarantee (Dqmax, εq) for a random arrival process a(t) [150]. Inspired by the concept of effective bandwidth, D. Wu et al. introduced the concept of effective capacity over wireless channels [151]. Effective capacity is defined as the maximal constant arrival rate that can be served by a random service process s(t) subject to the requirement (Dqmax, εq). We denote the effective bandwidth of a(t) and the effective capacity of s(t) by EB and EC, respectively. When both the arrival process and the service process are random, (Dqmax, εq) can be satisfied if [165]

EC ≥ EB.   (6)

The formal definitions of effective bandwidth and effective capacity were summarized in [166], i.e.,

EB = lim_{t→∞} (1/(θt)) ln E[ exp(θ ∫₀ᵗ a(τ)dτ) ],
EC = − lim_{t→∞} (1/(θt)) ln E[ exp(−θ ∫₀ᵗ s(τ)dτ) ],

where θ is a QoS exponent. The value of θ decreases with Dqmax and εq according to the following expression [167],

Pr{Dq∞ ≥ Dqmax} ≈ exp(−θ EB Dqmax) = εq,   (7)

where Dq∞ is the steady-state queueing delay. The approximation in (7) is accurate for the tail probability. For URLLC, the accuracy of (7) has been validated in [16], where a Poisson process, an interrupted Poisson process, or a switched Poisson process is served at a constant service rate (since the delay requirement is shorter than the channel coherence time in most URLLC use cases, the channel is quasi-static and the service rate is constant).

Effective bandwidth and effective capacity have been extensively investigated in the existing literature for different queueing models. Some useful results are summarized in Table IV. A more comprehensive survey on effective capacity in wireless networks is presented by Amjad et al. [168].

TABLE IV
USEFUL RESULTS FOR ANALYZING STATISTICAL QOS

Effective Bandwidth
Year | Reference | Sources
1995 | [150] | General definition
1996 | [169] | Periodic sources, Gaussian sources, ON-OFF sources, and compound Poisson sources
2016 | [170] | Interrupted Poisson sources
2018 | [16] | Poisson sources in closed form

Effective Capacity
Year | Reference | Channels
2003 | [151] | General definition
2007 | [165] | ON-OFF channel
2007 | [171] | Nakagami-m fading channel
2010 | [172] | Correlated Rayleigh channel
2013 | [173] | AWGN, finite blocklength regime
2016 | [174] | Rayleigh channel, finite blocklength regime

C. Communication Theory Tools

1) Characterizing Correlations of Multiple Links: To improve the reliability of URLLC in wireless communications, we can exploit different types of diversity, such as frequency diversity [175], spatial diversity [176], and interface diversity [177]. The basic idea is to send signals through multiple links in parallel. If one or some of these links experience low channel quality or congestion, the receiver can still recover the packet in time from the signals on the other links.

For example, NP copies of one packet are transmitted over NP paths. The packet loss probability of the n-th path is denoted by PL(n), n = 1, ..., NP. If the packet losses of different paths are uncorrelated, then the overall packet loss probability can be expressed as PLtot = ∏_{n=1}^{NP} PL(n).

[Figure: a user served by NP distributed antennas; nearby obstacles can block the LoS paths of several antennas simultaneously.]

Fig. 6. Cross-correlation of packet losses [24, 30].

However, the packet losses of different paths may be highly correlated. To illustrate the impact of the cross-correlation of packet losses on reliability, we consider a multi-antenna system with NP distributed antennas in Fig. 6. Let 1LoS(nP) be the indicator that there is a Line-of-Sight (LoS) path between the user and the nP-th antenna: if there is a LoS path, 1LoS(nP) = 1; otherwise, 1LoS(nP) = 0. Then, the probability that there is a LoS path between the user and a given distributed antenna can be defined as PLoS^one ≜ E{1LoS(nP)}. When the distances among the NP antennas are comparable to the typical scale of the obstacles (e.g., the widths and lengths of buildings) in the environment, the values of 1LoS(nP), nP = 1, ..., NP, are correlated. The correlation coefficient of two adjacent antennas is denoted by ρLoS.

For URLLC, we are interested in the probability that at least one of the antennas can receive the packet from the user.
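Anticipating the closed-form result of [178] used in this analysis, a small helper evaluates the probability that at least one of NP correlated LoS paths is available (the function and variable names are ours):

```python
def p_los_at_least_one(p_one, rho, n_antennas):
    """Probability that at least one of n_antennas has a LoS path,
    given the per-antenna LoS probability p_one and the correlation
    coefficient rho between adjacent antennas (closed form of [178])."""
    blocked_first = 1.0 - p_one                   # first antenna has no LoS
    blocked_next = 1.0 - p_one * (1.0 - rho)      # each further antenna, given correlation
    return 1.0 - blocked_first * blocked_next ** (n_antennas - 1)

# Independent links (rho = 0) give the usual diversity gain,
# while fully correlated links (rho = 1) give no gain at all.
p_indep = p_los_at_least_one(0.5, rho=0.0, n_antennas=4)   # 1 - 0.5**4
p_corr = p_los_at_least_one(0.5, rho=1.0, n_antennas=4)    # 0.5
```

Consistent with the discussion in this subsection, the availability PLoS^all shrinks as ρLoS grows, collapsing to the single-link probability PLoS^one when the links are fully correlated.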
Let's consider a toy example: if there are one or more LoS paths, the antenna can receive the packet; otherwise, the packet is lost. Then, the reliability of the system is the probability that at least one of the antennas has a LoS path to the user, i.e. [178],

PLoS^all = 1 − (1 − PLoS^one)[1 − PLoS^one(1 − ρLoS)]^(NP−1).

The above expression indicates that PLoS^all decreases with ρLoS. If ρLoS = 0, then PLoS^all = 1 − (1 − PLoS^one)^NP. If ρLoS = 1, then PLoS^all = PLoS^one.

For a particular antenna, the value of PLoS^one depends on the communication environment and the locations of both the antenna and the user. According to the 3GPP LoS model, PLoS^one in a terrestrial cellular network can be expressed as follows [158],

PLoS^one = min(18/ru, 1) [1 − exp(−ru/36)] + exp(−ru/36),

where ru is the distance between the user and the antenna. For ground-to-air channels, the value of PLoS^one depends on the elevation angle according to the following expression [179, 180],

PLoS^one = 1 / (1 + φe exp[−ψe(θu − φe)]),

where θu is the elevation angle of the user, and φe and ψe are two constants, which depend on the communication envi-

network congestions lead to a significant delay in large-scale wireless networks with high user density. Yet, how to analyze the delay, especially in the low-latency regime, remains an open problem.

In 2012, M. Haenggi applied stochastic geometry to analyze delays in large-scale networks [185], where the average access delay was derived. One year later, P. Kong studied the tradeoff between the power consumption and the average delay experienced by packets [186], where existing results on the M/G/1 queue were fitted into the stochastic geometry framework. More recently, Y. Zhong et al. proposed the notion of delay outage to evaluate the delay performance of different scheduling policies, defined as the probability that the average delay experienced by a typical user is longer than a threshold, conditioned on the spatial locations of the BSs, i.e. [187],

Pr{D̄ > Tth | Φb},

where D̄ is the mean delay experienced by the user, Tth is the required threshold, and Φb is a realization of the spatial locations of the BSs.

As discussed in Section IV-B, the requirement of URLLC should be characterized by a constraint on the statistical QoS. However, most of the existing studies analyzed the average delay in large-scale networks. To the best of the authors' knowledge, there is no available tool for analyzing statistical QoS
ronment, such as suburban, urban, dense urban, and highrise in large-scale networks.
urban. The typical values of φe and ψe can be found in [30].
In practice, the packet loss probability depends not only on D. Summary of Analysis Tools
whether there is a LoS path but also on the shadowing of The analysis tools enable us to evaluate the performance of
a wireless channel. To characterize shadowing correlation, S. wireless networks without extensive simulations, and serve as
Szyszkowicz et al. established useful shadowing correlation the key building blocks of optimization frameworks. Yet, there
models in [181]. Even with these models, deriving the packet are some open issues:
loss probability in URLLC is very challenging. To overcome • Each of these tools is used to analyze the performance
this difficulty, a numerical method for evaluating packet loss of one part of a wireless network. A whole picture of
probability was proposed by D. Öhmann et al. [182]. More the system is needed for E2E optimization, which is
recently, C. She et al. analyzed the impact of shadowing analytically intractable in most of the cases.
correlation on the network availability, where a packet is • To apply these tools, we need simplified models and
transmitted via a cellular link and a Device-to-Device (D2D) ideal assumptions. The impacts of model mismatch on the
link [183]. performance of URLLC remain unclear. For example, if
Furthermore, the correlation of multiple subchannels in Rayleigh fading (without LoS path between transmitters
MIMO channels and frequency-selective channels also affects and receivers) is adopted in analysis, one can obtain an
micro-diversity gains. The achievable rate of massive MIMO upper bound of the latency or the packet loss probability
systems with spatial correlation was investigated in [184], (The LoS component exists in practical communication
where the authors maximized the data rate over correlated scenarios). However, such a model will lead to conserva-
channels. The reliability over frequency-selective Rayleigh tive resource allocation.
fading channels was considered in [175], where an approxi- • The tools in Tables II, III, and IV rely on simplified
mation of the outage probability over correlated channels was channel and queueing models. For more complicated and
derived. Considering that there are very few results on the practical models, we can hardly derive the tail probability
reliability over correlated channels, how to guarantee QoS of delay and the packet loss probability.
requirements of URLLC over correlated channels deserves To address the above issues, we will illustrate how to optimize
further research. the whole network with cross-layer design in Section V,
2) Stochastic Geometry for Delay Analysis in Large-Scale introduce deep transfer learning to handle model mismatch
Networks: Most of the existing publications on delay analysis in Section VI, and discuss model-free approaches in Sections
consider the systems with small scale and deterministic net- VII and VIII.
work topologies. However, in practical systems, the locations
of a large number of users are stochastic. Users can either V. C ROSS -L AYER D ESIGN
communicate with each other via D2D links or communicate By decomposing communication systems into seven layers
with BSs via cellular links. Caused by dynamic traffic loads, [14], we can develop practical, but suboptimal solutions for
[Fig. 7 is a flowchart: candidate solutions at each layer (physical layer: short frame structure, short channel codes, massive MIMO and mmWave; link layer: retransmission and repetition, DL schedulers, UL grant-free access; network layer: MEC, anticipatory networking, SDN and network slicing) are combined into a cross-layer solution (e.g., polar codes + UL/DL resource reservation + MEC). If the combination works, resource utilization efficiency is further improved; otherwise, other combinations are tried. The flow then asks whether new issues arise from the cross-layer perspective, how to address them, what the performance limit is, and how to approach it.]

Fig. 7. From single-layer design to cross-layer design.
URLLC in each layer. To reflect the interactions among different layers, and to optimize the delay and reliability of the whole system, we turn to cross-layer optimization. Although cross-layer design in wireless networks has been studied in the existing literature, the solutions for URLLC are limited. The most recent works on cross-layer design for URLLC are summarized in Table V.

A. Design Principles and Fundamental Results

1) Design Principles: The design principles of cross-layer design are shown in Fig. 7. A straightforward approach is selecting technologies from each layer and seeing whether the combination works. If the combination can guarantee the QoS requirement of URLLC, then we can further optimize resource allocation to improve the resource utilization efficiency. Otherwise, we need to try some other combinations. Meanwhile, we may identify new issues from a cross-layer perspective. By cross-layer optimization, we aim to find the performance limit and show how to approach it.

2) Fundamental Results: R. Berry et al. investigated how to guarantee average delay constraints over fading channels [142]. They obtained the Pareto optimal power-delay tradeoff, i.e., the average transmit power decreases with the average queueing delay according to 1/D̄^2 in the large delay regime. To achieve the optimal tradeoff, a power control policy should take both Channel State Information (CSI) and Queue State Information (QSI) into account. If we apply any power control policy that does not depend on QSI, such as the water-filling policy [24], then the average transmit power decreases with the delay according to 1/D̄. In other words, directly combining the water-filling policy (i.e., the optimal policy that maximizes the average throughput in the physical layer) with a queueing system cannot achieve the optimal tradeoff between the average transmit power and the average queueing delay.

Following this fundamental result, M. J. Neely extended the power-delay tradeoff to multi-user scenarios [143] and proposed a packet dropping mechanism that can exceed the Pareto optimal power-delay tradeoff [198]. Considering that the average delay metric is not suitable for real-time services, such as video and audio, an optimal power control policy that maximizes the throughput subject to the statistical QoS requirement was derived by J. Tang et al. in [171]. The result in [171] shows that when the required delay bound goes to infinity, the optimal power control policy converges to the water-filling policy in [24]. When the required delay bound approaches the channel coherence time, the optimal power control policy converges to channel inversion.

B. Physical- and Link-Layer Optimization

A diagram of cross-layer optimization for the physical and link layers is shown in Fig. 8. Since the E2E delay requirement of URLLC is around 1 ms, the required delay bound is shorter than the channel coherence time [199]. To guarantee a queueing delay that is shorter than the channel coherence time, the power control policy should be channel inversion [171]. For typical wireless channels, such as Rayleigh fading or Nakagami-m fading, the required transmit power of channel inversion is unbounded.

The issue observed from cross-layer design is similar to the outage probability defined in [58]. Different from the physical-layer analysis, the outage probability does not equal the packet loss probability in cross-layer design. As shown in [16], by dropping some packets proactively, it is possible to reduce the overall packet loss probability. Such an idea was inspired by the result in [198]. Denote the proactive packet dropping probability by ε_p. By optimizing the decoding error probability, queueing delay violation probability, and proactive packet dropping probability, we can minimize the required maximal transmit power at the cost of high computational complexity.
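To make the unbounded-power issue and the proactive dropping idea concrete, the sketch below (illustrative only: a unit-mean Rayleigh fading power gain and hypothetical values of `p_target` and `p_max` are assumed; this is not the optimization in [16]) estimates the proactive dropping probability ε_p of a truncated channel-inversion policy by Monte Carlo and compares it with the closed form that holds under this assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def proactive_dropping_prob(p_target, p_max, n_samples=1_000_000):
    """Monte-Carlo estimate of eps_p for truncated channel inversion.

    Illustrative assumptions: the channel power gain g = |h|^2 is unit-mean
    exponential (Rayleigh fading), and transmitting a packet requires
    instantaneous power p_target / g.  Packets whose required power would
    exceed the cap p_max are dropped proactively.
    """
    g = rng.exponential(1.0, n_samples)          # Rayleigh fading power gains
    return float(np.mean(p_target / g > p_max))  # Pr{required power > p_max}

def proactive_dropping_prob_exact(p_target, p_max):
    """Closed form under the same assumption: Pr{g < p_target/p_max}."""
    return 1.0 - np.exp(-p_target / p_max)
```

Since 1 − exp(−x) ≈ x for small x, ε_p decays only like p_target/p_max: meeting a dropping probability of 10^-7 by a power cap alone would require a cap about 10^7 times the nominal power, which is why ε_p is optimized jointly with the decoding error and delay violation probabilities rather than pushed to zero by brute force.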
TABLE V
CROSS-LAYER OPTIMIZATION FOR URLLC

Design Objective | Physical-layer issues | Link-layer issues | Network-layer issues | Challenges/Results
Power control optimization [173] | Normal approximation over AWGN channel | Ensuring the statistical QoS with effective capacity in the short blocklength regime | N/A | No closed-form expression
Throughput analysis [188] | Normal approximation in cognitive radio systems | Ensuring the statistical QoS with effective capacity | N/A | No closed-form result in general
Delay analysis [155] | Normal approximation over quasi-static fading channel | Analyzing the statistical QoS with network calculus | N/A | No closed-form result in general
Packet scheduling [15] | Normal approximation over quasi-static fading channel | DL packet scheduling before a hard deadline | N/A | An online algorithm with small performance losses
Throughput analysis [174] | Throughput of relay systems in the short blocklength regime | Ensuring the statistical QoS with effective capacity | N/A | No closed-form result in general
Power minimization [16] | Normal approximation over quasi-static fading channel | Ensuring the statistical QoS with effective bandwidth | N/A | Near-optimal solutions of the non-convex problem
Improving the reliability-SE tradeoff [177] | N/A | Improving the tradeoff between spectrum efficiency and reliability with network coding | Packet cloning and splitting on multiple communication interfaces | Intractable in large-scale networks
Improving reliability [189] | Decoding errors in the short blocklength regime | Scheduler design under a hard deadline constraint | N/A | Reducing outage probability with combination strategies
Bandwidth minimization [190] | Achievable rate in the short blocklength regime | E2E optimization (i.e., UL/DL transmission delay and queueing delay) | N/A | Near-optimal solutions of the non-convex problem
Power allocation [17] | Maximizing the sum throughput of multiple users in the short blocklength regime | Ensuring statistical QoS for different kinds of packet arrivals | N/A | A sub-optimal algorithm for solving the non-convex problem
Bandwidth minimization [191] | Normal approximation over quasi-static fading channel | Burstiness-aware resource reservation under statistical QoS constraints | N/A | Reducing 40% bandwidth by traffic state prediction
Availability analysis [183] | Normal approximation over quasi-static fading channel | Different transmission protocols via cellular links and D2D links | Improving network availability with multi-connectivity | An accurate approximation of network availability
Coding-queueing analyses [159] | Variable-length-stop-feedback codes | Retransmission under latency and peak-age violation guarantees | N/A | Accurate approximations of target tail probabilities
Coded retransmission design [192] | Tiny codes with just 2 packets | Automatic Repeat Request (ARQ) protocols with delayed and unreliable feedback | N/A | Improving 40% throughput with latency and reliability guarantees
Optimizing resource allocation [18] | Ensuring transmission delay in the short blocklength regime | N/A | Network slicing for eMBB and URLLC in cloud-RAN | Searching near-optimal solutions of mixed-integer programming
Delay analysis [193] | Multi-user MISO with imperfect CSI and finite blocklength coding | Analyzing statistical QoS with network calculus | N/A | No closed-form solution in general
Delay, reliability, throughput analysis [194] | Modulation and coding scheme | Incremental redundancy-hybrid ARQ over correlated Rayleigh fading channel | N/A | No closed-form solution in general
Resource allocation [195] | Modulation and coding scheme | Individual resource reservation or contention-based resource sharing | N/A | Analytical expressions of reliability
Spectrum and power allocation [19] | Interference management for vehicle-to-vehicle and vehicle-to-infrastructure links | Ensuring statistical QoS with effective capacity | N/A | Solving subproblems on spectrum/power allocation in polynomial time
Distortion minimization [196] | Joint source and channel codes | N/A | Improving reliability over parallel AWGN channels | Tight approximations and bounds on distortion level, but not in closed form
Network architecture design [197] | CoMP communications | E2E design of 5G networks for industrial factory automation | A novel architecture for TSN | Around 2 ms round-trip delay in their prototype
Offloading optimization [91] | Normal approximation over quasi-static fading channel | Delay analysis in processor-sharing servers with both short and long packets | User association & task offloading of mission-critical IoT in MEC | Closed-form approximations of delay and low-complexity offloading algorithms
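Several link-layer entries in Table V enforce the statistical QoS constraint through the effective capacity tool of Section IV. As a quick reminder of how that quantity is computed, the sketch below estimates the effective capacity from samples of an i.i.d. per-block service process (the per-block formulation EC(θ) = −(1/θ) ln E[exp(−θs)] is the standard one; the sample values are illustrative assumptions):

```python
import numpy as np

def effective_capacity(service_samples, theta):
    """Effective capacity of an i.i.d. per-block service process s (bits/block).

    EC(theta) = -(1/theta) * ln E[exp(-theta * s)], where theta > 0 is the
    QoS exponent: a larger theta corresponds to a stricter statistical
    queueing-delay requirement, and EC(theta) shrinks from the mean service
    rate towards the worst-case rate.
    """
    s = np.asarray(service_samples, dtype=float)
    return -np.log(np.mean(np.exp(-theta * s))) / theta
```

For a deterministic service rate the effective capacity equals that rate for every θ; for a random service process it is non-increasing in θ, which is exactly why a stricter QoS exponent forces more conservative resource reservation.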
[Fig. 8 summarizes the design flow: the channel coherence time in the physical layer exceeds 1 ms while the link layer requires a queueing delay below 1 ms, so the optimal power control is channel inversion; the resulting issues (unbounded transmit power; outage probability cannot characterize link-layer packet losses) are handled by proactive packet dropping; the performance limit is found by optimizing ε_c, ε_q, ε_p and approached by setting ε_c = ε_q = ε_p = ε_tot^max/3, so optimizing the packet loss probabilities is unnecessary; the delay components, however, should be optimized, because the achievable rate R and the effective bandwidth E_B are insensitive to the packet loss probabilities but sensitive to the delay components.]

Fig. 8. Diagram of physical- and link-layer optimization [16, 69].

To avoid complicated optimization, a straightforward solution is setting ε_c = ε_q = ε_p = ε_tot^max/3. The results in [16] show that the performance loss (i.e., required transmit power) without optimization is less than 5% when the number of antennas is larger than eight. In other words, setting the packet loss probabilities as equal is near-optimal, and there is no need to optimize the values of ε_c, ε_q, and ε_p.

From the above-mentioned conclusion, it is natural to raise the following question: Is it necessary to optimize the delay components subject to the E2E delay? The results in [190] show that by optimizing the UL/DL delays and queueing delay subject to the E2E delay constraint, a large amount of bandwidth can be saved. This is because the achievable rate in the short blocklength regime is insensitive to the decoding error probability, but very sensitive to the transmission delay. Therefore, optimizing the delay components subject to the E2E delay requirement is necessary for maximizing the resource utilization efficiency of URLLC.

Apart from the above solutions, some other cross-layer solutions for the physical and link layers have been developed for URLLC recently. By considering the hard deadline metric, an energy-efficient scheduling policy for short packet transmission was proposed in [15], and delay bound violation probabilities of different schedulers were evaluated in [189]. How to guarantee statistical QoS in the short blocklength regime has been studied in the UL Tactile Internet [191], DL multi-user scenarios with Markovian sources [17], point-to-point communications with imperfect CSI [193], and vehicle networks with strong interference between vehicle-to-vehicle and vehicle-to-infrastructure links [19]. Coded ARQ and incremental redundancy-hybrid ARQ schemes were developed in [192] and [194], respectively. Considering that the density of devices in a smart factory can be very large, the individual resource reservation scheme and the contention-based resource sharing scheme were considered in [195].

C. Physical-, Link-, and Network-Layer Optimization

The cross-layer models will be very complicated if more than two layers are considered. Thus, joint optimization of the physical, link, and network layers is very challenging. Although a few papers optimize network management from a cross-layer approach, some simplified models of the physical and link layers are used in these works [18, 177, 197].

When optimizing packet splitting on multiple communication interfaces, the authors of [177] assumed that the distribution of latency (i.e., the latency-reliability function) of each interface is available. With the simplified link-layer model, coded packets are distributed through multiple interfaces. As in [177], joint source and channel codes over parallel channels were developed in [196], where simplified AWGN channels with independent fading are considered in the analysis. To optimize network slicing for eMBB and URLLC in cloud-RAN, the resources that are required to satisfy the data rate constraint of eMBB and the transmission delay constraint of URLLC were derived in [18]. The latency in the link layer was not considered in the framework in [18]; otherwise, the problem would become intractable. Resource allocation in vehicle networks was optimized to maximize the throughput of URLLC in [200], where Shannon's capacity is used to simplify the complicated achievable rate over the interference channel, and the average packet loss probability and average latency are analyzed in the queueing system for tractability.

To validate the E2E design in a time-sensitive network architecture with CoMP for URLLC, issues in the three lower layers were considered in [197], where most of the conclusions are obtained via simulation. Since the models are too complicated, one can hardly obtain any closed-form result. To gain some useful insights from theoretical analysis, the authors of [91] analyzed the E2E delay in an MEC system, where the UL and DL transmission delays and the processing delay at the servers are considered. In addition, the packet loss probabilities caused by decoding errors and queueing delay violations were also derived. Based on these physical- and link-layer models, user association and computing offloading of the URLLC service were optimized in a multi-cell MEC system, where eMBB services are considered as background services (i.e., not optimized).

D. From Lower Layers to Higher Layers

The E2E performance of URLLC depends on all the seven layers in the OSI model [201]. By considering more layers in communication system design, better performance can be achieved. However, most of the working groups in standardization mainly focus on one or two layers [74]. For example, the Institute of Electrical and Electronics Engineers (IEEE) and the Internet Engineering Task Force (IETF) have formed some working groups on the standardization of TSN [202]. The IEEE 802.1 working group mainly focuses on the physical and link layers, while the deterministic networking working group of IETF focuses on the network layer and
higher layers [74]. By including more protocol layers in one design framework, the problem becomes more complicated, but the obtained policy has the potential to achieve better performance. Essentially, there is a tradeoff between complexity and performance. To move from the three lower layers to the higher layers, novel technologies that can capture complicated features and can be implemented in real time are in urgent need. For example, driven by the requirement from the application layer, the freshness of the knowledge becomes one of the KPIs in remote monitoring/control systems [47]. To achieve a better tradeoff between latency and AoI, the authors of [159] developed variable-length-stop-feedback codes that outperform the simple automatic repetition request in terms of the delay violation probability. In addition, their results indicated that by changing the service order of a queueing system, it is possible to achieve a smaller peak-age violation probability.

E. Summary of Cross-layer Optimization

From a cross-layer optimization framework, one can discover new research challenges in URLLC. By solving cross-layer optimization problems, the performance limits on latency, reliability, and resource utilization efficiency can be obtained. In addition, the optimal policies provide some useful insights on how to find near-optimal solutions and how to approach the performance limits. Some open issues of cross-layer optimization are summarized as follows:
• The problem formulation relies on the analysis tools in Section IV. Thus, some open issues of the analysis tools, such as model mismatch, also exist in the optimization frameworks.
• The optimization problems are NP-hard in most cases. The optimal policies or fast optimization algorithms are developed on a case-by-case basis. A general methodology for designing fast algorithms is missing.
• The results obtained in the existing literature (see Table V) only considered two or three bottom layers. Thus, these frameworks cannot address the technical problems in the upper layers. If the issues in the upper layers are taken into account, the optimization problems could be intractable.
We will discuss how to take advantage of existing theoretical knowledge and how to transfer the knowledge to practical scenarios in the rest of this paper.

VI. SUPERVISED DEEP LEARNING FOR URLLC

Deep learning is a family of machine learning techniques based on DNNs. Depending on the training methods, it can be categorized as supervised, unsupervised, and reinforcement learning. Supervised deep learning has been applied in different fields in computer science, including computer vision [203], speech recognition [204], natural language processing [205], and so on. Recently, supervised deep learning algorithms were also adopted in wireless networks [206]. The existing publications that used supervised deep learning for channel estimation/prediction, traffic prediction, mobility/trajectory prediction, QoS/Quality-of-Experience (QoE) prediction, and policy approximation are summarized in Table VI. Although the algorithms in this table are not designed for URLLC, they can be applied in the prediction and communication co-design framework for URLLC in [191].

A. Basics of DNNs

In practice, a complicated DNN is usually made up of some basic structures, such as feed-forward neural networks (FNNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs) in Fig. 9 [225]. How to build a DNN for a specific problem depends on the structure of the data. In this section, we take FNNs as an example to introduce some basics of DNNs. The motivation is to provide a quick review of DNNs. In general, FNNs work well with small-scale problems. For an in-depth introduction to different kinds of DNNs, please refer to [20].

[Fig. 9 illustrates (a) an FNN mapping an input to an output through fully connected layers, (b) an RNN unrolled over time frames with inputs x_t, hidden states s_t, outputs y_t, and weight matrices U, W_R, V_R, and (c) a single-layer CNN with input X, convolution kernel W_C, bias b_C, and output Y.]

Fig. 9. Different kinds of neural networks [225].

1) Basics of FNN: An FNN includes an input layer, an output layer, and N_L hidden layers [225]. The output of the n-th layer is a column vector x^[n] ∈ R^(M^[n]×1), where n = 0, 1, ..., N_L + 1 and M^[n] is the number of neurons in the n-th layer. For the input layer, n = 0. For the output layer, n = N_L + 1. The relation between x^[n] and x^[n−1] is determined by the following expression,

x^[n] = f_A(W^[n] x^[n−1] + b^[n]),

where W^[n] ∈ R^(M^[n]×M^[n−1]) and b^[n] ∈ R^(M^[n]×1) are the weights and bias of the n-th layer, and f_A(·) is the activation function of the n-th layer, which is an element-wise operation.
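The layer-by-layer recursion above maps directly to code. A minimal NumPy sketch (the layer sizes, random weights, and the choice of sigmoid activation are illustrative assumptions, not taken from the tutorial):

```python
import numpy as np

def sigmoid(x):
    """Element-wise sigmoid activation f_A."""
    return 1.0 / (1.0 + np.exp(-x))

def fnn_forward(x, weights, biases):
    """Forward pass of an FNN: x[n] = f_A(W[n] @ x[n-1] + b[n]).

    weights[n] has shape (M[n], M[n-1]) and biases[n] has shape (M[n], 1);
    the activation is applied element-wise, as in the text.
    """
    for W, b in zip(weights, biases):
        x = sigmoid(W @ x + b)
    return x

# Toy network: M[0] = 3 inputs, one hidden layer of 4 neurons, 2 outputs
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros((4, 1)), np.zeros((2, 1))]
y_hat = fnn_forward(rng.normal(size=(3, 1)), weights, biases)
```

Because the sigmoid is applied element-wise at every layer, each output entry lies strictly in (0, 1); for regression outputs beyond that range, the last layer's activation would typically be replaced by an identity map.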
TABLE VI
SUPERVISED DEEP LEARNING IN WIRELESS NETWORKS

Channel Estimation or Prediction
Reference | Summary | Accuracy | Complexity | System Performance
[207] | Proposing a multiple-rate compressive sensing deep learning framework for CSI feedback in massive MIMO systems | Relative error: below 0.01 | Linear in the size of the channel matrix | Not available
[208] | Employing CNN for wideband channel estimation in mmWave massive MIMO systems | Relative error: below 0.01 | Lower than Minimum Mean Square Error (MMSE) | Not available
[209] | Applying Long Short-Term Memory (LSTM) to make channel predictions in body area networks | Relative error: below 0.05 | Not available | Outage probability: 1 ∼ 10%
[210] | Predicting future signal strength with LSTM and gated recurrent unit algorithms | Better than linear regression | Not available | Not available

Traffic Prediction
Reference | Summary | Accuracy | Complexity | System Performance
[211] | Proposing a hybrid deep learning model for spatiotemporal traffic prediction in cellular networks | Reduces 20 ∼ 40% prediction error | Not available | Not available
[212] | Applying RNN in spatiotemporal traffic prediction in cellular networks | The n-to-n RNN outperforms other schemes | Not available | Not available
[213] | Comparing the performance of statistical learning and deep learning methods in user traffic prediction | LSTM is better than other schemes | Not available | 45% decrease in delay
[214] | Developing a deep learning architecture for traffic prediction and applying transfer learning among different scenarios | 5 ∼ 20% improvements in different datasets | Not available | Not available

Mobility or Trajectory Prediction
Reference | Summary | Accuracy | Complexity | System Performance
[215] | Comparing LSTM, echo state network, an optimal linear predictor, and Kalman filter in trajectory prediction | LSTM is better than other schemes | Not available | Not available
[216] | Comparing FNN, Extreme Gradient Boosting Trees (XGBoost), and two other benchmarks in location prediction | XGBoost is better than other schemes (error probability: < 10%) | Semi-Markov-based model has the lowest complexity | XGBoost performs best (80.68% energy saving)
[217] | Applying RNN in unmanned aerial vehicle trajectory prediction | Almost 98% prediction accuracy | Not available | Not available
[218] | Applying LSTM in head and body motion prediction to reduce latency in VR applications | Prediction error: < 1° and < 1 mm | Not available | Mismatched pixels: 1 ∼ 3%

QoS or QoE Prediction
Reference | Summary | Accuracy | Complexity | System Performance
[219] | Proposing a combination of CNN, RNN, and Gaussian process classifier for a binary QoE prediction | From 60% to 90% depending on datasets | Not available | Not available
[220] | Applying LSTM in QoS prediction for IoT applications | LSTM outperforms other schemes | Not available | Not available
[221] | Predicting the processing time of radio and baseband network functions with mathematical and deep learning models | DNN outperforms other schemes | Not available | Not available

Approximating Optimal Policies
Reference | Summary | Accuracy | Complexity | System Performance
[222] | Approximating the Weighted Minimum Mean Square Error (WMMSE) power control scheme with FNN | Up to 98% | CPU time: 10% of WMMSE | Near-optimal in small-scale networks
[223] | Using SE/EE as the loss function to train CNNs for power control | Not available | Lower than FNN and WMMSE | Higher SE/EE than WMMSE
[224] | Employing an FNN to approximate the optimal predictive resource allocation policy for delay-tolerant services | 70 ∼ 90% depending on the density of users | Lower than Interior-Point algorithm | 97% of users with satisfactory QoS
The widely used activation functions include the sigmoid function,

f_A(x) = 1/(1 + exp(−x)),  (8)

tanh(x), and the Rectified Linear Unit (ReLU) function,

f_A(x) = max(x, 0),  (9)

as well as its generalizations.

Given an FNN, we can compute the output from any given input according to ŷ = Φ(x^[0]; Ω), where Ω ≜ {W^[n], b^[n], n = 1, ..., N_L + 1} is the set of parameters of the FNN. Let us denote a labeled training sample as (x^[0], y). In the training phase, a supervised deep learning algorithm optimizes the parameters of the FNN to minimize the expectation of the difference between the outputs of the FNN and the labels, i.e.,

min_Ω E_{x^[0]} {(ŷ − y)^2},  (10)

where the expectation is taken over the input data, x^[0].
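The training objective in (10) can be made concrete with a small example. The sketch below (a one-hidden-layer FNN with ReLU hidden units as in (9), full-batch gradient descent, and a toy target y = (x^[0])^2; all sizes and hyperparameters are illustrative assumptions) minimizes the empirical mean squared error over the parameter set Ω:

```python
import numpy as np

# Toy labeled samples (x[0], y): fit y = x^2 on [-1, 1]
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (1, 256))      # inputs, one sample per column
Y = X ** 2                                 # labels
W1, b1 = rng.normal(0, 1, (16, 1)), np.zeros((16, 1))
W2, b2 = rng.normal(0, 1, (1, 16)), np.zeros((1, 1))
lr, n = 0.05, X.shape[1]

for _ in range(5000):
    H = np.maximum(W1 @ X + b1, 0.0)      # hidden layer with ReLU, Eq. (9)
    Y_hat = W2 @ H + b2                   # linear output layer
    E = Y_hat - Y                         # residuals (y_hat - y)
    # gradients of the empirical MSE w.r.t. Omega = {W1, b1, W2, b2}
    gW2 = 2 * E @ H.T / n
    gb2 = 2 * E.mean(axis=1, keepdims=True)
    dH = (W2.T @ E) * (H > 0)             # backprop through the ReLU
    gW1 = 2 * dH @ X.T / n
    gb1 = 2 * dH.mean(axis=1, keepdims=True)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((W2 @ np.maximum(W1 @ X + b1, 0.0) + b2 - Y) ** 2))
```

The loop is a direct empirical version of (10): the expectation over x^[0] is replaced by an average over the training samples, and gradient descent adjusts Ω to shrink it.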
2) Universal Approximation Theorem: In wireless networks, an optimal policy maps the state of the system x^[0] into the optimal action: u*(x^[0]). If we know some optimal state-action pairs, but do not have the expression of u*(·), then we can use an FNN, Φ(·; Ω), to approximate the policy. The following theorem shows how accurate the FNN can be [66, 222].

Theorem 1 (Universal Approximation Theorem): Let u*(·) be a deterministic continuous function defined over a compact set D, e.g., x^[0] ∈ R^K. Then, for any ε_NN > 0, there exists an FNN Φ(·; Ω) constructed by the sigmoid activations in (8), such that

sup_{x^[0] ∈ D} |Φ(x^[0]; Ω) − u*(x^[0])| ≤ ε_NN.

The above theorem has been extended to other kinds of activation functions, including the ReLU function in (9) and its variants [226]. More recently, the results in [222] showed that to achieve a target accuracy, the size of the network scales with the approximation accuracy ε_NN according to O(ln(1/ε_NN)).

B. Predictions in URLLC

As shown in Table VI, supervised deep learning has been employed for channel prediction, mobility prediction, traffic prediction, and QoS or QoE prediction. The reason why supervised deep learning algorithms are very powerful in prediction is that it is very convenient to collect a large number of labeled training samples from the historical data, such as trajectories [45, 218], channel and traffic variations [191, 210], and the QoS experienced by users [219].

1) Channel Prediction: Unreliable wireless channels lead to a high packet loss probability in URLLC [16]. If the transmitter can predict which subchannel will be in deep fading, then packet losses over deep-fading channels can be avoided.

By exploiting the temporal correlation of wireless channels, different RNN variants were developed in [210] to predict future signal strength variations. Wideband channels are correlated in the frequency domain. Based on this feature, a CNN was used to predict channels for mmWave massive MIMO systems [208]. Although these algorithms are data-driven, the understanding of wireless channel models plays an important role. For example, the structures of neural networks depend on the features of wireless channels (e.g., correlations in the temporal and frequency domains). To validate whether the learning algorithms work or not, the model-based methods serve as

both model-based and data-driven schemes were proposed in [191]. Since URLLC has a stringent requirement on reliability, prediction errors cannot be ignored. With the model-based scheme, the authors of [191] analyzed the prediction error probability in the traffic state prediction. With the data-driven scheme, a better bandwidth utilization efficiency can be achieved, but the prediction error probability can only be evaluated via experiments [191].

3) Mobility Prediction: To reveal the potential gain of mobility prediction in URLLC, a model-based method is applied in the prediction and communication co-design framework in [45]. By predicting the motion of the head and body with LSTM, a predictive pre-rendering approach was proposed in [218]. The results in [45] and [218] indicate that mobility prediction can help reduce user-experienced latency in remote control and VR applications.

It is worth noting that prediction is not error-free. The accuracies of different algorithms in trajectory and location predictions were evaluated in [215] and [216]. Compared with model-based methods, data-driven methods can achieve better prediction accuracies. On the other hand, deriving the performance achieved by data-driven methods is very difficult. To obtain some insights on fundamental trade-offs, such as the latency and reliability trade-off, we still need the help of models [45].

4) QoS Prediction: If a control plane can predict the data plane QoS, it can update resource allocation to avoid QoS violations. For example, by predicting the delay caused by baseband signal processing, a better offloading scheme in cloud-RAN or MEC systems was developed in [221]. According to the evaluations in [219, 220], supervised deep learning outperforms other baselines in predicting the QoE of video services and the QoS of IoT applications. Nevertheless, how to predict the QoS or QoE of URLLC services and how to optimize resource allocation for URLLC according to the predicted information remain unclear.

C. Approximating Optimal Policies

With the model-based cross-layer optimization in Section V, we can obtain optimal policies in wireless networks, but with high computing complexity. As a result, these policies can hardly be implemented in real time. To handle this issue, FNNs were used to approximate optimal policies in the existing literature, such as [13, 222].

1) Function Approximator: To apply supervised deep learning in URLLC, the analysis tools and optimization algorithms
benchmarks for performance comparison. discussed in Sections IV and V can be used to find labeled
2) Traffic Prediction: As shown in [211, 212, 214], pre- samples. Recently, the results in [222] showed that an FNN
dicting the traffic load in a wireless network is helpful for can be used to approximate an iterative algorithm, such as
network deployment. These works mainly focus on traffic WMMSE for power control. Based on the Universal Approxi-
prediction in a large area. However, to improve the latency, mation Theorem, it is possible to achieve high approximating
reliability, and resource utilization efficiency of URLLC, the accuracy. Inspired by these theoretical results, the authors
system needs to predict the traffic state of each user. Per-user of [223] further showed that by using a CNN model, the
traffic prediction was studied in [213], the results indicate that required number of weights to achieve the same performance
LSTM outperforms statistical learning for both short and long- is much smaller than that in an FNN model. DNNs can be
term future predictions. To improve the bandwidth utilization also applied to approximate other optimal policies such as
efficiency subject to the latency and reliability requirements, bandwidth allocation and user association [12, 224].
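The function-approximator idea can be sketched in a few lines. The example below is illustrative only: the "optimal policy" is a hypothetical channel-inversion power control rule (not taken from the tutorial), labeled samples are generated from it off-line, and a one-hidden-layer sigmoid FNN of the kind covered by Theorem 1 is fitted by mini-batch SGD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "optimal policy" with a closed form: channel-inversion power
# control, capped at a maximal power of 5. Input: channel gain g.
def optimal_policy(g):
    return np.clip(1.0 / g, 0.0, 5.0)

# Labeled samples generated off-line by the model-based algorithm.
g_train = rng.uniform(0.2, 2.0, size=(1024, 1))
y_train = optimal_policy(g_train)

# One-hidden-layer FNN with sigmoid activations, as in Theorem 1.
H = 32
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1, (H, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.05
for _ in range(5000):
    idx = rng.integers(0, len(g_train), 64)   # mini-batch SGD
    x, y = g_train[idx], y_train[idx]
    h = sigmoid(x @ W1 + b1)                  # forward pass
    y_hat = h @ W2 + b2
    e = y_hat - y                             # back-propagate the MSE loss
    gW2 = h.T @ e / len(x); gb2 = e.mean(0)
    dh = e @ W2.T * h * (1 - h)
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# After training, inference is one forward pass, i.e., real-time friendly.
g_test = rng.uniform(0.2, 2.0, size=(256, 1))
err = np.abs(sigmoid(g_test @ W1 + b1) @ W2 + b2 - optimal_policy(g_test))
print(f"mean approximation error: {err.mean():.3f}")
```

The off-line/online split mirrors the text: the expensive model-based algorithm is only needed to create the training set, while the deployed network runs a single forward pass per decision.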
2) Deep Transfer Learning: To implement supervised deep learning in dynamic wireless networks, the DNN needs to adapt to non-stationary environments and diverse QoS requirements with a few training samples [11]. To achieve this, network-based deep transfer learning is an optional approach [213]. To illustrate the concept of deep transfer learning, we consider two FNNs, Φ0(·; Ω0) and Φ1(·; Ω1). The first FNN is well trained on a data set with a large number of labeled samples, S0 = {(x^[0]_nt, y_nt), nt = 1, 2, ..., Ntr}. Now we train the new FNN, Φ1(·; Ω1), from another data set, S1 = {(x̃^[0]_nf, ỹ_nf), nf = 1, 2, ..., Ntf}, where the number of training samples in S1 is much smaller than that in S0, i.e., Ntf ≪ Ntr. Since the number of labeled samples in S1 is small, we can hardly train Φ1(·; Ω1) from randomly initialized parameters. If the data sets S0 and S1 are related to each other, then we can initialize the new FNN with Ω0 and update it by using the new labeled samples in S1.

[Figure: two FNNs for a source task and a new task; the layers learned on the source task are copied, the first layers are kept fixed (back-propagation stops there), and the last layers are fine-tuned for the new task.]
Fig. 10. Illustration of network-based deep transfer learning [11, 13].

As illustrated in Fig. 10, Ω1 in the new FNN is initialized with Ω0. In the training phase, the first three layers of Ω1 remain constant, and back-propagation stops at the fourth layer. Then, fine-tuning is used to refine the weights and biases of the last two layers on the new training data set with small learning rates [227]. Alternatively, we can fine-tune all the parameters of the FNN. The results in [13] show that the latter approach requires more training samples and longer training time, but does not bring a significant performance gain. To determine how many layers should be updated, we need to try different options and find the best one by trial and error [13].

D. Summary of Supervised Deep Learning

With existing optimization algorithms, a large number of labeled training samples can be obtained off-line for supervised deep learning. After the off-line training phase, the system further fine-tunes the DNNs in non-stationary networks. In this way, theoretical knowledge helps to initialize DNNs before online implementation, and deep transfer learning handles the model-mismatch problem with the help of a few training samples in new environments.

Although supervised deep learning shows great potential in prediction and function approximation, there are some open problems when using it in URLLC:
• The prediction or approximation accuracy of DNNs affects the QoS of URLLC. The relative errors achieved by most of the existing supervised learning algorithms in Table VI are around 0.01, which is good enough for traditional services but far from the requirements of URLLC.
• In some NP-hard problems, labeled training samples are unavailable. For these problems, supervised deep learning is not applicable.
• The number of users and the types of services in a wireless network are dynamic. When the features and the dimension of the input data change, deep transfer learning cannot be applied directly.

In the coming two sections, we review unsupervised deep learning and DRL, which do not need labeled training samples.

VII. UNSUPERVISED DEEP LEARNING FOR URLLC

Unlike supervised deep learning, unsupervised deep learning can find the optimal policy without labeled training samples. In this section, we review two branches of existing works that applied unsupervised deep learning in wireless communications: 1) autoencoders for physical-layer communications, and 2) FNNs trained without labels for resource allocation. The existing works are summarized in Table VII, where the basic idea is to formulate a functional optimization problem and then approximate the optimal function with a DNN trained toward a loss function that represents the goal of the problem.

A. Introduction to Functional Optimization

A lot of problems in communications can be described by a mapping from an input message (or system state), s, to an output signal (or action), a^s, i.e.,

a^s = u(s).

For example, a modulation and coding scheme maps the input bit stream to coding blocks. A scheduler maps QSI and CSI to the resource blocks allocated to different users.

1) Variable Optimization Problems: A regular variable optimization problem can be formulated in a general form as follows,

min_{a^s} F(s, a^s)  (11)
s.t. Gj(s, a^s) ≤ 0, j = 1, ..., J,  (11a)

where F(s, a^s) is the objective function and the constraints in (11a) are imposed by the available resources or the QoS.

By solving problem (11), we can obtain the optimal solution a^s∗ for a given input message or system state s. When the input
TABLE VII
UNSUPERVISED DEEP LEARNING IN WIRELESS NETWORKS

Part I: Autoencoder for Physical-Layer Communications (Unconstrained Functional Optimization)
Year | Reference | Summary | Performance
2018 | [61] | An autoencoder was used to learn a better constellation | The BLER gap between the fine-tuned autoencoder and an existing constellation is 1 dB
2018 | [228] | An autoencoder was applied for E2E learning in OFDM systems for reliable communications over multipath channels | The autoencoder achieves better BLER than Quadrature Phase Shift Keying (QPSK)
2018 | [229] | A joint source and channel coding scheme was proposed to minimize the E2E distortion | The learning-based method outperforms an existing separate source and channel coding scheme
2019 | [230] | An autoencoder was used for multiuser waveform design at the transmitter side | The learning-based method outperforms linear MMSE in terms of lower BLER

Part II: Primal-Dual Method for Radio Resource Allocation (Constrained Functional Optimization)
Year | Reference | Summary | Performance
2019 | [63] | The primal-dual method is used to train an FNN to find the optimal resource allocation subject to statistical constraints | The sum-capacity of the learning-based method is close to WMMSE
2019 | [120] | Applying the primal-dual method to train DNNs for distributed wireless resource management problems | Distributed DNNs can achieve near-optimal sum-capacity
2019 | [25] | Applying the primal-dual method to train a DNN for resource allocation subject to the QoS constraints of URLLC | The QoS of URLLC can be satisfied by reserving 1% of extra bandwidth
2019 | [231] | Developing a model-free primal-dual algorithm for learning optimal ergodic resource allocations | The sum-capacity achieved by the model-free approach is close to WMMSE
2020 | [232] | Developing a reinforced-unsupervised learning framework for model-free optimization problems | Unsupervised learning outperforms supervised learning when the number of training samples is large
2020 | [233] | Training a Graph Neural Network (GNN) to find the optimal resource allocation in large-scale wireless networks | GNN outperforms WMMSE in large-scale networks

message or the state varies, the system needs to update a^s∗ by solving problem (11) numerically, unless the optimal solution can be derived in a closed-form expression with respect to s. In most cases, however, we can hardly derive the closed-form expression, and the optimization algorithms for solving problem (11) are too complicated to be implemented in real time.

2) Functional Optimization Problems: As proved in [25], variable optimization problem (11) is equivalent to the following functional optimization problem,

min_{u(·)} Es[F(s, u(s))]  (12)
s.t. Gj(s, u(s)) ≤ 0, ∀s ∈ Ds, j = 1, ..., J.  (12a)

In problem (12), the elements to be optimized are resource allocation or scheduling functions of the states of the users. By solving the above problem, we can obtain the expression of the optimal policy. However, solving a functional optimization problem is more challenging than solving a variable optimization problem [42, 234]. This is because the optimization variables are functions, each of which is equivalent to an infinite-dimensional vector (e.g., a Fourier series with infinitely many terms). Moreover, each constraint in (12a) is equivalent to an infinite number of constraints in a variable optimization problem (one constraint for every value of s).

3) Methods for Solving Functional Optimization Problems: To derive the closed-form solution of a functional optimization problem, we can find the solutions that satisfy the first-order necessary conditions in [235]. For example, optimizing the instantaneous power control policy to maximize the ergodic capacity under a maximal transmit power constraint belongs to this class of problems [24]. By solving the first-order necessary conditions, we can derive that the optimal power control policy is the water-filling policy. Compared with this physical-layer optimization problem, cross-layer optimization problems are much more complicated, and the closed-form expressions of the optimal policies are usually unavailable. For example, C. She et al. investigated how to adjust transmit power and subcarrier allocation according to QSI in MIMO-Orthogonal Frequency Division Multiplexing (OFDM) systems. The problem was formulated as a functional optimization problem, and a closed-form solution can only be obtained for a specific packet arrival process in an asymptotic case, where the number of antennas at the BS goes to infinity [166].

When there is no closed-form solution, we need numerical methods to find near-optimal solutions. There are several numerical methods for solving functional optimization problems, such as the finite element method (FEM) [236]. The basic idea of the FEM is to parameterize u(s) with a finite set of sampled points and to approximate the optimal solution by optimizing the values at the sampled points. However, the number of sampled points increases exponentially with the dimension of s. In addition, these numerical methods are usually used to solve problems formulated in three-dimensional space and rely heavily on theorems in dynamics [237]. Whether they can work well in high-dimensional communication problems remains unclear.

Inspired by the Universal Approximation Theorem in [66, 226], we can instead use a DNN to approximate the optimal policy u∗(·) that maps the state s to the optimal solution a^s∗, and find the parameters of the DNN with unsupervised deep learning methods.
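This idea can be sketched with a toy problem. The objective F(s, u) = u − log(1 + s·u) below is a hypothetical stand-in (not from the tutorial) chosen because its minimizer is known in closed form, u∗(s) = 1 − 1/s for s > 1, so the learned policy can be checked. The key point is that the gradient used for training comes from the known objective itself, not from labeled samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical objective: for each state s, minimize F(s, u) = u - log(1 + s*u).
# Its gradient with respect to the policy output u is known analytically.
def f_grad_wrt_u(s, u):
    z = np.maximum(1.0 + s * u, 1e-2)          # numerical guard
    return np.clip(1.0 - s / z, -10.0, 10.0)   # dF/du

# DNN policy u(s) = Phi(s; Omega): one hidden sigmoid layer, linear output.
H = 16
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def policy(s):
    h = sigmoid(s @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
for _ in range(4000):
    s = rng.uniform(1.2, 3.0, size=(64, 1))    # sample states; NO labels
    h, u = policy(s)
    g_u = f_grad_wrt_u(s, u)                   # loss gradient from the model,
                                               # i.e., E_s[F(s, u(s))] is the loss
    gW2 = h.T @ g_u / len(s); gb2 = g_u.mean(0)
    dh = g_u @ W2.T * h * (1 - h)
    gW1 = s.T @ dh / len(s); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

s_test = np.linspace(1.2, 3.0, 200).reshape(-1, 1)
_, u_learned = policy(s_test)
err = np.abs(u_learned - (1.0 - 1.0 / s_test))
print(f"mean |u - u*|: {err.mean():.3f}")
```

This is the unconstrained case; the constrained problems of the next subsection additionally fold the constraints into the loss via Lagrange multipliers.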
B. Autoencoder for E2E Design in Physical-Layer Communications

The autoencoder is a useful tool for data compression and feature extraction [238]. The basic idea is to train DNNs with the input data. An autoencoder consists of three components: an encoder, a decoder, and a distortion function that compares the input and the output of the autoencoder.

[Figure: (a) an autoencoder, where the encoder maps the input x^a to a code y^a and the decoder reconstructs the output x̂^a; (b) physical-layer communications, where the transmitter maps a message x^c to symbols y^c, the channel outputs ŷ^c, and the receiver recovers the message x̂^c.]
Fig. 11. Autoencoder for physical-layer communications [61, 238].

As shown in Fig. 11(a), the input and the output of the autoencoder are x^a ∈ R^{N^a_x × 1} and x̂^a ∈ R^{N^a_x × 1}, respectively. The compressed code is y^a ∈ R^{N^a_y × 1}, where N^a_y < N^a_x. Denote the encoder and the decoder by fe(·) and fd(·), respectively. The autoencoder problem is to find the fe(·) and fd(·) that minimize the average of the distortion function [238],

min_{fe(·), fd(·)} E[(x̂^a − x^a)^2]  (13)

where the average is taken over the input, and (x̂^a − x^a)^2 is the distortion function. Problem (13) is an unconstrained functional optimization problem [234]. To solve it, a widely applied approach is to approximate the encoder and the decoder with two DNNs [239].

The components of physical-layer communications are shown in Fig. 11(b), and they are very similar to those of an autoencoder. The transmitter sends a message, x^c, to the receiver over a certain channel. First, the transmitter encodes the message into a coding block that consists of multiple symbols, y^c, such as M-QAM symbols. Then, the block is sent over the channel. Owing to noise and interference, the received symbols, denoted by ŷ^c, are different from y^c. Based on the received symbols, the receiver tries to decode the block and recover the message, x̂^c. The communication problem is to find the optimal encoder and decoder that minimize the BLER, i.e.,

min_{fe(·), fd(·)} E{1(x̂^c ≠ x^c)},

where the average is taken over the input messages and stochastic channels, and 1(x̂^c ≠ x^c) is an indicator of the event x̂^c ≠ x^c: if x̂^c ≠ x^c happens, the indicator equals 1; otherwise, it equals 0.

The encoder and the decoder of the communication system can be approximated by two DNNs [61, 229]. To allow for back-propagation, the channel should also be approximated by a DNN [229]. Considering that the real-world channel is not available, the autoencoder is first trained with a stochastic channel model and then implemented in real hardware for over-the-air transmissions [61]. To compensate for the mismatch between the model and the real-world channel, the system can fine-tune the receiver part of the autoencoder.

As summarized in Part I of Table VII, the above method has been implemented in several communication systems. Since the goal is to minimize the difference between the input data and the output data, the autoencoder is suitable for URLLC. For example, the results in [230] show that the scheme obtained with the autoencoder can achieve a much lower BLER than the linear minimum mean-square error MIMO receiver. In addition, autoencoders inherently learn how to deal with hardware impairments [61, 228]; in other words, they do not rely on the accuracy of the models of communication systems. Furthermore, after the training phase, the outputs of the encoder and the decoder, represented by DNNs, can be computed with the forward-propagation algorithm, which has much lower complexity than existing modulation and coding schemes.

C. Optimizing Resource Allocation

1) Model-based learning: In the case that the objective function and constraints can be derived with the model-based methods in Section IV, a resource allocation problem can be formulated in a general form as follows,

min_{u(·)} Es[F(s, u(s))]  (14)
s.t. Es[G^ave_i(s, u(s))] ≤ 0, i = 1, ..., I,  (14a)
G^ins_j(s, u(s)) ≤ 0, ∀s ∈ Ds, j = 1, ..., J,  (14b)
u(s) ∈ Da.

In addition to the instantaneous constraints in problem (11), long-term statistical constraints are considered in (14a). The statistical constraints can be the average transmit power constraint [24] or the statistical QoS constraint [171]. The instantaneous constraints in (14b) ensure instantaneous requirements, such as the maximal transmit power constraint. Here, the objective function and the constraints are differentiable but not necessarily convex.

The primal-dual method for solving functional optimization problems has been extensively used in optimal control theory [240]. To apply the method, we first define the Lagrangian of problem (14) in Fig. 12, where the Lagrange multipliers, λ^G_i ≥ 0 and v^G_j(s) ≥ 0, ∀s ∈ Ds, are the weighting coefficients of the constraints. To solve problem (14), we turn to the following problem:

max_{λ^G_i, i=1,...,I; v^G_j(s), j=1,...,J}  min_{u(s)} L_opt  (15)
s.t. λ^G_i ≥ 0, v^G_j(s) ≥ 0, ∀s ∈ Ds,  (15a)

where the policy u(s) and the Lagrange multipliers are optimized in the primal and dual domains, iteratively.

To obtain u(s) and the Lagrange multipliers numerically, two neural networks, Φu(s; Ωu) and Φv(s; Ωv), are used to approximate u(s) and v^G(s), respectively, where v^G(s) ≜ [v^G_1(s), ..., v^G_J(s)]^T. Then, by using the Lagrangian in Fig. 12
[Figure: the loss function is the Lagrangian, built from theoretical knowledge; it combines the policy (optimized in the primal domain) with the weighting coefficients of the constraints (optimized in the dual domain).]
Fig. 12. Definition of the Lagrangian [67].

as the loss function, we can update the primal variables, Ωu, and the dual variables, λ^G_i and Ωv, iteratively with the stochastic gradient descent algorithm. As will be shown in the case studies, by integrating the theoretical formulations of the objective function and constraints into the loss function, unsupervised deep learning outperforms supervised deep learning in terms of approximation accuracy.

For the applications in Table I, communication systems need to satisfy multiple conflicting KPIs, which are either considered in the constraints or formulated as the objective function. A straightforward approach is to minimize/maximize a weighted sum of the different KPIs by setting the weighting coefficients manually. From the definition of the Lagrangian in Fig. 12, we can see that with different weighting coefficients, the optimal policies that minimize the Lagrangian are different. The primal-dual method is able to find the optimal weighting coefficients that achieve the target balance between the objective function and the constraints. As such, the objective function is minimized and the constraints are satisfied [120].

2) Model-free learning: In the case that there is no theoretical knowledge of the considered system, the expressions of F, G^ave_i, and G^ins_j, as well as the gradients or derivatives of these functions, are unavailable. For an optimization problem as in (14), one can train neural networks to approximate these functions from their observed values with the supervised deep learning method. Then, the unknown gradients or derivatives can be estimated by the neural networks through backward propagation. The details of the model-free unsupervised deep learning method can be found in [63, 231, 232, 241].

D. Summary of Unsupervised Deep Learning

Theoretical models can be used to initialize autoencoders and to formulate optimization problems. Since autoencoders are trained from the input data, they do not require analytical models of communication systems and can handle hardware impairments. For analytically intractable optimization problems, an approximation of the optimal solution can be obtained by combining the formulas into the loss function of unsupervised deep learning, i.e., the Lagrangian.

Although unsupervised deep learning shows the above advantages over supervised deep learning, it also suffers from some critical issues:
• When applying autoencoders in physical-layer design, the function that maps the input to the output of a real-world channel is not available, and hence we cannot obtain the gradient required to fine-tune the transmitter.
• When applying model-based unsupervised deep learning in functional optimization, the objective function and constraints should be differentiable. Thus, it cannot be applied to integer or mixed-integer programming directly.
• Like supervised deep learning, how to update the neural network according to dynamic features, such as the dimension of the input data, needs further investigation.

In the next section, we illustrate DRL, which can optimize both continuous and discrete variables.

VIII. DEEP REINFORCEMENT LEARNING FOR URLLC

DRL has been widely applied in the optimal control of stochastic processes. The most important feature of DRL is that it learns from feedback that evaluates the actions taken, rather than from correct actions. In this section, we first introduce the basics of DRL and then summarize the existing studies that applied DRL in wireless networks. To guarantee the QoS requirements of URLLC, we review a promising candidate: constrained DRL. Finally, we illustrate how to combine model-based tools with DRL in URLLC.

A. Introduction of DRL

1) Elements of Reinforcement Learning: In reinforcement learning, an agent (or multiple agents) observes the reward signal from the environment, takes actions according to a policy, and tries to maximize a value function. If a model of the environment is available, it can help improve the learning efficiency.

Policy: Let s(t) ∈ Ds and a^s(t) ∈ Da denote the state and the action in the t-th time slot. A policy is a mapping from the states of the environment to the actions an agent may take in those states. We denote a stochastic policy by π. When the numbers of states and actions are finite, π consists of the probabilities of performing the different actions in every state.

Reward signal: In each time slot, a real number, the reward signal, is sent to the agent from the environment. It defines the goal of the problem, and the agent aims to maximize the long-term reward. The reward signal depends on the state and the action taken by the agent, but the agent cannot change the function that generates the reward signal. We denote the reward signal in the t-th slot by r(s(t), a^s(t)), which can be simplified as r(t).

Value functions: In general, a reinforcement learning algorithm aims to maximize the long-term return, such as the accumulated discounted reward,

G(t) = Σ_{i=t}^{∞} γ_d^{i−t} r(s(i), a^s(i)),  (16)
where γ_d ∈ [0, 1) is the discounting factor. G(t) reflects the long-term return that the agent will receive after the t-th slot in an infinite horizon. Similarly, we can also formulate the discounted reward in a finite horizon if the task ends after finitely many steps, G(t) = Σ_{i=t}^{t_max} γ_d^{i−t} r(s(i), a^s(i)). This definition is also referred to as the partial return in [68]. The value of taking action a^s(t) in state s(t) under a policy π, denoted by Q^π(s(t), a^s(t)), is the expected return starting from state s(t), taking the action a^s(t), and thereafter following policy π. Q^π(s(t), a^s(t)) is also known as the state-action value function. Similarly, the expected return of visiting state s(t) is defined as V^π(s(t)), which is referred to as the state value function.

2) Exploitation and Exploration: The trade-off between exploitation and exploration is one of the challenges that arise in reinforcement learning. By exploiting its experience, the agent can obtain a good reward that maximizes the estimated value function. To explore better actions, the agent needs to try some actions that are not optimal according to its previous experience. There are many sophisticated methods for balancing exploitation and exploration, such as ε-greedy methods [242], using optimistic initial values [68], and the Bayesian sampling approach [243]. However, when using reinforcement learning in URLLC, we need to satisfy the QoS constraints during exploration. Thus, random exploration is not applicable. We will discuss how to re-design exploration schemes for URLLC in Section VIII-E.

3) Markov Decision Process (MDP): To apply reinforcement learning in the optimal control of a dynamic system, the uncertainty of the system should be formulated in the framework of an MDP [68]. An MDP is a controlled stochastic process defined by a state space, Ds, an action space, Da, a reward function, Ds × Da × Ds → R, that generates the reward signal according to the states and the action, and transition probabilities, i.e.,

Pr[s(t+1), r(t+1) | s(t), a(t)]
= Pr[s(t+1), r(t+1) | s(0), a(0), r(0), ..., s(t), a(t), r(t)].  (17)

The transition probabilities in (17) indicate that we can predict the state and reward in the (t+1)-th slot based solely on the state and action in the t-th time slot, just as well as if we knew the states and actions in all the previous slots. Such a property is referred to as the Markov property, or "memorylessness".

In reinforcement learning, the transition probabilities of the MDP are referred to as the "model" of the problem. This "model" is different from the models in wireless communications. Depending on whether the transition probabilities are available or not, reinforcement learning algorithms are classified into two categories: model-based and model-free [23, 68]. In this work, we only review model-free reinforcement learning, which does not need the transition probabilities, and use the term knowledge-assisted reinforcement learning for the other category of algorithms, which exploit knowledge of wireless communications to improve the learning efficiency.

4) Bellman Equation: In the framework of an MDP, Q^π(s(t), a^s(t)) can be formally defined as follows,

Q^π(s(t), a^s(t)) = E[G(t) | s(t), a^s(t)],  (18)

where the expectation in (18) is taken over {r(s(i), a^s(i)), s(i+1), a^s(i+1), i ≥ t}, following policy π. We call the function in (18) the action-value function for policy π.

By applying G(t) = r(s(t), a^s(t)) + γ_d G(t+1), we can express the value function in a recursive form, known as the Bellman equation [68]. To avoid complicated expectations, we illustrate the Bellman equation with a deterministic policy, which can be described as a^s(t) = u(s(t)). In this case, the value function in (18) can be simplified as follows,

Q^u(s(t), a^s(t)) = E[r(s(t), a^s(t)) + γ_d Q^u(s(t+1), u(s(t+1)))],  (19)

where the expectation is taken over r(s(t), a^s(t)) and s(t+1).

5) Actor-Critic Algorithm: For a reinforcement learning problem, we aim to find the optimal policy that maximizes the value function in (19). It is well known that, when the transition probabilities are known at the agent, the optimal policy can be obtained by dynamic programming [244]. However, due to the curse of dimensionality, dynamic programming can hardly be implemented when the dimensions of the observation/action spaces are high. To handle this issue, the actor-critic algorithm was proposed in [245]. The basic idea of the actor-critic algorithm is to combine DNNs with reinforcement learning by approximating the policy and the value function with two DNNs [246]. For example, if the policy is a deterministic and continuous function, the deep deterministic policy gradient algorithm can be applied [64].

[Figure: the actor (policy FNN) takes an action in the environment, which is either the real-world environment or a simulation platform; the critic (value-function FNN) observes the state and updates the estimated value function, based on which the policy is updated.]
Fig. 13. Illustration of the actor-critic algorithm with FNNs [64, 68].
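The actor-critic loop in Fig. 13 can be sketched in a deliberately stripped-down form. The example below is hypothetical: a single-state, one-dimensional environment whose expected reward peaks at action a = 0.7, a scalar deterministic actor, and a critic that is linear in hand-crafted features rather than an FNN, updated in the spirit of the deterministic policy gradient of [64].

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical single-state environment: expected reward -(a - 0.7)^2.
def step(a):
    return -(a - 0.7) ** 2 + 0.01 * rng.normal()

theta = 0.0            # actor: deterministic policy, here a single action
w = np.zeros(3)        # critic: Q(a) ~ w0 + w1*a + w2*a^2 (linear features
                       # instead of an FNN, to keep the sketch short)

lr_critic, lr_actor = 0.1, 0.02
for _ in range(3000):
    a = theta + 0.3 * rng.normal()            # exploration noise on the action
    r = step(a)                               # observe reward from environment
    phi = np.array([1.0, a, a * a])
    w += lr_critic * (r - w @ phi) * phi      # critic: regress Q toward reward
    theta += lr_actor * (w[1] + 2.0 * w[2] * theta)  # actor: ascend dQ/da

print(f"learned action: {theta:.2f}")
```

The learned action approaches 0.7. The same two-time-scale pattern (critic fitted to observed returns, actor ascending the critic's gradient) carries over when both components are DNNs, as in Fig. 13.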
An actor-critic algorithm with FNNs is illustrated in Fig. 13: given the current state, the actor takes an action according to the output of its FNN, and the reward from the environment is then sent to the actor and the critic for updating the policy and the approximation of the value function, respectively. For a deterministic policy, an actor-critic model-free algorithm was proposed in [64], where the algorithm learns directly from the real-world environment. To avoid QoS violations during the training phase, the authors of [12] train the policy in a simulation platform that is built upon theoretical knowledge and practical configurations.

B. DRL in Wireless Networks

As summarized in Table VIII, DRL algorithms have been applied in wireless networks for mobile networking [248–250], scheduler design [251–255], resource allocation [256, 257, 259, 261–263], and routing [264, 265].

Key Challenges: According to [253], the research challenges when applying DRL in wireless networks lie in the following aspects:
• Non-Markovian Processes: In practical systems, optimal control problems may not be Markovian. For example, the network dynamics in the real-world scenarios in [253] are not Markovian. However, to implement DRL, we need to formulate the problem in the framework of an MDP; otherwise, the Bellman equation does not hold.
• Safe Exploration: Exploration in an unknown environment may lead to bad feedback, especially for mission-critical IoT in factory automation or vehicular networks. If an action cannot guarantee ultra-high reliability, it may result in unexpected accidents. Thus, we need to avoid bad actions during exploration to improve the safety of the system.
• Long Convergence Time: With high-dimensional observation/action spaces, the convergence time of model-free DRL is long. In addition, estimating the reliability of URLLC in real-world networks is time-consuming. For example, to evaluate the reward of an action in URLLC, the system needs to send a large number of packets to estimate the reliability of the action [12]. As a result, it takes a very long time to approximate the value function.

In the following subsections, we review some methods to address the above challenges.

C. Techniques for Improving DRL in URLLC

When using DRL, there is no strict proof of convergence, owing to the approximations of the policy and the value function [68]. Empirical evaluations in some existing publications showed that DRL converges slowly when the state and action spaces are large [64, 246]. Thus, how to reduce the convergence time of DRL is one of the most important topics in this area.

1) Reward Shaping: As discussed in [266], delayed reward remains one of the key challenges in many reinforcement learning problems. It means that an agent receives a delayed reward after taking several steps. For example, when playing chess, the agent only receives a positive or negative reward at the end of a game.

Reward shaping is a technique to handle this issue by modifying the reward function according to the design goal of the system [267]. With reward shaping, the instantaneous reward in the t-th slot is redefined as

r̃(s(t), a^s(t)) = r(s(t), a^s(t)) − Ψ(s(t)) + γ_d Ψ(s(t+1)),  (20)

where Ψ(s(t)) is referred to as the potential function in [267]. It is a function that depends on the state and does not rely on the action. Such a function reflects the potential benefit of visiting a state. For example, to meet the latency and jitter requirements of URLLC, the Head-of-Line (HoL) delays of the packets should lie in a small range, [Dmin, Dmax]. The agent receives a positive reward only when a packet is transmitted within this range; otherwise, r(s(t), a^s(t)) = 0. To avoid zero rewards, we can design a potential function as in Fig. 14. If the HoL delay of a packet is lower than Dmin, the potential function increases with the HoL delay. In this region, the packet should not be transmitted, in order to obtain a positive r̃(s(t), a^s(t)). If the HoL delay is higher than Dmin, the potential function decreases with the HoL delay. In this region, the packet should be transmitted as soon as possible; otherwise r̃(s(t), a^s(t)) is negative.

As proved in [267], for an arbitrary potential function, reward shaping does not change the optimal policy of a reinforcement learning algorithm. With this conclusion, we have the flexibility to design any potential function according to our understanding of the problem. How to design potential functions for different problems has been widely studied in the existing literature, such as [266, 268]. We will evaluate the performance of reward shaping in URLLC with a concrete example in Section IX.

[Fig. 14 (labels recovered from the figure): the potential function plotted against the state (HoL delay), with the target delay, the target jitter window, and the extra reward marked.]
[23]. On the other hand, the approximation errors of the policy
Fig. 14. An example of potential function [71, 267].
and value function may have impacts on the reliability of
URLLC. How to achieve high reliability when using DRL 2) Multi-head Critic: The DRL introduced in Section
to optimize communication systems has not been investigated VIII-A evaluates the long-term reward of the system with a
in the existing literature. In this subsection, we review some single-head critic, where the output of the critic is a scalar
promising techniques for reducing the convergence time and that approximates the value of system when an action is taken
improving the approximation accuracy of DRL. at a state. However, a wireless network consists of multiple
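The potential-based shaping in (20) and the tent-shaped potential of Fig. 14 can be sketched as follows. This is a minimal illustration, not the implementation from [71, 267]: the window [D_MIN, D_MAX], the discount factor, and the piecewise-linear potential are hypothetical values chosen for the example.

```python
# Hypothetical delay window (in slots) and discount factor; the exact
# values depend on the URLLC requirement and are assumptions here.
D_MIN, D_MAX, GAMMA_D = 5.0, 7.0, 0.95

def potential(hol_delay):
    """Tent-shaped potential as in Fig. 14: rises until D_MIN, falls after."""
    if hol_delay <= D_MIN:
        return hol_delay / D_MIN                # encourage waiting until D_MIN
    # beyond D_MIN the potential decreases, so the packet should be sent fast
    return max(0.0, (D_MAX - hol_delay) / (D_MAX - D_MIN))

def shaped_reward(raw_reward, hol_now, hol_next):
    """r~(s, a) = r(s, a) - Psi(s) + gamma_d * Psi(s') as in (20)."""
    return raw_reward - potential(hol_now) + GAMMA_D * potential(hol_next)
```

With this shaping, holding a packet whose HoL delay is still below D_MIN yields a positive shaped reward, while holding it beyond D_MIN yields a negative one, even when the raw reward is zero.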
TABLE VIII
DRL IN WIRELESS NETWORKS

Surveys or Tutorials
[60] A comprehensive survey on applications of DRL in communications and networking.
[22] A tutorial on the implementation of DRL in beyond-5G systems.
[247] A tutorial on applying DRL in resource allocation at the network edge.

Mobile Networking
[248] Contribution: Optimizing the handover controller at each user via a model-free asynchronous advantage actor-critic (A3C) framework. Performance: A3C outperforms the upper confidence bandit algorithm and the 3GPP protocol in terms of throughput and handover rate.
[249] Contribution: Mitigating internet congestion by adjusting the data rate of each sender via DRL. Performance: The proposed framework can achieve a better latency-throughput trade-off than existing approaches.
[250] Contribution: A fast-learning DRL model that dynamically optimizes network slicing in WiFi networks. Performance: The proposed model-free approach with pre-training is promising in dynamic environments.

Scheduler Design
[251] Contribution: Energy-efficient scheduling policy for vehicle safety and QoS concerns. Performance: The DRL approach outperforms other methods in terms of incomplete requests, battery lifetime, and latency.
[252] Contribution: Energy-efficient user scheduling and resource allocation in heterogeneous networks. Performance: Actor-critic learning can achieve higher reward than Q-learning.
[253] Contribution: Actor-critic DRL for traffic scheduling in cellular networks with measured data. Performance: The DRL method can achieve higher reward (network utilization, throughput, congestion) than some benchmarks.
[254] Contribution: Scheduler selection for minimizing packet delays and packet drop rates. Performance: The proposed policy outperforms existing scheduling policies in terms of both delay and packet drop rate.
[255] Contribution: Task scheduling and offloading for minimizing average slowdown and timeout period in MEC. Performance: The DRL method outperforms existing schedulers in terms of average slowdown and average timeout period.

Resource Allocation
[256] Contribution: DRL for switching remote radio heads ON and OFF in cloud radio access networks. Performance: DRL outperforms single-BS association and fully coordinated association in terms of power consumption.
[257] Contribution: Minimizing execution delay and energy consumption in MEC by optimizing offloading and resource allocation. Performance: DRL outperforms benchmarks in terms of execution delay and energy consumption.
[258] Contribution: Decentralized resource allocation for vehicle-to-vehicle communications. Performance: DRL achieves higher data rates in vehicle-to-vehicle and vehicle-to-infrastructure links than existing methods.
[259] Contribution: Maximizing a weighted sum of spectrum efficiency and quality-of-experience in network slicing. Performance: DRL can enhance the effectiveness and agility of network slicing by matching the allocated resources to users' demand.
[260] Contribution: Minimizing long-term system power consumption in Fog-RAN. Performance: The energy consumption achieved by DRL is lower than benchmarks and can be further reduced by transfer learning.
[261] Contribution: Minimizing average energy consumption or average latency in MEC IoT networks by using Monte Carlo tree search and multi-task learning. Performance: The proposed DRL framework outperforms other DRL algorithms and benchmarks in terms of average energy consumption and average service latency.
[262] Contribution: Applying the model-based DRL in [23] to resource management in network slicing. Performance: By introducing discrete normalized advantage functions into DRL, better SE and QoE can be achieved.
[263] Contribution: Applying DRL to broadcast beam optimization for both periodic and Markov mobility patterns. Performance: DRL converges to the optimal solutions and can adjust actions according to user distributions.

Routing
[264, 265] Contribution: Developing a GNN-based framework, RouteNet, to predict mean delay, jitter, and packet loss [264], and combining GNN with DRL for routing optimization [265]. Performance: RouteNet outperforms model-based analysis in terms of prediction accuracy, and GNN+DRL outperforms existing DRL, especially when the topology is changing.

2) Multi-head Critic: The DRL introduced in Section VIII-A evaluates the long-term reward of the system with a single-head critic, where the output of the critic is a scalar that approximates the value of the system when an action is taken at a state. However, a wireless network consists of multiple components, such as multiple BSs and multiple users. As shown in [269], the single-head critic is not aware of the rewards of different components of a system, and hence the agent learns slowly and the algorithm can be unstable. To handle this issue, a multi-head critic was proposed in [269] to learn a separate value function for each component of the system. The difference between single-head and multi-head critics is illustrated in Fig. 15. In Section IX, we will apply the multi-head critic in a multi-user URLLC system to improve the learning efficiency of DRL and the reliability of each user.

Fig. 15. Difference between single-head and multi-head critics [71, 269]: (a) a single-head critic approximates the long-term reward of the network; (b) a multi-head critic approximates the long-term rewards of multiple components of the network.

3) Importance Sampling: In actor-critic DRL algorithms, the state-value function is approximated by a DNN, i.e., the critic. Due to approximation errors, the action that maximizes the output of the critic may be different from the action that maximizes the true state-value function. As a result, the wireless communication system may take a wrong action, leading to packet losses in URLLC. To improve the reliability of URLLC, we need to reduce the approximation errors of DRL.

In the training phase, DRL selects a batch of transitions from the replay memory to train the critic by using the Bellman equation. Let us denote the critic by Q̂(s(t), a(t)|Ω_Q), where Ω_Q represents the parameters of the DNN. The approximation error at the state-action pair (s(t), a(t)) can be characterized by

[y(t) − Q̂(s(t), a^s(t)|Ω_Q)]²,   (21)

where y(t) ≜ r(t) + Q̂(s(t+1), a^s(t+1)|Ω_Q) is the approximation of the right-hand side of the Bellman equation in (19).

In most of the existing publications in Table VIII, the transitions are selected from the replay memory with the same probability. For the state-action pairs that are rarely visited, the transitions are rarely selected in the training phase. Thus, the approximation errors could be high. To improve the approximation accuracy, the importance sampling proposed in [270] is a promising approach. The basic idea is to define a weight of each transition according to the approximation error, w(t) = [y(t) − Q̂(s(t), a^s(t)|Ω_Q)]². Then, the probability that a transition will be selected is proportional to its weight. In this way, the transitions with higher approximation errors are more likely to be selected than the transitions with lower approximation errors. In Section IX, we will illustrate how importance sampling helps improve the reliability of URLLC.

D. Constrained DRL for URLLC

When applying DRL for the URLLC applications in Table I, the long-term reward is usually defined as a weighted sum of different KPIs. With different weight coefficients, the finally achieved KPIs are different. To guarantee the requirements on latency and reliability, the coefficients should be optimized automatically. However, in most of the existing DRL algorithms in Table VIII, these coefficients are predetermined as hyper-parameters. To determine the values of the coefficients, constrained DRL can be applied [271].

By formulating some long-term costs as statistical constraints in an MDP, we can apply constrained DRL for QoS guarantees in URLLC [272]. To illustrate how to guarantee QoS in URLLC, M_c cost functions C_m^MDP(t), m = 1, ..., M_c, are considered. The cost functions are defined as follows [271],

C_m^MDP(t) = E[ Σ_{i=t}^{∞} γ_d^{i−t} c_m^MDP(s(i), a^s(i), s(i+1)) ],

where c_m^MDP(s(i), a^s(i), s(i+1)) is the instantaneous cost. A long-term cost could be the average queueing delay [273, 274], the access cost for handovers [275], the visitation probability of some states [276], or the cost in the tail of a risk distribution [277].

In URLLC, we aim to find the optimal policy that maximizes the long-term reward, such as resource utilization efficiency, subject to the QoS constraints, i.e.,

max_π E[G(t)]   (22)
s.t. C_m^MDP(t) ≤ C_m^QoS, m = 1, ..., M_c,   (22a)

where C_m^QoS is the QoS requirement. To solve problem (22), one needs to evaluate the long-term reward as well as the long-term costs. Due to this fact, finding the solution of constrained DRL is more challenging than regular DRL. To solve the constrained MDP, the primal-dual method, which is referred to as the Lagrangian approach in [272], can be applied [271].

Similar to constrained unsupervised deep learning, the Lagrangian of problem (22) can be expressed as

L^CMDP(π, λ^CMDP) = E[G(t)] − Σ_m λ_m^CMDP (C_m^MDP(t) − C_m^QoS),   (23)

where λ^CMDP = (λ_1^CMDP, ..., λ_{M_c}^CMDP) is the Lagrangian multiplier. Then, problem (22) is converted to an unconstrained problem as follows,

min_{λ_m^CMDP > 0, m = 1, ..., M_c} max_π L^CMDP(π, λ^CMDP).
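The min-max iteration above can be sketched numerically as follows. This is a schematic of the dual-ascent half of the primal-dual method, not the full algorithm of [271, 272]: the cost estimates, QoS bounds, and step size are hypothetical placeholders, and in practice C_m^MDP(t) must itself be estimated from sampled trajectories.

```python
# One dual-ascent step of the primal-dual method for problem (22):
# each multiplier grows when its long-term cost estimate exceeds the
# QoS bound and shrinks (down to zero) otherwise.

def dual_update(lams, cost_estimates, cost_bounds, step=0.01):
    """lambda_m <- max(0, lambda_m + step * (C_m - C_m^QoS))."""
    return [max(0.0, lam + step * (c - c_qos))
            for lam, c, c_qos in zip(lams, cost_estimates, cost_bounds)]

def lagrangian(reward, lams, cost_estimates, cost_bounds):
    """L = E[G] - sum_m lambda_m * (C_m - C_m^QoS), as in (23)."""
    return reward - sum(lam * (c - c_qos)
                        for lam, c, c_qos in zip(lams, cost_estimates, cost_bounds))
```

In a full implementation, the policy π would be updated in the primal domain (e.g., by a policy-gradient step on the Lagrangian) between successive dual updates.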
Fig. 16. A digital twin of a wireless network [11, 12, 71].

The details of the primal-dual method can be found in [271, 272]. S. Bhatnagar et al. proved the asymptotic almost-sure convergence of the method to a locally optimal solution of problem (22) [278].

Compared with the DRL algorithms in Table VIII, constrained DRL provides a more general framework for multi-objective optimization. Instead of manually setting weighting coefficients among different KPIs, constrained DRL only includes one KPI in the objective function, while the other KPIs are considered in the constraints. By using the primal-dual method, the weighting coefficients, λ^CMDP in (23), are optimized in the dual domain to achieve the target balance among different KPIs. For example, a weighted sum of the execution delay and the energy consumption is minimized by using DRL in [257]. If constrained DRL is applied to this problem, we can minimize the energy consumption subject to a constraint on the execution delay.

E. Off-line Initialization and Online Fine-tuning of DRL in URLLC

In this subsection, we review an architecture that enables the implementation of DRL in URLLC. The basic idea is to initialize the DRL off-line in a simulation platform and fine-tune the pre-trained DRL in real-world networks.

1) Digital Twins of Wireless Networks: The digital twin of a wireless network can serve as a bridge between model-based analysis and data-driven DRL. According to the definition in [279], "a Digital Twin is an integrated multiphysics, multiscale, probabilistic simulation of an as-built vehicle or system that uses the best available physical models, sensor updates, fleet history, etc., to mirror the life of its corresponding real system". It has been considered an important approach for realizing Industrie 4.0 in [280, 281]. As illustrated in Fig. 16, a central server establishes a digital twin of a real-world network with the network topology, channel and queueing models, QoS requirements, and the formulas in Section IV. Then, the DRL algorithm explores possible actions in the digital twin and initializes the actor and critic off-line. After off-line initialization, the actor is sent to the control plane of the wireless network for real-time implementation.

2) Exploration in a Digital Twin: By adopting the concept of the digital twin in DRL, the agent can explore in a simulation platform to avoid potential risks in real-world systems. Given an input, the central server explores possible actions based on the output of the policy (the FNN in the actor). For each action, the constraints and the utility function are evaluated in the digital twin [12]. Then, all the actions that can satisfy the constraints will be used to update the FNN in the critic to obtain a better approximation of the value function. Among these actions, the best one that maximizes the value function is saved in the memory together with the input. Finally, some training samples are randomly selected from the memory to train the FNN in the actor.

3) Transfer to Real-World Networks: Simulation environments are not exactly the same as real-world networks, which are dynamic. Thus, the actor initialized off-line may not be a good policy in non-stationary wireless networks. To handle the model mismatch problem, the system needs to update the actor and critic according to the transitions from the real-world networks. As discussed in Section VI-C2, transfer learning can be used to fine-tune pre-trained FNNs. In this way, the initial performance of the pre-trained actor is much better than that of an actor initialized with random parameters, and the DRL converges faster in real-world networks with transfer learning [71].

F. Summary of DRL

Theoretical knowledge can help establish a digital twin of a real-world network. By initializing DRL in the digital twin, the convergence time of DRL in the network can be reduced remarkably. In addition, exploration in the digital twin does not cause unexpected penalties in the network; thus, exploration safety can be improved. To handle model mismatch, DRL keeps updating the actor and critic from the feedback of the network. In addition to the research challenges in Section VIII-B, when applying DRL in URLLC there are two open issues.
• Unlike unsupervised deep learning, which has both long-term statistical constraints and instantaneous constraints, constrained DRL can only guarantee long-term constraints. How to include instantaneous constraints in DRL remains unclear.
• When applying DRL in URLLC systems, the feedback of reward or penalty could be very sparse in the time horizon. This is because the packet loss probability in URLLC is extremely small. Improving the learning efficiency of DRL in URLLC systems is an urgent task.
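The off-line initialization and on-line fine-tuning workflow of Fig. 16 can be outlined as below. This is a schematic only: the Agent class, the one-step environment interfaces, and the episode counts are hypothetical stand-ins for the actor-critic learner, the digital twin, and the prototype network.

```python
# Schematic of the Fig. 16 workflow: explore in a digital twin first
# (safe exploration), then fine-tune on transitions measured in the
# real network (to handle model mismatch).

class Agent:
    def __init__(self):
        self.updates = 0
    def act(self, state):
        return 0                      # placeholder policy output
    def update(self, transition):
        self.updates += 1             # stand-in for an actor-critic update

def train(agent, env_step, episodes):
    """Run `episodes` one-step interactions against env_step(state, action)."""
    state = 0
    for _ in range(episodes):
        action = agent.act(state)
        next_state, reward = env_step(state, action)
        agent.update((state, action, reward, next_state))
        state = next_state
    return agent

twin_step = lambda s, a: (s + 1, 1.0)      # hypothetical simulated dynamics
real_step = lambda s, a: (s + 1, 0.9)      # hypothetical measured dynamics

agent = train(Agent(), twin_step, episodes=1000)   # off-line initialization
agent = train(agent, real_step, episodes=100)      # on-line fine-tuning
```

The split between the two training calls mirrors the architecture's design choice: the bulk of the exploration happens in the twin, and only a short, safe fine-tuning phase touches the real network.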

IX. CASE STUDIES

In this section, we provide concrete examples to illustrate how to guarantee the QoS requirements of URLLC by integrating theoretical knowledge into deep learning.

A. Deep Transfer Learning for URLLC

1) Example problem: We consider an example system in [282], where the total transmit power of a BS serving multiple users is minimized by optimizing the transmit power and bandwidth allocation subject to the QoS requirements of different types of services. The problem is formulated as follows [282],

min_{W_k, P_k^t} Σ_{k=1}^{K} P_k^t   (24)
s.t. Σ_{k=1}^{K} W_k ≤ W_max,
QoS constraints,
P_k^t ≥ 0, and W_k ≥ 0,

where P_k^t and W_k are the transmit power and the bandwidth allocated to the k-th user, respectively, and W_max is the total bandwidth of the system.

The QoS constraints depend on the requirements of different kinds of services. For delay-tolerant services, the average service rate of each user should be equal to or higher than the average arrival rate of the user, i.e.,

E_{g_k}{ W_k log_2(1 + α_k g_k P_k^t / (W_k N_0 N_T)) } ≥ ā_k(t),   (25)

where N_T is the number of antennas at the BS, N_0 is the single-sided noise spectral density, and α_k and g_k are the large-scale and small-scale channel gains of the k-th user. In this system, each user is equipped with a single antenna. The instantaneous channel is available at each user, but the BS only knows α_k and the distribution of g_k.

For URLLC, the overall packet losses include decoding errors and queueing delay violations, and the E2E delay includes the transmission delay and the queueing delay. According to the conclusion in [16], we can set the required decoding error probability and the required queueing delay violation probability as equal, i.e., ε_tot^max/2. The result in [190] shows that the optimal transmission delay is one frame, D^t = T_f, and the queueing delay bound should be D_q^max = D_tot^max − T_f. To guarantee the overall packet loss probability and E2E delay, the following constraint should be satisfied,

E_{g_k}{ f_Q( √(T_f W_k) [ ln(1 + α_k g_k P_k^t / (W_k N_0 N_T)) − u_0 EB_k ln 2 / W_k ] ) } ≤ ε_tot^max/2,   (26)

where u_0 (bits/packet) is the packet size, V ≈ 1 is applied, and EB_k is the effective bandwidth of the k-th user [16],

EB_k = ln(2/ε_tot^max) / { (D_tot^max − T_f) ln[ ln(2/ε_tot^max) / (λ_k^a (D_tot^max − T_f)) + 1 ] } (packets/s),   (27)

where λ_k^a is the average packet arrival rate of the k-th user. The optimization algorithm to find labeled training samples and the supervised deep learning algorithm were developed in [282] and [13], respectively.

2) Simulation Results: The simulation setup can be found in [282]. We use two FNNs with parameters Ω_0 and Ω_1 to approximate the resource allocation policies of delay-tolerant services and URLLC services, respectively. Both FNNs have four hidden layers, each with 800 neurons. The dimensions of the input and output layers are 40 and 20, respectively.

Since there are no URLLC services in existing cellular networks, we assume that it is much easier to obtain labeled training samples for delay-tolerant services than for URLLC services. The parameters of the first FNN, Ω_0, are trained with the labeled samples for delay-tolerant services in set S_0. To reduce the number of labeled training samples for URLLC, we initialize the values of Ω_1 with Ω_0 and apply network-based deep transfer learning [13]. Specifically, we fix the first three hidden layers and fine-tune the weights and biases in the last hidden layer. To evaluate the performance, we define the normalized accuracy as follows,

η_NN = 1 − | Σ_{k=1}^{K} P̂_k^t − Σ_{k=1}^{K} P̃_k^t | / Σ_{k=1}^{K} P̃_k^t,   (28)

where P̂_k^t and P̃_k^t are the transmit powers obtained from the supervised deep learning algorithm and the optimization algorithm, respectively.

Fig. 17. Performance of deep transfer learning [13, 282].

As shown in Fig. 17, if there are a large number of training samples, then the normalized accuracy approaches one as the number of training epochs increases (the optimization algorithm generates one labeled training sample in each epoch). With network-based deep transfer learning, the normalized accuracy reaches 97% after 100 training epochs (i.e., with 100 training samples). However, if the system learns from scratch, it takes 2500 epochs (with 2500 training samples) to achieve the same performance. Therefore, by using deep transfer learning, we can obtain a good approximation of the optimal resource allocation policy for URLLC with only 100 training samples.
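The network-based transfer learning step described above can be sketched as follows: Ω_1 is initialized with Ω_0, the first three hidden layers are frozen, and only the last layer is updated. This is a toy illustration; the per-layer parameter lists and the dummy gradient step are assumptions standing in for the 4x800 FNN and its optimizer.

```python
# Toy sketch of network-based deep transfer learning: copy the source
# parameters Omega_0 into the target network Omega_1, freeze the early
# hidden layers, and fine-tune only the last one.

def transfer(omega_0, n_frozen):
    """Return (frozen_layers, trainable_layers) initialized from omega_0."""
    omega_1 = [list(layer) for layer in omega_0]   # Omega_1 <- Omega_0
    return omega_1[:n_frozen], omega_1[n_frozen:]

def fine_tune(trainable, grad=0.1):
    """Dummy gradient step applied only to the trainable layers."""
    return [[w - grad for w in layer] for layer in trainable]

# Hypothetical parameters for four hidden layers (two weights each).
omega_0 = [[0.5, 0.5], [0.3, 0.3], [0.2, 0.2], [0.1, 0.1]]
frozen, trainable = transfer(omega_0, n_frozen=3)
trainable = fine_tune(trainable)
```

In a deep learning framework the same effect is usually obtained by disabling gradient computation on the frozen layers before training on the small URLLC sample set.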

B. Comparison of Supervised and Unsupervised Deep Learning

1) Example Problem: We take a bandwidth allocation problem in [67] to illustrate the difference between supervised and unsupervised deep learning. In this problem, the total bandwidth is minimized subject to the QoS requirements of URLLC services. The problem is formulated as follows,

min_{W_k} Σ_{k=1}^{K} W_k   (29)
s.t. E_{g_k}{ e^{−θ_k s_k} } − e^{−θ_k EB_k} ≤ 0, k = 1, ..., K,   (30)
W_k ≥ 0,

where s_k is the achievable rate in the finite blocklength regime, the value of θ_k is determined by the delay bound and the delay violation probability according to (7), and the QoS constraints in (30) are derived from (6).

The labeled training samples for supervised deep learning are obtained from the optimization algorithm in [67]. With unsupervised deep learning, the primal-dual method is applied to solve the equivalent functional optimization problem, i.e.,

min_{W_k(α_k)} E_{α_k}{ W_k(α_k) }   (31)
s.t. E_{g_k}{ e^{−θ_k s_k(α_k)} } − e^{−θ_k EB_k} ≤ 0,   (31a)
W_k(α_k) ≥ 0,

where W_k(α_k) is a mapping from the realization of the large-scale channel gain to the bandwidth allocation.

2) Simulation results: To show whether supervised and unsupervised deep learning can guarantee the QoS requirements or not, we evaluate the CCDF of the relative error of the QoS constraint. Specifically, the relative error is defined as follows,

ν ≜ max{ ( E_{g_k}{ e^{−θ_k ŝ_k} } − e^{−θ_k EB_k} ) / e^{−θ_k EB_k}, 0 }
  = max{ E_{g_k}{ e^{θ_k (EB_k − ŝ_k)} } − 1, 0 },

where ŝ_k is obtained by substituting the output of the DNN, Ŵ_k, into the expression of the achievable rate in the finite blocklength regime.

To evaluate the performance of unsupervised deep learning, the bandwidth allocation policy is trained and tested in 100 trials. In each trial, the policy is trained with 10000 iterations by using the primal-dual method and is tested with 1000 realizations of α_k. The same realizations of α_k are used to train and test the bandwidth allocation policy in supervised deep learning, where the solutions obtained from the optimization algorithm, W_k^*, are used as the labels.

Fig. 18. Supervised deep learning versus unsupervised deep learning [67].

Fig. 19. Comparison of supervised and unsupervised deep learning [67]. (In supervised learning, the theoretical knowledge is not integrated into the loss function; in unsupervised learning, it is.)

The results in Fig. 18 show that with probability 1 − 10^{−5}, the relative errors of supervised and unsupervised deep learning are less than 3% and 2%, respectively. By reserving a small portion of extra bandwidth (increasing the effective capacity by 2 ∼ 3%) for each user, the deep learning approaches can guarantee the QoS requirements with a probability of 1 − 10^{−5}. The results in Fig. 18 also indicate that unsupervised deep learning outperforms supervised deep learning in terms of the approximation error of the QoS constraint. The reason is illustrated in Fig. 19, where C_nl represents the non-linear QoS constraint in (31a). In supervised learning, the theoretical knowledge is used to find labeled training samples but is not integrated into the loss function, E_{α_k}[Ŵ_k − W_k^*]². However, to minimize the approximation error of the QoS constraint, we should minimize the following loss function, E_{α_k}[C_nl(Ŵ_k) − C_nl(W_k^*)]². Since the QoS constraint is non-linear, the two loss functions are not equivalent. Different from supervised deep learning, the loss function of unsupervised deep learning is a weighted sum of the objective function and the QoS constraint. The policy and the weighting coefficients are optimized in the primal and dual domains, respectively. In other words, the theoretical knowledge is integrated into the loss function. Therefore, unsupervised deep learning outperforms supervised deep learning.
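The inequivalence of the two loss functions can be illustrated numerically: when the constraint function is nonlinear, equal label errors do not produce equal constraint errors. The function c_nl, the value of θ, and the sample inputs below are hypothetical stand-ins for the effective-capacity constraint in (31a), not the actual expressions from [67].

```python
import math

THETA = 2.0
c_nl = lambda w: math.exp(-THETA * w)     # toy nonlinear QoS map, stand-in for (31a)

def supervised_loss(w_hat, w_star):
    """MSE against the label W_k^*; the constraint never enters the loss."""
    return (w_hat - w_star) ** 2

def unsupervised_loss(w_hat, qos_bound, lam):
    """Objective plus a lambda-weighted constraint violation (Lagrangian form)."""
    return w_hat + lam * max(0.0, c_nl(w_hat) - qos_bound)
```

Over-allocating and under-allocating bandwidth by the same amount give identical supervised losses, yet under-allocation violates the nonlinear QoS constraint far more, which is exactly the asymmetry the Lagrangian loss of unsupervised learning can penalize.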

Fig. 20. Prototype for E2E delay evaluation [11, 71]. (The E2E latency of 9 to 11 ms comprises the delay in the core network, the queueing delay in the BS, the transmission delay in the RAN, and the processing delay in the receivers; with a 10 ms prediction horizon, the user experienced delay is around 1 ms.)

C. Knowledge-assisted DRL

1) Example Problem: To illustrate how to integrate theoretical knowledge into DRL by using the techniques in Section VIII-C, we take the scheduler design problem in [71] as an example. As shown in Fig. 21, a DL scheduler at the BS determines which users should be scheduled and how many resource blocks should be allocated to the scheduled users. Let us denote the HoL delay of the k-th user in the t-th slot by d_k(t). To meet the latency and jitter requirements of URLLC, the users should be scheduled when d_k(t) lies in [D_min, D_max]. As such, the jitter is (D_max − D_min). The goal is to minimize packet losses caused by decoding errors and delay bound violations. Specifically, in the t-th slot, we maximize the summation of the discounted rewards of all the users,

max_{u(·|Ω_u)} E[ Σ_{i=t}^{∞} Σ_{k=1}^{K} γ_d^{i−t} r_k(i) ],   (32)

where r_k(i) is the reward of the k-th user in the i-th slot, and u(·|Ω_u) is the actor of DRL, where Ω_u represents the parameters of the actor.

Fig. 21. DL scheduler [71].

2) Simulation Results: We compare the performance achieved by three DRL algorithms with different definitions of state, action, and reward. The first one is a straightforward application of the deep deterministic policy gradient (DDPG) algorithm [64] (with legend "DDPG"). The state includes the HoL delays and the channel quality indicators (CQIs) of the users, the action is the numbers of resource blocks allocated to the users, and the reward is given by

r_k^[1](t) = { 1_{D_min ≤ d_k(t) ≤ D_max} · 1_k^dec(t), if x_k(t) = 1,
               0, if x_k(t) = 0,   (33)

where 1_{D_min ≤ d_k(t) ≤ D_max} and 1_k^dec(t) are two indicators. If a packet is scheduled when d_k(t) ∈ [D_min, D_max] and is successfully decoded by the receiver, then r_k^[1](t) = 1. Otherwise, r_k^[1](t) = 0.

With the second approach, DDPG is applied in a theoretical DRL framework (with legend "DDPG-T-DRL"), where the expression in (3) is applied to compute the number of resource blocks required to achieve a target decoding error probability, denoted by n_k(t). Since the value of n_k(t) is determined by the CQI of the user, the state is re-defined as the HoL delays and n_k(t), k = 1, ..., K. Given the required numbers of resource blocks, n_k(t), k = 1, ..., K, the scheduler only needs to determine which users should be scheduled. With the help of the expression in (3), the reward is given by

r_k^[2](t) = { 1_{D_min ≤ d_k(t) ≤ D_max} (1 − ε_k^c(t)), if x_k(t) = 1,
               0, if x_k(t) = 0,   (34)

where ε_k^c(t) is the decoding error probability. Considering that (34) is strictly smaller than one, the reward can be re-defined as r̂_k(t) = −ln[1 − r_k^[2](t)] to further improve the learning efficiency [71].

To further improve the performance, a knowledge-assisted DDPG algorithm is applied in the theoretical DRL framework in [71] (with legend "KA-DDPG-T-DRL"). The basic idea is to exploit the knowledge of the rewards of multiple users, the target scheduling policy, and the approximation errors of the critic. To integrate these three kinds of knowledge into DDPG, we apply the techniques discussed in Section VIII-C: multi-head critic, reward shaping, and importance sampling.
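The reward definitions (33), (34), and the log re-definition can be sketched as follows. The window bounds D_MIN and D_MAX follow the scheduler design values used later in the prototype, and the helper names are hypothetical.

```python
import math

# Hypothetical scheduler delay window (in ms), as in the prototype design.
D_MIN, D_MAX = 5.0, 7.0

def reward_indicator(hol, scheduled, decoded):
    """Eq. (33): reward 1 only if scheduled within [D_MIN, D_MAX] and decoded."""
    return float(scheduled and D_MIN <= hol <= D_MAX and decoded)

def reward_error_aware(hol, scheduled, eps_c):
    """Eq. (34): 1 - decoding error probability, inside the delay window."""
    if not (scheduled and D_MIN <= hol <= D_MAX):
        return 0.0
    return 1.0 - eps_c

def reward_log(r):
    """r_hat = -ln(1 - r): amplifies differences between rewards near 1."""
    return -math.log(1.0 - r)
```

The log re-definition is what makes a decoding error probability of 10^-5 distinguishable from 10^-3 in the learning signal: the raw rewards (34) differ by less than 10^-3, whereas the log rewards differ by several units.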

Fig. 22. Packet loss probability during the training phase, where the duration of each slot is 0.125 ms [71].

The details of the simulation setup of Fig. 22 can be found in [71]. To reduce the packet loss rate, we turn off the exploration after the 10^5-th slot and keep fine-tuning the parameters of the actor and critic with a small learning rate, 10^{−4}. In the first 10^5 slots, the packet loss rate is evaluated every 10^3 slots. After that, the packet loss rate is evaluated every 2 × 10^4 slots. The results in Fig. 22 indicate that the straightforward implementation of DDPG does not converge. With the help of the theoretical DRL framework, DDPG converges slowly and the final packet loss rate is 10^{−2}. If the knowledge is further exploited, the algorithm converges faster and the packet loss rate can be reduced by one order of magnitude.

3) Experimental Results: To validate the architecture in Fig. 16, we built a prototype in [71]. As illustrated in Fig. 20, the prototype was built upon an open-source Long Term Evolution (LTE) software suite from Software Radio Systems (srs) Limited, a software company in wireless communications. The core network, the BS, and the mobile devices are referred to as the evolved packet core (srsEPC), eNodeB (srsENB), and user equipment (srsUE), respectively. The radio transceivers are Universal Software Radio Peripheral (USRP) B210 devices. The E2E delay includes the delay in the core network, the queueing delay in the BS, the transmission delay in the RAN, and the processing delay in the receiver.

In this experiment, we aim to achieve 1 ms user experienced delay and ±1 ms jitter in the Tactile Internet, where the packet source is a tactile device [45]. In the LTE software suite, the transmission delay is longer than 1 ms. In addition, the propagation delay in the core network is larger than 1 ms if the communication distance is longer than 300 km. As a result, one can hardly achieve 1 ms E2E delay. To handle this issue, the prediction and communication co-design method developed in [45] is applied, where the prediction horizon is set to 10 ms. The transmitter predicts its future locations and sends them to the receivers 10 ms in advance. According to the experimental results in [11], by using an FNN to predict the trajectory in the next 10 ms, the prediction error probability is smaller than 10^{−5}.

Fig. 23. Distribution of E2E delay [71].

To achieve 1 ms user experienced delay and ±1 ms jitter, the E2E delay in the communication system should lie in [9, 11] ms (given that the prediction horizon is 10 ms). Since the scheduler can only control the queueing delay and cannot reduce the delay and jitter of the other delay components, which cause D_other = 4 ms delay in our prototype, we set D_min = 5 ms and D_max = 7 ms in the scheduler design. The actor and the critic are first initialized in the digital twin that mimics the behavior of the real-world network. To handle the model-mismatch problem, we fine-tune the scheduling policy by using the transitions measured in the prototype.

The results in Fig. 23 show that if we initialize the actor and the critic with random variables (with legend "Random"), the initial E2E delay does not lie in [9, 11] ms. With the initialization in the digital twin, more than half of the packets can meet the delay and jitter requirements (with legend "Off-line"). After 5 minutes of fine-tuning in the prototype, around 90% of the packets meet the requirements (with legend "5-min"). If we keep fine-tuning the actor and the critic, the performance of the scheduler can be further improved. After 15 minutes of fine-tuning, the delay bound violation probability is around 1%. Such a delay bound violation probability is still too high for URLLC. This is because the scheduler can only control the jitter in the buffers of the BS, and cannot reduce the jitter of the other delay components, which become the bottleneck of the system.

To further reduce the jitter, one possible way is to perform the prediction at the receivers. Since the delay and the jitter experienced by each packet can be measured by the receiver, it is possible to adjust the prediction horizon to compensate for variations of the delay and the jitter at the receiver. However, the historical trajectory at the receiver is not error-free, since there are some packet losses in the communication system. As a result, the prediction accuracy at the receiver will be worse than that at the transmitter. Nevertheless, a comparison of these two approaches is not available in the existing literature.

D. Summary of Case Studies

The examples in this section imply that the application of deep learning in URLLC is not straightforward. We need to integrate the knowledge of wireless communications into different kinds of learning algorithms to reduce the convergence
time, guarantee the QoS constraints, and improve the final but the computation latency can be reduced [34]. To guarantee
performances in terms of E2E delay, reliability, and jitter. In the E2E delay requirement, communication resources for task
addition, there are some open issues should be addressed. offloading and computing resources for processing tasks from
• The QoS of URLLC is sensitive to a lot of hyper- different users should be jointly optimized [288]. Multi-agent
parameters, such as the initial values of the parameters, DRL is a promising approach for optimizing task offloading in
the learning rate, and the structure of the neural network. MEC systems. With multi-agent DRL, MEC servers and VR
These hyper-parameters are determined by trial-and-error devices can take actions based on local states. In this way, the
in our case studies. How to find proper hyper-parameters overhead for exchanging control information can be reduced
without human efforts remains an important topic in deep remarkably [122].
learning. On the other hand, to guarantee the ultra-low latency
• The approximation errors of continuous variables are requirement, computing delays caused by decoding and ex-
extremely small and the QoS constraints can be satisfied ecuting optimization algorithms at schedulers are no longer
by reserving a small portion of extra resources. However, negligible [27, 289]. A complicated decoding (or scheduling)
when the actions are discrete variables, such as the users schemes may achieve a better delay-reliability tradeoff in
to be scheduled, the reliability in our prototype is still communications, but will lead to long computing delay at the
unsatisfactory for URLLC. receiver (or the scheduler). However, there is no fundamental
• To fine-tune learning algorithms in non-stationary net- result to quantify the relationship between the computing delay
works, the training phase should be fast enough to follow and the complexity of an algorithm, because the computing de-
the variations of wireless networks. According to our lay highly depends on the hardware. To facilitate the analysis,
results, it takes around 5 to 15 minutes to fine-tune we can use some simplified models validated by measurements
the algorithms, which is fast enough for slow varying [290,291]. By integrating models into constrained DRL [271],
networks. It is still very challenging to update learning the optimal policy that maximizes the resource utilization
algorithms in highly dynamic networks. efficiency subject to the constraint on computing delay can
• In our case studies, FNNs are used to approximate the be obtained.
optimal policy or predict trajectories. FNNs work well in
most of the small-scale problems. When the scales of the B. Massive URLLC
problems grow, novel structures of neural networks are Due to the explosive growth of the numbers of autonomous
needed. vehicles and mission-critical IoT devices [292], future wire-
less networks are expected to support massive URLLC. To
X. F UTURE D IRECTIONS support massive URLLC, novel communication and learning
A. High Data Rate URLLC techniques are needed.
With orthogonal multiple access technologies, the required
As one of the killer applications in 5G networks, VR/AR
bandwidth increases linearly with the number of devices. To
applications require ultra-reliable and low-latency tactile feed-
achieve better tradeoffs among delay, reliability, and scalabil-
back and high data rate 360o videos [96]. Meanwhile, as
ity, other multiple access technologies should be used, such as
the size of devices shrinks, battery lifetime will become a
non-orthogonal multiple access and contention-based multiple
bottleneck for enabling high data rate URLLC [80]. To imple-
access technologies [293,294]. Meanwhile, we need to exploit
ment VR/AR applications in future wireless networks, we need
the above 6 GHz spectrum including mmWave [295] and the
to investigate the fundamental tradeoffs among throughput,
Terahertz band [296].
energy efficiency, reliability, and latency in communications,
When using FNN in massive URLLC, the number of
caching, and computing systems [34], as well as enabling
weights in FNN grows dramatically with the dimension of the
technologies such as touch user interface and haptic codecs
inputs. As a result, it takes a very long time to train the FNN
[283, 284]. With the sub-6GHz spectrum, it is difficult to
in large-scale wireless networks. To overcome this difficulty,
meet the above requirements in 5G cellular networks. To over-
one possible solution is compressing the high-dimensional
come this difficulty, Terahertz communications were applied
inputs of the network with an autoencoder [238]. Another
to support VR applications in [285]. Since high-frequency
viable approach is to use GNNs to represent the topologies of
communications rely on LoS paths, the blockage probability
wireless networks [297]. Since the number of parameters of a
was analyzed in small cells in [285]. To further reduce the
GNN does not increase with the dimension of the input, GNNs
blockage probability, novel network architectures that can
are suitable for optimizing large-scale wireless networks [233].
provide multi-path diversity are much needed.
Fog-RAN or MEC systems are promising network architec-
tures for high data rate URLLC systems [90, 286], where the C. Secure URLLC
resource allocations in communication and computing systems Future URLLC systems will suffer from different kinds of
are intertwined. When serving VR applications in an MEC attacks that result in inefficient communications [298]. The
system, the system can either project a 2D field of view into widely used cryptography algorithms require high-complexity
a 3D field of view at the MEC or the mobile VR device signal processing, and may not be suitable for URLLC,
[287]. If the projection component is offloaded to the MEC, especially for IoT devices with low computing capacities.
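To get a rough feel for this processing gap, the self-contained sketch below (illustrative only; the modulus and exponent are random stand-ins with the right bit lengths, not a real key pair) compares the cost of one 2048-bit modular exponentiation, the core arithmetic of RSA-style public-key operations, with a symmetric HMAC-SHA256 over a short URLLC packet:

```python
import hashlib
import hmac
import secrets
import time

PACKET = secrets.token_bytes(32)   # a short URLLC payload
KEY = secrets.token_bytes(32)      # pre-shared symmetric key

# Stand-in 2048-bit modulus and private exponent. These are random
# numbers (not a valid RSA key pair); they only reproduce the
# arithmetic cost of a private-key operation.
N = secrets.randbits(2048) | (1 << 2047) | 1
D = secrets.randbits(2048) | (1 << 2047)
M = int.from_bytes(PACKET, "big")

def avg_time(fn, runs):
    """Average wall-clock time of fn() over the given number of runs."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

t_asym = avg_time(lambda: pow(M, D, N), 20)           # RSA-style modexp
t_sym = avg_time(
    lambda: hmac.new(KEY, PACKET, hashlib.sha256).digest(), 2000
)

print(f"2048-bit modexp: {t_asym * 1e6:9.1f} us per operation")
print(f"HMAC-SHA256:     {t_sym * 1e6:9.1f} us per packet")
```

On a laptop-class CPU the modular exponentiation typically costs milliseconds while the HMAC stays in the microsecond range; on a low-power IoT microcontroller the gap is larger still, which is one reason the remainder of this subsection turns to physical-layer alternatives.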
To defend against eavesdropping attacks in URLLC, physical layer security is a viable solution [48]. The maximal secret communication rate in the short blocklength regime over a wiretap channel was derived in [299]. The results show that there are tradeoffs among delay, reliability, and security. Based on this fundamental result, we can further investigate technologies for improving physical-layer security.

Since the security of a system is very sensitive to the traffic and channel models, one can hardly obtain any analytical result without accurate models. To overcome this difficulty, GANs were used in [124] to numerically generate the distributions of the inter-arrival time between packets and of the channel gains. In addition, the results in [300, 301] indicate that GANs perform very well in intrusion detection. Thus, GANs are a promising approach for improving the security of URLLC.

XI. CONCLUSION

In this paper, we first summarized the KPIs and challenges of URLLC in four typical application scenarios: indoor large-scale scenarios, indoor wide-area scenarios, outdoor large-scale scenarios, and outdoor wide-area scenarios. We found that existing optimization algorithms cannot meet these requirements and that the applications of deep learning algorithms are not straightforward. To handle the issues of existing methodologies, we laid out a roadmap toward URLLC: integrating theoretical knowledge of wireless communications into deep learning.

Following the roadmap, we reviewed promising 6G network architectures and discussed how to develop deep learning frameworks based on these network architectures. Then, we revisited theoretical knowledge in wireless communications, including analysis tools and cross-layer optimization frameworks. To show how to integrate theoretical knowledge into deep learning, we introduced different kinds of learning algorithms and provided concrete examples in case studies. The results indicated that the integration outperforms straightforward applications of learning algorithms in wireless networks. Finally, we highlighted some future directions in this area.

REFERENCES

[1] 3GPP, “Study on scenarios and requirements for next generation access technologies,” TSG RAN TR 38.913 R14, Jun. 2017.
[2] P. Schulz, M. Matthé, H. Klessig, et al., “Latency critical IoT applications in 5G: Perspective on the design of radio interface and network architecture,” IEEE Commun. Mag., vol. 55, no. 2, pp. 70–78, Feb. 2017.
[3] O. N. C. Yilmaz, Y.-P. E. Wang, N. A. Johansson, N. Brahmi, S. A. Ashraf, and J. Sachs, “Analysis of ultra-reliable and low-latency 5G communication for a factory automation use case,” in IEEE ICC Workshop, 2015.
[4] M. Simsek, A. Aijaz, M. Dohler, et al., “5G-enabled tactile internet,” IEEE J. Sel. Areas Commun., vol. 34, no. 3, pp. 460–473, Mar. 2016.
[5] S. Latif, J. Qadir, S. Farooq, and M. A. Imran, “How 5G wireless (and concomitant technologies) will revolutionize healthcare?” Future Internet, vol. 9, no. 4, p. 93, 2017.
[6] K. Antonakoglou, X. Xu, E. Steinbach, T. Mahmoodi, and M. Dohler, “Towards haptic communications over the 5G tactile internet,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 3034–3059, 2018.
[7] GE Digital, “Industrial internet insights report,” White Paper, 2015.
[8] “Germany: Industrie 4.0,” Digital Transformation Monitor, 2017.
[9] “Made in China 2025,” Institute for Security & Development Policy, 2017.
[10] 3GPP, “Study on new radio (NR) access technology; physical layer aspects (release 14),” TR 38.802 V2.0.0, 2017.
[11] C. She, R. Dong, Z. Gu et al., “Deep learning for ultra-reliable and low-latency communications in 6G networks,” IEEE Network, early access, 2020.
[12] R. Dong, C. She, W. Hardjawana, Y. Li, and B. Vucetic, “Deep learning for hybrid 5G services in mobile edge computing systems: Learn from a digital twin,” IEEE Trans. Wireless Commun., vol. 18, no. 10, pp. 4692–4707, Oct. 2019.
[13] ——, “Deep learning for radio resource allocation with diverse quality-of-service requirements in 5G,” IEEE Trans. Wireless Commun., accepted with minor revision, 2020.
[14] X. Jiang, H. Shokri-Ghadikolaei, G. Fodor, E. Modiano, Z. Pang, M. Zorzi, and C. Fischione, “Low-latency networking: Where latency lurks and how to tame it,” Proc. IEEE, vol. 107, pp. 280–306, Feb. 2019.
[15] S. Xu, T.-H. Chang, S.-C. Lin, C. Shen, and G. Zhu, “Energy-efficient packet scheduling with finite blocklength codes: Convexity analysis and efficient algorithms,” IEEE Trans. Wireless Commun., vol. 15, no. 8, pp. 5527–5540, Aug. 2016.
[16] C. She, C. Yang, and T. Q. S. Quek, “Cross-layer optimization for ultra-reliable and low-latency radio access networks,” IEEE Trans. Wireless Commun., vol. 17, no. 1, pp. 127–141, Jan. 2018.
[17] Y. Hu, M. Ozmen, M. C. Gursoy, and A. Schmeink, “Optimal power allocation for QoS-constrained downlink multi-user networks in the finite blocklength regime,” IEEE Trans. Wireless Commun., vol. 17, no. 9, pp. 5827–5840, Sep. 2018.
[18] J. Tang, B. Shim, and T. Q. S. Quek, “Service multiplexing and revenue maximization in sliced C-RAN incorporated with URLLC and multicast eMBB,” IEEE J. Sel. Areas Commun., vol. 37, no. 4, pp. 881–895, Apr. 2019.
[19] C. Guo, L. Liang, and G. Y. Li, “Resource allocation for low-latency vehicular communications: An effective capacity perspective,” IEEE J. Sel. Areas Commun., vol. 37, no. 4, pp. 905–917, Apr. 2019.
[20] C. Zhang, P. Patras, and H. Haddadi, “Deep learning in mobile and wireless networking: A survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2224–2287, 2019.
[21] A. Zappone, M. Di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: Model-based, AI-based, or both?” IEEE Trans. Commun., vol. 67, no. 10, pp. 7331–7376, 2019.
[22] Z. Xiong, Y. Zhang, D. Niyato, R. Deng, P. Wang, and L.-C. Wang, “Deep reinforcement learning for mobile 5G and beyond: Fundamentals, applications, and challenges,” IEEE Veh. Technol. Mag., vol. 14, no. 2, pp. 44–52, Jun. 2019.
[23] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep Q-learning with model-based acceleration,” in Proc. ICML, 2016.
[24] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005.
[25] C. Sun and C. Yang, “Learning to optimize with unsupervised learning: Training deep neural networks for URLLC,” in Proc. IEEE PIMRC, 2019.
[26] H. He, S. Jin, C.-K. Wen, F. Gao, G. Y. Li, and Z. Xu, “Model-driven deep learning for physical layer communications,” IEEE Wireless Commun., vol. 26, no. 5, pp. 77–83, 2019.
[27] M. Shirvanimoghaddam, M. Sadegh Mohamadi, R. Abbas, et al., “Short block-length codes for ultra-reliable low-latency communications,” IEEE Commun. Mag., vol. 57, no. 2, pp. 130–137, Feb. 2019.
[28] P. Dymarski, S. Kula, and T. N. Huy, “QoS conditions for VoIP and VoD,” J. Telecommun. and Inf. Tech., pp. 29–37, Mar. 2011.
[29] C. She, C. Yang, and T. Q. S. Quek, “Joint uplink and downlink resource configuration for ultra-reliable and low-latency communications,” IEEE Trans. Commun., vol. 66, no. 5, pp. 2266–2280, May 2018.
[30] C. She, C. Liu, T. Q. Quek, C. Yang, and Y. Li, “Ultra-reliable and low-latency communications in unmanned aerial vehicle communication systems,” IEEE Trans. Commun., vol. 67, no. 5, pp. 3768–3781, May 2019.
[31] D. Öhmann, A. Awada, I. Viering, M. Simsek, and G. P. Fettweis, “Achieving high availability in wireless networks by inter-frequency multi-connectivity,” in Proc. IEEE ICC, 2016.
[32] 3GPP, “Service requirements for cyber-physical control applications in vertical domains,” TS 22.104 V16.0.0, 2018.
[33] B. Holfeld, D. Wieruch, T. Wirth, L. Thiele, S. A. Ashraf, J. Huschke, I. Aktas, and J. Ansari, “Wireless communication for factory automation: An opportunity for LTE and 5G systems,” IEEE Commun. Mag., vol. 54, no. 6, pp. 36–43, Jun. 2016.
[34] Y. Sun, Z. Chen, M. Tao, and H. Liu, “Communications, caching and computing for mobile virtual reality: Modeling and tradeoff,” IEEE Trans. Commun., vol. 67, no. 11, pp. 7573–7586, 2019.
[35] I. Mavromatis, A. Tassi, G. Rigazzi, R. J. Piechocki, and A. Nix, “Multi-radio 5G architecture for connected and autonomous vehicles: application and design insights,” arXiv preprint arXiv:1801.09510, 2018.
[36] A. Aijaz and M. Sooriyabandara, “The tactile internet for industries: A review,” Proc. IEEE, vol. 107, no. 2, pp. 414–435, Feb. 2019.
[37] C. Thuemmler, C. Rolffs, A. Bollmann, G. Hindricks, and W. Buchanan, “Requirements for 5G based telemetric cardiac monitoring,” in Proc. IEEE WiMob, 2018.
[38] M. Cosovic, A. Tsitsimelis, D. Vukobratovic, J. Matamoros, and C. Anton-Haro, “5G mobile cellular networks: Enabling distributed state estimation for smart grids,” IEEE Commun. Mag., vol. 55, no. 10, pp. 62–69, Oct. 2017.
[39] M. A. Lema, A. Laya, T. Mahmoodi, M. Cuevas, J. Sachs, J. Markendahl, and M. Dohler, “Business case and technology analysis for 5G low latency applications,” IEEE Access, vol. 5, pp. 5917–5935, 2017.
[40] C.-F. Liu and M. Bennis, “Taming the tail of maximal information age in wireless industrial networks,” IEEE Commun. Lett., vol. 23, no. 12, pp. 2442–2446, Dec. 2019.
[41] C. Kam, S. Kompella, G. D. Nguyen, J. E. Wieselthier, and A. Ephremides, “On the age of information with packet deadlines,” IEEE Trans. Inf. Theory, vol. 64, no. 9, pp. 6419–6428, Sep. 2018.
[42] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2004.
[43] D. A. Lawrence, “Stability and transparency in bilateral teleoperation,” IEEE Trans. Robot. Autom., vol. 9, no. 5, pp. 624–637, Oct. 1993.
[44] D. Feng, C. She, K. Ying, L. Lai, Z. Hou, T. Q. Quek, Y. Li, and B. Vucetic, “Toward ultrareliable low-latency communications: Typical scenarios, possible solutions, and open issues,” IEEE Veh. Tech. Mag., vol. 14, no. 2, pp. 94–102, Jun. 2019.
[45] Z. Hou, C. She, Y. Li, Z. Li, and B. Vucetic, “Prediction and communication co-design for ultra-reliable and low-latency communications,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 1196–1209, Feb. 2020.
[46] T. Luettel, M. Himmelsbach, and H.-J. Wuensche, “Autonomous ground vehicles—concepts and a path to the future,” Proc. IEEE, vol. 100, no. Centennial-Issue, pp. 1831–1839, May 2012.
[47] S. Kaul, M. Gruteser, V. Rai, and J. Kenney, “Minimizing age of information in vehicular networks,” in Proc. IEEE SAHCN, 2011.
[48] R. Chen, C. Li, S. Yan, R. Malaney, and J. Yuan, “Physical layer security for ultra-reliable and low-latency communications,” IEEE Wireless Commun., vol. 26, no. 5, pp. 6–11, 2019.
[49] P. Popovski, et al., “Deliverable D6.3 intermediate system evaluation results,” ICT-317669-METIS/D6.3, 2014. [Online]. Available: https://www.metis2020.com/wp-content/uploads/deliverables/METIS_D6.3_v1.pdf
[50] D. Öhmann, A. Awada, I. Viering, M. Simsek, and G. P. Fettweis, “Modeling and analysis of intra-frequency multi-connectivity for high availability in 5G,” in Proc. IEEE VTC-Spring, 2018.
[51] C. She, Z. Chen, C. Yang, T. Q. S. Quek, Y. Li, and B. Vucetic, “Improving network availability of ultra-reliable and low-latency communications with multi-connectivity,” IEEE Trans. Commun., vol. 66, no. 11, pp. 5482–5496, Nov. 2018.
[52] J. Liu, Y. Shi, Z. M. Fadlullah, and N. Kato, “Space-air-ground integrated network: A survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2714–2741, 2018.
[53] R. J. Kerczewski and J. H. Griner, “Control and non-payload communications links for integrated unmanned aircraft operations,” NASA, 2012.
[54] L. Gupta, R. Jain, and G. Vaszkun, “Survey of important issues in UAV communication networks,” IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1123–1152, 2016.
[55] M. Harchol-Balter, Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press, 2013.
[56] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[57] J. Tang and X. Zhang, “Quality-of-service driven power and rate adaptation for multichannel communications over wireless links,” IEEE Trans. Wireless Commun., vol. 6, no. 12, pp. 4349–4360, Dec. 2007.
[58] G. Caire, G. Taricco, and E. Biglieri, “Optimum power control over fading channels,” IEEE Trans. Inf. Theory, vol. 45, no. 5, pp. 1468–1489, 1999.
[59] A. Balatsoukas-Stimming and C. Studer, “Deep unfolding for communications systems: A survey and some new directions,” in Proc. IEEE SiPS, 2019.
[60] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3133–3174, 2019.
[61] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, “Deep learning based communication over the air,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018.
[62] F. Liang, C. Shen, W. Yu, and F. Wu, “Power control for interference management via ensembling deep neural networks,” in Proc. IEEE/CIC ICCC, 2019.
[63] M. Eisen, C. Zhang, L. F. O. Chamon, D. D. Lee, and A. Ribeiro, “Learning optimal resource allocations in wireless systems,” IEEE Trans. Signal Process., vol. 67, no. 10, pp. 2775–2790, May 2019.
[64] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” in Proc. ICLR, 2016.
[65] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, 1986.
[66] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
[67] C. Sun, C. She, and C. Yang, “Unsupervised deep learning for optimizing wireless systems with instantaneous and statistic constraints,” submitted to IEEE journal for possible publication, 2020. [Online]. Available: https://arxiv.org/pdf/2006.01641.pdf
[68] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[69] C. She, C. Yang, and T. Q. S. Quek, “Radio resource management for ultra-reliable and low-latency communications,” IEEE Commun. Mag., vol. 55, no. 6, pp. 72–78, Jun. 2017.
[70] W. Yang, G. Durisi, T. Koch, and Y. Polyanskiy, “Quasi-static multiple-antenna fading channels at finite blocklength,” IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 4232–4264, Jul. 2014.
[71] Z. Gu, C. She, W. Hardjawana et al., “Knowledge-assisted deep reinforcement learning in 5G scheduler design: From theoretical framework to implementation,” submitted to IEEE journal for possible publication, 2020. [Online]. Available: https://www.dropbox.com/s/ubaw6bxn0chz4sz/IEEE_DRL.pdf?dl=0
[72] G. Durisi, T. Koch, and P. Popovski, “Toward massive, ultrareliable, and low-latency wireless communication with short packets,” Proc. IEEE, vol. 104, no. 9, pp. 1711–1726, Aug. 2016.
[73] K. Antonakoglou, X. Xu, E. Steinbach, T. Mahmoodi, and M. Dohler, “Towards haptic communications over the 5G tactile internet,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 3034–3059, 2018.
[74] A. Nasrallah, A. S. Thyagaturu, Z. Alharbi, C. Wang, X. Shao, M. Reisslein, and H. ElBakoury, “Ultra-low latency (ULL) networks: The IEEE TSN and IETF DetNet standards and related 5G ULL research,” IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 88–145, 2018.
[75] K. S. Kim, D. K. Kim, C.-B. Chae, S. Choi, Y.-C. Ko, J. Kim, Y.-G. Lim, M. Yang, S. Kim, B. Lim et al., “Ultrareliable and low-latency communication techniques for tactile internet services,” Proc. IEEE, vol. 107, no. 2, pp. 376–393, Feb. 2019.
[76] I. Parvez, A. Rahmati, I. Guvenc, A. I. Sarwat, and H. Dai, “A survey on low latency towards 5G: RAN, core network and caching solutions,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 3098–3130, 2018.
[77] M. Bennis, M. Debbah, and H. V. Poor, “Ultra-reliable and low-latency wireless communication: Tail, risk and scale,” Proc. IEEE, vol. 106, no. 10, pp. 1834–1853, 2018.
[78] G. J. Sutton, J. Zeng, R. P. Liu et al., “Enabling technologies for ultra-reliable and low latency communications: From PHY and MAC layer perspectives,” IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2488–2524, 2019.
[79] R. Li, Z. Zhao, X. Zhou, G. Ding, Y. Chen, Z. Wang, and H. Zhang, “Intelligent 5G: When cellular networks meet artificial intelligence,” IEEE Wireless Commun., vol. 24, no. 5, pp. 175–183, Oct. 2017.
[80] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Network, vol. 34, no. 3, pp. 134–142, 2019.
[81] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J. A. Zhang, “The roadmap to 6G—AI empowered wireless networks,” IEEE Commun. Mag., vol. 57, no. 8, pp. 84–90, Aug. 2019.
[82] J. Wang, C. Jiang, H. Zhang, Y. Ren, K.-C. Chen, and L. Hanzo, “Thirty years of machine learning: The road to Pareto-optimal next-generation wireless networks,” IEEE Commun. Surveys Tuts., early access, 2020.
[83] Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2595–2621, 2018.
[84] L. Liang, H. Ye, G. Yu, and G. Y. Li, “Deep learning based wireless resource allocation with application to vehicular networks,” Proc. IEEE, vol. 108, no. 2, pp. 341–356, 2019.
[85] Y. Sun, M. Peng, Y. Zhou, Y. Huang, and S. Mao, “Application of machine learning in wireless networks: Key techniques and open issues,” IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3072–3108, 2019.
[86] J. Park, S. Samarakoon, M. Bennis, and M. Debbah, “Wireless network intelligence at the edge,” Proc. IEEE, vol. 107, no. 11, pp. 2204–2239, Nov. 2019.
[87] RP-201304, “New SID: Study on further enhancement for data collection,” 3GPP TSG RAN Meeting, Jun.-Jul. 2020. [Online]. Available: https://www.3gpp.org/ftp/tsg_ran/TSG_RAN/TSGR_88e/Docs
[88] H. Song, J. Bai, Y. Yi, J. Wu, and L. Liu, “Artificial intelligence enabled internet of things: Network architecture and spectrum access,” IEEE Comput. Intell. Mag., vol. 15, no. 1, pp. 44–51, Feb. 2020.
[89] F. Tang, Y. Kawamoto, N. Kato, and J. Liu, “Future intelligent and secure vehicular network toward 6G: Machine-learning approaches,” Proc. IEEE, vol. 108, no. 2, pp. 292–307, Feb. 2020.
[90] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2322–2358, 2017.
[91] C. She, Y. Duan, G. Zhao, T. Q. S. Quek, Y. Li, and B. Vucetic, “Cross-layer design for mission-critical IoT in mobile edge computing systems,” IEEE Internet of Things J., vol. 6, no. 6, pp. 9360–9374, Dec. 2019.
[92] Y. Hu and A. Schmeink, “Delay-constrained communication in edge computing networks,” in Proc. IEEE SPAWC, 2018.
[93] C.-F. Liu, M. Bennis, and H. V. Poor, “Latency and reliability-aware task offloading and resource allocation for mobile edge computing,” in IEEE Globecom Workshops, 2017.
[94] J. Liu and Q. Zhang, “Offloading schemes in mobile edge computing for ultra-reliable low latency communications,” IEEE Access, vol. 6, pp. 12825–12837, 2018.
[95] P. Wang, Z. Zheng, B. Di, and L. Song, “HetMEC: Latency-optimal task assignment and resource allocation for heterogeneous multi-layer mobile edge computing,” IEEE Trans. Wireless Commun., vol. 18, no. 10, pp. 4942–4956, Oct. 2019.
[96] M. S. Elbamby, C. Perfecto, M. Bennis, and K. Doppler, “Toward low-latency and ultra-reliable virtual reality,” IEEE Network, vol. 32, no. 2, pp. 78–84, Mar.-Apr. 2018.
[97] T. Hößler, L. Scheuvens, N. Franchi, M. Simsek, and G. P. Fettweis, “Applying reliability theory for future wireless communication networks,” in Proc. IEEE PIMRC, 2017.
[98] J. J. Nielsen, R. Liu, and P. Popovski, “Ultra-reliable low latency communication using interface diversity,” IEEE Trans. Commun., vol. 66, no. 3, pp. 1322–1334, Mar. 2018.
[99] K.-C. Chen, T. Zhang, R. D. Gitlin, and G. Fettweis, “Ultra-low latency mobile networking,” IEEE Network, vol. 33, no. 2, pp. 181–187, Mar./Apr. 2019.
[100] J. M. Walsh, S. Weber, and C. Wa Maina, “Optimal rate-delay tradeoffs and delay mitigating codes for multipath routed and network coded networks,” IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5491–5510, Dec. 2009.
[101] X. Xu, D. Veitch, Y. Li, and B. Vucetic, “Minimum cost reconfigurable network template design with guaranteed QoS,” IEEE Trans. Commun., vol. 68, no. 2, pp. 1013–1024, Feb. 2020.
[102] N. Bui, M. Cesana, S. A. Hosseini, Q. Liao, I. Malanchini, and J. Widmer, “A survey of anticipatory mobile networking: Context-based classification, prediction methodologies, and optimization techniques,” IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1790–1821, 2017.
[103] N. Bui and J. Widmer, “Data-driven evaluation of anticipatory networking in LTE networks,” IEEE Trans. Mobile Comput., vol. 17, no. 10, pp. 2252–2265, Oct. 2018.
[104] S.-C. Hung, H. Hsu, S.-M. Cheng, Q. Cui, and K.-C. Chen, “Delay guaranteed network association for mobile machines in heterogeneous cloud radio access network,” IEEE Trans. Mobile Comput., vol. 17, no. 12, pp. 2744–2760, Dec. 2018.
[105] 5GPPP Architecture Working Group, “View on 5G architecture,” in 5G Architecture White Paper, Dec. 2017.
[106] A. D. Domenico, Y.-F. Liu, and W. Yu, “Optimal computational resource allocation and network slicing deployment in 5G hybrid C-RAN,” in Proc. IEEE ICC, 2019.
[107] N. V. Huynh, D. T. Hoang, D. N. Nguyen, and E. Dutkiewicz, “Real-time network slicing with uncertain demand: A deep learning approach,” in Proc. IEEE ICC, 2019.
[108] B. Ojaghi, F. Adelantado, E. Kartsakli, A. Antonopoulos, and C. Verikoukis, “Sliced-RAN: Joint slicing and functional split in future 5G radio access networks,” in Proc. IEEE ICC, 2019.
[109] L. Tello-Oquendo, I. F. Akyildiz, S.-C. Lin, and V. Pla, “SDN-based architecture for providing reliable internet of things connectivity in 5G systems,” in Proc. Med-Hoc-Net, 2018.
[110] P. Popovski, K. F. Trillingsgaard, O. Simeone, and G. Durisi, “5G wireless network slicing for eMBB, URLLC, and mMTC: A communication-theoretic view,” IEEE Access, vol. 6, pp. 55765–55779, 2018.
[111] B. Liu, L. Wang, M. Liu, and C. Xu, “Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems,” IEEE Robot. Autom. Lett., vol. 4, no. 4, pp. 4555–4562, Oct. 2019.
[112] H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al., “Communication-efficient learning of deep networks from decentralized data,” in Proc. AISTATS, 2017. [Online]. Available: https://arxiv.org/abs/1602.05629
[113] H. H. Yang, Z. Liu, T. Q. S. Quek, and H. V. Poor, “Scheduling policies for federated learning in wireless networks,” IEEE Trans. Commun., vol. 68, no. 1, pp. 317–333, Jan. 2020.
[114] S. Samarakoon, M. Bennis, W. Saad, and M. Debbah, “Federated learning for ultra-reliable low-latency V2V communications,” in Proc. IEEE Globecom, 2018.
[115] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, “Federated learning with non-IID data,” arXiv preprint arXiv:1806.00582, 2018.
[116] L. Liu, J. Zhang, S. Song, and K. B. Letaief, “Client-edge-cloud hierarchical federated learning,” in Proc. IEEE ICC, 2020.
[117] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[118] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep transfer learning,” in Proc. Springer ICANN, 2018.
[119] Z. Chen and B. Liu, “Lifelong machine learning,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 12, no. 3, pp. 1–207, 2018.
[120] H. Lee, S. H. Lee, and T. Q. S. Quek, “Deep learning for distributed optimization: Applications to wireless resource management,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2251–2266, Oct. 2019.
[121] L. Liang, H. Ye, and G. Y. Li, “Spectrum sharing in vehicular networks based on multi-agent reinforcement learning,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2282–2292, Oct. 2019.
[122] M. K. Sharma, A. Zappone, M. Assaad, M. Debbah, and S. Vassilaras, “Distributed power control for large energy harvesting networks: A multi-agent deep reinforcement learning approach,” IEEE Trans. Cog. Commun. Netw., vol. 5, no. 4, pp. 1140–1154, Dec. 2019.
[123] T. Doan, S. Maguluri, and J. Romberg, “Finite-time analysis of distributed TD(0) with linear function approximation on multi-agent reinforcement learning,” in Proc. ICML, 2019.
[124] A. T. Z. Kasgari, W. Saad, M. Mozaffari, and H. V. Poor, “Experienced deep reinforcement learning with generative adversarial networks (GANs) for model-free ultra reliable low latency communication,” arXiv preprint arXiv:1911.03264, 2019.
[125] F. Ang, L. Chen, N. Zhao, Y. Chen, W. Wang, and F. R. Yu, “Robust federated learning with noisy communication,” IEEE Trans. Commun., vol. 68, no. 6, pp. 3452–3464, Jun. 2020.
[126] G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 491–506, Jan. 2019.
[127] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Technical J., vol. 27, no. 3, pp. 379–423, 1948.
[128] J. W. Lee and R. E. Blahut, “Lower bound on BER of finite-length turbo codes based on EXIT characteristics,” IEEE Commun. Lett., vol. 8, no. 4, pp. 238–240, Apr. 2004.
[129] G. Yue, B. Lu, and X. Wang, “Analysis and design of finite-length LDPC codes,” IEEE Trans. Veh. Tech., vol. 56, no. 3, pp. 1321–1332, May 2007.
[130] D. Goldin and D. Burshtein, “Improved bounds on the finite length scaling of polar codes,” IEEE Trans. Inf. Theory, vol. 60, no. 11, pp. 6966–6978, Nov. 2014.
37

[131] S. H. Hassani, K. Alishahi, and R. L. Urbanke, “Finite-length scaling [156] Q. Liu, X. Jiang, and Y. Zhou, “Per-flow end-to-end delay bounds in
for polar codes,” IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 5875– heterogeneous wireless networks,” in IEEE Int. Conf. Comp. Commun.,
5898, Oct. 2014. 2017.
[132] K. Niu, K. Chen, J. Lin, and Q. T. Zhang, “Polar codes: Primary [157] M. A. Zafer and E. Modiano, “A calculus approach to energy-efficient
concepts and practical decoding algorithms,” IEEE Commun. Mag, data transmission with quality-of-service constraints,” IEEE/ACM
vol. 52, no. 7, pp. 192–203, Jul. 2014. Trans. Networking, vol. 17, no. 3, pp. 898–911, Jun. 2009.
[133] P. M. Olmos and R. L. Urbanke, “A scaling law to predict the finite- [158] 3GPP TR 36.814, Further Advancements for E-UTRA Physical Layer
length performance of spatially-coupled LDPC codes,” IEEE Trans. Aspects. TSG RAN v9.0.0, Mar. 2010.
Inf. Theory, vol. 61, no. 6, pp. 3164–3184, Jun. 2015. [159] R. Devassy, G. Durisi, G. C. Ferrante, O. Simeone, and E. Uysal,
[134] D. Zhang and H. Meyr, “On the performance gap between ml and iter- “Reliable transmission of short packets through queues and noisy
ative decoding of finite-length turbo-coded BICM in MIMO systems,” channels under latency and peak-age violation guarantees,” IEEE J.
IEEE Trans. Commun., vol. 65, no. 8, pp. 3201–3213, Aug. 2017. Sel. Areas Commun., vol. 37, no. 4, pp. 721–734, Apr. 2019.
[135] V. Aref, N. Rengaswamy, and L. Schmalen, “Finite-length analysis of [160] A. P. Zwart and O. J. Boxma, “Sojourn time asymptotics in the M/G/1
spatially-coupled regular LDPC ensembles on burst-erasure channels,” processor sharing queue,” Queueing Systems, vol. 35, pp. 141–166,
IEEE Trans. Inf. Theory, vol. 64, no. 5, pp. 3431–3449, May 2018. 2000.
[136] Y. Polyanskiy and S. Verdu, “Scalar coherent fading channel: dispersion [161] D. Gross and C. Harris, Fundamentals of Queueing Theory. Wiley,
analysis,” in Proc. IEEE ISIT, 2011. 1985.
[137] A. Collins and Y. Polyanskiy, “Orthogonal designs optimize achievable
[162] W. J. Stewart, Probability, Markov chains, queues, and simulation: the
dispersion for coherent miso channels,” in Proc. IEEE ISIT, 2014.
mathematical basis of performance modeling. Princeton university
[138] G. Durisi, T. Koch, J. Ostman, Y. Polyanskiy, and W. Yang, “Short-
press, 2009.
packet communications over multiple-antenna rayleigh-fading chan-
nels,” IEEE Trans. Inf. Theory, vol. 64, no. 2, pp. 618–629, Feb. 2016. [163] J.-Y. Le Boudec and P. Thiran, Network calculus: a theory of determin-
[139] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Dispersion of the gilbert- istic queuing systems for the internet. Springer Science & Business
elliott channel,” IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 1829–1848, Media, 2001, vol. 2050.
Apr. 2011. [164] H. Al-Zubaidy, J. Liebeherr, and A. Burchard, “Network-layer perfor-
[140] A. Collins and Y. Polyanskiy, “Coherent multiple-antenna block-fading mance analysis of multihop fading channels,” IEEE/ACM Trans. Netw.,
channels at finite blocklength,” IEEE Trans. Inf. Theory, vol. 65, no. 1, vol. 24, no. 1, pp. 204–217, Feb. 2016.
pp. 380–405, Jan. 2019. [165] L. Liu, P. Parag, J. Tang, W.-Y. Chen, and J.-F. Chamberland, “Resource
[141] A. Lancho, J. Östman, G. Durisi, T. Koch, and G. Vazquez-Vilar, “Sad- allocation and quality of service evaluation for wireless communication
dlepoint approximations for short-packet wireless communications,” systems using fluid models,” IEEE Trans. on Inf. Theory, vol. 53, no. 5,
IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 4831–4846, Jul. pp. 1767–1777, May 2007.
2020. [166] C. She, C. Yang, and L. Liu, “Energy-efficient resource allocation for
[142] R. A. Berry and R. G. Gallager, “Communication over fading channels MIMO-OFDM systems serving random sources with statistical QoS
with delay constraints,” IEEE Trans. Inf. Theory, vol. 48, no. 5, pp. requirement,” IEEE Trans. Commun., vol. 63, no. 11, pp. 4125–4141,
1135–1149, May 2002. Nov. 2015.
[143] M. J. Neely, “Optimal energy and delay tradeoffs for multiuser wireless [167] C. She and C. Yang, “Energy efficiency and delay in wireless systems:
downlinks,” IEEE Trans. Inf. Theory, vol. 53, no. 9, pp. 3095–3113, Is their relation always a tradeoff?” IEEE Trans. on Wireless Commun.,
Sep. 2007. vol. 15, no. 11, pp. 7215–7228, Nov. 2016.
[144] Y. Li, M. Sheng, Y. Shi, X. Ma, and W. Jiao, “Energy efficiency [168] M. Amjad, L. Musavian, and M. H. Rehmani, “Effective capacity in
and delay tradeoff for time-varying and interference-free wireless wireless networks: A comprehensive survey,” IEEE Commun. Surveys
networks,” IEEE Trans. Wireless Commun., vol. 13, no. 11, pp. 5921– Tuts., early access, 2019.
5931, Nov. 2014. [169] F. Kelly, “Notes on effective bandwidths,” Stochastic networks: theory
[145] Z. Niu, X. Guo, S. Zhou, and P. R. Kumar, “Characterizing energy- and applications, 1996.
delay tradeoff in hyper-cellular networks with base station sleeping [170] M. Ozmen and M. C. Gursoy, “Wireless throughput and energy
control,” IEEE J. Sel. Areas Commun., vol. 33, no. 4, pp. 641–650, efficiency with random arrivals and statistical queuing constraints,”
Apr. 2015. IEEE Trans. Inf. Theory, vol. 62, no. 3, pp. 1375–1395, Mar. 2016.
[146] E. Uysal-Biyikoglu, B. Prabhakar, and A. E. Gamal, “Energy-efficient [171] J. Tang and X. Zhang, “Quality-of-service driven power and rate
packet transmission over a wireless link,” IEEE/ACM Trans. Network- adaptation over wireless links,” IEEE Trans. on Wireless Commun.,
ing, vol. 10, no. 4, pp. 487–499, Aug. 2002. vol. 6, no. 8, pp. 3058–3068, Aug. 2007.
[147] M. A. Zafer and E. Modiano, “A calculus approach to minimum energy [172] B. Soret, M. C. Aguayo-Torres, and J. T. Entrambasaguas, “Capacity
transmission policies with quality of service guarantees,” in Proc. IEEE with explicit delay guarantees for generic sources over correlated
INFCOM, 2005, pp. 548–559. Rayleigh channel,” IEEE Trans. on Wireless Commun., vol. 9, no. 6,
[148] I.-H. Hou and P. R. Kumar, Packets with deadlines: A framework for pp. 1901–1911, Jun. 2010.
real-time wireless networks. Synthesis Lectures on Communication
[173] M. C. Gursoy, “Throughput analysis of buffer-constrained wireless sys-
Networks, 2013.
tems in the finite blocklength regime,” EURASIP J. Wireless Commun.
[149] G. Zhang, T. Q. S. Quek, A. Huang, and H. Shan, “Delay and reliability
Netw., vol. 2013, no. 1, Dec. 2013.
tradeoffs in heterogeneous cellular networks,” IEEE Trans. Commun.
[174] Y. Hu, A. Schmeink, and J. Gross, “Blocklength-limited performance
[150] C. Chang and J. A. Thomas, “Effective bandwidth in high-speed digital
of relaying under quasi-static Rayleigh channels,” IEEE Trans. Wireless
networks,” IEEE J. Sel. Areas Commun., vol. 13, no. 6, pp. 1091–1100,
Commun., vol. 15, no. 7, pp. 4548–4558, Jul. 2016.
Aug. 1995.
[151] D. Wu and R. Negi, “Effective capacity: A wireless link model for [175] D. Öhmann, M. Simsek, and G. P. Fettweis, “Achieving high avail-
support of quality of service,” IEEE Trans. Wireless Commun., vol. 2, ability in wireless networks by an optimal number of Rayleigh-fading
no. 4, pp. 630–643, Jul. 2003. links,” in IEEE Globecom Workshops, 2014.
[152] W. Cheng, X. Zhang, and H. Zhang, “Joint spectrum and power [176] M. Serror, C. Dombrowski, K. Wehrle, and J. Gross, “Channel coding
efficiencies optimization for statistical QoS provisionings over versus cooperative ARQ: Reducing outage probability in ultra-low
SISO/MIMO wireless networks,” IEEE J. Sel. Areas Commun., vol. 31, latency wireless communications,” in IEEE Globecom Workshops,
no. 5, pp. 903–915, May 2013. 2015.
[153] L. Musavian and Q. Ni, “Effective capacity maximization with statis- [177] J. J. Nielsen, R. Liu, and P. Popovski, “Ultra-reliable low latency com-
tical delay and effective energy efficiency requirements,” IEEE Trans. munication (URLLC) using interface diversity,” IEEE Trans. Commun.,
Wireless Commun., vol. 14, no. 7, pp. 1608–1621, Jul. 2015. vol. 66, no. 3, pp. 1322–1334, Mar. 2018.
[154] H. Ren, N. Liu, C. Pan, M. Elkashlan, A. Nallanathan, X. You, [178] L. D. Minkova and E. Omey, “A new markov binomial distribution,”
and L. Hanzo, “Low-latency C-RAN: An next-generation wireless Taylor & Francis Commun. in Statistics-Theory and Methods, vol. 43,
approach,” IEEE Veh. Tech. Mag., vol. 13, no. 2, pp. 48–56, Jun. 2018. no. 13, pp. 2674–2688, Jun. 2014.
[155] S. Schiessl, J. Gross, and H. Al-Zubaidy, “Delay analysis for wireless [179] A. Al-Hourani, S. Kandeepan, and A. Jamalipour, “Modeling air-to-
fading channels with finite blocklength channel coding,” in Proc. ACM ground path loss for low altitude platforms in urban environments,” in
MSWiM, 2015. Proc. IEEE Globecom, 2014.
38

[180] A. Al-Hourani, S. Kandeepan, and S. Lardner, “Optimal LAP altitude [203] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis,
for maximum coverage,” IEEE Commun. Lett., vol. 3, no. 6, pp. 569– “Deep learning for computer vision: A brief review,” Computational
572, Dec. 2014. intelligence and neuroscience, vol. 2018, 2018.
[181] S. S. Szyszkowicz, H. Yanikomeroglu, and J. S. Thompson, “On the [204] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg,
feasibility of wireless shadowing correlation models,” IEEE Trans. on C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep
Veh. Tech., vol. 59, no. 9, pp. 4222–4236, Nov. 2010. speech 2: End-to-end speech recognition in english and mandarin,” in
[182] D. Öhmann, A. Awada, I. Viering, M. Simsek, and G. P. Fettweis, Proc. ICML, 2016.
“SINR model with best server association for high availability studies [205] R. Collobert and J. Weston, “A unified architecture for natural language
of wireless networks,” IEEE Wireless Commun. Lett., vol. 5, no. 1, pp. processing: Deep neural networks with multitask learning,” in Proc.
60–63, Feb. 2016. ICML, 2008.
[183] C. She, Z. Chen, C. Yang, T. Q. S. Quek, Y. Li, and B. Vucetic, [206] Z. Qin, H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep learning in physical
“Improving network availability of ultra-reliable and low-latency com- layer communications,” IEEE Wireless Commun., vol. 26, no. 2, pp.
munications with multi-connectivity,” IEEE Trans. Commun., vol. 66, 93–99, Apr. 2019.
no. 11, pp. 5482–5496, Nov. 2018. [207] J. Guo, C.-K. Wen, S. Jin, and G. Y. Li, “Convolutional neural
[184] Z. Jiang, A. F. Molisch, G. Caire, and Z. Niu, “Achievable rates of network based multiple-rate compressive sensing for massive MIMO
FDD massive MIMO systems with spatial channel correlation,” IEEE CSI feedback: Design, simulation, and analysis,” IEEE Trans Wireless
Trans. on Wireless Commun., vol. 14, no. 5, pp. 2868–2882, May 2015. Commun., vol. 19, no. 4, pp. 2827–2840, Apr. 2020.
[185] M. Haenggi, “The local delay in Poisson networks,” IEEE Trans. Inf. [208] P. Dong, H. Zhang, G. Y. Li, N. NaderiAlizadeh, and I. S. Gaspar,
Theory, vol. 59, no. 3, pp. 1788–1802, Mar. 2012. “Deep CNN for wideband mmwave massive MIMO channel estimation
[186] P.-Y. Kong, “Power consumption and packet delay relationship for using frequency correlation,” in Proc. IEEE ICASSP, 2019.
heterogeneous wireless networks,” IEEE Commun. Lett., vol. 17, no. 7, [209] Y. Yang, D. B. Smith, and S. Seneviratne, “Deep learning channel
pp. 1376–1379, Jul. 2013. prediction for transmit power control in wireless body area networks,”
[187] Y. Zhong, T. Q. S. Quek, and X. Ge, “Heterogeneous cellular networks in Proc. IEEE ICC, 2019.
with spatio-temporal traffic: Delay analysis and scheduling,” IEEE J. [210] J. D. Herath, A. Seetharam, and A. Ramesh, “A deep learning model
Sel. Areas Commun., vol. 35, no. 6, pp. 1373–1386, Jun. 2017. for wireless channel quality prediction,” in Proc. IEEE ICC, 2019.
[188] G. Ozcan and M. C. Gursoy, “Throughput of cognitive radio systems [211] J. Wang, J. Tang, Z. Xu, Y. Wang, G. Xue, X. Zhang, and D. Yang,
with finite blocklength codes,” IEEE J. Sel. Areas Commun., vol. 31, “Spatiotemporal modeling and prediction in cellular networks: A big
no. 11, pp. 2541–2554, Nov. 2013. data enabled deep learning approach,” in IEEE INFOCOM. IEEE,
2017.
[189] D. Tuninetti, B. Smida, N. Devroye, and H. Seferoglu, “Scheduling
[212] C. Qiu, Y. Zhang, Z. Feng, P. Zhang, and S. Cui, “Spatio-temporal
on the gaussian broadcast channel with hard deadlines,” in Proc. IEEE
ICC, 2018. wireless traffic prediction with recurrent neural network,” IEEE Wire-
less Commun. Letters., vol. 7, no. 4, pp. 554–557, Aug. 2018.
[190] C. She, C. Yang, and T. Q. S. Quek, “Joint uplink and downlink
[213] A. Azari, P. Papapetrou, S. Denic, and G. Peters, “User traffic prediction
resource configuration for ultra-reliable and low-latency communica-
for proactive resource management: Learning-powered approaches,” in
tions,” IEEE Trans. Commun., vol. 66, no. 5, pp. 2266–2280, May
Proc. IEEE Globecom, 2019.
2018.
[214] C. Zhang, H. Zhang, J. Qiao, D. Yuan, and M. Zhang, “Deep transfer
[191] Z. Hou, C. She, Y. Li, T. Q. S. Quek, and B. Vucetic, “Burstiness
learning for intelligent cellular traffic prediction based on cross-domain
aware bandwidth reservation for ultra-reliable and low-latency com-
big data,” IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1389–1401,
munications in tactile internet,” IEEE J. Sel. Areas Commun., vol. 36,
Jun. 2019.
no. 11, pp. 2401–2410, Nov. 2018.
[215] W. Zhang, Y. Liu, T. Liu, and C. Yang, “Trajectory prediction with
[192] D. Malak, M. Médard, and E. M. Yeh, “Tiny codes for guaranteeable recurrent neural networks for predictive resource allocation,” in IEEE
delay,” IEEE J. Sel. Areas Commun., vol. 37, no. 4, pp. 809–825, Apr. ICSP, 2018.
2019.
[216] H. Gebrie, H. Farooq, and A. Imran, “What machine learning predictor
[193] S. Schiessl, J. Gross, M. Skoglund, and G. Caire, “Delay performance performs best for mobility prediction in cellular networks?” in IEEE
of the multiuser MISO downlink under imperfect CSI and finite-length ICC Workshops, 2019.
coding,” IEEE J. Sel. Areas Commun., vol. 37, no. 4, pp. 765–779, [217] K. Xiao, J. Zhao, Y. He, and S. Yu, “Trajectory prediction of UAV in
Apr. 2019. smart city using recurrent neural networks,” in Proc. IEEE ICC, 2019.
[194] C. Sahin, L. Liu, E. Perrins, and L. Ma, “Delay-sensitive communi- [218] X. Hou, J. Zhang, M. Budagavi, and S. Dey, “Head and body motion
cations over IR-HARQ: Modulation, coding latency, and reliability,” prediction to enable mobile VR experiences with low latency,” in Proc.
IEEE J. Sel. Areas Commun., vol. 37, no. 4, pp. 749–764, Apr. 2019. IEEE Globecom, 2019.
[195] S. E. Elayoubi, P. Brown, M. Deghel, and A. Galindo-Serrano, “Radio [219] M. Lopez-Martin, B. Carro, J. Lloret, S. Egea, and A. Sanchez-
resource allocation and retransmission schemes for URLLC over 5G Esguevillas, “Deep learning model for multimedia quality of experience
networks,” IEEE J. Sel. Areas Commun., vol. 37, no. 4, pp. 896–904, prediction based on network flow packets,” IEEE Commun. Mag.,
Apr. 2019. vol. 56, no. 9, pp. 110–117, Sep. 2018.
[196] L. Zhou, A. Wolf, and M. Motani, “On lossy multi-connectivity: Finite [220] G. White, A. Palade, and S. Clarke, “Forecasting QoS attributes using
blocklength performance and second-order asymptotics,” IEEE J. Sel. LSTM networks,” in Proc. ACM IJCNN, 2018.
Areas Commun., vol. 37, no. 4, pp. 735–748, Apr. 2019. [221] H. Khedher, S. Hoteit, P. Brown et al., “Processing time evaluation and
[197] M. Khoshnevisan, V. Joseph, P. Gupta, F. Meshkati, R. Prakash, and prediction in cloud-RAN,” in Proc. IEEE ICC, 2019.
P. Tinnakornsrisuphap, “5G industrial networks with CoMP for URLLC [222] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos,
and time sensitive network architecture,” IEEE J. Sel. Areas Commun., “Learning to optimize: Training deep neural networks for interference
vol. 37, no. 4, pp. 947–959, Apr. 2019. management,” IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5438–
[198] M. J. Neely, “Intelligent packet dropping for optimal energy-delay 5453, Oct. 2018.
tradeoffs in wireless downlinks,” IEEE Trans. on Autom. Control, [223] W. Lee, M. Kim, and D.-H. Cho, “Deep power control: Transmit
vol. 54, no. 3, pp. 565–579, Mar. 2009. power control scheme based on convolutional neural network,” IEEE
[199] D. Tse, Fundamentals of wireless communication. Cambridge Univ. Commun. Lett., vol. 22, no. 6, pp. 1276–1279, Jun. 2018.
Press, 2005. [224] J. Guo and C. Yang, “Predictive resource allocation with deep learning,”
[200] C. Guo, L. Liang, and G. Y. Li, “Resource allocation for vehicular in Proc. IEEE VTC-Fall, 2018.
communications with low latency and high reliability,” IEEE Trans. [225] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol.
Wireless Commun., vol. 18, no. 8, pp. 3887–3902, Aug. 2019. 521, pp. 436–444, 2015.
[201] I. Al-Anbagi, M. Erol-Kantarci, and H. T. Mouftah, “A survey on cross- [226] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, “Multilayer
layer quality-of-service approaches in WSNs for delay and reliability- feedforward networks with a nonpolynomial activation function can
aware applications,” IEEE Commun. Surveys Tuts., vol. 18, no. 1, pp. approximate any function,” Neural networks, vol. 6, no. 6, pp. 861–
525–552, 2016. 867, 1993.
[202] J. Farkas, L. L. Bello, and C. Gunther, “Time-sensitive networking [227] Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, “Transfer learning for
standards,” IEEE Commun. Standards Mag., vol. 2, no. 2, pp. 20–21, mixed-integer resource allocation problems in wireless networks,” in
Jun. 2018. Proc. IEEE ICC, 2019.
39

[228] A. Felix, S. Cammerer, S. Dörner, J. Hoydis, and S. Ten Brink, [256] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, “A deep
“OFDM-autoencoder for end-to-end learning of communications sys- reinforcement learning based framework for power-efficient resource
tems,” in IEEE SPAWC, 2018. allocation in cloud RANs,” in Proc. IEEE ICC, 2017.
[229] N. Farsad, M. Rao, and A. Goldsmith, “Deep learning for joint source- [257] J. Li, H. Gao, T. Lv, and Y. Lu, “Deep reinforcement learning based
channel coding of text,” in Proc. IEEE ICASSP, 2018. computation offloading and resource allocation for MEC,” in Proc.
[230] S. Xue, Y. Ma, N. Yi, and R. Tafazolli, “Unsupervised deep learning IEEE WCNC, 2018.
for MU-SIMO joint transmitter and noncoherent receiver design,” IEEE [258] H. Ye and G. Y. Li, “Deep reinforcement learning for resource
Wireless Commun. Lett., vol. 8, no. 1, pp. 177–180, Feb. 2019. allocation in V2V communications,” in Proc. IEEE ICC, 2018.
[231] D. S. Kalogerias, M. Eisen, G. J. Pappas, and A. Ribeiro, “Model- [259] R. Li, Z. Zhao, Q. Sun, I. Chih-Lin, C. Yang, X. Chen, M. Zhao, and
free learning of optimal ergodic policies in wireless systems,” arXiv H. Zhang, “Deep reinforcement learning for resource management in
preprint arXiv:1911.03988, 2019. network slicing,” IEEE Access, vol. 6, pp. 74 429–74 441, 2018.
[232] D. Liu, C. Sun, C. Yang, and L. Hanzo, “Optimizing wireless systems [260] Y. Sun, M. Peng, and S. Mao, “Deep reinforcement learning-based
using unsupervised and reinforced-unsupervised deep learning,” IEEE mode selection and resource management for green fog radio access
Netw. Mag., vol. 34, no. 4, pp. 270–277, Jul./Aug. 2020. networks,” IEEE Internet of Things J., vol. 6, no. 2, pp. 1960–1971,
[233] M. Eisen and A. R. Ribeiro, “Optimal wireless resource allocation with Apr. 2018.
random edge graph neural networks,” IEEE Trans. Signal Process., [261] J. Chen, S. Chen, Q. Wang, B. Cao, G. Feng, and J. Hu, “iRAF: a
vol. 68, pp. 2977–2991, 2020. deep reinforcement learning approach for collaborative mobile edge
[234] N. P. Osmolovskii, Calculus of variations and optimal control. Amer- computing IoT networks,” IEEE Internet of Things J., vol. 6, no. 4,
ican Mathematical Soc., 1998, vol. 180. pp. 7011–7024, Aug. 2019.
[235] J. Gregory, Constrained optimization in the calculus of variations and [262] C. Qi, Y. Hua, R. Li, Z. Zhao, and H. Zhang, “Deep reinforcement
optimal control theory. Chapman and Hall/CRC, 2018. learning with discrete normalized advantage functions for resource
[236] O. C. Zienkiewicz, R. L. Taylor, P. Nithiarasu, and J. Zhu, The finite management in network slicing,” IEEE Commun. Lett., vol. 23, no. 8,
element method. McGraw-hill London, 1977, vol. 3. pp. 1337–1341, Aug. 2019.
[237] J. F. Wendt, Computational fluid dynamics: an introduction. Springer [263] R. Shafin, H. Chen, Y. H. Nam, S. Hur, J. Park, J. Reed, L. Liu
Science & Business Media, 2008. et al., “Self-tuning sectorization: Deep reinforcement learning meets
[238] P. Baldi, “Autoencoders, unsupervised learning, and deep architec- broadcast beam optimization,” IEEE Trans. Wireless Commun., vol. 19,
tures,” in Proc. ICML, 2012. no. 6, pp. 4038–4053, Jun. 2020.
[239] A. Droniou and O. Sigaud, “Gated autoencoders with tied input [264] K. Rusek, J. Suárez-Varela, P. Almasan, P. Barlet-Ros, and A. Cabellos-
weights,” in Proc. ICML, 2013. Aparicio, “Routenet: Leveraging graph neural networks for network
[240] J. E. Prussing, Optimal Spacecraft Trajectories. Oxford Univ. Press, modeling and optimization in SDN,” IEEE J. Sel. Areas Commun.,
2017, ch. Optimal Control Theory. early access, 2020.
[241] C. Sun, D. Liu, and C. Yang, “Model-free unsupervised learning for [265] P. Almasan, J. Suárez-Varela, A. Badia-Sampera, K. Rusek, P. Barlet-
optimization problems with constraints,” in Proc. IEEE APCC, 2019. Ros, and A. Cabellos-Aparicio, “Deep reinforcement learning meets
[242] E. Rodrigues Gomes and R. Kowalczyk, “Dynamic analysis of multi- graph neural networks: exploring a routing optimization use case,”
agent Q-learning with ε-greedy exploration,” in Proc. ICML. ACM, arXiv preprint arXiv:1910.07421, 2020.
2009. [266] O. Marom and B. Rosman, “Belief reward shaping in reinforcement
[243] J. Asmuth, L. Li, M. L. Littman, A. Nouri, and D. Wingate, “A bayesian learning,” in Proc. AAAI, 2018.
sampling approach to exploration in reinforcement learning,” in Proc. [267] A. Y. Ng, D. Harada, and S. Russell, “Policy invariance under reward
UAI. AUAI Press, 2009. transformations: Theory and application to reward shaping,” in ICML,
[244] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 1999.
34–37, 1966. [268] H. Zou, T. Ren, D. Yan, H. Su, and J. Zhu, “Reward shaping via meta-
[245] V. R. Konda and J. N. Tsitsiklis, “Actor-critic algorithms,” in Proc. learning,” arXiv preprint arXiv:1901.09330, 2019.
NIPS, 2000. [269] H. Van Seijen, M. Fatemi, J. Romoff et al., “Hybrid reward architecture
[246] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. for reinforcement learning,” in NIPS, 2017.
Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski [270] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience
et al., “Human-level control through deep reinforcement learning,” replay,” in ICLR, 2015.
Nature, vol. 518, no. 7540, p. 529, 2015. [271] Q. Liang, F. Que, and E. Modiano, “Accelerated primal-dual policy
[247] D. Zeng, L. Gu, S. Pan, J. Cai, and S. Guo, “Resource management optimization for safe reinforcement learning,” in Proc. NIPS, 2017.
at the network edge: A deep reinforcement learning approach,” IEEE [272] E. Altman, Constrained Markov decision processes. CRC Press, 1999,
Netw., vol. 33, no. 3, pp. 26–33, May/Jun. 2019. vol. 7.
[248] Z. Wang, L. Li, Y. Xu, H. Tian, and S. Cui, “Handover optimization via [273] D. V. Djonin and V. Krishnamurthy, “Q-learning algorithms for
asynchronous multi-user deep reinforcement learning,” in Proc. IEEE constrained markov decision processes with randomized monotone
ICC, 2018. policies: Application to MIMO transmission control,” IEEE Trans.
[249] N. Jay, N. Rotman, B. Godfrey, M. Schapira, and A. Tamar, “A deep Signal Process., vol. 55, no. 5, pp. 2170–2181, May 2007.
reinforcement learning perspective on internet congestion control,” in [274] M. H. Ngo and V. Krishnamurthy, “Monotonicity of constrained
Proc. ICML, 2019. optimal transmission policies in correlated fading channels with ARQ,”
[250] S. De Bast, R. Torrea-Duran, A. Chiumento, S. Pollin, and H. Gagacin, IEEE Trans. Signal Process., vol. 58, no. 1, pp. 438–451, Jan. 2010.
“Deep reinforcement learning for dynamic network slicing in IEEE [275] C. Sun, E. Stevens-Navarro, and V. W. Wong, “A constrained MDP-
802.11 networks,” in IEEE INFOCOM, 2019. based vertical handoff decision algorithm for 4G wireless networks,”
[251] R. Atallah, C. Assi, and M. Khabbaz, “Deep reinforcement learning- in Proc. IEEE ICC, 2008.
based scheduling for roadside communication networks,” in Proc. IEEE [276] J. Marecki, M. Petrik, and D. Subramanian, “Solution methods for
WiOpt, 2017. constrained markov decision process with continuous probability mod-
[252] Y. Wei, F. R. Yu, M. Song, and Z. Han, “User scheduling and resource ulation,” in Proc. UAI, 2013.
allocation in HetNets with hybrid energy supply: An actor-critic rein- [277] Y. Chow, M. Ghavamzadeh, L. Janson, and M. Pavone, “Risk-
forcement learning approach,” IEEE Trans. Wireless Commun., vol. 17, constrained reinforcement learning with percentile risk criteria,” Jour-
no. 1, pp. 680–692, Jan. 2018. nal of Machine Learning Research (JMLR), vol. 18, no. 1, pp. 1–51,
[253] S. Chinchali, P. Hu, T. Chu, M. Sharma, M. Bansal, R. Misra, 2018.
M. Pavone, and S. Katti, “Cellular network traffic scheduling with deep [278] S. Bhatnagar and K. Lakshmanan, “An online actor-critic algorithm
reinforcement learning,” in Proc. AAAI, 2018. with function approximation for constrained markov decision pro-
[254] I.-S. Comşa, S. Zhang, M. E. Aydin, P. Kuonen, Y. Lu, R. Trestian, and cesses,” J. Optim. Theory Appl., vol. 153, no. 3, pp. 688–708, Jun.
G. Ghinea, “Towards 5G: A reinforcement learning-based scheduling 2012.
solution for data traffic management,” IEEE Trans. Netw. Service [279] E. Glaessgen and D. Stargel, “The digital twin paradigm for future
Manag., vol. 15, no. 4, pp. 1661–1675, Dec. 2018. NASA and US air force vehicles,” in AIAA/ASME/ASCE/AHS/ASC
[255] H. Meng, D. Chao, Q. Guo, and X. Li, “Delay-sensitive task scheduling Struct. Dyn. Mater. Conf., 2012.
with deep reinforcement learning in mobile-edge computing systems,” [280] T. H.-J. Uhlemann, C. Lehmann, and R. Steinhilper, “The digital twin:
in Journal of Physics: Conference Series, vol. 1229, no. 1. IOP Realizing the cyber-physical production system for industry 4.0,” in
Publishing, 2019. Proc. Elsevier CIRP, 2017.
40

[281] G. Digital, “The digital twin: Compressing time-to-value for digital


industrial companies,” in White Paper, 2018.
[282] C. She, R. Dong, W. Hardjawana, Y. Li, and B. Vucetic, “Optimizing
resource allocation for 5G services with diverse quality-of-service
requirements,” in Proc. IEEE Globecom, 2019.
[283] V. A. de Jesus Oliveira, L. Nedel, and A. Maciel, “Assessment of
an articulatory interface for tactile intercommunication in immersive
virtual environments,” Elsevier Computers & Graphics, vol. 76, pp.
18–28, Nov. 2018.
[284] E. Steinbach, M. Strese, M. Eid, X. Liu, A. Bhardwaj, Q. Liu, M. Al-
Jaafreh, T. Mahmoodi, R. Hassen, A. El Saddik et al., “Haptic codecs
for the tactile internet,” Proc. IEEE, vol. 107, no. 2, pp. 447–470, Feb.
2018.
[285] C. Chaccour, M. N. Soorki, W. Saad, M. Bennis, and P. Popovski,
“Can terahertz provide high-rate reliable low latency communications
for wireless VR?” arXiv preprint arXiv:2005.00536, 2020.
[286] M. Aazam, K. A. Harras, and S. Zeadally, “Fog computing for 5G
tactile industrial internet of things: QoE-aware resource allocation
model,” IEEE Trans Ind. Informat., early access, vol. 15, no. 5, pp.
3085–3092, May 2019.
[287] S. Mangiante, G. Klas, A. Navon, Z. GuanHua, J. Ran, and M. D. Silva,
“VR is on the edge: How to deliver 360o videos in mobile networks,”
in Proc. ACM VR/AR Network, 2017.
[288] L. Zhou, S. Leng, N. Yang, and K. Zhang, “Multitasks scheduling
in delay-bounded mobile edge computing,” in Proc. Springer AICON,
2019.
[289] M. Tekinay and C. Beard, “Reducing computation time of a wireless
resource scheduler by exploiting temporal channel characteristics,”
Springer Wireless Netw., 2019.
[290] A. P. Miettinen and J. K. Nurminen, “Energy efficiency of mobile
clients in cloud computing,” in HotCloud, 2010.
[291] W. Zhang, Y. Wen, K. Guan, D. Kilper, H. Luo, and D. O. Wu, “Energy-
optimal mobile cloud computing under stochastic wireless channel,”
IEEE Trans. Wireless Commun., vol. 12, no. 9, pp. 4569–4581, Sep.
2013.
[292] Ericsson, “Internet of Things forecast,” in
Ericsson Mobility Report, 2019. [Online]. Available:
https://www.ericsson.com/en/mobility-report/internet-of-things-forecast
[293] M. Shirvanimoghaddam, M. Dohler, and S. J. Johnson, “Massive non-
orthogonal multiple access for cellular iot: Potentials and limitations,”
IEEE Commun. Mag., vol. 55, no. 9, pp. 55–61, Sep. 2017.
[294] B. Singh, O. Tirkkonen, Z. Li, and M. A. Uusitalo, “Contention-
based access for ultra-reliable low latency uplink transmissions,” IEEE
Wireless Commun. Lett., vol. 7, no. 2, pp. 182–185, Apr. 2018.
[295] I. A. Hemadeh, K. Satyanarayana, M. El-Hajjar, and L. Hanzo,
“Millimeter-wave communications: Physical channel models, design
considerations, antenna constructions, and link-budget,” IEEE Com-
mun. Surveys Tuts., vol. 20, no. 2, pp. 870–913, 2018.
[296] Y. Xing and T. S. Rappaport, “Propagation measurement system and
approach at 140 GHz-moving to 6G and above 100 GHz,” in Proc.
IEEE Globecom, 2018.
[297] F. Gama, J. Bruna, and A. Ribeiro, “Stability properties of graph neural
networks,” arXiv preprint arXiv:1905.04497, 2019.
[298] C. Li, C.-P. Li, K. Hosseini, S. B. Lee, J. Jiang, W. Chen, G. Horn, T. Ji,
J. E. Smee, and J. Li, “5G-based systems design for tactile internet,”
Proc. IEEE, vol. 107, no. 2, pp. 307–324, 2018.
[299] W. Yang, R. F. Schaefer, and H. V. Poor, “Wiretap channels:
Nonasymptotic fundamental limits,” IEEE Trans Inf. Theory, vol. 65,
no. 7, pp. 4069–4093, Jul. 2019.
[300] E. Seo, H. M. Song, and H. K. Kim, “GIDS: GAN based intrusion
detection system for in-vehicle network,” in Proc. IEEE PST, 2018.
[301] A. Ferdowsi and W. Saad, “Generative adversarial networks for dis-
tributed intrusion detection in the internet of things,” in Proc. IEEE
Globecom, 2019.

You might also like