COVER PAPER
Deep Learning for Wireless Physical Layer: Opportunities and Challenges
Tianqi Wang1, Chao-Kai Wen2, Hanqing Wang1, Feifei Gao3, Tao Jiang4, Shi Jin1,*
1 National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China
2 Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan, China
3 State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China
4 School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
* Corresponding author, email: jinshi@seu.edu.cn
Abstract: Machine learning (ML) has been widely applied to the upper layers of wireless communication systems for various purposes, such as the deployment of cognitive radio and communication networks. However, its application to the physical layer is hampered by sophisticated channel environments and the limited learning ability of conventional ML algorithms. Deep learning (DL) has recently been applied in many fields, such as computer vision and natural language processing, given its expressive capacity and convenient optimization capability. The potential application of DL to the physical layer has also been increasingly recognized because of the new features of future communications, such as complex scenarios with unknown channel models and requirements for high-speed and accurate processing; these features challenge conventional communication theories. This paper presents a comprehensive overview of the emerging studies on DL-based physical layer processing, including leveraging DL to redesign a module of the conventional communication system (for modulation recognition, channel decoding, and detection) and replacing the communication system with a radically new architecture based on an autoencoder. These DL-based methods show promising performance improvements but have certain limitations, such as the lack of solid analytical tools and of architectures specifically designed for communication and implementation research, thereby motivating future research in this field.
Keywords: wireless communications; deep learning; physical layer

Received: Oct. 16, 2017
Editor: Honggang Zhang
92 China Communications • November 2017

This paper reviews the literature on the application of DL methods to the physical layer of wireless communication systems to replace parts of the conventional communication system or create a new DL-based architecture.

I. INTRODUCTION

Wireless communication technologies have developed extensively to satisfy the demands of applications and services in wireless networks. The explosion of advanced wireless applications, such as diverse intelligent terminal access, virtual reality, augmented reality, and the Internet of things, has propelled wireless communication into the fifth generation, which targets thousand-fold capacity, millisecond latency, and massive connectivity, making system design an extraordinarily challenging task. Several promising technologies, such as massive multi-input multi-output (MIMO), millimeter wave (mmWave), and ultra-densification network (UDN), have been proposed to satisfy these demands. These technologies share a common characteristic, namely, the capability to handle large volumes of wireless data. However, extant conventional communication theories exhibit several inherent limitations in meeting the large-data and ultra-high-rate communication requirements of complex scenarios, as listed below.

1) Difficult channel modeling in complex scenarios: The design of communication systems depends significantly on practical channel conditions or on channel models that characterize real environments only implicitly, for mathematical convenience. These models struggle in complex scenarios with many imperfections and nonlinearities [1], although they may capture some features of conventional channels. For example, the increased number of antennas in massive MIMO systems has changed channel properties [2], and the corresponding channel models remain unknown. The use of out-of-band signals or sensors as sources of side information at mmWave is promising [3]; however, a method for combining out-of-band and sensor information to obtain mmWave channel state information remains unknown. In scenarios such as underwater or molecular communications [4], the channels cannot be characterized by rigid mathematical models. Thus, systems or algorithms that can complete communication tasks without defined channel models are essential.

2) Demand for effective and fast signal processing: The use of low-cost hardware, such as low-resolution analog-to-digital converters with low energy consumption [5], [6], introduces additional nonlinear imperfections that require highly robust receiving processing algorithms (e.g., algorithms for channel estimation and detection). However, these algorithms may increase computational complexity. Traditional algorithms, such as those for MIMO data detection, are iterative reconstruction approaches [7] that form a computational bottleneck in real-time operation, whereas real-time processing of large data is necessary for advanced systems (e.g., massive MIMO, mmWave, and UDN). Therefore, the corresponding algorithms require a parallel signal processing architecture [2] to achieve efficiency and accuracy.

3) Limited block-structure communication systems: Conventional communication systems, which are constructed in a divide-and-conquer manner, consist of a series of artificially defined signal processing blocks, such as coding, modulation, and detection, and they solve the communication problems in imperfect channels by optimizing each block independently. Although researchers have optimized the algorithms of each processing module for many years and achieved success in practice, optimal performance for the entire communication task cannot be guaranteed. The fundamental problem of communication is the reliable recovery, at the receiver side, of a message sent by a transmitter through a channel [8], and this process does not require a block structure. Therefore, further improvement is possible if the suboptimization of each module is replaced by optimization for end-to-end performance.

Machine learning (ML) has recently regained attention because of the successful applications of deep learning (DL) in computer vision (CV), automatic speech recognition (ASR), and natural language processing (NLP); the book [9] covers the recent results. Researchers are actively attempting to extend these technologies to other domains, including wireless communication. Embedding ML into a wide range of communication systems has a long history and has achieved several successes, especially in the upper layers, such as in cognitive radio, resource management [10], link adaptation [11], [12], and positioning [13]. In contrast to these straightforward applications, ML faces several challenges when applied to the physical layer.

Researchers have applied ML to the physical layer for modulation recognition [14], [15], channel modeling and identification [16], [17], encoding and decoding [18], [19], channel estimation [20], and equalization
[21], [22] (see further details in [23] and [24]); however, ML has not been used commercially because handling physical channels is a complex process and conventional ML algorithms have limited learning capacity. Researchers believe that ML can achieve further performance improvements by introducing DL to the physical layer. Compared with conventional ML algorithms, DL possesses essential characteristics, such as deep modularization, that significantly enhance feature extraction and structural flexibility. In particular, instead of relying on manual feature extraction, DL-based systems can learn features automatically from raw data and adjust the model structure flexibly via parameter tuning to optimize end-to-end performance. The DL-based communication system has promising applications in complex scenarios for several reasons.

First, deep networks have been proven to be universal function approximators [25] with superior algorithmic learning ability, even under complex channel conditions [1]. The "learned" algorithms in DL-based communication systems are represented by the learned weights of DL models, which optimize end-to-end performance through convenient training methods instead of requiring well-defined mathematical models or expert algorithms solidly grounded in information theory.

Second, handling large data is an essential feature of DL because of its inherently distributed and parallel computing architecture, which ensures computation speed and processing capacity. DL systems show remarkable potential for high computational throughput on fast-developing parallel processing architectures such as graphics processing units (GPUs).

Third, DL-based communication systems can break the artificial block structure to achieve global performance improvement because they are trained to optimize end-to-end performance without implicitly requiring a block structure. In addition, various libraries and frameworks, such as TensorFlow, Theano, Caffe, and MXNet, have been established to accelerate experiments and deploy DL architectures, given the wide application of DL technologies.

Recent studies on DL for wireless communication systems have proposed alternative approaches to enhance certain parts of the conventional communication system (e.g., modulation recognition [1], channel encoding and decoding [26], [27], [28], [29], [30], [31], and channel estimation and detection [7], [4], [32], [33]) and to replace the entire system with a novel architecture based on an autoencoder [1], [34]. This paper provides an overview of these recent studies that focus on the physical layer. We also highlight the potential and challenges of DL-based communication systems and offer a guideline for future investigations by describing the motivation, proposed methods, performance, and limitations of these studies. Table I lists the abbreviations used in this paper.

The rest of this paper is organized as follows. Section II provides a brief overview of the basic concepts of DL. Section III presents several application examples of using DL as an alternative for blocks of communication systems, such as modulation recognition, channel decoding, and detection. Section IV introduces a novel communication architecture based on an autoencoder. Section V discusses areas for future research. Section VI concludes the paper.

II. BASIC IDEA OF DL

Langley (1996) defined ML as a branch of artificial intelligence that aims to improve performance through experience. After long-standing research since the 20th century, various algorithms have been proposed, such as logistic regression, decision tree, support vector machine (SVM), and neural network (NN). DL, an outstanding emerging class of NN algorithms, originates from the neuron model that mimics biological neural systems, as shown in figure 1. The weighted sum of several inputs plus a bias is fed into an activation function σ(⋅), usually a sigmoid function, to obtain an output y. An NN is then
established by connecting several neuron elements into a layered architecture.

Table I List of abbreviations
Abbreviation   Stands for
ML             machine learning
DL             deep learning
MIMO           multi-input multi-output
mmWave         millimeter wave
UDN            ultra-densification network
CV             computer vision
ASR            automatic speech recognition
NLP            natural language processing
SVM            support vector machine
NN             neural network
GD             gradient descent
DNN            deep neural network
ReLU           rectified linear units
SGD            stochastic gradient descent
CNN            convolutional neural network
RNN            recurrent neural network
LSTM           long short-term memory
SNR            signal-to-noise ratio
LLR            log-likelihood ratio
BP             belief propagation
HDPC           high-density parity check
BCH            Bose-Chaudhuri-Hocquenghem
AWGN           additive white Gaussian noise
BER            bit-error rate
mRRD           modified random redundant iterative decoding
NND            neural network decoder
MAP            maximum a posteriori
PNN            partitioned neural network
FC             fixed channel
VC             varying channel
AMP            approximate message passing
SDR            semidefinite relaxation
ISI            intersymbol interference
CP             cyclic prefix
MMSE           minimum mean square error
RTN            radio transformer network
BLER           block error rate
CSI            channel state information

The simplest NN is called the perceptron, which comprises one input layer and one output layer. To make the perceptron produce an output as close as possible to the expected one, a loss function, such as the squared error or cross entropy, must be defined. Gradient descent (GD) is commonly used to find the parameters (i.e., weights and biases) that minimize this loss function. A single perceptron can only solve linearly separable problems; adding hidden layers and neurons between the input and output layers introduces nonlinear properties and turns the network into a universal function approximator. The resulting architecture, called the multi-layer perceptron, does not differ significantly from the current deep NN (DNN), that is, an NN with multiple hidden layers, which has achieved success in CV, ASR, and NLP.

A basic DL model is a fully connected feedforward NN (figure 2), in which each neuron is connected to all neurons in the adjacent layers and no connections exist within the same layer. The backpropagation algorithm has been proposed as an efficient method for training such a network with GD. However, increasing the number of hidden layers and neurons introduces many more parameters to be determined, making the network difficult to implement. Numerous problems may be encountered during training, such as vanishing gradients, slow convergence, and falling into local minima.

Fig. 1 A mathematical model of a neuron, where r features are connected by weighted edges and fed into a sigmoid activation function to generate an output y

Fig. 2 A fully connected feedforward NN architecture in which all neurons in adjacent layers are fully connected. The DNN uses multiple hidden layers between the input and output layers to extract meaningful features

To solve the vanishing gradient problem, new activation functions, such as rectified linear units (ReLU), a special case of Maxout, have been introduced to replace the classic sigmoid function. Table II lists several activation functions suited to different situations. To achieve faster convergence and decrease computational complexity, classic GD is modified into stochastic GD (SGD), which randomly selects one sample to compute the loss and gradient at each step. This stochasticity causes severe fluctuation in the training process; thus, mini-batch SGD (a batch of samples computed simultaneously) is adopted as a tradeoff between classic GD and SGD. However, such algorithms may still converge to a locally optimal solution. To solve this problem and further increase the training speed, several adaptive learning rate algorithms (e.g., Adagrad, RMSProp, Momentum, and Adam) have been proposed. Although a trained network may perform well on the training data, it may perform poorly on test data because of overfitting. To address this, early stopping, regularization, and dropout schemes have been proposed to achieve favorable results on both training and test data.

Table II Activation functions
Name       Activation function σ(x)
sigmoid    $\frac{1}{1+e^{-x}}$
tanh       $\tanh(x)$
softmax    $\frac{e^{x_i}}{\sum_j e^{x_j}}$
ReLU       $\max(0, x)$

Convolutional NN (CNN) is another emerging DNN architecture, developed from the fully connected feedforward network to prevent rapid growth in the number of parameters when the latter is applied to image recognition. CNN introduces the idea of designing particular DNN architectures according to the requirements of specific scenarios. The basic concept of CNN is to add convolutional and pooling layers before a fully connected network (figure 3). In a convolutional layer, each neuron connects only to a subset of neurons in the preceding layer. These neurons, organized in matrix form, comprise several feature maps, and the neurons within each map share the same weights. In a pooling layer, the neurons in the feature maps are grouped to compute the mean value (average pooling) or the maximum value (max pooling). Thus, the number of parameters is substantially reduced before the fully connected network.

Recurrent NN (RNN) aims to provide NNs with memory, because in cases such as NLP the outputs depend not only on the current inputs but also on previously available information. Unlike the aforementioned memoryless NNs, which have no connections within a hidden layer, an RNN connects its neurons such that each hidden layer takes its former outputs as current inputs, thereby acquiring memory (figure 4). Commonly used RNNs include the Elman network, the Jordan network, the bidirectional RNN, and long short-term memory (LSTM).

III. DL AS AN ALTERNATIVE

The general classic communication system architecture is constructed as a block structure, as shown in figure 5, and multiple algorithms solidly founded on expert knowledge have been developed over long-term research to optimize each processing block therein. Previous studies have attempted to leverage conventional ML approaches, such as SVM and small feedforward NNs, as alternative algorithms for individual tasks. DL architectures have recently been introduced into several processing
blocks to adapt to emerging complex communication scenarios or outperform conventional communication algorithms. This section presents some examples of DL applications that cover modulation recognition, channel decoding, and detection.
Fig. 3 A CNN architecture that adds convolution layers and pooling layers before dense layers. Each output in the convolution layers is obtained by the dot product between a filter matrix and an input matrix comprising several neurons in the upper layer. Each output in the pooling layers is obtained by averaging or taking the maximum over a group of neurons in the convolution layer

3.1 Modulation recognition

Modulation recognition aims to identify the modulation scheme of a received noisy signal, which is important for facilitating communication among different communication systems and, in military use, for interfering with and monitoring enemy transmissions. Studies on modulation recognition have been conducted for many years using conventional algorithms, which fall into two categories, namely, decision-theoretic and pattern recognition approaches [35]. These approaches share several common procedures, such as preprocessing, feature extraction, and classification. Previous studies have been keen to leverage ML algorithms (usually SVM and NN) because of their robustness, self-adaptation, and nonlinear processing ability.
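The feature-extraction and classification steps can be illustrated with a small sketch. It computes three simple statistics from complex baseband samples (the spread of the normalized instantaneous amplitude, the unwrapped instantaneous phase, and the instantaneous frequency) and then classifies with a nearest-centroid rule. The specific features, the function names, and the nearest-centroid classifier are assumptions chosen for brevity; they stand in for the richer expert features and the SVM or NN classifiers used in the cited studies, not a reproduction of them.

```python
import numpy as np


def instantaneous_features(iq):
    """Hand-crafted features from complex baseband samples (illustrative choice).

    Returns the standard deviations of the normalized, centered instantaneous
    amplitude, the unwrapped instantaneous phase (radians), and the
    instantaneous frequency (cycles per sample).
    """
    iq = np.asarray(iq, dtype=complex)
    amp = np.abs(iq)
    amp_cn = amp / amp.mean() - 1.0        # normalized, centered amplitude
    phase = np.unwrap(np.angle(iq))        # instantaneous phase
    freq = np.diff(phase) / (2.0 * np.pi)  # instantaneous frequency
    return np.array([amp_cn.std(), phase.std(), freq.std()])


def nearest_centroid(features, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    labels = list(centroids)
    dists = [np.linalg.norm(features - centroids[k]) for k in labels]
    return labels[int(np.argmin(dists))]


# Toy, noise-free signals stand in for preprocessed received samples.
rng = np.random.default_rng(0)
n = np.arange(1024)
tone = np.exp(1j * 2 * np.pi * 0.05 * n)                              # constant envelope
ask4 = rng.choice([1.0, 2.0, 3.0, 4.0], size=n.size).astype(complex)  # 4 amplitude levels

f_tone = instantaneous_features(tone)
f_ask = instantaneous_features(ask4)

# Hypothetical class centroids on the amplitude feature only.
centroids = {"constant-envelope": np.array([0.0]),
             "multi-level ASK": np.array([0.45])}
print(nearest_centroid(f_tone[:1], centroids))  # constant-envelope
print(nearest_centroid(f_ask[:1], centroids))   # multi-level ASK
```

In a real classifier, the centroids (or an SVM or NN) would be fitted on labeled, noisy training signals, and many more features would be needed to separate schemes such as PSK and FSK that share a constant envelope.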
Fig. 4 An RNN architecture that treats the features extracted in the former state as part of the current input information. The current outputs depend on both current and former inputs, so that the network achieves memory

Fig. 5 A typical communication system diagram with blocks including source encoding/decoding, channel encoding/decoding, modulation/demodulation, channel estimation and detection, and RF transceiving. These signal processing blocks are optimized individually to achieve reliable communication between the source and the target destination

An NN architecture (figure 6) is proposed in [35] as a powerful modulation classifier to discriminate noise-corrupted band-limited modulated signals among 13 types of digital (e.g., MPSK, MASK, and MFSK) and analog (e.g., AM, DSB, and FM) modulation schemes. Similar to conventional expert feature analysis, this approach manually extracts from the original signals the features that characterize digital [36] or analog [37] modulations, as well as the three fundamental parameters of instantaneous amplitude, phase,
and frequency. A four-layer NN is then fed with these features to discriminate the modulation schemes, except for the levels of MASK and MFSK, which are identified by two additional two-layer NNs. The abovementioned problem-solving procedures, namely, manual feature extraction followed by NN classification, have been commonly applied in previous research (e.g., [14]). Their performance strongly depends on the extracted features because of the limited learning ability of conventional NNs.

Fig. 6 An NN architecture for modulation recognition that consists of a four-layer NN and two two-layer NNs [35]. The former NN distinguishes most modulation schemes except for ASK and FSK. The latter NNs classify ASK2/ASK4 and FSK2/FSK4 with additional manually extracted key features
[Figure: the four-layer NN takes the extracted features as input and outputs AM, DSB, VSB, LSB, USB, combined AM-FM, FM, PSK2, PSK4, MASK, or MFSK; the two two-layer NNs, with additional key features (key feature 1, key feature 2), split MASK into ASK2/ASK4 and MFSK into FSK2/FSK4]

DL is known for its impressive learning capacity. Its introduction raises the possibility of replacing manual feature extraction with automatic feature learning from raw data to optimize end-to-end performance. For example, a CNN-based approach is proposed in [1] that learns single-carrier modulation schemes from sampled raw time-series data in the radio domain. The CNN classifier is trained on 1.2M sequences of 128 complex-valued baseband IQ samples covering 10 different digital and analog modulation schemes that pass through a wireless channel with the effects of multipath fading, sample rate offset, and center frequency offset. The CNN-based modulation classifier outperforms two other approaches, namely, extreme gradient boosting with 1,000 estimators and a single scikit-learn tree, both working on the extracted expert features. The performance of the classifier improves as the signal-to-noise ratio (SNR) increases. However, the performance cannot be improved further at high SNR, because the short-term nature of the training samples causes the CNN classifier to confuse AM with FM when the underlying signal carries limited information, and QAM16 with QAM64, which share constellation points. More samples must be used to eliminate such confusion and highlight the potential of the CNN architecture for further improvement.

3.2 Channel decoding

ML-based decoders emerged in the 1990s [28] because NNs can be applied straightforwardly to channel decoding. First, channel decoding algorithms focus on bit-level processing; therefore, bits, or the log-likelihood ratios (LLRs) of codewords, are conveniently treated as the inputs and expected outputs of NNs. In most previous studies, the input and output nodes either directly represent the bits of codewords [38] or use a one-hot representation (i.e., each node represents one of all possible codewords) [39], in which the corresponding vector has only one element equal to 1 and all other elements equal to 0. Second, unlike other scenarios in which datasets are difficult to obtain, man-made codewords can provide sufficient training samples together with their labels. Furthermore, an NN can learn from noisy versions of codewords and avoid overfitting, because the channel noise turns the codewords into different samples in each training epoch.

Compared with conventional decoders, which are designed strictly based on information theory and often follow an iterative process that leads to high latency, the NN architecture does not require expert knowledge. Once the decoder is trained, decoding becomes simple and has low latency. Furthermore, well-developed conventional decoding algorithms can
serve as benchmarks for comparing the performance of newly proposed DL-based methods.

Despite these advantages, NN-based decoders are fundamentally restricted by dimensionality (i.e., the training complexity increases exponentially with the block length) [40] in their ability to fully learn from and classify codewords, which limits their scalability. Fortunately, DL algorithms offer a potential solution to this problem. Aside from DL's inherently parallel implementation of complex computations, its multi-layer architecture with realizable training methods gives it an excellent learning capacity. Recent studies have leveraged these advantages to address the issue of dimensionality by introducing DL into well-developed iterative algorithms (i.e., unfolding the iterative structure into a layered structure) and by generalizing from limited codewords (i.e., inferring unseen codewords from seen ones).

The fully connected DNN-based decoder proposed in [26], which falls under the first category, aims to improve the performance of the belief propagation (BP) algorithm in decoding high-density parity check (HDPC) codes. The BP algorithm can achieve near-Shannon-capacity performance in decoding low-density parity check codes but struggles with HDPC codes, such as the Bose-Chaudhuri-Hocquenghem (BCH) codes commonly used today. Conventional BP decoders can be constructed on a Tanner graph, in which each variable node is connected to some check nodes. In each iteration, the message that a variable (check) node sends to one of its connected check (variable) nodes is computed from the messages received from all of its other connected check (variable) nodes. Thus, the BP algorithm with L iterations can be unfolded into 2L hidden layers of a fully connected DNN architecture, in which each hidden layer contains the same number of neurons, each representing an edge of the Tanner graph. These neurons output the messages transmitted on the corresponding edges. In other words, the odd (even) hidden layers output messages transmitted from the variable (check) nodes to the check (variable) nodes associated with the neurons, that is, with the edges of the Tanner graph.

The input and output are vectors of size N that represent the N LLRs received from the channel and the N bits of the decoded codeword, respectively. The equations for calculating the messages based on expert knowledge are then applied to the corresponding layers, and the final marginalization of the BP algorithm is performed after the last layer (i.e., after the last iteration), whose output enters the loss function. The only difference between the DNN-based BP algorithm and the original algorithm is that weights are added to these equations or, equivalently, assigned to the edges of the Tanner graph. Thus, the DNN-based decoder shares the decoding structure of the Tanner graph, but the messages are propagated over weighted edges.

Fig. 7 A fully connected DNN-based BP decoder that unfolds conventional BP algorithms with L iterations into 2L hidden layers [26]

The DNN-based BP decoder proposed in [26] is shown in figure 7. The first two hidden layers are merged into one layer because the check nodes do not contain any information in the first iteration. The LLR messages (i.e., inputs) are needed by the variable nodes to calculate the outgoing messages according to the corresponding BP formulas; in figure 7, these messages are represented as red arrows at the odd layers. Because its structure is essentially the same as that of BP, the DNN-based BP decoder preserves the property of BP that performance is independent of the transmitted codeword. Thus, the network can be trained with noisy versions of a single codeword (i.e., the all-zero codeword), which belongs to every linear code.

After being trained on codewords that pass through the additive white Gaussian noise (AWGN) channel at SNRs ranging from 1 dB to 6 dB, a DNN-based BP decoder for BCH(15,11) with 10 hidden layers (i.e., 5 full iterations) achieves results close to maximum likelihood. The performance of the DNN-based decoder degrades on large BCH codes; nevertheless, the decoder consistently outperforms the conventional BP algorithm. One interpretation given by [26] is that properly weighting the transmitted messages compensates for the effect of small cycles in the original Tanner graph. Furthermore, the final marginalization can be computed after each odd hidden layer. Therefore, a modified architecture called multiloss, which adds these intermediate marginalizations to the loss function, is proposed to increase the gradient updates and allow the lower layers to learn. The multiloss architecture achieves better bit-error rate (BER) performance than the previous DNN-based BP decoder.

In [27], the aforementioned fully connected DNN-based BP decoder is transformed into an RNN architecture, named the BP-RNN decoder, by using the same weights in every iteration and feeding back the outputs of the parity layer into the inputs of the variable layer, as shown in figure 8. The number of time steps equals the number of iterations. This process significantly reduces the number of parameters and yields performance comparable with that of the former decoder. The multiloss concept is also adopted in this architecture. These methods outperform the plain BP algorithm on both regular and sparser Tanner graph representations of the codes. A modified random redundant iterative decoding (mRRD)-RNN decoder, which combines the BP-RNN decoder with the mRRD algorithm [41] by replacing the BP blocks of mRRD with BP-RNN decoders, is proposed and outperforms the plain mRRD decoder with only a slight increase in complexity.
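To make the unfolding concrete, the following is a minimal sketch of sum-product BP in which each unfolded iteration (one variable-node layer followed by one check-node layer) scales the incoming check-to-variable messages on every Tanner-graph edge by a weight. The parameterization is a simplifying assumption: the decoder of [26] also weights the channel LLRs and the output marginalization, and here the weights are fixed rather than trained. With all weights equal to 1 the function reduces to plain BP; passing the same weight slice for every iteration mimics the weight tying of the BP-RNN decoder. The function name `weighted_bp_decode` is hypothetical, and the Hamming (7,4) parity-check matrix is only a small stand-in for the BCH codes discussed above.

```python
import numpy as np


def weighted_bp_decode(H, llr, weights):
    """Unfolded sum-product decoding with per-edge weights (illustrative sketch).

    H       : (m, n) binary parity-check matrix.
    llr     : (n,) channel LLRs, log P(bit = 0) / P(bit = 1).
    weights : (L, m, n) array; weights[t, i, j] scales the check-to-variable
              message on edge (i, j) when variable node j is updated in
              iteration t. All ones gives plain BP.
    Returns the hard-decision bits and the output LLRs.
    """
    H = np.asarray(H, dtype=bool)
    llr = np.asarray(llr, dtype=float)
    m, n = H.shape
    c2v = np.zeros((m, n))                       # check-to-variable messages
    for w in np.asarray(weights, dtype=float):   # one iteration per weight slice
        # Variable-node (odd) layer: channel LLR plus the weighted messages
        # from all *other* connected check nodes.
        weighted = np.where(H, w * c2v, 0.0)
        v2c = np.where(H, llr + weighted.sum(axis=0) - weighted, 0.0)
        # Check-node (even) layer: tanh rule over all *other* connected variables.
        t = np.tanh(v2c / 2.0)
        c2v = np.zeros((m, n))
        for i in range(m):
            cols = np.flatnonzero(H[i])
            for j in cols:
                p = np.prod(t[i, cols[cols != j]])
                c2v[i, j] = 2.0 * np.arctanh(np.clip(p, -0.999999, 0.999999))
    out = llr + np.where(H, c2v, 0.0).sum(axis=0)  # final (unweighted) marginalization
    return (out < 0).astype(int), out


# Hamming (7,4) parity-check matrix as a small stand-in code.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
# All-zero codeword received with the last bit unreliable (negative LLR).
llr = np.array([4.0, 4.0, 4.0, 4.0, 4.0, 4.0, -1.0])
L = 5
plain_bits, _ = weighted_bp_decode(H, llr, np.ones((L,) + H.shape))
# Tied weights (same slice every iteration), as in the BP-RNN decoder; 0.8 is arbitrary.
tied = np.broadcast_to(np.full(H.shape, 0.8), (L,) + H.shape)
tied_bits, _ = weighted_bp_decode(H, llr, tied)
print((llr < 0).astype(int))   # [0 0 0 0 0 0 1]
print(plain_bits, tied_bits)   # [0 0 0 0 0 0 0] [0 0 0 0 0 0 0]
```

The first weight slice has no effect because all check-to-variable messages start at zero, which mirrors the merged first hidden layer described above. Training would treat the weight array as the learnable parameters and minimize a loss on the output LLRs, for example over noisy versions of the all-zero codeword.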
Fig. 8 A BP-RNN decoder architecture with a variable layer and a parity layer [27]. The output of the parity layer is fed into the variable layer in the next time step, so that L time steps represent L iterations of the conventional BP algorithm

A plain DNN architecture called the NN decoder (NND) is proposed in [28] to decode codewords of length N with K information bits, as shown in figure 9. An encoded codeword of length N passes through a modulation layer and an AWGN channel layer that represent the effect of the communication channel. The LLRs of this noisy codeword then form the N-dimensional input of a network with three hidden layers, which is trained to output the K estimated information bits.
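The data path described above (encoder, modulation layer, AWGN channel layer, and LLR generator) can be sketched as a training-batch generator. This sketch assumes BPSK modulation (bit 0 maps to +1, bit 1 to -1), an SNR defined as Es/N0 with unit symbol energy, and a generic binary generator matrix `G`; the function name and the random (16, 8) code in the usage lines are hypothetical and are not the codes used in [28]. The NN with three hidden layers that would consume these LLRs and output K bits is not shown.

```python
import numpy as np


def nnd_training_batch(G, batch_size, snr_db, rng):
    """Generate (LLR inputs, message targets) for training an NN decoder (sketch).

    G      : (K, N) binary generator matrix of a linear code (assumed given).
    snr_db : Es/N0 in dB, with unit-energy BPSK symbols (assumption).
    """
    K, N = G.shape
    msgs = rng.integers(0, 2, size=(batch_size, K))        # K information bits per sample
    codewords = msgs @ G % 2                               # encoder: mod-2 matrix product
    x = 1.0 - 2.0 * codewords                              # modulation layer (BPSK)
    sigma2 = 1.0 / (2.0 * 10.0 ** (snr_db / 10.0))         # real noise variance N0/2
    y = x + rng.normal(0.0, np.sqrt(sigma2), size=x.shape)  # AWGN channel layer
    llr = 2.0 * y / sigma2                                 # LLR generator for BPSK over AWGN
    return llr, msgs


rng = np.random.default_rng(1)
G = rng.integers(0, 2, size=(8, 16))                       # hypothetical (16, 8) code
llr, msgs = nnd_training_batch(G, batch_size=256, snr_db=1.0, rng=rng)
print(llr.shape, msgs.shape)                               # (256, 16) (256, 8)
```

Drawing a fresh batch at every training step re-randomizes the noise, and `snr_db` is the training-SNR parameter whose choice [28] discusses.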

encoder
Fig. 9 A plain DNN architecture for channel decoding to decode K-bit messages from N-bit noisy codewords [28]

This NND works for random and structured codes (e.g., polar codes) of length 16 bits and achieves maximum a posteriori (MAP) performance; however, its performance degrades as the number of information bits increases. Fortunately, when decoding structured codes, the NND can generalize from a subset of codewords (used for training) to unseen codewords, which makes it a promising way to address the issue of dimensionality by learning a form of decoding algorithm.
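The dimensionality issue becomes concrete when a small structured code is enumerated: the codebook has 2^K entries, so exhaustive training becomes infeasible as K grows. The sketch below builds a length-16 polar-type code from the Kronecker power of the 2 × 2 kernel, lists all codewords, and holds half of them out, mirroring the train-on-a-subset, test-on-unseen-codewords setting described above. K = 8, the 50% split, and the information set chosen by a simple row-weight heuristic (not the reliability-based construction used for practical polar codes) are all assumptions for illustration.

```python
import itertools
import numpy as np

def polar_transform(n):
    """Kronecker power of F = [[1, 0], [1, 1]] (no bit-reversal permutation)."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.ones((1, 1), dtype=int)
    for _ in range(n):
        G = np.kron(G, F)
    return G

def polar_codebook(n, K):
    """Enumerate all 2**K codewords of a length-2**n polar-type code."""
    G = polar_transform(n)
    N = G.shape[0]
    weights = G.sum(axis=1)
    # Heuristic information set: the K heaviest rows, ties broken by larger index.
    info = sorted(sorted(range(N), key=lambda i: (-weights[i], -i))[:K])
    msgs = np.array(list(itertools.product([0, 1], repeat=K)), dtype=int)
    u = np.zeros((len(msgs), N), dtype=int)   # frozen positions stay 0
    u[:, info] = msgs
    return msgs, u @ G % 2, info

msgs, codebook, info = polar_codebook(n=4, K=8)
print(codebook.shape)  # (256, 16)

# Train on half of the codewords; the other half is never seen in training.
rng = np.random.default_rng(0)
perm = rng.permutation(len(codebook))
train_idx, test_idx = perm[:128], perm[128:]
```

Because the code is linear, the held-out codewords are sums of training codewords, which is the kind of structure a decoder can exploit to generalize; a random code offers no such structure.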
Other advantages, such as a parallel architecture and one-shot decoding (i.e., no iterations) with low latency, make the NND a promising alternative channel-decoding algorithm. The authors in [28] suggest that there exists an optimal training SNR for learning to classify codewords over arbitrary SNRs, and argue that more training epochs can lead to better performance. Whether the network is trained on direct channel values or on LLRs, and whether mean squared error or binary cross-entropy is used as the loss function, has no significant effect on the final results.

Fig. 10 A partitioned NN decoding architecture for polar codes with each NND decoding a sub-codeword [29]

To further scale DL-based decoding to large codewords, several of the former NNDs, each decoding a sub-codeword, are combined in [29]. These NNDs are first trained individually to achieve MAP performance and then combined to replace the sub-blocks of a conventional decoder for polar codes. Thus, a large codeword is decoded concurrently. Specifically, the encoding graph of polar codes, which are defined as partitionable codes in [29], can be partitioned into sub-blocks that can be decoded independently. Accordingly, the corresponding BP decoder is partitioned into sub-blocks, which are replaced by NN architectures connected through the remaining BP decoding stages.

A partitioned neural network (PNN) architecture is proposed, as shown in figure 10. The received LLR values from the channel are propagated step by step according to the BP update algorithm and arrive at the first NND, which decodes the first sub-codeword; the decoded bits are then propagated through the remaining BP decoding stages as known bits. This decoding process continues until all sub-codewords are sequentially decoded. A pipeline implementation can also be applied to decode multiple codewords. The proposed PNN architecture competes well with conventional successive cancellation and BP decoding algorithms; however, its performance deteriorates as the number of partitioned sub-blocks increases, which limits its application to large codes. Nevertheless, PNN still offers a promising solution to dimensionality problems.

3.3 Detection

Along with the increasing deployment of advanced communication systems with promising performance, capacity, and resources (e.g., massive MIMO and mmWave), communication scenarios and channels are becoming increasingly complex, which increases the computational complexity of channel models and the corresponding detection algorithms. Conventional (iterative) detection algorithms form a computational bottleneck in real-time implementation. In comparison, given their expressive capacity and tractable parameter optimization, DL methods can be used for detection by unfolding specific iterative detection algorithms (similar to channel decoding), and their flexible layer structures allow a tradeoff between accuracy and complexity. Data detection then becomes a simple forward pass through the network and can be performed in real time.

In [7], a DL-based detector called DetNet, which aims to reconstruct the transmitted x by treating the received y and the channel matrix H (assumed to be perfectly known) as inputs, is introduced by unfolding a projected gradient
descent algorithm for maximum likelihood optimization, in which the iteration is computed as follows, as given by [7]:

$$
\hat{x}^{(k+1)} = \Pi\left[\hat{x}^{(k)} - \delta_k \left.\frac{\partial \|y - Hx\|^2}{\partial x}\right|_{x=\hat{x}^{(k)}}\right] = \Pi\left[\hat{x}^{(k)} + \delta_k H^T y - \delta_k H^T H \hat{x}^{(k)}\right], \tag{1}
$$

where δ_k is the step size (absorbing the constant factor 2 of the gradient), x̂^(k) is the estimate in the kth iteration, and Π(·) is a nonlinear projection operator. Each iteration therefore applies a nonlinear projection to a linear combination of quantities formed from the data. In DetNet, these quantities are lifted into a higher dimension and processed by a ReLU activation function (represented by ρ in (2), as given by [7]) to leverage the DNN architecture. As shown in figure 11, each original iteration is unfolded into the following layer:

$$
\begin{aligned}
z^{(k)} &= \rho\left(W_{1k}\begin{bmatrix} H^T y \\ \hat{x}^{(k)} \\ H^T H \hat{x}^{(k)} \\ v^{(k)} \end{bmatrix} + b_{1k}\right),\\
\hat{x}^{(k+1)} &= \psi_{t_k}\left(W_{2k} z^{(k)} + b_{2k}\right),\\
v^{(k+1)} &= W_{3k} z^{(k)} + b_{3k},\\
\hat{x}^{(1)} &= 0,
\end{aligned} \tag{2}
$$

where {W_1k, W_2k, W_3k} and {b_1k, b_2k, b_3k} are the weights and biases, respectively, and ψ_{t_k}(·) is a piecewise linear soft sign operator. L layers (i.e., L iterations) are applied in DetNet. A loss function covering the outputs of all layers is adopted to prevent vanishing gradients, and the Adam optimization algorithm is used for training.

To test the robustness and generalization of DetNet in complex channels, two channel scenarios are considered, namely, the fixed channel (FC) model with a deterministic yet ill-conditioned H and the varying channel (VC) model with an H that is randomly generated from a known distribution. Compared with the conventional approximate message passing (AMP) and semidefinite relaxation (SDR) algorithms, which provide near-optimal detection accuracy, the simulation results in the FC scenario indicate that DetNet outperforms AMP and achieves accuracy similar to that of SDR while running 30 times faster, whether it is trained on the FC or the VC channel. Therefore, DetNet remains robust in ill-conditioned channels and can be trained to generalize to arbitrary channels. Similarly, in the VC scenario, DetNet runs 30 times faster than SDR with comparable performance. A DetNet with fewer layers runs even faster but is less accurate, which illustrates a tradeoff between accuracy and complexity. Accordingly, DetNet presents a promising solution to computationally challenging detection tasks in advanced complex scenarios.

Fig. 11 A single layer structure of the DetNet architecture [7]. L layers of the DetNet represent L iterations of the projected gradient descent algorithm

Beyond such complex but well-modeled channels, in some highly complex cases, such as molecular and underwater communications, no mathematically tractable channel model is available to characterize the physical propagation process accurately. Therefore, new detection approaches that do not require channel information must be devised for these novel systems. DL offers a promising solution to this problem because of its expressive capacity, its data-driven nature, its independence from a predefined channel model, and its ability to optimize end-to-end performance. In [4], a fully connected
DNN, CNN, and RNN are applied for detection in a molecular communication system. A molecular communication experimental platform is established to generate an adequate dataset by repeatedly transmitting a consecutive sequence of N symbols from M possible types. Chemical signals, namely acids (representing bit-0) and bases (representing bit-1), are used to encode information in the pH level. After transmission in water, the pH values in each symbol interval are treated as the received signals. In a simple baseline detection, bit-0 or bit-1 is detected in each symbol interval according to a decrease or an increase in pH, respectively, which is decided by the difference between the subintervals into which the original symbol interval is divided (i.e., positive for bit-1 and negative for bit-0).

When designing the DL-based detector, a simple memoryless system is considered, such that the nth received signal is determined only by the nth transmitted signal x_n. Symbol-to-symbol detection can then be implemented using Dense-Net, a basic fully connected DNN architecture (figure 12) that takes as input the received feature vector y_n, formed by the pH difference values described above and some absolute values, and outputs an M-dimensional probability vector. A CNN-based detector is also introduced to adapt to the effect of random shifts. A system with inter-symbol interference (ISI) and memory presents a highly sophisticated yet realistic scenario that requires sequence detection, which can be implemented by the LSTM network, a typical RNN algorithm for sequence processing. As shown in figure 13, the nth estimate x̂_n is detected from the previously and currently received signals.

Fig. 12 A dense-Net for symbol-to-symbol detection to detect an estimated x̂_n in one-hot representation [4]

Fig. 13 An LSTM-based detector for sequence detection with L LSTM layers followed by the dense output layer with softmax activation function [4]

The simulation results demonstrate that all these DL-based detectors outperform the baseline, and the LSTM-based detector shows outstanding performance in the molecular communication system with ISI. This result validates the potential application of DL in future novel systems and highlights the importance of selecting suitable DL architectures that adequately reflect the characteristics of the physical channel.

Aside from complex scenarios, DL can also be applied to well-researched channel conditions to further enhance performance. For example, in [32], a five-layer fully connected DNN is embedded into an OFDM system for channel estimation and detection by treating the channel as a black box. During offline training, the original data and pilots are formed into frames, processed by the inverse discrete Fourier transform (IDFT), extended with a cyclic prefix (CP), and passed through a statistical channel model with distortion, which yields the received time-domain signal. The frequency-domain complex signal is obtained
by removing the CP from this time-domain signal and performing the discrete Fourier transform. The frequency-domain signal, comprising data and pilot information, is then fed into the DNN detector to reconstruct the transmitted data in the frequency domain. In comparison with the conventional minimum mean square error (MMSE) method, the DNN detector achieves comparable performance in online testing but performs better when fewer pilots or no CP are used, or when clipping distortion is introduced to reduce the peak-to-average power ratio. The DNN detector also shows stable performance when tested in channel models with different delays and numbers of paths, which highlights its robustness and its ability to further improve conventional communication systems.

IV. DL AS A NOVEL COMMUNICATION ARCHITECTURE

Despite their promising performance, DL-based algorithms are often proposed as alternatives for one or two processing blocks of the classic block-structured communication system (figure 5). However, the basic task of communication is to propagate signals from one point to another through a physical channel, using a transmitter and a receiver to manage the practical channel effects and ensure reliability, without requiring any rigid block structure. Thus, optimizing each block separately cannot guarantee a global optimum for the communication problem, because further performance improvements can be achieved if two or more blocks are jointly optimized. O’Shea et al. recast communication as an end-to-end reconstruction optimization task and propose a novel concept based on DL theory that represents the simplified system as an autoencoder. They first introduce an autoencoder to the field of communication [1] and then propose the radio transformer network (RTN) to combine the DL architecture with the knowledge of communication experts. This autoencoder system is also extended to multi-user and MIMO scenarios in [1] and [34], respectively.

4.1 Autoencoder-based End-to-end System

In [1], communication is considered an end-to-end reconstruction problem in which the transmitted messages are reconstructed at the receiver side over a physical channel (figure 14). Therefore, an autoencoder can represent the entire communication system and jointly optimize the transmitter and receiver over an AWGN channel. An original autoencoder is an unsupervised DL algorithm that learns a compressed representation of its inputs from which the inputs can be reconstructed at the output layer. In the proposed approach, the transmitter and receiver are represented as fully connected DNNs, whereas the AWGN channel between them is represented as a simple noise layer with a certain variance. Thus, the communication system can be regarded as a large autoencoder that learns, from s, which is one of the M possible messages to be transmitted, a representation of the transmitted signal x that is robust against the imperfect channel. At the receiver side, the original message can then be reconstructed as ŝ with a low error rate by learning from the received y. The entire autoencoder-based communication system is trained to optimize end-to-end performance, such as the BER or block error rate (BLER). However, this process may add redundancy to the representation x, unlike the original DL autoencoder, which learns to compress its inputs.

Fig. 14 A simple form of communication system that reconstructs the transmitted message at the receiver side [8].

In an implementation example (figure 15)
[1], s is represented as an M-dimensional one-hot vector, so that K = log₂(M) bits are transmitted simultaneously. After s is fed into the DNN transmitter, which has multiple dense layers followed by a normalization layer, an N-dimensional vector x satisfying an energy constraint is generated. The communication rate of such a system is therefore R = K/N. The received N-dimensional signal y, corrupted by a channel represented as a conditional probability density function p(y|x), is then processed by the DNN receiver, which also has multiple dense layers. The last layer of the receiver is a softmax activation layer that outputs an M-dimensional probability vector p whose elements are nonnegative and sum to 1. The index of the largest element, i.e., the one with the highest probability, determines which of the M possible messages is the decoded ŝ. The autoencoder is trained by stochastic gradient descent (SGD) at a fixed SNR with categorical cross-entropy as the loss function to optimize BLER performance. The autoencoder-based communication system achieves comparable or better performance than conventional BPSK with a Hamming code, which indicates that the system has learned a joint coding and modulation scheme.

Such an autoencoder architecture can also solve other communication problems in the physical layer, such as pulse shaping and offset compensation, by operating on IQ samples, and it can be applied to complex scenarios where the communication channels are unknown.

Fig. 15 A simple autoencoder for an end-to-end communication system [1] that encodes s in one-hot representation into an N-dimensional transmitted signal x. The received signal y, i.e., x after noise is added, is then decoded into an M-dimensional probability vector p, from which ŝ is determined

4.2 Extended architecture with expert knowledge

Aside from interpreting the communication system as a plain DL model, it is reasonable to consider introducing communication expert knowledge or adjusting the DL architecture to accommodate certain communication scenarios or to accelerate the training phase. As shown in [1], certain parametric transformations correspond to the channel effect. The inverse forms of these transformations can compensate for the channel distortion, and the estimation of their parameters is highly related to the received signals. Communication knowledge can therefore be integrated into a DL system by generating the parameters of such deterministic transformations from these signals.

Fig. 16 A receiver implemented as an RTN [1]. In the receiver, a new block consisting of dense layers with linear activation function and a deterministic transformation layer is added to the original autoencoder-based communication system

An RTN, which extends the former DNN receiver by adding a parameter estimation module before the discrimination stage, is proposed in [1] (figure 16). Specifically, a parameter vector ω is learned from the received y by a fully connected DNN with linear activation in the last layer and is then fed into a deterministic transformation layer parameterized by ω, which corresponds to specific communication properties. The transformation is performed on y to generate a canonicalized ȳ. The DNN receiver described above, with softmax activation, then completes the discrimination task to output an estimated ŝ. In
addition to its application to the receiver, the RTN architecture can be used in any scenario where deterministic transformations with estimated parameters exist. The simulation results indicate that the autoencoder with RTN outperforms, and converges faster than, the plain autoencoder architecture, thereby validating the effectiveness of combining the DL model with prior expert knowledge.

4.3 Autoencoder for multi-user

To become a universal communication architecture, the autoencoder system must be realizable in highly complex scenarios, such as multi-user communication over interference channels. The application of autoencoders to a simple two-user scenario is explored in [1], where two autoencoder-based transmitter-receiver pairs attempt to communicate simultaneously over the same interfering channel. The only difference from the single-user case is that the entire system is trained to achieve conflicting goals with interference at the receiver side (figure 17); that is, each transmitter-receiver pair aims to optimize the system to transmit its own messages accurately.

One proposed method is to optimize the weighted sum of the cross-entropy loss functions, J = αJ1 + (1 − α)J2, where J1 and J2 represent the losses of the first and second transmitter-receiver pairs, respectively, and α is a dynamic weight factor ranging from 0 to 1 that is adjusted according to the current mini-batch. The autoencoder system achieves the same or even better BLER performance than conventional uncoded QAM schemes at the same communication rate, thereby validating its potential application in multi-user cases.
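The forward pass of the autoencoder in figure 15 and the weighted two-user loss J = αJ1 + (1 − α)J2 can be sketched together in NumPy. The layer sizes (M = 16 messages, N = 7 real channel uses), the randomly initialized and untrained weights, and the Eb/N0-based noise variance are assumptions for illustration; training by SGD, as in [1], would require an automatic-differentiation framework and is not shown. The interference model y1 = x1 + x2 + n1, y2 = x2 + x1 + n2 is an assumed simple form of the channel in figure 17, and `next_alpha` shows one possible mini-batch update for α, not necessarily the rule used in [1]. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 16, 7                  # M messages, N real channel uses (assumed sizes)
K = int(np.log2(M))           # bits per message
R = K / N                     # communication rate R = K/N

def dense(n_in, n_out):
    """Randomly initialized (untrained) dense-layer parameters."""
    return rng.normal(0, 1 / np.sqrt(n_in), (n_in, n_out)), rng.normal(0, 0.1, n_out)

def make_pair():
    """One transmitter-receiver pair with two dense layers on each side."""
    return {"tx": [dense(M, M), dense(M, N)], "rx": [dense(N, M), dense(M, M)]}

def transmitter(pair, s):
    (W1, b1), (W2, b2) = pair["tx"]
    h = np.maximum(np.eye(M)[s] @ W1 + b1, 0.0)   # one-hot input, dense + ReLU
    x = h @ W2 + b2                               # dense layer, linear output
    # Normalization layer: every transmitted x has energy N.
    return np.sqrt(N) * x / np.linalg.norm(x, axis=1, keepdims=True)

def receiver(pair, y):
    (W1, b1), (W2, b2) = pair["rx"]
    h = np.maximum(y @ W1 + b1, 0.0)
    z = h @ W2 + b2
    z = z - z.max(axis=1, keepdims=True)          # numerically stable softmax
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)       # M-dimensional probability vector

def cross_entropy(p, s):
    return -np.mean(np.log(p[np.arange(len(s)), s] + 1e-12))

def two_user_loss(pair1, pair2, batch, ebno_db, alpha):
    sigma = np.sqrt(1.0 / (2.0 * R * 10 ** (ebno_db / 10)))  # AWGN std per dimension
    s1, s2 = rng.integers(0, M, batch), rng.integers(0, M, batch)
    x1, x2 = transmitter(pair1, s1), transmitter(pair2, s2)
    y1 = x1 + x2 + sigma * rng.normal(size=x1.shape)  # each receiver sees interference
    y2 = x2 + x1 + sigma * rng.normal(size=x2.shape)
    J1 = cross_entropy(receiver(pair1, y1), s1)
    J2 = cross_entropy(receiver(pair2, y2), s2)
    return alpha * J1 + (1 - alpha) * J2, J1, J2

def next_alpha(J1, J2):
    """One possible dynamic weighting: emphasize the currently worse pair."""
    return J1 / (J1 + J2)

pair1, pair2 = make_pair(), make_pair()
J, J1, J2 = two_user_loss(pair1, pair2, batch=256, ebno_db=7.0, alpha=0.5)
# Single-user decision rule: the index of the largest element of p is s_hat.
s_hat = receiver(pair1, transmitter(pair1, np.arange(M))).argmax(axis=1)
```

With untrained weights the losses are large and `s_hat` is essentially random; the sketch only fixes the data flow (one-hot message, dense layers, energy normalization, noise layer, softmax, cross-entropy) that training would optimize.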
Fig. 17 Two-user scenario with an interfering channel [1]. Each receiver has to detect its own messages based on the signals received from both transmitters

4.4 Autoencoder for MIMO

In [34], the autoencoder communication system is extended to MIMO channels. Two types of conventional MIMO communication schemes are considered, namely, an open-loop system without channel state information (CSI) feedback and a closed-loop system with CSI feedback. Unlike the AWGN channel model used before, an r × t MIMO channel response H is randomly generated before noise is added in the channel network block.
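A channel network block of this kind can be sketched as follows. The sketch assumes i.i.d. Rayleigh fading (entries of H drawn from CN(0, 1)), complex baseband signals, and a transmitted block normalized to unit average power per time sample, so that `snr_db` is the average SNR per receive antenna. The function name `mimo_channel` is hypothetical, and [34] may use a different channel distribution or SNR definition.

```python
import numpy as np

def mimo_channel(x, r, snr_db, rng):
    """Channel network block: y = H x + n.

    x: complex array of shape (t, N), i.e. t streams of N time samples.
    Returns y (r streams of N samples) and the r x t channel matrix H.
    """
    t, N = x.shape
    # A fresh channel realization per call, entries i.i.d. CN(0, 1).
    H = (rng.normal(size=(r, t)) + 1j * rng.normal(size=(r, t))) / np.sqrt(2)
    noise_var = 10.0 ** (-snr_db / 10.0)
    n = np.sqrt(noise_var / 2) * (rng.normal(size=(r, N)) + 1j * rng.normal(size=(r, N)))
    return H @ x + n, H

rng = np.random.default_rng(0)
t, r, N = 2, 1, 4                                  # a 2 x 1 system
x = rng.normal(size=(t, N)) + 1j * rng.normal(size=(t, N))
x *= np.sqrt(N) / np.linalg.norm(x)                # unit average power per time sample
y, H = mimo_channel(x, r, snr_db=15.0, rng=rng)
print(y.shape, H.shape)  # (1, 4) (1, 2)
```

Drawing a new H on every call exposes the autoencoder to many channel realizations during training, which is what allows it to learn a scheme that works without knowing H at the transmitter.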
In the first, open-loop case, the transmitter of the primary autoencoder system is modified to encode message s into a transmitted signal x consisting of t parallel streams of N time samples. This signal is then multiplied by a pre-defined H before passing through the noise layer, which generates the received signal y as r streams of N time samples (figure 18). The estimate ŝ is then learned from y by the receiver layers. In the simulation of a 2 × 1 MIMO system, the adjusted autoencoder architecture outperforms conventional open-loop MIMO schemes, such as Alamouti space-time block coding (STBC) with an MMSE receiver, when SNR ≥ 15 dB.
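Neural-network frameworks usually operate on real-valued tensors, so a layer that multiplies the t transmitted streams by a complex H is commonly realized with an equivalent real block matrix. The sketch below assumes that the transmitter's last linear layer outputs 2tN real values, interpreted as the real and imaginary parts of the t streams; it shows the reshaping and normalization of that output and the real-valued form of y = Hx. These layout choices and the function names are assumptions and need not match the implementation in [34].

```python
import numpy as np

def to_real(z):
    """Stack real and imaginary parts: complex (t, N) -> real (2t, N)."""
    return np.concatenate([z.real, z.imag], axis=0)

def tx_streams(v, t, N):
    """Reshape a linear-layer output of 2tN reals into [Re; Im] of t streams,
    normalized to total energy N (unit average power per time sample)."""
    x = v.reshape(2 * t, N)
    return x * np.sqrt(N) / np.linalg.norm(x)

def complex_multiply(H, x_real):
    """Compute y = H x with real arithmetic only.
    H: complex (r, t); x_real: (2t, N) as [Re x; Im x] -> (2r, N) as [Re y; Im y]."""
    H_real = np.block([[H.real, -H.imag],
                       [H.imag,  H.real]])
    return H_real @ x_real

rng = np.random.default_rng(1)
t, r, N = 2, 1, 5
H = (rng.normal(size=(r, t)) + 1j * rng.normal(size=(r, t))) / np.sqrt(2)
x_real = tx_streams(rng.normal(size=2 * t * N), t, N)
y_real = complex_multiply(H, x_real)
x_c = x_real[:t] + 1j * x_real[t:]                 # the same signal in complex form
print(np.allclose(y_real, to_real(H @ x_c)))       # True
```

The block matrix [[Re H, −Im H], [Im H, Re H]] reproduces complex multiplication exactly, so gradients can flow through the channel layer using only real operations.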
Fig. 18 A general MIMO channel autoencoder architecture [34]. This system deals with the open-loop case where no CSI feedback exists

In the closed-loop case, an idealized situation is considered in which the transmitter can obtain perfect CSI from the receiver. Compared with the open-loop case, a feedback path is added to the general MIMO autoencoder architecture, as shown in figure 19. More specifically, the channel response H generated in the channel module is propagated to the transmitter as an input, which is concatenated with s before being encoded into x. The MIMO autoencoder with perfect CSI outperforms the conventional singular value decomposition (SVD)-based precoding scheme at most SNRs. However, CSI errors exist in practice, caused for example by inaccurate estimation, compression, and unreliable feedback. Therefore, a quantized-CSI scheme is introduced into the autoencoder to represent the practical situation. The only difference between this scheme and the former MIMO autoencoder with perfect CSI is that, before being concatenated with message s at the transmitter side, the real-valued H is compressed into a b-bit vector H_b by another NN to represent 2^b modes, as shown in the yellow dashed box in figure 19. The simulation of a 2 × 2 system demonstrates that, in contrast to the conventional Lloyd algorithm, whose performance decreases with fewer bits of CSI, the autoencoder with quantized CSI outperforms the perfect-CSI case in some configurations. Thus, such a quantization method helps the system achieve better convergence for each channel mode.

V. DISCUSSION AND FUTURE WORKS

The application of DL to the physical layer of wireless communication systems is a new research field that is still in its early stage. Although previous studies have shown promising results, several broad challenges are worth exploring in future investigations.

1) Further extensions of existing research: Emerging research has attempted to introduce DL as an alternative for certain modules of the conventional communication system. The performance achieved in these studies motivates extending the proposed approaches to further applications or evolutions. For example, the success of DL-based modulation recognition methods verifies the capability of DL in feature extraction and recognition. This suggests the potential to apply DL to the efficient recognition of other system parameters, such as recognizing source coding or channel coding schemes, extracting CSI, and learning characteristics from signals, which would make the wireless communication system more “knowledgeable” or “intelligent” in the physical layer. By combining the intelligence in the physical layer with that in the upper layers, communication systems can automatically adapt to the features of the transmitted signals and the propagation environment and achieve flexible deployment.

In [8], the learned architecture has difficulties with VCs. Inspired by the competitive performance achieved with CSI feedback in the MIMO autoencoder system in [34], this condition can be improved if channel information is sent to the receiver, as suggested by expert knowledge from the communication domain. Similarly, the performance of the DL approaches presented in Figs. 12, 13, and 15 can be further improved by introducing channel information.

2) Specialized DL architecture for communication: The DL architecture used for learning has an important influence on the final performance of the network, whereas the current design schemes for communication networks are simple. For certain iterative algorithms (e.g., data detection [7], decoding [26], and compressive sensing [42], [43], [44]), a straightforward method is to unfold the iterations into layer structures and add trainable weights to the original iterative formulas. The sub-blocks of a partitionable algorithm can also be replaced, as shown in [29]. Despite their low complexity and comparable performance, these approaches only modify the conventional algorithms and neither create significant differences nor learn new algorithms. Some studies [28], [1] have applied plain DL methods to communication
cases. However, these methods suffer from dimensionality problems because the complexity of the networks and their training grows exponentially with the number of messages or codewords, which limits their application in practical scenarios.

Therefore, expert knowledge from the communication domain must be introduced into DL architecture design. As presented in [4], the performance of the system can be improved by using suitable DL networks that better characterize the channel conditions. The RTN proposed in [1] represents the first attempt to add a priori knowledge to plain DL architectures and illustrates the benefits of reducing complexity and accelerating convergence. Despite its limited scalability, RTN has inspired the design of specialized DL systems for communication based on the basic DL architecture. However, specialized design is not limited to the proposed RTN architecture and is worth extensive exploration. Novel and advanced DL-based systems that incorporate expert knowledge of propagation must be proposed to devise effective algorithms for future communication systems and to address their scalability limitations.

3) Learning strategies and performance analysis: Although the recently proposed DL-based communication algorithms demonstrate competitive performance, they lack solid foundations for theoretical derivation and analysis. Neither performance bounds nor the minimum dataset size required for training have been established. Moreover, given that research on the application of DL to the physical layer of wireless communication systems is still in its early stage, the rules of learning strategies remain unknown and warrant further exploration. Unlike computer vision (CV), where the dataset is often represented as pixel values, system design in the communication domain relies on practical channel conditions, and the signals are man-made representations designed for reliable propagation. Thus, the optimal input and output representations for DL communication systems remain unknown. In previous studies, the inputs are represented as binary or one-hot vectors, but the options are not limited to these two schemes, whose optimality has not been verified. The selection of loss functions and training strategies, as well as whether the DL system should be trained at a fixed SNR or over a range of SNRs, are other topics worthy of future investigation.

4) From simulation to implementation: Most of the DL-based algorithms designed for the physical layer of wireless communication systems are still at the simulation stage. To the best of our knowledge, only [8] investigates the implementation of these algorithms. Therefore, researchers still have to improve these algorithms considerably before they can be implemented. First, an authentic dataset from real communication systems or prototype platforms in actual physical environments must be made available to all researchers, so that they can train their DL architectures on common measured data and compare the performance of different algorithms objectively. Second, the communication channels in simulations are often generated by certain models, on which DL systems achieve comparable performance owing to their impressive expressive capacity. However, real physical channel scenarios are considerably more diverse and complex and change over time. Given that current DL systems are mainly trained offline, their generalization capability must be guaranteed. Designing specialized systems for specific scenarios, or general systems that dynamically adapt to VC conditions, is also imperative. Finally, DL tools for hardware, such as field programmable gate arrays (FPGAs), must be developed to deploy DL methods on hardware and achieve fast realization.

VI. CONCLUSION

This paper reviews the literature on the application of DL methods to the physical layer of wireless communication systems, either to replace parts of the conventional communication system or to create a new DL-based architecture (i.e., an autoencoder system). Given their excellent expressive capacity and convenient
optimization, DL-based algorithms show competitive performance with lower complexity or latency and have potential application in future communication systems where conventional theories are challenged. The application of DL to the physical layer of wireless communication systems is a promising research area that is far from mature. Further studies, including solid theoretical analyses, must be conducted, and new DL-based architectures must be proposed to implement DL-based ideas in actual communication scenarios.

References

[1] T. J. O’Shea and J. Hoydis. (2017) An introduction to deep learning for the physical layer. [Online]. Available: https://arxiv.org/abs/1702.00832, preprint.
[2] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
[3] N. González-Prelcic, A. Ali, V. Va, and R. W. Heath Jr. (2017) Millimeter wave communication with out-of-band information. [Online]. Available: https://arxiv.org/abs/1703.10638, preprint.
[4] N. Farsad and A. Goldsmith. (2017) Detection algorithms for communication systems using deep learning. [Online]. Available: https://arxiv.org/abs/1705.08044, preprint.
[5] Y.-S. Jeon, S.-N. Hong, and N. Lee. (2016) Supervised-learning-aided communication framework for massive MIMO systems with low-resolution ADCs. [Online]. Available: https://arxiv.org/abs/1610.07693, preprint.
[6] W. Tan, S. Jin, C. K. Wen, and Y. Jing, “Spectral efficiency of mixed-ADC receivers for massive MIMO systems,” IEEE Access, vol. 4, pp. 7841–7846, Sep. 2016.
[7] N. Samuel, T. Diskin, and A. Wiesel. (2017) Deep MIMO detection. [Online]. Available:
[8] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink. (2017) Deep learning-based communication over the air. [Online]. Available: https://arxiv.org/abs/1707.03384, preprint.
Heath, “Adaptation in convolutionally coded MIMO-OFDM wireless systems through supervised learning and SNR ordering,” IEEE Trans. Veh. Technol., vol. 59, no. 1, pp. 114–126, Jan. 2010.
[12] S. K. Pulliyakode and S. Kalyani. (2017) Reinforcement learning techniques for outer loop link adaptation in 4G/5G systems. [Online]. Available: https://arxiv.org/abs/1708.00994, preprint.
[13] J. Vieira, E. Leitinger, M. Sarajlic, X. Li, and F. Tufvesson. (2017) Deep convolutional neural networks for massive MIMO fingerprint-based positioning. [Online]. Available: https://arxiv.org/abs/1708.06235, preprint.
[14] A. Fehske, J. Gaeddert, and J. H. Reed, “A new approach to signal classification using spectral correlation and neural networks,” in Proc. IEEE Int. Symp. New Frontiers in Dynamic Spectrum Access Networks (DYSPAN), 2005, pp. 144–150.
[15] E. E. Azzouz and A. K. Nandi, “Modulation recognition using artificial neural networks,” in Proc. Automatic Modulation Recognition of Communication Signals, 1996, pp. 132–176.
[16] M. Ibukahla, J. Sombria, F. Castanie, and N. J. Bershad, “Neural networks for modeling nonlinear memoryless communication channels,” IEEE Trans. Commun., vol. 45, no. 7, pp. 768–771, Jul. 1997.
[17] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky, “Nonlinear black-box modeling in system identification: a unified overview,” Automatica, vol. 31, no. 12, pp. 1691–1724, Oct. 1995.
[18] J. Bruck and M. Blaum, “Neural networks, error-correcting codes, and polynomials over the binary n-cube,” IEEE Trans. Inf. Theory, vol. 35, no. 5, pp. 976–987, Sep. 1989.
[19] I. Ortuno, M. Ortuno, and J. Delgado, “Error correcting neural networks for channels with Gaussian noise,” in Proc. IJCNN International Joint Conference on Neural Networks, vol. 4, 1992, pp. 295–300.
[20] C. K. Wen, S. Jin, K. K. Wong, J. C. Chen, and P. Ting, “Channel Estimation for Massive MIMO Using Gaussian-Mixture Bayesian Learning,” IEEE Trans. Wireless Commun., vol. 14, no. 3, pp. 1356–1368, Mar. 2015.
[21] S. Chen, G. Gibson, C. Cowan, and P. Grant, “Adaptive equalization of finite non-linear
[9] I. Goodfellow, Y. Bengio, and A. Courville, Deep
channels using multilayer perceptrons,” Elsevier
Learning. MIT Press, 2016, http://www.deep-
Signal processing, vol. 20, no. 2, pp. 107–119,
learningbook.org.
Jun. 1990.
[10] U. Challita, L. Dong, and W. Saad. (2017) Pro-
[22] J. Cid-Sueiro and A. R. Figueiras-Vidal, “Dig-
active resource management in LTE-U systems:
ital equalization using modular neural net-
A deep learning perspective. [Online]. Avail-
works: an overview,” in Proc. Signal Processing
able: https://arxiv.org/abs/1702.07031, preprint.
in Telecommunications. Springer, 1996, pp.
[11] R. C. Daniels, C. M. Caramanis, and R. W.
337–345.

[23] M. Ibnkahla, “Applications of neural networks to digital communications–a survey,” Elsevier Signal Processing, vol. 80, no. 7, pp. 1185–1215, Jul. 2000.
[24] C. Jiang, H. Zhang, Y. Ren, Z. Han, K.-C. Chen, and L. Hanzo, “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Commun., vol. 24, no. 2, pp. 98–105, Apr. 2017.
[25] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
[26] E. Nachmani, Y. Be’ery, and D. Burshtein, “Learning to decode linear codes using deep learning,” in Proc. Communication, Control, and Computing (Allerton), 2016, pp. 341–346.
[27] E. Nachmani, E. Marciano, D. Burshtein, and Y. Be’ery. (2017) RNN decoding of linear block codes. [Online]. Available: https://arxiv.org/abs/1702.07560, preprint.
[28] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding,” in Proc. of CISS, 2017, pp. 1–6.
[29] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink. (2017) Scaling deep learning-based decoding of polar codes via partitioning. [Online]. Available: https://arxiv.org/abs/1702.06901, preprint.
[30] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be’ery. (2017) Deep learning methods for improved decoding of linear codes. [Online]. Available: https://arxiv.org/abs/1706.07043, preprint.
[31] F. Liang, C. Shen, and F. Wu. (2017) An iterative BP-CNN architecture for channel decoding. [Online]. Available: https://arxiv.org/abs/1707.05697, preprint.
[32] H. Ye, G. Y. Li, and B.-H. F. Juang. (2017) Power of deep learning for channel estimation and signal detection in OFDM systems. [Online]. Available: https://arxiv.org/abs/1708.08514, preprint.
[33] D. Neumann, T. Wiese, and W. Utschick. (2017) Learning the MMSE channel estimator. [Online]. Available: https://arxiv.org/abs/1707.05674, preprint.
[34] T. J. O’Shea, T. Erpek, and T. C. Clancy. (2017) Deep learning based MIMO communications. [Online]. Available: https://arxiv.org/abs/1707.07980, preprint.
[35] A. K. Nandi and E. E. Azzouz, “Algorithms for automatic modulation recognition of communication signals,” IEEE Trans. Commun., vol. 46, no. 4, pp. 431–436, Apr. 1998.
[36] X.-Z. Lv, P. Wei, and X.-C. Xiao, “Automatic identification of digital modulation signals using high order cumulants,” Electronic Warfare, vol. 6, p. 001, Jun. 2004.
[37] A. K. Nandi and E. E. Azzouz, “Automatic analogue modulation recognition,” Elsevier Signal Processing, vol. 46, no. 2, pp. 211–222, Oct. 1995.
[38] W. R. Caid and R. W. Means, “Neural network error correcting decoders for block and convolutional codes,” in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), 1990, pp. 1028–1031.
[39] A. Di Stefano, O. Mirabella, G. Di Cataldo, and G. Palumbo, “On the use of neural networks for Hamming coding,” in Proc. IEEE Int. Symposium on Circuits and Systems, 1991, pp. 1601–1604.
[40] X.-A. Wang and S. B. Wicker, “An artificial neural net Viterbi decoder,” IEEE Trans. Commun., vol. 44, no. 2, pp. 165–171, Feb. 1996.
[41] I. Dimnik and Y. Be’ery, “Improved random redundant iterative HDPC decoding,” IEEE Trans. Commun., vol. 57, no. 7, Jul. 2009.
[42] A. Mousavi, A. B. Patel, and R. G. Baraniuk, “A deep learning approach to structured signal recovery,” in Proc. Communication, Control, and Computing (Allerton), 2015, pp. 1336–1343.
[43] A. Mousavi and R. G. Baraniuk, “Learning to invert: Signal recovery via deep convolutional networks,” in Proc. Int. Conf. Acoustics, Speech and Signal Processing, 2017, pp. 2272–2276.
[44] S. Lohit, K. Kulkarni, R. Kerviche, P. Turaga, and A. Ashok. (2017) Convolutional neural networks for non-iterative reconstruction of compressively sensed images. [Online]. Available:

Biographies

Tianqi Wang was born in Jiangsu, China, in 1993. She received the B.S. degree from Nanjing University of Science and Technology, Nanjing, China, in 2016. She is currently working toward the M.S. degree with the School of Information Science and Engineering, Southeast University. Her main research interests include deep learning applications in communication and massive MIMO systems.

Chao-Kai Wen (S’00-M’04) received the Ph.D. degree from the Institute of Communications Engineering, National Tsing Hua University, Taiwan, China, in 2004. He was with the Industrial Technology Research Institute and MediaTek Inc. from 2004 to 2009. He is currently an Associate Professor of the Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, China. His research interests center around optimization in wireless multimedia networks.

Hanqing Wang (S’16) received the B.S. degree in communications engineering from Anhui University, Hefei, China, in 2013, and the M.S. degree in information and communications engineering from Southeast University, Nanjing, China, in 2016. He is currently pursuing the Ph.D. degree in information and communications engineering with the School of Information Science and Engineering at Southeast University. His current research interests include detection and estimation theory, compressive sensing, and their application to communication systems with hardware imperfection and nonlinear distortion.

Feifei Gao received the B.Eng. degree from Xi’an Jiaotong University, Xi’an, China, in 2002, the M.Sc. degree from McMaster University, Hamilton, ON, Canada, in 2004, and the Ph.D. degree from National University of Singapore, Singapore, in 2007. He was a Research Fellow with the Institute for Infocomm Research (I2R), A*STAR, Singapore, in 2008 and an Assistant Professor with the School of Engineering and Science, Jacobs University, Bremen, Germany, from 2009 to 2010. In 2011, he joined the Department of Automation, Tsinghua University, Beijing, China, where he is currently an Associate Professor. Prof. Gao’s research areas include communication theory, signal processing for communications, array signal processing, and convex optimization, with particular interests in MIMO techniques, multi-carrier communications, cooperative communication, and cognitive radio networks. He has authored or coauthored more than 100 refereed IEEE journal papers and more than 100 IEEE conference proceeding papers, which have been cited more than 4000 times according to Google Scholar. Prof. Gao has served as an Editor of IEEE Transactions on Wireless Communications, IEEE Communications Letters, IEEE Signal Processing Letters, IEEE Wireless Communications Letters, International Journal of Antennas and Propagation, and China Communications. He has also served as symposium co-chair for the 2015 IEEE International Conference on Communications (ICC), the 2014 IEEE Global Communications Conference (GLOBECOM), the 2014 IEEE Vehicular Technology Conference Fall (VTC), and the 2018 IEEE Vehicular Technology Conference Spring. He has also been a Technical Committee Member for many other IEEE conferences.

Tao Jiang is currently a Distinguished Professor in the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, P. R. China. He received the Ph.D. degree in information and communication engineering from Huazhong University of Science and Technology, Wuhan, P. R. China, in April 2004. From Aug. 2004 to Dec. 2007, he worked at several universities, including Brunel University and the University of Michigan-Dearborn. He has authored or co-authored about 300 technical papers in major journals and conferences and 9 books/chapters in the areas of communications and networks. He has served or is serving as a symposium technical program committee member for several major IEEE conferences, including INFOCOM, GLOBECOM, and ICC. He was invited to serve as TPC Symposium Chair for IEEE GLOBECOM 2013, IEEE WCNC 2013, and ICCC 2013. He has served or is serving as an associate editor of several technical journals in communications, including IEEE Transactions on Signal Processing, IEEE Communications Surveys and Tutorials, IEEE Transactions on Vehicular Technology, and IEEE Internet of Things Journal. He is also the associate editor-in-chief of China Communications. He received the NSFC Distinguished Young Scholars Award in 2013. He was named one of the Most Cited Chinese Researchers by Elsevier in 2014, 2015, and 2016.

Shi Jin (S’06-M’07) received the B.S. degree in communications engineering from Guilin University of Electronic Technology, Guilin, China, in 1996; the M.S. degree from Nanjing University of Posts and Telecommunications, Nanjing, China, in 2003; and the Ph.D. degree in communications and information systems from Southeast University, Nanjing, in 2007. From June 2007 to October 2009, he was a Research Fellow with the Adastral Park Research Campus, University College London, London, U.K. He is currently with the faculty of the National Mobile Communications Research Laboratory, Southeast University. His research interests include space-time wireless communications, random matrix theory, and information theory. Dr. Jin serves as an Associate Editor for the IEEE Transactions on Wireless Communications, the IEEE Communications Letters, and IET Communications. He and his coauthors received the 2010 Young Author Best Paper Award from the IEEE Signal Processing Society and the 2011 IEEE Communications Society Stephen O. Rice Prize Paper Award in the field of communication theory.

92 China Communications • November 2017
