Deep Learning for Wireless Physical Layer

Abstract: Machine learning (ML) has been widely applied to the upper layers of wireless communication systems for various purposes, such as deployment of cognitive radio and communication network. However, its application to the physical layer is hampered by sophisticated channel environments and the limited learning ability of conventional ML algorithms. Deep learning (DL) has recently been applied in many fields, such as computer vision and natural language processing, given its expressive capacity and convenient optimization capability. The potential application of DL to the physical layer has also been increasingly recognized because of the new features of future communications, such as complex scenarios with unknown channel models and high-speed, accurate processing requirements; these features challenge conventional communication theories. This paper presents a comprehensive overview of the emerging studies on DL-based physical layer processing, including leveraging DL to redesign a module of the conventional communication system (for modulation recognition, channel decoding, and detection) and to replace the communication system with a radically new architecture based on an autoencoder. These DL-based methods show promising performance improvements but have certain limitations, such as the lack of solid analytical tools and the use of architectures that are specifically designed for communication and implementation research, thereby motivating future research in this field.

Keywords: wireless communications; deep learning; physical layer

Received: Oct. 16, 2017
Editor: Honggang Zhang

I. INTRODUCTION

Wireless communication technologies have experienced extensive development to satisfy the applications and services in the wireless network. The explosion of advanced wireless applications, such as diverse intelligent terminal access, virtual reality, augmented reality, and the Internet of Things, has propelled the development of wireless communication into the fifth generation to achieve thousand-fold capacity, millisecond latency, and massive connectivity, thereby making system design an extraordinarily challenging task. Several promising technologies, such as massive multi-input multi-output (MIMO), millimeter wave (mmWave), and ultra-densification networks (UDN), have been proposed to satisfy the abovementioned demands. These technologies
...
Several adaptive learning rate algorithms (e.g., Adagrad, RMSProp, Momentum, and Adam) have been proposed. Although the trained network performs well on the training data, it may perform poorly in testing because of overfitting. In this case, early stopping, regularization, and dropout schemes have been proposed to achieve favorable results on both the training and testing data.

Fig. 1 A mathematical model of a neuron, where features are connected by weighted edges and fed into a sigmoid activation function to generate an output y

Fig. 2 A fully connected feedforward NN architecture where all neurons between adjacent layers are fully connected. The DNN uses multiple hidden layers between the input and output layers to extract meaningful features

Convolutional NN (CNN) is another DNN architecture, adopted depending on the requirements of specific scenarios. The basic concept of a CNN is to add convolutional and pooling layers before feeding into a fully connected network (figure 3). In the convolutional layer, each neuron connects only to part of the neurons in the former adjacent layer. These neurons, organized in matrix form, comprise several feature maps, and the neurons share the same weights within each map. In the pooling layer, the neurons in the feature maps are grouped to compute the mean value (average pooling) or the maximum value (max pooling). Thus, the parameters are substantially decreased before the fully connected network is applied.

Recurrent NN (RNN) aims to provide NNs with memory, because the outputs depend not only on the current inputs but also on the formerly available information, as in cases such as NLP. Compared with the aforementioned memoryless NNs, which have no connections within the same hidden layer, the neurons here are connected such that the hidden layers consider their former outputs as current inputs to acquire memory (figure 4). Some commonly used RNNs include the Elman network, the Jordan network, the bidirectional RNN, and long short-term memory (LSTM).

III. DL AS AN ALTERNATIVE

The general classic communication system architecture is constructed as a block structure, as shown in figure 5, and multiple algorithms solidly founded on expert knowledge have been developed through long-term research to optimize each processing block therein. Previous studies have attempted to leverage conventional ML approaches, such as SVM and small feedforward NNs, as alternative algorithms for individual tasks. DL architectures have recently been introduced into several processing ...

Table II Activation functions

    Name      Activation function σ(x)
    sigmoid   1 / (1 + e^(-x))
    tanh      tanh(x)
    softmax   e^(x_i) / Σ_j e^(x_j)
    ReLU      max(0, x)
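The activation functions in Table II can be evaluated directly; the following minimal NumPy sketch (an illustration added for concreteness, not code from the paper) implements each of them:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # values squashed into (0, 1)
print(softmax(x))  # a probability vector that sums to 1
print(relu(x))     # negative inputs clipped to 0
```

Sigmoid and tanh saturate for large inputs, which is one reason ReLU is often preferred in deep hidden layers, while softmax is typically reserved for the output layer of a classifier.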
Fig. 4 (diagram) An RNN unfolded in time, with hidden states h_t^(l-1), h_t^(l), h_t^(l+1) connected across time steps t-1, t, t+1 and inputs x_t

...ents some examples of DL applications that ...

... and pattern recognition approaches [35]. These approaches have several common procedures, such as preprocessing, feature extraction, and classification. Previous studies have been keen to leverage ML algorithms (usually SVM and NN) due to their robustness, self-adaption, and nonlinear processing ability.

An NN architecture (figure 6) is proposed ...

Fig. 5 A typical communication system diagram with blocks including source encoding/decoding, channel encoding/decoding, modulation/demodulation, channel estimation and detection, and RF transceiving. These signal processing blocks are optimized individually to achieve reliable communication between the source and target destination
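The conventional pipeline described above (preprocessing, feature extraction, classification) can be sketched as follows; the specific features (signal moments) and the nearest-centroid classifier are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def extract_features(iq):
    """Toy feature extraction: after power normalization (preprocessing),
    compute second- and fourth-order moments of the complex samples."""
    iq = iq / np.sqrt(np.mean(np.abs(iq) ** 2))
    m2, m4 = np.mean(iq ** 2), np.mean(iq ** 4)
    return np.array([m2.real, m2.imag, m4.real, m4.imag])

def classify(features, centroids):
    """Nearest-centroid classification over per-class feature centroids."""
    return min(centroids, key=lambda k: np.linalg.norm(features - centroids[k]))

rng = np.random.default_rng(0)
noise = lambda n: 0.05 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
bpsk = rng.choice(np.array([-1.0 + 0j, 1.0 + 0j]), 512) + noise(512)
qpsk = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), 512) + noise(512)

# "Train" by computing one feature centroid per modulation class
centroids = {"BPSK": extract_features(bpsk), "QPSK": extract_features(qpsk)}
print(classify(extract_features(bpsk), centroids))  # BPSK
```

The second-order moment m2 separates BPSK (m2 near 1) from QPSK (m2 near 0), which is why such hand-crafted features worked before DL methods learned features directly from samples.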
... such performance cannot be improved further at high SNR, because the short-term nature of the training samples confuses the CNN classifier between AM/FM when the underlying signal carries limited information, and between QAM16/QAM64, which share constellation points. More samples must be used to eliminate such confusion and highlight the potential of the CNN architecture for further improvement.

Fig. 6 (diagram) A hierarchical NN classifier for modulation recognition: stages with input, hidden (sigmoid activation), and output layers separate classes such as Combined (AM-FM), PSK4, MASK (ASK2, ASK4), and MFSK (FSK2, FSK4) based on key features
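The QAM16/QAM64 confusion noted above stems from shared constellation points; a quick check makes this concrete (illustrative, using unnormalized square-grid constellations, so the overlap is exact rather than approximate as it would be after power normalization):

```python
import itertools

# Unnormalized square-grid QAM constellations
qam16 = {complex(i, q) for i, q in itertools.product([-3, -1, 1, 3], repeat=2)}
qam64 = {complex(i, q) for i, q in itertools.product([-7, -5, -3, -1, 1, 3, 5, 7], repeat=2)}

# Every QAM16 point is also a QAM64 point, so a short, low-information
# sample window cannot reliably separate the two classes.
print(qam16 <= qam64)          # True
print(len(qam16), len(qam64))  # 16 64
```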
[Figure: an NN decoding (NND) setup in which K-bit messages pass through modulation and noise layers, an LLR generator produces LLR inputs to an NN decoder with 3 hidden layers, and a marginalization (output) layer yields the estimated decoded codeword]

...ture and one-shot decoding (i.e., no iterations) with low latency, makes the NND a promising ...

... optimal SNR for training to classify the codewords over arbitrary SNRs, and argue that having more training epochs can lead to better performance. Training with direct channel values or LLR while using mean squared error or binary cross-entropy as a loss function has no significant effect on the final results.

To further scale DL-based decoding to large codewords, several former NNDs, each decoding a sub-codeword, are combined in [29]. These NNDs are first trained individually to meet the MAP performance and are then combined to replace the sub-blocks of a conventional decoder for polar codes. Thus, a large codeword is concurrently decoded. Specifically, the encoding graph of the polar codes, defined as partitionable codes in [29], can be partitioned into sub-blocks that can be decoded independently. Thus, the corresponding BP decoder is partitioned into sub-blocks replaced by NN architectures that are connected in the remaining BP decoding stages.

A partitioned neural network (PNN) architecture is proposed, as shown in figure 10. The received LLR values from the channel are propagated step by step according to the BP update algorithm and arrive at the first NND, which decodes the first sub-codeword; the result is then propagated in the remaining BP decoding steps as known bits. This decoding process continues until all sub-codewords are sequentially decoded. Pipeline implementation can also be applied to decode multiple codewords. The proposed PNN architecture competes well with conventional successive cancellation and BP decoding algorithms; however, its performance deteriorates as the number of partitioned sub-blocks increases, thereby limiting its application for large codes. Nevertheless, PNN still offers a promising solution to dimensionality problems.

Fig. 10 A partitioned NN decoding architecture for polar codes, with each NND decoding a sub-codeword [29]

3.3 Detection

Along with the increasing application of advanced communication systems with promising performance, capacity, and resources (e.g., massive MIMO and mmWave), communication scenarios and channels are becoming increasingly complex, which increases the computational complexity of channel models and of the corresponding detection algorithms. Conventional (iterative) detection algorithms form a computational bottleneck in real-time implementation. In comparison, given their expressive capacity and tractable parameter optimization, DL methods can be used for detection by unfolding specific iterative detection algorithms (similar to channel decoding) and for trading off accuracy against complexity by leveraging flexible layer structures. Data detection becomes a simple forward pass through the network and can be performed in real time.

In [7], a DL-based detector called DetNet, which aims to reconstruct the transmitted x by treating the received y and the channel matrix H (assumed to be perfectly known) as inputs, is introduced by unfolding a projected gradient ...
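Concretely, each layer of such an unfolded detector mimics one projected-gradient step. The sketch below is an illustrative reconstruction for a real-valued BPSK system: DetNet learns per-layer step sizes and a richer nonlinearity, whereas here a fixed convergent step size and a simple clip/sign projection stand in for the learned parts.

```python
import numpy as np

def projected_gradient_detector(y, H, num_layers=200):
    """Unfolded projected gradient detection for BPSK (x in {-1, +1}^t):
    each 'layer' computes x <- proj(x + delta * H^T (y - H x))."""
    delta = 1.0 / np.linalg.norm(H, 2) ** 2  # convergent step size (<= 1/lambda_max)
    x = np.zeros(H.shape[1])
    for _ in range(num_layers):
        x = x + delta * H.T @ (y - H @ x)  # gradient step on ||y - H x||^2
        x = np.clip(x, -1.0, 1.0)          # projection toward the symbol box
    return np.sign(x)                      # final hard decision to {-1, +1}

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 4))       # 8 receive dims, 4 transmit symbols
x_true = rng.choice([-1.0, 1.0], 4)
y = H @ x_true                        # noiseless channel for illustration
x_hat = projected_gradient_detector(y, H)
print(x_hat)
```

Because the network is just this fixed sequence of layers, detection is a single forward pass, which is the complexity advantage the text refers to.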
... dataset by repeatedly transmitting a consecutive sequence of N symbols from M possible types. Chemical signals, acids (representing ...

... a simple memoryless system is considered, such that the nth received signal is determined by the nth transmitted signal x_n. Symbol-to-symbol detection can be implemented using ...

[Figure: a sequence detector in which feature vectors y_n are fed through two stacked LSTM layers (hidden states h_n^(1), h_n^(2)) and dense layers to a softmax output layer]

... end-to-end performance, such as BER or block error rate (BLER). However, this process may add redundancy in the representation x, which differs from the original DL autoencoder that learns to compress its inputs restoratively.

Fig. 14 A simple form of communication system that reconstructs the transmitted message (estimated ŝ from the received y) at the receiver side [8]
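For the memoryless case described above, symbol-to-symbol detection reduces to classifying each received sample independently. A minimal sketch (illustrative; it uses a nearest-point maximum-likelihood rule under Gaussian noise rather than a trained NN):

```python
import numpy as np

def detect_symbols(received, constellation):
    """Memoryless symbol-to-symbol detection: the nth received signal
    depends only on the nth transmitted symbol, so each sample is
    classified independently by its nearest constellation point
    (maximum likelihood under additive Gaussian noise)."""
    received = np.asarray(received)[:, None]
    distances = np.abs(received - np.asarray(constellation)[None, :])
    return np.argmin(distances, axis=1)

constellation = np.array([-3.0, -1.0, 1.0, 3.0])  # 4-PAM symbol levels
rng = np.random.default_rng(2)
idx_true = rng.integers(0, 4, 1000)
y = constellation[idx_true] + 0.1 * rng.standard_normal(1000)
idx_hat = detect_symbols(y, constellation)
acc = (idx_hat == idx_true).mean()
print(acc)  # near-perfect at this noise level
```

A channel with memory breaks this per-symbol independence, which is what motivates the recurrent (LSTM) detector in the figure above.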
... a comparable or better performance than the conventional BPSK with Hamming code, thereby indicating that this system has learned a joint coding and modulation scheme.

Such an autoencoder architecture can solve other communication problems in the physical layer, such as pulse shaping and offset compensation, by dealing with IQ samples, and it can be applied to complex scenarios where the communication channels are unknown.

Fig. 15 A simple autoencoder for an end-to-end communication system [1] that encodes s in one-hot representation into an N-dimensional transmitted signal x. After noise is added, the received signal y is decoded into an M-dimensional probability vector p, from which ŝ is determined
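The chain in figure 15 can be sketched as a forward pass. The layer sizes and the random (untrained) weights below are illustrative placeholders for the learned transmitter and receiver, so the decoded message is not expected to match before training:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 16, 8  # M possible messages, N channel uses

# Illustrative (untrained) transmitter and receiver dense-layer weights
W_tx = rng.standard_normal((M, N))
W_rx = rng.standard_normal((N, M))

def transmit(s):
    onehot = np.eye(M)[s]            # message s as a one-hot vector
    x = onehot @ W_tx                # dense layer -> N-dim signal
    return x / np.linalg.norm(x)     # normalization layer (power constraint)

def receive(y):
    logits = y @ W_rx                # dense layer
    p = np.exp(logits - logits.max())
    p = p / p.sum()                  # softmax -> M-dim probability vector p
    return int(np.argmax(p))         # decoded message s_hat

s = 5
x = transmit(s)
y = x + 0.01 * rng.standard_normal(N)  # noise layer (AWGN channel)
s_hat = receive(y)
print(s_hat)
```

Training would adjust W_tx and W_rx end-to-end (e.g., by cross-entropy on p) so that ŝ = s with high probability, which is precisely what lets the system learn a joint coding and modulation scheme.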
4.2 Extended architecture with expert knowledge

Aside from interpreting the communication system as a plain DL model, it is reasonable to consider introducing communication expert ...
4.4 Autoencoder for MIMO

In [34], the autoencoder communication system is extended to MIMO channels. Two types of conventional MIMO communication schemes are considered, namely, an open-loop system without channel state information (CSI) feedback and a closed-loop system with CSI feedback. Unlike the AWGN channel model used before, an r × t MIMO channel response H is randomly generated before noise is added in the channel network block.

Fig. 17 Two-user scenario with an interfering channel [1]. Each receiver has to detect its own messages based on the received signals from the two transmitters
In the first open-loop case, the transmitter in the primary autoencoder system is modified to encode the message s into the transmitted signal x as t parallel streams of N time samples. The message is then multiplied by a pre-defined H before passing through the noise layer to ...

[Figure: the MIMO autoencoder transmitter, built from ReLU and linear layers with a normalization layer, followed by a complex multiply layer that applies the H produced by an (r×t) MIMO channel generator, and a noise layer]
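The channel block just described, which multiplies the t-stream signal by a randomly generated r × t response H and then adds noise, can be sketched as follows (the dimensions and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
r, t, N = 2, 2, 4  # receive antennas, transmit antennas, time samples

# Randomly generated complex MIMO channel response H (r x t),
# as produced by the channel generator block
H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)

# Transmitted signal x: t parallel streams of N time samples
x = rng.standard_normal((t, N)) + 1j * rng.standard_normal((t, N))
x = x / np.linalg.norm(x)  # normalization layer (power constraint)

# Complex multiply layer followed by the noise layer
noise = 0.01 * (rng.standard_normal((r, N)) + 1j * rng.standard_normal((r, N)))
y = H @ x + noise
print(y.shape)  # (2, 4)
```

In the open-loop scheme H is unknown to the transmitter at training time, while the closed-loop scheme would additionally feed CSI about H back to the transmitter side.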