
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING,
SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2021
Deep Learning based Approximate Message Passing for MIMO Detection in 5G
Low complexity deep learning algorithms for solving MIMO Detection in real world scenarios
ANDREA POZZOLI
KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Master’s Programme, ICT Innovation, 120 credits

Date: December 20, 2021
Supervisor: Dong Liu

Examiner: Saikat Chatterjee
Host company: Huawei Technologies Sweden

Swedish title: Deep Learning-baserat Ungefärligt meddelande som
passerar för MIMO-detektion i 5G
Swedish subtitle: Låg komplexitet djupinlärningsalgoritmer för att
lösa MIMO-detektion i verkliga scenarier
To Samuele Riva,
one is yours
© 2021 Andrea Pozzoli
Abstract
The Fifth Generation (5G) mobile communication system is the latest generation
of wireless communication technology. It brings several advantages, in
particular by using multiple receiver antennas that serve multiple transmitters.
This configuration is called Massive Multiple Input Multiple Output (MIMO),
and it increases link reliability and information throughput. However, MIMO
systems face two challenges at the link layer: channel estimation and MIMO
detection. This work focuses only on the MIMO detection problem, which
consists of recovering, at the receiver side, the original messages sent by the
transmitters when the received signal is noisy. The optimal technique for this
problem is Maximum Likelihood (ML) detection, but it does not scale and
therefore cannot be used with MIMO systems. Several sub-optimal techniques
have been tested over the years to solve the MIMO detection problem, trying
to balance the complexity-performance trade-off. In recent years, techniques
based on Approximate Message Passing (AMP) have produced interesting
results. Moreover, deep learning (DL) is spreading across many different
fields, and it has also been tested on MIMO detection with promising results.
A neural network called MMNet produced the most interesting results, but
newer techniques have since been developed. Although promising, these
techniques have not been compared with MMNet. In this thesis, two new AMP-
and DL-based techniques, called Orthogonal AMP Network Second
(OAMP-Net2) and Learnable Vector AMP (LVAMP), are tested and compared
with the state of the art. The aim of the thesis is to determine whether one or
both techniques can provide better results than MMNet, and thus offer a valid
alternative solution to the MIMO detection problem. OAMP-Net2 and LVAMP
have been developed and tested on different channel models (i.i.d. Gaussian
and Kronecker) and on MIMO systems of different sizes (small and
medium-large). OAMP-Net2 proved to be a consistent technique for solving
the MIMO detection problem. It provides interesting results on both i.i.d.
Gaussian and Kronecker channel models and with matrices of different sizes.
Moreover, OAMP-Net2 adapts well: it provides good results on Kronecker
channel models even when it is trained with i.i.d. Gaussian matrices. LVAMP
instead performs similarly to MMSE, but with lower complexity. Like
OAMP-Net2, it adapts well to complex channels.
Keywords
5G, MIMO detection, Approximate Message Passing, OAMP-Net2, LVAMP,
MMNet, Deep Learning
Sammanfattning
Femte generationens (5G) mobila kommunikationssystem är den senaste
tekniken inom trådlös kommunikation. Denna teknik ger flera fördelar, i
synnerhet genom att använda flera mottagarantenner som betjänar flera sändare.
Denna konfiguration som används i 5G kallas Massive Multiple Input Multiple
Output (MIMO), och den ökar länktillförlitligheten och informationsgenomströmningen.
MIMO-system står dock inför två utmaningar i länkskiktet: kanaluppskattning
och MIMO-detektering. I detta arbete ligger fokus endast på MIMO-detekteringsproblemet.
Den består i att hämta de ursprungliga meddelandena, skickade av sändarna,
på mottagarsidan när det mottagna meddelandet är en brusig signal. Den
optimala tekniken för att lösa problemet kallas Maximum Likelihood (ML),
men den skalas inte och därför kan den inte användas med MIMO-system.
Flera suboptimala tekniker har testats under flera år för att lösa MIMO-
detekteringsproblem och försöka balansera komplexitet-prestanda-avvägningen.
Under de senaste åren har Approximate Message Passing (AMP)-baserade
tekniker gett intressanta resultat. Dessutom sprids djupinlärning (DL) inom
flera och olika områden, och även inom MIMO-detektering har det testats med
lovande resultat. Ett neuralt nätverk kallat MMNet gav de mest intressanta
resultaten, men nya tekniker har utvecklats. Dessa nya tekniker, trots att de
är lovande, har inte jämförts med MMNet. I detta examensarbete har två nya
tekniker AMP- och DL-baserade, kallade Orthogonal AMP Network Second
(OAMP-Net2) och Learnable Vector AMP (LVAMP), testats och jämförts med
den senaste tekniken. Syftet med avhandlingen är att ta reda på om en eller
båda teknikerna kan ge bättre resultat än MMNet, för att upptäcka en giltig
alternativ lösning samtidigt som man hanterar MIMO-detekteringsproblem.
OAMP-Net2 och LVAMP har utvecklats och testats på olika kanalmodeller
(i.i.d. Gaussian och Kronecker) och på MIMO-system av olika storlekar (small
och medium-large). OAMP-Net2 visade sig vara en konsekvent teknik som kan
användas för att lösa MIMO-detekteringsproblem. Det ger riktigt intressanta
resultat på både i.i.d Gaussian och Kronecker kanalmodeller och med matriser
i olika storlekar. Dessutom har OAMP-Net2 god anpassningsförmåga, faktiskt
ger den bra resultat på Kronecker kanalmodeller även när den tränas med i.i.d.
Gaussiska matriser. LVAMP har istället prestanda som liknar MMSE, men
med lägre komplexitet. Den anpassar sig väl till komplexa kanaler som OAMP-
Net2.
Nyckelord
5G, MIMO-detektering, Approximate Message Passing, OAMP-Net2, LVAMP,
MMNet, Deep Learning
Acknowledgments
This thesis has been a long and hard piece of work, started in January 2021
and ended in November 2021. It concludes a study programme started in
September 2019 that I consider the best choice I could have made for my
Master's degree. During this journey I met many people and shared intense
moments with them. For this reason, in this section I want to thank those who
shared moments and feelings with me during these years.
First of all I want to thank Huawei Technologies Sweden, for giving me
the possibility to work with them and develop the thesis. In particular I
thank Jinliang, Karl and Nima for the support, help, tips, meetings, time and
knowledge they provided me.
For what concerns the university supervisors, I want to thank Dong Liu,
Umberto Spagnolini and Saikat Chatterjee for the help in writing the thesis in
the most rigorous way.
A great thanks to Stefano, for the time spent in reading and fixing the thesis,
giving me very useful advice in order to improve the work.
A special thanks to Federico, for the amazing work in EIT Digital. The
effort, time and passion for the role are incredible and EIT at PoliMi will not
be the same without Fede.
An immense thanks to Elisabetta, Matteo and Giulia, for supporting,
stimulating, helping me in several ways during these years. I love you.
A great thanks to Ancilla, Severino, Manuela, Paolo, Patrizia, Marco,
Giorgia and Andrea for having always believed in me, and for the support in
all my choices.
I want to thank Samuele, for all the time, experiences, moments, difficulties
and joy shared together during our path. It has been long and complex, but
surely satisfying, fulfilling and rewarding.
A great thanks to Edoardo, Simone, Ivan, Chiara, Alessandro, Mario and
Matteo for pushing me to give always the best, for the support, the advice, the
moments spent together, the long talks and the greatest experiences.
Finally, I want to thank all the people that I met this year, but that gave
me feelings, emotions, experiences, travels, dinners, and incredible friendship
that I will remember forever. Thanks to Riccardo, Matteo, Riccardo, Edoardo,
Adriano, and to all the Trento Gang, Simone, Elisa, Alessandro, Marco,
Marco, Jessica, Paolo, Massimiliano.
Stockholm, December 2021

Andrea Pozzoli
Contents
1 Introduction
1.1 Intro
1.2 Related work
1.3 Problem and research question
1.3.1 Problem
1.3.2 Research Question
1.3.3 Scientific and engineering issues
1.4 Purpose
1.5 Goals
1.6 Research Methodology
1.7 Delimitations
1.8 Structure of the thesis
2 Background
2.1 MIMO
2.1.1 Channel Models
2.1.2 MIMO techniques
2.2 MIMO Detection
2.2.1 Basics of MIMO Detection
2.2.2 Iterative framework
2.2.3 Linear MIMO detection techniques
2.2.4 ISTA
2.2.5 FISTA
2.2.6 Approximate Message Passing (AMP)
2.2.7 OAMP
2.2.8 Vector Approximate Message Passing
2.2.9 Other MIMO detection techniques
2.3 Deep learning
2.3.1 LISTA
2.3.2 LAMP
2.3.3 LVAMP
2.3.4 DetNet
2.3.5 OAMPNet
2.3.6 MMNet
2.3.7 OAMP-Net2
2.4 Contribution
3 Method
3.1 Methodology
3.2 Performance Metrics
3.3 Training
3.3.1 LVAMP
3.3.2 OAMPNet2
3.3.3 MMNet
3.3.4 Dataset
3.3.5 Preventing Overfitting
3.4 Procedure
3.5 Experiments
3.5.1 Experiment 1
3.5.2 Experiment 2
3.5.3 Experiment 3
3.5.4 Experiment 4
3.5.5 Experiment 5
3.5.6 Experiment 6
3.5.7 Experiment 7
3.5.8 Summary of experiments
3.6 Technology
4 Results and Analysis
4.1 Experiment 1
4.2 Experiment 2
4.3 Experiment 3
4.3.1 Experiment 3a
4.3.2 Experiment 3b
4.4 Experiment 4
4.4.1 Experiment 4a
4.4.2 Experiment 4b
4.5 Experiment 5
4.6 Experiment 6
4.6.1 Experiment 6a
4.6.2 Experiment 6b
4.7 Experiment 7
4.7.1 Experiment 7a
4.7.2 Experiment 7b
4.8 Complexity Analysis
4.9 Summary of results
5 Conclusions and Future work
5.1 Conclusions
5.2 Future work
References
A Extra experiments
List of Figures
1.1 Massive MIMO architecture
1.2 Different MIMO detection algorithms
3.1 LVAMP network layer
3.2 OAMPNet2 network layer
3.3 MMNet network layer
4.1 Experiment 1 graphical results: SER versus SNR for each algorithm, in a
4 receivers and 4 transmitters MIMO system, trained and tested on i.i.d.
Gaussian channel matrices using QAM-16 as modulation scheme
4.2 Experiment 2 graphical results: SER versus SNR for each algorithm, in a
32 receivers and 32 transmitters MIMO system
4.3 Experiment 3a graphical results: SER versus SNR for each algorithm, in a
32 receivers and 32 transmitters MIMO system, trained with Kronecker channel
matrices and tested on a Kronecker channel model with ρR = ρT = 0.3 using
QAM-64 as modulation scheme
4.4 Experiment 3b graphical results: SER versus SNR for each algorithm, in a
32 receivers and 32 transmitters MIMO system, trained with Kronecker channel
matrices and tested on a Kronecker channel model with ρR = ρT = 0.5 using
QAM-64 as modulation scheme
4.5 Experiment 4a graphical results: SER versus SNR for each algorithm, in a
32 receivers and 32 transmitters MIMO system, trained with i.i.d. Gaussian
channel matrices and tested on a Kronecker channel model
4.6 Experiment 4b graphical results
4.7 Experiment 5 graphical results: 64 receivers and 32 transmitters MIMO
system
4.8 Experiment 6a graphical results
4.9 Experiment 6b graphical results
4.10 Experiment 7a graphical results
4.11 Experiment 7b graphical results
A.1 Experiment A-1 graphical results
A.2 Experiment A-2a graphical results
A.3 Experiment A-2b graphical results
A.4 Experiment A-3a graphical results: trained with i.i.d. Gaussian channel
matrices and tested on a Kronecker channel model
A.5 Experiment A-3b numerical results: SER for the different algorithms at
different SNR values, trained with i.i.d. Gaussian channel matrices and tested
on a Kronecker channel model
List of Tables
3.1 Experiments summary
4.1 Experiment 1 numerical results: SER for the different algorithms at
different SNR values
4.2 Experiment 2 numerical results
4.3 Experiment 3a numerical results: SER for the different algorithms at
different SNR values, trained with Kronecker channel matrices and tested on a
Kronecker channel model
4.4 Experiment 3b numerical results
4.5 Experiment 4a numerical results
4.6 Experiment 4b numerical results
4.7 Experiment 5 numerical results
4.8 Experiment 6a numerical results
4.9 Experiment 6b numerical results
4.10 Experiment 7a numerical results
4.11 Experiment 7b numerical results
4.12 Complexity and number of iterations/layers for convergence of each
algorithm
4.13 Summary of results analysis
List of Algorithms
1 OAMP
2 VAMP in SVD form
3 VAMP in LMMSE form
4 LAMP
List of acronyms and abbreviations

3D three dimensions
3D-Uma Urban Macro
3D-Umi Urban-Micro
3GPP 3rd Generation Partnership Project
4G Fourth-Generation
5G Fifth-Generation
AMP Approximate Message Passing
AMP-LA Approximate Message Passing simplified by Linear Approximation
ANN Artificial Neural Network
AP angular profile
AWGN Additive White Gaussian Noise
BER Bit Error Rate
BiG-AMP Bilinear Generalized AMP
BP Belief Propagation
BS Base Station
CCI Cochannel interference
DL Deep Learning
DLBP Deep Learning Belief Propagation
EB ExaBytes
EP Expectation Propagation
FD-MIMO full-dimension MIMO
FISTA Fast ISTA

GAMP Generalized Approximate Message Passing
GSCM geometry-based stochastic Spatial Channel Model
i.i.d. Independent and Identically Distributed
ICI Inter-Carrier Interference
IoT Internet of Things
ISI Inter-Symbols Interference
ISTA Iterative Soft Thresholding Algorithm
JCESD joint channel estimation and signal detection
KL Kullback-Leibler
LAMP Learnable AMP
LISTA Learnable ISTA
LMMSE Linear Minimum Mean Squared Error
LOS line of sight
LRA Lattice Reduction Aided
LTE Long-Term Evolution
LVAMP Learnable Vectorized AMP
MAP Maximum a posteriori
MF matched filtering
MIMO Multiple-Input-Multiple-Output
MIMO-OFDM Multiple input multiple output orthogonal frequency division multiplexing
ML Maximum Likelihood
MLP Multilayer Perceptron

MMSE Minimum Mean Squared Error

MSE Mean Square Error
MUD Multi User Detection
NLOS non line of sight
OAMP Orthogonal AMP

OAMPNet OAMP Neural Network
OAMPNet2 OAMP Neural Network Second
OFDM orthogonal frequency division multiplexing
OtoI outdoor to indoor
PDP power delay profile

PIC Parallel interference cancellation
QAM Quadrature Amplitude Modulation

QoS Quality of Service
SD Sphere Decoding
SDR Semi-Definite Relaxation
SER Symbol Error Rate
SIC Successive Interference Cancellation
SNR Signal to Noise Ratio
SVD Singular Value Decomposition
TAMP Trainable Approximate Message Passing
UE User Equipment
VAMP Vectorized AMP

VBI Variational Bayesian Inference
ZF Zero Forcing
Chapter 1
Introduction
1.1 Intro
In recent years, the demand for fast and reliable wireless data transfer has
increased drastically, and it continues to grow. Global multimedia data traffic
grew from 0.82 ExaBytes (EB) per month in 2013 to 15.9 EB per month in
2018. This growth is driven by smartphones, tablets and computers, but also
by machine-to-machine communications and the Internet of Things (IoT). One
way to cope with the quantity of data and the required communication speed
is to increase the spectral bandwidth. However, in the frequency bands
allocated to communication, the bandwidth is very limited. Increasing the
transmission power is not feasible either, since the interference among signals
would corrupt the message. Another way to solve the problem is to increase
the spatial bandwidth, which is not fully used and can offer more degrees of
freedom [1]. Massive Multiple-Input-Multiple-Output (MIMO) is the key
technology used in advanced wireless communications, and in particular in the
Fifth-Generation (5G) mobile communication system [2]. This technique
employs a large number of antennas at both the transmitter and the receiver
side and is therefore able to cope with an increasing number of mobile users
and satisfy user demands [3]. The Fourth-Generation (4G) mobile
communication system is based on Long-Term Evolution (LTE) technology and
supports up to eight antennas at the receiver side, also called the Base Station
(BS). In 5G systems, instead, a BS equipped with a large number of receiver
antennas (64-256) serves multiple single-antenna transmitters, called User
Equipment (UE), simultaneously on the same frequency [4]. Massive MIMO
can also reach a higher spectral efficiency (measured in bits/s/Hz) than the
LTE standard. MIMO increases the information throughput and link reliability
by exploiting link diversity.
With a high number of antennas, recovering the transmitted information
in a massive MIMO up-link receiver is computationally more complex,
because the transmitted signals interfere with each other. Several users, who
move frequently, access the same channel simultaneously, and the surrounding
environment can also change. Therefore, the channel is time-varying and
shared. The thesis focuses on signal detection in a multi-user communication
scenario, in particular at the receiver side. This means that the focus is on
correctly detecting the signals transmitted by the UEs from the signal that
arrives at the BS. This is one of the biggest and most important problems
in MIMO systems, because separating and detecting the original signals at the
receiver side is difficult due to interference from other signals and from the
environment.
The optimal MIMO detector in terms of performance is the Maximum
Likelihood (ML) detector, but its complexity increases exponentially with the
number of transmitters, because it consists of an exhaustive search over all the
possible symbols for each UE. It is therefore necessary to find sub-optimal
algorithms with the best performance/complexity trade-off. Examples are
Zero Forcing (ZF) and Linear Minimum Mean Squared Error (LMMSE),
which have a lower computational complexity but also worse performance.
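As a rough illustration of this trade-off, the following Python sketch compares an exhaustive ML search with ZF and LMMSE on the linear model y = Hx + n. It is not taken from the thesis implementation: the function names, the QPSK alphabet, the unit symbol energy and the small fixed channel matrix are assumptions made only for this example.

```python
import itertools
import numpy as np

# Hypothetical QPSK alphabet with unit average energy (assumption of this sketch).
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def nearest_symbol(z, constellation):
    """Map every entry of z to the closest constellation point."""
    idx = np.argmin(np.abs(z[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]

def ml_detect(y, H, constellation):
    """Exhaustive ML search: len(constellation) ** n_tx candidates are tested."""
    n_tx = H.shape[1]
    best, best_cost = None, np.inf
    for cand in itertools.product(constellation, repeat=n_tx):
        x = np.array(cand)
        cost = np.linalg.norm(y - H @ x) ** 2
        if cost < best_cost:
            best, best_cost = x, cost
    return best

def zf_detect(y, H, constellation):
    """Zero Forcing: invert the channel with the pseudo-inverse, then slice."""
    z = np.linalg.pinv(H) @ y
    return nearest_symbol(z, constellation)

def lmmse_detect(y, H, constellation, noise_var):
    """LMMSE for unit-energy symbols: regularise the Gram matrix, then slice."""
    n_tx = H.shape[1]
    gram = H.conj().T @ H + noise_var * np.eye(n_tx)
    z = np.linalg.solve(gram, H.conj().T @ y)
    return nearest_symbol(z, constellation)

# Toy noise-free example: 4 receive antennas, 2 single-antenna transmitters.
H = np.array([[1, 0.5j], [0.2, 1], [0.3j, -0.4], [1, 1j]])
x_true = QPSK[[0, 3]]
y = H @ x_true
x_ml = ml_detect(y, H, QPSK)
x_zf = zf_detect(y, H, QPSK)
x_lmmse = lmmse_detect(y, H, QPSK, noise_var=1e-6)
```

The ML loop visits every combination of symbols, so its cost grows as the alphabet size raised to the number of transmitters, while ZF and LMMSE only need one matrix inversion or linear solve.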
Promising detectors with excellent performance and reduced complexity
are iterative detectors based on the Approximate Message Passing (AMP) and
Expectation Propagation (EP) algorithms [5]. The AMP-based detector
approximates the posterior distribution on a dense factor graph by using the
central limit theorem and the Taylor expansion [6], while the EP-based
detector approximates the posterior probability with factorized Gaussian
distributions. Both methods can achieve Bayes-optimal performance in
large-scale MIMO systems. The AMP-based detector works well with channel
matrices whose elements are independent and identically distributed
sub-Gaussian, while EP works well with unitarily-invariant channel matrices.
For small MIMO systems and for correlated channel matrices, the AMP-based
and EP-based detectors are far from the Bayes-optimal solution.
AMP is a class of iterative algorithms that works well for compressive
sensing [7], a problem similar to signal detection. AMP has the same
structure as the Iterative Soft Thresholding Algorithm (ISTA), a well-known
class of algorithms, but with the addition of the Onsager correction term.
Thanks to this term, AMP converges faster than ISTA.
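The structural similarity between the two algorithms can be seen in the minimal Python sketch below, written for a real-valued sparse recovery problem as in compressive sensing. The function names, the threshold rule `alpha * ||v|| / sqrt(m)` and the toy problem sizes are assumptions chosen for illustration, and a MIMO detector would use a denoiser matched to the constellation instead of soft thresholding. The only structural difference between the two loops is the `onsager` term added to the AMP residual.

```python
import numpy as np

def soft_threshold(r, lam):
    """Soft-thresholding denoiser shared by ISTA and AMP."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def ista(y, A, lam, n_iter):
    """ISTA for min 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / largest squared singular value
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = y - A @ x                        # plain residual
        x = soft_threshold(x + step * (A.T @ v), step * lam)
    return x

def amp(y, A, alpha, n_iter):
    """AMP with soft thresholding; assumes A has i.i.d. entries of variance 1/m."""
    m, n = A.shape
    x = np.zeros(n)
    v = np.zeros(m)
    for _ in range(n_iter):
        onsager = (np.count_nonzero(x) / m) * v   # Onsager correction term
        v = y - A @ x + onsager                   # corrected residual
        lam = alpha * np.linalg.norm(v) / np.sqrt(m)
        x = soft_threshold(x + A.T @ v, lam)
    return x

# Toy problem (all sizes and parameters are illustrative assumptions).
rng = np.random.default_rng(0)
m, n = 50, 100
A = rng.normal(size=(m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[:5] = 1.0
y = A @ x0 + 0.01 * rng.normal(size=m)
x_ista = ista(y, A, lam=0.05, n_iter=50)
x_amp = amp(y, A, alpha=1.5, n_iter=20)
```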
Recently, Deep Learning (DL) has reported promising results in signal
detection, showing that the complexity of prediction can be reduced through
training. In order to improve accuracy and convergence time, neural network
architectures have been applied to both ISTA and AMP. The results are
Learnable ISTA (LISTA) and Learnable AMP (LAMP), with LAMP showing
better performance in both accuracy and complexity [8].
Another version of the AMP algorithm is Orthogonal AMP (OAMP).
A deep learning network based on OAMP has been developed, called
OAMP Neural Network (OAMPNet). This model is designed to work with a
larger class of channel matrices, in particular the unitarily-invariant ones
[5].
Figure 1.1 shows a scheme of a MIMO architecture. It highlights the
massive number of receiver antennas at the BS that simultaneously serve tens
of users.
Figure 1.1: Massive MIMO architecture
1.2 Related work

Over the years, several methods for solving the MIMO detection problem have
been proposed and tested. Some methods try to reduce the complexity with
approximations, sacrificing accuracy. These methods can be used in contexts
where a reduced accuracy is tolerable. Instead, if the MIMO detector is not a
bottleneck in the system, it is preferable to choose a detector with better
accuracy at the expense of complexity.
The simplest MIMO detection methods work with a linear preprocessing
that decouples the streams before trying to detect the original bit sequence.
Other methods instead use non-linear approximations, while the most advanced
ones add some user parameter, such as Sphere Decoding (SD) [9],
reduced-dimension maximum a posteriori, partial marginalization,
reduced-dimension maximum likelihood and lattice reduction. Initially, MIMO
systems were composed of tens of antennas with a close-to-square structure,
meaning that the numbers of transmitters and receivers were close. In the last
five years, research has moved to massive MIMO detection with hundreds of
receiver antennas and a structure that can be either rectangular or square.
With this change of dimensions, detection methods that were well suited to
medium-sized MIMO systems are no longer valid for large systems. Conversely,
methods that did not perform well with fewer antennas can achieve
close-to-optimal performance in large systems. In particular, this happens in
rectangular MIMO systems where the ratio between the number of transmitter
antennas and the number of receiver antennas is very small.
The two main categories of detectors are hard detectors and soft detectors.
The former decide whether a bit is zero or one, while the latter decide how
likely a bit is to be zero or one, thereby producing more information and
usually better performance. Another distinction among MIMO detection
methods is between detectors whose complexity depends on the channel
realization and those whose complexity is fixed. The former are usually tree
based and their worst-case complexity is usually exponential, while the latter
are preferable from an implementation point of view because they have a fixed
trade-off between complexity and performance and they are parallelizable [1].
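The difference between hard and soft outputs can be made concrete with a small Python sketch. It assumes the simplest possible setting, which is not the one studied in the thesis: equiprobable bits, BPSK mapping (bit 0 sent as +1, bit 1 sent as -1) and real additive Gaussian noise of variance `noise_var`. Under these assumptions the log-likelihood ratio (LLR) of a received sample y is 2y / noise_var.

```python
import numpy as np

def hard_decision(y):
    """Hard detector: bit 0 if the sample is non-negative, bit 1 otherwise."""
    return (y < 0).astype(int)

def soft_llr(y, noise_var):
    """Soft detector: LLR = log P(b=0 | y) / P(b=1 | y) = 2 * y / noise_var."""
    return 2.0 * y / noise_var

y = np.array([0.9, -0.1, -1.2])       # received samples (illustrative values)
bits = hard_decision(y)               # one decided bit per sample
llr = soft_llr(y, noise_var=0.5)      # sign gives the bit, magnitude the confidence
p_zero = 1.0 / (1.0 + np.exp(-llr))   # probability that each bit is zero
```

The second sample is decided as a one by the hard detector, but its small LLR magnitude tells a downstream channel decoder that this decision is unreliable, which is the extra information a soft detector provides.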
The first MIMO detection method dates back to the 1960s: in 1967,
Shnidman considered the equalization problem of a bandwidth-limited pulse
modulation system. He modelled a system where M waveforms were transmitted
together over a single physical channel after being amplitude-scaled, producing
M outputs. In order to eliminate the interference among the waveforms, he
formulated a generalized Nyquist criterion and proposed an optimum linear
receiver. The history of MIMO detection can be divided into four periods. In
the 1960s-1970s the focus was on combating crosstalk in early single-user
FDM/TDM systems. In the 1980s-1990s research moved to Multi User
Detection (MUD). From the mid-1990s to the mid-2000s there was the period
of joint symbol detection in small/medium-scale multiple-antenna systems.
Finally, the current period is focused on large-scale multiple-antenna
systems [4].
As said in the previous section, the ML detector is optimal in terms of
accuracy, but due to its exponential complexity it cannot be adopted in MIMO
systems. Research therefore moved to sub-optimal methods with lower
complexity. The traditional methods can be linear or non-linear detection
algorithms. Among the linear ones are matched filtering (MF), ZF and
Minimum Mean Squared Error (MMSE). These methods are characterised by
very low complexity, but their accuracy is poor and very far from optimal.
Therefore, linear detectors are not a valid solution for practical detection in
Massive MIMO systems. The traditional non-linear detection algorithms are
divided into the following categories [10]:
• Interference cancellation based methods, such as Successive Interference
Cancellation (SIC) and Parallel interference cancellation (PIC), which have
performance and complexity similar to the linear methods [11].
• SD, which belongs to the tree-search based methods, with very good
performance and practical use, but with some problems at medium fixed
Signal to Noise Ratio (SNR).
• The Lattice Reduction Aided (LRA) algorithm, which has polynomial
complexity and optimal performance, but only on small MIMO systems with
high SNR.
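The successive interference cancellation idea from the list above can be sketched in a few lines of Python. This is a generic ZF-based SIC with a simple ordering rule (detect first the stream whose ZF filter has the smallest norm), not the specific variant discussed in [11]. The function name, the ordering rule, the QPSK alphabet and the toy channel are assumptions of this example.

```python
import numpy as np

def zf_sic_detect(y, H, constellation):
    """ZF-based SIC: detect one stream, subtract its contribution, repeat."""
    y = y.astype(complex)                    # work on a copy of the received vector
    n_tx = H.shape[1]
    remaining = list(range(n_tx))            # indices of streams not yet detected
    x_hat = np.zeros(n_tx, dtype=complex)
    while remaining:
        W = np.linalg.pinv(H[:, remaining])  # ZF filter for the remaining streams
        # Pick the stream with the smallest filter norm (least noise enhancement).
        k = int(np.argmin(np.sum(np.abs(W) ** 2, axis=1)))
        z = W[k] @ y
        s = constellation[np.argmin(np.abs(z - constellation))]
        idx = remaining.pop(k)
        x_hat[idx] = s
        y = y - H[:, idx] * s                # cancel the detected stream
    return x_hat

# Toy noise-free example (values are illustrative).
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H_sic = np.array([[1, 0.5j], [0.2, 1], [0.3j, -0.4], [1, 1j]])
x_sent = qpsk[[2, 1]]
x_sic = zf_sic_detect(H_sic @ x_sent, H_sic, qpsk)
```

Each stage only needs a pseudo-inverse of a shrinking channel matrix, which is why the complexity stays close to that of the linear detectors.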
When research moved to Massive MIMO, new methods, different from
the traditional ones, arose. As shown in Figure 1.2, MIMO detection methods
can be divided into different categories. A first division is between optimal
and suboptimal methods. The only optimal detectors are ML and Maximum a
posteriori (MAP). The suboptimal detectors can be further divided into linear
and non-linear methods. The linear methods are ZF, MMSE and MF. The
non-linear methods are divided into different classes, which in Figure 1.2 are
Tree Search, Message Passing, Probabilistic Data Association and
Interference Cancellation.
Based on Belief Propagation (BP), the AMP-based algorithms are
sub-optimal detectors that achieve near-optimal accuracy with good complexity
[12]. The thesis focuses on this class of algorithms [13]; therefore AMP,
Vectorized AMP (VAMP), LAMP, Learnable Vectorized AMP (LVAMP),
OAMP [14], OAMPNet and OAMP Neural Network Second (OAMPNet2)
are explained in detail in the next chapter. Another key turning point in
MIMO detection has been the introduction of deep learning [15]. For this
reason, the last part of this section presents some DL- and AMP-based
techniques for solving MIMO detection. These methods are not taken into
consideration in this thesis.
Figure 1.2: Different MIMO detection algorithms
The authors of [16] propose an iterative algorithm that combines different
signal processing methods. The algorithm has a main-branch structure that
iterates the soft information in parallel between the main detector and the
branch detector. In addition, expectation propagation and linear approximation
are used to obtain the symbol belief with reduced complexity. Finally,
Approximate Message Passing simplified by Linear Approximation (AMP-LA)
is used as detector, with the Mean Square Error (MSE) criterion used to
compute the correlation coefficient of the main and branch detectors at each
iteration. This detector manages to reduce the complexity by avoiding matrix
inversion and to cancel the Inter-Symbols Interference (ISI) and Cochannel
interference (CCI) within a small number of iterations.
In [17] the Generalized Approximate Message Passing (GAMP) algorithm is
presented. GAMP extends AMP methods by allowing arbitrary distributions on
both the input and the output of the transform. Its advantage is an efficient
approximate implementation of max-sum and sum-product loopy belief
propagation. Moreover, the algorithm is computationally simple, involving only
scalar estimation and linear transforms. GAMP is based on Gaussian and
quadratic approximations of loopy BP, a methodology for estimation problems
where the relationships among variables are represented graphically and the
estimates are updated through AMP. In [18] a deep learning based Trainable
Approximate Message Passing (TAMP) algorithm is proposed. It consists of a
neural network composed of a preprocessing layer, which acts like a learnable
filter, followed by detection layers derived by unfolding the iterations of
GAMP. TAMP uses trainable parameters to control the prior mean and variance of
the MMSE denoiser, and it tunes these parameters through backpropagation.
A method based on approximating the original discrete messages sent in an
AMP-based factor graph with continuous Gaussian messages, through the
Kullback-Leibler (KL) divergence criterion, has been proposed in [19]. The
principle of expectation propagation is applied to compute the approximate
Gaussian messages, and the approximate message is then computed from the
Gaussian approximate belief. Moreover, from the a posteriori probabilities fed
back from the channel decoders, it is possible to compute the approximate
symbol belief, reducing the complexity. In addition, a first-order
approximation is used to update the message, leading to an algorithm
equivalent to AMP. Finally, the central limit theorem is also used to simplify
the message updating. The algorithm has low complexity and good performance
for small MIMO systems.
The authors of [20] combined different techniques in order to perform channel
estimation, MIMO detection and noise level estimation simultaneously. They
proposed an algorithm called variational approximate message passing that
exploits the advantages of AMP, in particular using Bilinear Generalized AMP
(BiG-AMP), and Variational Bayesian Inference (VBI).
Another deep learning neural network, this time based on belief propagation,
has been proposed in [21] and is called the Deep Learning Belief Propagation
(DLBP) detector. The network is composed of multiple units, each consisting of
a four-layer neural network derived by unfolding the BP algorithm. The output
of each unit is designed to speed up the training, which is conducted using a
cross-entropy loss function.
1.3 Problem and research question

1.3.1 Problem
As can be seen from the introduction (Section 1.1) and the related work
(Section 1.2), several methods have been tested over the years to solve the
MIMO detection problem. Every year, new methods are studied, presented and
tested in order to find the one that best deals with the
performance-complexity trade-off. Recently, two new algorithms based on deep
learning and approximate message passing have been proposed. The first one is
called LVAMP and it is a deep learning based neural network obtained by
unfolding the VAMP algorithm. The second one is a new network obtained by
unfolding the OAMP algorithm and adding more learnable parameters with respect
to OAMPNet; it is called OAMPNet2. In the first tests made with these two new
algorithms, the results seem promising in both performance and complexity, on
both small and large MIMO systems. For this reason, the company that hosted
the thesis wanted to investigate the performance of both neural networks on
different channel models, different MIMO system sizes and different values of
SNR. A comparison between these two algorithms in different situations and
with a common baseline is missing; therefore the thesis focuses on it.
1.3.2 Research Question

From the problem described, the research question follows directly and can be
stated as follows.
Research Question: Can LVAMP and OAMPNet2 be considered as sub-optimal
solutions to the MIMO detection problem, and which one performs better in
terms of complexity and Symbol Error Rate (SER) on different MIMO scenarios?
Some concepts of the research question need an explanation. A sub-optimal
solution is a working algorithm with a possible practical implementation that
can handle the MIMO detection problem within a time compatible with the
communication speed and with few detection errors. In particular, the thesis
considers as a sub-optimal solution a MIMO detector that achieves at least the
same performance as the MMSE detector, with a complexity that is polynomial in
the dimension of the MIMO system. The complexity of the algorithm is the time
complexity, not the space complexity, and it affects the running time of the
algorithm. The SER is the performance metric used in the thesis and it will be
explained later.
1.3.3 Scientific and engineering issues

The main issue in addressing the problem, and in research in this field in
general, is the simulator on which the algorithms are tested. In fact, the
algorithms are not tested on a real 5G network, but on network simulators
implemented in a laboratory. The two original neural networks considered in
the thesis were implemented and tested by their authors on two different
simulators. A third simulator, different from the two used by the researchers
who developed LVAMP and OAMPNet2, is used for comparing the algorithms.
Therefore, a first issue is to implement a working link-layer 5G simulator,
and a second one is to adapt the implementations of the two algorithms to the
new simulator.
1.4 Purpose
The thesis and the degree project have several purposes. First of all, from
the study conducted, the company can discover the potential of two new
algorithms and decide whether they can improve the quality of its networks. If
the algorithms perform better than the current ones, they can be considered
for adoption, bringing a competitive advantage to the company and benefits in
terms of transmission quality and speed for the customers.
Secondly, the project can prove the knowledge and the skills of the author,
validating the Master's degree. If well conducted, the degree project meets
the goals and objectives defined by the university and by the course of study.
Moreover, if the thesis results in a real application for 5G networks, it
will bring positive effects on society and on people's lives. In fact, modern
society has developed impressively fast since the introduction of the
Internet, and a more precise and faster network can help to continue this
improvement. Thanks to 5G networks, new technologies and commodities can be
developed in different fields, from home automation to health, from
transportation to security.
From a sustainability point of view, communications with low complexity and
fewer errors (which also means fewer retransmissions of messages) can reduce
the amount of energy required, and therefore the amount of emissions.
During the project, it was not necessary to contact other parties to obtain
data or algorithms. The data were provided by the company or were in the
public domain. The same holds for the machine learning algorithms. Therefore,
they can be used without contacting organizations or people to obtain the
usage rights. The data and the algorithms provided by the company could be
protected by a non-disclosure agreement.
The research project did not require any participants other than the author
and the supervisors. The project includes neither surveys nor interviews, and
no experiments were conducted on animals or persons. Therefore, from an
ethical point of view, the project did not need to collect and store personal
or sensitive data. The author declares that there is no conflict of interest
in conducting the research project.
From a sustainability point of view, the project can help the company in
achieving the UN Sustainable Development Goals, in particular Goal 9, Goal 11
and Goal 12. Goal 9 is about building resilient infrastructure, promoting
inclusive and sustainable industrialization and fostering innovation. Goal 11
is about making cities and human settlements inclusive, safe, resilient and
sustainable. Goal 12 is about ensuring sustainable consumption and production
patterns [22].
1.5 Goals
The main goal of the degree project is to answer the research question, and
therefore to compare the performance, complexity and feasibility of LVAMP
and OAMPNet2 in solving the MIMO detection problem. In order to meet this
goal, it has been divided into the following sub-goals.
1. The first sub-goal is to implement a working link-layer 5G network
simulator. This will not be described in the thesis, in order to respect the
wishes of the company and because it is not the main focus of the thesis. The
simulator has been developed in Python [23] and TensorFlow [24].
2. The second sub-goal is to find and study the characteristics of the two
algorithms in their original version.
3. The third sub-goal is to implement both algorithms, together with the
baselines, adapting all of them to the simulator.
4. The fourth sub-goal is to run the experiments in order to compare the
algorithms using different datasets and MIMO scenarios.
5. The last sub-goal is to analyse the results in order to evaluate the
quality of the algorithms.
The final deliverable of the project will be a comparison of the algorithms in
terms of SER and complexity, on both small and large MIMO systems and under
different SNR conditions. The results will be presented in both numerical and
graphical form.
1.6 Research Methodology

The research methodology of the thesis is quantitative and experimental. This
choice follows from the goal: in order to compare algorithms on different
scenarios, it is necessary to perform experiments by coding the algorithms and
testing them under pre-defined conditions. Once the results are obtained, a
numerical approach is used to compare the algorithms by analysing the SER they
achieve, also considering their complexity. A qualitative approach cannot be
followed for this type of thesis and, since the scenarios differ from those
found in the original papers of the algorithms, a literature survey is not
sufficient to perform a correct comparison.
1.7 Delimitations
The first limitation of the thesis, as mentioned before, is that the
structure of the simulator is not explained. The simulator that has been used
is not public: it belongs to the company and cannot be published.
The second limitation is that the real-world scenario is only simulated and
the algorithms are not tested on a real 5G network. Therefore, if the
algorithms are applied to a real network, their behaviour can differ from the
one observed in the laboratory.
The algorithms are implemented by following their original definition,
without changing them or trying to improve them with different parameters.
1.8 Structure of the thesis

This first chapter was an introduction to the thesis topic and to the
development of the research.
Chapter 2 presents a detailed background on the MIMO detection problem and its
algorithms. The MIMO detection problem is explained through its model. Then,
the best-known algorithms for solving it are explained, with a focus on the
algorithms that are based on AMP, and in particular on the ones that are
compared in the thesis work.
Chapter 3 presents the methodology and the method used to solve the problem.
The creation of the dataset is explained and, since the algorithms are deep
learning based, the training and validation procedures are also detailed. It
represents the core part of the thesis and it also describes the experiments
that have been designed in order to meet the goal of the research.
Chapter 4 reports and analyses the results of the experiments, trying to
answer the research question and verifying the achievement of the goals.
Finally, Chapter 5 makes a deeper analysis of the thesis work, by commenting
on the procedure and the resulting outcomes. It ends the thesis with a summary
of the work, some reflections, and the future work that can follow this thesis
and that is still missing in the treatment of the problem.
Chapter 2
Background
This chapter provides basic background information about MIMO technology and
MIMO detection algorithms. In Section 2.1, MIMO systems are defined and
explained, considering the different channel models that are used to create
the channel matrix and the technologies that make MIMO the main
characteristic of 5G networks. After that, MIMO detection algorithms are
defined and explained, starting from the basic ones, continuing with the
iterative ones and finishing with the deep learning based ones.
2.1 MIMO
In Massive MIMO the base station is equipped with hundreds or even thousands
of antennas that serve tens of user terminals simultaneously at the same
frequency. The use of multiple antennas helps to compensate for the signal
losses caused by reflection and collision with objects. Thanks to Massive
MIMO, the throughput of the antennas is also increased by more than 10 times
and the energy efficiency by 100 times with respect to previous technologies.
Moreover, Massive MIMO requires low-power devices at low cost, cuts down the
air interface delay and simplifies the multiple access. On the other hand, a
weakness of Massive MIMO is that uplink and downlink channels do not satisfy
reciprocity. Moreover, channel estimation and signal detection cannot reach
the optimal Bit Error Rate (BER) in polynomial time [10].
MIMO technology improves link reliability and spectral efficiency. In order to
enhance these advantages, it is essential to have efficient channel estimation
and signal detection algorithms that balance performance and complexity. The
thesis focuses only on the uplink channel, that is, on the signals that the
receiver antennas (the base stations) receive from the transmitter antennas
(the user equipment) within a limited coverage area called a cell. In this
cellular system the base station has the role of coordinator. Usually, the
receiver antennas are equal or greater in number than the transmitter
antennas. MIMO is represented through a matrix formulation. Each transmitter
antenna $i$ sends a message $x_i$, which follows different paths with
different channel properties to reach the receiver antennas. The different
paths are represented by $h_{ij}$, the channel that links transmitter $i$ with
receiver $j$. Each receiver antenna $j$ collects a message $y_j$ that is the
combination of the signals arriving at antenna $j$. With three transmitters
and three receivers, for example, the system can be expressed as in (2.1):
\[
\begin{aligned}
y_1 &= h_{11} x_1 + h_{21} x_2 + h_{31} x_3 \\
y_2 &= h_{12} x_1 + h_{22} x_2 + h_{32} x_3 \\
y_3 &= h_{13} x_1 + h_{23} x_2 + h_{33} x_3
\end{aligned}
\tag{2.1}
\]
This system uses a technology called spatial multiplexing, which consists in
allowing each transmitter to send multiple streams of data to many receivers
at the same time and on the same frequency. Spatial multiplexing increases the
capacity of MIMO systems by exploiting multipath propagation [25].
In general, the system is modelled in matrix form as in (2.2):
\[
\bar{y} := \bar{H}\bar{x} + \bar{n} \tag{2.2}
\]
where $\bar{y}$ is a complex vector of length $\bar{N}_r$, the number of
receiver antennas, and represents the signal received by the BSs. $\bar{x}$
represents the signal transmitted by the UEs and is a complex vector of length
$\bar{N}_t$, the number of transmitter antennas. $\bar{n}$ represents the
noise and is a complex vector of length $\bar{N}_r$. Finally, $\bar{H}$ is
called the channel matrix; it has shape $\bar{N}_r \times \bar{N}_t$ and its
entry $\bar{h}_{ij} \in \mathbb{C}$ is the channel gain between transmitter
antenna $j$ and receiver antenna $i$ (in the matrix form the row index refers
to the receiver, so the index order is transposed with respect to the scalar
example in 2.1). Usually $\bar{N}_r \geq \bar{N}_t$, and $\bar{N}_r$ and
$\bar{N}_t$ are powers of two. For small MIMO systems, $\bar{N}_r$ and
$\bar{N}_t$ are lower than or equal to 16 and the ratio
$\bar{N}_t / \bar{N}_r$ is equal to 1 (or 0.5). For medium-large MIMO systems,
the number of antennas is greater than 16 and the ratio
$\bar{N}_t / \bar{N}_r$ is usually 0.5 or 0.25 (rarely equal to 1).
Massive/large MIMO systems use hundreds or even thousands of antennas.
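Definition 1 (MIMO system): the MIMO system is modelled by the linear system
\[
\bar{y} = \bar{H}\bar{x} + \bar{n}
\]
where $\bar{H} \in \mathbb{C}^{\bar{N}_r \times \bar{N}_t}$,
$\bar{y} \in \mathbb{C}^{\bar{N}_r}$, $\bar{n} \in \mathbb{C}^{\bar{N}_r}$,
$\bar{x} \in \bar{A}^{\bar{N}_t}$ and $\bar{A} \subset \mathbb{C}$ is a
discrete alphabet.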
The possible values that $\bar{x}$ can assume are defined by an alphabet
called constellation. The channel matrix can assume different structures,
called channel models. In particular, this thesis focuses on the Independent
and Identically Distributed (i.i.d.) Gaussian channel model and the Kronecker
channel model.
The MIMO model used in the thesis relies on the following assumptions:
• The noise is complex zero-mean Gaussian, $\bar{n}_i \sim \mathcal{CN}(0, \sigma^2)$,
and its covariance matrix is $\sigma^2 I_{\bar{N}_r}$.
• The columns of $\bar{H}$ are normalized, $\|\bar{h}_i\|_2 = 1$.
• $\mathrm{SNR} = 10 \log_{10} \dfrac{E(\|\bar{H}\bar{x}\|_2^2)}{\bar{N}_r \sigma^2}$.
• The channel matrix $\bar{H}$ is known by the receiver.
2.1.1 Channel Models

i.i.d. Gaussian channel model
The most common channel model, used in most of the literature to test and
compare MIMO signal detection algorithms, is the i.i.d. Gaussian one. This
channel model is so common because it is simple, but it is far from a real
scenario. In fact, it assumes that each $\bar{h}_{ij}$ is independent of all
the others and $\bar{h}_{ij} \sim \mathcal{CN}(0, 1/\bar{N}_r)$. In this
thesis the columns of $\bar{H}$ are normalized such that
$\|\bar{h}_i\|_2 = 1$. Since the channels in a real scenario are spatially
correlated, and therefore dependent on each other, the Kronecker channel model
is also considered.
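As an illustration of the assumptions listed above and of the i.i.d. Gaussian
model, the following NumPy sketch draws a channel with unit-norm columns and
derives the noise variance from a target SNR. The function names
(iid_channel, noise_variance, transmit) are hypothetical, and the sketch
assumes i.i.d. zero-mean symbols with average energy symbol_energy, so that
the expected value of ‖H̄x̄‖² equals symbol_energy times the squared Frobenius
norm of H̄. It is not the simulator used in the thesis.

```python
import numpy as np


def iid_channel(n_r, n_t, rng):
    """Draw H with CN(0, 1/n_r) entries, then scale each column to unit 2-norm."""
    h = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
    h *= np.sqrt(1.0 / (2.0 * n_r))  # real and imaginary parts share the variance
    return h / np.linalg.norm(h, axis=0, keepdims=True)


def noise_variance(h, symbol_energy, snr_db):
    """Solve SNR = 10 log10(E||Hx||^2 / (n_r sigma^2)) for sigma^2.

    Assumption: i.i.d. zero-mean symbols with energy `symbol_energy`,
    hence E||Hx||^2 = symbol_energy * ||H||_F^2.
    """
    n_r = h.shape[0]
    signal_power = symbol_energy * np.linalg.norm(h, "fro") ** 2
    return signal_power / (n_r * 10.0 ** (snr_db / 10.0))


def transmit(h, x, sigma2, rng):
    """Return y = Hx + n with n_i ~ CN(0, sigma2)."""
    n_r = h.shape[0]
    n = np.sqrt(sigma2 / 2.0) * (
        rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)
    )
    return h @ x + n
```

With unit-norm columns the squared Frobenius norm of the channel equals the
number of transmitters, so for unit-energy symbols the noise variance reduces
to N̄t divided by N̄r times the linear SNR.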
Kronecker channel model

As mentioned above, the Kronecker channel model simulates the spatial
correlation among channels [26]. It is modelled as
$\bar{H} = R_R^{1/2} K R_T^{1/2}$, where the entries of $K$ are
$\bar{k}_{ij} \sim \mathcal{CN}(0, 1/\bar{N}_r)$ and $R_R$ and $R_T$ are the
receiver and transmitter correlation matrices respectively. These matrices are
generated according to an exponential correlation model, which depends on a
correlation coefficient that takes values from 0 (i.i.d. Gaussian) to 1 (all
the channels interfere with each other). The coefficient at the receiver side
is denoted by $\rho_r$, while the one at the transmitter side is $\rho_t$.
Also in this case, the columns of $\bar{H}$ are normalized such that
$\|\bar{h}_i\|_2 = 1$.
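The construction above can be sketched in NumPy as follows. This is only a
sketch under stated assumptions: the correlation matrices use the common
exponential form with entries ρ^|i−j| for a real coefficient ρ (the text only
names the exponential correlation model, so this specific form is an
assumption), and the helper names exp_correlation, psd_sqrt and
kronecker_channel are hypothetical.

```python
import numpy as np


def exp_correlation(n, rho):
    """Exponential correlation matrix with entries rho**|i-j| (assumed form)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def psd_sqrt(r):
    """Symmetric square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(r)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def kronecker_channel(n_r, n_t, rho_r, rho_t, rng):
    """H = R_R^(1/2) K R_T^(1/2) with CN(0, 1/n_r) entries in K, unit-norm columns."""
    k = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
    k *= np.sqrt(1.0 / (2.0 * n_r))
    h = psd_sqrt(exp_correlation(n_r, rho_r)) @ k @ psd_sqrt(exp_correlation(n_t, rho_t))
    return h / np.linalg.norm(h, axis=0, keepdims=True)
```

With both coefficients set to 0 the correlation matrices are identities and
the sketch falls back to the i.i.d. Gaussian model.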
3GPP 3D MIMO channel model

Together with massive MIMO, another key technology in 5G communication systems
is three-dimensional (3D) beamforming, which requires channel models that
consider also the elevation direction when transmitting a signal, and not only
the azimuth. This can improve the spectral efficiency and the reliability of
the communication. The 3rd Generation Partnership Project (3GPP) released a
new channel model that is three-dimensional, supports planar antennas and
enables elevation beamforming. 3GPP introduced the geometry-based stochastic
Spatial Channel Model (GSCM), which incorporates a random power delay profile
(PDP) and a random angular profile (AP) in order to model a realistic scenario
from both a deterministic and a stochastic point of view [27]. It is a
geometric stochastic model in which both large-scale and small-scale
parameters are randomly drawn from tabulated distributions. The GSCM and its
extended version (SCME) model 15 different real-world scenarios, including
rural, urban and moving environments, with up to 100 MHz of bandwidth and a
frequency range of 2-6 GHz. The main scenarios for elevation beamforming and
full-dimension MIMO (FD-MIMO) are Urban Macro (3D-UMa) and Urban Micro
(3D-UMi). The scenarios consider a European city environment with buildings
and vegetation, taking into account also their height and density, and user
equipment that can be inside or outside the buildings. The receiver antennas
are placed 10 meters above the ground for 3D-UMi (below the height of the
surrounding buildings) or 25 meters above the ground for 3D-UMa (above the
height of the surrounding buildings) [28]. The propagation condition can be
line of sight (LOS), non line of sight (NLOS) or outdoor to indoor (OtoI).
2.1.2 MIMO techniques

From Complex to Real MIMO System
Since working with complex values is more difficult than working with real
numbers, the MIMO model previously described can be converted from a
complex-valued system to a real-valued one. The conversion treats the real and
the imaginary parts of each complex number separately. A new vector is defined
for each variable of the model; in particular, the transmitted symbol vector
becomes
\[
x = \begin{bmatrix} \Re(\bar{x}^T) & \Im(\bar{x}^T) \end{bmatrix}^T \tag{2.3}
\]
where $\Re(\cdot)$ is the real part of its argument and $\Im(\cdot)$ is its
imaginary part. Following the same approach, the other variables become
\[
\begin{aligned}
y &= \begin{bmatrix} \Re(\bar{y}^T) & \Im(\bar{y}^T) \end{bmatrix}^T \\
n &= \begin{bmatrix} \Re(\bar{n}^T) & \Im(\bar{n}^T) \end{bmatrix}^T \\
H &= \begin{bmatrix} \Re(\bar{H}) & -\Im(\bar{H}) \\ \Im(\bar{H}) & \Re(\bar{H}) \end{bmatrix}
\end{aligned}
\tag{2.4}
\]
With these new variables, the MIMO system can be restated as in (2.5).
Definition 2 (Real-valued MIMO):
\[
y = Hx + n, \qquad N_r = 2\bar{N}_r, \qquad N_t = 2\bar{N}_t \tag{2.5}
\]
The last modification concerns the alphabet. With this new definition of the
MIMO system, the entries of $x$ take values in $A$, where
$A = \Re(\bar{A}) = \Im(\bar{A})$ in the case of Quadrature Amplitude
Modulation (QAM).
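The equivalence between the complex-valued and the real-valued models can be
checked numerically. The following NumPy sketch stacks real and imaginary
parts as in (2.3) and (2.4); the helper names real_vector and real_channel are
hypothetical.

```python
import numpy as np


def real_vector(v_c):
    """Stack real and imaginary parts: [Re(v); Im(v)]."""
    return np.concatenate([v_c.real, v_c.imag])


def real_channel(h_c):
    """Real-valued equivalent channel [[Re(H), -Im(H)], [Im(H), Re(H)]]."""
    return np.block([[h_c.real, -h_c.imag], [h_c.imag, h_c.real]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_c = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
    x_c = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    n_c = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    y_c = h_c @ x_c + n_c
    # The real-valued system reproduces the complex one entry by entry.
    y_r = real_channel(h_c) @ real_vector(x_c) + real_vector(n_c)
    print(np.allclose(y_r, real_vector(y_c)))
```

The real-valued channel has shape 2N̄r × 2N̄t, in agreement with (2.5).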
QAM Modulation
One of the techniques used in communication systems to transmit a signal is
modulation. Modulation is an operation applied to a periodic waveform, called
the carrier signal, that varies its phase and/or its amplitude and/or its
frequency in order to transmit information. The modulation technique used in
this thesis is Quadrature Amplitude Modulation (QAM), which changes the
amplitude and the phase of the carrier signal. It consists of producing a
signal in which two carriers with the same frequency, shifted in phase by 90
degrees (they are in quadrature, or orthogonal), are modulated and combined.
At the receiver side the signal can be separated thanks to the orthogonality
property. A basic binary signal can transmit only a 0 or a 1, since it can
take only two states. With QAM it is possible to use multiple points that
differ in phase and amplitude. The QAM points are arranged in a square grid
with equal horizontal and vertical spacing, called the constellation diagram.
Since digital communications use binary data, the number of points that
compose the constellation is usually a power of 2. The most used forms of QAM
are QAM-4, QAM-16, QAM-64 and QAM-256, since the constellation is square.
Different binary values are assigned to different symbols in the
constellation, and in this way it is possible to transfer data with a single
signal at a much higher rate. In QAM-M, the coordinates of the points along
each axis take the values $\pm d/2, \pm 3d/2, \ldots, \pm(\sqrt{M} - 1)d/2$,
where $M$ is a power of two and a square number and $d$ is the minimum
distance between two different points in the constellation.
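As a small worked example of the square constellation described above, the
following sketch builds the per-axis levels and the full set of QAM-M points.
The function names and the default d = 2 are illustrative choices, not taken
from the thesis.

```python
import numpy as np


def qam_axis_levels(m, d=2.0):
    """Per-axis levels of square M-QAM: -(sqrt(M)-1)d/2, ..., -d/2, d/2, ..., (sqrt(M)-1)d/2."""
    side = int(round(np.sqrt(m)))
    if side * side != m:
        raise ValueError("M must be a perfect square")
    return (2 * np.arange(side) - (side - 1)) * d / 2.0


def qam_constellation(m, d=2.0):
    """All M complex points of the square grid (real level + j * imaginary level)."""
    levels = qam_axis_levels(m, d)
    return (levels[:, None] + 1j * levels[None, :]).ravel()
```

For QAM-16 with d = 2 the per-axis levels are −3, −1, 1 and 3, which is also
the real-valued alphabet used after the complex-to-real conversion.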
OFDM
Another fundamental technique used in MIMO systems is orthogonal frequency
division multiplexing (OFDM), a form of multicarrier modulation in which a
single channel uses multiple sub-carriers on adjacent frequencies [29].
Usually, signals are sent spaced apart and not overlapping, so that the
receiver can separate them through a filter. In OFDM, instead, the
sub-carriers overlap so that the spectral efficiency is maximized [30].
Normally, overlapping adjacent channels interfere with each other, but in OFDM
the sub-carriers are mutually orthogonal and, thanks to this property, the
signals overlap without interfering, ensuring no ISI and reducing
Inter-Carrier Interference (ICI) [31]. This characteristic is achieved by
setting the carrier spacing equal to the reciprocal of the symbol period.
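The orthogonality condition can be illustrated numerically. The sketch below
samples complex exponential sub-carriers over one symbol period T and
correlates them: when the spacing is a multiple of 1/T the correlation
vanishes, otherwise it does not. The function names and the sampled
(discrete-time) setting are assumptions made for illustration.

```python
import numpy as np


def subcarrier(k, n_samples):
    """Samples of exp(j 2 pi k t / T) over one symbol period T."""
    t = np.arange(n_samples) / n_samples  # time in units of T
    return np.exp(2j * np.pi * k * t)


def correlation(a, b):
    """Normalized inner product of two sampled sub-carriers."""
    return np.vdot(b, a) / len(a)


if __name__ == "__main__":
    n = 64
    print(abs(correlation(subcarrier(3, n), subcarrier(4, n))))    # spacing 1/T
    print(abs(correlation(subcarrier(3, n), subcarrier(3.5, n))))  # spacing 0.5/T
```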
Definition 3 (OFDM): It is a method of transmitting digital data in parallel
on orthogonal carriers at a very high rate with high Quality of Service (QoS).
The data to be transmitted through an OFDM signal is divided among the
sub-carriers, which reduces the data rate of each sub-carrier and limits the
effect of interference from reflections [32]. Other advantages of OFDM are:
• Interference on a given frequency affects only some of the sub-carriers and
not the whole signal and, if an error-coding technique is used, it is possible
to reconstruct the corrupted carriers at the receiver side.
• Overlapping sub-carriers make efficient use of the available spectrum.
• OFDM is also resilient to inter-symbol and inter-frequency interference,
thanks to a lower data rate per sub-carrier.
• OFDM needs only simple equalization (a single-tap equalizer per
sub-carrier).
• The hardware implementation is easy and less complex.
Multiple input multiple output orthogonal frequency division multiplexing
(MIMO-OFDM) is the main air interface for 5G broadband wireless
communications, since it provides more reliable communications at high speed.
It achieves the greatest spectral efficiency and the highest capacity and data
throughput [31]. MIMO-OFDM achieves antenna diversity and spatial
multiplexing.
2.2 MIMO Detection

2.2.1 Basics of MIMO Detection
Definition 4 (MIMO Detection): Given a MIMO system, MIMO detection can be
defined as the problem of retrieving the transmitted signal vector
$x \in \mathbb{R}^{N_t}$ from a noisy linear measurement
$y = Hx + n \in \mathbb{R}^{N_r}$, where $H \in \mathbb{R}^{N_r \times N_t}$
is a known matrix and $n \in \mathbb{R}^{N_r}$ is an unknown unstructured
noise vector.
This problem is also called standard linear regression or, in the signal
processing literature, linear inverse problem (or compressive sensing if
$N_r \ll N_t$ and $x$ is sparse). The multiple symbols can be detected
separately or jointly. In joint detection, detecting a symbol requires
considering also the characteristics of the other symbols, while in separate
detection each symbol is detected independently. Joint detection typically
achieves better performance than separate detection, at the cost of a higher
complexity. The channel matrix can be known from explicit channel estimation,
in which case the detection of $x$ is called coherent detection, or the
explicit estimation can be avoided, in which case the detection is called
non-coherent detection.
The optimal method for solving the MIMO detection problem is the ML detector
in (2.6).
Definition 5 (Maximum Likelihood Detector):
\[
\hat{x} = \arg\max_{x \in A^{N_t}} p(y|x, H) = \arg\min_{x \in A^{N_t}} \|y - Hx\|^2 \tag{2.6}
\]
The best estimate $\hat{x}$ of $x$ is the one that maximizes the likelihood
$p(y|x, H)$, but this approach suffers from a high computational complexity
due to the exhaustive search over all $|A|^{N_t}$ possible values of
$x \in A^{N_t}$. This optimization problem is NP-hard due to the finite
alphabet constraint. It is useful to remember that $x$ is a vector of length
$N_t$ whose entries can only take values from the discrete alphabet $A$. The
values that form $A$ in this thesis are derived from the QAM modulation
scheme, which converts the binary signals into amplitude-phase pairs expressed
as complex numbers. To solve the MIMO detection problem in practice it is
necessary to find a sub-optimal solution that handles the
performance/complexity trade-off. There are methods that are computationally
cheap but have low accuracy, and methods with high accuracy that are
computationally expensive. The accuracy and the complexity of some methods can
vary with the dimension of the MIMO system.
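To make the cost of the exhaustive search concrete, the following sketch
implements the ML rule of (2.6) by brute force on the real-valued model. It is
only feasible for very small systems, since it enumerates every candidate in
the alphabet raised to the power Nt; the function name ml_detect is
hypothetical.

```python
import itertools

import numpy as np


def ml_detect(y, h, alphabet):
    """Exhaustive ML search: minimize ||y - Hx||^2 over all x in alphabet^Nt."""
    n_t = h.shape[1]
    best_x, best_cost = None, np.inf
    for cand in itertools.product(alphabet, repeat=n_t):
        x = np.asarray(cand, dtype=float)
        cost = np.sum((y - h @ x) ** 2)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```

With a real alphabet of 4 levels (QAM-16 per axis) and Nt = 8 real dimensions
the loop already visits 4^8 = 65536 candidates, which shows why the search
does not scale.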
Several methods have been tested over the years to solve this problem; the
most common ones are regularized quadratic loss minimization, hard detectors,
Zero Forcing (ZF), MAP and, in particular, Minimum Mean Squared Error (MMSE)
[33].
Regularized quadratic loss minimization estimates $x$ with $\hat{x}$, which is
computed by solving an optimization problem of the form
\[
\hat{x} = \arg\min_{x \in \mathbb{R}^{N_t}} \frac{1}{2}\|y - Hx\|_2^2 + f(x) \tag{2.7}
\]
where $f(x)$ is called the penalty function or “regularization” and it
promotes a desired structure in $\hat{x}$.
Hard detectors predict values $z \in \mathbb{R}^{N_t}$ that may differ from
the ones in the alphabet $A$, and the predicted symbol $\hat{x}_i$ is obtained
as $\hat{x}_i = \arg\min_{x \in A} |x - z_i|$, where $\hat{x}_i$ and $z_i$ are
the $i$-th elements of $\hat{x}$ and $z$ respectively.
Zero Forcing (2.8) is a method that is rarely applicable in practice [9].
Definition 6 (Zero Forcing detector): If $N_r \geq N_t$ and $H$ has $N_t$
linearly independent columns, $\hat{x}$ can be estimated by applying the
pseudo-inverse $H^{\dagger}$ of the channel to $y$:
\[
\hat{x} = H^{\dagger} y = (H^T H)^{-1} H^T y \approx z + w \tag{2.8}
\]
Therefore $\hat{x}$ is estimated through $z$ and a new noise term $w$, which
can be very large if $H^T H$ is nearly singular.
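The hard-decision rule and the zero-forcing estimate of (2.8) can be combined
in a few lines. The sketch below works on the real-valued model and uses
hypothetical function names; the pseudo-inverse is computed with NumPy's
np.linalg.pinv.

```python
import numpy as np


def hard_decision(z, alphabet):
    """Map each entry of z to the nearest point of the real alphabet."""
    alphabet = np.asarray(alphabet, dtype=float)
    idx = np.argmin(np.abs(z[:, None] - alphabet[None, :]), axis=1)
    return alphabet[idx]


def zf_detect(y, h, alphabet):
    """Zero forcing: apply the pseudo-inverse of H, then round to the alphabet."""
    z = np.linalg.pinv(h) @ y
    return hard_decision(z, alphabet)
```

In the noiseless case the pseudo-inverse recovers the transmitted vector
exactly (up to rounding errors); with noise, the term added to the estimate is
the pseudo-inverse applied to the noise, which is what grows when the channel
is badly conditioned.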
MAP and MMSE are instead based on the Bayesian methodology, where a prior
density $p(x)$ and a likelihood function $p(y|x)$ are assumed and the goal is
to find the posterior density $p(x|y)$ as in (2.9):
\[
p(x|y) = \frac{p(y|x)\,p(x)}{\int p(y|x)\,p(x)\,dx} \tag{2.9}
\]
The MAP method estimates $x$ with $\hat{x}_{MAP}$, computed as
\[
\hat{x}_{MAP} = \arg\max_{x \in A^{N_t}} p(x|y) \tag{2.10}
\]
while MMSE finds $\hat{x}_{MMSE}$ through
\[
\hat{x}_{MMSE} = \arg\min_{\bar{x} \in \mathbb{R}^{N_t}} \int \|x - \bar{x}\|^2\, p(x|y)\,dx \tag{2.11}
\]
which is equal to computing $E[x|y]$ (here $\bar{x}$ denotes the candidate
estimate, not a complex-valued vector). MMSE can also be expressed as
\[
\hat{x} = (H^T H + \sigma^2 I_{N_t})^{-1} H^T y \approx z + w \tag{2.12}
\]
when the noise is an Additive White Gaussian Noise (AWGN) vector with zero
mean and covariance matrix $\sigma^2 I_{N_r}$. MMSE reduces to ZF as the noise
variance $\sigma^2 \to 0$. In practice MMSE provides good results, but its
performance is still far from optimal and it scales poorly when the number of
antennas increases. MMSE is used as a benchmark in most of the comparative
research on the MIMO detection problem; therefore it is used as a benchmark in
this thesis as well.
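The linear MMSE expression (2.12) can be sketched as below on the real-valued
model. The function name mmse_estimate is hypothetical, and the sketch takes
the formula as written, which implicitly assumes unit-variance symbols; the
output still has to be rounded to the alphabet to obtain a symbol decision.

```python
import numpy as np


def mmse_estimate(y, h, sigma2):
    """Linear MMSE estimate (H^T H + sigma2 I)^(-1) H^T y, before rounding."""
    n_t = h.shape[1]
    gram = h.T @ h + sigma2 * np.eye(n_t)
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(gram, h.T @ y)
```

Setting sigma2 close to zero reproduces the zero-forcing estimate, while a
larger sigma2 shrinks the estimate towards zero, which limits the noise
amplification when the Gram matrix is nearly singular.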
If the noise $w$ can be expressed as $w \sim \mathcal{N}(0, \gamma_w^{-1} I)$
with $\gamma_w > 0$, the so-called AWGN, then the regularized quadratic loss
minimization is equivalent to MAP estimation under the prior
$p(x) \propto \exp[-\gamma_w f(x)]$, where $\propto$ denotes equality up to a
scaling that is independent of $x$.
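The equivalence can be verified in a few lines, assuming the measurement model
y = Hx + w with the Gaussian noise and the prior stated above. Taking the
negative logarithm of the posterior and dropping the terms that do not depend
on x gives back the objective of (2.7):

```latex
\begin{aligned}
\hat{x}_{MAP}
  &= \arg\max_{x}\; p(y|x)\,p(x) \\
  &= \arg\min_{x}\; \bigl[-\log p(y|x) - \log p(x)\bigr] \\
  &= \arg\min_{x}\; \Bigl[\tfrac{\gamma_w}{2}\,\|y - Hx\|_2^2 + \gamma_w f(x)\Bigr] \\
  &= \arg\min_{x}\; \Bigl[\tfrac{1}{2}\,\|y - Hx\|_2^2 + f(x)\Bigr]
\end{aligned}
```

The last step holds because dividing the objective by the positive constant
γ_w does not change the minimizer.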
In these methods, even when a many-layer neural network is adopted, the matrix
$H$ is not used during training. In contrast, iterative methods such as
AMP-based or EP-based detectors improve accuracy and complexity by also using
$H$. In particular, AMP algorithms that use the “Onsager correction” in the
construction of deep networks require few layers to reach convergence and
achieve high accuracy [8].
2.2.2 Iterative framework

An approach that can be followed to solve the MIMO detection problem is an
iterative estimation of the transmitted signal. This class of algorithms is
based on a number $T$ of iterations, each comprising the two steps in (2.13):
\[
\text{general iteration:} \quad
\begin{aligned}
z_t &= \hat{x}_t + A_t (y - H\hat{x}_t) + b_t \\
\hat{x}_{t+1} &= \eta_t(z_t)
\end{aligned}
\tag{2.13}
\]
The first step takes as input $\hat{x}_t$, the current estimate of the
transmitted signal $x$, the channel matrix $H$ and the received signal $y$,
and it computes $z_t$ through a linear transformation. The second step is a
non-linear denoiser that is applied to $z_t$ to produce the new estimate
$\hat{x}_{t+1}$ of $x$, which is used in the first step of the next iteration.
The goal of each iteration is to improve the estimate $\hat{x}_t$ of $x$ with
respect to the previous iterations. For the first iteration, $\hat{x}_0 = 0$.
The term $y - H\hat{x}_t$ is called the residual term. The denoiser
$\eta_t(\cdot)$ can be any non-linear function, but usually it applies the
same thresholding function to each element. An element-wise $\eta$ function
reduces the complexity of the denoiser. The parameters required by the
denoising function are usually denoted by $\sigma_t$, and they can change at
each iteration. A common choice for the denoising function is the minimizer of
$E[\|\hat{x} - x\|^2 \,|\, z_t]$, which is given by $\eta_t(z_t) = E[x|z_t]$.
Optimal denoiser for Gaussian noise: if the noise at the input of the
denoiser, $z_t - x$, is i.i.d. Gaussian with diagonal covariance matrix
$\sigma_t^2 I_{N_t}$, the element-wise thresholding function derived
from the previous formula is

$$\beta_t^{g}(z; \sigma_t^2) = \frac{1}{Z} \sum_{x_t \in \mathcal{A}} x_t \exp\left(-\frac{\|z - x_t\|^2}{\sigma_t^2}\right) \tag{2.14}$$

where $Z = \sum_{x_t \in \mathcal{A}} \exp\left(-\frac{\|z - x_t\|^2}{\sigma_t^2}\right)$. In this case $\sigma_t$ represents the standard
deviation of the Gaussian noise.
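The denoiser in (2.14) can be sketched in a few lines of Python for a single complex sample. This is only an illustrative sketch: the function name `gaussian_denoiser`, the QPSK alphabet and the noise levels are assumptions made for the example, and the subtraction of the smallest distance is a numerical-stability trick that is not part of the formula (it cancels between the numerator and $Z$).

```python
import math

def gaussian_denoiser(z, sigma2, constellation):
    """Element-wise denoiser of (2.14) for one complex sample z."""
    # Squared distances from z to every constellation point.
    d2 = [abs(z - s) ** 2 for s in constellation]
    # Shift by the smallest distance before exponentiating; the common
    # factor cancels between the numerator and the normalizer Z.
    d_min = min(d2)
    w = [math.exp(-(d - d_min) / sigma2) for d in d2]
    Z = sum(w)
    return sum(wi * s for wi, s in zip(w, constellation)) / Z

# Hypothetical QPSK alphabet with unit average energy.
QPSK = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]

# With little noise the output snaps to the nearest symbol;
# with a lot of noise it shrinks towards the mean of the alphabet (zero).
x_small_noise = gaussian_denoiser(0.6 + 0.8j, 0.01, QPSK)
x_large_noise = gaussian_denoiser(0.6 + 0.8j, 100.0, QPSK)
```

The two calls illustrate the role of $\sigma_t^2$: it controls how strongly the denoiser trusts its input compared with the constellation prior.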
2.2.3 Linear MIMO detection techniques

One of the simplest methods to approximately solve the MIMO detection problem
is to relax the constraint $x \in \mathcal{A}^{N_t}$ to $x \in \mathbb{C}^{N_t}$, and then
round the solution found to one of the points of the constellation. This
linear model is expressed in (2.15):

$$\text{linear:}\quad \begin{cases} z = \arg\min_{x \in \mathbb{C}^{N_t}} \|y - Hx\|^2 = H^{+} y \\ \hat{x} = \arg\min_{x \in \mathcal{A}^{N_t}} \|x - z\|^2 \end{cases} \tag{2.15}$$
If each component of $z$ is rounded to the closest point of the constellation,
the resulting algorithm is ZF, which corresponds to a single step of the general
iteration (2.13) with $\hat{x}_0 = 0$, $A_0 = H^{+}$, $b_0 = 0$. The matched filter and MMSE are also
single-step linear detectors, with $A_0 = H^H$ and $A_0 = (H^H H + \sigma^2 I_{N_t})^{-1} H^H$
respectively. The advantage of linear detectors is their low complexity, but their
performance is much worse than that of the ML detector. Linear detectors can
also be run over multiple iterations using gradient descent. The
gradient of the objective in the first step of (2.15) is $-2H^H (y - Hx)$; therefore, if
$A_t = 2\alpha H^H$ and $b_t = 0$, the linear step of the general iteration (2.13)
is equal to the minimization of $\|y - Hx\|^2$ using gradient descent
with step size $\alpha$.
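The single-step ZF and MMSE detectors can be sketched with NumPy as follows. The function names, the QPSK alphabet, the system size and the noise level are assumptions chosen only to make the example self-contained; they are not the configuration used in the thesis.

```python
import numpy as np

def nearest_symbols(z, constellation):
    """Round every entry of z to the closest constellation point."""
    idx = np.argmin(np.abs(z[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]

def zf_detect(y, H, constellation):
    # Zero forcing: z = H^+ y, followed by element-wise rounding.
    z = np.linalg.pinv(H) @ y
    return nearest_symbols(z, constellation)

def mmse_detect(y, H, sigma2, constellation):
    # MMSE: z = (H^H H + sigma^2 I)^{-1} H^H y, followed by rounding.
    Nt = H.shape[1]
    G = H.conj().T @ H + sigma2 * np.eye(Nt)
    z = np.linalg.solve(G, H.conj().T @ y)
    return nearest_symbols(z, constellation)

# Toy scenario (assumed): 16 receive antennas, 4 transmitters, high SNR.
rng = np.random.default_rng(0)
Nr, Nt, sigma2 = 16, 4, 1e-4
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2 * Nr)
x = qpsk[rng.integers(0, 4, size=Nt)]
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr))
y = H @ x + noise

x_zf = zf_detect(y, H, qpsk)
x_mmse = mmse_detect(y, H, sigma2, qpsk)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is usually preferable numerically.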
2.2.4 ISTA
One simple approach to solve the MIMO detection problem is ISTA. This
algorithm performs, for $T$ iterations, the two steps

$$\text{ISTA:}\quad \begin{cases} v_t = y - H\hat{x}_t \\ \hat{x}_{t+1} = \eta_{st}(\hat{x}_t + \beta H^T v_t; \lambda) \end{cases} \tag{2.16}$$

where $t = 0, 1, 2, \dots$, $\hat{x}_0 = 0$, $\beta$ is a step size, $v_t$ is called the iteration-$t$ residual
measurement error, and $\eta_{st}(\cdot; \lambda) : \mathbb{R}^{N_t} \to \mathbb{R}^{N_t}$ is the “soft-thresholding”
shrinkage function. This function acts component-wise and is
defined as

$$[\eta_{st}(r; \lambda)]_j = \mathrm{sgn}(r_j)\max\{|r_j| - \lambda, 0\} \tag{2.17}$$

where $\lambda$ is a positive tunable parameter that controls the
trade-off between sparsity and measurement fidelity in $\hat{x}$ [8].
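The two ISTA steps in (2.16) and the shrinkage function (2.17) translate almost directly into code. The sketch below is real-valued, as in the equations; the sparse test vector, the matrix size and the values of `lam` and `beta` are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(r, lam):
    """Component-wise shrinkage function of (2.17)."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def ista(y, H, lam, beta, T):
    """Run T iterations of (2.16), starting from x_hat_0 = 0."""
    x_hat = np.zeros(H.shape[1])
    for _ in range(T):
        v = y - H @ x_hat                                   # residual v_t
        x_hat = soft_threshold(x_hat + beta * H.T @ v, lam)
    return x_hat

# Toy problem (assumed): a 3-sparse vector observed without noise.
rng = np.random.default_rng(0)
Nr, Nt = 40, 20
H = rng.standard_normal((Nr, Nt)) / np.sqrt(Nr)
x = np.zeros(Nt)
x[[2, 7, 11]] = [1.0, -1.0, 1.0]
y = H @ x

beta = 0.9 / np.linalg.norm(H, 2) ** 2   # step size inside (0, 1/||H||_2^2)
x_ista = ista(y, H, lam=0.01, beta=beta, T=200)
```

With a step size in this range the ISTA objective does not increase from one iteration to the next, which is why the final residual is smaller than the initial one.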
2.2.5 FISTA
A problem of ISTA is that, although it is guaranteed to converge for
$\beta \in (0, 1/\|H\|_2^2)$, its convergence is slow. Among the several modifications that have
been proposed to solve this problem, the best known is Fast ISTA (FISTA)
[8]. To speed up the algorithm, the update of $\hat{x}$ is
modified as follows:

$$\hat{x}_{t+1} = \eta_{st}\left(\hat{x}_t + \beta H^T v_t + \frac{t-2}{t+1}(\hat{x}_t - \hat{x}_{t-1}); \lambda\right) \tag{2.18}$$

Thanks to this modification, the number of iterations needed is reduced by about
an order of magnitude.
2.2.6 Approximate Message Passing (AMP)

Another approach for approximately solving the MIMO detection problem is
Belief Propagation (BP), where the problem is represented as a bipartite graph
[34]. On this type of graph, BP requires $O(N_r N_t)$ update messages
per iteration, which is not feasible for large MIMO systems. To
overcome this limit, Jeon et al. [12] propose Approximate Message Passing (AMP)
for solving the MIMO detection problem in the i.i.d. Gaussian scenario with lower
complexity: AMP uses $O(N_r + N_t)$ messages per iteration. The
AMP algorithm performs the steps

$$\text{AMP:}\quad \begin{cases} z_t = \hat{x}_t + H^H (y - H\hat{x}_t) + b_t \\ b_t = \alpha_t \left(H^H (y - H\hat{x}_{t-1}) + b_{t-1}\right) \\ \hat{x}_{t+1} = \eta_t(z_t; \sigma_t) \end{cases} \tag{2.19}$$

AMP is an iterative algorithm that uses $A_t = H^H$ and a term $b_t$
called the Onsager correction term. Both $\sigma_t$ and $\alpha_t$ can be computed from the SNR
and system parameters such as the dimensions of the system or the constellation.
The denoising function is the optimal one described in (2.14), applied to
each element of $z_t$. AMP is asymptotically optimal for large i.i.d. Gaussian
channel matrices [35].
The AMP algorithm can also be defined in another way, as described in
(2.20):

$$\begin{aligned} v_t &= y - H\hat{x}_t + b_t v_{t-1} \\ \hat{x}_{t+1} &= \eta_{st}(\hat{x}_t + H^T v_t; \lambda_t) \end{aligned} \tag{2.20}$$

where $\hat{x}_0 = 0$, $v_{-1} = 0$, $t = 0, 1, 2, \dots$, and

$$b_t = \frac{1}{N_r}\|\hat{x}_t\|_0, \qquad \lambda_t = \frac{\alpha}{\sqrt{N_r}}\|v_t\|_2$$

with $\alpha$ a tunable parameter. There are two main differences between ISTA
and AMP: AMP introduces the Onsager correction term $b_t v_{t-1}$ in the computation of
the residual $v_t$, and it uses the iteration-dependent threshold $\lambda_t$ instead of $\lambda$
in the shrinkage function used to estimate $\hat{x}_{t+1}$.
The AMP algorithm can also be written by computing $\hat{x}_{t+1} =
\eta(\hat{x}_t + H^T v_t; \sigma_t, \theta_t)$ and writing the Onsager term as

$$b_{t+1} = \frac{1}{N_r} \sum_{j=1}^{N_t} \left.\frac{\partial [\eta(r; \sigma_t, \theta_t)]_j}{\partial r_j}\right|_{r = \hat{x}_t + H^T v_t}, \qquad \sigma_t^2 = \frac{1}{N_r}\|v_t\|_2^2 .$$

This way of expressing AMP is equal to the previous one when
$\eta(r_t; \sigma_t, \alpha) = \eta_{st}(r_t; \alpha\sigma_t)$ with $\theta_t = \alpha$. When H is i.i.d. Gaussian, the input to
the shrinkage function, $r_t = \hat{x}_t + H^T v_t$, can be written as $r_t = x + \mathcal{N}(0, \sigma_t^2 I_{N_t})$,
where $\sigma_t^2$ is defined as before. This means that the Onsager correction
guarantees that the input $r_t$ to the shrinkage function is an AWGN-corrupted
version of the original transmitted signal with a known noise variance $\sigma_t^2$.
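The soft-thresholding form of AMP in (2.20) differs from ISTA only by the Onsager term and the adaptive threshold, which the following sketch makes explicit. The sparse test signal, the matrix dimensions, the noise level and the value `alpha=1.14` are assumptions chosen for illustration; the matrix is drawn i.i.d. Gaussian because that is the regime in which AMP is expected to behave well.

```python
import numpy as np

def soft_threshold(r, lam):
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def amp(y, H, alpha, T):
    """Soft-thresholding AMP of (2.20), real-valued version."""
    Nr, Nt = H.shape
    x_hat = np.zeros(Nt)          # x_hat_0 = 0
    v = np.zeros(Nr)              # v_{-1} = 0
    for _ in range(T):
        b = np.count_nonzero(x_hat) / Nr               # b_t = ||x_hat_t||_0 / N_r
        v = y - H @ x_hat + b * v                      # residual with Onsager correction
        lam = alpha / np.sqrt(Nr) * np.linalg.norm(v)  # adaptive threshold lambda_t
        x_hat = soft_threshold(x_hat + H.T @ v, lam)
    return x_hat

# Toy problem (assumed): sparse +-1 signal, i.i.d. Gaussian matrix, small noise.
rng = np.random.default_rng(1)
Nr, Nt, k = 400, 500, 20
H = rng.standard_normal((Nr, Nt)) / np.sqrt(Nr)
x = np.zeros(Nt)
x[rng.choice(Nt, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
y = H @ x + 0.01 * rng.standard_normal(Nr)

x_amp = amp(y, H, alpha=1.14, T=30)
rel_err = np.linalg.norm(x_amp - x) / np.linalg.norm(x)
```

Setting `b = 0` in the loop removes the Onsager correction, which makes it easy to compare the two behaviours on the same data.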
2.2.7 OAMP
A variant of AMP that relaxes the i.i.d. Gaussian channel assumption is
Orthogonal AMP (OAMP), which works for unitarily invariant channel
matrices [36]. OAMP is an optimal estimator in terms of MSE with excellent
convergence properties [37]. The principle of OAMP is to decouple the
posterior probability $p(x \mid y, \hat{H})$ into a series of probabilities $p(x_i \mid y, \hat{H})$, $i = 1, 2, \dots, N_t$,
in an iterative way. The OAMP detector can be written as Algorithm 1 [38].

Algorithm 1 OAMP
Require: received signal $y$, channel matrix $H$, noise covariance matrix $R_{\hat{n}\hat{n}}$
Output: Estimated signal $\hat{x}_{T+1}$
Initialize: $\tau_0 \leftarrow 1$, $\hat{x}_0 \leftarrow 0$
for $t = 1, \dots, T - 1$ do
    $r_t = \hat{x}_t + W_t (y - H\hat{x}_t)$
    $\hat{x}_{t+1} = E\{x \mid r_t, \tau_t\}$
    $v_t^2 = \dfrac{\|y - H\hat{x}_t\|_2^2 - \mathrm{tr}(R_{\hat{n}\hat{n}})}{\mathrm{tr}(H^H H)}$
    $\tau_t^2 = \dfrac{1}{N_t}\mathrm{tr}(B_t B_t^H) v_t^2 + \dfrac{1}{N_t}\mathrm{tr}(W_t R_{\hat{n}\hat{n}} W_t^H)$
end for
Return $\hat{x}_T$
The algorithm is divided into two modules: the linear estimator, used to
compute $r_t$, and the non-linear estimator, used to compute $\hat{x}_{t+1}$. The quantities $v_t^2$ and $\tau_t^2$ are
the average variances of $q_t = \hat{x}_t - x$ and $p_t = r_t - x$ respectively. The vector
$p_t$ is used to measure the accuracy of the output of the linear estimator, while
$q_t$ is used for the non-linear estimator. $v_t^2$ and $\tau_t^2$ are defined as

$$v_t^2 = \frac{E[\|q_t\|_2^2]}{N_t}, \qquad \tau_t^2 = \frac{E[\|p_t\|_2^2]}{N_t} \tag{2.21}$$
The linear estimator uses the matrix $W_t$, which can take different
forms, such as the transpose of $\hat{H}$, its pseudo-inverse or the LMMSE matrix,
but the optimal choice is

$$W_t = \frac{N_t}{\mathrm{tr}(\hat{W}_t H)} \hat{W}_t \tag{2.22}$$

where $\hat{W}_t$ is the LMMSE matrix, defined as

$$\hat{W}_t = v_t^2 H^H (v_t^2 H H^H + R_{\hat{n}\hat{n}})^{-1} \tag{2.23}$$

where $R_{\hat{n}\hat{n}}$ is the covariance matrix of the noise $\hat{n}$ seen by the signal detector. The
matrix $W_t$ is de-correlated when $\mathrm{tr}(B_t) = 0$, where $B_t = I - W_t H$ ($p_t$ is
uncorrelated with $x$, and its entries are mutually uncorrelated with zero mean and identical
variance).
The non-linear estimator is instead the MMSE estimate of $x$, which is related
to $r_t$ through

$$r_t = x + w_t \tag{2.24}$$

where $w_t \sim \mathcal{N}_{\mathbb{C}}(0, \tau_t^2 I)$. Because the values of $x$ are taken from a
constellation of symbols and the estimation is based on MMSE, $\hat{x}$ is
estimated as

$$\hat{x}_{t+1} = E\{x_i \mid r_i, \tau_t\} = \frac{\sum_{s_i} s_i\, \mathcal{N}_{\mathbb{C}}(s_i; r_i, \tau_t^2)\, p(s_i)}{\sum_{s_i} \mathcal{N}_{\mathbb{C}}(s_i; r_i, \tau_t^2)\, p(s_i)} \tag{2.25}$$

where $p(s_i)$ is the prior distribution of the symbol $x_i$, defined as
$p(x_i) = \sum_{j \in N_r} \frac{1}{\sqrt{N_r}} \delta(x_i - s_j)$.
It is important to notice that the prior mean of the MMSE estimator is $r_t$ and
its variance is $\tau_t^2$, and that they control the accuracy and convergence of $\hat{x}_{t+1}$. OAMP
can also be formulated as follows:

$$\text{OAMP:}\quad \begin{cases} z_t = \hat{x}_t + \gamma_t H^H (v_t^2 H H^H + \sigma^2 I)^{-1} (y - H\hat{x}_t) \\ \hat{x}_{t+1} = \eta_t(z_t; \sigma_t^2) \end{cases} \tag{2.26}$$

where $\gamma_t = N_t / \mathrm{tr}(v_t^2 H^H (v_t^2 H H^H + \sigma^2 I)^{-1} H)$ is a normalizing factor, while
$v_t^2$ can be derived from the SNR and system dimensions and is proportional to
the average noise power at the denoiser output at iteration $t$.
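The OAMP detector can be sketched as below, using (2.22) and (2.23) for $W_t$ and the variance estimates of Algorithm 1. Several simplifications are assumptions made for this example: the model is real-valued with BPSK symbols, the noise covariance is $\sigma^2 I$, $v_t^2$ is computed before $W_t$ (which depends on it) and is clipped at a small positive constant, and all names are hypothetical.

```python
import numpy as np

def posterior_mean(r, tau2, symbols):
    """MMSE estimate as in (2.25), real-valued, uniform prior over `symbols`."""
    d2 = (r[:, None] - symbols[None, :]) ** 2
    # Shift by the row-wise minimum for numerical stability (cancels in the ratio).
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * tau2))
    return (w * symbols).sum(axis=1) / w.sum(axis=1)

def oamp(y, H, sigma2, symbols, T):
    Nr, Nt = H.shape
    R = sigma2 * np.eye(Nr)                 # assumed noise covariance
    x_hat = np.zeros(Nt)
    for _ in range(T):
        resid = y - H @ x_hat
        # v_t^2 from the residual, clipped to stay positive (assumption).
        v2 = max((resid @ resid - np.trace(R)) / np.trace(H.T @ H), 1e-9)
        W_lmmse = v2 * H.T @ np.linalg.inv(v2 * H @ H.T + R)   # (2.23)
        W = Nt / np.trace(W_lmmse @ H) * W_lmmse               # (2.22)
        B = np.eye(Nt) - W @ H
        tau2 = (np.trace(B @ B.T) * v2 + np.trace(W @ R @ W.T)) / Nt
        r = x_hat + W @ resid                                  # linear estimator
        x_hat = posterior_mean(r, tau2, symbols)               # non-linear estimator
    return x_hat

# Toy scenario (assumed): 32 x 16 real-valued system with BPSK symbols.
rng = np.random.default_rng(0)
Nr, Nt, sigma2 = 32, 16, 0.01
bpsk = np.array([-1.0, 1.0])
H = rng.standard_normal((Nr, Nt)) / np.sqrt(Nr)
x = rng.choice(bpsk, size=Nt)
y = H @ x + np.sqrt(sigma2) * rng.standard_normal(Nr)

x_oamp = oamp(y, H, sigma2, bpsk, T=10)
```

The explicit inverse of an $N_r \times N_r$ matrix at every iteration is the main cost of this detector, which is the complexity drawback mentioned later for OAMPNet.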
2.2.8 Vector Approximate Message Passing

The AMP algorithm is fragile: it works well
with i.i.d. Gaussian matrices, but even a small deviation from this class of
channel matrices can cause the algorithm to diverge [39]. In [33] the Vector
Approximate Message Passing (VAMP) algorithm is proposed, and it is shown
to hold under a larger class of channel matrices, namely those that are
right-orthogonally invariant. Moreover, it keeps the desirable properties of
AMP, such as low per-iteration complexity, convergence in few iterations, and
shrinkage inputs $r_t$ that can be modelled through the AWGN model [8]. The
VAMP algorithm is based on the “economy” Singular Value Decomposition
(SVD) of the channel matrix, $H = \bar{U} \mathrm{diag}(\bar{s}) \bar{V}^T$, where $\bar{s} \in \mathbb{R}^{R}$ for
$R := \mathrm{rank}(H) \le \min(N_r, N_t)$.
The iterations of the VAMP algorithm consist of matrix-vector multiplications
with $\bar{V} \in \mathbb{R}^{N_t \times R}$ and $\bar{V}^T$, and keep a structure similar to that of AMP.
As for the computational cost, once the SVD has been computed,
the cost per iteration is dominated by $O(R N_t)$ floating-point operations. The
SVD is computed only once, not at every iteration, and often it can be computed
online. The cost of the SVD is $O(N_r N_t R)$, or $O(N_r N_t \log R)$ with some
modern approaches. The standard SVD $H = U S V^T$, where
$S \in \mathbb{R}^{N_r \times N_t}$ and both $U$ and $V$ are orthogonal, can also be used when it is not
possible to compute the economy SVD. Algorithm 2 is the SVD form of the
VAMP algorithm.
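In the algorithm, $g_1(r_t, \gamma_t) : \mathbb{R}^{N_t} \to \mathbb{R}^{N_t}$ is defined as
$g_1 = \arg\min_{x \in \mathcal{A}} \left[\frac{\gamma_t}{2}|x - r_t|^2 - \ln p(x)\right]$, where $p(x)$ is the prior distribution of $x$, while
$g_1'(r_t, \gamma_t) = \mathrm{diag}\left[\frac{\partial g_1(r_t, \gamma_t)}{\partial r_t}\right]$, and $\langle\cdot\rangle$ is the empirical averaging operation
$\langle u \rangle := \frac{1}{N_t}\sum_{n=1}^{N_t} u_n$. Moreover, $r_t \in \mathbb{R}^{N_t}$ is called the residual term at the
$t$-th iteration and $\gamma_t$ represents the reciprocal of its variance. Finally, $R$ is defined as
$R = \mathrm{rank}(H)$.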
As can be seen, the VAMP algorithm is very similar to AMP
[40]. The denoising and divergence steps are identical, and
VAMP also includes an Onsager term $\alpha_t r_t$. The per-iteration
computational cost is also similar, since both algorithms are dominated by two
matrix-vector multiplications, which for AMP involve $H$ and $H^T$ while for
VAMP they involve $\bar{V}$ and $\bar{V}^T$. Finally, another similarity, not visible from
a comparison of the algorithms, is that for both algorithms, for certain
large random $H$, $r_t$ behaves like a version of $x$ corrupted by white Gaussian noise,
$r_t = x + \mathcal{N}(0, \tau_t I)$ for some variance $\tau_t > 0$. Moreover, $\gamma_t$ can be seen as
an estimate of $\tau_t^{-1}$, as in AMP. As said before, VAMP holds under a class
of channel matrices $H$ that is larger than that of AMP: it holds for
large random $H$ whose right singular vector matrix $V \in \mathbb{R}^{N_t \times N_t}$ is uniformly
distributed on the group of orthogonal matrices, with arbitrary left
singular vector matrices $U$ and singular values. AMP instead requires large
i.i.d. Gaussian matrices $H$, which have random orthogonal $U$ and $V$ and a particular
distribution of the singular values of $H$.

Algorithm 2 VAMP in SVD form
Require: received signal $y$, channel matrix $H$, denoiser function $g_1(\cdot, \gamma_t)$,
noise precision $\gamma_w \ge 0$, number of iterations $T$, $r_0 \ge 0$, $\gamma_0 \ge 0$
Output: Estimated signal $\hat{x}_T$
Compute economy SVD $H = \bar{U} \mathrm{diag}(\bar{s}) \bar{V}^T$
Compute preconditioned $\tilde{y} = \mathrm{diag}(\bar{s})^{-1} \bar{U}^T y$
for $t = 0, 1, \dots, T$ do
    $\hat{x}_t = g_1(r_t, \gamma_t)$
    $\alpha_t = \langle g_1'(r_t, \gamma_t) \rangle$
    $\tilde{r}_t = (\hat{x}_t - \alpha_t r_t)/(1 - \alpha_t)$
    $\tilde{\gamma}_t = \gamma_t (1 - \alpha_t)/\alpha_t$
    $d_t = \gamma_w \mathrm{diag}(\gamma_w \bar{s}^2 + \tilde{\gamma}_t \mathbf{1})^{-1} \bar{s}^2$
    $\gamma_{t+1} = \tilde{\gamma}_t \langle d_t \rangle / (\frac{N_t}{R} - \langle d_t \rangle)$
    $r_{t+1} = \tilde{r}_t + \frac{N_t}{R} \bar{V} \mathrm{diag}(d_t / \langle d_t \rangle)(\tilde{y} - \bar{V}^T \tilde{r}_t)$
end for
Return $\hat{x}_T$
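The SVD form of VAMP can be sketched as follows. To keep the example short, it makes several assumptions that are not part of Algorithm 2: the symbols are real BPSK, the denoiser $g_1$ is the posterior mean $\tanh(\gamma r)$ rather than the arg-min form given above, $H$ is assumed to have full rank, and $\alpha_t$ is clipped away from 0 and 1 to avoid divisions by zero when the denoiser saturates.

```python
import numpy as np

def vamp_svd_bpsk(y, H, gamma_w, T, gamma0=1e-3):
    """Sketch of VAMP in SVD form with a BPSK posterior-mean denoiser."""
    Nt = H.shape[1]
    U, s, Vt = np.linalg.svd(H, full_matrices=False)   # economy SVD: H = U diag(s) Vt
    V = Vt.T
    R = s.size                                         # assumes H has full rank
    y_tilde = (U.T @ y) / s                            # preconditioned observation
    r, gamma = np.zeros(Nt), gamma0
    for _ in range(T + 1):                             # t = 0, 1, ..., T
        x_hat = np.tanh(gamma * r)                     # g1: E[x | r] for x in {-1, +1}
        alpha = np.mean(gamma * (1.0 - x_hat ** 2))    # <g1'(r, gamma)>
        alpha = np.clip(alpha, 1e-6, 1.0 - 1e-6)       # numerical guard (assumption)
        r_tilde = (x_hat - alpha * r) / (1.0 - alpha)
        gamma_tilde = gamma * (1.0 - alpha) / alpha
        d = gamma_w * s ** 2 / (gamma_w * s ** 2 + gamma_tilde)
        d_mean = np.mean(d)
        gamma = gamma_tilde * d_mean / (Nt / R - d_mean)
        r = r_tilde + (Nt / R) * V @ ((d / d_mean) * (y_tilde - V.T @ r_tilde))
    return x_hat

# Toy scenario (assumed): 32 x 16 real-valued system with BPSK symbols.
rng = np.random.default_rng(0)
Nr, Nt, sigma2 = 32, 16, 0.01
H = rng.standard_normal((Nr, Nt)) / np.sqrt(Nr)
x = rng.choice([-1.0, 1.0], size=Nt)
y = H @ x + np.sqrt(sigma2) * rng.standard_normal(Nr)

x_vamp = vamp_svd_bpsk(y, H, gamma_w=1.0 / sigma2, T=10)
```

After the single SVD, each iteration only needs products with `V` and `V.T` plus element-wise operations, which matches the per-iteration cost discussed above.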
There is also another way to write the VAMP algorithm without using
the SVD form. In this second variant, the linear MMSE estimate and the trace of the
covariance matrix must be computed at each iteration, which involves the inverse of
an $N_t \times N_t$ matrix. This version of the VAMP algorithm is derived from an EP-like
approximation of the sum-product belief propagation algorithm. Unlike
AMP, which uses a loopy factor graph with scalar-valued nodes, VAMP uses
a non-loopy graph with vector-valued nodes, which is the reason for the name
Vector AMP. The second version of the VAMP algorithm is called the LMMSE
form and follows the steps of Algorithm 3:
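Algorithm 3 VAMP in LMMSE form
Require: LMMSE estimator $g_2(r_{2t}, \gamma_{2t})$, denoiser function $g_1(\cdot, \gamma_{1t})$, number
of iterations $T$, $r_{10}$ and $\gamma_{10} \ge 0$
Output: Estimated signal $\hat{x}_T$
for $t = 0, 1, \dots, T$ do
    Denoising
    $\hat{x}_{1t} = g_1(r_{1t}, \gamma_{1t})$
    $\alpha_{1t} = \langle g_1'(r_{1t}, \gamma_{1t}) \rangle$
    $\eta_{1t} = \gamma_{1t} / \alpha_{1t}$
    $\gamma_{2t} = \eta_{1t} - \gamma_{1t}$
    $r_{2t} = (\eta_{1t} \hat{x}_{1t} - \gamma_{1t} r_{1t}) / \gamma_{2t}$
    LMMSE
    $\hat{x}_{2t} = g_2(r_{2t}, \gamma_{2t})$
    $\alpha_{2t} = \langle g_2'(r_{2t}, \gamma_{2t}) \rangle$
    $\eta_{2t} = \gamma_{2t} / \alpha_{2t}$
    $\gamma_{1,t+1} = \eta_{2t} - \gamma_{2t}$
    $r_{1,t+1} = (\eta_{2t} \hat{x}_{2t} - \gamma_{2t} r_{2t}) / \gamma_{1,t+1}$
end for
Return $\hat{x}_{1T}$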
where $g_2(r_{2t}, \gamma_{2t}) = (\gamma_w H^T H + \gamma_{2t} I)^{-1} (\gamma_w H^T y + \gamma_{2t} r_{2t})$ is an MMSE
estimate, linear in $r_{2t}$ (which is why this is called the LMMSE form), of a random
vector $x_2$ under the likelihood $\mathcal{N}(y; H x_2, \gamma_w^{-1} I)$ and the prior $\mathcal{N}(r_{2t}, \gamma_{2t}^{-1} I)$.
The averaged derivative is
$\langle g_2'(r_{2t}, \gamma_{2t}) \rangle = \frac{\gamma_{2t}}{N_t} \mathrm{tr}[(\gamma_w H^T H + \gamma_{2t} I)^{-1}]$.
The functions $g_1$ and $g_1'$ are defined as in Algorithm 2.
The algorithm in SVD form and the one in LMMSE form are equivalent:
this can be seen by substituting the economy SVD $H = \bar{U} \mathrm{diag}(\bar{s}) \bar{V}^T$ into the LMMSE form and
equating $\hat{x}_t = \hat{x}_{1t}$, $r_t = r_{1t}$, $\gamma_t = \gamma_{1t}$, $\tilde{\gamma}_t = \gamma_{2t}$ and $\alpha_t = \alpha_{1t}$.
The LMMSE form of VAMP is symmetric. The first part of the
algorithm performs denoising on $r_{1t}$ followed by the Onsager correction in $r_{2t}$, while
the second part performs LMMSE estimation on $r_{2t}$ followed by the Onsager correction in
$r_{1,t+1}$. The shrinkage module denoises the pseudo-measurement $r_t = x +
\mathcal{N}(0, \sigma_t^2)$, which is the AWGN-corrupted model that in this case holds
for large right-rotationally invariant $H$. VAMP alternates between two stages
that can be summarized as:
• an MMSE inference of $x$ under the likelihood $\mathcal{N}(y; Hx, \sigma_w^2 I)$ and the pseudo-prior
$\mathcal{N}(x; \tilde{r}_t, \tilde{\sigma}_t^2 I)$
• an MMSE inference of $x$ under the pseudo-likelihood $\mathcal{N}(r_t; x, \sigma_t^2 I)$ and the prior
$x \sim p(x)$.
The OAMP-LMMSE algorithm in [14] is similar to VAMP, but the two differ
in the approximation of certain variance terms and in the reliance on matrix
inversion.
2.2.9 Other MIMO detection techniques

Other interesting techniques are Semi-Definite Relaxation (SDR), which converts
the problem into a semi-definite one; sphere decoding, which searches for
solutions $\hat{x}$ such that $\|y - H\hat{x}\|^2 \le r$; and V-BLAST [41], which performs many
linear detection iterations, detecting the strongest symbols and then subtracting
their interference from the observation $y$. All these techniques are too
complex to be implemented for large-scale MIMO systems. PIC [42] tries to
address this problem by jointly detecting all the transmitted symbols and then, for
each transmitter, trying to create an interference-free channel by removing
the symbols coming from the other transmitters. However, the performance of
this model is unsatisfactory on real and large MIMO datasets. The limitations
of these approaches have shifted the focus to learning-based approaches.
2.3 Deep learning

Deep learning is a subset of machine learning that, from a dataset composed of
(feature, label) pairs $\{(y^{(d)}, x^{(d)})\}_{d=1}^{D}$, where $D$ is the number of pairs,
tries to learn the parameters of an Artificial Neural Network (ANN) with the aim
of predicting the unknown label $\hat{x}$ associated with new data $y$. The network takes
$y$ as input and processes it through many layers, where each layer is composed of a
linear transformation and a component-wise non-linearity.
A Multilayer Perceptron (MLP) is a type of ANN that concatenates $T$ basic
blocks, also called layers:

$$V_t \in \mathbb{R}^{M_t \times M_{t-1}}, \qquad y_t = \Omega(V_t y_{t-1}) : \mathbb{R}^{M_{t-1}} \to \mathbb{R}^{M_t} \tag{2.27}$$

where $V_t$ is a linear operator, $\Omega$ a non-linear function, and $M_t$ the dimension
of the vector in the $t$-th block. To predict $\hat{x}$, a network is built as
the concatenation of $T$ layers:

$$\hat{x} = g(y) = g(y_0) = \Omega(V_T \Omega(V_{T-1} \cdots \Omega(V_1 y_0) \cdots)) \tag{2.28}$$
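The composition in (2.28) is simply a loop over the layers. In the sketch below the choice of `np.tanh` for $\Omega$, the layer sizes and the random weights are assumptions made for illustration.

```python
import numpy as np

def mlp_forward(y0, weights, omega=np.tanh):
    """Apply y_t = omega(V_t y_{t-1}) for t = 1, ..., T, as in (2.27)-(2.28)."""
    y = y0
    for V in weights:
        y = omega(V @ y)
    return y

# Hypothetical two-layer network with dimensions M_0 = 8, M_1 = 16, M_2 = 4.
rng = np.random.default_rng(0)
dims = [8, 16, 4]
weights = [rng.standard_normal((dims[t + 1], dims[t])) for t in range(len(dims) - 1)]
y0 = rng.standard_normal(dims[0])

x_hat = mlp_forward(y0, weights)
```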

During the training phase, the values of $V_1, \dots, V_T$ are learnt with the goal of
reducing the prediction error on the labels in the training data. To quantify
the prediction error, a loss function $L$ must be defined, which turns the
problem into the minimization problem

$$\arg\min_{V_1, \dots, V_T} L([x^{(1)}, \dots, x^{(D)}], [\hat{x}^{(1)}, \dots, \hat{x}^{(D)}]).$$
In the thesis, two steps are needed to obtain the values of the parameters
$V_1, \dots, V_T$ that minimize the loss function $L$. First, the gradients of
$L$ with respect to the parameters are computed. Second, the parameters are
updated through gradient descent, in this thesis using the Adam
optimizer as the update rule [3].
Deep learning has succeeded in computer vision, automatic speech recognition
and natural language processing. Recently, researchers have started to apply it
to physical layer communications as well, and therefore also to MIMO detection.
Deep learning is usually applied to MIMO detection by unfolding an iterative
MIMO detection algorithm and adding some trainable parameters. The number
of parameters is small, so the training phase is fast and can be done
with a small dataset.
There are several advantages of using deep learning in MIMO detection.
First, deep learning can significantly increase the speed of convergence with
respect to the traditional iterative algorithms. Second, DL methods can
decrease the average recovery error with respect to the iterative version, because
they do not need to model the problem but learn a mapping from the
input to the output directly [43].
In [5] the authors investigate a joint channel estimation and signal detection
(JCESD) model, where the detector takes as input the channel estimation error
and the channel statistics, while the channel estimator uses
the detected data and its error. This model outperforms the iterative algorithm
that inspired it as well as other DL-based detectors, while also achieving better
robustness [15].
2.3.1 LISTA
To improve ISTA, a deep learning approach based on it, called LISTA, has been
proposed. LISTA unfolds the $T$ iterations of ISTA into a $T$-layer
feed-forward neural network, adding some trainable parameters. To
estimate $\hat{x}_{t+1}$, LISTA computes

$$\hat{x}_{t+1} = \eta_{st}(S\hat{x}_t + By; \lambda) \tag{2.29}$$

with $B = \beta H^T$ and $S = I_{N_t} - BH$. Instead of using a single $\lambda$ for all
the iterations, LISTA uses a layer-dependent threshold $\lambda = [\lambda_1, \lambda_2, \dots, \lambda_T]$.
LISTA learns $\lambda$, $B$ and $S$ from the training data $\{(y^{(d)}, x^{(d)})\}_{d=1}^{D}$ by
minimizing the quadratic loss

$$L_T(\Theta) = \frac{1}{D} \sum_{d=1}^{D} \|\hat{x}_T(y^{(d)}; \Theta) - x^{(d)}\|_2^2 \tag{2.30}$$

where $\Theta = [B, S, \lambda]$ is the set of trainable parameters and $\hat{x}_T(y^{(d)}; \Theta)$ is the
output of the $T$-layer network that takes as input $y^{(d)}$ and $\Theta$.
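The forward pass of (2.29) can be sketched as below. Before any training, initializing $B = \beta H^T$, $S = I - BH$ and using the same threshold in every layer reproduces ISTA exactly, which the example checks numerically. The training loop itself is not shown, and all sizes and values are assumptions for illustration.

```python
import numpy as np

def soft_threshold(r, lam):
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def lista_forward(y, B, S, lams):
    """Forward pass of a LISTA network, one layer per entry of `lams` (2.29)."""
    x_hat = np.zeros(S.shape[0])
    for lam in lams:
        x_hat = soft_threshold(S @ x_hat + B @ y, lam)
    return x_hat

def ista(y, H, lam, beta, T):
    x_hat = np.zeros(H.shape[1])
    for _ in range(T):
        x_hat = soft_threshold(x_hat + beta * H.T @ (y - H @ x_hat), lam)
    return x_hat

rng = np.random.default_rng(0)
Nr, Nt, T = 30, 15, 5
H = rng.standard_normal((Nr, Nt)) / np.sqrt(Nr)
y = rng.standard_normal(Nr)
beta, lam = 0.3, 0.05

# Untrained initialization taken from ISTA: B = beta H^T, S = I - B H.
B = beta * H.T
S = np.eye(Nt) - B @ H
x_lista = lista_forward(y, B, S, [lam] * T)
x_ista = ista(y, H, lam, beta, T)
```

Training would then treat `B`, `S` and the per-layer thresholds as free parameters and minimize the loss (2.30) over them.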
2.3.2 LAMP
Similarly to LISTA, by unfolding the iterations of AMP, a neural network
named LAMP can be constructed. Since AMP requires fewer iterations than ISTA,
the aim of LAMP is to converge in fewer layers than LISTA. Unlike
LISTA, LAMP includes the Onsager term in the computation of the residual
$v_t$, and LAMP's shrinkage threshold $\lambda_t = \alpha_t \|v_t\|_2 / \sqrt{N_r}$ varies with $v_t$
and changes at each layer. Defining $B_t = \beta_t H^T$ for each layer $t$ with
$t = 0, 1, 2, \dots$, the $t$-th layer of the LAMP network can be expressed as

$$\begin{cases} \hat{x}_{t+1} = \beta_t \eta_{st}\left(\hat{x}_t + B_t v_t; \frac{\alpha_t}{\sqrt{N_r}} \|v_t\|_2\right) \\ v_{t+1} = y - H\hat{x}_{t+1} + \frac{\beta_t}{N_r} \|\hat{x}_{t+1}\|_0 v_t \end{cases} \tag{2.31}$$

with $\hat{x}_0 = 0$, $v_0 = y$ and $\eta_{st}$ defined as in (2.17). The parameters of the network are
therefore $\Theta = \{B_t, \alpha_t, \beta_t\}_{t=0}^{T-1}$. The complete LAMP training procedure,
which includes some steps to avoid bad local minima, is given in Algorithm 4.
Algorithm 4 LAMP
Require: $B = H^T$, $\alpha_0 = 1$, $\beta_0 = 1$
Learn $\Theta_0^{tied} = \{B, \alpha_0\}$
for $t = 1, \dots, T - 1$ do
    $\alpha_t = \alpha_{t-1}$
    $\beta_t = \beta_{t-1}$
    Learn $\{\alpha_t, \beta_t\}$ with fixed $\Theta_{t-1}^{tied}$
    Re-learn $\Theta_t^{tied} = \{B, \{\alpha_i, \beta_i\}_{i=1}^{t}, \alpha_0\}$
end for
After having computed $\{\Theta_t^{tied}\}_{t=1}^{T-1}$:
$B_0 = H^T$, $\alpha_0 = 1$, $\beta_0 = 1$
Learn $\Theta_0^{untied} = \{B_0, \alpha_0\}$
for $t = 1, \dots, T - 1$ do
    $B_t = B_{t-1}$
    $\alpha_t = \alpha_{t-1}$
    $\beta_t = \beta_{t-1}$
    Learn $\{B_t, \alpha_t, \beta_t\}$ with fixed $\Theta_{t-1}^{untied}$
    Set $\Theta_t^{untied} = \{B_i, \alpha_i, \beta_i\}_{i=0}^{t} \setminus \beta_0$
    if $\Theta_t^{tied}$ performs better than $\Theta_t^{untied}$ then
        Replace $\Theta_t^{untied}$ with $\Theta_t^{tied}$ ($B_i = B \;\forall i$)
    end if
    Re-learn $\Theta_t^{untied}$
end for
Return $\Theta_{T-1}^{untied}$

LAMP has been shown to outperform LISTA in MIMO detection.
2.3.3 LVAMP
Following the same approach as LISTA and LAMP, LVAMP is the neural
network that results from unfolding the iterations of VAMP. Like VAMP, the
LVAMP network consists of two modules, each of which can be divided
into two parts. Four steps therefore compose each LVAMP layer:
an LMMSE estimation, a decoupling stage, a shrinkage estimation, and another
identical decoupling stage. The LMMSE stage uses the parameters
$\tilde{\theta}_t = \{U_t, s_t, V_t, \sigma_{wt}^2\}$ at each iteration $t$. When the channel is not i.i.d. Gaussian
and there are correlations, it is important to consider the covariance matrix.
In this case the parameters of the network become $\tilde{\theta}_t = \{G_t, K_t\}$ and the
LMMSE stage is defined as

$$\tilde{\eta}(\tilde{r}_t; \tilde{\sigma}_t, \tilde{\theta}_t) = G_t \tilde{r}_t + K_t y \tag{2.32}$$

where $G_t \in \mathbb{R}^{N_t \times N_t}$ and $K_t \in \mathbb{R}^{N_t \times N_r}$. The shrinkage stage instead has
as parameter $\theta_t$, which is used in the denoising function $\eta(\cdot)$. The
parameters to be learnt are therefore $\{\tilde{\theta}_t, \theta_t\}_{t=0}^{T}$. The training algorithm is similar to
that of LAMP, with $B_t$ replaced by $\tilde{\theta}_t$ and $\{\alpha_t, \beta_t\}$ by $\theta_t$. It is suggested
to initialize $U$, $s$, $V$ with the SVD of $H$ and $\sigma_w^2$ with the average value of
$N_t^{-1} \|y\|^2$. LVAMP is more robust to the matrix $H$ than LAMP and converges
faster than LAMP even with i.i.d. Gaussian channel matrices.
2.3.4 DetNet
Recently, research on MIMO detection has moved towards machine learning and deep
learning approaches. Samuel et al. [44][45] proposed DetNet, a deep learning
network that achieves impressive performance in the i.i.d. Gaussian scenario and
for small-sized MIMO systems. The neural network architecture follows the
steps

$$\text{DetNet:}\quad \begin{cases} q_t = \hat{x}_{t-1} - \theta_t^{(1)} H^H y + \theta_t^{(2)} H^H H \hat{x}_{t-1} \\ u_t = [\Theta_t^{(3)} q_t + \Theta_t^{(4)} v_{t-1} + \theta_t^{(5)}]_+ \\ v_t = \Theta_t^{(6)} u_t + \theta_t^{(7)} \\ \hat{x}_t = \Theta_t^{(8)} u_t + \theta_t^{(9)} \end{cases} \tag{2.33}$$

where $[x]_+ = \max(x, 0)$ is an element-wise function called the ReLU activation
function. In the equations, $\theta$ denotes a vector (except for $\theta^{(1)}$ and $\theta^{(2)}$, which
are scalars), while $\Theta$ denotes a matrix. The performance of DetNet can be promising,
but the architecture has two main issues [46]. First, it is difficult to adapt
the network to spatially correlated channels or higher modulation schemes.
Second, it does not exploit known characteristics of iterative methods, and for this
reason it is unnecessarily complex.
2.3.5 OAMPNet
He et al. [5] instead proposed OAMPNet, a deep learning network based
on orthogonal AMP, which achieves strong performance on both i.i.d. Gaussian
and small-sized Kronecker channel models. The algorithm adds only two
parameters per iteration to the original OAMP. OAMPNet can be
described as

$$\text{OAMPNet:}\quad \begin{cases} z_t = \hat{x}_t + \theta_t^{(1)} H^H (v_t^2 H H^H + \sigma^2 I)^{-1} (y - H\hat{x}_t) \\ \hat{x}_{t+1} = \eta_t(z_t; \sigma_t^2) \end{cases} \tag{2.34}$$

The algorithm uses the optimal Gaussian denoiser, as AMP does. Since
the network is based on OAMP, it relies on the strong assumption that
the system is modelled with unitarily invariant matrices. This has
the advantage of requiring only two trainable parameters, but the performance
degrades on real-world channel models. Moreover, the complexity is higher
than that of AMP because of the matrix inversion that must be computed.
2.3.6 MMNet
In recent years, several machine learning and deep learning based detectors have
been proposed, achieving promising results on Gaussian channel models
but with degraded performance on real-world channel models with
spatial correlation. In [36] the authors propose the MMNet detector, a MIMO
detection scheme based on deep learning and on the theory of iterative
soft-thresholding algorithms. Thanks to a novel training algorithm that leverages
temporal and spectral correlation to accelerate training, MMNet outperforms
existing approaches on realistic channels with the same or lower computational
complexity. MMNet adds the right degree of freedom to the iterative
framework, balancing model flexibility and complexity. On an i.i.d. Gaussian
channel, MMNet achieves the same performance as sub-optimal detectors,
with two orders of magnitude lower complexity than other deep learning approaches.
On a spatially correlated channel, it performs like OAMPNet, which is the
optimal detector for this scenario, with a complexity 10 times lower.
It is also better than a classic linear scheme such as the MMSE detector. The
advantage of MMNet is that the algorithm is trained online, so it
can adapt to different channel models. There are two versions of the MMNet neural
network, one for Gaussian channel matrices and one for arbitrary channels. For
the i.i.d. Gaussian channel, the network has the following architecture:

$$\text{MMNet-iid:}\quad \begin{cases} z_t = \hat{x}_t + \theta_t^{(1)} H^H (y - H\hat{x}_t) \\ \hat{x}_{t+1} = \eta_t(z_t; \sigma_t^2) \end{cases} \tag{2.35}$$
The denoiser is the same as in AMP and OAMPNet: it is the optimal
denoiser for Gaussian noise. One of the assumptions of MMNet is
that the noise at the input of the denoiser follows the same distribution for all
the transmitted symbols. The estimate of the noise variance $\sigma_t^2$ is given by

$$\sigma_t^2 = \frac{\theta_t^{(2)}}{N_t} \left( \frac{\|I - A_t H\|_F^2}{\|H\|_F^2} \left[\|y - H\hat{x}_t\|_2^2 - N_r \sigma^2\right]_+ + \frac{\|A_t\|_F^2}{\|H\|_F^2} \sigma^2 \right) \tag{2.36}$$
MMNet assumes that the noise is composed of two parts: the residual error
caused by the deviation of the estimate $\hat{x}_t$ from the true value of $x$, and the
channel noise $n$. The first component is amplified by the linear transformation
$(I - A_t H)$, the second by $A_t$. For the i.i.d. Gaussian channel model,
only two parameters per iteration ($\theta_t^{(1)}$ and $\theta_t^{(2)}$) are sufficient, and this type
of channel does not require online training: MMNet can reach
good performance on i.i.d. Gaussian channels by being trained offline on
randomly sampled i.i.d. Gaussian matrices. The MMNet architecture for
arbitrary channel matrices is structured as follows:

$$\text{MMNet:}\quad \begin{cases} z_t = \hat{x}_t + \Theta_t^{(1)} (y - H\hat{x}_t) \\ \hat{x}_{t+1} = \eta_t(z_t; \sigma_t^2) \end{cases} \tag{2.37}$$
where $\Theta_t^{(1)}$ is an $N_t \times N_r$ complex-valued trainable matrix. To add a
degree of freedom to the estimation of the noise per transmitter, the trainable
parameter $\theta_t^{(2)}$ in (2.36) becomes a vector of shape $N_t \times 1$. In this
way, the model can handle cases where the different transmitted symbols have
different levels of noise, by scaling the noise variance by a different value
for each symbol. This approach distinguishes MMNet from the highly
constrained OAMPNet and from the overly complex DetNet. MMNet uses a
flexible linear transformation to compute $z_t$ while still using
the optimal denoiser for Gaussian noise. Moreover, it does not need a matrix
inversion, which would raise the complexity. The loss function used over all
the $T$ layers in the training phase is the loss

$$L = \frac{1}{T} \sum_{t=1}^{T} \|\hat{x}_t - x\|_2^2 \tag{2.38}$$
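The noise-variance estimate (2.36) used at the denoiser input can be sketched as a small helper. The function and argument names are hypothetical; `theta2` may be a scalar (i.i.d. variant) or a length-$N_t$ vector (arbitrary-channel variant), in which case NumPy broadcasting yields one variance per transmitted symbol.

```python
import numpy as np

def mmnet_sigma2(y, H, x_hat, A, sigma2, theta2):
    """Noise variance at the denoiser input, following (2.36)."""
    Nr, Nt = H.shape
    fro2 = lambda M: np.linalg.norm(M, "fro") ** 2
    resid = y - H @ x_hat
    # [.]_+ : residual power in excess of the expected channel-noise power.
    err_power = max(np.linalg.norm(resid) ** 2 - Nr * sigma2, 0.0)
    term_err = fro2(np.eye(Nt) - A @ H) / fro2(H) * err_power
    term_noise = fro2(A) / fro2(H) * sigma2
    return theta2 / Nt * (term_err + term_noise)

# Toy check (assumed values): perfect estimate, noiseless observation, A = H^H.
rng = np.random.default_rng(0)
Nr, Nt, sigma2 = 8, 4, 0.1
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2 * Nr)
x = np.sign(rng.standard_normal(Nt)) + 0j
y = H @ x
A = H.conj().T

s2_scalar = mmnet_sigma2(y, H, x, A, sigma2, theta2=1.0)
s2_vector = mmnet_sigma2(y, H, x, A, sigma2, theta2=np.ones(Nt))
```

In this toy check the residual is zero, so only the channel-noise term survives, and because $\|A\|_F = \|H\|_F$ for $A = H^H$ the result reduces to $\sigma^2 / N_t$.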
DetNet and OAMPNet are trained offline, so they try to learn a
model that works with a whole family of channel matrices, for example
i.i.d. Gaussian matrices. However, they differ in design
philosophy. DetNet is a large neural network with
1-10 million parameters and little domain knowledge. OAMPNet instead
uses a model-driven approach, adding two trainable parameters to the OAMP
algorithm. Both algorithms perform well in simple scenarios, but their performance
suffers, compared to the ML detector, when they are used on real-world datasets
such as the 3GPP 3D MIMO channel. DetNet is too general, because it makes no
assumptions and relies only on a large model that is also difficult to train, while
OAMPNet works under strong assumptions on the channel model and therefore has
difficulties with other families of matrices.
MMNet tries to solve these problems with an online training approach,
even though it is usually impossible to meet the stringent performance requirements
of MIMO detection with this kind of approach. MMNet tries to overcome
this issue with two ideas. First, it adopts a neural network based on iterative
soft-thresholding algorithms, in particular keeping the denoiser architecture,
and it introduces flexibility through trainable parameters optimized for
each channel realization. Second, MMNet exploits the frequency- and time-domain
locality of channel matrices at the receiver side, which greatly speeds up the training
procedure compared to an approach that re-trains the network for each channel
realization. MMNet manages to reach performance close to ML,
with a complexity 10 times lower than OAMPNet. The main difference
between MMNet and OAMPNet is that MMNet shapes the noise at the input of the
denoiser stage as a Gaussian distribution, so that the denoisers can operate
effectively, attenuating noise maximally.
2.3.7 OAMP-Net2
In [5] the authors proposed another model-driven deep learning network
based on OAMP, named OAMPNet2, which is similar to OAMPNet but has
more trainable parameters in order to adapt to various channel environments
and take the channel estimation error into consideration. OAMPNet2 performs
considerably better than OAMP and is more robust to mismatches in SNR,
channel correlation, modulation symbols and MIMO configuration.
The OAMPNet2 algorithm performs signal detection in the presence of channel
estimation error. In the MIMO system, the received data signal
vector $y_d[n]$ corresponding to the $n$-th data vector can be expressed as
$y_d[n] = H x_d[n] + n_d[n]$, where $n_d[n] \sim \mathcal{N}(0, \sigma^2 I_{N_r})$ is the AWGN vector. The channel matrix $H$
can be expressed as $H = \hat{H} - \Delta H$, where $\hat{H}$ is the estimated channel and $\Delta H$
is the channel estimation error. If the MIMO detection problem uses
the estimated channel matrix $\hat{H}$, the system can be expressed as

$$\begin{aligned} y_d[n] &= H x_d[n] + n_d[n] \\ &= (\hat{H} - \Delta H) x_d[n] + n_d[n] \\ &= \hat{H} x_d[n] + n_d[n] - \Delta H x_d[n] \\ &= \hat{H} x_d[n] + \hat{n}_d[n] \end{aligned} \tag{2.39}$$

where $\hat{n}_d[n] = n_d[n] - \Delta H x_d[n]$ is the noise seen by the signal detector, which includes
the channel estimation error and the AWGN vector. This noise is assumed to
be Gaussian distributed.
OAMPNet2 is obtained by unfolding the OAMP detector and adding
some trainable variables. It is a deep learning network composed
of $T$ cascaded layers with the same architecture but different parameters. The
inputs of the network are $y$ and $\hat{H}$, while the output is $\hat{x}_{T+1}$. The input of each layer
is the estimate $\hat{x}_t$ of $x$ computed by the previous layer. The OAMPNet2
detector performs these steps at each layer:

$$\text{OAMP-Net2:}\quad \begin{cases} r_t = \hat{x}_t + \gamma_t W_t (y - \hat{H}\hat{x}_t) \\ \hat{x}_{t+1} = \eta_t(r_t, \tau_t^2; \phi_t, \xi_t) \\ v_t^2 = \dfrac{\|y - \hat{H}\hat{x}_t\|_2^2 - \mathrm{tr}(R_{\hat{n}\hat{n}})}{\mathrm{tr}(\hat{H}^H \hat{H})} \\ \tau_t^2 = \dfrac{1}{N_t} \mathrm{tr}(C_t C_t^H) v_t^2 + \dfrac{\theta_t^2}{N_t} \mathrm{tr}(W_t R_{\hat{n}\hat{n}} W_t^H) \end{cases} \tag{2.40}$$
As it possible to see, the difference between OAMP and OAMPNet2 is

represented by the presence of the learnable parameters Ωt = {γt , ϕt , ξt , θt }
in each layer. When γt = θt = ϕt = 1 and ξt = 0 the OAMPNet2 is reduced
to the OAMP detector, while optimizing the values of the parameters, the
performance can be improved. The matrix Ct = I − θt Wt Ĥ is similar to
Bt of OAMP algorithm adding the trainable parameter θt in order to regulate
τt2 . Also the OAMPNet2 algorithm is divided into two modules, a linear and
a nonlinear estimator. For what concerns the linear estimator, the trainable
parameter γt is added to the formula for updating rt and it can be considered
as the step size of the update. The nonlinear estimator ηt for estimating x̂t+1
instead is revised and it is constructed by the divergence free estimator
ηt (rt , τt2 , ϕt , ξt ) = ϕt (E{x|rt , τt } − ξt rt ) (2.41)
where E{x|rt, τt} is computed as for OAMP. The formula can be seen as a
linear combination of the prior mean rt and the posterior mean E{x|rt, τt},
weighted through the learnable parameters ϕt and ξt. Regarding the variance
estimators, vt² remains the same as in OAMP, while τt² is computed using Ct
instead of Bt and therefore depends on the trainable parameter θt. To avoid
stability problems, vt² can be replaced with max(vt², ϵ) for a small positive
constant ϵ = 5 · 10⁻¹³ (a fixed constant, distinct from the trainable
parameter ξt). OAMP is far from optimal when there is strong spatial
correlation or channel estimation error. With the help of the trainable
variables, OAMPNet2 tries to overcome this problem and to adapt to various
channel environments. γt and θt adjust the linear estimator and provide the
step sizes for updating rt and τt². ϕt and ξt are instead used to construct
the divergence-free estimator ηt(·). OAMPNet and OAMPNet2 also differ in the
use of ϕt and ξt: in the first version they are fixed to ϕt = 1 and ξt = 0 (as
in OAMP), while in this version they are trainable, so that a suitable
divergence-free estimator can be learned.
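The layer update of (2.40) and (2.41) can be illustrated with a minimal NumPy
sketch. It is not the implementation used in the thesis: the function names
are hypothetical, the matrix Wt is assumed to be the normalised LMMSE matrix
commonly used in OAMP, and the posterior mean assumes complex Gaussian noise
of variance τt² and equiprobable constellation points. The parameters gamma,
theta, phi and xi are passed as plain numbers here, whereas in the network
they would be trainable variables.

```python
import numpy as np

def posterior_mean(r, tau2, constellation):
    """E{x | r, tau} per entry, assuming r = x + CN(0, tau2) noise and
    equiprobable constellation points (an assumption of this sketch)."""
    dist2 = np.abs(r[:, None] - constellation[None, :]) ** 2
    logw = -dist2 / max(tau2, 1e-12)
    logw -= logw.max(axis=1, keepdims=True)  # avoid overflow in exp
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    return w @ constellation

def oampnet2_layer(y, H_hat, x_hat, R_nn, constellation,
                   gamma=1.0, theta=1.0, phi=1.0, xi=0.0, eps=5e-13):
    """One layer of (2.40)-(2.41); the defaults give the plain OAMP update."""
    n_t = H_hat.shape[1]
    HH = H_hat.conj().T
    residual = y - H_hat @ x_hat
    # v_t^2 of (2.40), clipped from below for numerical stability
    v2 = (np.linalg.norm(residual) ** 2 - np.trace(R_nn).real) / np.trace(HH @ H_hat).real
    v2 = max(v2, eps)
    # Assumed choice of W_t: LMMSE matrix normalised so that tr(W_t H) = N_t
    W_lmmse = v2 * HH @ np.linalg.inv(v2 * H_hat @ HH + R_nn)
    W = (n_t / np.trace(W_lmmse @ H_hat).real) * W_lmmse
    r = x_hat + gamma * (W @ residual)          # linear estimator
    C = np.eye(n_t) - theta * (W @ H_hat)
    tau2 = (np.trace(C @ C.conj().T).real * v2
            + theta ** 2 * np.trace(W @ R_nn @ W.conj().T).real) / n_t
    x_next = phi * (posterior_mean(r, tau2, constellation) - xi * r)  # (2.41)
    return x_next, r, tau2

# Usage: five unfolded layers on a small 8x4 QPSK example
rng = np.random.default_rng(0)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = (rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))) / np.sqrt(2)
x = rng.choice(qpsk, size=4)
sigma2 = 0.01
y = H @ x + np.sqrt(sigma2 / 2) * (rng.standard_normal(8) + 1j * rng.standard_normal(8))
x_hat = np.zeros(4, dtype=complex)
for _ in range(5):
    x_hat, r, tau2 = oampnet2_layer(y, H, x_hat, sigma2 * np.eye(8), qpsk)
```

With the default values the sketch performs the OAMP update, which mirrors the
statement that OAMPNet2 reduces to OAMP when γt = θt = ϕt = 1 and ξt = 0.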
2.4 Contribution
Only some of the algorithms described in this chapter are implemented and
compared in the thesis. The thesis centres on a comparison between two recent
algorithms, OAMPNet2 and LVAMP, which are therefore both implemented. Since
these networks are obtained by unfolding the OAMP and VAMP algorithms, those
two approaches are also included in the comparison. On a few specific
configurations, the MMNet algorithm is used for comparison with the two new
networks. Finally, the MMSE algorithm is implemented as well, because it is
the common baseline for comparing MIMO detection techniques.
Method | 41
Chapter 3
Method
3.1 Methodology
As stated in the first chapter, the thesis follows an experimental and
quantitative methodology. The work started with a detailed literature review
in order to understand the background and the state of the art of the MIMO
detection problem. It continued with the implementation of a link-layer 5G
network simulator that is used to perform the experiments. After that, the
algorithms to consider were chosen: MMSE, VAMP, OAMP, LVAMP, OAMPNet2 and
MMNet. While the first three algorithms can be used directly, LVAMP, OAMPNet2
and MMNet are based on DL and require a training phase. Therefore, a section
of this chapter is dedicated to the training of these three algorithms. The
MMSE algorithm has been chosen because it is a common baseline for comparing
MIMO detectors. MMNet has been taken into consideration because it is one of
the best-performing DL-based algorithms. LVAMP and OAMPNet2 are the focus of
the thesis; they have been selected among the DL-based algorithms because they
are very recent, they are promising in terms of the performance-complexity
trade-off, and a comparison of MIMO detectors that includes these two
algorithms is missing. VAMP and OAMP are implemented because they are the two
algorithms from which LVAMP and OAMPNet2 are derived.
The experiments consist of different simulations based on 5G scenarios. The
simulations differ in the size of the MIMO system, channel model, QAM order,
SNR values and type of training.
The MIMO system sizes used are 4×4, 32×32 and 64×32, where in each pair the
first value is the number of receiver antennas and the second one the number
of transmitters.
The channel models on which the algorithms are trained and tested are the
i.i.d. Gaussian and the Kronecker ones. For the Kronecker channel model,
different correlation values at both the receiver and the transmitter side
are considered. In particular, the correlation values are 0.1, 0.3, 0.5 and
0.7 for both sides, and the simulations differ in the combination of receiver
and transmitter correlation.
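As an illustration of how correlated channel matrices of this kind can be
sampled, the following minimal sketch draws a Kronecker channel for a given
pair of correlation values. It rests on assumptions that are not stated in
this section: an exponential correlation model with entries ρ^|i−j| is used
for both sides, the matrix square roots are taken through a Cholesky
factorisation, and the function names are hypothetical.

```python
import numpy as np

def exponential_correlation(n, rho):
    """Correlation matrix with entries rho^|i-j| (assumed model)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sample_kronecker_channel(n_r, n_t, rho_r, rho_t, rng):
    """Draw H = L_r H_w L_t^H, where L L^H is the side correlation matrix
    and H_w has i.i.d. CN(0, 1) entries."""
    h_w = (rng.standard_normal((n_r, n_t))
           + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    l_r = np.linalg.cholesky(exponential_correlation(n_r, rho_r))
    l_t = np.linalg.cholesky(exponential_correlation(n_t, rho_t))
    return l_r @ h_w @ l_t.conj().T

# Usage: one 64x32 channel for every combination of the correlation values
rng = np.random.default_rng(0)
values = (0.1, 0.3, 0.5, 0.7)
channels = {(rho_r, rho_t): sample_kronecker_channel(64, 32, rho_r, rho_t, rng)
            for rho_r in values for rho_t in values}
```

With both correlation values set to zero the sketch returns an i.i.d. Gaussian
matrix, so the same routine can cover both channel models.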
The modulation schemes used in the experiments are different orders of QAM;
in particular, QAM-16 and QAM-64 are used to create different simulations.
For each experiment, the same range of SNR values is considered. The selected
range goes from 18 dB to 23 dB, values that are common in the comparison of
MIMO detectors. Each configuration of parameters that composes a simulation
is therefore tested over this SNR range with a step size of 1 dB.
Finally, the algorithms, in particular the DL-based ones, are trained and
tested in three different ways:
• Training is conducted with i.i.d. Gaussian channel matrices, and testing
with matrices generated from the same channel model.
• Training uses Kronecker channel matrices with different correlation
parameters, and testing is conducted with Kronecker channel matrices.
• Training is based on the i.i.d. Gaussian channel model, but testing is done
with Kronecker matrices in order to verify the adaptability of the
algorithms.
The metric used to compare the algorithms in the experiments is the SER.
3.2 Performance Metrics

The performance metrics usually adopted for the MIMO detection problem are
the Bit Error Rate (BER) and the SER, evaluated at different SNR values. Both
metrics are the ratio between the number of errors in the estimated message
x̂, compared with the original transmitted message x, and the number of values
transmitted. In particular, the BER is defined as
\[
\mathrm{BER} = \frac{\text{no. of bits in error}}{\text{total no. of transmitted bits}} \tag{3.1}
\]
while the SER is defined as
\[
\mathrm{SER} = \frac{\text{no. of symbols in error}}{\text{total no. of transmitted symbols}} \tag{3.2}
\]
Therefore the BER works at the bit level, while the SER works at the level of
constellation symbols. In the thesis, only the SER metric is used.
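The two definitions can be made concrete with a short sketch. It assumes that
soft estimates are first mapped to the nearest constellation point and, for
the BER, that a bit label is available for every symbol; the function names,
the QPSK constellation and the labelling used below are illustrative choices,
not the ones of the thesis.

```python
def hard_decision(x_hat, constellation):
    """Map every soft estimate to the closest constellation point."""
    return [min(constellation, key=lambda s: abs(s - v)) for v in x_hat]

def symbol_error_rate(x_true, x_hat, constellation):
    """SER of (3.2): symbols in error over symbols transmitted."""
    decided = hard_decision(x_hat, constellation)
    errors = sum(1 for d, t in zip(decided, x_true) if d != t)
    return errors / len(x_true)

def bit_error_rate(x_true, x_hat, labels):
    """BER of (3.1), given a dict that maps each symbol to its bit string."""
    decided = hard_decision(x_hat, list(labels))
    wrong = sum(a != b
                for d, t in zip(decided, x_true)
                for a, b in zip(labels[d], labels[t]))
    total = sum(len(labels[t]) for t in x_true)
    return wrong / total

# Usage with an illustrative QPSK labelling
labels = {1 + 1j: "00", 1 - 1j: "01", -1 + 1j: "10", -1 - 1j: "11"}
x_true = [1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
x_hat = [0.9 + 1.2j, -0.8 - 1.1j, -0.2 - 0.9j, -1.1 + 0.7j]
ser = symbol_error_rate(x_true, x_hat, list(labels))
ber = bit_error_rate(x_true, x_hat, labels)
```

In this example one of the four symbols is detected wrongly and it differs
from the transmitted one in a single bit, so the SER is 1/4 and the BER is 1/8.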
3.3 Training
The training phase is the crucial part of the thesis. In this section, the
LVAMP, OAMPNet2 and MMNet networks are presented first. After that, the
dataset used for training the networks is explained. Finally, some methods
used to prevent overfitting conclude the section.
3.3.1 LVAMP
The LVAMP algorithm has been explained in detail in the previous chapter.
This subsection shows the network that is built for conducting the
experiments. Figure 3.1 displays the t-th layer of the network. The layer is
composed of an LMMSE stage that produces a first estimate of the transmitted
signal x and a residual term, a decoupling block, a shrinkage block that
produces the final estimate x̂t of iteration t, and another identical
decoupling block.
Figure 3.1: LVAMP network layer
3.3.2 OAMPNet2
As for the LVAMP network, this subsection presents the OAMPNet2 network.
Figure 3.2 shows the t-th layer of OAMPNet2 in detail. Each layer receives as
input the received signal y, the channel matrix H and the estimate x̂t of the
transmitted signal x coming from the previous layer, and produces as output
the estimate x̂t+1. As seen in the previous chapter, each layer has four
trainable parameters (γt, ϕt, ξt, θt), which are learned independently for
each layer.
Figure 3.2: OAMPNet2 network layer
3.3.3 MMNet
Finally, the t-th layer of MMNet is shown in Figure 3.3. At is a trainable
matrix, rt is the residual term coming from the previous block and η is the
denoising function. Each block receives as input the estimate x̂t of the
transmitted signal and rt from the previous layer, together with the received
signal y and the channel matrix H. The layer outputs the estimate x̂t+1 and
the residual rt+1.
Figure 3.3: MMNet network layer
3.3.4 Dataset
The training dataset is split into parts of equal size, called batches,
before the learning procedure starts. When the training phase starts, the
algorithm iterates over epochs. In each epoch, a training step is repeated
for every batch of the training dataset. A training step is divided into two
stages:
• the forward pass computes the outputs x̂(1), ..., x̂(D) from the input-label
pairs {(y(d), x(d))}, d = 1, ..., D, where this time D stands for the size of
the batch;
• the backward pass updates the values of V1, ..., VT by computing their
gradients with respect to the batch and then applying gradient descent. T is
the number of layers of the network.
For each experiment, the dataset is composed of samples with three sources of
randomness: the signal x, the channel matrix H and the noise n. The signal x
is sampled randomly and uniformly from the constellation. The channel matrix
is sampled according to the channel model selected for the experiment. The
noise is sampled with a standard deviation σ that is derived from the SNR of
the experiment. For each sample in the batch, the SNR value is chosen
randomly in the range 18-23 dB. Therefore, each batch used during training is
a set of four-element tuples {(y(d), H(d), σ(d), x(d))}, d = 1, ..., D, where
x(d) are the values that have to be predicted.
In this thesis, the algorithms are trained offline with randomly sampled
i.i.d. Gaussian channel matrices. They are then tested on both i.i.d.
Gaussian and Kronecker channel matrices, with the same sources of randomness
and a different random seed. Finally, the algorithms are also trained on
Kronecker channel matrices and tested on different matrices generated from
the same channel model.
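The sampling procedure described above can be sketched as follows for the
i.i.d. Gaussian case. The sketch is only indicative: the relation between the
SNR and the noise variance is an assumption (the SNR is taken as the ratio
between the average received signal power and the noise power, with
unit-energy symbols), and the names noise_variance and make_batch are
hypothetical.

```python
import numpy as np

def noise_variance(H, snr_db):
    """Noise variance for a given SNR in dB, assuming unit-energy symbols and
    SNR = E||Hx||^2 / E||n||^2 (an assumption of this sketch)."""
    n_r = H.shape[0]
    snr_lin = 10.0 ** (snr_db / 10.0)
    return np.linalg.norm(H) ** 2 / (n_r * snr_lin)

def make_batch(batch_size, n_r, n_t, constellation, rng, snr_range=(18.0, 23.0)):
    """Return arrays (y, H, sigma, x) for one batch of i.i.d. Gaussian channels."""
    ys, Hs, sigmas, xs = [], [], [], []
    for _ in range(batch_size):
        x = rng.choice(constellation, size=n_t)       # uniform over the symbols
        H = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        snr_db = rng.uniform(*snr_range)               # one SNR per sample
        sigma = np.sqrt(noise_variance(H, snr_db))
        n = sigma / np.sqrt(2) * (rng.standard_normal(n_r)
                                  + 1j * rng.standard_normal(n_r))
        ys.append(H @ x + n)
        Hs.append(H)
        sigmas.append(sigma)
        xs.append(x)
    return np.stack(ys), np.stack(Hs), np.array(sigmas), np.stack(xs)

# Usage: a small batch for a 4x4 system with unit-energy QAM-16
levels = np.array([-3.0, -1.0, 1.0, 3.0])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel() / np.sqrt(10)
y, H, sigma, x = make_batch(5, 4, 4, qam16, np.random.default_rng(0))
```

Using a different seed for the random generator gives the test data, in line
with the use of the same sources of randomness and a different random seed.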
The generated training dataset is split into two parts in order to create the
validation set, which is used for early stopping and for cross validation.
The size of the validation set is 25% of the generated dataset. For each
training step, the batch is randomly generated, since the cost of creating a
batch is negligible compared with the forward and backward passes.
The algorithms that require deep learning are implemented, trained and tested
using TensorFlow 2.0.0 [24]. For each deep learning model, a separate
training procedure is run for every MIMO configuration and modulation order
of the different experiments. In order to avoid overfitting, early stopping,
dropout and cross validation are used. These techniques are explained in the
next subsection, and all the DL-based algorithms use them during training.
The algorithms have been trained for 2000 epochs with the Adam optimizer and
a learning rate of 0.001. Each batch contains 1000 samples. Adam is one of
the most widely used optimization algorithms in deep learning. In this
method, exponential moving averages of the gradient and of its square are
maintained through two updates:
\[
\begin{aligned}
m_e &= \beta_1 m_{e-1} + (1 - \beta_1)\, \nabla W^{(k)f}_{f',e} \\
v_e &= \beta_2 v_{e-1} + (1 - \beta_2) \left( \nabla W^{(k)f}_{f',e} \right)^2
\end{aligned}
\tag{3.3}
\]
where β1 and β2 are the exponential decay rates of the two moving averages,
so that (1 − β1) and (1 − β2) weight the contribution of the current
gradient.
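A scalar sketch of one Adam step may help to read (3.3). In addition to the
two moving averages it includes the bias correction and the parameter update
of the standard Adam algorithm, which are not written out in the text; the
default values of beta1, beta2 and eps are the commonly used ones and are
assumptions here, while the learning rate matches the value reported above.

```python
import math

def adam_step(w, grad, m, v, step, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar parameter w; step counts from 1."""
    m = beta1 * m + (1 - beta1) * grad        # first line of (3.3)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second line of (3.3)
    m_hat = m / (1 - beta1 ** step)           # bias correction (standard Adam)
    v_hat = v / (1 - beta2 ** step)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Usage: first step on f(w) = w^2 starting from w = 1, where the gradient is 2w
w, m, v = adam_step(1.0, 2.0, 0.0, 0.0, step=1)
```

In the first step the bias-corrected averages equal the gradient and its
square, so the parameter moves by approximately the learning rate whatever
the scale of the gradient.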
3.3.5 Preventing Overfitting

The main goal of the training phase is to find the values of the parameters
V1, ..., VT that solve the minimization problem of the loss function, and
therefore reduce the training loss. This procedure does not guarantee that
the prediction error on the test dataset is reduced as well. As in all
machine learning and deep learning problems, it is crucial to choose a
training dataset and a loss function that minimize the generalization error,
which can be defined as the gap between the training loss and the test loss.
When the training dataset is too small with respect to the number of
parameters to be learnt (a number that can also be seen as the degrees of
freedom of the model), the model is prone to overfit the training dataset and
the generalization error tends to be large. A simple way to prevent
overfitting is to increase the size of the training dataset, but
unfortunately this is not always possible. Therefore, other techniques are
used in deep learning to avoid the problem; the ones adopted in this thesis
are described here.
Early Stopping
An important technique to avoid or limit overfitting is to divide the dataset
not only into training and test sets, but into three parts, where the third
part is called the validation set. The validation set is used during the
training phase to verify the quality of the training. Thanks to this third
set, it is possible to apply a technique called early stopping to reduce
overfitting. Early stopping is a criterion that decides when to stop the
training of the model according to the prediction error on the validation
set. During training, the prediction error on the training set keeps
decreasing, because this is the goal of the phase. However, the prediction
error on the validation set initially decreases, but after a while it stops
going down and starts to rise. When the prediction error on the validation
set starts to increase, the model is overfitting the training data, and it is
therefore better to stop the training procedure.
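The stopping rule can be sketched as a small function over the sequence of
validation errors. The patience parameter (the number of consecutive epochs
without improvement that are tolerated before stopping) is an assumption of
this sketch, since the text does not specify how many non-improving epochs
are allowed.

```python
def early_stopping(val_errors, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation errors."""
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch   # validation error keeps rising
    return best_epoch, len(val_errors) - 1  # never triggered

# Usage: the validation error decreases, then starts to rise
best, stop = early_stopping([1.0, 0.8, 0.7, 0.72, 0.75, 0.9], patience=2)
```

In this example training stops at epoch 4 and the parameters of epoch 2, which
has the lowest validation error, would be kept.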
Dropout
Dropout is a different technique to prevent overfitting. It is applicable to
any layer t of the neural network except the last one, that is, the output
layer. At every step of the training phase, each input yt is dropped
temporarily with probability p, called the dropout rate. When an input is
dropped, its value is set to zero and it does not contribute during that
training step. In this way the network effectively sees a different version
of the training data at each step. This technique is applied only during
training, and not in the test phase.
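A minimal sketch of the mechanism is given below. The rescaling of the
surviving inputs by 1/(1 − p), known as inverted dropout, is not mentioned in
the text and is an assumption of the sketch; it keeps the expected value of
each input unchanged between training and testing. The function name is
hypothetical.

```python
import random

def dropout(inputs, rate, training, rng):
    """Zero each input with probability `rate` during training only."""
    if not training or rate == 0.0:
        return list(inputs)                # identity at test time
    keep = 1.0 - rate
    # Surviving inputs are rescaled by 1 / keep (inverted dropout, assumed)
    return [v / keep if rng.random() >= rate else 0.0 for v in inputs]

# Usage
rng = random.Random(0)
train_out = dropout([1.0] * 8, rate=0.5, training=True, rng=rng)
test_out = dropout([1.0] * 8, rate=0.5, training=False, rng=rng)
```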
Cross Validation
The cross validation technique also requires the validation dataset. When
training with deep learning, not only the values of the parameters V1, ...,
VT have to be optimized, but also the hyperparameters of the model. The
hyperparameters cannot be optimized by the backpropagation step, but they
need to be tuned in order to achieve better performance. Examples of
hyperparameters are the number of layers, the learning rate, the hidden
dimension of each layer and the dropout rate. The cross validation technique
consists in evaluating on the validation set, at the end of training, the
performance of different models built from predefined combinations of
hyperparameters. The model whose combination of hyperparameters achieves the
best performance on the validation set is the best candidate to be used for
testing.
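The selection step can be sketched as a search over a predefined grid. The
function validation_error stands for "train a model with this configuration
and return its error on the validation set"; it is a placeholder supplied by
the caller, and the grid values and the toy error function below are purely
illustrative.

```python
from itertools import product

def select_hyperparameters(grid, validation_error):
    """Return the configuration with the lowest validation error."""
    names = sorted(grid)
    best_config, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        err = validation_error(config)     # train, then evaluate on validation
        if err < best_error:
            best_config, best_error = config, err
    return best_config, best_error

# Usage with a toy error function in place of an actual training run
grid = {"layers": [5, 10, 15], "learning_rate": [0.01, 0.001]}

def toy_error(config):
    return abs(config["layers"] - 10) * 0.01 + abs(config["learning_rate"] - 0.001)

best_config, best_error = select_hyperparameters(grid, toy_error)
```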
Vanishing gradient problem

Another problem to face when dealing with deep learning is the vanishing
gradient problem. During backpropagation, the gradients of the parameters of
a layer are obtained by multiplying the gradients of the layers that follow
it. If the gradients become too small, they are insignificant for the
training phase and for learning new weights. This happens when the neural
network uses activation functions that take values in the interval [0, 1],
such as the sigmoid, whose derivatives also lie in [0, 1]. Since the
gradients are multiplied with each other over T different steps, the result
is a product of T terms with values in [0, 1]. This product goes
exponentially towards zero with T, and this is the vanishing gradient
problem. There are several techniques to solve this problem, but only three
are used and described in the thesis. The first solution is to use an
activation function that does not drive the gradient to zero, for example
the ReLU function
\[
\mathrm{ReLU}(x) = \max(0, x) \tag{3.4}
\]
A second solution is to introduce a penalty term in the loss function for
each layer t = 1, ..., T, so that the loss L is computed as
\[
L = \lambda_T L_T + \lambda_{T-1} L_{T-1} + \dots + \lambda_1 L_1 \tag{3.5}
\]
where each λt has to be optimized and the weights satisfy
\(\sum_{t=1}^{T} \lambda_t = 1\). The last solution is to use a network
architecture that is not prone to the vanishing gradient problem.
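Two small numerical illustrations follow. The first shows how quickly a
product of T factors in [0, 1] shrinks; the second evaluates the weighted
loss of (3.5) after normalising a set of raw weights so that they sum to one.
The normalisation step and the function names are assumptions of the sketch,
since the text only requires that the λt sum to one.

```python
def gradient_scale(factor, depth):
    """Product of `depth` identical factors, as in a chain of T layers."""
    scale = 1.0
    for _ in range(depth):
        scale *= factor
    return scale

def weighted_layer_loss(layer_losses, raw_weights):
    """Loss of (3.5) with lambdas obtained by normalising raw_weights."""
    total = sum(raw_weights)
    lambdas = [w / total for w in raw_weights]
    loss = sum(lam * l for lam, l in zip(lambdas, layer_losses))
    return loss, lambdas

# Usage
shrink = gradient_scale(0.5, 10)                              # 0.5 ** 10
loss, lambdas = weighted_layer_loss([4.0, 2.0, 1.0], [1, 1, 2])
```

With a factor of 0.5 per layer, ten layers already reduce the gradient by
roughly three orders of magnitude.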
3.4 Procedure
In order to perform the experiments and the comparison among the algorithms,
different datasets have been generated. The datasets follow the MIMO model
defined in the second chapter and its assumptions. Regarding the channel
models, and therefore how H is generated, both i.i.d. Gaussian and Kronecker
have been considered, and the algorithms are tested on randomly sampled
channels from both models. For the Kronecker channel model, different
correlation values have been chosen, first at the receiver side only and then
at both the receiver and the transmitter sides. The data are complex-valued
but are transformed to real-valued form as explained in section. The MIMO
systems considered have different shapes. In general, the system size ratios
Nt/Nr are 1 or 0.5. These values are comparable with other works and can
represent real massive MIMO systems. The modulation is QAM, in particular
QAM-16 and QAM-64, fixed for all transmitters in each experiment. The SNR
values go from 18 dB to 23 dB with a step size of 1 dB.
A single experiment consists in running the selected algorithms in a scenario
defined by a fixed channel model, modulation type and system size. Different
SNR values are tested during the same experiment, which changes the impact of
the noise on the system. The channel matrices and the symbols transmitted by
the UEs are generated randomly for each experiment. The algorithms in the
same experiment use the same data in terms of SNR, channel matrices H and
transmitted symbols x. From the noise (derived from the SNR), the channel
matrix and the transmitted symbols, the received symbols y are computed. The
algorithms aim to detect the original transmitted symbols from the knowledge
of H and y. The estimated symbols x̂ are then compared with the original
symbols in order to compute the SER. The number of realizations of the
channel matrix affects the quality of the SER estimate: without a sufficient
number of realizations, the SER curve can deviate from the expected decrease
with increasing SNR. For this reason, thousands of different channel matrices
are used in each experiment to test the algorithms, and the simulations are
averaged over these realizations. Within the same sequence of transmitted
symbols, the realization of the channel matrix remains constant.
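The procedure can be sketched as a Monte Carlo loop, here with the linear
MMSE detector as the only algorithm. This is a sketch under stated
assumptions and not the simulator of the thesis: symbols have unit energy,
the channel is i.i.d. Gaussian, the SNR is defined as the ratio between the
average received signal power and the noise power, one symbol vector is sent
per channel realization, and the function names are hypothetical.

```python
import numpy as np

def mmse_detect(y, H, sigma2, constellation):
    """Linear MMSE estimate followed by a nearest-symbol decision."""
    n_t = H.shape[1]
    HH = H.conj().T
    x_lin = np.linalg.solve(HH @ H + sigma2 * np.eye(n_t), HH @ y)
    idx = np.argmin(np.abs(x_lin[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]

def simulate_ser(snr_db, n_r, n_t, constellation, n_channels, rng):
    """SER averaged over n_channels random channel realizations."""
    errors = 0
    for _ in range(n_channels):
        H = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        x = constellation[rng.integers(len(constellation), size=n_t)]
        sigma2 = np.linalg.norm(H) ** 2 / (n_r * 10.0 ** (snr_db / 10.0))
        n = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_r)
                                   + 1j * rng.standard_normal(n_r))
        x_hat = mmse_detect(H @ x + n, H, sigma2, constellation)
        errors += np.count_nonzero(x_hat != x)
    return errors / (n_channels * n_t)

# Usage: the same seed gives the same channels and symbols at every SNR
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
ser_low = simulate_ser(0.0, 8, 2, qpsk, 200, np.random.default_rng(0))
ser_high = simulate_ser(30.0, 8, 2, qpsk, 200, np.random.default_rng(0))
```

Re-seeding the generator for each SNR value reproduces the same channel
matrices and transmitted symbols, which mirrors the requirement that all
algorithms in an experiment are evaluated on the same data.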
3.5 Experiments
The combinations of parameters that can be generated, and that have been
tested during the thesis work, number in the hundreds. Only the most
significant ones are reported. This section lists the experiments that are
analysed in the thesis.
3.5.1 Experiment 1
The first experiment is based on a small MIMO system, with 4 receivers and 4
transmitters. There are two reasons behind this choice:
• This MIMO system size is the one used to present the OAMPNet2 algorithm in
its original paper, and therefore this experiment also serves as a check of
the correctness of the implementation of the network.
• Small MIMO systems are not the main challenge in MIMO detection, because
many algorithms perform well in this scenario, and therefore this size can be
taken as a first check of the quality of the algorithms.
For the same reasons, the channel model of this experiment is i.i.d. Gaussian
for both training and testing. The modulation scheme is QAM-16 in order to
keep the simulation as simple as possible.
3.5.2 Experiment 2
The second experiment considers a more complex system than the first one. The
MIMO system becomes a 32 × 32 system, and therefore a medium-large MIMO
system. The modulation scheme used in this experiment is QAM-64. This
dimension and modulation scheme are chosen in order to reproduce a more
realistic scenario than the previous one. In this second experiment, the
channel model is still i.i.d. Gaussian for both training and testing. The
goal is to verify the quality of the algorithms in a more complex system, but
with a simple channel model, so that their feasibility can be evaluated step
by step.
3.5.3 Experiment 3
The third experiment is similar to the second one, but it represents a
realistic scenario. The size of the system and the modulation scheme remain
the same as in the previous experiment, but the channel model is different:
both training and testing are conducted with Kronecker channel matrices, with
different combinations of correlation parameters at the receiver and
transmitter sides. This experiment is one of the most important, because it
verifies both the feasibility of the algorithms and their performance in a
realistic scenario.
3.5.4 Experiment 4
The fourth experiment aims to evaluate the adaptability of the algorithms.
The algorithms are trained on matrices generated with an i.i.d. Gaussian
channel model and tested on Kronecker channel matrices. In this way it is
possible to see whether the algorithms can work in conditions that differ
from their training scenario. The size and the modulation scheme are the same
as in Experiments 2 and 3.
3.5.5 Experiment 5
For the last group of experiments, the size of the system changes again, and
this time its shape changes as well. In these experiments the MIMO system is
no longer square but rectangular, with a ratio between transmitters and
receivers equal to 0.5: the number of transmitters is 32, while the number of
receivers is 64. These experiments aim to verify that the algorithms also
work on rectangular MIMO systems. In particular, the configuration with 64
receivers and 32 transmitters is the most used for comparing MIMO detectors.
For the fifth experiment the algorithms are trained on an i.i.d. Gaussian
channel model and tested using the same channel model.
3.5.6 Experiment 6
As for the 32 × 32 configuration, three different experiments are conducted
for the 64 × 32 one. In the sixth experiment both the training and the
testing phase are conducted with Kronecker channel matrices. The modulation
scheme for these last experiments is still QAM-64.
3.5.7 Experiment 7
Finally, in the last experiment the training phase is conducted with i.i.d.
Gaussian channel matrices and the testing phase with the Kronecker channel
model. For these last three experiments the SNR range considered remains the
same, from 18 to 23 dB.
3.5.8 Summary of experiments

A summary of the experiments is shown in Table 3.1. The experiments tested on
Kronecker channel matrices are split into two sub-experiments, which differ
in the correlation parameters at the receiver and transmitter sides. The
first sub-experiment uses ρr = ρt = 0.3, while the second one uses
ρr = ρt = 0.5.
Table 3.1: Experiments summary
Experiment  Nr  Nt  Shape        Modulation  C.M. training    C.M. testing
1           4   4   Square       QAM-16      i.i.d. Gaussian  i.i.d. Gaussian
2           32  32  Square       QAM-64      i.i.d. Gaussian  i.i.d. Gaussian
3           32  32  Square       QAM-64      Kronecker        Kronecker
4           32  32  Square       QAM-64      i.i.d. Gaussian  Kronecker
5           64  32  Rectangular  QAM-64      i.i.d. Gaussian  i.i.d. Gaussian
6           64  32  Rectangular  QAM-64      Kronecker        Kronecker
7           64  32  Rectangular  QAM-64      i.i.d. Gaussian  Kronecker
The first column of the table states the number of the experiment. Nr and Nt
indicate the number of receivers and the number of transmitters respectively.
The fourth column states whether the MIMO system has a square or rectangular
shape. The column named Modulation indicates which modulation scheme the
experiment adopts. The last two columns give the channel models (indicated
with the acronym C.M.) used for the training and the testing phase
respectively.
3.6 Technology
The experiments have been performed on a Dell laptop with 16 GB of RAM and an
Intel i7 processor. The experiments are coded in Python 3 [23]. The DL-based
algorithms are built and run using TensorFlow 2.0.0 [24]. The results of the
experiments are plotted using Matplotlib [47], a Python library for plotting
charts.
Results and Analysis | 53
Chapter 4
Results and Analysis
In this chapter, the results of the seven experiments are shown. For each
experiment, a graphical and a numerical representation of the results are
presented and analysed. In each graph, the horizontal axis reports the SNR
values, expressed in dB, on which the algorithms are tested. The vertical
axis reports the SER and uses a logarithmic scale for a clearer comparison,
so that each graph is a semi-logarithmic representation of the results. In
each graph the MMSE baseline is drawn as a solid black line with black square
markers. The VAMP algorithm is shown as a red dotted line with a plus symbol
at each point. The LVAMP algorithm is drawn as a cyan dashed line with
triangular markers. In some experiments the VAMP and LVAMP curves are very
close to each other, and in some cases also to the MMSE curve; for this
reason different line types, colours and marker shapes have been chosen to
distinguish them. The numerical representation helps to highlight the
differences in these cases as well. OAMP is drawn as a violet dash-dot line
with circular markers. OAMPNet2 is shown as a solid blue line with "X"
markers. Finally, in the experiments where MMNet is present, it is drawn as a
green dashed line with three-pointed star markers. This choice of
representations makes it possible to distinguish the results of the different
algorithms even if the document is printed in black and white or if the
reader is colour blind.
The numerical results are shown in tabular form, where each column
corresponds to an SNR value, each row to an algorithm, and each cell contains
the SER for that algorithm-SNR combination. An analysis of the complexity of
the different algorithms is conducted in Section 4.8.
4.1 Experiment 1
The first experiment is conducted on a small MIMO system with 4 transmitters
and 4 receivers. Both the training and the testing phase use the i.i.d.
Gaussian channel model. The algorithms considered in this experiment are
MMSE, VAMP, OAMP, LVAMP and OAMPNet2. The testing phase uses thousands of
channel realizations, so that no SER value is reported as zero merely because
too few errors were observed. The results are first reported numerically in
Table 4.1, where only the first four significant digits are given for
readability. After that, Figure 4.1 shows the graphical representation of the
results.
The goals of this experiment were to verify the correctness of the
implementation of the algorithms and their ability to detect the transmitted
symbols. As can be seen from both the numerical and the graphical
representation, all four new algorithms perform better than the MMSE
baseline. Therefore, for this MIMO configuration, all the AMP-based
algorithms can be considered as suboptimal solutions for the MIMO detection
problem. Focusing on performance, and leaving the complexity analysis to
Section 4.8, some interesting considerations can be made.
First of all, as expected, the DL-based algorithms perform better than the
algorithms from which they are derived: LVAMP improves on the performance of
VAMP, and OAMPNet2 is clearly better than OAMP. A second important
consideration is that, contrary to expectations, the OAMP algorithm performs
better than LVAMP. While OAMP was expected to perform better than VAMP
because of its higher complexity, there was the possibility that LVAMP could
outperform OAMP thanks to its DL nature. This is not the case, and the reason
could be the higher complexity of OAMP with respect to LVAMP. Interestingly,
OAMPNet2 outperforms all the other algorithms, as can be seen in the
graphical representation and also in the numerical one, where its SER is
lower than that of every other algorithm at each SNR value. In particular,
its SER drops below 0.1 at 20 dB of SNR, while this happens at 21 dB for
OAMP, at 22 dB for VAMP and LVAMP, and at 23 dB for MMSE.
Table 4.1: Experiment 1 numerical results: the cells represent the SER value
for the different algorithms at different SNR values, in a 4 receivers and 4
transmitters MIMO system, trained and tested on i.i.d. Gaussian channel
matrices using QAM-16 as modulation scheme
Algorithm  18 dB   19 dB   20 dB    21 dB    22 dB    23 dB
MMSE       0.2208  0.2077  0.1632   0.1269   0.1082   0.09155
VAMP       0.2048  0.1925  0.1543   0.1164   0.09869  0.08415
OAMP       0.1657  0.1492  0.1230   0.09512  0.08152  0.07263
LVAMP      0.2001  0.1817  0.1441   0.1157   0.08988  0.07840
OAMPNet2   0.1290  0.1190  0.09797  0.07888  0.05861  0.05035
Figure 4.1: Experiment 1 graphical results: the graph represents the SER
values for the different SNR values for each algorithm, in a 4 receivers and
4 transmitters MIMO system, trained and tested on i.i.d. Gaussian channel
matrices using QAM-16 as modulation scheme
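The SNR at which each algorithm first reaches an SER below 0.1 can be read
directly from Table 4.1. The short sketch below does this programmatically;
the SER values are copied from the table, while the helper name
first_snr_below is a hypothetical choice.

```python
snr_db = [18, 19, 20, 21, 22, 23]
ser = {
    "MMSE":     [0.2208, 0.2077, 0.1632, 0.1269, 0.1082, 0.09155],
    "VAMP":     [0.2048, 0.1925, 0.1543, 0.1164, 0.09869, 0.08415],
    "OAMP":     [0.1657, 0.1492, 0.1230, 0.09512, 0.08152, 0.07263],
    "LVAMP":    [0.2001, 0.1817, 0.1441, 0.1157, 0.08988, 0.07840],
    "OAMPNet2": [0.1290, 0.1190, 0.09797, 0.07888, 0.05861, 0.05035],
}

def first_snr_below(values, threshold=0.1):
    """Smallest tested SNR (dB) at which the SER is below the threshold."""
    for snr, value in zip(snr_db, values):
        if value < threshold:
            return snr
    return None  # threshold never reached in the tested range

crossings = {name: first_snr_below(values) for name, values in ser.items()}
# crossings == {"MMSE": 23, "VAMP": 22, "OAMP": 21, "LVAMP": 22, "OAMPNet2": 20}
```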
4.2 Experiment 2
The second experiment is conducted on an unusual MIMO system composed of 32
transmitters and 32 receivers. Usually, for medium-large MIMO systems like
this one, the number of receivers should be greater than the number of
transmitters, otherwise the signals cannot be detected and separated
correctly. Indeed, as can be seen from the numerical results (again expressed
with four significant digits) in Table 4.2, the SER values are very high,
meaning that detection is not easy. The interesting results here are those of
VAMP and LVAMP, which this time improve on the performance of MMSE more
clearly, as also shown in Figure 4.2. Again the DL-based algorithms perform
better than their original versions, even if this time LVAMP and VAMP are
very close in terms of SER. OAMPNet2 again provides the best results, and it
is the only algorithm that reaches an SER below 0.5, at an SNR of 23 dB. The
results of the experiment are affected not only by the shape of the system,
but also by the modulation scheme, which is QAM-64 instead of the QAM-16 used
in the previous experiment.
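Table 4.2: Experiment 2 numerical results: the cells represent the SER value
for the different algorithms at different SNR values, in a 32 receivers and
32 transmitters MIMO system, trained and tested on i.i.d. Gaussian channel
matrices using QAM-64 as modulation scheme
Algorithm  18 dB   19 dB   20 dB   21 dB   22 dB   23 dB
MMSE       0.7915  0.7723  0.7462  0.7252  0.6959  0.6653
VAMP       0.7422  0.7227  0.6856  0.6612  0.6262  0.5876
OAMP       0.7106  0.6861  0.6470  0.6172  0.5690  0.5242
LVAMP      0.7387  0.7167  0.6820  0.6496  0.6240  0.5831
OAMPNet2   0.6701  0.6365  0.5958  0.5474  0.5091  0.4584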
4.3 Experiment 3
The third experiment is based on a 32 × 32 MIMO system with the QAM-64
modulation scheme. The algorithms have been trained using Kronecker channel
matrices with different combinations of receiver and transmitter correlation
parameters. They have then been tested on the Kronecker channel model in two
different sub-experiments with different pairs of correlation parameters.
4.3.1 Experiment 3a
The first combination of correlation parameters is ρR = ρT = 0.3, which can
be seen as a Kronecker channel model with medium correlation. The results are
quite similar to those of the previous experiment, with a small drop in
performance due to the more complex and realistic channel model, as can be
seen in Table 4.3. In particular, the distance between the curves and their
shape in Figure 4.3 are quite similar to those of the previous experiment.
This is an interesting result: despite a small drop in accuracy, the
algorithms work and keep characteristics similar to the ones they had with a
simpler channel model. This means that the algorithms are robust also when
the channel conditions are more complex, as in the case of a Kronecker
channel model with medium correlation.
Table 4.3: Experiment 3a numerical results: the cells represents the SER
value for the different algorithms at different SNR values, in a 32 receivers
and 32 transmitters MIMO system, trained with Kronecker channel matrices
and tested on Kronecker channel model with ρR = ρT = 0.3 using QAM-64
as modulation scheme
18 dB 19 dB 20 dB 21 dB 22 dB 23 dB
MMSE 0.8135 0.8002 0.7752 0.7575 0.7327 0.7043
VAMP 0.7711 0.7542 0.7250 0.7003 0.6717 0.6342
OAMP 0.7393 0.7241 0.6864 0.6636 0.6221 0.5850
LVAMP 0.7698 0.7504 0.7197 0.6924 0.6669 0.6321
OAMPNet2 0.7125 0.6819 0.6426 0.6036 0.5663 0.5251
Figure 4.3: Experiment 3a graphical results: the graph represents the SER
value for the different algorithms at different SNR values, in a 32 receivers and
32 transmitters MIMO system, trained with Kronecker channel matrices and
tested on a Kronecker channel model with ρR = ρT = 0.3 using QAM-64 as
modulation scheme
4.3.2 Experiment 3b
In the second experiment with this third MIMO configuration, the algorithms
are tested on a Kronecker channel model with high correlation. In particular,
the correlation parameters are ρR = ρT = 0.5. The high correlation impacts
both the accuracy and the shape of the curves. In fact, the drop in performance
with respect to the second experiment is more evident. Moreover, the OAMPNet2
curve in Figure 4.4 is closer to the one of OAMP, and for the first time
LVAMP performs slightly worse than VAMP for some SNR values (18 dB and
20 dB), as can be seen in Table 4.4. The larger drop in performance for the
DL-based algorithms is due to the training phase, which uses different
combinations of correlation parameters and not only highly correlated matrices.
In fact, for training the models, ρR and ρT can take the values 0.1, 0.3, 0.5
and 0.7. Different combinations of these values have been used in order to
train the models on differently correlated channels.
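Table 4.4: Experiment 3b numerical results: the cells represent the SER
value for the different algorithms at different SNR values, in a 32 receivers
and 32 transmitters MIMO system, trained with Kronecker channel matrices
and tested on a Kronecker channel model with ρR = ρT = 0.5 using QAM-64
as modulation scheme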
18 dB 19 dB 20 dB 21 dB 22 dB 23 dB
MMSE 0.8476 0.8394 0.8199 0.8055 0.7860 0.7639
VAMP 0.8168 0.8021 0.7785 0.7586 0.7390 0.7097
OAMP 0.7936 0.7782 0.7504 0.7296 0.7046 0.6709
LVAMP 0.8170 0.8011 0.7794 0.7548 0.7357 0.7074
OAMPNet2 0.7790 0.7542 0.7241 0.6924 0.6611 0.6267
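Figure 4.4: Experiment 3b graphical results: the graph represents the SER
value for the different algorithms at different SNR values, in a 32 receivers and
32 transmitters MIMO system, trained with Kronecker channel matrices and
tested on a Kronecker channel model with ρR = ρT = 0.5 using QAM-64 as
modulation scheme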
4.4 Experiment 4
Also for this fourth experiment two different cases are presented. These two
cases are characterised by a training phase conducted with i.i.d. Gaussian
matrices and a testing phase based on Kronecker channel matrices. The goal
of this experiment is to verify the adaptability of the algorithms when they
are trained with a simple channel model and tested on a more complex one.
4.4.1 Experiment 4a
The first case of the fourth experiment is characterised by a testing phase
with a medium correlated Kronecker channel model with ρR = ρT = 0.3.
Contrary to the expected drop in performance, the experiment provides
results very similar to the ones of Experiment 3a, as can be seen in Table
4.5. For some SNR values, the performance of LVAMP and OAMPNet2 is even
better than the one obtained in Experiment 3a. This means that these two
networks are capable of adapting well to different channel models and
matrices. Even with a simple i.i.d. Gaussian training phase, it is possible to
obtain good results on more complex and realistic channel models. The
similarity of the results can also be seen by comparing Figure 4.5, which
describes this experiment, with Figure 4.3.
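The closeness of the two sets of results can be checked with a few lines of arithmetic. The sketch below only copies the LVAMP and OAMPNet2 rows of Table 4.3 and Table 4.5 and computes their difference at each SNR. The variable names are illustrative.

```python
import numpy as np

snr_db = [18, 19, 20, 21, 22, 23]
# SER values copied from Table 4.3 (Kronecker training, Experiment 3a).
ser_3a = {"LVAMP": [0.7698, 0.7504, 0.7197, 0.6924, 0.6669, 0.6321],
          "OAMPNet2": [0.7125, 0.6819, 0.6426, 0.6036, 0.5663, 0.5251]}
# SER values copied from Table 4.5 (i.i.d. Gaussian training, Experiment 4a).
ser_4a = {"LVAMP": [0.7700, 0.7503, 0.7196, 0.6920, 0.6668, 0.6316],
          "OAMPNet2": [0.7145, 0.6847, 0.6449, 0.6040, 0.5673, 0.5235]}

# Positive entries mean that Experiment 4a has a higher (worse) SER.
diff = {name: np.array(ser_4a[name]) - np.array(ser_3a[name]) for name in ser_3a}
for name, d in diff.items():
    print(name, np.round(d, 4), "max |diff| =", round(float(np.max(np.abs(d))), 4))
```

For both networks the absolute difference stays below 0.003 at every SNR, and at 23 dB the SER obtained with the i.i.d. Gaussian training is the lower one.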
Table 4.5: Experiment 4a numerical results: the cells represent the SER value
for the different algorithms at different SNR values, in a 32 receivers and 32
transmitters MIMO system, trained with i.i.d. Gaussian channel matrices and
tested on a Kronecker channel model with ρR = ρT = 0.3 using QAM-64 as
modulation scheme
18 dB 19 dB 20 dB 21 dB 22 dB 23 dB
MMSE 0.8135 0.8002 0.7752 0.7575 0.7327 0.7043
VAMP 0.7711 0.7542 0.7250 0.7003 0.6717 0.6342
OAMP 0.7393 0.7241 0.6864 0.6636 0.6221 0.5850
LVAMP 0.7700 0.7503 0.7196 0.6920 0.6668 0.6316
OAMPNet2 0.7145 0.6847 0.6449 0.6040 0.5673 0.5235
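Figure 4.5: Experiment 4a graphical results: the graph represents the SER
value for the different algorithms at different SNR values, in a 32 receivers and
32 transmitters MIMO system, trained with i.i.d. Gaussian channel matrices
and tested on a Kronecker channel model with ρR = ρT = 0.3 using QAM-64 as
modulation scheme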
4.4.2 Experiment 4b
As in the case of Experiment 3b, what differs in Experiment 4b from the
previous case is the correlation at the receiver and transmitter sides. In this
case ρR = ρT = 0.5, therefore the channel is highly correlated. Again, the two
DL networks perform as well as when they were trained with Kronecker
channel matrices. Despite the i.i.d. Gaussian training phase and an even more
complex testing phase, the performance remains close to that of Experiment 3b.
The two algorithms can be considered robust and can adapt easily to different
situations. Moreover, this makes it possible to rely on a simpler and faster
training phase. The numerical results are shown in Table 4.6, while a graphical
representation can be seen in Figure 4.6.
Table 4.6: Experiment 4b numerical results: the cells represent the SER value
for the different algorithms at different SNR values, in a 32 receivers and 32
transmitters MIMO system, trained with i.i.d. Gaussian channel matrices and
tested on a Kronecker channel model with ρR = ρT = 0.5 using QAM-64 as
modulation scheme
18 dB 19 dB 20 dB 21 dB 22 dB 23 dB
MMSE 0.8476 0.8394 0.8199 0.8055 0.7860 0.7639
VAMP 0.8168 0.8021 0.7785 0.7586 0.7390 0.7097
OAMP 0.7936 0.7782 0.7504 0.7296 0.7046 0.6709
LVAMP 0.8174 0.8014 0.7799 0.7549 0.7354 0.7071
OAMPNet2 0.7821 0.7571 0.7292 0.6960 0.6625 0.6288
4.5 Experiment 5
For the last group of experiments, composed of three different experiments,
the size and shape of the MIMO system change. The number of receivers is
64 and the number of transmitters is 32, therefore the shape is rectangular.
This configuration is the most common one in MIMO detection comparisons,
and it is usually selected for experiments because it represents a realistic
scenario for MIMO detection. Since the thesis aims to compare
LVAMP and OAMPNet2 with MMNet, the MMNet algorithm is also taken into
consideration for these final experiments. MMNet is used only on
this configuration because in the previous experiments the focus was only on
the new algorithms. After having studied the characteristics and behaviour
of LVAMP and OAMPNet2, a comparison with MMNet can be carried out on
this common configuration. MMNet provides good results with a low complexity,
but it can be fragile in some situations. In particular, when the training phase
is conducted offline, as in this case, the algorithm does not adapt well. The
strength of the algorithm is the possibility of being trained online, but online
training is not part of this thesis, because it would make the comparison
inconsistent.
In Experiment 5, the training phase is conducted with
i.i.d. Gaussian channel matrices, and the same channel model is used for the
testing phase. This is the most used setup when comparing
MIMO detectors. The results of the experiment are presented in Table 4.7 and
Figure 4.7. As can be seen, all the algorithms perform very
well, with SER values that are always under 0.1 except at 18 dB of
SNR. The OAMPNet2 algorithm outperforms the others, being the
only one with all SER values under 0.1 and reaching the order of $10^{-4}$ at
23 dB of SNR. The VAMP and LVAMP algorithms instead do not perform well in
this configuration, with values very similar to the ones of MMSE and curves
that overlap with the baseline.
These two algorithms seem to perform better with smaller MIMO systems.
OAMP is the second best algorithm, with very good results even though
it is not a DL-based algorithm. MMNet instead has
a peculiar curve, very different from the others. At low SNR
the curve and the SER values are very close to the ones of OAMP, but
as the SNR increases it does not follow the OAMP curve, and
its accuracy improves more slowly. The case of 23 dB is interesting:
the performance of MMNet is worse than that of all the other algorithms,
including MMSE. Compared to the others, MMNet seems to be less
adaptable to changes of SNR, while the other algorithms have more regular
curves.
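To make the SER metric and the MMSE baseline concrete, the following is a minimal Monte Carlo sketch of linear MMSE detection on an i.i.d. Gaussian channel. It is not the thesis simulator. The SNR definition, the channel normalisation, the constellation construction and all function names are assumptions introduced for this example, so its output is not expected to reproduce the values in the tables.

```python
import numpy as np

def qam_constellation(m):
    """Square M-QAM constellation normalised to unit average energy."""
    k = int(np.sqrt(m))
    levels = np.arange(-(k - 1), k, 2)
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

def mmse_ser(n_r, n_t, snr_db, m, n_trials, rng):
    """Monte Carlo SER of linear MMSE detection on an i.i.d. Gaussian channel."""
    const = qam_constellation(m)
    errors = 0
    for _ in range(n_trials):
        h = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2 * n_r)
        idx = rng.integers(0, m, size=n_t)
        x = const[idx]
        # Assumed SNR definition: E||Hx||^2 / E||n||^2, which gives this variance.
        sigma2 = (n_t / n_r) / (10 ** (snr_db / 10))
        noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_r)
                                       + 1j * rng.standard_normal(n_r))
        y = h @ x + noise
        # Linear MMSE estimate: (H^H H + sigma2 I)^{-1} H^H y.
        x_hat = np.linalg.solve(h.conj().T @ h + sigma2 * np.eye(n_t),
                                h.conj().T @ y)
        # Hard decision: nearest constellation point for each transmitter.
        idx_hat = np.argmin(np.abs(x_hat[:, None] - const[None, :]), axis=1)
        errors += np.count_nonzero(idx_hat != idx)
    return errors / (n_trials * n_t)

rng = np.random.default_rng(0)
ser = mmse_ser(n_r=64, n_t=32, snr_db=20, m=16, n_trials=50, rng=rng)
```

The SER is simply the fraction of transmitted symbols whose hard decision differs from the symbol actually sent, averaged over channel and noise realisations. The QAM-16 setting in the usage line is only an example value.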
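Table 4.7: Experiment 5 numerical results: the cells represent the SER value
for the different algorithms at different SNR values, in a 64 receivers and
32 transmitters MIMO system, trained and tested with i.i.d. Gaussian channel
matrices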
18 dB 19 dB 20 dB 21 dB 22 dB 23 dB
MMSE 0.1373 0.08935 0.04715 0.02435 0.01060 0.004058
MMNET 0.1075 0.05670 0.02833 0.01426 0.008451 0.004744
VAMP 0.1353 0.08800 0.04663 0.02423 0.01022 0.003934
OAMP 0.1008 0.05537 0.02401 0.009446 0.002904 0.0008124
LVAMP 0.1356 0.08815 0.04666 0.02454 0.01013 0.003934
OAMPNet2 0.06693 0.02404 0.006521 0.001843 0.0004999 0.0001251
4.6 Experiment 6
As for Experiment 3, Experiment 6 is divided into two cases, both
based on training and testing phases conducted with Kronecker channel
matrices. Differently from Experiment 3, in this case the MMNet algorithm
is also considered, and the size and shape of the MIMO system are the same as
in Experiment 5.
4.6.1 Experiment 6a
In the first case, the DL-based algorithms are
trained with Kronecker channel matrices and then all the algorithms are tested
on a Kronecker channel model with ρR = ρT = 0.3. The results of the experiment
are reported in Table 4.8 and Figure 4.8. What is immediately
evident is the behaviour of MMNet, which
performs very badly, with SER values above 0.85 at every SNR. This algorithm,
as said before, is fragile when trained offline, and this experiment shows the
difficulty MMNet has in adapting to a more complex scenario. The other
algorithms behave similarly to Experiment 5. Also in
this case VAMP and LVAMP have performance very close to that of
MMSE, and graphically their curves overlap. OAMP again provides good
results, while OAMPNet2 outperforms the other algorithms. As can be
seen, OAMPNet2 is the only algorithm that reaches a SER value of the order of
$10^{-4}$ at an SNR of 23 dB. The SER values are somewhat worse than the ones in
Experiment 5, but this is due to the more complex and realistic scenario in
which the experiment is conducted.
18 dB 19 dB 20 dB 21 dB 22 dB 23 dB
MMSE 0.2095 0.1469 0.09239 0.05415 0.02750 0.01172
MMNET 0.8810 0.8756 0.8698 0.8646 0.8605 0.8563
VAMP 0.2066 0.1463 0.09072 0.05430 0.02688 0.01162
OAMP 0.1659 0.09946 0.04852 0.02179 0.007393 0.002249
LVAMP 0.2088 0.1496 0.09230 0.05503 0.02669 0.01184
OAMPNet2 0.1330 0.06125 0.01971 0.005835 0.0009685 0.0003750
4.6.2 Experiment 6b
The second case changes the correlation parameters with respect to Experiment
6a: in this case ρR = ρT = 0.5. The results of the experiment are
presented in Table 4.9 and Figure 4.9. This time MMNet has SER values
very close to 1, meaning that almost all the estimates are wrong. With a
highly correlated Kronecker channel model, MMNet is not able to estimate
the transmitted signal if trained offline. An interesting result of the experiment
concerns OAMPNet2. For the first time, its behaviour is
very close to the one of OAMP for SNR equal to 18, 19 and 20 dB. For the
last three values of SNR, instead, the gap with respect to the other algorithms
widens. Again, VAMP and LVAMP have accuracy very close to the one of MMSE,
overlapping graphically. OAMP is also closer to MMSE at 18 dB of SNR,
but then its performance improves.
18 dB 19 dB 20 dB 21 dB 22 dB 23 dB
MMSE 0.37177 0.3000 0.2257 0.1615 0.1064 0.06359
MMNET 0.9762 0.9761 0.9763 0.9762 0.9762 0.9762
VAMP 0.3655 0.2960 0.2228 0.1581 0.1053 0.06249
OAMP 0.3381 0.2497 0.1673 0.09584 0.04526 0.01956
LVAMP 0.3678 0.2958 0.2243 0.1592 0.1043 0.06337
OAMPNet2 0.3227 0.2333 0.1408 0.06684 0.02191 0.007268
4.7 Experiment 7
The final experiment is based on a training phase conducted with i.i.d.
Gaussian channel matrices, after which the algorithms are tested on a Kronecker
channel model. Also this time, two cases are analysed. The goal of this last
experiment is to verify the adaptability of the algorithms when trained on a
simple channel model and tested on a more complex one.
4.7.1 Experiment 7a
The first case uses ρR = ρT = 0.3 as correlation parameters for the Kronecker
channel model. The results of the experiment are shown in Table 4.10 and
Figure 4.10. The results are very similar to the ones obtained in Experiment
6a, with some SER values that are even better. Again, MMNet is not able to
adapt to a more complex channel model, having high SER values for each SNR
considered. VAMP and LVAMP perform like MMSE, as seen in the previous
experiments with the 32 × 64 MIMO configuration. OAMPNet2, despite the
training phase conducted with i.i.d. Gaussian channel matrices, outperforms
the other algorithms. Again it reaches SER values of the order of $10^{-4}$ for
SNR equal to 22 dB and 23 dB.
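Table 4.10: Experiment 7a numerical results: the cells represent the SER
value for the different algorithms at different SNR values, in a 64 receivers and
32 transmitters MIMO system, trained with i.i.d. Gaussian channel matrices
and tested on a Kronecker channel model with ρR = ρT = 0.3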
18 dB 19 dB 20 dB 21 dB 22 dB 23 dB
MMSE 0.2095 0.1469 0.09239 0.05415 0.02750 0.01172
MMNET 0.8519 0.8538 0.8528 0.8571 0.8549 0.8559
VAMP 0.2066 0.1463 0.09072 0.05430 0.02688 0.01162
OAMP 0.1659 0.09946 0.04852 0.02179 0.007393 0.002249
LVAMP 0.2088 0.1493 0.09212 0.05503 0.02657 0.01162
OAMPNet2 0.1352 0.06406 0.0208 0.005400 0.0009372 0.0003124
4.7.2 Experiment 7b
The second case, and last experiment, is conducted by testing the algorithms on
a Kronecker channel model with correlation parameters ρR = ρT = 0.5,
therefore a highly correlated channel. The DL-based algorithms are again
trained with i.i.d. Gaussian channel matrices. The results of the experiment
are very similar to the ones of Experiment 6b and can be seen in Table
4.11 and Figure 4.11. MMNet again misses almost all the estimates,
achieving a SER very close to 1 for all the SNR values. Also in this case,
VAMP, LVAMP and MMSE perform the same, with overlapping curves in the
graph. OAMPNet2 is also in this case the best algorithm, but for low SNR
values it performs like OAMP, while for higher SNR values the gap increases.
OAMP is likewise closer to MMSE for the lowest SNR values and improves as
the SNR increases.
18 dB 19 dB 20 dB 21 dB 22 dB 23 dB
MMSE 0.37177 0.3000 0.2257 0.1615 0.1064 0.06359
MMNET 0.9751 0.9752 0.9754 0.9755 0.9755 0.9754
VAMP 0.3655 0.2960 0.2228 0.1581 0.1053 0.06249
OAMP 0.3381 0.2497 0.1673 0.09584 0.04526 0.01956
LVAMP 0.3682 0.2960 0.2247 0.1591 0.1040 0.06329
OAMPNet2 0.3296 0.2413 0.1478 0.07555 0.02642 0.008451
4.8 Complexity Analysis

The algorithms considered in the experiments have different
complexity. In this section an analysis in terms of computational complexity
is conducted. MMSE has a complexity of $O(N_r^3)$ because it is dominated
by the inversion of the channel matrix. VAMP and LVAMP have
a complexity of $O(T N_r^3)$ due to the SVD, but with some
newer implementations based on an economy SVD it can be reduced to
$O(2 T N_t N_r)$. For VAMP, $T$ represents the number of iterations, while for
LVAMP it is the number of layers of the network. For OAMP and
OAMPNet2, the matrix inversion is again dominant in the detection, therefore
the complexity is $O(T N_r^3)$, where $T$ is the number of iterations for OAMP
and the number of layers of the network for OAMPNet2. Finally, MMNet has a
complexity of $O(T N_r^2)$ for the detection. The number of iterations or layers
needed for the algorithms to converge is different, and it impacts the
complexity. MMSE requires a single step since it is not an iterative algorithm.
VAMP and LVAMP converge in 5 or 6 iterations/layers. OAMP and OAMPNet2
converge faster, in 4 or 5 iterations/layers. MMNet is the
slowest algorithm of this group: it needs from 10 to 14 layers to converge,
which impacts the complexity of the algorithm. A summary of this complexity
analysis is shown in Table 4.12, where each column represents an algorithm,
the first row the computational complexity and the second row the number of
iterations/layers $T$ for convergence.
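The role of the economy SVD can be illustrated with a short NumPy sketch. It is not the VAMP implementation of the thesis. The real-valued channel, the sizes and the parameter gamma are assumptions chosen only to show that, once the factorisation is available, a regularised least-squares step reduces to element-wise scaling and matrix-vector products.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_t = 64, 32
H = rng.standard_normal((n_r, n_t))   # illustrative real-valued channel
y = rng.standard_normal(n_r)          # illustrative received vector

# Economy SVD: U is n_r x n_t, s has n_t entries, Vh is n_t x n_t.
U, s, Vh = np.linalg.svd(H, full_matrices=False)

# Projection of the received vector onto the column space of H.
y_tilde = U.T @ y

# Regularised least-squares estimate (H^T H + gamma I)^{-1} H^T y, computed
# without any matrix inversion: scaling by singular values, then one product.
gamma = 0.1  # hypothetical precision parameter
x_ls = Vh.T @ (s * y_tilde / (s ** 2 + gamma))
```

The factorisation is computed once per channel matrix, while the last two lines are the kind of operation that would be repeated at every iteration or layer.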
Table 4.12: Complexity and number of iterations/layers for convergence of
each algorithm
            MMSE         MMNet          VAMP              LVAMP             OAMP           OAMPNet2
Complexity  $O(N_r^3)$   $O(T N_r^2)$   $O(2T N_r N_t)$   $O(2T N_r N_t)$   $O(T N_r^3)$   $O(T N_r^3)$
T           1            10-14          5-6               5-6               4-5            4-5
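To give a feeling for the orders of magnitude in Table 4.12, the leading terms can be evaluated for the 64 receivers and 32 transmitters configuration. The sketch below simply plugs numbers into the expressions of the table, using the upper end of each range for T. Constant factors hidden by the O-notation are ignored, so the figures are only indicative, and the function name is illustrative.

```python
def op_counts(n_r, n_t):
    """Leading-order terms from Table 4.12, using the largest T of each range."""
    t = {"MMSE": 1, "MMNet": 14, "VAMP": 6, "LVAMP": 6, "OAMP": 5, "OAMPNet2": 5}
    return {
        "MMSE": n_r ** 3,
        "MMNet": t["MMNet"] * n_r ** 2,
        "VAMP": 2 * t["VAMP"] * n_r * n_t,
        "LVAMP": 2 * t["LVAMP"] * n_r * n_t,
        "OAMP": t["OAMP"] * n_r ** 3,
        "OAMPNet2": t["OAMPNet2"] * n_r ** 3,
    }

counts = op_counts(64, 32)
for name, c in counts.items():
    # Ratio to MMSE shows the relative cost under these simplifications.
    print(f"{name:9s} {c:>9d}  ({c / counts['MMSE']:.3f} x MMSE)")
```

Under these simplifications OAMP and OAMPNet2 cost five times the MMSE term, while the economy-SVD form of VAMP and LVAMP and MMNet stay below it.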
4.9 Summary of results

In this section, through Table 4.13, all the positive and negative aspects of the
algorithms are summarised.
Table 4.13: Summary of results analysis
Algorithm  Pros                                      Cons
VAMP       · lower complexity than MMSE              · same performance as MMSE on
           · better performance than MMSE              medium-large MIMO systems
             on small MIMO systems
           · few iterations to converge
           · adaptability
           · good performance on complex
             channel model
LVAMP      · lower complexity than MMSE              · same performance as MMSE on
           · better performance than MMSE              medium-large MIMO systems
             on small MIMO systems                   · it does not improve VAMP in
           · few layers to converge                    most cases
           · adaptability, also with simple
             training phase
           · [...] channel model
OAMP       · better performance than VAMP,           · higher complexity than MMSE
             LVAMP and MMSE always
           · few iterations to converge
           · adaptability
           · [...] channel model
OAMPNet2   · better performance than other           · higher complexity than MMSE
             algorithms always
           · few layers to converge
           · adaptability, also with simple
             training phase
           · excellent performance on complex
             channel model
           · improves OAMP's performance
Chapter 5
Conclusions and Future work
In this chapter a final analysis of the results of the thesis is presented.
First, the conclusions are presented in Section 5.1. After that, possible
future work is listed in Section 5.2.
5.1 Conclusions
The aim of the thesis was to present, implement, test and compare new
techniques based on DL in order to solve the MIMO detection problem. The
algorithms that have been analysed are VAMP and its DL version called
LVAMP, and OAMP and the deep network based on it, OAMPNet2. The research
question to answer in the thesis was:
Research Question: Can LVAMP and OAMPNet2 be considered as sub-optimal
solutions to the MIMO detection problem, and which one performs better
in terms of complexity and SER on different MIMO scenarios?
Thanks to several experiments, it has been possible to answer this
question.
First of all, both LVAMP and OAMPNet2 can be considered
sub-optimal solutions for the MIMO detection problem. In fact, LVAMP
can be built with a complexity lower than MMSE, and it obtains SER values
that are comparable to or better than those of MMSE in all the scenarios
considered in the experiments. OAMPNet2 is also a sub-optimal solution to the
problem because it outperforms MMSE in all the scenarios considered, with a
complexity that is only a few times greater than that of MMSE, thanks also to
a very fast convergence.
In terms of complexity, LVAMP is better than OAMPNet2 and also than
MMNet. Its performance is very interesting given this complexity and
number of layers for convergence. Therefore, it is well suited to a MIMO system
in which the complexity should be low, but with an accuracy at least equal to
the one of MMSE.
As for the SER performance, OAMPNet2 performs remarkably well,
outperforming all the algorithms considered in every scenario.
This network is well suited to a MIMO system in which the detection does not
represent the complexity bottleneck. It is a very accurate algorithm with
performance often close to that of ML, with a complexity only four or five times
the one of MMSE. It could improve MIMO detection, becoming
a promising candidate as detector.
Another important insight discovered thanks to the experiments is that
the two deep networks are able to adapt very well to different conditions.
In fact they perform well on both i.i.d. Gaussian and Kronecker channel
models. They are also able to perform well on realistic channel models like the
Kronecker one, with both medium and high correlation parameters. Moreover,
they can perform well on this type of channel, also with a simple training phase
conducted with i.i.d. Gaussian channel matrices.
Therefore, this thesis contributes two new MIMO
detectors that can have a practical implementation in real MIMO systems.
In fact, they address the complexity-performance trade-off: LVAMP offers
performance like MMSE but with less complexity, while OAMPNet2 offers very
good performance with a complexity only a few times higher than MMSE. The
characteristic that can most help these two algorithms in finding a practical
implementation is their adaptability.
All the goals of the thesis have been met successfully.
1. The simulator for conducting the experiments has been developed in
time and correctly in order to perform the experiments.
2. The algorithms have been found and studied in both original and
alternative versions in order to find the best for conducting the thesis.
3. The algorithms have been implemented correctly, since they provided
results consistent with their original works.
4. All the experiments were successfully conducted and they provided very
interesting insights for the analysis.
5. A deep analysis of the results of each experiment has been conducted,
showing the characteristics of the single algorithms and comparisons among
them.
The drawback of this work is that, in order to obtain the performance achieved
by OAMPNet2, a high complexity is still needed despite the deep learning
approach. At the same time, lowering the complexity thanks to LVAMP does
not bring a strong improvement in performance. Therefore, although deep
learning can be an interesting and promising solution for MIMO detection, a
deep learning approach that better addresses the complexity-performance
trade-off is still missing.
5.2 Future work

In this section, the future work and directions that can follow this thesis
are presented.
The first direction is to look for new methods
to solve the MIMO detection problem. DL is bringing promising results
in very different fields, and it can also help in improving the
solutions to MIMO detection. New algorithms should be studied, implemented,
tested and compared with the existing solutions in order to find the one that
best handles the complexity-performance trade-off.
A second direction can be to compare the different
versions of VAMP and LVAMP, verifying whether they all perform the same or
whether there is a version that provides better results than the others. This
can be done by using different approaches to compute the SVD for VAMP in
SVD-form. Otherwise, different denoising functions can be tested for the
LMMSE-form of the VAMP algorithm.
Rethinking the OAMPNet2 algorithm in order to lower its
complexity can be a hard but useful future work towards a very
competitive detector. Some deep learning approaches can be adopted to
approximately invert the channel matrix.
Another interesting future work is to apply all the algorithms presented
in the thesis to the 3GPP channel model, which is the most realistic scenario
for simulating link layer transmission in MIMO systems. It can be interesting
to verify the adaptability of LVAMP and OAMPNet2 to this very complex
channel model.
In the thesis, only a limited set of modulation schemes and SNR values
has been tested. A simple future work can be testing the algorithms
using other QAM orders as modulation scheme. Moreover, the experiments can
be conducted on a wider range of SNR values in order to reveal unexpected
behaviour in the curves of the algorithms. In the same direction, different
sizes of MIMO system can be tested, evaluating the algorithms on bigger
systems and with different $N_t/N_r$ ratios.
Another direction can be changing the loss function during the training
phase of the DL-based algorithms. Different loss functions can bring different
results and new insights on LVAMP and OAMPNet2.
Finally, even if in the thesis it has already been done, trying different
numbers of iterations/layers for the four proposed algorithms can help in
finding the best trade-off between performances and complexity.
REFERENCES | 79
References
[1] M. Cirkic, “Efficient mimo detection methods,” Linköping studies

in science and technology, Jan 2014. doi: 10.3384/diss.diva-103675.
[Online]. Available: http://www.diva-portal.org/smash/record.jsf?pid=
diva2:690022&dswid=3455
[2] M. Dohler and T. Nakamura, 5G Mobile and Wireless

Communications Technology, A. Osseiran, J. F. Monserrat, and
P. Marsch, Eds. Cambridge University Press, 2016. [Online].
Available: https://www.researchgate.net/publication/305882445_5G_
Mobile_and_Wireless_Communications_Technology
[3] A. Scotti, “Graph neural networks and learned approximate message

passing algorithms for massive mimo detection,” 2020. [Online].
Available: http://kth.diva-portal.org/smash/record.jsf?dswid=-8151&
pid=diva2:1479310
[4] S. Yang and L. Hanzo, “Fifty years of mimo detection: The road
to large-scale mimos,” IEEE Communications Surveys & Tutorials,
vol. 17, no. 4, p. 1941–1988, 2015. doi: 10.1109/comst.2015.2475242.
[Online]. Available: https://arxiv.org/abs/1507.05138
[5] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “A model-

driven deep learning network for mimo detection,” IEEE
Transactions on Signal Processing, vol. 68, p. 1702–1715,
Feb 2020. doi: 10.1109/TSP.2020.2976585. [Online]. Available:
https://ieeexplore.ieee.org/abstract/document/9018199
[6] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing

algorithms for compressed sensing: I. motivation and construction,” in
2010 IEEE Information Theory Workshop on Information Theory (ITW
2010, Cairo), 2010. doi: 10.1109/ITWKSPS.2010.5503193 pp. 1–5.
[Online]. Available: https://ieeexplore.ieee.org/document/5503193
80 | REFERENCES
[7] A. Maleki, “Approximate message passing algorithms for compressed

sensing,” Sep 2011. [Online]. Available: https://searchworks.stanford.
edu/view/8813226
[8] M. Borgerding, P. Schniter, and S. Rangan, “Amp-inspired deep

networks for sparse linear inverse problems,” IEEE Transactions on
Signal Processing, vol. 65, no. 16, p. 4293–4308, Aug 2017. doi:
10.1109/tsp.2017.2708040. [Online]. Available: http://dx.doi.org/10.
1109/TSP.2017.2708040
[9] E. Larsson, “Mimo detection methods: How they work [lecture

notes],” IEEE Signal Processing Magazine, vol. 26, no. 3, pp.
91–95, 2009. doi: 10.1109/MSP.2009.932126. [Online]. Available:
https://ieeexplore.ieee.org/document/4815548
[10] L. Li and W. Meng, Detection for Uplink Massive

MIMO System: A Survey. Springer International Publishing,
11 2019, pp. 287–299. ISBN 978-3-030-36401-4. [Online].
Available: https://www.researchgate.net/publication/337600864_
Detection_for_Uplink_Massive_MIMO_System_A_Survey
[11] N. Shlezinger, R. Fu, and Y. C. Eldar, “Deep soft interference

cancellation for mimo detection,” in ICASSP 2020 - 2020 IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP), 2020. doi: 10.1109/ICASSP40776.2020.9054732 pp.
8881–8885. [Online]. Available: https://ieeexplore.ieee.org/document/
9054732
[12] C. Jeon, R. Ghods, A. Maleki, and C. Studer, “Optimality of large mimo

detection via approximate message passing,” 2015 IEEE International
Symposium on Information Theory (ISIT), June 2015. doi: 1510.06095.
[Online]. Available: http://arxiv.org/abs/1510.06095
[13] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing

algorithms for compressed sensing,” Proceedings of the National
Academy of Sciences, vol. 106, no. 45, p. 18914–18919, Oct
2009. doi: 10.1073/pnas.0909892106. [Online]. Available: http:
//dx.doi.org/10.1073/pnas.0909892106
[14] J. Ma and L. Ping, “Orthogonal amp,” 2017. [Online]. Available:

https://arxiv.org/abs/1602.06509
REFERENCES | 81
[15] H. Ye, G. Y. Li, and B. H. Juang, “Power of deep learning for

channel estimation and signal detection in ofdm systems,” EEE Wireless
Communications Letters, vol. 7, no. 1, 2017. doi: 1708.08514. [Online].
Available: http://arxiv.org/abs/1708.08514
[16] Q. Zou, X. Tan, M. Liu, and L. Ma, “Main-branch structure iterative

detection using approximate message passing for uplink large-scale
multiuser mimo systems,” International Journal of Antennas and
Propagation, vol. 2016, 2016. doi: 10.1155/2016/2832584. [Online].
Available: https://www.hindawi.com/journals/ijap/2016/2832584/
[17] S. Rangan, “Generalized approximate message passing for estimation

with random linear mixing,” 2012. [Online]. Available: https:
//arxiv.org/abs/1010.5141
[18] P. Zheng, Y. Zeng, Z. Liu, and Y. Gong, “Deep learning based

trainable approximate message passing for massive mimo detection,”
ICC 2020 - 2020 IEEE International Conference on Communications
(ICC), 2020. doi: 10.1109/icc40277.2020.9148845. [Online]. Available:
[19] S. Wu, L. Kuang, Z. Ni, J. Lu, D. Huang, and Q. Guo, “Low-

complexity iterative detection for large-scale multiuser mimo-ofdm
systems using approximate message passing,” IEEE Journal of
Selected Topics in Signal Processing, vol. 8, no. 5, pp. 902–
915, 2014. doi: 10.1109/JSTSP.2014.2313766. [Online]. Available:
[20] Z. Zhang, X. Cai, C. Li, C. Zhong, and H. Dai, “One-bit

quantized massive mimo detection based on variational approximate
message passing,” IEEE Transactions on Signal Processing,
vol. PP, pp. 1–1, 12 2017. doi: 10.1109/TSP.2017.2786256.
[Online]. Available: https://www.researchgate.net/publication/
322016114_One-Bit_Quantized_Massive_MIMO_Detection_Based_
on_Variational_Approximate_Message_Passing
[21] X. Liu and Y. Li, “Deep mimo detection based on belief propagation,”
in 2018 IEEE Information Theory Workshop (ITW), 2018. doi:
10.1109/ITW.2018.8613336 pp. 1–5. [Online]. Available: https://
ieeexplore.ieee.org/document/8613336
82 | REFERENCES
[22] “The 17 goals | sustainable development,” accessed: 2021-12-18.

[Online]. Available: https://sdgs.un.org/goals
[23] G. Van Rossum and F. L. Drake, Python 3 Reference Manual. Scotts

Valley, CA: CreateSpace, 2009. ISBN 1441412697. [Online]. Available:
https://www.python.org/
[24] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S.

Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow,
A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser,
M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray,
C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar,
P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals,
P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng,
“TensorFlow: Large-scale machine learning on heterogeneous systems,”
2015, software available from tensorflow.org. [Online]. Available:
https://www.tensorflow.org/
[25] D. Gesbert, M. Shafi, D. shan Shiu, P. Smith, and A. Naguib, “From

theory to practice: an overview of mimo space-time coded wireless
systems,” IEEE Journal on Selected Areas in Communications, vol. 21,
no. 3, pp. 281–302, 2003. doi: 10.1109/JSAC.2003.809458. [Online].
Available: https://ieeexplore.ieee.org/document/1192168
[26] S. Noh, M. D. Zoltowski, Y. Sung, and D. J. Love, “Pilot beam

pattern design for channel estimation in massive mimo systems,” IEEE
Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp.
787–801, 2014. doi: 10.1109/JSTSP.2014.2327572. [Online]. Available:
[27] F. Ademaj, M. Taranetz, and M. Rupp, “3gpp 3d mimo channel model:

a holistic implementation guideline for open source simulation tools,”
EURASIP Journal on Wireless Communications and Networking, vol.
2016, 02 2016. doi: 10.1186/s13638-016-0549-9. [Online]. Available:
https://www.researchgate.net/publication/295248394_3GPP_3D_
MIMO_channel_model_a_holistic_implementation_guideline_for_
open_source_simulation_tools
[28] B. Mondal, T. Thomas, E. Visotsky, F. Vook, A. Ghosh, Y.-H. Nam, Y. li,

J. Zhang, M. Zhang, Q. Luo, Y. Kakishima, and K. Kitao, “3d channel
model in 3gpp,” Communications Magazine, IEEE, vol. 53, no. 3,
REFERENCES | 83
pp. 16–23, 02 2015. doi: 10.1109/MCOM.2015.7060514. [Online].

Available: https://ieeexplore.ieee.org/document/7060514
[29] G. Stuber, J. Barry, S. McLaughlin, Y. Li, M. Ingram,
and T. Pratt, “Broadband MIMO-OFDM wireless communications,”
Proceedings of the IEEE, vol. 92, no. 2, pp. 271–294,
2004. doi: 10.1109/JPROC.2003.821912. [Online]. Available:
[30] J. J, “MIMO-OFDM for 4G wireless systems,” International
Journal of Engineering Science and Technology, vol. 2, 07
2010. [Online]. Available:
https://www.researchgate.net/publication/50315321_MIMO-OFDM_for_4G_Wireless_Systems
[31] P. Patil, M. R. Patil, S. Itraj, and U. L. Bomble, “A review on MIMO
OFDM technology basics and more,” in 2017 International Conference on
Current Trends in Computer, Electrical, Electronics and Communication
(CTCEEC), 2017, pp. 119–124. doi: 10.1109/CTCEEC.2017.8455114.
[32] M. N. Aarab and O. Chakkor, “MIMO-OFDM for wireless systems: An
overview,” in Advanced Intelligent Systems for Sustainable Development,
M. Ezziyyani, Ed. Cham: Springer International Publishing, 2020,
pp. 185–196. [Online]. Available:
https://link.springer.com/chapter/10.1007/978-3-030-33103-0_19
[33] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message
passing,” CoRR, vol. abs/1610.03082, 2016. [Online]. Available:
http://arxiv.org/abs/1610.03082
[34] X. Tan, W. Xu, Y. Be’ery, Z. Zhang, X. You, and C. Zhang, “Improving
massive MIMO belief propagation detector with deep neural network,”
2018. [Online]. Available: https://arxiv.org/abs/1804.01002
[35] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing
algorithms for compressed sensing: II. analysis and validation,” CoRR,
vol. abs/0911.4222, 2009. [Online]. Available:
http://arxiv.org/abs/0911.4222
[36] M. Khani, M. Alizadeh, J. Hoydis, and P. Fleming, “Adaptive
neural signal detection for massive MIMO,” IEEE Transactions on
Wireless Communications, vol. 19, no. 8, pp. 5635–5648, 2020. doi:
10.1109/TWC.2020.2996144. [Online]. Available:
https://ieeexplore.ieee.org/abstract/document/9103314
[37] S. Zhang, M. Zhang, S. Jin, and C.-K. Wen, “Low-complexity
detection for MIMO C-FBMC using orthogonal approximate message
passing,” IEEE Signal Processing Letters, vol. 26, no. 1, pp.
34–38, 2019. doi: 10.1109/LSP.2018.2879545. [Online]. Available:
[38] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “A model-driven deep learning
network for MIMO detection,” 2018 IEEE Global Conference on Signal
and Information Processing (GlobalSIP), pp. 584–588, 2018. [Online].
Available: https://arxiv.org/abs/1809.09336
[39] S. Lyu and C. Ling, “Hybrid vector perturbation precoding: The
blessing of approximate message passing,” IEEE Transactions
on Signal Processing, vol. 67, no. 1, pp. 178–193, 2019.
doi: 10.1109/TSP.2018.2877205. [Online]. Available:
https://ieeexplore.ieee.org/document/8501573
[40] S. A. Hodudi Atigh, J. Pourrostam, and B. M. Tazeh kand, “Uplink
massive MIMO detector based on vector approximate message passing,”
in 2020 10th International Conference on Computer and Knowledge
Engineering (ICCKE), 2020, pp. 388–391. doi: 10.1109/ICCKE50421.2020.9303611.
[Online]. Available: https://ieeexplore.ieee.org/document/9303611
[41] M. H. Siddiqui, K. Khurshid, I. Rashid, A. A. Khan, and K. Ahmed,
“Optimal massive MIMO detection for 5G communication systems
via hybrid n-bit heuristic assisted-VBLAST,” IEEE Access, vol. 7,
pp. 173 646–173 656, 2019. doi: 10.1109/ACCESS.2019.2949247.
[42] Z. Wang, “Iterative detection and decoding with PIC algorithm
for MIMO-OFDM systems,” IJCNS, vol. 2, pp. 351–356, 01
2009. doi: 10.4236/ijcns.2009.25038. [Online]. Available:
https://www.researchgate.net/publication/220099266_Iterative_Detection_and_Decoding_with_PIC_Algorithm_for_MIMO-OFDM_Systems
[43] Y. Bai, W. Chen, J. Chen, and W. Guo, “Deep learning
methods for solving linear inverse problems: Research directions
and paradigms,” Signal Processing, vol. 177, p. 107729, Dec
2020. doi: 10.1016/j.sigpro.2020.107729. [Online]. Available:
http://dx.doi.org/10.1016/j.sigpro.2020.107729
[44] N. Samuel, T. Diskin, and A. Wiesel, “Learning to detect,” IEEE
Transactions on Signal Processing, vol. 67, no. 10, pp. 2554–2564,
May 2019. doi: 10.1109/tsp.2019.2899805. [Online]. Available:
http://dx.doi.org/10.1109/TSP.2019.2899805
[45] T. Diskin, N. Samuel, and A. Wiesel, “Deep MIMO detection,”
2017 IEEE 18th International Workshop on Signal Processing
Advances in Wireless Communications (SPAWC), pp. 1–5, 2017.
doi: 10.1109/SPAWC.2017.8227772. [Online]. Available:
https://arxiv.org/abs/1706.01151
[46] G. Gao, C. Dong, and K. Niu, “Sparsely connected neural network
for massive MIMO detection,” in 2018 IEEE 4th International
Conference on Computer and Communications (ICCC), 2018, pp. 397–402.
doi: 10.1109/CompComm.2018.8780959. [Online]. Available:
[47] J. D. Hunter, “Matplotlib: A 2D graphics environment,” Computing
in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007. doi:
10.1109/MCSE.2007.55. [Online]. Available: https://matplotlib.org/
Appendix A
Extra experiments
In this appendix, the results of the experiments conducted with 16 transmitters
and 32 receivers are shown. As with the 32 × 32 configuration, three
different experiments are run for the 16 × 32 configuration. In the first
experiment, the algorithms are trained on the i.i.d. Gaussian channel model
and tested on the same channel model. In the second experiment, the
algorithms are trained on Kronecker channel matrices and tested first on a
Kronecker channel model with ρR = ρt = 0.3, and then on one with
ρR = ρt = 0.5. Finally, in the third experiment the algorithms are trained
on the i.i.d. Gaussian channel model, and the testing phase is again split
into two cases: in the first case the channel model is Kronecker with
ρR = ρt = 0.3, while in the second case ρR = ρt = 0.5. For all the
experiments, the modulation scheme is QAM-64 and the SNR values range from
18 dB to 23 dB in steps of 1 dB. Only the graphical results are shown in
this appendix, without the numerical results or a description of them.
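The channel setups described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used for the experiments: it assumes the exponential correlation model, in which entry (i, j) of each correlation matrix is ρ^|i−j|, and it uses Cholesky factors as matrix square roots. The function names and the fixed random seed are hypothetical.

```python
import numpy as np


def iid_gaussian_channel(n_rx, n_tx, rng):
    """Complex i.i.d. Gaussian channel with unit-variance entries."""
    real = rng.standard_normal((n_rx, n_tx))
    imag = rng.standard_normal((n_rx, n_tx))
    return (real + 1j * imag) / np.sqrt(2.0)


def exponential_correlation(n, rho):
    """Assumed correlation model: entry (i, j) equals rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def kronecker_channel(n_rx, n_tx, rho_r, rho_t, rng):
    """Kronecker model: correlate an i.i.d. channel on both sides."""
    h_w = iid_gaussian_channel(n_rx, n_tx, rng)
    # Cholesky factors L satisfy L @ L.T == R, so they act as square roots.
    l_r = np.linalg.cholesky(exponential_correlation(n_rx, rho_r))
    l_t = np.linalg.cholesky(exponential_correlation(n_tx, rho_t))
    return l_r @ h_w @ l_t.conj().T


rng = np.random.default_rng(0)          # hypothetical seed
snr_db_values = np.arange(18, 24)       # 18 dB to 23 dB, step 1 dB
h = kronecker_channel(32, 16, 0.3, 0.3, rng)  # 32 receivers, 16 transmitters
```

Setting both correlation coefficients to 0 makes the correlation matrices equal to the identity, so the Kronecker channel reduces to the i.i.d. Gaussian one.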
Experiment A-1
Figure A.1: Experiment A-1 graphical results: the graph represents the SER
Experiment A-2a
Figure A.2: Experiment A-2a graphical results: the graph represents the SER
modulation scheme
Experiment A-2b
Figure A.3: Experiment A-2b graphical results: the graph represents the SER
modulation scheme
Experiment A-3a
Figure A.4: Experiment A-3a graphical results: the graph represents the SER
modulation scheme
Experiment A-3b
Figure A.5: Experiment A-3b graphical results: the graph represents the SER
For DIVA
{
"Author1": {
"Last name": "Pozzoli",
"First name": "Andrea",
"Local User Id": "apozzoli",
"E-mail": "apozzoli@kth.se",
"ORCiD": "0000-0001-6871-9968",
"organisation": {"L1": "School of Electrical Engineering and Computer Science"
}
},
"Degree": {"Educational program": "Master’s Programme, ICT Innovation, 120 credits"},
"Title": {
"Main title": "Deep Learning based Approximate Message Passing for MIMO Detection in 5G",
"Subtitle": "Low complexity deep learning algorithms for solving MIMO Detection in real world scenarios",
"Language": "eng" },
"Alternative title": {
"Main title": "Deep Learning-baserat Ungefärligt meddelande som passerar för MIMO-detektion i 5G",
"Subtitle": "Låg komplexitet djupinlärningsalgoritmer för att lösa MIMO-detektion i verkliga scenarier",
"Language": "swe"
},
"Supervisor1": {
"Last name": "Liu",
"First name": "Dong",
"E-mail": "doli@kth.se"
},
"Examiner1": {
"Last name": "Chatterjee",
"First name": "Saikat",
"E-mail": "sach@kth.se"
},
"Cooperation": { "Partner_name": "Huawei Technologies Sweden"},
"Other information": {
"Year": "2021", "Number of pages": "xix,92"}
}
TRITA-EECS-EX-2022:93
www.kth.se

