Deep Learning Based Channel Estimation in

Wireless Communications

A Thesis
Submitted for the Degree of

Doctor of Philosophy
in the Faculty of Engineering

Submitted by

Sandesh Rao Mattu

Electrical Communication Engineering


Indian Institute of Science
Bangalore – 560 012

December 2023
©Sandesh Rao Mattu
December 2023
All rights reserved
To

My Family and Teachers


Acknowledgements

I wish to express my gratitude by referring to the following lines from the Taittiriya Upanishad, “Maathru Devo Bhava, Pithru Devo Bhava, Aacharya Devo Bhava”.
I am indebted to my parents for being my support and my strength through highs
and lows.
I am deeply grateful to my guide and my advisor, Prof. A. Chockalingam, who
restrained me and kept me focused whenever I began to stray. I’ll always cherish the
anecdotes, the stories, and the discussions we had.
I thank Dr. K. Raghunath, Raghunath Learning Center, for believing in me and
instilling confidence in me, even when I had given up. I would also like to thank Dr. Lakshmi
Narasimhan Theagarajan, IIT Madras, for all the discussions and guidance.
I am grateful to my lab colleagues for all the wonderful discussions and light-hearted
moments.
I am grateful to the staff of the ECE department.
I am also grateful to all my friends at IISc who made my stay memorable and enjoyable.

Abstract
Deep learning techniques, which employ trained neural networks to solve problems, have witnessed widespread adoption in diverse fields like medicine, architecture, robotics,
autonomous vehicles, wireless communications, and many more. This widespread adoption of neural networks (NNs) is a result of significant advancements on both hardware
and software fronts. In wireless communications, deep learning techniques have been
extensively adopted for symbol detection, beam tracking, constellation design, optimal
resource allocation, channel estimation, and several other tasks. Learning based solutions
offered by well trained NNs have shown robustness to model mismatches/non-idealities
witnessed in communication systems. This thesis addresses the problem of channel estimation using deep learning techniques for different signalling schemes under various
channel conditions. Specifically, we consider learning based techniques for 1) channel
prediction in time-varying channels, 2) channel estimation in orthogonal frequency division multiplexing (OFDM) systems in doubly-selective channels, 3) delay-Doppler (DD)
domain channel estimation for orthogonal time frequency space (OTFS) systems with
different pilot frame structures, and 4) DD channel estimation for Zak transform based
OTFS systems. The details of the contributions made in the thesis are summarised
below.
Deep channel prediction in time-varying channels: In the first part of the
thesis, we consider the problem of channel prediction in time-varying fading channels.
In time-varying fading channels, channel coefficients are estimated using pilot symbols
that are transmitted every coherence interval. For channels with high Dopplers, the
rapid channel variations over time will require these pilots to be transmitted often. We
propose a novel receiver architecture using deep recurrent neural networks (RNNs) that
learns the channel variations and, thereby, reduces the number of pilot symbols required
for channel estimation. Specifically, we design and train an RNN to learn the correlation
in the time-varying channel and predict the channel coefficients into the future with
good accuracy over a wide range of Dopplers and signal-to-noise ratios (SNRs). Also,
the robustness of prediction for different Dopplers and SNRs is achieved by adapting the
number of predictions into the future based on the Doppler and SNR. We also propose a
data decision driven receiver architecture using RNNs, wherein the data symbols detected
using the channel predictions are treated as pilots to enable more predictions, thereby further reducing the pilot overhead in every coherence interval. Numerical results show that
the proposed RNN based receiver achieves good bit error performance in time-varying
fading channels, while being spectrally efficient.
Channel estimation in OFDM in doubly-selective channels: In the second
part of the thesis, we consider the problem of channel estimation in doubly-selective
(i.e., time-selective and frequency-selective) channels in OFDM systems in the presence
of oscillator phase noise (PN). Methods reported in the literature to estimate the channel
incur significant overhead in terms of the number of training/pilot symbols needed to effectively estimate the channel in the presence of PN. We propose a learning based channel
estimation scheme for OFDM systems in the presence of both PN and doubly-selective
fading. We view the channel matrix as an image and model the channel estimation
problem as an image completion problem where the information about the image is

sparsely available. Towards this, we devise and employ two-dimensional convolutional neural networks (CNNs) for learning and estimating the channel coefficients in the entire
time-frequency (TF) grid, based on pilots sparsely populated in the TF grid. Further,
using the estimated channel coefficients, we devise a simple and effective PN estimation
and compensation scheme. Our results demonstrate that the proposed network and the
PN compensation scheme achieve robust OFDM performance in the presence of PN and
doubly-selective fading.
DD domain channel estimation in OTFS: Unlike OFDM, which is not robust
to highly time-selective channels due to inter-carrier interference, OTFS modulation has
been shown to be robust to rapidly time-varying channels with high Doppler spreads
(in the order of kHz of Doppler). In OTFS, information symbols are multiplexed in
the DD domain and the channel is also viewed in the DD domain (as opposed to the
TF domain in OFDM). In the third part of the thesis, we consider the problem of DD
domain channel estimation in OTFS systems using deep learning techniques. Widely
considered pilot frame structures for DD channel estimation in OTFS include exclusive
pilot frame, embedded pilot frame, interleaved pilot frame, and superimposed pilot frame.
We devise suitable learning based architectures for channel estimation using these pilot
frames as detailed below. First, we propose a learning based architecture for estimating
the DD channel for both exclusive pilot frame and embedded pilot frame. The proposed
learning network, called DDNet, is based on a multi-layered RNN framework that works
seamlessly for both frames. Our results demonstrate that the proposed DDNet achieves
better mean square error (MSE) and bit error rate (BER) performance compared to
other schemes in the literature. Next, we consider DD channel estimation for interleaved
pilot (IP) frame, where pilot symbols are interleaved with data symbols in a lattice type
fashion, without any guard symbols. For this IP frame structure, we propose an RNN
based channel estimation scheme. The proposed network is called IPNet. Our results
show that the proposed IPNet architecture achieves good BER performance while being
spectrally efficient. The rate loss in the pilot frame structures considered above can
be avoided by superimposing pilot symbols over data symbols. We propose a sparse
superimposed pilot (SSP) scheme, where pilot and data symbols are superimposed in
a few bins and the remaining bins carry data symbols only. For the SSP scheme, we
propose an RNN based learning architecture (referred to as SSPNet) trained to provide
accurate channel estimates overcoming the leakage effects in channels with fractional
delays and Dopplers. Our results show that the proposed SSP scheme with the proposed
SSPNet based channel estimation performs better than a fully superimposed pilot (FSP)
scheme reported in the literature.
DD domain channel estimation in OTFS using TF domain learning: In
the fourth and final part of the thesis, we propose a novel learning based approach for
channel estimation in OTFS systems, where learning is done in the TF domain for DD
domain channel estimation. Learning in the TF domain is motivated by the fact that
the range of values in the TF channel matrix is favorable for training as opposed to
the large swing of values in the DD channel matrix which is not favourable for training.
A key beneficial outcome of the proposed approach is its low complexity along with
very good performance. We develop this TF learning approach for two types of OTFS
systems, namely, 1) two-step OTFS, where the information symbols in the DD domain
are converted to time domain for transmission in two steps (DD domain to TF domain
conversion followed by TF domain to time domain conversion), and 2) single-step OTFS
(also called as Zak OTFS), where the DD domain symbols are directly converted to time
domain in one step using inverse Zak transform. Our results show that the proposed TF
learning-based approach achieves almost the same performance as that of the state-of-the-art algorithm, while being drastically less complex, making it practically appealing.
Abbreviations

2D : Two-dimensional
3GPP : Third Generation Partnership Project
4G : Fourth-generation
5G : Fifth-generation
6G : Sixth-generation
A/D : Analog-to-digital
ACF : Auto-correlation function
ADC : Analog-to-digital converter
AR : Auto-regressive
AWGN : Additive white Gaussian noise
BCE : Binary cross entropy
BER : Bit error rate
BS : Base station
CNN : Convolutional neural network
CP : Cyclic prefix
CPSC : Cyclic prefix single carrier
CSI : Channel state information
D/A : Digital-to-analog
DAC : Digital-to-analog converter
DD : Delay-Doppler
DDRE : Delay-Doppler resource element
DFT : Discrete Fourier transform


DL : Deep learning
DTZT : Discrete time Zak transform
DZT : Discrete Zak transform
EPA : Extended pedestrian A
ETU : Extended typical urban
EVA : Extended vehicular A
FCNN : Fully connected neural network
FDM : Frequency division multiplexing
FFT : Fast Fourier transform
FLOPS : Floating-point operations per second
FSP : Full superimposed pilot
GPU : Graphics processing unit
GRU : Gated recurrent unit
HPA : High power amplifier
ICI : Inter-carrier interference
IDFT : Inverse discrete Fourier transform
IDZT : Inverse discrete Zak transform
IFFT : Inverse fast Fourier transform
IMT : International Mobile Telecommunications
IP : Interleaved pilot
IPI : Inter-path interference
ISFFT : Inverse symplectic finite Fourier transform
ISI : Inter-symbol interference
ITU : International Telecommunication Union
LS : Least squares
LSTM : Long short-term memory
LTE : Long-term evolution
ML : Maximum likelihood
MMSE : Minimum mean square error

MP : Message passing
MSE : Mean square error
NMSE : Normalized mean square error
NN : Neural network
OFDM : Orthogonal frequency division multiplexing
OTFS : Orthogonal time frequency space
P/S : Parallel-to-serial
PAPR : Peak-to-average power ratio
PCA : Principal component analysis
PDP : Power delay profile
PN : Phase noise
PSD : Power spectral density
PSK : Phase-shift keying
PTS : Partial transmit sequence
QAM : Quadrature amplitude modulation
ReLU : Rectified linear unit
RF : Radio frequency
RNN : Recurrent neural network
S/P : Serial-to-parallel
SFFT : Symplectic finite Fourier transform
SISO : Single-input single-output
SNR : Signal-to-noise ratio
SSP : Sparse superimposed pilot
TF : Time-frequency
TFLOPS : Tera floating-point operations per second
YW : Yule-Walker
Notations

(·)T : Transpose operator


(·)H : Conjugate transpose (Hermitian) operator
∈ : Belongs to
∥ · ∥p : p-norm of a vector (or a matrix)
⌊·⌋ : Flooring operator
⌈·⌉ : Ceiling operator
Ik : k × k identity matrix
|·| : Absolute value of a number (or cardinality of a set)
ℜ{·} : Real part of the argument
ℑ{·} : Imaginary part of the argument
E[·] : Expectation operator
vec(·) : Column-wise vectorization operator
⊙ : Element-wise multiplication operator (Hadamard product)
⊗ : Kronecker product
⊛ : Circular convolution
0m×n : All zero matrix of size m × n
CN(0, σ²) : Circularly-symmetric complex Gaussian distribution with variance σ²
δ(·) : Dirac delta function
R : Set of all real numbers
Z : Set of all integers
R+ : Set of all non-negative real numbers


Z+ : Set of all non-negative integers


FN : N-point unitary discrete Fourier transform matrix
(·)N : Modulo-N operation
1{·} : Indicator function
Contents

Acknowledgements i

Abstract ii

Abbreviations v

Notations viii

1 Introduction 1
1.1 Wireless channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Time-dispersive channels . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Frequency-dispersive channels . . . . . . . . . . . . . . . . . . . . 4
1.2 Signalling in wireless channels . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 Conventional signalling . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.2 Orthogonal frequency division multiplexing . . . . . . . . . . . . . 9
1.2.3 Conventional OTFS modulation . . . . . . . . . . . . . . . . . . . 14
1.2.4 Discrete Zak transform based OTFS . . . . . . . . . . . . . . . . 19
1.3 Channel estimation in wireless channels . . . . . . . . . . . . . . . . . . . 26
1.3.1 Conventional signalling . . . . . . . . . . . . . . . . . . . . . . . . 26
1.3.2 OFDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.3.3 OTFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.4 Learning framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.4.1 Classification of machine learning programs . . . . . . . . . . . . 35
1.4.2 Difference between traditional program and machine learning pro-
gram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.4.3 Deep learning using neural networks . . . . . . . . . . . . . . . . 38
1.4.4 Common terminologies in deep learning framework . . . . . . . . 41
1.5 Contributions made in the thesis . . . . . . . . . . . . . . . . . . . . . . 43
1.5.1 Deep channel prediction . . . . . . . . . . . . . . . . . . . . . . . 44


1.5.2 Learning based channel estimation in OFDM . . . . . . . . . . . . 44


1.5.3 Learning based DD channel estimation in OTFS . . . . . . . . . . 45
1.5.4 Learning in TF domain for DD channel estimation in OTFS . . . 47
1.6 Organization of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2 Deep channel prediction in time-varying channels 48


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2.1 Deep neural networks . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.3 Proposed deep channel predictor . . . . . . . . . . . . . . . . . . . . . . . 53
2.3.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.3.2 Training methodology . . . . . . . . . . . . . . . . . . . . . . . . 58
2.3.3 Performance results . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.3.4 Adaptive channel prediction . . . . . . . . . . . . . . . . . . . . . 66
2.3.5 Comparison with linear prediction scheme . . . . . . . . . . . . . 72
2.3.6 Performance in 3GPP channel models . . . . . . . . . . . . . . . . 75
2.3.7 Block transmission in doubly-selective fading channel . . . . . . . 76
2.4 Data driven channel prediction . . . . . . . . . . . . . . . . . . . . . . . 78
2.4.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.4.2 Performance results . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.4.3 Comparison with NN-based prediction scheme in [86] . . . . . . . 81
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

3 Learning based channel estimation in OFDM systems 84


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.2 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.3 Proposed channel estimation and PN compensation . . . . . . . . . . . . 87
3.3.1 Proposed channel estimator network and training . . . . . . . . . 89
3.3.2 Proposed PN compensation algorithm . . . . . . . . . . . . . . . 90
3.4 Results and discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.4.1 Comparison with NN-based PN compensation scheme in [91] . . . 96
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

4 Learning based DD channel estimation in OTFS systems 99


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.2 DD channel estimator for OTFS with embedded pilots . . . . . . . . . . 101
4.2.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

4.2.2 DDNet - proposed RNN based DD channel estimator . . . . . . . 105


4.2.3 Results and discussions . . . . . . . . . . . . . . . . . . . . . . . . 109
4.3 DD channel estimation with interleaved pilots in OTFS . . . . . . . . . . 114
4.3.1 IPNet – proposed RNN based DD channel estimator . . . . . . . 115
4.3.2 Results and discussions . . . . . . . . . . . . . . . . . . . . . . . . 120
4.4 Fractional DD channel estimation in OTFS with superimposed pilots . . 125
4.4.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4.2 Proposed sparse superimposed pilot scheme . . . . . . . . . . . . 129
4.4.3 SSPNet - proposed DD channel estimator . . . . . . . . . . . . . 130
4.4.4 Results and discussions . . . . . . . . . . . . . . . . . . . . . . . . 133
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

5 Learning in TF domain for DD channel estimation in OTFS 141


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.2 Learning in TF domain for fractional DD channel estimation in OTFS . . 142
5.2.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.2.2 Proposed TF learning based DD channel estimation . . . . . . . . 144
5.2.3 Results and discussions . . . . . . . . . . . . . . . . . . . . . . . . 150
5.3 DD channel estimation in DZT-OTFS via learning in TF domain . . . . 156
5.3.1 Proposed TF based learning approach using DNN . . . . . . . . . 158
5.3.2 Results and discussions . . . . . . . . . . . . . . . . . . . . . . . . 160
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

6 Conclusions 166

A Derivation of channel matrix with fractional DD and rectangular pulse 171

B Derivation of input-output relation for two-step OTFS 173

List of publications from the thesis 179

Bibliography 182
List of Tables

2.1 Parameters of LSTM layer of channel predictor. . . . . . . . . . . . . . . 53


2.2 Parameters of FCNN layer of channel predictor. . . . . . . . . . . . . . . 56
2.3 Hyper-parameters used for training channel predictor. . . . . . . . . . . . 60
2.4 Tap profiles of 3GPP channel models. . . . . . . . . . . . . . . . . . . . . 74
2.5 Values of np , k, and Nc used for comparison with NN-based prediction
scheme in [86]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

3.1 Parameters of the 2D-CNN layers in the channel estimator network. . . . 88


3.2 Hyper-parameters used for training the channel estimator network. . . . 90
3.3 Phase noise PSD parameters. . . . . . . . . . . . . . . . . . . . . . . . . 93

4.1 Parameters of the DDNet architecture. . . . . . . . . . . . . . . . . . . . 107


4.2 Hyper-parameters used for training the DDNet. . . . . . . . . . . . . . . 108
4.3 Parameters of the IPNet architecture. . . . . . . . . . . . . . . . . . . . . 118
4.4 Hyper-parameters used for training the IPNet. . . . . . . . . . . . . . . . 119
4.5 Parameters of the SSPNet architecture. . . . . . . . . . . . . . . . . . . . 132
4.6 Hyper-parameters used for training the SSPNet. . . . . . . . . . . . . . . 133

5.1 Parameters of DNN1/DNN2. . . . . . . . . . . . . . . . . . . . . . . . . . 149


5.2 Hyper-parameters used while training. . . . . . . . . . . . . . . . . . . . 149
5.3 Run time complexities of computing r(τ, ν). . . . . . . . . . . . . . . . . 155
5.4 Hyper-parameters used while training. . . . . . . . . . . . . . . . . . . . 160

List of Figures

1.1 Channel representation when the receiver is moving at a velocity v. . . . 4


1.2 Jakes’ Doppler spectrum. . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Auto-correlation function of the channel gain in a channel with mobility. 7
1.4 Orthogonal subcarriers in the frequency domain in OFDM systems. . . . 9
1.5 Block diagram of an OFDM communication system. . . . . . . . . . . . . 10
1.6 Loss of orthogonality among subcarriers in OFDM. . . . . . . . . . . . . 13
1.7 BER as function of Doppler in an OFDM system. . . . . . . . . . . . . . 13
1.8 A doubly-selective channel represented in the TF domain. . . . . . . . . 15
1.9 A doubly-selective channel represented in the delay-Doppler domain. . . 15
1.10 Block diagram of OTFS modulation scheme. . . . . . . . . . . . . . . . . 16
1.11 Block diagram of DZT-OTFS modulation scheme. . . . . . . . . . . . . . 20
1.12 Block type pilot arrangement. . . . . . . . . . . . . . . . . . . . . . . . . 28
1.13 Comb type pilot arrangement. . . . . . . . . . . . . . . . . . . . . . . . . 29
1.14 Lattice type pilot arrangement. . . . . . . . . . . . . . . . . . . . . . . . 30
1.15 Magnitude response of the channel with integer delay-Doppler. . . . . . . 31
1.16 Magnitude response of the channel with fractional delay-Doppler. . . . . 32
1.17 A block diagram showing the difference between the traditional program
and a machine learning program. . . . . . . . . . . . . . . . . . . . . . . 37
1.18 Architecture of one layer of FCNN. . . . . . . . . . . . . . . . . . . . . . 39
1.19 Structure of the neuron. . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.20 A basic structure of RNN and time unfolding. . . . . . . . . . . . . . . . 40
1.21 Pictorial representation of CNN. . . . . . . . . . . . . . . . . . . . . . . . 41

2.1 Recurrent unit of the LSTM architecture. . . . . . . . . . . . . . . . . . . 52


2.2 Block diagram of the channel predictor neural network. . . . . . . . . . . 53
2.3 Mean square error of predictions made by predictor network for different
number of layers in the LSTM network. . . . . . . . . . . . . . . . . . . . 54


2.4 Training and validation loss trajectory for 1-layer and 5-layer LSTM architectures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.5 Block diagram of the channel predictor aided receiver. . . . . . . . . . . . 57
2.6 Comparison of training and validation loss trajectories as a function of
epochs with and without the training enhancement features. . . . . . . . 63
2.7 Mean square error of predictions as a function of number of predictions
for different SNR and fD values. . . . . . . . . . . . . . . . . . . . . . . . 63
2.8 BER Performance of the proposed channel predictor aided receiver with
ML decoder for fixed number of predictions (N =100, η = 90.9% and
N =10, η = 50%) at fD = 50, 100 Hz for 4-QAM and 16-QAM. . . . . . . 65
2.9 Achieved MSE performance of predictions as a function of number of
future predictions and Doppler for a given SNR of 10 dB and 20 dB. . . . 67
2.10 MSE performance of predictions as a function of fD for a given SNR of
10 dB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.11 Number of future predictions chosen by the prediction algorithm as a
function of SNR for different values of fD . . . . . . . . . . . . . . . . . . 69
2.12 BER performance of the proposed adaptive channel predictor aided re-
ceiver with ML decoder at fD = 50, 100 Hz for 4-QAM and 16-QAM. . . 70
2.13 BER performance comparison between the proposed adaptive scheme with
ML decoder and the benchmarking scheme with LMMSE channel estima-
tion and linear interpolation. . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.14 BER performance comparison between the proposed channel predictor
aided receiver and the linear prediction aided receiver, both with ML
decoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.15 MSE performance of the proposed channel predictor network under vari-
ous 3GPP channel models in [81],[82] as a function of SNR and η. . . . . 74
2.16 BER performance of the proposed predictor network in a CPSC system
with NN-based ViterbiNet detector. . . . . . . . . . . . . . . . . . . . . . 77
2.17 Arrangement of pilot and data symbols in data driven channel prediction. 78
2.18 Block diagram of the proposed data driven channel prediction scheme. . . 79
2.19 MSE and BER performance of 1:k data decision driven channel prediction
scheme with ML decoder at fD = 50, 100 Hz for 16-QAM. . . . . . . . . 80
2.20 BER performance comparison between the proposed scheme and the NN-
based prediction scheme in [86], both with ML decoder. . . . . . . . . . . 82

3.1 Proposed CNN based channel estimator network and PN compensation


scheme. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

3.2 Proposed CNN based channel estimator network. . . . . . . . . . . . . . 89


3.3 MSE performance of the proposed channel estimator network without and
with the proposed PN compensation for different values of σPN . . . . . . . 94
3.4 BER performance comparison between the proposed channel estimator
network without and with the proposed PN compensation and the PN
compensation scheme in [64]. . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.5 BER vs SNR performance comparison between the proposed scheme and
the PN compensation scheme in [91]. . . . . . . . . . . . . . . . . . . . . 96
3.6 BER vs pilot SNR performance comparison between the proposed scheme
and the PN compensation scheme in [91]. . . . . . . . . . . . . . . . . . . 97

4.1 OTFS modulation scheme. . . . . . . . . . . . . . . . . . . . . . . . . . . 101


4.2 Pilot, guard, and data symbol placements in exclusive and embedded pilot
frames. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.3 Proposed RNN based DDNet channel estimation scheme. . . . . . . . . . 105
4.4 Proposed RNN based DDNet architecture. . . . . . . . . . . . . . . . . . 106
4.5 Effect of number of LSTM layers, P , on the NMSE performance of the
proposed DDNet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.6 NMSE vs spectral efficiency at different pilot SNRs. . . . . . . . . . . . . 111
4.7 NMSE performance comparison between the proposed DDNet and the
estimation schemes in [67] (exclusive pilot) and [68] (embedded pilot). . . 112
4.8 BER performance comparison between the proposed DDNet and the thresh-
olding scheme in [68] (embedded pilot). . . . . . . . . . . . . . . . . . . . 113
4.9 Pilot, guard, and data symbol placements in interleaved pilot and embed-
ded pilot frames. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.10 Proposed RNN based IPNet channel estimation scheme. . . . . . . . . . 115
4.11 Proposed RNN based IPNet architecture. . . . . . . . . . . . . . . . . . . 118
4.12 NMSE performance of the proposed IPNet as a function of pilot SNR for
different number of pilots. . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.13 BER of the proposed IPNet and the scheme in [68] as a function of number
of pilots, Np . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.14 BER performance comparison between the proposed IPNet with 12 pilots
and the scheme in [68] for 40 dB pilot SNR. . . . . . . . . . . . . . . . . 123
4.15 BER performance comparison between the proposed IPNet with 12 pilots
and the scheme in [68] for 30 dB pilot SNR. . . . . . . . . . . . . . . . . 124
4.16 BER performance comparison between the proposed IPNet with 12 pilots
and the scheme in [68] for 20 dB pilot SNR. . . . . . . . . . . . . . . . . 125

4.17 Pilot and data symbols placement in the proposed SSP scheme and the
FSP scheme in [71]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.18 Proposed RNN based channel estimation scheme. . . . . . . . . . . . . . 129
4.19 Energy distribution in FSP and SSP frames at the transmitter and receiver
for integer DD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.20 BER performance of the proposed SSPNet as a function of Np for integer
DD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.21 BER performance of the proposed SSPNet as a function of Np and σd2 for
integer DD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.22 MSE vs SNR performance of the proposed SSPNet compared with that
of the FSP scheme in [71] for integer DD. . . . . . . . . . . . . . . . . . . 137
4.23 BER vs SNR performance of the proposed SSPNet compared with that of
the FSP scheme in [71] for integer DD. . . . . . . . . . . . . . . . . . . . 138
4.24 MSE vs SNR performance of the proposed SSPNet compared with that
of the FSP scheme in [71] for fractional DD. . . . . . . . . . . . . . . . . 139
4.25 BER vs SNR performance of the proposed SSPNet compared with that of
the FSP scheme in [71] for fractional DD. . . . . . . . . . . . . . . . . . . 139

5.1 Proposed TF learning network architecture for learning R(τ , ν). . . . . . 147
5.2 Absolute values of training data in DD domain in dB scale. . . . . . . . . 150
5.3 Absolute values of training data in TF domain in dB scale. . . . . . . . . 151
5.4 NMSE performance comparison between training carried out in DD do-
main and TF domain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5.5 NMSE performance comparison between the proposed approach, the DDIPIC
algorithm in [74], and M-MLE algorithm in [98]. . . . . . . . . . . . . . . 153
5.6 BER performance comparison between the proposed approach, DDIPIC
algorithm in [74], M-MLE algorithm in [98], and perfect CSI. . . . . . . . 153
5.7 BER performance comparison between the proposed approach, DDIPIC
algorithm in [74], M-MLE algorithm in [98], and perfect CSI for VehA
channel model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.8 Proposed TF learning architecture for learning G matrix. . . . . . . . . . 159
5.9 Absolute values of training data in DD domain in dB scale. . . . . . . . . 162
5.10 Absolute values of training data in TF domain in dB scale. . . . . . . . . 162
5.11 NMSE performance comparison between DD domain learning and TF
domain learning in DZT-OTFS. . . . . . . . . . . . . . . . . . . . . . . . 163
5.12 BER performance comparison between DD domain learning and TF do-
main learning in DZT-OTFS. . . . . . . . . . . . . . . . . . . . . . . . . 164

5.13 NMSE performance of the algorithm in [74], M-MLE algorithm in [98], and the proposed algorithm for different PDPs in DZT-OTFS. . . . . . 164
5.14 BER performance of the algorithm in [74], M-MLE algorithm in [98], and
the proposed algorithm for different PDPs in DZT-OTFS. . . . . . . . . 165
Chapter 1

Introduction

Machine learning techniques employing trained neural networks (NNs) are being increasingly adopted in diverse fields such as face recognition, gender recognition, medicine, distribution systems, load management, robotics, chemistry, among many others [1]. This is fuelled by the rapid advancements in software [2], [3] and hardware. On the software front, libraries like Pytorch [2] and Tensorflow [3] provide easily implementable functions for NNs, where the complex operations involved in updating the parameters of the NN are inbuilt, making it easy to train and test these NNs. On the hardware front, special architectures, called graphics processing units (GPUs), have also undergone tremendous growth. State-of-the-art GPUs support up to $10^{12}$ floating-point operations per second. This advancement in hardware, coupled with software that takes advantage of the hardware, has led to widespread adoption of NNs. In wireless communications too, NN based approaches have been extensively employed [4]-[9]. For example, deep learning based approaches have been used for signal detection [4], beam tracking [5], channel prediction [6], state estimation [8], and several other tasks. Learning based solutions offered by well trained NNs are also robust to model mismatches/non-idealities witnessed in communication systems [8].
In wireless communication systems, the information-bearing signal is transmitted
through a wireless channel. Estimating the effect of this channel on the signal at the
receiver is called channel estimation [10]-[14]. Towards this, at the transmitter, special symbols called pilots are transmitted. The receiver uses the knowledge of these pilot
symbols and the received symbols corresponding to the pilot symbols to estimate the
channel. Accurate estimation of the channel is crucial for reliable detection of transmitted
data symbols. Achieving this in harsh and dynamically varying channel conditions is
challenging [14].
In this thesis, we address the problem of channel estimation using deep learning
techniques for different signalling schemes under various channel conditions. Before
presenting an introduction to machine learning, we first highlight some basics of wireless
channels, signalling in wireless channels, and channel estimation.

1.1 Wireless channels


Wireless channels can be classified as time-dispersive or frequency-dispersive or both, as
described below.

1.1.1 Time-dispersive channels

A signal transmitted from the transmitter to receiver has to pass through the channel.
The transmitted signal undergoes several reflections from objects like trees, buildings,
and vehicles, which are received at the receiver along several different paths. This phenomenon is called multipath propagation [11]-[14]. The receiver receives a superposition
of the reflected signals from all the paths. Since the distance travelled by each reflected
path is different, each path experiences a different delay. Therefore, what the receiver
observes is a temporally dispersed version of the transmitted signal. Such propagation
channels are referred to as time-dispersive channels. For a time-dispersive channel, the
impulse response can be written as [11]

\[ h(\tau) = \sum_{i=0}^{P-1} a_i\, \delta(\tau - \tau_i), \tag{1.1} \]

where ai is the attenuation corresponding to the ith path in the channel, τi is the delay
associated with the ith path, and P is the number of paths in the channel. An important
parameter of a multipath time-dispersive channel is its maximum delay spread, $\tau_{\max}$, defined as the difference in propagation time between the longest path and the shortest path, counting only the paths with significant energy, i.e.,

\[ \tau_{\max} = \max_{i,j} |\tau_i - \tau_j|. \tag{1.2} \]

If $\tau_{\max}$ is larger than the signalling interval, the symbols interfere with each other, resulting in inter-symbol interference (ISI). The signalling interval $T_s$ is related to the bandwidth $W$ available for transmission through $T_s = \frac{1}{W}$. The effect of the time-dispersive channel at time $t$ represented by (1.1) can equivalently be thought of as a convolution in delay. By the convolution property of the Fourier transform, the dispersion in time has a multiplicative effect on the transmitted signal in the frequency domain. The Fourier transform of (1.1) yields

\[ H(f) = \sum_{i=0}^{P-1} a_i\, e^{-j2\pi f \tau_i}. \tag{1.3} \]

For a particular path, the phase is linear in $f$. However, for multiple paths there is a differential phase of $2\pi f(\tau_i - \tau_k)$, where $i, k = 0, 1, \cdots, P-1$, $i \neq k$. This differential phase causes selective fading in frequency. Hence, time-dispersive channels are also called frequency-selective channels. Note that, in (1.3), when the frequency $f$ changes by $\frac{1}{2\tau_{\max}}$, the phase changes significantly. For example, when $f = 0$ the phase factor $e^{-j2\pi f \tau_i}$ is $1$, while at $f = \frac{1}{2\tau_i}$ it is $-1$. In general, the phase changes significantly when $f$ changes by a factor of $\frac{1}{2\tau_{\max}}$. This parameter is defined as the coherence bandwidth $W_c$, i.e.,

\[ W_c = \frac{1}{2\tau_{\max}}. \tag{1.4} \]

In other words, the frequency range over which the channel can be assumed to have a flat frequency response is called the coherence bandwidth of the channel. The channel is said to be frequency-selective if $\tau_{\max} > T_s \implies \frac{1}{\tau_{\max}} < \frac{1}{T_s} \implies \frac{1}{2\tau_{\max}} < \frac{1}{T_s} \implies W_c < W$. That is, the channel is frequency-selective if the coherence bandwidth of the channel is less than the bandwidth required for transmission.
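As a quick sanity check of this condition, the following minimal Python sketch uses assumed, illustrative numbers (a 5 μs delay spread and a 10 MHz transmission bandwidth; these values are not from this thesis):

```python
# Worked check of the frequency-selectivity condition (assumed numbers).
tau_max = 5e-6               # maximum delay spread: 5 microseconds (illustrative)
W = 10e6                     # transmission bandwidth: 10 MHz (illustrative)
Ts = 1 / W                   # signalling interval, Ts = 1/W
Wc = 1 / (2 * tau_max)       # coherence bandwidth, (1.4): here 100 kHz
# tau_max > Ts and, equivalently, Wc < W: the channel is frequency-selective
print(tau_max > Ts, Wc < W)  # True True
```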

1.1.2 Frequency-dispersive channels

Figure 1.1: Channel representation when the receiver is moving at a velocity v.

To demonstrate the behaviour of wireless channels when there is mobility, consider the scenario depicted in Fig. 1.1. For simplicity, only one path (the $i$th path, $0 \leq i \leq P-1$) is
shown in the figure. In practice, multiple copies of the transmitted signal are received
at the base station (BS) through multiple paths. The car is moving at a velocity v in a
direction that makes an angle θ with the line joining the car to the BS. The velocity along
this direction is v cos(θ). If τi is the initial delay of this path, then due to mobility, the
delay of the path is time-varying, i.e., this delay changes with time. This time variation
is captured as

\[ \tau_i(t) = \tau_i - \frac{v\cos(\theta)\, t}{c}, \tag{1.5} \]
where $c = 3 \times 10^8$ m/s is the speed of light. Substituting (1.5) in (1.3), we have
\[ H(f,t) = \sum_{i=0}^{P-1} a_i\, e^{-j2\pi f \left(\tau_i - \frac{v\cos(\theta)\,t}{c}\right)} = \sum_{i=0}^{P-1} a_i\, e^{-j2\pi f \tau_i}\, e^{j2\pi f \frac{v\cos(\theta)\,t}{c}}. \tag{1.6} \]

Defining $f_D = \frac{f v \cos(\theta)}{c}$ as the Doppler shift of the path, (1.6) can be written as
\[ H(f,t) = \sum_{i=0}^{P-1} a_i\, e^{-j2\pi f \tau_i}\, e^{j2\pi f_D t}. \tag{1.7} \]

Note that the frequency response in (1.3) was not dependent on time, while under mobility the response becomes time-varying. This implies that a time-invariant channel whose impulse response is represented by (1.1) becomes time-variant under mobility. For such a channel, the signal observed at the receiver is dispersed in the frequency domain. The channel in such scenarios is said to be frequency-dispersive. An important parameter that characterizes these channels is the maximum Doppler spread, denoted by $\nu_{\max}$, which is defined as
\[ \nu_{\max} = \max_{i,j} |\nu_i - \nu_j|, \tag{1.8} \]

where $\nu_i = \frac{f v_i \cos(\theta_i)}{c}$ is the Doppler shift associated with the $i$th path, $i, j = 0, 1, \cdots, P-1$, $i \neq j$, and $\theta_i$ is the angle between the direction of arrival of the $i$th path and the line joining the user to the base station. Note that there is a time-varying phase introduced by mobility in (1.7). This phase varies significantly when the time $t$ changes by $\frac{1}{4\nu_{\max}}$. For example, in (1.7), when $t = 0$ the time-varying phase is $0$, and when $t = \frac{1}{4f_D}$ the time-varying phase is $\frac{\pi}{2}$, i.e., $e^{j\pi/2} = j$. In general, the time-varying phase varies significantly when $t$ changes by a factor of $\frac{1}{4\nu_{\max}}$. This parameter is defined as the coherence time $T_c$, i.e.,
\[ T_c = \frac{1}{4\nu_{\max}}. \tag{1.9} \]

Figure 1.2: Jakes’ Doppler spectrum.

Alternatively, the time interval for which the channel remains almost constant is called the coherence time of the channel. Coherence time is inversely proportional to Doppler, which means that a channel with a large Doppler spread varies rapidly in time, and vice versa. Recall that $f_D = \frac{f v \cos(\theta)}{c} = f_D^{\max} \cos(\theta)$, where $f_D^{\max} = \frac{f v}{c}$ is the maximum value of the Doppler spread. In practice, $\theta$ is a random variable which is uniformly distributed between $-\pi$ and $\pi$. The resulting power spectral density (PSD) of the Doppler spread of the channel, referred to as the Jakes' Doppler spectrum [11], can be derived as
\[ S(f) = \frac{1}{\pi f_D^{\max} \sqrt{1 - \left(\frac{f}{f_D^{\max}}\right)^2}}, \quad |f| < f_D^{\max}, \tag{1.10} \]
which is a “U-shaped” function with support between $-f_D^{\max}$ and $f_D^{\max}$, as shown in Fig. 1.2. The auto-correlation function (ACF) of the channel (which is the inverse Fourier transform of the PSD) can be shown to be
\[ R(\Delta t) = J_0(2\pi f_D^{\max} \Delta t), \tag{1.11} \]
where $J_0(x) = \frac{1}{\pi} \int_0^{\pi} e^{-jx \cos\theta}\, d\theta$ is the zeroth-order Bessel function of the first kind (see Fig. 1.3), and $\Delta t$ is the variable of the ACF, which is also referred to as the time lag.

Figure 1.3: Auto-correlation function of the channel gain in a channel with mobility.
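To put these quantities in perspective, the following short Python sketch (with an assumed carrier frequency and mobile speed, for illustration only) evaluates the maximum Doppler shift, the coherence time in (1.9), and the Jakes' ACF in (1.11) using the zeroth-order Bessel function from SciPy:

```python
import numpy as np
from scipy.special import j0

# Illustrative (assumed) values: 2 GHz carrier, 30 m/s (~108 km/h) mobile.
fc, v, c = 2e9, 30.0, 3e8
fD_max = fc * v / c                  # maximum Doppler shift: 200 Hz
Tc = 1 / (4 * fD_max)                # coherence time, (1.9): 1.25 ms

dt = np.linspace(0, 5 * Tc, 6)       # time lags up to 5 coherence times
acf = j0(2 * np.pi * fD_max * dt)    # Jakes' ACF, (1.11)
print(f"fD_max = {fD_max:.0f} Hz, Tc = {Tc * 1e3:.2f} ms")
print(np.round(acf, 3))              # channel decorrelates as the lag grows
```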
Channels which are both time and frequency selective are referred to as doubly-
selective or doubly-dispersive channels.

1.2 Signalling in wireless channels


Signalling in wireless communication systems can be carried out using different waveforms. Orthogonal frequency division multiplexing (OFDM) uses a multicarrier waveform approach. OFDM was designed to tackle the problem of ISI in frequency-selective channels. The signalling is carried out in the time-frequency (TF) domain. Under high Doppler spreads, OFDM fails to perform well due to loss of orthogonality among subcarriers. Another approach is the orthogonal time frequency space (OTFS) modulation.

In OTFS, symbols are mounted in the DD domain, instead of the TF domain. The early
version of OTFS was built as an overlay on OFDM to cater to doubly-spread channels
and is robust to high Doppler spreads. The conventional OTFS transmitter converts the signal in the DD domain to the TF domain and then to the time domain for transmission. At the
receiver, similar inverse operations are carried out. A later version of OTFS uses inverse
Zak and Zak transforms to directly convert symbols in DD domain to time domain and
symbols in time domain to DD domain, respectively. This has both complexity and
performance advantages. In the following subsections, we explain each of the signalling
approaches in some detail.

1.2.1 Conventional signalling

In the conventional signalling scheme, transmission happens symbol by symbol. The input-output relation can be written as
\[ y = hx + n, \tag{1.12} \]

where $x$ is the transmitted symbol drawn from a modulation alphabet $\mathbb{A}$ (e.g., $M$-QAM, $M$-PSK, $M$-PAM, etc., where $M$ is the order of the modulation alphabet), $n \sim \mathcal{CN}(0, \sigma^2)$ is the additive white Gaussian noise (AWGN), $h \sim \mathcal{CN}(0, 1)$ is the complex channel gain, and $y$ is the received symbol corresponding to $x$. Here, $\mathcal{CN}(0, \sigma^2)$ denotes a circularly-symmetric complex Gaussian random variable with variance $\sigma^2$. To estimate the channel gain $h$, a pre-determined symbol $x_p$, called the pilot symbol, is transmitted. At the receiver, an estimate of $h$ is obtained using the knowledge of this pilot symbol. A least squares (LS) estimate of $h$ is obtained as

\[ \hat{h} = \arg\min_{h} \| y - h x_p \|^2. \tag{1.13} \]

Although this scheme is simple and straightforward, it is ill suited for channels with high time-selectivity or high Doppler spread. This is because the estimate of the channel needs to be updated often, implying more pilots and increased latency. However, methods like channel interpolation or channel prediction can help reduce the number of pilots. Under frequency-selectivity, there is considerable ISI, and channel estimation and symbol detection are more involved.
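As an illustration of the model (1.12) and the LS estimate (1.13), the following minimal Python sketch (all parameter values assumed for illustration) estimates $h$ from a single pilot; for this scalar model the minimizer of (1.13) has the closed form $\hat{h} = y / x_p$:

```python
import numpy as np

rng = np.random.default_rng(0)

# y = h*x + n with h ~ CN(0, 1) and n ~ CN(0, sigma^2), as in (1.12).
h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
x_p = 1 + 0j                          # known pilot symbol (unit energy)
sigma2 = 0.01                         # noise variance (20 dB SNR, assumed)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal() + 1j * rng.standard_normal())
y = h * x_p + n                       # received pilot observation

h_hat = y / x_p                       # LS estimate: minimizer of (1.13)
print(abs(h - h_hat) ** 2)            # squared estimation error (~sigma2)
```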

1.2.2 Orthogonal frequency division multiplexing

Figure 1.4: Orthogonal subcarriers in the frequency domain in OFDM systems.

OFDM is a signalling scheme used for frequency-selective channels. In OFDM, symbols are multiplexed onto orthogonal subcarriers, spaced $\Delta f$ apart from each other. Figure 1.4 shows such an arrangement. Although all the subcarriers overlap in the frequency domain, they are resolvable because any two subcarriers are orthogonal to each other. The input bitstream is divided into parallel streams and the information symbols are multiplexed on each of these subcarriers [12]. If $W$ is the total bandwidth available for transmission, then for a frequency-selective channel the coherence bandwidth satisfies $W_c < W$. The idea of OFDM is to divide the total available bandwidth into $M$ parts, through the use of $M$ subcarriers each with bandwidth $\Delta f$, i.e., $W = M \Delta f$, with the constraint that $\Delta f < W_c$. This ensures that symbols mounted on each subcarrier
experience frequency-flat fading.

Figure 1.5: Block diagram of an OFDM communication system.


The block diagram of an OFDM system is shown in Fig. 1.5. The input bits
are first mapped to modulation symbols, and then M such modulation symbols are
passed through the serial-to-parallel (S/P) converter to obtain X[0], X[1], · · · , X[M − 1].
The symbols are modulated over M subcarriers and sampled to obtain M symbols
denoted by x[0], x[1], · · · , x[M − 1]. Modulating the symbols over M subcarriers and
sampling is equivalent to taking the inverse fast Fourier transform (IFFT), which is an efficient implementation of the inverse discrete Fourier transform (IDFT), of the sequence
X[k], k = 0, 1, · · · , M − 1. This IDFT operation of converting X[k], k = 0, 1, · · · , M − 1
to x[n], n = 0, 1, · · · , M − 1 can be represented by

\[ x[n] = \frac{1}{\sqrt{M}} \sum_{k=0}^{M-1} X[k]\, e^{\frac{j2\pi kn}{M}}. \tag{1.14} \]

Alternatively, the above expression can be written as
\[ \mathbf{x} = \mathbf{F}_M^H \mathbf{X}, \tag{1.15} \]
where $\mathbf{X} = [X[0]\; X[1]\; \cdots\; X[M-1]]^T$, $\mathbf{x} = [x[0]\; x[1]\; \cdots\; x[M-1]]^T$, and $\mathbf{F}_M \in \mathbb{C}^{M \times M}$ is the $M$-point unitary DFT matrix whose $(n,k)$th entry is given by
\[ \mathbf{F}_M[n,k] = \frac{e^{-\frac{j2\pi nk}{M}}}{\sqrt{M}}. \tag{1.16} \]
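As a quick numerical check of (1.15)-(1.16) (illustrative size assumed), multiplying by $\mathbf{F}_M^H$ coincides with NumPy's ifft up to a $\sqrt{M}$ scale factor:

```python
import numpy as np

M = 8
n, k = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
F_M = np.exp(-2j * np.pi * n * k / M) / np.sqrt(M)  # unitary DFT matrix, (1.16)

X = np.arange(M) + 1j                               # arbitrary symbol vector
x = F_M.conj().T @ X                                # x = F_M^H X, as in (1.15)
print(np.allclose(x, np.fft.ifft(X) * np.sqrt(M)))  # True
```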

A cyclic prefix is added to the time domain samples thus obtained. This is done to avoid ISI between two OFDM symbols. After this, the resulting samples are
passed through a parallel-to-serial (P/S) converter followed by conversion from digital to
analog using a digital-to-analog (D/A) converter. The output of the D/A converter is the
OFDM baseband signal and is upconverted to a carrier frequency of fc and transmitted
through the channel. At the receiver, the signal is downconverted to baseband and
filtered to remove the high-frequency components. The resulting signal is then passed
through the analog-to-digital (A/D) converter which converts the signal to time samples.
Cyclic prefix is removed from these samples and fast Fourier transform (FFT) (efficient
implementation of discrete Fourier transform (DFT)) converts the time samples back to
frequency domain to get $Y[k],\ k = 0, 1, \cdots, M-1$. The input-output relation between the input symbols $X[k]$ and the demodulated symbols $Y[k]$, $k = 0, 1, \cdots, M-1$, is [11]
\[ Y[k] = H[k]\, X[k] + V[k], \quad k = 0, 1, \cdots, M-1, \tag{1.17} \]

where H[k] is the frequency-flat fading channel gain associated with the kth subcarrier
and V [k] is the AWGN. From (1.17), it is observed that an OFDM system converts a
frequency-selective channel into a set of frequency-flat fading orthogonal subchannels
with different information symbols transmitted over each subchannel.
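The following self-contained Python sketch (toy parameters assumed: 64 subcarriers, a 3-tap channel, noiseless for clarity) illustrates how the cyclic prefix reduces a frequency-selective channel to the per-subcarrier relation (1.17), so that a single-tap equalizer recovers the symbols:

```python
import numpy as np

rng = np.random.default_rng(1)

M, L_cp = 64, 16                               # subcarriers, CP length (assumed)
h = np.array([0.8, 0.5, 0.3]) * np.exp(1j * rng.uniform(0, 2 * np.pi, 3))

X = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], M)  # 4-QAM symbols
x = np.fft.ifft(X) * np.sqrt(M)                # IDFT, as in (1.14)
s = np.concatenate([x[-L_cp:], x])             # add cyclic prefix

r = np.convolve(s, h)[:M + L_cp]               # frequency-selective channel
y = r[L_cp:L_cp + M]                           # remove CP
Y = np.fft.fft(y) / np.sqrt(M)                 # DFT back to frequency domain

H = np.fft.fft(h, M)                           # per-subcarrier gains H[k]
X_hat = Y / H                                  # single-tap equalizer, (1.17)
print(np.max(np.abs(X_hat - X)))               # ~0 in the noiseless case
```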

Advantages of OFDM

• An OFDM system transforms a frequency-selective channel into a set of parallel frequency-flat fading channels. This allows for a single-tap equalizer, which simplifies equalizer design.

• Since all the subcarriers in OFDM overlap, while maintaining orthogonality, the available bandwidth is more efficiently utilized when compared to frequency division multiplexing (FDM).

• IFFT and FFT implementations of IDFT and DFT, respectively, are used in
OFDM. Therefore, the realization of OFDM systems is simple, cost-effective, and
involves low computational complexity.

Limitations of OFDM

• Although the equalizer is single-tap and easy to implement, OFDM does not extract the (frequency) diversity offered by the multipath channel. Therefore, to extract diversity in OFDM systems, coding is essential [15], [16].

• The performance of OFDM systems severely degrades when the orthogonality among the subcarriers is lost [17]. Phase noise effects [18], [19], IQ imbalance, and frequency/timing synchronization errors [20]-[24] can destroy the orthogonality among the subcarriers in OFDM. Apart from these impairments at the transmitter or receiver, Doppler shifts in the channel also cause loss of orthogonality [17]. The effect is that the subcarriers undergo inter-carrier interference (ICI). For the transmit/receive impairments, compensation techniques can be used to improve the performance of OFDM systems [25]-[27]. Figure 1.6 shows the phenomenon of ICI when the orthogonality among the subcarriers is lost.

• Design of equalizers for OFDM systems in the presence of ICI is challenging when the channel is doubly-selective. The performance with a single-tap equalizer degrades as the amount of ICI increases. This is observed in Fig. 1.7, where, as the Doppler ($f_D$) increases, the amount of ICI increases and the bit error rate (BER) performance degrades. In Fig. 1.7, perfect channel state information (CSI) is assumed and single-tap equalization is performed at the receiver.

• OFDM systems suffer from high peak-to-average power ratio (PAPR). Large peaks introduce degradation in performance when the signal passes through a non-linear power amplifier in the radio frequency (RF) chains. Therefore, sophisticated techniques are required to achieve the desired PAPR in OFDM systems [28]-[30].

Figure 1.6: Loss of orthogonality among subcarriers in OFDM.

Figure 1.7: BER as a function of Doppler in an OFDM system.

In order to address the degradation in performance of OFDM under doubly-selective channels, OTFS was introduced [31]-[34]. In the next subsection, we describe the OTFS modulation.

1.2.3 Conventional OTFS modulation

OTFS is a two-dimensional (2D) modulation technique that multiplexes information symbols in the delay-Doppler (DD) domain as opposed to the TF domain. Further, in OTFS, the channel is also represented in the DD domain. The information symbols are mounted on a DD grid, with each symbol occupying a DD resource element (DDRE). These symbols in the DD domain undergo 2D periodic convolution with the channel response in the DD domain, such that each information symbol experiences a near-constant channel gain even in rapidly time-varying wireless channels [31]-[34]. This behaviour can be exploited in channel estimation, as the channel in the DD domain remains almost the same for a longer duration. Another advantage is the sparsity of the channel in the DD domain. Compared to the TF representation, the DD representation is sparse, which implies lower channel estimation complexity. This is demonstrated in Figs. 1.8 and 1.9, where the doubly-selective channel is represented in the TF domain and the DD domain, respectively. Channel estimation in the TF domain is much more complex, as it involves the estimation of channel gain in all the bins in the TF grid, whereas it is less complex in the DD domain due to its sparsity.
Figure 1.8: A doubly-selective channel represented in the TF domain.

Figure 1.9: A doubly-selective channel represented in the delay-Doppler domain.

Figure 1.10: Block diagram of OTFS modulation scheme.

Figure 1.10 shows the block diagram of the OTFS modulation scheme with $M$ delay bins and $N$ Doppler bins. At the transmitter, $MN$ information symbols (drawn from an alphabet $\mathbb{A}$, e.g., $|\mathbb{A}|$-QAM, $|\mathbb{A}|$-PSK), $x[k,l]$, $k = 0, 1, \cdots, N-1$, $l = 0, 1, \cdots, M-1$, in the DD domain are mapped to the TF domain to obtain $X[n,m]$, $n = 0, 1, \cdots, N-1$, $m = 0, 1, \cdots, M-1$, using the 2D inverse symplectic finite Fourier transform (ISFFT). These symbols occupy a bandwidth of $M\Delta f$ and a duration of $NT$, where $\Delta f = \frac{1}{T}$ is the subcarrier spacing. Subsequently, this TF signal is transformed into a time domain signal $x(t)$ for
transmission using the Heisenberg transform. This is then passed through the channel
to obtain the received signal y(t) at the receiver. At the receiver, y(t) is converted back
to TF domain to obtain Y [n, m] through Wigner (inverse of Heisenberg) transform. The
TF signal Y [n, m] thus obtained is finally converted back to DD domain using symplectic
finite Fourier transform (SFFT) for demodulation. The operation corresponding to each
block in Fig. 1.10 is detailed below.

• Conversion to TF domain from DD domain


Conversion from the DD domain to the TF domain is carried out using the ISFFT operation. The ISFFT is a 2D transform, which consists of a Fourier transform along one dimension and an inverse Fourier transform along the other dimension. Specifically, the delay dimension is converted to the frequency dimension using a Fourier transform, and the Doppler dimension is converted to the time dimension using an inverse Fourier transform [35]; a code sketch of the ISFFT/SFFT pair follows this list. This operation is represented as
\[ X[n,m] = \sum_{k=0}^{N-1} \sum_{l=0}^{M-1} x[k,l]\, e^{-j2\pi\left(\frac{lm}{M} - \frac{kn}{N}\right)}, \tag{1.18} \]

where $n = 0, 1, \cdots, N-1$ is the time index and $m = 0, 1, \cdots, M-1$ is the frequency (subcarrier) index.

• Conversion from TF domain to time domain for transmission


The TF domain symbols are converted to a continuous time domain signal using
the Heisenberg transform. This transform involves two steps, viz., mounting the
symbols on time domain pulses ($p_{\mathrm{tx}}$) and frequency modulation. Mathematically, the Heisenberg transform operation is represented as

\[ x(t) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} X[n,m]\, p_{\mathrm{tx}}(t - nT)\, e^{j2\pi m \Delta f (t - nT)}, \tag{1.19} \]

where $T = \frac{1}{\Delta f}$. The index $m$ in the summation indicates the subcarrier index and, for a fixed $n$, the symbols $X[n,m]$ are placed along frequencies given by $m\Delta f$ (see the exponent term in the summation). Likewise, for a fixed $m$, the symbols are placed on transmit pulses that are shifted by $nT$ (given by $p_{\mathrm{tx}}(t - nT)$).

• Transmission through doubly-selective channel


The time domain signal x(t) is passed through a doubly-selective channel. In
the DD domain, the channel contains P paths that are centered at τi and νi ,
i = 0, 1, · · · , P − 1, with channel gains αi , i.e.,

\[ h(\tau, \nu) = \sum_{i=0}^{P-1} \alpha_i\, \delta(\tau - \tau_i)\, \delta(\nu - \nu_i), \tag{1.20} \]

where δ denotes the Dirac delta. The corresponding time domain channel impulse
response is obtained as

\[ h(t) = \sum_{i=0}^{P-1} \alpha_i\, \delta(t - \tau_i)\, e^{j2\pi \nu_i (t - \tau_i)}. \tag{1.21} \]

Here, $\tau_i = \frac{l_i}{M\Delta f}$, $\nu_i = \frac{k_i}{NT}$, and $l_i$ and $k_i$ are the delay and Doppler indices, respectively. Depending on the values taken by $l_i$ and $k_i$, the channel is characterized as having either fractional (when $k_i \in \mathbb{R}$ and $l_i \in \mathbb{R}^+$) or integer (when $k_i \in \mathbb{Z}$ and $l_i \in \mathbb{Z}^+$) delay-Doppler values.

• Conversion of the received time domain signal to TF domain


At the receiver, thermal noise (AWGN), $w(t)$, is introduced. The received signal
thus becomes

\[ y(t) = \sum_{i=0}^{P-1} \alpha_i\, x(t - \tau_i)\, e^{j2\pi \nu_i (t - \tau_i)} + w(t). \tag{1.22} \]

This received signal is then converted to the TF domain using the Wigner transform as
\[ Y[n,m] = \int_t y(t)\, p_{\mathrm{rx}}(t - nT)\, e^{-j2\pi m \Delta f (t - nT)}\, dt, \tag{1.23} \]

where $p_{\mathrm{rx}}(t)$ is the receive pulse. The Wigner transform carries out the inverse operations of the two steps in the Heisenberg transform: first, it demodulates the signal, and second, it matched-filters the transmit pulse $p_{\mathrm{tx}}(t)$ with the receive pulse $p_{\mathrm{rx}}(t)$.

• Conversion from TF domain to DD domain


Finally, the TF domain symbols are converted to the DD domain using the SFFT operation. Like the ISFFT, the SFFT can also be viewed as the combination of a Fourier transform (along the time dimension) and an inverse Fourier transform (along the frequency dimension). The received DD domain signal after the SFFT operation is given by

\[ y[k,l] = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} Y[n,m]\, e^{-j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)}. \tag{1.24} \]

y[k, l] is used for symbol detection.
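As referenced in the ISFFT step above, the following Python sketch implements the ISFFT/SFFT pair of (1.18) and (1.24) using FFTs. A unitary $1/\sqrt{MN}$ normalisation is assumed here (as written, the unnormalised forms (1.18) and (1.24) compose to $MN$ times the identity), and the sketch verifies that the two transforms invert each other:

```python
import numpy as np

rng = np.random.default_rng(2)

N, M = 16, 64                            # Doppler and delay bins (assumed sizes)
x_dd = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=(N, M))

def isfft(x):
    # IDFT along Doppler (k -> time n), DFT along delay (l -> frequency m)
    return np.fft.fft(np.fft.ifft(x, axis=0), axis=1) * np.sqrt(x.shape[0] / x.shape[1])

def sfft(X):
    # DFT along time (n -> Doppler k), IDFT along frequency (m -> delay l)
    return np.fft.fft(np.fft.ifft(X, axis=1), axis=0) * np.sqrt(X.shape[1] / X.shape[0])

X_tf = isfft(x_dd)                       # DD -> TF, as in (1.18)
x_rec = sfft(X_tf)                       # TF -> DD, as in (1.24)
print(np.max(np.abs(x_rec - x_dd)))      # ~0: the pair is invertible
```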

Depending on whether $k_i$, $l_i$ are integer or fractional and on the choices of $p_{\mathrm{tx}}(t)$ and $p_{\mathrm{rx}}(t)$, different input-output relations can be derived for the OTFS system using equations (1.18)-(1.24). In addition, if the processing at the receiver is carried out on the continuous received signal $y(t)$ (as in (1.23)), the OTFS input-output relation is referred to as continuous-time OTFS (considered in Chapter 5). On the other hand, if $y(t)$ is sampled and this time-domain sequence is converted to the DD domain, we end up with a different input-output relation, which is referred to as discrete-time OTFS (considered in Chapter 4). In practice, discrete-time OTFS is easier to implement and takes less time for channel matrix generation.
As seen above, conversion from the DD domain to the time domain is carried out in two steps, viz., conversion from the DD domain to the TF domain (using ISFFT) and from the TF domain to the time domain (using the Heisenberg transform). At the receiver, the corresponding inverse operations (Wigner and SFFT) are also carried out in two steps. A more efficient implementation would convert from the DD domain to the time domain directly, without going through the intermediate TF domain. At the receiver too, the received time domain signal is directly converted to the DD domain for detection. This is possible through a transform called the Zak transform. The inverse Zak transform converts DD symbols to the time domain, and the inverse operation is carried out by the Zak transform. Parallel to the discrete implementation of OFDM, there exists a discrete implementation of Zak based OTFS, which employs the inverse discrete Zak transform (IDZT) at the transmitter and the discrete Zak transform (DZT) at the receiver. We refer to this DZT based OTFS as DZT-OTFS; the details of DZT-OTFS are presented in the next subsection.

1.2.4 Discrete Zak transform based OTFS

The continuous Zak transform is a mapping of a continuous time signal onto a 2D
function. Implicit usage of the Zak transform was introduced by Gauss [36]. Later, it
was J. Zak who formally introduced the transform in [37] and after whom the transform
is named. Subsequently, Janssen [38] studied the Zak transform from a signal theoretic
point of view. H. Bolcskei and F. Hlawatsch in [39] provided the corresponding discrete
versions of the Zak transform, namely, the discrete time Zak transform (DTZT) and the
discrete Zak transform (DZT).
DZT-OTFS, like conventional OTFS, is also a 2D modulation technique. Information
symbols multiplexed on a 2D grid in the DD domain are converted to the time domain
using the inverse Zak transform. A cyclic prefix (CP) is added to the time domain samples
and then these samples are mounted on time domain pulses for transmission. At the
receiver, a matched filter (MF) operation is carried out, which is then followed by sampling
to obtain time domain samples and removal of the CP. The time domain samples are then
converted back to the DD domain for detection.

Figure 1.11: Block diagram of DZT-OTFS modulation scheme.

Figure 1.11 shows the block diagram of the DZT-OTFS modulation scheme. Infor-
mation symbols Zx ∈ C^{M×N} drawn from an alphabet A are converted to the time domain
using the IDZT, which is followed by vectorization to obtain x ∈ C^{MN×1} and addition of
the CP to obtain s. The digital-to-analog converter (DAC) converts the time domain
samples to a time domain signal by mounting each symbol on a time domain pulse to
obtain s(t). s(t) is passed through a doubly-selective channel and the received time domain
signal is r(t). At the receiver, r(t) is match filtered with the receive pulse and sampled to
obtain the time domain samples, from which the CP is removed to obtain y ∈ C^{MN×1}.
Finally, the time domain samples are converted to the DD domain using the DZT. The
operations involved in each block in Fig. 1.11 are detailed below.

• Conversion from DD domain samples to time domain samples


Conversion from the DD domain to the time domain is carried out through the IDZT.
The IDZT is a 2D (Zx ∈ C^{M×N}) to 1D (x ∈ C^{MN×1}) transform, which can be

separated into two operations. First, the inverse Fourier transform along Doppler
to convert it to time [35], and, second, vectorization of the delay-time matrix into
a time domain vector. Mathematically, this can be represented as [40]

x[m + nM] = (1/√N) Σ_{s=0}^{N−1} Z_x[m, s] e^{j2πns/N},     (1.25)

where m = 0, 1, · · · , M − 1 and n = 0, 1, · · · , N − 1. The above operation can also
be written in terms of the Fourier matrix as

x = vec(Z_x F_N^H),     (1.26)

where F_N is the N-point unitary DFT matrix and vec(·) represents the column-wise
vectorization operation.

• Conversion from time domain samples to time domain signal


The time domain vector x is then converted to time domain signal s(t) by adding
CP and mounting it on a time domain transmit pulse. CP is added to overcome
the effect of ISI between two transmitted frames, and is given as

s[u] = x[(u)_MN]  for −L_CP ≤ u ≤ MN − 1, and s[u] = 0 otherwise,     (1.27)

where L_CP is the length of the CP and (·)_MN denotes the modulo-MN operation. s is
converted into a time domain signal s(t) using the transmit pulse g(t) as

s(t) = Σ_{u=−L_CP}^{MN−1} s[u] g(t − uT_s),     (1.28)

where T_s ≥ 1/B, and B is the total bandwidth available for transmission.

• Transmission through a doubly-selective channel


The time domain signal s(t) is transmitted through the channel. The channel
has I paths, with each path having channel gain α_i, delay τ_i, and Doppler ν_i, i =
1, 2, · · · , I. In the DD domain the channel is represented as (similar to (1.20))

h(τ, ν) = Σ_{i=1}^{I} α_i δ(τ − τ_i) δ(ν − ν_i),     (1.29)

and the corresponding time domain representation is (similar to (1.21))

h(t) = Σ_{i=1}^{I} α_i δ(t − τ_i) e^{j2πν_i(t−τ_i)}.     (1.30)

The Doppler and delay indices are k_i = ν_i MN T_s ∈ R and l_i = τ_i/T_s ∈ R+,
respectively; both are assumed to take fractional values.

• Conversion of received time domain signal to time domain samples


After transmission through the channel, the received time domain signal r(t) is
given by

r(t) = Σ_{i=1}^{I} α_i s(t − τ_i) e^{j2πν_i(t−τ_i)} + w(t),     (1.31)

where w(t) is the thermal noise (AWGN) introduced at the receiver. A matched
filtering operation is carried out with the receive pulse (taken to be the same as
the transmit pulse) as

y(t) = ∫_{−∞}^{∞} r(τ) g*(τ − t) dτ.     (1.32)

The obtained y(t) is sampled at t = vT_s instants, followed by removal of the CP to
get the vector y ∈ C^{MN×1}.

• Conversion from time domain samples to DD domain samples


The time domain vector y is converted to the DD domain using the DZT. The DZT is
a 1D (y ∈ C^{MN×1}) to 2D (Zy ∈ C^{M×N}) transform, which, again, is a set of two
operations: conversion of the vector into a matrix, followed by a Fourier transform
along the time

axis. Mathematically,

Z_y[m, n] = (1/√N) Σ_{k=0}^{N−1} y[m + kM] e^{−j2πnk/N}.     (1.33)

Zy is used for detection at the receiver. A code sketch of the IDZT and DZT operations is given below.
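The following minimal numpy sketch (illustrative code; a unitary FFT and column-wise reshaping are used, and the channel is omitted so that the round trip is exact) implements the IDZT in (1.25)-(1.26) and the DZT in (1.33):

```python
# Minimal numpy sketch of the IDZT (1.25)-(1.26) and the DZT (1.33).
import numpy as np

M, N = 8, 4
Zx = (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2)

# IDZT: unitary IDFT along the Doppler axis (1.25), then column-wise
# vectorization of the delay-time matrix, i.e., x = vec(Zx F_N^H) as in (1.26)
x = np.fft.ifft(Zx, axis=1, norm="ortho").reshape(-1, order="F")

def dzt(y, M, N):
    # DZT (1.33): fold y into an M x N delay-time matrix (y[m + kM] in row m,
    # column k) and take a unitary DFT along the time axis
    return np.fft.fft(y.reshape((M, N), order="F"), axis=1, norm="ortho")

# Without a channel, the DZT of the IDZT output recovers the DD symbols
assert np.allclose(dzt(x, M, N), Zx)
```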

Derivation of input-output relation for DZT-OTFS

Using (1.28) and (1.31) in (1.32), we have

y(t) = Σ_{i=1}^{I} α_i Σ_{u=−L_CP}^{MN−1} s[u] ∫_{−∞}^{∞} g(τ − uT_s − τ_i) g*(τ − t) e^{j2πν_iτ} dτ + w̃(t),     (1.34)

where w̃(t) is the matched filtered noise. Assuming that the maximum Doppler,
max_i {ν_i}, is much less than the bandwidth of the pulse, and denoting
f(t) = ∫ g(τ) g*(τ − t) dτ, y(t) can be approximated as [40]

y(t) ≈ Σ_{i=1}^{I} α_i e^{j2πτ_iν_i} Σ_{u=−L_CP}^{MN−1} s[u] e^{j2πν_i uT_s} f(t − uT_s − τ_i) + w̃(t).     (1.35)

For the considered g(t), f (t) can be approximately bounded to finite duration in time
[40]. The signal y(t) is sampled at rate 1/Ts to obtain the discrete signal

y[v] = Σ_{i=1}^{I} α_i e^{j2πτ_iν_i} Σ_{u=−L_CP}^{MN−1} s[u] e^{j2πuν_iT_s} f_i[v − u] + w̃[v],     (1.36)

where f_i[u] = f(uT_s − τ_i) is assumed to have a finite support satisfying the condition
that the range of the support is much less than M N . Removing the CP, (1.36) can be
approximated as

y[v] ≈ Σ_{i=1}^{I} α_i e^{j2πl_ik_i/MN} Σ_{u=0}^{MN−1} s[u] e^{j2πuk_i/MN} f̃_i[v − u] + w̃[v],     (1.37)

where f̃_i[u] is the periodic version of f_i[u] with period MN, k_i = ν_i MN T_s ∈ R, and
l_i = τ_i/T_s ∈ R+. Equation (1.37) can be expressed as

y = Σ_{i=1}^{I} α_i e^{j2πl_ik_i/MN} [(x ⊙ v_i) ⊛ f̃_i] + w̃,     (1.38)

where v_i[u] = e^{j2πuk_i/MN}, x ⊙ v_i denotes the element-wise product of x and v_i, and
⊛ is the circular convolution operator. The vector y is transformed to the DD domain
using the DZT to
obtain Zy as [40]

Z_y[m, n] = (1/√N) Σ_{k=0}^{N−1} y[m + kM] e^{−j2πnk/N}.     (1.39)

Substituting y in (1.39) and using modulation and circular convolution properties of


DZT [39], Zy can be written as

Z_y = Σ_{i=1}^{I} α_i e^{j2πτ_iν_i} Z_{y_i} + w,     (1.40)

where

Z_{y_i}[m, n] = Σ_{l=0}^{M−1} ( Σ_{k=0}^{N−1} Z_x[l, k] Z_{v_i}[l, n − k] ) Z_{f̃_i}[m − l, n],     (1.41)

and Z_{v_i} and Z_{f̃_i} are the Zak transforms of v_i and f̃_i, respectively. A numerical
sketch of the relations (1.38)-(1.39) is given below.
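To make (1.38)-(1.39) concrete, the following numpy sketch (illustrative assumptions: random path gains, fractional l_i and k_i, a sinc used purely as a stand-in for the pulse autocorrelation f(t), and noise omitted) builds the received vector path by path and transforms it to the DD domain:

```python
# Illustrative numpy sketch of (1.38)-(1.39): per path, a Doppler phase ramp
# (element-wise product with vi) followed by circular convolution with a
# periodized pulse autocorrelation f~i; a sinc is a stand-in for f(t).
import numpy as np

M, N = 8, 4
MN = M * N
rng = np.random.default_rng(0)

x = (rng.integers(0, 2, MN) * 2 - 1) + 0j              # time-domain samples
I = 3
alpha = (rng.standard_normal(I) + 1j * rng.standard_normal(I)) / np.sqrt(2 * I)
l = rng.uniform(0, 4, I)          # fractional delay indices   li = tau_i / Ts
k = rng.uniform(-2, 2, I)         # fractional Doppler indices ki = nu_i*MN*Ts

u = np.arange(MN)
u_centered = (u + MN // 2) % MN - MN // 2              # indices in [-MN/2, MN/2)
y = np.zeros(MN, dtype=complex)
for i in range(I):
    v_i = np.exp(1j * 2 * np.pi * u * k[i] / MN)       # phase ramp vi[u]
    f_i = np.sinc(u_centered - l[i])                   # periodized f~i (stand-in)
    circ = np.fft.ifft(np.fft.fft(x * v_i) * np.fft.fft(f_i))  # (x . vi) conv f~i
    y += alpha[i] * np.exp(1j * 2 * np.pi * l[i] * k[i] / MN) * circ

Zy = np.fft.fft(y.reshape((M, N), order="F"), axis=1, norm="ortho")  # DZT (1.39)
```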

Derivation of vectorized form of the input-output relation

Let z_y, z_{y_i}, z_x denote the vectorized forms of Z_y, Z_{y_i}, Z_x, respectively, i.e., the
(nM + m)th element in the vector is the [m, n]th entry in the corresponding matrix. The
vectorized form of the input-output relation between z_{y_i} and z_x is derived as follows.
Let A ∈ C^{M×N} and B ∈ C^{(2M−1)×N} be two matrices with entries A[m, n] = Z_{v_i}[m, n]
and B[m, n] = Z_{f̃_i}[m − (M − 1), n], m = 0, · · · , M − 1, n = 0, · · · , N − 1. Also, let
R_N ∈ C^{N×N} be a reversal matrix and P_N be a basic circulant permutation matrix of

size N [41]. Define a matrix H_q^{(i)′} ∈ C^{M×N} as

H_q^{(i)′}[m, n] = A[m, n] if m = (q)_M, and 0 otherwise,     (1.42)

for q = 0, 1, · · · , MN − 1. Here, (·)_M denotes the modulo-M operation. Let H_1^{(i)} ∈
C^{MN×MN} be a matrix whose qth row is filled with vec(H_q^{(i)′} R_N P_N^{⌊q/M⌋+1}), where
⌊·⌋ denotes the floor operator. Define H_q^{(i)′′} ∈ C^{M×N} as

H_q^{(i)′′}[m, n] = B[m + (q)_M, n] if n = ⌊q/M⌋, and 0 otherwise.     (1.43)

Also, define H_2^{(i)} ∈ C^{MN×MN} whose qth row is filled with vec(R_M H_q^{(i)′′}). Finally,
(1.41) and (1.40) can be vectorized as

z_{y_i} = H_2^{(i)} H_1^{(i)} z_x     (1.44)

and

z_y = Σ_{i=1}^{I} α_i e^{j2πl_ik_i/MN} z_{y_i},     (1.45)

respectively. Here, the matrix H_1^{(i)} effectively carries out the element-wise multiplication
with v_i, and H_2^{(i)} carries out the circular convolution with f̃_i in (1.38).
In all the signalling techniques presented above, the transmission is through a fading
channel which distorts the transmitted signal. For reliable detection of information
symbols at the receiver, this needs to be taken care of by undoing the effects of the
channel, which is referred to as channel equalization. Channel equalization requires the
estimates of the channel gains. This procedure of estimating the channel gains through
transmission of special symbols called pilot symbols is referred to as channel estimation.
In the next section, a brief discussion on channel estimation is presented.

1.3 Channel estimation in wireless channels


The received signal is distorted by the fading channel. In order to recover the transmit-
ted information symbols, the channel effect must be estimated and compensated in the
receiver.

1.3.1 Conventional signalling

In the conventional signalling scheme described in Section 1.2.1, channel estimation is
carried out by sending a pilot symbol x_p ∈ C. The pilot symbol is known at the receiver,
and the received symbol y_p is used for estimating the channel.
The received symbol can be written as (using (1.12))

yp = hxp + n, (1.46)

where h is the channel gain seen by the pilot symbol xp and n is the additive noise. A
least squares (LS) estimate is obtained by

ĥ_LS = arg min_h ∥y_p − hx_p∥_2^2,     (1.47)

which on simplification provides the estimate as [42]

ĥ_LS = y_p/x_p.     (1.48)

To further improve this estimate, a (linear) minimum mean square error (MMSE) esti-
mate is defined by introducing a weight w, such that

ĥMMSE = wĥLS . (1.49)



For a linear MMSE estimator, the error (between the MMSE estimate and the true value)
and the LS estimate are orthogonal to each other [43]. This is used to obtain w, i.e.,

E[(h − ĥ_MMSE) ĥ_LS^H] = 0.     (1.50)

Solving the above equation, w is obtained as (under the assumption that E[h] = 0 and
E[|h|^2] = 1 [11])

w = |x_p|^2 / (|x_p|^2 + σ^2),     (1.51)

where σ^2 = E[|n|^2] is the variance of the noise n with E[n] = 0, and | · | denotes the
absolute value of a complex number. Therefore, the MMSE estimate is given by

ĥ_MMSE = (|x_p|^2 / (|x_p|^2 + σ^2)) · (y_p/x_p).     (1.52)

The estimated channel can be used for detection over the coherence time of the channel.
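A small numerical sketch of these two estimates (illustrative values and variable names assumed here, not taken from the thesis) is given below:

```python
# Minimal sketch of the single-pilot LS (1.48) and linear MMSE (1.52) estimates.
import numpy as np

rng = np.random.default_rng(1)
xp = 1 + 0j                                   # known pilot symbol
sigma2 = 0.1                                  # noise variance
h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)  # h ~ CN(0,1)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal() + 1j * rng.standard_normal())
yp = h * xp + n                               # received pilot, (1.46)

h_ls = yp / xp                                # LS estimate, (1.48)
w = abs(xp) ** 2 / (abs(xp) ** 2 + sigma2)    # MMSE weight, (1.51)
h_mmse = w * h_ls                             # MMSE estimate, (1.52)
```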

1.3.2 OFDM

Channel estimation in OFDM systems is typically carried out using more than one
pilot. Recall that the effective input-output relation in OFDM is (from (1.17))

Y [k] = H[k]X[k] + V [k], k = 0, 1, · · · , M − 1.

Note that one transmit symbol only affects one receive symbol. Therefore, to obtain the
estimate of H[k] it is sufficient to know the tuple (Y [k], X[k]). This information can
be obtained by transmitting pilots at specific locations in the OFDM frame. Based on
the positions of the pilot symbols, three types of pilot structures are commonly used for
OFDM systems, namely, the block type pilot, comb type pilot, and lattice type pilot
[44]-[46]. Each of the pilot structures is described below.

• Block type pilot arrangement



Figure 1.12: Block type pilot arrangement.

Figure 1.12 shows a block type pilot arrangement. In this arrangement, OFDM
symbols with pilots at all subcarriers are transmitted periodically (with period
St ) for channel estimation. At the receiver, channel estimates are obtained at all
the pilot locations and a time domain interpolation is performed to estimate the
channel along time axis. St is chosen in accordance with the coherence time of
the channel. Since coherence time is inversely related to maximum Doppler spread
(νmax ) of the channel, St must be chosen such that

S_t ≤ 1/ν_max.     (1.53)

Block type pilot arrangement is especially suitable for slow-fading channels with
small values of ν_max. Although this pilot arrangement is suitable for frequency-
selective channels (as all the subcarriers carry pilot symbols), for time-varying
channels, S_t would need to be very small, which incurs significant overhead to track
the channel.

• Comb type pilot arrangement



Figure 1.13: Comb type pilot arrangement.

Comb type pilot arrangement is shown in Fig. 1.13. Each OFDM symbol has pilot
tones placed periodically along the subcarriers. The estimates obtained at these
locations are interpolated along the frequency axis. In order to track the channel
effectively, the separation between the pilots, Sf , must be chosen in accordance
with the coherence bandwidth of the channel. Sf can be related to the delay
spread τmax (which is inversely related to coherence bandwidth) of the channel as

S_f ≤ 1/τ_max.     (1.54)

As opposed to block type pilot arrangement, comb type pilot arrangement is suit-
able for time-varying channels. However, this pilot arrangement is ill suited for
frequency-selective channels (only a few subcarriers in each OFDM symbol carry
pilot symbols). A code sketch of comb type estimation with frequency interpolation
is given after this list.

• Lattice type pilot arrangement


Figure 1.14 shows the lattice type pilot arrangement. In this arrangement, pilot
symbols are inserted along both the time and frequency axes. This distribution of

Figure 1.14: Lattice type pilot arrangement.

pilot symbols facilitates both time and frequency domain interpolations for channel
estimation. However, the downside is that, as opposed to the earlier arrangements
which required 1D interpolation either along time or frequency, this arrangement
requires 2D interpolation along both time and frequency, which may be computa-
tionally expensive. The pilots are spaced St time slots apart and Sf subcarriers
apart from each other. Like for the previous pilot arrangements, St and Sf should
simultaneously support

S_t ≤ 1/ν_max  and  S_f ≤ 1/τ_max.     (1.55)

This pilot arrangement is suitable for both frequency-selective and time-selective


channels. Note that if νmax is too high, St would become too low, and the pilot
symbols would have to be placed closer to each other, which would result in poor
throughput.
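As referenced in the comb type discussion above, the following numpy sketch (illustrative assumptions: a random 4-tap channel, unit pilots, and simple linear interpolation) obtains LS estimates at the pilot subcarriers and interpolates along the frequency axis:

```python
# Minimal sketch of comb-type estimation: per-pilot LS estimates followed by
# linear interpolation of the channel response across all subcarriers.
import numpy as np

M, Sf = 64, 8                        # subcarriers and pilot spacing
rng = np.random.default_rng(2)
taps = (rng.standard_normal(4) + 1j * rng.standard_normal(4)) / np.sqrt(8)
H = np.fft.fft(taps, M)              # frequency response of a 4-tap channel
X = np.ones(M) + 0j                  # one OFDM symbol; pilots on every Sf-th tone
Y = H * X + 0.05 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

pilot_idx = np.arange(0, M, Sf)
H_ls = Y[pilot_idx] / X[pilot_idx]   # LS estimates at the pilot locations

all_idx = np.arange(M)               # interpolate real/imaginary parts separately
H_hat = np.interp(all_idx, pilot_idx, H_ls.real) \
        + 1j * np.interp(all_idx, pilot_idx, H_ls.imag)
```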

Figure 1.15: Magnitude response of the channel with integer delay-Doppler.

1.3.3 OTFS

Channel estimation in OTFS (both conventional and DZT-OTFS) is carried out using
pilot frames. Depending on the number of pilot symbols, data symbols, and guard
symbols (taken to be zeros to prevent interference), three commonly used pilot frame
structures are the exclusive pilot frame, the embedded pilot frame, and the superimposed pilot
frame. Recall that the DD channel is represented as (from (1.20))

h(τ, ν) = Σ_{i=0}^{P−1} α_i δ(τ − τ_i) δ(ν − ν_i).

A symbol transmitted at the (τ_p, ν_p) location in the DD grid would be received at P
locations¹ at the receiver, given by (τ_p + τ_i, ν_p + ν_i), i = 0, 1, · · · , P − 1.

¹P copies of the transmitted symbol are received at the receiver if all the P paths are resolved in
the DD domain. If two or more paths have the same delay and Doppler values, they merge to form a
single path, in which case the number of copies of the transmitted signal received is less than P.

Figure 1.16: Magnitude response of the channel with fractional delay-Doppler.

Fractional DD vs integer DD

Figure 1.15 shows the magnitude response of the channel with integer DD values. The
channel considered has 4 paths, which are characterized by the four distinct peaks seen in
the figure. Further, all the paths considered here are resolved. It is seen that under the
integer DD scenario, the DD spread due to each path is negligible and all the paths are
well localized. This implies that for every symbol transmitted, there are P = 4 copies
of this transmitted symbol received at the receiver. This is not the case with fractional
DD values. Figure 1.16 shows the magnitude response of the channel with fractional DD
values. All the peaks are still centered around the corresponding integer DD values. It
is seen that all the paths spread in the DD domain and the paths are no longer well
localized. Further, two or more paths may interfere with each other significantly when
they are close. The effect of such a channel is that every symbol transmitted is received
at multiple locations at the receiver and channel estimation with fractional DD is more
involved.
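The per-path spread along the Doppler axis can be seen from the N-point response (1/N) Σ_{n=0}^{N−1} e^{j2πn(k_i−k)/N}, which is a Kronecker delta at k = k_i when k_i is an integer and a Dirichlet (periodic sinc) kernel otherwise. A small numpy sketch (illustrative toy values) makes this explicit:

```python
# Illustrative sketch: Doppler-domain response of a single path with index ki.
import numpy as np

N = 16
k = np.arange(N)
n = np.arange(N)
for ki in (4.0, 4.5):                     # integer vs fractional Doppler index
    resp = np.abs(np.exp(1j * 2 * np.pi * np.outer(n, ki - k) / N).sum(axis=0)) / N
    leak = 1 - resp.max() ** 2 / (resp ** 2).sum()   # energy outside the peak bin
    print(f"ki = {ki}: peak magnitude {resp.max():.2f}, leaked energy {leak:.2f}")
```

For integer k_i, all the energy is in one bin; for fractional k_i, a significant fraction of the path energy leaks into the neighbouring Doppler bins.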

Channel estimation in OFDM vs OTFS

Channel estimation in OTFS is unlike OFDM in the following ways. First, the estimation
is carried out in the DD domain rather than the frequency domain in OFDM. Second,
the estimation in OTFS involves estimation of parameters of P paths while in OFDM
the estimation is carried out at the location of pilot symbols followed by interpolation
to obtain all the values in the TF grid. Third, the estimation in OTFS, for each path,
involves estimation of channel gain, delay, and Doppler, while in OFDM only channel
gain needs to be estimated. Fourth, in OTFS for a P path (integer DD) channel, each
transmitted symbol is received at P locations in the DD grid while in OFDM one trans-
mitted symbol corresponds to one received symbol, and therefore a simple operation
yields the channel estimate.

Pilot frame structures in OTFS

Since there is a “spread” of the transmitted pilot symbol into the neighbouring DD bins
(because of multipath effects), there is interference between symbols in a frame. To
alleviate this, one of two frame structures is used: an exclusive pilot frame, which has a
pilot symbol at the center and zeros elsewhere, or an embedded pilot frame, which has a
pilot symbol surrounded by guard bins (zeros) that are in turn surrounded by data symbols.
exclusive pilot frame has no interference, it results in poor throughput. Embedded pilot
frame is more spectrally efficient when compared to the exclusive pilot frame, but it
suffers from interference when fractional DD values are encountered in the channel. Su-
perimposed pilot frame has full spectral efficiency as the pilot symbols are superimposed
on the data symbols that are placed in all the DD bins. The drawback is that these
frames suffer significant interference from pilot to data and data to pilot, and iterative
methods are necessary to improve channel estimation accuracy.
Conventional channel estimation schemes rely heavily on the assumed model of the
channel. However, in practice, the channel may exhibit behaviour that deviates from
the assumed model. Under such situations, conventional schemes may perform poorly.

Machine learning algorithms, on the other hand, can learn the channel using training
data regardless of the underlying model. The training data which consists of different
noise and channel realizations helps the machine learning algorithms to generalize. Since
the estimation is done without assuming the channel model, these algorithms can be
robust to variations in the channel model. This provides the motivation to explore and
exploit machine learning algorithms for the purpose of channel estimation in wireless
communication systems. A brief machine learning primer is presented in the next section.

1.4 Learning framework


Learning is innate to humans. Right from birth, we learn to crawl and then to walk
accumulating experience from the previous endeavours and course correcting each time
something does not work out as planned. Inspired by the self-taught learning approaches
of humans, Arthur Samuel coined the term “machine learning” in 1959 at IBM [47]. He
defined machine learning as “the field of study that gives computers the ability to learn
without being explicitly programmed”. Stated another way, machine learning can be
viewed as a problem of construction of computer programs that automatically improve
with experience, E. Machine learning largely involves models that have trainable (tun-
able) parameters that need to be optimized using data referred to as training data.
Typically, a metric or cost function or a performance criterion is used to arrive at the
optimum values of these parameters. A typical machine learning problem contains three
main parts: first, the objective of the machine learning program, denoted by O; second,
the performance criterion, given by C; and, third, the training data or experience, repre-
sented by D. For example, for a machine learning program employed to classify images of
handwritten digits between 0 - 9, the O, C, and D are defined as follows [48]:

• O: Classify a given image into one of the ten classes labelled from 0 - 9.

• C: Accuracy of classification or percentage of the images correctly classified.

• D: Images of the handwritten digits with their corresponding labels.



A program that learns from E is called a machine learning program or simply a
learning program.

1.4.1 Classification of machine learning programs

Machine learning programs can be classified based on 1) the structure of training dataset
D, and 2) the nature of the objective O.

Classification based on training dataset

Depending on how the training data D is generated, any machine learning program can
be classified into the following four categories [49].

• Supervised learning
In this type of machine learning program, corresponding to each input there is
an output in the training dataset, D. The training is “supervised” using the
labels available in D. Classification and regression tasks fall under the category of
supervised learning. An example of supervised learning is as follows. Objective
O is the classification of images of birds into three categories, say, parrot, peacock,
and sparrow. Training data, D, consists of images of birds belonging to each of
the three classes. Corresponding to each image, there is a label provided, that
describes if the particular image is a parrot, peacock, or a sparrow. The labels are
used in the criterion C while training.

• Unsupervised learning
This type of learning program is used to draw inferences from input data without
labels in the training data. The goal in such learning programs is to cluster similar
data points into the same class. Consider the task (O) of clustering pictures of birds
and dogs into two clusters. The training data (D) consists of images of birds and
dogs without labels for each image. The goal is to make clusters of bird images and
dog images, i.e., to recognize which images are from the same class.

• Semi-supervised learning
This subclass of learning programs falls in between the two extremes of supervised
learning and unsupervised learning. Here, some of the data have labels, while the
majority do not. The training data can be thought of as “incomplete” in terms of
labels. This approach is chosen when the dataset has a large amount of data but only a
few samples are labelled, due to the cost and effort involved in labelling.

• Reinforcement learning
In this type of learning, there is no expected ground truth with labels. Rather, a
training feedback is given indicating whether the output of the learning algorithm
is favorable or not. That is, the learning is carried out just by telling the learning
algorithm if the output is right or wrong, without explicitly telling the algorithm
how to correct it or why the output is wrong.

Classification based on nature of objective

Based on the nature of the objective for which the learning program is trained, a learning
program can be classified into the following.

• Regression
Regression learning programs are also supervised learning programs. That is, for
such programs, all the samples in the training data have labels. The outputs of
the learning algorithm for such problems typically span subsets of the real line.
Consider a learning algorithm used for channel estimation in wireless communica-
tion systems. The input is the received pilot symbol, and the learning algorithm is
trained to provide the channel coefficient value (which can span the real line).

• Classification
Classification programs also fall under the category of supervised learning pro-
grams. The goal is to classify the input data into different classes. As opposed
to regression, the outputs are typically probability mass functions over the classes.
During training, the learning program updates its weights such that the correct

Figure 1.17: A block diagram showing the difference between the traditional program
and a machine learning program.

class is assigned high probability. Classification of received symbols in a communi-


cation system into one of the M classes for data symbols chosen from a modulation
alphabet is an example of a classification program.

• Clustering
Clustering programs are unsupervised. Unlike classification, the classes to which
the training data belong are not known a priori. The goal here is to group the
training data into meaningful clusters based on the features present in the data.

1.4.2 Difference between traditional program and machine learn-


ing program

Figure 1.17 shows the block diagrams of traditional program and machine learning pro-
gram (supervised) at a high level. In a traditional program, data is fed as input along
with program or logic. The computing resource uses the program to act on the data to
provide the required output. On the other hand, in machine learning programs, inputs
in the form of data and the expected output (ground truth) is fed as input to the com-
puting resource. The computing resource outputs the program (trained parameters of
the network). Once the training concludes, the parameters are frozen and saved. Fol-
lowing this, during the testing phase, a flow similar to that of the traditional program
is followed, wherein the input program is given by the saved network parameters.

1.4.3 Deep learning using neural networks

Classical machine learning algorithms require the manual extraction of the most impor-
tant features from the training data. Later, these features are used to build a machine
learning algorithm for classification or regression task. An example of this is principal
component analysis (PCA), where the most important features (principal components)
in the training data can be extracted [50]. However, as the dimensionality of the training
data grows, it might become infeasible to manually extract the most important features.
Neural networks (NNs) were introduced as a subset of machine learning algorithms to
overcome this difficulty [51]-[54]. NNs can automatically retrieve the relevant character-
istics/features from the training data through learning. Deep learning (DL) is the subset
of machine learning algorithms, where NN having more that two layers (and therefore
deep) is used. Three main architectures of NN exist in literature. They include fully
connected neural network, recurrent neural network, and convolutional neural network,
which are described below.

• Fully connected neural network (FCNN) [55]


The FCNN architecture comprises neurons arranged in layers. Each layer has an
input dimension M and output dimension N , i.e., there are M neurons at the
input of the layer and N neurons at the output. Each neuron at the input of a
layer is connected to each neuron at the output of the layer. This is seen in Fig.
1.18, where the input is M -dimensional and the output is N -dimensional. The
structure of a neuron is shown in Fig. 1.19, where i1 , i2 , · · · , in denote the inputs
to the neuron, which are multiplied by trainable weights w1, w2, · · · , wn. A trainable
bias b is added, after which the result is passed through a non-linear (or linear) function
to obtain the output of the neuron. The output, o, can be expressed as

o = ψ( Σ_{j=1}^{n} w_j i_j + b ).     (1.56)

During training all the parameters (weights and bias) of all the neurons in the

Figure 1.18: Architecture of one layer of FCNN.

Figure 1.19: Structure of the neuron.



Figure 1.20: A basic structure of RNN and time unfolding.

FCNN are updated.

• Recurrent neural network (RNN) [56]


RNNs are a special class of neural networks that are used for learning involving
time-series data. The architecture of RNNs allow them to “unfold” over time, in
which the output of the previous unfolding step becomes the input at the current
unfolding step. Figure 1.20 shows the basic structure of the RNN on the left and
the unfolding on the right. Following are the operations for the tth iteration:

h_t = ψ(w_hh h_{t−1} + w_xh x_t)  and  y_t = w_yh h_t,     (1.57)

where ψ is a non-linear (or linear) function, and w_hh, w_xh, and w_yh are trainable parameters.
h_t is called the hidden state, x_t is the input, and y_t is the output. Further, across all
unfolding steps indexed by t, although the weights remain constant, the hidden state and
the output depend on all the unfolding steps till t, and, therefore, these networks are said
to have “memory”. This property allows these networks to learn time-series data.
This architecture of RNN, called the vanilla RNN, is affected by vanishing gradients
that limit its capabilities. To overcome this and to learn long time-series data, two
modified architectures have been proposed, namely, the long short-term memory
(LSTM) [57] and the gated recurrent unit (GRU) [58].

• Convolutional neural networks (CNN) [59]

Figure 1.21: Pictorial representation of CNN.

CNNs were proposed to identify patterns in matrix-like datasets (e.g., images). For a
2D CNN, the trainable parameters are the entries of a 2D kernel matrix that slides over
the input, multiplying the kernel entries with the corresponding input values and summing
the values to get the output. In other words, the input values are filtered through
the kernel. Fig. 1.21 shows this operation. A 3 × 3 kernel and an input of size
5 × 5 is considered. y1 is obtained as

y1 = ψ(w1 x1 + w2 x2 + w3 x3 + w4 x6 + w5 x7 + w6 x8 + w7 x11 + w8 x12 + w9 x13).     (1.58)

After this operation, the kernel is shifted one position to the right so that w1 is
aligned with x2 to obtain y2. Once the row is done, the kernel is shifted to the
next row (with w1 aligning with x6 to get y4) and the process is continued until
the entire input is filtered. A short code sketch of these three architectures is given below.
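The following PyTorch sketch (illustrative layer sizes only; not tied to any network proposed in this thesis) instantiates the three architectures: a fully connected layer realizing (1.56), an LSTM that unfolds over a time series as in (1.57), and a 2D convolution sliding a 3 × 3 kernel over a 5 × 5 input as in (1.58):

```python
# Minimal PyTorch sketch of the three architectures (toy sizes).
import torch
import torch.nn as nn

# FCNN layer: o = psi(sum_j wj*ij + b) for each output neuron, as in (1.56)
fc = nn.Linear(in_features=8, out_features=4)
o = torch.relu(fc(torch.randn(1, 8)))

# RNN (LSTM variant): unfolds over a 10-step, 1-dimensional time series
rnn = nn.LSTM(input_size=1, hidden_size=16, num_layers=1, batch_first=True)
out, (h_t, c_t) = rnn(torch.randn(1, 10, 1))

# CNN: a 3 x 3 kernel slid over a 5 x 5 input yields a 3 x 3 output, as in (1.58)
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
y = torch.relu(conv(torch.randn(1, 1, 5, 5)))
```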

1.4.4 Common terminologies in deep learning framework

Some common terminologies and their functions for training NNs are presented below.

• Forward propagation: A pass through the NN from the input to the output is called
forward propagation.

• Loss function: A criterion evaluated at the output of the NN after the forward
pass. This quantifies how far (or near) the output of the NN is to the expected
output. Examples include, mean square error loss, binary cross entropy loss, L1
loss, among others.

• Back propagation [60]: Once the loss function value is evaluated, the gradient of
this loss is computed with respect to each trainable parameter in the NN, working
from output to input. This process of evaluating the gradients from the output
towards the input is called back propagation.

• Optimizer: Optimizer is an algorithm that actually performs updates to all the


trainable parameters based on the evaluated gradient. Common examples are
Adam, stochastic gradient descent, and RMSProp.

• Learning rate: Defines how fast (or slow) the weight updates happen. It is desirable
to have a large value of learning rate at first and decrease the value as the training
progresses for fine tuning purposes.

• Epoch: The number of training iterations to perform. Each epoch contains multiple
forward and backward propagations.

• Batch size: The number of training samples used in each forward/backward pass
(iteration) within an epoch.

• Activation function: Typically a non-linear function used at the output of the NN.
In the FCNN, RNN, and CNN descriptions, the function ψ(·) is the activation function.
The non-linearity is what allows the NN to learn complex tasks. Common examples
are the rectified linear unit (ReLU), sigmoid, tanh, and softmax.

• Hyperparameters: All the parameters of training not pertaining to the NN are


called hyperparameters. For example, optimizer, loss function, batch size are all
hyperparameters.

• Parameters: All the variables pertaining to the NN are referred to as parameters.


Number of layers, activation function, input and output size are all parameters.

• Training dataset: Dataset used for training.

• Validation dataset: Data not in the training dataset, used to validate the training.
Used to handle issues like overfitting in the NN.

• Test dataset: Data used while testing after the training is complete. The data in
this dataset is not present in both the training and validation dataset. Performance
of the trained network is evaluated using this dataset.

• Learning library: This refers to libraries that contain all the basic code required for
training. That is, the code for the NN, optimizer, and forward and back propagation is
already written in these libraries. Popular examples are PyTorch [2] and Tensorflow
[3]. A minimal training loop tying these terminologies together is sketched after this list.

• Hardware for learning: This refers to hardware that are designed to speed up
training times. The hardware is generally referred to as graphical processor units
(GPUs). Some of the most popular and powerful GPUs are Nvidia Titan, Nvidia
RTX 3090, and Nvidia RTX 4090.
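As a small, self-contained illustration of how these pieces fit together (the toy data and hyperparameter values here are assumptions, not values used in this thesis), consider the following PyTorch training loop:

```python
# Minimal PyTorch training-loop sketch illustrating the terminology above.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.MSELoss()                                   # loss function
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)    # optimizer, learning rate

inputs = torch.randn(256, 2)                               # toy training dataset
targets = inputs.sum(dim=1, keepdim=True)                  # toy regression labels

batch_size, num_epochs = 32, 20
for epoch in range(num_epochs):                            # epochs
    for b in range(0, len(inputs), batch_size):            # batches
        x, t = inputs[b:b + batch_size], targets[b:b + batch_size]
        optimizer.zero_grad()
        loss = criterion(net(x), t)   # forward propagation and loss evaluation
        loss.backward()               # back propagation of gradients
        optimizer.step()              # parameter update by the optimizer
```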

In this thesis, we consider supervised learning for regression tasks. We consider FCNN,
RNN, and CNN architectures.

1.5 Contributions made in the thesis


In this thesis, we focus on the problem of channel estimation in wireless channels for
various signalling techniques (conventional, OFDM, and OTFS). For this purpose, we
consider different pilot arrangements at the transmitter. At the receiver, we design and
train NNs for channel estimation and use the estimates obtained from the trained NNs
for symbol detection.

1.5.1 Deep channel prediction

In Chapter 2, we consider the problem of channel prediction in time-varying fading


channels [61]. In time-varying fading channels, channel coefficients are estimated using
pilot symbols that are transmitted every coherence interval [11]. For channels with high
Dopplers, the rapid channel variations over time will require these pilots to be trans-
mitted often. This requires considerable bandwidth for pilot transmission, leading to
poor throughput. We propose a novel receiver architecture using deep recurrent neural
networks (RNNs) that learns the channel variations and, thereby, reduces the number of
pilot symbols required for channel estimation. Specifically, we design and train an RNN
to learn the correlation in the time-varying channel and predict the channel coefficients
into the future with good accuracy over a wide range of Dopplers and signal-to-noise
ratios (SNRs). The proposed training methodology enables accurate channel prediction
through the use of techniques such as teacher-forced training [62], early-stop, and reduc-
tion of learning rate on plateau. Also, the robustness of prediction for different Dopplers
and SNRs is achieved by adapting the number of predictions into the future based on
the Doppler and SNR. We also propose a data decision driven receiver architecture using
RNNs, wherein the data symbols detected using the channel predictions are treated as pi-
lots to enable more predictions, thereby the pilot overhead is further reduced. Numerical
results show that the proposed RNN based receiver achieves good bit error performance
in time-varying fading channels, while being spectrally efficient.

1.5.2 Learning based channel estimation in OFDM

In Chapter 3, we consider the problem of channel estimation in doubly-selective (i.e.,


time-selective and frequency-selective) channels in OFDM systems in the presence of
oscillator phase noise (PN). While channel estimation techniques for OFDM systems in
time-flat, frequency-selective channels have been well studied and adopted in practice,
estimating a channel with rapid time variations is challenging [14]. Also, OFDM receivers

are known to be sensitive to impairments due to local oscillator PN [18]. Methods re-
ported in the literature to estimate the channel incur significant overhead in terms of
the number of training/pilot symbols needed to effectively estimate the channel in the
presence of PN [63],[64]. To overcome these shortcomings, we propose a learning based
channel estimation scheme for OFDM systems in the presence of both PN and doubly-
selective fading [65]. We view the channel matrix as an image and model the channel
estimation problem as an image completion problem where the information about the
image is sparsely available. Towards this, we devise and employ two-dimensional convo-
lutional neural networks (CNNs) for learning and estimating the channel coefficients in
the entire time-frequency (TF) grid, based on pilots sparsely populated in the TF grid.
In order to make the network robust to PN impairment, we employ a novel training
scheme where the training data is rotated by random phases before being fed to the net-
work. Further, using the estimated channel coefficients, we devise a simple and effective
PN estimation and compensation scheme. Our results demonstrate that the proposed
network and the PN compensation scheme achieve robust OFDM performance in the
presence of PN and doubly-selective fading.

1.5.3 Learning based DD channel estimation in OTFS

In Chapter 4, we consider the problem of DD domain channel estimation in OTFS


systems using deep learning techniques. Widely considered pilot frame structures for
DD channel estimation in OTFS include exclusive pilot frame, embedded pilot frame,
interleaved pilot frame, and superimposed pilot frame. We devise suitable learning based
architectures for channel estimation using these pilot frames as detailed below.

• First, we propose a learning based architecture for estimating the DD channel for
both exclusive pilot frame (which consists of a single pilot symbol at the middle
of the frame surrounded by zeros) and embedded pilot frame (which consists of a
pilot symbol and data symbols separated by zeros as guard symbols in between)
[66]. The proposed learning network, called DDNet, is based on a multi-layered

RNN framework with a novel training methodology that works seamlessly for both
exclusive pilot frames as well as embedded pilot frames. This generalization is
attributed to the training methodology, wherein multiple frame realizations with
different guard band sizes are used to train the network. Our results demonstrate
that the proposed DDNet achieves better mean square error (MSE) and bit error
rate (BER) performance compared to impulse based [67] and threshold based [68]
DD channel estimation schemes reported in the literature.

• Next, we consider DD channel estimation for interleaved pilot (IP) frame, where
pilot symbols are interleaved with data symbols in a lattice type fashion, without
any guard symbols. For this IP frame structure, we propose an RNN based channel
estimation scheme [69]. The proposed network is called IPNet. The proposed
IPNet is trained to overcome the effects of leakage from data symbols and provide
channel estimates with good accuracy in terms of MSE performance. Our results
show that the proposed IPNet architecture achieves good BER performance while
being spectrally efficient.

• The pilot frame structures considered above incur rate loss due to pilot symbol
and guard symbols. This rate loss can be avoided by superimposing pilot symbols
over data symbols. Our contributions here in this regard are two-fold. First, we
propose a sparse superimposed pilot (SSP) scheme, where pilot and data symbols
are superimposed in a few bins and the remaining bins carry data symbols only
[70]. This scheme offers the benefit of better inter-symbol leakage profile in a frame,
while retaining full rate. Second, for the SSP scheme, we propose an RNN based
learning architecture (referred to as SSPNet) trained to provide accurate channel
estimates overcoming the leakage effects in channels with fractional delays and
Dopplers [70]. Our results show that the proposed SSP scheme with the proposed
SSPNet based channel estimation performs better than a fully superimposed pilot
(FSP) scheme [71] with interference cancellation based channel estimation reported
in the literature.

1.5.4 Learning in TF domain for DD channel estimation in


OTFS

In Chapter 5, we propose a novel learning based approach for channel estimation in OTFS
systems, where learning is done in the TF domain for DD domain channel estimation
[72], [73]. Learning in the TF domain is motivated by the fact that the range of values in
the TF channel matrix is favorable for training, as opposed to the large swing of values
in the DD channel matrix, which is not. A key beneficial out-
come of the proposed approach is its low complexity along with very good performance.
Specifically, it drastically reduces the complexity of the computation of a constituent DD
parameter matrix (CDDPM) in a state-of-the-art algorithm [74]. We develop this TF
learning approach for two types of OTFS systems, namely, 1) two-step OTFS [72], and
2) DZT-OTFS [73]. Our results show that the proposed TF learning-based approach
achieves almost the same performance as that of the state-of-the-art algorithm, while
being drastically less complex making it practically appealing.

1.6 Organization of the thesis


The rest of the thesis is organized as follows. In Chapter 2, we present the proposed
deep channel prediction for time-varying fading channels. In Chapter 3, we present
the proposed learning based approach for channel estimation in OFDM systems in the
presence of phase noise. Delay-Doppler channel estimation for OTFS through learning
using different frame structures, namely, exclusive pilot, embedded pilot, interleaved
pilot, and superimposed pilot is presented in Chapter 4. Chapter 5 presents the proposed
learning approach in TF domain for DD channel estimation for two-step OTFS and DZT-
OTFS. Conclusions and scope for future work are presented in Chapter 6.
Chapter 2

Deep channel prediction in time-varying channels

2.1 Introduction
In time-varying fading channels, channel coefficients are estimated using pre-determined
symbols (i.e., known both at the transmitter and receiver), called pilot symbols, that are
transmitted once every coherence interval. For channels with high Doppler spread, the
rapid channel variations over time will require considerable bandwidth for pilot trans-
mission, leading to poor throughput. In this chapter, we propose a novel receiver archi-
tecture using deep recurrent neural networks (RNNs) that learns the channel variations,
and thereby reduces the number of pilot symbols required for channel estimation. Specif-
ically, we design and train an RNN to learn the correlation in the time-varying channel
and predict the channel coefficients into the future with good accuracy over a wide range
of Dopplers and signal-to-noise ratios (SNRs). The key contributions and highlights in
this chapter can be summarized as follows [61]:

• First, an RNN is designed and trained to learn the correlation in the time-varying
fading channel and predict the channel coefficients into the future with good accu-
racy over a wide range of Dopplers and SNRs.


• The proposed training methodology enables accurate channel prediction through


the use of techniques such as teacher-forced training, early-stop, and reduction of
learning rate on plateau.

• The robustness of prediction for different Dopplers and SNRs is achieved by adapt-
ing the number of predictions into the future based on the Doppler and SNR.

• Next, a data decision driven receiver architecture using RNNs is proposed that
further reduces the pilot overhead while maintaining good performance.

• Numerical results show that the proposed receivers achieve good BER performance
in time-varying fading channels.

The rest of the chapter is organized as follows. The considered system model and a
brief background on deep neural network architectures used in this chapter is presented in
Section 2.2. The proposed deep channel predictor, its architecture, training methodology,
and performance are presented in Section 2.3. In addition, the proposed adaptive channel
prediction scheme and its performance are also presented in this section. The proposed
data decision driven architecture and its performance are presented in Section 2.4. A
summary of the chapter is presented in Section 2.5.

2.2 System model


Consider a point-to-point wireless communication system with a single antenna trans-
mitter and receiver. The channel between the transmitter and receiver is a time-varying
fading channel. The information symbols are chosen from an M -ary constellation. Let
x(t) be the transmit signal at the tth time instant. The channel fade coefficient at the
tth time instant is denoted by h(t). Let y(t) be the received signal at the receiver and
n(t) be the additive noise. Now, y(t) can be written as

y(t) = h(t)x(t) + n(t). (2.1)



The channel fade coefficients are statistically modelled by a circularly symmetric complex
Gaussian random variable with mean 0 and variance 1, i.e., h ∼ CN (0, 1). The additive
white Gaussian noise (AWGN), n(t), is modelled as n ∼ CN (0, σ 2 ), where σ 2 is the
variance of the noise.
We consider a mobile communication scenario where there is relative motion between
the transmitter and receiver. This introduces Doppler spread in the channel due to time
selectivity and the channel fades h(t) become temporally correlated. The correlation
in h(t) depends on several factors such as scatterers in the propagation environment,
relative velocity between the transmitter and receiver, etc. The power spectral density
(PSD) of h(t) is non-zero in the interval [−fDmax , fDmax ], where fDmax is the maximum
Doppler given by [75],[76]
f_Dmax = f_c v / c.     (2.2)
In (2.2), v is the maximum relative velocity between the transmitter and the receiver, fc
is the carrier frequency, and c is the speed of light. Therefore, the Doppler spread of the
channel is given by 2fDmax . For a low Doppler spread, the channel changes slowly over
time, while a high value of Doppler spread indicates that the channel varies rapidly with
time. The coherence time (Tc ) of the channel is inversely proportional to the Doppler
spread, Tc ∝ 1/fDmax . In order to detect the transmitted signal x(t) from y(t), the value
of h(t) has to be estimated at the receiver. In each transmission block spanning one
coherence time, the channel gain is estimated and employed for detection of the data
signal transmitted in that coherence block.
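As an illustrative example (with assumed values), a carrier frequency of f_c = 2 GHz and a relative velocity of v = 100 km/h (≈ 27.8 m/s) give, from (2.2), f_Dmax = (2 × 10^9 × 27.8)/(3 × 10^8) ≈ 185 Hz, i.e., a Doppler spread of about 370 Hz and a coherence time on the order of a few milliseconds.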
Typical wireless communication systems transmit one or more pilot symbols in each
coherence block to estimate the channel coefficients. Let Tp and Td be the duration of
pilot transmission and data transmission, respectively, in a coherence block, i.e., Tc =
Tp + Td . Let p(t) be the pilot signal transmitted at the tth time instant. The signal
received during the pilot transmission phase can be written as

yp (t) = h(t)p(t) + n(t). (2.3)



The linear minimum mean square error (LMMSE) estimate of the channel coefficient
that achieves the Cramer-Rao lower bound is given by [77],[11]

ĥ(t) = y_p(t) |p(t)|^2 / (p(t) (|p(t)|^2 + σ^2)).     (2.4)

The transmission of pilots reduces the spectral efficiency and throughput of the com-
munication system. That is, a T_p/T_c fraction of the channel uses does not carry data.
The efficiency of channel usage is defined as

η = 1 − T_p/T_c.     (2.5)

For a fixed number of pilots per coherence block, as the coherence time decreases, η also
decreases. High mobility wireless communication channels may require a large amount
of bandwidth to be used for pilot transmission, which, in turn, reduces the
achievable data rate and system capacity.
Since the time-varying channel coefficients are temporally correlated, the number of
pilots transmitted to estimate the channel coefficients can be reduced by learning this
correlation model and using the learnt model in channel estimation. As the correlation
model could be different for different channel geometries, a statistical solution to this
problem may not be robust. Therefore, we propose to employ a deep learning based
solution to learn the channel correlation model and predict the channel coefficients into
the future to reduce the pilot transmissions. Towards this, we employ recurrent neural
networks (RNN) and fully connected neural networks (FCNN) to construct the proposed
deep channel predictor and the receiver. Consequently, in the following subsection, we
present a brief background on deep neural networks, focusing on FCNNs and RNNs.

2.2.1 Deep neural networks

The deep neural network architectures that we employ are FCNNs and RNNs (see Section
1.4.3).
52 Chapter 2. Deep channel prediction in time-varying channels

Figure 2.1: Recurrent unit of the LSTM architecture.

There are several implementations of RNN. In this chapter, we make use of an im-
plementation known as the long short-term memory (LSTM) [57]. The block diagram
of the recurrent unit of the LSTM architecture of RNN is shown in Fig. 2.1. This ar-
chitecture consists of three gates. These gates learn the temporal information that are
relevant and pass it to the next iteration. In each gate, a sigmoid function is applied
that restricts the output to values between 0 and 1. The outputs of the activations are
then multiplied to decide which part of the information is relevant. During training, the
weights are updated such that the relevant information gets a larger weight, which yields
a value close to 1 after the sigmoid function. In Fig. 2.1, the variable ci , called the cell
state, is made available to all unfolded blocks. The variable Si refers to the hidden state
of the cell and Ii is the input to the cell. In our setup, the input Ii corresponds to either
the channel estimates from (2.4) or the fed back prediction values (see Fig. 2.2). The ci ’s
and Si ’s are updated at each stage i using Ii ’s. However, the information that is passed
on to the next iteration depends on the gate values. We use LSTM implementation of
the RNN because, as opposed to the basic RNN implementation, LSTMs are able to
learn correlation model in long time-varying sequences [57].
We use the PyTorch machine learning library for the implementation, training, and test-
ing of all the neural networks proposed in this chapter [2]. We use the Nvidia Titan RTX

Figure 2.2: Block diagram of the channel predictor neural network.

Parameter             Value
Number of layers      1
Input dimension       1
Hidden units          100
Output dimension (l)  100
Direction             Uni-directional

Table 2.1: Parameters of LSTM layer of channel predictor.

GPU platform to carry out all the simulations.

2.3 Proposed deep channel predictor


In this section, we present the proposed RNN based deep channel predictor, its architec-
ture, training methodology, and performance. The channel predictor uses the received
pilot symbols to learn the channel variation model and predict the channel coefficients
into the future.

2.3.1 Architecture

The proposed deep channel predictor consists of two prediction networks, one each for
predicting the real and imaginary parts of the channel coefficients. The architecture
for these networks are the same and they are trained separately. The deep channel
predictor network consists of an LSTM network and an FCNN. The block diagram of
the proposed deep channel predictor is shown in Fig. 2.2. The rationale behind choosing
the parameters of LSTM and FCNN are presented below.

Figure 2.3: Mean square error of predictions made by predictor network for different
number of layers in the LSTM network.

(a) 1-layer (b) 5-layers

Figure 2.4: Training and validation loss trajectory for 1-layer and 5-layer LSTM archi-
tectures.

The purpose of using LSTM is two fold. First, the LSTM is capable of identifying
temporal correlations in the inputs and learning a correlation model. Second, the LSTM
can leverage the learnt correlation model to make predictions that obey the model. Also,
we choose a single layer LSTM architecture for the predictor network based on the follow-
ing performance evaluation. Figure 2.3 plots the mean square error (MSE) performance
of the trained predictor network as a function of SNR, for different number of layers in
the LSTM architecture. For fair comparison, the training for all the cases presented in
Fig. 2.3 is performed with the same training parameters and initial conditions. It is
seen that the MSE of the predictions improves in the low to mid SNR regime when the
number of layers is increased from 1 to 3. However, with further increase in the number
of layers, the MSE performance of the network degrades, which can be attributed to
the phenomenon of over-fitting, wherein a network learns to perform well only on the
data set used in training [78]. This is illustrated in Figs. 2.4a and 2.4b which show the
training/validation loss performance for 1-layer LSTM and 5-layer LSTM, respectively.
It is seen that both 1-layer and 5-layer LSTMs show convergence to small training loss
values (indicating successful training). However, in the validation phase (where data
not in the training data set is used for validation) the validation loss in 5-layer LSTM
does not show convergence to small values (indicating over-fitting), while 1-layer LSTM
achieves convergence to small loss values in validation phase as well. Although LSTM
architecture with 3-layers has the best MSE performance, the improvement it offers over
1-layer LSTM architecture is not significant compared to the complexity it introduces.
For example, the number of parameters in the predictor network with 1-layer, 2-layer,
and 3-layer LSTMs are 41301, 122101, and 202901, respectively. The five-fold increase in
the number of parameters when compared to the 1-layer LSTM makes the 3-layer archi-
tecture slower and harder to train than its 1-layer counterpart. Further, above 25
dB, the MSE performance of the predictor network with 1-layer, 2-layers, and 3-layers
LSTM architectures are almost similar. Therefore, we choose 1-layer LSTM architecture
with the parameters listed in Table 2.1 for the predictor network throughout this chapter
for its simplicity and reasonably good MSE performance.

Parameter             Value
Input dimension (l)   100
Output dimension (m)  1
Activation function   Linear
Number of layers      1

Table 2.2: Parameters of FCNN layer of channel predictor.

The FCNN layer is employed to reduce the output of the LSTM layer to the required
dimension. In our setup, the data from the output of the LSTM has a dimension of 100,
which is to be reduced to a dimension 1 indicating a single channel prediction. However,
picking one dimension arbitrarily may not yield the best solution. The FCNN takes the
100-dimensional data as input, and during training assigns large weights to those outputs
which have a potentially higher bearing on the prediction value as compared to the rest.
This can improve the performance of the predictions made by the setup. The FCNN
layer parameters are listed in Table 2.2.
Figure 2.2 shows the block diagram that depicts the working of the channel predictor.
The predictor network expects an n-length sequence of channel coefficients as input. The
working of the network is divided into two phases, namely, the initial estimation phase
and the subsequent prediction phase. In the estimation phase, n pilots are transmitted
in n coherence intervals. The LMMSE channel estimates from these transmitted pilots
are obtained using (2.4). These estimates are used to initialize the entries of the input
vector c = [c1 c2 · · · cn ] with entries arranged chronologically, c1 being the least recent
estimate and cn being the most recent. This initialized vector is provided as the input to
the LSTM network. The entries of c reflect the correlation among the channel coefficients,
and are used by the LSTM network to predict the channel coefficients in the subsequent
coherence intervals. The output of the LSTM network is fed to the FCNN layer.
The FCNN layer produces one channel prediction at its output. This output is the
prediction for one-step into the future, i.e., this output is the predicted coefficient for
the coherence interval next to the coherence interval for which the most recent estimate
cn was obtained.

Figure 2.5: Block diagram of the channel predictor aided receiver.

This concludes the initial estimation phase. Subsequently, in the
prediction phase, the input vector c is left shifted so that c1 is flushed out and the
previously obtained prediction value is used to fill the vacant cn space after the left shift
operation. The input of the LSTM is thus updated with the most recent prediction. A
procedure similar to that in the estimation phase is followed again to obtain the channel
prediction corresponding to the next coherence interval. This process is repeated as
many times as the number of predictions required. The predictions thus made by the
network are stored in an array. At the end of required number of predictions, the array
is used to decode the transmitted symbols. The architecture is therefore flexible in the
sense that it allows for dynamic adjustment of the number of channel predictions.
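Continuing the sketch above (with the same torch import), the estimation-then-prediction procedure might be realized as follows; the function name and arguments are illustrative, and c holds the n chronological LMMSE estimates:

    def predict_channel(model, c, num_predictions):
        # c[0] is the least recent estimate, c[-1] the most recent.
        window = list(c)
        predictions = []
        for _ in range(num_predictions):
            seq = torch.tensor(window, dtype=torch.float32).view(1, -1, 1)
            with torch.no_grad():
                pred, _ = model(seq)
            predictions.append(pred.item())
            # Left shift: flush c1, fill the vacant c_n slot with the new prediction.
            window = window[1:] + [pred.item()]
        return predictions

Because num_predictions is an argument, the number of future predictions can be adjusted at run time, which is the flexibility exploited by the adaptive scheme of Section 2.3.4.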
The block diagram of the overall channel predictor aided receiver is shown in Fig.
2.5. The channel predictor block is followed by a data decoder. The data decoder can be
a maximum likelihood (ML) decoder or a neural network (NN) based decoder. We will use
ML decoding for transmission schemes that decode symbol by symbol, due to low ML
decoding complexity in such cases. For block transmission schemes which require joint
decoding of symbols (e.g., cyclic prefix single carrier (CPSC) scheme in Section 2.3.7),
we will use NN-based decoder approach.
Further, the channel predictor and the NN decoder can be trained together as a
single network, as it would alleviate the need for a separate decoder. However, we keep
the training for the predictor and the decoder separate with the intention of having a
universal predictor network. That is, once trained, the predictor network can be used in
conjunction with any decoder. On the other hand, if training is done for the predictor and
decoder together, there is a need to train and store multiple models, each corresponding
to a different decoder.

2.3.2 Training methodology

In this subsection, we describe the training of the channel predictor network. To train
the channel predictor network, correlated channel coefficients that mimic a channel with
time selectivity are used. This data is generated using the Clarke and Gans model
[75]. Although simpler models like the first-order auto-regressive model can be used in
order to reduce the effort required in training, they may not adequately capture the true
variations in the channel. Training using such models may lead to a higher MSE that
would not allow a large number of channel predictions into the future. For a time-selective
channel, the auto-correlation function of the fading process h(t) is given by

R(∆t) = J0(2π fDmax ∆t),    (2.6)

where J0(·) is the Bessel function of the first kind and zeroth order. The power
spectral density (PSD) of h(t), which is also referred to as the Jakes' spectrum, is given by

SH(f) = rect(f / (2 fDmax)) / (π fDmax √(1 − (f / fDmax)²)),    (2.7)

where rect(·) is the rectangular function defined as

rect(x) = 1 for x ∈ [−1/2, 1/2], and rect(x) = 0 otherwise.    (2.8)

Specifically, we use the implementation of the Clarke and Gans model given by Smith in
[76] to generate the required data. In our simulations, a coherence block consists of 42
symbols for fD = 50 Hz and 11 symbols for fD = 100 Hz.
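For illustration, correlated fade samples with the autocorrelation in (2.6) can be generated with a sum-of-sinusoids approximation of Clarke's model; this sketch is illustrative and is not the exact Smith implementation of [76]:

    import numpy as np

    def clarke_fading(num_samples, f_d, t_s, num_paths=64, seed=0):
        # f_d: maximum Doppler shift (Hz); t_s: sample period (s).
        rng = np.random.default_rng(seed)
        t = np.arange(num_samples) * t_s
        alpha = rng.uniform(0, 2 * np.pi, num_paths)  # angles of arrival
        phi = rng.uniform(0, 2 * np.pi, num_paths)    # initial phases
        omega = 2 * np.pi * f_d * np.cos(alpha)       # per-path Doppler shifts
        h = np.exp(1j * (np.outer(t, omega) + phi)).sum(axis=1)
        return h / np.sqrt(num_paths)                 # unit average power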

To begin the training, the LSTM and FCNN networks in Fig. 2.2 are initialized with
random or untrained weights. A large number of correlated channel coefficient samples
are generated using the implementation mentioned above, so that the network is able
to generalize well. The channel coefficients are separated into real and imaginary parts.
The real (imaginary) part is used as training data for the predictor network which is set
to predict the real (imaginary) part of the channel coefficients. This is done because most deep
learning libraries support training only with real values. Since the real and imaginary parts
each retain the temporal correlation, we use a separate network for each. While training, the number of
predictions made by the network is fixed at 100. The network is trained with 10-length
sequence of input data and 100-length sequence of ground truth predictions or expected
predictions. Both are obtained from the correlated channel coefficients obtained through
Smith’s method. Therefore, we sample blocks of length 110 from the generated samples
of correlated channel coefficients. The sampled sequence is structured such that the first
entry in the sequence is the least recent and the last entry is the most recent. The first 10
entries (from the least recent end) of the sequence are provided as input to the predictor
network. The network produces 100-length predictions. These predictions are compared
with the last 100 entries of the 110-length sequence by computing a mean square error
(MSE) loss function, given by

L(x, x̂, Θ) = E[∥x − x̂(Θ)∥²].    (2.9)

In the above equation, E[·] is the expectation operator, x is the 100-length expected
output, and x̂(Θ) is the 100-length output of the neural network, which is a function of
the network parameters Θ. At each training iteration, the value of (2.9) is computed and
the weights are updated so as to minimize the loss function through back propagation.
This procedure is repeated for real and imaginary predictions in each iteration. The
values of the hyper-parameters used for training the channel predictor are given in Table
2.3.
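The 110-length blocks described above could be sliced from a long run of generated coefficients along the following lines (function and argument names are illustrative):

    def make_training_pairs(coeffs, in_len=10, out_len=100):
        # Each pair: first 10 samples as input, next 100 as expected predictions.
        block = in_len + out_len
        pairs = []
        for start in range(len(coeffs) - block + 1):
            seq = coeffs[start:start + block]   # chronological 110-length block
            pairs.append((seq[:in_len], seq[in_len:]))
        return pairs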
Hyper-parameter               Value
Starting learning rate        0.01
Minimum learning rate         10−8
Epochs                        Min. 200, Max. 1000
Optimizer                     Adam
Loss function                 MSE
Batch per epoch               4500
Training Doppler frequency    10 Hz (see Remark 1)

Table 2.3: Hyper-parameters used for training channel predictor.

Remark 1: Note that the predictor network is trained at a Doppler frequency of
10 Hz (see Table 2.3). The predictor trained at 10 Hz Doppler is able to predict well
over a range of Doppler values (as will be shown in Figs. 2.7 and 2.10 later). This is
because the training teaches the LSTM network essentially to observe the underlying
correlation model in the input data and leverage the observed model to predict future
coefficients. In the 10 Hz case, the network trained at 10 Hz observes that the channel
coefficients have strong correlation and it outputs predictions that obey the underlying
slow variation model. In the 100 Hz case, the input changes are more abrupt and the
same trained network is able to adapt to this underlying fast variation model as well and
produce predictions that obey the faster trend.

Training enhancement features

The training method outlined above, by itself, either leads to a large number of iterations
before converging (where the loss function assumes a small enough value) or to a condition
where the network does not converge at all (where the loss function does not monotonically
decrease with training epochs). This is because in each iteration during the initial part
of the training, the prediction made is inaccurate due to untrained weights and the
erroneous value is fed back to the input to make another prediction. It is only at the
end of 100 predictions that the loss function is evaluated and the back propagation to

update weights is performed. Due to the error accumulating at the input, the output
might become garbled leading to poor weight updates, resulting in slow convergence or
divergence. Therefore, we employ additional techniques while training as enumerated
below.

• Teacher force training: This technique is employed to alleviate the problem men-
tioned above. Teacher force training involves supplementing the training with the
ground truth data. During training, data is fed back from the output of the pre-
dictor network to the input. With teacher force training, with a small probability
p, the ground truth data corresponding to that time instant from the 100-length
expected output is supplied back to the input instead of the output of the predic-
tor network. This prevents the input from accumulating error due to inaccurate
predictions. In our training setup, we found that a probability of p = 0.2 works
well for quick convergence of the network. This implies that with probability
1 − p = 0.8 the prediction made by the network itself is fed back to its input to
make further predictions. Since the input is not allowed to deviate uncontrollably
from the actual values, this helps the network converge faster. (A training-loop
sketch combining the three features appears after this list.)

• Reduce learning rate on plateau: Learning rate is a hyper-parameter that needs to
be set while training. The value of the learning rate decides how fast or slow a
network learns, by pacing the weight updates. A large learning rate (of the order
of ∼ 0.01) is desirable at the initial stages of training. However, when the loss
function hits a plateau, a large learning rate may not help the loss function
reduce further. This is because a large value of the learning rate forces large
weight updates, which may not help the network come out of the local minimum;
a large learning rate may also unsettle the network from the state it is in.
A small learning rate ensures that the weight updates are small and helps
the network find the minimum within the plateau. If this does not happen
and the loss function continues to maintain its value at the plateau, the technique calls
for increasing the learning rate back to its original value. In our training setup, we
implemented this by reducing the learning rate by a factor of 10 every time the loss
function value did not reduce for 10 consecutive training iterations. To prevent the
learning rate from becoming minuscule, we set the minimum value to be 10−8. In
the process of decreasing the learning rate, if at any stage the value of the loss function
is found to increase, the learning rate value is reset to its original value.

• Early stop: Yet another problem that is associated with training neural networks
is that of over-fitting. Over-fitting is said to occur when the network is allowed to
learn for too long on the available data. This results in a trained model that is
tailor-made for the training data, but fails to generalize to data beyond those seen
while training. That is, the model performs poorly on any data that is not present
in the training data. To prevent this from happening, we employ a technique called
early stop. The early stop technique dictates that the training be stopped when the
network is not able to learn any further. This happens when the loss function does
not reduce across iterations. We implement this after a minimum of 200 epochs of
training. Following this, if the learning rate has already dropped to 10−8 from the
second technique and the loss function does not reduce significantly in the next
50 iterations, we stop training the network. If such a scenario never occurs during
training, the training is stopped after 1000 epochs.
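Putting the three features together, the training loop might be organized as sketched below. This is a simplified illustration: PyTorch's built-in ReduceLROnPlateau scheduler stands in for the hand-tuned learning-rate logic (the reset-to-original-rate rule is omitted), the fed-back prediction is detached from the computation graph, and the p = 0.2 teacher-forcing probability and 200/1000-epoch early-stop bounds follow the text.

    import random
    import torch

    def train(model, pairs, max_epochs=1000, min_epochs=200, p_teacher=0.2):
        opt = torch.optim.Adam(model.parameters(), lr=0.01)
        sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
            opt, factor=0.1, patience=10, min_lr=1e-8)
        loss_fn = torch.nn.MSELoss()
        best, stall = float('inf'), 0
        for epoch in range(max_epochs):
            for inputs, targets in pairs:
                window, preds = list(inputs), []
                for truth in targets:
                    seq = torch.tensor(window, dtype=torch.float32).view(1, -1, 1)
                    pred, _ = model(seq)
                    preds.append(pred.squeeze())
                    # Teacher forcing: with probability p, feed the truth back.
                    fed = truth if random.random() < p_teacher else pred.item()
                    window = window[1:] + [fed]
                loss = loss_fn(torch.stack(preds),
                               torch.tensor(targets, dtype=torch.float32))
                opt.zero_grad(); loss.backward(); opt.step()
            sched.step(loss)              # reduce LR when the loss plateaus
            stall = 0 if loss.item() < best else stall + 1
            best = min(best, loss.item())
            if epoch + 1 >= min_epochs and stall >= 50:
                break                     # early stop: no further learning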

2.3.3 Performance results

In this subsection, we present simulation results on the training performance, prediction
error performance, and BER performance associated with the proposed channel predictor
aided receiver developed in the previous subsections. In all the simulations, a fixed 4-
QAM symbol is used as a pilot symbol, and the pilot symbol power and the data symbol
power are kept the same. In practice, the pilot power is typically kept at the same or a
higher level compared to the power in the data symbols.

Figure 2.6: Comparison of training and validation loss trajectories as a function of epochs
with and without the training enhancement features. (a) Training loss. (b) Validation loss.

Figure 2.7: Mean square error of predictions as a function of number of predictions for
different SNR and fD values.

Training performance

Figure 2.6 shows the training trajectory in terms of training loss and validation loss
at each epoch comparing training performed with and without the above mentioned
enhancement features. In Figs. 2.6a and 2.6b, the plotted line shows the mean of the
training loss and validation loss, respectively, while the shaded area around the line is
indicative of the variance observed in the losses across training runs. The training loss
(MSE loss between the predicted coefficients and the actual coefficients evaluated during
training, see (2.9)) in the presence of the enhancement features shows convergence at
about 50 epochs for the training loss trajectory and about 100 epochs for the validation
loss trajectory, after which the loss remains almost constant. This quick convergence is
attributed to teacher force training and subsequent consistency in the loss function value
is due to the reduced learning rate. Further, as the variance in the validation loss (MSE
loss evaluated on data not present in training data) decreases to small values around
200 epochs, the training is stopped and the network parameters are frozen in accordance
with the early stop training feature. In contrast, without the enhancement features, the
training loss does not seem to converge as it assumes a high value throughout. A similar
trend is observed in the validation loss trajectory as well. Without the enhancement
features, the network shows large variations in the validation loss and training loss even
at 200 epochs, which leads to slow convergence. Figure 2.6, therefore, demonstrates the
effectiveness of the enhancement features in attaining faster convergence.

Prediction error performance

Figure 2.7 shows the MSE performance of predictions as a function of number of future
predictions made by the channel predictor. The plots are obtained for fD = 10, 50,
and 100 Hz. The following observations can be made from Fig. 2.7. First, the MSE
performance is found to improve with increasing SNR, which is expected. Next, for a
given SNR and fD , increasing the number of future predictions increases the MSE. As the
number of predictions is increased at a given SNR and fD , more errors are accumulated
due to the feedback from the output of the predictor network to the input, which explains the
observed trend. Therefore, choosing the right number of future predictions becomes
crucial to ensure robustness across different values of Dopplers and SNRs. For a given
SNR and number of predictions, the MSE curves for different fD values are close.

Figure 2.8: BER performance of the proposed channel predictor aided receiver with ML
decoder for fixed number of predictions (N = 100, η = 90.9% and N = 10, η = 50%) at
fD = 50, 100 Hz. (a) 4-QAM. (b) 16-QAM.

BER performance

In Figs. 2.8a and 2.8b, we demonstrate the BER performance achieved by the proposed
channel prediction aided receiver with ML decoder for 4-QAM and 16-QAM, respectively.
Performance with perfect channel state information (CSI) is also plotted for comparison.
We consider two scenarios to demonstrate the effect of number of future predictions
on the BER performance. The first is a greedy scenario (with respect to bandwidth
efficiency), where we set the number of predictions to be fixed at 100 across different
Eb /N0 and fD values. This corresponds to a bandwidth efficiency (η) of 90.9%. The
second is a conservative scenario (with respect to MSE of predictions), where the number
of predictions is fixed at 10 instead of 100, corresponding to a bandwidth efficiency
of 50%. The following observations can be made. First, it can be seen that in the

conservative scenario with 10 predictions, the achieved BER performance is very close to
the ideal performance with perfect CSI for both 4-QAM and 16-QAM with fD = 50, 100
Hz. Second, although the greedy scenario achieves good bandwidth efficiency, the BER
performance degrades. The performance gap between the greedy and the conservative
scenarios is larger for the channel with a higher Doppler. While the performance loss in the
greedy scenario is not very significant in the 4-QAM case, it is quite severe in the case of
16-QAM (see BER plots in Fig. 2.8b for 100 predictions). This constrains the number of
predictions to be conservatively fixed at 10 in order to achieve good BER performance,
which leads to poor bandwidth efficiency. Motivated by this need and opportunity for
improvement, in the following subsection (Section 2.3.4), we propose an adaptive scheme
that allows the receiver to dynamically adjust the number of future predictions employed
in the prediction algorithm in accordance with the operating SNR and Doppler.

2.3.4 Adaptive channel prediction

In the previous subsection, the number of future predictions employed in the prediction
algorithm is fixed. Here, we propose to adapt the number of predictions in accordance
with the operating SNR and Doppler with a motivation to improve bandwidth efficiency
and performance. The idea is to create and use a lookup table consisting of the achieved
MSE between the channel predictions and the actual channel coefficients for different
number of future predictions, SNRs, and Dopplers. The desired target MSE for a given
operating SNR is set to be the MSE between the LMMSE channel estimate in (2.4) and
the actual channel coefficients at that SNR. For a given operating SNR, Doppler, and
target MSE, the number of future predictions to be employed in the predictor algorithm
is obtained from the lookup table. This makes the prediction algorithm adaptively
employ a different number of future predictions for different operating conditions.
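A sketch of the table-driven selection (mse_table is assumed to map (N, SNR, Doppler) triples to achieved MSE values measured offline; all names are illustrative):

    def choose_num_predictions(mse_table, snr_db, doppler_hz, target_mse):
        # Prefer the largest N (highest bandwidth efficiency) meeting the target.
        for n in range(100, 4, -5):       # candidate N values: 100, 95, ..., 5
            if mse_table[(n, snr_db, doppler_hz)] <= target_mse:
                return n
        return 5                          # fall back to the most conservative N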
In Fig. 2.9, we show the 3D plots of the entries of the lookup table (i.e., achieved
MSE values) for different values of number of predictions (N = 5 to 100) and Doppler
(D = 5 to 100 Hz) at SNRs of 10 dB and 20 dB. It can be seen that for a fixed N , the
achieved MSE decreases with decreasing D. Likewise, for a fixed D, the MSE decreases

Figure 2.9: Achieved MSE performance of predictions as a function of number of future
predictions and Doppler for a given SNR of 10 dB and 20 dB.

with decreasing N . It can also be observed that, for all N and D values, the achieved
MSE values at 20 dB SNR are less than those at 10 dB SNR. The contour lines plotted
in the N -D plane at the bottom are for the surface corresponding to 20 dB SNR. A given
contour line shows all the (N, D) values for which the achieved MSE is the same. For
example, the outermost contour with (100, 5) and (5, 100) as the end points has an MSE
of 0.002. In this contour, as D decreases from 100 Hz to 5 Hz, N increases from 5 to
100. Further, the innermost contour is a point at (N, D) = (100, 100) that has an MSE
value of 0.012.
In Fig. 2.10, we show the 2D plots of the MSE performance of the predictions as
a function of channel correlation (Doppler spread) for N = 20, 50, and 100 predictions,
at a fixed SNR of 10 dB. It is seen that for all values of the Doppler spread, the MSE
values are least when the number of predictions is 20 and the highest when the number
of predictions is 100. Further, the MSE values across different Doppler spreads are close
when N = 100, while they are even closer when N = 50, 20. For instance, at 100 Hz
Doppler the maximum increase in MSE compared to 10 Hz Doppler is only about 0.011

Figure 2.10: MSE performance of predictions as a function of fD for a given SNR of 10 dB.

for the case of 100 predictions, while this number drops to 0.0045 for the case of 20
predictions. This demonstrates that the network trained at 10 Hz is able to generalize
and perform quite well across the considered Doppler range (up to 100 Hz).
The prediction algorithm chooses the number of predictions (for a given operating
SNR, target MSE for that SNR, and Doppler) corresponding to an achieved MSE in
the lookup table that is less than the target MSE. Figure 2.11 shows the number of
predictions chosen by the algorithm from the lookup table for different SNRs in the
range -5 dB to 40 dB and fD values in the range 10 Hz to 100 Hz. Here, the target MSE
at an SNR is obtained by evaluating the MSE of LMMSE estimates obtained by pilot
transmissions at that SNR. It can be seen that, for a given fD , the number of predictions
chosen by the algorithm shows a bell-shaped behavior as the SNR is increased. For
example, at very low SNRs, the algorithm chooses very few number of predictions to meet
the target MSE. This is because the correlation in the input to the channel predictor
is perturbed significantly by the additive noise having a high variance. This leads to
a high MSE of predictions, forcing the algorithm to choose a correspondingly small
value for the number of predictions.

Figure 2.11: Number of future predictions chosen by the prediction algorithm as a function
of SNR for different values of fD.

As the SNR increases, the number of predictions
chosen by the algorithm increases. This is because the fluctuations in the correlation in
the input decrease with increasing SNR (i.e., decreasing noise variance). As the SNR
increases further beyond a certain value, although the disturbance to the correlation
reduces further, the algorithm chooses smaller and smaller number of predictions, which
can be explained as follows. First, the target MSE (obtained from (2.4)) decreases with
increasing SNR. Second, the achieved MSE for a fixed number of predictions, i.e., the
prediction error does not decrease as fast as the target MSE with SNR. The combined
effect of these two makes the algorithm choose a reduced number of predictions at high
SNR.
In Figs. 2.12a and 2.12b, we demonstrate the BER performance achieved by the
proposed adaptive channel predictor aided receiver with ML decoder for 4-QAM and
16-QAM, respectively. The performance at fD = 50 Hz and 100 Hz are shown. The
performance with perfect CSI is also shown for comparison. It can be seen that, with
the proposed adaptation of the number of predictions, the receiver is able to achieve a

Figure 2.12: BER performance of the proposed adaptive channel predictor aided receiver
with ML decoder at fD = 50, 100 Hz. (a) 4-QAM. (b) 16-QAM.

performance that is very close to the ideal performance with perfect CSI for both 50 Hz
and 100 Hz Doppler, while also being bandwidth efficient. For example, in Fig. 2.8b,
for fD = 100 Hz, the greedy scenario fixes the number of predictions to be 100 for all
values of Eb /N0 , which causes the BER to floor at 3 × 10−2 . The adaptive scheme, on
the other hand, chooses the number of predictions to be 100 until an Eb /N0 value of 7
dB, after which it reduces the number of predictions towards 5 at 40 dB Eb /N0 , leading
to better performance (no flooring is seen). Likewise, in Fig. 2.8a, for fD = 50 Hz,
the conservative scenario fixes the number of predictions to be 10 throughout the Eb /N0
range. Although the performance for fD = 50 Hz in Figs. 2.8a and Fig. 2.12a are almost
same, the bandwidth efficiency in Fig. 2.8a is only 50%, as there are 10 predictions
made for 10 pilots transmitted. On the other hand, in Fig.2.12a, the adaptive scheme
chooses the number of predictions to be greater than 10 until the Eb /N0 value of 20
dB. For example, at 20 dB, the number of predictions chosen is 15 which translates to
a bandwidth efficiency of 60%, and at 6 dB, the number of predictions is 100, which
achieves a bandwidth efficiency of 90.9%.
Next, we consider a non-neural network based benchmarking scheme to compare
the performance of the proposed adaptive scheme. The benchmarking scheme employs
LMMSE channel estimation and linear interpolation (LI) along with ML decoding.

Figure 2.13: BER performance comparison between the proposed adaptive scheme with
ML decoder and the benchmarking scheme with LMMSE channel estimation and linear
interpolation.

For fair comparison, the bandwidth efficiency is kept the same in both the proposed and the
benchmarking schemes. We achieve this as follows. In both the schemes, transmission
is made in frames consisting of pilot symbols and data symbols. The number of pilot
symbols (np ) and data symbols (nd ) in each frame are taken to be np = 10 and nd = Nc ,
where Nc is the number of predictions chosen by the predictor algorithm. In the proposed
scheme, np = 10 pilot symbols followed by nd = Nc data symbols are transmitted
in a frame. In the benchmarking scheme, one pilot symbol is sent followed by
nd/np (= Nc/10) data symbols, and this pilot-data symbol sequence is repeated till the end of the
frame. LMMSE channel estimation is performed during the pilot symbols and linear
interpolation is performed to obtain the channel estimates for the duration between two
pilot symbols.
Figure 2.13 shows the BER performance comparison between the proposed adaptive
scheme with ML decoder and the benchmarking scheme for 16-QAM at fD = 50 Hz
and 100 Hz.

Figure 2.14: BER performance comparison between the proposed channel predictor aided
receiver and the linear prediction aided receiver, both with ML decoder.

It is seen that the proposed scheme performs significantly better than the
benchmarking scheme in the low-to-moderate range of Eb /N0 values (0 to 20 dB). This is
because of poor interpolation accuracy in the benchmarking scheme in this Eb /N0 range,
which can be explained as follows. The Nc values chosen in the 0 to 20 dB range are
large compared to the number of pilot symbols np (e.g., Nc is 60 at Eb/N0 = 10 dB for
fD = 50 Hz and np is 10). A large value of Nc/np means the pilots in a frame are spaced
far apart, leading to less accurate interpolation. In the higher range of Eb/N0 values, the
Nc/np ratio becomes small due to smaller values of Nc, leading to closer spacing of pilots
and hence better interpolation accuracy. This makes the benchmarking scheme perform
close to the performance of the proposed scheme in the high Eb /N0 range.

2.3.5 Comparison with linear prediction scheme

In this subsection, we compare the performance of the proposed channel predictor aided
receiver with that of a receiver with channel predictor replaced by a linear prediction
algorithm. A time-varying channel with fD = 50 Hz is considered. The linear prediction

algorithm models the time-varying channel coefficients as an auto-regressive (AR) process
of order 2, i.e., for any time t,

h(t) = ρ1 h(t − 1) + ρ2 h(t − 2) + ϵt , (2.10)

where ρ1 and ρ2 are the parameters of the AR(2) process that need to be estimated, and
ϵt is a white noise process with zero mean and constant variance, σϵ2 . The values of ρ1
and ρ2 are computed as follows. 10 pilot symbols followed by Nc 4-QAM data symbols
(corresponding to η = Nc/(10 + Nc)) are transmitted, where Nc (the number of predictions) is
chosen in accordance with the adaptive prediction algorithm in Section 2.3.4. 10 LMMSE
channel estimates at t = 0, 1, · · · , 9 are obtained using the received pilot symbols. A set
of 8 equations corresponding to time t = 2, 3, · · · , 9 is obtained from (2.10) using the
LMMSE estimates. The Yule-Walker (YW) estimation technique [79],[80] is employed on
these equations to determine the values of ρ1 and ρ2 , which are then used to estimate σϵ2 .
To obtain Nc channel predictions for t > 9, (2.10) is recursively used with the estimated
ρ1 and ρ2 values.
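A numpy sketch of this AR(2) benchmark (Yule-Walker fit on the LMMSE estimates, followed by the noiseless recursion of (2.10); function and variable names are illustrative):

    import numpy as np

    def ar2_predict(h_est, num_predictions):
        h = np.asarray(h_est)
        # Sample autocorrelations r(0), r(1), r(2) of the estimate sequence.
        r = [np.mean(h[k:] * np.conj(h[:len(h) - k])) for k in range(3)]
        # Yule-Walker: r(1) = rho1 r(0) + rho2 r(-1); r(2) = rho1 r(1) + rho2 r(0).
        R = np.array([[r[0], np.conj(r[1])], [r[1], r[0]]])
        rho = np.linalg.solve(R, np.array([r[1], r[2]]))
        preds, h1, h2 = [], h[-1], h[-2]
        for _ in range(num_predictions):
            nxt = rho[0] * h1 + rho[1] * h2   # recurse (2.10) without the noise term
            preds.append(nxt)
            h1, h2 = nxt, h1
        return np.array(preds)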
Figure 2.14 shows the BER performance comparison between the proposed channel
predictor aided receiver and linear prediction aided receiver, both with ML decoder.
At low SNRs, the LMMSE estimates are noisy and the values of ρ1 and ρ2 obtained
through YW estimation theory are inaccurate. This leads to poor quality of predictions
and poor BER performance. As SNR increases, the performance of linear prediction
algorithm aided receiver improves owing to better ρ1 and ρ2 estimates and reduced
Nc . However, it is observed that the performance of the proposed channel predictor
aided receiver is better than the linear predictor counterpart (e.g., at 10−4 BER, the
proposed predictor aided receiver has an advantage of about 2.5 dB compared to the
linear prediction receiver).

Model   Tap delays (ns)                               PDP (in dB)
EPA     0, 30, 70, 90, 110, 190, 410                  0, -1, -2, -3, -8, -17.2, -20.8
EVA     0, 30, 150, 310, 370, 710, 1090, 1730, 2510   0, -1.5, -1.4, -3.6, -0.6, -9.1, -7, -12, -16.9
ETU     0, 50, 120, 200, 230, 500, 1600, 2300, 5000   -1, -1, -1, 0, 0, 0, -3, -5, -7

Table 2.4: Tap profiles of 3GPP channel models.

Figure 2.15: MSE performance of the proposed channel predictor network under various
3GPP channel models in [81],[82] as a function of (a) SNR and (b) η.

2.3.6 Performance in 3GPP channel models

In this subsection, we present the MSE performance of the proposed channel predictor
network under different multipath channel propagation models defined by 3GPP [81],[82].
We consider extended pedestrian A (EPA) model, extended vehicular A (EVA) model,
and extended typical urban (ETU) model under slow (fD = 5 Hz) and fast (fD = 70 Hz)
mobility conditions. The tap delays and power delay profiles of these models are given
in Table 2.4. We use fD = 5 Hz for EPA model, fD = 5, 70 Hz for EVA model, and
fD = 70 Hz for ETU model. The MSE performance in a multipath propagation model
with L taps is obtained as follows. For estimating the channel across multiple taps, np
pilot sequences are transmitted. Each pilot sequence consists of a pilot symbol along
with L − 1 preceding and succeeding zeroes, making the length of each pilot sequence
2L − 1. np LMMSE channel estimates corresponding to each tap are obtained from
the received pilot sequences. Nc deep channel predictions are made on each
tap, and the MSE of the predicted coefficients is calculated with respect to the actual
channel coefficients.
Training of the predictor network is carried out using channel coefficients obtained
from the synthetic dataset generated for a single-tap channel (i.e., the network is not
trained with the dataset from the actual 3GPP models). However, the trained network
could work well for all the actual 3GPP models which have multiple taps and non-uniform
power-delay profiles as shown in Table 2.4. This can be seen in Fig. 2.15a which shows
the obtained MSE values as a function of SNR with np = 10 and Nc is chosen according
to algorithm in Section 2.3.4. It is seen that the MSE values for all the considered 3GPP
models decrease with SNR, closely following the MSE of the LMMSE estimates in the low
and mid SNR regimes. For EVA with fD = 70 Hz and ETU, there is a small deviation
observed in MSE in the high SNR regime due to high Doppler spread. Figure 2.15b
shows the MSE performances of the channel predictor and linear prediction scheme in
Section 2.3.5 as a function of the bandwidth efficiency η for the 3GPP models. It is seen
that the MSE is below 10−2 for η ≤ 0.8, showing that the predictions are reasonably

accurate even when operating at a bandwidth efficiency of 80%. On the other hand,
the MSE achieved by the linear prediction scheme in Section 2.3.5 is found to be much
higher. So, although the predictor network is trained on a synthetic dataset, the network
could learn to observe the correlation in the channel coefficients at its input and use the
learnt correlation to make further predictions, even in settings or environment not seen
while training. This demonstrates the generalization capabilities and robustness of the
proposed channel predictor.

2.3.7 Block transmission in doubly-selective fading channel

In this subsection, we consider block transmission and detection in doubly-selective fading
channels and evaluate the performance of the proposed deep channel predictor. We
consider a cyclic prefix single carrier (CPSC) system [83], [84], where the channel is taken
to be both frequency- and time-selective. Each CPSC frame consists of np pilot sequences
(see Section 2.3.6) followed by Nc = N − np (2L − 1) data symbols, where N is the CPSC
frame length, and L is the number of channel taps. Deep channel prediction is done on
each tap and the predicted coefficients are given, along with the received data symbols,
as input to the detector. We demonstrate the advantage of using NN-based detection in
such channels by comparing the performance of a) maximum-likelihood (ML) detection
using Viterbi algorithm and b) NN-based detection. In the ML detection using Viterbi
algorithm, the channel coefficients predicted by the deep channel predictor are used to
evaluate the likelihood costs. For NN-based detection, we use ViterbiNet [85], which
uses learning based computation of likelihoods in the Viterbi algorithm. We train the
ViterbiNet detector using the fade coefficients predicted by the deep channel predictor.
Figure 2.16 shows the BER performance of the considered CPSC system with N =
128, L = 2, 4-QAM, and fD = 50 Hz. The performance of ML Viterbi detector and
NN-based ViterbiNet detector are also shown. Performance with channel prediction for
np = 2, 4 per CPSC frame are shown. Performance plots with perfect CSI are also shown
for comparison.

Figure 2.16: BER performance of the proposed predictor network in a CPSC system
with NN-based ViterbiNet detector.

The following observations can be made from Fig. 2.16. The ViterbiNet detector
trained using perfect CSI achieves almost the same performance as the ML Viterbi
detector with perfect CSI. The performance of both the detectors
degrades when predicted channel coefficients are used. The performance degradation in
ML Viterbi detector is significantly higher than that in ViterbiNet detector. For ex-
ample, the ML Viterbi detector performance floors at a BER of about 10−2 for np = 2
at 25 dB SNR, whereas the ViterbiNet detector achieves a significantly better BER of
about 10−4 for the same SNR. Also, for np = 4, the ViterbiNet detector performs close
to that with perfect CSI (within about 2.5 dB gap at 10−5 BER), whereas ML Viterbi
detector starts flooring at 10−4 BER itself. This is in corroboration with the results
reported in [85], where it is shown that, in the presence of imperfect channel state infor-
mation (CSI), the performance of conventional Viterbi algorithm degrades significantly
whereas the NN-based ViterbiNet detection achieves significantly better performance.
The better performance of the combination of the proposed deep channel prediction and
NN-based ViterbiNet detection therefore demonstrates the benefit of learning approach
in communication receivers.

Figure 2.17: Arrangement of pilot and data symbols in data driven channel prediction
(1:1 scheme and 1:k scheme; a pilot block of symbols p1, p2, · · · , pnp is followed by data
blocks of symbols d1, d2, · · · , dNc, with the data block repeated k times in the 1:k scheme).

2.4 Data driven channel prediction


In this section, we present the proposed data decision driven channel prediction archi-
tecture and its performance. The motivation for the data decision driven approach is
as follows. We note that the maximum bandwidth efficiency obtained in the adaptive
prediction scheme proposed in the previous section is 90.9%, which is obtained when the
number of predictions Nc = 100 and number of pilots np = 10. In the high SNR region,
however, the algorithm reduces Nc to 5, where it attains a bandwidth efficiency of only
33%. We aim to improve this low bandwidth efficiency by using a data decision driven
prediction architecture proposed in the following subsection.

2.4.1 Architecture

In the proposed data driven prediction approach, we adopt a 1:k transmission scheme
in which 1 pilot block (consisting of np pilot symbols) is sent every k data blocks (each
data block consisting of Nc data symbols) as shown in Fig. 2.17. The shaded block in
Fig. 2.17 represents a pilot block which is used to obtain np LMMSE estimates of the
channel. The Nc channel predictions obtained from the channel predictor network using
these estimates are used to decode Nc data symbols transmitted in the subsequent striped
data block. In the case of k = 1 (i.e., 1:1 scheme), one pilot block and one data block
Nc
are sent in an alternating fashion, leading to a bandwidth efficiency of np +Nc
. The 1:1
scheme is a purely pilot driven prediction scheme and there is no data driven prediction.
On the other hand, for k > 1, there is data decision driven prediction (described in the

next paragraph) and the bandwidth efficiency improves to kNc/(np + kNc).

Figure 2.18: Block diagram of the proposed data driven channel prediction scheme.
The block diagram of the proposed data decision driven prediction architecture is
shown in Fig. 2.18. The channel predictor and ML decoder blocks are the same as in
Fig. 2.5. The predictions from the channel predictor are fed to the ML decoder, which
uses the predictions to decode data symbols received through the channel. The LMMSE
estimator block receives these decoded symbols from the ML decoder along with the
symbols received through the channel. Here, the decoded symbols from the ML decoder
are treated as pilots and the signal received from the channel as the faded version of these
pilots, and an LMMSE estimate of the fade coefficients is obtained from this decoded
data. These LMMSE channel estimates act as a refined version of the predictions made
by the channel predictor network. The refined channel estimates are used to once again
decode the data symbols using a second ML decoder. If the decoded symbols from the
second ML decoder match the decoded symbols from the first ML decoder, then the
refined channel estimates are fed back to the input of the channel predictor to enable
further predictions. If they do not match, then the current decoded symbols are fed
back to the LMMSE estimator (as pilots) as before, and another set of refined channel
estimates are obtained. This process of data decoding and channel estimate refinement
is iteratively repeated until the decoded outputs from two consecutive iterations match.
When this happens, the LMMSE channel estimates in the subsequent iterations do not
change, as the decoded symbols being fed back as pilots to the LMMSE estimator do not
change. This is set as the convergence criterion in the receiver. If the criterion is not met
for a certain number of iterations (e.g., 200 iterations), then the last obtained LMMSE
estimates are used as the feedback to the input to make further predictions. The fed
back channel estimates are used to obtain Nc predictions, which are used to detect the
next set of Nc symbols. If there is an error in decoding a symbol and the corresponding
LMMSE estimate is fed back to the channel predictor input, then the further predictions
obtained may have a large MSE and this may result in more subsequent errors. The
value of k is chosen such that this error propagation is minimized.

Figure 2.19: (a) MSE and (b) BER performance of the 1:k data decision driven channel
prediction scheme with ML decoder at fD = 50, 100 Hz for 16-QAM.
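The decode-and-refine loop of Fig. 2.18 might be sketched as follows (ml_decode and lmmse_estimate are stand-ins for the ML decoder and the decision-directed LMMSE estimator blocks; only the control flow is shown):

    def refine_estimates(y, h_pred, ml_decode, lmmse_estimate, max_iters=200):
        h, prev_symbols = h_pred, None
        for _ in range(max_iters):
            symbols = ml_decode(y, h)        # decode with the current estimates
            if symbols == prev_symbols:      # decisions unchanged: converged
                break
            h = lmmse_estimate(y, symbols)   # treat decisions as pilots, re-estimate
            prev_symbols = symbols
        return h                             # refined estimates, fed back as input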

2.4.2 Performance results

In Fig. 2.19a, we present the MSE performance of the predictions made by the data
driven channel prediction scheme at fD = 50, 100 Hz for 16-QAM. The values of k
considered are k = 1, 5, 10. It is seen that at low SNR values, the 1:1 scheme has a lower
MSE than the 1:k schemes, k = 5, 10. This is because there is no data driven prediction
in the 1:1 scheme and hence there is no error propagation due to decoding errors. On
the other hand, in the 1:k scheme (k = 5, 10), the MSE of the predictions degrades due
to error propagation caused by decoding errors at these low SNR values. As the SNR

increases, the MSE of the 1:k schemes decreases (due to fewer decoding errors) and the
gap from the MSE of 1:1 scheme reduces. In the high SNR regime, the 1:1 and 1:k
schemes achieve similar MSE performance, again due to fewer decoding errors in the 1:k
schemes.
Figure 2.19b shows the achieved BER performance with ML decoder corresponding
to the MSE performance presented in Fig. 2.19a. We observe that both the 1:5 and
1:10 schemes perform close to the 1:1 scheme. We also see that the BER performance
of 1:5 scheme is closer to that of the 1:1 scheme than the 1:10 scheme for both fD
values, which is justified owing to larger number of pilots in the 1:5 scheme. We further
note that the main advantage of the data driven prediction scheme is its bandwidth
efficiency due to the reduced number of pilots used in the scheme. For example, for
the 1:10 scheme, when the maximum value of Nc = 100 is chosen by the adaptive
algorithm, the total number of symbols decoded per estimation phase (consisting of 10
pilot transmissions) is 10³ (i.e., 10 prediction phases with 100 symbols per prediction
phase) and the bandwidth efficiency achieved is 1000/1010 = 99%. Likewise, for Nc = 5,
the bandwidth efficiency achieved is 50/60 = 83.3%. Similarly, for the 1:5 scheme, the
maximum and minimum achieved bandwidth efficiencies are 98% and 71.4%, respectively.
Recall that, in the previous scheme without data driven prediction (i.e., 1:1 scheme), the
bandwidth efficiencies achieved for Nc = 100 and 5 are 90.9% and 33%, respectively.
In conclusion, we find that the system is able to utilize the channel very efficiently by
maximizing the number of data symbol transmission phases per pilot symbol transmission
phase, and this is achieved at the cost of a small loss in BER performance.

2.4.3 Comparison with NN-based prediction scheme in [86]

In this subsection, we compare the performance of the proposed data driven channel
prediction scheme with an LSTM based channel prediction scheme reported in [86], both
with ML decoder. 16-QAM modulation and fD = 153 Hz are considered. We fix the
ratio of number of pilot symbols (np ) to the number of data symbols (also the number
of predictions, kNc (see Sec. 2.4.1)), while varying np and kNc . The (np :kNc ) values
considered are (10:40), (50:200), (100:400), and (150:600). Table 2.5 shows the values
chosen for np, k, and Nc in each case.

Scheme     np     k     Nc
10:40      10     5     8
50:200     50     5     40
100:400    100    5     80
150:600    150    10    60

Table 2.5: Values of np, k, and Nc used for comparison with NN-based prediction scheme
in [86].

Figure 2.20: BER performance comparison between the proposed scheme and the NN-
based prediction scheme in [86], both with ML decoder.

Figure 2.20 shows the BER comparison between the
two schemes. As expected, the performance of the (10:40) scheme is better than that of the (150:600)
scheme in both the cases due to smaller number of predictions per pilot block. It is
further observed that the proposed scheme achieves significantly better BER performance
compared to the scheme in [86]. This performance advantage in the proposed scheme is
attributed to the data driven feature and the training enhancement features incorporated
in the proposed scheme.

2.5 Summary
In this chapter, we proposed a neural network based framework for the design of robust
receivers in time-varying fading channels with temporal correlation in the fading process.
Central to the proposed framework was the deep channel predictor which used an RNN
that learned the underlying correlation model in the fading process and made predic-
tions of the channel fade coefficients into the future, thereby reducing pilot resources.
An FCNN based data symbol decoder aided by the RNN based channel predictor con-
stituted the receiver architecture. The basic version of the channel predictor kept the
number of future predictions fixed regardless of the operating SNR and Doppler. An
augmented adaptive channel prediction architecture which chose the number of future
predictions in accordance with the operating SNR and Doppler further improved the
bandwidth efficiency and performance. A data decision driven prediction architecture
with decision feedback provided a balance between pilot resources and performance. The
achieved robustness in the receiver performance over a range of Doppler and SNR condi-
tions demonstrated that the proposed deep channel prediction approach is a promising
approach for receiver design in time-varying fading channels.
Chapter 3

Learning based channel estimation in OFDM systems

3.1 Introduction
In this chapter, we propose a learning based channel estimation scheme for orthogo-
nal frequency division multiplexing (OFDM) systems in the presence of phase noise in
doubly-selective fading channels [65]. Two-dimensional (2D) convolutional neural net-
works (CNNs) are employed for effective training and tracking of channel variation in
both frequency as well as time domain. The proposed network learns and estimates
the channel coefficients in the entire time-frequency (TF) grid based on pilots sparsely
populated in the TF grid. Viewing the TF grid as an image motivates choosing CNN
for this purpose. In order to make the network robust to phase noise (PN) impairment,
a novel training scheme where the training data is rotated by random phases before
being fed to the network is employed. Further, using the estimated channel coefficients,
a simple and effective PN estimation and compensation scheme is devised. Numerical
results demonstrate that the proposed network and PN compensation scheme achieve
robust OFDM performance in the presence of phase noise.
The rest of the chapter is organised as follows. The system model, the pilot frame
structure, and the input-output relation are presented in Section 3.2. The proposed


channel estimation scheme and the proposed PN compensation algorithm are presented
in Section 3.3. Numerical results, performance evaluation, and comparison with other
schemes are presented in Section 3.4. A summary of the chapter is presented in Section
3.5.

3.2 System model


Consider a single-input single-output (SISO) OFDM system with Nc subcarriers. Let
xf = [X0 X1 · · · XNc −1 ] be the information symbols (drawn from a modulation alphabet
A) multiplexed on the Nc subcarriers of one OFDM symbol in the frequency domain. Let
the corresponding time domain sequence after inverse discrete Fourier transform (IDFT)
be xt = {xn}, n = 0, 1, · · · , Nc − 1. An Ncp-length cyclic prefix (CP) is added, and the cyclic-prefixed

time domain sequence is transmitted through a frequency-selective channel with L taps


(Ncp ≥ L). The received signal is affected by PN induced multiplicative distortion. The
received time domain sequence at the receiver, after removing the CP, is given by

yt = Φt (ht ⊛ xt + nt ) , (3.1)

where Φt = diag(ejϕ0 , ejϕ1 , · · · , ejϕNc −1 ) ∈ CNc ×Nc is a diagonal matrix with PN realiza-
tions on the diagonal, ht ∈ CNc ×1 is the channel impulse response (padded with Nc − L
zeros), nt ∈ CNc ×1 contains i.i.d. circularly symmetric Gaussian noise samples with vari-
ance σ 2 , and ⊛ denotes circular convolution operator. Let ψ t = [ejϕ0 ejϕ1 · · · ejϕNc −1 ]T
denote the time domain PN vector. The receiver converts yt to a frequency domain
vector yf ∈ CNc ×1 using discrete Fourier transform (DFT), which can be written as

yf = Ψf ⊛ (hf ⊙ xf + nf ) , (3.2)

where Ψf = [ψ0 ψ1 · · · ψNc −1 ]T ∈ CNc ×1 represents the DFT coefficient vector of the
time domain PN vector ψ t , hf , xf , and nf ∈ CNc ×1 represent the channel response,

transmitted symbol, and noise vector in the frequency domain, respectively, and ⊙ rep-
resents the Hadamard product (element wise multiplication). Defining a circulant matrix
Θf ∈ C^(Nc×Nc) as

Θf = [ ψ0       ψNc−1    · · ·   ψ1
       ψ1       ψ0       · · ·   ψ2
       ⋮        ⋮        ⋱       ⋮
       ψNc−1    ψNc−2    · · ·   ψ0 ] ,    (3.3)

the circular convolution in (3.2) is equivalently represented as

yf = Θf (hf ⊙ xf + nf ) . (3.4)

Using the distributive property of matrix multiplication, (3.4) can be simplified to

yf = Θf hf ⊙ xf + Θf nf = h̃f ⊙ xf + ñf , (3.5)

where h̃f = Θf hf and ñf = Θf nf are the PN-affected channel frequency response and
noise vectors, respectively, and the ith entry of ñf satisfies (ñf)i ∼ CN(0, σ²),
i = 0, 1, · · · , Nc − 1.
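A numpy sketch of this received-signal model (a unitary DFT convention and Gaussian PN samples, as used later in the training of Section 3.3.1, are assumed here; all names are illustrative):

    import numpy as np

    def ofdm_rx_with_pn(x_f, h_t, sigma2, sigma2_pn, rng):
        # Simulate (3.1): circular channel convolution, AWGN, then PN rotation.
        n_c = len(x_f)
        x_t = np.fft.ifft(x_f) * np.sqrt(n_c)              # unitary IDFT
        h_pad = np.concatenate([h_t, np.zeros(n_c - len(h_t))])
        conv = np.fft.ifft(np.fft.fft(h_pad) * np.fft.fft(x_t))  # h_t circ x_t
        noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_c)
                                       + 1j * rng.standard_normal(n_c))
        phi = rng.normal(0.0, np.sqrt(sigma2_pn), n_c)     # PN samples phi_n
        y_t = np.exp(1j * phi) * (conv + noise)            # Phi_t applied
        return np.fft.fft(y_t) / np.sqrt(n_c)              # y_f, per (3.2)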
Transmission is divided into subframes. Each subframe consists of Ns OFDM symbols
with lattice type pilot arrangement as shown in Fig. 1.14. From (3.5), the received signal
matrix in the frequency domain corresponding to one subframe can be written as

Yf = H̃f ⊙ Xf + Ñf , (3.6)

where Xf = [xf^(0) xf^(1) · · · xf^(Ns−1)] ∈ C^(Nc×Ns) denotes the Ns transmitted OFDM symbols,
with the i in xf^(i) denoting the OFDM symbol index. Likewise, H̃f = [h̃f^(0) h̃f^(1) · · · h̃f^(Ns−1)]
∈ C^(Nc×Ns), Ñf = [ñf^(0) ñf^(1) · · · ñf^(Ns−1)] ∈ C^(Nc×Ns), and Yf = [yf^(0) yf^(1) · · · yf^(Ns−1)] ∈
C^(Nc×Ns) are the channel response, noise, and the received OFDM symbols, respectively.
Towards estimating the coefficients of the channel matrix H̃f for a subframe, pilot

symbols are placed at known locations in the subframe. One such arrangement is called
the lattice-type pilot arrangement, wherein Np pilot symbols are placed in the subframe
(see Fig. 1.14). Let xf,p ∈ CNp ×1 be the vector of transmitted pilot symbols and yf,p be
the corresponding received vector. Let h̃f,p ∈ CNp ×1 be the vector of channel coefficients
seen by the pilot symbols. The vector of least squares (LS) channel estimates at the pilot
locations, h̃̂f,p, is obtained as

h̃̂f,p = argmin_{h̃f,p} ∥yf,p − h̃f,p ⊙ xf,p∥²,    (3.7)

which on solving gives

h̃̂f,p = yf,p / xf,p.    (3.8)

Typically, using the knowledge of h̃̂f,p at the pilot locations, interpolation is carried out to
obtain the estimates for the entire TF grid, i.e., to obtain an estimate of H̃f. But due to
the time-selective nature of the channel and the random nature of rotations introduced
by PN, such traditional approaches yield poor estimates/performance. It is therefore
necessary to i) learn and track the time variations of the channel, and ii) estimate and
compensate the PN in order to achieve robust performance. In the following section, we
propose a CNN architecture to solve the first task, and solve the second task using the
estimates obtained from the first task.

3.3 Proposed channel estimation and PN compensation
Figure 3.1 shows the block diagram of the proposed CNN based channel estimator net-
work and PN compensation scheme for OFDM systems. At the transmitter, an OFDM
subframe comprising Ns OFDM symbols (with pilot and data symbols as shown in
Fig. 1.14), represented by Xf , is converted to time domain using IDFT operation to

Figure 3.1: Proposed CNN based channel estimator network and PN compensation
scheme.

Layer Input channels Output channels Kernel size

1 1 64 (16, 4)
2 64 32 (16, 4)
3 32 21 (17, 5)
4 21 1 (20, 8)

Table 3.1: Parameters of the 2D-CNN layers in the channel estimator network.

obtain Xt ∈ CNc ×Ns , and prefixed with CP. The subframe is transmitted over a doubly-
selective fading channel. The channel matrix for the subframe in the frequency domain
is Hf = [hf^(0) hf^(1) · · · hf^(Ns−1)], where hf^(i) is the channel response seen by the ith OFDM
symbol. The receiver introduces additive noise (Nt ) and multiplicative distortion (Ψt )
due to PN. CP is removed and the resultant matrix Yt is converted to frequency domain
using DFT to obtain Yf . The matrix Yf comprises of both pilot and data symbols. The
matrix, Ĥf = [ĥf^(0) ĥf^(1) · · · ĥf^(Ns−1)],
matrix Hf using the proposed channel estimator network. Following this, Ĥf and the
received pilot symbols are used to estimate the PN samples and compensate the received
subframe Yf to obtain Yf′ using the proposed PN compensation algorithm. A second
set of channel estimates, Ĥ′f , is obtained using the channel estimator network with Yf′
as the received subframe. Finally, Ĥ′f is used for decoding the data symbols in Yf′ .

Figure 3.2: Proposed CNN based channel estimator network.

3.3.1 Proposed channel estimator network and training

The proposed channel estimator network comprises four 2D-CNN layers, with the
stride of each layer set to one and padding adjusted to make the output of each layer
have the same dimension as the input. The input to the network is a sparse TF grid
comprising of LS estimates (using (3.8)) of the channel at the pilot locations and zeros
elsewhere. Inserting zeros in non-pilot locations serves two purposes. First, it allows
the CNN to understand where the pilot locations are (i.e., the non-zero locations) and
therefore allows for different number of pilots in the frame, and second, it allows the CNN
to understand where the 2D interpolation needs to be carried out (i.e., the zero locations).
The sparse TF grid is separated into real and imaginary parts and estimation is performed
sequentially. Using the sparse information available at the input, the network is trained
to ‘complete’ the TF grid, i.e., to provide estimates for the entire TF grid. This is
depicted in Fig. 3.2 wherein, at the input, the squares marked yellow represent the
availability of LS estimates in pilot locations and the estimator network provides the
estimates for the TF grid at the output, tracking the channel variations in both time
and frequency. The other parameters of the channel estimator network are presented in
Table 3.1.
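In PyTorch, the four layers of Table 3.1 could be sketched as below; padding='same' with stride 1 realizes the dimension-preserving padding described above, and the inter-layer ReLU activations are an assumption, since the text does not specify them:

    import torch.nn as nn

    class ChannelEstimatorCNN(nn.Module):
        # Input: (batch, 1, Nc, Ns) sparse TF grid (LS estimates at pilots, zeros elsewhere).
        def __init__(self):
            super().__init__()
            cfg = [(1, 64, (16, 4)), (64, 32, (16, 4)),
                   (32, 21, (17, 5)), (21, 1, (20, 8))]   # per Table 3.1
            layers = []
            for i, (c_in, c_out, k) in enumerate(cfg):
                layers.append(nn.Conv2d(c_in, c_out, k, stride=1, padding='same'))
                if i < len(cfg) - 1:
                    layers.append(nn.ReLU())              # assumed activation
            self.net = nn.Sequential(*layers)

        def forward(self, grid):
            return self.net(grid)    # same Nc x Ns grid, now fully estimated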
Doubly-selective channel realizations are generated and used for training the channel
estimator network. During training, PN induced rotations are introduced at the input of

Hyper-parameter Value

Epochs 10000
Optimizer Adam
Learning rate 0.001, divide by 2 every 2000 epochs
Batch size 1000
Mini-batch size 64

Table 3.2: Hyper-parameters used for training the channel estimator network.

the network. The network is trained using PN samples that are Gaussian distributed [87]
with zero mean and variance σ²PN. Modelling the PN samples using a Gaussian distribution
improves the training robustness as the model is able to learn both with (when PN
absolute value is greater than zero) and without (when PN absolute value is close to
zero) the effect of PN. This helps the model generalize beyond the σPN values seen while
training. To train the network, the input is set to be a sparse TF-grid, Hp , where the
pilot locations contain channel coefficients of a channel realization with PN and zeros
elsewhere. The output of the network is compared against the true channel realization,
Hact , using an L1 -loss function given by

L = Σ_{Hact} |f(ΘCE, Hp) − Hact|,    (3.9)

where f(·) represents the channel estimator network, ΘCE is the set of all trainable pa-
rameters in the network, and the summand is averaged (the mean operation) over all the training samples.
The other hyper-parameters used in the training are shown in Table 3.2.
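A sketch of the corresponding training loop, under the hyper-parameters of Table 3.2 and using the ChannelEstimatorCNN sketch above, might look as follows; `loader`, which yields mini-batches of sparse PN-rotated inputs Hp and true channels Hact, is an assumed data iterator.

```python
import torch

model = ChannelEstimatorCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Halve the learning rate every 2000 epochs (Table 3.2).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2000, gamma=0.5)

for epoch in range(10000):
    for H_p, H_act in loader:                 # assumed mini-batch iterator
        optimizer.zero_grad()
        # L1 loss of (3.9): mean absolute error over the training samples.
        loss = torch.mean(torch.abs(model(H_p) - H_act))
        loss.backward()
        optimizer.step()
    scheduler.step()
```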

3.3.2 Proposed PN compensation algorithm

The proposed PN compensation algorithm begins by estimating the PN samples in the


TF grid, following which the received subframe is compensated. Using the estimate Ĥf
obtained from the channel estimator network, the ith OFDM symbol (consisting of pilot

and data symbols) in the received subframe can be approximated as (using (3.2))

$$\mathbf{y}_f^{(i)} = \boldsymbol{\Psi}_f^{(i)} \circledast \left( \hat{\mathbf{h}}_f^{(i)} \odot \mathbf{x}_f^{(i)} \right) + \mathbf{n}_f^{(i)}. \tag{3.10}$$

Representing (3.10) in the time domain yields (the superscript i is dropped for brevity)

$$\mathbf{y}_t = \boldsymbol{\psi}_t \odot \left( \hat{\mathbf{h}}_t \circledast \mathbf{x}_t \right) + \mathbf{n}_t. \tag{3.11}$$

Defining a circulant matrix Ĥt as

$$\hat{\mathbf{H}}_t = \begin{bmatrix} \hat{h}_t^{(0)} & \hat{h}_t^{(N_c-1)} & \cdots & \hat{h}_t^{(1)} \\ \hat{h}_t^{(1)} & \hat{h}_t^{(0)} & \cdots & \hat{h}_t^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{h}_t^{(N_c-1)} & \hat{h}_t^{(N_c-2)} & \cdots & \hat{h}_t^{(0)} \end{bmatrix}, \tag{3.12}$$

(3.11) can be equivalently written as

$$\mathbf{y}_t = \boldsymbol{\psi}_t \odot \left( \hat{\mathbf{H}}_t \mathbf{x}_t \right) + \mathbf{n}_t = \boldsymbol{\psi}_t \odot \left( \hat{\mathbf{H}}_t \mathbf{F}^H \mathbf{x}_f \right) + \mathbf{n}_t, \tag{3.13}$$

where F represents the Nc -point DFT matrix and xf is the frequency domain vector
corresponding to xt . Let J denote the set of subcarrier indices at which pilot symbols
are present in xf. For indices j ∈ J, (3.13) can be written as

$$\mathbf{y}_t^{(j \in \mathcal{J})} = \boldsymbol{\psi}_t^{(j \in \mathcal{J})} \odot \left( \hat{\mathbf{H}}_t \mathbf{F}^H \mathbf{x}_f \right)^{(j \in \mathcal{J})} + \mathbf{n}_t^{(j \in \mathcal{J})}. \tag{3.14}$$

To obtain an estimate of the PN samples at the locations indexed by J, the following objective function is minimized:

$$\hat{\boldsymbol{\psi}}_t^{(j \in \mathcal{J})} = \underset{\boldsymbol{\psi}_t^{(j \in \mathcal{J})}}{\operatorname{argmin}} \left\| \mathbf{y}_t^{(j \in \mathcal{J})} - \boldsymbol{\psi}_t^{(j \in \mathcal{J})} \odot \left( \hat{\mathbf{H}}_t \mathbf{F}^H \mathbf{x}_f \right)^{(j \in \mathcal{J})} \right\|^2, \tag{3.15}$$

where ∥ · ∥ denotes the 2-norm of the vector. Equation (3.15) is used on all OFDM
symbols containing pilots in the subframe. The PN estimates at all the pilot locations
are interpolated across the entire TF grid using an MMSE interpolator1 to obtain Ψ̂t .
For the ith received symbol y_t^(i), a compensated symbol y_t^′(i) = Ψ̂_t^∗(i) y_t^(i) is obtained,
where (·)∗ indicates the conjugation operation. The compensated time domain symbols
are converted to frequency domain to obtain the compensated subframe Yf′. A final set of channel estimates, Ĥ′f, is obtained from the pilot locations in Yf′ using the channel estimator network again. This is done because Ĥ′f has higher accuracy than Ĥf, which is obtained from Yf containing PN induced rotations. Ĥ′f is then used for decoding the data symbols in Yf′. In summary, the operations in (3.10) through (3.15) use the first channel estimate Ĥf to estimate the PN values, the estimated PN values yield the compensated subframe Yf′, and Yf′ in turn yields the refined channel estimate Ĥ′f.
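A minimal NumPy sketch of one way to realize this step for a single OFDM symbol is given below. Since (3.15) decouples per sample, each PN estimate has the closed form ψ̂_j = y_j z*_j / |z_j|², where z = Ĥt F^H x_f; the linear interpolation used here is a simplifying stand-in for the MMSE interpolator, and the unitary DFT normalization is an assumption.

```python
import numpy as np

def compensate_pn(y_t, H_t_hat, x_f, pilot_idx, Nc):
    """Estimate PN samples at the (sorted) indices in pilot_idx via the
    per-sample closed form of (3.15), interpolate across the symbol, and
    return the compensated symbol conj(psi_hat) * y_t."""
    n = np.arange(Nc)
    F = np.exp(-2j * np.pi * np.outer(n, n) / Nc) / np.sqrt(Nc)  # unitary DFT (assumed)
    z = H_t_hat @ (F.conj().T @ x_f)
    psi_p = y_t[pilot_idx] * np.conj(z[pilot_idx]) / np.abs(z[pilot_idx]) ** 2
    # Linear interpolation of the PN estimates across the grid (the thesis
    # uses an MMSE interpolator based on the known PN covariance [88]).
    psi_hat = (np.interp(n, pilot_idx, psi_p.real)
               + 1j * np.interp(n, pilot_idx, psi_p.imag))
    return np.conj(psi_hat) * y_t
```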

3.4 Results and discussions


In this section, we present the mean square error (MSE) and BER performance of the
proposed channel estimator network and the PN compensation algorithm. For all the
simulations presented below, for each subframe, Ns = 14 and Nc = 72 as per LTE standards [89], Np = 48, Sf = 6, and St = 7. 50000 realizations of the Vehicular A (VehA) channel model defined by ITU-R [90] with six taps, carrier frequency of fc = 2.1 GHz, bandwidth of 1.6 MHz, and user equipment (UE) speed of 50 km/h (corresponding to a Doppler frequency of fD = 97 Hz) are generated. 35000 realizations are used for training
the proposed channel estimator network (training data), 5000 realizations are used for
validating the training (validation data) and the remaining 10000 realizations are used
for testing (test data). Each tap in the VehA model is Rayleigh distributed with the
time selectivity based on Jakes model [10].

¹The PN process is stationary and it is reasonable to assume that the covariance of the process is known [88].

Set   BPLL (Hz)   L0 (dBc/Hz)   fcorner (Hz)   Lfloor (dBc/Hz)   σPN (degree)
 1    10⁷          −95           10³            −150              2.78◦
 2    4 × 10⁷      −95           10³            −150              5.46◦
 3    4 × 10⁷      −89           10³            −150              10.85◦

Table 3.3: Phase noise PSD parameters.

For all the results presented below, a single trained channel estimator network is obtained using VehA channel realizations with PN ∼ N(0, σ²PN), σPN = 1.58◦, and fD = 97 Hz as training data. We use the PyTorch machine learning library for the implementation,
training, and testing of the channel estimator network. We use Nvidia RTX 3090 GPU
platform to carry out all the simulations. PN samples are generated from the PN power spectral density, given by [87]

$$\mathcal{L}(f_m) = \frac{B_{\text{PLL}}^2 L_0}{B_{\text{PLL}}^2 + f_m^2} \left( 1 + \frac{f_{\text{corner}}}{f_m} \right) + L_{\text{floor}}, \tag{3.16}$$

where BPLL is the −3 dB bandwidth of the phase locked loop (PLL), L0 is the in-band phase noise level in rad²/Hz (dBc/Hz), fm is the frequency offset from the carrier fre-
quency, fcorner is the flicker corner frequency, and Lfloor is the noise floor. For performance
evaluation, we choose three sets of values for the parameters in (3.16), that correspond
to σPN = 2.78◦ , 5.46◦ , and 10.85◦ , respectively, as shown in Table 3.3.
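For reference, PN realizations with this PSD can be synthesized by shaping white Gaussian noise with the square root of (3.16) in the frequency domain. The sketch below illustrates one common recipe; the exact spectral scaling convention is an assumption, and the default arguments correspond to Set 1 of Table 3.3.

```python
import numpy as np

def pn_psd(fm, B_pll=1e7, L0=10**(-95/10), f_corner=1e3, L_floor=10**(-150/10)):
    """One-sided phase-noise PSD of (3.16); L0 and L_floor in linear rad^2/Hz."""
    return (B_pll**2 * L0) / (B_pll**2 + fm**2) * (1 + f_corner / fm) + L_floor

def generate_pn(n_samples, fs, rng=np.random.default_rng()):
    """Shape complex white noise by sqrt(PSD) and return exp(j*phase).
    The spectral scaling below follows one common convention; details vary."""
    f = np.fft.rfftfreq(n_samples, d=1/fs)
    f[0] = f[1]                                  # avoid the DC singularity
    mag = np.sqrt(pn_psd(f) * fs * n_samples) / np.sqrt(2)
    spec = mag * (rng.standard_normal(f.size) + 1j * rng.standard_normal(f.size))
    phase = np.fft.irfft(spec, n=n_samples)      # real-valued phase process
    return np.exp(1j * phase)                    # multiplicative PN samples
```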
Figure 3.3 shows the MSE performance of the proposed channel estimator network
as a function of pilot SNR at a Doppler frequency of fD = 97 Hz for the three values
of σPN considered. The MSE performance of the first set of channel estimates (Ĥf )
and the second set of channel estimates refined by the PN estimation and compensation
algorithm (Ĥ′f ) are plotted. The performance with no PN is also plotted for comparison.
In addition, the MSE performance achieved by a 2D spline interpolation scheme is also
shown. It is seen that the MSE performance of the interpolation scheme is poor due to
the presence of PN and time selectivity of the channel. On the other hand, the proposed

Figure 3.3: MSE performance of the proposed channel estimator network without and
with the proposed PN compensation for different values of σPN .

network is able to learn and estimate the channel much better. For example, even the first estimate Ĥf achieves a much lower MSE than the interpolation scheme, while the MSE of the second estimate Ĥ′f is close to the MSE with no PN. This demonstrates the generalization ability of
the proposed channel estimator network, wherein the same trained network is able to
perform well under different PN levels.
Next, Fig. 3.4 shows the BER performance of the OFDM system as a function
of Eb /N0 for 4-QAM, fD = 97 Hz, and 30 dB pilot SNR. The BER performance is
evaluated for the three considered values of σPN using the channel estimates Ĥf (without
PN compensation) and Ĥ′f (with PN compensation). For comparison purposes, the BER
performance with perfect channel state information (CSI) and no PN is also plotted. The
following observations can be made from the figure. Without PN, the BER performance
with the proposed channel estimator network is very close to that with perfect CSI.
In the presence of PN, using the first channel estimate Ĥf , the BER performance floors
whereas using the refined estimate Ĥ′f , the performance gets close to that with perfect
CSI. For example, while the BER floors at 10⁻² for σPN = 10.85◦ when Ĥf is used, the

Figure 3.4: BER performance comparison between the proposed channel estimator net-
work without and with the proposed PN compensation and the PN compensation scheme
in [64].

BER improves to about 4 × 10⁻⁴ at Eb/N0 = 30 dB for the same σPN when Ĥ′f is used. Note that the BER with perfect CSI and no PN for the same Eb/N0 is 2.5 × 10⁻⁴. Figure
3.4 also presents a comparison of the performance of the proposed scheme with that of
the PN compensation scheme (ref. scheme) proposed in [64]. It is observed that, for all
σPN values, the BER performance of the ref. scheme is comparable with that using the
channel estimates Ĥf , while the performance with channel estimates Ĥ′f in the proposed
scheme is superior. This can be attributed to the sub-optimality of the iterative scheme
followed in [64]. Further, the transmitted subframe in [64] consists of an initial block-type pilot and comb-type pilots thereafter (about 30% of the symbols are pilots), while the proposed approach uses lattice-type pilots (only 5% of the symbols are pilots) and thus has better bandwidth efficiency.

Figure 3.5: BER vs SNR performance comparison between the proposed scheme and the
PN compensation scheme in [91].

3.4.1 Comparison with NN-based PN compensation scheme in [91]

Figure 3.5 shows the performance comparison between the proposed channel estimation
and PN compensation scheme and the scheme in [91] (ref. scheme) for 16-QAM, fD = 97
Hz, and 30 dB pilot SNR. The PN is modelled as a Brownian motion process (as in [91])
with PN bandwidth parameter β. The BER performance of the ref. scheme for β = 10³
Hz is observed to floor. Further, increasing the number of iterations (Niter ) to get the
initial estimates in the ref. scheme improves its performance in the low and mid SNR
regimes. However, the proposed scheme is able to perform better than the ref. scheme
throughout the considered SNR range. This can be attributed to the absence of pre-
processing in the proposed scheme, while in the ref. scheme the networks are trained
using estimates obtained from the first iteration of a non-linear least square estimation
algorithm. Next, for β = 10² Hz, the performances of both schemes are observed to be close to each other, and close to the perfect CSI performance. We note that the ref. scheme
considers the pilot arrangement as in [64], which has lower bandwidth efficiency than

Figure 3.6: BER vs pilot SNR performance comparison between the proposed scheme
and the PN compensation scheme in [91].

the proposed scheme. Also, the ref. scheme is computationally expensive. For example, for each OFDM symbol with Nc = 160, the ref. scheme involves 4 NNs with about 10⁸ floating-point operations (FLOPs), while the proposed scheme's NN requires only about 3 × 10⁷ FLOPs.
BER performance as a function of pilot SNR for 20 and 30 dB data SNRs is plotted in Fig. 3.6. As expected, BERs are high at low pilot SNRs due to increased MSE. The gap from the respective perfect CSI performance becomes larger when the data SNR is high. This is because when the data SNR is high (e.g., at 30 dB), the MSE of the channel estimates dominates the effect of thermal noise, leading to poor performance, and vice versa when the data SNR is low (e.g., at 20 dB). It is also seen that the proposed scheme performs better than the ref. scheme in [91] across all pilot SNRs considered.

3.5 Summary
In this chapter, we proposed a 2D CNN based learning network to estimate the doubly-
selective channel coefficients of a TF grid in an OFDM system by treating the problem
as an image completion problem using sparsely available data (i.e., pilot symbols). This
channel estimation problem was considered in the presence of PN. To train the net-
work well, random phase rotations resembling PN were introduced in the training data.
Numerical results showed that a single trained channel estimator network along with a
simple PN compensation scheme performed well under different PN levels and Doppler
frequencies, outperforming other recent schemes.
Chapter 4

Learning based DD channel estimation in OTFS systems

4.1 Introduction
Orthogonal time frequency space (OTFS) modulation is a recently introduced modu-
lation scheme suited for high-mobility channels [32]. While contemporary multicarrier
modulation schemes such as orthogonal frequency division multiplexing (OFDM) suffer
from inter-carrier interference caused by high Doppler spreads in high-mobility channels,
OTFS has been shown to be robust to high Doppler spreads [32], [92], [93]. Therefore, it
is of significance to consider the problem of channel estimation for OTFS. In this chapter,
we propose learning based approaches for delay-Doppler (DD) domain channel estima-
tion in OTFS systems. First, we consider channel estimation with embedded pilots. The
proposed learning network, called DDNet, is based on a multi-layered recurrent neural
network (RNN) framework with a novel training methodology that works seamlessly
for both exclusive pilot frames as well as embedded pilot frames [66]. This generaliza-
tion is attributed to the training methodology, wherein multiple frame realizations with
different guard band sizes are used to train the network. Since the embedded frame
is spectrally inefficient due to the presence of guard symbols (zeros) around the pilot
symbol, we next consider an interleaved pilot (IP) placement scheme with a lattice-type


arrangement (which does not have guard symbols). For this frame, we propose a deep
learning architecture using recurrent neural networks (referred to as IPNet) for efficient
estimation of DD domain channel state information [69]. The proposed IPNet is trained
to overcome the effects of leakage from data symbols and provide channel estimates with
good accuracy. Lastly, to achieve full spectral efficiency (i.e., full rate), we consider a superimposed pilot frame, where the pilot symbols are superimposed on data symbols [70]. Our contributions in this regard are two-fold. First, we propose a sparse superim-
posed pilot (SSP) scheme, where pilot and data symbols are superimposed in a few bins
and the remaining bins carry data symbols only. This scheme offers the benefit of better
inter-symbol leakage profile in a frame, while retaining full rate. Second, for the SSP
scheme, we propose an RNN based learning architecture (referred to as SSPNet) trained
to provide accurate channel estimates overcoming the leakage effects in channels with
fractional delays and Dopplers.
Simulation results demonstrate that the proposed DDNet achieves better mean square
error and bit error performance compared to impulse based and threshold based DD
channel estimation schemes. Further, simulation results show that the proposed IPNet
architecture achieves good bit error performance while being spectrally efficient. Simu-
lation results also show that the proposed SSP scheme along with fractional DD channel
estimation using the proposed SSPNet performs better than a fully superimposed pilot
scheme.
The rest of the chapter is organized as follows. The proposed DDNet, its architecture,
and training details along with simulation results are presented in Section 4.2. Section
4.3 presents the proposed IPNet and its performance results. Details regarding the
proposed SSPNet and its performance results are presented in Section 4.4. A summary
of the chapter is presented in Section 4.5.

Figure 4.1: OTFS modulation scheme.

4.2 DD channel estimator for OTFS with embedded pilots
In this section, we first present the system model, followed by the description of the proposed DDNet, and then present the numerical results and the performance of DDNet [66].

4.2.1 System model

Figure 4.1 shows the block diagram of the OTFS modulation scheme. At the transmitter,
information symbols are placed in the DD domain. They are mapped to time-frequency
(TF) domain using inverse symplectic finite Fourier transform (ISFFT). This is followed
by conversion to time domain using Heisenberg transform. The time domain signal is
transmitted through the channel. At the receiver, the received time domain signal is
converted to TF domain using Wigner transform. This is followed by conversion back
to DD domain using symplectic finite Fourier transform (SFFT) for detection.
MN information symbols, denoted by a[k, l]s, each belonging to a modulation alphabet A, are placed in an M × N DD grid {(l/(M∆f), k/(NT)), l = 0, · · · , M − 1, k = 0, · · · , N − 1}, where N is the number of Doppler bins, M is the number of delay bins, ∆f is the subcarrier spacing, and T = 1/∆f. The quantities 1/NT and 1/M∆f represent the bin sizes in the Doppler and delay domains, respectively. The a[k, l]s in the DD domain are converted

to TF domain symbols A[n, m] using the ISFFT operation, given by

$$A[n, m] = \frac{1}{\sqrt{MN}} \sum_{k=0}^{N-1} \sum_{l=0}^{M-1} a[k, l]\, e^{j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)}, \tag{4.1}$$

for n = 0, · · · , N − 1 and m = 0, · · · , M − 1. To obtain the time domain signal a(t),


Heisenberg transform of the TF signal A[n, m] is computed. Using a transmit pulse
ptx (t), this operation is defined as

$$a(t) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} A[n, m]\, p_{\text{tx}}(t - nT)\, e^{j2\pi m \Delta f (t - nT)}. \tag{4.2}$$

The time domain signal a(t) is transmitted through the channel. The channel has the
complex baseband channel response in the DD domain, denoted by g(τ, ν), given by [93]

$$g(\tau, \nu) = \sum_{i=0}^{L-1} g_i\, \delta(\tau - \tau_i)\, \delta(\nu - \nu_i), \tag{4.3}$$

where L is the number of channel paths in the DD domain, δ is the Dirac delta function,
and gi , τi , and νi represent the channel gain, delay, and Doppler shift, respectively,
corresponding to the ith path. The received time domain signal b(t) at the OTFS
receiver is given by
$$b(t) = \int_{\nu} \int_{\tau} g(\tau, \nu)\, a(t - \tau)\, e^{j2\pi\nu(t - \tau)}\, d\tau\, d\nu + w(t), \tag{4.4}$$

where w(t) represents the additive noise. At the receiver, a match filtering operation is
carried out on the received signal b(t) with a receive pulse prx (t) yielding a TF domain
cross-ambiguity function, denoted by Fprx ,b (t, f ), and given by
$$F_{p_{\text{rx}}, b}(t, f) = \int_{t'} p_{\text{rx}}^{*}(t' - t)\, b(t')\, e^{-j2\pi f (t' - t)}\, dt', \tag{4.5}$$

where (·)∗ represents the complex conjugation operation. The transmit and receive pulse

are chosen such that they satisfy the biorthogonality condition, i.e.,

$$F_{p_{\text{rx}}, p_{\text{tx}}}(t, f)\big|_{t = nT,\, f = m\Delta f} = \delta(n)\, \delta(m). \tag{4.6}$$

Sampling (4.5) at t = nT and f = m∆f gives

B[n, m] = Fprx ,b (t, f )|t=nT,f =m∆f . (4.7)

The TF domain signal B[n, m] is then mapped back to the DD domain through SFFT
operation to obtain b[k, l] as

$$b[k, l] = \frac{1}{\sqrt{MN}} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} B[n, m]\, e^{-j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)}. \tag{4.8}$$

Using (4.1)-(4.8), the input-output relation of the OTFS modulation scheme in the DD
domain can be written as [93]

$$b[k, l] = \sum_{i=0}^{L-1} g_i'\, a[(k - \beta_i)_N, (l - \alpha_i)_M] + w[k, l], \tag{4.9}$$

where g′i = gi e^{−j2πτiνi}, αi is the integer corresponding to the delay tap index and βi the integer corresponding to the Doppler frequency, associated with τi and νi, respectively. Therefore, τi = αi/(M∆f) and νi = βi/(NT). Further, (4.9) can be written in a vectorized form as
. Further, (4.9) can be written in a vectorized form as

b = Ga + w, (4.10)

where b, a, w ∈ C^{MN×1}, G ∈ C^{MN×MN}, and the (kM + l)th entry of a is a_{kM+l} = a[k, l] for k = 0, · · · , N − 1, l = 0, · · · , M − 1, with a[k, l] ∈ A. Likewise, b_{kM+l} = b[k, l] and w_{kM+l} = w[k, l] for k = 0, · · · , N − 1, l = 0, · · · , M − 1. The gi s are assumed to be i.i.d. and distributed as CN(0, 1/L).
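For concreteness, the ISFFT/SFFT pair in (4.1) and (4.8) can be written compactly with FFTs. The NumPy sketch below follows the 1/√(MN) normalization used above, with the DD grid stored as an N × M array indexed [k, l]; this FFT-based formulation is one standard realization, not the only one.

```python
import numpy as np

def isfft(a_dd):
    """ISFFT of (4.1): N x M DD grid a[k, l] -> TF grid A[n, m].
    IDFT along the Doppler axis, DFT along the delay axis."""
    N, M = a_dd.shape
    return np.sqrt(N / M) * np.fft.ifft(np.fft.fft(a_dd, axis=1), axis=0)

def sfft(B_tf):
    """SFFT of (4.8): TF grid -> DD grid, inverting the ISFFT."""
    N, M = B_tf.shape
    return np.sqrt(M / N) * np.fft.fft(np.fft.ifft(B_tf, axis=1), axis=0)

# Round trip: sfft(isfft(a)) recovers a up to floating-point error.
a = np.random.randn(12, 64) + 1j * np.random.randn(12, 64)  # N = 12, M = 64
assert np.allclose(sfft(isfft(a)), a)
```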

Figure 4.2: Pilot, guard, and data symbol placements in exclusive and embedded pilot
frames.

Pilot placement schemes

To estimate the DD domain channel matrix G, known symbols called pilots are placed
in the DD domain and transmitted. At the receiver, the received symbols corresponding
to the transmitted pilots are used to estimate the channel in the DD domain, which is
then used to construct the matrix G. We consider two types of pilot placement schemes.
Figure 4.2 shows the OTFS frame structure for the two pilot placement schemes. The
first scheme is the exclusive pilot frame scheme used in [67], wherein the entire DD
domain grid consists of a single pilot symbol and zeros elsewhere (see Fig. 4.2a), i.e.,

$$a[k, l] = \begin{cases} a_p, & \text{if } k = k_p,\ l = l_p \\ 0, & \text{otherwise}, \end{cases} \tag{4.11}$$

for k = 0, · · · , N − 1 and l = 0, · · · , M − 1. A predetermined pilot symbol ap is placed


in the DD grid indexed by kp and lp , the Doppler and delay bin indices, respectively.
The second scheme is the embedded pilot frame scheme used in [68], shown in Fig. 4.2b,
where the DD grid consists of a pilot symbol (marked in red), guard symbols (marked

Figure 4.3: Proposed RNN based DDNet channel estimation scheme.

in yellow), and data symbols (marked in blue), which can be represented as



$$a[k, l] = \begin{cases} 0, & \text{if } k = k_g,\ l = l_g \\ a_p, & \text{if } k = k_p,\ l = l_p \\ a_d, & \text{elsewhere}, \end{cases} \tag{4.12}$$

for k = 0, · · · , N − 1 and l = 0, · · · , M − 1. In (4.12), the kg s and lg s denote the indices of the guard band around the pilot symbol ap, and the remaining indices are occupied by data symbols ad ∈ A. The pilot symbol is surrounded by guard symbols to alleviate interference from data symbols. The number of guard symbols is adjusted to accommodate lτ and kν, the delay and Doppler taps corresponding to the largest delay τmax and Doppler νmax, respectively [68]. Note that the embedded pilot scheme in (4.12) becomes the exclusive pilot scheme in (4.11) when kg = 0, 1, · · · , kp − 1, kp + 1, · · · , N − 1 and lg = 0, 1, · · · , lp − 1, lp + 1, · · · , M − 1.

4.2.2 DDNet - proposed RNN based DD channel estimator

In this subsection, we present the proposed DDNet, an RNN based architecture for
DD channel estimation, and the training methodology [66]. Figure 4.3 shows the block
diagram of the proposed DDNet. The information symbols a[k, l]s in the DD domain are
converted to a time domain signal a(t) at the OTFS transmitter and transmitted through
a doubly-selective fading channel. At the OTFS receiver, the received signal b(t) is
converted back to DD symbols b[k, l]s, k = 0, · · · , N −1, l = 0, · · · , M −1, given by (4.9).
Let b′ [k ′ , l′ ]s, a subset of b[k, l]s, denote the received DD symbols corresponding to the
pilot and guard bins, where k ′ = kp −kν , kp −kν +1, · · · , kp +kν and l′ = lp , lp +1, · · · , lp +lτ

Figure 4.4: Proposed RNN based DDNet architecture.

for the scheme in (4.12) and k ′ = 0, 1, · · · , N − 1, l′ = 0, 1, · · · , M − 1 for the scheme in


(4.11). The number of symbols in b′ [k ′ , l′ ]s is (2kν + 1)(lτ + 1) for pilot scheme in (4.12)
and M N for pilot scheme in (4.11), respectively. These b′ [k ′ , l′ ] symbols are converted
to a vector b′ of length (2kν + 1)(lτ + 1) for scheme in (4.12) and of length M N for
the scheme in (4.11). The input to the DDNet is the vector b′ . The output of the
DDNet is a vector m, called mask, of length (2kν + 1)(lτ + 1) for scheme in (4.12) and
M N for scheme in (4.11). Entries in m are values between 0 and 1. These entries are
thresholded such that values above 0.5 are replaced by 1 and those below are replaced
by 0. In the thresholded m vector, denoted by m′ , the indices corresponding to location
of 1s denote presence of valid channel paths at those locations in the DD grid. These
locations are used to obtain the estimates of the integers corresponding to delay taps
(i.e., α̂i s) and Doppler frequencies (β̂i s) (see (4.9)) in the DD grid. Finally, the vector
m′ is element-wise multiplied with the input vector b′ and the non-zero values from
the resulting vector are returned as DD domain channel coefficient vector ĝ. Using the
estimates ĝ, α̂, and β̂, the estimated DD domain channel matrix Ĝ is obtained. This
matrix is used for detection of data symbols.

Architecture

The architecture of the proposed DDNet block is shown in Fig. 4.4. The architecture
consists of P layers of long short-term memory (LSTM) [57], a variant of RNN. The
output of the LSTM layers is passed through a ReLU activation function, given by
ReLU(x) = max (0, x) , ∀x ∈ (−∞, ∞). This is then passed on to a fully connected
neural network (FCNN) with one layer. The FCNN is employed to reduce the dimension

Parameter Value

Number of LSTM layers P


LSTM Hidden size (h) 100
LSTM input dimensions (c, s, 2)
LSTM output dimensions (c, s, 100)
FCNN input neurons 100
FCNN output neurons 1

Table 4.1: Parameters of the DDNet architecture.

of the output of the LSTM network to the required dimension. This is then followed by a sigmoid activation function, given by sigmoid(x) = 1/(1 + e^{−x}), ∀x ∈ (−∞, ∞). The
, ∀x ∈ (−∞, ∞). The
purpose of using a sigmoid function is to restrict the output between 0 and 1 and to
determine if a particular DD bin contains a valid path. To achieve this, as mentioned
above, the mask m at the output of sigmoid function is first thresholded to obtain the
vector m′ , followed by element-wise multiplication with the input. The non-zero entries
in the resulting vector constitute the estimated DD channel coefficients, denoted by ĝ.
The other parameters of the DDNet architecture are given in Table 4.1. The variable c
refers to the batch size and s is the sequence length, which is set to be M N for the pilot
scheme in (4.11) and (2kν + 1)(lτ + 1) for pilot scheme in (4.12).
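A minimal PyTorch sketch consistent with Fig. 4.4 and Table 4.1 is shown below; the stacking of the real and imaginary parts as two input features per sequence step follows the (c, s, 2) input dimension in Table 4.1.

```python
import torch
import torch.nn as nn

class DDNet(nn.Module):
    """P LSTM layers (hidden size 100), ReLU, a one-layer FCNN (100 -> 1),
    and a sigmoid producing the mask m with entries in [0, 1]."""
    def __init__(self, P=2, hidden=100):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                            num_layers=P, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, b):
        out, _ = self.lstm(b)                         # b: (c, s, 2)
        m = torch.sigmoid(self.fc(torch.relu(out)))   # entries in [0, 1]
        return m.squeeze(-1)                          # mask of length s per frame

# Thresholding at 0.5 gives m', whose 1s mark valid channel paths:
# m_prime = (DDNet()(b_in) > 0.5).float()
```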

Training methodology

Training data is obtained by generating multiple DD domain OTFS frames using pilot
schemes in (4.11) and (4.12). Further, various guard band realizations are also added to
the training data for the scheme in (4.12). These frames are converted to time domain
and transmitted through a doubly-selective channel and the received signal is converted
back to DD domain. From the received DD symbols, depending on the pilot scheme
employed, s-length vector b′ (see Fig. 4.4) is obtained. The real and imaginary parts of
this vector are concatenated before being fed to the DDNet. For training the DDNet,

Parameter Value

Epochs 20000
Optimizer Adam
Learning rate 0.001, divide by 2 every 4000 epochs
Batch size 1000
Mini-batch size 64

Table 4.2: Hyper-parameters used for training the DDNet.

the ground truth is obtained by generating an s-length true mask, denoted by z, whose ith entry is defined as

$$z_i = \begin{cases} 1, & \text{if the DD bin corresponding to } i \text{ is a valid path} \\ 0, & \text{else}. \end{cases} \tag{4.13}$$

During training, the weights of the DDNet are updated such that the value of the bi-
nary cross entropy (BCE) loss function between z and the output of the DDNet, m, is
minimized. The BCE loss function for the ith index is given by

L(zi , mi ) = −zi log(mi ) − (1 − zi ) log(1 − mi ), (4.14)

where 0 ≤ mi ≤ 1 is the output of the DDNet and zi ∈ {0, 1} is the ground truth.
The other hyper-parameters used in the training of the DDNet are presented in Table
4.2. Note that this training needs to be carried out offline, only once. Subsequently,
the network weights are frozen. New channel estimates are obtained from pilots in each
OTFS frame using the same trained network.
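A sketch of the training loop under these hyper-parameters follows, using the DDNet sketch above; `loader`, yielding received pilot vectors and the true masks of (4.13), is an assumed data iterator.

```python
import torch

ddnet = DDNet(P=2)
optimizer = torch.optim.Adam(ddnet.parameters(), lr=0.001)
# Halve the learning rate every 4000 epochs (Table 4.2).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4000, gamma=0.5)
bce = torch.nn.BCELoss()                  # the loss of (4.14)

for epoch in range(20000):
    for b_in, z in loader:                # assumed: inputs and true masks
        optimizer.zero_grad()
        loss = bce(ddnet(b_in), z)
        loss.backward()
        optimizer.step()
    scheduler.step()
```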

Inference from DDNet

Once the DDNet is trained, the weights are frozen. During the inference (testing) phase,
channel estimates, ĝ, in the DD domain are obtained through element-wise multiplication

of the input (b′ ) with the thresholded mask (m′ ), as shown in Fig. 4.4. To obtain the
estimates of α and β, denoted by α̂ and β̂, respectively, the following steps are followed
for the pilot scheme in (4.11). Let J denote the set of indices where the thresholded
mask m′ is 1, i.e., J = {j : m′j = 1, j = 0, 1, · · · , s − 1}. Then, for the ith path index,

α̂i = (Ji )M − lp , (4.15)


$$\hat{\beta}_i = \left\lfloor \frac{J_i}{M} \right\rfloor - k_p. \tag{4.16}$$

For the pilot scheme in (4.12) where s = (2kν + 1)(lτ + 1), the thresholded mask m′ is
reshaped into a matrix of shape (2kν + 1) × (lτ + 1). Vectors u and v are defined as
u = [kp − kν , kp − kν + 1, · · · , kp + kν ] and v = [lp , lp + 1, · · · , lp + lτ ]. Then, an index
set, I, is defined as I = {ui × M + vj : m′ij = 1, i = 0, 1, · · · , 2kν , j = 0, 1, · · · , lτ }. For
the ith path index,

α̂i = (Ii )M − lp , (4.17)


$$\hat{\beta}_i = \left\lfloor \frac{I_i}{M} \right\rfloor - k_p. \tag{4.18}$$

We have carried out the simulations for performance evaluation using PyTorch machine
learning library [2] on RTX 3090 GPU platform.
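As a concrete illustration of (4.15) and (4.16) for the exclusive pilot frame, the delay and Doppler tap estimates follow from simple modulo and integer-division operations on the mask indices; the sketch below assumes the thresholded mask is available as a NumPy array.

```python
import numpy as np

def delay_doppler_from_mask(m_prime, M, k_p, l_p):
    """Recover tap estimates from the thresholded mask m' (exclusive pilot
    frame): alpha_i = (J_i mod M) - l_p per (4.15), and
    beta_i = floor(J_i / M) - k_p per (4.16)."""
    J = np.flatnonzero(m_prime)      # indices where the mask equals 1
    alpha_hat = (J % M) - l_p
    beta_hat = (J // M) - k_p
    return alpha_hat, beta_hat
```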

4.2.3 Results and discussions

In this subsection, we present the performance of the proposed DDNet for DD channel
estimation in OTFS. A carrier frequency of fc = 4 GHz and a subcarrier spacing of
∆f = 15 kHz are considered. We consider the Vehicular A (VehA) channel model
defined by ITU-R [90] with L = 6 paths and a maximum speed of 220 km/h. This speed
at 4 GHz carrier frequency corresponds to a maximum Doppler shift, νmax , of 815 Hz.
Each path has a Doppler shift generated using Jakes model νi = νmax cos θi , where θi
is assumed to be uniformly distributed between [−π, π]. We fix the number of Doppler
bins (N ) and delay bins (M ) to be 12 and 64, respectively. A BPSK symbol +1 is used

Figure 4.5: Effect of number of LSTM layers, P , on the NMSE performance of the
proposed DDNet.

as the pilot symbol and data symbols are chosen from 4-QAM alphabet. To train the
network, the batch size (c) is chosen to be 1100 of which 1000 OTFS frames are used
for training and 100 frames are used for validating the training. This training data is
refreshed every 20 epochs, wherein the pilot schemes in (4.11) and (4.12) are chosen
randomly with equal probability. Pilot energy is taken as follows. For the scheme in (4.11), the pilot amplitude is taken to be √(MN), and for the scheme in (4.12) the pilot amplitude is taken to be √((2kν + 1)(lτ + 1)).
To evaluate the accuracy of the channel estimates provided by the DDNet, we evaluate
the normalized mean square error (NMSE) for the DD domain channel matrix. The value
of NMSE is computed as follows. The estimates of the channel coefficients, delay taps,
and Doppler taps are obtained from the DDNet as described in Section 4.2.2. Using
these values, an estimate for the matrix G (see (4.10)), denoted by Ĝ, is obtained. The
NMSE is computed as NMSE = E[∥G − Ĝ∥²_F / ∥G∥²_F]. For evaluating the bit error rate (BER)

performance, the message passing (MP) detector in [93] is used.


Effect of number of LSTM layers: Figure 4.5 shows the NMSE performance of the

Figure 4.6: NMSE vs spectral efficiency at different pilot SNRs.

DDNet as a function of pilot SNR for three different values of the number of LSTM
layers, P = 1, 2, 3. Other than the value of P , the same parameters in Table 4.1 and
the training hyper-parameters in Table 4.2 are used for all the values of P . The number
of parameters for P layers can be computed as NP = 4h²(2P − 1) + 4h·id + 8P h + 101,
where h is the hidden size (see Table 4.1), id = 2 is the input dimension, and 101 is the
number of parameters in the FCNN layer. Therefore, the number of parameters to be
learnt are 41701, 122501, and 203301 for P = 1, 2, and 3, respectively. Performance of
embedded as well as exclusive pilot frames are shown. From Fig. 4.5, it can be seen that
while the NMSE performance is comparable for different values of P , the performance
for P = 3 is slightly worse. This can be attributed to the steep increase in the number
of parameters that need to be learnt for P = 3, resulting in difficulty in training the
network. As a good balance between training complexity and achieved performance, we
fix P = 2 for the rest of the performance evaluation experiments. During the inference
(testing) stage, only 301 floating point operations (FLOPs) are required to compute the
mask from the DDNet. In contrast, the approach in [68] does not involve an offline
training phase. Further, the number of FLOPs required is 5(2kν + 1)(lτ + 1).

Figure 4.7: NMSE performance comparison between the proposed DDNet and the esti-
mation schemes in [67] (exclusive pilot) and [68] (embedded pilot).

NMSE vs spectral efficiency: Figure 4.6 shows the effect of number of guard symbols
on the NMSE performance of the DDNet. It shows NMSE as a function of spectral
efficiency η, where η is defined as η = 1 − Ng/(MN), and Ng is the number of guard symbols
, and Ng is the number of guard symbols
in the frame. The pilot SNRs considered are 20 dB and 30 dB. The performance of
the threshold based scheme in [68] is also plotted for comparison. It can be seen that
the DDNet achieves significantly better NMSE performance compared to the threshold
based scheme in [68]. Also, while the NMSE of DDNet improves as the number of guard
symbols Ng is increased (i.e., smaller values of η), the NMSE of threshold based scheme
does not improve because of the inherent limitation in using a fixed threshold, for a
given pilot SNR. Whereas, the DDNet is able to generalize for varying guard band sizes
because of the training methodology employed.
NMSE performance comparison with schemes in [67] and [68]: Figure 4.7 shows the
NMSE comparison for the cases of exclusive pilot frame and embedded pilot frame. For
the exclusive pilot case, comparison is made between DDNet and the scheme in [67]. For
the embedded pilot case, comparison is made between DDNet and the scheme in [68]

Figure 4.8: BER performance comparison between the proposed DDNet and the thresh-
olding scheme in [68] (embedded pilot).

for η = 0.97. The following observations can be made from Fig. 4.7. It is seen that in
the exclusive pilot case, DDNet performance is better compared to that of the scheme in
[67]. For example, to achieve an NMSE of -50 dB, DDNet requires about 4 dB less pilot
SNR. In the embedded pilot case, the DDNet performance is far superior compared to
that of the scheme in [68]. The scheme in [68] does not work well up to 15 dB pilot SNR
because of the erroneous estimation of the delay and Doppler indices at these SNRs. For
pilot SNRs greater than 15 dB, the NMSE is seen to reduce with pilot SNR. Even in
this SNR region, there is a significant performance advantage for DDNet. For example,
to achieve an NMSE of -20 dB, DDNet requires about 14 dB less pilot SNR.
BER performance comparison with the scheme in [68]: In Fig. 4.8, we present a
comparison between the BER performance of the proposed DDNet and that of the scheme
in [68] with embedded pilot frame with η = 0.97. The BER performance with perfect
channel state information (CSI) is also presented for comparison. It is seen that the
DDNet outperforms the scheme in [68] by a large margin. For example, at 40 dB pilot
SNR, there is about 4 dB advantage for DDNet at a BER of 10⁻². Also, the DDNet

performance at this pilot SNR is close to that with perfect CSI. When the pilot SNR is
20 dB, the scheme in [68] fails to perform, whereas the DDNet performs much better.
This corroborates with the NMSE performance advantage predicted in Fig. 4.7, where
the scheme in [68] has a high NMSE value at 20 dB pilot SNR.

4.3 DD channel estimation with interleaved pilots in OTFS
Consider the input-output relation for an M × N OTFS system in (4.10), given by

b = Ga + w,

where b, a, w ∈ CM N ×1 and G ∈ CM N ×M N . To obtain an estimate of the DD domain


channel matrix G, pilot symbols are placed in the DD grid and transmitted. These pilot
symbols leak into the neighboring DD bins due to delay and Doppler spreads of the
channel. At the receiver, the symbols corresponding to the transmitted pilots are used
to obtain an estimate of the DD channel. The number of pilot symbols and how they
are placed in an OTFS frame influence performance and spectral efficiency.

Pilot placement schemes

Figure 4.9 shows two types of pilot placement schemes, namely, interleaved pilot scheme
(Fig. 4.9a) and embedded pilot scheme (Fig. 4.9b), which are described below.

1. Embedded pilot scheme (Fig. 4.9b): The embedded frame is as defined in


(4.12). In this scheme, each frame consists of a pilot symbol (marked in red),
guard symbols (marked in yellow), and data symbols (marked in blue).

2. Interleaved pilot scheme (Fig. 4.9a): In this scheme, pilot symbols (marked in
red) are placed across each frame in a lattice-type arrangement [94]. The pilots are
surrounded by data symbols (marked in blue) without any guard bins in between.

Figure 4.9: Pilot, guard, and data symbol placements in interleaved pilot and embedded
pilot frames.

Figure 4.10: Proposed RNN based IPNet channel estimation scheme.

The pilots are separated in the delay domain by Sτ bins and in Doppler domain by
Sν bins, which are chosen based on the number of pilots, Np , in a frame, with the
constraint Sτ > mτ and Sν > nν , where mτ and nν are the delay and Doppler taps
corresponding to τmax and νmax, respectively. We will consider this interleaved pilot
placement scheme, which has not been considered for OTFS before. While guard
bins are avoided in this scheme, the receiver signal processing must be capable of
handling the effect of the leakage between pilot and data symbols. We propose an
RNN based network for this very purpose in the next subsection.

4.3.1 IPNet – proposed RNN based DD channel estimator

In this subsection, we present the proposed IPNet, an RNN based network for DD channel
estimation, its architecture and training methodology [69]. The block diagram of the
proposed IPNet is presented in Fig. 4.10. Information symbols a[n, m]s are converted

to time domain signal a(t) at the OTFS transmitter and transmitted through a time-
varying fading channel. At the OTFS receiver, the received signal, b(t), is converted back
to DD domain to obtain symbols b[n, m], n = 0, · · · , N − 1, m = 0, · · · , M − 1, given by
(4.9). The received DD frame is passed on to the IPNet block. At the IPNet block, the
following set of operations are carried out to obtain the first set of channel estimates.
Let npi and mpi denote the Doppler and delay indices for the ith pilot symbol, respec-
tively (see Fig. 4.9a), where i = 1, · · · , Np . Due to the channel, the pilot symbol spreads
into the nearby DD bins. For the ith pilot, the spread is contained within the indices
npi − nν to npi + nν on the Doppler axis and mpi to mpi + mτ on the delay axis. From
the received OTFS frame, the symbols in these locations are extracted and vectorized to
obtain the vector b′i ∈ C(2nν +1)(mτ +1)×1 for the ith pilot. This is repeated for each pilot
to obtain the vector b′ ∈ CNp (2nν +1)(mτ +1)×1 , given by

b′ = [b′1 b′2 · · · b′Np ]. (4.19)

Note that the vector b′ contains the effect of both pilot and data symbols. The input
to the IPNet is the vector b′ . The output of the IPNet is a vector ĝ ∈ C(2nν +1)(mτ +1)×1
of channel estimates. Among the (2nν + 1)(mτ + 1) entries in this vector, only those
channel estimates are picked as valid paths for which the absolute value is greater than
4% of the maximum absolute value in the vector, i.e., if ĝmax = maxi |ĝi |, then

$$\hat{g}_i = \begin{cases} 0, & \text{if } |\hat{g}_i| \le 0.04\, \hat{g}_{\max} \\ \hat{g}_i, & \text{otherwise}. \end{cases} \tag{4.20}$$

This operation is carried out because the absolute value of the output of the IPNet for
an invalid path is close to zero but not exactly zero. Further, through simulations we
observed that choosing the threshold as 4% of the maximum absolute value for differenti-
ating a valid path from an invalid path results in good channel estimation accuracy. The
locations of the valid paths are then used to obtain the estimates for integers correspond-
ing to delay taps (α̂i s) and Doppler frequencies (β̂i s) (see (4.9)) in the DD grid. Using

the estimates ĝ, α̂, and β̂, the estimated DD domain channel matrix Ĝ is obtained.
This matrix is used for detection of data symbols. To further improve the accuracy of
the channel estimates, the output of the detector, a′ [n, m], is fed back to the IPNet block
for cancelling the effect of data symbols. A new DD frame is constructed as

b′′ = b − Ĝa′ , (4.21)

where a′ ∈ CN M ×1 (b ∈ CN M ×1 ) is the vectorized version of a′ [n, m] (b[n, m]). Vector b′


is computed again using (4.19) and b′′ as the received frame. This is provided as input
to the IPNet and another set of refined channel estimates are obtained. This iterative
procedure is repeated P times and the output of the detector at the end of P th iteration,
â[n, m], is used to compute the bit error performance.
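The valid-path selection of (4.20) and one data-cancellation iteration of (4.21) reduce to a few lines; the sketch below assumes ĝ, Ĝ, and the detected symbol vector a′ are available as NumPy arrays.

```python
import numpy as np

def pick_valid_paths(g_hat, frac=0.04):
    """(4.20): zero out entries whose magnitude is at most 4% of the peak."""
    g = g_hat.copy()
    g[np.abs(g) <= frac * np.abs(g).max()] = 0
    return g

def data_cancellation(b, G_hat, a_det):
    """(4.21): subtract the detected symbols' contribution from the received
    frame before re-extracting the pilot neighbourhoods via (4.19)."""
    return b - G_hat @ a_det
```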

Architecture

The architecture of the proposed IPNet block is shown in Fig. 4.11. The architecture
consists of P layers of long short-term memory (LSTM) [57],[95], a variant of RNN.
The output of the LSTM layers is passed through a ReLU activation function, given
by ReLU(x) = max(0, x), ∀x ∈ (−∞, ∞). This output is then passed
through a fully connected neural network (FCNN) with one layer. The FCNN is employed
to reduce the dimension of the output of the LSTM network to the required dimension.
Since the output of the FCNN layer needs to be the channel estimate, a linear activation
function, with range between (−∞, ∞), is used at the output of the FCNN. Valid paths
are picked from the resulting vector at the output of the FCNN layer using (4.20). The
resulting vector ĝ is then returned as the channel coefficient vector. The other parameters
of the IPNet architecture are presented in Table 4.3. The variable c refers to the batch
size and s = Np (2nν + 1)(mτ + 1) is the sequence length. The output of the FCNN is a
vector of dimension 2(2nν + 1)(mτ + 1), where the first (2nν + 1)(mτ + 1) dimensions
are treated as real and the remaining as imaginary part of the channel estimates.

Figure 4.11: Proposed RNN based IPNet architecture.

Parameter Value

Number of LSTM layers (P ) 3


LSTM hidden size (h) 50
LSTM input dimensions (c, s, 2)
LSTM output dimensions (c, s, 50)
FCNN input neurons 50
FCNN output neurons 2(2nν + 1)(mτ + 1)

Table 4.3: Parameters of the IPNet architecture.

Training methodology

Training data is obtained by generating multiple OTFS frames with varying Np (num-
ber of interleaved pilots). These frames are converted to time domain and transmitted
through a time-varying fading channel. The received signal is converted back to DD
domain. Np (2nν + 1)(mτ + 1) symbols corresponding to the Np transmitted pilots are
extracted from the received frame as per (4.19) to obtain the vector b′ . The real and
imaginary parts of b′ are concatenated before being provided as input to IPNet. For
training the IPNet, the ground truth data is obtained by generating a (2nν + 1)(mτ + 1)
length true channel estimate vector, g. This vector is constructed such that the entries
are channel estimates only where there are valid paths and zeros elsewhere. During
training, the weights of IPNet are updated such that the L1 loss between the output of
IPNet, ĝ, and g is minimized. The L1 loss function is

$$\mathcal{L}(\mathbf{g}, \hat{\mathbf{g}}) = \frac{1}{N} \sum_{i=1}^{N} \left| \mathbf{g}^{(i)} - \hat{\mathbf{g}}^{(i)} \right|, \tag{4.22}$$

Parameter Value

Epochs 20000
Optimizer Adam
Learning rate 0.001, divide by 2 every 4000 epochs
Batch size 1000
Mini-batch size 64
Refresh training data Every epoch

Table 4.4: Hyper-parameters used for training the IPNet.

where N is the number of samples in the training set. The other hyper-parameters used
while training the IPNet are presented in Table 4.4. Note that this training needs to
be carried out offline, only once. Subsequently, the network weights are frozen. New
channel estimates are obtained from pilots in each OTFS frame using the same trained
network. Further, as will be shown in Section 4.3.2, the trained network is able to work
well for various Np values, owing to the construction of training data.

Estimation of delay and Doppler indices

Once the IPNet is trained, the weights are frozen. During the inference (testing) phase,
channel estimates, ĝ, in the DD domain are obtained after picking the valid paths (using
(4.20)). To obtain the estimates of α and β, denoted by α̂ and β̂, respectively, the
following steps are followed. The channel estimate vector ĝ ∈ C(2nν +1)(mτ +1)×1 is reshaped
into a matrix Ĥ ∈ C(2nν +1)×(mτ +1) . Index sets, I and J are defined to store the row
and column indices of the non-zero elements in Ĥ, respectively. That is, I = {i :
Ĥ[i, j] ̸= 0, i = 0, · · · , 2nν , j = 0, · · · , mτ } and J = {j : Ĥ[i, j] ̸= 0, i = 0, · · · , 2nν , j =
0, · · · , mτ }. Then, for the pth path index, the estimate of delay and Doppler indices are
obtained as

α̂p = Jp , (4.23)

β̂p = Ip − nν , (4.24)

where Jp and Ip denote the pth entry in J and I, respectively. We have carried out the
simulations using PyTorch machine learning library [2],[96] on RTX 3090 GPU platform.
Remark on complexity: For the IPNet, the number of parameters to be learnt for P layers can be computed as NP = 4h²(2P − 1) + 4h·id + 8P h + h × 2(2nν + 1)(mτ + 1) + 2(2nν + 1)(mτ + 1), where h is the hidden size (see Table 4.3), id = 2 is the input dimension, and h × 2(2nν + 1)(mτ + 1) + 2(2nν + 1)(mτ + 1) is the number of parameters in the FCNN layer. For P = 3, nν = 1, mτ = 2, and h = 50, NP = 52518. Note that
these parameters need to be learnt only once, offline. During the inference (testing)
stage, only 1018 floating point operations (FLOPs) are required to compute the channel
estimate. In contrast, the approach in [68] does not involve an offline training phase.
Further, the number of FLOPs required is 5(2nν + 1)(mτ + 1).

4.3.2 Results and discussions

In this subsection, we present the mean square error (MSE) and bit error rate (BER) per-
formance of the proposed IPNet for DD channel estimation in OTFS. A carrier frequency
of fc = 4 GHz and a subcarrier spacing of ∆f = 15 kHz are considered. We consider the
Vehicular A (VehA) channel model [90],[97] with L = 6 paths and a maximum speed of
220 km/h. This speed at 4 GHz carrier frequency corresponds to a maximum Doppler
shift, νmax , of 815 Hz. Each path has a Doppler shift generated using Jakes model
νi = νmax cos θi , where θi is assumed to be uniformly distributed between [−π, π]. We
fix the number of Doppler bins (N ) and delay bins (M ) to be 12 and 64, respectively. A
BPSK symbol +1 is used as the pilot symbol and data symbols are chosen from 4-QAM
alphabet. To train the network, the batch size (c) is chosen to be 1100 of which 1000
OTFS frames are used for training and 100 frames are used for validating the training.
To evaluate the accuracy of the channel estimates provided by the IPNet, we evaluate
the normalized mean square error (NMSE) for the DD domain channel matrix as de-
scribed in Section 4.2.3. For evaluating the BER performance, the message passing (MP)

Figure 4.12: NMSE performance of the proposed IPNet as a function of pilot SNR for
different number of pilots.

detector in [93] is used. Note that with the proposed approach, since the valid paths are
chosen based on (4.20), which in turn depends on the energy of channel estimates, the
matrix Ĝ may have more non-zero entries than the actual channel matrix G. Therefore,
at low pilot SNRs (around 0 dB), where the noise energy is dominant, the NMSE can
take values that are greater than 1. In all the simulations presented below, the pilot
energy is kept same for all the competing schemes.

NMSE performance of IPNet

Figure 4.12 shows the NMSE performance of the proposed IPNet as a function of pilot
SNR for Np = 8 and 12. The pilot power is equally distributed among the Np pilots. The
NMSE performance of the embedded pilot scheme in [68] is also presented for comparison.
The NMSE performance of the IPNet without data cancellation (DC) for both the Np
values is observed to be close to that of the scheme in [68] with the performance being
slightly inferior at SNRs above 35 dB. However, with 1 iteration of DC (see (4.21)),
the NMSE performance improves beyond the scheme in [68] in the mid and high SNR

Figure 4.13: BER of the proposed IPNet and the scheme in [68] as a function of number
of pilots, Np .

regime. Note that, for the considered parameters, nν is 1 and mτ is 2. The scheme in
[68], therefore, requires (4nν + 1)(2mτ + 1) = 25 symbols for interference free estimation
of channel coefficients, while a better NMSE performance is achieved from the proposed
scheme using fewer DD bins (12 and 8 bins for Np =12 and 8, respectively).

BER as a function of number of pilots

Figure 4.13 shows the BER performance of the proposed IPNet as a function of number
of pilots, Np , for a fixed pilot SNR of 30 dB and a fixed data SNR of 16 dB. The
performance of the scheme in [68], with 1 pilot and 24 guard symbols, is also presented
for comparison. With the increase in the number of pilots, the BER performance is
observed to improve for the proposed IPNet. This is because with increase in Np , the
sequence length s at the input of the IPNet also increases (see Table 4.3). This allows
the IPNet to provide estimates with better accuracy as more information is available at
the input. However, with further increase in the number of pilots (Np > 16), the BER
performance is observed to increase slightly, owing to decrease in the energy per pilot

Figure 4.14: BER performance comparison between the proposed IPNet with 12 pilots
and the scheme in [68] for 40 dB pilot SNR.

symbol. This demonstrates that the proposed IPNet is able to work with different pilot
densities and is able to perform better than the scheme in [68] when DC is employed.

BER vs SNR at different pilot SNRs

Here we present the BER performance of the proposed IPNet with 12 pilots for pilot
SNRs of 40 dB, 30 dB, and 20 dB. In all the figures, the performance of the OTFS system
with perfect CSI is presented for comparison. In addition, the BER performance of the
embedded pilot scheme in [68] is also presented.

1. Pilot SNR = 40 dB: The BER performance of the proposed scheme with a pilot
SNR of 40 dB is presented in Fig. 4.14. It is seen that with no DC, the performance
is close to that of the perfect CSI case till about 12 dB, after which the performance
floors. This flooring is alleviated when DC is employed. With 1 iteration of DC, the
performance improves closer to the perfect CSI performance, while with 2 iterations
of DC, the performance matches that of the scheme in [68]. The performance is
also very close to that of perfect CSI. For example, for a BER of 10⁻³, the gap in

Figure 4.15: BER performance comparison between the proposed IPNet with 12 pilots
and the scheme in [68] for 30 dB pilot SNR.

data SNR is observed to be less than a dB. For 16-QAM as well, the performance
with 2 iterations of DC is close to that with perfect CSI.

2. Pilot SNR = 30 dB: Figure 4.15 shows the BER performance when the pilot SNR
is 30 dB. The BER performance of the proposed IPNet with and without DC is
observed to be better than that of the scheme in [68]. With DC, the performance
improves with the improvement being larger when two iterations are used. With
two iterations of DC and the proposed approach, there is significant gain observed
over the scheme in [68]. For example, for a BER of 5 × 10⁻³, an SNR advantage of
about 5 dB is observed.

3. Pilot SNR = 20 dB: The BER performance for a pilot SNR of 20 dB is presented
in Fig. 4.16. It is observed that the performances of the proposed IPNet and the
scheme in [68] floor. In the low and mid SNR regime, the BER performance of
the proposed IPNet is observed to be better, after which the performance of IPNet

Figure 4.16: BER performance comparison between the proposed IPNet with 12 pilots
and the scheme in [68] for 20 dB pilot SNR.

without DC is observed to floor. With 1 and 2 iterations of DC, the proposed


IPNet is observed to outperform the scheme in [68].

From the results presented above, it is seen that the proposed IPNet with interleaved
pilots can achieve similar or better bit error performance compared to the embedded
pilot scheme in [68], while being more spectrally efficient.

4.4 Fractional DD channel estimation in OTFS with superimposed pilots
In this section, we first derive the discrete system model for OTFS with rectangular pulse,
followed by the description of the proposed sparse superimposed pilot (SSP) scheme [70].
Finally, we present the numerical results and performance evaluation.

4.4.1 System model

The OTFS system model with fractional DD and rectangular transmit and receive pulses is derived as follows. Information symbols, A^DD[n, m]s, each drawn from a modulation alphabet A, are placed in an M × N DD grid given by {(l/(M∆f), k/(NT)), l = 0, · · · , M − 1, k = 0, · · · , N − 1}, where M is the number of delay bins, N is the number of Doppler bins, ∆f
is the subcarrier spacing, and T = 1/∆f . Bin sizes in the delay and Doppler domains are
given by 1/M ∆f and 1/N T , respectively. The ADD [n, m]s are converted to TF domain
symbols ATF [k, l]s using the ISFFT operation, as

$$A^{\text{TF}}[k, l] = \frac{1}{\sqrt{MN}} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} A^{\text{DD}}[n, m]\, e^{j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)}, \tag{4.25}$$

for l = 0, · · · , N − 1 and k = 0, · · · , M − 1. The TF frame has duration N T and


bandwidth M ∆f , where T and ∆f are the sampling intervals along time and frequency,
respectively, satisfying T ∆f = 1. Equation (4.25) can be written in matrix form as
A^TF = F_M A^DD F_N^H ∈ C^{M×N}, where F_M[m, n] = (1/√M) exp(−j2πmn/M), F_N[m, n] = (1/√N) exp(−j2πmn/N), and (·)^H represents the Hermitian operation. The TF domain
samples, ATF [k, l]s, are pulse shaped using transmit pulse ptx (t) to generate a time
domain signal a(t). a(t) sampled at rate fs = M ∆f can be expressed in matrix form as
A^t = P_tx F_M^H A^TF = P_tx A^DD F_N^H, where A^t contains the MN samples of a(t). The sampling

interval is set to Ts = 1/M ∆f = T /M as per symbol spaced sampling [68], which results
in M length samples of the transmit and receive pulses. Ptx ∈ CM ×M is a diagonal matrix,
whose diagonal entries are obtained by uniformly sampling the transmit pulse ptx (t) at
time instants mT /M, m = 0, 1, · · · , M − 1. We consider the transmit pulse ptx (t) and
the receive pulse at the receiver prx (t) to be a rectangular pulse (i.e., Ptx = Prx = IM ,
where IM is M × M identity matrix).
Using the relation vec(XYZ) = (ZT ⊗ X)vec(Y), where ⊗ is the Kronecker product,
the time domain vector at = vec(At ) can be written as

$$\mathbf{a}^t = \text{vec}(\mathbf{A}^t) = \text{vec}(\mathbf{P}_{\text{tx}}\, \mathbf{A}^{\text{DD}}\, \mathbf{F}_N^H) = (\mathbf{F}_N^H \otimes \mathbf{P}_{\text{tx}})\, \mathbf{a}^{\text{DD}}, \tag{4.26}$$

where aDD = vec(ADD ) and the operation vec(Z) vectorizes matrix Z. Let g(τ, ν) denote
the complex baseband channel response in the DD domain. Then,

$$g(\tau, \nu) = \sum_{i=0}^{L-1} g_i\, \delta(\tau - \tau_i)\, \delta(\nu - \nu_i), \tag{4.27}$$

where L is the number of channel paths in the DD domain, δ is the Dirac delta
function, and gi , τi , and νi denote the complex channel gain, delay, and Doppler, respec-
tively, corresponding to the ith path. For fractional delays and Dopplers, τi = (αi + ai)/(M∆f) and νi = (βi + bi)/(NT), where αi = [τi M∆f]⊙, βi = [νi NT]⊙, and [·]⊙ denotes the nearest integer rounding operator, with −1/2 < ai, bi < 1/2. At the OTFS receiver, the time domain signal,
b(t), is given by
$$b(t) = \int_{\nu} \int_{\tau} g(\tau, \nu)\, a(t - \tau)\, e^{j2\pi\nu(t - \tau)}\, d\tau\, d\nu + w(t), \tag{4.28}$$

where w(t) represents the additive noise. A forward cyclic shift matrix, defined as

$$\boldsymbol{\Pi} = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix} \in \mathbb{R}^{MN \times MN}, \tag{4.29}$$

and $\boldsymbol{\Delta}_i = \mathrm{diag}\Big\{ e^{-j2\pi\frac{(\alpha_i + a_i)(\beta_i + b_i)}{MN}},\ e^{j2\pi\frac{(1 - (\alpha_i + a_i))(\beta_i + b_i)}{MN}},\ \cdots,\ e^{j2\pi\frac{(MN - 1 - (\alpha_i + a_i))(\beta_i + b_i)}{MN}} \Big\}$ model the delays and Dopplers, respectively, so that the
channel matrix G ∈ CM N ×M N can be obtained as

$$\mathbf{G} = \sum_{i=0}^{L-1} g_i\, \boldsymbol{\Delta}_i\, \boldsymbol{\Pi}^{\lceil \alpha_i + a_i \rceil}. \tag{4.30}$$

The derivation of the above equation is presented in Appendix A. The discrete


baseband vector bt ∈ CM N ×1 of b(t) can be represented as bt = Gat + w. The TF
matrix BTF ∈ CM ×N is derived from bt using Wigner transform, i.e., BTF = FM Prx Bt ,

Figure 4.17: Pilot and data symbols placement in the proposed SSP scheme and the FSP
scheme in [71].

where Bt = vec−1 (bt ) ∈ CM ×N , and Prx = IM for the considered rectangular receive
pulse prx (t). The DD signal matrix, BDD , is obtained from BTF as

$$\mathbf{B}^{\text{DD}} = \mathbf{F}_M^H\, \mathbf{B}^{\text{TF}}\, \mathbf{F}_N = \mathbf{P}_{\text{rx}}\, \mathbf{B}^t\, \mathbf{F}_N. \tag{4.31}$$

This can be vectorized to obtain

bDD = (FN ⊗ Prx )bt = (FN ⊗ Prx )(Gat + w). (4.32)

Substituting (4.26) in (4.32), we get

$$\mathbf{b}^{\text{DD}} = (\mathbf{F}_N \otimes \mathbf{P}_{\text{rx}})\, \mathbf{G}\, (\mathbf{F}_N^H \otimes \mathbf{P}_{\text{tx}})\, \mathbf{a}^{\text{DD}} + \mathbf{w}' = \mathbf{G}_{\text{eff}}\, \mathbf{a}^{\text{DD}} + \mathbf{w}', \tag{4.33}$$

where w′ = (F_N ⊗ P_rx)w and G_eff = (F_N ⊗ P_rx) G (F_N^H ⊗ P_tx) ∈ C^{MN×MN} is the effective channel matrix.
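A direct NumPy sketch of the construction in (4.29), (4.30), and (4.33) is given below, assuming rectangular pulses (Ptx = Prx = I_M) and a unitary N-point DFT matrix; it builds the per-path matrices explicitly and is meant for illustration rather than efficiency.

```python
import numpy as np

def effective_channel(gains, alphas, a_frac, betas, b_frac, M, N):
    """Build G per (4.29)-(4.30) and Geff = (F_N kron I_M) G (F_N^H kron I_M)
    per (4.33), for rectangular transmit/receive pulses."""
    MN = M * N
    Pi = np.roll(np.eye(MN), 1, axis=0)          # forward cyclic shift (4.29)
    q = np.arange(MN)
    G = np.zeros((MN, MN), dtype=complex)
    for g, al, a, be, b in zip(gains, alphas, a_frac, betas, b_frac):
        # Doppler phase matrix Delta_i: diag entries exp(j2pi(q-(al+a))(be+b)/MN)
        delta = np.diag(np.exp(2j * np.pi * (q - (al + a)) * (be + b) / MN))
        G += g * delta @ np.linalg.matrix_power(Pi, int(np.ceil(al + a)))
    F_N = np.fft.fft(np.eye(N)) / np.sqrt(N)     # unitary N-point DFT (assumed)
    A = np.kron(F_N, np.eye(M))                  # F_N kron I_M
    return A @ G @ A.conj().T                    # Geff of (4.33)
```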

Figure 4.18: Proposed RNN based channel estimation scheme.

4.4.2 Proposed sparse superimposed pilot scheme

The receiver needs knowledge of the channel for data detection. Pilot symbols are sent in OTFS frames for the purpose of estimating the channel at the receiver. Pilot and data symbols can be placed in a frame in different ways. There is rate loss in the exclusive pilot scheme (where a frame consists of only a pilot symbol and no data symbols [67]) and in the embedded pilot scheme (where a frame consists of a pilot symbol surrounded by guard bins, with the remaining bins occupied by data symbols [68]). Superimposed pilot schemes, where all bins are occupied by data symbols and pilot symbols are superimposed on data symbols, offer full rate frames. We consider two superimposed pilot schemes that achieve full rate. The first scheme is the full superimposed pilot (FSP) scheme proposed in [71], where all bins carry both pilot as well as data symbols, as shown in Fig. 4.17(a). The second scheme is the one we propose in this section, which we call the sparse superimposed pilot (SSP) scheme. In the proposed SSP scheme, all bins carry data symbols and pilot symbols are superimposed on only a few bins, as shown in Fig. 4.17(b). We sparsely place the pilots in a lattice-type arrangement where pilot symbols are spaced S_τ bins apart on the delay axis and S_ν bins apart on the Doppler axis. The advantages of doing this are that 1) by careful choice of S_τ and S_ν, inter-symbol leakage/interference among the pilot symbols can be alleviated, and 2) it allows for higher energy per pilot symbol, which helps to improve channel estimation accuracy, as will be seen in Section 4.4.4.
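For concreteness, a minimal NumPy sketch of such a lattice placement is given below. The frame dimensions and the energy split mirror the simulation setup of Section 4.4.4, while the function name and the exact lattice offsets are our own illustrative choices.

```python
import numpy as np

def make_ssp_frame(M=64, N=12, S_tau=5, S_nu=5, sigma_d2=0.84):
    """Illustrative SSP frame: BPSK data on all MN bins, pilots
    superimposed on a lattice spaced S_tau (delay) x S_nu (Doppler)."""
    sigma_p2 = 1.0 - sigma_d2                     # pilot energy budget per bin
    data = np.sqrt(sigma_d2) * np.random.choice([-1.0, 1.0], size=(M, N))
    pilot_mask = np.zeros((M, N), dtype=bool)
    pilot_mask[::S_tau, ::S_nu] = True            # lattice-type pilot placement
    Np = int(pilot_mask.sum())                    # number of superimposed pilots
    frame = data.astype(complex)
    # total pilot energy M*N*sigma_p2 is shared by the Np pilots, so each
    # pilot carries energy M*N*sigma_p2/Np (cf. the discussion of Fig. 4.19)
    frame[pilot_mask] += np.sqrt(M * N * sigma_p2 / Np)
    return frame, pilot_mask
```

Choosing S_τ and S_ν larger than the channel's delay and Doppler spreads keeps the pilot spread regions disjoint at the receiver.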

4.4.3 SSPNet - proposed DD channel estimator

In this subsection, we present the proposed SSPNet, an RNN based DD channel esti-
mator network for the proposed SSP frame, its architecture, and training methodology
The motivation behind using an RNN for channel estimation is that the symbols received corresponding to each transmitted pilot symbol can be viewed as a time sequence, and RNNs are typically chosen for learning dependencies in time sequences. The chan-
nel matrix in (4.33) has a specific structure and a simpler DNN could potentially be
devised to exploit this structure. However, since the pilots are superimposed on data
symbols, there is significant interference from pilot to data symbols and data to pilot
symbols which might restrict this possibility. Exploring this can be a part of future work.
Given a received SSP frame, the task is to obtain estimates of the channel parameters
(gi , τi , νi ), i = 0, · · · , L − 1. Figure 4.18 shows the architecture of the proposed SSPNet
for channel estimation. The vector of received symbols in the DD domain, bDD (see
(4.33)), is used to generate the input vector to the SSPNet, b′DD , as outlined below. Let
npi and mpi denote the Doppler and delay indices for the ith pilot symbol, respectively
(see Fig. 4.17b), where i = 1, · · · , Np , and Np is the number of pilot symbols superim-
posed in the SSP frame. The channel spreads the pilot symbols into their nearby DD
bins. For the ith pilot, the spread is contained within the indices npi − nν to npi + nν
on the Doppler axis, and mpi to mpi + mτ on the delay axis. Here, mτ = ⌈τmax M ∆f ⌉
and nν = ⌈νmax N T ⌉ are integers corresponding to maximum delay and Doppler spread,
respectively. The received symbols in the bins of the ith pilot’s spread area in the frame
are extracted and vectorized to obtain the vector b′^{DD}_i ∈ C^{(2n_ν+1)(m_τ+1)×1}. This is done for each pilot. The concatenated vector b′^{DD} ∈ C^{N_p(2n_ν+1)(m_τ+1)×1}, given by

b′^{DD} = [b′^{DD}_1 b′^{DD}_2 ··· b′^{DD}_{N_p}]^T,   (4.34)

is fed as the input to the SSPNet. The architecture of the SSPNet is designed and trained
such that the same network, once trained, works for different Np values, SNRs, and DD
profiles. The SSPNet obtains an estimate of the channel gain vector ĝ ∈ C(2nν +1)(mτ +1)×1 .
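Constructing b′^{DD} in (4.34) amounts to cutting out each pilot's spread region from the received DD grid and stacking the pieces, as in the following NumPy sketch. The function name and the cyclic wrap-around at the grid edges are our assumptions.

```python
import numpy as np

def build_sspnet_input(B_dd, pilot_locs, m_tau, n_nu):
    """Per (4.34): for the i-th pilot at (delay m_p, Doppler n_p), extract the
    bins of its spread -- delays m_p..m_p+m_tau, Dopplers n_p-n_nu..n_p+n_nu --
    vectorize each region, and concatenate over all pilots."""
    M, N = B_dd.shape
    chunks = []
    for (m_p, n_p) in pilot_locs:
        rows = np.arange(m_p, m_p + m_tau + 1) % M         # delay spread window
        cols = np.arange(n_p - n_nu, n_p + n_nu + 1) % N   # Doppler spread window
        chunks.append(B_dd[np.ix_(rows, cols)].reshape(-1))
    return np.concatenate(chunks)   # length Np*(2*n_nu+1)*(m_tau+1)
```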

Among the (2nν + 1)(mτ + 1) entries in this vector, only those channel gain estimates
whose absolute values are greater than a small threshold ϵ are picked as valid paths, i.e.,

ĝ_i = { 0, if |ĝ_i| ≤ ϵ;  ĝ_i, otherwise }.   (4.35)

This is required as the output of SSPNet for an invalid path is close to but not equal
to zero. The locations of the valid paths are then used to obtain the estimates for delay
(τ̂i s) and Doppler spreads (ν̂i s) in the DD grid. Using the estimated vectors ĝ, τ̂ , and
ν̂, the estimated DD domain channel matrix Ĝ is obtained, which is used for detection
of data symbols.
Iterative scheme (SSP-I): To further improve the accuracy of the channel estimates,
the output of the detector, a′DD , is fed back to the SSPNet for iteratively cancelling the
effect of data symbols in channel estimation as follows. A new DD frame is constructed
as
b′′DD = bDD − Ĝa′DD . (4.36)

The vector b′DD is computed again using (4.34) with b′′DD in (4.36) as the received
frame. This updated b′DD vector is given as input to the SSPNet and another set of
refined channel estimates is obtained. This iterative procedure is repeated N_iter times or until a convergence criterion is met. The procedure is stopped at the ith iteration if ∥ĝ^{(i)} − ĝ^{(i−1)}∥² < ζ, i.e., if the squared norm of the difference between the channel estimate vectors at the ith and (i − 1)th iterations is less than ζ. The output of
the detector at the end of the iterations, denoted by âDD , is taken as the final detected
output.
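Putting the pieces together, the SSP-I loop alternates channel estimation and data detection. The NumPy sketch below reuses build_sspnet_input from the earlier sketch; `sspnet` and `detector` are placeholders for the trained network and the MP detector of [93], and the column-major un-vectorization follows the convention b^{DD}[kM + l] = B^{DD}[l, k].

```python
import numpy as np

def ssp_iterative(b_dd, M, N, sspnet, detector, pilot_locs, m_tau, n_nu,
                  N_iter=10, zeta=1e-6):
    """Sketch of the SSP-I loop: estimate the channel with SSPNet, detect the
    data, cancel the data contribution as in (4.36), and repeat until the
    change in the gain estimate falls below zeta."""
    residual, g_prev = b_dd, None
    for _ in range(N_iter):
        B = residual.reshape((M, N), order="F")   # un-vectorize (column-major)
        x = build_sspnet_input(B, pilot_locs, m_tau, n_nu)   # per (4.34)
        g_hat, G_hat = sspnet(x)           # gain estimates and channel matrix
        a_hat = detector(b_dd, G_hat)      # detect data with current estimate
        residual = b_dd - G_hat @ a_hat    # (4.36): cancel the data symbols
        if g_prev is not None and np.sum(np.abs(g_hat - g_prev) ** 2) < zeta:
            break                          # ||g^(i) - g^(i-1)||^2 < zeta
        g_prev = g_hat
    return g_hat, a_hat
```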

Architecture

The SSPNet architecture (see Fig. 4.18) consists of P layers of long short-term memory
(LSTM) [57], a variant of RNN. The output of the LSTM layers is passed through a
rectified linear unit (ReLU) activation function, given by ReLU(x) = max(0, x), ∀x ∈ (−∞, ∞), followed by a fully connected neural network (FCNN) layer.

Parameter                       Value
Number of LSTM layers (P)       3
LSTM hidden size (h)            50
LSTM input dimensions           (c, s, 2)
LSTM output dimensions          (c, s, 50)
FCNN input neurons              50
FCNN output neurons             2(2n_ν + 1)(m_τ + 1)

Table 4.5: Parameters of the SSPNet architecture.

The FCNN is
employed to reduce the dimension of the output of the LSTM network to the required
dimension. A linear activation function with range (−∞, ∞) is used at the output of the
FCNN. Using (4.35), valid paths are picked at the output of FCNN and ĝ thus obtained
is the estimated channel gain vector. The other parameters of the SSPNet architecture
are presented in Table 4.5. The variable c is the batch size and s = Np (2nν +1)(mτ +1) is
the sequence length. The output of the FCNN is a vector of dimension 2(2n_ν+1)(m_τ+1), where the first (2n_ν+1)(m_τ+1) dimensions are treated as the real parts and the remaining as the imaginary parts of the channel gain estimates.
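A minimal PyTorch sketch of this architecture, following Table 4.5, is given below. Table 4.5 does not state how the per-step LSTM outputs are collapsed to the 50 FCNN inputs; taking the last time step, as done here, is our assumption.

```python
import torch
import torch.nn as nn

class SSPNet(nn.Module):
    """Sketch of SSPNet: 3 LSTM layers (hidden size 50), ReLU, then an
    FCNN with a linear output giving real and imaginary channel gains."""
    def __init__(self, m_tau, n_nu, hidden=50, num_layers=3):
        super().__init__()
        out_dim = 2 * (2 * n_nu + 1) * (m_tau + 1)   # real + imaginary parts
        # each sequence step carries the (real, imag) pair of one DD sample
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden, out_dim)         # linear output activation

    def forward(self, x):              # x: (batch c, sequence s, 2)
        y, _ = self.lstm(x)            # y: (c, s, 50)
        y = torch.relu(y[:, -1, :])    # ReLU on the last-step LSTM output
        return self.fc(y)              # (c, 2*(2*n_nu+1)*(m_tau+1))
```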

Training methodology

Data for training the SSPNet is obtained by synthetically generating multiple SSP OTFS
frames with varying Np . These frames are converted to time domain and sent through
a time-varying fading channel. The received signal is converted back to DD domain.
N_p(2n_ν+1)(m_τ+1) symbols corresponding to the N_p transmitted pilots are extracted from the received frame as per (4.34) to obtain the vector b′^{DD}, whose real and imaginary parts are concatenated. The ground truth data for training the SSPNet is obtained
by generating a (2nν + 1)(mτ + 1) length true channel gain vector, g. This vector is
constructed such that the entries are channel gains only where there are valid paths and

Parameter                Value
Epochs                   20000
Optimizer                Adam
Learning rate            0.001, divided by 2 every 4000 epochs
Batch size               1000
Mini-batch size          64
Refresh training data    Every epoch

Table 4.6: Hyper-parameters used for training the SSPNet.

zeros elsewhere. During training, the weights of SSPNet are updated such that the L1 loss between the output of SSPNet, ĝ, and the ground truth g, given by L(g_i, ĝ_i) = |g_i − ĝ_i|, is minimized. The other hyper-parameters used in the training of the SSPNet are listed
in Table 4.6. Once the training is completed offline, the network weights are frozen. The
same trained network can provide channel estimates for different SNRs, Np values, and
DD profiles in the testing phase.
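The recipe of Table 4.6 can be sketched in PyTorch as follows; `gen_minibatch` is a placeholder for the synthetic frame pipeline described above, and the loop is simplified to one mini-batch per epoch.

```python
import torch

def train_sspnet(model, gen_minibatch, epochs=20000, lr=1e-3):
    """Sketch of SSPNet training: Adam, L1 loss, learning rate halved
    every 4000 epochs, fresh synthetic training data every epoch."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=4000, gamma=0.5)
    loss_fn = torch.nn.L1Loss()
    for _ in range(epochs):
        x, g_true = gen_minibatch()        # regenerated every epoch
        opt.zero_grad()
        loss = loss_fn(model(x), g_true)   # L1 loss |g - g_hat|
        loss.backward()
        opt.step()
        sched.step()
    return model
```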

4.4.4 Results and discussions

In this subsection, we present the mean square error (MSE) and bit error rate (BER)
performance of the proposed SSPNet for DD channel estimation. A carrier frequency
of fc = 4 GHz and a subcarrier spacing of ∆f = 15 kHz are considered. Vehicular A
(VehA) channel model [90] with L = 6 paths and a maximum speed of 220 km/h, i.e.,
a maximum Doppler νmax of 815 Hz, is considered. Each path has a Doppler generated
using ν_i = ν_max cos θ_i, where θ_i is uniformly distributed in [−π, π]. This implies
that each path can have random Dopplers between −νmax and νmax . We present results
for the cases of integer DDs as well as fractional DDs. In the integer DDs case, the delay
and Doppler taps are rounded off to the nearest integers. In the fractional DDs case,
the fractional DD values are retained as such without rounding. The number of Doppler
bins (N ) and delay bins (M ) are taken to be 12 and 64, respectively. Data symbols

Figure 4.19: Energy distribution in FSP and SSP frames at the transmitter and receiver for integer DD: (a) FSP Tx frame, N_p = MN = 768; (b) SSP Tx frame, N_p = 12; (c) FSP Rx frame, N_p = MN = 768; (d) SSP Rx frame, N_p = 12.

are chosen from the BPSK alphabet and +1 is chosen as the pilot symbol. To train the network, the batch size (c) is chosen to be 1100, of which 1000 OTFS frames are used for training and 100 frames are used for validating the training. Each data symbol and pilot symbol has energy σ_d² and σ_p², respectively, with σ_d² + σ_p² = 1. This ensures that the average energy per frame is set to 1 for fair comparison with the FSP scheme in [71]. Data symbol detection is carried out using the message passing (MP) detector in [93]. The maximum number of iterations (N_iter) in the SSP-I scheme is 10, S_τ = 2m_τ + 1 = 5, S_ν = 4n_ν + 1 = 5, ϵ = 10^{−10}, and ζ = 10^{−6}.

Figure 4.20: BER performance of the proposed SSPNet as a function of Np for integer
DD.

Pilot energy spread in FSP and SSP frames

Figure 4.19 shows an example distribution of energy in various bins in FSP and SSP
frames at the OTFS transmitter and receiver for integer DD. At the transmitter (Figs.
4.19a and 4.19b), there is no discernible difference between the pilot and data symbols
in the FSP scheme, whereas, in the proposed SSP scheme, the pilot symbols have significantly higher energy (MNσ_p²/N_p) than the data symbols. This is because of fewer pilots in
the SSP scheme for the same total pilot energy per frame. A similar trend is observed
at the receiver as well (Figs. 4.19c and 4.19d), where the pilots and data symbols in
the FSP scheme have leaked into one another, while the interference among the pilots is
alleviated in the SSP scheme. Due to higher energy per pilot symbol in the SSP scheme,
the corresponding received frame also contains high energy DD bins that help to achieve
improved channel estimation accuracy.

Figure 4.21: BER performance of the proposed SSPNet as a function of Np and σd2 for
integer DD.

Choosing optimal Np and σd2

Figure 4.20 shows the BER performance of using the proposed SSPNet for channel
estimation as a function of number of pilots, Np , in a frame (see Fig. 4.17b). When
Np is small, the SSPNet has very few observations (s) to work with and the MSE and
consequently the BER performance is poor. The BER performance improves as Np
is increased. For Np > 20, the energy per pilot symbol gets reduced and becomes
comparable to the data symbol energy. This degrades the MSE of the channel estimates
and therefore the BER increases. The BER attains its minimum value when Np = 12.
We choose this value of N_p = 12 in all the simulations that follow. Figure 4.21 shows the BER performance as a function of the data energy, σ_d². It is seen that the BER performance improves as σ_d² increases. The minimum BER is attained at σ_d² = 0.84, beyond which the BER increases. This is because, as σ_p² decreases below 0.16 (σ_d² > 0.84), the energy per pilot symbol reduces, resulting in poor accuracy of the channel estimates. We fix σ_d² = 0.84 for the rest of the simulations. This choice works well for both integer as well as fractional DDs. We note that the optimal energy allocation for

Figure 4.22: MSE vs SNR performance of the proposed SSPNet compared with that of
the FSP scheme in [71] for integer DD.

the FSP scheme in [71] is σ_p,opt² = 0.3322 and σ_d,opt² = 0.6678.

MSE and BER comparison between SSP and FSP schemes for integer DD

The MSE performance of the proposed SSPNet without iterative cancellation (SSP-NI)
and with iterative cancellation (SSP-I) with integer DD are presented in Fig. 4.22. The
performance of the FSP-NI and FSP-I scheme in [71] are also plotted for comparison. It
is seen that the MSE performance of the SSP-NI scheme is better than both the FSP-
NI and FSP-I schemes. This is because the proposed approach is able to cancel the
interference between pilot and data symbols better through learning. Also, the SSP-I
scheme achieves even better MSE performance. Figure 4.23 shows the BER performance
of the proposed SSPNet as a function of SNR for integer DD for BPSK and 16-QAM. The
performance of FSP-NI, FSP-I, and perfect CSI schemes are also plotted in an integer
DD channel for comparison. It can be observed that the performance of the SSP-NI
scheme is better than the FSP-NI scheme, with a performance gain of about 4 dB for
BPSK. For example, a BER of 4 × 10^{−2} is attained at 8 dB with SSP-NI, while it is

Figure 4.23: BER vs SNR performance of the proposed SSPNet compared with that of
the FSP scheme in [71] for integer DD.

attained at 12 dB with the FSP-NI scheme. It can also be observed that the SSP-I scheme
performs better than the FSP-I scheme for both the constellations.

MSE and BER comparison between SSP and FSP schemes for fractional DD

The MSE performance of the proposed SSPNet without iterative cancellation (SSP-NI)
and with iterative cancellation (SSP-I) with fractional DD are presented in Fig. 4.24.
The performance of FSP-NI and FSP-I schemes in [71] with fractional DD are also plotted
for comparison. It is seen that the MSE performance of the SSP-NI scheme is superior to that of the FSP-NI and FSP-I schemes in the low and mid SNR regimes,
while the SSP-I scheme performs even better. Figure 4.25 shows the BER performance of
the proposed SSPNet as a function of SNR with fractional DD. The performance of FSP-
NI, FSP-I, and perfect CSI schemes with fractional DD are also plotted for comparison.
The proposed SSP-NI scheme performs much better than the FSP-NI counterpart. For
example, a BER of 6 × 10^{−2} is attained at 8 dB with SSP-NI, while it is attained at 16 dB with the FSP-NI scheme. It can also be observed that the SSP-I scheme performs

Figure 4.24: MSE vs SNR performance of the proposed SSPNet compared with that of
the FSP scheme in [71] for fractional DD.

Figure 4.25: BER vs SNR performance of the proposed SSPNet compared with that of
the FSP scheme in [71] for fractional DD.

better than the FSP-I scheme. These performance improvements are significant given that the FSP schemes in [71] assume that the estimates of the τ_i s and ν_i s are perfectly known and only the g_i s are estimated, whereas the proposed SSPNet estimates all three parameters of the tuple (τ_i, ν_i, g_i). This shows that superimposing the pilots sparsely as in the proposed
SSP frame, and using a learning based SSPNet for channel estimation can achieve better
MSE and BER performance.

4.5 Summary
In this chapter, we considered the problem of channel estimation for OTFS systems.
First, we considered the embedded and impulse pilot frames. We proposed an RNN
based detector called DDNet which was able to work seamlessly for both the pilot frames
owing to the training methodology. Second, we proposed the interleaved pilot frame,
wherein the number of guard bins around the pilots were reduced to zero, to improve the
spectral efficiency. For this frame, we proposed IPNet for channel estimation, which was
trained to overcome the leakage from data symbols. Finally, we proposed the full rate
SSP frame having full spectral efficiency. To estimate the channel, we proposed SSPNet
which was trained to overcome interference from pilot symbols due to spread in the DD
domain. Simulation results demonstrated that the proposed DDNet, IPNet, and SSPNet
achieved better mean square error and bit error performance compared to other schemes
in the literature.
Chapter 5

Learning in TF domain for DD channel estimation in OTFS
5.1 Introduction
In this chapter, we propose a learning-based approach for estimation of fractional delay-
Doppler (DD) channel in orthogonal time frequency space (OTFS) systems [72], [73]. A
key novelty in the proposed approach is that learning is done in the time-frequency (TF)
domain for DD domain channel estimation. Learning in the TF domain is motivated by
the fact that the range of values in the TF channel matrix is favorable for training as
opposed to the large swing of values in the DD channel matrix which is not favourable
for training. A key beneficial outcome of the proposed approach is its low complexity
along with very good performance. Specifically, it drastically reduces the complexity of
the computation of a constituent DD parameter matrix (CDDPM) in a state-of-the-art
algorithm [74]. We develop this TF learning approach for two types of OTFS systems,
namely, 1) two-step OTFS, and 2) DZT-OTFS. Our results show that the proposed TF
learning-based approach achieves almost the same performance as that of the state-of-
the-art algorithm, while being drastically less complex making it practically appealing.
The rest of the chapter is organized as follows. The proposed TF learning approach
for conventional OTFS, the training details, and performance results are presented in


Section 5.2. Section 5.3 presents the proposed TF learning approach for DZT-OTFS.
Additionally, the training details and the performance evaluations are also presented in
this section. A summary of the chapter is presented in Section 5.4.

5.2 Learning in TF domain for fractional DD channel estimation in OTFS
In this section, we present the system model and the pilot frame structure, followed by a brief description of the state-of-the-art algorithm in [74]. We then present the proposed TF learning approach and move on to performance results.

5.2.1 System model

MN information symbols are multiplexed in the DD domain to generate the symbol matrix denoted by A^{DD} ∈ A^{M×N}, where A signifies the modulation alphabet from which these information symbols are drawn. These symbols are distributed along the delay and Doppler axes with spacings of T/M and ∆f/N, respectively, where ∆f = 1/T is the subcarrier spacing, B = M∆f is the bandwidth, and M and N are the number
of delay and Doppler bins, respectively. Through the use of inverse symplectic finite
Fourier transform (ISFFT), the symbols in the DD domain are transformed into the
time-frequency (TF) domain. This transformation can be represented by the equation

A^{TF} = F_M A^{DD} F_N^H,   (5.1)

where F_M is the unitary discrete Fourier transform (DFT) matrix of size M, and (·)^H signifies the Hermitian operation. The next step involves converting A^{TF} into a continuous
time domain signal, as defined by the Heisenberg transform, as

a(t) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} A^{TF}[m,n] p_tx(t − nT) e^{j2πm∆f(t−nT)},   (5.2)

where p_tx(t) is the transmit pulse. The transmit signal a(t) passes through a channel comprising P paths in the DD domain. Each path is associated with a delay τ_p (with
0 < τp < T ) and a Doppler shift νp , which are assumed to take fractional values. The
received signal b(t) is then given by

b(t) = Σ_{p=1}^{P} α_p a(t − τ_p) e^{j2πν_p(t−τ_p)} + w(t),   (5.3)

where w(t) denotes the additive white Gaussian noise. This received signal is converted
back to DD domain, first by transforming it to TF domain using Wigner transform to
obtain BTF ∈ CM ×N , and then using symplectic finite Fourier transform (SFFT) to
obtain BDD ∈ CM ×N in the DD domain. These transformations can be expressed as
B^{TF}[m′, n′] = ∫_t b(t) p_rx(t − n′T) e^{−j2πm′∆f(t−n′T)} dt,   (5.4)

where m′ = 0, 1, · · · , M − 1, n′ = 0, 1, · · · , N − 1, prx (t) is the receive pulse, and

B^{DD} = F_M^H B^{TF} F_N.   (5.5)

The transmit and receive pulses are assumed to be rectangular pulses with a duration of T and amplitude 1/√T. Using the equations above, the relationship between B^{DD} and A^{DD} can be formulated as follows [98]:
ADD can be formulated as follows [98]

b^{DD} = Σ_{p=1}^{P} α_p E_p(τ_p, ν_p) a^{DD} + w,   (5.6)

where w ∈ C^{MN×1} contains the additive noise samples distributed as i.i.d. CN(0, σ²), and G = Σ_{p=1}^{P} α_p E_p(τ_p, ν_p) is the channel matrix. The detailed derivation of the above input-output relation is presented in Appendix B. Additionally, b^{DD} ∈ C^{MN×1} and a^{DD} ∈ A^{MN×1} are the vectorized forms of B^{DD} and A^{DD}, respectively, i.e., b^{DD}[d′] = b^{DD}[k′M + l′] = B^{DD}[l′, k′], a^{DD}[d] = a^{DD}[kM + l] = A^{DD}[l, k], l′, l = 0, 1, ..., M−1, k′, k = 0, 1, ..., N−1,

and d′, d = 0, 1, ..., MN−1, and E_p is an MN × MN matrix whose entries are given by

E_p[d′, d] = e^{−j2πτ_pν_p} e e′,   (5.7)

where e = (1/N) Σ_{n=0}^{N−1} e^{−j2πn((k′−k)/N − ν_p/∆f)}, e′ = (1/M) Σ_{m=0}^{M−1} e^{j2π(m/M)(l′−l−Mτ_p/T)} f_{τ_p,ν_p,k,l′}(m), and f_{τ_p,ν_p,k,l′}(m) is evaluated using

f_{τ_p,ν_p,k,l′}(m) = Σ_{s=−m}^{M−1−m} e^{j2πsl′/M} [ (1 − τ_p/T) e^{jπ(1+τ_p/T)(ν_p/∆f−s)} sinc((1 − τ_p/T)(ν_p/∆f − s)) + e^{−j2πk/N} (τ_p/T) e^{jπ(τ_p/T)(ν_p/∆f−s)} sinc((τ_p/T)(ν_p/∆f − s)) ].   (5.8)

5.2.2 Proposed TF learning based DD channel estimation

Here, we present the pilot frame structure, briefly describe the channel estimation algo-
rithm in [74] for reference, and then present the proposed learning approach [72]. As in
[98], an impulse pilot frame, Ap ∈ RM ×N , given by

A_p[k, l] = { √(MN E_p), if k = k_p, l = l_p;  0, otherwise },   (5.9)

is transmitted for channel estimation, where k = kp , l = lp is the DD resource element


(DDRE) in which the pilot symbol is transmitted and Ep is the energy of the trans-
mitted time domain pilot symbol. The received OTFS frame corresponding to the pilot
frame is used to estimate the channel at the receiver. The estimation algorithm obtains
estimates of the three-tuple (τ̂_p, ν̂_p, α̂_p), p = 1, 2, ..., P′, where P′ is the number of estimated paths. These estimates are used to construct the estimated channel matrix, Ĝ = Σ_{p=1}^{P′} α̂_p E_p(τ̂_p, ν̂_p) ∈ C^{MN×MN} (refer (5.6)), which is then used for detection of data frames that follow the pilot frame.

Channel estimation algorithm: Equation (5.6) can be written in an alternate


form as [74]

b = Σ_{p=1}^{P} r(τ_p, ν_p)α_p + w = R(τ, ν)α + w,   (5.10)

where r(τ_p, ν_p) = E_p a^{DD} ∈ C^{MN×1}, R(τ, ν) = [r(τ_1, ν_1) ··· r(τ_P, ν_P)] ∈ C^{MN×P}, α = [α_1 ··· α_P]^T ∈ C^{P×1}, and w ∼ CN(0, σ²I) ∈ C^{MN×1}. We refer to the matrix R(τ, ν)
as the constituent DD parameter matrix (CDDPM) because it captures the effects of
delay and Doppler of each path in the channel on the transmitted OTFS frame. The
maximum-likelihood solution for the three-tuple estimation is given by

(α̂, τ̂, ν̂) = arg min_{α,τ,ν} ∥b − R(τ, ν)α∥²,   (5.11)

which is an estimation problem in three variables. To reduce the estimation complexity,


α can be solved given (τ, ν) as

α = (R^H(τ, ν)R(τ, ν))^{−1} R^H(τ, ν) b.   (5.12)

To estimate τ and ν, given α, (5.11) can be solved to obtain

 
[τ̂, ν̂] = arg max_{τ,ν} Θ(R),   (5.13)

where Θ(R) = b^H R(τ, ν)(R^H(τ, ν)R(τ, ν))^{−1} R^H(τ, ν) b. Substituting τ = τ̂ and ν = ν̂ in (5.12), we obtain the estimate of the channel coefficient α̂.
ν̂ in (5.12), we obtain the estimate of the channel coefficient α̂.
The algorithm proceeds in a path-wise fashion, i.e., the delay and Doppler values of the pth path (1 ≤ p ≤ P_max) are estimated before those of the (p + 1)th path. Since the number of paths is not assumed to be known, a maximum of P_max paths are estimated. The estimation of τ_p and ν_p for the pth path is carried out in three steps. First, a coarse estimation (integer estimation) is carried out to obtain τ_p′, ν_p′. This is followed by an iterative fine estimation step, where the fractional

estimation of the delay and Doppler is carried out to obtain τ̂_p, ν̂_p. Finally, a refinement step refines the estimates obtained till the pth path. In each of the steps, the cost function in (5.13) is maximized over different search areas as described below. The algorithm begins with initializing R(τ, ν) = [r(τ_1, ν_1) r(τ_2, ν_2) ··· r(τ_{P_max}, ν_{P_max})] = 0_{MN×P_max}.
Coarse estimation: The search area in this step is defined as G = {0/(M∆f), 1/(M∆f), ..., L/(M∆f)} × {−K/(NT), ..., 0/(NT), ..., K/(NT)}, where L = ⌈τ_max M∆f⌉, K = ⌈ν_max NT⌉, τ_max and ν_max denote the maximum delay and Doppler, respectively, and × denotes the Cartesian product of two sets. For estimating the parameters of the pth path, r(τ_p, ν_p) is computed for all (τ_p, ν_p) in G and the coarse estimates are obtained from (5.13) by maximizing the cost function over the search area.
Iterative fine estimation: Following the coarse estimation step, the search area is now defined around the optimal coarse value (for s = 1) or the fine estimate obtained in the previous iteration of the fine estimation step (for s > 1), given by F^(s) = (w_τ^(s) Γ + τ̂^(s)) ⊗ (w_ν^(s) Λ + ν̂^(s)), where s is the iteration number in the fine estimation step, w_τ^(s) = 1/(M∆f m_τ^{s−1}) and w_ν^(s) = 1/(NT n_ν^{s−1}) are the resolutions along the delay and Doppler, respectively, Γ = {0, ..., ⌊m_τ/2⌋} for τ̂^(s) = 0 and Γ = {−⌊m_τ/2⌋, ..., 0, ..., ⌊m_τ/2⌋} for τ̂^(s) > 0, and Λ = {−⌊n_ν/2⌋, ..., 0, ..., ⌊n_ν/2⌋}. For s = 1, τ̂^(1) and ν̂^(1) are initialized to the coarse estimates obtained earlier. A similar procedure as in the coarse estimation step is followed using F^(s) as the search area for obtaining the first fine estimate (τ̂^(2), ν̂^(2)), following which s is incremented by 1. For s > 1, the search area is centered over the newly obtained fine estimate with finer resolution. This is carried out until a maximum number of iterations (s_max) is reached or the stopping criterion given by |τ̂^(s+1) − τ̂^(s)| < ϵ_τ and |ν̂^(s+1) − ν̂^(s)| < ϵ_ν is met.
Refinement step: To further improve the accuracy of the estimates, refinement of the estimated parameters is carried out. After the tth path estimation, with 1 < t < P_max, and before estimation of the (t + 1)th path, all the paths up to t are refined. For refining the zth path (1 ≤ z ≤ t), all the t columns in R are filled except the zth column. Next, coarse and iterative fine estimation are carried out for the zth path. After refinement of all the paths, the algorithm proceeds to estimate the parameters of

Figure 5.1: Proposed TF learning network architecture for learning R(τ , ν).

the (t + 1)th path.


Stopping criterion: After estimating t paths, the matrix R(τ̂, ν̂) is obtained and α̂ is obtained from (5.12). If ∥E^(t) − E^(t−1)∥² < ϵ, where E^(t) = b − R(τ̂, ν̂)α̂, then the algorithm stops. If the criterion is not met until t = P_max, then the algorithm is terminated at t = P_max.
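The coarse-plus-fine search for one path can be summarized as below, a simplified skeleton under our own bookkeeping that reuses ls_gains_and_cost from the previous sketch; make_r(τ, ν) produces a CDDPM column (from the trained DNNs in the proposed approach), grid points are in normalized delay/Doppler units, and the refinement step is omitted for brevity.

```python
import numpy as np
from itertools import product

def estimate_path(b, prev_cols, make_r, coarse_grid, m_tau=10, n_nu=10,
                  s_max=10, eps_tau=1e-10, eps_nu=1e-2):
    """Skeleton of the coarse + iterative fine search for one path."""
    def best_on(points):
        scored = []
        for tau, nu in points:
            R = np.column_stack(prev_cols + [make_r(tau, nu)])
            _, theta = ls_gains_and_cost(R, b)   # cost (5.13) with LS gains (5.12)
            scored.append((theta, tau, nu))
        return max(scored)                       # maximize Theta over the grid

    _, tau_hat, nu_hat = best_on(coarse_grid)    # coarse (integer) estimation
    w_tau, w_nu = 1.0 / m_tau, 1.0 / n_nu        # fine resolutions, shrunk below
    for _ in range(s_max):                       # iterative fine estimation
        taus = tau_hat + w_tau * np.arange(-(m_tau // 2), m_tau // 2 + 1)
        nus = nu_hat + w_nu * np.arange(-(n_nu // 2), n_nu // 2 + 1)
        _, t_new, n_new = best_on(product(taus, nus))
        converged = (abs(t_new - tau_hat) < eps_tau and
                     abs(n_new - nu_hat) < eps_nu)
        tau_hat, nu_hat = t_new, n_new
        w_tau /= m_tau; w_nu /= n_nu             # finer resolution each iteration
        if converged:
            break
    return tau_hat, nu_hat
```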

Proposed TF learning based approach using DNN

In the above channel estimation algorithm, all the coarse estimation, fine estimation,
and refinement steps require multiple computations of the cost function in (5.13), which requires generation of the CDDPM, R. Computing the columns of R, r(τ_i, ν_i), for each
path involves high complexity. Therefore, to reduce the complexity and for practicality,
we propose to devise and train a network for learning the columns of R. The proposed
DNN architecture and training methodology are presented in the following text.
1) Architecture: Figure 5.1 shows the architecture of the proposed approach. The
input to the DNN is the matrix T = [τ̆ ν̆] ∈ RS×2 , where S is the cardinality of G (for
coarse estimation) or F (s) (for the sth iteration of the fine estimation) and τ̆ = κτ /τmax ,
ν̆ = κν/ν_max. The division by τ_max (ν_max) is carried out to normalize the values of delay (Doppler) between 0 and 1 (−1 and 1).¹ Further, the multiplication by κ is done to
magnify small changes in delay and Doppler values in the training data. The vectors τ
and ν are obtained from the search area G or F (s) . This matrix is passed through two
networks, DNN1 and DNN2. DNN1 outputs the real part, [o_1^r ··· o_{MN}^r] ∈ R^{S×MN}, while DNN2 outputs the imaginary part, [o_1^i ··· o_{MN}^i] ∈ R^{S×MN}. The real and imaginary parts are combined and reshaped to obtain R^{TF}_col ∈ C^{S×M×N}. Each M × N matrix in R^{TF}_col is then converted from the TF domain to the DD domain² using the SFFT and vectorized to obtain an MN-length vector. These vectors form the rows of R_col ∈ C^{S×MN}. DNN1 and DNN2 are trained so as to provide r(τ(t), ν(t)) ∈ C^{1×MN} as the tth row of R_col for the tth row of the input, T(t) = [τ(t) ν(t)].
The architectures of DNN1 and DNN2 are identical and are comprised of fully connected layers, as shown in Fig. 5.1. The output dimension of each layer is twice its input dimension, i.e., the ith layer (i = 1, 2, ...) of DNN1 (DNN2) has input and output dimensions 2^i and 2^{i+1}, respectively. The number of layers, N_L, in DNN1 (DNN2) is determined by the choices of M and N, such that the last layer has input dimension 2^{N_L} and output dimension min(2^{N_L+1}, MN), with 2^{N_L} < MN and 2^{N_L+1} ≥ MN. For each fully connected
layer except the last layer, a rectified linear unit (ReLU) activation function is used, and
a linear activation function is used for the last layer to allow the output of DNN1 and
DNN2 to span R. The parameters of DNN1 (DNN2) are listed in Table 5.1.
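This doubling-width rule is compact to express in PyTorch; the sketch below (builder name ours) reproduces the layer sizing stated above, and for M = 64, N = 32 it yields N_L = 10 layers, matching Table 5.1.

```python
import torch.nn as nn

def make_dnn(M=64, N=32):
    """Sketch of DNN1/DNN2: fully connected layers that double in width
    (2 -> 4 -> ...) until the output can cover MN, with ReLU on every
    layer except the last, which is linear (Table 5.1)."""
    layers, width = [], 2
    while 2 * width < M * N:                 # hidden layers: 2^i -> 2^(i+1)
        layers += [nn.Linear(width, 2 * width), nn.ReLU()]
        width *= 2
    layers.append(nn.Linear(width, min(2 * width, M * N)))  # last layer, linear
    return nn.Sequential(*layers)
```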
2) Training methodology: Training data is obtained by generating (τ, ν) tuples and the corresponding r(τ, ν) vectors using r(τ, ν) = (E a^{DD})^T (see (5.10)). The vectors r(τ, ν) ∈ C^{1×MN} are reshaped into matrices of size M × N and converted to the TF domain using the ISFFT, following which they are converted back to vectors to obtain r^{TF}(τ, ν) ∈ C^{1×MN}.
To train the network, the tuples are fed as input to the DNN1 and DNN2 to generate the

¹The normalization to values between 0 and 1, and −1 and 1, for delay and Doppler, respectively, is done so that the ranges of both inputs are similar, which aids training. Without this normalization, delay would be on the order of 10^{−6}, while Doppler would be on the order of 10^{3}.
²We note that the training is carried out in the TF domain instead of the DD domain. This is because the ratio of the absolute values of the highest to the lowest value in the vector r(τ, ν) in the DD domain is huge, which is detrimental to training. In the TF domain, however, this ratio is quite reasonable and therefore favorable for training (see Figs. 5.2 and 5.3).

Parameter               Value
Architecture            Fully connected neural network
Input dimension         2
Output dimension        MN = 2048
Number of layers (N_L)  10
Activation function     ReLU for all layers except the last;
                        linear activation function for the last layer
κ                       1000

Table 5.1: Parameters of DNN1/DNN2.

Hyper-parameter             Value
Batch size                  40000
Mini-batch size             8000
Number of epochs            40000
Learning rate               0.001, divided by 2 every 4000 epochs
Number of training samples  325000

Table 5.2: Hyper-parameters used while training.



Figure 5.2: Absolute values of training data in DD domain in dB scale.

output. Training is carried out using an Adam optimizer to minimize the L1 loss function
evaluated between the output of DNN1 (DNN2) and the ℜ{r^{TF}(τ, ν)} (ℑ{r^{TF}(τ, ν)}) values, where ℜ{·} and ℑ{·} represent the real and imaginary parts, respectively.
The other hyper parameters used while training are presented in Table 5.2. We note that
this training has to be carried out offline, only once. Subsequently, the network weights
are frozen. During test phase, DNN1 and DNN2 with the trained weights are used for
both coarse and fine estimation steps in the estimation algorithm.
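The TF/DD conversions used throughout this pipeline are plain SFFT/ISFFT pairs. A small NumPy sketch, assuming unitary DFTs and the column-major vectorization convention of Section 5.2.1 (helper names ours):

```python
import numpy as np

def isfft(X_dd):
    """ISFFT, X_tf = F_M X_dd F_N^H with unitary DFT matrices."""
    return np.fft.ifft(np.fft.fft(X_dd, axis=0, norm="ortho"),
                       axis=1, norm="ortho")

def sfft(X_tf):
    """SFFT, X_dd = F_M^H X_tf F_N (inverse of the ISFFT above)."""
    return np.fft.fft(np.fft.ifft(X_tf, axis=0, norm="ortho"),
                      axis=1, norm="ortho")

def r_to_tf(r_vec, M, N):
    """Reshape a length-MN CDDPM column to M x N (column-major) and map
    it to the TF domain, as done when generating training targets."""
    return isfft(r_vec.reshape((M, N), order="F"))
```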

5.2.3 Results and discussions

In this section, we present the results obtained using the proposed low-complexity TF
learning based channel estimation algorithm. The following parameters are considered.
M = 64, N = 32, ∆f = 30 kHz and fc = 5.1 GHz. Two power delay profiles (PDPs)
are considered. First, as in [98], the channel is considered to have P = 5 paths with
a line of sight (LOS) path having a Rice factor of 15 dB. The first and second paths
have fixed delays given by 0.667µs, 0.867µs, respectively, and for the other paths, the
delays are uniformly distributed in (0.867µs, 7µs]. For all the paths, the Dopplers are

Figure 5.3: Absolute values of training data in TF domain in dB scale.

generated using the Jakes' Doppler spectrum, i.e., ν_p = ν_max cos(θ_p), with θ_p ∼ U(−π, π], where U(·) denotes the uniform distribution. As in [98], the channel gain of the LOS path is determined by a fixed absolute squared value, according to the Rice factor. For the remaining paths, an exponential PDP is considered. Second, the Vehicular A (VehA) channel model with P = 6 paths defined in [90] is considered. Further, m_τ = n_ν = 10, ϵ_τ = 10^{−10}, ϵ_ν = 10^{−2}, P_max = 15, s_max = 10, and ϵ = 0.01MNσ².

Training in DD domain vs TF domain

Figures 5.2 and 5.3 show the absolute value of the training data in dB scale in the DD
domain and TF domain, respectively. It is seen that the values in the DD domain range
from about −25 dB to 15 dB, while those in the TF domain are centered around 0 dB with very little variance. This large swing in the DD domain compared to the TF domain
is detrimental to training. The network trained in the DD domain, therefore, performs
worse than that trained in the TF domain as will be shown next.
The advantage of training the network in TF domain is demonstrated in the NMSE
plots shown in Fig. 5.4. The networks DNN1 and DNN2 with the same parameters and

Figure 5.4: NMSE performance comparison between training carried out in DD domain
and TF domain.

hyper-parameters are trained using data obtained in DD domain and TF domain. Due
to the large range of values in the CDDPM in DD domain, the training is not effective
leading to poor NMSE performance. Whereas, the training produces good NMSE results
when the CDDPM in TF domain is used for training. This justifies the need for training
in the TF domain and effectiveness of the proposed approach.

NMSE as a function of pilot SNR

Figure 5.5 shows the NMSE performance comparison between the proposed approach,
the DDIPIC algorithm in [74], and the M-MLE algorithm in [98]. It is observed that the
NMSE performance of the proposed approach is better than the M-MLE scheme. For
example, an NMSE of 10^{−3} is attained at a pilot SNR of 15 dB for M-MLE, whereas it is achieved at about 10 dB with the proposed approach. Also, the performance of the
proposed approach is similar to that of DDIPIC.

Figure 5.5: NMSE performance comparison between the proposed approach, the DDIPIC
algorithm in [74], and M-MLE algorithm in [98].

Figure 5.6: BER performance comparison between the proposed approach, DDIPIC
algorithm in [74], M-MLE algorithm in [98], and perfect CSI.

Figure 5.7: BER performance comparison between the proposed approach, DDIPIC
algorithm in [74], M-MLE algorithm in [98], and perfect CSI for VehA channel model.

BER as a function of SNR

For computing the BER, we assume that the channel remains constant for two OTFS
frame duration. The first frame is the pilot frame given in (5.9), which is used for
channel estimation, and the second frame is an OTFS frame with data symbols drawn
from a constellation, detected using the estimated channel. Figures 5.6 and 5.7 show the
BER performance of the proposed approach as a function of SNR using 64-QAM and
minimum mean square error (MMSE) detection for P = 5 with pilot SNR 10 dB and
P = 6 with pilot SNR 15 dB, respectively. Additionally, the performances of DDIPIC
and M-MLE algorithms are also presented. Perfect channel state information (CSI)
performance is also added as a benchmark. It is observed that for both the considered
channel models, the performance of M-MLE scheme is inferior to both the proposed
approach as well as the DDIPIC scheme, owing to the inferior NMSE performance.
Also, the proposed approach performs very close to the DDIPIC algorithm, which is in
turn close to the perfect CSI performance. For example, in Fig. 5.6, for a BER of 10^{−3}, the proposed approach has an advantage of about 2.5 dB over the M-MLE scheme, while the DDIPIC algorithm has an advantage of less than a dB over the proposed algorithm.

Method   Description             Average time (in sec)
1        Brute force             60
2        TF domain processing    1
3        Method in [99]          10
4        TF learning (Prop.)     0.1

Table 5.3: Run time complexities of computing r(τ, ν).

Complexity comparison

In this subsection, we compare the complexity of generating r(τ, ν), which are columns
of the R matrix, using the following approaches. The first method is the brute force
computation using the equivalence r(τ, ν) = (EaDD )T . In the second method, we com-
pute this equivalence in TF domain to simplify calculations, i.e., aDD is converted to aTF
using ISFFT and rTF is obtained using an analytical input-output relation which requires
much less computational complexity, followed by rTF to r(τ, ν) conversion using SFFT.
The third method is from [99], which gives a low-complexity method for generation of
the matrix G. Here, we computed the (kp M + lp )th column of G using the method in
[99] by substituting P = α = 1, which gives r(τ, ν). The fourth method is the proposed
TF learning method, where the R^{TF}_col learnt in the TF domain is converted to the DD domain

through SFFT to obtain r(τ, ν). We obtained the run time complexity of computing
r(τ, ν) of all the methods on the same machine for fair comparison (see Table 5.3). It is
seen that the naive brute force way of computing r(τ, ν) requires 60 s of run time, while
the second method requires just about 1 s. The third method in [99] requires 10 s, which
is not as good as the second method. The run time of the proposed method is the best
amongst all (better by a factor of at least 10), which is practically very attractive. While
the other three methods give exact values of r(τ, ν) (due to their computation using
analytical expressions in the system model), the proposed method gives r(τ, ν) through

learning. Yet, the performance achieved by the proposed learning is quite close to those
obtained using exact analytical computations. This shows that a judicious adoption of
learning approach for the problem at hand can yield efficient solutions.

5.3 DD channel estimation in DZT-OTFS via learning in TF domain

Consider a DZT-OTFS system having the vectorized input-output relation in (1.45), given by

z_y = Σ_{i=1}^{I} α_i e^{j2πl_ik_i/(MN)} z_{y_i},

where Z_{y_i}[m, n] = Σ_{l=0}^{M−1} Σ_{k=0}^{N−1} (Z_x[l, k] Z_{v_i}[l, n − k]) Z_{f̃_i}[m − l, n] (from (1.41)). To estimate the channel, an impulse pilot frame given by

Z_x[m, n] = { √(MN E_p), if m = M/2, n = N/2;  0, otherwise },   (5.14)

is considered, where Ep is the average energy of each bin of the frame. The received
pilot signal vector, z_y, is used to estimate the channel at the receiver, which is then used for detection of data symbols. The channel estimation algorithm from [74] adapted to
DZT-OTFS is described below.
Channel estimation algorithm: Equation (1.45) can be written in an alternate form as

z_y = Σ_{i=1}^{I} g_i α_i + w = Gα + w,   (5.15)

where g_i = e^{j2πl_ik_i/(MN)} H_2^{(i)} H_1^{(i)} z_x ∈ C^{MN×1}, G = [g_1(l_1, k_1), g_2(l_2, k_2), ..., g_I(l_I, k_I)] ∈ C^{MN×I}, and α = [α_1, α_2, ..., α_I]^T ∈ C^{I×1}. The matrix G is referred to as the delay-Doppler matrix (DDM) as it captures the effect of the channel delay and Doppler on the transmitted symbols. The maximum likelihood (ML) solution for the three-tuple

estimation is then given by

(l̂, k̂, α̂) = arg min_{l,k,α} ∥z_y − G(l, k)α∥²₂,   (5.16)

where ∥·∥₂ denotes the 2-norm. This is an estimation problem in three variables. To reduce


the complexity, we first solve for α given (l, k) as

α = (G^H(l, k)G(l, k))^{−1} G^H(l, k) z_y.   (5.17)

Now, to estimate k and l, given α, (5.16) can be solved to obtain

 
(l̂, k̂) = arg max_{l,k} Θ(G),   (5.18)

where Θ(G) = z_y^H G(l, k)(G^H(l, k)G(l, k))^{−1} G^H(l, k) z_y. Substituting l = l̂ and k = k̂ in (5.17), we obtain the estimate of the channel coefficient vector α̂.
The channel estimation algorithm proceeds in a path-wise fashion, i.e., the delay and Doppler values of the pth path (1 ≤ p ≤ P_max) are estimated before those of the (p + 1)th path. Since the number of paths is not assumed to be known, a maximum of P_max paths are estimated. The estimation of l_p and k_p for the pth path is carried out in two steps. First, a coarse estimation (integer estimation) is carried out to obtain l̃_p, k̃_p. This is followed by an iterative fine estimation step, where the fractional estimation of the delay and Doppler is carried out to obtain l̂_p, k̂_p. In each of the steps, the cost function in (5.18) is maximized over different search ranges as described below. The algorithm begins by initializing G(l, k) = [g_1(l_1, k_1) g_2(l_2, k_2) ··· g_{P_max}(l_{P_max}, k_{P_max})] = 0_{MN×P_max}.
Coarse estimation: The search range in this step is defined as G = L ⊗ K, where L = {0, 1, ..., ⌈l_max⌉}, K = {−⌈k_max⌉, ..., 0, ..., ⌈k_max⌉}, l_max = max_i{l_i}, k_max = max_i{k_i}, and ⊗ denotes the Cartesian product of two sets. For estimating the parameters of the pth path, g_p(l_p, k_p) is computed for all (l_p, k_p) in G and the coarse estimates are obtained using (5.18) by maximizing the cost function over the search range.

Iterative fine estimation: Following the coarse estimation step, the search area is now defined around the optimal coarse value (for s = 1) or the fine estimate obtained in the previous iteration of the fine estimation step (for s > 1), given by

I(s) = {l_p^{(s−1)} − 5/10^s, l_p^{(s−1)} − 4/10^s, ..., l_p^{(s−1)} + 5/10^s} ⊗ {k_p^{(s−1)} − 5/10^s, k_p^{(s−1)} − 4/10^s, ..., k_p^{(s−1)} + 5/10^s},   (5.19)

with s denoting the iteration number in the fine estimation step. To begin the iterations, s = 1, l_p^{(0)} = l̃_p, and k_p^{(0)} = k̃_p. A similar procedure as in the coarse estimation step is followed using I(s) as the search range for obtaining the first fine estimate (l_p^{(1)}, k_p^{(1)}), following which s is incremented by 1. Note that the search resolution becomes finer as s increases. Next, for s > 1, the search area is centered over the newly obtained fine estimate with finer resolution. This iterative procedure is stopped when a predefined value of s is reached, i.e., s = s_max, and (l̂_p, k̂_p) = (l_p^{(s_max)}, k_p^{(s_max)}).
Stopping criterion: The algorithm stops once P_max paths have been estimated or ∥z_c^{(p)} − z_c^{(p−1)}∥²₂ < ϵ, where z_c^{(p)} = G(l̂, k̂)α̂(l̂, k̂).
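The shrinking search lattice in (5.19) is straightforward to construct; a minimal NumPy sketch (function name ours):

```python
import numpy as np
from itertools import product

def fine_grid(l_prev, k_prev, s):
    """Per (5.19): the iteration-s fine search grid, an 11 x 11 lattice of
    step 10^-s centered on the previous estimate (l_prev, k_prev)."""
    offsets = np.arange(-5, 6) / 10.0 ** s   # -5/10^s, ..., 0, ..., +5/10^s
    return [(l_prev + dl, k_prev + dk) for dl, dk in product(offsets, offsets)]
```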

5.3.1 Proposed TF based learning approach using DNN

In the channel estimation algorithm described above, the coarse estimation step and the
iterative fine estimation step require multiple evaluations of the cost function in (5.18),
which requires the computation of the DDM, G. Computing the columns of G, gi (li , ki ),
for each path involves high complexity. Therefore, in order to reduce the complexity, we
propose to design and train a network to learn the columns of G [73]. The architecture
and training methodology are presented in the following text.
1) Architecture: Figure 5.8 shows the block diagram of the proposed TF learning
approach. The architecture consists of two architecturally identical neural networks,
DNN1 and DNN2, which receive the delay and Doppler indices (l̆, k̆) as input. The
input is a matrix of size S × 2, where S is the cardinality of G (for the coarse estimation step) or of I(s) (for the sth iteration of the fine estimation step), and l̆ = ζl/l_max, k̆ = ζk/k_max.

Figure 5.8: Proposed TF learning architecture for learning the G matrix.

The division by l_max (k_max) is carried out to normalize the values of the delay (Doppler) indices between 0 and 1 (−1 and 1).³ Further, the multiplication by ζ is carried out to magnify small changes in the delay and Doppler indices in the training and test data. The vectors l and k are obtained from the search area G or I(s). The
input matrix is passed through DNN1 and DNN2. DNN1 (DNN2) is trained to output
the real (imaginary) part of the column, ℜ{G^{TF}_col} ∈ R^{S×MN} (ℑ{G^{TF}_col} ∈ R^{S×MN}), of the DDM in the TF domain. The real and imaginary parts are combined and reshaped to obtain G^{TF}_col ∈ C^{S×M×N}. Each M × N matrix in G^{TF}_col is then converted from the TF domain to the DD domain using the SFFT and vectorized column-wise to obtain an MN-length vector. These vectors form the rows of G_col ∈ C^{S×MN}. DNN1 and DNN2 are trained so as to provide g(l[t], k[t]) ∈ C^{1×MN} as the tth row of G_col as output for (l̆[t], k̆[t]) ∈ R^{1×2} as the tth row of the input matrix.
Architectures of DNN1 and DNN2 are comprised of fully connected layers. For each layer, the output dimension is twice the input dimension, i.e., the ith layer (i = 1, 2, ...) of DNN1 and DNN2 has input and output dimensions 2^i and 2^{i+1}, respectively. The number of layers, N_L, in DNN1 and DNN2 is determined by the choices of M and N, such that the last layer has input dimension 2^{N_L} and output dimension min(2^{N_L+1}, MN), with 2^{N_L} < MN and 2^{N_L+1} ≥ MN. For each fully connected layer except the last layer, a rectified linear unit (ReLU) activation function is used, and a linear activation function is used for the last layer to allow the outputs of DNN1 and DNN2 to span R.

³The normalization of values between 0 and 1, and −1 and 1, for the delay and Doppler indices, respectively, is done so that the ranges are similar, which aids training. Without this normalization, delay indices would span 0 to l_max and Doppler indices would span −k_max to k_max. Note that l_max and k_max need not be equal.

Hyper-parameter             Value
Batch size                  40000
Mini-batch size             8000
Number of epochs            40000
Learning rate               0.001, multiplied by 0.9 every 4000 epochs
Number of training samples  325000

Table 5.4: Hyper-parameters used while training.
2) Training methodology: Training data is obtained by generating (l, k) tuples and the corresponding g(l, k) vectors using g(l, k) = e^{j2πlk/(MN)} H_2 H_1 z_x (see (5.15)). The vectors g(l, k) ∈ C^{MN×1} are reshaped into matrices of size M × N and converted to the TF domain using the ISFFT, following which they are vectorized to obtain g^{TF}(l, k) ∈ C^{1×MN}. To
train the network, the tuples (l, k) are fed as input to the DNN1 and DNN2 to generate
the output. Training is carried out using an Adam optimizer to minimize the mean
square error loss evaluated between the output of the DNN1 (DNN2) and ℜ{gTF (l, k)}
(ℑ{gTF (l, k)}). The other hyper parameters used while training are presented in Table
5.4. We note that this training has to be carried out offline, only once. Subsequently,
the network weights are stored. During test time, the same trained weights are used for
both coarse and fine estimation steps of the channel estimation algorithm.

5.3.2 Results and discussions

This section presents the performance of the proposed TF learning based channel estima-
tion algorithm. A DZT-OTFS system with M = 64, N = 32 is considered. Square-root
raised cosine pulse with roll-off factor 0.5 is used as the transmit and receive pulse. Two
parameter sets are considered for the simulation: For the first set, ∆f = 3.75 kHz, I = 4

with uniform power delay profile (PDP), delays are uniformly distributed in (0, τmax ],
τmax = 0.133 ms, and νmax = 937 Hz. The second set, a more practical scenario, considers
Vehicular A (VehA) channel model [90] with ∆f = 156.25 kHz, and νmax = 1700 Hz. For
both the cases, Dopplers are generated using Jakes’ Doppler spectrum, νi = νmax cos(θi ),
where θi is uniformly distributed in (0, 2π], and carrier frequency fc = 4 GHz. Further,
the following algorithm parameters are chosen: Pmax = 15, nmax = 2, and ϵ = 20σ 2 ,
where σ 2 is the variance of noise. For the networks DNN1 and DNN2, NL = 10 and
ζ = 103 . A single training is carried out and the same trained network is used during
the testing phase in both the scenarios which shows the network’s generalizability. Pi-
lot signal-to-noise ratio (SNR) is taken to be the same as the data SNR. The normalized mean square error (NMSE) is computed as ∥Ĥ − H∥²_F / ∥H∥²_F, where Ĥ is the channel matrix obtained
, where Ĥ is the channel matrix obtained
using the estimated (α̂, l̂, k̂) and ∥ · ∥F denotes the Frobenius norm.
DD domain vs TF domain training: Figures 5.9 and 5.10 show the absolute values
of the training data in the DD domain and TF domain, respectively. As observed in the
conventional OTFS case (Figs. 5.2 and 5.3), even in DZT-OTFS, the range of values
in the DD domain is large when compared to the TF domain. The smaller swing in values is favorable for training, and therefore we choose TF domain training over DD domain training. The better training in the TF domain is also observable in Figs. 5.11
and 5.12. The NMSE and BER performances of the channel estimation algorithm using
learning in DD domain and TF domain are presented in Figs. 5.11 and 5.12, respectively.
It is seen that with DD domain learning, both NMSE and BER performances are poor.
This is because the large swing in the absolute values of the G matrix entries in the DD
domain (see Fig. 5.9) results in ineffective training. Since the same network needs to
cater to both the high and low values, the training accuracy, and thereby the NMSE and BER performances, are poor when trained in the DD domain. In contrast, the performances are seen to significantly improve with the proposed TF domain learning, which is a
consequence of the effective training achieved in the TF domain due to a smaller swing
in the absolute values in the TF domain (see Fig. 5.10). Further, it is also seen that
the learning in the TF domain achieves close to perfect channel state information (CSI) performance.

Figure 5.9: Absolute values of training data in DD domain in dB scale.

Figure 5.10: Absolute values of training data in TF domain in dB scale.



Figure 5.11: NMSE performance comparison between DD domain learning and TF domain learning in DZT-OTFS.
NMSE and BER performance: The NMSE performance of the proposed TF learning
based channel estimation algorithm is plotted as a function of pilot SNR in Fig. 5.13.
The NMSE performance obtained for uniform PDP using the estimation algorithm in
[74] and a modified maximum likelihood estimation (M-MLE) algorithm in [98] is also
added for comparison. It is seen that for uniform PDP, the performance of the proposed
approach is quite close to that in [74]. Further, the performance of M-MLE is observed to
be worse than the proposed approach. The NMSE performance of the proposed approach
with VehA PDP is also seen to perform closely to that with uniform PDP. It is noted
that the same trained network works effectively for uniform PDP and VehA PDP channel
models, highlighting its generalizability. Next, Fig. 5.14 shows the BER performance of
the algorithm in [74], M-MLE algorithm in [98], and the proposed approach as a function
of SNR. The performance attained using perfect CSI is also plotted for comparison. It is
seen that BER performance close to that with perfect CSI is achieved by the proposed
method at a much reduced complexity, which is detailed below.

Figure 5.12: BER performance comparison between DD domain learning and TF domain
learning in DZT-OTFS.

Figure 5.13: NMSE performance of the algorithm in [74], M-MLE algorithm in [98], and
the proposed algorithm for different PDPs in DZT-OTFS.

Figure 5.14: BER performance of the algorithm in [74], M-MLE algorithm in [98], and
the proposed algorithm for different PDPs in DZT-OTFS.

5.4 Summary
In this chapter, we considered the problem of channel estimation in conventional OTFS
and DZT-OTFS systems with fractional DD. The state-of-the-art DDIPIC algorithm in
[74] incurred high complexity although it achieved very good performance. The high
complexity in DDIPIC was attributed to the computation of CDDPM for conventional
OTFS and DDM for DZT-OTFS. Further, this computation was carried out multiple
times for each path estimation, which made the overall complexity of the DDIPIC algorithm high. Therefore, to reduce complexity, we proposed to learn these matrices using
NNs. This learning was carried out in the TF domain rather than the DD domain,
owing to the smaller swing in the values in CDDPM/DDM, which was observed to be
beneficial for training. For the conventional OTFS case, the trained network performed
close to the DDIPIC performance at a much less complexity. For DZT-OTFS also, the
performance of the trained NN was observed to be close to that of DDIPIC adapted
for DZT-OTFS, while being computationally more efficient. This shows that a judicious
adoption of learning approach for the problem at hand can yield efficient solutions.
Chapter 6

Conclusions

In this thesis, we considered the problem of channel estimation in wireless communication systems. For efficiently estimating the channel under diverse scenarios, we employed tools from machine learning. Specifically, we trained NNs for the tasks of channel prediction and channel estimation. To begin with, we first considered a simple SISO system with a time-varying one-path channel, followed by OFDM, which has both time and frequency selectivity, and finally considered OTFS for channel estimation under high Doppler spreads.
In Chapter 2, we considered the problem of channel prediction in time-varying fad-
ing channels. In time-varying fading channels, channel coefficients are estimated using
pilot symbols that are transmitted every coherence interval. For channels with high
Dopplers, the rapid channel variations over time will require these pilots to be transmit-
ted often. This requires considerable bandwidth for pilot transmission, leading to poor
throughput. We proposed a novel receiver architecture using deep RNNs that learned the
channel variations, and thereby reduced the number of pilot symbols required for chan-
nel estimation. Specifically, we designed and trained an RNN to learn the correlation in
the time-varying channel and predict the channel coefficients into the future with good
accuracy over a wide range of Dopplers and SNRs. The proposed training methodology
enabled accurate channel prediction through the use of techniques such as teacher-force
training, early-stop, and reduction of learning rate on plateau. Also, the robustness of


prediction for different Dopplers and SNRs was achieved by adapting the number of pre-
dictions into the future based on the Doppler and SNR. We also proposed a data decision
driven receiver architecture using RNNs, wherein the data symbols detected using the
channel predictions are treated as pilots to enable further predictions, thereby further
reducing the pilot overhead. Numerical results showed that the proposed RNN based
receiver achieves good bit error performance in time-varying fading channels, while being
spectrally efficient.
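
To make the training methodology concrete, the following is a minimal sketch (in
PyTorch, not the exact thesis implementation) of teacher-forced one-step-ahead channel
prediction with learning-rate reduction on plateau; the fading generator
gen_fading_batch, the network sizes, and all hyperparameters are illustrative
assumptions.

import math
import torch
import torch.nn as nn

def gen_fading_batch(batch=32, seq_len=128, fd_ts=0.01):
    # Hypothetical sum-of-sinusoids (Jakes-like) generator of time-varying
    # fading sequences; returns (batch, seq_len, 2) real/imag samples.
    t = torch.arange(seq_len).float()
    theta = 2 * math.pi * torch.rand(batch, 8, 1)
    phi = 2 * math.pi * torch.rand(batch, 8, 1)
    h = torch.exp(1j * (2 * math.pi * fd_ts * t * torch.cos(theta) + phi)).mean(dim=1)
    return torch.stack([h.real, h.imag], dim=-1)

class ChannelPredictor(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(input_size=2, hidden_size=hidden, num_layers=2,
                           batch_first=True)
        self.out = nn.Linear(hidden, 2)

    def forward(self, h_seq):
        y, _ = self.rnn(h_seq)
        return self.out(y)  # one-step-ahead prediction for every input sample

model = ChannelPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=5)
loss_fn = nn.MSELoss()

for epoch in range(100):
    h = gen_fading_batch()
    # Teacher forcing: the ground-truth sequence h[:, :-1] is fed as input,
    # and the network is trained to emit the time-shifted sequence h[:, 1:].
    pred = model(h[:, :-1, :])
    loss = loss_fn(pred, h[:, 1:, :])
    opt.zero_grad(); loss.backward(); opt.step()
    sched.step(loss.item())  # reduce the learning rate on plateau

At inference, the trained model can be run recursively, feeding its own predictions back
as inputs, with the prediction depth adapted to the Doppler and SNR as described above.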
In Chapter 3, we considered the problem of channel estimation in doubly-selective
(i.e., both time-selective and frequency-selective) channels in OFDM systems in the
presence of oscillator PN. While channel estimation techniques for OFDM systems in
time-flat, frequency-selective channels have been well studied and adopted in practice,
estimating a channel with rapid time variations is challenging. Also, OFDM receivers
are known to be sensitive to impairments due to local oscillator PN. Methods reported in
the literature to estimate the channel incur significant overhead in terms of the number
of training/pilot symbols needed to effectively estimate the channel in the presence of
PN. To overcome these shortcomings, we proposed a learning based channel estimation
scheme for OFDM systems in the presence of both PN and doubly-selective fading. The
proposed approach viewed the channel matrix as an image and modelled the channel
estimation problem as an image completion problem. Towards this, we devised and
employed two-dimensional CNNs for learning and estimating the channel coefficients in
the entire TF grid, based on pilots sparsely populated in the TF grid. In order to make
the network robust to PN impairment, we employed a novel training scheme where the
training data was rotated by random phases before being fed to the network. Further,
using the estimated channel coefficients, we devised a simple and effective PN estimation
and compensation scheme. Our results demonstrated that the proposed network and the
PN compensation scheme achieved robust OFDM performance in the presence of PN
and doubly-selective fading.
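
As an illustration of the random-phase training scheme described above, the following
minimal NumPy sketch (an assumption-laden illustration, not the thesis code) rotates
each training channel image by a random common phase so that the trained CNN becomes
insensitive to PN-induced rotations; the array H_train of TF channel images is a
stand-in.

import numpy as np

rng = np.random.default_rng(0)
# Stand-in batch of complex TF channel 'images' (batch x M x N).
H_train = rng.standard_normal((4, 32, 16)) + 1j * rng.standard_normal((4, 32, 16))

def random_phase_augment(H, rng=rng):
    # Rotate every channel image by its own random common phase.
    phases = rng.uniform(-np.pi, np.pi, size=(H.shape[0], 1, 1))
    return H * np.exp(1j * phases)

# Applied afresh every epoch, so the network never sees a fixed phase reference.
H_aug = random_phase_augment(H_train)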
In Chapter 4, we considered the problem of DD domain channel estimation in OTFS
systems using deep learning techniques. We devised suitable learning based architectures
for channel estimation using exclusive pilot frame, embedded pilot frame, interleaved
pilot frame, and superimposed pilot frame. We proposed a learning based architecture
suitable for estimating the DD channel for both exclusive pilot frame and embedded pilot
frame. The proposed learning network, called DDNet, was based on a multi-layered
RNN framework with a novel training methodology that worked seamlessly for both
exclusive pilot frames as well as embedded pilot frames. Our results demonstrated that
the proposed DDNet achieves better MSE and BER performance compared to impulse
based and threshold based DD channel estimation schemes reported in the literature. We
considered DD channel estimation for the interleaved pilot (IP) frame, where pilot symbols
were interleaved with data symbols in a lattice type fashion, without any guard symbols.
For this IP frame structure, we proposed an RNN based channel estimation scheme
using a network called IPNet. The proposed IPNet was trained to overcome the effects
of leakage from data symbols and provide channel estimates with good accuracy in terms
of MSE performance. Our results showed that the proposed IPNet architecture achieved
good bit error performance while being spectrally efficient. Next, we proposed a sparse
superimposed pilot (SSP) scheme, where pilot and data symbols were superimposed
in a few bins and the remaining bins carried data symbols only. This scheme offered
the benefit of a better inter-symbol leakage profile in a frame, while retaining full rate.
For the SSP scheme, we proposed an RNN based learning architecture (referred to as
SSPNet) trained to provide accurate channel estimates overcoming the leakage effects
in channels with fractional delays and Dopplers. Our results showed that the proposed
SSP scheme with the proposed SSPNet based channel estimation performed better than
a fully superimposed pilot (FSP) scheme with interference cancellation based channel
estimation reported in the literature.
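
To make the SSP frame structure concrete, the following is a minimal NumPy sketch,
under assumed values of the frame size, pilot lattice, and power split (none of which
are the thesis settings), of superimposing pilots on a few DD bins while every bin
still carries data, so that full rate is retained.

import numpy as np

M, N = 32, 16        # delay and Doppler bins (illustrative)
rho = 0.3            # fraction of power assigned to pilots (illustrative)
rng = np.random.default_rng(0)

# QPSK data in every DD bin.
data = (2 * rng.integers(0, 2, (M, N)) - 1
        + 1j * (2 * rng.integers(0, 2, (M, N)) - 1)) / np.sqrt(2)
frame = np.sqrt(1 - rho) * data

# Superimpose a known pilot on a sparse lattice of bins; all other bins
# remain data-only, giving the better inter-symbol leakage profile noted above.
pilot_bins = [(l, k) for l in range(0, M, 8) for k in range(0, N, 4)]
for l, k in pilot_bins:
    frame[l, k] += np.sqrt(rho)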
In Chapter 5, we proposed a novel learning based approach for channel estimation
in OTFS systems, where learning was done in the TF domain for DD domain channel
estimation. Learning in the TF domain was motivated by the fact that the range of
values in the TF channel matrix is favourable for training, as opposed to the large
swing of values in the DD channel matrix, which is not. A key
beneficial outcome of the proposed approach was its low complexity along with very
good performance. Specifically, it drastically reduced the complexity of the computation
of a constituent DD parameter matrix in a state-of-the-art algorithm. We developed
this TF learning approach for two types of OTFS systems, namely, 1) two-step OTFS,
and 2) DZT-OTFS. Our results showed that the proposed TF learning-based approach
achieved almost the same performance as that of the state-of-the-art algorithm, while
being drastically less complex, making it practically appealing.

Scope of future work

Here, we discuss some possible extensions to the work reported in this thesis.

1. In Chapter 2, we considered the problem of channel estimation in time-varying
fading channels using NNs. We considered the channel to have one path for most
of the results presented in the chapter. However, channel prediction in a
frequency-selective channel (with more than one path) could be of potential interest.

2. In Chapter 3, we considered channel estimation in OFDM systems with PN using
CNNs. We also proposed a simple PN estimation and compensation scheme using the
estimates obtained from the trained CNN. Devising efficient schemes with minimal
pilot overhead to estimate the channel in the presence of practical impairments
like IQ-imbalance and carrier frequency offset can be an interesting topic of
future research.

3. In Chapter 4, we considered DD domain channel estimation for OTFS systems with
different pilot frames using RNNs. In all the results presented in the chapter,
either an MMSE or an MP detector was used for detecting the data symbols. An
efficient NN based joint channel estimator and detector for OTFS is a problem
that can be investigated. Devising a suitable network for channel estimation in
OTFS systems in the presence of phase noise can also be considered for future work.
4. In Chapter 5, we considered low-complexity estimation of the channel in the DD
domain for conventional OTFS and DZT-OTFS using NNs. This learning was carried
out in the TF domain to achieve good training performance. We considered the
impulse pilot frame for this purpose. Learning based techniques for low-complexity
channel estimation with the more challenging embedded and superimposed pilot
frames can be taken up as future work.

5. The channel estimation schemes proposed in this thesis were developed in the
context of SISO systems. Extending them to MIMO systems can be considered for
future work.
Appendix A

Derivation of channel matrix with fractional DD and rectangular pulse

From (4.26), the discrete signal at the transmitter is given by
$\mathbf{a}^t = (\mathbf{F}_N^H \otimes \mathbf{P}_{\text{tx}})\,\mathbf{a}^{\text{DD}}$.
This is converted into a continuous time domain signal as
$$a^t(t) = \sum_{n=0}^{MN-1} a^t[n]\, p_{\text{tx}}(t - nT_s), \tag{A.1}$$
where $T_s$ is the sampling period with $MT_s = T = \frac{1}{\Delta f}$. For a rectangular
pulse, $p_{\text{tx}}(t - nT_s) = \mathbb{1}_{\{nT_s \le t < (n+1)T_s\}}$, where
$\mathbb{1}_{\{\cdot\}}$ is the indicator function. Equation (A.1) can be simplified as
$$a^t(t) = \sum_{n=0}^{MN-1} a^t[n]\,\mathbb{1}_{\{nT_s \le t < (n+1)T_s\}}
= a^t\left[\left\lfloor \frac{t}{T_s}\right\rfloor\right]_{MN}, \tag{A.2}$$

where $\lfloor\cdot\rfloor$ denotes the flooring operation and $[\cdot]_{MN}$ denotes the modulo-$MN$ operation.
At the receiver, the received signal $b^t(t)$ is obtained as
$$b^t(t) = \sum_{i=0}^{L-1} g_i\, a^t(t - \tau_i)\, e^{j2\pi\nu_i(t - \tau_i)}, \tag{A.3}$$


where $L$ is the number of paths in the channel and $g_i$, $\tau_i = (\alpha_i + a_i)T_s$,
and $\nu_i = \frac{\beta_i + b_i}{NT}$ are the channel gain, delay, and Doppler of the
$i$th path, respectively. Substituting in the above equation, we get
$$b^t(t) = \sum_{i=0}^{L-1} g_i\,
a^t\left[\left\lfloor\frac{t - (\alpha_i + a_i)T_s}{T_s}\right\rfloor\right]_{MN}
e^{j\frac{2\pi(\beta_i + b_i)}{NT}\left(t - (\alpha_i + a_i)T_s\right)}. \tag{A.4}$$

Sampling the continuous signal $b^t(t)$ at $t = nT_s$, $n = 0, 1, \cdots$, and simplifying, we get
$$b^t[n] = \sum_{i=0}^{L-1} g_i\, a^t\left[n - \lceil\alpha_i + a_i\rceil\right]_{MN}\,
e^{j\frac{2\pi(\beta_i + b_i)}{MN}\left(n - (\alpha_i + a_i)\right)}. \tag{A.5}$$
This can be vectorized to obtain
$$\mathbf{b}^t = \sum_{i=0}^{L-1} g_i \boldsymbol{\Delta}_i
\boldsymbol{\Pi}^{\lceil\alpha_i + a_i\rceil}\, \mathbf{a}^t = \mathbf{G}\mathbf{a}^t, \tag{A.6}$$
where $\mathbf{G}$ is the effective channel matrix given by
$$\mathbf{G} = \sum_{i=0}^{L-1} g_i \boldsymbol{\Delta}_i
\boldsymbol{\Pi}^{\lceil\alpha_i + a_i\rceil}, \tag{A.7}$$

$\boldsymbol{\Pi}$ is the $MN \times MN$ cyclic-shift (permutation) matrix defined as
$$\boldsymbol{\Pi} = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 1 & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix}_{MN \times MN}, \tag{A.8}$$
and
$$\boldsymbol{\Delta}_i = \mathrm{diag}\left\{
e^{-j\frac{2\pi(\alpha_i+a_i)(\beta_i+b_i)}{MN}},\,
e^{j\frac{2\pi(1-(\alpha_i+a_i))(\beta_i+b_i)}{MN}},\, \cdots,\,
e^{j\frac{2\pi(MN-1-(\alpha_i+a_i))(\beta_i+b_i)}{MN}}\right\}. \;\blacksquare$$
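
The construction of $\mathbf{G}$ in (A.6)-(A.8) can be verified numerically. The
following is a minimal NumPy sketch with illustrative path parameters (the gains,
fractional delays, and Dopplers below are arbitrary, not taken from the thesis
experiments).

import numpy as np

M, N = 8, 4
MN = M * N
gains = [1.0, 0.6 * np.exp(1j * 0.7)]   # g_i (illustrative)
delays = [2.0, 3.4]                      # alpha_i + a_i, in samples (fractional)
dopplers = [1.0, 1.7]                    # beta_i + b_i, in Doppler bins (fractional)

# Cyclic-shift permutation matrix of (A.8).
Pi = np.roll(np.eye(MN), 1, axis=0)

G = np.zeros((MN, MN), dtype=complex)
for g, d, nu in zip(gains, delays, dopplers):
    n = np.arange(MN)
    Delta = np.diag(np.exp(1j * 2 * np.pi * (n - d) * nu / MN))   # Delta_i
    G += g * Delta @ np.linalg.matrix_power(Pi, int(np.ceil(d)))  # (A.7)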
Appendix B

Derivation of input-output relation for two-step OTFS

Let $A_{\text{DD}}[l,k]$ and $B_{\text{DD}}[l',k']$ denote the delay-Doppler (DD) domain
symbols at the transmitter and the receiver, respectively. Similarly, let
$A_{\text{TF}}[m,n]$ and $B_{\text{TF}}[m',n']$ denote the time-frequency (TF) domain
symbols at the transmitter and the receiver, respectively, where
$l, l', m, m' \in \{0, 1, \cdots, M-1\}$ and $k, k', n, n' \in \{0, 1, \cdots, N-1\}$,
$\Delta f$ denotes the subcarrier spacing, $T = 1/\Delta f$, and $\tau_p$ and $\nu_p$
denote the $p$th path's delay and Doppler, respectively, with
$\frac{\tau_p}{T} = \frac{l_{\tau_p}}{M}$ and $\frac{\nu_p}{\Delta f} = \frac{k_{\nu_p}}{N}$.
At the transmitter, the TF domain symbols are obtained from the DD symbols using the
inverse symplectic finite Fourier transform (ISFFT) operation as
$$A_{\text{TF}}[m,n] = \frac{1}{\sqrt{MN}}\sum_{l=0}^{M-1}\sum_{k=0}^{N-1}
A_{\text{DD}}[l,k]\, e^{-j2\pi\left(\frac{ml}{M} - \frac{nk}{N}\right)}. \tag{B.1}$$
The corresponding time-domain (TD) signal is obtained using the Heisenberg transform as
$$a(t) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} A_{\text{TF}}[m,n]\, g(t - nT)\,
e^{j2\pi m\Delta f(t - nT)}, \tag{B.2}$$
where $g(t)$ denotes the transmit pulse. A cyclic prefix (CP) of length
$\tau_{\max} = \max_p\{\tau_p\}$ is added, i.e.,
$$a(t) = a(t + NT) \quad \forall\, t \in [-\tau_{\max}, 0). \tag{B.3}$$


The TD signal at the receiver (without noise) is given by
$$b(t) = \sum_p \alpha_p\, a(t - \tau_p)\, e^{j2\pi\nu_p(t - \tau_p)}. \tag{B.4}$$
After discarding the CP at the receiver, the corresponding TF domain symbols are
obtained using the Wigner transform as
$$B_{\text{TF}}[m',n'] = \int_t b(t)\, g^*(t - n'T)\,
e^{-j2\pi m'\Delta f(t - n'T)}\, dt. \tag{B.5}$$
The received TF symbols are converted to DD symbols using the SFFT operation as
$$B_{\text{DD}}[l',k'] = \frac{1}{\sqrt{MN}}\sum_{n'}\sum_{m'} B_{\text{TF}}[m',n']\,
e^{j2\pi\left(\frac{m'l'}{M} - \frac{n'k'}{N}\right)}. \tag{B.6}$$
Considering $g(t)$ to be a rectangular pulse of unit energy ranging from $0$ to $T$,
and $\tau_{\max} < T$, $B_{\text{TF}}[m',n']$ becomes
$$\begin{aligned}
B_{\text{TF}}[m',n'] = \frac{1}{T}\sum_p \alpha_p\Bigg[
&\int_{n'T+\tau_p}^{(n'+1)T}\sum_m A_{\text{TF}}[m,n']\,
e^{j2\pi m\Delta f(t - n'T - \tau_p)}\, e^{j2\pi\nu_p(t-\tau_p)}\,
e^{-j2\pi m'\Delta f(t - n'T)}\, dt \\
+&\int_{n'T}^{n'T+\tau_p}\sum_m A_{\text{TF}}[m,n'-1]\,
e^{j2\pi m\Delta f(t - (n'-1)T - \tau_p)}\, e^{j2\pi\nu_p(t-\tau_p)}\,
e^{-j2\pi m'\Delta f(t - n'T)}\, dt\Bigg]. \tag{B.7}
\end{aligned}$$

We evaluate the two integrals in (B.7) separately. The first integral is simplified as
follows:
$$\begin{aligned}
\frac{1}{T}&\int_{n'T+\tau_p}^{(n'+1)T} e^{j2\pi m\Delta f(t-\tau_p-n'T)}\,
e^{-j2\pi m'\Delta f t}\, e^{j2\pi\nu_p(t-\tau_p)}\, dt \\
&= \frac{e^{-j2\pi m\Delta f\tau_p}\, e^{-j2\pi\tau_p\nu_p}}{T}
\int_{n'T+\tau_p}^{(n'+1)T} e^{j2\pi[(m-m')\Delta f+\nu_p]t}\, dt \\
&= \frac{e^{-j2\pi m\Delta f\tau_p}\, e^{-j2\pi\tau_p\nu_p}\,
e^{j2\pi n'\frac{\nu_p}{\Delta f}}}{T\, j2\pi[(m-m')\Delta f+\nu_p]}
\left[e^{j2\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)}
- e^{j2\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}}\right] \\
&= k_1\, e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\left(1+\frac{\tau_p}{T}\right)}
\left[e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\left(1-\frac{\tau_p}{T}\right)}
- e^{-j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\left(1-\frac{\tau_p}{T}\right)}\right],
\end{aligned}$$
where $k_1 = \frac{e^{-j2\pi m\Delta f\tau_p}\, e^{-j2\pi\tau_p\nu_p}\,
e^{j2\pi n'\frac{\nu_p}{\Delta f}}}{T\, j2\pi[(m-m')\Delta f+\nu_p]}$.
Therefore, we can write
$$\begin{aligned}
\frac{1}{T}\int_{n'T+\tau_p}^{(n'+1)T} &e^{j2\pi m\Delta f(t-\tau_p-n'T)}\,
e^{-j2\pi m'\Delta f t}\, e^{j2\pi\nu_p(t-\tau_p)}\, dt \\
&= e^{-j2\pi m\frac{\tau_p}{T}}\, e^{-j2\pi\tau_p\nu_p}\,
e^{j2\pi n'\frac{\nu_p}{\Delta f}}\,
e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\left(1+\frac{\tau_p}{T}\right)}
\left(1-\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)
\left(1-\frac{\tau_p}{T}\right)\right).
\end{aligned}$$
The second integral is simplified as follows:
$$\begin{aligned}
\frac{1}{T}&\int_{n'T}^{n'T+\tau_p} e^{j2\pi m\Delta f(t-(n'-1)T-\tau_p)}\,
e^{j2\pi\nu_p(t-\tau_p)}\, e^{-j2\pi m'\Delta f(t-n'T)}\, dt \\
&= \frac{e^{-j2\pi m\Delta f\tau_p}\, e^{-j2\pi\tau_p\nu_p}}
{T\, j2\pi\left[(m-m')\Delta f+\nu_p\right]}\,
e^{j2\pi n'\frac{\nu_p}{\Delta f}}
\left[e^{j2\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}}-1\right] \\
&= e^{-j2\pi m\frac{\tau_p}{T}}\, e^{-j2\pi\tau_p\nu_p}\,
e^{j2\pi n'\frac{\nu_p}{\Delta f}}\,
e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}}
\left(\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}\right).
\end{aligned}$$

Substituting the evaluated integrals in (B.7), we get
$$\begin{aligned}
B_{\text{TF}}[m',n'] = \sum_p \alpha_p e^{-j2\pi\tau_p\nu_p}\Bigg[
&\sum_m A_{\text{TF}}[m,n']\, e^{-j2\pi m\frac{\tau_p}{T}}\,
e^{j2\pi n'\frac{\nu_p}{\Delta f}}\,
e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\left(1+\frac{\tau_p}{T}\right)} \\
&\times\left(1-\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)
\left(1-\frac{\tau_p}{T}\right)\right) \\
+&\sum_m A_{\text{TF}}[m,n'-1]\, e^{-j2\pi m\frac{\tau_p}{T}}\,
e^{j2\pi n'\frac{\nu_p}{\Delta f}}\,
e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}} \\
&\times\left(\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}\right)
\Bigg]. \tag{B.8}
\end{aligned}$$
Substituting for $A_{\text{TF}}[m,n']$ and $A_{\text{TF}}[m,n'-1]$ using (B.1), we have
$$\begin{aligned}
B_{\text{TF}}[m',n'] = \frac{1}{\sqrt{MN}}\sum_p \alpha_p'\Bigg[
&\sum_m\sum_k\sum_l A_{\text{DD}}[l,k]\,
e^{-j2\pi\left(\frac{ml}{M}-\frac{n'k}{N}\right)}\,
e^{-j2\pi m\frac{\tau_p}{T}}\, e^{j2\pi n'\frac{\nu_p}{\Delta f}}\,
e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\left(1+\frac{\tau_p}{T}\right)} \\
&\times\left(1-\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)
\left(1-\frac{\tau_p}{T}\right)\right) \\
+&\sum_m\sum_k\sum_l A_{\text{DD}}[l,k]\,
e^{-j2\pi\left(\frac{ml}{M}-\frac{(n'-1)k}{N}\right)}\,
e^{-j2\pi m\frac{\tau_p}{T}}\, e^{j2\pi n'\frac{\nu_p}{\Delta f}}\,
e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}} \\
&\times\left(\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}\right)
\Bigg], \tag{B.9}
\end{aligned}$$

where $\alpha_p' = \alpha_p e^{-j2\pi\tau_p\nu_p}$. Substituting the obtained
$B_{\text{TF}}[m',n']$ in (B.6), we have
$$\begin{aligned}
B_{\text{DD}}[l',k'] = \frac{1}{MN}\sum_p \alpha_p'\Bigg[
&\sum_m\sum_k\sum_l\sum_{m'}\sum_{n'} A_{\text{DD}}[l,k]\,
e^{-j2\pi\left(\frac{ml}{M}-\frac{n'k}{N}\right)}\,
e^{j2\pi\left(\frac{m'l'}{M}-\frac{n'k'}{N}\right)}\,
e^{-j2\pi m\frac{\tau_p}{T}}\, e^{j2\pi n'\frac{\nu_p}{\Delta f}} \\
&\times e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\left(1+\frac{\tau_p}{T}\right)}
\left(1-\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)
\left(1-\frac{\tau_p}{T}\right)\right) \\
+&\sum_m\sum_k\sum_l\sum_{m'}\sum_{n'} A_{\text{DD}}[l,k]\,
e^{-j2\pi\left(\frac{ml}{M}-\frac{(n'-1)k}{N}\right)}\,
e^{j2\pi\left(\frac{m'l'}{M}-\frac{n'k'}{N}\right)}\,
e^{-j2\pi m\frac{\tau_p}{T}}\, e^{j2\pi n'\frac{\nu_p}{\Delta f}} \\
&\times e^{j\pi\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}}
\left(\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}\right)
\Bigg]. \tag{B.10}
\end{aligned}$$

Rearranging the terms in the above equation, we get
$$\begin{aligned}
B_{\text{DD}}[l',k'] = \frac{1}{MN}\sum_p \alpha_p'\Bigg[
&\sum_k\sum_l A_{\text{DD}}[l,k]\sum_{n'}
e^{j2\pi n'\left(\frac{k-k'}{N}+\frac{\nu_p}{\Delta f}\right)}
\sum_m\sum_{m'} e^{-j2\pi\left(\frac{ml}{M}-\frac{m'l'}{M}+\frac{m\tau_p}{T}\right)} \\
&\times e^{j\pi\left(1+\frac{\tau_p}{T}\right)\left(m-m'+\frac{\nu_p}{\Delta f}\right)}
\left(1-\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(1-\frac{\tau_p}{T}\right)
\left(m-m'+\frac{\nu_p}{\Delta f}\right)\right) \\
+&\sum_k\sum_l A_{\text{DD}}[l,k]\sum_{n'}
e^{j2\pi n'\left(\frac{k-k'}{N}+\frac{\nu_p}{\Delta f}\right)}
\sum_m\sum_{m'} e^{-j2\pi\left(\frac{ml}{M}-\frac{m'l'}{M}+\frac{m\tau_p}{T}\right)} \\
&\times e^{-j2\pi\frac{k}{N}}\,
e^{j\pi\left(\frac{\tau_p}{T}\right)\left(m-m'+\frac{\nu_p}{\Delta f}\right)}
\left(\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}\right)
\Bigg]. \tag{B.11}
\end{aligned}$$

Simplifying the above equation results in the following:
$$\begin{aligned}
B_{\text{DD}}[l',k'] = \sum_p \alpha_p'\sum_l\sum_k A_{\text{DD}}[l,k]
&\left[\frac{1}{N}\sum_{n'}
e^{j2\pi n'\left(\frac{k-k'}{N}+\frac{\nu_p}{\Delta f}\right)}\right]
\Bigg[\frac{1}{M}\sum_m\sum_{m'}
e^{-j2\pi\left(\frac{ml}{M}-\frac{m'l'}{M}+\frac{m\tau_p}{T}\right)} \\
&\times\Bigg(
e^{j\pi\left(1+\frac{\tau_p}{T}\right)\left(m-m'+\frac{\nu_p}{\Delta f}\right)}
\left(1-\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(1-\frac{\tau_p}{T}\right)
\left(m-m'+\frac{\nu_p}{\Delta f}\right)\right) \\
&\quad + e^{-j2\pi\frac{k}{N}}\,
e^{j\pi\left(\frac{\tau_p}{T}\right)\left(m-m'+\frac{\nu_p}{\Delta f}\right)}
\left(\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(m-m'+\frac{\nu_p}{\Delta f}\right)\frac{\tau_p}{T}\right)
\Bigg)\Bigg]. \tag{B.12}
\end{aligned}$$

Substituting $m' - m = s$ in the above equation yields
$$\begin{aligned}
B_{\text{DD}}[l',k'] = \sum_p \alpha_p e^{-j2\pi\tau_p\nu_p}
\sum_l\sum_k A_{\text{DD}}[l,k]
&\left[\frac{1}{N}\sum_{n'}
e^{-j2\pi n'\left(\frac{k'-k}{N}-\frac{\nu_p}{\Delta f}\right)}\right]
\Bigg[\frac{1}{M}\sum_m
e^{j\frac{2\pi}{M}m\left(l'-l-M\tau_p\Delta f\right)}
\sum_{s=-m}^{M-1-m} e^{j2\pi\frac{sl'}{M}} \\
&\times\Bigg(
e^{j\pi\left(1+\frac{\tau_p}{T}\right)\left(\frac{\nu_p}{\Delta f}-s\right)}
\left(1-\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(1-\frac{\tau_p}{T}\right)
\left(\frac{\nu_p}{\Delta f}-s\right)\right) \\
&\quad + e^{-j2\pi\frac{k}{N}}\,
e^{j\pi\left(\frac{\nu_p}{\Delta f}-s\right)\frac{\tau_p}{T}}
\left(\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(\frac{\nu_p}{\Delta f}-s\right)\frac{\tau_p}{T}\right)
\Bigg)\Bigg]. \tag{B.13}
\end{aligned}$$

Vectorizing $B_{\text{DD}}$ and $A_{\text{DD}}$ to obtain
$b_{\text{DD}}[d'] = b_{\text{DD}}[k'M + l'] = B_{\text{DD}}[l',k']$ and
$a_{\text{DD}}[d] = a_{\text{DD}}[kM + l] = A_{\text{DD}}[l,k]$, respectively,
$d, d' \in \{0, 1, \cdots, MN-1\}$, the above equation can be written as
$$\mathbf{b}_{\text{DD}} = \sum_p \alpha_p
\mathbf{E}_p(\tau_p, \nu_p)\,\mathbf{a}_{\text{DD}}, \tag{B.14}$$
which is (5.6) for the noiseless case. In the above equation,
$\mathbf{E}_p(\tau_p,\nu_p)$ is an $MN \times MN$ matrix whose entries are given by
$$\mathbf{E}_p[d',d] = e^{-j2\pi\tau_p\nu_p}\, e\, e', \tag{B.15}$$
where
$$e = \frac{1}{N}\sum_{n=0}^{N-1}
e^{-j2\pi n\left(\frac{k'-k}{N}-\frac{\nu_p}{\Delta f}\right)}, \tag{B.16}$$
$$e' = \frac{1}{M}\sum_{m=0}^{M-1}
e^{j\frac{2\pi}{M}m\left(l'-l-M\frac{\tau_p}{T}\right)}
f_{\tau_p,\nu_p,k,l'}(m), \tag{B.17}$$
and
$$\begin{aligned}
f_{\tau_p,\nu_p,k,l'}(m) = \sum_{s=-m}^{M-1-m} e^{j2\pi\frac{sl'}{M}}\Bigg[
&e^{j\pi\left(1+\frac{\tau_p}{T}\right)\left(\frac{\nu_p}{\Delta f}-s\right)}
\left(1-\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(1-\frac{\tau_p}{T}\right)
\left(\frac{\nu_p}{\Delta f}-s\right)\right) \\
&+ e^{-j2\pi\frac{k}{N}}\,
e^{j\pi\left(\frac{\nu_p}{\Delta f}-s\right)\frac{\tau_p}{T}}
\left(\frac{\tau_p}{T}\right)
\mathrm{sinc}\left(\left(\frac{\nu_p}{\Delta f}-s\right)\frac{\tau_p}{T}\right)
\Bigg], \tag{B.18}
\end{aligned}$$
which is (5.8). ■
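
The entries (B.15)-(B.18) can be evaluated directly; the following is a minimal NumPy
sketch (illustrative parameters, direct rather than optimized evaluation). Note that
np.sinc(x) computes sin(pi x)/(pi x), matching the sinc convention above.

import numpy as np

def Ep_entry(dp, d, M, N, tau_T, nu_df):
    # E_p[d', d] per (B.15)-(B.18), with tau_T = tau_p/T and nu_df = nu_p/df.
    lp, kp = dp % M, dp // M          # d' = k'M + l'
    l, k = d % M, d // M              # d  = kM + l
    n = np.arange(N)
    e = np.mean(np.exp(-1j * 2 * np.pi * n * ((kp - k) / N - nu_df)))   # (B.16)
    ep = 0.0
    for m in range(M):                # (B.17), outer sum over m
        s = np.arange(-m, M - m)      # (B.18), inner sum over s
        f = np.sum(np.exp(1j * 2 * np.pi * s * lp / M) * (
            np.exp(1j * np.pi * (1 + tau_T) * (nu_df - s)) * (1 - tau_T)
            * np.sinc((1 - tau_T) * (nu_df - s))
            + np.exp(-1j * 2 * np.pi * k / N)
            * np.exp(1j * np.pi * (nu_df - s) * tau_T) * tau_T
            * np.sinc((nu_df - s) * tau_T)))
        ep += np.exp(1j * 2 * np.pi * m * (lp - l - M * tau_T) / M) * f
    ep /= M
    # tau_p * nu_p = (tau_p/T)(nu_p/df) since T*df = 1.
    return np.exp(-1j * 2 * np.pi * tau_T * nu_df) * e * ep   # (B.15)

# Example: assemble the full MN x MN matrix for one path (small M, N).
M, N = 4, 4
Ep = np.array([[Ep_entry(dp, d, M, N, 0.3, 0.2) for d in range(M * N)]
               for dp in range(M * N)])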
List of publications from the thesis

Journal papers

1. S. R. Mattu, L. N. Theagarajan, and A. Chockalingam, “Deep channel prediction: a DNN framework for receiver design in time-varying fading channels,” IEEE Trans. Veh. Tech., vol. 71, no. 6, pp. 6439-6453, Jun. 2022.

2. S. R. Mattu and A. Chockalingam, “Learning-based channel estimation and phase noise compensation in doubly-selective channels,” IEEE Commun. Lett., vol. 26, no. 5, pp. 1052-1056, May 2022.

3. S. P. Muppaneni, S. R. Mattu, and A. Chockalingam, “Channel and radar parameter estimation with fractional delay-Doppler using OTFS,” IEEE Commun. Lett., vol. 27, no. 5, pp. 1392-1396, May 2023.

4. S. R. Mattu and A. Chockalingam, “Learning in time-frequency domain for fractional delay-Doppler channel estimation in OTFS,” accepted in IEEE Wireless Commun. Lett., doi: 10.1109/LWC.2024.3367112.

Conference papers

1. S. R. Mattu and A. Chockalingam, “Fractional delay-Doppler channel estimation in OTFS with sparse superimposed pilots using RNNs,” Proc. IEEE VTC’2023-Spring, pp. 1-6, Jun. 2023.

2. S. R. Mattu and A. Chockalingam, “An RNN based DD channel estimator for OTFS with embedded pilots,” Proc. IEEE PIMRC’2022, pp. 457-462, Sep. 2022.

3. S. R. Mattu and A. Chockalingam, “Learning based delay-Doppler channel estimation with interleaved pilots in OTFS,” Proc. IEEE VTC’2022-Fall, pp. 1-6, Sep. 2022.

4. S. P. Muppaneni, S. R. Mattu, and A. Chockalingam, “Data-aided fractional delay-Doppler channel estimation with embedded pilot frames in DZT-based OTFS,” Proc. IEEE VTC’2023-Fall, pp. 1-7, Oct. 2023.

5. S. P. Muppaneni, S. R. Mattu, and A. Chockalingam, “Delay-Doppler domain channel estimation for DZT-based OTFS systems,” Proc. IEEE SPAWC’2023, pp. 236-240, Sep. 2023.

6. S. R. Mattu and A. Chockalingam, “Delay-Doppler channel estimation in DZT-OTFS via deep learning in time-frequency domain,” submitted to IEEE VTC’2024-Spring.

Other publications (not part of this thesis)

1. S. R. Mattu, T. Lakshmi Narasimhan, and A. Chockalingam, “Autoencoder based robust transceivers for fading channels using deep neural networks,” Proc. IEEE VTC’2020-Spring, pp. 1-5, May 2020.

2. V. Yogesh, V. S. Bhat, S. R. Mattu, and A. Chockalingam, “On the bit error performance of OTFS modulation using discrete Zak transform,” Proc. IEEE ICC’2023, pp. 741-746, Jun. 2023.

3. F. Jesbin, S. R. Mattu, and A. Chockalingam, “Sparse superimposed pilot based channel estimation in OTFS systems,” Proc. IEEE WCNC’2023, pp. 1-6, Mar. 2023.

4. V. Yogesh, S. R. Mattu, and A. Chockalingam, “Low-complexity delay-Doppler channel estimation in discrete Zak transform based OTFS,” accepted in IEEE Commun. Lett., doi: 10.1109/LCOMM.2024.3351685.

5. V. Yogesh, Anagha V, S. R. Mattu, and A. Chockalingam, “On the PAPR of discrete Zak transform based OTFS modulation,” submitted to IEEE VTC’2024-Spring.

6. V. Yogesh, S. R. Mattu, and A. Chockalingam, “Iterative channel estimation/detection for DZT-OTFS using superimposed pilot frames,” submitted to IEEE VTC’2024-Spring.
Bibliography

[1] A. F. Murray, Applications of Neural Networks, Boston: Kluwer Academic Publish-


ers, 1995.

[2] A. Paszke et al., “Pytorch: an imperative style, high-performance deep learning


library,” NeurIPS’2019, pp. 1-12, Dec. 2019.

[3] Martín Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous


systems,” software available from tensorflow.org (2015).

[4] N. Van Huynh and G. Y. Li, “Transfer learning for signal detection in wireless
networks,” IEEE Wireless Commun. Lett., vol. 11, no. 11, pp. 2325-2329, Nov.
2022.

[5] H. Park, J. Kang, S. Lee, J. W. Choi, and S. Kim, “Deep Q-network based beam
tracking for mobile millimeter-wave communications,” IEEE Trans. Wireless Com-
mun., vol. 22, no. 2, pp. 961-971, Feb. 2023.

[6] I. Helmy, P. Tarafder, and W. Choi, “LSTM-GRU model-based channel prediction


for one-bit massive MIMO system,” IEEE Trans. Veh. Tech., vol. 72, no. 8, pp.
11053-11057, Aug. 2023.

[7] L. Dai, R. Jiao, F. Adachi, H. V. Poor, and L. Hanzo, “Deep learning for wireless
communications: an emerging interdisciplinary paradigm” IEEE Wireless Com-
mun., vol. 27, no. 4, pp. 133-139, Aug. 2020.


[8] G. Choi, J. Park, N. Shlezinger, Y. C. Eldar, and N. Lee, “Split-KalmanNet: a


robust model-based deep learning approach for state estimation,” IEEE Trans. Veh.
Tech., vol. 72, no. 9, pp. 12326-12331, Sep. 2023.

[9] Y. I. Tek, A. T. Dogukan, and E. Basar, “Autoencoder-based enhanced orthogonal


time frequency space modulation,” IEEE Commun. Lett., vol. 27, no. 10, pp. 2628-
2632, Oct. 2023.

[10] W. C. Jakes, Microwave Mobile Communications, New York: IEEE Press, reprinted
1994.

[11] D. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge


Univ. Press, 2005.

[12] A. Goldsmith, Wireless Communications, Cambridge Univ. Press, 2005.

[13] A. F. Molisch, Wireless Communications, John-Wiley, 2nd Ed., 2011.

[14] F. Hlawatsch and G. Mats, Wireless Communications over Rapidly Time-Varying


Channels, Academic Press, 2011.

[15] W. Su, Z. Safar, and K. J. R. Liu, “Towards maximum achievable diversity in space,
time, and frequency: performance analysis and code design,” IEEE Trans. Wireless
Commun., vol. 4, no. 4, pp. 1847-1857, Sep. 2005.

[16] Y. Gong and K. B. Letaief, “An efficient space-frequency coded OFDM system for
broadband wireless communications,” IEEE Trans. Commun., vol. 51, no. 12, pp.
2019-2029, Dec. 2003.

[17] T. Wang, J. G. Proakis, E. Masry, and J. R. Zeidler, “Performance degradation of


OFDM systems due to Doppler spreading,” IEEE Trans. Wireless Commun., vol.
5, no. 6, pp. 1422-1432, Jun. 2006.

[18] L. Tomba, “On the effect of Wiener phase noise in OFDM systems,” IEEE Trans.
Commun., vol. 46, no. 5, pp. 580-583, May 1998.

[19] D. Petrovic, W. Rave, and G. Fettweis, “Effects of phase noise on OFDM systems
with and without PLL: characterization and compensation,” IEEE Trans. Com-
mun., vol. 55, no. 8, pp. 1607-1616, Aug. 2007.

[20] Y. Zhao and S. G. Haggman, “Sensitivity to Doppler shift and carrier frequency
errors in OFDM systems - the consequences and solutions,” Proc. IEEE VTC’1996,
pp. 1-5, Apr.-May 1996.

[21] H. Minn, V. K. Bhargava, and K. B. Letaief, “A robust timing and frequency


synchronization for OFDM systems,” IEEE Trans. Wireless Commun., vol. 2, no.
4, pp. 822-839, Jul. 2003.

[22] Y. Mostofi and D. C. Cox, “Mathematical analysis of the impact of timing synchro-
nization errors on the performance of an OFDM system,” IEEE Trans. Commun.,
vol. 54, no. 2, pp. 226-230, Feb. 2006.

[23] B. Yang, K. Letaief, and R. Cheng, “Timing recovery for OFDM transmission,”
IEEE J. Sel. Areas Commun., vol. 18, no. 11, pp. 2278-2291, Nov. 2000.

[24] M. Speth, F. Classen, and H. Meyr, “Frame synchronization of OFDM systems in


frequency selective fading channels,” Proc. IEEE VTC’1997, pp. 1-5, May 1997.

[25] C. R. N. Athaudage, K. Sathananthan, and R. R. V. Angiras, “Enhanced frequency


synchronization for OFDM systems using timing error feedback compensation,”
Proc. IEEE PIMRC’2004, pp. 1-5, Sep. 2004.

[26] X. Cai, Y. C. Wu, H. Lin, and K. Yamashita, “Estimation and compensation of


CFO and I/Q imbalance in OFDM systems under timing ambiguity,” IEEE Trans.
Veh. Tech., vol. 60, no. 3, pp. 1200-1205, Mar. 2011.

[27] D. Sen, S. Chakrabarti, and R. V. R. Kumar, “A multi-band timing estimation and


compensation scheme for ultra-wideband communications,” Proc. IEEE GLOBE-
COM’2008, pp. 1-5, Nov.-Dec. 2008.

[28] S. H. Han and J. H. Lee, “PAPR reduction of OFDM signals using a reduced
complexity PTS technique,” IEEE Sig. Proc. Lett., vol. 11, no. 11, pp. 887-890,
Nov. 2004.

[29] A. M. Rateb and M. Labana, “An optimal low complexity PAPR reduction tech-
nique for next generation OFDM systems,” IEEE Access, vol. 7, pp. 16406-16420,
Jan. 2019.

[30] M. Kim, W. Lee, and D. H. Cho, “A novel PAPR reduction scheme for OFDM
system based on deep learning,” IEEE Commun. Lett., vol. 22, no. 3, pp. 510-513,
Mar. 2018.

[31] A. Monk, R. Hadani, M. Tsatsanis, and S. Rakib, “OTFS - orthogonal time fre-
quency space: a novel modulation technique meeting 5G high mobility and massive
MIMO challenges,” online: arXiv:1608.02993 [cs.IT] 9 Aug 2016.

[32] R. Hadani, S. Rakib, M. Tsatsanis, A. Monk, A. J. Goldsmith, A. F. Molisch,


and R. Calderbank, “Orthogonal time frequency space modulation,” Proc. IEEE
WCNC’2017, pp. 1-7, Mar. 2017.

[33] R. Hadani and A. Monk, “OTFS: a new generation of modulation addressing the
challenges of 5G,” online: arXiv:1802.02623 [cs.IT] 7 Feb 2018.

[34] R. Hadani, S. Rakib, S. Kons, M. Tsatsanis, A. Monk, C. Ibars, J. Delfeld, Y.


Hebron, A. J. Goldsmith, A. F. Molisch, and R. Calderbank, “Orthogonal time
frequency space modulation,” online: arXiv:1808.00519v1 [cs.IT] 1 Aug 2018.

[35] P. Bello, “Characterization of randomly time-variant linear channels,” IEEE Trans.


Commun. Sys., vol. 11, no. 4, pp. 360-393, Dec. 1963.

[36] W. Schempp, “Radar ambiguity functions, the Heisenberg group, and holomorphic
theta series,” Proc. Amer. Math. Soc., vol. 92, Sep. 1984.

[37] J. Zak, “Finite translation in solid-state physics,” Physical Review Lett., vol. 19, no.
24, Dec. 1967.

[38] A. J. E. M. Janssen, “The Zak transform: a signal transform for sampled time-
continuous signals,” Philips J. Res., 43, pp. 23-69, 1988.

[39] H. Bolcskei and F. Hlawatsch, “Discrete Zak transforms, polyphase transforms, and
applications,” IEEE Trans. Signal Proc., vol. 45, no. 4, pp. 851-866, Apr. 1997.

[40] F. Lampel, A. Avarado, and F. M. J. Willems, “On OTFS using the discrete Zak
transform,” Proc. IEEE ICC’2022 Workshops, pp. 729-734, May 2022.

[41] R. Horn and C. Johnson, Matrix Analysis, Cambridge Univ. Press, 2013.

[42] Y. S. Cho, J. Kim, W. Y. Yang, and C. G. Kang, MIMO-OFDM Wireless Commu-


nications with MATLAB, Wiley-IEEE Press, 2011.

[43] G. Casella and R. L. Berger, Statistical Inference, Duxbury Press, Pacific Grove,
2002.

[44] M.-H. Hsieh and C.-H. Wei, “Channel estimation for OFDM systems based on
comb-type pilot arrangement in frequency selective fading channels,” IEEE Trans.
Consumer Electronics, vol. 44, no. 1, pp. 217-225, Feb. 1998.

[45] S. Coleri, M. Ergen, A. Puri, and A. Bahai, “Channel estimation techniques based
on pilot arrangement in OFDM systems,” IEEE Trans. Broadcasting, vol. 48, no. 3,
pp. 223-229, Sep. 2002.

[46] R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Artech
House Publishers, 2000.

[47] A. L. Samuel, “Some studies in machine learning using the game of checkers,” IBM
Journal of Research and Development, vol. 3, no. 3, pp. 210-229, Jul. 1959.

[48] C. M. Bishop, Pattern Recognition and Machine Learning, vol. 4. no. 4. Springer,
2006.

[49] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, Wiley, 2001.

[50] K. Pearson, “On lines and planes of closest fit to systems of points in space,” Philo-
sophical Magazine, vol. 2, no. 11, pp. 559–572, 1901.

[51] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous
activity,” Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, Dec. 1943.

[52] B. Widrow and M. E. Hoff, “Adaptive switching circuits,” IRE WESCON conven-
tion record, vol. 4, no. 1, 1960.

[53] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain


Mechanisms, Spartan books, 1962.

[54] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing: Explorations


in the Microstructure of Cognition: Foundations, MIT Press, 1987.

[55] A. G. Ivakhnenko and V.G. Lapa, Cybernetic Predicting Devices, Joint Publications
Research Service, 1965.

[56] S. -I. Amari, “Learning patterns and pattern sequences by self-organizing nets of
threshold elements,” IEEE Trans. Computers, vol. C-21, no. 11, pp. 1197-1206, Nov.
1972.

[57] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation,


vol. 9, no. 8, pp. 1735-1780, Nov. 1997.

[58] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recur-
rent neural networks on sequence modeling,” online: arXiv:1412.3555 [cs.LG], 11
Dec 2014.

[59] K. Fukushima, “Neocognitron: a self-organizing neural network model for a mecha-


nism of pattern recognition unaffected by shift in position,” Biol. Cybernetics, vol.
36, pp. 193–202, Apr. 1980.

[60] S. Linnainmaa, “Taylor expansion of the accumulated rounding error,” BIT Numer-
ical Mathematics, vol. 16, pp. 146–160, Jun. 1976.

[61] S. R. Mattu, L. N. Theagarajan, and A. Chockalingam, “Deep channel prediction: a


DNN framework for receiver design in time-varying fading channels,” IEEE Trans.
Veh. Tech., vol. 71, no. 6, pp. 6439-6453, Jun. 2022.

[62] R. J. Williams and D. Zipser, “A learning algorithm for continually running fully
recurrent neural networks,” Neural Computation, vol. 1, no. 2, pp. 270-280, Jun.
1989.

[63] R. Wang, H. Mehrpouyan, M. Tao, and Y. Hua, “Channel estimation, carrier re-
covery, and data detection in the presence of phase noise in OFDM relay systems,”
IEEE Trans. Wireless Commun., vol. 15, no. 2, pp. 1186-1205, Feb. 2016.

[64] Q. Zou, A. Tarighat, and A. H. Sayed, “Compensation of phase noise in OFDM


wireless systems,” IEEE Trans. Signal Proc., vol. 55, no. 11, pp. 5407-5424, Nov.
2007.

[65] S. R. Mattu and A. Chockalingam, “Learning-based channel estimation and phase


noise compensation in doubly-selective channels,” IEEE Commun. Lett., vol. 26, no.
5, pp. 1052-1056, May 2022.

[66] S. R. Mattu and A. Chockalingam, “An RNN based DD channel estimator for OTFS
with embedded pilots,” Proc. IEEE PIMRC’2022, pp. 457-462, Sep. 2022.

[67] M. K. Ramachandran and A. Chockalingam, “MIMO-OTFS in high-Doppler fading


channels: signal detection and channel estimation,” Proc. IEEE GLOBECOM’2018,
pp. 206-212, Dec. 2018.

[68] P. Raviteja, K. T. Phan, and Y. Hong, “Embedded pilot-aided channel estimation


for OTFS in delay-Doppler channels,” IEEE Trans. Veh. Tech., vol. 68, no. 5, pp.
4906-4917, May 2019.

[69] S. R. Mattu and A. Chockalingam, “Learning based delay-Doppler channel estima-


tion with interleaved pilots in OTFS,” Proc. IEEE VTC’2022-Fall, pp. 1-6, Sep.
2022.

[70] S. R. Mattu and A. Chockalingam, “Fractional delay-Doppler channel estimation in


OTFS with sparse superimposed pilots using RNNs,” Proc. IEEE VTC’2023-Spring,
pp. 1-6, Jun. 2023.

[71] H. B. Mishra, P. Singh, A. K. Prasad, and R. Budhiraja, “OTFS channel estima-


tion and data detection designs with superimposed pilots,” IEEE Trans. Wireless
Commun., vol. 21, no. 4, pp. 2258-2274, Apr. 2022.

[72] S. R. Mattu and A. Chockalingam, “Learning in time-frequency domain for fractional


delay-Doppler channel estimation in OTFS,” accepted in IEEE Wireless Commun.
Lett., doi: 10.1109/LWC.2024.3367112.

[73] S. R. Mattu and A. Chockalingam, “Delay-Doppler channel estimation in DZT-


OTFS via deep learning in time-frequency domain,” submitted to IEEE VTC’2024-
Spring.

[74] S. P. Muppaneni, S. R. Mattu, and A. Chockalingam, “Channel and radar parameter


estimation with fractional delay-Doppler using OTFS,” IEEE Commun. Lett., vol.
27, no. 5, pp. 1392-1396, May 2023.

[75] R. H. Clarke, “A statistical theory of mobile-radio reception,” The Bell Syst. Tech.
Journ., vol. 47, no. 6, pp. 957-1000, Jul.-Aug. 1968.

[76] J. I. Smith, “A computer generated multipath fading simulation for mobile radio,”
IEEE Trans. Veh. Tech., vol. 24, no. 3, pp. 39-40, Aug. 1975.

[77] H. L. Van Trees, Optimum Array Processing: Part IV of Detection, Estimation, and
Modulation Theory, John Wiley & Sons, 2004.

[78] S. Lawrence and C. L. Giles, “Overfitting and neural networks: conjugate gradi-
ent and backpropagation,” Proc. IEEE-INNS-ENNS Intl. Joint Conf. on Neural
Networks, pp. 114-119, Jul. 2000.

[79] G. U. Yule, “On a method of investigating periodicities in disturbed series, with special
reference to Wolfer’s sunspot numbers,” Phil. Trans. of the Royal Soc. of London,
Cont. Papers of a Math. or Phy. Char., pp. 267-298, Jan. 1927.

[80] G. T. Walker, “On periodicity in series of related terms,” Proc. Royal Soc. of Lon-
don, Cont. Papers of a Math. and Phy. Char., pp. 518-532, Jun. 1931.

[81] 3GPP TS 36.104 V16.6.0 (2020-07). “Evolved Universal Terrestrial Radio Access
(E-UTRA); Base Station (BS) Radio Transmission and Reception”, 3rd Generation
Partnership Project; Technical Specification Group Radio Access Network.

[82] 3GPP TS 36.101 V16.7.0 (2020-12). “Evolved Universal Terrestrial Radio Access (E-
UTRA); User Equipment (UE) Radio Transmission and Reception,” 3rd Generation
Partnership Project; Technical Specification Group Radio Access Network.

[83] R. W. Chang, “Synthesis of band-limited orthogonal signals for multichannel data


transmission,” The Bell Syst. Tech. Journ., vol. 45, no. 10, pp. 1775-1796, Dec. 1966.

[84] F. Pancaldi, G. M. Vitetta, R. Kalbasi, N. Al-Dhahir, M. Uysal, and H. Mheidat,


“Single-carrier frequency domain equalization,” IEEE Signal Proc. Mag., vol. 25,
no. 5, pp. 37-56, Sep. 2008.

[85] N. Shlezinger, N. Farsad, Y. C. Eldar and A. J. Goldsmith, “ViterbiNet: a deep


learning based Viterbi algorithm for symbol detection,” IEEE Trans. Wireless Com-
mun., vol. 19, no. 5, pp. 3319-3331, May 2020.

[86] D. Madhubabu and A. Thakre, “Long-short term memory based channel prediction
for SISO system,” Proc. IEEE Intl. Conf. on Commun. and Elec. Sys. (ICCES),
pp. 1-5, Jul. 2019.

[87] L. Smaini, RF Analog Impairments Modeling for Communication Systems Simula-


tion: Application to OFDM-Based Transceivers, Wiley, 2012.

[88] A. Leshem and M. Yemini, “Phase noise compensation for OFDM systems,” IEEE
Trans. Signal Proc., vol. 65, no. 21, pp. 5675-5686, Nov. 2017.

[89] 3GPP TS 36.211 V14.5.0 (2018-01), “Evolved Universal Terrestrial Radio Access (E-
UTRA); Physical Channels and Modulation,” 3GPP; Technical Specification Group
Radio Access Network.

[90] ITU-R M.1225, “Guidelines for the evaluation of radio transmission technologies for
IMT-2000,” International Telecommunication Union Radio communication, 1997.

[91] A. Mohammadian, C. Tellambura, and G. Y. Li, “Deep learning-based phase noise


compensation in multicarrier systems,” IEEE Wireless Commun. Lett., vol. 10, no.
10, pp. 2110-2114, Oct. 2021.

[92] K. R. Murali and A. Chockalingam, “On OTFS modulation for high-Doppler fading
channels,” Proc. ITA, pp. 1-10, Feb. 2018.

[93] P. Raviteja, K. T. Phan, Y. Hong, and E. Viterbo, “Interference cancellation and


iterative detection for orthogonal time frequency space modulation,” IEEE Trans.
Wireless Commun., vol. 17, no. 10, pp. 6501-6515, Oct. 2018.

[94] S. Coleri, M. Ergen, A. Puri, and A. Bahai, “Channel estimation techniques based
on pilot arrangement in OFDM systems,” IEEE Trans. Broadcasting, vol. 48, no. 3,
pp. 223-229, Sep. 2002.

[95] R. C. Staudemeyer and E. R. Morris, “Understanding LSTM – a tutorial into long


short-term memory recurrent neural networks,” online: arXiv:1909.09586 [cs.NE]
12 Sep 2019.

[96] J. Han et al., “An empirical study of the dependency networks of deep learning
libraries,” Proc. IEEE ICSME’2020, pp. 868-878, Nov. 2020.

[97] R. Jain, “Channel models: a tutorial,” WiMAX Forum AATG, vol. 10, Dept. CSE,
Washington Univ., St. Louis, 2007.

[98] I. A. Khan and S. K. Mohammed, “Low complexity channel estimation for OTFS
modulation with fractional delay and Doppler,” online: arXiv:2111.06009 [cs.IT], 11
Nov 2021.

[99] Z. Wang, L. Liu, Y. Yi, R. Calderbank, and J. Zhang, “Low-complexity channel


matrix calculation for OTFS systems with fractional delay and Doppler,” Proc.
IEEE MILCOM’2022, pp. 787-792, Nov. 2022.
