Channel Estimation and Hybrid Precoding For Millimeter Wave Communications A Deep Learning-Based Approach

Received August 10, 2021, accepted August 25, 2021, date of publication August 30, 2021, date of current
version September 8, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3108625
Channel Estimation and Hybrid Precoding for Millimeter Wave Communications: A Deep Learning-Based Approach
QIUJIN LU, TIAN LIN (Graduate Student Member, IEEE), AND YU ZHU (Member, IEEE)
State Key Laboratory of ASIC and System, School of Information Science and Technology, Fudan University, Shanghai 200433, China
Corresponding author: Yu Zhu (zhuyu@fudan.edu.cn)
This work was supported by the National Natural Science Foundation of China under Grant 61771147.
ABSTRACT Hybrid analog and digital beamforming (HBF) has been regarded as a key technology for future
millimeter wave (mmWave) communication systems due to its ability to obtain a good trade-off between
achievable beamforming gain and hardware cost. In this paper, we investigate the channel estimation and
hybrid precoding for mmWave MIMO systems with deep learning. We adopt the hierarchical codebook
based algorithm for channel estimation as it requires a limited number of pilot transmissions, and enhance its
performance by proposing a new codebook design algorithm based on manifold optimization (MO). With
the estimated channel state information (CSI) as the input, we develop a robust HBF network (HBF-Net)
by applying convolutional layers and attention mechanism, which can be trained to generate a robust HBF
matrix targeting spectral efficiency maximization with imperfect CSI. To further improve the performance,
we propose a joint channel estimation and HBF optimization network (CE-HBF-Net). Considering that the
adaptively selected HBF vectors in the hierarchical codebook based channel estimation are different for
different channel realizations, we propose an index assign-and-input method to efficiently feed such
information to the CE-HBF-Net to reduce the network input dimensions and make the network trainable.
Furthermore, we propose a signal self-attention mechanism to enable the CE-HBF-Net to intelligently assign
larger weight coefficients to those signals that contribute more to channel estimation. Simulation results
show that the well-designed HBF-Net and CE-HBF-Net outperform the conventional HBF algorithms under
imperfect CSI and exhibit robustness to mismatches between offline training and online deployment
stages.
INDEX TERMS Millimeter wave (mmWave), channel estimation, hierarchical codebook, manifold optimization (MO), hybrid beamforming (HBF), deep learning (DL), attention mechanism.
I. INTRODUCTION

Hybrid analog and digital beamforming (HBF) is a promising technology to balance the beamforming gain, the hardware cost, and the power consumption for future massive multiple-input and multiple-output (MIMO) millimeter wave (mmWave) communication systems. By separating the whole beamformer into a low-dimensional baseband digital one and a high-dimensional analog one implemented with phase shifters, the HBF architecture can significantly reduce the number of radio frequency (RF) chains but still guarantee a sufficient beamforming gain [1]–[4].

The associate editor coordinating the review of this manuscript and approving it for publication was Cunhua Pan.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
120924 VOLUME 9, 2021
Q. Lu et al.: Channel Estimation and Hybrid Precoding for mmWave Communications

A. RELATED WORKS AND MOTIVATIONS

To deal with the complicated HBF optimization problem with non-convex constraints, several algorithms have been proposed recently. In [1], the authors selected analog beamformers from a pre-defined codebook via the orthogonal matching pursuit (OMP) method. Subsequently, the authors in [2]–[4] proposed iterative algorithms to further improve performance. Although these HBF algorithms can deal with the multivariate optimization problem with the non-convex constant modulus constraint on the analog beamforming matrix, they either require approximations to simplify the original objective function or require many time-consuming serial iterations to obtain a solution. Recently, intelligent communications with the power of deep learning (DL) have received much attention and shown many benefits in tackling intractable conventional physical level problems [5]–[11]. Some existing works applied DL to the beamforming optimization problem [12]–[18]. In [12], the authors proposed a neural network (NN) to select the analog beamformer from a pre-defined codebook for maximizing the spectral efficiency. However, the restriction of the solution space might result in inevitable performance loss. In [14], the authors designed an NN which can directly output the optimized analog beamformer satisfying the constant modulus constraint. The authors in [15] proposed a deep NN (DNN) framework to construct an auto-precoder. The authors in [17] unfolded the gradient ascent beamforming algorithm with a residual neural network. However, most of the above works assumed that perfect channel state information (CSI) is available to form the beamformers, which may not be satisfied in practical systems.

As perfect CSI is not available in practical systems, channel estimation is usually carried out before HBF optimization. Many channel estimation algorithms with the HBF architecture have been proposed [19]–[30], including those in [19]–[22] that apply the DL method. The authors in [19] regarded the channel as a 2D image and used image processing technology and a convolutional neural network (CNN) to achieve super-resolution image restoration and channel reconstruction. The authors in [20] used residual denoising techniques to estimate noise and used a CNN to estimate the channel. The authors in [21] proposed to use NNs to learn the basic characteristics of the channel from low-rank measurements and map the characteristics to a channel matrix. The authors in [22] proposed a DL-based channel estimation algorithm in a time-varying Rayleigh fading channel to dynamically track the CSI. The authors in [23]–[25] converted channel estimation into a sparse signal recovery problem. In [23], the authors achieved sparse signal recovery and finally achieved channel estimation with the OMP algorithm. In [24], the authors proposed a structured random sensing matrix with robustness in a low signal to noise ratio (SNR) range. The authors in [25] proposed a compressive channel estimation framework based on multiple measurement vectors and reduced the computational cost. The hierarchical codebook based channel estimation (HC-CE) algorithm [26]–[28] is another attractive method as it utilizes the sparse prior knowledge of mmWave channels and requires a small number of pilot transmissions. The authors in [26] designed a hierarchical codebook and proposed an adaptive channel estimation algorithm with the pre-designed codebook in both the single-path case and the multi-path case. The authors in [27] compared the exhaustive search and hierarchical search from different aspects. In [28], the authors designed a hierarchical codebook by exploiting beam widening with the sub-array technique.

Since the HC-CE algorithm has been well studied, we take it as a practical channel estimation algorithm to obtain a channel estimate. It normally includes three basic steps: codebook design, hierarchical search, and channel reconstruction. Fig. 1(a) shows a typical block diagram of the conventional HBF design with imperfect channel estimation. However, there are mainly three problems in the conventional design that can be improved.
1) The design of the hierarchical codebook is critical to channel estimation. The conventional HBF hierarchical codebook requires a large number of RF chains to achieve the desired beam pattern. It is necessary to design a hierarchical codebook with good beam response and fewer RF chains.
2) Most of the existing works on HBF optimization assumed that perfect CSI is available to form the beamformers, which cannot be satisfied in practical systems. It is necessary to investigate the robust HBF design considering the effect of channel estimation errors. As DL has been widely used in communication systems to achieve good performance [12]–[18], it motivates us to design robust HBF with DL.
3) In the conventional design, channel estimation and HBF optimization are usually carried out in series with different goals. For example, the objective of mean square error (MSE) minimization is usually taken for channel estimation and that of spectral efficiency maximization for HBF optimization. Thus, a channel estimate with a lower MSE may not always result in an HBF solution with a higher spectral efficiency. Therefore, it is meaningful to jointly optimize channel estimation and HBF with only one objective. There have been some works that jointly optimize multiple communication modules and achieve competitive performance [31]–[33]. These works motivate us to apply DL to jointly optimize channel estimation and HBF.

FIGURE 1. Block diagrams of HBF designs with imperfect channel estimation. (a) Conventional channel estimation and HBF design; (b) Proposed MO codebook for HC-CE enhancement and robust DL-based HBF design for spectral efficiency improvement; (c) Proposed DL-based joint channel estimation and HBF optimization.

B. CONTRIBUTIONS AND PAPER ORGANIZATION

To deal with the above problems in the conventional design, we have made the following improvements, which are also highlighted in red in Fig. 1(b) and (c). Our contributions can be summarized as follows:
1) Manifold Optimization (MO) Based Codebook: Different from the conventional hierarchical codebook design approach [26], where the HBF codebook is optimized with the objective of approaching a digital beamforming (DBF) codebook, in our approach, we optimize
each HBF codeword by directly targeting the expected beam response and applying MO [2] to deal with the constant modulus constraint in the analog beamformer. The proposed MO codebook is shown to achieve better beam response and better performance via simulation.
2) Robust HBF Design Based on DL: We propose a DL-based robust HBF optimization network, i.e., HBF-Net. As shown in Fig. 1(b), the input of the HBF-Net is the estimated channel, and the output is the optimized hybrid precoder. The HBF-Net adopts convolutional layers and a depth attention mechanism to efficiently exploit the characteristics of the imperfect CSI. By setting the loss function of the HBF-Net to the negative of the average spectral efficiency over all training samples with perfect CSI during the training stage, it can be trained to approach the ideal spectral efficiency (the one without estimation errors) as much as possible and thus exhibits robustness to channel estimation errors.
3) Joint Channel Estimation and HBF Network: We further propose a novel NN for joint channel estimation and HBF optimization, which is referred to as CE-HBF-Net. As shown in Fig. 1(c), the CE-HBF-Net does not take the estimated channel as its input like that in Fig. 1(b), but takes the intermediate data in the hierarchical search process. Thus, the CE-HBF-Net has the ability of joint channel estimation and HBF optimization, and is expected to exhibit stronger robustness and achieve higher spectral efficiency than the HBF-Net. As the intermediate data must include not only the information of which HBF codewords are selected in the hierarchical search process but also the resulting signals after HBF, in order to ensure that the CE-HBF-Net is trainable, we propose an index assign-and-input (IAI) method to compress the dimensions of the intermediate data. In addition to applying convolutional layers and the depth attention mechanism like that in the HBF-Net, we specially adopt a signal self-attention mechanism in the design of the CE-HBF-Net structure for better performance.

The rest of the paper is organized as follows. We introduce the system model and the HBF problem formulation in Section II. In Section III, we briefly review the procedure of the conventional HC-CE algorithm, and then optimize the hierarchical codebook with MO. The basic ideas and implementation details of the HBF-Net and the CE-HBF-Net are presented in Section IV. We demonstrate various simulation results in Section V. Finally, we conclude the paper in Section VI.

We use the following notation throughout this paper: $\mathbf{A}$ is a matrix, $\mathbf{a}$ is a vector, and $a$ is a scalar. $\mathbf{I}_P$ is a $P \times P$ identity matrix. $\|\mathbf{a}\|$ denotes the norm of $\mathbf{a}$. $|\mathbf{A}|$ denotes the determinant of $\mathbf{A}$ and $\|\mathbf{A}\|_F$ denotes its Frobenius norm. $\mathbf{A}^*$, $\mathbf{A}^H$, $\mathbf{A}^{-1}$ are the complex conjugate, complex conjugate transpose, and inverse of $\mathbf{A}$, respectively. $\mathbf{A}^\dagger$ denotes the pseudo-inverse of $\mathbf{A}$, i.e., $\mathbf{A}^\dagger = (\mathbf{A}^H \mathbf{A})^{-1} \mathbf{A}^H$. Finally, if $J(\mathbf{A})$ represents a real function of a complex matrix $\mathbf{A}$, then $\nabla J(\mathbf{A})$ denotes the conjugate gradient of $J(\mathbf{A})$ with respect to $\mathbf{A}$, i.e., $\nabla J(\mathbf{A}) = \partial J(\mathbf{A}) / \partial \mathbf{A}^*$.

II. SYSTEM MODEL

FIGURE 2. Diagram of the downlink of a typical MIMO system with hybrid precoding.

Consider the downlink of a typical narrowband MIMO mmWave system shown in Fig. 2, where a base station (BS) equipped with $N_t$ antennas and $N_{\mathrm{RF}}$ RF chains transmits $N_s$ data streams ($N_t \ge N_{\mathrm{RF}} \ge N_s$) to a user equipped with $N_r$ antennas. For ease of presentation, in this paper, we focus on the design of the hybrid precoder and assume that the user uses a fully DBF architecture with $N_r = N_s$. The extension to the design of the hybrid combiner is similar to that of the hybrid precoder. At the BS, denote the $N_s \times 1$ original symbol vector by $\mathbf{s}$ with normalized average symbol energy, i.e., $E\{\mathbf{s}\mathbf{s}^H\} = \mathbf{I}_{N_s}$. After hybrid precoding with an $N_{\mathrm{RF}} \times N_s$ DBF matrix $\mathbf{V}_B$ and an $N_t \times N_{\mathrm{RF}}$ analog one $\mathbf{V}_{\mathrm{RF}}$ implemented using phase shifters, from the equivalent baseband representation point of view, the precoded signal can be represented as $\mathbf{x} = \mathbf{V}_{\mathrm{RF}} \mathbf{V}_B \mathbf{s}$. Assume that $\mathbf{x}$ is transmitted through an mmWave MIMO channel $\mathbf{H}$, and the received signal vector at the user is processed with an $N_r \times N_s$ fully digital combiner $\mathbf{W}_B$. The combined signal can be represented as $\mathbf{y} = \mathbf{W}_B^H \mathbf{H} \mathbf{x} + \mathbf{W}_B^H \mathbf{n}$, where $\mathbf{n}$ denotes the additive noise vector satisfying the complex circularly symmetric Gaussian distribution with zero mean and covariance matrix $\sigma^2 \mathbf{I}_{N_r}$, i.e., $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_r})$. In this paper, a widely-used mmWave channel model in [3], [26] is applied, which is

$$\mathbf{H} = \sqrt{\frac{N_t N_r}{L}} \sum_{\ell=1}^{L} \alpha_\ell \, \mathbf{a}_r(\theta_\ell) \, \mathbf{a}_t^H(\phi_\ell), \tag{1}$$
where $\alpha_\ell$ denotes the complex gain of the $\ell$-th path, for $\ell = 1, \ldots, L$, and $\mathbf{a}_t(\phi_\ell)$ and $\mathbf{a}_r(\theta_\ell)$ denote the antenna array response vectors at the BS and the user, respectively, with $\phi_\ell$ and $\theta_\ell$ denoting the angle of departure (AoD) and the angle of arrival (AoA) associated with the $\ell$-th path, respectively.

Similar to [1], [3], we take the spectral efficiency as the performance metric. Under the assumption that the optimal fully digital receiver is adopted at the user side with $N_r = N_s$, the objective can be expressed as

$$R = \log \left| \mathbf{I}_{N_s} + \frac{\mathbf{H} \mathbf{V}_{\mathrm{RF}} \mathbf{V}_B \mathbf{V}_B^H \mathbf{V}_{\mathrm{RF}}^H \mathbf{H}^H}{\sigma^2} \right|. \tag{2}$$

The hybrid precoding optimization problem can be formulated as follows

$$\begin{aligned} \max_{\mathbf{V}_{\mathrm{RF}}, \mathbf{V}_B} \quad & R \\ \text{s.t.} \quad & \left| [\mathbf{V}_{\mathrm{RF}}]_{ij} \right| = 1, \ \forall i, j \\ & \| \mathbf{V}_{\mathrm{RF}} \mathbf{V}_B \|_F \le 1, \end{aligned} \tag{3}$$

where the first constant modulus constraint comes from the implementation of phase shifters for analog beamforming, and the second constraint denotes the total transmit power constraint with a normalized power.

III. CHANNEL ESTIMATION FOR mmWave CHANNELS

Channel estimation for the mmWave system with the HBF structure is different from the conventional one with the fully DBF structure mainly due to the fact that one should estimate the high-dimensional air-interface MIMO channel based on the low-dimensional observation signal space. In this section, we first briefly review the conventional HC-CE algorithm, and then further propose an enhanced codebook with MO for better channel estimation.

A. HIERARCHICAL CODEBOOK BASED CHANNEL ESTIMATION ALGORITHM

The basic idea of the HC-CE algorithm is to design a hierarchical codebook with the beamwidth becoming narrower for higher beam-vector levels. Using the hierarchical codebook, the HC-CE algorithm can gradually narrow the promising angle range and finally get the estimated channel with a small number of pilot transmissions.

Assume that the number of multiple channel paths, $L$, is known at the BS.¹ We denote the whole codebook by $\mathcal{F}_S$, where $S$ is the number of levels. Assume that each AoD range at a certain level of the codebook is divided further into $K$ sub-ranges at the next level. At the first level, the full AoD range is divided equally into $KL$ exclusive sub-ranges. Then, the total number of sub-ranges in the last level (i.e., the $S$-th level) is $N = L K^S$, which determines the resolution of the codebook. Fig. 3 shows a BS codebook $\mathcal{F}_5$ for a uniform linear antenna array (ULA) with the full AoD range $[0, \pi]$, where $\mathbf{f}_{(s,n)}$ denotes the $n$-th codeword of the $s$-th level. To estimate an $L$-path channel, the HC-CE algorithm needs $L$ outer iterations. In each one, an estimated AoD/AoA is obtained after $S$ inner iterations going through the BS hierarchical codebook from the top level to the bottom level. After computing each path gain by the least square algorithm, the estimated channel can be reconstructed with (1). For more details of the hierarchical search process, please refer to [26].

¹ In practice, $L$ can be estimated by classical direction-of-arrival (DOA) estimation methods, e.g., multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariant techniques (ESPRIT).

FIGURE 3. Hierarchical codebook $\mathcal{F}_5$ with $L = 3$, $K = 2$, and $N = 96$.

B. HIERARCHICAL CODEBOOK ENHANCEMENT WITH MO

The accuracy of channel estimation highly depends on the hierarchical codebook. Ideally, choosing a codeword means testing the corresponding angle range. However, the HBF codebook suffers from beam power leakage and fluctuation due to hardware limitations, which leads to beam alignment deviation, and the received power is not only determined by the path gain in the test angle range. In the two-step HBF codebook design algorithm in [26], [34], for each HBF vector in the codebook, a DBF vector satisfying the expected beam response requirement is first designed, and then a corresponding HBF vector is generated to approach the DBF vector. However, this algorithm requires a large number of RF chains to guarantee the DBF vector approximation accuracy. In this subsection, we propose a new HBF codebook design algorithm, where every HBF vector is optimized with fewer RF chains by directly targeting the expected beam response objective with MO.

Let $\bar{\phi}$ denote the cosine value of an AoD. We design an $S$-level hierarchical codebook by dividing the $\bar{\phi}$ range instead of the AoD range in [26]. Take Fig. 3 as an example, the full AoD range
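is $[0, \pi]$, i.e., $\bar{\phi} \in [-1, 1]$. Each codeword design goal can be expressed as

$$\mathbf{f}_{(s,n)}^H \mathbf{a}_{\mathrm{BS}}(\bar{\phi}_u) = \begin{cases} 1, & \text{if } \bar{\phi}_u \text{ lies in the range of codeword } (s,n) \\ 0, & \text{otherwise,} \end{cases} \tag{4}$$

where $\mathbf{a}_{\mathrm{BS}}(\bar{\phi}_u)$ is the antenna array response vector associated with $\bar{\phi}_u$, $s = 1, \cdots, S$ and $n = 1, \cdots, LK^s$. As a result, the codeword $\mathbf{f}_{(s,n)}$ is designed to achieve a constant projection on the array response vectors within the expected directions and zero projection on the other directions. By uniformly quantizing the range and ignoring the quantization error, (4) can be rewritten as

$$\mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF},(s,n)} \mathbf{f}_{\mathrm{BB},(s,n)} = \mathbf{g}_{(s,n)}, \tag{5}$$

where $\mathbf{D}_{\mathrm{BS}} = \left[ \mathbf{a}_{\mathrm{BS}}(\bar{\phi}_1), \ldots, \mathbf{a}_{\mathrm{BS}}(\bar{\phi}_u), \ldots, \mathbf{a}_{\mathrm{BS}}(\bar{\phi}_N) \right]$ defines an $N_{\mathrm{BS}} \times N$ BS AoD dictionary matrix with $N \ge N_{\mathrm{BS}}$, $\bar{\phi}_u = -1 + 2 \cdot (u - 1)/N$ for $u = 1, \cdots, N$, and $\mathbf{g}_{(s,n)}$ is an $N \times 1$ vector with ones in locations $u$ if $\bar{\phi}_u$ lies in the range of codeword $(s,n)$ and zeros otherwise. For example, in Fig. 3, $\mathbf{g}_{(1,1)}$ is a $96 \times 1$ vector where the first 32 elements are ones, and the other elements are zeros.

Since for any $s$ and $n$, the codeword design method is the same, for simplicity, we omit the subscript $(s, n)$ in the following derivation. The optimization problem becomes

$$\begin{aligned} \min_{\mathbf{F}_{\mathrm{RF}}, \mathbf{f}_{\mathrm{BB}}} \quad & \left\| \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}} \mathbf{f}_{\mathrm{BB}} - \mathbf{g} \right\|^2 \\ \text{s.t.} \quad & \left| [\mathbf{F}_{\mathrm{RF}}]_{ij} \right| = 1, \ \forall i, j \\ & \| \mathbf{F}_{\mathrm{RF}} \mathbf{f}_{\mathrm{BB}} \| \le 1. \end{aligned} \tag{6}$$

To solve this problem, we first ignore the total power constraint temporarily and obtain the optimal DBF vector given a fixed analog beamforming matrix $\mathbf{F}_{\mathrm{RF}}$ with the least square algorithm, which is given by

$$\mathbf{f}_{\mathrm{BB}} = (\mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}})^\dagger \mathbf{g}. \tag{7}$$

By substituting (7) back into (6) and defining $J(\mathbf{F}_{\mathrm{RF}}) \triangleq \left\| \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}} (\mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}})^\dagger \mathbf{g} - \mathbf{g} \right\|^2$, the problem in (6) becomes

$$\begin{aligned} \min_{\mathbf{F}_{\mathrm{RF}}} \quad & J(\mathbf{F}_{\mathrm{RF}}) \\ \text{s.t.} \quad & \left| [\mathbf{F}_{\mathrm{RF}}]_{ij} \right| = 1, \ \forall i, j. \end{aligned} \tag{8}$$

Although the problem in (8) is still difficult to solve due to the non-convex constant modulus constraint, we can use MO [2] to obtain a suboptimal solution. This is because the constant modulus constraint defines a Riemannian manifold and MO uses the Riemannian gradient descent algorithm to ensure that the searched point always satisfies the constant modulus constraint. The Riemannian gradient is defined as the projection of the Euclidean conjugate gradient onto the tangent space of a point on the Riemannian manifold. The challenge of applying MO is to derive the Euclidean conjugate gradient. According to [4], the differential of $J(\mathbf{F}_{\mathrm{RF}})$ can be expressed as

$$d\left(J(\mathbf{F}_{\mathrm{RF}})\right) = \mathrm{tr}\left( \nabla J(\mathbf{F}_{\mathrm{RF}}) \, d(\mathbf{F}_{\mathrm{RF}}^H) \right), \tag{9}$$

where $d(\cdot)$ denotes the differential operation. According to the differential rule of complex matrices and the property $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A})$, we have

$$d\left(J(\mathbf{F}_{\mathrm{RF}})\right) = \mathrm{tr}\Big\{ \left( \mathbf{D}_{\mathrm{BS}} \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}} (\mathbf{F}_{\mathrm{RF}}^H \mathbf{D}_{\mathrm{BS}} \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}})^{-1} \mathbf{F}_{\mathrm{RF}}^H - \mathbf{I}_{N_t} \right) \mathbf{D}_{\mathrm{BS}} \mathbf{g} \mathbf{g}^H \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}} (\mathbf{F}_{\mathrm{RF}}^H \mathbf{D}_{\mathrm{BS}} \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}})^{-1} \, d(\mathbf{F}_{\mathrm{RF}}^H) \Big\}. \tag{10}$$

By comparing (10) with (9), it can be found that $\nabla J(\mathbf{F}_{\mathrm{RF}})$ is given by

$$\nabla J(\mathbf{F}_{\mathrm{RF}}) = \left( \mathbf{D}_{\mathrm{BS}} \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}} (\mathbf{F}_{\mathrm{RF}}^H \mathbf{D}_{\mathrm{BS}} \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}})^{-1} \mathbf{F}_{\mathrm{RF}}^H - \mathbf{I}_{N_t} \right) \mathbf{D}_{\mathrm{BS}} \mathbf{g} \mathbf{g}^H \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}} (\mathbf{F}_{\mathrm{RF}}^H \mathbf{D}_{\mathrm{BS}} \mathbf{D}_{\mathrm{BS}}^H \mathbf{F}_{\mathrm{RF}})^{-1}. \tag{11}$$

With the derived Euclidean conjugate gradient, we can compute the Riemannian gradient, obtain a point in the tangent space along the Riemannian gradient, and finally retract the searched points onto the manifold. The MO process is summarized in Algorithm 1. Note that as the transmitter has a total power constraint, the final HBF codeword is given by

$$\mathbf{f} = \frac{\mathbf{F}_{\mathrm{RF}} \mathbf{f}_{\mathrm{BB}}}{\| \mathbf{F}_{\mathrm{RF}} \mathbf{f}_{\mathrm{BB}} \|}. \tag{12}$$

Algorithm 1: MO Based HBF Algorithm
Input: $\mathbf{D}_{\mathrm{BS}}$, $\mathbf{g}$
Output: $\mathbf{F}_{\mathrm{RF}}$, $\mathbf{f}_{\mathrm{BB}}$
Initialize $\mathbf{F}_{\mathrm{RF},0}$ randomly and set $i = 0$;
Compute $\mathbf{f}_{\mathrm{BB}}$ according to (7);
Repeat
  1. Compute $\nabla J(\mathbf{F}_{\mathrm{RF}})$ according to (11);
  2. Use the MO method to compute $\mathbf{F}_{\mathrm{RF},i+1}$;
  3. $i \leftarrow i + 1$;
  4. Update $\mathbf{f}_{\mathrm{BB}}$ according to (7);
Until a stopping criterion triggers

After calculating all codewords, an MO based HBF codebook is generated. Although MO requires multiple iterations to obtain a locally optimal solution, the HBF codebook is generated offline and does not increase the online time overhead.

It is worth noting that compared with the conventional two-step HBF codebook design algorithm in [26], the proposed MO based algorithm directly optimizes each HBF codeword aiming at the expected beam response objective and thus leads to a more accurate HBF codebook and more accurate resulting AoD estimates. It is also worth noting that although the above MO based HBF algorithm is proposed for the fully connected HBF architecture, it can be readily extended to other cases such as the partially connected HBF architecture.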
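The codeword design in (5)–(12) and Algorithm 1 can be illustrated with a minimal NumPy sketch. It is only a sketch under stated assumptions, not the authors' implementation: the half-wavelength ULA response, the sizes (16 antennas, 4 RF chains, N = 96), the fixed iteration budget, and the simple backtracking step-size rule are all choices made here for illustration (the MO method of [2] may use a different line search or a conjugate-gradient update). The function names are hypothetical.

```python
import numpy as np

def ula_response(cos_aod, n_ant):
    # Assumed half-wavelength ULA response, indexed by the cosine of the AoD.
    k = np.arange(n_ant)
    return np.exp(1j * np.pi * k * cos_aod) / np.sqrt(n_ant)

def aod_dictionary(n_ant, n_grid):
    # D_BS: columns a_BS(phi_bar_u) with phi_bar_u = -1 + 2(u-1)/N, u = 1..N.
    grid = -1.0 + 2.0 * np.arange(n_grid) / n_grid
    return np.stack([ula_response(c, n_ant) for c in grid], axis=1)

def ls_baseband(D, F, g):
    # Eq. (7): least-squares digital vector for a fixed analog matrix.
    return np.linalg.pinv(D.conj().T @ F) @ g

def cost(D, F, g):
    # J(F_RF) as defined before Eq. (8).
    A = D.conj().T @ F
    return float(np.linalg.norm(A @ ls_baseband(D, F, g) - g) ** 2)

def euclidean_gradient(D, F, g):
    # Eq. (11): Euclidean conjugate gradient of J with respect to F_RF.
    G = D @ D.conj().T
    M = np.linalg.inv(F.conj().T @ G @ F)
    Dg = (D @ g)[:, None]
    left = G @ F @ M @ F.conj().T - np.eye(F.shape[0])
    return left @ Dg @ Dg.conj().T @ F @ M

def mo_codeword(D, g, n_rf, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    F = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (D.shape[0], n_rf)))
    history = [cost(D, F, g)]
    for _ in range(iters):
        G = euclidean_gradient(D, F, g)
        # Riemannian gradient: project G onto the tangent space at F.
        R = G - np.real(G * F.conj()) * F
        step = 1.0
        while step > 1e-10:          # assumed backtracking rule
            cand = F - step * R
            cand = cand / np.abs(cand)   # retraction to unit modulus
            c = cost(D, cand, g)
            if c < history[-1]:
                F = cand
                history.append(c)
                break
            step *= 0.5
        else:
            break                    # no decreasing step found: stop
    f_bb = ls_baseband(D, F, g)
    f = F @ f_bb
    return F, f_bb, f / np.linalg.norm(f), history   # Eq. (12)

D = aod_dictionary(16, 96)
g = np.zeros(96)
g[:32] = 1.0                         # target: first 32 grid points, as in the g example
F, f_bb, f, history = mo_codeword(D, g, n_rf=4)
print("cost: %.3f -> %.3f" % (history[0], history[-1]))
```

Because a step is accepted only when it lowers the cost, the recorded cost sequence is non-increasing by construction, and every iterate keeps unit-modulus entries through the retraction.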

IV. HBF OPTIMIZATION WITH DL

While the HBF optimization is a complicated non-convex problem and it is unlikely to find a closed-form optimal solution [1], DL has been regarded as an efficient design approach and is expected to handle this intractable problem [35]. In this section, we first propose a robust HBF optimization network, i.e., HBF-Net, with the input of imperfect CSI. We also propose a joint channel estimation and HBF optimization network, i.e., CE-HBF-Net, for better performance.

A. DESIGN OF HBF-Net

There have been some iterative algorithms in the literature to deal with the intractable constant modulus constraint in the HBF optimization [1]–[3]. However, these algorithms either require approximations to simplify the original objective function or require many time-consuming serial iterations to obtain a solution. Besides, most of these works are based on the knowledge of perfect CSI and thus suffer from certain performance loss due to channel estimation errors. As the DL-based scheme has exhibited its ability in extracting the complicated characteristics of the wireless channels [6], [7], in this subsection, we propose to apply DL to solve (3) with imperfect CSI.

1) THE INPUT AND OUTPUT SETTING

Generally speaking, in almost all the conventional HBF optimization algorithms [1]–[4], the HBF optimization can be essentially represented as a function of perfect CSI and SNR,

$$\mathbf{V}_{\mathrm{RF}}, \mathbf{V}_B = f(\mathbf{H}, \gamma). \tag{13}$$

This inspires us to apply DL to solve the same problem, as the NN has already demonstrated its ability to fit complex functions with a large number of neurons and nonlinear activation functions [6], [35].

Since the perfect CSI, $\mathbf{H}$, is normally not available in practical systems, the estimated CSI, $\hat{\mathbf{H}}$, is set as the input of the HBF-Net. By separating the real and imaginary parts of the estimated CSI, the input $\hat{\mathbf{H}}$ is represented by an $N_r \times N_t \times 2$ real-valued tensor. To achieve the generalization for various SNRs, we feed the noise power into the HBF-Net as the transmit power is normalized.

In order to ensure that the analog precoder $\mathbf{V}_{\mathrm{RF}}$ meets the constant modulus constraint and the HBF precoder meets the total power constraint, the HBF-Net first outputs the phase information of each element, i.e., an $N_t N_{\mathrm{RF}} \times 1$ real-valued vector (the output of a fully connected layer). The vector is reshaped to an $N_t \times N_{\mathrm{RF}}$ matrix (denoted by $\mathbf{\Phi}$), from which the analog precoder is finally reconstructed with Euler's formula. The desired analog beamformer $\mathbf{V}_{\mathrm{RF}}$ is given by

$$\mathbf{V}_{\mathrm{RF}} = \exp(j \cdot \mathbf{\Phi}) = \cos(\mathbf{\Phi}) + j \cdot \sin(\mathbf{\Phi}), \tag{14}$$

where $j = \sqrt{-1}$. We denote the transformation from $\mathbf{\Phi}$ to $\mathbf{V}_{\mathrm{RF}}$ as "Lambda1". As to the digital precoder $\mathbf{V}_B$, we convert the output from a $2 N_s N_{\mathrm{RF}} \times 1$ real-valued vector (the output of a fully connected layer) to an $N_{\mathrm{RF}} \times N_s$ complex-valued matrix (denoted by $\mathbf{V}'_B$), and then perform power normalization (denoted by "Lambda2"). The desired digital beamformer is given by

$$\mathbf{V}_B = \frac{\mathbf{V}'_B}{\| \mathbf{V}_{\mathrm{RF}} \mathbf{V}'_B \|_F}. \tag{15}$$

In summary, the HBF-Net can be expressed as a nonlinear function $g(\cdot)$ of the estimated CSI and the noise power. That is,

$$\mathbf{V}_{\mathrm{RF},k}, \mathbf{V}_{B,k} = g(\hat{\mathbf{H}}_k, \sigma_k^2), \tag{16}$$

where $\hat{\mathbf{H}}_k$ and $\sigma_k^2$ denote the estimated CSI and the noise power associated with the $k$-th sample. Note that the proposed HBF-Net can optimize the digital matrix and analog matrix simultaneously instead of alternating optimization like that in [2]–[4].

2) STRUCTURE OF HBF-Net

The overall structure of the HBF-Net is shown in Fig. 4(a). It adopts two convolutional (Conv) blocks for feature extraction. To enhance the generalization ability for different SNRs, the output tensor of the second convolutional block is flattened to a one-dimensional vector and concatenated with the noise power $\sigma^2$ before entering the subsequent two fully connected layers with $N_t N_{\mathrm{RF}}$ and $2 N_s N_{\mathrm{RF}}$ neurons, respectively. The detailed structure of the convolutional block is shown in Fig. 4(b), which includes a convolutional layer, a depth attention module, and a batch normalization layer.
• Convolutional Layer: According to (1), the sparsity of the channel and the correlation among its elements motivate us to apply the convolutional layer, which has been widely used in computer vision to enhance the operational efficiency of NNs.
• Depth Attention Module: Depth attention mechanism can enhance the feature extraction ability of the convolutional layer by assigning different weights to depth dimensions of its outputs, as those corresponding features have different contributions to the NN target [36].
• Batch Normalization (BN) Layer: It can make the output distribution more uniform and prevent gradient dispersion.

In the first convolutional block, the convolutional layer uses $4N_t$ different $3 \times 2$ filters with the well-known ReLU (i.e., $f(x) = \max(0, x)$) as the activation function. In the realization of the depth attention, both global average pooling and global maximum pooling are used to obtain the depth descriptor. Two dense layers are used to extract the non-linear dependency between depth dimensions. Note that to reduce the trainable parameters, the dense layers after the pooling operations share parameters. After adding the outputs of the two excitation operations, we obtain the final depth attention values. The re-weighting operation $F_{\mathrm{scale}}(\cdot)$ outputs the weighted spatial feature map. The implementation of the second convolutional block is similar to the first one, except that the convolutional layer uses $2N_t$ different $3 \times 1$ filters. The output shapes of the main layers/blocks of the HBF-Net are listed in Tab. 1.
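The depth attention module described for the convolutional block can be sketched as a forward pass in NumPy. This is a minimal illustration under assumptions that the text does not specify: the ReLU in the hidden dense layer, the sigmoid applied after adding the two excitation outputs, the hidden size, and the feature-map shape are all chosen here for the example, and the weights are random placeholders rather than trained values.

```python
import numpy as np

def depth_attention(x, W1, b1, W2, b2):
    """Re-weight the depth (last) dimension of a feature map x of shape (H, W, C)."""
    # Depth descriptors from global average pooling and global maximum pooling.
    avg_desc = x.mean(axis=(0, 1))
    max_desc = x.max(axis=(0, 1))

    def excite(d):
        # Two dense layers; the same parameters are shared by both descriptors.
        hidden = np.maximum(0.0, d @ W1 + b1)   # assumed ReLU
        return hidden @ W2 + b2

    # Add the two excitation outputs, then squash to (0, 1) (assumed sigmoid).
    logits = excite(avg_desc) + excite(max_desc)
    weights = 1.0 / (1.0 + np.exp(-logits))

    # Re-weighting: scale every depth slice of the feature map.
    return x * weights, weights

rng = np.random.default_rng(0)
h, w, c, r = 4, 16, 32, 8            # illustrative sizes only
x = rng.standard_normal((h, w, c))
W1 = 0.1 * rng.standard_normal((c, r))
b1 = np.zeros(r)
W2 = 0.1 * rng.standard_normal((r, c))
b2 = np.zeros(c)

y, weights = depth_attention(x, W1, b1, W2, b2)
print(y.shape, weights.min(), weights.max())
```

Sharing `W1`, `b1`, `W2`, and `b2` between the two pooled descriptors is what keeps the number of trainable parameters of the module independent of how many pooling branches are used.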

FIGURE 4. Network structure of the proposed HBF-Net. (a) The overall network structure; (b) The detailed structure of the convolutional (Conv)
block in (a).
TABLE 1. Implementation details of the HBF-Net. • More Effective Information Extraction for the Hierar-
chical Search: In the conventional HC-CE algorithm,
the function of the upper-level search is limited to nar-
rowing the beam alignment ranges of the lower-level,
and the estimation does not fully utilize the information
such as the power of the resulting signal after HBF
(representing how likely an AoD (or AoA) belongs to
a range). joint optimization is expected to extract more
B. DESIGN OF CE-HBF-Net information from all the received signals and obtain an
Although the conventional HC-CE algorithm has the advan- HBF design with a higher spectral efficiency.
tage of fewer pilot transmissions compared with the exhaus- • Joint Optimization With the Same Goal: The conven-
tive beam search algorithm, it has several limitations that can tional design approach for channel estimation and HBF
be considered for improvement. Firstly, in each angle range optimization in mmWave MIMO communication sys-
refining iteration, a "hard" decision is made about whether the AoD belongs to a particular range or not. However, the "soft" information of how likely the AoD belongs to a range is not sufficiently utilized. Furthermore, the channel estimation accuracy is limited by the resolution of the last level of the codebook. Secondly, in the multipath scenario, the AoDs/AoAs of the multiple paths are not jointly estimated but estimated one by one, so the multipath interference may greatly affect the estimation accuracy. Finally, the HC-CE algorithm estimates the path gain only at the last level in each outer iteration, although the upper-level searches are also useful for the path gain estimation. According to the above discussion, the HC-CE algorithm can be further enhanced. As the estimation error in the conventional algorithm comes from multiple aspects and is challenging to model, we resort to designing an NN to improve the performance.

Intuitively, we can design two networks to implement channel estimation and HBF optimization, respectively. However, such a two-step optimization is suboptimal, and joint channel estimation and HBF optimization is more attractive for the following reasons:

• Implicit Channel Estimation Considering HBF Optimization: Conventional channel estimation and HBF design are usually optimized separately, which means that the HBF design does not affect the channel estimation. Joint optimization allows the channel estimation to take the HBF design into account and obtain an estimated channel more beneficial to the HBF design.

tems is usually split into two independent blocks with different optimization objectives (e.g., MSE minimization for channel estimation and spectral efficiency maximization for HBF optimization). As the goals of these two blocks do not always entirely match, joint optimization with a unified goal is expected to achieve higher spectral efficiency than the conventional separate design approach.

Although the basic idea and motivation of the joint optimization are clear, it is difficult to solve the problem with the conventional model-based method. Fortunately, DL provides an attractive approach. In this paper, we propose the CE-HBF-Net to implement channel estimation and HBF optimization jointly.

1) THE INPUT AND OUTPUT SETTING
One key issue in the design of the CE-HBF-Net is how to set its input, which raises two questions: What should be input to the CE-HBF-Net in order to achieve better channel estimation and HBF optimization? How can it be input efficiently?

a: RECEIVED SIGNALS
One necessary input is the collection of the received signals. By feeding the combined signals in each outer and inner iteration into the CE-HBF-Net, the CE-HBF-Net is expected to fully utilize the historical "soft" information and learn intelligently how to compensate for the codebook quantization error and how to reduce the multipath interference in

FIGURE 5. Network structure of the proposed CE-HBF-Net. (a) The overall network structure; (b) The detailed
structure of the convolutional (Conv) block in (a).
the offline training stage. As the backpropagation algorithm of the NN only supports real number operations, we feed the real and imaginary parts of the digital signals into the CE-HBF-Net. Such processing does not lose any information because the real and imaginary parts completely characterize the amplitude and phase of the combined signals. We denote the collection of the real and imaginary parts of the received digital signals as $\mathbf{R}$. Denoting the number of pilot transmissions as $N_p$, the dimension of $\mathbf{R}$ is $2N_p$.

no matter how different the channel is for each realization, the whole hierarchical codebook is the same. Thus, an efficient method is to assign an index (i.e., a unique number) to each vector in the hierarchical codebook and input only the indices of the selected vectors to the CE-HBF-Net. It is expected that with enough training samples, the CE-HBF-Net can intelligently learn the relationship between the combined signal, the index, and the selected vector. We refer to this index assign-and-input method as IAI.
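The real-valued input $\mathbf{R}$ described above can be illustrated with a minimal NumPy sketch. The ordering used here (all real parts first, then all imaginary parts) and the helper names `stack_real_imag` and `unstack_real_imag` are assumptions made for illustration; the text only states that both parts are fed to the network and that the dimension is $2N_p$.

```python
import numpy as np

def stack_real_imag(y):
    """Map Np complex combined signals to a real vector of length 2*Np.

    Assumed layout: [Re(y_1..y_Np), Im(y_1..y_Np)].
    """
    y = np.asarray(y, dtype=np.complex128).ravel()
    return np.concatenate([y.real, y.imag])

def unstack_real_imag(r):
    """Inverse mapping, showing that no information is lost."""
    r = np.asarray(r, dtype=np.float64).ravel()
    half = r.size // 2
    return r[:half] + 1j * r[half:]

rng = np.random.default_rng(0)
num_pilots = 180  # Np used later in the simulation section
y = rng.standard_normal(num_pilots) + 1j * rng.standard_normal(num_pilots)

R = stack_real_imag(y)         # real input of dimension 2*Np
y_back = unstack_real_imag(R)  # amplitude and phase are fully recoverable
```

Because the mapping is invertible, the amplitude and phase of every combined signal remain available to the network.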
b: PRECODING AND COMBINING VECTORS
Different from the existing works on DL-based channel estimation [18], [37], where the precoders and combiners are the same for different channel realizations, in the CE-HBF-Net the precoders and combiners are adaptively selected from the codebook. These selected vectors are closely related to the AoDs/AoAs of each realized channel and thus differ between channel realizations. It is therefore necessary to tell the CE-HBF-Net which precoding and combining vectors are associated with $\mathbf{R}$.

However, if we directly input the selected vectors to the CE-HBF-Net, the input dimension would be extremely high and even prohibitive. For example, assuming that $N_{\mathrm{BS}} = 64$ and that a total of 30 complex HBF vectors are selected for signal precoding in the whole estimation process, a long vector with $64 \times 30 \times 2 = 3840$ real numbers is needed to represent all the selected precoding vectors. As a result, the CE-HBF-Net would be very complex and hard to converge.

To reduce the input dimension of the selected vectors, we need a more compact representation. For the BS side, note that

One way to assign indices follows from noting that each selected HBF vector has a dedicated position in the codebook, determined by its row number (the level) and its column number (the position within that level), as shown by the subscript $(s, n)$ of each HBF vector $\mathbf{f}_{(s,n)}$ in Fig. 3. Thus, the pair $(s, n)$ can be taken as the HBF index. In fact, the row number $s$ does not need to be input to the CE-HBF-Net, since the HBF vectors are always selected from the top to the bottom level. The first level of the codebook is always selected, so there is no need to assign indices to it, as in [18], [37]. Moreover, if an AoD range in the $s$-th level is selected as a candidate range that probably contains the path (for example, $[\pi/6, 2\pi/6]$ in the 1st level in Fig. 3), then the $K$ adjacent HBF vectors related to this range in the $(s+1)$-th level are all selected and tested to determine a finer sub-range ($\mathbf{f}_{(2,3)}$ and $\mathbf{f}_{(2,4)}$ in the 2nd level in Fig. 3). The column number of the selected HBF vector in the upper level therefore fully represents the $K$ selected HBF vectors in the next level. As a result, it is only necessary to assign indices to the HBF vectors that win the angle test of each level, so the indices of only $1/K$ of the selected vectors need to be input to the CE-HBF-Net. It is expected that the CE-HBF-Net can intelligently learn this pattern. In the above

TABLE 2. Implementation details of the CE-HBF-Net.

example with $N_{\mathrm{BS}} = 64$ and a total of 30 selected HBF vectors, assuming $K = 2$, the input dimension can be reduced from $64 \times 30 \times 2 = 3840$ to $30/K = 15$ by using the IAI method. The user codebook has only one level, and the estimated AoA is obtained in the first inner iteration. For an $L$-path channel, the number of indices is $L$. We denote the collection of the indices of the selected precoding and combining vectors as $\mathbf{I}_{\mathrm{sel}}$, and the corresponding dimension is $N_{\mathrm{sel}}$.

c: SNR
The SNR is an optional input, and it would be a good indicator for the CE-HBF-Net. Since the transmitter has a total power constraint, we feed the noise power $\sigma^2$ to the CE-HBF-Net if it is available.

The output setting is the same as that of the HBF-Net. In summary, the CE-HBF-Net can be expressed as a nonlinear function $h(\cdot)$ of the received signal collection $\mathbf{R}$, the index collection $\mathbf{I}_{\mathrm{sel}}$ and the noise power $\sigma^2$. That is,

$$\mathbf{V}_{\mathrm{RF},k},\ \mathbf{V}_{\mathrm{B},k} = h(\mathbf{R}_k, \mathbf{I}_{\mathrm{sel},k}, \sigma_k^2), \quad (17)$$

where $\mathbf{R}_k$, $\mathbf{I}_{\mathrm{sel},k}$ and $\sigma_k^2$ denote the collection of the real and imaginary parts of the received signals, the collection of indices of the selected precoding and combining vectors, and the noise power associated with the $k$-th sample, respectively. Similar to the HBF-Net, the proposed CE-HBF-Net can also optimize the digital matrix and the analog matrix simultaneously instead of optimizing them alternately as in [2]–[4].

target range. As it is challenging to eliminate such errors, we propose a signal attention mechanism to compensate for the beam pattern imperfections and enable the NN to learn how to assign signal attention values accordingly.

The block diagram of the proposed signal attention mechanism is shown in Fig. 5(a). First, the received signal collection $\mathbf{R}$, the index collection $\mathbf{I}_{\mathrm{sel}}$ and the noise power $\sigma^2$ are concatenated. Then, they are input to the proposed signal attention vector generator to obtain the learned signal attention vector. In this paper, we use one dense layer to generate the signal attention vector. The output of the dense layer is determined by its learnable parameters, which are updated to reduce the loss of the CE-HBF-Net. As a result, the attention vector generator tends to learn an attention vector conducive to network convergence. Finally, the received signals are weighted with the signal attention vector.

After processing by the signal attention mechanism, the weighted received signals are concatenated with the index collection $\mathbf{I}_{\mathrm{sel}}$ and the noise power $\sigma^2$, and are fed into the CE-HBF-Net. The second dense layer of the CE-HBF-Net can be regarded as the first step of implicit channel estimation. As the number of real dimensions of the mmWave channel matrix is $2N_rN_t$, the number of neurons in the fully connected layer is set to $6N_rN_t$ to ensure sufficient feature extraction and allow some redundant information. The output of the fully connected layer is reshaped to $N_r \times N_t \times 6$, where the size of the spatial feature map is $N_r \times N_t$ and the number of depth dimensions is 6. The subsequent feature extraction is performed by two convolutional blocks.

The structure of the convolutional block in the CE-HBF-Net is shown in Fig. 5(b). Compared with the convolutional block in the HBF-Net, since the CE-HBF-Net allows redundancy, an average pooling layer is added to reduce some of it. In the first convolutional block, the convolutional layer applies $8N_t$ different $3 \times 2$ filters with the ReLU as the activation function. The pooling window size is $2 \times 1$, and the step size is 2. The implementation of the second convolutional block is similar to the first, except that the convolutional layer uses $6N_t$ different $3 \times 1$ filters. The output shapes of the main layers/blocks of the CE-HBF-Net are listed in Tab. 2.
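The front end of the CE-HBF-Net (signal attention, followed by the dense layer that is reshaped into an $N_r \times N_t \times 6$ feature map) can be sketched as a plain NumPy forward pass. This is only an illustrative sketch, not the authors' implementation: the sigmoid activation of the attention layer, the ReLU after the fully connected layer, the random weight initialization, and the names `init_params` and `front_end` are all assumptions that the text does not specify.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_p, n_sel, n_r, n_t, rng):
    """Random placeholder weights (hypothetical; real weights are learned)."""
    d_in = 2 * n_p + n_sel + 1  # dim(R) + dim(I_sel) + dim(sigma^2)
    return {
        "W_att": 0.01 * rng.standard_normal((2 * n_p, d_in)),
        "b_att": np.zeros(2 * n_p),
        "W_fc": 0.01 * rng.standard_normal((6 * n_r * n_t, d_in)),
        "b_fc": np.zeros(6 * n_r * n_t),
    }

def front_end(R, I_sel, sigma2, p, n_r, n_t):
    # 1) Concatenate R, I_sel and sigma^2 and generate the attention vector
    #    with one dense layer (sigmoid is an assumed activation).
    x = np.concatenate([R, I_sel, [sigma2]])
    attention = sigmoid(p["W_att"] @ x + p["b_att"])
    # 2) Weight the received signals element-wise with the attention vector.
    weighted = attention * R
    # 3) Concatenate again and apply the dense layer with 6*Nr*Nt neurons
    #    (ReLU is an assumed activation), then reshape to Nr x Nt x 6.
    z = np.concatenate([weighted, I_sel, [sigma2]])
    h = np.maximum(p["W_fc"] @ z + p["b_fc"], 0.0)
    return attention, h.reshape(n_r, n_t, 6)

rng = np.random.default_rng(0)
n_p, n_sel, n_r, n_t = 180, 48, 2, 64
params = init_params(n_p, n_sel, n_r, n_t, rng)
R = rng.standard_normal(2 * n_p)                        # placeholder signals
I_sel = rng.integers(0, 96, size=n_sel).astype(float)   # placeholder indices
attention, feature_map = front_end(R, I_sel, 1.0, params, n_r, n_t)
```

The resulting feature map would then be passed to the two convolutional blocks, which are omitted here.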
2) STRUCTURE OF CE-HBF-Net
In the HC-CE algorithm, as the BS sends a number of pilots using different HBF codewords from the top to the bottom level of the hierarchical codebook, the contribution of different received pilot signals to channel estimation in the hierarchical search is different. For example, since there is no prior knowledge of the actual channel, a hierarchical search over a large range is necessary. Naturally, those searches aligned to the AoDs and AoAs contribute more to channel estimation. Theoretically, the received signal power represents the possibility that the corresponding target range contains scattering paths, and also represents the attention value of the signal. However, the beam response of each codeword is imperfect, with power leakage and fluctuating beam gain, and the received signal does not correspond precisely to the desired

C. UNSUPERVISED LEARNING STRATEGY BASED NEURAL NETWORK TRAINING
1) UNSUPERVISED LEARNING AND LOSS FUNCTION DESIGN
Most of the conventional intelligent communication designs use the supervised learning strategy. For example, in [6], [7], [38], the labels are set to the transmitted bits or perfect channel estimates, and the loss function is set to the MSE between the output signals of the NN and the labels. However, it is difficult to find a proper label for the design of $\mathbf{V}_{\mathrm{RF}}$ and $\mathbf{V}_{\mathrm{B}}$. If we take an optimized HBF beamformer based on a conventional algorithm as the label and adopt the MSE as the loss function, then the NN can only be trained to approach the conventional algorithm but cannot outperform it. Perhaps taking the fully digital beamformer as the label is another

spectral efficiency. However, it is worth noting that as the networks are trained offline, the computational complexity of the training stage is less of a concern. This is because the size of the training set only affects the offline time overhead, which is not strictly limited [14]. In this paper, we use $1.8 \times 10^6$ training samples in the offline training stage to ensure that both the HBF-Net and the CE-HBF-Net fully converge, and set the sizes of the verification and test data sets to $2 \times 10^5$ and $10^4$, respectively. To generalize over various SNRs, $\sigma_k^2$ is generated randomly between −10 dB and 10 dB. The Adam optimizer is applied to train the HBF-Net and the CE-HBF-Net via the stochastic gradient descent method. We use a learning rate decay strategy: if the loss does not decrease after training for 20 epochs, the learning rate is multiplied by 0.2, where the initial and minimum values of the learning rate are set to $10^{-3}$ and $5 \times 10^{-5}$, respectively.²

As shown in Fig. 6, the trained HBF-Net can replace the conventional HBF designer, and the trained CE-HBF-Net can replace both the conventional channel estimation algorithm and the conventional HBF designer. Most conventional HBF algorithms require a number of serial iterations to achieve good performance [2]–[4]. In contrast, the proposed DL-based schemes, which need only a limited number of matrix multiplications, can operate fast due to the acceleration of parallel computing and are therefore more applicable to high-speed communications.

D. COMPLEXITY ANALYSIS
In this subsection, we analyze the computational complexity of the proposed HBF-Net and CE-HBF-Net in terms of the number of real multiplications. The number of real multiplications for a dense layer is $N_I N_O$, where $N_I$ and $N_O$ denote the input and output dimensions, respectively. The number of real multiplications for a convolutional layer is $C_I C_O k_w k_h W H$, where $C_I$ and $C_O$ denote the input and output depth dimensions, respectively, $k_w \times k_h$ is the size of the convolution kernel, and $W \times H$ is the output spatial dimension [40]. By substituting the specific hyper-parameter settings, we obtain the computational complexity shown in Tab. 3 and Tab. 4, where $N_s$, $N_t$ and $N_{\mathrm{RF}}$ are the number of data streams, the number of BS antennas and the number of RF chains, respectively, and $N_p$ and $N_{\mathrm{sel}}$ are the number of pilot transmissions and the dimension of the selected indices, respectively. According to [4], for the conventional model-based HBF algorithms proposed in [3] and [4], the asymptotic computational complexity in terms of the number of complex multiplications is $O(N_t^3)$ and $N_{lp}(4N_t^2 N_{\mathrm{RF}} + 13N_t N_{\mathrm{RF}}^2 + 3N_t N_{\mathrm{RF}}^3 + 4N_{\mathrm{RF}} + 4O(N_{\mathrm{RF}}^3))$, respectively, where $N_{lp}$ denotes the number of total iterations.

FIGURE 6. Offline training and online deployment strategy. (a) HBF-Net; (b) CE-HBF-Net.

choice, but such a matrix approximation method also results in some performance loss. Notice that the ultimate objective of HBF optimization is to maximize the spectral efficiency in (3). Here, we use an unsupervised learning strategy without labels, and propose to optimize $\mathbf{V}_{\mathrm{RF}}$ and $\mathbf{V}_{\mathrm{B}}$ directly aiming at the HBF optimization objective. A loss function directly related to the objective of spectral efficiency in (3) is adopted, which is given by

$$\mathcal{L} = -\frac{1}{N}\sum_{k=1}^{N} \log\left| \mathbf{I}_{N_s} + \frac{\mathbf{H}_k \mathbf{V}_{\mathrm{RF},k} \mathbf{V}_{\mathrm{B},k} \mathbf{V}_{\mathrm{B},k}^H \mathbf{V}_{\mathrm{RF},k}^H \mathbf{H}_k^H}{\sigma_k^2} \right|, \quad (18)$$

where $N$ denotes the total number of training samples, and $\mathbf{V}_{\mathrm{RF},k}$ and $\mathbf{V}_{\mathrm{B},k}$ represent the outputs of the HBF-Net and CE-HBF-Net for the $k$-th training sample. It can be clearly seen from (18) that the loss function is exactly the negative of the average spectral efficiency over all training samples: the smaller the loss value, the higher the average spectral efficiency. As DL is essentially a gradient descent method [35], the loss value is guaranteed to converge to a local minimum point with training iterations, which leads to an increase of the average spectral efficiency. Besides, with the particular loss function associated with the perfect CSI, the HBF-Net and the CE-HBF-Net can be trained to approach the ideal spectral efficiency (the one without estimation errors) as much as possible. In particular, since the optimization goal of the CE-HBF-Net is to maximize the spectral efficiency, the implicit channel estimation tends to be jointly optimized toward higher spectral efficiency, and thus the CE-HBF-Net is expected to achieve better performance than the HBF-Net due to such joint optimization.

2) OFFLINE TRAINING AND ONLINE DEPLOYMENT
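During offline training, the quantity that the optimizer minimizes is the loss in (18). A minimal NumPy sketch of how that loss could be evaluated on a batch is given below. It is an illustration under stated assumptions rather than the authors' code: the logarithm is taken to base 2 (an assumption, chosen so that the value is in bits/s/Hz), the function name `spectral_efficiency_loss` is hypothetical, and the determinant is computed in its equivalent $N_s \times N_s$ form via Sylvester's determinant identity.

```python
import numpy as np

def spectral_efficiency_loss(H, V_RF, V_B, sigma2):
    """Negative average spectral efficiency over a batch.

    H:      (N, Nr, Nt) perfect channels
    V_RF:   (N, Nt, NRF) analog precoders
    V_B:    (N, NRF, Ns) digital precoders
    sigma2: (N,) noise powers
    """
    H = np.asarray(H, dtype=np.complex128)
    A = H @ V_RF @ V_B                              # effective channel, (N, Nr, Ns)
    n_s = A.shape[-1]
    gram = np.conj(np.swapaxes(A, -1, -2)) @ A      # A^H A, (N, Ns, Ns)
    M = np.eye(n_s) + gram / np.asarray(sigma2, dtype=float)[:, None, None]
    # det(I + A^H A / s) equals det(I + A A^H / s) (Sylvester's identity).
    _, logabsdet = np.linalg.slogdet(M)
    return -np.mean(logabsdet) / np.log(2.0)        # base-2 log is an assumption

# Tiny sanity case: identity channel and precoders, unit noise power.
I2 = np.eye(2)[None, :, :]
toy_loss = spectral_efficiency_loss(I2, I2, I2, np.array([1.0]))
```

In an actual training loop the same expression would be written in an automatic-differentiation framework so that gradients with respect to the network outputs are available.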
To avoid a time-consuming online training process, similar to [6], [39], both the HBF-Net and the CE-HBF-Net are trained offline with simulated samples as shown in Fig. 6. Note that based on the channel model in (1), enough samples can be generated by simulation during the offline training stage. There is a trade-off between the number of training samples and the performance of achievable

Numerically, for $N_s = N_{\mathrm{RF}} = N_r = 2$, $N_t = 64$, $N_{\mathrm{sel}} = 48$ and $N_p = 180$, the proposed HBF-Net and CE-HBF-Net, and the conventional HBF algorithms

² All source codes and trained models are provided openly in https://github.com/LQJecho/Channel-Estimation-and-Hybrid-Precoding-for-Millimeter-Wave-Systems-Based-on-Deep-Learning

TABLE 3. Computational complexity of the proposed HBF-Net.

TABLE 4. Computational complexity of the proposed CE-HBF-Net.

$N_p = (KL \times KL + KL \times (S - 1)) \times L = 180$ pilots, the BS selects $M = KL \times S \times L = 90$ HBF vectors for each channel realization, and the user selects $KL \times L = 18$ DBF vectors for each channel realization. From the discussion in Section IV-B1, the input dimension of the CE-HBF-Net is 409 ($\dim(\mathbf{R}) = 2N_p = 360$, $\dim(\mathbf{I}_{\mathrm{sel}}) = N_{\mathrm{sel}} = M/K + L = 45 + 3 = 48$ by using the IAI method proposed in Section IV-B, and $\dim(\sigma^2) = 1$).
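The pilot count and input dimension quoted above can be reproduced with a few lines of arithmetic. The sketch below reads $KL$ as the product $K \cdot L$, an interpretation that reproduces the stated totals for $K = 2$, $L = 3$ and $S = 5$; the variable names are illustrative only.

```python
# Simulation settings taken from the text.
K, L, S = 2, 3, 5
KL = K * L                                    # read as the product K*L

num_pilots = (KL * KL + KL * (S - 1)) * L     # Np
bs_vectors = KL * S * L                       # M, HBF vectors selected by the BS
user_vectors = KL * L                         # DBF vectors selected by the user

# IAI: only the indices of 1/K of the BS vectors, plus L user-side indices.
n_sel = bs_vectors // K + L                   # Nsel
dim_R = 2 * num_pilots                        # real and imaginary parts
input_dim = dim_R + n_sel + 1                 # plus the noise power sigma^2

# The earlier illustrative example: 30 selected vectors with 64 antennas.
direct_dim = 64 * 30 * 2                      # feeding the vectors themselves
iai_dim = 30 // K                             # feeding indices via IAI
```

With these settings the index input is far smaller than the 3840 real numbers that would be needed to feed the selected vectors directly in the illustrative example.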
Our simulation results consist of four parts. The first part shows the performance improvement of the proposed MO codebook. The second part compares the DL-based HBF NNs with the conventional algorithms. The third part verifies the effectiveness of our network structure, where the advantages of the signal attention mechanism, the depth attention mechanism, the convolutional block, and the CE-HBF-Net input settings are demonstrated via simulations. The fourth part demonstrates the robustness of the DL-based HBF NNs to mismatches between the offline training stage and the online deployment stage, for example, channel path mismatch, channel model mismatch, and non-ideal device characteristics such as random errors of phase shifters.

A. CODEBOOK ENHANCEMENT
Fig. 7(a) shows the normalized MSE (NMSE) of imperfect channel estimation as a function of the pilot-to-noise ratio (PNR) for three different codebooks with 5 levels in the hierarchical search of the HC-CE algorithm. Fig. 7(b) further shows the resulting spectral efficiency of the HBF optimization algorithm in [3] when it directly takes the imperfect estimated channel from Fig. 7(a) and the SNR is fixed at 0 dB. As shown in these two figures, compared with the two conventional codebooks in [26] and [28], our proposed MO codebook achieves more accurate channel estimation and higher spectral efficiency over the entire PNR range.

Next, to investigate how pilot overhead affects channel estimation accuracy and spectral efficiency, Fig. 8 shows the NMSE and achievable spectral efficiency with different numbers of codebook levels when PNR = 0 dB and SNR = 0 dB. As shown in this figure, the performance of all three codebooks improves as the number of codebook levels increases, and the proposed MO codebook always results in higher spectral efficiency and more accurate channel estimation than the conventional codebooks. A close look at the performance curves of the proposed MO codebook reveals that there is little performance increase when the number of levels is larger than 5. Considering the trade-off between performance and pilot overhead, a good choice for the number of MO codebook levels is 5, i.e., $S = 5$, which is used in the following simulations.

proposed in [3] and [4] require around $7.46 \times 10^6$, $2.08 \times 10^7$, $1.18 \times 10^7$, and $6.64 \times 10^6$ real multiplications, respectively. It can be seen that the proposed HBF-Net has competitive computational complexity when compared with the conventional model-based HBF algorithms. As the task of the CE-HBF-Net is more complicated, including implicit channel estimation and HBF optimization, its complexity is the highest. However, different from the conventional algorithms, the matrix multiplications of the HBF-Net and the CE-HBF-Net can be efficiently accelerated by parallel computation on a graphics processing unit [40]. Under the hardware configuration of an Intel(R) Core(TM) i9-9900K CPU and a GeForce GTX 1080 Ti, the conventional HBF algorithms proposed in [4] and [3] (implemented in MATLAB) take $1.14 \times 10^{-2}$ s and $9.10 \times 10^{-3}$ s for a hybrid precoding optimization, while the HBF-Net and the CE-HBF-Net (implemented in Python) only need $4.6 \times 10^{-5}$ s and $5.8 \times 10^{-5}$ s. The proposed DL-based NNs can better meet the demand of real-time communication under the existing hardware configuration.

V. SIMULATION RESULTS
Throughout the simulation, a half-wavelength spaced ULA is deployed at the BS and the user with $N_t = 64$ and $N_r = 2$. At the BS, the number of RF chains and that of data streams are set to $N_{\mathrm{RF}} = N_s = 2$. The mmWave channel model in (1) is used with exactly the same parameters as those in [3], [26], where the path number $L$ is set to 3, $\alpha_\ell$ follows an independent circularly symmetric complex Gaussian distribution with zero mean and unit variance, and both $\phi_\ell$ and $\theta_\ell$ follow independent uniform distributions in $[0, \pi]$. Suppose that the estimation of the path number is correct, i.e., $L_{\mathrm{est}} = L$.
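The channel statistics listed in the simulation setup can be turned into a sample generator along the following lines. Since the channel model (1) is not reproduced in this part of the paper, several details are assumptions made for illustration: the geometric multipath form with a $\sqrt{N_t N_r / L}$ normalization, the steering-vector convention $e^{j\pi n \cos(\cdot)}$ for the half-wavelength ULA, the assignment of $\phi_\ell$ to the BS side and $\theta_\ell$ to the user side, and the function names `ula_response` and `generate_channel`.

```python
import numpy as np

def ula_response(n, angle):
    """Unit-norm response of an n-element half-wavelength ULA (assumed form)."""
    k = np.arange(n)
    return np.exp(1j * np.pi * k * np.cos(angle)) / np.sqrt(n)

def generate_channel(n_t=64, n_r=2, n_paths=3, rng=None):
    """Draw one Nr x Nt channel with L paths (assumed geometric model)."""
    rng = np.random.default_rng() if rng is None else rng
    # Path gains: circularly symmetric complex Gaussian, zero mean, unit variance.
    alpha = (rng.standard_normal(n_paths)
             + 1j * rng.standard_normal(n_paths)) / np.sqrt(2.0)
    phi = rng.uniform(0.0, np.pi, n_paths)    # BS-side angles (assumed AoDs)
    theta = rng.uniform(0.0, np.pi, n_paths)  # user-side angles (assumed AoAs)
    H = np.zeros((n_r, n_t), dtype=np.complex128)
    for a, p, t in zip(alpha, phi, theta):
        H += a * np.outer(ula_response(n_r, t), ula_response(n_t, p).conj())
    return np.sqrt(n_t * n_r / n_paths) * H   # assumed normalization

rng = np.random.default_rng(0)
H = generate_channel(rng=rng)                        # Nt = 64, Nr = 2, L = 3
H_single = generate_channel(n_paths=1, rng=rng)      # one-path channel
```

A generator of this kind could also be used to produce the simulated training samples for the offline stage, with the caveats on the model form noted above.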
The BS adopts an HBF codebook with $S = 5$ levels, and each AoD range in a level is divided further into two sub-ranges in the next level, i.e., $K = 2$. The resolution $N$ of the BS codebook is 96. The number of outer iterations is equal to the number of paths, i.e., $L = 3$. According to [26], the HC-CE algorithm requires a total of

B. DIFFERENT HBF ALGORITHMS
In this part, we compare the spectral efficiency of different HBF algorithms to show the superiority of our two proposed DL-based NNs. Fig. 9 shows the spectral efficiency versus the SNR for different HBF algorithms when the PNR is set

FIGURE 7. Performance comparison between the conventional codebooks and the proposed MO codebook. (a) Channel estimation error in terms of NMSE vs. PNR; (b) Spectral efficiency vs. PNR.

FIGURE 8. The influence of the number of BS codebook levels on channel estimation. (a) NMSE vs. the number of BS codebook levels; (b) Spectral efficiency vs. the number of BS codebook levels.

to 0 dB. As shown in the figure, the proposed HBF-Net and CE-HBF-Net achieve higher spectral efficiency than the two conventional HBF algorithms with imperfect channel estimation, where the CE-HBF-Net achieves the best performance as it can extract information conducive to HBF design more flexibly and implement joint channel estimation and HBF optimization.

Fig. 10 shows the spectral efficiency versus the PNR for different HBF algorithms when the SNR is fixed at 0 dB. As shown in this figure, all the HBF algorithms achieve higher spectral efficiency with more accurate channel estimation (due to the increase of PNR). The proposed HBF-Net and CE-HBF-Net both outperform the two conventional HBF algorithms over the entire PNR range.

Fig. 11 shows the spectral efficiency versus the number of BS codebook levels $S$ for different HBF algorithms when SNR = 0 dB and PNR = 0 dB. As shown in Fig. 11, no matter what value $S$ takes, both the HBF-Net and the CE-HBF-Net perform better than the two conventional algorithms. Note that the CE-HBF-Net has different input dimensions under different values of $S$ because of its joint channel estimation and HBF optimization design approach. Thus, the CE-HBF-Net needs to be retrained for each $S$. However, the performance gain of the CE-HBF-Net is more significant, especially with a smaller $S$, which indicates that the CE-HBF-Net has stronger robustness to channel estimation errors than other algorithms. On the other hand, as the input of the HBF-Net is the estimated channel, the input dimension of the HBF-Net remains the same no matter what value $S$ takes. This means that we only need to train the HBF-Net once, since a well-trained NN model usually has good generalization ability. In this simulation, the HBF-Net was trained under $S = 5$ but tested for different values of $S$. As shown in Fig. 11, the HBF-Net achieves higher spectral efficiency than the two conventional algorithms under different $S$ without retraining. Therefore, the proposed HBF-Net and CE-HBF-Net provide two choices for practical systems, depending on the target performance requirement and the computational complexity budget.

C. NETWORK STRUCTURE DESIGN
The performance improvement shown in Section V-B comes not only from the basic idea of joint channel estimation and robust HBF optimization, but also from the efficient design of the network structures in Fig. 4 and Fig. 5. In this part, we compare the performance of the HBF-Net and the CE-HBF-Net with different network structures to demonstrate the effectiveness of our well-designed structures.

Fig. 12(a) shows the performance of the CE-HBF-Net with or without the signal attention mechanism, where the system setup and the codebook are the same as those in Fig. 9 and the

FIGURE 9. Spectral efficiency vs. SNR for different HBF algorithms.

FIGURE 10. Spectral efficiency vs. PNR for different HBF algorithms.

FIGURE 11. Spectral efficiency vs. the number of BS codebook levels for different HBF algorithms.

PNR is set to 0 dB. As shown in Fig. 12(a), the CE-HBF-Net with the signal attention mechanism (the curve labeled "CE-HBF-Net") obtains higher spectral efficiency than that without the signal attention mechanism (the curve labeled "CE-HBF-Net w/o SA").

Fig. 12(b) shows the performance of the HBF-Net and the CE-HBF-Net with or without the depth attention mechanism. It can be seen that the introduction of the depth attention mechanism brings a spectral efficiency improvement of about 0.5 bits/s/Hz to both the HBF-Net and the CE-HBF-Net.

Fig. 12(c) compares the performance of the proposed HBF-Net and CE-HBF-Net with that of their variants obtained by replacing the network structure with a fully connected structure (labeled "HBF-Net DNN" and "CE-HBF-Net DNN", respectively) while keeping the total number of trainable parameters approximately unchanged. It can be seen from this figure that, compared to the simple DNN structure, there is a performance gain of about 0.7 bits/s/Hz in the spectral efficiency at high SNRs for both DL-based HBF NNs, due to the use of the convolutional layer and the depth attention mechanism in both the HBF-Net and the CE-HBF-Net, and of the signal attention mechanism in the CE-HBF-Net.

To demonstrate the novelty of the CE-HBF-Net input setting with the IAI method, we also simulate two variants of the CE-HBF-Net with different input settings: (a) input only $\sigma^2$ and $\mathbf{R}$ (labeled "CE-HBF-Net-V1"); (b) input only $\sigma^2$, $\mathbf{R}$ and $\mathbf{I}_{\mathrm{sel}}$ in the last inner iteration of every outer iteration (labeled "CE-HBF-Net-V2"). Both CE-HBF-Net variants have the same network structure as the CE-HBF-Net except for the input settings. Comparing the curves of the CE-HBF-Net and the CE-HBF-Net-V1 in Fig. 12(d), we can see the performance gap between inputting and not inputting the information of the precoding and combining vectors, and hence the efficiency of the proposed IAI method. Furthermore, as shown in this figure, the CE-HBF-Net outperforms the CE-HBF-Net-V2 since it utilizes the "soft" information of upper-level pilot transmissions, i.e., how likely the AoD belongs to a sub-range.

D. ROBUSTNESS TO MISMATCHES BETWEEN OFFLINE TRAINING AND ONLINE DEPLOYMENT
To avoid a time-consuming online training process, both the HBF-Net and the CE-HBF-Net are trained offline with simulated samples. However, there are inevitably mismatches between actual scenarios and training scenarios. Therefore, the robustness to these mismatches is another critical metric for our two proposed DL-based HBF NNs.

In the previous simulation, it was assumed that the estimation of the path number is correct. However, in practical applications, the estimation of the path number may not be accurate. Therefore, the robustness to this mismatch should be evaluated. We assume that the channel in the offline training stage and the actual one in the online test follow the same model, but with different numbers of paths, denoted by $L_{\mathrm{sim}}$ and $L$, respectively. Fig. 13 shows the spectral efficiency as a function of SNR for different HBF algorithms with a mismatch between $L_{\mathrm{sim}}$ and $L$. It is assumed that during the offline training stage, both the HBF-Net and the CE-HBF-Net are trained with $L_{\mathrm{sim}} = 3$, and the estimation of the path number is always correct in the offline training stage. However, considering the sparsity of the mmWave channels, $L$ of the actual channel during the online test stage is set to $L = 1$ or $L = 2$. As shown in Fig. 13, compared with the two conventional HBF algorithms, the two proposed NNs are more robust to the mismatch, and the more serious the mismatch, the stronger the robustness. For example, for a target spectral efficiency of 5 bits/s/Hz, the SNR gains of the HBF-Net and CE-HBF-Net over the conventional HBF

FIGURE 12. Spectral efficiency of the DL-based HBF NNs with different network structures. (a) Impact of the signal attention (SA) mechanism on the CE-HBF-Net; (b) Impact of the depth attention (DA) mechanism on the HBF-Net and the CE-HBF-Net; (c) Spectral efficiency vs. SNR for the HBF-Net and CE-HBF-Net with the proposed structures and the fully connected networks; (d) Spectral efficiency vs. SNR for the CE-HBF-Net with different input settings.

FIGURE 13. Spectral efficiency vs. SNR for different HBF algorithms with the mismatch in the number of paths. (a) $L_{\mathrm{sim}} = 3$, $L = 1$; (b) $L_{\mathrm{sim}} = 3$, $L = 2$.

algorithm in [3] are 1.4 dB and 2.1 dB, respectively, for $L = 2$, and increase to 2.6 dB and 3.4 dB, respectively, for $L = 1$.

In the previous simulation, it was assumed that all the paths are non-line-of-sight (NLoS) and that the complex gain of each path follows an independent circularly symmetric complex Gaussian distribution with zero mean and unit variance. However, the actual wireless channel may be line-of-sight (LoS). Therefore, it is necessary to consider the robustness of the DL-based HBF NNs to the mismatch in the channel model. Fig. 14 shows the spectral efficiency of different HBF algorithms under the LoS channel model. For the proposed HBF-Net and CE-HBF-Net, they were trained in the NLoS channel model but tested directly in the LoS model. The NLoS channel model is the same as that in the previous simulations. For the LoS model, it is assumed that there are still three paths, where the gain of the LoS path is fixed at 2, and the complex gains of the other two paths follow independent circularly symmetric complex Gaussian distributions with zero mean and variance 0.5. As shown in the figure, the HBF-Net and the CE-HBF-Net achieve higher spectral efficiency than the conventional algorithms without retraining, which indicates that the DL-based NNs can learn the characteristics of

2.1bits/s/Hz. These results show that the proposed HBF-Net

and CE-HBF-Net have good robustness to imperfect phase
shifters.
VI. CONCLUSION
We have enhanced the HBF optimization with imperfect CSI
from both the improvement of channel estimation and the
HBF optimization approach. In terms of codebook design,
we have applied MO to solve the HBF vector optimization
problem directly targeting the beam response requirement.
In terms of the HBF optimization, we proposed the HBF-Net
to solve the HBF optimization problem with DL and make
FIGURE 14. Spectral efficiency v.s. SNR for different HBF algorithms the HBF robust to channel estimation errors. We further
tested in the LoS channel model. proposed the CE-HBF-Net to achieve the joint channel esti-
mation and HBF optimization. In particular, with the pro-
posed IAI method, the information of the selected codewords
in the hierarchical search process is efficiently input to the
CE-HBF-Net with much lower input dimensions. Simulation
results have shown that with a well-designed loss function
and the introduction of the attention mechanism, the HBF-
Net and CE-HBF-Net can well handle the challenges of hard-
ware constraints and channel estimation errors, and achieve
significant performance improvement over the conventional
algorithms.
FIGURE 15. Spectral efficiency vs. SNR for different HBF algorithms with imperfect phase shifters.

the mmWave propagation channels and have robustness to the channel model mismatch.

Finally, we consider the effect of imperfect phase shifters in practical systems. As the two NNs are trained with the assumption of perfect phase shifters, there is a mismatch between the perfect phase shifters in the offline training stage and the imperfect ones in practical implementation in the online deployment. Unlike the quantization error, random phase and gain errors are neither known nor reciprocal [13]. Therefore, it is meaningful to explore the robustness of the DL-based HBF NNs to imperfect phase shifters with random phase and gain errors. According to [13], an imperfect phase shifter can be modeled as $\gamma e^{j(\phi-\delta)}$, where $j = \sqrt{-1}$, $\phi$ is the ideal phase shift, $\delta$ denotes the phase error satisfying a Gaussian distribution with zero mean and variance $\sigma_\delta^2$, i.e., $\delta \sim \mathcal{N}(0, \sigma_\delta^2)$, and $\gamma \sim \mathcal{N}(1, \sigma_\gamma^2)$ denotes the non-ideal gain of the phase shifter. According to [41] and [42], a typical $\sigma_\delta$ is set to 0.1 rad and a typical $\sigma_\gamma$ is set to 0.2.

Fig. 15 shows the spectral efficiency of different HBF algorithms with imperfect phase shifters, where the dotted lines represent the performance with perfect phase shifters and the solid ones represent the performance with imperfect phase shifters. As shown in this figure, the spectral efficiency loss of the proposed HBF-Net, CE-HBF-Net, and the HBF algorithm [4] is less than 0.4 bits/s/Hz. However, the spectral efficiency loss of the HBF algorithm [3] is

REFERENCES
[1] O. E. Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, Jr., "Spatially sparse precoding in millimeter wave MIMO systems," IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499–1513, Mar. 2014.
[2] X. Yu, J.-C. Shen, J. Zhang, and K. B. Letaief, "Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems," IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 485–500, Apr. 2016.
[3] F. Sohrabi and W. Yu, "Hybrid digital and analog beamforming design for large-scale antenna arrays," IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 501–513, Apr. 2016.
[4] T. Lin, J. Cong, Y. Zhu, J. Zhang, and K. Ben Letaief, "Hybrid beamforming for millimeter wave systems using the MMSE criterion," IEEE Trans. Commun., vol. 67, no. 5, pp. 3693–3708, May 2019.
[5] Y. Zhou, T. Lin, and Y. Zhu, "Automatic modulation classification in time-varying channels based on deep learning," IEEE Access, vol. 8, pp. 197508–197522, Oct. 2020.
[6] H. Ye, G. Y. Li, and B.-H. Juang, "Power of deep learning for channel estimation and signal detection in OFDM systems," IEEE Wireless Commun. Lett., vol. 7, no. 1, pp. 114–117, Feb. 2018.
[7] H. He, C.-K. Wen, S. Jin, and G. Y. Li, "Deep learning-based channel estimation for beamspace mmWave massive MIMO systems," IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, Oct. 2018.
[8] F. A. Aoudia and J. Hoydis, "Model-free training of end-to-end communication systems," IEEE J. Sel. Areas Commun., vol. 37, no. 11, pp. 2503–2516, Nov. 2019.
[9] A. M. Elbir, "DeepMUSIC: Multiple signal classification via deep learning," IEEE Sensors Lett., vol. 4, no. 4, pp. 1–4, Apr. 2020.
[10] Y. Yang, F. Gao, X. Ma, and S. Zhang, "Deep learning-based channel estimation for doubly selective fading channels," IEEE Access, vol. 7, pp. 36579–36589, Mar. 2019.
[11] E. Balevi, A. Doshi, and J. G. Andrews, "Massive MIMO channel estimation with an untrained deep neural network," IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2079–2090, Jan. 2020.
[12] A. Alkhateeb, S. Alex, P. Varkey, Y. Li, Q. Qu, and D. Tujkovic, "Deep learning coordinated beamforming for highly-mobile millimeter wave systems," IEEE Access, vol. 6, pp. 37328–37348, Jun. 2018.
[13] W. Wang, H. Yin, X. Chen, and W. Wang, "Robust and low-overhead hybrid beamforming design with imperfect phase shifters in multi-user millimeter wave systems," IEEE Access, vol. 8, pp. 74002–74014, Apr. 2020.
[14] T. Lin and Y. Zhu, "Beamforming design for large-scale antenna arrays using deep learning," IEEE Wireless Commun. Lett., vol. 9, no. 1, pp. 103–107, Jan. 2020.
[15] H. Huang, Y. Song, J. Yang, G. Gui, and F. Adachi, "Deep-learning-based millimeter-wave massive MIMO for hybrid precoding," IEEE Trans. Veh. Technol., vol. 68, no. 3, pp. 3027–3032, Mar. 2019.
[16] W. Xia, G. Zheng, Y. Zhu, J. Zhang, J. Wang, and A. P. Petropulu, "A deep learning framework for optimization of MISO downlink beamforming," IEEE Trans. Commun., vol. 68, no. 3, pp. 1866–1880, Mar. 2020.
[17] C.-H. Lin, Y.-T. Lee, W.-H. Chung, S.-C. Lin, and T.-S. Lee, "Unsupervised ResNet-inspired beamforming design using deep unfolding technique," in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2020, pp. 1–7.
[18] M. Wenyan, Q. Chenhao, Z. Zhang, and J. Cheng, "Sparse channel estimation and hybrid precoding using deep learning for millimeter wave massive MIMO," IEEE Trans. Commun., vol. 68, no. 5, pp. 2838–2849, Feb. 2020.
[19] S. Ramjee, S. Ju, D. Yang, X. Liu, A. El Gamal, and Y. C. Eldar, "Fast deep learning for automatic modulation classification," 2019, arXiv:1901.05850. [Online]. Available: http://arxiv.org/abs/1901.05850
[20] Y. Jin, J. Zhang, S. Jin, and B. Ai, "Channel estimation for cell-free mmWave massive MIMO through deep learning," IEEE Trans. Veh. Technol., vol. 68, no. 10, pp. 10325–10329, Nov. 2019.
[21] N. Song, C. Ye, X. Hu, and T. Yang, "Deep learning based low-rank channel recovery for hybrid beamforming in millimeter-wave massive MIMO," in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), May 2020, pp. 1–6.
[22] Q. Bai, J. Wang, Y. Zhang, and J. Song, "Deep learning-based channel estimation algorithm over time selective fading channels," IEEE Trans. Cognit. Commun. Netw., vol. 6, no. 1, pp. 125–134, Mar. 2020.
[23] J. Lee, G.-T. Gil, and Y. H. Lee, "Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications," IEEE Trans. Commun., vol. 64, no. 6, pp. 2370–2386, Jun. 2016.
[24] C.-R. Tsai and A.-Y. Wu, "Structured random compressed channel sensing for millimeter-wave large-scale antenna systems," IEEE Trans. Signal Process., vol. 66, no. 19, pp. 5096–5110, Oct. 2018.
[25] C.-R. Tsai, Y.-H. Liu, and A.-Y. Wu, "Efficient compressive channel estimation for millimeter-wave large-scale antenna systems," IEEE Trans. Signal Process., vol. 66, no. 9, pp. 2414–2428, May 2018.
[26] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, Jr., "Channel estimation and hybrid precoding for millimeter wave cellular systems," IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, Oct. 2014.
[27] C. Liu, M. Li, S. V. Hanly, I. B. Collings, and P. Whiting, "Millimeter wave beam alignment: Large deviations analysis and design insights," IEEE J. Sel. Areas Commun., vol. 35, no. 7, pp. 1619–1631, Jul. 2017.
[28] Z. Xiao, H. Dong, L. Bai, P. Xia, and X.-G. Xia, "Enhanced channel estimation and codebook design for millimeter-wave communication," IEEE Trans. Veh. Technol., vol. 67, no. 10, pp. 9393–9405, Oct. 2018.
[29] C. Chen, Y. Dong, X. Cheng, and L. Yang, "Low-resolution PSs based hybrid precoding for multiuser communication systems," IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 6037–6047, Jul. 2018.
[30] K. Chen and C. Qi, "Beam training based on dynamic hierarchical codebook for millimeter wave massive MIMO," IEEE Commun. Lett., vol. 23, no. 1, pp. 132–135, Jan. 2019.
[31] T. O’Shea and J. Hoydis, "An introduction to deep learning for the physical layer," IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[32] T. J. O’Shea, T. Erpek, and T. Charles Clancy, "Deep learning based MIMO communications," 2017, arXiv:1707.07980. [Online]. Available: http://arxiv.org/abs/1707.07980
[33] D. Wu, M. Nekovee, and Y. Wang, "Deep learning-based autoencoder for m-User wireless interference channel physical layer design," IEEE Access, vol. 8, pp. 174679–174691, Sep. 2020.
[34] S. Noh, M. D. Zoltowski, and D. J. Love, "Multi-resolution codebook and adaptive beamforming sequence design for millimeter wave beam alignment," IEEE Trans. Wireless Commun., vol. 16, no. 9, pp. 5689–5701, Sep. 2017.
[35] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[36] X. Cheng, X. Li, J. Yang, and Y. Tai, "SESR: Single image super resolution with recursive squeeze and excitation networks," in Proc. 24th Int. Conf. Pattern Recognit. (ICPR), Aug. 2018, pp. 147–152.
[37] D. Hu, Y. Zhang, L. He, and J. Wu, "Low-complexity deep-learning-based DOA estimation for hybrid massive MIMO systems with uniform circular arrays," IEEE Wireless Commun. Lett., vol. 9, no. 1, pp. 83–86, Jan. 2020.
[38] C.-K. Wen, W.-T. Shih, and S. Jin, "Deep learning for massive MIMO CSI feedback," IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018.
[39] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, "Deep learning based communication over the air," IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018.
[40] K. He and J. Sun, "Convolutional neural networks at constrained time cost," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., May 2015, pp. 5353–5360.
[41] R. Garg and A. S. Natarajan, "A 28-GHz low-power phased-array receiver front-end with 360° RTPS phase shift range," IEEE Trans. Microw. Theory Techn., vol. 65, no. 11, pp. 4703–4714, Nov. 2017.
[42] C. W. Byeon and C. S. Park, "A low-loss compact 60-GHz phase shifter in 65-nm CMOS," IEEE Microw. Wireless Compon. Lett., vol. 27, no. 7, pp. 663–665, Jul. 2017.

QIUJIN LU received the B.Eng. degree (Hons.) in communication science and engineering from Fudan University, in 2018, where she is currently pursuing the M.S. degree in communication science and engineering. Her current research interests include channel estimation, hybrid beamforming design, deep learning, and millimeter wave signal processing.

TIAN LIN (Graduate Student Member, IEEE) received the B.Eng. degree (Hons.) in communication science and engineering from Fudan University, in 2017, where he is currently pursuing the Ph.D. degree. His current research interests include hybrid beamforming for massive MIMO systems, millimeter wave signal processing, passive beamforming and channel estimation for intelligent reflecting surface-aided MIMO systems, and deep learning for physical layer communication.

YU ZHU (Member, IEEE) received the B.Eng. degree (Hons.) in electronics engineering and the M.Eng. degree (Hons.) in communication and information engineering from the University of Science and Technology of China, in 1999 and 2002, respectively, and the Ph.D. degree from the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, in 2007.
Since 2008, he has been with Fudan University, where he is currently a Professor with the School of Information Science and Technology. His current research interests include broadband wireless systems and networks and signal processing for communications. He received Shanghai Pujiang Scholar Award, in 2008, and Fudan Zhuoxue Award, in 2012. He served as the Physical Track Co-Chair for the IEEE WCNC 2013, the Co-Chair for the Wireless Communications Systems Symposium of the IEEE ICCC 2014 and IEEE ICCC 2019, and the Workshop Co-Chair for the Asia–Pacific Conference on Communications, in 2017. He served as an Editor for the IEEE Wireless Communications Letters and the Journal of Communications and Information Networks.