Download as pdf or txt
Download as pdf or txt
You are on page 1of 18

http://www.diva-portal.

org

Postprint

This is the accepted version of a paper published in IEEE Transactions on Vehicular Technology. This
paper has been peer-reviewed but does not include the final publisher proof-corrections or journal
pagination.

Citation for the original published paper (version of record):

Brandt, R., Bengtsson, M. (2015)


Distributed CSI Acquisition and Coordinated Precoding for TDD Multicell MIMO Systems.
IEEE Transactions on Vehicular Technology
http://dx.doi.org/10.1109/TVT.2015.2432051

Access to the published version may require subscription.

N.B. When citing this work, cite the original published paper.

Permanent link to this version:


http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166426
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, ACCEPTED FOR PUBLICATION, APRIL 2015 1

Distributed CSI Acquisition and Coordinated


Precoding for TDD Multicell MIMO Systems
Rasmus Brandt⇤ , Student Member, IEEE, and Mats Bengtsson, Senior Member, IEEE

Abstract—Several distributed coordinated precoding methods references therein. These methods typically require informa-
exist in the downlink multicell MIMO literature, many of which tion about the received signal covariance and local effective
assume perfect knowledge of received signal covariance and local channels at the involved nodes, and it is often assumed
effective channels. In this work, we let the notion of channel state
information (CSI) encompass this knowledge of covariances and that this information is perfectly known. In this work, we
effective channels. We analyze what local CSI is required in the denote the information about the received signal covariance
WMMSE algorithm for distributed coordinated precoding, and and effective channels as channel state information (CSI). We
study how this required CSI can be obtained in a distributed take a systems perspective and propose methods for estimating
fashion. Based on pilot-assisted channel estimation, we propose and acquiring the necessary CSI at the involved nodes in
three CSI acquisition methods with different tradeoffs between
feedback and signaling, backhaul use, and computational com- a distributed fashion. The resource allocation is based on
plexity. One of the proposed methods is fully distributed, meaning the WMMSE algorithm1 [12] for distributed weighted sum
that it only depends on over-the-air signaling but requires no rate optimization, because of its low per-iteration complexity
backhaul, and results in a fully distributed joint system when and tractable form. Due to poor performance when naïvely
coupled with the WMMSE algorithm. Naïvely applying the applying the WMMSE algorithm, we also propose some ro-
WMMSE algorithm together with the fully distributed CSI
acquisition results in catastrophic performance however, and bustifying procedures, leading to a robust and fully distributed
therefore we propose a robustified WMMSE algorithm based joint coordinated precoding and CSI acquisition system.
on the well known diagonal loading framework. By enforcing As the first step in the system design, we succinctly de-
properties of the WMMSE solutions with perfect CSI onto the scribe what information, in terms of weights and CSI, that
problem with imperfect CSI, the resulting diagonally loaded
is needed for the nodes of the network to perform their
spatial filters are shown to perform significantly better than
the naïve filters. The proposed robust and distributed system part in the WMMSE algorithm. There is a multitude of
is evaluated using numerical simulations, and shown to perform conceivable methods to obtain the necessary information at the
well compared with benchmarks. Under centralized CSI acquisi- nodes, e.g. using various combinations of channel estimation,
tion, the proposed algorithm performs on par with other existing feedback, signaling, backhaul, etc. We propose three methods
centralized robust WMMSE algorithms. When evaluated in a
for acquiring the necessary CSI. Based on channel estimation
large scale fading environment, the performance of the proposed
system is promising. through pilot transmissions, feedback, signaling, and backhaul
use, the proposed CSI acquisition methods correspond to
different tradeoffs between these techniques. In particular, one
I. I NTRODUCTION of the proposed CSI acquisition methods is fully distributed,
in the sense that the nodes of the network solely cooperate by
M ULTIPLE-ANTENNA coordinated precoding is a
promising technique for improving spectral efficiency
in multicell multiple-input multiple-output (MIMO) networks,
means of over-the-air signaling, thus requiring no backhaul.
A key component of the proposed CSI acquisition methods
is the estimation of the effective channels. It is based on
by serving several spatially separated users simultaneously in
synchronous pilot transmission in the downlink, enabling the
the same time/frequency resource block [1], [2]. The cascade
receiving user equipments (UEs) to estimate both desired
of physical channels and precoders are the effective channels
and interfering effective channels [4]. Assuming time-division
experienced by the receivers. By suitably selecting the pre-
duplex (TDD) operation and perfectly calibrated transceivers
coders, the downlink weighted sum rate of the network can
[15], [16], similar channel estimation can be performed in the
be maximized. The requirements for practical implementation
uplink at the receiving base stations (BSs). This is contrary
of coordinated precoding include channel estimation [3]–[5],
to frequency-division duplex operation, where the BSs obtain
robustness against channel estimation errors [6]–[10], and suf-
their required information by feedback and backhaul signaling.
ficiently low complexity; preferably achieved using distributed
When combining the fully distributed CSI acquisition with the
methods [11]–[14].
WMMSE algorithm, the joint system is fully distributed.
In the multicell MIMO literature, there are several examples
of distributed coordinated precoding methods; see e.g. [11] and Naïvely applying the original WMMSE algorithm together
with the proposed fully distributed CSI acquisition method
Copyright (c) 2015 IEEE. Personal use of this material is permitted. leads to catastrophic performance however. This is because
However, permission to use this material for any other purposes must be the original algorithm was not developed to be robust against
obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
The authors are with the Department of Signal Processing, School of Elec-
trical Engineering, KTH Royal Institute of Technology, SE-100 44 Stockholm, 1 The algorithm takes this name since it is a Weighted Minimization of the
Sweden. E-mails: rabr5411@kth.se, mats.bengtsson@ee.kth.se. Mean Squared Error (WMMSE).
2 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, ACCEPTED FOR PUBLICATION, APRIL 2015

imperfect CSI. We therefore propose a robustified WMMSE quantized feedback. Other robustified versions of the WMMSE
algorithm which retains the distributedness of the original algorithm, where the contribution of the downlink channel
algorithm, contrary to state of the art [7]–[10]. We formu- estimation errors in the involved covariance matrices was
late a worst-case WMMSE problem, and solve an upper averaged out, were proposed in [7], [8]. The same approach
bounded version of the problem. The resulting precoders are was taken in [9], where it was mentioned that this corresponds
diagonally loaded, a technique which is well known for its to optimizing a lower bound on the achieved performance, and
robustifying effect on beamformers [17]–[23]. The optimal in [10] where the lower bound was explicitly derived. The
amount of diagonal loading is determined by the worst-case filters were in effect robustified by diagonal loading, where
channel estimation errors, whose statistics are unfortunately the diagonal loading factors were determined by the downlink
unavailable in the proposed CSI acquisition setup. Instead, we channel estimation performance. The work in [7]–[10] was
propose a practical method for implicitly selecting the amount mainly focused on proposing robust WMMSE methods and
of diagonal loading for the precoders. At the UEs, we show thus the actual CSI acquisition was not conclusively studied,
an inherent property of the (spatial) receive filters and mean contrary to this paper. The major assumption in the system
squared error (MSE) weights obtained from the WMMSE model of [7]–[10] is that downlink channel estimation is
algorithm with perfect CSI. When this property is enforced performed at the UEs, and that the downlink channel estimates
onto the filters with imperfect CSI, the resulting receive filters are fed back to the BSs. In this work we are interested in
are also diagonally loaded. The robust MSE weights have TDD channel estimation and although the algorithms in [7]–
smaller eigenvalues than the non-robust MSE weights. This [10] could be applied in such a setting, doing so leads to some
can be interpreted as the receivers requesting lower data rates idiosyncrasies that will be detailed in Sec. IV-E. Due to the
when there are large discrepancies in their estimated CSI. system model in [7]–[10], the nodes of the network require
feedback of all filters in all iterations of the algorithm, leading
to a large amount of feedback which would typically be
A. Related Work implemented using a centralized CSI acquisition infrastructure.
In [4], a reciprocal channel was exploited to directly es- In this paper, contrary to [7]–[10], we incorporate a detailed
timate the filters maximizing the signal-to-interference-and- analysis of the CSI acquisition component of the system,
noise ratio, requiring no other signaling. Similar work was leading up to a robust and distributed coordinated precoding
performed in [24], where non-linear filters also were studied. system.
Focusing on the reciprocity, and using the receive filters as
transmit filters in the uplink, [25] performed extensive simu-
lations for a beam selection approach. Our proposed effective
channel estimation is similar to their ‘busy burst’ methodology. B. Contributions
For the single-stream multiple-input single-output (MISO) The major contributions of this work are as follows.
interference channel, an analytical method for finding the rate-
maximizing zero-forcing beamformers was derived in [13]. • We succinctly describe the required information for the
Saliently, the method does not require cross-link CSI, and was network nodes to perform one WMMSE iteration. We
consequently shown to be highly robust against CSI imper- propose three CSI acquisition methods which provide
fections. In [14], decentralized algorithms based on WMMSE the necessary information. The methods have varying
ideas were proposed, achieving faster convergence than the levels of distributedness and signaling needs. One of the
original WMMSE algorithm in [12], in addition to signaling proposed methods is fully distributed, meaning that it can
strategies for obtaining the necessary CSI. TDD reciprocity be implemented entirely by over-the-air signaling.
was assumed, and the UEs used combinations of inter-cell • For resilient performance against channel estimation
and intra-cell effective channel pilot transmissions. Contrary errors, we propose a robustified, but still distributed,
to our work, perfect channel estimation was assumed, and their WMMSE algorithm to be applied together with the
decentralized algorithms still require some BS backhaul. proposed CSI acquisition schemes. The robustness is due
Weighted sum rate maximization by means of weighted to diagonal loading, and the level of diagonal loading
MMSE minimization was originally proposed for multiuser for the precoders is determined implicitly by a practical
MIMO systems in [26], where the MSE weights were used procedure.
to equate the Karush-Kuhn-Tucker (KKT) conditions of the • We identify and explore new inherent properties of the
weighted MMSE problem to the KKT conditions of the WMMSE algorithm. When the properties are explicitly
weighted sum rate problem. This same method was directly enforced onto solutions with imperfect CSI, the resulting
applied to multicell MIMO systems in [7], [27], but it was not receive filters are diagonally loaded.
until [12] that a rigorous connection to the multicell weighted • Performance is evaluated numerically, and it is shown that
sum rate problem was presented. An earlier work is [6], the proposed fully distributed system performs excellently
where the weighted MMSE optimization problem was solved compared with the naïve WMMSE algorithm with fully
using the same technique, but without explicitly providing distributed CSI acquisition. With centralized CSI acqui-
the rigorous connection to the weighted sum rate problem. In sition, the proposed robust WMMSE algorithm performs
[6], a robust WMMSE algorithm was also suggested for the on par with existing robust WMMSE algorithms, which
case of norm bounded channel uncertainty arising from limited however require centralized CSI acquisition.
BRANDT AND BENGTSSON: DISTRIBUTED CSI ACQUISITION AND COORDINATED PRECODING FOR TDD MULTICELL MIMO SYSTEMS 3

C. Notation One main assumption in this work is that there is a per-


The operations (·)⇤ , (·)H , (·)T are complex conjugate, fectly reciprocal uplink channel available. That is, the channel
Hermitian transpose, and regular transpose, respectively. The in the uplink⇣ from ⌘UE jl to BS i is H jl i = Hjl i . Let
T

operators Tr (·), k·k2 , k·kF are the matrix trace, Euclidean x ⇤ik ⇠ CN 0, ⌅ ik be the transmitted signal from UE ik in
norm and Frobenius matrix norm, respectively. We denote the the uplink. The uplink is described by the interfering multiple
partial ordering of positive (negative) semidefinite matrices as access channel, and the received signal for BS i is then
⌫ ( ). The mth largest eigenvalue (singular value) of Q is
Kc Kt X
Kc
denoted m (Q) (sm (Q)). The zero-mean and covariance Q X X
complex symmetric Gaussian distribution is CN (0, Q), and y ⇤i = HT ⇤
ik i x ik + HT ⇤ ⇤
jl i x jl + z i , (3)
E (·) denotes expectation. Estimated quantities are denoted k=1 j6=i l=1

with a hat b a and uplink quantities with an arrow a . The where z ⇤i ⇠ CN 0, t2 IMt . For convenience, we work with
Kronecker delta is i,j . the complex conjugate version of the received signal. That is,
II. D ISTRIBUTED W EIGHTED S UM R ATE O PTIMIZATION the model we will use for the uplink is:
FOR THE M ULTICELL MIMO D OWNLINK Kc Kt X
Kc
⇤ X X
Our system model is a multicell system with Kt BSs, each y i = y ⇤i = HH
ik i x ik + HH
jl i x jl + z i . (4)
serving Kc UEs, for a total of Kr = Kt Kc UEs. We index the k=1 j6=i l=1
BSs as i 2 {1, . . . , Kt }. The kth served UE of BS i is indexed
With the uplink model in (4), the channel estimation in
by the pair of indices (i, k). For compactness, we will often
Sec. III can be tailored to the needs of the weighted sum rate
write this pair of indices as ik . The system is operating using
optimization, which we detail in the next section.
coordinated precoding, i.e. each UE is only served data from
one BS and the signals from the other BSs constitute inter-
cell interference2 . When Kc 2, intra-cell interference is also A. Weighted Sum Rate Optimization
observed. The BSs are equipped with Mt antennas each, the
UEs have Mr antennas and are served Nd data streams each3 . Since the CSI acquisition to be proposed is tailored for
Communication takes place both in the downlink and in the the WMMSE algorithm [12], we now briefly summarize the
uplink. We focus on optimizing performance in the downlink, algorithm, as well as introduce some necessary notation.
since that typically experiences heavier traffic loads than the By assigning the UEs data rate weights P ↵ik 2 [0, 1] , 8 ik ,
uplink. The presented method could equally well be applied the weighted sum rate is formulated as (i,k) ↵ik Rik . This
in the uplink however. In the downlink, the multiuser in- formulation describes the ultimate performance of the system,
teraction is described by the interfering broadcast channel. but is just one way of forming a system-level utility from
Denote a realization of the flat-fading MIMO channel between the user rates [2]. The data rate weights ↵ik can be selected
BS j and UE ik as Hik j and let each user’s data signal corresponding to user priority, e.g. to achieve a proportionally
xik ⇠ CN (0, INd ) be linearly precoded by Vik 2 CMt ⇥Nd . fair solution [28]. In the following, we will assume that the
The received signal at UE ik is then weights are selected at the BSs.
X Let Pi be the sum power constraint for BS i. With the
yik = Hik i Vik xik + Hik j Vjl xjl + zik , (1) precoders {Vik } as optimization variables, the weighted sum
(j,l)6=(i,k) rate optimization problem is:
where the last term is a white Gaussian noise term X
zik ⇠ CN 0, r2 IMr . The signals {xik } and {zik } are maximize ↵ ik R ik
{Vik }
i.i.d. over users. Given these assumptions, the received (i,k)
interference (5)
P plus noise covariance matrix for UE ik is Kc
X
i+n
ik =
H H 2
(j,l)6=(i,k) Hik j Vjl Vjl Hik j + r I.
subject to Tr Vik ViHk  Pi , i = 1, . . . , Kt .
Assuming that the decoders in the UE terminals treat k=1
interference as additive noise, the achievable downlink data Due to the non-convexity of (2), this is a non-convex opti-
rate for UE ik is mization problem. At least when Mr = 1, the problem is also
⇣ 1

Rik = log det I + ViHk HH ik i
i+n
ik H i k i V i k
. (2) NP-hard [29]. We can therefore only reasonably strive to find
a locally optimal solution.
Note that (2) is non-convex in {Vik }, since the precoders By introducing additional optimization variables {Wik }
appear inside i+n
ik . This non-convex dependence on the pre- (acting like MSE weights), it was shown in [12] that (5) has
coders describes the coupling between users, and will be the the same global solutions as the following weighted MMSE
key challenge in the optimization to come. optimization problem:
2 Since the focus of this paper is the distributed implementation of multicell X
processing, we do not investigate joint transmission, where several BSs jointly minimize ↵ik (Tr (Wik Eik ) log det (Wik ))
{Aik },{Vik }
serve the UEs with data. Such joint transmission requires significant backhaul (i,k)
{Wik 0}
between BSs, and is not amenable to fully distributed implementation.
3 The system model can easily be extended to scenarios where the BSs serve Kc
(6)
X
different number of UEs each and scenarios where the nodes have different subject to Tr Vik ViHk  Pi , i = 1, . . . , Kt .
number of antennas.
k=1
4 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, ACCEPTED FOR PUBLICATION, APRIL 2015

TABLE I Algorithm 1 WMMSE Algorithm [12] (Perfect CSI)


S UMMARY OF CSI QUANTITY SHORTHANDS 1: repeat
At UEs:
Downlink F i k = H i k i Vi k 1 1
2: W ik = I F H ik ik F ik
= F ik F H i+n
i k + ik
p 1/2
ik
P 3: Aik = ik1 Fik , Uik = ↵ik Aik Wik
i+n = (j,l)6=(i,k) Hik j Vjl VjHl HH 2
ik ik j + rI At BSs: PK c
Uplink Gi k = H Hik i Uik
4: Find µi which satisfies k=1 Tr Vik ViHk  Pi
P 1 p 1/2
i = s+i =
i
H H H
(j,l) Hjl i Ujl Ujl Hjl i 5: Bik = ( i + µi I) Gik , Vik = ↵ik Bik Wik
6: until convergence criterion met, or fixed number of iters.

TABLE II
Q UANTITIES THAT MUST BE SIGNALED IN ORDER FOR EACH NODE TO
PERFORM ONE ITERATION OF THE WMMSE ALGORITHM for BS i is a quadratically constrained quadratic program with
Covariance matrix Effective channel(s) Weight(s)†
optimization variables {Vik }Kk=1 . The solution is [12]
c

1
UE ik ik F ik ↵ik Vik = ↵ik ( i + µi I) HH i k i A ik W i k , 8 i k , (10)
1/2 Kc P
BS i {Gik }K where i = s+i (j,l) ↵jl Hjl i Ajl Wjl Ajl Hjl i is a signal
c
{Wi }k=1 H H
i =
i k=1 k
† Note that the user priorities {↵ik }K c
k=1 are selected, and thus fully plus
PK interference covariance matrix for BS i in the uplink.
known, at the serving BSs. If k=1 c
Tr Vik ViHk  Pi is satisfied for µi = 0, the sum
power constraint for BS i is inactive and PKthe problem is solved.
Otherwise, µi > 0 is found such that k=1 c
Tr Vik ViHk = Pi
The {Aik } are linear receive filters, and
holds. This can be done efficiently using e.g. bisection [12].
⇣ ⌘
H When the precoders have been found, a new iteration is com-
E ik = E x ik A H ik y ik x ik A H ik y ik
(7) menced by again optimizing over {Aik }. With each update
=I AH
i k Hik i V i k ViHk HH H
ik i A ik + A ik ik A ik of {Aik }, {Wik } or {Vik }, the objective value in (6) cannot
increase. The iterations thus continue until convergence, or for
is the MSE matrix for UE ik . Further, a fixed number of iterations.
i+n
ik = H V V H
HH
i k i ik ik i k i + ik is the received signal and 2) Required Local Information for the WMMSE Iterations:
interference plus noise covariance matrix for UE ik . In order to clarify what information the CSI acquisition
The optimization problem in (6) is still non-convex over the schemes should provide, we introduce some shorthands for
joint set {Aik , Wik , Vik }, but the key benefit of (6) over (5) the quantities involved in the WMMSE algorithm. For UE ik ,
is that it is independently convex in the blocks of variables we define a weighted receive filter as Uik = ↵ik Aik Wik
p 1/2
{Aik }, {Wik }, and {Vik }, when the remaining blocks are and denote the effective downlink channel as Fik = Hik i Vik .
kept fixed. Further, a stationary point can be found through The receive filter can then be written as Aik = ik1 Fik .
alternating minimization4 [30, Ch. 2.7] over the blocks [12]. Symmetrically, in the uplink for UE ik , the precoder is
There is a one-to-one correspondence between the stationary p 1/2
Vik = ↵ik Bik Wik and the effective uplink channel is
points of (5) and the stationary points of (6) [12], and since
Gik = HH ik i Uik . Finally, the component precoder is Bik =
(6) optimizes a locally tight lower bound of (5), alternating 1
( i + µi I) Gik . We summarize the shorthands in Table I,
minimization of (6) will also converge to a stationary point of
and the WMMSE algorithm written using these shorthands in
(5) [31].
Algorithm 1.
1) WMMSE Algorithm for Distributed Weighted Sum Rate
The WMMSE algorithm operates in two phases: one in
Optimization: First, by fixing {Wik , Vik } in (6), it can easily
which the UEs form their receive filters and weights, and one
be shown that the problem decouples over the UEs. The
in which the BSs form the precoders for their served UEs.
solution is the well known MMSE receiver
The optimization steps at the UEs and BSs are completely
A ik = 1
8 ik . (8) decoupled, and as summarized in Table II, the nodes only
ik Hik i Vik ,
require local CSI and local weights. Hence, the WMMSE
Next, by fixing {Aik , Vik }, the problem again decouples over algorithm is an example of distributed resource allocation.
the UEs. UE ik should therefore solve minWik Tr (Wik Eik ) In Sec. III, we will describe how the nodes can exploit the
log det (Wik ), and the solutions are channel reciprocity to obtain local CSI in a distributed fashion.
1
Wik = Eik1 = I ViHk HH
ik i
1
i k Hik i V i k , 8 ik , (9) III. D ISTRIBUTED CSI ACQUISITION
where the last equality comes from plugging in Aik from (8). According to Table II, the UEs require knowledge about
the effective channel Fik from their serving BSs, as well
Finally, it remains to solve (6) for {Vik }, while keeping
as the signal and interference plus noise covariance matrix
the UE variables {Aik , Wik } fixed. The problem decouples
ik . The BSs need to know the effective uplink channels
over the BSs, and it can be shown that the remaining problem
{Gik }K k=1 to the UEs they serve, the corresponding MSE
c

1/2
4 This technique is also known as block coordinate descent or block weights {Wik }K k=1 , and the uplink signal plus interference
c

nonlinear Gauss-Seidel in the literature. covariance matrix i . Several methods for obtaining the
BRANDT AND BENGTSSON: DISTRIBUTED CSI ACQUISITION AND COORDINATED PRECODING FOR TDD MULTICELL MIMO SYSTEMS 5

Kc
X
BSs estimate {Gik }K
Downlink training
k=1 and
c
BSs transmit V i k P ik i

Uplink training
k=1

... ... ... ...

BS 1 BS i BS Kt BS 1 BS i BS Kt

UE 11 UE 12 UE ik UE KtKc UE 11 UE 12 UE ik UE KtKc
... ... ... ...
UEs estimate Fik and ik UEs transmit Uik P ik

Fig. 1. CSI estimation in one subframe (cf. Fig. 2). In each subframe, the downlink channels are estimated using pilots from the BSs. Later, the uplink pilots
are estimated using pilots from the UEs. Additionally, the UEs feed back Wik to their serving BS using an out-of-band feedback link.

Subframe n-1 Subframe n Subframe n+1

Guard
Uplink pilots Uplink data Downlink pilots Downlink data time Uplink pilots Uplink data Downlink pilots Downlink data

Optimization @ BSs Optimization @ UEs Optimization @ BSs Optimization @ UEs

Fig. 2. Schematic drawing of subframes.

required CSI5 at the nodes can be imagined, using various 90 uplink-downlink iterations in one coherence interval when
combinations of channel estimation, feedback, signaling and the UEs are slowly moving. In continuous fading channels, the
backhaul. In this section, we will propose three CSI acquisition proposed algorithm would possibly instead be able to track the
methods, with different tradeoffs between these aspects. channel variations, assuming that they are slow enough. In the
The channel estimation in the proposed methods exploits rest of the paper, we make the assumption that the channel is
the reciprocity of the network, and uses pilot transmissions in changing slowly enough for the iterative algorithm to reach
both uplink and downlink. As the effective channels change adequate performance.
between iterations in the WMMSE algorithm, we propose We now detail the different CSI acquisition methods, which
to perform a training phase between one iteration and the all rely on pilot-assisted channel estimation. When a statistical
next. A schematic drawing of the subframe structure that we characterization of the channel is available, the MMSE channel
envision can be seen in Fig. 2. The subframe is split between estimator [5] is typically used. Here we estimate the effective
pilot transmission and data transmission, in both the uplink channels, which are updated in each WMMSE iteration based
and downlink. Data transmission thus takes place between the on the current channel conditions. Obtaining a statistical
filter updates of the algorithm. The ratio between uplink and characterization of the effective channel is thus complicated.
downlink data transmission lengths could be flexibly allocated In the estimation, we therefore regard the effective channels
[32]. Before the iterative algorithm has converged, the data as deterministic but unknown. Under this perspective from
rates that are achievable in the downlink data transmission classical estimation theory, it is easy to find the minimum
phase may be low, but not negligible, as shown by the variance unbiased (MVU) estimator.
numerical results in Sec. V-A1. An illustration of the channel
estimation in one subframe is shown in Fig. 1. A. Fully Distributed CSI Acquisition
In block fading channels, the coherence interval should be First, we seek to estimate the effective downlink channel
sufficiently long such that the iterative algorithm can perform Fik = Hik i Vik using synchronous pilot transmissions. In the
enough iterations to reach good performance. The deployment downlink training phase, the BSs transmit orthogonal pilot
scenario will determine the coherence time of the channel, and sequences6 Pik 2 CNd ⇥Np,d per user, such that Pik PH jl =
the details of the frame structure will determine the number Np,d INd ik ,jl . In order to fulfill the orthogonality requirement,
of subframes that can be transmitted within one coherence Np,d Kr Nd . The received signal Yik 2 CMr ⇥Np,d at UE
interval. As a brief example, under a block fading channel ik is then
with carrier frequency fc = 2 GHz and UE speed v = 3 km/h, X
the coherence time can be modeled as Tc = 2f1c vc = 90 ms Y i k = H i k i V i k Pi k + Hik j Vjl Pjl + Zik . (11)
[33]. For future 5G systems, the TDD switching periodicity (j,l)6=(i,k)
is planned to be 1 ms or less [32], [34], leading to at least 6 The framework can be extended to allow for non-orthogonal pilots, but
then a pilot allocation scheme must be set up to minimize the problem of pilot
5 We remind the reader that our notion of ‘CSI’ encompasses knowledge of contamination [35]. Furthermore, the resource allocation step should take the
the effective channels and the covariance matrices; see Tables I and II. pilot contamination into account.
6 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, ACCEPTED FOR PUBLICATION, APRIL 2015

Notice that the power allocated to the pilots is the same as the The WMMSE algorithm however needs an estimate of i =
s+i s+i+n
power allocated to the data symbols in (1). This will enable i , without the noise covariance component of i . In
distributed and unbiased estimation of ik . This type of pilot Sec. IV-C, we resolve this issue by modifying the WMMSE
transmissions, intended to estimate the effective channels, are algorithm.
called ‘UE-specific reference signals’ in the LTE standard [36]. When forming the precoder in (10), the product
p 1/2
Assuming that UE ik knows its designated pilot Pik , this ↵ik HH ik i Aik Wik = Gik Wik is needed. Instead of inde-
is a deterministic parameter estimation problem in Gaussian pendently estimating this quantity in a second uplink esti-
noise. The MVU estimator of the effective channel Fik 2 mation phase, we let UE ik feed back Wik to its serving
CMr ⇥Nd is then [5]: BS i. Together with (15), BS i can then form G b i W1/2
k ik
and use that in (10). The point of this procedure is to avoid
Fb i = 1 Yi PH = Hi i Vi + 1 Zi PH . (12)
k
Np,d k ik k k
Np,d k ik signal cancelation [37], where a small mismatch between the
1/2
estimate of Gik Wik and the estimate of i can have a
The MVU estimator is an unbiased, efficient and asymptot-
large detrimental impact on performance. If Gik and i are
ically consistent (in Np,d ) estimator of Fik . In addition to
estimated using the same pilot transmissions, as in (15) and
knowing Fik , UE ik also needs knowledge of ik 2 CMr ⇥Mr .
(16), the covariance matrix can be decomposed as b s+i+n =
This can be achieved by applying the sample covariance b i+n + Gbi G
i
b H . Because of this structure, there is no mismatch
estimator: i k ik
between G b i W1/2 and b i , and signal cancelation does not
b i = 1 Yi Y H
k ik
occur [37].
k
Np,d k ik
X It can be shown that Rik = log det (Wik ). Feedback of
1
= Hik j Vjl VjHl HH
ik j + Zi ZH the eigenvalues of Wik therefore constitutes a rate request for
Np,d k ik (13) each data stream of UE ik , describing what rate that stream can
(j,l)
1 X handle under the current network conditions. This information
+ Hik j V j l P j l Z H H H H
ik + Zik Pjl Vjl Hik j . is already fed back to the serving BS in a practical system.
Np,d
(j,l) Recall that ↵ik is fixed and known at BS i, and does therefore
Since the only stochastic component of Yik is Zik , the not need to be fed back.
estimator in (13) is unbiased. Remark 1. The CSI acquisition proposed in this section is
The uplink estimation is performed in a similar man- fully distributed over BSs and UEs, in the sense that only
ner as the downlink estimation. Now the UEs each trans- over-the-air signaling is required. UE ik feeds back Wik to its
mit a signal X ik = Uik P ik , where P ik 2 CNd ⇥Np,u serving BS, but the BSs do not need to share any information
are orthogonal pilots, such that P ik P Hjl = Np,u INd ik ,jl . over a BS backhaul.
As will be shown by Proposition 1 in Sec. IV-D,
2 1/2
kUik kF = ↵ik ||Aik Wik ||2F  ↵ik Nd / r2 . In order
to maximize the uplink B. CSI Acquisition with Global Sharing of Individual Scaling
p estimation SNR, the scaling factor Parameters
is set as7 = Pr r2 /Nd , where Pr is the maximum
transmit power of the UEs. The UE quantities Pr and r2 As noted in the previous section, and proved in Sec. IV-D,
2
are assumed to be known at the BSs, such that they have kUik kF  ↵ik Nd / r2 . The scaling factor was set based
perfect a priori knowledge of . For this setup, assuming on this to maximize the uplink transmit power. However,
synchronized pilot transmissions from the UEs, the received unless the inequality is met with equality and ↵ik = 1,
signal Y i 2 CMt ⇥Np,u at BS i during the uplink training the transmit power constraint of that particular UE is not
phase is met. Correspondingly, the uplink estimation SNR suffers for
Kc Kt X
Kc
that UE. If the requirement of fully distributed estimation of
X X the uplink covariance matrix is dropped, and by introducing
Yi = HH
ik i U ik P ik + HH
jl i U jl P jl + Z i .
individual scaling factors for the UEs, the maximum uplink
k=1 j6=i l=1
(14) transmit power can always be used.
The MVU estimator of the uplink effective channel Gik 2 In this section, we keep the downlink estimation the same as
CMt ⇥Nd is in Sec. III-A, but modify the uplink estimation to maximize
the transmit power used. The BSs will then need access to
b i = 1 Y i P H = HH Ui + 1 Z i P H . (15)
G a backhaul network, where information about the individual
k ik ik i k ik
Np,u Np,u
scaling parameters canp be shared.
Furthermore, the signal and interference plus scaled noise Letting X ik = U Pr Uik P ik , the effective uplink trans-
covariance matrix s+i+n 2 CMt ⇥Mt is estimated using the k ik kF
i
mit power is maximized for UE ik . The received signal at BS
sample covariance:
i is then
b s+i+n = 1 1
YiYH (16) p X 1
i 2 i . Y i = Pr HH Uj P j + Z i . (17)
Np,u kUjl kF jl i l l
(j,l)
7 Note 2
that the UE dependent factor ↵ik in Uik F should not be removed
by the scaling, since then i cannot be estimated in a fully distributed fashion. We now assume that the individual scaling factors kUik kF are
If ↵ik < 1, the full transmit power of UE ik cannot be used. fed back from the UEs to their serving BSs, and then globally
BRANDT AND BENGTSSON: DISTRIBUTED CSI ACQUISITION AND COORDINATED PRECODING FOR TDD MULTICELL MIMO SYSTEMS 7

TABLE III
F EEDBACK AND ESTIMATION NEEDED FOR THE DIFFERENT CSI ACQUISITION METHODS .

Method Estimated at BS i feedback to Estimated at UE ik feedback to Shared information over


UE ik served UE ik BS i serving BS i BS backhaul

Fully distributed (Sec. III-A) ik , F ik ↵ik ⇧ i, {Gik } Wik —


Globally shared individual scale ik , F ik ↵ik ⇧ {Gjl i } Uik F
, Wik { Ujl F
}
factors (Sec. III-B)
Globally shared filters (Sec. III-C) {Hik j }† ↵ik ⇧, {Vjl } {Hjl i }† Uik , Wik {Ujl }, {Wjl }, {Vjl }
⇧ The user priorities only need to be fed back if/when they are changed.
† The estimated quantities for the methods in Sec. III-A and Sec. III-B depend on the transmit and receive filters, and must therefore be re-estimated in every
subframe. The estimated quantities for the method in Sec. III-C do not change within one coherence block, and can therefore be improved upon in every
subframe.

shared over a BS backhaul. BS i can then estimate the effective TABLE IV


channels from UE jl as T OTAL ESTIMATION COMPLEXITY, PER ITERATION AND UE.

b j i = pkUjl kF Y i P H = HH Uj + pkUjl kF Z i P H .
G Method Approximate number of flops
l jl ik j l jl
Pr Np,u Pr Np,u Sec. III-A (Mr Nd + Mr2 )Np,d + Mt Nd Np,u + Mt2 Np,u /Kc
(18)
Sec. III-B (Mr Nd + Mr2 )Np,d + (Mt Nd Np,u + Mt2 Nd + Mt2 )Kt
Since the scaled pilots effectively all have the same
weight, the sample covariance estimator of i in (16) Sec. III-C Mr Mt Kt (Np,d + Np,u ) + (Mr Mt Nd + Mr2 Nd +
Mr2 )Kr + (Mr Mt Nd + Mt2 Nd + Mt2 )Kt + Mr
cannot be used. PInstead, we rely on the biased estima-
tor b s+i+n
i = b bH
(j,l) Gjl i Gjl i . The bias is determined
p
by the factors kUik kF / Pr and the pilots { P jl }. Since Rayleigh fading, vec (Hik j ) ⇠ CN (0, I), the MMSE estimator
2
kUik ⇥kF p ↵ik Nd / ⇤r2 , the scaling factors could be quantized [5] is p
over 0, ↵ik Nd / r , 8 ik . b Pj / (Kc Mt )
Hik j = Y i k PH
j. (20)
This estimation scheme is similar to one proposed in [14], Pj
Np,d Kc M + 2
r
where a scaled version of Aik was used as the uplink precoder.
t

The MSE weights Wik can then be directly estimated at the Assuming that the noise variance r2 is known, and that all
serving BSs, and do not need to be fed back. However, in order precoders {Vjl } have been fed back to UE ik , it can form
X
for the BSs to estimate b s+i+n in that estimation scheme, they bi = H b i i Vi , b i = b i j Vj V H H
bH 2
F H jl ik j + r I. (21)
i
must exchange the MSE weights for their corresponding UEs
k k k k k l
(j,l)
over the backhaul. In essence, reduced over-the-air feedback
has been traded for more backhaul use. With the subframe structure in Fig. 2, consecutive training
phases can be used to monotonically improve the channel
Remark 2. The CSI acquisition proposed in this section is estimates in one coherence block of the channel. This can
fully distributed over the UEs, but not over the BSs. Each BS be done using iterative techniques, see e.g. [38, Ch. 12.6].
needs knowledge of the individual scaling factors for all UEs, In the uplink, assuming feedback of receive filters and MSE
information which is shared over a BS backhaul. weights, which are shared among all BSs, G b i and b i are
k
formed in a similar fashion as in (21).
C. CSI Acquisition with Global Sharing of Precoders, Receive Remark 3. The CSI acquisition proposed in this section is
Filters and MSE Weights centralized. It requires significant signaling of filters among
Lastly, we present an CSI acquisition scheme which re- BSs and UEs in every subframe. In terms of estimating the
lies even further on feedback, signaling and backhaul. We underlying channels {Hik j }, it is however distributed over
present this method since the state-of-the-art robust WMMSE the BSs and UEs.
algorithms in [7]–[10] require this type of CSI acquisition.
In this scheme, only the underlying channels are estimated
exploiting the reciprocity, and the filters and MSE weights D. Feedback Requirements, Computational Complexity and
must be signaled between all nodes. Quantized MSE Weight Feedback
In the downlink training, Pj 2 CMt ⇥Np,d are orthogonal We compare the feedback requirements of the proposed
pilots sent from BS j such that Pi PH j = Np,d IMt i,j . The
estimation schemes in Table III. The estimation matrix
received signal at UE ik is then operation complexities [39] are shown in Table IV. The
r
X
r Mr Mt Kt (Np,d + Np,u ) term in the Sec. III-C estimation
Pi Pj method flop count dominates all other terms when the number
Y ik = Hi i Pi + Hi j P j + Z i k .
Kc Mt k K c Mt k of pilots is large. This effect is illustrated in Fig. 8 of
(j,l)6=(i,k)
(19) Sec. V-A3, where the complexities of the estimators are
This type of pilot transmissions are called ‘cell-specific ref- visually compared with the correspondingly achieved sum
erence signals’ in the LTE standard [36]. For the case of rates.
8 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, ACCEPTED FOR PUBLICATION, APRIL 2015

In the CSI acquisition proposed in Sec. III-A and Sec. III-B,

Average sum rate [bits/s/Hz]


40
feedback of the MSE weights to the serving BS is required. 35 Perfect CSI
Sec. III−C
In order to be practical, the MSE weights should be quantized 30 Sec. III−B
and fed back. Since the MSE matrix Wik is Hermitian, it 25 Sec. III−A
can be quantized by quantizing its eigenvalue decomposition. 20
The eigenvectors can be quantized using e.g. Grassmannian 15
subspace packing [40]. Recalling that s1 (·) denotes the largest 10
singular value, we have the following helpful lemma for the 5
quantization of the eigenvalues: 0
0 5 10 15 20 25 30
Lemma 1. The eigenvalues of the MSE weights are bounded Downlink SNR [dB]
P s2 (H )
as 1  n (Wik )  1 + i 1 2 ik i , 8 ik , n. Fig. 3. Sum rate performance when naïvely applying the WMMSE algorithm
r
together with the CSI acquisition schemes. The scenario was a Kt = 3,
Proof: The proof is given in Appendix A. Kc = 2, Mt = 4, Mr = 2 interfering broadcast channel with Nd = 1.
The square roots of the eigenvalues of Wik can therefore The channels were i.i.d. Rayleigh fading, and the uplink SNR was set
be quantized by finding8 a suitable scalar quantizer over the as SNRu = Pr / t2 = 10 dB for all links. Note that SNRd = Pt / r2
affects both the power constraint in the WMMSE algorithm, as well as the
interval in Lemma 1. After UE ik has fed back the quantized estimation performance in the downlink estimation, since the downlink pilots
eigenvectors and the quantized square roots of the eigenvalues, are precoded with the same precoders as used in the data transmission.
the serving BS can then form the reconstructed square root of
the MSE weight W c 1/2 .
ik
The proposed CSI acquisition methods provide estimates both
PAs mentioned in Sec. III-A, Rik = log det (Wik ) = of the downlink effective channels, as well as of the uplink ef-
n log ( n (Wik )) can be seen as the data rate (summed
over data streams) for UE ik . Quantizing n (Wik ) therefore fective channels. Due to the definition of these effective chan-
corresponds to making a set of discrete rates available to the nels, the general uncertainty set in (22) cannot be explicitly
UE, corresponding to e.g. a set of different modulation and defined in terms of the uplink and downlink estimation errors
coding schemes. simultaneously. For example, one of the terms in the objective
1/2 H
function of (22) is Wik AH i k Hik i V i k = W i k G i k V i k =
IV. I NHERENT AND E NFORCED Wik Aik Fik , which cannot be written in terms of Gik and Fik
H

ROBUSTNESS OF WMMSE S OLUTIONS simultaneously. In the forthcoming alternating minimization,


we will therefore solve (22) with the CSI uncertainty relating
In this section, we propose some modifications to the
to the particular block of variables for which (22) is solved
original WMMSE algorithm that lead to an algorithm which
for. That is, for the precoders the CSI uncertainty at the BSs
is robustified against CSI estimation errors.
will be considered, whereas for the receive filters and MSE
weights, the CSI uncertainty at the UEs will be considered.
A. Naïve WMMSE Algorithm with Estimated CSI We now detail the alternating minimization solutions for the
It is straightforward to naïvely feed the WMMSE algorithm three blocks of (22).
the estimated CSI from one of the presented CSI acquisition
methods. An example of the resulting performance can be C. Precoder Robustness
seen in Fig. 3. The simulation settings are described in detail
First we fix {Aik , Wik } and solve (22) with respect to
in Sec. V-A. It is clear that the naïve application of the
the precoders {Vik }. The optimization problem can then be
WMMSE algorithm works moderately well for the centralized
interpreted as a local optimization problem at each BS, given
CSI acquisition schemes, but the performance for the fully dis-
that the CSI uncertainty in (22) comes from the uplink channel
tributed CSI acquisition scheme catastrophically deteriorates
estimation phase. We let the estimation errors for BS i be e i =
at high SNR. Thus, some form of robustification against CSI b s+i and G
e i = Gi G b i , k = 1, . . . , Kc . For the local
i
estimation errors is necessary. i k k k
uncertainty set, we assume that the errors are norm bounded
as e i  "(BS) e i W1/2  ⇠ (BS) , k = 1, . . . , Kc .
and G
B. General Worst-Case Robustness WMMSE Problem i k ik
F
ik
F
One approach to robustifying the optimization problem in Note that ⇠i(BS)
k
depends on Wik , which is fixed. The local
(6) is to minimize the objective function under the worst-case worst-case optimization problem for BS i is then:
error conditions: Kc
X ⇣ ⇣ ⌘ ⌘
X minimize max Tr ViHk b s+i + e i Vi
i
min. max ↵ik (Tr (Wik Eik ) log det (Wik )) {Vik } k e i kF "(BS)
k

{Aik } {uncertainty} i k=1


{Wik 0}
(i,k) e i W1/2
G k i ⇠i(BS)
k
F k
{Vik } ✓ ✓ ⇣ ⌘H ◆◆
p 1/2 b e
Kc
X 2 ↵ik Re Tr Wik G ik + G ik V ik
subject to Tr Vik ViHk  Pi , i = 1, . . . , Kt .
Kc
X
k=1
(22) subject to Tr Vik ViHk  Pi , i = 1, . . . , Kt .
k=1
8 The details of designing such a quantizer is outside the scope of this paper. (23)
BRANDT AND BENGTSSON: DISTRIBUTED CSI ACQUISITION AND COORDINATED PRECODING FOR TDD MULTICELL MIMO SYSTEMS 9

The solution to the inner optimization problem of (23) can be

Average sum rate [bits/s/Hz]


40
found by extending the results of [22], [23] to the multiuser SNRu 2 {0, 5, . . . , 25, 30} dB
matrix case. By upper bounding the optimal value of the inner 30
optimization problem using the triangle inequality9 and the
submultiplicativity of the Frobenius norm, the (pessimistic) 20
robust optimal precoder for UE ik is
! ! 1 10
(BS)
p ⇠
b s+i + "(BS) + ik + µi I b i W1/2 .
Vrob = ↵i
ik k i i G k ik 0
Virob
k F
−10 0 10 20 30
(24) Scaled Sum Power Constraint ⇢Pt [dB]
As before, µi is the Lagrange multiplier for the sum power
constraint. Note that the robust precoder in (24) is diagonally Fig. 4. Sum rate performance when varying ⇢, for Pt = 1000 and
loaded by a constant factor "(BS) r = 1 and varying SNRu . The solid markers represent the performance
2
i , a data dependent factor P / t2
⇠i(BS)
k
/ V rob
i k F
, and the Lagrange multiplier µi . Diagonal load- for ⇢ = min( Pr /
t
2
r
, 1). The scenario is the same as in Fig. 3.
ing is well known to robustify beamformers in various settings,
and a large body of literature has studied its robustifying
2) Form scaled precoders Viopt
(⇢)
effects; see e.g. [17]–[23]. k
= p1⇢ Vik , and use these
In order to construct the robust precoder in (24), the for downlink pilot and data transmission. This scaling
parameters "(BS) i and ⇠i(BS) must be known. For the fully ensures that the correct transmit power is used.
k
distributed CSI estimation in Sec. III-A, the effective channel 3) At the UEs, perform the estimation given the precoders
error G e i follows a zero-mean Gaussian distribution with {Vik }, giving {Fb (⇢) } and { b (⇢) }.
k ik ik
p b (⇢)
known covariance, and ⇠i(BS) k
can thus be selected such that 4) Scale the estimates as b ik = ⇢ b ik , F
(⇢) b
ik = ⇢F i k ,
(BS)
Gei W
k
1/2
ik  ⇠ holds with some probability. The
ik
and use { b ik } and {F b i } to form receive filters and
k
F
MSE weights. This scaling is necessary in order for the
statistics of the covariance error e i however depend on the
WMMSE algorithm at the UEs to be aware of what the
filters {Uik }, which are unknown at BS i. Since the optimal (⇢)
original precoders Vik were.
amount of diagonal loading is unknown, we therefore propose
to disregard "(BS) and ⇠i(BS) , and let the factor µi handle all The same scaling ⇢ is used at all BSs, and therefore the signal-
i
to-interference ratios of the cross-links are not affected. In
k
the diagonal loading. To compensate for the missing "(BS) i and
⇠i(BS) , we implicitly amplify µ using a scaling procedure. Fig. 4, we plot the impact of selecting different⇣ ⇢. A simple ⌘
i P / 2
selection that appears to work well is ⇢ = min Prt / 2t , 1 .
k
1) Implicitly Selecting the Diagonal Loading Parameter: r
When applying diagonal loading for robustness, a heuristic 2) Removing the Noise Component of b s+i+n i : Comparing
often used in the literature [23] is to select a fixed loading (25) with (10), it can be noted that the covariance matrix
level around 10 dB over the noise level. Instead, we propose should be b s+i , and not b s+i+n . The noise portion of b s+i+n
i i i
a data dependent method for selecting the diagonal loading will on average be t2 I, but simply subtracting that might
2

parameter implicitly. We note that, given estimates b s+i+n i


bi
,G k make the resulting matrix indefinite. Instead, we modify µi
and fed back Wik , the precoders in the WMMSE algorithm to allow for negative values; this is the same as seeing
are formed like µi as the difference of a non-negative Lagrange multiplier
⇣ ⌘ 1
p with an estimate
⇣ 2 of the
⇣ noise⌘ power.
⌘ Specifically, we allow
Vik = ↵ik b s+i+n i + µ i I Gb i W1/2 .
ik (25)
min t2 , Mt b s+i+n
k
µi i ⇣ where ⇣ is some constant
The form of (25) and (24) are similar, and it can therefore be value determining how close to singular b s+i+n + µi I can be. i
concluded that µi alone acts as the diagonal loading for the
naïve WMMSE precoder. The factor µi therefore robustifies
the solution, and the amount of diagonal loading is determined D. Receive Filter and MSE Weight Robustness
by b s+i+n b i W1/2 and Pi .
,G With similar notation and assumptions as in (23), the local
i k ik
In order to artificially amplify the factor µi , we now worst-case optimization problem for the receive filter at UE
introduce a scaling procedure. We let 0  ⇢  1 be a scaling ik is
factor, and modify the WMMSE algorithm as follows: ⇣ ⇣ ⇣ ⌘ ⌘⌘
min. max Tr Wik I + AH b i + e i Ai
ik
1) In the precoder optimization at BS i (step 4 in Al- {Aik } k e i k "(UE)
k F i
k k k

gorithm 1), let the sum power constraint be ⇢Pi . The e i W1/2
F i ⇠i(UE)
(⇢) k k F k
resulting precoders from (25) are denoted {Vik }, and ✓ ⇣ ✓ ⌘H ◆◆
will have equal or higher diagonal loading level than 2Re Tr Wik Fbi + F
k
ei
k
A ik ,
the original precoder in (25), since µi is nonincreasing
(26)
in the sum power constraint value.
whose (pessimistic) solution [23] is
9 This relaxes the problem such that e is the worst for each UE simultane-
i
! ! 1
ously. This is equivalent to replacing the existing covariance constraint with ⇠i(UE)
Arob = bi + "(UE) + k
I bi .
F (27)
 "(BS) , k = 1, . . . , Kc , and changing the objective accordingly.
ei ik k i 1/2 k
k i
Arob
i Wi
F k k F
10 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, ACCEPTED FOR PUBLICATION, APRIL 2015

Finally, the corresponding (pessimistic) robust MSE weight is 40

Average sum rate [bits/s/Hz]


Perfect CSI
0 ! ! 1 1 1 35 Robust UE and BS
(UE)
⇠i 30 Robust BS
Wrob = @I F b H b i + "(UE) + k
I Fbi A . 25 Robust UE
ik ik k i 1/2 k
Arob
i Wi Naive WMMSE
k k F
20
(28) 15
Again, the optimal level of diagonal loading is unknown. This 10
is because "(UE)
i depends on the statistics of the covariance 5
error e ik = ik b ik , which in turn depend on the unknown 0
precoders {Vik }. We therefore propose to indirectly apply 0 5 10 15 20 25 30
diagonal loading at the UEs instead, based on the following Downlink SNR [dB]
observation:
Fig. 5. Sum rate performance when selectively applying the robustifying
Proposition 1. The receive filter Aik and MSE measures in Sec. IV, together with the fully distributed CSI acquisition in
Sec. III-A. For comparison purposes, the scenario is the same as in Fig. 3.
weight Wik obtained in the UE side optimization
of the WMMSE algorithm ⇣ with perfect
⌘ CSI satisfies
1/2 i+n 1 1
||Aik Wik ||2F = Tr ik ik  N d / r . If the
2
the same form as Wirob . The important difference is that Aopt
ik
effective channel is fully contained in an interference-free
k
opt
and Wik do not depend on unknown parameters, whereas
subspace of dimension Ns  Nd , then asymptotically Arob rob opt opt
1/2 ik and Wik do. Thus, Aik and Wik can be applied as
||Aik Wik ||2F ! Ns / r2 as the SNR grows large. realizable proxies for the unrealizable Arob rob
ik and Wik in the
Proof: The proof is given in Appendix B. robust WMMSE algorithm to be proposed.
The first part of this proposition has an important con- By increasing ⌫ik , the requested rate log det Wiopt k
is
nection to the uplink training stage in the fully dis- decreased. A large ⌫ik would occur when there are obvious
1/2
tributed
p estimation scheme (Sec. III-A). Since Uik = discrepancies in the estimated CSI, such that ||Aik Wik ||2F 
p 1/2
Pr r2 /Nd ↵ik Aik Wik is acting as the uplink training Nd / r is far from being fulfilled without the diagonal loading.
2
1/2
stage precoder, ||Aik Wik ||2F determines the effective UE We visualize the robustifying effects in Fig. 5 for the same
transmit power, and hence the uplink estimation SNR. The simulation settings as in Fig. 3. The robustifying measures are
1/2
second part shows that ||Aik Wik ||2F also indicates whether effective, and result in up to a factor 5 sum rate gain over the
perfect interference alignment is achieved for UE ik . naïve WMMSE algorithm.
1) Enforcing Proposition 1 onto WMMSE Solutions with
Imperfect CSI: Proposition 1 relates to perfect CSI, but the
inequality may not hold for the naïve solutions in (8) and E. Robustified WMMSE Algorithm
(9) with imperfect CSI. In order to robustify the algorithm,
we therefore explicitly impose the constraint on the UE side We now combine the diagonal loading robustifications in
optimization problem with imperfect CSI. The problems still Sec. IV-C and Sec. IV-D (i.e. Viopt k
, Aopt opt
ik , and Wik ) to
decouple over users, and the problem each UE should solve form a RoBustified WMMSE algorithm (RB-WMMSE); see
is Algorithm 2. This algorithm can be combined with any of
⇣ ⇣ ⌘⌘
b i Ai the channel estimation procedures outlined in Sec. III, and
minimize Tr Wik I + AH
Aik ,Wik 0
ik k k
the joint system is fully distributed if the CSI acquisition is
⇣ ⌘ distributed.
2Re Wik F b H Ai
ik log det (Wik ) (29)
k
The existing robust WMMSE algorithms in [7]–[10] also
1/2
subject to ||Aik Wik ||2F  Nd / 2
r.
gain their robustness from diagonal loading, obtained by
optimizing a lower bound on performance. Although not being
Proposition 2. The solution to (29) is
directly tailored for TDD channel estimation, these algorithms
⇣ ⌘ 1
Aopt = b i + ⌫ opt I bi
F can be applied together with the centralized CSI acquisition
ik

k ik k
◆ method10 proposed in Sec. III-C. In doing so, an implicit
⇣ ⌘ 1 1
assumption on the channel estimation errors in the uplink
Wiopt
k
= I b H b i + ⌫ opt I
F i k k i k
bi
F k
. and downlink is made however. Since these algorithms only
1/2
have a notion of downlink channel estimation errors, they
If ||Aopt
i k W ik
opt
||2F  Nd / r2 holds for ⌫iopt
k
= 0, the con- are unaware of the uplink channel estimation errors in the
straint is not active and the solution has the same form as the TDD channel estimation. Thus, the implicit assumption that
original solution in Sec. II-A1. Otherwise, ⌫iopt can be found by the channel estimation errors in the downlink and uplink are
⇤ k
opt
bisection over 0, r such that ||Aik Wik
2 opt 1/2 2
||F = Nd / r2 . identical is made. The performance of this approach is studied
in Sec. V-A2.
Proof: The proof is given in Appendix C.
Interestingly, explicitly imposing Proposition 1 as a con- 10 The algorithms in [7]–[10] need the statistics of the CSI uncertainty,
straint in (29) corresponds to diagonal loading of the receive which is very complicated to derive for the CSI acquisition methods in
filter Aopt rob opt
ik , giving it the same form as Aik . Likewise, Wik has Sec. III-A and Sec. III-B since those methods estimate the effective channels.
BRANDT AND BENGTSSON: DISTRIBUTED CSI ACQUISITION AND COORDINATED PRECODING FOR TDD MULTICELL MIMO SYSTEMS 11

Algorithm 2 RB-WMMSE Algorithm (Estimated CSI)

Average sum rate [bits/s/Hz]


30
1: repeat
At UEs: 25
WMMSE (perfect CSI)
(⇢) b (⇢)
2: Pilot transmission from BSs: estimate b ik and F ik 20 RB−WMMSE (Sec. III−A)
RB−WMMSE (Sec. III−B)
using one of the methods in Sec. III. 15
(⇢) b p b (⇢) TDMA (MMSE estim.)
3: Rescale b i = ⇢ b , F
k i =
ik ⇢F k ik 10 Naive WMMSE (Sec. III−A)
2
1/2
4: Find ⌫ik to satisfy A i k W ik  Nd / 2
r 5 Uncoord. trans. (MMSE estim.)
✓ F ◆
⇣ ⌘ 1 1 0
5: W ik = I F ik b H b i k + ⌫ ik I
b
F ik 0 1 2 3
10 10 10 10
⇣ ⌘ 1 Subframe number
6: A ik = b ik + ⌫ ik I Fb i , Ui = p↵i Ai W1/2
k k k k ik
Fig. 6. Convergence comparison of the different methods for Kt = 3, Kc =
At BSs: 2, Mt = 4, Mr = 2, Nd = 1, SNRd = 20 dB and SNRu = 10 dB.
7: Pilot transmission from UEs: estimate b s+i+n i and G bi
k
using one of the methods in Sec. III.
1/2
8: Obtain Wik through ⇣ 2 feedback.⇣ ⌘ ⌘ A. Canonical interfering broadcast channel
9: Find µi min t2 , Mt b s+i+n i ⇣ For the simulations with the canonical channel the channel
PK c ⇣ ⌘
to satisfy k=1
(⇢) (⇢),H
Tr Vik Vik  ⇢Pi model was i.i.d. Rayleigh fading on all antenna-pairs in the
⇣ ⌘ 1 system such that [Hik j ]mn ⇠ CN (0, 1). This models a setting
10:
(⇢)
Bik = b s+i+n + µi I b i , V(⇢) = p↵i B(⇢) W1/2
G where each interfering link on average is equally strong as the
i k ik k ik ik

11: Scale Vik = p1⇢ Vik


(⇢) desired channel. We assume a sufficiently long coherence in-
terval, such that the channels do not change between iterations.
12: until fixed number of iterations
We let each BS serve Kc = 2 UEs, a setting which is feasible
for interference alignment [42]. For fairness when comparing
V. P ERFORMANCE E VALUATION estimation schemes, we let Np,d = Kt Mt and Np,u = Kr Mr .
The results were averaged over 1000 independent Monte Carlo
Performance of the proposed system is evaluated by means realizations.
of numerical simulations11 . Two scenarios are studied: 1) Convergence: First, we investigate the average conver-
1) A canonical interfering broadcast channel, without large gence behaviour of the RB-WMMSE algorithm with SNRd =
scale fading. This model is relevant in local environ- Pt / r2 = 20 dB and SNRu = Pr / t2 = 10 dB. The results in
ments where the inter-cell interference power levels are Fig. 6, indicate that the RB-WMMSE algorithm needs on the
on par with the desired power levels. order of 1000 iterations to converge, which is consistent with
2) A large scale 3-cell network, with path loss, shadow the findings of [11]. We do however note that a significant
fading, and small scale fading. This models a possible fraction of the final performance is achieved after just around
large scale deployment scenario, where only cell-edge 10 to 20 iterations. In the following, we therefore let the
users are significantly affected by inter-cell interference. algorithms iterate for 20 iterations.
In both scenarios, we study a case with Kt = 3 BSs, each 2) Sum Rate vs. Signal-to-Noise Ratio: Next, we study the
serving Kc users with Nd = 1 data streams each. The number sum rate when the downlink and uplink SNRs are varied.
of antennas were Mt = 4 and Mr = 2. The BSs transmit Recall that the downlink SNR affects both the downlink data
power was Pi = Pt for all BSs, and the UEs transmit power transmission, as well as the downlink estimation performance
was Pr for all UEs. Unless otherwise ⇣stated,2 the⌘RB-WMMSE (cf. Sec. III-A). The uplink SNR only affects the uplink estima-
P /
BS power scaling was set as ⇢ = min Prt / 2t , 1 , based on the tion performance. We compare with MaxSINR [43], for which
findings in Fig. 4. For numerical
r
stability,⌘we let the constant we actively turn off two users in order not to overload the

⇣ = 10 10 such that Mt b s+i+n + µi I 10 10 , 8 i, in algorithm. The results for the fully distributed CSI acquisition
i
(Sec. III-A) are shown in Fig. 7a. The RB-WMMSE algorithm
the RB-WMMSE algorithm. The UE data rate weights were
consistently performs better than MaxSINR, and better than
↵ik = 1 for all UEs. Truncated discrete Fourier transform
TDMA for sufficiently high uplink SNR. The results for the
(DFT) matrices of appropriate dimensions were used for the
CSI acquisition with global sharing of filters and MSE weights
pilot matrices Pik and Pi , as well as for the initial precoders.
(Sec. III-C) are shown in Fig. 7b. Here we also compare
As a baseline performance measure, we used single-user
with the lower bound optimization method of [7]–[10], which
eigenprecoding and waterfilling with channels estimated by the
requires this form of centralized CSI estimation. We relax
MMSE estimator in Sec. III-C. With the single-user process-
their requirement of downlink and uplink estimation errors
ing, we show the performance under time-division multiple
being identical (cf. Sec. IV-E). In Fig. 7b, it can be seen that
access (TDMA), as well as under nonorthogonal concurrent
the RB-WMMSE algorithm exhibits similar performance as
transmissions from all BSs simultaneously (‘uncoordinated
the lower bound optimization method of [7]–[10]. The sum
transmission’).
rates in Fig. 7b are higher than the corresponding sum rates
11 In the spirit of reproducibility, we provide the full Matlab simulation in Fig. 7a. This is because the improved channel estimation
package as open source. It is available for download at [41]. performance, due to the perfect feedback of filters, and that
12 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, ACCEPTED FOR PUBLICATION, APRIL 2015

40 40
WMMSE (perfect CSI) WMMSE (perfect CSI) SNRu = 10 dB
35 RB−WMMSE 35 Lower bound opt. WMMSE [7]-[10]
RB−WMMSE ( ⇢ = 1)
Average sum rate [bits/s/Hz]

MaxSINR

Average sum rate [bits/s/Hz]


SNRu = 10 dB
30 TDMA 30 MaxSINR
Uncoord. transmission TDMA
25 25 Uncoord. transmission

20 20

15 15
SNRu = 0 dB
10 SNRu = 0 dB 10

5 5

0 0
0 5 10 15 20 25 30 0 5 10 15 20 25 30
Downlink SNR [dB] Downlink SNR [dB]
(a) Fully distributed CSI acquisition (Sec. III-A) (b) CSI acquisition with globally shared filters and MSE weights (Sec. III-C)
Fig. 7. Sum rate after the 20th iteration for Kt = 3, Kc = 2, Mt = 4, Mr = 2 canonical interfering broadcast channel with Nd = 1.

30 5 40

Average sum rate [bits/s/Hz]


Average sum rate [bits/s/Hz]

Sec. III−C Perfect feedback


SNRd = 30 dB
Number of kiloflops

29 Sec. III−B 4 Quantized feedback


Sec. III−A Sum rates 30
28
3 SNRd = 20 dB
27 20
2 SNRd = 10 dB
26 Estimation flop counts
1 10
25 SNRd = 0 dB
RB-WMMSE flop count
24 0 0
12 20 30 40 50 60 70 80 90 100
Number of pilots Np,d = Np,u 1 2 3 4 5 6 7 8
Number of quantization bits

Fig. 8. Comparison of complexity and sum rate performance after the 20th Fig. 9. Sum rate as a function of quantization accuracy for Kt = 3, Kc =
iteration for Kt = 3, Kc = 2, Mt = 4, Mr = 2 canonical interfering 2, Mt = 4, Mr = 2 canonical interfering broadcast channel with Nd = 1
broadcast channel with Nd = 1, SNRd = 20 dB, and SNRu = 10 dB. and SNRu = 30 dB.

B. Large scale 3-cell network


the estimates of the channels are improved in every iteration,
The results presented so far describe performance in a set-
as described in Sec. III-C.
ting where the desired signal and interfering signals had equal
3) Sum Rate and Complexity vs. Flop Count: For the case average power levels. In realistic deployments, e.g. macrocell
with SNRd = 20 dB, and SNRu = 10 dB, we vary the number setups, large scale fading such as path loss and shadow fading
of pilots Np,d = Np,u and study the resulting performance are present however, leading to a more heterogeneous setting.
and complexity of the system. The results can be seen in In order to investigate the performance for such a setting, we
Fig. 8. The more complex CSI acquisition methods perform study a scenario where the BSs are located at the vertices
slightly better in the sum rate sense. The centralized CSI of an equilateral triangle, and the antenna bore sights are
acquisition from Sec. III-C requires particularly many flops, aimed towards the centre of the triangle (see Fig. 10). This
since it estimates all interfering channels. scenario models three interfering sectors in a larger hexagonal
macrocell deployment. In particular, we assume a setup where
4) Quantized MSE Weight Feedback: So far, the feed- fractional frequency reuse is combined with coordinated pre-
back of the MSE weights was assumed to be perfect. We coding, such that cell centre and cell edge users are served
now study performance of the system, using quantized MSE on orthogonal subbands [44]. Since the cell centre users
weights, while varying the number of feedback bits used. typically have very high signal-to-interference ratios (SIRs),
For the case with fixed uplink SNRu = 30 dB, we vary the they can be served well using single cell techniques. The cell
SNRd and the number of quantization bits. Each UE had edge users experience low SIRs however, and thus multicell
an
h individual⇣ codebook with⌘iweights uniformly quantized on coordinated precoding is a fruitful transmission strategy for
Pi s21 (Hik i )
0, 10 log10 1 + 2 dB. The performance is shown these users. Since our focus is on coordinated precoding, our
r
in Fig. 9. For higher downlink SNR, more bits are needed simulations only study the performance of the cell edge users.
for good performance. For high resolution quantization, the The simulation parameters (see Table V) can be described as
performance is equal to that of perfect feedback. a simplified version of the 3GPP Case 1 [45], [46], where the
BRANDT AND BENGTSSON: DISTRIBUTED CSI ACQUISITION AND COORDINATED PRECODING FOR TDD MULTICELL MIMO SYSTEMS 13

TABLE V
S IMULATION PARAMETERS FOR LARGE SCALE 3- CELL NETWORK
(A DAPTED FROM 3GPP C ASE 1 [45], [46])
BS 1
Inter site distance 500 m
Max. distance, UE to cell edge 50 m
Path loss 15.3 + 37.6 log10 (distance [m]) dB
Penetration loss 20 dB
✓ ⇣ ⌘2 ◆
BS antenna gain [47] min 12 35✓ , 23 dB

UE antenna gain 0 dB
500 m 500 m
UE 12 UE 11
e
Shadow fading Lognormal with std. dev. 8 dB
dg
ll e
Ce
Small scale fading i.i.d. CN (0, 1)
UE 32
UE 21 UE 31 Bandwidth? 15 kHz
BS transmit power? Pt = 18.2 dBm
UE receiver noise power? 2
r = 123.2 dBm (9 dB NF)
UE 22 UE transmit power? Pr = 4.8 dBm
BS 2 BS 3 BS receiver noise power? 2
t = 127.2 dBm (5 dB NF)
500 m
? We only study one subcarrier in a 10 MHz system with 600 subcarriers.
The total transmit powers are thus Pttot = 46 dBm and Prtot = 23 dBm.
Fig. 10. Cell layout for large scale 3-cell network, here displayed with
Kc = 2 UEs per cell.

slightly drops as more users are selected. This is because the


small scale fading is i.i.d. Rayleigh fading, and where we only RB-WMMSE algorithm, just like the WMMSE algorithm, is
study one subcarrier. able to implicitly perform user selection in the iterations. The
The purpose of the simulation study is to investigate the fact that the performance is almost constant when more than 3
impact on sum rate performance, when the number of cell edge users are selected for transmission in each cell suggests that the
users per cell Kc is varied. For each of the two simulations RB-WMMSE algorithm is able to find a good local solution
to be described, we generated 500 independent user drops to the weighted sum rate problem.
(including shadow fading), where the UEs were dropped 2) Sum Rate vs. Total Number of Users per Cell: We now
uniformly at random in the cell, but never farther than 50 study system performance as a function of the total number of
m from the cell edge. For each user drop, 5 independent users per cell Kc . Given the results in the previous section, we
small scale fading realizations were generated. We used the select at most 2 users for transmission per cell for MaxSINR.
fully distributed estimation method in Sec. III-A for channel Similarly for TDMA and uncoordinated transmission, disre-
estimation in the RB-WMMSE algorithm and MaxSINR, and garding the obvious unfairness of such a strategy, we only
we assumed perfect feedback for the MSE weights. In order select 1 user per cell. For the RB-WMMSE and WMMSE
to have the same estimation performance regardless of Kc , we algorithms, we do not explicitly perform any user selection,
fixed Np,d = Np,u = 3 · 10 · 1 = 30. but rather let the algorithms perform their own implicit user
We used the same baseline methods as described in selection in the iterations. The results of the simulation can
Sec. V-A. For large Kc , all baseline methods would perform be seen in Fig. 11b. For the baselines with user selection,
poorly due to the high interference levels experienced at the the improved performance with Kc is due to the increased
cell edge. We therefore coupled the baseline methods with a multiuser diversity. The RB-WMMSE algorithm is also able
user selection procedure that determined which UEs to serve. to harness this increase in multiuser diversity. The large gap
Before describing the main results of the simulation study, we between the perfect CSI case, and the RB-WMMSE algorithm
first detail the user selection procedure performance. with fully distributed CSI acquisition (Sec. III-A) is due to
1) Sum Rate vs. Number of Users Selected for Trans- the low uplink SNR in this scenario12 . The RB-WMMSE
mission: In order to study how many users to select for algorithm performs slightly worse than the MaxSINR with
transmission for the baseline methods, we performed simu- user selection, for large Kc . It has however been verified
lations where Kc = 10 users were dropped per cell, and the that this gap can be closed by combining the RB-WMMSE
number of users selected for transmission was varied. The algorithm together with explicit user selection, serving at most
user selection was based solely on the channel strength to the 3 users per cell (cf. Fig. 11a). Here we however show the
serving BS. The results are plotted in Fig. 11a. It can be seen results without explicit user selection, in order to display the
that performance for MaxSINR is maximized when 2 users self-reliant performance of the algorithm. The uncoordinated
are selected for transmission. For TDMA and uncoordinated transmission strategy works fairly well in terms of sum rate,
transmission, performance is maximized when only a single
12 Along the cell edge, the average uplink SNR ranges between 4.9 dB
user is selected for transmission in each cell. The performance
and 11.8 dB. The corresponding average downlink SNR ranges between
of the RB-WMMSE algorithm is the highest when 3 users are 7.2 dB and 14.1 dB. Note that the average SNRs are determined both by the
selected for transmission in each cell, but performance only distance to the BS, as well as the angle to the antenna bore sight.
14 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, ACCEPTED FOR PUBLICATION, APRIL 2015

60 50
WMMSE (perfect CSI) TDMA WMMSE (perfect CSI)
RB−WMMSE Uncoord. transmission MaxSINR (perfect CSI)
MaxSINR (perfect CSI) WMMSE (naive) RB−WMMSE
50

Average sum rate [bits/s/Hz]


Average sum rate [bits/s/Hz]

MaxSINR 40 Uncoord. transmission


MaxSINR

40
30
30
20
20 TDMA
WMMSE (naive)
10
10

0 0
1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
Number of users selected for transmission per cell and subcarrier Total number of users per cell and subcarrier Kc
(a) Varying the number of users selected for transmission, given Kc = 10 (b) Varying the total number of users per cell Kc . At most 2 users per cell
users per cell. were served by MaxSINR. For TDMA and uncoordinated transmission, only
1 user per cell was served.
Fig. 11. Sum rate after the 20th iteration for the large scale scenario with fully distributed CSI acquisition (Sec. III-A).

kDck
but as noted earlier, corresponds to a highly unfair situation Now introduce the spectral norm |||D|||2 = maxc kck 2 =
2
where only one user is served per cell. s1 (D). Then, for all cik such that kcik k2 = 1, we have that
2
cH H H
ik Vik Hik i Hik i Vik cik  kVik cik k2 · 1 HH
ik i Hik i
VI. C ONCLUSIONS 2
= kVik cik k2 · s21 (Hik i )  |||Vik |||22 · kcik k2 · s21 (Hik i )
2

Many distributed coordinated precoding algorithms have 2


= |||Vik |||22 · s21 (Hik i )  kVik kF · s21 (Hik i )  Pi s21 (Hik i ) .
been proposed in the literature, but few of these works study (32)
the issue of robustness against imperfect CSI. The works that
do study this important issue can however not be coupled with Thus, 1 ViHk HH ik i Hik i Vik  Pi s21 (Hik i ), and the upper
distributed CSI acquisition, thus leading to centralized CSI bound then directly follows from (31). For the lower bound,
i+n 1
acquisition requiring large amounts of BS backhaul usage. To note that I I + ViHk HH ik i ik Hik i V i k = W i k . ⌅
our knowledge, the present paper is the first that proposes a
robust, and yet still fully distributed, coordinated precoding B. Proof of Proposition 1
i+n
algorithm. In doing so, three CSI acquisition methods have Decompose ik = F ik F H
ik + ik and let
1 2
been proposed, and the corresponding requirements in terms of C ik = F ikH i+n
ik Fik and Dik = Fik H i+n
ik F ik .
channel estimation, feedback, and signaling have been illumi- We have that Aik = ik
1
F i k
and W i k
= I + C ik .
nated. The robustification of the algorithm comes from using 1/2
Plugging in, ||Aik Wik ||2F = Tr Aik Wik AH =
ik
inherent properties of the WMMSE solutions and applying 2
Tr Aik Aik Wik = Tr Fik ik Fik (I + Cik ) . Applying
H H
a data dependent scaling procedure, leading to a realizable
the matrix inversion lemma to ik1 , it can be shown that
algorithm which does not depend on unknown parameters of 2 1 1
FHik ik Fik = (I + Cik ) Dik (I +⇣Cik ) after simpli-
the CSI uncertainty. Compared with the centralized state-of- ⌘
1/2 1
the-art methods, our system performs similarly, with the major fications. Thus, ||Aik Wik ||2F = Tr (I + Cik ) D ik =
✓ ⇣ ⌘ 1 ◆
difference that our system can be implemented in a fully i+n 1 i+n 1 i+n 1
Tr F I + F H
F FH .
distributed manner. When evaluated in a macrocell setup, the ik ik ik ik ik ik ik

proposed system also performs well. Applying the matrix⇣inversion lemma ⌘ backwards, we then get
1/2 2 i+n 1 1
||Aik Wik ||F = Tr ik ik . Further,

A PPENDIX 2 ⇣ ⌘ (a)
1/2 1
A ik W ik = Tr (I + Cik ) Dik  Tr Cik1 Dik
F
A. Proof of Lemma 1 ✓⇣ ⌘ 1 ◆
H i+n 1 H i+n 2
For UE ik it holds that i+n = Tr F ik F F F
ik ⌫ r I, with equality if the UE
2 ik ik ik ik ik

does not experience any interference. Thus, the MSE weight h 1/2
i
ei = i+n
for that UE satisfies = Let F k ik F ik
✓⇣ ⌘ 1 ◆
1 i+n 1
Wik = I + ViHk HH i+n
Hik i Vik (30) = Tr eH F
F ei eH
F ei
F
ik i ik ik k ik ik k

1 H H ✓ ⇣ ⌘ ◆
I + 2 Vik Hik i Hik i Vik . (31) i+n 1
ei FeH F
e
1
eH
r = Tr ik F k ik i k F ik
BRANDT AND BENGTSSON: DISTRIBUTED CSI ACQUISITION AND COORDINATED PRECODING FOR TDD MULTICELL MIMO SYSTEMS 15

(b) ⇣ ⌘
 max Tr i+n
ik
1
⇧ik unique. The fact that (Aopt opt
ik , Wik ) is indeed a minimizer is
rank(⇧ik )=Nd clear, since each variable minimizes the objective function
⇣ 1
⌘ Nd Nd when the other variable is kept fixed. ⌅
i+n
 Nd 1 ik = i+n
 2
Mr ik r

where ⇧ik is a rank-Nd projection matrix. The inequal- R EFERENCES


ity (a) is due to the trace being an increasing function [1] D. Gesbert, S. Hanly, H. Huang, S. Shamai Shitz, O. Simeone, and
on the cone of positive definite matrices and the fact that W. Yu, “Multi-cell MIMO cooperative networks: A new look at inter-
1/2 1 1/2 1/2 1/2
Dik (I + Cik ) Dik D C 1 D . The inequality ference,” IEEE J. Sel. Areas Commun., vol. 28, no. 9, pp. 1380–1408,
⇣ ⌘ i1k ik ik 2010.
ei F
(b) holds since F eH F
ei e H is a rank-Nd projection
F [2] E. Björnson and E. Jorswieck, “Optimal resource allocation in coordi-
k ik k ik
nated multi-cell systems,” Foundations and Trends in Communications
matrix. and Information Theory, vol. 9, no. 2-3, pp. 113–381, 2013.
Now assume there are Ns  Nd interference-free di- [3] J. Jose, A. Ashikhmin, P. Whiting, and S. Vishwanath, “Channel estima-
mensions, and that the effective channel is fully contained tion and linear precoding in multiuser multiple-antenna TDD systems,”
IEEE Trans. Veh. Technol., vol. 60, no. 5, pp. 2102–2116, 2011.
in those. Let the eigenvalues of ik be {s+i+n m } and the [4] C. Shi, R. Berry, and M. Honig, “Bi-directional training for adaptive
eigenvalues of i+n i+n
ik be {m }. Let the eigenvalues be ordered beamforming and power control in interference networks,” IEEE Trans.
such that s+i+n
m = i+nm for all m 2 {1, . . . , Mr Ns }. Signal Process., vol. 62, no. 3, pp. 607–618, Feb. 2014.
s+i+n [5] M. Biguesh and A. B. Gershman, “Training-based MIMO channel
Consequently, m = sm + r2 and i+nm = r for all
2
estimation: a study of estimator tradeoffs and optimal training signals,”
m 2 {Mr Ns + 1, . . . , Mr }. Here, {sm } are the received IEEE Trans. Signal Process., vol. 54, no. 3, 2006.
signal powers in the interference-free subspace. Then, [6] J. Jose, N. Prasad, M. Khojastepour, and S. Rangarajan, “On robust
weighted-sum rate maximization in MIMO interference networks,” in
⇣ ⌘ X Mr ✓ ◆ Proc. IEEE Int. Conf. Commun. (ICC’11), Jun. 2011, pp. 1–6.
i+n 1 1 1 1
Tr ik = [7] J. Shin and J. Moon, “Weighted-sum-rate-maximizing linear transceiver
ik
m=1
i+n
m s+i+n
m filters for the K-user MIMO interference channel,” IEEE Trans. Com-
mun., vol. 60, no. 10, pp. 2776–2783, Oct. 2012.
MXr Ns ✓ ◆ M
X ✓ ◆
1 1 r
1 1 [8] H.-H. Lee, Y.-C. Ko, and H.-C. Yang, “On robust weighted sum rate
= + maximization for MIMO interfering broadcast channels with imperfect
m=1
i+n
m i+n
m r
2 sm + r2 channel knowledge,” IEEE Commun. Lett., vol. 17, no. 6, pp. 1156–
| {z } m=M|
r Ns +1
{z } 1159, 2013.
Interference dimensions Interference-free dimensions [9] M. Razaviyayn, M. Baligh, A. Callard, and Z.-Q.
Mr
X Mr
X Luo, “Robust transceiver design,” WO Patent Application
sm 1 Ns WO 2013/044 824 A1, 04, 2013. [Online]. Available:
= 2 s 2
! 2
= 2 http://patentscope.wipo.int/search/en/detail.jsf?docId=WO2013044824
m=Mr Ns +1 r (m + r) m=Mr Ns +1 r r
[10] F. Negro, I. Ghauri, and D. T. M. Slock, “Sum rate maximization in
the noisy MIMO interfering broadcast channel with partial CSIT via
as the {sm } grow large with respect to r.
2
⌅ the expected weighted MSE,” in Proc. Int. Symp. Wireless Commun.
Systems (ISWCS’12), 2012, pp. 576–580.
[11] D. Schmidt, C. Shi, R. Berry, M. Honig, and W. Utschick, “Com-
C. Proof of Proposition 2 parison of distributed beamforming algorithms for MIMO interference
networks,” IEEE Trans. Signal Process., vol. 61, no. 13, pp. 3476–3489,
It can be shown that all feasible points are regular, and 2013.
thus any minimizer of this non-convex problem satisfies [12] Q. Shi, M. Razavivayn, Z. Luo, and C. He, “An iteratively weighted
the KKT conditions [30, Ch. 3.3.1]. The expressions are MMSE approach to distributed sum-utility maximization for a MIMO
interfering broadcast channel,” IEEE Trans. Signal Process., vol. 59,
obtained
⇣ ⇣by forming the Lagrangian
⌘⌘ Lik⇣(Aik , Wik , ⌫⌘ik ) = no. 9, pp. 4331–4340, 2011.
H b b
Tr Wik I + Aik ik Aik 2Re Wik Fik AikH [13] J. Ren, K.-K. Wong, and J. Hou, “Robust analytical ZF optimization for
⇣ ⌘ MISO IFC,” IEEE Wireless Commun. Lett., vol. 2, no. 4, pp. 451–454,
Nd
log det (Wik ) + ⌫ik Tr Aik Wik AH ik 2 , setting 2013.
r
[14] P. Komulainen, A. Tölli, and M. Juntti, “Effective CSI signaling and
the complex partial derivatives to zero, and applying the decentralized beam coordination in TDD multi-cell MIMO systems,”
matrix inversion lemma. It now remains to find the optimal IEEE Trans. Signal Process., vol. 61, no. 9, pp. 2204–2218, May 2013.
⌫iopt 0. If the constraint is satisfied for ⌫iopt = 0, the [15] M. Guillaud, D. T. M. Slock, and R. Knopp, “A practical method for
k k
wireless channel reciprocity exploitation through relative calibration,”
problem is solved and the form is identical to the solution in Proc. 8th Int. Symp. Signal Process. Applicat. (ISSPA’05), 2005, pp.
in Sec. II-A1. Otherwise, let b ik = Lik ⇤ik LH ik and 403–406.
b i+n = Li+n ⇤i+n Li+n,H be eigenvalue decompositions. Then [16] R. Rogalin, O. Bursalioglu, H. Papadopoulos, G. Caire, A. Molisch,
ik ik ik ik
1/2 A. Michaloliakos, V. Balan, and K. Psounis, “Scalable synchronization
as can be seen in (33), ||Aik Wik ||2F is decreasing in ⌫ik , and reciprocity calibration for distributed multiuser MIMO,” IEEE
opt Trans. Wireless Commun., vol. 13, no. 4, pp. 1815–1831, Apr. 2014.
and the ⌫ik which satisfies the inequality constraint with
[17] H. Cox, R. M. Zeskind, and M. M. Owen, “Robust adaptive beamform-
equality can be found by bisection. A natural starting point ing,” IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no. 10, pp.
for the lower value in the bisection is ⌫ilower
k
= 0. Using the 1365–1376, 1987.
same argument as in the proof for Proposition 1, we have [18] B. D. Carlson, “Covariance matrix estimation errors and diagonal
1/2
2 loading in adaptive arrays,” IEEE Trans. Aerosp. Electron. Syst., vol. 24,
that A ik W i k  ⇣ Nd ⌘ and we can enforce no. 4, pp. 397–401, Jul. 1988.
F b i+n +⌫i I
Mr i k k [19] R. Wu, Z. Bao, and Y. Ma, “Control of peak sidelobe level in adaptive
with ⌫iupper arrays,” IEEE Trans. Antennas Propag., vol. 44, no. 10, pp. 1341–1347,
1/2 Nd
||Aik Wik ||2F r. The optimal
2
 2 =
⌫ik =⌫iupper r k 1996.
k
[20] J. Li, P. Stoica, and Z. Wang, “On robust Capon beamforming and
⌫iopt
k
can now be found using bisection on (33), given the diagonal loading,” IEEE Trans. Signal Process., vol. 51, no. 7, pp. 1702–
bounds. With ⌫iopt
k
found, the minimizer can be identified as 1715, 2003.
16 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, ACCEPTED FOR PUBLICATION, APRIL 2015

✓ ⇣ ⌘ ✓ ⇣ ⌘ ◆◆
2 2 1
1/2
A i k W ik = Tr A ik W ik A H bH
= Tr F b i + ⌫i I bi
F bH
I+F b i+n + ⌫i I bi
F =
ik ik k k k ik ik k k
F
⇣ 1/2 i+n,H 1/2

b bH + ⇤i+n b b H i+n i+n
2 bi F b H Li (⇤i + ⌫i I) 2 LH F
Tr LHik Fik Fik Lik (⇤ik + ⌫ik I) i k + ⌫ ik I L ik F k ik k k k i k ik F ik L ik ⇤ ik + ⌫ i k I
h i
LH b bH
ik F ik F ik L ik
Mr
X
mm 1 H b b H i+n i+n 1/2 2
= 2 + (⇤ i k
+ ⌫ i k
I) L i k
F i k
F i k
L i k
⇤ i k
+ ⌫ i k
I
[⇤ik ]mm + ⌫ik F
m=1
h i
LH b bH h i
i k F ik F ik L ik
Mr
X Mr X
X Mr 2
mm 1 1 H b b H i+n
= 2 + 2 ⇥
⇣ ⇤ ⌘ L F F L
ik ik ik ik
m=1 [⇤ik ]mm + ⌫ik m=1 p=1 [⇤ik ]mm + ⌫ik ⇤i+nik + ⌫ ik
pp
mp

(33)

[21] S. Vorobyov, A. Gershman, and Z.-Q. Luo, “Robust adaptive beamform- [39] L. N. Trefethen and D. Blau III, Numerical Linear Algebra. SIAM,
ing using worst-case performance optimization: a solution to the signal 1997.
mismatch problem,” IEEE Trans. Signal Process., vol. 51, no. 2, pp. [40] D. Love and R. Heath, “Limited feedback unitary precoding for spatial
313–324, Feb. 2003. multiplexing systems,” IEEE Trans. Inf. Theory, vol. 51, no. 8, pp. 2967–
[22] S. Shahbazpanahi, A. Gershman, Z.-Q. Luo, and K. M. Wong, “Robust 2976, 2005.
adaptive beamforming for general-rank signal models,” IEEE Trans. [41] R. Brandt, “Simulation Matlab code.” [Online]. Available:
Signal Process., vol. 51, no. 9, pp. 2257–2269, Sep. 2003. http://github.com/rasmusbrandt/tvt2015_distributed_CSI_acquisition
[23] K. Zarifi, S. Shahbazpanahi, A. Gershman, and Z.-Q. Luo, “Robust blind [42] T. Liu and C. Yang, “On the feasibility of linear interference alignment
multiuser detection based on the worst-case performance optimization for MIMO interference broadcast channels with constant coefficients,”
of the MMSE receiver,” IEEE Trans. Signal Process., vol. 53, no. 1, pp. IEEE Trans. Signal Process., vol. 61, no. 9, pp. 2178–2191, 5 2013.
295–305, Jan. 2005. [43] K. Gomadam, V. R. Cadambe, and S. Jafar, “A distributed numerical
[24] M. Xu, D. Guo, and M. L. Honig, “Joint bi-directional training of approach to interference alignment and applications to wireless interer-
nonlinear precoders and receivers in cellular networks,” arXiv:1408.6924 ence networks,” IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3309–3322,
[cs.IT], Aug. 2014. 2011.
[25] B. Ghimire, G. Auer, and H. Haas, “Busy burst enabled coordinated [44] L.-C. Wang and C.-J. Yeh, “3-cell network MIMO architectures with
multipoint network with decentralized control,” IEEE Trans. Wireless sectorization and fractional frequency reuse,” IEEE J. Sel. Areas Com-
Commun., vol. 10, no. 10, pp. 3310–3320, 2011. mun., vol. 29, no. 6, pp. 1185–1199, Jun. 2011.
[45] 3GPP, “TR 25.814, Physical layer aspects for evolved universal terres-
[26] S. Christensen, R. Agarwal, E. Carvalho, and J. Cioffi, “Weighted sum-
trial radio access (Release 7),” 3GPP, Tech. Rep., 2006.
rate maximization using weighted MMSE for MIMO-BC beamforming
[46] ——, “TR 36.814, Further advancements for E-UTRA physical layer
design,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 4792–4799,
aspects (Release 9),” 3GPP, Tech. Rep., 2010.
Dec. 2008.
[47] ——, “TR 25.996, Spatial channel model for multiple input multiple
[27] F. Negro, S. P. Shenoy, I. Ghauri, and D. T. M. Slock, “Weighted sum output (MIMO) simulations (Release 11),” 3GPP, Tech. Rep., 2012.
rate maximization in the MIMO interference channel,” Proc. IEEE Int.
Symp. Personal, Indoor, Mobile Radio Commun. (PIMRC’10), pp. 684–
689, 2010.
[28] F. P. Kelly, A. K. Maulloo, and D. K. H. Tan, “Rate control for
communication networks: Shadow prices, proportional fairness and
Rasmus Brandt (S’10) was born in Uppsala, Swe-
stability,” J. Operational Research Soc., vol. 49, no. 3, pp. 237–252,
den, in 1985. He received the Tech.Lic. degree in
1998.
Electrical Engineering from KTH Royal Institute of
[29] Y.-F. Liu, Y.-H. Dai, and Z.-Q. Luo, “Coordinated beamforming for Technology, Stockholm, Sweden, in 2014 and the
MISO interference channel: Complexity analysis and efficient algo- M.Sc. degree in Engineering Physics from Uppsala
rithms,” IEEE Trans. Signal Process., vol. 59, no. 3, pp. 1142–1157, University, Uppsala, Sweden, in 2010. As part of
Mar. 2011. his M.Sc. program, he spent the academic year
[30] D. Bertsekas, Nonlinear programming. Athena Scientific, 2006. 2007/2008 at Queen’s University, Kingston, Canada
[31] M. Razaviyayn, M. Hong, and Z. Luo, “A unified convergence analysis and the summer of 2009 as an IAESTE Intern at
of block successive minimization methods for nonsmooth optimization,” CMC Microsystems, Kingston, Canada. He is cur-
SIAM J. Optimization, vol. 23, no. 2, pp. 1126–1153, 2013. rently working towards the Ph.D. degree in Electrical
[32] T. A. Levanen, J. Pirskanen, T. Koskela, J. Talvitie, and M. Valkama, Engineering at the KTH Royal Institute of Technology. His research interests
“Radio interface evolution towards 5G and enhanced local area commu- include statistical signal processing for wireless communications as well as
nications,” IEEE Access, vol. 2, pp. 1005–1029, 2014. resource allocation and interference alignment for interference management.
[33] N. Jindal and A. Lozano, “A unified treatment of optimum pilot overhead Mr. Brandt is a Publicity Chair of the 16th IEEE International Workshop on
in multipath fading channels,” IEEE Trans. Commun., vol. 58, no. 10, Signal Processing Advances in Wireless Communications (SPAWC’15).
pp. 2939–2948, Oct. 2010.
[34] E. Lähetkangas, K. Pajukoski, E. Tiirola, G. Berardinelli, I. Harjula, and
J. Vihriälä, “On the TDD subframe structure for beyond 4G radio access
network,” in Future Network and Mobile Summit, Jul. 2013, pp. 1–10.
[35] J. Jose, A. Ashikhmin, T. L. Marzetta, and S. Vishwanath, “Pilot
contamination and precoding in multi-cell TDD systems,” IEEE Trans.
Wireless Commun., vol. 10, no. 8, pp. 2640–2651, 2011.
[36] E. Dahlman, S. Parkvall, and J. Sköld, 4G LTE/LTE-Advanced for
Mobile Broadband. Academic Press, 2011.
[37] H. Cox, “Resolving power and sensitivity to mismatch of optimum array
processors,” J. Acoustical Soc. America, vol. 54, no. 3, pp. 771–785,
1973.
[38] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation
Theory. Prentice Hall, 1993.
BRANDT AND BENGTSSON: DISTRIBUTED CSI ACQUISITION AND COORDINATED PRECODING FOR TDD MULTICELL MIMO SYSTEMS 17

Mats Bengtsson (M’00–SM’06) received the M.S.


degree in computer science from Linköping Uni-
versity, Linköping, Sweden, in 1991 and the Tech.
Lic. and Ph.D. degrees in electrical engineering from
the KTH Royal Institute of Technology, Stockholm,
Sweden, in 1997 and 2000, respectively.
From 1991 to 1995, he was with Ericsson Telecom
AB Karlstad. He currently holds a position as Asso-
ciate Professor at the Signal Processing department,
School of Electrical Engineering, KTH. His research
interests include statistical signal processing and its
applications to communications, multi-antenna processing, cooperative com-
munication, radio resource management, and propagation channel modelling.
Dr. Bengtsson served as Associate Editor for the IEEE Transactions on Signal
Processing 2007-2009 and was a member of the IEEE SPCOM Technical
Committee 2007-2012.

You might also like