Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

1890 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO.

7, APRIL 1, 2018

Sparse Activity Detection for Massive Connectivity


Zhilin Chen , Student Member, IEEE, Foad Sohrabi , Student Member, IEEE, and Wei Yu , Fellow, IEEE

Abstract—This paper considers the massive connectivity appli- devices connected to each cellular base-station (BS) may be in
cation in which a large number of devices communicate with a the order of 104 to 106 ; (ii) the traffic pattern of each device
base-station (BS) in a sporadic fashion. Device activity detection
may be sporadic—at any given time only a small fraction of
and channel estimation are central problems in such a scenario.
Due to the large number of potential devices, the devices need to be potential devices are active. For such a network, accurate user
assigned non-orthogonal signature sequences. The main objective activity detection and channel estimation are crucial for estab-
of this paper is to show that by using random signature sequences lishing successful communications between the devices and the
and by exploiting sparsity in the user activity pattern, the joint user BS.
detection and channel estimation problem can be formulated as a To identify active users and to estimate their channels, each
compressed sensing single measurement vector (SMV) or multiple
measurement vector (MMV) problem depending on whether the user must be assigned a unique signature sequence. However,
BS has a single antenna or multiple antennas and efficiently solved due to the large number of potential devices but the limited co-
using an approximate message passing (AMP) algorithm. This pa- herence time and frequency dimensions in the wireless fading
per proposes an AMP algorithm design that exploits the statistics channel, the signature sequences for all users cannot be mutually
of the wireless channel and provides an analytical characterization orthogonal. Non-orthogonal signature sequences superimposed
of the probabilities of false alarm and missed detection via state
evolution. We consider two cases depending on whether or not the in the pilot stage causes significant multi-user interference, e.g.,
large-scale component of the channel fading is known at the BS when a simple matched filtering or correlation operation is ap-
and design the minimum mean squared error denoiser for AMP plied at the BS for user activity detection and channel estimation.
according to the channel statistics. Simulation results demonstrate A key observation of this paper is that the sporadic nature of the
the substantial advantage of exploiting the channel statistics in traffic leads to sparse user transmission patterns. By exploiting
AMP design; however, knowing the large-scale fading component
does not appear to offer tangible benefits. For the multiple-antenna sparsity and by formulating the detection and estimation prob-
case, we employ two different AMP algorithms, namely the AMP lem with independent and identically distributed (i.i.d.) random
with vector denoiser and the parallel AMP-MMV, and quantify the non-orthogonal pilots as a compressed sensing problem, this
benefit of deploying multiple antennas. multi-user interference problem can be overcome, and highly
Index Terms—Device activity detection, channel estimation, reliable activity detection and accurate channel estimation can
approximate message passing, compressed sensing, Internet of be made possible. In the compressed sensing terminology, when
Things (IoT), machine-type communications (MTC). the BS is equipped with a single antenna, activity detection and
channel estimation can be formulated as a single measurement
I. INTRODUCTION vector (SMV) problem; when the BS has multiple antennas, the
NE of the key requirements for the next-generation wire- problem can be formulated as a multiple measurement vector
O less cellular networks is to provide massive connectivity
for machine-type communications (MTC), envisioned to sup-
(MMV) problem.
This paper proposes the use of compressed sensing tech-
port diverse applications such as environment sensing, event niques for the joint user activity detection and channel estima-
detection, surveillance and control [1], [2]. Machine-centric tion problem. Due to the large-scale nature of massive device
communications have two distinctive features as compared to communications, this paper adopts the computationally efficient
conventional human-centric communications: (i) the overall approximate message passing (AMP) algorithm [3] as the main
system needs to support massive connectivity—the number of technique. AMP is an iterative thresholding method with a key
feature that allows analytic performance characterization via the
so-called state evolution. The main contributions of this paper
Manuscript received April 30, 2017; revised September 18, 2017 and Decem-
ber 11, 2017; accepted January 2, 2018. Date of publication January 23, 2018; are: (i) a novel AMP algorithm design for user activity detection
date of current version February 15, 2018. The associate editor coordinating the that exploits the statistical information of the wireless channel,
review of this manuscript and approving it for publication was Prof. Zhi Quan. and (ii) a characterization of the probabilities of false alarm and
This work was supported by the Natural Sciences and Engineering Research
Council of Canada through a Discovery Grant and through a Steacie Memorial missed detection for both SMV and MMV scenarios.
Fellowship. This paper was presented in part at the IEEE International Con-
ference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA,
March 2017. (Corresponding author: Wei Yu.) A. Related Work
The authors are with The Edward S. Rogers Sr. Department of Electrical and
Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada The user activity detection problem for massive connectiv-
(e-mail: zchen@comm.utoronto.ca; fsohrabi@comm.utoronto.ca; weiyu@ ity has been studied from information theoretical perspectives
comm.utoronto.ca). in [2], [4]. From an algorithmic point of view, the problem
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. is closely related to sparse recovery in compressed sensing and
Digital Object Identifier 10.1109/TSP.2018.2795540 has been studied in a variety of wireless communication settings.
1053-587X © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
CHEN et al.: SPARSE ACTIVITY DETECTION FOR MASSIVE CONNECTIVITY 1891

For example, assuming no prior knowledge of the channel state significantly enhance the Bayesian AMP algorithm. Moreover,
information (CSI), joint user activity detection and channel es- analytical performance characterization can be obtained by us-
timation is considered in [5]–[8]. Specifically, [5] proposes an ing state evolution. Finally, the AMP algorithm can be extended
efficient greedy algorithm based on orthogonal matching pur- to the multiple-antenna case.
suit for sporadic multi-user communication. By exploiting the
statistics of channel path-loss and the joint sparsity structures, B. Main Contributions
[6] proposes a modified Bayesian compressed sensing algorithm
This paper considers the user activity detection and channel
in a cloud radio-access network. In the context of orthogonal fre-
estimation problem in the uplink of a single-cell network with
quency division multiplexing (OFDM) systems, [7] introduces a
a large number of potential users, but at any given time slot
one-shot random access protocol and employs the basis pursuit
only a small fraction of them are active. To exploit the sparsity
denoising detection method with a detection error bound based
in user activity pattern, this paper formulates the problem as a
on the restricted isometry property. The performance of such
compressed sensing problem and proposes the use of random
schemes in a practical setting is illustrated in [7], [8]. When
signature sequences and the computationally efficient AMP al-
perfect CSI is assumed, joint user activity and data detection for
gorithm for device activity detection. This paper provides the
code division multiple access systems (CDMA) is investigated
design and analysis of AMP for both cases in which the BS is
in [9], [10], where [9] designs the sparsity-exploiting maximum
equipped with a single antenna and with multiple antennas.
a posteriori detector by accounting for both sparsity and finite-
This paper considers two different scenarios: (i) when the
alphabet constraints of the signal, and [10] proposes a greedy
large-scale fading coefficients of all user are known and the
block-wise orthogonal least square algorithm by exploiting the
detector is designed based on the statistics of fast fading com-
block sparsity among several symbol durations. Differing from
ponent only; (ii) when the large-scale fading coefficients are
most of the above works that consider cellular systems, [11],
not known and the detector is designed based on the statistics of
[12] study the user activity detection in wireless ad hoc net-
both fast fading and large-scale fading components as a function
works, where each node in the system identifies its neighbor
of the distribution of device locations in the cell. The proposed
nodes simultaneously. The authors of [11] propose a scalable
AMP-based detector exploits the statistics of the wireless chan-
compressed neighbor discovery scheme that employs random
nel by specifically designing the minimum mean squared error
binary signatures and group testing based detection algorithms.
(MMSE) denoiser. This paper provides analytical characteriza-
In [12], the authors propose a more dedicated scheme that uses
tion of the probabilities of false alarm and missed detection via
signatures based on Reed-Muller code and a chirp decoding
the state evolution for both scenarios.
algorithm to achieve a better performance.
For the case where the BS is equipped with a single antenna,
In contrast to the aforementioned works, this paper adopts the
numerical results indicate that: (i) the analytic performance
more computationally efficient AMP algorithm for user activity
characterization via state evolution is very close to the simu-
detection and channel estimation, which is more suitable for
lation; (ii) exploiting the statistical information of the channel
large-scale networks with a large number of devices. The AMP
and user activity can significantly improve the detector perfor-
algorithm is first proposed in [3] as a low-complexity iterative al-
mance; (iii) knowing the large-scale fading coefficient actually
gorithm for conventional compressed sensing with real-valued
does not bring substantial performance improvement as com-
signals and real-valued measurements. A framework of state
pared to the case that only the statistical information about the
evolution that tracks the performance of AMP at each iteration
large-scale fading is available.
is introduced in [3]. The AMP algorithm is then extended along
For the case where the BS is equipped with multiple anten-
different directions. For example, [13] generalizes the AMP al-
nas, this paper considers both the AMP with vector denoiser
gorithm to a broad family of iterative thresholding algorithms,
[18] and the parallel AMP-MMV [17]. For the AMP with vec-
and provides a rigorous proof of the framework of the state
tor denoiser, this paper exploits wireless channel statistics in
evolution. To deal with complex-valued signals and measure-
denoiser design and further analytically characterizes the prob-
ments, [14] proposes a complex AMP algorithm (CAMP). By
abilities of false alarm and missed detection based on the state
exploiting the input and output distributions, a generalized AMP
evolution. For the parallel AMP-MMV algorithm, which is more
(GAMP) algorithm is designed in [15]. Similarly, a Bayesian
suitable for distributed computation, performance characteriza-
approach is used to design the AMP algorithm in [16], [17] by
tion is more difficult to obtain. Simulation results show that: (i)
accounting for the input distribution. For the compressed sens-
having multiple antennas at the BS can significantly improve the
ing problem with multiple signals sharing joint sparsity, i.e., the
detector performance; (ii) the predicted performance of AMP
MMV problem, [18] designs an AMP algorithm via a vector
with vector denoiser is very close to its simulated performance;
form of message passing; and [19] designs the AMP-MMV al-
(iii) AMP with vector denoiser and parallel AMP-MMV achieve
gorithm by directly using message passing over a multi-frame
approximately the same performance.
factor graph.
Although the use of the AMP algorithm for user activity
detection has been previously proposed in [20], the statisti- C. Paper Organization and Notations
cal information of the channel is not exploited in the prior The remainder of this paper is organized as follows.
work; also performance analysis is not yet available. This paper Section II introduces the system model. Section III introduces
makes progress by showing that exploiting channel statistics can the AMP algorithms for both SMV and MMV problems.
1892 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 7, APRIL 1, 2018

Section IV considers user activity detection and channel es- where hn ∈ C 1×M is the channel vector between user n and
timation when the BS has a single-antenna, while Section V the BS, W ∈ C L ×M is the effective complex Gaussian noise,
considers the multiple-antenna case. Simulation results are pro- and X  [rT1 , · · · , rTN ]T ∈ C N ×M where rn  an hn ∈ C 1×M
vided in Section VI. Conclusions are drawn in Section VII. is the nth row vector of X. We also use cm ∈ C N ×1 to represent
Throughout this paper, upper-case and lower-case letters de- the mth column vector of X, i.e., X = [c1 , · · · , cM ]. Note that
note random variables and their realizations, respectively. Bold- an indicates whether the entire row vector rn is zero or not. In
face lower-case letters denote vectors. Boldface upper-case let- other words, columns of X (i.e., cm ) share the same sparsity
ters denote matrices or random vectors, where context should pattern.
make the distinction clear. Superscripts (·)T , (·)∗ and (·)−1 de- We are interested in detecting the user activity an as well
note transpose, conjugate transpose, and inverse operators, re- as in estimating the channel gains of the active users, which
spectively. Further, I denotes identity matrix with appropriate correspond to the non-zero rows of the matrix X, based on the
dimensions, E[·] denotes expectation operation,  denotes def- observation Y in the regime where N  L. The problem of
inition, | · | denotes either the magnitude of a complex variable recovering X from Y is in the form of the MMV problem in
or the determinant of a matrix, depending on the context, and compressed sensing.
 · 2 denotes the 2 norm. A key observation of this paper is that the design of recovery
algorithm can be significantly enhanced by taking advantage
II. SYSTEM MODEL of the knowledge about the statistical information of x or X.
Consider the uplink of a wireless cellular system with one Toward this end, we provide a model for the distribution of the
BS located at the center and N single-antenna devices located entries of x, and the distribution of the rows of X. Since x is a
uniformly in a circular area with radius R, but in each coherence special case of X when M = 1, we focus on the model for X.
block only a subset of users are active. Let an ∈ {1, 0} indicate We assume that each user accesses the channel with a small
whether or not user n is active. For the purpose of channel prob- probability λ in an i.i.d. fashion, i.e., Pr(an = 1) = λ, ∀n, and
ing and user identification, user n is assigned a unique signature there is no correlation between different users’ channels, so that
sequence sn = [s1n , s2n , · · · , sL n ]T ∈ C L ×1 , where L is the the row vectors of X follow a mixture distribution
length of the sequence. This paper assumes that the signature pR|G (rn |gn ) = (1 − λ)δ0 + λpH |G (rn |gn ), (3)
sequence sn is generated according to i.i.d. complex Gaussian
where δ0 denotes the point mass measure at 0, pH |G denotes the
distribution with zero mean and variance 1/L such that each se-
probability density function (pdf) of the channel vector H given
quence is normalized to have unit power, and the normalization
prior information G, which has a pdf pG , and gn denotes the
factor 1/L is incorporated into the transmit power.
prior information for user n. Note that we use H to denote the
We consider a block-fading channel model where the channel
random channel vector and hn to denote its realization. Based
is static in each block. In this paper, we consider two cases
on (3), the pdf of the entries of x is
where the BS is equipped with either a single antenna or multiple
antennas. When the BS has only one antenna, the received signal pX |G (xn |gn ) = (1 − λ)δ0 + λpH |G (xn |gn ). (4)
at the BS can be modeled as
To model the distribution of H, we assume that all users are

N
randomly and uniformly located in a circular coverage area of
y= an sn hn + w  Sx + w, (1) radius R with the BS at the center, and the channels between
n =1
the users and the BS follow an independent distribution that de-
where hn ∈ C is the channel coefficient between user n and the pends on the distance. More specifically, H includes path-loss,
BS, w ∈ C L ×1 is the effective complex Gaussian noise whose shadowing, and Rayleigh fading. The path-loss between a user
variance σw2 depends on the background noise power normal- and the BS is modeled (in dB) as α + β log10 (d), where d is the
ized by the user transmit power. Here, x  [x1 , x2 , · · · , xN ]T ∈ distance measured in meter, α is the fading coefficient at d = 1,
C N ×1 where xn  hn an , and S  [s1 , s2 , · · · , sN ] ∈ C L ×N . and β is the path-loss exponent. The shadowing (in dB) follows
We aim to recover the non-zero entries of x based on the a Gaussian distribution with zero mean and variance σSF 2
. The
received signal y. We are interested in the regime where the Rayleigh fading is assumed to be i.i.d. complex Gaussian with
number of potential users is much larger than the pilot sequence zero mean and unit variance across all antennas.
length, i.e., N  L, so that the user pilot sequences cannot be The large-scale fading, which includes path-loss and shad-
mutually orthogonal; but due to the sporadic traffic, only a small owing, is denoted as G, whose pdf pG can be modeled by the
number of devices transmit in each block, resulting in a sparse 2
distribution of BS-user distance and shadowing parameter σSF .
x. The recovering of x for the single antenna case is in the form This paper considers both the case where only the statistics of
of the SMV problem in compressed sensing. the the large-scale fading, i.e., pG , is known as well as the case
This paper also considers the case where the BS is equipped where the exact large-scale fading coefficient gn is known at the
with M antennas. In this case, the received signal Y ∈ C L ×M BS. The latter case is motivated by the scenario in which the
at the BS can be expressed in matrix form as devices are stationary, so that the path-loss and shadowing can
be estimated and stored at the BS as prior information. When

N
Y= an sn hn + W  SX + W, (2) gn is known, pH |G captures the distribution of the Rayleigh fad-
n =1 ing component. When only p(G) is known, we drop G and gn
CHEN et al.: SPARSE ACTIVITY DETECTION FOR MASSIVE CONNECTIVITY 1893

from pH |G (hn |gn ), and write it as pH (hn ), which captures the B. AMP for MMV problem
distribution of both large-scale fading and Rayleigh fading. 1) AMP With Vector Denoiser: One extension of the AMP
algorithm to solve the MMV problem in (2) is proposed in [18],
III. AMP ALGORITHM which employs a vector denoiser that operates on each row
AMP is an iterative algorithm that recovers sparse signal for vector of the matched filtered output:
compressed sensing. We introduce the AMP framework for both Xt+1 = η(S∗ Zt + Xt , g, t), (8)
the SMV and the MMV problems in this section.
N t  ∗ t
Zt+1 = Y − SXt+1 + Z η (S Z + Xt , g, t) , (9)
L
A. AMP for SMV problem
where η(·, g, t)  [ηt (·, g1 ), · · · , ηt (·, gN )]T with ηt (·, gn ) :
AMP is first proposed for the SMV problem in [3]. Starting
C 1×M → C 1×M being a vector denoiser that operates on the
with x0 = 0 and z0 = y, AMP proceeds at each iteration as
nth row vector of S∗ Zt + Xt , and the other notations are simi-
xt+1 = η(S∗ zt + xt , g, t), (5) lar to those used in (5) and (6). The state evolution of the AMP
algorithm for the MMV problem also has a similar form as
N t  ∗ t
zt+1 = y − Sxt+1 + z η (S z + xt , g, t) , (6) N  t t ∗
L Σt+1 = σw2 I + E D (D ) , (10)
L
where g  [g1 , · · · , gN ]T , and t = 0, 1, · · · is the index of it-
where Dt  (ηt (R + Ut , G) − R) ∈ C M ×1 , with random
T
eration, xt is the estimate of x at iteration t, zt is the residual,
vector R following pR|G and random vector Ut following
η(·, g, t)  [ηt (·, g1 ), · · · , ηt (·, gN )]T where ηt (·, gn ) : C →
CN (0, Σt ). The expectation is taken over R, Ut , and G. To
C is an appropriately designed non-linear function known as
minimize the estimation error at each iteration, we can also
denoiser that operates on the nth entry of the input vector,
design the vector denoiser ηt (·, ·) via the Bayesian approach.
η  (·)  [ηt (·, g1 ), · · · , ηt (·, gN )]T where ηt (·, gn ) is the first or-
2) Parallel AMP-MMV: A different extension of the AMP
der derivative of ηt (·, gn ) with respect to the first argument, and
algorithm for dealing with the MMV problem is the parallel
· is averaging operation over all entries of a vector. Note that
AMP-MMV algorithm proposed in [19]. The basic idea is to
the third term in the right hand side of (6) is the correction term
solve the MMV problem iteratively by using multiple parallel
known as the “Onsager term” from statistical physics.
AMP-SMVs then exchanging soft information between them.
In the AMP algorithm, the matched filtered output x̃t 
∗ t Parallelization allows distributed implementation of the algo-
S z + xt can be modeled as signal x plus noise (including
rithm, which can be computationally advantageous, especially
multiuser interference), i.e., x̃t = x + vt , where vt is Gaussian
when the number of antennas is large.
due to the correction term. The denoiser is typically designed to
The outline of the parallel AMP-MMV algorithm is illustrated
reduce the estimation error at each iteration. In the compressed
in Algorithm 1 which operates on a per-antenna basis, i.e., on
sensing literature, the prior distribution of x is usually assumed
the columns of X and Z, denoted as cm and zm respectively, and
to be unknown. In this case, a minimax framework over the
where η(·, g, t, i, m)  [ηt,i,m (·, g1 ), · · · , ηt,i,m (·, gN )]T is the
worst case x leads to a soft thresholding denoiser [21]. When
denoiser used for the mth antenna in the iteration (t, i). Note
the prior distribution of x is known, the Bayesian framework
that here we add index i and m in the notation of denoiser,
then can be used to account for the prior information on x [16].
ηt,i,m (·, gn ), to indicate the index of outer iteration and the
In this paper, we adopt the Bayesian approach and design the
index of SMV stage, respectively. In the first phase which is
MMSE denoiser for the massive connectivity setup as shown in

called the (into)-phase, the messages π n m , are calculated and


the next section.
passed to the mth AMP-SMV stage. These messages convey the
The AMP algorithm can be analyzed in the asymptotic regime
current belief about the probability of being active for each user.
where L, N → ∞ with fixed N/L via the state evolution, which

predicts the per-coordinate performance of the AMP algorithm In the first iteration, we have π n m = λ, ∀n, m, since no further
at each iteration as follows information is available. In the next phase, which is called the
(within)-phase, the conventional AMP algorithm is applied to
N   the received signal of each antenna. Note that the denoiser in
2
τt+1 = σw2 + E |ηt (X + τt V, G) − X|2 , (7)
L the AMP algorithm is a function of the current belief about the
activity of the users which is obtained based on the information
where τt is referred to as the state, X, V , and G are random
sharing between all M AMP-SMV stages. Finally, in the (out)-
variables with X following pX |G , V following the complex
phase, the estimate of channel gains is used to refine the belief
Gaussian distribution with zero mean and unite variance, and G
about the activity of the users.
following pG , and the expectation is taken over all X, V , and
G. We denote X̃ t  X + τt V . The random variables X, V , G,
and X̃ t capture the distributions of the entries of x, entries of IV. USER ACTIVITY DETECTION: SINGLE-ANTENNA CASE
vt (up to a factor τt ), the
 prior information gn , and entries of A main point of this paper is that exploiting the statistics of
x̃t , respectively, with E |ηt (X̃ t , G) − X|2 characterizing the the wireless channel can significantly enhance detector perfor-
per-coordinate MSE of the estimate of x at iteration t. mance. This section proposes an MMSE denoiser design for
1894 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 7, APRIL 1, 2018

and R as
Algorithm 1: Parallel AMP-MMV Method [19].
 

1: Initialize π n m = 0.5, ∀n, m. 40 2(ln 10)2 σSF
2
2 ln(10)α
a= √ exp − ,
2: for i = 1 to I do R2 β π β2 β
Execute the (into)-phase: √
 −10 2 −α − β log10 (R) 20


π nm =
λ m  = m π n m 
, ∀n, m b= , c= − .
3:
(1−λ)
  (ln 10)σSF 2σSF βb
m  = m (1− π n m )+λ m  = m π n m 
Execute the (within)-phase:
Proof: See Appendix A. 
4: for m = 1 to M do
Proposition 2: Let hn denote the channel coefficient which
5: Initialize c0m = 0, z0m = ym .
contains both the large-scale fading gn and Rayleigh fading. If
6: for t = 0 to T do
∗ t only pG (gn ) is known at the BS, the pdf of hn is given by
7: m = η(S zm + cm , g, t, i, m),
ct+1 t

8: zm = ym − Scm + NL ztm η  (S∗ ztm


t+1 t+1 ∞  
a −γ −2 −|hn |2
+ cm , g, t, i, m) ,
t pH (hn ) = gn Q(gn ) exp dgn . (12)
0 π gn2
9: end for
10: end for If gn is known at the BS, the pdf of hn given gn is
Execute the (out)-phase:  

11: Calculate π n m , ∀n, m, the probability of user n 1 −|hn |2
pH |G (hn |gn ) = exp . (13)
being active based on the decision at the mth πgn2 gn2
AMP-SMV stage.
12: end for Proof: See Appendix B . 
Note that for the first scenario, the channel distribution (12)
only depends on a few parameters such as the path-loss expo-
nent in the path-loss model and the standard deviation in the
the AMP algorithm for the wireless massive connectivity prob-
shadowing model, which are assumed to be known and can
lem that specifically takes wireless channel characteristics into
be estimated in practice. For the second scenario, the channel
consideration in the single-antenna case. Two scenarios are con-
distribution (13) is just a Rayleigh fading model parameterized
sidered: the large-scale fading gn of each user is either directly
by the large-scale fading. The large-scale fading information
available or only its statistics is available at the BS. This section
can be obtained by tracking the estimated channel over a rea-
further studies the optimal detection strategy, and analyzes the
sonable period. This second scenario is applicable to the case
probabilities of false alarm and missed detection by using the
where the users are mostly stationary, so the large-scale fading
state evolution of the AMP algorithm.
changes only slowly over time. It is worth noting that although
this paper restricts attention to the Rayleigh fading model, the
A. MMSE Denoiser for AMP Algorithm approach developed here is equally applicable for Rician or any
other statistical channel model.
In the scenario where only the statistics about the large-scale
In the following, we design the MMSE denoisers for the AMP
fading is known at the BS, the distributions of the channel co-
algorithm by exploiting pH (hn ) and pH |G (hn |gn ).
efficients pH (hn ) are independent and identical for all devices.
1) With Statistical Knowledge of Large-Scale Fading Only:
In the scenario where the devices are stationary and their path-
Since gn is unknown and only the distribution pG (gn ) is avail-
loss and shadowing coefficients can be estimated and thus the
able at the BS, the denoiser ηt (·, gn ) reduces to ηt (·), which
exact large-scale fading is known at the BS, the distributions
indicates that the denoiser for each entry of the matched filtered
of the channel coefficients are of the form pH |G (hn |gn ), which
output is the same. By using (12), the pdf of the entries of x can
are complex Gaussian with variance parameterized by gn , and
be expressed as
are independent but not identical across the devices. To de-
rive the MMSE denoisers via the Baysian approach for both ∞
exp(−|xn |2 gn−2 )
cases, we first characterize the distributions pG (gn ), pH (hn ) pX (xn ) = (1 − λ)δ0 + dgn . (14)
and pH |G (hn |gn ) as follows. 0 πgnγ +2 /(aλQ(gn ))
Proposition 1: Consider a circular wireless cellular coverage The MMSE denoiser is given by the conditional expectation, i.e.,
area of radius R with BS at the center and uniformly distributed ηt (x̃tn ) = E[X|X̃ t = x̃tn ] where random variable X̃ t = X +
devices where the channels between the BS and the devices τt V , and x̃tn is a realization of X̃ t . Note that the denoiser ηt (x̃tn )
are modeled with large-scale fading gn with parameters α, β depends on t through τt . The expression of the conditional
and shadowing fading parameter σSF as defined in the system expectation is given in the following proposition.
model. Then, gn follows a distribution as Proposition 3: Based on the pdf of xn in (14) and the signal-
plus-noise model X̃ t = X + τt V at each iteration in AMP, the
pG (gn ) = agn−γ Q(gn ), (11) conditional expectation of X given X̃ t = x̃tn is given by
∞
where Q(gn )  (b ln g n +c) exp(−s2 )ds, γ  40/β + 1, and ν1 (|x̃tn |2 )
E[X|X̃ t = x̃tn ] = x̃tn , (15)
a, b, and c are constants depending on parameters α, β, σSF ξ1 (|x̃tn |2 )
CHEN et al.: SPARSE ACTIVITY DETECTION FOR MASSIVE CONNECTIVITY 1895

Compared with the MMSE denoiser in (15), we add gn to the


left hand side of (19) to emphasize the dependency on prior
information gn .

B. User Activity Detection


After the AMP algorithm has converged, we employ the like-
lihood ratio test to perform user activity detection. For the hy-
pothesis testing problem

H0 : X = 0, inactive user,
(21)
H1 : X = 0, active user;
the optimal decision rule is given by

pX̃ t |X (x̃tn |X =
0) H 0
LLR = log ≶ ln , (22)
pX̃ t |X (x̃tn |X = 0) H 1
Fig. 1. MMSE denoiser vs. soft thresholding denoiser [14] η tso ft (x̃tn )  where LLR denotes the log-likelihood ratio, and ln denotes the
θ x̃ tn
(x̃tn − |x̃ tn |
)I(|x̃tn | > θ), where I(·) is the indicator function. decision threshold typically determined by a cost function. The
performance metrics of interest are the probability of missed
detection PM , defined as the probability that a device is ac-
where functions νi (s) and ξi (s) are defined as tive but the detector declares the null hypothesis H0 , and the
∞ 2−γ   probability of false alarm, PF , defined as the probability that a
gn Q(gn ) −s device is inactive, but the detector declares it to be active. We
νi (s)  exp dgn , (16)
0 (gn2 + τt2 )i+1 gn2 + τt2 consider two cases depending on whether the large-scale fading
  coefficient gn is available at the BS or not.
1−λ −s
ξi (s)  exp 1) With Statistical Knowledge of Large-Scale Fading Only:
λaτt2i τt2
∞ −γ   We first derive the likelihood probabilities in the following.
gn Q(gn ) −s Proposition 4: Suppose that X follows (14), and V follows
+ exp dgn . (17)
0 (gn2 + τt2 )i gn2 + τt2 complex Gaussian distribution with zero mean and unit vari-
Proof: See Appendix C.  ance, the likelihood of X̃ t = X + τt V given X = 0 or X = 0 is
Note that to implement ηt (x̃tn ) at each iteration, the value of given by
 
τt is needed. In practice, an empirical estimate τ̂t = √1L zt 2 , 1 −|x̃tn |2
pX̃ t |X (x̃n |X = 0) =
t
exp , (23)
where  · 2 denotes the 2 norm, can be used [22]. Although πτt2 τt2
ηt (x̃tn ) is in a complicated form, we note that it can be pre- ∞ −γ  
agn Q(gn ) −|x̃tn |2
computed and stored as table lookup, so it does not add to pX̃ t |X (x̃tn |X = 0) = exp dgn .
run-time complexity. To gain some intuition, we illustrate the 0 π(gn2 + τt2 ) gn2 + τt2
shape of the MMSE denoiser as compared to the widely used (24)
soft thresholding denoiser in Fig. 1. We observe that the MMSE Proof: See Appendix D. 
denoiser plays a role similar to the soft thresholding denoiser, Based on (23) and (24), the log-likelihood ratio is given as
shrinking the input towards the origin, especially when the input ∞ 2 −γ
is small, thereby promoting sparsity. aτt gn
LLR = log Q(gn ) exp(|x̃tn |2 Δ)dgn , (25)
2) With Exact Knowledge of Large-Scale Fading: When gn 0 gn2 + τt2
is available at the BS, we substitute (13) into (4), and the pdf of where Δ is defined in (20). By observing that LLR is monotonic
the entries of x is simplified to Bernoulli-Gaussian as H0
  in |x̃tn |, we can simplify the decision rule in (22) as |x̃tn | ≶ ln ,
λ −|xn |2 H1
pX |G (xn |gn ) = (1 − λ)δ0 + 2 exp . (18) indicating that user activity detection can be performed based
πgn gn2
on the magnitude of x̃tn only.
The MMSE denoiser is given by ηt (x̃tn , gn ) = E[X|X̃ t = Based on the likelihood probabilities and the threshold ln ,
x̃tn , G = gn ], where the conditional expectation is [17] the probabilities of false alarm and missed detection can be
gn2 (gn2 + τt2 )−1 x̃tn characterized as follows
E[X|X̃ t = x̃tn , G = gn ] = ,  2
1+ 1−λ g n2 +τ t
2
exp (−Δ|x̃tn |2 ) −ln
λ τ t2 PF = pX̃ t |X (x̃n |X = 0)dx̃n = exp
t t
, (26)
(19) |x̃ n |> l n
t τt2

where
PM = pX̃ t |X (x̃tn |X = 0)dx̃tn , (27)
Δ  τt−2 − (gn2 + τt2 )−1 . (20) |x̃ tn |< l n
1896 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 7, APRIL 1, 2018

where (26) is simplified by using (23). Note that since only sta- detection as
tistical information of the large-scale fading is known at the BS, N   
1  −l2
PF and PM are the averaged false alarm and missed detection PM = 1 − exp
probabilities which do not depend on gn . N n =1 τt2 + gn2
2) With Exact Knowledge of Large-Scale Fading: When gn   
−l2
is known at the BS, the distribution of X is simplified to → pG (g) 1 − exp dg, as N → ∞,
Bernoulli-Gaussian. The likelihood probabilities become τt2 + g 2
(33)

exp −|x̃n |2 τt−2 where the distribution PG (g) is given in (11). When N is large,
pX̃ t |X ,G (x̃tn |X = 0, G = gn ) = , (28) once τt is given, the averaged performance only depends on the
πτt2
 statistics of the large-scale fading gn .
exp −|x̃n |2 (gn2 + τt2 )−1
pX̃ t |X ,G (x̃n |X = 0, G = gn ) =
t
.
π(τt2 + gn2 ) C. State Evolution Analysis
(29) We have characterized the probabilities of false alarm PF and
missed detection PM for user activity detection in (26), (27) and
The log-likelihood ratio is then given as (31), (32), but the parameter τt that represents the standard devi-
ation of the residual noise still needs to be determined. As AMP
  proceeds, τt convergesto τ∞ . To compute τt , we use the state
τt2 t 2
LLR(gn ) = log exp(|x̃n | Δ) , (30) evolution (7), where E |ηt (X̃ t , G) − X|2 in (7) can be inter-
gn2 + τt2
preted as the MSE of the denoiser. Note that for the MMSE de-
noiser, MSE can also be expressed as E |ηt (X̃ t , G) − X|2 =
where the notation LLR(gn ) emphasizes the dependency on  
E Var(X|X̃ t , G) , where Var(X|X̃ t , G) is the conditional
the prior information gn . Similar to the case where only the
variance of X given X̃ t and G, and the expectation is taken
statistics of gn is known, LLR here is also monotonic in |x̃tn |,
over both X̃ t and G. (Note that we drop G if the large-scale fad-
which means that the user activity detection can be performed
ing coefficient is unknown.) By using conditional variance, we
based on |x̃tn | only.
characterize the MSE of the designed denoisers in the following
We also use ln to denote the threshold in the detection. Based
propositions.
on (28) and (29), the probabilities of false alarm and missed
Proposition 5: The MSE of the denoiser for the case where
detection are given as follows
only the statistics of gn is known to the BS is given by

aQ(gn ) λgn2 τt2
PF (gn ) = pX̃ t |X ,G (x̃tn |X = 0, G = gn )dx̃tn MSE(τt ) = · 2 dgn
|x̃ tn |> l n 0 gnγ gn + τt2
 ∞

= exp −ln2 τt−2 , (31) + aλs μ1 (s) − ν12 (s)ξ1−1 (s) ds, (34)
0

PM (gn ) = pX̃ t |X ,G (x̃tn |X = 0, G = gn )dx̃tn where functions νi (s) and ξi (s) are defined in (16) and (17),
|x̃ tn |< l n respectively, and function μi (s) is defined as
 ∞ 4−γ  
= 1 − exp −ln2 (gn2 + τt2 )−1 , (32) gn Q(gn ) −s
μi (s)  exp dgn . (35)
0 (gn2 + τt2 )i+2 gn2 + τt2
where we use the notation PF (gn ) and PM (gn ) to indicate the
Proof: See Appendix E. 
prior known gn . Note that the false alarm probability in (31)
It is worth noting that λgn2 τt2 (gn2 + τt2 )−1 in the first term
has the form as that in (26) even through the value of τt may be
of the right hand side of (34) corresponds to the MSE of the
different due to different denoisers.
estimate of xn if the large-scale fading coefficient gn as well
A natural question then arises: how to design the threshold
as the user activity is assumed to be a priori known, and the
ln as a function of the known large-scale fading gn ? In theory,
integral of gn corresponds to the averaging over all possible gn .
we can treat each user separately, i.e., set the thresholding value
The second term then represents the cost of unknown gn and
of each user separately according to its own cost function. For
unknown user activity in reality. Similarly, for the case where
example, if a specific target false alarm probability is needed for
gn is exactly known, the MSE can be characterized as follows.
user n, we can design its thresholding parameter, ln , using the
Proposition 6: The MSE of the denoiser for the case where
expression in (31). In order to bring fairness, this paper considers
gn is known exactly at the BS is
a common target false alarm probability for all users. Under this ∞
condition, all users share the same thresholding parameter, i.e., aQ(gn ) λgn2 τt2
MSE(τt ) = · 2 dgn
ln = l, ∀n, since the expression of PF (gn ) in (31) does not 0 gnγ gn + τt2
depend on gn . In such a case, different users may have different ∞
aλQ(gn )gn4 
probabilities of missed detection depending on their large-scale + γ 2 + τ2)
1 − ϕ1 (gn2 τt−2 ) dgn ,
fading gn . To measure the performance of the detector for the 0 gn n(g t
entire system, we employ the average probability of missed (36)
CHEN et al.: SPARSE ACTIVITY DETECTION FOR MASSIVE CONNECTIVITY 1897

where function ϕi (s) of s is defined as Proposition 7: Let rn denote the row vector of X. If only
∞ pG (gn ) is known at the BS, the pdf of rn is given by
ti exp(−t)
ϕi (s)  dt. (37) 
1 + (1 − λ)(1 + s)i exp(−st)/λ
0 ∞
exp −rn 22 gn−2
Proof: See Appendix F.  pR (rn ) = (1 − λ)δ0 + dgn .
0 π M gnγ +2M /(aλQ(gn ))
We also observe from (36) that the first term in the right (40)
hand side corresponds to the averaged MSE if the user activity If gn is known, the pdf of rn is Bernoulli-Gaussian as
is assumed to be known, and the second term corresponds the
extra error brought by unknown user activity. λ exp(−rn 22 gn−2 )
Based on the expressions of MSE in (34) and (36), the state pR|G (rn |gn ) = (1 − λ)δ0 + . (41)
(πgn2 )M
evolution in (7) can be expressed as
2 N Proof: The results are extensions of (14) and (18) by con-
τt+1 = σw2 + MSE(τt ), (38)
L sidering multivariate random variables. 
based on which PF and PM can be evaluated according to (26), Given R̃t = R + Ut with Ut following complex Gaussian
(27), and (31), (32), as functions of the iteration number. As the distribution with zero mean and covariance Σt , the MMSE de-
AMP algorithm converges, τt converges to the fixed point τ∞ noisers ηt (r̃tn ) and ηt (r̃tn , gn ) for both cases are given by the
of the above equation. conditional expectation in the following.
Now we compare the resulting MSEs in these two cases. Proposition 8: If only pG (gn ) is known at the BS, the con-
According to the decomposition of variance, we have ditional expectation of R given R̃t = r̃tn is
    ∞
E Var X|X̃ t = E Var X|X̃ t , G Q(gn )ψa (gn )(gn−2 Σt + I)−1 r̃tn dgn
0 ∞
   E[R|R̃ =t
r̃tn ] = ,
+ E Var E[X|X̃ t , G]X̃ t ψc (gn ) + 0 Q(gn )ψb (gn )dgn
  (42)
≥ E Var X|X̃ t , G , (39) where ψa (gn ), ψb (gn ) and ψc (gn ) are defined as follows
which indicates that knowing the large-scale fading can help to  t ∗
improve the estimation on X given X̃ t . However, the simula- exp −r̃tn Σ−1 −2 2 −1
t − (Σt + gn Σt ) (r̃n )
ψa (gn )  γ 2
,
tion results in Section VI show that surprisingly for the model gn |Σt + gn I|
of the large-scale fading considered in this paper, the perfor- (43)
mance improvement is actually minor, indicating that knowing t −2 −1
 
exp −r̃n gn I − (gn2 I + gn4 Σt )−1 (r̃tn )∗
the large-scale fading does not help to get a much better estima- ψb (gn )  ,
tion. Knowing the exact value of gn is not crucial in user activity gnγ |Σt + gn2 I|
detection—the statistical information of gn is already sufficient. (44)

ψc (gn )  (1 − λ)(aλ)−1 exp −r̃tn Σ−1 t ∗ −1
t (r̃n ) |Σt | . (45)
V. USER ACTIVITY DETECTION: MULTIPLE-ANTENNA CASE
This section designs the AMP algorithms that account for If gn is known at the BS, the conditional expectation is
wireless channel propagation for the massive connectivity prob-
lem in the multiple-antenna case. As mentioned earlier, two (gn−2 Σt + I)−1 r̃tn
different AMP algorithms can be used for the MMV problem: E[R|R̃t = r̃tn , G = gn ] = , (46)
1 + 1−λ 2
λ |gn I + Σt |ψd (gn )
the AMP with a vector denoiser operating on each row of the
input matrix, or the parallel AMP-MMV that divides the MMV
where ψd (gn ) is defined as follows
problem into parallel SMV problems and iteratively solves the
SMV problem on each antenna separately with soft information  t ∗
exchange between the antennas. The AMP with vector denoiser ψd (gn )  exp −r̃tn Σ−1 t − (Σt + gn I)
2 −1
(r̃n ) |Σt |−1 .
admits a state evolution, which allows an easier characterization (47)
of its performance, whereas AMP-MMV can be implemented Proof: See Appendix G. 
in a distributed way which is helpful for reducing the running The covariance matrix Σt in both (42) and (46) is tracked via
time of the algorithm, especially when the BS is equipped with the state evolution (10), and Σt can be further simplified by the
a large antenna array. following proposition.
Proposition 9: Based on the pdfs in (40) and (41) and the
A. User Activity Detection by AMP With Vector Denoiser state evolution (10), if the initial covariance matrix Σ0 is a diag-
onal matrix with identical diagonal entries, i.e., Σ0 = τ02 I, then
As in the scenario with a single antenna, we consider both Σt stays as a diagonal matrix with identical diagonal entries,
the cases where only the statistical knowledge or the exactly i.e., Σt = τt2 I, for t ≥ 1, where τt is determined by
knowledge of the large-scale fading is known at the BS. To
design the denoisers, we first characterize the pdfs of the row N
2
vectors of X in the following. τt+1 = σw2 + MSE(τt ). (48)
L
1898 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 7, APRIL 1, 2018

If only pG (gn ) is known at the BS, MSE(τt ) is given by probabilities can be computed as, respectively
∞ 
aλgn2 τt2 Q(gn ) exp −r̃tn 22 τt−2
MSE(τt ) = dgn pR̃ t |R,G (r̃n |R = 0, G = gn ) =
t
, (55)
0 gnγ (gn2 + τt2 ) π M τt2M
∞ 
μM (s) − νM 2 −1
(s)ξM (s) exp −r̃tn 22 (τt2 + gn2 )−1
+ ds, (49) pR̃ t |R,G (r̃n |R = 0, G = gn ) =
t
.
0 Γ(M + 1)/(λas ) M π M (τt2 + gn2 )M
(56)
where functions μi (s) , νi (s) and ξi (s) are defined in (35), (16),
and (17), respectively, and Γ(·) is the Gamma function. If the For both cases, we immediately obtain the LLRs as, respectively
exact large-scale fading gn is known at the BS, MSE(τt ) is ∞ 2M −γ
aτt gn Q(gn ) 
given by LLR = log 2 2 exp r̃tn 22 Δ dgn , (57)
0 (gn + τt ) M
∞  
aλgn2 τt2 Q(gn ) t 2 
MSE(τt ) = dgn τt2M
gnγ (gn2 + τt2 ) LLR(gn ) = log exp r̃ 
n 2 Δ . (58)
0 (τt2 + gn2 )M
∞  
aλQ(gn )gn4 ϕM (gn2 τt−2 ) Observing that LLRs are monotonic in r̃tn 2 , we can set a
+ 1− dgn ,
0 gnγ (gn2 + τt2 ) Γ(M + 1) threshold ln on r̃tn 2 to perform the detection. When the large-
(50) scale fading is unknown at the BS, the probabilities of false
where function ϕi (s) is defined in (37). alarm and missed detection are then given as, respectively

Proof: See Appendix H.  exp −r̃tn 22 τt−2
Note that Σ0 is the noise covariance matrix after the first PF = M τ 2M
dr̃tn
r̃ n 2 > l n
t π t
matched filtering, which is indeed a diagonal matrix with iden-
(a) 1 
tical diagonal entries. Based on Proposition 9, the MMSE de- = 1− γ̄ M, ln2 τt−2 , (59)
noiser in (42) can be further simplified as Γ(M )

νM (r̃tn 22 ) and


E[R|R̃t = r̃tn ] = r̃tn , (51) ∞
ξM (r̃tn 22 ) agn−γ Q(gn )
PM =
r̃ tn 2 < l n 0 π (gn2 + τt2 )M
M
where νi (s) and ξi (s) are defined in (16) and (17), respectively,  
and the MMSE denoiser in (46) can be simplified as −r̃tn 22
exp dgn dr̃tn
gn2 + τt2
E[R|R̃t = r̃tn , G = gn ]
(b)

agn−γ Q(gn ) 
gn2 (gn2 + τt2 )−1 r̃tn = γ̄ M, ln2 (gn2 + τt2 )−1 dgn , (60)
= 2 , (52) 0 Γ(M )
1−λ g n2 +τ t M
1+ λ ( τ t2 ) exp(−Δr̃tn 22 )
where γ̄(·, ·) is the lower incomplete Gamma function, and (a)
where Δ is defined in (20). Note that if we let M = 1, (51) and and (b) are simply obtained by noticing that the integral of r̃tn
(52) reduce to the denoisers for the single-antenna case in (15) can be interpreted as the cumulative distribution function (cdf)
and (19). As mentioned before, we can also pre-compute and of a χ2 distribution with 2M degrees of freedom since r̃tn 22
store the functions νM (·) and ξM (·) in (51) as table lookup. can be regarded as a sum of the squares of 2M identical real
After the AMP algorithm has converged, we use the like- Gaussian random variables. Using the same approach, when the
lihood ratio test to perform the user activity detection. Recall large-scale fading is known to the BS, the probabilities of false
that R̃t = R + Ut where Ut follows complex Gaussian distri- alarm and missed detection can be evaluated as

bution. For the case where the large-scale fading is unknown, exp −r̃tn 22 τt−2
based on (40) and pR̃ t derived in (82) in Appendix G, the like- PF (gn ) = dr̃tn ,
r̃ tn 2 > l n π M τt2M
lihood probabilities given that the user is inactive or active are,
1 
respectively =1− γ̄ M, ln2 τt−2 , (61)
 Γ(M )
exp −r̃tn 22 τt−2 
pR̃ t |R (r̃tn |R = 0) = , (53) exp −r̃tn 22 (τt2 + gn2 )−1
π M τt2M PM (gn ) = dr̃tn
∞ r̃ tn 2 < l n π M (τt2 + gn2 )M
agn−γ Q(gn )
pR̃ t |R (r̃tn |R = 0) = 1 
0 π (gn2 + τt2 )M
M
= γ̄ M, ln2 (gn2 + τt2 )−1 . (62)
  Γ(M )
−r̃tn 22
exp dgn . (54) It is easy to verify that when M = 1, (59), (60), (61) and (62)
gn2 + τt2
reduce to (26), (27), (31) and (32), respectively.
For the case where the large-scale fading coefficient is known, Based on (59), (60), (61) and (62), we can design the thresh-
noting that R follows a Beroulli-Gaussian distribution, and old ln to achieve a trade-off between the probability of false
R̃t follows a mixed Gaussian distribution, then the likelihood alarm and the probability of missed detection. The proposed
CHEN et al.: SPARSE ACTIVITY DETECTION FOR MASSIVE CONNECTIVITY 1899

thresholding strategy in the single-antenna case can still be used


in the multiple-antenna scenario.

B. User Activity Detection by Parallel AMP-MMV


The outline of the parallel AMP-MMV algorithm is as pre-
sented in Algorithm 1. In this section, we adopt the paral-
lel AMP-MMV algorithm for our problem setup; we present
the expression of the denoiser, η(·, g, t, i, m)  [ηt,i,m (·, g1 ),
· · · , ηt,i,m (·, gN )]T and the probability of a device being active

based on the decision at the mth AMP-SMV stage, π n m . Here
we only discuss the case where the large-scale fading is known.
The extension to the scenario where the large-scale fading is
unknown is similar.
Since the parallel AMP-MMV algorithm employs M paral-
lel AMP-SMVs, the expression of the scalar denoiser for each
AMP-SMV is in the form of the MMSE denoiser for the single-
antenna case in (19). However, instead of using the prior λ as Fig. 2. Performance of AMP based user activity detection with only statistical
knowledge of the large-scale fading.
the probability of being active for each user, the algorithm has
access to a better estimate for the probability of activities as

π n m as algorithm proceeds. Therefore, the expression for the


to (61) and (62), respectively. To have a complete performance
MMSE denoiser can be written as
analysis, we also need to determine τT2 ,I in the parallel AMP-
gn2 (gn2 + τt,i 2 −1 t,i
) x̃n m MMV algorithm. However, due to the soft information exchange
ηt,i,m (x̃t,i
n m , gn ) = 2 2  2 t,i 2  ,
1− π n m g n +τ t , i

1+
2 exp τ −g
2
n |x̃ n m |
2 2
between the antennas, deriving an analytic state evolution for
π nm τ t,i t , i (g n +τ t , i ) 2
τt,i is very challenging. The numerical experiments in Section
(63)
VI show that the performance of parallel AMP-MMV is very
where x̃t,i
nm is the element in the nth row and the mth column
similar to AMP with vector denoiser. This observation suggests
of X̃t,i . At the end of the ith outer iteration, i.e., t = T , the
that the parameter τT2 ,I for the AMP-MMV algorithm should be
likelihood probabilities given that the user is inactive or active
similar to the final value of τt2 in AMP with vector denoiser.
can be written as
We briefly discuss the complexities of AMP with vector de-
,i 2 −2
exp(−|x̃Tn m | τT ,i ) noiser and AMP-MMV. For both algorithms the computational
p(x̃Tn m
,i
|X = 0, G = gn ) = , (64)
πτT2 ,i complexities mainly lie in the matched filtering and residual
,i 2 2
calculation, which depend on the problem size as O(N LM ),
exp(−|x̃Tn m | (gn + τT2 ,i )−1 ) at each iteration. The advantage of AMP-MMV is that parallel
p(x̃Tn m
,i
|X = 0, G = gn ) = 2 .
π(τT ,i + gn2 ) computation is allowed due to the division of MMV problem
(65) into several SMV problems.
Further, using equations (64) and (65), the probability that
user n is active based on the decision at the mth AMP-SMV can VI. SIMULATION RESULTS
be calculated as We evaluate the performance of the proposed method in a cell
p(x̃Tn m
,i
|X = 0, G = gn ) of radius R = 1000 m with potential N = 4000 users among
π nm = which 200 are active, i.e., λ = 0.05. The channel fading param-
p(x̃Tn m
,i
= 0, G = gn ) + p(x̃Tn m
|X ,i
|X = 0, G = gn )
eters are α = 15.3, β = 37.6 and σSF = 8, and the background
−1
τT2 ,i + gn2 noise is −169 dBm/Hz over 10 MHz.
−gn2 |x̃Tn m
,i 2
|
= 1+ exp . (66) We first consider the single-antenna case with only statistical
τT2 ,i τT2 ,i (gn2 + τT2 ,i )
knowledge of large-scale fading. Fig. 2 shows the tradeoff be-
After the parallel AMP-MMV is terminated, we use likeli- tween the probabilities of missed detection and false alarm of
hood ratio test to perform the user activity detection. It can be AMP with MMSE denoiser when the pilot sequence length is
shown that the LLR for user n can be calculated as set as L = 800 and the transmit power is set as 5 dBm, 15 dBm,
 and 25 dBm. We see that the simulated PM and PF match the
τT2M
,I −gn2 m |x̃Tn m ,I 2
| analysis very well. We also plot a lower bound using τ∞ = σw .
LLR(gn ) = log exp .
(τT2 ,I + gn2 )M (τT2 ,I + gn2 )τT2 ,I The lower bounds are very close to the actual performance,
(67) indicating that after convergence AMP is able to almost com-
It can be seen that the LLR expression for AMP-MMV algo- pletely eliminate multiuser interference; the remaining error is
rithm is in a similar form as in (57). Therefore, with the same dominated by the background noise.
discussion, we can show that the probabilities of false alarm Fig. 3 shows the performance of the AMP algorithm with
and missed detection can be further simplified in a form similar MMSE denoiser when the exact large-scale fading coefficients
1900 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 7, APRIL 1, 2018

Fig. 3. Performance of AMP based user activity detection with knowledge of Fig. 4. Performance comparison of AMP with MMSE denoiser, AMP with
the large-scale fading. soft threshoding denoiser, and CoSaMP.

are known. For comparison, the performance with only statis-


tical knowledge of the large-scale fading is also demonstrated
(only simulated performance is included since the predicted per-
formance is almost the same as depicted in Fig. 2). Fig. 3 shows
that the predicted curves match the simulated curves very well.
More interestingly, it indicates that the performance improve-
ment for knowing the exact large-scale fading coefficients is
negligible which suggests that knowing the distribution of the
large-scale fading (rather than the exact value) is already enough
for good user activity detection performance.
The next simulation compares the AMP algorithm with
MMSE denoiser with two other algorithms widely used in com-
pressed sensing: CoSaMP [23], and AMP but with soft thresh-
olding denoiser [3]. Compare to AMP, CoSaMP is based on
the matching pursuit technique. Compare to AMP with MMSE
denoiser, AMP with soft thresholding does not exploit the sta-
tistical knowledge of xn . Fig. 4 shows that AMP with MMSE
denoiser outperforms both CoSaMP and AMP with soft thresh- Fig. 5. Impact of transmit power and length of pilot on user activity detection
olding denoiser. This is partly due to the fact that both CoSaMP performance: MMSE denoiser vs. soft threholding denoiser.
and AMP with soft thresholding denoiser do not exploit the the
statistical knowledge of xn . Note that AMP with soft thresh-
olding implicitly solves the LASSO problem [14], [24], i.e., the Finally, we consider the multiple-antenna case assuming the
sparse signal recovery problem as an 1 -penalized least squares knowledge of large-scale fading coefficients. Fig. 6 illustrates
optimization. Therefore, the results in Fig. 4 indicate that AMP the probabilities of false alarm and missed detection under
with MMSE denoiser also outperforms LASSO. different numbers of antennas for both AMP with vector de-
Fig. 5 compares the performance of the AMP algorithm with noiser and parallel AMP-MMV algorithms. For comparison,
MMSE denoiser and the AMP algorithm with soft thresholding the single-antenna M = 1 case is also included. Fig. 6 shows
denoiser as function of transmit power and pilot length. For con- that for AMP with vector denoiser, the simulated results match
venience, we set PF = PM by properly choosing the threshold the predicted results very well. Further, it shows that the perfor-
l. We observe first that the MMSE denoiser outperforms soft mances of AMP with vector denoiser and parallel AMP-MMV
thresholding denoiser significantly, but more importantly, we are approximately the same, indicating that although these two
observe that the minimum L needed to drive PF and PM to algorithms employ different strategies, they both exploit the sta-
zero as transmit power increases is between 300 and 400 for tistical knowledge of the channel in the same way, resulting in
the MMSE denoiser, whereas the minimum L is between 600 similar performances.
and 800 for the soft threshloding denoiser, indicating the clear The impact of the pilot length L and the number of anten-
advantage of accounting for channel statistics in user activity nas M on the probabilities of false alarm and missed detection
detector design. as the transmit power increases is shown in Fig. 7. We set
CHEN et al.: SPARSE ACTIVITY DETECTION FOR MASSIVE CONNECTIVITY 1901

applications with random non-orthogonal signature sequences.


Specifically, we propose an AMP-based user activity detection
algorithm by exploiting the statistics of the wireless channel for
the uplink of a cellular system with a large number of potential
users but only a small fraction of them are active at any time
slot. We show that by using the state evolution, an accurate per-
formance characterization in terms of the probabilities of false
alarm and missed detection can be carried out. In particular, we
consider both cases in which the BS is equipped with a single
antenna or multiple antennas. We present the designs of the
MMSE denoisers in the scenarios where the large-scale fading
is either available exactly or where only its statistics is available
at the BS. For the multiple-antenna case, we adopt two AMP
algorithms, AMP with vector denoiser and parallel AMP-MMV,
to tackle the detection problem. We derive a performance anal-
ysis for both the single-antenna case and the multiple-antenna
case. Simulation results validate the analysis, and show that ex-
Fig. 6. Performance of AMP with vector denoiser (vAMP) and AMP-MMV ploiting the statistics of the channel in AMP denoiser design
for user activity detection in the multiple-antenna case. can significantly improve the detection threshold, and further
deploying multiple antennas at the BS can also bring significant
performance improvement.

APPENDIX
A. Proof of Proposition 1
Let d, x, y, z denote the distance from a user to the BS, the
shadowing in dB, the shadowing in linear scale, and the path-
loss, respectively. Note that in appendices we slightly abuse
some notations appeared in the paper due to the limited alphabet.
However this should not cause confusion since they only used
for derivations in the appendices. Let g  yz denote the large-
scale fading. We derive the pdfs of d, x, y, z and g as follows.
Assuming that all users are uniformly distributed in the cell
with radius R, the pdf of d is
2d
pD (d) = , 0 < d < R. (68)
R2

Fig. 7. Impact of length of pilot, number of antennas, and transmit power.


Since x = −20 log10 (y) follows Gaussian distribution with zero
2
mean and variance σSF , the pdf of y can be derived as
 
L = 300, 600, and M = 1, 2, 4. We make PF = PM for conve- 20 200 ln2 (y)
pY (y) = √ exp − 2 2
. (69)
nient comparison by properly choosing the threshold l. Note that ln(10) 2πσSF y ln (10)σSF
in the scenario where the exact large-scale fading coefficients
are known, different users have the same probabilities of false By using z = 10−(α +β log 1 0 d)/20 , we get the pdf of z as
alarm but different probabilities of missed detection. Thus we 40
have to use the average probability of missed detection over all pZ (z) = 10−2α /β z −40/β −1 , (70)
R2 β
users. Fig. 7 shows that increasing L or M brings significant
improvement. Specifically, when L = 300, M = 1, PF and PM where z > 10−(α +β log 1 0 R )/20 . The pdf of g is

tend to remain unchanged as the transmit power increases. How- 
ever, by either increasing L or increasing M , PF and PM can be pG (g) = |y|−1 pY (y)pZ y −1 g dy  ag −γ Q(g), (71)
driven to zero as the transmit power increases. In other words, ∞
the minimum L required to drive PF and PM to zeros can be where Q(g)  b ln g +c exp(−s2 )ds, γ  40/β + 1, and a, b
reduced by increasing M . and c are constants given in the proposition.

VII. CONCLUSION B. Proof of Proposition 2


This work shows that compressed sensing is a viable strategy Let g denote the large-scale fading following (11), and x =
for sporadic device activity detection for massive connectivity xR + jxI denote the Rayleigh fading, where xR and xI are the
1902 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 7, APRIL 1, 2018

real and imaginary parts of x, respectively. Let h = hR + jhI where (a) is obtained by pY ,W (y, w) = pH (y − w)pW (w), and
denote the channel coefficient accounting for both large-scale (b) is obtained by substituting (72). Combine the results in (74)
fading g and Rayleigh fading x, i.e., h = gx. Then the pdf and (75) with pX̃ |X (x̃|X = 0) = pY (x̃), we get
of h is  
1−λ −|x̃|2
pH (h) = pH R ,H I ,G (hR , hI , g)dg pX̃ (x̃) = exp
πτ 2 τ2
  ∞ 
hR hI  ∂(xR , xI , g)  λaQ(g) exp −|x̃|2 /(g 2 + τ 2 )
= pG (g)pX R ,X I
,  dg + dg. (76)
g g ∂(hR , hI , g) 0 πg γ (g 2 + τ 2 )
 
a ∞ −γ −2 −|h|2
= g Q(g) exp dg, (72) E. Proof of Proposition 5
π 0 g2
We omit superscript t and subscript n for notation simplicity.
where pH R ,H I ,G (hR , hI , g) and pX R ,X I (xR , xI ) are the joint The conditional variance of X given X̃ = x̃ is
(x R ,x I ,g )
pdfs, and | ∂∂ (h R ,h I ,g )
| = g −2 is the Jacobian determinant. When     2
g is known, pH |G (h|g) follows the complex Gaussian distribu- Var(X|X̃ = x̃) = E |X|2 X̃ = x̃ − E[X|X̃ = x̃] . (77)
tion with zero mean and variance g 2 .
Since we have derived E[X|X̃ = x̃] in (73), we only need to
derive E |X|2 X̃ = x̃ , which can be expressed as
C. Proof of Proposition 3
  
We omit superscript t and subscript n for notation simplic- E |X|2 X̃ = x̃
ity. The conditional expectation of X given X̃ = x̃ can be
|x|2 pX (x)
expressed as = p (x̃|x)dx
pX̃ (x̃) X̃ |X
E[X|X̃ = x̃] ∞  
2 −|x|2 −|x̃ − x|2
pX (x) = f (g)|x| exp + dgdx
= x p (x̃|x)dx 0 g2 τ2
pX̃ (x̃) X̃ |X ∞  
∞   −|x̃|2 −|x − δ x̃|2
−|x|2 −|x̃ − x|2 = f (g)|x|2 exp + dgdx
= f (g)x exp + dgdx 0 g2 + τ 2 δτ 2
0 g2 τ2 ∞  
∞   (a) λaQ(g)fˆ(g) −|x̃|2
−|x̃|2 −|x − δ x̃|2 = exp dg, (78)
= f (g)x exp + dgdx 0 g γ pX̃ (x̃)π g2 + τ 2
0 g2 + τ 2 δτ 2
∞   where f (g) is defined in Appendix C, (a) is obtained by us-
(a) λax̃ −|x̃|2 Q(g)g 2−γ ing Gaussian integral of x, fˆ(g)  τ 2 g 2 /(g 2 + τ 2 )2 + |x̃|2 g 4 /
= exp dg, (73)
pX̃ (x̃)π 0 g 2 + τ 2 (g 2 + τ 2 )2 (g 2 2 3
 + τ ) . By using (77), (76), (73), and MSE(τ ) =
where f (g)  λaQ(g)/(pX̃ (x̃)π 2 τ 2 g γ +2 ), δ  g 2 /(g 2 + τ 2 ), Var(X|X̃ = x̃)pX̃ (x̃)dx̃ with some algebraic manipulations,
(a) is obtained by using Gaussian integral of x. By substitut- we obtain (34).
ing the expression of pX̃ (x̃) derived in Appendix D, we get
E[X|X̃ = x̃] as in (15). F. Proof of Proposition 6
Similar
 to Appendix E, we first derive the conditional expec-
D. Proof of Proposition 4 tation E |X|2 |X̃ = x̃, G = g as
We omit superscript t and subscript n in the following. Note   λfˆ(g) exp(−|x̃|2 /(g 2 + τ 2 ))
that pX̃ |X (x̃|X = 0) = pW (x̃) where random variable W is de- E |X|2 |X̃ = x̃, G = g = ,
pX̃ |G (x̃|g)π
fined as W  τ V , and pX̃ |X (x̃|X = 0) = pY (x̃) where random
(79)
variable Y is defined as Y  H + W with H following the dis- where fˆ(g) is defined in Appendix
  E. Then based on (19) and
tribution in (12). Since V follows complex Gaussian distribution Var(X|X̃ = x̃, G = g) = E |X|2 X̃ = x̃, G = g − |E[X|X̃
with zero mean and unit variance, we get = x̃, G = g]|2 , we have
 
1 −|x̃|2
pX̃ |X (x̃|X = 0) = exp , (74) MSE = Var(X|X̃ = x̃, G = g)pX̃ |G (x̃|g)pG (g)dx̃dg
πτ 2 τ2

To compute pX̃ |X (x̃|X = 0), we derive pY (y) as follows ∞
λg 2 τ 2 λg 4 1 − ϕ1 (g 2 τ −2 )
(a)
= + pG (g)dg,
(a) 0 g2 + τ 2 g2 + τ 2
pY (y) = pH (y − w)pW (w)dw
(80)
∞ −γ
 2

(b) ag Q(g) −|y| where (a) is obtained by using Gaussian integral over x̃. After
= exp dg, (75)
0 π(g 2 + τ 2 ) g2 + τ 2 plugging pG (g), we finally get (36).
CHEN et al.: SPARSE ACTIVITY DETECTION FOR MASSIVE CONNECTIVITY 1903

G. Proof of Proposition 8 of r̃t . Based on (84) and (85), we have



We omit superscript t and subscript n. For the case where g 2 τt2 λ exp − r̃t 22 /(g 2 + τt2 )
the large-scale fading is unknown, the proof is similar to ci = · dr̃t
g 2 + τt2 π M (g 2 + τt2 )M
Appendix C except that we need to deal with random vectors 
rather than random scalars. By using |r̃t,i |2 1 − φ−1 (r̃t )
+ ·
g −4 (g 2 + τt2 )2
pR (r) 
E[R|R̃ = r̃] = r p (r̃|r)dr, (81)
pR̃ (r̃) R̃|R λ exp − r̃t 22 /(g 2 + τt2 )
dr̃t
π M (g 2 + τt2 )M
where pR̃ (r̃) can be derived as  
  λg 2 τ 2 λg 4 ϕM (g 2 τt−2 )
1−λ −r̃22 = 2 t2 + 2 1 − , (86)
pR̃ (r̃) = exp g + τt g + τt2 Γ(M + 1)
(πτ 2 )M τ2
∞  where the first term of the last step is obtained by using Gaussian
λaQ(g) exp −r̃22 /(g 2 + τ 2 )
+ dg. (82) integral, the second term is obtained by integrating in spheri-
0 π M g γ (g 2 + τ 2 )M cal coordinates instead of Cartesian coordinates, and function
By plugging pR̃ (r̃) and pR (r) into (81), and using multivariate ϕi (s) is defined in (37). As expected, ci does not depend on i,
Gaussian integral of r with some algebraic manipulations, we indicating that the diagonal elements are indeed
 identical.
 By
can obtain (42). For the case where the large-scale fading is replacing g by G and ci by C, we get E Dt D∗t = EG [C]I,
known, the conditional expectation can be found in [18]. where C is a random variable depends on G, and

H. Proof of Proposition 9 aQ(g) λg 2 τt2
EG [C] = · 2 dg
0 gγ g + τt2
Since the proofs for both cases are similar, in the following ∞  
we focus on the case where the large-scale fading is known. aQ(g) λg 4 ϕM (g 2 τt−2 )
+ · 2 1− dg. (87)
We omit subscript n, and use t as subscript instead of super- 0 gγ g + τt2 Γ(M + 1)
script for convenience. We use induction by assuming Σt = τt2 I
holds. To evaluate the right hand side of (10), we first derive the The state evolution in (10) is then simplified to
distribution of R̃t based on (41) as N
 Σt+1 = σw2 I + 2
EG [CI]  τt+1 I, (88)
λ exp − r̃t 22 (g 2 + τt2 )−1 L
pR̃ t |G (r̃t |g) = φ(r̃t ), (83)
π M (g 2 + τt2 )M which completes the induction.

where φ(r̃t )  1 + (1 − λ)(1 + g 2 τt−2 )M exp(−Δr̃t 22 )/λ, REFERENCES


and Δ is defined in (20). We then compute the conditional
covariance matrix of Rt given R̃t = r̃t and G = g as [1] J. G. Andrews et al., “What will 5G be?” IEEE J. Sel. Areas Commun.,
vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
Cov = E[RTt (RTt )∗ |R̃t = r̃t , G = g] [2] W. Yu, “On the fundamental limits of massive connectivity,” in Proc. Inf.
Theory Appl. Workshop, San Diego, CA, USA, Feb. 2017, pp. 1–6.
∗ [3] D. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms
− E[RTt |R̃t = r̃t , G = g] E[RTt |R̃t = r̃t , G = g] for compressed sensing,” Proc. Nat. Acad. Sci. USA, vol. 106, no. 45,
pp. 18 914–18 919, Nov. 2009.
g 2 τt2 φ−1 (r̃t ) φ−1 (r̃t ) − φ−2 (r̃t ) T T ∗ [4] X. Chen, T.-Y. Chen, and D. Guo, “Capacity of Gaussian many-access
= I + r̃t (r̃t ) . (84)
g 2 + τt2 g −4 (g 2 + τt2 )2 channels,” IEEE Trans. Inf. Theory, vol. 63, no. 6, pp. 3516–3539,
Jun. 2017.
[5] H. F. Schepker, C. Bockelmann, and A. Dekorsy, “Exploiting sparsity in
Then by taking the expectation over R̃t , we obtain channel and data estimation for sporadic multi-user communication,” in
Proc. Int. Symp. Wireless Commun. Syst., Ilmenau, Germany, Aug. 2013,
ER̃ t |G [Cov] = pR̃ t |G (r̃t |g)Covdr̃t , (85) pp. 1–5.
[6] X. Xu, X. Rao, and V. K. N. Lau, “Active user detection and channel
estimation in uplink CRAN systems,” in Proc. IEEE Int. Conf. Commun.,
which is a diagonal matrix due to fact that the off-diagonal ele- London, U.K., Jun. 2015, pp. 2727–2732.
ment is an integral of an odd function over a symmetric interval, [7] G. Wunder, P. Jung, and M. Ramadan, “Compressive random access us-
ing a common overloaded control channel,” in Proc. IEEE Globecom
which is zero. Furthermore, it is easy to observe from the in- Workshops, San Diego, CA, USA, Dec. 2015, pp. 1–6.
tegral that the diagonal elements of ER̃ t |G [Cov] are identical. [8] G. Wunder, H. Boche, T. Strohmer, and P. Jung, “Sparse signal processing
Note that when MMSE denoiser is employed, concepts for efficient 5G system design,” IEEE Access, vol. 3, pp. 195–
 the right hand 208, Feb. 2015.
side of (10) can be rewritten as E Dt D∗t = EG ER̃ t |G [Cov] ,
  [9] H. Zhu and G. B. Giannakis, “Exploiting sparse user activity in multiuser
which leads to the result that E Dt D∗t is also a diagonal matrix detection,” IEEE Trans. Commun., vol. 59, no. 2, pp. 454–465, Feb. 2011.
[10] H. F. Schepker and A. Dekorsy, “Compressive sensing multi-user detection
with identical diagonal elements.  with block-wise orthogonal least squares,” in Proc. IEEE Veh. Technol.
In the following, we derive an explicit expression of E Dt Conf., Yokohama, Japan, May 2012, pp. 1–5.

Dt . To this end, we first compute ER̃ t |G [Cov]. Let ci denote the [11] J. Luo and D. Guo, “Neighbor discovery in wireless ad hoc networks based
on group testing,” in Proc. Allerton Conf. Commun., Control, Comput.,
ith diagonal entry of ER̃ t |G [Cov], and r̃t,i denote the ith entry Urbana-Champaign, IL, USA, Sep. 2008, pp. 791–797.
1904 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 7, APRIL 1, 2018

[12] L. Zhang, J. Luo, and D. Guo, “Neighbor discovery for wireless networks Foad Sohrabi (S’13) received the B.A.Sc. degree
via compressed sensing,” Performance Eval., vol. 70, no. 7, pp. 457–471, from the University of Tehran, Tehran, Iran, in 2011,
Jul. 2013. and the M.A.Sc. degree from McMaster University,
[13] M. Bayati and A. Montanari, “The dynamics of message passing on dense Hamilton, ON, Canada, in 2013, both in electrical
graphs, with applications to compressed sensing,” IEEE Trans. Inf. Theory, and computer engineering. Since September 2013,
vol. 57, no. 2, pp. 764–785, Feb. 2011. he has been working toward the Ph.D degree at the
[14] A. Maleki, L. Anitori, Z. Yang, and R. G. Baraniuk, “Asymptotic analysis University of Toronto, Toronto, ON, Canada. From
of complex LASSO via complex approximate message passing (CAMP),” July to December 2015, he was a Research Intern with
IEEE Trans. Inf. Theory, vol. 59, no. 7, pp. 4290–4308, Jul. 2013. Bell Labs, Alcatel-Lucent, Stuttgart, Germany. His
[15] S. Rangan, “Generalized approximate message passing for estimation research interests include MIMO communications,
with random linear mixing,” in Proc. IEEE Int. Symp. Inf. Theory, optimization theory, wireless communications, and
St. Petersburg, Russia, Jul. 2011, pp. 2168–2172. signal processing. He was the recipient of the IEEE Signal Processing Society
[16] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algorithms Best Paper Award in 2017.
for compressed sensing: I. motivation and construction,” in Proc. IEEE
Inf. Theory Workshop, Cairo, Egypt, Jan. 2010, pp. 1–5.
[17] P. Schniter, “Turbo reconstruction of structured sparse signals,” in Proc.
Annu. Conf. Inf. Sci. Syst., Princeton, NJ, USA, Mar. 2010, pp. 1–6.
[18] J. Kim, W. Chang, B. Jung, D. Baron, and J. C. Ye, “Belief propaga-
tion for joint sparse recovery,” 2011. [Online]. Available: http://arxiv.
org/abs/1102.3289v1 Wei Yu (S’97–M’2–SM’8–F’14) received the
[19] J. Ziniel and P. Schniter, “Efficient high-dimensional inference in the mul- B.A.Sc. degree in computer engineering and math-
tiple measurement vector problem,” IEEE Trans. Signal Process., vol. 61, ematics from the University of Waterloo, Waterloo,
no. 2, pp. 340–354, Jan. 2013. ON, Canada, in 1997, and the M.S. and Ph.D. de-
[20] G. Hannak, M. Mayer, A. Jung, G. Matz, and N. Goertz, “Joint channel grees in electrical engineering from Stanford Univer-
estimation and activity detection for multiuser communication systems,” sity, Stanford, CA, USA, in 1998 and 2002, respec-
in Proc. IEEE Int. Conf. Commun. Workshop, London, U.K., Jun. 2015, tively. Since 2002, he has been with the Electrical
pp. 2086–2091. and Computer Engineering Department, University
[21] D. L. Donoho, I. Johnstone, and A. Montanari, “Accurate prediction of of Toronto, Toronto, ON, Canada, where he is cur-
phase transitions in compressed sensing via a connection to minimax de- rently a Professor and holds a Canada Research Chair
noising,” IEEE Trans. Inf. Theory, vol. 59, no. 6, pp. 3396–3433, Jun. 2013. (Tier 1) in information theory and wireless communi-
[22] A. Montanari, “Graphical models concepts in compressed sensing,” in cations. His research interests include information theory, optimization, wireless
Compressed Sensing: Theory and Applications, Y. C. Eldar and G. communications, and broadband access networks.
Kutyniok, Eds. New York, NY, USA: Cambridge Univ. Press, 2012, ch. Prof. Yu is a Fellow of the Canadian Academy of Engineering. He currently
9, pp. 394–438. serves on the IEEE Information Theory Society Board of Governors (2015–
[23] D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from 2020). He serves as the Chair of the Signal Processing for Communications
incomplete and inaccurate samples,” Appl. Comp. Harmon. Anal., vol. 26, and Networking Technical Committee of the IEEE Signal Processing Society
no. 3, pp. 301–321, May 2009. (2017–2018). He was an IEEE Communications Society Distinguished Lecturer
[24] D. L. Donoho, A. Maleki, and A. Montanari, “The noise-sensitivity phase (2015–2016). He currently serves as an Area Editor for the IEEE TRANSACTIONS
transition in compressed sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 10, ON WIRELESS COMMUNICATIONS (2017–2019). He was an Associate Editor for
pp. 6920–6941, Oct. 2011. the IEEE TRANSACTIONS ON INFORMATION THEORY (2010–2013), an Editor
for the IEEE TRANSACTIONS ON COMMUNICATIONS (2009–2011), an Editor
for the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS (2004–2007),
and a Guest Editor for a number of special issues for the IEEE JOURNAL ON
SELECTED AREAS IN COMMUNICATIONS and the EURASIP JOURNAL ON AP-
PLIED SIGNAL PROCESSING. He was the Technical Program Co-Chair for the
IEEE Communication Theory Workshop in 2014, and the Technical Program
Zhilin Chen (S’14) received the B.E. degree in Committee Co-Chair of the Communication Theory Symposium at the IEEE
electrical and information engineering and the M.E. International Conference on Communications (ICC) in 2012. He was the recip-
degree in signal and information processing from ient of the IEEE Signal Processing Society Best Paper Award in 2017 and in
Beihang University (formerly Beijing University of 2008, the Journal of Communications and Networks Best Paper Award in 2017,
Aeronautics and Astronautics), Beijing, China, in the Steacie Memorial Fellowship in 2015, the IEEE Communications Society
2012 and 2015, respectively. He is currently working Best Tutorial Paper Award in 2015, the IEEE ICC Best Paper Award in 2013,
toward the Ph.D. degree at the University of Toronto, the McCharles Prize for Early Career Research Distinction in 2008, the Early
Toronto, ON, Canada. His research interests include Career Teaching Award from the Faculty of Applied Science and Engineering,
wireless communication and signal processing. University of Toronto in 2007, and the Early Researcher Award from Ontario in
2006. He is recognized as a Highly Cited Researcher.

You might also like