Download as pdf or txt
Download as pdf or txt
You are on page 1of 13

1

New Low Rank Optimization Model and Convex


Approach for Robust Spectral Compressed Sensing
Zai Yang and Xunmeng Wu

Abstract—This paper investigates recovery of an undamped (where the unobserved samples are called missing data). The
spectrally sparse signal and its spectral components from a set of latter case is known also as spectral super-resolution for
regularly spaced samples within the framework of spectral com- emphasizing the extent to which nearly located frequencies
pressed sensing and super-resolution. We show that the existing
Hankel-based optimization methods suffer from the fundamental can be resolved [6], [7].
arXiv:2101.06433v1 [cs.IT] 16 Jan 2021

limitation that the prior of undampedness cannot be exploited. Line spectral estimation (in the case when all regularly
We propose a new low rank optimization model partially inspired spaced samples are available) has a long history of research,
by forward-backward processing for line spectral estimation and and a great number of approaches have been developed,
show its capability in restricting the spectral poles on the unit see [4]. Among them, the most well-known approach is the
circle. We present convex relaxation approaches with the model
and show their provable accuracy and robustness to bounded (nonparametric) fast Fourier transform (FFT), one of the top
and sparse noise. All our results are generalized from the 1-D ten algorithms in the 20th century [8]. With the rapid increase
to arbitrary-dimensional spectral compressed sensing. Numerical of computing power and the need of higher accuracy and
simulations are provided that corroborate our analysis and show resolution, parametric approaches like maximum likelihood
efficiency of our model and advantageous performance of our estimation (MLE) were developed in which the main difficulty
approach in improved accuracy and resolution as compared to
the state-of-the-art Hankel and atomic norm methods. comes from the nonlinearity and nonconvexity with respect
to the frequency parameters. To overcome this difficulty,
Index Terms—Low rank double Hankel model, doubly en- subspace-based methods, e.g. MUSIC and ESPRIT [9], [10],
hanced matrix completion (DEMaC), line spectral estimation,
spectral compressed sensing, Kronecker’s theorem. were proposed that turn to estimate the frequencies from
the second-order statistics estimated from the samples. The
subspace methods have good statistical properties [11], [12];
I. I NTRODUCTION
however, they need the model order K that is usually unknown
Spectral compressed sensing [2], [3] refers to the recovery and have poor performance in the presence of missing data and
or estimation of a spectrally sparse signal and its spectral outliers.
components from a subset of regularly spaced samples. In Sparse optimization and compressed sensing approaches
particular, the regularly spaced samples {ỹj } are given by have dominated the research of this century on this topic, see
K
X [13]. By exploiting the spectral sparsity and proposing an op-
ỹj = yjo + ej , yjo = sk zkj−1 , |zk | = 1, (1) timization framework for signal recovery, the aforementioned
k=1 drawbacks of subspace methods can be tackled. The idea of
where yjo denotes the ground truth signal, ej is noise, and sparse methods for spectral compressed sensing dates back
zk , sk denote the pole and complex amplitude of the kth to the last century [14]. But rigourous theory and algorithms
spectral component. The signal is called spectrally sparse since have not been established until the seminal papers [6], [7]
the number of spectral components K is small. In this paper, in which a convex optimization approach was proposed to
we consider undamped spectral signals, which are commonly deal with the continuous-valued frequencies. To date, such
encountered in array signal processing, radar, wireless com- convex, gridless sparse methods include atomic norm (or total
munications, and many other areas [4], with zk = ei2πfk , variation norm) methods [5], [6], [15]–[18], enhanced matrix
k = 1, . . . , K lying on the unit circle that have a one-to-one completion (EMaC) [19] and covariance fitting approaches
connection to the frequencies fk ∈ [0, 1), k = 1, . . . , K where [20]–[22]. Unlike the conventional MLE that solves directly
√ for the frequencies in a nonconvex way, interestingly, all these
i = −1. Our objective is to estimate yjo , {zk } and {sk }
given partial entries in {ỹj }. methods turn to optimize explicitly or implicitly a structured
Spectral compressed sensing is known also as off-the-grid low rank matrix where the low-rankness comes from the spec-
compressed sensing [5] in the sense that the frequencies in tral sparsity. In particular, atomic norm and covariance fitting
the model are not assumed on a fixed grid as opposed to are related to a low rank positive semidefinite (PSD) Toeplitz
conventional compressed sensing [2]. When all entries in {ỹj } matrix, and EMaC is rooted on a low rank Hankel matrix.
are available, it is the well-known line spectral estimation Consequently, spectral compressed sensing is connected to the
problem [4] that is fundamental in digital signal processing celebrated low rank matrix recovery problem [23].

Part of this paper will be presented at the 2020 European Signal Processing A. Main Contributions of This Paper
Conference (EUSIPCO) [1].
The authors are with the School of Mathematics and Statistics, Xi’an We note in this paper that the low rank Hankel model widely
Jiaotong University, Xi’an 710049, China (e-mail: yangzai@xjtu.edu.cn). employed for spectral compressed sensing has not been well
2

understood. The first contribution of this paper is to show that practical noisy case. In this paper, we mainly study convex
the low rank Hankel model has a fundamental limitation, to be relaxation approaches that do not need the model order and
specific, it cannot use the on-circle prior |zk | = 1 in the data whose accuracy are proved in absence of noise and with
model (1) and its produced poles do not lie on the unit circle bounded and sparse noise. In fact, an algorithm in [31] is
in general, resulting in difficulties of physical interpretation modified and applied in this paper to the proposed double
and potential performance loss. Hankel model to validate the model efficiency.
The second contribution of this paper is to resolve the
limitation above. To do so, we propose a new low rank
optimization model for spectral compressed sensing that we C. Notation and Organization
call low rank double Hankel model by introducing another Notations used in this paper are as follows. The set of
Hankel matrix into the model. We show analytically that the real and complex numbers are denoted R and C respectively.
new model substantially shrinks the solution space and has Boldface letters are reserved for vectors and matrices. The
great potential to restrict the poles on the unit circle. amplitude of scalar a is denoted |a|. The complex conjugate,
By employing the double Hankel model, as the third con- transpose and complex transpose of matrix A are denoted A,
tribution, we propose new convex optimization approaches AT and AH respectively. The rank of matrix A is denoted
to spectral compressed sensing, called doubly enhanced ma- rank (A). We write A ≥ 0 if A is Hermitian and positive
trix completion (DEMaC) in honor of EMaC, and present semidefinite. The jth entry of vector x is xj . The diagonal
theoretical results showing DEMaC’s accuracy with full and matrix with vector x on the diagonal is denoted diag (x).
compressive/partial sampling and its robustness to bounded The Kronecker and Khatri-Rao (a.k.a. columnwise Kronecker)
and sparse noise. products of matrices are denoted ⊗ and ?, respectively.
We also extend the double Hankel model and the DEMaC The rest of this paper is organized as follows. Section II
approaches from the 1-D to arbitrary-dimensional spectral introduces the previous low rank PSD-Toeplitz and Hankel
compressed sensing. Extensive numerical simulations are pro- models for spectral compressed sensing. Section III presents
vided that confirm our analysis and show that DEMaC per- the new low rank double Hankel model and shows its prop-
forms consistently better than EMaC in various scenarios. erties as compared to the Hankel and PSD-Toeplitz models.
Section IV presents convex relaxation approaches for spectral
B. Related Work compressed sensing by applying the double Hankel model
and shows their theoretical guarantees with and without noise.
It is always an essential and challenging task to exploit the
Section VI provides numerical results that validate efficiency
prior |zk | = 1 for line spectral estimation. The MLE explicitly
of the double Hankel model and advantageous performance
uses the prior and suffers from convergence (to global optima)
of DEMaC as compared to EMaC and atomic norm methods.
issue. Subspace methods consist of two consecutive estimation
Conclusions are drawn in Section VII and some technical
steps, first for the (signal or noise) subspace and second for
proofs for DEMaC are given in the Appendix.
the spectral poles. The forward-backward processing technique
[24]–[26] uses the prior in the first step to improve the estima-
tion accuracy of the covariances from which the subspace is II. P REVIOUS L OW R ANK O PTIMIZATION M ODELS
computed and eventually improves the accuracy of frequency
estimation. MUSIC uses the prior in the second step and We present previous low rank Hankel and Toeplitz optimiza-
performs a line search for frequency estimation [9]. Besides tion models in this section. Here we assume that the number
including the forward-backward technique, unitary ESPRIT of spectral poles K is given. Let
also uses the prior in the second step by solving for the spectral ( )
K
poles from a total least squares problem [27]. Interestingly, N
X j−1
S0 = y ∈ C : yj = sk zk , sk ∈ C, |zk | = 1 (2)
root-MUSIC does not use the prior but has higher resolution
k=1
than MUSIC [28], which implies that exploiting the prior in
the second step might not improve the accuracy because there that denotes the set of candidate signals that are undamped
is no guarantee that the information lost in the first step can spectrally sparse with at most K spectral components. What
be compensated in the second. the framework of spectral compressed sensing does is to
The low rank double Hankel optimization model proposed find some candidate signal ŷ ∈ S0 that is nearest to ỹ =
T
in this paper is partially inspired by the forward-backward [ỹ1 , . . . , ỹN ] and then retrieve the estimated poles from ŷ.
processing technique. Unlike subspace methods and as many To be specific, it attempts to solve the following optimization
other optimization approaches, the estimation in DEMaC is problem:
accomplished in a single step by solving the resulting opti-
mization problem in which the prior is implicitly included. min Loss (y, ỹ) , subject to y ∈ S0 , (3)
y
Equipped with the low rank Hankel model, nonconvex
optimization algorithms have been proposed for spectral com- where Loss (·, ·) denotes a loss function that is chosen accord-
pressed sensing [29]–[31]. But these methods need the model ing to the sampling scheme and noise properties. To cast (3)
order K like the MLE and subspace methods and few theoret- as a tractable optimization problem, the key underlies how to
ical results are known about their accuracy, especially in the characterize the constraint y ∈ S0 .
3

A. Low Rank PSD-Toeplitz Model III. N EW L OW R ANK O PTIMIZATION M ODEL


Let   A. Low Rank Double Hankel Model
t1 t2 ... tN
 t2 t1 ··· tN −1  In this paper, we propose to approximate the set S0 in (2)
Tt= . (4)
 
.. .. ..  by
 .. . . . 
S2 = y ∈ CN : rank Hy | J1 HyJ2 ≤ K ,
  
tN tN −1 ... t1 (10)
denote an N × N Hermitian Toeplitz matrix formed by using where for j = 1, 2, Jj is an Nj × Nj reversal matrix with
t ∈ CN . By exploiting the Carathéodory-Fejér theorem [4, ones on the anti-diagonal and zeros elsewhere. It is seen
Ch. 4.9.2], which states that a PSD Toeplitz matrix admits a that J1 HyJ2 remains to be a Hankel matrix. The resulting
Vandermonde decomposition, S0 can be written equivalently optimization model
as [5], [18]
 
t yH
  min Loss (y, ỹ) , subject to y ∈ S2 (11)
y ∈ CN : 1 ≥ 0, rank (T t) ≤ K, t ∈ CN . y
y Tt
(5) is referred to as low rank double Hankel model.
t1 y H
 
Suppose y ∈ S0 and recall (7). Then we have
The PSDness of implies that T t is also PSD.
y Tt
The resulting PSD-Toeplitz model and its variants lead to the J1 HyJ2 = J1 A1 SAT2 J2 . (12)
atomic norm and covariance fitting methods.
To proceed, it is easy to show the following identity

B. Low Rank Hankel Model Jj Aj = Aj Z 1−Nj , j = 1, 2, (13)


The low rank Hankel model is introduced in [19], [29]
where Z = diag (z1 , . . . , zK ). Substituting (13) into (12), we
inspired by the classical matrix pencil approach [32], [33].
obtain
For y ∈ CN , we form the N1 × N2 Hankel matrix
  J1 HyJ2 = A1 SZ 1−N AT2 , (14)
y1 y2 ... yN2
 y2 y3 . . . yN2 +1  which is in the same form as (7). Consequently,
Hy =  . ..  , (6)
 
. ..
 .. .. . . 

Hy | J1 HyJ2 = A1 SAT2 | A1 SZ 1−N AT2
  
yN1 yN1 +1 . . . yN  T
(15)
A2
where N1 , N2 satisfy N1 + N2 = N + 1. If y ∈ S0 , it can = A1 S
A2 Z 1−N S̃
easily be shown that
has rank no greater than K, where S̃ =
Hy = A1 SAT2 (7) diag−2 (sgn (s1 ) , . . . , sgn (sK )) is a unitary diagonal
s
and thus rank (Hy) ≤ K, where S = diag (s1 , . . . , sk ) and matrix and sgn (s) = |s| for s 6= 0. We therefore conclude
for j = 1, 2, Aj is an Nj × K Vandermonde matrix with
S0 ⊆ S2 ⊆ S1 (16)
[Aj ]n,k = zkn−1 . Let
which implies that, as compared to the Hankel model (9), the
S1 = y ∈ CN : rank (Hy) ≤ K .

(8)
proposed double Hankel model (11) is a tighter relaxation of
Evidently, S0 ⊆ S1 . By changing S0 to S1 , the original the original problem (3). More differences between them will
problem (3) is then relaxed to the following low rank Hankel be shown in the ensuing subsection.
optimization model: It is worth noting that the proposed double Hankel model
is partially inspired by the forward-backward processing tech-
min Loss (y, ỹ) , subject to y ∈ S1 . (9) nique [24], [25]. In particular, let y̆ ∈ CN be such that
y

The model in (9) and its variants form the Hankel-based J1 HyJ2 = Hy̆. (17)
methods.
Though the low rank Hankel model (9) is a relaxation of It is seen that y̆ is the conjugated backward version of y
the original problem (3), it has several merits. For example, satisfying
EMaC, a convex relaxation of (9), has higher resolution than K K
atomic norm methods that use the PSD-Toeplitz model [19]. sk z kN −j =
X X
y̆j = y N −j+1 = sk zkj−N , j = 1, . . . , N
Unlike the PSD-Toeplitz model, several nonconvex optimiza- k=1 k=1
tion algorithms have been proposed for spectral compressed (18)
sensing equipped with the Hankel model [29], [31], [34]. by making use of (2). Equation (18) implies that the virtual
A possible reason is that the Hankel model contains fewer signal y̆ is composed of the same spectral poles {zk } as y,
variables and thus is easier to initialize, which is critical for which is the key observation underlying the forward-backward
nonconvex algorithms. processing technique.
4

B. Properties Using (21), (22), (6) and (24), we obtain


In this subsection we analyze the low rank Hankel and [Hy | Hy̆]
double Hankel models and illustrate advantages of the latter. = A (z) SAT (z) | A (z 0 ) S 0 AT (z 0 )
 
To this end, we let
= A (z) SAT (z) | A (z ? ) S 0 AT (z ? ) (25)
 

SAT (z)
( )  
K
= [A (z) | A (z ? )] ,
X j−1
0 N
S1 = y ∈ C : yj = sk zk , sk ∈ C, 0 6= zk ∈ C S 0 AT (z ? )
k=1
(19) where S 0 = diag (s01 , . . . , s0K ). Consequently, it is seen that
that is obtained by removing the constraint |zk | = 1 in S0 and rank ([Hy | Hy̆]) ≤ K equals the number of distinct poles in
thus S0 ⊂ S10 . To link S10 to S1 , on one hand, we have (7) for K0
{zk }k=1
K0
∪ {zk? }k=1 . Using the fact that zk = zk? if and only
any y ∈ S10 and thus S10 ⊆ S1 . On the other hand, it follows if |zk | = 1, we have y ∈ S20 , where
from the Kronecker’s theorem [35], [36] that if y ∈ S1 and
K < min {N1 , N2 }, then y ∈ S10 except for degenerate cases. S20 = {y ∈ S10 where either |zk | = 1 or zk , zk? appear in pair} .
Therefore, S1 and S10 are approximately identical given K < (26)
min {N1 , N2 }. This implies that the Hankel model (9) can be It can be shown by arguments similar to those above that
viewed as a relaxation of (3) by removing the constraint |zk | = the converse, S20 ⊆ S2 , is also true. Therefore, S20 and S2
1. In other words, by using the Hankel model (9), we have are approximately identical, and we conclude that the double
actually abandoned the prior knowledge that the spectral poles Hankel model (11) can be viewed as a relaxation of (3) by
{zk } lie on the unit circle. This claim is consistent with the allowing appearance of pairs of poles {zk , zk? } if they are not
observation in [19], [29], [32] that the matrix pencil method on the unit circle.
and the Hankel model also apply to damped spectrally sparse Note that if we write zk = rk ei2πfk in the polar coordinate,
signals for which the magnitudes of the spectral poles {|zk |} then zk? = rk−1 ei2πfk and thus the pair of estimated poles
are unknown. {zk , zk? } share an identical frequency, implying that they
We next study the double Hankel model in (11). Assume correspond to a same spectral component given the prior
y ∈ S2 and K < min {N1 , N2 } and recall (17) and (18). Then knowledge that the true spectral poles lie on the unit circle.
we have Intuitively, such pair of estimated poles is obtained only when
two spectral frequencies are closely located so that they cannot
 
rank ([Hy | Hy̆]) = rank Hy | J1 HyJ2 ≤ K. (20) be separated. If all the frequencies are properly separated, it
is expected that all of the estimated poles obtained by using
It follows that rank (Hy) ≤ K < min {N1 , N2 } and the double Hankel model lie on the unit circle, which, as we
rank (Hy̆) ≤ K < min {N1 , N2 }. Applying the Kronecker’s will see, is consistent with our numerical results presented in
theorem results in y, y̆ ∈ S10 almost surely, to be specific, there Section VI.
K K
exist {sk , zk ∈ C}k=1 and {s0k , zk0 ∈ C}k=1 with zk , zk0 6= 0 In the case K = 1, we have the following result.
such that for j = 1, . . . , N , Proposition 1: If K = 1 and N1 , N2 > 1, then

K
X S0 = S20 = S2 ⊂ S10 ⊂ S1 . (27)
yj = sk zkj−1 , (21)
k=1
K Proof: First, when K = 1 it is evident that S0 = S20 ⊂ S10 .
s0k zk0j−1 ,
X
y N −j+1 = y̆j = (22) We show next
k=1
     

 y 
 
 0 

0
   .. 
  

yielding that 0 .
S1 = S1 ∪  .  : y ∈ C ∪   : y ∈ C ⊃ S10 . (28)
 

  ..  
  0 
 

K K  
  
0 y
  
s0k zk0N −1 zk0?j−1 ,
X X
yj = sk zkj−1 = (23)
k=1 k=1 To do so, we assume without loss of generality that N1 = N2
and Hy is a square matrix (Otherwise, we can consider the
where we denote z ? = z −1 for z ∈ C. Equation (23) says that leading square submatrix of Hy of size min (N1 , N2 ) and
y admits two different decompositions with each consisting of draw the same conclusion.) Let H1 be the Hankel matrix
K poles. Recall N = N1 +N2 −1 ≥ 2K +1, and thus any N × obtained by removing the last row and column of Hy. If
2K Vandermonde matrix with distinct poles has full column rank (H1 ) = 0, which implies H1 = 0, we must have that all
rank. Using this fact, we conclude that the two decompositions but the last entry of y are zero. If rank (H1 ) = rank (Hy) = 1,
in (23) must be identical, implying without loss of generality applying [36, Theorem 3.1], we have y ∈ S10 , or all but the
that for some K0 ≤ K, first entry of y are zero.
Now it suffices to show S2 = S20 , or S2 ⊆ S20 , since it
zk = zk0? , sk = s0k zk0N −1 , k = 1, . . . , K0 , is evident that S20 ⊆ S2 . For any y ∈ S2 , it is evident that
sk = s0k = 0, k = K0 + 1, . . . , K. (24) y ∈ S1 . Then by (28) we must have y ∈ S10 since otherwise
5

 
rank Hy | J1 HyJ2 = 2 that implies that y ∈ / S2 . Conse- The DEMaC problem in (30) is convex and can be written
quently, by the same arguments as those in Subsection III-B, as semidefinite programming (SDP) since the nuclear norm
we have y ∈ S20 , completing the proof.  kXk? of matrix X can be cast as the SDP
It is implied by Proposition 1 that when K = 1 the double 1 1

P XH

Hankel model is equivalent to the original problem (3) but the min tr (P ) + tr (Q) , subject to ≥ 0. (31)
P ,Q 2 2 X Q
Hankel model is not.
where P and Q are Hermitian matrices. Therefore, DEMaC
can be solved using off-the-shelf SDP solvers such as SDPT3
C. Comparison With Previous Models [37].
For the proposed double Hankel model and the previous
PSD-Toeplitz and Hankel models, we present in Table I the B. Theoretical Guarantee With Full Sampling
number of (real-valued) optimization variables in the signal Assume Ω = {1, . . . , N } and there is no noise. Then we
space [see (5), (9) and (11)] and in the parameter space [see have full knowledge of the true signal y o and thus it is trivial
(2), (19) and (26)] respectively. The former quantity measures to solve DEMaC in (30). We next study the condition for exact
complexity of an optimization model, and the latter reflects recovery of the distinct frequencies {fk }. It is worth noting
the model accuracy. that an equivalent problem has been studied in [38], [39] in the
The PSD-Toeplitz model is equivalent to the original prob- context of forward-backward spatial smoothing for coherent
lem (3) and thus includes 3K optimization variables in direction-of-arrival (DOA) estimation, where the frequencies
the parameter space. To achieve such equivalence, 2N new are estimated from a covariance matrix that equals
variables in the Hermitian Toeplitz matrix are introduced,  o H
Hy | J1 Hy o J2 Hy o | J1 Hy o J2

resulting in 4N variables in total. The Hankel model contains (32)
H H
a minimum number 2N of variables in y, while its solution = Hy o · (Hy o ) + J1 Hy o · (Hy o ) J1
space is enlarged and K new variables (the amplitudes of poles using a subspace method such as ESPRIT.
K
{|zk |}k=1 ) are introduced in the parameter space. In contrast Let us recall (15) and assume K ≤ N1 −1. We compute the
to those above, the double Hankel model possesses the same truncated singular value decomposition (SVD) of the double
variables as the Hankel model in the signal space and the Hankel matrix as
same number of variables as the PSD-Toeplitz model in the  o
Hy | J1 Hy o J2 = U ΛV H .

parameter space. (33)
If the double Hankel matrix has the maximal rank K, then
IV. C ONVEX R ELAXATION A PPROACHES A1 in (15) and U share an identical range space that is
known as the signal subspace from which the frequencies
In this section we exploit the double Hankel model (11) and can be uniquely computed using ESPRIT. Now our question
propose convex relaxation approaches for spectral compressed becomes under
  what conditions the double Hankel matrix or
sensing with provable accuracy. It is worth noting that we A2
in (15) has rank K.
do not assume that the number of frequencies K is known. A2 Z 1−N S̃
Instead, certain prior knowledge on the measurement noise  if K ≤ N2
Evidently, , then A2 has full column rank and
will be assumed. Our approaches are mainly inspired by [19] A2
thus rank = K.
where the low rank Hankel model is considered, and they A2 Z 1−N S̃
are named as doubly enhanced matrix completion (DEMaC) If N2 < K ≤ 2N2 and for arbitrary  distinct {fk }, it
A2
following EMaC in [19]. follows from [39] that is rank-deficient only
A2 Z 1−N S̃
if the phases {sgn (sk )} of the spectral components satisfy
A. DEMaC in Absence of Noise certain equations that will be violated with probability one if
{sgn (sk )} are jointly distributed according to some absolutely
We first consider the noiseless case where we have access continuous distribution.
to PΩ (y o ), where Ω ⊆ {1, . . . , N } is the sampling index set To sum up, we conclude the following result.
of size M ≤ N , and PΩ is the projection onto the subspace Theorem 1: Assume Ω = {1, . . . , N } and no noise is
supported on Ω that sets all entries of y out of Ω to zero. By present. Then DEMaC is able to exactly recover the distinct
swapping the objective and the constraint, the double Hankel frequencies {fk } if
model (11) is equivalently written as:
N
K ≤ min (N1 − 1, N2 ) ≤ b c, (34)
min rank Hy | J1 HyJ2 , subject to PΩ (y − y o ) = 0.
 
2
y
(29) or with probability one if
Following the literature on low rank matrix recovery, we 2N
K ≤ min (N1 − 1, 2N2 ) ≤ b c (35)
relax the rank function in (29) to the nuclear norm and obtain 3
  and if the phases {sgn (sk )} of the spectral components are
(DEMaC) min Hy | J1 HyJ2 ?
y
(30) jointly distributed according to some absolutely continuous
subject to PΩ (y − y o ) = 0. distribution on the K-dimensional torus.
6

TABLE I
Number of real-valued optimization variables of three optimization models in signal and parameter spaces. The former quantity measures model complexity
(the less the better), and the latter reflects model accuracy (the ground truth is 3K).

Optimization models #variables in signal space #variables in parameter space

PSD-Toeplitz 4N 3K

Hankel 2N 4K

Double Hankel (proposed) 2N 3K

Note that all equalities in (34)–(35) can be achieved, imply- and thus
ing that up to b N2 c frequencies can be recovered determinis- 1h  i
tically and up to b 2N λmin (G2 ) ≥ λmin (G02 ) + λmin S̃Z 1−N G02 Z N −1 S̃
3 c frequencies almost surely. In contrast 2
to this, no more than b N2 c frequencies can be recovered if 1
the Hankel model is used, as in [19]. Moreover, to make the = [λmin (G02 ) + λmin (G02 )]
2
number of recoverable frequencies as large as possible, we = λmin (G02 ) ,
should choose N1 , N2 such that 2N2 ≥ N1 ≥ N2 according (42)
to (34)–(35), implying
where the first equality holds since both Z and S̃ are unitary
2N N
≥ N1 ≥ . (36) matrices. Consequently, the incoherence property defined here
3 2 with respect to DEMaC is easier to satisfy than that with
This choice also makes sense in the compressive sampling EMaC in [19].
settings as shown later. Let  
The SVD plus ESPRIT method mentioned above can also be N N
cs = max , (43)
used in general settings that we will study next to retrieve the N1 2N2
estimated poles {ẑk } from the solution of the double Hankel that measures how close the double Hankel matrix is to a
matrix once DEMaC is numerically solved. square matrix. The factor 2 in the second denominator in (43)
comes along with the definition of G2 in (38).
C. Theoretical Guarantee With Compressive Sampling We have the following result that is in parallel with [19,
Theorem 1] and shows that, like EMaC, DEMaC perfectly re-
When partial entries of y o are observed in absence of noise, covers the spectrally sparse signal from O K log4 N random

we show in this subsection that DEMaC has similar theoretical samples.
guarantee as EMaC. We first introduce some notations similar Theorem 2: Let y o be given in (1), and Ω the random index
to those in [19]. Recall (15), and let set of size M . Suppose that the incoherence property (39)
  holds and that all samples are noiseless. Then there exists a
A2
Ã2 = (37) universal constant c1 > 0 such that y o is the unique solution to
A2 Z 1−N S̃
DEMaC in (30) with probability exceeding 1−N −2 , provided
and that
1 H 1 M > c1 µ1 cs K log4 N. (44)
G1 = A A 1 , G2 = ÃH Ã2 . (38)
N1 1 2N2 2
 
The double Hankel matrix Hy | J1 HyJ2 is said to obey the The proof of Theorem 2 is similar to that of [19, Theorem
incoherence property with parameter µ1 if 1], and the latter is inspired by [40] in which the general
low rank matrix recovery problem is studied via affine nuclear
1 1
λmin (G1 ) ≥ and λmin (G2 ) ≥ , (39) norm minimization. In particular, as in [19], [40], DEMaC is
µ1 µ1 first rewritten as a nuclear norm minimization problem under
where λmin (·) denotes the minimum eigenvalue of a matrix. affine (and additional) constraints that are introduced to iden-
Note that the incoherence property is defined in [19] based on tify the sampling basis and capture the Hankel structures of
G1 above and the double Hankel matrix to recover. Then, it is shown that the
1 H sampling basis fulfils the incoherence condition with respect
G02 = A A2 . (40) to the tangent space of the true low rank matrix. Finally, a
N2 2
dual certificate is constructed and rigorously verified via the
It is seen from (37) and (38) that golfing scheme [40]. The new challenges in our proof are to
1 identify the sampling basis with the double Hankel model and
G2 = · AH2 A2 + S̃Z
N −1 AH A Z 1−N S̃
2 2 to properly deal with the complex conjugate operator in the
2N2 (41)
1 h i double Hankel matrix that leads to non-affine constraints. The
= G02 + S̃Z 1−N G02 Z N −1 S̃ , detailed proof is complicated and deferred to the Appendix.
2
7

The sample size shown in Theorem 2 is an increasing V. E XTENSION TO D IMENSION T WO AND A BOVE
function of cs defined in (43). Consequently, we should choose
N1 , N2 such that cs is small. Therefore, the choice in (36) is In this section, we extend the low rank double Hankel
also appropriate in this compressive sampling setting. model and the DEMaC approaches from the 1-D to arbitrary-
dimensional spectral compressed sensing. Without loss of
generality and for simplicity of notations, we first consider the
2-D case. Certain notations such as N1 , N2 will be redefined.
D. DEMaC With Bounded Noise
Corresponding to (1), a noiseless 2-D spectrally sparse signal
We now consider the noisy case where the acquired samples is given by a 2-way array (or matrix) y whose entries are
PΩ (ỹ) are given by (1), with kPΩ (e)k2 ≤ η. Consequently, K
we solve the following noisy version of DEMaC: X j1 −1 j2 −1
yj1 ,j2 = sk zk,1 zk,2 , |zk,1 | = |zk,2 | = 1, (49)
  k=1
(Noisy-DEMaC) min Hy | J1 HyJ2 ?
y
(45) where 1 ≤ j1 ≤ N1 , 1 ≤ j2 ≤ N2 , and the total sample size
subject to kPΩ (y − ỹ)k ≤ η. is N = N1 · N2 .
Let yj1 ,: denote the N2 × 1 vector composed of
The following result is a consequence of combining Theo-
yj1 ,1 , . . . , yj1 ,N2 for fixed j1 . As in (6), Hyj1 ,: denotes an
rem 2 and [19, Theorem 2]. Its proof can be derived by slightly
N2,1 × N2,2 Hankel matrix formed by using yj1 ,: where
modifying the proof of [19, Theorem 2] based on technical
N2,1 + N2,2 = N2 + 1. Then we define a 2-level Hankel
results in the proof of Theorem 2 and thus is omitted.
(or Hankel-block-Hankel) matrix Hy as an N1,1 × N1,2
Theorem 3: Suppose ỹ is a noisy copy of y o that satisfies block Hankel matrix formed by Hy1,: , . . . , HyN1 ,: where
kPΩ (y o − ỹ)k ≤ η. Under the assumptions of Theorem 2, N1,1 + N1,2 = N1 + 1, to be specific,
the solution ŷ to Noisy-DEMaC in (45) satisfies
 
Hy1,: Hy2,: . . . HyN1,2 ,:
kHŷ − Hy o kF ≤ 5N 3 η (46)  Hy2,: Hy3,: . . . HyN1,2 +1,: 
Hy =  . (50)
 
.. .. .. ..
with probability exceeding 1 − N −2 .
 . . . . 
HyN1,1 ,: HyN1,1 +1,: . . . HyN1 ,:

For j = 1, 2 and l = 1, 2, we define Ajl as an Njl × K


E. DEMaC With Sparse Noise Vandermonde matrix with [Ajl ]n,k = zk,j n−1
and let Al =
It is an important research topic to deal with outliers in data A1l ? A2l , where ? denotes the Khatri-Rao (or column-wise
samples that can be caused for example by saturation in data Kronecker) product. Consequently, the kth column of Aj
quantization and abnormal sensor behaviors. We assume in this represents a sampled 2-D sinusoid with pole zk = (zk,1 , zk,2 ).
case that a constant portion of the acquired samples PΩ (ỹ) Using these notations, it can be readily verified that (we omit
are outliers that are modelled via additive sparse noise. As the details)
in [19], we modify DEMaC to include the sparse noise by Hy = A1 SAT2 , (51)
solving the following problem:
  which is identical to (7). This implies that the 2-level Hankel
(Robust-DEMaC) min Hy | J1 HyJ2 ? + 2λ kHek1 matrix Hy is low rank.
y,e
For l = 1, 2, let Jl be the N1l N2l × N1l N2l reversal matrix
subject to PΩ (y + e) = PΩ (ỹ) . that can be written as a Kronecker product, Jl = J1l ⊗ J2l ,
(47) where Jjl is the Njl × Njl reversal matrix for j = 1, 2
and l = 1, 2. By applying the following identity for matrices
Note in (47) that kHek1 = kvec (He)k1 is defined as the el-
A, B, C, D of proper dimension
ementwise `1 norm and thus 2 kHek1 = He | J1 HeJ2 1 .
We have the following result by simply combining Theorem (A ⊗ B) (C ? D) = (AC) ? (BD) , (52)
2 and [19, Theorem 3] whose proof is omitted.
Theorem 4: Suppose ỹ is a noisy copy of y o . Let Ω be we have that for l = 1, 2,
the random index set of size M , and conditioning on Ω 
each sample in Ω is corrupted by noise independently with Jl Al = (J1l ⊗ J2l ) A1l ? A2l
conditional probability τ ≤ 0.1. Set λ = √M 1log N . Then there
 
= J1l A1l ? J2l A2l
exists a numerical constant c1 > 0 depending only on τ such    
= A1l Z11−N1l ? A2l Z21−N2l (53)
that if (39) holds and
= (A1l ? A2l ) Z11−N1l Z21−N2l
M > c1 µ21 c2s K 2 log3 N, (48) = Al Z11−N1l Z21−N2l ,
then y o is the unique solution to Robust-DEMaC in (47) with where Zj = diag (z1j , . . . , zkj ) for j = 1, 2, the third equality
probability exceeding 1 − N −2 . follows from (13), and the fourth equality holds since Z1 , Z2
8

are diagonal matrices. It immediately follows from (51) and Algorithm 1 The IHT algorithm with double Hankel model
(53) that Initialize: y0 = ye
for t = 1, ... do
J1 HyJ2 = J1 A1 SAT2 J2 Dt = HD (yt + αt (e
y − yt ))
T
= J1 A1 · S J2 A2 Gt = ΓK (Dt )
(54) yt+1 = HD† (Gt )
= A1 Z11−N11 Z21−N21 SZ21−N22 Z11−N12 AT2
end for
= A1 SZ11−N1 Z21−N2 AT2 ,
where the last equality holds since Z1 , Z2 , S are all diagonal
matrices. Therefore, as in the 1-D case, the new 2-level Hankel 1200
Hankel
matrix J1 HyJ2 admits a similar decomposition as Hy. Double Hankel

As in the 1-D case, by replacing the 2-level 1000 1000 1000


 Hankel matrix 1000
997

Hy by the double 2-level Hankel matrix Hy | J1 HyJ2 ,

Number of successful trials


we are able to propose the double Hankel model and the 800
DEMaC approaches for 2-D spectral compressed sensing.
They generalize the Hankel model and EMaC in [19], [29],
600
[31]. Note that the theoretical results for DEMaC also hold in
the 2-D case as in [19].
The double Hankel model and the DEMaC approach can 400

be generalized to arbitrary dimension. In dimension d ≥ 3,


the noiseless data y are given by a d-way array that can be 200
used to form a d-level Hankel matrix Hy as in (50), to be
specific, a d-level Hankel matrix is a block Hankel matrix with 0
0 0
each block being a (d − 1)-level Hankel matrix. Then we can Noiseless data Noisy data (random freqs.) Noisy data (spaced freqs.)

similarly define Al , l = 1, 2 as the Khatri-Rao product of


d Vandermonde matrices, of which each column represents a
Fig. 1. Histogram of the number of successful trials of IHT with the Hankel
sampled d-D sinusoid, and derive identities (51) and (54). and double Hankel models with SNR = ∞ and 0dB and a total number of
1000 trials.
VI. N UMERICAL R ESULTS
A. Model Efficiency separation N4 (referred to as random and spaced frequencies
In this subsection we validate efficiency of the double respectively), with amplitudes 0.5 + |w| and random phases,
Hankel model proposed in Section III in restricting the spectral where w follows the standard normal distribution. A number
poles on the unit circle. To this end, we present an iterative of N = 65 uniform samples are acquired with random noise
hard thresholding (IHT) algorithm to solve (11) that is mod- added that follows a complex normal distribution. We set
ified from the one rooted on the Hankel model proposed in N1 = N2 = 33 in the Hankel and double Hankel models.
[31]. In particular, we consider the noisy, full sampling case The IHT algorithm is terminated if kyt+1 − yt k2 / kyt k2 <
and use the least square loss function. The double Hankel 10−5 or maximally 3000 iterations are reached. We say that
model that we need to solve is thus given by an algorithm successfully produces poles {ẑk } on the unit
1 circle
PKin a trial if their−4average distance to the unit circle
2 1
min ky − ỹk2 , subject to rank (HD (y)) ≤ K, (55) K k=1 ||ẑk | − 1| < 10 .
y 2
  In our simulation, we consider the noiseless case and the
where HD (y) = Hy | J1 HyJ2 . noisy case with the signal to noise ratio SNR = 0dB and run
Our algorithm is illustrated in Algorithm 1. It starts with 1000 trials for each. The number of successful trials in which
the initialization y0 = ye. Then gradient descent is used to the estimated poles lie on the unit circle is shown in Fig. 1. In
update the signal estimate at each iteration with step size the absence of noise, the true spectral poles can be obtained
αt = √1t . After the double Hankel matrix Dt is formed, with either Hankel or double Hankel model, as predicted in
the hard thresholding operator ΓK is applied to obtain the Theorem 1. In the presence of noise, the estimated spectral
best rank-K approximation, Gt , of Dt by setting all but its poles with the Hankel model lie off the unit circle, which is
largest K singular values to zero. The new estimate yt+1 is consistent with our analysis. In contrast to this, the proposed
obtained as the minimizer to the problem miny kHD y − Gt k2 , double Hankel approach rarely fails. It is seen that when
where HD† denotes the pseudoinverse of the double Hankel the frequencies are properly separated, the IHT with double
operator. Note that as compared to [31] the Hankel operator H Hankel model succeeds in all trials to restrict the estimated
is changed to the double Hankel operator HD in Algorithm 1, poles on the unit circle.
and a descending step size is adopted that empirically ensures To study what happens when the double Hankel model fails
feasibility of the solution once converged. to produce poles on the unit circle, we present in Table II the
In our simulation, we consider K = 3 spectral frequencies estimated poles of the three failures shown in Fig. 1. It is seen
that are randomly generated without or with a minimum that all the failures occur when two frequencies are close to
9

TABLE II
Details of the three failures with the double Hankel model in Fig. 1

Ground truth of poles Estimated poles with the double Hankel model Estimated poles with the Hankel model

ei2π{0.7568,0.5952,0.6019} 1.0000ei2π0.7583 , 1.0155ei2π0.5970 , 1.0155−1 ei2π0.5970 0.9948ei2π0.7583 , 1.0466ei2π0.5951 , 0.9900ei2π0.3786


 

ei2π{0.0733,0.2271,0.2420} 1.0000ei2π0.0730 , 1.0291ei2π0.2331 , 1.0291−1 ei2π0.2331 0.9989ei2π0.0729 , 0.9739ei2π0.2241 , 1.0774ei2π0.2449


 

ei2π{0.5708,0.9943,0.9819} 1.0000ei2π0.2233 , 1.0331ei2π0.9862 , 1.0331−1 ei2π0.9862 0.9501ei2π0.7540 , 0.9604ei2π0.9883 , 1.0352ei2π0.9831


 

ANM
1
We first consider the noiseless case and study the success
12 0.8 rate of DEMaC as compared to ANM and EMaC. In particular,
a number of M = 30 samples are randomly selected from
Sparsity K

0.6
8

0.4 N = 65 regularly spaced samples. The coefficients {sk } and


4
0.2
noise are generated as in the previous simulation. A number
0
of K frequencies are randomly generated such that they are
0.4 0.8 1.2 1.6 2
mutually separated by at least ∆f and two of them are sep-
(a) arated exactly  by ∆f , where we consider K ∈ {1, 2 . . . , 20}
and ∆f ∈ 0, 0.1 N , ..., 2
N . We say that the spectrally sparse
1 1
signal y o (and its frequencies) is successfully recovered by
12 0.8 12 0.8
an algorithm if the normalized mean squared error (NMSE)
kŷ−y o k22
≤ 10−10 . The success rate is calculated by averaging
Sparsity K

Sparsity K

0.6 0.6
8 8
ky o k22
0.4 0.4
over 20 Monte Carlo trials for each combination (K, ∆f ).
4 4
0.2 0.2
Our results are presented in Fig. 2, where white means
0.4 0.8 1.2 1.6 2
0
0.4 0.8 1.2 1.6 2
0 complete success and black means complete failure. Phase
transition behaviors are observed in the sparsity-separation
(b) (c) plane. It is seen that ANM has a resolution of about 1/N ,
1 1
as reported in [5]. EMaC is insensitive to the frequency sepa-
12 0.8 12 0.8
ration, as observed in [19]. Like EMaC, DEMaC is insensitive
to the separation and has an enlarged success phase thanks to
Sparsity K

Sparsity K

0.6 0.6
8 8
the double Hankel model adopted.
0.4 0.4

4 4
We also present in Fig. 2 the results of DEMaC with two
0.2 0.2
other choices of N1 . It is seen that slightly better performances
0.4 0.8 1.2 1.6 2
0
0.4 0.8 1.2 1.6 2
0
can be obtained in this case with N1 ≈ 0.6N and N1 ≈ 2/3N .
We recommend to use N1 ≈ 0.6N in practice.
(d) (e)
We test the performance of DEMaC with bounded noise.
Fig. 2. Sparsity-separation phase transition results of (a) ANM, (b) We
 0.1consider K = 2 frequencies, vary their distance ∆f ∈
EMaC with N1 = b0.5(N + 1)c, and (c)-(e) DEMaC with N1 = , 0.3
, . . . , 1.9
and the noise level η ∈ {0.1, 1, 10}, set
b0.5(N + 1)c , N1 = b0.6(N + 1)c and N1 = b2(N + 1)/3c with N N N
N = 65 and M = 30. White means complete success and black means N1 = 33 for both Noisy-EMaC and Noisy-DEMaC, and keep
complete failure. other settings. We plot the curves of signal recovery error in
Fig. 3(a). It is seen that DEMaC and EMaC are insensitive
to the distance of frequencies and the former consistently
each other with distance less than N1 . Their estimated poles are results in smaller errors than the latter. ANM has the largest
located symmetrically on two sides of the unit circle, which errors with closely located frequencies and light noise. We also
verifies our analysis carried out in Section III. compare the errors of frequency recovery. Since the model
order K might not be correctly determined by using either
approach, to make a fair comparison, we assume K is given
B. Accuracy and Robustness of DEMaC and estimate the frequencies from the signal estimate using
In this subsection, we study numerical performance of the ESPRIT assisted with forward-backward processing. We plot
proposed DEMaC as compared to EMaC [19] and atomic the error curves in Fig. 3(b). It is seen that DEMaC also
norm minimization (ANM) [5]. All three approaches are outperforms EMaC and has the smallest error in the case of
implemented in Matlab with the CVX toolbox and the SDPT3 closely located frequencies.
solver [37], [41]. Note that ANM always produces spectral We now evaluate robustness of DEMaC to additive sparse
poles on the unit circle and EMaC does not. As in the previous noise. We consider the full sampling case and vary the sparsity
simulation, it is observed that DEMaC produces spectral poles level K and the number of corrupted samples. The success
on the unit circle in almost all trials due to the double Hankel rates of Robust-EMaC and Robust-DEMaC are presented in
model adopted. We next focus on the resolution and accuracy Fig.4. It is seen that DEMaC has an enlarged success phase
of the three approaches. than EMaC and thus a better performance.
10

N1,2 = N2,1 = N2,2 = 6 for both EMaC and DEMaC. We


10 0
plot in Fig. 5 the curves of signal recovery error averaged
over 20 Monte Carlo trials. Again, it is seen that DEMaC
consistently outperforms EMaC. We also compute the average
10 -1
distance fromqthe estimated signal poles {(ẑk,1 , ẑk,2 )} to the
1
PK P2 2
2-D torus as K l=1 (|ẑk,l | − 1) for each trial. The
Normalized mean squared error

k=1
10 -2
histogram plots for EMaC and DEMaC are presented in Fig. 6.
It is seen that the estimated spectral poles of DEMaC are
pushed to the 2-D torus (within numerical precision) in most
10 -3
trials due to the double Hankel model adopted.

10 -4
100
EMaC
DEMaC

10 -5
0.1 0.5 0.9 1.3 1.7

10-1

Normalized mean squared error


(a) Signal recovery error

10 0 10-2

10 -1

10 -2
-3
10

-3
10
Mean squared error

10 -4

10-4
20 36 52 68 84 100 116
10 -5
Sample size M

10 -6

Fig. 5. Results of signal recovery errors of EMaC and DEMaC in the 2-D
10 -7
case with compressive samples and random frequencies.
10 -8

10 -9
0.1 0.5 0.9 1.3 1.7

(b) Frequency recovery error

Fig. 3. Results of signal and frequency recovery errors using DEMaC, EMaC
and ANM with varying frequency separation and noise levels.

1 1

17 17
0.8 0.8
Sparsity K

Sparsity K

13 13
0.6 0.6

9 9
0.4 0.4

5 5
0.2 0.2

1 1
0 0
1 9 17 25 33 1 9 17 25 33
Number of sparse noise Number of sparse noise

(a) EMaC (b) DEMaC

Fig. 4. Results of success rates of (a) EMaC and (b) DEMaC with varying
sparsity K and varying number of sparse noise when N = 65 regularly
spaced samples are acquired.

Fig. 6. Histogram of distance of the estimated poles to the 2-D torus using
C. The 2-D Case EMaC and DEMaC in the 2-D case with compressive samples and random
frequencies.
We present numerical results in the 2-D case in this sub-
section. We consider recovery of 2-D spectrally sparse signals
consisting of K = 3 spectral components, whose frequencies
are randomly generated, from a subset of N1 × N2 = 11 × 11 VII. C ONCLUSION
regularly spaced noisy samples. We fix the noise level η = 1, In this paper, a low rank double Hankel model was proposed
vary the sample size M ∈ {20, 24, . . . , 120} and let N1,1 = for arbitrary-dimensional spectral compressed sensing that
11

is capable of pushing the spectral poles to the unit circle We also define the projection EΩ0 that takes sum only over
and resolves a fundamental limitation of the previous low non-repetitive elements in Ω, and its complement operator
rank Hankel model. By applying the new model, the convex EΩ0 ⊥ = E − EΩ0 .
relaxation DEMaC approaches were proposed that have prov- Using these notations, the Hankel structures in Y are
able accuracy and are shown theoretically and numerically to captured by E ⊥ (Y ) = 0 and the samples of Y are given
by EΩ0 (Y ) = EΩ0 (Y o ), where Y o = Hy o | J1 Hy o J2 . Note

outperform EMaC rooted on the Hankel model.
By modifying the IHT algorithm in [31], we have presented however that the complex conjugate relationship between the
in this paper a nonconvex algorithm for spectral compressed two Hankel matrices in Y is not algebraic and cannot be
sensing with the double Hankel model. A future research represented by an affine constraint. To include this constraint,
direction is to analyze its theoretical performance such as we further define the set
global convergence and accuracy. In fact, it is expected that the
C = X | J1 XJ2 : X ∈ CN1 ×N2
 
double Hankel model can be incorporated into any approach (56)
rooted on the Hankel model and brings improvement for and impose Y ∈ C. Now DEMaC is readily rewritten as
recovery of undamped spectrally sparse signals and their
spectral poles. min kY k?
Y ∈C
Spectral poles that are located symmetrically on two sides subject to EΩ0 (Y ) = EΩ0 (Y o ) , (57)
of the unit circle may still appear by using the double Hankel ⊥ ⊥ o
model. Another future research direction is to study new E (Y ) = E (Y ) = 0.
models that exclude such exceptions and meanwhile preserve Note that (57) is not a common affine constrained nuclear
low model complexity. norm minimization problem due to the restriction onto the
feasible set C.
A PPENDIX
We prove Theorem 2 in this appendix. Our proof is similar B. Dual Certificate
to that of [19, Theorem 1] and both are inspired by [40]. We
Recall the SVD of Y o in (33) and denote by T the tangent
will follow the same steps as in [19], [40]: 1) identify the
space with respect to Y o . Let PU (resp. PV , PT ) be the
sampling basis with respect to the low rank matrix that needs
orthogonal projection onto the row (resp. column, tangent)
to recover and reformulate DEMaC as affine nuclear norm
space of Y o , yielding for any matrix Y that
minimization (note that not all constraints are affine in our
case), 2) identify a dual certificate that validates optimality PU (Y ) = U U H Y , (58)
of a solution, 3) show that the sampling basis is incoherent PV (Y ) = Y V V H
, (59)
with the tangent space of the ground truth low rank matrix,
and 4) construct a dual certificate that validates optimality and PT = PU + PV − PU PV . (60)
of the ground truth as a solution to the affine nuclear norm We denote by PT⊥ = I − PT the orthogonal complement of
minimization. PT , where I is the identity operator.
We provide the dual certificate for DEMaC in the following
A. Problem Reformulation lemma that is in parallel with [19, Lemma 1]. The proof is
As in [19], we try to find the sampling basis and rewrite almost unaltered and thus is omitted.
DEMaC as an affine constrained nuclear norm minimiza- Lemma 1: Consider a multi-set Ω that contains M random
tion problem with respect to the low rank matrix Y = indices. Suppose that the sampling operator EΩ obeys
 
Hy | J1 HyJ2 . Note that each sample yjo , j ∈ Ω corresponds N 1
to one off-diagonal of each Hankel matrix. Consequently, PT EPT − PT EΩ PT ≤ . (61)
M 2
for every sampling index n = 1, . . . , N , let En be the
(normalized) nth N1 × N2 elementary Hankel matrix with If there exists matrix W satisfying
En (j, l) = √1ωn if j + l = n + 1 and zeros elsewhere, where EΩ0 ⊥ (W ) = 0, (62)
ωn equals the number of nonzero entries in En . Denote by 1
(1) (2)
PT W − U V H F ≤

En (resp. En ) the projection onto the subspace spanned by , (63)
(1) (2) 2N 2
En = [En | 0]N1 ×2N2 (resp. En = [0 | EN −n+1 ]N1 ×2N2 ). 1
(1) (2) PT⊥ (W ) ≤ , (64)
Then, En = En + En is the nth sampling operator PN with 2
respect to Y . We also define the projection E = n=1 En then Y o is the unique solution to (30) or, equivalently, y o is
and its orthogonal complement E ⊥ = I − E. the unique minimizer of DEMaC.
For ease of analysis, as in [19], [40], we assume that the
sampling indices in Ω are i.i.d. generated so that Ω is a multi-
set that possibly contains repeated elements. Note that the C. Proof of Incoherence Condition (61)
derived low bounds on the sample size also apply in the To show the incoherence condition (61), we first bound
(j)
case when Ω is selected uniformly at random from those the projection of each En onto the tangent space T in the
with distinct elements according to discussions in [40].
P Con- following lemma that is in parallel with [19, Lemma 2]. The
sequently, the sampling operator is given by EΩ = n∈Ω En . proof is similar and will be omitted.
12

Lemma 2: Given (39), we have 2) For  all i  = 1, . . . , j0 , let F =


  µ c K
1 s
  µ c K
1 s
PT E − 1q EΩi PT (Fi−1 );
PU En(j) ≤ , PV En(j) ≤ (65) Pj0  1 
N N ⊥
3) Set W = i=1 q EΩi + E (Fi−1 ).
for all n = 1, . . . , N and j = 1, 2. For any n1 , n2 = 1, . . . , N The remaining task is to show that W satisfies (62)–(64)
and any j1 , j2 = 1, 2, we have in Lemma 1. In fact, (62) and (63) can be shown by the same
arguments as in [19]. The proof of (64) is also similar if we
r
D  E ωn2 2µ1 cs K
En(j11 ) , PT En(j22 ) ≤ . (66) change the definitions of the norms k·kE,∞ , k·kE,2 introduced
ωn1 N
in [19] as:
D E
The incoherence condition (61) is established in the follow- X2 (j)
En , Y
ing lemma that is in parallel with [19, Lemma 3]. kY kE,∞ = max √ , (75)
Lemma 3: For any constant 0 <  ≤ 12 , we have n
j=1
ωn
v E2
N u D
uN 2 (j)
PT EPT − PT EΩ PT ≤  (67) u X X En , Y
M kY kE,2 = t √ . (76)
ωn
with probablity exceeding 1 − N −4 , provided that M > n=1 j=1

c1 µ1 cs K log N for some universal constant c1 > 0. Based on (75) and (76), we can prove analogous versions of
Proof: The proof is similar to that of [19, Lemma 3] and Lemmas 4–7 in [19] with minor modifications that together
we mainly highlight a few differences. We define a family of result in (64). We will omit the details.
operators It is worth noting that the constraint Y ∈ C is not utilized in
N 1 our proof (so that the number of free variables is doubled) and
Zn = PT En PT − PT EPT (68)
M M the resulting universal constant c1 is doubled as compared to
for n = 1, . . . , N . It is seen that that in [19]. After all, the derived sample size shares the same
N X order-wise complexity as in [19] that is of essential importance
PT EPT − PT EΩ PT = − Zn . (69) in such big-data analysis. It is interesting to investigate in
M
n∈Ω future studies how to use the constraint Y ∈ C to further
As in [19], we can compute reduce the sample size.
  2
2µ1 cs K
PT En(j) PT ≤ PT En(j) ≤ , (70) R EFERENCES
FN [1] Z. Yang and X. Wu, “Forward-backward Hankel matrix fitting for spec-
where the second inequality follows from Lemma 2. Thus, tral super-resolution,” in 28th European Signal Processing Conference
(EUSIPCO), 2020.
2 [2] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles:
X 4µ1 cs K
kPT En PT k ≤ PT En(j) PT ≤ . (71) Exact signal reconstruction from highly incomplete frequency informa-
j=1
N tion,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp.
489–509, 2006.
For any n ∈ Ω that is uniformly drawn from {1, . . . , N }, [3] M. F. Duarte and R. G. Baraniuk, “Spectral compressive sensing,”
Applied and Computational Harmonic Analysis, vol. 35, no. 1, pp. 111–
by derivations similar to those in [19], we have 129, 2013.
[4] P. Stoica and R. L. Moses, Spectral analysis of signals. Upper Saddle
E [Zn ] = 0, (72) River, NJ, US: Pearson/Prentice Hall, 2005.
N 8µ1 cs K [5] G. Tang, B. N. Bhaskar, P. Shah, and B. Recht, “Compressed sensing
kZn k ≤ 2 max kPT En PT k ≤ , (73) off the grid,” IEEE Transactions on Information Theory, vol. 59, no. 11,
n M M pp. 7465–7490, 2013.
X N 1 8µ1 cs K [6] E. J. Candès and C. Fernandez-Granda, “Towards a mathematical theory
E Zn2 ≤
 
kPT En PT k + ≤ . (74)
M M M of super-resolution,” Communications on Pure and Applied Mathemat-
n∈Ω ics, vol. 67, no. 6, pp. 906–956, 2014.
The conclusion is finally drawn by also applying the Bernstein [7] ——, “Super-resolution from noisy data,” Journal of Fourier Analysis
and Applications, vol. 19, no. 6, pp. 1229–1254, 2013.
inequality [42, Theorem 1.6] (see also [19, Lemma 11]). [8] J. Dongarra and F. Sullivan, “Guest editors introduction: The top 10
It is worth noting that the upper bounds in (73) and (74) algorithms,” Computing in Science & Engineering, vol. 2, no. 1, pp.
are amplified by a scaling factor 2 as compared to those in 22–23, 2000.
[9] R. Schmidt, “A signal subspace approach to multiple emitter location
[19]. Consequently, the universal constant c1 is doubled.  spectral estimation,” Ph.D. dissertation, Stanford University, 1981.
[10] R. Roy, A. Paulraj, and T. Kailath, “ESPRIT–A subspace rotation
approach to estimation of parameters of cisoids in noise,” IEEE Trans-
D. Construction of Dual Certificate W actions on Acoustics, Speech, and Signal Processing, vol. 34, no. 5, pp.
1340–1342, 1986.
We construct a dual certificate W using the golfing scheme [11] A. Fannjiang, “Compressive spectral estimation with single-snapshot
introduced in [40] as in [19]. In particular, suppose that Ω = ESPRIT: Stability and resolution,” arXiv preprint arXiv:1607.01827,
∪ji=1
0
Ωi where Ωi ’s are independently generated multi-sets, 2016.
[12] W. Liao and A. Fannjiang, “MUSIC for single-snapshot spectral es-
each containing M 1
j0 i.i.d. samples. Let  < e , j0 = 5 log  N
1
timation: Stability and super-resolution,” Applied and Computational
M
and q = N j0 . The dual certificate W is constructed in the Harmonic Analysis, vol. 40, no. 1, pp. 33–67, 2016.
following three steps: [13] Z. Yang, J. Li, P. Stoica, and L. Xie, “Sparse methods for direction-of-
arrival estimation,” Academic Press Library in Signal Processing Volume
1) Set F0 = U V H ; 7 (R. Chellappa and S. Theodoridis, Eds.), pp. 509–581, 2018.
13

[14] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from [36] R. L. Ellis and D. C. Lay, “Factorization of finite rank Hankel and
limited data using FOCUSS: A re-weighted minimum norm algorithm,” Toeplitz matrices,” Linear Algebra and its Applications, vol. 173, pp.
IEEE Transactions on Signal Processing, vol. 45, no. 3, pp. 600–616, 19–38, 1992.
1997. [37] K.-C. Toh, M. J. Todd, and R. H. Tütüncü, “SDPT3–a MATLAB soft-
[15] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, “The con- ware package for semidefinite programming, version 1.3,” Optimization
vex geometry of linear inverse problems,” Foundations of Computational Methods and Software, vol. 11, no. 1-4, pp. 545–581, 1999.
Mathematics, vol. 12, no. 6, pp. 805–849, 2012. [38] S. U. Pillai and B. H. Kwon, “Forward/backward spatial smoothing
[16] B. N. Bhaskar, G. Tang, and B. Recht, “Atomic norm denoising with techniques for coherent signal identification,” IEEE Transactions on
applications to line spectral estimation,” IEEE Transactions on Signal Acoustics, Speech, and Signal Processing, vol. 37, no. 1, pp. 8–15, 1989.
Processing, vol. 61, no. 23, pp. 5987–5999, 2013. [39] Y.-H. Choi, “On conditions for the rank restoration in forward/backward
[17] J.-M. Azais, Y. De Castro, and F. Gamboa, “Spike detection from spatial smoothing,” IEEE Transactions on Signal Processing, vol. 50,
inaccurate samplings,” Applied and Computational Harmonic Analysis, no. 11, pp. 2900–2901, 2002.
vol. 38, no. 2, pp. 177–195, 2015. [40] D. Gross, “Recovering low-rank matrices from few coefficients in any
[18] Z. Yang, L. Xie, and P. Stoica, “Vandermonde decomposition of mul- basis,” IEEE Transactions on Information Theory, vol. 57, no. 3, pp.
tilevel Toeplitz matrices with application to multidimensional super- 1548–1566, 2011.
resolution,” IEEE Transactions on Information Theory, vol. 62, no. 6, [41] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex
pp. 3685–3701, 2016. programming,” Available online at http://cvxr.com/cvx, 2008.
[19] Y. Chen and Y. Chi, “Robust spectral compressed sensing via structured [42] J. A. Tropp, “User-friendly tail bounds for sums of random matrices,”
matrix completion,” IEEE Transactions on Information Theory, vol. 60, Foundations of Computational Mathematics, vol. 12, no. 4, pp. 389–434,
no. 10, pp. 6576–6601, 2014. 2012.
[20] Z. Yang and L. Xie, “On gridless sparse methods for line spectral
estimation from complete and incomplete data,” IEEE Transactions on
Signal Processing, vol. 63, no. 12, pp. 3139–3153, 2015.
[21] X. Wu, W.-P. Zhu, and J. Yan, “A Toeplitz covariance matrix reconstruc-
tion approach for direction-of-arrival estimation,” IEEE Transactions on
Vehicular Technology, vol. 66, no. 9, pp. 8223–8237, 2017.
[22] C. Zhou, Y. Gu, X. Fan, Z. Shi, G. Mao, and Y. D. Zhang, “Direction-
of-arrival estimation for coprime array via virtual array interpolation,”
IEEE Transactions on Signal Processing, vol. 66, no. 22, pp. 5956–5971,
2018.
[23] M. A. Davenport and J. Romberg, “An overview of low-rank matrix
recovery from incomplete observations,” IEEE Journal of Selected
Topics in Signal Processing, vol. 10, no. 4, pp. 608–622, 2016.
[24] J. E. Evans, J. R. Johnson, and D. Sun, “Application of advanced signal
processing techniques to angle of arrival estimation in ATC navigation
and surveillance systems,” Lincoln Laboratory, Tech. Rep., 1982.
[25] R. T. Williams, S. Prasad, A. K. Mahalanabis, and L. H. Sibul,
“An improved spatial smoothing technique for bearing estimation in a
multipath environment,” IEEE Transactions on Acoustics, Speech, and
Signal Processing, vol. 36, no. 4, pp. 425–432, 1988.
[26] H. Wang and K. R. Liu, “2-D spatial smoothing for multipath coherent
signal separation,” IEEE Transactions on Aerospace and Electronic
Systems, vol. 34, no. 2, pp. 391–405, 1998.
[27] M. Haardt and J. A. Nossek, “Unitary ESPRIT: How to obtain increased
estimation accuracy with a reduced computational burden,” IEEE Trans-
actions on Signal Processing, vol. 43, no. 5, pp. 1232–1242, 1995.
[28] A. Barabell, “Improving the resolution performance of eigenstructure-
based direction-finding algorithms,” in IEEE International Conference
on Acoustics, Speech, and Signal Processing (ICASSP), vol. 8, 1983,
pp. 336–339.
[29] F. Andersson, M. Carlsson, J.-Y. Tourneret, and H. Wendt, “A new
frequency estimation method for equally and unequally spaced data,”
IEEE Transactions on Signal Processing, vol. 62, no. 21, pp. 5761–
5774, 2014.
[30] M. Cho, J.-F. Cai, S. Liu, Y. C. Eldar, and W. Xu, “Fast alternating
projected gradient descent algorithms for recovering spectrally sparse
signals,” in 2016 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), 2016, pp. 4638–4642.
[31] J.-F. Cai, T. Wang, and K. Wei, “Fast and provable algorithms for
spectrally sparse signal reconstruction via low-rank Hankel matrix
completion,” Applied and Computational Harmonic Analysis, vol. 46,
no. 1, pp. 94–121, 2019.
[32] Y. Hua and T. K. Sarkar, “Matrix pencil method for estimating pa-
rameters of exponentially damped/undamped sinusoids in noise,” IEEE
Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 5,
pp. 814–824, 1990.
[33] Y. Hua, “Estimating two-dimensional frequencies by matrix enhance-
ment and matrix pencil,” IEEE Transactions on Signal Processing,
vol. 40, no. 9, pp. 2267–2280, 1992.
[34] J.-F. Cai, X. Qu, W. Xu, and G.-B. Ye, “Robust recovery of complex
exponential signals from random Gaussian projections via low rank
Hankel matrix reconstruction,” Applied and Computational Harmonic
Analysis, 2016.
[35] R. Rochberg, “Toeplitz and Hankel operators on the Paley-Wiener
space,” Integral Equations and Operator Theory, vol. 10, no. 2, pp.
187–235, 1987.

You might also like