Low-Rank Positive Semidefinite Matrix Recovery from Corrupted Rank-One Measurements

Yuanxin Li, Student Member, IEEE, Yue Sun, and Yuejie Chi, Member, IEEE

arXiv:1602.02737v2 [cs.IT] 31 Aug 2016

Abstract—We study the problem of estimating a low-rank positive semidefinite (PSD) matrix from a set of rank-one measurements using sensing vectors composed of i.i.d. standard Gaussian entries, which are possibly corrupted by arbitrary outliers. This problem arises in applications such as phase retrieval, covariance sketching, quantum state tomography, and power spectrum estimation. We first propose a convex optimization algorithm that seeks the PSD matrix with the minimum ℓ_1-norm of the observation residual. The advantage of our algorithm is that it is parameter-free, which eliminates the need for tuning and allows easy implementation. We establish that, with high probability, a low-rank PSD matrix can be exactly recovered as soon as the number of measurements is large enough, even when a fraction of the measurements are corrupted by outliers with arbitrary magnitudes. Moreover, the recovery is also stable against bounded noise. With the additional information of an upper bound on the rank of the PSD matrix, we propose another non-convex algorithm based on subgradient descent that demonstrates excellent empirical performance in terms of computational efficiency and accuracy.

Index Terms—rank-one measurements, low-rank PSD matrix estimation, outliers

Y. Li and Y. Chi are with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA (e-mails: {li.3822, chi.97}@osu.edu). Y. Sun is with the Department of Electronics Engineering, Tsinghua University, Beijing, China. Part of the work was done while Y. Sun was visiting The Ohio State University. This work is supported in part by NSF under grants CCF-1422966 and ECCS-1462191, and by AFOSR under grant FA9550-15-1-0205. Corresponding e-mail: chi.97@osu.edu. Date: April 4, 2018. Preliminary results of this paper were presented in part at the IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, March 2016.

I. INTRODUCTION

In many emerging applications of science and engineering, we are interested in estimating a low-rank positive semidefinite (PSD) matrix X_0 ∈ R^{n×n} from a set of nonnegative magnitude measurements:

    z_i = ⟨Z_i, X_0⟩ = ⟨a_i a_i^T, X_0⟩ = a_i^T X_0 a_i,    (1)

for i = 1, ..., m, where ⟨·,·⟩ denotes the inner product operator. The measurement z_i is quadratic in the sensing vector a_i ∈ R^n, but linear in X_0, where the sensing matrix Z_i = a_i a_i^T is rank-one. On one hand, such magnitude measurements could arise due to physical limitations, e.g. the incapability of capturing phases, such as in phase retrieval and optical imaging from intensity measurements [1]–[6], where only the squared intensity of linear measurements of a signal x_0 ∈ R^n is recorded:

    z_i = |⟨a_i, x_0⟩|^2 = a_i^T x_0 x_0^T a_i = a_i^T X_0 a_i,    (2)

where X_0 = x_0 x_0^T is a lifted rank-one matrix from the signal x_0 of interest. On the other hand, they could arise by design, such as from the covariance sketching scheme considered in [7], where z_i is aggregated from squared intensity measurements of L data samples of a zero-mean ergodic data stream {x_l}_{l=1}^∞ as

    z_i = (1/L) Σ_{l=1}^L |⟨a_i, x_l⟩|^2 = a_i^T ( (1/L) Σ_{l=1}^L x_l x_l^T ) a_i ≈ a_i^T X_0 a_i.    (3)

Here, X_0 = E[x_l x_l^T] corresponds to the covariance matrix of the data when L is sufficiently large, and the goal of covariance sketching is to recover the covariance matrix X_0 from the set of measurements {z_i}_{i=1}^m. In many applications such as array signal processing [8] and network traffic monitoring [9], the covariance matrix of the data can be well approximated by a low-rank PSD matrix, as most of its variance can be explained by the few top principal components. Last but not least, measurements of low-rank PSD matrices in the form of (1) also occur in a number of applications such as quantum state tomography [10], compressive power spectrum estimation [11], non-coherent direction-of-arrival estimation from magnitude measurements [12], synthetic aperture radar imaging [13], and so on.

It is natural to ask if it is possible to recover the low-rank PSD matrix X_0 in (1) from an information-theoretically optimal number of measurements in a computationally efficient manner. A popular approach is based on convex relaxation [7], which seeks the PSD matrix with the smallest trace norm while satisfying the observation constraint. It is shown in [7] that this algorithm exactly recovers all rank-r PSD matrices as soon as the number of measurements exceeds the order of nr in the absence of noise, and the recovery is stable against bounded noise as well.

A. Our Goal and Contributions

In this paper, we focus on robust recovery of the low-rank PSD matrix when the measurements in (1) are further corrupted by outliers, possibly adversarial with arbitrary amplitudes. In signal processing applications, outliers are somewhat inevitable; they may be caused by sensor failures, malicious attacks, or reading errors. In the application of covariance sketching, as in (3), a sufficient aggregation length L is necessary in order for each measurement z_i to be well approximated by (1). Measurements which are not aggregated from a large enough L may be regarded as outliers. Therefore, it becomes critical to address robust recovery of X_0 in the presence of outliers. Fortunately, it is reasonable to assume that the number of outliers is usually much smaller than the total number of measurements, making it possible to leverage the sparsity of the outliers to faithfully recover the low-rank PSD matrix of interest.

We first propose a convex optimization algorithm that seeks the PSD matrix minimizing the ℓ_1-norm of the measurement residual, where the ℓ_1-norm is adopted to promote outlier sparsity. The proposed convex program is free of tuning parameters and eliminates the need for trace minimization, a popular convex surrogate for low-rank matrix recovery, by only enforcing the PSD constraint. Neither does it require knowledge of the outliers, or even of their existence. When the sensing vectors are composed of i.i.d. standard Gaussian entries, we establish that, for a fixed n × n rank-r PSD matrix, as long as the number of measurements exceeds the order of nr^2, the proposed convex program exactly recovers it with high probability, even when a fraction on the order of 1/r of the measurements are arbitrarily corrupted. Our measurement complexity is order-wise near-optimal up to a factor of r, and is near-optimal in the rank-one case up to a constant factor. Furthermore, the recovery is also stable against additive bounded noise. While the proposed convex program coincides with a version of the PhaseLift algorithm [14]–[16] studied in the literature for phase retrieval, our work provides its first theoretical performance guarantee for recovering low-rank PSD matrices in the presence of arbitrary outliers. Moreover, we show the proposed approach can be easily extended to recover low-rank Toeplitz PSD matrices via numerical simulations.

To further reduce the computational burden when facing large-scale problems, we next develop a non-convex algorithm based on subgradient descent when the rank of the PSD matrix, or an upper bound on it, is known a priori. Since any rank-r PSD matrix can be uniquely decomposed as X_0 = U_0 U_0^T, where U_0 ∈ R^{n×r} up to some orthonormal transformation, it is sufficient to recover U_0 without constructing the PSD matrix explicitly. The subgradient descent algorithm then iteratively updates the estimate by descending along the subgradient of the ℓ_1-norm of the measurement residual, using a properly selected step size and a spectral initialization. We conduct extensive numerical experiments to demonstrate its excellent empirical performance, and compare it against the convex program proposed above as well as other alternative approaches in the literature.

B. Organization

The rest of the paper is organized as follows. Section II presents the proposed convex optimization algorithm and its corresponding performance guarantee, where detailed comparisons to related work are presented. Section III describes the proposed non-convex subgradient descent algorithm, which is computationally efficient with excellent empirical performance. Numerical examples are provided in Section IV. The proof of the main theorem is given in Section V. Finally, we conclude in Section VI.

II. PARAMETER-FREE CONVEX RELAXATION

A. Problem Formulation

Let X_0 ∈ R^{n×n} be a rank-r PSD matrix. The set of m measurements, which may be corrupted by either arbitrary outliers or bounded noise, can be represented as

    z = A(X_0) + β + w,    (4)

where z, β, w ∈ R^m. The linear mapping A: R^{n×n} → R^m is defined as A(X_0) = {a_i^T X_0 a_i}_{i=1}^m, where a_i ∈ R^n is the i-th sensing vector composed of i.i.d. standard Gaussian entries, i = 1, ..., m. The vector β denotes the outlier vector, which is assumed to be sparse, with entries that can be arbitrarily large. The fraction of nonzero entries is defined as s := ‖β‖_0/m. Moreover, the vector w denotes the additive noise, which is assumed bounded as ‖w‖_1 ≤ ε. Our goal is to robustly recover X_0 from the measurements z.

B. Recovery via Convex Relaxation

To motivate our algorithm, consider the case when only the outlier vector β is present in (4) and the rank of X_0 is known. One may seek a rank-r PSD matrix that minimizes the cardinality of the measurement residual to promote outlier sparsity:

    X̂ = argmin_{X⪰0} ‖z − A(X)‖_0    s.t. rank(X) = r.    (5)

However, both the cardinality minimization and the rank constraint are NP-hard in general, making this method computationally infeasible. A common approach is to resort to convex relaxation, where we replace the cardinality objective by its convex relaxation, i.e. the ℓ_1-norm, and meanwhile drop the rank constraint, yielding:

    (Robust-PhaseLift)    X̂ = argmin_{X⪰0} ‖z − A(X)‖_1.    (6)

We denote the above convex program as the Robust-PhaseLift algorithm, since it coincides with the PhaseLift algorithm studied in [14]–[16] for phase retrieval.^1 The advantage of Robust-PhaseLift in (6) is that it does not require any prior knowledge of the noise bound, the rank of X_0, or the sparsity level of the outliers, and it is free of any regularization parameter. It is also worth emphasizing that, due to the special rank-one measurement operator, in (6) it is possible to only honor the PSD constraint and not promote the low-rank structure explicitly via, for example, trace minimization.^2

Encouragingly, we demonstrate in Theorem 1 that the algorithm (6) admits robust recovery of a rank-r PSD matrix as soon as the number of measurements is large enough, even with a fraction of arbitrary outliers. To the best of our knowledge, this is the first theoretical performance guarantee on the robustness of (6) with respect to arbitrary outliers in the low-rank setting. Our main theorem is given below.

^1 Note that there are a few different versions of PhaseLift in the literature which are not outlier-robust; we therefore rename (6) to Robust-PhaseLift for emphasis.
^2 The interested reader is invited to look up Fig. 1 in [2] for an intuitive geometric interpretation in the noise-free and outlier-free case.
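To make the measurement model concrete, the following self-contained numpy sketch (our own illustration under the paper's assumptions, not the authors' code; all variable names are ours) simulates one covariance-sketching measurement as in (3) and then forms the corrupted observations z = A(X_0) + β + w of (4), with sparse outliers of large magnitude and bounded noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m, L = 8, 2, 100, 50000

# Rank-r PSD ground truth X0 = U0 U0^T.
U0 = rng.standard_normal((n, r))
X0 = U0 @ U0.T

# One covariance-sketching measurement, cf. (3): average the squared
# intensities of L samples x_l drawn with covariance X0.
a1 = rng.standard_normal(n)
x = rng.standard_normal((L, r)) @ U0.T          # x_l ~ N(0, X0)
z1 = np.mean((x @ a1) ** 2)                     # (1/L) sum_l |<a1, x_l>|^2
assert abs(z1 - a1 @ X0 @ a1) / (a1 @ X0 @ a1) < 0.05  # z1 ≈ a1^T X0 a1

# Corrupted rank-one measurements, cf. (4): z = A(X0) + beta + w.
A = rng.standard_normal((m, n))                 # rows are sensing vectors a_i
z_clean = np.einsum('ij,jk,ik->i', A, X0, A)    # a_i^T X0 a_i for each i
beta = np.zeros(m)                              # sparse outliers, arbitrary size
supp = rng.choice(m, size=5, replace=False)
beta[supp] = 100.0 * rng.standard_normal(5)
w = rng.uniform(-1.0 / m, 1.0 / m, size=m)      # bounded noise, ||w||_1 <= 1
z = z_clean + beta + w
```

The aggregation length L controls how well z1 approximates a_1^T X_0 a_1, which is exactly the reason under-aggregated measurements behave like outliers.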
Theorem 1. Suppose that ‖w‖_1 ≤ ε and s = ‖β‖_0/m. Assume the support of β is selected uniformly at random, with the signs of its nonzero entries generated from the Rademacher distribution as P{sgn(β_i) = −1} = P{sgn(β_i) = 1} = 1/2 for each i ∈ supp(β). Then for a fixed rank-r PSD matrix X_0 ∈ R^{n×n}, there exist some absolute constants c_1 > 0 and 0 < s_0 < 1 such that as long as

    m ≥ c_1 n r^2,    s ≤ s_0 / r,

the solution to (6) satisfies

    ‖X̂ − X_0‖_F ≤ c_2 (r ε / m),

with probability exceeding 1 − exp(−γ m / r^2) for some constants c_2 and γ.

Theorem 1 has the following consequences.
• Exact Recovery with Outliers: When ε = 0, Theorem 1 suggests the recovery is exact using Robust-PhaseLift (6), i.e. X̂ = X_0, even when a fraction of measurements are arbitrarily corrupted, as long as the number of measurements m is on the order of nr^2. Given there are at least nr unknowns, our measurement complexity is near-optimal up to a factor of r.
• Stable Recovery with Bounded Noise: In the presence of bounded noise, Theorem 1 suggests that the recovery performance degrades gracefully as ε increases, where the Frobenius norm of the reconstruction error is proportional to the per-entry noise level of the measurements.
• Phase Retrieval: When r = 1, the problem degenerates to the case of phase retrieval, and Theorem 1 recovers existing results in [16] for outlier-robust phase retrieval, where the measurement complexity is on the order of n, which is optimal up to a scaling factor.

Let us denote X̂_r = argmin_{rank(Z)=r, Z⪰0} ‖X̂ − Z‖_F, the best rank-r PSD approximation of X̂, the solution to (6). Then Theorem 1 suggests that the estimate X̂ can be well approximated by a rank-r PSD matrix, since

    ‖X̂ − X̂_r‖_F ≤ ‖X̂ − X_0‖_F ≤ c_2 (r ε / m),

as long as the number of measurements is sufficiently large. Furthermore, we have

    ‖X̂_r − X_0‖_F ≤ ‖X̂_r − X̂‖_F + ‖X̂ − X_0‖_F ≤ 2 ‖X̂ − X_0‖_F ≤ 2 c_2 (r ε / m),

indicating that X̂_r provides an accurate estimate of X_0 that is exactly rank-r and PSD.

C. Comparisons to Related Work

In the absence of outliers, the PhaseLift algorithm in the following form

    min_{X⪰0} Tr(X)    s.t. ‖z − A(X)‖_1 ≤ ε,    (7)

where Tr(X) denotes the trace of X, has been proposed to solve the phase retrieval problem [2], [3], [14]. Later, the same algorithm was employed to recover low-rank PSD matrices in [7], where an order of nr measurements obtained from i.i.d. sub-Gaussian sensing vectors is shown to guarantee exact recovery in the noise-free case and stable recovery with bounded noise. One problem with the algorithm (7) is that the noise bound ε is assumed known. Furthermore, it is not amenable to handling outliers, since ‖z − A(X_0)‖_1 can be arbitrarily large with outliers, and consequently the ground truth X_0 quickly becomes infeasible for (7).

The proposed algorithm (6) is studied in [14]–[16] as a variant of PhaseLift for phase retrieval, corresponding to the case where X_0 = x_0 x_0^T is rank-one. It is shown in [14], [15] that with O(n) i.i.d. Gaussian sensing vectors, the algorithm succeeds with high probability. Compared with (7), the algorithm (6) eliminates trace minimization and leads to easier implementations. We note that [17] also considers a regularization-free algorithm for PSD matrix estimation that minimizes the ℓ_2-norm of the residual, which, unfortunately, cannot handle outliers as Robust-PhaseLift (6) does. Hand [16] first considered the robustness of the Robust-PhaseLift algorithm (6) in the presence of outliers for phase retrieval, establishing that the same guarantee holds even with a constant fraction of outliers. Our work extends the performance guarantee in [16] to the general low-rank PSD matrix case.

Broadly speaking, our problem is related to low-rank matrix recovery from an under-determined linear system [18]–[20], where the linear measurements are drawn from inner products with rank-one sensing matrices. It is due to this special structure of the sensing matrices that we can eliminate the trace minimization and only consider the feasibility constraint for PSD matrices. Standard approaches for separating low-rank and sparse components [21]–[25] via convex optimization are given as

    min_{X⪰0, β} Tr(X) + λ‖β‖_1    s.t. ‖z − A(X) − β‖_1 ≤ ε,

where λ is a regularization parameter that needs to be tuned properly. In contrast, the formulation (6) is parameter-free.

III. A NON-CONVEX SUBGRADIENT DESCENT ALGORITHM

In this section, we propose another algorithm for robust low-rank PSD matrix recovery from corrupted rank-one measurements, assuming the rank of the PSD matrix X_0 (or an upper bound on it) is known a priori as r. In this case, we can decompose X_0 as X_0 = U_0 U_0^T, where U_0 ∈ R^{n×r}. Instead of directly recovering X_0, we may aim at recovering U_0 up to orthogonal transforms, since (U_0 Q)(U_0 Q)^T = U_0 U_0^T for any orthonormal matrix Q ∈ R^{r×r}. Relaxing the loss function in (5) while keeping the rank constraint, we obtain the following problem:

    X̂ = argmin_{X⪰0} ‖z − A(X)‖_1    s.t. rank(X) = r.    (8)

Since any rank-r PSD matrix X can be written as X = U U^T for some U ∈ R^{n×r}, (8) can be equivalently reformulated as

    Û = argmin_{U ∈ R^{n×r}} f(U),    (9)

with

    f(U) = (1/2m) ‖z − A(U U^T)‖_1 = (1/2m) Σ_{i=1}^m | z_i − ‖U^T a_i‖_2^2 |.

Clearly, (9) is no longer convex. To illustrate, the first row of Fig. 1 plots the value of the objective function on a negative logarithmic scale, i.e. −log f(U), under different corruption scenarios when U ∈ R^{2×1}. For comparison, the second row of Fig. 1 shows the loss function evaluated in the ℓ_2-norm, g(U) = (1/4m)‖z − A(U U^T)‖_2^2, which is not robust to outliers.

[Figure 1 about here; panels: no outliers, modest outlier amplitudes, large outlier amplitudes, for f(U) = (1/2m)‖z − A(UU^T)‖_1 (top row) and g(U) = (1/4m)‖z − A(UU^T)‖_2^2 (bottom row).]
Fig. 1: Illustrations of the objective function −log f(U) and its ℓ_2-norm counterpart −log g(U) (in negative logarithmic scales) under different corruption scenarios when U ∈ R^{2×1}. The number of measurements is m = 100 with i.i.d. Gaussian sensing vectors, and the fraction of outliers is s = 0.2 with uniformly selected support and amplitudes drawn from Unif[0, 10] or Unif[0, 100]. It is interesting to observe that while large outliers completely distort g(U), the proposed objective is quite robust, with the ground truth being the only global optimum of f(U).

Motivated by the recent non-convex approaches [26]–[28] for solving quadratic systems, we propose a subgradient descent algorithm to solve (9) effectively, working with the non-smooth function f(U). Note that a subgradient of f(U) with respect to U can be given as

    ∂f(U) = −(1/m) Σ_{i=1}^m sgn( z_i − ‖U^T a_i‖_2^2 ) a_i a_i^T U,    (10)

where the sign function sgn(·) is defined as

    sgn(x) = +1 if x > 0;  0 if x = 0;  −1 if x < 0.

Our subgradient descent algorithm proceeds as below. Denote the estimate in the t-th iteration by U^(t) ∈ R^{n×r}. First, U^(0) is initialized via the best rank-r approximation, with respect to the Frobenius norm, of the following matrix:

    U^(0) (U^(0))^T = argmin_{rank(X)=r} ‖ X − (1/m) Σ_{i=1}^m z_i a_i a_i^T ‖_F.    (11)

Secondly, at the (t+1)-th iteration, t ≥ 0, we apply subgradient descent to refine the estimate as

    U^(t+1) = U^(t) − μ_t · ∂f(U^(t)),    (12)

where the step size μ_t is adaptively set as

    μ_t = 0.05 × max{ 2^(−t/1000), 10^(−6) },

which provides more accurate estimates using fewer iterations in our numerical simulations. The procedure is summarized in Alg. 1, where the stopping rule is simply a maximum number of iterations.

Algorithm 1: Subgradient descent for solving (9)
  Parameters: Rank r, number of iterations T_max, and step size μ_t;
  Input: Measurements z, and sensing vectors {a_i}_{i=1}^m;
  Initialization: Initialize U^(0) ∈ R^{n×r} via (11);
  for t = 0 : T_max − 1 do
    update U^(t+1) via (12);
  end for
  Output: Û = U^(T_max).
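Alg. 1 can be prototyped in a few lines of numpy. The sketch below follows the spectral initialization (11), the subgradient (10), the update (12), and the step-size schedule of this section; it is our own illustration with hypothetical names and problem sizes, not the authors' code:

```python
import numpy as np

def spectral_init(z, A, r):
    """U^(0): best rank-r PSD approximation of (1/m) sum_i z_i a_i a_i^T, cf. (11)."""
    m = len(z)
    M = (A.T * z) @ A / m                      # (1/m) sum_i z_i a_i a_i^T
    vals, vecs = np.linalg.eigh((M + M.T) / 2)
    top = np.argsort(vals)[::-1][:r]           # indices of the r largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def subgradient_descent(z, A, r, T_max=3000):
    """Alg. 1: iterate (12) on f(U) with the adaptive step size of Section III."""
    m = len(z)
    U = spectral_init(z, A, r)
    for t in range(T_max):
        P = A @ U                               # row i holds (U^T a_i)^T
        resid = z - np.sum(P ** 2, axis=1)      # z_i - ||U^T a_i||_2^2
        G = -(A.T @ (np.sign(resid)[:, None] * P)) / m   # subgradient (10)
        mu = 0.05 * max(2.0 ** (-t / 1000.0), 1e-6)      # step size mu_t
        U = U - mu * G
    return U

# Demo: recover a rank-2 PSD matrix from measurements with 5% outliers.
rng = np.random.default_rng(3)
n, r, m = 10, 2, 400
U0 = rng.standard_normal((n, r))
X0 = U0 @ U0.T
A = rng.standard_normal((m, n))
z = np.einsum('ij,jk,ik->i', A, X0, A)          # a_i^T X0 a_i
supp = rng.choice(m, size=m // 20, replace=False)
z[supp] += rng.uniform(0, 10, size=len(supp)) * rng.choice([-1, 1], size=len(supp))

U_hat = subgradient_descent(z, A, r)
err = np.linalg.norm(U_hat @ U_hat.T - X0) / np.linalg.norm(X0)
```

Note the low footprint discussed next in the text: the iterate is only the n × r factor, and each iteration costs a few matrix products of size m × n × r.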
The main advantage of Alg. 1 is its low memory and computational complexity. Given that it does not construct the full PSD matrix, the memory complexity is simply the size of U^(t), which is on the order of nr. The computational complexity per iteration is also low, on the order of mnr, i.e. linear in all the parameters. We demonstrate the excellent empirical performance of Alg. 1 in Section IV-C.

IV. NUMERICAL EXAMPLES

A. Performance of Convex Relaxation

We first examine the performance of Robust-PhaseLift in (6). Let n = 40. We randomly generate a low-rank PSD matrix of rank r as X_0 = U_0 U_0^T, where U_0 ∈ R^{n×r} is composed of i.i.d. standard Gaussian variables. The sensing vectors are also composed of i.i.d. standard Gaussian variables. Each Monte Carlo simulation is called successful if the normalized estimation error satisfies ‖X̂ − X_0‖_F / ‖X_0‖_F ≤ 10^{−6}, where X̂ denotes the solution to (6). For each cell, the success rate is calculated by averaging over 100 Monte Carlo simulations.

[Figure 2 about here: two phase-transition maps, rank r versus number of measurements m.]
Fig. 2: Phase transitions for low-rank PSD matrix recovery with respect to the number of measurements and the rank, (a) with trace minimization; and (b) without trace minimization, for noise-free measurements, when n = 40.

Fig. 2 shows the success rates of the algorithms with respect to the number of measurements and the rank, with trace minimization as in (7) in (a), and without trace minimization as proposed in Robust-PhaseLift (6) in (b), for noise-free measurements. It can be seen that the performance of these two algorithms is almost equivalent, confirming that a similar numerical observation for the phase retrieval problem [4] also holds in the low-rank setting, where trace minimization may be eliminated for low-rank PSD matrix recovery using rank-one measurements.

Fig. 3 further shows the success rates of the Robust-PhaseLift algorithm (a) with respect to the number of measurements and the rank, when 5% of measurements are selected uniformly at random and corrupted by standard Gaussian variables; and (b) with respect to the percent of outliers and the rank, for a fixed number of measurements m = 600. This also suggests possible room for improvement of our theoretical guarantee, as the numerical results indicate that the required measurement complexity for successful recovery has a seemingly linear relationship with r.

[Figure 3 about here: two phase-transition maps, rank r versus m in (a) and versus outlier percentage s in (b).]
Fig. 3: Phase transitions of low-rank PSD matrix recovery with respect to (a) the number of measurements and the rank, with 5% of measurements corrupted by standard Gaussian variables; (b) the percent of outliers and the rank, when the number of measurements is m = 600, when n = 40.

B. Convex Relaxation with Additional Toeplitz Structure

We next consider robust recovery of low-rank Toeplitz PSD matrices, where we allow complex-valued sensing vectors, A(X) = {a_i^H X a_i}_{i=1}^m, and complex-valued Toeplitz PSD matrices X. Estimating low-rank Toeplitz PSD matrices is of great interest in array signal processing [29]. We modify (6) by incorporating the Toeplitz constraint as:

    X̂ = argmin_{X⪰0} ‖z − A(X)‖_1    s.t. X is Toeplitz.    (13)

Let n = 64. The Toeplitz PSD matrix X_0 is generated as X_0 = V Σ V^H, where V = [v(f_1), ..., v(f_r)] ∈ C^{n×r} is a Vandermonde matrix with v(f_i) = [1, e^{j2πf_i}, ..., e^{j2π(n−1)f_i}]^T, f_i ∼ Unif[0, 1], and Σ = diag[σ_1^2, ..., σ_r^2], with σ_i^2 ∼ Unif[0, 1]. Fig. 4 shows the phase transitions of Toeplitz PSD matrix recovery with respect to the number of measurements and the rank, without outliers in (a), and when 5% of measurements are selected uniformly at random and corrupted by standard Gaussian variables in (b). It can be seen that the low-rank Toeplitz PSD matrix can be robustly recovered from a sub-linear number of measurements due to the additional Toeplitz structure. We note that a different covariance sketching scheme is considered in [30]–[32] for estimating low-rank Toeplitz covariance matrices. Though not directly comparable to our measurement scheme, it may benefit from a similar parameter-free convex optimization to handle outliers.

[Figure 4 about here: two phase-transition maps, rank r versus number of measurements m.]
Fig. 4: Phase transitions of low-rank Toeplitz PSD matrix recovery with respect to the number of measurements and the rank, (a) without outliers, and (b) with 5% of measurements corrupted by standard Gaussian variables, when n = 64.
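The ground-truth model of Section IV-B, a Vandermonde-factored Toeplitz PSD matrix X_0 = V Σ V^H, can be generated in a few lines; this is our own sketch of the construction described above, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 64, 3
f = rng.uniform(0, 1, size=r)                       # frequencies f_i ~ Unif[0, 1]
V = np.exp(2j * np.pi * np.outer(np.arange(n), f))  # columns v(f_i) = [1, e^{j2πf_i}, ...]
sigma2 = rng.uniform(0, 1, size=r)                  # powers sigma_i^2 ~ Unif[0, 1]
X0 = (V * sigma2) @ V.conj().T                      # X0 = V Sigma V^H

# X0 is Hermitian Toeplitz: entry (k, l) depends only on the lag k - l,
# since X0[k, l] = sum_i sigma_i^2 e^{j 2 pi f_i (k - l)}.
```

The Toeplitz structure is what allows the sub-linear measurement regime observed in Fig. 4: the matrix has only n distinct entries (one per lag) rather than n^2.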
C. Performance of Non-Convex Subgradient Descent 12


1

0.9
12
1

0.9

We next examine the performance of the non-convex sub- 10 0.8

0.7
10 0.8

0.7

gradient descent algorithm in Alg. 1, where the number of 8


0.6
8
0.6

Rank (r)
Rank (r)
iterations is set as Tmax = 3 × 104 , which is a large
0.5 0.5
6 6
0.4 0.4

value to guarantee convergence when terminated. Denote the 4 0.3

0.2
4 0.3

0.2

solution to Alg. 1 by Û , and each Monte Carlo simulation is 2


0.1
2
0.1

deemed successful if the normalized estimate error Tsatisfies 0 0.05 0.1 0.15
Percent of outliers (s)
0.2
0
0 0.05 0.1 0.15
Percent of outliers (s)
0.2
0

kX̂ − X 0 kF /kX 0 kF ≤ 10−6 , where X̂ = Û Û is the (a) (b)


estimated low-rank PSD matrix. For each cell, the success rate
is calculated by averaging over 100 Monte Carlo simulations. Fig. 6: Phase transitions of low-rank PSD matrix recovery with
respect to the percent of outliers and the rank using (a) the
12
1
proposed Alg. 1, and (b) the WF algorithm, when n = 40 and
0.9
m = 600.
10 0.8

0.7
8
0.6
Rank (r)

0.5
in the bounded noise w is i.i.d. drawn from Unif[−4/m, 4/m],
6
0.4 thus kwk1 ≤ ǫ, where ǫ = 4. Fig. 7 depicts the mean
4 0.3 squared error kX̂ −X 0 k2F for different algorithms with respect
0.2 to the number of measurements, where X̂ is the estimated
2
0.1
PSD matrix. For the subgradient descent algorithm in Alg. 1,
0
100 200 300 400 500
Number of measurements (m)
600 various ranks are used as prior information, corresponding to
the correct rank r, its underestimate r −1, and its overestimate
r + 1. It can be seen that Alg. 1 works well as long as the
Fig. 5: Phase transitions of low-rank PSD matrix recovery with given rank provides an upper bound of the true rank, and it
respect to the number of measurements and the rank for the performs much better than the WF algorithm which is not
proposed Alg. 1 using noise-free measurements, when n = 40. outlier-robust. On the other hand, the PhaseLift algorithm (7)
does not admit favorable performance for various constraint
Fig. 5 shows the success rate of Alg. 1 with respect to the parameters (ǫ, 2ǫ, 4ǫ) as expected since the outliers do not fall
number of measurements and the rank under the same setup into the prescribed noise bound. In fact, it fails to return any
of Fig. 2 for noise-free measurements, when n = 40. Indeed, feasible solution when the number and amplitudes of outliers
empirically Alg. 1 performs similarly as the convex algorithms is too large in our simulation. In contrast, Robust-PhaseLift
but with a much lower computational cost. Moreover, the allows stable recovery even with an additional bounded noise,
proposed Alg. 1 allows perfect recovery even in the presence which performs comparably with Alg. 1 with the correct model
of outliers. For comparison, we implement the extension of the order.
Wirtinger Flow (WF) algorithm in [26], [28], [33] in the low-
rank case, that minimizes the squared ℓ2 -norm of the residual, 10 4
where the update rule per iteration becomes Robust-PhaseLift
Alg. 1 with r
m Alg. 1 with r+1
1 X 
10 2
Alg. 1 with r-1
U (t+1) = U (t) +µWF
t z i − k(U (t) T
) a k 2 T
i 2 ai ai U
(t)
, Wirtinger Flow
PhaseLift with ǫ
m i=1
Mean squared error

PhaseLift with 2ǫ
PhaseLift with 4ǫ
10 0
using the same initialization (11). The step size is set as
µWF
t = 0.1/ kU 0 k2F . Fig. 6 (a) shows the success rates of
Alg. 1 with respect to the percent of outliers and the rank, 10 -2

under the same setup of Fig. 3 (b), where the performance is


even better than the convex counterpart in (6). In contrast, the 10 -4
WF algorithm performs poorly even with very few outliers, as
shown in its success rate plot in Fig. 6 (b), as the loss function 10 -6
100 200 300 400 500 600
used for WF is not robust to outliers. Number of measurements (m)

D. Comparisons with Additional Bounded Noise Fig. 7: Comparisons of mean squared errors using different
Finally, we compare the two proposed algorithms (Robust- algorithms with respect to the number of measurements with
PhaseLift in (6) and Alg. 1), the WF algorithm and the 5% outliers and bounded noise, when n = 40 and r = 3.
PhaseLift algorithm in (7) when the measurements are cor-
rupted by both outliers and bounded noise. Fix n = 40 and
r = 3. The rank-r PSD matrix X 0 , the sensing vectors, as V. P ROOF OF M AIN T HEOREM
well as the outliers are generated similarly as earlier, where In this section we prove Theorem 1, and the roadmap
the fraction of the outliers is set to 5%. Moreover, each entry of our proof is below. In Section V-A, we first provide
7

the sufficient conditions for an approximate dual certificate the solution to (6) satisfies
that certifies the optimality of the proposed algorithm (6) in rǫ
X̂ − X 0 ≤ c ,

Lemma 1. Section V-B records a few lemmas that show A F m
satisfies the required restricted isometry properties. Then, a where c is a constant.
dual certificate is constructed and validated for a fixed low-
rank PSD matrix X 0 in Section V-C. Finally, the proof is Proof: Denote the solution to (6) by X̂ = X 0 +H 6= X 0 ,
concluded in Section V-D. then we have X̂  0, H T ⊥  0, and furthermore,
First we introduce some additional notations. Let S be a kA(H) − (β + w)k1 = kz − A(X 0 + H)k1
subset of {1, 2, . . . , m}, then S ⊥ is the complement of S with
respect to {1, 2, . . . , m}. AS is the mapping operator A con- = kz − A(X̂)k1
strained on S, which is defined as AS (X) = aTi Xai i∈S . ≤ kz − A(X 0 )k1 = kβ + wk1 ,

Pm
Denote the adjoint operator of A by A∗ (µ) = i=1 µi ai aTi , where the inequality follows from the optimality of X̂ since
where µi is the ith entry of µ, 1 ≤ i ≤ m. We use kXk, kXkF both X̂ and X 0 are feasible to (6). Since
and $\|X\|_1$ to denote the spectral norm, the Frobenius norm and the nuclear norm of the matrix $X$, respectively, and use $\|x\|_p$ to denote the $\ell_p$-norm of the vector $x$. Let the singular value decomposition of the fixed rank-$r$ PSD matrix $X_0$ be $X_0 = U\Lambda U^T$; then the symmetric tangent space $T$ at $X_0$ is denoted by
$$T := \left\{ UZ^T + ZU^T \;\middle|\; Z \in \mathbb{R}^{n\times r} \right\}.$$
We denote by $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$ the orthogonal projections onto $T$ and its orthogonal complement, respectively. For notational simplicity, we write $H_T := \mathcal{P}_T(H)$ and $H_{T^\perp} := H - \mathcal{P}_T(H)$ for any symmetric matrix $H \in \mathbb{R}^{n\times n}$. Moreover, $\gamma$, $c$, $c_1$ and $c_2$ represent absolute constants, whose values may change according to context.

A. Approximate Dual Certificate

The following lemma shows that, under appropriate restricted isometry properties of $\mathcal{A}$, a properly constructed dual certificate guarantees faithful recovery by the proposed algorithm (6).

Lemma 1 (Approximate Dual Certificate for (6)). Denote a subset $S$ with $|S| := \lceil s_0 m/(13\sqrt{2r}) \rceil$, where $0 < s_0 < 1$ is some constant, and suppose the support of $\beta$ satisfies $\mathrm{supp}(\beta) \subseteq S$. Suppose that the mapping $\mathcal{A}$ obeys, for all symmetric matrices $X$,
$$\frac{1}{m}\|\mathcal{A}(X)\|_1 \le \left(1+\frac{1}{10}\right)\|X\|_1, \qquad (14)$$
and
$$\frac{1}{|S|}\|\mathcal{A}_S(X)\|_1 \le \left(1+\frac{1}{10}\right)\|X\|_1, \qquad (15)$$
and, for all matrices $X \in T$,
$$\frac{1}{|S^\perp|}\|\mathcal{A}_{S^\perp}(X)\|_1 > \frac{1}{5}\left(1-\frac{1}{12}\right)\|X\|_F, \qquad (16)$$
where $\mathcal{A}_S$ and $\mathcal{A}_{S^\perp}$ are the operators constrained on $S$ and $S^\perp$, respectively. Then if there exists a matrix $Y = \mathcal{A}^*(\mu)$ that satisfies
$$Y_{T^\perp} \preceq -\frac{1}{r} I_{T^\perp}, \qquad \|Y_T\|_F \le \frac{1}{13r}, \qquad (17)$$
and
$$\mu_i = \frac{9}{m}\,\mathrm{sgn}(\beta_i),\quad i \in \mathrm{supp}(\beta); \qquad |\mu_i| \le \frac{9}{m},\quad i \notin \mathrm{supp}(\beta), \qquad (18)$$
then the solution $\hat{X}$ of (6) satisfies $\|\hat{X} - X_0\|_F \le c\, r\epsilon/m$ for some constant $c$.

Write $H = \hat{X} - X_0$. Since
$$\|\mathcal{A}(H)-(\beta+w)\|_1 = \|\mathcal{A}_S(H)-\beta-w_S\|_1 + \|\mathcal{A}_{S^\perp}(H)-w_{S^\perp}\|_1$$
and
$$\|\beta+w\|_1 = \|\beta+w_S\|_1 + \|w_{S^\perp}\|_1,$$
we have
$$\begin{aligned}
\|\mathcal{A}_{S^\perp}(H)\|_1 &\le \|\mathcal{A}_{S^\perp}(H)-w_{S^\perp}\|_1 + \|w_{S^\perp}\|_1\\
&\le \|\beta+w\|_1 - \|\mathcal{A}_S(H)-\beta-w_S\|_1 + \|w_{S^\perp}\|_1\\
&\le \|\beta+w_S\|_1 - \|\mathcal{A}_S(H)-\beta-w_S\|_1 + 2\|w_{S^\perp}\|_1\\
&\le \|\mathcal{A}_S(H)\|_1 + 2\|w_{S^\perp}\|_1,
\end{aligned}$$
where the last inequality follows from the triangle inequality. We could further bound
$$\begin{aligned}
\|\mathcal{A}_{S^\perp}(H_T)\|_1 &\le \|\mathcal{A}_{S^\perp}(H)\|_1 + \|\mathcal{A}_{S^\perp}(H_{T^\perp})\|_1\\
&\le \|\mathcal{A}_S(H)\|_1 + \|\mathcal{A}_{S^\perp}(H_{T^\perp})\|_1 + 2\|w_{S^\perp}\|_1\\
&\le \|\mathcal{A}_S(H_T)\|_1 + \|\mathcal{A}_S(H_{T^\perp})\|_1 + \|\mathcal{A}_{S^\perp}(H_{T^\perp})\|_1 + 2\|w_{S^\perp}\|_1\\
&= \|\mathcal{A}_S(H_T)\|_1 + \|\mathcal{A}(H_{T^\perp})\|_1 + 2\|w_{S^\perp}\|_1. \qquad (19)
\end{aligned}$$
Our assumptions on $\mathcal{A}$ imply that
$$\begin{aligned}
\left(1+\frac{1}{10}\right)\mathrm{Tr}(H_{T^\perp}) &\ge \frac{1}{m}\|\mathcal{A}(H_{T^\perp})\|_1\\
&\ge \frac{1}{m}\left(\|\mathcal{A}_{S^\perp}(H_T)\|_1 - \|\mathcal{A}_S(H_T)\|_1 - 2\|w_{S^\perp}\|_1\right)\\
&\ge \frac{|S^\perp|}{5m}\left(1-\frac{1}{12}\right)\|H_T\|_F - \frac{|S|}{m}\left(1+\frac{1}{10}\right)\|H_T\|_1 - \frac{2\epsilon}{m},
\end{aligned}$$
where the first inequality follows from (14) due to $\|H_{T^\perp}\|_1 = \mathrm{Tr}(H_{T^\perp})$, as $H_{T^\perp} \succeq 0$, the second inequality follows from (19), and the last inequality follows from (15) and (16). This gives
$$\mathrm{Tr}(H_{T^\perp}) \ge \left(\frac{|S^\perp|}{6m} - \frac{|S|}{m}\sqrt{2r}\right)\|H_T\|_F - \frac{2\epsilon}{m}, \qquad (20)$$
where we use the inequality $\|H_T\|_1 \le \sqrt{2r}\,\|H_T\|_F$.

On the other hand, since $(m/9)\,\mu$ is a subgradient of the $\ell_1$-norm at $\beta$ by (18), we have
$$\|\beta\|_1 + \left\langle \frac{m}{9}\mu,\; w - \mathcal{A}(H) \right\rangle \le \|w+\beta-\mathcal{A}(H)\|_1 \le \|\beta+w\|_1 \le \|\beta\|_1 + \|w\|_1,$$
8

which, by a simple transformation, is
$$\langle \mu, \mathcal{A}(H)\rangle \ge \langle \mu, w\rangle - \frac{9}{m}\|w\|_1 \ge -\left(\|\mu\|_\infty + \frac{9}{m}\right)\|w\|_1 \ge -\frac{18\epsilon}{m}.$$
Then with
$$\langle H, Y\rangle = \langle \mathcal{A}(H), \mu\rangle,$$
we can get
$$\begin{aligned}
-\frac{18\epsilon}{m} \le \langle \mathcal{A}(H),\mu\rangle = \langle H, Y\rangle &= \langle H_T, Y_T\rangle + \langle H_{T^\perp}, Y_{T^\perp}\rangle\\
&\le \|Y_T\|_F\|H_T\|_F - \frac{1}{r}\langle H_{T^\perp}, I_{T^\perp}\rangle\\
&\le \frac{1}{13r}\|H_T\|_F - \frac{1}{r}\mathrm{Tr}(H_{T^\perp}),
\end{aligned}$$
which gives
$$\mathrm{Tr}(H_{T^\perp}) \le \frac{1}{13}\|H_T\|_F + \frac{18r\epsilon}{m}. \qquad (21)$$
Combining with (20), we know
$$\left(\frac{|S^\perp|}{6m} - \frac{|S|}{m}\sqrt{2r}\right)\|H_T\|_F - \frac{2\epsilon}{m} \le \frac{1}{13}\|H_T\|_F + \frac{18r\epsilon}{m}.$$
Since $\frac{|S^\perp|}{6m} - \frac{|S|}{m}\sqrt{2r} - \frac{1}{13} > 0$ under the assumption on $|S|$ in Lemma 1, we have
$$\|H_T\|_F \le \frac{20r\epsilon}{m\left(\frac{|S^\perp|}{6m} - \frac{|S|}{m}\sqrt{2r} - \frac{1}{13}\right)} \le c_1\,\frac{r\epsilon}{m},$$
where $c_1$ is some fixed constant. Finally, we have
$$\begin{aligned}
\|\hat{X} - X_0\|_F &\le \|H_T\|_F + \|H_{T^\perp}\|_F \le \|H_T\|_F + \mathrm{Tr}(H_{T^\perp})\\
&\le \left(1+\frac{1}{13}\right)\|H_T\|_F + \frac{18r\epsilon}{m} \le c\,\frac{r\epsilon}{m},
\end{aligned}$$
for some constant $c$.

B. Restricted Isometry of $\mathcal{A}$

The first two conditions (14) and (15) in Lemma 1 are supplied straightforwardly in the following lemma, as long as $m \ge cnr$ and $|S| = c_1 m/r \ge c_2 n$ for some constants $c$, $c_1$ and $c_2$.

Lemma 2 ([2]). Fix any $\delta \in (0, \frac{1}{2})$ and assume $m \ge 20\delta^{-2} n$. Then for all PSD matrices $X$, one has
$$(1-\delta)\|X\|_1 \le \frac{1}{m}\|\mathcal{A}(X)\|_1 \le (1+\delta)\|X\|_1$$
with probability exceeding $1 - 2e^{-m\epsilon^2/2}$, where $\epsilon^2 + \epsilon = \delta/4$. The right-hand side holds for all symmetric matrices.

The third condition (16) in Lemma 1 can be obtained using the mixed-norm RIP-$\ell_2/\ell_1$ provided in [7], as long as $m \ge cnr$ and $|S| \le c_1 m$ for some constants $c$ and $c_1$.

Lemma 3 ([7]). Suppose the sensing vectors $a_i$'s are composed of i.i.d. sub-Gaussian entries. Then there exist positive universal constants $c_1$, $c_2$, $c_3$ such that, provided that $m > c_3 nr$, for all matrices $X$ of rank at most $r$, one has
$$\left(1 - \delta_r^{\mathrm{lb}}\right)\|X\|_F \le \frac{2}{m}\|\mathcal{B}(X)\|_1 \le \left(1 + \delta_r^{\mathrm{ub}}\right)\|X\|_F,$$
with probability exceeding $1 - c_1 e^{-c_2 m}$, where $\delta_r^{\mathrm{lb}}$ and $\delta_r^{\mathrm{ub}}$ are defined as the RIP-$\ell_2/\ell_1$ constants. The operator $\mathcal{B}$ represents the linear transformation that maps $X \in \mathbb{R}^{n\times n}$ to $\{\mathcal{B}_i(X)\}_{i=1}^{m/2} \in \mathbb{R}^{m/2}$, where $\mathcal{B}_i(X) := \langle a_{2i-1}a_{2i-1}^T - a_{2i}a_{2i}^T,\, X\rangle$.

The third condition (16) can be easily validated from the lower bound by setting $\delta_r^{\mathrm{lb}}$ appropriately, since
$$\frac{2}{m}\|\mathcal{B}(X)\|_1 \le \frac{2}{m}\sum_{i=1}^{m/2}\left(\left|\langle a_{2i-1}a_{2i-1}^T, X\rangle\right| + \left|\langle a_{2i}a_{2i}^T, X\rangle\right|\right) = \frac{2}{m}\|\mathcal{A}(X)\|_1.$$

C. Construction of Dual Certificate

For notational simplicity, let $\alpha_0 := \mathbb{E}\left[Z^2 1_{\{|Z|\le 3\}}\right] \approx 0.9707$, $\beta_0 := \mathbb{E}\left[Z^4 1_{\{|Z|\le 3\}}\right] \approx 2.6728$ and $\theta_0 := \mathbb{E}\left[Z^6 1_{\{|Z|\le 3\}}\right] \approx 11.2102$ for a standard Gaussian random variable $Z$, where $1_E$ is the indicator function with respect to an event $E$.

Consider that the singular value decomposition of a PSD matrix $X_0$ of rank at most $r$ can be represented as $X_0 = \sum_{i=1}^r \lambda_i u_i u_i^T$; then, inspired by [14], [16], we construct $Y$ as
$$Y := \frac{1}{m}\sum_{j\in S^\perp}\left[\frac{1}{r}\sum_{i=1}^r \left(a_j^T u_i\right)^2 1_{\{|a_j^T u_i|\le 3\}} - \left(\alpha_0 + \frac{\beta_0-\alpha_0}{r}\right)\right] a_j a_j^T + \frac{9}{m}\sum_{j\in S}\chi_j\, a_j a_j^T := Y^{(0)} - Y^{(1)} + Y^{(2)},$$
where
$$Y^{(0)} = \frac{1}{m}\sum_{j\in S^\perp}\left[\frac{1}{r}\sum_{i=1}^r \left(a_j^T u_i\right)^2 1_{\{|a_j^T u_i|\le 3\}}\right] a_j a_j^T,$$
$$Y^{(1)} = \frac{1}{m}\left(\alpha_0 + \frac{\beta_0-\alpha_0}{r}\right)\sum_{j\in S^\perp} a_j a_j^T, \qquad Y^{(2)} = \frac{9}{m}\sum_{j\in S}\chi_j\, a_j a_j^T.$$
We set $\chi_j = \mathrm{sgn}(\beta_j)$ if $j \in \mathrm{supp}(\beta)$; otherwise the $\chi_j$'s are i.i.d. Rademacher random variables with $\mathbb{P}\{\chi_j = 1\} = \mathbb{P}\{\chi_j = -1\} = 1/2$.

The construction immediately indicates that $Y$ satisfies (18). We will show that $Y$ satisfies (17) with high probability. In what follows, we separate the constructed $Y$ into two parts and consider the bounds on $Y^{(0)} - Y^{(1)}$ and $Y^{(2)}$, respectively.

1) Proof of $Y_{T^\perp} + \frac{1}{r} I_{T^\perp} \preceq 0$: First, by standard results in random matrix theory [34, Corollary 5.35], we have
$$\left\|\frac{m}{|S^\perp|}\, Y^{(1)} - \left(\alpha_0 + \frac{\beta_0-\alpha_0}{r}\right) I\right\| \le \frac{\beta_0}{40r},$$
with probability at least $1 - 2e^{-\gamma|S^\perp|/r^2}$ for some constant $\gamma$, provided $|S^\perp| \ge cnr^2$ for some constant $c$. In particular, this gives

 
$$\left\|\frac{m}{|S^\perp|}\, Y^{(1)}_{T^\perp} - \left(\alpha_0 + \frac{\beta_0-\alpha_0}{r}\right) I_{T^\perp}\right\| \le \frac{\beta_0}{40r}. \qquad (22)$$
Let $a'_j = (I - UU^T)a_j$ be the projection of $a_j$ onto the orthogonal complement of the column space of $U$; then we have
$$Y^{(0)}_{T^\perp} = \frac{1}{m}\sum_{j\in S^\perp} \epsilon_j \epsilon_j^T,$$
where $\epsilon_j = \left[\frac{1}{r}\sum_{i=1}^r \left(a_j^T u_i\right)^2 1_{\{|a_j^T u_i|\le 3\}}\right]^{1/2} a'_j$ are i.i.d. copies of a zero-mean, isotropic and sub-Gaussian random vector $\epsilon$, which satisfies $\mathbb{E}[\epsilon\epsilon^T] = \alpha_0 I_{T^\perp}$. Following [34, Theorem 5.39], we have
$$\left\|\frac{m}{|S^\perp|}\, Y^{(0)}_{T^\perp} - \alpha_0 I_{T^\perp}\right\| \le \frac{\alpha_0}{40r}, \qquad (23)$$
with probability at least $1 - 2e^{-\gamma|S^\perp|/r^2}$ for some constant $\gamma$, provided $|S^\perp| \ge cnr^2$ for some constant $c$. As a result, if $m \ge cnr^2$ for some large constant $c$ and $|S| \le c_1 m$ for some constant $c_1$ small enough, with probability at least $1 - e^{-\gamma m/r^2}$ it holds that
$$\begin{aligned}
\left\|Y^{(0)}_{T^\perp} - Y^{(1)}_{T^\perp} + \frac{\beta_0-\alpha_0}{r}\, I_{T^\perp}\right\| &\le \left\|Y^{(0)}_{T^\perp} - Y^{(1)}_{T^\perp} + \frac{\beta_0-\alpha_0}{r}\frac{|S^\perp|}{m}\, I_{T^\perp}\right\| + \left(1 - \frac{|S^\perp|}{m}\right)\frac{\beta_0-\alpha_0}{r}\\
&\le \left\|\frac{m}{|S^\perp|}Y^{(0)}_{T^\perp} - \frac{m}{|S^\perp|}Y^{(1)}_{T^\perp} + \frac{\beta_0-\alpha_0}{r}\, I_{T^\perp}\right\| + \frac{|S|}{m}\cdot\frac{\beta_0-\alpha_0}{r}\\
&\le \frac{\beta_0}{30r} + \frac{\alpha_0}{60r}. \qquad (24)
\end{aligned}$$
Next, let's check $\left\|Y^{(2)}_{T^\perp}\right\|$. Since $Y^{(2)} = \frac{1}{m}\sum_{j\in S} 9\chi_j a_j a_j^T$, where $\mathbb{E}\left[9\chi_j a_j a_j^T\right] = 0$, by [34, Theorem 5.39] we have
$$\left\|Y^{(2)}\right\| = \frac{1}{m}\left\|\sum_{j\in S} 9\chi_j a_j a_j^T\right\| \le \frac{1}{10r},$$
with probability at least $1 - 2\exp(-\gamma m/r)$ as long as $m \ge cnr^2$ and $|S| = c_1 m/r \ge c_2 nr$, for some constants $c$, $c_1$ and $c_2$. In particular, this gives
$$\left\|Y^{(2)}_{T^\perp}\right\| \le \frac{1}{10r}. \qquad (25)$$
Putting this together with (24), we can obtain that if $m \ge cnr^2$ and $|S| = c_1 m/r \ge c_2 nr$ for some constants $c$, $c_1$ and $c_2$, with probability at least $1 - e^{-\gamma m/r^2}$,
$$\left\|Y_{T^\perp} + \frac{1.7}{r}\, I_{T^\perp}\right\| = \left\|Y^{(0)}_{T^\perp} - Y^{(1)}_{T^\perp} + Y^{(2)}_{T^\perp} + \frac{1.7}{r}\, I_{T^\perp}\right\| \le \left(\frac{\alpha_0}{60} + \frac{\beta_0}{30} + 0.11\right)\frac{1}{r} \le \frac{0.25}{r}.$$

2) Proof of $\|Y_T\|_F \le \frac{1}{13r}$: Let $\tilde{Y} = \left(Y^{(0)} - Y^{(1)}\right)U$, and let $\tilde{Y}' = \left(I - UU^T\right)\tilde{Y}$ be the projection of $\tilde{Y}$ onto the orthogonal complement of $U$; then we have
$$\left\|Y^{(0)}_T - Y^{(1)}_T\right\|_F^2 = \left\|U^T\tilde{Y}\right\|_F^2 + 2\left\|\tilde{Y}'\right\|_F^2. \qquad (26)$$
First consider the term $\|U^T\tilde{Y}\|_F^2$ in (26), where the $k$th column of $U^T\tilde{Y}$ can be expressed explicitly as
$$\left(U^T\tilde{Y}\right)_k = \frac{1}{m}\sum_{j\in S^\perp}\left[\frac{1}{r}\sum_{i=1}^r \left(a_j^T u_i\right)^2 1_{\{|a_j^T u_i|\le 3\}} - \left(\alpha_0 + \frac{\beta_0-\alpha_0}{r}\right)\right]\left(a_j^T u_k\right) U^T a_j := \frac{1}{m}\Phi c_k,$$
where $\Phi \in \mathbb{R}^{r\times|S^\perp|}$ is constructed by the $U^T a_j$'s, and $c_k \in \mathbb{R}^{|S^\perp|}$ is composed of the $c_{k,j}$'s, each one expressed as
$$c_{k,j} = \left[\frac{1}{r}\sum_{i=1}^r \left(a_j^T u_i\right)^2 1_{\{|a_j^T u_i|\le 3\}} - \left(\alpha_0 + \frac{\beta_0-\alpha_0}{r}\right)\right] a_j^T u_k,$$
with
$$\mathbb{E}\left[c_{k,j}^2\right] = \frac{\theta_0 + (r-1)\beta_0 - \beta_0^2 - (r-1)\alpha_0^2}{r^2} = \frac{1}{r}\left(\beta_0 - \alpha_0^2\right) + \frac{1}{r^2}\left(\theta_0 + \alpha_0^2 - \beta_0^2 - \beta_0\right) \le \frac{4.07}{r}.$$
Note that the $c_{k,j}^2$'s are i.i.d. sub-exponential random variables with $\left\|c_{k,j}^2\right\|_{\psi_1} \le K$, for some constant $K$; then according to [34, Corollary 5.17],
$$\mathbb{P}\left\{\left|\sum_{j\in S^\perp}\left(c_{k,j}^2 - \mathbb{E}c_{k,j}^2\right)\right| \ge \frac{\epsilon}{r}\left|S^\perp\right|\right\} \le 2\exp\left(-c\,\frac{\epsilon^2}{K^2}\frac{|S^\perp|}{r^2}\right),$$
which shows that as long as $|S| \le c_1 m$, for some constants $c$ and $c_1$,
$$\|c_k\|_2^2 \le \frac{4.07 + c}{r}\, m \le \frac{4.1m}{r}$$
holds with probability at least $1 - e^{-\gamma m/r^2}$. Furthermore, for a fixed vector $x \in \mathbb{R}^{|S^\perp|}$ obeying $\|x\|_2 = 1$, $\|\Phi x\|_2^2$ is distributed as a chi-square random variable with $r$ degrees of freedom. From [35, Lemma 1], we have
$$\|\Phi x\|_2^2 \le \frac{m}{12000r^2},$$
with probability at least $1 - e^{-\gamma m/r^2}$, provided $m \ge cnr^2$ for some sufficiently large constant $c$. Therefore, we can obtain
$$\left\|\left(U^T\tilde{Y}\right)_k\right\|_2^2 = \frac{1}{m^2}\left\|\Phi\,\frac{c_k}{\|c_k\|_2}\right\|_2^2 \|c_k\|_2^2 \le \frac{1}{2700r^3},$$
which yields
$$\left\|U^T\tilde{Y}\right\|_F^2 = \sum_{k=1}^r \left\|\left(U^T\tilde{Y}\right)_k\right\|_2^2 \le \frac{1}{2700r^2}, \qquad (27)$$
with probability at least $1 - e^{-\gamma m/r^2}$, when $m \ge cnr^2$ and $|S| \le c_1 m$.
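The truncated-moment constants $\alpha_0$, $\beta_0$, $\theta_0$ and the variance bound $\mathbb{E}[c_{k,j}^2] \le 4.07/r$ above can be checked numerically. The following is a small sketch using only the Python standard library; the helper `trunc_moment` is ours (not from the paper) and uses the closed forms obtained by integration by parts, e.g. $\mathbb{E}[Z^2 1_{\{|Z|\le a\}}] = (2\Phi(a)-1) - 2a\varphi(a)$.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def trunc_moment(p, a=3.0):
    """E[Z^p 1{|Z| <= a}] for Z ~ N(0,1) and even p in {2, 4, 6},
    via integration by parts: E[Z^2 1] = (2*Phi(a) - 1) - 2*a*phi(a), etc."""
    mass = math.erf(a / math.sqrt(2))  # equals 2*Phi(a) - 1
    if p == 2:
        return mass - 2 * a * phi(a)
    if p == 4:
        return 3 * mass - 2 * (a**3 + 3 * a) * phi(a)
    if p == 6:
        return 15 * mass - 2 * (a**5 + 5 * a**3 + 15 * a) * phi(a)
    raise ValueError("only p = 2, 4, 6 supported")

alpha0 = trunc_moment(2)  # ~ 0.9707
beta0 = trunc_moment(4)   # ~ 2.6728
theta0 = trunc_moment(6)  # ~ 11.2102
print(round(alpha0, 4), round(beta0, 4), round(theta0, 4))
# prints: 0.9707 2.6728 11.2102

# Check E[c_{k,j}^2] = (beta0 - alpha0^2)/r + (theta0 + alpha0^2 - beta0^2 - beta0)/r^2
# against the stated bound 4.07/r; the bound is tightest at r = 1.
for r in range(1, 51):
    second_moment = (beta0 - alpha0**2) / r \
        + (theta0 + alpha0**2 - beta0**2 - beta0) / r**2
    assert second_moment <= 4.07 / r
```

At $r = 1$ the second moment evaluates to about $4.066$, which explains where the constant $4.07$ in the bound comes from.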
To bound the second term in (26), we could adopt the same techniques as before. The $k$th column of $\tilde{Y}'$ can be expressed explicitly as
$$\tilde{Y}'_k = \frac{1}{m}\sum_{j\in S^\perp}\left[\frac{1}{r}\sum_{i=1}^r \left(a_j^T u_i\right)^2 1_{\{|a_j^T u_i|\le 3\}} - \left(\alpha_0 + \frac{\beta_0-\alpha_0}{r}\right)\right]\left(a_j^T u_k\right)\left(I - UU^T\right) a_j = \frac{1}{m}\sum_{j\in S^\perp} c_{k,j}\, a'_j := \frac{1}{m}\Psi c_k,$$
where $\Psi \in \mathbb{R}^{n\times|S^\perp|}$ is constructed by the $a'_j$'s, each of which, as a reminder, is the projection of $a_j$ onto the orthogonal complement of the column space of $U$, i.e., $a'_j = \left(I - UU^T\right)a_j$. Equivalently, $\Psi = \left(I - UU^T\right)A$, where $A \in \mathbb{R}^{n\times|S^\perp|}$ is constructed by the $a_j$'s, $j \in S^\perp$. For a fixed vector $x \in \mathbb{R}^{|S^\perp|}$ obeying $\|x\|_2 = 1$, we have $\|\Psi x\|_2^2 = \left\|\left(I - UU^T\right)Ax\right\|_2^2 \le \|Ax\|_2^2$, where $\|Ax\|_2^2$ is distributed as a chi-square random variable with $n$ degrees of freedom. Again, [35, Lemma 1] tells us
$$\|\Psi x\|_2^2 \le \|Ax\|_2^2 \le \frac{m}{12000r^2},$$
with probability exceeding $1 - e^{-\gamma m/r^2}$, provided $m \ge cnr^2$ for a sufficiently large constant $c$. Hence,
$$\left\|\tilde{Y}'_k\right\|_2^2 \le \frac{1}{m^2}\left\|\Psi\,\frac{c_k}{\|c_k\|_2}\right\|_2^2 \|c_k\|_2^2 \le \frac{1}{2700r^3},$$
which leads to
$$\left\|\tilde{Y}'\right\|_F^2 = \sum_{k=1}^r\left\|\tilde{Y}'_k\right\|_2^2 \le \frac{1}{2700r^2}. \qquad (28)$$
Then, combining (27) and (28), we know that
$$\left\|Y^{(0)}_T - Y^{(1)}_T\right\|_F = \sqrt{\left\|U^T\tilde{Y}\right\|_F^2 + 2\left\|\tilde{Y}'\right\|_F^2} \le \frac{1}{30r}. \qquad (29)$$
Next, let's check $\left\|Y^{(2)}_T\right\|_F^2$, which can be written as
$$\left\|Y^{(2)}_T\right\|_F^2 = \left\|U^T\bar{Y}\right\|_F^2 + 2\left\|\bar{Y}'\right\|_F^2,$$
where $\bar{Y} = Y^{(2)} U$ and $\bar{Y}' = \left(I - UU^T\right)\bar{Y}$. For the first term $\|U^T\bar{Y}\|_F^2$, the $k$th column of $U^T\bar{Y}$ can be formulated explicitly as
$$\left(U^T\bar{Y}\right)_k = \frac{1}{m}\sum_{j\in S} 9\chi_j\left(a_j^T u_k\right) U^T a_j := \frac{1}{m}\bar{\Phi} d_k,$$
where $\bar{\Phi} \in \mathbb{R}^{r\times|S|}$ is constructed by the $U^T a_j$'s, and $d_k \in \mathbb{R}^{|S|}$ is composed of the $d_{k,j}$'s, each one expressed as
$$d_{k,j} = 9\chi_j\, a_j^T u_k,$$
with $\mathbb{E}\left[d_{k,j}^2\right] = 81$. Note that the $d_{k,j}^2$'s are i.i.d. sub-exponential random variables with $\left\|d_{k,j}^2\right\|_{\psi_1} \le K$, for some constant $K$; then based on [34, Corollary 5.17],
$$\mathbb{P}\left\{\left|\sum_{j\in S}\left(d_{k,j}^2 - \mathbb{E}d_{k,j}^2\right)\right| \ge \epsilon|S|\right\} \le 2\exp\left(-c_1\,\frac{\epsilon^2|S|}{K^2}\right),$$
which indicates that if $|S| = cm/r$, for some constant $c$,
$$\|d_k\|_2^2 \le (81 + c_1)|S| \le 82|S| := \delta_0|S|$$
holds with probability at least $1 - e^{-\gamma m/r}$. And for a fixed vector $x \in \mathbb{R}^{|S|}$ obeying $\|x\|_2 = 1$, $\|\bar{\Phi}x\|_2^2$ is also a chi-square random variable with $r$ degrees of freedom, so
$$\|\bar{\Phi}x\|_2^2 \le \frac{m}{2700\delta_0 c r^2},$$
with probability at least $1 - e^{-\gamma m/r^2}$, provided $m \ge c_1 nr^2$ for some sufficiently large constant $c_1$. Thus we have
$$\left\|\left(U^T\bar{Y}\right)_k\right\|_2^2 = \frac{1}{m^2}\left\|\bar{\Phi}\,\frac{d_k}{\|d_k\|_2}\right\|_2^2 \|d_k\|_2^2 \le \frac{1}{2700r^3},$$
which gives
$$\left\|U^T\bar{Y}\right\|_F^2 = \sum_{k=1}^r\left\|\left(U^T\bar{Y}\right)_k\right\|_2^2 \le \frac{1}{2700r^2}, \qquad (30)$$
with probability at least $1 - e^{-\gamma m/r^2}$, when $m \ge c_1 nr^2$ and $|S| = cm/r$, for some appropriate constants $c$ and $c_1$.

Now consider the second term $\|\bar{Y}'\|_F^2$ in $\|Y^{(2)}_T\|_F^2$, where the $k$th column of $\bar{Y}'$ can be expressed explicitly as
$$\bar{Y}'_k = \frac{1}{m}\sum_{j\in S} 9\chi_j\left(a_j^T u_k\right)\left(I - UU^T\right) a_j = \frac{1}{m}\sum_{j\in S} d_{k,j}\, a'_j := \frac{1}{m}\bar{\Psi} d_k,$$
where $\bar{\Psi} \in \mathbb{R}^{n\times|S|}$ is constructed by the $a'_j$'s. Also, we can decompose $\bar{\Psi}$ as $\bar{\Psi} = \left(I - UU^T\right)\bar{A}$, where $\bar{A} \in \mathbb{R}^{n\times|S|}$ is constructed by the $a_j$'s, $j \in S$. For a fixed vector $x \in \mathbb{R}^{|S|}$ obeying $\|x\|_2 = 1$, we have $\|\bar{\Psi}x\|_2^2 = \left\|\left(I - UU^T\right)\bar{A}x\right\|_2^2 \le \|\bar{A}x\|_2^2$, where $\|\bar{A}x\|_2^2$ is a chi-square random variable with $n$ degrees of freedom as well. Since we already know that
$$\|\bar{\Psi}x\|_2^2 \le \|\bar{A}x\|_2^2 \le \frac{m}{2700\delta_0 c r^2},$$
with probability exceeding $1 - e^{-\gamma m/r^2}$, provided $m \ge c_1 nr^2$ for a sufficiently large constant $c_1$, we can have
$$\left\|\bar{Y}'_k\right\|_2^2 \le \frac{1}{m^2}\left\|\bar{\Psi}\,\frac{d_k}{\|d_k\|_2}\right\|_2^2 \|d_k\|_2^2 \le \frac{1}{2700r^3},$$
and a further result
$$\left\|\bar{Y}'\right\|_F^2 = \sum_{k=1}^r\left\|\bar{Y}'_k\right\|_2^2 \le \frac{1}{2700r^2}, \qquad (31)$$
which, combining with (30), leads to
$$\left\|Y^{(2)}_T\right\|_F = \sqrt{\left\|U^T\bar{Y}\right\|_F^2 + 2\left\|\bar{Y}'\right\|_F^2} \le \frac{1}{30r}. \qquad (32)$$
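The chi-square tail estimates above repeatedly invoke [35, Lemma 1], which implies that a chi-square random variable $X$ with $k$ degrees of freedom satisfies $\mathbb{P}\{X \ge k + 2\sqrt{kx} + 2x\} \le e^{-x}$ for any $x > 0$. Below is a quick Monte Carlo sanity check of this tail bound, a sketch assuming NumPy is available; the choices $k = 20$, $x = 2$ and the sample size are arbitrary.

```python
import math
import numpy as np

# [35, Lemma 1] (Laurent-Massart): for X ~ chi-square with k degrees of
# freedom, P{X >= k + 2*sqrt(k*x) + 2*x} <= exp(-x) for any x > 0.
rng = np.random.default_rng(0)
k, x, n_samples = 20, 2.0, 200_000

samples = rng.chisquare(k, n_samples)
threshold = k + 2 * math.sqrt(k * x) + 2 * x
empirical_tail = float(np.mean(samples >= threshold))

print(f"empirical tail {empirical_tail:.4f} <= bound {math.exp(-x):.4f}")
assert empirical_tail <= math.exp(-x)
```

The empirical tail (around $0.013$ for these parameters) sits well below the bound $e^{-2} \approx 0.135$, as expected from a non-asymptotic bound that holds for all $k$ and $x$.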

Finally, we can obtain that if $m \ge cnr^2$ and $|S| = c_1 m/r$, for some constants $c$ and $c_1$, with probability at least $1 - e^{-\gamma m/r^2}$,
$$\|Y_T\|_F = \left\|Y^{(0)}_T - Y^{(1)}_T + Y^{(2)}_T\right\|_F \le \frac{1}{15r}. \qquad (33)$$

D. Proof of Theorem 1

The required restricted isometry properties of the linear mapping $\mathcal{A}$ are supplied in Section V-B, and a valid approximate dual certificate is constructed in Section V-C; therefore, Theorem 1 follows directly from Lemma 1 in Section V-A.

VI. CONCLUSION

In this paper, we address the problem of estimating a low-rank PSD matrix $X \in \mathbb{R}^{n\times n}$ from rank-one measurements that are possibly corrupted by arbitrary outliers and bounded noise. This problem has many applications in covariance sketching, phase space tomography, and noncoherent detection in communications. It is shown that with on the order of $nr^2$ random Gaussian sensing vectors, a rank-$r$ PSD matrix can be robustly recovered with high probability by minimizing the $\ell_1$-norm of the observation residual within the semidefinite cone, even when a fraction of the measurements are adversarially corrupted. This convex formulation eliminates the need for trace minimization and parameter tuning, without requiring prior knowledge of the outliers. Moreover, a non-convex subgradient descent algorithm with excellent empirical performance is proposed for the case when additional information on the rank of the PSD matrix is available. For future work, it would be interesting to theoretically justify the proposed non-convex algorithm. Finally, we note that very recently one of the authors proposed a median-truncated gradient descent algorithm for phase retrieval under a constant proportion of outliers with provable performance guarantees in [36]; it might be possible to extend this approach to the robust low-rank PSD matrix recovery problem considered in this paper, and this will be pursued elsewhere.

ACKNOWLEDGEMENT

We thank the anonymous reviewers for their valuable suggestions that greatly improved the quality of this paper.

REFERENCES

[1] J. R. Fienup, "Reconstruction of an object from the modulus of its Fourier transform," Optics Letters, vol. 3, no. 1, pp. 27–29, 1978.
[2] E. J. Candès, T. Strohmer, and V. Voroninski, "PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming," Communications on Pure and Applied Mathematics, vol. 66, no. 8, pp. 1241–1274, 2013.
[3] E. J. Candès, Y. C. Eldar, T. Strohmer, and V. Voroninski, "Phase retrieval via matrix completion," SIAM Journal on Imaging Sciences, vol. 6, no. 1, pp. 199–225, 2013.
[4] I. Waldspurger, A. d'Aspremont, and S. Mallat, "Phase recovery, MaxCut and complex semidefinite programming," Mathematical Programming, vol. 149, no. 1-2, pp. 47–81, 2015.
[5] P. Schniter and S. Rangan, "Compressive phase retrieval via generalized approximate message passing," IEEE Transactions on Signal Processing, vol. 63, no. 4, pp. 1043–1055, 2015.
[6] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, "Phase retrieval with application to optical imaging: a contemporary overview," IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 87–109, 2015.
[7] Y. Chen, Y. Chi, and A. Goldsmith, "Exact and stable covariance estimation from quadratic sampling via convex programming," IEEE Transactions on Information Theory, vol. 61, no. 7, pp. 4034–4059, July 2015.
[8] L. L. Scharf, Statistical Signal Processing. Addison-Wesley Reading, 1991, vol. 98.
[9] A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E. D. Kolaczyk, and N. Taft, "Structural analysis of network traffic flows," in ACM SIGMETRICS Performance Evaluation Review, vol. 32, no. 1. ACM, 2004, pp. 61–72.
[10] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, "Quantum state tomography via compressed sensing," Physical Review Letters, vol. 105, no. 15, p. 150401, 2010.
[11] D. D. Ariananda and G. Leus, "Compressive wideband power spectrum estimation," IEEE Transactions on Signal Processing, vol. 60, no. 9, pp. 4775–4789, 2012.
[12] H. Kim, A. M. Haimovich, and Y. C. Eldar, "Non-coherent direction of arrival estimation from magnitude-only measurements," IEEE Signal Processing Letters, vol. 22, no. 7, pp. 925–929, 2015.
[13] E. Mason, I.-Y. Son, and B. Yazici, "Passive synthetic aperture radar imaging using low-rank matrix recovery methods," IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 8, pp. 1570–1582, 2015.
[14] E. J. Candès and X. Li, "Solving quadratic equations via PhaseLift when there are about as many equations as unknowns," Foundations of Computational Mathematics, vol. 14, no. 5, pp. 1017–1026, 2014.
[15] L. Demanet and P. Hand, "Stable optimizationless recovery from phaseless linear measurements," Journal of Fourier Analysis and Applications, vol. 20, no. 1, pp. 199–221, 2014.
[16] P. Hand, "PhaseLift is robust to a constant fraction of arbitrary errors," Applied and Computational Harmonic Analysis, 2016.
[17] M. Kabanava, R. Kueng, H. Rauhut, and U. Terstiege, "Stable low-rank matrix recovery via null space properties," arXiv preprint arXiv:1507.07184, 2015.
[18] B. Recht, M. Fazel, and P. A. Parrilo, "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization," SIAM Review, vol. 52, no. 3, pp. 471–501, 2010.
[19] W. Dai, O. Milenkovic, and E. Kerman, "Subspace evolution and transfer (SET) for low-rank matrix completion," IEEE Transactions on Signal Processing, vol. 59, no. 7, pp. 3120–3132, 2011.
[20] M. Wang, W. Xu, and A. Tang, "A unique 'nonnegative' solution to an underdetermined system: From vectors to matrices," IEEE Transactions on Signal Processing, vol. 59, no. 3, pp. 1007–1016, 2011.
[21] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 3, pp. 11:1–11:37, Jun 2011.
[22] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.
[23] J. Wright, A. Ganesh, K. Min, and Y. Ma, "Compressive principal component pursuit," Information and Inference, vol. 2, no. 1, pp. 32–68, 2013.
[24] X. Li, "Compressed sensing and matrix completion with constant proportion of corruptions," Constructive Approximation, vol. 37, pp. 73–99, 2013.
[25] G. Mateos and G. B. Giannakis, "Robust PCA as bilinear decomposition with outlier-sparsity regularization," IEEE Transactions on Signal Processing, vol. 60, no. 10, pp. 5176–5190, 2012.
[26] E. J. Candès, X. Li, and M. Soltanolkotabi, "Phase retrieval via Wirtinger flow: Theory and algorithms," IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1985–2007, 2015.
[27] Y. Chen and E. J. Candès, "Solving random quadratic systems of equations is nearly as easy as solving linear systems," arXiv:1505.05114, May 2015.
[28] C. D. White, R. Ward, and S. Sanghavi, "The local convexity of solving quadratic equations," arXiv preprint arXiv:1506.07868, 2015.
[29] Y. I. Abramovich, D. A. Gray, A. Y. Gorokhov, and N. K. Spencer, "Positive-definite Toeplitz completion in DOA estimation for nonuniform linear antenna arrays. I. Fully augmentable arrays," IEEE Transactions on Signal Processing, vol. 46, no. 9, pp. 2458–2471, 1998.
[30] H. Qiao and P. Pal, "Generalized nested sampling for compressing low rank Toeplitz matrices," IEEE Signal Processing Letters, vol. 22, no. 11, pp. 1844–1848, 2015.
[31] D. Romero, D. D. Ariananda, Z. Tian, and G. Leus, "Compressive covariance sensing: Structure-based compressive sensing beyond sparsity," IEEE Signal Processing Magazine, vol. 33, no. 1, pp. 78–93, 2016.

[32] D. Romero, R. López-Valcarce, and G. Leus, “Compression limits for


random vectors with linearly parameterized second-order statistics,”
IEEE Transactions on Information Theory, vol. 61, no. 3, pp. 1410–
1425, 2015.
[33] Q. Zheng and J. Lafferty, “A convergent gradient descent algorithm for
rank minimization and semidefinite programming from random linear
measurements,” in Advances in Neural Information Processing Systems,
2015, pp. 109–117.
[34] R. Vershynin, “Introduction to the non-asymptotic analysis of random
matrices,” Compressed Sensing, Theory and Applications, pp. 210 – 268,
2012.
[35] B. Laurent and P. Massart, “Adaptive estimation of a quadratic functional
by model selection,” Annals of Statistics, pp. 1302–1338, 2000.
[36] H. Zhang, Y. Chi, and Y. Liang, “Provable non-convex phase retrieval
with outliers: Median truncated wirtinger flow,” in International Con-
ference on Machine Learning (ICML), New York, NY, 2016.
