Low-Rank Positive Semidefinite Matrix Recovery From Corrupted Rank-One Measurements
arXiv:1602.02737v2 [cs.IT] 31 Aug 2016

Abstract—We study the problem of estimating a low-rank positive semidefinite (PSD) matrix from a set of rank-one measurements using sensing vectors composed of i.i.d. standard Gaussian entries, which are possibly corrupted by arbitrary outliers. This problem arises from applications such as phase retrieval, covariance sketching, quantum state tomography, and power spectrum estimation. We first propose a convex optimization algorithm that seeks the PSD matrix with the minimum ℓ1-norm of the observation residual. The advantage of our algorithm is that it is free of parameters, therefore eliminating the need …

…where $X_0 = x_0 x_0^T$ is a lifted rank-one matrix from the signal $x_0$ of interest. On the other hand, they could arise by design, such as from the covariance sketching scheme considered in [7], where $z_i$ is aggregated from squared intensity measurements of $L$ data samples of a zero-mean ergodic data stream $\{x_l\}_{l=1}^{\infty}$ as

$$ z_i = \frac{1}{L}\sum_{l=1}^{L} |\langle a_i, x_l \rangle|^2 = a_i^T \Big( \frac{1}{L}\sum_{l=1}^{L} x_l x_l^T \Big) a_i \approx a_i^T X_0 a_i. $$
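As a quick sanity check of the display above: the aggregated squared intensity measurements equal the quadratic form of the sample covariance exactly, and concentrate around $a_i^T X_0 a_i$ as $L$ grows. A minimal numpy sketch (all dimensions, the seed, and variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 8, 500

# A zero-mean data stream x_l = U0 @ g_l with a rank-2 covariance X0 = U0 @ U0.T
U0 = rng.standard_normal((n, 2))
X0 = U0 @ U0.T
xs = U0 @ rng.standard_normal((2, L))     # columns are the samples x_l

a = rng.standard_normal(n)                # one sensing vector a_i

# Aggregate squared intensity measurements of the stream, as in the display above
z = np.mean((a @ xs) ** 2)

# The same quantity written through the sample covariance (an exact identity) ...
cov_hat = (xs @ xs.T) / L
assert np.isclose(z, a @ cov_hat @ a)
# ... which concentrates around a^T X0 a as L grows (ergodicity)
```
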
…presence of outliers. Fortunately, it is reasonable to assume that the number of outliers is usually much smaller than the total number of measurements, making it possible to leverage the sparsity of the outliers to faithfully recover the low-rank PSD matrix of interest.

We first propose a convex optimization algorithm that seeks the PSD matrix minimizing the ℓ1-norm of the measurement residual, where the ℓ1-norm is adopted to promote outlier sparsity. The proposed convex program is free of tuning parameters, and it eliminates the need for trace minimization, a popular convex surrogate for low-rank matrix recovery, by only enforcing the PSD constraint. Nor does it require any knowledge of the outliers, not even of their existence. When the sensing vectors are composed of i.i.d. standard Gaussian entries, we establish that for a fixed $n \times n$ rank-$r$ PSD matrix, as long as the number of measurements exceeds the order of $nr^2$, the proposed convex program can exactly recover it with high probability, even when a fraction on the order of $1/r$ of the measurements is arbitrarily corrupted. Our measurement complexity is order-wise near-optimal up to a factor of $r$, and is near-optimal in the rank-one case up to a constant factor. Furthermore, the recovery is also stable against additive bounded noise. While the proposed convex program coincides with a version of the PhaseLift algorithm [14]–[16] studied in the literature for phase retrieval, our work provides its first theoretical performance guarantee for recovering low-rank PSD matrices in the presence of arbitrary outliers. Moreover, we show via numerical simulations that the proposed approach can be easily extended to recover low-rank Toeplitz PSD matrices.

To further reduce the computational burden for large-scale problems, we next develop a non-convex algorithm based on subgradient descent when the rank of the PSD matrix, or an upper bound on it, is known a priori. Since any rank-$r$ PSD matrix can be uniquely decomposed, up to orthonormal transformations, as $X_0 = U_0 U_0^T$ with $U_0 \in \mathbb{R}^{n \times r}$, it suffices to recover $U_0$ without constructing the PSD matrix explicitly. The subgradient descent algorithm then iteratively updates the estimate by descending along a subgradient of the ℓ1-norm of the measurement residual, using a properly selected step size and spectral initialization. We conduct extensive numerical experiments to demonstrate its excellent empirical performance, and compare it against the convex program proposed above as well as other alternative approaches in the literature.

B. Organization

The rest of the paper is organized as follows. Section II presents the proposed convex optimization algorithm and its corresponding performance guarantee, with detailed comparisons to related work. Section III describes the proposed non-convex subgradient descent algorithm, which is computationally efficient with excellent empirical performance. Numerical examples are provided in Section IV. The proof of the main theorem is given in Section V. Finally, we conclude in Section VI.

II. PARAMETER-FREE CONVEX RELAXATION

A. Problem Formulation

Let $X_0 \in \mathbb{R}^{n\times n}$ be a rank-$r$ PSD matrix. The set of $m$ measurements, which may be corrupted by either arbitrary outliers or bounded noise, can be represented as

$$ z = \mathcal{A}(X_0) + \beta + w, \qquad (4) $$

where $z, \beta, w \in \mathbb{R}^m$. The linear mapping $\mathcal{A}: \mathbb{R}^{n\times n} \rightarrow \mathbb{R}^m$ is defined as $\mathcal{A}(X_0) = \{a_i^T X_0 a_i\}_{i=1}^m$, where $a_i \in \mathbb{R}^n$ is the $i$th sensing vector, composed of i.i.d. standard Gaussian entries, $i = 1, \ldots, m$. The vector $\beta$ denotes the outlier vector, which is assumed to be sparse, with entries that can be arbitrarily large. The fraction of nonzero entries is defined as $s := \|\beta\|_0/m$. Moreover, the vector $w$ denotes the additive noise, which is assumed bounded as $\|w\|_1 \le \epsilon$. Our goal is to robustly recover $X_0$ from the measurements $z$.

B. Recovery via Convex Relaxation

To motivate our algorithm, consider the case when only the outlier vector $\beta$ is present in (4) and the rank of $X_0$ is known. One may seek a rank-$r$ PSD matrix that minimizes the cardinality of the measurement residual, so as to promote outlier sparsity:

$$ \hat{X} = \arg\min_{X \succeq 0} \|z - \mathcal{A}(X)\|_0, \quad \text{s.t. } \mathrm{rank}(X) = r. \qquad (5) $$

However, both the cardinality minimization and the rank constraint are NP-hard in general, making this method computationally infeasible. A common approach is convex relaxation: we relax the cardinality objective to its convex surrogate, the ℓ1-norm, and meanwhile drop the rank constraint, yielding

$$ \text{(Robust-PhaseLift:)} \quad \hat{X} = \arg\min_{X \succeq 0} \|z - \mathcal{A}(X)\|_1. \qquad (6) $$

We denote the above convex program as the Robust-PhaseLift algorithm, since it coincides with the PhaseLift algorithm studied in [14]–[16] for phase retrieval¹. The advantage of Robust-PhaseLift in (6) is that it does not require any prior knowledge of the noise bound, the rank of $X_0$, or the sparsity level of the outliers, and it is free of any regularization parameter. It is also worth emphasizing that, due to the special rank-one measurement operator, in (6) it is possible to enforce only the PSD constraint, without motivating the low-rank structure explicitly via, for example, trace minimization².

Encouragingly, we demonstrate in Theorem 1 that the algorithm (6) admits robust recovery of a rank-$r$ PSD matrix as soon as the number of measurements is large enough, even with a fraction of arbitrary outliers. To the best of our knowledge, this is the first theoretical performance guarantee of the robustness of (6) with respect to arbitrary outliers in the low-rank setting. Our main theorem is given below.

¹Note that there are a few different versions of PhaseLift in the literature which are not outlier-robust; we therefore rename (6) Robust-PhaseLift for emphasis.
²The interested reader is invited to consult Fig. 1 in [2] for an intuitive geometric interpretation in the noise-free and outlier-free case.
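The measurement model (4) is easy to simulate, and the simulation illustrates why the ℓ1 objective in (6) needs no side information: at the ground truth, the residual of (6) reduces to $\|\beta + w\|_1$, so the objective sees only the sparsity of the outliers, not their magnitude. A toy numpy sketch (all sizes and the helper `calA` are our own choices, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 10, 2, 200
s = 0.05                                  # fraction of outliers

U0 = rng.standard_normal((n, r))
X0 = U0 @ U0.T                            # ground-truth rank-r PSD matrix

A = rng.standard_normal((m, n))           # rows are the sensing vectors a_i

def calA(X):
    """The rank-one measurement map A(X) = {a_i^T X a_i}."""
    return np.einsum('ij,jk,ik->i', A, X, A)

beta = np.zeros(m)                        # sparse, arbitrarily large outliers
supp = rng.choice(m, size=int(s * m), replace=False)
beta[supp] = 1e3 * rng.standard_normal(supp.size)
w = rng.uniform(-1e-3, 1e-3, size=m)      # bounded noise

z = calA(X0) + beta + w                   # the corrupted measurements (4)

# At the ground truth, the l1 residual of (6) is exactly ||beta + w||_1
assert np.isclose(np.abs(z - calA(X0)).sum(), np.abs(beta + w).sum())
```
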
Theorem 1. Suppose that $\|w\|_1 \le \epsilon$ and $s = \|\beta\|_0/m$. Assume the support of $\beta$ is selected uniformly at random, with the signs of its nonzero entries generated from the Rademacher distribution, i.e., $\mathbb{P}\{\mathrm{sgn}(\beta_i) = -1\} = \mathbb{P}\{\mathrm{sgn}(\beta_i) = 1\} = 1/2$ for each $i \in \mathrm{supp}(\beta)$. Then for a fixed rank-$r$ PSD matrix $X_0 \in \mathbb{R}^{n\times n}$, there exist absolute constants $c_1 > 0$ and $0 < s_0 < 1$ such that, as long as

$$ m \ge c_1 n r^2, \qquad s \le \frac{s_0}{r}, $$

the solution to (6) satisfies

$$ \|\hat{X} - X_0\|_F \le c_2 \frac{r\epsilon}{m}, $$

with probability exceeding $1 - \exp(-\gamma m/r^2)$ for some constants $c_2$ and $\gamma$.

Theorem 1 has the following consequences.

• Exact Recovery with Outliers: When $\epsilon = 0$, Theorem 1 shows that recovery via Robust-PhaseLift (6) is exact, i.e. $\hat{X} = X_0$, even when a fraction of the measurements is arbitrarily corrupted, as long as the number of measurements $m$ is on the order of $nr^2$. Given that there are at least $nr$ unknowns, our measurement complexity is near-optimal up to a factor of $r$.

• Stable Recovery with Bounded Noise: In the presence of bounded noise, Theorem 1 shows that the recovery performance degrades gracefully as $\epsilon$ increases, with the Frobenius norm of the reconstruction error proportional to the per-entry noise level of the measurements.

• Phase Retrieval: When $r = 1$, the problem degenerates to phase retrieval, and Theorem 1 recovers the existing results in [16] for outlier-robust phase retrieval, where the measurement complexity is on the order of $n$, which is optimal up to a constant factor.

Let $\hat{X}_r = \arg\min_{\mathrm{rank}(Z)=r,\, Z \succeq 0} \|\hat{X} - Z\|_F$ denote the best rank-$r$ PSD approximation of $\hat{X}$, the solution to (6). Then Theorem 1 suggests that the estimate $\hat{X}$ can be well approximated by a rank-$r$ PSD matrix, since

$$ \|\hat{X} - \hat{X}_r\|_F \le \|\hat{X} - X_0\|_F \le c_2 \frac{r\epsilon}{m}, $$

as long as the number of measurements is sufficiently large. Furthermore, we have

$$ \|\hat{X}_r - X_0\|_F \le \|\hat{X}_r - \hat{X}\|_F + \|\hat{X} - X_0\|_F \le 2\|\hat{X} - X_0\|_F \le 2 c_2 \frac{r\epsilon}{m}, $$

indicating that $\hat{X}_r$ provides an accurate estimate of $X_0$ that is exactly rank-$r$ and PSD.

C. Comparisons to Related Work

In the absence of outliers, the PhaseLift algorithm in the following form

$$ \min_{X \succeq 0} \mathrm{Tr}(X) \quad \text{s.t. } \|z - \mathcal{A}(X)\|_1 \le \epsilon, \qquad (7) $$

where $\mathrm{Tr}(X)$ denotes the trace of $X$, has been proposed to solve the phase retrieval problem [2], [3], [14]. Later, the same algorithm was employed to recover low-rank PSD matrices in [7], where an order of $nr$ measurements obtained from i.i.d. sub-Gaussian sensing vectors is shown to guarantee exact recovery in the noise-free case and stable recovery under bounded noise. One problem with the algorithm (7) is that the noise bound $\epsilon$ is assumed known. Furthermore, it is not amenable to handling outliers, since $\|z - \mathcal{A}(X_0)\|_1$ can be arbitrarily large in their presence, and consequently the ground truth $X_0$ quickly becomes infeasible for (7).

The proposed algorithm (6) is studied in [14]–[16] as a variant of PhaseLift for phase retrieval, corresponding to the case where $X_0 = x_0 x_0^T$ is rank-one. It is shown in [14], [15] that with $O(n)$ i.i.d. Gaussian sensing vectors, the algorithm succeeds with high probability. Compared with (7), the algorithm (6) eliminates trace minimization and leads to simpler implementations. We note that [17] also considers a regularization-free algorithm for PSD matrix estimation, minimizing the ℓ2-norm of the residual, which, unfortunately, cannot handle outliers as Robust-PhaseLift (6) does. Hand [16] first considered the robustness of the Robust-PhaseLift algorithm (6) in the presence of outliers for phase retrieval, establishing that the same guarantee holds even with a constant fraction of outliers. Our work extends the performance guarantee of [16] to the general low-rank PSD matrix case.

Broadly speaking, our problem is related to low-rank matrix recovery from an under-determined linear system [18]–[20], where the linear measurements are drawn from inner products with rank-one sensing matrices. It is due to this special structure of the sensing matrices that we can eliminate the trace minimization and consider only the feasibility constraint for PSD matrices. Standard approaches for separating low-rank and sparse components [21]–[25] via convex optimization are given as

$$ \min_{X \succeq 0,\ \beta} \mathrm{Tr}(X) + \lambda\|\beta\|_1, \quad \text{s.t. } \|z - \mathcal{A}(X) - \beta\|_1 \le \epsilon, $$

where $\lambda$ is a regularization parameter that needs to be tuned properly. In contrast, the formulation (6) is parameter-free.

III. A NON-CONVEX SUBGRADIENT DESCENT ALGORITHM

In this section, we propose another algorithm for robust low-rank PSD matrix recovery from corrupted rank-one measurements, assuming the rank of the PSD matrix $X_0$ (or an upper bound on it) is known a priori to be $r$. In this case, we can decompose $X_0$ as $X_0 = U_0 U_0^T$, where $U_0 \in \mathbb{R}^{n\times r}$. Instead of directly recovering $X_0$, we may aim at recovering $U_0$ up to orthogonal transforms, since $(U_0 Q)(U_0 Q)^T = U_0 U_0^T$ for any orthonormal matrix $Q \in \mathbb{R}^{r\times r}$. Relaxing the loss function in (5) while keeping the rank constraint, we obtain the following problem:

$$ \hat{X} = \arg\min_{X \succeq 0} \|z - \mathcal{A}(X)\|_1, \quad \text{s.t. } \mathrm{rank}(X) = r. \qquad (8) $$

Since any rank-$r$ PSD matrix $X$ can be written as $X = U U^T$ for some $U \in \mathbb{R}^{n\times r}$, (8) can be equivalently reformulated as

$$ \hat{U} = \arg\min_{U \in \mathbb{R}^{n\times r}} f(U), \qquad (9) $$
where

$$ f(U) = \frac{1}{2m}\|z - \mathcal{A}(U U^T)\|_1, $$

and we write $g(U) = \frac{1}{4m}\|z - \mathcal{A}(U U^T)\|_2^2$ for its ℓ2-norm counterpart.

Fig. 1: Illustrations of the objective function $-\log f(U)$ and its ℓ2-norm counterpart $-\log g(U)$ (in negative logarithmic scale) under different corruption scenarios when $U \in \mathbb{R}^{2\times 1}$. The number of measurements is $m = 100$ with i.i.d. Gaussian sensing vectors, and the fraction of outliers is $s = 0.2$, with uniformly selected support and amplitudes drawn from Unif[0, 10] or Unif[0, 100]. It is interesting to observe that while large outliers completely distort $g(U)$, the proposed objective is quite robust, with the ground truth being the only global optimum of $f(U)$.
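A subgradient of $f$ follows from the chain rule, since the derivative of $a_i^T U U^T a_i$ with respect to $U$ is $2 a_i a_i^T U$. Below is a minimal numpy sketch of one subgradient evaluation with the convention $\mathrm{sgn}(0) = 0$; the function name and the setup are ours, not the paper's Alg. 1. On noise-free data the residual vanishes at the ground truth, so the subgradient there is zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, m = 8, 2, 100

A = rng.standard_normal((m, n))
U0 = rng.standard_normal((n, r))
z = np.einsum('ij,jk,ik->i', A, U0 @ U0.T, A)   # clean measurements

def subgrad_f(U, z, A, m):
    """One subgradient of f(U) = (1/2m) ||z - A(U U^T)||_1, with sgn(0) := 0."""
    resid = z - np.einsum('ij,jk,ik->i', A, U @ U.T, A)
    # d/dU of a_i^T U U^T a_i is 2 a_i a_i^T U
    return -(1.0 / m) * (A.T * np.sign(resid)) @ (A @ U)

G = subgrad_f(U0, z, A, m)
assert G.shape == (n, r)
assert np.allclose(G, 0)      # zero residual => zero subgradient at U0
```
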
…size of $U^{(t)}$, which is on the order of $nr$. The computational complexity per iteration is also low, on the order of … We demonstrate the excellent empirical performance of Alg. 1 in Section IV-C.
IV. NUMERICAL EXAMPLES

A. Performance of Convex Relaxation

We first examine the performance of Robust-PhaseLift in (6). Let $n = 40$. We randomly generate a low-rank PSD matrix of rank $r$ as $X_0 = U_0 U_0^T$, where $U_0 \in \mathbb{R}^{n\times r}$ is composed of i.i.d. standard Gaussian variables. The sensing vectors are also composed of i.i.d. standard Gaussian variables. Each Monte Carlo simulation is called successful if the normalized estimation error satisfies $\|\hat{X} - X_0\|_F/\|X_0\|_F \le 10^{-6}$, where $\hat{X}$ denotes the solution to (6). For each cell, the success rate is calculated by averaging over 100 Monte Carlo simulations.

Fig. 3: Phase transitions of low-rank PSD matrix recovery with respect to (a) the number of measurements and the rank, with 5% of the measurements corrupted by standard Gaussian variables; (b) the percent of outliers and the rank, when the number of measurements is $m = 600$ and $n = 40$.

B. Convex Relaxation with Additional Toeplitz Structure

We next consider robust recovery of low-rank Toeplitz PSD matrices, where we allow complex-valued sensing vectors, $\mathcal{A}(X) = \{a_i^H X a_i\}_{i=1}^m$, and complex-valued Toeplitz PSD matrices $X$. Estimating low-rank Toeplitz PSD matrices is of great interest in array signal processing [29]. We modify (6) by incorporating the Toeplitz constraint.
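The constrained program itself is not spelled out in this excerpt; one standard way to impose a Toeplitz constraint numerically (an assumption of ours, not necessarily the authors' formulation) is orthogonal projection onto the subspace of Toeplitz matrices by averaging each diagonal:

```python
import numpy as np

def toeplitz_project(X):
    """Project a square matrix onto Toeplitz matrices by diagonal averaging."""
    n = X.shape[0]
    T = np.zeros_like(X)
    for k in range(-n + 1, n):
        d = np.diagonal(X, offset=k).mean()
        idx = np.arange(max(0, -k), min(n, n - k))
        T[idx, idx + k] = d
    return T

X = np.arange(9.0).reshape(3, 3)
T = toeplitz_project(X)
assert np.allclose(T[:-1, :-1], T[1:, 1:])     # result has constant diagonals
assert np.allclose(toeplitz_project(T), T)     # projection is idempotent
```

Being a linear projection, this operation is straightforward to embed in alternating or projected schemes that also enforce the PSD constraint.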
Let $n = 64$. The Toeplitz PSD matrix $X_0$ is generated as $X_0 = V \Sigma V^H$, where $V = [v(f_1), \ldots, v(f_r)] \in \mathbb{C}^{n\times r}$ is a Vandermonde matrix with $v(f_i) = [1, e^{j2\pi f_i}, \ldots, e^{j2\pi(n-1)f_i}]^T$, $f_i \sim$ …
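The Vandermonde construction above can be sketched and checked directly; such a matrix is exactly Hermitian Toeplitz, PSD, and of rank $r$ (the frequencies in `f` and the weights in `Sigma` below are illustrative, not the paper's random draws):

```python
import numpy as np

n, r = 16, 3
f = np.array([0.11, 0.27, 0.40])                       # hypothetical frequencies f_i
V = np.exp(2j * np.pi * np.outer(np.arange(n), f))     # columns are v(f_i)
Sigma = np.diag([2.0, 1.0, 0.5])                       # positive weights
X0 = V @ Sigma @ V.conj().T                            # rank-r Toeplitz PSD matrix

assert np.linalg.matrix_rank(X0) == r
assert np.allclose(X0, X0.conj().T)                    # Hermitian
assert np.allclose(X0[:-1, :-1], X0[1:, 1:])           # Toeplitz (constant diagonals)
assert np.linalg.eigvalsh(X0).min() > -1e-9            # PSD up to rounding
```
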
…numerical observation for the phase retrieval problem [4] … rank-one measurements. … The number of iterations is set as $T_{\max} = 3 \times 10^4$, which is a large … deemed successful if the normalized estimation error satisfies …

Fig. 5: Phase transitions of low-rank PSD matrix recovery with respect to the number of measurements and the rank for the proposed Alg. 1, using noise-free measurements, when $n = 40$.

Fig. 5 shows the success rate of Alg. 1 with respect to the number of measurements and the rank under the same setup as Fig. 2, for noise-free measurements, when $n = 40$. Indeed, Alg. 1 empirically performs similarly to the convex algorithms, but with a much lower computational cost. Moreover, the proposed Alg. 1 allows perfect recovery even in the presence of outliers. For comparison, we implement the extension of the Wirtinger Flow (WF) algorithm [26], [28], [33] to the low-rank case, which minimizes the squared ℓ2-norm of the residual, with the per-iteration update rule

$$ U^{(t+1)} = U^{(t)} + \mu_t^{\mathrm{WF}} \frac{1}{m}\sum_{i=1}^{m} \left( z_i - \|(U^{(t)})^T a_i\|_2^2 \right) a_i a_i^T U^{(t)}, $$

using the same initialization (11). The step size is set as $\mu_t^{\mathrm{WF}} = 0.1/\|U_0\|_F^2$. Fig. 6(a) shows the success rates of Alg. 1 with respect to the percent of outliers and the rank …

D. Comparisons with Additional Bounded Noise

Finally, we compare the two proposed algorithms (Robust-PhaseLift in (6) and Alg. 1), the WF algorithm, and the PhaseLift algorithm in (7) when the measurements are corrupted by both outliers and bounded noise. Fix $n = 40$ and $r = 3$. The rank-$r$ PSD matrix $X_0$, the sensing vectors, and the outliers are generated as before, with the fraction of outliers set to 5%. Moreover, each entry of the bounded noise $w$ is drawn i.i.d. from Unif$[-4/m, 4/m]$, so that $\|w\|_1 \le \epsilon$ with $\epsilon = 4$. Fig. 7 depicts the mean squared error $\|\hat{X} - X_0\|_F^2$ of the different algorithms with respect to the number of measurements, where $\hat{X}$ is the estimated PSD matrix. For the subgradient descent algorithm in Alg. 1, various ranks are used as prior information, corresponding to the correct rank $r$, its underestimate $r-1$, and its overestimate $r+1$. It can be seen that Alg. 1 works well as long as the given rank provides an upper bound on the true rank, and it performs much better than the WF algorithm, which is not outlier-robust. On the other hand, the PhaseLift algorithm (7) does not admit favorable performance for various constraint parameters ($\epsilon$, $2\epsilon$, $4\epsilon$), as expected, since the outliers do not fall within the prescribed noise bound. In fact, it fails to return any feasible solution when the number and amplitudes of the outliers are too large in our simulation. In contrast, Robust-PhaseLift allows stable recovery even with the additional bounded noise, and performs comparably to Alg. 1 with the correct model order.

Fig. 7: Comparisons of mean squared errors of the different algorithms (Robust-PhaseLift; Alg. 1 with $r$, $r+1$, and $r-1$; Wirtinger Flow; PhaseLift with $\epsilon$, $2\epsilon$, and $4\epsilon$) with respect to the number of measurements, with 5% outliers and bounded noise, when $n = 40$ and $r = 3$.
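The WF-style update quoted above can be sketched in a few lines; on clean measurements the ground truth is a fixed point of the iteration, since the residual vanishes there (the step size and dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, m = 8, 2, 120
A = rng.standard_normal((m, n))
U0 = rng.standard_normal((n, r))
z = ((A @ U0) ** 2).sum(axis=1)          # z_i = ||U0^T a_i||_2^2, noise-free

def wf_step(U, mu=0.1):
    """One WF-style update: U + (mu/m) sum_i (z_i - ||U^T a_i||^2) a_i a_i^T U."""
    resid = z - ((A @ U) ** 2).sum(axis=1)
    return U + (mu / m) * (A.T * resid) @ (A @ U)

# The ground truth is a fixed point of the update on clean data
assert np.allclose(wf_step(U0), U0)
```

A single large outlier added to `z` breaks this fixed-point property, which is consistent with the observed fragility of the squared-ℓ2 objective under corruption.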
V. PROOF OF MAIN THEOREM

In this section we prove Theorem 1; the roadmap of our proof is as follows. In Section V-A, we first provide the sufficient conditions for an approximate dual certificate that certifies the optimality of the proposed algorithm (6), in Lemma 1. Section V-B records a few lemmas showing that $\mathcal{A}$ satisfies the required restricted isometry properties. Then, a dual certificate is constructed and validated for a fixed low-rank PSD matrix $X_0$ in Section V-C. Finally, the proof is concluded in Section V-D.

First we introduce some additional notation. Let $S$ be a subset of $\{1, 2, \ldots, m\}$; then $S^\perp$ is the complement of $S$ with respect to $\{1, 2, \ldots, m\}$. $\mathcal{A}_S$ is the mapping $\mathcal{A}$ constrained to $S$, defined as $\mathcal{A}_S(X) = \{a_i^T X a_i\}_{i\in S}$. Denote the adjoint operator of $\mathcal{A}$ by $\mathcal{A}^*(\mu) = \sum_{i=1}^m \mu_i a_i a_i^T$, where $\mu_i$ is the $i$th entry of $\mu$, $1 \le i \le m$. We use $\|X\|$, $\|X\|_F$ and $\|X\|_1$ to denote the spectral norm, the Frobenius norm and the nuclear norm of a matrix $X$, respectively, and use $\|x\|_p$ to denote the ℓp-norm of a vector $x$. Let the singular value decomposition of the fixed rank-$r$ PSD matrix $X_0$ be $X_0 = U\Lambda U^T$; then the symmetric tangent space $T$ at $X_0$ is given by

$$ T := \left\{ U Z^T + Z U^T \mid Z \in \mathbb{R}^{n\times r} \right\}. $$

We denote by $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$ the orthogonal projections onto $T$ and its orthogonal complement, respectively. For notational simplicity, we write $H_T := \mathcal{P}_T(H)$ and $H_{T^\perp} := H - \mathcal{P}_T(H)$ for any symmetric matrix $H \in \mathbb{R}^{n\times n}$. Moreover, $\gamma$, $c$, $c_1$ and $c_2$ represent absolute constants whose values may change from line to line.

A. Approximate Dual Certificate

The following lemma shows that, under appropriate restricted-isometry-preserving properties of $\mathcal{A}$, a properly constructed dual certificate guarantees faithful recovery by the proposed algorithm (6).

Lemma 1 (Approximate Dual Certificate for (6)). Consider a subset $S$ with $|S| := \left\lceil \frac{s_0}{13\sqrt{2r}}\, m \right\rceil$, where $0 < s_0 < 1$ is some constant, such that the support of $\beta$ satisfies $\mathrm{supp}(\beta) \subseteq S$. Suppose that the mapping $\mathcal{A}$ obeys, for all symmetric matrices $X$,

$$ \frac{1}{m}\|\mathcal{A}(X)\|_1 \le \left(1 + \frac{1}{10}\right)\|X\|_1, \qquad (14) $$

and

$$ \frac{1}{|S|}\|\mathcal{A}_S(X)\|_1 \le \left(1 + \frac{1}{10}\right)\|X\|_1, \qquad (15) $$

and, for all matrices $X \in T$,

$$ \frac{1}{|S^\perp|}\|\mathcal{A}_{S^\perp}(X)\|_1 > \frac{1}{5}\left(1 - \frac{1}{12}\right)\|X\|_F, \qquad (16) $$

where $\mathcal{A}_S$ and $\mathcal{A}_{S^\perp}$ are the operators constrained to $S$ and $S^\perp$, respectively. Then, if there exists a matrix $Y = \mathcal{A}^*(\mu)$ satisfying

$$ Y_{T^\perp} \preceq -\frac{1}{r} I_{T^\perp}, \qquad \|Y_T\|_F \le \frac{1}{13r}, \qquad (17) $$

and

$$ \mu_i = \frac{9}{m}\,\mathrm{sgn}(\beta_i), \ i \in \mathrm{supp}(\beta); \qquad |\mu_i| \le \frac{9}{m}, \ i \notin \mathrm{supp}(\beta), \qquad (18) $$

the solution to (6) satisfies

$$ \|\hat{X} - X_0\|_F \le c\,\frac{r\epsilon}{m}, $$

where $c$ is a constant.

Proof: Denote the solution to (6) by $\hat{X} = X_0 + H \ne X_0$; then $\hat{X} \succeq 0$, $H_{T^\perp} \succeq 0$, and furthermore,

$$ \|\mathcal{A}(H) - (\beta + w)\|_1 = \|z - \mathcal{A}(X_0 + H)\|_1 = \|z - \mathcal{A}(\hat{X})\|_1 \le \|z - \mathcal{A}(X_0)\|_1 = \|\beta + w\|_1, $$

where the inequality follows from the optimality of $\hat{X}$, since both $\hat{X}$ and $X_0$ are feasible for (6). Since

$$ \|\mathcal{A}(H) - (\beta + w)\|_1 = \|\mathcal{A}_S(H) - \beta - w_S\|_1 + \|\mathcal{A}_{S^\perp}(H) - w_{S^\perp}\|_1 $$

and

$$ \|\beta + w\|_1 = \|\beta + w_S\|_1 + \|w_{S^\perp}\|_1, $$

we have

$$ \|\mathcal{A}_{S^\perp}(H)\|_1 \le \|\mathcal{A}_{S^\perp}(H) - w_{S^\perp}\|_1 + \|w_{S^\perp}\|_1 \le \|\beta + w\|_1 - \|\mathcal{A}_S(H) - \beta - w_S\|_1 + \|w_{S^\perp}\|_1 \le \|\beta + w_S\|_1 - \|\mathcal{A}_S(H) - \beta - w_S\|_1 + 2\|w_{S^\perp}\|_1 \le \|\mathcal{A}_S(H)\|_1 + 2\|w_{S^\perp}\|_1, $$

where the last inequality follows from the triangle inequality. We can further bound

$$ \|\mathcal{A}_{S^\perp}(H_T)\|_1 \le \|\mathcal{A}_{S^\perp}(H)\|_1 + \|\mathcal{A}_{S^\perp}(H_{T^\perp})\|_1 \le \|\mathcal{A}_S(H)\|_1 + \|\mathcal{A}_{S^\perp}(H_{T^\perp})\|_1 + 2\|w_{S^\perp}\|_1 \le \|\mathcal{A}_S(H_T)\|_1 + \|\mathcal{A}_S(H_{T^\perp})\|_1 + \|\mathcal{A}_{S^\perp}(H_{T^\perp})\|_1 + 2\|w_{S^\perp}\|_1 = \|\mathcal{A}_S(H_T)\|_1 + \|\mathcal{A}(H_{T^\perp})\|_1 + 2\|w_{S^\perp}\|_1. \qquad (19) $$

Our assumptions on $\mathcal{A}$ imply that

$$ \left(1 + \frac{1}{10}\right)\mathrm{Tr}(H_{T^\perp}) \ge \frac{1}{m}\|\mathcal{A}(H_{T^\perp})\|_1 \ge \frac{1}{m}\left( \|\mathcal{A}_{S^\perp}(H_T)\|_1 - \|\mathcal{A}_S(H_T)\|_1 - 2\|w_{S^\perp}\|_1 \right) \ge \frac{|S^\perp|}{5m}\left(1 - \frac{1}{12}\right)\|H_T\|_F - \frac{|S|}{m}\left(1 + \frac{1}{10}\right)\|H_T\|_1 - \frac{2\epsilon}{m}, $$

where the first inequality follows from (14), since $\|H_{T^\perp}\|_1 = \mathrm{Tr}(H_{T^\perp})$ as $H_{T^\perp} \succeq 0$; the second inequality follows from (19); and the last inequality follows from (15) and (16). This gives

$$ \mathrm{Tr}(H_{T^\perp}) \ge \left( \frac{|S^\perp|}{6m} - \frac{|S|}{m}\sqrt{2r} \right)\|H_T\|_F - \frac{2\epsilon}{m}, \qquad (20) $$

where we use the inequality $\|H_T\|_1 \le \sqrt{2r}\,\|H_T\|_F$.

On the other hand, since $\frac{m}{9}\mu$ is a subgradient of the ℓ1-norm at $\beta$ by (18), we have

$$ \|\beta\|_1 + \left\langle \frac{m}{9}\mu,\ w - \mathcal{A}(H) \right\rangle \le \|w + \beta - \mathcal{A}(H)\|_1 \le \|\beta + w\|_1 \le \|\beta\|_1 + \|w\|_1, $$
which, after a simple transformation, gives

$$ \langle \mu, \mathcal{A}(H) \rangle \ge \langle \mu, w \rangle - \frac{9}{m}\|w\|_1 \ge -\left( \|\mu\|_\infty + \frac{9}{m} \right)\|w\|_1 \ge -\frac{18\epsilon}{m}. $$

Then, with $\langle H, Y \rangle = \langle \mathcal{A}(H), \mu \rangle$, we get

$$ -\frac{18\epsilon}{m} \le \langle \mathcal{A}(H), \mu \rangle = \langle H, Y \rangle = \langle H_T, Y_T \rangle + \langle H_{T^\perp}, Y_{T^\perp} \rangle \le \|Y_T\|_F\|H_T\|_F - \frac{1}{r}\langle H_{T^\perp}, I_{T^\perp} \rangle \le \frac{1}{13r}\|H_T\|_F - \frac{1}{r}\mathrm{Tr}(H_{T^\perp}), $$

which gives

$$ \mathrm{Tr}(H_{T^\perp}) \le \frac{1}{13}\|H_T\|_F + \frac{18 r\epsilon}{m}. \qquad (21) $$

Combining with (20), we know

$$ \left( \frac{|S^\perp|}{6m} - \frac{|S|}{m}\sqrt{2r} \right)\|H_T\|_F - \frac{2\epsilon}{m} \le \frac{1}{13}\|H_T\|_F + \frac{18 r\epsilon}{m}. $$

Since $\frac{|S^\perp|}{6m} - \frac{|S|}{m}\sqrt{2r} - \frac{1}{13} > 0$ under the assumption on $|S|$ in Lemma 1, we have

$$ \|H_T\|_F \le \frac{20 r\epsilon/m}{\frac{|S^\perp|}{6m} - \frac{|S|}{m}\sqrt{2r} - \frac{1}{13}} \le c_1 \frac{r\epsilon}{m}, $$

where $c_1$ is some fixed constant. Finally, we have

$$ \|\hat{X} - X_0\|_F \le \|H_T\|_F + \|H_{T^\perp}\|_F \le \|H_T\|_F + \mathrm{Tr}(H_{T^\perp}) \le \left(1 + \frac{1}{13}\right)\|H_T\|_F + \frac{18 r\epsilon}{m} \le c\,\frac{r\epsilon}{m}, $$

for some constant $c$.

B. Restricted Isometry of $\mathcal{A}$

The first two conditions (14) and (15) in Lemma 1 are supplied straightforwardly by the following lemma, as long as $m \ge cnr$ and $|S| = c_1 m/r \ge c_2 n$ for some constants $c$, $c_1$ and $c_2$.

Lemma 2 ([2]). Fix any $\delta \in (0, \frac{1}{2})$ and assume $m \ge 20\delta^{-2} n$. Then for all PSD matrices $X$, one has

$$ (1 - \delta)\|X\|_1 \le \frac{1}{m}\|\mathcal{A}(X)\|_1 \le (1 + \delta)\|X\|_1 $$

with probability exceeding $1 - 2e^{-m\varepsilon^2/2}$, where $\varepsilon^2 + \varepsilon = \delta/4$. The right-hand side holds for all symmetric matrices.

The third condition (16) in Lemma 1 can be obtained using the mixed-norm RIP-ℓ2/ℓ1 provided in [7], as long as $m \ge cnr$.

Lemma 3 ([7]). Suppose the sensing vectors $a_i$ are composed of i.i.d. sub-Gaussian entries. Then there exist positive universal constants $c_1, c_2, c_3$ such that, provided $m > c_3 nr$, for all matrices $X$ of rank at most $r$, one has

$$ \left(1 - \delta_r^{lb}\right)\|X\|_F \le \frac{2}{m}\|\mathcal{B}(X)\|_1 \le \left(1 + \delta_r^{ub}\right)\|X\|_F, $$

with probability exceeding $1 - c_1 e^{-c_2 m}$, where $\delta_r^{lb}$ and $\delta_r^{ub}$ are defined as the RIP-ℓ2/ℓ1 constants. Here the operator $\mathcal{B}$ represents the linear transformation that maps $X \in \mathbb{R}^{n\times n}$ to $\{\mathcal{B}_i(X)\}_{i=1}^{m/2} \in \mathbb{R}^{m/2}$, where $\mathcal{B}_i(X) := \langle a_{2i-1}a_{2i-1}^T - a_{2i}a_{2i}^T, X \rangle$.

The third condition (16) can then be validated from the lower bound by setting $\delta_r^{lb}$ appropriately, since

$$ \frac{2}{m}\|\mathcal{B}(X)\|_1 \le \frac{2}{m}\sum_{i=1}^{m/2}\left( |\langle a_{2i-1}a_{2i-1}^T, X\rangle| + |\langle a_{2i}a_{2i}^T, X\rangle| \right) = \frac{2}{m}\|\mathcal{A}(X)\|_1. $$

C. Construction of the Dual Certificate

For notational simplicity, let $\alpha_0 := \mathbb{E}[Z^2 1_{\{|Z|\le 3\}}] \approx 0.9707$, $\beta_0 := \mathbb{E}[Z^4 1_{\{|Z|\le 3\}}] \approx 2.6728$ and $\theta_0 := \mathbb{E}[Z^6 1_{\{|Z|\le 3\}}] \approx 11.2102$ for a standard Gaussian random variable $Z$, where $1_E$ is the indicator function of an event $E$.

Since the singular value decomposition of a PSD matrix $X_0$ of rank at most $r$ can be represented as $X_0 = \sum_{i=1}^r \lambda_i u_i u_i^T$, inspired by [14], [16] we construct $Y$ as

$$ Y := \frac{1}{m}\sum_{j\in S^\perp} \left[ \frac{1}{r}\sum_{i=1}^r (a_j^T u_i)^2 1_{\{|a_j^T u_i|\le 3\}} - \left(\alpha_0 + \frac{\beta_0 - \alpha_0}{r}\right) \right] a_j a_j^T + \frac{9}{m}\sum_{j\in S}\chi_j a_j a_j^T := Y^{(0)} - Y^{(1)} + Y^{(2)}, $$

where

$$ Y^{(0)} = \frac{1}{m}\sum_{j\in S^\perp} \left[ \frac{1}{r}\sum_{i=1}^r (a_j^T u_i)^2 1_{\{|a_j^T u_i|\le 3\}} \right] a_j a_j^T, \qquad Y^{(1)} = \frac{1}{m}\sum_{j\in S^\perp} \left( \alpha_0 + \frac{\beta_0 - \alpha_0}{r} \right) a_j a_j^T, \qquad Y^{(2)} = \frac{9}{m}\sum_{j\in S}\chi_j a_j a_j^T. $$

We set $\chi_j = \mathrm{sgn}(\beta_j)$ if $j \in \mathrm{supp}(\beta)$; otherwise the $\chi_j$ are i.i.d. Rademacher random variables with $\mathbb{P}\{\chi_j = 1\} = \mathbb{P}\{\chi_j = -1\} = 1/2$.

The construction immediately indicates that $Y$ satisfies (18). We will show that $Y$ satisfies (17) with high probability. In what follows, we separate the constructed $Y$ into two parts and bound $Y^{(0)} - Y^{(1)}$ and $Y^{(2)}$ respectively.
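The truncated moments $\alpha_0$, $\beta_0$, $\theta_0$ quoted above can be verified in closed form: integration by parts gives $\int_{-a}^{a} z^2\phi(z)\,dz = \mathrm{erf}(a/\sqrt{2}) - 2a\phi(a)$ and the recursions below, where $\phi$ is the standard Gaussian density. A stdlib-only sketch:

```python
import math

# Closed-form truncated moments of a standard Gaussian Z on {|Z| <= 3}:
#   E[Z^2 1] = erf(3/sqrt(2)) - 2*3*phi(3)
#   E[Z^4 1] = 3*E[Z^2 1] - 2*3^3*phi(3)
#   E[Z^6 1] = 5*E[Z^4 1] - 2*3^5*phi(3)
phi3 = math.exp(-4.5) / math.sqrt(2 * math.pi)      # phi(3)
alpha0 = math.erf(3 / math.sqrt(2)) - 2 * 3 * phi3
beta0 = 3 * alpha0 - 2 * 27 * phi3
theta0 = 5 * beta0 - 2 * 243 * phi3

assert abs(alpha0 - 0.9707) < 1e-3
assert abs(beta0 - 2.6728) < 1e-3
assert abs(theta0 - 11.2102) < 1e-3
```
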
1) Proof of $Y_{T^\perp} + \frac{1}{r} I_{T^\perp} \preceq 0$: First, by standard results in random matrix theory [34, Corollary 5.35], we have

$$ \left\| \frac{m}{|S^\perp|} Y^{(1)} - \left( \alpha_0 + \frac{\beta_0 - \alpha_0}{r} \right) I \right\| \le \frac{\beta_0}{40r}, $$

with probability at least $1 - 2e^{-\gamma|S^\perp|/r^2}$ for some constant $\gamma$, which gives

$$ \left\| \frac{m}{|S^\perp|} Y^{(1)}_{T^\perp} - \left( \alpha_0 + \frac{\beta_0 - \alpha_0}{r} \right) I_{T^\perp} \right\| \le \frac{\beta_0}{40r}. \qquad (22) $$

Let $a_j' = (I - U U^T) a_j$ be the projection of $a_j$ onto the orthogonal complement of the column space of $U$; then we have

$$ Y^{(0)}_{T^\perp} = \frac{1}{m}\sum_{j\in S^\perp} \epsilon_j \epsilon_j^T, $$

where $\epsilon_j = \left[ \frac{1}{r}\sum_{i=1}^r (a_j^T u_i)^2 1_{\{|a_j^T u_i|\le 3\}} \right]^{1/2} a_j'$ are i.i.d. copies of a zero-mean, isotropic, sub-Gaussian random vector $\epsilon$ satisfying $\mathbb{E}[\epsilon\epsilon^T] = \alpha_0 I_{T^\perp}$. Following [34, Theorem 5.39], we have

$$ \left\| \frac{m}{|S^\perp|} Y^{(0)}_{T^\perp} - \alpha_0 I_{T^\perp} \right\| \le \frac{\alpha_0}{40r}, \qquad (23) $$

with probability at least $1 - 2e^{-\gamma|S^\perp|/r^2}$ for some constant $\gamma$.

2) Proof of $\|Y_T\|_F \le \frac{1}{13r}$: Let $\tilde{Y} = (Y^{(0)} - Y^{(1)}) U$, and let $\tilde{Y}' = (I - U U^T)\tilde{Y}$ be the projection of $\tilde{Y}$ onto the orthogonal complement of $U$; then we have

$$ \left\| Y^{(0)}_T - Y^{(1)}_T \right\|_F^2 = \left\| U^T \tilde{Y} \right\|_F^2 + 2\left\| \tilde{Y}' \right\|_F^2. \qquad (26) $$

First consider the term $\|U^T\tilde{Y}\|_F^2$ in (26), where the $k$th column of $U^T\tilde{Y}$ can be expressed explicitly as

$$ \left( U^T\tilde{Y} \right)_k = \frac{1}{m}\sum_{j\in S^\perp} \left[ \frac{1}{r}\sum_{i=1}^r (a_j^T u_i)^2 1_{\{|a_j^T u_i|\le 3\}} - \left(\alpha_0 + \frac{\beta_0 - \alpha_0}{r}\right) \right] (a_j^T u_k)\, U^T a_j := \frac{1}{m}\Phi c_k, $$

where $\Phi \in \mathbb{R}^{r\times|S^\perp|}$ is constructed from the $U^T a_j$, and $c_k \in \mathbb{R}^{|S^\perp|}$ is composed of the $c_{k,j}$, each expressed as … with probability at least $1 - e^{-\gamma m/r^2}$, when $m \ge cnr^2$ and $|S| \le c_1 m$.

Note that the $d_{k,j}$ are i.i.d. sub-exponential random variables with $\mathbb{E}[d_{k,j}^2] = 81$ and $\left\| d_{k,j}^2 \right\|_{\psi_1} \le K$ for some constant $K$; then, based on [34, Corollary 5.17],

$$ \mathbb{P}\left\{ \Big| \sum_{j\in S}\left( d_{k,j}^2 - \mathbb{E}\, d_{k,j}^2 \right) \Big| \ge \varepsilon|S| \right\} \le 2\exp\left( -c_1\frac{\varepsilon^2|S|}{K^2} \right), $$

which indicates that, if $|S| = cm/r$ for some constant $c$,

$$ \|d_k\|_2^2 \le (81 + c_1)|S| \le 82|S| := \delta_0|S| $$

holds with probability at least $1 - e^{-\gamma m/r^2}$.

To bound the second term in (26), we can adopt the same techniques as before. The $k$th column of $\tilde{Y}'$ can be expressed explicitly as

$$ \left( \tilde{Y}' \right)_k = \frac{1}{m}\sum_{j\in S^\perp} \left[ \frac{1}{r}\sum_{i=1}^r (a_j^T u_i)^2 1_{\{|a_j^T u_i|\le 3\}} - \left(\alpha_0 + \frac{\beta_0 - \alpha_0}{r}\right) \right] (a_j^T u_k)\,(I - U U^T) a_j := \frac{1}{m}\sum_{j\in S^\perp} c_{k,j} a_j' := \frac{1}{m}\Psi c_k, $$

where $\Psi \in \mathbb{R}^{n\times|S^\perp|}$ is constructed from the $a_j'$, each of which, as a reminder, is the projection of $a_j$ onto the orthogonal complement of the column space of $U$, $a_j' = (I - U U^T) a_j$. Equivalently, $\Psi = (I - U U^T) A$, where $A \in \mathbb{R}^{n\times|S^\perp|}$ is constructed from the $a_j$, $j \in S^\perp$. For a fixed vector $x \in \mathbb{R}^{|S|}$ obeying $\|x\|_2 = 1$, $\|\bar{\Phi} x\|_2^2$ is also a chi-square random variable with $r$ degrees of freedom, so

$$ \left\| \bar{\Phi} x \right\|_2^2 \le \frac{m}{2700\,\delta_0 c\, r^2}, $$

with probability at least $1 - e^{-\gamma m/r^2}$, provided $m \ge c_1 nr^2$ for some sufficiently large constant $c_1$. Thus we have

$$ \left\| \left( U^T\bar{Y} \right)_k \right\|_2^2 = \frac{1}{m^2}\left\| \bar{\Phi}\,\frac{d_k}{\|d_k\|_2} \right\|_2^2 \|d_k\|_2^2 \le \frac{1}{2700\, r^3}. $$
Finally, we obtain that if $m \ge cnr^2$ and $|S| = c_1 m/r$ for some constants $c$ and $c_1$, then with probability at least $1 - e^{-\gamma m/r^2}$,

$$ \|Y_T\|_F = \left\| Y_T^{(0)} - Y_T^{(1)} + Y_T^{(2)} \right\|_F \le \frac{1}{15r}. \qquad (33) $$

D. Proof of Theorem 1

The required restricted isometry properties of the linear mapping $\mathcal{A}$ are supplied in Section V-B, and a valid approximate dual certificate is constructed in Section V-C; therefore, Theorem 1 follows straightforwardly from Lemma 1 in Section V-A.

VI. CONCLUSION

In this paper, we address the problem of estimating a low-rank PSD matrix $X \in \mathbb{R}^{n\times n}$ from rank-one measurements that are possibly corrupted by arbitrary outliers and bounded noise. This problem has many applications in covariance sketching, phase space tomography, and noncoherent detection in communications. It is shown that, with an order of $nr^2$ random Gaussian sensing vectors, a PSD matrix of rank $r$ can be robustly recovered with high probability by minimizing the ℓ1-norm of the observation residual within the semidefinite cone, even when a fraction of the measurements is adversarially corrupted. This convex formulation eliminates the need for trace minimization and parameter tuning, and requires no prior knowledge of the outliers. Moreover, a non-convex subgradient descent algorithm with excellent empirical performance is proposed for the case when additional information on the rank of the PSD matrix is available. For future work, it would be interesting to theoretically justify the proposed non-convex algorithm. Finally, we note that very recently one of the authors proposed a median-truncated gradient descent algorithm for phase retrieval under a constant proportion of outliers, with provable performance guarantees, in [36]; it may be possible to extend this approach to the robust low-rank PSD matrix recovery problem considered in this paper, which will be pursued elsewhere.

ACKNOWLEDGEMENT

We thank the anonymous reviewers for their valuable suggestions that greatly improved the quality of this paper.

REFERENCES

[1] J. R. Fienup, "Reconstruction of an object from the modulus of its Fourier transform," Optics Letters, vol. 3, no. 1, pp. 27–29, 1978.
[2] E. J. Candes, T. Strohmer, and V. Voroninski, "PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming," Communications on Pure and Applied Mathematics, vol. 66, no. 8, pp. 1241–1274, 2013.
[3] E. J. Candes, Y. C. Eldar, T. Strohmer, and V. Voroninski, "Phase retrieval via matrix completion," SIAM Journal on Imaging Sciences, vol. 6, no. 1, pp. 199–225, 2013.
[4] I. Waldspurger, A. d'Aspremont, and S. Mallat, "Phase recovery, MaxCut and complex semidefinite programming," Mathematical Programming, vol. 149, no. 1-2, pp. 47–81, 2015.
[5] P. Schniter and S. Rangan, "Compressive phase retrieval via generalized approximate message passing," IEEE Transactions on Signal Processing, vol. 63, no. 4, pp. 1043–1055, 2015.
[6] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, "Phase retrieval with application to optical imaging: a contemporary overview," IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 87–109, 2015.
[7] Y. Chen, Y. Chi, and A. Goldsmith, "Exact and stable covariance estimation from quadratic sampling via convex programming," IEEE Transactions on Information Theory, vol. 61, no. 7, pp. 4034–4059, July 2015.
[8] L. L. Scharf, Statistical Signal Processing. Addison-Wesley, Reading, 1991, vol. 98.
[9] A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E. D. Kolaczyk, and N. Taft, "Structural analysis of network traffic flows," in ACM SIGMETRICS Performance Evaluation Review, vol. 32, no. 1. ACM, 2004, pp. 61–72.
[10] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, "Quantum state tomography via compressed sensing," Physical Review Letters, vol. 105, no. 15, p. 150401, 2010.
[11] D. D. Ariananda and G. Leus, "Compressive wideband power spectrum estimation," IEEE Transactions on Signal Processing, vol. 60, no. 9, pp. 4775–4789, 2012.
[12] H. Kim, A. M. Haimovich, and Y. C. Eldar, "Non-coherent direction of arrival estimation from magnitude-only measurements," IEEE Signal Processing Letters, vol. 22, no. 7, pp. 925–929, 2015.
[13] E. Mason, I.-Y. Son, and B. Yazici, "Passive synthetic aperture radar imaging using low-rank matrix recovery methods," IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 8, pp. 1570–1582, 2015.
[14] E. J. Candes and X. Li, "Solving quadratic equations via PhaseLift when there are about as many equations as unknowns," Foundations of Computational Mathematics, vol. 14, no. 5, pp. 1017–1026, 2014.
[15] L. Demanet and P. Hand, "Stable optimizationless recovery from phaseless linear measurements," Journal of Fourier Analysis and Applications, vol. 20, no. 1, pp. 199–221, 2014.
[16] P. Hand, "PhaseLift is robust to a constant fraction of arbitrary errors," Applied and Computational Harmonic Analysis, 2016.
[17] M. Kabanava, R. Kueng, H. Rauhut, and U. Terstiege, "Stable low-rank matrix recovery via null space properties," arXiv preprint arXiv:1507.07184, 2015.
[18] B. Recht, M. Fazel, and P. A. Parrilo, "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization," SIAM Review, vol. 52, no. 3, pp. 471–501, 2010.
[19] W. Dai, O. Milenkovic, and E. Kerman, "Subspace evolution and transfer (SET) for low-rank matrix completion," IEEE Transactions on Signal Processing, vol. 59, no. 7, pp. 3120–3132, 2011.
[20] M. Wang, W. Xu, and A. Tang, "A unique 'nonnegative' solution to an underdetermined system: From vectors to matrices," IEEE Transactions on Signal Processing, vol. 59, no. 3, pp. 1007–1016, 2011.
[21] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 3, pp. 11:1–11:37, Jun. 2011.
[22] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.
[23] J. Wright, A. Ganesh, K. Min, and Y. Ma, "Compressive principal component pursuit," Information and Inference, vol. 2, no. 1, pp. 32–68, 2013.
[24] X. Li, "Compressed sensing and matrix completion with constant proportion of corruptions," Constructive Approximation, vol. 37, pp. 73–99, 2013.
[25] G. Mateos and G. B. Giannakis, "Robust PCA as bilinear decomposition with outlier-sparsity regularization," IEEE Transactions on Signal Processing, vol. 60, no. 10, pp. 5176–5190, 2012.
[26] E. J. Candès, X. Li, and M. Soltanolkotabi, "Phase retrieval via Wirtinger flow: Theory and algorithms," IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1985–2007, 2015.
[27] Y. Chen and E. J. Candès, "Solving random quadratic systems of equations is nearly as easy as solving linear systems," arXiv preprint arXiv:1505.05114, May 2015.
[28] C. D. White, R. Ward, and S. Sanghavi, "The local convexity of solving quadratic equations," arXiv preprint arXiv:1506.07868, 2015.
[29] Y. I. Abramovich, D. A. Gray, A. Y. Gorokhov, and N. K. Spencer, "Positive-definite Toeplitz completion in DOA estimation for nonuniform linear antenna arrays. I. Fully augmentable arrays," IEEE Transactions on Signal Processing, vol. 46, no. 9, pp. 2458–2471, 1998.
[30] H. Qiao and P. Pal, "Generalized nested sampling for compressing low rank Toeplitz matrices," IEEE Signal Processing Letters, vol. 22, no. 11, pp. 1844–1848, 2015.
[31] D. Romero, D. D. Ariananda, Z. Tian, and G. Leus, "Compressive covariance sensing: Structure-based compressive sensing beyond sparsity," IEEE Signal Processing Magazine, vol. 33, no. 1, pp. 78–93, 2016.