A Non-Convex Relaxation for Fixed-Rank Approximation

Carl Olsson (1,2), Marcus Carlsson (1), Erik Bylow (1)

(1) Centre for Mathematical Sciences, Lund University
(2) Department of Electrical Engineering, Chalmers University of Technology

arXiv:1706.05855v2 [math.OC] 10 Nov 2017

Abstract

This paper considers the problem of finding a low rank matrix from observations of linear combinations of its elements. It is well known that if the problem fulfills a restricted isometry property (RIP), convex relaxations using the nuclear norm typically work well and come with theoretical performance guarantees. On the other hand these formulations suffer from a shrinking bias that can severely degrade the solution in the presence of noise. In this theoretical paper we study an alternative non-convex relaxation that in contrast to the nuclear norm does not penalize the leading singular values and thereby avoids this bias. We show that despite its non-convexity the proposed formulation will in many cases have a single local minimizer if a RIP holds. Our numerical tests show that our approach typically converges to a better solution than nuclear norm based alternatives even in cases when the RIP does not hold.

1. Introduction

Low rank approximation is an important tool in applications such as rigid and non-rigid structure from motion, photometric stereo and optical flow [25, 4, 26, 13, 2, 12]. The rank of the approximating matrix typically describes the complexity of the solution. For example, in non-rigid structure from motion the rank measures the number of basis elements needed to describe the point motions [4]. Under the assumption of Gaussian noise the objective is typically to solve

    \min_{\mathrm{rank}(X) \le r} \|X - X_0\|_F^2,    (1)

where X_0 is a measurement matrix and ‖·‖_F is the Frobenius norm. The problem can be solved optimally using SVD [9], but the strategy is limited to problems where all matrix elements are directly measured.
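For concreteness, here is a minimal NumPy sketch (our own illustration, not from the paper) of this SVD solution of (1): the best rank-r approximation is obtained by zeroing out all but the r leading singular values.

    import numpy as np

    def best_rank_r(X0, r):
        """Solve (1): the best rank-r approximation of X0 in Frobenius norm.

        By the Eckart-Young theorem [9] this is obtained by truncating
        the SVD of X0 to its r leading singular values.
        """
        U, s, Vt = np.linalg.svd(X0, full_matrices=False)
        s[r:] = 0.0  # keep the r leading singular values, drop the rest
        return (U * s) @ Vt

    # Example: a noisy observation of a rank-3 matrix.
    rng = np.random.default_rng(0)
    X_true = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 20))
    X0 = X_true + 0.1 * rng.standard_normal((20, 20))
    print(np.linalg.matrix_rank(best_rank_r(X0, 3)))  # 3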
In this paper we will consider low rank approximation problems where linear combinations of the elements are observed. We aim to solve problems of the form

    \min_X I(\mathrm{rank}(X) \le r) + \|AX - b\|^2.    (2)

Here I(rank(X) ≤ r) is 0 if rank(X) ≤ r and ∞ otherwise. The linear operator A : R^{m×n} → R^p is assumed to fulfill a restricted isometry property (RIP) [24]

    (1 - \delta_q)\|X\|_F^2 \le \|AX\|^2 \le (1 + \delta_q)\|X\|_F^2,    (3)

for all matrices with rank(X) ≤ q. The standard approach for problems of this class is to replace the rank function with the convex nuclear norm ‖X‖_* = Σ_i σ_i(X) [24, 5]. It was first observed that this is the convex envelope of the rank function over the set {X; σ_1(X) ≤ 1} in [11]. Since then a number of generalizations that give performance guarantees for the nuclear norm relaxation have appeared, e.g. [24, 23, 5, 6]. The approach does however suffer from a shrinking bias that can severely degrade the solution in the presence of noise. In contrast to the rank constraint, the nuclear norm penalizes both small singular values of X, assumed to stem from measurement noise, and large singular values, assumed to make up the true signal, equally. In some sense the suppression of noise also requires an equal suppression of signal. Non-convex alternatives have been shown to improve performance [22, 20].

In this paper we will consider the relaxation

    \min_X R_r(X) + \|AX - b\|^2,    (4)

where

    R_r(X) = \max_Z \sum_{i=r+1}^{N} z_i^2 - \|X - Z\|^2,    (5)

and x_i and z_i, i = 1, ..., N, are the singular values of X and Z respectively. The maximization over Z does not have any closed form solution; however, it was shown in [18, 1] how to efficiently evaluate the regularizer and compute its proximal operator.
Figure 1 shows a three dimensional illustration of the level sets of the regularizer.

Figure 1: Level set surfaces {X | R_r(X) = α} for X = diag(x_1, x_2, x_3) with r = 1 (left) and r = 2 (middle). Note that when r = 1 the regularizer promotes solutions where only one of the x_k is non-zero. For r = 2 the regularizer instead favors solutions with two non-zero x_k. For comparison we also include the nuclear norm (right).

In [18, 17] it was shown that

    R_r(X) + \|X - X_0\|_F^2    (6)

is the convex envelope of

    I(\mathrm{rank}(X) \le r) + \|X - X_0\|_F^2.    (7)

By itself the regularization term R_r(X) is not convex, but when adding a quadratic term ‖X‖²_F the result is convex. It is shown in [1] that (7) and (6) have the same optimizer as long as the singular values of X_0 are distinct. (When this is not the case the minimizer is not unique.) Assuming that (3) holds, ‖AX‖² will behave roughly like ‖X‖²_F for matrices of rank less than q, and therefore it seems reasonable that (4) should have some convexity properties. In this paper we study the stationary points of (4). We show that if a RIP property holds it is in many cases possible to guarantee that any stationary point of (4) (with rank r) is unique.

A number of recent works propose to use the related ‖·‖_{r*}-norm [19, 10, 16, 15] (sometimes referred to as the spectral k-support norm). This is a generalization of the nuclear norm, which is obtained when selecting r = 1. It can be shown [19] that the extreme points of the unit ball with this norm are rank r matrices. Therefore this choice may be more appropriate than the nuclear norm when searching for solutions of a particular (known) rank. It can be seen (e.g. from the derivations in [16]) that ‖X‖²_{r*} is the convex envelope of (7) when X_0 = 0, which gives ‖X‖²_{r*} = R_r(X) + ‖X‖²_F. While the approach is convex, the extra norm penalty adds a (usually) unwanted shrinking bias similar to what the nuclear norm does. In contrast, our approach avoids this bias since it uses a non-convex regularizer. Despite this non-convexity we are still able to derive strong optimality guarantees for an important class of problem instances.

1.1. Main Results and Contributions

Our main result, Theorem 2.4, shows that if X_s is a stationary point of (4) and the singular values z_i of the matrix Z = (I − A*A)X_s + A*b fulfill z_{r+1} < (1 − 2δ_{2r})z_r, then there can not be any other stationary point with rank less than or equal to r. The matrix Z is related to the gradient of the objective function at X_s (see Section 2). The term ‖X − Z‖²_F can be seen as a local approximation of ‖AX − b‖² close to X_s (see [21]). If for example there is a rank r matrix X_0 such that b = AX_0, then it is easy to show that X_0 is a stationary point and the corresponding Z is identical to X_0. Since this means that z_{r+1} = 0, our results certify that this is the only stationary point of the problem if A fulfills (3) with δ_{2r} < 1/2. The following lemma clarifies the connection between the stationary point X_s and Z.

Lemma 1.1. The point X_s is stationary in F(X) = R_r(X) + ‖AX − b‖² if and only if 2Z ∈ ∂G(X_s), where G(X) = R_r(X) + ‖X‖²_F, and if and only if

    X_s \in \arg\min_X R_r(X) + \|X - Z\|_F^2.    (8)

(The proof is identical to that of Lemma 3.1 in [21].) In [1] it is shown that as long as z_r ≠ z_{r+1} the unique solution of (8) is the best rank r approximation of Z. When there are several singular values that are equal to z_r, (8) will have multiple solutions and some of them will not be of rank r.

In [7] the relationship between minimizers of (4) and (2) is studied. We note that [7] shows that if ‖A‖ ≤ 1 then any local minimizer of (4) is also a local minimizer of (2), and that their global minimizers coincide. In this situation local minimizers of F will therefore be rank r approximations of Z. Hence, loosely speaking, our results state that in this situation any local minimum of F is likely to be unique.

Our work builds on that of [21], which derives similar results for the non-convex regularizer

    R_\mu(X) = \sum_i \big( \mu - \max(\sqrt{\mu} - x_i, 0)^2 \big),

where x_i are the singular values of X. In this case a trade-off between rank and residual error is optimized using the formulation

    \min_X R_\mu(X) + \|AX - b\|^2.    (9)

While it can be argued that (9) and (4) are essentially equivalent, since we can iteratively search for a µ that gives the desired rank, the results of [21] may not rule out the existence of multiple high rank stationary points. In contrast, when using (4) our results imply that if z_{r+1} < (1 − 2δ_{2r})z_r then X_s is the unique stationary point of the problem. (To see this, note that if there are other stationary points we can by the preceding discussion assume that at least one is of rank r or less, which contradicts our main result in Theorem 2.4.) Hence, in this sense our results are stronger than those of [21] and allow for directly searching for a matrix of the desired rank with an essentially parameter free formulation.
1.2. Notation

In this section we introduce some preliminary material and notation. In general we will use boldface to denote a vector x and its ith element x_i. Unless otherwise stated the singular values of a matrix X will be denoted x_i and the vector of singular values x. By ‖x‖ we denote the standard euclidean norm ‖x‖ = √(x^T x). A diagonal matrix with diagonal elements x will be denoted D_x. For matrices we define the scalar product as ⟨X, Y⟩ = tr(X^T Y), where tr is the trace function, and the Frobenius norm ‖X‖_F = √⟨X, X⟩ = √(Σ_{i=1}^n x_i²). The adjoint of the linear matrix operator A is denoted A*. By ∂F(X) we mean the set of subgradients of the function F at X, and by a stationary point we mean a solution to 0 ∈ ∂F(X).

2. Optimality Conditions

Let F(X) = R_r(X) + ‖AX − b‖². We can equivalently write

    F(X) = G(X) - \delta_q \|X\|_F^2 + H(X) + \|b\|^2,    (10)

where G(X) = R_r(X) + ‖X‖²_F and H(X) = δ_q‖X‖²_F + ‖AX‖² − ‖X‖²_F − 2⟨AX, b⟩. The function G is convex and sub-differentiable. Any stationary point X_s of F therefore has to fulfill

    2\delta_q X_s - \nabla H(X_s) \in \partial G(X_s).    (11)

Computation of the gradient gives the optimality condition 2Z ∈ ∂G(X_s), where Z = (I − A*A)X_s + A*b.

2.1. Subgradients of G

For our analysis we need to determine the subdifferential ∂G(X) of the function G(X). Let x be the vector of singular values of X and X = U D_x V^T be the SVD. Using von Neumann's trace theorem it is easy to see [18] that the Z that maximizes (5) has to be of the form Z = U D_z V^T, where z are its singular values. If we let

    L(X, Z) = -\sum_{i=1}^{r} z_i^2 + 2\langle Z, X \rangle,    (12)

then we have G(X) = max_Z L(X, Z). The function L is linear in X and concave in Z. Furthermore, for any given X the corresponding maximizers can be restricted to a compact set (because of the dominating quadratic term). By Danskin's theorem, see [3], the subgradients of G are then given by

    \partial G(X) = \mathrm{convhull}\{\nabla_X L(X, Z),\ Z \in \mathcal{Z}(X)\},    (13)

where Z(X) = arg max_Z L(X, Z). We note that by concavity the maximizing set Z(X) is convex. Since ∇_X L(X, Z) = 2Z we get

    \partial G(X) = 2 \arg\max_Z L(X, Z).    (14)

To find the set of subgradients we thus need to determine all maximizers of L. Since the maximizing Z has the same U and V as X, what remains is to determine the singular values of Z. It can be shown [18] that these have the form

    z_i \in \begin{cases} \max(x_i, s) & i \le r \\ s & i > r,\ x_i \ne 0 \\ [0, s] & i > r,\ x_i = 0 \end{cases}    (15)

for some number s ≥ x_r. (The case x_i = 0, i > r is actually not addressed in [18]. However, it is easy to see that any value in [0, s] works, since z_i vanishes from (12) when x_i = 0, i > r. In fact, any value in [−s, s] works, but we use the convention that singular values are positive. Note that the columns of U that correspond to zero singular values of X are not uniquely defined. We can always achieve a decreasing sequence with z_i ∈ [0, s] by changing signs and switching order.)

For a general matrix X the value of s can not be determined analytically but has to be computed numerically by maximizing a one dimensional concave and differentiable function [18]. If rank(X) ≤ r it is however clear that the optimal choice is s = x_r. To see this we note that since the optimal Z is of the form U D_z V^T we have

    L(X, Z) = -\sum_{i=1}^{r} (z_i - x_i)^2 + \sum_{i=1}^{r} x_i^2,    (16)

if rank(X) ≤ r. Selecting s = x_r and inserting (15) into (16) gives L(X, Z) = Σ_{i=1}^r x_i², which is clearly the maximum. Hence if rank(X) = r we conclude that the subgradients of G are given by 2Z = 2U D_z V^T where

    z_i \in \begin{cases} x_i & i \le r \\ [0, x_r] & i > r \end{cases}.    (17)
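To make this concrete, the following NumPy/SciPy sketch (our own illustration, not the authors' code; the name R_r is ours) evaluates G(X) = max_Z L(X, Z), and hence R_r(X) = G(X) − ‖X‖²_F, by substituting (15) into (12) and maximizing the resulting concave, differentiable function of the single scalar s.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def R_r(X, r):
        """Evaluate the regularizer R_r(X) of (5) via G(X) = max_Z L(X, Z).

        Substituting the maximizing singular values (15), z_i = max(x_i, s)
        for i <= r and z_i = s for i > r, into (12) leaves a concave
        function of the scalar s >= x_r, which we maximize numerically
        as described in Section 2.1 (cf. [18]).
        """
        x = np.linalg.svd(X, compute_uv=False)  # singular values, decreasing
        head, tail = x[:r], x[r:]

        def f(s):  # L(X, Z(s)) as a function of s
            z = np.maximum(head, s)
            return np.sum(-z**2 + 2 * z * head) + 2 * s * np.sum(tail)

        # f is increasing for s < x_r and concave, so the maximum lies
        # in [x_r, x_1 + sum(x)].
        res = minimize_scalar(lambda s: -f(s),
                              bounds=(x[r - 1], x[0] + x.sum()),
                              method='bounded')
        G = -res.fun                   # G(X) = R_r(X) + ||X||_F^2
        return G - np.sum(x**2)

    # Sanity check: R_r vanishes on matrices of rank <= r (no penalty on
    # the leading part) and is positive otherwise.
    rng = np.random.default_rng(1)
    X_low = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 6))
    print(abs(R_r(X_low, 2)) < 1e-6)                 # True (up to tolerance)
    print(R_r(rng.standard_normal((6, 6)), 2) > 0)   # True

The sanity check illustrates why the regularizer avoids the shrinking bias of the nuclear norm: it assigns zero cost to every matrix of rank at most r.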
2.2. Growth estimates for ∂G(X)

Next we derive a bound on the growth of the subgradients that will be useful when considering the uniqueness of low rank stationary points.

Let x and x′ be two vectors both with at most r non-zero (positive) elements, and let I and I′ be the indexes of the r largest elements of x and x′ respectively. We will assume that both I and I′ contain r elements. If in particular x′ has fewer than r non-zero elements, we also include some zero elements in I′. We define the corresponding sequences z and z′ by

    z_i \in \begin{cases} x_i & i \in I \\ [0, s] & i \notin I \end{cases}, \qquad z'_i \in \begin{cases} x'_i & i \in I' \\ [0, s'] & i \notin I' \end{cases},    (18)

where s = min_{i∈I} x_i and s′ = min_{i∈I′} x′_i. If x′ has fewer than r non-zero elements then s′ = 0. Note that we do not require that the elements of the x, x′, z and z′ vectors are ordered in decreasing order. We will see later (Lemma 2.2) that in order to estimate the effects of the U and V matrices we need to be able to handle permutations of the singular values. For our analysis we will also use the quantity s̄ = max_{i∉I} z_i.

Lemma 2.1. If s̄ < cs, where 0 < c < 1, then

    \langle z' - z, x' - x \rangle > \frac{1-c}{2} \|x' - x\|^2.    (19)

Proof. Since z_i = x_i when i ∈ I and x_i = 0 otherwise, we can write the inner product ⟨z′ − z, x′ − x⟩ as

    \sum_{i \in I,\, i \in I'} (x_i - x'_i)^2 + \sum_{i \in I,\, i \notin I'} x_i (x_i - z'_i) + \sum_{i \notin I,\, i \in I'} x'_i (x'_i - z_i).    (20)

Note that

    \|x' - x\|^2 = \sum_{i \in I,\, i \in I'} (x_i - x'_i)^2 + \sum_{i \in I,\, i \notin I'} x_i^2 + \sum_{i \notin I,\, i \in I'} x_i'^2.    (21)

Since the second and third sums in (20) have the same number of terms, it suffices to show that

    x_i (x_i - z'_i) + x'_j (x'_j - z_j) \ge \frac{1-c}{2} (x_i^2 + x_j'^2),    (22)

when i ∈ I, i ∉ I′ and j ∉ I, j ∈ I′. By the assumption s̄ < cs we know that z_j < cx_i. We further know that z′_i ≤ s′ ≤ x′_j. This gives

    x_i z'_i \le x_i x'_j \le \frac{x_i^2 + x_j'^2}{2},    (23)

    x'_j z_j < c\, x'_j x_i \le c\, \frac{x_i^2 + x_j'^2}{2}.    (24)

Inserting these inequalities into the left hand side of (22) gives the desired bound.

The above result gives an estimate of the growth of the subdifferential in terms of the singular values. To derive a similar estimate for the matrix elements we need the following lemma:

Lemma 2.2. Let x, x′, z, z′ be fixed vectors with non-increasing and non-negative elements such that x ≠ x′, and let z and z′ fulfill (15) (with x and x′ respectively). Define X′ = U′ D_{x′} V′^T, X = U D_x V^T, Z′ = U′ D_{z′} V′^T, and Z = U D_z V^T as functions of unknown orthogonal matrices U, V, U′ and V′. If

    a^* = \min_{U, V, U', V'} \frac{\langle Z' - Z, X' - X \rangle}{\|X' - X\|_F^2} \le 1,    (25)

then

    a^* = \min_{M_\pi} \frac{\langle M_\pi z' - z,\ M_\pi x' - x \rangle}{\|M_\pi x' - x\|^2},    (26)

where M_π belongs to the set of permutation matrices.

The proof is almost identical to that of Lemma 4.1 in [21] and therefore we omit it. While our subdifferential is different to the one studied in [21], for both of them the z and x vectors fulfill z_i ≥ x_i ≥ 0, which is the only property that is used in the proof.

Corollary 2.3. Assume that X is of rank r and 2Z ∈ ∂G(X). If the singular values of the matrix Z fulfill z_{r+1} < cz_r, where 0 < c < 1, then for any 2Z′ ∈ ∂G(X′) with rank(X′) ≤ r we have

    \langle Z' - Z, X' - X \rangle > \frac{1-c}{2} \|X' - X\|_F^2,    (27)

as long as ‖X′ − X‖_F ≠ 0.

Proof. We let x, x′, z and z′ be the singular values of the matrices X, X′, Z and Z′ respectively. Our proof essentially follows that of Corollary 4.2 in [21], where a similar result is first proven under the assumption that x ≠ x′ and then generalized to the general case using a continuity argument. For this purpose we need to extend the infeasible interval somewhat. Since 0 < c < 1 and z_{r+1} < cz_r are open conditions, there is an ε > 0 such that z_{r+1} < (c − ε)z_r and 0 < c − ε < 1. Now assume that a* > 1 in (25); then clearly

    \langle Z' - Z, X' - X \rangle > \frac{1 - c + \varepsilon}{2} \|X' - X\|_F^2,    (28)

since (1 − c + ε)/2 < 1. Otherwise a* ≤ 1 and we have

    \frac{\langle Z' - Z, X' - X \rangle}{\|X' - X\|_F^2} \ge \frac{\langle M_\pi z' - z,\ M_\pi x' - x \rangle}{\|M_\pi x' - x\|^2}.    (29)

According to Lemma 2.1 the right hand side is strictly larger than (1 − c + ε)/2, which proves that (28) holds for all X′ with x′ ≠ x.

It remains to show that

    \langle Z' - Z, X' - X \rangle \ge \frac{1 - c + \varepsilon}{2} \|X' - X\|_F^2,    (30)

for the case x′ = x and ‖X′ − X‖_F ≠ 0. Since ε > 0 is arbitrary this proves the corollary. This can be done as in [21] using continuity of the scalar product and the Frobenius norm. Specifically, a sequence X(t) → X, when t → 0, is defined by modifying the largest singular value and letting σ_1(X(t)) = σ_1(X) + t. It is easy to verify that X(t) fulfills (28) for every t > 0. Letting t → 0 then proves (30).
2.3. Uniqueness of Low Rank Stationary Points

In this section we show that if a RIP (3) holds and the singular values z_r and z_{r+1} are well separated, there can only be one stationary point of F that has rank r. We first derive a bound on the gradients of H. We have

    \nabla H(X) = 2\delta_q X + 2(A^*A - I)X - 2A^*b.    (31)

This gives ⟨∇H(X′) − ∇H(X), X′ − X⟩ =

    2\delta_q \|X' - X\|_F^2 + 2\big(\|A(X' - X)\|^2 - \|X' - X\|_F^2\big).    (32)

By (3), |‖A(X′ − X)‖² − ‖X′ − X‖²_F| ≤ δ_q‖X′ − X‖²_F if rank(X′ − X) ≤ q, which gives

    \langle \nabla H(X') - \nabla H(X),\ X' - X \rangle \ge 0.    (33)

This leads us to our main result.

Theorem 2.4. Assume that X_s is a stationary point of F, that is, (I − A*A)X_s + A*b = Z, where 2Z ∈ ∂G(X_s), rank(X_s) = r, and the singular values of Z fulfill z_{r+1} < (1 − 2δ_{2r})z_r. If X′_s is another stationary point then rank(X′_s) > r.

Proof. Assume that rank(X′_s) ≤ r. Since both X_s and X′_s are stationary we have

    2\delta_{2r} X'_s - \nabla H(X'_s) = 2Z',    (34)

    2\delta_{2r} X_s - \nabla H(X_s) = 2Z,    (35)

where 2Z ∈ ∂G(X_s) and 2Z′ ∈ ∂G(X′_s). Taking the difference between the two equations yields

    2\delta_{2r}(X'_s - X_s) - \nabla H(X'_s) + \nabla H(X_s) = 2Z' - 2Z,    (36)

which implies

    2\delta_{2r}\|V\|_F^2 - \langle \nabla H(X'_s) - \nabla H(X_s), V \rangle = 2\langle Z' - Z, V \rangle,    (37)

where V = X′_s − X_s has rank(V) ≤ 2r. By (33) the left hand side is at most 2δ_{2r}‖V‖²_F. However, according to Corollary 2.3 (with c = 1 − 2δ_{2r}) the right hand side is larger than 2δ_{2r}‖V‖²_F, which contradicts rank(X′_s) ≤ r.

Remark. Note that [7] shows that if ‖A‖ ≤ 1 then any local minimizer of (4) is also a local minimizer of (2), and therefore of rank r. Hence any local minimizer X_s obeying the conditions of the theorem will be unique.

3. Implementation and Experiments

In this section we test the proposed approach on some simple real and synthetic applications (some that fulfill (3) and some that do not). For our implementation we use the GIST approach from [14] because of its simplicity. Given a current iterate X_k this method solves

    X_{k+1} = \arg\min_X R_r(X) + \tau_k \|X - M_k\|^2,    (38)

where M_k = X_k − (1/τ_k)(A*AX_k − A*b). Note that if τ_k = 1 then any fixed point of (38) is a stationary point by Lemma 1.1. To solve (38) we use the proximal operator computed in [18].

Our algorithm consists of repeatedly solving (38) for a sequence {τ_k}. We start from a larger value (τ_0 = 5 in our implementation) and reduce towards 1 as long as this results in decreasing objective values. Specifically, we set τ_{k+1} = (τ_k − 1)/1.1 + 1 if the previous step was successful in reducing the objective value. Otherwise we increase τ according to τ_{k+1} = 1.5(τ_k − 1) + 1.
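The following sketch shows the resulting iteration (our own Python illustration, not the authors' released code; the names gist and prox_Rr are ours). Since for τ = 1 the minimizer of (38) is the best rank r approximation of M_k whenever the relevant singular values are distinct (see the discussion following Lemma 1.1), we use a rank-r truncation as a simplified stand-in for the exact proximal operator of [18].

    import numpy as np

    def prox_Rr(M, r, tau=1.0):
        """Solve (38) for a given M_k.

        For tau = 1 the minimizer is the best rank-r approximation of M
        whenever sigma_r(M) != sigma_{r+1}(M); for general tau the exact
        operator is computed as in [18]. Here the rank-r truncation is
        used as a simplified stand-in.
        """
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        s[r:] = 0.0
        return (U * s) @ Vt

    def gist(A, b, r, shape, n_iter=500):
        """GIST iterations [14] for (4) with the tau-schedule above.

        A is a (p, m*n) matrix acting on the vectorized X (np.ravel is
        used consistently); b is a (p,) vector.
        """
        m, n = shape
        X, tau, obj = np.zeros((m, n)), 5.0, np.inf   # tau_0 = 5
        for _ in range(n_iter):
            grad = (A.T @ (A @ X.ravel() - b)).reshape(m, n)  # A*(AX - b)
            X_new = prox_Rr(X - grad / tau, r, tau)
            # Iterates have rank <= r, where R_r vanishes, so the data
            # term alone equals the objective F.
            obj_new = np.sum((A @ X_new.ravel() - b) ** 2)
            if obj_new <= obj:                  # success: move tau towards 1
                X, obj = X_new, obj_new
                tau = (tau - 1) / 1.1 + 1
            else:                               # failure: back off
                tau = 1.5 * (tau - 1) + 1
        return X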
3.1. Synthetic Data

We first evaluate the quality of the relaxation on a number of synthetic experiments. We compare the two formulations (4) and

    \min_X \mu\|X\|_* + \|AX - b\|^2.    (39)

In Figure 2 (a) we tested these two relaxations on a number of synthetic problems with varying noise levels. The data was created so that the operator A fulfills (3) with δ = 0.2. By column stacking an m × n matrix X, the linear mapping A can be represented with a matrix A of size p × mn. It is easy to see that if we let p = mn, the term (1 − δ_q) of (3) will be the same as the smallest singular value of A squared. (In this case the RIP constraint will hold for any rank and we therefore suppress the subscript q and only use δ.) For the data in Figure 2 (a) we selected 400 × 400 matrices A with random N(0, 1) (Gaussian, mean 0 and variance 1) entries and modified their singular values. We then generated 20 × 20 matrices X of rank 5 by sampling 20 × 5 matrices U and V with N(0, 1) entries and computing X = UV^T. The measurement vector b was created by computing b = AX + ε, where ε is N(0, σ²) for varying noise level σ between 0 and 1. In Figure 2 (a) we plotted the measurement fit ‖AX − b‖ versus the noise level σ for the solutions obtained with (4) and (39). Note that since the formulation (39) does not directly specify the rank of the sought matrix, we iteratively searched for the smallest value of µ that gives the correct rank, using a bisection scheme. The reason for choosing the smallest µ is that this reduces the shrinking bias to a minimum while it still gives the correct rank.

In Figure 2 (b) we computed the Z matrix and plotted the fraction of problem instances where its singular values fulfilled z_{r+1} < (1 − 2δ)z_r, with δ = 0.2. For these instances the obtained stationary points are also globally optimal according to our main results.
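Continuing the sketch above (reusing gist), the following illustrates the synthetic setup and this verification step. The text does not fully specify how the singular values of A are "modified"; here we assume they are rescaled so that all squared singular values lie in [1 − δ, 1 + δ], which makes (3) hold with constant δ for every rank.

    import numpy as np

    rng = np.random.default_rng(0)
    m = n = 20
    r, sigma, delta = 5, 0.1, 0.2

    # Operator with controlled RIP constant: rescale the singular values
    # of a random 400 x 400 matrix into [sqrt(1-delta), sqrt(1+delta)].
    A = rng.standard_normal((m * n, m * n))
    U, s, Vt = np.linalg.svd(A)
    s_new = np.sqrt(rng.uniform(1 - delta, 1 + delta, size=s.shape))
    A = (U * s_new) @ Vt

    # Ground truth rank-5 matrix and noisy measurements b = A(X) + eps.
    X_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    b = A @ X_true.ravel() + sigma * rng.standard_normal(m * n)

    X_s = gist(A, b, r, (m, n))          # candidate stationary point

    # Certificate of Theorem 2.4: z_{r+1} < (1 - 2*delta) * z_r for
    # Z = (I - A*A) X_s + A* b.
    Z = X_s - (A.T @ (A @ X_s.ravel() - b)).reshape(m, n)
    z = np.linalg.svd(Z, compute_uv=False)
    print("certified globally optimal:", z[r] < (1 - 2 * delta) * z[r - 1])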
Figure 2: (a) - Noise level (x-axis) vs. data fit ‖AX − b‖ (y-axis) for the solutions obtained with (4) and (39). (b) - Fraction of instances where the solution of (4) could be verified to be globally optimal. (c) - Same as (a). (a) and (b) use a 400 × 400 A with δ = 0.2, while (c) uses a 300 × 400 A (unknown δ). The legends distinguish R_r from ‖·‖_*.

In Figure 2 (c) we did the same experiment as in (a) but with an underdetermined A of size 300 × 400. It is known [24] that if A is p × mn and the elements of A are drawn from N(0, 1/p), then A fulfills (3) with high probability. The exact value of δ_q is however difficult to determine, and therefore we are not able to verify optimality in this case.

3.2. Non-Rigid Structure from Motion

In this section we consider the problem of Non-Rigid Structure from Motion. We follow the approach of Dai et al. [8] and let

    X = \begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \\ \vdots \\ X_F \\ Y_F \\ Z_F \end{bmatrix} \quad \text{and} \quad X^{\#} = \begin{bmatrix} X_1 & Y_1 & Z_1 \\ \vdots & \vdots & \vdots \\ X_F & Y_F & Z_F \end{bmatrix},    (40)

where X_i, Y_i, Z_i are 1 × m matrices containing the x-, y- and z-coordinates of the tracked points in image i. Under the assumption of an orthographic camera, the projection of the 3D points can be modeled using M = RX, where R is a 2F × 3F block diagonal matrix with 2 × 3 blocks R_i consisting of two orthogonal rows that encode the camera orientation in image i. The resulting 2F × m measurement matrix M consists of the x- and y-image coordinates of the tracked points. Under the assumption of a linear shape basis model [4] with r deformation modes, the matrix X^# can be factorized into X^# = CB, where the r × 3m matrix B contains the basis elements. It is clear that such a factorization is possible when X^# is of rank r. We therefore search for the matrix X^# of rank r that minimizes the residual error ‖RX − M‖²_F.

The linear operator defined by A(X^#) = RX does by itself not obey (3), since there are typically low rank matrices in its nullspace. This can be seen by noting that if N_i is the 3 × 1 vector perpendicular to the two rows of R_i, that is, R_i N_i = 0, then

    \begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix} = N_i C_i,    (41)

where C_i is any 1 × m matrix, is in the null space of R_i. Therefore any matrix of the form

    N(C) = \begin{bmatrix} n_{11} C_1 & n_{21} C_1 & n_{31} C_1 \\ n_{12} C_2 & n_{22} C_2 & n_{32} C_2 \\ \vdots & \vdots & \vdots \\ n_{1F} C_F & n_{2F} C_F & n_{3F} C_F \end{bmatrix},    (42)

where n_{ji} denotes the jth element of N_i, vanishes under A. Setting everything but the first row of N(C) to zero shows that there is a matrix of rank 1 in the null space of A. Moreover, if the rows of the optimal X^# span such a matrix, it will not be unique since we may add N(C) without affecting the projections or the rank.
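To make the construction concrete, here is a small NumPy sketch (our own, with made-up camera data) that builds the block diagonal R, the rearrangement of (40), and a rank-1 matrix N(C) of the form (42), and checks numerically that it vanishes under A(X^#) = RX.

    import numpy as np

    rng = np.random.default_rng(0)
    F, m = 4, 10                      # frames and tracked points

    # Per-frame orthographic camera: R_i has two orthonormal rows, N_i is
    # the 3 x 1 vector perpendicular to both (R_i N_i = 0).
    blocks, normals = [], []
    for _ in range(F):
        Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
        blocks.append(Q[:2])
        normals.append(Q[2])
    R = np.zeros((2 * F, 3 * F))      # 2F x 3F block diagonal matrix
    for i, Ri in enumerate(blocks):
        R[2 * i:2 * i + 2, 3 * i:3 * i + 3] = Ri

    def to_X(X_sharp):
        """Rearrange the F x 3m matrix X# of (40) into the 3F x m matrix X."""
        F, m3 = X_sharp.shape
        return X_sharp.reshape(F, 3, m3 // 3).reshape(3 * F, m3 // 3)

    # Build N(C) as in (42), with all but the first row of C set to zero.
    N = np.array(normals)             # F x 3, row i holds (n_1i, n_2i, n_3i)
    C = np.zeros((F, m))
    C[0] = rng.standard_normal(m)     # only the first frame "deforms"
    NC = np.hstack([N[:, [j]] * C for j in range(3)])   # F x 3m

    print(np.allclose(R @ to_X(NC), 0))   # True: N(C) vanishes under A
    print(np.linalg.matrix_rank(NC))      # 1: rank-1 matrix in the nullspace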
In Figure 4 we compare the two relaxations

    R_r(X^{\#}) + \|RX - M\|_F^2    (43)

and

    \mu \|X^{\#}\|_* + \|RX - M\|_F^2    (44)

on the four MOCAP sequences displayed in Figure 3, obtained from [8]. These consist of real motion capture data and therefore the ground truth solution is only approximately of low rank.
Figure 3: Four images from each of the MOCAP data sets (Drink, Pick-up, Stretch, Yoga).

Figure 4: Results obtained with (43) (blue dots) and (44) (orange curve) for the four sequences. Data fit ‖RX − M‖_F (y-axis) versus rank(X^#) (x-axis).

Figure 5: Results obtained with (43) (blue dots) and (44) (orange curve) for the four sequences. Distance to ground truth ‖X − X_gt‖_F (y-axis) versus rank(X^#) (x-axis).

Figure 6: Results obtained with (45) (blue dots) and (46) (orange curve) for the four sequences. Data fit ‖RX − M‖_F (y-axis) versus rank(X^#) (x-axis).

Figure 7: Results obtained with (45) (blue dots) and (46) (orange curve) for the four sequences. Distance to ground truth ‖X − X_gt‖_F (y-axis) versus rank(X^#) (x-axis) is plotted for various regularization strengths.
In Figure 4 we plot the rank of the obtained solution versus the data fit ‖RX − M‖²_F. Since (44) does not allow us to directly specify the rank of the sought matrix, we solved the problem for 50 values of µ between 1 and 100 (orange curve) and computed the resulting rank and data fit. Note that even if a change of µ is not large enough to change the rank of the solution, it does affect the non-zero singular values. To achieve the best result for a specific rank with (44) we should select the smallest µ that gives the correct rank. Even though (3) does not hold, the relaxation (43) consistently gives a better data fit with lower rank than (44). Figure 5 also shows the rank versus the distance to the ground truth solution. For high rank the distance is typically larger for (43) than (44). A feasible explanation is that when the rank is high it is more likely that the row space of X^# contains a matrix of the type N(C). Loosely speaking, when we allow too complex deformations it becomes more difficult to uniquely recover the shape. The nuclear norm's built-in bias towards small solutions helps to regularize the problem when the rank constraint is not discriminative enough.

One way to handle the null space of A is to add additional regularizers that penalize low rank matrices of the type N(C). Dai et al. [8] suggested to use the derivative prior ‖DX^#‖²_F, where the matrix D : R^F → R^{F−1} is a first order difference operator. The nullspace of D consists of matrices that are constant in each column. Since this implies that the scene is rigid, it is clear that N(C) is not in the nullspace of D. We add this term and compare

    R_r(X^{\#}) + \|RX - M\|_F^2 + \|DX^{\#}\|_F^2    (45)

and

    \mu \|X^{\#}\|_* + \|RX - M\|_F^2 + \|DX^{\#}\|_F^2.    (46)

Figures 6 and 7 show the results. In this case both the data fit and the distance to the ground truth are consistently better with (45) than with (46). When the rank increases, most of the regularization comes from the derivative prior, leading to both methods providing similar results.
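For illustration (our own sketch; the helper name diff_op is ours), D can be realized as the (F − 1) × F forward difference matrix, which annihilates exactly the matrices that are constant in each column, i.e. rigid scenes.

    import numpy as np

    def diff_op(F):
        """First order difference matrix D : R^F -> R^(F-1), rows (-1, 1)."""
        D = np.zeros((F - 1, F))
        i = np.arange(F - 1)
        D[i, i], D[i, i + 1] = -1.0, 1.0
        return D

    rng = np.random.default_rng(0)
    F, m = 5, 8
    D = diff_op(F)
    rigid = np.ones((F, 1)) @ rng.standard_normal((1, 3 * m))  # constant columns
    print(np.allclose(D @ rigid, 0))          # True: rigid scenes unpenalized
    print(np.linalg.norm(D @ rng.standard_normal((F, 3 * m))) > 0)  # True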
4. Conclusions

In this paper we studied the local minima of a non-convex rank regularization approach. Our main theoretical result shows that if a RIP property holds then there is often a unique local minimum. Since the proposed relaxation (4) and the original objective (2) are shown in [7] to have the same global minimizers if ‖A‖ ≤ 1, this result is also relevant for the original discontinuous problem. Our experimental evaluation shows that the proposed approach often gives better solutions than standard convex alternatives, even when the RIP constraint does not hold.

References

[1] F. Andersson, M. Carlsson, and C. Olsson. Convex envelopes for fixed rank approximation. Optimization Letters, pages 1–13, 2017.
[2] R. Basri, D. Jacobs, and I. Kemelmacher. Photometric stereo with general, unknown lighting. Int. J. Comput. Vision, 72(3):239–257, May 2007.
[3] D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[4] C. Bregler, A. Hertzmann, and H. Biermann. Recovering non-rigid 3d shape from image streams. In IEEE Conference on Computer Vision and Pattern Recognition, 2000.
[5] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. ACM, 58(3):11:1–11:37, June 2011.
[6] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
[7] M. Carlsson. On convexification/optimization of functionals including an l2-misfit term. arXiv preprint arXiv:1609.09378, 2016.
[8] Y. Dai, H. Li, and M. He. A simple prior-free method for non-rigid structure-from-motion factorization. International Journal of Computer Vision, 107(2):101–122, 2014.
[9] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
[10] A. Eriksson, T. Thanh Pham, T.-J. Chin, and I. Reid. The k-support norm and convex envelopes of cardinality and rank. In The IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[11] M. Fazel, H. Hindi, and S. P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In American Control Conference, 2001.
[12] R. Garg, A. Roussos, and L. Agapito. A variational approach to video registration with subspace constraints. Int. J. Comput. Vision, 104(3):286–314, 2013.
[13] R. Garg, A. Roussos, and L. de Agapito. Dense variational reconstruction of non-rigid surfaces from monocular video. In IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[14] P. Gong, C. Zhang, Z. Lu, J. Huang, and J. Ye. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In The 30th International Conference on Machine Learning (ICML), pages 37–45, 2013.
[15] C. Grussler and A. Rantzer. On optimal low-rank approximation of non-negative matrices. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 5278–5283, Dec 2015.
[16] C. Grussler, A. Rantzer, and P. Giselsson. Low-rank optimization with convex constraints. CoRR, abs/1606.01793, 2016.
[17] V. Larsson and C. Olsson. Convex envelopes for low rank approximation. In International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, 2015.
[18] V. Larsson and C. Olsson. Convex low rank approximation. International Journal of Computer Vision, 120(2):194–214, 2016.
[19] A. M. McDonald, M. Pontil, and D. Stamos. Spectral k-support norm regularization. In Advances in Neural Information Processing Systems, 2014.
[20] K. Mohan and M. Fazel. Iterative reweighted least squares for matrix rank minimization. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 653–661. IEEE, 2010.
[21] C. Olsson, M. Carlsson, F. Andersson, and V. Larsson. Non-convex rank/sparsity regularization and local minima. arXiv:1703.07171, 2017.
[22] S. Oymak, A. Jalali, M. Fazel, Y. C. Eldar, and B. Hassibi. Simultaneously structured models with application to sparse and low-rank matrices. IEEE Transactions on Information Theory, 61(5):2886–2908, 2015.
[23] S. Oymak, K. Mohan, M. Fazel, and B. Hassibi. A simplified approach to recovery conditions for low rank matrices. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pages 2318–2322. IEEE, 2011.
[24] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev., 52(3):471–501, Aug. 2010.
[25] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. Int. J. Comput. Vision, 9(2):137–154, 1992.
[26] J. Yan and M. Pollefeys. A factorization-based approach for articulated nonrigid shape, motion and kinematic chain recovery from video. IEEE Trans. Pattern Anal. Mach. Intell., 30(5):865–877, 2008.
