A Non-Convex Relaxation For Fixed-Rank Approximation
This paper considers the problem of finding a low rank matrix from observations of linear combinations of its elements. It is well known that if the problem fulfills a restricted isometry property (RIP), convex relaxations using the nuclear norm typically work well and come with theoretical performance guarantees. On the other hand, these formulations suffer from a shrinking bias that can severely degrade the solution in the presence of noise.

In this theoretical paper we study an alternative non-convex relaxation that, in contrast to the nuclear norm, does not penalize the leading singular values and thereby avoids this bias. We show that despite its non-convexity the proposed formulation will in many cases have a single local minimizer if a RIP holds. Our numerical tests show that our approach typically converges to a better solution than nuclear norm based alternatives, even in cases where the RIP does not hold.

1. Introduction

Low rank approximation is an important tool in applications such as rigid and non-rigid structure from motion, photometric stereo and optical flow [25, 4, 26, 13, 2, 12]. The rank of the approximating matrix typically describes the complexity of the solution. For example, in non-rigid structure from motion the rank measures the number of basis elements needed to describe the point motions [4]. Under the assumption of Gaussian noise the objective is typically to solve

    min_{rank(X) ≤ r} ||X − X_0||_F²,  (1)

where X_0 is a measurement matrix and ||·||_F is the Frobenius norm. The problem can be solved optimally using SVD [9], but the strategy is limited to problems where all matrix elements are directly measured. In this paper we will consider low rank approximation problems where linear combinations of the elements are observed. We aim to solve

    min_X I(rank(X) ≤ r) + ||AX − b||².  (2)

Here I(rank(X) ≤ r) is 0 if rank(X) ≤ r and ∞ otherwise. The linear operator A : R^{m×n} → R^p is assumed to fulfill a restricted isometry property (RIP) [24]

    (1 − δ_q)||X||_F² ≤ ||AX||² ≤ (1 + δ_q)||X||_F²,  (3)

for all matrices with rank(X) ≤ q. The standard approach for problems of this class is to replace the rank function with the convex nuclear norm ||X||_* = Σ_i σ_i(X) [24, 5]. It was first observed in [11] that this is the convex envelope of the rank function over the set {X; σ_1(X) ≤ 1}. Since then a number of generalizations that give performance guarantees for the nuclear norm relaxation have appeared, e.g. [24, 23, 5, 6]. The approach does however suffer from a shrinking bias that can severely degrade the solution in the presence of noise. In contrast to the rank constraint, the nuclear norm penalizes both small singular values of X, assumed to stem from measurement noise, and large singular values, assumed to make up the true signal, equally. In some sense the suppression of noise also requires an equal suppression of signal. Non-convex alternatives have been shown to improve performance [22, 20].

In this paper we will consider the relaxation

    min_X R_r(X) + ||AX − b||²,  (4)

where

    R_r(X) = max_Z ( Σ_{i=r+1}^N z_i² − ||X − Z||_F² ),  (5)

and x_i and z_i, i = 1, ..., N, are the singular values of X and Z respectively. The maximization over Z does not have any closed form solution; however, it was shown in [18, 1] how to efficiently evaluate R_r and compute its proximal operator. Figure 1 shows a three dimensional illustration of the level sets of the regularizer. In [18, 17] it was shown that
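The SVD solution of (1) mentioned above can be sketched in a few lines. This is a minimal illustration of the Eckart–Young result [9], not code from the paper; the matrix sizes, noise level and random seed are arbitrary choices for the example.

```python
import numpy as np

def low_rank_approx(X0, r):
    """Best rank-r approximation of X0 in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    # Keep only the r leading singular values/vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
# A noisy measurement of a rank-2 matrix.
X_true = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
X0 = X_true + 0.01 * rng.standard_normal((20, 30))
X = low_rank_approx(X0, 2)
print(np.linalg.matrix_rank(X))  # 2
```

By optimality of the truncated SVD, the residual ||X0 − X||_F can be no larger than ||X0 − X_true||_F, since X_true is itself a feasible rank-2 matrix.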
    R_r(X) + ||X − X_0||_F².  (6)

If a stationary point X_s fulfills the conditions of Theorem 2.4, then there can not be any other stationary point with rank less than or equal to r. The matrix Z is related to the gradient of the objective function at X_s (see Section 2). The term ||X − Z||_F² can be seen as a local approximation of ||AX − b||² close to X_s (see [21]). If, for example, there is a rank r matrix X_0 such that b = AX_0, then it is easy to show that X_0 is a stationary point and the corresponding Z is identical to X_0. Since this means that z_{r+1} = 0, our results certify that this is the only stationary point of the problem if A fulfills (3) with δ_2r < 1/2. The following lemma clarifies the connection between the stationary point X_s and Z.

Figure 1: Level set surfaces {X | R_r(X) = α} for X = diag(x_1, x_2, x_3) with r = 1 (left) and r = 2 (middle). Note that when r = 1 the regularizer promotes solutions where only one of the x_k is non-zero. For r = 2 the regularizer instead favors solutions with two non-zero x_k. For comparison we also include the nuclear norm (right).

Lemma 1.1. The point X_s is stationary in F(X) = R_r(X) + ||AX − b||² if and only if 2Z ∈ ∂G(X_s), where G(X) = R_r(X) + ||X||_F², and if and only if (I − A*A)X_s + A*b = Z.
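The behaviour of the level sets in Figure 1 can be reproduced numerically. Below is a brute-force sketch (not the paper's algorithm) that evaluates R_r in (5) for a diagonal X: by von Neumann's trace inequality the maximizing Z can be taken diagonal with ordered entries, so a grid search over ordered z suffices. The grid bounds and step size are illustrative assumptions.

```python
import itertools
import numpy as np

def R_r_diag(x, r, zmax=3.0, step=0.1):
    """Grid-search evaluation of R_r(diag(x)) via (5).

    The maximizer Z is restricted to diagonal matrices with ordered
    non-negative entries (justified by von Neumann's trace inequality).
    """
    x = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    grid = np.arange(0.0, zmax + step, step)
    best = -np.inf
    for z in itertools.product(grid, repeat=len(x)):
        z = np.asarray(z)
        if np.any(np.diff(z) > 0):  # enforce z_1 >= z_2 >= ... >= 0
            continue
        val = np.sum(z[r:] ** 2) - np.sum((x - z) ** 2)
        best = max(best, val)
    return best

# With r = 1 the regularizer vanishes on rank-1 diagonals and is
# positive otherwise, matching Figure 1 (left).
print(R_r_diag([1.0, 0.0, 0.0], r=1))  # ~0
print(R_r_diag([1.0, 1.0, 0.0], r=1))  # ~2 (> 0)
```

This also illustrates why, unlike the nuclear norm, R_r adds no penalty at all to matrices that already satisfy the rank constraint.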
Theorem 2.4. Assume that X_s is a stationary point of F, that is, (I − A*A)X_s + A*b = Z, where 2Z ∈ ∂G(X_s), rank(X_s) = r and the singular values of Z fulfill z_{r+1} < (1 − 2δ_2r)z_r. If X'_s is another stationary point then rank(X'_s) > r.

We first evaluate the quality of the relaxation on a number of synthetic experiments. We compare the two formulations (4) and

    min_X µ||X||_* + ||AX − b||².  (39)

Figure 2: (a) - Noise level (x-axis) vs. data fit ||AX − b|| (y-axis) for the solutions obtained with (4) and (39). (b) - Fraction of instances where the solution of (4) could be verified to be globally optimal. (c) - Same as (a). (a) and (b) use a 400 × 400 A with δ = 0.2 while (c) uses a 300 × 400 A.
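The Gaussian measurement model used in these experiments can be probed numerically. The sketch below estimates how far ||AX||²/||X||_F² deviates from 1 over random low-rank X for a p × mn matrix A with N(0, 1/p) entries; the dimensions and sample count are chosen for speed, not those of the paper, and the sampled deviation only lower-bounds the true δ_q.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p, q = 20, 20, 300, 2  # illustrative sizes only

# A is p x mn with N(0, 1/p) entries, applied to the vectorized X.
A = rng.normal(0.0, np.sqrt(1.0 / p), size=(p, m * n))

ratios = []
for _ in range(200):
    # Random rank-q matrix, normalized to unit Frobenius norm.
    X = rng.standard_normal((m, q)) @ rng.standard_normal((q, n))
    X /= np.linalg.norm(X)
    ratios.append(np.linalg.norm(A @ X.ravel()) ** 2)

# Largest observed deviation from 1; a lower bound on delta_q in (3).
delta_hat = max(max(ratios) - 1.0, 1.0 - min(ratios))
print(f"empirical delta over sampled X: {delta_hat:.2f}")
```

As the text notes, random sampling of this kind cannot certify δ_q, which is why optimality cannot be verified in the underdetermined case.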
In Figure 2 (c) we did the same experiment as in (a) but with an underdetermined A of size 300 × 400. It is known [24] that if A is p × mn and the elements of A are drawn from N(0, 1/p) then A fulfills (3) with high probability. The exact value of δ_q is however difficult to determine, and therefore we are not able to verify optimality in this case.

3.2. Non-Rigid Structure from Motion

In this section we consider the problem of Non-Rigid Structure from Motion. We follow the approach of Dai et al. [8] and let

    X = [X_1; Y_1; Z_1; ... ; X_F; Y_F; Z_F]  and  X# = [X_1 Y_1 Z_1; ... ; X_F Y_F Z_F],  (40)

where a semicolon starts a new block row and X_i, Y_i, Z_i are 1 × m matrices containing the x-, y- and z-coordinates of the tracked points in image i. Under the assumption of an orthographic camera, the projection of the 3D points can be modeled using M = RX, where R is a 2F × 3F block diagonal matrix with 2 × 3 blocks R_i, consisting of two orthogonal rows that encode the camera orientation in image i. The resulting 2F × m measurement matrix M consists of the x- and y-image coordinates of the tracked points. Under the assumption of a linear shape basis model [4] with r deformation modes, the matrix X# can be factorized into X# = CB, where the r × 3m matrix B contains the basis elements. It is clear that such a factorization is possible when X# is of rank r. We therefore search for the matrix X# of rank r that minimizes the residual error ||RX − M||_F².

The linear operator defined by A(X#) = RX does by itself not obey (3), since there are typically low rank matrices in its nullspace. This can be seen by noting that if N_i is the 3 × 1 vector perpendicular to the two rows of R_i, that is, R_i N_i = 0, then any block

    [X_i; Y_i; Z_i] = N_i C_i,  (41)

where C_i is any 1 × m matrix, is in the null space of R_i. Therefore any matrix of the form

    N(C) = [n_11 C_1  n_21 C_1  n_31 C_1;  n_12 C_2  n_22 C_2  n_32 C_2;  ... ;  n_1F C_F  n_2F C_F  n_3F C_F],  (42)

where n_ij are the elements of N_i, vanishes under A. Setting everything but the first row of N(C) to zero shows that there is a matrix of rank 1 in the null space of A. Moreover, if the rows of the optimal X# span such a matrix it will not be unique, since we may add N(C) without affecting the projections or the rank.

In Figure 4 we compare the two relaxations

    R_r(X#) + ||RX − M||_F²  (43)

and

    µ||X#||_* + ||RX − M||_F²  (44)

on the four MOCAP sequences displayed in Figure 3, obtained from [8]. These consist of real motion capture data and therefore the ground truth solution is only approximately of low rank.

In Figure 4 we plot the rank of the obtained solution versus the data fit ||RX − M||_F². Since (44) does not allow us
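The null space construction in (41) and (42) is easy to verify numerically. The sketch below builds random orthographic cameras R_i, takes N_i as the cross product of the two camera rows (so that R_i N_i = 0), and checks that the resulting stacked matrix vanishes under A(X#) = RX and that the one-row version of N(C) has rank 1. The number of frames F, the point count m and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
F, m = 5, 12  # F frames, m tracked points (illustrative sizes)

R_blocks, N_vecs = [], []
for _ in range(F):
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    Ri = Q[:2, :]                          # two orthonormal rows (camera i)
    R_blocks.append(Ri)
    N_vecs.append(np.cross(Ri[0], Ri[1]))  # N_i with R_i N_i = 0

# Block diagonal 2F x 3F camera matrix R.
R = np.zeros((2 * F, 3 * F))
for i, Ri in enumerate(R_blocks):
    R[2 * i:2 * i + 2, 3 * i:3 * i + 3] = Ri

# Stack the blocks [X_i; Y_i; Z_i] = N_i C_i of (41) into a 3F x m matrix X.
C = rng.standard_normal((F, m))
X = np.vstack([np.outer(N_vecs[i], C[i]) for i in range(F)])
print(np.linalg.norm(R @ X))               # ~0: such X vanish under A

# The corresponding F x 3m matrix N(C) of (42); zeroing all but its
# first row leaves a rank-1 matrix in the null space of A.
NC = np.stack([np.concatenate([N_vecs[i][k] * C[i] for k in range(3)])
               for i in range(F)])
NC1 = np.zeros_like(NC)
NC1[0] = NC[0]
print(np.linalg.matrix_rank(NC1))          # 1
```

This makes concrete why a rank constraint alone cannot pin down X#: any multiple of such an N(C) can be added without changing the measurements.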
Figure 3: The four MOCAP sequences: Drink, Pick-up, Stretch and Yoga.

Figure 4: Results obtained with (43) (blue dots) and (44) (orange curve) for the four sequences. Data fit ||RX − M||_F (y-axis) versus rank(X#) (x-axis).

Figure 5: Results obtained with (43) (blue dots) and (44) (orange curve) for the four sequences. Distance to ground truth ||X − X_gt||_F (y-axis) versus rank(X#) (x-axis).

Figure 6: Results obtained with (45) (blue dots) and (46) (orange curve) for the four sequences. Data fit ||RX − M||_F (y-axis) versus rank(X#) (x-axis).

Figure 7: Results obtained with (45) (blue dots) and (46) (orange curve) for the four sequences. Distance to ground truth ||X − X_gt||_F (y-axis) versus rank(X#) (x-axis), plotted for various regularization strengths.
to directly specify the rank of the sought matrix, we solved the problem for 50 values of µ between 1 and 100 (orange curve) and computed the resulting rank and data fit. Note that even if a change of µ is not large enough to change the rank of the solution, it does affect the non-zero singular values. To achieve the best result for a specific rank with (44) we should select the smallest µ that gives the correct rank. Even though (3) does not hold, the relaxation (43) consistently gives a better data fit with lower rank than (44). Figure 5 also shows the rank versus the distance to the ground truth solution. For high rank the distance is typically larger for (43) than for (44). A plausible explanation is that when the rank is high it is more likely that the row space of X# contains a matrix of the type N(C). Loosely speaking, when we allow too complex deformations it becomes more difficult to uniquely recover the shape. The nuclear norm's built-in bias towards small solutions helps to regularize the problem when the rank constraint is not discriminative enough.

One way to handle the null space of A is to add additional regularizers that penalize low rank matrices of the type N(C). Dai et al. [8] suggested to use the derivative prior ||DX#||_F², where the matrix D : R^F → R^{F−1} is a first order difference operator. The nullspace of D consists of matrices that are constant in each column. Since this implies that the scene is rigid, it is clear that N(C) is not in the nullspace of D. We add this term and compare

    R_r(X#) + ||RX − M||_F² + ||DX#||_F²  (45)

and

    µ||X#||_* + ||RX − M||_F² + ||DX#||_F².  (46)

Figures 6 and 7 show the results. In this case both the data fit and the distance to the ground truth are consistently better with (45) than with (46). When the rank increases, most of the regularization comes from the derivative prior, leading to both methods providing similar results.

4. Conclusions

In this paper we studied the local minima of a non-convex rank regularization approach. Our main theoretical result shows that if a RIP property holds then there is often a unique local minimum. Since the proposed relaxation (4) and the original objective (2) were shown in [7] to have the same global minimizers when ||A|| ≤ 1, this result is also relevant for the original discontinuous problem. Our experimental evaluation shows that the proposed approach often gives better solutions than standard convex alternatives, even when the RIP constraint does not hold.

References

[1] F. Andersson, M. Carlsson, and C. Olsson. Convex envelopes for fixed rank approximation. Optimization Letters, pages 1–13, 2017.
[2] R. Basri, D. Jacobs, and I. Kemelmacher. Photometric stereo with general, unknown lighting. Int. J. Comput. Vision, 72(3):239–257, May 2007.
[3] D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[4] C. Bregler, A. Hertzmann, and H. Biermann. Recovering non-rigid 3d shape from image streams. In IEEE Conference on Computer Vision and Pattern Recognition, 2000.
[5] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. ACM, 58(3):11:1–11:37, June 2011.
[6] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
[7] M. Carlsson. On convexification/optimization of functionals including an l2-misfit term. arXiv preprint arXiv:1609.09378, 2016.
[8] Y. Dai, H. Li, and M. He. A simple prior-free method for non-rigid structure-from-motion factorization. International Journal of Computer Vision, 107(2):101–122, 2014.
[9] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
[10] A. Eriksson, T. Thanh Pham, T.-J. Chin, and I. Reid. The k-support norm and convex envelopes of cardinality and rank. In The IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[11] M. Fazel, H. Hindi, and S. P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In American Control Conference, 2001.
[12] R. Garg, A. Roussos, and L. Agapito. A variational approach to video registration with subspace constraints. Int. J. Comput. Vision, 104(3):286–314, 2013.
[13] R. Garg, A. Roussos, and L. de Agapito. Dense variational reconstruction of non-rigid surfaces from monocular video. In IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[14] P. Gong, C. Zhang, Z. Lu, J. Huang, and J. Ye. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In The 30th International Conference on Machine Learning (ICML), pages 37–45, 2013.
[15] C. Grussler and A. Rantzer. On optimal low-rank approximation of non-negative matrices. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 5278–5283, Dec 2015.
[16] C. Grussler, A. Rantzer, and P. Giselsson. Low-rank optimization with convex constraints. CoRR, abs/1606.01793, 2016.
[17] V. Larsson and C. Olsson. Convex envelopes for low rank approximation. In International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, 2015.
[18] V. Larsson and C. Olsson. Convex low rank approximation. International Journal of Computer Vision, 120(2):194–214, 2016.
[19] A. M. McDonald, M. Pontil, and D. Stamos. Spectral k-support norm regularization. In Advances in Neural Information Processing Systems, 2014.
[20] K. Mohan and M. Fazel. Iterative reweighted least squares
for matrix rank minimization. In Communication, Control,
and Computing (Allerton), 2010 48th Annual Allerton Con-
ference on, pages 653–661. IEEE, 2010.
[21] C. Olsson, M. Carlsson, F. Andersson, and V. Larsson.
Non-convex rank/sparsity regularization and local minima.
arXiv:1703.07171, 2017.
[22] S. Oymak, A. Jalali, M. Fazel, Y. C. Eldar, and B. Hassibi.
Simultaneously structured models with application to sparse
and low-rank matrices. IEEE Transactions on Information
Theory, 61(5):2886–2908, 2015.
[23] S. Oymak, K. Mohan, M. Fazel, and B. Hassibi. A simpli-
fied approach to recovery conditions for low rank matrices.
In Information Theory Proceedings (ISIT), 2011 IEEE Inter-
national Symposium on, pages 2318–2322. IEEE, 2011.
[24] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-
rank solutions of linear matrix equations via nuclear norm
minimization. SIAM Rev., 52(3):471–501, Aug. 2010.
[25] C. Tomasi and T. Kanade. Shape and motion from image
streams under orthography: A factorization method. Int. J.
Comput. Vision, 9(2):137–154, 1992.
[26] J. Yan and M. Pollefeys. A factorization-based approach for
articulated nonrigid shape, motion and kinematic chain re-
covery from video. IEEE Trans. Pattern Anal. Mach. Intell.,
30(5):865–877, 2008.