Abolghasemi 2012
Signal Processing
Article history: Received 7 December 2010; received in revised form 27 July 2011; accepted 14 October 2011; available online 25 October 2011.

Keywords: Basis pursuit (BP); Compressive sensing (CS); Gradient descent; Orthogonal matching pursuit (OMP); Random Gaussian matrices; Sparse representation.

Abstract: In this paper the problem of optimizing the measurement matrix in the compressive (also called compressed) sensing framework is addressed. In compressed sensing, a measurement matrix that has small coherence with the sparsifying dictionary (or basis) is of interest. Random measurement matrices have been used so far, since they present small coherence with almost any sparsifying dictionary. However, it has recently been shown that optimizing the measurement matrix toward decreasing the coherence is possible and can improve performance. Based on this conclusion, we propose here an alternating minimization approach for this purpose, which is a variant of Grassmannian frame design modified by a gradient-based technique. The objective is to optimize an initially random measurement matrix into one that presents smaller coherence than the initial matrix. We carried out several experiments to measure the performance of the proposed method and compare it with those of existing approaches. The results are encouraging and indicate improved reconstruction quality when utilizing the proposed method.

© 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2011.10.012
1000 V. Abolghasemi et al. / Signal Processing 92 (2012) 999–1009
possible sparse $\alpha$, which is not feasible. However, several approaches have been reported that provide a solution to the sparse recovery problem, either by relaxing it to

$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad y = D\alpha, \qquad (2)$$

where $\|\alpha\|_1 = \sum_i |\alpha_i|$ denotes the $\ell_1$-norm, or by using a greedy strategy. Note that in addition to (1) and (2) there are other formulations as well, which are not presented here. Some of the well-known recovery methods are basis pursuit (BP) [4], orthogonal matching pursuit (OMP) [5], least angle regression (LARS) [6], and the least absolute shrinkage and selection operator (LASSO) [7].

Although sparsity is an essential requirement for the signals in the CS framework, it is not sufficient for a signal to be successfully reconstructed. Thus far, it has been realized that in order to attain a reasonably small $p$ and yet have a stable reconstruction, $\Phi$ must have specific properties [8]; the one focused on here is mutual coherence. In CS theory, a suitable sensing matrix $\Phi$ should be as incoherent as possible with the sparsifying matrix $\Psi$. In other words, the correlation between any distinct pair of columns in the equivalent matrix $D$ should be very small, which means $D$ is nearly orthogonal. It has been shown that random matrices with Gaussian or Bernoulli distributions are appropriate choices for $\Phi$ [9], as they satisfy this property with high probability and can be generated non-adaptively. Although $\Phi$ is normally selected to be random, some recent works attempt to find an optimal structure for $\Phi$ (explicitly or implicitly) in the hope of increasing the reconstruction quality and taking fewer measurements [10–15]. Elad [10] iteratively decreases the average mutual coherence using a shrinkage operation followed by a singular value decomposition (SVD) step. In [11] the authors apply a kind of non-uniform sampling by segmenting the input signal and taking samples at different rates from each segment. In [12], an application to magnetic resonance imaging (MRI), the authors define an incoherence criterion based on the point spread function (PSF) and propose a Monte Carlo scheme for random incoherent sampling of this type of data. Wang et al. [13] propose a variable-density sampling strategy that exploits prior information about the statistical distributions of natural images in the wavelet domain. Their method is computationally efficient and can be applied to several transform domains. In another work [14], Wang et al. propose generating colored random projections using an adaptive scheme. Duarte-Carvajalino et al. [15] take advantage of an eigenvalue decomposition followed by a KSVD-based algorithm (see also [16]) to optimize $\Phi$ and learn the dictionary $\Psi$, respectively. The results of all these methods reveal improved performance, which is evidence of the benefits that optimal sampling can provide for this framework.

Our contribution in this paper is to develop a novel approach for optimizing the measurement matrix. We propose a gradient-based method that optimizes a randomly selected measurement matrix to decrease its coherence with the given sparsifying matrix. The proposed algorithm, which is related to the design of Grassmannian frames [17–19], minimizes a cost function in an alternating scheme. The results of our experiments and simulations confirm the ability of the proposed method to decrease the coherence, leading to higher reconstruction quality.

The rest of the paper is organized as follows. In the next section, we first define the criteria that can be used for measuring the coherence of a matrix; then, the proposed approach for optimizing the measurement matrix is described. In Section 3 an adaptive scheme for choosing the optimum stepsize is proposed. In Section 4, simulations are given to examine the proposed method from a practical perspective. The discussion and conclusion are presented in Sections 5 and 6, respectively.

2. Measurement matrix optimization

2.1. Problem formulation

Following the notation stated earlier, assume $\alpha$ has $s \ll m$ non-zero samples and is called $s$-sparse. Successful CS requires $\Phi$ and $\Psi$ to be incoherent. A good measure of the coherence between these two matrices (or, equivalently, the columns of $D$) is the mutual coherence [20,21]. Here, we quote this definition as presented by Elad [10]:

Definition 1. For a matrix $D$, the mutual coherence is defined as the maximum absolute normalized inner product between distinct columns of $D$:

$$\mu_{mx}(D) = \max_{i \neq j,\; 1 \le i,j \le m} \frac{|d_i^T d_j|}{\|d_i\|_2 \, \|d_j\|_2}. \qquad (3)$$

A desired $\Phi$ has a small $\mu_{mx}$ with respect to $\Psi$. An alternative description of $\mu_{mx}$ is obtained from the corresponding Gram matrix $\tilde{G} = \tilde{D}^T \tilde{D}$, in which $\tilde{D}$ is the column-normalized version of $D$. We define two coherence measures based on this matrix, which are used to study the performance of different methods later in the Results section. These parameters are the maximum and average of the off-diagonal elements of $\tilde{G}$, denoted respectively as

$$\mu_{mx} = \max_{i \neq j,\; 1 \le i,j \le m} |\tilde{g}_{ij}|, \qquad (4)$$

$$\mu_{av} = \frac{\sum_{i \neq j} |\tilde{g}_{ij}|}{m(m-1)}. \qquad (5)$$

There are important reasons to search for measurement matrices with small coherence. The first is the influence that the mutual coherence has on the achievable sparsity bound for the given signal. A detailed discussion of this fact is documented in [21,22]. Briefly, if the following inequality holds in a noiseless setting:

$$\|\alpha\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu_{mx}(D)}\right) \qquad (6)$$

then $\alpha$ is necessarily the sparsest signal consistent with the known $y$ and $D$. More importantly, both BP and OMP are guaranteed to successfully recover it [21,22]. This implies that as long as
the mutual coherence is small, a successful reconstruction is achievable for a wider range of sparsity degrees.

Another significant requirement is the restricted isometry property (RIP) [9,20], which is defined with respect to an isometry constant $0 < \delta \le 1$. For an $s$-sparse signal $\alpha$ and any integer $t \le s$, the isometry constant of $D$ is the smallest $\delta_s$ that satisfies

$$(1-\delta_s)\|\alpha_t\|_2^2 \;\le\; \|D_t \alpha_t\|_2^2 \;\le\; (1+\delta_s)\|\alpha_t\|_2^2. \qquad (7)$$

This property implies that for a proper isometry constant (the ideal $\delta$ is 0), any subset of columns of $D$ with cardinality less than $s$ is nearly orthogonal. This orthogonality can be related to the coherence of the columns: less coherence corresponds to a higher degree of orthogonality. This results in better CS behavior and guarantees the identifiability of the original signal by both OMP and BP [4]. Note that in noisy circumstances more severe conditions apply to ensure the stability of the reconstruction; these are not dealt with here.

2.2. The proposed method

Recall from the previous section that an incoherent $D$ has a small mutual coherence. In other words, the corresponding Gram matrix has absolute off-diagonal elements close to zero and diagonal elements equal to one; that is, a Gram matrix close to the identity matrix. In a previous work [23], we proposed a method to minimize the difference between the Gram matrix and the identity matrix in the Frobenius norm:

The inner products between all vectors are simply obtained by multiplying $\tilde{D}$ by its transpose: $\tilde{D}^T \tilde{D}$. This is the Gram matrix $\tilde{G}$ already described. Note that the diagonal elements of $\tilde{G}$ are not of interest, since they indicate the Euclidean norms of the columns of $\tilde{D}$. In particular, we are only interested in the off-diagonal elements of $\tilde{G}$, which correspond to the inner products between distinct vectors:

$$\tilde{g}_{ij} = \cos(\theta_{ij}) := \langle \tilde{d}_i, \tilde{d}_j \rangle = \tilde{d}_i^T \tilde{d}_j \quad \text{for } i \neq j, \qquad (9)$$

where $\langle \cdot , \cdot \rangle$ denotes the vector inner product. It is important to note that the minimum possible absolute off-diagonals of $\tilde{G}$ are zero, which can occur only if $\tilde{D}$ is orthonormal with $p = m$. This is not the case here, since we only deal with $p < m$. It can be shown that the minimum absolute inner product for each vector pair in an equiangular tight frame is [18]

$$\mu_E = \sqrt{\frac{m-p}{p(m-1)}}. \qquad (10)$$

In fact, (10) is also the minimum achievable mutual coherence of $D$. We now summarize this discussion in the following definition [18]:

Definition 2. A matrix $D$ is called an equiangular tight frame (ETF) if its corresponding Gram matrix has unit diagonal elements and off-diagonals equal to $\pm\sqrt{(m-p)/(p(m-1))}$:

$$\tilde{g}_{ij} = \begin{cases} \pm\sqrt{\dfrac{m-p}{p(m-1)}} & \text{for } i \neq j, \\[2mm] 1 & \text{otherwise.} \end{cases} \qquad (11)$$
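The quantities defined above are straightforward to evaluate numerically. The sketch below (Python with NumPy; matrix sizes are illustrative choices, not the paper's settings) computes $\mu_{mx}$ and $\mu_{av}$ of Eqs. (4)–(5) from the normalized Gram matrix, the sparsity bound of Eq. (6), and the ETF lower bound $\mu_E$ of Eq. (10):

```python
import numpy as np

def coherence_measures(D):
    """mu_mx and mu_av of Eqs. (4)-(5): maximum and average absolute
    off-diagonal entries of the Gram matrix of column-normalized D."""
    Dn = D / np.linalg.norm(D, axis=0)     # column-normalized D~
    G = Dn.T @ Dn                          # Gram matrix G~
    off = np.abs(G - np.diag(np.diag(G)))  # zero out the diagonal
    m = D.shape[1]
    return off.max(), off.sum() / (m * (m - 1))

def etf_bound(p, m):
    """Minimum achievable mutual coherence of a p x m frame, Eq. (10)."""
    return np.sqrt((m - p) / (p * (m - 1)))

rng = np.random.default_rng(0)
p, n, m = 20, 80, 120                          # illustrative sizes only
Phi = rng.standard_normal((p, n))              # random measurement matrix
Psi = rng.standard_normal((n, m))              # sparsifying dictionary
mu_mx, mu_av = coherence_measures(Phi @ Psi)   # equivalent matrix D = Phi Psi

# Eq. (6): BP/OMP recovery is guaranteed whenever ||alpha||_0 < s_bound.
s_bound = 0.5 * (1 + 1 / mu_mx)
print(mu_mx, mu_av, s_bound, etf_bound(p, m))
```

Since $\mu_{mx} \le 1$ for normalized columns, the bound (6) is always at least 1; for a random $D$ the measured $\mu_{mx}$ typically sits well above the ETF bound, which is precisely the gap the optimization of Section 2.2 tries to close.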
the algorithm until reaching a desired coherence, or until the value of the cost function does not change significantly over two successive iterations.

Next, we describe a procedure for choosing an adaptive stepsize, which improves the convergence behavior of the proposed method compared with using a fixed stepsize, a fact that has also been indicated in the corresponding literature [28].

4. Results
Fig. 1. Comparing the distributions of off-diagonal elements of $\tilde{G}$: (a) and (b) using a random dictionary, and (c) with an overcomplete DCT dictionary; the curves compare no optimization, the proposed method, and the method in [23].

Fig. 2. The convergence results: (a) evolution of $\mu_{mx}$, (b) the objective function, and (c) the evolution of the adaptive stepsize $\beta$, all versus iteration number.
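Although the exact update rule and the adaptive-$\beta$ scheme of Section 3 are not reproduced in this excerpt, the flavor of the iteration behind Fig. 2 can be sketched as plain gradient descent on the Frobenius-norm cost $\|(\Phi\Psi)^T(\Phi\Psi) - I\|_F^2$ mentioned in Section 2.2, here with a simple backtracking stepsize standing in for the paper's adaptive $\beta$. This is an illustrative reconstruction under those assumptions, not the authors' exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, m = 20, 60, 80                  # illustrative sizes
Psi = rng.standard_normal((n, m))     # fixed sparsifying matrix
Phi = rng.standard_normal((p, n))     # measurement matrix to optimize
I = np.eye(m)

def cost(Phi):
    E = (Phi @ Psi).T @ (Phi @ Psi) - I
    return np.sum(E ** 2)             # ||D^T D - I||_F^2

def mu_mx(Phi):
    D = Phi @ Psi
    Dn = D / np.linalg.norm(D, axis=0)
    G = Dn.T @ Dn
    return np.abs(G - np.diag(np.diag(G))).max()

c0, mu0 = cost(Phi), mu_mx(Phi)
for _ in range(100):
    D = Phi @ Psi
    grad = 4 * D @ (D.T @ D - I) @ Psi.T   # gradient of the cost w.r.t. Phi
    beta = 1e-3
    # crude backtracking: shrink the step until the cost decreases
    while cost(Phi - beta * grad) >= cost(Phi) and beta > 1e-20:
        beta *= 0.5
    if cost(Phi - beta * grad) < cost(Phi):
        Phi = Phi - beta * grad
c1, mu1 = cost(Phi), mu_mx(Phi)
print(c0, c1, mu0, mu1)   # the cost decreases monotonically by construction
```

The paper's method is an alternating scheme related to Grassmannian frame design, and its stepsize is adapted rather than backtracked; this sketch only shows the gradient component driving the Gram matrix toward the identity.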
Table 1
Effects of different methods on mutual coherence when using two types of dictionaries: random and DCT.

Random dictionary:
  mu_mx: 0.7326  0.6803  0.4298  0.7831  0.8552  0.9281
  mu_av: 0.1557  0.1436  0.1475  0.1554  0.1548  0.1556

DCT dictionary:
  mu_mx: 0.9334  0.7429  0.4855  0.9593  0.9146  0.8932
  mu_av: 0.1527  0.1430  0.1471  0.1551  0.1538  0.1552
Fig. 3. Empirical phase transition in the $(\rho, \delta)$ plane, when (a) LASSO and (b) OMP with random $\Phi$, and (c) LASSO and (d) OMP with the proposed method were used ($\rho = k/p$ on the vertical axes).
In the next part of the experiments we investigated the advantages of measurement matrix optimization in sparse signal recovery. We selected OMP, a greedy algorithm which iteratively builds up an approximation of the sparse signal and is widely used as a reconstruction algorithm in this framework. Further, LASSO (a continuation of BP [4]) was selected as a minimization approach which attempts to fit a regression model while imposing an $\ell_1$-norm constraint on the regression coefficients, solving:

$$\min_{\alpha} \|y - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_1 \le k, \qquad (25)$$

where $k$ is a nonnegative constant [7,30]. We present the results of applying these methods as they are well known in sparse recovery problems. Extensive simulations were carried out to sketch the phase transition diagram (see [29,30] for details) in a noiseless setting. One hundred trials were generated at each point on a grid of $\delta = p/n$ and $\rho = s/p$ axes, for $n = m = 500$, $\delta \in [0.1, 1]$, and $\rho \in [0.05, 1]$. The $(\rho, \delta)$ plane is made of $30 \times 30$ equispaced points in total. $\Psi$ was chosen to be a random matrix of size $500 \times 500$. Non-zeros of the sparse signals in each trial were drawn from a random Gaussian distribution. The resulting diagrams are shown in Fig. 3 for two different recovery algorithms, namely OMP [5] and LASSO [7] (other methods such as BP [4] and LARS [6] were also tested and similar behaviors were observed). Shaded attributes are the proportion of successful outcomes over all trials. An outcome is considered successful if the reconstruction error $\|x - \hat{x}\|_2 / \|x\|_2$ is less than 0.01. The overlaid curves show the estimate of $\rho$ for each $n$ where the reconstruction is successful with probability $1-\gamma$; $\gamma$ is the estimator's threshold and was set to $\gamma = 0.25$ (see [29] for more details about the estimator's threshold).

Preliminary inspection of the results, especially those in Fig. 4, where the curves for different methods are depicted on the same graph, reveals the success of the proposed approach. It is seen that the proposed approach does give improvements over random matrices and Elad's method. However, it has been observed that these advantages decrease as the problem size grows (as $n$ increases from around 800–1000 and beyond). Nevertheless, the proposed method outperforms both Elad's method and pure random matrices at large dimensions, though not very noticeably.

Following the previous test in the analysis of reconstruction performance, we set up a new experiment where a redundant DCT dictionary $\Psi$ ($80 \times 120$) was used. We conducted two separate CS experiments: first by fixing $p = 25$ and varying $s$ from 1 to 10, and second by fixing $s = 4$ and varying $p$ from 15 to 40 (see also [10,29]). Each experiment was performed for 50,000 random sparse ensembles and the average reconstruction error was recorded. The corresponding graphs, shown in Fig. 5, indicate improved performance when using an optimized $\Phi$ instead of a random $\Phi$. Moreover, it is seen that the proposed method performs better especially when OMP is used as the recovery method. The reason can be inferred from the fact that OMP works based on orthogonality, which is improved when the proposed method is applied.

More importantly, Fig. 5(a) reveals that by using optimized measurement matrices one can achieve the same performance as that obtained using pure random matrices, but with a smaller number of measurements.
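The OMP used above is the standard greedy loop: pick the dictionary column most correlated with the current residual, then refit by least squares on the selected support. A minimal self-contained version follows (Python/NumPy; sizes and seed are illustrative, and this is a textbook implementation rather than the code behind the paper's experiments):

```python
import numpy as np

def omp(D, y, s):
    """Recover an s-sparse alpha from y = D @ alpha by orthogonal
    matching pursuit: greedy atom selection plus least-squares refit."""
    m = D.shape[1]
    support, residual = [], y.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(D.T @ residual)))  # best-matching atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(m)
    alpha[support] = coef
    return alpha

rng = np.random.default_rng(2)
p, m, s = 40, 120, 4                      # illustrative sizes
D = rng.standard_normal((p, m))
D /= np.linalg.norm(D, axis=0)            # unit-norm columns
alpha = np.zeros(m)
alpha[rng.choice(m, size=s, replace=False)] = rng.standard_normal(s)
y = D @ alpha
alpha_hat = omp(D, y, s)
rel_err = np.linalg.norm(alpha - alpha_hat) / np.linalg.norm(alpha)
print(rel_err)   # near zero whenever the correct support is found
```

With $s$ well below the phase transition for this $(p, m)$, the greedy loop almost always identifies the true support, and the least-squares refit then recovers the coefficients exactly (up to floating-point error).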
Fig. 4. Empirical phase transition curves of (a) LASSO and (b) OMP when different measurement matrices were used (pure random $\Phi$, Elad's method, and the proposed method), plotted against $\delta = p/n$.

Fig. 5. Relative number of errors as a function of (a) the number of measurements $p$ and (b) the number of non-zeros, for BP and OMP with random $\Phi$, Elad's method, and the proposed method.
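The trend in Fig. 5(a), with fewer recovery failures as $p$ grows for a fixed sparsity, is easy to reproduce in miniature. The sketch below uses a plain random $\Phi$ only, with hypothetical sizes and a trial count far smaller than the paper's 50,000 ensembles, and counts OMP failures under the same relative-error-below-0.01 success criterion:

```python
import numpy as np

def omp(D, y, s):
    # compact OMP: greedy atom selection with least-squares refit
    support, residual = [], y.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def failure_rate(p, m=80, s=4, trials=100, seed=0):
    """Fraction of random trials where OMP misses the paper's
    relative-error < 0.01 success criterion."""
    rng = np.random.default_rng(seed)
    failures = 0
    for _ in range(trials):
        D = rng.standard_normal((p, m))
        D /= np.linalg.norm(D, axis=0)
        alpha = np.zeros(m)
        alpha[rng.choice(m, size=s, replace=False)] = rng.standard_normal(s)
        y = D @ alpha
        err = np.linalg.norm(alpha - omp(D, y, s)) / np.linalg.norm(alpha)
        failures += err >= 0.01
    return failures / trials

print(failure_rate(p=15), failure_rate(p=40))  # more measurements, fewer failures
```

Swapping the random $\Phi$ for an optimized one shifts this curve downward, which is what Fig. 5(a) reports for the proposed method.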
Table 2
The computation time (in seconds) per iteration for different methods.

The proposed method (fixed stepsize):    0.0565  0.2562  0.7069  1.4385  2.5095
The proposed method (adaptive stepsize): 0.0673  0.3526  0.9854  2.0480  3.6545
sampled has a particular spectrum. We have observed such behavior in other non-adaptive methods too, such as [10,23]. This non-adaptivity has the advantage that the proposed method can be applied to a wide range of signals and images without any restriction. On the other hand, the disadvantage is that it cannot incorporate prior knowledge into the optimization process in cases where such a priori information is available. However, when knowledge about the spectrum of the sparse signal is available (e.g. the signal is bandpass or highpass), one reasonable approach can be to first generate a colored random measurement matrix using the method described in [14], then apply the method proposed in this paper to optimize it and decrease the mutual coherence. This is a context that has to be further studied in the future.

6. Conclusion

In this paper the optimization of the measurement matrix in compressed sensing was addressed. Although random sampling has been widely used so far in this framework, it has recently been shown that optimized projections with smaller mutual coherence with respect to the sparsifying matrix can significantly improve the performance. We proposed a gradient-based alternating optimization method which iteratively optimizes the measurement matrix to decrease the mutual coherence. The advantages are higher performance in sparse recovery, and also the possibility of taking fewer measurements while achieving the same performance level as pure random matrices. We also described a steepest descent method for obtaining an adaptive stepsize to improve the performance. The presented numerical results verify the success of the proposed approach. In addition, the experimental results indicate that the proposed method is computationally less expensive than some current methods in the literature.

Acknowledgments

The authors would like to thank the anonymous reviewers for their thorough remarks and fruitful suggestions.

References

[1] D. Donoho, Compressed sensing, IEEE Transactions on Information Theory 52 (4) (2006) 1289–1306.
[2] E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory 52 (2) (2006) 489–509.
[3] C.E. Shannon, Communication in the presence of noise, Proceedings of the IRE 37 (1) (1949) 10–21.
[4] S. Chen, D.L. Donoho, M.A. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing 20 (1) (1999) 33–61.
[5] J.A. Tropp, A.C. Gilbert, Signal recovery from random measurements via orthogonal matching pursuit, IEEE Transactions on Information Theory 53 (12) (2007) 4655–4666.
[6] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Annals of Statistics 32 (2004) 407–499.
[7] R. Tibshirani, Regression shrinkage and selection via the LASSO, Journal of the Royal Statistical Society (Series B) 58 (1) (1996) 267–288.
[8] E.J. Candès, M.B. Wakin, An introduction to compressive sampling, IEEE Signal Processing Magazine 25 (2) (2008) 21–30.
[9] E.J. Candès, J.K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics 59 (8) (2006) 1207–1223.
[10] M. Elad, Optimized projections for compressed sensing, IEEE Transactions on Signal Processing 55 (12) (2007) 5695–5702.
[11] V. Abolghasemi, S. Sanei, S. Ferdowsi, F. Ghaderi, A. Belcher, Segmented compressive sensing, in: IEEE/SP 15th Workshop on Statistical Signal Processing, SSP 2009, September 2009.
[12] M. Lustig, D. Donoho, J.M. Pauly, Sparse MRI: the application of compressed sensing for rapid MR imaging, Magnetic Resonance in Medicine 58 (6) (2007) 1182–1195.
[13] Z. Wang, G.R. Arce, Variable density compressed image sampling, IEEE Transactions on Image Processing 19 (1) (2010) 264–270.
[14] Z. Wang, G.R. Arce, J.L. Paredes, Colored random projections for compressed sensing, in: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2007, vol. 3, April 2007, pp. III-873–III-876.
[15] J.M. Duarte-Carvajalino, G. Sapiro, Learning to sense sparse signals: simultaneous sensing matrix and sparsifying dictionary optimization, IEEE Transactions on Image Processing 18 (7) (2009) 1395–1408.
[16] M. Aharon, M. Elad, A. Bruckstein, K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation, IEEE Transactions on Signal Processing 54 (11) (2006) 4311–4322.
[17] I.S. Dhillon, R.W. Heath, T. Strohmer, J.A. Tropp, Constructing packings in Grassmannian manifolds via alternating projection, Experimental Mathematics 17 (1) (2008) 9–35.
[18] J.A. Tropp, I.S. Dhillon, R.W. Heath, T. Strohmer, Designing structured tight frames via an alternating projection method, IEEE Transactions on Information Theory 51 (1) (2005) 188–209.
[19] T. Strohmer, R.W. Heath, Grassmannian frames with applications to coding and communication, Applied and Computational Harmonic Analysis 14 (3) (2003) 257–275.
[20] D.L. Donoho, P.B. Stark, Uncertainty principles and signal recovery, SIAM Journal on Applied Mathematics 49 (3) (1989) 906–931.
[21] D.L. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1-minimization, Proceedings of the National Academy of Sciences 100 (5) (2003) 2197–2202.
[22] J.A. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Transactions on Information Theory 50 (2004) 2231–2242.
[23] V. Abolghasemi, S. Ferdowsi, B. Makkiabadi, S. Sanei, On optimization of the measurement matrix for compressive sensing, in: European Signal Processing Conference, EUSIPCO, Aalborg, Denmark, August 2010.
[24] M. Yaghoobi, L. Daudet, M.E. Davies, Parametric dictionary design for sparse coding, IEEE Transactions on Signal Processing 57 (12) (2009) 4800–4810.
[25] K.B. Petersen, M.S. Pedersen, The Matrix Cookbook, Version 20081110, October 2008.
[26] S. Amari, A theory of adaptive pattern classifiers, IEEE Transactions on Electronic Computers EC-16 (3) (1967) 299–307.
[27] V.J. Mathews, Z. Xie, A stochastic gradient adaptive filter with gradient adaptive step size, IEEE Transactions on Signal Processing 41 (6) (1993) 2075–2087.
[28] S.C. Douglas, Generalized gradient adaptive step sizes for stochastic gradient adaptive filters, in: International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1995, vol. 2, May 1995, pp. 1396–1399.
[29] D.L. Donoho, I. Drori, V. Stodden, Y. Tsaig, SparseLab, http://sparselab.stanford.edu/, December 2005.
[30] D.L. Donoho, Y. Tsaig, Fast solution of ℓ1-norm minimization problems when the solution may be sparse, IEEE Transactions on Information Theory 54 (11) (2008) 4789–4812.