An Efficient Method For Robust Projection Matrix Design

arXiv:1609.08281v3 [cs.LG] 6 Sep 2017

Abstract

Our objective is to efficiently design a robust projection matrix Φ for Compressive Sensing (CS) systems applied to signals that are not exactly sparse. The optimal projection matrix is obtained mainly by minimizing the average coherence of the equivalent dictionary. In order to drop the requirement of the sparse representation error (SRE) for a set of training data as in [15] [16], we introduce a novel penalty function independent of a particular SRE matrix. Without requiring training data, we can efficiently design the robust projection matrix and apply it to most CS systems, such as a CS system for image processing with a conventional wavelet dictionary, in which the SRE matrix is generally not available. Simulation results demonstrate the efficiency and effectiveness of the proposed approach compared with state-of-the-art methods. In addition, we demonstrate experimentally on natural images that, under a similar compression rate, a CS system with a learned dictionary in high dimensions outperforms one in low dimensions in terms of reconstruction accuracy. Together with the fact that our proposed method works efficiently in high dimensions, this suggests that a CS system can potentially be implemented beyond the small patches used in sparsity-based image processing.

Keywords: Robust projection matrix, sparse representation error (SRE), high dimensional dictionary, mutual coherence.
which can be addressed by some practical algorithms [18], among which the most popularly utilized are the K-singular value decomposition (K-SVD) algorithm [19] and the method of optimal directions (MOD) [20]. As stated in the previous section, the SRE e_k = x_k − Ψ θ_k is generally not nil. We concatenate all the SREs {e_k} into an N × P matrix

E ≜ X − Ψ Θ

which is referred to as the SRE matrix corresponding to the training dataset {x_k} and the learned dictionary Ψ.

The recent work in [15] [16] attempted to design a robust projection matrix with consideration of the SRE matrix E and proposed to solve

Φ = arg min_{Φ̃} ‖I_L − Ψ^T Φ̃^T Φ̃ Ψ‖_F^2 + λ ‖Φ̃ E‖_F^2    (8)

or

Φ = arg min_{Φ̃, G ∈ H_ξ} ‖G − Ψ^T Φ̃^T Φ̃ Ψ‖_F^2 + λ ‖Φ̃ E‖_F^2    (9)

where H_ξ is the set of relaxed equiangular tight frames (ETFs):

H_ξ = { G : G = G^T, G(i, i) = 1 ∀ i, max_{i ≠ j} |G(i, j)| ≤ ξ }.

Compared to (8), which requires the Gram matrix of the equivalent dictionary to be close to an identity matrix, (9) relaxes the requirement on the coherence of the equivalent dictionary but is much harder to solve. See [15] [16] for more discussion of this issue.

We remark that to ensure the sensing matrix designed by (8) or (9) is robust to the SRE for all signals of interest, the SRE matrix E should be well representative, i.e., we need a sufficient number of training signals x_k. As stated in [15] [16], these methods ((8) and (9)) can be applied naturally when the dictionary is learned by algorithms like K-SVD with plenty of training data {x_k}, since the SRE matrix E is then available without any additional effort. However, this could be prohibitive when a CS system with an analytic dictionary is applied to arbitrary signals (which are still approximately sparse in this dictionary), since there is not a sufficient amount of data available to obtain the SRE matrix E. For example, one may only want to apply the CS system to an arbitrary image with the wavelet dictionary. These methods are also prohibitive for a dictionary trained on large datasets with millions of training samples, and in a dynamic CS system for streaming signals. To train such a dictionary, we have to conduct online algorithms [21] - [23], which typically apply a stochastic gradient method in which each iteration uses a randomly selected tiny part of the training signals, called a mini-batch, instead of the whole dataset to compute the expected gradient. In these cases, additional effort is needed to obtain the SRE matrix E, and it is usually prohibitive to compute and store E for the whole training dataset. All of these situations limit the approach proposed in [15] [16].

Aiming to obtain a neural network that expresses the signals of interest well, an empirical strategy widely used by the deep learning community is to utilize a huge training dataset so that the network can extract more important features and avoid overfitting. Similarly, a dictionary trained on a huge dataset is also expected to contain more features of the represented signals. Simulation results in Section 4 demonstrate that a CS system with such a dictionary and a projection matrix designed on this dictionary yields higher reconstruction accuracy on natural images than one with a dictionary obtained from a small dataset. The recent work in [26] [27] stated that a dictionary learned with larger patches (e.g., 64 × 64) on a huge dataset can better capture features in natural images.² In this paper, we also attempt to experimentally investigate the performance of designing a projection matrix on a high dimensional dictionary. These observations motivate us to develop an efficient method for designing a robust sensing matrix without requiring the SRE matrix E, as it is not easy to obtain in the above two situations.

In the next section, we provide a novel framework to efficiently design a robust sensing matrix; more importantly, it can be applied in situations where the SRE matrix E is not available.

3. A Novel Approach to Projection Matrix Design

In this section, we provide an efficient robust sensing matrix design approach which drops the requirement of training signals and their corresponding SRE matrix E. Our proposed framework is inspired by (8) and (9) from the following two aspects.

3.1. A Novel Framework for Robust Projection Matrix Design

First note that the energy of the SRE, ‖E‖_F^2, is usually very small, since the learned sparsifying dictionary is assumed to sparsely represent a signal well, as in (2). Conversely, if ‖E‖_F^2 is very large, it indicates that the dictionary is not well designed for this class of signals, and it is possible that this class of signals cannot be recovered from the compressive measurements no matter what projection matrix is utilized. It follows from the norm-consistency property that

‖Φ E‖_F ≤ ‖Φ‖_F ‖E‖_F    (10)

which implies, informally, that a sensing matrix with a smaller ‖Φ‖_F yields a smaller projected SRE ‖Φ E‖_F.

Also, as illustrated before, since the amount of training data should be sufficient to represent the class of targeted signals, the energy in the corresponding SRE matrix E should spread out over all of its elements. In other words, one can view the expected SRE as additive white Gaussian noise. In this situation, we have the following result.

² The dimension of a dictionary in such a case becomes high compared with the moderate dictionary sizes shown in [19]. In fact, a high dimensional dictionary in this paper means a dictionary obtained by training on represented signals of larger size.
Lemma 1. Suppose E(:, k) = e_k, ∀ k = 1, …, P, are i.i.d. Gaussian random vectors with zero mean and covariance σ² I. Then for any Φ ∈ R^{M×N}, we have

E[ ‖Φ E‖_F^2 ] = P σ² ‖Φ‖_F^2    (11)

where E denotes the expectation operator. Moreover, as the number of training samples P approaches ∞, ‖Φ E‖_F^2 / P converges in probability and almost surely to σ² ‖Φ‖_F^2. In particular,

√P ( ‖Φ E‖_F^2 / P − σ² ‖Φ‖_F^2 ) →d N(0, 2σ⁴ ‖Φ Φ^T‖_F^2)    (12)

where N(µ, ς) denotes the Gaussian distribution with mean µ and variance ς, and →d means convergence in distribution.

Proof. For each k, we first define d_k = Φ e_k. Since e_k ∼ N(0, σ² I), we have d_k ∼ N(0, σ² Φ Φ^T). Let Φ Φ^T = Q Λ Q^T be an eigendecomposition of Φ Φ^T, where Λ is an M × M diagonal matrix with the non-negative eigenvalues λ_1, …, λ_M along its diagonal. We have

‖d_k‖_2^2 = ‖Q^T d_k‖_2^2

and

Q^T d_k ∼ N(0, σ² Λ).

For convenience, we define the new random variables c = Q^T d_k and b_i = c²(i) / (λ_i σ²), i = 1, …, M. It is clear that b_1, b_2, …, b_M are independent random variables with χ²_1 distribution, the chi-squared distribution with 1 degree of freedom.

Now we compute the mean of ‖Φ e_k‖_2^2:

E[ ‖Φ e_k‖_2^2 ] = E[ ‖d_k‖_2^2 ] = E[ ‖c‖_2^2 ] = Σ_{i=1}^M λ_i σ² E[b_i] = Σ_{i=1}^M λ_i σ² = trace(σ² Φ Φ^T) = σ² ‖Φ‖_F^2    (13)

where we utilize E[χ²_1] = 1. The variance of ‖Φ e_k‖_2^2 is given by

Var[ ‖Φ e_k‖_2^2 ] = Var[ ‖d_k‖_2^2 ] = Var[ ‖c‖_2^2 ] = Σ_{i=1}^M λ_i² σ⁴ Var[b_i] = 2σ⁴ ‖Φ Φ^T‖_F^2    (14)

where we utilize Var[χ²_1] = 2 and Σ_i λ_i² = ‖Φ Φ^T‖_F^2.

It follows from (13) and (14) that ‖Φ e_1‖_2^2, …, ‖Φ e_P‖_2^2 is a sequence of independent and identically distributed random variables with expected value σ² ‖Φ‖_F^2 and variance 2σ⁴ ‖Φ Φ^T‖_F^2. Thus, by the law of large numbers [29], the average ‖Φ E‖_F^2 / P converges in probability and almost surely to the expected value σ² ‖Φ‖_F^2 as P → ∞. Finally, the central limit theorem [29] establishes that as P approaches infinity, the random variable √P ( ‖Φ E‖_F^2 / P − σ² ‖Φ‖_F^2 ) converges in distribution to a normal N(0, 2σ⁴ ‖Φ Φ^T‖_F^2). □

In words, Lemma 1 indicates that as the number of training samples approaches infinity, ‖Φ E‖_F^2 is proportional to ‖Φ‖_F^2. Inspired by (10)-(12), it is expected that, without any training signals or their corresponding SRE matrix E, a robust projection matrix can be obtained by solving the following problem:

Φ = arg min_{Φ̃} f(Φ̃) ≡ ‖I_L − Ψ^T Φ̃^T Φ̃ Ψ‖_F^2 + λ ‖Φ̃‖_F^2    (15)

or

Φ = arg min_{Φ̃, G ∈ H_ξ} f(Φ̃, G) ≡ ‖G − Ψ^T Φ̃^T Φ̃ Ψ‖_F^2 + λ ‖Φ̃‖_F^2    (16)

Here, with abuse of notation, we use f to denote the objective function in both (15) and (16). However, it should be clear from the context, as we always use f(Φ̃) to represent the one in (15) and f(Φ̃, G) to represent the one in (16). Since more training samples better represent the signals of interest and the SRE, (12) indicates that the sensing matrices obtained by (15) and (16) are more robust to the SRE than the ones obtained by (8) and (9). This is demonstrated by experiments in Section 4. Numerical algorithms to solve (15) and (16) are presented in the following section.

3.2. Efficient Algorithms for Solving (15) and (16)

Note that f(Φ̃) is a special case of f(Φ̃, G) with G = I_L. Thus, we first consider solving

min_{Φ̃} f(Φ̃, G) = ‖G − Ψ^T Φ̃^T Φ̃ Ψ‖_F^2 + λ ‖Φ̃‖_F^2    (17)
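The mean statement (11) of Lemma 1, and the concentration of ‖Φ E‖_F^2 / P around σ² ‖Φ‖_F^2, are easy to sanity-check by Monte Carlo; a sketch (the sizes, σ, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, P, sigma = 8, 24, 20000, 0.5

Phi = rng.standard_normal((M, N))
# columns of E are i.i.d. N(0, sigma^2 I), as assumed in Lemma 1
E = sigma * rng.standard_normal((N, P))

empirical = np.linalg.norm(Phi @ E, 'fro') ** 2 / P
predicted = sigma ** 2 * np.linalg.norm(Phi, 'fro') ** 2

# by (11) and the law of large numbers, the two values agree for large P
rel_err = abs(empirical - predicted) / predicted
print(rel_err)
```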
The Conjugate-Gradient (CG) method [30] is utilized to solve (17).³ The gradient of f(Φ̃, G) with respect to Φ̃ is given as follows:

∇_{Φ̃} f(Φ̃, G) = 2λ Φ̃ − 4 Φ̃ Ψ G Ψ^T + 4 Φ̃ Ψ Ψ^T Φ̃^T Φ̃ Ψ Ψ^T    (19)

After obtaining the gradient of f(Φ̃, G), the toolbox minFunc⁴ [25] is utilized to solve (17) with the CG method. We note that the gradient-based method only involves simple matrix multiplications in (19), without requiring an SVD or a matrix inversion. Hence it is also suitable for designing the projection matrix for a CS system working on high dimensional signals.

We now turn to solving (16), which has two variables, Φ̃ and G ∈ H_ξ. A widely used strategy for such problems is alternating minimization [10] [11] [15] [16]. The main idea behind alternating minimization for (16) is to keep one variable constant (say Φ̃) and optimize over the other variable (say G). Once G is fixed, as explained before, we utilize the CG method to solve (17). On the other hand, when Φ̃ is fixed, the solution to min_G f(Φ̃, G) is obtained simply by projecting the Gram matrix of the equivalent dictionary onto the set H_ξ. The main steps of the algorithm are outlined in Algorithm 1.

Algorithm 1
Initialization: Set k = 1, set Φ_0 to a random matrix, and set the number of iterations Iter.
Step I: Set G̃_k = Ψ^T Φ_{k−1}^T Φ_{k−1} Ψ and then project it onto the set H_ξ:

G_k(i, j) = 1,                        i = j,
            G̃_k(i, j),               i ≠ j, |G̃_k(i, j)| ≤ ξ,
            ξ · sign(G̃_k(i, j)),     i ≠ j, |G̃_k(i, j)| > ξ,

where sign(·) is the sign function.
Step II: Solve Φ_k = arg min_{Φ̃} f(Φ̃, G_k) with CG.
If k < Iter, set k = k + 1 and go to Step I. Otherwise, terminate the algorithm and output Φ_Iter.

Remarks:
• It is clear that this approach is independent of training data and can be utilized for most CS systems, as long as the sparsifying dictionary Ψ is given.
• Even in the case where the SRE matrix E is available, it is much easier and more efficient to solve (15) than (8), since typically the number of columns in E is dramatically greater than the sizes of Ψ and Φ, i.e., P ≫ M, N, L.
• Simulation results with synthetic data and natural images (where the SRE matrix E is available) show that the proposed method yields performance comparable to or better than the methods in [15] [16] in terms of SRA. Moreover, the experiments on natural images show that designing a projection matrix on a given dictionary learned with a large dataset or with high-dimensional training signals can improve the SRA significantly at the same compression rate M/N. However, it requires a great deal of memory to store the SRE matrix E for either a large dataset or high-dimensional training data.

4. Simulation Results

In this section, we perform a set of experiments on synthetic data and natural images to demonstrate the performance of the CS system with a projection matrix designed by the proposed methods. For convenience, the corresponding CS systems are denoted by CS_MT, with Φ̃ obtained via (15), and CS_MT−ETF, with Φ̃ obtained via (16); they are compared with the following CS systems: CS_randn with a random projection matrix, CS_LH with the sensing matrix obtained via (8) [15], CS_LH−ETF with the sensing matrix obtained via (9) [15], and CS_DCS [28]. It was first proposed in [28] that simultaneously optimizing Φ and Ψ for a CS system results in better performance in terms of SRA. In the sequel, we also examine this strategy on natural images, and the corresponding CS system is denoted by CS_S−DCS.⁵ For simplicity, the parameter ξ in H_ξ is set to the Welch bound in the following experiments.

The SRA is evaluated in terms of the peak signal-to-noise ratio (PSNR) [5]

ρ_psnr ≜ 10 × log10( (2^r − 1)² / ρ_mse ) dB

with r = 8 bits per pixel. We also utilize the measure ρ_mse:

ρ_mse ≜ (1 / (N × P)) Σ_{k=1}^P ‖x̃_k − x_k‖_2^2

where x_k is the original signal, x̃_k = Ψ θ̃_k stands for the reconstructed signal with θ̃_k the solution of (4), and P is the number of patches in an image or in the testing data.

³ We note that both of the methods shown in [15] [16] for solving (17) need to calculate the inverse of Ψ Ψ^T. In practice, however, the learned dictionary is sometimes ill-conditioned, which may cause numerical instability if their methods are applied directly. Thus, since global convergence of many local search algorithms for solving similar low-rank optimizations is guaranteed in [24], CG is chosen to solve (17). Obviously, if the aforementioned problem does not occur in practical cases, the methods in [15] [16] can be used to address (15). Moreover, we will show that CG and the methods shown in [15] [16] yield similar solutions in the following experiments.
⁴ We note that minFunc is a stable toolbox that can be efficiently applied to problems with millions of variables.
⁵ In our experiments, the coupling factor utilized in CS_S−DCS is set to 0.5, which is the best value in our setting. Generally speaking, CS_S−DCS should have the best performance in terms of SRA because it optimizes the projection matrix and the dictionary simultaneously. Thus, the performance of CS_S−DCS serves as an indicator of the best performance achievable by CS systems that only optimize the projection matrix.
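Algorithm 1, together with the gradient (19), admits a compact implementation. The following is an illustrative NumPy sketch: backtracking gradient descent stands in for the CG/minFunc inner solver used in the paper, and the default values of λ, ξ, and the iteration counts are arbitrary choices, not the paper's settings.

```python
import numpy as np

def f_val(Phi, Psi, G, lam):
    """Objective (17): ||G - Psi^T Phi^T Phi Psi||_F^2 + lam * ||Phi||_F^2."""
    Gram = Psi.T @ Phi.T @ Phi @ Psi
    return np.linalg.norm(G - Gram, 'fro') ** 2 + lam * np.linalg.norm(Phi, 'fro') ** 2

def grad_f(Phi, Psi, G, lam):
    """Gradient (19) of f with respect to Phi."""
    D = Phi @ Psi
    return 2 * lam * Phi - 4 * D @ G @ Psi.T + 4 * D @ (D.T @ D) @ Psi.T

def project_H(Gram, xi):
    """Step I of Algorithm 1: project a Gram matrix onto H_xi
    (unit diagonal, off-diagonal entries shrunk to [-xi, xi])."""
    G = np.clip(Gram, -xi, xi)
    np.fill_diagonal(G, 1.0)
    return G

def design_projection(Psi, M, lam=0.5, xi=0.2, outer=10, inner=100, lr=1e-4, seed=0):
    """Alternating minimization for (16), following Algorithm 1."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((M, Psi.shape[0]))
    for _ in range(outer):
        G = project_H(Psi.T @ Phi.T @ Phi @ Psi, xi)    # Step I
        for _ in range(inner):                          # Step II: descent on (17)
            g = grad_f(Phi, Psi, G, lam)
            t, fv = lr, f_val(Phi, Psi, G, lam)
            while f_val(Phi - t * g, Psi, G, lam) > fv and t > 1e-12:
                t *= 0.5                                # backtracking line search
            if f_val(Phi - t * g, Psi, G, lam) <= fv:
                Phi = Phi - t * g
    return Phi
```

Since Step I is the exact minimizer of f over G ∈ H_ξ for fixed Φ̃, and the guarded descent step never increases f, the objective is monotonically non-increasing across the alternating iterations.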
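The full measure/recover/evaluate loop (compressive measurements y_k = Φ x_k, OMP sparse coding, and the error measure ρ_mse defined above) can be sketched on synthetic data as follows. The OMP below is a textbook implementation and all parameter values are illustrative, not the exact settings or solvers used in the experiments:

```python
import numpy as np

def omp(D, y, K):
    """Orthogonal Matching Pursuit: greedy K-sparse coding of y in dictionary D."""
    support = []
    resid = y.astype(float).copy()
    for _ in range(K):
        support.append(int(np.argmax(np.abs(D.T @ resid))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        resid = y - D[:, support] @ coef
    theta = np.zeros(D.shape[1])
    theta[support] = coef
    return theta

rng = np.random.default_rng(0)
M, N, L, K, P = 20, 60, 80, 4, 200
sigma_e = 0.01

Psi = rng.standard_normal((N, L))
Psi /= np.linalg.norm(Psi, axis=0)          # unit l2-norm columns
Phi = rng.standard_normal((M, N))           # random projection (as in CS_randn)

# K-sparse coefficients with random supports and Gaussian non-zero entries
Theta = np.zeros((L, P))
for k in range(P):
    Theta[rng.choice(L, K, replace=False), k] = rng.standard_normal(K)

X = Psi @ Theta + sigma_e * rng.standard_normal((N, P))   # approximately K-sparse signals
Y = Phi @ X                                               # compressive measurements
D = Phi @ Psi                                             # equivalent dictionary

X_hat = np.column_stack([Psi @ omp(D, Y[:, k], K) for k in range(P)])
rho_mse = np.sum((X_hat - X) ** 2) / (N * P)              # the measure rho_mse
print(rho_mse)
```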
The synthetic data for training and testing are obtained as follows. A set of 2P K-sparse vectors {θ_k ∈ ℜ^L} is generated as the sparse coefficients, where each non-zero element of {θ_k} is randomly positioned and drawn from a Gaussian distribution with zero mean and unit variance. The set of signal vectors {x_k} is produced with x_k = Ψ θ_k + e_k ≜ x_k^(0) + e_k, ∀ k, where Ψ is the given dictionary and e_k is random noise with Gaussian distribution of zero mean and variance σ_e², yielding different signal-to-noise ratios (SNR) (in dB) of the signals. Clearly, x_k^(0) is exactly K-sparse in Ψ, while x_k is approximately K-sparse in Ψ.

Denote X = X^(0) + ∆ as the signal matrix of dimension N × 2P, where X^(0)(:, k) = x_k^(0) and ∆(:, k) = e_k. We use the SRE matrix E = ∆(:, 1 : P) in (8) and (9), whose solutions are used for CS_LH and CS_LH−ETF, respectively. The data X(:, P + 1 : 2P) are utilized for testing the CS systems. The measurements {y_k} are obtained by y_k = Φ X(:, P + k), ∀ k ∈ (0, P], where Φ is the projection matrix of the CS system. For simplicity, OMP is chosen to solve the sparse coding problem throughout the experiments.

With the synthetic data, we conduct three sets of experiments to demonstrate the performance of our proposed framework for robust projection matrix design, i.e., (15) and (16). In these three sets of experiments, we respectively show the convergence of the CG method, the effect of λ, and the signal recovery accuracy of the proposed projection matrices CS_MT and CS_MT−ETF versus different SNRs of the signals.

1) Convergence Analysis: Let M = 20, N = 60, L = 100 and K = 4. We utilize CG to solve (15). We note that a well-conditioned random dictionary is chosen, and thus we also compute the closed-form solution shown in [15] for (15). The objective value obtained by the closed-form solution is denoted by f* and is compared with the CG method. The evolution of f(Φ) for different λ is shown in Figure 1. We note that different λ results in different functions f(Φ) and hence different f*. We observe global convergence of the CG method for solving (15) with all the choices of λ.

[Figure 1: Evolution of f(Φ) in (15) for different values of λ (λ = 0, 0.2, 0.5, 0.7, 0.9), plotting (f(Φ_k) − f*)/f* versus iteration, where the sparsity level is set to K = 4. Here k represents the iteration of the CG method.]

2) The Choice of λ: With M = 20, N = 60, L = 80, K = 4 and SNR = 15 dB, we check the effect of the trade-off parameter λ in terms of ρ_mse for CS_MT and CS_MT−ETF. The λ is chosen from 0 to 2 with step size 0.01. The evaluation of ρ_mse versus different λ is depicted in Figure 2.

[Figure 2: Performance evaluation: ρ_mse versus the values of λ for CS_MT and CS_MT−ETF, where the sparsity level is set to K = 4 and SNR = 15 dB.]

Remark 1:
• As seen from Figure 2, different λ yields different performance in terms of ρ_mse for this practical situation where the SNR is 15 dB. It is clear that a proper choice of λ results in significantly better performance than other values, especially for CS_MT−ETF. Clearly, the advantage of the proposed method is shown by comparing the case λ = 0 with other values of λ, as the former corresponds to the traditional approaches which do not take the SRE into account. In the sequel, for simplicity, we search for the best λ (with which the CS systems attain the minimal ρ_mse for the test data) within (0, 1] for each experimental setting.
• According to this experiment, if λ is well chosen, CS_MT−ETF has better performance than CS_MT in terms of ρ_mse. However, the performance of CS_MT−ETF is more sensitive to λ than that of CS_MT. We will show in the next experiment that CS_MT−ETF outperforms CS_MT on synthetic data when the SNR is not too small. However, for natural images, which have relatively large SRE, CS_MT always has better performance than CS_MT−ETF. This phenomenon was also observed for CS_LH and CS_LH−ETF in [15] [16]. Thus, we only consider the performance of CS_MT and CS_LH for the natural images in the next section.

3) Signal Recovery Accuracy Evaluation: With M = 20, N =
60, L = 80, K = 4 and P = 1000, we compare our CS systems CS_MT and CS_MT−ETF with the other CS systems for SNR varying from 5 to 45 dB. Figure 3 displays the signal reconstruction error ρ_mse versus SNR for all six CS systems.

[Figure 3: signal reconstruction error ρ_mse versus SNR for the six CS systems: CS_randn, CS_DCS, CS_LH, CS_LH−ETF, CS_MT, CS_MT−ETF.]

learned on a much larger training dataset. The performance of CS systems with a higher dimensional dictionary is given in Experiment C. We observe that a CS system with a higher dimensional dictionary and a projection matrix designed by our proposed algorithm yields better SRA under the same compression rate. Both training and testing datasets used in these three sets of experiments are extracted as follows from the LabelMe database [31]. Note that Data I is extracted with small patches and Data II is obtained by sampling larger patches for the third experiment.

Training Data I: A set of 8 × 8 non-overlapping patches is obtained by randomly extracting 400 patches from each image in the whole LabelMe training dataset. We arrange each patch of

Experiment B: large dataset and low dimensional dictionary

In this set of experiments, we first learn a dictionary on large-scale training samples, i.e., Training Data I, and then design the projection matrices with the learned dictionary. As discussed in the previous section, the large-scale training dataset makes it inefficient or even impossible to compute the SRE matrix E. Therefore, it is inefficient to utilize the methods in [15] [16], as they require the SRE matrix E. A similar reason holds for CS_S−DCS. Fortunately, the following results show that the proposed CS system CS_MT performs comparably to CS_LH.

The online dictionary learning algorithm in [22] [23] is chosen to train the sparsifying dictionary on the whole of Training Data I. For a fair comparison, we calculate the SRE matrix E off-line for CS_LH in this experiment.⁶ The same M, N, L, K as in Experiment A are used in this experiment. λ = 0.9 and λ = 1e−3 are selected for CS_MT and CS_LH, respectively. We note that the choice of λ for CS_LH is very sensitive to E. This is because the two terms ‖I_L − Ψ^T Φ̃^T Φ̃ Ψ‖_F^2 and ‖Φ̃ E‖_F^2 in (8) for CS_LH have different physical meanings and, more importantly, the second term ‖Φ̃ E‖_F^2 increases when we have a larger number of training data, while the first term is independent of the training data. Thus, we need to decrease λ for CS_LH when we increase the number of training data.

Remark 4:
• As shown in Table 3, benefiting from large-scale training samples, the performance of both CS_LH and CS_MT is improved compared with that in Table 2. Moreover, we also observe that CS_MT performs similarly to CS_LH. It is also of interest to note that the PSNR for CS_MT in Table 3 is higher than that for CS_S−DCS in Table 2 for most of the tested images. This suggests that if the dictionary and the projection matrix are simultaneously optimized by an online algorithm with a large dataset, the performance of the corresponding CS system can be further improved, since joint optimization (CS_S−DCS) is expected to have better performance than only optimizing the projection matrix with a given dictionary (CS_MT) under the same compression rate.
• Compared with CS_MT, CS_LH needs at least O(PNK² L² log L + PNLK⁴ + PN²) more computational time. In the next set of experiments, we will show the advantage of designing the projection matrix on a high dimensional dictionary. With N and L increasing, the efficiency advantage of the proposed method CS_MT becomes more distinct.

Experiment C: large dataset and high dimensional dictionary

Inspired by the work in [26], we attempt to design the projection matrix on a high dimensional dictionary in this set of experiments. The reason to utilize a high dimensional dictionary is as follows. The sparse representation of a natural image X can be written as

X = X̃ + E,    X̃ ≜ Ψ Θ

where E is the sparse representation error.⁷ We first recover Θ by solving a set of problems (3) and then take X̃ = Ψ Θ as the recovered image. It is clear that no matter what projection matrix is utilized, the best we can obtain is X̃ instead of X. Thus, with a dictionary which can capture more information of the training dataset and better represent X with X̃, the corresponding CS system is expected to yield a higher SRA. As stated in [26], training the dictionary with larger patches results in smaller sparse representation errors for natural images. However, to train a dictionary on larger patches, we have to train on a large-scale dataset to better represent the signals of interest. This demonstrates the efficiency of the proposed method for designing a robust projection matrix on a high dimensional dictionary, as this method drops the requirement of the SRE matrix E, which is not only high dimensional but also large-scale.

The parameters M, N, L, K and λ are set to 80, 256, 800, 16 and 0.5, respectively. Due to the fact that CS_MT has similar performance to CS_LH and the choice of λ for CS_LH is very sensitive to E, we omit the performance of CS_LH in this experiment. The simulation results are presented in Table 4. In order to demonstrate the visual effect clearly, two images Lena
Table 1: Performance Evaluated with Different Measures for Each of the Five Systems (M = 20, N = 64, L = 100, K = 4).

            ‖I_L − G‖²_F   µ(D)    µ_av(D)   ‖Φ‖²_F        ‖ΦE‖²_F
CS_randn    5.30 × 10⁵     0.951   0.384     1.25 × 10³    4.86 × 10³
CS_DCS      7.82 × 10⁴     0.999   0.695     3.41 × 10¹    1.30 × 10²
CS_S−DCS    8.00 × 10¹     0.857   0.326     6.24 × 10⁰    3.48 × 10¹
CS_LH       8.02 × 10¹     0.859   0.330     3.03 × 10⁷    3.07 × 10¹
CS_MT       8.00 × 10¹     0.848   0.331     6.59 × 10⁰    3.94 × 10¹
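The measures reported in Table 1 can be computed directly from Φ, Ψ, and (when available) E. A sketch, taking µ(D) as the maximum absolute off-diagonal entry of the column-normalized Gram matrix and using the plain mean of those entries as a proxy for µ_av(D) (the paper's exact definition of the average coherence may differ):

```python
import numpy as np

def table1_measures(Phi, Psi, E=None):
    """Coherence and energy measures for the equivalent dictionary D = Phi @ Psi."""
    D = Phi @ Psi
    Dn = D / np.linalg.norm(D, axis=0)          # column-normalized equivalent dictionary
    G = Dn.T @ Dn                               # Gram matrix
    L = G.shape[0]
    off = np.abs(G[~np.eye(L, dtype=bool)])     # off-diagonal entries
    return {
        'gram_fit': np.linalg.norm(np.eye(L) - G, 'fro') ** 2,   # ||I_L - G||_F^2
        'mu': off.max(),                                         # mutual coherence mu(D)
        'mu_av': off.mean(),                                     # proxy for mu_av(D)
        'phi_energy': np.linalg.norm(Phi, 'fro') ** 2,           # ||Phi||_F^2
        'proj_sre': None if E is None
                    else np.linalg.norm(Phi @ E, 'fro') ** 2,    # ||Phi E||_F^2
    }
```

By the Cauchy-Schwarz inequality, 0 ≤ µ_av ≤ µ ≤ 1 for any column-normalized dictionary.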
Table 2: Statistics of ρ_psnr for Ten Images Processed with M = 20, N = 64, L = 100 for K = 4. The highest ρ_psnr is marked in bold.

            Lena   Elaine  Man    Barbara  Cameraman  Boat   Peppers  House  Bridge  Mandrill  Average
CS_randn    29.01  29.23   27.65  22.46    23.15      26.57  24.71    28.34  26.49   20.61     25.82
CS_DCS      30.50  30.69   28.93  24.11    24.31      27.66  26.50    29.71  27.71   21.87     27.20
CS_S−DCS    33.18  32.61   31.52  25.96    26.75      30.36  29.71    33.24  30.20   24.09     29.76
CS_LH       32.38  31.77   30.69  25.31    25.90      29.52  28.83    32.56  29.26   23.14     28.94
CS_MT       32.47  32.24   30.83  25.36    25.99      29.67  28.92    32.31  29.43   23.32     29.05
Table 3: Statistics of ρ_psnr for Ten Images Processed with M = 20, N = 64, L = 100 for K = 4. The dictionary is trained on a large dataset. The highest ρ_psnr is marked in bold.

            Lena   Elaine  Man    Barbara  Cameraman  Boat   Peppers  House  Bridge  Mandrill  Average
CS_randn    30.54  29.88   28.69  22.52    23.94      27.48  26.75    30.31  27.21   20.93     26.83
CS_DCS      30.20  29.91   28.57  23.43    23.89      27.33  26.51    30.06  27.31   21.36     26.86
CS_S−DCS    −      −       −      −        −          −      −        −      −       −         −
CS_LH       33.92  32.78   31.88  25.73    26.85      30.60  30.17    32.24  30.13   23.87     29.82
CS_MT       33.91  32.81   31.88  25.71    26.87      30.62  30.14    33.54  30.16   23.88     29.95
Table 4: Statistics of ρ_psnr for Ten Images Processed with M = 80, N = 256, L = 800 for K = 16. The highest ρ_psnr is marked in bold.

            Lena   Elaine  Man    Barbara  Cameraman  Boat   Peppers  House  Bridge  Mandrill  Average
CS_randn    30.74  29.70   28.82  22.76    23.65      27.38  27.14    30.77  26.95   20.77     26.87
CS_DCS      29.82  29.09   27.94  22.99    23.15      26.56  25.90    29.51  26.57   20.85     26.24
CS_S−DCS    −      −       −      −        −          −      −        −      −       −         −
CS_LH       −      −       −      −        −          −      −        −      −       −         −
CS_MT       34.41  32.96   32.35  26.02    27.18      30.97  30.74    34.33  30.31   24.01     30.33
and Mandrill are shown in Figs. 4 and 5, respectively. For a clear comparison, we choose the projection matrices and corresponding dictionaries which yield the highest average ρ_psnr from Table 2 to Table 4 in Figs. 4 and 5.

Remark 5:
• We observe that for the CS systems CS_randn and CS_DCS, designing the projection matrix on a high dimensional dictionary results in performance similar to that shown in Experiments A and B at the same compression rate M/N. Moreover, CS_DCS has lower ρ_psnr in Experiments B and C than in Experiment A. However, the proposed CS system CS_MT has increasing ρ_psnr from Experiment A to Experiment C. This indicates the effectiveness of the proposed method for a CS system with a higher dimensional dictionary.
• We also investigate the influence of the parameters M, L, K on the above-mentioned CS systems. The simulation results on Testing Data II are given in Figs. 6 to 8. As can be observed, the proposed CS system CS_MT has the highest ρ_psnr among the three CS systems.
• The recent work in [33] states that it is possible to train a dictionary on millions of training signals whose dimension is also more than one million. The proposed method can be utilized to design a robust projection matrix on such high dimensional dictionaries, since it gets rid of the requirement of the SRE matrix E. Note that in this case, more effort for efficiently solving (15) is needed. A full investigation of this direction belongs to future work.

Three sets of experiments on natural images were conducted to illustrate the effectiveness and efficiency of the proposed framework in Section 3. A dictionary trained on a larger dataset can better represent the signals, and the corresponding CS system yields better performance in terms of SRA. Additionally, a high dimensional dictionary has more freedom to represent the signals of interest. The CS system with a high dimensional dictionary and a projection matrix obtained by the proposed
[Figure 4: Lena and its reconstructed images from the corresponding CS systems. (a) The original. (b) CS_randn (Experiment B). (c) CS_DCS (Experiment A). (d) CS_LH (Experiment B). (e) CS_S−DCS (Experiment A). (f)-(h) CS_MT (from Experiment A to Experiment C). The corresponding ρ_psnr can be found in Tables 2 to 4.]
[Figure 5: Mandrill and its reconstructed images from the corresponding CS systems. (a) The original. (b) CS_randn (Experiment B). (c) CS_DCS (Experiment A). (d) CS_LH (Experiment B). (e) CS_S−DCS (Experiment A). (f)-(h) CS_MT (from Experiment A to Experiment C). The corresponding ρ_psnr can be found in Tables 2 to 4.]
method results in higher ρpsnr. However, both cases need to train the dictionary on a large-scale training dataset, making it inefficient or even impossible to compute the SRE matrix E. One of the main contributions of this paper is a new framework that is independent of the SRE matrix E.

Figure 6: The statistic ρpsnr versus M on Testing Data II for fixed N = 256, L = 800 and K = 16 (curves: CSrandn, CSDCS and CSMT).

Figure 7: The statistic ρpsnr versus K on Testing Data II for fixed N = 256, M = 80 and L = 800 (curves: CSrandn, CSDCS and CSMT).

5. Conclusion

This paper considers the problem of designing a robust projection matrix for signals that are not exactly sparse. A novel cost function is proposed that decreases the influence of the SRE on the measurements and at the same time is independent of training data and the corresponding SRE matrix (the independence of training data saves computation in practical design). As shown in Lemma 1, discarding the SRE matrix in the design procedure is reasonable, as it is equivalent to the case of an infinite number of training samples. We thus utilize ‖Φ‖²_F as a surrogate for the projected SRE ‖ΦE‖²_F to design the sensing matrices. The performance of designing projection matrices with dictionaries either learned on a large-scale training dataset or of high dimension is experimentally examined. The simulation results on synthetic data and natural images demonstrate the effectiveness and efficiency of the proposed approach. It is of interest to note that the proposed method yields better performance as the dimension of the dictionary increases, which surprisingly is not true for the other methods.

Our proposed framework for designing robust sensing matrices, which shares a similar structure to that in [15] [16], simultaneously minimizes the surrogate of the sparse representation error (SRE) and the mutual coherence of the CS system. Thus extra effort is needed (as in Figure 2) to find an optimal λ that well balances these two terms. An ongoing research direction is to develop a new framework that does not require balancing the tradeoff between minimizing the mutual coherence and decreasing the projected SRE.

Acknowledgment

This research is supported in part by ERC Grant agreement no. 320649, and in part by the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI). The code reproducing the experiments in this paper can be downloaded through the link https://github.com/happyhongt/

References

[1] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, pp. 489-509, Feb. 2006.
[2] E. J. Candès and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies," IEEE Trans. Inf. Theory, vol. 52, pp. 5406-5425, Dec. 2006.
[3] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, pp. 1289-1306, Apr. 2006.
[4] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag., vol. 25, pp. 21-30, Mar. 2008.
[5] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer Science & Business Media, 2010.
[6] Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications, Cambridge University Press, May 2012.
[7] M. Zibulevsky and M. Elad, "ℓ1-ℓ2 optimization in signal and image processing," IEEE Signal Process. Mag., vol. 27, pp. 76-88, May 2010.
[8] D. Barchiesi, and M. D. Plumbley, “Learning incoherent dictionaries for
sparse approximation using iterative projections and rotations,” IEEE
Trans. Signal Process., vol. 61, pp. 2055-2065, Apr. 2013.
[9] G. Li, Z. Zhu, H. Bai, and A. Yu, “A new framework for designing in-
coherent sparsifying dictionaries,” in IEEE Conf. Acous., Speech, Signal
Process.(ICASSP), pp. 4416-4420, 2017.
[10] M. Elad, “Optimized projections for compressed sensing,” IEEE Trans.
Signal Process., vol. 55, pp. 5695-5702, Dec. 2007.
[11] V. Abolghasemi, S. Ferdowsi, and S. Sanei, “A gradient-based alternat-
ing minimization approach for optimization of the measurement matrix
in compressive sensing,” Signal Process., vol. 94, pp. 999-1009, Apr.
2012.
[12] W.-S. Lu and T. Hinamoto, “Design of projection matrix for compres-
sive sensing by nonsmooth optimization,” IEEE International Sympo-
sium Circuits and Systems (ISCAS), pp. 1279 - 1282, Jun. 2014.
[13] S. Li, Z. Zhu, G. Li, L. Chang, and Q. Li, “Projection matrix optimiza-
tion for block-sparse compressive sensing,” IEEE Conf. Signal Process.,
Communicaton and Computation (ICSPCC), Aug. 2013.
[14] G. Li, Z. H. Zhu, D. H. Yang, L. P. Chang, and H. Bai, “On projec-
tion matrix optimization for compressive sensing systems,” IEEE Trans.
Signal Process., vol. 61, pp. 2887-2898, Jun. 2013.
[15] G. Li, X. Li, S. Li, H. Bai, Q. Jiang and X. He, "Designing robust sensing
matrix for image compression,” IEEE Trans. Image Process., vol. 24, pp.
5389-5400, Dec. 2015.
[16] T. Hong, H. Bai, S. Li, and Z. Zhu, “An efficient algorithm for designing
projection matrix in compressive sensing based on alternating optimiza-
tion,” Signal Process., vol. 125, pp. 9-20, Aug. 2016.
[17] Z. Zhu and M. B. Wakin, “Approximating sampled sinusoids and multi-
band signals using multiband modulated DPSS dictionaries,” J. Fourier
Analysis Appl., pp. 1-40, Aug. 2016.
[18] I. Tosic and P. Frossard, “Dictionary Learning,” IEEE Signal Process.
Mag., vol. 28, pp. 27-38, Mar. 2011.
[19] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for
designing overcomplete dictionaries for sparse representation,” IEEE
Trans. Signal Process., vol. 54, pp. 4311-4322, Nov. 2006.
[20] K. Engan, S. O. Aase and J. H. Husøy, "Method of optimal directions for frame design," Proc. IEEE Int. Conf. Acoust., Speech, Signal
Process.(ICASSP), vol. 5, pp. 2443-2446, Mar. 1999.
[21] L. Bottou, "Online algorithms and stochastic approximations," Online
Learning and Neural Networks, Cambridge Univ. Press, 1998.
[22] J. Mairal, F. Bach, J. Ponce and G. Sapiro, “Online dictionary learning
for sparse coding," Proceedings of the 26th International Conference on Machine Learning (ICML), pp. 689-696, 2009.
[23] J. Mairal, F. Bach, J. Ponce and G. Sapiro, “Online learning for matrix
factorization and sparse coding," Journal of Machine Learning Research, vol. 11,
pp. 19-60, Jan. 2010.
[24] Z. Zhu, Q. Li, G. Tang, and M. B. Wakin, “Global Optimality in Low-
rank Matrix Optimization,” arXiv preprint, arXiv:1702.07945, 2017.
[25] M. Schmidt, "minFunc: unconstrained differentiable multivariate optimization in Matlab," https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html, 2005.
[26] J. Sulam, B. Ophir, M. Zibulevsky and M. Elad, “Trainlets: dictionary
learning in high dimensions,” IEEE Trans. Signal Process., vol. 64, pp.
3180-3193, Jun. 2016.
[27] J. Sulam and M. Elad, “Large inpainting of face images with trainlets,”
IEEE Signal Processing Letters, vol. 23, pp. 1839-1843, Dec. 2016.
[28] J. M. Duarte-Carvajalino and G. Sapiro, "Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization," IEEE Trans. Image Process., vol. 18, pp. 1395-1408, Jul. 2009.
[29] G. Casella and R. L. Berger, Statistical Inference, 2nd ed., Pacific Grove, CA: Duxbury, 2002.
[30] J. Nocedal and S. Wright, Numerical Optimization, Springer, 2006.
[31] B. C. Russell, A. Torralba, K. P. Murphy and W. T. Freeman, "LabelMe: A database and web-based tool for image annotation," International Journal of Computer Vision, vol. 77, pp. 157-173, May 2008.
[32] R. Rubinstein, M. Zibulevsky and M. Elad, "Efficient implementation of the K-SVD algorithm and the Batch-OMP method," Department of Computer Science, Technion, Israel, Tech. Rep., 2008.
[33] A. Mensch, J. Mairal, B. Thirion and G. Varoquaux, "Dictionary learning for massive matrix factorization," Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.