

Basis Expansion Approaches for Regularized Sequential Dictionary Learning Algorithms with Enforced Sparsity for fMRI Data Analysis

Abd-Krim Seghouane and Asif Iqbal

The authors are with the Department of Electrical and Electronic Engineering, The University of Melbourne, Melbourne, Australia (e-mail: Abd-krim.seghouane@unimelb.edu.au and Aiqbal1@student.unimelb.edu.au). This work was supported by the Australian Research Council through Grant FT130101394.

Abstract—Sequential dictionary learning algorithms have been successfully applied to functional magnetic resonance imaging (fMRI) data analysis. fMRI datasets are, however, structured data matrices with a notion of temporal smoothness in the column direction. This prior information, which can be converted into a constraint of smoothness on the learned dictionary atoms, has seldom been included in classical dictionary learning algorithms when applied to fMRI data analysis. In this paper we tackle this problem by proposing two new sequential dictionary learning algorithms dedicated to fMRI data analysis that account for this prior information. These algorithms differ from existing ones in their dictionary update stage. The steps of this stage are derived as a variant of the power method for computing the SVD. The proposed algorithms generate regularized dictionary atoms via the solution of a left regularized rank-one matrix approximation problem, where temporal smoothness is enforced via regularization through basis expansion and sparse basis expansion in the dictionary update stage. Applications on synthetic data experiments and real fMRI datasets illustrating the performance of the proposed algorithms are provided.

Keywords: functional magnetic resonance imaging (fMRI), dictionary learning, sequential update, regularization, basis expansion.

I. INTRODUCTION

Dictionary learning is an increasingly used data-driven approach to analyze fMRI data [1]-[11]. Given an fMRI data matrix Y, formed by vectorizing the time series observed in every voxel to create an n × N matrix where n is the number of time points and N the number of voxels (≈ 10,000-100,000) [12], dictionary learning approaches, like the widely used general linear model (GLM), assume a linear multivariable model for the fMRI data, Y = DX, where the matrix D is the dictionary and X is a sparse matrix of latent variables. The difference between the two approaches is that the matrix D is estimated in dictionary learning approaches, whereas the design matrix is specified in the GLM. As a data-driven approach to model the observed data, dictionary learning algorithms are suitable for the analysis of fMRI data as they minimize the assumptions on the underlying structure of the problem by decomposing the observed data using a factor model and a specific constraint. While independent component analysis (ICA) [13] [14] [15] achieves such a decomposition by assuming the rows of X are samples from statistically independent random variables, dictionary learning methods assume that the columns of X are sparse. With dictionary learning methods, the fMRI time series measured at a specific voxel is approximated by a sparse linear combination of dynamic components, where each component has a different time-series signal pattern. As a result, dictionary learning methods can be used to extract sets of mutually correlated brain regions without prior information on the time course of these regions. Some of these sets of regions, interpreted as functional networks [16], can give insights into the mechanisms of brain diseases, and their modifications in pathological situations can serve as biomarkers to aid in clinical diagnosis [17].

Datasets arising from fMRI experiments exhibit a temporal structure that is characterized by smoothness. This prior information, which can be converted into a constraint of smoothness on the learned dictionary atoms, has been ignored in classical dictionary learning algorithms when applied to fMRI data analysis. With fMRI datasets in the form of Y described above, regularizing the dictionary elements, or atoms, to encourage smoothness of the dataset in the column direction may be of interest. Indeed, while we expect to have only a limited number of voxels active at each time point, it is also expected to have continuous activity along time. The temporal smoothness assumption can be motivated by the convolution model popularly used in fMRI data analysis [18]. Under this linear model, the blood oxygenation level-dependent (BOLD) signal response to neuronal activity is lagged and damped (smoothed) by the hemodynamic response function (HRF) [19]. Therefore, the signal at a fixed voxel over time is believed to be smooth and of low frequency. We therefore develop dictionary learning algorithms that account for such prior information by enforcing smoothness of the dictionary atoms. This is obtained by regularizing the dictionary atoms in the dictionary update stage, where regularization is obtained through penalized rank-one matrix approximation [20].

In [21] [22] a dictionary learning approach leading to substantial performance improvement was proposed. Within this approach the sparsity constraint is not confined to the sparse coding stage only but is also included in the dictionary update stage, such that with each dictionary atom its associated sparse code, or support, is also updated. In this paper a


similar approach is adopted to propose alternative sequential dictionary learning algorithms adapted to data matrices whose column domain is structurally smooth. The proposed algorithms differ in their dictionary update stage, which is derived based on a variation of the familiar power method, or alternating least squares method, for calculating the SVD [23]. The steps of the dictionary update stage of the proposed algorithms are obtained through the solution of regularized rank-one matrix approximation problems where regularization is introduced through basis expansion and sparse basis expansion [24]. The regularization is obtained by shrinking the dictionary atom toward a certain subspace defined by the basis functions. Furthermore, this basis can be constructed using a priori frequency information on the fMRI dataset or on the experiment conducted to generate the fMRI data. It is a way to introduce some known structure on the data, such as smoothness in the column direction of Y. fMRI datasets are structured with notions of temporal smoothness, and classical dictionary learning algorithms ignoring these structures can result in lower performance. Based on the above motivations, our specific contributions include a) a sequential dictionary learning algorithm for regularized dictionary learning in the column direction through sparse basis expansion, b) a sequential dictionary learning algorithm for regularized dictionary learning in the column direction through basis expansion, and c) computationally efficient algorithms to compute a) and b).

In the next section the dictionary learning methods proposed in [25] and [21] are reviewed. The proposed sequential dictionary learning algorithm for regularized dictionary learning in the column direction through basis expansion is described in Section III. In Section IV the sequential dictionary learning algorithm that is based on sparse basis expansion is derived. Section V contains simulation results illustrating the performance of the proposed algorithms. Concluding remarks are given in Section VI.
II. BACKGROUND

Given a set of signals Y = [y_1, y_2, ..., y_N], a learned dictionary is a collection of vectors or atoms d_k, k = 1, ..., K, that can be used for optimal linear representation. Usually the objective is to find a linear representation for the set of signals Y,

{D, X} = arg min_{D,X} ||Y − DX||_F^2,

where D = [d_1, d_2, ..., d_K], that makes the total representation error as small as possible. This optimization problem is ill-posed unless extra constraints are imposed on the dictionary D and the sparse codes X. The common constraint on X is that each column of X is sparse, hence the name "sparse codes". Letting the sparse coefficient vectors x_i, i = 1, ..., N, constitute the columns of the matrix X, with this constraint the above objective can be re-stated as the minimization problem

min_{D,X} ||Y − DX||_F^2    s.t.    ||x_i||_0 ≤ s, ∀ 1 ≤ i ≤ N,        (1)

where the x_i's are the column vectors of X, ||·||_F is the Frobenius norm, ||·||_0 is the l_0 quasi-norm, which counts the number of nonzero coefficients, and s ≪ K. To prevent D from being arbitrarily large, and therefore having arbitrarily small values of x_i, it is common to constrain D to belong to the set D = {D ∈ R^{n×K} : ||d_k||_2 = 1 ∀k}, where ||·||_2 is the l_2 norm and d_k is the kth column of D. Finding the optimal s corresponds to a problem of model order selection that can be resolved using a univariate linear model selection criterion [26] [27]. The generally used optimization strategy, which does not necessarily lead to a global minimum, consists in splitting the problem into two stages that are alternately solved within an iterative loop. The first stage is the sparse coding stage, where D is fixed and the sparse coefficient vectors are found by solving

x̂_i = arg min_{x_i} ||y_i − D x_i||_2^2    subject to    ||x_i||_0 ≤ s,    i = 1, ..., N.        (2)

In practice, the sparse coding stage is often approximately solved by using either a greedy pursuit or a convex relaxation approach [28]. The dictionary update stage, where X is fixed and D is derived by solving

D = arg min_D ||Y − DX||_F^2        (3)

followed by normalizing its columns, constitutes the second stage.

This is where sequential and parallel update methods differ. In parallel update methods all dictionary atoms are updated in parallel using least squares [29] [30] or maximum likelihood [31] [32] [33], whereas sequential update methods [25] [34] break the global minimization (3) into K sequential minimization problems. In the method proposed in [25], which has become a benchmark in dictionary learning, each column d_k of D and its corresponding row of coefficients x_k^row are updated based on a rank-1 matrix approximation of the error for all the signals when d_k x_k^row is removed,

{d_k, x_k^row} = arg min_{d_k, x_k^row} ||Y − DX||_F^2 = arg min_{d_k, x_k^row} ||E_k − d_k x_k^row||_F^2,        (4)

where E_k = Y − Σ_{i=1, i≠k}^{K} d_i x_i^row. The singular value decomposition (SVD) of E_k = UΔV^T is used to find the closest rank-1 matrix approximation of E_k. In this case, d_k could be updated by taking the first column of U, and x_k^row could be updated by taking the first column of V multiplied by the first diagonal element of Δ. This form of update corresponds to a dictionary update stage that ignores the sparsity pattern information derived in the sparse coding. A dictionary update stage that uses the sparsity pattern information with (4) can be obtained by avoiding the loss of sparsity in x_k^row that would be created by the direct application of the SVD on E_k. This solution was adopted in [25], where it was proposed to modify only the nonzero entries of x_k^row by taking into account only the signals y_i that use the atom d_k in (4), i.e., by taking the SVD of E_k^R = E_k I_{w_k} and working with x̃_k^row = x_k^row I_{w_k}, where w_k = {i | 1 ≤ i ≤ N; x_k^row(i) ≠ 0} and I_{w_k} is the N × |w_k| submatrix of the N × N identity matrix obtained by retaining only those columns whose index numbers are in w_k, instead of the SVD of E_k.
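To make the restricted update of [25] concrete, the following is a minimal sketch of one such atom update; the function name and structure are ours (a hedged illustration, not the authors' code), using NumPy's SVD on the restricted error matrix E_k I_{w_k} so that the sparsity pattern of X is preserved.

```python
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    """Sketch of the sparsity-preserving rank-1 atom update of [25]:
    restrict the error matrix to the signals that use atom k, take its
    SVD, and update the atom and the nonzero entries of row k only."""
    w_k = np.nonzero(X[k, :])[0]                     # support of x_k^row
    if w_k.size == 0:
        return D, X                                  # atom currently unused
    E_k = Y - D @ X + np.outer(D[:, k], X[k, :])     # error without atom k
    U, S, Vt = np.linalg.svd(E_k[:, w_k], full_matrices=False)
    D[:, k] = U[:, 0]                                # first left singular vector
    X[k, w_k] = S[0] * Vt[0, :]                      # rescaled right singular vector
    return D, X
```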


The motivation for the proposed approach comes from the observation that the rank-1 approximation obtained using the SVD and written as d_k x_k^row can also be approximated by applying a few iterations of the power method for computing the SVD [23]. Recall that the power method, or alternating least squares method, sequentially estimates d_k with x_k^row fixed and vice versa by solving the least squares problem

||E_k^R − d_k x̃_k^row||_F^2 = tr((E_k^R − d_k x̃_k^row)(E_k^R − d_k x̃_k^row)^T)
                            = ||E_k^R||_F^2 − 2 d_k^T E_k^R x̃_k^row^T + ||d_k||_2^2 ||x̃_k^row||_2^2        (5)

and then rescaling the estimates to give

d_k = E_k^R x̃_k^row^T / ||E_k^R x̃_k^row^T||_2        (6)

x̃_k^row = d_k^T E_k^R.        (7)

These equations define the power algorithm which, if initialized randomly, converges almost surely to a least squares rank-one fit [23]. Using this observation, a dictionary update stage can be obtained by iterating (6) and (7) until convergence, or by applying only a few iterations of these equations instead of the computationally expensive SVD of E_k^R. This approach offers a computationally efficient alternative to [25].
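A minimal sketch of this alternating update, iterating (6) and (7) a fixed number of times, might look as follows; the function name and the choice of three iterations are our assumptions, not the paper's.

```python
import numpy as np

def power_rank1(E, d0, n_iter=3):
    """Few power-method (alternating least squares) iterations for a
    rank-1 fit of E, following (6) and (7)."""
    d = d0 / np.linalg.norm(d0)
    for _ in range(n_iter):
        x = d @ E                       # (7): row update for fixed atom
        d = E @ x                       # unnormalized atom update
        d /= np.linalg.norm(d)          # (6): rescale to unit norm
    return d, d @ E
```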
In [21], and independently in [22] following a different approach, an alternative dictionary update stage that leads to a dictionary learning algorithm with improved performance compared to state of the art methods was proposed. Within this dictionary update stage it is proposed to re-update all the entries of x_k^row and the sparsity pattern information, instead of only updating the nonzero entries of x_k^row. The resulting algorithm is a variant of the power method, or alternating least squares method, for regularized rank-one approximation, where a sparsity penalty is introduced in the minimization problem to promote sparsity of x_k^row. The estimates of d_k and x_k^row are given by

d_k = E_k x_k^row^T / ||E_k x_k^row^T||_2        (8)

x_k^row = sgn(d_k^T E_k) ∘ (|d_k^T E_k| − (α/2) 1_N^T)_+        (9)

where ∘, |·|, sgn(·) and (·)_+ denote the Hadamard product, the component-wise absolute value, the component-wise sign and the component-wise max(0, x), respectively, and 1_N is a vector of ones of size N.
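For illustration, a hedged sketch of the update (8)-(9) as a soft-thresholded power iteration follows; the function name and iteration cap are ours.

```python
import numpy as np

def regularized_rank1(E_k, d0, alpha, n_iter=10):
    """Sketch of the regularized rank-one update of [21]: a power
    iteration whose row update (9) is soft-thresholded."""
    d = d0 / np.linalg.norm(d0)
    x = np.zeros(E_k.shape[1])
    for _ in range(n_iter):
        c = d @ E_k                                              # d_k^T E_k
        x = np.sign(c) * np.maximum(np.abs(c) - alpha / 2, 0.0)  # (9)
        if not np.any(x):
            break                                                # atom unused at this alpha
        d = E_k @ x / np.linalg.norm(E_k @ x)                    # (8)
    return d, x
```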
Below we propose extensions of the algorithm proposed in [21] that are adapted to data matrices whose column domain is structurally smooth.

III. REGULARIZED SEQUENTIAL DICTIONARY LEARNING VIA BASIS EXPANSION

As discussed in Section I, for a number of datasets Y we may be interested in obtaining smooth dictionary atoms to encourage smoothness in the column direction of Y. Among the options for regularization penalties that can be used in the cost function of the dictionary update stage to encourage smoothness of the dictionary atoms, we focus here on regularization through basis expansion. From the previous descriptions, the dictionary updates of d_k and x_k^row that generate smooth dictionary atoms in the dictionary update stage can be formulated as

{a_k, x_k^row} = arg min_{a_k, x_k^row} ||E_k − D_p a_k x_k^row||_F^2 + α_1 ||x_k^row||_1
subject to ||D_p a_k||_2^2 = 1, k = 1, ..., K,        (10)

where D_p is the base dictionary [24] of size n × p, p ≤ n, ||·||_1 is the l_1 norm of x_k^row, and α_1 is a non-negative penalty parameter controlling the amount of sparsity in x_k^row (increasing α_1 increases the amount of sparsity in x_k^row). The updates of d_k and x_k^row are obtained by alternating minimization of (10).

Regularization in the column direction is obtained by representing d_k in a fully known low-dimensional subspace S_{d_k} = span(d_{p1}, ..., d_{pp}), where d_{pi} is the ith column of D_p. Description in a lower dimensional subspace can be seen as a limiting case of regularization using the squared l_2 norm between d_k and its orthogonal projection P_{D_p} d_k, i.e., α ||(I − P_{D_p}) d_k||_2^2, as a penalty. In this case P_{D_p} is the orthogonal projector on S_{d_k}. The associated penalty matrix is the residual operator Ω = α^2 (I − P_{D_p}) as α → ∞ [35]. The shrinkage of the solution toward the lower dimensional subspace S_{d_k} can be obtained by using finite values for the parameter α.

As for the minimization of (5), the updates of d_k = D_p a_k and x_k^row can be obtained by iterative alternating minimization of (10), i.e., first fixing a_k, the x_k^row that minimizes (10) is derived from

x_k^row = arg min_{x_k^row} ||x_k^row||^2 + α_1 ||x_k^row||_1 − 2 a_k^T D_p^T E_k x_k^row^T

and gives

x_k^row = sgn(a_k^T D_p^T E_k) ∘ (|a_k^T D_p^T E_k| − (α_1/2) 1_N^T)_+        (11)

where ∘, |·|, sgn(·) and (·)_+ denote the Hadamard product, the component-wise absolute value, the component-wise sign and the component-wise max(0, x), respectively. Then, fixing x_k^row, a_k is derived as

a_k = arg min_{a_k} −2 a_k^T D_p^T E_k x_k^row^T + ||D_p a_k||_2^2 ||x_k^row||_2^2,

which gives

a_k = (D_p^T D_p)^{−1} D_p^T E_k x_k^row^T,        (12)

followed by the normalization ||D_p a_k||_2^2 = 1, which gives

a_k = a_k / ||D_p a_k||_2.

The estimates of a_k and x_k^row are obtained by iterating between these two steps (11) and (12) until convergence. The updated dictionary atom is obtained as d_k = D_p a_k. The derived dictionary learning algorithm is illustrated in table I.

In the case of an orthonormal basis such that D_p^T D_p = I, the derivation of the proposed approach for sequential atom update in regularized dictionary learning is based on the observation that minimizing ||E_k − D_p a_k x_k^row||_F^2 with respect to a_k and x_k^row is the same as minimizing ||D_p^T E_k − a_k x_k^row||_F^2, i.e., with E_k replaced by D_p^T E_k. The steps of the dictionary update stage are therefore obtained from the SVD of D_p^T E_k followed by the normalization of a_k and the update (11).
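As a hedged illustration of one atom update in Algorithm 1 (names and iteration control are ours), iterating (11) and (12) with the normalization step:

```python
import numpy as np

def basis_expansion_update(E_k, Dp, a0, alpha1, n_iter=10):
    """Sketch of the proposed basis-expansion atom update: alternate the
    soft-thresholded row update (11) and coefficient update (12), then
    renormalize so that ||Dp a_k||_2 = 1."""
    G_inv = np.linalg.inv(Dp.T @ Dp)          # (Dp^T Dp)^{-1}, computed once
    a = a0 / np.linalg.norm(Dp @ a0)
    x = np.zeros(E_k.shape[1])
    for _ in range(n_iter):
        c = (Dp @ a) @ E_k                    # a_k^T Dp^T E_k
        x = np.sign(c) * np.maximum(np.abs(c) - alpha1 / 2, 0.0)   # (11)
        if not np.any(x):
            break
        a = G_inv @ (Dp.T @ (E_k @ x))        # (12)
        a /= np.linalg.norm(Dp @ a)           # enforce ||Dp a_k||_2 = 1
    return Dp @ a, a, x                       # atom d_k = Dp a_k
```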


TABLE I
STEPWISE DESCRIPTION OF THE PROPOSED SEQUENTIAL ALGORITHM FOR LEARNING SMOOTH DICTIONARY ATOMS VIA BASIS EXPANSION

Algorithm 1
Given: training data Y ∈ R^{n×N}, base dictionary D_p, initial dictionary representation A_ini, error tolerance ε, no. of iterations j.
Dictionary initialization: D = D_p A_ini.
For i = 1 to j
  1: Sparse Coding Stage:
     Find the sparse coefficients X using
       x̂_i = arg min_{x_i} ||x_i||_0  subject to  ||y_i − D x_i||_2^2 ≤ ε,  i = 1, ..., N.
  2: Dictionary Update Stage:
     For each column k = 1, 2, ..., K in A:
       Set the kth column of A, a_k = 0.
       2.a: Compute the error matrix E_k = Y − D_p A X.
       While ||a_k^{iter} − a_k^{iter+1}||_2^2 ≥ ε iterate:
         2.b: Update the row x_k^row using
              x_k^row = sgn(a_k^T D_p^T E_k) ∘ (|a_k^T D_p^T E_k| − (α_1/2) 1_N)_+.
         2.c: Update a_k using a_k = (D_p^T D_p)^{−1} D_p^T E_k x_k^row^T.
         2.d: Rescale a_k using a_k^{iter} = a_k / ||D_p a_k||_2; iter = iter + 1.
       2.e: Update the dictionary atom d_k using d_k = D_p a_k.
end.
Output: D, X
IV. REGULARIZED SEQUENTIAL DICTIONARY LEARNING VIA SPARSE BASIS EXPANSION

Instead of the approach presented above and based on (10), learning regularized dictionary atoms can be achieved via sparse basis expansion. This is the approach adopted here to derive an alternative variant of the power method for updating both x_k^row and a_k in the dictionary update stage. Within this framework, the dictionary update stage problem (8) can be formulated as

{a_k, x_k^row} = arg min_{a_k, x_k^row} ||E_k − D_p a_k x_k^row||_F^2 + α_1 ||x_k^row||_1 + α_2 ||a_k||_1
subject to ||D_p a_k||_2^2 = 1, k = 1, ..., K.        (13)

As for the minimization of (10), the updates of a_k and x_k^row can be obtained by iterative alternating minimization, i.e., first fixing a_k, the x_k^row that minimizes (13) is given by (11), similar to the minimizer of (10). For fixed x_k^row, minimization of (13) with respect to a_k is equivalent to the minimization of

a_k = arg min_{a_k} ||D_p a_k||_2^2 ||x_k^row||_2^2 + α_2 ||a_k||_1 − 2 x_k^row E_k^T D_p a_k,        (14)

which gives

a_k = ((D_p^T D_p)^{−1} / ||x_k^row||_2^2) [ sgn(D_p^T E_k x_k^row^T) ∘ (|D_p^T E_k x_k^row^T| − (α_2/2) 1_p)_+ ].        (15)

Imposing the scaling constraint ||D_p a_k||_2^2 = 1 gives

a_k = a_k / ||D_p a_k||_2.

The estimates of a_k and x_k^row are obtained by iterating between the two steps (11) and (15) until convergence, instead of (11) and (12). The derived dictionary learning algorithm is illustrated in table II.
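A hedged sketch of the a_k update (15) with the subsequent rescaling (the helper name is ours):

```python
import numpy as np

def sparse_basis_ak_update(E_k, Dp, x_row, alpha2):
    """Sketch of (15): soft-threshold Dp^T E_k x_k^row^T, scale by
    (Dp^T Dp)^{-1} / ||x_k^row||_2^2, then impose ||Dp a_k||_2 = 1."""
    c = Dp.T @ (E_k @ x_row)                                   # Dp^T E_k x_k^row^T
    a = np.sign(c) * np.maximum(np.abs(c) - alpha2 / 2, 0.0)   # soft threshold
    a = np.linalg.inv(Dp.T @ Dp) @ a / (x_row @ x_row)         # scaling in (15)
    nrm = np.linalg.norm(Dp @ a)
    return a / nrm if nrm > 0 else a
```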
The sparse basis expansion approach acts as a regularizer and generates smooth dictionary atoms. Besides the appearance of a prespecified dictionary D_p in the dictionary update cost function, the proposed approach for regularized dictionary learning is different from a regularized dictionary learning algorithm obtained with a dictionary update based on the cost function

||E_k^R − d_k x_k^row||_F^2 + α_1 ||x_k^row||_1 + α_2 d_k^T Ω d_k,        (16)

where Ω is a non-negative definite roughness penalty matrix used to penalize the second differences [20],

d_k^T Ω d_k = d_k^2(1) + d_k^2(n) + Σ_{i=2}^{n−1} (d_k(i+1) − 2 d_k(i) + d_k(i−1))^2,        (17)

and α_2 is the hyperparameter that controls the amount of regularization. In the case of (16) the term α_2 d_k^T Ω d_k is used to impose some smoothness on the atoms d_k. However, it is well known that enforcing smoothness using penalties does not work well for all functions. This is the case for functions with jumps. Wavelet regularization, for example, performs well for this kind of function by concentrating on sparsity in the transformed domain rather than on regularization in the original domain. Furthermore, an efficient algorithm exists for computing the discrete wavelet transform [36]. Therefore, if the atoms are sparse in the transformed domain obtained by the basis D_p, as in the case of regularized sequential dictionary learning via sparse basis expansion, then we should take advantage of this information and work in this transformed domain [37]. Direct regularization as offered by (16) or Algorithm 1 should work well in situations where the atom function does not contain abrupt or sudden variations, i.e., when the atom function is "a nice function".
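For reference, a small sketch of how a roughness matrix Ω reproducing the quadratic form (17) could be assembled; this is our construction, under the assumption that Ω = B^T B with B stacking the two boundary rows and the second-difference rows.

```python
import numpy as np

def roughness_penalty(n):
    """Build Omega so that d^T Omega d matches the quadratic form (17)."""
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]      # second differences
    B = np.vstack([np.eye(1, n, 0),            # boundary term d(1)^2
                   np.eye(1, n, n - 1),        # boundary term d(n)^2
                   D2])
    return B.T @ B                             # non-negative definite
```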
V. EXPERIMENTAL EVALUATION

In this section we present the performance analysis of the proposed dictionary learning algorithms, namely A1 (Algorithm 1) and A2 (Algorithm 2), with respect to S1 [21], K-SVD [25], and its sparse version known as K-SVDs [24]. We performed two experiments for performance analysis, dictionary recovery and sparse GLM analysis using a simulated fMRI dataset [1], and then applied the algorithms to two real fMRI datasets, a block-paradigm task fMRI dataset and a resting state fMRI dataset, for validation. The details of these experiments are given below.

A. Dictionary Recovery

This section presents the performance comparison of the selected algorithms in terms of their ability to recover an underlying original dictionary D_g (the generating dictionary), which has been used to generate the test signal matrix Y. We start by selecting the discrete cosine transform (DCT) as our base dictionary D_p of size 20 × 20 and a random initial dictionary representation A_ini of size 20 × 50.


TABLE II
STEPWISE DESCRIPTION OF THE PROPOSED SEQUENTIAL ALGORITHM FOR LEARNING SMOOTH DICTIONARY ATOMS VIA SPARSE BASIS EXPANSION

Algorithm 2
Given: training data Y ∈ R^{n×N}, base dictionary D_p, initial dictionary representation A_ini, error tolerance ε, no. of iterations j.
Dictionary initialization: D = D_p A_ini.
For i = 1 to j
  1: Sparse Coding Stage:
     Find the sparse coefficients X using
       x̂_i = arg min_{x_i} ||x_i||_0  subject to  ||y_i − D x_i||_2^2 ≤ ε,  i = 1, ..., N.
  2: Dictionary Update Stage:
     For each column k = 1, 2, ..., K in A:
       Set the kth column of A, a_k = 0.
       2.a: Compute the error matrix E_k = Y − D_p A X.
       Iterate:
         2.b: Update the row x_k^row using
              x_k^row = sgn(a_k^T D_p^T E_k) ∘ (|a_k^T D_p^T E_k| − (α_1/2) 1_N)_+.
         2.c: Update a_k using
              ψ = (D_p^T D_p)^{−1} / ||x_k^row||_2^2,
              a_k = ψ [ sgn(D_p^T E_k x_k^row^T) ∘ (|D_p^T E_k x_k^row^T| − (α_2/2) 1_p)_+ ].
         2.d: Rescale a_k using a_k = a_k / ||D_p a_k||_2.
       2.e: Update the dictionary atom d_k using d_k = D_p a_k.
end.
Output: D, X

TABLE III
AVERAGE PERCENTAGE OF RECOVERED ATOMS WITH DCT AS BASE DICTIONARY

            Sparsity level (s)   KSVD    KSVDs   S1      A1      A2
SNR 10 dB   2                    79.87   71.20   83.40   87.93   88.60
            3                    71.20   59.73   79.73   87.80   88.93
            4                    28.53   39.93   49.20   71.60   72.47
SNR 20 dB   2                    85.40   81.70   87.67   92.60   93.33
            3                    88.80   79.67   89.67   94.60   93.53
            4                    91.27   74.53   87.60   93.87   94.27
SNR 35 dB   2                    82.93   82.13   90.60   92.13   92.20
            3                    87.93   80.40   91.93   93.40   92.67
            4                    92.13   77.00   88.40   93.80   95.20
SNR 50 dB   2                    85.07   82.27   92.20   92.27   93.93
            3                    89.93   79.07   90.40   93.27   94.20
            4                    92.47   75.53   90.40   93.13   94.13
In order to get an original dictionary D_g that is sparse in D_p, 10 random entries from each column of A_ini are set to zero. The columns of A_ini were then normalized such that ||D_p a||_2 = 1, where a denotes a column of A_ini. The ground truth dictionary is then generated as D_g = D_p A_ini, resulting in a dictionary of size 20 × 50. 1500 test signals of dimension 20, denoted by {y_i}_{i=1}^{1500}, were then created by linear combinations of s dictionary columns (atoms) taken from random locations with uniformly distributed i.i.d. coefficients. The test signals were corrupted with AWGN corresponding to different signal to noise ratios. These signals were used by the different algorithms in order to recover the underlying dictionary.
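A hedged sketch of this data generation; the names, the uniform coefficient range, and the seeding are our assumptions (the paper only states uniformly distributed i.i.d. coefficients).

```python
import numpy as np

def make_test_signals(Dg, N=1500, s=3, snr_db=10, seed=0):
    """Generate N s-sparse test signals from the ground-truth dictionary
    Dg and corrupt them with AWGN at the requested SNR."""
    rng = np.random.default_rng(seed)
    n, K = Dg.shape
    X = np.zeros((K, N))
    for i in range(N):
        idx = rng.choice(K, size=s, replace=False)   # random atom locations
        X[idx, i] = rng.uniform(-1.0, 1.0, size=s)   # i.i.d. coefficients
    Y = Dg @ X
    noise = rng.standard_normal(Y.shape)
    noise *= np.linalg.norm(Y) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
    return Y + noise, X
```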
In all dictionary learning algorithms, the dictionary D_l given by D_l = D_p A_ini, with random A_ini, was used as the initial dictionary. Orthogonal Matching Pursuit (OMP) [28] was used in the sparse coding stage with sparsity constraint s, giving the best s-term approximation of the test signals Y. Each algorithm was iterated 11s^2 times (for each sparsity level s) to train the dictionary by alternating between the sparse coding and dictionary update stages.
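A minimal OMP sketch (our helper, not the experiments' actual implementation) that computes the best s-term approximation used in the sparse coding stage:

```python
import numpy as np

def omp(D, y, s):
    """Greedy OMP: repeatedly pick the atom most correlated with the
    residual, then refit the coefficients on the support by least squares."""
    support = []
    r = y.copy()
    for _ in range(s):
        support.append(int(np.argmax(np.abs(D.T @ r))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ coef               # updated residual
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```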
The learning process was repeated 30 times for sparsity levels s ∈ {2, 3, 4} and signal to noise ratios SNR ∈ {10, 20, 35, 50} dB. The learned dictionary D_l is then compared with the generating dictionary D_g in the same way as described in [25]. Table III contains the mean percentage of recovered atoms (over the 30 trials) for all sparsity and noise levels. We tried multiple values of the tuning parameters for A1, A2, S1, and K-SVDs and selected those values which gave the best results. The tuning parameters (α_1, α_2) = (0.25, 0.18) were selected for the proposed algorithms A1 and A2, α_3 = 0.45 was selected for S1, and the selected atom sparsity level for K-SVDs was 10. It is evident from table III that both proposed algorithms perform better than the other three algorithms in all cases. For a visualization of the convergence rates of all algorithms, the average percentage of recovered atoms vs. iteration number is presented in Fig. 1 and Fig. 2 for s = 3 and s = 4 with SNR = 10 dB. It can be seen that the convergence rate of the proposed algorithms is much superior to that of the K-SVD algorithm.

We also ran the exact same experiment using a different base dictionary D_p, of size 100 × 20, with each column consisting of multiple hemodynamic response functions (HRFs) [38] starting at random locations. Using this D_p, we generated the test signal matrix Y ∈ R^{100×1500} in the same way as described earlier. The mean atom recovery rate obtained over 30 trials is presented in table IV for sparsity levels s ∈ {2, 3} and signal to noise ratios SNR ∈ {10, 20, 35, 50} dB. We can observe from this table as well that the proposed algorithms outperform the other three algorithms.

TABLE IV
AVERAGE PERCENTAGE OF RECOVERED ATOMS WITH HRFS AS BASE DICTIONARY

            Sparsity level (s)   KSVD    KSVDs   S1      A1      A2
SNR 10 dB   2                    73.53   66.73   76.80   85.53   85.47
            3                    22.73   30.73   51.67   78.67   77.73
SNR 20 dB   2                    80.00   70.87   82.80   87.47   88.00
            3                    58.33   40.40   69.53   84.67   83.87
SNR 35 dB   2                    81.27   72.60   82.53   85.87   86.80
            3                    69.13   38.53   72.73   84.20   84.73
SNR 50 dB   2                    80.67   73.53   85.80   87.80   87.20
            3                    61.60   39.80   75.93   84.47   85.87

B. Sparse GLM Analysis on Simulated Data

In this section we compare the effectiveness of the proposed algorithms A1 and A2 against the S1 [21], K-SVD [25], and K-SVDs [24] algorithms in the recovery of underlying source signals from the given


TABLE V
CORRELATION OF RECOVERED TIME SERIES WITH ORIGINALS FOR THE THREE CASES, AVERAGED OVER 100 TRIALS

                      K-SVD             K-SVDs            S1                A1                A2
SNR (dB)              0    -5   -10     0    -5   -10     0    -5   -10     0    -5   -10     0    -5   -10
Case (a)  A T1        0.99 0.98 0.94    1.00 1.00 0.94    0.99 0.98 0.94    1.00 0.99 0.98    1.00 0.99 0.98
          B T2        0.96 0.88 0.38    0.88 0.82 0.56    0.97 0.89 0.53    0.99 0.96 0.83    0.99 0.97 0.84
          avg         0.98 0.93 0.66    0.94 0.91 0.75    0.98 0.94 0.74    0.99 0.98 0.91    0.99 0.98 0.91
Case (b)  A T1        0.99 0.96 0.89    0.96 0.93 0.82    0.99 0.97 0.89    0.99 0.98 0.96    0.99 0.98 0.96
          C T2        0.98 0.96 0.89    0.99 0.94 0.86    0.97 0.96 0.89    0.99 0.99 0.96    0.99 0.99 0.96
          avg         0.98 0.96 0.89    0.98 0.94 0.84    0.98 0.96 0.89    0.99 0.99 0.96    0.99 0.99 0.96
Case (c)  B T1        0.70 0.47 0.33    0.62 0.59 0.39    0.70 0.60 0.47    0.87 0.74 0.43    0.85 0.73 0.41
          C T2        0.99 0.97 0.93    1.00 0.99 0.95    0.99 0.96 0.92    0.99 0.99 0.97    0.99 0.99 0.97
          avg         0.85 0.72 0.63    0.81 0.79 0.67    0.84 0.78 0.70    0.93 0.86 0.70    0.92 0.86 0.69

TABLE VI
CORRELATION OF RECOVERED SPATIAL MAPS WITH ORIGINALS FOR THE THREE CASES, AVERAGED OVER 100 TRIALS

                      K-SVD             K-SVDs            S1                A1                A2
SNR (dB)              0    -5   -10     0    -5   -10     0    -5   -10     0    -5   -10     0    -5   -10
Case (a)  T1 A        0.99 0.94 0.71    0.74 0.75 0.69    1.00 0.99 0.92    1.00 0.99 0.95    1.00 0.99 0.96
          T2 B        0.98 0.70 0.15    0.41 0.32 0.25    1.00 0.98 0.48    1.00 0.98 0.82    1.00 0.98 0.93
          avg         0.99 0.82 0.43    0.57 0.53 0.47    1.00 0.99 0.70    1.00 0.99 0.89    1.00 0.98 0.94
Case (b)  T1 A        0.91 0.92 0.86    0.72 0.67 0.56    0.99 0.98 0.91    0.99 0.98 0.92    0.99 0.98 0.92
          T2 C        0.96 0.93 0.87    0.40 0.36 0.31    0.99 0.98 0.91    0.99 0.98 0.92    0.99 0.98 0.92
          avg         0.94 0.92 0.87    0.56 0.52 0.44    0.99 0.98 0.91    0.99 0.98 0.92    0.99 0.98 0.92
Case (c)  T1 B        0.99 0.81 0.47    0.37 0.36 0.29    0.99 0.93 0.69    0.97 0.91 0.57    0.97 0.93 0.54
          T2 C        0.89 0.84 0.69    0.38 0.42 0.35    0.89 0.87 0.84    0.96 0.95 0.93    0.96 0.94 0.94
          avg         0.94 0.83 0.58    0.38 0.39 0.32    0.94 0.90 0.77    0.97 0.93 0.75    0.97 0.94 0.74

Fig. 1. Average percentage of atoms recovered after each iteration for sparsity level s = 3 with SNR = 10 dB.

Fig. 2. Average percentage of atoms recovered after each iteration for sparsity level s = 4 with SNR = 10 dB.

mixture of signals. We generated fMRI datasets for three distinct activation cases: a spatially independent case, partial spatial overlap, and complete spatial overlap, as done in [1]. Two sinusoids with frequencies ∈ {1.5, 4.5} Hz and duration 120 s were taken as temporal paradigms, and box signals were used as activation patterns, as shown in Fig. 3. Three distinct visual patterns of size 10 × 10 voxels were created with amplitudes of 1 in {2, ..., 6} × {2, ..., 6} for pattern A, {8, 9} × {8, 9} for pattern B, and {5, ..., 9} × {5, ..., 9} for pattern C, and 0 elsewhere. Additive white Gaussian noise was used to corrupt the training signals at signal to noise ratios of {0, −5, −10} dB. The three simulated cases are shown in Fig. 3: spatially independent events in Fig. 3(a), partial spatial overlapping events in Fig. 3(b), and complete spatial overlapping events in Fig. 3(c). These three datasets were analyzed using the proposed algorithms A1 and A2, as well as S1, K-SVD, and K-SVDs.

We started with the dictionary D_l = D_p A_ini, where D_p is the discrete cosine transform (DCT) matrix of size 120 × 40 and A_ini has size 40 × 2 with i.i.d. random entries. The columns of A_ini were normalized such that ||D_p a||_2 = 1, where a denotes a column of A_ini. The stopping criterion for the dictionary learning algorithms was chosen to be at most 50 iterations or when the relative change between dictionaries, ||D_i − D_{i−1}||_F / ||D_{i−1}||_F, became smaller than 0.001.


The sparse coding step is performed using Orthogonal Matching Pursuit (OMP) [28] with sparsity constraint s = 1. We ran the simulations for different values of the tuning parameters and selected the ones providing the best results. The tuning parameters α_1, α_2, α_3 and the atom sparsity level selected for A1, A2, S1, and KSVDs were (0.2, 0.02, 0.2, 1), respectively. Tables V and VI contain the mean correlation of the recovered time series with the original ones and of the recovered spatial maps with the original ones, respectively, over 100 trials. It can be seen that the proposed algorithms outperform all others in all scenarios, especially the complete spatial overlap case, even in the presence of a high amount of noise. The best recovered activation patterns and time series extracted by all algorithms for the simulated cases given in Fig. 3 are presented in Fig. 4 for SNR = 0 dB.

As can be seen in Fig. 4, in the zero overlap case (left column) all methods except K-SVDs were able to recover the activation pattern effectively, with A2's results being better in terms of both the recovered activation patterns and the recovered time series. In the partial overlap case (middle column), A1, A2, and S1 were able to recover the activation patterns, with A2 giving the best results, while both K-SVD and K-SVDs failed to recover the activation patterns separately. In the case of full spatial overlap (right column), only A1 and A2 were able to decompose the data spatially as well as temporally, with A2's results being best, while the other algorithms failed, as is evident from their corresponding activation patterns and time series. Although it is clear from Fig. 4 that K-SVDs did recover the time series effectively, it was not able to recover the activation patterns in any case.

After establishing the validity of our algorithms on simulated data, we now provide two examples on real fMRI data to further consolidate the validity of the proposed algorithms on real world data.

C. Sparse GLM Analysis on an Experimental Task fMRI Dataset

In this section we use a single subject (id 100307) motor-task fMRI dataset to compare the aforementioned dictionary learning algorithms. This dataset was acquired from the Q1 release of the Human Connectome Project [39]. The acquisition parameters of the tfMRI data were: 90 × 104 matrix, 220 mm FOV, 72 slices, TR = 0.72 s, TE = 33.1 ms, flip angle = 52°, BW = 2290 Hz/Px, in-plane FOV = 208 × 180 mm, with 2.0 mm isotropic voxels. The data was preprocessed following the preprocessing pipeline consisting of motion correction, temporal pre-whitening, slice time correction, and global drift removal, and the scans were spatially normalized to a standard MNI152 template and resampled to 2x2x2 mm3 voxels. The reader is referred to [40] and [39] for more details regarding data acquisition and preprocessing.

The experimental design was based on the task developed in [41], where the subjects were asked to squeeze their left or right toes, tap their left or right fingers, or move their tongue in order to map the areas showing activations when the subject performed these tasks. Before the start of a movement block, subjects were presented with a 3 second visual cue followed by the cue for a specific task. The length of each movement block was 12 seconds (10 movements). There were a total of 13 blocks, i.e., 4 toe movements (2 for each foot), 4 finger tappings (2 for each hand), 2 tongue movements, and 3 15-second fixation blocks.

The block design motor task fMRI run duration was 3:34 (min:sec) with a total of 284 scans. We discarded the first 5 and used the remaining 279 scans for the sparse GLM analysis. The scans were spatially smoothed using a 6x6x6 mm3 FWHM Gaussian kernel. Data outside the brain was removed using a binary mask, and the resulting images were vectorized and placed as rows of the matrix Y ∈ R^{n×N}, where n = 279 is the number of time points and N the number of voxels in an image. The DCT basis set with a cut-off frequency of 1/170 Hz was used to get rid of the low frequency drifts, and the high frequency noise was removed by temporally smoothing the BOLD time-series using a 2.0 s FWHM Gaussian kernel.

1) Dictionary Learning: The data matrix Y was down-sampled by a factor of 8 along the spatial direction in order to reduce computation time during the dictionary learning stage [1]. All dictionary learning algorithms were used to learn a dictionary D_l ∈ R^{n×40}, with the sparse coding stage performed using correlation based thresholding [1] with an optimal sparsity level of s = 2, resulting in a sparse coefficient matrix X ∈ R^{40×N} with Y = D_l X. The initial dictionary D_l used for K-SVD and S1 was initialized with random time-series taken from Y, whereas in the case of K-SVDs and the proposed algorithms the initial dictionary chosen was D_l = D_p A_ini, where D_p is the DCT matrix of size n × 40 and A_ini = I_40 (the identity matrix of size 40). In order to capture the remaining drift in the signals, the first element of the dictionary was set to be the DC component, which was never changed during the dictionary update stage. The stopping criterion for all algorithms was chosen to be at most 30 iterations or when the relative change between dictionaries, ||D_i − D_{i−1}||_F / ||D_{i−1}||_F, became smaller than 0.01. We tried different values of the tuning parameters for all the algorithms and selected the ones which gave the best results in terms of the dictionary atoms' correlation with the modeled hemodynamic response (MHR) functions. The selected parameters α_1, α_2, α_3 and the atom sparsity level used for A1, A2, S1, and KSVDs were (0.2, 0.15, 0.3, 15), respectively.

2) Results: To analyze the recovered dictionaries, we correlated them with the 6 MHRs, which were constructed by convolving 6 stimulus functions with the canonical HRF. The correlation coefficients of these MHRs w.r.t. the most correlated dictionary atoms are given in table VII. We then selected the most correlated dictionary atoms w.r.t. the MHR functions corresponding to the left and right finger tapping tasks as regressors for the sparse GLM analysis. These regressors are shown in Fig. 5 for comparison.
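A hedged sketch of how such an MHR could be built; we assume an SPM-style double-gamma canonical HRF, since the exact HRF and helper names used in the paper are not specified.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma canonical HRF sampled at the scan repetition time
    (assumed SPM-style default shape parameters)."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0   # peak minus undershoot
    return h / h.sum()

def modeled_hemodynamic_response(stim, tr=0.72):
    """MHR: convolve a binary stimulus function with the canonical HRF."""
    return np.convolve(stim, canonical_hrf(tr), mode="full")[:len(stim)]
```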
These atoms, along with the other s − 1 atoms, were used to generate the F-statistics maps [1]. These maps were then thresholded at a random field correction of p < 0.001.


Fig. 3. Simulated activation patterns for case (a) spatially independent events (patterns A and B), (b) partial spatial overlapping events (patterns A and C), and (c) complete spatial overlapping events (patterns B and C).

Fig. 4. Simulated results for the scenarios given in Fig. 3 with SNR = 0 dB. Recovered time series and the corresponding activation patterns are presented for (a) spatially independent events, (b) partial spatial overlapping events, and (c) complete spatial overlapping events, with each row containing the results of a different algorithm (A1, A2, S1, KSVD, KSVDs).


Fig. 5. The most correlated dictionary atoms (red) w.r.t. the MHR (blue) recovered by a) K-SVD, b) K-SVDs, c) S1, d) A1, and e) A2, for the left and right finger tapping tasks. The values in parentheses are the corresponding correlation coefficients (left finger / right finger: K-SVD 0.664/0.775, K-SVDs 0.644/0.720, S1 0.680/0.804, A1 0.707/0.819, A2 0.660/0.831).

Fig. 6. Most descriptive F-statistics activation maps for the left and right finger tapping tasks, at a random field correction of p < 0.001, recovered by a) K-SVD, b) K-SVDs, c) S1, d) A1, and e) A2.

The most descriptive activation maps for the left and right finger tapping tasks are given in Fig. 6, showing that all algorithms were able to localize the neural activity in the motor cortex area, with A2's results being the most specific. Moreover, in Fig. 5 it can be seen that the proposed algorithms were able to recover smooth dictionary atoms compared to the other methods, with A2's recovered atom being the smoothest and also the most highly correlated with the right finger tapping task MHR. From table VII we can see that the atoms with the highest correlation w.r.t. the 5 MHR functions (except LT) were present in the dictionaries recovered by the A1 and A2 algorithms. In terms of mean correlation coefficients, A2's results were the highest. Moreover, these results are consistent with the results presented in [9] for the same dataset.
TABLE VII
CORRELATION COEFFICIENTS OF THE HIGHEST CORRELATED ATOMS WITH THE MHR FUNCTIONS CORRESPONDING TO LEFT TOE (LT), RIGHT TOE (RT), LEFT FINGER (LF), RIGHT FINGER (RF), TONGUE MOVEMENT, AND VISUAL CUE, AS RECOVERED BY THE DIFFERENT DICTIONARY LEARNING ALGORITHMS

         LT      RT      LF      RF      Tongue   Cue     Mean
KSVD     0.503   0.517   0.664   0.775   0.724    0.817   0.667
KSVDs    0.439   0.460   0.644   0.720   0.783    0.677   0.621
S1       0.580   0.518   0.680   0.804   0.672    0.787   0.674
A1       0.530   0.608   0.707   0.819   0.706    0.772   0.690
A2       0.540   0.595   0.660   0.831   0.845    0.835   0.718

D. Resting State fMRI Dataset

The single subject (id 100307) rsfMRI dataset used in this section was obtained from the Human Connectome Project Q1 release [39]. The acquisition parameters and preprocessing information are the same as in Section V-C. The rsfMRI scan run duration was 14:33 (min:sec), resulting in a total of 1200 scans. We selected the first 420 scans (302.4 s) and discarded the first 15, leaving us with 405 scans for the analysis. The scans were spatially smoothed using a 6x6x6 mm3 FWHM Gaussian kernel. The resulting scanned images were vectorized and placed as rows of the matrix Y ∈ R^{n×N}, where n = 405 is the number of time points and N the number of voxels in an image. The DCT basis set with a cut-off frequency of 1/150 Hz was used to get rid of the low frequency drifts, and the high frequency noise was removed by temporally smoothing the BOLD time-series using a 1.5 s FWHM Gaussian kernel.

The data matrix Y was down-sampled by a factor of 4 along the spatial direction in order to reduce computation time in the dictionary learning stage. We learned the dictionaries D ∈ R^{n×80} using all the algorithms in the same way as detailed in Section V-C1, with the tuning parameters α_1, α_2, α_3 and the atom sparsity level used for A1, A2, S1, and KSVDs chosen to be (0.2, 0.2, 0.2, 25), respectively, in the same way as discussed in Section V-C1. The sparsity level for the sparse coding stage was set to 2, and the algorithms were iterated at most 20 times or until convergence (as mentioned in Section V-C1) to learn the dictionaries.

1) Seed-Voxel-based Correlation Analysis: In order to analyze the rsfMRI dataset, we used the seed-voxel-based correlation analysis technique, which is based on the assumption that while the brain is in the resting state, the low-frequency temporal fluctuations are correlated in brain regions that are functionally connected with each other. For this analysis technique to work, the chosen seed-voxel has to belong to a set of correlated voxels corresponding to a functionally connected network (FCN) [42]. These seeds may correspond to the salience network (SN), dorsal attention network (DAN), default mode network (DMN), or any other FCN. We have therefore chosen to work with seed-voxels already established to be part of the DMN, provided in [43] and [44]. The standard MNI coordinates of these seed-voxels are listed in table VIII.

To check whether the learned dictionaries have recovered the time-series corresponding to the seed-voxels, we selected a 6x6x6 mm3 (3 voxels in each direction) cube centered at the given MNI coordinates and calculated the mean of all 27 voxel time-series.


TABLE VIII
MNI COORDINATES OF THE SELECTED SEED-VOXELS IN MM AND THE CORRELATION COEFFICIENT OF THE MOST CORRELATED ATOM WITH THE MEAN TIME-SERIES OBTAINED FROM A 6x6x6 MM3 CUBE CENTERED AT THE GIVEN MNI COORDINATES

                                      x     y     z     KSVD   KSVDs   S1     A1     A2
Ventral Medial Prefrontal Cortex      6     70    14    0.74   0.74    0.78   0.80   0.76
Left Inferior Parietal Lobe          -56   -66    24    0.76   0.76    0.79   0.78   0.77
Ventral Posterior Cingulate           2    -46    28    0.75   0.70    0.71   0.80   0.76
Precuneus Cortex                      9    -70    43    0.72   0.61    0.67   0.75   0.62
Precuneus Cortex                     -7    -60    22    0.74   0.71    0.69   0.71   0.71
Cingulate Gyrus                       5     45    10    0.68   0.66    0.79   0.80   0.74
Mid Frontal Gyrus                    -27    30    45    0.81   0.66    0.80   0.79   0.76
Mean                                                    0.74   0.69    0.75   0.78   0.73

Fig. 7. DMN recovered by the dictionary atoms most correlated with the seed-voxel corresponding to the Ventral Medial Prefrontal Cortex [43], generated with a) K-SVD, b) K-SVDs, c) S1, d) A1, and e) A2.

Fig. 9. DMN recovered by the dictionary atoms most correlated with the seed-voxel corresponding to the Ventral Posterior Cingulate [43], generated with a) K-SVD, b) K-SVDs, c) S1, d) A1, and e) A2.

Fig. 10. DMN recovered by the dictionary atoms most correlated with the seed-voxel corresponding to the Precuneus Cortex [44], generated with a) K-SVD, b) K-SVDs, c) S1, d) A1, and e) A2.
Fig. 8. Most correlated dictionary atoms (in red) vs. the mean time series t (in blue) extracted from the 6x6x6 mm3 cube centered at the seed-voxel corresponding to the Ventral Medial Prefrontal Cortex, for a) K-SVD (0.74), b) K-SVDs (0.74), c) S1 (0.78), d) A1 (0.80), and e) A2 (0.76). The correlation coefficients are given in parentheses.

This mean time-series t was then correlated with all 80 learned dictionary atoms, and we selected the atom having the maximum correlation with it as the recovered time-series corresponding to that specific seed region. The correlation coefficients of the most correlated atoms from all learnt dictionaries are presented in table VIII. These results show that the proposed algorithms are very competitive against the other algorithms, with A1 performing best overall.
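A hedged sketch of this seed-based matching step (the helper name is ours):

```python
import numpy as np

def best_matching_atom(D, seed_ts):
    """Correlate the mean seed time series with every learned atom and
    return the index and correlation of the best match."""
    corrs = np.array([abs(np.corrcoef(D[:, k], seed_ts)[0, 1])
                      for k in range(D.shape[1])])
    k_best = int(np.argmax(corrs))
    return k_best, corrs[k_best]
```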


The neural activations corresponding to the dictionary atom most correlated with the mean time-series t were generated in the same way as described in Section V-C2. The resulting images show high activations in the regions which have been classified as the DMN in [44] and [43]. Three distinct networks recovered by the most correlated atoms from the different dictionaries are shown in Fig. 7 (Ventral Medial Prefrontal Cortex [43]), Fig. 9 (Ventral Posterior Cingulate [43]), and Fig. 10 (Precuneus Cortex [44]). By closely examining the activations, we found that the activations recovered by our proposed algorithms are tightly localized around the seed region and have good distinct peaks. For the sake of completeness, the corresponding atoms used to recover the activations given in Fig. 7 are also shown in Fig. 8, where the atom recovered by A1 has the highest correlation with the corresponding mean time series t.

Fig. 11. Relative change of the dictionary D w.r.t. the Frobenius norm as a function of the iteration number.
corresponding atoms used to recover the activations given in but also that regularization via basis expansion outperforms
Fig. 7 are also shown in Fig. 8 where the atom recovered by regularization via sparse basis expansion.
A1 has the highest correlation with the corresponding mean
time series t. R EFERENCES
To summarize, we tested the proposed method’s performance
w.r.t. other methods on simulated as well as real world data. In [1] K. Lee, S. K. Tak, and J. C. Ye, “A data driven sparse GLM for fMRI
the experiment of dictionary recovery, the proposed methods analysis using sparse dictionary learning and MDL criterion,” IEEE
Transactions on Medical Imaging, vol. 30, pp. 1176–1089, 2011.
outperformed other methods and were able to recover the [2] V. Abolghasemi, S. Ferdowsi, and S. Sanei, “Fast and incoherent
underlying ground truth dictionary effectively even in low dictionary learning algorithms with application to fMRI,” Signal, Image
SNR. In case of signal separation experiment of simulated and Video Processing, vol. 9, pp. 147–158, 2013.
[3] M. U. Khalid and A. K. Seghouane, “Improving functional connectivity
fMRI dataset, our proposed algorithms were able to recover detection in fMRI by combining sparse dictionary learning and canonical
the time series as well as the corresponding activations even in correlation analysis,” In Proceedings of IEEE International Symposium
the case of full overlap as seen in Fig. 4 case c). Then we tested on Biomedical Imaging, pp. 286–289, 2013.
[4] ——, “Constrained maximum likelihood based efficient dictionary learn-
our algorithms on the real world fMRI datasets i.e. task-fMRI ing for fMRI analysis,” In Proceedings of IEEE International Symposium
and resting-state fMRI. In these experiments, our algorithms on Biomedical Imaging, pp. 45–48, 2014.
were able to extract the underlying time-series well in both [5] ——, “Multi-subject fMRI connectivity analysis using sparse dictionary
learning and multiset canonical correlation analysis,” In Proceedings of
cases and the resulting activations were also tightly localized IEEE International Symposium on Biomedical Imaging, pp. 683–686,
having distinct peaks. 2015.
A complete analysis of the convergence properties of the proposed algorithms is not provided in this paper. However, we use the block design auditory dataset to provide a numerical analysis of the convergence of all the algorithms. To illustrate the convergence, at every iteration we inspect the relative change of the dictionary D with respect to the Frobenius norm, defined as
$$\frac{\| D_{i} - D_{i-1} \|_{F}}{\| D_{i-1} \|_{F}},$$
where $D_i$ denotes the dictionary at iteration $i$. One can observe from Fig. 11 that the relative change of the dictionaries does converge as the number of iterations increases. Moreover, the relative change is already very small after the 12th to 15th iteration, hence our choice of 20 to 30 iterations to learn the dictionaries.
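As a small illustration, this stopping quantity is straightforward to monitor in code. The NumPy sketch below computes the relative Frobenius-norm change between two successive dictionary iterates; the toy dimensions, the random data, and the 1e-3 tolerance are illustrative assumptions rather than values taken from the experiments above.

```python
import numpy as np

def relative_change(D_curr, D_prev):
    """Relative change ||D_i - D_{i-1}||_F / ||D_{i-1}||_F of the dictionary."""
    return np.linalg.norm(D_curr - D_prev, 'fro') / np.linalg.norm(D_prev, 'fro')

# Toy usage: a small perturbation of a dictionary yields a small relative
# change, signalling that the iterates have stabilized. In a learning loop,
# this quantity would be computed once per iteration and compared against a
# tolerance (assumed here to be 1e-3) to decide when to stop.
rng = np.random.default_rng(0)
D_prev = rng.standard_normal((64, 40))                      # hypothetical n x K dictionary
D_curr = D_prev + 1e-4 * rng.standard_normal(D_prev.shape)  # near-converged iterate
print(relative_change(D_curr, D_prev) < 1e-3)               # True: iterates have stabilized
```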
VI. CONCLUSION

Big datasets arising, for example, from spatio-temporal measurements as in fMRI studies can be structurally smooth. In this case the dataset, reshaped as a spatio-temporal matrix, is structured in the column domain, a structure that classical dictionary learning algorithms do not take into account.

REFERENCES

[1] K. Lee, S. K. Tak, and J. C. Ye, “A data-driven sparse GLM for fMRI analysis using sparse dictionary learning with MDL criterion,” IEEE Transactions on Medical Imaging, vol. 30, pp. 1076–1089, 2011.
[2] V. Abolghasemi, S. Ferdowsi, and S. Sanei, “Fast and incoherent dictionary learning algorithms with application to fMRI,” Signal, Image and Video Processing, vol. 9, pp. 147–158, 2013.
[3] M. U. Khalid and A. K. Seghouane, “Improving functional connectivity detection in fMRI by combining sparse dictionary learning and canonical correlation analysis,” In Proceedings of IEEE International Symposium on Biomedical Imaging, pp. 286–289, 2013.
[4] ——, “Constrained maximum likelihood based efficient dictionary learning for fMRI analysis,” In Proceedings of IEEE International Symposium on Biomedical Imaging, pp. 45–48, 2014.
[5] ——, “Multi-subject fMRI connectivity analysis using sparse dictionary learning and multiset canonical correlation analysis,” In Proceedings of IEEE International Symposium on Biomedical Imaging, pp. 683–686, 2015.
[6] J. Lv et al., “Sparse representation of whole brain fMRI signals for identification of functional networks,” Medical Image Analysis, vol. 20, pp. 112–134, 2015.
[7] X. Jiang et al., “Sparse representation of HCP grayordinate data reveals novel functional architecture of cerebral cortex,” Human Brain Mapping, vol. 36, pp. 5301–5319, 2015.
[8] S. Zhang et al., “Characterizing and differentiating task-based and resting state fMRI signals via two-stage sparse representations,” Brain Imaging and Behavior, vol. 10, pp. 21–32, 2015.
[9] ——, “Supervised dictionary learning for inferring concurrent brain networks,” IEEE Transactions on Medical Imaging, vol. 34, pp. 2036–2045, 2015.
[10] X. Hu et al., “Sparsity constrained fMRI decoding of visual saliency in naturalistic video streams,” IEEE Transactions on Autonomous Mental Development, vol. 7, pp. 65–75, 2015.
[11] G. Varoquaux et al., “Multi-subject dictionary learning to segment an atlas of brain spontaneous activity,” Information Processing in Medical Imaging, Lecture Notes in Computer Science, vol. 6801, pp. 562–573, 2011.
[12] A. K. Seghouane and Y. Saad, “Prewhitening high dimensional fMRI data sets without eigendecomposition,” Neural Computation, vol. 26, pp. 907–919, 2014.
[13] M. McKeown and T. Sejnowski, “Independent component analysis of fMRI data: examining the assumptions,” Human Brain Mapping, vol. 6, pp. 368–372, 1998.
[14] V. D. Calhoun, T. Adali, G. Pearlson, and J. J. Pekar, “A method for making group inferences from functional MRI data using independent component analysis,” Human Brain Mapping, vol. 13, pp. 43–53, 2001.
[15] ——, “Spatial and temporal independent component analysis of functional MRI data containing a pair of task related waveforms,” Human Brain Mapping, vol. 14, pp. 140–151, 2001.
[16] H. Eavani, R. Filipovych, C. Davatzikos, T. D. Satterthwaite, R. E. Gur, and R. C. Gur, “Sparse dictionary learning of resting state fMRI,” In Pattern Recognition in Neuroimaging, pp. 73–76, 2014.
[17] L. Wang, Y. Zang, Y. He, M. Liang, X. Zhang, L. Tian, T. Wu, T. Jiang, and K. Li, “Changes in hippocampal connectivity in the early stages of Alzheimer’s disease: evidence from resting state fMRI,” NeuroImage, vol. 31, pp. 496–504, 2006.
[18] G. M. Boynton, S. A. Engel, G. H. Glover, and D. J. Heeger, “Linear systems analysis of functional magnetic resonance imaging in human V1,” Journal of Neuroscience, vol. 16, pp. 4207–4221, 1996.
[19] K. J. Worsley, C. H. Liao, J. Aston, V. Petre, G. H. Duncan, F. Morales, and A. C. Evans, “A general statistical analysis for fMRI,” NeuroImage, vol. 15, pp. 1–15, 2002.
[20] J. O. Ramsay and B. W. Silverman, Functional Data Analysis. Springer-Verlag, 2005.
[21] A. K. Seghouane and M. Hanif, “A sequential dictionary learning
algorithm with enforced sparsity,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3876–3880, 2015.
[22] M. Sadeghi, M. Babaie-Zadeh, and C. Jutten, “Learning overcomplete
dictionaries based on atom by atom updating,” IEEE Transactions on
Signal Processing, vol. 62, pp. 883–891, 2014.
[23] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins University Press,
1996.
[24] R. Rubinstein, M. Zibulevsky, and M. Elad, “Double sparsity: Learning
sparse dictionaries for sparse signal approximation,” IEEE Transactions
on Signal Processing, vol. 58, pp. 1553–1564, 2010.
[25] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE
Transactions on Signal Processing, vol. 54, pp. 4311–4322, 2006.
[26] A. K. Seghouane and M. Bekara, “A small sample model selection cri-
terion based on the Kullback symmetric divergence,” IEEE Transactions
on Signal Processing, vol. 52, pp. 3314–3323, 2004.
[27] A. K. Seghouane, “Asymptotic bootstrap corrections of AIC for linear
regression models,” Signal Processing, vol. 90, pp. 217–224, 2010.
[28] J. Tropp and S. J. Wright, “Computational methods for sparse solution
of linear inverse problems,” Proceedings of the IEEE, vol. 98, pp. 948–
958, 2010.
[29] K. Engan, S. O. Aase, and J. Hakon Husoy, “Method of optimal
directions for frame design,” IEEE Int. Conference on Acoustics, Speech,
and Signal Processing, pp. 2443–2446, 1999.
[30] K. Skretting and K. Engan, “Recursive least squares dictionary learning
algorithm,” IEEE Transactions on Signal Processing, vol. 58, pp. 2121–
2130, 2010.
[31] M. Hanif and A. K. Seghouane, “Maximum likelihood orthogonal
dictionary learning,” IEEE Workshop on Statistical Signal Processing
(SSP), pp. 1–4, 2014.
[32] K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T. W. Lee, and T. J.
Sejnowski, “Dictionary learning algorithms for sparse representation,”
Neural Computation, vol. 15, pp. 349–396, 2003.
[33] M. S. Lewicki and T. J. Sejnowski, “Learning overcomplete representa-
tions,” Neural Computation, vol. 12, pp. 337–365, 2000.
[34] S. K. Sahoo and A. Makur, “Dictionary training for sparse representa-
tion as generalization of K-means clustering,” IEEE Signal Processing
Letters, vol. 20, pp. 587–590, 2013.
[35] J. Z. Huang, H. Shen, and A. Buja, “The analysis of two-way functional
data using two-way regularized singular value decompositions,” Journal
of the American Statistical Association, vol. 104, pp. 1609–1620, 2009.
[36] S. Mallat, A Wavelet Tour of Signal Processing. Academic Press, 1999.
[37] I. Johnstone and A. Lu, “On consistency and sparsity for principal
components analysis in high dimensions,” Journal of the American
Statistical Association, vol. 104, pp. 682–693, 2009.
[38] K. J. Friston, J. T. Ashburner, S. J. Kiebel, T. E. Nichols, and W. D.
Penny, Statistical Parametric Mapping: The Analysis of Functional
Brain Images. Academic Press, 2006.
[39] D. M. Barch, G. C. Burgess, M. P. Harms, S. E. Petersen, B. L. Schlag-
gar, M. Corbetta, M. F. Glasser, S. Curtiss, S. Dixit, C. Feldt, D. Nolan,
E. Bryant, T. Hartley, O. Footer, J. M. Bjork, R. Poldrack, S. Smith,
H. Johansen-Berg, A. Z. Snyder, and D. C. Van Essen, “Function in the
human connectome: Task-fMRI and individual differences in behavior,”
NeuroImage, vol. 80, pp. 169–189, 2013.
[40] M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, D. C. Van Essen, and M. Jenkinson, “The minimal preprocessing pipelines for the human connectome project,” NeuroImage, vol. 80, pp. 105–124, 2013.
[41] R. L. Buckner, F. M. Krienen, A. Castellanos, J. C. Diaz, and B. T. T. Yeo, “The organization of the human cerebellum estimated by intrinsic functional connectivity,” Journal of Neurophysiology, vol. 106, no. 5, pp. 2322–2345, 2011.
[42] C. F. Beckmann, M. DeLuca, J. T. Devlin, and S. M. Smith, “Investigations into resting-state connectivity using independent component analysis,” Philosophical Transactions of the Royal Society of London B: Biological Sciences, vol. 360, pp. 1001–1013, 2005.
[43] R. Leech, S. Kamourieh, C. F. Beckmann, and D. J. Sharp, “Fractionating the default mode network: distinct contributions of the ventral and dorsal posterior cingulate cortex to cognitive control,” The Journal of Neuroscience, vol. 31, pp. 3217–3224, 2011.
[44] V. D. Calhoun and T. Adali, “Multisubject independent component analysis of fMRI: A decade of intrinsic networks, default mode, and neurodiagnostic discovery,” IEEE Reviews in Biomedical Engineering, vol. 5, pp. 60–73, 2012.