Iterated Hard Shrinkage for Minimization Problems with Sparsity Constraints
© 2008 Society for Industrial and Applied Mathematics
Vol. 30, No. 2, pp. 657–683
Abstract. A new iterative algorithm for the solution of minimization problems in infinite-dimensional Hilbert spaces which involve sparsity constraints in the form of $\ell^p$-penalties is proposed. In contrast to the well-known algorithm considered by Daubechies, Defrise, and De Mol, it uses hard instead of soft shrinkage. It is shown that the hard shrinkage algorithm is a special case of the generalized conditional gradient method. Convergence properties of the generalized conditional gradient method with quadratic discrepancy term are analyzed. This leads to strong convergence of the iterates with convergence rates $O(n^{-1/2})$ and $O(\lambda^n)$ for $p = 1$ and $1 < p \le 2$, respectively. Numerical experiments on image deblurring, backwards heat conduction, and inverse integration are given.
Key words. sparsity constraints, iterated hard shrinkage, generalized conditional gradient
method, convergence analysis
DOI. 10.1137/060663556
(1) $\displaystyle \Psi(u) = \frac{\|Ku - f\|^2}{2} + \sum_k w_k |\langle u, \psi_k\rangle|^p,$
computation) the Besov seminorm $\Phi(u) = |u|_{B^1_{1,1}}$, while the latter can be
has precisely the same form as (1). See also [18, 24].
(iv) Operator equations with sparse frame expansions [23]. One can drop the
assumption that the solution has a sparse representation in a given basis
and consider the solution to be sparse in a given frame {ηk } (see [7] for
an introduction to frames). If one wants to solve the equation $Ku = f$
under the assumption that $u$ has a sparse representation in the given frame,
i.e., $u = \sum_k a_k \eta_k$ with only a few $a_k \neq 0$, one solves the minimization problem
$\displaystyle \min_{a \in \ell^2}\ \frac{\|KFa - f\|^2}{2} + w_0 \sum_k |a_k|$
($a = (a_k)$ and $Fa = \sum_k a_k \eta_k$).
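To make the role of the synthesis operator $F$ concrete, here is a minimal numerical sketch of this frame formulation in finite dimensions; the matrices, data, and weight are made-up stand-ins and not taken from the paper.

```python
import numpy as np

# Hypothetical small example: K is a linear operator (matrix), the frame
# {eta_k} is given by the columns of a synthesis matrix F, and we minimize
#   ||K F a - f||^2 / 2 + w0 * sum_k |a_k|
# over the coefficient sequence a.
rng = np.random.default_rng(0)
K = rng.standard_normal((20, 30))       # forward operator
F = rng.standard_normal((30, 50))       # redundant frame: 50 atoms in R^30
f = rng.standard_normal(20)             # data
w0 = 0.1                                # regularization weight

def objective(a):
    residual = K @ F @ a - f
    return 0.5 * residual @ residual + w0 * np.abs(a).sum()

print(objective(np.zeros(50)))          # value at a = 0 is ||f||^2 / 2
```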
Problems of type (1) are well-analyzed in the case of finite-dimensional Hilbert
spaces; see, e.g., [5, 9, 14, 16, 17, 19, 21]. While in the finite-dimensional case there are
powerful minimization techniques based on convex programming [5, 3], the infinite-
dimensional case is much harder. One well-understood algorithm for the minimization
of Ψ is the iterated soft shrinkage algorithm introduced independently in [15, 22]. The
algorithm is analyzed in [1, 6, 8], while in [6, 8] the authors also show convergence
of the algorithm in the infinite-dimensional setting. In [8] the authors use tech-
niques based on optimization transfer, and in [6] the notion of proximal mappings
and forward-backward splitting is used. In this paper we utilize a generalization of
the conditional gradient method; see [2].
While the question of convergence is answered, it is still open how fast the iteration
converges in the infinite-dimensional case. To our knowledge no convergence rates
have been proven for the iterated soft shrinkage. The main contribution of this article
is a new minimization algorithm for which convergence rates are proved. Namely, we
prove that our algorithm converges linearly for $p > 1$ and like $n^{-1/2}$ for $p = 1$.
The article is organized as follows: In section 2 we briefly introduce our new algo-
rithm and state the main results. Section 3 is devoted to the analysis of convergence
rates of the generalized conditional gradient method for functionals F + Φ, with a
smooth part F and a nonsmooth part Φ. We consider a special case, adapted to the
problem of minimizing (1). The application of the results to the special case (1) is
given in section 4, where explicit rates of the convergence of the new algorithm are
derived and the main results on convergence rates are proven. Section 5 presents nu-
merical experiments on the convergence of our algorithm and compares our algorithm
and the iterated soft shrinkage.
2. An iterative hard shrinkage algorithm. We state the problem of minimizing (1), i.e.,
$\displaystyle \min_{u \in H_1} \Psi(u),$
and introduce, for a constant $S_0 > 0$, the functions
(3) $\displaystyle \varphi_p(x) = \begin{cases} |x|^p & \text{for } |x| \le S_0, \\[4pt] \dfrac{p\,S_0^{p-2}}{2}\,x^2 + \Bigl(1 - \dfrac{p}{2}\Bigr) S_0^p & \text{for } |x| > S_0, \end{cases}$
and
(4) $\displaystyle H_p(x) = \begin{cases} \mathrm{sgn}(x)\,\Bigl|\dfrac{x}{p}\Bigr|^{1/(p-1)} & \text{for } |x| \le p S_0^{p-1}, \\[4pt] \dfrac{S_0^{2-p}\,x}{p} & \text{for } |x| > p S_0^{p-1}, \end{cases}$
where, for $p = 1$, the expression $|x|^{1/(p-1)}$ is read as $|x|^{\infty}$, i.e., as $0$ if $|x| \le 1$ and $\infty$ if $|x| > 1$.
Note that $\varphi_p$ is the usual power for small values and becomes quadratic outside of $[-S_0, S_0]$ in a $C^1$ way. The function $H_p$ is a kind of shrinkage for small values (remember that $1 \le p \le 2$) and linear outside of $[-S_0, S_0]$. Also note that $H_p$ and $\varphi_p$ satisfy $H_p(x) \in (\partial\varphi_p)^{-1}(x)$, a relation which is explained later in section 4. See Figures 1 and 2 for illustrations of $\varphi_p$ and $H_p$, respectively.
The minimization algorithm, which turns out to be a special case of the general-
ized conditional gradient algorithm, now reads as follows.
Algorithm 1.
1. Initialization. Set u0 = 0 and n = 0.
2. Direction determination. For $u^n \in \ell^2$ calculate
$v^n = H_{p,w}\bigl(-K^*(Ku^n - f)\bigr),$
[Figure 1: the functions $\varphi_1(t)$, $\varphi_{1.25}(t)$, and $\varphi_{1.5}(t)$. Figure 2: the shrinkage functions $H_1(s)$, $H_{1.25}(s)$, and $H_{1.5}(s)$.]
where
(5) $\displaystyle \bigl(H_{p,w}(-K^*(Ku^n - f))\bigr)_k = H_p\biggl(\frac{-\bigl(K^*(Ku^n - f)\bigr)_k}{w_k}\biggr),$
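For readers who want to experiment with the iteration, the following is a minimal finite-dimensional sketch of the shrinkage function $H_p$ and the direction step (5), assuming $K$ is a real matrix and following the reconstructed form of (3) and (4); the function names and the choice of NumPy are ours.

```python
import numpy as np

def hard_shrink(x, p, S0):
    """H_p from (4) (as reconstructed above), applied componentwise; 1 <= p <= 2."""
    x = np.asarray(x, dtype=float)
    if p == 1.0:
        # degenerate first branch: 0 on [-1, 1], linear with slope S0 outside
        return np.where(np.abs(x) <= 1.0, 0.0, S0 * x)
    thresh = p * S0 ** (p - 1.0)
    inner = np.sign(x) * (np.abs(x) / p) ** (1.0 / (p - 1.0))
    outer = S0 ** (2.0 - p) * x / p
    return np.where(np.abs(x) <= thresh, inner, outer)

def hard_shrinkage_direction(u, K, f, w, p, S0):
    """Direction v^n = H_{p,w}(-K^*(K u^n - f)) of Algorithm 1, cf. (5)."""
    s = -K.T @ (K @ u - f)            # -K^*(Ku - f); K^* = K.T for a real matrix
    return hard_shrink(s / w, p, S0)  # componentwise H_p(s_k / w_k)
```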
rates for this procedure under certain conditions. We then apply these results to a
functional similar to (2) and verify that the convergence criteria are satisfied.
where $F$ is smooth but nonconvex and $\Phi$ is convex but possibly nonsmooth, resulting in a nonconvex and nonsmooth $\Psi$. Here we consider the special case where $F(u) = \tfrac{1}{2}\|Ku - f\|^2$, so problem (7) is convex but still potentially nonsmooth.
The generalized conditional gradient method applied to (7), together with an explicit step-size rule, gives the following algorithm.
Algorithm 2.
1. Initialization. Set n = 0, and choose u0 such that Φ(u0 ) < ∞.
2. Direction search. For n ≥ 0, calculate a minimizer of the approximate
problem
The expressions rn and Dn will be used as abbreviations for r(un ) and D(un ), respec-
tively.
Note that from the minimization property (8) it immediately follows that
These terms, and especially D, play a central role in the convergence analysis of this
algorithm. Since we regard it as a generalization, the ideas utilized in the following
are inspired by [12], where the analysis is carried out for the well-known conditional
gradient method.
To prove convergence for Algorithm 2 we first derive that D serves as an estimate
for r, i.e., for the distance to the minimal value of Ψ in (7):
Lemma 3. For the functions D and r according to (10) and (11), respectively,
we have the relation D ≥ r. In particular, D(u) = 0 if and only if u is a solution of
minimization problem (7).
Proof. Let u ∈ H1 be given. Choose a u∗ which satisfies
Since the minimization problem is well-posed (see [13], for example), such a u∗ can
always be found.
First observe that always
(13) $\displaystyle \langle K^*(Ku - f), u - u^*\rangle \ge \frac{\|Ku - f\|^2}{2} - \frac{\|Ku^* - f\|^2}{2}.$
Now
$\displaystyle D(u) = \langle K^*(Ku - f), u\rangle + \Phi(u) - \min_{v \in H_1}\bigl(\langle K^*(Ku - f), v\rangle + \Phi(v)\bigr) \ge \langle K^*(Ku - f), u - u^*\rangle + \Phi(u) - \Phi(u^*) \ge \frac{\|Ku - f\|^2}{2} - \frac{\|Ku^* - f\|^2}{2} + \Phi(u) - \Phi(u^*) = r(u)$
by the definition of the minimum and (13). The characterization D(u) = 0 if and only
if u is optimal is just a consequence of the second statement of Proposition 2.
Remark 2. One immediate consequence is that the step-size rule (9) always produces $s_n \in [0, 1]$. Moreover, in case the solution is unique, $s_n = 0$ or $v^n = u^n$ if and only if $u^n$ is the solution of the problem: The sufficiency follows directly from the representation of $D_n$ in (12), while the necessity can be seen as follows. If $Ku^n \ne Kv^n$, then, according to (9), $s_n = 0$ since $D_n = 0$. The case $Ku^n = Kv^n$ implies with (12) that $\Psi(u^n) = \Psi(v^n)$. Consequently, $u^n = v^n$ by uniqueness of solutions.
Remark 3. The above algorithm can be interpreted as a modification of the steepest-descent/Landweber algorithm for the minimization of $\tfrac{1}{2}\|Ku - f\|^2$. Denote by $T$ the (set-valued) solution operator of the minimization problem (8).
The steepest-descent algorithm produces iterates $u^{n+1} = u^n + s_n(v^n - u^n)$ according to
$\displaystyle v^n = -K^*(Ku^n - f), \qquad s_n = \frac{\langle Ku^n - f, K(u^n - v^n)\rangle}{\|K(v^n - u^n)\|^2}.$
In comparison, Algorithm 2 produces its iterates in the same manner, with similar directions and step sizes:
$\displaystyle v^n \in T\bigl(-K^*(Ku^n - f)\bigr), \qquad s_n = \min\biggl\{1,\ \frac{\Phi(u^n) - \Phi(v^n) + \langle Ku^n - f, K(u^n - v^n)\rangle}{\|K(v^n - u^n)\|^2}\biggr\}.$
Note that, in the generalized conditional gradient algorithm, the steepest-descent direction for the quadratic part $F$ is passed through the generally nonlinear solution operator $T$. Likewise, the step size is essentially the one used in the steepest-descent algorithm, except for the presence of $\Phi$. Finally, in the iteration step only convex combinations are allowed, so the method differs with respect to this restriction.
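A compact sketch of one generalized conditional gradient iteration under the assumptions of this remark: $K$ is a matrix, `Phi` evaluates the penalty, and `T` is a user-supplied solver for the direction-search problem (8). All identifiers are ours, and the degenerate case $Ku^n = Kv^n$ is handled as described in Remark 2.

```python
import numpy as np

def gcg_step(u, K, f, Phi, T):
    """One iteration of Algorithm 2 (sketch): direction search via a supplied
    solver T for problem (8), followed by the explicit step-size rule above."""
    grad = K.T @ (K @ u - f)                  # K^*(Ku^n - f)
    v = T(-grad)                              # v^n in T(-K^*(Ku^n - f))
    d = v - u
    Kd = K @ d
    denom = Kd @ Kd                           # ||K(v^n - u^n)||^2
    if denom == 0.0:
        return v                              # Ku^n = Kv^n: Psi(v^n) <= Psi(u^n), take the full step
    num = Phi(u) - Phi(v) - (K @ u - f) @ Kd  # Phi(u^n) - Phi(v^n) + <Ku^n - f, K(u^n - v^n)>
    s = min(1.0, num / denom)                 # step-size rule (9); num = D_n >= 0 if T solves (8) exactly
    return u + s * d
```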
Now for the convergence analysis we note that this generalized conditional gradi-
ent algorithm has very convenient descent properties.
Lemma 4. Algorithm 2 produces a sequence {un } (and {v n }), for which the
corresponding rn satisfy
(14) $\displaystyle r_{n+1} - r_n \le \begin{cases} -\dfrac{r_n}{2} & \text{if } D_n \ge \|K(v^n - u^n)\|^2, \\[6pt] -\dfrac{r_n^2}{2\|K(v^n - u^n)\|^2} & \text{if } D_n < \|K(v^n - u^n)\|^2. \end{cases}$
In particular, there is a $q > 0$ such that
$r_{n+1} - r_n \le -q r_n^2$
for all $n \ge 0$.
Proof. First note that
$\displaystyle F\bigl(u^n + s_n(v^n - u^n)\bigr) - F(u^n) = \frac{\|K(u^n + s_n(v^n - u^n)) - f\|^2}{2} - \frac{\|Ku^n - f\|^2}{2} = s_n\langle Ku^n - f, K(v^n - u^n)\rangle + \frac{s_n^2\|K(v^n - u^n)\|^2}{2}.$
In the case $D_n \ge \|K(v^n - u^n)\|^2$, since $r_n \le D_n$ and $r_n \le r_0$, this leads to
$\displaystyle r_{n+1} - r_n \le \frac{-r_n^2}{2 r_0}.$
In the other case, we want to derive an estimate for $\|K(v^n - u^n)\|^2$. Since $\{r_n\}$ is bounded and $\Phi$ is coercive, there has to be a $C_1 > 0$ such that $\|u^n\| \le C_1$ for all $n$. From convex analysis we know that the solution operator of the minimization problem (8) in step 2 is bounded whenever the property $\Phi(u)/\|u\| \to \infty$ as $\|u\| \to \infty$ is satisfied (see [20], for example). Thus, it follows that $\|v^n\| \le C_2$ for some constant $C_2 > 0$. This gives the estimate
$\displaystyle r_{n+1} - r_n \le \frac{-r_n^2}{2\|K\|^2 (C_1 + C_2)^2}.$
Finally, one obtains the desired estimate if one puts the two cases together and sets
$\displaystyle q = \min\biggl\{\frac{1}{2 r_0},\ \frac{1}{2\|K\|^2 (C_1 + C_2)^2}\biggr\}.$
Such an estimate immediately implies that the distances to the minimum behave like $O(n^{-1})$:
$r_n \le C n^{-1}.$
$\displaystyle \frac{1}{r_{n+1}} - \frac{1}{r_n} = \frac{r_n - r_{n+1}}{r_{n+1} r_n} \ge \frac{q r_n^2}{r_{n+1} r_n} \ge q > 0,$
and summing up yields
$\displaystyle \frac{1}{r_n} - \frac{1}{r_0} = \sum_{i=0}^{n-1}\Bigl(\frac{1}{r_{i+1}} - \frac{1}{r_i}\Bigr) \ge q(n-1).$
D ≥ r ≥ R ≥ 0;
i.e., we are able to estimate the distance to the minimizer from above and below.
Now the key to proving the strong convergence of sequences {un } generated by
Algorithm 2 is to examine the growth behavior of R in the neighborhood of u∗ . This
is done in the following theorem, which is also the main result on convergence rates
for the generalized conditional gradient method for problems of type (7).
$\|u^n - u^*\| \le C n^{-1/2}.$
$\|u^n - u^*\| \le C \lambda^{n/2}, \qquad r_n \le r_0 \lambda^n$
Proof. Assume that this is not the case. Then there exists a sequence $\{u^n\} \subset H_1$ with $\|u^n\| = 1$ and $\|Ku^n\| + \|P_M u^n\| \le \frac{1}{n}$. It is immediate that there is a subsequence which converges weakly to some $u \in H_1$. Moreover, $Ku^n \to 0 = Ku$ as well as $P_M u^n \to 0 = P_M u$, and, since $M^\perp$ is finite-dimensional, $P_{M^\perp} u^n \to P_{M^\perp} u$. In particular, $u^n \to u$, so $\|u\| = 1$ and $Ku = 0$, a contradiction to the injectivity of $K$.
Proof of Theorem 7. From Lemma 5 we know that the distances $r_n$ converge to zero with estimate $r_n \le C_1 n^{-1}$. It is also clear that $\|u^n\| \le C_2$ for some $C_2 > 0$, so we can find an $L > 0$ such that $\|u^n - u^*\| \le L$. By assumption and convexity of $F$,
The convergence statement in the case of uniqueness of the solution then follows from
the usual subsequence argument.
(19) $\|u\| \le C_3\bigl(\|Ku\| + \|P_M u\|\bigr).$
$\displaystyle r_n = \frac{\|Ku^n - f\|^2}{2} + \Phi(u^n) - \frac{\|Ku^* - f\|^2}{2} - \Phi(u^*) \ge \frac{\|K(u^n - u^*)\|^2}{2}.$
Together with (19) it follows that
$\displaystyle \|u^n - u^*\|^2 \le 2 C_3^2\bigl(\|K(u^n - u^*)\|^2 + \|P_M(u^n - u^*)\|^2\bigr) \le 2 C_1 C_3^2\bigl(2 + c(L)^{-1}\bigr) n^{-1},$
We aim at showing that the solution operator $T$ of the minimization problem (8) is locally Lipschitz continuous at $u^*$ in some sense. Choose a $u \in H_1$ with $\|u - u^*\| \le L$,
and denote by v ∈ T (u) a solution of (8). Since v solves the minimization problem,
it holds that
and thus
It follows that
$\displaystyle \|v - u^*\| \le \frac{\|K\|^2}{c(L)}\,\|u - u^*\|,$
$\displaystyle \le 2\|K\|^2\Bigl(1 + \frac{\|K\|^2}{c(L)}\Bigr)^2 \|u^n - u^*\|^2 = C_4 \|u^n - u^*\|^2.$
(see also (3)), which turns out to be exactly Algorithm 1. It is proven that (21)
admits exactly the same minimizers as (2), and the convergence rates then follow
from applying Theorem 7. Finally, we conclude with some remarks on the algorithm.
In the following, we split the functional $\tilde\Psi = F + \Phi$, with
$\displaystyle F(u) = \frac{1}{2}\|Ku - f\|^2, \qquad \Phi(u) = \sum_{k=1}^{\infty} w_k \varphi_p(u_k)$
for all $u \in H_1$. Thus, $u^*$ is also a minimizer for $\tilde\Psi$. On the other hand, if $u^*$ is not a minimizer for $\Psi$, then there exists a $u \in H_1$ with $\|u\| \le S_0$ such that
for |t| ≥ S0 .
Remark 5. We remark on the possibility of normalizing the constant $S_0$. For given $f$ and $w_k$ we have $S_0 = \bigl(\frac{\|f\|^2}{2 w_0}\bigr)^{1/p}$ and define $K^N = S_0 K$, $w_k^N = S_0^p w_k$. With the substitution $u^N = S_0^{-1} u$, problem (2) becomes
$\displaystyle \min_{u^N \in H_1}\ \frac{\|K^N u^N - f\|^2}{2} + \sum_{k=1}^{\infty} w_k^N |u_k^N|^p.$
This results in a constant $S_0^N = \bigl(\frac{\|f\|^2}{2 w_0^N}\bigr)^{1/p} = 1$.
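The rescaling of Remark 5 amounts to a few lines of code; the following sketch assumes a matrix $K$, a weight array $w$ with $w_0 = \min_k w_k$, and a data vector $f$ (our own naming, not code from the paper).

```python
import numpy as np

def normalize_problem(K, f, w, p):
    """Rescaling of Remark 5 (sketch): returns K^N = S0*K and w^N = S0**p * w,
    for which the constant S0^N equals 1; a minimizer u of (2) is then
    recovered from the normalized minimizer u^N as u = S0 * u^N."""
    w = np.asarray(w, dtype=float)
    w0 = w.min()
    S0 = (f @ f / (2.0 * w0)) ** (1.0 / p)    # S0 = (||f||^2 / (2 w0))^(1/p)
    return S0 * K, S0 ** p * w, S0
```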
In order to apply the convergence results of the previous section, we have to verify
that Algorithm 1 corresponds to Algorithm 2 in the case of (21). This will be done
in the following. First, we check that the algorithm is indeed applicable; i.e., we show
that the functional Φ meets Condition 1.
$\displaystyle \Phi(u) = \sum_{k=1}^{\infty} w_k \varphi(u_k)$
is proper, convex, and lower semicontinuous and fulfills $\Phi(u)/\|u\| \to \infty$ when $\|u\| \to \infty$.
Proof. To see that Φ is proper, convex, and lower semicontinuous, we refer to the
standard literature on convex analysis [13].
To establish the desired coercivity, suppose the opposite, i.e., that there is a sequence with $\|u^n\| \to \infty$ and $\Phi(u^n)/\|u^n\| \le C_1$ for a $C_1 > 0$. If there exists a $C_2 > 0$ such that $|u_k^n| \le C_2$ for all $k, n \ge 1$, then it follows, with the help of $|u_k^n|^2 \le C_2^{2-p}|u_k^n|^p$ as well as $|t|^p \le \varphi(t)$,
$\displaystyle \|u^n\|^2 = \sum_{k=1}^{\infty} |u_k^n|^2 \le \frac{C_2^{2-p}}{w_0} \sum_{k=1}^{\infty} w_k |u_k^n|^p \le \frac{C_2^{2-p}}{w_0} \sum_{k=1}^{\infty} w_k \varphi(u_k^n) = \frac{C_2^{2-p}}{w_0}\,\Phi(u^n).$
$\displaystyle \frac{\Phi(u^{n_l})}{\|u^{n_l}\|} \ge w_0\,\varphi\bigl(u^{n_l}_{k_l}\bigr) \to \infty \quad \text{for } l \to \infty,$
which is again a contradiction.
It is not hard to verify that all ϕp according to (3) satisfy the requirements of the
lemma. In particular, the property ϕp (t)/|t| → ∞ whenever |t| → ∞ follows from the
quadratic extension outside of [−S0 , S0 ].
The next step is to observe that Algorithm 2 for the modified functional (21) is
given by Algorithm 1.
Proposition 11. In the situation of Proposition 9, the generalized conditional
gradient method (Algorithm 2) for solving the minimization problem (21) is realized
by Algorithm 1.
Proof. The proof is given by analyzing the steps of Algorithm 2 and comparing
them with Algorithm 1.
Regarding step 1, choosing u0 = 0 as the initialization is feasible since always
Φ(0) = 0. The direction search in step 2 amounts to solving the minimization problem
∞
(22) min K ∗ (Ku − f ) k (vk − uk ) + wk ϕp (uk ),
v∈H1
k=1
which can be done pointwise for each $k \ge 1$. This involves the solution of
$\displaystyle \min_{t \in \mathbb{R}}\ s t + \bar w\,\varphi_p(t)$
for given $s \in \mathbb{R}$ and $\bar w > 0$, for which the solution can be derived as follows: First note that $t \in \mathbb{R}$ is a minimizer if and only if $-\frac{s}{\bar w} \in \partial\varphi_p(t)$. For $1 < p \le 2$, this subgradient reads as
$\displaystyle \partial\varphi_p(t) = \begin{cases} \{p\,\mathrm{sgn}(t)|t|^{p-1}\} & \text{for } |t| \le S_0, \\[4pt] \bigl\{\tfrac{p}{S_0^{2-p}}\,t\bigr\} & \text{for } |t| > S_0, \end{cases}$
while
$\displaystyle \partial\varphi_1(t) = \begin{cases} [-1, 1] & \text{for } t = 0, \\ \{\mathrm{sgn}(t)\} & \text{for } 0 < |t| \le S_0, \\[4pt] \bigl\{\tfrac{t}{S_0}\bigr\} & \text{for } |t| > S_0. \end{cases}$
A solution $t$ is then given by $t \in (\partial\varphi_p)^{-1}\bigl(-\frac{s}{\bar w}\bigr)$, which necessarily means, for $1 < p \le 2$, that
$\displaystyle t = \begin{cases} \mathrm{sgn}\bigl(-\tfrac{s}{\bar w p}\bigr)\bigl|\tfrac{s}{\bar w p}\bigr|^{1/(p-1)} & \text{for } \bigl|\tfrac{s}{\bar w}\bigr| \le p S_0^{p-1}, \\[6pt] -\dfrac{s\,S_0^{2-p}}{\bar w p} & \text{for } \bigl|\tfrac{s}{\bar w}\bigr| > p S_0^{p-1}, \end{cases}$
where the breaking points $\pm p S_0^{p-1}$ are of course given by the values of $\partial\varphi_p(\pm S_0)$. For $p = 1$ a similar situation occurs:
$\displaystyle t \in \begin{cases} \{0\} & \text{for } \bigl|\tfrac{s}{\bar w}\bigr| < 1, \\[4pt] \mathrm{sgn}\bigl(-\tfrac{s}{\bar w}\bigr)\,[0, S_0] & \text{for } \bigl|\tfrac{s}{\bar w}\bigr| = 1, \\[4pt] \bigl\{-\tfrac{s S_0}{\bar w}\bigr\} & \text{for } \bigl|\tfrac{s}{\bar w}\bigr| > 1. \end{cases}$
Since the algorithm does not demand a particular solution, we can choose $t = 0$ if $\bigl|\tfrac{s}{\bar w}\bigr| = 1$. Putting the cases $p = 1$ and $1 < p \le 2$ together gives that the pointwise minimization problem is indeed solved by
$t = H_p\bigl(-\tfrac{s}{\bar w}\bigr),$
with $H_p$ according to (4). Consequently, the operator
$v = H_{p,w}\bigl(-K^*(Ku - f)\bigr)$
from (5) realizes the direction search in step 2.
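As a sanity check of the pointwise solution, one can compare $H_p(-s/\bar w)$ with a brute-force minimization of $st + \bar w\,\varphi_p(t)$ on a grid; the sketch below uses the reconstructed definitions of $\varphi_p$ and $H_p$ and arbitrary illustrative values.

```python
import numpy as np

# Numerical check (on a grid) that t = H_p(-s/w) minimizes  s*t + w*phi_p(t).
def phi_p(t, p, S0):
    t = np.asarray(t, dtype=float)
    inner = np.abs(t) ** p                                          # |t|^p on [-S0, S0]
    outer = 0.5 * p * S0 ** (p - 2.0) * t ** 2 + (1.0 - p / 2.0) * S0 ** p
    return np.where(np.abs(t) <= S0, inner, outer)

p, S0, w, s = 1.5, 1.0, 0.7, -2.3          # made-up example values
grid = np.linspace(-5.0, 5.0, 200001)
t_grid = grid[np.argmin(s * grid + w * phi_p(grid, p, S0))]

x = -s / w                                  # argument of H_p
if abs(x) <= p * S0 ** (p - 1.0):
    t_formula = np.sign(x) * (abs(x) / p) ** (1.0 / (p - 1.0))
else:
    t_formula = S0 ** (2.0 - p) * x / p
print(t_grid, t_formula)                    # both values should be close to 2.19
```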
Lemma 12. Let $1 < p \le 2$. For each $C_1 > 0$ and $L > 0$ there exists a $c_1(L, C_1) > 0$ such that
$\displaystyle |t|^p - |s|^p - p\,\mathrm{sgn}(s)|s|^{p-1}(t - s) \ge c_1(L, C_1)\,(t - s)^2$
for all $|s| \le C_1$ and $|t - s| \le L$.
Proof. First note that $\varphi(t) = |t|^p$ is twice differentiable with the exception that the second derivative does not exist at $0$:
$\varphi'(t) = p\,\mathrm{sgn}(t)|t|^{p-1}, \qquad \varphi''(t) = p(p-1)|t|^{p-2}.$
Since $\varphi''$ is locally integrable, the Taylor formula
$\displaystyle \varphi(t) = \varphi(s) + \varphi'(s)(t - s) + \int_s^t \varphi''(\tau)(t - \tau)\,d\tau$
$\displaystyle \Rightarrow\quad |t|^p = |s|^p + p\,\mathrm{sgn}(s)|s|^{p-1}(t - s) + p(p-1)\int_s^t |\tau|^{p-2}(t - \tau)\,d\tau$
still holds. By assumption $|s|, |t| \le C_1 + L$, and hence $|\tau|^{p-2} \ge (C_1 + L)^{p-2}$ for $|\tau| \le C_1 + L$, showing that
$\displaystyle |t|^p \ge |s|^p + p\,\mathrm{sgn}(s)|s|^{p-1}(t - s) + p(p-1)(C_1 + L)^{p-2}\int_s^t (t - \tau)\,d\tau$
for each $|s| \le C_1$, $|s - t| \le L$, remembering that $\varphi_p(s) = |s|^p$ for $|s| \le C_1$ and $\varphi_p(t) \ge |t|^p$. Hence, if $\|v - u^*\| \le L$,
$\displaystyle R(v) = \sum_{k=1}^{\infty} w_k\Bigl(\varphi_p(v_k) - \varphi_p(u_k^*) - p\,\mathrm{sgn}(u_k^*)|u_k^*|^{p-1}(v_k - u_k^*)\Bigr) \ge w_0\,c_1(L) \sum_{k=1}^{\infty} |v_k - u_k^*|^2 = c(L)\,\|v - u^*\|^2.$
Due to the construction of $\varphi_1$ we can estimate $|t| \le \varphi_1(t)$, which further leads to
$\displaystyle R(v) \ge \sum_{k=k_0}^{\infty} w_k\bigl(|v_k| - v_k\xi_k\bigr) \ge w_0 \sum_{k=k_0}^{\infty} \frac{|v_k|}{2} = \frac{w_0}{2}\sum_{k=k_0}^{\infty} |v_k - u_k^*|.$
Define
$M = \{u \in \ell^2 : u_k = 0 \text{ for } k < k_0\},$
Collecting the results of this section, we are able to apply Theorem 7, which
finally gives our main convergence result for the iterated hard shrinkage procedure in
Algorithm 1.
Theorem 14. Consider minimization problem (2). If $1 < p \le 2$, then Algorithm 1 produces a sequence $\{u^n\}$ which converges linearly to the unique minimizer $u^*$, i.e.,
$\|u^n - u^*\| \le C\lambda^n$
for some $0 \le \lambda < 1$.
If $p = 1$ and $K$ is injective, then the descent algorithm produces a sequence $\{u^n\}$ which converges strongly to the unique minimizer $u^*$ with speed
$\|u^* - u^n\| \le C n^{-1/2}.$
Proof. First note that, by Proposition 9 and Remark 4, the minimization problem
(2) is equivalent to (21), and the latter can be solved instead. Additionally, one knows
that (2) is well-posed, so at least one optimal solution u∗ ∈ H1 exists. As already
in step 3 of the algorithm can be used as an a posteriori error bound on the distance to the minimal value; i.e., it holds that $D_n \ge \tilde\Psi(u^n) - \min_{u \in \ell^2}\tilde\Psi(u)$; see Lemma 3. So one can use the stopping criterion $D_n < \varepsilon$ to assure that the minimal value is reached up to a certain tolerance $\varepsilon > 0$ in case of convergence; see the appendix for details.
Remark 9. Note that if $p = 1$ the penalty functional
$\displaystyle \Phi(u) = \sum_{k=1}^{\infty} w_k |u_k|$
is frequently replaced by a smooth approximation of the modulus function depending on a small $\varepsilon > 0$ (see, e.g., [25], where this approach was introduced for regularizing the TV norm). This always introduces some deviation from the true solution and poses numerical difficulties for very small $\varepsilon$. In particular, the desired sparsity of the solution is lost.
In the algorithm presented above, we do in some sense the opposite: We modify
the modulus function for large values in order to make the generalized conditional
gradient method applicable. In this case, the modification is outside of the domain
relevant for the minimization problem (2), and the solutions obtained are the exact
sparse solutions.
5. Numerical experiments. To illustrate the convergence behavior of the it-
erated hard shrinkage algorithm as stated in Algorithm 1, we performed numerical
tests on three linear model problems and compared the results to the iterated soft
shrinkage algorithm introduced in [15, 22]. Our primary aim is to demonstrate the
applicability of the new algorithm; we thus perform the experiments for well-known
problems in image processing and inverse problems.
$\displaystyle \min_{u}\ \sum_{i=1}^{N}\sum_{j=1}^{M} \frac{\bigl((u * g)_{ij} - f_{ij}\bigr)^2}{2} + \sum_{k=1}^{NM} w_k |\langle u, \psi_k\rangle|^p,$
where $\{\psi_k\}$ is the discrete two-dimensional wavelet basis spanning the space $\mathbb{R}^{NM}$ and $*$ denotes the usual discrete convolution with a kernel $g = \{g_{ij}\}$.
For our computations, we used an out-of-focus kernel $g$ with radius $r = 6$ pixels which has been normalized to $\|g\| = 0.99$ such that the associated operator's norm is strictly below 1. The original image of size $256 \times 256$ pixels with values in $[0, 1]$ was
blurred with that kernel and in one case disturbed with Gaussian noise of variance
η (see Figure 3). Then Algorithm 1 as well as the iterated soft shrinkage procedure
were applied for a suitable number of iterations with a wavelet decomposition up to
level 5 and the following parameters:
1. p = 1, wk = 0.00002, η = 0;
2. p = 1.75, wk = 0.005, η = 0.025.
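One possible construction of such a kernel is sketched below; the text does not specify the normalization beyond $\|g\| = 0.99$, so we assume the sum of the kernel entries (its $\ell^1$ norm) is normalized, which indeed bounds the operator norm of the convolution by 0.99.

```python
import numpy as np

def out_of_focus_kernel(radius=6):
    """Disk ("out-of-focus") blur kernel, normalized so that the sum of its
    entries is 0.99, keeping the norm of the convolution operator below 1."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    g = (x ** 2 + y ** 2 <= radius ** 2).astype(float)
    return 0.99 * g / g.sum()

g = out_of_focus_kernel(6)
print(g.shape, g.sum())   # (13, 13)  0.99
```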
For the comparison of these two methods, we plotted the functional values at each
iteration step. The results for the two deblurring tests are depicted in Figure 4.
For $1 < p < 2$, we observed that the hard shrinkage iteration usually performs faster than the soft shrinkage algorithm, although we did not optimize for computational speed. The reason for this is that the iterated soft shrinkage algorithm demands the solution of
$x + \bar w p\,\mathrm{sgn}(x)|x|^{p-1} = y$
for all basis coefficients in each iteration step, which has to be done by numerical approximation. Experiments show that it is necessary to compute the solution sufficiently precisely, since otherwise an increase of the functional value is possible. This is
a relatively time-consuming task even if the quadratically convergent Newton method
is used. Alternatively, one can use tables to look up the required values, but this still
has to be done for a wide range of numbers and up to a high precision. By com-
parison, in the iterated hard shrinkage method (Algorithm 1), only the evaluation of $|x|^{1/(p-1)}$ is necessary, which can usually be done significantly faster and requires no special implementation.
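For illustration, a safeguarded Newton/bisection iteration for this scalar equation might look as follows; this is a sketch of one possible implementation, not the code used for the experiments.

```python
import numpy as np

def solve_soft_shrink(y, wbar, p, tol=1e-12, max_iter=50):
    """Solve x + wbar*p*sign(x)*|x|**(p-1) = y for x (1 < p <= 2) with a
    safeguarded Newton iteration on the positive half-line (sketch)."""
    if y == 0.0:
        return 0.0
    sgn, b = np.sign(y), abs(y)
    c = wbar * p
    lo, hi = 0.0, b                     # the root lies in [0, |y|]
    x = b
    for _ in range(max_iter):
        fx = x + c * x ** (p - 1.0) - b
        if fx > 0.0:
            hi = x
        else:
            lo = x
        dfx = 1.0 + c * (p - 1.0) * x ** (p - 2.0)
        x_new = x - fx / dfx
        if not (lo < x_new < hi):       # fall back to bisection if Newton leaves the bracket
            x_new = 0.5 * (lo + hi)
        if abs(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    return sgn * x

print(solve_soft_shrink(1.0, 0.5, 1.75))   # approx. 0.49 for these illustrative values
```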
Fig. 3. The images used as measured data $f$ for the deblurring problem. From left to right, the original image $u_{\mathrm{orig}}$, the blurred image without noise ($f_1$), and the noisy blurred image ($f_2$) used for tests 1 and 2, respectively, are depicted.
u(0, x) = u0 (x),
u(t, 0) = u(t, 1) = 0.
With K we denote the operator which maps the initial condition onto the solution of
the above equation at time T . The problem of finding the initial condition u0 from
the measurement of the heat distribution f at time T is thus formulated as solving
Ku0 = f.
$\displaystyle \min_{u}\ \sum_{j=1}^{M} \frac{\bigl((Ku)_j - f_j\bigr)^2}{2} + \sum_{k=1}^{N} w_k |u_k|^p.$
To test the algorithm, we created an initial distribution $u_0$ with one spike. The data $f = Ku_0 + \delta$ are degraded with a relative error of 15% (see Figure 5).
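A simple stand-in for the operator $K$ is an implicit Euler discretization of the heat equation (assuming the standard form $u_t = u_{xx}$) on a uniform grid with homogeneous Dirichlet boundary values; the grid size, time step, and spike below are illustrative choices of ours.

```python
import numpy as np

def heat_operator(N=100, T=0.002, steps=50):
    """Matrix representation of K: initial value -> solution of the 1-D heat
    equation at time T with homogeneous Dirichlet boundary values, using
    finite differences and implicit Euler steps (a simple stand-in)."""
    h = 1.0 / (N + 1)                               # interior grid points x_1, ..., x_N
    dt = T / steps
    L = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
         + np.diag(np.ones(N - 1), -1)) / h ** 2    # discrete Laplacian
    step = np.linalg.inv(np.eye(N) - dt * L)        # one implicit Euler step
    return np.linalg.matrix_power(step, steps)

K = heat_operator()
u0 = np.zeros(100)
u0[30] = 10.0          # spike initial heat distribution (illustrative)
f = K @ u0             # smoothed data at time T = 0.002, cf. Figure 5
```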
Fig. 4. The results of the deblurring tests. In the left column, the reconstruction $u$ after 500 iterations with the iterated hard shrinkage method is shown, while, in the middle column, the reconstruction $u$ after the same number of iterations with the iterated soft shrinkage procedure is depicted. On the right-hand side, a comparison of the descent of the functional values $\Psi(u^n)$ for both methods is shown. The solid and dashed lines represent the values of $\Psi$ for the iterated hard and soft shrinkage procedures, respectively.
We solved the above minimization problem with the iterated hard shrinkage al-
gorithm for wk = 0.03 and different values of p, namely,
p1 = 1, p2 = 1.01, p3 = 1.5.
As worked out in the appendix, the values $D_n$ as defined in (11) can be used as a stopping criterion, and in this experiment we stopped the iteration if $D_n$ became smaller than $10^{-8}$ or the maximum number of 1000 iterations was reached. Figure 6 shows the results of the minimization process together with the estimators $D_n$. Note that the minimizers for $p = 1$ and $p = 1.01$ do not differ too much, although the estimator $D_n$ behaves very differently: for $p = 1$ it oscillates heavily and decays slowly, as the theory indicates. The slight change from $p = 1$ to $p = 1.01$ results in an estimator which is still oscillating but vanishing much faster, and the algorithm stopped after 136 iterations. For $p = 1.5$ the sparsity of the reconstruction is lost, but the algorithm terminated after just 29 iterations.
Fig. 5. The data of the second model problem. From left to right: The spike initial heat
distribution u and the heat distribution at time T = .002. (Note the different scaling.)
Fig. 6. The results of the reconstruction of the initial condition. Top row from left to right:
Reconstruction for p = 1, p = 1.01, and p = 1.5, respectively. Bottom row: The values of the
estimator Dn in a logarithmic scale. (Note again different scaling.)
The true solution u0 is given by small plateaus, and hence the data f δ = Ku0 + δ are
a noisy function with steep linear ramps. We used N = 1000 and a relative error of
5%. The minimization problem reads as
$\displaystyle \min_{u}\ \sum_{i=1}^{N} \frac{\bigl((Ku)_i - f_i^\delta\bigr)^2}{2} + \sum_{k=1}^{N} w_k |u_k|,$
which was solved with the iterative soft and hard shrinkage methods for the parameter $w_k = 0.002$. The results after 1000 iterations are shown in Figure 7, while Figure 8 shows the decrease of the functional values $\Psi(u^n)$ for both methods.
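The surviving text of this section does not spell out the discretization of the integration operator; a plain rectangle-rule version, together with illustrative plateau data and 5% relative noise, could look as follows.

```python
import numpy as np

# Discretization of the integration operator (Ku)(t) = integral_0^t u(s) ds
# on a uniform grid with N points; one possible setup, not the paper's code.
N = 1000
h = 1.0 / N
K = h * np.tril(np.ones((N, N)))        # lower-triangular cumulative-sum matrix

u0 = np.zeros(N)
u0[200:220] = 50.0                      # small plateaus (illustrative values)
u0[600:620] = -50.0
f = K @ u0                              # piecewise-linear ramps
rng = np.random.default_rng(1)
noise = rng.standard_normal(N)
f_delta = f + 0.05 * np.linalg.norm(f) / np.sqrt(N) * noise   # ~5% relative error
```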
Fig. 7. Top left: The true solution $u_0$; top right: The noisy data $f^\delta = Ku_0 + \delta$. Bottom left: The result of the iterative soft shrinkage; bottom right: The result of the iterative hard shrinkage.
Fig. 8. A comparison of the descent of the functional values Ψ(un ) for both methods is shown.
The solid and dashed lines represent the values of Ψ for the iterated hard and soft shrinkage proce-
dures, respectively.
convergence rates. Unfortunately, the case of total variation deblurring $\Phi(u) = |u|_{TV}$ seems not to fit into this context, and further analysis is needed (while the case of the
for n → ∞.
Proof. First observe that the descent property of Lemma 4 implies convergence
of the functional values Ψ(un ), especially Ψ(un ) − Ψ(un+1 ) → 0. In Lemma 4, the
estimate
$\displaystyle \Psi(u^{n+1}) - \Psi(u^n) \le \begin{cases} -\dfrac{D_n}{2} & \text{if } D_n \ge \|K(v^n - u^n)\|^2, \\[6pt] -\dfrac{D_n^2}{2\|K(v^n - u^n)\|^2} & \text{if } D_n \le \|K(v^n - u^n)\|^2, \end{cases}$
which is slightly different from (14), is also proven. Remember that Condition 1 and
the descent property imply v n − un ≤ C (cf. the proof of Lemma 5), and thus one
can deduce
" √ 1/2 #
Dn ≤ max 2 Ψ(un ) − Ψ(un+1 ) , 2CK Ψ(un ) − Ψ(un+1 ) ,
which proves the assertion, since Dn ≥ 0 and the right-hand side obviously converges
to 0 as n → ∞.
can equivalently be expressed by $-K^*(Ku - f) \in \partial\Phi(v)$, which is, in turn, with the help of the Fenchel identity, equivalent to
$\displaystyle \langle -K^*(Ku - f), v\rangle - \Phi(v) = \Phi^*\bigl(-K^*(Ku - f)\bigr).$
$\displaystyle \sup_n\ \bigl(\langle v^n, v^*\rangle - \Phi(v^n)\bigr) = \infty,$
which is a contradiction.
It follows that $\Phi^*$ is finite everywhere and hence continuous (see, e.g., [20]). So additionally, $\Phi^*(-K^*(Ku^n - f)) \to \Phi^*(-K^*(Ku^* - f))$, and the continuity of the scalar product leads to
$\displaystyle \lim_{n\to\infty} D_n = \langle K^*(Ku^* - f), u^*\rangle + \Phi(u^*) + \Phi^*\bigl(-K^*(Ku^* - f)\bigr) = 0.$
Remark 12. Let us show how the error bound D can be computed in the modified
problem (21).
We first state the conjugate functionals Φ∗ associated with the penalty functionals
Φ. With some calculation one obtains for the conjugates of ϕp as defined in (3) that
$\displaystyle \varphi_p^*(x^*) = \begin{cases} (p-1)\,\Bigl|\dfrac{x^*}{p}\Bigr|^{p'} & \text{if } |x^*| \le p S_0^{p-1}, \\[8pt] \dfrac{S_0^{2-p}(x^*)^2}{2p} + \Bigl(\dfrac{p}{2} - 1\Bigr) S_0^p & \text{if } |x^*| > p S_0^{p-1}, \end{cases}$
where $p'$ denotes the dual exponent, i.e., $\frac{1}{p} + \frac{1}{p'} = 1$, and we formally define that $\frac{1}{\infty} = 0$.
with the $\varphi_p^*$ as above. Eventually, the error bound at some $u \in \ell^2$ can be expressed by
$\displaystyle D(u) = \sum_{k=1}^{\infty} \Bigl[\bigl(K^*(Ku - f)\bigr)_k u_k + w_k \varphi_p(u_k) + w_k \varphi_p^*\Bigl(\frac{-\bigl(K^*(Ku - f)\bigr)_k}{w_k}\Bigr)\Bigr].$
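Putting the formulas of this remark together, the error bound $D(u)$ can be evaluated directly; the sketch below assumes a matrix $K$ and repeats the (reconstructed) definitions of $\varphi_p$ and $\varphi_p^*$, with all names ours.

```python
import numpy as np

def phi_p(t, p, S0):
    """Penalty phi_p from (3): |t|^p on [-S0, S0], C^1 quadratic extension outside."""
    t = np.asarray(t, dtype=float)
    outer = 0.5 * p * S0 ** (p - 2.0) * t ** 2 + (1.0 - p / 2.0) * S0 ** p
    return np.where(np.abs(t) <= S0, np.abs(t) ** p, outer)

def phi_p_conj(x, p, S0):
    """Conjugate phi_p^* as given above; for p = 1 the first branch equals 0."""
    x = np.asarray(x, dtype=float)
    thresh = p * S0 ** (p - 1.0)
    outer = S0 ** (2.0 - p) * x ** 2 / (2.0 * p) + (p / 2.0 - 1.0) * S0 ** p
    if p == 1.0:
        inner = np.zeros_like(x)
    else:
        inner = (p - 1.0) * np.abs(x / p) ** (p / (p - 1.0))   # dual exponent p'
    return np.where(np.abs(x) <= thresh, inner, outer)

def error_bound_D(u, K, f, w, p, S0):
    """A posteriori error bound D(u) >= Psi~(u) - min Psi~, cf. the formula above."""
    a = K.T @ (K @ u - f)                                      # K^*(Ku - f)
    return float(np.sum(a * u + w * phi_p(u, p, S0) + w * phi_p_conj(-a / w, p, S0)))
```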
REFERENCES
[17] J.-J. Fuchs and B. Delyon, Minimal L1 -norm reconstruction function for oversampled sig-
nals: Applications to time-delay estimation, IEEE Trans. Inform. Theory, 46 (2000),
pp. 1666–1673.
[18] R. Gribonval and M. Nielsen, Sparse representation in unions of bases, IEEE Trans. Inform.
Theory, 49 (2003), pp. 3320–3325.
[19] B. D. Rao, Signal processing with the sparseness constraint, in Proceedings of the International
Conference on Acoustics, Speech and Signal Processing 1, Seattle, WA, 1998, pp. 369–372.
[20] R. E. Showalter, Monotone operators in Banach space and nonlinear partial differential
equations, Math. Surveys Monogr. 49, AMS, Providence, RI, 1997.
[21] J.-L. Starck, E. J. Candès, and D. L. Donoho, Astronomical image representation by the
curvelet transform, Astron. Astrophys., 398 (2003), pp. 785–800.
[22] J.-L. Starck, M. K. Nguyen, and F. Murtagh, Wavelets and curvelets for image deconvo-
lution: A combined approach, Signal Processing, 83 (2003), pp. 2279–2283.
[23] G. Teschke, Multi-frames in Thresholding Iterations for Nonlinear Operator Equations with
Mixed Sparsity Constraints, Technical report, Preprint series of the DFG priority program
1114 “Mathematical methods for time series analysis and digital image processing,” 2005.
[24] J. A. Tropp, Just relax: Convex programming methods for identifying sparse signals in noise,
IEEE Trans. Inform. Theory, 52 (2006), pp. 1030–1051.
[25] C. R. Vogel and M. E. Oman, Iterative methods for total variation denoising, SIAM J. Sci.
Comput., 17 (1996), pp. 227–238.