Denoising of Image Gradients and Total Generalized Variation Denoising
Abstract We revisit total variation denoising and study an augmented model where we assume that an estimate of the image gradient is available. We show that this increases the image reconstruction quality and derive that the resulting model resembles the total generalized variation denoising method, thus providing a new motivation for this model. Further, we propose to use a constrained denoising model and develop a variational denoising model that is basically parameter free, i.e. all model parameters are estimated directly from the noisy image. Moreover, we use Chambolle-Pock's primal-dual method as well as the Douglas-Rachford method for the new models. For the latter, one has to solve large discretizations of partial differential equations. We propose to do this in an inexact manner using the preconditioned conjugate gradient method and derive preconditioners for this. Numerical experiments show that the resulting method has good denoising properties and also that preconditioning increases convergence speed significantly. Finally, we analyze the duality gap of different formulations of the TGV denoising problem and derive a simple stopping criterion.

Keywords image denoising, gradient estimate, total generalized variation, Douglas-Rachford method, preconditioning

This material was based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Birgit Komander
Institute of Analysis and Algebra, TU Braunschweig, 38092 Braunschweig, Germany, E-mail: b.komander@tu-braunschweig.de

D. A. Lorenz
Institute of Analysis and Algebra, TU Braunschweig, 38092 Braunschweig, Germany, E-mail: d.lorenz@tu-braunschweig.de

Lena Vestweber
Institut Computational Mathematics, AG Numerik, TU Braunschweig, 38092 Braunschweig, Germany, E-mail: l.vestweber@tu-braunschweig.de

1 Introduction

In this work we revisit variational denoising of images with total variation penalties, dating back to the classical Rudin-Osher-Fatemi total variation denoising method [23]. We start by augmenting the model with an estimate of the image gradient and analyze how this helps for image denoising. This is related to the method of first estimating image normals and then using this estimate for a better image denoising, an approach proposed by Lysaker et al. [20]. As we will see, a combined approach, which tries to estimate the gradient of the denoised image and the denoised image itself simultaneously, is very close to the successful total generalized variation denoising from Bredies et al. [4]. A brief introduction to this idea was already given in [15]. Further, we will propose different (in some sense equivalent) versions of the total generalized variation denoising method (one of these, CTGV, already introduced in [15]), which have several advantages over the classical one: First, we are going to work with constraints instead of penalties, which, in some cases, allows for a simple, clean and effective parameter choice. Second, different formulations of these problems lead to different dual problems and hence to different algorithms, and some of these turn out to be a little simpler regarding duality gaps and stopping. Moreover, the different models show slightly different numerical performance. Finally, we will make use of the Douglas-Rachford method to solve these minimization problems. This involves the solution of large discretizations of linear partial differential equations, and we will develop simple and effective preconditioners for these equations. In contrast to [6], where the authors use classical linear splitting methods for the inexact solution of the linear equations, we propose to use a few iterations of the preconditioned conjugate gradient method.

The paper is organized as follows. Section 2 motivates denoising of gradients as a means to improve total variation denoising and derives several new variational methods. Then, section 3 investigates the corresponding duality gaps of the problems and section 4 deals with the numerical treatment, especially with the Douglas-Rachford method and efficient preconditioners for the respective linear subproblems. Section 5 reports numerical experiments and section 6 draws some conclusions.

2.1 Denoising with prior knowledge on the gradient

Consider the image model u0 = u† + η, where u0 is the given noisy image, u† is the ground truth, i.e. the noise-free image, and η is additive Gaussian white noise. In the situation of images, there are methods to obtain a reasonable estimate of the amount of noise, i.e. an estimate of ‖u† − u0‖_2 = ‖η‖_2 is available. One can use, for example, the techniques from [17, 18] to estimate the noise level of Gaussian white noise quite accurately from a single image. Using this information, it seems natural to consider images u that fulfill

‖u − u0‖_2 ≤ ‖u† − u0‖_2 = ‖η‖_2 =: δ1.
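This setting is easy to sketch numerically. The following toy example (our own illustration, with a synthetic image and an arbitrary noise level, not code from the paper) checks that the ground truth u† is always feasible for the constraint with δ1 = ‖η‖_2, and in fact attains it with equality:

```python
import numpy as np

# Illustration of the additive noise model u0 = u_true + eta and the
# constraint ||u - u0||_2 <= delta_1 := ||eta||_2.  Image and noise level
# are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
u_true = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))  # synthetic "image"
eta = 0.1 * rng.standard_normal(u_true.shape)          # Gaussian white noise
u0 = u_true + eta                                      # observed noisy image

delta1 = np.linalg.norm(eta)                           # noise level ||eta||_2
# the ground truth satisfies the constraint with equality:
print(np.isclose(np.linalg.norm(u_true - u0), delta1))  # True
```

In practice δ1 is of course not known exactly and has to be estimated from u0, as discussed in section 2.4.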
…if ‖u† − u0 + c‖_2 ≤ ‖u† − u0‖_2. We expand the left-hand side and get, writing |Ω| for the measure of Ω,

‖u† − u0‖_2² + 2c ∫_Ω (u† − u0) + c² |Ω| ≤ ‖u† − u0‖_2².

Since the middle integral vanishes by assumption, we see that c = 0. ⊓⊔

The next lemma shows that v = ∇u† is also necessary for perfect reconstruction.

Lemma 2 If δ1 = ‖u† − u0‖_2, ∫_Ω u† = ∫_Ω u0 and u† solves (3), then v = ∇u†.

Proof Let u† ∈ argmin_u ‖|∇u − v|‖_1 s.t. ‖u − u0‖_2 ≤ ‖u† − u0‖_2. We denote by I_C(x) the indicator function of a set C, and set K = ∇, F(u) = I_{‖·−u0‖_2 ≤ ‖u†−u0‖_2}(u), G(q) = ‖|q − v|‖_1 (i.e. the Fenchel conjugate of G is G*(φ) = ∫_Ω vφ dx + I_{‖|·|‖_∞ ≤ 1}(φ)). The characterization of optimality by the Fenchel-Rockafellar optimality system [12, Remark 4.2] shows that (u*, φ*) is a primal-dual optimal pair if and only if

0 ∈ K*φ* + ∂F(u*),
0 ∈ −Ku* + ∂G*(φ*),

which amounts to the inclusions

0 ∈ K*φ* + ∂I_{‖·−u0‖_2 ≤ ‖u†−u0‖_2}(u*),
0 ∈ −Ku* + v + ∂I_{‖|·|‖_∞ ≤ 1}(φ*).

Remark 3 If the condition ∫_Ω u† = ∫_Ω u0 in Lemma 1 does not hold, but ∫_Ω (u† − u0) = ε, then the proof of that lemma still shows that all solutions of (3) are of the form u† + c with |c − ε/|Ω|| ≤ ε/|Ω|.

If v ≠ ∇u†, then any solution ũ of (3) will usually be different from u†, although it will fulfill the trivial estimate

‖ũ − u†‖_2 ≤ ‖ũ − u0‖_2 + ‖u0 − u†‖_2 ≤ 2‖u0 − u†‖_2 = 2δ1.

However, the following lemma shows that ũ → u† for v → ∇u† (for constant noise level δ1):

Lemma 4 Assume Ω is convex, u† and u0 fulfill ∫_Ω u† = ∫_Ω u0, let v fulfill ‖|v − ∇u†|‖_1 ≤ ε and assume that there exists a solution ũ of (3) with δ1 = ‖u† − u0‖_2 for which the constraint is active (i.e. ‖ũ − u0‖_2 = δ1). Then there exists another solution ū of (3) with δ1 = ‖u† − u0‖_2 that fulfills ∫_Ω ū = ∫_Ω u† and moreover, it holds that

‖ū − u†‖_2 ≤ diam(Ω) ε,

where diam(Ω) denotes the diameter of Ω.

Proof To obtain ū we consider ū = ũ + c for a suitable constant c. The equality ∫_Ω ū = ∫_Ω u† is achieved for c = ∫_Ω (u† − ũ)/|Ω|. Since ∇ū = ∇ũ holds, ū is optimal for (3) as soon as it is feasible. To check feasibility we calculate

‖ū − u0‖_2² = ‖ũ − u0 + c‖_2² = ‖ũ − u0‖_2² + 2c ∫_Ω (ũ − u0) + c² |Ω|.
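The constant-shift construction in the proof above can be illustrated with a small sketch (illustrative values only; not code from the paper): shifting an estimate by the mean difference matches the mean of the ground truth while leaving the discrete gradient untouched.

```python
import numpy as np

# Sketch of the correction step: u_bar = u_tilde + c with
# c = (1/|Omega|) * integral of (u_true - u_tilde).
rng = np.random.default_rng(1)
u_true = rng.standard_normal((32, 32))
u_tilde = u_true + 0.05 * rng.standard_normal((32, 32)) + 0.3  # biased estimate

c = np.mean(u_true - u_tilde)      # discrete analogue of the constant c
u_bar = u_tilde + c                # constant shift

assert np.isclose(u_bar.mean(), u_true.mean())
# the (discrete) gradients of u_bar and u_tilde coincide
assert np.allclose(np.diff(u_bar, axis=0), np.diff(u_tilde, axis=0))
assert np.allclose(np.diff(u_bar, axis=1), np.diff(u_tilde, axis=1))
```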
2.2 Denoising of image gradients

The previous lemma shows that any approximation v of the true gradient ∇u† is helpful for total variation denoising according to (3). In order to determine such a v, one way is to denoise the gradient of the input image, specifically by a variational method with a smoothness penalty for the gradient and some discrepancy term. Naturally, a norm of the derivative of the gradient can be used. A first candidate could be the Jacobian of the gradient, i.e.

J(∇u) = ( ∂1(∂1 u)  ∂2(∂1 u); ∂1(∂2 u)  ∂2(∂2 u) ),

which amounts to the Hessian of u. Thus, the matrix is symmetric as soon as u is twice continuously differentiable. However, notice that the Jacobian of an arbitrary vector field is not necessarily symmetric and hence using ‖J(v)‖ as smoothness penalty seems unnatural. Instead, we could use the symmetrized Jacobian

E(v) = ( ∂1 v1  ½(∂1 v2 + ∂2 v1); ½(∂1 v2 + ∂2 v1)  ∂2 v2 ),

where v1 and v2 are the components of v. Note that for twice differentiable u we have

E(∇u) = J(∇u) = Hess(u),

i.e. in both cases we obtain the Hessian of u. Imitating the TV-seminorm (and also following the idea of total generalized variation), we take F(v) = ‖|E v|‖_1.

Similar to the constraint in (3), the denoised gradient should not differ more from the true gradient than ∇u0; thus, we consider the minimization problem with a constraint

min_v ‖|E(v)|‖_1  s.t.  ‖|∇u0 − v|‖_1 ≤ δ2.

The parameter δ2 can be chosen as follows: If we set δ2 := c‖|∇u0|‖_1, then c = 1 would allow the trivial minimizer v = 0, while any c < 1 will enforce some structure of ∇u0 onto the minimizer, and smaller c leads to less denoising.

Putting the pieces together, we arrive at a two-stage denoising method:

1. Choose 0 < c < 1 and calculate a denoised gradient by solving
   v̂ ∈ argmin_v ‖|E(v)|‖_1  s.t.  ‖|∇u0 − v|‖_1 ≤ c‖|∇u0|‖_1 =: δ2.   (4)
2. Denoise u0 by solving
   û ∈ argmin_u ‖|∇u − v̂|‖_1  s.t.  ‖u − u0‖_2 ≤ ‖η‖_2 =: δ1.   (5)

Instead of using the constrained formulation of the first problem, we can also use a penalized formulation. Thus, the gradient denoising problem reads:

1. Choose α > 0 and calculate a denoised gradient by solving
   v̂ = argmin_v ‖|∇u0 − v|‖_1 + α‖|E(v)|‖_1.   (6)
2. Denoise u0 by solving (5).

Since we use a denoised gradient prior to applying total variation denoising, we term the method (4) and (5) Denoised Gradient Total Variation (DGTV). Due to the similarity with total generalized variation, we call the second method, based on (6) and (5), Denoised Gradient Total Generalized Variation (DGTGV).

2.3 Constrained and Morozov total generalized variation

Both two-stage denoising methods DGTV and DGTGV for the gradient resemble previously known methods: The latter is related to total generalized variation (TGV) [4] while the former to constrained total generalized variation (CTGV) [15]. The TGV of second order, defined in [4], has been shown in [5, 4] to be equal to

TGV²_(α0,α1)(u) = min_v α1‖|∇u − v|‖_1 + α0‖|E(v)|‖_1,   (7)

while CTGV from [15] is defined as

CTGV_δ(u) = min_v { ‖|E(v)|‖_1  s.t.  ‖|∇u − v|‖_1 ≤ δ }.   (8)

Considering ∇u to be the given data in these problems, one could say, following the notion from [19], that TGV is a Tikhonov-type estimation of ∇u while CTGV is a Morozov-type estimation of ∇u.

Now, combining the two steps of DGTGV into one optimization problem, in which the image as well as the gradient are updated simultaneously, we get

min_u { TGV²_(α,1)(u)  s.t.  ‖u − u0‖_2 ≤ δ1 }
  = min_{u,v} { ‖|∇u − v|‖_1 + α‖|E(v)|‖_1  s.t.  ‖u − u0‖_2 ≤ δ1 }.   (9)

This formulation is a Morozov-type formulation of the TGV problem

min_u ½‖u − u0‖_2² + TGV²_(α0,α1)(u)   (10)

and is thus in the following referred to as MTGV.
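The identity E(∇u) = J(∇u) from section 2.2 can be checked on a discrete toy example. The following sketch uses our own forward-difference discretization (not the paper's code): for a gradient field the mixed Jacobian entries agree, so the symmetrization in E changes nothing.

```python
import numpy as np

# Forward differences with a zero (Neumann-type) last row/column.
def d1(u):  # derivative in the first (row) direction
    out = np.zeros_like(u); out[:-1, :] = u[1:, :] - u[:-1, :]; return out

def d2(u):  # derivative in the second (column) direction
    out = np.zeros_like(u); out[:, :-1] = u[:, 1:] - u[:, :-1]; return out

u = np.add.outer(np.arange(16.0) ** 2, np.arange(16.0) ** 2)  # smooth test image
v1, v2 = d1(u), d2(u)                                         # v = grad(u)

# mixed Jacobian entries of v and the symmetrized off-diagonal entry of E(v)
j12, j21 = d2(v1), d1(v2)
e12 = 0.5 * (d1(v2) + d2(v1))

# for a gradient field the Jacobian is symmetric, hence E(grad u) = J(grad u)
assert np.allclose(j12, j21)
assert np.allclose(e12, j12)
```

For an arbitrary vector field v (not a gradient), `j12` and `j21` would differ, which is exactly why the symmetrized Jacobian is the natural penalty.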
Another approach to form a combined optimization problem out of DGTGV is to preserve both constraints, i.e. taking (4) and (5) to obtain

min_{u,v} ‖|E(v)|‖_1  s.t.  ‖u − u0‖_2 ≤ δ1,  ‖|∇u − v|‖_1 ≤ δ2.   (11)

In both problem formulations δ1 again is the noise level. Using (8) we see that problem (11) becomes

min_u CTGV_{δ2}(u)  s.t.  ‖u − u0‖_2 ≤ δ1.   (12)

Obviously the TGV, MTGV and CTGV denoising problems are implicitly equivalent in the sense that knowing the solution to one of the problems allows to calculate the respective parameters of the other problems such that the solution stays the same (cf. [19, Theorem 2.3] for a general result and [15, Lemma 1, Lemma 2] for a result in the case of CTGV and TGV).

2.4 Parameter choice

A few words on parameter choice for all methods are in order. Frequently, a constraint ‖u − u0‖_2 ≤ δ1 appears in the problem and the parameter δ1 has a large influence on the denoising result. A natural choice is to adapt the parameter to the noise level, i.e. δ1 := ‖η‖_2. For a given discrete image, this number can be estimated as follows: Under the assumption that the noise is additive Gaussian white noise, the methods from [17, 18] allow to estimate the standard deviation σ of the noise η. Then, it is well known that the 2-norm of η is well estimated by ‖η‖_2 ≈ σ√k, where k is the number of pixels.

The second parameter in DGTV is δ2 = c‖|∇u0|‖_1, the constraint parameter in (4). Setting c = 1 leads to v = 0 as a feasible and also optimal solution; hence, the gradient we insert into the second optimization problem is zero and the second problem becomes pure total variation denoising without any additional information. Therefore, c ∈ (0,1) is a reasonable choice. Experiments showed (cf. section 5) that a lot of gradient denoising, i.e. smoothing of ∇u0, leads to good reconstructions of the image in the second step. Thus, we set c ≈ 0.99.

The method DGTGV, the penalized variant of the two-stage method, includes the parameter α, the penalization parameter in (6). Again, experiments showed that α = 1 leads to a good image reconstruction (cf. section 5), independent of the noise level, the image size or the image type.

Numerical experiments on the performance of the two-stage methods with regard to image quality and computational speed can be found in section 5.

Also the problems CTGV-, TGV-, and MTGV-denoising come with two parameters each that have to be chosen. For CTGV- and MTGV-denoising the choice of δ1 is the same as above. For the parameter α in MTGV-denoising (9) we have experimental evidence that hints that α ≈ 2 is a good universal parameter (cf. section 5). This is in line with the usual recommendation that α1 = 2α0 is a good choice for TGV denoising from (10), cf. [13]. For the remaining parameter δ2 for CTGV there is the following heuristic from [15]: We denote by uTV the TV-denoised image with δ1 chosen according to the noise level estimate from Section 2.4 and set δ2 = c‖|∇uTV|‖_1 with 0 < c < 1. However, CTGV will not be included in the experiments in section 5; the main reason is that this parameter choice leads to inferior results compared to MTGV.

3 Duality gaps

Besides the choice of the problem parameters, which control the denoising, one has to choose a reasonable stopping criterion. Since all considered problems are convex, the duality gap is a natural candidate. Fenchel-Rockafellar duality [12] states that, under appropriate conditions, the primal problem min_x F(x) + G(Kx) has the dual problem max_y −F*(−K*y) − G*(y), that the duality gap, defined as

gap(x, y) = F(x) + G(Kx) + F*(−K*y) + G*(y),

is an upper bound on the difference from the current objective value to the optimal one, i.e. gap(x,y) ≥ F(x) + G(Kx) − F(x*) − G(Kx*), where x* is a solution of the primal problem, and that the duality gap vanishes exactly at primal-dual optimal pairs.

For the MTGV denoising problem (9) we have the gap function

gap_MTGV(u,v,p,q) = ‖|∇u − v|‖_1 + α‖|E(v)|‖_1 + I_{‖·−u0‖_2 ≤ δ1}(u)
  + δ1‖∇*p‖_2 − ⟨p, ∇u0⟩ + I_{0}(p − E*q)
  + I_{‖|·|‖_∞ ≤ α}(p) + I_{‖|·|‖_∞ ≤ 1}(q).   (13)

As also noted in [6] (for the TGV denoising problem (10)), this gap is not helpful: The problem is that the indicator function I_{0}(p − E*q) is usually not finite, as p = E*q is usually not fulfilled. To circumvent this problem, one could simply replace p by E*q in the gap function. If we do this, we obtain a gap that only depends on q:

gap_MTGV(u,v,q) = ‖|∇u − v|‖_1 + α‖|E(v)|‖_1 + I_{‖·−u0‖_2 ≤ δ1}(u)
  + δ1‖∇*E*q‖_2 − ⟨E*q, ∇u0⟩
  + I_{‖|·|‖_∞ ≤ α}(E*q) + I_{‖|·|‖_∞ ≤ 1}(q).   (14)
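The noise-level estimate ‖η‖_2 ≈ σ√k from section 2.4 above is easy to verify numerically. The following snippet (illustrative only; σ and image size are arbitrary choices) draws Gaussian white noise and compares the two quantities:

```python
import numpy as np

# Check that ||eta||_2 is well approximated by sigma * sqrt(k)
# for Gaussian white noise on k pixels.
rng = np.random.default_rng(2)
sigma, shape = 0.1, (256, 256)
k = shape[0] * shape[1]

eta = sigma * rng.standard_normal(shape)
print(np.linalg.norm(eta) / (sigma * np.sqrt(k)))  # close to 1.0
```

The relative fluctuation of the ratio shrinks like 1/√(2k), so for typical image sizes the estimate is very tight.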
We still have the problem that I_{‖|·|‖_∞ ≤ α}(E*q) might be infinite. So, replacing q and p in (13) by

q̃ := q / max(1, ‖|E*q|‖_∞ / α),
p̃ := E*q / max(1, ‖|E*q|‖_∞ / α) = E*q̃,

we obtain a finite gap function which is a valid stopping criterion for the problem:

Lemma 5 Let Φ(u,v) := ‖|∇u − v|‖_1 + α‖|E(v)|‖_1 + I_{‖·−u0‖_2 ≤ δ1}(u) be the primal functional of the MTGV denoising problem and (u*, v*) be a solution of min_{u,v} Φ(u,v). Then the error of the primal energy can be estimated by

Φ(u,v) − Φ(u*,v*) ≤ ‖|∇u − v|‖_1 + α‖|E(v)|‖_1 + I_{‖·−u0‖_2 ≤ δ1}(u) + δ1‖∇*(E*q̃)‖_2 − ⟨E*q̃, ∇u0⟩ + I_{‖|·|‖_∞ ≤ 1}(q̃),

where q̃ := q / max(1, ‖|E*q|‖_∞ / α) and q is any feasible dual variable.

Proof The estimate is clear since any dual value is smaller than or equal to any primal value. So we can change the dual variables in the gap as we like. ⊓⊔

Corollary 6 Let Φ(u,v) := α1‖|∇u − v|‖_1 + α0‖|E(v)|‖_1 + ½‖u − u0‖_2² be the primal functional of the TGV denoising problem and (u*, v*) be a solution of min_{u,v} Φ(u,v). Then the error of the primal energy can be estimated by

Φ(u,v) − Φ(u*,v*) ≤ α1‖|∇u − v|‖_1 + α0‖|E(v)|‖_1 + ½‖u − u0‖_2² + ½‖∇*(E*q̃)‖_2² − ⟨E*q̃, ∇u0⟩ + I_{‖|·|‖_∞ ≤ α1}(q̃),

where q̃ := q / max(1, ‖|E*q|‖_∞ / α). Moreover, if (uⁿ, vⁿ, qⁿ) converge to a primal-dual solution, the upper bound converges to zero.

For this problem, the gap function is

gap_MTGV(u,w,q) = ‖|w|‖_1 + α‖|E(∇u − w)|‖_1 + I_{‖·−u0‖_2 ≤ δ1}(u) + δ1‖∇*(E*q)‖_2 − ⟨E*q, ∇u0⟩ + I_{‖|·|‖_∞ ≤ α}(E*q) + I_{‖|·|‖_∞ ≤ 1}(q),   (16)

which is exactly the same as (14). As above, the indicator function I_{‖|·|‖_∞ ≤ α}(E*q) might be infinite and therefore q should be replaced by q̃ from above. So replacing p by E*q in (13) results in the gap function of (15). Both gap functions can be used as valid stopping criteria for both problem formulations.

This method for the stopping criterion applies to general problems of the form

min_{u,v} F(u) + G(Av) + H(Bu − v),   (17)

where A and B are linear (and standard regularity conditions hold, implying that Fenchel-Rockafellar duality is fulfilled). The dual problem is

max_{p,q} −F*(−B*p) − H*(p) − G*(q)  s.t.  p = A*q.

If we replace p by A*q, we obtain

max_q −F*(−B*A*q) − H*(A*q) − G*(q).

4 Numerics

In this section we describe methods to solve the convex optimization problems related to the various denoising methods from the previous sections. We will work with standard discretizations of the images and the derivative operators, but state them for the sake of completeness in Appendix A.1. In this section we focus on the optimization methods.

4.1 Douglas-Rachford's method

The Douglas-Rachford algorithm (see [10] and [16]) is a splitting algorithm to solve monotone inclusions of the form 0 ∈ A(x) + B(x), which requires only the resolvents R_tA = (I + tA)⁻¹ and R_tB = (I + tB)⁻¹, but not the resolvent of the sum A + B. One way to write down the Douglas-Rachford iteration is as a fixed point iteration

z^k = F(z^{k−1}),  with  F(z) = z + R_tA(2R_tB(z) − z) − R_tB(z)

for some step-size t > 0. It is possible to employ relaxation for the Douglas-Rachford iteration as

z^k = z^{k−1} + ρ(F(z^{k−1}) − z^{k−1}),   (18)

where 1 < ρ < 2 is overrelaxation and 0 < ρ < 1 is underrelaxation. For ρ in the whole range from 0 to 2 and any t > 0 it holds that the iteration converges to some fixed point z of F such that R_tB(z) is a zero of A + B, see, e.g., [11].

The Douglas-Rachford method can be used to solve the saddle point problem

min_x max_y ⟨y, Kx⟩ + F(x) − G*(y).   (19)

To that end, in [22] the authors propose to use the splitting

B(x,y) = ( ∂F(x); ∂G*(y) ),  A(x,y) = ( 0  K*; −K  0 )( x; y ).   (20)

If A is a matrix, the resolvent R_tA simply is the matrix inverse (I + tA)⁻¹. The resolvent of B is given by the proximal mappings of F and G*, namely

R_tB(x,y) = ( (I + t∂F)⁻¹(x); (I + t∂G*)⁻¹(y) ) = ( prox_{tF}(x); prox_{tG*}(y) ).

For further flexibility, it is proposed in [22] to rescale the problem by replacing G with G̃(y) := G(y/β) and K with K̃ := βK. Then one can replace the dual variable by ỹ^k := βy^k and finally use prox_{tG̃*}(y) = (1/β) prox_{tβ²G*}(βy) to obtain a second stepsize s := β²t, which can be chosen independently from the stepsize t. In total, the resolvents then become

R_tB(x,y) = ( prox_{tF}(x); prox_{sG*}(y) ),
R_tA = ( I  tK*; −sK  I )⁻¹.

In each step of Douglas-Rachford we need the inverse of a fairly large block matrix. However, as also noted in [22], this can be done efficiently with the help of the Schur complement. If K is a matrix, K* = Kᵀ and

( I  tKᵀ; −sK  I )⁻¹ = ( 0  0; 0  I ) + ( I; sK ) (I + stKᵀK)⁻¹ ( I  −tKᵀ ).

To use this, we need to solve equations with the positive definite coefficient matrix I + λKᵀK efficiently.

In the following we interpret the linear operators as matrices without changing notation, i.e. the matrix E also stands for the matrix realizing the linear operation E. With the help of the vectorization operation vec, it holds that E · vec(u) = vec(E u), where on the left-hand side E is a matrix and on the right-hand side E is the linear operator from A.1.

As discussed in section 3, we have different possibilities to choose the primal variables (and thus also the dual variables): Namely, we could use the primal variables u and v, as, e.g., in the MTGV problem (9), or the primal variables u and w, as in the corresponding problem (15). This choice does not only influence the dual problem and the duality gap, but also the involved linear map KᵀK. The formulation with u and v as primal variables leads to

KᵀK = ( −div  0; −I  Eᵀ )( ∇  −I; 0  E ) = ( −Δ  div; −∇  I + EᵀE ),   (21)
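The Schur-complement formula for the R_tA step above can be checked numerically. The sketch below (our own check, with a small random K and arbitrary stepsizes s, t) verifies the block-inverse identity:

```python
import numpy as np

# Verify: [[I, t*K.T], [-s*K, I]]^{-1}
#   = [[0, 0], [0, I]] + [[I], [s*K]] @ (I + s*t*K.T@K)^{-1} @ [[I, -t*K.T]]
rng = np.random.default_rng(3)
n, m, s, t = 5, 7, 120.0, 0.1
K = rng.standard_normal((m, n))
In, Im = np.eye(n), np.eye(m)

M = np.block([[In, t * K.T], [-s * K, Im]])
S_inv = np.linalg.inv(In + s * t * K.T @ K)        # inverse Schur complement
M_inv = (np.block([[np.zeros((n, n)), np.zeros((n, m))],
                   [np.zeros((m, n)), Im]])
         + np.vstack([In, s * K]) @ S_inv @ np.hstack([In, -t * K.T]))

assert np.allclose(M @ M_inv, np.eye(n + m))
assert np.allclose(M_inv @ M, np.eye(n + m))
```

Only a system with the smaller positive definite matrix I + stKᵀK has to be solved, which is exactly what the PCG iteration of section 4.2 is used for.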
while the variable change to u and w gives

KᵀK = ( −div(Eᵀ); −Eᵀ )( E(∇)  −E ) = ( Hᵀ; −Eᵀ )( H  −E ) = ( HᵀH  −HᵀE; −EᵀH  EᵀE ),   (22)

where H := E(∇) is the Hessian matrix, cf. Appendix A.2. The same is true for CTGV from (11) and the usual TGV problem (10). For problem (6) we have

KᵀK = EᵀE.   (23)

4.2 Inexact Douglas-Rachford

For the solution of the linear equation with coefficient matrix I + λKᵀK we propose to use the preconditioned conjugate gradient method (PCG). For preconditioning we make several approximations: First we replace complicated discrete difference operators by simpler ones, and then we use a block-diagonal preconditioner with an incomplete Cholesky decomposition in each block. As we will see, with these preconditioners we only need one or two iterations of PCG to obtain good convergence of the Douglas-Rachford iteration. If we only use one iteration of PCG and denote A = (I + stKᵀK), the linear step is approximated by

A⁻¹b ≈ x^k + aM(b − Ax^k),

where the PCG stepsize is given by

a = ⟨b − Ax^k, M(b − Ax^k)⟩ / ⟨AM(b − Ax^k), M(b − Ax^k)⟩

and M is the preconditioner for A, i.e. M⁻¹ ≈ A.

In the following the coefficient matrices and preconditioners are given in detail. With the matrices D1 and D2 representing the derivatives in the first and second direction, we have

∇ = ( D1; D2 ),   E = ( D1  0; ½D2  ½D1; ½D2  ½D1; 0  D2 ),

J = ( D1  0; D2  0; 0  D1; 0  D2 ),   H = ( D1²; D1D2; D1D2; D2² ),

−Δ = ∇ᵀ∇ = D1ᵀD1 + D2ᵀD2,

EᵀE = ( D1ᵀD1 + ½D2ᵀD2   ½D2ᵀD1; ½D1ᵀD2   ½D1ᵀD1 + D2ᵀD2 ),

JᵀJ = ( D1ᵀD1 + D2ᵀD2   0; 0   D1ᵀD1 + D2ᵀD2 ),

HᵀH = (D1²)ᵀD1² + 2(D1D2)ᵀ(D1D2) + (D2²)ᵀD2²,

−EᵀH = ( −D1ᵀD1² − D2ᵀ(D1D2); −D1ᵀ(D1D2) − D2ᵀD2² ).

The operator JᵀJ is in fact the negative (discrete) Laplace operator applied component-wise, also called the vector Laplacian. Note that the operator EᵀE decomposes as

EᵀE = ½ JᵀJ + ½ ( D1ᵀ; D2ᵀ )( D1  D2 ).

Note that the boundary conditions are implicitly contained in the discretization operators and their adjoints (e.g. we use Neumann boundary conditions for the gradient and thus equations −Δu = f always contain Neumann boundary conditions).

Our linear operators are discretizations of continuous differential operators. Hence, the resolvent steps correspond to solutions of certain differential equations. In this context it has been shown to be beneficial to motivate preconditioners for the linear systems by their continuous counterparts, see [21]. Block diagonal preconditioners are a natural choice for these operators. The conjugate gradient method is usually used for linear systems of the form Ax = f, where f is an element of a finite dimensional space X. As discussed in [21], it can also be used in the case where X is infinite dimensional and A : X → X is a symmetric and positive-definite isomorphism. Consider for example the negative Laplace operator A : X = H¹₀(Ω) → X* defined by

⟨Au, v⟩ = ∫_Ω ∇u · ∇v dx,   u, v ∈ X.

The standard weak formulation of the Dirichlet problem is now Ax = f, where f ∈ X*. Since X ≠ X*, the linear operator A maps x out of the space and the conjugate gradient method is not well defined. To overcome this problem, we introduce a preconditioner M, which is a symmetric and positive definite isomorphism mapping X* to X. The preconditioned system MAx = Mf can then be solved by the conjugate gradient method. We consider now the corresponding continuous linear operator from (21):

A_{u,v} = I + stK*K = I + st ( −Δ  −∇*; −∇  I + E*E ).

The operator A_{u,v} is an isomorphism mapping X = H¹(Ω) × (H¹(Ω))² into its dual X* = H¹(Ω)′ × (H¹(Ω)′)². The canonical choice of a preconditioner, in the sense of [21], is therefore given as the block-diagonal operator

M_{u,v} = ( (I − stΔ)⁻¹  0; 0  (I + st(I + J*J))⁻¹ )
(the inverses of the respective operators also exist in the continuous setting, see [2, Theorem 6.6]). To see that A_{u,v} has a bounded inverse, we can check coercivity, i.e. for some k > 0 we have (all intermediate norms and inner products taken in L²(Ω), (L²(Ω))² and (L²(Ω))⁴, respectively)

⟨A_{u,v}(u,v), (u,v)⟩
= ‖u‖² + st‖∇u‖² − 2st⟨∇u, v⟩ + (1+st)‖v‖² + st‖E v‖²
≥ ‖u‖² + st‖∇u‖² − 2st‖∇u‖‖v‖ + (1+st)‖v‖² + st‖E v‖²
≥ ‖u‖² + st‖∇u‖² − 2st( ‖∇u‖²/(2ε) + ε‖v‖²/2 ) + (1+st)‖v‖² + st‖E v‖²
= ‖u‖² + st(1 − 1/ε)‖∇u‖² + (1 + st(1−ε))‖v‖² + st‖E v‖²
≥ k̃( ‖u‖² + ‖∇u‖² ) + k̃( ‖v‖² + ‖E v‖² )
≥ k( ‖u‖²_{H¹(Ω)} + ‖v‖²_{(H¹(Ω))²} ).

The second to last estimate holds for some k̃ > 0 since for all st > 0 we can find ε > 1 such that st(1 − 1/ε) > 0 and 1 + st(1 − ε) > 0; we can choose k̃ as the minimum of both of these constants. The last estimate follows for some k > 0 as a consequence of Korn's inequality [8].

The corresponding continuous linear operator from (23) is

A_v = I + st E*E.

The operator A_v is an isomorphism mapping X = (H¹(Ω))² into its dual X* = (H¹(Ω)′)². The canonical choice of a preconditioner is therefore given as the block-diagonal operator

M_v = (I + stJ*J)⁻¹.

The corresponding continuous linear operator from (22) is

A_{u,w} = I + stK*K = I + st ( H*H  −H*E; −E*H  E*E ),

where H = E(∇) is the Hessian operator. In our experiments we chose the discrete version of the block-diagonal operator

M_{u,w} = ( (I + stH*H)⁻¹  0; 0  (I + stJ*J)⁻¹ )

for preconditioning. It gives good numerical results, but is not as nicely accompanied by theory as the previous operator. To obtain fast algorithms, the inverses in the preconditioners can be well approximated by the incomplete Cholesky decomposition.¹ For the denoising problems with the primal variables u and v we use the preconditioner

C_{u,v} = ( ichol(I − stΔ)  0; 0  ichol(I + st(I + JᵀJ)) )   (24)

and for the denoising problems with the primal variables u and w we use the preconditioner

C_{u,w} = ( ichol(I + HᵀH)  0; 0  ichol(I + stJᵀJ) ).   (25)

For problem (6) we use the preconditioner

C_v = ichol(I + stJᵀJ).   (26)

Here the preconditioners are given in the form M⁻¹ = CᵀC.

¹ We used the MATLAB function ichol with default parameters in the experiments. Varying the parameters did not lead to significantly different results.

5 Experiments

In the previous sections we introduced several different models for variational image denoising and also described two algorithms, each applicable to each model. This leaves us with a large number of parameters that have to be chosen. In this section, we try to illustrate the effects of these parameters and to give some guidance on how they should be chosen. To that end, we divide the set of parameters into two groups:

Problem parameters: These are the parameters of the model itself. For example, in the case of TGV denoising (10), the two problem parameters are the two regularization parameters α0 and α1, while for MTGV (9) we have the parameters α and δ1, and the DGTGV method consisting of (6) and (5) also has the two parameters α and δ1.

Algorithmic parameters: These are parameters that influence only the algorithm, but not the theoretical minimizers. These can be, for example: one or more step-sizes, the relaxation parameter, the stopping criterion (e.g. a tolerance for the duality gap), the parameters of the CG iteration, or the preconditioner.
The problem parameters influence the quality of the denoising, while the algorithmic parameters influence the performance (or speed) of the method. Moreover, there is a trade-off between speed and quality: if the algorithm is stopped too early, the minimizer may not be approximated well. Note, however, that sometimes early stopping may increase reconstruction quality (which often indicates that the model can be improved), but we shall not deal with this question here but rather focus on the analysis of the problem and algorithmic parameters separately.

5.1 Optimal constants in DGTV, DGTGV and MTGV

The DGTV method uses the parameter δ2 which we motivated to be δ2 = c‖|∇u0|‖_1 in section 2.4. We calculated the optimal constants c for various images with different noise levels, cf. figure 1. As expected, all c ≥ 1 lead to the same result (in this case v = 0 is a feasible and optimal solution and therefore the two-stage method reduces to pure total variation denoising in the second step). Hence, c ≈ 0.99 seems like a sensible choice.

…default value of α = 1. For the MTGV method from section 2.3 we report the results of the optimization of the parameter α in figure 2b and we see that values around α = 2 seem optimal (while the variance is larger than for DGTGV).

[Figure 1: "DGTV, optimal c" — relative norm distance vs. c for noise levels 0.05, 0.1 and 0.25.]
Fig. 1: Optimal values for various images and different noise levels for the DGTV method.

[Figure 2: relative norm distance vs. α for noise levels 0.05, 0.1 and 0.25.]
(a) Optimal α for various images with different noise levels in the DGTGV method.
(b) Optimal α for various images with different noise levels in the MTGV method.
Column groups: (a) PSNR for the best possible α per method; (b) PSNR for the default α per method; (c) time in seconds with Chambolle-Pock (for DGTGV, the gradient step and the image step are also reported separately); (d) time in seconds with Douglas-Rachford.

Image (noise factor) | (a) DGTGV | (a) MTGV | (b) DGTGV | (b) MTGV | (c) DGTGV grad. | (c) DGTGV img. | (c) DGTGV all | (c) MTGV | (d) DGTGV | (d) MTGV
affine(0.05)         | 37.26 | 38.30 | 37.07 | 38.25 | 0.12 | 0.06 | 0.17 |  0.20 | 0.18 | 0.28
affine(0.1)          | 32.58 | 33.97 | 32.66 | 33.65 | 0.15 | 0.05 | 0.19 |  0.25 | 0.20 | 0.30
affine(0.25)         | 27.74 | 28.41 | 26.80 | 27.87 | 0.27 | 0.04 | 0.31 |  0.39 | 0.44 | 0.34
eye(0.05)            | 31.34 | 31.64 | 31.32 | 31.49 | 0.14 | 0.16 | 0.30 |  0.91 | 0.46 | 1.21
eye(0.1)             | 28.94 | 29.37 | 28.95 | 29.28 | 0.23 | 0.18 | 0.41 |  0.93 | 0.83 | 1.08
eye(0.25)            | 26.26 | 26.98 | 26.09 | 26.89 | 0.67 | 0.16 | 0.83 |  1.53 | 1.66 | 1.26
cameraman(0.05)      | 30.57 | 30.79 | 30.53 | 30.78 | 0.30 | 0.15 | 0.46 |  1.28 | 0.80 | 1.79
cameraman(0.1)       | 27.14 | 27.32 | 27.09 | 27.32 | 0.38 | 0.16 | 0.54 |  1.38 | 1.03 | 1.55
cameraman(0.25)      | 23.20 | 23.34 | 23.15 | 23.26 | 0.73 | 0.16 | 0.89 |  1.81 | 1.51 | 1.43
moonsurface(0.05)    | 30.69 | 30.87 | 30.68 | 30.77 | 0.14 | 0.16 | 0.30 |  0.84 | 0.52 | 1.17
moonsurface(0.1)     | 28.46 | 28.73 | 28.46 | 28.65 | 0.28 | 0.17 | 0.45 |  0.93 | 0.72 | 1.08
moonsurface(0.25)    | 26.08 | 26.38 | 26.04 | 26.36 | 0.65 | 0.15 | 0.80 |  1.52 | 1.46 | 1.26
barbara(0.05)        | 28.35 | 28.66 | 28.35 | 28.49 | 1.19 | 0.58 | 1.77 | 10.98 | 3.40 | 9.67
barbara(0.1)         | 24.91 | 25.14 | 24.91 | 25.06 | 1.03 | 0.60 | 1.63 |  7.46 | 4.66 | 7.53
barbara(0.25)        | 22.34 | 22.48 | 22.34 | 22.46 | 1.94 | 0.75 | 2.69 |  7.33 | 8.03 | 6.04

Table 1: 1a, 1b: PSNR values of the DGTGV and MTGV methods with the best possible α value and with default values for each method; 1c: time in seconds for the DGTGV and MTGV methods implemented with Chambolle-Pock, for DGTGV both steps also separately; 1d: time in seconds for both methods implemented with Douglas-Rachford.
Fig. 4: Comparison of the DGTGV and MTGV methods on the affine image with the best possible α value; noise factor 0.25.

Fig. 5: Comparison of the DGTGV and MTGV methods with the best possible α value; noise factor 0.1.
without preconditioning (see figure 7). The algorithms are tested with the image eye (256×256) corrupted by Gaussian white noise of mean 0 and variance 0.1. The stepsizes of the Douglas-Rachford iteration are chosen by trial and error: s = 120, t = 0.1. The Douglas-Rachford iteration is very slow for one or two iterations of CG without preconditioning [...]

[...] to MTGV (and hence, TGV) as a denoising method. [...] especially in the inexact case. However, the overall runtime was only better than the simpler primal-dual method by Chambolle and Pock [...]

[Figure: normalized duality gap for the variants CP tgv, CP mtgv and DR tgv.]
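The effect of capping the inner CG iteration count can be reproduced with any off-the-shelf CG implementation. A sketch using SciPy on a small SPD stand-in system with a simple Jacobi (diagonal) preconditioner; the matrix, step parameter and preconditioner are illustrative assumptions, not the paper's actual subproblem or preconditioner:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, LinearOperator

n = 200
rng = np.random.default_rng(0)
# SPD stand-in for a discretized (identity + t * PDE) subproblem matrix,
# with a strongly varying diagonal so that Jacobi preconditioning matters.
L = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (0.1 * L + diags(1.0 + np.linspace(0.0, 9.0, n))).tocsr()
b = rng.standard_normal(n)

# Inexact solve: only a few CG iterations per outer step.
x_cg, info_cg = cg(A, b, maxiter=4)

# Jacobi preconditioner: apply the inverse of the diagonal of A.
d = A.diagonal()
M = LinearOperator((n, n), matvec=lambda r: r / d)
x_pcg, info_pcg = cg(A, b, maxiter=4, M=M)
```

In the inexact Douglas-Rachford variant discussed above, such a capped (P)CG solve replaces the exact solution of the linear subproblem in every outer iteration.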
[...] empirical observations allows for variational denoising methods [...]

A Appendix

A.1 Discretization

We equip the space of M × N images with the inner product and induced norm

    \langle u, v \rangle = \sum_{i=1}^{M} \sum_{j=1}^{N} u_{i,j} v_{i,j}, \qquad \|u\|_2 = \Big( \sum_{i=1}^{M} \sum_{j=1}^{N} u_{i,j}^2 \Big)^{1/2}.

[...] the adjoint E^* satisfies

    \langle E v, p \rangle = \langle v, E^* p \rangle \quad \text{for all } v, p.
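The adjoint relation above can be checked numerically. Below is a sketch for the plain forward-difference gradient ∇ and its adjoint (a negative divergence with matching boundary handling); the paper's operator E acting on vector fields satisfies the analogous identity but is not reproduced here, and the function names are illustrative:

```python
import numpy as np

def gradient_op(u):
    """Forward-difference gradient with Neumann boundary (assumed stencil)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def gradient_adj(p):
    """Adjoint of gradient_op: negative divergence, so that
    <gradient_op(u), p> = <u, gradient_adj(p)> holds exactly."""
    px, py = p
    div = np.zeros_like(px)
    div[:-1, :] += px[:-1, :]
    div[1:, :] -= px[:-1, :]
    div[:, :-1] += py[:, :-1]
    div[:, 1:] -= py[:, :-1]
    return -div
```

A random test of ⟨∇u, p⟩ = ⟨u, ∇*p⟩ to machine precision is a cheap safeguard before plugging such operators into the primal-dual algorithms of the next subsection.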
14 B. Komander et al.
[Figure 7 panels:
(a) Convergence of MTGV without preconditioning according to the number of iterations.
(b) Convergence of MTGV with preconditioning according to the number of iterations.
(c) Convergence of MTGV without preconditioning according to time in seconds.
(d) Convergence of MTGV with preconditioning according to time in seconds.
Each panel plots the normalized duality gap; the legend entries are 1, 2, 3, 4 and 10 iterations of CG (resp. PCG), plus a run with CG until convergence in the time plots.]

Fig. 7: A comparison of the MTGV method with and without preconditioning according to iterations and time.
A.2 Prox operators and duality gaps for considered problems

In order to carry out experiments with the methods proposed in the previous sections, in this section we state all primal and dual functionals with respect to a general optimization problem min_x F(x) + G(Kx), together with a study of the corresponding primal-dual gaps and of possibilities to ensure feasibility of the iterates throughout the program by reformulating the problems with a substitution variable. We also give the proximal operators needed for the Chambolle-Pock and Douglas-Rachford algorithms.

A.2.1 DGTV

In section 2 we formulated a two-staged denoising method in two ways; first, as a constrained version (4) and (5). In this formulation we get the primal functionals for the first problem (4),

    F(v) = I_{\||\nabla u_0 - \cdot|\|_1 \le \delta_2}(v), \qquad G(\psi) = \||\psi|\|_1        (27)

with operator K = E. The dual problems are, in general, written as

    \max_y \; -F^*(-K^* y) - G^*(y).

[...]
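The primal-dual pairs collected in this appendix are exactly what a generic Chambolle-Pock iteration consumes. A minimal sketch, assuming prox_{τF} and prox_{σG*} are supplied as callables with the step size baked in; the function name and interface are illustrative, not the paper's implementation:

```python
import numpy as np

def chambolle_pock(K, K_adj, prox_tau_F, prox_sigma_Gstar, x0, y0,
                   tau, sigma, n_iter=100, theta=1.0):
    """Generic Chambolle-Pock iteration for min_x F(x) + G(Kx).
    Requires tau * sigma * ||K||^2 <= 1 for convergence."""
    x, y = x0.copy(), y0.copy()
    x_bar = x.copy()
    for _ in range(n_iter):
        # dual ascent step on y, primal descent step on x, then extrapolation
        y = prox_sigma_Gstar(y + sigma * K(x_bar))
        x_new = prox_tau_F(x - tau * K_adj(y))
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x, y
```

For the first DGTV problem one would plug in K = E together with the translated projection onto the mixed 1-norm ball for prox_{τF} and the pointwise ℓ∞ projection for prox_{σG*}.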
For the second problem within DGTV, we have the denoising problem of the image with respect to the denoised gradient \hat v, the output of the previous problem, cf. problem (5). There, the primal functionals with K = ∇ are

    F(u) = I_{\|u_0 - \cdot\|_2 \le \delta_1}(u), \qquad G(\phi) = \||\phi - \hat v|\|_1        (31)

and the corresponding dual functionals write as

    F^*(s) = \delta_1 \|s\|_2 + \langle u_0, s \rangle, \qquad G^*(p) = I_{\||\cdot|\|_\infty \le 1}(p) + \langle \hat v, p \rangle.        (32)

Hence, the primal-dual gap for this problem is

    gap^{(2)}_{DGTGV}(u, p) = I_{\|u_0 - \cdot\|_2 \le \delta_1}(u) + \||\nabla u - \hat v|\|_1 + \delta_1 \|\nabla^* p\|_2 - \langle u_0, \nabla^* p \rangle + I_{\||\cdot|\|_\infty \le 1}(p) + \langle \hat v, p \rangle.        (33)

The proximal operators are given by

    prox_{\tau F}(u) = proj_{\|u_0 - \cdot\|_2 \le \delta_1}(u),
    prox_{\sigma G^*}(p) = proj_{\||\cdot|\|_\infty \le 1}(p - \sigma \hat v).        (34)

A.2.2 DGTGV

We reformulate problem (6) by using the substitution w = \nabla u_0 - v, also considered in section 3, where we calculated another duality gap. Hence, we get the primal functionals for the gradient denoising problem with operator K = E as

    F(w) = \||w|\|_1, \qquad G(\psi) = \alpha \||E \nabla u_0 - \psi|\|_1.        (35)

Therefore, the dual functionals write as

    F^*(t) = I_{\||\cdot|\|_\infty \le 1}(t), \qquad G^*(q) = I_{\||\cdot|\|_\infty \le \alpha}(q) + \langle E \nabla u_0, q \rangle.        (36)

With that, the primal-dual gap is

    gap^{(1)}_{DGTGV}(w, q) = \||w|\|_1 + \alpha \||E(\nabla u_0 - w)|\|_1 + I_{\||\cdot|\|_\infty \le 1}(E^* q) + I_{\||\cdot|\|_\infty \le \alpha}(q) + \langle E \nabla u_0, q \rangle.        (37)

The proximal operators are, with Moreau's identity for the first one,

    prox_{\tau F}(w) = w - \tau \, prox_{\tau^{-1} F^*}(\tau^{-1} w) = w - \tau \, proj_{\||\cdot|\|_\infty \le 1}(\tau^{-1} w),
    prox_{\sigma G^*}(q) = proj_{\||\cdot|\|_\infty \le \alpha}(q - \sigma E \nabla u_0).        (38)

For the second problem, we already derived all functionals, gaps and proximal operators in subsection A.2.1, equations (31)–(34).

A.2.3 CTGV

[...] the dual functionals are

    F^*(s, t) = \delta_1 \|s\|_2 + \langle s, u_0 \rangle + \delta_2 \||t|\|_\infty, \qquad G^*(q) = I_{\||\cdot|\|_\infty \le 1}(q)        (41)

with dual block operator

    K^* = \begin{pmatrix} \nabla^* \\ -\mathrm{Id} \end{pmatrix} E^*.        (42)

The proximal operators are accordingly given by

    prox_{\tau F}(u, w) = \big( proj_{\|\cdot - u_0\|_2 \le \delta_1}(u), \; proj_{\||\cdot|\|_1 \le \delta_2}(w) \big),
    prox_{\sigma G^*}(q) = proj_{\||\cdot|\|_\infty \le 1}(q)        (43)

and the primal-dual gap writes as

    gap_{CTGV}(u, w, q) = \||E(\nabla u - w)|\|_1 + I_{\|\cdot - u_0\|_2 \le \delta_1}(u) + I_{\||\cdot|\|_1 \le \delta_2}(w) + \delta_1 \|\nabla^* E^* q\|_2 - \langle \nabla^* E^* q, u_0 \rangle + \delta_2 \||E^* q|\|_\infty + I_{\||\cdot|\|_\infty \le 1}(q).        (44)

A.2.4 MTGV

In section 2.3 we defined the MTGV optimization problem (9) as a sort of mixed version between the TGV (10) and the CTGV (12) problems. Thus, the primal functionals are

    F(u, v) = I_{\|\cdot - u_0\|_2 \le \delta_1}(u), \qquad G(\phi, \psi) = \||\phi|\|_1 + \alpha \||\psi|\|_1        (45)

with the block operator

    K = \begin{pmatrix} \nabla & -\mathrm{Id} \\ 0 & E \end{pmatrix}.        (46)

Accordingly, the dual functionals are

    F^*(s, t) = \delta_1 \|s\|_2 + \langle s, u_0 \rangle + I_{\{0\}}(t), \qquad G^*(p, q) = I_{\||\cdot|\|_\infty \le 1}(p) + I_{\||\cdot|\|_\infty \le \alpha}(q)        (47)

and the dual operator is

    K^* = \begin{pmatrix} \nabla^* & 0 \\ -\mathrm{Id} & E^* \end{pmatrix}.        (48)

The proximal operators are given by

    prox_{s F}(u, v) = \big( proj_{\|\cdot - u_0\|_2 \le \delta_1}(u), \; v \big),
    prox_{t G^*}(p, q) = \big( proj_{\||\cdot|\|_\infty \le 1}(p), \; proj_{\||\cdot|\|_\infty \le \alpha}(q) \big).
The gap function is given by

    gap_{MTGV}(u, v, p, q) = \||\nabla u - v|\|_1 + \alpha \||E v|\|_1 + I_{\|\cdot - u_0\|_2 \le \delta_1}(u) + \delta_1 \|\nabla^* p\|_2 - \langle p, \nabla u_0 \rangle + I_{\{0\}}(p - E^* q) + I_{\||\cdot|\|_\infty \le 1}(p) + I_{\||\cdot|\|_\infty \le \alpha}(q)

with |v| = \big( \sum_{k=1}^{d} v_k^2 \big)^{1/2} in the pointwise sense.

The idea how to project onto a mixed norm ball, i.e. \||\cdot|\|_1, can be found in [24]. There the author developed an algorithm to project onto an l1-norm ball and after that onto a sum of l2-norm balls. The projection itself cannot be stated in a simple closed form.
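For the scalar case, the first stage of such a projection (onto a plain ℓ1-norm ball) has a well-known sort-based solution; the sketch below shows it for illustration and is not the mixed-norm algorithm of [24], which additionally has to handle the pointwise ℓ2 magnitudes:

```python
import numpy as np

def proj_l1_ball(v, radius=1.0):
    """Euclidean projection onto {v : ||v||_1 <= radius} via the standard
    sort-and-threshold scheme (scalar case only)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]           # magnitudes, descending
    css = np.cumsum(u)
    # largest index rho with u[rho] * (rho + 1) > css[rho] - radius
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

Points outside the ball land exactly on its boundary, which is what the feasibility-preserving reformulations above rely on.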