An Algorithm For Minimization Of A Nondifferentiable Convex Function

Article · July 2009


Abstract— In this paper an algorithm for minimization of a nondifferentiable function is presented. The algorithm uses the Moreau-Yoshida regularization of the objective function and its second order Dini upper directional derivative. It is proved that the algorithm is well defined and that the sequence of points generated by the algorithm converges to an optimal point. An estimate of the rate of convergence is given, too.

Index Terms— Moreau-Yoshida regularization, non-smooth convex optimization, second order Dini upper directional derivative.

I. INTRODUCTION

The following minimization problem is considered:

$$\min_{x \in \mathbb{R}^n} f(x) \tag{1}$$

where $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a convex and not necessarily differentiable function with a nonempty set $X^*$ of minima.

For nonsmooth programs, many approaches have been presented so far and they are often restricted to the convex unconstrained case. In general, the various approaches are based on combinations of the following methods: subgradient methods, bundle techniques and the Moreau-Yoshida regularization.

For a function $f$ it is very important that its Moreau-Yoshida regularization is a new function which has the same set of minima as $f$ and is differentiable with Lipschitz continuous gradient, even when $f$ is not differentiable. In [10], [11] and [19] the second order properties of the Moreau-Yoshida regularization of a given function $f$ are considered.

Having in mind that the Moreau-Yoshida regularization of a proper closed convex function is an $LC^1$ function, we present an optimization algorithm (using the second order Dini upper directional derivative, described in [1] and [2]) based on the results from [3]. That is the main idea of this paper.

We shall present an iterative algorithm for finding an optimal solution of problem (1) by generating the sequence of points $\{x_k\}$ of the following form:

$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \quad d_k \neq 0 \tag{2}$$

where the step-size $\alpha_k$ and the directional vector $d_k$ are defined by the particular algorithms.

The paper is organized as follows: in the second section some basic theoretical preliminaries are given; in the third section the Moreau-Yoshida regularization and its properties are described; in the fourth section the definition of the second order Dini upper directional derivative and its basic properties are given; in the fifth section the semi-smooth functions and conditions for their minimization are described. Finally, in the sixth section a model algorithm is suggested, its convergence is proved, and an estimate of its rate of convergence is given, too.

Throughout the paper we will use the following notation. A vector $s$ refers to a column vector, and $\nabla$ denotes the gradient operator $\left(\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \ldots, \frac{\partial}{\partial x_n}\right)^T$. The Euclidean product is denoted by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ is the associated norm; $B(x, \rho)$ is the ball centred at $x$ with radius $\rho$. For a given symmetric positive definite linear operator $M$ we set $\langle \cdot, \cdot \rangle_M := \langle M\cdot, \cdot \rangle$; hence it is shortly denoted by $\|x\|_M^2 := \langle x, x \rangle_M$. The smallest and the largest eigenvalue of $M$ we denote by $\lambda$ and $\Lambda$ respectively.

The domain of a given function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is the set $\mathrm{dom}(f) = \{x \in \mathbb{R}^n \mid f(x) < +\infty\}$. We say that $f$ is proper if its domain is nonempty.

The point $x^* = \arg\min_{x \in \mathbb{R}^n} f(x)$ refers to the minimum point of a given function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$.

The epigraph of a given function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is the set $\mathrm{epi}\, f = \{(\alpha, x) \in \mathbb{R} \times \mathbb{R}^n \mid \alpha \geq f(x)\}$. The concept of the epigraph gives us a possibility to define convexity and closure of a function in a new way. We say that $f$ is convex if its epigraph is a convex set, and $f$ is closed if its epigraph is a closed set.
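The $M$-weighted inner product and norm from the notation above can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2x2 symmetric positive definite matrix `M` chosen only for illustration; the helper names are not from the paper. It also checks the standard bound $\lambda \|x\|^2 \leq \|x\|_M^2 \leq \Lambda \|x\|^2$, which relates the $M$-norm to the extreme eigenvalues of $M$.

```python
import math

# Hypothetical symmetric positive definite operator M (illustrative values only).
M = [[2.0, 1.0],
     [1.0, 3.0]]

def inner(x, y):
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def mat_vec(A, x):
    """Matrix-vector product A x."""
    return [inner(row, x) for row in A]

def inner_M(x, y):
    """Weighted product <x, y>_M := <M x, y>."""
    return inner(mat_vec(M, x), y)

def norm_M_sq(x):
    """Squared weighted norm ||x||_M^2 := <x, x>_M."""
    return inner_M(x, x)

def extreme_eigenvalues(A):
    """Smallest and largest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, b, c = A[0][0], A[0][1], A[1][1]
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

lam, Lam = extreme_eigenvalues(M)
x = [1.0, -2.0]
# The M-norm is sandwiched between the extreme eigenvalues times ||x||^2.
bounds_hold = lam * inner(x, x) <= norm_M_sq(x) <= Lam * inner(x, x)
```

For $M$ equal to the identity the weighted norm reduces to the Euclidean norm and $\lambda = \Lambda = 1$.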
Definition 1. A vector $g \in \mathbb{R}^n$ is said to be a subgradient of a given proper convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ at a point $x \in \mathbb{R}^n$ if the next inequality

$$f(z) \geq f(x) + g^T (z - x) \tag{3}$$

holds for all $z \in \mathbb{R}^n$. The set of all subgradients of $f(x)$ at the point $x$, called the subdifferential at the point $x$, is denoted by $\partial f(x)$. The subdifferential $\partial f(x)$ is a nonempty set if and only if $x \in \mathrm{dom}(f)$.

For a convex function $f$ it follows that $f(x) = \max_{z \in \mathbb{R}^n} \{ f(z) + g^T (x - z) \}$ holds, where $g \in \partial f(z)$ (see [4]).

The concept of the subgradient is a simple generalization of the gradient for nondifferentiable convex functions.

Lemma 1. Let $f : S \to \mathbb{R} \cup \{+\infty\}$ be a convex function defined on a convex set $S \subseteq \mathbb{R}^n$, and $x' \in \mathrm{int}\, S$. Let $\{x_k\}$ be a sequence such that $x_k \to x'$, where $x_k = x' + \varepsilon_k s_k$, $\varepsilon_k > 0$, $\varepsilon_k \to 0$ and $s_k \to s$, and $g_k \in \partial f(x_k)$. Then all accumulation points of the sequence $\{g_k\}$ lie in the set $\partial f(x')$.

Proof. See [7] or [6].

Definition 2. The directional derivative of a real function $f$ defined on $\mathbb{R}^n$ at the point $x' \in \mathbb{R}^n$ in the direction $s \in \mathbb{R}^n$, denoted by $f'(x', s)$, is

$$f'(x', s) = \lim_{t \downarrow 0} \frac{f(x' + t \cdot s) - f(x')}{t} \tag{4}$$

when this limit exists.

Hence, it follows that if the function $f$ is convex and $x' \in \mathrm{dom}\, f$, then

$$f(x' + t \cdot s) = f(x') + t \cdot f'(x', s) + o(t) \tag{5}$$

holds, which can be considered as one linearization of the function $f$ (see [5]).

Lemma 2. Let $f : S \to \mathbb{R} \cup \{+\infty\}$ be a convex function defined on a convex set $S \subseteq \mathbb{R}^n$, and $x' \in \mathrm{int}\, S$. If the sequence $x_k \to x'$, where $x_k = x' + \varepsilon_k s_k$, $\varepsilon_k > 0$, $\varepsilon_k \to 0$ and $s_k \to s$, then the next formula:

$$f'(x', s) = \lim_{k \to \infty} \frac{f(x_k) - f(x')}{\varepsilon_k} = \max_{g \in \partial f(x')} s^T g \tag{6}$$

holds.

Proof. See [6] or [14].

Lemma 3. Let $f : S \to \mathbb{R} \cup \{+\infty\}$ be a convex function defined on a convex set $S \subseteq \mathbb{R}^n$. Then $\partial f(x)$ is bounded for all $x \in B \subset \mathrm{int}\, S$, where $B$ is a compact set.

Proof. See [7] or [9].

Proposition 1. Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper convex function. The condition:

$$0 \in \partial f(x) \tag{7}$$

is a first order necessary and sufficient condition for a global minimizer at $x \in \mathbb{R}^n$. This can be stated alternatively as:

$$\forall s \in \mathbb{R}^n,\ \|s\| = 1: \quad \max_{g \in \partial f(x)} s^T g \geq 0. \tag{8}$$

Proof. See [13].

Lemma 4. If a proper convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a differentiable function at a point $x \in \mathrm{dom}(f)$, then:

$$\partial f(x) = \{\nabla f(x)\}. \tag{9}$$

Proof. The statement follows directly from Definition 2.

Definition 3. The real function $f$ defined on $\mathbb{R}^n$ is an $LC^1$ function on the open set $D \subseteq \mathbb{R}^n$ if it is continuously differentiable and its gradient $\nabla f$ is locally Lipschitz, i.e.

$$\|\nabla f(x) - \nabla f(y)\| \leq L \|x - y\| \quad \text{for } x, y \in D \tag{10}$$

for some $L > 0$.

III. THE MOREAU-YOSHIDA REGULARIZATION

Definition 4. Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper closed convex function. The Moreau-Yoshida regularization of a given function $f$, associated to the metric defined by $M$, denoted by $F$, is defined as follows:

$$F(x) := \min_{y \in \mathbb{R}^n} \left\{ f(y) + \frac{1}{2} \|y - x\|_M^2 \right\} \tag{11}$$

The above function is an infimal convolution. In [15] it is proved that the infimal convolution of a convex function is also a convex function. Hence the function defined by (11) is a convex function and has the same set of minima as the function $f$ (see [5]), so the study of the Moreau-Yoshida regularization is motivated by the fact that $\min_{x \in \mathbb{R}^n} f(x)$ is equal to $\min_{x \in \mathbb{R}^n} F(x)$.

Definition 5. The minimum point $p(x)$ of the function (11):

$$p(x) := \arg\min_{y \in \mathbb{R}^n} \left\{ f(y) + \frac{1}{2} \|y - x\|_M^2 \right\} \tag{12}$$

is called the proximal point of $x$.

Proposition 2. The function $F$ defined by (11) is always differentiable.

Proof. See [5].
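To make definitions (11) and (12) concrete, the following minimal sketch evaluates the Moreau-Yoshida regularization of the one-dimensional function $f(y) = |y|$ with a scalar metric $M = m > 0$. The choice of $f$, the scalar metric and all function names are assumptions made for illustration; for this particular $f$ the proximal point is the well-known soft-thresholding map. A brute-force grid search over $y$ is included only as a sanity check of the closed form.

```python
import math

def f(y):
    """Illustrative nonsmooth convex objective (not from the paper)."""
    return abs(y)

def prox(x, m=1.0):
    """Proximal point p(x) from (12) for f(y) = |y| and M = m (soft thresholding)."""
    return math.copysign(max(abs(x) - 1.0 / m, 0.0), x)

def moreau_yoshida(x, m=1.0):
    """F(x) from (11), evaluated at the proximal point."""
    p = prox(x, m)
    return f(p) + 0.5 * m * (p - x) ** 2

def moreau_yoshida_grid(x, m=1.0, radius=5.0, steps=100001):
    """Brute-force evaluation of (11) on a uniform grid of y values."""
    best = math.inf
    for i in range(steps):
        y = -radius + 2.0 * radius * i / (steps - 1)
        best = min(best, f(y) + 0.5 * m * (y - x) ** 2)
    return best

# f has a kink at 0, while F is quadratic near 0 and linear far from it;
# both attain the same minimum value 0 at the same point x = 0.
values = [(x, moreau_yoshida(x), moreau_yoshida_grid(x)) for x in (-2.0, 0.5, 2.0)]
```

A larger `m` penalizes the distance to `x` more strongly, so `prox(x, m)` stays closer to `x` and `F` follows `f` more tightly.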
The first order regularity of $F$ is well known (see [5] and [10]): without any further assumptions, $F$ has a Lipschitzian gradient on the whole space $\mathbb{R}^n$. More precisely, for all $x_1, x_2 \in \mathbb{R}^n$ the next formula:

$$\|\nabla F(x_1) - \nabla F(x_2)\|^2 \leq \Lambda \langle \nabla F(x_1) - \nabla F(x_2), x_1 - x_2 \rangle \tag{13}$$

holds (see [10]), where $\nabla F(x)$ has the following form:

$$G := \nabla F(x) = M(x - p(x)) \in \partial f(p(x)) \tag{14}$$

and $p(x)$ is the unique minimum in (11). So, according to the above consideration and Definition 3, we conclude that $F$ is an $LC^1$ function (see [11]).

Note that the function $f$ has nonempty subdifferential at any point $p$ of the form $p(x)$. Since $p(x)$ is the minimum point of the function (11), then (see [5] and [10]):

$$p(x) = x - M^{-1} g \quad \text{where } g \in \partial f(p(x)). \tag{15}$$

In [10] it is also proved that for all $x_1, x_2 \in \mathbb{R}^n$ the next formula

$$\|p(x_1) - p(x_2)\|_M^2 \leq \langle M(x_1 - x_2), p(x_1) - p(x_2) \rangle \tag{16}$$

is valid, namely the mapping $x \to p(x)$, where $p(x)$ is defined by (12), is Lipschitzian with constant $\Lambda / \lambda$ (see Proposition 2.3 in [10]).

Lemma 5: The following statements are equivalent:
(i) $x$ minimizes $f$;
(ii) $p(x) = x$;
(iii) $\nabla F(x) = 0$;
(iv) $x$ minimizes $F$;
(v) $f(p(x)) = f(x)$;
(vi) $F(x) = f(x)$.

Proof. See [5] or [19].

IV. DINI SECOND UPPER DIRECTIONAL DERIVATIVE

We shall give some preliminaries that will be used in the remainder of the paper.

Definition 6. [18] The second order Dini upper directional derivative of the function $f \in LC^1$ at the point $x \in \mathbb{R}^n$ in the direction $d \in \mathbb{R}^n$ is defined to be

$$f_D''(x, d) = \limsup_{\alpha \downarrow 0} \frac{[\nabla f(x + \alpha d) - \nabla f(x)]^T \cdot d}{\alpha}.$$

If $\nabla f$ is directionally differentiable at $x_k$, we have

$$f_D''(x_k, d) = f''(x_k, d) = \lim_{\alpha \downarrow 0} \frac{[\nabla f(x_k + \alpha d) - \nabla f(x_k)]^T \cdot d}{\alpha}$$

for all $d \in \mathbb{R}^n$.

Since the Moreau-Yoshida regularization of a proper closed convex function $f$ is an $LC^1$ function, we can consider its second order Dini upper directional derivative at the point $x \in \mathbb{R}^n$ in the direction $d \in \mathbb{R}^n$, i.e.:

$$F_D''(x, d) = \limsup_{\alpha \downarrow 0} \left\langle \frac{g_1 - g_2}{\alpha}, d \right\rangle, \quad g_1 \in \partial f(p(x + \alpha d)),\ g_2 \in \partial f(p(x))$$

where $F(x)$ is defined by (11).

Lemma 6: Let $f : \mathbb{R}^n \to \mathbb{R}$ be a closed convex proper function and let $F$ be its Moreau-Yoshida regularization in the sense of Definition 4. Then the next statements are valid.
(i) $F_D''(x_k, kd) = k^2 F_D''(x_k, d)$
(ii) $F_D''(x_k, d_1 + d_2) \leq 2 \left( F_D''(x_k, d_1) + F_D''(x_k, d_2) \right)$
(iii) $F_D''(x_k, d) \leq K \cdot \|d\|^2$, where $K$ is some constant.

Proof. See [18] and [2].

Lemma 7. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a closed convex proper function and let $F$ be its Moreau-Yoshida regularization. Then the next statements are valid.
(i) $F_D''(x, d)$ is upper semicontinuous with respect to $(x, d)$, i.e. if $x_i \to x$ and $d_i \to d$, then $\limsup_{i \to \infty} F_D''(x_i, d_i) \leq F_D''(x, d)$
(ii) $F_D''(x, d) = \max \{ d^T V d \mid V \in \partial^2 F(x) \}$

Proof. See [18] and [2].

V. SEMI-SMOOTH FUNCTIONS AND OPTIMALITY CONDITIONS

Definition 7: A function $\nabla F : \mathbb{R}^n \to \mathbb{R}^n$ is said to be semi-smooth at the point $x \in \mathbb{R}^n$ if $\nabla F$ is locally Lipschitzian at $x \in \mathbb{R}^n$ and the limit $\lim_{h \to d,\ \lambda \downarrow 0} \{Vh\}$, $V \in \partial^2 F(x + \lambda h)$, exists for any $d \in \mathbb{R}^n$.

Note that for a closed convex proper function, the gradient of its Moreau-Yoshida regularization is a semi-smooth function.

Lemma 8. [18]: If $\nabla F : \mathbb{R}^n \to \mathbb{R}^n$ is semi-smooth at the point $x \in \mathbb{R}^n$ then $\nabla F$ is directionally differentiable at $x \in \mathbb{R}^n$ and for any $V \in \partial^2 F(x + h)$, $h \to 0$, we have: $Vh - (\nabla F)'(x, h) = o(\|h\|)$. Similarly we have that $h^T V h - F''(x, h) = o(\|h\|^2)$.

Lemma 9: Let $f : \mathbb{R}^n \to \mathbb{R}$ be a proper closed convex function and let $F$ be its Moreau-Yoshida regularization. If $x \in \mathbb{R}^n$ is a solution of the problem (1), then $F'(x, d) = 0$ and $F_D''(x, d) \geq 0$ for all $d \in \mathbb{R}^n$.

Proof. From the definition of the directional derivative and by Lemma 5 we have that $F'(x, d) = \nabla F(x)^T d = 0$. Since $x \in \mathbb{R}^n$ is a solution of the problem (1), then according

to Lemma 5, Theorem 23.1 in [15] and the fact that the inequalities $F'(x + td, d) \geq \frac{1}{t} \left( F(x + td) - F(x) \right) \geq 0$ hold, we have

$$F_D''(x, d) = \limsup_{t \downarrow 0} \frac{F'(x + td, d) - F'(x, d)}{t} \geq 0 \ ■$$

Lemma 10. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a proper closed convex function, $F$ its Moreau-Yoshida regularization, and $x$ a point from $\mathbb{R}^n$. If $F'(x, d) = 0$ and $F_D''(x, d) > 0$ for all $d \in \mathbb{R}^n$, then $x \in \mathbb{R}^n$ is a strict local minimizer of the problem (1).

Proof. Suppose that $x \in \mathbb{R}^n$ is not a strict minimum of the function $f$. According to Lemma 5 that means that $x \in \mathbb{R}^n$ is not a strict minimum of the function $F$, nor a proximal point of the function $F$. Then there exists a sequence $\{x_k\}$, $x_k \to x$, such that $F(x_k) \leq F(x)$ holds for every $k$. If we define the sequence $\{x_k\}$, $x_k \to x$, by $x_k = x + t_k d$, where $t_k = \frac{\|x_k - x\|}{\|d\|}$, then by Lemma 8 and Lemma 6 it follows that

$$F(x_k) - F(x) - t_k \nabla F(x)^T d = \frac{1}{2} t_k^2 F_D''(x, d) + o(\|d\|^2)$$

holds. Since $\nabla F(x) = 0$ (from the assumption of Lemma 10) it follows that $\frac{1}{2} t_k^2 F_D''(x, d) \leq 0$, which contradicts the assumption.

VI. A MODEL ALGORITHM

In this section an algorithm for solving the problem (1) is introduced. We suppose that at each $x \in \mathbb{R}^n$ it is possible to compute $f(x)$, $F(x)$, $\nabla F(x)$ and $F_D''(x, d)$ for a given $d \in \mathbb{R}^n$.

At the $k$-th iteration we consider the following problem

$$\min_{d \in \mathbb{R}^n} \Phi_k(d), \quad \Phi_k(d) = \nabla F(x_k)^T d + \frac{1}{2} F_D''(x_k, d) \tag{17}$$

where $F_D''(x_k, d)$ stands for the second order Dini upper directional derivative at $x_k$ in the direction $d$. Note that if $\Lambda$ is a Lipschitzian constant for $F$ it is also a Lipschitzian constant for $\nabla F$. The function $\Phi_k(d)$ is called an iteration function. It is easy to see that $\Phi_k(0) = 0$ and $\Phi_k(d)$ is Lipschitzian on $\mathbb{R}^n$. We generate the sequence $\{x_k\}$ of the form $x_{k+1} = x_k + \alpha_k d_k$, where the direction vector $d_k$ is a solution of the problem (17), and the step-size $\alpha_k$ is a number satisfying $\alpha_k = q^{i(k)}$, $0 < q < 1$, where $i(k)$ is the smallest integer from $\{0, 1, 2, \ldots\}$ such that

$$F\left(x_k + q^{i(k)} d_k\right) - F(x_k) \leq -\frac{1}{2} q^{i(k)} \sigma\left(F_D''(x_k, d_k)\right) \tag{18}$$

where $\sigma : [0, +\infty) \to [0, +\infty)$ is a continuous function satisfying $\delta_1 t \leq \sigma(t) \leq \delta_2 t$ and $0 < \delta_1 < \delta_2 < 1$.

We suppose that

$$c_1 \|d\|^2 \leq F_D''(x_k, d) \leq c_2 \|d\|^2 \tag{19}$$

hold for some $c_1$ and $c_2$ such that $0 < c_1 < c_2$.

Lemma 11. Under the assumption (19) the function $\Phi_k(\cdot)$ is coercive.

Proof. From the assumption there exists $K > 0$ ($0 < c_1 \leq K \leq c_2$) such that $F_D''(x, d) = \max_{V \in \partial^2 F(x)} d^T V d = K \|d\|^2$. Since $\nabla F$ is locally Lipschitzian we have that $\left| \nabla F(x_k)^T d \right| \leq \Lambda \|d\|$ holds (see [17]), therefore $\left| \nabla F(x_k)^T d + \frac{1}{2} \cdot 0 \right| \leq \Lambda \|d\|$ holds and hence we have that:

$$\left| \nabla F(x_k)^T d + \frac{1}{2} F_D''(x, d) - \frac{1}{2} K \|d\|^2 \right| \leq \Lambda \|d\|.$$

Hence, we have:

$$-\Lambda \|d\| + \frac{1}{2} K \|d\|^2 \leq \nabla F(x_k)^T d + \frac{1}{2} F_D''(x, d) \leq \Lambda \|d\| + \frac{1}{2} K \|d\|^2$$

and

$$-\Lambda + \frac{1}{2} K \|d\| \leq \frac{\Phi_k(d)}{\|d\|} \leq \Lambda + \frac{1}{2} K \|d\|.$$

This establishes coercivity of the function $\Phi_k$. ■

Remark. Coercivity of the function $\Phi_k$ assures that the optimal solution of problem (17) exists (see [18]). It also means that, under the assumption (19), the direction sequence $\{d_k\}$ is a bounded sequence on $\mathbb{R}^n$ (the proof is analogous to the proof in [18]).

Proposition 3. If the Moreau-Yoshida regularization $F(\cdot)$ of the proper closed convex function $f(\cdot)$ satisfies the condition (19), then:
(i) the function $F(\cdot)$ is uniformly and, hence, strictly convex;
(ii) the level set $L(x_0) = \{x \in \mathbb{R}^n : F(x) \leq F(x_0)\}$ is a compact convex set, and
(iii) there exists a unique point $x^*$ such that $F(x^*) = \min_{x \in L(x_0)} F(x)$.

Proof. (i) From the assumption (19) and the mean value theorem it follows that for all $x \in L(x_0)$ ($x \neq x_0$) there exists $\theta \in (0, 1)$ such that:

$$F(x) - F(x_0) = \nabla F(x_0)^T (x - x_0) + \frac{1}{2} F_D''\left(x_0 + \theta(x - x_0), x - x_0\right) \geq \nabla F(x_0)^T (x - x_0) + \frac{1}{2} c_1 \|x - x_0\|^2 > \nabla F(x_0)^T (x - x_0)$$

that is, $F(\cdot)$ is uniformly and consequently strictly convex on $L(x_0)$.

(ii) From [16] it follows that the level set $L(x_0)$ is bounded. The set $L(x_0)$ is closed and convex because the function $F(\cdot)$ is continuous and convex. Therefore the set $L(x_0)$ is a compact convex set.

(iii) The existence of $x^*$ follows from the continuity of the function $f(\cdot)$, and therefore of $F(\cdot)$, on the bounded set $L(x_0)$. From the definition of the level set it follows that:

$$F(x^*) = \min_{x \in L(x_0)} F(x) = \min_{x \in D} F(x)$$

Since $F(\cdot)$ is strictly convex it follows from [15] that $x^*$ is a unique minimizer. ■

Lemma 12. The following statements are equivalent:
(i) $d = 0$ is a globally optimal solution of the problem (17)
(ii) 0 is the optimum of the objective function in (17)
(iii) the corresponding $x_k$ is such that $0 \in \partial f(x_k)$

Proof. (i) ⇒ (ii): is obvious.

(ii) ⇒ (iii): Let 0 be a global optimum value of (17), then for any $\lambda > 0$ and $d \in \mathbb{R}^n$:

$$0 = \Phi_k(0) \leq \Phi_k(\lambda d) = \lambda \nabla F(x_k)^T d + \frac{1}{2} F_D''(x_k, \lambda d) = \lambda \nabla F(x_k)^T d + \frac{1}{2} \lambda^2 F_D''(x_k, d)$$

where the last equality holds by Lemma 6. Hence, dividing both sides by $\lambda$ and letting $\lambda \downarrow 0$, we have that $\nabla F(x_k)^T d \geq 0$ holds for any $d \in \mathbb{R}^n$, and consequently it follows that $x_k$ is a stationary point of the function $F$, i.e. $\nabla F(x_k) = 0$. Hence, by Lemma 5 it follows that $x_k$ is a minimum point of the function $f$.

(iii) ⇒ (i): Let $x_k$ be a point such that $0 \in \partial f(x_k)$. Then by (14), Lemma 5 and Lemma 9 it follows that

$$\nabla F(x_k)^T d \geq 0 \tag{20}$$

for any $d \in \mathbb{R}^n$. Suppose that $d \neq 0$ is the optimal solution of the problem (17). From the property of the iterative function $\Phi$ it follows that:

$$\nabla F(x_k)^T d_k \leq -\frac{1}{2} F_D''(x_k, d_k) \tag{21}$$

Hence, (21) implies:

$$\nabla F(x_k)^T d_k < 0. \tag{22}$$

The above two inequalities (20) and (22) are contradictory.

Now we shall prove that there exists a finite $i(k)$, i.e. since $d_k$ is defined by (17), that the algorithm is well-defined.

Proposition 4. If $d_k \neq 0$ is a solution of (17), then for any continuous $\sigma : [0, +\infty) \to [0, +\infty)$ satisfying $\delta_1 t \leq \sigma(t) \leq \delta_2 t$ (where $0 < \delta_1 < \delta_2 < 1$) there exists a finite $i^*(k)$ such that for all $q^{i(k)} \in \left(0, q^{i^*(k)}\right)$

$$F\left(x_k + q^{i(k)} d_k\right) - F(x_k) \leq -\frac{1}{2} q^{i(k)} \sigma\left(F_D''(x_k, d_k)\right)$$

Proof. According to Lemma 9.3 from [8] and from the definition of $d_k$ it follows that for $x_{k+1} = x_k + t d_k$, $t > 0$, we have

$$F(x_{k+1}) - F(x_k) \leq t \nabla F(x_k)^T d_k + \frac{\Lambda}{2} t^2 \|d_k\|^2 \leq -\frac{1}{2} t F_D''(x_k, d_k) + \frac{\Lambda}{2} t^2 \|d_k\|^2 \leq -\frac{1}{2\delta_2} t \sigma\left(F_D''(x_k, d_k)\right) + \frac{\Lambda}{2} t^2 \|d_k\|^2 \tag{23}$$

If we choose $t = \frac{\sigma\left(F_D''(x_k, d_k)\right)}{\Lambda \|d_k\|^2}$ and put it in (23), we get

$$F(x_{k+1}) - F(x_k) \leq \frac{1}{2} \frac{\delta_2 - 1}{\delta_2} \frac{\sigma^2\left(F_D''(x_k, d_k)\right)}{\Lambda \|d_k\|^2} = -K t \sigma\left(F_D''(x_k, d_k)\right)$$

where $\frac{\delta_2 - 1}{2\delta_2} = -K < 0$. Taking $q^{i^*(k)} = [Kt]$, i.e. $i^*(k) = \log_q [Kt]$, we have that the claim of the theorem holds for all $q^{i(k)} \in \left(0, q^{i^*(k)}\right)$.

Convergence theorem. Suppose that $f$ is a proper closed convex function and that its Moreau-Yoshida regularization $F$ satisfies (19). Then for any initial point $x_0 \in \mathbb{R}^n$, $x_k \to x_\infty$ as $k \to +\infty$, where $x_\infty$ is a unique minimal point of the function $f$.

Proof. If $d_k \neq 0$ is a solution of (17), it follows that $\Phi_k(d_k) \leq 0 = \Phi_k(0)$. Consequently, we have by the condition (19) that

$$\nabla F(x_k)^T d_k \leq -\frac{1}{2} F_D''(x_k, d_k) \leq -\frac{1}{2} c_1 \|d_k\|^2$$

From the above inequality it follows that the vector $d_k$ is a descent direction at $x_k$, i.e. from the relations (18) and (19) we get

$$F(x_{k+1}) - F(x_k) = F\left(x_k + q^{i(k)} d_k\right) - F(x_k) \leq -\frac{1}{2} q^{i(k)} \sigma\left(F_D''(x_k, d_k)\right) \leq -\frac{1}{2} q^{i(k)} \delta_1 F_D''(x_k, d_k) \leq -\frac{1}{2} q^{i(k)} \delta_1 c_1 \|d_k\|^2$$

for every $d_k \neq 0$. Hence the sequence $\{F(x_k)\}$ has the descent property, and, consequently, the sequence $\{x_k\} \subset L(x_0)$. Since $L(x_0)$ is by Proposition 3 a compact convex set, it follows that the sequence $\{x_k\}$ is

bounded. Therefore there exist accumulation points of the sequence $\{x_k\}$.

Since $\nabla F$ is continuous, then, if $\nabla F(x_k) \to 0$, $k \to +\infty$, it follows that every accumulation point $x_\infty$ of the sequence $\{x_k\}$ satisfies $\nabla F(x_\infty) = 0$. Since $F$ is (by Proposition 3) strictly convex, there exists a unique point $x_\infty \in L(x_0)$ such that $\nabla F(x_\infty) = 0$. Hence, the sequence $\{x_k\}$ has a unique limit point $x_\infty$ and it is a global minimizer of $F$ and by Lemma 5 it is a global minimizer of the function $f$.

Therefore we have to prove that $\nabla F(x_k) \to 0$, $k \to +\infty$. Let $K_1$ be a set of indices such that $\lim_{k \in K_1} x_k = x_\infty$. Then there are two cases to consider:

a) The set of indices $\{i(k)\}$ for $k \in K_1$ is uniformly bounded above by a number $I$. Because of the descent property, it follows that all points of accumulation have the

But, from the property of the iterative function in (17), we have $\nabla F(x_\infty)^T d_\infty \leq -\frac{1}{2} F_D''(x_\infty, d_\infty)$. Therefore, we get a

Convergence rate theorem. Under the assumptions of the previous theorem we have that the following estimate holds for the sequence $\{x_k\}$ generated by the algorithm.

$$F(x_n) - F(x_\infty) \leq \mu_0 \left[ 1 + \mu_0 \frac{1}{\eta^2} \sum_{k=0}^{n-1} \frac{F(x_k) - F(x_{k+1})}{\|\nabla F(x_k)\|^2} \right]^{-1}$$

for $n = 1, 2, 3, \ldots$, where $\mu_0 = F(x_0) - F(x_\infty)$ and $\mathrm{diam}\, L(x_0) = \eta < +\infty$ (since by Proposition 3 it follows that $L(x_0)$ is bounded).

Proof. The proof directly follows from Theorem 9.2, page 167, in [8].
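The model algorithm of Section VI (direction from problem (17), step size from rule (18)) can be illustrated on a one-dimensional toy problem. The following is a minimal sketch under several assumptions that are not part of the paper: the objective $f(y) = |y| + y^2/2$, the metric $M = 1$, the choice $\sigma(t) = \delta t$ with $\delta = 0.5$, and a finite-difference quotient with a small fixed $\alpha$ standing in for the limsup that defines $F_D''$. For this $f$ the proximal point has a closed form, and $F_D''(x, d)$ lies between $\frac{1}{2} d^2$ and $d^2$, so condition (19) holds.

```python
import math

def f(y):
    # Illustrative objective (an assumption, not from the paper): kink at 0.
    return abs(y) + 0.5 * y * y

def prox(x):
    # Proximal point p(x) of f for M = 1, in closed form for this particular f.
    return math.copysign(max(abs(x) - 1.0, 0.0), x) / 2.0

def F(x):
    # Moreau-Yoshida regularization (11), evaluated at the proximal point.
    p = prox(x)
    return f(p) + 0.5 * (p - x) ** 2

def grad_F(x):
    # Gradient formula (14) with M = 1.
    return x - prox(x)

def dini2(x, d, alpha=1e-7):
    # Finite-difference stand-in for the limsup defining F_D''(x, d).
    return (grad_F(x + alpha * d) - grad_F(x)) * d / alpha

def direction(x):
    # Minimizer of Phi_k(d) = grad_F(x) * d + F_D''(x, d) / 2 in one dimension:
    # on the descent side F_D''(x, d) = c * d**2, hence d = -g / c.
    g = grad_F(x)
    c = dini2(x, -math.copysign(1.0, g))
    return -g / c

def minimize(x0, q=0.5, delta=0.5, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        if abs(grad_F(x)) <= tol:
            break
        d = direction(x)
        step = 1.0
        # Rule (18) with sigma(t) = delta * t; the lower bound on step is a safeguard.
        while (F(x + step * d) - F(x) > -0.5 * step * delta * dini2(x, d)
               and step > 1e-12):
            step *= q
        x += step * d
    return x

x_star = minimize(5.0)
```

Starting from `x0 = 5.0` the iterates approach the minimizer `0` of `f`, which by Lemma 5 is also the minimizer of `F`. In higher dimensions the direction subproblem (17) has no such closed form and would need its own solver.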
