

ELL 822 – Selected Topics in Communications

Lecture 2
Convex functions

- Ref: [Boyd] Chapter 3

Jun B. Seo
2-1

Convex functions I

Let f : S → R, where S is a nonempty convex set in R^n. The function f is convex on S if

  f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2)  for each x1, x2 ∈ S and for each λ ∈ (0, 1)

f is strictly convex on S if the above inequality holds strictly for all x1 ≠ x2.
2-2
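
This definition can be checked numerically. Below is a minimal sanity-check sketch, not part of the slides: the test function (a quadratic plus an l1-norm term, assumed convex) and the random sampling are my own illustrative choices.

import numpy as np

def f(x):
    return x @ x + np.linalg.norm(x, 1)      # sum of convex terms, hence convex

rng = np.random.default_rng(0)
for _ in range(1000):
    x1, x2 = rng.normal(size=3), rng.normal(size=3)
    lam = rng.uniform()
    lhs = f(lam * x1 + (1 - lam) * x2)
    rhs = lam * f(x1) + (1 - lam) * f(x2)
    assert lhs <= rhs + 1e-9                 # the defining inequality above

Such a sampled test can only falsify convexity, never prove it, but it is a cheap guard against modeling mistakes.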

Level sets

Level set of function f with level α is defined as

  S = {x | f(x) = α}

– Styblinski-Tang function: f(x) = 0.5 Σ_{i=1}^n (xi^4 − 16 xi^2 + 5 xi)

[Figure: surface and contour plots of the Styblinski-Tang function for n = 2 over x1, x2 ∈ [−4, 4]]
2-3

Sublevel sets

• The sublevel set associated with f is defined as (for α ∈ R)

  Sα = {x ∈ dom f | f(x) ≤ α}

• Let S be a nonempty convex set in R^n and let f : S → R be a convex function. Then, the sublevel set Sα is convex:

Proof Suppose x1, x2 ∈ Sα. Thus, we have x1, x2 ∈ S, f(x1) ≤ α, and f(x2) ≤ α.

Consider x = λx1 + (1 − λ)x2 for λ ∈ (0, 1). By convexity of S, x ∈ S.

Since f is convex, we write

  f(x) ≤ λf(x1) + (1 − λ)f(x2) ≤ λα + (1 − λ)α = α,

so x ∈ Sα, and Sα is convex.
2-4
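
A small numerical companion, my own construction rather than part of the slides: it evaluates the Styblinski-Tang function from the figure on a grid, and spot-checks the sublevel-set result for the convex function f(x) = ‖x‖².

import numpy as np

def styblinski_tang(x):                      # the function plotted in the figure
    x = np.asarray(x, dtype=float)
    return 0.5 * np.sum(x**4 - 16 * x**2 + 5 * x, axis=-1)

# grid of values behind a contour plot like the one on the slide
grid = np.stack(np.meshgrid(np.linspace(-4, 4, 101), np.linspace(-4, 4, 101)), axis=-1)
values = styblinski_tang(grid)               # shape (101, 101)

# sublevel-set check for the convex function f(x) = ||x||^2
f = lambda x: np.sum(np.asarray(x, dtype=float) ** 2, axis=-1)
alpha = 2.0
rng = np.random.default_rng(1)
pts = rng.uniform(-2, 2, size=(500, 2))
S_alpha = pts[f(pts) <= alpha]               # sampled members of S_alpha
for _ in range(1000):
    i, j = rng.integers(len(S_alpha), size=2)
    lam = rng.uniform()
    assert f(lam * S_alpha[i] + (1 - lam) * S_alpha[j]) <= alpha + 1e-12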
Epigraph I

• Let S be a nonempty set in R^n and let f : S → R.
• The graph of f is described by the set {(x, f(x)) | x ∈ S} ⊂ R^{n+1}
• The epigraph of f, denoted by epi f, is a subset of R^{n+1}, i.e.,

  epi f = {(x, y) | x ∈ S, y ∈ R, y ≥ f(x)}

• Let S be a nonempty convex set. Then, f is convex if and only if epi f is a convex set:
2-5

Epigraph II

Proof
Suppose that f is convex and let (x1, y1), (x2, y2) ∈ epi f
– It means that x1, x2 ∈ S and y1 ≥ f(x1) and y2 ≥ f(x2).
– Convexity of f enables us to write

  f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2) ≤ λy1 + (1 − λ)y2

– Since λx1 + (1 − λ)x2 ∈ S, we have

  [λx1 + (1 − λ)x2, λy1 + (1 − λ)y2] ∈ epi f
2-6

Epigraph III

Proof continued
Conversely, assume that epi f is convex and let x1, x2 ∈ S.
Then, since [x1, f(x1)] ∈ epi f and [x2, f(x2)] ∈ epi f, by convexity of epi f, for λ ∈ (0, 1) we have

  [λx1 + (1 − λ)x2, λf(x1) + (1 − λ)f(x2)] ∈ epi f.

It implies

  f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2),

which shows that f is convex.
2-7

Epigraph IV

Constrained optimization problem in standard form

  minimize_x  f0(x)
  subject to  fi(x) ≤ 0, for i = 1, . . . , m
              hj(x) = 0, for j = 1, . . . , p

We can rewrite this in epigraph form as

  minimize_{x,t}  t
  subject to      f0(x) − t ≤ 0
                  fi(x) ≤ 0, for i = 1, . . . , m
                  hj(x) = 0, for j = 1, . . . , p

– Every convex optimization problem can be transformed into a problem with a linear objective function
2-8
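
As a concrete sketch of the epigraph transformation, assuming the cvxpy package and one of its bundled solvers (the particular objective and constraint are my own illustrative choices, not from the lecture), the standard form and its epigraph form solve to the same optimal value:

import cvxpy as cp
import numpy as np

x = cp.Variable(2)
t = cp.Variable()
c = np.array([3.0, -1.0])
f0 = cp.sum_squares(x - c)                   # convex objective f0(x) = ||x - c||^2

standard = cp.Problem(cp.Minimize(f0), [x >= 1])
epigraph = cp.Problem(cp.Minimize(t), [f0 - t <= 0, x >= 1])

standard.solve()
epigraph.solve()
print(standard.value, epigraph.value)        # the two optimal values coincide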
Epigraph V

The epigraph form is an optimization problem in the (epi)graph space (x, t):

  min_x  f0(x) = |x|  subject to  −x + 1 ≤ 0
  ⇒  minimize_{x,t}  t  subject to  |x| − t ≤ 0,  −x + 1 ≤ 0

[Figure: feasible region of the epigraph form in the (x, t) plane, the set of points with t ≥ |x| and x ≥ 1]
2-9

First-order condition of convex functions I

Let S be a nonempty open convex set in R^n and let f : S → R be differentiable on S.
Then, f is convex if and only if for any x ∈ S, we have

  f(y) ≥ f(x) + ∇f(x)^T (y − x)  for each y ∈ S,

where ∇f(x) = [∂f(x)/∂x1, . . . , ∂f(x)/∂xn]^T is the gradient of f.

– The first-order approximation of f at x is a global lower bound
2-10
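
A quick numerical illustration of this lower bound (my own example, not from the slides), using f(x) = ‖x‖² with gradient ∇f(x) = 2x:

import numpy as np

f = lambda x: x @ x
grad_f = lambda x: 2 * x

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    # first-order lower bound: f(y) >= f(x) + grad f(x)^T (y - x)
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-9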

First-order condition of convex functions II

Proof
By convexity of f,

  f(αy + (1 − α)x) ≤ αf(y) + (1 − α)f(x) = α(f(y) − f(x)) + f(x)

We rewrite this as

  f(αy + (1 − α)x) − f(x) + αf(x) ≤ αf(y).

Finally, we have

  f(y) ≥ f(x) + [f(x + α(y − x)) − f(x)] / [α(y − x)] · (y − x)
       = f(x) + [f(x + Δx) − f(x)] / Δx · (y − x),  with Δx = α(y − x),
       = f(x) + ∇f(x)^T (y − x)  as α → 0.
2-11

First-order condition of convex functions III

Proof continued
To show the converse, consider a point t = αx + (1 − α)y.
We need to show that f is convex if the following holds:

  f(x) ≥ f(t) + ∇f(t)^T (x − t)
  f(y) ≥ f(t) + ∇f(t)^T (y − t)

Multiplying each with α and 1 − α, we have

  αf(x) ≥ αf(t) + α∇f(t)^T (x − t)
  (1 − α)f(y) ≥ (1 − α)f(t) + (1 − α)∇f(t)^T (y − t)

Adding them yields

  αf(x) + (1 − α)f(y) ≥ f(t) + ∇f(t)^T (αx + (1 − α)y − t) = f(αx + (1 − α)y),

since αx + (1 − α)y − t = 0, which is the defining inequality of convexity.
2-12
First-order condition of convex functions IV

• If f is convex and x, y ∈ dom f, we have

  t ≥ f(y) ≥ f(x) + ∇f(x)^T (y − x)  for (y, t) ∈ epi f

• The epi f has a supporting hyperplane with normal [∇f(x), −1] at x:

  (y, t) ∈ epi f ⇒ [∇f(x)^T  −1] ([y; t] − [x; f(x)]) ≤ 0

• Global minimum of convex function f is attained at x if and only if ∇f(x) = 0

[Figure: epi f with its non-vertical supporting hyperplanes]
2-13

Convex functions II

Let S be a nonempty convex set in R^n and let f : S → R be differentiable on S.
Then, f is convex if and only if for each x1, x2 ∈ S, we have

  [∇f(x2) − ∇f(x1)]^T (x2 − x1) ≥ 0   (monotone gradient)

Proof If f is convex, for two distinct x1 and x2 we have

  f(x1) ≥ f(x2) + ∇f(x2)^T (x1 − x2)
  f(x2) ≥ f(x1) + ∇f(x1)^T (x2 − x1)

Adding the two inequalities side by side, we have

  ∇f(x2)^T (x1 − x2) + ∇f(x1)^T (x2 − x1) ≤ 0,

which is the stated condition.
2-14
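
The monotone-gradient condition can likewise be sampled numerically. The sketch below uses a convex quadratic f(x) = 0.5 x^T A x with A positive semidefinite; the construction is my own illustration, not from the slides.

import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
A = M.T @ M                                  # positive semidefinite, so f is convex
grad_f = lambda x: A @ x                     # gradient of 0.5 x^T A x (A symmetric)

for _ in range(1000):
    x1, x2 = rng.normal(size=4), rng.normal(size=4)
    assert (grad_f(x2) - grad_f(x1)) @ (x2 - x1) >= -1e-9   # monotone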

Convex functions III

Proof continued
To prove the converse, by assumption, the following holds for x = λx1 + (1 − λ)x2:

  [∇f(x) − ∇f(x1)]^T (x − x1) ≥ 0,

and since x − x1 = (1 − λ)(x2 − x1), we have (1 − λ)[∇f(x) − ∇f(x1)]^T (x2 − x1) ≥ 0, i.e.,

  ∇f(x)^T (x2 − x1) ≥ ∇f(x1)^T (x2 − x1)

Using the mean value theorem, i.e.,

  f(x2) − f(x1) = ∇f(x)^T (x2 − x1)

for some x = λx1 + (1 − λ)x2 with λ ∈ (0, 1), we obtain f(x2) ≥ f(x1) + ∇f(x1)^T (x2 − x1), which is the first-order condition for convexity. We have the result.
2-15

Second-order condition of convex functions I

• Let S^n denote the set of symmetric n × n matrices, i.e.,

  S^n = {X ∈ R^{n×n} | X = X^T}

– S^n_{++} (or S^n_+) denotes the set of symmetric positive (semi)definite matrices, i.e., z^T H z > 0 (or z^T H z ≥ 0) for all nonzero z ∈ R^n
• Let S be a nonempty open convex set in R^n and let f : S → R be twice differentiable on S.
• f is convex if and only if its Hessian matrix

  H(x) = [hij(x)]  with  hij(x) = ∂²f(x)/∂xi∂xj

satisfies H(x) ∈ S^n_+ over S (and f is strictly convex if H(x) ∈ S^n_{++} over S)
2-16
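
A sketch of checking the second-order condition numerically (my own example): for f(x) = log(exp(x1) + exp(x2)) the Hessian is diag(p) − p p^T with p the softmax of x, and its smallest eigenvalue should be nonnegative everywhere.

import numpy as np

def hessian_logsumexp(x):
    p = np.exp(x - np.max(x))                # softmax probabilities (stable)
    p = p / p.sum()
    return np.diag(p) - np.outer(p, p)       # Hessian of log-sum-exp

rng = np.random.default_rng(4)
for _ in range(100):
    H = hessian_logsumexp(rng.normal(scale=3.0, size=2))
    assert np.linalg.eigvalsh(H).min() >= -1e-12    # H is in S^n_+ up to round-off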
Second-order condition of convex functions II

Proof
Using convexity of f, f(y) ≥ f(x̄) + ∇f(x̄)^T (y − x̄); for y = x̄ + λx ∈ S with small λ, we have

  f(x̄ + λx) ≥ f(x̄) + λ∇f(x̄)^T x

Using Taylor expansion of f, we also have

  f(x̄ + λx) = f(x̄) + λ∇f(x̄)^T x + (1/2) λ² x^T H(x̄) x + λ² ‖x‖² O(x̄; λx)

where O(x̄; λx) → 0 as λ → 0.

Plugging this in, dividing by λ² and letting λ → 0 yields

  (1/2) x^T H(x̄) x + ‖x‖² O(x̄; λx) ≥ 0,   where the second term → 0,

so H(x̄) ∈ S^n_+.
2-17

Second-order condition of convex functions III

Proof continued
To show the converse, use the ‘mean value theorem’ extended to second order:

Let f : R^n → R be twice continuously differentiable over an open set S, and x̄ ∈ S.
For all y such that x̄ + y ∈ S there exists an α ∈ [0, 1] such that

  f(x̄ + y) = f(x̄) + y^T ∇f(x̄) + (1/2) y^T ∇²f(x̄ + αy) y

In using the mean value theorem, let x = x̄ + y ∈ S. Then,

  f(x) = f(x̄) + y^T ∇f(x̄) + (1/2) y^T ∇²f(x̄ + αy) y
2-18

Second-order condition of convex functions IV

Proof continued
The point x̄ + αy in ∇²f(x̄ + αy) is expressed as

  x̄ + αy = x̄ + α(x − x̄) = αx + (1 − α)x̄ = x̂

The theorem gives

  f(x) = f(x̄) + y^T ∇f(x̄) + (1/2) y^T ∇²f(x̂) y

If (1/2) y^T ∇²f(x̂) y ≥ 0, then

  f(x) ≥ f(x̄) + ∇f(x̄)^T (x − x̄),

which completes the proof.
2-19

Restriction of a convex function to a line I

• f : R^n → R is convex if and only if g : R → R,

  g(t) = f(x + tv)  for dom g = {t | x + tv ∈ dom f},

is convex (in t) for any x ∈ dom f, v ∈ R^n
: used to check convexity of f by checking convexity of functions of one variable

[Figure: surface plots of f(x1, x2) = x1² + x2² (convex) and f(x1, x2) = x1² − x2² (not convex)]
2-20
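
The sketch below applies this line-restriction test to the two functions in the figure; the base point, directions, and sample grid are my own choices. f(x) = x1² + x2² passes the midpoint convexity check along a generic line, while f(x) = x1² − x2² fails along the direction v = (0, 1).

import numpy as np

f1 = lambda x: x[0] ** 2 + x[1] ** 2
f2 = lambda x: x[0] ** 2 - x[1] ** 2

def midpoint_convex_along_line(f, x, v, ts):
    g = lambda t: f(x + t * v)               # restriction g(t) = f(x + t v)
    return all(g(0.5 * (s + t)) <= 0.5 * (g(s) + g(t)) + 1e-12
               for s in ts for t in ts)

x = np.array([1.0, -1.0])
ts = np.linspace(-3, 3, 13)
print(midpoint_convex_along_line(f1, x, np.array([1.0, 2.0]), ts))   # True
print(midpoint_convex_along_line(f2, x, np.array([0.0, 1.0]), ts))   # False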
Restriction of a convex function to a line II

f : S^n → R and f(X) = log det X for dom f = S^n_{++}. Show whether f is convex or not.

Proof
For symmetric V, restrict f to the line X + tV:

  g(t) = log det(X + tV) = log det(X^{1/2} (I + t X^{−1/2} V X^{−1/2}) X^{1/2})
       = log det(X (I + t X^{−1/2} V X^{−1/2}))
       = log det X + log det(I + t X^{−1/2} V X^{−1/2})
       = log det X + log det(I + t Q Λ Q^T)
       = log det X + log det(Q (I + tΛ) Q^T)

where the real symmetric matrix X^{−1/2} V X^{−1/2} = Q Λ Q^T with Q Q^T = Q^T Q = I, and Λ is a diagonal matrix of the eigenvalues λi of X^{−1/2} V X^{−1/2}.
2-21

Restriction of a convex function to a line III

Proof continued

  g(t) = log det X + log det(Q (I + tΛ) Q^T)
       = log det X + log det((I + tΛ) Q^T Q)
       = log det X + log det(I + tΛ)
       = log det X + log Π_{i=1}^n (1 + tλi)
       = log det X + Σ_{i=1}^n log(1 + tλi)

By examining g′′(t), i.e.,

  g′′(t) = − Σ_{i=1}^n λi² / (1 + λi t)² ≤ 0,

we can see that f is concave.
2-22
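
A numerical spot-check of this conclusion (my own construction): sample t in an interval where X + tV stays positive definite and test the midpoint concavity inequality for g(t) = log det(X + tV), using numpy.linalg.slogdet.

import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(4, 4))
X = B @ B.T + 10 * np.eye(4)                 # symmetric positive definite
C = rng.normal(size=(4, 4))
V = 0.5 * (C + C.T)                          # symmetric direction

def g(t):
    sign, logdet = np.linalg.slogdet(X + t * V)
    assert sign > 0                          # X + tV stays in S^n_{++}
    return logdet

ts = np.linspace(-0.3, 0.3, 9)
for s in ts:
    for t in ts:
        assert g(0.5 * (s + t)) >= 0.5 * (g(s) + g(t)) - 1e-10   # concave in t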

Operations that preserve convexity I

• Every norm on R^n is convex
• Let f1, f2, . . . , fk : R^n → R be convex functions.
– Nonnegative weighted sum:

  f(x) = Σ_{i=1}^k αi fi(x)

is convex for αi ≥ 0
– Pointwise maximum or supremum:

  f(x) = max{f1(x), . . . , fk(x)}

– Composition with an affine mapping: Suppose f : R^n → R and h(x) = f(Ax + b). If f is convex, so is h
2-23

Operations that preserve convexity II

Scalar composition f = h(g(x)), where h : R → R and g : R^n → R

• f is convex if h is convex and nondecreasing, and g is convex
• f is convex if h is convex and nonincreasing, and g is concave
• f is concave if h is concave and nondecreasing, and g is concave
• f is concave if h is concave and nonincreasing, and g is convex
2-24
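
An illustrative check of the first scalar-composition rule, with my own example functions: h(u) = exp(u) is convex and nondecreasing and g(x) = x² is convex, so h(g(x)) = exp(x²) passes a midpoint convexity test; with h(u) = u² (convex but not nondecreasing) and g(x) = x² − 1, the composite (x² − 1)² fails it, showing why the monotonicity hypothesis matters.

import numpy as np

def midpoint_convex(f, pts):
    return all(f(0.5 * (a + b)) <= 0.5 * (f(a) + f(b)) + 1e-12
               for a in pts for b in pts)

pts = np.linspace(-2.0, 2.0, 21)
print(midpoint_convex(lambda x: np.exp(x**2), pts))          # True
print(midpoint_convex(lambda x: (x**2 - 1.0)**2, pts))       # False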
Operations that preserve convexity III

• Extended-value extension f̃ of a convex function f:

  f̃(x) = f(x) if x ∈ S,  f̃(x) = ∞ if x ∉ S

– f̃ is defined on R^n, and takes values in R ∪ {∞}.
– If f is convex on the convex set S, f̃ satisfies, for θ ∈ [0, 1],

  f̃(θx1 + (1 − θ)x2) ≤ θ f̃(x1) + (1 − θ) f̃(x2)

• The following statements also hold:
– f is convex if h is convex, h̃ is nondecreasing, and g is convex
– f is convex if h is convex, h̃ is nonincreasing, and g is concave
2-25

Operations that preserve convexity IV

For functions h : R^k → R and g = (g1, . . . , gk) with gi : R^n → R

Vector composition f = h(g(x)) = h(g1(x), . . . , gk(x)), where (for n = 1 and twice differentiable h, g)

  f′′(x) = g′(x)^T ∇²h(g(x)) g′(x) + ∇h(g(x))^T g′′(x)

f is convex if h is convex and nondecreasing in each argument, and the gi are convex
f is convex if h is convex and nonincreasing in each argument, and the gi are concave
f is concave if h is concave and nondecreasing in each argument, and the gi are concave
2-26

Subgradient of convex functions I

A subgradient of convex function f : S → R at x̄ ∈ S is any g ∈ R^n such that

  f(x) ≥ f(x̄) + g^T (x − x̄)  for all x ∈ S

– always exists for convex f (at interior points of S; see the existence result below)
– if f is differentiable at x̄, then the unique subgradient is g = ∇f(x̄)
– a subgradient is also a global underestimator of f at x̄
2-27

Subgradient of convex functions II

Let f(x) = min{f1(x), f2(x)}, where f1 and f2 are defined as

  f1(x) = 4 − |x|  and  f2(x) = 4 − (x − 2)²  for x ∈ R

[Figure: graphs of f1, f2 and f = min{f1, f2}, which has kinks at x = 1 and x = 4]

Subgradient of f at x = 1: λ∇f1(1) + (1 − λ)∇f2(1) for λ ∈ [0, 1]
Subgradient of f at x = 4: λ∇f1(4) + (1 − λ)∇f2(4) for λ ∈ [0, 1]
2-28
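
A tiny numerical sketch of the definition (my own example): for f(x) = |x|, every g in [−1, 1] is a subgradient at the point 0, since |x| ≥ 0 + g·x for all x.

import numpy as np

rng = np.random.default_rng(7)
for _ in range(1000):
    g = rng.uniform(-1.0, 1.0)               # candidate subgradient at 0
    x = rng.normal(scale=5.0)
    assert abs(x) >= g * x - 1e-12           # f(x) >= f(0) + g (x - 0)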
Subgradient of convex functions III

The subdifferential of f at x̄ is the set of all subgradients at x̄:

  ∂f(x̄) = {g | f(x) ≥ f(x̄) + g^T (x − x̄), ∀x ∈ dom f}

• ∂f(x̄) is always a closed convex set (possibly empty), since it is the intersection of an infinite set of halfspaces
2-29

Subgradient of convex functions IV

• If g is a subgradient of f at x, from f(y) ≥ f(x) + g^T (y − x),

  f(y) ≤ f(x) ⇒ g^T (y − x) ≤ 0

• The nonzero subgradients at x define supporting hyperplanes to the sublevel set

  {y | f(y) ≤ f(x)}
2-30

Subgradient of convex functions V

Let S be a convex set in R^n and f : S → R be a convex function. For x̄ ∈ int S, ∂f(x̄) is nonempty.

Proof
Since (x̄, f(x̄)) is a boundary point of the convex set epi f, there is a supporting hyperplane with normal vector [a, b], a ∈ R^n, b ∈ R (not both zero), such that for all (x, z) ∈ epi f

  [a^T  b] ([x; z] − [x̄; f(x̄)]) = a^T (x − x̄) + b(z − f(x̄)) ≤ 0
2-31

Subgradient of convex functions VI

Proof continued

  a^T (x − x̄) + b(z − f(x̄)) ≤ 0

• If b > 0, then as z → ∞ the inequality fails; hence b ≤ 0
• If b = 0 (a vertical hyperplane), we have

  a^T (x − x̄) ≤ 0,

which is impossible for all x ∈ S when x̄ ∈ int S: take x = x̄ + εa ∈ S for small ε > 0; then

  a^T (x − x̄) = ε a^T a ≤ 0,

which implies that a must also be zero, contradicting that a and b are not both zero.
2-32
Subgradient of convex functions VII

Proof continued

  a^T (x − x̄) + b(z − f(x̄)) ≤ 0

• If b < 0, let ã = a/|b| and divide both sides by |b|:

  ã^T x − z ≤ ã^T x̄ − f(x̄)  ⇒  [−ã; 1]^T [x̄; f(x̄)] ≤ [−ã; 1]^T [x; z]

for all (x, z) ∈ epi f; taking z = f(x) gives f(x) ≥ f(x̄) + ã^T (x − x̄), so ã ∈ ∂f(x̄).

• Letting z = f(x), we get a hyperplane H as

  H = {x̂ | â^T (x̂ − x̂0) = 0}

where â = [ã; −1], x̂ = [x; f(x)], and x̂0 = [x̄; f(x̄)].
2-33

Subgradient of convex functions VIII

A subgradient of convex function f : S → R at x̄ ∈ S is any g ∈ R^n such that

  f(x) ≥ f(x̄) + g^T (x − x̄)  for all x ∈ S

This is rewritten as

  f(x) − g^T x ≥ f(x̄) − g^T x̄  ⇒  [g^T  −1] ([x; f(x)] − [x̄; f(x̄)]) ≤ 0

At the point x̄, there exists a supporting hyperplane to epi f with normal [g, −1].
2-34

Subgradient of convex functions IX

The following functions are not subdifferentiable at x = 0:

• f : R → R with dom f = R+,

  f(x) = 1 if x = 0,  f(x) = 0 if x > 0

• f : R → R with dom f = R+,

  f(x) = −√x

The only supporting hyperplane to epi f at (0, f(0)) is vertical.
2-35

Properties of subdifferential I

• Scaling: for λ > 0, the function λf satisfies

  ∂(λf)(x) = λ∂f(x)

• Sum: the function f1 + f2 is convex and

  ∂(f1 + f2)(x) = ∂f1(x) + ∂f2(x)

• Composition with affine mapping: let φ(x) = f(Ax + b). Then,

  ∂φ(x) = A^T ∂f(Ax + b)

• Finite pointwise maximum: if f(x) = max_{i=1,...,n} fi(x), then

  ∂f(x) = conv(∪_{i: fi(x)=f(x)} ∂fi(x)),

the convex hull of the union of the subdifferentials of all ‘active’ functions at x
2-36
Properties of subdifferential II

• Consider a piecewise-linear function

  f(x) = max_{i=1,...,m} (ai^T x + bi)

• The subdifferential at x is a polyhedron

  ∂f(x) = conv{ai | i ∈ I(x)}  with  I(x) = {i | ai^T x + bi = f(x)}

(a small computational sketch follows after the next slide)
2-37

Quasi-convex

Let f : S → R, where S is a nonempty convex set in R^n.
The function f is called quasi-convex (or unimodal)
• if and only if for x1, x2 ∈ S,

  f(λx1 + (1 − λ)x2) ≤ max{f(x1), f(x2)}  for each λ ∈ (0, 1)

• if and only if all its sublevel sets Sα = {x ∈ dom f | f(x) ≤ α} for α ∈ R are convex
• the function f is called quasi-concave if −f is quasi-convex
2-38
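
Returning to the piecewise-linear example on slide 2-37, the sketch below (with made-up data ai, bi) lists the active indices I(x) at a point; the subdifferential there is the convex hull of the corresponding rows ai.

import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # rows a_i^T
b = np.array([0.0, 0.0, -1.0])

def active_set(x, tol=1e-9):
    vals = A @ x + b
    return np.flatnonzero(vals >= vals.max() - tol)   # I(x)

x = np.array([1.0, 1.0])
print(active_set(x))        # indices of the active affine pieces at x
print(A[active_set(x)])     # the a_i whose convex hull is the subdifferential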

First-order condition of Quasi-convex I

Let S be a nonempty open convex set in R^n, and let f : S → R be differentiable on S.
Then, f is quasi-convex if and only if

  for x1, x2 ∈ S with f(x2) ≤ f(x1):  ∇f(x1)^T (x2 − x1) ≤ 0

Proof If f(x2) ≤ f(x1), by definition of quasi-convexity,

  f(λx2 + (1 − λ)x1) = f(x1 + λ(x2 − x1)) ≤ max{f(x1), f(x2)} = f(x1)

We can write for 0 < λ ≤ 1

  [f(x1 + λ(x2 − x1)) − f(x1)] / [λ(x2 − x1)] · (x2 − x1) ≤ 0

As λ → 0, we have the result.
2-39

First-order condition of Quasi-convex II

Proof continued
Suppose f(x2) ≤ f(x1). We assume that x2 > x1 and show that f(z) ≤ f(x1) for z ∈ [x1, x2], which gives quasi-convexity.
Suppose this is not true, i.e., there is a z ∈ [x1, x2] with f(z) > f(x1).
Then, there exists such a z with f′(z) < 0.
By the assumed condition (applied to the pair z, x1, since f(x1) ≤ f(z)), we must have

  f(x1) ≤ f(z)  ⇒  f′(z)(x1 − z) ≤ 0

However, this is a contradiction, since f′(z) < 0 and x1 − z < 0.
2-40
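
As a numerical illustration (my own example, not from the slides): f(x) = x³ is quasi-convex on R but not convex, and both the max-inequality definition and the first-order condition above can be spot-checked on random samples.

import numpy as np

f = lambda x: x**3
fp = lambda x: 3 * x**2                      # derivative f'(x)

rng = np.random.default_rng(8)
for _ in range(2000):
    x1, x2 = 2 * rng.normal(size=2)
    lam = rng.uniform()
    assert f(lam * x1 + (1 - lam) * x2) <= max(f(x1), f(x2)) + 1e-9
    if f(x2) <= f(x1):
        assert fp(x1) * (x2 - x1) <= 1e-9    # first-order condition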
Pseudo-convex

• For a quasi-convex function, ∇f(x) = 0 does not guarantee that x is a global minimizer
• Let S be a nonempty open set in R^n and let f : S → R be differentiable on S.
The function f is called pseudoconvex if for each x1, x2 ∈ S with ∇f(x1)^T (x2 − x1) ≥ 0, we have f(x2) ≥ f(x1)

This shows that if ∇f(x̄) = 0 at any point x̄, we have

  f(x) ≥ f(x̄)  for all x ∈ S,

which implies that x̄ is a global minimum of f.
2-41
