Convex Slides 2014 PDF
CAMBRIDGE, MASS
SPRING 2014
BY DIMITRI P. BERTSEKAS
http://web.mit.edu/dimitrib/www/home.html
LECTURE OUTLINE
• Generic form:
minimize f (x)
subject to x ∈ C
Cost function f : ℜn 7→ ℜ, constraint set C, e.g.,
C = X ∩ {x | h1 (x) = 0, . . . , hm (x) = 0} ∩ {x | g1 (x) ≤ 0, . . . , gr (x) ≤ 0}
• Continuous vs discrete problem distinction
• Convex programming problems are those for
which f and C are convex
− They are continuous problems
− They are nice, and have beautiful and intuitive structure
• However, convexity permeates all of optimization, including discrete problems
• Principal vehicle for the continuous-discrete connection is duality:
− The dual of a discrete problem is continuous/convex
− The dual provides info for the solution of the discrete primal (e.g., lower bounds, etc.)
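As a minimal illustration of this lower-bounding use of duality, the sketch below compares the optimal value of a tiny discrete problem with the values of its Lagrangian dual function; the instance data are hypothetical, chosen only for the example.

```python
from itertools import product

# Hypothetical discrete problem: minimize c'x  s.t.  a'x >= b,  x in {0,1}^3.
c = [3.0, 2.0, 4.0]
a = [2.0, 1.0, 3.0]
b = 3.0

f_star = min(sum(ci * xi for ci, xi in zip(c, x))
             for x in product([0, 1], repeat=3)
             if sum(ai * xi for ai, xi in zip(a, x)) >= b)

def q(mu):
    # dual function: q(mu) = min over x in {0,1}^3 of c'x + mu*(b - a'x),
    # which separates across the binary coordinates
    return mu * b + sum(min(0.0, ci - mu * ai) for ci, ai in zip(c, a))

q_star = max(q(0.01 * k) for k in range(500))   # crude grid over mu >= 0
assert q_star <= f_star + 1e-9                  # every dual value lower-bounds f*
```

The dual function is concave and continuous in µ even though the primal is discrete, which is the continuous-discrete connection described above.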
WHY IS CONVEXITY SO SPECIAL?
A convex set can be described as the union of its points, or dually as the intersection of the halfspaces containing it (an analog of time domain vs. frequency domain descriptions; formalized by the abstract Min Common/Max Crossing theorems).
DUAL DESCRIPTION OF CONVEX FUNCTIONS
(Figure: a convex function f is described dually by the hyperplanes of slope y and normal (−y, 1) that support epi(f ); a second panel shows f1 (x), −f2 (x), the conjugate values f1⋆ (y), f2⋆ (−y), and the optimal x∗ .)
• Primal problem:
min f1 (x) + f2 (x)
x
• Dual problem:
(Figure: f1 (x), −f2 (x), the optimal slope y ∗ , and the conjugate values f1⋆ (y), f2⋆ (−y) at a point x∗ .)
min {f1 (x) + f2 (x)} = max {−f1⋆ (y) − f2⋆ (−y)}
 x                      y
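The Fenchel duality identity above can be checked numerically on a small example; the sketch below uses two hypothetical quadratics f1 , f2 and brute-force grid maximization for the conjugates.

```python
# Hypothetical pair: f1(x) = x^2/2, f2(x) = (x-1)^2/2; both sides equal 1/4.
def f1(x): return 0.5 * x * x
def f2(x): return 0.5 * (x - 1.0) ** 2

grid = [i / 100.0 for i in range(-300, 301)]

def conj(f, y):
    # grid approximation of the conjugate f*(y) = sup_x { y*x - f(x) }
    return max(y * x - f(x) for x in grid)

primal = min(f1(x) + f2(x) for x in grid)                 # min_x f1 + f2
dual = max(-conj(f1, y) - conj(f2, -y) for y in grid)     # max_y -f1*(y) - f2*(-y)
assert abs(primal - 0.25) < 1e-3 and abs(dual - 0.25) < 1e-3
```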
(Figure: the Min Common/Max Crossing framework, panels (a), (b), (c): a set M ⊂ ℜn+1 , its min common point w∗ on the vertical axis, and its max crossing point q ∗ .)
Special choices of M
C2 = {(x1 , x2 ) | x1 = 0}
(Figure: the problem classes LP, CONVEX, and NLP, with their principal methods: Duality, Gradient/Newton, Simplex, Subgradient, Cutting plane, Interior point.)
THE RISE OF THE ALGORITHMIC ERA
LECTURE OUTLINE
{x | a′j x ≤ bj , j = 1, . . . , r}
(always convex, closed, not always bounded)
− Cones: Sets C such that λx ∈ C for all λ > 0
and x ∈ C (not always convex or closed)
CONVEX FUNCTIONS
f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y), ∀ x, y ∈ C, ∀ α ∈ [0, 1]
(Figure: for a convex function, the chord from (x, f (x)) to (y, f (y)) lies above the graph.)
(Figure: the epigraphs epi(f ) over dom(f ) of a convex function and of a nonconvex function; the level set {x | f (x) ≤ γ}.)
• (ii) ⇒ (iii): Let {(xk , wk )} ⊂ epi(f ) with (xk , wk ) → (x, w). Then f (xk ) ≤ wk , and
f (x) ≤ lim inf k→∞ f (xk ) ≤ w, so (x, w) ∈ epi(f )
fˆ(x) = 0 if x ∈ (0, 1);  ∞ if x = 0 or x = 1.
Then f and fˆ have the same epigraph, and both are not closed. But f is lower-semicontinuous at all x of its domain while fˆ is not.
• Note that:
− If f is lower semicontinuous at all x ∈ dom(f ),
it is not necessarily closed
− If f is closed, dom(f ) is not necessarily closed
• Proposition: Let f : X 7→ [−∞, ∞] be a function. If dom(f ) is closed and f is lower semicontinuous at all x ∈ dom(f ), then f is closed.
PROPER AND IMPROPER CONVEX FUNCTIONS
(Figure: improper functions over dom(f ): a closed improper function and improper functions that are not closed.)
LECTURE OUTLINE
∂f /∂xi (x) = lim α→0 [f (x + αei ) − f (x)]/α,
Proof that
so x∗ minimizes f over C.
Converse: Assume the contrary, i.e., x∗ minimizes f over C and ∇f (x∗ )′ (x − x∗ ) < 0 for some x ∈ C. By differentiation, we have
lim α↓0 [f (x∗ + α(x − x∗ )) − f (x∗ )]/α = ∇f (x∗ )′ (x − x∗ ) < 0
so f (x∗ + α(x − x∗ )) decreases strictly for sufficiently small α > 0, contradicting the optimality of x∗ . Q.E.D.
PROJECTION THEOREM
(x − x∗ )′ (z − x∗ ) ≤ 0, ∀x∈C
∇f (x∗ )′ (x − x∗ ) ≥ 0, ∀ x ∈ C.
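The projection inequality (x − x∗ )′ (z − x∗ ) ≤ 0 can be verified directly for a simple set; the sketch below projects onto a box (a hypothetical choice of C) and samples points z ∈ C.

```python
import random

def project_box(x, lo=0.0, hi=1.0):
    # Euclidean projection onto the box C = [lo, hi]^n (a closed convex set)
    return [min(hi, max(lo, xi)) for xi in x]

random.seed(0)
x = [2.0, -0.5]
xs = project_box(x)                      # projection x* = (1.0, 0.0)
for _ in range(1000):
    z = [random.random(), random.random()]               # a point z in C
    inner = sum((a - p) * (zi - p) for a, p, zi in zip(x, xs, z))
    assert inner <= 1e-12                # (x - x*)'(z - x*) <= 0 for all z in C
```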
TWICE DIFFERENTIABLE CONVEX FNS
• Given a set X ⊂ ℜn :
• A convex combination of elements of X is a vector of the form α1 x1 + · · · + αm xm , where xi ∈ X, αi ≥ 0, and α1 + · · · + αm = 1.
• The convex hull of X, denoted conv(X), is the intersection of all convex sets containing X. (Can be shown to be equal to the set of all convex combinations from X.)
• The affine hull of X, denoted aff(X), is the intersection of all affine sets containing X (an affine set is a set of the form x + S, where S is a subspace).
• A nonnegative combination of elements of X is a vector of the form α1 x1 + · · · + αm xm , where xi ∈ X and αi ≥ 0 for all i.
• The cone generated by X, denoted cone(X), is
the set of all nonnegative combinations from X:
− It is a convex cone containing the origin.
− It need not be closed!
− If X is a finite set, cone(X) is closed (non-
trivial to show!)
CARATHEODORY’S THEOREM
(Figure: (a) a point z ∈ conv(X) expressed via points x1 , . . . , x4 of X; (b) a point of cone(X) generated by x1 , x2 , x3 . A companion figure lifts X to Y = {(x, 1) | x ∈ X} ⊂ ℜn+1 .)
AN APPLICATION OF CARATHEODORY
LECTURE OUTLINE
(Figure: the Line Segment Principle: for x ∈ ri(C) and x̄ ∈ cl(C), the points xα = αx + (1 − α)x̄ lie in ri(C); ǫ-spheres Sα shrink along the segment. A second figure shows z1 and z2 that are linearly independent, belong to C, and span aff(C).)
• Proposition:
(a) We have cl(C) = cl(ri(C)) and ri(C) = ri(cl(C)).
(b) Let C be another nonempty convex set. Then
the following three conditions are equivalent:
(i) C and C have the same rel. interior.
(ii) C and C have the same closure.
(iii) ri(C) ⊂ C ⊂ cl(C).
Proof: (a) Since ri(C) ⊂ C, we have cl(ri(C)) ⊂ cl(C). Conversely, let x ∈ cl(C) and let x̄ ∈ ri(C). By the Line Segment Principle, αx̄ + (1 − α)x ∈ ri(C) for all α ∈ (0, 1], so x ∈ cl(ri(C)).
The proof of ri(C) = ri(cl(C)) is similar.
LINEAR TRANSFORMATIONS
x + αd ∈ C, ∀ x ∈ C, ∀ α ≥ 0
(Figure: the recession cone RC of a convex set C: the directions d with x + αd ∈ C for all x ∈ C and α ≥ 0.)
RC∩D = RC ∩ RD
(Figure: the vectors zk = x̄ + kd and the normalized directions dk .)
zk = x̄ + kd,  dk = (zk − x)‖d‖/‖zk − x‖
We have
dk /‖d‖ = (‖zk − x̄‖/‖zk − x‖)(d/‖d‖) + (x̄ − x)/‖zk − x‖,  where ‖zk − x̄‖/‖zk − x‖ → 1 and (x̄ − x)/‖zk − x‖ → 0,
(Figure: the nested sets Ck and points yk , yk+1 in the proof for the intersection A ∩ C.)
LECTURE OUTLINE
(Figure: the “slice” {(x, γ) | f (x) ≤ γ} of epi(f ) and the recession cone of f .)
(b) This is the case where RVγ = {(0, 0)} for all γ
such that Vγ is nonempty.
RECESSION FUNCTION
(Figure: epi(rf ) = Repi(f ) : the recession function rf of f is the function whose epigraph is the recession cone of epi(f ).)
• We have
Rf = {d | (d, 0) ∈ Repi(f ) } = {d | rf (d) ≤ 0}
(Figure: panels (a)-(f) plot f (x + αd) as a function of α; the asymptotic slope is rf (d), and rf (d) = 0 in panels (c) and (d).)
• Consider
f (x) = x′ Qx + a′ x + b
Rf = {d | Qd = 0, a′ d ≤ 0}
Lf = (Rf ) ∩ (−Rf ) = {d | Qd = 0, a′ d = 0}
• Recession function:
rf (d) = a′ d if d ∈ N (Q);  ∞ if d ∉ N (Q).
lim k→∞ f (xk ) = ∞.
X ∗ = ∩∞ k=1 (X ∩ Vk ).
f = f1 + · · · + fm
LECTURE OUTLINE
• Hyperplanes
• Supporting and Separating Hyperplane Theorems
• Strict Separation
• Proper Separation
• Nonvertical Hyperplanes
Positive Halfspace: {x | a′ x ≥ b}
Hyperplane: {x | a′ x = b} = {x | a′ x = a′ x̄}
Negative Halfspace: {x | a′ x ≤ b}
either a′ x1 ≤ b ≤ a′ x2 , ∀ x1 ∈ C1 , ∀ x2 ∈ C2 ,
or a′ x2 ≤ b ≤ a′ x1 , ∀ x1 ∈ C1 , ∀ x2 ∈ C2
(Figure: hyperplanes separating C1 and C2 , panels (a), (b).)
a′ x1 < b < a′ x2 , ∀ x1 ∈ C1 , ∀ x2 ∈ C2
(Figure: strict separation of C1 and C2 at minimum-distance points x1 , x2 , panels (a), (b).)
SUPPORTING HYPERPLANE THEOREM
(Figure: supporting hyperplanes at points x̂k obtained by projecting a sequence xk → x̄ onto C, with normals a0 , a1 , a2 , a3 → a.)
a′ x1 ≤ a′ x2 , ∀ x1 ∈ C1 , ∀ x2 ∈ C2 .
C1 − C2 = {x2 − x1 | x1 ∈ C1 , x2 ∈ C2 }
(Figure: separation of C1 and C2 ; panels contrasting proper separation with improper separation, where the hyperplane contains both sets.)
ri(C1 ) ∩ ri(C2 ) = Ø
PROPER POLYHEDRAL SEPARATION
ri(C) ∩ ri(P ) = Ø
LECTURE OUTLINE
(Figure: the conjugate f ⋆ (y): the hyperplane of slope y and normal (−y, 1) supporting epi(f ) from below crosses the vertical axis at −f ⋆ (y).)
f ⋆ (y) = sup x∈ℜn {x′ y − f (x)},  y ∈ ℜn
• For f (x) = αx − β:  f ⋆ (y) = β if y = α;  ∞ if y ≠ α
(Figure: the linear function of slope α and its conjugate, finite only at y = α.)
• For f (x) = |x|:  f ⋆ (y) = 0 if |y| ≤ 1;  ∞ if |y| > 1
(Figure: the absolute value function and its conjugate, the indicator of [−1, 1].)
CONJUGATE OF CONJUGATE
x′ y − f (x)
as x ranges over ℜn .
• Consider the conjugate of the conjugate:
f ⋆⋆ (x) = sup y∈ℜn {y ′ x − f ⋆ (y)},  x ∈ ℜn .
(Figure: f ⋆⋆ is the pointwise supremum over y ∈ ℜn of the hyperplanes H = {(x, w) | w − x′ y = −f ⋆ (y)} with normal (−y, 1); geometrically, it is the convex closure of f .)
(a) We have
f (x) ≥ f ⋆⋆ (x), ∀ x ∈ ℜn
f (x) = f ⋆⋆ (x), ∀ x ∈ ℜn
cl ˇf (x) = f ⋆⋆ (x),  ∀ x ∈ ℜn
PROOF OF CONJUGACY THEOREM (A), (B)
Hence
f ⋆ (y) = sup z∈ℜn {y ′ z − f (z)} ≤ c < y ′ x − f ⋆⋆ (x),
contradicting the fact f ⋆⋆ (x) = sup y∈ℜn {y ′ x − f ⋆ (y)}. Thus, epi(f ⋆⋆ ) ⊂ epi(f ), which implies
We have
f ⋆ (y) = ∞, ∀ y ∈ ℜn ,  f ⋆⋆ (x) = −∞, ∀ x ∈ ℜn .
But cl ˇf = f , so cl ˇf 6= f ⋆⋆ .
A FEW EXAMPLES
• ℓp and ℓq norm conjugacy, where 1/p + 1/q = 1:
f (x) = (1/p) Σi=1..n |xi |p ,  f ⋆ (y) = (1/q) Σi=1..n |yi |q
• Quadratic:
f (x) = (1/2) x′ Qx + a′ x + b,  f ⋆ (y) = (1/2) (y − a)′ Q−1 (y − a) − b.
• Conjugate of a function obtained by invertible linear transformation/translation of a function p:
f (x) = p(A(x − c)) + a′ x + b,  f ⋆ (y) = q((A′ )−1 (y − a)) + c′ y + d,
where q is the conjugate of p and d = −(c′ a + b).
LECTURE 8
LECTURE OUTLINE
f ⋆⋆ (x) = sup y∈ℜn {y ′ x − f ⋆ (y)},  x ∈ ℜn
(Figure: f ⋆⋆ as the supremum over y ∈ ℜn of the hyperplanes H = {(x, w) | w − x′ y = −f ⋆ (y)}.)
x̂
y
(Figure: the support function: σX (y)/‖y‖ is the signed distance from the origin to the supporting hyperplane of X with normal y.)
• If C is a cone,
σC (y) = sup x∈C y ′ x = 0 if y ′ x ≤ 0, ∀ x ∈ C;  ∞ otherwise
C ∗ = {y | y ′ x ≤ 0, ∀ x ∈ C}
• By the Conjugacy Theorem the polar cone of C ∗ is cl(conv(C)). This is the Polar Cone Theorem.
(Figure: (a) a polyhedral cone C generated by a1 , a2 , with polar C ∗ = {y | y ′ a1 ≤ 0} ∩ {y | y ′ a2 ≤ 0}; (b) a nonpolyhedral cone and its polar.)
POLYHEDRAL CONES - FARKAS’ LEMMA
(Figures, repeated from earlier lectures: union-of-points vs. intersection-of-halfspaces descriptions of a convex set; the conjugate f ⋆ (y); the Min Common/Max Crossing framework, panels (a), (b), (c).)
MATHEMATICAL FORMULATIONS
ξ ≤ w + µ′ u, ∀ (u, w) ∈ M
Max crossing problem is to maximize ξ subject to
ξ ≤ inf (u,w)∈M {w + µ′ u},  µ ∈ ℜn , or
maximize q(µ) = inf (u,w)∈M {w + µ′ u}
subject to µ ∈ ℜn
GENERIC PROPERTIES – WEAK DUALITY
subject to µ ∈ ℜn
M = epi(p)
and finally
q(µ) = inf u∈ℜm {p(u) + µ′ u}
w∗ = inf x∈ℜn F (x, 0)
LECTURE 9
LECTURE OUTLINE
(Figure: the Min Common/Max Crossing framework, panels (a), (b), (c).)
REVIEW OF THE MC/MC FRAMEWORK
w∗ = inf (0,w)∈M w,  q ∗ = sup µ∈ℜn q(µ),  q(µ) = inf (u,w)∈M {w + µ′ u}
minimize c′ x
subject to a′j x ≥ bj , j = 1, . . . , r,
where c ∈ ℜn , aj ∈ ℜn , and bj ∈ ℜ, j = 1, . . . , r.
• For µ ≥ 0, the dual function is q(µ) = b′ µ if Σj aj µj = c, and −∞ otherwise, so the dual problem is
maximize b′ µ
subject to Σj=1..r aj µj = c,  µ ≥ 0
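A hypothetical small LP and its dual, solved by brute force, illustrate that the two optimal values coincide (no duality gap for feasible LPs):

```python
# Hypothetical LP:  minimize x1 + x2  s.t.  x1 >= 1, x2 >= 1, x1 + x2 >= 3.
# Its dual (one multiplier mu_j >= 0 per constraint, sum_j a_j mu_j = c):
#   maximize mu1 + mu2 + 3*mu3,  mu1 + mu3 = 1,  mu2 + mu3 = 1,  mu >= 0.
vals = [k / 100.0 for k in range(100, 401)]          # grid for x1, x2 in [1, 4]
primal = min(x1 + x2 for x1 in vals for x2 in vals if x1 + x2 >= 3.0)
# dual feasibility forces mu1 = mu2 = 1 - mu3 with mu3 in [0, 1]:
dual = max((1.0 - m3) + (1.0 - m3) + 3.0 * m3
           for m3 in [k / 100.0 for k in range(101)])
assert abs(primal - 3.0) < 1e-9 and abs(dual - 3.0) < 1e-9   # both equal 3
```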
MINIMAX PROBLEMS
Given φ : X × Z 7→ ℜ, where X ⊂ ℜn , Z ⊂ ℜm
consider
minimize sup φ(x, z)
z∈Z
subject to x ∈ X
or
maximize inf φ(x, z)
x∈X
subject to z ∈ Z
minimize f (x)
subject to x ∈ X, g(x) ≤ 0
sup µ≥0 L(x, µ) = f (x) if g(x) ≤ 0;  ∞ otherwise,
so the primal problem is min x∈X sup µ≥0 L(x, µ).
• Write the dual problem as
q ∗ = sup µ≥0 inf x∈ℜn L(x, µ),  and ask whether q ∗ = w∗ = inf x∈ℜn sup µ≥0 L(x, µ)
ZERO SUM GAMES
x = (x1 , . . . , xn ), z = (z1 , . . . , zm )
• Min Common/Max Crossing Theorem I: We have q ∗ = w∗ if and only if for every sequence {(uk , wk )} ⊂ M with uk → 0, there holds
w∗ ≤ lim inf k→∞ wk .
LECTURE OUTLINE
(Figure: the Min Common/Max Crossing framework, panels (a), (b), (c).)
REVIEW OF THE MC/MC FRAMEWORK
q ∗ = sup µ∈ℜn q(µ) = sup µ∈ℜn inf (u,w)∈M {w + µ′ u}
is convex.
• Min Common/Max Crossing Theorem I: We have q ∗ = w∗ if and only if for every sequence {(uk , wk )} ⊂ M with uk → 0, there holds
w∗ ≤ lim inf k→∞ wk .
PROOF OF THEOREM I
• Assume that q ∗ = w∗ . Let {(uk , wk )} ⊂ M be such that uk → 0. Then,
q(µ) = inf (u,w)∈M {w + µ′ u} ≤ wk + µ′ uk ,  ∀ k, ∀ µ ∈ ℜn
Conversely, assume that for every sequence {(uk , wk )} ⊂ M with uk → 0, there holds w∗ ≤ lim inf k→∞ wk . If w∗ = −∞, then q ∗ = −∞, by weak duality, so assume that −∞ < w∗ . Steps:
• Step 1: (0, w∗ − ǫ) ∉ cl(M ) for any ǫ > 0.
PROOF OF THEOREM I (CONTINUED)
• Let X ⊂ ℜn , f : X 7→ ℜ, and gj : X 7→ ℜ,
j = 1, . . . , r, be convex. Assume that
f (x) ≥ 0, ∀ x ∈ X with g(x) ≤ 0
Assume there exists a vector x ∈ X such that
gj (x) < 0 for all j = 1, . . . , r. Then there exists
µ ≥ 0 such that
(Figure: crossing hyperplanes with normal (µ, 1).)
• Apply MC/MC to
M = {(u, w) | there is x ∈ X s.t. g(x) ≤ u, f (x) ≤ w}
(Figure: the set M , the curve {(g(x), f (x)) | x ∈ X}, the min common point (0, w∗ ), and a crossing hyperplane with normal (µ, 1) defining D.)
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
• It follows that
f ∗ ≤ inf x∈X {f (x) + µ′ g(x)} ≤ inf x∈X, g(x)≤0 f (x) = f ∗ .
f (x) ≥ 0, ∀ x ∈ X with Ax − b ≤ 0
f (x) + µ′ (Ax − b) ≥ 0, ∀ x ∈ X
M = M̃ − {(u, 0) | u ∈ P }
(Figure: w∗ , the polyhedral set P , and the crossing function q(µ).)
LECTURE OUTLINE
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
subject to x1 ≤ 0, x ∈ X = {x | x ≥ 0}
p(u) = inf x1 ≤u, x≥0 e−√(x1 x2 ) = 0 if u > 0;  1 if u = 0;  ∞ if u < 0
(Figure: the graph of e−√(x1 x2 ) over x ≥ 0 and the primal function p(u), which is not closed at u = 0.)
• Example: with perturbed constraint x2 ≤ u,
q(µ) = inf x∈ℜ {x + µx2 } = −1/(4µ),  ∀ µ > 0,
(Figure: p(u) and epi(p).)
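The dual function above is consistent with a primal of minimizing x subject to x2 ≤ 0 (an assumption made explicit here, since the slide's primal statement is truncated); the sketch checks the closed form for q(µ) by brute force and the strict weak-duality inequality q(µ) < f ∗ = 0.

```python
# Assumed primal: minimize x s.t. x**2 <= 0; only feasible point x = 0, f* = 0.
def q(mu):
    return -1.0 / (4.0 * mu)             # claimed closed form, valid for mu > 0

grid = [i / 1000.0 for i in range(-5000, 5001)]
for mu in [0.5, 1.0, 2.0]:
    brute = min(x + mu * x * x for x in grid)        # inf_x { x + mu*x^2 }
    assert abs(brute - q(mu)) < 1e-3
# weak duality is strict for every mu, yet sup_mu q(mu) = 0 = f* (not attained)
assert all(q(mu) < 0.0 for mu in [1.0, 10.0, 1e6])
```

Here q ∗ = w∗ = 0, but no dual optimal solution exists: the supremum over µ is approached only as µ → ∞.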
QUADRATIC PROGRAMMING DUALITY
! "
(g(x), f (x)) | x ∈ X
Corresponds to optimal x∗
0 u
(µ∗ , 1) Slope =
x∗ Primal feasibility violated
Complementary slackness violated
Ax∗ ≤ b, µ∗ ≥ 0
x∗ = −Q−1 (c + A′ µ∗ )
• The problem is
minimize f (x)
subject to x ∈ X, g(x) ≤ 0, Ax = b,
where X is convex, g(x) = (g1 (x), . . . , gr (x))′ , f : X 7→ ℜ and gj : X 7→ ℜ, j = 1, . . . , r, are convex.
• Convert the constraint Ax = b to Ax ≤ b
and −Ax ≤ −b, with corresponding dual variables
λ+ ≥ 0 and λ− ≥ 0.
• The Lagrangian function is
subject to µ ≥ 0, λ ∈ ℜm .
DUALITY AND OPTIMALITY COND.
= inf x1 ∈ℜn {f1 (x1 ) − λ′ x1 } + inf x2 ∈ℜn {f2 (x2 ) + λ′ x2 }
(Figure: Fenchel duality: f1 (x), −f2 (x), the dual function q(λ) = −f1⋆ (λ) − f2⋆ (−λ), the optimal slope λ∗ , and f ∗ = q ∗ attained at x∗ .)
f1 (x) = f (x),  f2 (x) = 0 if x ∈ C;  ∞ if x ∉ C
f1⋆ (λ) = sup x∈ℜn {λ′ x − f (x)},  f2⋆ (λ) = sup x∈C λ′ x = 0 if λ ∈ C ∗ ;  ∞ if λ ∉ C ∗ ,
where C ∗ = {λ | λ′ x ≤ 0, ∀ x ∈ C}.
• The dual problem is
minimize f ⋆ (λ)
subject to λ ∈ Ĉ,
Ĉ = {λ | λ′ x ≥ 0, ∀ x ∈ C}.
LECTURE OUTLINE
subject to µ ≥ 0
• Important point: The calculation of the dual function has been decomposed into m simpler minimizations.
• Another important point: If Xi is a discrete set (e.g., Xi = {0, 1}), the dual optimal value is a lower bound to the optimal primal value. It is still useful in a branch-and-bound scheme.
LARGE SUM PROBLEMS
minimize f (x)
subject to a′j x ≤ bj , j = 1, . . . , r,
(Figure: the affine set b + S, the cone C, and the optimal primal solution x∗ with dual vector y.)
f1 (x) = f (x),  f2 (x) = 0 if x ∈ C;  ∞ if x ∉ C
f1⋆ (λ) = sup x∈ℜn {λ′ x − f (x)},  f2⋆ (λ) = sup x∈C λ′ x = 0 if λ ∈ C ∗ ;  ∞ if λ ∉ C ∗
minimize c′ x
subject to x − b ∈ S, x ∈ C.
• The conjugate is
minimize b′ λ
subject to λ − c ∈ S ⊥ , λ ∈ Ĉ.
min c′ x ⇐⇒ max b′ λ,
Ax=b, x∈C c−A′ λ∈Ĉ
min c′ x ⇐⇒ max b′ λ,
Ax−b∈C A′ λ=c, λ∈Ĉ
where x ∈ ℜn , λ ∈ ℜm , c ∈ ℜn , b ∈ ℜm , A : m×n.
• Proof of first relation: Let x be such that Ax =
b, and write the problem on the left as
minimize c′ x
subject to x − x ∈ N(A), x∈C
• The dual conic problem is
minimize x′ µ
subject to µ − c ∈ N(A)⊥ , µ ∈ Ĉ
• Using N(A)⊥ = Ra(A′ ), write the constraints
as c − µ ∈ −Ra(A′ ) = Ra(A′ ), µ ∈ Ĉ, or
• Nonnegative Orthant: C = {x | x ≥ 0}
• The Second Order Cone: Let
C = {(x1 , . . . , xn ) | xn ≥ √(x21 + · · · + x2n−1 )}
(Figure: the second order cone in ℜ3 .)
minimize c′ x
subject to Ai x − bi ∈ Ci , i = 1, . . . , m,
C = C1 × · · · × Cm
Ax − b ∈ C
SECOND ORDER CONE DUALITY
min c′ x ⇐⇒ max b′ λ,
Ax−b∈C A′ λ=c, λ∈Ĉ
where λ = (λ1 , . . . , λm ).
• The duality theory is no more favorable than
the one for linear-conic problems.
• There is no duality gap if there exists a feasible
solution in the interior of the 2nd order cones Ci .
• Generally, 2nd order cone problems can be
recognized from the presence of norm or convex
quadratic functions in the cost or the constraint
functions.
• There are many applications.
EXAMPLE: ROBUST LINEAR PROGRAMMING
minimize c′ x
subject to a′j x ≤ bj , ∀ (aj , bj ) ∈ Tj , j = 1, . . . , r,
minimize c′ x
subject to gj (x) ≤ 0, j = 1, . . . , r,
where gj (x) = sup(aj ,bj )∈Tj {a′j x − bj }.
• For the special choice where Tj is an ellipsoid,
Tj = {(aj + Pj uj , bj + qj′ uj ) | ‖uj ‖ ≤ 1, uj ∈ ℜnj },
we have
gj (x) = sup ‖uj ‖≤1 {(aj + Pj uj )′ x − (bj + qj′ uj )} = ‖Pj′ x − qj ‖ + a′j x − bj .
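The closed form for gj above can be checked against direct maximization over the unit ball; all data below are hypothetical.

```python
import math, random

random.seed(1)
# Hypothetical 2-d data for one ellipsoidal uncertainty set T_j:
a = [1.0, -2.0]; b = 0.5
P = [[0.3, 0.1], [0.0, 0.2]]             # the matrix P_j (as rows)
qv = [0.05, -0.1]                        # the vector q_j
x = [1.5, 0.7]

def dot(u, v): return sum(ui * vi for ui, vi in zip(u, v))

# closed form: ||P'x - q|| + a'x - b
Ptx = [P[0][j] * x[0] + P[1][j] * x[1] for j in range(2)]
closed = math.hypot(Ptx[0] - qv[0], Ptx[1] - qv[1]) + dot(a, x) - b

best = -float("inf")
for _ in range(20000):
    t = random.uniform(0.0, 2.0 * math.pi)
    u = [math.cos(t), math.sin(t)]       # ||u|| = 1; the sup is attained there
    Pu = [dot(row, u) for row in P]
    best = max(best, dot([ai + pi for ai, pi in zip(a, Pu)], x) - (b + dot(qv, u)))
assert best <= closed + 1e-9 and closed - best < 1e-3
```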
LECTURE OUTLINE
• Examples:
xk+1 = xk + αk dk , k = 0, 1, . . .
dk = −Sk ∇f (xk )
f ′ (x; d) = lim α↓0 [f (x + αd) − f (x)]/α,  x ∈ dom(f ), d ∈ ℜn
(Figure: f (x + αd) as a function of α; the difference quotient (f (x + αd) − f (x))/α decreases to f ′ (x; d) as α ↓ 0.)
xk+1 = xk − αk Sk ∇f (xk ), k = 0, 1, . . .
(Figure: surface and contour plots of a cost function of two variables.)
CONSTRAINED CASE: GRADIENT PROJECTION
xk+1 = xk + αk dk
(Figure: gradient projection: xk+1 = PX (xk − α∇f (xk )).)
ψi,k = ψi−1,k − (1/‖ci ‖2 )(c′i ψi−1,k − bi )ci ,  i = 1, . . . , m,
for the case fi (x) = (1/(2‖ci ‖2 ))(c′i x − bi )2
(Figure: iterates x0 , x1 , x2 converging to x∗ across the hyperplanes c′i x = bi .)
COMPARE W/ NONINCREMENTAL GRADIENT
(Figure: the minima of the individual components are spread over a region around x∗ , between min i bi /ai and max i bi /ai .)
xk+1 = PX (xk − αk Σℓ=0..m−1 ∇fik−ℓ (xk−ℓ ))
• Stochastic gradient method for f (x) = E{F (x, w)}, where w is a random variable, and F (·, w) is a convex function for each value of w:
xk+1 = PX (xk − αk ∇F (xk , wk ))
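A minimal stochastic gradient sketch for a made-up F (x, w) = (1/2)(x − w)2 with w ∈ {0, 2} equally likely, whose minimizer is x∗ = E{w} = 1:

```python
import random
random.seed(0)

# Minimize f(x) = E[ 0.5*(x - w)**2 ], w uniform on {0, 2}; minimizer x* = 1.
x = 5.0
for k in range(20000):
    w = random.choice([0.0, 2.0])        # sample w_k
    grad = x - w                         # gradient of F(., w_k) at x_k
    x -= (1.0 / (k + 1)) * grad          # diminishing stepsize alpha_k = 1/(k+1)
assert abs(x - 1.0) < 0.1
```

With this stepsize the iterate reduces to a running average of the samples, which is why the diminishing-stepsize condition Σ αk = ∞, αk → 0 appears in the convergence theory.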
LECTURE OUTLINE
(Figure: the translated epigraph of f and a nonvertical supporting hyperplane with normal (−g, 1) at (x, f (x)); g is a subgradient of f at x.)
• Some examples: f (x) = |x| and f (x) = max{0, (1/2)(x2 − 1)}
(Figure: these functions and their subdifferentials ∂f (x), which are intervals at the points of nondifferentiability.)
(Figure: translated epigraphs of f and fx with normals (−g, 1).)
δC (x) + g ′ (z − x) ≤ δC (z), ∀ z ∈ C,
(Figure: the normal cone NC (x) of C at x.)
CALCULUS OF SUBDIFFERENTIALS
F = f1 + · · · + fm .
Assume that ∩m i=1 ri(dom(fi )) 6= Ø. Then
g ′ (x − x∗ ) ≥ 0, ∀ x ∈ X.
Proof: x∗ minimizes
Q.E.D.
ILLUSTRATION OF OPTIMALITY CONDITION
(Figure: level sets of f and the normal cone NX (x∗ ): optimality holds when −∇f (x∗ ) ∈ NX (x∗ ), or more generally −g ∈ NX (x∗ ) for some g ∈ ∂f (x∗ ); panels also show points where the condition, or complementary slackness, fails.)
which is equivalent to
∇f (x∗ )′ (x − x∗ ) ≥ 0, ∀ x ∈ X.
• Let
f (x) = max φ(x, z),
z∈Z
where x ∈ ℜn , z ∈ ℜm , φ : ℜn × ℜm 7→ ℜ is
a function, Z is a compact subset of ℜm , φ(·, z)
is convex and differentiable for each z ∈ Z, and
∇x φ(x, ·) is continuous on Z for each x. Then
∂f (x) = conv{∇x φ(x, z) | z ∈ Z(x)},  x ∈ ℜn ,
• Special case: f (x) = max{φ1 (x), . . . , φm (x)}, where the φi are differentiable convex. Then
∂f (x) = conv{∇φi (x) | i ∈ I(x)},
where
I(x) = {i | φi (x) = f (x)}
IMPORTANT ALGORITHMIC POINT
minimize f (x)
subject to x ∈ X, g(x) ≤ 0,
where f : ℜn 7→ ℜ, g : ℜn 7→ ℜr , X ⊂ ℜn . Con-
sider the dual problem maxµ≥0 q(µ), where
q(µ) = inf f (x) + µ′ g(x) .
x∈X
LECTURE OUTLINE
(Figures, repeated: the subgradient hyperplane with normal (−g, 1); surface and contour plots of the example function.)
xk+1 = PX (xk − αk gk ),
where gk ∈ ∂f (xk ).
(Figure: level sets of f and of q; the subgradient direction −gk makes an angle of less than 90◦ with directions from xk toward the optimal set, so xk+1 = PX (xk − αk gk ) moves closer to every optimal solution. Dual view: mk+1 = PM (mk + sk gk ).)
‖xk+1 − y‖2 ≤ ‖xk − y‖2 − 2αk (f (xk ) − f (y)) + α2k ‖gk ‖2
(Figure: the optimal solution set and the iterates, starting from x0 , approaching it.)
STEPSIZE RULES
• Constant Stepsize: αk ≡ α.
• Diminishing Stepsize: αk → 0,  Σk αk = ∞
• Dynamic Stepsize:
αk = (f (xk ) − fk )/c2 ,
where fk is an estimate of f ∗ :
− If fk = f ∗ , makes progress at every iteration.
If fk < f ∗ it tends to oscillate around the
optimum. If fk > f ∗ it tends towards the
level set {x | f (x) ≤ fk }.
− fk can be adjusted based on the progress of
the method.
• Example of dynamic stepsize rule:
fk = min 0≤j≤k f (xj ) − δk ,
f ≤ f ∗ + αc2 /2.
• Proposition: If αk satisfies
lim k→∞ αk = 0,  Σ∞ k=0 αk = ∞,
then f = f ∗ .
• Similar propositions for dynamic stepsize rules.
• Many variants ...
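One common variant uses the Polyak stepsize αk = (f (xk ) − f ∗ )/‖gk ‖2 , applicable when f ∗ is known (the dynamic rule above with fk = f ∗ and c replaced by ‖gk ‖); a sketch on a hypothetical nonsmooth function:

```python
# Minimize f(x) = |x1| + |x2| (so f* = 0) by the subgradient method with the
# Polyak stepsize alpha_k = (f(x_k) - f*) / ||g_k||^2, assuming f* is known.
def f(x): return abs(x[0]) + abs(x[1])

def subgrad(x):
    # one valid subgradient per coordinate (sign, with +1 chosen at 0)
    return [1.0 if xi >= 0 else -1.0 for xi in x]

x = [3.0, -4.0]
for _ in range(200):
    g = subgrad(x)
    alpha = (f(x) - 0.0) / sum(gi * gi for gi in g)
    x = [xi - alpha * gi for xi, gi in zip(x, g)]
assert f(x) < 1e-3
```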
CONVERGENCE METHODOLOGY I
Yk+1 ≤ (1 + Vk )Yk − Zk + Wk , ∀ k,
and
Σ∞ k=0 Wk < ∞,  Σ∞ k=0 Vk < ∞.
Then {Yk } converges and Σ∞ k=0 Zk < ∞.
• Supermartingale convergence (stochastic case):
Let {Yk }, {Zk }, {Wk }, and {Vk } be four nonneg-
ative sequences of random variables, and let Fk ,
k = 0, 1, . . ., be sets of random variables such that
Fk ⊂ Fk+1 for all k. Assume that
(1) For each k, Yk , Zk , Wk , and Vk are functions
of the random variables in Fk .
(2) E{Yk+1 | Fk } ≤ (1 + Vk )Yk − Zk + Wk , ∀ k
(3) There holds Σ∞ k=0 Wk < ∞, Σ∞ k=0 Vk < ∞
Then {Yk } converges to some random variable Y , and Σ∞ k=0 Zk < ∞, with probability 1.
CONVERGENCE FOR DIMINISHING STEPSIZE
and αk satisfies
Σ∞ k=0 αk = ∞,  Σ∞ k=0 α2k < ∞.
LECTURE OUTLINE
and gi is a subgradient of f at xi .
(Figure: the cutting planes f (x0 ) + (x − x0 )′ g0 and f (x1 ) + (x − x1 )′ g1 , and the iterates x2 , x3 approaching x∗ over X.)
CONVERGENCE OF CUTTING PLANE METHOD
Fk (x) = max{f (x0 ) + (x − x0 )′ g0 , . . . , f (xk ) + (x − xk )′ gk }
(Figure: the polyhedral model Fk and the iterates over X.)
Q.E.D.
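The cutting plane iteration above can be sketched in one dimension, with the model minimization done by brute force over a grid standing in for X (a hypothetical instance):

```python
# Cutting plane method for f(x) = x**2 over X = [-2, 2] (grid-approximated).
def f(x): return x * x
def g(x): return 2.0 * x                     # gradient, hence a subgradient

X = [i / 100.0 for i in range(-200, 201)]    # grid model of X
cuts = []                                    # stored (x_i, f(x_i), g_i)
x = 2.0
for _ in range(20):
    cuts.append((x, f(x), g(x)))
    # next iterate minimizes F_k(z) = max_i { f(x_i) + (z - x_i) g_i } over X
    x = min(X, key=lambda z: max(fi + (z - xi) * gi for xi, fi, gi in cuts))
assert abs(x) < 0.02 and f(x) < 1e-3
```

Each iteration adds one linear cut, so the model Fk tightens monotonically from below, which is exactly the mechanism the convergence proof exploits.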
TERMINATION
(Figure: termination test: the gap between f (xk ) and the model value at xk ; the model built from cutting planes outer-approximates epi(f ) over X using intersections of halfspaces.)
SIMPLICIAL DECOMPOSITION IDEAS
(Figure: simplicial decomposition: from each xk , the gradient ∇f (xk ) generates an extreme point x̃k+1 of X; here x4 = x∗ . Level sets of f shown.)
• Approximating problem (min over a simplex):
minimize f (Σj=1..k αj x̃j )
subject to Σj=1..k αj = 1,  αj ≥ 0
minimize ∇f (xk )′ (x − xk )
subject to x ∈ X
minimize f (x)
subject to x ∈ conv(Xk+1 ).
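When the linearized subproblem min ∇f (xk )′ (x − xk ) over X is followed by a line search toward its extreme-point solution rather than a simplex minimization, one gets the closely related conditional gradient (Frank-Wolfe) method; a sketch on a hypothetical box-constrained quadratic:

```python
# Hypothetical problem: X = [0,1]^2, f(x) = (x1-0.3)^2 + (x2-0.9)^2, f* = 0.
def grad(x): return [2.0 * (x[0] - 0.3), 2.0 * (x[1] - 0.9)]

x = [1.0, 0.0]
for k in range(20000):
    gval = grad(x)
    # the linear subproblem min g'(v - x) over the box is solved at a vertex
    v = [0.0 if gi > 0 else 1.0 for gi in gval]
    gamma = 2.0 / (k + 2)                 # standard conditional gradient stepsize
    x = [xi + gamma * (vi - xi) for xi, vi in zip(x, v)]
f_val = (x[0] - 0.3) ** 2 + (x[1] - 0.9) ** 2
assert f_val < 1e-3
```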
CONVERGENCE
∇f (xk )′ (x − xk ) ≥ 0, ∀ x ∈ conv(Xk )
(Figure: outer approximations Ck (x), Ck+1 (x) of c(x); iterates xk , xk+1 and x̃k+1 with −gk ∈ ∂c(x̃k+1 ), gk ∈ conv(Xk ), approaching x∗ ; level sets of f .)
LECTURE 18
LECTURE OUTLINE
• Proximal algorithm
• Convergence
• Rate of convergence
• Extensions
********************************************
(Figure: the proximal iteration: xk+1 minimizes f (x) + (1/2ck )‖x − xk ‖2 ; the quadratic γk − (1/2ck )‖x − xk ‖2 supports f from below at xk+1 .)
By adding over k = 0, . . . , N ,
‖xN +1 − y‖2 + 2 Σk=0..N ck (f (xk+1 ) − f (y)) ≤ ‖x0 − y‖2 ,
with y = x∗ ∈ X ∗ ,
‖xk+1 − x∗ ‖2 ≤ ‖xk − x∗ ‖2 − 2ck (f (xk+1 ) − f (x∗ ))  (∗∗)
Thus kxk − x∗ k2 is monotonically nonincreasing,
so {xk } is bounded.
• If {xk }K → z, the limit point z must belong to
X ∗ , since f (xk ) ↓ f ∗ , and f is closed, so
(Figure: proximal iterates xk , xk+1 , xk+2 approaching x∗ ∈ X ∗ ; the slope of the regularization at xk+1 is (xk − xk+1 )/ck ; growth condition f (x) ≥ f ∗ + β d(x)α near X ∗ .)
lim sup k→∞ d(xk+1 )/d(xk ) ≤ 1/(1 + βc̄),
linear convergence.
• If 1 < α < 2, then
lim sup k→∞ d(xk+1 )/d(xk )1/(α−1) < ∞,
superlinear convergence.
FINITE CONVERGENCE
f ∗ + βd(x) ≤ f (x), ∀ x ∈ ℜn
(Figure: under the growth condition f ∗ + βd(x) ≤ f (x), the proximal algorithm converges finitely: x2 = x∗ in one panel, and x1 = x∗ in a single step for large enough c0 in the other.)
EXTENSIONS
(Figure: the proximal algorithm with nonquadratic regularizations Dk (x, xk ) and Dk+1 (x, xk+1 ).)
LECTURE 19
LECTURE OUTLINE
(Diagram: relations among the methods of this lecture: the Proximal Algorithm and its Bundle versions; Fenchel duality linking it to the Dual Proximal Algorithm (the Augmented Lagrangian method); outer linearization (Cutting Plane) and inner linearization (Simplicial Decomposition).)
RECALL PROXIMAL ALGORITHM
(Figure: the proximal iteration, repeated.)
and
(Figure: the Fenchel duality figure, repeated: f1 , −f2 , q(λ), f ∗ = q ∗ .)
λ ∈ ∂f (x)  ⇐⇒  x ∈ arg min z∈ℜn {f (z) − z ′ λ}
f1 (x) = f (x),  f2 (x) = (1/2ck )‖x − xk ‖2
• The Fenchel dual is
λk+1 = (xk − xk+1 )/ck
(Figure: the primal proximal minimization and its Fenchel dual: the optimal primal proximal solution xk+1 and the optimal dual proximal solution λk+1 , which is the slope at xk+1 .)
DUAL PROXIMAL ALGORITHM
(Figure: primal proximal iteration vs. dual proximal iteration: the dual maximizes δk + x′k λ − (ck /2)‖λ‖2 − f ⋆ (λ); slopes xk , xk+1 on the dual side and λk+1 , 0 on the primal side.)
minimize f (x)
subject to x ∈ X, Ax = b
• Primal and dual functions:
p(u) = inf x∈X, Ax−b=u f (x),  q(λ) = inf x∈X {f (x) + λ′ (Ax − b)}
λk+1 = (xk − xk+1 )/ck = ∇φck (xk ),
where
φc (z) = inf x∈ℜn {f (x) + (1/2c)‖x − z‖2 }
(Figure: f (x) and its smoothed envelope φc (z), which has the same set of minima.)
where
Fk (x) = max{f (x0 ) + (x − x0 )′ g0 , . . . , f (xk ) + (x − xk )′ gk }
(Figure: the model Fk and the slope (xk − xk+1 )/ck over X.)
• Stability issue:
− For large enough ck and polyhedral X, xk+1 is the exact minimum of Fk over X in a single minimization, so it is identical to the ordinary cutting plane method.
(Figure: with a large ck the proximal term is flat and xk+1 jumps to the cutting plane iterate near x∗ .)
pk (x) = (1/2ck )‖x − yk ‖2 ,
where ck is a positive scalar parameter.
• We refer to pk (x) as the proximal term, and to its center yk as the proximal center.
(Figure: the model Fk with proximal center yk , the new iterate xk+1 , and the gap δk between f and Fk .)
gk+1 ∈ arg max λ∈ℜn {x′k+1 λ − f ⋆ (λ)}
(Figure: the dual view: h(λ) with slope xk+1 .)
LECTURE OUTLINE
(Figure: the proximal iteration, repeated.)
xk+1 = arg min x∈ℜn {f (x) + (1/2ck )‖x − xk ‖2 }
minimize f (x)
subject to x ∈ X, Ax = b
• Primal and dual functions:
p(u) = inf x∈X, Ax−b=u f (x),  q(λ) = inf x∈X {f (x) + λ′ (Ax − b)}
λk+1 = λk + ck (Axk+1 − b)
A DIFFICULTY WITH AUGM. LAGRANGIANS
minimize Σi=1..m fi (x)
subject to x ∈ ∩m i=1 Xi ,
S(α, w) ∈ arg min z∈ℜm {‖z‖1 + (1/2α)‖z − w‖2 },
with components
Si (α, w) = wi − α if wi > α;  0 if |wi | ≤ α;  wi + α if wi < −α,  i = 1, . . . , m.
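The shrinkage (soft-thresholding) formula above can be checked componentwise by brute force:

```python
def shrink(alpha, w):
    # componentwise soft-thresholding S(alpha, w)
    return [wi - alpha if wi > alpha else (wi + alpha if wi < -alpha else 0.0)
            for wi in w]

grid = [i / 1000.0 for i in range(-3000, 3001)]
alpha = 0.3
for w in [-1.0, -0.2, 0.0, 0.25, 2.0]:
    # brute-force minimizer of |z| + (z - w)^2 / (2*alpha), one dimension
    brute = min(grid, key=lambda z: abs(z) + (z - w) ** 2 / (2.0 * alpha))
    assert abs(brute - shrink(alpha, [w])[0]) < 1e-3
```

Because the objective separates across components, the m-dimensional proximal step costs no more than m scalar threshold tests, which is what makes ℓ1 -type problems so favorable for proximal methods.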
FAVORABLY STRUCTURED PROBLEMS II
• Basis pursuit:
minimize kxk1
subject to Cx = b,
minimize Σi=1..m fi (xi )
subject to Σi=1..m Ai xi = b,  xi ∈ Xi , i = 1, . . . , m,
or, with auxiliary variables z i ,
subject to Ai xi = z i , xi ∈ Xi , i = 1, . . . , m,  Σi=1..m z i = b,
xik+1 ∈ arg min xi ∈Xi {fi (xi ) + (Ai xi − zki )′ pik + (c/2)‖Ai xi − zki ‖2 },
zk+1 ∈ arg min Σi z i =b  Σi=1..m {(Ai xik+1 − z i )′ pik + (c/2)‖Ai xik+1 − z i ‖2 }
(Figure: the proximal mapping Pc,f (x) and the associated mapping Nc,f (x), with slopes ±1/c, relative to x∗ ∈ X ∗ .)
LECTURE OUTLINE
(Figure: gradient projection, repeated.)
(1/2α)‖y − (x − α∇f (x))‖2 = ℓ(y; x) + (1/2α)‖y − x‖2 + constant
so
PX (x − α∇f (x)) ∈ arg min y∈X {ℓ(y; x) + (1/2α)‖y − x‖2 }
(Figure: the linearization ℓ(x; xk ) plus the proximal term γk − (1/2αk )‖x − xk ‖2 ; the minimizer over X is xk+1 .)
f (y) ≤ ℓ(y; x) + (L/2)‖y − x‖2
(Figure: f lies between its linearization ℓ(y; x) and the quadratic upper bound ℓ(y; x) + (L/2)‖y − x‖2 ; the minimizer of the upper bound is x − (1/L)∇f (x).)
so by setting x = xk ,
∇f (xk )′ (xk+1 − xk ) ≤ −(1/α)‖xk+1 − xk ‖2
• Using this relation and the descent lemma,
f (xk+1 ) ≤ ℓ(xk+1 ; xk ) + (L/2)‖xk+1 − xk ‖2
= f (xk ) + ∇f (xk )′ (xk+1 − xk ) + (L/2)‖xk+1 − xk ‖2
≤ f (xk ) − ((1/α) − (L/2))‖xk+1 − xk ‖2
so α ∈ (0, 2/L) reduces the cost function value.
• If α ∈ (0, 2/L) and x is the limit of a subsequence {xk }K , then f (xk ) ↓ f (x), so ‖xk+1 − xk ‖ → 0. This implies PX (x − α∇f (x)) = x. Q.E.D.
STEPSIZE RULES
f (xk+1 ) ≤ ℓ(xk+1 ; xk ) + (1/2α)‖xk+1 − xk ‖2
f (xk ) − f (xk (β m s)) ≥ σ ∇f (xk )′ (xk − xk (β m s)),  αk = β m s
(Figure: the projection arc xk (α) starting at x0 and the Armijo test along it.)
lim k→∞ d(xk ) = 0,  f (xk ) − f ∗ ≤ L d(x0 )2 /(2k)
(L/2)‖x∗ − xk+1 ‖2 ≤ (L/2)‖x∗ − xk ‖2 − ek+1
0 ≤ (L/2)‖x∗ − xk+1 ‖2 ≤ (L/2) d(x0 )2 − (k + 1) ek+1
GENERALIZATION - EVENTUALLY CONST. αK
(Figure: f with slope ±cǫ outside [−ǫ, ǫ].)
• Unconstrained minimization of
f (x) = (c/2) x2 if |x| ≤ ǫ;  cǫ|x| − cǫ2 /2 if |x| > ǫ
xk+1 = xk − (1/L)∇f (xk ) = xk − (1/c)(cǫ) = xk − ǫ
LECTURE OUTLINE
(Figures: gradient projection and the example function, repeated.)
• Unconstrained minimization of
f (x) = (1/2) x2 if |x| ≤ ǫ;  ǫ|x| − ǫ2 /2 if |x| > ǫ
xk+1 = xk − (1/L)∇f (xk ) = xk − ǫ
βk = θk (1 − θk−1 )/θk−1
(1 − θk+1 )/θ2k+1 ≤ 1/θ2k ,  θk ≤ 2/(k + 2)
zk = xk − α∇f (xk )
xk+1 ∈ arg min x∈X {h(x) + (1/2α)‖x − zk ‖2 }
f (xk+1 ) ≤ ℓ(xk+1 ; xk ) + (1/2α)‖xk+1 − xk ‖2   (1)
αk ↓ α,
f (xk+1 ) ≤ ℓ(xk+1 ; xk ) + (1/2αk )‖xk+1 − xk ‖2 ,
f (xk ) + h(xk ) − min x∈X {f (x) + h(x)} ≤ d(x0 )2 /(2αk),  ∀ k,
where
d(x) = min x∗ ∈X ∗ ‖x − x∗ ‖,  x ∈ ℜn
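A one-dimensional proximal gradient (ISTA) sketch for a hypothetical instance min (1/2)(x − 3)2 + |x|, whose minimizer is x∗ = 2; the proximal step for α| · | is soft-thresholding:

```python
# Proximal gradient for f(x) = 0.5*(x - 3)**2 (smooth) plus h(x) = |x|.
def soft(t, z):
    # prox of t*|.| at z: soft-thresholding
    return max(z - t, 0.0) if z >= 0 else min(z + t, 0.0)

alpha = 0.5                       # stepsize in (0, 1/L]; here L = 1
x = -5.0
for _ in range(100):
    z = x - alpha * (x - 3.0)     # gradient step on the smooth part
    x = soft(alpha, z)            # proximal step on the nonsmooth part
assert abs(x - 2.0) < 1e-6
```

The fixed point x = 2 satisfies soft(α, x − α(x − 3)) = x, matching the optimality condition 0 ∈ (x − 3) + ∂|x| at x = 2.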
SCALED PROXIMAL GRADIENT METHODS
LECTURE OUTLINE
• Incremental methods
• Review of large sum problems
• Review of incremental gradient methods
• Incremental subgradient-proximal methods
• Convergence analysis
• Cyclic and randomized component selection
**************************************************
• References:
(1) D. P. Bertsekas, “Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey”, Lab. for Information and Decision Systems Report LIDS-P-2848, MIT, August 2010.
(2) Published versions in Math. Programming
J., and the edited volume “Optimization for
Machine Learning,” by S. Sra, S. Nowozin,
and S. J. Wright, MIT Press, Cambridge,
MA, 2012.
LARGE SUM PROBLEMS
• Minimize over X ⊂ ℜn
f (x) = Σi=1..m fi (x),  m is very large,
where X , fi are convex. Some examples:
• Dual cost of a separable problem - Lagrangian
relaxation, integer programming.
• Data analysis/machine learning: x is parameter
vector of a model; each fi corresponds to error
between data and output of the model.
− ℓ1 -regularization (least squares plus ℓ1 penalty):
min x  γ Σj=1..n |xj | + Σi=1..m (c′i x − di )2
min x∈ℜ  (1/2)((1 − x)2 + (1 + x)2 )
(Figure: the components (ai x − bi )2 ; their individual minima are spread over a region around x∗ , between min i bi /ai and max i bi /ai .)
• Problem: Minimize
f (x) = Σi=1..m fi (x)
xk+1 = PX (xk − αk ∇̃fik (xk )),
where ∇̃fik (xk ) is a subgradient of the component fik at xk .
• Algorithm
xk+1 = PX (xk − αk ∇̃fik (xk ))
Progress if −2αk (f (xk ) − f (y)) + α2k m2 c2 < 0.
• Result for a constant stepsize αk ≡ α:
lim inf k→∞ f (xk ) ≤ f ∗ + α m2 c2 /2
• Convergence for αk ↓ 0 with Σ∞ k=0 αk = ∞.
CONVERGENCE: RANDOMIZED ORDER
• Algorithm
xk+1 = PX (xk − αk ∇̃fik (xk ))
lim inf k→∞ f (xk ) ≤ f ∗ + α m c2 /2
xk+1 = arg min x∈X {f (x) + (1/2αk )‖x − xk ‖2 }
can be written as
xk+1 = PX (xk − αk ∇̃f (xk+1 ))
xk+1 = arg min x∈X {fik (x) + (1/2αk )‖x − xk ‖2 }
xk+1 = PX (zk − αk ∇̃hik (zk ))
lim inf k→∞ f (xk ) ≤ f ∗ + αβ m2 c2 /2
• Convergence for αk ↓ 0 with Σ∞ k=0 αk = ∞.
CONVERGENCE: RANDOMIZED ORDER
lim inf k→∞ f (xk ) ≤ f ∗ + αβ m c2 /2
zk = arg min x∈ℜn {γ ‖x‖1 + (1/2αk )‖x − xk ‖2 }
• Decomposes into the n one-dimensional minimizations
zkj = arg min xj ∈ℜ {γ |xj | + (1/2αk )|xj − xjk |2 },
xk+1 ∈ arg min x∈ℜn {c dist(x; Xjk ) + (1/2αk )‖x − yk ‖2 }.
The second iteration can be implemented in “closed form,” using projection on Xjk .
LECTURE 24
LECTURE OUTLINE
**************************************
References:
• On-line chapter on algorithms
• Bertsekas, D. P., 1999. Nonlinear Programming,
Athena Scientific, Belmont, MA.
• Beck, A., and Teboulle, M., 2003. “Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization,” Operations Research Letters, Vol. 31, pp. 167-175.
GENERALIZED PROXIMAL-RELATED ALGS
where
(
x ln(x) − 1 if x > 0,
φ(x) = 0 if x = 0,
∞ if x < 0.
• Proximal algorithm:
f (x)
γk − Dk (x, xk )
x0 xk xk+1 xk+2 x∗ X x
GENERALIZED PROXIMAL ALGORITHM
(Figure: generalized proximal iterates xk , xk+1 , xk+2 approaching x∗ , with regularizations Dk (x, xk ) and Dk+1 (x, xk+1 ).)
xk ∈ arg min x∈ℜn {f (x) + Dk (x, xk )}  ⇒  xk ∈ X ∗
• Then strict cost improvement for xk ∉ X ∗ [the second inequality in (2) is strict].
• Guaranteed if f is convex and:
(a) Dk (·, xk ) satisfies (1), and is convex and differentiable at xk .
(b) ri(dom(f )) ∩ ri(dom(Dk (·, xk ))) 6= Ø.
EXAMPLES
where M satisfies
Mk (y, y) = f (y), ∀ y ∈ ℜn , k = 0, 1, . . . ,
Mk (x, xk ) ≥ f (x), ∀ x ∈ ℜn , k = 0, 1, . . .
• Example for case f (x) = R(x) + kAx − bk2 , where
R is a convex regularization function
(Figure: the proximal step with nonquadratic regularization Dk (x, xk ): the optimal primal proximal solution, the optimal dual proximal solution, and the slope λk+1 .)
where xk > 0.
• Fenchel duality ⇒ Augmented Lagrangian method
• Note: The conjugate of the logarithmic

    h(x) = x (ln(x) − 1)   if x > 0,
           0               if x = 0,
           ∞               if x < 0,

is the exponential h*(y) = e^y, and the corresponding iteration is

    x^i_{k+1} = x^i_k e^{ck u^i_{k+1}},   i = 1, . . . , n
EXPONENTIAL AUGMENTED LAGRANGIAN
• A special case for the convex problem
minimize f (x)
subject to g1 (x) ≤ 0, . . . , gr (x) ≤ 0, x ∈ X
• Method:

    xk+1 ∈ arg min_{x∈X} Σ_{i=1}^n { x^i ∇̃_i f(xk) + (1/αk) x^i ( ln(x^i / xk^i) − 1 ) }
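When X is the positive orthant, the minimization above separates across coordinates, and setting the derivative ∇̃_i f(xk) + (1/αk) ln(x^i/xk^i) to zero gives the closed-form multiplicative update x^i_{k+1} = x^i_k e^{−αk ∇̃_i f(xk)}. A sketch (the quadratic test function is an illustrative choice):

```python
import numpy as np

def entropic_descent_step(x, grad, alpha):
    """One step of entropic descent over the positive orthant.

    Minimizing  sum_i [ x^i * g_i + (1/alpha) * x^i * (ln(x^i / xk^i) - 1) ]
    coordinatewise gives the closed-form multiplicative update below;
    note the iterates stay strictly positive automatically.
    """
    return x * np.exp(-alpha * grad)

# Illustrative example: minimize f(x) = ||x - a||^2, a > 0, over x > 0.
a = np.array([1.0, 2.0])
x = np.array([0.5, 0.5])
for _ in range(500):
    x = entropic_descent_step(x, 2.0 * (x - a), alpha=0.1)
# x approaches the unconstrained minimizer a, which lies in the positive orthant
```

The multiplicative form is why no projection is needed: positivity of the iterates is preserved by construction.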
LECTURE OUTLINE
**************************************
References:
• Boyd, S., and Vandenberghe, L., 2004. Convex Optimization, Cambridge U. Press.
• Bertsekas, D. P., 1999. Nonlinear Programming,
Athena Scientific, Belmont, MA.
• Bertsekas, D. P., and Tsitsiklis, J. N., 1989. Parallel and Distributed Computation: Numerical Methods, Prentice-Hall.
INTERIOR POINT METHODS
(Figure: scaled barriers εB(x) and ε′B(x), with ε′ < ε, rising steeply near the boundary of S.)
• Examples:

    B(x) = − Σ_{j=1}^r ln( −gj(x) ),        B(x) = − Σ_{j=1}^r 1 / gj(x).
• Barrier method:

    xk ∈ arg min_{x∈S} { f(x) + εk B(x) },

where εk ↓ 0.
BARRIER METHOD - EXAMPLE
(Figure: barrier minimizers approaching x* = (2, 0) as ε decreases.)
    minimize f(x) = (1/2) ( (x1)² + (x2)² )
    subject to 2 ≤ x1,

with optimal solution x* = (2, 0).
• Logarithmic barrier: B(x) = − ln (x1 − 2)
• We have xk = ( 1 + √(1 + εk), 0 ) from

    xk ∈ arg min_{x1>2} { (1/2) ( (x1)² + (x2)² ) − εk ln(x1 − 2) },

so xk → x* by taking the limit as εk ↓ 0.
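The closed form of the barrier minimizer can be checked numerically; a small sketch:

```python
import math

def barrier_minimizer(eps):
    """Minimizer of (1/2)((x1)^2 + (x2)^2) - eps*ln(x1 - 2) over x1 > 2.

    The x2-minimization gives x2 = 0; setting the x1-derivative to zero,
        x1 - eps/(x1 - 2) = 0   <=>   x1^2 - 2*x1 - eps = 0,
    and the root larger than 2 is x1 = 1 + sqrt(1 + eps).
    """
    return (1.0 + math.sqrt(1.0 + eps), 0.0)

# as eps decreases, the barrier minimizers trace a path toward x* = (2, 0)
path = [barrier_minimizer(eps) for eps in (1.0, 0.1, 0.01, 1e-6)]
```

Each point on the path is strictly interior (x1 > 2), which is the defining feature of the barrier approach.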
minimize c′ x
subject to Ai x − bi ∈ Ci , i = 1, . . . , m,
    minimize c′x + εk Σ_{i=1}^m Bi(Ai x − bi)
subject to x ∈ ℜn , Ai x − bi ∈ int(Ci ), i = 1, . . . , m,
and ǫk ↓ 0.
• Essential to use Newton’s method to solve the
approximating problems.
• Interesting complexity analysis
SEMIDEFINITE PROGRAMMING
maximize b′ λ
subject to D − (λ1 A1 + · · · + λm Am ) ∈ C,
• Problem
minimize f (x)
subject to x ∈ X,
X = X1 × X2 × · · · × Xm ,
x = (x1 , x2 , . . . , xm ),
constrained by xi ∈ Xi .
• (Block) Coordinate descent: At each iteration, minimize the cost with respect to each of the block components xi, in cyclic order
• Proximal variant: each block minimization includes a proximal term,

    x^i_{k+1} ∈ arg min_{ξ∈Xi} { f(x1_{k+1}, . . . , x^{i−1}_{k+1}, ξ, x^{i+1}_k, . . . , x^m_k) + (1/(2c)) ‖ξ − x^i_k‖² },

the coordinatewise version of the proximal minimization of f(x) + (1/(2c)) ‖x − y‖².
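A minimal sketch of cyclic coordinate descent (block size 1, no proximal term) for a strictly convex quadratic, where each scalar minimization has a closed form; Q and b are illustrative:

```python
import numpy as np

def coordinate_descent(Q, b, x0, num_passes=100):
    """Cyclic coordinate descent for f(x) = (1/2) x'Qx - b'x, Q positive definite.

    At each step, f is minimized exactly with respect to one coordinate x^i,
    holding the others fixed:
        x^i := (b_i - sum_{j != i} Q_ij x^j) / Q_ii
    """
    x = np.asarray(x0, dtype=float).copy()
    n = len(x)
    for _ in range(num_passes):
        for i in range(n):
            # Q[i] @ x includes the Q_ii * x^i term, so add it back before dividing
            x[i] = (b[i] - Q[i] @ x + Q[i, i] * x[i]) / Q[i, i]
    return x

Q = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
x = coordinate_descent(Q, b, x0=np.zeros(2))
# converges to the solution of Qx = b
```

For a quadratic this is exactly the Gauss-Seidel iteration, which converges for any positive definite Q.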
COORDINATE DESCENT EXTENSIONS
    Σ_{i=1}^n Gi(xi) = γ ‖x‖₁
S(k + 1) ⊂ S(k), k = 0, 1, . . . ,
• Interpretation of assumptions:
(Figure: sets S(0) ⊃ S(k) ⊃ S(k + 1) shrinking to x*, with iterations along the coordinates x1 and x2 of x = (x1, x2).)
LECTURE OUTLINE
(Figure: epigraphs and effective domains dom(f) of a convex function and a nonconvex function.)
(Figure: convex sets C1 and C2 with points x1, x2 and direction d.)
• Dual description of a convex set: a union of its points, or an intersection of the halfspaces containing it.
• Abstract Min Common/Max Crossing Theorems (cf. the time domain / frequency domain analogy).
(Figure: f(x) with a supporting hyperplane of normal (−y, 1) and slope y.)
• Conjugacy theorem: f = f ⋆⋆
• Support functions
(Figure: support function σX(y); the optimal solution x̂ lies at distance σX(y)/‖y‖ along y.)
(Figure: F(x, z) with f(x) = inf_z F(x, z) and epi(f), in the variables x1, x2.)
MIN COMMON/MAX CROSSING DUALITY
(Figure: min common point w* and max crossing point q* for three choices of the set M, panels (a)-(c).)

Special choices of M
    w* ≤ lim inf_{k→∞} wk.
• Let X ⊂ ℜn, f : X ↦ ℜ, and gj : X ↦ ℜ, j = 1, . . . , r, be convex. Assume that

    f(x) ≥ 0,  ∀ x ∈ X with g(x) ≤ 0.

Let

    Q* = { μ | μ ≥ 0, f(x) + μ′g(x) ≥ 0, ∀ x ∈ X }.
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
(Ax∗ − b)′ µ∗ = 0,
i.e., μ*_j > 0 implies that the jth constraint is tight.
(Applies to inequality constraints only.)
FENCHEL DUALITY
• Primal problem:

    min_x { f1(x) + f2(x) }

• Dual problem:

    max_λ q(λ) = max_λ { −f1*(λ) − f2*(−λ) },    with f* = q*.

(Figure: slopes λ and λ* supporting f1(x) and −f2(x); the optimum is attained at x* where the slopes match.)
CONIC DUALITY
• Apply Fenchel duality with

    f1(x) = f(x),        f2(x) = 0 if x ∈ C,  ∞ if x ∉ C.
• Linear Conic Programming:
minimize c′ x
subject to x − b ∈ S, x ∈ C.
• Equivalent dual linear conic problem:
minimize b′ λ
subject to λ − c ∈ S ⊥ , λ ∈ Ĉ.
• Special Linear-Conic Forms:

    min_{Ax=b, x∈C} c′x    ⇐⇒    max_{c−A′λ∈Ĉ} b′λ,

    min_{Ax−b∈C} c′x    ⇐⇒    max_{A′λ=c, λ∈Ĉ} b′λ,

where x ∈ ℜn, λ ∈ ℜm, c ∈ ℜn, b ∈ ℜm, A : m × n.
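The first special form can be checked on a tiny instance with C the nonnegative orthant (so Ĉ = C, and the pair reduces to ordinary LP duality); the data below are illustrative:

```python
import numpy as np

# Primal:  min c'x   s.t.  Ax = b,  x in C = nonnegative orthant (self-dual: C_hat = C)
# Dual:    max b'lam s.t.  c - A'lam in C_hat
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x_opt   = np.array([1.0, 0.0])   # primal optimal: put all weight on the cheaper variable
lam_opt = np.array([1.0])        # dual optimal: largest lam with c - A'lam >= 0

# feasibility of both points
assert np.allclose(A @ x_opt, b) and np.all(x_opt >= 0)
assert np.all(c - A.T @ lam_opt >= 0)
# strong duality: equal optimal values
assert np.isclose(c @ x_opt, b @ lam_opt)
```

Weak duality (b′λ ≤ c′x for any feasible pair) follows directly from b′λ = x′A′λ ≤ x′c, using x ∈ C and c − A′λ ∈ Ĉ.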
SUBGRADIENTS
(Figure: f(z) and a supporting hyperplane at (x, f(x)) with normal (−g, 1).)
• ∂f(x) ≠ Ø for x ∈ ri(dom(f)).
• Conjugate Subgradient Theorem: If f is closed
proper convex, the following are equivalent for a
pair of vectors (x, y):
(i) x′ y = f (x) + f ⋆ (y).
(ii) y ∈ ∂f (x).
(iii) x ∈ ∂f ⋆ (y).
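The equivalence of (i) and (ii) can be checked numerically for the differentiable case f(x) = (1/2)‖x‖², whose conjugate is f*(y) = (1/2)‖y‖² (an illustrative choice):

```python
import numpy as np

def f(x):       # f(x) = (1/2)||x||^2, closed proper convex
    return 0.5 * np.dot(x, x)

def f_conj(y):  # its conjugate: f*(y) = sup_x {x'y - f(x)} = (1/2)||y||^2
    return 0.5 * np.dot(y, y)

def grad_f(x):  # f is differentiable, so the subdifferential is the singleton {x}
    return x

x = np.array([1.0, -2.0, 3.0])
y = grad_f(x)   # (ii): y is in the subdifferential of f at x
# (i): the Fenchel equality x'y = f(x) + f*(y) holds exactly for this pair
assert np.isclose(np.dot(x, y), f(x) + f_conj(y))
# for a y NOT in the subdifferential, only the strict Fenchel inequality holds
y_bad = y + 1.0
assert np.dot(x, y_bad) < f(x) + f_conj(y_bad)
```

For this f the conjugate is self-dual, so (iii) reads x ∈ ∂f*(y) = {y}, consistent with y = x here.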
• The set of minima X* of a closed proper convex f:
(a) X* = ∂f*(0).
(b) X* is nonempty if 0 ∈ ri(dom(f*)).
(c) X* is nonempty and compact if and only if 0 ∈ int(dom(f*)).
CONSTRAINED OPTIMALITY CONDITION
• x* minimizes f over X if and only if there exists g ∈ ∂f(x*) such that

    g′(x − x*) ≥ 0,  ∀ x ∈ X.
(Figure: level sets of f and the normal cone NC(x*) at x*: in the differentiable case −∇f(x*) ∈ NC(x*); in general −g ∈ NC(x*) for some g ∈ ∂f(x*).)
COMPUTATION: PROBLEM RANKING IN
• Subgradient method:
    xk+1 = PX( xk − αk gk ),    gk ∈ ∂f(xk)

(Figure: level sets of f over X; a step along −gk from xk followed by projection on X.)
(Figure: cutting plane approximation of f(x) built from the linearizations f(x0) + (x − x0)′g0 and f(x1) + (x − x1)′g1; iterates x0, x1, x2, x3 approaching x* over X.)
(Figure: gradient projection iterates x0, . . . , x4 = x* over the level sets of f; each intermediate point x̃k+1 = xk − αk ∇f(xk) is projected back onto X.)
• Proximal algorithm:

    xk+1 = arg min_x { f(x) + (1/(2ck)) ‖x − xk‖² }

(Figure: γk − (1/(2ck))‖x − xk‖² touching f(x) at xk+1, with iterates approaching x* over X.)
(Figure: Primal Proximal Iteration and Dual Proximal Iteration: the approximation Fk(x) touches f(x) at xk+1 with slope λk+1 = (xk − xk+1)/ck, while δk + x′kλ − (ck/2)‖λ‖² touches f*(λ) at λk+1; slopes xk, xk+1, x*, and 0 mark the dual picture.)
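The dual proximal iterate is related to the primal one by λk+1 = (xk − xk+1)/ck ∈ ∂f(xk+1); a minimal numeric sketch for f(x) = |x|, whose proximal step has a closed form (the data are illustrative):

```python
import math

def prox_abs(xk, c):
    """Proximal iteration for f(x) = |x|:
        x_{k+1} = argmin_x { |x| + (1/(2c)) (x - xk)^2 },
    whose closed form is the shrinkage operation below.
    """
    return math.copysign(max(abs(xk) - c, 0.0), xk)

xk, c = 3.0, 1.0
xk1 = prox_abs(xk, c)       # primal proximal iterate
lam = (xk - xk1) / c        # dual proximal iterate
# lam is a subgradient of f at xk1: here f'(2.0) = sign(2.0) = 1.0
assert xk1 == 2.0 and lam == 1.0
```

The subgradient relation follows from the optimality condition 0 ∈ ∂f(xk+1) + (xk+1 − xk)/ck of the proximal minimization.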
DUALITY VIEW OF PROXIMAL METHODS
• Proximal Algorithm ↔ Dual Proximal Algorithm, via Fenchel Duality; the Dual Proximal Algorithm is the Augmented Lagrangian Method.
• Proximal Algorithm + Outer Linearization → Proximal Cutting Plane Algorithm, Bundle Versions.
• Dual Proximal Algorithm + Inner Linearization → Proximal Simplicial Decomposition Algorithm, Bundle Versions.
ADVANCED TOPICS