Professional Documents
Culture Documents
Lecture4 APG
Lecture4 APG
DSA5103 Lecture 4
Yangjing Zhang
05-Sep-2023
NUS
Today’s content
lecture4 1/54
Basic convex analysis
Norms
lecture4 2/54
Inner product
lecture4 3/54
Inner product
From the definition of inner products, the following properties hold for
any matrices A, B, C ∈ Rm×n and any scalars a, b ∈ R
• kAk2F := hA, Ai ≥ 0
• hA, Ai = 0 if and only if A = 0
• hC, aA + bBi = ahC, Ai + bhC, Bi
• hA + B, A + Bi = hA, Ai + 2hA, Bi + hB, Bi
(i.e., kA + Bk2F = kAk2F + 2hA, Bi + kBk2F )
lecture4 4/54
Projection onto a closed convex set
lecture4 5/54
Projection onto a closed convex set
lecture4 5/54
Projection onto a closed convex set
Example.
1. C = Rn+ = {x ∈ Rn | xi ≥ 0, ∀ i = 1, 2, . . . , n} positive orthant
ΠC (z) = ΠRn+ (z) = max{z, 0}
2. C = {x ∈ Rn | kxk2 ≤ 1} `2 -norm ball
z, if kzk2 ≤ 1 z
ΠC (z) = =
z
, if kzk2 > 1 max{kzk2 , 1}
kzk2
Example.
lecture4 7/54
arg min
arg min f (x) denotes the solution set of x for which f (x) attains its
x
minimum (argument of the minimum)
4 1
0.9
3.5
0.8
3
0.7
2.5
0.6
2 0.5
0.4
1.5
0.3
1
0.2
0.5
0.1
0 0
0 0.5 1 1.5 2 2.5 3 3.5 4 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
Definition.
Let X be a Euclidean space (e.g., X = Rn or Rm×n ). Let
f : X → (−∞, +∞] be an extended real-valued function1 .
(1) The (effective) domain of f is defined to be the set
is closed.
(4) f is said to be convex if its epi-graph is convex.
1 Here f is allowed to take the value of +∞, but not allowed to take the value of −∞
lecture4 9/54
Extended real-valued function
lecture4 10/54
Example: extended real-valued function
0, if x = 0
f (x) = 1, if x > 0
+∞, if x < 0
lecture4 11/54
Indicator function
• dom(f ) = C, f is proper
• epi(f ) = C × [0, +∞) is closed if C is closed, i.e., δC (·) is closed if
C is closed
• epi(f ) is convex if C is convex, i.e., δC (·) is convex if C is convex
lecture4 12/54
Dual/polar cone
Definition (Cone)
A set C ⊆ X is called a cone if λx ∈ C when x ∈ C and λ ≥ 0.
C ∗ = {y ∈ X | hx, yi ≥ 0 ∀x ∈ C}
lecture4 13/54
Dual/polar cone
Figure 2: Left: A set C and its dual cone C ∗ . Right: A set C and its polar
cone C o . The dual cone and the polar cone are symmetric to each other with
respect to the origin. Image from internet
lecture4 14/54
Example: self-dual cones
lecture4 15/54
Example: self-dual cones
p
3. X = Rn . C := {x ∈ Rn | x22 + · · · + x2n ≤ x1 , x1 ≥ 0} is a
self-dual closed convex cone (second-order cone)
. Proof: want to show that {y ∈ Rn | hx, yi ≥ 0 ∀x ∈ C} = C
[RHS ⊆ LHS] Take y ∈ C. For any x ∈ C
v v
u n u n
uX uX
2
hx, yi = x1 y1 + x2 y2 + · · · + xn yn ≥ x1 y1 − t xi t yi2 ≥ 0
i=2 i=2
then v v
u n u n
uX 2 2
uX
2
hx, yi = y1 t yi − y2 − · · · − yn ≥ 0 ⇒ y1 ≥ t yi2 ⇒ y ∈ C
i=2 i=2 lecture4 16/54
.
Example: self-dual cones
lecture4 17/54
Normal cone
lecture4 18/54
Example: normal cones
lecture4 19/54
Example: normal cones
lecture4 20/54
Example: normal cones
Example 3. C = a triangle ⊆ R2
lecture4 21/54
Example: normal cones
Example 3. C = a triangle ⊆ R2
lecture4 22/54
Normal cone
lecture4 23/54
Normal cone
lecture4 24/54
Subdifferential
lecture4 25/54
Subdifferential and optimization
lecture4 26/54
Subdifferential and optimization
lecture4 26/54
Example: subdifferential
Example 1.
f (x) = |x|, x ∈ R.
1
0.9
0.8
0.7
{−1}, if x < 0
0.6
0.5
0.1
0
-1 -0.5 0 0.5 1
lecture4 27/54
Example: subdifferential
Example 1.
f (x) = |x|, x ∈ R.
1
0.9
0.8
0.7
{−1}, if x < 0
0.6
0.5
0.1
0
-1 -0.5 0 0.5 1
Example 2.
f (x) = max{x2 − 1, 0}, x ∈ R.
1.5
1
{2x}, if x < −1, x > 1 0.5
{0},
if − 1 < x < 1 0
∂f (x) =
[−2, 0], if x = −1 -0.5
[0, 2], if x = 1 -1
-1.5 -1 -0.5 0 0.5 1 1.5
lecture4 28/54
Example: subdifferential
Proof: Take x ∈ C
v ∈ ∂δC (x)
⇐⇒ δC (z) ≥ δC (x) + hv, z − xi ∀z
⇐⇒ 0 ≥ hv, z − xi ∀z ∈ C
⇐⇒ v ∈ NC (x)
lecture4 29/54
Lipschitz continuous
lecture4 30/54
Fenchel conjugate
Remark
lecture4 31/54
Fenchel conjugate
lecture4 32/54
Example: conjugate function
Its conjugate
∗
δC (y) = sup{hy, xi − δC (x) | x ∈ X } = sup{hy, xi | x ∈ C}
lecture4 33/54
Example: conjugate function
lecture4 34/54
Example: conjugate function
lecture4 34/54
Proximal operator
Moreau envelope and proximal mapping
lecture4 35/54
Example
lecture4 36/54
Example
Soft thresholding
1.5 2 2
1.8 1.8
1
1.6 1.6
1.4 1.4
0.5
1.2 1.2
0 1 1
0.8 0.8
-0.5
0.6 0.6
0.4 0.4
-1
0.2 0.2
-1.5 0 0
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
lecture4 38/54
Moreau envelope and proximal mapping
lecture4 39/54
Moreau envelope and proximal mapping
lecture4 41/54
Optimizing composite functions
minimize
p
f (β) + g(β)
β∈R
lecture4 42/54
Proximal gradient step
minimize
p
f (β) + g(β)
β∈R
lecture4 43/54
Proximal gradient step
lecture4 44/54
Proximal gradient step
lecture4 45/54
Proximal gradient methods
lecture4 46/54
Nesterov’s accelerated method
lecture4 47/54
Sequence tk
√
1+ 1+4t2k
Take t0 = t1 = 1, tk+1 = 2
45 1
40 0.9
0.8
35
0.7
30
0.6
25
0.5
20
0.4
15
0.3
10
0.2
5 0.1
0 0
0 10 20 30 40 50 60 70 80 0 10 20 30 40 50 60 70 80
lecture4 48/54
Accelerated proximal gradient methods
lecture4 50/54
Accelerated proximal gradient methods
lecture4 51/54
APG for lasso
1
minimize kXβ − Y k2 + λkβk1 lasso problem
p
β∈R 2
| {z } | {z }
=g(β)
=f (β)
lecture4 52/54
APG for lasso
0 ∈ ∇f (β) + ∂g(β),
β = Pg (β − ∇f (β)).
Therefore, we can choose a tolerance ε > 0 and stop the method at β (k)
once the stopping criteria is satisfied
β (k) − Sλ β (k) − X T (Xβ (k) − Y ) < ε.
lecture4 53/54
Restart
lecture4 54/54
References i