Convex Optimization — Boyd & Vandenberghe
1. Introduction
• mathematical optimization
• convex optimization
• example
• nonlinear optimization
1–1
Mathematical optimization
minimize f0(x)
subject to fi(x) ≤ bi, i = 1, . . . , m
• f0 : Rn → R: objective function
• fi : Rn → R, i = 1, . . . , m: constraint functions
Introduction 1–2
Examples
portfolio optimization
• variables: amounts invested in different assets
• constraints: budget, max./min. investment per asset, minimum return
• objective: overall risk or return variance
data fitting
• variables: model parameters
• constraints: prior information, parameter limits
• objective: measure of misfit or prediction error
Introduction 1–3
Solving optimization problems
• least-squares problems
• linear programming problems
• convex optimization problems
Introduction 1–4
Least-squares
minimize ‖Ax − b‖2²
analytical solution: x⋆ = (AᵀA)⁻¹Aᵀb; reliable and efficient algorithms; a mature technology
Introduction 1–5
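A minimal numerical sketch in Python (illustrative random data; numpy's lstsq computes the same x⋆ = (AᵀA)⁻¹Aᵀb via a stable factorization):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))    # illustrative data
b = rng.standard_normal(20)

x, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||Ax - b||_2^2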
Linear programming
minimize cT x
subject to aTi x ≤ bi, i = 1, . . . , m
Introduction 1–6
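A sketch of solving a small LP of this form with scipy.optimize.linprog; the data below is made up for illustration:

import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])                 # made-up objective
A = np.array([[1.0, 1.0], [-1.0, 1.0]])    # made-up constraints a_i^T x <= b_i
b = np.array([3.0, 1.0])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
print(res.x, res.fun)                      # optimal x and c^T x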
Convex optimization problem
minimize f0(x)
subject to fi(x) ≤ bi, i = 1, . . . , m
• objective and constraint functions are convex: fi(αx + βy) ≤ αfi(x) + βfi(y) if α + β = 1, α ≥ 0, β ≥ 0
Introduction 1–7
solving convex optimization problems
• no analytical solution
• reliable and efficient algorithms
• computation time (roughly) proportional to max{n³, n²m, F}, where F is the cost of evaluating the fi's and their first and second derivatives
• almost a technology
Introduction 1–8
Example
illumination problem: choose lamp powers pj, 0 ≤ pj ≤ pmax, so that the illumination Ik at each patch k (linear in the lamp powers) is close to a desired value Ides
Introduction 1–9
how to solve?
1. use uniform power: pj = p, vary p
2. use least-squares:
minimize Σ_{k=1}^n (Ik − Ides)²
and round pj into [0, pmax]
Introduction 1–10
5. use convex optimization: problem is equivalent to
minimize f0(p) = max_k h(Ik/Ides)
with h(u) = max{u, 1/u} for u > 0
[figure: graph of h(u)]
Introduction 1–11
additional constraints: does adding 1 or 2 below complicate the problem?
• answer: with (1), still easy to solve; with (2), extremely difficult
• moral: (untrained) intuition doesn’t always work; without the proper
background very easy problems can appear quite similar to very difficult
problems
Introduction 1–12
Course goals and topics
goals
topics
Introduction 1–13
Nonlinear optimization
Introduction 1–14
Brief history of convex optimization
theory (convex analysis): ca. 1900–1970
algorithms
• 1947: simplex algorithm for linear programming (Dantzig)
• 1960s: early interior-point methods (Fiacco & McCormick, Dikin, . . . )
• 1970s: ellipsoid method and other subgradient methods
• 1980s: polynomial-time interior-point methods for linear programming
(Karmarkar 1984)
• late 1980s–now: polynomial-time interior-point methods for nonlinear
convex optimization (Nesterov & Nemirovski 1994)
applications
• before 1990: mostly in operations research; few in engineering
• since 1990: many new applications in engineering (control, signal
processing, communications, circuit design, . . . ); new problem classes
(semidefinite and second-order cone programming, robust optimization)
Introduction 1–15
Convex Optimization — Boyd & Vandenberghe
2. Convex sets
• generalized inequalities
2–1
Affine set
line through x1, x2: all points
x = θx1 + (1 − θ)x2 (θ ∈ R)
[figure: line through x1 and x2, points marked θ = 1.2, 1, 0.6, 0, −0.2]
affine set: contains the line through any two distinct points in the set
line segment between x1 and x2: all points
x = θx1 + (1 − θ)x2
with 0 ≤ θ ≤ 1
convex set: contains line segment between any two points in the set
convex combination of x1, . . . , xk: any point x of the form
x = θ1x1 + θ2x2 + · · · + θkxk
with θ1 + · · · + θk = 1, θi ≥ 0

conic (nonnegative) combination of x1 and x2: any point of the form
x = θ1x1 + θ2x2
with θ1 ≥ 0, θ2 ≥ 0
[figure: conic combinations of x1 and x2, a cone with apex 0]
convex cone: set that contains all conic combinations of points in the set
hyperplane: set of the form {x | aᵀx = b} (a ≠ 0)
halfspace: set of the form {x | aᵀx ≤ b} (a ≠ 0); a is the (outward) normal vector
[figure: hyperplane aᵀx = b through x0 with normal a; halfspaces aᵀx ≥ b and aᵀx ≤ b]

Euclidean ball with center xc and radius r: B(xc, r) = {x | ‖x − xc‖2 ≤ r}

polyhedron: solution set of finitely many linear inequalities and equalities
Ax ⪯ b, Cx = d
(⪯ is componentwise inequality)
[figure: polyhedron P with boundary normals a1, . . . , a5]
example:
[ x y ; y z ] ∈ S2+ ⇐⇒ x ≥ 0, z ≥ 0, xz ≥ y²
[figure: boundary of the positive semidefinite cone in S2, plotted over (x, y, z)]
1. apply definition
example:
S = {x ∈ Rm | |p(t)| ≤ 1 for |t| ≤ π/3}, where p(t) = x1 cos t + x2 cos 2t + · · · + xm cos mt
[figure: p(t) for several x ∈ S; the set S for m = 2]
examples
• scaling, translation, projection
• solution set of linear matrix inequality {x | x1A1 + · · · + xmAm ⪯ B}
(with Ai, B ∈ Sp)
• hyperbolic cone {x | xT P x ≤ (cT x)2, cT x ≥ 0} (with P ∈ Sn+)
images and inverse images of convex sets under perspective are convex
linear-fractional function
f(x) = (Ax + b)/(cᵀx + d), dom f = {x | cᵀx + d > 0}

example: f(x) = (1/(x1 + x2 + 1)) x
[figure: convex set C and its image f(C)]
examples
• nonnegative orthant K = Rn+ = {x ∈ Rn | xi ≥ 0, i = 1, . . . , n}
• positive semidefinite cone K = Sn+
• nonnegative polynomials on [0, 1]: K = {x ∈ Rn | x1 + x2t + · · · + xnt^(n−1) ≥ 0 for t ∈ [0, 1]}

generalized inequality defined by a proper cone K:
x ⪯K y ⇐⇒ y − x ∈ K, x ≺K y ⇐⇒ y − x ∈ int K
examples
• componentwise inequality (K = Rn+)
x ⪯_(Rn+) y ⇐⇒ xi ≤ yi, i = 1, . . . , n

many properties of ⪯K are similar to ≤ on R, e.g.,
x ⪯K y, u ⪯K v =⇒ x + u ⪯K y + v

x ∈ S is the minimum element of S with respect to ⪯K if
y ∈ S =⇒ x ⪯K y
x ∈ S is a minimal element of S with respect to ⪯K if
y ∈ S, y ⪯K x =⇒ y = x
separating hyperplane theorem: if C and D are disjoint convex sets, then there exists a ≠ 0, b such that
aᵀx ≤ b for x ∈ C, aᵀx ≥ b for x ∈ D
[figure: hyperplane separating C and D]

supporting hyperplane to set C at boundary point x0:
{x | aᵀx = aᵀx0}
where a ≠ 0 and aᵀy ≤ aᵀx0 for all y ∈ C
[figure: supporting hyperplane to C at x0 with normal a]
dual cone of a cone K: K∗ = {y | yᵀx ≥ 0 for all x ∈ K}
examples
• K = Rn+: K ∗ = Rn+
• K = Sn+: K ∗ = Sn+
• K = {(x, t) | kxk2 ≤ t}: K ∗ = {(x, t) | kxk2 ≤ t}
• K = {(x, t) | kxk1 ≤ t}: K ∗ = {(x, t) | kxk∞ ≤ t}
y ⪰K∗ 0 ⇐⇒ yᵀx ≥ 0 for all x ⪰K 0
[figure: set S with a minimum element x1 and a set with minimal element x2; supporting weight vectors λ, λ2 shown]

example (n = 2): efficient production frontier
x1, x2, x3 are efficient; x4, x5 are not
[figure: production set P with points x1, . . . , x5; axes are resources such as labor]
3. Convex functions
• quasiconvex functions
3–1
Definition
f : Rn → R is convex if dom f is a convex set and
f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
for all x, y ∈ dom f, 0 ≤ θ ≤ 1
• f is concave if −f is convex
• f is strictly convex if dom f is convex and the inequality holds strictly for x ≠ y, 0 < θ < 1
convex:
• affine: ax + b on R, for any a, b ∈ R
• exponential: eax, for any a ∈ R
• powers: xα on R++, for α ≥ 1 or α ≤ 0
• powers of absolute value: |x|p on R, for p ≥ 1
• negative entropy: x log x on R++
concave:
• affine: ax + b on R, for any a, b ∈ R
• powers: xα on R++, for 0 ≤ α ≤ 1
• logarithm: log x on R++
extended-value extension f̃ of f is f̃(x) = f(x) for x ∈ dom f, f̃(x) = ∞ for x ∉ dom f
first-order condition: differentiable f with convex dom f is convex if and only if
f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y ∈ dom f
[figure: first-order approximation at (x, f(x)) is a global underestimator of f]
f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ Sn,
∇²f(x)ij = ∂²f(x)/(∂xi∂xj), i, j = 1, . . . , n,
exists at each x ∈ dom f
2nd-order condition: f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ dom f
least-squares objective: f(x) = ‖Ax − b‖2², with ∇²f(x) = 2AᵀA ⪰ 0

quadratic-over-linear: f(x, y) = x²/y,
∇²f(x, y) = (2/y³) [y; −x][y; −x]ᵀ ⪰ 0
convex for y > 0
[figure: graph of f(x, y) = x²/y]

log-sum-exp: f(x) = log Σk exp xk is convex:
∇²f(x) = (1/(1ᵀz)) diag(z) − (1/(1ᵀz)²) zzᵀ (zk = exp xk)
to show ∇²f(x) ⪰ 0, we must verify that vᵀ∇²f(x)v ≥ 0 for all v:
vᵀ∇²f(x)v = ( (Σk zkvk²)(Σk zk) − (Σk vkzk)² ) / (Σk zk)² ≥ 0
since (Σk vkzk)² ≤ (Σk zkvk²)(Σk zk) (from the Cauchy–Schwarz inequality)
geometric mean: f(x) = (∏_{k=1}^n xk)^(1/n) on Rn++ is concave
α-sublevel set of f : Rn → R:
Cα = {x ∈ dom f | f (x) ≤ α}
epigraph of f : Rn → R:
epi f = {(x, t) ∈ Rn+1 | x ∈ dom f, f(x) ≤ t}
[figure: epigraph of f]
Jensen's inequality: if f is convex, then for any random variable z,
f(E z) ≤ E f(z)
the basic inequality is the special case with discrete distribution
prob(z = x) = θ, prob(z = y) = 1 − θ
examples
f(x) = −Σ_{i=1}^m log(bi − aiᵀx), dom f = {x | aiᵀx < bi, i = 1, . . . , m}
examples
is convex
examples
• support function of a set C: SC (x) = supy∈C y T x is convex
• distance to farthest point in a set C:
f(x) = sup_{y∈C} ‖x − y‖
• maximum eigenvalue of a symmetric matrix:
λmax(X) = sup_{‖y‖2=1} yᵀXy
composition of g : Rn → R and h : R → R:
f (x) = h(g(x))
examples
• exp g(x) is convex if g is convex
• 1/g(x) is convex if g is concave and positive
composition of g : Rn → Rk and h : Rk → R:
examples
• Σ_{i=1}^m log gi(x) is concave if gi are concave and positive
• log Σ_{i=1}^m exp gi(x) is convex if gi are convex
minimization: if f(x, y) is convex in (x, y) and C is a convex set, then
g(x) = inf_{y∈C} f(x, y)
is convex
examples
• f(x, y) = xᵀAx + 2xᵀBy + yᵀCy with
[ A B ; Bᵀ C ] ⪰ 0, C ≻ 0
perspective of f : Rn → R is the function g(x, t) = t f(x/t), dom g = {(x, t) | x/t ∈ dom f, t > 0}; g is convex if f is convex
examples
• f (x) = xT x is convex; hence g(x, t) = xT x/t is convex for t > 0
• negative logarithm f (x) = − log x is convex; hence relative entropy
g(x, t) = t log t − t log x is convex on R2++
• if f is convex, then
g(x) = (cᵀx + d) f( (Ax + b)/(cᵀx + d) )
is convex on {x | cᵀx + d > 0, (Ax + b)/(cᵀx + d) ∈ dom f}
the conjugate of a function f is
f∗(y) = sup_{x∈dom f} (yᵀx − f(x))
[figure: f(x) and the line xy; the vertical intercept of the supporting line is (0, −f∗(y))]
f : Rn → R is quasiconvex if dom f is convex and the sublevel sets Sα = {x ∈ dom f | f(x) ≤ α} are convex for all α
[figure: quasiconvex function on R; the α-sublevel set is the interval [a, b], the β-sublevel set is (−∞, c]]
• f is quasiconcave if −f is quasiconvex
• f is quasilinear if it is quasiconvex and quasiconcave
is quasiconvex
first-order condition: differentiable f with convex domain is quasiconvex if and only if
f(y) ≤ f(x) =⇒ ∇f(x)ᵀ(y − x) ≤ 0
[figure: ∇f(x) defines a supporting hyperplane to the sublevel set {y | f(y) ≤ f(x)} at x]
f(x) = (1/√((2π)ⁿ det Σ)) exp( −(1/2)(x − x̄)ᵀΣ⁻¹(x − x̄) )
f (x) = prob(x + y ∈ C)
is log-concave
proof: write f (x) as integral of product of log-concave functions
f(x) = ∫ g(x + y) p(y) dy, where g(u) = 1 if u ∈ C, 0 if u ∉ C,
and p is the pdf of y
Y (x) = prob(x + w ∈ S)
• Y is log-concave
for x, y ∈ dom f , 0 ≤ θ ≤ 1
for X, Y ∈ Sm, 0 ≤ θ ≤ 1
4–1
Optimization problem in standard form
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p
optimal value: p⋆ = inf{f0(x) | fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p}
examples (with n = 1, m = p = 0)
• f0(x) = 1/x, dom f0 = R++: p? = 0, no optimal point
• f0(x) = − log x, dom f0 = R++: p? = −∞
• f0(x) = x log x, dom f0 = R++: p? = −1/e, x = 1/e is optimal
• f0(x) = x3 − 3x, p? = −∞, local optimum at x = 1
x ∈ D = ⋂_{i=0}^m dom fi ∩ ⋂_{i=1}^p dom hi
example:
minimize f0(x) = −Σ_{i=1}^k log(bi − aiᵀx)
find x
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p
minimize 0
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
aTi x = bi, i = 1, . . . , p
often written as
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b
optimality criterion for differentiable f0: x is optimal if and only if it is feasible and
∇f0(x)ᵀ(y − x) ≥ 0 for all feasible y
[figure: −∇f0(x) is normal to a supporting hyperplane of the feasible set X at x]
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b
is equivalent to
minimize (over z) f0(Fz + x0)
subject to fi(Fz + x0) ≤ 0, i = 1, . . . , m
where F and x0 are such that
Ax = b ⇐⇒ x = Fz + x0 for some z
introducing slack variables for linear inequalities:
minimize f0(x)
subject to aiᵀx ≤ bi, i = 1, . . . , m
is equivalent to
minimize (over x, s) f0(x)
subject to aiᵀx + si = bi, i = 1, . . . , m
s ⪰ 0
is equivalent to
minimize (over x, t) t
subject to f0(x) − t ≤ 0
fi(x) ≤ 0, i = 1, . . . , m
Ax = b
minimizing over some variables:
minimize f0(x1, x2)
subject to fi(x1) ≤ 0, i = 1, . . . , m
is equivalent to
minimize f̃0(x1)
subject to fi(x1) ≤ 0, i = 1, . . . , m
where f̃0(x1) = inf_{x2} f0(x1, x2)
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b
with f0 : Rn → R quasiconvex, f1, . . . , fm convex
can have locally optimal points that are not (globally) optimal
[figure: quasiconvex f0 with a locally optimal point (x, f0(x)) that is not globally optimal]
f0(x) ≤ t ⇐⇒ φt(x) ≤ 0
example
f0(x) = p(x)/q(x)
with p convex, q concave, and p(x) ≥ 0, q(x) > 0 on dom f0
minimize cT x + d
subject to Gx ⪯ h
Ax = b
[figure: LP over polyhedron P; the optimal point x⋆ is a vertex, as far as possible in direction −c]
minimize cT x
subject to Ax ⪯ b, x ⪰ 0
piecewise-linear minimization: minimize max_{i=1,...,m} (aiᵀx + bi)
equivalent to an LP
minimize t
subject to aTi x + bi ≤ t, i = 1, . . . , m
P = {x | aTi x ≤ bi, i = 1, . . . , m}
Chebyshev center xcheb of P is the center of the largest inscribed ball
B = {xc + u | ‖u‖2 ≤ r}
[figure: ball B inscribed in polyhedron P]
maximize r
subject to aiᵀxc + r‖ai‖2 ≤ bi, i = 1, . . . , m
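A sketch of this LP in cvxpy, assuming a small made-up polyhedron {x | aiᵀx ≤ bi}:

import numpy as np
import cvxpy as cp

A = np.array([[1.0, 1.0], [-1.0, 2.0], [0.0, -1.0], [-1.0, -1.0]])  # made-up a_i
b = np.array([2.0, 4.0, 0.0, 1.0])

xc, r = cp.Variable(2), cp.Variable()
cons = [A @ xc + r * np.linalg.norm(A, axis=1) <= b, r >= 0]
cp.Problem(cp.Maximize(r), cons).solve()   # xc.value is the Chebyshev center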
minimize f0(x)
subject to Gx ⪯ h
Ax = b
linear-fractional program
f0(x) = (cᵀx + d)/(eᵀx + f), dom f0 = {x | eᵀx + f > 0}
equivalent to the LP (variables y, z)
minimize cᵀy + dz
subject to Gy ⪯ hz
Ay = bz
eᵀy + fz = 1
z ≥ 0
generalized linear-fractional program:
f0(x) = max_{i=1,...,r} (ciᵀx + di)/(eiᵀx + fi), dom f0 = {x | eiᵀx + fi > 0, i = 1, . . . , r}
minimize (1/2)xT P x + q T x + r
subject to Gx ⪯ h
Ax = b
[figure: −∇f0(x⋆) at the optimum x⋆ of a QP over polyhedron P]
least-squares
minimize kAx − bk22
minimize f T x
subject to kAix + bik2 ≤ cTi x + di, i = 1, . . . , m
Fx = g
minimize cT x
subject to aTi x ≤ bi, i = 1, . . . , m,
minimize cT x
subject to aTi x ≤ bi for all ai ∈ Ei, i = 1, . . . , m,
minimize cT x
subject to prob(aTi x ≤ bi) ≥ η, i = 1, . . . , m
• robust LP
minimize cT x
subject to aTi x ≤ bi ∀ai ∈ Ei, i = 1, . . . , m
minimize cT x
subject to āTi x + kPiT xk2 ≤ bi, i = 1, . . . , m
• robust LP
minimize cT x
subject to prob(aTi x ≤ bi) ≥ η, i = 1, . . . , m,
minimize cᵀx
subject to āiᵀx + Φ⁻¹(η)‖Σi^(1/2)x‖2 ≤ bi, i = 1, . . . , m
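A sketch of the deterministic robust LP as an SOCP in cvxpy; the ellipsoid data (āi, Pi) below is invented for illustration, and a box constraint is added only to keep the sketch bounded:

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 5, 3
a_bar = rng.standard_normal((m, n))            # nominal a_i (invented)
P = [0.1 * np.eye(n) for _ in range(m)]        # ellipsoid shapes P_i (invented)
b = a_bar @ np.ones(n) + 1.0                   # makes x = 1 strictly feasible
c = rng.standard_normal(n)

x = cp.Variable(n)
cons = [a_bar[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
cons += [cp.norm(x, "inf") <= 10]              # box, only to bound the sketch
cp.Problem(cp.Minimize(c @ x), cons).solve()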
minimize f0(x)
subject to fi(x) ≤ 1, i = 1, . . . , m
hi(x) = 1, i = 1, . . . , p
• posynomial f(x) = Σ_{k=1}^K ck x1^(a1k) x2^(a2k) · · · xn^(ank) transforms to
log f(e^(y1), . . . , e^(yn)) = log( Σ_{k=1}^K e^(akᵀy + bk) ) (bk = log ck)
design problem
• aspect ratio hi/wi and inverse aspect ratio wi/hi are monomials
• the vertical deflection yi and slope vi of central axis at the right end of
segment i are defined recursively as
vi = 12(i − 1/2) F/(E wi hi³) + vi+1
yi = 6(i − 1/3) F/(E wi hi³) + vi+1 + yi+1
note
• we write wmin ≤ wi ≤ wmax and hmin ≤ hi ≤ hmax
Sminwi/hi ≤ 1, hi/(wiSmax) ≤ 1
minimize λ
subject to Σ_{j=1}^n A(x)ij vj/(λvi) ≤ 1, i = 1, . . . , n
variables λ, v, x
minimize f0(x)
subject to fi(x) ⪯Ki 0, i = 1, . . . , m
Ax = b
conic form problem: special case with affine objective and constraints
minimize cᵀx
subject to Fx + g ⪯K 0
Ax = b
semidefinite program:
minimize cᵀx
subject to x1F1 + x2F2 + · · · + xnFn + G ⪯ 0
Ax = b
with Fi, G ∈ Sk
SOCP: minimize fᵀx
subject to ‖Aix + bi‖2 ≤ ciᵀx + di, i = 1, . . . , m
SDP: minimize fᵀx
subject to [ (ciᵀx + di)I  Aix + bi ; (Aix + bi)ᵀ  ciᵀx + di ] ⪰ 0, i = 1, . . . , m
minimize λmax(A(x))
equivalent SDP
minimize t
subject to A(x) ⪯ tI
• variables x ∈ Rn, t ∈ R
• follows from
λmax(A) ≤ t ⇐⇒ A ⪯ tI

matrix norm minimization: minimize ‖A(x)‖2 = ( λmax(A(x)ᵀA(x)) )^(1/2)
equivalent SDP
minimize t
subject to [ tI  A(x) ; A(x)ᵀ  tI ] ⪰ 0
• variables x ∈ Rn, t ∈ R
• constraint follows from
‖A‖2 ≤ t ⇐⇒ AᵀA ⪯ t²I, t ≥ 0
⇐⇒ [ tI  A ; Aᵀ  tI ] ⪰ 0
O = {f0(x) | x feasible}
O = {f0(x) | x feasible} (set of achievable objective values)
[figure: left, O has minimum element f0(x⋆), so x⋆ is optimal; right, f0(xpo) is a minimal element of O, so xpo is Pareto optimal]
[figure: trade-off curve of F2(x) = ‖x‖2² versus F1(x) = ‖Ax − b‖2² for regularized least-squares]
[figure: risk–return trade-off in portfolio optimization: allocation x(i) and mean return versus standard deviation of return]
Convex optimization problems 4–44
Scalarization
minimize λT f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p
if x is optimal for the scalar problem, then it is Pareto optimal for the vector optimization problem
[figure: set O with points f0(x1), f0(x2), f0(x3) on the Pareto boundary, supported by weight vectors λ1, λ2]
for convex vector optimization problems, can find (almost) all Pareto
optimal points by varying λ K ∗ 0
example: regularized least-squares
scalarized problem:
minimize ‖Ax − b‖2² + γ‖x‖2²
for fixed γ > 0, a LS problem
[figure: trade-off curve of ‖x‖2² versus ‖Ax − b‖2²; the point for γ = 1 is marked]
5. Duality
5–1
Lagrangian
standard form problem (not necessarily convex)
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p
Lagrangian: L(x, λ, ν) = f0(x) + Σ_{i=1}^m λifi(x) + Σ_{i=1}^p νihi(x)
• λi is the Lagrange multiplier associated with fi(x) ≤ 0
• νi is the Lagrange multiplier associated with hi(x) = 0
Duality 5–2
Lagrange dual function: g(λ, ν) = inf_{x∈D} L(x, λ, ν); g is concave, and if λ ⪰ 0 then g(λ, ν) ≤ p⋆ (lower bound property)
Duality 5–3
Least-norm solution of linear equations
minimize xT x
subject to Ax = b
dual function
• Lagrangian is L(x, ν) = xT x + ν T (Ax − b)
• to minimize L over x, set gradient equal to zero:
∇xL(x, ν) = 2x + AT ν = 0 =⇒ x = −(1/2)AT ν
• plug in to L to obtain g:
g(ν) = L((−1/2)Aᵀν, ν) = −(1/4)νᵀAAᵀν − bᵀν
a concave function of ν
Duality 5–4
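A quick numerical check of this dual (random data): the unconstrained maximizer of the concave quadratic g is ν⋆ = −2(AAᵀ)⁻¹b, and g(ν⋆) matches the primal optimal value xᵀx, illustrating strong duality:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

x_star = A.T @ np.linalg.solve(A @ A.T, b)      # primal: min-norm solution of Ax = b
nu_star = np.linalg.solve(A @ A.T, -2.0 * b)    # maximizer of g
g_star = -0.25 * nu_star @ (A @ A.T) @ nu_star - b @ nu_star
assert abs(g_star - x_star @ x_star) < 1e-9     # d* = p*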
Standard form LP
minimize cT x
subject to Ax = b, x ⪰ 0
dual function
• Lagrangian is
L(x, λ, ν) = cᵀx + νᵀ(Ax − b) − λᵀx
= −bᵀν + (c + Aᵀν − λ)ᵀx
• L is affine in x, hence
g(λ, ν) = inf_x L(x, λ, ν) = { −bᵀν if Aᵀν − λ + c = 0; −∞ otherwise }
Duality 5–5
Equality constrained norm minimization
minimize kxk
subject to Ax = b
dual function
g(ν) = inf_x ( ‖x‖ − νᵀAx + bᵀν ) = { bᵀν if ‖Aᵀν‖∗ ≤ 1; −∞ otherwise }
where ‖v‖∗ = sup_{‖u‖≤1} uᵀv is the dual norm of ‖ · ‖
Duality 5–6
Two-way partitioning
minimize xT W x
subject to x2i = 1, i = 1, . . . , n
dual function
g(ν) = inf_x ( xᵀWx + Σi νi(xi² − 1) ) = inf_x xᵀ(W + diag(ν))x − 1ᵀν
= { −1ᵀν if W + diag(ν) ⪰ 0; −∞ otherwise }
Duality 5–7
Lagrange dual and conjugate function
minimize f0(x)
subject to Ax ⪯ b, Cx = d
dual function
g(λ, ν) = inf_{x∈dom f0} ( f0(x) + (Aᵀλ + Cᵀν)ᵀx − bᵀλ − dᵀν )
= −f0∗(−Aᵀλ − Cᵀν) − bᵀλ − dᵀν
Duality 5–8
The dual problem
maximize g(λ, ν)
subject to λ ⪰ 0
• finds best lower bound on p⋆, obtained from Lagrange dual function
• a convex optimization problem; optimal value denoted d⋆
• λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
• often simplified by making implicit constraint (λ, ν) ∈ dom g explicit
Duality 5–9
Weak and strong duality
weak duality: d? ≤ p?
• always holds (for convex and nonconvex problems)
• can be used to find nontrivial lower bounds for difficult problems
for example, solving the SDP
maximize −1ᵀν
subject to W + diag(ν) ⪰ 0
gives a lower bound for the two-way partitioning problem on page 5–7
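One simple dual feasible point is ν = −λmin(W)·1, since it makes W + diag(ν) ⪰ 0; this gives the lower bound p⋆ ≥ nλmin(W), which a small brute-force check (tiny n only) confirms:

import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n)); W = (W + W.T) / 2

bound = n * np.linalg.eigvalsh(W)[0]        # g(nu) = -1^T nu = n * lambda_min(W)

# brute-force p_star over all x in {-1, 1}^n for comparison
p_star = min(np.array(s) @ W @ np.array(s) for s in product([-1, 1], repeat=n))
assert bound <= p_star + 1e-9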
strong duality: d? = p?
• does not hold in general
• (usually) holds for convex problems
• conditions that guarantee strong duality in convex problems are called
constraint qualifications
Duality 5–10
Slater’s constraint qualification
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b
• also guarantees that the dual optimum is attained (if p? > −∞)
• can be sharpened: e.g., can replace int D with relint D (interior
relative to affine hull); linear inequalities do not need to hold with strict
inequality, . . .
• there exist many other types of constraint qualifications
Duality 5–11
Inequality form LP
primal problem
minimize cT x
subject to Ax ⪯ b
dual function
g(λ) = inf_x ( (c + Aᵀλ)ᵀx − bᵀλ ) = { −bᵀλ if Aᵀλ + c = 0; −∞ otherwise }
dual problem
maximize −bᵀλ
subject to Aᵀλ + c = 0, λ ⪰ 0
Duality 5–12
Quadratic program
primal problem (assume P ∈ Sn++)
minimize xT P x
subject to Ax ⪯ b
dual function
g(λ) = inf_x ( xᵀPx + λᵀ(Ax − b) ) = −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ
dual problem
maximize −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ
subject to λ ⪰ 0
Duality 5–13
A nonconvex problem with strong duality
minimize xT Ax + 2bT x
subject to xT x ≤ 1
A ⋡ 0, hence nonconvex
strong duality although primal problem is not convex (not easy to show)
Duality 5–14
Geometric interpretation
for simplicity, consider problem with one constraint f1(x) ≤ 0
interpretation of dual function: for G = {(f1(x), f0(x)) | x ∈ D},
g(λ) = inf_{(u,t)∈G} (t + λu)
λu + t = g(λ) is a non-vertical supporting hyperplane to G; it intersects the t-axis at t = g(λ)
[figures: set G with p⋆, g(λ), and d⋆]
Duality 5–15
epigraph variation: same interpretation if G is replaced with
A = {(u, t) | f1(x) ≤ u, f0(x) ≤ t for some x ∈ D}
[figure: set A, supporting line λu + t = g(λ), and p⋆]
strong duality
• holds if there is a non-vertical supporting hyperplane to A at (0, p?)
• for convex problem, A is convex, hence has supp. hyperplane at (0, p?)
• Slater’s condition: if there exist (ũ, t̃) ∈ A with ũ < 0, then supporting
hyperplanes at (0, p?) must be non-vertical
Duality 5–16
Complementary slackness
if strong duality holds, x⋆ is primal optimal, and (λ⋆, ν⋆) is dual optimal, then
λi⋆ fi(x⋆) = 0, i = 1, . . . , m
(λi⋆ > 0 =⇒ fi(x⋆) = 0; fi(x⋆) < 0 =⇒ λi⋆ = 0)
Duality 5–17
Karush-Kuhn-Tucker (KKT) conditions
the following four conditions are called KKT conditions (for a problem with differentiable fi, hi):
1. primal feasibility: fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p
2. dual feasibility: λ ⪰ 0
3. complementary slackness: λifi(x) = 0, i = 1, . . . , m
4. gradient of Lagrangian with respect to x vanishes:
∇f0(x) + Σ_{i=1}^m λi∇fi(x) + Σ_{i=1}^p νi∇hi(x) = 0
from page 5–17: if strong duality holds and x, λ, ν are optimal, then they
must satisfy the KKT conditions
Duality 5–18
KKT conditions for convex problem
if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:
• recall that Slater implies strong duality, and dual optimum is attained
• generalizes optimality condition ∇f0(x) = 0 for unconstrained problem
Duality 5–19
example: water-filling (assume αi > 0)
minimize −Σ_{i=1}^n log(xi + αi)
subject to x ⪰ 0, 1ᵀx = 1
x is optimal iff x ⪰ 0, 1ᵀx = 1, and xi = max{0, 1/ν⋆ − αi}
interpretation
• n patches; level of patch i is at height αi
• flood area with unit amount of water
• resulting water level is 1/ν⋆
[figure: patches at heights αi flooded to level 1/ν⋆; water depth over patch i is xi]
Duality 5–20
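A sketch of the resulting water-filling solution: xi = max{0, 1/ν⋆ − αi}, with the level 1/ν⋆ found by bisection on the constraint 1ᵀx = 1:

import numpy as np

def water_filling(alpha, total=1.0, tol=1e-10):
    # solve: minimize -sum(log(x_i + alpha_i))  s.t.  x >= 0, 1^T x = total
    # KKT solution: x_i = max(0, level - alpha_i), level = 1/nu*
    lo, hi = 0.0, alpha.max() + total          # bracket on the water level
    while hi - lo > tol:
        level = 0.5 * (lo + hi)
        if np.maximum(level - alpha, 0.0).sum() > total:
            hi = level
        else:
            lo = level
    return np.maximum(0.5 * (lo + hi) - alpha, 0.0)

x = water_filling(np.array([0.7, 0.3, 0.5, 1.1]))   # sums to 1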
Perturbation and sensitivity analysis
(unperturbed) optimization problem and its dual
Duality 5–21
global sensitivity result
assume strong duality holds for unperturbed problem, and that λ?, ν ? are
dual optimal for unperturbed problem
apply weak duality to perturbed problem:
p?(u, v) ≥ g(λ?, ν ?) − uT λ? − v T ν ?
= p?(0, 0) − uT λ? − v T ν ?
sensitivity interpretation
Duality 5–22
local sensitivity: if (in addition) p?(u, v) is differentiable at (0, 0), then
λi⋆ = −∂p⋆(0, 0)/∂ui, νi⋆ = −∂p⋆(0, 0)/∂vi
[figure: p⋆(u) for a problem with one (inequality) constraint; p⋆(u) lies above the affine bound p⋆(0) − λ⋆u, with equality at u = 0]
Duality 5–23
Duality and problem reformulations
common reformulations
Duality 5–24
Introducing new variables and equality constraints
minimize f0(Ax + b)
Duality 5–25
norm approximation problem: minimize kAx − bk
minimize kyk
subject to y = Ax − b
maximize bT ν
subject to AT ν = 0, kνk∗ ≤ 1
Duality 5–26
Implicit constraints
LP with box constraints: primal and dual problem
dual function
g(ν) = inf_{−1⪯x⪯1} ( cᵀx + νᵀ(Ax − b) ) = −bᵀν − ‖Aᵀν + c‖1
Duality 5–27
Problems with generalized inequalities
minimize f0(x)
subject to fi(x) Ki 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p
Duality 5–28
lower bound property: if λi ⪰Ki∗ 0, then g(λ1, . . . , λm, ν) ≤ p⋆
proof: if x̃ is feasible and λi ⪰Ki∗ 0, then
f0(x̃) ≥ f0(x̃) + Σ_{i=1}^m λiᵀfi(x̃) + Σ_{i=1}^p νihi(x̃)
≥ inf_{x∈D} L(x, λ1, . . . , λm, ν)
= g(λ1, . . . , λm, ν)
Duality 5–29
Semidefinite program
primal SDP (Fi, G ∈ Sk )
minimize cᵀx
subject to x1F1 + · · · + xnFn ⪯ G
dual SDP
maximize −tr(GZ)
subject to Z ⪰ 0, tr(FiZ) + ci = 0, i = 1, . . . , n
Duality 5–30
Convex Optimization — Boyd & Vandenberghe
6. Approximation and fitting
• norm approximation
• least-norm problems
• regularized approximation
• robust approximation
6–1
Norm approximation
minimize kAx − bk
estimation interpretation: y = Ax + v, with v the measurement error

examples
• least-squares approximation (‖ · ‖2): solution satisfies the normal equations
AᵀAx = Aᵀb
• Chebyshev approximation (‖ · ‖∞): can be solved as an LP
minimize t
subject to −t1 ⪯ Ax − b ⪯ t1
• sum of absolute residuals approximation (‖ · ‖1): can be solved as an LP
minimize 1ᵀy
subject to −y ⪯ Ax − b ⪯ y
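In a modeling tool the LP transformations above are done automatically; a cvxpy sketch with made-up data:

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5)); b = rng.standard_normal(20)

x = cp.Variable(5)
cp.Problem(cp.Minimize(cp.norm(A @ x - b, "inf"))).solve()  # Chebyshev
cp.Problem(cp.Minimize(cp.norm(A @ x - b, 1))).solve()      # sum of abs residuals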
examples
• quadratic: φ(u) = u²
• deadzone-linear with width a:
φ(u) = max{0, |u| − a}
• log barrier with limit a: φ(u) = −a² log(1 − (u/a)²) for |u| < a, ∞ otherwise
[figure: quadratic, deadzone-linear, and log barrier penalty functions φ(u)]
[figure: histograms of residuals Ax − b for penalty approximation with p = 1, p = 2, deadzone, and log barrier penalties]
Huber penalty (with parameter M): φhub(u) = u² for |u| ≤ M, M(2|u| − M) for |u| > M
[figure: Huber penalty φhub(u); affine fit f(t) with Huber penalty is robust against outliers, unlike least-squares]
minimize ‖x‖
subject to Ax = b
examples
• least Euclidean norm (‖ · ‖2): solution characterized by
2x + Aᵀν = 0, Ax = b
• least ℓ1-norm: can be solved as an LP
minimize 1ᵀy
subject to −y ⪯ x ⪯ y, Ax = b
Tikhonov regularization: minimize ‖Ax − b‖2² + δ‖x‖2²; a least-squares problem with solution x = (AᵀA + δI)⁻¹Aᵀb
[figures: optimal input design example: input u(t) and output y(t) for three solutions that trade off tracking error, input magnitude, and input variation]
• x ∈ Rn is unknown signal
• xcor = x + v is (known) corrupted version of x, with additive noise v
• variable x̂ (reconstructed signal) is estimate of x
• φ : Rn → R is regularization function or smoothing objective
φquad(x̂) = Σ_{i=1}^{n−1} (x̂i+1 − x̂i)², φtv(x̂) = Σ_{i=1}^{n−1} |x̂i+1 − x̂i|
[figure: quadratic smoothing example: original signal x and noisy signal xcor; three solutions x̂ on the trade-off curve of ‖x̂ − xcor‖2 versus φquad(x̂)]
[figure: quadratic smoothing of a signal with sharp transitions: original signal x and noisy signal xcor; three solutions on the trade-off curve of ‖x̂ − xcor‖2 versus φquad(x̂)]
placements
x̂
2 0
PSfrag replacements
1
−2
0 500 1000 1500 2000
x
0
2
−1
x̂
−2 0
0 500 1000 1500 2000
2 −2
0 500 1000 1500 2000
1 2
xcor
x̂
0
−1
−2 −2
0 500 1000 1500 2000 0 500 1000 1500 2000
i i
original signal x and noisy three solutions on trade-off curve
signal xcor kx̂ − xcork2 versus φtv (x̂)
example: A(u) = A0 + uA1
• xnom minimizes ‖A0x − b‖2²
• xstoch minimizes E‖A(u)x − b‖2² with u uniform on [−1, 1]
• xwc minimizes sup_{−1≤u≤1} ‖A(u)x − b‖2²
[figure: residual r(u) = ‖A(u)x − b‖2 as a function of u for the three solutions]
• from page 5–14, strong duality holds between the following problems
minimize t + λ
subject to [ I  P(x)  q(x) ; P(x)ᵀ  λI  0 ; q(x)ᵀ  0  t ] ⪰ 0
[figure: histogram (frequency versus residual r(u)) for the least-squares, Tikhonov, and robust least-squares solutions xls, xtik, xrls]
7. Statistical estimation
• experiment design
7–1
Parametric distribution estimation
• y is observed value
• l(x) = log px(y) is called log-likelihood function
• can add constraints x ∈ C explicitly, or define px(y) = 0 for x 6∈ C
• a convex optimization problem if log px(y) is concave in x for fixed y
linear measurements with IID noise: yi = aiᵀx + vi, i = 1, . . . , m
(y is observed value)
• Gaussian noise: ML estimate is LS solution
• Laplacian noise: p(z) = (1/(2a))e^(−|z|/a),
l(x) = −m log(2a) − (1/a) Σ_{i=1}^m |aiᵀx − yi|
ML estimate is the ℓ1-norm solution
p = prob(y = 1) = exp(aᵀu + b) / (1 + exp(aᵀu + b))
log-likelihood:
l(a, b) = Σ_{i=1}^k (aᵀui + b) − Σ_{i=1}^m log(1 + exp(aᵀui + b))
concave in a, b
[figure: fitted logistic model prob(y = 1) as a function of u]
randomized detector
variable T ∈ R2×n
minimax detector
example
P = [ 0.70 0.10 ; 0.20 0.10 ; 0.05 0.70 ; 0.05 0.10 ]
[figure: ROC trade-off curve of false negative probability Pfn versus false positive probability Pfp, with detectors 1–4 marked]
x̂ = ( Σ_{i=1}^m ai aiᵀ )⁻¹ Σ_{i=1}^m yi ai
λk (1 − vkT W vk ) = 0, k = 1, . . . , p
[figure: experiment design example, λ2 = 0.5]
derivation of dual: Lagrangian is
L(X, λ, Z, z, ν) = log det X⁻¹ + tr( Z(X − Σ_{k=1}^p λkvkvkᵀ) ) − zᵀλ + ν(1ᵀλ − 1)
dual problem
change variable W = Z/ν, and optimize over ν to get dual of page 7–13
8. Geometric problems
• centering
• classification
8–1
Minimum volume ellipsoid around a set
[figure: minimum volume ellipsoid around a set C]
factor n can be improved to √n if C is symmetric
[figure: Chebyshev center xcheb and maximum volume inscribed ellipsoid center xmve of a polyhedron]
fi(x) ≤ 0, i = 1, . . . , m, Fx = g
xac is minimizer of
φ(x) = −Σ_{i=1}^m log(bi − aiᵀx)
[figure: analytic center xac of a set of linear inequalities]
linear discrimination: separate two sets of points {x1, . . . , xN}, {y1, . . . , yM} by a hyperplane:
aᵀxi + b > 0, i = 1, . . . , N, aᵀyi + b < 0, i = 1, . . . , M
homogeneous in a, b, hence equivalent to
aᵀxi + b ≥ 1, i = 1, . . . , N, aᵀyi + b ≤ −1, i = 1, . . . , M
H1 = {z | aT z + b = 1}
H2 = {z | aT z + b = −1}
minimize (1/2)kak2
subject to aT xi + b ≥ 1, i = 1, . . . , N (1)
aT yi + b ≤ −1, i = 1, . . . , M
maximize 1ᵀλ + 1ᵀμ
subject to 2‖Σ_{i=1}^N λixi − Σ_{i=1}^M μiyi‖2 ≤ 1 (2)
1ᵀλ = 1ᵀμ, λ ⪰ 0, μ ⪰ 0
interpretation
minimize t
subject to ‖Σ_{i=1}^N θixi − Σ_{i=1}^M γiyi‖2 ≤ t
θ ⪰ 0, 1ᵀθ = 1, γ ⪰ 0, 1ᵀγ = 1
minimize 1T u + 1T v
subject to aT xi + b ≥ 1 − ui, i = 1, . . . , N
aT yi + b ≤ −1 + vi, i = 1, . . . , M
u ⪰ 0, v ⪰ 0
• an LP in a, b, u, v
• at optimum, ui = max{0, 1 − aT xi − b}, vi = max{0, 1 + aT yi + b}
• can be interpreted as a heuristic for minimizing #misclassified points
f (z) = θ T F (z)
xTi P xi + q T xi + r ≥ 1, yiT P yi + q T yi + r ≤ −1
placement problem
minimize Σ_{i≠j} fij(xi, xj)
[figures: placements obtained for different cost functions fij, with histograms of the connection lengths ‖xi − xj‖2 for each]
9–1
Matrix structure and algorithm complexity
flop counts
x1 := b1/a11
x2 := (b2 − a21x1)/a22
x3 := (b3 − a31x1 − a32x2)/a33
· · ·
xn := (bn − an1x1 − · · · − an,n−1xn−1)/ann
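A direct transcription of forward substitution (about n² flops):

import numpy as np

def forward_substitution(A, b):
    # solve Ax = b for lower-triangular nonsingular A
    n = len(b)
    x = np.empty(n)
    for i in range(n):
        x[i] = (b[i] - A[i, :i] @ x[:i]) / A[i, i]
    return x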
permutation matrices:
aij = 1 if j = πi, 0 otherwise
A = A 1 A2 · · · A k
A = P LU
A = P1LU P2
A = LLT
A = P LLT P T
A = P LDLT P T
cost: (1/3)n3
(A22 − A21A11⁻¹A12)x2 = b2 − A21A11⁻¹b1
• step 3: (2/3)n2³
examples
• general A11 (f = (2/3)n1³, s = 2n1²): no gain over standard method
(A + BC)x = b
first write as
[ A  B ; C  −I ] [ x ; y ] = [ b ; 0 ]
then eliminate x (using solves with A):
(I + CA−1B)y = CA−1b,
then solve Ax = b − By
this proves the matrix inversion lemma: if A and A + BC are nonsingular, then
(A + BC)⁻¹ = A⁻¹ − A⁻¹B(I + CA⁻¹B)⁻¹CA⁻¹
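A sketch of this trick for A diagonal plus a low-rank term, where solves with A cost only n flops (illustrative; any cheaply-factored A works the same way):

import numpy as np

def solve_diag_plus_lowrank(d, B, C, b):
    # solve (diag(d) + B C) x = b using only cheap solves with A = diag(d):
    # first (I + C A^{-1} B) y = C A^{-1} b, then A x = b - B y
    y = np.linalg.solve(np.eye(C.shape[0]) + C @ (B / d[:, None]), C @ (b / d))
    return (b - B @ y) / d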
• Newton’s method
• self-concordant functions
• implementation
10–1
Unconstrained minimization
minimize f (x)
f (x(k)) → p?
∇f (x?) = 0
2nd condition is hard to verify, except when all sublevel sets are closed:
• equivalent to condition that epi f is closed
• true if dom f = Rn
• true if f (x) → ∞ as x → bd dom f
implications
• for x, y ∈ S,
f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖x − y‖2²
hence, S is bounded
• p⋆ > −∞, and for x ∈ S,
f(x) − p⋆ ≤ (1/(2m)) ‖∇f(x)‖2²
useful as stopping criterion (if you know m)
[figure: backtracking line search: f(x + t∆x) and its linear upper and lower bounds]
• very slow if γ ≪ 1 or γ ≫ 1
• example for γ = 10:
[figure: gradient method iterates x(0), x(1), . . . zig-zagging on elliptical level curves]
[figures: nonquadratic example with backtracking and exact line search: iterates and error f(x(k)) − p⋆ versus k]
unit balls and normalized steepest descent directions for a quadratic norm
and the `1-norm:
[figure: unit balls with −∇f(x) and the normalized steepest descent direction ∆xnsd for each norm]
[figure: steepest descent iterates x(0), x(1), x(2) for two quadratic norms]
• steepest descent with backtracking line search for two quadratic norms
• ellipses show {x | kx − x(k)kP = 1}
• equivalent interpretation of steepest descent with quadratic norm k · kP :
gradient descent after change of variables x̄ = P 1/2x
interpretations
• x + ∆xnt minimizes the second-order approximation
f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2)vᵀ∇²f(x)v
• ∆xnt is the steepest descent direction at x in the local Hessian norm
‖u‖_(∇²f(x)) = (uᵀ∇²f(x)u)^(1/2)
[figure: ellipse {v | ‖v‖_(∇²f(x)) = 1}; points x + ∆xnsd and x + ∆xnt]

Newton decrement: gives an estimate of f(x) − p⋆ using the quadratic approximation f̂:
f(x) − inf_y f̂(y) = (1/2)λ(x)²
• equal to the norm of the Newton step in the quadratic Hessian norm
λ(x) = (∆xntᵀ ∇²f(x) ∆xnt)^(1/2)
Newton iterates for f˜(y) = f (T y) with starting point y (0) = T −1x(0) are
y (k) = T −1x(k)
assumptions
• f strongly convex on S with constant m
• ∇2f is Lipschitz continuous on S, with constant L > 0:
(L/(2m²)) ‖∇f(x^(l))‖2 ≤ ( (L/(2m²)) ‖∇f(x^(k))‖2 )^(2^(l−k)) ≤ (1/2)^(2^(l−k)), l ≥ k
conclusion: number of iterations until f(x) − p⋆ ≤ ε is bounded above by
(f(x(0)) − p⋆)/γ + log2 log2(ε0/ε)
[figures: Newton's method examples in R², R¹⁰⁰, and R¹⁰⁰⁰⁰: iterates and error f(x(k)) − p⋆ versus k, showing quadratic convergence]
definition
• f : R → R is self-concordant if |f‴(x)| ≤ 2f″(x)^(3/2) for all x ∈ dom f
• f : Rn → R is self-concordant if g(t) = f (x + tv) is self-concordant for
all x ∈ dom f , v ∈ Rn
examples on R
• linear and quadratic functions
• negative logarithm f (x) = − log x
• negative entropy plus negative logarithm: f (x) = x log x − log x
properties
• preserved under positive scaling α ≥ 1, and sum
• preserved under composition with affine function
• if g is convex with dom g = R++ and |g‴(x)| ≤ 3g″(x)/x, then
f(x) = − log(−g(x)) − log x
is self-concordant (on {x | x > 0, g(x) < 0})
examples: properties can be used to show that the following are s.c.
Pm
• f (x) = − i=1 log(bi − aTi x) on {x | aTi x < bi, i = 1, . . . , m}
• f (X) = − log det X on Sn++
• f (x) = − log(y 2 − xT x) on {(x, y) | kxk2 < y}
• if λ(x) ≤ η, then
2λ(x(k+1)) ≤ (2λ(x(k)))²
conclusion: number of iterations until f(x) − p⋆ ≤ ε is bounded above by
(f(x(0)) − p⋆)/γ + log2 log2(1/ε)
[figure: number of Newton iterations versus f(x(0)) − p⋆ for randomly generated problems: ◦ m = 100, n = 50; □ m = 1000, n = 500; ♦ m = 1000, n = 50]
main effort in each iteration: evaluate derivatives and solve Newton system
H∆x = g
• implementation
11–1
Equality constrained minimization
minimize f (x)
subject to Ax = b
∇f (x?) + AT ν ? = 0, Ax? = b
minimize (1/2)xT P x + q T x + r
subject to Ax = b
optimality condition:
[ P  Aᵀ ; A  0 ] [ x⋆ ; ν⋆ ] = [ −q ; b ]
coefficient matrix is nonsingular provided
Ax = 0, x ≠ 0 =⇒ xᵀPx > 0
represent solution set of {x | Ax = b} as
{x | Ax = b} = {Fz + x̂ | z ∈ R^(n−p)}
• variables z ∈ R^(n−p)
• x̂ satisfies Ax̂ = b; rank F = n − p and AF = 0
reduced problem:
minimize f(Fz + x̂)
• Newton's method for f̃, started at z(0), generates iterates z(k) with
x(k+1) = Fz(k) + x̂

Newton decrement:
λ(x) = (∆xntᵀ ∇²f(x) ∆xnt)^(1/2) = (−∇f(x)ᵀ∆xnt)^(1/2)
properties
• gives an estimate of f(x) − p⋆ using the quadratic approximation f̂:
f(x) − inf_(Ay=b) f̂(y) = (1/2)λ(x)²
• directional derivative in the Newton direction:
(d/dt) f(x + t∆xnt) |_(t=0) = −λ(x)²
• in general, λ(x) ≠ ( ∇f(x)ᵀ (∇²f(x))⁻¹ ∇f(x) )^(1/2)
primal-dual interpretation
• write optimality condition as r(y) = 0, where y = (x, ν), r(y) = (∇f(x) + Aᵀν, Ax − b)
given starting point x ∈ dom f, ν, tolerance ε > 0, α ∈ (0, 1/2), β ∈ (0, 1).
repeat
1. Compute primal and dual Newton steps ∆xnt, ∆νnt.
2. Backtracking line search on ‖r‖2:
t := 1.
while ‖r(x + t∆xnt, ν + t∆νnt)‖2 > (1 − αt)‖r(x, ν)‖2, t := βt.
3. Update: x := x + t∆xnt, ν := ν + t∆νnt.
until Ax = b and ‖r(x, ν)‖2 ≤ ε.
directional derivative of the residual norm in the Newton direction:
(d/dt) ‖r(y + t∆y)‖2 |_(t=0) = −‖r(y)‖2
each iteration solves the KKT system
[ H  Aᵀ ; A  0 ] [ v ; w ] = − [ g ; h ]
solution methods
• LDLT factorization
• elimination (if H nonsingular)
[figures: convergence for an example problem:
1. Newton method with equality constraints: f(x(k)) − p⋆ versus k
2. Newton method applied to the dual: p⋆ − g(ν(k)) versus k
3. infeasible start Newton method (requires x(0) ≻ 0): ‖r(x(k), ν(k))‖2 versus k]
Equality constrained minimization 11–14
complexity per iteration of three methods is identical
[ H  Aᵀ ; A  0 ] [ v ; w ] = − [ g ; h ]
Σ_{j=1}^p tr(Ai X Aj X) wj = bi, i = 1, . . . , p (2)
12–1
Inequality constrained minimization
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m (1)
Ax = b
• approximation improves as t → ∞
[figure: −(1/t) log(−u) for several values of t, versus the indicator function of u ≤ 0]
Interior-point methods 12–4
logarithmic barrier function
φ(x) = −Σ_{i=1}^m log(−fi(x)), dom φ = {x | f1(x) < 0, . . . , fm(x) < 0}
∇φ(x) = Σ_{i=1}^m (1/(−fi(x))) ∇fi(x)
∇²φ(x) = Σ_{i=1}^m (1/fi(x)²) ∇fi(x)∇fi(x)ᵀ + Σ_{i=1}^m (1/(−fi(x))) ∇²fi(x)
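For linear inequalities fi(x) = aiᵀx − bi these formulas simplify (∇fi = ai, ∇²fi = 0); a small sketch:

import numpy as np

def log_barrier(A, b, x):
    # phi, gradient, Hessian of phi(x) = -sum log(b_i - a_i^T x)
    s = b - A @ x                              # slacks -f_i(x), must be > 0
    phi = -np.log(s).sum()
    grad = A.T @ (1.0 / s)
    hess = A.T @ ((1.0 / s**2)[:, None] * A)   # A^T diag(1/s^2) A
    return phi, grad, hess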
(for now, assume x?(t) exists and is unique for each t > 0)
• central path is {x?(t) | t > 0}
example: central path for an LP
minimize cᵀx
subject to aiᵀx ≤ bi, i = 1, . . . , 6
[figure: central path {x⋆(t) | t > 0} converging to x⋆, with x⋆(10) marked; the hyperplane cᵀx = cᵀx⋆(t) is tangent to the level curve of φ through x⋆(t)]
p? ≥ g(λ?(t), ν ?(t))
= L(x?(t), λ?(t), ν ?(t))
= f0(x?(t)) − m/t
∇f0(x) + Σ_{i=1}^m λi ∇fi(x) + Aᵀν = 0
force field interpretation: forces balance at x⋆(t):
F0(x⋆(t)) + Σ_{i=1}^m Fi(x⋆(t)) = 0
[figure: force field interpretation for t = 1 and t = 3]
centering problem
[figures: barrier method for an inequality form LP: duality gap versus Newton iterations for µ = 2, 50, 150; total Newton iterations versus µ]
[figure: barrier method for a geometric program: duality gap versus Newton iterations]
minimize cᵀx
subject to Ax = b, x ⪰ 0
[figure: average number of Newton iterations versus number of inequalities m; growth with problem size is very slow]
fi(x) ≤ 0, i = 1, . . . , m, Ax = b (2)
minimize (over x, s) s
subject to fi(x) ≤ s, i = 1, . . . , m (3)
Ax = b
[figure: histograms of the constraint slacks bi − aiᵀxmax and bi − aiᵀxsum produced by the two phase I formulations]
[figures: number of Newton iterations versus infeasibility parameter γ, for feasible (γ > 0) and infeasible (γ < 0) instances]
number of iterations roughly proportional to log(1/|γ|)
second condition
[figure: bound N on the total number of Newton iterations versus µ, for typical values of γ, c, with m = 100, m/(t(0)ε) = 10⁵]
• for µ = 1 + 1/√m:
N = O( √m log( m/(t(0)ε) ) )
• number of Newton iterations for fixed gap reduction is O(√m)
minimize f0(x)
subject to fi(x) ⪯Ki 0, i = 1, . . . , m
Ax = b
examples
• nonnegative orthant K = Rn+: ψ(y) = Σ_{i=1}^n log yi, with degree θ = n
• positive semidefinite cone K = Sn+: ψ(Y) = log det Y (θ = n)
• second-order cone:
ψ(y) = log(y_(n+1)² − y1² − · · · − yn²) (θ = 2)
properties (for y ≻K 0): ∇ψ(y) ⪰K∗ 0, yᵀ∇ψ(y) = θ
example (nonnegative orthant Rn+): ψ(y) = Σ_{i=1}^n log yi
logarithmic barrier for the cone problem:
φ(x) = −Σ_{i=1}^m ψi(−fi(x)), dom φ = {x | fi(x) ≺Ki 0, i = 1, . . . , m}
dual points on the central path:
λi⋆(t) = (1/t) ∇ψi(−fi(x⋆(t))), ν⋆(t) = w/t
duality gap on the central path:
f0(x⋆(t)) − g(λ⋆(t), ν⋆(t)) = (1/t) Σ_{i=1}^m θi
minimize cᵀx
subject to F(x) = Σ_{i=1}^n xiFi + G ⪯ 0
dual:
maximize tr(GZ)
subject to tr(FiZ) + ci = 0, i = 1, . . . , n
Z ⪰ 0
• only difference is that the duality gap m/t on the central path is replaced by Σi θi/t
[figures: barrier method for an SOCP (µ = 2, 50, 200) and for an SDP: duality gap versus Newton iterations]
minimize 1T x
subject to A + diag(x) ⪰ 0
[figure: number of Newton iterations versus n; nearly constant in problem size]
13. Conclusions
13–1
Modeling
mathematical optimization
tractability
Conclusions 13–2
Theoretical consequences of convexity
Conclusions 13–3
Practical consequences of convexity
Conclusions 13–4
Convex optimization examples
1
Force/moment generation with thrusters
[figure: rigid body with n thrusters; thruster i at position (pix, piy) applies force ui in direction θi]
• resulting horizontal force: Fx = Σ_{i=1}^n ui cos θi
• resulting vertical force: Fy = Σ_{i=1}^n ui sin θi
• resulting torque: T = Σ_{i=1}^n (piy ui cos θi − pix ui sin θi)
• fuel usage: u1 + · · · + un
problem: find thruster forces ui that yield given desired forces and torques Fxdes, Fydes, Tdes, and minimize fuel usage (if feasible)
minimize 1ᵀu
subject to Fu = f_des
0 ≤ ui ≤ 1, i = 1, . . . , n
where
F = [ cos θ1 · · · cos θn ; sin θ1 · · · sin θn ; p1y cos θ1 − p1x sin θ1 · · · pny cos θn − pnx sin θn ]
can express as LP
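A cvxpy sketch of the fuel LP; the thruster geometry below (tangential thrusters on a unit circle) is invented so that the target f_des is feasible:

import numpy as np
import cvxpy as cp

ang = np.array([0.0, np.pi/2, np.pi, 3*np.pi/2])   # positions on unit circle (invented)
px, py = np.cos(ang), np.sin(ang)
theta = ang + np.pi/2                              # thrust directions: tangential
F = np.vstack([np.cos(theta), np.sin(theta),
               py*np.cos(theta) - px*np.sin(theta)])
f_des = np.array([0.2, 0.1, -0.5])                 # desired (Fx, Fy, T), invented

u = cp.Variable(4)
cp.Problem(cp.Minimize(cp.sum(u)), [F @ u == f_des, u >= 0, u <= 1]).solve()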
minimize ‖Fu − f_des‖∞
subject to 0 ≤ ui ≤ 1, i = 1, . . . , n
can express as LP
minimize # thrusters on
subject to Fu = f_des
0 ≤ ui ≤ 1, i = 1, . . . , n
can't express as LP
(but we could check feasibility for each of the 2ⁿ subsets of thrusters)
‖u(t)‖∞ ≤ 1, t = 0, 1, . . . , K
• settling time:
three unit masses, connected by two unit springs with equilibrium length
one
[figure: three masses connected by springs, with applied forces u1(t), u2(t)]
[figures: optimal forces u1(t), u2(t) and resulting positions of the left, center, and right masses versus t]
power allocation pi for transmitter i:
maximize min_{i,j} Aiji pi / ( Σ_{k≠i} Aijk pk + Nij )
subject to 0 ≤ pi ≤ pmax
antenna elements at positions (xi, yi)
[figure: antenna array with plane wave incident from angle θ]
• unit plane wave incident from angle θ induces in the ith element a signal
e^(j(xi cos θ + yi sin θ − ωt))
(j = √−1, frequency ω, wavelength 2π)
• array output, with complex weights wi:
y(θ) = Σ_{i=1}^n wi e^(j(xi cos θ + yi sin θ))
sidelobe level minimization: make |y(θ)| small outside a beam of width 10° around the target direction
[figure: array pattern |y(θ)| with target direction and sidelobe level marked]
minimize t
subject to |y(θi)| ≤ t, θi outside the beam
y(θtar) = 1
• minimize σ² Σ_{i=1}^n |wi|² (thermal noise power in y)
nonconvex extension:
• maximize number of zero weights
• N transmitter frequencies 1, . . . , N
• transmitters at locations ai, bi ∈ R² use frequency i
• transmitters at a1, a2, . . . , aN are the wanted ones
• transmitters at b1, b2, . . . , bN are interfering
• receiver at position x ∈ R²
[figure: wanted transmitters a1, a2, a3, interfering transmitters b1, b2, b3, and receiver at x]
S/I = min_i ‖x − ai‖^(−α) / ‖x − bi‖^(−α)
S/I ≥ 1 region:
{x | ‖x − ai‖ ≤ ‖x − bi‖, i = 1, . . . , N}
[figure: region where S/I ≥ 1, bounded by the bisector hyperplanes between each ai and bi]
c = Cp, Cij ≥ 0
is a geometric program
[figure: structure with load forces f1, . . . , f4]
• fundamental frequency:
ω1 = λmin(K, M)^(1/2) = λmin(M^(−1/2) K M^(−1/2))^(1/2)
• structure weight w = w0 + Σi xi wi
as SDP:
minimize w0 + Σi xiwi
subject to Ω²M(x) − K(x) ⪯ 0
li ≤ xi ≤ ui
• FIR filters
• Chebychev design
• equalizer design
FIR filters
y(t) = Σ_{τ=0}^{n−1} hτ u(t − τ), t ∈ Z
Filter design 2
filter frequency response: H : R → C,
H(ω) = h0 + h1 e^(−jω) + · · · + h_(n−1) e^(−j(n−1)ω)
• j means √−1 here (EE tradition)
• H is periodic and conjugate symmetric, so we only need to know/specify it for 0 ≤ ω ≤ π
Filter design 3
example: (lowpass) FIR filter, order n = 21
impulse response h:
[figure: impulse response h(t), t = 0, . . . , 20]
Filter design 4
frequency response magnitude (i.e., |H(ω)|):
[figure: |H(ω)| on a log scale, 0 ≤ ω ≤ π]
frequency response phase (i.e., ∠H(ω)):
[figure: ∠H(ω), 0 ≤ ω ≤ π]
Filter design 5
Chebychev design
minimize max_{ω∈[0,π]} |H(ω) − Hdes(ω)|
• h is the optimization variable
• Hdes : R → C is the (given) desired transfer function
• a convex problem
• can add constraints, e.g., |hi| ≤ 1
sample (discretize) frequency: ω1, . . . , ωm ∈ [0, π]
Filter design 6
Chebychev design via SOCP:
minimize t
subject to ‖A(k)h − b(k)‖ ≤ t, k = 1, . . . , m
where
A(k) = [ 1  cos ωk  · · ·  cos(n−1)ωk ; 0  −sin ωk  · · ·  −sin(n−1)ωk ]
b(k) = [ Re Hdes(ωk) ; Im Hdes(ωk) ]
h = (h0, . . . , h_(n−1))
Filter design 7
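A cvxpy sketch of the discretized Chebychev design, building A(k), b(k) as above; the desired response here (a pure 10-sample delay) is chosen only for illustration:

import numpy as np
import cvxpy as cp

n, m = 21, 120
w = np.linspace(0, np.pi, m)
Hdes = np.exp(-1j * 10 * w)          # hypothetical desired response: pure delay

tt = np.arange(n)
h, t = cp.Variable(n), cp.Variable()
cons = []
for k in range(m):
    A_k = np.vstack([np.cos(w[k] * tt), -np.sin(w[k] * tt)])   # [Re; Im] of H(w_k)
    b_k = np.array([Hdes[k].real, Hdes[k].imag])
    cons.append(cp.norm(A_k @ h - b_k, 2) <= t)
cp.Problem(cp.Minimize(t), cons).solve()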
Linear phase filters
suppose
• n = 2N + 1 is odd
• impulse response is symmetric about midpoint:
ht = h_(n−1−t), t = 0, . . . , n − 1
then
H(ω) = h0 + h1e^(−jω) + · · · + h_(n−1)e^(−j(n−1)ω) = e^(−jNω) H̃(ω)
where H̃(ω) = hN + 2h_(N+1) cos ω + · · · + 2h_(2N) cos Nω is real-valued
Filter design 8
• term e^(−jNω) represents an N-sample delay
• H̃(ω) is real
• |H̃(ω)| = |H(ω)|
• called linear phase filter (∠H(ω) is linear except for jumps of ±π)
Filter design 9
Lowpass filter specifications
[figure: magnitude specification: |H(ω)| between 1/δ1 and δ1 on the passband, below δ2 on the stopband; band edges ωp, ωs, π]
idea:
• pass frequencies in passband [0, ωp]
• block frequencies in stopband [ωs, π]
Filter design 10
specifications:
• maximum passband ripple (±20 log10 δ1 in dB):
1/δ1 ≤ |H(ω)| ≤ δ1, 0 ≤ ω ≤ ωp
• minimum stopband attenuation (−20 log10 δ2 in dB):
|H(ω)| ≤ δ2, ωs ≤ ω ≤ π
Filter design 11
Linear phase lowpass filter design
• sample frequency
• can assume w.l.o.g. H̃(0) > 0, so the ripple spec is
1/δ1 ≤ H̃(ωk) ≤ δ1
design for maximum stopband attenuation:
minimize δ2
subject to 1/δ1 ≤ H̃(ωk) ≤ δ1, 0 ≤ ωk ≤ ωp
−δ2 ≤ H̃(ωk) ≤ δ2, ωs ≤ ωk ≤ π
Filter design 12
• passband ripple δ1 is given
• an LP in variables h, δ2
Filter design 13
example
• linear phase filter, n = 21
• passband [0, 0.12π]; stopband [0.24π, π]
• max ripple δ1 = 1.012 (±0.1 dB)
• design for maximum stopband attenuation
impulse response h:
[figure: impulse response h(t), t = 0, . . . , 20]
Filter design 14
frequency response magnitude (i.e., |H(ω)|):
[figure: |H(ω)| on a log scale]
frequency response phase (i.e., ∠H(ω)):
[figure: ∠H(ω)]
Filter design 15
Equalizer design
[diagram: unequalized system G(ω) followed by equalizer H(ω)]
equalization: given
• G (unequalized frequency response)
• Gdes (desired frequency response)
design (FIR equalizer) H so that G̃ = GH ≈ Gdes
Filter design 16
Chebychev equalizer design:
minimize max_{ω∈[0,π]} |G̃(ω) − Gdes(ω)|
Filter design 17
time-domain equalization: optimize impulse response g̃ of equalized system
e.g., with Gdes(ω) = e^(−jDω),
gdes(t) = 1 for t = D, 0 for t ≠ D
sample design:
minimize max_{t≠D} |g̃(t)|
subject to g̃(D) = 1
• an LP
• can use Σ_{t≠D} g̃(t)² or Σ_{t≠D} |g̃(t)| instead
Filter design 18
extensions:
• can impose (convex) constraints
G(k)H ≈ Gdes, k = 1, . . . , K
Filter design 19
Equalizer design example
[figure: impulse response g(t) of the unequalized system]
Filter design 20
[figures: unequalized frequency response magnitude |G(ω)| and phase ∠G(ω)]
design 30th order FIR equalizer with G̃(ω) ≈ e^(−j10ω)
Filter design 21
Chebychev equalizer design:
minimize max_ω |G̃(ω) − e^(−j10ω)|
[figure: resulting equalized impulse response g̃(t)]
Filter design 22
[figures: equalized frequency response magnitude |G̃(ω)| and phase ∠G̃(ω)]
Filter design 23
time-domain equalizer design:
[figure: resulting equalized impulse response g̃(t)]
Filter design 24
[figures: equalized frequency response magnitude |G̃(ω)| and phase ∠G̃(ω)]
Filter design 25
Filter magnitude specifications
Filter design 26
Autocorrelation coefficients
autocorrelation coefficients associated with impulse response h = (h0, . . . , h_(n−1)) ∈ Rn are
rt = Σ_τ hτ hτ+t
Filter design 27
Fourier transform of autocorrelation coefficients is
R(ω) = Σ_τ e^(−jωτ) rτ = r0 + 2 Σ_{t=1}^{n−1} rt cos ωt = |H(ω)|²
. . . convex in r
Filter design 28
Spectral factorization
Filter design 29
log-Chebychev magnitude design
choose h to minimize the maximum deviation of 20 log10 |H(ω)| from 20 log10 D(ω); equivalent to
minimize t
subject to D(ω)²/t ≤ R(ω) ≤ t D(ω)², 0 ≤ ω ≤ π
• convex in variables r, t
• constraint includes spectral factorization condition
Filter design 30
example: 1/f (pink noise) filter (i.e., D(ω) = 1/√ω), n = 50,
log-Chebychev design over 0.01π ≤ ω ≤ π
[figure: |H(ω)|² versus ω on log-log axes, following 1/ω]
Filter design 31
Disciplined convex programming
and the cvx modeling framework
Michael C. Grant
Information Systems Laboratory
Stanford University
Today’s topics:
• Several specific classes of convex programs: LPs, SDPs, GPs, SOCPs... hopefully
you can identify a problem from one of these classes when you see one
Solvers typically handle only a certain class of problems, such as LPs, or SDPs, or GPs
Most problems do not immediately present themselves in standard form, so they must
be transformed into standard form
minimize cT x
subject to Ax ⪯ b
Aeq x = beq
ℓ ⪯ x ⪯ u
As standard forms go, this one is quite flexible: in fact, the first step linprog often
does is to convert this problem into a more restricted standard form!
x free =⇒ x = x+ − x−, x+ ≥ 0, x− ≥ 0
aT x ≤ b =⇒ aT x + s = b, s ≥ 0
aT x = b =⇒ aT x ≤ b, aT x ≥ b
x = sedumi( A, b, c, K )
minimize cT x
subject to Ax = b
x ∈ K , K 1 × K2 × · · · × K L
where each set Ki ⊆ Rni , i = 1, 2, . . . , L is chosen from a very short list of cones (see
next slide...)
The Matlab variable K gives the number, types, and dimensions of the cones Ki
Qn ≜ { (x, y) ∈ Rn × R | ‖x‖2 ≤ y }
Qnc ≜ { (x, y) ∈ Cn × R | ‖x‖2 ≤ y }
The cones must be arranged in this order: i.e., the free variables first, then the
nonnegative orthants, then the second-order cones, then the semidefinite cones
rk ≜ akᵀx − bk, k = 1, 2, . . . , m
Obviously, the value of x∗ depends significantly upon the choice of that norm...
minimize ‖Ax − b‖2 = ( Σ_{k=1}^m (akᵀx − bk)² )^(1/2)
No need to use linprog or SeDuMi here: this is a least squares problem, with an
analytic solution
x∗ = (AT A)−1AT b
>> x = A \ b;
ℓ∞ (Chebyshev) approximation: minimize ‖Ax − b‖∞; LP formulation:
minimize q
subject to −q1 ≤ Ax − b ≤ +q1
=⇒ minimize [0; 1]ᵀ[x; q] subject to [ A  −1 ; −A  −1 ][x; q] ≤ [b; −b]
ℓ1 approximation: minimize ‖Ax − b‖1; LP formulation:
minimize 1ᵀw
subject to −w ≤ Ax − b ≤ w
=⇒ minimize [0; 1]ᵀ[x; w] subject to [ A  −I ; −A  −I ][x; w] ≤ [b; −b]
Let |w|[k] represent the k-th largest element (in magnitude) of w ∈ Rm:
minimize 1T v + Lq
subject to −v − 1q ≤ Ax − b ≤ +v + 1q
v≥0
This is not much more complex than the `1 and `∞ cases—but of course, you have to
know this trick (and few do)
sl, su ∈ Rn+ are slack variables, which are used quite often to convert inequalities to
equations
People don’t simply write down optimization problems and hope that they are convex;
instead, they draw from a “mental library” of functions and sets with known convexity
properties, and combine them in ways that convex analysis guarantees will produce
convex results...
• A ruleset which governs how those atoms can be used and combined to form a valid
problem
These rules constitute a set of sufficient conditions for convexity. For example,
is convex, but does not obey the ruleset, because it is the composition of a concave
nondecreasing function and a convex function
Obviously, the usefulness of disciplined convex programming depends upon the size of
the atom library available to us... or at least, on the ability to expand that library as
needed
The manner in which atoms are implemented depends heavily on the capabilities of the
underlying solver
Currently, cvx uses SeDuMi, whose standard form permits four constraint types:
• Equality constraints
• Inequality constraints
• Second-order cone constraints
• Linear matrix inequalities
f(x) ≜ |x|: epi f = { (x, y) | x ≤ y, −x ≤ y }
[figure: epigraph of |x|, the intersection of two halfspaces]
Key observation: for the purposes of convex optimization, the epigraph/hypograph can
be used to provide a complete description of a function
Graph implementations are, in effect, rules for “rewriting” a function in terms of other
atoms by describing its epigraph or hypograph
f(x) ≜ ‖x‖1 = inf{ 1ᵀv | −v ≤ x ≤ v }
SeDuMi does not even support the basic convex function f (x) = x2—or does it?
x² ≤ y ⇐⇒ [ y  x ; x  1 ] ⪰ 0
Of course, if SeDuMi were replaced with a different solver, this implementation might
not be necessary
For example, the square root function can be implemented using square:
f(x) ≜ √x ≥ y =⇒ y² ≤ x
If X is symmetric:
λmin(X) ≥ y ⇐⇒ X − yI 0
λmax(X) ≤ y ⇐⇒ yI − X 0
f : Rn × Sn → R, f(x, Y) ≜ xᵀY⁻¹x if Y ≻ 0, +∞ otherwise
cvx actually supports a more general concept we call incomplete specifications, which
is
Graph implementations are just special cases of this more general framework
[figures: Huber penalty function versus the quadratic; robust regression example with data points, the Huber penalty fit, and the regular LS fit]
Useful for robust regression, to suppress outliers
f(x, y) ≜ inf{ ‖x − y‖ | y ∈ E }
E ≜ { y ∈ Rn | ‖P(y − yc)‖2 ≤ 1 }