A First Look at Duality

It is time for our first look at one of the most important concepts in convex optimization (and in all of convex analysis): duality. With the separating and supporting hyperplane theorems, we have had a glimpse of how hyperplanes and halfspaces interact with convex sets. In this section, we will take this even further, and show that for our closest point problem, the search for the optimal vector (a point in R^N) can be equivalently written as a search for a hyperplane (a linear functional on R^N).

First, let me point out something which should be almost obvious at this point:

A closed convex set is equal to the intersection of (the halfspaces defined by) its supporting hyperplanes.

A polygon is the intersection of a finite number of halfspaces:

A general closed convex set is the intersection of a possibly infinite
number of halfspaces:

That this is true can be deduced almost immediately from work we have done already. Let H_C be the intersection of the halfspaces defined by the supporting hyperplanes:

    H_C = ⋂_{z ∈ bd C} ⋂_{(a,b) ∈ A(z)} {x : a^T x ≤ b},

where

    A(z) = {set of supporting hyperplanes at z}
         = {(a, b) : {x : a^T x = b} is a supporting hyperplane at z}.
If x ∈ C, then x is also in every halfspace in the intersection above, simply by the definition of “supporting hyperplane”; thus x ∈ C ⇒ x ∈ H_C. If x ∉ C, then there is at least one halfspace which does not contain x; in particular, we can choose the hyperplane that supports C at P_C(x) with normal vector x − P_C(x). Thus x ∉ C ⇒ x ∉ H_C.
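
To make this concrete, here is a minimal numerical sketch, assuming C is the closed unit disk in R², so that the projection of an exterior point is simply P_C(x) = x/‖x‖₂. It builds the supporting hyperplane at P_C(x) with normal x − P_C(x) and checks that the corresponding halfspace contains C but excludes x:

```python
import numpy as np

# Assumption: C is the closed unit disk in R^2, so P_C(x) = x / ||x||_2
# for any x outside C.
def project_disk(x):
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

x = np.array([2.0, 1.0])     # a point outside C
z = project_disk(x)          # P_C(x), on the boundary of C
a = x - z                    # normal vector of the supporting hyperplane
b = a @ z                    # the hyperplane {y : <a, y> = b} touches C at z

# Every point of C satisfies <a, y> <= b ...
Y = np.random.randn(10000, 2)
Y /= np.maximum(1.0, np.linalg.norm(Y, axis=1))[:, None]   # scale samples into C
print(np.max(Y @ a) <= b + 1e-9)    # True: C lies inside the halfspace

# ... but x itself does not, so this halfspace certifies x is not in H_C.
print(a @ x > b)                    # True: x is excluded
```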

Moral: A (closed) convex set can be thought of as a collection of points, or as a collection of hyperplanes.

To continue, it will be critical for you to appreciate that vectors play two distinct roles when we talk about convexity. The usual role (the “primal” role, if you will) is as a point (a set of coordinates) in R^N. The other role (the “dual” role) is as a linear functional: any a ∈ R^N defines a functional f(v) = ⟨v, a⟩ = a^T v over v ∈ R^N. When coupled with a scalar, these functionals can be used to define subsets of R^N (hyperplanes and halfspaces).

To develop this appreciation, let’s look at another way to describe a convex set that is closely related to the infinite-intersection-of-halfspaces interpretation. The support functional of a convex set C is

    h(a) = sup_{x ∈ C} ⟨x, a⟩.

This is a function that takes linear functionals (which again are just vectors) as arguments, and returns the maximum¹ value of that linear functional over C. Geometrically, we take the halfspace {x : ⟨x, a⟩ ≤ b} and see how large we need to make b so that it contains C (of course, b might be negative).

Note that scaling a scales h(a) by the same amount: for all β ≥ 0, we have h(βa) = β h(a).

¹ We are writing sup instead of max here since, if C is not bounded, it may very well be that h(a) = ∞. If C is closed and bounded, then h(a) is finite and there is at least one point in C that achieves the maximum.
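
As a quick illustration, here is a sketch assuming C is a polytope specified by a list of vertices; a linear functional over a polytope is always maximized at a vertex, so h(a) reduces to a max over that list:

```python
import numpy as np

# h(a) for a polytope C = conv(v_1, ..., v_k): the sup of a linear
# functional over a polytope is attained at a vertex, so we take the
# largest <v_i, a> over the rows of the vertex matrix V.
def support(V, a):
    return np.max(V @ a)

V = np.array([[1.0, 1.0], [3.0, 1.0], [2.0, 3.0]])  # a triangle in R^2
a = np.array([0.0, 1.0])
print(support(V, a))        # 3.0, so {x : <x, a> <= 3} just contains C
print(support(V, 2 * a))    # 6.0, illustrating h(beta a) = beta h(a)
```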

Examples:

[Figure: a convex set C in R², shown three times with different unit vectors a and the corresponding supporting hyperplanes {x : ⟨x, a⟩ = h(a)}:
  a = (1/√2)(1, 1)ᵀ gives h(a) = 3.89;
  a = (0, 1)ᵀ gives h(a) = 3;
  a = −(1/√2)(1, 1)ᵀ gives h(a) = −1.41.]

Problem: Find h(a) = sup_{x ∈ C} ⟨x, a⟩ for the following sets:

1. C = {x : ‖x‖₂ ≤ 1}

2. C = {x : ‖x‖₁ ≤ 1}

3. C = {x : ‖x‖∞ ≤ 1}
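
If you want to sanity-check your answers, here is a rough numerical sketch: it scales random directions onto the boundary of each ball (where the sup is attained) and maximizes ⟨x, a⟩ over those samples, approaching the true h(a) from below:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(3)

# Estimate h(a) = sup_{x in C} <x, a> for each norm ball by scaling
# random directions onto the unit sphere of that norm.
G = rng.standard_normal((200000, 3))
for p in (2, 1, np.inf):
    X = G / np.linalg.norm(G, ord=p, axis=1, keepdims=True)
    print(p, np.max(X @ a))   # compare with your closed-form answer at this a
```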

We can use the support functional to recast the closest point problem as a search for a linear functional (i.e., a hyperplane or halfspace). We will see that the projection of x₀ onto C can be found by searching over the hyperplanes which separate x₀ from C and choosing the one which is at maximum distance from x₀:

[Figure: a convex set C, an external point x₀, and several hyperplanes separating x₀ from C.]

This maximal hyperplane will support C at the solution P_C(x₀).

We will now make this geometrically intuitive fact mathematically precise. Let C be a closed and convex set, let x₀ be a point in R^N, and let x̂ = P_C(x₀) be the projection of x₀ onto C. We will show that the distance from x₀ to C can be written in two different ways:

    d := ‖x₀ − P_C(x₀)‖₂
       = min_{x ∈ C} ‖x₀ − x‖₂
       = max_{a ∈ R^N} ⟨x₀, a⟩ − h(a)   subject to ‖a‖₂ ≤ 1.

We will show this first for the case where x₀ = 0; the result for general x₀ will then follow easily just by shifting everything relative to x₀. We want to show that the distance from C to the origin,

    d := min_{x ∈ C} ‖x‖₂,

can also be computed as

    d = max_{‖a‖₂ ≤ 1} −h(a).

We will assume that 0 ∉ C, as otherwise the optimization problem is trivial.

We will begin by first showing that d ≥ −h(a) for any a with ‖a‖₂ = 1. For any x ∈ C, we have

    ‖x‖₂ = max_{‖v‖₂ = 1} ⟨x, v⟩ ≥ ⟨x, −a⟩ ≥ −h(a).

The last inequality follows from the fact that, by the definition of h(a), we have ⟨x, a⟩ ≤ h(a) for all x ∈ C, which implies that ⟨x, −a⟩ ≥ −h(a). In words, this means that x is a distance of at least −h(a) from the origin. This holds for every point in C, so

    d ≥ −h(a).

The inequality above holds uniformly over all unit-norm a, thus

    d ≥ max_{‖a‖₂ = 1} −h(a).

We will now show that there exists an â such that d ≤ −h(â); combined with the result above, this shows that d = −h(â). We know by the projection theorem that there is a unique closest point to the origin, which we call x̂. Moreover, recall that this x̂ must satisfy the obtuseness property:

    ⟨x − x̂, x₀ − x̂⟩ ≤ 0   for all x ∈ C.

In our case, x₀ = 0, which yields

    ⟨x − x̂, −x̂⟩ ≤ 0   ⇒   ⟨x, −x̂/‖x̂‖₂⟩ ≤ −‖x̂‖₂ = −d,

for all x ∈ C. Setting â = −x̂/‖x̂‖₂, this implies that

    h(â) = sup_{x ∈ C} ⟨x, â⟩ ≤ −d,

which yields d ≤ −h(â).

The following picture might help you visualize the argument above:

[Figure: the set C, the origin, and the ball B_d of radius d centered at the origin; the hyperplane {x : ⟨x, a*⟩ = −d}, with normal â = a*, separates the two and supports C at its closest point to the origin.]
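
Here is a quick numerical check of the identity d = max_{‖a‖₂ ≤ 1} −h(a); it is a sketch assuming C is a disk of radius r centered at c with 0 ∉ C, for which both sides have closed forms (min_{x ∈ C} ‖x‖₂ = ‖c‖₂ − r and h(a) = ⟨c, a⟩ + r‖a‖₂):

```python
import numpy as np

rng = np.random.default_rng(1)
c, r = np.array([3.0, 2.0]), 1.0   # disk C = {x : ||x - c||_2 <= r}, 0 not in C

def h(a):                           # support functional of the disk
    return c @ a + r * np.linalg.norm(a)

d_primal = np.linalg.norm(c) - r    # distance from the origin to C

# Dual: maximize -h(a) over ||a||_2 <= 1; here the maximum is attained on
# the unit sphere, so we scan many random unit-norm directions.
A = rng.standard_normal((200000, 2))
A /= np.linalg.norm(A, axis=1, keepdims=True)
d_dual = np.max(-(A @ c + r))       # -h(a) for each unit-norm row a

print(d_primal, d_dual)             # both approximately ||c||_2 - r = 2.6056
```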

We have shown that we can write the “closest point to the origin” problem in two ways:

    min_{x ∈ C} ‖x‖₂ = max_{‖a‖₂ ≤ 1} −h(a).

For the general closest point problem, we use a simple change of variables:

    min_{x ∈ C} ‖x − x₀‖₂ = min_{x′ ∈ C − x₀} ‖x′‖₂
                          = max_{‖a‖₂ ≤ 1} − sup_{x′ ∈ C − x₀} ⟨x′, a⟩
                          = max_{‖a‖₂ ≤ 1} − sup_{x ∈ C} ⟨x − x₀, a⟩
                          = max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − sup_{x ∈ C} ⟨x, a⟩
                          = max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a).

It is also true that the â which maximizes the dual will be aligned with the error x̂ − x₀ of the primal; with

    x̂ = arg min_{x ∈ C} ‖x − x₀‖₂,    â = arg max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a),

we have

    â = (x₀ − x̂)/‖x₀ − x̂‖₂.

Note that for this problem, the uniqueness of the primal solution x̂ also implies the uniqueness of the dual solution â.
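
Continuing the disk example above (a sketch with the assumed closed form P_C(x₀) = c + r(x₀ − c)/‖x₀ − c‖₂ for x₀ outside the disk), we can check that the aligned â closes the duality gap, i.e. that the primal and dual values agree:

```python
import numpy as np

c, r = np.array([3.0, 2.0]), 1.0    # disk C = {x : ||x - c||_2 <= r}
x0 = np.array([-1.0, 0.5])          # a point outside C

def h(a):                            # support functional of the disk
    return c @ a + r * np.linalg.norm(a)

# Primal solution: project x0 onto the disk.
xhat = c + r * (x0 - c) / np.linalg.norm(x0 - c)
primal = np.linalg.norm(x0 - xhat)

# Dual candidate predicted by the alignment property.
ahat = (x0 - xhat) / np.linalg.norm(x0 - xhat)
dual = x0 @ ahat - h(ahat)

print(primal, dual)                  # equal: <x0, ahat> - h(ahat) = ||x0 - xhat||_2
```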

Summary: Closest Point Duality

Let C be a closed convex set, and let x₀ ∈ R^N be a fixed point. On the previous pages, we have established:

1. For all x ∈ C and a with ‖a‖₂ ≤ 1,

       ‖x − x₀‖₂ ≥ ⟨x₀, a⟩ − h(a),   where   h(a) = sup_{y ∈ C} ⟨y, a⟩.

   This bound holds uniformly for feasible x and a, so

       min_{x ∈ C} ‖x − x₀‖₂ ≥ max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a).

   We call the minimization program on the left the primal and the maximization program on the right the dual.

2. There is a pair of points where equality of the functionals above is achieved, so

       min_{x ∈ C} ‖x − x₀‖₂ = max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a).

   Moreover, the minimum/maximum are achieved by unique x̂ and â, and so

       ‖x̂ − x₀‖₂ = ⟨x₀, â⟩ − h(â).

3. The dual optimal point (which is a normal vector for a hyperplane) is aligned with the error:

       â = (x₀ − x̂)/‖x₀ − x̂‖₂.

   This gives us the optimality condition for x̂:

       ‖x̂ − x₀‖₂² = ⟨x₀, x₀ − x̂⟩ − h(x₀ − x̂).
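
The optimality condition in item 3 is easy to test numerically. Here is a minimal sketch of a generic checker (the function name and interface are my own; it takes the support functional h as an argument) that you can reuse on the problems that follow:

```python
import numpy as np

def check_optimality(x0, xhat, h, tol=1e-9):
    """Test whether ||xhat - x0||^2 == <x0, x0 - xhat> - h(x0 - xhat)."""
    e = x0 - xhat               # the (unnormalized) error vector
    return abs(e @ e - (x0 @ e - h(e))) < tol

# Example: C is the unit Euclidean ball, so h(a) = ||a||_2 and the
# projection of an exterior point x0 is x0 / ||x0||_2.
x0 = np.array([3.0, 4.0])
print(check_optimality(x0, x0 / np.linalg.norm(x0), np.linalg.norm))  # True
print(check_optimality(x0, np.array([1.0, 0.0]), np.linalg.norm))     # False
```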

Let’s take one more look at the dual problem,

    max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a),

and see how we can interpret it as finding the maximum distance from x₀ to a hyperplane that separates x₀ and C. We know that the distance from x₀ to a hyperplane {x : ⟨x, a⟩ = b}, when ‖a‖₂ = 1, is |⟨x₀, a⟩ − b|. The quantity h(a) is the minimal value of b for which such a hyperplane still separates x₀ and C (any smaller, and the halfspace {x : ⟨x, a⟩ ≤ b} would no longer contain C), so

    ⟨x₀, a⟩ − h(a)

is the distance of x₀ to the furthest separating hyperplane that has a as a normal vector (again, with ‖a‖₂ = 1). Thus taking the maximum over all such normal vectors is a search for the separating hyperplane that is furthest from x₀.

[Figure: two sketches of C and x₀ with separating hyperplanes for different normal directions a; the dual seeks the hyperplane furthest from x₀.]
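
A small numerical illustration of this reading, using the same assumed disk C as before: for a unit normal a pointing from C toward x₀, every level b between h(a) and ⟨x₀, a⟩ gives a separating hyperplane, and the distance ⟨x₀, a⟩ − b is largest at b = h(a):

```python
import numpy as np

c, r = np.array([3.0, 2.0]), 1.0    # disk C = {x : ||x - c||_2 <= r}
x0 = np.array([-1.0, 0.5])

def h(a):
    return c @ a + r * np.linalg.norm(a)

a = (x0 - c) / np.linalg.norm(x0 - c)   # unit normal pointing from C toward x0

# {x : <x, a> = b} separates x0 from C exactly when h(a) <= b <= <x0, a>;
# its distance from x0 is <x0, a> - b, which shrinks as b grows.
for b in np.linspace(h(a), x0 @ a, 5):
    print(b, x0 @ a - b)

print(x0 @ a - h(a))                    # the largest distance with this normal
```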

 
Problem: Let C = {x ∈ R² : ‖x‖₁ ≤ 1}. Set x₀ = (1, 3)ᵀ. Find the closest point x̂ ∈ C to x₀ (you can do this by inspection after sketching C and x₀). Calculate â as above, and verify that

    ‖x₀ − x̂‖₂ = ⟨x₀, â⟩ − h(â).

Problem: Given x₀, describe how to solve

    min_x ‖x₀ − x‖₂   subject to   ‖x‖∞ ≤ 1.

This can be solved using your intuition. After a moment’s thought, you know the solution x̂ must be given by

    x̂[n] = {

Show that the optimality conditions hold for your solution:

    ‖x₀ − x̂‖₂² = ⟨x₀, x₀ − x̂⟩ − h(x₀ − x̂).

Problem: Given x₀, describe how to solve

    min_x ‖x₀ − x‖₂   subject to   ‖x‖₁ ≤ 1.

Start by writing down the optimality conditions, and then think hard about what they mean.
