A First Look at Duality

It is time for our first look at one of the most important concepts in convex optimization (and in all of convex analysis): duality. With the separating and supporting hyperplane theorems, we have had a glimpse of how hyperplanes and halfspaces interact with convex sets. In this section, we will take this even further, and show that for our closest point problem, the search for the optimal vector (a point in R^N) can be equivalently written as a search for a hyperplane (a linear functional on R^N).

First, let me point out something which should be almost obvious at this point:

A closed convex set is equal to the intersection of (the halfspaces defined by) its supporting hyperplanes.

A polygon is the intersection of a finite number of halfspaces:

A general closed convex set is the intersection of a possibly infinite
number of halfspaces:

That this is true can be deduced almost immediately from work we have done already. Let H_C be the intersection of the halfspaces defined by the supporting hyperplanes:

    H_C = ⋂_{z ∈ bd C} ⋂_{(a,b) ∈ A(z)} {x : a^T x ≤ b},

where

    A(z) = {set of supporting hyperplanes at z}
         = {(a, b) : {x : a^T x = b} is a supporting hyperplane at z}.
If x ∈ C, then x is also in every halfspace in the intersection above, simply by the definition of “supporting hyperplane”; thus x ∈ C ⇒ x ∈ H_C. If x ∉ C, then there is at least one halfspace which does not contain x; in particular, we can choose the hyperplane that supports C at P_C(x) with normal vector x − P_C(x). Thus x ∉ C ⇒ x ∉ H_C.
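
To make this concrete, here is a minimal numerical sketch, assuming C is the closed unit disk in R², so that the projection of an exterior point is simply P_C(x) = x/‖x‖₂. It builds the supporting hyperplane at P_C(x) with normal x − P_C(x) and checks that the corresponding halfspace contains C but excludes x:

```python
import numpy as np

# Assumption: C is the closed unit disk in R^2, so P_C(x) = x / ||x||_2
# for any x outside C.
def project_disk(x):
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

x = np.array([2.0, 1.0])     # a point outside C
z = project_disk(x)          # P_C(x), on the boundary of C
a = x - z                    # normal vector of the supporting hyperplane
b = a @ z                    # the hyperplane {y : <a, y> = b} touches C at z

# Every point of C satisfies <a, y> <= b ...
Y = np.random.randn(10000, 2)
Y /= np.maximum(1.0, np.linalg.norm(Y, axis=1))[:, None]   # scale samples into C
print(np.max(Y @ a) <= b + 1e-9)    # True: C lies inside the halfspace

# ... but x itself does not, so this halfspace certifies x is not in H_C.
print(a @ x > b)                    # True: x is excluded
```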

Moral: A (closed) convex set can be thought of as a collection of points, or as a collection of hyperplanes.

To continue, it will be critical for you to appreciate that vectors play two distinct roles when we talk about convexity. The usual role (the “primal” role, if you will) is as a point (a set of coordinates) in R^N. The other role (the “dual” role) is as a linear functional: any a ∈ R^N defines a functional f(v) = ⟨v, a⟩ = a^T v over v ∈ R^N. When coupled with a scalar, these functionals can be used to define subsets of R^N (hyperplanes and halfspaces).

To develop this appreciation, let’s look at another way to describe a convex set that is closely related to the infinite-intersection-of-halfspaces interpretation. The support functional of a convex set C is

    h(a) = sup_{x ∈ C} ⟨x, a⟩.

This is a function that takes linear functionals (which again are just vectors) as arguments, and returns the maximum¹ value of that linear functional over C. Geometrically, we take the halfspace {x : ⟨x, a⟩ ≤ b} and see how large we need to make b so that it contains C (of course, b might be negative).

Note that scaling a scales h(a) by the same amount: for all β ≥ 0, we have h(βa) = β h(a).

¹ We are writing sup instead of max here since, if C is not bounded, it may very well be that h(a) = ∞. If C is closed and bounded, then h(a) is finite and there is at least one point in C that achieves the maximum.
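
As a quick illustration, here is a sketch assuming C is a polytope specified by a list of vertices; a linear functional over a polytope is always maximized at a vertex, so h(a) reduces to a max over that list:

```python
import numpy as np

# h(a) for a polytope C = conv(v_1, ..., v_k): the sup of a linear
# functional over a polytope is attained at a vertex, so we take the
# largest <v_i, a> over the rows of the vertex matrix V.
def support(V, a):
    return np.max(V @ a)

V = np.array([[1.0, 1.0], [3.0, 1.0], [2.0, 3.0]])  # a triangle in R^2
a = np.array([0.0, 1.0])
print(support(V, a))        # 3.0, so {x : <x, a> <= 3} just contains C
print(support(V, 2 * a))    # 6.0, illustrating h(beta a) = beta h(a)
```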

Examples:

[Figure: a convex set C in R², shown three times with different unit vectors a and the corresponding supporting hyperplanes {x : ⟨x, a⟩ = h(a)}:
  a = (1/√2)(1, 1)ᵀ gives h(a) = 3.89;
  a = (0, 1)ᵀ gives h(a) = 3;
  a = −(1/√2)(1, 1)ᵀ gives h(a) = −1.41.]

Problem: Find h(a) = sup_{x ∈ C} ⟨x, a⟩ for the following sets:

1. C = {x : ‖x‖₂ ≤ 1}

2. C = {x : ‖x‖₁ ≤ 1}

3. C = {x : ‖x‖∞ ≤ 1}
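
If you want to sanity-check your answers, here is a rough numerical sketch: it scales random directions onto the boundary of each ball (where the sup is attained) and maximizes ⟨x, a⟩ over those samples, approaching the true h(a) from below:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(3)

# Estimate h(a) = sup_{x in C} <x, a> for each norm ball by scaling
# random directions onto the unit sphere of that norm.
G = rng.standard_normal((200000, 3))
for p in (2, 1, np.inf):
    X = G / np.linalg.norm(G, ord=p, axis=1, keepdims=True)
    print(p, np.max(X @ a))   # compare with your closed-form answer at this a
```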

We can use the support functional to recast the closest point problem as a search for a linear functional (i.e., a hyperplane or halfspace). We will see that the projection of x₀ onto C can be found by searching over the hyperplanes which separate x₀ from C and choosing the one which is at maximum distance from x₀:

[Figure: a convex set C, an external point x₀, and several hyperplanes separating x₀ from C.]

This maximal hyperplane will support C at the solution P_C(x₀).

We will now make this geometrically intuitive fact mathematically precise. Let C be a closed and convex set, let x₀ be a point in R^N, and let x̂ = P_C(x₀) be the projection of x₀ onto C. We will show that the distance from x₀ to C can be written in two different ways:

    d := ‖x₀ − P_C(x₀)‖₂
       = min_{x ∈ C} ‖x₀ − x‖₂
       = max_{a ∈ R^N} ⟨x₀, a⟩ − h(a)   subject to ‖a‖₂ ≤ 1.

We will show this first for the case where x₀ = 0; the result for general x₀ will then follow easily just by shifting everything relative to x₀. We want to show that the distance from C to the origin,

    d := min_{x ∈ C} ‖x‖₂,

can also be computed as

    d = max_{‖a‖₂ ≤ 1} −h(a).

We will assume that 0 ∉ C, as otherwise the optimization problem is trivial.

We will begin by first showing that d ≥ −h(a) for any a with ‖a‖₂ = 1. For any x ∈ C, we have

    ‖x‖₂ = max_{‖v‖₂ = 1} ⟨x, v⟩ ≥ ⟨x, −a⟩ ≥ −h(a).

The last inequality follows from the fact that, by the definition of h(a), we have ⟨x, a⟩ ≤ h(a) for all x ∈ C, which implies that ⟨x, −a⟩ ≥ −h(a). In words, this means that x is a distance of at least −h(a) from the origin. This holds for every point in C, so

    d ≥ −h(a).

The inequality above holds uniformly over all unit-norm a, thus

    d ≥ max_{‖a‖₂ = 1} −h(a).

We will now show that there exists an â such that d ≤ −h(â); combined with the result above, this shows that d = −h(â). We know by the projection theorem that there is a unique closest point to the origin, which we call x̂. Moreover, recall that this x̂ must satisfy the obtuseness property:

    ⟨x − x̂, x₀ − x̂⟩ ≤ 0   for all x ∈ C.

In our case, x₀ = 0, which yields

    ⟨x − x̂, −x̂⟩ ≤ 0   ⇒   ⟨x, −x̂/‖x̂‖₂⟩ ≤ −‖x̂‖₂ = −d,

for all x ∈ C. Setting â = −x̂/‖x̂‖₂, this implies that

    h(â) = sup_{x ∈ C} ⟨x, â⟩ ≤ −d,

which yields d ≤ −h(â).

The following picture might help you visualize the argument above:

[Figure: the set C, the origin, and the ball B_d of radius d centered at the origin; the hyperplane {x : ⟨x, a*⟩ = −d}, with normal â = a*, separates the two and supports C at its closest point to the origin.]
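
Here is a quick numerical check of the identity d = max_{‖a‖₂ ≤ 1} −h(a); it is a sketch assuming C is a disk of radius r centered at c with 0 ∉ C, for which both sides have closed forms (min_{x ∈ C} ‖x‖₂ = ‖c‖₂ − r and h(a) = ⟨c, a⟩ + r‖a‖₂):

```python
import numpy as np

rng = np.random.default_rng(1)
c, r = np.array([3.0, 2.0]), 1.0   # disk C = {x : ||x - c||_2 <= r}, 0 not in C

def h(a):                           # support functional of the disk
    return c @ a + r * np.linalg.norm(a)

d_primal = np.linalg.norm(c) - r    # distance from the origin to C

# Dual: maximize -h(a) over ||a||_2 <= 1; here the maximum is attained on
# the unit sphere, so we scan many random unit-norm directions.
A = rng.standard_normal((200000, 2))
A /= np.linalg.norm(A, axis=1, keepdims=True)
d_dual = np.max(-(A @ c + r))       # -h(a) for each unit-norm row a

print(d_primal, d_dual)             # both approximately ||c||_2 - r = 2.6056
```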

We have shown that we can write the “closest point to the origin” problem in two ways:

    min_{x ∈ C} ‖x‖₂ = max_{‖a‖₂ ≤ 1} −h(a).

For the general closest point problem, we use a simple change of variables:

    min_{x ∈ C} ‖x − x₀‖₂ = min_{x′ ∈ C − x₀} ‖x′‖₂
                          = max_{‖a‖₂ ≤ 1} − sup_{x′ ∈ C − x₀} ⟨x′, a⟩
                          = max_{‖a‖₂ ≤ 1} − sup_{x ∈ C} ⟨x − x₀, a⟩
                          = max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − sup_{x ∈ C} ⟨x, a⟩
                          = max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a).

It is also true that the â which maximizes the dual will be aligned with the error x̂ − x₀ of the primal; with

    x̂ = arg min_{x ∈ C} ‖x − x₀‖₂,    â = arg max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a),

we have

    â = (x₀ − x̂)/‖x₀ − x̂‖₂.

Note that for this problem, the uniqueness of the primal solution x̂ also implies the uniqueness of the dual solution â.
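
Continuing the disk example above (a sketch with the assumed closed form P_C(x₀) = c + r(x₀ − c)/‖x₀ − c‖₂ for x₀ outside the disk), we can check that the aligned â closes the duality gap, i.e. that the primal and dual values agree:

```python
import numpy as np

c, r = np.array([3.0, 2.0]), 1.0    # disk C = {x : ||x - c||_2 <= r}
x0 = np.array([-1.0, 0.5])          # a point outside C

def h(a):                            # support functional of the disk
    return c @ a + r * np.linalg.norm(a)

# Primal solution: project x0 onto the disk.
xhat = c + r * (x0 - c) / np.linalg.norm(x0 - c)
primal = np.linalg.norm(x0 - xhat)

# Dual candidate predicted by the alignment property.
ahat = (x0 - xhat) / np.linalg.norm(x0 - xhat)
dual = x0 @ ahat - h(ahat)

print(primal, dual)                  # equal: <x0, ahat> - h(ahat) = ||x0 - xhat||_2
```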

Summary: Closest Point Duality

Let C be a closed convex set, and let x₀ ∈ R^N be a fixed point. On the previous pages, we have established:

1. For all x ∈ C and a with ‖a‖₂ ≤ 1,

       ‖x − x₀‖₂ ≥ ⟨x₀, a⟩ − h(a),   where   h(a) = sup_{y ∈ C} ⟨y, a⟩.

   This bound holds uniformly for feasible x and a, so

       min_{x ∈ C} ‖x − x₀‖₂ ≥ max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a).

   We call the minimization program on the left the primal and the maximization program on the right the dual.

2. There is a pair of points where equality of the functionals above is achieved, so

       min_{x ∈ C} ‖x − x₀‖₂ = max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a).

   Moreover, the minimum/maximum are achieved by unique x̂ and â, and so

       ‖x̂ − x₀‖₂ = ⟨x₀, â⟩ − h(â).

3. The dual optimal point (which is a normal vector for a hyperplane) is aligned with the error:

       â = (x₀ − x̂)/‖x₀ − x̂‖₂.

   This gives us the optimality condition for x̂:

       ‖x̂ − x₀‖₂² = ⟨x₀, x₀ − x̂⟩ − h(x₀ − x̂).
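
The optimality condition in item 3 is easy to test numerically. Here is a minimal sketch of a generic checker (the function name and interface are my own; it takes the support functional h as an argument) that you can reuse on the problems that follow:

```python
import numpy as np

def check_optimality(x0, xhat, h, tol=1e-9):
    """Test whether ||xhat - x0||^2 == <x0, x0 - xhat> - h(x0 - xhat)."""
    e = x0 - xhat               # the (unnormalized) error vector
    return abs(e @ e - (x0 @ e - h(e))) < tol

# Example: C is the unit Euclidean ball, so h(a) = ||a||_2 and the
# projection of an exterior point x0 is x0 / ||x0||_2.
x0 = np.array([3.0, 4.0])
print(check_optimality(x0, x0 / np.linalg.norm(x0), np.linalg.norm))  # True
print(check_optimality(x0, np.array([1.0, 0.0]), np.linalg.norm))     # False
```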

Let’s take one more look at the dual problem,

    max_{‖a‖₂ ≤ 1} ⟨x₀, a⟩ − h(a),

and see how we can interpret it as finding the maximum distance from x₀ to a hyperplane that separates x₀ and C. We know that the distance from x₀ to a hyperplane {x : ⟨x, a⟩ = b}, when ‖a‖₂ = 1, is |⟨x₀, a⟩ − b|. The quantity h(a) is the minimal value of b for which such a hyperplane still separates x₀ and C (any smaller, and the halfspace {x : ⟨x, a⟩ ≤ b} would no longer contain C), so

    ⟨x₀, a⟩ − h(a)

is the distance of x₀ to the furthest separating hyperplane that has a as a normal vector (again, with ‖a‖₂ = 1). Thus taking the maximum over all such normal vectors is a search for the separating hyperplane that is furthest from x₀.

[Figure: two sketches of C and x₀ with separating hyperplanes for different normal directions a; the dual seeks the hyperplane furthest from x₀.]
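
A small numerical illustration of this reading, using the same assumed disk C as before: for a unit normal a pointing from C toward x₀, every level b between h(a) and ⟨x₀, a⟩ gives a separating hyperplane, and the distance ⟨x₀, a⟩ − b is largest at b = h(a):

```python
import numpy as np

c, r = np.array([3.0, 2.0]), 1.0    # disk C = {x : ||x - c||_2 <= r}
x0 = np.array([-1.0, 0.5])

def h(a):
    return c @ a + r * np.linalg.norm(a)

a = (x0 - c) / np.linalg.norm(x0 - c)   # unit normal pointing from C toward x0

# {x : <x, a> = b} separates x0 from C exactly when h(a) <= b <= <x0, a>;
# its distance from x0 is <x0, a> - b, which shrinks as b grows.
for b in np.linspace(h(a), x0 @ a, 5):
    print(b, x0 @ a - b)

print(x0 @ a - h(a))                    # the largest distance with this normal
```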

 
Problem: Let C = {x ∈ R² : ‖x‖₁ ≤ 1}. Set x₀ = (1, 3)ᵀ. Find the closest point x̂ ∈ C to x₀ (you can do this by inspection after sketching C and x₀). Calculate â as above, and verify that

    ‖x₀ − x̂‖₂ = ⟨x₀, â⟩ − h(â).

Problem: Given x₀, describe how to solve

    min_x ‖x₀ − x‖₂   subject to   ‖x‖∞ ≤ 1.

This can be solved using your intuition. After a moment’s thought, you know the solution x̂ must be given by

    x̂[n] = {

Show that the optimality conditions hold for your solution:

    ‖x₀ − x̂‖₂² = ⟨x₀, x₀ − x̂⟩ − h(x₀ − x̂).

Problem: Given x₀, describe how to solve

    min_x ‖x₀ − x‖₂   subject to   ‖x‖₁ ≤ 1.

Start by writing down the optimality conditions, and then think hard about what they mean.
