IE643 Lecture10 Part2 25sep2020 PDF

Deep Learning - Theory and Practice
IE 643
Lecture 10 - Part 2
September 25, 2020
P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 1 / 55

Outline
1 Convex Sets
2 Convex Functions in 1D
3 Convex Sets and Functions in Higher Dimensions
4 Strictly Convex Functions
5 Convex Optimization

Convex Sets
Convex Sets

Convex Sets
Points in a 2D space

Convex Sets
Affine combination of two points
z is an arbitrary point on the line passing through x and y .

Convex Sets
Convex combination of two points
z is an arbitrary point on the line segment connecting x and y .

Convex Sets
Convex Sets
A set C is convex if λx + (1 − λ)y ∈ C, ∀x, y ∈ C, ∀λ ∈ [0, 1].

The line segment connecting x and y in C lies entirely within C.

Convex Sets
Convex Sets
A set C is convex if λx + (1 − λ)y ∈ C, ∀x, y ∈ C, λ ∈ [0, 1].

The line segment connecting x and y in C lies entirely within C.

Convex Sets
Non-convex Sets
A set C is not convex if there exist two points x and y in C such that
λx + (1 − λ)y ∈ / C, for some λ ∈ [0, 1].
The line segment connecting x and y in C does not entirely lie within
C.

Convex Sets
Convex Sets and Convex Combination

Going beyond two points
Let x, y , z be points in a set C.
How to extend the definition of convex combination to these three
points?

Convex Sets

Let x, y , z be points in a set C.
How to extend the definition of convex combination to these three
points? (Homework!)

Convex Sets

More generally, let x 1 , x 2 , . . . , x m be m points in a set C.
How to extend the definition of convex combination to these m
points? (Homework!)

Convex Functions in 1D
Convex Functions

Convex Function - Definition
A function f : C → R, defined over a convex set C ⊆ R is called

convex if f (λy + (1 − λ)z) ≤ λf (y ) + (1 − λ)f (z), ∀y , z ∈ C,
∀λ ∈ [0, 1].


∀λ ∈ [0, 1].
Chord over-estimates the graph of function.


∀λ ∈ [0, 1].


∀λ ∈ [0, 1].

Concave Function
Concave function is a close relative of convex function.

concave if -f is convex over C.
Note: Concave functions are also defined over convex sets.
Convex Function - Characterization
Extending convex function definition to more than two points.

How to extend the definition of convex functions to a set of points
{x 1 , x 2 , . . . , x m } ⊂ C?

Convex Function - Characterization
Extending convex function definition to more than two points.

How to extend the definition of convex functions to a set of points
{x 1 , x 2 , . . . , x m } ⊂ C? (Homework !)

Convex Function - First Order Characterization
For differentiable convex functions, the tangent lines under-estimate

the graph of function.
Recall: Tangent at a point is a first-order approximation of a function.
Recall: First order approximation of a function f at y in the vicinity

of point z:
I f (y ) ≈ f (z) + (y − z)f 0 (z).

Let C ⊆ R be an open interval. Let f : C → R be a continuously

differentiable function. Then f is convex if and only if
f (y ) ≥ f (z) + (y − z)f 0 (z), ∀y , z ∈ C.
f 0 (z) is the derivative of f at z.
Note: C ⊆ R is assumed to be an open interval.
Convex Function - Second Order Characterization
Let C ⊆ R be an open interval. Let f : C → R be a twice

continuously differentiable function. Then f is convex if and only if
f 00 (x) ≥ 0, ∀x ∈ C.
f 00 (x) is the double derivative of f at x.
f 00 (x) ≥ 0 indicates non-negative curvature.

Convex Function - Interesting Properties
Convex functions enjoy several interesting properties.

Pm
If fi are convex for i = 1, . . . , m, their weighted sum i=1 θi fi is
convex, when θi ≥ 0, ∀i = 1, . . . m.
If fi are convex for i = 1, . . . , m, maxi=1,...,m fi is convex.
If f is convex then g (x) = f (ax + b), a, b ∈ R is also convex. (Affine
invariance)

Moving Towards Higher Dimensions...

Convex Sets and Functions in Higher Dimensions
Convex Sets In High Dimensions
Extreme cases: the empty set ∅ and the full space Rd .

Hyperplane: {x ∈ Rd : w > x = a}, for some 0 6= w ∈ Rd and a ∈ R.

Closed Halfspace:
I {x ∈ Rd : w > x ≥ a}, for some 0 6= w ∈ Rd and a ∈ R.
I {x ∈ Rd : w > x 6 w ∈ Rd and a ∈ R.
≤ a}, for some 0 =
Open Halfspace:
I {x ∈ Rd : w > x > a}, for some 0 6= w ∈ Rd and a ∈ R.
I {x ∈ Rd : w > x 6 w ∈ Rd and a ∈ R.
< a}, for some 0 =

High Dimensional Representation - Notations
Gradient of a function f : Rd → R at a point x

 ∂f (x) 
∂x
 1 
 
 ∂f (x) 
 ∂x2 
 
 : 
∇f (x) =  
 .. 
 . 
 
 : 
∂f (x)
∂xd

High Dimensional Representation - Notations
Hessian Matrix of a function f : Rd → R at a point x

 2
∂ 2 f (x) ∂ 2 f (x)

∂ f (x)
. . .
 ∂x1 ∂x1 ∂x1 ∂x2 ∂x1 ∂xd 
 
 ∂ 2 f (x) ∂ 2 f (x) ∂ 2 f (x) 
 ∂x2 ∂x1 ∂x2 ∂x2 . . . ∂x2 ∂xd 
 
2  : : ... : 
H = ∇ f (x) =  
 . . .
 .. .. .. 

 ... 
 : : . . . : 
 
2
∂ f (x) 2
∂ f (x) 2
∂ f (x)
∂xd ∂x1 ∂xd ∂x2 . . . ∂xd ∂xd
Note the size of H: d × d, we will denote this as H ∈ Rd×d .

Note also that H is symmetric. (why?)

Transpose Of A Matrix
Matrix A of size d ×d Transpose of A (of same size)

 
a11 a12 . . . a1d
 
a11 a21 . . . ad1
   
   
 a21 a22 . . . a2d   a12 a22 . . . ad2 
   
A= : : ... :  A> =  : : ... : 
 

 .. .. ..   .. ..

.. 
 . . ... .   . . ... . 
   
 : : ... :   : : ... : 
ad1 ad2 . . . add a1d a2d . . . add
Note: Rows of matrix A are columns of matrix A> .

Symmetric Matrix
A matrix A is symmetric if A = A> .

 
a11 a12 . . . a1d
 
 
 a12 a22 . . . a2d 
 
A= : : ... :   = A>

 .. .. . 
 .
 . . . . .. 
 : : ... : 
a1d a2d . . . add

Symmetric Positive (Semi-)definite Matrix
A symmetric matrix A ∈ Rd×d is called positive semi-definite

(denoted by A 0) if
x > Ax ≥ 0, ∀x ∈ Rd .
Caution: This definition is non-intuitive.
A symmetric matrix A ∈ Rd×d is called positive definite (denoted by

A 0) if
x > Ax > 0, ∀x ∈ Rd such that x 6= 0.

Symmetric Positive (Semi-)definite Matrix
Computation-friendly definitions
A 0 ⇐⇒ all eigen values of A are non-negative.
A 0 ⇐⇒ all eigen values of A are positive.

Recall:
I A (non-zero) vector x is called an eigen vector of matrix A ∈ Rd×d
with a corresponding eigen value β, if Ax = βx.
I An eigen value of a symmetric matrix is always real. (Why?)

Convex Function - Characterization In 1D
Let C ⊆ R be an open convex set and let f : C → R be a convex function.

Recall:
Zero-th Order Characterization
f (λy + (1 − λ)z) ≤ λf (y ) + (1 − λ)f (z), ∀y , z ∈ C, ∀λ ∈ [0, 1].
First Order Characterization
f continuously differentiable, f (y ) ≥ f (z) + f 0 (z)> (y − z), ∀y , z ∈ C.
Second Order Characterization
f twice continuously differentiable and f 00 (x) ≥ 0, ∀x ∈ C.

Convex Function - Characterization In High Dimensions
Let C ⊆ Rd be an open convex set and let f : C → R be a convex

function.
f (λy + (1 − λ)z) ≤ λf (y ) + (1 − λ)f (z), ∀y , z ∈ C, ∀λ ∈ [0, 1].
f continuously differentiable, f (y ) ≥ f (z) + ∇f (z)> (y − z), ∀y , z ∈ C.
Second Order Characterization
f twice continuously differentiable and ∇2 f 0.

Other Flavors Of Convex Function
Strictly convex function
Strongly convex function

Strictly Convex Functions
Strictly Convex Function
A function f : C → R defined over a convex set C is called strictly convex if
f (λy + (1 − λ)z)<λf (y ) + (1 − λ)f (z), ∀y , z ∈ C s.t. x 6= y , ∀λ ∈ (0, 1).

A function f : C → R defined over a convex set C is called strictly convex if
f (λy + (1 − λ)z)<λf (y ) + (1 − λ)f (z), ∀y , z ∈ C s.t. x 6= y , ∀λ ∈ (0, 1).
Graph of the function should be strictly below the chord !

Strictly Convex Function - First Order Characterization
Let C ⊆ R be an open convex set. Let f : C → R be a continuously

differentiable function. Then f is strictly convex if and only if
f (y ) > f (z) + ∇f (z)> (y − z), ∀y , z, ∈ C, y 6= z.

Strictly Convex Function - Second Order Characterization
Let C ⊆ R be an open convex set. Let f : C → R be a twice

continuously differentiable function. f is strictly convex if
∇2 f 0.
Important note: This positive definiteness condition is sufficient but

not necessary.
I e.g. f (x) = x 4 , x ∈ R is strictly convex but f 00 (x) = 0 at x = 0.

Let C ⊆ Rd be a convex set and let f : C → R be a strictly convex

function.
f (λy + (1 − λ)z)<λf (y ) + (1 − λ)f (z), ∀y , z ∈ C, y 6= z ∀λ ∈ (0, 1).

f continuously differentiable in int(C) and
∀z ∈ int(C), f (y ) > f (z) + ∇f (z)> (y − z), ∀y 6= z ∈ C.

Let C ⊆ Rd be an open convex set and let f : C → R.

Second Order Characterization (sufficient condition)
f twice continuously differentiable and ∇2 f 0, =⇒

f is strictly convex.

Convex Optimization
Convex Optimization Problem

Convex Optimization
General Optimization Problem
min f (x)
x∈C
f is called objective function and C is called feasible set.

Let f ∗ = minx∈C f (x) denote the optimal objective function value.
Optimal Solution Set X ∗ = {x ∈ C : f (x) = f ∗ }.
Let us denote by x ∗ an optimal solution in X ∗ .

Convex Optimization
min f (x) (OP)

x∈C

Convex Optimization
min f (x) (OP)

x∈C
Local Optimal Solution

A solution z to (OP) is called local optimal solution if f (z) ≤ f (ẑ),
∀ẑ ∈ N (z) for some > 0.
Recall: N (z) denotes the -neighborhood of z with respect to a suitable

distance metric.
Global Optimal Solution
A solution z to (OP) is called global optimal solution if f (z) ≤ f (ẑ),
∀ẑ ∈ C.

Convex Optimization
Local vs. Global Optimal Solutions
First-order necessary condition for optimality

Let f : Rd → R be a continuously differentiable function. If a point
x ∗ ∈ Rd is a local optimal solution of minx∈Rd f (x) then ∇f (x ∗ ) = 0.

Convex Optimization
min f (x)
x∈C
C ⊆ Rd is a convex set
f : C → R is a convex function
Convex Optimization Regime!

Convex Optimization
What Is So Special About Convex Optimization?
Appealing geometry in small dimensions

Nice properties from an optimization perspective
I Every local optimal solution (if it exists) is a global optimal solution.

Convex Optimization
Proposition
Consider the convex optimization problem minx∈Rd f (x). Then every local
optimal solution of the problem minx∈Rd f (x) is a global optimal solution.

Convex Optimization
First-order necessary and sufficient condition for optimality

Let f : Rd → R be a continuously differentiable convex function. A point
x ∗ ∈ Rd is an optimal solution of minx∈Rd f (x) if and only if ∇f (x ∗ ) = 0.

Convex Optimization
Proposition
Let f : Rd → R be a continuously differentiable convex function. A point
x ∗ ∈ Rd is an optimal solution of minx∈Rd f (x) if and only if ∇f (x ∗ ) = 0.
Note: The zero gradient condition is necessary and sufficient for

optimality!

Convex Optimization
Optimal solutions for Strictly Convex Functions
Uniqueness of solution
Consider the convex optimization problem minx∈Rd f (x) where f is strictly
convex. If the set X ∗ = argminx∈Rd f (x) is non-empty, then X ∗ contains
exactly one element.

IE643 Lecture10 Part2 25sep2020 PDF

Uploaded by

Copyright:

Available Formats

You might also like

IE643 Lecture10 Part2 25sep2020 PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

IE643 Lecture10 Part2 25sep2020 PDF

Uploaded by

Copyright:

Available Formats

Deep Learning - Theory and Practice

September 25, 2020

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 1 / 55

3 Convex Sets and Functions in Higher Dimensions

4 Strictly Convex Functions

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 2 / 55

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 3 / 55

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 4 / 55

Affine combination of two points

z is an arbitrary point on the line passing through x and y .

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 5 / 55

Convex combination of two points

z is an arbitrary point on the line segment connecting x and y .

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 6 / 55

A set C is convex if λx + (1 − λ)y ∈ C, ∀x, y ∈ C, ∀λ ∈ [0, 1].

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 7 / 55

A set C is convex if λx + (1 − λ)y ∈ C, ∀x, y ∈ C, λ ∈ [0, 1].

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 8 / 55

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 9 / 55

Convex Sets and Convex Combination

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 10 / 55

Convex Sets and Convex Combination

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 11 / 55

Convex Sets and Convex Combination

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 12 / 55

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 13 / 55

Convex Function - Definition

A function f : C → R, defined over a convex set C ⊆ R is called

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 14 / 55

Convex Function - Definition

A function f : C → R, defined over a convex set C ⊆ R is called

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 15 / 55

Convex Function - Definition

A function f : C → R, defined over a convex set C ⊆ R is called

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 16 / 55

Convex Function - Definition

A function f : C → R, defined over a convex set C ⊆ R is called

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 17 / 55

Concave function is a close relative of convex function.

Convex Function - Characterization

Extending convex function definition to more than two points.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 19 / 55

Convex Function - Characterization

Extending convex function definition to more than two points.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 20 / 55

Convex Function - First Order Characterization

For differentiable convex functions, the tangent lines under-estimate

Convex Function - First Order Characterization

Recall: First order approximation of a function f at y in the vicinity

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 22 / 55

Convex Function - First Order Characterization

Let C ⊆ R be an open interval. Let f : C → R be a continuously

Convex Function - Second Order Characterization

Let C ⊆ R be an open interval. Let f : C → R be a twice

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 24 / 55

Convex Function - Interesting Properties

Convex functions enjoy several interesting properties.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 25 / 55

A 0 ⇐⇒ all eigen values of A are non-negative.

A 0 ⇐⇒ all eigen values of A are positive.

f twice continuously differentiable and ∇2 f 0.