IE643 Lecture10 Part2 25sep2020 PDF

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 55

Deep Learning - Theory and Practice

IE 643

Lecture 10 - Part 2

September 25, 2020

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 1 / 55


Outline

1 Convex Sets

2 Convex Functions in 1D

3 Convex Sets and Functions in Higher Dimensions

4 Strictly Convex Functions

5 Convex Optimization

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 2 / 55


Convex Sets

Convex Sets

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 3 / 55


Convex Sets

Points in a 2D space

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 4 / 55


Convex Sets

Affine combination of two points

z is an arbitrary point on the line passing through x and y .

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 5 / 55


Convex Sets

Convex combination of two points

z is an arbitrary point on the line segment connecting x and y .

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 6 / 55


Convex Sets

Convex Sets

A set C is convex if λx + (1 − λ)y ∈ C, ∀x, y ∈ C, ∀λ ∈ [0, 1].


The line segment connecting x and y in C lies entirely within C.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 7 / 55


Convex Sets

Convex Sets

A set C is convex if λx + (1 − λ)y ∈ C, ∀x, y ∈ C, λ ∈ [0, 1].


The line segment connecting x and y in C lies entirely within C.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 8 / 55


Convex Sets

Non-convex Sets

A set C is not convex if there exist two points x and y in C such that
λx + (1 − λ)y ∈ / C, for some λ ∈ [0, 1].
The line segment connecting x and y in C does not entirely lie within
C.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 9 / 55


Convex Sets

Convex Sets and Convex Combination


Going beyond two points
Let x, y , z be points in a set C.
How to extend the definition of convex combination to these three
points?

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 10 / 55


Convex Sets

Convex Sets and Convex Combination


Going beyond two points
Let x, y , z be points in a set C.
How to extend the definition of convex combination to these three
points? (Homework!)

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 11 / 55


Convex Sets

Convex Sets and Convex Combination


Going beyond two points
More generally, let x 1 , x 2 , . . . , x m be m points in a set C.
How to extend the definition of convex combination to these m
points? (Homework!)

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 12 / 55


Convex Functions in 1D

Convex Functions

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 13 / 55


Convex Functions in 1D

Convex Function - Definition

A function f : C → R, defined over a convex set C ⊆ R is called


convex if f (λy + (1 − λ)z) ≤ λf (y ) + (1 − λ)f (z), ∀y , z ∈ C,
∀λ ∈ [0, 1].

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 14 / 55


Convex Functions in 1D

Convex Function - Definition

A function f : C → R, defined over a convex set C ⊆ R is called


convex if f (λy + (1 − λ)z) ≤ λf (y ) + (1 − λ)f (z), ∀y , z ∈ C,
∀λ ∈ [0, 1].
Chord over-estimates the graph of function.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 15 / 55


Convex Functions in 1D

Convex Function - Definition

A function f : C → R, defined over a convex set C ⊆ R is called


convex if f (λy + (1 − λ)z) ≤ λf (y ) + (1 − λ)f (z), ∀y , z ∈ C,
∀λ ∈ [0, 1].
Chord over-estimates the graph of function.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 16 / 55


Convex Functions in 1D

Convex Function - Definition

A function f : C → R, defined over a convex set C ⊆ R is called


convex if f (λy + (1 − λ)z) ≤ λf (y ) + (1 − λ)f (z), ∀y , z ∈ C,
∀λ ∈ [0, 1].
Chord over-estimates the graph of function.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 17 / 55


Convex Functions in 1D

Concave Function

Concave function is a close relative of convex function.


A function f : C → R, defined over a convex set C ⊆ R is called
concave if -f is convex over C.
Note: Concave functions are also defined over convex sets.
P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 18 / 55
Convex Functions in 1D

Convex Function - Characterization

Extending convex function definition to more than two points.


How to extend the definition of convex functions to a set of points
{x 1 , x 2 , . . . , x m } ⊂ C?

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 19 / 55


Convex Functions in 1D

Convex Function - Characterization

Extending convex function definition to more than two points.


How to extend the definition of convex functions to a set of points
{x 1 , x 2 , . . . , x m } ⊂ C? (Homework !)

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 20 / 55


Convex Functions in 1D

Convex Function - First Order Characterization

For differentiable convex functions, the tangent lines under-estimate


the graph of function.
Recall: Tangent at a point is a first-order approximation of a function.
P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 21 / 55
Convex Functions in 1D

Convex Function - First Order Characterization

Recall: First order approximation of a function f at y in the vicinity


of point z:
I f (y ) ≈ f (z) + (y − z)f 0 (z).

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 22 / 55


Convex Functions in 1D

Convex Function - First Order Characterization

Let C ⊆ R be an open interval. Let f : C → R be a continuously


differentiable function. Then f is convex if and only if
f (y ) ≥ f (z) + (y − z)f 0 (z), ∀y , z ∈ C.
f 0 (z) is the derivative of f at z.
Note: C ⊆ R is assumed to be an open interval.
P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 23 / 55
Convex Functions in 1D

Convex Function - Second Order Characterization

Let C ⊆ R be an open interval. Let f : C → R be a twice


continuously differentiable function. Then f is convex if and only if
f 00 (x) ≥ 0, ∀x ∈ C.
f 00 (x) is the double derivative of f at x.
f 00 (x) ≥ 0 indicates non-negative curvature.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 24 / 55


Convex Functions in 1D

Convex Function - Interesting Properties

Convex functions enjoy several interesting properties.


Pm
If fi are convex for i = 1, . . . , m, their weighted sum i=1 θi fi is
convex, when θi ≥ 0, ∀i = 1, . . . m.
If fi are convex for i = 1, . . . , m, maxi=1,...,m fi is convex.
If f is convex then g (x) = f (ax + b), a, b ∈ R is also convex. (Affine
invariance)

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 25 / 55


Convex Functions in 1D

Moving Towards Higher Dimensions...

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 26 / 55


Convex Sets and Functions in Higher Dimensions

Convex Sets In High Dimensions

Extreme cases: the empty set ∅ and the full space Rd .

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 27 / 55


Convex Sets and Functions in Higher Dimensions

Convex Sets In High Dimensions

Hyperplane: {x ∈ Rd : w > x = a}, for some 0 6= w ∈ Rd and a ∈ R.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 28 / 55


Convex Sets and Functions in Higher Dimensions

Convex Sets In High Dimensions

Closed Halfspace:
I {x ∈ Rd : w > x ≥ a}, for some 0 6= w ∈ Rd and a ∈ R.
I {x ∈ Rd : w > x 6 w ∈ Rd and a ∈ R.
≤ a}, for some 0 =
Open Halfspace:
I {x ∈ Rd : w > x > a}, for some 0 6= w ∈ Rd and a ∈ R.
I {x ∈ Rd : w > x 6 w ∈ Rd and a ∈ R.
< a}, for some 0 =

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 29 / 55


Convex Sets and Functions in Higher Dimensions

High Dimensional Representation - Notations

Gradient of a function f : Rd → R at a point x


 ∂f (x) 
∂x
 1 
 
 ∂f (x) 
 ∂x2 
 
 : 
∇f (x) =  
 .. 
 . 
 
 : 
∂f (x)
∂xd

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 30 / 55


Convex Sets and Functions in Higher Dimensions

High Dimensional Representation - Notations

Hessian Matrix of a function f : Rd → R at a point x


 2
∂ 2 f (x) ∂ 2 f (x)

∂ f (x)
. . .
 ∂x1 ∂x1 ∂x1 ∂x2 ∂x1 ∂xd 
 
 ∂ 2 f (x) ∂ 2 f (x) ∂ 2 f (x) 
 ∂x2 ∂x1 ∂x2 ∂x2 . . . ∂x2 ∂xd 
 
2  : : ... : 
H = ∇ f (x) =  
 . . .
 .. .. .. 

 ... 
 : : . . . : 
 
2
∂ f (x) 2
∂ f (x) 2
∂ f (x)
∂xd ∂x1 ∂xd ∂x2 . . . ∂xd ∂xd

Note the size of H: d × d, we will denote this as H ∈ Rd×d .


Note also that H is symmetric. (why?)

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 31 / 55


Convex Sets and Functions in Higher Dimensions

Transpose Of A Matrix

Matrix A of size d ×d Transpose of A (of same size)


 
a11 a12 . . . a1d
 
a11 a21 . . . ad1
   
   
 a21 a22 . . . a2d   a12 a22 . . . ad2 
   
A= : : ... :  A> =  : : ... : 
 

 .. .. ..   .. ..

.. 
 . . ... .   . . ... . 
   
 : : ... :   : : ... : 
ad1 ad2 . . . add a1d a2d . . . add

Note: Rows of matrix A are columns of matrix A> .

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 32 / 55


Convex Sets and Functions in Higher Dimensions

Symmetric Matrix

A matrix A is symmetric if A = A> .


 
a11 a12 . . . a1d
 
 
 a12 a22 . . . a2d 
 
A= : : ... :   = A>

 .. .. . 
 .
 . . . . .. 
 : : ... : 
a1d a2d . . . add

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 33 / 55


Convex Sets and Functions in Higher Dimensions

Symmetric Positive (Semi-)definite Matrix

A symmetric matrix A ∈ Rd×d is called positive semi-definite


(denoted by A  0) if

x > Ax ≥ 0, ∀x ∈ Rd .

Caution: This definition is non-intuitive.

A symmetric matrix A ∈ Rd×d is called positive definite (denoted by


A  0) if

x > Ax > 0, ∀x ∈ Rd such that x 6= 0.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 34 / 55


Convex Sets and Functions in Higher Dimensions

Symmetric Positive (Semi-)definite Matrix

Computation-friendly definitions

A  0 ⇐⇒ all eigen values of A are non-negative.

A  0 ⇐⇒ all eigen values of A are positive.


Recall:
I A (non-zero) vector x is called an eigen vector of matrix A ∈ Rd×d
with a corresponding eigen value β, if Ax = βx.
I An eigen value of a symmetric matrix is always real. (Why?)

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 35 / 55


Convex Sets and Functions in Higher Dimensions

Convex Function - Characterization In 1D

Let C ⊆ R be an open convex set and let f : C → R be a convex function.


Recall:

Zero-th Order Characterization

f (λy + (1 − λ)z) ≤ λf (y ) + (1 − λ)f (z), ∀y , z ∈ C, ∀λ ∈ [0, 1].

First Order Characterization

f continuously differentiable, f (y ) ≥ f (z) + f 0 (z)> (y − z), ∀y , z ∈ C.

Second Order Characterization

f twice continuously differentiable and f 00 (x) ≥ 0, ∀x ∈ C.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 36 / 55


Convex Sets and Functions in Higher Dimensions

Convex Function - Characterization In High Dimensions

Let C ⊆ Rd be an open convex set and let f : C → R be a convex


function.
Zero-th Order Characterization

f (λy + (1 − λ)z) ≤ λf (y ) + (1 − λ)f (z), ∀y , z ∈ C, ∀λ ∈ [0, 1].

First Order Characterization

f continuously differentiable, f (y ) ≥ f (z) + ∇f (z)> (y − z), ∀y , z ∈ C.

Second Order Characterization

f twice continuously differentiable and ∇2 f  0.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 37 / 55


Convex Sets and Functions in Higher Dimensions

Other Flavors Of Convex Function

Strictly convex function

Strongly convex function

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 38 / 55


Strictly Convex Functions

Strictly Convex Function

A function f : C → R defined over a convex set C is called strictly convex if

f (λy + (1 − λ)z)<λf (y ) + (1 − λ)f (z), ∀y , z ∈ C s.t. x 6= y , ∀λ ∈ (0, 1).

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 39 / 55


Strictly Convex Functions

Strictly Convex Function

A function f : C → R defined over a convex set C is called strictly convex if

f (λy + (1 − λ)z)<λf (y ) + (1 − λ)f (z), ∀y , z ∈ C s.t. x 6= y , ∀λ ∈ (0, 1).

Graph of the function should be strictly below the chord !

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 40 / 55


Strictly Convex Functions

Strictly Convex Function - First Order Characterization

Let C ⊆ R be an open convex set. Let f : C → R be a continuously


differentiable function. Then f is strictly convex if and only if

f (y ) > f (z) + ∇f (z)> (y − z), ∀y , z, ∈ C, y 6= z.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 41 / 55


Strictly Convex Functions

Strictly Convex Function - Second Order Characterization

Let C ⊆ R be an open convex set. Let f : C → R be a twice


continuously differentiable function. f is strictly convex if

∇2 f  0.

Important note: This positive definiteness condition is sufficient but


not necessary.
I e.g. f (x) = x 4 , x ∈ R is strictly convex but f 00 (x) = 0 at x = 0.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 42 / 55


Strictly Convex Functions

Strictly Convex Function

Let C ⊆ Rd be a convex set and let f : C → R be a strictly convex


function.
Zero-th Order Characterization

f (λy + (1 − λ)z)<λf (y ) + (1 − λ)f (z), ∀y , z ∈ C, y 6= z ∀λ ∈ (0, 1).

First Order Characterization


f continuously differentiable in int(C) and

∀z ∈ int(C), f (y ) > f (z) + ∇f (z)> (y − z), ∀y 6= z ∈ C.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 43 / 55


Strictly Convex Functions

Strictly Convex Function

Let C ⊆ Rd be an open convex set and let f : C → R.


Second Order Characterization (sufficient condition)

f twice continuously differentiable and ∇2 f  0, =⇒


f is strictly convex.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 44 / 55


Convex Optimization

Convex Optimization Problem

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 45 / 55


Convex Optimization

General Optimization Problem

min f (x)
x∈C

f is called objective function and C is called feasible set.


Let f ∗ = minx∈C f (x) denote the optimal objective function value.
Optimal Solution Set X ∗ = {x ∈ C : f (x) = f ∗ }.
Let us denote by x ∗ an optimal solution in X ∗ .

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 46 / 55


Convex Optimization

General Optimization Problem

min f (x) (OP)


x∈C

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 47 / 55


Convex Optimization

General Optimization Problem

min f (x) (OP)


x∈C

Local Optimal Solution


A solution z to (OP) is called local optimal solution if f (z) ≤ f (ẑ),
∀ẑ ∈ N (z) for some  > 0.

Recall: N (z) denotes the -neighborhood of z with respect to a suitable


distance metric.
Global Optimal Solution
A solution z to (OP) is called global optimal solution if f (z) ≤ f (ẑ),
∀ẑ ∈ C.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 48 / 55


Convex Optimization

Local vs. Global Optimal Solutions

First-order necessary condition for optimality


Let f : Rd → R be a continuously differentiable function. If a point
x ∗ ∈ Rd is a local optimal solution of minx∈Rd f (x) then ∇f (x ∗ ) = 0.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 49 / 55


Convex Optimization

General Optimization Problem

min f (x)
x∈C

C ⊆ Rd is a convex set

f : C → R is a convex function

Convex Optimization Regime!

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 50 / 55


Convex Optimization

What Is So Special About Convex Optimization?

Appealing geometry in small dimensions


Nice properties from an optimization perspective
I Every local optimal solution (if it exists) is a global optimal solution.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 51 / 55


Convex Optimization

Local vs. Global Optimal Solutions

Proposition
Consider the convex optimization problem minx∈Rd f (x). Then every local
optimal solution of the problem minx∈Rd f (x) is a global optimal solution.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 52 / 55


Convex Optimization

Local vs. Global Optimal Solutions

First-order necessary and sufficient condition for optimality


Let f : Rd → R be a continuously differentiable convex function. A point
x ∗ ∈ Rd is an optimal solution of minx∈Rd f (x) if and only if ∇f (x ∗ ) = 0.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 53 / 55


Convex Optimization

Local vs. Global Optimal Solutions

Proposition
Let f : Rd → R be a continuously differentiable convex function. A point
x ∗ ∈ Rd is an optimal solution of minx∈Rd f (x) if and only if ∇f (x ∗ ) = 0.

Note: The zero gradient condition is necessary and sufficient for


optimality!

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 54 / 55


Convex Optimization

Optimal solutions for Strictly Convex Functions

Uniqueness of solution
Consider the convex optimization problem minx∈Rd f (x) where f is strictly
convex. If the set X ∗ = argminx∈Rd f (x) is non-empty, then X ∗ contains
exactly one element.

P.Balamurugan Deep Learning - Theory and Practice Lecture 10 - Part 2 55 / 55

You might also like