
calculus of the
non-differentiables

(it is getting difficult to differentiate developments in the crypto-NFT world from joke articles on The Onion)

[Figure: plots of the hinge loss and the ReLU, each annotated with its point of non-differentiability (at 1 for the hinge loss, at 0 for the ReLU).]
Convex Functions Revisited

A function f : ℝᵈ → ℝ is convex if, for all x, y ∈ ℝᵈ and all λ ∈ [0, 1], the point z = λ·x + (1 − λ)·y satisfies

f(z) ≤ λ·f(x) + (1 − λ)·f(y)

[Figure: a convex function and a non-convex function, each with the chord joining (x, f(x)) and (y, f(y)); only the convex function stays below its chord at z.]

If you "fill up" the space above the function curve and that set looks convex, then the function is convex too.

A convex function must lie below all its chords.
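The chord inequality is easy to spot-check numerically. A minimal sketch, assuming the convex function f(x) = x² and random test points as our own illustration (not from the slides):

```python
import random

# Check the chord inequality f(z) <= lam*f(x) + (1-lam)*f(y)
# with z = lam*x + (1-lam)*y, for the convex function f(x) = x**2.
def f(v):
    return v ** 2

random.seed(0)
for _ in range(1000):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    lam = random.random()
    z = lam * x + (1 - lam) * y
    # small tolerance guards against floating-point rounding
    assert f(z) <= lam * f(x) + (1 - lam) * f(y) + 1e-12
```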
Convex Functions Revisited

"Affine function" is a fancy name for functions of the form g(z) = aᵀz + b.

A differentiable function f : ℝᵈ → ℝ is convex if and only if, for all x, y,

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)

[Figure: a convex function and a non-convex function, each with the tangent drawn at x; only the convex function stays above its tangent.]

The tangent to a function f at a point x is the affine function with a = ∇f(x) and b = f(x) − ∇f(x)ᵀx, i.e., the tangent is t(y) = f(x) + ∇f(x)ᵀ(y − x). Notice that the tangent indeed touches the function curve at x since t(x) = f(x).

A differentiable convex function must lie above all its tangents.
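The first-order condition lends itself to the same kind of numerical spot check. A minimal sketch, assuming the convex function f(x) = x² with gradient 2x as our own illustration:

```python
import random

# Check the first-order condition f(y) >= f(x) + grad_f(x)*(y - x)
# for the differentiable convex function f(x) = x**2.
def f(v):
    return v ** 2

def grad_f(v):
    return 2 * v

random.seed(1)
for _ in range(1000):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    # small tolerance guards against floating-point rounding
    assert f(y) >= f(x) + grad_f(x) * (y - x) - 1e-12
```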
Derivatives of non-differentiable functions?

Let f be a non-differentiable function that is at least convex.
Hinge loss and ReLU are good examples.

[Figure: the hinge loss and the ReLU, with their points of non-differentiability marked at 1 and 0 respectively.]

Recall: for differentiable functions, gradients give us tangents, and a convex function must lie above all its tangents.

Trick: turn this definition on its head.
If an affine function lies below the function and touches it at x, then the slope vector g of that affine function can act as a gradient to the function at x.
Subgradients and Subdifferentials

Tangents to convex functions satisfy two properties: they touch the function curve at one point and always lie below the function curve.

Let f be a (possibly non-differentiable but convex) function.
At any x, suppose there exists a vector g using which we define an affine function h(y) = f(x) + gᵀ(y − x).
Note that h(x) = f(x). However, if we also have h(y) ≤ f(y) for all y, then g is called a subgradient of the function f at x.
The set of all subgradients of f at a point x is the subdifferential of f at x, denoted by ∂f(x).
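The defining inequality makes it easy to test whether a candidate vector is a subgradient. A minimal sketch, assuming the hinge loss ℓ(x) = max(0, 1 − x) and the candidate g = −0.5 at the kink x = 1 (both our own illustrative choices):

```python
# Verify that g = -0.5 is a subgradient of the hinge loss at x0 = 1
# by testing l(y) >= l(x0) + g*(y - x0) on a grid of points y.
def hinge(v):
    return max(0.0, 1.0 - v)

x0, g = 1.0, -0.5
for i in range(1001):
    y = -5.0 + 10.0 * i / 1000  # grid over [-5, 5]
    assert hinge(y) >= hinge(x0) + g * (y - x0) - 1e-12
```

Any g in [−1, 0] would pass the same test at x0 = 1, which previews the next slide: the subdifferential there is a whole set.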

Wait! Does this mean a function can have more than one subgradient at a point?

If f is non-differentiable at x, then it can indeed have multiple subgradients at x. If f is differentiable at x, then it can have only one subgradient at x, and that is the gradient itself.
Subdifferential

At a point x where the function is differentiable, there is just one subgradient: the gradient itself.

However, at a point y which is a point of non-differentiability, there are infinitely many subgradients.

[Figure: a convex curve with a single tangent at the differentiable point x and a fan of supporting lines at the kink y.]

How can I find out the subgradients of a function?

There are rules of subdifferential calculus, just as there are rules of regular calculus.
Subdifferential Calculus Rules

Let x ∈ ℝᵈ, a ∈ ℝᵈ, and b, c ∈ ℝ.

REGULAR CALCULUS → SUBDIFFERENTIAL CALCULUS
Scaling rule: ∇(c·f)(x) = c·∇f(x) → ∂(c·f)(x) = c·∂f(x) (for c ≥ 0)
Sum rule: ∇(f + g)(x) = ∇f(x) + ∇g(x) → ∂(f + g)(x) = ∂f(x) + ∂g(x) (a Minkowski sum of sets)
Chain rule: for h(x) = f(aᵀx + b) with f : ℝ → ℝ, ∇h(x) = f′(aᵀx + b)·a → ∂h(x) = ∂f(aᵀx + b)·a
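In one dimension, the subdifferential of a convex function at a point is a closed interval, so the scaling and sum rules reduce to interval arithmetic. A minimal sketch (the helper names `scale` and `add` are our own, not from the slides):

```python
# Represent a one-dimensional subdifferential as a closed interval (lo, hi).
def scale(c, sub):
    # scaling rule: d(c*f) = c * df, assuming c >= 0
    lo, hi = sub
    return (c * lo, c * hi)

def add(sub1, sub2):
    # sum rule: d(f + g) = df + dg (Minkowski sum of intervals)
    (a, b), (c, d) = sub1, sub2
    return (a + c, b + d)

abs_at_0 = (-1.0, 1.0)          # subdifferential of |x| at x = 0
print(scale(3.0, abs_at_0))     # subdifferential of 3|x| at 0: (-3.0, 3.0)
print(add(abs_at_0, abs_at_0))  # subdifferential of |x| + |x| = 2|x| at 0: (-2.0, 2.0)
```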
Subdifferential Calculus Rules – Max Rule

If f(x) = max{f₁(x), …, fₙ(x)}, then

∂f(x) = conv( ⋃_{i ∈ A(x)} ∂fᵢ(x) ),

where A(x) = {i : fᵢ(x) = f(x)} is the set of functions "active" at x.

There is no counterpart to the max rule in regular calculus.
This is because functions of the form max{f₁, …, fₙ} usually turn out to be non-differentiable, and so regular calculus falls silent anyway.

What about stationary points?

Good point! In subgradient calculus, a point x is a stationary point for a function f if the zero vector is a part of the subdifferential, i.e., 0 ∈ ∂f(x).
Local minima/maxima must be stationary in this sense even for non-differentiable functions, so we can use this fact to discover local minima/maxima for non-differentiable functions.
Differentiating the Hinge Loss

[Figure: the hinge loss ℓ(x) = max{1 − x, 0}, with its kink at x = 1.]

Rewrite ℓ(x) = max{1 − x, 0} as max{f₁(x), f₂(x)}, with f₁(x) = 1 − x and f₂(x) = 0.
The function is differentiable at all points except x = 1.
We get ∂ℓ(x) = {−1} if x < 1.
We also get ∂ℓ(x) = {0} if x > 1.
At x = 1, use subdifferential calculus (the max rule) to get ∂ℓ(1) = [−1, 0].
Use the fact that ∂f₁(1) = {−1} and ∂f₂(1) = {0}, and that both functions are active at x = 1.

The subdifferential is a set, i.e., we must say ∂ℓ(x) = {−1} instead of ∂ℓ(x) = −1.
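The case analysis above translates directly into code. A minimal sketch (the helper `subdiff_hinge`, returning the subdifferential as a closed interval, is our own illustration):

```python
# Subdifferential of the hinge loss l(x) = max(1 - x, 0), returned as a
# closed interval (lo, hi), following the three-way case analysis.
def subdiff_hinge(x):
    if x < 1:
        return (-1.0, -1.0)  # the set {-1}
    if x > 1:
        return (0.0, 0.0)    # the set {0}
    return (-1.0, 0.0)       # the interval [-1, 0] at the kink x = 1
```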
Differentiating the Hinge Loss

We can use the chain rule to calculate more complicated subdifferentials.
Given: a function g : ℝᵈ → ℝ defined as g(w) = ℓ(aᵀw + b), where ℓ is the hinge loss from the previous slide.
Goal: find the subdifferential for the function, i.e., ∂g(w).
Applying the chain rule tells us that ∂g(w) = ∂ℓ(aᵀw + b)·a.
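A chain-rule subgradient computation can be sketched in code. The composition g(w) = hinge(aᵀw + b) and the concrete values of a, b, and w below are our own illustrative assumptions:

```python
# Chain rule sketch: if s is a subgradient of the hinge loss at
# t = a.w + b, then s * a is a subgradient of g(w) = hinge(a.w + b).
def hinge_subgrad(t):
    # return any one element of the hinge subdifferential at t
    if t < 1:
        return -1.0
    if t > 1:
        return 0.0
    return -0.5  # any value in [-1, 0] works at the kink

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def g_subgrad(w, a, b):
    t = dot(a, w) + b
    return [hinge_subgrad(t) * ai for ai in a]

a, w = [1.0, -2.0], [0.0, 0.0]
print(g_subgrad(w, a, 0.0))  # t = 0 < 1, so the subgradient is -1 * a
```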
Exercise
 Let where . For any , find a subgradient of i.e., find a
 Note: no need to find the entire subdifferential, just any subgradient
 Let where . Find at all
 Let . Find at any
 Let . Find at any
 Let . Find at any
 Note: for last 4 questions, need to find the entire subdifferential
Summary

Non-differentiable functions are common in ML applications
The concept of a gradient can be extended to non-differentiable convex functions
Involves studying the relationship between gradients and tangents
Allows us to develop the notion of the subdifferential
Instead of a unique gradient, we may have several (possibly infinitely many) subgradients
Elegant extension: for differentiable functions, there is a unique subgradient, the gradient itself
Rules of subdifferential calculus ease calculations
Scaling rule, sum rule, chain rule, max rule
Stay amazing!
See you next time.
