
GRADIENT DESCENT

SLOPE
• The slope of a line is a measure of the steepness and the direction of the line.
• The slope of a line is defined as the change in the y-coordinate with respect to the change in the x-coordinate of that line.

m = Δy/Δx
where m is the slope.
Note that tan θ = Δy/Δx, where θ is the angle the line makes with the positive x-axis; this tan θ is also referred to as the slope of the line.
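For example, the line through the points (1, 2) and (4, 8) has slope m = Δy/Δx = (8 − 2)/(4 − 1) = 6/3 = 2.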
Slope of Horizontal Line

• A horizontal line is a straight line that is parallel to the x-axis or is drawn from left to right or right to left in a
coordinate plane.

• The net change in the y-coordinate along a horizontal line is zero, so the slope of a horizontal line is
m = Δy/Δx = 0/Δx = 0

Slope of Vertical Line

• A vertical line is a straight line that is parallel to the y-axis or is drawn from top to bottom or bottom to top in a
coordinate plane.

• The net change in the x-coordinate along a vertical line is zero, so the slope of a vertical line is
m = Δy/Δx = Δy/0 = undefined
TANGENT LINE
• The "tangent line" is one of the most important applications of differentiation. The word "tangent" comes from the Latin word "tangere", which means "to touch".
• The tangent line of a curve at a given point is a line that just touches the curve (the function) at that point.
• To find the tangent line equation, we need to know the equation of the curve (which is given by a function) and the point at which the tangent is drawn.
SLOPE OF TANGENT LINE
• Let us consider a curve that is represented by a function f(x).
• Also, let us consider a secant line passing through two points of the curve, P(x₀, f(x₀)) and Q(x₀ + h, f(x₀ + h)), i.e., P and Q are at a horizontal distance of h units from each other.
• Then the slope of the secant line, using the slope formula, is
Slope of secant line = [f(x₀ + h) − f(x₀)] / (x₀ + h − x₀) = [f(x₀ + h) − f(x₀)] / h
• If Q comes very close to P (by making h → 0) and merges with P, then the secant line becomes the tangent line at P, i.e., the slope of the tangent line at P can be obtained by applying h → 0 to the slope of the secant line. So
Slope of Tangent at P = limₕ→₀ [f(x₀ + h) − f(x₀)] / h
• We know that this is nothing but the derivative of f(x) at x = x₀ (by the limit definition of the derivative, or first principles), i.e.,
Slope of Tangent at P = f '(x₀)
• Therefore, the slope of the tangent is nothing but the derivative of the function at the point where it is drawn.
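As an illustration of this limit, here is a minimal Python sketch (the curve f(x) = x² and the point x₀ = 3 are arbitrary choices for demonstration, not from the slides) showing the secant slope approaching the tangent slope as h shrinks:

    def secant_slope(f, x0, h):
        # slope of the secant line through (x0, f(x0)) and (x0 + h, f(x0 + h))
        return (f(x0 + h) - f(x0)) / h

    f = lambda x: x ** 2                       # example curve
    for h in (1.0, 0.1, 0.001, 1e-6):
        print(h, secant_slope(f, 3.0, h))      # approaches f'(3) = 2*3 = 6 as h → 0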
Example 1: Find the equation of the tangent line to the graph of f(x) at the point (−1, 2).
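As an illustration, suppose the curve is f(x) = x² + 1 (an assumed example; its graph does pass through (−1, 2)). Then f '(x) = 2x, so the slope of the tangent at x = −1 is m = f '(−1) = −2. Using point-slope form: y − 2 = −2(x − (−1)), i.e., the tangent line is y = −2x.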
GRADIENT DESCENT
• Gradient descent (GD) is an iterative first-order optimisation algorithm used to find a local minimum/maximum of a given function.
• This method is commonly used in machine learning (ML) and deep learning (DL) to minimise a cost/loss function (e.g., in linear regression).
Function requirements
The gradient descent algorithm does not work for all functions. There are two specific requirements: the function has to be
• Differentiable
• Convex
DIFFERENTIABLE vs. NON-DIFFERENTIABLE
[Figures: example plots of differentiable and non-differentiable functions]
• The second requirement: the function has to be convex.
• For a univariate function, this means that the line segment connecting two of the function's points lies on or above its curve (it does not cross it). If it does cross, the function has a local minimum which is not a global one.

One way to check whether a univariate function is convex is to calculate the second derivative and check whether its value is always greater than 0.
CONVEX FUNCTION

If the second derivative of a function is always greater than 0, the function is strictly convex.
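For example, for f(x) = x² the second derivative is f ''(x) = 2 > 0 for every x, so f(x) = x² is strictly convex.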
Gradient

• The gradient is the slope of a curve at a given point in a specified direction.
• In the case of a univariate function, it is simply the first derivative at a selected point.
• In the case of a multivariate function, it is a vector of derivatives in each main direction (along the variable axes).
• The gradient of an n-dimensional function f(x) at a given point p is defined as follows:
∇f(p) = [∂f/∂x₁(p), ∂f/∂x₂(p), …, ∂f/∂xₙ(p)]ᵀ
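For example, for f(x₁, x₂) = x₁² + x₂² the gradient is ∇f(x) = [2x₁, 2x₂]ᵀ, so at the point p = (1, 2) we get ∇f(p) = [2, 4]ᵀ.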


Gradient descent
• Imagine a valley and a person with no sense of direction who wants to get to the bottom of the valley.
• He goes down the slope and takes large steps when the slope is steep and small steps when the slope is less steep.
• He decides his next position based on his current position and stops when he gets to the bottom of the valley, which was his goal.
Gradient descent
The Gradient Descent Algorithm iteratively
• calculates the next point using the gradient at the current position,
• scales it (by a learning rate), and
• subtracts the obtained value from the current position (makes a step).
• It subtracts the value because we want to minimise the function (to maximise it, we would add).

There is an important parameter η which scales the gradient and thus controls the step size. In machine learning, it is called the learning rate.
• The Gradient Descent method's steps are (a minimal code sketch follows this list):
1. choose a starting point (initialisation)
2. calculate the gradient at this point
3. make a scaled step in the opposite direction to the gradient (objective: minimise)
4. repeat steps 2 and 3 until one of the stopping criteria is met:
• the maximum number of iterations is reached
• the step size is smaller than the tolerance (due to scaling or a small gradient)
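A minimal Python sketch of these four steps (the function name, parameter defaults, and the quadratic in the usage example are illustrative choices, not from the slides):

    def gradient_descent(grad, start, learning_rate=0.1, max_iter=100, tol=1e-6):
        x = start                            # step 1: choose a starting point
        for _ in range(max_iter):            # stop: maximum number of iterations
            step = learning_rate * grad(x)   # step 2: gradient, scaled by the learning rate
            if abs(step) < tol:              # stop: step size smaller than the tolerance
                break
            x = x - step                     # step 3: move opposite to the gradient (minimise)
        return x

    # usage example: minimise f(x) = x², whose gradient is f'(x) = 2x
    print(gradient_descent(grad=lambda x: 2 * x, start=-4.0, learning_rate=0.8))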
Gradient Descent

Gradient Descent Algorithm:
• Pick an initial point x⁽⁰⁾
• Iterate until convergence: x⁽ᵗ⁺¹⁾ = x⁽ᵗ⁾ − η ∇f(x⁽ᵗ⁾)
where η is the step size (sometimes called the learning rate)
Gradient Descent: An Illustration
[Figure: a loss curve L(w) with iterates w⁽⁰⁾, w⁽¹⁾, w⁽²⁾, w⁽³⁾ from two different starting points. Where the gradient is negative, we move in the positive direction; where it is positive, we move in the negative direction. The learning rate is very important; one run reaches the global minimum w* while the other gets stuck at a local minimum, so good initialization is very important.]
Gradient Descent

f(x) = x², so f '(x) = 2x
Step size: η = 0.8; starting point x⁽⁰⁾ = −4

x⁽¹⁾ = −4 − 0.8 · 2 · (−4) = 2.4
x⁽²⁾ = 2.4 − 0.8 · 2 · 2.4 = −1.44
x⁽³⁾ = 0.864
x⁽⁴⁾ = −0.5184
x⁽⁵⁾ = 0.31104
…
x⁽³⁰⁾ ≈ −8.84296e−07
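This sequence can be reproduced with a few lines of Python (a direct transcription of the update x⁽ᵗ⁺¹⁾ = x⁽ᵗ⁾ − η · f '(x⁽ᵗ⁾)):

    x = -4.0                     # x⁽⁰⁾
    for t in range(1, 31):
        x = x - 0.8 * 2 * x      # η = 0.8, f'(x) = 2x
        if t <= 5 or t == 30:
            print(t, x)          # 2.4, -1.44, 0.864, -0.5184, 0.31104, ..., ≈ -8.84e-07

Each update multiplies x by (1 − 2η) = −0.6, which is why the iterates alternate in sign while shrinking toward the minimum at x = 0.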
Gradient Descent: Step size matters!
[Figures: the same minimisation with step size 0.9 and with step size 0.2]
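To see why, consider the running example f(x) = x²: the update x ← x − η · 2x = (1 − 2η)x multiplies x by a fixed factor each step. A step size of 0.2 gives a factor of 0.6 (steady one-sided convergence); 0.8 or 0.9 gives a negative factor (−0.6 or −0.8), so the iterates overshoot the minimum and oscillate; and any step size above 1 gives a factor of magnitude greater than 1, so the iterates diverge.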
Slide credit: Andrew Ng
[Figures: gradient descent runs from an initial point x₀ with learning rate 0.9 and with learning rate 0.1]
