Gradient Descent Vani
SLOPE
• The slope of a line is the measure of the steepness and the direction of the line.
• The slope of a line is defined as the change in y coordinate with respect to the change
in x coordinate of that line.
m = Δy/Δx
where m is the slope
Note that tan θ = Δy/Δx, where θ is the angle the line makes with the positive x-axis.
We also refer to tan θ as the slope of the line.
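As a quick sketch (not from the original slides), the slope formula m = Δy/Δx can be computed directly; the function name `slope` is an illustrative choice:

```python
def slope(p1, p2):
    """Slope m = Δy/Δx of the line through two points (x, y)."""
    (x1, y1), (x2, y2) = p1, p2
    dx = x2 - x1
    if dx == 0:
        # a vertical line has no defined slope (Δx = 0)
        raise ValueError("vertical line: slope undefined")
    return (y2 - y1) / dx

print(slope((0, 0), (2, 6)))  # Δy/Δx = 6/2 → 3.0
print(slope((0, 1), (5, 1)))  # horizontal line → 0.0
```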
Slope of Horizontal Line
• A horizontal line is a straight line that is parallel to the x-axis or is drawn from left to right or right to left in a
coordinate plane.
• Net change in the y-coordinates of the horizontal line is zero. The slope of a horizontal line can be given as,
m = Δy/Δx = 0/Δx = 0
Slope of Vertical Line
• A vertical line is a straight line that is parallel to the y-axis or is drawn from top to bottom or bottom to top in a
coordinate plane.
• Net change in the x-coordinates of the vertical line is zero. The slope of a vertical line can be given as,
m = Δy/Δx = Δy/0 = undefined
TANGENT LINE
• The "tangent line" is one of the most important
applications of differentiation. The word
"tangent" comes from the Latin word "tangere"
which means "to touch".
• The tangent line touches the curve at a point on
the curve.
• Consider a point P = (x₀, f(x₀)) on the curve and a nearby point Q = (x₀ + h, f(x₀ + h)). If Q comes very close
to P (by making h → 0) and merges with P, then the secant line through P and Q becomes the tangent
line at P. That is, the slope of the tangent line at P can be obtained by applying h → 0 to the slope of the
secant line. So
Slope of Tangent at P = limₕ→0 [f(x₀ + h) − f(x₀)] / h
We know that this is nothing but the derivative of f(x) at x = x₀ (by the limit definition of
the derivative, or first principles). i.e.,
• Slope of Tangent at P = f′(x₀)
Therefore, the slope of the tangent is nothing but the derivative of the function
at the point where it is drawn.
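The limit definition above can be approximated numerically by using a small but finite h; this is a sketch, with `tangent_slope` as an illustrative name:

```python
def tangent_slope(f, x0, h=1e-6):
    """Approximate the slope of the tangent at x0 using the
    limit definition (f(x0 + h) - f(x0)) / h with a small h."""
    return (f(x0 + h) - f(x0)) / h

# f(x) = x**2 has derivative f'(x) = 2x, so the tangent at x0 = 3
# should have slope close to 6
m = tangent_slope(lambda x: x**2, 3.0)
```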
Example 1: Find the equation of the tangent line to the graph of at the point (−1,2).
GRADIENT DESCENT
• Gradient descent (GD) is an iterative first-order optimisation algorithm, used to find a
local minimum/maximum of a given function.
• This method is commonly used in machine learning (ML) and deep learning (DL) to
minimize a cost/loss function (e.g. in a linear regression).
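As a hedged sketch of how GD minimizes a loss in the linear-regression setting the text mentions (the model, toy data, learning rate, and epoch count below are illustrative assumptions, not from the slides):

```python
def fit_linear(xs, ys, lr=0.05, epochs=500):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradients of MSE = (1/n) Σ (w*x + b - y)**2 w.r.t. w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_linear(xs, ys)  # converges to w ≈ 2, b ≈ 1
```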
Function requirements
Gradient descent algorithm does not work for all
functions. There are two specific requirements. A
function has to be:
• Differentiable
• Convex
[Figure: examples of a differentiable function and a non-differentiable function]
• The second requirement: the function has to be convex.
• For a univariate function, this means that the line segment connecting any two points on the function's
graph lies on or above its curve (it does not cross it). If the segment does cross the curve, the function has a
local minimum which is not a global one.
Because the second derivative is always greater than 0, the function is strictly
convex.
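Strict convexity of a twice-differentiable univariate function can be checked by sampling its second derivative; this numerical central-difference check is a sketch, not from the slides:

```python
def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# f(x) = x**2 has f''(x) = 2 > 0 everywhere, so it is strictly convex
vals = [second_derivative(lambda x: x**2, x) for x in (-3.0, 0.0, 5.0)]
# each sampled value is ≈ 2
```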
Gradient
Gradient Descent: An Illustration
[Figure: a loss curve L(w). Where the gradient is negative, we move w in the positive direction; where the
gradient is positive, we move in the negative direction. The learning rate is very important. From a poor
starting point w^(0), the iterates w^(1), w^(2), w^(3) can get stuck at a local minimum instead of the global
minimum w*, so good initialization is very important.]
Gradient Descent: Worked Example
f(x) = x², so f′(x) = 2x. Step size: 0.8, so the update is x^(k+1) = x^(k) − 0.8 · 2 · x^(k).
x^(0) = −4
x^(1) = −4 − 0.8 · 2 · (−4) = 2.4
x^(2) = 2.4 − 0.8 · 2 · 2.4 = −1.44
x^(3) = 0.864
x^(4) = −0.5184
x^(5) = 0.31104
⋮
x^(30) ≈ −8.84296e−07
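The sequence above can be reproduced in a few lines; since f(x) = x², the update x ← x − 0.8 · f′(x) multiplies x by (1 − 1.6) = −0.6 each step, which is why the sign alternates while the magnitude shrinks:

```python
def gd_iterates(x0, lr, steps):
    """Gradient descent on f(x) = x**2, whose gradient is f'(x) = 2x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * 2 * xs[-1])  # x ← x − lr · f'(x)
    return xs

xs = gd_iterates(-4.0, 0.8, 30)
# xs[1] = 2.4, xs[2] = -1.44, xs[3] = 0.864, ..., xs[30] ≈ -8.84e-07
```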
Gradient Descent: Effect of Step Size
[Figure: iterates on f(x) = x² with step size 0.9]
[Figure: iterates on f(x) = x² with step size 0.2]
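The effect of the step size can be seen on the same f(x) = x²: each update multiplies x by (1 − 2·lr), so a small step contracts smoothly toward 0 while a large step overshoots and oscillates. This comparison is a sketch, not taken from the slides:

```python
def run(x0, lr, steps=20):
    """Gradient descent on f(x) = x**2 with a fixed step size."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # multiplies x by (1 - 2*lr)
    return x

small = run(-4.0, 0.2)  # factor 0.6 per step: smooth, fast convergence
large = run(-4.0, 0.9)  # factor -0.8 per step: oscillates, converges slowly
# with lr > 1.0 the factor's magnitude exceeds 1 and the iterates diverge
```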
Slide credit: Andrew Ng
[Figure: gradient descent trajectories from the same starting point x₀ with learning rates LR = 0.9 and LR = 0.1]