
GRADIENT DESCENT

SLOPE
• The slope of a line is a measure of the steepness and the direction of the line.
• The slope of a line is defined as the change in the y-coordinate with respect to the change in the x-coordinate of that line.

m = Δy/Δx
where m is the slope.
Note that tan θ = Δy/Δx, where θ is the angle the line makes with the positive x-axis; this tan θ is also referred to as the slope of the line.
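For example, the line through the points (1, 2) and (4, 8) has slope m = Δy/Δx = (8 − 2)/(4 − 1) = 6/3 = 2.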
Slope of Horizontal Line

• A horizontal line is a straight line that is parallel to the x-axis or is drawn from left to right or right to left in a
coordinate plane.

• The net change in the y-coordinate along a horizontal line is zero, so the slope of a horizontal line is
m = Δy/Δx = 0/Δx = 0

Slope of Vertical Line

• A vertical line is a straight line that is parallel to the y-axis or is drawn from top to bottom or bottom to top in a
coordinate plane.

• The net change in the x-coordinate along a vertical line is zero, so the slope of a vertical line is
m = Δy/Δx = Δy/0 = undefined
TANGENT LINE
• The "tangent line" is one of the most important applications of differentiation. The word "tangent" comes from the Latin word "tangere", which means "to touch".
• The tangent line of a curve at a given point is a line that just touches the curve (the function) at that point.
• To find the tangent line equation, we need to know the equation of the curve (which is given by a function) and the point at which the tangent is drawn.
SLOPE OF TANGENT LINE
• Let us consider a curve that is represented by a function f(x).
• Also, let us consider a secant line passing through two points of the curve, P(x₀, f(x₀)) and Q(x₀ + h, f(x₀ + h)), i.e., P and Q are at a horizontal distance of h units from each other.
• Then the slope of the secant line, using the slope formula, is
Slope of secant line = [f(x₀ + h) − f(x₀)] / (x₀ + h − x₀) = [f(x₀ + h) − f(x₀)] / h
• If Q comes very close to P (by making h → 0) and merges with P, then the secant line becomes the tangent line at P, i.e., the slope of the tangent line at P can be obtained by applying h → 0 to the slope of the secant line. So
Slope of Tangent at P = limₕ→₀ [f(x₀ + h) − f(x₀)] / h
• We know that this is nothing but the derivative of f(x) at x = x₀ (by the limit definition of the derivative, or first principles), i.e.,
Slope of Tangent at P = f '(x₀)
• Therefore, the slope of the tangent is nothing but the derivative of the function at the point where it is drawn.
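As an illustration of this limit, here is a minimal Python sketch (the curve f(x) = x² and the point x₀ = 3 are arbitrary choices for demonstration, not from the slides) showing the secant slope approaching the tangent slope as h shrinks:

    def secant_slope(f, x0, h):
        # slope of the secant line through (x0, f(x0)) and (x0 + h, f(x0 + h))
        return (f(x0 + h) - f(x0)) / h

    f = lambda x: x ** 2                       # example curve
    for h in (1.0, 0.1, 0.001, 1e-6):
        print(h, secant_slope(f, 3.0, h))      # approaches f'(3) = 2*3 = 6 as h → 0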
Example 1: Find the equation of the tangent line to the graph of f(x) at the point (−1, 2).
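As an illustration, suppose the curve is f(x) = x² + 1 (an assumed example; its graph does pass through (−1, 2)). Then f '(x) = 2x, so the slope of the tangent at x = −1 is m = f '(−1) = −2. Using point-slope form: y − 2 = −2(x − (−1)), i.e., the tangent line is y = −2x.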
GRADIENT DESCENT
• Gradient descent (GD) is an iterative first-order optimisation algorithm used to find a local minimum/maximum of a given function.
• This method is commonly used in machine learning (ML) and deep learning (DL) to minimise a cost/loss function (e.g., in linear regression).
Function requirements
The gradient descent algorithm does not work for all functions. There are two specific requirements: the function has to be
• Differentiable
• Convex
DIFFERENTIABLE vs. NON-DIFFERENTIABLE
[Figures: example plots of differentiable and non-differentiable functions]
• The second requirement: the function has to be convex.
• For a univariate function, this means that the line segment connecting two of the function's points lies on or above its curve (it does not cross it). If it does cross, the function has a local minimum which is not a global one.

One way to check whether a univariate function is convex is to calculate the second derivative and check whether its value is always greater than 0.
CONVEX FUNCTION

If the second derivative of a function is always greater than 0, the function is strictly convex.
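For example, for f(x) = x² the second derivative is f ''(x) = 2 > 0 for every x, so f(x) = x² is strictly convex.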
Gradient

• The gradient is the slope of a curve at a given point in a specified direction.
• In the case of a univariate function, it is simply the first derivative at a selected point.
• In the case of a multivariate function, it is a vector of derivatives in each main direction (along the variable axes).
• The gradient of an n-dimensional function f(x) at a given point p is defined as follows:
∇f(p) = [∂f/∂x₁(p), ∂f/∂x₂(p), …, ∂f/∂xₙ(p)]ᵀ
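For example, for f(x₁, x₂) = x₁² + x₂² the gradient is ∇f(x) = [2x₁, 2x₂]ᵀ, so at the point p = (1, 2) we get ∇f(p) = [2, 4]ᵀ.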


Gradient descent
• Imagine a valley and a person with no sense of direction who wants to get to the bottom of the valley.
• He goes down the slope and takes large steps when the slope is steep and small steps when the slope is less steep.
• He decides his next position based on his current position and stops when he gets to the bottom of the valley, which was his goal.
Gradient descent
The Gradient Descent Algorithm iteratively
• calculates the next point using the gradient at the current position,
• scales it (by a learning rate), and
• subtracts the obtained value from the current position (makes a step).
• It subtracts the value because we want to minimise the function (to maximise it, we would add).

There is an important parameter η which scales the gradient and thus controls the step size. In machine learning, it is called the learning rate.
• The Gradient Descent method's steps are (a minimal code sketch follows this list):
1. choose a starting point (initialisation)
2. calculate the gradient at this point
3. make a scaled step in the opposite direction to the gradient (objective: minimise)
4. repeat steps 2 and 3 until one of the stopping criteria is met:
• the maximum number of iterations is reached
• the step size is smaller than the tolerance (due to scaling or a small gradient)
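A minimal Python sketch of these four steps (the function name, parameter defaults, and the quadratic in the usage example are illustrative choices, not from the slides):

    def gradient_descent(grad, start, learning_rate=0.1, max_iter=100, tol=1e-6):
        x = start                            # step 1: choose a starting point
        for _ in range(max_iter):            # stop: maximum number of iterations
            step = learning_rate * grad(x)   # step 2: gradient, scaled by the learning rate
            if abs(step) < tol:              # stop: step size smaller than the tolerance
                break
            x = x - step                     # step 3: move opposite to the gradient (minimise)
        return x

    # usage example: minimise f(x) = x², whose gradient is f'(x) = 2x
    print(gradient_descent(grad=lambda x: 2 * x, start=-4.0, learning_rate=0.8))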
Gradient Descent

Gradient Descent Algorithm:
• Pick an initial point x⁽⁰⁾
• Iterate until convergence: x⁽ᵗ⁺¹⁾ = x⁽ᵗ⁾ − η ∇f(x⁽ᵗ⁾)
where η is the step size (sometimes called the learning rate)
Gradient Descent: An Illustration
[Figure: a loss curve L(w) with iterates w⁽⁰⁾, w⁽¹⁾, w⁽²⁾, w⁽³⁾ from two different starting points. Where the gradient is negative, we move in the positive direction; where it is positive, we move in the negative direction. The learning rate is very important; one run reaches the global minimum w* while the other gets stuck at a local minimum, so good initialization is very important.]
Gradient Descent

f(x) = x², so f '(x) = 2x
Step size: η = 0.8; starting point x⁽⁰⁾ = −4

x⁽¹⁾ = −4 − 0.8 · 2 · (−4) = 2.4
x⁽²⁾ = 2.4 − 0.8 · 2 · 2.4 = −1.44
x⁽³⁾ = 0.864
x⁽⁴⁾ = −0.5184
x⁽⁵⁾ = 0.31104
…
x⁽³⁰⁾ ≈ −8.84296e−07
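This sequence can be reproduced with a few lines of Python (a direct transcription of the update x⁽ᵗ⁺¹⁾ = x⁽ᵗ⁾ − η · f '(x⁽ᵗ⁾)):

    x = -4.0                     # x⁽⁰⁾
    for t in range(1, 31):
        x = x - 0.8 * 2 * x      # η = 0.8, f'(x) = 2x
        if t <= 5 or t == 30:
            print(t, x)          # 2.4, -1.44, 0.864, -0.5184, 0.31104, ..., ≈ -8.84e-07

Each update multiplies x by (1 − 2η) = −0.6, which is why the iterates alternate in sign while shrinking toward the minimum at x = 0.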
Gradient Descent: Step size matters!
[Figures: the same minimisation with step size 0.9 and with step size 0.2]
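To see why, consider the running example f(x) = x²: the update x ← x − η · 2x = (1 − 2η)x multiplies x by a fixed factor each step. A step size of 0.2 gives a factor of 0.6 (steady one-sided convergence); 0.8 or 0.9 gives a negative factor (−0.6 or −0.8), so the iterates overshoot the minimum and oscillate; and any step size above 1 gives a factor of magnitude greater than 1, so the iterates diverge.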
Slide credit: Andrew Ng
[Figures: gradient descent runs from an initial point x₀ with learning rate 0.9 and with learning rate 0.1]
