Lecture 04 (3hrs) Neural Network and Deep Learning-Part A


Neural Network and Deep Learning

Xizhao WANG
Dian ZHANG
Big Data Institute
College of Computer Science
Shenzhen University

March 2022

Outline

1. Gradient Descent Algorithm


2. BP Algorithm for Feed-Forward Neural
Network Model
3. Convolutional Neural Network
4. Deep Learning

Machine Learning Lecture – Xizhao Wang Lecture 03: Neural Network and Deep Learning
Gradient Descent Algorithm 1. Definition of Gradient
BP Algorithm for Feed-Forward Neural Network Model 2. Gradient Descent Algorithm (GDA)
Convolutional Neural Network 3. Difference between GDA and Newton's Method
Deep Learning 4. An example


Gradient Descent Algorithm


Gradient Descent Algorithm - starting from an example


Minimize: f(x) = x².
Step 1: compute the gradient, ∇f = 2x.
Step 2: move x along the negative direction of the gradient, i.e.,
x ← x − γ∇f, where γ is the learning rate.
Step 3: repeat Step 2 until the difference of f(x) between two
adjacent iterations is small enough, which indicates that f(x) has
reached a local minimum.
Step 4: output x, which is the (approximately) optimal solution.
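The four steps above can be sketched in a few lines of Python; the learning rate, tolerance, and iteration cap below are illustrative choices, not values fixed by the slide.

```python
# Minimal sketch of Steps 1-4 for f(x) = x^2, whose gradient is 2x.
def gradient_descent(x, gamma=0.1, tol=1e-12, max_iter=10_000):
    f = lambda x: x * x
    grad = lambda x: 2 * x                 # Step 1: compute the gradient
    prev = f(x)
    for _ in range(max_iter):
        x = x - gamma * grad(x)            # Step 2: move against the gradient
        if abs(f(x) - prev) < tol:         # Step 3: stop when f barely changes
            break
        prev = f(x)
    return x                               # Step 4: output the solution

print(gradient_descent(2.0))  # close to 0, the minimizer of x^2
```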


Gradient Descent Algorithm


Example

Minimize f(x) = x² by using the Gradient Descent Algorithm.

The initial value of x is 2, and the step length is 0.1. After 49
iterations, the minimum value 1.273147e-09 of the function is
obtained, and the corresponding x value is 3.568119e-05.
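The example can be reproduced directly, assuming the slide ran a fixed 49 updates of x ← x − γ·2x with γ = 0.1:

```python
# Reproducing the slide's numbers: x0 = 2, step length 0.1, 49 iterations.
x = 2.0
for _ in range(49):
    x -= 0.1 * 2 * x      # x <- x - gamma * f'(x), with f'(x) = 2x

print(x)      # ~3.568119e-05
print(x * x)  # ~1.273147e-09
```

Each update multiplies x by (1 − 2γ) = 0.8, so after 49 iterations x = 2·0.8⁴⁹, which matches the slide's figures.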


Gradient Descent Algorithm


Definition:

Directional derivative (taking a function of three variables as an example):

Suppose the function f is defined in a neighborhood of the point P0(x0, y0, z0),
l is a ray from the point P0, P(x, y, z) is a point on l contained in the
neighborhood of P0, and ρ denotes the distance between P and P0.

If lim (f(P) − f(P0))/ρ = lim Δf/ρ

exists as ρ → 0, we call this limit the directional derivative of f

at P0 along the direction of l.
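The definition can be illustrated numerically by evaluating the limit quotient for a small ρ; the function f(x, y, z) = xy + z² and the point P0 below are illustrative choices, not from the slides.

```python
import math

# Directional derivative of f at P0 along a unit direction l,
# approximated by the quotient (f(P) - f(P0)) / rho for small rho.
def f(x, y, z):
    return x * y + z ** 2

P0 = (1.0, 2.0, 3.0)
l = (1 / math.sqrt(3),) * 3        # unit vector along (1, 1, 1)

rho = 1e-6
P = tuple(p + rho * d for p, d in zip(P0, l))
numeric = (f(*P) - f(*P0)) / rho

# Analytically the directional derivative equals grad f . l
grad = (P0[1], P0[0], 2 * P0[2])   # (y, x, 2z) at P0
analytic = sum(g * d for g, d in zip(grad, l))

print(numeric, analytic)  # nearly equal
```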

Machine Learning Lecture – Xizhao Wang Lecture 03: Neural Network and Deep Learning
Gradient Descent Algorithm 1. Definition of Gradient
BP Algorithm for Feed-Forward Neural Network Model 2. Gradient Descent Algorithm (GDA)
Convolutional Neural Network 3. Difference between GDA and Newton's Method
Deep Learning 4. An example

Gradient Descent Algorithm


Generally speaking, directional derivative is the rate of change
of a function in a specified direction.

The gradient of a scalar function f (x1, x2, ∙∙∙, xn) is denoted as

∇f(X) = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)ᵀ.

In the three-dimensional Cartesian coordinate system with a Euclidean
metric, the gradient, if it exists, is given by

∇f = (∂f/∂x) i + (∂f/∂y) j + (∂f/∂z) k,

where i, j, k are the standard unit vectors in the directions of the
coordinate axes. For example, the gradient of the function
f(x, y, z) = 2x + 3y² − sin(z) is ∇f = 2i + 6y j − cos(z) k.
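The gradient of the example function can be checked against central finite differences; the evaluation point is an arbitrary illustrative choice.

```python
import math

# Central-difference check of grad f = (2, 6y, -cos z)
# for f(x, y, z) = 2x + 3y^2 - sin(z).
def f(x, y, z):
    return 2 * x + 3 * y ** 2 - math.sin(z)

def num_grad(f, p, h=1e-6):
    g = []
    for i in range(3):
        hi = list(p); hi[i] += h
        lo = list(p); lo[i] -= h
        g.append((f(*hi) - f(*lo)) / (2 * h))
    return g

x, y, z = 0.5, -1.0, 2.0
exact = [2.0, 6 * y, -math.cos(z)]
approx = num_grad(f, [x, y, z])
print(approx, exact)  # componentwise agreement
```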

Gradient Descent Algorithm


Geometric Meaning
The gradient specifies the direction that produces the steepest increase in the
function. The negative of the gradient therefore gives the direction of steepest
decrease.

In the above two images, the values of the function are represented in black and
white (black representing higher values), and the corresponding gradient is
represented by blue arrows.


Gradient Descent Algorithm


Geometric Meaning

The gradient of the function f(x, y) = −(cos²x + cos²y)²
is depicted as a projected vector field on the bottom plane.


Gradient Descent Algorithm


For the 2-dimensional case:
Gradient: Suppose z = f(x, y) has first-order continuous partial derivatives on
a region D. Then for every point P(x, y) in D there exists a vector

(∂f/∂x, ∂f/∂y) = fx(x, y) i + fy(x, y) j,

called the gradient of z = f(x, y) at P(x, y), written grad f(x, y) or ∇f(x, y), i.e.,

grad f(x, y) = ∇f(x, y) = (∂f/∂x) i + (∂f/∂y) j.

Along the gradient direction, the function changes most quickly.


Gradient Descent Algorithm


Suppose e = [cos α, cos β] is a unit vector in the direction of l. Then

∂f/∂l = (∂f/∂x) cos α + (∂f/∂y) cos β
      = (∂f/∂x, ∂f/∂y) · (cos α, cos β)
      = grad f(x, y) · e
      = |grad f(x, y)| |e| cos⟨grad f(x, y), e⟩.

When cos⟨grad f(x, y), e⟩ = 1, the directional derivative ∂f/∂l attains its
maximum value, which equals the norm of the gradient, i.e.,

|grad f(x, y)| = √( (∂f/∂x)² + (∂f/∂y)² ).

Thus, when the variables change along the gradient direction, the rate of change
of the function attains its maximum value, the norm of the gradient.
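This claim can be confirmed numerically by scanning unit directions e = (cos α, sin α); the function f(x, y) = x² + 3y and the point below are illustrative choices.

```python
import math

# For f(x, y) = x^2 + 3y, grad f = (2x, 3). Scan directions and check that
# the directional derivative grad . e peaks at the gradient norm.
x, y = 1.0, 2.0
grad = (2 * x, 3.0)

best = max(
    grad[0] * math.cos(a) + grad[1] * math.sin(a)
    for a in (k * 2 * math.pi / 3600 for k in range(3600))
)
norm = math.hypot(*grad)
print(best, norm)  # nearly equal
```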

Gradient Descent Algorithm

When the gradient is generalized to n-dimensional space, it can be represented as:

∇f(X) = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)ᵀ.

Along the gradient direction, the function changes most quickly.


Gradient Descent Algorithm


[Figure: a descent path from the initial point to the minimum value.]

The gradient descent algorithm may lead to a local optimal solution; the
global optimum is ensured only when the loss function is convex.
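A small experiment illustrates this: on a nonconvex function, different starting points can land in different local minima (the function f(x) = x⁴ − 3x² + x below is an illustrative choice).

```python
# Gradient descent on the nonconvex f(x) = x^4 - 3x^2 + x, whose
# derivative is f'(x) = 4x^3 - 6x + 1. It has two local minima; which
# one we reach depends on the initial point.
def descend(x, gamma=0.01, steps=5000):
    for _ in range(steps):
        x -= gamma * (4 * x ** 3 - 6 * x + 1)
    return x

left, right = descend(-2.0), descend(2.0)
print(left, right)  # two different local minimizers
```

For a convex loss this cannot happen: every starting point reaches the same global minimum.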


Gradient Descent Algorithm

Notes on the Gradient Descent Algorithm parameters

1. The magnitude of the gradient, epsilon, is one of the termination conditions.

2. Another termination condition is the number of iterations (time control).

3. The learning rate, alpha, controls the "walking step": too small a value
leads to slow convergence (low efficiency), while too large a value results in
oscillation (non-convergence). Its appropriate value depends on the specific
function to be minimized.
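Point 3 is easy to demonstrate on f(x) = x², where the update is x ← (1 − 2α)x: a tiny α converges slowly, a moderate α converges quickly, and α > 1 makes |1 − 2α| > 1, so the iterates oscillate with growing amplitude. The specific α values below are illustrative.

```python
# Effect of the learning rate alpha on f(x) = x^2 (gradient 2x).
def run(alpha, steps=50, x=2.0):
    for _ in range(steps):
        x -= alpha * 2 * x     # x <- (1 - 2*alpha) * x
    return x

print(abs(run(0.01)))  # slow: still far from 0 after 50 steps
print(abs(run(0.9)))   # fast: essentially 0
print(abs(run(1.1)))   # oscillates with growing amplitude (diverges)
```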


Gradient Descent Algorithm

1. Definition of gradient
2. Gradient descent algorithm (GDA)
3. Difference between GDA and Newton's method
4. An example


Newton's method
Suppose the objective function f(x) has second-order continuous partial
derivatives, and xk is an approximation of its minimum point. The second-order
Taylor polynomial approximation of f(x) near xk is:

f(x) ≈ f(xk) + ∇f(xk)ᵀ(x − xk) + (1/2)(x − xk)ᵀ H(xk)(x − xk).

Its gradient is

∇f(x) ≈ ∇f(xk) + H(xk)(x − xk).

The minimum point of the approximate function satisfies ∇f(x) = 0, then

x = xk − H(xk)⁻¹ ∇f(xk),

where H(xk) is the Hessian matrix of f(x) at the point xk.

In the minimizing process of f(x), −H(xk)⁻¹∇f(xk) is considered as the
searching direction.


Newton's method
The minimizing process of Newton's method can be represented as:

xk+1 = xk − H(xk)⁻¹ ∇f(xk), k = 0, 1, 2, ….


Newton's method

In optimization, Newton's method is applied to the derivative f′ of a
twice-differentiable function f to find the roots of the derivative (solutions
to f′(x) = 0), also known as the stationary points of f.

In the one-dimensional problem, Newton's method attempts to construct a
sequence xn from an initial guess x0 that converges towards some value x*
satisfying f′(x*) = 0. This x* is a stationary point of f.

The second-order Taylor expansion fT(x) of f around xn is:

fT(xn + Δx) = f(xn) + f′(xn) Δx + (1/2) f″(xn) Δx².


Newton's method
We want to find Δx such that xn + Δx is a stationary point. We solve the
equation that sets the derivative of this last expression with respect to Δx
equal to zero:

0 = d/d(Δx) [ f(xn) + f′(xn) Δx + (1/2) f″(xn) Δx² ] = f′(xn) + f″(xn) Δx.

For the value Δx = −f′(xn) / f″(xn), which is the solution of this equation, it
can be hoped that xn+1 = xn + Δx = xn − f′(xn) / f″(xn) will be closer to a
stationary point x*. Provided that f is a twice-differentiable function and
other technical conditions are satisfied, the sequence x1, x2, ∙∙∙ will converge
to a point x* satisfying f′(x*) = 0.

The above iterative scheme can be generalized to several dimensions by
replacing the derivative with the gradient, ∇f(x), and the reciprocal of the
second derivative with the inverse of the Hessian matrix, Hf(x). One obtains
the iterative scheme

xn+1 = xn − [Hf(xn)]⁻¹ ∇f(xn), n ≥ 0.

Gradient Descent Algorithm


Comparison of GDA and Newton's Method

A comparison of gradient descent (green) and Newton's method (red) for
minimizing a function (with small step sizes).

Newton's method uses curvature information to take a more direct route.

Essentially, Newton's method has second-order convergence while gradient
descent has first-order convergence, so Newton's method is faster. Put more
intuitively: if you want to find the shortest path to the bottom of a basin,
gradient descent at each step only picks the steepest downhill direction from
your current position, whereas Newton's method, when choosing a direction,
considers not only whether the slope is steep enough but also whether the
slope will become steeper after you take the step.


Gradient Descent Algorithm

1. Definition of gradient
2. Gradient descent algorithm (GDA)
3. Difference between GDA and Newton’s method
4. An example


Newton's method
Example

Minimize f(x) = x² by using Newton's method.

The initial value of x is 2. After 15 iterations, the minimum value
3.7253e-09 of the function is obtained, and the corresponding x value
is 6.1035e-05.


Gradient Descent Algorithm

The End.

Machine Learning Lecture – Xizhao Wang Lecture 03: Neural Network and Deep Learning
1. Brief Introduction
Gradient Descent Algorithm 2. Feedforward NN
BP Algorithm for Feed-Forward Neural Network Model 3. BP algorithm
Convolutional Neural Network 4. Notes on BP
Deep Learning 5. An application
6. Questions


BP Algorithm for Feed-Forward Neural Network Model



• Rumelhart and McClelland proposed the BP (Back Propagation)
algorithm for feed-forward neural networks.

[Photos: David Rumelhart; Geoffrey Everest Hinton]

• BP algorithm - key idea

– Use the error of the output layer to estimate the error of its
previous layer; generally, use the error of layer n to estimate
the error of layer n−1.


BP Algorithm for Feed-Forward Neural Network Model

An intuitive understanding of a feed-forward neural network

A feed-forward NN is a smooth function which can be used
to approximate an input-output system (a black box).

What is the specific form of the function in the box?
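One way to see "a smooth function" concretely is to write out the forward pass of a tiny network; the 2-2-1 architecture and all weight values below are made up for illustration.

```python
import math

# A 2-2-1 feed-forward net is just the smooth function
# y = sigma(W2 . sigma(W1 . x + b1) + b2).
def sigma(t):
    return 1.0 / (1.0 + math.exp(-t))

W1 = [[0.5, -0.3], [0.8, 0.2]]   # hidden-layer weights (illustrative)
b1 = [0.1, -0.1]
W2 = [1.0, -1.0]                 # output-layer weights (illustrative)
b2 = 0.05

def net(x):
    h = [sigma(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sigma(sum(w * hi for w, hi in zip(W2, h)) + b2)

y = net([1.0, 2.0])
print(y)  # a single smooth output in (0, 1)
```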


BP Algorithm for Feed-Forward Neural Network Model


• A Perceptron


BP Algorithm for Feed-Forward Neural Network Model

• Sigmoid threshold unit

The sigmoid unit computes its output as o = σ(w · x), where
σ(y) = 1 / (1 + e^(−y)).

It is easy to check that dσ(y)/dy = σ(y)(1 − σ(y)).

BP Algorithm for Feed-Forward Neural Network Model

An intuitive example

Digit “3”

Digit “8”


BP Algorithm for Feed-Forward Neural Network Model


• A Perceptron can be used to represent many Boolean
functions, such as AND, OR, NAND, and NOR.

A Perceptron cannot be used to represent functions that are not
linearly separable, such as XOR.
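A quick sketch: a threshold unit o = 1 if w0 + w1·x1 + w2·x2 > 0 (else 0) realizes AND and OR with one standard choice of weights, whereas no single choice of weights realizes XOR, since XOR is not linearly separable.

```python
# Threshold (perceptron) units for the Boolean functions AND and OR.
def perceptron(w0, w1, w2):
    return lambda x1, x2: 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

AND = perceptron(-1.5, 1.0, 1.0)   # fires only when both inputs are 1
OR = perceptron(-0.5, 1.0, 1.0)    # fires when at least one input is 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, AND(x1, x2), OR(x1, x2))
```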


BP Algorithm for Feed-Forward Neural Network Model


• Sigmoid function picture


BP Algorithm for Feed-Forward Neural Network Model

An intuitive understanding to a feed-forward neural network



Overview of Backprop algorithm

• Choose random weights for the network


• Feed in an example and obtain a result
• Calculate the error for each node
(starting from the last stage and propagating the error backwards)
• Update the weights
• Repeat with other examples until the network converges on the
target output
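The five steps above can be sketched as a pure-Python training loop; the 2-2-1 architecture, the learning rate, and the OR training set below are illustrative choices, not from the slides.

```python
import math, random

# Backprop sketch: random weights, forward pass, backward error pass,
# weight updates, repeated over examples.
random.seed(0)

def sigma(t):
    return 1.0 / (1.0 + math.exp(-t))

# Choose random weights: hidden layer (2 units, bias + 2 inputs),
# output unit (bias + 2 hidden inputs).
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(3)]

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # OR

def forward(x):
    h = [sigma(w[0] + w[1] * x[0] + w[2] * x[1]) for w in W1]
    o = sigma(W2[0] + W2[1] * h[0] + W2[2] * h[1])
    return h, o

def total_error():
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

eta = 0.5
before = total_error()
for _ in range(5000):
    for x, t in data:
        h, o = forward(x)
        # Error at the output node, then propagated back to hidden nodes.
        delta_o = (t - o) * o * (1 - o)
        delta_h = [delta_o * W2[j + 1] * h[j] * (1 - h[j]) for j in range(2)]
        # Update output-layer weights.
        W2[0] += eta * delta_o
        for j in range(2):
            W2[j + 1] += eta * delta_o * h[j]
        # Update hidden-layer weights.
        for j in range(2):
            W1[j][0] += eta * delta_h[j]
            W1[j][1] += eta * delta_h[j] * x[0]
            W1[j][2] += eta * delta_h[j] * x[1]

after = total_error()
print(before, after)  # the squared error drops during training
```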


Backwards pass

https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

BP Algorithm for Feed-Forward Neural Network Model

The least squares problem Ax ≈ b has the closed-form solution
x̂ = (AᵀA)⁻¹ Aᵀ b.

An iteration method approaches the optimal solution gradually through
successive updating steps.

Gradient descent, which belongs to the iteration methods, is applicable to
least squares problems.

The Gauss-Newton method is a commonly used iteration approach for solving
nonlinear least squares problems.

Levenberg-Marquardt is another iteration method for solving nonlinear least
squares problems.
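The closed form and the iterative route can be compared on a tiny system; the 3×2 matrix A and vector b below are illustrative.

```python
# Least squares for A x ~ b: normal equations (A^T A) x = A^T b versus
# gradient descent on f(x) = ||Ax - b||^2 with grad = 2 A^T (A x - b).
A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0]

# Closed form, solving the 2x2 normal equations directly.
AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
Atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x_closed = [(AtA[1][1] * Atb[0] - AtA[0][1] * Atb[1]) / det,
            (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det]

# Gradient descent on the same objective.
x = [0.0, 0.0]
for _ in range(20000):
    r = [sum(A[k][j] * x[j] for j in range(2)) - b[k] for k in range(3)]
    g = [2 * sum(A[k][i] * r[k] for k in range(3)) for i in range(2)]
    x = [x[i] - 0.01 * g[i] for i in range(2)]

print(x_closed, x)  # both approximate the least-squares solution
```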

BP Algorithm for Feed-Forward Neural Network Model


It is a function of the weights and its minimum
exists. The BP algorithm uses the gradient
descent technique to find the minimum by
gradually updating the weights.

It is easy to see that the weights should be updated
along the negative gradient of the error.

The remaining task is to derive a convenient expression
for the gradient of the error with respect to each weight.


BP Algorithm for Feed-Forward Neural Network Model


In summary:


BP Algorithm for Feed-Forward Neural Network Model

1. Brief introduction
2. Feedforward NN
3. BP algorithm
4. Notes on BP
5. An application
6. Questions

• Learning process :
– Stimulated by input samples, the connection weights
update gradually, such that network outputs approach
expected outputs step by step.

• Learning essence :
– Dynamically update connection weights

• Learning rule :
– It is the rule of how updating the connection weights
(What rule is followed)

• Learning type : Supervised
• Key idea :
– The output error (in a suitable form) is back-propagated
to input layer via hidden layer(s)

Assigning the error to all units (nodes) in the
layers, and updating the weight of each node
• Features :
– Signal forward-propagated
– Error back-propagated

• Forward propagation :
– Input sample → input layer → every hidden layer
→ output layer
• Judging whether to go to back-propagation :
– If the difference between the actual and expected outputs
(in the output layer) is bigger than a threshold
• Back-propagation :
– Represent the error of each layer and update the weight
of each node
• Stop if the output error falls below a predefined threshold or the
number of iterations reaches the predefined maximum.
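The forward-pass / error-check / backward-pass loop above can be sketched for a one-hidden-layer network. This is a minimal NumPy sketch; the sigmoid activation, layer sizes, squared-error threshold, and XOR data are illustrative assumptions, not fixed by the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, T, n_hidden=4, lr=0.5, max_iter=10000, tol=1e-2, seed=0):
    """One-hidden-layer BP loop: forward pass, error check, backward pass."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])            # inputs + bias unit
    W1 = rng.normal(scale=0.5, size=(Xb.shape[1], n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, T.shape[1]))
    err = np.inf
    for _ in range(max_iter):
        # forward propagation: input layer -> hidden layer -> output layer
        H = np.hstack([sigmoid(Xb @ W1), np.ones((len(Xb), 1))])  # + bias
        Y = sigmoid(H @ W2)
        err = 0.5 * np.sum((T - Y) ** 2)
        if err < tol:                                    # stop: error small
            break
        # back-propagation: assign the output error back through the layers
        d_out = (Y - T) * Y * (1 - Y)                    # output error term
        d_hid = (d_out @ W2[:-1].T) * H[:, :-1] * (1 - H[:, :-1])
        W2 -= lr * H.T @ d_out
        W1 -= lr * Xb.T @ d_hid
    return W1, W2, err

# XOR: the classic task that requires the hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
_, _, final_err = train_bp(X, T)
```

Each iteration is one forward sweep over all four samples followed by one backward sweep, exactly the signal-forward / error-backward pattern described above.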
Related concepts of gradient descent

1. Learning rate: in gradient descent, the function value decreases along
the negative direction of the gradient; the learning rate determines the
step size of each iteration.

2. Feature: an input of the algorithm, used to describe the samples.

3. Hypothesis function: in supervised learning, the function that is fitted
to the training samples.

4. Loss function: measures the effectiveness of the hypothesis function;
it is generally computed as the square of the difference between the actual
outputs and the predicted (fitted) values.
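As a concrete illustration of points 3 and 4, a linear hypothesis function and its squared-error loss might look like this (a hypothetical sketch; the linear form h(x) = w*x + b and the sample values are made up for illustration):

```python
def hypothesis(w, b, x):
    """A linear hypothesis function h(x) = w*x + b, fitted to the samples."""
    return w * x + b

def squared_loss(w, b, samples):
    """Sum of squared differences between outputs and fitted predictions."""
    return sum((t - hypothesis(w, b, x)) ** 2 for x, t in samples)

samples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # (feature, output) pairs
loss = squared_loss(2.0, 0.0, samples)            # near-perfect fit: small loss
```

Gradient descent would then adjust w and b along the negative gradient of this loss.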

Standard Gradient Descent: as described in the Gradient Descent Algorithm,
the calculation of the gradient is based on all the training samples (x, t).

Stochastic Gradient Descent: whereas the gradient descent training rule
presented in the Gradient Descent Algorithm computes the weight updates Δwi
after summing over all the training examples, the idea behind stochastic
gradient descent is to approximate this gradient descent search by updating
the weights wi incrementally, following the calculation of the error for
each individual example (x, t).

Batch Gradient Descent: the gradient is based on a batch of the training
samples.
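The three variants differ only in which examples form each weight update, which can be contrasted in a single sketch. This is a hypothetical NumPy example for a linear unit with squared error; the data, learning rate, and batch size are illustrative choices:

```python
import numpy as np

def lms_updates(X, t, w, lr=0.01, batch_size=None):
    """One pass of gradient descent on E(w) = 1/2 * sum((t - X @ w)**2).

    batch_size=None -> standard GD: one update from ALL samples
    batch_size=1    -> stochastic GD: one update per individual sample
    batch_size=k    -> batch GD: one update per group of k samples
    """
    n = len(t)
    step = n if batch_size is None else batch_size
    w = w.copy()
    for i in range(0, n, step):
        Xb, tb = X[i:i + step], t[i:i + step]
        grad = -Xb.T @ (tb - Xb @ w)   # gradient of the summed squared error
        w -= lr * grad                  # move along the negative gradient
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
t = X @ np.array([1.0, -2.0, 0.5])                # a known linear target
w0 = np.zeros(3)
w_std = lms_updates(X, t, w0)                     # standard
w_sgd = lms_updates(X, t, w0, batch_size=1)       # stochastic
w_bat = lms_updates(X, t, w0, batch_size=10)      # batches of 10
```

All three move the weights toward the target; they differ in how often the weights change per pass and in how noisy each individual update is.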

Remarks

The key differences between standard gradient descent and stochastic
gradient descent are:

• In standard gradient descent, the error is summed over all examples (x, t)
before updating the weights wi, whereas in stochastic gradient descent the
weights are updated upon examining each training example.

• Summing over multiple examples in standard gradient descent requires more
computation per weight update step. On the other hand, because it uses the
true gradient, standard gradient descent is often used with a larger step
size per weight update than stochastic gradient descent.

• In cases where there are multiple local minima with respect to the
objective function, stochastic gradient descent can sometimes avoid falling
into these local minima because it uses the various ∇Ed(w) rather than
∇E(w) to guide its search.
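The distinction in the last remark, between the individual gradients ∇Ed(w) and the true gradient ∇E(w), can be made concrete for a single-weight linear unit (a hypothetical sketch; the data values are made up):

```python
import numpy as np

# E(w) = 1/2 * sum_d (t_d - w * x_d)^2 for a single-weight linear unit
x = np.array([1.0, 2.0, 3.0])
t = np.array([2.0, 3.9, 6.1])
w = 1.0

grads_d = -(t - w * x) * x   # the various per-example gradients  dEd/dw
grad_true = grads_d.sum()    # the true gradient dE/dw sums over all d

# Standard GD takes one step along -grad_true; stochastic GD takes a
# step along each -grads_d[d] in turn, so its search path differs.
```

The per-example gradients point in different directions and with different magnitudes, which is what lets the stochastic search wander off the path the true gradient would follow.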

• A 3-layer feed-forward neural network: Neural
network learning to steer an autonomous vehicle

Questions:

1. If the features are not numerical but symbolic, i.e., the
input-output system takes symbols as input and produces a
real number as output, how would you use BP to train the
approximator?
2. In comparison with the real-valued case, how would its
performance differ?
3. In your own opinion, how should the step size in the
Gradient Descent Algorithm be selected empirically?

Feedforward NN and
BP Algorithm

The End.
