Introduction to Neural Network

——CS 280 Tutorial One

Guoxing Sun
sungx@shanghaitech.edu.cn

2020-09-16
• The history of neural network

• Basic neural network

• Optimization of neural network

• Train a neural network with numpy


The history of neural network

1943: Warren McCulloch and Walter Pitts modeled a simple neural network with electrical circuits.

1957: Frank Rosenblatt invented the Perceptron.

1969: Marvin Minsky and Seymour Papert found that the single-layer Perceptron (a linear model) cannot solve the XOR problem.

1974: Paul Werbos proposed training neural networks with back-propagation of error, which made multilayer networks possible.

1986: David Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams reported this approach and made it famous.

1989: George Cybenko proved the universality of neural networks (the universal approximation theorem).

1989: Yann LeCun proposed the CNN.

1997: Jürgen Schmidhuber proposed the LSTM.

1999: Kernel SVMs worked better than neural networks on many tasks.

2006: Geoffrey Hinton found that deeper neural networks worked better and proposed greedy layer-wise pretraining to train them.

2012: Geoffrey Hinton's group proposed AlexNet.
Basic neural network
Perceptron

$z=\sum_{i=1}^{d} w_i x_i + b = \boldsymbol{w}^{T}\boldsymbol{x} + b$
$a = f(z)$

$x$: input
$w$: weight
$b$: bias
$a$: activation
$f(\cdot)$: activation function
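As a minimal numpy sketch of the perceptron forward pass (the function name perceptron_forward and the choice of tanh as f are illustrative, not from the slides):

import numpy as np

def perceptron_forward(x, w, b, f=np.tanh):
    # z = w^T x + b, then a = f(z)
    z = np.dot(w, x) + b
    return f(z)

x = np.random.randn(4)   # d = 4 inputs
w = np.random.randn(4)
a = perceptron_forward(x, w, 0.0)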
Basic neural network
Multi Layer Perceptron (MLP)

$z^{(l)}=\sum_{i=1}^{d} w_i^{(l)} a_i^{(l-1)} + b^{(l)}$
$a^{(l)} = f_l(z^{(l)})$

(The diagram shows inputs $x_1,\dots,x_d$ mapped through hidden layers to the output $y_s$.)
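A hedged numpy sketch of the MLP forward pass, looping the same two equations over layers; the layer sizes and the tanh activation are arbitrary example choices:

import numpy as np

def mlp_forward(x, weights, biases, f=np.tanh):
    # a^(0) = x; a^(l) = f(W^(l) a^(l-1) + b^(l)) for each layer l
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = f(z)
    return a

# example: a 4-3-2 network with random parameters
sizes = [4, 3, 2]
weights = [np.random.randn(m, n) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
y = mlp_forward(np.random.randn(4), weights, biases)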
Basic neural network
Activation function

Activation functions add nonlinearity to the network, overcoming the limited expressive power of a purely linear model.
Basic neural network
Activation function
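As a sketch, numpy versions of commonly used activation functions; the particular set (sigmoid, tanh, ReLU) is assumed from the functions named later in the deck:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # saturates for large |z|

def tanh(z):
    return np.tanh(z)                 # zero-centered, also saturates

def relu(z):
    return np.maximum(0.0, z)         # non-saturating for z > 0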
Optimization of neural network
Weight Initialization

Requirements:
(1) The activations of each layer should not saturate;
(2) The activations of each layer should not all be 0.

1. Equal value initialization
A typical neural network has a symmetric structure, so after the first error back-propagation the updated parameters within a layer are all identical; in every later update the identical parameters cannot learn to extract useful features. Deep learning models therefore never initialize all parameters to the same value.

2. Random value initialization
If the random values are too large the activations easily saturate; if they are too small the neurons are barely activated (see the sketch below).

https://zhuanlan.zhihu.com/p/40175178
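A small numpy experiment (not from the slides) illustrating the trade-off above for a sigmoid layer; the width 256 and the scales 5.0 and 0.01 are arbitrary:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal(256)

for scale in (5.0, 0.01):
    W = scale * rng.standard_normal((256, 256))
    a = sigmoid(W @ x)
    # large scale -> activations pile up near 0 or 1 (saturated);
    # tiny scale  -> activations stay near 0.5 (the layer barely does anything)
    print(scale, a.min(), a.max(), a.std())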
Optimization of neural network
Weight Initialization
3. Naive initialization
With inputs of mean 0 and variance 1, the outputs have mean 0 and variance n/3.

4. Xavier initialization
Xavier Glorot argued that a good initialization should keep the variance of the activations and of the gradients consistent across layers during propagation; this is called the Glorot condition.

Xavier initialization only applies to saturating activation functions such as sigmoid and tanh, not to non-saturating activation functions such as ReLU.
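A sketch of Xavier (Glorot) uniform initialization in numpy, using the standard variance 2/(fan_in + fan_out); the function and parameter names are illustrative:

import numpy as np

def xavier_uniform(fan_in, fan_out, rng=np.random.default_rng()):
    # Glorot condition: keep activation/gradient variance roughly constant,
    # giving Var(w) = 2 / (fan_in + fan_out) for this uniform range
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

W = xavier_uniform(784, 256)   # e.g. a 784 -> 256 tanh layer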
Optimization of neural network
Weight Initialization

5. Kaiming initialization
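The slide gives no formula, so the following is a sketch under the usual Kaiming (He) assumption for ReLU layers, Var(w) = 2/fan_in:

import numpy as np

def kaiming_normal(fan_in, fan_out, rng=np.random.default_rng()):
    # variance 2 / fan_in compensates for ReLU zeroing half the inputs
    std = np.sqrt(2.0 / fan_in)
    return rng.standard_normal((fan_out, fan_in)) * std

W = kaiming_normal(784, 256)   # e.g. a 784 -> 256 ReLU layer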
Optimization of neural network
Optimizer

• Gradient Descent: $w \leftarrow w - \eta \nabla_w J(w)$, using the gradient over the whole training set
• Stochastic Gradient Descent: $w \leftarrow w - \eta \nabla_w J(w; x_i, y_i)$, using one randomly chosen sample
• Mini-batch SGD: $w \leftarrow w - \eta \nabla_w J(w; X_{\text{batch}}, y_{\text{batch}})$, using a small batch of samples

$J(\cdot)$: loss function, $\eta$: learning rate
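A minimal mini-batch SGD loop in numpy, as a sketch; params, grad_fn, lr, and batch_size are placeholder names, and the gradient computation is assumed to be provided elsewhere:

import numpy as np

def minibatch_sgd(params, grad_fn, X, y, lr=0.01, batch_size=32, epochs=1,
                  rng=np.random.default_rng()):
    # params: list of numpy arrays; grad_fn(params, Xb, yb) -> list of gradients
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grads = grad_fn(params, X[idx], y[idx])
            for p, g in zip(params, grads):
                p -= lr * g   # in-place update: w <- w - eta * grad
    return params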
Optimization of neural network
Normalization

Batch normalization (BN) is applied between the convolution (or the affine transformation in an MLP) and the activation function.
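A sketch of the BN forward pass at training time (per-feature mean and variance over the batch); gamma, beta, and eps are the usual learnable scale, shift, and stability constant, assumed here rather than taken from the slides:

import numpy as np

def batchnorm_forward(z, gamma, beta, eps=1e-5):
    # z: (batch, features), normalized per feature before the activation
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    z_hat = (z - mean) / np.sqrt(var + eps)
    return gamma * z_hat + beta

z = np.random.randn(32, 64)
a = np.maximum(0.0, batchnorm_forward(z, np.ones(64), np.zeros(64)))  # BN, then ReLU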
Optimization of neural network
Overfitting and Underfitting

Underfitting: the model cannot achieve a small training error.

Overfitting: the training error of the model is much smaller than its error on the test dataset.

Influencing factors: model complexity, size of the training dataset, ...
Optimization of neural network
Weight Decay

Weight decay (L2 norm regularization):

$\tilde{J}(w; X, y) = \frac{\alpha}{2} w^T w + J(w; X, y)$

As α increases, the elements of the learned weight parameters are pushed closer to 0.
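Since the gradient of (α/2)·wᵀw is α·w, weight decay simply adds α·w to the loss gradient in each update; a sketch with illustrative names lr and alpha:

import numpy as np

def sgd_step_with_weight_decay(w, grad, lr=0.01, alpha=1e-4):
    # gradient of the regularized loss: dJ~/dw = dJ/dw + alpha * w
    return w - lr * (grad + alpha * w)

w = np.random.randn(10)
w = sgd_step_with_weight_decay(w, np.zeros(10))   # with zero loss gradient, w shrinks toward 0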
