Part 12
Visual Information Interpretation: Basics on deep learning I
Ji Hui
Ji Hui (National University of Singapore) Visual Information Interpretation: Basics on deep learning I October 25, 2021 1 / 37
Neural network based learning
Inspiration from brains composed of neurons at a low level
A neuron receives input from other neurons (generally thousands) through its synapses
Inputs are approximately summed
When the input exceeds a threshold, the neuron sends an electrical spike that travels from the cell body, down the axon, to the next neuron(s)
Feedforward network for MNIST
(Figure: a feedforward network for MNIST images)
Neural network
f^(1) is called the first layer of the NN, the input layer
f^(2) is called the second layer
f^(3) is called the output layer
These chain structures are the most commonly used structures of neural networks.
The length of the chain is the depth of the model, e.g., depth = 3 in the example above.
Deep learning means the depth of the NN is very large.
Deep neural network
A deep network is a multi-layer NN with MANY layers
A network with depth 2: one hidden layer
Results:
A one-layer perceptron (depth = 2) can perform binary classification.
A one-layer perceptron (depth = 2) cannot solve XOR (i.e., exclusive or).
Multi-layer perceptrons can solve XOR.
Challenges:
How to train a multi-layer perceptron?
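The XOR claim above can be checked directly with hand-picked weights (illustrative, not from the slides): two threshold units feeding a third implement XOR, which no single threshold unit can.

```python
import numpy as np

def step(a):
    """Threshold (Heaviside) activation: fires iff the input is positive."""
    return (a > 0).astype(int)

# Hidden layer: one unit computing OR, one computing AND of the two inputs.
W1 = np.array([[1.0, 1.0],    # OR unit
               [1.0, 1.0]])   # AND unit
b1 = np.array([-0.5, -1.5])   # OR fires if sum > 0.5, AND if sum > 1.5

# Output unit computes OR AND NOT(AND), i.e. XOR.
w2 = np.array([1.0, -2.0])
b2 = -0.5

def mlp_xor(x):
    h = step(x @ W1.T + b1)   # hidden activations
    return step(h @ w2 + b2)  # output

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(mlp_xor(X))  # [0 1 1 0]
```

The hidden layer re-represents the inputs so that the classes become linearly separable, which is exactly what a single perceptron cannot do on its own.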
A brief history of neural network
The thaw of NN learning
A brief history of neural network
Deep learning renaissance
Key factors in the popularity of deep learning
Learning in NN
Definition
Let f(·) denote the unknown true mapping s.t. y = f(x), and h(·, w) denote the function representing an NN with weights w.
Learning a NN
Using the training data {(x^(k), y^(k) = f(x^(k)))}, adjust the parameters w so that h(·, w) approximates f.
Perceptron + Threshold Logic Unit (TLU)
A simple NN with perceptron and one TLU
$y = h(x, w) = \sigma\Big(\sum_k x_k w_k + w_0\Big)$
Decision Boundary
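A minimal sketch of the perceptron above, assuming the TLU activation σ is a step function; the weights below are illustrative. The decision boundary is the hyperplane where the weighted sum equals zero.

```python
import numpy as np

def tlu(x, w, w0):
    """Perceptron with a threshold logic unit: output 1 iff w.x + w0 > 0."""
    a = np.dot(x, w) + w0
    return 1 if a > 0 else 0

# Decision boundary is the hyperplane w.x + w0 = 0, here the line x1 + x2 = 1.5.
w, w0 = np.array([1.0, 1.0]), -1.5
print(tlu(np.array([1.0, 1.0]), w, w0))  # 1 (above the line)
print(tlu(np.array([0.0, 1.0]), w, w0))  # 0 (below the line)
```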
Other activation functions
Sigmoid activation function
$\sigma(a) = \mathrm{sigm}(a) = \dfrac{1}{1 + \exp(-a)}$
(Figure: plots of the Sigmoid and ReLU activations)
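Both activations can be implemented directly; a short NumPy sketch:

```python
import numpy as np

def sigmoid(a):
    """sigm(a) = 1 / (1 + exp(-a)); squashes any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):
    """ReLU(a) = max(0, a); the piecewise-linear alternative."""
    return np.maximum(0.0, a)

a = np.array([-2.0, 0.0, 2.0])
print(sigmoid(a))  # approximately [0.119 0.5 0.881]
print(relu(a))     # [0. 0. 2.]
```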
Simple NN with one hidden layer
The more nodes in the hidden layer, the more complex the transform the NN can model
Binary decision boundary of simple NN
NNs with different numbers of nodes in the hidden layer; the activation is ReLU.
Multi-layer neural network
Binary decision boundary of multi-layer NN
Illustration of deep learning for classification
Definition (classifier)
A classifier f is a mapping that accepts a set of features and produces a class label for them:
$f(x \in \mathbb{R}^3;\, w) \to y \in \{-1, 1\}$.
Demo of training in NN
Iteratively refine the parameters w of the NN to minimize the classification error
$\sum_i \#\big(f(x_i, w) \ne y_i\big)$
(Figure: network outputs on the training samples; with the initial weights, Error = 0.7. Adjust parameters based on classification errors; afterwards, Error = 0.)
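The demo's loop, adjusting w only on misclassified samples until the error reaches 0, can be sketched with the classical perceptron rule; the toy data below is illustrative, not the slide's.

```python
import numpy as np

# Toy linearly separable data, labels in {0, 1}.
X = np.array([[2.7, 1.9], [1.4, 0.2], [3.1, 2.5], [0.7, 0.9]])
y = np.array([1, 0, 1, 0])

w = np.zeros(2)
w0 = 0.0
for epoch in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + w0 > 0 else 0
        if pred != yi:                 # adjust only on misclassification
            w += (yi - pred) * xi
            w0 += (yi - pred)
            errors += 1
    if errors == 0:                    # Error = 0: training done
        break

preds = [1 if xi @ w + w0 > 0 else 0 for xi in X]
print(preds)  # [1, 0, 1, 0], matching y
```

On separable data the perceptron rule is guaranteed to reach zero error in finitely many updates; multi-layer networks need gradient-based training instead, which the later slides develop.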
Difference between deep NN and shallow NN
A shallow NN with one hidden layer
$H(\cdot) = v_1 h(\cdot, w_1) + v_2 h(\cdot, w_2) + v_3 h(\cdot, w_3) + \cdots$
A summation based approximation scheme
A deep NN with many hidden layers
$H(x) = h(h(h(\cdots;\, w_{n-2});\, w_{n-1});\, w_n)$
A variable-substitution-based approximation scheme
The question: what is variable substitution doing?
It rapidly changes the level sets of the function:
$\Omega_{h(\cdot)=0} = \{x \in \mathbb{R}^N : h(x) = 0\}$
Example of variable substitution
Consider a data set $\{(x_i, y_i)\}$ where $y_i \in \{0, 1\}$. A binary classifier can be defined from a score function f:
$c(x) = \begin{cases} 1 & f(x) > 1/2 \\ 0 & f(x) < 1/2. \end{cases}$
Consider a function h defined by a ReLU network:
$h(x) = \begin{cases} 2x & 0 \le x \le 1/2 \\ 2 - 2x & 1/2 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$
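A sketch of why this h matters: composing the tent function with itself, as a depth-n ReLU network can do with a fixed number of units per layer, doubles the number of level-set crossings at every layer, so the level sets become exponentially complex in the depth.

```python
def h(x):
    """Tent function realizable by a tiny ReLU network:
    2x on [0, 1/2], 2 - 2x on [1/2, 1], 0 outside."""
    if 0 <= x <= 0.5:
        return 2 * x
    if 0.5 < x <= 1:
        return 2 - 2 * x
    return 0.0

def compose(n, x):
    """n-fold composition h(h(...h(x)...)): a depth-n network."""
    for _ in range(n):
        x = h(x)
    return x

# h crosses the level 1/2 twice; h(h(x)) crosses it 4 times; the n-fold
# composition crosses it 2^n times, so the level set {x : compose(n,x) = 1/2}
# grows exponentially with depth.
crossings = sum(1 for k in range(1, 1000)
                if (compose(3, (k - 1) / 1000) - 0.5)
                   * (compose(3, k / 1000) - 0.5) < 0)
print(crossings)  # 8 crossings for depth 3
```

A shallow (one-hidden-layer) network would need a number of units growing with the number of oscillations to represent the same function, which is the efficiency argument made on the next slide.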
Deep NN vs. wide NN
Expression capability
Both NNs can do the job.
A wide NN is preferred, as it is easier to optimize owing to the separability of its node parameters.
Expression efficiency
For functions with simple level sets, a deep NN has no advantage.
For functions with complex level sets, a deep NN is preferred owing to its efficiency at rapidly changing level sets.
Conclusion: a deep NN is for applications where the target transform satisfies:
the accuracy of its level sets is very important
the structure of its level sets is complex
Sample applications
Multi-label function in classification
Value evaluation function in AI for games.
Pattern analysis in natural language processing
Outline of NN training
Prediction: $\hat{y} = f(x_n; w)$
Loss function: $L(\hat{y}, y) \in \mathbb{R}$
Define the goal:
$w^{*} = \operatorname*{argmin}_{w} \sum_{n=1}^{N} L\big(f(x_n; w),\, y_n\big)$
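A minimal sketch of this objective, using a linear model and squared loss as a stand-in for the NN (data and learning rate are illustrative): gradient descent drives the summed loss toward its minimizer.

```python
import numpy as np

# Minimize sum_n L(f(x_n; w), y_n) by gradient descent,
# with f(x; w) = w.x and squared loss standing in for the NN.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                    # noiseless targets for the sketch

w = np.zeros(3)
lr = 0.01
for step in range(1000):
    residual = X @ w - y          # f(x_n; w) - y_n for all n
    grad = X.T @ residual         # gradient of 0.5 * sum of squared losses
    w -= lr * grad

print(np.round(w, 3))  # close to w_true = [1.0, -2.0, 0.5]
```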
Cost function
Consider a network denoted by f (x; w)
Gaussian error model: assume the prediction error follows a normal distribution:
$p_{\text{model}}(y \mid x) = \mathcal{N}\big(y \mid f(x; w),\, I\big)$
The likelihood function is then
$J(w) = \prod_{n=1}^{N} (2\pi)^{-m/2} \exp\Big(-\tfrac{1}{2}\,\|y_n - f(x_n; w)\|_2^2\Big)$
Maximizing it is equivalent to minimizing the squared error:
$\min_w\; \mathbb{E}_{(x,y)\sim P_{\text{data}}}\, \tfrac{1}{2}\|y - f(x; w)\|_2^2 \;\approx\; \min_w\; \tfrac{1}{2}\sum_{n=1}^{N} \|y_n - f(x_n; w)\|_2^2$
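The equivalence can be checked numerically (data below is illustrative): under the Gaussian model, the negative log-likelihood per sample is the squared error plus a constant, so the two objectives have the same minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 10, 4
y = rng.normal(size=(N, m))
pred = rng.normal(size=(N, m))      # stand-in for f(x_n; w)

# Summed squared-error objective.
sq_err = 0.5 * np.sum((y - pred) ** 2)

# Negative log-likelihood of N(y | pred, I), summed over samples:
# each term is (m/2) log(2*pi) + 0.5 * ||y_n - pred_n||^2.
nll = np.sum(0.5 * m * np.log(2 * np.pi)
             + 0.5 * np.sum((y - pred) ** 2, axis=1))

# The two differ only by a w-independent constant.
print(np.isclose(nll - N * 0.5 * m * np.log(2 * np.pi), sq_err))  # True
```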
Cross-entropy based cost function
Consider two distributions p(x) and q(x); their cross-entropy is defined as
$H(p, q) = -\sum_x p(x) \ln q(x).$
For a two-class problem $y \in \{0, 1\}$ (logistic regression), the MLE minimizes the negative log-likelihood
$J(\theta) = -\sum_{k=1}^{N} \big(y_k^{*} \ln y_k + (1 - y_k^{*}) \ln(1 - y_k)\big),$
where $y_k^{*}$ denotes the true label of $x_k$ and $y_k = \sigma(\omega^{\top} x_k)$ denotes the prediction.
Furthermore, there is an interesting connection of cross-entropy to the K-L divergence:
$H(p, q) = -\sum_x p(x) \ln q(x) = \sum_x p(x) \ln \frac{p(x)}{q(x)} + H(p, p)$
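Both facts can be checked numerically; the weights and distributions below are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def binary_cross_entropy(y_true, y_pred):
    """J = -sum( y* ln y + (1 - y*) ln(1 - y) ): the logistic-regression NLL."""
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Logistic-regression loss on illustrative data.
x = np.array([[1.0, 2.0], [2.0, -1.0], [-1.5, 0.5]])
omega = np.array([0.8, -0.3])
y_true = np.array([1.0, 1.0, 0.0])
loss = binary_cross_entropy(y_true, sigmoid(x @ omega))

# Cross-entropy decomposes as KL divergence plus entropy: H(p,q) = KL(p||q) + H(p,p).
p = np.array([0.7, 0.3])
q = np.array([0.4, 0.6])
H_pq = -np.sum(p * np.log(q))
kl = np.sum(p * np.log(p / q))
H_pp = -np.sum(p * np.log(p))
print(np.isclose(H_pq, kl + H_pp))  # True
```

Since H(p, p) does not depend on q, minimizing the cross-entropy in q is the same as minimizing the KL divergence to p.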
Standard ML Training vs NN Training
The nonlinearity of neural networks causes interesting loss functions to be non-convex
Learning vs Pure optimization
The cost function is a summation of multiple terms with the same functional form.
Training with gradient descent
Is differentiability necessary?
How to calculate the gradient: backpropagation via the chain rule
Chain rule
Given y = g(u) and u = h(x),
$\frac{dy_i}{dx_k} = \sum_{j=1}^{J} \frac{dy_i}{du_j}\,\frac{du_j}{dx_k}, \quad \forall\, i, k.$
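The chain rule above is what backpropagation computes layer by layer. A sketch for a tiny ReLU network (weights illustrative), comparing the analytic gradient of the output with respect to the last-layer weights against finite differences:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, w2):
    """y = sigmoid(w2 . relu(W1 @ x)); returns output plus intermediates."""
    u = W1 @ x
    h = np.maximum(0.0, u)      # ReLU hidden layer
    return sigmoid(w2 @ h), u, h

def grad_w2(x, W1, w2):
    """Chain rule: dy/dw2_j = (dy/da) * (da/dw2_j) with a = w2 . h."""
    y, u, h = forward(x, W1, w2)
    return y * (1.0 - y) * h    # sigmoid'(a) = y(1 - y); da/dw2_j = h_j

x = np.array([1.0, 0.5])
W1 = np.array([[0.3, 0.7], [-0.2, 0.5]])
w2 = np.array([1.2, -0.8])

g = grad_w2(x, W1, w2)

# Finite-difference check of the backpropagated gradient.
eps = 1e-6
g_fd = np.zeros_like(w2)
for j in range(len(w2)):
    wp, wm = w2.copy(), w2.copy()
    wp[j] += eps
    wm[j] -= eps
    g_fd[j] = (forward(x, W1, wp)[0] - forward(x, W1, wm)[0]) / (2 * eps)

print(np.allclose(g, g_fd, atol=1e-6))  # True
```

Backpropagation extends the same recursion to the earlier weights W1 by reusing dy/dh, which is why it costs only one backward pass per forward pass.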
Demonstration on NN
Consider the following network
Illustration of backpropagation on NN