MLP_1122_20240509_ch10_DeepNN
Sources:
Ch. 6, "Deep Learning" textbook by Goodfellow et al.
Ch. 10, Introduction to Artificial Neural Networks
Deep Neural Network
● Deep learning -> Deep neural network
● Deep feedforward networks, also often called feedforward neural networks or multilayer perceptrons (MLPs), are the quintessential deep learning models.
• The goal of a feedforward network is to approximate some function f∗.
• For example, for a classifier, y = f∗(x) maps an input x to a category y.
• A feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation.
• There are no feedback connections in which outputs of the
model are fed back into itself.
• When feedforward neural networks are extended to include
feedback connections, they are called recurrent neural
networks.
Biological Neuron and Perceptrons
Activation Functions
● Nonlinearity of neural network
● Binary step function
● Sigmoid function
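The two activation functions listed above can be sketched in NumPy as follows. This is a minimal illustration: the function names `binary_step` and `sigmoid` are chosen here, and placing the step threshold at 0 is an assumption rather than something the slides specify.

```python
import numpy as np

def binary_step(z):
    """Step activation: 1 where z >= 0, else 0 (threshold at 0 is an assumption)."""
    return (np.asarray(z, dtype=float) >= 0).astype(float)

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)): a smooth, differentiable squashing function."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

z = np.array([-2.0, 0.0, 2.0])
step_out = binary_step(z)   # jumps from 0 to 1 at the threshold
sig_out = sigmoid(z)        # rises smoothly through 0.5 at z = 0
```

The step function has zero gradient almost everywhere, which is one reason a smooth alternative such as the sigmoid is easier to train with gradient-based methods.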
Receptive Fields of Lateral Geniculate
and Primary Visual Cortex
http://www.geog.ucsb.edu/~kclarke
Human Cortical Visual Regions: V1, V2, V3, V4, V5 (MT)
http://raymond.rodriguez1.free.fr/Documents/Organisme-A/Vision
Hubel/Wiesel Architecture and Multi-layer Perceptrons
[Figure: layered network diagram with "depth" and "width" labels]
Multi-layer Perceptrons (MLP)
● When an ANN has two or more hidden layers, it is called a deep neural network (DNN).
Multi-layer Perceptrons
Backpropagation MLP
● Steps of the backpropagation algorithm for each training instance:
• Feed the training instance to the network to compute the output
of every neuron in each consecutive layer (this is the forward
pass, just like when making predictions).
• Measure the network’s output error (i.e., the difference between
the desired output and the actual output of the network).
• Compute how much each neuron in the last hidden layer
contributed to each output neuron’s error.
• Proceed to measure how much of these error contributions
came from each neuron in the previous hidden layer.
• Repeat the above two steps until the algorithm reaches the
input layer. This reverse pass efficiently measures the error
gradient across all the connection weights in the network by
propagating the error gradient backward in the network (hence
the name of the algorithm).
Backpropagation MLP
● For each training instance, the backpropagation algorithm:
1. First makes a prediction (forward pass).
2. Measures the error.
3. Goes through each layer in reverse to measure the error contribution from each connection (reverse pass).
4. Finally, slightly tweaks the connection weights to reduce the error (Gradient Descent step).
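The four steps above can be sketched for a single training instance in plain NumPy. This is an illustrative sketch under stated assumptions, not the course's implementation: the network shape (2 inputs, 3 sigmoid hidden units, 1 linear output), the squared-error loss, the learning rate, and all names are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: hidden activations h, then a linear output y_hat."""
    h = sigmoid(W1 @ x + b1)
    y_hat = W2 @ h + b2
    return h, y_hat

def squared_error(y_hat, y):
    return 0.5 * float(np.sum((y_hat - y) ** 2))

# Hypothetical tiny network and one training instance.
W1, b1 = rng.normal(scale=0.5, size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(scale=0.5, size=(1, 3)), np.zeros(1)
x, y = np.array([1.0, 0.0]), np.array([1.0])

# 1. Forward pass.
h, y_hat = forward(x, W1, b1, W2, b2)
# 2. Measure the error.
loss_before = squared_error(y_hat, y)
# 3. Reverse pass: propagate the error gradient from the output back to the input.
delta_out = y_hat - y                              # dLoss/dy_hat
grad_W2, grad_b2 = np.outer(delta_out, h), delta_out
delta_hidden = (W2.T @ delta_out) * h * (1.0 - h)  # chain rule through the sigmoid
grad_W1, grad_b1 = np.outer(delta_hidden, x), delta_hidden
# 4. Gradient Descent step (new names keep the original weights for comparison).
lr = 0.05
W1_new, b1_new = W1 - lr * grad_W1, b1 - lr * grad_b1
W2_new, b2_new = W2 - lr * grad_W2, b2 - lr * grad_b2

loss_after = squared_error(forward(x, W1_new, b1_new, W2_new, b2_new)[1], y)
```

With a small enough learning rate, `loss_after` should come out below `loss_before` for this instance; in practice the step is repeated over many instances or mini-batches.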
Reverse-mode autodiff
Backpropagation algorithm: Gradient Descent using reverse-mode autodiff (implemented in TensorFlow)
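To make the idea of reverse-mode autodiff concrete, here is a minimal scalar sketch in plain Python. It is not TensorFlow's implementation or API: the `Var` class and `backward` function are hypothetical helpers written for this illustration, and only addition and multiplication are supported.

```python
class Var:
    """Scalar node in a computation graph (hypothetical helper, not a TensorFlow API)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):  # depth-first post-order gives a topological ordering
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # one reverse sweep, output first
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule


x, y = Var(3.0), Var(4.0)
f = x * x * y + y   # f(x, y) = x^2 * y + y
backward(f)
# f.value == 40.0, x.grad == 24.0 (= 2xy), y.grad == 10.0 (= x^2 + 1)
```

A single reverse sweep yields the derivative of the output with respect to every input, which is why this mode suits networks with many weights and a scalar cost.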
6.1 A simple example: learning XOR
● Data:
● Target function:
● Linear model:
● MSE loss function:
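Assuming the setup follows Ch. 6 of the cited textbook (the four binary input points, a linear model f(x; w, b) = xᵀw + b, and an MSE loss averaged over those four points), the following NumPy sketch fits the linear model by least squares. Variable names are illustrative.

```python
import numpy as np

# The four XOR inputs and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the bias b is fitted together with w.
A = np.hstack([X, np.ones((4, 1))])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = params[:2], params[2]

pred = A @ params               # linear model output on the four points
mse = np.mean((pred - y) ** 2)  # MSE of the best linear fit
```

The least-squares solution is w = 0 and b = 0.5, so the linear model outputs 0.5 for every input and cannot represent XOR. This is what motivates adding a nonlinearity.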
A simple example: learning XOR
● Let there be nonlinearity!
● ReLU: rectified linear unit:
● ReLU is applied element-wise to h:
A simple example: learning XOR
● Use one hidden layer containing two hidden units to learn Φ.
A simple example: learning XOR
● Complete neural network model:
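As a sketch of the complete model, assume the form used in Ch. 6 of the cited textbook, f(x; W, c, w, b) = wᵀ max{0, Wᵀx + c} + b, together with the hand-picked solution given there. The parameter values below come from that textbook example and are not learned here.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # one input per row

# Hand-specified parameters (textbook solution, assumed here).
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])   # input-to-hidden weights
c = np.array([0.0, -1.0])    # hidden biases
w = np.array([1.0, -2.0])    # hidden-to-output weights
b = 0.0                      # output bias

H = np.maximum(0.0, X @ W + c)  # each row is h = ReLU(W^T x + c)
y_hat = H @ w + b               # network output for the four inputs
```

With these values the hidden layer maps the inputs to rows [0, 0], [1, 0], [1, 0], [2, 1], which the linear output layer can separate, giving outputs 0, 1, 1, 0.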
Major components for ANN
● Architectures
• Layer/neuron numbers, feedforward/backpropagation
● Cost functions
• MSE, cross-entropy
● Algorithms for updating parameters
• Gradient descent
● Activation functions for output/hidden layers
Activation functions
A modern MLP (including ReLU and softmax) for classification
Output units
● Linear Units for Gaussian Output Distributions
• Maximizing the log-likelihood is then equivalent to minimizing the mean squared error.
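The equivalence stated above can be seen in a short derivation. This sketch assumes a scalar output, a fixed variance σ², and m independent training examples; the notation ŷ = f(x; θ) for the linear unit's output is a labeling chosen here.

```latex
\begin{aligned}
p(y \mid x) &= \mathcal{N}\!\left(y;\, \hat{y},\, \sigma^2\right),
  \qquad \hat{y} = f(x; \theta) \\
\sum_{i=1}^{m} \log p\!\left(y^{(i)} \mid x^{(i)}\right)
  &= -\frac{1}{2\sigma^2} \sum_{i=1}^{m} \left(y^{(i)} - \hat{y}^{(i)}\right)^2
     - \frac{m}{2} \log\!\left(2\pi\sigma^2\right) \\
\arg\max_{\theta} \sum_{i=1}^{m} \log p\!\left(y^{(i)} \mid x^{(i)}\right)
  &= \arg\min_{\theta} \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - \hat{y}^{(i)}\right)^2
\end{aligned}
```

The second term does not depend on θ and the factor 1/(2σ²) is a positive constant, so only the sum of squared errors affects the maximizer.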
Output units
● Softmax Units for Multinoulli Output Distributions
• Used as the output of a classifier for n classes
• Maximize log-likelihood:
(Annotations on the two terms of the log-likelihood, left to right: "the larger the better" and "the smaller the better")
• An output saturates to 1 when the corresponding input z_i is maximal and much greater than all other inputs.
• An output can also saturate to 0 when z_i is not maximal and the maximum is much greater.
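Assuming the log-likelihood meant here is the one from Ch. 6 of the cited textbook, log softmax(z)_i = z_i − log Σ_j exp(z_j), the sketch below computes it in NumPy and shows the saturation behaviour described above. The function names and example inputs are illustrative, and subtracting max(z) is a common numerical-stability trick rather than something stated on the slide.

```python
import numpy as np

def log_softmax(z):
    """log softmax(z)_i = z_i - log(sum_j exp(z_j)), computed stably."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)  # leaves the result unchanged, avoids overflow in exp
    return shifted - np.log(np.sum(np.exp(shifted)))

def softmax(z):
    return np.exp(log_softmax(z))

p_balanced = softmax([1.0, 2.0, 3.0])    # no input dominates
p_saturated = softmax([0.0, 0.0, 20.0])  # last input is much greater than the rest
```

In `p_saturated` the last output is very close to 1 and the others are very close to 0, matching the two saturation cases listed above. Maximizing the log-likelihood pushes z_i up (first term) and the log-sum-exp of all inputs down (second term).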
Output units
• Softmax is a way to create a form of competition between the units that participate in it.
• From a neuroscience point of view, lateral inhibition is believed to exist between nearby neurons; at the extreme, it becomes a form of winner-take-all.
Hidden Units
● Design of hidden units is an extremely active area of research.
● ReLUs are an excellent default choice.
● Although ReLU is not differentiable at every point (it has a kink at z = 0), it is still okay to use with gradient-based learning algorithms.
• Use the left or right derivative at that point instead.
● Hidden units compute:
• An affine transformation
• An element-wise nonlinear function g(z)
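The two-stage computation of a hidden layer, and the derivative convention at the kink, can be sketched as follows. It assumes the affine form z = Wᵀx + b from the cited textbook; the function names, the example weights, and the choice of the left derivative (0) at z = 0 are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    """Derivative of ReLU; at z == 0 this returns the left derivative, 0.
    Returning the right derivative, 1, would be an equally valid choice."""
    return (np.asarray(z, dtype=float) > 0).astype(float)

def hidden_layer(x, W, b):
    """Affine transformation z = W^T x + b followed by element-wise g(z) = ReLU(z)."""
    z = W.T @ x + b
    return relu(z), z

# Hypothetical example values.
W = np.array([[1.0, -1.0],
              [2.0, 0.5]])
b = np.array([0.0, -1.0])
x = np.array([1.0, 1.0])

h, z = hidden_layer(x, W, b)   # z = [3.0, -1.5], h = [3.0, 0.0]
dh_dz = relu_grad(z)           # [1.0, 0.0]
```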
Generalizations of ReLUs: Maxout units
● Maxout units:
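As described in Ch. 6 of the cited textbook, a maxout unit divides z into groups of k values and outputs the maximum of its group. The sketch below assumes the groups are consecutive blocks of z, which is a simplification made here; the function name is illustrative.

```python
import numpy as np

def maxout(z, k):
    """Split z into consecutive groups of k values and return each group's maximum."""
    z = np.asarray(z, dtype=float)
    if z.size % k != 0:
        raise ValueError("length of z must be a multiple of k")
    return z.reshape(-1, k).max(axis=1)

# Six pre-activations, groups of two: [1, -2], [0.5, 3], [-1, -4].
out = maxout([1.0, -2.0, 0.5, 3.0, -1.0, -4.0], k=2)   # [1.0, 3.0, -1.0]
```

With k = 2 and one value in each group fixed at 0, this reduces to ReLU, which is the sense in which maxout generalizes it.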
Other hidden units
● Many other types of hidden units are possible, but are used less frequently. In general, a wide variety of differentiable functions perform perfectly well.
• Radial basis function
• Softplus:
• Hard tanh
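The three units listed above can be sketched in NumPy. The formulas are assumed to be the ones from Ch. 6 of the cited textbook: softplus(a) = log(1 + eᵃ), hard tanh(a) = max(−1, min(1, a)), and an RBF unit h_i = exp(−‖c_i − x‖² / σ²). The function names, and the use of a single shared σ, are simplifications made here.

```python
import numpy as np

def softplus(a):
    """log(1 + exp(a)), a smooth version of ReLU, computed stably."""
    return np.logaddexp(0.0, a)

def hard_tanh(a):
    """max(-1, min(1, a)): linear in [-1, 1], clipped outside."""
    return np.clip(a, -1.0, 1.0)

def rbf(x, centers, sigma):
    """RBF units: one activation per row of `centers`, largest when x is at the center."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-sq_dist / sigma ** 2)

a = np.array([-3.0, 0.2, 5.0])
sp = softplus(a)
ht = hard_tanh(a)                                   # [-1.0, 0.2, 1.0]
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
r = rbf(np.array([0.0, 0.0]), centers, sigma=1.0)   # first unit fires fully
```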
Architecture Design
● Architecture: overall structure of the network
• How many units it should have
• How these units should be connected to each other
• How to choose the depth of the network and the width of each layer
● Deeper networks often:
• Use far fewer units per layer and far fewer parameters
• Generalize to the test set
• Are harder to optimize
Implementing MLPs with Keras
● Keras is a high-level Deep Learning API that allows you to easily build, train, evaluate and execute all sorts of neural networks.
● Its documentation (or specification) is available at https://keras.io.
Workflow: Using Keras to load the dataset -> Creating the model using the Sequential API -> Compiling the model -> Training and evaluating the model -> Using the model to make predictions
Practice: Building an Image Classifier
Using the Sequential API
1. Using Keras to load the dataset
2. Creating the model using the Sequential API
Method 1: adding layers one by one
You can get a model’s list of layers using the layers attribute, or use the get_layer() method to access a layer by name.
We’ll discuss initializers further in Chapter 11, and the full list is at https://keras.io/api/layers/initializers.
3. Compile the model
4. Training and evaluating the model
The fit() method returns a History object containing the training parameters (history.params), the list of epochs it went through (history.epoch), and most importantly a dictionary (history.history) containing the loss and extra metrics it measured at the end of each epoch on the training set and on the validation set (if any).
5. Using the model to make predictions
More examples using API
● Building an Image Classifier Using the Sequential API
● Building a Regression MLP Using the Sequential API
● Building Complex Models Using the Functional API
● Using the Subclassing API to Build Dynamic Models
● Saving and Restoring a Model
Better Generalization with Greater Depth
● Empirical results show that deeper networks generalize better when used to transcribe multi-digit numbers from photographs of addresses.
Large, Shallow Models Overfit More
■ Deeper models tend to perform better.
Back-Propagation Algorithm
● When we use a feedforward neural network to accept an input x and produce an output ŷ, information flows forward through the network.
• Forward propagation: The inputs x provide the initial information that then propagates up to the hidden units at each layer and finally produces ŷ.
• During training, forward propagation can continue onward until it produces a scalar cost J(θ).
● Back-propagation algorithm: allows the information from the cost to then flow backwards through the network in order to compute the gradient.
• Back-propagation only computes the gradient; another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient.
Back-Propagation
● During inference:
● During training:
● Backpropagation: compute gradients