Artificial Neural Network
[Figure: a biological neuron compared with an artificial neuron (node), with connection weights and a summation unit Σ; an example network with hidden layers 1–3]

• Applications include, for example, image analysis and control
Activation function

• Typical neuron output is

   o = f( Σ_i w_i x_i − θ )

• The threshold θ is used for a bias effect
• f is the activation function, w_i are the connection weights, x_i are the inputs and θ is the threshold
• The activation function f can take different forms:

   sigmoid(x) = 1 / (1 + e^(−x))

   signum(x) = 1 if x > 0, 0 if x = 0, −1 if x < 0

   step(x) = 1 if x > 0, 0 otherwise
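As an illustration, the activation functions above can be written out directly in code. This is a minimal sketch; the function and variable names are our own:

```python
import math

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def signum(x):
    # 1 if x > 0, 0 if x == 0, -1 if x < 0
    return 1 if x > 0 else (0 if x == 0 else -1)

def step(x):
    # 1 if x > 0, 0 otherwise
    return 1 if x > 0 else 0

def neuron_output(weights, inputs, theta, f=sigmoid):
    # o = f( sum_i w_i * x_i - theta )
    total = sum(w * x for w, x in zip(weights, inputs)) - theta
    return f(total)
```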
Learning

• Learning is used to update the weights w
• Neural networks can be classified by the algorithm used for learning
• The three most used learning mechanisms are
  – Supervised
  – Unsupervised
  – Reinforced

Supervised learning

• The network is provided with input data for which the corresponding output data is known
• During learning the output data of the network is compared to the desired output
• Different learning rules are used to adjust the weights w to obtain the desired output

Unsupervised learning

• Does not require desired outputs
• Explores underlying structure or correlations in the data and organizes patterns into categories
• An example is the SOM

Reinforced learning

• Mimics the way humans learn when interacting with the physical environment
• The network is presented with inputs, but not with desired outputs
• If the network delivers the desired output, the connections leading to it are strengthened, otherwise weakened
• Can be used for control
Fundamentals of connectionist modeling

• In this section we will get familiar with some of the earlier developments in the field

McCulloch–Pitts Model

• Output: o = f( Σ_i w_i x_i − θ ), with bias input θ
Perceptron

• Output: o = f( Σ_i w_i x_i − θ ), with bias input θ
• Was developed for pattern classification of linearly separable sets
• As long as the patterns are linearly separable, the learning algorithm should converge

[Figure: perceptron with inputs x1 … xl, weights w1 … wl, a summation unit Σ producing the output, and a learning mechanism (Hebbian rule) driven by the difference between the output and the target output]
Perceptron learning

3. Compute the output

   o = f( Σ_{i=1}^{l} w_i x_i − θ )

• The decision boundary is Σ_i w_i x_i − θ = 0; with two inputs, w_1 x_1 + w_2 x_2 − θ = 0
• The weights are adjusted with the Hebbian rule

   Δw_i = η ( t − f( Σ_i w_i x_i − θ ) ) x_i

   where t is the target output
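The perceptron rule above can be sketched in a few lines. The AND data set and all numerical choices (learning rate, zero initial weights, epoch count) are our own; step is used as the activation function:

```python
# Perceptron learning sketch: delta_w_i = eta * (t - o) * x_i.
# Data set (logical AND, which is linearly separable) and eta are our choices.
def train_perceptron(data, eta=0.2, epochs=50):
    w = [0.0, 0.0]   # connection weights
    theta = 0.0      # threshold
    for _ in range(epochs):
        for x, t in data:
            o = 1 if (w[0]*x[0] + w[1]*x[1] - theta) > 0 else 0  # step activation
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
            theta -= eta * (t - o)   # bias input is -1, so theta moves oppositely
    return w, theta

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, theta = train_perceptron(AND)

def predict(x):
    return 1 if (w[0]*x[0] + w[1]*x[1] - theta) > 0 else 0
```

Since AND is linearly separable, the convergence statement above applies and the loop settles on a separating weight vector.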
Adaline

[Figure: adaline with inputs x1 … xl, weights w1 … wl, a summation unit Σ producing the output, and an LMS learning mechanism driven by the difference between the output and the target output]

• The learning rule is formally derived using the gradient descent algorithm
• Adjust the weights by incrementing them by an amount proportional to the gradient of the cumulative error of the network:

   Δw_i = η ( t − ( Σ_i w_i x_i − θ ) ) x_i

   where η is a positive number ranging from 0 to 1, representing the learning rate

5. If the weights do not reach steady-state values, go to 2.
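The LMS rule differs from the perceptron rule in that the error uses the linear sum itself rather than a thresholded output. A minimal sketch; the noiseless linear target t = 2·x1 − x2 + 0.5, the grid data and η are our own choices:

```python
# Adaline / LMS sketch: delta_w_i = eta * (t - (sum_i w_i*x_i - theta)) * x_i.
def train_adaline(data, eta=0.1, epochs=500):
    w = [0.0, 0.0]
    theta = 0.0
    for _ in range(epochs):
        for x, t in data:
            y = w[0]*x[0] + w[1]*x[1] - theta   # linear output, no threshold function
            err = t - y
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            theta -= eta * err                   # bias input is -1
    return w, theta

# Noiseless linear target (our choice): t = 2*x1 - x2 + 0.5
data = [((x1, x2), 2*x1 - x2 + 0.5)
        for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
w, theta = train_adaline(data)
```

Because the data is consistent with a linear model, the weights converge to the generating values (w ≈ [2, −1], θ ≈ −0.5).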
• Perceptron and Adaline suffer from their inability to train patterns belonging to nonlinearly separable spaces, for example XOR

[Figure: evolution of the parameters w1, w2 and θ during training]

Madaline

• A number of Adaline units in parallel
Madaline

• Build a system of perceptrons that solves the XOR problem. Use sign as the activation function.

   x1  x2 | target
    0   0 |  −1
    0   1 |   1
    1   0 |   1
    1   1 |  −1

Major classes of modern neural networks

• The multilayer perceptron
• Radial basis function networks
• Kohonen’s self-organizing network
• Recurrent neural networks

The multilayer perceptron

• The MLP consists of an input layer, one or more hidden layers and an output layer
• The number of hidden layers depends on the task

[Figure: MLP with an input layer, hidden layers of Σ nodes and an output layer]
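One possible hand-built solution to the XOR exercise above, using sign as the activation function. The particular weights and thresholds are our own choice, not the only valid one:

```python
# Madaline-style XOR with sign activations: one OR-like and one AND-like
# hidden unit, combined so the output fires only for "OR and not AND".
def sign(x):
    return 1 if x > 0 else -1

def xor_net(x1, x2):
    h1 = sign(x1 + x2 - 0.5)    # fires when at least one input is on (OR)
    h2 = sign(x1 + x2 - 1.5)    # fires only when both inputs are on (AND)
    return sign(h1 - h2 - 0.5)  # OR and not AND -> XOR
```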
Backpropagation learning

• Let’s test the on-line backpropagation equations for a simple case with one hidden layer with 3 sigmoids and a linear node in the output layer:

   o_out = −w7 + w8 o1 + w9 o2 + w10 o3
   o1 = 1/(1 + exp(−tot1)),  tot1 = −w1 + w2 IN
   o2 = 1/(1 + exp(−tot2)),  tot2 = −w3 + w4 IN
   o3 = 1/(1 + exp(−tot3)),  tot3 = −w5 + w6 IN

• If the weight is connected to the output layer we get the following update rule

   w_new,i = w_old,i + η (t − o_out) o_j
   w_new,bias = w_old,bias − η (t − o_out)

• If the weight is connected to a node in the hidden layer we get

   w_new,j = w_old,j + η (t − o_out) w_old,i o_j (1 − o_j) IN

[Figure: output of the trained network as a function of IN]
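The update equations above can be exercised numerically on the 1-3-1 network itself. The training target (t = IN²), the learning rate and the random initialization are our own choices:

```python
import math, random

# On-line backpropagation for the 1-3-1 network:
#   o_out = -w7 + w8*o1 + w9*o2 + w10*o3,  o_j = sigmoid(-w_{2j-1} + w_{2j}*IN)
# w[0..9] stores w1..w10. Target function and eta are our choices.
random.seed(0)
w = [random.uniform(-0.5, 0.5) for _ in range(10)]
eta = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(IN):
    o = [sigmoid(-w[2*j] + w[2*j + 1] * IN) for j in range(3)]  # o1, o2, o3
    o_out = -w[6] + sum(w[7 + j] * o[j] for j in range(3))
    return o, o_out

def train_step(IN, t):
    o, o_out = forward(IN)
    e = t - o_out
    for j in range(3):
        w_old_out = w[7 + j]          # hidden-layer rule uses the OLD output weight
        w[2*j + 1] += eta * e * w_old_out * o[j] * (1 - o[j]) * IN  # input weight
        w[2*j]     -= eta * e * w_old_out * o[j] * (1 - o[j])       # hidden bias (-1 input)
        w[7 + j]   += eta * e * o[j]                                # output weight
    w[6] -= eta * e                                                 # output bias

data = [(x / 10.0, (x / 10.0) ** 2) for x in range(-10, 11)]
err0 = sum((t - forward(IN)[1]) ** 2 for IN, t in data)
for _ in range(1000):
    for IN, t in data:
        train_step(IN, t)
err1 = sum((t - forward(IN)[1]) ** 2 for IN, t in data)
```

After training, the squared error over the data set has dropped well below its initial value.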
Improvements of Backpropagation and other methods

• Note that the derivative of the sigmoid function f(tot) is simply

   df(tot)/dtot = f(tot) (1 − f(tot)),  where f(tot) = 1 / (1 + exp(−tot))

• Show this fact!
• There are many variants of backpropagation that try to improve the basic algorithm
• One such variant is BP with momentum

   Δw_new = −η dE/dw + γ Δw_old

• Also a variable learning rate can improve the results
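The momentum rule Δw_new = −η dE/dw + γ Δw_old can be illustrated on a one-dimensional error surface. The toy surface E(w) = (w − 3)² and all constants are our own choices:

```python
# Gradient descent with momentum on the toy error surface E(w) = (w - 3)^2.
# eta, gamma, the starting point and the step count are our choices.
def minimize_with_momentum(w=0.0, eta=0.1, gamma=0.8, steps=100):
    delta_w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)                 # dE/dw
        delta_w = -eta * grad + gamma * delta_w  # momentum update rule
        w += delta_w
    return w

w_star = minimize_with_momentum()
```

The momentum term γ Δw_old lets successive steps in the same direction accumulate, which is what speeds up travel along flat directions of the error surface.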
Batch Backpropagation

• For the case of batch training, all samples are inspected before the weights are updated. The error signals for all patterns in the batch are included in the sum:

   Δw_ij = η Σ_k δ_i,k o_j,k

• The main drawback of BP is that it has slow convergence
• Numerous other training approaches have been suggested, to mention some:
  – Levenberg–Marquardt
  – Genetic algorithms
  – Simulated annealing
  – Extended Kalman filter
  – Unscented Kalman filter
Radial basis function networks

• The activation functions in the hidden layer are symmetrical and they get their maximum values at the center of the function
• The most commonly used functions are Gaussians:

   g_i(x) = exp( −‖x − v_i‖² / (2 σ_i²) )

• The standard learning algorithm for an RBF network is the hybrid technique, which consists of two stages:
  – An unsupervised algorithm (k-means, maximum likelihood, self-organizing map, etc.) to specify the parameters of the radial basis functions
  – A supervised algorithm to update the weights between the hidden layer and the output layer; this step is normally performed with standard least squares
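The two-stage hybrid can be sketched compactly. For brevity the unsupervised stage is reduced here to placing the centers on a grid; the target function, the number of centers and σ are our own choices:

```python
import numpy as np

# Two-stage RBF sketch: stage 1 fixes the Gaussian centers v_i (here simply
# a grid, standing in for k-means etc.); stage 2 solves the output weights
# by standard least squares. Target function, centers and sigma are ours.
x = np.linspace(0.0, 1.0, 50)
t = np.sin(2 * np.pi * x)                  # target to approximate

centers = np.linspace(0.0, 1.0, 8)         # "stage 1": fixed centers v_i
sigma = 0.15
G = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

w, *_ = np.linalg.lstsq(G, t, rcond=None)  # "stage 2": least squares
y = G @ w                                  # network output on the inputs
```

Because the output layer is linear in the weights, the supervised stage is a single linear least-squares solve rather than an iterative descent.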
Kohonen network

• Also known as the Kohonen self-organizing map or self-organizing map (SOM)
• Mapping of the input vectors to the output layer results in a reduction of the dimensionality of the input space
• Output units are typically represented as a two- or three-dimensional grid
Kohonen’s self-organizing network

• Steps of the learning algorithm:
  – Initialize the weights
  – Choose an input from the input data set
  – Select the winning output unit (BMU = best matching unit):

      ‖x − w_c‖ = min_ij ‖x − w_ij‖

  – Update the weights of the winning unit and of the neighboring units:

      w_ij(k+1) = w_ij(k) + α(k) [ x − w_ij(k) ],  if (i, j) ∈ N_c(k)
      w_ij(k+1) = w_ij(k),                         if (i, j) ∉ N_c(k)
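The learning steps above can be sketched as follows. The grid size, the data set and the schedules for α(k) and the neighbourhood radius are our own choices:

```python
import random

# SOM sketch: a 5x5 grid of 2-D weight vectors trained on uniform data,
# following the BMU selection and neighbourhood update rules above.
random.seed(1)
GRID = 5
w = [[[random.random(), random.random()] for _ in range(GRID)] for _ in range(GRID)]

def bmu(x):
    # best matching unit: (i, j) minimizing ||x - w_ij||
    return min(((i, j) for i in range(GRID) for j in range(GRID)),
               key=lambda ij: sum((xk - wk) ** 2
                                  for xk, wk in zip(x, w[ij[0]][ij[1]])))

def som_step(x, alpha, radius):
    ci, cj = bmu(x)
    for i in range(GRID):
        for j in range(GRID):
            if abs(i - ci) <= radius and abs(j - cj) <= radius:  # (i,j) in N_c(k)
                w[i][j] = [wk + alpha * (xk - wk)
                           for wk, xk in zip(w[i][j], x)]

data = [[random.random(), random.random()] for _ in range(200)]
for k in range(2000):
    alpha = 0.5 * (1 - k / 2000)   # decreasing learning rate alpha(k)
    radius = 2 if k < 1000 else 1  # shrinking neighbourhood N_c(k)
    som_step(data[k % len(data)], alpha, radius)
```

Shrinking the neighbourhood over time first orders the map globally, then fine-tunes individual units.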
[Figure: SOM trained for fault detection in the (x1, x2, x3) space, with regions for normal operation, Fault 1 and Fault 2]

• The training data has to include faulty data with knowledge of the fault
Monitoring with SOM – ”regression”

• The model is based on fault-free data
• When new data is fed to the model, the BMU is found based on a subset of the input vector (the monitored variable y is left out)
• Check the value of y for the BMU
• Calculate the residual between the measured y and the y from the BMU

Monitoring with SOM – Distance to BMU

• Find the BMU for the given input
• Calculate the distance between the BMU and the input
• If the distance exceeds the threshold, a fault has been detected
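The distance-to-BMU check can be sketched in a few lines. The toy map (assumed already trained on fault-free data) and the threshold value are our own choices:

```python
# Distance-to-BMU fault detection sketch: a fault is flagged when the input
# is farther from its BMU than a threshold calibrated on fault-free data.
def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def bmu_distance(units, x):
    # units: list of SOM weight vectors trained on fault-free data
    return min(dist2(u, x) for u in units) ** 0.5

units = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]   # toy fault-free map (our choice)
threshold = 0.3                                 # our choice

def is_fault(x):
    return bmu_distance(units, x) > threshold
```

In practice the threshold is calibrated from the BMU distances observed on held-out fault-free data.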
Introductory linear example

• If we make an ARX model (sampling interval 1):

   N = length(y);
   D = [y(1:N-2) y(2:N-1) u(2:N-1)];
   yout = y(3:N);
   par = pinv(D)*yout

   par =
     -0.05575832188004
      0.00169077887767
      0.26950929371295

• We can now use this model to make one-step predictions and compare them with the true state. Looks rather good (SSQ = 0.08).

[Figure: one-step predictions compared with the measurements, and the estimated parameter values as a function of the noise variance]
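The MATLAB fit above can be mirrored in Python with NumPy's pseudoinverse. The data below is synthetic (a second-order system of our own), so the estimated parameters differ from the values on the slide:

```python
import numpy as np

# ARX fit mirroring the MATLAB snippet: y(k) regressed on y(k-2), y(k-1), u(k-1).
# The simulated system and its coefficients are our own, noiseless choice.
rng = np.random.default_rng(0)
u = rng.standard_normal(400)
y = np.zeros(400)
for k in range(2, 400):
    y[k] = 0.3 * y[k - 1] - 0.1 * y[k - 2] + 0.5 * u[k - 1]

N = len(y)
D = np.column_stack([y[0:N - 2], y[1:N - 1], u[1:N - 1]])
yout = y[2:N]
par = np.linalg.pinv(D) @ yout   # least-squares parameter estimates
yhat = D @ par                   # one-step predictions
ssq = float(np.sum((yout - yhat) ** 2))
```

Since the data here is noiseless, the estimates recover the generating coefficients exactly and the SSQ is essentially zero; with noisy data, as on the slide, a small nonzero SSQ remains.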
Architectures

• The defining feature is the use of feedback signals
• Self-loops or backward connections
• Two main categories:
  – Symmetrical weight connections
  – Asymmetrical weight connections

Architectures – Symmetrical

• Hopfield network
  – Feedback links with symmetrical weights from each output signal to all other output nodes
  – No self-loops
Neural networks for system identification

• Each of the main linear identification methods (ARX, OE, FIR, ARMAX) has an analogous neural variant: NARX, NOE, NFIR, NARMAX

NARX (series-parallel)

[Figure: NARX (series-parallel) results: RMS_train = 2.3·10⁻⁴, RMS_test1 = 2.1·10⁻⁴, RMS_test2 = 3.2·10⁻⁴]
Example: The pendulum

• It is necessary to make some additional comments regarding the pendulum
• Feed-forward approaches (NARX/ARX) can also be used in this case, mainly because the measurements are noise-free
• The pendulum model is almost linear as long as the angles are small: sin(x) ≈ x
• It turns out that a simple linear ARX model in this case gives a really good fit

[Figure: test set 2, desired trajectory x_des and network output x_net as a function of the pattern index]

Case study: NARX model of the Mackey–Glass system

• The system is described by the delay differential equation

   dx/dt = a x(t − τ) / (1 + x^c(t − τ)) − b x

• We integrate the system with ODE4

[Figure: states of the Mackey–Glass system for t = 0 … 2000]
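The slide integrates the delay equation with ODE4 (fixed-step RK4); a fixed-step Euler scheme with a delay buffer is used here instead for brevity. The parameter values a = 0.2, b = 0.1, c = 10, τ = 17 are the customary choices for this system and are assumed, not taken from the slide:

```python
# Fixed-step Euler integration of the Mackey-Glass delay equation
#   dx/dt = a*x(t-tau) / (1 + x(t-tau)**c) - b*x(t)
# with a constant-history start. Parameters and step size are assumptions.
a, b, c, tau = 0.2, 0.1, 10, 17.0
dt = 0.1
delay_steps = int(tau / dt)

x = [1.2] * (delay_steps + 1)           # constant history x(t) = 1.2 for t <= 0
for k in range(delay_steps, delay_steps + 20000):
    x_tau = x[k - delay_steps]          # delayed state x(t - tau)
    x.append(x[k] + dt * (a * x_tau / (1 + x_tau ** c) - b * x[k]))
```

The delay is handled by keeping the whole trajectory in a buffer and indexing τ/dt steps back, which requires the delay to be an integer multiple of the step size.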
Case study: NARX model of the Mackey–Glass system

• We use 500 data points

   model          # of params   RMS train   RMS test
   Linear model        5        9.62E-02    9.66E-02

• A model with three hidden nodes already gives a quite good result

[Figure: RMS train and RMS test as a function of the number of parameters]