Learning in Neural Networks
Biological inspirations
Some numbers…
The human brain contains about 10 billion nerve cells (neurons).
Each neuron is connected to other neurons through about 10,000 synapses.
Biological neuron
A neuron has:
a branching input (the dendrites)
a branching output (the axon)
The information circulates from the dendrites to the axon via the cell body.
Axons connect to dendrites via synapses.
Synapses vary in strength.
Synapses may be excitatory or inhibitory.
Artificial neuron

$y = f\!\left(w_0 + \sum_{i=1}^{n-1} w_i x_i\right)$

[Figure: an artificial neuron with inputs $x_1, x_2, x_3$, bias weight $w_0$, and output $y$]
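As a minimal sketch of this computation (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def neuron_output(x, w, w0, f):
    """y = f(w0 + sum_i w_i * x_i): activation of the weighted input sum."""
    return f(w0 + np.dot(w, x))

# Example with three inputs and a sign activation
x = np.array([0.5, -1.0, 2.0])    # inputs x1, x2, x3
w = np.array([0.8, 0.2, -0.5])    # weights w1, w2, w3
print(neuron_output(x, w, w0=0.1, f=np.sign))   # -> -1.0
```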
Activation functions
Linear: $y = x$

Logistic: $y = \dfrac{1}{1 + \exp(-x)}$

Hyperbolic tangent: $y = \dfrac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$

[Plots of the three activation functions over their input range]
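These three functions translate directly to code; a sketch assuming numpy:

```python
import numpy as np

def linear(x):
    return x                            # y = x

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))     # output in (0, 1)

def hyperbolic_tangent(x):
    # (exp(x) - exp(-x)) / (exp(x) + exp(-x)); numpy provides it as np.tanh
    return np.tanh(x)                   # output in (-1, 1)
```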
Neural Networks
Feed Forward Neural Networks

Input units / hidden units / output units
The information is propagated from the inputs to the outputs.
Time plays no role (NO cycle between outputs and inputs).
No links between units in the same layer, no links backward to a previous layer, no links that skip a layer.
Perceptron: no hidden units.
Multilayer networks: one or more hidden layers.

[Figure: inputs $x_1, x_2, \dots, x_n$ feeding the 1st hidden layer, the 2nd hidden layer, and the output layer]
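A minimal sketch of this forward propagation (layer sizes and names are illustrative):

```python
import numpy as np

def forward(x, layers):
    """Propagate an input through a feed-forward network.

    `layers` is a list of (W, b, f) tuples: weight matrix, bias vector,
    and activation function for each layer, in order.
    """
    o = x
    for W, b, f in layers:
        o = f(W @ o + b)   # information flows strictly from inputs to outputs
    return o

# Example: 3 inputs -> 4 hidden -> 2 hidden -> 1 output
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4), np.tanh),      # 1st hidden layer
    (rng.normal(size=(2, 4)), np.zeros(2), np.tanh),      # 2nd hidden layer
    (rng.normal(size=(1, 2)), np.zeros(1), lambda v: v),  # linear output layer
]
y = forward(np.array([0.5, -1.0, 2.0]), layers)
```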
Learning

2 types of learning:
supervised learning
unsupervised learning

Supervised learning

The network is given inputs together with the desired outputs, and its weights are adjusted so that the actual outputs approach the desired ones.

Unsupervised learning

The network is given inputs only and organizes itself to capture the structure of the data.
Other properties

Adaptivity: the weights adapt to the environment, and the network can easily be retrained.
Generalization ability: the network can respond sensibly to inputs it has not seen, which helps when data are lacking.
Fault tolerance: performance degrades gracefully if the network is damaged, because the information is distributed within the entire net.
Classification (Discrimination)
Example
Classical neural architectures
Perceptron
Multi-Layer Perceptron
Radial Basis Function (RBF)
Self Organizing Maps (SOM)
Many other architectures
Perceptron
Rosenblatt (1962)
Linear separation
Inputs: vector of real values
Outputs: 1 or -1

$y = \operatorname{sign}(v), \qquad v = c_0 + c_1 x_1 + c_2 x_2$

The decision boundary is the line $c_0 + c_1 x_1 + c_2 x_2 = 0$: points on one side are labelled $y = +1$, points on the other $y = -1$.

[Figure: two point clouds separated by this line; the perceptron combines inputs $1, x_1, x_2$ with weights $c_0, c_1, c_2$]
Learning (The perceptron rule)

On a misclassified example, each weight is moved toward the example: $c_i \leftarrow c_i + \eta\,(t - y)\,x_i$, where $t$ is the desired output and $\eta$ the learning rate.
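A minimal sketch of this rule in Python (names and hyperparameters are illustrative):

```python
import numpy as np

def train_perceptron(X, t, lr=0.1, epochs=100):
    """Perceptron rule sketch: X is (n_samples, n_features), t in {-1, +1}.

    Returns the bias c0 and weight vector c of the separating line.
    """
    c0, c = 0.0, np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, t):
            y = 1.0 if c0 + c @ x >= 0 else -1.0   # y = sign(v)
            if y != target:                        # update only on mistakes
                c0 += lr * (target - y)            # (t - y) is +/-2 here
                c += lr * (target - y) * x
                errors += 1
        if errors == 0:        # all points correctly classified: stop
            break
    return c0, c
```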
Multi-Layer Perceptron
One or more hidden layers
Sigmoid activation functions

[Figure: input data feeding the 1st hidden layer, the 2nd hidden layer, and the output layer]
Learning
Back-propagation algorithm (credit assignment)

$net_j = w_{j0} + \sum_{i=1}^{n} w_{ji}\, o_i, \qquad o_j = f_j(net_j)$

$\delta_j = -\dfrac{\partial E}{\partial net_j}$

$\Delta w_{ji} = -\alpha \dfrac{\partial E}{\partial w_{ji}} = -\alpha \dfrac{\partial E}{\partial net_j}\dfrac{\partial net_j}{\partial w_{ji}} = \alpha\, \delta_j\, o_i$

$\delta_j = -\dfrac{\partial E}{\partial o_j}\dfrac{\partial o_j}{\partial net_j} = -\dfrac{\partial E}{\partial o_j} f'(net_j)$

If the jth node is an output unit:

$E = \tfrac{1}{2}(t_j - o_j)^2 \;\Rightarrow\; \dfrac{\partial E}{\partial o_j} = -(t_j - o_j)$

$\delta_j = (t_j - o_j)\, f'(net_j)$
If the jth node is a hidden unit:

$\dfrac{\partial E}{\partial o_j} = \sum_k \dfrac{\partial E}{\partial net_k}\dfrac{\partial net_k}{\partial o_j} = -\sum_k \delta_k\, w_{kj}$

$\delta_j = f'_j(net_j) \sum_k \delta_k\, w_{kj}$

A momentum term smooths the weight changes over time:

$\Delta w_{ji}(t) = \alpha\, \delta_j(t)\, o_i(t) + \gamma\, \Delta w_{ji}(t-1)$

$w_{ji}(t) = w_{ji}(t-1) + \Delta w_{ji}(t)$
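A minimal sketch of these updates for one hidden layer with logistic activations (shapes and names are illustrative; biases are folded into the first weight column):

```python
import numpy as np

def f(x):               # logistic activation
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(o):         # logistic derivative via the output: f'(net) = o(1 - o)
    return o * (1.0 - o)

def backprop_step(x, t, W1, W2, dW1, dW2, alpha=0.5, gamma=0.9):
    """One back-propagation update with momentum for a 1-hidden-layer MLP.

    dW1, dW2 hold the previous weight changes (the momentum memory).
    """
    # forward pass
    o0 = np.concatenate(([1.0], x))          # input with bias unit
    o1 = f(W1 @ o0)                          # hidden layer outputs
    o1b = np.concatenate(([1.0], o1))
    o2 = f(W2 @ o1b)                         # output layer

    # deltas: output units use (t - o) f'(net); hidden units back-propagate
    delta2 = (t - o2) * f_prime(o2)
    delta1 = f_prime(o1) * (W2[:, 1:].T @ delta2)

    # weight changes with momentum: dw(t) = alpha*delta*o + gamma*dw(t-1)
    dW2 = alpha * np.outer(delta2, o1b) + gamma * dW2
    dW1 = alpha * np.outer(delta1, o0) + gamma * dW1
    return W1 + dW1, W2 + dW2, dW1, dW2
```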
Radial Basis Function (RBF) networks

A three-layer RBF network can form arbitrary decision regions (complexity limited by the number of nodes).

[Figure: decision regions labelled A and B; network architecture with inputs, radial units, and outputs]
RBF hidden layer units have a receptive field which has a centre.
Generally, the hidden unit function is Gaussian.
The output layer is linear.

Realized function:

$f(x) = \sum_{j=1}^{K} W_j\, \Phi\bigl(\lVert x - c_j \rVert\bigr)$

$\Phi\bigl(\lVert x - c_j \rVert\bigr) = \exp\!\left(-\left(\dfrac{\lVert x - c_j \rVert}{\sigma_j}\right)^{2}\right)$

where $c_j$ is the centre of a region and $\sigma_j$ is the width of the receptive field.
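The realized function maps directly to code; a sketch (assuming numpy and a single output):

```python
import numpy as np

def rbf_output(x, centers, sigmas, W):
    """f(x) = sum_j W_j * Phi(||x - c_j||) with Gaussian hidden units."""
    d = np.linalg.norm(centers - x, axis=1)   # ||x - c_j|| for every centre
    phi = np.exp(-(d / sigmas) ** 2)          # Gaussian receptive fields
    return W @ phi                            # linear output layer
```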
Learning

2 steps:
In the 1st stage, the input data set is used to determine the parameters of the radial basis functions (centres and widths).
In the 2nd stage, the basis functions are kept fixed while the second-layer weights are estimated (a simple BP algorithm, as for MLPs).
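A sketch of the two stages under simple assumptions (centres drawn from the data, one shared width heuristic, plain gradient descent on the linear weights; all names illustrative):

```python
import numpy as np

def train_rbf(X, t, K, lr=0.05, epochs=200, rng=np.random.default_rng(0)):
    # Stage 1: centres/widths from the input data alone (here, centres are
    # sampled from the data; k-means is another common choice)
    centers = X[rng.choice(len(X), K, replace=False)]
    pd = np.linalg.norm(centers[:, None] - centers, axis=2)
    sigma = np.full(K, pd.sum() / (K * (K - 1)))      # mean pairwise distance

    # Stage 2: centres fixed; learn the linear output weights by gradient
    # descent on E = 1/2 * sum_j e_j^2 (BP reduces to this for a linear layer)
    d = np.linalg.norm(X[:, None] - centers, axis=2)  # ||x_j - c_i||
    Phi = np.exp(-(d / sigma) ** 2)                   # (n_samples, K)
    w = np.zeros(K)
    for _ in range(epochs):
        e = t - Phi @ w                               # errors e_j
        w += lr * Phi.T @ e / len(X)                  # w <- w - lr * dE/dw
    return centers, sigma, w
```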
Learning formula

Linear weights (output layer):

$\dfrac{\partial E(n)}{\partial w_i(n)} = \sum_{j=1}^{N} e_j(n)\, G\bigl(\lVert x_j - t_i(n) \rVert_{C_i}\bigr)$

$w_i(n+1) = w_i(n) - \eta_1 \dfrac{\partial E(n)}{\partial w_i(n)}, \qquad i = 1, 2, \dots, M$

Positions of centres (hidden layer):

$t_i(n+1) = t_i(n) - \eta_2 \dfrac{\partial E(n)}{\partial t_i(n)}, \qquad i = 1, 2, \dots, M$

Spreads of centres (hidden layer):

$\dfrac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n)\, G'\bigl(\lVert x_j - t_i(n) \rVert_{C_i}\bigr)\, Q_{ji}(n), \qquad Q_{ji}(n) = [x_j - t_i(n)]\,[x_j - t_i(n)]^{T}$

$\Sigma_i^{-1}(n+1) = \Sigma_i^{-1}(n) - \eta_3 \dfrac{\partial E(n)}{\partial \Sigma_i^{-1}(n)}$
Self organizing maps
Algorithm

1. Initialize the weights $w_j$
2. Find the nearest cell: $i(x) = \arg\min_j \lVert x(n) - w_j(n) \rVert$
3. Update the weights of $i(x)$ and its neighbors: $w_j(n+1) = w_j(n) + \eta(n)\,[x(n) - w_j(n)]$
4. Shrink the neighborhood and reduce $\eta$
5. Go to 2

[Figure: grid of cells showing the 1st and 2nd neighborhoods around the winning cell]
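A compact sketch of this loop (the grid size, the decay schedules, and the hard neighborhood are illustrative choices; a Gaussian neighborhood is also common):

```python
import numpy as np

def som_train(X, grid_w, grid_h, epochs=100, eta0=0.5, radius0=2.0,
              rng=np.random.default_rng(0)):
    """Train a 2-D self-organizing map on data X of shape (n_samples, dim)."""
    # 1. initialize the weights
    W = rng.normal(size=(grid_w * grid_h, X.shape[1]))
    coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)])
    for n in range(epochs):
        eta = eta0 * (1.0 - n / epochs)                  # 4. eta shrinks...
        radius = max(radius0 * (1.0 - n / epochs), 0.5)  # ...and so does the neighborhood
        for x in rng.permutation(X):
            # 2. find the nearest cell (the winner)
            i_x = np.argmin(np.linalg.norm(x - W, axis=1))
            # 3. update the winner and its grid neighbors
            dist = np.linalg.norm(coords - coords[i_x], axis=1)
            h = (dist <= radius).astype(float)           # hard neighborhood mask
            W += eta * h[:, None] * (x - W)
    return W.reshape(grid_w, grid_h, -1)
```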
Adaptation
Applications

Face recognition
Time series prediction
Process identification
Process control
Optical character recognition
Adaptive filtering
Etc.
Conclusion on Neural Networks
Neural networks are used as statistical tools
They adjust non-linear functions to fulfill a task
They need multiple, representative examples, but fewer than other methods
Neural networks can model complex static phenomena (feed-forward networks) as well as dynamic ones (recurrent networks, RNN)
NN are good classifiers BUT
Good representations of the data have to be formulated
Training vectors must be statistically representative of the entire input space
Unsupervised techniques can help
The use of NN requires a good comprehension of the problem