
CIT 6261:

Advanced Artificial Intelligence

Learning in Neural Networks

Lecturer: Dr. Md. Saiful Islam

Biological inspirations

• Some numbers…
  - The human brain contains about 10 billion nerve cells (neurons)
  - Each neuron is connected to other neurons through about 10,000 synapses

• Properties of the brain
  - It can learn and reorganize itself from experience
  - It adapts to the environment
  - It is robust and fault tolerant
Biological neuron

• A neuron has
  - A branching input structure (the dendrites)
  - A branching output structure (the axon)
• Information flows from the dendrites to the axon via the cell body
• The axon connects to the dendrites of other neurons via synapses
  - Synapses vary in strength
  - Synapses may be excitatory or inhibitory

What is an artificial neuron?

• Definition: a non-linear, parameterized function with a restricted output range

  y = f( w_0 + Σ_{i=1}^{n-1} w_i x_i )

[Figure: a single neuron with inputs x1, x2, x3, bias weight w_0, and output y]
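A minimal Python/NumPy sketch of this weighted-sum-plus-activation computation; the particular weights, bias, and tanh activation below are illustrative assumptions, not values from the slides:

```python
import numpy as np

def neuron(x, w, w0, f=np.tanh):
    """Single artificial neuron: y = f(w0 + sum_i w_i * x_i)."""
    return f(w0 + np.dot(w, x))

# Example with three inputs, as in the figure above (all values illustrative)
y = neuron(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, -0.3]), w0=0.2)
```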

Activation functions
• Linear: y = x
• Logistic (sigmoid): y = 1 / (1 + exp(-x))
• Hyperbolic tangent: y = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

[Figure: plots of the linear, logistic, and hyperbolic tangent activation functions]
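For reference, the three activation functions above written directly in Python/NumPy:

```python
import numpy as np

def linear(x):
    return x

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def hyperbolic_tangent(x):
    # identical to np.tanh(x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
```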

Neural Networks

• A mathematical model to solve engineering problems
  - A group of highly connected neurons realizing compositions of non-linear functions
• Tasks
  - Classification
  - Discrimination
  - Estimation
• 2 types of networks
  - Feed-forward neural networks
  - Recurrent neural networks
Feed Forward Neural Networks
• Input units / output units / hidden units
• The information is propagated from the inputs to the outputs (see the forward-pass sketch below)
• Time plays no role (NO cycle between outputs and inputs)
• No links between units in the same layer, no links backward to a previous layer, no links that skip a layer
• Perceptron: no hidden units
• Multilayer networks: one or more hidden layers

[Figure: layered network with inputs x1, x2, …, xn feeding a 1st hidden layer, a 2nd hidden layer, and an output layer]
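A minimal sketch of a forward pass through such a network in Python/NumPy; the layer sizes, random weights, and logistic activation are illustrative assumptions:

```python
import numpy as np

def forward(x, layers):
    """Propagate an input vector through a feed-forward network.

    layers: list of (W, b) pairs, one per layer; W has shape (n_out, n_in),
            b has shape (n_out,).
    """
    o = x
    for W, b in layers:
        o = 1.0 / (1.0 + np.exp(-(W @ o + b)))  # logistic activation at each layer
    return o

# Example: 3 inputs -> 4 hidden units -> 2 outputs, with random weights
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
y = forward(np.array([0.2, -0.5, 1.0]), layers)
```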

Recurrent Neural Networks

• Can have arbitrary topologies
• Delays are associated with specific weights (a single recurrent time step is sketched below)
• Training is more difficult
• Performance may be problematic
  - Stable outputs may be more difficult to evaluate
  - Unexpected behavior (oscillation, chaos, …)

[Figure: recurrent network over inputs x1, x2 with delayed (0/1) feedback connections]
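A minimal sketch of one time step of a simple (Elman-style) recurrent unit in Python/NumPy, just to illustrate how the delayed state feeds back; the weight shapes, tanh activation, and toy sequence are illustrative assumptions:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_in, W_rec, b):
    """One recurrent time step: the previous hidden state h_prev is fed
    back (with a one-step delay) alongside the new input x_t."""
    return np.tanh(W_in @ x_t + W_rec @ h_prev + b)

# Example: 2 inputs, 3 hidden units, run over a short input sequence
rng = np.random.default_rng(0)
W_in, W_rec, b = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x_t in [np.array([0.0, 1.0]), np.array([1.0, 0.0])]:
    h = rnn_step(x_t, h, W_in, W_rec, b)
```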

Learning

• The procedure of estimating the parameters of the neurons so that the whole network can perform a specific task

• 2 types of learning
  - Supervised learning
  - Unsupervised learning

Supervised learning

• The (supervised) learning process
  - Present the network with a number of inputs and their corresponding outputs
  - See how closely the actual outputs match the desired ones
  - Modify the parameters to better approximate the desired outputs

• The desired response of the neural network for particular inputs is well known.
• A "professor" may provide examples and teach the neural network how to fulfill a certain task.
Unsupervised learning

• Idea: group typical input data according to resemblance criteria that are unknown a priori
• Data clustering
• No need for a professor
  - The network finds the correlations between the data by itself
  - Example of such networks:
    • Kohonen feature maps

Properties of Neural Networks

• Supervised (non-recurrent) networks are universal approximators
• Theorem: any bounded function can be approximated to arbitrary precision by a neural network with a finite number of hidden neurons
• Types of approximators
  - Linear approximators (e.g. polynomials): for a given precision, the number of parameters grows exponentially with the number of variables
  - Non-linear approximators: the number of parameters grows linearly with the number of variables
Other properties

• Adaptivity
  - Weights adapt to the environment, and the network can easily be retrained
• Generalization ability
  - Helps cope with a lack of data
• Fault tolerance
  - Graceful degradation of performance if damaged => the information is distributed across the entire net

Classification (Discrimination)

• Classify objects into defined categories
• Rough (hard) decision, OR
• Estimation of the probability that a certain object belongs to a specific class
  - Example: data mining
• Applications: economy, speech and pattern recognition, sociology, etc.
Example

[Figure: examples of handwritten postal codes drawn from a database available from the US Postal Service]

What do we need to use NNs?

• Determination of pertinent inputs
• Collection of data for the learning and testing phases of the neural network
• Finding the optimum number of hidden nodes
• Estimation of the parameters (learning)
• Evaluation of the performance of the network
• If the performance is not satisfactory, review all the preceding points
Classical neural architectures

• Perceptron
• Multi-Layer Perceptron (MLP)
• Radial Basis Function networks (RBF)
• Self-Organizing Maps (SOM)
• Many other architectures

Perceptron

• Rosenblatt (1962)
• Linear separation
• Inputs: vector of real values
• Outputs: 1 or -1

  y = sign(v),   v = c_0 + c_1 x_1 + c_2 x_2

  The decision boundary is the line c_0 + c_1 x_1 + c_2 x_2 = 0.

[Figure: a summing unit computing v from inputs 1, x_1, x_2 with weights c_0, c_1, c_2, and a scatter plot of two classes (y = +1, y = -1) separated by this line]
Learning (The perceptron rule)

• Minimization of the cost function:  J(c) = Σ_{k∈M} -y_k^p v_k
• J(c) is always >= 0 (M is the set of misclassified examples)
• y_k^p is the target value
• Partial cost
  - If x_k is misclassified:  J_k(c) = -y_k^p v_k
  - If x_k is well classified:  J_k(c) = 0
• Partial cost gradient:  ∂J_k(c)/∂c = -y_k^p x_k

• The perceptron algorithm

  if y_k^p v_k > 0 (x_k is well classified):   c(k) = c(k-1)
  if y_k^p v_k < 0 (x_k is misclassified):     c(k) = c(k-1) + y_k^p x_k

• The perceptron algorithm converges if the examples are linearly separable (a runnable sketch follows below)
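A minimal Python/NumPy sketch of this update rule on a toy 2-D dataset; the data, number of epochs, and the prepended constant input for c_0 are illustrative assumptions:

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    """Perceptron rule: update c only on misclassified examples."""
    X1 = np.hstack([np.ones((len(X), 1)), X])   # prepend constant input 1 for c0
    c = np.zeros(X1.shape[1])
    for _ in range(epochs):
        for xk, yk in zip(X1, y):
            v = np.dot(c, xk)
            if yk * v <= 0:                     # misclassified (or on the boundary)
                c = c + yk * xk                 # c(k) = c(k-1) + y_k^p x_k
    return c

# Toy linearly separable data, labels in {+1, -1}
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
c = train_perceptron(X, y)
```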

Multi-Layer Perceptron

• One or more hidden layers
• Sigmoid activation functions

[Figure: input data feeding a 1st hidden layer, a 2nd hidden layer, and an output layer]

Learning
• Back-propagation algorithm

  net_j = w_{j0} + Σ_i w_{ji} o_i,        o_j = f_j(net_j)

• Credit assignment

  δ_j = -∂E/∂net_j

  Δw_{ji} = -α ∂E/∂w_{ji} = -α (∂E/∂net_j)(∂net_j/∂w_{ji}) = α δ_j o_i

  δ_j = -(∂E/∂o_j)(∂o_j/∂net_j) = -(∂E/∂o_j) f'(net_j)

  E = ½ (t_j - o_j)²   =>   ∂E/∂o_j = -(t_j - o_j)

  If the j-th node is an output unit:   δ_j = (t_j - o_j) f'(net_j)
  If the j-th node is a hidden unit:

  ∂E/∂o_j = Σ_k (∂E/∂net_k)(∂net_k/∂o_j) = -Σ_k δ_k w_{kj}

  δ_j = f'_j(net_j) Σ_k δ_k w_{kj}

• Momentum term to smooth the weight changes over time (a sketch combining these updates follows below)

  Δw_{ji}(t) = α δ_j(t) o_i(t) + γ Δw_{ji}(t-1)
  w_{ji}(t) = w_{ji}(t-1) + Δw_{ji}(t)
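A minimal Python/NumPy sketch of these update rules for a one-hidden-layer network with logistic units, including the momentum term; the network size, learning rate α, momentum γ, and the single training example are illustrative assumptions (biases are omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))   # 2 inputs -> 3 hidden -> 1 output
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)              # previous weight changes (momentum)
alpha, gamma = 0.5, 0.9

x, t = np.array([0.0, 1.0]), np.array([1.0])                 # one training example
for _ in range(100):
    # forward pass
    o1 = sigmoid(W1 @ x)
    o2 = sigmoid(W2 @ o1)
    # backward pass: f'(net) = o (1 - o) for the logistic function
    delta2 = (t - o2) * o2 * (1 - o2)                        # output unit: (t_j - o_j) f'(net_j)
    delta1 = o1 * (1 - o1) * (W2.T @ delta2)                 # hidden unit: f'(net_j) sum_k delta_k w_kj
    # weight updates with momentum
    dW2 = alpha * np.outer(delta2, o1) + gamma * dW2
    dW1 = alpha * np.outer(delta1, x) + gamma * dW1
    W2, W1 = W2 + dW2, W1 + dW1
```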


Different non-linearly separable problems

  Structure     | Types of Decision Regions
  --------------+-------------------------------------------------------
  Single-Layer  | Half plane bounded by a hyperplane
  Two-Layer     | Convex open or closed regions
  Three-Layer   | Arbitrary (complexity limited by the number of nodes)

  [Figure: for each structure, the decision regions it can form on the
  Exclusive-OR problem, on classes with meshed regions, and the most
  general region shapes it can realize (classes A and B)]

(Neural Networks – An Introduction, Dr. Andrew Hunter)


Radial Basis Functions (RBFs)
• Features
  - One hidden layer
  - The activation of a hidden unit is determined by the distance between the input vector and a prototype vector

[Figure: inputs feeding a layer of radial units, which feed the outputs]

Decision surfaces of MLP vs RBF

[Figure: comparison of the decision surfaces produced by an MLP and by an RBF network]
• RBF hidden layer units have a receptive field with a centre
• Generally, the hidden unit function is Gaussian
• The output layer is linear
• Realized function

  f(x) = Σ_{j=1}^{K} W_j Φ(||x - c_j||)

  Φ(||x - c_j||) = exp( -( ||x - c_j|| / σ_j )² )

  where c_j is the centre of a region and σ_j is the width of the receptive field
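A minimal Python/NumPy sketch of this realized function; the number of radial units, their centres and widths, and the output weights are illustrative assumptions:

```python
import numpy as np

def rbf_forward(x, centers, widths, W):
    """f(x) = sum_j W_j * exp(-(||x - c_j|| / sigma_j)^2)."""
    dists = np.linalg.norm(centers - x, axis=1)      # ||x - c_j|| for each radial unit
    phi = np.exp(-(dists / widths) ** 2)             # Gaussian activations
    return W @ phi                                   # linear output layer

# Example: 3 radial units in a 2-D input space
centers = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
widths = np.array([0.5, 1.0, 0.8])
W = np.array([0.2, -0.7, 1.1])
y = rbf_forward(np.array([0.3, 0.4]), centers, widths, W)
```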


Learning

• The training is performed by deciding on
  - How many hidden nodes there should be
  - The centres and the sharpness (widths) of the Gaussians

• 2 steps (sketched below)
  - In the 1st stage, the input data set is used to determine the parameters of the radial basis functions (centre and width)
  - In the 2nd stage, those functions are kept fixed while the second-layer weights are estimated (a simple BP algorithm, as for MLPs)
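A minimal sketch of this two-stage procedure in Python/NumPy. The centre-selection heuristic (random training points), the shared width, the toy data, and the use of a least-squares fit in place of the iterative BP step for the output weights are all illustrative assumptions, not prescribed by the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                       # toy inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)    # toy targets

# Stage 1: choose centres and widths from the input data
K = 10
centers = X[rng.choice(len(X), size=K, replace=False)]
sigma = np.mean(np.linalg.norm(centers[:, None] - centers[None, :], axis=2)) + 1e-8
widths = np.full(K, sigma)

# Stage 2: keep the radial functions fixed, fit the linear output weights
D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)   # distances to centres
Phi = np.exp(-(D / widths) ** 2)                                   # hidden activations
W, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```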

Learning formula
• Linear weights (output layer)

  ∂E(n)/∂w_i(n) = Σ_{j=1}^{N} e_j(n) G(||x_j - t_i(n)||_{C_i})

  w_i(n+1) = w_i(n) - η_1 ∂E(n)/∂w_i(n),   i = 1, 2, ..., M

• Positions of centers (hidden layer)

  ∂E(n)/∂t_i(n) = 2 w_i(n) Σ_{j=1}^{N} e_j(n) G'(||x_j - t_i(n)||_{C_i}) Σ_i^{-1} [x_j - t_i(n)]

  t_i(n+1) = t_i(n) - η_2 ∂E(n)/∂t_i(n),   i = 1, 2, ..., M

• Spreads of centers (hidden layer)

  ∂E(n)/∂Σ_i^{-1}(n) = -w_i(n) Σ_{j=1}^{N} e_j(n) G'(||x_j - t_i(n)||_{C_i}) Q_{ji}(n),   where Q_{ji}(n) = [x_j - t_i(n)][x_j - t_i(n)]^T

  Σ_i^{-1}(n+1) = Σ_i^{-1}(n) - η_3 ∂E(n)/∂Σ_i^{-1}(n)

MLPs versus RBFs


• Classification
  - MLPs separate classes via hyperplanes
  - RBFs separate classes via hyperspheres
  - MLPs compute faster
• Learning
  - MLPs use distributed learning
  - RBFs use localized learning
  - RBFs train faster
• Structure
  - MLPs have one or more hidden layers
  - RBFs have only one hidden layer
  - RBFs require more hidden neurons

[Figure: in the (X1, X2) plane, an MLP draws hyperplane boundaries while an RBF draws hypersphere boundaries]
Self-organizing maps

• The purpose of a SOM is to map a multidimensional input space onto a lower-dimensional, topology-preserving map of neurons
  - The topological structure is preserved so that neighboring neurons respond to similar input patterns
  - The topological structure is often a 2- or 3-dimensional space
• Each neuron is assigned a weight vector with the same dimensionality as the input space
• Input patterns are compared to each weight vector and the closest one wins (Euclidean distance)

• The activation of the winning neuron is spread in its direct neighborhood => neighbors become sensitive to the same input patterns
• The size of the neighborhood is initially large but is reduced over time => specialization of the network

• Algorithm (a runnable sketch follows below)
  1. Initialize the weights w
  2. Find the nearest cell:  i(x) = argmin_j || x(n) - w_j(n) ||
  3. Update the weights of i(x) and its neighbors:  w_j(n+1) = w_j(n) + η(n) [x(n) - w_j(n)]
  4. Reduce the neighborhood and η
  5. Go to 2

[Figure: a grid of neurons with the winning cell, its 1st neighborhood, and its 2nd neighborhood highlighted]
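A minimal Python/NumPy sketch of this algorithm for a 1-D chain of neurons; the map size, learning-rate and neighborhood schedules, and the toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))          # toy 2-D input patterns
n_cells = 10
w = rng.uniform(0, 1, size=(n_cells, 2))      # one weight vector per map neuron

eta, radius = 0.5, 3.0
for n in range(1000):
    x = X[rng.integers(len(X))]
    i_win = np.argmin(np.linalg.norm(x - w, axis=1))      # 2. nearest cell (Euclidean distance)
    for j in range(n_cells):
        if abs(j - i_win) <= radius:                      # 3. update winner and its neighbors
            w[j] += eta * (x - w[j])
    eta *= 0.995                                          # 4. reduce eta and the neighborhood
    radius = max(1.0, radius * 0.995)
```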
Adaptation

• During training, the "winner" neuron and its neighborhood adapt to make their weight vectors more similar to the input pattern that caused the activation
• The neurons are moved closer to the input pattern
• The magnitude of the adaptation is controlled via a learning parameter that decays over time

Neural Networks (Applications)

• Face recognition
• Time series prediction
• Process identification
• Process control
• Optical character recognition
• Adaptive filtering
• Etc.
Conclusion on Neural Networks
• Neural networks are used as statistical tools
  - They adjust non-linear functions to fulfill a task
  - They need many representative examples, but fewer than other methods
• Neural networks make it possible to model complex static phenomena (FF) as well as dynamic ones (RNN)
• NNs are good classifiers, BUT
  - Good representations of the data have to be formulated
  - Training vectors must be statistically representative of the entire input space
  - Unsupervised techniques can help
• The use of NNs requires a good understanding of the problem
