7.1 Introduction
Lesson 06 discussed the Delta learning rule and the back propagation training algorithm
for training ANN. It was noted that the Delta learning rule has emerged as a learning
rule with a sound mathematical basis. Back propagation offers a unique solution for training
multi-layer ANN entirely in the supervised mode. We also noted that the back propagation
algorithm involves a large number of calculations, and as a result it can be very
time consuming during training. Nevertheless, back propagation can
model any real world problem requiring supervised training in a multi-layer ANN. This
lesson discusses some issues with BP and various extensions to training algorithms that
can overcome some limitations of BP training. In particular, we discuss training in counter
propagation networks and statistical training.
However, computers nowadays are powerful enough to provide facilities for online
training of ANN. This has been practiced very successfully in the training of flight simulators,
which receive training data from the pilots while they fly. Therefore, long training time
is becoming an obsolete concern about ANN, especially in view of the emerging powerful
computer technologies.
With the use of the Delta learning rule, it is quite natural for a network to fall
into a non-responding state, often called network paralysis. Why? As we have already seen,
the weight changes produced by the Delta learning rule are always very small. Such changes
are not large enough to make the network feel that anything is happening. Therefore, in
order to produce a tangible change in a training session, researchers have added a non-
zero term to the Delta learning rule. This term is called momentum. It is similar to giving
a push to a vehicle that cannot move or start.
Momentum for a neuron has been defined as a constant, α, times the difference
between the weights in the previous two consecutive cycles.
Now, we can modify the Delta learning rule in the following manner to address the
issue of network paralysis.
Δw = η·δ·X + α·(wt – wt-1), where η and δ are the learning rate and the error term of the Delta learning rule, and α is the momentum constant.
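A minimal Python sketch of this modified update for a single neuron, assuming the error term δ comes from the Delta rule of Lesson 06; the learning rate η, momentum constant α and the numeric values below are illustrative assumptions, not taken from the lesson.

import numpy as np

def delta_with_momentum(w, x, delta, prev_change, eta=0.1, alpha=0.9):
    """One weight update: the Delta-rule term plus a momentum term.

    w           : current weight vector
    x           : input vector
    delta       : error term for this neuron (from the Delta learning rule)
    prev_change : weight change applied in the previous cycle (wt - wt-1)
    eta, alpha  : learning rate and momentum constant (illustrative values)
    """
    change = eta * delta * x + alpha * prev_change   # Delta rule + momentum
    return w + change, change                        # new weights and the change to reuse next cycle

# Two consecutive updates on a three-input neuron
w = np.array([0.1, -0.2, 0.3])
prev = np.zeros(3)
x = np.array([1.0, 2.0, 1.0])
w, prev = delta_with_momentum(w, x, delta=0.5, prev_change=prev)
w, prev = delta_with_momentum(w, x, delta=0.3, prev_change=prev)
print(w)

Even when the Delta-rule term is tiny, the momentum term keeps the weights moving in the direction of the previous change, which is exactly the push described above.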
For humans too, learning is always done so that the error in learning is kept
as small as possible, virtually zero.
Next we discuss some more strategies to train ANN. In a broader sense, these
strategies have been developed as extensions of the back propagation training
algorithm.
Activity 7.1
Do a survey on the use of the back propagation training algorithm for modelling some real
world problems. Discuss the specifications of such networks (e.g. number of layers,
neurons in each layer, activation function, Emax and accuracy rate).
In fact, we have already mentioned that supervised training data can be modelled
by an ANN which uses a combination of unsupervised and supervised training
algorithms.
Two researchers, Kohonen and Grossberg, have proposed a special kind of network
architecture and a method to train a multi-layer ANN by using unsupervised and
supervised training algorithms together. This approach is much simpler than using
algorithms such as Hebbian and Perceptron together. Let us discuss the Kohonen and
Grossberg approach, which is known as the Kohonen self-organizing network.
Kohonen has proposed a self-organizing network with two layers. In Kohonen self-
organizing networks, the input layer and the output layer are named the Kohonen layer
and the Grossberg layer respectively. The number of neurons in each layer is arbitrary.
Kohonen proposed an unsupervised strategy to train the input layer, while Grossberg
proposed a supervised strategy to train the output layer. They have proposed their
own learning algorithms to train the two layers. More importantly, in the Kohonen layer,
weight changes are computed only for the winning neurons. Only the winning
neurons are allowed to produce outputs for the Grossberg layer. Figure 7.1 shows the
essentials of a two-layer Kohonen self-organizing network.
[Figure 7.1: A two-layer network in which inputs X1 and X2 feed the Kohonen layer (neurons A, B, C, D), whose winning neuron feeds the Grossberg layer (neurons E, F, G) producing outputs O1, O2 and O3.]
Figure 7.1 shows how a Kohonen self-organizing network accepts the input X = [X1, X2]
and is trained for the desired output [O1, O2, O3]. Note that the Kohonen layer has
identified neuron C as the winning neuron (the neuron with the maximum net value) and it
alone becomes the input for the Grossberg layer.
In practice, self-organizing networks are able to model real world problems that
require a supervised training strategy as a whole.
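As a rough sketch of the winning-neuron idea: assuming, as in Figure 7.1, that the winner is the Kohonen-layer neuron with the largest net value, the selection could look like this (the weight values below are made-up illustrations, not taken from the lesson).

import numpy as np

def winning_neuron(W, x):
    """Return the index of the Kohonen-layer neuron with the largest net value.

    W : weight matrix with one row per Kohonen neuron
    x : (normalized) input vector
    """
    nets = W @ x                  # net value of every Kohonen neuron
    return int(np.argmax(nets))

# Four Kohonen neurons (A-D in Figure 7.1) receiving a two-component input
W = np.array([[0.2, 0.8],
              [0.9, 0.1],
              [0.5, 0.5],
              [0.3, 0.7]])
x = np.array([0.6, 0.8])
print(winning_neuron(W, x))       # only this neuron's output reaches the Grossberg layer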
Note that normalization is the process of dividing each component of a vector by the
vector's length, that is, the square root of the sum of the squares of its components.
The normalized form of vector X is denoted by X̄.
Example 7.1
Normalize the following input vector X and the weight vector W.
X = [1, 2, 1] W = [0.1, -0.2, 0.3]
Answer
‖X‖ = √(1² + 2² + 1²) = √6 ≈ 2.449, so X̄ = [1/√6, 2/√6, 1/√6] ≈ [0.408, 0.816, 0.408]
‖W‖ = √(0.1² + (-0.2)² + 0.3²) = √0.14 ≈ 0.374, so W̄ ≈ [0.267, -0.535, 0.802]
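A small Python check of this example, assuming normalization means dividing each component by the vector's length:

import numpy as np

def normalize(v):
    """Divide each component by the vector's length (square root of the sum of squares)."""
    return v / np.linalg.norm(v)

X = np.array([1.0, 2.0, 1.0])
W = np.array([0.1, -0.2, 0.3])
print(normalize(X))   # approximately [0.408, 0.816, 0.408]
print(normalize(W))   # approximately [0.267, -0.535, 0.802]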
In Kohonen layer training, also note the difference in the formula used to calculate weight
changes. Here, in Wnew = Wold + α(X – Wold), α is a constant. Note the term X – Wold,
where we subtract the old weights from the input vector X. Previously, we have not
computed a term of this kind; this is the distinctive feature of the formula for changing
weights in the Kohonen layer.
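A minimal sketch of one Kohonen-layer training step under this rule, applied only to the winning neuron; the constant α (0.5 below) and the weight values are illustrative assumptions.

import numpy as np

def kohonen_update(W, x, winner, alpha=0.5):
    """Move only the winning neuron's weights towards the input:
    Wnew = Wold + alpha * (X - Wold)."""
    W[winner] = W[winner] + alpha * (x - W[winner])
    return W

W = np.array([[0.2, 0.8],
              [0.9, 0.1],
              [0.5, 0.5]])
x = np.array([0.6, 0.8])
winner = int(np.argmax(W @ x))    # winning neuron, as in the earlier sketch
W = kohonen_update(W, x, winner)
print(W[winner])                  # these weights have moved towards x

Because only one neuron's weights are touched per input, this step is far cheaper than a full back propagation pass.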
The Kohonen layer of a counter propagation network can map an input with a large number
of components into a vector with few 1s and many 0s. Such an input can then be mapped
into a Grossberg layer with an arbitrarily small number of neurons. As such, a counter
propagation network can be used to encode inputs with a large number of components into
a vector with a smaller number of components. This is in fact what we do in data
compression. Therefore, counter propagation networks have been used for applications
such as data compression in audio and video streams during data transmission.
Since counter propagation networks can be trained much faster, they can be used as a
look-up for back propagation training sessions. For instance, if counter propagation
takes 8 hours to train on a real world data set, back propagation would definitely
take more than 8 hours. Further, the number of neurons in each layer of the counter
propagation network may be a good clue about a suitable architecture for the back
propagation network. Obviously, since back propagation can handle an arbitrary number of
layers, it can achieve a very high level of representation power for ANN, whereas a
two-layer Kohonen network has relatively low representation power. Therefore, where back
propagation is needed, a counter propagation network can serve as a look-up in such a
situation.
Statistical training is yet another paradigm for training ANN. This strategy can also be
used to train multi-layer ANN in the supervised mode. Statistical training also
follows the same McCulloch-Pitts model to calculate net and f(net). However,
statistical training offers a different method to change the weights on neurons. It is a
surprisingly simple approach, yet powerful enough to change weights effectively.
Statistical training suggests picking a weight at random and changing that particular
weight by a random amount. The whole subject of statistics is centred on the concept of
randomness, and so in statistical training we go by random weight changes. Recall
that, in contrast, the Hebbian, Perceptron and Delta learning rules all offer a deterministic
method (we know the exact value of the weight change) to change weights. It is
questionable whether our brains fire neurons randomly and store the learning effect in
a random manner. Perhaps our brain may work through random changes in the
weights sometimes, but it may not be the case all the time. Therefore, statistical
training has no real biological justification as such.
So after making a random weight change, we check whether the error has been
reduced by this weight change. Reducing the error is also described as improving
the objective function. If the error has been reduced, we accept the weight change,
go to the next input and continue with the training.
If the error has increased due to the random weight change, you might expect that we
reject the weight change and discard it. However, here is the beauty of statistical
training: it does not reject a weight change just because the error has increased.
Instead, it uses a probability distribution (e.g. the Boltzmann distribution) to check
whether the weight change can be accepted under a predefined probability. If the
probability calculated for the weight change is higher than the predefined probability,
we accept the weight change; otherwise we reject it.
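A minimal Python sketch of this accept/reject step, assuming a Boltzmann-style acceptance probability exp(-increase / T) with a gradually lowered "temperature" T; the error function, step size and temperature schedule below are illustrative assumptions, not prescribed by the lesson.

import numpy as np

def statistical_weight_step(weights, error_fn, temperature, step_size=0.1, rng=None):
    """Perturb one randomly chosen weight and accept or reject the change.

    A change that reduces the error is always accepted; a change that increases
    it is accepted with probability exp(-increase / temperature).
    """
    if rng is None:
        rng = np.random.default_rng()
    old_error = error_fn(weights)

    i = rng.integers(len(weights))              # pick one weight at random
    trial = weights.copy()
    trial[i] += rng.normal(scale=step_size)     # change it by a random amount

    new_error = error_fn(trial)
    if new_error <= old_error:
        return trial                            # error reduced: accept
    accept_prob = np.exp(-(new_error - old_error) / temperature)
    return trial if rng.random() < accept_prob else weights

# Toy usage: reduce the error of a single linear neuron on one training example
x, desired = np.array([1.0, 2.0, 1.0]), 1.0
error = lambda w: (desired - w @ x) ** 2
w = np.array([0.1, -0.2, 0.3])
T = 1.0
for _ in range(300):
    w = statistical_weight_step(w, error, temperature=T)
    T *= 0.98                                   # gradually lower the temperature
print(error(w))                                 # error should be much smaller than at the start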
The above process gives a very important message: we hardly ever reject a weight
change. Thus statistical training is not wasteful in terms of calculations. As a result,
statistical training is very dynamic and keeps the network moving all the time without
becoming stagnant or paralyzed. This is why some researchers have used statistical training
to address the issue of networks becoming paralyzed under back propagation
training.
7.6 Summary
This lesson discussed some more approaches to train ANN. We studied
Kohonen self-organizing networks in the form of counter propagation networks. These
networks can be trained on supervised data in a special kind of multi-layer ANN with
two layers. Here we studied the special concept of the winning neuron, with weight changes
applied only to those neurons in the Kohonen layer. This process reduces the training
time of counter propagation networks drastically. This lesson also discussed statistical
training, an approach to train ANN in the supervised mode. Statistical training introduces
random weight changes, as opposed to the deterministic weight changes used by
other learning rules.