
Lesson 07 – Counter propagation and statistical training

7.1 Introduction
Lesson 06 discussed the Delta learning rule and the back propagation training algorithm
for training ANN. It was noted that the Delta learning rule is a learning rule with a
sound mathematical basis, and that back propagation offers a unique solution for training
multi-layer ANN entirely in the supervised mode. We also noted that the back propagation
algorithm involves a large number of calculations, so training can be very time
consuming. Nevertheless, back propagation can model any real world problem that requires
supervised training of a multi-layer ANN. This lesson discusses some issues with back
propagation and some extensions to training algorithms that address its limitations. In
particular, we discuss training in counter propagation networks and statistical training.

7.2 Limitations of Back propagation


Although the back propagation training algorithm offers a unique way to train multi-layer
neural networks entirely in the supervised mode, it has some limitations.
Among others, the following are frequently cited.

7.2.1 Extensive training time


Back propagation is very time consuming in general, mainly due to the extensive
calculations involved in applying the algorithm. This is a crucial issue since most ANN
comprise hundreds of neurons in each layer, and we need to calculate net, f(net), f'(net)
and the weight change for every neuron. Sometimes we have to discard long training
sessions because the error grows beyond the predefined maximum error.

However, modern computers are powerful enough to support online training of ANN. This has
been practiced successfully in the training of flight simulators, which receive training
data from pilots while they fly. Therefore, long training time is becoming an obsolete
concern for ANN, especially in view of emerging powerful computer technologies.

7.2.2 Local minimum problem


All theories are valid until we find a scenario in which the theory does not hold, and
this applies to ANN too. In fact, it is a rather serious concern for machine learning
techniques such as ANN. Why? An ANN can provide a solution, but there is no guarantee
that this particular solution is the best. Within the boundaries of our training sessions
it may appear to be the best, yet a better solution might be found if we do more
experiments. Our inability to find the globally best solution is called the local minimum
problem. The only way to overcome this problem is to try a large number of network
architectures under various parameter settings and pick the best among them. This is why
we need to experiment to find the ANN that models our problem with the least possible
maximum error. Testing of the network solution should also demonstrate a high accuracy
rate. As such, an ANN solution must be accepted on the basis of a large number of
experiments carried out to test the performance of the network.

7.2.3 Paralysis of a training session


Sometimes an ANN training session falls into a situation where the error of the training
cycles no longer increases or decreases, but stagnates at some value above the predefined
Emax. Any attempt to improve the training session then has no effect. Being stagnated, or
paralyzed, is worse than an increasing error, because a paralyzed network shows no
response whatsoever. This is an issue, as we expect the error to go down over the
training cycles; that is the whole point behind the Delta learning rule, on which the
back propagation algorithm is based.

With the Delta learning rule, it is quite natural for a network to fall into such a
non-responding state. Why? As we have already seen, the weight changes produced by the
Delta learning rule are always very small, and such changes may not be big enough for the
network to register that anything is happening. Therefore, in order to produce a tangible
change in a training session, researchers have added a non-zero term to the Delta
learning rule. This term is called momentum. It is similar to giving a push to a vehicle
that cannot move or start.

Momentum for a neuron is defined as a constant, α, times the difference between the
weights in the previous two consecutive cycles.

Momentum = α(wt – wt-1)

Now we can modify the Delta learning rule in the following manner to address the issue of
network paralysis, adding the momentum term to the usual Delta-rule weight change:

Δw = c(d – o)f'(net)X + α(wt – wt-1)

We can use α (the momentum coefficient) as another parameter to control a training
session. Note that initialising the weights and changing parameters such as the learning
constant c, the momentum coefficient α and the activation function f(net) all serve the
single objective of finding a network which trains subject to the least possible maximum
error.
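To make the role of the momentum term concrete, the following is a minimal Python/NumPy
sketch of one weight update for a single sigmoid neuron, with momentum added to a
Delta-rule style change. The function name, the learning constant c, the momentum
coefficient alpha and the use of the sigmoid are illustrative assumptions, not values
taken from the lesson.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def delta_update_with_momentum(w, x, d, prev_delta_w, c=0.1, alpha=0.9):
    """One Delta-rule weight update for a single sigmoid neuron,
    with a momentum term alpha * (previous weight change) added."""
    net = np.dot(w, x)
    o = sigmoid(net)
    f_prime = o * (1.0 - o)                 # derivative of the sigmoid
    delta_w = c * (d - o) * f_prime * x     # usual Delta-rule change
    delta_w += alpha * prev_delta_w         # momentum: keep pushing in the previous direction
    return w + delta_w, delta_w

# Example: repeated updates on one training pair
w = np.array([0.1, -0.2, 0.3])
x = np.array([1.0, 2.0, 1.0])
d = 1.0
prev = np.zeros_like(w)
for _ in range(5):
    w, prev = delta_update_with_momentum(w, x, d, prev)
```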

For humans too, learning is always done so that the error in learning is kept as small as
possible, virtually zero.

Next we discuss some more strategies to train ANN. In a broader sense, these strategies
have been developed as extensions of the back propagation training algorithm.
Activity 7.1
Do a survey on the use of the back propagation training algorithm for modelling some real
world problems. Discuss the specifications of such networks (e.g. number of layers,
neurons in each layer, activation function, Emax and accuracy rate).

7.3 Counter propagation networks


The name counter propagation stands for computing weight changes in the opposite
direction to that of the back propagation algorithm. In other words, the counter
propagation approach describes how a multi-layer ANN can be trained in the supervised
mode, computing from the input layer to the output layer, rather than in the way the back
propagation algorithm works.

In fact, we have already mentioned that supervised training data can be modelled by an
ANN which uses a combination of unsupervised and supervised training algorithms.

Two researchers, Kohonen and Grossberg, have proposed a special kind of network
architecture and a method to train a multi-layer ANN by using unsupervised and supervised
training algorithms together. This approach is much simpler than using algorithms such as
Hebbian and Perceptron together. Let us discuss the Kohonen and Grossberg approach, which
is known as Kohonen self-organizing networks.

7.4 Kohonen Self-organizing Networks


When using learning algorithms such as Hebbian, Perceptron, Delta learning and back
propagation, we treated all neurons equally and updated every weight without exception.
However, it is interesting to notice that some neurons in an ANN may fire with higher
outputs. Kohonen named such neurons winning neurons. He also proposed to change weights
only for the winning neurons, rather than for all neurons, which reduces the time
required for calculating weight changes. ANN that update their weights according to
winning neurons are called self-organizing networks, because such selective weight
changes allow the network to organize itself to accommodate a better learning session.

Kohonen proposed a self-organizing network with two layers. In Kohonen self-organizing
networks, the input layer and the output layer are named the Kohonen layer and the
Grossberg layer respectively. The number of neurons in each layer is arbitrary. Kohonen
proposed an unsupervised strategy to train the input layer, while Grossberg proposed a
supervised strategy to train the output layer; each proposed his own learning algorithm
for his layer. More importantly, in the Kohonen layer, weight changes are computed only
for the winning neurons, and only the winning neurons are allowed to produce outputs for
the Grossberg layer. Figure 7.1 shows the essentials of a two-layer Kohonen
self-organizing network.
[Figure 7.1: inputs X1 and X2 feed the Kohonen layer (neurons A, B, C and D); the winning
Kohonen neuron feeds the Grossberg layer (neurons E, F and G), which produces the outputs
O1, O2 and O3.]

Figure 7.1 Kohonen Self-organizing Networks

Figure 7.1 shows how a Kohonen self-organizing network accepts the input X = [X1, X2] and
is trained for the desired output [O1, O2, O3]. Note that the Kohonen layer has
identified neuron C as the winning neuron (the neuron with the maximum net value), and it
alone provides the input for the Grossberg layer.

In practice, self-organizing networks are able to model real world problems that require
a supervised training strategy as a whole.

7.4.1 Training the Kohonen layer


Training the Kohonen layer requires some special arrangements for the inputs and the
weights. The learning rule proposed for the Kohonen layer is also different from the
learning rules that we have discussed so far. The essential points about training the
Kohonen layer are as follows.

• Normalize the input vector
• Normalize the weight vector
• Calculate net for all neurons using net = Σ xiwi
• Identify the winning neuron(s) by finding the maximum net
• Calculate the weight change only for the winning neuron using
  Wnew = Wold + α(X – Wold)
• Continue the procedure for all inputs

Note that normalization is the process of dividing each component of a vector by the
vector's length, that is, by the square root of the sum of the squares of its components.
The normalized form of vector X is denoted by ‖X‖.

Example 7.1
Normalize the following input vector X and the weight vector W.
X = [1, 2, 1] W = [0.1, -0.2, 0.3]

Answer
X = [1, 2, 1]

 X  = [1, 2, 1]/(12 + 22 + 12)

= [1/6, 2/6, 1/6]


= [0.4, 0.8, 0.4]

W = [0.1, -0.2]

 W  = [0.1, -0.2, 0.3]/(0.12 + -0.22 + 0.32)


= [0.1, -0.2, 0.3] (0.14)

= [0.26, 0.53, 0.80]
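The same arithmetic can be checked with a few lines of Python/NumPy. This is only a
verification sketch of Example 7.1; the helper name normalize is an illustrative
assumption.

```python
import numpy as np

def normalize(v):
    """Divide a vector by its length (square root of the sum of squares)."""
    return v / np.sqrt(np.sum(v ** 2))

print(normalize(np.array([1.0, 2.0, 1.0])))    # -> [0.408 0.816 0.408]
print(normalize(np.array([0.1, -0.2, 0.3])))   # -> [0.267 -0.535 0.802]
```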

In Kohonen layer training, also note the difference in the formula used to calculate
weight changes. In Wnew = Wold + α(X – Wold), α is a constant. Note the term X – Wold,
where we subtract the old weights from the input vector X. Previously we have not
computed a term of this kind; it is a distinctive feature of the formula for changing
weights in the Kohonen layer.
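Putting the listed steps together, here is a minimal Python/NumPy sketch of one
Kohonen-layer training pass: normalize the input and the weight vectors, compute net for
every neuron, pick the winning neuron as the one with the maximum net, and move only its
weight vector towards the input. The learning constant alpha, the random initial weights
and the example inputs are illustrative assumptions.

```python
import numpy as np

def normalize_rows(M):
    # Normalize each weight vector (row) to unit length
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def kohonen_step(W, x, alpha=0.3):
    """One unsupervised Kohonen update. W has one row of weights per neuron."""
    x = x / np.linalg.norm(x)             # normalize the input vector
    W = normalize_rows(W)                 # normalize the weight vectors
    nets = W @ x                          # net = sum of xi * wi for each neuron
    winner = int(np.argmax(nets))         # winning neuron: maximum net
    W[winner] += alpha * (x - W[winner])  # Wnew = Wold + alpha * (X - Wold), winner only
    return W, winner

# Example: 4 Kohonen neurons and 2 inputs, as in Figure 7.1
rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(4, 2))
for x in [np.array([1.0, 2.0]), np.array([0.5, -1.0])]:
    W, winner = kohonen_step(W, x)
```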

7.4.2 Training of Grossberg layer


The output of the Kohonen layer works as the input for the Grossberg layer. However, the
outputs of all Kohonen neurons are not considered when creating inputs for the Grossberg
layer. Instead, we set the output of the winning neurons in the Kohonen layer to 1, and
the outputs of the other neurons to 0. The Grossberg layer receives this input and is
trained in the supervised mode, using a special learning rule. The essential points about
training the Grossberg layer are as follows.

• Set the output of the winning neurons in the Kohonen layer to 1 (and of the other
  neurons to 0) to prepare the input for the Grossberg layer
• Normalize the weights and inputs for the Grossberg layer
• Use the formula Wnew = Wold + β(D – Wold)Y to calculate the weight change in the
  Grossberg layer
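As a companion to the list above, the following is a minimal Python/NumPy sketch of one
Grossberg-layer update following Wnew = Wold + β(D – Wold)Y, where Y is the 0/1 Kohonen
output and D is the desired output vector. The learning constant beta, the matrix shapes
and the example numbers are illustrative assumptions.

```python
import numpy as np

def grossberg_step(V, winner, d, beta=0.1):
    """Supervised update of the Grossberg layer.
    V holds one row of weights per output neuron and one column per Kohonen
    neuron; only the winning neuron's column changes, because Y is 1 there
    and 0 elsewhere."""
    y = np.zeros(V.shape[1])
    y[winner] = 1.0                              # winner outputs 1, the rest 0
    V = V + beta * (d[:, None] - V) * y[None, :] # Wnew = Wold + beta * (D - Wold) * Y
    return V

# Example: 3 output neurons (O1, O2, O3) and 4 Kohonen neurons
V = np.zeros((3, 4))
d = np.array([1.0, 0.0, 0.5])            # desired output for this input
V = grossberg_step(V, winner=2, d=d)     # neuron C (index 2) won in the Kohonen layer
```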

7.4.3 Features of counter propagation networks


Counter propagation networks have many advantages over back propagation. Most
importantly, counter propagation does not involve extensive calculations as back
propagation does. Therefore, the training time for counter propagation networks can be a
hundredfold shorter than that of back propagation.

The Kohonen layer of a counter propagation network can map an input with a large number
of components into a vector with a few 1s and many 0s. Such a vector can then be mapped
into a Grossberg layer with an arbitrarily small number of neurons. As such, a counter
propagation network can be used to encode inputs with a large number of components into
vectors with a smaller number of components. This is in fact what we do in data
compression. Therefore, counter propagation networks have been used for applications such
as data compression of audio and video streams during data transmission.
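The compression idea can be illustrated with a short sketch: each input is replaced by
the index of its winning Kohonen neuron (a form of vector quantization), and the
Grossberg weights attached to that neuron reconstruct an approximate output at the
receiving end. The tiny codebook and the numbers below are purely illustrative
assumptions, not part of the lesson.

```python
import numpy as np

def encode(W, x):
    """Compress x to the index of its winning Kohonen neuron."""
    x = x / np.linalg.norm(x)
    return int(np.argmax(W @ x))

def decode(V, index):
    """Reconstruct an approximate output from the Grossberg weights
    stored for that Kohonen neuron."""
    return V[:, index]

# Tiny illustration: 4-component inputs are reduced to one of 3 codes
W = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.7],
              [0.5, 0.5, 0.5, 0.5]])
V = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
code = encode(W, np.array([1.0, 0.2, 0.1, 0.0]))  # only this integer is transmitted
approx = decode(V, code)                          # the receiver reconstructs 2 values
```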
Since counter propagation networks can be trained much faster, they can be used as a
lookup for back propagation training sessions. For instance, if counter propagation takes
8 hours to train on a real world data set, back propagation would certainly take more
than 8 hours. Further, the number of neurons in each layer of the counter propagation
network may give a good clue about the architecture of the back propagation network.
Obviously, since back propagation can handle an arbitrary number of layers, it can
achieve a very high level of representation power for an ANN, whereas a two-layer Kohonen
network has relatively low representation power. Therefore, we still need back
propagation, and counter propagation can serve as a lookup in such a situation.

7.5 Statistical Training

Statistical training is yet another paradigm for training ANN. This strategy, too, can be
used to train multi-layer ANN in the supervised mode. Statistical training also follows
the same McCulloch-Pitts model to calculate net and f(net). However, it offers a
different method for changing the weights of neurons: a surprisingly simple, yet
powerful, approach.

Statistical training suggests picking a weight randomly and changing that particular
weight by a random amount. The whole subject of statistics is centred on the concept of
randomness, and in statistical training we go by random weight changes. Recall that, in
contrast, the Hebbian, Perceptron and Delta learning rules all offer a deterministic
method to change a weight (we know the exact value of the weight change). It is
questionable whether our brains fire neurons randomly or store the effect of learning in
a random manner. Perhaps our brain may work through random changes in the weights
sometimes, but it may not be the case all the time. Therefore, statistical training has
no biological justification as such.

After making a random weight change, we check whether the error has been reduced by this
change. Reducing the error is also referred to as improving the objective function. If
the error has been reduced, we accept the weight change, go to the next input and
continue with the training.

If the error has increased due to the random weight change, you might expect that we
simply reject the weight change and discard it. However, here is the beauty of
statistical training: it does not reject a weight change just because the error has
increased. Instead, it uses a probability distribution (e.g. the Boltzmann distribution)
to check whether the weight change can be accepted under a predefined probability. If the
probability calculated for the weight change is higher than the predefined probability,
we accept the weight change; otherwise we reject it.
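A minimal sketch of this accept/reject logic in Python/NumPy is shown below. The error
function, the size of the random perturbation and the "temperature" T used in the
Boltzmann-style acceptance probability exp(-ΔE/T) are illustrative assumptions, not
values from the lesson.

```python
import numpy as np

def statistical_step(w, error_fn, step=0.05, T=1.0, rng=np.random.default_rng()):
    """Pick one weight at random, change it by a random amount, and accept the
    change if the error drops, or with probability exp(-increase / T) otherwise."""
    i = rng.integers(len(w))              # pick a weight at random
    trial = w.copy()
    trial[i] += rng.normal(0.0, step)     # random weight change
    delta_e = error_fn(trial) - error_fn(w)
    if delta_e <= 0 or rng.random() < np.exp(-delta_e / T):
        return trial                      # accept the change
    return w                              # reject the change

# Example: drive the weights towards a target by minimizing a quadratic error
error = lambda w: float(np.sum((w - np.array([0.2, -0.4, 0.6])) ** 2))
w = np.zeros(3)
for _ in range(1000):
    w = statistical_step(w, error)
```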
The above process conveys a very important message: we rarely reject a weight change.
Thus statistical training is not wasteful in terms of calculations. As a result,
statistical training is dynamic and keeps the network moving all the time without
becoming stagnated or paralyzed. This is why some researchers have used statistical
training to address the issue of networks being paralyzed under back propagation
training.

7.6 Summary
This lesson discussed some more approaches to training ANN. We studied Kohonen
self-organizing networks as counter propagation networks. Kohonen networks can be trained
on supervised data in a special kind of multi-layer ANN with two layers. Here we studied
the special concept of the winning neuron, with weight changes applied only to those
neurons in the Kohonen layer. This drastically reduces the training time of counter
propagation networks. The lesson also discussed statistical training, an approach to
train ANN in the supervised mode. Statistical training introduces random weight changes,
as opposed to the deterministic weight changes used by other learning rules.
