
Lesson 08 – Design and development of ANN

8.1 Introduction
Lesson 7 discussed counterpropagation networks and statistical training. Both strategies can
be used to train multi-layer ANN under supervised training. Counterpropagation networks can
be trained very fast compared with backpropagation training. In contrast, statistical
training may accept any weight change, even one that temporarily increases the error, which
makes a training session very dynamic. Statistical training and counterpropagation networks
thus offer solutions to some of the issues in backpropagation, which remains the most
heavily used ANN training algorithm to date. Now we are going to learn some guidelines for
the design and training of ANN. Note that these guidelines are heuristics-based and should
be used only to get a general idea about the design and training of ANN.

8.2 Major concerns - Design and development of ANN


The major concerns about the design and training of ANN can be expressed from various
viewpoints. Perhaps the following are the most appropriate concerns to consider in the
design and development of ANN.

• Preparation of inputs
• Design of network architecture
• Controlling of training sessions

It should be noted that there is a large number of freely available toolkits for the
development of ANN. However, knowledge of the above three aspects is essential for the
effective use of such tools as well. In fact, many tools hide these aspects from the user and
work as black boxes for training ANN. Therefore, we discuss these aspects in some detail.

8.2.1 Preparation of inputs


As you may remember, net is calculated as the weighted sum of the components of an input.
Therefore, unless we keep the weights and the input components as small as possible, net can
grow into a very large value that saturates the threshold function and may even overflow
standard floating-point arithmetic.

In general, we can divide the components of an input vector by a large number, such as 100,
to make them small. Alternatively, the input vector can be normalized to make its
components smaller. Normalization is accepted as the better technique for this
purpose.
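
As a concrete illustration, the following Python sketch (using NumPy, with made-up
component values) shows both scaling options described above.

```python
import numpy as np

x = np.array([120.0, 45.0, 210.0])   # an illustrative raw input vector

# Option 1: divide by a large constant such as 100.
x_scaled = x / 100.0                 # [1.2, 0.45, 2.1]

# Option 2: normalize the vector to unit length -- the generally
# preferred technique, since every component then lies in [-1, 1].
x_normalized = x / np.linalg.norm(x)
```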

It should be noted that weight initialization for an ANN must be done not only with
small values but also with a mixture of positive and negative values. Otherwise, if
all weights are positive, even though they are small, the resulting net value can at some
point become considerably large.
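
A minimal sketch of such an initialization, assuming a 4x3 input-layer weight matrix and
an illustrative range of [-0.5, 0.5]:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Small weights drawn uniformly from [-0.5, 0.5], so the matrix
# mixes positive and negative values and net stays small.
W = rng.uniform(low=-0.5, high=0.5, size=(4, 3))
```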

It is also advisable not to use 0 as the value of an input component or a weight. This is
because multiplication of any number by zero yields zero, so such a component contributes
nothing to net. Therefore, you may encode 0 as another number, such as -1, to avoid this issue.
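
For example, a bipolar re-encoding of the zero components of an input vector might look
like this sketch; the replacement value -1 follows the suggestion above.

```python
import numpy as np

x = np.array([1, 0, 0, 1])

# Replace every 0 component with -1 so that no component
# nullifies its weight in the net calculation.
x_bipolar = np.where(x == 0, -1, x)   # [ 1, -1, -1,  1]
```
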
Bias
There can be an input whose components are all 0 but whose desired output is non-zero.
Think of an input pair such as X = [0, 0, 0] with D = [1, 1]. Here, whatever weights you
introduce, the net, and hence the output, will always be zero, so you can never obtain 1.
In order to obtain a non-zero output through the net, we can apply a modification to all
inputs: the addition of a non-zero component, usually 1 or -1, to every input. This
quantity is called the bias. For instance, once we use the bias, the above input is
modified to X = [0, 0, 0, -1].

In fact, introducing a bias for all inputs has become customary, regardless of whether an
input has all components zero. Of course, we apply the bias to all neurons, including
those in the hidden layers and the output layer. This does no harm.
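
A small helper of this kind can append the bias component to every input; following the
example above, we use -1 as the bias value (1 is equally common).

```python
import numpy as np

def add_bias(x, bias=-1):
    """Append a fixed bias component to an input vector."""
    return np.append(x, bias)

x = np.array([0, 0, 0])
print(add_bias(x))   # [ 0  0  0 -1], matching the example above
```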

8.2.2 Design of network architecture


As we already know, the higher the number of layers in an ANN, the higher its
representational power. More importantly, we mentioned that a three-layer ANN architecture
can model any real-world problem. An interesting question, then, is how we decide on the
number of neurons in each layer. Note that all of these guidelines are discussed with
regard to supervised learning, since supervised learning is the most useful, and the most
difficult, training strategy to use with ANN.

The following heuristics guide how we can decide on a network architecture, or topology,
comprising multiple layers with several neurons.

Number of neurons in the output layer


The first heuristic about the network architecture determines the nature of the output
layer: the number of components in a desired output can be taken as the number of neurons
in the output layer. This is a simple yet powerful heuristic for modelling an ANN.

For instance, if a data set has desired outputs such as D1 = [1, 0, 1, 1], we introduce four
neurons in the output layer. The decision about the number of neurons in the output layer
does not depend on the nature of the inputs.

Number of neurons in the input layer


The input layer of an ANN can have one or more neurons. With more neurons, the learning
effect is distributed more widely. Introducing more neurons implies many more calculations,
but can yield a better ANN model. For example, the input vector X = [1, 1, 0] can be
applied to an input layer with any number of neurons: one, two, three, and so on.

However, taking the number of components in an input as the number of neurons in the input
layer is a mathematically handy heuristic. This is because, when the number of components
in an input equals the number of neurons in the input layer, the weight matrix of the input
layer is a square matrix. There are many mathematical theories for analysing square
matrices, and such analysis is useful for further investigation of the behaviour of an ANN.

Number of neurons in the hidden layers
The number of hidden layers and the number of neurons in each hidden layer cannot be
decided beforehand. In fact, they are control parameters for training a neural network.
We experiment with different numbers of hidden layers and neurons, and select the
architecture that trains with the minimum error.

As such, the number of hidden layers and their neurons must be decided through
experiments. Note that the more hidden layers, the higher the representational power.
However, with too many hidden layers, training sessions become very long due to the
extensive calculations involved.

Since a three-layer ANN can model any real-world problem, we may begin with a network
architecture having one hidden layer and continue experimenting with higher numbers of
hidden layers and neurons, as in the sketch below.
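
As one way of running such experiments, the sketch below uses scikit-learn's MLPClassifier
to try several hidden-layer sizes and keep the one with the best validation score;
X_train, y_train, X_val, and y_val are assumed to be prepared beforehand, and the
candidate sizes are arbitrary choices.

```python
from sklearn.neural_network import MLPClassifier

best_score, best_size = -1.0, None
for n_hidden in (2, 4, 8, 16):
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,),
                        max_iter=2000, random_state=0)
    net.fit(X_train, y_train)        # train on the training set
    score = net.score(X_val, y_val)  # validate on held-out data
    if score > best_score:
        best_score, best_size = score, n_hidden

print(f"Best hidden layer size: {best_size} (accuracy {best_score:.3f})")
```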

8.2.3 Controlling of training sessions


Training a neural network is a tedious task. Although we can write computer programs
to calculate net and the weight changes, a training session cannot be fully automated. This
is because the key challenge in ANN training is to obtain a trained network subject to a
minimum Emax and high accuracy. In a typical training session, when the error keeps going
beyond Emax, we discard the training and restart the session with new weights. Sometimes,
re-initializing the weights does not solve the problem, and we have to try out other
control parameters. As such, we can also change the following parameters to control a
training session.

• Threshold function
• Learning constant
• Momentum coefficient
• Order of application of data

Generally, the bipolar continuous function offers higher flexibility in a training session,
yet involves more calculation. The learning constant (η) can also be increased or decreased
if the error cannot be kept below Emax. Generally, smaller η values give smoother training
sessions, but such sessions are liable to paralysis. Thus, we must find the best value
through experiments. The momentum coefficient can also be tried out, as in the sketch below.
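
To make the roles of η and the momentum coefficient concrete, here is a minimal sketch of
a single weight update with a momentum term; the default values of eta and alpha are
illustrative only.

```python
import numpy as np

def update_weights(W, grad, prev_delta, eta=0.1, alpha=0.9):
    """One gradient-descent step with momentum.

    eta   -- learning constant (smaller values give smoother training)
    alpha -- momentum coefficient (carries over part of the last step)
    """
    delta = -eta * grad + alpha * prev_delta
    return W + delta, delta   # return new weights and the step taken
```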

The order in which training data are applied to an ANN is also very crucial. Suppose you
are training an ANN to recognize handwritten numbers. If you keep applying various forms of
the digit 1 and then proceed to train on different handwritten 2s, the error will obviously
shoot up, because the patterns of 2 are very different from those of 1. To address this
issue, we never repeatedly apply input data from the same class or category; instead, we
randomize the application of inputs over the different classes or categories, for example
as follows.
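
A simple way to do this, assuming the inputs X and desired outputs D are held in NumPy
arrays with one pair per row, is to shuffle both with the same permutation at the start of
every training cycle:

```python
import numpy as np

# Shuffle inputs and desired outputs with one shared permutation,
# so consecutive inputs rarely come from the same class.
order = np.random.permutation(len(X))
X_shuffled, D_shuffled = X[order], D[order]
```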

This scenario is very familiar to us. For instance, when preparing for the GCE (A/L), you
would never study chemistry for one whole day, then physics, and then biology afterwards.
We always prefer to randomize access to subjects within a learning session. Otherwise, the
brain tends to remember rather than generalize, and it returns a very high error when
presented with something reasonably new.

Example 8.1
Propose (a) the simplest possible and (b) a more appropriate network architecture for
training the following input pairs from a data set. Also propose a suitable learning
algorithm for this network.

X1 = [1, -1, 0.2, 1]   D1 = [1, 1]
X2 = [1, 0.5, 1, -1]   D2 = [1, -1]
X3 = [1, 0.5, -1, 1]   D3 = [-1, 1]

Solution

(a) Simplest possible architecture


Since each output has two components, we propose two neurons for the output layer.
Further, we can introduce one or more neurons for the input layer; since we are looking for
the simplest possible network, we propose a single neuron for the input layer. In addition,
since this data requires a supervised training strategy, the simplest possible architecture
should have at least two layers. The proposed network architecture is shown in Figure 8.2.
We can use the delta learning rule with the backpropagation algorithm for this purpose.

[Figure: a single input-layer neuron connected to two output-layer neurons]

Figure 8.2 – Simplest possible architecture

(b) More appropriate architecture


Note that if you are looking for a more appropriate architecture, the network would have
three layers. This is because an ANN with three layers can model any real-world problem
without reservation. Therefore, going for a 3-layer architecture is safer.

In this architecture, as above, the output layer should have two neurons. However, to
achieve a better distribution of the learning effect, it is appropriate to have more links.
Thus, four neurons in the input layer, one per input component, would be justifiable from a
mathematical viewpoint. In addition, there should be at least one hidden layer for the most
promising yet simplest network architecture; the number of neurons in the hidden layer must
be decided through experiments. Figure 8.3 shows a more appropriate network architecture to
model the above problem. Again, we can use the delta learning rule with the
backpropagation algorithm.
[Figure: input layer I1–I4 fully connected to hidden layer H1–Hn, which connects to output layer O1 and O2]

Figure 8.3 – More appropriate architecture

8.3 A successful training session


It has been customary to graphically visualize the fluctuation of error during a training
session of an ANN. In a typical training session, we want the error to stay below a certain
Emax value and to decrease over the training cycles. Figure 8.4 can be considered the error
graph of a successful training session.

[Figure: error plotted against training cycle, decreasing over the cycles and staying below the horizontal Emax line]

Figure 8.4 – A successful training session

This kind of graph helps us to detect issues in a training session and apply solutions
accordingly. Below are some interesting such situations.
Case 1
The error always goes beyond Emax. Obviously, we have to resort to strategies such as
weight re-initialization or changing the threshold function, the learning constant, and so
on. This can also happen if our Emax is too ambitious; in that case, it is better to try a
larger (less strict) Emax value. You can also change the number of neurons and the number
of layers in the network architecture.

Case 2
The error is less than Emax, but it neither increases nor decreases. This happens due to
network paralysis. We can adjust the momentum coefficient to address this issue.

Case 3
The error has been decreasing over the cycles, yet suddenly goes up without exceeding Emax.
You may have introduced an input which is very different from the previously used inputs.
We had better reshuffle the input data to address this issue; a simple monitor such as the
sketch below can help flag all three cases.
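
As a rough illustration, a hypothetical monitor over the recorded error history might flag
the three cases as follows; the window size and thresholds are arbitrary assumptions, not
part of any standard recipe.

```python
import numpy as np

E_MAX = 0.05  # illustrative value; choose to suit the problem

def diagnose(errors, window=10):
    """Crudely classify the recent error history into the three cases."""
    recent = np.asarray(errors[-window:])
    if np.all(recent > E_MAX):
        return "Case 1: stuck above E_max -- re-initialize weights or relax E_max"
    if np.ptp(recent) < 1e-6:
        return "Case 2: error flat -- possible paralysis, adjust momentum"
    if len(recent) > 1 and recent[-1] > 2 * np.median(recent[:-1]):
        return "Case 3: sudden jump -- reshuffle the order of the inputs"
    return "Training progressing normally"
```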

8.4 Toolkits for design and development of ANN


There are plenty of open-source toolkits available for training ANN. These tools generally
provide facilities for importing input data and executing training. Some tools also provide
facilities to define a network architecture; to select the threshold function, bias, and
momentum; and to change other learning parameters. Most of these tools offer graphical user
interfaces and facilities to visualise error fluctuations.

Activity 8.1
Download a toolkit for training ANN. Study its features and the examples provided on the
use of ANN technology.
