
European Journal of Scientific Research

ISSN 1450-216X / 1450-202X Vol.121 No.2, 2014, pp.217-225


http://www.europeanjournalofscientificresearch.com

Investigation and Analysis of ANN Parameters

Khaled M. Matrouk
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan
E-mail: khaled.matrouk@ahu.edu.jo
Tel: +962-3-2179000 (ext. 8503), Fax: +962-3-2179050, Mobile: +962-779-469339

Haitham A. Alasha'ary
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan

Abdullah I. Al-Hasanat
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan

Ziad A. Al-Qadi
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan

Hasan M. Al-Shalabi
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan

Abstract

Different ANN architectures with a range of values for a set of network parameters are tested and analyzed in order to determine the relationship between the performance factors and the ANN parameters; it is shown how the ANN parameters can be used to estimate ANN training time and absolute error.

Keywords: ANN, Training Time, Absolute Error, Epochs, Learning Rate

1. Introduction
Artificial neural networks (ANNs) are difficult to capture in a simple definition (Vennapusa et al., 2011; Li and Yeh, 2002; Zupan and Gasteiger, 1993). Perhaps the closest description is a comparison with a black box: a multiple-input, multiple-output structure that operates using a large number of mostly parallel-connected simple arithmetic units. The most important thing to remember about all ANN methods is that they work best when the dependence between the inputs and outputs is non-linear (Rumelhart et al., 1986; Kirenko, 2006) (Fig. 1).
ANNs can also be employed to describe or to find linear relationships (Ferzli and Karam, 2007), but the final result is often worse than that of a simpler standard statistical technique. Because at the beginning of an experiment we often do not know whether the responses are related to the inputs in a linear or a non-linear way, it is good practice to always try a standard statistical technique for interpreting the data in parallel with the ANN.
Figure 1: Neural network as a black box featuring the non-linear relationship between the multivariate input variables and the multivariate responses

The first thing to be aware of when considering the use of ANNs is the nature of the problem (Kohonen, 1988; Hecht-Nielsen, 1987a; Hecht-Nielsen, 1987b) we are trying to solve: does the problem require a supervised or an unsupervised approach? A supervised problem means that the chemist already has at hand a set of experiments with known outcomes for specific inputs, while an unsupervised problem means that one deals with a set of experimental data which have no specific associated answers (or supplemental information) attached. Typically, unsupervised problems arise when the chemist has to explore experimental data collected during pollution monitoring, when he or she faces the measured data for the first time, or when a good method must be found to display the data in the most appropriate way.
Usually, the first problems associated with handling data require unsupervised methods. Only later, after we have become more familiar with the measurement space (the measurement regions) of the input and output variables and with the behavior of the responses, can we select the data sets on which the supervised methods (modeling, for example) can be carried out.
The basic types of goals or problems for whose solution (Hecht-Nielsen, 1988; Zupan et al., 1994) ANNs can be used are the following:
•	Selection of samples from a large quantity of existing ones for further handling
•	Classification of an unknown sample into one of several pre-defined (known in advance) classes (Zupan et al., 1994; Forina and Armanino, 1982)
•	Clustering of objects, i.e., finding the inner structure of the measurement space to which the samples belong, and building direct and inverse models for predicting the behavior or effects of unknown samples in a quantitative manner (Derde and Massart, 1982; Zupan and Massart, 1989)
Once we have decided which type of problem we have, we can look for the best strategy or method to solve it. Of course, in any of the above settings we can employ one or more different ANN architectures and different ANN learning strategies.
An artificial neuron is supposed to mimic the action of a biological neuron, i.e., to accept many different signals, xi, from many neighboring neurons (Derde and Massart, 1986) and to process them in a pre-defined simple way. Depending on the outcome of this processing, the neuron j decides either to fire an output signal yj or not. The output signal (if it is triggered) can be either 0 or 1, or can take any real value between 0 and 1. Mainly for historical reasons, the function f(x) which calculates the output from the m-dimensional input vector x is regarded as being composed of two parts. The first part evaluates the so-called 'net input', while the second 'transfers' the net input in a non-linear manner to the output value y.
The first function is a linear combination of the input variables, x1, x2, ..., xi, ..., xm, multiplied by the coefficients, wji, called 'weights', while the second function serves as a 'transfer function' because it 'transfers' the signal(s) through the neuron's axon to the other neurons' dendrites.
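A minimal numerical sketch of this two-part computation in MATLAB (the input, weight, and bias values below are illustrative placeholders, not taken from the paper):

```matlab
% One artificial neuron j: net input followed by a sigmoid transfer function.
x = [0.9; 0.6; 0.5];               % m-dimensional input vector (m = 3)
w = [0.2; -0.5; 0.7];              % weights w_ji leading into neuron j
b = 0.1;                           % weight on the bias input (bias input = 1)

net_j = w' * x + b;                % first part: the 'net input'
y_j   = 1 / (1 + exp(-net_j));     % second part: transfer to an output in (0, 1)
```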

Artificial neural networks (ANNs) can be composed of different numbers of neurons. In chemical applications, the size of ANNs (i.e., the number of neurons) can range from tens of thousands down to fewer than ten (1-3). The neurons in an ANN can all be put into one layer, or two, three, or even more layers of neurons can be formed. Fig. 2 shows the difference between the one-layer and the multilayer ANN structure. In Fig. 2 the one-layer network has four neurons (sometimes called nodes), each having four weights; altogether there are 16 weights in this one-layer ANN. Each of the four neurons accepts all input signals plus an additional input from the bias, which is always equal to one. The fact that the bias input is equal to 1, however, does not prevent the weights leading from the bias towards the nodes from being changed!
The two-layer ANN (Fig. 2, right) has six neurons (nodes): two in the first layer and four in the second, or output, layer. Again, all neurons in one layer receive all signals coming from the layer above. The two-layer network has (4 x 2) + (3 x 4) = 20 weights: 8 in the first layer (two neurons, each with three inputs plus a bias) and 12 in the second (four neurons, each with two inputs plus a bias). It is understood that the input signals are normalized between 0 and 1. A forward pass through this architecture is sketched below.
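A minimal sketch of that forward pass, with random placeholder weights (the layer sizes follow Fig. 2; nothing here comes from the paper's own code):

```matlab
% Forward pass through the two-layer ANN of Fig. 2 (right):
% 3 inputs, 2 first-layer neurons, 4 output neurons; weights are placeholders.
x  = rand(3, 1);                       % normalized inputs in [0, 1]
W1 = rand(2, 3);  b1 = rand(2, 1);     % first layer: 2 x (3 + 1) = 8 weights
W2 = rand(4, 2);  b2 = rand(4, 1);     % second layer: 4 x (2 + 1) = 12 weights

h = 1 ./ (1 + exp(-(W1 * x + b1)));    % outputs of the two first-layer neurons
y = 1 ./ (1 + exp(-(W2 * h + b2)));    % the four responses
```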
In Fig. 2 the neurons are drawn as circles and the weights as lines connecting the circles (nodes) in different layers. As can be seen, the lines (weights) between the input variables and the first layer have no nodes or neurons marked at the top.
To overcome this inconsistency, the input is regarded as a non-active (buffer) layer of neurons serving only to distribute the signals to the first layer of active neurons. Therefore, Fig. 2 actually shows a 2-layer and a 3-layer network with the input layer being inactive. The reader should be careful when reading the literature on ANNs, because authors sometimes refer to the above ANNs as the two- and three-layer ones. We shall regard only the active layers of neurons as actual layers and will therefore name these networks one- and two-layer ANNs (Zupan, 1989; Varmuza and Lohninger, 1990).

Figure 2: One-layer (left) and two-layer (right) ANNs. The ANNs shown can be applied to solve a problem with a 3-variable input and 4 output responses

2. Experimental Part
Artificial neural network performance depends on the network architecture (number of layers and number of neurons in each layer) and on the network parameters, such as the learning rate, the number of training cycles (epochs), and the activation function used in each layer.

Here we investigate and analyze ANN performance in terms of the absolute error and the training time, and we look for the relationship between the ANN performance factors and the parameters: learning rate, number of hidden layers, and number of epochs.

2.1. Experiment 1
An ANN with two inputs, no hidden layers, 50 epochs, and a learning rate of 0.1 was created, trained, and simulated using the input data shown in Table 1.

Table 1: Input Data Sets, Target, Simulated Output and Relative Output Error

Data set 1:            0.9501 0.2311 0.6068 0.4860 0.8913 0.7621 0.4565 0.0185 0.8214 0.4447
Data set 2:            0.6154 0.7919 0.9218 0.7382 0.1763 0.4057 0.9355 0.9169 0.4103 0.8936
Target output:         0.5847 0.1830 0.5594 0.3588 0.1571 0.3092 0.4270 0.0170 0.3370 0.3974
Simulated output:      0.5799 0.1716 0.5414 0.3673 0.1570 0.3055 0.4355 0.0171 0.3430 0.4100
Relative output error: 0.82%  6.23%  3.22%  2.37%  0.06%  1.20%  1.99%  0.59%  1.78%  3.17%

Comparing the values of the target and the simulated output, we can see that the two are very close, which means that the absolute error between them is minimal, as shown by the relative output error row.
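The paper does not list its MATLAB code; a hedged sketch of how Experiment 1 might be reproduced with the legacy Neural Network Toolbox calls (newff/train/sim) follows. The choice of transfer function and of gradient-descent training ('traingd') are assumptions.

```matlab
% Hedged reconstruction of Experiment 1 (network details are assumptions).
P = [0.9501 0.2311 0.6068 0.4860 0.8913 0.7621 0.4565 0.0185 0.8214 0.4447;
     0.6154 0.7919 0.9218 0.7382 0.1763 0.4057 0.9355 0.9169 0.4103 0.8936];
T =  [0.5847 0.1830 0.5594 0.3588 0.1571 0.3092 0.4270 0.0170 0.3370 0.3974];

net = newff(minmax(P), [1], {'logsig'}, 'traingd');  % no hidden layer
net.trainParam.epochs = 50;                          % number of training cycles
net.trainParam.lr     = 0.1;                         % learning rate
net = train(net, P, T);                              % train on the Table 1 data
Y   = sim(net, P);                                   % simulated output (row 4)
relErr = abs(T - Y) ./ T * 100;                      % relative error % (row 5)
```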

2.2. Experiment 2
Different ANN architectures with different parameters (number of hidden layers, epochs, and learning rate) were created in MATLAB, then trained and tested using the data sets shown in Table 1. The experimental results are shown in Tables 2 through 7.
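The sweep structure is not given in the paper; a plausible sketch of one sweep (reusing P and T from the previous listing, and assuming four neurons in the hidden layer, since the paper does not state the hidden-layer size or its error definition) is:

```matlab
% Hedged sketch of one Experiment 2 sweep: vary the epochs x2 while fixing
% the number of hidden layers x1 = 1 and the learning rate x3 = 0.1.
epochsList = [5 10 20 30 40 50 60 100 200];
for e = epochsList
    net = newff(minmax(P), [4 1], {'logsig','logsig'}, 'traingd');
    net.trainParam.epochs = e;
    net.trainParam.lr     = 0.1;
    tic; net = train(net, P, T); tt = toc * 1000;  % training time TT in msec
    at = mean(abs(T - sim(net, P))) * 100;         % absolute error AT in % (assumed definition)
    fprintf('%4d epochs: TT = %7.1f msec, AT = %5.2f %%\n', e, tt, at);
end
```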

Table 2: Varying x2 (epochs) with x1 = 0 hidden layers and x3 = 0.1 fixed

# of hidden layers (x1)   Epochs (x2)   Learning rate (x3)   Training time in msec (TT)   Absolute error % (AT)
0 5 0.1 82 26.8
0 10 0.1 72 29.9
0 20 0.1 122 1.4
0 30 0.1 135 5.4
0 40 0.1 87 30
0 50 0.1 224 2.2
0 60 0.1 276 0.5
0 100 0.1 295 0.9

Table 3: Varying x2 (epochs) with x1 = 1 hidden layer and x3 = 0.1 fixed

# of hidden layers (x1)   Epochs (x2)   Learning rate (x3)   Training time in msec (TT)   Absolute error % (AT)
1 5 0.1 63 36.9
1 10 0.1 86 15.1
1 20 0.1 83 26.1
1 30 0.1 157 11.4
1 40 0.1 159 5.6
1 50 0.1 159 22.8
1 60 0.1 221 1.4
1 100 0.1 323 0.5

Table 4: Varying x2 (epochs) with x1 = 0 hidden layers and x3 = 0.01 fixed

# of hidden layers (x1)   Epochs (x2)   Learning rate (x3)   Training time in msec (TT)   Absolute error % (AT)
0 5 0.01 87 21
0 10 0.01 36 17.8
0 20 0.01 96 1.4
0 30 0.01 267 5.4
0 40 0.01 185 2.7
0 50 0.01 156 5.4
0 60 0.01 258 5.4
0 100 0.01 292 1.1
0 200 0.01 539 0.5

Table 5: Varying x2 (epochs) with x1 = 1 hidden layer and x3 = 0.01 fixed

# of hidden layers (x1)   Epochs (x2)   Learning rate (x3)   Training time in msec (TT)   Absolute error % (AT)
1 5 0.01 84 24.9
1 10 0.01 79 2.6
1 20 0.01 126 2.1
1 30 0.01 144 5.4
1 40 0.01 156 0.9
1 50 0.01 173 24.9
1 60 0.01 244 0.9
1 100 0.01 326 0.08

Table 6: Varying x2 (epochs) with x1 = 0 hidden layers and x3 = 0.5 fixed

# of hidden layers (x1)   Epochs (x2)   Learning rate (x3)   Training time in msec (TT)   Absolute error % (AT)
0 5 0.5 8 10.4
0 10 0.5 76 5.5
0 20 0.5 97 1.9
0 30 0.5 186 1.6
0 40 0.5 148 2.1
0 50 0.5 224 1
0 60 0.5 202 1.2
0 100 0.5 311 1
0 200 0.5 541 1.1

Table 7: Varying x2 (epochs) with x1 = 1 hidden layer and x3 = 0.5 fixed

# of hidden layers (x1)   Epochs (x2)   Learning rate (x3)   Training time in msec (TT)   Absolute error % (AT)
1 5 0.5 64 22.3
1 10 0.5 95 2.3
1 20 0.5 84 3.5
1 30 0.5 163 3
1 40 0.5 240 0.1
1 50 0.5 154 25.1
1 60 0.5 262 0.86
1 100 0.5 330 0.8
1 200 0.5 602 0.1

3. Discussion of Results
The obtained results and the input data set were analyzed using the multiple regression technique provided by MATLAB in order to obtain:
•	A relationship between the training time and the ANN parameters
•	A relationship between the absolute error and the ANN parameters
Equation 1 shows the obtained relationship between the training time TT (in msec) and the ANN parameters (number of hidden layers x1, epochs x2, and learning rate x3), while Equation 2 shows the obtained relationship between the absolute error AT (in %) and the ANN parameters.

TT = 61.4148 + 6.0064*x1 + 2.5237*x2 + 0.3379*x3    (1)
AT = 11.1747 - 0.7242*x1 + 0.0007*x2 - 14.4027*x3    (2)
Referring to Equations 1 and 2, we can estimate the ANN performance from the ANN architecture and parameters. The equations show that there is a linear relationship between each of the performance measures (training time and absolute error) and the ANN parameters. This is shown in Figs. 3-8, which were obtained by applying varying values of the parameters in the above equations.
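A hedged sketch of the regression step and of evaluating Equation 1 for a new configuration. Only a handful of the measured rows are shown here for brevity; the full fit would use all rows of Tables 2 through 7.

```matlab
% Least-squares fit of TT against the ANN parameters, cf. Equation 1.
% The rows below are a small subset of Tables 2, 3, 6 and 7.
x1 = [0; 0; 1; 1; 0; 1];                    % # of hidden layers
x2 = [50; 100; 50; 100; 50; 50];            % epochs
x3 = [0.1; 0.1; 0.1; 0.1; 0.5; 0.5];        % learning rate
tt = [224; 295; 159; 323; 224; 154];        % measured training time, msec

X = [ones(size(x1)) x1 x2 x3];              % design matrix with intercept
b = X \ tt;                                 % coefficients [b0; b1; b2; b3]

% Using Equation 1 to predict TT for x1 = 1, x2 = 60, x3 = 0.5:
TTpred = 61.4148 + 6.0064*1 + 2.5237*60 + 0.3379*0.5;   % about 219 msec
```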

Figure 3: Execution time as a function of the number of hidden layers (x2 and x3 fixed). [Line plot; x-axis: number of hidden layers, y-axis: execution time in msec]

Figure 4: Execution time as a function of epochs (x1 and x3 fixed). [Line plot; x-axis: number of epochs (training cycles), y-axis: execution time in msec]

Figure 5: Execution time as a function of learning rate (x1 and x2 fixed). [Line plot; x-axis: learning rate, y-axis: execution time in msec]

Figure 6: Absolute error as a function of the number of hidden layers (x2 and x3 fixed). [Line plot; x-axis: number of hidden layers, y-axis: absolute error %]

Figure 7: Absolute error as a function of epochs (x1 and x3 fixed). [Line plot; x-axis: number of epochs, y-axis: absolute error %]

Figure 8: Absolute error as a function of learning rate (x1 and x2 fixed). [Line plot; x-axis: learning rate, y-axis: absolute error %]

4. Conclusions
Based on the investigation and analysis results, we can conclude the following:
•	The performance of an ANN depends on its architecture and parameters.
•	The derived regression equations can be used to estimate the training time and absolute error of ANNs of the kind studied here from their parameters.

References
[1] Vennapusa B., Cruz L. D. L., Shah H., Michalski V., and Zhang Q. Y., 2011. “Erythrocyte
Sedimentation Rate (ESR) Measured by the Streck ESR-Auto Plus Is Higher Than With the
Sediplast Westergren Method”, American Journal of Clinical Pathology 135(3), pp. 386-390.
[2] Li X., and Yeh A. G., 2002. “Neural-Network-Based Cellular Automata for Simulating Multiple Land Use Changes Using GIS”, International Journal of Geographical Information Science 16, pp. 323-343.
[3] J. Zupan, and J. Gasteiger.“Neural Networks for Chemists: an Introduction”. VCH, Weinheim,
1993.
[4] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, 1986. “Learning Internal Representations by Error Propagation”, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Eds. D. E. Rumelhart, J. L. McClelland, Vol. 1, MIT Press, Cambridge, MA, USA, pp. 318-362.
[5] R. P. Kirenko, Jan. 2006. Reduction of Coding Artifacts Using Chrominance and Luminance Spatial
Analysis. Digest of Technical Papers. International Conference on Consumer Electronics, pp. 209-210.
[6] R. Ferzli, and L. Karam, Oct. 2007. “A No-Reference Objective Image Sharpness Metric Based on Just-Noticeable Blur and Probability Summation”, IEEE International Conference on Image Processing, pp. 445-448.
[7] T. Kohonen, 1988. “An Introduction to Neural Computing”, Neural Networks 1, pp. 3-16.
[8] R. Hecht-Nielsen, 1987. “Counter-Propagation Networks”, Proceedings of the IEEE First
International Conference on Neural Networks, pp. 19-32.
[9] R. Hecht-Nielsen, 1987. “Counter-Propagation Networks”, Applied Optics 26, pp. 4979-4984.
[10] R. Hecht-Nielsen, 1988. “Application of Counter-Propagation Networks”, Neural Networks 1,
pp. 131-140.
[11] J. Zupan, M. Novic, X. Li, and J. Gasteiger, 1994. “Classification of Multicomponent Analytical Data of
Olive Oils Using Different Neural Networks”, Analytica Chimica Acta 292, pp. 219-234.
[12] M. Forina, and C. Armanino, 1982. “Eigenvector Projection and Simplified Non-Linear Mapping of Fatty Acid Content of Italian Olive Oils”, Annali di Chimica (Rome) 72, pp. 127-143.
[13] M. P. Derde, and D. L. Massart, 1982. “Extraction of Information from Large Data Sets by Pattern
Recognition”, Fresenius' Zeitschrift für Analytische Chemie 313, pp. 484-495.
[14] J. Zupan, and D. L. Massart, 1989. “Evaluation of the 3-D Method with the Application in
Analytical Chemistry”, Analytical Chemistry 61, pp. 2098-2182.
[15] M. P. Derde, and D. L. Massart, 1986. “Supervised Pattern Recognition: The Ideal Method”, Analytica
Chimica Acta 191, pp. 1-16.
[16] J. Zupan. “Algorithms for Chemists”, John Wiley, Chichester, 1989.
[17] K. Varmuza, and H. Lohninger, 1990. “Principal Component Analysis of Chemical Data in PCs
for Chemists”, Ed. J. Zupan, Elsevier, Amsterdam, 1990, pp. 43-64.
