Investigation and Analysis of ANN Parameters
Khaled M. Matrouk
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan
E-mail: khaled.matrouk@ahu.edu.jo
Tel: +962-3-2179000 (ext. 8503), Fax: +962-3-2179050, Mobile: +962-779-469339
Haitham A. Alasha'ary
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan
Abdullah I. Al-Hasanat
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan
Ziad A. Al-Qadi
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan
Hasan M. Al-Shalabi
Department of Computer Engineering, Faculty of Engineering
Al-Hussein Bin Talal University, P. O. Box (20), Ma'an, Zip Code 71111, Jordan
Abstract
Different ANN architectures, with a range of values for a set of network parameters, are tested and analyzed in order to establish the relationship between the performance factors and the ANN parameters; it is shown how the ANN parameters can be used to estimate ANN training time and absolute error.
1. Introduction
Artificial neural networks (ANNs) are difficult to describe with a simple definition (Vennapusa et al, 2011; Yeh and Li, 2002; Zupan and Gasteiger, 1993). Perhaps the closest description is a comparison with a black box having a multiple-input, multiple-output structure that operates using a large number of mostly parallel-connected simple arithmetic units. The most important thing to remember about all ANN methods is that they work best when dealing with a non-linear (Rumelhart et al, 1986; Kirenko, 2006) dependence between the inputs and outputs (Fig. 1).
ANNs can be employed to describe or to find linear relationships as well (Ferzli and Karam, 2007), but the final result may often be worse than that obtained with simpler standard statistical techniques. Because at the beginning of an experiment we often do not know whether the responses are related to the inputs in a linear or a non-linear way, it is good advice to always try some standard statistical technique for interpreting the data in parallel with the use of ANNs.
Figure 1: Neural network as a black box featuring the non-linear relationship between the multivariate input variables and the multivariate responses
The first thing to be aware of when considering the employment of ANNs is the nature of the problem (Kohonen, 1988; Hecht-Nielsen, 1987a; Hecht-Nielsen, 1987b) we are trying to solve: does the problem require a supervised or an unsupervised approach? A supervised problem means that the chemist already has at hand a set of experiments with known outcomes for specific inputs, while an unsupervised problem means that one deals with a set of experimental data that has no specific associated answers (or supplemental information) attached. Typically, unsupervised problems arise when the chemist has to explore experimental data collected during pollution monitoring, when he or she faces the measured data for the first time, or when one must find a good method to display the data in the most appropriate way.
Usually, the first problems associated with handling data require unsupervised methods. Only after we have become more familiar with the measurement space (the measurement regions) of the input and output variables and with the behavior of the responses can we select sets of data on which the supervised methods (modeling, for example) can be carried out.
The basic types of goals or problems (Hecht-Nielsen, 1988; Zupan et al, 1994) for whose solution ANNs can be used are the following:
Selection of samples from a large quantity of existing ones for further handling
Classification of an unknown sample into one of several pre-defined (known in advance) classes (Zupan et al, 1994; Forina and Armanino, 1982)
Clustering of objects, i.e., finding the inner structure of the measurement space to which the samples belong, and making direct and inverse models for predicting the behavior or effects of unknown samples in a quantitative manner (Derde and Massart, 1982; Zupan and Massart, 1989)
Once we have decided which type of problem we have, we can look for the best strategy or method to solve it. Of course, in any of the above aspects we can employ one or more different ANN architectures and different ANN learning strategies.
An artificial neuron is supposed to mimic the action of a biological neuron, i.e., to accept many different signals (Derde and Massart, 1986), x_i, from many neighboring neurons and to process them in a pre-defined simple way. Depending on the outcome of this processing, the neuron j decides either to fire an output signal y_j or not. The output signal (if it is triggered) can be either 0 or 1, or can have any real value between 0 and 1. Mainly for historical reasons, the function f(x) that calculates the output from the m-dimensional input vector x is regarded as being composed of two parts. The first part evaluates the so-called 'net input', while the second part 'transfers' the net input in a non-linear manner to the output value y.
The first function is a linear combination of the input variables x_1, x_2, ..., x_i, ..., x_m multiplied by the coefficients w_ji, called 'weights', while the second function serves as a 'transfer function' because it 'transfers' the signal(s) through the neuron's axon to the other neurons' dendrites.
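These two stages can be made concrete with a small sketch in base MATLAB (the input values, weights, and the sigmoid transfer function below are illustrative choices, not taken from the paper):

    % A single artificial neuron j: net input followed by a non-linear transfer
    x = [0.2; 0.7; 0.1];            % m-dimensional input vector (example values)
    w = [0.5 -0.3 0.8];             % weights w_ji of neuron j (example values)
    net_in = w * x;                 % first part: the 'net input' (linear combination)
    y = 1 / (1 + exp(-net_in));     % second part: sigmoid transfer to the output y_j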
Figure 2: One-layer (left) and two-layer (right) ANNs. The ANNs shown can be applied to solve a 3-variable input, 4-response output problem
2. Experimental Part
Artificial neural network performance depends on the network architecture (the number of layers and the number of neurons in each layer) and on network parameters such as the learning rate, the number of training cycles (epochs), and the activation function used in each layer.
Here we investigate and analyze ANN performance in terms of absolute error and training time, and we look for the relationship between the ANN performance factors and the following parameters: learning rate, number of hidden layers, and number of epochs.
2.1. Experiment 1
An ANN with two inputs, no hidden layers, epochs = 50, and learning rate = 0.1 was created, trained and simulated using the input data shown in Table 1 (row 1: data set 1; row 2: data set 2; row 3: target output; row 4: simulated output; row 5: relative output error).
Table 1: Input Data Sets, Target, Simulated Output and Relative Output Error
Data set 1:        0.9501  0.2311  0.6068  0.4860  0.8913  0.7621  0.4565  0.0185  0.8214  0.4447
Data set 2:        0.6154  0.7919  0.9218  0.7382  0.1763  0.4057  0.9355  0.9169  0.4103  0.8936
Target output:     0.5847  0.1830  0.5594  0.3588  0.1571  0.3092  0.4270  0.0170  0.3370  0.3974
Simulated output:  0.5799  0.1716  0.5414  0.3673  0.1570  0.3055  0.4355  0.0171  0.3430  0.4100
Relative error:    0.82%   6.23%   3.22%   2.37%   0.06%   1.20%   1.99%   0.59%   1.78%   3.17%
Comparing the values of the target and the simulated output, we can see that the two are very close, which means that the absolute error between them is minimal, as shown in the last row of Table 1.
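The paper does not list its code, but Experiment 1 can be sketched in a few lines of MATLAB. The sketch below makes several assumptions: it uses the Neural Network (now Deep Learning) Toolbox, requests a network with no hidden layers via feedforwardnet([]) (some toolbox versions may reject the empty size vector; linearlayer is an alternative), and selects gradient-descent training (traingd) so that the learning-rate setting actually takes effect. Incidentally, the targets in Table 1 appear to be the element-wise products of the two input rows.

    % Sketch of Experiment 1: 2 inputs, no hidden layers, 50 epochs, lr = 0.1
    x = [0.9501 0.2311 0.6068 0.4860 0.8913 0.7621 0.4565 0.0185 0.8214 0.4447;
         0.6154 0.7919 0.9218 0.7382 0.1763 0.4057 0.9355 0.9169 0.4103 0.8936];
    t = [0.5847 0.1830 0.5594 0.3588 0.1571 0.3092 0.4270 0.0170 0.3370 0.3974];

    net = feedforwardnet([]);         % no hidden layers (assumes [] is accepted)
    net.trainFcn = 'traingd';         % plain gradient descent, so lr is used
    net.trainParam.epochs = 50;
    net.trainParam.lr = 0.1;
    net.divideFcn = 'dividetrain';    % use all 10 samples for training
    net = train(net, x, t);

    y = net(x);                       % simulated output (cf. Table 1, row 4)
    relErr = abs(y - t) ./ t * 100;   % relative output error, % (row 5)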
2.2. Experiment 2
Different ANN architectures with different parameters (number of hidden layers, number of epochs, and learning rate) were created in MATLAB, then trained and tested using the data sets shown in Table 1. The experimental results are shown in Tables 2 through 7.
Table 2: ANN Training Time and Absolute Error for Different Numbers of Epochs
# of hidden layers (x1)   Epochs (x2)   Learning rate (x3)   Training time, msec (TT)   Absolute error, % (AT)
1 5 0.1 63 36.9
1 10 0.1 86 15.1
1 20 0.1 83 26.1
1 30 0.1 157 11.4
1 40 0.1 159 5.6
1 50 0.1 159 22.8
1 60 0.1 221 1.4
1 100 0.1 323 0.5
3. Discussion of Results
The obtained results and the input data set were analyzed using the multiple regression technique provided by MATLAB in order to obtain the following:
A relationship between the training time and the ANN parameters
A relationship between the absolute error and the ANN parameters
Equation 1 shows the obtained relationship between the training time (TT) and the ANN parameters (number of hidden layers, epochs, and learning rate), while Equation 2 shows the obtained relationship between the absolute error (AT) and the ANN parameters.
TT = 61.4148 + 6.0064*x1 + 2.5237*x2 + 0.3379*x3 (1)
AT = 11.1747 - 0.7242*x1 + 0.0007*x2 - 14.4027*x3 (2)
Referring to Equations 1 and 2, we can estimate ANN performance from the ANN architecture and parameters. We can see from the above equations that there is a linear relationship between each of the performance measures (training time and absolute error) and the ANN parameters. This is shown in Figs. 3-8, which were obtained by applying various values of the parameters in the above equations.
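For illustration, the following MATLAB sketch (assuming the Statistics Toolbox function regress) shows how coefficients of this form can be estimated by least squares, and checks Equation 1 against a measured point. Note that refitting on the rows of Table 2 alone cannot reproduce the published coefficients exactly, since x1 and x3 are constant there and the paper's fit used the data of Tables 2 through 7.

    % Least-squares fit of training time against the epochs column of Table 2
    % (x1 = 1 and x3 = 0.1 are constant there, so only the intercept and the
    % epochs coefficient are identifiable from this table alone)
    x2 = [5; 10; 20; 30; 40; 50; 60; 100];        % epochs
    TT = [63; 86; 83; 157; 159; 159; 221; 323];   % training time, msec
    b  = regress(TT, [ones(size(x2)) x2]);        % b(2) is about 2.7 msec/epoch,
                                                  % comparable with 2.5237 in Eq. 1

    % Checking Equation 1 at x1 = 1, x2 = 100, x3 = 0.1:
    tt_est = 61.4148 + 6.0064*1 + 2.5237*100 + 0.3379*0.1
    % gives about 319.8 msec, close to the 323 msec measured in Table 2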
Figure 3: Execution time (in msec) as a function of the number of hidden layers (x2 and x3 are fixed)
Figure 4: Execution time (in msec) as a function of the number of epochs (training cycles) (x1 and x3 are fixed)
Figure 5: Execution time as a function of learning rate (x1 and x2 are fixed)
Figure 6: Absolute error as a function of hidden layers (x2 and x3 are fixed)
Figure 7: Absolute error as a function of the number of epochs (x1 and x3 are fixed)
Figure 8: Absolute error as a function of learning rate (x1 and x2 are fixed)
4. Conclusions
Based on the results of the investigation and analysis, we can conclude the following:
The performance of an ANN depends on its architecture and parameters.
The derived equations (Equations 1 and 2) can be used to estimate the training time and absolute error of an ANN from its parameters.
References
[1] B. Vennapusa, L. D. L. Cruz, H. Shah, V. Michalski, and Q. Y. Zhang, 2011. "Erythrocyte Sedimentation Rate (ESR) Measured by the Streck ESR-Auto Plus Is Higher Than With the Sediplast Westergren Method", American Journal of Clinical Pathology 135(3), pp. 386-390.
[2] X. Li, and A. G. Yeh, 2002. "Neural-Network-Based Cellular Automata for Simulating Multiple Land Use Changes Using GIS", International Journal of Geographical Information Science 16, pp. 323-343.
[3] J. Zupan, and J. Gasteiger, 1993. "Neural Networks for Chemists: An Introduction", VCH, Weinheim.
[4] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, 1986. "Learning Internal Representations by Error Backpropagation", in Parallel Distributed Processing: Explorations in the Microstructures of Cognition, Eds. D. E. Rumelhart and J. L. McClelland, Vol. 1, MIT Press, Cambridge, MA, USA, pp. 318-362.
[5] R. P. Kirenko, Jan. 2006. "Reduction of Coding Artifacts Using Chrominance and Luminance Spatial Analysis", Digest of Technical Papers, International Conference on Consumer Electronics, pp. 209-210.
[6] R. Ferzli, and L. Karam, Oct. 2007. "A No-Reference Objective Image Sharpness Metric Based on Just-Noticeable Blur and Probability Summation", IEEE International Conference on Image Processing, pp. 445-448.
[7] T. Kohonen, 1988. "An Introduction to Neural Computing", Neural Networks 1, pp. 3-16.
[8] R. Hecht-Nielsen, 1987a. "Counter-Propagation Networks", Proceedings of the IEEE First International Conference on Neural Networks, pp. 19-32.
[9] R. Hecht-Nielsen, 1987b. "Counter-Propagation Networks", Applied Optics 26, pp. 4979-4984.
[10] R. Hecht-Nielsen, 1988. "Application of Counter-Propagation Networks", Neural Networks 1, pp. 131-140.
[11] J. Zupan, M. Novic, X. Li, and J. Gasteiger, 1994. "Classification of Multicomponent Analytical Data of Olive Oils Using Different Neural Networks", Analytica Chimica Acta 292, pp. 219-234.
[12] M. Forina, and C. Armanino, 1982. "Eigenvector Projection and Simplified Non-Linear Mapping of Fatty Acid Content of Italian Olive Oils", Annali di Chimica (Rome) 72, pp. 127-143.
[13] M. P. Derde, and D. L. Massart, 1982. "Extraction of Information from Large Data Sets by Pattern Recognition", Fresenius' Zeitschrift für Analytische Chemie 313, pp. 484-495.
[14] J. Zupan, and D. L. Massart, 1989. "Evaluation of the 3-D Method with the Application in Analytical Chemistry", Analytical Chemistry 61, pp. 2098-2182.
[15] M. P. Derde, and D. L. Massart, 1986. "Supervised Pattern Recognition: The Ideal Method", Analytica Chimica Acta 191, pp. 1-16.
[16] J. Zupan, 1989. "Algorithms for Chemists", John Wiley, Chichester.
[17] K. Varmuza, and H. Lohninger, 1990. "Principal Component Analysis of Chemical Data", in PCs for Chemists, Ed. J. Zupan, Elsevier, Amsterdam, pp. 43-64.