
International Journal of Technical Research (IJTR)

Vol. 1, Issue 1, Mar-Apr 2012

Minimization of Error in Training a Neural Network Using Gradient Descent Method

Er. Parveen Sehgal¹, Dr. Sangeeta Gupta², Prof. Dharminder Kumar³
¹ Associate Professor, Department of Computer Science & Engineering, PPIMT, Hisar (Haryana)
² Director (Management), OITM, Hisar (Haryana)
³ Chairman, Department of Computer Science & Engineering, GJUS&T, Hisar (Haryana)
¹ parveensehgal@yahoo.com, ² sangeet_gju@yahoo.co.in, ³ dr_dk_kumar_02@yahoo.com

Abstract--The paper demonstrates gradient descent minimization of error in training a prediction model for insurance, built on a multilayer neural network with error back propagation. This involves finding the local minima of the error function and thus providing a corrective adjustment of the synaptic weights between the neurons of the network layers. The paper also explains the effect of changing the learning rate parameter on finding the local minima.
Keywords--Artificial Neural Networks, Gradient Descent
Optimization, Multilayer Perceptron, Supervised Learning, Error
Back Propagation

I. INTRODUCTION

A Multi Layer Perceptron can be applied to solve difficult and complex problems and can be used to approximate virtually any non-linear and complex function to any desired accuracy. The network can be trained in a supervised manner to bring the error within desired limits. This idea can therefore be applied to develop prediction models based on neural networks, because the prediction problem is highly complex and stochastic in nature and needs the approximation of very complex functions, which is not possible with traditional techniques. The predicted output of the system can be compared every time with the desired output, the error can be propagated back through the network [1], and the network parameters can be tuned to reduce the error in the next prediction; in this way the network can be trained to the limits of desired accuracy. We need to move in a direction that descends the error in a controlled way, as in the gradient descent method.

II. TECHNIQUES USED FOR TRAINING THE NETWORK

A. Error Back Propagation


Based on the approach of error correction learning, back propagation is a systematic method for training a multilayer artificial neural network. Back propagation provides a computationally efficient method for changing the synaptic weights in a neural network with differentiable activation function units.

The error back propagation algorithm uses the method of supervised learning. We provide the algorithm with a recorded set of observations, or training set [2], i.e. examples of the inputs and the desired outputs that we want the network to compute, and the error (the difference between the actual and the expected results) is then computed. These output differences are propagated back through the layers of the neural network, and the algorithm adjusts the synaptic weights between the neurons of successive layers so that the overall error energy of the network, E, is minimized. Back propagation thus performs a gradient descent minimization of the error energy E; the idea of the algorithm is to keep reducing this error until the ANN learns the training data.

Training of the network, i.e. error correction, is stopped when the value of the error energy function has become sufficiently small, within the required limits [3].

The total error for the pth observation of the data set and the jth neuron in the output layer can be computed as:

$E_{pj} = \frac{1}{2}\,(d_{pj} - y_{pj})^2$

where $d_{pj}$ represents the desired target output and $y_{pj}$ represents the predicted output from the system.
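As an illustration of the training procedure described above (and not the implementation used for the insurance model in this paper), the following Python sketch trains a small network with one hidden layer by error back propagation; the layer sizes, the sigmoid activations and the XOR toy data are assumptions made only for this example.

    import numpy as np

    # Minimal sketch of supervised training by error back propagation.
    # Architecture, activation and toy data are illustrative assumptions.
    rng = np.random.default_rng(0)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])      # training inputs
    D = np.array([[0.], [1.], [1.], [0.]])                      # desired outputs

    W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)   # input -> hidden weights
    W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)   # hidden -> output weights
    eta = 0.5                                                   # learning rate

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    for epoch in range(10000):
        for x, d in zip(X, D):                    # pattern-by-pattern updates (one pass = one epoch)
            h = sigmoid(x @ W1 + b1)              # forward pass: hidden layer
            y = sigmoid(h @ W2 + b2)              # forward pass: output layer
            e = d - y                             # error = desired - predicted
            delta_out = e * y * (1.0 - y)         # local gradient at the output layer
            delta_hid = (delta_out @ W2.T) * h * (1.0 - h)            # error propagated back
            W2 += eta * np.outer(h, delta_out); b2 += eta * delta_out  # corrections opposite
            W1 += eta * np.outer(x, delta_hid); b1 += eta * delta_hid  # to the error gradient

    print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))    # outputs move toward [0, 1, 1, 0]

In practice training would be stopped once the error energy falls within the required limits; a fixed number of epochs is used here only for simplicity.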
B. Gradient Descent Method
Training the neural network involves finding the minimum of a complicated nonlinear function (called the "error function"). This function describes the error the neural network makes in approximating or classifying the training data, as a function of the weights of the network. A simple example of a three-dimensional error surface is shown below.

[Figure: a simple example of a three-dimensional error surface]

We want the error to become as small as possible and should thus try to move towards the point of minimum error [4], as shown in the figure. Gradient descent simply means going downhill in small steps until you reach the bottom of the error surface; this is the learning technique used in back propagation. The back propagation weight update is equal to the slope of the energy function, further scaled by a learning rate η (thus, the steeper the slope, the bigger the update, though this may cause slow convergence).
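As a minimal illustration of taking downhill steps proportional to the negative gradient scaled by η, the following sketch applies gradient descent to an assumed toy error surface E(w1, w2) = w1² + w2²; it is a stand-in for the real error surface, not the one produced by the network.

    import numpy as np

    # Plain gradient descent on an assumed toy error surface E(w) = w1^2 + w2^2.
    def E(w):
        return w[0] ** 2 + w[1] ** 2

    def grad_E(w):
        return np.array([2.0 * w[0], 2.0 * w[1]])   # dE/dw1, dE/dw2

    w = np.array([3.0, -2.0])       # starting point on the error surface
    eta = 0.1                       # learning rate

    for step in range(50):
        w = w - eta * grad_E(w)     # step opposite to the gradient, scaled by eta

    print(w, E(w))                  # w approaches the minimum at (0, 0)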
Let us train the model on a set of data observations indexed by p. In back propagation, for the jth neuron in the output layer with output $y_{pj}$, the error energy for the pth observation can be given as:

$E_p = \frac{1}{2}\sum_{j \in C} (d_{pj} - y_{pj})^2$    …(1)

where the set C includes all the neurons in the output layer of the network. For the pth observation, $d_{pj}$ represents the desired target output and $y_{pj}$ represents the actual observed output from the system, for the jth neuron in the output layer.

We have taken the square of the error so that errors of opposite signs do not cancel each other out; we are also interested in the overall magnitude of the error and not in its sign.

The total error energy for the complete data set is obtained as $E = \sum_p E_p$ [5].

The objective of the learning process is to adjust the ANN parameters (synaptic weights) to minimize the overall error energy. Weights are updated on a pattern-by-pattern basis until the complete set of training data has been utilized (one epoch of training). We need to find new weight values for which E becomes minimum; since changing a weight value takes us either away from a minimum or closer to it, to tune a weight we need to find the direction that takes us towards the local minimum of the curve, i.e. opposite to the direction of the gradient of the error.

The gradient, i.e. the change of error energy with respect to a weight value, $\partial E_p / \partial w_{ji}$, can be computed as follows [6]. By applying the chain rule of differentiation, we can write:

$\dfrac{\partial E_p}{\partial w_{ji}} = \dfrac{\partial E_p}{\partial y_{pj}} \cdot \dfrac{\partial y_{pj}}{\partial w_{ji}}$    …(2)

where $w_{ji}$ represents the synaptic weight to the jth neuron in the output layer from the ith neuron in the previous layer.

From eqn. (1), we can write:

$\dfrac{\partial E_p}{\partial y_{pj}} = -(d_{pj} - y_{pj})$    …(3)

Also, we know that in a neural network the output of the jth neuron is $y_{pj} = \varphi\big(\sum_i w_{ji}\, x_{pi}\big)$, where φ is the differentiable activation function and $x_{pi}$ is the output of the ith neuron of the previous layer; therefore

$\dfrac{\partial y_{pj}}{\partial w_{ji}} = \varphi'\big(\textstyle\sum_i w_{ji}\, x_{pi}\big)\, x_{pi}$    …(4)

Now, from eqns. (2), (3) and (4), we can write:

$\dfrac{\partial E_p}{\partial w_{ji}} = -(d_{pj} - y_{pj})\,\varphi'\big(\textstyle\sum_i w_{ji}\, x_{pi}\big)\, x_{pi}$    …(5)
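The following sketch mirrors eqns. (1)–(5) numerically for a single output layer with a sigmoid activation; the layer sizes and the input, weight and target values are made-up numbers used only for illustration.

    import numpy as np

    # Numerical illustration of eqns. (1)-(5) for one observation p.
    x_p = np.array([0.8, 0.2, -0.5])          # outputs x_pi of the previous layer
    W = np.array([[0.3, -0.1, 0.4],           # W[j, i]: weight w_ji into output neuron j
                  [0.7,  0.5, -0.2]])
    d_p = np.array([1.0, 0.0])                # desired target outputs d_pj

    v_p = W @ x_p                             # weighted sums of the output neurons
    y_p = 1.0 / (1.0 + np.exp(-v_p))          # predicted outputs y_pj (sigmoid)

    E_p = 0.5 * np.sum((d_p - y_p) ** 2)      # eqn (1): error energy for observation p

    dE_dy = -(d_p - y_p)                      # eqn (3)
    dy_dv = y_p * (1.0 - y_p)                 # sigmoid derivative, as in eqn (4)
    grad_W = np.outer(dE_dy * dy_dv, x_p)     # eqn (5): dE_p/dw_ji, same shape as W

    print("E_p =", E_p)
    print("dE_p/dW =\n", grad_W)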

The correction in a weight must be opposite to the gradient, so we can write:

$\Delta w_{ji} = -\dfrac{\partial E_p}{\partial w_{ji}}$

And thus the updated new value of the synaptic weight can be written as:

$w_{ji}^{new} = w_{ji}^{old} + \Delta w_{ji} = w_{ji}^{old} - \dfrac{\partial E_p}{\partial w_{ji}}$    …(6)

After introducing a new parameter η, eqn. (6) can be re-written as [7]:

$w_{ji}^{new} = w_{ji}^{old} - \eta\,\dfrac{\partial E_p}{\partial w_{ji}}$

The parameter η controls the speed at which we perform the error correction, i.e. it decides the rate at which the network learns, and is therefore known as the learning rate parameter.
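In code, the update of eqn. (6) with the learning rate η amounts to a single line; the sketch below applies one such update to an assumed sigmoid output neuron (the values are illustrative, not taken from the paper's experiments).

    import numpy as np

    # One application of the weight update of eqn. (6) with learning rate eta.
    x_p = np.array([0.8, 0.2, -0.5])          # outputs of the previous layer
    w_j = np.array([0.3, -0.1, 0.4])          # synaptic weights w_ji into output neuron j
    d_pj = 1.0                                # desired target output
    eta = 0.25                                # learning rate parameter

    y_pj = 1.0 / (1.0 + np.exp(-w_j @ x_p))   # predicted output of the sigmoid unit
    grad = -(d_pj - y_pj) * y_pj * (1.0 - y_pj) * x_p   # dE_p/dw_ji, as in eqn (5)

    w_j_new = w_j - eta * grad                # correction opposite to the gradient, scaled by eta
    print(w_j_new)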
C. Effect of Learning Rate Parameter

The amount of corrective adjustment applied to the synaptic weights is governed by the learning rate η. The learning rate parameter η provides an extra control for tuning the system, deciding the speed at which we descend toward the point of local minima. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient at the current point; gradient descent is therefore also known as steepest descent, or the method of steepest descent.

The parameter should not be kept so high that we fly off from the point of local minima, and also not so small that we never reach that point, the network is not trained to the desired accuracy, and it takes a very long time to converge to a solution.
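The effect described in this section can be reproduced on an assumed one-dimensional error curve E(w) = w², for which the gradient is 2w: a very small η descends slowly, a moderate η converges quickly, and a too-large η overshoots the minimum and diverges.

    # Effect of the learning rate eta on gradient descent over an assumed
    # one-dimensional error curve E(w) = w^2 (gradient dE/dw = 2w).
    def descend(eta, w0=5.0, steps=20):
        w = w0
        for _ in range(steps):
            w = w - eta * 2.0 * w      # steepest-descent update
        return w

    for eta in (0.01, 0.1, 0.9, 1.1):
        w_final = descend(eta)
        print(f"eta = {eta:4.2f} -> w = {w_final:10.4f}, E = {w_final ** 2:12.4f}")
    # eta = 0.01: very slow descent; eta = 0.1: converges; eta = 0.9: oscillates while shrinking;
    # eta = 1.1: overshoots and diverges (the error grows).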

III. FUTURE SCOPE

The problem can be studied from different angles, such as optimization of the learning rate parameter, the effect of increasing the size of the data set, how to reduce the time for convergence to a local minimum using some improved approach, or how to search for a guaranteed global minimum in the case of a multi-dimensional error surface. The solution can also be compared with other traditional techniques such as regression and decision trees, or new algorithms may be developed to find a better solution, which may reduce the training time of the network or increase the accuracy of prediction.

IV. CONCLUSION

Our study of the gradient descent method on an insurance data set finds the accuracy of the method to be very reliable. However, increasing the learning rate parameter can result in slow convergence: if the learning rate parameter is kept very high, the search tends to oscillate across the error surface [8] while looking for the point of minima, and this in turn causes a very slow convergence rate near the final value of the solution. On the other hand, if we keep the learning rate too low, to stay within safe limits, then it also takes a long time to descend towards the point of local minima.

V. REFERENCES

[1] Alex Berson & Stephen J. Smith, “Data Warehousing, Data Mining & OLAP”, Tata McGraw Hill Edition, 2004, Ch. 19, pp. 390-392, ISBN-13: 978-0-07-058741-0.
[2] Internet: http://en.wikipedia.org/wiki/Stochastic_gradient_descent, 29th January 2012.
[3] Jamal M. Nazzal, Ibrahim M. El-Emary and Salam A. Najim, “Multilayer Perceptron Neural Network (MLPs) For Analyzing the Properties of Jordan Oil Shale”, World Applied Sciences Journal 5 (5), pp. 546-552, IDOSI Publications, 2008.
[4] Er. Parveen Sehgal, Dr. Sangeeta Gupta, Dr. Dharminder Kumar, “A Study of Prediction Modeling Using Multilayer Perceptron (MLP) With Error Back Propagation”, Proceedings of the AICTE sponsored National Conference on Knowledge Discovery & Network Security, February 2011, ISBN: 978-81-8465-755-5, pp. 17-20.
[5] Walter H. Delashmit and Michael T. Manry, “Recent Developments in Multilayer Perceptron Neural Networks”, Proceedings of the 7th Annual Memphis Area Engineering and Science Conference, MAESC 2005, pp. 1-3.
[6] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Second Ed., Morgan Kaufmann Publishers, Elsevier, 2007, ISBN: 978-81-312-0535-8, pp. 317-318.
[7] T. Jayalakshmi and Dr. A. Santhakumaran, “Improved Gradient Descent Back Propagation Neural Networks for Diagnoses of Type II Diabetes Mellitus”, Global Journal of Computer Science and Technology, Vol. 9, Issue 5 (Ver 2.0), January 2010, pp. 94-97.
[8] M. Z. Rehman, N. M. Nawi, “Improving the Accuracy of Gradient Descent Back Propagation Algorithm (GDAM) on Classification Problems”, International Journal on New Computer Architectures and Their Applications (IJNCAA) 1(4), pp. 861-870, The Society of Digital Information and Wireless Communications, 2011, ISSN: 2220-9085.
