Minimization of Error in Training A Neural Network Using Gradient Descent Method
I. INTRODUCTION
The Multi Layer Perceptron can be applied to solve difficult and complex problems and can be used to approximate virtually any non-linear and complex function to any desired accuracy. The network can be trained in a supervised manner to bring the error within desired limits. The idea can therefore be applied to develop prediction models based on neural networks for problems that are highly complex and stochastic in nature and need the approximation of very complex functions, which is not possible with traditional techniques. The predicted output of the system can be compared each time with the desired output, the error can be propagated back through the network [1], and the network parameters can be tuned to reduce the error in the next prediction; in this way the network can be trained to the limits of desired accuracy. We need to move in a direction that descends the error in a controlled way, as in the gradient descent method.

II. TECHNIQUES USED FOR TRAINING THE NETWORK

The error back propagation algorithm uses the method of supervised learning. We provide the algorithm with a recorded set of observations, or training set [2], i.e. examples of the inputs and the desired outputs that we want the network to compute, and then the error (the difference between actual and expected results) is computed. These differences in output are back propagated through the layers of the neural network, and the algorithm adjusts the synaptic weights between the neurons of successive layers such that the overall error energy of the network, E, is minimized. Back propagation performs a gradient descent minimization of the error energy E; the idea of the algorithm is to reduce this error until the ANN learns the training data.

Training of the network, i.e. error correction, is stopped when the value of the error energy function has become sufficiently small and within the required limits [3].
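As a concrete illustration of this procedure, the following is a minimal Python sketch of error back propagation for a small one-hidden-layer network with sigmoid activations. The XOR training set, layer sizes, learning rate, and initialization are illustrative assumptions, not details taken from this paper.

    import numpy as np

    # Minimal sketch of error back propagation (illustrative, not the
    # paper's exact network): one hidden layer, sigmoid activations,
    # squared-error energy E, weights adjusted opposite to the gradient.
    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Assumed toy training set: XOR inputs X and desired targets T.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(scale=0.5, size=(2, 4))  # input-to-hidden synaptic weights
    W2 = rng.normal(scale=0.5, size=(4, 1))  # hidden-to-output synaptic weights
    eta = 0.5                                # learning rate parameter

    for epoch in range(10000):
        H = sigmoid(X @ W1)                  # forward pass: hidden outputs
        O = sigmoid(H @ W2)                  # forward pass: actual outputs
        E = 0.5 * np.sum((T - O) ** 2)       # error energy for this pass
        # Backward pass: propagate the output error through the layers.
        delta_out = (O - T) * O * (1 - O)
        delta_hid = (delta_out @ W2.T) * H * (1 - H)
        # Adjust the synaptic weights opposite to the gradient of E.
        W2 -= eta * H.T @ delta_out
        W1 -= eta * X.T @ delta_hid

    print("final error energy:", E)

Whether this toy network fully learns XOR depends on the random initialization; the point is only the shape of the loop: forward pass, error energy, back propagated deltas, and gradient-opposite weight adjustment.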
E_p = \frac{1}{2} \sum_{j \in C} (t_{pj} - o_{pj})^2    …(1)

Where set C includes all the neurons in the output layer of the network. For the pth observation, t_{pj} represents the desired target output and o_{pj} represents the actual observed output from the system, for the jth neuron in the output layer.

Correction in the weight must be opposite to the gradient, so we can write:

\Delta w_{ji} = -\eta \, \frac{\partial E}{\partial w_{ji}}    …(5)

And thus, the updated new value of the synaptic weight can be written as:

w_{ji}(\text{new}) = w_{ji}(\text{old}) + \Delta w_{ji} = w_{ji}(\text{old}) - \eta \, \frac{\partial E}{\partial w_{ji}}
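A numeric sketch of these update equations for a single linear neuron and one observation may help; the weights, input, target, and learning rate below are assumed values chosen only for illustration.

    import numpy as np

    # Hypothetical single linear neuron, one observation: a numeric
    # instance of E = 1/2 (t - o)^2 and w_new = w_old - eta * dE/dw.
    w = np.array([0.3, -0.1])   # current synaptic weights (assumed values)
    x = np.array([1.0, 2.0])    # input pattern for the pth observation
    t = 1.0                     # desired target output
    eta = 0.1                   # learning rate parameter

    o = w @ x                   # actual output (linear neuron for clarity)
    E = 0.5 * (t - o) ** 2      # error energy, as in equation (1)
    grad = -(t - o) * x         # dE/dw by the chain rule through o = w . x
    w_new = w - eta * grad      # correction opposite to the gradient

    print(E, w_new)             # E = 0.405, w_new = [0.39, 0.08]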
The parameter η controls the speed at which we perform the error correction, i.e. it decides the rate at which the network learns, and is therefore known as the learning rate parameter.

C. Effect of Learning Rate Parameter

The amount of corrective adjustment applied to the synaptic weights is controlled by the learning rate η. The learning rate parameter provides an extra control for tuning the system: it governs the speed at which we descend toward the point of a local minimum.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient at the current point. Gradient descent is also known as steepest descent, or the method of steepest descent.

The parameter η should not be kept so high that we fly off from the point of the local minimum, nor so small that we never reach that point, the network is not trained to the desired accuracy, and it takes a very long time to converge to a solution.
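The sketch below illustrates this trade-off on an assumed one-dimensional error surface E(w) = w^2 (a toy function, not a network from the paper): a very small η converges slowly, a moderate η descends quickly, and an η that is too large overshoots and diverges.

    # Gradient descent on an assumed toy error surface E(w) = w**2,
    # whose gradient is dE/dw = 2*w; the minimum is at w = 0.
    def descend(eta, w=1.0, steps=20):
        for _ in range(steps):
            w = w - eta * 2 * w   # step opposite to the gradient
        return w

    for eta in (0.01, 0.1, 0.45, 1.1):
        print(f"eta={eta}: w after 20 steps = {descend(eta):.6f}")
    # eta=0.01 creeps toward 0 (slow convergence), eta=0.1 and 0.45
    # approach 0 quickly, while eta=1.1 overshoots on every step and
    # |w| grows: the iterate flies off from the local minimum.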
IV. CONCLUSION
REFERENCES

[1] Alex Berson and Stephen J. Smith, "Data Warehousing, Data Mining & OLAP", Tata McGraw Hill Edition, 2004, Ch. 19, pp. 390-392, ISBN-13: 978-0-07-058741-0.
[2] Internet: http://en.wikipedia.org/wiki/Stochastic_gradient_descent, 29th January 2012.
[3] Jamal M. Nazzal, Ibrahim M. El-Emary and Salam A. Najim, "Multilayer Perceptron Neural Network (MLPs) For Analyzing the Properties of Jordan Oil Shale", World Applied Sciences Journal 5 (5), 2008, pp. 546-552, IDOSI Publications, 2008.
[4] Parveen Sehgal, Sangeeta Gupta and Dharminder Kumar, "A Study of Prediction Modeling Using Multilayer Perceptron (MLP) With Error Back Propagation", Proceedings of the AICTE sponsored National Conference on Knowledge Discovery & Network Security, February 2011, ISBN: 978-81-8465-755-5, pp. 17-20.
[5] Walter H. Delashmit and Michael T. Manry, "Recent Developments in Multilayer Perceptron Neural Networks", Proceedings of the 7th Annual Memphis Area Engineering and Science Conference, MAESC 2005, pp. 1-3.
[6] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, Morgan Kaufmann Publishers, Elsevier, 2007, ISBN: 978-81-312-0535-8, pp. 317-318.
[7] T. Jayalakshmi and A. Santhakumaran, "Improved Gradient Descent Back Propagation Neural Networks for Diagnoses of Type II Diabetes Mellitus", Global Journal of Computer Science and Technology, Vol. 9, Issue 5 (Ver 2.0), January 2010, pp. 94-97.
[8] M. Z. Rehman and N. M. Nawi, "Improving the Accuracy of Gradient Descent Back Propagation Algorithm (GDAM) on Classification Problems", International Journal on New Computer Architectures and Their Applications (IJNCAA) 1(4), 2011, pp. 861-870, The Society of Digital Information and Wireless Communications, ISSN: 2220-9085.