IJCET: International Journal of Computer Engineering & Technology (IJCET)


International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 6, November-December (2013), pp. 290-298, IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

EFFORT ESTIMATION USING A SOFT COMPUTING TECHNIQUE


Sheenu Rizvi¹, Prof. Dr. S. Q. Abbas², Prof. Dr. Rizwan Beg³

¹ Dept. of CS, Amity University, Lucknow, India
² Director, AIMT, Lucknow, India
³ Controller of Examination, Integral University, Lucknow, India

ABSTRACT
Accurately estimating software effort is probably the biggest challenge facing software developers. Estimates made at the proposal stage have a high degree of inaccuracy, because the requirements and scope are not yet defined in full detail; as the project progresses and the requirements are elaborated, the accuracy of and confidence in the estimate increase. It is therefore important to choose the right technique for predicting software effort. In the present work, an Artificial Neural Network (ANN) model has been developed as a multilayer feed-forward neural network trained with the back-propagation learning algorithm, by iteratively processing a set of training samples and comparing the network's prediction with the actual effort. The COCOMO dataset has been used to train and test the network with nine different training algorithms. It was observed that the proposed model improves estimation accuracy relative to the COCOMO model. The performance indices Mean Square Error (MSE), Magnitude of Relative Error (MRE) and regression analysis (R) have been used to compare the results obtained with this method. The preliminary results suggest that the proposed model can be replicated for accurately forecasting software development effort.

Keywords: Software Effort Estimation, Artificial Neural Network, Back Propagation, COCOMO.

I. INTRODUCTION
Software effort estimation is one of the most critical and complex, but inevitable, activities in the software development process. Although a great amount of research time and money has been devoted to improving the accuracy of the various estimation models, it is unrealistic to expect very accurate effort estimates, owing to the inherent uncertainty in software development projects: complex and dynamic interacting factors, intrinsic software complexity, pressure for standardization and lack of software data [1].


There are many estimation models, which can be categorized by their basic formulation: expert estimation [8], analogy-based estimation schemes [2], algorithmic methods including empirical methods, rule induction methods [7], artificial neural network based approaches [4] [11], Bayesian network approaches [3], decision tree based methods, and fuzzy logic based estimation schemes [5]. Among software cost estimation techniques, COCOMO (Constructive Cost Model) is the most widely used algorithmic cost model because of its simplicity in estimating the effort, in person-months, for a project at different stages. Many researchers [4, 6] have explored the possibility of using neural networks to estimate effort. The aim of the present work is to propose an optimal neural network model for software effort estimation. The network architecture is designed to accommodate the COCOMO model and its parameters. The present work explores a new artificial neural network with a single hidden layer, constructed for software effort estimation and trained with nine different learning algorithms.

II. LITERATURE REVIEW
Many software cost estimation models have been developed over the last decades. A study by Jorgensen [9] provides a detailed review of studies on software development effort estimation. Many researchers have applied neural networks to estimate software development effort [6, 10]. Many different neural network models have been proposed [14], and they may be grouped into two major categories: feed-forward networks, in which no loops occur in the network path, and feedback networks, which have recursive loops. Recognizing the difficulties in applying neural networks, Nasser Tadayon [1] proposed a dynamic neural network that initially uses the COCOMO II model. COCOMO, however, has some limitations.
It cannot effectively deal with imprecise and uncertain information, and calibrating COCOMO is one of the most important tasks required to obtain accurate estimates. There is therefore always scope for developing effort estimation models with better predictive accuracy.

III. ANN METHODOLOGY
An ANN can be defined as a system or mathematical model consisting of many nonlinear artificial neurons running in parallel, arranged in one or more layers. A Feed Forward Neural Network (FFNN) consists of at least three layers: an input layer, a hidden layer and an output layer. The number of hidden layers and the number of neurons in each hidden layer are determined by trial and error. The strength of the connection between two layers is determined by the weights Wij. The schematic diagram of an FFNN is shown in Fig. 1.

Figure 1: Schematic diagram of a FFNN (inputs X1, X2, X3 in the input layer, connected by weights Wij through a hidden layer to the output layer)


Each neuron in a layer receives weighted inputs from the previous layer and transmits its output to the neurons in the next layer. The weighted sum of the input signals is calculated by Eq. (1) and passed through the nonlinear activation function given in Eq. (2). The network's responses are compared with the observed results, and the network error is calculated with Eq. (3).
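$$Y_{net} = \sum_{i=1}^{N} X_i \, w_i + w_0 \qquad (1)$$

$$Y_{out} = f(Y_{net}) = \frac{1}{1 + e^{-Y_{net}}} \qquad (2)$$

$$J_r = \frac{1}{2} \sum_{i=1}^{k} \left( Y_{obs,i} - Y_{out,i} \right)^2 \qquad (3)$$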
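As an illustration of Eqs. (1)-(3), the following Python sketch computes one neuron's weighted sum, its logistic output and the half-sum-of-squares error. The function names and the sample numbers are hypothetical; they are not taken from the paper's MATLAB implementation.

```python
import math

def neuron_output(x, w, w0):
    """Eqs. (1)-(2): weighted sum of inputs plus bias, then the logistic sigmoid."""
    y_net = sum(xi * wi for xi, wi in zip(x, w)) + w0   # Eq. (1)
    return 1.0 / (1.0 + math.exp(-y_net))                # Eq. (2)

def squared_error(y_obs, y_out):
    """Eq. (3): half the sum of squared differences over the k observations."""
    return 0.5 * sum((o - p) ** 2 for o, p in zip(y_obs, y_out))

# Hypothetical inputs, weights and bias (not from the COCOMO data)
x = [0.5, -1.0, 2.0]
w = [0.1, 0.4, -0.2]
y = neuron_output(x, w, w0=0.05)
print(y, squared_error([0.4], [y]))
```

In a full network, the same two steps are repeated for every neuron, and the error of Eq. (3) is what back-propagation drives down.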

Here Yout is the response of the neural network, f(Ynet) is the nonlinear activation function, Ynet is the weighted sum of the inputs, Xi is a neuron input, wi is the weight coefficient of each input, w0 is the bias, Jr is the error between the observed values and the network results, and Yobs is the observed output value.

IV. AVAILABLE DATA, MODEL INPUTS AND MODEL STRUCTURE
One of the most important steps in developing any prediction model is selecting appropriate input variables that allow an ANN to produce the desired results. A good understanding of the process under consideration is an important prerequisite for the successful application of data-driven approaches, the class to which ANNs belong. Predicting effort is a complicated problem involving multiple interacting factors, so proper parameters must be selected to build a reasonably accurate model. Two practical considerations in parameter selection are: first, the selected parameters must affect the target problem, i.e., strong relationships must exist between the parameters and the target (output) variable; and second, the selected parameters must be well populated, and the corresponding data must be as clean as possible. Since soft computing methods model problems from the available data, both the availability and the quality of the data are essential.

V. DATA DESCRIPTION
In the present work, the COCOMO81 dataset has been used to develop the ANN effort estimation model. In algorithmic cost estimation, costs and effort are predicted using mathematical formulae derived from historical data. The best-known algorithmic cost model, COCOMO (Constructive Cost Model), was published by Barry Boehm in 1981. It was developed from the analysis of sixty-three (63) software projects. The input and output variables used to develop the model are given in Table 1.
In all, sixteen input variables have been used: fifteen effort multipliers and SIZE, measured in thousands of delivered source instructions (KDSI). The output of the model is the Development Effort (DE), measured in person-months.
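For reference, the COCOMO estimate that the ANN is compared against can be sketched as follows. This assumes the intermediate COCOMO 81 form, Effort = a · SIZE^b · (product of the 15 effort multipliers), with Boehm's published mode coefficients; the paper does not state which COCOMO variant or development mode it used, and the function and dictionary names are hypothetical.

```python
import math

# Intermediate COCOMO 81 coefficients (a, b) per development mode.
# Assumption: the paper does not say which variant/mode its COCOMO baseline used.
MODE_COEFFS = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

def cocomo_effort(kdsi, effort_multipliers, mode="organic"):
    """Effort in person-months: a * KDSI**b * product of the 15 effort multipliers."""
    if len(effort_multipliers) != 15:
        raise ValueError("expected 15 effort multipliers (RELY ... SCED)")
    a, b = MODE_COEFFS[mode]
    eaf = math.prod(effort_multipliers)  # effort adjustment factor
    return a * kdsi ** b * eaf

# Usage: a hypothetical 32-KDSI organic project with all multipliers nominal (1.0)
print(cocomo_effort(32, [1.0] * 15, "organic"))
```

With all multipliers at their nominal value of 1.0, the estimate reduces to a · SIZE^b; the ANN instead learns the mapping from the same sixteen inputs to effort directly from the data.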


Table 1: Input and Output variables
Input variables:
RELY - Required software reliability
DATA - Database size
CPLX - Product complexity
TIME - Execution time constraint
STOR - Main storage constraint
VIRT - Virtual machine volatility
TURN - Computer turnaround time
ACAP - Analyst capability
AEXP - Applications experience
PCAP - Programmer capability
VEXP - Virtual machine experience
LEXP - Language experience
MODP - Modern programming practices
TOOL - Use of software tools
SCED - Required development schedule
SIZE - Program size (KDSI)
Output variable:
DE - Development Effort

VI. NETWORK BUILDING AND TRAINING
The NN model is created using the MATLAB Neural Network Toolbox, which makes simulation and modeling straightforward. As discussed earlier, the sizes of the input and output vectors are fixed by the data. Trials are first conducted with a randomly selected number of processing elements. NN models are created with one hidden layer and a varying number of processing elements (neurons). According to White's theorem, one hidden layer with a nonlinear activation function is enough to map a nonlinear functional relationship fairly accurately. In the present work, the optimal network geometry was investigated by trial and error. To minimise the number of networks that had to be trained and tested, ANNs with 2 to 20 hidden nodes were considered to narrow down the search. Once this range was determined, the trial-and-error approach was repeated, increasing the number of hidden nodes in increments of two from the minimum. The learning rate was initially kept to a minimum and slowly increased, so that various combinations of both factors were used during training. Training was stopped after a fixed limit of 1000 cycles, or earlier when the average error fell below the target of 0.999999.
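The trial-and-error search over hidden-layer sizes described above amounts to a loop over candidate sizes in steps of two, keeping the size with the lowest error. In this sketch, `train_and_score` is a hypothetical callback standing in for training a network in the MATLAB toolbox and returning its validation MSE; the range defaults mirror the 2 to 20 nodes stated in the text.

```python
from typing import Callable, Dict, Tuple

def search_hidden_size(
    train_and_score: Callable[[int], float],
    lo: int = 2,
    hi: int = 20,
    step: int = 2,
) -> Tuple[int, Dict[int, float]]:
    """Try hidden-layer sizes lo, lo+step, ..., hi and keep the lowest-error one.

    train_and_score(n) is a hypothetical callback that trains a network with
    n hidden neurons and returns its validation MSE (lower is better).
    """
    scores = {n: train_and_score(n) for n in range(lo, hi + 1, step)}
    best = min(scores, key=scores.get)  # ties resolve to the smaller size
    return best, scores

# Usage with a stand-in error curve whose minimum lies at 8 hidden neurons
best, scores = search_hidden_size(lambda n: abs(n - 8) + 0.1)
print(best, scores[best])
```

Because each call to the callback is a full training run, the even-step grid keeps the number of trained networks at ten rather than nineteen.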
Finally, the optimum number of hidden nodes was found for the best network, and the networks on either side of it were also tested, followed by further training. Training is the process by which the weights of an ANN are estimated, using an iterative procedure that minimises a predetermined error or objective function, such as the MSE. ANN training is therefore essentially a nonlinear least-squares problem, which can be solved with standard nonlinear least-squares methods. As mentioned earlier, nine different training algorithms are investigated for developing the MLP network: LM = Levenberg-Marquardt; GD = Gradient Descent; GDA = Gradient Descent with adaptive learning rate; GDX = Gradient Descent with adaptive learning rate and momentum; GDM = Gradient Descent with momentum; SCG = Scaled Conjugate Gradient; RP = Resilient Backpropagation; BFG = BFGS quasi-Newton backpropagation; BR = Bayesian Regularization.
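To show what a Levenberg-Marquardt iteration does on a nonlinear least-squares problem, without reproducing MATLAB's `trainlm`, here is a sketch on a two-parameter problem: fitting a COCOMO-style power law effort = a · size^b. The function name, starting values, damping schedule and synthetic data are illustrative assumptions; a network trainer applies the same damped update to all of its weights at once.

```python
import math

def lm_fit_power_law(x, y, a=1.0, b=1.0, lam=1e-2, iters=500):
    """Fit y ~ a * x**b (all x > 0) by Levenberg-Marquardt (damped Gauss-Newton)."""
    def sse(a, b):
        return sum((yi - a * xi ** b) ** 2 for xi, yi in zip(x, y))

    cost = sse(a, b)
    for _ in range(iters):
        # Accumulate J^T J (2x2) and J^T r, with residuals r = y - f
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for xi, yi in zip(x, y):
            f = a * xi ** b
            j = (xi ** b, f * math.log(xi))  # df/da, df/db
            r = yi - f
            for p in range(2):
                jtr[p] += j[p] * r
                for q in range(2):
                    jtj[p][q] += j[p] * j[q]
        # Damped normal equations (J^T J + lam*I) delta = J^T r, solved by Cramer's rule
        m00, m01 = jtj[0][0] + lam, jtj[0][1]
        m10, m11 = jtj[1][0], jtj[1][1] + lam
        det = m00 * m11 - m01 * m10
        da = (jtr[0] * m11 - m01 * jtr[1]) / det
        db = (m00 * jtr[1] - m10 * jtr[0]) / det
        try:
            new_cost = sse(a + da, b + db)
        except OverflowError:
            new_cost = math.inf
        if new_cost < cost:   # accept: trust the quadratic model more
            a, b, cost = a + da, b + db, new_cost
            lam /= 10
        else:                 # reject: behave more like gradient descent
            lam *= 10
            if lam > 1e12:    # no further progress is possible
                break
    return a, b

# Usage on synthetic, noise-free data generated with a = 3.0, b = 1.1
x = [1, 2, 4, 8, 16, 32]
y = [3.0 * xi ** 1.1 for xi in x]
print(lm_fit_power_law(x, y))
```

The damping term lam blends the two regimes: a small value gives Gauss-Newton steps that converge quickly near the minimum, while a large value gives short gradient-descent steps that are safe far from it.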

MATLAB provides built-in transfer functions; the present work uses the tangent sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. The back-propagation algorithm has been used to train the feed-forward neural network.

Table 2: Model parameter values
Network type - Feed Forward Neural Network with Back Propagation
Training functions used - LM = Levenberg-Marquardt; GD = Gradient Descent; GDA = Gradient Descent with adaptive learning rate; GDX = Gradient Descent with adaptive learning rate and momentum; GDM = Gradient Descent with momentum; SCG = Scaled Conjugate Gradient; RP = Resilient Backpropagation; BFG = BFGS quasi-Newton backpropagation; BR = Bayesian Regularization
Adaption learning function - learnGDM and learnGD
Performance function - Mean Square Error (MSE); Regression (R)
Transfer function - hidden layer: tansigmoid; output layer: linear
No. of neurons used in the hidden layer - 2 to 10; in some cases up to 20
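Assuming the 16-8-1 topology reported later as the best model, a forward pass with a tangent sigmoid hidden layer and a linear output layer can be sketched in plain Python as follows. MATLAB's tansig is defined as 2/(1+exp(-2n))-1, which is mathematically equal to tanh, so `math.tanh` is used here; the helper names, the uniform initialization range and the input values are illustrative assumptions, not the toolbox's initialization scheme.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Hypothetical initialization: weights uniform in [-0.5, 0.5], zero biases."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def forward(x, hidden, output):
    """Forward pass: tangent sigmoid hidden layer, linear output layer."""
    w_h, b_h = hidden
    w_o, b_o = output
    # Hidden layer: tansig(n) = 2/(1+exp(-2n)) - 1, i.e. tanh(n)
    h = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
         for row, bi in zip(w_h, b_h)]
    # Output layer: linear (identity) transfer function
    return [sum(wk * hk for wk, hk in zip(row, h)) + bk
            for row, bk in zip(w_o, b_o)]

rng = random.Random(0)
hidden = init_layer(16, 8, rng)  # 16 inputs -> 8 hidden neurons
output = init_layer(8, 1, rng)   # 8 hidden neurons -> 1 output (effort)
x = [1.0] * 15 + [0.5]           # hypothetical scaled inputs: 15 multipliers + SIZE
print(forward(x, hidden, output))
```

The linear output layer lets the network produce effort values outside the (-1, 1) range of the hidden units, which is why a sigmoid is not used at the output.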

Once the network structure is finalized and the data required by the NN model have been converted into a suitable format, training, testing and validation of the NN model can begin; once training is complete, the weights are frozen. By default, the MATLAB NN Toolbox divides the available data into 70 percent for training, 15 percent for testing and 15 percent for validation. Training is the only phase in which data are back-propagated through the network; during recall, the network is strictly feed-forward. The parameters used for training the network are given in Table 2.

VII. PERFORMANCE MEASURING CRITERIA
Performance criteria that measure prediction accuracy generally measure the fit (or lack thereof) between the model outputs and the observed data by some error measure E.

(1) Mean Squared Error (MSE): MSE evaluates the residual between measured and forecast values. It is a frequently used measure of the difference between the values predicted by a model or estimator and the values actually observed for the quantity being modelled or estimated. Theoretically, a value of zero would indicate a perfect fit, which is not attainable in practice.

(2) Regression (R): R indicates the strength of the linear relationship between the observed and the computed values. A value close to 1.0 indicates good model performance. It can be calculated with the following formula, where O_i and P_i are the observed and computed values, \bar{O} and \bar{P} their means, and n the number of data points:

$$R = \frac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2} \, \sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2}}$$

(3) Magnitude of Relative Error (MRE): it is defined as

$$\mathrm{MRE} = \frac{\lvert \text{Actual Effort} - \text{Predicted Effort} \rvert}{\text{Actual Effort}} \times 100$$

A high score indicates worse prediction accuracy. Here it is assumed that the error is proportional to the size of the project.

VIII. RESULTS AND DISCUSSIONS
The model performance on the training, testing and validation data for each number of neurons in the hidden layer was calculated and plotted, as shown in Figures 2 to 7. Table 3 compares the estimated and predicted effort values for 10 randomly selected projects using the COCOMO and ANN methodologies, and Table 4 tabulates the Magnitude of Relative Error (MRE) values for both the COCOMO and ANN models. The same results are plotted in Figure 7, which shows the effort prediction accuracy of the neural network. For several projects the chart shows a marked decrease in the ANN's relative error compared with COCOMO, which suggests that the proposed model is better suited to effort estimation. The preliminary results suggest that the proposed architecture can be replicated for accurately forecasting software development effort. The aim of this study is to improve the estimation accuracy of the COCOMO model, so that the estimated effort is closer to the actual effort. Once the network has been trained to the level where the predicted results are fairly accurate, testing is carried out to ensure that the predicted results are close to the actual values. For the best NN model, 16-8-1, with eight neurons in the hidden layer trained with the Levenberg-Marquardt algorithm, the MSE plot for the best validation performance is shown in Figure 2(b), and the corresponding regression plot for training, testing and validation is shown in Figure 2(a). MSE is used to judge prediction accuracy during training, testing and validation. It was also noticed from Figure 2 that the performance did not improve further even when the network error was low.
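The three criteria of Section VII, together with a random 70/15/15 division of the data like the one the toolbox performs, can be sketched as follows. The function names and the rounding of subset sizes are assumptions; the toolbox's own division routine may round differently.

```python
import math
import random

def split_70_15_15(n, seed=0):
    """Randomly divide indices 0..n-1 into training, validation and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(0.70 * n)
    n_val = round(0.15 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def mse(actual, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def pearson_r(actual, predicted):
    """Regression value R: Pearson correlation of observed vs. computed values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)

def mre(actual, predicted):
    """Per-project MRE in percent: |actual - predicted| / actual * 100."""
    return [abs(a - p) / a * 100 for a, p in zip(actual, predicted)]

# Usage: the 63 COCOMO81 projects and some hypothetical effort values
train, val, test = split_70_15_15(63)
print(len(train), len(val), len(test))
print(mre([100.0, 50.0], [90.0, 60.0]))
```

Computing R separately on each subset gives the training, testing and validation values plotted against the number of neurons.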
It was noticed that after roughly 5 epochs the training error continued to decrease while the testing and validation performance remained largely stagnant; this effect is referred to as overfitting. Since the study of network topology showed the 16-8-1 NN model to be the most accurate, it is our choice of network topology. With this choice of topology (number of layers and processing elements), the generalization characteristics are preserved. Training time is also significantly reduced, as fewer iterations are needed each time. Figure 6 compares the actual values with the values predicted by the best selected NN model for effort estimation, using the LM training algorithm and the GDM training function for N=8. Except on rare occasions, the simulated effort values for the designated parameters are acceptably close to the actual values. This supports the conclusion that the Neural Network model attains high prediction accuracy once the training criteria are met, i.e., with the MSE within an acceptable range and an agreeable performance measure. Hence, it is inferred from the results that the performance of the NN model is acceptable. Further, inspection of the data and of the plots of R values for the training, testing and validation data sets in Figures 3, 4 and 5 shows that the best NN model uses the Levenberg-Marquardt training algorithm, followed in descending order of performance by BR, RP and finally SCG. Thus, the L-M algorithm appears better suited to training the NN model for better prediction accuracy.
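The overfitting pattern described above, where training error keeps falling while validation error stalls, is commonly handled by validation-based early stopping. The sketch below scans a precomputed list of per-epoch validation errors and stops once the error has not improved for `patience` consecutive epochs. The patience of 6 is an assumption chosen to mirror what we understand to be the toolbox's default number of validation checks, and the function name is hypothetical; in a real training loop the check runs after each epoch.

```python
def early_stopping(val_errors, patience=6):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation errors.

    Training stops once the validation error has failed to improve for
    `patience` consecutive epochs; the weights from best_epoch are kept.
    """
    best_epoch, best_err, fails = 0, float("inf"), 0
    stop_epoch = len(val_errors) - 1
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, fails = epoch, err, 0
        else:
            fails += 1
            if fails >= patience:
                stop_epoch = epoch
                break
    return best_epoch, stop_epoch

# Usage: hypothetical validation errors that bottom out at epoch 4, then rise
errs = [5.0, 4.0, 3.0, 2.5, 2.4, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2]
print(early_stopping(errs))
```

Keeping the weights from the best validation epoch, rather than the last one, is what prevents the later, overfitted epochs from degrading generalization.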


Figure 2(a): Regression analysis graph for N=4

Figure 2(b): Depicts the acceptable training performance

Figure 3: Regression (R) values for training data (R value vs. number of neurons for the LM-GDM, LM-GD, SCG-GDM, SCG-GD, BR-GDM, BR-GD and RP-GDM training combinations)

Figure 4: Regression (R) values for testing data (R value vs. number of neurons for the same training combinations)

Figure 5: Regression (R) values for validation data (R value vs. number of neurons for the LM-GDM, LM-GD, SCG-GDM, SCG-GD, BR-GDM, BR-GD and RP-GDM training combinations)

Figure 6: Comparative Plot of Actual, COCOMO and Predicted ANN Efforts

Table 3: Comparison of Effort Estimation MRE using COCOMO and ANN
Sl. No.   MRE using COCOMO   MRE using ANN
1         8.651814           0.00108
2         73.9111            98.0285
3         1.37749            0.048272
4         2.00825            0.06958
5         16.9394            20.8772
6         40.51163           72.82633
7         22.125             23.66643
8         41.41395           0.05777


Table 4: MRE Values for different Models
Sl. No.   MRE using COCOMO   MRE using ANN
1         8.651814           0.00108
2         73.9111            98.0285
3         1.37749            0.048272
4         2.00825            0.06958
5         16.9394            20.8772
6         40.51163           72.82633
7         22.125             23.66643
8         41.41395           0.05777
9         21.04728           53.33515
10        14.17757           95.84656

Figure 7: Plot of MRE values for COCOMO and ANN models

IX. CONCLUSION AND FUTURE WORK
A reliable and accurate estimate of software development effort has always been a challenge for both the software industry and the academic community. Several software effort forecasting models can be used to forecast future development effort. In this work, an effort estimation model based on artificial neural networks has been constructed. The idea is to map the COCOMO model onto a neural network with a minimal number of layers and nodes in order to increase the performance of the network. The neural network used to predict software development effort is a multilayer feed-forward neural network with the back-propagation training algorithm. The COCOMO81 dataset has been used to train and test the network, and it was observed that the neural network model provided better effort estimates than the COCOMO model. The rationalization and interpretation of the knowledge stored in the architecture and synaptic weights of the neural network are also very important for gaining practitioners' acceptance. Another advantage of this work is that expert knowledge, project data and the traditional algorithmic model can be combined in one general framework with a wide range of applicability in software cost estimation. This work can be extended by integrating the approach with fuzzy logic to deal effectively with the imprecise and uncertain information associated with COCOMO; a promising line of future work is therefore to extend it to a neuro-fuzzy approach.

REFERENCES
Boehm, B.W., Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, USA, 1994.
Tadayon, N., Neural network approach for software cost estimation, International Conference on Information Technology: Coding and Computing (ITCC 2005), Volume 2, pp. 815-818, 2005.
Chiu, N.H., Huang, S.J., The Adjusted Analogy-Based Software Effort Estimation Based on Similarity Distances, Journal of Systems and Software, Volume 80, Issue 4, pp. 628-640, 2007.
Subramanian, G.H., Pendharkar, P.C., Wallace, M., An Empirical Study of the Effect of Complexity, Platform, and Program Type on Software Development Effort of Business Applications, Empirical Software Engineering, Volume 11, pp. 541-553, 2006.
Heiat, A., Comparison of Artificial Neural Network and Regression Models for Estimating Software Development Effort, Journal of Information and Software Technology, Volume 44, Issue 15, pp. 911-922, 2002.
Huang, S.J., Lin, C.Y., Chiu, N.H., Fuzzy Decision Tree Approach for Embedding Risk Assessment Information into Software Cost Estimation Model, Journal of Information Science and Engineering, Volume 22, Number 2, pp. 297-313, 2006.
Hughes, R.T., An evaluation of machine learning techniques for software effort estimation, University of Brighton, 1996.
Jeffery, R., Ruhe, M., Wieczorek, I., Using Public Domain Metrics to Estimate Software Development Effort, Proceedings of the 7th International Symposium on Software Metrics, IEEE Computer Society, Washington, DC, pp. 16-27, 2001.
Jørgensen, M., Sjøberg, D.I.K., The Impact of Customer Expectation on Software Development Effort Estimates, International Journal of Project Management, Elsevier, pp. 317-325, 2004.
Jørgensen, M., A Review of Studies on Expert Estimation of Software Development Effort, Journal of Systems and Software, Volume 70, pp. 37-60, 2004.
Jørgensen, M., Experience with accuracy of software maintenance task effort prediction models, IEEE Transactions on Software Engineering, Volume 21(8), pp. 674-681, 1995.
Srinivasan, K., Fisher, D., Machine learning approaches to estimating software development effort, IEEE Transactions on Software Engineering, Volume 21, pp. 126-137, 1995.
Prasad Reddy, P.V.G.D., Hari, CH.V.M.K., Srinivasa Rao, T., Multi Objective Particle Swarm Optimization for Software Cost Estimation, International Journal of Computer Applications (0975-8887), Volume 32, No. 3, October 2011.
Prasad Reddy, P.V.G.D., Hari, CH.V.M.K., A Fine parameter tuning for COCOMO 81 software effort estimation using Particle swarm optimization.
A. Iman and H.O. Siew, Soft Computing Approach for Software Cost Estimation, International Journal of Software Engineering (IJSE), Volume 3, No. 1, pp. 1-10, January 2010.
Srinivasa Rao et al., Predictive and Stochastic Approach for Software Effort Estimation, International Journal of Software Engineering (IJSE), Volume 6, No. 1, January 2013.
Peram Subba Rao, Dr. K. Venkata Rao and Dr. P. Suresh Varma, A Novel Software Interval Type-2 Fuzzy Effort Estimation Model using S-Fuzzy Controller With Mean and Standard Deviation, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, pp. 477-490, 2013, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
S. HemaChandra and Dr. R.V.S. Satyanarayana, Temperature Control of Transformers using Soft Computing Techniques, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, pp. 133-137, 2012, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
