
Information and Software Technology 63 (2015) 44–57

Contents lists available at ScienceDirect

Information and Software Technology


journal homepage: www.elsevier.com/locate/infsof

A fuzzy logic based approach for phase-wise software defects prediction using software metrics
Harikesh Bahadur Yadav, Dilip Kumar Yadav ⇑
Department of Computer Applications, National Institute of Technology, Jamshedpur 831014, India

Article info

Article history:
Received 29 August 2014
Received in revised form 24 January 2015
Accepted 2 March 2015
Available online 19 March 2015

Keywords:
Software defect
Software defect density indicator
Software metrics
Fuzzy logic
Software reliability

Abstract

Context: Software defect prediction during software development has recently attracted the attention of many researchers. Predicting the software defect density indicator in each phase of the software development life cycle (SDLC) is desirable for developing a reliable software product. Defect prediction at the end of the testing phase may be of limited benefit, because the changes that then need to be made in the earlier phases of the SDLC may require a huge amount of money and effort in order to achieve the target software quality. Therefore, a phase-wise software defect density indicator prediction model is of great importance.
Objective: In this paper, a fuzzy logic based phase-wise software defect prediction model is proposed using the top most reliability relevant metrics of each phase of the SDLC.
Method: In the proposed model, the defect density indicator in the requirement analysis, design, coding and testing phases is predicted using nine software metrics of these four phases. The defect density indicator predicted at the end of each phase is also taken as an input to the next phase. Software metrics are assessed in linguistic terms and a fuzzy inference system has been employed to develop the model.
Results: The predictive accuracy of the proposed model is validated using data from twenty real software projects. Validation results are satisfactory. Measures based on the mean magnitude of relative error and the balanced mean magnitude of relative error decrease significantly as the software project size increases.
Conclusion: In this paper, a fuzzy logic based model is proposed for predicting the software defect density indicator at each phase of the SDLC. The predicted defects of twenty different software projects are found to be very near to the actual defects detected during testing. The predicted defect density indicators are very helpful for analyzing the defect severity in different artifacts of the SDLC of a software project.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Nowadays, people work under the direct or indirect influence of software. Human dependence on software has increased over the last three decades. Software reliability and quality modeling is essential because software is used in diverse areas of application. Various historical events illustrate the effect of software failures encountered around the world [1]. The consequences of software failures may include monetary and human losses. Therefore, software quality and reliability prediction is unavoidable and has become a major research area. A general method to measure the quality of software is to reveal the presence of defects in it, and the metric usually used for this is software defect density. The software defect density is defined as the total number of defects divided by the size of the software. A defect can also be defined as a product anomaly [2]. Software reliability is the probability that software will not cause the failure of a system for a specified period of time under specified conditions. The probability is a function of the inputs to, and use of, the system as well as a function of the existence of faults in the software. The inputs to the system determine whether existing faults, if any, are encountered. Software reliability models were designed to quantify the likelihood of software failure. A failure is defined as the termination of the ability of a functional unit to perform its required function. Alternatively, a failure can be defined as an event in which a system or system component does not perform a required function within specified limits [3].
Software defect density prediction plays an important role in producing reliable software. In order to achieve the target defect estimate, it is required to predict the defect density indicator at the end of each phase of the SDLC. Numerous models have been proposed for estimation and prediction of software reliability in the

⇑ Corresponding author. Tel.: +91 9931897599.
E-mail addresses: yadavaharikesh@gmail.com (H.B. Yadav), dkyadav1@gmail.com (D.K. Yadav).

http://dx.doi.org/10.1016/j.infsof.2015.03.001
0950-5849/© 2015 Elsevier B.V. All rights reserved.
H.B. Yadav, D.K. Yadav / Information and Software Technology 63 (2015) 44–57 45

past three decades. It is observed that traditional models for software reliability prediction are neither universally successful in predicting the reliability of the software nor generally tractable to users [4]. The majority of models are based on a probabilistic approach. Failure data are not available in the early phases of the SDLC. However, failure information during the early phases of the SDLC is available in the form of expert knowledge, which may be reflected in terms of software metrics [5]. In fact, most software metrics are associated with uncertainty. The small size of software testing data, unrealistic assumptions, and the fact that some measures cannot be defined precisely are the key reasons that a fuzzy logic based approach should be considered for predicting software defects. The term defect density indicator (DDI) is used in the proposed model to stand for the defect density at the end of each phase of the SDLC. Therefore, in this paper, a fuzzy logic based phase-wise software defect prediction model is proposed using the reliability relevant metrics of each phase of the SDLC.
The rest of the paper is organized as follows: In Section 2, related work is discussed. In Section 3, the proposed methodology is presented. Sections 4 and 5 describe the twenty case studies and the predicted results, respectively. In Section 6, model validation is described. Sensitivity analysis is discussed in Section 7. The conclusion is presented in Section 8.

2. Related work

Software development is performed by human beings and is measured in terms of man-hours. The man-hour is of fuzzy nature. Therefore, failure-free software development is a challenging task. It is essential to ensure that the underlying software will perform its intended functions correctly. Therefore, there is a growing need to ensure the reliability of software systems as early as possible.
Numerous models have been proposed for estimation and prediction of software reliability [6,7]. The Air Force's Rome Laboratory [8] developed a model for early software reliability prediction. In this model, they selected some factors that are related to the fault density in the requirement analysis, design, coding, and testing phases. The model is mainly based on the software requirement specification document. Agresti and Evanco [9] proposed a model to predict the defect density on the basis of process and product characteristics. This model uses multivariate regression analysis for defect density prediction. In a similar study, Wohlin and Runeson [10] proposed two novel methods to estimate the number of remaining defects based on review information. Conclusions about the remaining number of defects are then drawn after reviews. Smidts et al. [11] proposed a software reliability prediction model based on the requirements of the software and failure modes. The input of this model is failure data; however, these data would not be available during early phases unless similar projects had been executed. Early reliability assessment of UML based models starts with analysis of the unified modeling language model of the software architecture, followed by a Bayesian framework for reliability prediction. However, this model assumes that failures of components are independent of each other. This means that software coding languages allowing shared state information are not eligible for this method [12]. A phase based model for predicting reliability was proposed by Gaffney and Davis [13,14]. The model is mainly based on the fault statistics found during the review of various software development phases. In short, these traditional models for defect prediction are organization specific and not flexible.
The main factor influencing the number of residual defects in software is the size of the software [15,16]. Residual defects of software can be predicted by using only a software complexity or software size metric [17]. According to Fenton and Neil [18], most software defect prediction models use size and complexity metrics to predict the residual defects. Software size metrics measure the intrinsic complexity of the software [19]. Therefore, it is better to use size and other software metrics for prediction of the residual defects. A model for defect prediction with Bayesian nets was developed by Fenton et al. [20]. Its main feature is that it does not require detailed domain knowledge and it combines both qualitative and quantitative data. Mohanta et al. [21,22] proposed a model to predict the reliability of object-oriented systems during the early stages of product development based on a bottom-up approach. In this approach, the reliability of the system is estimated based on the operational profile and the reliabilities of classes. Okutan and Yildiz [23] proposed a novel method using Bayesian networks to explore the relationships among software metrics and defect proneness. Dejaeger et al. [24] compared 15 different Bayesian network classifiers with well-known defect estimation methods on 11 data sets. They concluded that simple and comprehensible networks with fewer nodes can be constructed using the Bayesian network.
Zhang and Pham [25] suggested thirty-two factors which have an impact on software reliability in all stages of the software development process. A similar study was conducted by Li et al. [26,27], who performed a phase-wise ranking of software metrics which influence software reliability. These software metrics were ranked with respect to their ability to predict software reliability through an expert opinion elicitation process.
Catal and Diri [28,29] provided a systematic review of various software fault prediction studies with a focus on metrics, methods, and datasets. Radjenovic et al. [30] reported that process metrics are more successful in finding faults compared to traditional size and complexity metrics. Hall et al. [31] found that out of 208 studies only 36 are useful and the majority are less useful than they could be. This makes it difficult for a software development team to select a model that matches their context.
Pandey and Goyal [32] proposed an early fault prediction model using process maturity and software metrics. They considered the fuzzy profiles of various metrics on different scales and did not explain the criteria used for developing these fuzzy profiles. Yadav et al. [33] proposed a software defect prediction model in which they considered only the uncertainty associated with the assessment of the software size metric and three metrics of the requirement analysis phase. Recently, Pandey and Goyal [34] developed a multistage model for residual fault prediction. They considered 10 software metrics as input to the model. Maa et al. [35] analyzed the ability of requirement metrics to predict software defects during the design phase.
On the basis of the above literature survey and review, it is found that software reliability is a function of the number of residual defects in the software. Reliability relevant software metrics play a vital role in defect prediction and these metrics are of fuzzy nature. Therefore, in this research paper, a fuzzy logic based model for phase-wise software defect density prediction is developed using the reliability relevant software metrics.

3. Proposed methodology

The architecture of the proposed model is shown in Fig. 1. In the proposed model, the defect density indicator in the first four phases of the SDLC is predicted based on the measures present in the first four phases of the SDLC. Therefore, the proposed model leverages the top reliability relevant metrics [27] of the requirements analysis, design, coding and testing phases of the SDLC. Metrics are denoted using elliptical structures in the proposed model. Requirements stability,

Fig. 1. Proposed model architecture.

requirement fault density, and review, inspection, and walkthrough software metrics have been considered as input in the requirements analysis phase to predict the defect density indicator at the end of the requirement analysis phase. The requirement analysis phase defect density indicator (RPDDI), cyclomatic complexity, and design review effectiveness software metrics have been considered as input in the design phase to predict the defect density indicator at the end of the design phase. The design phase defect density indicator (DPDDI), programmer capability, and process maturity software metrics have been considered as input in the coding phase to predict the defect density indicator at the end of the coding phase. Similarly, the coding phase defect density indicator (CPDDI), staff experience, and quality of documented test cases have been considered as input in the testing phase to predict the defect density indicator at the end of the testing phase. The following steps are involved in the proposed model.

A. Selection of software metrics.
B. Define membership function of input and output metrics.
C. Design fuzzy rules.
D. Perform fuzzy inference and defuzzification.

3.1. Selection of software metrics

A number of software defect prediction models using software metrics have been proposed in the last two decades. The prediction of defects from these models may be useful for reliable software development. Almost all existing defect prediction models have considered a considerable number of software metrics, such as traditional software metrics, object oriented software metrics, and process metrics [28–30]. However, predicting software defects by taking all the software metrics (traditional, object oriented and process metrics) has the following drawbacks: it is computationally complex, the processing cost is higher, many of the software metrics are less important, the software metrics are correlated with one another, and the time complexity increases. A prior study shows that the right selection of metrics plays a vital role in improving defect prediction [36]. The right selection of software metrics could improve the prediction accuracy.
Predicting the reliability of a software system considering all the metrics which become available throughout the phases of the software life cycle is almost impossible. However, it is essential to consider the metrics which are most important from the reliability point of view. In relation to this issue, Li and Smidts [27] selected thirty software metrics which influence software reliability. These software metrics were ranked with respect to their ability to predict software reliability through an expert opinion elicitation process. The top three software metrics applicable in four different phases (requirements analysis, design, coding and testing) of the software life cycle, which are the major contributors to software reliability, are shown in Table 1.
Therefore, in the proposed model, requirement fault density, requirement stability, and review, inspection and walkthrough software metrics have been considered in the requirement analysis phase. In the design phase, the design defect density metric depends upon the review in the design phase. Therefore, design review effectiveness and cyclomatic complexity metrics have been considered in the design phase in the proposed model. In the coding phase, the code defect density metric depends upon the programmer capability and process maturity. Therefore, programmer capability and process maturity metrics have been considered in the coding phase in the proposed model. In the testing phase, the testing failure rate metric and the coverage factor metric depend upon the experience of the testing team and the quality of documented test cases, respectively. Therefore, staff experience and quality of documented test cases metrics have been considered in the testing phase in the proposed model. Metrics considered in the proposed model are explained as follows:

Table 1
Top three software metrics present in first four phases of the SDLC [27].

Sl no | Requirements phase | Design phase | Coding phase | Testing phase
1 | Fault density | Design defect density | Code defect density | Testing failure rate
2 | Requirements change requests | Cyclomatic complexity | Design defect density | Code defect density
3 | Reviews, inspections and walkthroughs | Fault density | Cyclomatic complexity | Coverage factor
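
The phase-wise chaining described at the start of Section 3 (each phase's defect density indicator becomes an input to the next phase) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the function name `predict_phase_wise` and the `infer` callback are hypothetical, and the callback stands in for whatever fuzzy inference system is used for a given phase. The metric abbreviations are the ones used in the paper.

```python
from typing import Callable, Dict, List, Tuple

# Output indicator and phase-specific input metrics for each phase, in SDLC order.
PHASES: List[Tuple[str, List[str]]] = [
    ("RPDDI", ["RS", "RFD", "RIW"]),  # requirements analysis
    ("DPDDI", ["CC", "DRE"]),         # design
    ("CPDDI", ["PC", "PM"]),          # coding
    ("TPDDI", ["SE", "QDT"]),         # testing
]


def predict_phase_wise(
    metrics: Dict[str, float],
    infer: Callable[[str, Dict[str, float]], float],
) -> Dict[str, float]:
    """Run the four phases in order, feeding each DDI into the next phase.

    `infer(output_name, inputs)` is a hypothetical stand-in for the
    per-phase fuzzy inference system; it returns a value in [0, 1].
    """
    results: Dict[str, float] = {}
    previous = None  # (name, value) of the DDI from the preceding phase
    for output_name, input_names in PHASES:
        inputs = {name: metrics[name] for name in input_names}
        if previous is not None:
            inputs[previous[0]] = previous[1]
        value = infer(output_name, inputs)
        results[output_name] = value
        previous = (output_name, value)
    return results
```

Passing the inference step in as a callback keeps the chaining logic separate from the membership functions and rule base, which the paper defines with the help of domain experts.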

3.1.1. Requirement phase software metrics

i. Requirement Stability (RS): Requirement stability is inversely proportional to the number of requirement change requests. There are many ways of gathering requirements. The main problem with requirement gathering is that stakeholders (user, customer, developer, and project manager) are not very clear about their requirements. Requirement changes may happen at any time during software project development. Requirements changes are of two types: controlled and uncontrolled. Controlled requirements changes may take place to enhance the features of the software system or to respond to changing customer needs and other factors. These requirements changes may be essential for the adaptation of a system to changes that occur either in hardware or software. Uncontrolled requirements changes may have an adverse effect on the cost, quality, reliability, and schedule of the project under development. Studies have shown that more than half of the errors that occur during software development are due to imprecisely defined requirements [37].
ii. Requirement Fault Density (RFD): This metric measures the fraction of faulty requirements specification documents. Requirement fault density provides an indicator of the quality of the software under development during the requirement analysis phase. It is a side effect of requirement engineering. Requirement faults may range from a simple fault to a complex fault that may impact a large segment of the description.
iii. Review, Inspection and Walkthrough (RIW): This metric purifies the software product and can be applied at various points during software project development. The software developer and the customer both actively participate in the review of the software requirements specification (SRS) document. The review is conducted at various levels. The goal of the review process is to ensure that the SRS is feasible, complete, consistent and accurate. From the quality point of view, it is a very important metric.

3.1.2. Design phase software metrics

i. Cyclomatic Complexity (CC): It can be used to indicate an upper bound on the model for estimating the number of remaining software defects [38]. Software complexity depends on the decision points. It is intuitive that a software program with a larger number of decision points is likely to be more complex. The McCabe cyclomatic complexity software metric is used to calculate the logical complexity of a program by counting the decision points [39]. The cyclomatic complexity V(G) of a control flow graph is obtained using Eq. (1).

$$V(G) = e - n + 2p \tag{1}$$

where e is the number of arcs in the flow graph, n is the number of nodes in the flow graph and p is the number of connected components.
ii. Design Review Effectiveness (DRE): Design defects are usually found by the design review process during software project development. The goal of the design review is to make sure that the design document meets the stakeholders' requirements or to find whether the design requires modification. If the modifications required are few, then that is not a big problem. However, if the modifications required are many, then a follow-up meeting by the moderator or a re-review might be essential to verify whether the changes have been incorporated properly. This has more influence than the code inspection metric. This metric is very important for software quality assurance.

3.1.3. Coding phase software metrics

i. Programmer Capability (PC): Programmer capability depends on the education, experience, intelligence and domain knowledge of the programmer. An experienced programmer with a sound technical background will develop reliable software with fewer defects.
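
As a small worked illustration of Eq. (1), the following Python sketch computes V(G) from an edge list. The function name `cyclomatic_complexity` and the edge-list representation are assumptions made for this example only; the number of connected components p is supplied by the caller rather than computed.

```python
def cyclomatic_complexity(edges, num_components=1):
    """Return V(G) = e - n + 2p for a control flow graph given as (src, dst) pairs.

    `num_components` is p in Eq. (1); it is 1 for a single routine.
    Nodes are inferred from the edge list, so isolated nodes are not counted.
    """
    nodes = {vertex for edge in edges for vertex in edge}
    return len(edges) - len(nodes) + 2 * num_components


# A single if/else: decision node A branches to B or C, both rejoin at D.
if_else_graph = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(cyclomatic_complexity(if_else_graph))  # e = 4, n = 4, p = 1
```

For the if/else graph, e = 4, n = 4 and p = 1, so V(G) = 4 - 4 + 2 = 2, which matches the intuition of one decision point plus one.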

Fig. 2. Process of fuzzy inference and defuzzification.



Table 2
Assessment of software projects.

Case study no. Project # [20] Size (KLOC) RS RFD RIW CC DRE PC PM SE QDT
1 1 6 L H VH M H H H H H
2 2 0.9 H H VH L H H H H H
3 3 53.9 H VH VH H H VH VH H H
4 7 21 M L VH L H VH VH M H
5 8 5.8 H L H M M H H M M
6 9 2.5 VH M VH L VH VH VH VH H
7 10 4.8 H M H M H H H M M
8 11 4.4 H H H H H H M H M
9 12 19 L M H H M M H H M
10 13 49.1 L H M H H H M M M
11 15 154 VL VH H H H H H H M
12 16 26.7 M H H L H H H H M
13 17 33 M H M L H M M L H
14 19 87 M H H H H H H M H
15 20 50 VL M M VH L VL H VL H
16 21 22 M M H L H H H H H
17 22 44 L M M M L M H M H
18 24 99 L H M M H H H M M
19 29 11 VH M VH M H VH H VH H
20 30 1 VH M VH L H H H H H

ii. Process Maturity (PM): In a software company, the capability maturity model (CMM) plays a key role in defining software development process improvement. CMM has five process maturity levels:
Level 1: Initial
Level 2: Repeatable
Level 3: Defined
Level 4: Managed
Level 5: Optimizing

Software defect density reduces as one proceeds from one CMM level to the next [40]. The process maturity of the software development process is an important software metric for prediction of residual defects [18]. There is evidence that higher level companies usually deliver higher quality software products than lower level companies [18,40,41].

3.1.4. Testing phase software metrics

i. Staff Experience (SE): Testing staff with a sound technical background and experience have a great impact on the test quality. Staff involved in software testing are destructive in nature and try their best to find software defects. Therefore, the skills and experience of the test team have a great impact on the software quality.
ii. Quality of Documented Test Cases (QDT): Software testing is costly and time consuming; therefore effective test cases need to be developed. Software test cases are specifications of the inputs to the test and the expected output from the system, plus a statement of what is being tested. The test cases are designed to expose defects. A good test case is one that has a high probability of exposing software defects.

Table 3
Requirement analysis phase software metrics.

Role | Metric | Fuzzy range | Linguistic terms
Input | Requirement Stability (RS) | {0–1} | {L, M, H}
Input | Requirement Fault Density (RFD) | {0–1} | {L, M, H}
Input | Review, Inspection and Walkthrough (RIW) | {0–1} | {L, M, H}
Output | Requirement Phase Defect Density Indicator (RPDDI) | {0–1} | {VL, L, M, H, VH}

Table 4
Design phase software metrics.

Role | Metric | Fuzzy range | Linguistic terms
Input | Cyclomatic Complexity (CC) | {0–1} | {L, M, H}
Input | Design Review Effectiveness (DRE) | {0–1} | {L, M, H}
Input | Requirement Phase Defect Density Indicator (RPDDI) | {0–1} | {VL, L, M, H, VH}
Output | Design Phase Defect Density Indicator (DPDDI) | {0–1} | {VL, L, M, H, VH}

Table 5
Coding phase software metrics.

Role | Metric | Fuzzy range | Linguistic terms
Input | Programmer Capability (PC) | {0–1} | {L, M, H}
Input | Process Maturity (PM) | {0–1} | {L, M, H}
Input | Design Phase Defect Density Indicator (DPDDI) | {0–1} | {VL, L, M, H, VH}
Output | Coding Phase Defect Density Indicator (CPDDI) | {0–1} | {VL, L, M, H, VH}

Table 6
Testing phase software metrics.

Role | Metric | Fuzzy range | Linguistic terms
Input | Staff Experience (SE) | {0–1} | {L, M, H}
Input | Quality of Documented Test Cases (QDT) | {0–1} | {L, M, H}
Input | Coding Phase Defect Density Indicator (CPDDI) | {0–1} | {VL, L, M, H, VH}
Output | Testing Phase Defect Density Indicator (TPDDI) | {0–1} | {VL, L, M, H, VH}

VL: Very Low, L: Low, M: Medium, H: High, VH: Very High.

3.2. Define membership function of input and output metrics

Membership functions can be generated either with the help of a domain expert or from real data. The construction of membership functions is very important because the success of a method depends on the membership functions used. Therefore, it is necessary to explain how the membership functions are derived. There are

Fig. 3. Requirement stability.

Fig. 4. Requirement fault density.

Fig. 5. Review, inspection and walkthrough.

no standard guidelines or rules that can be used to choose an appropriate membership function construction technique. Another problem that makes membership function construction an important task is the lack of consensus on the definition and interpretation of membership functions. Membership functions for all the input and output metrics which are considered in the proposed model should be defined by domain experts. Developing a membership function with the help of domain expert knowledge is one of the basic steps in the design of a problem which is to be solved by fuzzy set theory.
Membership functions can have a variety of shapes, such as polygonal, trapezoidal, triangular, and so on [42]. However, triangular and trapezoidal shapes provide a convenient representation of domain expert knowledge and also simplify the process of computation. Triangular and trapezoidal membership functions are piecewise linear in nature. Further, triangular and trapezoidal membership functions are more suitable when the precise membership functions of a fuzzy set are not known [43,44]. Therefore, in this model triangular and trapezoidal membership functions are considered for representing the linguistic states. In the proposed model,

Fig. 6. Requirement phase defect density indicator.

Fig. 7. Cyclomatic complexity.

Fig. 8. Design review effectiveness.

membership functions of all the input and output metrics are defined with the help of domain experts.

3.3. Design fuzzy rules

In this step, each fuzzy rule is defined in the form of an IF-THEN conditional statement. The IF part of the rule is known as the antecedent, and the THEN part is the consequent [45]. The fuzzy rule base can be designed from different sources such as domain experts, historical data analysis, and knowledge engineering from existing literature [25,27]. In general, fuzzy rules are designed with the help of domain experts. In the proposed model, the fuzzy rules that are required for the prediction of defects of software projects are defined with the help of a domain expert. Considering all the selected software metrics at one time would require defining a large number of rules for the prediction of software defects. Therefore, software metrics

Fig. 9. Design phase defect density indicator.

Fig. 10. Programmer capability.

Fig. 11. Process maturity.

involved from phase to phase are considered in the proposed model. This reduces the required number of rules.

3.4. Perform fuzzy inference and defuzzification

The fuzzy inference engine evaluates and combines the result of each fuzzy rule. The fuzzy inference engine maps a fuzzy set into a fuzzy set. A fuzzy max–min operator is used for this step. In many applications, a crisp value needs to be obtained as the output. A defuzzification method such as centroid, max–min or bisection maps a fuzzy set into a crisp value. The centroid method of defuzzification (also known as center of area or center of gravity) is used to calculate the value of z* in the proposed model. This method is the most common and physically appealing of all the defuzzification methods [42]. The process of fuzzy inference and defuzzification [46] is shown in Fig. 2.
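
The steps of Sections 3.2 to 3.4 can be sketched end to end in Python for the requirement analysis phase. This is a minimal sketch under loudly stated assumptions, not the authors' implementation: the triangular set parameters in `IN_SETS` and `OUT_SETS` are hypothetical (the expert-defined shapes of Figs. 3–15 are not available in the text), only the four rules that are legible in Table 7 are included, and the centroid z* is approximated on a discrete grid. The names `tri` and `infer_rpddi` are introduced here for illustration.

```python
# Hypothetical triangular sets (a, b, c) on the normalized range [0, 1].
IN_SETS = {"L": (0.0, 0.0, 0.5), "M": (0.0, 0.5, 1.0), "H": (0.5, 1.0, 1.0)}
OUT_SETS = {
    "VL": (0.0, 0.0, 0.25), "L": (0.0, 0.25, 0.5), "M": (0.25, 0.5, 0.75),
    "H": (0.5, 0.75, 1.0), "VH": (0.75, 1.0, 1.0),
}

# The four rules shown in Table 7: (RS, RFD, RIW) -> RPDDI.
# The remaining 23 rules are elided in the source and are not guessed here.
RULES = [
    (("L", "L", "L"), "L"),
    (("L", "L", "M"), "VL"),
    (("H", "H", "M"), "L"),
    (("H", "H", "H"), "VL"),
]


def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def infer_rpddi(rs, rfd, riw, steps=1000):
    """Max-min inference followed by a discretized centroid; None if no rule fires."""
    inputs = (rs, rfd, riw)
    fired = []
    for terms, out_term in RULES:
        # AND of the antecedents is taken as the minimum membership.
        strength = min(tri(x, *IN_SETS[t]) for x, t in zip(inputs, terms))
        if strength > 0.0:
            fired.append((strength, out_term))
    if not fired:
        return None
    num = den = 0.0
    for i in range(steps + 1):
        z = i / steps
        # Clip each consequent at its rule strength, aggregate with max.
        mu = max(min(s, tri(z, *OUT_SETS[t])) for s, t in fired)
        num += z * mu
        den += mu
    return num / den  # approximate centroid z*
```

With these assumed sets, inputs of (0, 0, 0) fire only rule 1 at full strength, so the result is the centroid of the symmetric L output set, approximately 0.25. Because only four rules are present, many input combinations fire no rule and return `None`; a complete rule base would cover them.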

Fig. 12. Coding phase defect density indicator.

Fig. 13. Staff experience.

Fig. 14. Quality of documented test cases.

4. Case studies

4.1. Data set used

In order to validate the proposed model, data sets from twenty real software projects are taken from [20] for the case studies and reproduced in Table 2, where the qualitative values of the considered software metrics are represented in terms of Very Low (VL), Low (L), Medium (M), High (H) and Very High (VH).

4.2. Model illustration: case study 1

In this case study, software project #1 [20] has been considered to explain the proposed approach. Following are the steps for

Fig. 15. Testing phase defect density indicator.

finding the defect density indicator and total number of defects for case study 1.

4.2.1. Selection of software metrics

The fuzzy range of a software metric varies from project to project for the same software metric. For example, the cyclomatic complexity metric has a fuzzy range of 1–45, 1–180, 1–96, and 1–470 for the KC1, KC2, CM1, and JM1 data sets, respectively. However, the maximum value of cyclomatic complexity for these data sets lies below ten [2,50]. Moreover, the software metrics considered in the proposed model have different fuzzy ranges. Therefore, the fuzzy range of the software metrics considered in the proposed model has been represented in normalized form, i.e., in the range [0, 1]. The normalized fuzzy range is obtained by Eq. (2).

$$\text{Normalized fuzzy range} = \left[\frac{\text{Minimum Value} - \text{Minimum Value}}{\text{Maximum Value} - \text{Minimum Value}},\ \frac{\text{Maximum Value} - \text{Minimum Value}}{\text{Maximum Value} - \text{Minimum Value}}\right] \tag{2}$$

Software metrics available in each phase of the SDLC provide the initial explanation of the performance of the software. Therefore, the selected software metrics and their fuzzy ranges and linguistic states are shown in Tables 3–6.

4.2.2. Define membership function of input and output variable

The software metric values are fuzzy in nature at each phase of the SDLC. A membership function is used to describe the fuzziness of software metrics. There are many methods for assigning membership values to fuzzy variables, such as intuition, inference, rank ordering, angular fuzzy sets, neural networks, and genetic algorithms. In the intuitive method, the membership value is assigned based on human intelligence and its capability of thinking. In the proposed method, the membership functions are developed with the help of a domain expert. The proposed model consists of a set of input and output software metrics. The membership functions for each input and output software metric are developed using triangular and trapezoidal functions, which are illustrated in Figs. 3–15. These membership functions affect both the performance and predictability of the proposed approach.

4.2.3. Design fuzzy rules

The fuzzy rules for case study 1 are shown phase wise in Tables 7–10.

i. Requirements analysis phase fuzzy rule: If RS is higher, the defects will be fewer, and if RFD is higher, the defects will be more, but this is not applicable for RIW. In the proposed model, there are three input metrics in the requirement analysis phase. Each input metric has three linguistic states, i.e., low (L), medium (M) and high (H). Therefore, the total number of rules is 3 × 3 × 3 = 27. These fuzzy rules are given in Table 7.

Table 7
Requirement analysis phase fuzzy rule.

Rule no. | Fuzzy rule
1 | If RS is L and RFD is L and RIW is L then RPDDI is L
2 | If RS is L and RFD is L and RIW is M then RPDDI is VL
... | ...
26 | If RS is H and RFD is H and RIW is M then RPDDI is L
27 | If RS is H and RFD is H and RIW is H then RPDDI is VL

ii. Design phase fuzzy rule: For a lower value of CC, the defects will be fewer, but for a lower value of DRE, the defects will be more. In the proposed model, there are three input metrics in the design phase. Two input metrics have three linguistic states

Table 8
Design phase fuzzy rule.

Rule no.  Fuzzy rule
1   If CC is L and DRE is L and RPDDI is VL then DPDDI is VL
2   If CC is L and DRE is L and RPDDI is L then DPDDI is VL
...
44  If CC is H and DRE is H and RPDDI is H then DPDDI is VH
45  If CC is H and DRE is H and RPDDI is VH then DPDDI is VH

Table 9
Coding phase fuzzy rule.

Rule no.  Fuzzy rule
1   If PC is L and PM is L and DPDDI is VL then CPDDI is VL
2   If PC is L and PM is L and DPDDI is L then CPDDI is L
...
44  If PC is H and PM is H and DPDDI is H then CPDDI is L
45  If PC is H and PM is H and DPDDI is VH then CPDDI is H
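Rules of the form listed in Table 8 can be evaluated with a small fuzzy inference routine. The sketch below is only illustrative: it assumes Mamdani-style inference (min for AND, max for aggregation, centroid defuzzification), whereas the paper states only that the fuzzy inference tool of MATLAB was used. The membership-function breakpoints in `IN3` and `DDI5` are hypothetical stand-ins for the shapes given in the paper's figures, the inputs are assumed to be already normalized to [0, 1], and only the four rules visible in Table 8 are encoded. The names `trap` and `infer_dpddi` are likewise hypothetical.

```python
def trap(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear between.

    A triangle is the special case b == c.
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical three-state terms for CC and DRE on the normalized range [0, 1].
IN3 = {
    "L": (0.0, 0.0, 0.0, 0.5),
    "M": (0.0, 0.5, 0.5, 1.0),
    "H": (0.5, 1.0, 1.0, 1.0),
}

# Hypothetical five-state terms for the defect density indicators.
DDI5 = {
    "VL": (0.0, 0.0, 0.0, 0.25),
    "L": (0.0, 0.25, 0.25, 0.5),
    "M": (0.25, 0.5, 0.5, 0.75),
    "H": (0.5, 0.75, 0.75, 1.0),
    "VH": (0.75, 1.0, 1.0, 1.0),
}

# (CC, DRE, RPDDI) -> DPDDI, copied from the visible rows of Table 8.
RULES = [
    ("L", "L", "VL", "VL"),  # rule 1
    ("L", "L", "L", "VL"),   # rule 2
    ("H", "H", "H", "VH"),   # rule 44
    ("H", "H", "VH", "VH"),  # rule 45
]


def infer_dpddi(cc, dre, rpddi, steps=1000):
    """Return the centroid of the aggregated output set, or None if no rule fires."""
    # Firing strength of each rule: minimum over its three antecedents.
    strengths = {}
    for cc_t, dre_t, rp_t, out_t in RULES:
        w = min(
            trap(cc, *IN3[cc_t]),
            trap(dre, *IN3[dre_t]),
            trap(rpddi, *DDI5[rp_t]),
        )
        strengths[out_t] = max(strengths.get(out_t, 0.0), w)

    # Clip each output term at its strength, aggregate with max, take the centroid.
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        mu = max(min(w, trap(y, *DDI5[t])) for t, w in strengths.items())
        num += y * mu
        den += mu
    return num / den if den > 0 else None


print(infer_dpddi(cc=0.9, dre=0.9, rpddi=0.8))
```

With the full 45-rule base every input combination would fire at least one rule; with only four rules encoded here, inputs in the middle of the range return `None`.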

i.e., low (L), medium (M) and high (H), and one input metric has five linguistic states, i.e., very low (VL), low (L), medium (M), high (H), very high (VH). Therefore, the total number of rules is 45 (3 × 3 × 5). These fuzzy rules are given in Table 8.
iii. Coding phase fuzzy rule: If PC and PM are high then the defect will be low in the software project. In the proposed model, there are three input metrics in the coding phase. Two input metrics have three linguistic states, i.e., low (L), medium (M) and high (H), and one input metric has five linguistic states, i.e., very low (VL), low (L), medium (M), high (H), very high (VH). Therefore, the total number of rules is 45. These fuzzy rules are given in Table 9.
iv. Testing phase fuzzy rule: Similarly, in the testing phase, a lower value of CPDDI leads to a lower value of TPDDI. Two input metrics have three linguistic states, i.e., low (L), medium (M) and high (H), and one input metric has five linguistic states, i.e., very low (VL), low (L), medium (M), high (H), very high (VH). Therefore, the total number of rules is 45. These fuzzy rules are given in Table 10.

Table 10
Testing phase fuzzy rule.

Rule no.  Fuzzy rule
1   If SE is L and QDT is L and CPDDI is VL then TPDDI is VL
2   If SE is L and QDT is L and CPDDI is L then TPDDI is L
...
44  If SE is H and QDT is H and CPDDI is H then TPDDI is M
45  If SE is H and QDT is H and CPDDI is VH then TPDDI is H

4.2.4. Perform fuzzy inference and defuzzification
The defect density indicator value is obtained using the fuzzy inference tool of MATLAB at the end of the requirement analysis phase, design phase, coding phase and testing phase. There exists an approximately linear relationship between software size and number of defects [47,48]. Therefore, the total number of software defects is calculated as follows:

Case study no.: 1
Project no. # [20]
RPDDI: 0.0133
DPDDI: 0.0183
CPDDI: 0.0222
TPDDI: 0.0259
Total no. of defects = TPDDI × LOC = 0.0259 × 6000 ≈ 155

5. Prediction result

The proposed model has considered only the reliability relevant metrics of the requirement analysis, design, coding and testing phases of the SDLC. The proposed model predicted the defect density indicator of these phases using the software metrics and a fuzzy inference system. The prediction results of twenty real software projects are shown in Table 11. It contains the defect density indicator in the requirement analysis phase, design phase, coding phase, and testing phase. It also contains the actual defects and the defects predicted by the proposed model and by the Fenton et al. [20], Yadav et al. [33], and Pandey and Goyal [34] models. The number of software defects is calculated using the testing phase defect density indicators as discussed in Section 4.2.4. Fenton et al. [20] proposed a Bayesian net model for predicting the software defects for the same software projects. The predicted defects of the software projects have been compared with the corresponding results reported by Fenton et al. [20], Yadav et al. [33], and Pandey and Goyal [34].

6. Model validation

6.1. Evaluation measures

To validate the prediction accuracy of the proposed model, commonly used and suggested evaluation measures [20,49] have been taken, which are as follows.

i. Mean Magnitude of Relative Error (MMRE): MMRE is the mean of absolute percentage errors. It is a measure of the spread of the variable Z, where Z = estimate/actual

\[
\mathrm{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{y_i} \tag{3}
\]

where $y_i$ is the actual value and $\hat{y}_i$ is the estimated value of a variable of interest.
ii. Balanced Mean Magnitude of Relative Error (BMMRE): MMRE is unbalanced and penalizes overestimates more than underestimates. For this reason, a balanced mean magnitude of relative error measure is also considered, which is as follows:

\[
\mathrm{BMMRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{\min(y_i, \hat{y}_i)} \tag{4}
\]

A lower value of MMRE and BMMRE indicates better accuracy of prediction.
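The two evaluation measures, MMRE and BMMRE, translate directly into code. The following is a minimal sketch, with hypothetical function names, applied to the actual and proposed-model defect counts of case studies 1–3 from Table 11. It is meant to show the arithmetic only and is not expected to reproduce the grouped values in Table 12, which average over more projects.

```python
def mmre(actual, predicted):
    """Mean of |y - y_hat| / y over all projects (MMRE)."""
    return sum(abs(y - p) / y for y, p in zip(actual, predicted)) / len(actual)


def bmmre(actual, predicted):
    """Mean of |y - y_hat| / min(y, y_hat) over all projects (BMMRE)."""
    return sum(abs(y - p) / min(y, p) for y, p in zip(actual, predicted)) / len(actual)


# Case studies 1-3 of Table 11: actual defects and proposed-model predictions.
actual = [148, 31, 209]
predicted = [155, 30, 205]

print(f"MMRE  = {mmre(actual, predicted):.4f}")
print(f"BMMRE = {bmmre(actual, predicted):.4f}")
```

Because BMMRE divides by the smaller of the two values, it is never below MMRE on the same data. Both measures are undefined when an actual (or, for BMMRE, a predicted) count is zero, so such projects would need separate handling.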

Table 11
Actual and predicted number of defects.

Columns: Case study no., RPDDI, DPDDI, CPDDI, TPDDI, Actual defects, then defects predicted by the Proposed model, Yadav et al. [33], Pandey and Goyal [34], Fenton et al. [20].
1 0.0133 0.0183 0.0222 0.0259 148 155 88 56 75
2 0.0167 0.0219 0.0290 0.0333 31 30 9 6 52
3 0.0025 0.0031 0.0038 0.0038 209 205 261 211 254
4 0.0066 0.0080 0.0087 0.0099 204 209 204 113 262
5 0.0066 0.0080 0.0087 0.0092 53 53 56 54 48
6 0.0044 0.0059 0.0061 0.0067 17 17 24 – 57
7 0.0044 0.0059 0.0061 0.0064 29 31 70 26 203
8 0.0033 0.0181 0.0183 0.0191 71 84 64 41 51
9 0.0030 0.0042 0.0046 0.0051 90 97 92 176 347
10 0.0014 0.0020 0.0025 0.0028 129 142 476 337 516
11 0.0055 0.0073 0.0098 0.0113 1768 1740 1490 1651 1526
12 0.0025 0.0031 0.0038 0.0038 109 101 130 128 145
13 0.0100 0.0146 0.0185 0.0222 688 733 589 136 444
14 0.0030 0.0042 0.0046 0.0051 476 446 130 574 581
15 0.0033 0.0181 0.0183 0.0191 928 955 892 869 986
16 0.0064 0.0075 0.0081 0.0087 196 192 214 106 259
17 0.0025 0.0031 0.0038 0.0044 184 194 213 291 501
18 0.0033 0.0181 0.0183 0.0191 1597 1554 1440 – 1514
19 0.0066 0.0080 0.0082 0.0082 91 91 107 110 116
20 0.0030 0.0042 0.0046 0.0051 5 5 5 6 46
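The proposed-model counts in Table 11 are derived from the testing phase indicator and the project size, as in the case study 1 calculation (TPDDI 0.0259, 6000 LOC, 155 defects). A minimal sketch of that arithmetic follows. The helper name is hypothetical, and rounding to the nearest integer is an assumption, since the source shows only the single worked case.

```python
def predicted_defects(tpddi: float, loc: int) -> int:
    """Testing-phase defect density indicator times size in lines of code.

    Rounding to the nearest integer is an assumption, not stated in the paper.
    """
    return round(tpddi * loc)


# Case study 1: TPDDI = 0.0259 and 6000 LOC.
print(predicted_defects(0.0259, 6000))
```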

Table 12
Values of model evaluation measures.

MMRE
Project size                          Fenton et al. [20]  Yadav et al. [33]  Pandey and Goyal [34]  Proposed model
Projects < 5 KLOC (n = 5)             3.5024              0.5268             0.3831 (n = 4)         0.0639
5 KLOC ≤ Projects < 50 KLOC (n = 10)  0.9731              0.3936             0.5880                 0.0470
Projects ≥ 50 KLOC (n = 5)            0.1375              0.1313             0.0863 (n = 4)         0.0308
All projects (n = 20)                 1.11377             0.3613             0.4310 (n = 18)        0.0472

BMMRE
Project size                          Fenton et al. [20]  Yadav et al. [33]  Pandey and Goyal [34]  Proposed model
Projects < 5 KLOC (n = 5)             3.5245              0.6862             1.3034 (n = 4)         0.0642
5 KLOC ≤ Projects < 50 KLOC (n = 10)  1.0416              0.4237             1.0907                 0.0475
Projects ≥ 50 KLOC (n = 5)            0.1424              0.1404             0.0885 (n = 4)         0.0319
All projects (n = 20)                 1.4375              0.4185             0.9153 (n = 18)        0.0478

n is the number of projects, MMRE – Mean Magnitude of Relative Error, BMMRE – Balanced Mean Magnitude of Relative Error.

Fig. 16. Project size vs. number of defects.

Fig. 17. Impact of input metrics on defect density indicator metrics (i–iv).

6.2. Validation results

It can be observed in Table 12 that the MMRE and BMMRE for the proposed model are 0.0472 and 0.0478, respectively. It is clear that the predictive accuracy of the proposed model, expressed by the different measures, is better than that of Fenton et al. [20], Yadav et al. [33], and Pandey and Goyal [34].
It can also be observed that the predictive accuracy of the model expressed by different measures increases with the size of the project. The measures based on relative error (MMRE, BMMRE)

decrease significantly as project size increases for the proposed model, the Fenton et al. [20] model and the Yadav et al. [33] model. From Fig. 16, it is clear that the defects predicted by the proposed model are much closer to the actual defects than those predicted by Fenton et al. [20] and Yadav et al. [33].

7. Sensitivity analysis

In order to justify the effect of software metrics in the proposed model, sensitivity analysis is performed. In the sensitivity analysis, we analyze the impact of an input variable on the output variable. It is desirable to know the significance of input metrics on the prediction of the defect density indicator.
Fig. 17 shows the sensitivity of requirements fault density, requirement phase defect density indicator, design phase defect density indicator, and coding phase defect density indicator on the defect density indicator predicted at the end of the requirements analysis phase, design phase, coding phase, and testing phase respectively. From Fig. 17(i), it is clear that the sensitivity of the RFD metric on RPDDI is almost constant when it is low and high, but when RFD increases from low to medium it shows a high change in defect density indicator. Similarly, when the defect density indicator of the previous phase is very high, the defect density indicator predicted in the next phase is almost constant; it is also almost constant when it is very low and low. However, when it lies between medium and high it shows the maximum impact.
Phase-wise defect density indicator is important for software professionals and researchers to know the phase in which modification needs to be performed in order to achieve reliable software within time and costs. We can observe from Fig. 18 that the maximum defect density occurs in the requirement analysis phase, which also has an effect later on in the design phase, coding phase and testing phase. In sum, we can say that software metrics that are responsible for defects present in the initial phases of the SDLC need to be considered with more attention than the metrics that become available in the later phases of the SDLC. In case studies 8, 15 and 18, defect density in the design phase is higher than in the requirement analysis phase. Therefore, the design phase is critical in these projects. Similarly, in case studies 1, 2, 11 and 13 the coding phase is critical and requires high attention.

Fig. 18. Propagation of defects in different phases.

8. Conclusion

In this paper, a fuzzy logic based model is proposed for predicting the software defect density indicator at each phase of the SDLC. The proposed model considers only reliability relevant software metrics of each phase of the SDLC. The predicted defect density indicators are very helpful to analyze the defect severity in different artifacts of the SDLC of a software project. A software developer may easily detect the artifact which has more defects and accordingly may take the correct decision regarding rework and software development resource utilization. The proposed model has been applied on twenty software projects. The predicted defects for the twenty software projects are found to be very near to the actual defects detected during testing. The proposed model is very useful for software developers for developing a reliable software product at reduced cost.

References

[1] M.R. Lyu, Handbook of Software Reliability Engineering, vol. 222, IEEE Computer Society Press, CA, 1996.
[2] IEEE Guide for the use of IEEE Standard Dictionary of Measures to Produce Reliable Software. IEEE, New York, IEEE Std. 982.2-1988, 1988.
[3] IEEE Standard Glossary of Software Engineering Terminology. IEEE, New York, IEEE Std. 610.12–1990, pp. 1–84, 1990.
[4] K.Y. Cai, C.Y. Wem, M.L. Zhang, A critical review on software reliability modeling, Reliab. Eng. Syst. Safety 32 (3) (1991) 357–371.
[5] C. Kaner, Software engineering metrics: what do they measure and how do we know?, in: 10th International Software Metrics Symposium, vol. 6, 2004.
[6] J.D. Musa, A. Iannino, K. Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw-Hill Publishers, New York, 1987.
[7] H. Pham, System Software Reliability, Reliability Engineering Series, Springer-Verlag Publisher, London, 2006.
[8] Methodology for Software Reliability Prediction and Assessment. TechRep RL-TR-92-95, Rome Laboratory, vol. 1–2, 1992.
[9] W.W. Agresti, W.M. Evanco, Projecting software defects from analyzing Ada design, IEEE Trans. Softw. Eng. 18 (11) (1992) 988–997.
[10] C. Wholin, P. Runeson, Defect content estimations from review data, in: Proceedings of 20th International Conference on Software Engineering, 1998, pp. 400–409.
[11] C. Smidts, M. Stutzke, R.W. Stoddard, Software reliability modeling: an approach to early reliability prediction, IEEE Trans. Reliab. 47 (3) (1998) 268–278.
[12] V. Cortellesa, H. Singh, B. Cukic, Early reliability assessment of UML based software models, in: Proceedings of the 3rd International Workshop on Software and Performance, 2002, pp. 302–309.
[13] J.E. Gaffney Jr., C.F. Davis, An approach to estimating software errors and availability, in: Proceedings of 11th Minnowbrook Workshop on Software Reliability, SPC-TR-88-007, version 1.0, July 26–29, 1988, Blue Mountain Lake, NY, 1988.
[14] J.E. Gaffney Jr., J. Pietrolewiez, An automated model for software early error prediction (SWEEP), in: Proceedings of 13th Minnowbrook Workshop on Software Reliability, Blue Mountain Lake, NY, 1990.
[15] J.E. Gaffney Jr., Estimating the number of faults in code, IEEE Trans. Softw. Eng. 10 (4) (1984) 141–152.
[16] M. Lipow, Number of faults per line of code, IEEE Trans. Softw. Eng. SE-8 (4) (1982) 437–439.
[17] T.M. Khoshgoftaar, J.C. Musson, Predicting software development errors using software complexity metrics, IEEE J. Sel. Areas Commun. 8 (2) (1990) 253–261.
[18] N.E. Fenton, M. Neil, A critique of software defect prediction models, IEEE Trans. Softw. Eng. 25 (5) (1999) 675–689.
[19] N.E. Fenton, M. Neil, et al., Predicting software defects in varying development lifecycles using Bayesian nets, Inf. Softw. Technol. 49 (1) (2007) 32–43.
[20] N.E. Fenton, M. Neil, et al., On the effectiveness of early life cycle defect prediction with Bayesian Nets, Empirical Softw. Eng. 13 (5) (2008) 499–537.
[21] S. Mohanta, G. Vinod, A.K. Ghosh, R. Mall, An approach for early prediction of software reliability, ACM SIGSOFT Softw. Eng. Notes 35 (6) (2010) 1–9.
[22] S. Mohanta, G. Vinod, R. Mall, A technique for early prediction of software reliability based on design metrics, Int. J. Syst. Assurance Eng. Manage. 2 (4) (2011) 261–281.
[23] Okutan, O.T. Yildiz, Software defect prediction using Bayesian networks, Empirical Softw. Eng. 19 (1) (2014) 154–181.
[24] K. Dejaeger, T. Verbraken, B. Baesens, Toward comprehensible software fault prediction models using Bayesian network classifiers, IEEE Trans. Softw. Eng. 39 (2) (2013) 237–257.
[25] X. Zhang, H. Pham, An analysis of factors affecting software reliability, J. Syst. Softw. 50 (1) (2000) 43–56.
[26] M. Li, C. Smidts et al., Ranking software engineering measures related to reliability using expert opinion, in: Proceedings of 11th International Symposium on Software Reliability Engineering (ISSRE), October 08-1, 2000, San Jose, California, 2000, pp. 246–258.
[27] M. Li, C. Smidts, A ranking of software engineering measures based on expert opinion, IEEE Trans. Softw. Eng. 29 (9) (2003) 811–824.
[28] C. Catal, B. Diri, A systematic review of software fault predictions studies, Expert Syst. Appl. 36 (4) (2009) 7346–7354.
[29] C. Catal, Software fault prediction: a literature review and current trends, Expert Syst. Appl. 38 (4) (2011) 4626–4636.
[30] D. Radjenovic et al., Software fault prediction metrics: a systematic literature review, Inf. Softw. Technol. 55 (8) (2013) 1397–1418.
[31] T. Hall, S. Beecham, D. Bowes, D. Gray, S. Counsell, A systematic literature review on fault prediction performance in software engineering, IEEE Trans. Softw. Eng. 38 (6) (2012) 1276–1304.
[32] A.K. Pandey, N.K. Goyal, Early Software Reliability Prediction, Springer, 2013.

[33] D.K. Yadav, S.K. Charurvedi, R.B. Mishra, Early software defects prediction using fuzzy logic, Int. J. Performability Eng. 8 (4) (2012) 399–408.
[34] A.K. Pandey, N.K. Goyal, Multistage model for residual fault prediction, in: Early Software Reliability Prediction, Springer, India, 2013, pp. 59–80.
[35] Y. Maa, S. Zhua, K. Qin, G. Luo, Combining the requirement information for software defect estimation in design time, Inform. Process. Lett. 114 (9) (2014) 469–474.
[36] P. He, B. Li, X. Liu, J. Chen, Y. Ma, An empirical study on software defect prediction with a simplified metric set, Inf. Softw. Technol. 59 (2015) 170–190.
[37] N. Martin, N. Fenton, L. Nielson, Building large-scale Bayesian networks, Knowl. Eng. Rev. 15 (3) (2000) 257–284.
[38] S.H. Kan, Metrics and Models in Software Quality Engineering, Addison-Wesley Publications, 2002.
[39] T. McCabe, A complexity measure, IEEE Trans. Softw. Eng. 4 (1976) 308–320.
[40] M. Diaz, J. Sligo, How software process improvement helped Motorola, IEEE Softw. 14 (5) (1997) 75–81.
[41] M. Agrawal, K. Chari, Software effort, quality and cycle time: a study of CMM level 5 projects, IEEE Trans. Softw. Eng. 33 (2007) 145–156.
[42] T.J. Ross, Fuzzy Logic with Engineering Applications, 2nd ed., John Wiley & Sons Publications, 2009.
[43] M. Kaya, R. Alhajj, A clustering algorithm with genetically optimized membership functions for fuzzy association rules mining, in: The 12th IEEE International Conference on Fuzzy Systems, 2003, FUZZ’03, vol. 2, 2003, pp. 881–886.
[44] H.B. Yadav, D.K. Yadav, A multistage model for defect prediction of software development life cycle using fuzzy logic, in: Proceedings of the Third International Conference on Soft Computing for Problem Solving (SOCPROS-2013), 26–28 Dec. 2013, Advances in Intelligent Systems and Computing, vol. 259, IIT Roorkee, Springer India Publication, India, 2014, pp. 661–671.
[45] L.A. Zadeh, Knowledge representation in fuzzy logic, IEEE Trans. Knowl. Data Eng. 1 (1) (1989) 89–100.
[46] H.B. Yadav, D.K. Yadav, Early software reliability analysis using reliability relevant software metrics, Int. J. Syst. Assurance Eng. Manage. (2014) 1–12, http://dx.doi.org/10.1007/s13198-014-0325-3, 2014.
[47] D. Tang, M. Hecht, Evaluation of software dependability based on stability test data, Fault-Tolerant Computing, 27–30 June 1995, FTCS-25. Digest of Papers, Twenty-Fifth International Symposium on, pp. 434–443, 1995.
[48] C. Withrow, Error density and size in Ada software, IEEE Softw. 7 (1) (1990) 26–30.
[49] S. Chulani, B. Boehm, B. Steece, Bayesian analysis of empirical software engineering cost models, IEEE Trans. Softw. Eng. 25 (4) (1999) 573–583.
[50] PROMISE Repository <http://promise.site.uottawa.ca/SERepository/datasets-page.html>.
