Identification of Fuzzy Models of Software Cost Estimation

Fuzzy Sets and Systems 145 (2004) 141–163
www.elsevier.com/locate/fss

Zhiwei Xu (a), Taghi M. Khoshgoftaar (b,*)
(a) Motorola Labs, Schaumburg, IL, USA
(b) Department of Computer Science and Engineering, Empirical Software Engineering Laboratory,
Florida Atlantic University, Boca Raton, FL 33431, USA
Abstract
Software cost estimation is one of the most critical tasks in managing software projects. Development costs
tend to increase with project complexity, and hence accurate cost estimates are highly desired during the early
stages of development. An important objective of the software engineering community has been to develop
useful models that constructively explain the software development life-cycle and accurately estimate the cost
of software development. Currently used software development effort estimation models, such as COCOMO
and Function Point Analysis, do not consistently provide accurate project cost and effort estimates. This is
often because important project data available at the time of modeling are vague, imprecise, and incomplete.
Traditionally used cost estimation models cannot utilize such vague yet important information.
Fuzzy logic-based cost estimation models are more appropriate when vague and imprecise information is to
be accounted for. Such models usually rely on expert knowledge, which is, however, often too general to fit
a particular data set because different data sets have different characteristics. We present an innovative fuzzy
identification cost estimation modeling technique to deal with linguistic data, and automatically generate fuzzy
membership functions and rules. A case study based on the COCOMO81 database compared the proposed
model with all three COCOMO models, i.e., Basic, Intermediate, and Detailed. It was observed that the fuzzy
identification model provided significantly better cost estimations than the three COCOMO models.
© 2003 Published by Elsevier B.V.
Keywords: Fuzzy identification; Rule generation; Software cost estimation; Fuzzy clustering; COCOMO models
1. Introduction
Estimating the cost and the schedule required to develop a software system is one of the most
critical and difficult tasks in managing software projects. While technical and marketing issues have
a strong impact on a project's success, poorly managed projects, no matter how advanced the
technology, are more likely to fail than succeed. Despite increasing attempts to treat software
development as a form of engineering, many projects are still not completed on schedule, with
underestimates and overestimates of effort each causing their own particular problems. Therefore, in order
to reduce budget and schedule overruns and to improve contractual bids for development projects, various
software cost estimation models have been developed [1,5,19,22]. In the field of software engineering,
the terms cost and effort imply the same concept, and hence in this paper we use these two terms
interchangeably.

* Corresponding author. Tel.: +1-561-297-3994; fax: +1-561-297-2800.
E-mail addresses: zhiwei.xu@motorola.com (Z. Xu), taghi@polaris.cse.fau.edu (T.M. Khoshgoftaar).
0165-0114/$ - see front matter. doi:10.1016/j.fss.2003.10.008

The rapidly changing nature of software development has made it extremely difficult to develop
cost models that continue to yield high prediction accuracies. Software development costs continue
to increase and practitioners continue to express their concerns over their inability to accurately
predict the costs involved. Thus, one of the most important objectives of the software engineering
community has been to develop useful models that constructively explain the software development
life-cycle and accurately predict the cost of developing a software product.
A few of the current software development effort estimation models include COCOMO [5], SLIM
[19], Estimacs, and Function Point Analysis (FPA) [1,15]. However, no model has proven to be
consistently successful in providing accurate software development effort estimates. This is largely
due to the fact that information about software effort is often uncertain, imprecise, and incomplete.
In the early stages of software development, it is difficult to build an explicit software effort
estimation model. In such cases, using a fuzzy identification approach is more
appropriate.
The application of fuzzy logic concepts to software cost estimation modeling has been recently
explored by various researchers. Generally speaking, such efforts can be classified into two categories:
(1) using fuzzy numbers for interval prediction [8] and (2) rule-based fuzzy logic [9]. Both of these
categories are based on expert knowledge. However, expert knowledge is often too general to fit a
particular data set. This motivates us to seek a way to generate rules directly from data, so that the rules
show the particular relationship between independent and dependent variables in a particular data set.
Recently, various methods have been proposed for automatically generating fuzzy if–then rules
from numerical data. Most of these methods applied iterative learning procedures or complicated
rule generation mechanisms. These algorithms include gradient descent learning methods [10,11,17],
genetic algorithm-based methods [12,13], least squares methods [20,21], the fuzzy c-means method [25],
and fuzzy-neuro methods [6,16,18]. We also built a fuzzy rule extraction model to classify fault-prone
and not fault-prone modules [24]. However, all of the above-mentioned modeling methods
can only extract rules from numerical data. Furthermore, most software development effort estimates
are required and performed at the earlier stages of development, where important software attributes
can only be expressed as vague and imprecise non-numerical values. Consequently, many useful
software attributes used for effort estimation are linguistic values, which makes the above-mentioned
fuzzy logic methods unsuitable for software effort estimation.
In this paper, we present an innovative fuzzy identification method to deal with linguistic data, and
generate fuzzy membership functions and rules automatically. This paper summarizes the concepts
of fuzzy identification and its usage in software effort estimation. To the best of our knowledge,
this is the first time that linguistic software attributes, recorded in the earlier life cycle stages,
have been modeled in this way. A case study based on the COCOMO81 database compared a
model built using the proposed fuzzy identification method with the three COCOMO81 models, i.e.,
Basic, Intermediate, and Detailed. It was observed that the proposed fuzzy identification model
provided significantly more accurate cost estimations than all three COCOMO81 estimation
models.
The layout of the rest of the paper is as follows. In Section 2, we briefly introduce the principle
of software effort estimation models with a focus on the Intermediate COCOMO81 model. Section 3
presents fuzzy identification and the fuzzy identification techniques that can be applied to
software effort estimation. In Sections 4 and 5, we describe the validation and analysis of the results
obtained from our experiment. Section 6 concludes the paper and gives an overview of future work.
2. Software development effort estimation models
A large portion of the work in the cost estimation field has focused on algorithmic cost modeling.
In these methods, mathematical formulas are used to estimate the software project cost and effort
based on software metrics and other input parameters. An alternative method is based on a formal
model. In a formal model, the formulae used arise from the analysis of historical data. In both cases,
the accuracy of the model can be improved by calibrating the model to a project-specific development
environment. This involves adjusting the weights of the metrics according to their importance.
2.1. COCOMO81 (COnstructive COst MOdel)
The COCOMO cost estimation model is used by numerous software project managers, and is
based on a study of hundreds of software projects. Unlike other cost estimation models, COCOMO
is an open model, so all of its details are published. This includes COCOMO81, which was derived
from the analysis of 63 software projects in 1981. Boehm proposed three levels of the COCOMO model:
Basic, Intermediate, and Detailed.
• The Basic COCOMO81 model is a single-valued static model that computes software development
effort (and cost) as a function of the program size expressed in estimated lines of code (LOC).
• The Intermediate COCOMO81 model computes software development effort as a function of the
program size and a set of “cost drivers” that include subjective assessments of product, hardware,
personnel, and project attributes.
• The Detailed COCOMO81 model incorporates all characteristics of the Intermediate model with
an assessment of the cost drivers' impact on each step (i.e., analysis, design, etc.) of the software
engineering process.
COCOMO81 models depend primarily on two equations. The first gives the development
effort in MM (man-months, also called person-months or staff-months), where one MM is a measure of 1
month of effort by one person:
$$ MM = a\,(KDSI)^{b}. \tag{1} $$
The second relates effort to development time (TDEV), and is given by
$$ TDEV = c\,(MM)^{d}. \tag{2} $$

Table 1
MM for the Basic COCOMO
Development mode   Basic effort equation
Organic            MM = 2.4 * (KDSI)^1.05
Semi-detached      MM = 3.0 * (KDSI)^1.12
Embedded           MM = 3.6 * (KDSI)^1.20

Table 2
TDEV for the Basic COCOMO
Development mode   Basic development time equation
Organic            TDEV = 2.5 * (MM)^0.38
Semi-detached      TDEV = 2.5 * (MM)^0.35
Embedded           TDEV = 2.5 * (MM)^0.32

In the above two equations, KDSI represents the number of delivered source instructions, in
thousands, and is a measure of the program size. The coefficients a, b, c, and d depend on the project
development mode. There are three modes of development and they are listed below:
• Organic mode: The project is being developed in a familiar, stable environment and is similar to
previously developed projects.
• Embedded mode: The project will require much innovation and is subject to tight, inflexible
interface requirements and constraints.
• Semi-detached mode: The project is characterized somewhere between the Organic and Embedded
development modes.
2.2. Basic COCOMO model
The Basic COCOMO is the top-level model, and is effective when a rough software effort estimate
is needed. The effort equations for each mode of project development are shown in Table 1. The
time to develop the project (TDEV) is based on effort and is characterized by the equations listed
in Table 2.
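
The Basic COCOMO equations can be evaluated with a few lines of code. The following is a minimal Python sketch, not part of the original study, that applies Eqs. (1) and (2) with the coefficients of Tables 1 and 2; the function name `basic_cocomo` and the 32 KDSI example size are illustrative assumptions.

```python
# Basic COCOMO81 coefficients from Tables 1 and 2: mode -> (a, b, c, d)
BASIC_COEFFS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kdsi, mode):
    """Return (MM, TDEV) for a project of size kdsi (thousands of DSI)."""
    a, b, c, d = BASIC_COEFFS[mode]
    mm = a * kdsi ** b    # Eq. (1): effort in man-months
    tdev = c * mm ** d    # Eq. (2): development time
    return mm, tdev


if __name__ == "__main__":
    # 32 KDSI is an arbitrary example size, not a project from the database.
    for mode in BASIC_COEFFS:
        mm, tdev = basic_cocomo(32.0, mode)
        print(f"{mode:14s} MM = {mm:8.1f}  TDEV = {tdev:5.1f}")
```

For the same size, the Embedded mode always yields the largest effort because both its coefficient and its exponent are the largest in Table 1.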
2.3. Intermediate COCOMO
Boehm [5] suggests that the accuracy of Basic COCOMO is limited because it does not account
for differences in hardware, quality and experience of personnel, use of modern tools, and other
attributes that are known to have a significant influence on project cost. Intermediate COCOMO
adds accuracy to the Basic COCOMO by multiplying “Cost Drivers” into the equation with a new
variable: “Effort Adjustment Factor” (EAF).
The EAF term shown in Table 3 is the product of 15 “Effort Multipliers”, which are listed in
Table 4. If the category values of all fifteen cost drivers are “Nominal”, then the EAF term
is equal to 1.0, implying that for the Semi-detached mode the Intermediate and Basic COCOMO
models would yield the same results. Due to the different coefficients used for the Organic and
Embedded development modes (Tables 1 and 3), the project cost estimates for these modes would,
respectively, increase or decrease relative to Basic COCOMO. Depending on the cost driver, choosing a
lower or higher rating increases or decreases the EAF.

Table 3
MM for the Intermediate COCOMO
Development mode   Intermediate effort equation
Organic            MM = EAF * 3.2 * (KDSI)^1.05
Semi-detached      MM = EAF * 3.0 * (KDSI)^1.12
Embedded           MM = EAF * 2.8 * (KDSI)^1.20

Table 4
Intermediate COCOMO multipliers ("-" marks a rating that is not defined for the driver)
Cost driver  Very low  Low   Nominal  High  Very high  Extra high
ACAP         1.46      1.19  1.00     0.86  0.71       -
AEXP         1.29      1.13  1.00     0.91  0.82       -
CPLX         0.70      0.85  1.00     1.15  1.30       1.65
DATA         -         0.94  1.00     1.08  1.16       -
LEXP         1.14      1.07  1.00     0.95  -          -
MODP         1.24      1.10  1.00     0.91  0.82       -
PCAP         1.42      1.17  1.00     0.86  0.70       -
RELY         0.75      0.88  1.00     1.15  1.40       -
SCED         1.23      1.08  1.00     1.04  1.10       -
STOR         -         -     1.00     1.06  1.21       1.56
TIME         -         -     1.00     1.11  1.30       1.66
TOOL         1.24      1.10  1.00     0.91  0.83       -
TURN         -         0.87  1.00     1.07  1.15       -
VEXP         1.21      1.10  1.00     0.90  -          -
VIRT         -         0.87  1.00     1.15  1.30       -

The cost driver acronyms used in Table 4 are as follows [5]: ACAP: analyst capability;
AEXP: applications experience; CPLX: product complexity; DATA: database size; LEXP:
language experience; MODP: modern programming practices; PCAP: programmer capability;
RELY: required software reliability; SCED: required development schedule; STOR: main storage
constraint; TIME: execution time constraint; TOOL: use of software tools; TURN: computer
turnaround time; VEXP: virtual machine experience; and VIRT: virtual machine volatility.
2.4. Detailed COCOMO
The Detailed model differs from the Intermediate model in only one major respect:
the Detailed model uses different Effort Multipliers for each phase of a project. Phase-dependent
Effort Multipliers yield better estimates than the Intermediate model. The Detailed model defines six
life cycle phases: requirements, product design, detailed design, coding and unit testing, integration
and testing, and maintenance.
The Detailed COCOMO model illustrates the importance of recognizing the different levels of
predictability at each phase of the development cycle. Boehm et al. had the right idea here; however,
COCOMO81 by itself is not robust enough to predict project costs accurately at all phases of
the development life cycle. Considering an extreme scenario in which appropriate weights are applied
to the requirements analysis phase reveals a serious flaw in the Detailed COCOMO model.
Furthermore, the model evaluates the cost estimate based on inputs that are not very accurate until
the later phases of the software design. Hence, the Intermediate model is the more accurate COCOMO
model, as compared to both the Basic and Detailed models.
The steps in obtaining an estimate using the Intermediate COCOMO81 model are:
(1) Identify the mode of development for the new product, i.e., Organic, Semi-detached, or
Embedded.
(2) Estimate the size of the project in KDSI to derive a nominal effort prediction. Adjust the fifteen
cost drivers (Table 4) to reflect the project.
(3) Calculate the effort adjustment factor (EAF).
(4) Calculate the predicted project effort using the equations in Table 3.
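
The four steps above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function name `intermediate_cocomo` is hypothetical, only three of the fifteen drivers of Table 4 are transcribed, and any driver not passed in is assumed to be rated Nominal (multiplier 1.00).

```python
from math import prod

# Nominal coefficients from Table 3: mode -> (a, b)
INTERMEDIATE_COEFFS = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

# A subset of Table 4, transcribed for illustration only.
MULTIPLIERS = {
    "RELY": {"very low": 0.75, "low": 0.88, "nominal": 1.00,
             "high": 1.15, "very high": 1.40},
    "CPLX": {"very low": 0.70, "low": 0.85, "nominal": 1.00,
             "high": 1.15, "very high": 1.30, "extra high": 1.65},
    "ACAP": {"very low": 1.46, "low": 1.19, "nominal": 1.00,
             "high": 0.86, "very high": 0.71},
}


def intermediate_cocomo(kdsi, mode, ratings):
    """Return (EAF, MM); ratings maps a driver name to its rating.

    Drivers missing from ratings are treated as Nominal (assumption).
    """
    # Step 3: EAF is the product of the selected effort multipliers.
    eaf = prod(MULTIPLIERS[driver][rating] for driver, rating in ratings.items())
    # Step 4: apply the Table 3 equation for the chosen mode.
    a, b = INTERMEDIATE_COEFFS[mode]
    return eaf, eaf * a * kdsi ** b


if __name__ == "__main__":
    eaf, mm = intermediate_cocomo(10.0, "organic", {"RELY": "high", "ACAP": "high"})
    print(f"EAF = {eaf:.3f}, MM = {mm:.1f}")
```

With an empty `ratings` dictionary the EAF is 1.0, which corresponds to the all-Nominal case discussed in Section 2.3.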
2.5. Characteristics of COCOMO81
COCOMO is transparent, in that one can observe how the model works, whereas other models
such as SLIM do not provide such transparency. Drivers are particularly helpful to the estimator in
order to understand the impact of different factors that affect project costs. However, the COCOMO81
model demonstrates a few drawbacks, as listed below.
• It is difficult to accurately estimate KDSI during the early stages of the project, when accurate
effort estimates are required the most.
• KDSI is not a true size measure but rather it is a length measure. This makes the model extremely
vulnerable to mis-classification of the project development mode.
• Success depends largely on tuning the model, with the use of historical data, to the needs of the
organization. However, historical data is usually not available when needed.
3. Fuzzy identification
Systems can be represented by mathematical models of many different forms, such as algebraic
equations, differential equations, and finite state machines. A fuzzy model is based on a set of if–then
rules that describe the relationships between variables. Basically, a fuzzy model provides an effective
way to explore and present the approximate and imprecise nature of the real world. In particular, a
fuzzy model appears useful when the systems are not suitable for analysis by conventional
quantitative techniques or when the available information on the systems is uncertain or inaccurate.
The rules establish logical relations between the system's variables by relating qualitative
values of one variable to qualitative values of another variable. The qualitative values typically have
a clear linguistic interpretation and are referred to as linguistic terms.
The term fuzzy identification [2] usually refers to the techniques and algorithms for constructing
fuzzy models from data. There are two main approaches for obtaining a fuzzy model from data:
• Expert knowledge in verbal form is translated into a set of if–then rules. A certain
model structure can be created, and parameters of this structure, such as membership functions
and weights of rules, can be tuned using input and output data.
• No prior knowledge about the system under study is initially used to formulate the rules, and a
fuzzy model is constructed from data based on a certain algorithm. It is expected that extracted
rules and membership functions can explain the system behavior. An expert can modify the rules
or supply new ones based upon his or her own experience. The expert tuning is optional in this
approach.
Our study will focus on the second approach, i.e., fuzzy models are directly derived from data
automatically. The fuzzy model adopted in our study is the Takagi–Sugeno fuzzy model [20,21],
in which the data can be numerical or linguistic. The two major components of a fuzzy model are
membership functions and rules. The following sections will focus on the algorithms for membership
functions, Takagi-Sugeno fuzzy model, and rules extraction.
3.1. Fuzzy clustering
Prior to determining the membership functions, we need to obtain, for each input variable, a
partitioning of the data into a number of clusters chosen based on experience. These clusters have “fuzzy”
boundaries, in the sense that each data value belongs to a given cluster to some degree, implying
that the membership of each data observation is neither crisp nor certain. Having decided upon the
number of such clusters to be used, we need some procedure to locate their mid-points (or more
generally, their centroids) and to determine the associated membership functions and degrees of
membership for the data-points.
A variety of fuzzy clustering methods have been proposed. In this section, we have selected to
describe only the fuzzy c-means (FCM) method and its most obvious generalizations. We note that
readers familiar with the FCM algorithm and its process can skip or just skim through this section.
The FCM algorithm is really a generalization of the “hard” c-means algorithm. The FCM algorithm
is closely associated with such early contributors as Bezdek [4] and Dunn [7], and is widely used
in fields such as pattern recognition. Suppose we are given a set of n elements or data samples that
we wish to classify:
$$ X = \{x_1, x_2, \ldots, x_k, \ldots, x_n\}. \tag{3} $$
Each element $x_k$ is an m-dimensional data vector:
$$ x_k = [x_{k1}, x_{k2}, \ldots, x_{km}]. \tag{4} $$
The FCM algorithm is generally more suited for a data set that has data points that are (approximately)
evenly distributed around distinct cluster “centers”. A cluster center is a vector $v_i$ ($i = 1, 2, \ldots, c$)
of m components representing a “prototype” for the elements in cluster i. $v_i$ does not necessarily
need to be an element of the set. A membership value which describes the degree of belonging of
the kth element in the set, $x_k$, to the ith cluster can be denoted by
$$ \mu_{ik} \in [0, 1] \quad (1 \le i \le c,\ 1 \le k \le n). \tag{5} $$
In the FCM algorithm, a partition matrix U, which reflects a measure of similarity among the
elements (i.e., proximity of the data points to each of the cluster centers), is defined as follows:
$$ U = \begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{c1} & \mu_{c2} & \cdots & \mu_{cn} \end{bmatrix}. \tag{6} $$
The classification criterion is realized by minimizing the following objective function (the
performance index) with respect to both the membership values and the cluster centers:
$$ J(c) = \sum_{k=1}^{n} \sum_{i=1}^{c} (\mu_{ik})^{w} (d_{ik})^{2}, \tag{7} $$
where $w \in (1, \infty)$ denotes the fuzziness index (a weight on membership values) and $d_{ik}$ is a measure
of proximity between the kth data sample $x_k$ and the ith cluster center $v_i$, defined by the Euclidean
distance norm,
$$ d_{ik} = \|x_k - v_i\| = \left[ \sum_{j=1}^{m} (x_{kj} - v_{ij})^{2} \right]^{1/2}. \tag{8} $$
Since the elements in the set have m features (co-ordinates) to describe their location in feature
space, each cluster center also requires m features to determine its location in the same space.
Therefore, the ith cluster center vi is an m-dimensional vector, similar to a data point:
$$ v_i = [v_{i1}\ v_{i2}\ \cdots\ v_{im}]. \tag{9} $$
The value of the jth feature of $v_i$ is calculated by the expression:
$$ v_{ij} = \frac{\sum_{k=1}^{n} (\mu_{ik})^{w} x_{kj}}{\sum_{k=1}^{n} (\mu_{ik})^{w}}, \tag{10} $$
where the computation is done for all features, $j = 1, 2, \ldots, m$.

The fuzzy c-means method is an iterative algorithm, and is described as follows:
(1) Given the desired number of clusters c for the classification of the n elements of the data set,
and a real number $w > 1$, assume an initial partition matrix $U^{(0)}$. The iteration number in this
algorithm is labeled with superscript t, with t being 0 for the initial guess.
(2) Compute the cluster centers (prototypes) $v_i^{(t)} = [v_{i1}^{(t)}\ v_{i2}^{(t)}\ \cdots\ v_{im}^{(t)}]$ for $i = 1, 2, \ldots, c$,
using the expression:
$$ v_{ij}^{(t)} = \frac{\sum_{k=1}^{n} (\mu_{ik}^{(t)})^{w} x_{kj}}{\sum_{k=1}^{n} (\mu_{ik}^{(t)})^{w}}. \tag{11} $$
(3) Compute the distances from each element in the set to each cluster center, using
$$ d_{ik}^{(t)} = \|x_k - v_i^{(t)}\| = \left[ \sum_{j=1}^{m} (x_{kj} - v_{ij}^{(t)})^{2} \right]^{1/2} \tag{12} $$
for all clusters $i = 1, 2, \ldots, c$ and elements $k = 1, 2, \ldots, n$.

(4) Update the membership value of each data point. The updated values $\mu_{ik}$ of element k in cluster
i are computed by the formula:
$$ \mu_{ik}^{(t+1)} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik}^{(t)} / d_{jk}^{(t)} \right)^{2/(w-1)}}. \tag{13} $$
The special form of this formula ensures that the sum of membership values of an element over
all clusters equals unity. The partition matrix $U^{(t+1)}$ is then re-computed with these updated
membership values as
$$ U^{(t+1)} = \begin{bmatrix} \mu_{11}^{(t+1)} & \mu_{12}^{(t+1)} & \cdots & \mu_{1n}^{(t+1)} \\ \mu_{21}^{(t+1)} & \mu_{22}^{(t+1)} & \cdots & \mu_{2n}^{(t+1)} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{c1}^{(t+1)} & \mu_{c2}^{(t+1)} & \cdots & \mu_{cn}^{(t+1)} \end{bmatrix}. \tag{14} $$
(5) The iterative process stops when it has converged under some selected norm. Otherwise, a new
iteration is performed, i.e., set $t = t + 1$ and return to step (2). The norm employed for checking
convergence might be:
$$ \max_{i,k} \left| \mu_{ik}^{(t+1)} - \mu_{ik}^{(t)} \right| \le \varepsilon, \tag{15} $$
where $\varepsilon$ is a predefined error limit.
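
The five steps can be sketched in plain Python as shown below. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `fcm`, the random initialization of $U^{(0)}$, the default values of `w`, `eps`, and `max_iter`, and the handling of a data point that coincides with a cluster center (a case Eq. (13) does not cover) are all choices made for this example.

```python
import random


def fcm(data, c, w=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means on a list of m-dimensional points; returns (centers, U)."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    # Step 1: random initial partition matrix U(0); each column sums to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        total = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= total
    for _ in range(max_iter):
        # Step 2: cluster centers, Eq. (11).
        centers = []
        for i in range(c):
            den = sum(U[i][k] ** w for k in range(n))
            centers.append([sum(U[i][k] ** w * data[k][j] for k in range(n)) / den
                            for j in range(m)])
        # Step 3: Euclidean distances, Eq. (12).
        d = [[sum((data[k][j] - centers[i][j]) ** 2 for j in range(m)) ** 0.5
              for k in range(n)] for i in range(c)]
        # Step 4: membership update, Eq. (13).
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            zeros = [i for i in range(c) if d[i][k] == 0.0]
            for i in range(c):
                if zeros:
                    # Assumption: a point lying exactly on a center is shared
                    # equally among the coinciding centers.
                    new_U[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    new_U[i][k] = 1.0 / sum((d[i][k] / d[j][k]) ** (2.0 / (w - 1.0))
                                            for j in range(c))
        # Step 5: convergence test, Eq. (15).
        delta = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if delta <= eps:
            break
    return centers, U


if __name__ == "__main__":
    points = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
    centers, U = fcm(points, c=2)
    print(sorted(v[0] for v in centers))
```

Each column of the returned `U` sums to one, as noted for Eq. (13).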
3.2. Takagi–Sugeno fuzzy model
The Takagi–Sugeno fuzzy model, also known as the TS fuzzy model, was proposed by Takagi,
Sugeno and Kang [20,21] in an effort to develop a systematic approach to generating fuzzy rules
from a given input–output data set. A typical fuzzy rule in a TS fuzzy model has the form
$$ R_i:\ \text{If } x_1 \text{ is } A_{i1} \text{ and } \cdots\ x_p \text{ is } A_{ip} \text{ then } y_i = f(x), \quad i = 1, 2, \ldots, n, \tag{16} $$
where $A_{i1}, \ldots, A_{ip}$ are the fuzzy sets in the antecedent, while $y_i = f(x)$ is a crisp function in the
consequent. Usually $y_i = f(x)$ is a polynomial function with respect to the input variable x. When $y_i = f(x)$
is a first-order polynomial function, the resulting fuzzy inference system is called a first-order TS
fuzzy model, which was originally proposed in [20,21]. When f is a constant, we have what is
known as a zero-order TS fuzzy model, which can be viewed as a special case of the Mamdani
fuzzy inference system [14]. The following is a single-input TS fuzzy model:
$$ \begin{aligned} &\text{If } X \text{ is small then } Y = 0.1X + 6.4, \\ &\text{If } X \text{ is medium then } Y = -0.5X + 4, \\ &\text{If } X \text{ is large then } Y = X - 2. \end{aligned} \tag{17} $$
[Figure 1: panel (a) plots membership grades against X for the sets small, medium, and large; panel (b) plots Y against X.]
Fig. 1. TS fuzzy model with non-fuzzy input membership function. (a) Antecedent MFs for crisp rules, (b) Overall I/O
curve for crisp rules.
If “small”, “medium”, and “large” are non-fuzzy sets with the membership functions shown in Fig. 1, then the
overall input–output curve is piece-wise linear. On the other hand, if the input membership
functions are fuzzy, the overall input–output curve becomes smooth, like the one shown in Fig. 2.
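
The difference between the two cases can be reproduced with a short Python sketch of the rule base in Eq. (17). The rule consequents are taken from the text; everything else is assumed for illustration, since the membership function parameters behind Figs. 1 and 2 are not given: the crisp break points (-3 and 3), the Gaussian shapes with centers -10, 0, 10 and width 3, and the function names. The output is computed as the firing-strength-weighted average of the rule consequents, which is the usual way a TS model is evaluated.

```python
import math

# Consequents of the three rules in Eq. (17).
RULES = {
    "small": lambda x: 0.1 * x + 6.4,
    "medium": lambda x: -0.5 * x + 4.0,
    "large": lambda x: x - 2.0,
}


def crisp_mfs(x):
    """Non-fuzzy sets; the break points -3 and 3 are assumed values."""
    return {
        "small": float(x < -3.0),
        "medium": float(-3.0 <= x < 3.0),
        "large": float(x >= 3.0),
    }


def fuzzy_mfs(x):
    """Gaussian sets; centers (-10, 0, 10) and width 3 are assumed values."""
    def gauss(center):
        return math.exp(-0.5 * ((x - center) / 3.0) ** 2)
    return {"small": gauss(-10.0), "medium": gauss(0.0), "large": gauss(10.0)}


def ts_output(x, mfs):
    """Weighted average of the rule consequents by firing strength."""
    weights = mfs(x)
    return sum(weights[r] * RULES[r](x) for r in RULES) / sum(weights.values())


if __name__ == "__main__":
    for x in (-8.0, -3.0, 0.0, 3.0, 8.0):
        print(x, ts_output(x, crisp_mfs), round(ts_output(x, fuzzy_mfs), 3))
```

With `crisp_mfs` exactly one rule fires at any input, so the output follows one of the three lines and jumps at the break points; with `fuzzy_mfs` neighbouring rules blend and the curve is smooth.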
3.3. Rule extraction
We can generate rules for the fuzzy model based on the membership functions obtained by using
FCM. There are various algorithms to generate rules; however, we will focus our efforts on the
Takagi–Sugeno (TS) models.
The idea of constructing TS fuzzy models by fuzzy clustering is not new. Yoshinari et al. applied
the fuzzy c-elliptotypes algorithm to derive a TS fuzzy model [26], and Babuska and Verbruggen
used the GK algorithm [2]. We will present a way of deriving a TS model using FCM. The fuzzy
model can be represented as a set of TS rules:
$$ R_i:\ \text{If } x \text{ is } A_i \text{ then } y_i = a_i^{T} x + b_i, \quad i = 1, 2, \ldots, n. \tag{18} $$
The antecedent fuzzy set $A_i$ can be extracted from the fuzzy partition matrix by projections. The
consequent parameters, $a_i$ and $b_i$, are estimated from the data using least-squares methods.
[Figure 2: panel (c) plots membership grades against X for the sets small, medium, and large; panel (d) plots Y against X.]
Fig. 2. TS fuzzy model with fuzzy input membership function. (c) Antecedent MFs for fuzzy rules, (d) Overall I/O curve
for fuzzy rules.
3.4. Generating antecedent membership function by projection
The principle of this method is to obtain the individual antecedent variable membership function
by projecting the multi-dimensional fuzzy sets defined point-wise in the rows of the partition matrix
U onto axes associated with the individual antecedent variable. Currently, there are many projection
methods available; however, in our study we use the axis-orthogonal projection method.
This method projects the fuzzy partition matrix U onto the axes of the antecedent variables $x_j$,
$1 \le j \le p$. The TS rules are then expressed in the conjunctive form:
$$ R_i:\ \text{If } x_1 \text{ is } A_{i1} \text{ and } \cdots\ x_p \text{ is } A_{ip} \text{ then } y_i = a_i^{T} x + b_i, \quad i = 1, 2, \ldots, n. \tag{19} $$
In order to obtain membership functions for the antecedent fuzzy sets $A_{ij}$, the multi-dimensional
fuzzy set defined point-wise in the ith row of the partition matrix U is projected onto the regressors
$x_j$ by
$$ \mu_{A_{ij}}(x_{jk}) = \mathrm{proj}_j(\mu_{ik}). \tag{20} $$
The projection operator is based on the following two de"nitions.
Definition 1 (Point-wise projection). Let $U = (X^{(i)})_{i \in N_n}$ be a universe of dimension n and let C, S
and T be index subsets of $N_n$ which satisfy the conditions $T = S \cup C$, $S \cap C = \emptyset$ and $S \neq \emptyset$. Point-wise
projection of $X^T$ onto $X^S$ is the mapping $\mathrm{red}^T_S : X^T \to X^S$ defined by
$$ \mathrm{red}^T_S(x^T) = x^S \quad \text{with} \quad \left( \forall i \in S : (x^S)^{(i)} = (x^T)^{(i)} \right). \tag{21} $$

[Figure 3: labels A1, A11, and A12; axes x1 and x2.]
Fig. 3. Example of projection from 2 dimensions to 1 dimension.

Definition 2 (Projection of a fuzzy set). Let $U = (X^{(i)})_{i \in N_n}$ be a universe of dimension n, M an
index set with $\emptyset \neq M \subseteq N_n$. The projection of A onto $X^M$ is the mapping $\mathrm{proj}_M : F(X) \to F(X^M)$
defined by
$$ \mathrm{proj}_M(\mu(x)) = \sup \left\{ \mu(x') \mid x' \in X \wedge x = \mathrm{red}^{N_n}_M(x') \right\}. \tag{22} $$
An example of projection from a two-dimensional space to a one-dimensional space is presented
in Fig. 3.
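
For a partition matrix that is defined point-wise on the data, the supremum in Eq. (22) reduces to a maximum over the data points that share the same coordinate on the target axis. The following Python sketch illustrates this for one cluster; the function name `project_row` and the toy data are hypothetical and are not taken from the study.

```python
def project_row(data, memberships, j):
    """Project one row of the partition matrix onto variable j.

    data:        list of n points, each a sequence of p values
    memberships: the n membership degrees mu_ik of one cluster i
    Returns a sorted list of (x_j value, projected membership) pairs.
    """
    projected = {}
    for x, mu in zip(data, memberships):
        # Supremum over all points that share this coordinate on axis j.
        projected[x[j]] = max(mu, projected.get(x[j], 0.0))
    return sorted(projected.items())


if __name__ == "__main__":
    points = [(1, 5), (1, 7), (2, 5)]   # toy two-dimensional data
    mu_row = [0.2, 0.9, 0.4]            # toy membership degrees of one cluster
    print(project_row(points, mu_row, 0))
    print(project_row(points, mu_row, 1))
```

The resulting point-wise pairs describe the antecedent set $A_{ij}$ only at the observed values of $x_j$; how a continuous membership function is fitted to them is outside the scope of this sketch.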
3.5. Estimating consequent parameters
There are several approaches for obtaining the consequent parameters; however, in our study
we have adopted the weighted least-squares technique. The identification data and the membership
degrees of the fuzzy partition are arranged in the following matrices:
$$ X = \begin{bmatrix} x_1^{T} \\ x_2^{T} \\ \vdots \\ x_N^{T} \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \tag{23} $$
$$ W_i = \begin{bmatrix} \mu_{i1} & 0 & \cdots & 0 \\ 0 & \mu_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_{iN} \end{bmatrix}. \tag{24} $$
The consequent parameters, $a_i$ and $b_i$, of the rule belonging to the ith cluster, are concatenated
into a single parameter vector, $\theta_i$, which is given by
$$ \theta_i = [a_i^{T}, b_i]^{T}. \tag{25} $$
Appending a unitary column to X gives the extended regressor matrix $X_e$:
$$ X_e = [X, \mathbf{1}]. \tag{26} $$
The membership degree, $\mu_{ik}$, of the fuzzy partition serves as the weight expressing the relevance
of the data pair $(x_k, y_k)$ to that local model. If the columns of $X_e$ are linearly independent and $\mu_{ik} > 0$
for $1 \le k \le N$, then
$$ \theta_i = [X_e^{T} W_i X_e]^{-1} X_e^{T} W_i\, y \tag{27} $$
is the least-squares solution of $y = X_e \theta + \varepsilon$, where the kth data pair $(x_k, y_k)$ is weighted by $\mu_{ik}$.
The parameters $a_i$ and $b_i$ are given by
$$ a_i = [\theta_1, \theta_2, \ldots, \theta_p], \qquad b_i = \theta_{p+1}. \tag{28} $$
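However, if the columns of $X_e$ are linearly dependent, we should use the orthogonal factorization
of $X_e$. To simplify the computation, it is more efficient to first multiply each row of $X_e$ and y by
$\sqrt{\mu_{ik}}$:
$$ \tilde{X}_i = \begin{bmatrix} \sqrt{\mu_{i1}}\, x_{e1}^{T} \\ \sqrt{\mu_{i2}}\, x_{e2}^{T} \\ \vdots \\ \sqrt{\mu_{iN}}\, x_{eN}^{T} \end{bmatrix}, \qquad \tilde{y}_i = \begin{bmatrix} \sqrt{\mu_{i1}}\, y_1 \\ \sqrt{\mu_{i2}}\, y_2 \\ \vdots \\ \sqrt{\mu_{iN}}\, y_N \end{bmatrix}, \tag{29} $$
and then compute $\theta_i$ by
$$ \theta_i = [\tilde{X}_i^{T} \tilde{X}_i]^{-1} \tilde{X}_i^{T} \tilde{y}_i. \tag{30} $$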
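
The weighted least-squares estimate can be sketched in plain Python as follows. This is an illustrative sketch, not the code used in the study: the helper `solve` and the function `consequent_params` are hypothetical names, the normal equations are solved by Gaussian elimination rather than by an orthogonal factorization, and the columns of the extended regressor matrix are assumed to be linearly independent.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def consequent_params(X, y, mu):
    """Return (a_i, b_i) for one rule from data X, y and weights mu_ik > 0."""
    Xe = [list(x) + [1.0] for x in X]                          # append unit column
    Xt = [[m ** 0.5 * v for v in row] for row, m in zip(Xe, mu)]  # scale rows
    yt = [m ** 0.5 * v for v, m in zip(y, mu)]                 # scale targets
    q = len(Xe[0])
    XtX = [[sum(r[u] * r[v] for r in Xt) for v in range(q)] for u in range(q)]
    Xty = [sum(r[u] * t for r, t in zip(Xt, yt)) for u in range(q)]
    theta = solve(XtX, Xty)                                    # normal equations
    return theta[:-1], theta[-1]                               # split into a_i, b_i


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [1.0, 3.0, 5.0, 7.0]            # toy data lying on y = 2x + 1
    mu = [0.9, 0.5, 0.2, 0.7]           # toy membership degrees
    print(consequent_params(X, y, mu))
```

Scaling each row by the square root of its membership degree turns the weighted problem into an ordinary least-squares problem, which is why the weight matrix never has to be formed explicitly.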
4. Preprocessing data
In the COCOMO81 Intermediate model, the cost driver attributes are grouped into four categories:
software product attributes, computer attributes, personnel attributes, and project attributes. They are
listed below:
• Product Attributes:
RELY : Required Software Reliability;
DATA : Database Size;
CPLX : Product Complexity.
• Computer Attributes:
TIME : Execution Time Constraint;
STOR : Main Storage Constraint;
VIRT : Virtual Machine Volatility;
TURN : Computer Turnaround Time.
• Personnel Attributes:
ACAP : Analyst Capability;
AEXP : Applications Experience;

PCAP : Programmer Capability;
VEXP : Virtual Machine Experience;
LEXP : Programming Language Experience.
• Project Attributes
MODP : Modern Programming Practices;
TOOL : Use of Software Tools;
SCED : Required Development Schedule.
The fuzzy model would be extremely complex if we used all 15 of the above-mentioned factors as
individual inputs. Hence, it is more practical to use the four grouped
categories as inputs. The values of the four categories are obtained from the combined
contribution of the factors in their respective categories. We first define membership functions for these
four categories. Each category is divided into 10 membership functions as follows:
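$$ \begin{aligned}
\mu_{\text{very-low}}(x) &= \frac{1}{1 + \left| \frac{x}{12.5} \right|^{5.00}}, &
\mu_{\text{low}}(x) &= \frac{1}{1 + \left| \frac{x - 20}{12.5} \right|^{5.00}}, \\
\mu_{\text{low-nominal}}(x) &= \frac{1}{1 + \left| \frac{x - 30}{12.5} \right|^{5.00}}, &
\mu_{\text{nominal}}(x) &= \frac{1}{1 + \left| \frac{x - 40}{12.5} \right|^{5.00}}, \\
\mu_{\text{nominal-high}}(x) &= \frac{1}{1 + \left| \frac{x - 50}{12.5} \right|^{5.00}}, &
\mu_{\text{high}}(x) &= \frac{1}{1 + \left| \frac{x - 60}{12.5} \right|^{5.00}}, \\
\mu_{\text{high-very high}}(x) &= \frac{1}{1 + \left| \frac{x - 70}{12.5} \right|^{5.00}}, &
\mu_{\text{very high}}(x) &= \frac{1}{1 + \left| \frac{x - 80}{12.5} \right|^{5.00}}, \\
\mu_{\text{very-extra high}}(x) &= \frac{1}{1 + \left| \frac{x - 90}{12.5} \right|^{5.00}}, &
\mu_{\text{extra high}}(x) &= \frac{1}{1 + \left| \frac{x - 100}{12.5} \right|^{5.00}}.
\end{aligned} \tag{31} $$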
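
All ten functions in Eq. (31) share the same bell shape and differ only in their center. A minimal Python sketch, with the hypothetical names `CENTERS` and `membership`, makes this explicit; the centers, the width 12.5, and the exponent 5.00 are read directly from Eq. (31).

```python
# Centers of the ten linguistic terms, as they appear in Eq. (31).
CENTERS = {
    "very-low": 0.0, "low": 20.0, "low-nominal": 30.0, "nominal": 40.0,
    "nominal-high": 50.0, "high": 60.0, "high-very high": 70.0,
    "very high": 80.0, "very-extra high": 90.0, "extra high": 100.0,
}


def membership(term, x, width=12.5, exponent=5.0):
    """Bell-shaped membership grade of x in the given linguistic term."""
    return 1.0 / (1.0 + abs((x - CENTERS[term]) / width) ** exponent)


if __name__ == "__main__":
    for term in ("low", "nominal", "high"):
        print(term, [round(membership(term, x), 3) for x in (20.0, 40.0, 60.0)])
```

The grade is 1 at the center of a term and falls to 0.5 at a distance of 12.5 from it.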
The following rules are then used to calculate the values of the categories. Let us consider ‘RELY’
from the Product Attributes in the following example.
Rules:
(1) If RELY is very low then Product Attributes associated with RELY is very low.
(2) If RELY is low then Product Attributes associated with RELY is low.
(3) If RELY is between low and nominal then Product Attributes associated with RELY is
low-nominal.
(4) If RELY is nominal then Product Attributes associated with RELY is nominal.
(5) If RELY is between nominal and high then Product Attributes associated with RELY is
nominal-high.
(6) If RELY is high then Product Attributes associated with RELY is high.
(7) If RELY is between high and very-high then Product Attributes associated with RELY is
high-very high.
(8) If RELY is very-high then Product Attributes associated with RELY is very-high.
(9) If RELY is between very-high and extra-high then Product Attributes associated with RELY
is very-extra high.
(10) If RELY is extra-high then Product Attributes associated with RELY is extra-high.
The other factors in Product Attributes use similar rules. Hence we can then obtain the overall
fuzzy set Product Attributes associated with RELY, DATA, and CPLX. We use the Larsen
Product-Addition inference method [23] to obtain the overall Product Attributes. Consequently, we extract
a crisp value from the fuzzy set Product Attributes associated with RELY, DATA, and CPLX as a
representative value. This process is similar to the defuzzification [23] in a fuzzy system, and we
apply the “Centroid of Areas” method to calculate the defuzzification values.
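
The aggregation and defuzzification step can be sketched as follows. This is only one plausible reading of the procedure, offered as an illustration: it assumes that each rule fires with strength 1 for the recorded linguistic rating (so the product scaling leaves the output set unchanged), that the output sets of the drivers in a category are added, and that the centroid is taken numerically over the interval [0, 100]. The names `bell` and `category_value` and the grid resolution are hypothetical, and the sketch is not claimed to reproduce the values in Table 5.

```python
# Centers of the ten output sets, as in Eq. (31).
CENTERS = {
    "very-low": 0.0, "low": 20.0, "low-nominal": 30.0, "nominal": 40.0,
    "nominal-high": 50.0, "high": 60.0, "high-very high": 70.0,
    "very high": 80.0, "very-extra high": 90.0, "extra high": 100.0,
}


def bell(x, center, width=12.5, exponent=5.0):
    """Membership function of Eq. (31) for a set with the given center."""
    return 1.0 / (1.0 + abs((x - center) / width) ** exponent)


def category_value(terms, lo=0.0, hi=100.0, steps=2000):
    """Centroid of the summed output sets for the rated terms of one category.

    terms: one linguistic term per cost driver in the category,
           e.g. the ratings of RELY, DATA, and CPLX for Product Attributes.
    """
    num = den = 0.0
    for s in range(steps + 1):
        x = lo + (hi - lo) * s / steps
        mu = sum(bell(x, CENTERS[t]) for t in terms)   # addition of the output sets
        num += x * mu
        den += mu
    return num / den                                   # centroid of areas


if __name__ == "__main__":
    # Hypothetical ratings for RELY, DATA, and CPLX of one project.
    print(category_value(["high", "nominal", "nominal-high"]))
```

Because the interval [0, 100] truncates the tails of the bell functions unevenly, the centroid of a single set generally differs slightly from its center unless the center is 50.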
5. Case study
5.1. Preprocessing data
The COCOMO81 database, which consists of 63 projects [5], was investigated in our study. We
preprocessed the database according to Section 4. The extracted representative values for the four
categories are listed in Table 5, and our fuzzy model is based on these extracted representative
values. The PN value in the table indicates the sequential project number.
5.2. COCOMO 81 model
We evaluated the Intermediate COCOMO81 model using the Organic model described in Table
3. The cost driver attributes determine a multiplying factor that estimates the effect of each attribute
on the software development effort. These multipliers are applied to a COCOMO development effort
to obtain a refined estimate of the software development effort. Each cost driver in the Intermediate
COCOMO81 model is measured using a rating scale of six linguistic values: very low, low, nominal,
high, very high, and extra high. Table 4 lists the effort multipliers used in this model.
Sometimes these six linguistic values are not sufficient to distinguish a particular project attribute;
therefore, we added four more values: low-nominal, nominal-high, high-very high, and very-extra
high. The scale rate for each of these four extra values is the average of the two scale rates between
which it falls. For example, nominal-high for RELY can be 1.07. Table 7 shows the fitting results
of the three COCOMO81 models.
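As an illustration of how an intermediate rating feeds into the Intermediate COCOMO81 estimate, the following Python sketch averages two adjacent multipliers and applies the Organic-mode formula MM = EAF * 3.2 * (KDSI)^1.05 from Section 2.3. The RELY multipliers used here (0.75, 0.88, 1.00, 1.15, 1.40) are the values published by Boehm [5]; Table 4 of this paper should be treated as authoritative. The function names are hypothetical, and the sketch is not the authors' code.

```python
from math import prod

# RELY effort multipliers from Boehm [5]; see Table 4 for the full set.
RELY = {
    "very low": 0.75, "low": 0.88, "nominal": 1.00,
    "high": 1.15, "very high": 1.40,
}


def between(multipliers, lower, upper):
    """Multiplier for an intermediate rating: average of its two neighbours."""
    return (multipliers[lower] + multipliers[upper]) / 2.0


def organic_effort(kdsi, cost_driver_multipliers):
    """Intermediate COCOMO81 effort (man-months) for an Organic project."""
    eaf = prod(cost_driver_multipliers)  # effort adjustment factor
    return eaf * 3.2 * kdsi ** 1.05


if __name__ == "__main__":
    rely_nominal_high = between(RELY, "nominal", "high")
    print(rely_nominal_high)                        # average of 1.00 and 1.15
    print(organic_effort(10.0, [rely_nominal_high]))
```

The exact average of 1.00 and 1.15 is 1.075, which corresponds to the 1.07 quoted in the text.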
Table 5
COCOMO extracted category data
PN Product Computer Personnel Project
1 41.1240 54.9800 24.3957 33.5152
2 40.1425 50.0000 50.0000 33.5655
3 46.6479 32.7032 63.8372 53.3200
4 42.5577 35.1900 38.6238 41.2253
5 27.0283 35.1900 51.9920 33.3186
6 25.5631 49.8162 41.5188 25.5631
7 33.3186 30.3048 48.0080 53.3200
8 53.3521 70.9635 43.7821 20.4275
9 53.3521 64.6792 40.2771 46.6800
10 59.8575 59.1114 59.8417 40.0401
11 59.8575 59.1114 59.8417 40.0401
12 53.3521 50.0000 59.8417 40.2374
13 53.3521 54.9800 51.8616 53.0906
14 51.1653 64.6232 44.1198 13.7604
15 59.8575 55.0160 36.7206 41.2253
16 66.2675 53.4798 48.0080 40.0401
17 66.2675 53.4798 55.8463 40.0401
18 66.4345 64.6792 44.0241 17.7181
19 53.3200 47.5121 49.8531 53.3200
20 72.9717 64.8100 45.9852 25.5631
21 59.7626 40.3112 44.0241 53.3200
22 46.6800 54.8142 59.8417 41.2253
23 46.6800 47.5103 59.8417 33.3186
24 27.0283 42.6876 36.4120 66.2675
25 72.9717 45.1858 44.1537 53.3521
26 30.2440 57.4692 44.1537 33.7325
27 46.9094 54.8491 40.0401 40.1425
28 66.4345 64.8100 40.2771 27.0283
29 33.5655 40.0401 36.4120 49.1321
30 33.5655 40.0401 40.7008 49.1321
31 59.7626 72.0865 48.1384 40.0401
32 33.7325 40.0401 55.7159 27.0283
33 72.9717 63.3083 59.8417 53.3200
34 53.3200 37.6902 40.0401 33.5152
35 41.1240 57.3124 38.1615 23.4277
36 33.7325 30.3048 40.2771 53.3200
37 17.7181 45.0200 55.8463 40.0401
38 46.6800 30.3048 57.8434 59.7626
39 46.6800 35.1900 67.6406 40.2374
40 33.5655 35.1900 47.9965 33.5655
41 27.0283 30.3048 63.7299 53.3200
42 40.2400 45.1858 50.0170 53.3191
43 46.6809 50.0000 51.9920 46.6800
44 40.2400 52.5078 48.0080 40.2374
45 40.2400 45.1858 48.0080 50.0000
46 40.2400 45.1858 51.9920 46.6800
47 41.1240 30.3048 55.9145 40.2374
48 20.4275 35.1900 42.2813 40.2374
49 33.5655 35.1900 51.8616 59.7626
50 46.6800 54.8491 44.1537 40.0401
51 33.5655 49.8162 36.4120 33.5152
52 20.4275 50.0000 32.2631 25.5631
53 33.7325 64.6792 47.9848 27.0283
54 33.5655 45.1858 51.8508 40.2374
55 17.7181 30.3048 56.0105 33.5655
56 53.3521 59.6632 38.2746 20.4275
57 40.2374 64.6792 32.2631 17.7181
58 59.8575 58.9399 71.6592 46.6800
59 40.2374 42.6876 40.0401 46.6800
60 53.3521 50.0000 36.3781 20.4275
61 40.2374 30.3048 51.9920 53.0906
62 40.1425 59.8121 60.8193 41.2253
63 40.2374 35.1900 59.7229 53.0906
5.3. Fuzzy modeling
We applied fuzzy modeling to the COCOMO81 database. We selected the overall representative
values of Product Attributes, Computer Attributes, Personnel Attributes, Project Attributes, and
KDSI as independent variables, and the dependent variable was Man Month (MM) as described in
Section 2. We evaluated the model using the quality of fit and cross validation evaluation techniques.
The initial number of clusters was empirically selected as 5, implying that there are 5 TS rules for
the model. Other values for the number of clusters can also be used; determining the best number
of clusters is an optimization problem beyond the scope of this paper. In the cross validation
evaluation technique, 63 iterations of model building and evaluation were performed. At each
iteration, one entry from the database is used as the test data, while the remaining 62 entries are
used as the fit data. The model is built using the fit data, whereas the test data is used to evaluate
the model. The average of the 63 test data evaluations represents the evaluation of the fitted model
(Fig. 4).
The sampling period was 1 s, and the termination tolerance of the clustering algorithm was
0.01. Fig. 5 shows the projection membership functions for the five independent variables. The
cluster centers are shown in Table 6. The quality of fit and cross validation cost estimates
are shown in Table 7, and the statistical results are shown in Table 8. The quality of fit values
represent the cost estimates of the model when the complete fit data set is also used as the test data
set, i.e., resubstitution. The cross validation values are the cost estimates of the model based on the
leave-one-out strategy.
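The leave-one-out procedure and the two error measures can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, AAE is assumed to be the mean of |actual - predicted|, ARE is assumed to be the mean of |actual - predicted| / actual, and a trivial mean predictor stands in for the fuzzy model.

```python
from statistics import mean


def leave_one_out(xs, ys, fit, predict):
    """Return one held-out prediction per sample (n model builds for n samples)."""
    predictions = []
    for i in range(len(xs)):
        fit_x = xs[:i] + xs[i + 1:]   # fit data: everything except sample i
        fit_y = ys[:i] + ys[i + 1:]
        model = fit(fit_x, fit_y)
        predictions.append(predict(model, xs[i]))  # test data: sample i
    return predictions


def aae(actual, predicted):
    """Average absolute error (assumed definition)."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def are(actual, predicted):
    """Average relative error (assumed definition)."""
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


# Stand-in model: predicts the mean effort of the fit data.
def fit_mean(fit_x, fit_y):
    return mean(fit_y)


def predict_mean(model, x):
    return model


if __name__ == "__main__":
    kdsi = [10.0, 20.0, 30.0]      # illustrative inputs, not from the paper
    effort = [1.0, 2.0, 3.0]       # illustrative outputs, not from the paper
    cross = leave_one_out(kdsi, effort, fit_mean, predict_mean)
    print(aae(effort, cross), are(effort, cross))
```

Resubstitution (quality of fit) corresponds to calling `fit` once on all samples and evaluating `aae` and `are` on predictions for those same samples.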
Fig. 4. Membership function for the COCOMO81 categories (horizontal axis 0 to 100; sets: Very Low, Low, Low-Nominal, Nominal, Nominal-High, High, High-Very High, Very High, Very-Extra High, Extra High).
Fig. 5. Projection membership functions for five inputs (panels u1 to u5).
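The rules for the output, Man Month (MM), are as follows:
1. If $u_1$ is $A_{11}$ and $u_2$ is $A_{12}$ and $u_3$ is $A_{13}$ and $u_4$ is $A_{14}$ and $u_5$ is $A_{15}$ then
$MM = 32.00u_1 + 34.50u_2 - 28.60u_3 + 51.60u_4 + 3.44u_5 - 2540.00$.
2. If $u_1$ is $A_{21}$ and $u_2$ is $A_{22}$ and $u_3$ is $A_{23}$ and $u_4$ is $A_{24}$ and $u_5$ is $A_{25}$ then
$MM = -5.94u_1 - 0.73u_2 + 2.20u_3 + 5.80u_4 + 9.43u_5 - 85.40$.
3. If $u_1$ is $A_{31}$ and $u_2$ is $A_{32}$ and $u_3$ is $A_{33}$ and $u_4$ is $A_{34}$ and $u_5$ is $A_{35}$ then
$MM = 7.39u_1 - 1.53u_2 + 0.48u_3 - 7.01u_4 + 5.82u_5 - 43.90$.
4. If $u_1$ is $A_{41}$ and $u_2$ is $A_{42}$ and $u_3$ is $A_{43}$ and $u_4$ is $A_{44}$ and $u_5$ is $A_{45}$ then
$MM = 18.90u_1 + 19.90u_2 - 33.60u_3 - 13.60u_4 + 8.60u_5 + 447.00$.
5. If $u_1$ is $A_{51}$ and $u_2$ is $A_{52}$ and $u_3$ is $A_{53}$ and $u_4$ is $A_{54}$ and $u_5$ is $A_{55}$ then
$MM = -3.01 \times 10^{8} u_1 + 6.48 \times 10^{8} u_2 - 6.27 \times 10^{8} u_3 + 3.79 \times 10^{8} u_4 - 4.13 \times 10^{6} u_5 + 2.86 \times 10^{8}$,
where $u_1$, $u_2$, $u_3$, $u_4$, and $u_5$ represent Product Attributes, Computer Attributes, Personnel Attributes,
Project Attributes, and KDSI, respectively.
Table 6
Cluster centers
Rule u1 u2 u3 u4 u5
1 39.9 51.2 42.4 29.5 169.0
2 40.6 43.1 48.7 40.3 8.5
3 42.3 50.1 45.2 38.4 24.4
4 48.6 50.7 54.9 47.0 66.6
5 65.1 52.5 45.6 40.7 373.0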
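A rule base of this form is typically evaluated as a weighted average of the rule consequents. The Python sketch below shows that computation on a toy two-rule, two-input model. It is a generic Takagi-Sugeno evaluation under stated assumptions: the degree of fulfilment of a rule is taken as the product of its antecedent memberships, and every membership function, coefficient, and name in the example is hypothetical. The projected membership functions of Fig. 5 are not given numerically, so the paper's rules are not reproduced here.

```python
def ts_output(u, rules):
    """Weighted average of linear rule consequents.

    `rules` is a list of (memberships, coefficients, bias) tuples, with one
    membership function and one coefficient per input in `u`.
    """
    numerator = 0.0
    denominator = 0.0
    for memberships, coefficients, bias in rules:
        # Degree of fulfilment: product of the antecedent memberships.
        beta = 1.0
        for membership, value in zip(memberships, u):
            beta *= membership(value)
        # Linear consequent: a^T u + b.
        y = sum(a * value for a, value in zip(coefficients, u)) + bias
        numerator += beta * y
        denominator += beta
    return numerator / denominator


# Hypothetical membership functions on the interval [0, 10].
def low(v):
    return max(0.0, min(1.0, 1.0 - v / 10.0))


def high(v):
    return max(0.0, min(1.0, v / 10.0))


# Hypothetical two-rule model with two inputs.
TOY_RULES = [
    ([low, low], [2.0, 1.0], 1.0),      # y = 2*u1 + 1*u2 + 1
    ([high, high], [-1.0, 0.5], 20.0),  # y = -u1 + 0.5*u2 + 20
]

if __name__ == "__main__":
    print(ts_output([5.0, 5.0], TOY_RULES))
```

At u = (5, 5) both toy rules have the same degree of fulfilment, so the output is the plain average of the two consequents, 16 and 17.5.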
5.4. Discussion
As shown in Tables 7 and 8, the estimation accuracy of the proposed fuzzy modeling approach
is better than that of the COCOMO models. The average absolute error (AAE) for cross
validation of the fuzzy model (Fuzzy Cross) is only 45.5745, i.e., 32.5715 lower than that of the
Intermediate COCOMO model. The AAE for quality of fit of the fuzzy model (Fuzzy Fit) is lower
still (20.073 in Table 8). Furthermore, since the Intermediate COCOMO model has a better AAE
than the Basic and Detailed COCOMO models, both the Fuzzy Cross and Fuzzy Fit models also
perform better than the Basic and Detailed COCOMO models. Thus, the AAEs for both cross
validation and quality of fit of the fuzzy model are significantly better than those of the
COCOMO models. As shown in Table 8, the average relative error (ARE) values for the fuzzy
models are also significantly lower than those of all three COCOMO models.
To verify these observations statistically, we conducted a paired t-test [3] with ARE (as
well as AAE) as the response variable. The null and alternative hypotheses (for ARE) are
formulated as:
\[ H_0: \left(MM^{ARE}_{Fuzzy} - MM^{ARE}_{COCOMO}\right) \geq 0, \]
\[ H_A: \left(MM^{ARE}_{Fuzzy} - MM^{ARE}_{COCOMO}\right) < 0, \]
where $MM^{ARE}_{Fuzzy}$ and $MM^{ARE}_{COCOMO}$ represent the absolute relative error (ARE) for the fuzzy identification
model and the COCOMO model, respectively. We compared the Fuzzy Cross and Fuzzy Fit models with
each of the COCOMO models. However, only the results of the ARE comparisons with the Intermediate
and Detailed models are presented.
When comparing the Fuzzy Cross and Fuzzy Fit models with the Intermediate COCOMO model,
the t values obtained were 1.7561 and 1.8254, respectively. Both of the t values obtained were
Table 7
MM results of the COCOMO and fuzzy models
PN  Fuzzy cross  Fuzzy fit  Intermediate COCOMO  Detailed COCOMO  Basic COCOMO  Real
1 1665.76 2029.11 2218 2286 1047 2040
2 1738.9 1642.63 1770 1760 2702 1600
3 386.781 290.548 245 248 711 243
4 227.697 257.462 212 207 134 240
5 35.1072 30.0092 39 38 44 33
6 35.2049 42.23228 30 30 10.3 43
7 9.6609 9.2164 9.8 10.2 18 8
8 511.155 1013.03 869 994 147 1075
9 428.083 470.224 397 395 213 423
10 239.023 252.081 214 218 115 321
11 205.23 277.115 243 248 131 218
12 195.67 219.131 238 237 274 201
13 110.603 102.336 108 106 163 79
14 89.466 77.126 60 64 10.3 73
15 66.7695 59.02503 52 51 18 61
16 34.9789 36.8081 38 39 17 40
17 11.1449 12.4844 10.7 10.9 7.8 9
18 11400 11400 11056 12380 3652 11400
19 6105.09 6600 7764 7699 13749 6600
20 6338.81 6400 6536 7571 1698 6400
21 2414.12 2455 1836 1864 2741 2455
22 705.797 778.306 733 728 1003 724
23 609.902 463.14 443 445 640 539
24 454.276 443.364 326 337 463 453
25 451.143 523 430 433 283 523
26 365.347 370.318 339 341 375 387
27 82.227 88.7186 89 82 53 88
28 85.4045 132.093 133 143 35 98
29 8.12141 7.873 7 7.0 7.0 7.3
30 6.05914 6.03522 5.8 5.9 6.4 5.9
31 1013.37 1002.02 962 1057 394 1063
32 815.256 656.699 869 868 1527 702
33 642.263 540.293 529 543 301 605
34 203.147 188.059 201 201 147 230
35 72.2977 76.099 161 162 78 82
36 67.9568 68.6683 33 34 49 55
37 46.7448 43.981 44 44 97 47
38 18.3142 20.4074 20 20 41 12
39 9.744 8.328 8.4 8.3 16 8
40 9.0743 8.3674 8.1 8.0 6.3 8
41 8.02493 6.062 4.7 4.9 14 6
42 39.448 46.794 46 46 54 45
43 74.9033 131.163 102 102 79 83
44 75.0984 125.501 130 128 85 87
45 123.041 95.5027 100 100 91 106
46 150.338 219.995 166 164 167 126
47 32.6361 32.8342 33 32 65 36
48 1155.35 1282.97 1542 1519 1858 1272
49 169.238 151.5083 168 170 469 156
50 191.581 128.538 193 194 163 176
51 130.866 147.1222 114 115 27 122
52 37.8064 51.1202 55 56 22 41
53 16.30243 17.2839 22 22 19.4 14
54 24.881 20.84719 14 14 11.4 20
55 20.7061 17.524364 7.5 7.4 16.6 18
56 826.252 929.524 537 570 188 958
57 244.518 182.428 239 247 93 237
58 131.502 132.405 145 143 171 130
59 78.6103 73.273 68 68 59 70
60 61.6232 67.75 60 61 18 57
61 51.2874 59.1798 47 48 79 50
62 37.4984 35.23229 42 41 36 38
63 9.40078 17.6209 17 17 57 15
Table 8
Statistical results of the COCOMO and fuzzy models
Measure  Fuzzy cross  Fuzzy fit  Intermediate COCOMO  Detailed COCOMO  Basic COCOMO
AAE 45.575 20.073 78.146 99.498 464.659
ARE 0.137 0.134 0.188 0.188 0.602
greater than the critical value $t_{1-\alpha;\,n-1}$. In our case study, $\alpha = 0.05$ and $n = 63$ (the number of projects
in the COCOMO database), and consequently the critical value is $t = 1.67$. Therefore, we reject the
null hypothesis, $H_0$, and conclude that both fuzzy identification models significantly reduced the
average relative error (ARE) in this case study. A similar observation was made when performing
t-tests with AAE as the response variable.
When comparing the fuzzy models with the Detailed COCOMO model, the t values obtained
(for ARE) were 1.7471 (Fuzzy Cross) and 1.8153 (Fuzzy Fit). Once again, these values are greater
than the critical value of 1.67, and the same conclusion was reached when performing t-tests
with AAE as the response variable. In summary, we conclude that the
fuzzy identification models for this case study yielded statistically better cost estimation results than
all three COCOMO estimation models.
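The paired t statistic used in this comparison can be computed as in the sketch below. It is an illustration only: the function names are hypothetical, the per-project relative error is assumed to be |real - estimate| / real, and the difference is taken as baseline minus candidate so that a positive t indicates a lower error for the candidate model (the paper does not state its sign convention). The usage example takes the first five projects of Table 7, which is far too few for inference and is shown only to make the data flow concrete.

```python
from math import sqrt
from statistics import mean, stdev


def relative_errors(real, estimated):
    """Per-project relative error (assumed definition)."""
    return [abs(r - e) / r for r, e in zip(real, estimated)]


def paired_t(errors_baseline, errors_candidate):
    """Paired t statistic on the differences baseline - candidate."""
    diffs = [b - c for b, c in zip(errors_baseline, errors_candidate)]
    # Sample standard deviation (n - 1 in the denominator).
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


if __name__ == "__main__":
    # First five rows of Table 7: Real, Fuzzy cross, Intermediate COCOMO.
    real = [2040, 1600, 243, 240, 33]
    fuzzy_cross = [1665.76, 1738.9, 386.781, 227.697, 35.1072]
    intermediate = [2218, 1770, 245, 212, 39]
    t = paired_t(relative_errors(real, intermediate),
                 relative_errors(real, fuzzy_cross))
    # With all 63 projects, t would be compared with the critical value 1.67.
    print(t)
```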
6. Conclusions
It is well known that software project management teams benefit greatly from knowing the estimated cost of their software projects. The benefits are greater if accurate cost estimates are obtained during the early stages of the project life cycle. The process and performance behavior of software projects have been used as measures for software cost and effort estimation models. With knowledge of the expected cost of a project, software management teams can control the software development process effectively.
Estimating the process and performance behavior of a software project early in the software life cycle is very challenging and often difficult to model. The problem is compounded by the fact that important and useful recorded information pertaining to software cost is often vague, imprecise, and in some cases even linguistic. Traditional software cost estimation techniques cannot incorporate such vague and imprecise information in their models, which prevents the extraction and use of information that could well improve a model’s project cost estimates.
The commonly used COCOMO cost estimation models incorporate data collected from several software projects and use this information for their cost and effort estimates. However, they may not provide accurate project cost estimates as software size and complexity increase.
The Fuzzy Identification software cost estimation technique presented in this paper incorporates important project information that is often vague and imprecise. The proposed technique is an advanced fuzzy logic technique that integrates fuzzy clustering, space projection, fuzzy inference, and defuzzification. The structure of the fuzzy model is very simple, and the number of inference rules is the same as the number of fuzzy clusters. The rule-based preprocessing of data reduces the database size significantly.
In this case study, we applied fuzzy identification to extract rules and membership functions from fuzzy input data. The cost estimation results were then compared with those obtained from the three types of COCOMO cost estimation models, i.e., Basic, Intermediate, and Detailed. For the case study investigated, the cost estimation accuracy of the fuzzy models was significantly better than that of the COCOMO models.
Future research may apply the fuzzy modeling approach to other estimation problems such as project size estimation. Furthermore, the proposed fuzzy identification modeling technique may be evaluated using other case studies.
Acknowledgements
We thank the anonymous reviewers for their useful comments, and Dr. Witold Pedrycz for his
useful suggestions. We thank Naeem Seliya for his assistance with modifications, editorial reviews
and useful suggestions. We also thank Erik Geleyn and Robert M. Szabo for their suggestions. This
work was supported in part by Cooperative Agreement NCC 2-1141 from NASA Ames Research
Center, Software Technology Division (Independent Verification and Validation Facility). The
findings and opinions in this paper belong solely to the authors and are not necessarily those of the
sponsors or collaborators.
References
[1] A.J. Albrecht, J.E. Gaffney, Software function, source lines of code, and development effort prediction: a software
science validation, IEEE Trans. Software Eng. SE-9 (6) (1983) 639–647.
[2] R. Babuska, Fuzzy Modeling For Control, Kluwer Academic Publishers, Dordrecht, 1999.
[3] M.L. Berenson, D.M. Levine, M. Goldstein, Intermediate Statistical Methods and Applications: A Computer Package
Approach, Prentice-Hall, Englewood Cliffs, NJ, 1983.
[4] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[5] B.W. Boehm, Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, 1981.
[6] K.H. Chen, H.L. Chen, H.M. Lee, A multiclass neural network classifier with fuzzy teaching inputs, Fuzzy Sets and
Systems 91 (1997) 15–35.
[7] J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well separated clusters,
J. Cybernet. 3 (1974).
[8] Z. Fei, X. Liu, f-COCOMO: fuzzy constructive cost model in software engineering, IEEE Internat. Conf. Fuzzy
Systems, 1992, pp. 331–337.
[9] P.W. Garratt, A.C. Hodgkinson, A neurofuzzy cost estimator, Proc. Software Engineering and Applications, 1999,
pp. 401–406.
[10] H. Ichihashi, T. Watanabe, Learning control system by a simplified fuzzy reasoning model, Proc. Information
Processing and Management of Uncertainty, 1990, pp. 417–419.
[11] H. Ishibuchi, K. Nozaki, H. Tanaka, Y. Hosaka, M. Matsuda, Empirical study on learning in fuzzy systems by rice
taste analysis, Fuzzy Sets and Systems 64 (1994) 129–144.
[12] C.L. Karr, E.J. Gentry, Fuzzy control of pH using genetic algorithms, IEEE Trans. Fuzzy Systems 1 (1993) 46–53.
[13] J.R. Koza, Genetic Programming: on the Programming of Computers by Means of Natural Selection, MIT Press,
Cambridge, MA, 1996.
[14] E.H. Mamdani, S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, Internat. J. Man
Mach. Stud. 7 (1) (1975) 1–13.
[15] J.E. Matson, B.E. Barrett, J.M. Mellichamp, Software development cost estimation using function points, IEEE Trans.
Software Eng. 20 (4) (1994) 275–287.
[16] D. Nauck, R. Kruse, A neuro-fuzzy method to learn fuzzy classification rules from data, Fuzzy Sets and Systems
89 (1997) 277–288.
[17] H. Nomura, I. Hayashi, N. Wakami, A learning method of fuzzy inference rules by descent method, Proc.:
FUZZ-IEEE’92, 1992, pp. 203–210.
[18] T. Pfeufer, M. Ayoubi, Application of a hybrid neuro-fuzzy system to the fault diagnosis of an automotive
electromechanical actuator, Fuzzy Sets and Systems 89 (1997) 351–360.
[19] L.H. Putnam, A general empirical solution to the macro software sizing and estimation problem, IEEE Trans. on
Software Engineering, July 1978, pp. 345–361.
[20] M. Sugeno, G.T. Kang, Structure identification of fuzzy model, Fuzzy Sets and Systems 28 (1988) 15–33.
[21] H. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans.
Systems Man Cybernet. 15 (1985) 116–132.
[22] C.E. Walston, A.P. Felix, A method of programming measurement and estimation, IBM Systems J. 16 (1) (1977)
54–73.
[23] Z. Xu, Fuzzy logic control system CAD and study on the improvement of FLC performance, Master Thesis, Guangxi
University, Nanning, Guangxi P.R. China, May 1997.
[24] Z. Xu, Fuzzy logic techniques for software reliability engineering, Ph.D. Dissertation, Florida Atlantic University, Boca
Raton, FL, May 2001.
[25] J. Yen, R. Langari, Fuzzy Logic: Intelligence, Control, and Information, Prentice Hall, Inc., Upper Saddle River,
NJ, 1999.
[26] Y. Yoshinari, W. Pedrycz, K. Hirota, Construction of fuzzy models through clustering techniques, Fuzzy Sets and
Systems 54 (1993) 157–165.

© 2003 Published by Elsevier B.V.
