A Unified Metric of Software Complexity
Renato R. Gonzalez
Faculty of Engineering, Santo Domingo Institute of Technology, Santo Domingo, Dominican Republic
There are different sources of software complexity. A large set of complexity metrics can measure distinct program attributes to show different indicators of program size. Nevertheless, the size of a program must be obtained from the overall program complexity, based on the values of all program factors. Based on the concept that software complexity is a measurement of the resources expended through the software life cycle, and on the fact that a program may be approached from three distinct perspectives, the complexity factors are classified into three complexity domains: syntactical, functional, and computational. Each domain represents the magnitudes of the factors in one of three dimensions: length, time, and level or depth. Using these ideas, in this article we define ordinal measures of the complexity factors based on discrete mathematical structures of programs and on information content or entropy. We transform the different domains of software complexity into linear metric spaces in order to represent a program by a set of vectors whose magnitudes and distances represent metrics of the program components, and we define a "unified complexity metric" of the program size and the effort needed to produce it over the multilinear complexity space conformed by the three complexity spaces. These metrics may be used to define a statistical method that estimates the size of a program and the effort needed to produce it from the external system design, the productivity in software projects, and the quality and value of software products.

Address correspondence to Prof. Renato R. Gonzalez, Faculty of Engineering, Santo Domingo Institute of Technology, EPS #P6103, P.O. Box 02-5261, Miami, FL 33102-5261.

1. INTRODUCTION

To pass from art to science, a discipline must be supported by quantitative methods. It is necessary to know about the "essence" of the objects under study and to quantify such essence. Physics became a science with the notions of matter and the measurement of length, mass, and time; thermodynamics became a science with the measurements of temperature and entropy.

It is impossible to carry out this objective until the right questions are asked. Scientists must know what to measure before creating a metric of an object. They must interpret complex phenomena by reducing them to elementary events or processes whose behavior is described by simple laws (Estes, 1982). To define a metric, we must know the context of the object, its intrinsic nature and components, the characteristics to be measured, and the space in which the metric function will operate.

Today's software applications are more complex and software failure more critical, potentially resulting in economic damage or even threatening the health or lives of human beings (Basili, 1987). Despite the evolution and advance of methods and tools for design, programming, testing, debugging, and project management, software engineering is still more craft than science.

The software industry faces serious problems in software project management, because it is difficult to control what cannot be measured. The principal problems come from the most important characteristic of software: software is not a physical object but an intellectual one; its components are hidden, and its interfaces and connections are obscure.

Then, the most fundamental question is "how big is a program?" But first we must define what "big" means, at least by comparing one program to another to find out the difference in effort invested. "Size" is not obvious for software. Metrics must be objective in the sense that the measurement process will yield the same result no matter who applies it (Beizer, 1990).

We can identify seven sources of software complexity. Their correlation and interdependence
determine the level of complexity in a software product (Figure 1), although most metrics measure only one software complexity factor. Using statistical factor analysis, researchers have found a huge amount of common variance among internal complexity metrics, such as the Halstead software science metrics, statement counts, the McCabe cyclomatic number, logical depth, etc. (Munson and Khoshgoftaar, 1992). Sixteen metrics selected in this study have been reduced to five complexity domains: control structure, size of code, information content, modularity, and data structure.

This fact would suggest that the size of the software components is the primary aspect assessed through the technical and operational dimensions of many metrics. As Curtis (1987) pointed out, "the pervasiveness of the effect of size on software metrics has made it difficult for most studies to demonstrate that metrics perform better than lines of code in predicting many criteria."

Other types of metrics include those called "design metrics" (Henry and Selig, 1990; Szulewski et al., 1984), which measure external software complexity; that is, the level of module coupling given by the control and data connections between system modules. These metrics were developed from primitive design metrics that are predictive, objective, and automatable (Zage et al., 1992). But, when describing the complexity of a software system, we must take into account both the internal complexity of each module and the external complexity due to module interrelationships (Lew et al., 1988).

Figure 1. Different sources of software complexity.

2. SOME CONSIDERATIONS FOR BUILDING A METRIC

From a mathematical perspective, a metric is a function that takes two arguments from certain domains or spaces and whose returned value represents a difference between the two arguments. However, software complexity measurements, often called "metrics," are functions that take one argument and whose returned value represents a measurement of certain aspects of that single argument (Ramamurthy and Melton, 1988). These measurements are ordinal quantities of software complexity. By ordinal, we mean that the metric is intended to order programs in relation to their complexity, but no conclusions can be drawn as to the "distance" between two measures or values (Harrison, 1992).

In this sense, most metrics are incomplete because they measure only one factor of program complexity, in spite of a program having different sources of complexity. The best metric measures the largest number of these complexity factors, to tell more about the program and to better explain its complexity. In other attempts at this objective, many metrics are simple linear compounds of other metrics. Nevertheless, the sum of two or more measures does not necessarily offer additional information about program complexity (Munson and Khoshgoftaar, 1992).

Statistical measurements, such as the arithmetic mean, variance, and covariance, are indicators of certain relationships among factors, but they are not metrics in a strict mathematical sense. They tell us something about certain common characteristics, factor correlation, and interdependence, with some probability of measurement error. However, certain statistical methods are useful for estimating internal software complexity from the software design, a critical aspect of software project management.

To solve this problem, we attempted to build a unified software complexity metric that takes into account the different domains, dimensions, and factors of software size. This complexity metric must be useful in three ways: (1) to provide a measurement that allows comparison of the relative complexity of two different programs; (2) to take two values of an argument and return a measurement of the distance or variation between the two arguments; and (3) to obtain an indirect measure to estimate and predict the productivity of the people in software projects
and the reliability, performance, quality, and value of the software product.

3. THE DOMAINS AND MAGNITUDES OF SOFTWARE COMPLEXITY

3.1 Programs and Chunks

Consider a computer program as a combinatorial activity whose components may be represented as finite discrete mathematical structures, especially directed graphs and trees. A directed graph G = (V, A, φ) consists of a nonempty set V of nodes, a set A of arcs, and a function φ from A to the Cartesian product V × V, so that φ(a) = (u, v), where a ∈ A and (u, v) ∈ V × V. A tree is a connected graph with no cycles. A rooted tree is a directed graph with a distinct node v0 such that all nodes are accessible from v0 (Fisher, 1977). The basic logical structures are defined by structured programming (sequence, decision, and iteration), following its purpose of controlling complexity through theory and discipline (Mills, 1972). As Dijkstra et al. (1970) argued, "the art of programming is the art of organizing complexity by means of structures." Figure 2 shows the PASCAL code of the algorithm to merge two ordered lists as an example of a program using structured programming.

In this sense, a program can be split into elemental objects or blocks called chunks, which constitute an ordered set M = {M1, M2, ..., Mk}. A chunk is a sequence of one or more adjacent program statements Mi = {zi, zi+1, ..., zi+n} with the property that there is no explicit transfer of control to any statement in the sequence other than zi (Davis and LeBlanc, 1988). That is, whenever the first statement of the block is executed, the other statements are executed in a given order (Weyuker, 1988).
Figure 2. Algorithm to merge sorted lists.

Begin
  Indexa := 1;
  Indexb := 1;
  Indexc := 1;
  While (Indexa <= n) and (Indexb <= m) do
  Begin
    If A[Indexa] < B[Indexb] then
    Begin
      C[Indexc] := A[Indexa];
      Indexa := Indexa + 1
    End
    Else
    Begin
      C[Indexc] := B[Indexb];
      Indexb := Indexb + 1
    End;
    Indexc := Indexc + 1
  End;
  If Indexa > n then
    For k := 0 to (m - Indexb) do
    Begin
      Indexc := Indexc + k;
      Indexb := Indexb + k;
      C[Indexc] := B[Indexb]
    End
  Else
    For k := 0 to (n - Indexa) do
    Begin
      Indexc := Indexc + k;
      Indexa := Indexa + k;
      C[Indexc] := A[Indexa]
    End;
End.
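For comparison, here is a minimal Python sketch of the same structured merge. The function name merge_sorted and the 0-based indexing are illustrative assumptions, and the tail-copy step is written in the conventional element-by-element form rather than the indexed For loops of Figure 2; it shows the same chunk structure (initialization, a while/if merge loop, and two tail copies).

def merge_sorted(a, b):
    """Merge two ascending lists, mirroring Figure 2's chunk structure."""
    c = []
    ia, ib = 0, 0
    # Main merge loop: corresponds to the While chunk with its If decision.
    while ia < len(a) and ib < len(b):
        if a[ia] < b[ib]:
            c.append(a[ia])
            ia += 1
        else:
            c.append(b[ib])
            ib += 1
    # Tail copies: correspond to the two For chunks after the While.
    if ia >= len(a):
        c.extend(b[ib:])
    else:
        c.extend(a[ia:])
    return c

print(merge_sorted([1, 3, 5], [2, 4, 6, 8]))  # [1, 2, 3, 4, 5, 6, 8]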
If the program was made by use of structured programming rules, then zi is the decision point (if-then-else, do-while, or do-until) with transfer of control in implicit form, or a null statement when the … always considered auxiliary to a higher purpose in program specification and coding.

Then, we can represent the syntactical structure of a program as a context-free grammar g = (VN, VT, p, M), where VN is the set of chunks as variables or nonterminal symbols, VT is the set of elemental instructions or terminal chunks as terminal symbols, p is the combination of logical control structures as production rules, and M is the start symbol or program (Hopcroft and Ullman, 1969). From this perspective, it is also possible to represent a program via Backus-Naur form (BNF) notation, as shown in Figure 6.

Using this definition, we can represent the program syntax structure by use of a derivation tree, also called an abstract syntax tree, GT = (V, A, φT), where V = VT ∪ VN and A is the set of arcs such that

1. φT(root) = M
2. φT(node of outdegree 0) ∈ VT
3. For any node v ∈ V, if the set of immediate successors of v is nonempty and ordered v1, ..., vk, then φT(v) → φT(v1) φT(v2) ... φT(vk) is a production of g (Fisher, 1977).

VN is the set of nonterminal chunks, or the internal nodes in the tree, and VT the leaves, terminal symbols, or terminal chunks. This idea is represented in Figure 7.

A control flow graph GC = (VC, AC, φC) represents the logical dependence sequence among chunks. Nodes in VC are terminal chunks, and arcs in AC are the control paths. Each path is weighted by the Boolean function of the decisional path β(x1, x2, ..., xn) = x1 ∧ x2 ∧ ... ∧ xn, constituted by the conjunction of the different predicates xi, or decision points, from the root node to the leaf or terminal chunk in the syntax tree GT (Figure 8). Program M has five Boolean functions:

β0 = x0
β1 = (x1 ∧ x2) ∧ (x3)
β2 = (x1 ∧ x2)
β3 = (x4 ∧ x5)
β4 = (x4 ∧ x6)

where x0 = T, x1 = (indexa ≤ n), x2 = (indexb ≤ m), x3 = (A[indexa] < B[indexb]), x4 = (indexa > n), x5 = (0 ≤ k ≤ (m − indexb)), x6 = (0 ≤ k ≤ (n − indexa)), and xi ∈ {T, F}. Figure 8 shows the control flow graph and the Boolean functions associated with each control path. Each βi in GC represents the logical decision point of the terminal chunk and is considered as part of the chunk instructions.
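As a sketch of how these decisional path functions can be evaluated, the fragment below encodes the predicates x1 to x6 of program M as Python functions over an explicit program state and forms each βi as a conjunction; the state dictionary and names are illustrative assumptions, not part of the paper.

# Predicates of program M (Figure 2), written over an explicit state.
# State keys: indexa, indexb, n, m, k, A, B (1-based indexes as in PASCAL).
x1 = lambda s: s["indexa"] <= s["n"]
x2 = lambda s: s["indexb"] <= s["m"]
x3 = lambda s: s["A"][s["indexa"] - 1] < s["B"][s["indexb"] - 1]
x4 = lambda s: s["indexa"] > s["n"]
x5 = lambda s: 0 <= s["k"] <= s["m"] - s["indexb"]
x6 = lambda s: 0 <= s["k"] <= s["n"] - s["indexa"]

# Each decisional path function beta_i is a conjunction of predicates
# (beta_0 = x_0 = T is trivially true and omitted here).
beta = {
    1: lambda s: x1(s) and x2(s) and x3(s),
    2: lambda s: x1(s) and x2(s),
    3: lambda s: x4(s) and x5(s),
    4: lambda s: x4(s) and x6(s),
}

state = {"indexa": 1, "indexb": 1, "n": 3, "m": 4, "k": 0,
         "A": [1, 3, 5], "B": [2, 4, 6, 8]}
print({i: f(state) for i, f in beta.items()})
# {1: True, 2: True, 3: False, 4: False}: both lists still have elements
# and A[1] < B[1], so the paths guarded by beta_1 and beta_2 are open.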
The magnitudes of the syntactical domain are as follows:

The decisional level (L) defines the logical complexity of the program, or nesting level, and is represented by the binary tree of the Boolean function β in disjunctive normal form. The nonterminal nodes
in the tree are the logical expressions or predicates of β, and the leaves or terminal nodes are the logical values, false or true, of the elemental predicates. The depth of the tree is the length of the maximum logical path of logical expressions, which is the logical level of the chunk in the syntactical structure of the program. Figure 9a shows the binary tree.

Syntactical time (T) is defined by the abstract syntax of a chunk, which identifies the meaningful components of each syntactical construct or expression. An abstract syntax tree represents the components and subcomponents of the construct. Nonterminal nodes in the tree are operators, and the leaves or terminal nodes are the operands (Sethi, 1989). The operators represent logical time consumption in the terminal chunks, viewed as units of time in the syntactical expressions; that is, arithmetic operators, control operators, logical operators, input and output operators, call operators, and functions. Figure 9b shows the abstract syntax tree for the while and if instructions in chunk M2, which are considered decisional point instructions as part of terminal chunks M22, M211, and M212.

Code length (S) is the quantity of code contained in the terminal chunks, based on properties of the program or specification text, without interpreting the text or the ordering of the components of the text in chunks. It is conformed by the different operands and operators of the terminal chunk, that is, the syntax expressions βi of the decision points and the basic chunk instructions, as in syntactical time. These are represented by every node in the abstract syntax tree, as shown in Figure 9b. This magnitude also represents the abstraction level of the programming language, and so it is also termed linguistic complexity (Beizer, 1990).

3.4 Functional Complexity Domain

In this domain, we abstract the commonly known idea of a computer program: the "... finite set of functions, called instructions (or module), each with a finite domain contained in a common set, called the data space, and a finite range contained in the Cartesian product of the data space and the program, called state space" (Mills, 1972).

The syntactical domain concerns the logical structure and text quantity of chunks, but not the relationships
among the chunks. The data variables, for example, are simple logical formalisms of a syntactical structure. In the functional domain, the connections among chunks are based on the interchange and transformation of data and control variables. In this context, data variables acquire a meaningful or semantic interest from the point of view of the module function, and they take certain values from a defined set or data space. This is why it is considered at a higher level of software design.

Figure 6. BNF notation for program M.

The functional program process is constituted of transformation paths. Each execution path has different transformation functions, which transform the input message into an output message (Lew et al., 1988). From this abstract point of view of a chunk or module, interaction and complexity can be characterized in terms of the information content of a message passed to the chunk, called chunk coupling, and the information content of the internal relationships among the instructions that transform the input message into an output message, called chunk cohesion.

The magnitudes of the functional domain are as follows:

Functional coupling (L), the logical functional level of the chunk, measures the complexity produced by the different input execution paths coming into a chunk in the control flow graph GC. It makes up a subcontrol flow graph in which the terminal chunk is in the last position. There is one subcontrol flow graph per terminal chunk (Figure 10).

Figure 7. Syntax tree GT of program M.

Data coupling (S) represents the quantity of information handled by a chunk due to the number of different fields coming in from other chunks and going out to other ones, global variables, and data …
components of a complexity vector in a particular vectorial space.

By the Euclidean triangular relation among the vector components in Figure 14, we obtain

di = √(si² + ti²)   (4)

By substituting equation (4) in equation (3), we obtain the following effort equation for the chunk i in the specific vector space E:

ei = li · √(si² + ti²)   (5)

Then, the total effort needed to produce the program M is

‖M‖e = Σ_{i=1}^{k} li · √(si² + ti²)   (6)

5.2 Syntactical Vector Space Es

Vector complexity space Es is the set of vectors esi = (ssi, tsi, lsi) ∈ Ms ⊂ Es that represents the syntactical factors of the chunks of program M: decisional level, code length, and syntactical time.

Ss is the displacement of the chunk in terms of linguistic information content, as a measurement of the quantity of code in programs. In Appendix 1, we obtain the entropy expression Hs in formula A1, which estimates the value of code length for a chunk. This value depends on the number of different operators (ns2) and operands (ns1) and their occurrences, viewed as entropy classes. Table 1 shows the occurrence of symbols per chunk of program M, and the values of Hs(ns1 + ns2) are shown in Table 2. The chunk position is the displacement in relation to the predecessor chunk in the ordering of the control flow graph Gc:

Ssi = Hs + Ss(i−1)   (7)
Table 1. Occurrence of Symbols per Chunk of Program M

Operand classes counted (ns1): the identifiers and constants of Figure 2 (n, m, Indexa, Indexb, Indexc, A, B, C, k, 1, 0).
Operator classes counted (ns2): While, For, If, >, <, <=, and, :=, +, -.

Totals for the six terminal chunks, in control flow order:

  ns1 (different operands)     4   9   9   2   9   7
  ns2 (different operators)    1   7   7   2   7   7
  Ns1 (operand occurrences)    6  15  15   3  16  16
  Ns2 (operator occurrences)   3   9   9   2  10  10
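A small sketch of this counting, under the assumption that a chunk is already given as separate operand and operator token lists (the tokenization itself is not specified by the paper):

def symbol_counts(operands, operators):
    """Return (n_s1, n_s2, N_s1, N_s2): distinct and total operand/operator
    occurrences for one chunk, as tallied in Table 1."""
    n_s1, n_s2 = len(set(operands)), len(set(operators))
    N_s1, N_s2 = len(operands), len(operators)
    return n_s1, n_s2, N_s1, N_s2

# First chunk of Figure 2: Indexa := 1; Indexb := 1; Indexc := 1;
operands = ["Indexa", "1", "Indexb", "1", "Indexc", "1"]
operators = [":=", ":=", ":="]
print(symbol_counts(operands, operators))  # (4, 1, 6, 3), matching Table 1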
The value of Ts is obtained by applying the expression Hs(ns2) in Appendix 1. Table 1 shows the occurrence of operator symbols, and Table 2 shows the values of the information content of the syntactical time of the chunks in program M.

Ls is the entropy or information content of the syntax decision level, and it measures the uncertainty of a particular logical value in a predicate, as a Boolean function β, given a set of possible values. As pointed out in Section 3.3, a Boolean function may be represented by a binary decision tree (Figure 9). The nodes in the levels of decision of the tree represent entropy classes used to obtain the value of entropy by applying the definition of a probability function (Appendix 3). Table 3 shows the values of Ls derived by use of formula A8 (Appendix 3), which are the entropy values or decisional information contents of the chunks.

Thus, a chunk in Es is defined by esi, and the expression

‖M‖S = Σ_{i=1}^{k} ‖esi‖   (8)

is the sum of vectorial magnitudes that represents the syntax information entropy volume of the program M.

‖M‖SE = Σ_{i=1}^{k} Lsi · √(Ssi² + Tsi²)   (9)

is the total syntactical entropy effort, derived by use of equation 6. Table 2 shows these entropy values and magnitudes for the terminal chunks of program M in the vectorial space Es.

5.3 Functional Vector Space Ef

Vector complexity space Ef is the set of vectors efi = (sfi, tfi, lfi) ∈ Mf ⊂ Ef that represents the functional factors of the chunks: functional cohesion, data coupling, and functional coupling.

Sf is the functional chunk position. Because the chromatic information content HK measures the degree of a node in a graph, it can be used to measure the information content due to the number of fields in a data flow graph, or data coupling, in chunks (Lew et al., 1988). In Appendix 5, we explain the concepts of chromatic information content measurement. We have proved that the maximum chromatic entropy for a node is 2, based on the fact that it is possible to represent the elements (coming in or going out) in nodes by use of four colors at the most as entropy classes.

The functional chunk position is given by its data length and the displacement in relation to the position of the predecessor chunk in the ordering given by the control flow graph Gc:

Sfi = HK + Sf(i−1)   (10)

where HK(nfs) is the value of the chromatic information content of the coupling of chunk i with nfs data fields coming in and going out.

In the example shown in Figure 2, chunk M1 has three data fields going out, so we use three different colors as entropy classes, then apply formula 5.1 (Appendix 5): HK(3) = −3(1/3 log 1/3) = 1.585. Because chunk M211 has eight data fields, HK(8) = −2(4/8 log 4/8) = 1.000, with two colors used as minimum entropy classes. Then Sf = 1.585 + 1.000 = 2.585 is the functional position of the chunk M211. Table 3 shows the values of Sf derived by applying formula 10.

Lf measures the level of uncertainty from selecting between a set of possible paths in the control flow graph Gc. The notion of structural information content explains that the larger the number of nodes in a class or orbit, the greater the uncertainty that a particular realization is selected (Lew et al., 1988). In Appendix 2, we explain the theoretical bases of the structural entropy. Table 3 contains the values of Lf derived by use of formula A2.

Tf is the degree of interaction among instructions, or temporal dependence in a chunk, as a measurement of cohesiveness. I have represented this relation by use of a structural flow graph Gs in Figure 12, which shows the relationships between the instructions within a chunk. As in Lf, structural information content measures the entropy or uncertainty of this order by means of orbits as entropy classes of the instructions' dependence, as explained in Section 3.4. Table 3 shows the values of Tf obtained by applying formula 2.1 (Appendix 2).

‖M‖F = Σ_{i=1}^{k} ‖efi‖   (11)

is the sum of the vectorial magnitudes and represents the functional entropy volume of the program M.

‖M‖FE = Σ_{i=1}^{k} Lfi · √(Sfi² + Tfi²)   (12)

is the total functional entropy effort, derived by use of effort equation 6.

Table 3 shows these entropy values for the program example shown in Figure 2 in the vectorial space Ef.

5.4 Computational Vector Space Ec

Vector complexity space Ec is composed of the vectors eci = (sci, tci, lci), which represent the computational factors of the chunks: data length, algorithmic time, and the data structure hierarchical level of the chunks.

Sc represents, in a data structure graph, the degree of a node, or the number of different fields in the data structure, located by the displacement in the order of the control flow graph. As with data coupling, we use the chromatic information content HK(ncs) as a measure of entropy or uncertainty, where ncs is the number of data fields used as input and output in the data node of the data structure graph Gd.

Sci = HK + Sc(i−1)   (13)

is the positional equation for each chunk in M.

Tc is the time expended in making the decision to execute the terminal chunk instructions in a particular execution path. Time is measured by computational information content, which is the quantity of information needed to represent the decisional time consumption. In Appendix 4, we show that algorithmic complexity may be represented by a decisional tree and that, as with the syntactical decisional information content, its entropy or information content is an asymptotic function bounded by 2. Using this conclusion, we obtain the values of the information content of a typical function of algorithmic complexity, shown in Table A1.

Lc uses structural information content as a measurement of the number of hierarchical levels of data dependence. Its value increases as the number of dependent nodes increases in the data structure Gd and, in the computational domain, it represents the level of uncertainty in making the decision to locate a data entity in a data space (Appendix 2). For example, in an array data structure there are two levels of decision; then, Hc = −2(1/2 log 1/2) = 1.000 is the information content. In a graph data structure, it is the number of data nodes in the different data paths, and its orbits are the entropy classes.

The expression

‖M‖C = Σ_{i=1}^{k} ‖eci‖   (14)

represents the computational information content volume. Using equation 6, we obtain

‖M‖CE = Σ_{i=1}^{k} Lci · √(Sci² + Tci²)   (15)

which is the total computational entropy effort value. Table 4 shows the computational entropy values of the chunks in the program M.
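As a sketch of how the effort equation (5)/(6) and the volume sums (8), (11), and (14) combine within one vector space, assuming each chunk is given as a triple (S, T, L) of entropy values (the numbers below are placeholders, not the paper's Table 2-4 values):

import math

def chunk_effort(s, t, l):
    # Effort equation (5)/(6): level times the length-time magnitude.
    return l * math.sqrt(s * s + t * t)

def chunk_magnitude(s, t, l):
    # Vector magnitude used in the volume sums (8), (11), (14).
    return math.sqrt(s * s + t * t + l * l)

# One vector space: a list of (S, T, L) triples, one per terminal chunk.
chunks = [(2.585, 1.449, 1.0), (1.922, 1.585, 2.0), (1.0, 1.0, 1.585)]

volume = sum(chunk_magnitude(*c) for c in chunks)   # ||M||  (entropy volume)
effort = sum(chunk_effort(*c) for c in chunks)      # ||M||e (entropy effort)
print(round(volume, 3), round(effort, 3))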
6. A UNIFIED METRIC AS MEASUREMENT OF WHOLE COMPLEXITY

We have defined Es as the syntactical vector space, Ef as the functional vector space, and Ec as the computational vector space, each in H³, where H³ is a three-linear vector entropy space. But to obtain a metric that takes into account all the software complexity factors together, we need to build a unified space of software complexities. This space is a multilinear combination of the three complexity spaces, and over this new space we define a new complexity metric. Then, Es × Ef × Ec is called a three-linear entropy space. Let Φ be a map defined by

Φ: Es × Ef × Ec → H
Φ(es, ef, c·ec) = c·Φ(es, ef, ec)

Φ is a three-linear form over H, or a three-covariant tensor over H. The set of three-covariant tensors over H we term T3(H), which also is a vector space over H. If we define a 3 × 3 matrix Mi composed of the three complexity vectors of the chunk Mi in Es, Ef, and Ec as the rows of the matrix, then Φ(esi, efi, eci) = det Mi, and the metric over E, or distance from ei to ei+1, is defined by

‖Φ(esi, efi, eci) − Φ(esi+1, efi+1, eci+1)‖
  = ‖Φ(esi − esi+1, efi − efi+1, eci − eci+1)‖
  = |det(Mi+1 − Mi)|   (19)

which is the total amount of information content needed to produce chunk Mi+1 after chunk Mi, in the ordering of the control flow graph Gc. Thus … is the total entropy effort needed to produce the program M. These metrics are termed unified software complexity metrics. Table 5 shows the complexity values derived by applying the unified metric to program M.
… stochastic Markov chain. The likelihood function of X1, X2, ..., Xn is defined to be the joint density function …

… person-month. Then, we can estimate the total cost and time of software development.

… which is called the robust software quality estimator: the degree of approximation to the optimal software solution.

… the number of dollars that must be expended to accomplish the desired function or performance. It should be noted that quality is part of function (Pruett and Hotard, 1984). Then the expression … represents the software value, where Ct is the total cost of the software; it is a measure of the software performance in relation to the development cost.

APPENDIX 1: LINGUISTIC INFORMATION CONTENT

As we explained in Section 3.3, we can represent the abstract syntax of the instructions in a program or chunk using an abstract syntax tree, which shows the components and subcomponents of the syntactical expressions. Nonterminal nodes in the tree are operators, and the leaves or terminal nodes are the operands, called symbols or tokens of the expressions. Taking each syntax tree of the different expressions, we obtain n symbols in a chunk or program, and we classify the symbols by the occurrence of the different symbol types. If we assume there are t symbol types in our sample of n tokens, then

n = Σ_{i=1}^{t} ni

and the relative frequency is pi = ni/n. … + 1/5 log 1/5) = 1.922 is its linguistic information content value.
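A short sketch of this linguistic entropy, assuming formula A1 is the Shannon entropy of the token-type frequencies pi = ni/n with base-2 logarithms; the printed example value 1.922 is consistent with a five-token chunk whose type counts are 1, 1, 1, and 2:

import math

def linguistic_entropy(type_counts):
    """H = -sum(p_i * log2(p_i)) over symbol-type frequencies p_i = n_i / n."""
    n = sum(type_counts)
    return -sum((c / n) * math.log2(c / n) for c in type_counts)

print(round(linguistic_entropy([1, 1, 1, 2]), 3))  # 1.922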
APPENDIX 2: STRUCTURAL INFORMATION CONTENT

Definition A2.1. … M = {M1, M2, ..., Mm} is the set of nodes or chunks in Gc. Then, it is possible to define a permutation as a one-to-one function π from a subset of M onto itself. We call SM the set of all possible permutations over M (Fisher, 1977).

In general, the image of each element under a permutation π can be realized by the description

  | M1     M2     ...  Mm    |
  | π(M1)  π(M2)  ...  π(Mm) |   (A2)

Definition A2.2. If we have a function π that transforms Gc onto itself, preserving the adjacencies, then we say π is isomorphic. As SG ⊂ SM is the set of all possible permutations π of Gc onto Gc, SG conforms a permutation group named G. The number of elements in the permutation group we call the order of the group, or |G|.

…

Definition A2.6. The structural information content of Gc is defined by the entropy relation

Hc(Gc) = −Σ_{i=1}^{k} pi log pi   (A4)

where k is the number of equivalent classes or orbits, pi = ki/m, ki is the number of nodes in the class Ci, and m is the number of chunks in M (the set of nodes in Gc).

The maximum entropy is obtained when k = n, because there is only one element per orbit. In other words, there is no possibility to interchange elements in Gc, and so the n elements are fixed. Then,

Hc = −Σ (1/n) log(1/n) = log k = log( Σ_{g∈G} |F(g)| ) − log |G| = log n   (A5)

The minimum entropy value is obtained when we have one orbit or class for the n elements. In other words, it is possible to do a total interchange of the n elements in Gc. Then

Hc = −(k/k) log(k/k) = log 1 = log( Σ_{g∈G} |F(g)| ) − log |G| = 0   (A6)

In Figure 10, the subgraph Gc311 of chunk M311 has five nodes. The isomorphic permutation group G is composed of {g0, g1} as permutation functions. There are four orbits; the equivalent classes are {1}, {2}, {1}, {1}, because nodes …

APPENDIX 3: DECISIONAL INFORMATION CONTENT

…

Definition A3.2. We define the entropy classes as the numbers of decisions in the different levels of the decisional tree (2^i, i = 0, 1, 2, ..., h). Then, the decisional information content is defined by the entropy function

Hd(Gd) = −Σ_{i=0}^{h} pi log pi   (A7)

where pi = 2^i/n and n = 2^(h+1) − 1 is the total number of decisions. This equation is transformed into

Hd = log n − (1/n)[(n + 1)log(n + 1) − 2n] = 2 + log n − (1 + 1/n)log(n + 1)

and from the last two terms of the equation we obtain the inequality

(1 + 1/n)log(n + 1) − log n ≥ log(n + 1)/(n + 1)

and

Hd ≤ 2 − log(n + 1)/(n + 1) = 2 − (h + 1)/2^(h+1)   (A9)

using formula A7. Applying L'Hopital's rule in equation A9, we get

lim_{h→∞} Hd = 2

This result demonstrates that the entropy of the decisional logical process in a program is an asymptotic function upper bounded by 2.

Figure 9a represents the decisional binary tree of the Boolean function β1. It has four entropy classes …
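A sketch of both appendix measures, assuming base-2 logarithms throughout: structural entropy from orbit sizes (A4), and the decisional entropy of a full binary decision tree of depth h, whose closed form and bound (A9) it checks numerically:

import math

def orbit_entropy(orbit_sizes):
    """A4: H = -sum(p_i log2 p_i), with p_i = k_i / m over the orbit sizes."""
    m = sum(orbit_sizes)
    return -sum((k / m) * math.log2(k / m) for k in orbit_sizes)

def decisional_entropy(h):
    """Entropy of the level classes 2^0..2^h of a full binary decision tree."""
    n = 2 ** (h + 1) - 1
    return -sum((2**i / n) * math.log2(2**i / n) for i in range(h + 1))

print(round(orbit_entropy([1, 2, 1, 1]), 3))  # orbits {1},{2},{1},{1} as in Figure 10
for h in (1, 3, 7, 15):
    bound = 2 - (h + 1) / 2 ** (h + 1)
    print(h, round(decisional_entropy(h), 4), "<=", round(bound, 4))
# H_d approaches the upper bound 2 as the depth h grows, as equation A9 states.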
APPENDIX 5: CHROMATIC INFORMATION CONTENT

…

Definition A5.2. The chromatic number of the graph G is the smallest natural number k ≤ n for which G has a k-coloring, or K(G). Then the chromatic information content HK(G) of G is defined by

HK(G) = −Σ_{i=1}^{k} pi log pi   (A12)

where k is the number of equivalent classes, or the k-coloring in G used as entropy classes, pi = ki/m, ki is the number of arcs in the color class i, and m is the number of arcs in A.

Furthermore, Guthrie's conjecture (Toranzos, 1976) states that if G is a planar graph, then K(G) ≤ 4. In other words, with only four colors it is possible to color the arcs of any graph G. Then, from the chromatic information content equation A12, and making ki = m/4, HK(G) ≤ 2.

For example, if we have a node in M with three arcs coming in or going out of A, then we can use three color classes as the minimum number, and then HK(3) = −3(1/3)log(1/3) = 1.585 is its chromatic information content. A node with seven arcs can use three color classes {3}, {3}, and {1}; then HK(7) = −(2(3/7)log(3/7) + (1/7)log(1/7)) = 1.449.
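A sketch of A12 with base-2 logarithms, reproducing both worked values; the color classes are given directly as arc counts, since the paper does not specify a coloring algorithm:

import math

def chromatic_entropy(color_class_sizes):
    """A12: H_K = -sum(p_i log2 p_i), p_i = arcs in color class i / total arcs."""
    m = sum(color_class_sizes)
    return -sum((k / m) * math.log2(k / m) for k in color_class_sizes)

print(round(chromatic_entropy([1, 1, 1]), 3))  # 1.585: three arcs, three colors
print(round(chromatic_entropy([3, 3, 1]), 3))  # 1.449: seven arcs, classes {3},{3},{1}
print(chromatic_entropy([1, 1, 1, 1]))         # 2.0: the four-color upper bound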
ACKNOWLEDGMENTS

I wish to thank Professor Wilfrido Fiallo for his insightful suggestions and his important assistance, which have helped to improve this work.
REFERENCES

Baase, S., Computer Algorithms: Introduction to Design and Analysis, Addison-Wesley, 1988.
Basili, V. R., and Rombach, H. D., Implementing Quantitative SQA: A Practical Model, IEEE Software, September 1987.
Basili, V. R., Selby, R. W., and Hutchens, D. H., Experimentation in Software Engineering, IEEE Trans. Software Eng. SE-12, 733-743 (1986).
Beizer, B., Software Testing Techniques, Van Nostrand Reinhold, New York, 1990.
Belady, B. L. A., Software Geometry, in Proceedings of the 1980 International Computer Symposium, 1980.
Boehm, B. W., Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, 1981.
Coté, V., Bourque, P., Oligny, S., and Rivard, N., Software Metrics: An Overview of Recent Results, J. Syst. Software 8, 121-131 (1988).
Coulter, N. S., Software Science and Cognitive Psychology, IEEE Trans. Software Eng. SE-9, 166-171 (1983).
Curtis, B., Foundations for a Measurement Discipline, Quality Time, IEEE Software (1987).
Davis, J. S., and LeBlanc, R. J., A Study of the Applicability of Complexity Measures, IEEE Trans. Software Eng. SE-14, 1366-1372 (1988).
Dijkstra, E. W., Dahl, O. J., and Hoare, C. A. R., Structured Programming, Academic Press, New York, 1970.
Estes, W. K., Models of Learning, Memory and Choice: Selected Papers, Praeger Publishers, 1982, pp. 23-24.
Fisher, J. L., Application-Oriented Algebra: An Introduction to Discrete Mathematics, Harper and Row, 1977.
Halstead, M. H., Toward a Theoretical Basis for Estimating Programming Effort, in Proceedings of the ACM Conference, 1975.
Halstead, M. H., Elements of Software Science, Elsevier North-Holland, New York, 1977.
Harrison, W., An Entropy-Based Measure of Software Complexity, IEEE Trans. Software Eng. 18 (1992).
Harrison, W., and Magel, K., A Complexity Measure Based on Nesting Level, SIGPLAN Not., 63-74 (1981).
Henry, S., and Selig, C., Predicting Source Code Complexity at the Design Stage, IEEE Software 7 (1990).
Hopcroft, J. E., and Ullman, J. D., Formal Languages and Their Relation to Automata, Addison-Wesley, 1969.
Jensen, H. A., and Vairavan, K., An Experimental Study of Software Metrics for Real-Time Software, IEEE Trans. Software Eng. SE-11, 231-234 (1985).
Khoshgoftaar, T. M., Bhattacharya, B. B., and Richardson, G. D., Predicting Software Errors, During Development, Using Nonlinear Regression Models: A Comparative Study, IEEE Trans. Reliability 41 (1992).
Lew, K. S., Dillon, T. S., and Forward, K. E., Software Complexity and Its Impact on Software Reliability, IEEE Trans. Software Eng. SE-14, 1645-1655 (1988).
Lipschutz, S., General Topology, McGraw-Hill, 1970, pp. 111-119.
McCabe, T. J., A Complexity Measure, IEEE Trans. Software Eng. SE-2, 308-320 (1976).
Miller, G. A., The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychol. Rev. 63 (1956).
Mills, H. D., Mathematical Foundations for Structured Programming, IBM Report FSC 72-6012, 1972.
Mood, A. M., Graybill, F. A., and Boes, D. C., Introduction to the Theory of Statistics, McGraw-Hill, 1974.
Mostow, G. D., and Sampson, J. H., Linear Algebra, McGraw-Hill, 1969, pp. 259-260.
Munson, J. C., and Khoshgoftaar, T. M., Measuring Dynamic Program Complexity, IEEE Software (1992).
Musa, J. D., Tools for Measuring Software Reliability, IEEE Spectrum, 39-42 (1989).
Pruett, J. M., and Hotard, D. G., The Value of Value Analysis, Indust. Manag. (1984).
Ramamurthy, B., and Melton, A., A Synthesis of Software Science Measures and the Cyclomatic Number, IEEE Trans. Software Eng. SE-14 (1988).
Scudder, R. A., and Kucic, A. R., Productivity Measures for Information Systems, Inform. Manage. 20 (1991).
Sethi, R., Programming Languages: Concepts and Constructs, Addison-Wesley, 1989, pp. 383-388.
Shannon, C., A Mathematical Theory of Communication, Bell Syst. Tech. J. 27, 379-423, 623-656 (1948).
Shneiderman, B., Software Psychology, Winthrop, Cambridge, MA, 1980.
Shooman, M. L., Software Engineering: Design, Reliability and Management, McGraw-Hill, New York, 1983, pp. 150-151.
Siegrist, K., Reliability of Systems with Markov Transfer of Control, II, IEEE Trans. Software Eng. SE-14 (1988).
Sneed, H. M., and Merey, A., Automated Software Quality Assurance, IEEE Trans. Software Eng. SE-11, 909-916 (1985).
Szulewski, P. A., Sodano, N. M., Rosner, A. J., and DeWolf, J. B., Automating Software Design Metrics, RADC-TR-84-27, Rome Air Development Center, Rome, NY, 1984.
Taguchi, G., and Clausing, D., Robust Quality, Harvard Bus. Rev. (1990).
Toranzos, F. A., Introducción a la Teoría de Grafos, General Secretariat of the Organization of American States (OAS), Washington, D.C., 1976, p. 53.
Weyuker, E. J., Evaluating Software Complexity Measures, IEEE Trans. Software Eng. SE-14 (1988).
Wirth, N., Program Development by Stepwise Refinement, Commun. ACM (1971).
Wirth, N., Algorithms + Data Structures = Programs, Prentice-Hall, 1976.
Woodfield, S. N., Enhanced Effort Estimation by Extending the Basic Programming Model to Include Modularity Factors, Ph.D. Thesis, Department of Computer Science, Purdue University, 1980.
Zage, W. M., Zage, D. M., Bhargava, M., and Gaumer, D., Design and Code Metrics Through a Diana-Based Tool, SERC-TR-109-P, Software Engineering Research Center, Purdue University, 1992.