
A Unified Metric of Software Complexity:

Measuring Productivity, Quality, and Value

Renato R. Gonzalez
Faculty of Engineering, Santo Domingo Institute of Technology, Santo Domingo, Dominican Republic

There are different sources of software complexity. A large set of complexity metrics can measure distinct program attributes to show different program size indicators. Nevertheless, the size of a program must be obtained from the overall program complexity based on the values of all program factors. Based on the concept that software complexity is a measurement of the resources expended through the software life cycle, and the fact that a program may be approached from three distinct perspectives, the complexity factors are classified into three complexity domains: syntactical, functional, and computational. Each domain represents the magnitudes of the factors in one of the three dimensions: length, time, and level or depth. Using these ideas, in this article we define ordinal measures of the complexity factors based on discrete mathematical structures of programs and the information content or entropy, transform the different domains of software complexity in linear metric spaces in order to represent a program by a set of vectors whose magnitudes and distances represent metrics of the program components, and define a "unified complexity metric" of the program size and the effort needed to produce it over the multilinear complexity space conformed by the three complexity spaces. These metrics may be used to define a statistical method that estimates the size of a program and the effort needed to produce it from the external system design, the productivity in software projects, and the quality and value of software products.

1. INTRODUCTION

To pass from art to science, a discipline must be abridged by quantitative methods. It is necessary to know about the "essence" of the objects under study and quantify such essence. Physics became a science with the notions of matter and measurement of length, mass, and time; thermodynamics became a science with the measurements of temperature and entropy.

It is impossible to carry out this objective until the right questions are asked. Scientists must know what to measure before creating a metric of an object. They must interpret the complex phenomena by reducing them to elementary events or processes whose behavior is described by simple laws (Estes, 1982). To define a metric, we must know the context of the object, its intrinsic nature and components, the characteristics to be measured, and the space in which the metric function will operate.

Today's software applications are more complex and software failure more critical, potentially resulting in economic damage or even threatening the health or lives of human beings (Basili, 1987). Despite the evolution and advance of methods and tools of design, programming, testing, debugging, and project management, software engineering is still more craft than science.

The software industry faces serious problems in software project management, because it is difficult to control what cannot be measured. The principal problems come from the most important characteristic of software: software is not a physical object but an intellectual one; its components are hidden, its interfaces and connections are obscure.

Then, the most fundamental question is "how big is a program?" But first we must define what "big" means, at least by comparing one program to another to find out the difference of effort invested. "Size" is not obvious for software. Metrics must be objective in the sense that the measurement process will yield the same result no matter who applies it (Beizer, 1990).
Address correspondence to Prof. Renato R. Gonzalez, Faculty of Engineering, Santo Domingo Institute of Technology, EPS #P6103, P.O. Box 02-5261, Miami, FL 33102-5261.

J. SYSTEMS SOFTWARE 1995; 29:17-37
© 1995 by Elsevier Science Inc. 0164-1212/95/$9.50
655 Avenue of the Americas, New York, NY 10010 SSDI 0164-1212(94)00126-8

We can identify seven sources of software complexity. Their correlation and interdependence determine the level of complexity in a software product (Figure 1), although most metrics measure only one software complexity factor. Using statistical factor analysis, researchers have found a huge amount of common variance among internal complexity metrics, such as Halstead software science metric, statement counts, the McCabe cyclomatic number, logical depth, etc. (Munson and Khoshgoftaar, 1992). Sixteen metrics selected in this study have been reduced to five complexity domains: control structure, size of code, information content, modularity, and data structure.

This fact would suggest that the software components' size is the primary aspect assessed through the technical and operational dimensions of many metrics. As Curtis (1987) pointed out, "the pervasiveness of the effect of size on software metrics has made it difficult for most studies to demonstrate that metrics perform better than lines of code in predicting many criteria."

Other types of metrics include those called "design metrics" (Henry and Selig, 1990; Szulewsky et al., 1984), which measure external software complexity; that is, the level of module coupling using the control and data connection between system modules. These metrics were developed from primitive design metrics that are predictive, objective, and automatable (Zage et al., 1992). But, when describing the complexity of a software system, we must take into account both internal complexity of each module and external complexity due to module interrelationships (Lew et al., 1988).

2. SOME CONSIDERATIONS FOR BUILDING A METRIC

From a mathematical perspective a metric is a function that takes two arguments from certain domains or spaces and whose returned value represents a difference between the two arguments. However, software complexity measurements, often called "metrics," are functions that take one argument whose returned value represents a measurement of certain aspects of the single argument (Ramamurthy and Melton, 1988). These measurements are ordinal quantities of software complexity. By ordinal, we mean that the metric is intended to order programs in relation to their complexity, but no conclusions can be drawn as to the "distance" between two measures or values (Harrison, 1992).

In this sense, most metrics are incomplete because they measure only one factor of program complexity, in spite of a program having different sources of complexity. The best metric measures the greatest number of these complexity factors, so as to tell more about the program and to explain its complexity better. In pursuit of this objective, many metrics are simple linear compounds of other metrics. Nevertheless, the sum of two or more measures does not necessarily offer additional information about program complexity (Munson and Khoshgoftaar, 1992).
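The distinction drawn in this section between a one-argument software "metric" and a two-argument mathematical metric can be illustrated with a minimal Python sketch. The function names (`statement_count`, `euclidean`, `is_metric_on`) and the sample points are hypothetical and chosen only for illustration; they are not part of the article's method.

```python
import math
from itertools import product


def statement_count(program_text):
    """One-argument measure: number of non-blank lines in a program text."""
    return sum(1 for line in program_text.splitlines() if line.strip())


def euclidean(a, b):
    """Two-argument metric: Euclidean distance between equal-length tuples."""
    return math.dist(a, b)


def is_metric_on(points, d, tol=1e-9):
    """Check the metric axioms for d on a finite sample of points."""
    for p, q, r in product(points, repeat=3):
        if d(p, q) < 0:                         # non-negativity
            return False
        if (d(p, q) <= tol) != (p == q):        # zero distance only for equal points
            return False
        if abs(d(p, q) - d(q, p)) > tol:        # symmetry
            return False
        if d(p, r) > d(p, q) + d(q, r) + tol:   # triangle inequality
            return False
    return True


# Hypothetical complexity vectors, used only to exercise the checks.
sample = [(0.0, 0.0, 0.0), (1.0, 2.0, 2.0), (3.0, 0.0, 4.0)]
print(statement_count("a := 1;\n\nb := 2;\n"))
print(is_metric_on(sample, euclidean))
```

A measure such as `statement_count` can only order programs, whereas a function such as `euclidean` returns a distance between two objects, which is the property the unified metric is later required to have.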
Statistical measurements, such as arithmetic mean,
variance, and covariance, are indicators of certain
relationships among factors, but they are not metrics
in a strict mathematical sense. They tell us some-
thing about certain common characteristics, factor
correlation, and interdependence with some proba-
bility measurement error. However, certain statisti-
cal methods are useful for estimating internal soft-
ware complexity from the software design, a critical
aspect of software project management.
To solve this problem, we attempted to build a unified software complexity metric that takes into account the different domains, dimensions, and factors of software size. This complexity metric must be useful in three ways: (1) to provide a measurement that allows comparison of the relative complexity of two different programs; (2) to take two values from an argument and return a measurement of distance or variation between the two arguments; and (3) to obtain an indirect measure to estimate and predict the productivity of the people in software projects and the reliability, performance, quality, and value of the software product.

Figure 1. Different sources of software complexity.

3. THE DOMAINS AND MAGNITUDE OF SOFTWARE COMPLEXITY

3.1 Programs and Chunks

Consider a computer program as a combinatorial activity whose components may be represented as finite discrete mathematical structures, especially directed graphs and trees. A directed graph $G = (V, A, \varphi)$ consists of a nonempty set $V$ of nodes, a set $A$ of arcs, and a function $\varphi$ from $A$ to the Cartesian set $V \times V$, so that $\varphi(a) = (u, v)$, where $a \in A$ and $(u, v) \in V \times V$. A tree is a connected graph with no cycles. A rooted tree is a directed graph with a distinct node $v_0$ such that all nodes are accessible from $v_0$ (Fisher, 1977). The basic logical structures are defined by structured programming (sequence, decision, and iteration) following its purpose of controlling complexity through theory and discipline (Mills, 1972). As Dijkstra et al. (1970) argued, "the art of programming is the art of organizing complexity by mean of structures." Figure 2 shows the PASCAL code of the algorithm to merge two ordered lists as an example of a program using structured programming.

In this sense, a program can be split into elemental objects or blocks called chunks, which constitute an ordered set $M = \{M_1, M_2, \ldots, M_k\}$. A chunk is a sequence of one or more adjacent program statements $M_i = \{z_i, z_{i+1}, \ldots, z_{i+n}\}$ with the property that there is no explicit transfer of control to any statement in the sequence other than $z_i$ (Davis and LeBlanc, 1988). That is, whenever the first statement of the block is executed, the other statements are executed in a given order (Weyuker, 1988).

If the program was made by use of structured programming rules, then $z_i$ is the decision point (ifthenelse, dowhile, or dountil) with transfer of control in implicit form, or a null statement when the
C:\compiler\pascal>type program.prg

Procedure Merge (Var A, B, C: Array of Integer; m, n: Integer)
Var Indexa, Indexb, Indexc, k: Integer;

Begin
  Indexa := 1;
  Indexb := 1;
  Indexc := 1;
  While (Indexa <= n and Indexb <= m) do
    Begin
      If A[Indexa] < B[Indexb] then
        Begin
          C[Indexc] := A[Indexa];
          Indexa := Indexa + 1
        End
      Else
        Begin
          C[Indexc] := B[Indexb];
          Indexb := Indexb + 1
        End;
      Indexc := Indexc + 1
    End;
  If Indexa > n then
    For k = 0 to (m - Indexb) do
      Begin
        Indexc := Indexc + k;
        Indexb := Indexb + k;
        C[Indexc] := B[Indexb]
      End
  Else
    For k = 0 to (n - Indexa) do
      Begin
        Indexc := Indexc + k;
        Indexa := Indexa + k;
        C[Indexc] := A[Indexa]
      End;
End.

Figure 2. Algorithm to merge sorted lists.
block is a sequence structure. Chunks may contain other chunks conforming logical nested structures produced by a stepwise refinement process. In a nested logical structure, a chunk containing a sequence of instructions where $z_i$ is the only decision point is called a terminal chunk. Figure 3 shows the control flow chart for the program example shown in Figure 2.

Another important aspect is that the available evidence strongly suggests that programmers do not understand programs on a character-by-character or symbol basis, as suggested by many metrics (Halstead, 1975). Rather, they assimilate groups of statements, which have a common function represented by chunks (Davis and LeBlanc, 1988). From an operational point of view, these groups of statements make up the execution units of the program; from a functional point of view, they are the transformation units of input messages in output messages, and, from a syntactical point of view, they are the basic objects of the program handled by the logical structures of control.

Figure 3. Flow chart of merging sorted lists algorithm M.

3.2 Software Complexity

We define software complexity as a measurement of the resources that must be expended in developing, testing, debugging, maintenance, user training, operation, and correction of software products (Shooman, 1983). The complexity is a property of the software life cycle, but by measuring the program properties as a final product of the development process, we might infer the level of complexity in a software system.

This complexity depends on (1) the nature of the problem itself, (2) the number of objects created in the design of the solution (functional modules and data structures), (3) the interfaces and relationships among objects, (4) the algorithmic and logical structure of the programs, (5) methods and tools used in the development of solutions, (6) programming languages used to implement solutions, (7) the level of human expertise involved in the development process, (8) the computer hardware and software environment, (9) project organization, and (10) project management.

When we think about computer problems, we see the software solution not only in one layer but in several parallel layers or domains, as is suggested in Section 3.1. These domains have a certain degree of relationship and interdependence, because the same components of the program play distinct roles in different domains. This idea may be a powerful tool in connecting the static text of a program with the dynamic process it invokes in execution (Mills, 1972).

In this context, a program may be conceived in three fundamental complexity domains: syntactical, functional, and computational. Each has three magnitudes itself: length, time, and level. That is similar to fundamental physical standard magnitudes of length, time, and mass. Thus, we are facing a multidimensional problem of software complexity (Figure 4).

Each domain has ordinal measurements in terms of information content that are based on the entropy of the different magnitudes in order to build a metric. Using these ideas, in Figure 5 I suggest a taxonomy of software complexity; the measurements of factors are explained in the following sections.

3.3 Syntactical Complexity Domain

Program syntax is the structure of an algorithm represented by program texts in a static way (Dijkstra et al., 1970). In fact, we define the syntax as the set of rules or formulas that defines the set of sentences. The syntactical structure is therefore always considered auxiliary to a higher purpose in program specification and coding.

Then, we can represent the syntactical structure of a program as a context-free grammar $g = (V_N, V_T, p, M)$, where $V_N$ is a set of chunks as variables or nonterminal symbols, $V_T$ is the set of elemental instructions or terminal chunks as terminal symbols, $p$ is the combination of logical control structure as production rules, and $M$ is the start symbol or program (Hopcroft and Ullman, 1969). From this perspective, it is also possible to represent a program via Backus-Naur form (BNF) notation, as shown in Figure 6.

Using this definition, we can represent the program syntax structure by use of a derivation tree, also called abstract syntax tree, $G_T = (V, A, \varphi_T)$, where $V = V_T \cup V_N$, and $A$ is the set of arcs such that

1. $\varphi_T(\text{root}) = M$
2. $\varphi_T(\text{node of outdegree } 0) \in V_T$
3. For any node $v \in V$, if the set of immediate successors of $v$ is nonempty and ordered $v_1, v_2, \ldots, v_k$, then $\varphi_T(v) \to \varphi_T(v_1)\,\varphi_T(v_2) \cdots \varphi_T(v_k)$ is a production of $g$ (Fisher, 1977).

$V_N$ is the set of nonterminal chunks or the internal nodes in the tree, and $V_T$ the leaves, terminal symbols, or terminal chunks. This idea is represented in Figure 7.

A control flow graph $G_c = (V_c, A_c, \varphi_c)$ represents the logical dependence sequence among chunks. Nodes in $V_c$ are terminal chunks, and arcs in $A_c$ are the control paths. Each path is weighted by the Boolean function of the decisional path $\beta(x_1, x_2, \ldots, x_n) = x_1 \wedge x_2 \wedge \cdots \wedge x_n$ constituted by the conjunction of different predicates $x_i$ or decision points, from the root node to the leaf or terminal chunk in the syntax tree $G_T$ (Figure 8). Program M has five Boolean functions:

$\beta_0 = x_0$
$\beta_1 = (x_1 \wedge x_2) \wedge (x_3)$
$\beta_2 = (x_1 \wedge x_2)$
$\beta_3 = (x_4 \wedge x_5)$
$\beta_4 = (x_4 \wedge x_6)$

where $x_0 = T$, $x_1 = (indexa < n)$, $x_2 = (indexb < m)$, $x_3 = (A[indexa] < B[indexb])$, $x_4 = (indexa > n)$, $x_5 = (0 \le k \le (m - indexb))$, $x_6 = (0 \le k \le (n - indexa))$, and $x_i \in \{T, F\}$. Figure 8 shows the control flow graph and the Boolean functions associated with each control path. Each $\beta_i$ in $G_c$ represents the logical decision point of the terminal chunk and is considered as part of the chunk instructions.

The magnitudes of the syntactical domain are as follows:

The decisional level (L) defines the logical complexity of the program or nesting level and is represented by the binary tree of the Boolean function $\beta$ in disjunctive normal form. The nonterminal nodes

Figure 4. Software complexity domains and dimensions.

Figure 5. Taxonomy of software complexity.

Complexity domain   Complexity factor or magnitude   Complexity measurement
Syntactical         Code length                      Linguistic information content
Syntactical         Time                             Linguistic information content
Syntactical         Decisional level                 Decisional information content
Functional          Data coupling                    Chromatic information content
Functional          Cohesiveness                     Structural information content
Functional          Functional coupling              Structural information content
Computational       Data length                      Chromatic information content
Computational       Algorithmic time                 Computational information content
Computational       Data hierarchical level          Structural information content

in the tree are the logical expressions or predicates of $\beta$, and the leaves or terminal nodes are the logical values, false or true, of the elemental predicates. The depth of the tree is the length of the maximum logical path of logical expressions, which is the logical level of the chunk in the syntactical structure of the program. Figure 9a shows the $\beta_i$ binary tree.

Syntactical time (T) is defined by the abstract syntax of a chunk that identifies the meaningful components of each syntactical construct or expression. An abstract syntax tree represents the components and subcomponents of the construct. Nonterminal nodes in the tree are operators, and the leaves or terminal nodes are the operands (Sethi, 1989). The operators represent logical time consumption in the terminal chunks viewed as units of time in the syntactical expressions, that is, arithmetic operators, control operators, logical operators, input and output operators, call operators, and functions. Figure 9b shows the abstract syntax tree for the while and if instructions in chunk $M_2$, which are considered decisional point instructions as part of terminal chunks $M_{22}$, $M_{211}$, and $M_{212}$.

Code length (S) is the quantity of code contained in the terminal chunks based on properties of program or specification text without interpreting the text or ordering of components of the text in chunks. It is conformed by different operands and operators of the terminal chunk, that is, the syntax expressions of decision points and the basic chunk instructions alike in syntactical time. These are represented by every node in the abstract syntax tree as shown in Figure 9b. This magnitude also represents the abstract level of programming language, and so is also termed linguistic complexity (Beizer, 1990).

3.4 Functional Complexity Domain

In this domain, we abstract the commonly known idea of computer program, the ". . . finite set of functions, called instructions (or module), each with a finite domain contained in a common set, called the data space, and a finite range contained in the Cartesian product of the data space and the program, called state space" (Mills, 1972).

The syntactical domain concerns logical structure and text quantity of chunks, but not relationships
among the chunks. The data variables, for example, are simple logical formalisms of a syntactical structure. In the functional domain, the connections among chunks are based on the interchange and transformation of data and control variables. In this context, data variables acquire a meaningful or semantic interest from the point of view of the module function, and these take certain values from a defined set or data space. This is why it is considered in a higher level of software design.

Figure 6. BNF notation for program M. (Terminal instructions legible in the figure: $z_{11}, z_{12}, z_{13}$; $z_{2111}, z_{2112}$; $z_{2121}, z_{2122}$; $z_{221}$; $z_{3111}, z_{3112}, z_{3113}$; $z_{3211}, z_{3212}, z_{3213}$.)

Figure 7. Syntax tree $G_T$ of program M.

Figure 8. Control flow graph $G_c$ of $G_T$.

The functional program process is constituted of transformation paths. Each execution path has different transformation functions, which transform the input message into an output message (Lew et al., 1988). From this abstract point of view of a chunk or module, interaction and complexity can be characterized in terms of information content of a message passed to the chunk, called chunk coupling, and the information content of the internal relationships among the instructions that transform the input message into an output message, called chunk cohesion.

The magnitudes of functional domains are as follows:

Functional coupling (L), the logical functional level of the chunk, measures the complexity produced by the different input execution paths coming in a chunk in control flow graph $G_c$. It makes up a subcontrol flow graph in which the terminal chunk is in the last position. There is one subcontrol flow graph per terminal chunk (Figure 10).

Data coupling (S) represents the quantity of information handled by a chunk due to the number of different fields coming in from other chunks and going out to other ones, global variables, and data
structure attributes. The set of modules or functions are the functional transformation processes represented as data flow graphs $G_f = (V_f, A_f, \varphi)$; the nodes in $V_f$ are the terminal chunks and the global data, the arcs in $A_f$ are the input and output messages, and $\varphi(a) = (v_i, v_j)$, $a \in A_f$ and $v_i, v_j \in V_f$ (Figure 11).

Cohesiveness (T) is the degree of interaction among instructions within a chunk. Two instructions have temporal dependence if the assigning variable or arguments of the first appears in the expression of the second. In other words, there is an order of predecessor-successor in time. Boolean function $\beta_i$ is viewed as part of chunk instructions; the relationship with the other ones is given by its arguments. We can represent these relations by use of a structural flow graph $G_z = (Z, A_z, \varphi)$, where $Z$ is the ordered set of instructions in the chunk, $A_z$ the arcs connecting two instructions, and $\varphi$ the assigning function of arcs to pairs of instructions (Figure 12).

Figure 9. (a) Decisional binary tree of a Boolean function in $G_c$; h = depth of the tree. (b) Abstract syntax tree for while and if.

Figure 10. Different subcontrol flow graphs for $G_c$.

Figure 11. Data flow graph $G_f$, where $d_i$ are data structure attributes and $D_i$ are control and data information.

3.5 Computational Complexity Domain

As Wirth (1976) pointed out, "programs, after all, are concrete formulations of abstract algorithms
based on particular representations and structures of data. . . . Decisions about structuring data cannot be made without knowledge of the algorithms applied to the data and that, vice versa, the structure and choice of algorithms often strongly depend from the structure of the underlying data."

The computational domain of programs explains the consumption of computer resources in terms of time and space: input and output access, size of data structures, type of traveling on the data structure, and the execution frequency of instructions needed to produce new information or data elements. Consequently, this domain is oriented to software performance or computational efficiency in the operational context.

Data structures can be represented by a graph $G_d = (V_d, A_d, \varphi)$, where $V_d$ is the set of nodes or data entities and $A_d$ the set of arcs representing the relationship between nodes or groups of data fields, such as hierarchical data structure or data relational model represented in canonical form (Figure 13). A complex data structure could result in greater dependence between modules because we need more information to describe the data structure. As an algorithm is represented by a control flow graph $G_c$, it travels through the data structures flow graph $G_d$ following the order of execution control.

Figure 12. Structural flow graphs for instructions of a chunk.

Figure 13. Data structure graphs $G_d$.

The three magnitudes of the computational domain are as follows:

The data hierarchical level (L) is the depth of the data structure represented by the number of different paths and the level of data node dependence in the data flow graph $G_d$ as a measure of the data structure complexity. We need to make more logical and computational decisions to locate a data element in a data space when the number of data relations grows. The hierarchical level of a linear array, for example, is two; the elements of the array are in the first level and the data type in the second level.

Data length (S) is the degree of data nodes in a data structure and is equal to the number of arcs coming in and going out of the data nodes in $G_d$. Therefore, for a data structure graph, the degree of a node represents the number of different fields or attributes used by the chunk traveling by different levels of the data hierarchy or different attributes used to fetch a particular data entity or data field group.

Algorithmic time (T) is the amount of work done, measured by some specific complexity measurement, which is the number of specified basic operations performed in the execution of a program (Baase, 1988). Each terminal chunk in a control flow graph has a computational complexity. It is the time expended in making the decision, using the $\beta$ function, to execute basic chunk instructions and the time invested in the execution of these instructions via a data structure $G_d$ of $n$ elements or entities. The complexity is represented by a function $f(n)$ of worst-case behavior of the chunk's algorithm.
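As a rough illustration of the graph quantities behind data length and data hierarchical level, the following Python sketch computes node degrees and the number of levels of a small data structure graph. The helper names and the two example graphs are assumptions made for this sketch, and the level is simplified to the longest root-to-leaf chain in an acyclic graph; the article's entropy-based measurement of these factors is not reproduced here.

```python
def node_degrees(arcs):
    """Degree of each node: arcs coming in plus arcs going out."""
    degrees = {}
    for source, target in arcs:
        degrees[source] = degrees.get(source, 0) + 1
        degrees[target] = degrees.get(target, 0) + 1
    return degrees


def hierarchical_level(arcs, root):
    """Number of levels on the longest chain starting at root (acyclic graphs only)."""
    children = {}
    for source, target in arcs:
        children.setdefault(source, []).append(target)

    def levels_below(node):
        return 1 + max((levels_below(c) for c in children.get(node, [])), default=0)

    return levels_below(root)


# Linear array: elements on the first level, data type on the second.
array_graph = [("array element", "data type")]

# Hypothetical record with a nested array field, for comparison.
record_graph = [("record", "name"), ("record", "scores"), ("scores", "Integer")]

print(hierarchical_level(array_graph, "array element"))
print(node_degrees(record_graph))
print(hierarchical_level(record_graph, "record"))
```

For the linear array the sketch yields two levels, in line with the example given in the text.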
4. ENTROPY AS MEASURE OF SOFTWARE COMPLEXITY

4.1 Software Information Content

If we see a program as a discrete information process, then we can represent it as a Markov process. Then it is possible to define a quantity that will measure, in some sense, how much information is produced by such a process (Shannon, 1948).

Entropy is a measurement of this class used to adapt the theoretical notion of measure of uncertainty or variety in certain phenomena. Suppose we have $E = (S, F, P)$ as an experiment, where $S = \{A_1, A_2, \ldots, A_n\}$ is a set of elementary events, $F$ is a Borel field over $S$, and $P(A_i) = p_i$ is a function assigning real value ($0 \le p_i \le 1$, $i = 1, 2, \ldots, n$) to events $A_i$. Then the entropy of the experiment $E$ is applied to a finite sample of arbitrary elements of $n$ distinct types, termed entropy classes; the entropy associated with that sample can be calculated by use of relative frequencies or number of occurrences in a particular class as unbiased estimates of their probabilities $p_i$ (Davis and LeBlanc, 1988). Then,

$$H = -\sum_{r=1}^{n} p_r \log p_r \qquad (1)$$

is the entropy value or information content for the experiment $E$.

Each event is associated with an object or component of the program (chunks, execution paths, data paths). Entropy is used to measure the information content of the variety of three aspects of chunks and data structures in a program: the structure of chunks and data connections, chunk and data content, and chunk and data size.

The quantity of decisions taken to solve a problem concerns the information content of the problem. In other words, the quantity of decisions is directly proportional to the problem complexity, that is, the uncertainty grows when the level of logical decisions grows.

5. THREE METRIC SPACES FOR THE THREE DOMAINS

5.1 Definition of Metric Spaces

I define ordinal measures or entropy functions to the different magnitudes of each domain to obtain a new space called complexity vector space $E$ with $S$, $T$, and $L$ as variables. The elements of $E$ are called complexity vectors $e_i = (s_i, t_i, l_i)$, where $e_i$ represents the magnitudes in three dimensions (length, time, and level) of the terminal chunk $i$ in a particular domain (syntactical, functional, or computational) of the program M in the ordering of control flow graph $G_c$. The vector $e_i \in H^3$, where $H^3$ is a three-dimensional space over $H$, which is the set of information entropy values.

We define the norm of the vector $e_i$ over vector space $E$ in $H$, written

$$\|e_i\|: E \to H$$

which represents the length of the complexity vector $e_i$ or the information content of the chunk $i$.

The set $(E, \|e\|)$ is called the complexity vector space with inner product (Lipschutz, 1970). Then, we can induce from the norm $\|e\|$, $\|e_i - e_{i+1}\|$, called the metric over $E$ or distance from $e_i$ to $e_{i+1}$, defined by $d: E \to H$ and represented by the Euclidean norm

$$d(e_i, e_{i+1}) = \sqrt{(s_i - s_{i+1})^2 + (t_i - t_{i+1})^2 + (l_i - l_{i+1})^2} \qquad (2)$$

where $d$ is the amount of information content needed to produce chunk $M_{i+1}$ after chunk $M_i$, in the order of control flow graph $G_c$ in a particular domain.

Figure 14. Complexity vectors in E.

We define the "effort" required to lift some vectorial magnitude $\|e_i\|$ up to some height or level in a metric space by use of Newton's work law as

$$W = F \cdot D \cdot \cos\theta \qquad (3)$$

where $W$ is the effort value, $F = \|e_i\|$ is the force applied through the distance $D$, $D = L_i$ is the height or level, and $\theta$ is the vectorial angle formed by the application of the force. Figure 14 represents the
components of a complexity vector in a particular vectorial space.

By the Euclidean triangular relation among vector components in Figure 14, we obtain

[Equation (4) is not recoverable from the source text.]

By substituting equation (4) in equation (3), we obtain the following effort equation to the chunk $i$ in the specific vector space $E$:

[Equation (5) is not recoverable from the source text.]

Then, the total effort needed to produce the program M is

[Equation (6) is not recoverable from the source text.]

5.2 Syntactical Vector Space $E_s$

Vector complexity space $E_s$ is a compound of the set of vectors $e_{si} = (s_{si}, t_{si}, l_{si})$ so that $M_s = \{e_{s1}, e_{s2}, \ldots, e_{sm}\} \subset E_s$ is an ordered set that represents the values of syntax factors of terminal chunks in program M: decisional level, code length, and syntactical time.

$S_s$ is the displacement of the chunk in terms of the linguistic information content as a measurement of the quantity of code in programs. In Appendix 1, we obtain the entropy expression $H_o$ in formula A1, which estimates the value of code length for a chunk. This value depends on the number of different operators ($n_{s2}$) and operands ($n_{s1}$) and its occurrences viewed as entropy classes. Table 1 shows the occurrence of symbols per chunk of program M, and the values of $H_o(n_{s1} + n_{s2})$ are shown in Table 2. The chunk position is the displacement in relation to the predecessor chunk in the ordering of control flow graph $G_c$,

[Equation (7) is not recoverable from the source text.]

Chunk $M_1$ has $H_{o1}(5) = 2.113 = S_{s1}$, and chunk $M_{211}$ has $H_{o211}(16) = 3.768$; applying formula 7 yields $S_{s211} = 2.113 + 3.768 = 5.881$. Table 2 contains the values of $S_s$ of the chunks in M.

$T_s$ is the syntactical time entropy based on linguistic information content, like in $S_s$, which measures the quantity of time in terms of the different operators ($n_{s2}$) in chunk $i$, as explained in Section 3.3. It is
Table 1. Occurrence of Symbols per Chunk in Program M

Type of Symbols     Frequency per Chunk
                    M_1   M_211   M_212   M_22   M_311   M_321

Operands (n_s1)
n 1 1 1 2
m 1 1 1
3 3
1 5 1 5
1 2 4
2 3 3
1
1
1

0
Operators (n_s2)
While
For
If

>
<
I
and
:=
t
_
Totals
n_s1         4       9       9      2       9       7
n_s2         1       7       7      2       7       7
N_s1         6      15      15      3      16      16
N_s2         3       9       9      2      10      10

Table 2. Syntactical Complexity Values

Terminal  Operand     Operator    Boolean     Nodes per
Chunks    Types n_s1  Types n_s2  Function β  Class in β  H_0(n_s1 + n_s2)  S_s     T_s = H_0(n_s2)  L_s = H_β  ||e_si||  W_si
M_1       4           1           β_0         1           2.113             2.113   0.000            0.000      2.113     0.000
M_211     9           7           β_1         1,2,4,4     3.768             5.881   2.725            1.823      6.733     11.816
M_212     9           7           β_1         1,2,4,4     3.768             9.649   2.725            1.823      10.191    18.278
M_22      2           2           β_2         1,2,4       1.922             11.571  1.000            1.379      11.696    16.016
M_311     9           7           β_3         1,2,4       3.767             15.338  2.646            1.379      15.626    21.464
M_321     7           7           β_4         1,2,4       3.551             18.889  2.646            1.379      19.123    26.302
Total                                                                                                           65.482    93.876
                                                                                                                ||M||_s   ||M||_se
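As a minimal sketch of how the columns of Table 2 relate to one another, the following Python recomputes the chunk positions, vector magnitudes, and efforts from the per-chunk entropies. It assumes the position rule S_i = H_0i + S_(i-1), the magnitude sqrt(S^2 + T^2 + L^2), and the effort L * sqrt(S^2 + T^2); these forms are inferred here from the tabulated values, and the function and variable names are illustrative rather than the author's.

```python
import math

# (chunk, H_0(n_s1 + n_s2), T_s, L_s) as listed in Table 2
rows = [
    ("M_1",   2.113, 0.000, 0.000),
    ("M_211", 3.768, 2.725, 1.823),
    ("M_212", 3.768, 2.725, 1.823),
    ("M_22",  1.922, 1.000, 1.379),
    ("M_311", 3.767, 2.646, 1.379),
    ("M_321", 3.551, 2.646, 1.379),
]

def complexity_table(rows):
    """Return (chunk, S, magnitude, effort) per chunk, in control flow order."""
    out, position = [], 0.0
    for chunk, h0, t, l in rows:
        position += h0  # displacement relative to the predecessor chunk
        magnitude = math.sqrt(position**2 + t**2 + l**2)
        effort = l * math.sqrt(position**2 + t**2)
        out.append((chunk, position, magnitude, effort))
    return out

table = complexity_table(rows)
for chunk, s, mag, w in table:
    print(f"{chunk:6s} S={s:7.3f} |e|={mag:7.3f} W={w:7.3f}")
print("volume =", round(sum(r[2] for r in table), 3))
print("effort =", round(sum(r[3] for r in table), 3))
```

Under these assumptions the recomputed rows and the two totals agree with Table 2 to within rounding of the last printed digit.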

obtained by applying the expression H_0(n_s2) in Appendix 1. Table 1 shows the occurrence of operator symbols, and Table 2 shows the values of information content of syntactical time of the chunks in program M.

L_s is the entropy or information content of the syntax decision level, and it measures the uncertainty of a particular logical value in a predicate, viewed as a Boolean function β, given a set of possible values. As pointed out in Section 3.3, a Boolean function may be represented by a binary decision tree (Figure 9). The nodes in the decision levels of the tree represent entropy classes used to obtain the value of entropy by applying the definition of a probability function (Appendix 3). Table 2 shows the values of L_s derived by use of formula A8 (Appendix 3), which are the entropy values or decisional information contents of the chunks.

Thus, a chunk in E_s is defined by e_si, and the expression

$$ \|M\|_s = \sum_{i=1}^{m} \|e_{s_i}\| \tag{8} $$

is the sum of vectorial magnitudes that represents the syntax information entropy volume of the program M.

$$ \|M\|_{se} = \sum_{i=1}^{m} L_{s_i} \sqrt{S_{s_i}^2 + T_{s_i}^2} \tag{9} $$

is the total syntactical entropy effort derived by use of equation 6.

Table 2 shows these entropy values and magnitudes for the terminal chunks of program M in the vectorial space E_s.

5.3 Functional Vector Space E_f

The vector complexity space E_f is the set of vectors e_fi = (s_fi, t_fi, l_fi) ∈ M_f ⊂ E_f that represents the functional factors of chunks: functional cohesion, data coupling, and functional coupling.

S_f is the functional chunk position. Because the chromatic information content H_K measures the degree of a node in a graph, it can be used to measure the information content due to the number of fields in a data flow graph, or data coupling in chunks (Lew et al., 1988). In Appendix 5, we explain the concepts of chromatic information content measurement. We have proved that the maximum chromatic entropy for a node is 2, based on the fact that it is possible to represent the elements (coming in or going out) in nodes by use of at most four colors as entropy classes.

The functional chunk position is given by its data length and the displacement in relation to the position of the predecessor chunk in the ordering given by the control flow graph G_c:

$$ S_{f_i} = H_{K_i} + S_{f_{i-1}} \tag{10} $$

Table 3. Functional Complexity Values

Terminal  Data Arcs  Chromatic  Chunks   Structural      Instructions  Structural
Chunks    per Chunk  Classes k  per G_c  Classes of G_c  per Chunk     Classes of G_i  H_K(k)  S_f    T_f = H_c(G_i)  L_f = H_c(G_c)  ||e_fi||  W_fi
                                                         in G_i
M_1       3          1,1,1      1        1               3             3               1.585   1.585  0.000           0.000           1.585     0.000
M_211     8          4,4        2        1,1             3             1,1,1           1.000   2.585  1.585           1.000           3.193     3.032
M_212     8          4,4        2        1,1             3             1,1,1           1.000   3.585  1.585           1.000           4.045     3.920
M_22      1          1          4        1,2,1           2             1,1             0.000   3.585  1.000           1.500           4.013     3.886
M_311     7          3,3,1      5        1,2,1,1         4             3,1             1.449   5.034  0.811           1.922           5.449     9.800
M_321     7          3,3,1      5        1,2,1,1         4             3,1             1.449   6.483  0.811           1.922           6.810     12.557
Total                                                                                                                                 25.095    33.195
                                                                                                                                      ||M||_F   ||M||_Fe
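The chromatic and structural entries of Table 3 are entropies taken over class sizes. The sketch below, in which `class_entropy` is a hypothetical helper name, evaluates H = -Σ (k_i/m) log2(k_i/m) for a list of class sizes and recomputes a few of the tabulated entries. Base-2 logarithms are an assumption; they are the base that matches the printed figures.

```python
import math

def class_entropy(class_sizes):
    """Shannon entropy (base 2) of a partition described by its class sizes."""
    total = sum(class_sizes)
    return sum(-(k / total) * math.log2(k / total) for k in class_sizes)

# Chromatic information content H_K for the data arcs of a chunk
print(round(class_entropy([1, 1, 1]), 3))     # M_1: three arcs, three colors
print(round(class_entropy([4, 4]), 3))        # M_211: eight arcs, two colors
print(round(class_entropy([3, 3, 1]), 3))     # M_311: seven arcs, three colors

# Structural information content for orbit sizes
print(round(class_entropy([1, 2, 1, 1]), 3))  # classes of G_c for M_311
print(round(class_entropy([3, 1]), 3))        # classes of G_i for M_311

# Functional position of M_211: its own H_K plus the position of M_1
s_f_211 = class_entropy([4, 4]) + class_entropy([1, 1, 1])
print(round(s_f_211, 3))
```

The same helper covers every entropy in the table because each one differs only in what is being partitioned: arcs by color, chunks by orbit, or instructions by orbit.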

where H_K(n_fs) is the value of chromatic information content of the coupling of chunk i, with n_fs data fields coming in and going out.

In the example shown in Figure 2, chunk M_1 has three data fields going out, so we use three different colors as entropy classes and then apply formula 5.1 (Appendix 5): H_K1(3) = -3(1/3 log 1/3) = 1.585. Because chunk M_211 has eight data fields, H_K211(8) = -2(4/8 log 4/8) = 1.000, with two colors used as the minimum number of entropy classes. Then S_f211 = 1.585 + 1.000 = 2.585 is the functional position of the chunk M_211. Table 3 shows the values of S_f derived by applying formula 10.

L_f measures the level of uncertainty from selecting among a set of possible paths in the control flow graph G_c. The notion of structural information content explains that the larger the number of nodes in a class or orbit, the greater the uncertainty that a particular realization is selected (Lew et al., 1988). In Appendix 2, we explain the theoretical bases of the structural entropy. Table 3 contains the values of L_f derived by use of formula A2.

T_f is the degree of interaction among instructions, or temporal dependence, in a chunk as a measurement of cohesiveness. I have represented this relation by use of a structural flow graph G_i in Figure 12, which shows the relationship between instructions within a chunk. As in L_f, structural information content measures the entropy or uncertainty of this order by means of orbits as entropy classes of the instructions' dependence, as explained in Section 3.4. Table 3 shows the values of T_f obtained by applying formula 2.1 (Appendix 2).

$$ \|M\|_F = \sum_{i=1}^{m} \|e_{f_i}\| \tag{11} $$

is the sum of the vectorial magnitudes and represents the functional syntax entropy volume of the program M.

$$ \|M\|_{Fe} = \sum_{i=1}^{m} L_{f_i} \sqrt{S_{f_i}^2 + T_{f_i}^2} \tag{12} $$

is the total functional entropy effort, derived by use of effort equation 6.

Table 3 shows these entropy values for the program example shown in Figure 2 in the vectorial space E_f.

5.4 Computational Vector Space E_c

The vector complexity space E_c is composed of the vectors e_ci = (s_ci, t_ci, l_ci), which represent the computational factors of chunks: data length, algorithmic time, and data structure hierarchical level of chunks.

S_c represents, in a data structure graph, the degree of a node or the number of different fields in the data structure, located by the displacement in the order of the control flow graph. As with data coupling, we use chromatic information content H_K(n_cs) as a measure of entropy or uncertainty, where n_cs is the number of data fields used as input and output in the data node of the data structure graph G_d.

$$ S_{c_i} = H_{K_i} + S_{c_{i-1}} \tag{13} $$

is the positional equation for each chunk in M.

T_c is the time expended in making the decision to execute the terminal chunk instructions in a particular execution path. Time is measured by computational information content, which is the quantity of information needed to represent the decisional time consumption.

In Appendix 4, we show that algorithmic complexity may be represented by a decisional tree and that, as with syntactical decisional information content, its entropy or information content is an asymptotic function bounded by 2. Using this conclusion, we obtain the values of information content of typical functions of algorithmic complexity shown in Table A1.

L_c uses structural information content as a measurement of the number of hierarchical levels of data dependence. Its value increases as the number of dependent nodes increases in the data structure G_d and, in the computational domain, it represents the level of uncertainty in making the decision to locate a data entity in a data space (Appendix 2). For example, in an array data structure there are two levels of decision; then, H_c = -2(1/2 log 1/2) = 1.000 is the information content. In a graph data structure, it is the number of data nodes in different data paths, and its orbits are the entropy classes.

The expression

$$ \|M\|_C = \sum_{i=1}^{m} \|e_{c_i}\| \tag{14} $$

represents the computational information content volume. Using equation 6, we obtain

$$ \|M\|_{Ce} = \sum_{i=1}^{m} L_{c_i} \sqrt{S_{c_i}^2 + T_{c_i}^2} \tag{15} $$

which is the total computational entropy effort value. Table 4 shows the computational entropy values of the chunks in the program M.

6. A UNIFIED METRIC AS MEASUREMENT OF WHOLE COMPLEXITY

We have defined E_s as syntactical vector space, E_f as functional vector space, and E_c as computational

Table 4. Computational Complexity Values

Terminal  Data Fields  Chromatic  Complexity  Data
Chunks    per Chunk    Classes k  Function f  Hierarchy DL  H_K(k)  S_c    T_c = H_c*(f)  L_c    ||e_ci||  W_ci
M_1       3            1,1,1      1           1(3)          1.585   1.585  1.479          0.000  2.168     0.000
M_211     8            4,4        n           2(4), 1(2)    1.000   2.585  1.643          4.000  5.038     12.252
M_212     8            4,4        n           2(4), 1(2)    1.000   3.585  1.643          4.000  5.617     15.774
M_22      1            1          2n + 1      1(1)          0.000   3.585  1.655          0.000  3.944     0.000
M_311     8            4,4        n           2(2), 1(5)    1.000   4.585  1.643          2.000  5.265     9.741
M_321     8            4,4        n           2(2), 1(5)    1.000   5.585  1.643          2.000  6.156     11.643
Total                                                                                            28.188    49.410
                                                                                                 ||M||_C   ||M||_Ce
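Tables 2, 3, and 4 together give, for each chunk, one vector (S, T, L) per domain. Section 6 stacks these three vectors as the rows of a 3 × 3 matrix M_i and uses |det(M_i)| as the unified volume of the chunk and the absolute determinant of the difference between consecutive matrices as the effort of moving from one chunk to the next. As a hedged sketch, the values for M_1 and M_211 can be recomputed as follows. The helper names are illustrative, and the row order (syntactical, functional, computational) is an assumption that happens to match the printed figures.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three row tuples."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def subtract(m1, m2):
    """Element-wise difference of two 3x3 matrices."""
    return [tuple(x - y for x, y in zip(r1, r2)) for r1, r2 in zip(m1, m2)]

# Rows: (S, T, L) in the syntactical, functional, and computational spaces,
# copied from Tables 2, 3, and 4.
m_1 = [(2.113, 0.000, 0.000), (1.585, 0.000, 0.000), (1.585, 1.479, 0.000)]
m_211 = [(5.881, 2.725, 1.823), (2.585, 1.585, 1.000), (2.585, 1.643, 4.000)]

volume_1 = abs(det3(m_1))
volume_211 = abs(det3(m_211))
effort_211 = abs(det3(subtract(m_211, m_1)))
print(round(volume_1, 3), round(volume_211, 3), round(effort_211, 3))
```

With these inputs the sketch gives a zero volume for M_1 and, to three decimals, the 6.764 and 12.506 listed for M_211 in Table 5.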

vector space, each in H^3, where H^3 is a three-linear vector entropy space. But to obtain a metric that takes into account all the software complexity factors together, we need to build a unified space of software complexities. This space is a multilinear combination of the three complexity spaces, and over this new space we define a new complexity metric. Then, E_s × E_f × E_c is called a three-linear entropy space. Let Φ be a map defined by

$$ \Phi : E_s \times E_f \times E_c \to H $$

so that

$$
\begin{aligned}
\Phi(e_{s1} + e_{s2}, e_f, e_c) &= \Phi(e_{s1}, e_f, e_c) + \Phi(e_{s2}, e_f, e_c) \\
\Phi(e_s, e_{f1} + e_{f2}, e_c) &= \Phi(e_s, e_{f1}, e_c) + \Phi(e_s, e_{f2}, e_c) \\
\Phi(e_s, e_f, e_{c1} + e_{c2}) &= \Phi(e_s, e_f, e_{c1}) + \Phi(e_s, e_f, e_{c2}) \\
\Phi(c \cdot e_s, e_f, e_c) &= c \cdot \Phi(e_s, e_f, e_c) \\
\Phi(e_s, c \cdot e_f, e_c) &= c \cdot \Phi(e_s, e_f, e_c) \\
\Phi(e_s, e_f, c \cdot e_c) &= c \cdot \Phi(e_s, e_f, e_c)
\end{aligned}
\tag{16}
$$

Φ is a three-linear form over H, or a three-covariant tensor over H. The set of three-covariant tensors over H we term T_3(H), which is also a vector space over H. If we define a 3 × 3 matrix M_i, composed of the three complexity vectors of the chunk M_i in E_s, E_f, and E_c as the rows of the matrix, then

$$ \Phi(e_s, e_f, e_c) = \det(M_i). \tag{17} $$

It is possible to prove that Φ is a three-covariant tensor over H. Then

$$ \|\Phi_i\| = \|\Phi(e_{s_i}, e_{f_i}, e_{c_i})\| = |\det(M_i)| \tag{18} $$

is the tensorial norm that represents the unified entropy of the terminal chunk M_i. ||Φ_i|| is the information content volume (tensorial complexity volume), because the det function is the volume of a parallelepiped in a three-dimensional vector space, as represented in Figure 15 (Mostow and Sampson, 1969).

[Figure 15. Tensorial volume of chunk complexity. Axes: L, T, and E_s × E_f × E_c.]

The set (T_3(H), ||Φ||) is called the complexity multilinear space. Then, we can induce, from the norm ||Φ||, the metric over E, or distance from e_i to e_(i+1), defined by

$$
\begin{aligned}
\|\Phi(e_{s_i}, e_{f_i}, e_{c_i}) - \Phi(e_{s_{i+1}}, e_{f_{i+1}}, e_{c_{i+1}})\|
&= \|\Phi(e_{s_i} - e_{s_{i+1}}, e_{f_i} - e_{f_{i+1}}, e_{c_i} - e_{c_{i+1}})\| \\
&= |\det(M_{i+1} - M_i)|
\end{aligned}
\tag{19}
$$

which is the total amount of information content needed to produce chunk M_(i+1) after chunk M_i, in the ordering of the control flow graph G_c. Thus

$$ \|M\|_T = \sum_{i=1}^{m} \|\Phi_i\| = \sum_{i=1}^{m} \|\Phi(e_{s_i}, e_{f_i}, e_{c_i})\| = \sum_{i=1}^{m} |\det(M_i)| \tag{20} $$

is the total tensorial entropy volume and

$$ \|M\|_E = \sum_{i=1}^{m} \|\Phi(e_{s_i}, e_{f_i}, e_{c_i}) - \Phi(e_{s_{i-1}}, e_{f_{i-1}}, e_{c_{i-1}})\| \tag{21} $$

is the total entropy effort needed to produce the program M. These metrics are termed unified software complexity metrics. Table 5 shows the complexity values derived by applying the unified metric to program M.

Table 5. Values of Software Complexity Using the Unified Metric

Terminal Chunks   |det(M_i)|   |det(M_i - M_(i-1))|
M_1                 0.000         0.000
M_211               6.764        12.506
M_212              16.393         0.000
M_22               20.109         4.486
M_311              20.601        12.984
M_321              26.466         0.000
Total              90.333        29.976
                   ||M||_T       ||M||_E

7. APPLICATION OF THE UNIFIED SOFTWARE METRIC

7.1 The Best Estimator of Internal Complexity

Based on the concept of the unified software complexity metric, it is possible to build a statistical estimator of ||M||_T and ||M||_E. This estimator, built from the external software design (functional and data structure design), is defined by the following characteristics:

1. Data structure, control information, and data information in the design of modules can be used to estimate the number of chunks in a program, the control structure (control flow graph), the data flow graph, and the nesting level of chunks.

   Information content of the message should be proportional to the uncertainty caused by the decision statement. The order of magnitude of the information transferred to a module should be of the same order of magnitude as the cyclomatic number (Lew et al., 1988). The different combinations of control information, data information, and data structure levels should be of the same order as the nesting level.

2. The number and types of operators and operands, as well as the level of the language, can be used as good estimators of code complexity and space.

3. The type of algorithm used in each module and its type of complexity are a good estimator of computational complexity.

Let X_1, X_2, ..., X_n be n random variables that represent the estimator factors listed in points 1-3 above, and let f be a statistical density function of the program M, derived from the fact that M is a stochastic Markov chain. The likelihood function of X_1, X_2, ..., X_n is defined to be the joint density function

L(Θ) = L(Θ; x_1, x_2, ..., x_n),

which is considered a function of Θ.

If Θ* is the value of Θ in its parameter space that maximizes L(Θ), then Θ* = Θ*(x_1, x_2, ..., x_n) is the maximum-likelihood estimator of Θ for the sample x_1, x_2, ..., x_n (Mood and Graybill, 1974), and it represents the minimum complexity volume of the best program solution to the program M from the infinite solution space {g_1, g_2, g_3, ...}. Θ* is the lower bound volume of the software complexity for program M.

7.2 Software Productivity Measurement

Productivity is concerned with the efficiency of the resources consumed in producing a given application in a timely manner; as such, it ought not be ignored in measuring system performance (Scudder and Kucic, 1991). The principal question in productivity is: what are the time and the human and computer resources to be invested in coding, testing, debugging, integrating, and maintaining a program for a given quantity of software volume?

Θ is the quantity of force needed to produce a given piece of software from the software design; it is similar to the physical measurement of the work (Newton's laws) needed to do something. In this sense, this measurement is independent of the capacity of the human agent to do the work. Θ* may be used to estimate productivity based on cognitive science theories that use the chunk concept to obtain a measure of the human effort invested to produce a software system (Miller, 1956; Coulter, 1983; Shneiderman, 1980; Woodfield, 1980).

Many researchers suggest that it is more difficult to develop, test, debug, and maintain programs with deeply nested control structures than those with less nesting (Belady, 1980; Jensen and Vairavan, 1985; Harrison and Magel, 1981). Thus, the effort or intellectual work needed to produce a particular chunk or program is associated with the values ||M||_T and ||M||_E.

We know that productivity depends on the ability and capacity of the staff to manage the tools and methods used to develop a software project (Boehm, 1981). So we need to obtain the intellectual capacity, or intellectual power, of the project staff. This measurement is used to estimate the total effort and time needed to produce and implement a software product, for example, the volume of software per

person-month. Then, we can estimate the total cost and time of software development.

7.3 Software Quality Measurement

Quality is judged indirectly by focusing on the effectiveness and usefulness of the delivered system in performing the task for which it was designed (Scudder and Kucic, 1991), in other words, how the system is actually constructed and how it performs as opposed to how it should have been built and how it should perform (Sneed and Merey, 1985).

The components of software quality may be grouped in two classes:

1. Functional components: usability, correctness, maintainability, portability, reusability
2. Operational components: reliability, efficiency, and integrity

The functional components are qualitative characteristics that depend on the technical specification of the software environment (development tools, operating systems, database management systems, languages, etc.) and on the design process. Nowadays, with CASE and automated software quality assurance tools, this task is more practical, more rapid, and more automatic (Sneed and Merey, 1985).

The operational components are quantitative characteristics measured from the software execution. The volume Θ of software is a measure of the amount of information content needed to operate the software in an optimal way from the perspective of functional components. In other words, if the design complies with the functional characteristics and system requirements (design quality), Θ represents the quantitative measurement of the operational quality.

Quality is a virtue of design. The robustness of software products is more a function of good design than of on-line control, however stringent the development processes. Most software products contain perturbations of one class or another, usually the result of a faulty meshing of one component (object) with corresponding components. Such performance degradations may result either from something going wrong in the development process or from an inherent failure in the design (Taguchi and Clausing, 1990).

We can obtain an index of software operational quality by

$$ R^* = \frac{\Theta}{\|M\|_T} $$

which is called the robust software quality estimator: the degree of approximation to the optimal software solution.

7.4 Software Reliability Measurement

In software reliability, the quantity of code developed (volume), that is, the number of instructions added or modified, is directly related to the quantity of faults or coding errors introduced in the software (Musa, 1989). Other authors suggest that the number of bugs is proportional to the number of decisions in programs. In this sense, software reliability is strongly influenced by the level of logical decisions (Shooman, 1983). We suggest that program fault is the overall result of the total software complexity factors.

Faults are revealed when a program failure occurs during execution due to certain data events not predicted in design or coding. The data event is a time-independent random variable. However, the most important software reliability models developed to date describe the mean number of failures experienced as a function of time while the software is executed.

Software complexity metrics are related to the statistical distribution of faults in program modules (Khoshgoftar et al., 1992). It is assumed that control of the software is transferred among the modules or chunks according to a Markov process. Each module (chunk) has an associated reliability that gives the probability that the module will operate correctly when called and will transfer control successfully when finished (Siegrist, 1988).

The measure of reliability considered is the mean number of transitions until failure, starting in a designated initial state. It is possible to build a good reliability model by use of the operational profile of the program and by assigning a probability distribution based on the proportional distribution of chunk volume through the total entropy volume of the program. That is because the frequency of execution of a chunk is not the only factor of software failure; the quantity of information content in the chunks of the program is a factor as well.

7.5 Software Value Measurement

Software value is a measurement of the relationship between effectiveness and the project cost. Effectiveness is the ability to accomplish the overall intended function for which the software was designed. Maximum value is said to be the minimum

number of dollars that must be expended to accomplish the desired function or performance. It should be noted that quality is part of function (Pruett and Hotard, 1984). Then the expression

[Equation: not legible in this copy.]

represents software value, where C_T is the total cost of software, which is a measure of the software performance in relation to the development cost.

APPENDIX 1: Linguistic Information Content

As we explained in Section 3.3, we can represent the abstract syntax of instructions in a program or chunk using an abstract syntax tree, which shows the components and subcomponents of the syntactical expressions. Nonterminal nodes in the tree are operators, and the leaves or terminal nodes are the operands, called symbols or tokens of the expressions. Taking each syntax tree of the different expressions, we obtain n symbols in a chunk or program, and we classify the symbols by the occurrence of different symbol types. If we assume there are t symbol types in our sample of n tokens, then

$$ n = \sum_{r=1}^{t} n_r $$

and the relative frequency is

$$ f_r = \frac{n_r}{n} $$

This way, we obtain an empirical distribution of symbols or tokens within a program or chunk (Harrison, 1992), where t represents the number of entropy classes, n_r the number of symbols per class, and f_r the probability p_r of the class r.

Then, the average amount of information contributed by each symbol in a program is the entropy function defined over p_r:

$$ H_0 = -\sum_{r=1}^{t} p_r \log p_r = -\sum_{r=1}^{t} \frac{n_r}{n} \log \frac{n_r}{n} \tag{A1} $$

This equation represents the linguistic information content of a program, module, or chunk with t different symbols. This entropy value is also interpreted as the level of linguistic abstraction of a program written in a particular language.

Table 1 contains the occurrence of symbols per chunk in example program M. Chunk M_22 contains four types of symbols (two operands and two operators) and five total symbols, and the frequencies per symbol, or entropy classes, are {2}, {1}, {1}, {1}. Then, H_0 = -(2/5 log 2/5 + 1/5 log 1/5 + 1/5 log 1/5 + 1/5 log 1/5) = 1.922 is its linguistic information content value.

APPENDIX 2: Structural Information Content

Definition A2.1. We consider the control flow graph G_c(M) as the basic functional structure of a program or a segment of a program, where M = {M_1, M_2, ..., M_m} is the set of nodes or chunks in G_c. Then, it is possible to define a permutation as a one-to-one function π from a subset of M onto itself. We call the set of all possible permutations over M S_m (Fisher, 1977).

In general, the image of each element under a permutation π can be realized by the description

$$ \pi = \begin{pmatrix} M_1 & M_2 & \cdots & M_m \\ \pi(M_1) & \pi(M_2) & \cdots & \pi(M_m) \end{pmatrix} \tag{A2} $$

Definition A2.2. If we have a function π that transforms G_c onto itself, preserving the adjacencies, then we say π is isomorphic. As S_c ⊂ S_m is the set of all possible permutations π of G_c over G_c, S_c forms a permutation group named G. The number of elements in the permutation group we call the order of the group, or |G|.

Definition A2.3. Let g_1 and g_2 be elements in G; then g_1 ∘ g_2 is the composition or product of permutations. Suppose g is a permutation of G. We begin with any M_i ∈ M in G_c and write (M_i, g(M_i), g^2(M_i), ..., g^(k-1)(M_i)), where g^k(M_i) is the first repetition of M_i. This sequence is called a cycle. Using the concept in this manner, until all elements of M occur in cycles with different cycles disjoint, we obtain the cycle decomposition of G.

Definition A2.4. Let G be an isomorphic permutation group on the set M. An element x in M is said to be G equivalent to y in M if there exists g in G such that g(x) = y. We write this x ~ y. G equivalence is an equivalence relation on M because

it is reflexive, because G contains the identity e(x) = x and so, for all x in M, x ~ x;
it is symmetric, because if g(x) = y, then there exists g^(-1) such that g^(-1)(y) = g^(-1)(g(x)) = e(x) = x;
it is transitive, because if x ~ y and y ~ z, then there exist g(x) = y and h(y) = z, such that (h ∘ g)(x) = h(g(x)) = h(y) = z.
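Definition A2.4 can be made concrete with a small sketch. The code below is a hedged illustration, not the author's procedure: it takes a set of nodes and a list of permutations (assumed to generate the group G), collects the G-equivalence classes by repeatedly applying the permutations, and then evaluates the base-2 entropy of the class sizes. The five-node example with a single swap is hypothetical, chosen so that the classes have sizes 1, 2, 1, 1, and all names are illustrative.

```python
import math

def equivalence_classes(nodes, generators):
    """Partition nodes into classes reachable from one another via the generators."""
    remaining, classes = list(nodes), []
    while remaining:
        frontier, orbit = [remaining[0]], {remaining[0]}
        while frontier:
            x = frontier.pop()
            for g in generators:
                y = g[x]  # image of x under the permutation g
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        classes.append(sorted(orbit))
        remaining = [n for n in remaining if n not in orbit]
    return classes

def partition_entropy(classes):
    """Base-2 entropy with p_i = (class size) / (number of nodes)."""
    m = sum(len(c) for c in classes)
    return sum(-(len(c) / m) * math.log2(len(c) / m) for c in classes)

nodes = ["a", "b", "c", "d", "e"]
# One generator, written as a dict: swap b and c, fix every other node.
swap_bc = {"a": "a", "b": "c", "c": "b", "d": "d", "e": "e"}

classes = equivalence_classes(nodes, [swap_bc])
print(classes)
print(round(partition_entropy(classes), 3))
```

Applying only the generators forward is enough here because every permutation of a finite set has finite order, so inverses are reached by repetition.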

Then, G equivalence partitions M into equivalence classes called orbits of G.

Definition A2.5. The stabilizer of x in M is G_x = {g ∈ G : g(x) = x}; thus G_x ⊂ G is the set of elements that individually fix x. In fact G_x is a subgroup of G, because g, h ∈ G_x implies g(x) = x, so that (g h^(-1))(x) = g(h^(-1)(x)) = g(x) = x and g h^(-1) ∈ G_x. If C is a G-equivalent class in M, then |G| = |C| |G_x|, where x ∈ C.

For any g in the permutation group G on M, let the set of fixed points be F(g). Thus F(g) = {x ∈ M : g(x) = x}, and the number of G-equivalent classes in M, by Burnside's theorem (Fisher, 1977), is

$$ k = \frac{1}{|G|} \sum_{g \in G} |F(g)| \tag{A3} $$

where

$$ M = C_1 \cup C_2 \cup \cdots \cup C_k $$

Definition A2.6. The structural information content of G_c is defined by the entropy relation

$$ H_c(G_c) = -\sum_{i=1}^{k} p_i \log p_i \tag{A4} $$

where k is the number of equivalent classes or orbits, p_i = k_i/m, k_i is the number of nodes in the class C_i, and m is the number of chunks in M (the set of nodes in G_c).

The maximum entropy is obtained when n = k, because there is only one element per orbit. In other words, there is no possibility to interchange elements in G_c, and so the n elements are fixed. Then,

$$ H_c = -\sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log k = \log\Big(\sum_{g \in G} |F(g)|\Big) - \log |G| = \log n \tag{A5} $$

The minimum entropy value is obtained when we have one orbit or class for the n elements. In other words, it is possible to do a total interchange of the n elements in G_c. Then

$$ H_c = -\frac{k}{k} \log \frac{k}{k} = \log 1 = \log \sum_{g \in G} |F(g)| - \log |G| = 0 \tag{A6} $$

In Figure 10, the subgraph G_c311 of chunk M_311 has five nodes. The isomorphic permutation group G is composed of {g_0, g_1} as permutation functions. There are four orbits; the equivalent classes are {1}, {2}, {1}, {1}, because three of the nodes cannot be permuted without altering the control logic of the program, while the remaining two nodes may be interchanged without altering the control logic. Then, the information content is H_c = -((2/5) log 2/5 + 3(1/5) log 1/5) = 1.922.

APPENDIX 3: Decisional Information Content

Every Boolean function β, as defined in Section 3.3, may be represented as a binary decisional tree G_β, where β is the root, the predicates constitute the internal nodes, and the leaves correspond to the logical binary values True or False. β is invariant in relation to the basic operations of Boolean algebra and may be transformed to a normal disjunctive form (Fisher, 1977). We consider G_β as a complete binary tree, without loss of generality.

$$ n = 2^{h+1} - 1, \qquad h = \log(n + 1) - 1 \tag{A7} $$

represent the total number n of nodes in a tree and its height h, respectively.

Definition A3.2. We define the entropy classes as the numbers of decisions in the different levels of the decisional tree (2^i, i = 1, 2, ..., h). Then, the decisional information content is defined by the entropy function

$$ H_\beta(G_\beta) = -\sum_{i=0}^{h} \frac{2^i}{n} \log \frac{2^i}{n} \tag{A8} $$

This equation is transformed into

$$ H_\beta = \log n - \frac{1}{n}\big[(n + 1)\log(n + 1) - 2n\big] = 2 - \Big[\Big(1 + \frac{1}{n}\Big)\log(n + 1) - \log n\Big] $$

and from the last two terms of the equation we obtain the inequality

$$ \Big(1 + \frac{1}{n}\Big)\log(n + 1) - \log n \ge \frac{\log(n + 1)}{n + 1} $$

and

$$ H_\beta \le 2 - \frac{\log(n + 1)}{n + 1} = 2 - \frac{h + 1}{2^{h+1}} \tag{A9} $$

using formula A7. Applying L'Hôpital's rule in equation A9, we get

$$ \lim_{h \to \infty} H_\beta = 2 $$

This result demonstrates that the entropy of the decisional logical process in a program is an asymptotic function bounded above by 2.

Figure 9a represents the decisional binary tree of the Boolean function β_1. It has four entropy classes

because the depth of the tree is 3 and it has 11 nodes or different decisions. The numbers of elements per class are {1}, {2}, {4}, and {4}, because we have one decision in level 0, two decisions in level 1, and four decisions in each of levels 2 and 3. Then, H_β(β_1) = -(1/11 log 1/11 + 2/11 log 2/11 + 2(4/11) log 4/11) = 1.823 is the decisional information content of the Boolean function β_1.

APPENDIX 4: Computational Information Content

Definition A4.1. As we say in Section 3.5, each terminal chunk in a control flow graph G_c has a computational complexity, which is the time invested in making the decision to execute basic instructions using a data structure of n elements plus the time expended in the execution of the basic instructions that make up the chunk. The complexity is represented by a function f(n) of the worst-case behavior of the chunk's algorithm.

From the above definition, we can represent the algorithmic complexity as a decisional process used to execute the basic instructions in a program. To execute a chunk, it is necessary to make a logical decision. In other words, the decision points in the program control the processing of the n data elements in a data structure. The quantity of decisions is given by the complexity function f(n), or the number of decisions made to do the work.

Then, setting h = f(n) in the decisional binary tree represented by formula A9 in Appendix 3, we obtain

$$ H_c(f(n)) = 2 - \frac{f(n) + 1}{2^{f(n)+1}} \tag{A10} $$

As with decisional information content, computational information content is bounded above by 2 as the value of the complexity function grows, because f(n) is a nondecreasing function.

We define a measurement to quantify the value of computational entropy. The measurement is the proportion of the area occupied by the complexity function in the entropy frame, as shown in Figure A1.

[Figure A1. Computational information content of a complexity function f(n). Vertical axis: H_c(f(n)).]

We choose f(n_0) such that for all f(n) > f(n_0) we have H_c(f(n)) ∈ (2 - δ, 2 + δ), where ε, δ > 0. The entropy frame is an area of 2N_0, where N_0 = f(n_0). Then

[Equation: not legible in this copy.]

and

[Equation (A11), defining H_c*(f(n_0)): not legible in this copy.]

Equation A11 is the proportion of the entropy frame area for the algorithmic complexity function f(n) operating over a data structure of n entities. It represents the computational information content of the chunk whose algorithmic complexity function is f(n).

Table A1 shows the values of computational information content of some typical complexity functions, based on the numerical approximation of formula A11 with certain correction factors.

APPENDIX 5: Chromatic Information Content

Definition A5.1. An arc coloring of a graph G(M, A, φ) is an assignment of colors to its arcs in A such that no two adjacent arcs have the same color. Using n colors to assign one color to each arc in A, we obtain an n coloring of the graph G, which forms a partition of A into n independent sets called color classes.

Table A1. Information Content of Typical Functions of Algorithmic Complexity

Complexity Function f   H_c*(f), δ = 10^-x
1                       1.419
N/2                     1.636
N                       1.643
2N + 1                  1.655
N log N                 1.665
(illegible)             1.688
N^2                     1.722
(illegible)             1.844
N^N                     1.961
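The bound used in Appendices 3 and 4 can be checked numerically. The sketch below assumes a complete binary decision tree of height h with 2^i decisions at level i, computes the entropy of formula A8 directly with base-2 logarithms, and compares it with the bound 2 - (h + 1)/2^(h + 1) of formulas A9 and A10. The function names are illustrative.

```python
import math

def decisional_entropy(h):
    """Entropy of a complete binary tree of height h with classes 2**i, i = 0..h."""
    n = 2 ** (h + 1) - 1  # total number of nodes, formula A7
    return sum(-(2**i / n) * math.log2(2**i / n) for i in range(h + 1))

def upper_bound(h):
    """Bound of formulas A9 and A10, with h playing the role of f(n) in A10."""
    return 2 - (h + 1) / 2 ** (h + 1)

for h in (1, 3, 5, 10, 20):
    print(h, round(decisional_entropy(h), 4), round(upper_bound(h), 4))
```

Both columns stay below 2 and approach it as h grows, which is the asymptotic behavior the appendices rely on.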

Definition A5.2. The chromatic number of the graph G is the smallest natural number $k \le n$ for which G has a k coloring or K(G). Then the chromatic information content $H_c(G)$ of G is defined by

$$H_c(G) = -\sum_{i=1}^{k} p_i \log p_i \qquad \text{(A12)}$$

where k is the number of equivalent classes or k coloring in G used as entropy classes, $p_i = k_i/m$, $k_i$ is the number of arcs in the color class i, and m is the number of arcs in A.

Furthermore, Guthrie's conjecture (Toranzos, 1976) states that if G is a planar graph, then $K(G) \le 4$. In other words, with only four colors, it is possible to color the arcs of any graph G. Then, from the chromatic information content equation A12, and setting $k_i = m/4$, $H_c(G) \le 2$.

For example, if we have a node in M with three arcs coming in or going out of A, then we can use three color classes as a minimum number, and then $H_c(3) = -3(1/3)\log(1/3) = 1.585$ is its chromatic information content. A node with seven arcs can use three color classes (3), (3), and (1); then $H_c(7) = -(2(3/7)\log(3/7) + (1/7)\log(1/7)) = 1.449$.

ACKNOWLEDGMENTS

I wish to thank Professor Wilfrido Fiallo for his insightful suggestions and his important assistance that have helped to improve this work.

REFERENCES

Baase, S., Computer Algorithms: Introduction to Design and Analysis, Addison-Wesley, 1988.
Basili, V. R., Selby, R. W., and Hutchens, D. H., Experimentation in Software Engineering, IEEE Trans. Software Eng. SE-12, 733-743 (1986).
Basili, V. R., and Rombach, H. D., Implementing Quantitative SQA: A Practical Model, IEEE Software, September 1987.
Beizer, B., Software Testing Techniques, Van Nostrand Reinhold, New York, 1990.
Belady, B. L. A., Software geometry, in Proceedings of the 1980 International Computer Symposium, 1980.
Boehm, B. W., Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, 1981.
Côté, V., Bourque, P., Oligny, S., and Rivard, N., Software Metrics: An Overview of Recent Results, J. Syst. Software 8, 121-131 (1988).
Coulter, N. S., Software Science and Cognitive Psychology, IEEE Trans. Software Eng. SE-9, 166-171 (1983).
Curtis, B., Foundations for a Measurement Discipline, Quality Time, IEEE Software (1987).
Davis, J. S., and LeBlanc, R. J., A Study of the Applicability of Complexity Measures, IEEE Trans. Software Eng. SE-14, 1366-1372 (1988).
Dijkstra, E. W., Dahl, O. J., and Hoare, C. A., Structured Programming, Academic Press, New York, 1970.
Estes, W. K., Models of Learning, Memory and Choice: Selected Papers, Praeger Publishers, 1982, pp. 23-24.
Fisher, J. L., Application-Oriented Algebra: An Introduction to Discrete Mathematics, Harper and Row, 1977.
Halstead, M. H., Toward a theoretical basis for estimating programming effort, in Proceedings of the ACM Conference, 1975.
Halstead, M. H., Elements of Software Science, Elsevier North-Holland, New York, 1977.
Harrison, W., An Entropy-Based Measure of Software Complexity, IEEE Trans. Software Eng. 18 (1992).
Harrison, W., and Magel, K., A Complexity Measure Based on Nesting Level, SIGPLAN Not. 63-74 (1981).
Henry, S., and Selig, C., Predicting Source Code Complexity at the Design Stage, IEEE Software 7 (1990).
Hopcroft, J. E., and Ullman, J. D., Formal Languages and Their Relation to Automata, Addison-Wesley, 1969.
Jensen, H. A., and Vairavan, K., An Experimental Study of Software Metrics for Real-Time Software, IEEE Trans. Software Eng. SE-11, 231-234 (1985).
Khoshgoftaar, T. M., Bhattacharya, B. B., and Richardson, G. D., Predicting Software Errors, During Development, Using Nonlinear Regression Models: A Comparative Study, IEEE Trans. Reliability 41 (1992).
Lew, K. S., Dillon, T. S., and Forward, K. E., Software Complexity and Its Impact on Software Reliability, IEEE Trans. Software Eng. SE-14, 1645-1655 (1988).
Lipschutz, S., General Topology, McGraw-Hill, 1970, pp. 111-119.
McCabe, T. J., A Complexity Measure, IEEE Trans. Software Eng. SE-2, 308-320 (1976).
Miller, G. A., The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychol. Rev. 63 (1956).
Mills, H. D., Mathematical Foundations for Structured Programming, IBM Report FSC 72-6012, 1972.
Mood, A. M., Graybill, F. A., and Boes, D. C., Introduction to the Theory of Statistics, McGraw-Hill, 1974.
Mostow, G. D., and Sampson, J. H., Linear Algebra, McGraw-Hill, 1969, pp. 259-260.
Munson, J. C., and Khoshgoftaar, T. M., Measuring Dynamic Program Complexity, IEEE Software (1992).
Musa, J. D., Tools for Measuring Software Reliability, IEEE Spectrum 39-42 (1989).
Pruett, J. M., and Hotard, D. G., The Value of Value Analysis, Indust. Manag. (1984).
Ramamurthy, B., and Melton, A., A Synthesis of Software Science Measures and Cyclomatic Number, IEEE Trans. Software Eng. SE-14 (1988).
Scudder, R. A., and Kucie, A. R., Productivity Measures for Information Systems, Infor. Manage. 20 (1991).
Sethi, R., Programming Languages: Concepts and Constructs, Addison-Wesley, 1989, pp. 383-388.
Shannon, C., A Mathematical Theory of Communication, Bell Syst. Tech. J. 27, 379-423, 623-654 (1948).
Shneiderman, B., Software Psychology, Winthrop, Cambridge, MA, 1980.
Shooman, M. L., Software Engineering: Design, Reliability and Management, McGraw-Hill, New York, 1983, pp. 150-151.
Siegrist, K., Reliability of Systems with Markov Transfer of Control, II, IEEE Trans. Software Eng. SE-14 (1988).
Sneed, H. M., and Merey, A., Automated Software Quality Assurance, IEEE Trans. Software Eng. SE-11, 909-916 (1985).
Szulewski, P. A., Sodano, N. M., Rosner, A. J., and DeWolf, J. B., Automating Software Design Metrics, RADC-TR-84-27, Rome Air Development Center, Rome, NY, 1984.
Taguchi, G., and Clausing, D., Robust Quality, Harvard Bus. Rev. (1990).
Toranzos, F. A., Introducción a la Teoría de Grafos, The General Secretariat of the Organization of American States (OAS), Washington, D.C., 1976, p. 53.
Weyuker, E. J., Evaluating Software Complexity Measures, IEEE Trans. Software Eng. SE-14 (1988).
Wirth, N., Program Development by Stepwise Refinement, Commun. ACM (1971).
Wirth, N., Algorithms + Data Structures = Programs, Prentice-Hall, 1976.
Woodfield, S. N., Enhanced Effort Estimation by Extending Basic Programming Model to Include Modularity Factors, Ph.D. Thesis, Department of Computer Science, Purdue University, 1980.
Zage, W. M., Zage, D. M., Bhargava, M., and Gaumer, D., Design and Code Metrics Through a Diana Based Tool, SECC-TR-109-P, Software Engineering Research Center, Purdue University, 1992.
