Automatic Generation of Floating-Point Test Data


IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-2, NO. 3, SEPTEMBER 1976


WEBB MILLER AND DAVID L. SPOONER

Abstract: For numerical programs, or more generally for programs with floating-point data, it may be that large savings of time and storage are made possible by using numerical maximization methods instead of symbolic execution to generate test data. Two examples, a matrix factorization subroutine and a sorting method, illustrate the types of data generation problems that can be successfully treated with such maximization techniques.

Index Terms: Automatic test data generation, branching, data constraints, execution path, software evaluation systems.

Manuscript received September 9, 1975; revised February 23, 1976. This work was supported in part by the National Science Foundation under Grant GJ-42968.
W. Miller is with the Department of Computer Science, Pennsylvania State University, University Park, PA 16802.
D. L. Spooner was with the Department of Computer Science, Pennsylvania State University, University Park, PA 16802. He is now with the Department of Computer Science, Cornell University, Ithaca, NY.

INTRODUCTION

RESEARCH in program evaluation and verification has only rarely (e.g., [1]) begun with the explicit requirement that the program deal with real numbers as opposed to integers. This may be an oversight since there are theoretical results which suggest the desirability of this assumption. Specifically, a general procedure of Tarski [2] shows that certain properties, undecidable (in the technical sense) for "integer" programs, are decidable for "numerical" programs. Examples of this phenomenon arise when one asks if there exists a set of data driving execution of a certain kind of program down a given path.

Moreover, there is practical evidence supporting the case for automatic verification of special properties of numerical programs. Proving "numerical correctness," i.e., verifying a satisfactory level of insensitivity to rounding error, is sometimes much easier than proving that the program performs properly in exact arithmetic. The ideal and contaminated results can often be meaningfully compared with only minimal understanding of the program. Simple, portable, general-purpose software [3], [4] can easily provide answers which have eluded specialists in roundoff analysis. This work [3], [4] also shows the possible advantage of using, e.g., numerical maximization methods to do the verification, avoiding the alternative of using, e.g., computer symbolic manipulation [5], [6].

This paper considers automatic test data generation, a problem which arises in such fields as automatic software evaluation systems [7] and in automatic roundoff analysis [4]. Our contention is that automatic test data generation is sometimes best formulated and solved as a numerical maximization problem. The reader should be warned that our scheme is only "heuristic" in that it is not guaranteed to produce a set of test data executing a given path whenever such data exist. (On the other hand, we know of no guaranteed data generation scheme whose execution time does not, in the worst case, grow at least exponentially with the length of the execution path.)

NUMERICAL MAXIMIZATION METHODS FOR GENERATING TEST DATA

Given the problem of generating floating-point test data, our approach begins by fixing all integer parameters of the given program (e.g., the dimensions of the data in a matrix program or the number of iterations in an iterative method) so that the only unresolved decisions controlling program flow are comparisons involving real values. Then, as will be seen, an execution path takes the form of a straight-line program of floating-point assignment statements interspersed with "path constraints" of the form $c_i = 0$, $c_i > 0$, or $c_i \ge 0$. Each $c_i$ is a data-dependent real value possibly defined in terms of previously computed results. For instance, a path which takes the true branch of a test "IF(X.NE.Y)" has a constraint c > 0, where, e.g., c = ABS(X - Y) or c = (X - Y)**2. (We will not discuss in any detail the philosophical and practical difficulties associated with equality tests when computation is contaminated by rounding error. Nor will we consider the problem of (automatically or manually) generating the straight-line program; we have nothing new to add on this subject.)

The situation is clarified by an example. Consider the following subprogram of Moler [8].

      SUBROUTINE DECOMP(N,NDIM,A,IP)
      REAL A(NDIM,NDIM),T
      INTEGER IP(NDIM)
C
C  MATRIX TRIANGULARIZATION BY GAUSSIAN ELIMINATION.
C  INPUT..
C     N = ORDER OF MATRIX.
C     NDIM = DECLARED DIMENSION OF ARRAY A.
C     A = MATRIX TO BE TRIANGULARIZED.
C  OUTPUT..
C     A(I,J), I.LE.J = UPPER TRIANGULAR FACTOR, U.
C     A(I,J), I.GT.J = MULTIPLIERS = LOWER TRIANGULAR FACTOR, I-L.
C     IP(K), K.LT.N = INDEX OF K-TH PIVOT ROW.
C     IP(N) = (-1)**(NUMBER OF INTERCHANGES) OR 0.
C  USE 'SOLVE' TO OBTAIN SOLUTION OF LINEAR SYSTEM.
C  DETERM(A) = IP(N)*A(1,1)*A(2,2)*...*A(N,N).
C  IF IP(N)=0, A IS SINGULAR, SOLVE WILL DIVIDE BY ZERO.
C  INTERCHANGES FINISHED IN U, ONLY PARTLY IN L.
C
      IP(N) = 1
      DO 6 K = 1,N
        IF(K.EQ.N) GO TO 5
        KP1 = K+1
        M = K
        DO 1 I = KP1,N
          IF(ABS(A(I,K)).GT.ABS(A(M,K))) M = I
    1   CONTINUE
        IP(K) = M
        IF(M.NE.K) IP(N) = -IP(N)
        T = A(M,K)
        A(M,K) = A(K,K)
        A(K,K) = T
        IF(T.EQ.0.) GO TO 5
        DO 2 I = KP1,N
    2     A(I,K) = -A(I,K)/T
        DO 4 J = KP1,N
          T = A(M,J)
          A(M,J) = A(K,J)
          A(K,J) = T
          IF(T.EQ.0.) GO TO 4
          DO 3 I = KP1,N
    3       A(I,J) = A(I,J) + A(I,K)*T
    4   CONTINUE
    5   IF(A(K,K).EQ.0.) IP(N) = 0
    6 CONTINUE
      RETURN
      END

The only integer parameter influencing program flow is the matrix order N. NDIM does not enter in executable statements and IP is taken to be originally undefined. To determine a path we need to fix N and fix the result of each floating-point comparison (notice that the result of the test IF(M.NE.K) is completely determined by the tests in DO loop 1). Specifically, let us set N = 3, always take the true branch of the IF statement inside DO loop 1 and take the false branch in all other floating-point comparisons. Integer operations on this path are performed when the following straight-line program is being constructed, but then become irrelevant and hence are omitted. The important point is that a 3 × 3 matrix A drives Moler's program down our chosen path if and only if A makes $c_1, \ldots, c_{11}$ positive.

      c1 = ABS(A(2,1)) - ABS(A(1,1)) > 0
      c2 = ABS(A(3,1)) - ABS(A(2,1)) > 0
      T = A(3,1)
      A(3,1) = A(1,1)
      A(1,1) = T
      c3 = ABS(T) > 0
      A(2,1) = -A(2,1)/T
      A(3,1) = -A(3,1)/T
      T = A(3,2)
      A(3,2) = A(1,2)
      A(1,2) = T
      c4 = ABS(T) > 0
      A(2,2) = A(2,2) + A(2,1)*T
      A(3,2) = A(3,2) + A(3,1)*T
      T = A(3,3)
      A(3,3) = A(1,3)
      A(1,3) = T
      c5 = ABS(T) > 0
      A(2,3) = A(2,3) + A(2,1)*T
      A(3,3) = A(3,3) + A(3,1)*T
      c6 = ABS(A(1,1)) > 0
      c7 = ABS(A(3,2)) - ABS(A(2,2)) > 0
      T = A(3,2)
      A(3,2) = A(2,2)
      A(2,2) = T
      c8 = ABS(T) > 0
      A(3,2) = -A(3,2)/T
      T = A(3,3)
      A(3,3) = A(2,3)
      A(2,3) = T
      c9 = ABS(T) > 0
      A(3,3) = A(3,3) + A(3,2)*T
      c10 = ABS(A(2,2)) > 0
      c11 = ABS(A(3,3)) > 0

One method of test data generation [9]-[11] begins with symbolic execution of the program to find explicit representations for the $c_i$ in terms of the data. For instance, to write $c_7$ in this form we express the recomputed values A(2,2) and A(3,2) in terms of the original A(I,J), getting

$$c_7 = \mathrm{ABS}\bigl(A(1,2) - A(1,1)\,A(3,2)\,A(3,1)^{-1}\bigr) - \mathrm{ABS}\bigl(A(2,2) - A(2,1)\,A(3,2)\,A(3,1)^{-1}\bigr).$$

An inequality-solving method or a numerical "hill-climbing" routine can then be applied to seek a simultaneous solution for the inequalities. See [9] for details.

However, the program itself presumably gives a very efficient procedure for numerical evaluation of the $c_i$. This efficiency may well be obliterated if any algebraic "simplification" is attempted on the symbolic constraints. This is one of the reasons we would like to propose an alternative scheme which works directly with the straight-line program, thereby also removing the need for algebraic manipulation and providing an approach which extends without difficulty to programs involving, e.g., trigonometric or exponential functions.

For notational convenience suppose that the path constraints are of the form $c_i > 0$ for $i = 1, 2, \ldots, m$ and $c_i = 0$ for $i = m + 1, \ldots, k$ (constraints $c_i \ge 0$ are handled much like $c_i > 0$). Pick a continuous, real-valued function $f(c_1, \ldots, c_m)$ such that $f(c_1, \ldots, c_m) < 0$ if at least one $c_i$ is strictly negative and $f(c_1, \ldots, c_m) > 0$ if all $c_i$ are strictly positive (by continuity this implies that $f(c_1, \ldots, c_m) \ge 0$ whenever $c_i \ge 0$ for all $i$). For instance, using the notation $c^- = \min(c, 0)$ pick one of

$$f_\infty(c_1, \ldots, c_m) = \min(c_1, \ldots, c_m)$$

$$f_2(c_1, \ldots, c_m) = \begin{cases} -\Bigl(\sum_{i=1}^{m} (c_i^-)^2\Bigr)^{1/2} & \text{if at least one } c_i \text{ is negative} \\ \min(c_1, \ldots, c_m) & \text{if no } c_i \text{ is negative} \end{cases}$$

$$f_1(c_1, \ldots, c_m) = \begin{cases} \sum_{i=1}^{m} c_i^- & \text{if at least one } c_i \text{ is negative} \\ \min(c_1, \ldots, c_m) & \text{if no } c_i \text{ is negative.} \end{cases}$$
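The three candidate objective functions can be sketched in a few lines of Python. This is only an illustration: the names `neg_part`, `f_inf`, `f_1`, and `f_2` are not from the paper, and `f_2` is assumed here to be the negated Euclidean norm of the negative parts, which is the reading suggested by its subscript. Each function is negative when some constraint value is negative and equals the smallest constraint value otherwise.

```python
import math


def neg_part(c):
    """Negative part c^- = min(c, 0)."""
    return min(c, 0.0)


def f_inf(cs):
    """Smallest constraint value."""
    return min(cs)


def f_1(cs):
    """Sum of negative parts if any c_i < 0, else the smallest c_i."""
    if any(c < 0 for c in cs):
        return sum(neg_part(c) for c in cs)
    return min(cs)


def f_2(cs):
    """Negated Euclidean norm of negative parts if any c_i < 0 (assumed form),
    else the smallest c_i."""
    if any(c < 0 for c in cs):
        return -math.sqrt(sum(neg_part(c) ** 2 for c in cs))
    return min(cs)


# Hypothetical constraint values, two of which violate c_i > 0.
sample = [-2.0, 0.0, 1.0, -1.0]
print(f_inf(sample), f_1(sample), f_2(sample))
```

All three agree on feasible points, so they differ only in how they score violated constraints: `f_inf` looks at the worst violation alone, while `f_1` and `f_2` respond to every violated constraint.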

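Because every constraint value is obtained by simply running the straight-line program, the path through DECOMP chosen above (N = 3, true branch in DO loop 1, false branch elsewhere) can be transcribed almost statement for statement. The following Python sketch is an illustration only: the function name `decomp_path_constraints` and the dictionary representation of the matrix are assumptions, and the last update is written as A(3,3) + A(3,2)*T, which is what statement 3 of DECOMP computes for I = J = 3 and K = 2.

```python
def decomp_path_constraints(matrix):
    """Run the straight-line program for the chosen DECOMP path on a 3x3
    matrix (list of rows) and return [c1, ..., c11].

    A zero pivot T raises ZeroDivisionError; such data violate c3 or c8 anyway.
    """
    # 1-based (row, column) keys mirror the Fortran subscripts A(I,J).
    A = {(i + 1, j + 1): float(matrix[i][j]) for i in range(3) for j in range(3)}
    c = []

    # K = 1: both pivot-search tests in DO loop 1 are true, so M = 3.
    c.append(abs(A[2, 1]) - abs(A[1, 1]))   # c1
    c.append(abs(A[3, 1]) - abs(A[2, 1]))   # c2
    t = A[3, 1]
    A[3, 1] = A[1, 1]
    A[1, 1] = t
    c.append(abs(t))                        # c3
    A[2, 1] = -A[2, 1] / t
    A[3, 1] = -A[3, 1] / t
    t = A[3, 2]
    A[3, 2] = A[1, 2]
    A[1, 2] = t
    c.append(abs(t))                        # c4
    A[2, 2] += A[2, 1] * t
    A[3, 2] += A[3, 1] * t
    t = A[3, 3]
    A[3, 3] = A[1, 3]
    A[1, 3] = t
    c.append(abs(t))                        # c5
    A[2, 3] += A[2, 1] * t
    A[3, 3] += A[3, 1] * t
    c.append(abs(A[1, 1]))                  # c6

    # K = 2: the single pivot-search test is true, so M = 3.
    c.append(abs(A[3, 2]) - abs(A[2, 2]))   # c7
    t = A[3, 2]
    A[3, 2] = A[2, 2]
    A[2, 2] = t
    c.append(abs(t))                        # c8
    A[3, 2] = -A[3, 2] / t
    t = A[3, 3]
    A[3, 3] = A[2, 3]
    A[2, 3] = t
    c.append(abs(t))                        # c9
    A[3, 3] += A[3, 2] * t
    c.append(abs(A[2, 2]))                  # c10

    # K = 3: only the final singularity test remains.
    c.append(abs(A[3, 3]))                  # c11
    return c


# Illustrative input: diagonal 3, 4, 5 with ones elsewhere.
cs = decomp_path_constraints([[3, 1, 1], [1, 4, 1], [1, 1, 5]])
print(cs)
print(min(cs))  # -2.0, so this matrix does not follow the chosen path
```

No symbolic expression for any constraint is ever formed; one pass through the function costs about as much as one run of DECOMP on a 3 × 3 matrix.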
Let A denote a set of data for the straight-line program. In our current example A is just a 3 × 3 matrix. The values of the $c_i$ are actually functions $c_i(A)$. The problem, then, is to find an A satisfying $f(c_1(A), \ldots, c_m(A)) > 0$ subject to the constraints $c_i(A) = 0$ for $i = m + 1, \ldots, k$. We propose that one search for such an A by picking a random initial set $A_0$ of data and then applying numerical techniques for constrained maximization [12], stopping when f becomes positive. Values of $f(c_1(A), \ldots, c_m(A))$ are computed by executing (directly or interpretively) the straight-line program to get the $c_i(A)$. Since f may be nondifferentiable, and since the maximizer will generally not be operating near local maxima, we suggest using one of the better "direct search" methods [13], [14].

For example, we tried finding a matrix A for the above path in Moler's subprogram. The initial data we chose were

$$A_0 = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 5 \end{pmatrix}.$$

Note that $c_1(A_0) = -2$, $c_2(A_0) = 0$ and $c_7(A_0) = -1$. Since the other $c_i$ are never strictly negative, $f_\infty(c_1(A_0), \ldots, c_{11}(A_0)) = -2$. With the choice $f = f_\infty$, our maximizer (a variant of Rosenbrock's method [13]) found

$$A_+ = \begin{pmatrix} 0.3857 & 18.62 & 1.0 \\ 0.6268 & -13.865 & 1.0 \\ 1.439 & 1.0 & 5.0 \end{pmatrix}$$

satisfying the path constraints (in fact, $f_\infty(c_1(A_+), \ldots, c_{11}(A_+)) = 0.2411$) in a fraction of a second of CPU time on our IBM 370/168. Another trial with $f = f_1$ took even less time.

HEAPSORT EXAMPLE

To further illustrate our proposed method of test data generation, let us consider a pair of Fortran subroutines modeled after the Heapsort algorithm of Aho, Hopcroft, and Ullman [15, pp. 87-92].

C  HEAPSORT
C
      SUBROUTINE HSORT(N,A)
      DIMENSION A(N)
C
C  BUILD THE HEAP.
      DO 10 IBACK = 1,N
        I = N - IBACK + 1
        CALL HEAPFY(A,I,N,N)
   10 CONTINUE
C
C  SORT THE HEAP.
      DO 20 IBACK = 2,N
        I = N - IBACK + 2
        TEMP = A(I)
        A(I) = A(1)
        A(1) = TEMP
        CALL HEAPFY(A,1,(I - 1),N)
   20 CONTINUE
      RETURN
      END
C
C  HEAPIFY
      SUBROUTINE HEAPFY(A,I,L,N)
      DIMENSION A(N)
      II = I
   10 CONTINUE
      K = 2*II
      IF(K.GT.L) GO TO 30
      IF(K + 1.GT.L) GO TO 20
      IF(A(K + 1).GT.A(K)) K = K + 1
   20 IF(A(II).GT.A(K)) GO TO 30
      TEMP = A(II)
      A(II) = A(K)
      A(K) = TEMP
      II = K
      GO TO 10
   30 RETURN
      END

N, the size of the array to be sorted, is the only integer parameter that needs to be set. All other integer variables are defined in terms of N. The only IF statements involving integer variables appear in subroutine HEAPFY and are used to determine if a particular node is a leaf in the heap. In the final straight-line program these branches will also disappear as they too depend only on N.

We selected the program path which Heapsort's execution follows when N = 30 and when the array A happens to be initially sorted. The resulting straight-line program

      c1 = A(30) - A(15) > 0
      TEMP = A(15)
      A(15) = A(30)
      A(30) = TEMP
      c2 = A(29) - A(28) > 0
      c3 = A(29) - A(14) > 0
      TEMP = A(14)
      ...

involves 211 constraints $c_i > 0$ and 408 assignment statements, but no arithmetic operations (we have replaced "≥" constraints by ">"; however, any A making all $c_i$ strictly positive satisfies the path constraints a fortiori).

We applied the maximizer to $f_2(c_1, c_2, \ldots, c_{211})$, $f_2$ defined as above, trying three initial sets of data:

$$A^{[1]} = (30.0, 29.0, 28.0, \ldots, 1.0),$$
$$A^{[2]} = (30.0, 1.0, 29.0, 2.0, 28.0, 3.0, \ldots), \text{ and}$$
$$A^{[3]} = (30.0, 28.0, 26.0, \ldots, 4.0, 2.0, 1.0, 3.0, 5.0, \ldots, 29.0).$$

Sets of test data causing the path to be executed were located by the maximizer after 1147, 541, and 522 evaluations of $f_2$, respectively. A randomly chosen array A(1), ..., A(30) stands only one chance in 30! ($\approx 2.65 \times 10^{32}$) of driving execution down this path, so a successful search procedure which tries only about 1000 arrays A is reasonably efficient. (Execution times for the three numerical searches ranged
between four and seven seconds of CPU time on our IBM 370/168, at ten cents per second. However, this is in some ways pessimistic since we neither took the trouble nor had the appropriate software to explicitly generate the straight-line program. Instead, we essentially executed the Fortran program each of the, e.g., 1147 times to determine the constraints and the assignment statements along the given path.)

ACKNOWLEDGMENT

The authors wish to thank the referee who pointed out reference [9] and made other helpful suggestions. The final form of our Heapsort example was prompted by J. King's informal conjecture that our methods are not much more efficient than random generation of test data until a set is found which causes the given path to be executed.

REFERENCES

[1] T. Hull et al., "The correctness of numerical algorithms," in Proc. ACM Conf. Proving Assertions about Programs, New Mexico State University, Jan. 6-7, 1972.
[2] A. Tarski, A Decision Method for Elementary Algebra and Geometry. Berkeley, CA: University of California Press, 1951.
[3] W. Miller, "Software for roundoff analysis," Ass. Comput. Mach. Trans. Math. Software, vol. 1, pp. 108-128, 1975.
[4] W. Miller and D. Spooner, "Software for roundoff analysis, II," to be published in Ass. Comput. Mach. Trans. Math. Software.
[5] W. Kahan, "One numerical analyst's experience with one symbol manipulator," SIAM Rev., vol. 16, p. 129, 1974.
[6] D. Stoutemyer, "Automatic error analysis using computer algebraic manipulation," submitted for publication.
[7] C. Ramamoorthy and S.-B. Ho, "Testing large software with automated software evaluation systems," IEEE Trans. Software Eng., vol. 1, pp. 46-58, 1975.
[8] C. Moler, "Algorithm 423, linear equation solver," Commun. Ass. Comput. Mach., vol. 15, p. 274, Apr. 1972.
[9] R. Boyer, B. Elspas, and K. Levitt, "SELECT - A formal system for testing and debugging programs by symbolic execution," in Proc. 1975 Int. Conf. Reliable Software; also SIGPLAN Notices, vol. 10, pp. 234-245, June 1975.
[10] L. Clarke, "A system to generate test data and symbolically execute programs," Dept. Comp. Sci., Univ. of Colorado, Rep. CU-CS-060-75, Feb. 1975.
[11] J. King, "Symbolic execution and program testing," submitted for publication.
[12] P. Gill and W. Murray, Ed., Numerical Methods for Constrained Optimization. New York: Academic, 1974.
[13] W. Swann, "Direct search methods," in Numerical Methods for Unconstrained Optimization, W. Murray, Ed. New York: Academic, 1972, pp. 13-28.
[14] W. Swann, "Constrained optimization by direct search," in Numerical Methods for Unconstrained Optimization, W. Murray, Ed. New York: Academic, 1972, pp. 191-217.
[15] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer Algorithms. Reading, MA: Addison-Wesley, 1974.

Webb Miller was born in Walla Walla, WA, on November 30, 1943. He received the B.S. degree in mathematics from Whitman College, Walla Walla, WA, in 1966 and the Ph.D. degree in mathematics from the University of Washington, Seattle, in 1969.
He is currently an Associate Professor in the Department of Computer Science, Pennsylvania State University, State College, and is trying to find time to pursue his interests in rounding error analysis and computational complexity.

David L. Spooner was born in State College, PA, on April 13, 1953. He received the B.S. degree in computer science from Pennsylvania State University, University Park, PA, in 1975.
He is currently a graduate student at Cornell University, Ithaca, NY. His major interests are in the areas of programming languages and compiler design.
Mr. Spooner is a member of the Association for Computing Machinery, Phi Kappa Phi, and Upsilon Pi Epsilon.
