Application of Artificial Bee Colony Algorithm to Software Testing

Surender Singh Dahiya (a), Jitender Kumar Chhabra (b), Shakti Kumar (c)

(a, b) National Institute of Technology, Kurukshetra (INDIA)
(c) Institute of Science & Technology, Kalawad, Yamunanagar (INDIA)
surendahiya@gmail.com, jitenderchhabra@rediffmail.com, shaktik@gmail.com

Abstract

This paper presents a novel artificial bee colony based search technique for the automatic generation of structural software tests. Test cases are generated symbolically by measuring the fitness of individuals with a branch distance based objective function. The test generator was evaluated on ten real world programs, some of which have large ranges for their input variables. The results show that the new technique is a reasonable alternative for test data generation, but that it does not perform well for large input domains or for paths whose constraints include many equality constraints.

Key Words: Automatic test data generation; Artificial Bee Colony (ABC) Algorithm; Swarm intelligence; Symbolic testing; Soft computing; Search Algorithm.

1. Introduction

Test data generation is the central activity in software testing. Besides saving precious development cost, automating test data generation also aims to produce unbiased, effective and efficient test data. The highly non-linear structure of software challenges a search algorithm that must find optimal and efficient test data in a complex, discontinuous, non-linear input search space. In such an environment, a search algorithm must have both local and global search capabilities. Several computational intelligence based search algorithms, such as genetic algorithms, simulated annealing and tabu search, have been used in the past to meet the requirements of software testing and to improve the quality of automated test data generation [6, 19].

This paper presents an Artificial Bee Colony (ABC) based search algorithm that generates test data using the symbolic execution method. Ten real world programs are used for experimentation. The paper is organized as follows. Section 2 describes various search techniques used for test data generation. Section 3 presents software testing as a search based problem. Section 4 explains the general principles of ABC and its application to the test data generation problem. Section 5 presents the experimental setup and its results, followed by the conclusion in Section 6.

2. Related Works

In test data generation, the primary objective of a search algorithm is to find the set of input data from the input domains that reveals the maximum number of faults in the software. Several traditional search techniques, such as random testing [8, 9, 10] and algorithmic search approaches [13, 14, 21], have been applied by researchers, but they are restricted by their excessively slow search. Their performance also deteriorates significantly for large and complex programs, especially where the input domain is large [23].

The most successful class of search algorithms is based on metaheuristic techniques such as Genetic Algorithm (GA), Simulated Annealing (SA), tabu search, Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO). Xanthakis [25] was the first to apply GA to automatic test case generation. Pargas et al [24] proposed a GA based testing technique in which the number of executed control-dependent nodes of the target node decides the fitness of the solutions in the population. Wegener et al [1] used a logarithmic objective function to provide better guidance for their GA based test data generator. Watkins [22] and Roper [26] used coverage based criteria to assess the fitness of individuals in their GA based test generators. Lin and Yeh [15] used a Hamming distance based metric in the objective function of their GA program to measure the similarity and distance between the actual path and the already selected target path in dynamic testing. Michael et al [20] used a GA based testing method to cover all the conditions on a path for C and C++ programs.

Tracey [16] constructed an SA based test data generator for safety critical systems using a hybrid objective function that combines branch distance with the number of executed control-dependent nodes. Diaz et al [5] developed a tabu search based test data generator, which maintains a search list, also called a
tabu list. It uses neighbourhood information and backtracking to escape local optima. Ayari et al [11] proposed an evolutionary approach based on ACO to reduce the cost of test data generation in the context of mutation testing. This ACO based approach is enhanced by a probability density estimation technique in order to provide better guidance to the search for continuous input parameters.

Windisch et al [2] reported the application of the PSO method to test data generation for dynamic testing. The authors of [28] proposed another PSO based algorithm for automatic test case generation using symbolic testing. The approach was validated and compared with GA, and was found to be a promising alternative for test case generation.

Another recent search algorithm in the swarm intelligence category is the Artificial Bee Colony algorithm, which simulates how honey bees organize food foraging and nectar gathering. Although this technique has been successfully employed in scores of engineering applications such as internet server allocation [29], pattern recognition [31], job scheduling [12] and data clustering [30], its applicability in the testing domain is still unexplored.

3. Software testing as Search problem

Both functional and structural testing criteria can be employed for the generation of test data. In structural testing, these criteria range from all-statements execution to all-paths coverage [7]. The all-paths coverage criterion is concerned with the execution of all feasible paths by the generated test data sets. In path testing, each feasible path is selected from the control flow graph (CFG) of the program, and test inputs are then generated such that executing the program with them covers all the branches in that path. In other words, to cover a particular branch, the condition(s) at the branch node must be satisfied by the test data, which directs the control flow of the program to that branch. A path may contain several branches, and in order to execute that path, all these branch conditions must be evaluated to true by the test data. Consequently, the problem of path testing can be formulated as a condition or constraint satisfaction problem, which is solved by a search method that generates inputs satisfying all of the path constraints. This analysis can be dynamic as well as static. In dynamic analysis, the program is actually executed with the values of the input variables, and the objective function then determines the extent to which the testing criterion has been satisfied, which becomes the fitness of the set of values (also called a test case). Static analysis, on the other hand, does not require actual execution of the program; instead it symbolically executes a testing path identified from the CFG of the program. A valid test case is one that executes the particular path by satisfying all of the boolean expressions included in that path. All such expressions on the path are concatenated to generate a composite predicate. Internal variables are expressed in terms of the basic inputs and substituted into the constraint system during concatenation, since they may affect the execution criterion of the remaining path.

From the above discussion, the test case generation problem can be viewed as the application of a search algorithm to determine values of the program's input variables, drawn from the input space, that satisfy some given criterion. Even a moderate program may have a very large input space, and generating test cases from such a space is regarded as an NP-hard [27] or NP-complete [17] problem; it is therefore a natural case for employing a good search and optimization algorithm.

Figure 1. Block diagram of automatic test data generator for symbolic path testing.

Figure 1 shows the building blocks of a path based automatic test data generator for symbolic testing. First, the source code of the test object is fed to program instrumentation, which generates the CFG and the node expressions. Next, the CFG is used to generate all possible paths, which are filtered manually for feasible paths; these become the input to the constraint system generator. Finally, the constraint system of each feasible path is solved by a search algorithm that draws inputs from the input domains; the solutions become valid test data.

4. ABC and Test data generation

4.1 Search algorithm

The ABC algorithm is a biologically inspired swarm intelligence technique for searching. It models honey
bees' work distribution and collective foraging strategy, by which they accumulate extra nectar to survive the winter season. Seeley [4] investigated the behavior of bees in distributing their work to optimize the collection of nectar. Instead of all bees initiating exploration, some dedicated explorer bees (scout bees) are appointed to explore the "profitability" of flower patches in the surrounding environment. This profitability accounts for various parameters such as the amount of nectar in the flower patches, the sugar content of the nectar and the distance of the flower patches from the beehive. If an explorer bee is satisfied that a patch is sufficiently profitable, it recruits unloaders to unload the nectar it has collected during exploration and dances (the waggle dance) on the dance floor (a designated place in the beehive) to give feedback to the foragers (observer or onlooker bees, which actually collect nectar from the patches) about the quality of the flower patch it has recently found. The strength of the dance and its inclination with respect to the Sun determine the distance and the direction of the designated flower patch from the beehive. The working of a honeybee colony is reported as robust and adaptive by [3], which motivates us to use it for path testing. Figure 2 gives the pseudo code of the honey bee algorithm.

1. Initialize the random population of solutions (flower patch positions) $x_i$
2. Evaluate the population
3. Produce new solutions $v_i$ in the neighborhood of $x_i$ for the employed bees by using the following equation:
   $v_{ij} = x_{ij} + \phi_{ij}\,(x_{ij} - x_{kj})$    (1)
   where $\phi_{ij}$ is a random number between 0 and 1 and $x_k$ is a randomly selected solution.
4. Apply the greedy selection process between $x_i$ and $v_i$.
5. Calculate the probability values $p_i$ for the solutions $x_i$ by means of their fitness values:
   $p_i = fit_i / \sum_{n} fit_n$    (2)
   where $fit_i$ is the fitness of solution $x_i$ and the sum runs over all solutions.
6. Produce new solutions (new positions) $v_i$ for the onlookers from the solutions $x_i$, depending on probability $p_i$, and evaluate them.
7. Apply the greedy selection process between the new and old solutions.
8. Determine the abandoned solution (source), if one exists, and replace it with a new randomly produced solution for the scout.
9. Memorize the best food source position (solution) achieved so far.
10. Repeat steps 3 to 9 until the stopping criterion is reached.
Figure 2. Artificial Bee Colony algorithm

For test data generation, a random population of candidate solutions is initially generated from the input domains. In ABC the solutions are represented by the positions of flower patches. The optimum positions of the flower patches are searched for such that they satisfy the constraint system of the targeted path. For each flower patch, its profitability is measured; in the computer model this profitability is represented by the fitness of the position. The ABC algorithm works in three phases. The first is the employed phase, in which employed bees modify the positions of elite flower patches (those with higher profitability) within their neighborhood. The second phase is executed by the onlookers, who modify their patches' positions under the influence of the elite positions; these positions are known as selected patches. After every phase a greedy selection process is repeated, in which solutions (flower patch positions) compete for retention among the elite or selected patches based on their fitness. In this process some sources may migrate from one category to another, and some may be abandoned in favor of randomly generated sources; the latter is simulated by the scout phase of the algorithm. The employed, onlooker and scout phases are repeated in cycles until the stopping criterion is met. The ABC algorithm offers little flexibility for tuning its parameters other than the size of the colony and the number of bees allocated to the elite patches. For test case generation, we use a colony of size 30, half of which work as employed bees for the elite patches.

4.2 Fitness Function

For the path testing criterion, in order to traverse a feasible path, the control must satisfy all the branch predicates that fall on that path. In our experiments we have used the symbolic execution technique of static structural testing. For each path, a compound predicate (CP) is formed by 'anding' the branch predicates of the path. The CP must be evaluated to true by a candidate solution for it to become a valid test case. The ABC generates a population of candidate solutions, which are used to evaluate the CP. If the CP is not evaluated to true by an individual, all the constraints of the path are split into distinct predicates (DPs), and each DP is evaluated in turn by taking the values of its operands from the candidate solution. A DP contains only one operator (a constraint with the modulus operator is an exception) and can be expressed in the form A op B, where A and B are the LHS and RHS of the expression, each made of one or more operands, and op is a relational operator. If a DP is satisfied, no penalty is imposed on the candidate solution; otherwise the candidate solution is penalized according to the branch distance rules shown in Table 1, which are also recommended by Watkins et al [18] for static structural testing.

The integrated fitness of the whole CP is then determined by adding the penalty values of two DPs if they are connected by a conditional 'and' operator. If two DPs are connected by a conditional 'or' operator, then the
minimum of the two DPs' penalties is considered for the evaluation of the whole CP's fitness. If the integrated fitness is zero, the CP is said to be satisfied by the individual whose values were substituted into the CP, and the search process for that path is terminated; otherwise the search is allowed to proceed further.

Table 1. Branch Predicate based Fitness function

Violated distinct predicate    Penalty to be imposed in case predicate is not satisfied
A < B                          A – B + ζ
A ≤ B                          A – B
A > B                          B – A + ζ
A ≥ B                          B – A
A = B                          abs(A – B)
A ≠ B                          ζ – abs(A – B)

A and B are operands and ζ is the smallest constant of the operands' universal domains. For integers it is 1, and for real values it can be 0.1 or 0.01 depending on the accuracy needed in the solution.

5. Experimental Setup and Results

To assess the worthiness of the ABC method for test data generation, we experimented on ten real world problems. The aim of the experiment was to generate test cases automatically from the corresponding CFG using the standard ABC algorithm. The CFGs of the programs are constructed automatically from the respective source code, and all feasible paths are identified manually. The fitness function corresponding to the target path is constructed using symbolic testing and the path constraint system, as described in Section 4.2. The ABC algorithm is implemented in the MATLAB programming environment. The performance of the algorithm is measured using the Average Test Cases generated Per Path (ATCPP) and Average Percentage Coverage (APC) metrics. The experiment is conducted 10 times to average the results. In each attempt, ABC is iterated for 100 generations in each of 10 runs. In each run except the first, the first-generation population is seeded with the best solution from the previous run. This is done to check premature convergence of the population. The total number of real-encoded individuals in each population is 30. If no solution is found within all runs, which together generate 30,000 invalid test cases, test case generation is declared to have failed for that attempt. This value is obtained by multiplying the number of runs, the number of generations and the number of individuals in each population (10 × 100 × 30 = 30,000). An invalid test case is a solution that does not qualify to become a test case.

We have chosen 10 real world programs for test data generation, some of which are frequently used by researchers. They are called test objects here, and a brief explanation of each is given below.

1. Triangle classifier (TC) is one of the most used programs for experiments on test data generation in structural testing. It accepts three inputs as the sides of a triangle and decides whether these sides form a triangle and, if so, of what type. This program contains 7 feasible paths in total, of which four involve equality constraints.

2. Line-rectangle classifier (LRC) identifies whether a line cuts a rectangle, lies completely outside it or lies completely inside it. The program takes eight inputs in total: four for the co-ordinates of the rectangle and four to define the line. Some of the nodes in the CFG of this program have a very high level of nesting. This is the main reason for using this program, so that the difficulty of testing a nested structure can be assessed.

3. Number of days between two dates (DBTD) accepts six integer input variables representing two dates. The input ranges for the year of the first date and the year of the second date are between 2000 and 2100. This program contains plenty of branches with equality conditions; some of them use the remainder operator, which adds discontinuity to the decision domains, so a tester may face greater difficulty in finding test cases that cover those branches. The nesting level is very high for some of the nodes. These characteristics make this program an ideal one for evaluating the effectiveness and efficiency of an automatic test generator for the path coverage criterion. This program also contains several loops. We have converted the loops into case statements in such a way that each condition within a loop is executed at least once by the test cases and each statement in the loop is covered.

4. Program 'a2f' (A2F) converts a numeric string into a real value. The main reason for choosing this program is its complexity and its nested structure, which make the compound constraint in symbolic testing more complex and hard to satisfy. It takes as input an array of numeric characters. The input domain for each position in the array is 0 to 127, which represents the characters in the ASCII table. This program has 15 decision nodes. The highest nesting level is seven, which is rare in most real world programs. This program also contains a few branches with equality conditions. This program also contains several loops. We
have allowed loops to execute at most three times, thereby limiting the explosion in the number of paths while still giving enough chance for every statement in the loop to execute and for its effect on later iterations of the loop to be traversed.

5. Binary search (BS) accepts a variable size array of at most 80 elements. The loop in the program is allowed to execute at most 5 times.

6. Remainder (REM) finds the remainder of two integer numbers. It also contains 4 loops, which are again restricted to 5 executions.

7. Bubble sort (BUB) arranges an array in ascending order. It accepts a variable size array. It is the only program in this set that contains a nested loop structure.

8. Quadratic equation (QUAD) finds the roots of a quadratic equation. The program also tests the equation for linearity or infeasibility.

9. Min-max (MINMAX) finds the minimum and maximum values in an array. In this program the loop is again allowed to execute at most 5 times.

10. Isprime (ISPRIME) tests an integer for primeness. This is the simplest program in the list.

Detailed characteristics of these test objects are given in Table 2.

Table 2. Test Objects' characteristics

Name of    Lines of   Number of        Cyclomatic   Highest         Total Paths   Feasible
Program    Code       Decision Nodes   Complexity   Nesting Level   in CFG        Paths
TC         35         07               06           05              07            07
LRC        56         19               18           12              17            17
DBTD       123        26               22           05              1643          566
A2F        48         15               14           07              910           568
BS         23         05               04           03              124           62
REM        35         10               08           04              22            22
BUB        21         04               03           03              121           31
QUAD       24         06               05           03              06            06
MINMAX     27         04               03           03              121           121
ISPRIME    16         03               02           02              10            08

Table 3 presents the results of the testing effort on the 10 test objects selected for experimentation. Test cases for the TC and LRC programs are generated by taking inputs from a small as well as a large domain, of size 10^4 and 10^8 respectively, for each path. ABC is able to generate test cases for all paths except in the cases of TC (small as well as large domain), LRC (large domain) and the binary search program. This shows that ABC is poorly suited to large input domains. It also frequently fails to generate test data for TC (small domain) for the path on which it has to prove the triangle equilateral. We can therefore also conclude that the performance of the search algorithm is affected by the number of equality constraints on the target path. Other than these, binary search is the only program for which ABC fails badly to generate test cases. This may be due to the requirement of inputting a variable size array to satisfy the boundary cases. Although we have taken a fixed array of size 80, its effective size is varied through an external variable 'n' during experimentation. We used the same approach for the A2F and BUB programs, but in these the boundary cases are not required to be satisfied.

Table 3. ATCPP and APC for Test Objects

Name of Program       ATCPP    APC
TC (small Domain)     6197     85.71
TC (Large Domain)     17156    42.86
LRC (small Domain)    1255     100
LRC (Large Domain)    3924     89.06
DBTD                  206      100
A2F                   3195     100
BS                    15545    51.94
REM                   970      100
BUB                   258      100
QUAD                  1930     100
MINMAX                619      100
ISPRIME               52       100

6. Conclusion

We have proposed a swarm intelligence based approach for structural software testing. Experiments were carried out on ten real world problems. A static testing based symbolic execution method was used, in which a target path is first selected from the CFG of the program and inputs are then generated using the ABC method to satisfy the composite predicate corresponding to the target path. The technique performed satisfactorily for most of the programs, except for those having large input domains and many equality based path constraints.

REFERENCES

[1] Wegener J, Baresel A and Sthamer H, Evolutionary test environment for automatic structural testing. Information and Software Technology 2001; 43:841–854.
[2] Windisch A, Wappler S and Wegener J, Applying Particle Swarm Optimization to Software Testing. Proceedings of the 2007 Conference on Genetic and Evolutionary Computation (GECCO'07), London, England, United Kingdom, July 7–11, 2007.
[3] Schmickl T, Thenius R and Crailsheim K, Simulating Swarm Intelligence in Honey Bees: Foraging in Differently Fluctuating Environments. GECCO'05, Washington, DC, USA, 273–274, 2005.
[4] Seeley TD, The Wisdom of the Hive. Harvard University Press, Cambridge, MA, 1995.
[5] Díaz E, Javier T, Raquel B and José JD, A tabu search algorithm for structural software testing. Computers and Operations Research (2007), doi:10.1016/j.cor.2007.01.009
[6] Edvardsson J, A survey on automatic test data generation. In Proceedings of the Second Conference on Computer Science and Engineering, Linkoping: ESCEL; October 1999; 21–28.
[7] Frankl PG and Weyuker EJ, An Applicable Family of Data Flow Testing Criteria. IEEE Transactions on Software Engineering 1988; 14(10):1483–1498.
[8] Duran JW and Ntafos S, A Report on Random Testing. Proceedings of the 5th International Conference on Software Engineering, San Diego, California, United States, March 9–12, 1981.
[9] Thayer RA, Lipow M and Nelson EC, Software Reliability. North-Holland, Amsterdam, 1978.
[10] DeMillo RA, Lipton RJ and Sayward FG, Hints on Test Data Selection: Help for the Practicing Programmer. IEEE Computer 1978; 11(4):34–41.
[11] Ayari K, Bouktif S and Antoniol G, Automatic Mutation Test Input Data Generation via Ant Colony. GECCO'07, London, England, United Kingdom, July 7–11, 2007.
[12] Chong CS, Low MYH, Sivakumar AI and Gay KL, A Bee Colony Optimization Algorithm to Job Shop Scheduling. Proceedings of the 37th Winter Simulation Conference, Monterey, California, 1954–1961, 2006.
[13] Korel B, Automated software test data generation. IEEE Transactions on Software Engineering 1990; 16(8):870–879.
[14] DeMillo RA and Offutt AJ, Constraint-based automatic test data generation. IEEE Transactions on Software Engineering 1991; 17(9):900–910.
[15] Lin JC and Yeh PL, Automatic test data generation for path testing using GAs. Information Sciences 2001; 131:47–64.
[16] Tracey N, A Search-Based Automated Test-Data Generation Framework for Safety Critical Software. PhD thesis, University of York, 2000.
[17] Mansour N and Salame M, Data generation for path testing. Software Quality Journal 2004; 12:121–136.
[18] Watkins A and Hufnagel EM, Evolutionary test data generation: a comparison of fitness functions. Software Practice & Experience 2006; 36:95–116.
[19] McMinn P, Search-based Software Test Data Generation: A Survey. Software Testing, Verification and Reliability June 2004; 14(2):105–156.
[20] Michael C, McGraw G and Schatz M, Generating software test data by evolution. IEEE Transactions on Software Engineering 2001; 27(12):1085–1110.
[21] Miller W and Spooner D, Automatic generation of floating-point test data. IEEE Transactions on Software Engineering 1976; 2(3):223–226.
[22] Watkins AL, The automatic generation of test data using genetic algorithms. In The Fourth Software Quality Conference 1995; 2:300–309.
[23] Myers GJ, The Art of Software Testing. New York: Wiley; 1979.
[24] Pargas RP, Harrold MJ and Peck R, Test-data generation using genetic algorithms. Journal of Software Testing, Verification and Reliability 1999; 9(4):263–282.
[25] Xanthakis S, Ellis C, Skourlas C, Gall AL, Katsikas S and Karapoulios K, Application of genetic algorithms to software testing. In The Fifth International Conference on Software Engineering 1992; 625–636.
[26] Roper M, Computer aided software testing using genetic algorithms. In 10th International Software Quality Week, San Francisco, USA, 1997.
[27] Yuan Z, A Search-Based Framework for Automatic Test-Set Generation for MATLAB/Simulink Models. PhD thesis, University of York, Department of Computer Science, December 2005.
[28] Dahiya SS, Chhabra JK and Kumar S, Application of Particle Swarm Optimization Algorithm to Symbolic Software Testing. ADCOM 2009, to be held in Bangalore on 14–17 December 2009. (Communicated for publication)
[29] Nakrani S and Tovey C, On Honey Bees and Dynamic Allocation in an Internet Server Colony. Proceedings of the 2nd International Workshop on the Mathematics and Algorithms of Social Insects, Atlanta, Georgia, USA, 2003.
[30] Pham DT, Otri S, Afify A, Mahmuddin M and Al-Jabbouli H, Data clustering using the Bees Algorithm. In 40th CIRP International Seminar on Manufacturing Systems, Liverpool, 2007.
[31] Pham DT, Otri S, Ghanbarzadeh A and Koç E, Application of the Bees Algorithm to the Training of Learning Vector Quantisation Networks for Control Chart Pattern Recognition. ICTTA'06 Information and Communication Technologies, 1624–1629, 2006.
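
Illustrative sketch. The penalty rules of Table 1 and the employed, onlooker and scout phases of Figure 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the MATLAB implementation used in the experiments: the names `penalty`, `cp_penalty` and `abc_search`, the integer encoding with ζ = 1, the abandonment `limit`, the 1/(1 + penalty) selection weight and the choice of φ in [-1, 1] (the usual ABC convention, whereas Figure 2 states 0 to 1) are all hypothetical choices made for illustration. The example compound predicate is the equilateral path of the triangle classifier.

```python
import random

ZETA = 1  # assumed: smallest step of an integer domain, as in the note to Table 1


def penalty(a, op, b, zeta=ZETA):
    """Branch-distance penalty of one distinct predicate `a op b` (Table 1)."""
    satisfied = {"<": a < b, "<=": a <= b, ">": a > b,
                 ">=": a >= b, "==": a == b, "!=": a != b}[op]
    if satisfied:
        return 0  # a satisfied predicate carries no penalty
    return {"<": a - b + zeta, "<=": a - b,
            ">": b - a + zeta, ">=": b - a,
            "==": abs(a - b), "!=": zeta - abs(a - b)}[op]


def cp_penalty(x):
    """Compound predicate `a == b and b == c`: 'and' adds the DP penalties.
    (An 'or' connective would take min(...) of the two penalties instead.)"""
    a, b, c = x
    return penalty(a, "==", b) + penalty(b, "==", c)


def abc_search(cost, dim, lo, hi, sources=15, cycles=100, limit=20, seed=0):
    """Hypothetical ABC loop; 15 sources = half of a colony of 30 bees."""
    rng = random.Random(seed)

    def new_source():
        return [rng.randint(lo, hi) for _ in range(dim)]

    food = [new_source() for _ in range(sources)]
    cost_of = [cost(x) for x in food]
    trials = [0] * sources
    i0 = min(range(sources), key=cost_of.__getitem__)
    best = (cost_of[i0], food[i0][:])

    def try_neighbour(i):
        # v_ij = x_ij + phi * (x_ij - x_kj), followed by greedy selection
        k = rng.choice([s for s in range(sources) if s != i])
        j = rng.randrange(dim)
        v = food[i][:]
        step = rng.uniform(-1, 1) * (food[i][j] - food[k][j])
        v[j] = min(hi, max(lo, round(v[j] + step)))
        c = cost(v)
        if c < cost_of[i]:
            food[i], cost_of[i], trials[i] = v, c, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(sources):                      # employed phase
            try_neighbour(i)
        weights = [1.0 / (1.0 + c) for c in cost_of]  # lower penalty, higher weight
        for i in rng.choices(range(sources), weights=weights, k=sources):
            try_neighbour(i)                          # onlooker phase
        worst = max(range(sources), key=trials.__getitem__)
        if trials[worst] > limit:                     # scout phase
            food[worst] = new_source()
            cost_of[worst] = cost(food[worst])
            trials[worst] = 0
        i = min(range(sources), key=cost_of.__getitem__)
        if cost_of[i] < best[0]:
            best = (cost_of[i], food[i][:])           # memorize best so far
        if best[0] == 0:
            break                                     # CP satisfied: valid test case
    return best


if __name__ == "__main__":
    final_penalty, inputs = abc_search(cp_penalty, dim=3, lo=1, hi=100)
    print(final_penalty, inputs)
```

A returned penalty of zero means the inputs satisfy the compound predicate and form a valid test case for the path; a non-zero penalty means the search budget ran out first, which corresponds to a failed attempt in the sense of Section 5.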
