Lecture Notes On Software Testing As A Supplement To The Lecture "Dependable Systems"

Katinka Wolter, 5 January 2006

Purpose of software testing: evaluate and increase software reliability, or mean time to failure (MTTF). Different kinds of testing exist, and different classifications. E.g.

- functional testing evaluates the functionality of a software system without looking into its structure (modules, classes). For a given input the output is examined.
- structural testing (coverage) examines the structure of software: assignments, loops, conditions, etc. (This is white-box testing.)
- user-oriented testing is similar to functional testing, except that it uses the software as a whole. While functional testing executes single functions, user-oriented testing evaluates the software as seen by a user. (This is a form of black-box testing.)

Code coverage analysis is a structural testing technique. Structural testing compares test program behavior against the apparent intention of the source code. Structural testing examines how the program works, taking into account possible pitfalls in the structure and logic. Functional testing examines what the program accomplishes, without regard to how it works internally.

Structural testing is also called path testing, since you choose test cases that cause paths to be taken through the structure of the program. Do not confuse path testing with the path coverage measure.

At first glance, structural testing seems unsafe: it cannot find errors of omission. However, requirements specifications sometimes do not exist, and are rarely complete. This is especially true near the end of the product development time line, when the requirements specification is updated less frequently and the product itself begins to take over the role of the specification. The difference between functional and structural testing blurs near release time.

We can also distinguish testing by the software phase in which it is used:

- unit (small software components)
- integration (larger software components)

- product (the whole system)
- regression (re-release)

The commonly used white-box testing is used to achieve structural coverage. It comprises data and control flow testing, and mutation testing. In particular we distinguish

1. Statement coverage. Every statement is executed at least once. Does statement coverage = 1.0 guarantee a fault-free program P? It does not. The chief disadvantage of statement coverage is that it is insensitive to some control structures. For example, consider the following C/C++ code fragment:

    int* p = NULL;
    if (condition)
        p = &variable;
    *p = 123;   /* dereferences NULL if condition was false */

Without a test case that causes condition to evaluate false, statement coverage rates this code fully covered. In fact, if condition ever evaluates false, this code fails. This is the most serious shortcoming of statement coverage, because if-statements are very common. Further shortcomings:

- Statement coverage does not report whether loops reach their termination condition, only whether the loop body was executed. With C, C++, and Java, this limitation affects loops that contain break statements.
- Since do-while loops always execute at least once, statement coverage considers them the same rank as non-branching statements.
- Statement coverage is completely insensitive to the logical operators (|| and &&).
- Statement coverage cannot distinguish consecutive switch labels.

Test cases generally correlate more to decisions than to statements. You probably would not have 10 separate test cases for a sequence of 10 non-branching statements; you would have only one test case. For example, consider an if-else statement containing one statement in the then-clause and 99 statements in the else-clause. After exercising one of the two possible paths, statement coverage gives extreme results: either 1% or 99% coverage. Basic block coverage eliminates this problem. Block coverage is like statement coverage, except that the unit measured is a sequence of non-branching statements (a basic block) instead of a single statement.

2. Decision coverage is a measure indicating whether every decision in the code evaluated to both true and false. This measure has the advantage of simplicity without the problems of statement coverage.
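The difference between the two measures can be made concrete on the pointer fragment from the statement-coverage discussion. The following is a minimal sketch, not a real coverage tool: the record type, the function names and the hand-written instrumentation are assumptions made for illustration, and the final store is guarded so that the sketch itself never dereferences NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical coverage record for the single decision `if (condition)`. */
struct decision_record {
    bool seen_true;   /* the decision evaluated to true in some test  */
    bool seen_false;  /* the decision evaluated to false in some test */
};

/* Instrumented, guarded variant of the fragment. It records which outcome
 * of the decision was taken and returns false where the original fragment
 * would have dereferenced a NULL pointer. */
bool store_value(bool condition, int *variable, struct decision_record *rec)
{
    assert(variable != NULL && rec != NULL);
    int *p = NULL;
    if (condition) {
        rec->seen_true = true;
        p = variable;
    } else {
        rec->seen_false = true;
    }
    if (p == NULL)
        return false;  /* the original code fails at this point */
    *p = 123;
    return true;
}

/* Decision coverage is reached only when both outcomes were observed. */
bool decision_covered(const struct decision_record *rec)
{
    assert(rec != NULL);
    return rec->seen_true && rec->seen_false;
}
```

A single test with condition true executes every statement of the original fragment, yet decision_covered still reports false. Adding the test with condition false is what completes decision coverage, and it is also the test that exposes the fault.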

A disadvantage is that this measure ignores branches within boolean expressions which occur due to short-circuit operators. For example, consider the following C/C++/Java code fragment:

    if (condition1 && (condition2 || function1()))
        statement1;
    else
        statement2;

This measure could consider the control structure completely exercised without a call to function1. The test expression is true when condition1 is true and condition2 is true, and the test expression is false when condition1 is false. In this instance, the short-circuit operators preclude a call to function1.

3. Data flow coverage. This variation of path coverage considers only the subpaths from variable assignments to subsequent references of the variables. It indicates whether all def-use pairs are covered. Example:

    S1: x = f()
    S2: p = g(x, .)

If S1 is the definition and S2 is the use of variable x, then (S1, S2) is a def-use pair. Two kinds of use are distinguished:

- c-use: uses x in a computational expression
- p-use: uses x in a predicate

A path (S1, S2) is definition-free if no other statement between S1 and S2 defines x. A path is feasible if there exists a test case $d \in D$, where D is the input domain (test set), such that P(d) executes (S1, S2). All statements following statement S are its successors. A c-use or p-use is covered if there exists at least one input d which executes a definition-free path (S1, S2) and on to all its successors.

The advantage of this measure is that the paths reported have direct relevance to the way the program handles data. One disadvantage is that this measure does not include decision coverage. Another disadvantage is complexity.

4. Mutation testing. Given a program P, generate mutants that are syntactically correct by using some algorithm, rules or tools, such as Mothra (http://ise.gmu.edu/ofut/rsrch/mut.html), implementing heuristics such as genetic algorithms, simulated annealing etc.
Mutation testing is based upon seeding the implementation with a fault (mutating it) by applying a mutation operator, and determining whether testing identifies this fault. If a test case d distinguishes between the mutant M and the original program P, i.e. $M(d) \neq P(d)$, it is said to kill the mutant. The idea behind mutation testing is quite simple: given an appropriate set of mutation operators, if a test set kills the mutants generated by these operators then, since it is able to find these small differences, it is likely to be good at finding real faults.
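Killing a mutant can be sketched in a few lines of C. This is an illustration under stated assumptions, not the way a tool such as Mothra works: the program max_original, both hand-written mutants, and the helper kills are hypothetical names introduced here, and the only mutation operator used is the replacement of a relational operator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Original program P: maximum of two integers. */
int max_original(int a, int b) { return (a > b) ? a : b; }

/* Mutant M1: the operator > is replaced by <. */
int max_mutant_lt(int a, int b) { return (a < b) ? a : b; }

/* Mutant M2: the operator > is replaced by >=. For a == b both branches
 * yield the same value, so M2 behaves like P on every input. */
int max_mutant_ge(int a, int b) { return (a >= b) ? a : b; }

struct test_case { int a; int b; };

typedef int (*program_fn)(int, int);

/* Returns true if some test case d in the set satisfies M(d) != P(d),
 * i.e. if the test set kills the mutant m. */
bool kills(program_fn p, program_fn m, const struct test_case *tests, size_t n)
{
    assert(tests != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (p(tests[i].a, tests[i].b) != m(tests[i].a, tests[i].b))
            return true;
    }
    return false;
}
```

A test set containing only the input (3, 3) does not kill M1, whereas adding (1, 2) does. M2 cannot be killed by any test set, because it never produces an output different from P.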

Mutation testing may be used to judge the effectiveness of a test set: the test set should kill all the mutants. Similarly, test generation may be based on mutation testing: tests are generated to kill the mutants. Interestingly, many test criteria may be represented using mutation testing by simply choosing appropriate mutation operators.

Functional testing uses operational profiles. An operational profile consists of test inputs together with their relative frequencies of use.

\[ \text{Operational profile} = \{ (d, p) \mid d \in D,\ p \in [0, 1] \} \]

To generate an operational profile, a customer profile must be developed first.

For all four white-box test methods criteria of adequacy exist.

1) A test set T is adequate with respect to (wrt) decision coverage if all decisions in a software system are covered when executed against all $t \in T$.
2) A test set T is adequate wrt p-use (or c-use) if all p-uses (c-uses) are covered by T.
3) T is adequate wrt the mutation criterion if it distinguishes all non-equivalent mutants.

Items 1) - 3) are measurable. For functional testing no measurable criterion exists.

Some properties:

- If T is p-use or c-use adequate, then T is decision adequate. (A formal proof exists.) We say that data flow coverage subsumes decision coverage (i.e. decision coverage $\subseteq$ data flow coverage).
- mutation adequate $\Rightarrow$ data flow adequate. ($\Leftarrow$ does typically not hold.) If a test set is not data flow adequate, it is not mutation adequate.
- For several types of errors (errors of omission) structural testing is not sufficient, but functional testing is.
- A test sequence always employs functional testing first, then structural testing.

How to obtain reliability measures? Reliability estimates are computed using a time-based model or a structure-based model. We formalise as follows. Let P be the program to test on a test case d from input domain D. Let $T_k$ be the time of the k-th failure and $N_k$ the number of tests used by time $T_k$. Define the testing effort $E_k$ as

\[ E_k = \begin{cases} T_k - T_{k-1} & \text{for the time-based model} \\ N_k - N_{k-1} & \text{for the test-case-based model} \end{cases} \]
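As a small worked illustration of the two variants of the testing effort, the sketch below computes $E_k$ from recorded failure times and test counts. The function names and the array layout are assumptions, as is the convention that index 0 holds the starting values $T_0$ and $N_0$, which the notes do not state.

```c
#include <assert.h>
#include <stddef.h>

/* Time-based model: E_k = T_k - T_{k-1}.
 * t[k] holds T_k; t[0] is assumed to hold the start time T_0. */
double effort_time_based(const double *t, size_t k)
{
    assert(t != NULL && k >= 1);
    return t[k] - t[k - 1];
}

/* Test-case-based model: E_k = N_k - N_{k-1}.
 * n[k] holds N_k; n[0] is assumed to hold the initial count N_0. */
unsigned long effort_test_case_based(const unsigned long *n, size_t k)
{
    assert(n != NULL && k >= 1);
    assert(n[k] >= n[k - 1]);  /* the test count never decreases */
    return n[k] - n[k - 1];
}
```

With failure times 0, 2.5, 4 and 9, for instance, the effort of the third inter-failure interval is 9 - 4 = 5 time units; with test counts 0, 10, 25 and 60, the second interval took 25 - 10 = 15 tests.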

Let $e_i$ be the effort in execution $i$ of $P$, then

\[ E_k = \sum_{i=l_1}^{l_2} e_i \]

where $e_{l_1}$ and $e_{l_2}$ are the efforts of the first and last execution of $P$ during the k-th failure time interval.

Another view on reliability: the reliability $R$ of $P$ is the probability of no failure over the entire input domain.

\[ R = \Pr\{ P(d) \text{ is correct for any } d \in D \} \]

Time-based result: let

\[ S_k = \sum_{i=1}^{k} E_i \]

be the cumulative effort over k inter-failure epochs. Let $x$ be the exposure period. The probability that the software will not fail during the next $x$ time units is formalised as

\[ R(x \mid t) = \Pr\{ E_k > x \mid S_{k-1} = t \} \]

Convergence of $R(x \mid t)$: $R(x \mid t) \to R$ as $x \to \infty$ if the test inputs are operationally significant.

Example: In studies and semi-formal proofs it could be shown that structural testing is not able to reveal all faults. For functional testing not even a saturation effect can be proven. In a study, TeX by Knuth and AWK by Kernighan were tested using the tools TRIPTEST (TeX) and ATAC. The coverage statistics (in percent) are

           Block   Decision   p-use   c-use
    TeX     85       72        53      48
    AWK     70       59        48      55

A possible scenario would employ a test sequence like the one shown in the following graph. The dashed fields indicate the saturation region, where the test method employed does not reveal any more faults.

[Figure: faults revealed and residual faults plotted against testing effort (t) for a sequence of test methods. Panel labels: Functional, Decision, Data flow, Mutation. The dashed area of each panel marks its saturation region.]

References

Handbook of Software Reliability Engineering, Michael Lyu (Ed.)
http://www.bullseye.com/coverage
