Basic Definitions: Testing: What Is Software Testing?


Basic Definitions: Testing

• What is software testing?
   • Running a program
   • In order to find faults
      • a.k.a. defects
      • a.k.a. errors
      • a.k.a. flaws
      • a.k.a. faults
      • a.k.a. BUGS
• Hrm... that’s a lot of “a.k.a.”s
   • Let’s refine this terminology a bit

Faults, Errors, and Failures
• Fault: a static flaw in a program
   • What we usually think of as “a bug”
• Error: a bad program state that results from a fault
   • Not every fault always produces an error
• Failure: an observable incorrect behavior of a program as a result of an error
   • Not every error ever becomes visible

To Expose a Fault with a Test
• Reachability: the test must actually reach and execute the location of the fault
• Infection: the fault must actually corrupt the program state (produce an error)
• Propagation: the error must persist and cause an incorrect output (a failure)

An Example
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }
• Find the fault

An Example
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }
• Here’s a test case:
   • a = {}, n = 0, x = 2
   • Does not even reach the fault

An Example
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }
• Here’s another:
   • a = {3, 9, 4}, n = 3, x = 2
   • Reaches the fault
   • Infects the state with an error
   • But no failure

An Example
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }
• And finally:
   • a = {2, 9, 4}, n = 3, x = 2
   • Reaches the fault
   • Infects the state with an error
   • And fails: returns -1 instead of 0
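To make the three test cases runnable, here is a small sketch that places the faulty loop next to a corrected one. The names findLastBuggy and findLastFixed are hypothetical and used only to compare the two versions side by side. Because C has no zero-length array literal, the empty array a = {} is passed as a null pointer with n = 0.

```c
#include <assert.h>
#include <stddef.h>

/* Faulty version from the slides: the loop stops before examining a[0]. */
int findLastBuggy(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}

/* Corrected version: the loop condition also admits index 0. */
int findLastFixed(const int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}
```

On {3, 9, 4} with x = 2, both versions return -1. The buggy loop exits in the wrong state (it never looks at a[0]), but that error does not propagate. Only {2, 9, 4} makes the two versions disagree.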

Controllability and Observability
• Goals for a test case:
   • Reach a fault
   • Produce an error
   • Make the error visible as a failure
• To make this easy, the program must be controllable and observable
   • Controllability: how easy it is to drive the program where we want it to go
   • Observability: how easy it is to tell what the program is doing

Design for Testability
• If a program is not designed to be controllable and observable, it generally won’t be
• We have to start preparing for testing before we write any code
   • Testing as an after-the-fact, ad hoc exercise is often limited by earlier design choices

Test-Driven Development
• One way to design for testability is to write the test cases before the code
   • An idea arising from Extreme Programming and agile development
   • Write automated test cases first
   • Then write the code to satisfy the tests
   • Helps focus attention on making software well-specified
   • Forces observability and controllability: you have to be able to handle the test cases you’ve already written (before deciding they were impractical)
   • Reduces the temptation to tailor tests to idiosyncratic behaviors of the implementation
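Here is a minimal sketch of the test-first idea, taking the earlier findLast specification as the requirement. The checks in specTestsPass (a hypothetical name) are written from the comment's contract alone, before any loop exists. The implementation comes second and only has to make those checks pass.

```c
#include <assert.h>
#include <stddef.h>

int findLast(const int a[], int n, int x);

/* Step 1, written first: checks derived only from the specification
 * "returns index of last element in a equal to x, or -1 if no such". */
int specTestsPass(void) {
    int ok = 1;
    ok = ok && findLast(NULL, 0, 2) == -1;              /* empty array */
    ok = ok && findLast((int[]){3, 9, 4}, 3, 2) == -1;  /* x absent */
    ok = ok && findLast((int[]){2, 9, 4}, 3, 2) == 0;   /* match at index 0 */
    ok = ok && findLast((int[]){2, 9, 2}, 3, 2) == 2;   /* last of two matches */
    return ok;
}

/* Step 2, written second: code that makes the checks above pass. */
int findLast(const int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--)
        if (a[i] == x)
            return i;
    return -1;
}
```

Someone working only from the specification is likely to try a match at index 0, and that is exactly the input that exposes the off-by-one fault.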

Controllability: Simulation and Stubbing
• A key to controllable code is effective simulation and stubbing
   • Simulation of low-level hardware devices through a clean driver interface
      • Real hardware may be slow
      • It may be impossible or expensive to induce some failure modes on real hardware
      • Real hardware may be a limited resource
   • Stubbing for other routines and code
      • Other code/modules may not be complete
      • They may be slow and irrelevant to the test
      • We may need to simulate failure of other modules
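This sketch illustrates the "clean driver interface" idea using a hypothetical flash driver. The code under test, store_record, sees only a table of function pointers. That lets a test plug in a RAM-backed simulated device and inject a write failure on any page it chooses. FlashDriver, SimFlash, and the page sizes are illustrative assumptions, not the interface used in the JPL work.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical driver interface: all device access goes through these. */
typedef struct {
    int (*write_page)(void *dev, int page, const unsigned char *buf, int len);
    int (*read_page)(void *dev, int page, unsigned char *buf, int len);
} FlashDriver;

enum { SIM_PAGES = 4, SIM_PAGE_SIZE = 64 };

/* Simulated device: plain RAM plus one injectable failure. */
typedef struct {
    unsigned char mem[SIM_PAGES][SIM_PAGE_SIZE];
    int fail_write_page;   /* writes to this page fail; -1 means none */
} SimFlash;

static int sim_valid(int page, int len) {
    return page >= 0 && page < SIM_PAGES && len >= 0 && len <= SIM_PAGE_SIZE;
}

static int sim_write(void *dev, int page, const unsigned char *buf, int len) {
    SimFlash *s = dev;
    if (!sim_valid(page, len) || page == s->fail_write_page)
        return -1;                        /* injected or invalid: fail */
    memcpy(s->mem[page], buf, (size_t)len);
    return 0;
}

static int sim_read(void *dev, int page, unsigned char *buf, int len) {
    SimFlash *s = dev;
    if (!sim_valid(page, len))
        return -1;
    memcpy(buf, s->mem[page], (size_t)len);
    return 0;
}

static const FlashDriver sim_driver = { sim_write, sim_read };

/* Code under test: store a record, falling back to the next page on failure. */
int store_record(const FlashDriver *drv, void *dev, int page,
                 const unsigned char *rec, int len) {
    if (drv->write_page(dev, page, rec, len) == 0) return page;
    if (drv->write_page(dev, page + 1, rec, len) == 0) return page + 1;
    return -1;
}

/* Usage: returns the page the record landed on, or -1 if it was lost. */
int demo_store(int fail_page) {
    SimFlash s = { .fail_write_page = fail_page };
    const unsigned char rec[3] = { 1, 2, 3 };
    unsigned char back[3];
    int page = store_record(&sim_driver, &s, 0, rec, 3);
    if (page < 0 || sim_driver.read_page(&s, page, back, 3) != 0)
        return -1;
    return memcmp(rec, back, 3) == 0 ? page : -1;
}
```

The same table could be filled with functions that call real hardware. The code under test would not change.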

Simulation and Stubbing: JPL Example
• When testing JPL flash storage modules, we rely on software simulation of flash devices
   • Real flash devices are slow
      • Can’t do aggressive random testing
   • Real flash devices are expensive
      • JPL only has a few boards, so there is constant competition to test on them
      • Running hundreds of thousands of tests will wear the flash hardware out
   • Simulation enables us to introduce rare hardware failures
      • System resets, spontaneous bad blocks and write failures, etc.

Controllability: Downwards Scalability
• Another important aspect of controllability is making code “downwards scalable”
   • Many faults cause an error only in a corner case due to a resource limit
   • An effective strategy for finding errors is to reduce the resource limits
      • Test a version of the program with very tight bounds
      • Finding corner cases is easier if the corners are close together
   • Too many programs hard-code resource limits or make assumptions about resources unconnected to defined limits
      • E.g., not checking the result of malloc
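Here is a minimal sketch of downwards scalability, using a hypothetical append-only log module. The capacity is a parameter instead of a hard-coded constant, so a test can set a tiny bound and reach the "log is full" corner case after only two appends. The module also checks the result of malloc instead of assuming it succeeds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical append-only log whose limit is data, not a literal. */
typedef struct {
    int *items;
    int count;
    int capacity;
} Log;

/* Returns 0 on success, -1 on a bad capacity or if malloc fails. */
int log_init(Log *log, int capacity) {
    log->items = NULL;
    log->count = 0;
    log->capacity = 0;
    if (capacity <= 0)
        return -1;
    log->items = malloc((size_t)capacity * sizeof *log->items);
    if (log->items == NULL)
        return -1;                 /* never assume allocation succeeds */
    log->capacity = capacity;
    return 0;
}

/* Returns 0 on success, -1 when the log is full. */
int log_append(Log *log, int value) {
    if (log->count >= log->capacity)
        return -1;
    log->items[log->count++] = value;
    return 0;
}

void log_free(Log *log) {
    free(log->items);
    log->items = NULL;
    log->count = log->capacity = 0;
}

/* Usage: how many of `attempts` appends succeed under a given capacity? */
int appends_until_full(int capacity, int attempts) {
    Log log;
    int ok = 0;
    if (log_init(&log, capacity) != 0)
        return -1;
    for (int k = 0; k < attempts; k++)
        if (log_append(&log, k) == 0)
            ok++;
    log_free(&log);
    return ok;
}
```

A production build might pass a large capacity. A test build passes 2, so the full-log path runs on the third append instead of after thousands.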

Downwards Scalability: JPL Example
• Flight flash hardware is usually a 1-4 GB device
   • E.g., 64 blocks of 32 pages of 8192 bytes
• We primarily test with much smaller “devices” (using software simulation)
   • 6 blocks of 4 pages of 64 bytes
   • Forces the flash file system to compact storage more often
   • Tests assumptions about how space is used on flash
   • Forces more multi-page writes and more directory entries spanning multiple pages
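Downwards scaling depends on keeping the device shape in data, so the same code can run against any geometry. This sketch uses a hypothetical FlashGeometry descriptor to compute sizes for the two configurations above. The small test device has only 24 pages (1,536 bytes), so every page is close to a "full" corner. The flight example geometry, with the numbers as listed, comes to 16 MiB.

```c
#include <assert.h>

/* Hypothetical geometry descriptor passed to a simulated flash device. */
typedef struct {
    long blocks;
    long pages_per_block;
    long page_size;        /* bytes per page */
} FlashGeometry;

long total_pages(FlashGeometry g) {
    return g.blocks * g.pages_per_block;
}

long capacity_bytes(FlashGeometry g) {
    return total_pages(g) * g.page_size;
}
```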
Downwards Scalability: JPL Example
• Easier to explore various combinations of states of the device’s blocks/pages
[Figure: blocks and pages of a small simulated device; legend: used page, free page, dirty page, bad block]

Controllability
• Other important themes for controllability
   • Network/file access
      • If a program reads from the network or from remote files, this is hard to control
      • Again, simulation and stubbing are key
   • System calls
      • Similarly, reading the time from the operating system can be hard to control
      • Simulation and stubbing: Operating System Abstraction Layer, etc.
   • GUI control
      • Allow scripted control of GUI elements so tests can be automated
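Here is a minimal sketch of stubbing the system clock behind a small abstraction layer. It assumes a POSIX-style time_t that counts seconds. Production code asks the current_clock pointer for the time instead of calling time() directly, so a test can swap in a fake clock and control time exactly. ClockFn, set_clock, session_expired, and expired_at are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical abstraction layer: one indirection in front of time(). */
typedef time_t (*ClockFn)(void);

static time_t real_clock(void) { return time(NULL); }
static ClockFn current_clock = real_clock;

/* Passing NULL restores the real clock. */
void set_clock(ClockFn fn) { current_clock = fn ? fn : real_clock; }

/* Code under test: has a session that began at `started` timed out? */
int session_expired(time_t started, double timeout_seconds) {
    return difftime(current_clock(), started) >= timeout_seconds;
}

/* Test stub: a clock frozen at whatever instant the test chooses. */
static time_t fake_now;
static time_t fake_clock(void) { return fake_now; }

/* Usage: evaluate session_expired as if the current time were `now`. */
int expired_at(time_t now, time_t started, double timeout_seconds) {
    fake_now = now;
    set_clock(fake_clock);
    int result = session_expired(started, timeout_seconds);
    set_clock(NULL);
    return result;
}
```

The test does not need to sleep or wait for a real clock. It can check the boundary one second before and exactly at the timeout.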

Observability: Assertions
• Assertions improve observability by turning (some) errors into failures
   • Even if the effect of a fault doesn’t propagate to the output, it may be visible if an assertion checks the state at the right time
• Assertions also improve observability by making the error itself, rather than only the failure, visible
   • We learn directly how the state was corrupted, not just its eventual effect
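This sketch shows an assertion exposing an error that never becomes a failure, using the {3, 9, 4} test from earlier. A standard assert() aborts the program, so the sketch uses a hypothetical counting CHECK macro that lets a harness observe the violation instead. The checked condition is an assumed loop-exit invariant: if the whole array was scanned without a match, i should end at -1.

```c
#include <assert.h>

/* Counting assertion: records a violation instead of aborting. */
static int check_failures = 0;
#define CHECK(cond) do { if (!(cond)) check_failures++; } while (0)

/* The faulty findLast, instrumented with a loop-exit check. */
int findLastChecked(const int a[], int n, int x) {
    int i;
    for (i = n - 1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    CHECK(i == -1);   /* false when the loop stopped early: a[0] was skipped */
    return -1;
}

int checkFailures(void) { return check_failures; }
```

For {3, 9, 4} with x = 2, the return value is the correct -1, but the check still fires. The corrupted state becomes visible even though the output is right.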

Observability: Invariant Checkers
• We can extend the idea of assertions to writing “full” invariant checkers
   • Do a crawl of the code’s basic data structures
   • Check various invariants that would be too expensive to check at runtime
   • The invariant checker can be written for ease of implementation, freely using recursion, memory allocation, etc.
      • It won’t run on the actual system
   • But be careful! If your invariant checker has a bug and changes the system state...
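Here is a minimal invariant-checker sketch for a hypothetical sorted singly linked list that caches its element count. The checker walks the whole structure and uses recursion, which is acceptable because it runs only during testing. It takes const pointers, so the compiler rejects accidental writes to the state being checked, which addresses the warning above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure: ascending keys plus a cached element count. */
typedef struct Node { int key; struct Node *next; } Node;
typedef struct { Node *head; int count; } SortedList;

/* Test-only checker: const pointers keep it from modifying the state. */
static int check_nodes(const Node *n, int remaining) {
    if (n == NULL)
        return remaining == 0;              /* count matches real length */
    if (remaining <= 0)
        return 0;                           /* too many nodes, or a cycle */
    if (n->next != NULL && n->next->key < n->key)
        return 0;                           /* ordering violated */
    return check_nodes(n->next, remaining - 1);
}

int list_invariant_holds(const SortedList *l) {
    return l->count >= 0 && check_nodes(l->head, l->count);
}

/* Usage: build a three-node list on the stack and check it. */
int demo_check(int k0, int k1, int k2, int count) {
    Node n2 = { k2, NULL };
    Node n1 = { k1, &n2 };
    Node n0 = { k0, &n1 };
    SortedList l = { &n0, count };
    return list_invariant_holds(&l);
}
```

Using the cached count as a step budget means the checker also terminates on a corrupted, cyclic list.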

Observability
• Other important themes for observability
   • Logging
      • Especially critical for GUI interfaces, to mirror GUI events in ordered, parseable messages
   • Network/file access
      • If a program writes to the network or to remote files, this is hard to observe
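This sketch shows "ordered, parseable" GUI event logging. The line format, with an explicit sequence number followed by key=value fields, is a hypothetical choice, as are the function names. The sketch assumes widget and action names contain no whitespace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical format: one line per event; seq makes ordering explicit. */
int format_event(char *buf, size_t size, unsigned seq,
                 const char *widget, const char *action) {
    return snprintf(buf, size, "EVENT seq=%u widget=%s action=%s",
                    seq, widget, action);
}

/* Writes to any stream: e.g. stderr normally, a file captured by a test. */
void log_event(FILE *out, unsigned seq, const char *widget, const char *action) {
    char line[128];
    format_event(line, sizeof line, seq, widget, action);
    fprintf(out, "%s\n", line);
}

/* Parses one line back; returns 1 if all three fields were read. */
int parse_event(const char *line, unsigned *seq, char widget[32], char action[32]) {
    return sscanf(line, "EVENT seq=%u widget=%31s action=%31s",
                  seq, widget, action) == 3;
}

/* Usage: a formatted event survives a round trip through the parser. */
int event_roundtrips(unsigned seq, const char *widget, const char *action) {
    char line[128], w[32], a[32];
    unsigned s;
    format_event(line, sizeof line, seq, widget, action);
    return parse_event(line, &s, w, a) && s == seq &&
           strcmp(w, widget) == 0 && strcmp(a, action) == 0;
}
```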

Controllability & Observability: Memory Allocation
• A more extreme case: embedded code for mission- or safety-critical systems
   • May be running without memory protection
   • Dynamic allocation is often forbidden
• Design the module to accept a static block allocated elsewhere, and to access only this memory
   • Controllability: allows us to introduce memory faults and simulate warm reboots
   • Observability: allows us to easily instrument the code with low-overhead checks that find memory-safety violations during testing
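Here is a minimal sketch of a module that never allocates: the caller hands it one block, and the module touches only that memory. ModuleMemory, the guard-word scheme, and the function names are hypothetical. Guard words at each end provide a cheap observability check. Because the test owns the block, it can also inject a memory fault by writing to the block directly, which covers the controllability side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu
#define MODULE_BYTES 64

/* The only memory the module may use; guards bracket the data. */
typedef struct {
    uint32_t guard_lo;
    unsigned char data[MODULE_BYTES];
    uint32_t guard_hi;
} ModuleMemory;

void module_init(ModuleMemory *mem) {
    mem->guard_lo = GUARD_WORD;
    mem->guard_hi = GUARD_WORD;
    memset(mem->data, 0, sizeof mem->data);
}

/* Bounds-checked store into the module's own block. */
int module_store(ModuleMemory *mem, size_t offset, unsigned char value) {
    if (offset >= sizeof mem->data)
        return -1;
    mem->data[offset] = value;
    return 0;
}

/* Low-overhead check a test can run after every operation. */
int module_memory_ok(const ModuleMemory *mem) {
    return mem->guard_lo == GUARD_WORD && mem->guard_hi == GUARD_WORD;
}

/* Usage: the block is static ("allocated elsewhere"); the test corrupts a
 * guard word itself and confirms the check notices. */
int demo_fault_detected(void) {
    static ModuleMemory mem;
    module_init(&mem);
    int ok_before = module_memory_ok(&mem) &&
                    module_store(&mem, 3, 7) == 0 &&
                    module_store(&mem, MODULE_BYTES, 1) == -1;
    mem.guard_hi ^= 1u;                    /* injected memory fault */
    return ok_before && !module_memory_ok(&mem);
}
```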

Coverage
• The software testing literature is primarily concerned with various notions of coverage
• Ammann and Offutt identify four basic kinds of coverage:
   • Graph coverage
   • Logic coverage
   • Input space partitioning
   • Syntax-based coverage
Graph Coverage

• Cover all the nodes, edges, or paths of some graph related to the program
• Examples:
   • Statement coverage
   • Branch coverage
   • Path coverage
   • Data flow (def-use) coverage
   • Model-based testing coverage
   • Many more: by far the most common kind of coverage
Graph Coverage

• Most FSM testing algorithms can be seen as graph coverage
   • Consider VC: computing a spanning tree to nodes is standard graph exploration
• Beizer: “find a graph and cover it”

Statement/Basic Block Coverage
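    if (x < y)
    {
        y = 0;
        x = x + 1;
    }
    else
    {
        x = y;
    }
[Graph: node 1 branches to node 2 (y=0; x=x+1) when x<y and to node 3 (x=y) when x >= y; nodes 2 and 3 both lead to node 4]
• Statement coverage: cover every node of these graphs

    if (x < y)
    {
        y = 0;
        x = x + 1;
    }
[Graph: node 1 branches to node 2 (y=0; x=x+1) when x<y and directly to node 3 when x >= y; node 2 also leads to node 3]
• Treat y = 0; x = x + 1; as one node, because if one statement executes, the other must also execute (the code is a basic block)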

Branch Coverage
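    if (x < y)
    {
        y = 0;
        x = x + 1;
    }
    else
    {
        x = y;
    }
[Graph: node 1 branches to node 2 (y=0; x=x+1) when x<y and to node 3 (x=y) when x >= y; nodes 2 and 3 both lead to node 4]
• Branch coverage vs. statement coverage: the same for if-then-else

    if (x < y)
    {
        y = 0;
        x = x + 1;
    }
[Graph: node 1 branches to node 2 (y=0; x=x+1) when x<y and directly to node 3 when x >= y; node 2 also leads to node 3]
• But consider this if-then structure. For branch coverage we can’t just cover all nodes; we must cover all edges, i.e., get to node 3 both after 2 and without executing 2!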
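This hand-instrumented sketch of the if-then case makes the difference concrete. A single test with x < y visits nodes 1, 2, and 3, which is full statement coverage, yet it never takes the edge from 1 straight to 3. A second test with x >= y is needed for branch coverage. The counters and function names are hypothetical; in practice a coverage tool such as gcov would normally collect this data instead of manual counters.

```c
#include <assert.h>

/* Nodes follow the graph above: 1 = test, 2 = then-block, 3 = join. */
static int node_hits[4];
static int edge_1_to_3;   /* guard false: control goes from 1 straight to 3 */

/* Returns 1 if the then-block ran, 0 otherwise. */
int ifThen(int x, int y) {
    int took_then = 0;
    node_hits[1]++;
    if (x < y) {
        node_hits[2]++;
        y = 0;
        x = x + 1;
        took_then = 1;
    } else {
        edge_1_to_3++;    /* instrumentation only; the original has no else */
    }
    node_hits[3]++;
    (void)x;
    (void)y;
    return took_then;
}

int allNodesCovered(void) {
    return node_hits[1] > 0 && node_hits[2] > 0 && node_hits[3] > 0;
}

int edge13Covered(void) { return edge_1_to_3 > 0; }
```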

Path Coverage
    if (x < y)
    {
        y = 0;
        x = x + 1;
    }
    else
    {
        x = y;
    }
    if (x < y)
    {
        y = 0;
        x = x + 1;
    }
[Graph: node 1 branches to node 2 (y=0; x=x+1) when x<y and to node 3 (x=y) when x >= y; both lead to node 4, which branches to node 5 (y=0; x=x+1) when x<y and directly to node 6 when x >= y; node 5 also leads to node 6]
• How many paths through this code are there? We need one test case for each to get path coverage
• To get statement and branch coverage, we only need two test cases: 1 2 4 5 6 and 1 3 4 6
• Path coverage needs two more: 1 2 4 6 and 1 3 4 5 6
• In general: exponential in the number of conditional branches!
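This sketch records the path each input takes through the two-if example and searches a small grid of inputs for each path. The function names and the search bounds are hypothetical. The search shows that one of the four paths cannot be taken: the else-block sets x = y, which makes the second guard x < y false, so path 1 3 4 5 6 is infeasible. In this example, path coverage can reach only three of the four paths.

```c
#include <assert.h>
#include <string.h>

/* Records the path as node numbers: 1 = first test, 2 = first then-block,
 * 3 = else-block, 4 = second test, 5 = second then-block, 6 = exit. */
void runPath(int x, int y, char path[8]) {
    int k = 0;
    path[k++] = '1';
    if (x < y) { path[k++] = '2'; y = 0; x = x + 1; }
    else       { path[k++] = '3'; x = y; }
    path[k++] = '4';
    if (x < y) { path[k++] = '5'; y = 0; x = x + 1; }
    path[k++] = '6';
    path[k] = '\0';
}

/* Brute force: does any (x, y) with both in [-lim, lim] follow `want`? */
int pathReached(const char *want, int lim) {
    char path[8];
    for (int x = -lim; x <= lim; x++)
        for (int y = -lim; y <= lim; y++) {
            runPath(x, y, path);
            if (strcmp(path, want) == 0)
                return 1;
        }
    return 0;
}
```

For example, x = -5, y = 0 follows 1 2 4 5 6, because x becomes -4, which is still less than the new y = 0.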
Data Flow Coverage
    x = 3;          // node 1: Def(x)
    y = 3;          // node 2: Def(y)
    if (w) {        // node 3: branch on w / !w
        x = y + 2;  // node 4: Def(x), Use(y)
    }
    if (z) {        // node 5: branch on z / !z
        y = x - 2;  // node 6: Def(y), Use(x)
    }
    n = x + y;      // node 7: Use(x), Use(y)
• Annotate the program with the locations where variables are defined and used (very basic static analysis)
• Def-use pair coverage requires executing all possible pairs of nodes where a variable is first defined and then used, without any intervening re-definitions
   • E.g., this path covers the pair where x is defined at 1 and used at 7: 1 2 3 5 6 7
   • But this path does NOT: 1 2 3 4 5 6 7
• There may be many pairs, some not actually executable
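This sketch tracks def-use pairs dynamically, for the variable x only; y is left out to keep it short. Each definition of x records its node, and each use records the (definition, use) pair it observed. The node numbers follow the annotated program above. lastDefX, pairCovered, and the function names are hypothetical.

```c
#include <assert.h>

/* For x only: node of the most recent definition, and pairs seen so far. */
static int lastDefX;
static int pairCovered[8][8];   /* pairCovered[def][use] */

static void useX(int node) { pairCovered[lastDefX][node] = 1; }

/* The example program, instrumented with the node numbers above. */
int dataFlowExample(int w, int z) {
    int x, y, n;
    x = 3; lastDefX = 1;                /* node 1: Def(x) */
    y = 3;                              /* node 2: Def(y) */
    if (w) {                            /* node 3 */
        x = y + 2; lastDefX = 4;        /* node 4: Def(x), Use(y) */
    }
    if (z) {                            /* node 5 */
        useX(6); y = x - 2;             /* node 6: Use(x), Def(y) */
    }
    useX(7); n = x + y;                 /* node 7: Use(x), Use(y) */
    return n;
}

int xPairCovered(int def, int use) { return pairCovered[def][use]; }
```

Running w = 1, z = 1 (path 1 2 3 4 5 6 7) covers the pair (4, 7) but not (1, 7). Adding w = 0, z = 1 (path 1 2 3 5 6 7) covers (1, 7).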

Logic Coverage
• What if, instead of:

    if (x < y)
    {
        y = 0;
        x = x + 1;
    }

  we have:

    if (((a > b) || G) && (x < y))
    {
        y = 0;
        x = x + 1;
    }

[Graph: as before, node 1 leads to node 2 (y=0; x=x+1) and to node 3, but the edges are now labeled ((a > b) || G) && (x < y) and ((a <= b) && !G) || (x >= y)]
• Now, branch coverage will guarantee that we cover all the edges, but does not guarantee that we will do so for all the different logical reasons
• We want to test the logic of the guard of the if statement
Active Clause Coverage

( (a > b) or G ) and (x < y)

         (a > b)   G   (x < y)   predicate
    1       T      F      T          T
    2       F      F      T          F
    3       F      T      T          T
    4       F      F      T          F     (duplicate of row 2)
    5       T      T      T          T
    6       T      T      F          F

• Rows 1-2: with these values for G and (x < y), (a > b) determines the value of the predicate
• Rows 3-4: with these values for (a > b) and (x < y), G determines the value of the predicate
• Rows 5-6: with these values for (a > b) and G, (x < y) determines the value of the predicate
Input Domain Partitioning
• Partition scheme q of domain D
• The partition q defines a set of blocks, $B_q = \{b_1, b_2, \ldots, b_Q\}$
• The partition must satisfy two properties:
   1. Blocks must be pairwise disjoint (no overlap):
      $b_i \cap b_j = \emptyset, \quad \forall\, i \neq j,\ b_i, b_j \in B_q$
   2. Together the blocks cover the domain D (complete):
      $\bigcup_{b \in B_q} b = D$
[Figure: domain D divided into blocks b1, b2, b3]
• Coverage then means using at least one input from each of b1, b2, b3, ...
Input Domain Partitioning
• Some subtleties here…
• What’s wrong with this partition of file contents?
   • b1: Sorted ascending file
   • b2: Sorted descending file
   • b3: Neither sorted ascending nor sorted descending

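One way to probe this partition mechanically is to count how many blocks contain a given input. In this sketch an int array stands in for the file contents, and the function names are hypothetical. Completeness can never fail here, because b3 is defined as "everything else". Disjointness does fail: an all-equal file, a one-element file, and an empty file each land in both b1 and b2.

```c
#include <assert.h>
#include <stddef.h>

static int sortedAscending(const int *v, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (v[i - 1] > v[i]) return 0;
    return 1;
}

static int sortedDescending(const int *v, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (v[i - 1] < v[i]) return 0;
    return 1;
}

/* Number of blocks containing this input. A valid partition needs exactly
 * 1 for every input: 0 breaks completeness, 2 or more breaks disjointness. */
int blocksContaining(const int *v, size_t n) {
    int b1 = sortedAscending(v, n);
    int b2 = sortedDescending(v, n);
    int b3 = !b1 && !b2;
    return b1 + b2 + b3;
}
```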

Syntax-Based Coverage

• Based on mutation testing (a pet topic of Ammann and Offutt, who are heavily into this research area)
• A bit of a different kind of creature than the other coverages we’ve looked at
• Idea: generate many syntactic mutants of the original program
• Coverage: how many mutants does a test suite kill (detect)?

Mutating Our Buggy Program
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }

Mutant #1
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }

Mutant #2
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return 0;
    }

Mutant #3
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] != x)
                return i;
        }
        return -1;
    }

Mutant #4
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == n)
                return i;
        }
        return -1;
    }

Mutant #5: Wait, this one’s the fix!
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i >= 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }

Syntax-Based Coverage
[Figure: program P surrounded by the MUTANTS OF P]
• 100% coverage means you kill all the mutants with your test suite
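This sketch computes a mutation score for mutants #2 to #5 above against a small, hypothetical three-test suite. A mutant counts as killed when some test makes its output differ from the output of the original program P. Mutant #1 is left out: starting at i = n reads a[n], which is out of bounds and undefined behavior in C, so its output is not well defined.

```c
#include <assert.h>

/* P: the program under test (the off-by-one findLast). */
static int P(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--)
        if (a[i] == x) return i;
    return -1;
}
/* Mutant #2: return 0 instead of -1. */
static int M2(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--)
        if (a[i] == x) return i;
    return 0;
}
/* Mutant #3: == replaced by !=. */
static int M3(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--)
        if (a[i] != x) return i;
    return -1;
}
/* Mutant #4: x replaced by n in the comparison. */
static int M4(const int a[], int n, int x) {
    (void)x;
    for (int i = n - 1; i > 0; i--)
        if (a[i] == n) return i;
    return -1;
}
/* Mutant #5: > replaced by >= (the fix). */
static int M5(const int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--)
        if (a[i] == x) return i;
    return -1;
}

typedef int (*FindFn)(const int a[], int n, int x);
typedef struct { int a[3]; int n; int x; } TestCase;

static const TestCase suite[] = {
    { {3, 9, 4}, 3, 2 },   /* x absent */
    { {2, 9, 4}, 3, 2 },   /* x only at index 0 */
    { {3, 9, 4}, 3, 4 },   /* x at the last index */
};

/* Counts mutants killed by the first numTests tests of the suite. */
int mutantsKilled(int numTests) {
    static const FindFn mutants[] = { M2, M3, M4, M5 };
    int killed = 0;
    for (int m = 0; m < 4; m++) {
        for (int t = 0; t < numTests; t++) {
            const TestCase *tc = &suite[t];
            if (mutants[m](tc->a, tc->n, tc->x) != P(tc->a, tc->n, tc->x)) {
                killed++;
                break;
            }
        }
    }
    return killed;
}
```

The first two tests kill three of the four mutants; mutant #4 survives until the third test is added. Since mutant #5 is the corrected program, any test that kills it is also a test on which P fails.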

Generation vs. Recognition
• Generation of tests based on coverage means producing a test suite to achieve a certain level of coverage
   • As you can imagine, generally very hard
   • Consider: generating a suite for 100% statement coverage easily reaches “solving the halting problem” level
   • Obviously hard for, say, mutant killing
• Recognition means seeing what level of coverage an existing test suite reaches

Coverage and Subsumption
• Sometimes one coverage approach subsumes another
   • If you achieve 100% coverage of criterion A, you are guaranteed to satisfy B as well
   • For example, consider node and edge coverage
      • (There’s a subtlety here, actually: can you spot it?)
• What does this mean?
   • Unfortunately, not a great deal
   • If test suite X satisfies “stronger” criterion A and test suite Y satisfies “weaker” criterion B
      • Y may still reveal bugs that X does not!
   • For example, consider our running example and statement vs. branch coverage
   • For one thing, it means we should take coverage with a grain of salt

Testing “for” Coverage
• Never seek to improve coverage just for the sake of increasing coverage
   • Well, unless it’s a command from on high
• Coverage is not the goal
   • Finding failures that expose faults is the goal
   • No amount of coverage will prove that the program cannot fail

“Program testing can be used to show the presence of bugs, but never to show their absence!” – E. Dijkstra, Notes On Structured Programming

The Purpose of Testing
“Program testing can be used to show the presence of bugs, but never to show their absence!” – E. Dijkstra, Notes On Structured Programming

• Dijkstra meant this as a criticism of testing and an argument in favor of more disciplined and total approaches (proving programs correct)
• But he also points out what testing is good for: exposing errors
• Coverage is valuable if and only if test sets with higher coverage are more likely to expose failures

The Purpose of Testing
“Program testing can be used to show the presence of bugs”

• When we first start “testing,” we often want to “see that the program works”
   • Try out some scenarios and watch the program “do its stuff”
   • Surprised (annoyed) when (if) the program fails
   • This is not really testing: testing is not the same as a demonstration
   • Aim to break (your) code, if it can be broken
Levels of Testing
• Adapted from Beizer by Ammann and Offutt
   • Level 0: Testing is debugging
   • Level 1: Testing is to show the program works
   • Level 2: Testing is to show the program doesn’t work
   • Level 3: Testing is not to prove anything specific, but to reduce the risk of using the program
   • Level 4: Testing is a mental discipline that helps develop higher-quality software

What’s So Good About Coverage?
    int findLast (int a[], int n, int x) {
        // Returns index of last element
        // in a equal to x, or -1 if no
        // such. n is length of a
        int i;
        for (i = n-1; i >= 0; i--) {
            if (a[i] == x)
                return i;
        }
        return 0;
    }
• Consider a fault that causes a failure every time the code is executed (here, return 0; where the specification requires -1)
• Don’t execute the code: you cannot possibly find the fault!
• That’s a pretty good argument for statement coverage

What’s So Good About Coverage?
    int findLast (int a[], int n, int x) {
        // Returns index of last element
        // in a equal to x, or -1 if no
        // such. n is length of a
        int i;
        for (i = n-1; i >= 0; i--) {
            if (a[i] == x)
                return i;
        }
        return 0;
    }
• We should have an argument for any kind of coverage:
   • “If I don’t cover this, then there is more chance I’ll miss a fault like that”
   • Backed with empirical data, preferably!

Return to Our Example
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }
• Let’s write a tester for this version of the program (back to the first off-by-one bug)
• Forget for a moment that we know what the bug is!

Return to Our Example
    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such.
        // n is length of a
        int i;
        for (i = n-1; i > 0; i--) {
            if (a[i] == x)
                return i;
        }
        return -1;
    }
• What kind of coverage might we want to think about when testing this code?

Return to Our Example
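    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define N 5 // 5 is "big enough"?
    int findLast (int a[], int n, int x); // the version under test

    // Hypothetical helpers (the slide leaves them undefined)
    void random_assign (int a[], int n) {
        for (int k = 0; k < n; k++) a[k] = rand() % 10; // values 0..9
    }
    void print_array (int a[], int n) {
        for (int k = 0; k < n; k++) printf(k ? ", %d" : "%d", a[k]);
    }

    int testFind () {
        int a[N];
        int p, i;
        for (p = 0; p < N; p++) {
            random_assign(a, N);
            a[p] = 3;
            for (i = p + 1; i < N; i++) { // remove every 3 after p,
                if (a[i] == 3)            // so a[p] is the last 3
                    a[i] = a[i] - 1;
            }
            printf("TEST: findLast({");
            print_array(a, N);
            printf("}, %d, 3)\n", N);
            assert(findLast(a, N, 3) == p);
        }
        return 0;
    }
• What kind of coverage does this tester exploit?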

