
Assertive Debugging: Correcting Software As If We Meant It

Debugging is an art that needs much further study …. The most effective debugging
techniques seem to be those which are designed and built into the program itself—
many of today’s best programmers will devote nearly half of their programs to
facilitating the debugging process on the other half; the first half … will eventually
be thrown away, but the net result is a surprising gain in productivity.

—Donald Knuth, The Art of Computer Programming, vol. 1, page 189.

Introduction

As Don Knuth implies, debugging is a much-neglected subject, and we are paying a terrible
price for that neglect. There has been little progress in debugging methodology in half a
century,1 with the result that projects everywhere are bogged down because of buggy
software. The price in lost time and wasted resources, when the projects are commercial,
must run into the billions; the price when the projects are military is paid not only in dollars
but in lives. This situation is intolerable; new ideas and approaches must be found. This paper
offers one such new approach.

This is a proposal for the adoption of a new system for debugging software, the Assertive
Debugging System, which would transform debugging from a minor art form to a modern
industrial process. It is an exploitation of a very old idea—assertions seem to have been
suggested first by John von Neumann in 19472—but it does something with assertions that
neither he nor anyone else, to my knowledge, has proposed, much less done: it uses them
systematically and exhaustively rather than as ad hoc tools that are employed only when the
programmer remembers them and feels like using them. In doing so, it transforms assertions
from an idea that has been floating around for half a century without achieving much, into a
technology that could effect a revolution in program development. And unlike the methods
Knuth had in mind, it does not throw away that part of the program devoted to debugging,
but preserves it as valuable documentation of the state of the subject program, and for later
re-use when that program is modified.

Buggy Software: the Major Bottleneck

It would be nearly impossible to find a scientific or engineering project these days that isn’t
dependent on computing, and almost as hard to find one that wasn’t slipping its schedule
because of buggy software, so the debugging problem is a critical one for nearly all our
projects. The penalties we pay for buggy software are high already, causing loss of business
due to customer dissatisfaction and lateness in bringing products to market, but will be getting
much higher as computers are used increasingly for critical applications—mission-critical and
even life-critical. In such applications, being able to take, and prove you have taken, serious
debugging measures is going to become much more important, even legally required. For
applications on which so much depends, today's half-hearted gestures at debugging will no
longer be acceptable. The Assertive Debugging System (ADS) proposed here represents an

Mark Halpern 3/16/2018 Page 1 of 10


approach to program debugging that directly addresses these issues: it enables developers to
shorten the debugging process, and it supports the systematic and documentable debugging of
software objects that I contend will soon be required just to stay in business—perhaps
required just to stay out of jail.

Today’s Debugging Technology: Rounding Up the Usual Suspects

The most remarkable thing about debugging today is how little it differs from debugging at the
dawn of the modern computing age, half a century ago. We still do it by letting a faulty
program run up to what we conjecture is a critical point, then stop execution and look at the
state of what we think are the key variables. If one of these variables differs in value from
what we expected, we try to understand how it could have assumed that value. If we can't
understand where it went wrong, we repeat the process, stopping at some earlier point. After
an unpredictable number of iterations of this process, we finally stop the program close
enough to the location of the bug, and the standard revelation occurs: we find that we have
forgotten to reset some counter, or flush some buffer, or allow for the overflow of some data
structure, or committed one of the other half-dozen classic programming errors. This is how
software was debugged in the mid '50s, and how it is debugged today. It is a process that will
always, if time and customer patience permit, eventually find the bug that’s troubling you—
but, usually, only that particular occurrence of it, and only after a debugging effort of
unpredictable length, and without leaving anyone the wiser about the program being
debugged, or about how to find other such bugs.

What is a bug?—a taxonomy of computing problems

To make clear which are the really troublesome bugs, the ones that ADS is meant to deal with,
I offer here a high-level taxonomy of software problems in general, with estimates of their
relative gravity. There is nothing original in it; all it does is to gather and organize some
common truths, and put them in a form convenient for understanding ADS. Only programmer
errors are considered here—problems due to hardware failure, operator error, or other
conditions not under the programmer's control are not nearly so difficult to deal with, nor so
serious a problem:

1. Algorithm design errors. The programmer (or his client) has misunderstood the
problem, and hence the way to solve it; consequently, his algorithm, even if
implemented perfectly, will not work. For example, he may be trying to compute the
orbit of an artificial Earth satellite on the assumptions that the planet is perfectly
spherical and uniform in density. His error has nothing to do with computing, but
rather with his or his client’s understanding of the problem he is dealing with.

2. Program design errors. The programmer’s understanding of his problem, and his
approach to solving it, are correct, but he has blundered in designing a program to
implement that solution. For example, he has failed to realize that the program he is
expecting the computer to execute would take longer to run than the expected lifetime
of the universe, or does not cover all the conditions that are logically possible. Such
an error is computer-related; it reflects a defect in the programmer's understanding—
not of any specific computer or programming language, but of computing in general.

3. Program implementation errors. Both the programmer’s understanding of his
problem and his program design are correct, but he has erred in specifying the
instructions to be executed by his computer. Of this type, there are two varieties:

a. Formal or syntactic errors. His program has violated a rule imposed by his
program-development tools—but the violation is of the type caught by those
tools.

b. Substantive or logic errors. His program compiles but does not run to
completion, or runs but yields bad output. He has either made some
mechanical error (e.g., a typo), or a formal error of a type not caught by the
development system, or—and this is the critical type—an error in detailed
program logic; e.g., neglected to flush a buffer, or written beyond the end of a
data area. (Conceivably, he may have encountered a bug in his program-
development tools. This, of course, is not his fault, but it is the fault of a
programmer—another programmer.)

Type 1 errors have nothing to do with computing; they are just plain old ignorance,
carelessness, or stupidity, for which no general remedy is known. Type 2 errors are computer-
related, but are not particularly troublesome; they are so gross that they are usually found
early in the program’s design stage, and they are relatively uncommon. Type 3a are already
reasonably well handled—most modern program-development systems detect all the common
syntactic errors, and closely pinpoint them; sometimes they can even fix them, as the program
used to compose this article silently changes hte to the.

The Really Dangerous Bugs

It is Type 3b errors that are the real villains: easy to introduce, hard to notice, and patient in
waiting for the worst possible moment to manifest themselves. The reason they are so great a
problem is that they are so trivial, so inconspicuous, so hard to focus on. Type 3b bugs—
henceforth just "bugs"—are dangerous precisely because they are seldom immediately
troublesome; very often, a program infected with them is asymptomatic until it crashes
disastrously or yields obviously faulty output—they often let programs run with no sign of
trouble long after they have in fact corrupted the results. By the time it is evident that
something is wrong, much has happened to delete or corrupt the evidence needed to
determine just where the problem originated; hence the long and painful period of
backtracking that the debugging process almost always begins with.
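A small, hypothetical example may make the pattern concrete. The function below is invented for this illustration (it comes from no real project); it contains exactly the sort of Type 3b bug just described: an accumulator that is never reset between batches. The program neither crashes nor complains; it simply delivers corrupted results from the second batch onward.

```python
# Hypothetical illustration of a Type 3b bug: the accumulator `total`
# is initialized once, outside the loop, instead of once per batch,
# so every batch after the first is silently contaminated by earlier
# data. Nothing crashes; the program runs with no sign of trouble
# while producing corrupted results.

def batch_averages(batches):
    averages = []
    total = 0.0                    # bug: should be reset inside the loop
    for batch in batches:
        for value in batch:
            total += value
        averages.append(total / len(batch))   # wrong from batch 2 onward
    return averages

print(batch_averages([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]))
# the first batch averages correctly to 2.0; the second, identical
# batch reports 4.0 because stale state leaked into it
```

Note how plausible the wrong output looks: by the time anyone notices, the evidence of where the corruption began is long gone.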



What is needed, then, to deal with the debugging problem is some way to make bugs manifest
themselves quickly, so as to give us the earliest possible warning of their existence, and let us
take action before continued program execution can obliterate their traces. Ideally, we would
like bugs to become so blatant that their presence can be detected even before they have
acted; we want to catch them when they are just about to do their dirty work. That is what
ADS is designed to do.

How Assertive Debugging Works

The way to catch bugs while they’re fresh and out in the open is by monitoring at run time the
behavior of virtually every variable in the program, looking for violations of assertions about
their behavior made by the programmer when he defined them. “Variable” means here not just
those quantities a mathematician would think of and label as such, but any program construct
any of whose properties change in a predictable way, either absolutely or relative to some
other program construct. Among these would be the numeric variables that specify how often
a loop is to be traversed, how many characters a buffer can hold before it is to be written out,
how many states a switch can assume, and so on; they define, collectively, the route the
program is meant to follow. It is the major premise of ADS that no bug can take effect
without very soon causing some variable to violate a constraint, and that if such violations are
systematically detected, virtually every bug will cause an alarm to sound while it is still
"fresh," easily found and understood.
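As a rough sketch of what such monitoring might look like, the fragment below wraps a variable in a hypothetical Monitored class that checks programmer-asserted constraints (minimum, maximum, step size) at every change of value. The class name and constraint vocabulary are invented for this sketch; ADS as proposed would express assertions as an extension of the source language rather than as a library.

```python
# A minimal, hypothetical sketch of ADS-style run-time monitoring.
# Every change of value is checked against the constraints the
# programmer asserted when the variable was defined; the first
# violation halts execution at the point of the anomaly.

class AssertionViolation(Exception):
    pass

class Monitored:
    def __init__(self, name, value, lo=None, hi=None, step=None):
        self.name, self.lo, self.hi, self.step = name, lo, hi, step
        self._value = None
        self.set(value)

    def set(self, new):
        if self.lo is not None and new < self.lo:
            raise AssertionViolation(f"{self.name}: {new} below minimum {self.lo}")
        if self.hi is not None and new > self.hi:
            raise AssertionViolation(f"{self.name}: {new} above maximum {self.hi}")
        if self.step is not None and self._value is not None \
                and new - self._value != self.step:
            raise AssertionViolation(
                f"{self.name}: changed by {new - self._value}, asserted step {self.step}")
        self._value = new

    def get(self):
        return self._value

# The loop counter is asserted to run from 0 to 10 in steps of 1;
# a buggy double increment is caught on the very assignment that
# violates the constraint, not thousands of instructions later.
i = Monitored("i", 0, lo=0, hi=10, step=1)
try:
    while i.get() < 10:
        i.set(i.get() + 2)       # bug: wrong step size
except AssertionViolation as e:
    print("halted:", e)
```

The point of the sketch is the timing: the alarm sounds on the first bad assignment, while the bug is still "fresh."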

Note that this approach differs radically from that of almost all academic approaches to the
debugging problem, which tend to the abstract and formal: they typically try to identify the
sources and causes of bugs, and to create ever more detailed and precise classifications of
bugs. They thereby run the danger of becoming studies of programmer psychology and
sociology rather than providers of immediately applicable tools and procedures for the
elimination of bugs. ADS is founded on the view that the root causes of bugs are deep-rooted
human weaknesses – ignorance, laziness, forgetfulness, and the like – that are not likely to be
abolished any time soon, and that the attempt to build a complete phylogeny and classification
of bugs is neither needed nor possible.

On the positive side, ADS is founded on the view that all bugs, no matter what human
weakness caused them, nor where they fit into some taxonomy of bugs, have the same result,
if they have any result at all: they cause the execution of the wrong code, or of code at the
wrong time. (If the bug is caused by an act of omission – the failure to reset a counter, say,
before re-cycling through an array – it is the code that depends on the resetting of that counter
that is now the “bad” code, the code that actually causes abnormal program results, even
though it is not itself buggy.) And the execution of such bad code, given the assertions that an
ADS program contains, will virtually always quickly violate a programmer-defined constraint,
thereby signaling its existence and at least approximate location.
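The counter-reset example can be made concrete. In the hypothetical fragment below, the code that writes into the buffer is itself correct; it becomes the "bad" code only because it depends on a reset that was omitted. A bound asserted on the index flags the violation at the precise moment of the first out-of-range use.

```python
# Hypothetical act-of-omission bug: `pos` is not reset before the
# buffer is cycled through a second time. The write itself is correct
# code; it misbehaves only because it depends on the missing reset.
# The asserted bound on `pos` catches the violation as it happens.

def fill_twice(buf):
    pos = 0
    for _ in range(2):               # two passes over the same buffer
        # bug: `pos = 0` belongs here, at the top of each pass
        for value in range(len(buf)):
            assert 0 <= pos < len(buf), f"pos={pos} outside buffer of {len(buf)}"
            buf[pos] = value
            pos += 1

try:
    fill_twice([None] * 4)
except AssertionError as e:
    print("violation caught:", e)    # raised on the first write of pass 2
```

Without the assertion, the second pass would write past the intended region and the symptom would surface far from its cause.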

Think of a source-language processor as a funnel into which a program containing a great
variety of newly-generated bugs has been introduced. The bugs with which the program is
infested will in general be the result of a variety of human failings, and the compiler or
interpreter into which this buggy program has been fed has no way of detecting them by
applying some theory of why and how programs contain bugs – and would not profit much
even if its theory about why programmers create bugs were correct. But what the language
processor can do, given ADS, is to concentrate on the funnel at its narrowest point, where the
inputs fed into it, no matter what their origin, are reduced to one kind of stuff: executing code
– at which stage it becomes easy to catch the buggy code, because it’s doing something
known to be wrong.

Changing the metaphor to illuminate ADS from another angle, the ADS strategy is analogous
to that of the U. S. Navy during the Cold War: they wanted to know the location of all
Russian submarines as they ventured out from their ports into the open ocean. The Navy did
this not by attempting the impossible task of patrolling and searching every square mile of the
Atlantic and Pacific oceans; they simply noted that all Russian subs, in making their transit
into the open sea, had to pass through one or another of a very small number of narrow
passages – choke points, as they were dubbed – and that close monitoring of those choke
points would make it virtually impossible for any of those subs to make the transit without
being spotted by Western observers, automated or human. The monitoring was chiefly sonic –
the sound of a submarine was well known, sometimes to the point where the identity of the
specific boat whose passage was being monitored could be deduced. One of the chief factors
making this feat possible – besides the relative noisiness of Cold War-era Russian submarines
– was the U.S. Navy’s throrough knowledge of the natural sounds to be found in the waters
being monitored; these could be electronically “subtracted” from the total collection of sounds
being heard at any moment, leaving only the unnatural, man-made sounds, such as those made
by a transiting submarine. This “subtraction of the uninteresting” is analogous to what is
accomplished by assertions; they define the proper, wanted course of the program, so that
violations stand out bold and clear.

To vary the metaphor yet again, the rigorous and systematic testing by ADS of assertions
throughout program execution amounts to erecting walls on both sides of the very
narrow path that a program must take if its results are to be correct, so that the slightest
deviation from that path causes an almost immediate collision between the running program
and some assertion. Consequently, something valuable is learned from every execution-time
failure: a bug is found (or at least its hiding area is narrowed down significantly), or a
programmer’s misconception is uncovered.

Using ADS

For each of his program constructs, the programmer asserts at definition time all the
constraints on its behavior he can think of. The possible constraints include the following;
others will doubtless suggest themselves as experience in the use of ADS grows:

• its maximum and minimum size
• the step size by which it will vary
• whether it is cyclic, monotonic, or 'random' – i.e., data-dependent – in the sequence of
values it can assume
• the relationships it bears to one or more other variables
• an explicit list of the values it may take on, or those it may not
• for a pointer or link, the type of construct it can point to
• the time constraints, if any, on any portion of the program, either in absolute units of
time (e.g., milliseconds) or in relation to some other event (e.g., before the execution
of some specified instruction). Note that this makes it possible for ADS to debug “real
time” software, where certain tasks must be accomplished within specified periods.

These assertions are expressed in a notation that is a natural extension of the source language
the programmer uses, and they may be grouped in various ways, so that the programmer can
activate or deactivate sets of related assertions with one command.
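Since no ADS notation yet exists to quote, the following sketch improvises a small registry to suggest how definition-time constraints and one-command activation of assertion groups might behave. The names `constrain`, `activate`, and `check` are invented for this illustration; an actual ADS would weave such declarations into the source language itself.

```python
# Hypothetical sketch of definition-time constraints grouped so that
# related assertions can be switched on or off with one command.

CONSTRAINTS = {}        # variable name -> list of (group, predicate, description)
ACTIVE_GROUPS = set()

def constrain(var, group, predicate, description):
    """Assert a constraint on `var` at definition time."""
    CONSTRAINTS.setdefault(var, []).append((group, predicate, description))

def activate(group):
    """One command activates a whole set of related assertions."""
    ACTIVE_GROUPS.add(group)

def check(var, value):
    """Run only the activated constraints; halt on the first violation."""
    for group, predicate, description in CONSTRAINTS.get(var, []):
        if group in ACTIVE_GROUPS and not predicate(value):
            raise AssertionError(f"{var}={value!r} violates: {description}")

# Constraints stated when the variables are defined, while the
# programmer's intentions for them are freshest:
constrain("retries", "bounds", lambda v: 0 <= v <= 5, "0 <= retries <= 5")
constrain("state", "values", lambda v: v in {"idle", "busy", "done"},
          "state in {idle, busy, done}")

activate("bounds")
check("retries", 3)          # passes: within asserted bounds
check("state", "exploded")   # passes silently: group "values" not active
activate("values")
# check("state", "exploded") would now raise AssertionError
```

Deactivated groups cost nothing at run time, which is what lets the programmer dial the degree of monitoring up or down between runs.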

At each compilation of a subject program, the activated assertions generate into the object
program code that can be used to check the variables to which they apply, at every change of
value, for violations of any of the constraints so imposed ("can be used" because not every
test need be, or normally would be, executed every time). When the monitoring code detects
that any variable has violated (or in some cases, is about to violate) an assertion, it halts
execution of the program, and takes the exception action specified by the programmer.

At this point the programmer using ADS is in a very different position from that of the
programmer of today whose program has stopped at a breakpoint. When ADS stops an
execution, it is not because the program has just come to some point where the programmer
hoped that an examination of some of his variables will reveal something; it is at a point—
which may be far earlier or far later than the point at which that programmer would have
inserted a breakpoint—where an anomaly has definitely been detected, and almost certainly
detected very close to its origin. Nor is the programmer limited, when ADS reports such an
event, to the kind of hit-or-miss search that his counterpart today typically performs when at a
conventional breakpoint; if the bug is not immediately apparent, the ADS user’s next step is to
rerun his program with a greater degree of monitoring enabled for all code dynamically prior
to the point at which the anomaly was detected, so as to catch it at an even earlier moment.

There is unfortunately no practical way to demonstrate the validity of these claims for ADS
short of building and using it, but a thought-experiment can be helpful, if not conclusive.
Draw on your experience with a real bug that you’ve recently encountered, or create an
imaginary bug based on experience. Recreate on paper the state of the program variables just
before the faulty code caused the first deviation from correct behavior, but with assertion
checking, as just described, enabled. Consider, that is, that every variable—every predictably
varying construct—in the program was being monitored at every change of value for violation
of any of the conditions you would have specified if you’d been using ADS. See how close to
the point where the bug first manifests itself ADS would have stopped and raised a red flag—
and imagine how much easier it would be to pinpoint that bug with such help than without it.
In almost all cases, I think you will find that the difference between conventional debugging
and ADS debugging is so great as to amount to a difference in kind.
The Cost of Assertive Debugging

Most programmers exposed to the idea seem to agree that ADS would enable them to find
bugs much more quickly, but many protest that the cost would be prohibitive; program
execution, burdened with all that run-time checking, could cost hundreds of times more cycles
than ordinary execution. Many also quail at the thought of supplying all the assertions that
would enable ADS to rigorously monitor the execution of the program. These are not
unreasonable concerns, but they are more imposing at first glance than after a hard look.

To arrive at a true estimate of the cost of the ADS approach, the main requirement is to make
sure that it is being compared against a thoroughly realistic estimate of the cost of the
alternative, the present method of debugging. Present-day debugging runs often yield little or
no knowledge about the bug being sought, and the cost of those runs, in the full sense of cost,
must be taken into account. With ADS, on the other hand, every execution results in useful,
documented knowledge; either it finds a violation of an assertion, or it runs to completion,
reporting that with respect at least to the assertions activated, and the range of data involved,
the program is bugfree. Even if ADS reports a violation, but it turns out that the program
code is correct, and it is the programmer’s assertion that was mistaken, something of value is
learned. In fact, it may be that the information gained in that case is the most valuable of all; it
is not an individual bug that has been found, but a misconception in the programmer's mind
about his program—something even more important to correct.

Note, too, that the cost of ADS is almost entirely in machine cycles; what it saves is project
schedule slippage, software-engineer time, and time-to-market. In short, it preserves assets
that are growing ever more expensive, and it does so by spending assets that are growing ever
cheaper, and are already so cheap as to be in many cases not even worth metering. And this
comparison, of course, does not even take into account the most important of all the benefits
that ADS promises, which are not measurable: programs that work correctly while controlling
such things as the amount of radiation a cancer patient is receiving and the way an airliner
responds to a warning of a possible mid-air collision or loss of pressurization, or the accuracy
of an ABM missile as it seeks an incoming ICBM.

As for the burden imposed on the programmer, note that the formulation of assertions is done
as he declares the variables and data structures they refer to; that is, at the moment when his
intentions for them are clearer in his mind than they will ever be again. Present-day
implementation systems already require him to give full static definitions of his variables and
structures; ADS requires him to add explicit statements about how they’re allowed or
forbidden to change at run-time. In effect, the user does a lot of his debugging at the ideal
time: when he is not under pressure to get a specific bug fixed to make a specific delivery
date, and when his mind is clear and his knowledge fresh.

ADS and Conventional Debugging: Philosophies Compared


The great difference between debugging with ADS and with conventional tools is that ADS,
once primed, takes the initiative and, within the limits set for it by its user, does a completely
systematic job. In conventional debugging, the system in effect says to the user, "I know of
no reason to stop execution at this point, but you have ordered a halt here by setting a
breakpoint, so here's a window through which you can look at whatever variables you, in your
present state of knowledge, think might be relevant to your bug hunt. If an anomaly does
exist in the current state of your program, you are responsible for recognizing it; I wouldn't
know an anomaly if I tripped over one.”

ADS, by contrast, says, "Way back in program design and development days, you told me
what you meant by an anomaly for each of many of the variables and other constructs in this
program; more recently, you told me which of these anomalies I was to keep looking for, and
what I was to do when I found one. I have now found one, and am reporting as instructed.
The details are as follows:..." With ADS, the software engineer does the planning, the
debugging system does the heavy lifting. The difference between the two could spell the
difference between watching every project slip because of buggy software, and turning
programming into a respectable and reliable industrial discipline.



1 Compare the opening words of Andreas Zeller, “Automated Debugging: Are We Close?” IEEE Computer (November
2001), pages 2-7: “For the past 50 years, software engineers have enjoyed tremendous productivity increases as more
and more tasks have become automated. Unfortunately, debugging—the process of identifying and correcting a
failure’s root cause—seems to be the exception, remaining as labor-intensive and painful as it was five decades ago.”
I take special pleasure in quoting these words of Zeller’s, because my claim that little has been accomplished with
respect to debugging over the last fifty years has occasioned some resentment on the part of those who have been
working in that area, and they have more than once referred me to Zeller to show me how wrong I am.

Addendum: Jack Ganssle, “Twenty years on,” Embedded Systems Design (January 2008), pp. 45-48, says at page 46
“Oddly, debuggers were [ca. 1988] in some ways better than those today.”
2 Herman H. Goldstine, The Computer from Pascal to von Neumann (Princeton U.P., 1972), page 268.

Mark Halpern is a programmer and software designer whose experience goes back to the days when Fortran was the latest
thing in computing systems. An account of his career, and other writings of his, can be found at his Web site,
www.rules-of-the-game.com/. He welcomes comments; send them to markhalpern@iname.com.

possibly useful metaphors:

• bugs have to pass through chokepoints, like Russian warships. Whatever type or level a bug may be, it must, if it
is to have any effect on the program, manifest itself in erroneous code, and erroneous code must soon cause some
variable to violate some condition established by systematic use of assertions.
• bugs have to run a gantlet
• bugs have to negotiate a maze or labyrinth

Three ways a program can end if it’s buggy:

• program terminates short of correct end point (aborts, crashes)
• program terminates only when human intervenes (loops)
• program terminates at correct point, but output not as desired

Housman’s likening (in The Application of Thought to Textual Criticism) of an editor searching for textual problems to a
dog searching for fleas can usefully be applied to a programmer searching for bugs:

“… textual criticism is not a branch of mathematics, nor indeed an exact science at all. It deals with a matter not rigid and constant, like
lines and numbers, but fluid and variable; namely the frailties and aberrations of the human mind, and of its insubordinate servants, the
human fingers. It therefore is not susceptible of hard-and-fast rules. It would be much easier if it were; and that is why people try to pretend
that it is, or at least behave as if they thought so. Of course you can have hard-and-fast rules if you like, but then you will have false rules,
and they will lead you wrong; because their simplicity will render them inapplicable to problems which are not simple, but complicated by
the play of personality. A textual critic engaged upon his business is not at all like Newton investigating the motions of the planets: he is
much more like a dog hunting for fleas. If a dog hunted for fleas on mathematical principles, basing his researches on statistics of area and
population, he would never catch a flea except by accident. They require to be treated as individuals; and every problem which presents itself
to the textual critic must be regarded as possibly unique.”
