
It's been a rough couple of weeks.

Not only did I have all sorts of catching-up to do after Code PaLOUsa, but it also happened to be release week. And oh, do I hate release week.
Don't get me wrong, I'm just as excited as anyone else when a new release of BuildMaster comes out (it was release 2.3, in case you were wondering), but new releases mean testing. And fixes. And more testing. And still more testing. And oh, do I hate testing.
As I worked my way through the drudgery that was release week, I spent a lot of time thinking about testing. How did I get stuck on the test team? Why didn't I call in sick today? Can't we get someone else to do this? There were even a few times I asked the fundamental and basic question: what is the whole point of testing software in the first place? While I never found answers to most of those questions, the last one has a very simple answer.
Testing is performed to reduce the risk of introducing defects in production.
And that's it. Of course, you can do other things while testing (such as Test-Driven Development or Training through Testing), but then again, you can overload just about any activity. But when the primary purpose of an overloaded activity is no longer necessary (commuting sixty miles, for example), the secondary purposes are often achieved in a more efficient manner: you don't need to drive a car in order to listen to the radio.
Types of Testing
In my career, I've heard of dozens of different types of tests and testing techniques, but when you look at things as a whole, it's relatively simple. There are exactly five categories of tests that can be performed on software, and they're generally performed in sequential order.
1. Integration Testing – formal or informal testing that is generally performed by the developer(s) responsible for the changes; this serves as a verification that the changes are integrated into the larger application and are ready for functional testing (see the code sketch after this list)
2. Functional Testing – formal test scripts (i.e., documents containing test cases, or step-by-step guides to verify functional requirements) are executed by testers
3. Acceptance Testing – formal or informal testing to ensure that functional requirements as implemented are valid and meet the business need
4. Quality Testing – formal or informal testing to ensure that non-functional requirements (regulatory, performance, etc.) are met
5. Staging Testing – verification that the software can be deployed to an environment that matches the production environment
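To make the first category concrete, here is a minimal sketch of the kind of check a developer might run before handing changes off for functional testing. The DiscountPolicy and OrderCalculator classes, and the whole scenario, are hypothetical and invented purely for illustration; the point is only the distinction between checking a single piece and checking that the pieces still fit together.

import unittest


class DiscountPolicy:
    """Hypothetical component: decides the discount rate for an order total."""

    def rate_for(self, total):
        return 0.10 if total >= 100 else 0.0


class OrderCalculator:
    """Hypothetical component: computes an order total using a DiscountPolicy."""

    def __init__(self, policy):
        self.policy = policy

    def total(self, prices):
        subtotal = sum(prices)
        return subtotal * (1 - self.policy.rate_for(subtotal))


class DeveloperTests(unittest.TestCase):
    def test_discount_policy_alone(self):
        # Intra-component check: one piece, in isolation.
        self.assertEqual(DiscountPolicy().rate_for(50), 0.0)

    def test_calculator_with_policy(self):
        # Inter-component check: the pieces still work once integrated.
        calc = OrderCalculator(DiscountPolicy())
        self.assertAlmostEqual(calc.total([60, 60]), 108.0)


if __name__ == "__main__":
    unittest.main()

Note how the second test is really a small integration check: it only passes if the two pieces still fit together, which is exactly the verification this first category is after.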
Every type of test fits into one or more of these categories. Automated unit tests, for example, are generally considered to be a type of Integration Testing, as they test inter- and intra-component integration. Guerrilla testing (i.e., clicking on a bunch of things in no particular order, hoping to find something that's broken) is generally a form of acceptance testing, but it's chaotic enough that you could count it as integration or quality testing, too. But regardless of the category, testing as a whole is performed to give a "good enough" answer to the following questions:
1. Does the program function?
2. Does the program functionality meet the requirements?
3. Do the requirements meet the need?
4. Does the program meet quality standards?
5. Can the program be deployed?
I say "good enough" because no matter how hard you try, a definitive answer is impossible. At best (i.e., with unlimited resources), you can be 99.999% confident that there will be no defects in production.
The reason that no amount of testing can provide 100% certainty goes back to a problem as old as Plato's Republic, famously phrased by Juvenal as quis custodiet ipsos custodes?, or "who will guard the guardians?" The tests themselves can be flawed and allow otherwise detectable defects to go to production. While one could certainly test their tests, those test tests would face a similar problem. As would the test test tests. And the test test test tests, ad infinitum.
Like many things that converge on perfection, the costs increase significantly as you approach 100%. A five-minute smoke test may only provide 40% certainty, but it may cost five hours of testing to achieve 60%, and fifty hours to achieve 80%.
An Inherent Risk
Because no amount of testing can prevent all defects, there is always a risk to making changes. You might not think that simply changing the label next to a text field could cause anything to go wrong, but it has happened before (I've seen it firsthand), and it will happen again. It doesn't matter what the source of the defect is (code, deployment, configuration, etc.); the fact is that the defect was introduced as the end result of a change.
The only way to completely avoid the inherent risk of change is to avoid change altogether, but that's about as feasible an option as never leaving the house to avoid getting hit by a bus. As important as it is to reduce the risk of defects through testing, it's equally important to consider the remaining, untestable risk factors:
1. Change Impact – the estimated scope of a given change; this varies from change to change and, like testing, is always a "good enough" estimate, as even a seemingly simple change could bring down the entire system
2. Severity of Defect – the impact of a defect on the overall system; this is treated as a constant for a given system, since there is no way of knowing in advance how severe any particular defect might be
The risk of change, therefore, is a function of three factors:
{Change Impact} × {Severity of Defect} ÷ {Thoroughness of Testing}
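To see how the three factors trade off, here is a back-of-the-envelope sketch of that formula in code. The numeric scales (1 to 10 for impact and severity, a 0-to-1 fraction for thoroughness of testing) are my own assumptions for illustration only; the result is just a relative score.

def risk_of_change(change_impact, defect_severity, testing_thoroughness):
    """Relative, unitless risk score: higher means riskier.
    Assumes at least some testing is performed (thoroughness > 0)."""
    return (change_impact * defect_severity) / testing_thoroughness


# A small cosmetic tweak to a low-stakes system, given a quick once-over:
print(risk_of_change(change_impact=2, defect_severity=3, testing_thoroughness=0.4))    # 15.0

# A modest change to a safety-critical system, tested near-exhaustively:
print(risk_of_change(change_impact=3, defect_severity=10, testing_thoroughness=0.99))  # ~30.3

Even near-exhaustive testing only divides the risk down; it never zeroes it out, which is why the other two factors matter just as much as the testing itself.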
We're generally pretty good at balancing these three factors, at least when it comes to computer-related changes. Network operations will generally just implement a DNS change without reproducing an entire network infrastructure to make sure the change won't cause any problems, yet I doubt you would bat an eye if the Mars Rover team tested commands before sending them using a replica rover sitting on a pile of replica Mars rocks.
But oftentimes, our risk management of software-related changes is a little out of balance.
The Weakest Link
Every now and then, I'll talk to a developer who will proudly proclaim, "we've finally achieved 100% code coverage!"
For those unaware, that metric refers to the fact that every single line of code in a codebase is executed by an automated unit test. It's the Diebold XL2400 Bank Vault Door of unit testing, complete with 16-inch-thick stainless steel cladding and a time-sensitive lock. And like any impenetrable entryway, it's only as secure as its weakest link. Installing one next to a paned window would render it entirely useless.
The same rule applies to those iron-clad code coverage metrics. Who cares if there's 100% code coverage when a unit test has a defect in it? Or if the requirements were misunderstood by the developer? Or if the requirements were wrong? Or if it's not PCI compliant? Or if it breaks when it gets deployed to production?
It doesn't matter how comprehensive your unit tests are if your functional, acceptance, quality, and staging tests are inadequate. Defects will simply slip through the most un-tested part.
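As a contrived illustration of how a defect slips past "perfect" coverage: in the sketch below, the shipping-fee function and its requirement are invented, every line is executed by a passing test, and the code still violates the requirement, because the test encodes the same misreading as the code.

# Contrived illustration: 100% line coverage, passing tests, and still a defect.
# Suppose the (hypothetical) requirement is that shipping is free for orders
# of $50 or more, but the developer read it as "more than $50".

def shipping_fee(order_total):
    return 0.0 if order_total > 50 else 4.95   # defect: should be >= 50


def test_shipping_fee():
    # The test was written from the same misreading, so it happily passes
    # while exercising every line (and both branches) of the function.
    assert shipping_fee(49.99) == 4.95
    assert shipping_fee(50.01) == 0.0
    # The case that matters, an order of exactly $50, is never checked.

Only acceptance testing, or someone re-reading the requirement, would catch it; the coverage report stays a reassuring 100% either way.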
When I explain all of this to that enthusiastic developer, the response is sometimes along the lines of, "but that's not my job, so who cares?"
That's an unfortunate attitude to have. While it's true that we, as programmers, are paid primarily to write code that conforms to requirements, the reason that we're being paid is so that the organization can have adequate software. Not caring about the end result reminds me of that old contractor joke.
The foundation guy notices a problem with the plans, but says that the framer will fix it. The framer says that the drywaller will fix it, the drywaller says the finish carpenter will fix it, the finish carpenter says the painter will fix it, and the painter says, "I sure hope the homeowner is blind and doesn't see it."
A true craftsman is passionate not only about the quality of his own work, but about the quality of the entire project.
Defects Are Not Necessarily Problems
You may have noticed that I've used terms like "good enough" and "adequate" to describe the quality that we should strive for, instead of words like "high" and "utmost." The key difference is that adequate is a variable quality level that can be anything from below average to above average, whereas high generally refers to well above average.
I understand that striving for adequacy may seem hypocritical for someone who has so frequently lambasted low-quality software, but allow me to explain. Actually, allow my refrigerator to explain.

A little more than a year ago, I was in the market for kitchen appliances and had a pretty good idea of what I could get with my budget. It wasn't a whole lot, but then again, neither was my budget. And then I stumbled across this LG Side-by-Side. It was a 26.5-cubic-foot fridge with contoured doors, hidden hinges, an in-door icemaker, and several other features that I couldn't afford, but it had a deeply discounted price tag that brought it within my budget. The sides of the unit told why: it was as if Wolverine himself had unloaded it off the truck.
The unsightly gashes along the sides of the fridge were clearly a defect introduced during shipping, but it wasn't a problem for me. In fact, it was a welcome defect, and I wish that Wolverine had been assigned to unload my microwave, stove, and dishwasher. I would have been able to get more features at the cost of defects that were not problems to me.
Quality or Quantity
I realize that there are many differences between software and refrigerators, but the variable nature of quality is similar. In many cases, introducing defects through change just isn't that big of a problem.
Sometimes, it just makes sense to pay for quantity (more features) instead of quality (more testing). Practically speaking, that means spending your time writing new features instead of building unit tests, or vice versa. Either way, it's not really our decision to make, since we're not the ones paying our salaries.
Ultimately, the decision of quality over quantity should rest with the individual or organization that is paying for the software. Obviously, it's our obligation as professionals not only to educate these decision makers about the risks of defects, but also to provide recommendations to help facilitate their decision.
This can sometimes be difficult, especially since many of us would love nothing more than to build Xanadu. But just as it would be negligent not to recommend a comprehensive test plan for the software that powers an MRI machine, it would be equally negligent to recommend that same level of testing for your church's congregation database.
It all goes back to assessing the Severity of Defects. In this example, it's not really a problem if Father Cronin needs to type in <br /> for a line break (it probably isn't even worth your bill rate to fix that defect), but it certainly is a problem if an integer overflow might cause the pressure monitoring system to crash, which in turn might cause an MRI machine explosion.
Testing Done Right
No amount of testing can completely eliminate the risk of introducing defects. The harder you
try, the more costly it becomes, and there comes a point where the cost of insuring against a
risk is no longer worth the premium. Therefore, Testing Done Right is an exercise in reducing
the risk of change to an acceptable level.
There are no hard and fast rules for determining what the acceptable level of risk is, but the factors to consider are the frequency of changes, the impact of those changes, and the severity of defects. Keep in mind how they relate to each other. For example, if an application will likely only change every year or so, investing upfront in automated testing may be:
- Wasteful, assuming the application is relatively easy to maintain and a defect would cause, at worst, a few hours of inconvenience a month
- Valuable, if the application is complex and a defect might cause a stoppage at a manufacturing facility
Just as important as assessing the risk is mitigating it. Remember, defects will simply slip through the most un-tested part, so a balanced testing plan is critical:
1. Integration Testing – does the program function?
2. Functional Testing – does the program functionality meet the requirements?
3. Acceptance Testing – do the requirements meet the need?
4. Quality Testing – does the program meet quality standards?
5. Staging Testing – can the program be deployed?
It's a complete waste of resources to develop an application with 100% unit test coverage but limited functional, acceptance, and staging testing. Of course, the absolute best way to reduce the risk of defects in a system is to minimize the codebase and keep things as simple as possible, thereby reducing the number of components and the overall complexity. But that's a whole different soapbox.
