Developing a Reference Dataset to Accurately Evaluate SA Tools


Paul E. Black
Computer Scientist, NIST
paul.black@nist.gov
+1 301-975-4794
OWASP AppSec DC
October 2005

The OWASP Foundation
http://www.owasp.org/

This is a work of the U.S. Government and is not subject to copyright protection in the United States.
Outline

• Goals for a Reference Dataset
• Content and Characteristics
• Complications and Considerations
• Tour of NIST’s Prototype RDS



Goals

• Provide researchers, developers, and consumers with a set of known bugs, flaws, weaknesses, and vulnerabilities.
  - This allows developers and researchers to test their methods, and consumers to evaluate tools.
• The reference dataset will encompass a wide variety of vulnerabilities, languages, platforms, etc. from all phases of the software life cycle.
• The dataset is a long-term effort needing contributions from many sources.



How is a Reference Dataset Used?

• Researchers
  - Is a new method “better” than existing methods? Faster? Finds more flaws? Fewer false alarms? Produces more reliable programs? In what instances? (A minimal scoring sketch follows this list.)
  - Saves the time of assembling test cases
• Tool Developers
  - What flaws should be caught? What problems should be prevented in code?
  - Suggests directions for new development
• End Users
  - Understand the need for better tools and techniques
  - Save effort and improve thoroughness in evaluation
  - Confirm the utility of methods and tools
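
The “more flaws found, fewer false alarms” comparison above amounts to scoring a tool’s reports against the flaw locations the dataset records as ground truth. Below is a minimal scoring sketch in C; the file names, line numbers, and exact-location matching rule are assumptions made for illustration, not part of the RDS.

/*
 * Hypothetical scoring sketch: compare the flaw locations recorded in a
 * reference dataset's metadata with the findings a static analysis tool
 * reports, and count hits, false alarms, and misses.  The sample file
 * names and line numbers below are invented for illustration.
 */
#include <stdio.h>
#include <string.h>

struct finding {
    const char *file;   /* source file containing the flaw */
    int line;           /* line number of the flaw */
};

/* Ground truth taken from the dataset's metadata (illustrative). */
static const struct finding known[] = {
    { "gets1-bad.c",   12 },
    { "strcpy1-bad.c", 27 },
};

/* Findings reported by the tool under evaluation (illustrative). */
static const struct finding reported[] = {
    { "gets1-bad.c",   12 },
    { "strcpy1-bad.c", 99 },   /* wrong location: counted as a false alarm */
};

static int same_location(const struct finding *a, const struct finding *b)
{
    return a->line == b->line && strcmp(a->file, b->file) == 0;
}

int main(void)
{
    size_t nk = sizeof known / sizeof known[0];
    size_t nr = sizeof reported / sizeof reported[0];
    size_t hits = 0, false_alarms = 0, misses = 0;

    /* Each report either matches a known flaw or is a false alarm. */
    for (size_t i = 0; i < nr; i++) {
        int matched = 0;
        for (size_t j = 0; j < nk; j++)
            if (same_location(&reported[i], &known[j])) { matched = 1; break; }
        if (matched) hits++; else false_alarms++;
    }

    /* Known flaws that no report matches are misses. */
    for (size_t j = 0; j < nk; j++) {
        int found = 0;
        for (size_t i = 0; i < nr; i++)
            if (same_location(&reported[i], &known[j])) { found = 1; break; }
        if (!found) misses++;
    }

    printf("flaws found: %zu, false alarms: %zu, flaws missed: %zu\n",
           hits, false_alarms, misses);
    return 0;
}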



Role in the NIST SAMATE Project

• Surveys
  - Draw from researchers, developers, and users
• Taxonomies
  - Use a common taxonomy of flaws
  - Grouped by a common taxonomy of tools and methods
• Gaps and research agendas
  - Helps highlight what is needed
• Studies to develop metrics
  - Well-characterized samples for study
• Enable tool evaluations
  - Standard reference material for use in test plans



Outline

• Goals for a Reference Dataset
• Content and Characteristics
• Complications and Considerations
• Tour of NIST’s Prototype RDS



What’s in the Reference Dataset?

• Samples of designs, source code, binaries, etc. with known flaws and vulnerabilities
• Corresponding samples with the problems fixed (a sketch of such a pair follows this list)
• Metadata to label samples with:
  - Contributor, source for binaries, remediation, etc.,
  - Location and type of flaw(s), compiler or platform where it occurs, etc.,
  - Drivers, stubs, declarations, etc.,
  - Input demonstrating the flaw, expected results, and
  - Comments
• Test suites consisting of sets of samples
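
To make the flawed-sample / fixed-sample / metadata idea concrete, here is a minimal sketch in the spirit of the dataset’s small C samples. The flaw type, function names, and metadata fields in the leading comment are assumptions for illustration; they are not copied from an actual RDS entry.

/*
 * Illustrative sample pair (not an actual RDS entry).  The kind of
 * metadata the dataset keeps alongside a sample might include:
 *
 *   contributor : (who submitted it)
 *   flaw type   : buffer overflow (unbounded strcpy)
 *   location    : copy_name_bad(), the strcpy() call
 *   input       : any command-line argument longer than 15 characters
 *   remediation : bounded copy, as in copy_name_fixed()
 */
#include <stdio.h>
#include <string.h>

/* BAD: fixed-size destination, attacker-controlled source, no bounds check. */
static void copy_name_bad(const char *src)
{
    char name[16];
    strcpy(name, src);                    /* flaw: unbounded copy */
    printf("hello, %s\n", name);
}

/* FIXED: copy at most sizeof(name) - 1 bytes and terminate explicitly. */
static void copy_name_fixed(const char *src)
{
    char name[16];
    strncpy(name, src, sizeof name - 1);
    name[sizeof name - 1] = '\0';
    printf("hello, %s\n", name);
}

int main(int argc, char **argv)
{
    const char *arg = (argc > 1) ? argv[1] : "world";
    copy_name_bad(arg);                   /* overflows if arg exceeds 15 characters */
    copy_name_fixed(arg);
    return 0;
}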



Characteristics of the Reference Dataset

• Metadata for a sample is separate from the sample itself.
• Samples are “wild” (production), “synthetic” (written to test), and “academic” (from classes).
• Samples are mostly small (tens of lines), but some are large (thousands or millions of lines).
• Samples are “write-only”, but may be declared obsolete and superseded in a test suite.



Samples for Static vs. Dynamic Analysis

• Samples are mostly for static analysis.
• Some executable samples, but …
  - Dynamic analysis is better done in a test bed.
  - Network applications must have controlled access.
  - Dynamic analysis may interfere with running systems; corrupted systems must be rebuilt.
  - It is hard to run legacy executables on new platforms.
  - Flawed executables should be “auxotrophic”: they won’t run without a special file, DLL, etc. saying “Warning: Software has Known Vulnerability” (a sketch of such a guard follows this list).
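
A minimal sketch of the “auxotrophic” idea follows: the flawed test executable refuses to run unless a marker file acknowledging the known vulnerability is present. The marker file name is an assumption made for this sketch, not a convention defined by the dataset.

/* Sketch of an auxotrophic test executable: it will not run without a
 * special acknowledgement file, so a known-vulnerable binary cannot be
 * started by accident.  The marker name below is illustrative only. */
#include <stdio.h>
#include <stdlib.h>

#define MARKER "WARNING-Software-Has-Known-Vulnerability.txt"

static void require_vulnerability_marker(void)
{
    FILE *f = fopen(MARKER, "r");
    if (f == NULL) {
        fprintf(stderr,
                "refusing to run: this executable contains a known vulnerability.\n"
                "Create '%s' in the working directory to acknowledge this.\n",
                MARKER);
        exit(EXIT_FAILURE);
    }
    fclose(f);
}

int main(void)
{
    require_vulnerability_marker();   /* the guard must pass before any flawed code runs */
    /* ... the deliberately vulnerable test code would go here ... */
    puts("running known-vulnerable test sample");
    return 0;
}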



Public Access

• Submit a new sample.
  - Status is tentative until examined.
• Add new “fixed” versions, inputs demonstrating a flaw, etc.
• Retrieve samples, metadata, and test suites.
• Comment on a sample.
• Report the result of a run of a tool on a sample.



Outline

• Goals for a Reference Dataset
• Content and Characteristics
• Complications and Considerations
• Tour of NIST’s Prototype RDS



Complications and Considerations

• What about new threats, languages, etc.?
  - The reference dataset will evolve.
  - It may be slightly easier to “port” samples than to write them from scratch.
• What if developers build to the test?
  - Use the error sample generator (a minimal sketch of the idea follows this list).
  - These tests are only one consideration.
• How about malicious use, i.e., a “university for crackers”?
  - These are known flaws.
  - Limit access to high-risk flaws?
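
The slides do not describe the error sample generator further, so the following is only a speculative sketch of the idea: emit many superficially different samples that all contain the same underlying flaw, so a tool cannot simply memorize a fixed set of test files. The flaw template and file-naming scheme are assumptions.

/* Speculative sketch of an error sample generator: it writes several C
 * samples that differ in names and buffer sizes but all contain the same
 * flaw (an unbounded strcpy).  Not the actual generator referenced above. */
#include <stdio.h>
#include <stdlib.h>

static void emit_sample(FILE *out, int id, int bufsize)
{
    fprintf(out,
        "#include <stdio.h>\n"
        "#include <string.h>\n"
        "\n"
        "void handle_%d(const char *input)\n"
        "{\n"
        "    char buf_%d[%d];\n"
        "    strcpy(buf_%d, input);   /* flaw: unbounded copy */\n"
        "    printf(\"%%s\\n\", buf_%d);\n"
        "}\n",
        id, id, bufsize, id, id);
}

int main(void)
{
    for (int id = 0; id < 5; id++) {
        char name[64];
        snprintf(name, sizeof name, "strcpy-gen-%d-bad.c", id);

        FILE *out = fopen(name, "w");
        if (out == NULL) {
            perror(name);
            return EXIT_FAILURE;
        }
        emit_sample(out, id, 8 << id);   /* buffer sizes 8, 16, 32, 64, 128 */
        fclose(out);
        printf("wrote %s\n", name);
    }
    return 0;
}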



Outline

• Goals for a Reference Dataset
• Content and Characteristics
• Complications and Considerations
• Tour of NIST’s Prototype RDS



NIST Prototype Reference Dataset (RDS)

• Populating with material from:
  - Fortify Software (80+ samples)
  - MIT Lincoln Lab (1000+ samples)
  - NIST (20 samples)
• Other possible sources:
  - OWASP WebGoat
  - Foundstone HacmeBook and HacmeBank
  - CVE list of known vulnerabilities
  - Other tool developers
Home Page



Search RDS by Keyword



Result of Search



ID 7: gets is never safe for untrusted input



Code for gets1-bad.c
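
The slide showed gets1-bad.c as a screenshot, so the code itself is not reproduced in this text. Below is a minimal sketch of the kind of flaw such a sample demonstrates (gets() on untrusted input) together with the usual fgets()-based fix; it is not the actual RDS sample.

/* Minimal sketch of a gets()-style flaw and its fix; not the actual
 * gets1-bad.c from the RDS. */
#include <stdio.h>

int main(void)
{
    char buf[16];

    /* BAD: gets() never checks the destination size, so any input line
     * longer than 15 characters overflows buf.  (gets() was removed from
     * the language in C11 for exactly this reason, so the call is shown
     * here only as a comment.)
     *
     *     gets(buf);
     */

    /* FIXED: fgets() bounds the read to the size of the buffer. */
    if (fgets(buf, sizeof buf, stdin) != NULL)
        printf("read: %s", buf);

    return 0;
}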



Contact to Participate

• Paul E. Black
  Project Leader
  Software Diagnostics & Conformance Testing Division, Software Quality Group, Information Technology Laboratory, NIST
  paul.black@nist.gov

