

Fuzzing: Challenges and Reflections
Marcel Böhme
Monash University, Australia
Cristian Cadar
Imperial College London, UK
Abhik Roychoudhury
National University of Singapore, Singapore

Abstract—Fuzzing is a method for discovering software bugs and vulnerabilities by automatic test input generation; it has recently attracted tremendous interest in both academia and industry. Fuzzing comes in several forms. On one hand, we have symbolic execution, which enables a particularly effective approach to fuzzing by systematically enumerating the paths of a program. On the other hand, we have random input generation, which generates large numbers of inputs per second with little or no program analysis overhead. In this article, we summarize the open challenges and opportunities for fuzzing and symbolic execution as they emerged in discussions among researchers and practitioners at a Shonan Meeting and were validated in a subsequent survey. We take a forward-looking view of software vulnerability discovery technologies and provide concrete directions for future research.

Introduction

The Internet and the world's Digital Economy run on a shared, critical open-source software (OSS) infrastructure. A security flaw in a single library can have severe consequences. For instance, OpenSSL implements protocols for secure communication and is widely used by Internet servers, including the majority of HTTPS websites. The Heartbleed vulnerability in an earlier version of OpenSSL leaked secret data and caused huge financial losses. It is important for us to develop practical and effective techniques to discover vulnerabilities automatically and at scale. Today, fuzzing is one of the most promising techniques in this regard. Fuzzing is an automatic bug and vulnerability discovery technique which continuously generates inputs and reports those that crash the program. There are three main categories of fuzzing tools and techniques: blackbox, greybox, and whitebox fuzzing.


Blackbox fuzzing generates inputs without any knowledge of the program. There are two main variants of blackbox fuzzing: mutational and generational. In mutational blackbox fuzzing, the fuzz campaign starts with one or more seed inputs. These seeds are modified to generate new inputs. Random mutations are applied to random locations in the input. For instance, a file fuzzer may flip random bits in a seed file. The process continues until a time budget is exhausted. In generational blackbox fuzzing, inputs are generated from scratch. If a structural specification of the input format is provided, new inputs are generated that meet the grammar. Peach (http://community.peachfuzzer.com) is one popular blackbox fuzzer.
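
To make the mutational variant concrete, the following sketch shows a minimal mutational blackbox loop in Python. It is an illustration of the idea only, not the algorithm of any particular tool; the target command, the seed file, and the number of bit flips are hypothetical placeholders.

```python
import random
import subprocess

def mutate(seed: bytes, flips: int = 8) -> bytes:
    """Flip a few random bits at random locations in the seed."""
    data = bytearray(seed)
    for _ in range(flips):
        pos = random.randrange(len(data))
        data[pos] ^= 1 << random.randrange(8)
    return bytes(data)

def blackbox_fuzz(seed_path: str, target_cmd: list, trials: int = 10000):
    """Run the target on mutated seeds and collect inputs that crash it."""
    seed = open(seed_path, "rb").read()
    crashes = []
    for _ in range(trials):
        candidate = mutate(seed)
        proc = subprocess.run(target_cmd, input=candidate,
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if proc.returncode < 0:  # terminated by a signal, e.g., SIGSEGV
            crashes.append(candidate)
    return crashes

# Hypothetical usage: fuzz a file parser that reads its input from stdin.
# crashes = blackbox_fuzz("seed.png", ["./parse_image"], trials=100000)
```
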
Greybox fuzzing leverages program instrumentation to get lightweight feedback which is used to steer the fuzzer. Typically, a few control locations in the program are instrumented at compile time and an initial seed corpus is provided. Seed inputs are mutated to generate new inputs. Generated inputs that cover new control locations, and thus increase code coverage, are added to the seed corpus. The coverage feedback allows a greybox fuzzer to gradually reach deeper into the code. In order to identify bugs and vulnerabilities, sanitizers inject assertions into the program. Existing greybox fuzzing tools include AFL (https://lcamtuf.coredump.cx/afl/), LibFuzzer (https://llvm.org/docs/LibFuzzer.html), and Honggfuzz (https://github.com/google/honggfuzz).
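
The coverage-guided loop at the heart of these tools can be sketched as follows. This is a simplified illustration, not how AFL or LibFuzzer is actually implemented; `run_with_coverage` and `mutate` are assumed helpers that execute the instrumented target (returning the set of control locations reached and whether it crashed) and randomly modify an input, respectively.

```python
import random

def greybox_fuzz(seeds, run_with_coverage, mutate, iterations=100_000):
    """Coverage-guided greybox loop: keep inputs that reach new control locations."""
    corpus = list(seeds)
    global_coverage = set()
    crashes = []
    for seed in corpus:                          # seed the global coverage map
        coverage, _ = run_with_coverage(seed)
        global_coverage |= coverage
    for _ in range(iterations):
        parent = random.choice(corpus)           # pick a seed from the corpus
        child = mutate(parent)                   # apply random mutations
        coverage, crashed = run_with_coverage(child)
        if crashed:                              # crash or sanitizer violation
            crashes.append(child)
        if coverage - global_coverage:           # new control locations reached?
            corpus.append(child)                 # keep the input as a new seed
            global_coverage |= coverage
    return corpus, crashes
```
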
Whitebox fuzzing is based on a technique called symbolic execution [6], which uses program analysis and constraint solvers to systematically enumerate interesting program paths. The constraint solvers used as the back-end in whitebox fuzzing are Satisfiability Modulo Theories (SMT) solvers, which allow for reasoning about (quantifier-free) first-order logic formulas with equality and function/predicate symbols drawn from different background theories. Whitebox fuzzers calculate the path condition of an input i, that is, the set of inputs which traverse the same path as i. The path condition is represented as an SMT formula, e.g., i[0] = 42 ∧ i[0] − i[1] > 7.

Given a seed input s, the path condition is calculated and mutated (as opposed to mutating the program input). The mutated path condition is then sent to a constraint solver to generate new inputs. The main benefit of this technique is that, by carefully keeping track of the path conditions of all inputs seen so far, it always generates an input traversing a new path (new control flow). Existing whitebox fuzzing tools include KLEE [5] and SAGE [10].
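
As a hedged illustration of this step, the sketch below uses the Python bindings of the Z3 SMT solver to keep the first conjunct of the example path condition i[0] = 42 ∧ i[0] − i[1] > 7, negate the last one, and ask the solver for a model, i.e., a concrete input that takes the other side of that branch. In a real whitebox fuzzer such constraints are extracted automatically from an instrumented execution; here they are written by hand.

```python
# Requires the Z3 Python bindings (pip install z3-solver).
from z3 import BitVec, Solver, And, Not, sat

# Two symbolic input bytes, i[0] and i[1].
i0 = BitVec("i0", 8)
i1 = BitVec("i1", 8)

# Path condition of the seed input: i[0] = 42 AND i[0] - i[1] > 7.
prefix = (i0 == 42)
last_branch = (i0 - i1) > 7

# "Mutate" the path condition: keep the prefix, negate the last branch.
solver = Solver()
solver.add(And(prefix, Not(last_branch)))

if solver.check() == sat:
    model = solver.model()
    # A concrete input that follows a new path through the program.
    print("i[0] =", model[i0], "i[1] =", model[i1])
```
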
In this article, we provide reflections on recent advances in the field as well as concrete directions for future research. We discuss recent impact and enumerate open research challenges from the perspective of both practitioners and researchers. For a detailed, technical review, we refer the reader to Godefroid [9].

Recent Impact

Fuzzing for automatic bug and vulnerability discovery has taken both the software industry and the research community by storm. The research problem of finding bugs in a program by automatic input generation has a long-standing history which began well before Miller coined the term "fuzzing" in 1990 (http://pages.cs.wisc.edu/~bart/fuzz/Foreword1.html). Yet, only now do we see mainstream deployment of fuzzing technology in industry.

Using greybox fuzzing, Google has discovered more than 16,000 bugs in the Chrome browser over the past eight years and more than 11,000 bugs in over 160 open-source software projects over the past three years (https://google.github.io/clusterfuzz/#trophies). Microsoft credits its whitebox fuzzing tool SAGE with saving millions of dollars during the development of Windows 7 [10]. Trail of Bits has been developing various fuzzing tools, including DeepState, a unit testing framework that allows developers to fuzz the various units of their system (https://github.com/trailofbits/deepstate). The 2016 DARPA Cyber Grand Challenge had machines attack and defend against other machines by exploiting and hardening against software vulnerabilities. The Mayhem system [4], which was awarded two million dollars for winning the competition, made extensive use of whitebox fuzzing (https://www.darpa.mil/news-events/2016-08-04).

What has enabled this recent surge of interest in fuzzing? First, there is a tremendous need. Life and business are increasingly permeated by software systems, and a security vulnerability in even the smallest system can have dire consequences. Second, we now have the incentives and the required mindset. Some software companies have established lucrative bug bounty programs that pay top dollar for critical bugs. Anyone, including the reader, can offer vulnerability rewards on bug bounty platforms, such as HackerOne (https://www.hackerone.com/), which provides ethical coordination and responsible disclosure. Independent security researchers can report the discovered vulnerabilities and collect the bounties. Some stakeholders take matters into their own hands, with several companies continuously fuzzing their own software.


Third, we now have the tools. Many fuzzers are open-source, freely available, easy to use, and very successful in finding bugs. For instance, the KLEE symbolic execution engine (https://klee.github.io/) has been freely available, maintained, and widely used for more than ten years. As a result, several companies, such as Baidu, Fujitsu, and Samsung, have used and extended it to test their software products. Similarly, the AFL greybox fuzzer (http://lcamtuf.coredump.cx/afl/) is highly effective and easy to use. Its trophy case includes bugs and security vulnerabilities found in a large number of open-source systems.

Lastly, this open-science approach and meaningful engagement between industry and academia have facilitated rapid advances in fuzzing. For instance, fuzzers are getting faster, find more types of bugs, and work for more application domains.

Challenges

In September 2019, we organized a Shonan meeting on Fuzzing and Symbolic Execution in Shonan Village Center, Japan (https://shonan.nii.ac.jp/seminars/160/). The meeting brought together thought leaders, distinguished researchers, tool builders, founders, and promising young scientists from the greybox and whitebox fuzzing (SymEx) communities. Below, we discuss the main challenges identified during the meeting. We phrase the challenges as research questions and hope that they provide guidance and direction going forward.

Automation

Automated vulnerability discovery is a game between adversaries. Given the same resources, the adversary with the fuzzer that finds more vulnerabilities has the advantage.

[C.1] More Software. How can we efficiently fuzz more types of software systems? We already know how to fuzz command-line tools (AFL, KLEE) and APIs (LibFuzzer). The fuzzer generates inputs and observes the program's output. The community is actively working on how to fuzz programs that take highly-structured inputs, such as file parsers or object-oriented programs. However, fuzzing cyber-physical systems, which interact with the environment as part of their execution, or machine learning systems, whose behavior is determined by their training data, is an under-explored area.

How do we fuzz stateful software, such as protocol implementations, which can produce different outputs for the same input? Most greybox and whitebox fuzzers are written with a single programming language in mind. How do we fuzz polyglot software which is written in several languages? How do we fuzz GUI-based programs that take as inputs a sequence of events executed on a user interface? For whitebox fuzzing, we already know how SymEx can formulate constraints on numeric or string-based input domains. However, given a program whose input domain is defined by a grammar and/or protocol, how can a symbolic execution tool effectively formulate constraints on such "structured" input domains?

[C.2] More Bug Types. How can the fuzzer identify more types of vulnerabilities? A significant portion of current work on fuzzing focuses on simple oracles, such as finding crashes. We need studies of security-critical classes of bugs that do not manifest as crashes, and we need to develop oracles that can efficiently detect them. Vulnerabilities are often encoded as assertions on the program state. Using such assertions, we already know how we can discover memory- or concurrency-related errors. The discovery of side-channel vulnerabilities, such as information leaks or timing-, cache-, or energy-related side channels, is currently an active research topic [15]. Going forward, we should invent techniques to automatically detect and invoke privilege escalation, remote code execution, and other types of critical security flaws, not only in C/C++ but also in other programming languages.

[C.3] More Difficult Bugs. How can we find "deep bugs" for which efficient oracles exist, but that have thus far evaded detection?


Deep bugs are those that have evaded discovery despite long fuzzing campaigns, e.g., because they are guarded by complex conditions or because, using existing techniques, they require many orders of magnitude more resources than are available. Are there certain kinds of deep bugs that can be found efficiently with specialized approaches? Structure-aware and grammar-based fuzzing as well as the integration of static analysis and symbolic execution with greybox fuzzing are promising directions [11], [19]. Moreover, we should investigate boosting strategies that maximize fault finding, such as the AFLFast effort which enables faster crash detection in greybox fuzzers [1]. We can study the utility of GPUs and other means of efficient parallelization to maximize the number of executions per unit time [16].

[C.4] More Empirical Studies. What is the nature of vulnerabilities that have evaded discovery despite long fuzzing campaigns? Why have they evaded discovery? We need empirical studies to understand the nature and distribution of security vulnerabilities in source code.

The Human Component

[C.5] Human-In-The-Loop. How can fuzzers leverage the ingenuity of the auditor? Many researchers think of fuzzing as a fully automated process that involves the human only at the beginning, when the software system is prepared for the fuzzer, and at the end, when the fuzzer-discovered vulnerabilities need to be reported. In reality, security auditors use fuzzers in an iterative manner. During our meeting, Ned Williamson, a prolific security researcher at Google, demonstrated his semi-automated approach to vulnerability discovery. Ned would first audit the code to identify units that may contain a security flaw. He would prepare the unit for fuzzing, run the fuzzer for a while, and identify roadblocks for the fuzzer. Ned would manually patch out the roadblock to help the fuzzer make better progress. If the fuzzer spent more time fuzzing less relevant portions of the code, Ned would adjust the test driver and re-focus the fuzzer. Once a potential vulnerability was found, he would backtrack, add each roadblock back, and adjust the vulnerability-exposing input accordingly.

This semi-automated process raises several research questions. How can we facilitate more effective communication between fuzzer and security auditor? How can the security auditor dynamically direct the fuzzer? How can the fuzzer explain what prevents it from progressing, and how can the auditor instruct the fuzzer to overcome the roadblock?

[C.6] Usability. How can we improve the usability of fuzzing tools? Ethical hacking requires a very special set of skills. Fuzzing already simplifies the process by automating at least the test input generation. How can we make fuzzing more accessible to developers and software engineers? How can we make it easier to develop test drivers for fuzzers? How can we integrate fuzzing into the day-to-day development process, e.g., as a component of the CI pipeline or as a fuzz-driven unit testing tool in the IDE? Our industry participants and respondents in particular identified usability as most important.
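
To illustrate what such a test driver can look like, the sketch below shows a minimal fuzz-driven "unit test" in Python using Google's Atheris fuzzer; LibFuzzer harnesses for C/C++ follow the same shape. The library and function being fuzzed are hypothetical placeholders.

```python
# Hypothetical harness; requires Google's Atheris fuzzer (pip install atheris).
import sys
import atheris

with atheris.instrument_imports():
    import mylib  # hypothetical library under test

def TestOneInput(data: bytes):
    """The fuzzer repeatedly calls this entry point with generated byte strings."""
    try:
        mylib.parse_config(data)  # hypothetical unit under test
    except ValueError:
        pass  # expected, well-defined error; any other exception or crash is a bug

if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```
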
How can we prepare the output of a fuzzer for human consumption? A fuzzer produces an input that crashes the program, and the developer must find out why it crashes. How can we extend the fuzzer such that it generates a detailed bug report or even a bug fix for each identified vulnerability? Automated repair techniques which have emerged recently can help in this regard [13]. Recent work on Linux kernel fuzzing [18] discusses techniques to address usability challenges while deploying the kernel fuzzer syzkaller on enterprise Linux distributions. Generalizing such enhancements to a fuzzer for general-purpose software remains a challenge.

Fuzzing Theory

It is important for any discipline to stand on a firm scientific foundation. We have seen many technical advances in the engineering of fuzzing tools. But why do some fuzzers work so much better than others? What are their limitations? We want to be able to explain interesting phenomena that we have observed empirically, make predictions, and extrapolate from these observations. To do this, we need a sound theoretical model of the fuzzing process.

[C.7] How can we assess residual security risk if the fuzzing campaign was unsuccessful? Blackbox and whitebox fuzzing sit on two ends of a spectrum. A whitebox fuzzer might provide a formal guarantee about the absence of detectable vulnerabilities.


If we assume that a symbolic execution (SymEx) engine can enumerate all paths in a piece of code and that the oracle is encoded as assertions, then whitebox fuzzing can formally verify the absence of bugs. If it can enumerate only some paths in reasonable time, we can still provide partial guarantees [8]. To make SymEx applicable in practice, correctness or completeness are traded for scalability. How does this trade-off affect the guarantees?

In contrast, a blackbox fuzzer can never guarantee the absence of vulnerabilities for all inputs. What is the residual risk that, at the end of a fuzzing campaign, a bug still exists in the program that has not been found? If we model blackbox fuzzing as random sampling from the program's input space, we can leverage methods from applied statistics to estimate the residual risk.
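
As one hedged illustration (not a method endorsed in this article): if we treat each generated input as an independent draw from the program's input space and group inputs by the path (or coverage profile) they trigger, a Good-Turing-style estimator bounds the probability that the next input triggers a previously unseen path, which can serve as a proxy for the residual risk of a missed bug.

```python
from collections import Counter

def residual_risk_estimate(observed_paths):
    """Good-Turing-style estimate of the probability that the next input
    exercises a previously unseen path (a proxy for residual risk).

    observed_paths: one identifier per executed input, e.g., a hash of the
    branches it covered. Assumes independent draws, which blackbox fuzzing
    approximates but greybox fuzzing (with its adaptive bias) does not.
    """
    n = len(observed_paths)
    counts = Counter(observed_paths)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n if n else 1.0

# Hypothetical usage: after a campaign of 1,000,000 inputs in which 120 paths
# were each seen exactly once, the estimate would be 120 / 1,000,000 = 0.00012.
```
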
A greybox fuzzer uses program feedback to boost the efficiency of finding errors. However, this program feedback introduces an adaptive bias. How do we account for this adaptive bias when assessing residual risk? To answer such questions, we should develop statistical and probabilistic frameworks, and methodologies for sound estimation with quantifiable accuracy.

[C.8] What are the theoretical limitations of blackbox, greybox, and whitebox fuzzing? Black- and greybox fuzzers are highly efficient, but at the cost of effectiveness. Unlike whitebox fuzzers, they struggle to generate inputs that exercise paths frequented by few inputs. This tension raises several research questions. Given a program and a time budget, how can we select the fuzzing technique, or combination of techniques, that finds the most vulnerabilities within the time budget? How do program size and complexity affect the scalability and performance of each technique? How much more efficient is an attacker that has an order of magnitude more computational resources? With an understanding of the limitations of existing approaches, we can develop more advanced techniques.

Evaluation and Benchmarks

In order to validate a claim of superiority for novel fuzzing tools and techniques, we need sound methods for evaluation. Generally speaking, the better fuzzer finds a larger number of important bugs in software that we care about within a reasonable time. But what is a "reasonable time", "software that we care about", or "important bugs"? If no important bugs are found, how do we measure effectiveness? How do we prevent over-fitting? What is a fair baseline for comparison?

To measure progress, we need to develop reasonable standards for comparison against previous work. We encourage the community to be open about releasing tools, benchmarks, and experimental setups publicly for anyone to reproduce the results and to build upon.

Benchmarks

[C.9] How can we evaluate specialized fuzzers? There are programs that take structured inputs and those that take unstructured inputs. There are stateful and stateless programs. There are programs where the source code is available and programs where only the compiled binary is available. There are programs that take inputs via a file, a GUI, or an API. Extending fuzzing to different types of software systems is a key technical challenge (C.1).

Similarly, some fuzzers are specialized for a specific purpose. For instance, there are fuzzers that seek to reach a given program location [2], [14] or that focus on exposing specific types of bugs, such as performance bugs [3].

However, existing benchmarks are often not designed for these specialized tasks. If there is no previous work, we need standards for researchers to choose suitable subject programs and baselines for comparison.

[C.10] How can we prevent overfitting to a specific benchmark? For any benchmark suite, there is always the danger of overfitting. Despite a demonstration of superiority on the benchmark subjects, a fuzzer might still be inferior in general. What are reasonable strategies to mitigate overfitting? Can we propose a fair and sound policy to collect benchmarks? How can we avoid "single-source" benchmarks that are contributed by just one group and might give undue control to a single set of people?

Fuzzing tool competitions could be part of the solution for challenges C.9 and C.10. One model, inspired by constraint solving and verification competitions, is to have different competition categories, such as coverage-based fuzzing, directed fuzzing, and so on. Within each category, there can be a further division based on the type of bugs and applications the fuzzer is suited for. Tool builders can submit their own benchmarks and fuzzers, which would allow independent scrutiny of the entire process. Test-Comp (https://test-comp.sosy-lab.org/) is an existing competition that illustrates this model.


A second model is to come up with challenge problems in the form of buggy programs and have tool developers directly apply the fuzzers to find the hidden bugs. This has the advantage of tool developers configuring their tools in the best possible way for each task, but makes independent reproduction of the results more challenging. Rode0Day (https://rode0day.mit.edu/) is an existing competition that illustrates this model.

Another approach is a continuous evaluation, where fuzzers are repeatedly used to fuzz real programs. For instance, as a concrete outcome of our Shonan meeting, Google has developed FuzzBench (https://github.com/google/fuzzbench) and committed computational resources to evaluate submitted fuzzers on submitted benchmarks. In addition to scientific evaluation of technical advances, this approach allows direct application of these technical advances to a large set of actual open-source software, to make critical software systems safer and more secure.

Measures of Fuzzer Performance

During the evaluation of two fuzzing techniques, which quantities should we compare? What do we measure? Today, fuzzers are typically evaluated in terms of their effectiveness and efficiency. When we are interested in security vulnerabilities, a fuzzer's effectiveness for a software system is determined by the total number of vulnerabilities the fuzzer has the capability of finding. In contrast, a fuzzer's efficiency for a software system is determined by the rate at which vulnerabilities are discovered.

[C.11] Are synthetic bugs representative? For evaluation, buggy software systems can be generated efficiently simply by injecting artificial faults into an existing system [7]. We need to study empirically whether such synthetic bugs are indeed representative of real and important security vulnerabilities. If they are not representative, how are they different from actual vulnerabilities? What can we do to make synthetic bugs more like real bugs? Which types of vulnerabilities are not represented in synthetic bug benchmarks?

[C.12] Are real bugs, which have previously been discovered with other fuzzers, representative? Another approach is to collect actual vulnerabilities that have been found through other means into a benchmark. However, this process is tedious, such that the sample size may be relatively low, which would affect the generality of the results. Secondly, the evaluation only establishes that the newly proposed fuzzer finds at least the same vulnerabilities that have been found before. It does not evaluate how well the newly proposed fuzzer finds new vulnerabilities. How representative are the discovered vulnerabilities of all (undiscovered) vulnerabilities? We could build a large, shared database of vulnerabilities in many software systems that have been found by several fuzzers or auditors over a period of time.

[C.13] Is coverage a good measure of fuzzer effectiveness? When no suitable bug benchmark is available, we need other means of evaluating the effectiveness of a fuzzer. Code coverage is the classic substitute measure. The intuition is that vulnerabilities cannot be exposed if the code containing the vulnerability is never executed. How effective is coverage really at measuring the capability of a fuzzer to expose vulnerabilities? We need empirical studies that assess how strongly an increase in different coverage metrics correlates with an increase in the probability of finding a vulnerability. In addition to code coverage, there are many other measures of coverage, such as GUI, constraint, model, grammar, or state coverage. We should conduct empirical studies to determine the correlation and agreement of various proxy measures of effectiveness.

[C.14] What is a fair choice of time budget? It is not possible to measure fuzzer effectiveness directly. If our measure is the number of bugs found, then effectiveness is the total number of bugs the fuzzer finds in the limit, i.e., when given infinite time. Instead, researchers can derive a trivial lower bound on the effectiveness, i.e., the total number of bugs a fuzzer finds, by fixing a time budget. Currently, this time budget is typically anywhere between one hour and one day.


However, an extremely effective fuzzer may take some time to generate test cases, during which time another fuzzer can generate several orders of magnitude more test cases [12]. If the chosen time budget is too small, the faster, yet less effective, fuzzer might appear more effective. Thus, we should develop standards that facilitate a fair choice of time budget when evaluating the effectiveness of a fuzzer.

Techniques versus Implementations

[C.15] How do we evaluate techniques instead of implementations? In order to demonstrate claims of superiority of a proposed technique, researchers compare an implementation of the proposed technique to that of an existing technique. In the implementation, the researcher can make engineering decisions that substantially affect the effectiveness of the fuzzer [17]. For instance, a comparison of the AFL greybox fuzzer against the KLEE whitebox fuzzer to determine whether a whitebox fuzzing technique outperforms a greybox fuzzing technique should always be taken with a grain of salt. If possible, the proposed technique (e.g., an improvement to greybox fuzzing) should be implemented directly in the baseline (e.g., AFL).

Survey

To request feedback from the larger community on the identified challenges, we surveyed further experts from industry and academia. Our objective was to identify points of contention, to add challenges or reflections that we might have overlooked, and to solicit concrete pathways or initiatives for some of the identified challenges. We sent an email invitation to software security experts who have previously published in fuzzing or have professional work on automatic vulnerability discovery. Out of 24 respondents, 14 work in academia and 10 work in industry; 3 attended the Shonan meeting.

The survey participants marked improving automation (71%), building a theory of fuzzing (63%), and finding valid measures of fuzzer performance (63%) as their top three most important challenges. While practitioners and researchers were mostly in agreement, practitioners demonstrated a notably greater interest in the development of human-in-the-loop approaches (+0.8 Likert points). On average, a respondent marked all identified challenges as important or very important on a 5-point Likert scale. No major additional challenges were identified. Other survey results were directly added to the corresponding sections.

Awareness and Education

Fuzzing is used extensively in industry today, often on a daily basis, for detecting bugs and security flaws. Despite advances in static analysis and formal verification, fuzzing remains the primary automatic mechanism for vulnerability discovery in most software products. However, the security of our software systems is in the hands of each and every software engineer, including future volunteers that contribute to critical open-source software. We believe awareness and education, in the small and in the large, are of paramount importance.

One mechanism is the organization of security-oriented hackathons and Capture-the-Flag (CTF) competitions. For instance, the Build it Break it Fix it contest from Maryland (https://builditbreakit.org/) represents an early successful attempt in this direction. The community could also move towards competitions between fuzzing tools (such as FuzzBench, Test-Comp, and Rode0Day) or organize regular fuzzing camps.

Another mechanism is to teach about fuzzing in software engineering and cyber-security courses. The second and third authors were actively involved in designing and delivering such courses at the university level. A key challenge in developing such educational content is that the students need to be exposed to several tools, which takes a significant amount of the students' time. The recent development of online books [20] can alleviate some of these issues by presenting an integrated resource and repository for getting familiarised with the various variants of fuzzing.

Acknowledgments

We thank the participants at the Shonan Meeting on Fuzzing and Symbolic Execution, and the survey respondents. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 819141). This work was partially supported by the National Satellite of Excellence in Trustworthy Software Systems, funded by the National Research Foundation (NRF) Singapore under the National Cybersecurity R&D (NCR) programme. This work was also partially funded by the Australian Research Council (ARC) through a Discovery Early Career Researcher Award (DE190100046).


REFERENCES

1. Marcel Böhme, Van-Thuan Pham, Abhik Roychoudhury. Coverage-based Greybox Fuzzing as Markov Chain. ACM Conference on Computer and Communications Security (CCS 2016).
2. Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, Abhik Roychoudhury. Directed Greybox Fuzzing. ACM Conference on Computer and Communications Security (CCS 2017).
3. Jacob Burnim, Sudeep Juvekar, Koushik Sen. WISE: Automated Test Generation for Worst-Case Complexity. International Conference on Software Engineering (ICSE 2009).
4. Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, David Brumley. Unleashing Mayhem on Binary Code. IEEE Symposium on Security and Privacy (S&P 2012).
5. Cristian Cadar, Daniel Dunbar, Dawson Engler. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. USENIX Conference on Operating Systems Design and Implementation (OSDI 2008).
6. Cristian Cadar, Koushik Sen. Symbolic Execution for Software Testing: Three Decades Later. Communications of the ACM, vol. 56, no. 2 (Feb. 2013).
7. Brendan Dolan-Gavitt, Patrick Hulin, Engin Kirda, Tim Leek, Andrea Mambretti, William K. Robertson, Frederick Ulrich, Ryan Whelan. LAVA: Large-Scale Automated Vulnerability Addition. IEEE Symposium on Security and Privacy (S&P 2016).
8. Antonio Filieri, Corina S. Pasareanu, Willem Visser. Reliability Analysis in Symbolic Pathfinder. International Conference on Software Engineering (ICSE 2013).
9. Patrice Godefroid. Fuzzing: Hack, Art, and Science. Communications of the ACM, vol. 63, no. 2 (Feb. 2020).
10. Patrice Godefroid, Michael Y. Levin, David Molnar. Automated Whitebox Fuzz Testing. Network and Distributed System Security Symposium (NDSS 2008).
11. Christian Holler, Kim Herzig, Andreas Zeller. Fuzzing with Code Fragments. USENIX Security Symposium (USENIX Security 2012).
12. George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, Michael Hicks. Evaluating Fuzz Testing. ACM Conference on Computer and Communications Security (CCS 2018).
13. Claire Le Goues, Michael Pradel, Abhik Roychoudhury. Automated Program Repair. Communications of the ACM, vol. 62, no. 12 (Dec. 2019).
14. Paul Dan Marinescu, Cristian Cadar. KATCH: High-Coverage Testing of Software Patches. ACM Symposium on the Foundations of Software Engineering (ESEC-FSE 2013).
15. Shirin Nilizadeh, Yannic Noller, Corina S. Pasareanu. DifFuzz: Differential Fuzzing for Side-Channel Analysis. International Conference on Software Engineering (ICSE 2019).
16. Ajitha Rajan, Subodh Sharma, Peter Schrammel, Daniel Kroening. Accelerated Test Execution Using GPUs. Automated Software Engineering Conference (ASE 2014).
17. Eric F. Rizzi, Sebastian Elbaum, Matthew B. Dwyer. On the Techniques We Create, the Tools We Build, and Their Misalignments: A Study of KLEE. International Conference on Software Engineering (ICSE 2016).
18. Heyuan Shi, Runzhe Wang, Ying Fu, Mingzhe Wang, Xiaohai Shi, Xun Jiao, Houbing Song, Yu Jiang, Jia-Guang Sun. Industry Practice of Coverage-Guided Enterprise Linux Kernel Fuzzing. ACM Symposium on the Foundations of Software Engineering (ESEC-FSE 2019).
19. Nick Stephens, John Grosen, Christopher Salls, Audrey Dutcher, Ruoyu Wang, Jacopo Corbetta, Yan Shoshitaishvili, Christopher Kruegel, Giovanni Vigna. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. Network and Distributed System Security Symposium (NDSS 2016).
20. Andreas Zeller, Rahul Gopinath, Marcel Böhme, Gordon Fraser, Christian Holler. The Fuzzing Book. 2019. https://www.fuzzingbook.org/