Software Techniques For Good Practice in Audio and Music Research
This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed
by at least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention
paper has been reproduced from the author’s advance manuscript without editing, corrections, or consideration by the
Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request
and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see
www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct
permission from the Journal of the Audio Engineering Society.
ABSTRACT
In this paper we discuss how software development can be improved in the audio and music research community by implementing tighter and more effective development feedback loops. We suggest first that researchers in an academic environment can benefit from the straightforward application of peer code review, even for ad-hoc research software; and second, that researchers should adopt automated software unit testing from the start of research projects. We discuss and illustrate how to adopt both code reviews and unit testing in a research environment. Finally, we observe that the use of a software version control system provides support for the foundations of both code reviews and automated unit tests. We therefore also propose that researchers should use version control with all their projects from the earliest stage.
implemented model, evaluating how closely it corresponds with observations from reality. But any requirement to verify the implementation, to establish that the model described in the publication is the same as that actually implemented, usually goes unmet.

We have previously argued [2] that a lack of quality or confidence in the implementation may be a cause of failure to publish code. Similarly, studies of data sharing in other fields have reported a relationship between willingness to share supporting data and quality of reporting in the paper [3]. A cause for a lack of confidence with code may be the lack of formal training in software development for researchers working in academic fields outside computer science. A recent study [4] found a great deal of variation in the level of understanding of standard software engineering concepts by scientists, and found that in developing and using scientific software, informal self-study or learning from peers was commonplace.

The audio and music research community is particularly heterogeneous in terms of academic backgrounds, with researchers having very different origins: as well as those from computing and engineering fields, many researchers in this community come from backgrounds with no formal training in software, such as music, dance or performance [1]. Such a wide variety of backgrounds poses a big challenge, given that many of these researchers do not have the skills or desire to become involved in traditional software development practice or in publication and maintenance of code.

In the remainder of this paper we will look into the software development practices of industry and discuss how some of these can be adopted by the audio and music research community.

2. FEEDBACK CYCLES

In commercial software development, much of existing “best practice” consists of trying to shorten or simplify feedback cycles so that the developer learns about mistakes as soon as possible. This is made explicit in incremental software development methods such as the Agile process [5], where there is a strong emphasis on individuals and interactions and use of techniques such as pair programming, continuous integration and unit testing. These techniques are used to facilitate short development cycles, with regular and quick software releases, making development as flexible as possible.

Even in software companies that do not adopt Agile procedures, however, there are some techniques very widely used in commercial development (including in companies throughout the audio and music software industry) that are not widely adopted in academia. Fig. 1 sketches some of these techniques, making a hierarchy of feedback mechanisms operating at different speeds, from most immediate (cycle a: compiler errors) to least (cycle g: reports from users of published software).

Fig. 1: Software development feedback cycles in industry

Researchers are familiar with a similar set of feedback processes in the development of written publications: collaborative review and editing with immediate peer researchers and supervisors, formal peer-review cycles, and so on. These processes provide new perspectives which help to produce quality results more quickly and help to remove potentially serious errors.

3. FEEDBACK CYCLES IN RESEARCH

The three most typical feedback cycles in research software development are illustrated in Fig. 2: (a) compiler errors, or immediate feedback from an interactive development environment; (f) using the code to run experiments, examining the results, and judging whether or not they appear to make sense; and (g) obtaining feedback from users following publication of the software.

itably be applied even for small informal projects in an academic research environment.
4. CODE REVIEWS
and training developers in “how to read code” makes a substantial difference [9] to the number of errors found during code review.

This suggests that learning to read code is a valuable part of learning to write and debug software, a principle supported by [10] and others. Formal training in code comprehension may be of benefit to the researcher-developer who wishes to review code, but in addition to any formal training, the exercise of code review is itself a learning opportunity.

Although the studies cited here show improvements in review performance due to improved code comprehension and review methods, a high level of comprehension is not a prerequisite for taking part in reviews: in [8] it is noted that some reviewers who did not fully understand the code nonetheless found around half of the defects in it.

4.3. Reviews are effective

In a well known article, Fagan [11] shows that code reviews can remove 60–90% of errors before the first test is run [12]. Comparative studies of code review techniques and reviewers support the proposition that code review is generally effective, showing for example that “median” reviewers found 60–70% of defects [8] or that reviews exposed all but one defect in the sample code [7].

4.4. Recommendations

We therefore suggest code reviews as appropriate for academic software development for three reasons:

1. Code reviews can be carried out quickly and informally in a peer setting such as a research environment;

2. Carrying out reviews gives the reviewer and author practice in reading code and in writing code that can be read, both of which are valuable in improving software development ability;

3. Code reviews are more effective at finding errors than any other commonly used technique.

5. UNIT TESTING

Unit testing is a second technique, widely used in non-academic settings, for providing software developers with feedback (Fig. 1, loop c).

A unit test is a means of automating and repeating tests of the individual parts of a program. It consists of code that calls a function in the program, gives it some input, and tells the developer whether it returns the expected result. If all non-trivial functions have tests, and if those tests provide a reasonable coverage of both likely inputs and edge cases, this provides a baseline assurance that the components of a program work as expected, even if the program as a whole is complex or changing.

5.1. Testing research software can be hard

Much research software is written to perform novel work: potentially complex experiments with previously unknown outputs. How do we test such software, given that the authors are not able to predict its output?

Unit testing goes some way toward helping with this. For code to be testable through unit tests it must be written in sufficiently small functions, with clearly defined inputs and outputs, for each to be tested mechanically. At this level it should indeed be possible to calculate the expected output of a function, given sufficiently simple choices of input. Further, code written in this way is easier to read and reuse.

Unit tests written during software development also provide early sanity-checking for code. A program to read data from a file and calculate a result might fail, not only because the basic algorithm is wrong, but because the input is read wrongly or the code fails to deal with unusual cases such as short or empty datasets or missing values.

Although unit testing cannot guarantee that a program produces the correct results, ensuring that it is built from correct components is a useful step toward ensuring it actually implements the method described in the publication.

5.2. Unit tests help avoid regressions

In software development, a regression is an error introduced accidentally while modifying software: something which used to work but has since become broken, such as a new bug introduced while fixing something else.

Regressions are common: in [13] Robert Glass identified 8.5% of errors identified after delivery of software from a substantial commercial aerospace project as regressions. Regressions may also be dangerous, because they change behaviour that may be relied upon or that has been already published; and regressions are dispiriting, because they are examples of unambiguously wrong behaviour that suggest the developer is making negative progress.

Although regression testing is especially important when working in a team, such errors also occur in solo projects, and regressions can also arise from factors outside a developer’s own control such as platform dependencies or changes between versions of a third-party library.

Automated tests are vital in avoiding regressions, because regression testing is simply too difficult and tedious to perform manually. It would require testing every use case by hand after every code change, which is not feasible. Unit tests can be run quickly after every build, and can identify with some precision the part of the code that has been broken by a change.

5.3. Test-driven development may be a useful analytical tool

A fundamental discussion around unit testing concerns when to write the tests.

Some developers, e.g. [14], advocate test-driven (or test-first) development, a process in which the unit tests are written before the rest of the code. The act of writing tests constructs a contract which the implementation is then written to satisfy. In principle, this ensures the code will be correct as it is first written.

Erdogmus et al. [15] found that those developers who adopted a test-driven strategy wrote more tests; that the developers who wrote more tests tended to be more productive; and that the quality of software produced was roughly proportional to the number of tests provided. A further survey [16] found that the quality of tests in a test-driven approach was often higher than that of those written after the implementation. However, we are unaware of any studies that show a direct link between the test-driven approach and higher quality or productivity, controlling for other factors.

Nonetheless, what makes test-driven development interesting for research software is the opportunity [...] perspective to that usually used when simply writing code. Rather than begin by considering how to implement a solution and then think about how to test whether it works, the test-driven approach begins by considering what would be necessary to establish whether a solution was correct or not, before filling in the implementation. We believe this approach may be a mental tool worth considering when approaching hard-to-implement problems, regardless of whether it ultimately produces better code or not.

5.4. Recommendations

We propose unit testing as part of a practical, research-driven development environment for the following reasons:

1. Research software can be hard to perform high-level testing for: unit tests are a relatively easy way to obtain any assurance of correctness;

2. Unit testing is a good way to defend against regressions, a common class of error which can be hard to discover by manual testing;

3. A form of unit test development, test-driven development, can provide an additional analytical perspective when solving difficult problems.

6. VERSION CONTROL

A version control system is a tool that tracks the history of all files in a software project, logging changes and different versions, thus allowing researchers to easily compare different versions of the same file. Version control systems also help developers to share and merge changes, making projects easier to manage when several researchers are working together on them. Commonly used version control systems include git (http://git-scm.com/), Mercurial (http://mercurial.selenic.com/), and Subversion.

Besides making general management of code easier and more pleasant, a version control system provides a necessary foundation for effective code reviews and unit testing.

In allowing its user to see the differences between respective revisions of a project, a version control system makes it possible to isolate a set of changes and be sure that a code review is covering everything that has been changed in a particular piece of work. It also provides a mechanism for sharing code between developers, such that they can be sure they are looking at the same version of the same software.

In the context of unit testing, especially for regression testing, a version control system provides the necessary support to compare two versions of code and identify the source of a change in behaviour.

7. RECOMMENDED ACTIONS FOR A RESEARCH GROUP

7.1. Code reviews

As noted above in Sec. 4, in a research environment we advocate the adoption of an informal style of “use-case” code review. Here the reviewer follows data paths through the code, from an imagined input to the result output, with the original developer available to explain what each step is for if necessary. This technique can be conducted without much preparation, because the reviewer does not need an in-depth knowledge of the specifics of the implementation and there is no requirement to make checklists or other aids prior to the review process.

Wilson et al. [17] propose the use of “pre-merge” code reviews, in which reviews are a requirement for code to be included into a shared repository. When using a modern distributed version control system, this could be set up either by committing to a user’s own branch and then merging to a main branch after review, or else by having the developer commit only on their local machine and get the code reviewed in person by another researcher before pushing to a central repository. In either case, it is important to review before performing experiments using the code.

7.2. Unit testing

Unit tests should be small and as simple as possible. The aim is to exercise the code in unexpected ways, focusing on tricky small cases rather than easy large ones. Studies show that areas of code where bugs are found are more likely to be fragile in general [18, 19] and bugs that have already been found are relatively likely to reappear as regressions, so more [...] already proven problematic or where bugs have been found in “finished” code.

Be careful to avoid exact comparisons when testing outputs from inexact calculations or those using floating-point arithmetic. The error margin for a calculation is part of the test, just as the expected result is.

Because it can be hard to get started in writing unit tests, we propose adopting the simplest available unit test framework. In some cases, it may even be easiest to write unit tests without a framework: call a function; complain if the result is wrong. More usually the simplest method will be to adopt a standard test framework, such as Nose for Python (https://nose.readthedocs.org/en/latest/). Other authors [17] advise using a test framework for all unit testing.

7.3. Version control

A version control system should be used from the very beginning of any project, regardless of its complexity. Recent reports from other fields show encouraging trends in adoption of such systems in an academic setting [20]. In general, the right system to choose is whichever is available in the research group or in use by peer researchers. In the absence of a local standard, we recommend the use of a modern distributed version control system such as git or Mercurial.

Similar guidelines apply to the choice of hosting services. Several free solutions are available on-line, such as GitHub (https://github.com/) for git, or Bitbucket (https://bitbucket.org/) for both git and Mercurial projects with good support for private work. In addition to these generic hosting sites, there are subject-specific sites such as the SoundSoftware project site for UK-based researchers in the audio and music field.

8. CONCLUSIONS

Throughout this paper we argue that code reviews and unit testing are valuable techniques that can and should be adopted in academic contexts. We introduce these techniques in the context of a set of feedback loops that give the researcher-developer timely information about problems in their code.
We have also given some advice and pointers to how to adopt such techniques in a research environment, such as:

• apply an informal style of “use-case” code review technique, calling on peer researchers to review code before using it in substantial experiments;

• adopt the simplest available unit testing regime, keep unit tests small and concise, and apply a standard test framework for unit tests on larger pieces of code;

• support these techniques with the use of a version control system from early in a project.

Techniques such as these rely on becoming comfortable with the idea that others will be looking at one’s research code. We believe that this is an important step, and one that makes documentation and unit testing easier, code reviews possible, and publication and peer review of software less scary.

As part of our ongoing UK-based SoundSoftware project (http://soundsoftware.ac.uk), we have developed several materials to add to this subject, such as handouts with practical guidance, video tutorials, and slides. You can freely access these on the project’s website (http://soundsoftware.ac.uk/resources).

9. ACKNOWLEDGMENTS

This work was supported by EPSRC Grant EP/H043101/1. Prof. Mark D. Plumbley is also supported by EPSRC Leadership Fellowship EP/G007144/1.

10. REFERENCES

[1] I. Damnjanovic, L. A. Figueira, C. Cannam, and M. D. Plumbley, “SoundSoftware.ac.uk Survey Report.” http://code.soundsoftware.ac.uk/documents/17, 2011.

[2] C. Cannam, L. A. Figueira, and M. D. Plumbley, “Sound software: Towards software reuse in audio and music research,” in Proceedings of the IEEE 2012 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 2745–2748, IEEE Signal Processing Society, March 2012.

[3] J. M. Wicherts, M. Bakker, and D. Molenaar, “Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results,” PLoS ONE, vol. 6, p. e26828, 11 2011.

[4] J. E. Hannay, C. MacLeod, J. Singer, H. P. Langtangen, D. Pfahl, and G. Wilson, “How do scientists develop and use scientific software?,” in Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering, SECSE ’09, (Washington, DC, USA), pp. 1–8, IEEE Computer Society, 2009.

[5] K. Beck, M. Beedle, A. van Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A. Hunt, R. Jeffries, J. Kern, B. Marick, R. C. Martin, S. Mellor, K. Schwaber, J. Sutherland, and D. Thomas, “Manifesto for Agile Software Development,” 2001.

[6] A. Dunsmore, M. Roper, and M. Wood, “Object-oriented inspection in the face of delocalisation,” in Proceedings of the 22nd international conference on Software engineering, pp. 467–476, ACM, 2000.

[7] A. Dunsmore, M. Roper, and M. Wood, “Practical code inspection techniques for object-oriented systems: An experimental comparison,” IEEE Software, vol. 20, pp. 21–29, 7–8 2003.

[8] A. Dunsmore, M. Roper, and M. Wood, “The role of comprehension in software inspection,” Journal of Systems and Software, vol. 52, pp. 121–129, 2000.

[9] S. Rifkin and L. Deimel, “Applying program comprehension techniques to improve software inspections,” in Proceedings of the 19th Annual NASA Software Engineering Laboratory Workshop, pp. 3–8, 1994.

[10] L. E. Deimel and J. F. Naveda, “Reading computer programs: Instructor’s guide and exercises,” Tech. Rep. CMU/SEI-90-EM-3, Software Engineering Institute, Carnegie Mellon University, August 1990.

[20] K. Ram, “git can facilitate greater reproducibility and increased transparency in science,” Source Code for Biology and Medicine, 2013.
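As a closing illustration of the unit-testing advice in Sec. 7.2, the following is a minimal sketch in Python of tests written without a framework: call a function; complain if the result is wrong. The function rms, the helper check_close, and the tolerance of 1e-12 are hypothetical, chosen only for illustration and not taken from any software discussed in this paper. The sketch focuses on small, tricky cases (a single sample, an empty input) and states an error margin for every floating-point comparison rather than testing for exact equality.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples (hypothetical example)."""
    if len(samples) == 0:
        # An empty dataset is an unusual case that deserves an explicit decision.
        raise ValueError("rms() needs at least one sample")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def check_close(actual, expected, tolerance, label):
    """Complain if actual is not within tolerance of expected."""
    if abs(actual - expected) > tolerance:
        raise AssertionError(
            f"{label}: expected {expected} +/- {tolerance}, got {actual}"
        )


def test_constant_signal():
    # A constant signal has an RMS level equal to its magnitude.
    check_close(rms([0.5, 0.5, 0.5, 0.5]), 0.5, 1e-12, "constant signal")


def test_alternating_signal():
    # Hand-calculated: sqrt((1 + 1 + 1 + 1) / 4) = 1.
    check_close(rms([1.0, -1.0, 1.0, -1.0]), 1.0, 1e-12, "alternating signal")


def test_inexact_values():
    # 0.1 and 0.2 are not exactly representable, so the margin is part of the test.
    check_close(rms([0.1, 0.2]), math.sqrt(0.025), 1e-12, "inexact values")


def test_single_sample():
    # Tricky small case: one negative sample.
    check_close(rms([-2.0]), 2.0, 1e-12, "single sample")


def test_empty_input():
    # The expected behaviour for an empty dataset is an error, not a number.
    try:
        rms([])
    except ValueError:
        return
    raise AssertionError("empty input: expected ValueError")


def run_tests():
    """Run every test in turn; the first failure raises AssertionError."""
    tests = [
        test_constant_signal,
        test_alternating_signal,
        test_inexact_values,
        test_single_sample,
        test_empty_input,
    ]
    for test in tests:
        test()
    return len(tests)


if __name__ == "__main__":
    print(f"{run_tests()} tests passed")
```

Because each test is an ordinary function whose name begins with test_, the same file can later be handed to a standard Python test runner with little or no change, which keeps the initial cost of adopting tests low.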