Applying Property-Based Testing in Teaching Safety-Critical System Programming
Abstract: At the Universidad Politécnica de Madrid, students attending a course on concurrency are taught a high-level formalism which permits concise specification of shared resources. This formalism is used to express safety-critical access policies for typical control problems such as robot plants. Students are moreover provided with programming recipes for implementing such shared resource specifications in programming languages (typically Java). The teachers of the course use various tools to ensure that the implementations developed by students for a shared resource are of an acceptable quality. Such tools include normal unit tests, but also the systematic application of property-based testing to judge the quality of the exercises. In this article we provide an overview of the tools, techniques and methods used in one particular exercise of the course: the implementation of a control system for an automated warehouse.

Keywords: Testing; Java; Concurrency; Safety

I. INTRODUCTION

Programming concurrent applications is still a challenging task today. A programmer has to understand the concurrency guarantees provided by a programming language (or operating system) with regard to thread (or process) scheduling, and with regard to shared memory writes and accesses. The programming language provides basic tools to combat this complexity, e.g., for Java, synchronized methods, synchronized regions, and the possibility to declare variables "volatile". However, as is evident from the vast number of questions found on the Internet¹, these and other concurrency primitives and libraries for Java are not well understood. Moreover, concurrency is becoming more important: almost all processors today come equipped with multiple processing cores which share memory. Given the importance of concurrent programming, and its inherent difficulty, it is no surprise that a course on concurrency is taught to undergraduate students at the Universidad Politécnica de Madrid (Madrid Technical University). The course, which is taught in the fourth term of the computer science bachelor degree program at the University, uses the Java programming language as a basis. Java was chosen not because the language provides an especially good foundation for concurrent programming, but rather because students are already familiar with the language from earlier programming courses. The course has a first part where the classical language-based concurrency tools are introduced (monitors, semaphores, etc.) and their implementations as Java libraries (java.util.concurrent, etc.); graded small programming exercises are used to encourage student learning and participation, and to judge their skill set. However, although such relatively low-level libraries and concurrency primitives are highly useful for programming high-performing concurrent applications, they are, as experience shows, not easy to master. A second part of the course introduces both a higher-level concurrent library, JCSP² [1], which provides CSP-like concurrency primitives for Java, and, the topic of this paper, the shared resource formalism [2].

The core idea with shared resources is that the design must distinguish active entities (threads, processes) from passive ones (resources, shared memory locations). The latter represent all kinds of interaction among the former, and are formally modelled using an abstraction (shared resource specifications) that contains a clear interface (which can be invoked from processes) and a transactional transition semantics.

The code obtained from the active entities is considered light in the sense that it is assumed to be free from concurrency-specific constructs and is, thus, easier to verify, more portable, and does not require specially trained programmers to develop or test. On the other hand, the code from the shared resources is considered heavy code, as this is where all the concurrency-specific code is placed. It is therefore convenient that this code is carefully derived from validated designs so that it can be regenerated if requirements change, rather than modified by hand. In practice, this translation of shared resources into code in an actual programming language is semi-automatic, by means of certain idioms and code patterns [3], [4]. The use of these patterns enforces some discipline in the use of error-prone concurrency primitives, thus alleviating some of the aforementioned language-related issues.

The focus of this paper is on describing the experience of giving the undergraduate students in the course on concurrent programming the task of correctly implementing a particular shared resource specification. To shift attention away from lower-level concurrency issues and towards issues to do with safety-critical and embedded systems, the specification describes an automatic (shipping) warehouse in which autonomous robots operate. We describe in detail how the exercise was set up and the tools and techniques given to students to help them in their task, and finally evaluate how successful the students were in solving this (in our opinion quite representative) concurrent programming task. The contributions of this are twofold: first, we provide a detailed recipe for how

This work has been partially funded by the European Commission FP7 project ICT-2011-317820 PROWESS, the Spanish MINECO project TIN2012-39391-C04-02 STRONGSOFT, and the ARTEMIS Joint Undertaking under grant agreement no 295373 (project nSafeCer).
¹ e.g., as seen on the popular stackoverflow.com site
² http://www.cs.kent.ac.uk/projects/ofa/jcsp/
[Figure: three warehouses, numbered 0, 1 and 2, with robots of various weights (100 to 500); labels mark robots that cannot enter or cannot exit.]
identifier passed along in the two operations. Moreover, note that the example abstracts away from the use of e.g. scales to weigh robots and cargo, i.e., it is assumed that calls to these two operations always identify the correct weights of robots.

IV. A SHARED RESOURCE IMPLEMENTED IN JAVA

A correct implementation of a shared resource ensures that its operations are executed only when the concurrency precondition (CPRE) so permits, and in isolation. However, there may also be additional requirements on the order in which different calls are served which are not expressed by the resource specification. In the warehouse exercise, for instance, students were expected to implement the progress condition that if, in some state, it is safe for a non-empty set of robots to enter or exit warehouses, then some robot must eventually do so.

A resource specification can be implemented in different languages, using different concurrency primitives. We can implement a resource in Java, for instance, using the Lock and Condition classes provided by the java.util.concurrent.locks package. As an example, Fig. 3 provides a (sketched) Java class that can serve as a starting point for a complete implementation. Note that the class is rather incomplete. For instance, it does not address the special role of the first and last warehouses, i.e., that there is no corridor before the first warehouse and no corridor after the last warehouse.

The exitWarehouse(n,w) method begins by acquiring a lock, ensuring that no other call executes simultaneously. Then, the concurrency precondition (CPRE) is continuously evaluated. If the CPRE does not hold, because the corridor is not empty, the thread executing the method will wait on the condition freedCorridor[n+1] until another thread signals it (in enterWarehouse(n,w)). Once the CPRE is established, the POST condition is established by modifying the state of the resource (not shown in the code excerpt). Finally, the method signals any other thread that corresponds to a robot waiting to enter warehouse n, which the robot executing exitWarehouse(n,w) has just left.

V. EXERCISE: FROM SPECIFICATION TO A GRADED ASSIGNMENT

As the main (graded) exercise of the course, students were provided with the specification of the warehouse example (in Fig. 2), and instructed to implement the resource using Java. Moreover, the students were required to use a particular concurrency construct [5], which is an improvement on the lock and condition solution seen in Fig. 3, in that the concurrency precondition need not be tested in a while loop.

C-DAT WarehouseAccessControl
OPERATIONS
ACTION enterWarehouse: Warehouse[i] × Weight
ACTION exitWarehouse: Warehouse[i] × Weight
BEHAVIOUR
DOMAIN:
TYPE: WarehouseAccessControl = (weight: Warehouse → Weight × occupied: Warehouse → 𝔹)
Warehouse = 0 .. N_WAREHOUSES - 1
Weight = 0 .. MAX_WEIGHT_WAREHOUSE

In earlier lectures, students had been taught in detail about all these topics. They had seen, for instance, several other shared resource specifications, and had attended lectures on how to implement such specifications in Java. Students were grouped together³ in teams of two to solve the exercise. As the course does not teach software engineering skills in general, but rather basic skills in concurrent programming and design, both students in a pair were supposed to partake equally in all activities required to solve the exercise. However, no attempt was made to determine whether such equal participation indeed took place. Apart from attending lectures, students had individual access to teachers to resolve questions during the entire exercise period. In total around 220 students attempted to solve the exercise; there were three teachers available to assist the students during the exercise.

³ The selection of an exercise partner could be made freely.

A. Exercise Time Plan

The exercise was set up in two phases. The first phase ran for almost two weeks and finished with students having the possibility of handing in a preliminary solution. Although handing in a solution in this phase was not strictly obligatory, almost all student groups did so. After a brief period of evaluation by teachers (2 days), students were presented with individualised feedback (to be described later), and phase two of the exercise started. During phase two, which lasted roughly a week, students improved their solutions and eventually handed in their final solutions to the programming task. Then, the teachers of the course assessed the final solutions, partly with the help of automated testing tools, and assigned final exercise grades.

B. Exercise Tools

Students had two basic tools to aid them in comprehending the exercise task, and in assessing the quality of their solutions. First, a warehouse simulator was provided. This simulator was rather simplistic. It was a text-based simulator, which created a set of robots, let them load random items (with random weights) in warehouses, and called the control system programmed by the students, to illustrate the calls which their control system could expect and the overall dynamics of the warehouse example. The simulator contained plenty of print statements to give students ample feedback.

The second tool was the system for handing in solutions itself. Solutions were not handed in using electronic mail, but using an automated web-based program. This web-based program came with a small set of test cases (15 during phase one), programmed by hand by teachers, which attempted to detect violations of the main system invariant and the overall progress condition, i.e., that the maximum weight in any warehouse can be at most 1000 weight units, and that if it is safe for some robot to enter or exit a warehouse, some such safe action must be taken. Such test cases essentially created a few robots with certain weights, let these robots make calls to the student-programmed control system, and judged whether the control system made the correct decision in letting a robot proceed (or not). If a student solution failed one or more of these simple test cases, the web-based program simply refused to accept the implementation, never forwarding it to the teachers responsible for grading.

To get feedback, students could attempt to hand in a solution repeatedly, without penalty, just to get information from running the test cases. If a solution was successful, students could still submit improved versions (until phase one had terminated); the latest solution submitted was used for (eventual) grading. Moreover, the test cases were available to students in source code form.

A typical such test case is shown in Fig. 4, using English for clarity (test cases were programmed in Java). The test case depicted involves a set of robots that ask the control
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;

public class WarehouseResource {
    // Resource state
    private int weight[];
    private boolean occupied[];

    // Handling concurrency
    private Lock lock;
    private Condition freedWarehouse[];
    private Condition freedCorridor[];

    public WarehouseResource() {
        // initialize state and create monitors and conditions
    }

    public void enterWarehouse(int n, int w) {
        lock.lock();
        // Wait here, in a loop, until the CPRE holds (not shown in this excerpt)
        // CPRE holds here, update resource state (POST)
        // ...
        // Signal waiters that the robot has left the corridor
        freedCorridor[n].signal();
        lock.unlock();
    }

    public void exitWarehouse(int n, int w) {
        lock.lock();
        // Wait here, in a loop, until the CPRE holds (not shown in this excerpt)
        // CPRE holds here, update resource state (POST)
        // ...
        // Signal waiters that the robot has left the warehouse
        freedWarehouse[n].signal();
        lock.unlock();
    }
}
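To make the lock-and-condition pattern of Section IV concrete, the following is a minimal, self-contained sketch of how the excerpt might be completed. It is not the authors' reference solution. The class name WarehouseSketch, the accessor methods, and the exact preconditions are assumptions made for illustration: a robot may enter warehouse n only if the weight limit is respected, it may exit only if the corridor towards warehouse n+1 is free, and occupied[n] is taken to mean "the corridor leading into warehouse n is in use". The sketch uses awaitUninterruptibly() only to keep the method signatures free of checked exceptions, and signalAll() because several waiting robots with different weights may share one condition.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative completion of the lock-and-condition sketch; all names are hypothetical. */
class WarehouseSketch {
    private final int maxWeight;      // assumed per-warehouse weight limit
    private final int[] weight;       // weight[n]: total weight inside warehouse n
    private final boolean[] occupied; // occupied[n]: corridor into warehouse n is in use (assumption)
    private final Lock lock = new ReentrantLock();
    private final Condition[] freedWarehouse;
    private final Condition[] freedCorridor;

    WarehouseSketch(int nWarehouses, int maxWeight) {
        this.maxWeight = maxWeight;
        weight = new int[nWarehouses];
        occupied = new boolean[nWarehouses];
        freedWarehouse = new Condition[nWarehouses];
        freedCorridor = new Condition[nWarehouses];
        for (int i = 0; i < nWarehouses; i++) {
            freedWarehouse[i] = lock.newCondition();
            freedCorridor[i] = lock.newCondition();
        }
    }

    void enterWarehouse(int n, int w) {
        lock.lock();
        try {
            // Assumed CPRE: the robot fits under the weight limit of warehouse n.
            while (weight[n] + w > maxWeight) {
                freedWarehouse[n].awaitUninterruptibly();
            }
            // Assumed POST: add the weight and free the corridor the robot came from.
            weight[n] += w;
            if (n > 0) { // there is no corridor before the first warehouse
                occupied[n] = false;
                freedCorridor[n].signalAll();
            }
        } finally {
            lock.unlock(); // released even if the body throws
        }
    }

    void exitWarehouse(int n, int w) {
        lock.lock();
        try {
            int next = n + 1;
            boolean last = next == weight.length; // no corridor after the last warehouse
            // Assumed CPRE: the corridor towards the next warehouse is empty.
            while (!last && occupied[next]) {
                freedCorridor[next].awaitUninterruptibly();
            }
            // Assumed POST: remove the weight and occupy the next corridor.
            weight[n] -= w;
            if (!last) {
                occupied[next] = true;
            }
            freedWarehouse[n].signalAll(); // robots waiting to enter n may now fit
        } finally {
            lock.unlock();
        }
    }

    int weightOf(int n) {
        lock.lock();
        try { return weight[n]; } finally { lock.unlock(); }
    }

    boolean corridorOccupied(int n) {
        lock.lock();
        try { return occupied[n]; } finally { lock.unlock(); }
    }
}
```

In this sketch the precondition is re-tested in a while loop after every wake-up, which is the usual discipline with java.util.concurrent.locks conditions; the construct of [5] that students were required to use is described as removing the need for that loop.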
test20(TestRobots)
java.lang.AssertionError: the call enterWarehouse(0,400) of robot 2
raised the exception java.lang.NullPointerException
Stacktrace:
java.lang.NullPointerException
at ControlAccesoNavesMonitor.enterWarehouse(ControlAccesoNavesMonitor.java:61)
at Call.call(Call.java:117)
at Call.toTry(Call.java:112)
at es.upm.babel.cclib.Tryer.run(Tryer.java:50)
Call trace:
enterWarehouse(0,800) of robot 0 -- did not block
enterWarehouse(0,100) of robot 1 -- did not block
enterWarehouse(0,400) of robot 2
Fig. 5. A typical test case failure report: an exception (text translated from Spanish)
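A hand-written test of this kind has to decide whether a call "did not block". One way to do so is to run the call in its own thread and wait a bounded time for it to return. The sketch below illustrates the idea with the call trace of Fig. 5; it is not the course's test library. CallProbe and WeightOnlyWarehouse are hypothetical names, the stand-in resource models only the 1000-unit weight limit of a single warehouse, and the timeouts are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: decides whether a call blocks by running it in its own thread. */
class CallProbe {
    /** Returns true if the call has not returned within timeoutMs milliseconds. */
    static boolean blocks(Runnable call, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            call.run();
            done.countDown(); // reached only if the call returns normally
        });
        t.setDaemon(true);    // a call that stays blocked must not keep the JVM alive
        t.start();
        try {
            return !done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("probe interrupted", e);
        }
    }
}

/** Stand-in resource that models only the weight limit of one warehouse. */
class WeightOnlyWarehouse {
    private final int maxWeight;
    private int weight = 0;

    WeightOnlyWarehouse(int maxWeight) {
        this.maxWeight = maxWeight;
    }

    synchronized void enter(int w) {
        while (weight + w > maxWeight) { // wait until the robot fits
            try {
                wait();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        weight += w;
    }

    synchronized void exit(int w) {
        weight -= w;
        notifyAll(); // waiting robots re-check the weight limit
    }

    synchronized int weight() {
        return weight;
    }
}
```

With a limit of 1000, CallProbe.blocks(() -> wh.enter(800), 2000) and the subsequent enter(100) are expected to return false, while enter(400) is expected to return true because it would raise the total to 1300. A timeout can only suggest that a call is blocked, so a real harness would also need to report exceptions separately, as the report in Fig. 5 does.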
system for permission to enter and exit warehouses; the text in parentheses shows what the expected decisions by the control system are. Feedback to students when their solution failed a test case was in natural language; an example is shown in Fig. 5.

Moreover, students were encouraged, but not forced, to develop their own unit tests for their solutions. In practice, very few students seem to have done this.

C. Feedback on Hand-ins after Phase One

At the end of phase one, almost all student groups had been able to successfully hand in their solutions using the automatic web-based system, i.e., the test cases did not reveal any problems. Still, it was known that the test cases were not nearly exhaustive enough to determine whether a solution was correct, and moreover, we wanted to examine the code to assess e.g. whether a solution was really using the required Java library to solve the assignment. The solutions were assessed according to the following criteria:
• CPRE-while: not testing the concurrency precondition in a while loop. This condition has to do with a stylistic concern which had been stressed repeatedly in lectures, and which aims to reduce the number of concurrency errors (see [5] for details)
• NO-static: static variables (class variables) must not be used
• NO-comments: no, or few, comments
• Multiple-monitors: the code incorrectly uses multiple monitors
• Has-bugs: functional bugs were detected.

Students received a "report sheet" listing any problems detected in their solutions according to the above criteria. The sheet noted that a few additional style requirements had not been checked against their code, but would be checked during phase two.

Violations of the criteria above, except for Has-bugs, were detected using automatic source code analysis combined with manual inspection. For instance, to detect usage of multiple monitors, the UNIX grep utility was used to count the number of occurrences of the Java Monitor class in the source code, and manual inspection then determined whether implementations marked as suspicious indeed used more than one monitor. To determine whether a solution had bugs (i.e., as reported under the setting Has-bugs), we extended the number of test cases to around 30, taking care to design test cases that tested solutions more carefully.

The results from the assessment were that around 15 solutions had programmed the concurrency precondition incorrectly (CPRE-while), that 11 solutions used static variables, that 7 solutions had too few comments, that around 40 solutions still had bugs, and that five solutions were using multiple monitors. The more critical defects were clearly the ones in solutions which had programmed the concurrency precondition incorrectly, or were using multiple monitors. Although we had stressed these (partly stylistic) concerns repeatedly during the course, and we had stated clearly that solutions that did not follow these guidelines were not acceptable, a fair number of solutions still did not meet the requirements. That solutions still contained bugs was not wholly unexpected, given the incomplete test suite provided.

D. Phase Two

For phase two, the set of test cases used in the automatic web-based handing-in system was extended; we used the same test cases that had been used to assess the quality of the implementations (Has-bugs) at the end of phase one. We could have extended the number of test cases further, but the aim of providing them was not to force students to write (functionally) correct code, but rather to help them understand likely problems with their solutions. We consider it unlikely that students working in industry after graduation on a programming task will have access to high-quality test suites, and so we did not want to subject students to them during this exercise either. Rather, students were told that the test suite provided had defects, and that they should not simply rely on its diagnostics, but that they were ultimately responsible for deciding when their solutions were of an acceptable quality.

E. Grading

To assess the final quality of the solutions after phase two, the procedure from the assessment of phase one was repeated, and was extended by automated testing of solutions.
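The automated testing used for grading compared each implementation against a model of the resource. The following plain-Java sketch illustrates that general idea only. It does not use Quviq QuickCheck and is not the model of [7]. All names (AccessControl, Model, MonitorControl, ModelBasedTest) are hypothetical, and the preconditions are assumptions: a robot may enter if the weight limit is respected, and may exit if the next corridor is free. A sequential model predicts for each randomly chosen call whether it should block, and the harness checks that the implementation agrees. To stay short, a test case ends at the first call that is expected to block, and calls are issued one at a time rather than concurrently.

```java
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** The two operations under test (hypothetical interface). */
interface AccessControl {
    void enterWarehouse(int n, int w);
    void exitWarehouse(int n, int w);
}

/** Sequential model: the assumed CPREs and POSTs as plain, non-blocking code. */
class Model {
    final int max;
    final int[] weight;
    final boolean[] corridor; // corridor[n] leads into warehouse n (assumption)
    Model(int nWarehouses, int max) {
        this.max = max;
        weight = new int[nWarehouses];
        corridor = new boolean[nWarehouses];
    }
    boolean canEnter(int n, int w) { return weight[n] + w <= max; }
    boolean canExit(int n) { return n == weight.length - 1 || !corridor[n + 1]; }
    void enter(int n, int w) { weight[n] += w; corridor[n] = false; }
    void exit(int n, int w) { weight[n] -= w; if (n < weight.length - 1) corridor[n + 1] = true; }
}

/** A monitor-based implementation, included only so that the sketch has something to test. */
class MonitorControl implements AccessControl {
    private final Model state;
    MonitorControl(int nWarehouses, int max) { state = new Model(nWarehouses, max); }
    public synchronized void enterWarehouse(int n, int w) {
        while (!state.canEnter(n, w)) pause();
        state.enter(n, w);
        notifyAll();
    }
    public synchronized void exitWarehouse(int n, int w) {
        while (!state.canExit(n)) pause();
        state.exit(n, w);
        notifyAll();
    }
    private void pause() {
        try { wait(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }
}

class ModelBasedTest {
    /** Returns true if the call has not returned within timeoutMs milliseconds. */
    static boolean blocks(Runnable call, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> { call.run(); done.countDown(); });
        t.setDaemon(true);
        t.start();
        try {
            return !done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs one random test case; returns null on success, else a description of the discrepancy. */
    static String runCase(AccessControl impl, int nWarehouses, int max, long seed) {
        Random rnd = new Random(seed);
        Model model = new Model(nWarehouses, max);
        int robots = 4;
        int[] weights = new int[robots];
        int[] pos = new int[robots]; // 2*n: waiting to enter warehouse n; 2*n+1: inside warehouse n
        for (int i = 0; i < robots; i++) weights[i] = 100 * (1 + rnd.nextInt(8)); // 100..800
        for (int step = 0; step < 30; step++) {
            int r = rnd.nextInt(robots);
            int n = pos[r] / 2;
            if (n >= nWarehouses) continue; // this robot has left the last warehouse
            boolean entering = pos[r] % 2 == 0;
            int w = weights[r];
            boolean enabled = entering ? model.canEnter(n, w) : model.canExit(n);
            Runnable call = entering ? () -> impl.enterWarehouse(n, w) : () -> impl.exitWarehouse(n, w);
            String name = (entering ? "enterWarehouse(" : "exitWarehouse(") + n + "," + w + ")";
            if (!enabled) { // the call stays pending, so it ends this test case
                return blocks(call, 200) ? null : name + " should have blocked";
            }
            if (blocks(call, 2000)) return name + " blocked unexpectedly";
            if (entering) model.enter(n, w); else model.exit(n, w);
            pos[r]++;
        }
        return null;
    }
}
```

Calling ModelBasedTest.runCase(new MonitorControl(3, 1000), 3, 1000, seed) for a range of seeds exercises different random call sequences; a non-null result names the call on which implementation and model disagreed. Detecting the errors that only appear under truly concurrent calls (experiment test3) requires a harness that also tracks pending calls, which this sketch deliberately leaves out.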
The code style comments had now been fixed⁴. The style criteria for how to program a shared resource were checked a bit more carefully, and although warned in advance, there were a number of student groups that did not follow the new requirements.

The automatic testing was done by developing a model of the problem. To do this we used the Quviq QuickCheck [6] tool. Essentially, we described the correct functioning of an implementation of the robot warehouse using a state machine, which was used both to automatically generate random test cases, and to judge whether such automatically generated test cases were executed correctly by the student implementations. The details of how this was accomplished are described in [7]. The QuickCheck-based model used for automatic testing was not made available to students.

Although the task of correctly programming the shared resource warehouse example may not appear overly difficult, the results of our testing using QuickCheck were, at least to us, surprising. Of the 103 solutions tested, all of which had passed the manually crafted test suite, we found errors in 55, i.e., 53.40% of the solutions handed in contained at least one error, indicating that the manually crafted test suite was not particularly good at finding errors, and that the QuickCheck-based testing approach was much more successful.

We separated the testing of the implementations into three experiments. First, we tried to identify implementations which failed a basic safety criterion (test1), i.e., an implementation admits a robot into a warehouse even when the total resulting weight exceeds the limit, or admits a robot into a corridor even when the corridor is already occupied by another robot. The second property (test2) additionally tests whether there are calls (to enter or exit a warehouse) which the model permits, but which the implementation blocks. That is, the implementation does not satisfy a liveness condition. Finally, test3, in contrast to test1 and test2, tries to execute multiple calls concurrently, to detect possible incorrect uses of the basic Java concurrency mechanisms. The correctness property checked for test3 includes both safety (test1) and liveness (test2).

[Fig. 6 table: 48 implementations PASSED and 55 FAILED. The failed implementations are broken down into rows by the experiment (test1, test2, test3) that detected the error; the row counts are 2, 13, 38 and 2.]
Fig. 6. Test results for students' Warehouse implementations, classified by experiments.

The results of the testing using our testing framework are summarized in Figure 6. Note that test3 was run only if no errors were detected using test1 or test2. To properly interpret the results, note that the properties tested in the three experiments are not mutually exclusive. That is, test2 is a stricter test than test1 (i.e., testing both safety and liveness), whereas test3 is stricter than both test2 and test1, as it tests both safety and liveness under the added complication of (possibly) concurrent calls. As the testing is randomized, there is a chance that although an implementation error is, say, first detected during experiment test2, it may still be a safety error which, due to the nature of random test generation, was by chance not detected during test1. In the figure, "PASSED" are those implementations (48) in which testing failed to detect an error, and those under the heading "FAILED" (55) had at least one bug. The figure moreover indicates which test spotted an error. Thus, as an example, among the failed implementations, row 2 shows that there were 13 implementations that failed experiment test2 (and did not fail test1), while the last row tells us that two additional implementations were found erroneous using test3 which were not detected by either test1 or test2.

VI. CONCLUSION AND FUTURE WORK

Clearly, programming concurrent systems is quite a challenging task for the average undergraduate student. Even though we had devoted quite substantial resources to aiding students in developing correct solutions, when the exercise reported in this article had finished we could still find bugs in over 50% of the solutions handed in, although students were motivated (by grading concerns) to hand in good solutions.

It must be stressed that their difficulty lay not mainly in interpreting the high-level concurrency "design pattern" (the shared resource specification) correctly; rather, based on observations during individual tutoring of students and code inspection, the students had significant difficulties in correctly using the basic Java concurrency tools for implementing a shared resource.

Clearly, we should (ideally) invest even more (teaching) resources into teaching students concurrent programming; this is not surprising given the inherent difficulty of the task. Unfortunately, at least in our university, the course on concurrent programming receives fewer resources than most other programming courses. Moreover, this situation is not likely to change. So, given that concurrent programming is truly quite hard, and there are not enough teaching resources available to enable students to become truly proficient in it, what should we do?

One way forward, in our opinion, would be to de-emphasize the use of low-level concurrency mechanisms for solving basic concurrent tasks, and to focus more attention on higher-level concurrency constructs. For instance, we would like to continue having students use shared resources (while attempting to develop tools for automatically deriving correct implementations of such resources), or have students use the JCSP library for Java [1], or an actor-based language such as Erlang [8].

Clearly there are still concurrent applications where the performance benefits from using low-level libraries outweigh their drawbacks, but for many other uses the price in terms of low programmer productivity, and the potential for introducing difficult-to-find race condition bugs, makes their application prohibitively expensive.

REFERENCES
[1] P. H. Welch, N. Brown, J. Moores, K. Chalmers, and B. H. C. Sputh, "Integrating and extending JCSP," in The 30th Communicating Process Architectures Conference, CPA 2007, organised under the auspices of WoTUG and the University of Surrey, Guildford, Surrey, UK, 8-11 July 2007, ser. Concurrent Systems Engineering Series, A. A. McEwan, S. A. Schneider, W. Ifill, and P. H. Welch, Eds., vol. 65. IOS Press, 2007, pp. 349–370. [Online]. Available: http://www.booksonline.iospress.nl/Content/View.aspx?piid=5982
[2] A. Herranz, J. Mariño, M. Carro, and J. J. Moreno-Navarro, "Modeling concurrent systems with shared resources," in Formal Methods for Industrial Critical Systems, 14th International Workshop, FMICS 2009, Eindhoven, The Netherlands, November 2-3, 2009. Proceedings, ser. Lecture Notes in Computer Science, vol. 5825, 2009, pp. 102–116. [Online]. Available: http://www.springerlink.com/content/b83m037648436667/
[3] J. Mariño, R. N. N. Alborodo, and Á. Herranz, "Model-based thread-safe Java code generation from JML specifications," 2015, in preparation. [Online]. Available: http://babel.upm.es/~rnnalborodo/sr_web/jml_clause_paper.pdf
[4] M. Carro, J. Mariño, Á. Herranz, and J. Moreno-Navarro, "Teaching how to derive correct concurrent programs from state-based specifications and code patterns," in Teaching Formal Methods, ser. Lecture Notes in Computer Science, C. Dean and R. Boute, Eds. Springer Berlin Heidelberg, 2004, vol. 3294, pp. 85–106. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-30472-2_6
[5] A. Herranz and J. Mariño, "A verified implementation of priority monitors in Java," in Proceedings 2nd. International Conference on Formal Verification of Object-Oriented Software (FoVeOOS'11), Revised Lectures, ser. Lecture Notes in Computer Science, B. Beckert, F. Damiani, and D. Gurov, Eds., vol. 7421. Springer, 2012, pp. 160–177.
[6] T. Arts, J. Hughes, J. Johansson, and U. T. Wiger, "Testing telecoms software with Quviq QuickCheck," in Proceedings of the 2006 ACM SIGPLAN Workshop on Erlang, Portland, Oregon, USA, 2006, pp. 2–10.
[7] L. Fredlund, Á. Herranz-Nieva, and J. Mariño, "A testing-based approach to ensure the safety of shared resource concurrent systems," in Software Engineering and Formal Methods - SEFM 2014 Collocated Workshops: HOFM, SAFOME, OpenCert, MoKMaSD, WS-FMDS, Grenoble, France, September 1-2, 2014, Revised Selected Papers, ser. Lecture Notes in Computer Science, C. Canal and A. Idani, Eds., vol. 8938. Springer, 2014, pp. 116–130. [Online]. Available: http://dx.doi.org/10.1007/978-3-319-15201-1_8
[8] F. Cesarini and S. Thompson, Erlang Programming – A Concurrent Approach to Software Development. O'Reilly Media, 2009.

⁴ Except for a few cases where students had not submitted any solution at the end of phase one, and had thus not received any individualised feedback.