A Simple Defer Feature For C
Jens Gustedt (INRIA, France) – Robert C. Seacord (NCC Group, USA)
3. Rationale
Consistent resource management is key for the security of modern applications. Many modern programming languages
(such as C++, Go, D, or Java), operating systems (such as POSIX) or libraries (such as boost) provide proper constructs
to specify callbacks that are automatically launched on the exit of a block or function. For the complete motivation for
this paper and the provided feature, see N2542. In this paper we concentrate on the functional intersection of
several existing extensions to the C programming language that are implemented in the field; since none of these
extensions is predominant and since they are syntactically quite different, we propose to map them onto a feature similar
to the one found in Go, called defer.
The top-level view of this feature is that each defer declaration specifies a callback for which the execution is deferred
until the current scope is left for whatever reason. For a given resource, this allows a callback to be specified close to the
resource definition that “cleans up” at the end; it avoids the need to specify cleanup during further processing each
time that an exceptional condition is met.
This paper builds on the assumption that at least simple lambdas are integrated into C23; otherwise it is obsolete. That
assumption simplifies the proposal considerably and removes certain points that had been contentious, even between the
original authors of N2542. For example, with a defer that takes a plain compound statement, as in
double* q = malloc(something);
defer { free(q); }
it would not be clear which value of q would be used for the call to free at the end of the surrounding block. Would it
be the current value of q at the moment when the defer is met, or would it be the value that q has at the end of the
execution of the block?
People were much divided here and it was not possible to reach a satisfying consensus, so probably the community is
not (yet?) ready to establish a default behavior for that choice.
In this proposal we use lambdas to specify the defer callback, and thus the decision about the point of evaluation of
variables reduces naturally to captures. So it is the user who explicitly chooses one model or the other
according to their needs. There are several principal scenarios.
double* q = malloc(something);
defer [q]{ free(q); };
Here, q is explicitly listed in the capture list and the value is frozen at the point of the defer and a new local object q
shadows the use inside the defer callback.
double* q = malloc(something);
defer [&q]{ free(q); };
q is explicitly listed as an identifier capture. The identifier is evaluated when the defer callback is executed; in that case
the callback sees the last value of q when leaving the surrounding function or lambda.
double* q = malloc(something);
defer [qp = &q]{ free(*qp); };
An alternative to an identifier capture could be to take the address of the corresponding variable explicitly and
memorize it in a value capture.
double* q = malloc(something);
defer [=]{ free(q); };
All variables (and so q ) are shadow captures and the value is frozen at the point of the defer.
double* q = malloc(something);
defer [&]{ free(q); };
All variables (and so q ) are identifier captures and the value is determined when the defer callback is executed.
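The difference between the capture modes can be emulated in today's C without lambdas. The following is only a sketch under our own assumptions: the names value_ctx, ident_ctx, seen_by_value, seen_by_ident and capture_demo are hypothetical and not part of the proposal, and the "callbacks" merely report which pointer they would see instead of freeing it, so that the effect is observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Emulates defer [q]{ ... }: the value of q is copied at registration. */
struct value_ctx { double* q; };

/* Emulates defer [&q]{ ... } or defer [qp = &q]{ ... }: only the address
   of q is stored, and q is read when the callback runs. */
struct ident_ctx { double** qp; };

static double* seen_by_value(struct value_ctx const* c) { return c->q; }
static double* seen_by_ident(struct ident_ctx const* c) { return *c->qp; }

/* Returns 1 if the value capture still sees the initial pointer while the
   identifier capture sees the replacement, 0 if not, -1 if malloc fails. */
int capture_demo(void) {
  double* q = malloc(sizeof(double[4]));
  if (!q) return -1;
  double* const initial = q;

  struct value_ctx by_value = { q };   /* frozen at this point      */
  struct ident_ctx by_ident = { &q };  /* looked up at the very end */

  double* bigger = malloc(sizeof(double[8]));
  if (!bigger) { free(q); return -1; }
  assert(bigger != initial);  /* both allocations are live, so distinct */
  q = bigger;                 /* q changes after "registration" */

  int ok = seen_by_value(&by_value) == initial
        && seen_by_ident(&by_ident) == bigger;

  free(initial);
  free(bigger);
  return ok;
}
```

In this scenario a cleanup that relied on the value capture alone would release only the initial allocation and leak the replacement, which is why the choice of capture matters.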
Using lambdas makes it even possible to use mixed captures as in the following.
Here it is possible to capture the initial value of q and use it inside the defer callback to test whether the initial buffer (an
automatic array) has been replaced by a large allocation.
We also had much debate whether a deferred statement (as we called it) should be attached to a function or to a block (AKA
compound statement). Using lambdas as callbacks instead of compound statements eases the argument and probably
also the implementation.
The present proposal avoids taking a position on that question.
The proposal is formulated as attaching defer declarations (and not statements) to blocks, but makes the
appearance in blocks other than function bodies implementation-defined. Thereby implementations may add the
feature to other blocks than functions, but are not forced to do so. So for a particular implementation the following
code may be valid.
int main(void) {
  ... do something ...
  // a local task that needs a lot of memory
  {
    double* a = malloc(huge);
    if (!a) return EXIT_FAILURE;
    defer [&a]{ free(a); };
    ... do complicated things
    // free the array at the end of the block
  }
  ... do other things that need a lot of storage
}
If it is valid, the semantics are well defined by the proposed text; if it is not valid, a diagnostic is required.
The presence of lambdas in the language makes it easy to reformulate defer declarations that are attached to a
block. A portable rewrite of the above can look as follows
int main(void) {
  ... do something ...
  // a local task that needs a lot of memory
  // modeled as a lambda that is called in-place
  [&]{
    double* a = malloc(huge);
    if (!a) return EXIT_FAILURE;
    defer [&a]{ free(a); };
    ... do complicated things
    // free the array at the end of the block
  }();
  ... do other things that need a lot of storage
}
This is semantically equivalent to the above and should be accepted by all implementations.
In N2542 we had formulated the defer feature as a control statement; the present paper changes this to make defer a
declaration. This makes the description of the feature much easier, since lambdas are first of all values, and it clearly anchors
the lifetime and scope of applicability of the feature at the innermost enclosing block.
POSIX has two functions (or macros) that attach cleanup functionality to an implicit scope
established by paired calls:
void pthread_cleanup_push(void (*routine)(void*), void *arg);
void pthread_cleanup_pop(int execute);
Here the argument arg is a pointer to a context that will be passed on to the routine callback when either the
pthread_cleanup_pop call is met ( execute has to be non-zero in that case) or if a thread exits prematurely, for
example by pthread_exit or by being killed.
The calls must be paired, that is they must be statements that are attached to the same innermost compound statement.
By that property they form some sort of implicit block, and implementations may even enforce that by hiding {} -pairs
inside the macros.
{
  double*const q = malloc(something);
  // implicitly starts a block
  pthread_cleanup_push(free, q);
  // use q as long as you wish
  ...
  // implicitly terminates the cleanup block
  // executes free unconditionally
  pthread_cleanup_pop(true);
  // now q falls out of scope
}
But such a simple usage does not extend easily when for example q may change during the execution of the inner code.
For such a scenario a proper function that performs the cleanup has to be provided.
void destroy(void* p) {
  double** pp = p;
  free(*pp);
  *pp = 0;
}
{
  double* q = malloc(something);
  pthread_cleanup_push(destroy, &q);
  // use q as long as you wish, and change it if necessary
  ...
  // change q eventually
  ...
  pthread_cleanup_pop(true);
  // now q falls out of scope
}
Compared to the defer feature as proposed here, this feature has several differences that we mostly see as disadvantages
Microsoft's __try/__finally extension is very similar in its functionality to what is proposed here. It allows a finally-block
to be added to a try-block such that the finally-block is executed independently of how the try-block is terminated.
{
  double* q;
  __try {
    q = malloc(something);
    // use q as long as you wish
    ...
  }
  __finally {
    free(q);
  }
  // now q falls out of scope
}
Compared to the defer feature as proposed here, this feature has several differences that we mostly see as disadvantages
The Gcc, Clang and Icc compilers implement a feature with the same expressiveness as our
proposal, namely the cleanup attribute. It allows a callback to be attached to a variable in the top-level scope of a function as
follows:
void callback(toto*);
That is, callback is a function that receives a pointer to the variable and is supposed to do the necessary cleanup for
that variable.
Unfortunately this feature cannot be easily lifted into C23 as a standard attribute, because the cleanup feature clearly
changes the semantics of a program and can thus not be ignored.
Although this feature is attached to specific variables, it can easily be used and extended for arbitrary callbacks with a
signature of void (*)(void) . Therefore only a stub callback
#include <stdio.h>
#include <stdlib.h>
That is, an auxiliary variable someUnusedId of pointer to function type is used to attach the meta-callback that in turn
calls the callback that we are interested in.
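The stub technique described here can be sketched with the Gcc/Clang extension as follows. This is an illustration under our own assumptions, not the code of the original paper: callback_caller and someUnusedId follow the names used in the text, whereas count_cleanup, cleanup_runs, destroy_double, scoped_work and cleanup_demo are hypothetical helpers that make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Meta-callback: receives a pointer to the annotated variable, which
   itself holds a pointer to the function we really want to call. */
static void callback_caller(void (**cb)(void)) {
  assert(cb);
  if (*cb) (*cb)();
}

static int cleanup_runs = 0;
static void count_cleanup(void) { ++cleanup_runs; }

/* Variable-specific cleanup: receives a pointer to the variable. */
static void destroy_double(double** p) {
  free(*p);
  *p = 0;
}

static void scoped_work(void) {
  __attribute__((__cleanup__(destroy_double)))
    double* A = malloc(sizeof(double[54]));
  /* auxiliary variable whose only purpose is to carry the callback */
  __attribute__((__cleanup__(callback_caller)))
    void (*someUnusedId)(void) = count_cleanup;
  (void)A;
  (void)someUnusedId;
  /* when the function is left, callback_caller runs first, then
     destroy_double (reverse order of declaration) */
}

/* Returns how many times the arbitrary callback ran for one call. */
int cleanup_demo(void) {
  int before = cleanup_runs;
  scoped_work();
  return cleanup_runs - before;
}
```

The block compiles only with compilers that implement the __cleanup__ attribute, which is precisely the portability problem that the present proposal addresses.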
Because Gcc has nested functions, code with functionality that is close to what we propose here could look like
the following.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  // ...
  __attribute__((__cleanup__(destroy_double))) double* A = malloc(sizeof(double[54]));
  // ...
  __attribute__((__cleanup__(destroy_unsigned))) unsigned* B = malloc(sizeof(unsigned[5]));
  // ...
  // A nested function for the purpose
  void local_callback(void) {
    // do the cleanup here
    printf("will be cleaning %p and %p\n", (void*)A, (void*)B);
  }
  __attribute__((__cleanup__(callback_caller))) void (*cb)(void) = local_callback;
}
Clang has similar possibilities to express semantically the same features, but this would need either the use of a plain
function (with less expressiveness) or the use of the block extension that they borrowed from Objective C. Similarly, the
code can be adapted to Microsoft’s MSVC to implement the same semantics with __try and __finally . We will not go
into details how this can be achieved.
Our proposal puts the feature into a simple normative framework, makes it portable across implementations and, maybe
most importantly, facilitates its use. Code with the same functionality as above can be expressed somewhat more simply as
follows.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  // ...
  double* A = malloc(sizeof(double[54]));
  defer [&]{
    printf("freeing A %p\n", (void*)A);
    free(A);
  };
  // ...
  unsigned* B = malloc(sizeof(unsigned[5]));
  defer [&]{
    printf("freeing B %p\n", (void*)B);
    free(B);
  };
  // ...
  defer [&]{
    // do the cleanup here
    printf("will be cleaning %p and %p\n", (void*)A, (void*)B);
  };
}
Compared to the defer feature as proposed here, this feature has several differences that we mostly see as disadvantages
5. Design choices
As explained above, using lambdas as the main tool to express defer callbacks implies that we don’t have to decide whether
variables inside these are accessed by their value when the defer is met or when it is executed. The different types of
captures for lambdas give users the possibility to choose the variant they need for their particular case.
Other design choices that we discussed for the original proposal are not directly impacted by the implementation as
lambdas. We try to be mostly conservative here: the proposed feature should be functional, easy to implement and not
inhibit future extensions that might be proposed with a TR.
It was much discussed whether defer should be a feature that is attached to functions or to blocks, and even the existing prior
art follows different strategies here. Whereas gcc’s cleanup attribute is attached to functions, POSIX’ cleanup
functions and the try/finally are attached to possibly nested blocks.
This indicates that existing mechanisms in compilers may have difficulties with a block model. So we only require it to
be implemented for function bodies and make it implementation-defined whether it is also offered for internal blocks.
Nevertheless, we think that it is important that the semantics for the feature are clearly specified for the case of blocks,
such that all implementations that do offer such a feature follow the same semantics. Therefore it is also a constraint
violation to ask for the feature for blocks on implementations that don’t have support for it.
One of the possible difficulties for implementing the feature is if the list of defer that has to be processed can have an
order that depends on the execution. This could happen for example because some defer is conditionally omitted or
when a local jump interchanges the order in which two defer declarations are seen for the first time.
We think that the feature should be implementable with very little effort and resources. So we propose that
for a given execution of a block, a defer declaration can be met at most once;
if a defer declaration is met during such an execution, all lexically preceding defer declarations within the
same block have been met;
each block has to make room for a finite number of defer callbacks that is known at compile time;
these callbacks can be organized in an array that has defer callbacks in a fixed order, namely declaration order;
the state of an execution of a block with respect to the defer feature can simply be described by the number of
defer declarations that have already been met.
These properties are enforced by a prohibition on jumping over a defer declaration by means of switch , goto or
longjmp . This requirement is a natural extension of the fact that jumping over any kind of declaration skips the
initialization of a variable.
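The implementation model listed above (a fixed array in declaration order plus a counter of the declarations already met) can be sketched in plain C as follows. This is a hypothetical illustration, not a description of any existing compiler: defer_frame, defer_register, defer_unwind, mark and defer_order_demo are names invented here, and the callbacks are run in reverse order of registration, as the proposed text specifies for thrd_exit and as the example for main describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void defer_fn(void* ctx);
struct defer_slot { defer_fn* fn; void* ctx; };

/* One execution of a block: room for a compile-time number of callbacks
   and the count of defer declarations met so far. */
enum { DEFER_MAX = 3 };
struct defer_frame {
  struct defer_slot slot[DEFER_MAX];
  size_t met;
};

static void defer_register(struct defer_frame* f, defer_fn* fn, void* ctx) {
  assert(f->met < DEFER_MAX);  /* each declaration is met at most once */
  f->slot[f->met++] = (struct defer_slot){ fn, ctx };
}

/* Calls the registered callbacks, last registered first. */
static void defer_unwind(struct defer_frame* f) {
  while (f->met) {
    struct defer_slot s = f->slot[--f->met];
    s.fn(s.ctx);
  }
}

static char trace[8];
static size_t trace_len;

/* Callback that appends a one-letter tag to the trace. */
static void mark(void* ctx) {
  char const* tag = ctx;
  if (trace_len + 1 < sizeof trace) trace[trace_len++] = *tag;
}

/* Emulates a block that meets its first stop_after defer declarations
   and is then left; returns the order in which callbacks ran. */
char const* defer_order_demo(int stop_after) {
  struct defer_frame frame = { .met = 0 };
  memset(trace, 0, sizeof trace);
  trace_len = 0;
  do {
    if (stop_after < 1) break;  /* early exit before any defer */
    defer_register(&frame, mark, "p");
    if (stop_after < 2) break;
    defer_register(&frame, mark, "q");
    if (stop_after < 3) break;
    defer_register(&frame, mark, "r");
  } while (0);
  defer_unwind(&frame);
  return trace;
}
```

Because no jump may skip a defer declaration, the counter met alone identifies which slots are initialized; no per-slot flag is needed.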
The original proposal had several features that are designed to handle exceptional control flow, such as premature
exits of the whole execution or of a thread, or the arrival of signals. These features found a mixed reception and with this
paper we do not want to impose any such feature on implementations that do not yet have mechanisms to which they
could attach such features.
It is difficult to foresee which kind of requirements would be consensual for WG14, so we make one main proposal
which leaves most of the stack unwind properties undefined (in the direct sense of the term) and only imposes that for
any such scenarios none or all registered defer callbacks must be called. Two optional scenarios build on that, the first
just forcing implementations to document their behavior by making the stack unwind features implementation-defined.
The second additionally introduces feature tests that make it possible to test dynamically at runtime whether the present
incarnation of the C library is able to unwind the stack or not.
The existence of the POSIX cleanup feature shows that there is a demand for tools that clean up a whole stack of
callbacks that are attached to a thread of execution. Also, implementations that are POSIX compliant and that would
want to build upon their existing implementation of the cleanup feature should not be penalized.
For the terminating functions in <stdlib.h> we try to follow the directions that the standard already has for callbacks;
that is in particular that abort should never call callbacks. On the other hand, a premature exit by one of these
functions should always have defined behavior.
For other functions that implementations offer we cannot impose much, in particular if we want to allow future
extensions (such as a panic function) or if we have to take well-established termination functions such as
pthread_kill into account. Therefore we make the behavior of all such extensions explicitly undefined and leave
room for implementations to be creative.
5.3.3. Signals
Signals are only sparsely specified in the C standard. In particular it only explicitly defines 3 software triggered signals
for which we may specify behavior in case they lead to a termination of a thread or execution. We don’t think that it
would make any sense for the standard to impose any type of behavior for the remaining 3 hardware interrupts that it
describes, so we leave the handling of these signals undefined by omission.
6. Suggested changes
6.1. Syntax anchor
Add a new term defer-declaration to the end of the declaration rule in 6.7 p1.
2 A declaration other than a static_assert, attribute or defer declaration shall declare at least a
declarator (other than the parameters of a function or the members of a structure or union), a tag, or the
members of an enumeration.
Syntax
1 defer-declaration:
defer lambda-expression ;
Constraints
2 A defer declaration shall have block scope. It is implementation-defined if a defer declaration in a block
other than the outermost block of a function definition or lambda expression is accepted.1)
3 The lambda expression shall have no parameter type list or such a list specified as () or (void) .
Description
4 A defer declaration defines an unnamed object λ , called a defer callback, of lambda type and automatic
storage duration that is initialized with the lambda expression. The object has a lifetime that corresponds to
the current execution of the innermost block B in which the declaration is found. An abnormal termination
of B is a termination of B that is caused by a function call that does not return, by a signal or by a goto
statement. Sequenced immediately after the definition, λ is registered with the execution of B ; when the
execution of B terminates normally, calls without arguments to the registered defer callbacks are sequenced
as void expressions
Recursively, if during the execution of a defer callback λ a defer declaration is met during the execution of
a block D , the corresponding defer callback κ is registered for that execution of D .2)
5 Jumps by means of switch , goto or longjmp shall not be used to jump over a defer declaration.3) If λ
does not return, the behavior is undefined.4)
6 Unless specified otherwise, abnormal termination of the execution of B shall not call defer callbacks; the
behavior is undefined unless the abnormal termination is caused
by a call to one of the library functions of clause 7.22.4 that are declared with _Noreturn ,5)
by the signals SIGABRT , SIGINT or SIGTERM ,6) or
by a call to thrd_exit .7)
1) Thus an implementation may allow a defer declaration for example as the declaration expression of a for-loop or inside another
compound statement, but programs using such a mechanism would not be portable. If a translation unit that uses such a defer
declaration is not accepted, a diagnostic is required.
3) This ensures that defer callbacks are properly initialized at the same time they are registered, that defer declarations are not
revisited during the same execution of a block, and that, within their block, defer callbacks are registered in lexicographic order of
their defer declarations.
4) So using calls to exit, thrd_exit, longjmp or any other library function that is specified with _Noreturn to terminate a defer
5) Implementations that provide other functionality to terminate execution are invited to document their behavior with respect to
defer callbacks.
6) Implementations that provide other signal values that terminate execution per default are invited to document their behavior with
7) Implementations that provide other functionality to terminate execution of a thread, for example by killing it from another thread,
6.3. Optional addition for thrd_exit, exit, abort, SIGINT, SIGTERM and SIGABRT
7 It is implementation-defined if an explicit call to thrd_exit calls any defer callbacks. If it does so, it
calls the defer callbacks of all active executions of blocks of the thread that are registered before the call to
thrd_exit , sequenced in the reverse order they had been registered. These calls happen before any other
actions defined for the thrd_exit library function are performed and take place within the scope of the
block for which they have been registered.
8 Similarly, it is implementation-defined, if an explicit call to exit calls defer callbacks for the current
thread. If the execution is terminated by a call to a different _Noreturn function of clause 7.22.4 than
exit , no defer callbacks shall be called.
9 Similarly, it is implementation-defined, if a default handling of the signals SIGINT and SIGTERM that
terminates execution calls defer callbacks for the current thread. If the execution is terminated because the
signal SIGABRT occurred, no defer callbacks shall be called.
6.4. Alternative version for thrd_exit, exit, abort, SIGINT, SIGTERM and SIGABRT with
feature tests
7 If the value of thrd_exit_defer is true , see 7.26, the defer callbacks of all active executions of blocks
of the thread that are registered before an explicit call to thrd_exit are called, sequenced in the reverse
order they had been registered. These calls happen before any other actions defined for the thrd_exit
library function are performed and take place within the scope of the block for which they have been
registered. If the value is false , no defer callbacks are called.
8 Similarly, if the value of __exit_defer is true , see 7.22, an explicit call to exit calls the defer
callbacks for the current thread. If the value is false , no defer callbacks are called. If the execution is
terminated by a call to a different _Noreturn function of clause 7.22.4 than exit , no defer callbacks shall
be called.
9 Similarly, if the value of sig_exit_defer is true , see 7.14, a default termination of the executions for
the signals SIGINT or SIGTERM calls the defer callbacks for the current thread. If the value is false , no
defer callbacks are called. If the execution is terminated because the signal SIGABRT occurred, no defer
callbacks shall be called.
5 The macro
sig_exit_defer
expands to a value of type bool that is true if the implementation executes defer callbacks when the
default handling of signals SIGINT and SIGTERM terminates the execution, see 6.7.12, and false
otherwise; the expansion is not an lvalue and the value is the same for the whole execution.
6 The macro
__exit_defer
expands to a value of type bool that is true if the implementation executes defer callbacks on explicit
calls to exit , see 6.7.12, and false otherwise; the expansion is not an lvalue and the value is the same for
the whole execution.
6 The macro
thrd_exit_defer
expands to a value of type bool that is true if the implementation executes defer callbacks on explicit
calls to thrd_exit , see 6.7.12, and false otherwise; the expansion is not an lvalue and the value is the
same for the whole execution.
Depending on the version of lambdas that are integrated into C23, this example might need small adjustments.
10 EXAMPLE In the following, the values of p , q and r are used as arguments to free at the end of the
execution of main . Because the corresponding capture is a shadow capture, for p the initial value is used
as argument to the call; for q it is an identifier capture and thus the value is used that was stored last
before a return statement is met or the execution of the function body ends. Similarly, for r the value
capture rp has the address of r and frees the last allocation for which the address was stored in r . The
four return statements are all valid and according to the control flow that is taken the function executes
0 , 1 , 2 , or 3 defer callbacks. If at least the first three allocations are successful, the storage is freed in the
order r , q and p . If the call to realloc fails, the initial value of q is passed as argument to free .
#include <stdlib.h>
int main(void) {
  double*const p = malloc(sizeof(double[23]));
  if (!p) return EXIT_FAILURE;
  defer [p]{ free(p); };
  double* q = malloc(sizeof(double[23]));
  if (!q) return EXIT_FAILURE;
  defer [&q]{ free(q); };
  double* r = malloc(sizeof(double[23]));
  if (!r) return EXIT_FAILURE;
  defer [rp = &r]{ free(*rp); };
  {
    double* s = realloc(q, sizeof(double[32]));
    if (s) q = s;
    else return EXIT_FAILURE;
  }
}
a panic/recover mechanism and a more sophisticated specification for error paths such as preliminary exits and
signal handling
to allow defer callbacks to appear within conditionals such that they are registered dynamically in varying order
as they are met during execution
Other possible extensions would arise from the choices that are made in this proposal.
Using lambdas as the base feature as we propose here opens other possibilities, in particular for the status of captures.
This example from the beginning is not valid with our proposal:
double* q = malloc(something);
defer { free(q); };
The use of a compound statement here could be seen as an indication that the access to q is meant to be an identifier
capture, and we could then per default expand this to
double* q = malloc(something);
defer [&]{ free(q); };
On the other hand, in golang from where we borrowed this feature a version without {}
double* q = malloc(something);
defer free(q);
would use the value of q as it is evaluated when the defer declaration is met. So in this sense this would probably best
be expanded as a value closure
double* q = malloc(something);
Overall, several extensions of the syntax seem possible:
1 defer-declaration:
defer defer-callback ;
defer-callback:
no-argument-callable
function-call
compound-statement
no-argument-callable:
expression
The expression of a no argument callable shall evaluate to a function pointer type or to a lambda value type
that receives no arguments. A defer declaration with a function call or compound statement behaves,
respectively, as if it were specified with lambda expressions as in the following
If we extended defer declarations to file scope, they could have semantics similar to calls
to atexit .
FILE* logfile = 0;
defer []{
  if (logfile) {
    log2file(logfile, "execution is terminating\n</p>\n</html>\n");
    fclose(logfile);
  }
};
FILE* logfile = 0;
void logfile_callback(void) {
  if (logfile) {
    log2file(logfile, "execution is terminating\n</p>\n</html>\n");
    fclose(logfile);
  }
}
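With today's library, the file-scope variant can only be approximated by registering such a callback explicitly with atexit. A minimal sketch, assuming a hypothetical helper logfile_setup and using fputs in place of log2file , which is not defined in this text:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static FILE* logfile = 0;

/* Runs at normal program termination, once registered with atexit. */
static void logfile_callback(void) {
  if (logfile) {
    fputs("execution is terminating\n", logfile);
    fclose(logfile);
    logfile = 0;
  }
}

/* Installs f as the log stream (a null pointer means no log) and
   registers the callback once. Returns 0 on success, 1 otherwise. */
int logfile_setup(FILE* f) {
  static int registered = 0;
  /* replacing an open log would leak it */
  assert(!logfile || logfile == f);
  logfile = f;
  if (!registered) registered = (atexit(logfile_callback) == 0);
  return registered ? 0 : 1;
}
```

Unlike a file-scope defer declaration, the registration here is a runtime action that the program has to remember to perform, for example at the start of main .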
8. Questions to WG14
8.1. Base
Does WG14 want to integrate a defer feature as proposed in 6.1 and 6.2 of N2895 into C23?
Since we don’t really have a possibility to vote for multiple choices, we propose to escalate the feature.
Does WG14 want to integrate provisions for abnormal termination as in 6.3 of N2895 into C23?
Does WG14 want to replace the provisions for abnormal termination and integrate 6.4 instead of 6.3 of N2895
into C23?
8.3. Example
Does WG14 want to integrate the example as in 6.5 of N2895 into C23?