Garbage Collection Techniques
Norbert Podhorszki
Oak Ridge National Laboratory
1. Introduction
In the execution of logic programming languages, a logic variable can have
only one value during its existence: once instantiated, it cannot be changed.
Therefore, new values cannot be stored in the same memory cell, a property that is
natural in imperative languages. For similar reasons, a data structure is always copied
when a small modification is required. As a result, memory is consumed very quickly,
and much smaller problems can be solved with declarative languages than with
procedural ones. Moreover, there are no deallocation features in declarative
languages, so memory management is required to reuse the lost memory areas.
Research has shown that most of the structures and variables created during
execution end up unreferenced by the program, so their allocated memory cells are
unusable. Obviously, these lost areas should be freed for reuse. Since the birth of the
declarative languages, much effort has been devoted to solving this problem. The lost
memory areas are called garbage; hence the common name of the problem,
garbage collection.
Since the late 1950s many garbage collection algorithms have been proposed. At
first they were used in implementations of LISP and other functional languages; in
the 1980s Prolog also came to require memory management. In 1981 J. Cohen
summarized and classified the work done in the area, and these algorithms were later
modified and developed further; see [Coh81] and [Wil92].
The early methods worked only with single-sized memory cells. The
introduction of structures and records into programming languages made it important
for the algorithms to perform well on variable-sized memory pieces. Nowadays every
language provides structures; therefore, we consider only methods of this kind.
The functioning of a garbage collector consists of two parts:
(a) Garbage detection, i.e., distinguishing the live objects from the garbage in
some way.
(b) Reclaiming the storage occupied by garbage, i.e., making it available for
reuse.
* The work presented in this paper was done in the framework of the Hungarian-Portuguese joint project
'PAR-PROLOPPE', registered under No. H-9305-02/1095.
(a1) By keeping counters indicating the number of times objects have been
referenced. In this case, identification consists of recognizing those cells
whose reference count is zero.
(a2) By keeping a root set, a list of immediately accessible cells, global and local
variables of the running program, registers, etc., and following their
references to trace and mark every accessible (live) object. This method of
identification is usually called marking.
(b1) Incorporation into a free list in which available memory cells or areas are
linked by pointers.
(b2) Compaction of all live objects at one end of memory, while the other end
contains a contiguous memory area available for reuse. The type of
compaction can be classified by the relative positions in which objects are left
after compaction:
(b2.1) Arbitrary. The order of the compacted objects is arbitrary.
(b2.2) Linearizing. Objects originally pointing to each other (usually) become
adjacent after compaction.
(b2.3) Sliding. The original linear order of the objects is not changed by the
compaction.
Considering the execution time of the collectors, the algorithms can be divided
into two basic groups; this classification corresponds to the classification of phase (a).
The methods in the first group free the inaccessible memory areas as early as
possible. They use reference counters to indicate the number of times the objects are
referenced, and a list to connect the separate free areas.
The methods in the second group are executed when the program runs out of
memory. The collector marks all the accessible objects and creates two separate
memory areas: one contains the live objects, while the other is the reusable free
memory area. Fig. 1 shows the classification of the algorithms.
However, there are further criteria for classifying garbage collection
algorithms. Run-time systems and interactive programs require collectors that do not
stop the execution of the program; such collectors are called incremental.
The recent algorithms divide the live objects into generations, and the
memory areas containing older generations are collected less frequently than those
containing younger ones.
Finally, the execution environment also influences the choice among the
techniques. The early algorithms were developed for sequential systems, but
shared-memory multiprocessors and distributed multicomputers require other - or at
least modified - methods.
2. Reference Counting
of the referenced objects are decremented. Typically, when a singly referenced large
structure is freed, all of the reference counts of the objects in that structure become
zero and all of the objects are reclaimed.
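A minimal reference-counting sketch in Python may make this concrete; the `Cell` class, the function names, and the list-based free list are illustrative assumptions of this example, not taken from any particular system:

```python
class Cell:
    """A heap cell with a reference count and pointer fields."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = []      # references to other cells

free_list = []                # reclaimed cells, linked for reuse (b1)

def add_reference(cell):
    cell.refcount += 1

def remove_reference(cell):
    cell.refcount -= 1
    if cell.refcount == 0:            # the cell became garbage
        for child in cell.fields:
            remove_reference(child)   # transitive reclamation
        cell.fields = []
        free_list.append(cell)        # link into the free list

# Freeing a singly referenced structure reclaims it transitively:
a, b = Cell("a"), Cell("b")
add_reference(a)                      # a root points to a
a.fields.append(b); add_reference(b)  # a points to b
remove_reference(a)                   # both end up on the free list
```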
Theoretically, the counter should be large enough to hold the number of
references, which can at most equal the number of memory cells; therefore, the
counter should be as large as a pointer.
The main advantage of this technique is that the garbage collection work is
distributed in time and does not stop the execution of the program for any significant
period, which is a basic constraint on real-time and interactive systems. The inactive
cells can be stored in an implicit stack implemented as a linked list; the free cells
themselves can store the pointers. The distribution is imperfect, however, because of
the transitive reclamation of whole data structures; keeping a list of already freed but
not-yet-processed objects can spread out that recursive reclamation.
The other great advantage of this method is its locality. While other algorithms
scan the whole memory, reference counting works with objects locally. In a system
using virtual memory or paging, or even on recent computers using cache memories,
the local use of memory is very important: a large number of so-called page faults -
when the required data is not in the fastest memory - may slow down the system
significantly. (This feature may also be a disadvantage, because the live objects are
not compacted but are scattered throughout the memory.)
However, the method has several disadvantages. First, the extra space needed
for the (theoretically) large counters occupies a significant amount of memory. Second,
the permanent updating of the counters imposes an overhead on the execution time;
the cost is proportional to the amount of work done by the running program. Third,
this method is unable to reclaim cyclic structures. Finally, it can be remarked that
reference counting concentrates on the objects that become garbage, while other
methods traverse the graph of live objects. Statistics show that the number of live
objects is always significantly smaller than the number of garbage objects, and the
implementors of high-performance general-purpose systems prefer that kind of
garbage collector.
The size of the counter can be smaller than that of a pointer. After the
maximum storable reference count is reached, the counter is no longer incremented
or decremented. In that case, objects referenced many times cannot be reclaimed
when they become garbage. However, statistics show that most objects are
referenced only once, a fact that can be exploited in reference counting.
A one-bit counter method is implemented in [Chi87] (see also [Ina90]); it can
reclaim only the garbage of singly referenced objects. Deutsch and Bobrow used hash
tables in LISP systems to handle the singly referenced objects efficiently, see
[Bob75], [Deu76], [Coh81]. The "lazy reference counting" method (described in
[Goto88]) avoids allocating additional space for singly referenced objects; only the
multiply referenced ones are handled by indirection cells.
The time overhead consists of two kinds of cost. One is the permanent
updating of a counter whenever a pointer to the object is created or destroyed.
Argument passing at procedure activations creates short-lived variables - generally
placed on the system stack - which usually disappear quickly, because most
procedures (those near the leaves of the call graph) return very shortly after they
are called. In these cases, reference counts are incremented and decremented within a
very short time interval, cancelling each other out. This overhead can be eliminated by
special treatment of local variables. Deferred reference counting (see [Deut76]) does
not take the references from local variables into account. Of course, when a counter
becomes zero, the object may not yet be reclaimable. A systematic scan of the local
stack brings the counters up to date, and the objects still having a zero counter after a
scan can be reclaimed. The interval between two scan phases should be chosen to be
short enough that garbage is reclaimed often and quickly, yet long enough that the
cost of the periodic scan stays low.
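The idea of deferred reference counting can be sketched as follows; the zero-count table and the function names are assumptions of this illustration, not the exact scheme of [Deut76]:

```python
class Obj:
    """An object whose refcount tracks heap references only;
    references from local (stack) variables are not counted."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0

zero_count_table = set()          # candidates: heap count dropped to zero

def heap_store(obj):              # a heap cell starts pointing at obj
    obj.refcount += 1
    zero_count_table.discard(obj)

def heap_delete(obj):             # a heap cell stops pointing at obj
    obj.refcount -= 1
    if obj.refcount == 0:
        zero_count_table.add(obj) # maybe garbage; decided at scan time

def stack_scan(stack_roots):
    """Periodic scan: an object with a zero heap count is reclaimable
    only if the local stack no longer references it either."""
    reclaimed = [o for o in zero_count_table if o not in stack_roots]
    for o in reclaimed:
        zero_count_table.discard(o)
    return reclaimed

x = Obj("x")
heap_store(x)
heap_delete(x)                    # heap count back to zero: x is a candidate
```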
The other cost cannot be eliminated. When an object is reclaimed, some
bookkeeping must be done to make it available to the system. Typically, this involves
linking the freed objects into one or more "free lists", from which the memory
allocation requests are satisfied.
Therefore, reference counting garbage collectors usually include another kind
of collection, which is executed when the memory is full and the first method does not
free enough space. However, in that case an interactive or real-time system may fall
back to the use of a non-real-time collector at a critical moment.
Figure 2. The problems with cyclic structures
In recent years, the cost of reference counting and its failure with circular
structures have made it unattractive to most implementors. The mark-reclaim methods
are usually more efficient and reliable; their incremental versions can be used in
real-time systems, and the locality advantage of reference counting can also be
partially achieved by their generational versions.
However, the implementor of a new system should examine the system from
the point of view of reference counting. The immediate reclamation and the strong
locality may outweigh the disadvantages. If almost all objects are live, reference
counting performs with little degradation, while "global scanning" techniques may be
inefficient. Reference counts themselves can be used in functional language
implementations to support optimizations by allowing destructive modification of
uniquely referenced objects. Weighted reference counting (explained in the section on
parallel and distributed systems) can be used in distributed systems. Special hardware
may also support the use of reference counting.
The function number_of_fields(p) gives the number of data fields of the structure
referenced by p; these fields should be examined recursively. The function field(p,i)
gives a reference to the i-th data field of the structure referenced by p.
The procedure can be implemented without recursive procedure calls by using a
stack. The stack stores the references that should be examined; at the start of
marking it contains the root set. When a structure is marked, those of its fields that
are references to non-marked structures are pushed onto the stack. The marking
procedure always uses the reference on the top of the stack and runs until the stack
is empty. It is a fast method, but theoretically the size of the stack may be
proportional to the size of the memory: if the memory contains N structures and all of
them are accessible from a root, the stack grows to nearly N entries.
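The stack-based marking loop described above can be sketched as follows, assuming a simple `Node` class whose `fields` list plays the role of field(p,i):

```python
class Node:
    """A structure; len(node.fields) and node.fields[i] correspond to
    number_of_fields(p) and field(p, i) in the text."""
    def __init__(self):
        self.marked = False
        self.fields = []

def mark(root_set):
    """Iterative marking: the stack holds the references still to be
    examined and initially contains the root set."""
    stack = list(root_set)
    while stack:
        node = stack.pop()
        if node.marked:
            continue
        node.marked = True
        # push the fields that refer to not-yet-marked structures
        stack.extend(f for f in node.fields if not f.marked)

# a chain a -> b -> c with a cycle back to a is marked safely
a, b, c = Node(), Node(), Node()
a.fields = [b]; b.fields = [c]; c.fields = [a]
mark([a])
```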
To avoid the problem of large stacks, Knuth [Knut73] used a stack of fixed
length h, storing the references circularly, with mod h as the stack index. When the
index exceeds h, the references are stored again from index 0, overwriting the previ-
ously stored information. Thus the stack retains only the most recently stored h
items and forgets the older ones. Of course, the stack may become empty before the
whole marking task is complete. When that happens, the memory is scanned from the
lowest address, looking for any marked object whose contents refer to unmarked
objects. If such an object is found, marking resumes as before and continues until the
stack becomes empty again; otherwise, the task is complete. Not the whole memory
needs to be scanned if the procedure records the minimum address of the lost references.
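A rough sketch of Knuth's bounded-stack idea; a size-limited Python list stands in for the mod-h circular indexing, and a heap scan recovers the forgotten references:

```python
class Node:
    def __init__(self):
        self.marked = False
        self.fields = []

def mark_bounded(heap, root_set, h=2):
    """Marking with a fixed-size stack: on overflow the oldest entry
    is forgotten (standing in for the mod-h overwrite), and a rescue
    scan of the heap later recovers the lost references."""
    stack = []
    def push(node):
        if len(stack) == h:
            stack.pop(0)              # forget the oldest stored item
        stack.append(node)
    for r in root_set:
        if not r.marked:
            r.marked = True
            push(r)
    while True:
        while stack:
            node = stack.pop()
            for f in node.fields:
                if not f.marked:
                    f.marked = True
                    push(f)
        # rescue scan: marked objects still referring to unmarked ones
        lost = [n for n in heap if n.marked
                and any(not f.marked for f in n.fields)]
        if not lost:
            return                    # marking is complete
        stack.extend(lost[:h])        # resume from the found objects

# a fan-out of 5 children overflows a 2-entry stack
root = Node()
kids = [Node() for _ in range(5)]
grand = [Node() for _ in range(5)]
root.fields = kids
for k, g in zip(kids, grand):
    k.fields = [g]
heap = [root] + kids + grand + [Node()]   # last node is garbage
mark_bounded(heap, [root], h=2)
```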
The task can also be done without a stack at all. Notice that a recursive
procedure call implicitly uses a stack, and the memory can become full during the
execution of the marking. Deutsch, and Schorr and Waite, independently developed
an elegant algorithm for LISP implementations ([Scho67],[Knut73]). It requires one
additional bit per LISP cell. The main idea of this algorithm is that the edges of the
directed graph (consisting of the structures and their references) can be reversed
during the marking, until leaves or already marked nodes are found. After that, the
next node to be examined can be reached by going back along the reversed links,
which are restored to their original directions at that time. The additional bit per
cell indicates the direction in which the restoration of the reversed links should
proceed (i.e., whether to follow the left or the right pointer).
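The link-reversal idea can be sketched for binary cells as follows; the field names and the Python formulation are assumptions of this illustration, not the original LISP code:

```python
class Cell:
    """A binary (LISP-style) cell; `flag` is the one extra bit telling
    which field currently holds a reversed link."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
        self.marked = False
        self.flag = False

def dsw_mark(root):
    """Deutsch-Schorr-Waite marking: descend by reversing left links,
    retreat by restoring them; no auxiliary stack is used."""
    p, q = root, None             # p: current cell, q: reversed-link chain
    while True:
        while p is not None and not p.marked:   # descend, reversing left
            p.marked = True
            p.left, q, p = q, p, p.left
        while q is not None and q.flag:         # retreat: right side done
            q.right, p, q = p, q, q.right
        if q is None:
            return
        q.flag = True                           # swing from left to right
        q.left, q.right, p = p, q.left, q.right

# on a small tree, all cells get marked and all links are restored
l, r = Cell(), Cell()
root = Cell(l, r)
dsw_mark(root)
```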
Reclaiming
Sweeping
The simplest method of reclaiming is to incorporate the unmarked cells into
one or more free lists. This may be accomplished by keeping a list for each object
size commonly used in a program, see [Knut73]. These are called homogeneous free
lists (H-lists). In addition, another free list, the M-list, contains cells of miscellaneous
size, ordered by their address. If possible, the reclaimed cells are linked into the
corresponding H-list; otherwise, into the M-list.
Memory allocation requests are satisfied from the H-list of the desired size,
if it is not empty. Otherwise, a suitable cell is taken from the M-list, if possible; if it is
larger than the desired size, it is split into two smaller pieces and the unrequired part
is linked into one of the free lists. If the request cannot be satisfied, first a
semicompaction is performed, which moves the cells of the H-lists to the M-list and
merges adjacent free cells. If the semicompaction is still not enough, a real
compaction is performed on the free cells.
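A simplified sketch of this allocation scheme; the chosen common sizes and the first-fit search over the M-list are assumptions of the illustration, and semicompaction is omitted:

```python
from collections import defaultdict

COMMON_SIZES = {2, 4}             # sizes that get their own H-list

h_lists = defaultdict(list)       # homogeneous free lists, one per size
m_list = []                       # miscellaneous (addr, size) pairs

def reclaim(addr, size):
    if size in COMMON_SIZES:
        h_lists[size].append(addr)
    else:
        m_list.append((addr, size))
        m_list.sort()             # the M-list is ordered by address

def allocate(size):
    if h_lists[size]:                        # exact fit from an H-list
        return h_lists[size].pop()
    for i, (addr, sz) in enumerate(m_list):  # else first fit from the M-list
        if sz >= size:
            del m_list[i]
            if sz > size:                    # split; keep the remainder free
                reclaim(addr + size, sz - size)
            return addr
    return None   # failure: (semi)compaction would follow here

reclaim(0, 4)      # a freed 4-cell object goes to the H-list for size 4
reclaim(10, 7)     # a 7-cell area goes to the M-list
```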
The sweeping algorithms have three main disadvantages. First, the handling of
variable-sized cells is cumbersome and fragments the memory. Second, the cost of the
collection is proportional to the size of the entire memory, because all live objects
must be marked and all garbage must be collected. The third problem is very
interesting: this method does not move the objects - which can be seen as an
advantage, since no pointer updating is necessary - and therefore the later allocated
objects are interspersed with the earlier ones. This interleaving may be unsuitable for
a system using virtual memory.
However, these problems are not always as bad as one might think. According
to the statistics, objects are often created in clusters and are typically active at the
same time. Clever use of the mark bits can speed up the sweeping algorithms:
keeping the mark bits in a bitmap makes it possible to check 32 bits with a single
32-bit integer operation. Since objects tend to survive in clusters, this can greatly
reduce the constant of proportionality. A bitmap can also reduce the cost of
allocation, by allowing fast allocation from contiguous unmarked areas rather than
from free lists.
Compacting
Sliding collectors are important in sequential implementations of Prolog,
because most of the data structures in use can be freed or reused at backtracking.
However, an efficient implementation requires that the temporal order of the objects
not be mixed.
Haddon and Waite ([Hadd67],[Wait73]) proposed a compactor of the sliding
type. It performs two scans of the entire memory (notice that the marking phase
performs one more scan). The objective of the first scan is to perform the compaction
and to build a "break table", which is used by the second scan to readjust the
references. The break table contains the initial address of each "hole" - a sequence of
unmarked cells - and the size of the hole. The break table can be constructed without
additional storage, because it can be built up in the holes; it can be proved that the
space available in the holes is sufficient to store the table. However, the table has to
be handled dynamically, rolling through the holes already filled with new data. At the
end of the first scan, the live objects are collected at one end of the memory, and the
break table occupies the liberated part. The table is then sorted to speed up the
pointer readjustment done by the second scan, which consists of examining each
pointer, consulting the table (using a binary search in the ordered table) to compute
the new position of the cell, and changing the pointer accordingly. This algorithm is
considerably slow because of the use of the holes and the binary search for each
reference.
encountered during this phase. At the end of the first scan, all the backward pointers
are updated according to the new addresses of the cells they point to.
The second scan starts at the first memory cell and proceeds towards the end
of the memory. The forward pointers (those pointing to a higher address) are updated
in this scan, and the marked cells are now moved to their new positions.
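As an illustration of sliding compaction in general (not of the break-table or two-scan algorithms themselves), here is a sketch that uses an explicit forwarding table; live cells keep their original order:

```python
def sliding_compact(heap, marks):
    """Sliding compaction with an explicit forwarding table (a simpler
    stand-in for the break table). Each heap slot is ("ptr", address)
    or ("data", value); `marks` flags the live slots."""
    forward, next_free = {}, 0
    for addr, live in enumerate(marks):     # pass 1: new addresses
        if live:
            forward[addr] = next_free
            next_free += 1
    compacted = []
    for addr, cell in enumerate(heap):      # pass 2: move and readjust
        if marks[addr]:
            kind, value = cell
            if kind == "ptr":
                value = forward[value]      # readjust the reference
            compacted.append((kind, value))
    return compacted

# cells 1 and 3 are garbage; the live cells slide left, order preserved
heap  = [("data", 7), ("data", 0), ("ptr", 4), ("data", 0), ("ptr", 0)]
marks = [True, False, True, False, True]
result = sliding_compact(heap, marks)
```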
3.2. Copying algorithms
Copying collectors move all of the live objects into a separate storage area;
the rest of the memory is then reusable, because it contains only garbage. The
garbage itself is never actually touched, so the "garbage collection" is implicit. While
the compacting collectors use a separate marking phase, the copying collectors
integrate the traversal of the data with the copying process. Therefore, most objects
are traversed only once, and the work is proportional to the amount of live data.
An obvious algorithm for compacting is to write all accessible (live) data out
to a secondary storage area and then read it back into main memory. However, this
solution has several disadvantages.
Minsky proposed in [Mins63] an algorithm for LISP which used one mark
bit per cell. Each accessible cell is traced and marked if it is unmarked. The program
computes a new address for the cell and outputs a triplet to the secondary storage,
consisting of the left and right fields of the cell and its computed new address. The
reference is updated according to the new address. The new address is also placed in
the marked cell, so multiple references can be handled easily. After marking, the
triplets are read back and the contents of the fields are stored at the specified new
addresses. Minsky's algorithm is a linearizing collector, because linked-list elements
are positioned next to each other.
The secondary storage area can also be placed in main memory, and the roles
of the two storage areas can be exchanged periodically. This common kind of copying
garbage collector is the semispace collector - proposed first by Fenichel and
Yochelson in [Feni69] - using Cheney's algorithm for the copying traversal [Chen70].
The data memory is subdivided into two contiguous semispaces. During normal
program execution, only one of these semispaces is in use. Notice that the cost of
memory allocation is low, unlike in memories that keep the free areas in lists (as used
with reference counting or mark-sweep collectors). When the available half of the
memory (the current semispace) is exhausted, the program is stopped and the
collector is called. All of the live data are copied from the current semispace
(fromspace) to the other semispace (tospace). Once the copying is completed, the
tospace becomes the current semispace and program execution is resumed. Thus,
the roles of the two spaces are reversed each time the garbage collector is invoked.
The work of the copying collector is shown in Fig. 5.
Cheney used a breadth-first traversal method for copying the objects. The
tospace is considered as a queue: a "free pointer" points to the end of the queue and a
"scan pointer" points to the object to be examined. Initially, an object immediately
reachable from the root set is copied into the tospace.
Figure 5. Copying of a graph from the current space to the new space
The scan pointer is advanced through the object, location by location. Each time a
pointer into fromspace is encountered, the referred-to object is copied to the end of
the queue (the free pointer is advanced), and the pointer is updated according to the
new location. Then the scan continues. This algorithm effects a breadth-first
traversal, copying all of the descendants of a structure next to it, resulting in a kind of
linearizing collector (though not in the sense of depth-first linearizing).
If the scan pointer reaches the free pointer, another object referenced from the
root set is copied into the queue and the process continues. Once all immediately
reachable objects have been processed, all of the live data have been copied into
tospace. To avoid copying a multiply referenced object more than once, a forwarding
pointer is installed in the old version of the object, as in Minsky's solution.
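Cheney's traversal can be sketched as follows; the `Obj` class and the use of a Python list as the tospace queue are assumptions of this illustration:

```python
class Obj:
    def __init__(self, value):
        self.value = value
        self.fields = []
        self.forward = None       # forwarding pointer, set when copied

def cheney_collect(roots):
    """Cheney's traversal: tospace acts as a queue; the scan pointer
    chases the free pointer (here, the end of the list)."""
    tospace = []
    def copy(obj):
        if obj.forward is None:              # not copied yet
            clone = Obj(obj.value)
            clone.fields = list(obj.fields)  # still fromspace references
            obj.forward = clone              # install forwarding pointer
            tospace.append(clone)            # advance the free pointer
        return obj.forward
    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(tospace):               # until scan meets free
        obj = tospace[scan]
        obj.fields = [copy(f) for f in obj.fields]  # update references
        scan += 1
    return new_roots, tospace

# a shared, cyclic structure is copied exactly once per object
a, b = Obj("a"), Obj("b")
a.fields = [b, b]
b.fields = [a]
(new_a,), tospace = cheney_collect([a])
```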
Copying collectors are very efficient if a sufficiently large memory is available.
The cost of one collection is proportional to the amount of live objects and is
independent of the size of the memory. It can be assumed that approximately the
same amount of data is live throughout the program execution; in this case,
decreasing the frequency of garbage collection decreases the total cost of the
collections. The more memory is available, the less collection is performed.
The data are transferred quickly and only once within the memory - unlike in
Minsky's algorithm. However, the disadvantage of pointer updating remains.
3.3. Non-copying implicit collector
To avoid the updating of references, Baker proposed in [Bake92] a new
collector which does not move the objects. The separate spaces are considered as
sets. Any implementation of these sets is suitable if the following operations are
efficient enough: (1) determining which set an object belongs to; (2) swapping the
roles of the sets. One implementation is the copying method using two separate
contiguous spaces.
Baker implements these sets as doubly linked lists. Two pointer fields and a
"color" field are added to each object: the pointers link the object into one of the sets,
and the "color" field indicates which set it belongs to. The free space is divided into
chunks which are linked into a list, and memory allocation is performed by advancing
a pointer forward in this list; this allocation pointer divides the list into the used part
and the remaining free part. The disadvantage is the memory fragmentation caused
by supporting variable-sized objects, as in other list-processing collectors.
When the free list is exhausted, the collector traverses the live objects, but the
transfer consists only of unlinking the object from the "fromset", recoloring it, and
linking it into the "toset". The space reclamation is implicit: when all of the reachable
objects have been moved to the toset, the fromset contains only garbage and can be
used as a free list. The cost of the collection is proportional to the amount of live
data; however, the per-object constants are higher than those of the copying
algorithms. Notice that the reclaimed area is linked to the free list with a single
operation, unlike in other free-list-handling collectors, which link the free object
spaces one by one; this makes the reclamation much more efficient.
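Viewed abstractly, the collection is just a sequence of set operations; in this sketch Python lists stand in for the doubly linked lists, so the unlink step is not O(1) as it would be in the real structure:

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = []

def set_based_collect(fromset, roots):
    """Baker's non-copying collection as set operations: 'moving' a
    live object is just unlinking it from the fromset and linking it
    into the toset."""
    toset, gray = [], list(roots)
    while gray:
        obj = gray.pop()
        if obj in toset:
            continue              # already recolored
        fromset.remove(obj)       # unlink from the white set
        toset.append(obj)         # recolor: the object is now live
        gray.extend(obj.fields)   # its referents must be visited too
    # one step reclaims everything left: the fromset is the free list
    free_list = fromset
    return toset, free_list

live, dead = Obj("live"), Obj("dead")
root = Obj("root")
root.fields = [live]
heap = [root, live, dead]
toset, free_list = set_based_collect(heap, [root])
```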
The main advantage is that the objects are not moved in memory, so the
references do not have to be updated. This is important in languages that support
language-level pointers. In a parallel environment, avoiding the updating of remote
references is particularly important; this is also true in real-time systems, where the
collection is incremental and the memory must be shared with the running program.
Baker's collector is an incremental one; only the non-incremental version is
described here. The more complex implementation of the incremental version is
explained in Section 4.3.
4. Run-time systems
In a real-time application, which has to respond to events within a constrained
time interval, the garbage collector must not stop the execution for a significant time -
for minutes, seconds, or even milliseconds, depending on the task of the application.
The collector should distribute its work over time and let the program keep running.
This implies that the collection cannot be performed as an atomic action; it should be
interleaved with program execution. Collectors of this kind are called incremental.
As we have seen, the reference counting scheme can easily be made
incremental. However, its effectiveness is insufficient in many cases, and its space and
time overhead is higher than that of other solutions. Therefore, the copying and
marking collectors (together called tracing collectors) were also made incremental.
The differences between the reclamation methods are not particularly important
from this point of view; the incremental tracing, however, is very interesting. The
difficulty is that the program, running in parallel with the collector, may change the
graph of the accessible objects while the collector is working on another part of the
graph. Therefore, the program is referred to as the mutator, which can be seen as a
concurrent process modifying the data structures.
4.1. The tricolor marking scheme
The problem of concurrent marking can be described with the abstract
tricolor marking scheme. The tracing collectors (during marking or copying) can be
described as a process of traversing the graph of live objects and coloring them.
Initially, all of the objects are white; by the end of tracing, the live objects should
be colored black.
In a classical marking process, the coloring is implemented directly by the mark
bit: the bit of each accessed object is set, i.e., the object is black. In a copying
collector the coloring is implicit in the moving of the object; the color is defined by
the object's position, those in tospace being black and the others white.
The mutator can change the structure of the graph. If it writes a new object
reference into an already traversed data structure, the collector will not access the new
object and its memory area will be reclaimed. By introducing an intermediate color,
an invariant can be stated that describes these problems precisely.
A third color, gray, indicates that an object has already been reached but its
descendants may not have been; that is, a reached white object is colored gray. When
it is scanned and all its pointers are traversed, it is colored black and its descendants
(offspring) are colored gray. Fig. 6 shows an example, where A and its descendants
are copied. The classical marking process - using some form of graph traversal -
stores the objects to be examined in a stack or queue; the gray objects correspond to
the objects stored in that control structure. The black ones are the processed objects,
and those that have not been reached are white. In a copying collector, the gray
objects are those in the unscanned area of the tospace - between the scan and free
pointers - while the objects that have been passed by the scan pointer are black.
Figure 6. The tricolor marking scheme
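The tricolor scheme can be sketched directly, with the gray list playing the role of the control stack (or of the unscanned tospace area in a copying collector):

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    def __init__(self):
        self.color = WHITE
        self.fields = []

def tricolor_mark(heap, roots):
    """Abstract tricolor tracing: gray objects are reached but not yet
    fully scanned."""
    gray = []
    for r in roots:
        r.color = GRAY
        gray.append(r)
    while gray:
        obj = gray.pop()
        for f in obj.fields:          # traverse all pointers of obj
            if f.color == WHITE:      # a reached white object turns gray
                f.color = GRAY
                gray.append(f)
        obj.color = BLACK             # fully scanned
    return [o for o in heap if o.color == BLACK]

a, b, garbage = Obj(), Obj(), Obj()
a.fields = [b]
live = tricolor_mark([a, b, garbage], [a])
```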
There are two basic approaches to this coordination. The use of a read barrier
prevents the mutator from accessing white objects: when the mutator attempts to
access a pointer to a white object, the object is immediately colored gray. As a
consequence, the mutator is never able to install a pointer to a white object into a
black object. Baker's collectors use a read barrier.
The other approach is more direct and involves a write barrier. When the
mutator attempts to write a pointer into an object, the write is trapped and recorded.
The write barrier technique is cheaper than the read barrier, because memory writes
are much less common than memory reads.
However, the invariant need not be maintained when a pointer to a white object
is merely copied into a black object, since in that case the white object can still be
reached through the original reference. The snapshot-at-beginning collector records
only the overwritten pointers referring to white objects, ensuring that those objects
will be found during the collection.
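A snapshot-at-beginning write barrier can be sketched as follows; the `barrier_store` name and the remembered list are assumptions of this illustration:

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    def __init__(self, color=WHITE):
        self.color = color
        self.fields = []

remembered = []   # overwritten white-object pointers, traced later

def barrier_store(obj, index, new_ref):
    """Snapshot-at-beginning write barrier: before a pointer is
    overwritten, a still-white old target is recorded so the collector
    will find it even though the reference is gone."""
    old = obj.fields[index]
    if old is not None and old.color == WHITE:
        old.color = GRAY          # treated as reached by the collector
        remembered.append(old)
    obj.fields[index] = new_ref

holder = Obj(color=BLACK)         # already scanned by the collector
victim = Obj()                    # white: not yet reached
holder.fields = [victim]
barrier_store(holder, 0, None)    # overwrite the only pointer to victim
```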
The incremental update collectors record even fewer pointers.
4.2. Baker's incremental copying collector
Baker adapted Cheney's simple copying scheme to real-time systems, see
[Bake78]. It uses a read barrier for coordination with the mutator. Like the original
algorithm, it copies the accessible objects in a breadth-first traversal, advancing the
scan pointer through the unscanned area of tospace, moving the referred-to objects
from fromspace to tospace, and updating the reference pointers.
New objects allocated by the mutator during the collection are allocated in
tospace and are treated as already examined - i.e., they are assumed to be live or, in
other words, they are initially black. Those of them that are freed will not be
reclaimed until the next garbage collection cycle.
The incremental behaviour requires that all of the live data be found and
copied into tospace before the available free space is exhausted by the program. To
ensure this, the rate of copying is tied to the rate of allocation: whenever an object is
allocated, an increment of scanning and copying is done.
To maintain the invariant, the read barrier ensures that the objects of the
fromspace (white objects) accessed by the program (mutator) are copied into the
tospace (they become gray).
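The read barrier can be sketched as follows; the `space` field and the function names are assumptions of this illustration:

```python
class Obj:
    def __init__(self, value, space):
        self.value = value
        self.space = space        # "from" or "to"
        self.fields = []
        self.forward = None

tospace = []

def copy_to_tospace(obj):
    clone = Obj(obj.value, "to")
    clone.fields = list(obj.fields)
    obj.forward = clone
    tospace.append(clone)         # gray until the scan pointer passes it
    return clone

def read_barrier(ref):
    """Baker's read barrier: the mutator never obtains a fromspace
    (white) reference; touching one copies the object first."""
    if ref.forward is not None:
        return ref.forward        # already evacuated
    if ref.space == "from":
        return copy_to_tospace(ref)
    return ref

old = Obj("x", "from")
seen = read_barrier(old)          # mutator access triggers the copy
```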
end connected to the from-list. The scan pointer separates the scanned and unscanned
areas of the to-list.
The algorithm is isomorphic to the copying algorithm. The objects in the
new-list are black, the objects of the from-list are white, and the to-list contains black
and gray objects, separated by the scan pointer. When all of the reachable objects of
the from-list have been moved to the to-list and there are no unscanned (gray) objects
in the to-list, the remaining objects of the from-list are known to be garbage and the
collection is complete. The free-list merged with the from-list becomes the new
free-list, while the new-list and the to-list (containing the preserved data) form the
from-list of the next cycle; the new-list and to-list are then empty.
The situation is similar to the beginning of the previous cycle, except that the
segments have moved around the cycle - hence the name "treadmill".
Any pointer which is overwritten, when its original value has not been copied
into a reachable object, is not taken into account - unlike in snapshot-at-beginning -
and the originally pointed-to object becomes garbage if there are no other references
to it.
Another difference between this scheme and the others is that objects newly
allocated during the collection are assumed not to be reachable (their initial color is
white, not black). If a pointer to one of them is installed into a reachable object, it will
surely be traversed and retained. This is a very important feature, because most
objects are short-lived, so if the collector does not reach such an object early in its
traversal, it will probably never be reached. The additional time overhead of this
solution - compared with the others - is that the newly allocated live objects have to
be traversed. However, more space can be reclaimed at the end of the garbage
collection.
4.6. A case study: An incremental Garbage Collector for
WAM-based Prolog
Careful analysis of a system may make it possible to develop a very specialized
but efficient garbage collector. The WAM (Warren Abstract Machine) is the most
efficient sequential interpreter of Prolog programs. However, it imposes a constraint on
garbage collection: the temporal order of the objects on the heap must be kept in
order to be able to undo the changes on backtracking.
Nowadays the actual requirements on a garbage collector include incrementality
and usability in virtual memory systems. W. J. Older and
J. A. Rummel proposed in [Old92] an incremental garbage collector whose short collection
phases are built into the code of the compiled Prolog program (WAM code). These short
phases use a copying algorithm, despite the fact that a copying algorithm is not a sliding
compactor.
A choicepoint stack is used in the WAM to permit backtracking. A
choicepoint holds the information required to return the computation to the state just
before a non-deterministic call. For example, it stores pointers which mark the ends of
the heap and the environment stack before the call. Deleting the unnecessary data
(invalidated by the failure of the computation) from the heap on backtracking is then a
simple pointer update. This requires the temporal order of the objects on the heap, but only
across choicepoint boundaries. Therefore, a copying algorithm can be used if the
collector's work is distributed so that only one area is collected at a time.
Another observation is that garbage which cannot be reclaimed by
backtracking lies in the tip region of the heap (past the critical point remembered in the
choicepoint stack). If a collection is made before this region is "buried" by new data
and new choicepoints, the collection can discard all of the garbage in one choicepoint
area (with a non-sliding compactor, however, there are some problems with unbound
variables).
Collecting a small region involves the difficulty of finding all external
references into it. Because of the nature of Prolog, the pointers are under complete
system control, and by choosing appropriate points of the execution at which collection
may be performed, the external references can be limited to a list of
known locations. In particular, the WAM implementation can provide all external
references without additional cost.
The implementors execute the collector when the program execution returns
from the call that created the last choicepoint. Therefore, the invocation of the
collector is built into one WAM instruction (the so-called proceed). However, careful
analysis showed that the cut operation can also 'cut' the execution of the collector, so
it had to be modified, too.
The memory is subdivided into areas that hold objects of different
(approximate) ages, or generations. Objects are allocated in the area of the youngest
objects until it is full. Then this generation is collected, reclaiming the inaccessible
objects. Fig. 9 shows the memory organization of a system using a generational copying
collector with two generations. The state of the system after the collection of the
younger generation can be seen in Fig. 10.
If an object survives long enough to reach an older age, it is moved
into an older generation. It will not be collected again in the following collections of the
younger generation. An older generation will eventually fill up and be collected, too. However, it
fills up much more slowly because relatively few objects live long.
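A minimal sketch of this promotion policy, under the simplifying assumptions that reachability is given directly by a root set, that intergenerational pointers are ignored, and that the `TwoGenHeap` name and `PROMOTE_AGE` threshold are purely illustrative:

```python
PROMOTE_AGE = 2      # minor collections an object must survive (assumed value)

class TwoGenHeap:
    def __init__(self):
        self.nursery = {}    # obj -> age (number of minor GCs survived)
        self.old = set()

    def allocate(self, obj):
        self.nursery[obj] = 0

    def minor_gc(self, roots):
        """Collect only the nursery; survivors age and may be promoted."""
        survivors = {}
        for obj, age in self.nursery.items():
            if obj in roots:
                if age + 1 >= PROMOTE_AGE:
                    self.old.add(obj)        # advanced to the older generation
                else:
                    survivors[obj] = age + 1
        self.nursery = survivors             # everything else is reclaimed
```

Each minor collection touches only the small nursery, which is why the scheme pays off when most objects die young.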
The number of generations may be greater than two. The statistics of logic
programming languages (even the parallel ones) classify the objects clearly into two
classes; therefore, a two-generation collector is suggested. However, there is an
eight-generation collector used in a Smalltalk-80 implementation, see [CWB86],
indicating that other languages have other features.
not be preserved, or, with the use of a copying collector, the pointers may not be updated.
The first solution is to introduce indirection tables which can be used as part of the root
set. However, the cost of the indirection is high and it is not recommended. A better
solution is to record these pointers. If the system observes a pointer in the older
generation referring to a younger object, it is pushed onto a stack. The collector of the
younger generation can use the pointers stored in the stack as roots and update
them if necessary. The observation mechanism is similar to the write barrier used in the
incremental collectors.
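The recording scheme might be sketched as follows (the `Cell` and `RememberedSet` names and the set-based generation membership test are illustrative assumptions, not a description of a real system):

```python
class Cell:
    """An object with named pointer fields."""
    def __init__(self):
        self.fields = {}

class RememberedSet:
    def __init__(self):
        self.stack = []

    def write_barrier(self, source, field, target, old_gen, young_gen):
        """Perform the store, recording it if it creates an old -> young pointer."""
        source.fields[field] = target
        if source in old_gen and target in young_gen:
            self.stack.append((source, field))

    def extra_roots(self):
        """Locations the young-generation collector must treat as roots."""
        return list(self.stack)
```

Only stores crossing the generation boundary pay the recording cost; young-to-young and old-to-old stores proceed unrecorded.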
Figure 10. The memory after the collection of the youngest generation
is used. (1) If an older generation is collected, all of the younger ones are scavenged, too. In
this case, none of the intergenerational pointers need be taken into account as root
pointers. (2) The pointers from the new generation to the older one can be found by
scanning the new generation. In this case, all data in the new generation are assumed
to be live and the pointers are used as roots. The cost of scanning the new generation
for pointers is less than the cost of a garbage collection on it.
(1) How long an object must survive in one generation before it is advanced to
the next.
(2) The organization of the memory, dividing it among generations and within
one generation. How it affects the efficiency in virtual memory systems and
cache memory systems.
(3) The scheduling of the collection to reduce program pauses as much as
possible.
Another disadvantage of this scheme is that every remote pointer creation and
deletion causes a message to pass through the network.
The weighted reference counting described in [Bev87] and [Wat87] avoids the
necessity of synchronisation and restricts the communication overhead to one
message per deleted reference. In this solution, each reference has a positive
weight represented by an integer and each object has a reference count. The invariant
of the system is:
The reference count of an object is equal to the sum of the weights of the
references to it.
This invariant ensures that the refcount of an object becomes zero if, and only
if, it is not referenced by others. The operations maintain the invariant in the
following way:
When a new object is created with a reference from an already existing object,
the new object is given an arbitrary positive reference count and the reference to it is
given a weight equal to that count.
When a reference is duplicated, its weight is split between the two resulting
references in such a way that the sum of their weights equals the original weight.
Notice that the reference count of the object is not modified and no communication is
needed.
When a reference to an object is deleted, the reference count of the object is
reduced by the weight of the reference. To achieve this, a message is sent to the
object.
These operations maintain the invariant. The problem of non-synchronised
networks described above is avoided because the reference counter of an object cannot
increase; it can only be decremented.
Since the weights are always halved, implementations restrict the weights to
powers of two, so only the binary logarithm needs to be stored. That is, if a weight
field is n bits wide, it can hold exponents from 0 to 2^n - 1, representing weights
from 1 to 2^(2^n - 1). For example, with 3-bit reference counts storing numbers
from 1 to 8, a 2-bit weight field is enough to encode the possible weights 1, 2, 4
and 8.
The problem of this scheme is that references of weight 1 cannot be
duplicated. To cope with them, an indirection cell is introduced. It is a small
object consisting of a reference of weight one (the weight need not be stored,
since it is always 1) and its own reference count. When a reference of weight one
is to be duplicated, an indirection cell is created which refers to the referenced object,
and its refcount is set to the maximum. The reference to be duplicated is replaced by a
reference to the indirection cell with the maximum weight, and this is duplicated as
normal. Notice that the reference count of the referenced object does not need to be
changed. To avoid the creation of long reference chains, at the first evaluation the result
can be placed into the indirection cell.
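The three operations and the invariant above can be sketched as follows. The class names, the `MAX_WEIGHT` value, and the string returned on reclamation are illustrative assumptions, and the decrement "message" is modelled as a direct call; indirection cells are omitted, so duplicating a weight-1 reference is simply rejected here:

```python
MAX_WEIGHT = 64          # weight given to a fresh reference (assumed value)

class WObject:
    def __init__(self):
        self.refcount = 0

class Ref:
    def __init__(self, target, weight):
        self.target, self.weight = target, weight

def new_reference(obj):
    """Creating a reference: refcount and weight start equal."""
    obj.refcount += MAX_WEIGHT
    return Ref(obj, MAX_WEIGHT)

def duplicate(ref):
    """Split the weight between the two references; no message is needed."""
    assert ref.weight > 1, "a weight-1 reference needs an indirection cell"
    half = ref.weight // 2
    ref.weight -= half
    return Ref(ref.target, half)

def delete(ref):
    """Deleting a reference sends one decrement message carrying its weight."""
    ref.target.refcount -= ref.weight
    if ref.target.refcount == 0:
        return "reclaim"     # the invariant says no references remain
    return None
```

Note that only `delete` touches the object's counter, which is why the counter can never rise and the synchronisation problem disappears.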
6.2. PIE64
A good example of a hybrid distributed garbage collector is the one developed
for the Parallel Inference Engine (PIE64) computer and the FLENG logic language
[Xu89]. This Japanese system uses both the reference counting and the mark-scan
schemes. The collector is divided into three parts: first, the reference counting scheme
is used to collect the singly-referenced objects (on one inference unit, IU); second, a
local mark-scan collection is performed to completely reclaim the local,
singly-referenced objects; third, a global mark-scan collection completes the task of
the garbage collection.
Since all allocations in logic programming languages are handled by the
system, it can have some information about the objects, and their consumption rates,
lifetimes and references can be estimated from this information. The memory is
divided into pages, and singly-referenced objects with the same estimated consumption
rates are allocated in the same page (or pages, if they occupy much memory).
Therefore, page reference counting can be used. That is, each page has a reference
counter which equals the number of references to the objects placed in the page.
The whole page can be reclaimed when its counter becomes zero. In the ideal case, all of
the objects of a page become garbage at almost the same time and the unreachable
areas of the memory soon become available to the system.
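Page reference counting itself reduces to one counter per page, as in this sketch (the `PageHeap` name and dictionary representation are illustrative assumptions):

```python
class PageHeap:
    def __init__(self):
        self.pages = {}                  # page id -> count of references into it

    def allocate_page(self, page):
        self.pages[page] = 0

    def add_ref(self, page):
        """A new reference into the page: bump its counter."""
        self.pages[page] += 1

    def drop_ref(self, page):
        """Remove a reference; reclaim the whole page when none remain."""
        self.pages[page] -= 1
        if self.pages[page] == 0:
            del self.pages[page]         # whole page reclaimed at once
            return True
        return False
```

The appeal is that reclamation needs no per-object bookkeeping: when the last reference into a page goes away, the page is freed wholesale.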
Of course, the estimation of the lifetimes of the objects is not exact; therefore,
there may be pages in the system in which only a small part of the objects are in use.
These pages cannot be reclaimed completely. The second stage of the garbage
collection consists of a local marking and compacting procedure. As in other parallel
implementations of logic languages, the subgoals of the system are executed in parallel.
In this implementation, the objects allocated during the execution of a goal are
allocated separately from the others; they are stored in goalframes. The goals of the
system are either active or suspended and are stored in two corresponding queues.
Goalframes which are not accessible from the active goal queue or the suspended goal
queue can be reclaimed. In the mark phase, only the goalframes are marked, not the
objects within them; therefore, the marking is very fast. The goalframes are then
compacted using sliding compaction.
Finally, a global garbage collection is used to reclaim all of the garbage. The
system is halted and the collection is performed on all IUs. In normal computation an
object points indirectly to a remote object, but during the collection an indirection table
is built on each processor. The hardware of the PIE64 supports the creation of this
table. The root set now consists of the active and suspended goal queues and the
Remote-Mark requests. When a remote pointer is found during the marking, a
Remote-Mark request is sent to the corresponding IU, where the pointer is placed into
the indirection table and, with the help of a backward message, the original cell is
rewritten to refer to the indirection table. The message sending is performed by the
Network Inference Processors (NIPs) independently from the main processor.
The compaction phase uses Morris' algorithm. The first phase is the same as
in Morris' original algorithm; when a remote reference is found, nothing is done.
In the second phase, when the memory is scanned from the lowest address, the remote
pointers are also restored, which is again supported by the NIPs.
6.3. Garbage Collection on shared-memory multiprocessors
While the processing elements (PEs) of a distributed system use their own local
resources and access remote resources via communication, the PEs of a
shared-memory system use a common resource. The synchronization of data accesses
is very important. The data memory is a common area and the garbage collection is
performed on this space. Since the processors work on a common area, the garbage
collection is a common task, too. Therefore, parallel algorithms should be developed
for these systems. Naturally, the existing sequential algorithms are parallelized.
Generally, the memory is divided into regions owned exclusively by the
processors, avoiding the frequent use of synchronized data access. Each PE uses its
own "local" memory area - as in distributed systems - however, common
data is accessed in a synchronized way. Since the shared data structures become
temporarily inconsistent during the collection process of a PE, the other processors have
to stop normal execution; therefore, they might as well be involved in the garbage
collection.
The marking-compacting and the copying algorithms can be modified to
become parallel processes. Parallel copying methods can be found in [Ima93] and
[Ali96]. The basic difference between the two methods is that the former
implementation is based on a breadth-first traversal of the objects, while the latter
traverses them in a depth-first manner, which is a better method if the storage consists
of non-contiguous memory blocks.
Morris' sliding collector has also been adapted to shared systems, see [Wee90]. Since
that work concentrates only on the reclaiming phase of the mark-reclaim collection,
the use of an incremental marking process is suggested.
The generational collectors can also be implemented in a parallel environment,
see [Oza90].
The main problem of the parallel collectors is load balancing. An
appropriate distribution of the work is very important to achieve the fastest collection.
Conclusion
As shown in this study, there is no unambiguously best method among
the proposed garbage collection algorithms. Each of them has advantages and
disadvantages, and an exhaustive analysis of the system should be done before a
collection method is selected. The main criteria of the analysis should be:
♦ Speed
♦ Effectiveness
♦ How much extra space can be used for garbage collection
♦ Virtual memory environment and use of cache memories
♦ Real-time or interactive systems
♦ How much work can be spared with compile time analysis
♦ Shared memory multiprocessors
♦ Distributed multicomputers, speed of interconnection network
♦ Special hardware features
This report was written to give an overview of the existing methods
for garbage collection. In the future, we are going to develop a garbage collector for
the Distributed Data Driven Prolog Abstract Machine (3DPAM). This system runs on
distributed multicomputers; however, the processing elements use only their own local
memory.
References
[Coh81] J. Cohen: Garbage Collection of Linked Data Structures
Computing Surveys, Vol. 13, No. 3, September 1981.
[Wat87] P. Watson, I. Watson: An Efficient Garbage Collection Scheme for Parallel
Computer Architectures.
Conference on PARLE'87.
[Bev87] D. I. Bevan: Distributed Garbage Collection Using Reference Counting.
Conference on PARLE'87.