Garbage Collection Techniques
Norbert Podhorszki
Oak Ridge National Laboratory
1. Introduction
In the execution of logic programming languages, a logic variable can have
only one value during its existence: once instantiated, it cannot be changed.
Therefore, new values cannot be stored in the same memory cell, a property that is
natural in imperative languages. For similar reasons, a data structure is always copied
when a small modification is required. As a result, memory is consumed very quickly,
and much smaller problems can be solved with declarative languages than with
procedural ones. Moreover, there are no deallocation features in declarative
languages, so memory management is required to reuse the lost memory areas.
Research has shown that most of the structures and variables created during
execution end up unreferenced by the program, so their allocated memory cells are
unusable. Obviously, these lost areas should be freed for reuse. Since the birth of the
declarative languages, much effort has been devoted to solving this problem. The lost
memory areas are called garbage; hence the common name of the problem,
garbage collection.
Since the late 1950s many garbage collection algorithms have been proposed. At
first they were used in implementations of LISP and other functional languages; in
the 1980s Prolog also came to require memory management. In 1981 J. Cohen
summarized and classified the work done in the area, and these algorithms were later
modified and developed further; see [Coh81] and [Wil92].
The early methods worked only with single-sized memory cells. The
introduction of structures and records into programming languages made it important
for the algorithms to perform well on variable-sized memory pieces. Nowadays every
language provides structures; therefore, we consider only methods of this kind.
The functioning of a garbage collector consists of two parts:
(a) Garbage detection, i.e., distinguishing the live objects from the garbage in
some way.
(b) Reclaiming the storage occupied by garbage, i.e., making it available for
reuse.
* The work presented in this paper was done in the framework of the Hungarian-Portuguese joint project
'PAR-PROLOPPE', registered under No. H-9305-02/1095.
(a1) By keeping counters indicating the number of times objects have been
referenced. In this case, identification consists of recognizing those cells
whose reference count is zero.
(a2) By keeping a root set, a list of immediately accessible cells, global and local
variables of the running program, registers, etc., and following their
references to trace and mark every accessible (live) object. This method of
identification is usually called marking.
(b1) Incorporation into a free list in which available memory cells or areas are
linked by pointers.
(b2) Compaction of all live objects at one end of memory, while the other end
contains a contiguous memory area available for reuse. The type of
compaction can be classified by the relative positions in which objects are left
after compaction:
(b2.1) Arbitrary. The order of the compacted objects is arbitrary.
(b2.2) Linearizing. Objects originally pointing to each other (usually) become
adjacent after compaction.
(b2.3) Sliding. The original linear order of the objects is not changed by the
compaction.
Considering the execution time of the collectors, the algorithms can be divided
into two basic groups; this classification corresponds to the classification of phase (a).
The methods in the first group free the inaccessible memory areas as early as
possible. They use reference counters to indicate the number of times the objects are
referenced, and a list to connect the separate free areas.
The methods in the second group are executed when the program runs out of
memory. The collector marks all the accessible objects and creates two separate
memory areas: one contains the live objects, while the other is the reusable free
memory area. Fig. 1 shows the classification of the algorithms.
However, there are further criteria for classifying garbage collection
algorithms. Run-time systems and interactive programs require collectors that do not
stop the execution of the program; such collectors are called incremental.
The recent algorithms divide the live objects into generations, and the
memory areas containing older generations are collected less frequently than those
containing younger ones.
Finally, the execution environment also influences the choice among the
techniques. The early algorithms were developed for sequential systems, but
shared-memory multiprocessors and distributed multicomputers require other - or at
least modified - methods.
2. Reference Counting
of the referenced objects are decremented. Typically, when a singly referenced large
structure is freed, all of the reference counts of the objects in that structure become
zero and all of the objects are reclaimed.
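A minimal reference-counting sketch in Python may make this concrete; the `Cell` class, the function names, and the list-based free list are illustrative assumptions of this example, not taken from any particular system:

```python
class Cell:
    """A heap cell with a reference count and pointer fields."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = []      # references to other cells

free_list = []                # reclaimed cells, linked for reuse (b1)

def add_reference(cell):
    cell.refcount += 1

def remove_reference(cell):
    cell.refcount -= 1
    if cell.refcount == 0:            # the cell became garbage
        for child in cell.fields:
            remove_reference(child)   # transitive reclamation
        cell.fields = []
        free_list.append(cell)        # link into the free list

# Freeing a singly referenced structure reclaims it transitively:
a, b = Cell("a"), Cell("b")
add_reference(a)                      # a root points to a
a.fields.append(b); add_reference(b)  # a points to b
remove_reference(a)                   # both end up on the free list
```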
Theoretically, the counter should be large enough to hold the number of
references, which can at most equal the number of memory cells; therefore, the
counter should be as large as a pointer.
The main advantage of this technique is that the garbage collection work is
distributed in time and does not stop the execution of the program for any significant
period, which is a basic constraint on real-time and interactive systems. The inactive
cells can be stored in an implicit stack implemented as a linked list; the free cells
themselves can store the pointers. The distribution is imperfect, however, because of
the transitive reclamation of whole data structures; keeping a list of already freed but
not-yet-processed objects can spread out that recursive reclamation.
The other great advantage of this method is its locality. While other algorithms
scan the whole memory, reference counting works with objects locally. In a system
using virtual memory or paging, or even on recent computers using cache memories,
the local use of memory is very important: a large number of so-called page faults -
when the required data is not in the fastest memory - may slow down the system
significantly. (This feature may also be a disadvantage, because the live objects are
not compacted but are scattered throughout the memory.)
However, the method has several disadvantages. First, the extra space needed
for the (theoretically) large counters occupies a significant amount of memory. Second,
the permanent updating of the counters imposes an overhead on the execution time;
the cost is proportional to the amount of work done by the running program. Third,
this method is unable to reclaim cyclic structures. Finally, it can be remarked that
reference counting concentrates on the objects that become garbage, while other
methods traverse the graph of live objects. Statistics show that the number of live
objects is always significantly smaller than the number of garbage objects, and the
implementors of high-performance general-purpose systems prefer that kind of
garbage collector.
The size of the counter can be smaller than that of a pointer. After the
maximum storable reference count is reached, the counter is no longer incremented
or decremented. In that case, objects referenced many times cannot be reclaimed
when they become garbage. However, statistics show that most objects are
referenced only once, a fact that can be exploited in reference counting.
A one-bit counter method is implemented in [Chi87] (see also [Ina90]); it can
reclaim only the garbage of singly referenced objects. Deutsch and Bobrow used hash
tables in LISP systems to handle the singly referenced objects efficiently, see
[Bob75], [Deu76], [Coh81]. The "lazy reference counting" method (described in
[Goto88]) avoids allocating additional space for singly referenced objects; only the
multiply referenced ones are handled by indirection cells.
The time overhead consists of two kinds of cost. One is the permanent
updating of a counter whenever a pointer to the object is created or destroyed.
Argument passing at procedure activations creates short-lived variables - generally
placed on the system stack - which usually disappear quickly, because most
procedures (those near the leaves of the call graph) return very shortly after they
are called. In these cases, reference counts are incremented and decremented within a
very short time interval, cancelling each other out. This overhead can be eliminated by
special treatment of local variables. Deferred reference counting (see [Deut76]) does
not take the references from local variables into account. Of course, when a counter
becomes zero, the object may not yet be reclaimable. A systematic scan of the local
stack brings the counters up to date, and the objects still having a zero counter after a
scan can be reclaimed. The interval between two scan phases should be chosen to be
short enough that garbage is reclaimed often and quickly, yet long enough that the
cost of the periodic scan stays low.
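The idea of deferred reference counting can be sketched as follows; the zero-count table and the function names are assumptions of this illustration, not the exact scheme of [Deut76]:

```python
class Obj:
    """An object whose refcount tracks heap references only;
    references from local (stack) variables are not counted."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0

zero_count_table = set()          # candidates: heap count dropped to zero

def heap_store(obj):              # a heap cell starts pointing at obj
    obj.refcount += 1
    zero_count_table.discard(obj)

def heap_delete(obj):             # a heap cell stops pointing at obj
    obj.refcount -= 1
    if obj.refcount == 0:
        zero_count_table.add(obj) # maybe garbage; decided at scan time

def stack_scan(stack_roots):
    """Periodic scan: an object with a zero heap count is reclaimable
    only if the local stack no longer references it either."""
    reclaimed = [o for o in zero_count_table if o not in stack_roots]
    for o in reclaimed:
        zero_count_table.discard(o)
    return reclaimed

x = Obj("x")
heap_store(x)
heap_delete(x)                    # heap count back to zero: x is a candidate
```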
The other cost cannot be eliminated. When an object is reclaimed, some
bookkeeping must be done to make it available to the system. Typically, this involves
linking the freed objects into one or more "free lists", from which the memory
allocation requests are satisfied.
Therefore, reference counting garbage collectors usually include another kind
of collection, which is executed when the memory is full and the first method does not
free enough space. However, in that case an interactive or real-time system may fall
back to the use of a non-real-time collector at a critical moment.
Figure 2. The problems with cyclic structures
In recent years, the cost of reference counting and its failure with circular
structures have made it unattractive to most implementors. The mark-reclaim methods
are usually more efficient and reliable; their incremental versions can be used in
real-time systems, and the locality advantage of reference counting can also be
partially achieved by their generational versions.
However, the implementor of a new system should examine the system from
the point of view of reference counting. The immediate reclamation and the strong
locality may outweigh the disadvantages. If almost all objects are live, reference
counting performs with little degradation, while "global scanning" techniques may be
inefficient. Reference counts themselves can be used in functional language
implementations to support optimizations by allowing destructive modification of
uniquely referenced objects. Weighted reference counting (explained in the section on
parallel and distributed systems) can be used in distributed systems. Special hardware
may also support the use of reference counting.
The function number_of_fields(p) gives the number of data fields of the structure
referenced by p; these fields should be examined recursively. The function field(p,i)
gives a reference to the i-th data field of the structure referenced by p.
The procedure can be implemented without recursive procedure calls by using a
stack. The stack stores the references that should be examined; at the start of
marking it contains the root set. When a structure is marked, those of its fields that
are references to non-marked structures are pushed onto the stack. The marking
procedure always uses the reference on the top of the stack and runs until the stack
is empty. It is a fast method, but theoretically the size of the stack may be
proportional to the size of the memory: if the memory contains N structures and all of
them are accessible from a root, the stack grows to nearly N entries.
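The stack-based marking loop described above can be sketched as follows, assuming a simple `Node` class whose `fields` list plays the role of field(p,i):

```python
class Node:
    """A structure; len(node.fields) and node.fields[i] correspond to
    number_of_fields(p) and field(p, i) in the text."""
    def __init__(self):
        self.marked = False
        self.fields = []

def mark(root_set):
    """Iterative marking: the stack holds the references still to be
    examined and initially contains the root set."""
    stack = list(root_set)
    while stack:
        node = stack.pop()
        if node.marked:
            continue
        node.marked = True
        # push the fields that refer to not-yet-marked structures
        stack.extend(f for f in node.fields if not f.marked)

# a chain a -> b -> c with a cycle back to a is marked safely
a, b, c = Node(), Node(), Node()
a.fields = [b]; b.fields = [c]; c.fields = [a]
mark([a])
```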
To avoid the problem of large stacks, Knuth [Knut73] used a stack of fixed
length h, storing the references circularly, with mod h as the stack index. When the
index exceeds h, the references are stored again from index 0, overwriting the previ-
ously stored information. Thus the stack retains only the most recently stored h
items and forgets the older ones. Of course, the stack may become empty before the
whole marking task is complete. When that happens, the memory is scanned from the
lowest address, looking for any marked object whose contents refer to unmarked
objects. If such an object is found, marking resumes as before and continues until the
stack becomes empty again; otherwise, the task is complete. Not the whole memory
needs to be scanned if the procedure records the minimum address of the lost references.
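A rough sketch of Knuth's bounded-stack idea; a size-limited Python list stands in for the mod-h circular indexing, and a heap scan recovers the forgotten references:

```python
class Node:
    def __init__(self):
        self.marked = False
        self.fields = []

def mark_bounded(heap, root_set, h=2):
    """Marking with a fixed-size stack: on overflow the oldest entry
    is forgotten (standing in for the mod-h overwrite), and a rescue
    scan of the heap later recovers the lost references."""
    stack = []
    def push(node):
        if len(stack) == h:
            stack.pop(0)              # forget the oldest stored item
        stack.append(node)
    for r in root_set:
        if not r.marked:
            r.marked = True
            push(r)
    while True:
        while stack:
            node = stack.pop()
            for f in node.fields:
                if not f.marked:
                    f.marked = True
                    push(f)
        # rescue scan: marked objects still referring to unmarked ones
        lost = [n for n in heap if n.marked
                and any(not f.marked for f in n.fields)]
        if not lost:
            return                    # marking is complete
        stack.extend(lost[:h])        # resume from the found objects

# a fan-out of 5 children overflows a 2-entry stack
root = Node()
kids = [Node() for _ in range(5)]
grand = [Node() for _ in range(5)]
root.fields = kids
for k, g in zip(kids, grand):
    k.fields = [g]
heap = [root] + kids + grand + [Node()]   # last node is garbage
mark_bounded(heap, [root], h=2)
```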
The task can also be done without a stack at all. Notice that a recursive
procedure call implicitly uses a stack, and the memory can become full during the
execution of the marking. Deutsch, and Schorr and Waite, independently developed
an elegant algorithm for LISP implementations ([Scho67],[Knut73]). It requires one
additional bit per LISP cell. The main idea of this algorithm is that the edges of the
directed graph (consisting of the structures and their references) can be reversed
during the marking, until leaves or already marked nodes are found. After that, the
next node to be examined can be reached by going back along the reversed links,
which are restored to their original directions at that time. The additional bit per
cell indicates the direction in which the restoration of the reversed links should
proceed (i.e., whether to follow the left or the right pointer).
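The link-reversal idea can be sketched for binary cells as follows; the field names and the Python formulation are assumptions of this illustration, not the original LISP code:

```python
class Cell:
    """A binary (LISP-style) cell; `flag` is the one extra bit telling
    which field currently holds a reversed link."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
        self.marked = False
        self.flag = False

def dsw_mark(root):
    """Deutsch-Schorr-Waite marking: descend by reversing left links,
    retreat by restoring them; no auxiliary stack is used."""
    p, q = root, None             # p: current cell, q: reversed-link chain
    while True:
        while p is not None and not p.marked:   # descend, reversing left
            p.marked = True
            p.left, q, p = q, p, p.left
        while q is not None and q.flag:         # retreat: right side done
            q.right, p, q = p, q, q.right
        if q is None:
            return
        q.flag = True                           # swing from left to right
        q.left, q.right, p = p, q.left, q.right

# on a small tree, all cells get marked and all links are restored
l, r = Cell(), Cell()
root = Cell(l, r)
dsw_mark(root)
```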
Reclaiming
Sweeping
The simplest method of reclaiming is to incorporate the unmarked cells into
one or more free lists. This may be accomplished by keeping a list for each object
size commonly used in a program, see [Knut73]. These are called homogeneous free
lists (H-lists). In addition, another free list, the M-list, contains cells of miscellaneous
size, ordered by their address. If possible, the reclaimed cells are linked into the
corresponding H-list; otherwise, into the M-list.
Memory allocation requests are satisfied from the H-list of the desired size,
if it is not empty. Otherwise, a suitable cell is taken from the M-list, if possible; if it is
larger than the desired size, it is split into two smaller pieces and the unrequired part
is linked into one of the free lists. If the request cannot be satisfied, first a
semicompaction is performed, which moves the cells of the H-lists to the M-list and
merges adjacent free cells. If the semicompaction is still not enough, a real
compaction is performed on the free cells.
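A simplified sketch of this allocation scheme; the chosen common sizes and the first-fit search over the M-list are assumptions of the illustration, and semicompaction is omitted:

```python
from collections import defaultdict

COMMON_SIZES = {2, 4}             # sizes that get their own H-list

h_lists = defaultdict(list)       # homogeneous free lists, one per size
m_list = []                       # miscellaneous (addr, size) pairs

def reclaim(addr, size):
    if size in COMMON_SIZES:
        h_lists[size].append(addr)
    else:
        m_list.append((addr, size))
        m_list.sort()             # the M-list is ordered by address

def allocate(size):
    if h_lists[size]:                        # exact fit from an H-list
        return h_lists[size].pop()
    for i, (addr, sz) in enumerate(m_list):  # else first fit from the M-list
        if sz >= size:
            del m_list[i]
            if sz > size:                    # split; keep the remainder free
                reclaim(addr + size, sz - size)
            return addr
    return None   # failure: (semi)compaction would follow here

reclaim(0, 4)      # a freed 4-cell object goes to the H-list for size 4
reclaim(10, 7)     # a 7-cell area goes to the M-list
```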
The sweeping algorithms have three main disadvantages. First, the handling of
variable-sized cells is cumbersome and fragments the memory. Second, the cost of the
collection is proportional to the size of the entire memory, because all live objects
must be marked and all garbage must be collected. The third problem is very
interesting: this method does not move the objects - which can be seen as an
advantage, since no pointer updating is necessary - and therefore the later allocated
objects are interspersed with the earlier ones. This interleaving may be unsuitable for
a system using virtual memory.
However, these problems are not always as bad as one might think. According
to the statistics, objects are often created in clusters and are typically active at the
same time. Clever use of the mark bits can speed up the sweeping algorithms:
keeping the mark bits in a bitmap makes it possible to check 32 bits with a single
32-bit integer operation. Since objects tend to survive in clusters, this can greatly
reduce the constant of proportionality. A bitmap can also reduce the cost of
allocation, by allowing fast allocation from contiguous unmarked areas rather than
from free lists.
Compacting
Sliding collectors are important in sequential implementations of Prolog,
because most of the data structures in use can be freed or reused at backtracking.
However, an efficient implementation requires that the temporal order of the objects
not be mixed.
Haddon and Waite ([Hadd67],[Wait73]) proposed a compactor of the sliding
type. It performs two scans of the entire memory (notice that the marking phase
performs one more scan). The objective of the first scan is to perform the compaction
and to build a "break table", which is used by the second scan to readjust the
references. The break table contains the initial address of each "hole" - a sequence of
unmarked cells - and the size of the hole. The break table can be constructed without
additional storage, because it can be built up in the holes; it can be proved that the
space available in the holes is sufficient to store the table. However, the table has to
be handled dynamically, rolling through the holes already filled with new data. At the
end of the first scan, the live objects are collected at one end of the memory, and the
break table occupies the liberated part. The table is then sorted to speed up the
pointer readjustment done by the second scan, which consists of examining each
pointer, consulting the table (using a binary search in the ordered table) to compute
the new position of the cell, and changing the pointer accordingly. This algorithm is
considerably slow because of the use of the holes and the binary search for each
reference.
encountered during this phase. At the end of the first scan, all the backward pointers
are updated according to the new addresses of the cells they point to.
The second scan starts at the first memory cell and proceeds towards the end
of the memory. The forward pointers (those pointing to a higher address) are updated
in this scan, and the marked cells are now moved to their new positions.
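As an illustration of sliding compaction in general (not of the break-table or two-scan algorithms themselves), here is a sketch that uses an explicit forwarding table; live cells keep their original order:

```python
def sliding_compact(heap, marks):
    """Sliding compaction with an explicit forwarding table (a simpler
    stand-in for the break table). Each heap slot is ("ptr", address)
    or ("data", value); `marks` flags the live slots."""
    forward, next_free = {}, 0
    for addr, live in enumerate(marks):     # pass 1: new addresses
        if live:
            forward[addr] = next_free
            next_free += 1
    compacted = []
    for addr, cell in enumerate(heap):      # pass 2: move and readjust
        if marks[addr]:
            kind, value = cell
            if kind == "ptr":
                value = forward[value]      # readjust the reference
            compacted.append((kind, value))
    return compacted

# cells 1 and 3 are garbage; the live cells slide left, order preserved
heap  = [("data", 7), ("data", 0), ("ptr", 4), ("data", 0), ("ptr", 0)]
marks = [True, False, True, False, True]
result = sliding_compact(heap, marks)
```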
3.2. Copying algorithms
Copying collectors move all of the live objects into a separate storage area;
the rest of the memory is then reusable, because it contains only garbage. The
garbage itself is never actually touched, so the "garbage collection" is implicit. While
the compacting collectors use a separate marking phase, the copying collectors
integrate the traversal of the data with the copying process. Therefore, most objects
are traversed only once, and the work is proportional to the amount of live data.
An obvious algorithm for compacting is to write all accessible (live) data out
to a secondary storage area and then read it back into main memory. However, this
solution has several disadvantages.
Minsky proposed in [Mins63] an algorithm for LISP which used one mark
bit per cell. Each accessible cell is traced and marked if it is unmarked. The program
computes a new address for the cell and outputs a triplet to the secondary storage,
consisting of the left and right fields of the cell and its computed new address. The
reference is updated according to the new address. The new address is also placed in
the marked cell, so multiple references can be handled easily. After marking, the
triplets are read back and the contents of the fields are stored at the specified new
addresses. Minsky's algorithm is a linearizing collector, because linked-list elements
are positioned next to each other.
The secondary storage area can also be placed in main memory, and the roles
of the two storage areas can be exchanged periodically. This common kind of copying
garbage collector is the semispace collector - proposed first by Fenichel and
Yochelson in [Feni69] - using Cheney's algorithm for the copying traversal [Chen70].
The data memory is subdivided into two contiguous semispaces. During normal
program execution, only one of these semispaces is in use. Notice that the cost of
memory allocation is low, unlike in memories that keep the free areas in lists (as used
with reference counting or mark-sweep collectors). When the available half of the
memory (the current semispace) is exhausted, the program is stopped and the
collector is called. All of the live data are copied from the current semispace
(fromspace) to the other semispace (tospace). Once the copying is completed, the
tospace becomes the current semispace and program execution is resumed. Thus,
the roles of the two spaces are reversed each time the garbage collector is invoked.
The work of the copying collector is shown in Fig. 5.
Cheney used a breadth-first traversal method for copying the objects. The
tospace is considered as a queue: a "free pointer" points to the end of the queue and a
"scan pointer" points to the object to be examined. Initially, an object immediately
reachable from the root set is copied into the tospace.
Figure 5. Copying of a graph from the current space to the new space
The scan pointer is advanced through the object, location by location. Each time a
pointer into fromspace is encountered, the referred-to object is copied to the end of
the queue (the free pointer is advanced), and the pointer is updated according to the
new location. Then the scan continues. This algorithm effects a breadth-first
traversal, copying all of the descendants of a structure next to it, resulting in a kind of
linearizing collector (though not in the sense of depth-first linearizing).
If the scan pointer reaches the free pointer, another object referenced from the
root set is copied into the queue and the process continues. Once all immediately
reachable objects have been processed, all of the live data have been copied into
tospace. To avoid copying a multiply referenced object more than once, a forwarding
pointer is installed in the old version of the object, as in Minsky's solution.
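Cheney's traversal can be sketched as follows; the `Obj` class and the use of a Python list as the tospace queue are assumptions of this illustration:

```python
class Obj:
    def __init__(self, value):
        self.value = value
        self.fields = []
        self.forward = None       # forwarding pointer, set when copied

def cheney_collect(roots):
    """Cheney's traversal: tospace acts as a queue; the scan pointer
    chases the free pointer (here, the end of the list)."""
    tospace = []
    def copy(obj):
        if obj.forward is None:              # not copied yet
            clone = Obj(obj.value)
            clone.fields = list(obj.fields)  # still fromspace references
            obj.forward = clone              # install forwarding pointer
            tospace.append(clone)            # advance the free pointer
        return obj.forward
    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(tospace):               # until scan meets free
        obj = tospace[scan]
        obj.fields = [copy(f) for f in obj.fields]  # update references
        scan += 1
    return new_roots, tospace

# a shared, cyclic structure is copied exactly once per object
a, b = Obj("a"), Obj("b")
a.fields = [b, b]
b.fields = [a]
(new_a,), tospace = cheney_collect([a])
```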
Copying collectors are very efficient if a sufficiently large memory is available.
The cost of one collection is proportional to the amount of live objects and is
independent of the size of the memory. It can be assumed that approximately the
same amount of data is live throughout the program execution; in this case,
decreasing the frequency of garbage collection decreases the total cost of the
collections. The more memory is available, the less collection is performed.
The data are transferred quickly and only once within the memory - unlike in
Minsky's algorithm. However, the disadvantage of pointer updating remains.
3.3. Non-copying implicit collector
To avoid the updating of references, Baker proposed in [Bake92] a new
collector which does not move the objects. The separate spaces are considered as
sets. Any implementation of these sets is suitable if the following operations are
efficient enough: (1) determining which set an object belongs to; (2) swapping the
roles of the sets. One implementation is the copying method using two separate
contiguous spaces.
Baker implements these sets as doubly linked lists. Two pointer fields and a
"color" field are added to each object: the pointers link the object into one of the sets,
and the "color" field indicates which set it belongs to. The free space is divided into
chunks which are linked into a list, and memory allocation is performed by advancing
a pointer forward in this list; this allocation pointer divides the list into the used part
and the remaining free part. The disadvantage is the memory fragmentation caused
by supporting variable-sized objects, as in other list-processing collectors.
When the free list is exhausted, the collector traverses the live objects, but the
transfer consists only of unlinking the object from the "fromset", recoloring it, and
linking it into the "toset". The space reclamation is implicit: when all of the reachable
objects have been moved to the toset, the fromset contains only garbage and can be
used as a free list. The cost of the collection is proportional to the amount of live
data; however, the per-object constants are higher than those of the copying
algorithms. Notice that the reclaimed area is linked to the free list with a single
operation, unlike in other free-list-handling collectors, which link the free object
spaces one by one; this makes the reclamation much more efficient.
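Viewed abstractly, the collection is just a sequence of set operations; in this sketch Python lists stand in for the doubly linked lists, so the unlink step is not O(1) as it would be in the real structure:

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = []

def set_based_collect(fromset, roots):
    """Baker's non-copying collection as set operations: 'moving' a
    live object is just unlinking it from the fromset and linking it
    into the toset."""
    toset, gray = [], list(roots)
    while gray:
        obj = gray.pop()
        if obj in toset:
            continue              # already recolored
        fromset.remove(obj)       # unlink from the white set
        toset.append(obj)         # recolor: the object is now live
        gray.extend(obj.fields)   # its referents must be visited too
    # one step reclaims everything left: the fromset is the free list
    free_list = fromset
    return toset, free_list

live, dead = Obj("live"), Obj("dead")
root = Obj("root")
root.fields = [live]
heap = [root, live, dead]
toset, free_list = set_based_collect(heap, [root])
```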
The main advantage is that the objects are not moved in memory, so the
references do not have to be updated. This is important in languages that support
language-level pointers. In a parallel environment, avoiding the updating of remote
references is particularly important; this is also true in real-time systems, where the
collection is incremental and the memory must be shared with the running program.
Baker's collector is an incremental one; only the non-incremental version is
described here. The more complex implementation of the incremental version is
explained in Section 4.3.
4. Run-time systems
In a real-time application, which has to respond to events within a constrained
time interval, the garbage collector must not stop the execution for a significant time -
for minutes, seconds, or even milliseconds, depending on the task of the application.
The collector should distribute its work over time and let the program keep running.
This implies that the collection cannot be performed as an atomic action; it should be
interleaved with program execution. Collectors of this kind are called incremental.
As we have seen, the reference counting scheme can easily be made
incremental. However, its effectiveness is insufficient in many cases, and its space and
time overhead is higher than that of other solutions. Therefore, the copying and
marking collectors (together called tracing collectors) were also made incremental.
The differences between the reclamation methods are not particularly important
from this point of view; the incremental tracing, however, is very interesting. The
difficulty is that the program, running in parallel with the collector, may change the
graph of the accessible objects while the collector is working on another part of the
graph. Therefore, the program is referred to as the mutator, which can be seen as a
concurrent process modifying the data structures.
4.1. The tricolor marking scheme
The problem of concurrent marking can be described with the abstract
tricolor marking scheme. The tracing collectors (during marking or copying) can be
described as a process of traversing the graph of live objects and coloring them.
Initially, all of the objects are white; by the end of tracing, the live objects should
be colored black.
In a classical marking process, the coloring is implemented directly by the mark
bit: the bit of each accessed object is set, i.e., the object is black. In a copying
collector the coloring is implicit in the moving of the object; the color is defined by
the object's position, those in tospace being black and the others white.
The mutator can change the structure of the graph. If it writes a new object
reference into an already traversed data structure, the collector will not access the new
object and its memory area will be reclaimed. By introducing an intermediate color,
an invariant can be stated that describes these problems precisely.
A third color, gray, indicates that an object has already been reached but its
descendants may not have been; that is, a reached white object is colored gray. When
it is scanned and all its pointers are traversed, it is colored black and its descendants
(offspring) are colored gray. Fig. 6 shows an example, where A and its descendants
are copied. The classical marking process - using some form of graph traversal -
stores the objects to be examined in a stack or queue; the gray objects correspond to
the objects stored in that control structure. The black ones are the processed objects,
and those that have not been reached are white. In a copying collector, the gray
objects are those in the unscanned area of the tospace - between the scan and free
pointers - while the objects that have been passed by the scan pointer are black.
Figure 6. The tricolor marking scheme
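The tricolor scheme can be sketched directly, with the gray list playing the role of the control stack (or of the unscanned tospace area in a copying collector):

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    def __init__(self):
        self.color = WHITE
        self.fields = []

def tricolor_mark(heap, roots):
    """Abstract tricolor tracing: gray objects are reached but not yet
    fully scanned."""
    gray = []
    for r in roots:
        r.color = GRAY
        gray.append(r)
    while gray:
        obj = gray.pop()
        for f in obj.fields:          # traverse all pointers of obj
            if f.color == WHITE:      # a reached white object turns gray
                f.color = GRAY
                gray.append(f)
        obj.color = BLACK             # fully scanned
    return [o for o in heap if o.color == BLACK]

a, b, garbage = Obj(), Obj(), Obj()
a.fields = [b]
live = tricolor_mark([a, b, garbage], [a])
```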
There are two basic approaches to this coordination. The use of a read barrier
prevents the mutator from accessing white objects: when the mutator attempts to
access a pointer to a white object, the object is immediately colored gray. As a
consequence, the mutator is never able to install a pointer to a white object into a
black object. Baker's collectors use a read barrier.
The other approach is more direct and involves a write barrier. When the
mutator attempts to write a pointer into an object, the write is trapped and recorded.
The write barrier technique is cheaper than the read barrier, because memory writes
are much less common than memory reads.
However, the invariant need not be maintained when a pointer to a white object
is merely copied into a black object, since in that case the white object can still be
reached through the original reference. The snapshot-at-beginning collector records
only the overwritten pointers referring to white objects, ensuring that those objects
will be found during the collection.
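A snapshot-at-beginning write barrier can be sketched as follows; the `barrier_store` name and the remembered list are assumptions of this illustration:

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    def __init__(self, color=WHITE):
        self.color = color
        self.fields = []

remembered = []   # overwritten white-object pointers, traced later

def barrier_store(obj, index, new_ref):
    """Snapshot-at-beginning write barrier: before a pointer is
    overwritten, a still-white old target is recorded so the collector
    will find it even though the reference is gone."""
    old = obj.fields[index]
    if old is not None and old.color == WHITE:
        old.color = GRAY          # treated as reached by the collector
        remembered.append(old)
    obj.fields[index] = new_ref

holder = Obj(color=BLACK)         # already scanned by the collector
victim = Obj()                    # white: not yet reached
holder.fields = [victim]
barrier_store(holder, 0, None)    # overwrite the only pointer to victim
```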
The incremental update collectors record even fewer pointers.
4.2. Baker's incremental copying collector
Baker adapted Cheney's simple copying scheme to real-time systems, see
[Bake78]. It uses a read barrier for coordination with the mutator. Like the original
algorithm, it copies the accessible objects in a breadth-first traversal, advancing the
scan pointer through the unscanned area of tospace, moving the referred-to objects
from fromspace to tospace, and updating the reference pointers.
New objects allocated by the mutator during the collection are allocated in
tospace and are treated as already examined - i.e., they are assumed to be live or, in
other words, they are initially black. Those of them that are freed will not be
reclaimed until the next garbage collection cycle.
The incremental behaviour requires that all of the live data be found and
copied into tospace before the available free space is exhausted by the program. To
ensure this, the rate of copying is tied to the rate of allocation: whenever an object is
allocated, an increment of scanning and copying is done.
To maintain the invariant, the read barrier ensures that the objects of the
fromspace (white objects) accessed by the program (mutator) are copied into the
tospace (they become gray).
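The read barrier can be sketched as follows; the `space` field and the function names are assumptions of this illustration:

```python
class Obj:
    def __init__(self, value, space):
        self.value = value
        self.space = space        # "from" or "to"
        self.fields = []
        self.forward = None

tospace = []

def copy_to_tospace(obj):
    clone = Obj(obj.value, "to")
    clone.fields = list(obj.fields)
    obj.forward = clone
    tospace.append(clone)         # gray until the scan pointer passes it
    return clone

def read_barrier(ref):
    """Baker's read barrier: the mutator never obtains a fromspace
    (white) reference; touching one copies the object first."""
    if ref.forward is not None:
        return ref.forward        # already evacuated
    if ref.space == "from":
        return copy_to_tospace(ref)
    return ref

old = Obj("x", "from")
seen = read_barrier(old)          # mutator access triggers the copy
```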
end connected to the from-list. The scan pointer separates the scanned and unscanned
areas of the to-list.
The algorithm is isomorphic to the copying algorithm. The objects in the
new-list are black, the objects of the from-list are white, and the to-list contains black
and gray objects, separated by the scan pointer. When all of the reachable objects of
the from-list have been moved to the to-list and there are no unscanned (gray) objects
in the to-list, the remaining objects of the from-list are known to be garbage and the
collection is complete. The free-list merged with the from-list becomes the new
free-list, while the new-list and the to-list (containing the preserved data) form the
from-list of the next cycle; the new-list and to-list are then empty.
The situation is similar to the beginning of the previous cycle, except that the
segments have moved around the cycle - hence the name "treadmill".
Any pointer which is overwritten, when its original value has not been copied
into a reachable object, is not taken into account - unlike in snapshot-at-beginning -
and the originally pointed-to object becomes garbage if there are no other references
to it.
Another difference between this scheme and the others is that objects newly
allocated during the collection are assumed not to be reachable (their initial color is
white, not black). If a pointer to one of them is installed into a reachable object, it will
surely be traversed and retained. This is a very important feature, because most
objects are short-lived, so if the collector does not reach such an object early in its
traversal, it will probably never be reached. The additional time overhead of this
solution - compared with the others - is that the newly allocated live objects have to
be traversed. However, more space can be reclaimed at the end of the garbage
collection.
4.6. A case study: An incremental Garbage Collector for
WAM-based Prolog
Careful analysis of a system may make it possible to develop a very specialized
but efficient garbage collector. The WAM (Warren Abstract Machine) is the most
efficient sequential interpreter of Prolog programs. However, it imposes a constraint on
garbage collection: the temporal order of the objects on the heap must be kept in
order to be able to undo the changes on backtracking.
Nowadays the actual requirements on a garbage collector include incrementality
and usability in virtual memory systems. W. J. Older and
J. A. Rummel proposed in [Old92] an incremental garbage collector whose short collection
phases are built into the code of the compiled Prolog program (WAM code). These short
phases use a copying algorithm, despite the fact that a copying algorithm is not a sliding
compactor.
A choicepoint stack is used in the WAM to permit backtracking. A
choicepoint holds the information required to return the computation to the state just
before a non-deterministic call. For example, it stores pointers which mark the ends of
the heap and the environment stack before the call. Deleting the unnecessary data
(invalidated by the failure of the computation) from the heap on backtracking is then a
simple pointer update. This requires the temporal order of the objects on the heap, but only
across choicepoint boundaries. Therefore, a copying algorithm can be used if the
collector's work is distributed so that only one area is collected at a time.
Another observation is that garbage which cannot be reclaimed by
backtracking lies in the tip region of the heap (past the critical point remembered in the
choicepoint stack). If a collection is made before this region is "buried" by new data
and new choicepoints, the collection can discard all of the garbage in one choicepoint
area (with a non-sliding compactor, however, there are some problems with unbound
variables).
Collecting a small region involves the difficulty of finding all external
references into it. Because of the nature of Prolog, the pointers are under complete
system control, and by choosing appropriate points of the execution at which collection
may be performed, the external references can be limited to a list of
known locations. In particular, the WAM implementation can provide all external
references without additional cost.
The implementors execute the collector when the program execution returns
from the call that created the last choicepoint. Therefore, the invocation of the
collector is built into one WAM instruction (the so-called proceed). However, careful
analysis showed that the cut operation can also 'cut' the execution of the collector, so
it had to be modified, too.
The memory is subdivided into areas that hold objects of different
(approximate) ages, or generations. Objects are allocated in the area of the youngest
objects until it is full. Then this generation is collected, reclaiming the inaccessible
objects. Fig. 9 shows the memory organization of a system using a generational copying
collector with two generations. The state of the system after the collection of the
younger generation can be seen in Fig. 10.
If an object survives long enough to reach an older age, it is moved
into an older generation. It will not be collected again in the following collections of the
younger generation. An older generation will eventually fill up and be collected, too. However, it
fills up much more slowly because relatively few objects live long.
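A minimal sketch of this promotion policy, under the simplifying assumptions that reachability is given directly by a root set, that intergenerational pointers are ignored, and that the `TwoGenHeap` name and `PROMOTE_AGE` threshold are purely illustrative:

```python
PROMOTE_AGE = 2      # minor collections an object must survive (assumed value)

class TwoGenHeap:
    def __init__(self):
        self.nursery = {}    # obj -> age (number of minor GCs survived)
        self.old = set()

    def allocate(self, obj):
        self.nursery[obj] = 0

    def minor_gc(self, roots):
        """Collect only the nursery; survivors age and may be promoted."""
        survivors = {}
        for obj, age in self.nursery.items():
            if obj in roots:
                if age + 1 >= PROMOTE_AGE:
                    self.old.add(obj)        # advanced to the older generation
                else:
                    survivors[obj] = age + 1
        self.nursery = survivors             # everything else is reclaimed
```

Each minor collection touches only the small nursery, which is why the scheme pays off when most objects die young.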
The number of generations may be greater than two. The statistics of logic
programming languages (even the parallel ones) classify the objects clearly into two
classes; therefore, a two-generation collector is suggested. However, there is an
eight-generation collector used in a Smalltalk-80 implementation, see [CWB86],
indicating that other languages have other features.
not be preserved, or, with the use of a copying collector, the pointers may not be updated.
The first solution is to introduce indirection tables which can be used as part of the root
set. However, the cost of the indirection is high and it is not recommended. A better
solution is to record these pointers. If the system observes a pointer in the older
generation referring to a younger object, it is pushed onto a stack. The collector of the
younger generation can use the pointers stored in the stack as roots and update
them if necessary. The observation mechanism is similar to the write barrier used in the
incremental collectors.
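The recording scheme might be sketched as follows (the `Cell` and `RememberedSet` names and the set-based generation membership test are illustrative assumptions, not a description of a real system):

```python
class Cell:
    """An object with named pointer fields."""
    def __init__(self):
        self.fields = {}

class RememberedSet:
    def __init__(self):
        self.stack = []

    def write_barrier(self, source, field, target, old_gen, young_gen):
        """Perform the store, recording it if it creates an old -> young pointer."""
        source.fields[field] = target
        if source in old_gen and target in young_gen:
            self.stack.append((source, field))

    def extra_roots(self):
        """Locations the young-generation collector must treat as roots."""
        return list(self.stack)
```

Only stores crossing the generation boundary pay the recording cost; young-to-young and old-to-old stores proceed unrecorded.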
Figure 10. The memory after the collection of the youngest generation
is used. (1) If an older generation is collected, all of the younger ones are scavenged, too. In
this case, none of the intergenerational pointers need be taken into account as root
pointers. (2) The pointers from the new generation to the older one can be found by
scanning the new generation. In this case, all data in the new generation are assumed
to be live and the pointers are used as roots. The cost of scanning the new generation
for pointers is less than the cost of a garbage collection on it.
(1) How long an object must survive in one generation before it is advanced to
the next.
(2) The organization of the memory, dividing it among generations and within
one generation. How it affects the efficiency in virtual memory systems and
cache memory systems.
(3) The scheduling of the collection to reduce program pauses as much as
possible.
Another disadvantage of this scheme is that every remote pointer creation and
deletion causes a message to pass through the network.
The weighted reference counting described in [Bev87] and [Wat87] avoids the
necessity of synchronisation and restricts the communication overhead to one
message per deleted reference. In this solution, each reference has a positive
weight represented by an integer and each object has a reference count. The invariant
of the system is:
The reference count of an object is equal to the sum of the weights of the
references to it.
This invariant ensures that the refcount of an object becomes zero if, and only
if, it is not referenced by others. The operations maintain the invariant in the
following way:
When a new object is created with a reference from an already existing object,
the new object is given an arbitrary positive reference count and the reference to it is
given a weight equal to that count.
When a reference is duplicated, its weight is split between the two resulting
references in such a way that the sum of their weights equals the original weight.
Notice that the reference count of the object is not modified and no communication is
needed.
When a reference to an object is deleted, the reference count of the object is
reduced by the weight of the reference. To achieve this, a message is sent to the
object.
These operations maintain the invariant. The problem of non-synchronised
networks described above is avoided because the reference counter of an object cannot
increase; it can only be decremented.
Since the weights are always halved, implementations restrict the weights to
powers of two, so only the binary logarithm needs to be stored. That is, if a weight
field is n bits wide, it can hold exponents from 0 to 2^n - 1, representing weights
from 1 to 2^(2^n - 1). For example, with 3-bit reference counts storing numbers
from 1 to 8, a 2-bit weight field is enough to encode the possible weights 1, 2, 4
and 8.
The problem of this scheme is that references of weight 1 cannot be
duplicated. To cope with them, an indirection cell is introduced. It is a small
object consisting of a reference of weight one (the weight need not be stored,
since it is always 1) and its own reference count. When a reference of weight one
is to be duplicated, an indirection cell is created which refers to the referenced object,
and its refcount is set to the maximum. The reference to be duplicated is replaced by a
reference to the indirection cell with the maximum weight, and this is duplicated as
normal. Notice that the reference count of the referenced object does not need to be
changed. To avoid the creation of long reference chains, at the first evaluation the result
can be placed into the indirection cell.
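The three operations and the invariant above can be sketched as follows. The class names, the `MAX_WEIGHT` value, and the string returned on reclamation are illustrative assumptions, and the decrement "message" is modelled as a direct call; indirection cells are omitted, so duplicating a weight-1 reference is simply rejected here:

```python
MAX_WEIGHT = 64          # weight given to a fresh reference (assumed value)

class WObject:
    def __init__(self):
        self.refcount = 0

class Ref:
    def __init__(self, target, weight):
        self.target, self.weight = target, weight

def new_reference(obj):
    """Creating a reference: refcount and weight start equal."""
    obj.refcount += MAX_WEIGHT
    return Ref(obj, MAX_WEIGHT)

def duplicate(ref):
    """Split the weight between the two references; no message is needed."""
    assert ref.weight > 1, "a weight-1 reference needs an indirection cell"
    half = ref.weight // 2
    ref.weight -= half
    return Ref(ref.target, half)

def delete(ref):
    """Deleting a reference sends one decrement message carrying its weight."""
    ref.target.refcount -= ref.weight
    if ref.target.refcount == 0:
        return "reclaim"     # the invariant says no references remain
    return None
```

Note that only `delete` touches the object's counter, which is why the counter can never rise and the synchronisation problem disappears.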
6.2. PIE64
A good example of a hybrid distributed garbage collector is the one developed
for the Parallel Inference Engine (PIE64) computer and the FLENG logic language
[Xu89]. This Japanese system uses both the reference counting and the mark-scan
schemes. The collector is divided into three parts: first, the reference counting scheme
is used to collect the singly-referenced objects (on one inference unit, IU); second, a
local mark-scan collection is performed to completely reclaim the local,
singly-referenced objects; third, a global mark-scan collection completes the task of
the garbage collection.
Since all allocations in logic programming languages are handled by the
system, it can have some information about the objects, and their consumption rates,
lifetimes and references can be estimated from this information. The memory is
divided into pages, and singly-referenced objects with the same estimated consumption
rates are allocated in the same page (or pages, if they occupy much memory).
Therefore, page reference counting can be used. That is, each page has a reference
counter which equals the number of references to the objects placed in the page.
The whole page can be reclaimed when its counter becomes zero. In the ideal case, all of
the objects of a page become garbage at almost the same time and the unreachable
areas of the memory soon become available to the system.
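Page reference counting itself reduces to one counter per page, as in this sketch (the `PageHeap` name and dictionary representation are illustrative assumptions):

```python
class PageHeap:
    def __init__(self):
        self.pages = {}                  # page id -> count of references into it

    def allocate_page(self, page):
        self.pages[page] = 0

    def add_ref(self, page):
        """A new reference into the page: bump its counter."""
        self.pages[page] += 1

    def drop_ref(self, page):
        """Remove a reference; reclaim the whole page when none remain."""
        self.pages[page] -= 1
        if self.pages[page] == 0:
            del self.pages[page]         # whole page reclaimed at once
            return True
        return False
```

The appeal is that reclamation needs no per-object bookkeeping: when the last reference into a page goes away, the page is freed wholesale.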
Of course, the estimation of the lifetimes of the objects is not exact; therefore,
there may be pages in the system in which only a small part of the objects are in use.
These pages cannot be reclaimed completely. The second stage of the garbage
collection consists of a local marking and compacting procedure. As in other parallel
implementations of logic languages, the subgoals of the system are executed in parallel.
In this implementation, the objects allocated during the execution of a goal are
allocated separately from the others; they are stored in goalframes. The goals of the
system are either active or suspended and are stored in two corresponding queues.
Goalframes which are not accessible from the active goal queue or the suspended goal
queue can be reclaimed. In the mark phase, only the goalframes are marked, not the
objects within them; therefore, the marking is very fast. The goalframes are then
compacted using sliding compaction.
Finally, a global garbage collection is used to reclaim all of the garbage. The
system is halted and the collection is performed on all IUs. In normal computation an
object points indirectly to a remote object, but during the collection an indirection table
is built on each processor. The hardware of the PIE64 supports the creation of this
table. The root set now consists of the active and suspended goal queues and the
Remote-Mark requests. When a remote pointer is found during the marking, a
Remote-Mark request is sent to the corresponding IU, where the pointer is placed into
the indirection table and, with the help of a backward message, the original cell is
rewritten to refer to the indirection table. The message sending is performed by the
Network Inference Processors (NIPs) independently from the main processor.
The compaction phase uses Morris' algorithm. The first phase is the same as
in Morris' original algorithm; when a remote reference is found, nothing is done.
In the second phase, when the memory is scanned from the lowest address, the remote
pointers are also restored, which is again supported by the NIPs.
6.3. Garbage Collection on shared-memory multiprocessors
While the processing elements (PEs) of a distributed system use their own local
resources and access remote resources via communication, the PEs of a
shared-memory system use a common resource. The synchronization of data accesses
is very important. The data memory is a common area and the garbage collection is
performed on this space. Since the processors work on a common area, the garbage
collection is a common task, too. Therefore, parallel algorithms should be developed
for these systems. Naturally, the existing sequential algorithms are parallelized.
Generally, the memory is divided into regions owned exclusively by the
processors, avoiding the frequent use of synchronized data access. Each PE uses its
own "local" memory area - as in distributed systems - however, common
data is accessed in a synchronized way. Since the shared data structures become
temporarily inconsistent during the collection process of a PE, the other processors have
to stop normal execution; therefore, they might as well be involved in the garbage
collection.
The marking-compacting and the copying algorithms can be modified to
become parallel processes. Parallel copying methods can be found in [Ima93] and
[Ali96]. The basic difference between the two methods is that the former
implementation is based on a breadth-first traversal of the objects, while the latter
traverses them in a depth-first manner, which is a better method if the storage consists
of non-contiguous memory blocks.
Morris' sliding collector has also been adapted to shared systems, see [Wee90]. Since
that work concentrates only on the reclaiming phase of the mark-reclaim collection,
the use of an incremental marking process is suggested.
The generational collectors can also be implemented in a parallel environment,
see [Oza90].
The main problem of the parallel collectors is load balancing. An
appropriate distribution of the work is very important to achieve the fastest collection.
Conclusion
As shown in this study, there is no unambiguously best method among
the proposed garbage collection algorithms. Each of them has advantages and
disadvantages, and an exhaustive analysis of the system should be done before a
collection method is selected. The main criteria of the analysis should be:
♦ Speed
♦ Effectiveness
♦ How much extra space can be used for garbage collection
♦ Virtual memory environment and use of cache memories
♦ Real-time or interactive systems
♦ How much work can be spared with compile time analysis
♦ Shared memory multiprocessors
♦ Distributed multicomputers, speed of interconnection network
♦ Special hardware features
This report was written to give an overview of the existing methods
for garbage collection. In the future, we are going to develop a garbage collector for
the Distributed Data Driven Prolog Abstract Machine (3DPAM). This system runs on
distributed multicomputers; however, the processing elements use only their own local
memory.
References
[Coh81] J. Cohen: Garbage Collection of Linked Data Structures
Computing Surveys, Vol. 13, No. 3, September 1981.
[Wat87] P. Watson, I. Watson: An Efficient Garbage Collection Scheme for Parallel
Computer Architectures.
Conference on PARLE'87.
[Bev87] D. I. Bevan: Distributed Garbage Collection Using Reference Counting.
Conference on PARLE'87.