FAQ What Is Java Garbage Collection

Computer Science (Advanced) Java Programming

What is Java Garbage Collection?


Purpose
One of the most misunderstood mechanisms in the Java Virtual Machine (JVM) is the actual function and
effects of “garbage collection.” The term “garbage collection” refers to the software practice of finding
blocks of computer memory (objects, in the case of Java) that were once usable but are no longer
usable because no reference anywhere in the executing processes points at that block
of memory.
In one of the simplest cases, the following code creates “garbage.”
Object object = new Object();
object = null;
In that code, an object (of type Object) is created, and a reference is placed into the variable object. Then
a null value is assigned to the variable. There is now no longer a reference to the created Object, so the
associated memory is no longer accessible by the program. This amount of memory is now “garbage.” Of
course, in a real program, objects become inaccessible in a similar way all the time, usually in much more
complex ways. However, the result is still that there are objects in memory that are inaccessible and are
therefore “garbage.”
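The difference between a referenced object and “garbage” can be observed with a small sketch. It uses java.lang.ref.WeakReference, which is not mentioned in this article, as a probe that does not itself keep the object alive; the class and method names are illustrative only. System.gc() is only a request to the JVM, so the second result printed by main is typical rather than guaranteed.

```java
import java.lang.ref.WeakReference;

class GarbageDemo {
    // While a strong reference exists, the object cannot be collected,
    // so the weak probe must still return the very same object.
    static boolean reachableWhileReferenced() {
        Object object = new Object();
        WeakReference<Object> probe = new WeakReference<>(object);
        System.gc(); // a request only; the object is still strongly referenced
        return probe.get() == object;
    }

    public static void main(String[] args) {
        System.out.println("reachable while referenced: " + reachableWhileReferenced());

        Object object = new Object();
        WeakReference<Object> probe = new WeakReference<>(object);
        object = null; // the last strong reference is gone: the object is now garbage
        System.gc();   // a request only; collection is likely but not guaranteed
        System.out.println("cleared after losing the last reference: " + (probe.get() == null));
    }
}
```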
The objective of the provision of “garbage collection” is to relieve the Java programmer of the need to
track the creation of every instance of a class, and to ensure that the memory used by those instances is
recycled.* While this is a powerful convenience, and eliminates sources of common errors (so-called
“memory leaks” and “dangling references”), it does require the programmer to understand that the
garbage collector does use both CPU and memory resources in order to provide this convenience, and that
there are other programming implications.
This article contains a description and explanation of the very complex finalization process.
The use of finalization (that is, the finalize method defined in the class Object) is now
@deprecated. The explanation remains here as it is still a step in the garbage collection
process, even though new programs should make no use of it.
General Pattern
There are a number of different implementations of the Java garbage collector, each having different
characteristics (see “Introduction to Garbage Collection Tuning” at
https://docs.oracle.com/javase/9/gctuning/introduction-garbage-collection-tuning.htm and other
information about the “Garbage-First Garbage Collector” (G1) at
https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector.html).
Java 11 introduced yet another improvement in garbage collection (ZGC), see
https://wiki.openjdk.java.net/display/zgc/Main, intended to cope with terabyte-sized memories. Further
improvements were made in ZGC in Java 12, and further improvements are planned (see “JEP 351: ZGC:
Uncommit Unused Memory,” http://openjdk.java.net/jeps/351). In spite of the variations in
implementation in order to support different performance objectives, the Java garbage collectors all
follow an equivalent outline of actions.

* If you have been introduced to C++, you will understand that it is the programmer’s responsibility to use either delete or
delete[] to release memory no longer needed. If this is not done, the memory area is “lost,” and the program may prematurely run
out of space to work. If memory is released while there is still a reference to that memory, that reference becomes a “dangling
reference,” as it is pointing at memory that is no longer relevant to that reference. The use of garbage collection is intended to
avoid such dangerous situations.
1. The garbage collector runs either continuously (in parallel to other computation) or at intervals,
attempting to identify which blocks of memory are in use (are accessible), and which are not in use
(are not accessible).
The general strategy (implemented in different ways) is to start from known places containing references,
known as the “root set” (e.g., Class instances, variables on the Thread stacks local to currently executing
methods, plus other places), then to look at the objects to which these references are pointing, and to
identify within those objects other references, which may then be pointing to other objects. This is
repeated until all possible chains of references have been traced to objects that have no further pointers, or to where the chain of
references loops back on itself. This is termed the “mark” phase, as each memory block that is found
is “marked” as having been visited. At the end of the mark phase, any object that is not marked is
taken to be “garbage.”
2. The next phase is the “sweep” phase. This is basically scanning linearly through the JVM memory,
visiting every memory block, looking for blocks that are not marked. Again, this is accomplished in
various ways. Often, if enough memory can be regained for the program to continue executing
without sweeping the entire JVM memory space (which may be billions, or more recently, trillions
of bytes), then the sweep will cover only a portion of the memory.
3. “Sweep” is often accompanied by “compaction,” where all the objects in use are moved to be next to
each other, and the unused memory between them is gathered into one large block, from which
allocation of new objects may be made. Compaction requires reference values to be updated to point
to the new locations of any moved objects.
4. An essential feature of any sweep is to ensure that any garbage object that has a “non-trivial” finalize
method is found*. Now this is where it gets a little complex (and reading the section below on
finalization may be skipped).
This paragraph describes “finalization,” a process that is still part of garbage collection but is
@deprecated for use in new programs.
If an inaccessible (unmarked) object is found that has a non-trivial finalize method, then it
is next checked to see if it has already been finalized (how this will happen is shown
below). If the object has been finalized already, then no further action is required, and the
memory space belonging to that object is simply recovered (just as described above).
If the inaccessible object with a non-trivial finalize method has not been finalized, the
object is placed onto a queue of objects that need finalization. That object now has an
active reference! The garbage collector then forgets all about that object (as it is now an
active object) and moves on to completing the current sweep.

* “Non-trivial” here means that the object has a finalize method that overrides the finalize method of the class Object—because
the system, including the garbage collector, knows that the finalize method of the class Object does nothing, hence any object
whose finalize method does not override that method of Object needs no actual finalization action, and the space belonging to
that object may be simply recovered.

During the sweep, objects may be moved (and references to them amended), to compact the use of
memory. Also, during this linear sweep, all the “in use” marks are removed, so that a future garbage
collection cycle may be started.
5. Once that sweep is finished (over all the memory to remove “in use” marks, or just a chosen smaller
part that is compacted), the garbage collector stops processing until the next time that it is supposed
to run. This may be immediately in the case of a continuous collector, or may not be until the
executing processes of the applications in the JVM find themselves out of memory again.
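The mark and sweep phases outlined above can be sketched as a toy simulation. This is not how any real JVM collector is implemented: ToyHeap, Block, and all method names are hypothetical, the “heap” is just a list of simulated blocks, and compaction and finalization are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class ToyHeap {
    // A simulated memory block: a name, a mark bit, and outgoing references.
    static final class Block {
        final String name;
        final List<Block> refs = new ArrayList<>();
        boolean marked;
        Block(String name) { this.name = name; }
    }

    final List<Block> blocks = new ArrayList<>(); // every allocated block
    final List<Block> roots = new ArrayList<>();  // the "root set"

    Block allocate(String name) {
        Block b = new Block(name);
        blocks.add(b);
        return b;
    }

    // Mark phase: follow references from the roots, marking each block once.
    void mark() {
        Deque<Block> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Block b = pending.pop();
            if (b.marked) continue; // already visited: the chain looped back
            b.marked = true;
            pending.addAll(b.refs);
        }
    }

    // Sweep phase: scan all blocks linearly, reclaim the unmarked ones,
    // and clear the marks so that a future cycle can start afresh.
    List<String> sweep() {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Block> it = blocks.iterator();
        while (it.hasNext()) {
            Block b = it.next();
            if (b.marked) {
                b.marked = false;
            } else {
                reclaimed.add(b.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    List<String> collect() {
        mark();
        return sweep();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Block a = heap.allocate("a");
        Block b = heap.allocate("b");
        Block c = heap.allocate("c");
        Block d = heap.allocate("d");
        heap.roots.add(a);
        a.refs.add(b);
        b.refs.add(a); // a cycle that is reachable from a root
        c.refs.add(d);
        d.refs.add(c); // a cycle that no root reaches: garbage
        System.out.println(heap.collect()); // prints [c, d]
    }
}
```

Note that c and d still reference each other when they are reclaimed. Because marking starts from the root set, a cycle that no root can reach is treated as garbage.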
o-o-O-o-o
The Finalization Activity (Very Optional Reading)
So, what about those objects on the queue awaiting finalization?
There is another background thread that runs in the JVM (just like the garbage collector itself), that is
started or kept alive just when there are objects on the “finalization queue.”
This thread, when it runs (it is in the background), takes an object off the finalization queue, marks it as
finalized (note that this happens before the next step), and then causes that object’s finalize method to be
executed. While the finalize method is being executed, there is still an active reference, hence the garbage
collector cannot collect this object until the finalize method completes. Once the finalize method is set up
to happen, the finalization thread pays no more attention to that object, and moves on to the next object in
the finalization queue.
All of this means that, depending on how long it takes to mark all the storage in use, depending on how
much of the storage area is involved in the subsequent sweep, depending on how long it takes the garbage
collector to put an object with a non-trivial finalize method on the finalization queue, and depending on
how long it takes the background finalization thread to reach that object, an object might eventually be
finalized. In fact, if the program finishes before all this happens (the JVM shuts down once the last
non-daemon thread, often the main thread, ends), the finalize method may never be called, or may never complete. As well, there are
other reasons for the JVM to shut down. For example, an uncaught exception may cause the JVM to shut
down even earlier.
Thus, even if the programmer provides a finalize method for a particular class, there is
no guarantee that for any given instance of that class it will ever be used or completed.
Some writers have been known to write, “the Java finalize method of a class is the equivalent to the
destructor of a class in C++.” Nothing could be further from the truth. A destructor is called, for local
objects, as soon as the object goes out of scope, and for heap objects, exactly when the programmer uses
the delete (or delete[]) operation. Thus, it is totally predictable when a destructor will be executed. When,
or even if, a finalize method will be called, is quite close to unpredictable. It will always be later than
when an object becomes eligible for collection, but whether it happens at all after that is never known in
advance.
It might well be asked, what good then is the finalize method?
It is there so that if something must be done before an actual recovery of the memory of the object is
performed, then that something can be done. This is a very small number of things. If external resources
(such as an open file, or a database connection) must be properly closed and returned by an object, then
that object should be provided with a close or equivalently named method that the programmer should
arrange to be called at the appropriate moment (usually the earliest such moment). It is possible for
programmer-written code to directly call a finalize method, but this is discouraged. A direct call to a
finalize method is not observed as being an execution of the finalize method for the purposes of the
garbage collector, so it may end up being called twice or more.
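One common way to arrange for a close method to be called at the earliest appropriate moment is the try-with-resources statement, which calls close() automatically when the block is left. The following is a minimal sketch: ManagedConnection and CloseDemo are hypothetical names, and the boolean flag merely stands in for a real external resource such as a file or database connection.

```java
class ManagedConnection implements AutoCloseable {
    private boolean open = true; // stands in for a real external resource

    boolean isOpen() {
        return open;
    }

    void send(String message) {
        if (!open) {
            throw new IllegalStateException("connection is closed");
        }
        System.out.println("sent: " + message);
    }

    @Override
    public void close() {
        // Safe to call more than once: closing a closed connection does nothing.
        open = false;
    }
}

class CloseDemo {
    static boolean closedAfterUse() {
        ManagedConnection connection = new ManagedConnection();
        try (ManagedConnection c = connection) {
            c.send("hello");
        } // close() is called here, whether or not send threw an exception
        return !connection.isOpen();
    }

    public static void main(String[] args) {
        System.out.println("closed after use: " + closedAfterUse()); // prints closed after use: true
    }
}
```

Unlike finalization, the moment of release here is fully predictable: it is the end of the try block.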

Strange Happenings in Finalization
Note that some strange, but interesting, things can happen in the course of performing finalization. [Do
not attempt to memorize this section—it is interesting detail but is subject matter for (potential) experts.]
From the above description, it can be seen that the finalize method is only called once as part of the
garbage collection strategy. The finalization thread, after setting up the finalize method to be executed,
just “forgets” about the object. Since the finalization thread does not keep a reference to the object after
the finalize method is finished, the object should again become inaccessible, so eventually (!) the garbage
collector may again discover that this object is (again) inaccessible and will go through the whole process
again. But this time, the garbage collector will discover that finalize has already been called for this object
(because the finalization thread has so marked the object), and as a result the garbage collection will
proceed to the actual recovery of the memory used. (There are safety interlocks in all this to ensure that
the object’s memory is not recovered until the finalize method completes—as an exercise, consider the
situation where the finalize method goes into an infinite loop.)
It is possible that, while executing the finalize method, the object causes a new reference to itself to
be created (e.g., it adds itself to an event list in, say, some Swing or JavaFX container). This object
is then no longer inaccessible, because there is now an active reference to it. (When this happens, it is called a
“resurrection” of the object.) A garbage collection at this point will, in the mark phase, come to the
conclusion that the object is “in use.” The object can go on doing all sorts of things after its resurrection;
perhaps collecting other resources (references to other objects, together with their memory, external
resources, such as a login session, and so on). When at last this object again becomes inaccessible
(assuming it does), eventually the garbage collector will again find it, will see that the resurrected object
has already been finalized, and the finalize method is not called a second time, even though its finalize
method may be capable of returning all the resources that have been gathered in the meantime! These
are situations over which extreme care must be taken because whatever happens at this point is totally the
responsibility (or fault) of the programmer.
The finalize method must be very carefully written.
One matter that must be carefully considered is that constructors do not always complete. The execution
of a constructor may end prematurely when the constructor itself, or a method it uses, throws an
Exception. In this case, no reference to this object is passed to the caller of the constructor, but the object
still exists! This incomplete object is a prime candidate for being collected, and, if it has a non-trivial
finalize method, the finalize method will be called. The code of the finalize method must be able to detect
which parts of the object have been initialized, and which parts have not, depending upon where and
when in the constructor the exception occurred. This makes for very tricky code, and, since it is in a
finalize method, makes it very difficult to test.
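The following sketch shows an incomplete object and cleanup code that tolerates it. It deliberately avoids finalize: PartialResource, its fields, and the release method are hypothetical, and the static lastAttempt field exists only so that the example can inspect the object after its constructor has failed (letting this escape from a constructor is not recommended in real code).

```java
import java.util.ArrayList;
import java.util.List;

class PartialResource {
    // Demo-only hook: the constructor stores 'this' here so the example can
    // inspect an object whose constructor never completed.
    static PartialResource lastAttempt;

    private String first;  // stands in for a resource acquired early
    private String second; // stays null if the constructor fails halfway

    PartialResource(boolean failHalfway) {
        lastAttempt = this;
        first = "first";
        if (failHalfway) {
            throw new IllegalStateException("constructor failed halfway");
        }
        second = "second";
    }

    // Cleanup that tolerates partial initialization: it releases only the
    // parts that were actually set up, and reports which ones they were.
    List<String> release() {
        List<String> released = new ArrayList<>();
        if (first != null) {
            released.add(first);
            first = null;
        }
        if (second != null) {
            released.add(second);
            second = null;
        }
        return released;
    }

    public static void main(String[] args) {
        try {
            new PartialResource(true);
        } catch (IllegalStateException e) {
            System.out.println("caller received no reference: " + e.getMessage());
        }
        // The object exists even though its constructor never completed.
        System.out.println(lastAttempt.release()); // prints [first]
    }
}
```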
Why should a programmer never call finalize explicitly?
[From this section, it is important to remember the last line.]
As was mentioned above, the finalize method must be written very carefully. Not only does it need to take
care of incomplete objects, but it is also required to clean up completely everything that may happen
during the lifetime of the object. The usual assumption in doing this is that the finalize method will be
called once, and never again (see the discussion above, where the garbage collector takes care not to call
the finalize method twice).
If the finalize method is called explicitly (i.e., without the help of the garbage collector) there will be no
record that finalize has been called. And while it may be possible for a programmer to take some
precautions to ensure calling finalize twice does no harm, this cannot be guaranteed for all objects, and (as
will be seen below), the finalize method, when used correctly, will call the finalize method of its
superclass, and that one will call the next finalize method up the chain, and so on.

It is difficult enough to write correct finalize methods, without also undertaking to make such a method a
general purpose method available as a normal part of a class API.
So, do not, repeat, do not, call finalize methods explicitly.
What is the correct form for a finalize method?
Firstly, it must be noted that all classes inherit a finalize method, hence all objects have a finalize method.
The garbage collector can distinguish between a finalize method that is inherited unchanged from Object
(and so this will be a trivial finalize method, that does nothing) and one that overrides that method. The
overriding method may still do nothing—but if the finalize method of Object is overridden, it is normally
assumed to be a non-trivial implementation (although some versions of the JVM do try to improve on this
by working out if anything more than the default finalize is performed).
A finalize method should take the form:
@Override
protected void finalize() throws Throwable
{
    try
    {
        // Perform your finalization tasks here (with a catch
        // or more if needed)
    }
    finally
    {
        super.finalize();
    }
}
This is different to most overridden methods, where the more usual practice is to call the super version of
the method (if needed at all) as the first operation (or at least, somewhere near the beginning). With
finalize, calling the super version of the finalize method is done last. In this form, if the finalization tasks
cause an exception to be thrown, then that exception will be held. Now, irrespective of whether an
exception was thrown or not, the call in the finally block will be attempted. If this completes normally,
then, if there is a held exception, it will be thrown, otherwise the finalize method will complete normally.
If the call in the finally block itself causes an exception, then it is that exception that
will be thrown (irrespective of whether there is another exception being held or not). In any event,
whatever exception is eventually thrown, if one is thrown at all, it is ignored by the finalization thread (no
further action is taken), and the execution of the finalize method is complete.
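The rule about which exception wins is ordinary try/finally behaviour, so it can be checked without involving the garbage collector at all. In this sketch the two thrown exceptions merely stand in for a failing finalization task and a failing super.finalize() call; FinallyDemo and its method names are hypothetical.

```java
class FinallyDemo {
    // Mimics the shape of the recommended finalize method with plain code.
    static void run(boolean taskFails, boolean superFails) {
        try {
            if (taskFails) {
                throw new IllegalStateException("from finalization tasks");
            }
        } finally {
            // Stands in for super.finalize(); it runs in every case.
            if (superFails) {
                throw new IllegalArgumentException("from super.finalize()");
            }
        }
    }

    // Reports which exception, if any, reaches the caller.
    static String outcome(boolean taskFails, boolean superFails) {
        try {
            run(taskFails, superFails);
            return "completed normally";
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(false, false)); // prints completed normally
        System.out.println(outcome(true, false));  // prints from finalization tasks
        System.out.println(outcome(false, true));  // prints from super.finalize()
        System.out.println(outcome(true, true));   // prints from super.finalize()
    }
}
```

In the last case the exception from the try block is discarded in favour of the one thrown in the finally block, which matches the description above.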
If the finalization tasks are known and can be proven in advance not to throw exceptions, then this pattern
can be simplified. If the finalization tasks are simple assignment of values, or other very simple
manipulations of the state of the JVM, then it is possible that the throwing of exceptions can be avoided.
But be careful: closing files, arithmetic divisions, assignments to arrays, using a reference to access a
member of another object, can all cause exceptions to be thrown, and the above form is recommended. If
no exceptions will be thrown, then the following form may be used:
@Override
protected void finalize() throws Throwable
{
    // Perform your finalization tasks here
    super.finalize();
}
Keep in mind that exceptions thrown by finalize during the garbage collection process halt the finalization
of this object, but otherwise have no effect on the finalization of other objects.

For further detail on the use and semantics of the finalize method, see the Java™ Language Specification:
Java SE 16 Edition, Chapter 12, Section 12.6.

It is because there are so many opportunities for a finalize method to go wrong (or be misused in a variety
of ways) that the method itself has been @deprecated.
