04 Internals of Garbage Collection (GC)

Definition and History

In computer science, garbage collection (GC) is a form of automatic memory
management. It is a special case of resource management in which the limited
resource being managed is memory. The garbage collector, or just collector,
attempts to reclaim garbage: memory occupied by objects that are no longer in
use by the program. Garbage collection was invented by John McCarthy around
1959 to solve problems in Lisp. Lisp is the second-oldest high-level
programming language after Fortran. It is a functional programming language
that emphasizes the application of functions.

John McCarthy
(September 4, 1927 – October 23, 2011).
Professor of computer science,
helped to invent the field of
Artificial Intelligence (AI).
Terminology
“Root Objects” are the objects from which references to other objects start.
They are the entry points for the GC and can never become unreachable.
“Root Set” refers to the collection of root objects.

In the object reference graph, objects in blue are “Reachable Objects”
(active references), the object in red is an “Unreachable Object”
(out-of-scope references), and the arrows show the references between objects.

Object Reference Graph
(In graph-theory terms this is a directed, rooted, finite graph.)
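
The terminology above can be made concrete with a small sketch. It assumes a hypothetical object graph given as a dictionary of names (A to E are invented for illustration) and computes which objects are reachable from the root set. It is a toy model, not how any particular runtime stores its graph.

```python
from collections import deque

def reachable(roots, references):
    """Return the set of objects reachable from the root set.

    `references` maps each object name to the names it refers to.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for target in references.get(obj, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical graph: A and B form the root set; nothing reachable points at E.
references = {"A": ["C"], "B": ["C", "D"], "C": [], "D": ["A"], "E": ["D"]}
live = reachable(["A", "B"], references)
garbage = set(references) - live
print(sorted(live), sorted(garbage))
```

Note that E refers to D, yet E itself is garbage: only incoming paths from the root set count.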
Algorithms

Tri-color Marking (white, grey & black)

Mark-and-Sweep collection

Mark-and-don’t-Sweep

Copying garbage collection

Mark-Sweep-Compact garbage collection

Generational garbage collection

Stop-the-world garbage collection

Incremental garbage collection

Concurrent garbage collection


Mark-Sweep-Compact

Mark
• Traverse the entire memory graph starting from the roots and mark every
object that is visited.

Sweep
• Remove all objects that were not visited, scanning objects from the first
address in the heap.
• If the mark flag is not set => reclaim the object.
• If the mark flag is set => clear the flag.

Compact
• Defragment the memory fragmented by Sweep. This speeds up subsequent
memory access.
Types of References

Strong Reference
• The garbage collector can reclaim only objects that have no strong references. An object
that is reachable is said to have a “Strong Reference” and cannot be collected by the
garbage collector.

Weak Reference
• An object is eligible for garbage collection if no strong references point to it,
irrespective of the number of weak references that point to it. The object a weak
reference points to is known as the target.
• The primary advantage of holding only a weak reference to an object is that the garbage
collector can still reclaim the object's memory when the managed heap runs low.
• There are 2 types of weak reference:

Short: The target of a short weak reference becomes null when the object is reclaimed
by garbage collection.
Long: A long weak reference is retained after the object's Finalize method has been
called.
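
As a rough illustration of the short weak reference behaviour, the sketch below uses Python's standard `weakref` module as a stand-in for the .NET `WeakReference` type (an assumption made only so the example is runnable; the `Node` class is hypothetical). Python has no direct equivalent of a long weak reference, so that case is not shown.

```python
import gc
import weakref

class Node:
    """Plain class; its instances support weak references."""
    def __init__(self, name):
        self.name = name

target = Node("cache entry")      # strong reference
weak = weakref.ref(target)        # weak reference; does not keep the target alive

alive_before = weak() is target   # the weak reference still resolves to the target

del target                        # drop the only strong reference
gc.collect()                      # force a collection so timing does not matter

alive_after = weak() is not None  # the weak reference now yields None
print(alive_before, alive_after)
```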
Mark Bit

The mark bit is part of the object header itself. The header is 32 bits in size in the
.NET Compact Framework. The GC relies on this header information.
Mark – Sweep in action

GC Execution – When / How

A collection is triggered when:
• The system is low on memory.
• Memory is too fragmented.
• A quantum of allocation has been consumed.
• A memory allocation fails; the CLR (EE, the Execution Engine) then initiates the GC.

• Threads are executed by the OS, not by the CLR.
• The exact trigger conditions vary from one CLR implementation to another, but a
collection always fires when an allocation fails for lack of available memory.
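
Two of these triggers can be sketched with a toy allocator. Everything here is hypothetical (the `ToyHeap` class, its unit-less sizes, and a `collect` that simply pretends all objects were garbage); it only shows where the checks would sit, not how the CLR decides.

```python
class ToyHeap:
    """Toy allocator; sizes are in arbitrary units and all names are hypothetical."""

    def __init__(self, capacity, quantum):
        self.capacity = capacity      # total heap size
        self.quantum = quantum        # allocation budget between collections
        self.used = 0
        self.since_gc = 0
        self.collections = []         # reason recorded for each collection

    def collect(self, reason):
        # Stand-in for a real collection: pretend everything was garbage.
        self.collections.append(reason)
        self.used = 0
        self.since_gc = 0

    def allocate(self, size):
        if self.used + size > self.capacity:
            self.collect("allocation failed")
            if self.used + size > self.capacity:
                raise MemoryError("object larger than the heap")
        elif self.since_gc + size > self.quantum:
            self.collect("allocation quantum exceeded")
        self.used += size
        self.since_gc += size

heap = ToyHeap(capacity=100, quantum=40)
for size in (30, 30, 90):
    heap.allocate(size)
print(heap.collections)
```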
GC Execution
The JIT maintains the bookkeeping of root objects, stack, registers and heap, which the
GC uses.

Note: In ASP.Net applications it is highly recommended to set references you no longer
need to null, because the Server GC cleans them up whenever it kicks off, and you never
know which request/response will trigger it.
Cyclic Reference

Because the mark logic calls Mark() recursively, a cyclic reference sends it into an
infinite loop unless each object is flagged as marked when it is visited.
Mark algorithm

The !Marked() check ensures that each object is marked only once and avoids an
endless recursive loop.
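
A minimal sketch of such a mark routine, assuming a hypothetical `Obj` class with a `marked` flag and a list of outgoing references. Real collectors typically use an explicit stack rather than recursion; recursion is kept here to mirror the description.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references
        self.marked = False   # the mark bit

def mark(obj):
    if obj.marked:            # already visited: stop here, this breaks cycles
        return
    obj.marked = True
    for child in obj.refs:
        mark(child)

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)
b.refs.append(a)              # cyclic reference: a -> b -> a
mark(a)                       # terminates because of the marked check
print(a.marked, b.marked, c.marked)
```

Object `c` is never reached from `a`, so its mark bit stays clear and a later sweep would reclaim it.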
Sweep algorithm

Sweep scans from the beginning of the heap to the end of the heap.
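
The linear scan can be sketched as follows, assuming the heap is modelled as a Python list whose indices stand in for addresses and whose entries carry a `marked` flag already set by a previous mark phase. The layout is invented for illustration.

```python
def sweep(heap):
    """Scan the heap from the first slot to the last.

    Unmarked objects are reclaimed; marked objects survive and have
    their flag cleared for the next cycle. Returns the reclaimed names.
    """
    reclaimed = []
    for address, obj in enumerate(heap):
        if obj is None:
            continue                      # slot is already free
        if obj["marked"]:
            obj["marked"] = False         # survivor: clear the flag
        else:
            reclaimed.append(obj["name"])
            heap[address] = None          # unmarked: reclaim the slot
    return reclaimed

heap = [
    {"name": "a", "marked": True},
    {"name": "b", "marked": False},
    {"name": "c", "marked": True},
    {"name": "d", "marked": False},
]
freed = sweep(heap)
print(freed)
```

After this pass the heap has holes at slots 1 and 3, which is the fragmentation that compaction addresses.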
Compaction

Done to:
• Decrease fragmentation
• Increase allocation speed
• Increase locality of reference

“Locality of reference”: objects sitting side by side, which makes memory faster to
read. Memory is managed in pages of 4 KB (4096 bytes). When objects are loaded into the
cache, keeping related objects side by side (e.g. a class and the objects it contains)
improves the cache hit rate.

Compaction is a very slow process, since data moves from one location to another and
every reference to it must be updated. Hence compaction planning is done before this
step.

• Safe C# code does not expose raw pointers, which is what lets the GC move objects.
• Compaction runs based on fragmentation.
• A balance must be struck, as compaction is very costly.
• Elaborate language/platform support is required.
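
The plan-then-move idea can be sketched with a forwarding table. This is a toy sliding compactor over a list-based heap where references are stored as slot indices; it assumes sweep has already run, so every stored reference points at a live object. The data layout is hypothetical.

```python
def compact(heap):
    """Slide live objects to the start of the heap and fix up references.

    `heap` is a list where None is a free slot and each live object is a
    dict whose 'refs' field holds heap indices of other live objects.
    """
    # Plan: compute the new address of every live object.
    forwarding = {}
    next_free = 0
    for old, obj in enumerate(heap):
        if obj is not None:
            forwarding[old] = next_free
            next_free += 1

    # Move: place objects at their new addresses and rewrite their references.
    compacted = [None] * len(heap)
    for old, new in forwarding.items():
        obj = heap[old]
        obj["refs"] = [forwarding[r] for r in obj["refs"]]
        compacted[new] = obj
    return compacted, forwarding

heap = [
    {"name": "a", "refs": [2]},
    None,
    {"name": "c", "refs": [4]},
    None,
    {"name": "e", "refs": []},
]
heap, forwarding = compact(heap)
print([o["name"] if o else None for o in heap], forwarding)
```

The reference rewriting step is why the cost grows with the number of live objects and why raw pointers held outside the collector's knowledge would break it.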
Generational GC
To optimize compaction, Generational GC is used. It builds on the observation that
most objects die young, known as “infant mortality” or the “generational hypothesis”.

.Net uses 4 generations: 0, 1 and 2 under one heap, and the Large Object Generation
under a separate heap.
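
Promotion between generations 0, 1 and 2 can be sketched as below. The model is deliberately simplified: generations are sets of names, liveness is passed in rather than computed, and the large object heap is not modelled. The function and variable names are hypothetical.

```python
def collect(generations, gen, live):
    """Collect generation `gen` and everything younger.

    `generations` is a list of three sets (gen 0, 1, 2) of object names;
    `live` is the set of names still reachable. Survivors move up one
    generation; generation 2 survivors stay in generation 2.
    """
    for g in range(gen, -1, -1):              # oldest collected generation first
        survivors = generations[g] & live
        generations[g] = set()
        generations[min(g + 1, 2)] |= survivors

generations = [{"a", "b", "c"}, set(), set()]
collect(generations, 0, live={"a", "b"})      # c dies young; a and b move to gen 1
generations[0] |= {"d", "e"}                  # new allocations land in gen 0
collect(generations, 1, live={"a", "d"})      # a -> gen 2, d -> gen 1; b and e die
print(generations)
```

A generation 0 collection never touches the older sets, which is where the saving comes from when most objects die young.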
Incremental Collection

A reference from a higher-generation object to a lower-generation object would be
missed when only the lower generation is collected. To avoid this, the CLR checks for
this situation each time an object reference is assigned. If it exists, the CLR records
the reference in a table called the ‘Card Table’, which the GC consults. This marks the
blue object in the figure as a new root.
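
A write barrier with a card table can be sketched as follows. The card size, the `Heap` class and the generation check inside `write_ref` are assumptions chosen for readability; the CLR's actual barrier and card granularity differ.

```python
CARD_SIZE = 4          # heap slots covered by one card (arbitrary for this sketch)

class Heap:
    def __init__(self, size):
        self.generation = [0] * size                  # generation of the object in each slot
        self.fields = [[] for _ in range(size)]       # outgoing references per slot
        self.cards = [False] * ((size + CARD_SIZE - 1) // CARD_SIZE)

    def write_ref(self, src, dst):
        """Write barrier: runs on every reference assignment."""
        self.fields[src].append(dst)
        if self.generation[src] > self.generation[dst]:
            self.cards[src // CARD_SIZE] = True       # older object now points at a younger one

    def extra_roots(self):
        """Slots on dirty cards that a young-generation collection must also scan."""
        return [s for s in range(len(self.fields))
                if self.cards[s // CARD_SIZE]
                and any(self.generation[s] > self.generation[d] for d in self.fields[s])]

heap = Heap(8)
heap.generation[5] = 2        # slot 5 holds an old object
heap.write_ref(5, 1)          # old -> young: card 1 becomes dirty
heap.write_ref(0, 2)          # young -> young: no card is touched
print(heap.cards, heap.extra_roots())
```

During a young-generation collection only the dirty cards need scanning, instead of the whole older generation.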
Finalization

The Finalization Active List (FAL) holds the objects that implement a Finalizer()
block.

When the GC runs and an object with a finalizer is marked for sweep, Sweep() finds its
reference in the FAL and moves it to the Finalization Ready List (FRL).

Soon after GC run 1 is done, the finalizer thread is kicked off. This thread looks in
the FRL, fires the Finalizer() of each object and removes its reference from the FRL.

• When GC run 2 kicks off, it checks for references in the FAL; when none is found it
removes the object. Objects with a finalizer therefore always survive at least one
collection and are promoted to a higher generation, which makes them very expensive.

• A finalizer is not guaranteed to run: the finalizer thread calls each object's
Finalizer() in turn, and if one of them breaks or hangs, the entire thread is aborted,
so the Finalizer() of the remaining objects is never called.

• To ensure that the finalizer is called, derive the class from
CriticalFinalizerObject.
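
The two-run life cycle can be traced with a toy model. The class below is hypothetical and keeps only the two lists named in the text; liveness is passed in by hand and the finalizer thread is an ordinary method call rather than a real thread.

```python
class ToyFinalizationGC:
    """Toy model of the FAL and FRL; names follow the text above."""

    def __init__(self):
        self.heap = set()
        self.fal = set()      # Finalization Active List
        self.frl = []         # Finalization Ready List
        self.finalized = []   # log of finalizers that ran

    def allocate(self, name, has_finalizer=False):
        self.heap.add(name)
        if has_finalizer:
            self.fal.add(name)

    def collect(self, live):
        for name in sorted(self.heap - set(live)):
            if name in self.fal:
                # Unreachable but finalizable: keep the memory, queue the finalizer.
                self.fal.remove(name)
                self.frl.append(name)
            elif name not in self.frl:
                self.heap.remove(name)        # plain garbage, or already finalized

    def run_finalizer_thread(self):
        while self.frl:
            self.finalized.append(self.frl.pop(0))

toy = ToyFinalizationGC()
toy.allocate("plain")
toy.allocate("file", has_finalizer=True)

toy.collect(live=[])              # GC run 1
after_run_1 = set(toy.heap)       # "file" survives, "plain" is gone
toy.run_finalizer_thread()        # the finalizer of "file" runs
toy.collect(live=[])              # GC run 2
after_run_2 = set(toy.heap)       # now "file" is reclaimed too
print(after_run_1, toy.finalized, after_run_2)
```

The object without a finalizer is reclaimed in one run, while the finalizable one needs two, which is the extra cost described above.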
.Net GC Implementation

UP: Uniprocessor; MP: Multiprocessor


Best Practices – What to avoid

• Object pools are used in C/C++, where a pool of memory is created to handle objects.

• The more you call GC.Collect, the more objects you promote to higher generations,
which prevents the GC from performing well. Hence do not use it.

• Boxing: a value type is converted to a reference type, which causes a heap
allocation; subsequent unboxing leaves the boxed object as garbage.

• Structs implementing interfaces: using a struct through an interface leads to
boxing, which is not recommended.
Best Practices - Finalization
Best Practices - Pinning

Pinning tells the GC not to move an object during compaction. It is used when managed
code passes a reference to unmanaged code. Unpin the object as soon as you are done
using it.
Setting GC Operation Mode in .Net Application

Choose GC: you can select the GC mode, Server GC or Concurrent GC, on a client or
server machine, which is very useful for high-performance applications such as WPF or
ASP.Net.
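
As an illustration, in a .NET Framework application configuration file the mode is commonly selected with the `gcServer` and `gcConcurrent` elements under `runtime`. The values below are only an example, not a recommendation, and where the file lives depends on the host (for instance app.config for a desktop application).

```xml
<configuration>
  <runtime>
    <!-- true selects Server GC; false selects Workstation GC -->
    <gcServer enabled="true"/>
    <!-- true enables concurrent (or, on CLR 4.0, background) collection -->
    <gcConcurrent enabled="true"/>
  </runtime>
</configuration>
```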
Types of GC Operation Mode

Stop-the-world GC
Simple stop-the-world garbage collectors completely halt execution of the program to
run a collection cycle; the halt is called the "embarrassing pause". This is suitable
for non-interactive programs (e.g. batch programs). It is both simpler to implement
and faster.

Incremental GC
Performs the garbage collection cycle in discrete phases, with program execution
permitted between each phase (and sometimes during some phases). Takes longer to
complete than one batch garbage collection pass.

Concurrent GC
Is generational and (almost) never stops program execution. It uses the stop-the-world
strategy for only a limited part of the heap ("generations 0 and 1"). The rest of the
heap is collected by a dedicated GC thread, which can run concurrently with the
application threads. However, there are moments when the GC thread must take an
exclusive lock on the heap.

Background GC
Is an enhanced GC in which the GC thread need not lock the heap. This removes the
extra pauses caused in Concurrent GC. Only the limited pauses remain when the young
generations are collected ("a foreground collection").
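
The difference between one stop-the-world pass and an incremental cycle can be sketched with tri-color marking done in slices. This is a toy under a strong assumption: the object graph does not change between slices. A real incremental or concurrent collector needs a write barrier to stay correct while the program runs. The graph and names are hypothetical.

```python
def incremental_mark(roots, references, budget):
    """Tri-color marking done in slices of at most `budget` objects.

    White = not yet seen, grey = seen but not scanned, black = fully scanned.
    Yields after each slice, which is where the program would resume.
    """
    grey = list(roots)
    black = set()
    while grey:
        for _ in range(min(budget, len(grey))):
            obj = grey.pop()
            if obj in black:
                continue
            black.add(obj)
            grey.extend(t for t in references.get(obj, ()) if t not in black)
        yield set(black)          # snapshot after this slice

references = {"r": ["a", "b"], "a": ["c"], "b": [], "c": ["r"], "x": ["a"]}
slices = list(incremental_mark(["r"], references, budget=2))
marked = slices[-1]
print(len(slices), sorted(marked))
```

With an unlimited budget the loop finishes in a single slice, which corresponds to the stop-the-world case; a small budget spreads the same work over several short pauses.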
Types of GC Operation Mode in CLR version

.Net Framework (FW) uses Generational GC. The table below shows the GC operation mode
supported in each CLR version.

Note: Stop-the-world and Incremental GC are subsets of Concurrent & Background GC mode.

FW Version  CLR Version  Type of GC  Visual Studio            Default in Windows
1.0         1.0          Concurrent  Visual Studio .NET       Windows XP Tablet and Media Center Editions
1.1         1.1          Concurrent  Visual Studio .NET 2003  Windows Server 2003
2.0         2.0          Concurrent  Visual Studio 2005       Windows Server 2003 R2
3.0         2.0          Concurrent  -                        Windows Vista, Windows Server 2008
3.5         2.0          Concurrent  Visual Studio 2008       Windows 7, Windows Server 2008 R2
4.0         4.0          Background  Visual Studio 2010       -

Latency in Background GC vs Concurrent GC
These charts come from performance testing done by Microsoft and presented at the
Professional Developers Conference (PDC). They show how the new background collection
algorithm should greatly reduce latency.
