Tracing garbage collection explained

In computer programming, tracing garbage collection is a form of automatic memory management that consists of determining which objects should be deallocated ("garbage collected") by tracing which objects are reachable by a chain of references from certain "root" objects, and considering the rest as "garbage" and collecting them. Tracing is the most common type of garbage collection – so much so that "garbage collection" often refers to the tracing method, rather than others such as reference counting – and there are a large number of algorithms used in implementation.

Reachability of an object

Informally, an object is reachable if it is referenced by at least one variable in the program, either directly or through references from other reachable objects. More precisely, objects can be reachable in only two ways:

  1. A distinguished set of roots: objects that are assumed to be reachable. Typically, these include all the objects referenced from anywhere in the call stack (that is, all local variables and parameters in the functions currently being invoked), and any global variables.
  2. Anything referenced from a reachable object is itself reachable; more formally, reachability is a transitive closure.

The reachability definition of "garbage" is not optimal, insofar as the last time a program uses an object could be long before that object falls out of the environment scope. A distinction is sometimes drawn between syntactic garbage, those objects the program cannot possibly reach, and semantic garbage, those objects the program will in fact never again use. For example:Object x = new Foo;Object y = new Bar;x = new Quux;/* At this point, we know that the Foo object * originally assigned to x will never be * accessed: it is syntactic garbage. */

/* In the following block, y *could* be semantic garbage; * but we won't know until x.check_something returns * some value -- if it returns at all. */if (x.check_something) System.exit(0);

The problem of precisely identifying semantic garbage can easily be shown to be partially decidable: a program that allocates an object X, runs an arbitrary input program P, and uses X if and only if P finishes would require a semantic garbage collector to solve the halting problem. Although conservative heuristic methods for semantic garbage detection remain an active research area, essentially all practical garbage collectors focus on syntactic garbage.

Another complication with this approach is that, in languages with both reference types and unboxed value types, the garbage collector needs to somehow be able to distinguish which variables on the stack or fields in an object are regular values and which are references: in memory, an integer and a reference might look alike. The garbage collector then needs to know whether to treat the element as a reference and follow it, or whether it is a primitive value. One common solution is the use of tagged pointers.

Strong and weak references

The garbage collector can reclaim only objects that have no references pointing to them either directly or indirectly from the root set. However, some programs require weak references, which should be usable for as long as the object exists but should not prolong its lifetime. In discussions about weak references, ordinary references are sometimes called strong references. An object is eligible for garbage collection if there are no strong (i.e. ordinary) references to it, even though there still might be some weak references to it.

A weak reference is not merely just any pointer to the object that a garbage collector does not care about. The term is usually reserved for a properly managed category of special reference objects which are safe to use even after the object disappears because they lapse to a safe value (usually null). An unsafe reference that is not known to the garbage collector will simply remain dangling by continuing to refer to the address where the object previously resided. This is not a weak reference.

In some implementations, weak references are divided into subcategories. For example, the Java Virtual Machine provides three forms of weak references, namely soft references,[1] phantom references,[2] and regular weak references.[3] A softly referenced object is only eligible for reclamation if the garbage collector decides that the program is low on memory. Unlike a soft reference or a regular weak reference, a phantom reference does not provide access to the object that it references. Instead, a phantom reference is a mechanism that allows the garbage collector to notify the program when the referenced object has become phantom reachable. An object is phantom reachable, if it still resides in memory and it is referenced by a phantom reference, but its finalizer has already executed. Similarly, Microsoft.NET provides two subcategories of weak references,[4] namely long weak references (tracks resurrection) and short weak references.

Weak collections

Data structures can also be devised which have weak tracking features. For instance, weak hash tables are useful. Like a regular hash table, a weak hash table maintains an association between pairs of objects, where each pair is understood to be a key and value. However, the hash table does not actually maintain a strong reference on these objects. Special behavior takes place when either the key or value or both become garbage: the hash table entry is spontaneously deleted. There exist further refinements such as hash tables which have only weak keys (value references are ordinary, strong references) or only weak values (key references are strong).

Weak hash tables are important for maintaining associations between objects, such that the objects engaged in the association can still become garbage if nothing in the program refers to them any longer (other than the associating hash table).

The use of a regular hash table for such a purpose could lead to a "logical memory leak": the accumulation of reachable data which the program does not need and will not use.

Basic algorithm

Tracing collectors are so called because they trace through the working set of memory. These garbage collectors perform collection in cycles. It is common for cycles to be triggered when there is not enough free memory for the memory manager to satisfy an allocation request. But cycles can often be requested by the mutator directly or run on a time schedule. The original method involves a naïve mark-and-sweep in which the entire memory set is touched several times.

Naïve mark-and-sweep

In the naive mark-and-sweep method, each object in memory has a flag (typically a single bit) reserved for garbage collection use only. This flag is always cleared, except during the collection cycle.

The first stage is the mark stage which does a tree traversal of the entire 'root set' and marks each object that is pointed to by a root as being 'in-use'. All objects that those objects point to, and so on, are marked as well, so that every object that is reachable via the root set is marked.

In the second stage, the sweep stage, all memory is scanned from start to finish, examining all free or used blocks; those not marked as being 'in-use' are not reachable by any roots, and their memory is freed. For objects which were marked in-use, the in-use flag is cleared, preparing for the next cycle.

This method has several disadvantages, the most notable being that the entire system must be suspended during collection; no mutation of the working set can be allowed. This can cause programs to 'freeze' periodically (and generally unpredictably), making some real-time and time-critical applications impossible. In addition, the entire working memory must be examined, much of it twice, potentially causing problems in paged memory systems.

Tri-color marking

Because of these performance problems, most modern tracing garbage collectors implement some variant of the tri-color marking abstraction, but simple collectors (such as the mark-and-sweep collector) often do not make this abstraction explicit. Tri-color marking works as described below.

Three sets are created white, black and grey:

In many algorithms, initially the black set starts as empty, the grey set is the set of objects which are directly referenced from roots and the white set includes all other objects. Every object in memory is at all times in exactly one of the three sets. The algorithm proceeds as following:

  1. Pick an object o from the grey set
  2. Move each white object o references to the grey set. This ensures that neither this object nor any object it references can be garbage-collected.
  3. Move o to the black set
  4. Repeat the last three steps until the grey set is empty.

When the grey set is empty, the scan is complete; the black objects are reachable from the roots, while the white objects are not and can be garbage-collected.

Since all objects not immediately reachable from the roots are added to the white set, and objects can only move from white to grey and from grey to black, the algorithm preserves an important invariant no black objects reference white objects. This ensures that the white objects can be freed once the grey set is empty. This is called the tri-color invariant. Some variations on the algorithm do not preserve this invariant but use a modified form for which all the important properties hold.

The tri-color method has an important advantage it can be performed "on-the-fly", without halting the system for significant periods of time. This is accomplished by marking objects as they are allocated and during mutation, maintaining the various sets. By monitoring the size of the sets, the system can perform garbage collection periodically, rather than as needed. Also, the need to touch the entire working set on each cycle is avoided.

Implementation strategies

Moving vs. non-moving

Once the unreachable set has been determined, the garbage collector may simply release the unreachable objects and leave everything else as it is, or it may copy some or all of the reachable objects into a new area of memory, updating all references to those objects as needed. These are called "non-moving" and "moving" (or, alternatively, "non-compacting" and "compacting") garbage collectors, respectively.

At first, a moving algorithm may seem inefficient compared to a non-moving one, since much more work would appear to be required on each cycle. But the moving algorithm leads to several performance advantages, both during the garbage collection cycle itself and during program execution:

One disadvantage of a moving garbage collector is that it only allows access through references that are managed by the garbage collected environment, and does not allow pointer arithmetic. This is because any pointers to objects will be invalidated if the garbage collector moves those objects (they become dangling pointers). For interoperability with native code, the garbage collector must copy the object contents to a location outside of the garbage collected region of memory. An alternative approach is to pin the object in memory, preventing the garbage collector from moving it and allowing the memory to be directly shared with native pointers (and possibly allowing pointer arithmetic).[5]

Copying vs. mark-and-sweep vs. mark-and-don't-sweep

Not only do collectors differ in whether they are moving or non-moving, they can also be categorized by how they treat the white, grey and black object sets during a collection cycle.

The most straightforward approach is the semi-space collector, which dates to 1969. In this moving collector, memory is partitioned into an equally sized "from space" and "to space". Initially, objects are allocated in "to space" until it becomes full and a collection cycle is triggered. At the start of the cycle, the "to space" becomes the "from space", and vice versa. The objects reachable from the root set are copied from the "from space" to the "to space". These objects are scanned in turn, and all objects that they point to are copied into "to space", until all reachable objects have been copied into "to space". Once the program continues execution, new objects are once again allocated in the "to space" until it is once again full and the process is repeated.

This approach is very simple, but since only one semi space is used for allocating objects, the memory usage is twice as high compared to other algorithms. The technique is also known as stop-and-copy. Cheney's algorithm is an improvement on the semi-space collector.

A mark and sweep garbage collector keeps a bit or two with each object to record if it is white or black. The grey set is kept as a separate list or using another bit. As the reference tree is traversed during a collection cycle (the "mark" phase), these bits are manipulated by the collector. A final "sweep" of the memory areas then frees white objects. The mark and sweep strategy has the advantage that, once the condemned set is determined, either a moving or non-moving collection strategy can be pursued. This choice of strategy can be made at runtime, as available memory permits. It has the disadvantage of "bloating" objects by a small amount, as in, every object has a small hidden memory cost because of the list/extra bit. This can be somewhat mitigated if the collector also handles allocation, since then it could potentially use unused bits in the allocation data structures. Or, this "hidden memory" can be eliminated by using a Tagged pointer, trading the memory cost for CPU time. However, the mark and sweep is the only strategy that readily cooperates with external allocators in the first place.

A mark and don't sweep garbage collector, like the mark-and-sweep, keeps a bit with each object to record if it is white or black; the grey set is kept as a separate list or using another bit. There are two key differences here. First, black and white mean different things than they do in the mark and sweep collector. In a "mark and don't sweep" collector, all reachable objects are always black. An object is marked black at the time it is allocated, and it will stay black even if it becomes unreachable. A white object is unused memory and may be allocated. Second, the interpretation of the black/white bit can change. Initially, the black/white bit may have the sense of (0=white, 1=black). If an allocation operation ever fails to find any available (white) memory, that means all objects are marked used (black). The sense of the black/white bit is then inverted (for example, 0=black, 1=white). Everything becomes white. This momentarily breaks the invariant that reachable objects are black, but a full marking phase follows immediately, to mark them black again. Once this is done, all unreachable memory is white. No "sweep" phase is necessary.

The mark and don't sweep strategy requires cooperation between the allocator and collector, but is incredibly space efficient since it only requires one bit per allocated pointer (which most allocation algorithms require anyway). However, this upside is somewhat mitigated, since most of the time large portions of memory are wrongfully marked black (used), making it hard to give resources back to the system (for use by other allocators, threads, or processes) in times of low memory usage.

The mark and don't sweep strategy can therefore be seen as a compromise between the upsides and downsides of the mark and sweep and the stop and copy strategies.

Generational GC (ephemeral GC)

It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or the generational hypothesis). A generational GC (also known as ephemeral GC) divides objects into generations and, on most cycles, will place only the objects of a subset of generations into the initial white (condemned) set. Furthermore, the runtime system maintains knowledge of when references cross generations by observing the creation and overwriting of references. When the garbage collector runs, it may be able to use this knowledge to prove that some objects in the initial white set are unreachable without having to traverse the entire reference tree. If the generational hypothesis holds, this results in much faster collection cycles while still reclaiming most unreachable objects.

In order to implement this concept, many generational garbage collectors use separate memory regions for different ages of objects. When a region becomes full, the objects in it are traced, using the references from the older generation(s) as roots. This usually results in most objects in the generation being collected (by the hypothesis), leaving it to be used to allocate new objects. When a collection doesn't collect many objects (the hypothesis doesn't hold, for example because the program has computed a large collection of new objects it does want to retain), some or all of the surviving objects that are referenced from older memory regions are promoted to the next highest region, and the entire region can then be overwritten with fresh objects. This technique permits very fast incremental garbage collection, since the garbage collection of only one region at a time is all that is typically required.

Ungar's classic generation scavenger has two generations. It divides the youngest generation, called "new space", into a large "eden" in which new objects are created and two smaller "survivor spaces", past survivor space and future survivor space. The objects in the older generation that may reference objects in new space are kept in a "remembered set". On each scavenge, the objects in new space are traced from the roots in the remembered set and copied to future survivor space. If future survivor space fills up, the objects that do not fit are promoted to old space, a process called "tenuring". At the end of the scavenge, some objects reside in future survivor space, and eden and past survivor space are empty. Then future survivor space and past survivor space are exchanged and the program continues, allocating objects in eden. In Ungar's original system, eden is 5 times larger than each survivor space.

Generational garbage collection is a heuristic approach, and some unreachable objects may not be reclaimed on each cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to reclaim all available space. In fact, runtime systems for modern programming languages (such as Java and the .NET Framework) usually use some hybrid of the various strategies that have been described thus far; for example, most collection cycles might look only at a few generations, while occasionally a mark-and-sweep is performed, and even more rarely a full copying is performed to combat fragmentation. The terms "minor cycle" and "major cycle" are sometimes used to describe these different levels of collector aggression.

Stop-the-world vs. incremental vs. concurrent

Simple stop-the-world garbage collectors completely halt execution of the program to run a collection cycle, thus guaranteeing that new objects are not allocated and objects do not suddenly become unreachable while the collector is running.

This has the disadvantage that the program can perform no useful work while a collection cycle is running (sometimes called the "embarrassing pause"[6]). Stop-the-world garbage collection is therefore mainly suitable for non-interactive programs. Its advantage is that it is both simpler to implement and faster than incremental garbage collection.

Incremental and concurrent garbage collectors are designed to reduce this disruption by interleaving their work with activity from the main program. Incremental garbage collectors perform the garbage collection cycle in discrete phases, with program execution permitted between each phase (and sometimes during some phases). Concurrent garbage collectors do not stop program execution at all, except perhaps briefly when the program's execution stack is scanned. However, the sum of the incremental phases takes longer to complete than one batch garbage collection pass, so these garbage collectors may yield lower total throughput.

Careful design is necessary with these techniques to ensure that the main program does not interfere with the garbage collector and vice versa; for example, when the program needs to allocate a new object, the runtime system may either need to suspend it until the collection cycle is complete, or somehow notify the garbage collector that there exists a new, reachable object.

Precise vs. conservative and internal pointers

Some collectors can correctly identify all pointers (references) in an object; these are called precise (also exact or accurate) collectors, the opposite being a conservative or partly conservative collector. Conservative collectors assume that any bit pattern in memory could be a pointer if, interpreted as a pointer, it would point into an allocated object. Conservative collectors may produce false positives, where unused memory is not released because of improper pointer identification. This is not always a problem in practice unless the program handles a lot of data that could easily be misidentified as a pointer. False positives are generally less problematic on 64-bit systems than on 32-bit systems because the range of valid memory addresses tends to be a tiny fraction of the range of 64-bit values. Thus, an arbitrary 64-bit pattern is unlikely to mimic a valid pointer. A false negative can also happen if pointers are "hidden", for example using an XOR linked list. Whether a precise collector is practical usually depends on the type safety properties of the programming language in question. An example for which a conservative garbage collector would be needed is the C language, which allows typed (non-void) pointers to be type cast into untyped (void) pointers, and vice versa.

A related issue concerns internal pointers, or pointers to fields within an object. If the semantics of a language allow internal pointers, then there may be many different addresses that can refer to parts of the same object, which complicates determining whether an object is garbage or not. An example for this is the C++ language, in which multiple inheritance can cause pointers to base objects to have different addresses. In a tightly optimized program, the corresponding pointer to the object itself may have been overwritten in its register, so such internal pointers need to be scanned.

Performance

Performance of tracing garbage collectors – both latency and throughput – depends significantly on the implementation, workload, and environment. Naive implementations or use in very memory-constrained environments, notably embedded systems, can result in very poor performance compared with other methods, while sophisticated implementations and use in environments with ample memory can result in excellent performance.

In terms of throughput, tracing by its nature requires some implicit runtime overhead, though in some cases the amortized cost can be extremely low, in some cases even lower than one instruction per allocation or collection, outperforming stack allocation.[7] Manual memory management requires overhead due to explicit freeing of memory, and reference counting has overhead from incrementing and decrementing reference counts, and checking if the count has overflowed or dropped to zero.

In terms of latency, simple stop-the-world garbage collectors pause program execution for garbage collection, which can happen at arbitrary times and take arbitrarily long, making them unusable for real-time computing, notably embedded systems, and a poor fit for interactive use, or any other situation where low latency is a priority. However, incremental garbage collectors can provide hard real-time guarantees, and on systems with frequent idle time and sufficient free memory, such as personal computers, garbage collection can be scheduled for idle times and have minimal impact on interactive performance. Manual memory management (as in C++) and reference counting have a similar issue of arbitrarily long pauses in case of deallocating a large data structure and all its children, though these only occur at fixed times, not depending on garbage collection.

Manual heap allocation:
Garbage collection:

It is difficult to compare the two cases directly, as their behavior depends on the situation. For example, in the best case for a garbage collecting system, allocation just increments a pointer, but in the best case for manual heap allocation, the allocator maintains freelists of specific sizes and allocation only requires following a pointer. However, this size segregation usually cause a large degree of external fragmentation, which can have an adverse impact on cache behaviour. Memory allocation in a garbage collected language may be implemented using heap allocation behind the scenes (rather than simply incrementing a pointer), so the performance advantages listed above don't necessarily apply in this case. In some situations, most notably embedded systems, it is possible to avoid both garbage collection and heap management overhead by preallocating pools of memory and using a custom, lightweight scheme for allocation/deallocation.[8]

The overhead of write barriers is more likely to be noticeable in an imperative-style program which frequently writes pointers into existing data structures than in a functional-style program which constructs data only once and never changes them.

Some advances in garbage collection can be understood as reactions to performance issues. Early collectors were stop-the-world collectors, but the performance of this approach was distracting in interactive applications. Incremental collection avoided this disruption, but at the cost of decreased efficiency due to the need for barriers. Generational collection techniques are used with both stop-the-world and incremental collectors to increase performance; the trade-off is that some garbage is not detected as such for longer than normal.

Determinism

Real-time garbage collection

While garbage collection is generally nondeterministic, it is possible to use it in hard real-time systems. A real-time garbage collector should guarantee that even in the worst case it will dedicate a certain number of computational resources to mutator threads. Constraints imposed on a real-time garbage collector are usually either work based or time based. A time based constraint would look like: within each time window of duration T, mutator threads should be allowed to run at least for Tm time. For work based analysis, MMU (minimal mutator utilization)[9] is usually used as a real-time constraint for the garbage collection algorithm.

One of the first implementations of hard real-time garbage collection for the JVM was based on the Metronome algorithm,[10] whose commercial implementation is available as part of the IBM WebSphere Real Time.[11] Another hard real-time garbage collection algorithm is Staccato, available in the IBM's J9 JVM, which also provides scalability to large multiprocessor architectures, while bringing various advantages over Metronome and other algorithms which, on the contrary, require specialized hardware.[12]

One major challenge for real-time garbage collection on modern multi-core architectures is in designing a non-blocking concurrent garbage collection, not letting the concurrent threads block each other and create unpredictable pauses. A study of algorithms that allow non-blocking real-time concurrent garbage collection appears in a paper by Pizlo et al. in Microsoft Research.[13]

See also

Notes and References

  1. Web site: Class SoftReference<T> . Oracle . Java™ Platform Standard Ed. 7 . 25 May 2013.
  2. Web site: Class PhantomReference<T> . Oracle . Java™ Platform Standard Ed. 7 . 25 May 2013.
  3. Web site: Class WeakReference<T> . Oracle . Java™ Platform Standard Ed. 7 . 25 May 2013.
  4. Web site: Weak References . Microsoft . .NET Framework 4.5 . 25 May 2013.
  5. Web site: Copying and Pinning . . 2022-04-25.
  6. Steele . Guy L. . Guy L. Steele Jr. . Multiprocessing Compactifying Garbage Collection . Communications of the ACM . September 1975 . 18 . 9 . 495–508 . 10.1145/361002.361005 . free . 29412244.
  7. Appel . Andrew W. . Andrew Appel . 1987-06-17 . mdy . Garbage collection can be faster than stack allocation . Information Processing Letters . 25 . 4 . 275–279 . 10.1.1.49.2537 . 10.1016/0020-0190(87)90175-X . 2575400 . 2022-04-25.
  8. Hopwood . David . 2007-01-01 . mdy . Memory allocation in embedded systems . cap-talk . . https://web.archive.org/web/20150924001857/http://www.eros-os.org/pipermail/cap-talk/2007-January/006795.html . 2015-09-24.
  9. Perry . Cheng . Guy E. . Blelloch . 2001-06-22 . mdy . A Parallel, Real-Time Garbage Collector . ACM SIGPLAN Notices . 36 . 5 . 125–136 . 10.1145/381694.378823 .
  10. Bacon . David F. . Cheng . Perry . Rajan . V. T. . November 2003 . The Metronome: A Simpler Approach to Garbage Collection in Real-Time Systems . Corsaro . Angelo . Cytron . Ron . Santoro . Corrado . Workshop on Java Technologies for Real-Time and Embedded Systems . JTRES'03 . OTM 2003 Workshops . On The Move to Meaningful Internet Systems 2003 . . 2889 . 466–478 . 10.1.1.3.8544 . 10.1007/978-3-540-39962-9_52 . 3-540-20494-6 . 0302-9743 . 14565934 . https://web.archive.org/web/20061026215158/http://www.research.ibm.com/people/d/dfb/papers/Bacon03Metronome.pdf . 2006-10-26.
  11. Web site: Biron . Benjamin . Sciampacone . Ryan . 2007-05-02 . mdy . Real-time Java, Part 4: Real-time garbage collection . . https://web.archive.org/web/20201109040826/http://www.ibm.com/developerworks/java/library/j-rtj4/index.html . 2020-11-09.
  12. McCloskey . Bill . Bacon . David F. . Cheng . Perry . Grove . David . 2008-02-22 . mdy . Staccato: A Parallel and Concurrent Real-time Compacting Garbage Collector for Multiprocessors . IBM Research Division . RC24504 . 2022-04-25.
  13. Pizlo . Phil . Erez Petrank . Petrank . Erez . Steensgaard . Bjarne . June 2008 . Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation . PLDI 2008 Conferenece . 33–44 . 10.1.1.3.8544 . 10.1145/1375581.1375587 . 9781595938602 .