Atego And Real Time Garbage Collection

1 of Enlarge image

Atego garbage collection overview

I've always thought that managed programming languages like Java and C# that take memory management into the hands of a garbage collector are a good solution for many embedded applications were reliability is critical. Unfortunately many designers shy away from languages that use garbage collectors because of myths or encounters with bad implementations.

These days real time garbage collection solutions are more common allowing managed application languages to be used in embedded and even safety critical applications. In fact, these platforms are usually better than using C or C++. Many do not know that Microsoft's .NET platform supports garbage collection.

To provide more insight in the issue I had a chat with Dr. Kelvin Nilsen who is Chief Technology Officer over Java at Atego. He also provided a slide show on the operation of an incremental garbage collector (Fig. 1) that many may find enlightening.

Wong: What type of garbage collector do most Java implementations utilize?

Nilsen: Most Java implementations use mark and sweep tracing garbage collection, with a compacting phase integrated into the sweep, and throughput optimizations based on generational garbage collection techniques. Tracing garbage collection means that the system discovers which memory is in use by tracing chains of pointers starting from certain known global variables known as roots. Contrast this with a reference counting scheme, in which every object maintains a count of how many references point to it, the counts are maintained automatically every time a reference variable is overwritten, and dead objects are reclaimed instantly as soon as the reference count is decremented to zero. The mark phase of garbage collection consists of identifying every object that is reachable by tracing a sequence of pointers (Java reference variables) from one of the root variables. After the mark phase is completed, the "marked" objects are the ones that are still in use and cannot be reclaimed.

The sweep phase walks through memory from low to high addresses, sweeping all unmarked objects into the memory allocation pool. The compacting phase relocates the marked objects to contiguous locations in order to coalesce all of the independent free segments into a single large free segment. This provides defragmentation of memory. While in-use memory is being compacted, it is very difficult for application code to continue accessing heap objects. For this reason, most Java virtual machines suspend execution of all application processing during memory compaction.

Generational garbage collection is an optimization based on the observation that the large majority of allocated objects have a very short life time. In principle, a garbage collector can be much more effective if it focuses its attention on the objects that were most recently allocated. The idea is that with, perhaps, 20% of the effort required to mark all of memory, the garbage collector can reclaim 80% of the objects that are no longer in use. Besides improving throughput, generational garbage collection techniques also reduce the duration of typical garbage collection pauses. Generational garbage collection does not reduce the maximum duration of a garbage collection pause. In most implementations, the use of generational garbage collection techniques actually increases the duration of the longest garbage collection pauses.

Wong: What type of garbage collector does Atego provide?

Nilsen: The PERC Ultra garbage collector uses a patented garbage collection technique comprised of a combination of incremental mark and sweep and copying garbage collection technologies. We make garbage collection incremental by dividing the complete garbage collection effort into thousands of small increments of work. Between increments of garbage collection work, the garbage collector can be preempted by higher priority application requests. When the high priority application code completes, the garbage collector resumes. The PERC Ultra garbage collector is tuned to enable predictable real-time responses to asynchronous events with typical response time constraints in the range from 1 ms to 50 ms. With a bit of extra care to properly configure the underlying real-time operating system and certain PERC Ultra configuration parameters, the PERC Ultra virtual machine can be used to implement systems with response deadlines as low as 250 microseconds.

The real-time constraints honored by the PERC Ultra virtual machine make it impractical to integrate a compaction phase into the mark and sweep garbage collector because the compaction technique used during traditional mark and sweep garbage collection imposes excessively long delays on the execution of application threads. Thus, the mark and sweep region of memory is not defragmented. For this reason, the PERC Ultra garbage collector also includes support for copying garbage collection.

Atego's copying garbage collector periodically copies all live objects from one region of memory known as from-space into another region of memory known as to-space. After all live objects have been copied, what remains in from-space is only dead memory. So the entire from-space can be reclaimed as a large contiguous (defragmented) region of free memory. The copying technique is incremental, unlike the compaction phase of a mark-and-sweep garbage collector.

The key attributes of Atego's real-time garbage collector that qualify it for use in mission-critical real-time systems are that it is accurate, preemptible, incremental, defragmenting, and paced. Accurate means it guarantees to reclaim the memory of all objects that are no longer in use. Preemptible means it can be preempted by higher priority application code when required. Incremental means that the garbage collector's forward progress is preserved when it is preempted and resumed. Defragmenting means that the garbage collector relocates live objects in order to coalesce multiple independent smaller segments of free memory into larger contiguous free memory segments. Paced means the garbage collector runs concurrent with application code, at a pace that is designed to replenish the free pool before the application's appetite for new memory allocations exhausts the free pool.

Wong: How does Atego's garbage collector work?

Nilsen: In order to support arbitrary interleaving of application processing with increments of garbage collection activity, the PERC Ultra native-code compiler inserts special instructions accompanying every object access. These special instructions allow application code to coordinate with the garbage collector. If these special instructions are associated with an application's attempt to modify an object, we call them a "write barrier". If the special instructions are associated with an attempt to fetch the contents of an object, we call them a "read barrier". These barriers serve multiple purposes, depending on the phase of garbage collection and the type of information being accessed. If application code overwrites a pointer variable during the garbage collection mark phase, for example, the write barrier will arrange to mark the object referenced by the new value of this pointer. And if application code attempts to read a field of a from-space object that was recently copied into to-space, the read barrier will arrange to fetch the data from the new location of that object.

When PERC Ultra developers configure the PERC virtual machine, they have the option to specify the organization of heap memory. The PERC heap is structured as a collection of equal-sized memory regions. Each time the PERC garbage collector runs, one region is selected as from-space and its live objects are copied into a second region, identified as to-space. The from-space region is fully defragmented. The rest of the memory regions are garbage collected using an incremental mark and sweep technique.

The combination of mark and sweep and copying garbage collection techniques gives developers the opportunity to exercise tradeoffs between reliability and efficiency. For most real-world workloads, mark and sweep garbage collection offers the highest typical utilization of available heap memory, often reaching utilizations in excess of 80%. However, without compaction, mark and sweep garbage collection may suffer from memory fragmentation, in which an abundance of memory is available, but the memory is divided into thousands of free segments, none of which is large enough to satisfy a particular memory allocation request. In an extreme circumstance, for example, a 100 Mbyte heap that is less than 10% utilized may be so fragmented that it is unable to allocate memory for a single 10,000-element integer array. Since fragmentation impacts are determined by the order in which objects are allocated and reclaimed, it is very difficult to prove through analysis of a program that it will never suffer from fragmentation problems. However, for programs that never make large-object allocations, it is generally safe to assume that defragmentation of memory is not required for reliable operation.

The copying garbage collection technique offers superior reliability and fully deterministic operation in the face of unpredictable or demanding memory allocation workloads. The copying garbage collector automatically defragments all of memory every time it runs. In order to achieve this, it needs the heap to be at least twice as large as the amount of live memory used by the application at any time. Thus, copying garbage collection can utilize no more than 50% of available memory. Each time the PERC Ultra garbage collector runs, it uses a heuristic to select the most fragmented of currently used memory regions as the current from-space. Over time, this allows all of memory to be defragmented.

If a particular deployment of the PERC Ultra virtual machine must be configured for ultimate reliability, then it is typically configured with a small number of very large memory regions. This allows more of the heap to be defragmented on each pass of the garbage collector. The cost of this tradeoff is that more physical memory must be dedicated to execution of the program. Alternatively, if a particular deployment of the PERC Ultra virtual machine must be configured for most efficient use of available memory resources, and the application is willing to risk the possibility of less predictable memory utilization and response times, then that virtual machine will most likely be configured with a larger number of small memory regions.

Wong: Are there differences between garbage collection on Perc Ultra SMP versus the single core version?

Nilsen: The PERC Ultra SMP product is designed to utilize the full potential of multiple processors for a combination of application processing and garbage collection activities. This is reflected in several changes to the way PERC Ultra is implemented:

Since multiple application threads may be running in parallel on different processors, the implementation of Java synchronization is a bit more complex in the SMP version of our product.
Since garbage collection may be running on one processor in parallel with application threads running on other processors, the coordination between application code and the garbage collector, as represented by the automatically generated read and write barriers, is also a bit more complex.
The garbage collector has been restructured to allow each increment of garbage collection to be performed by a different processor. If there is no application processing to be done, all of the available cores can be dedicated to garbage collection. When there is no garbage collection to be done, all of the available cores can be dedicated to application code. And when there exist both application and garbage collection workloads, the available cores are partitioned between them.
Given the changes in the implementation of synchronization, read and write barriers, and the garbage collector itself, performance tuning of the PERC Ultra SMP product has resulted in many adjustments to the settings chosen for the single-core PERC Ultra product.

Wong: Can the garbage collector take advantage of multiple cores?

Nilsen: Yes. The garbage collector uses all available cores to complete its work in less time.

Wong: What is aonixPerc Raven and how does its memory management system work?

Nilsen: The Aonix Perc Raven name describes a future product that will be targeted to the needs of safety-critical development with the Java language. The current Aonix PERC Pico product implements the same architecture and capabilities as the anticipated future Aonix PERC Raven product, but without addressing the unique issues of safety certification. Both products will evolve as the JSR-302 expert group responsible for defining a standard for safety-critical Java completes its work. The general approach to memory management system will not evolve.

With Aonix PERC Pico and Aonix PERC Raven, all temporary objects are allocated on the run-time stack instead of within a garbage-collected heap. This allows a much simpler implementation of the virtual machine, and it forces a discipline on the application developer which enables a more thorough analysis of execution behavior. Programmers are required to annotate Java software to indicate their intentions with respect to temporary object lifetimes, and a special byte-code verifier assures that the program implements these intentions. The enforcement provided by the byte-code verifier assures that the program will not have dangling pointers and guarantees the absence of the scoped-memory exceptions that may be thrown at run time by implementations of the Real-Time Specification for Java.

In comparison with traditional Java, the Aonix PERC Pico technology is much smaller and is generally faster because there is no need for garbage collection coordination. In comparison with the full-fledged Real-Time Specification for Java, Aonix PERC Pico is smaller, faster, and safer. Given its smaller and simpler implementation, it is much more feasible to fully analyze a program's resource usage and timing behavior, which is generally required for safety-critical deployments.

Wong: There are different versions of Perc. How does memory management differ between them?

Nilsen: The key differences in memory management between the different versions of PERC are all described in the answers above.

Wong: What platforms is Perc available on?

Nilsen: PERC is commercially supported on a variety of architectures. Native code compilers are available for Intel x86 architecture, ARM, and PowerPC. Operating system support is available for Green Hills INTEGRITY, Linux, LynuxWorks LynxOS and LOS-178, Microsoft Windows and Windows Mobile, ENEA OSE, Sysgo PikeOS, Wind River VxWorks, and QNX Neutrino.