Unified CPU/GPU Memory Architecture Raises The Performance Bar


AMD’s accelerated processing unit (APU) integrates a CPU and GPU on the same chip. The company’s R-Series processors essentially took the two components and combined them on one chip (see “APU Blends Quad Core x86 With 384 Core GPU”). The CPU and GPU still operated independently with their own memory, even when AMD moved to a system-on-chip (SoC) design with the embedded G-Series (see “APU Targets Embedded Applications”). Bringing the CPU and GPU closer together architecturally has benefited developers by reducing power requirements while improving performance, but in these designs the CPU and GPU memories remain distinct. The APU approach is evolving.

The new heterogeneous Uniform Memory Access (hUMA) design will be used in future heterogeneous system architecture (HSA) configurations. HSA puts the CPU and GPU on an equal footing, providing both with integrated memory and cache support (Fig. 1). It also allows tighter integration of CPU and GPU software. The two instruction sets and their semantics are too different to merge, so there will still be major distinctions between the code that runs on each, but the CPU and GPU will be able to cooperate more quickly and with less overhead. This type of overhead is one reason Nvidia added dynamic parallelism support and direct access to network adapters in its Kepler GPU architecture (see “GPU Architecture Improves Embedded Application Support”).

1. AMD’s heterogeneous system architecture (HSA) brings heterogeneous Uniform Memory Access (hUMA) support to CPU and GPU cores on the same chip to increase speed and reduce overhead.

HSA also provides the same cache coherency and virtual memory mapping for all CPUs and GPUs (Fig. 2). This is significant because it means less translation of data shared between CPU and GPU application code: a pointer created on the CPU refers to the same data on the GPU. CPUs and GPUs with separate memory spaces must copy data between the two, and the copy resides at a different address than the original.

2. AMD’s HSA provides the same cache coherency and virtual memory mapping between all CPUs and GPUs.
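
To make the copy-versus-share difference concrete, here is a toy Python model. It is not AMD's HSA runtime or any real GPU API: memory is modeled as a dictionary of integer addresses, and the names `AddressSpace`, `gpu_sum`, `discrete_path`, and `shared_path` are hypothetical. In the discrete case the data is copied into a second address space and the stand-in "kernel" must be handed the new address; in the shared case it receives the CPU's own address and nothing is copied.

```python
# Toy model: each address space is a dict of integer address -> value.
# Nothing here calls a real GPU or HSA runtime API; all names are hypothetical.


class AddressSpace:
    """A flat address space with a simple bump allocator."""

    def __init__(self, name, base):
        self.name = name
        self.mem = {}
        self.next_addr = base

    def alloc(self, values):
        """Store values contiguously and return the starting address."""
        start = self.next_addr
        for i, v in enumerate(values):
            self.mem[start + i] = v
        self.next_addr += len(values)
        return start

    def read(self, addr, n):
        """Return the n values stored starting at addr."""
        return [self.mem[addr + i] for i in range(n)]


def gpu_sum(space, addr, n):
    """Stand-in for a GPU kernel that sums n values starting at addr."""
    return sum(space.read(addr, n))


def discrete_path(data):
    """Separate memories: copy host data to the device, pass the device address."""
    host = AddressSpace("host", base=0x1000)
    device = AddressSpace("device", base=0x9000)
    host_addr = host.alloc(data)
    # Explicit copy: the data now lives at a different address.
    device_addr = device.alloc(host.read(host_addr, len(data)))
    copied = len(data)
    return gpu_sum(device, device_addr, len(data)), host_addr, device_addr, copied


def shared_path(data):
    """Shared virtual memory: the kernel receives the CPU's own address."""
    shared = AddressSpace("shared", base=0x1000)
    addr = shared.alloc(data)
    return gpu_sum(shared, addr, len(data)), addr, addr, 0
```

Calling `discrete_path([1, 2, 3, 4])` returns `(10, 4096, 36864, 4)`: the same sum, but a different address and four copied values. `shared_path([1, 2, 3, 4])` returns `(10, 4096, 4096, 0)`.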

Furthermore, with HSA the CPU/GPU boundary won’t restrict data access. A GPU will have access to the same data as the CPU, and vice versa. Programmers previously had to track carefully which data needed to be copied between CPU and GPU memory. This often meant splitting or replicating data structures.
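
Pointer-based structures show why data structures had to be broken up or replicated. The sketch below is again a hypothetical Python model rather than a real GPU API: a linked list is stored as `{address: (value, next_address)}`, and `walk` stands in for a kernel that follows the pointers. Copying the list into a separate memory space gives every node a new address, so every `next` pointer must be rewritten; with a shared address space the GPU side could call `walk` on the original list directly.

```python
# Toy model: a linked list stored as {address: (value, next_address)}.
# None marks the end of the list. All names are hypothetical.


def walk(memory, head):
    """Follow next pointers from head and collect values (kernel stand-in)."""
    values = []
    addr = head
    while addr is not None:
        value, addr = memory[addr]
        values.append(value)
    return values


def copy_to_device(host_mem, head, device_base):
    """Deep-copy a linked list into a separate address space.

    Every node receives a new address, so every non-null next pointer must be
    translated. Returns (device_memory, new_head, pointers_rewritten).
    """
    if head is None:
        return {}, None, 0
    # Pass 1: assign a device address to each reachable host node.
    remap = {}
    addr, next_free = head, device_base
    while addr is not None:
        remap[addr] = next_free
        next_free += 1
        addr = host_mem[addr][1]
    # Pass 2: copy each node, rewriting its next pointer through the map.
    device_mem = {}
    rewritten = 0
    for host_addr, dev_addr in remap.items():
        value, nxt = host_mem[host_addr]
        if nxt is not None:
            nxt = remap[nxt]
            rewritten += 1
        device_mem[dev_addr] = (value, nxt)
    return device_mem, remap[head], rewritten


# Usage: nodes deliberately sit at non-sequential host addresses.
host = {0x10: ("a", 0x30), 0x30: ("b", 0x20), 0x20: ("c", None)}
device, dev_head, rewritten = copy_to_device(host, 0x10, device_base=0x9000)
print(walk(host, 0x10), walk(device, dev_head), rewritten)
```

For this three-node list, `copy_to_device` rewrites two pointers, and walking either the original or the copy yields the same values. The sketch assumes the list is acyclic.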

Sharing a virtual memory environment also opens the possibility of better GPU management and GPU application isolation. To date, standalone GPUs have tended to provide no memory protection between the applications they run, even when those applications are independent.

For example, a GPU may now handle display and rendering chores while it also performs computational work unrelated to the display. Buggy or malicious code in one of these tasks can corrupt the other. The same was true of CPUs without memory protection.
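
The protection that per-application virtual memory could bring to a shared GPU can be sketched with per-context page tables. This is a simplified Python model with assumed parameters (a 4-KB page size and the hypothetical names `Context`, `PhysicalMemory`, and `PageFault`), not a description of AMD's hardware: each task translates addresses through its own table, so the same virtual address lands in different physical frames, and an access outside a task's mappings faults instead of corrupting another task's data.

```python
# Toy model of per-context page tables on a shared GPU.
# PAGE_SIZE and all class names are illustrative assumptions.

PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when a context touches a page it has not been granted."""


class Context:
    """One GPU task (e.g. display or compute) with its own page table."""

    def __init__(self, name):
        self.name = name
        self.page_table = {}  # virtual page number -> (physical frame, writable)

    def map(self, vpn, frame, writable=True):
        self.page_table[vpn] = (frame, writable)

    def translate(self, vaddr, write):
        """Convert a virtual address to a physical one, enforcing permissions."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.page_table.get(vpn)
        if entry is None:
            raise PageFault(f"{self.name}: unmapped page {vpn}")
        frame, writable = entry
        if write and not writable:
            raise PageFault(f"{self.name}: write to read-only page {vpn}")
        return frame * PAGE_SIZE + offset


class PhysicalMemory:
    """Backing store shared by all contexts, indexed by physical address."""

    def __init__(self):
        self.cells = {}

    def store(self, ctx, vaddr, value):
        self.cells[ctx.translate(vaddr, write=True)] = value

    def load(self, ctx, vaddr):
        return self.cells.get(ctx.translate(vaddr, write=False), 0)


# Usage: two tasks use the same virtual address without interfering.
mem = PhysicalMemory()
display, compute = Context("display"), Context("compute")
display.map(0, frame=7)
compute.map(0, frame=3)
mem.store(display, 0x10, "pixel")
mem.store(compute, 0x10, 42)
```

Both tasks write to virtual address `0x10`, yet each reads back only its own value. An address the compute task has not been granted, such as virtual page 7, raises `PageFault` rather than reaching physical frame 7, which backs the display task's page.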

We have seen how important memory protection and virtual memory have been for CPUs. HSA could bring similar improvements to the GPU side. This would go a long way toward allowing more dynamic use of GPUs, which have usually been dedicated to a single application or the operating system.

Memory management will also improve for GPU applications. Virtual memory with paging support and dynamic memory allocation can greatly simplify programming while providing better security and memory utilization.
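
Demand paging is one way virtual memory improves utilization: an allocation reserves address space, and physical memory is committed only when a page is first touched. The Python sketch below models that idea with hypothetical names (`DemandPagedSpace`, `alloc`, `touch`) and an assumed 4-KB page size. It simply raises `MemoryError` when frames run out, whereas a real operating system would evict or swap pages.

```python
# Toy demand-paging model: alloc() reserves virtual pages only; a physical
# frame is assigned the first time a page is touched. Names are hypothetical.

PAGE_SIZE = 4096


class DemandPagedSpace:
    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.page_table = {}   # vpn -> frame, filled lazily on first touch
        self.reserved = set()  # vpns handed out by alloc()
        self.next_vpn = 0
        self.faults = 0

    def alloc(self, nbytes):
        """Reserve enough virtual pages for nbytes; no frames are used yet."""
        npages = -(-nbytes // PAGE_SIZE)  # ceiling division
        start = self.next_vpn
        self.reserved.update(range(start, start + npages))
        self.next_vpn += npages
        return start * PAGE_SIZE

    def touch(self, vaddr):
        """Access vaddr, servicing a page fault on first use of its page."""
        vpn = vaddr // PAGE_SIZE
        if vpn not in self.reserved:
            raise MemoryError(f"access to unallocated page {vpn}")
        if vpn not in self.page_table:
            self.faults += 1
            if not self.free_frames:
                raise MemoryError("out of physical frames (no eviction modeled)")
            self.page_table[vpn] = self.free_frames.pop(0)
        return self.page_table[vpn] * PAGE_SIZE + vaddr % PAGE_SIZE

    def resident_pages(self):
        return len(self.page_table)


# Usage: a 10-page allocation against only four physical frames.
space = DemandPagedSpace(num_frames=4)
buf = space.alloc(10 * PAGE_SIZE)
space.touch(buf)
space.touch(buf + 5 * PAGE_SIZE + 8)
print(space.resident_pages(), space.faults)
```

The allocation succeeds even though it exceeds physical memory, and touching two of its pages commits just two frames.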

A unified approach can also reduce system power requirements by delivering more performance with less overhead. It remains to be seen how much of an impact HSA will have, but the potential is significant. The fine-grained power management integrated with the software on CPUs will find its way to the GPU side as well.
