Unified Heterogeneous Computing Arrives

AMD’s 2013 Developer Summit is off with a bang, bringing new chips, SDKs and architectures. The Kaveri (Fig. 1) Accelerated Processing Unit (APU) implements AMD’s Heterogeneous System Architecture (HSA). HSA provides heterogeneous Uniform Memory Access (hUMA), so the CPU and GPU share the same virtual address space. As a result, data does not have to be copied between CPU-accessible and GPU-accessible memory, as it must be in most CPU/GPU environments.
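To make the difference concrete, here is a conceptual Python sketch, assuming a toy model in which a list stands in for a memory buffer. The classes `DiscreteSystem` and `SharedAddressSpaceSystem` are hypothetical and do not correspond to any AMD or HSA API; they only count how many elements must cross between separate memories in each model.

```python
# Conceptual model only: these classes are hypothetical and are not
# an AMD or HSA API. A Python list stands in for a memory buffer.

class DiscreteSystem:
    """CPU and GPU have separate memories, so data is copied both ways."""

    def __init__(self):
        self.elements_copied = 0

    def run_on_gpu(self, host_data, kernel):
        device_data = list(host_data)           # host -> device copy
        self.elements_copied += len(host_data)
        result = [kernel(x) for x in device_data]
        host_result = list(result)              # device -> host copy
        self.elements_copied += len(result)
        return host_result


class SharedAddressSpaceSystem:
    """CPU and GPU see the same buffer, so the 'GPU' works in place."""

    def __init__(self):
        self.elements_copied = 0                # stays 0: nothing is copied

    def run_on_gpu(self, shared_data, kernel):
        for i, x in enumerate(shared_data):     # same memory, no transfer
            shared_data[i] = kernel(x)
        return shared_data


discrete = DiscreteSystem()
print(discrete.run_on_gpu([1, 2, 3, 4], lambda x: x * 2), discrete.elements_copied)

shared = SharedAddressSpaceSystem()
buffer = [1, 2, 3, 4]
shared.run_on_gpu(buffer, lambda x: x * 2)
print(buffer, shared.elements_copied)
```

In the discrete model every element moves twice per job; in the shared model the CPU simply reads the results from the buffer it handed over.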

Figure 1. AMD’s Kaveri APU implements the Heterogeneous System Architecture that unifies the memory and scheduling of the on-chip CPUs and GPUs.

The new APUs also support hQ (heterogeneous queueing) (Fig. 2). hQ allows both CPU and GPU applications to schedule new GPU jobs directly. Combined with hUMA, hQ delivers significant efficiency gains over architectures that keep the CPU and GPU isolated.
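A minimal Python sketch of the idea, assuming a single FIFO queue stands in for one of the GPU task queues in Fig. 2. `TaskQueue` and `refine` are hypothetical names, not part of any AMD or HSA interface; the point is only that a running job can enqueue follow-up work itself instead of waiting for the CPU to dispatch it.

```python
from collections import deque


class TaskQueue:
    """Hypothetical GPU task queue that any producer, CPU or GPU, can feed."""

    def __init__(self):
        self._jobs = deque()

    def enqueue(self, job, *args):
        self._jobs.append((job, args))

    def drain(self):
        """Stand-in for the GPU dispatcher: run jobs in FIFO order."""
        completed = []
        while self._jobs:
            job, args = self._jobs.popleft()
            completed.append(job(self, *args))
        return completed


def refine(queue, depth):
    # A "GPU" job that schedules its own follow-up work directly,
    # rather than returning to the CPU for a new dispatch.
    if depth > 0:
        queue.enqueue(refine, depth - 1)
    return depth


q = TaskQueue()
q.enqueue(refine, 2)   # submitted by the "CPU"
q.enqueue(refine, 0)   # another CPU-side submission
print(q.drain())
```

Jobs submitted by the CPU and jobs spawned by running jobs land in the same queue and are serviced in order.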

Figure 2. The hQ architecture allows multiple GPU task queues that can be fed from CPU or GPU applications.

The GPU takes up 47% of the 28-nm Kaveri die. That is a large share of the silicon, but it delivers graphics performance that was previously found only on discrete GPU platforms. AMD will continue to deliver dedicated GPU chips and boards like the AMD Radeon R9 290 graphics card (Fig. 3). The card uses AMD’s Graphics Core Next (GCN) architecture and supports UltraHD 4K displays. It also incorporates AMD’s TrueAudio technology.

Figure 3. AMD Radeon R9 290 graphics card is designed to drive UltraHD 4K displays at high frame rates.


The PCI Express board has 2,560 stream processors running at 947 MHz. Its 4 Gbytes of GDDR5 memory run at 5.0 Gbits/s, delivering a memory bandwidth of 320 Gbytes/s. Peak single-precision computing power is 4.85 TFLOPS.
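The headline figures follow from the other specs. The short Python calculation below rests on two assumptions: each stream processor completes one fused multiply-add (two floating-point operations) per clock, and the GDDR5 interface is 512 bits wide.

```python
# Peak-rate arithmetic for the Radeon R9 290.
# Assumptions: 1 fused multiply-add (2 FLOPs) per stream processor per
# clock, and a 512-bit GDDR5 memory interface.

STREAM_PROCESSORS = 2560
CLOCK_HZ = 947e6               # 947 MHz
FLOPS_PER_CLOCK = 2            # one FMA = one multiply + one add
MEM_RATE_BITS_PER_S = 5.0e9    # 5.0 Gbits/s per pin
BUS_WIDTH_BITS = 512           # assumed interface width

peak_tflops = STREAM_PROCESSORS * FLOPS_PER_CLOCK * CLOCK_HZ / 1e12
bandwidth_gbytes = MEM_RATE_BITS_PER_S * BUS_WIDTH_BITS / 8 / 1e9

print(f"Peak single precision: {peak_tflops:.2f} TFLOPS")
print(f"Memory bandwidth: {bandwidth_gbytes:.0f} Gbytes/s")
```

The result, about 4.85 TFLOPS and 320 Gbytes/s, matches the quoted specifications, which suggests those are the underlying assumptions.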


The Radeon R9 290 and the APUs will support AMD’s Mantle graphics API in addition to DirectX and OpenGL. Mantle targets gaming applications, but it remains to be seen whether Mantle can displace other frameworks like DirectX, or whether it will be used on gaming platforms like Sony’s PS4.

The AMD APP SDK 2.9 supports the HSA features of the new APUs. It includes support for a range of open-source libraries such as the Bolt C++ template library, OpenCV and clMath. A media SDK is in beta. The latest CodeXL tool suite now supports Java and remote debugging. It also includes CPU and GPU profilers, a GPU debugger and a static kernel analyzer for OpenCL.

Java is a big part of the HSA announcements. OpenCL support for Java 7 comes through Aparapi, which translates Java bytecode into OpenCL at runtime. Programmers do not have to learn OpenCL, although they do have to follow certain parallel programming idioms to develop parallel applications.
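In Aparapi, the idiom is roughly: subclass `Kernel`, put the per-work-item code in `run()`, read the work-item index with `getGlobalId()`, and launch with `execute()`. The Python classes below are a hypothetical, CPU-only analogy of that shape; they run work-items sequentially and generate no OpenCL, so treat them as an illustration of the programming model, not of Aparapi itself.

```python
# Hypothetical Python analogy of the Aparapi kernel idiom.
# Not Aparapi: work-items run one after another on the CPU.

class Kernel:
    def __init__(self):
        self._global_id = 0

    def get_global_id(self):
        # Index of the work-item currently executing run().
        return self._global_id

    def run(self):
        raise NotImplementedError("subclasses supply the per-item code")

    def execute(self, global_size):
        # A real GPU runtime would run these work-items in parallel.
        for gid in range(global_size):
            self._global_id = gid
            self.run()


class VectorAdd(Kernel):
    def __init__(self, a, b):
        super().__init__()
        self.a, self.b = a, b
        self.result = [0] * len(a)

    def run(self):
        # Each work-item handles exactly one element, with no loop.
        i = self.get_global_id()
        self.result[i] = self.a[i] + self.b[i]


k = VectorAdd([1, 2, 3], [10, 20, 30])
k.execute(3)
print(k.result)
```

The key constraint is that `run()` must be written so each index can be processed independently; that is what lets a runtime spread the work across GPU cores.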

HSA will also move into the server space with the “Berlin” APU (Fig. 4), part of the Opteron X series. Berlin targets web and enterprise clusters. It is slated for availability in 2013 on platforms such as Hewlett-Packard’s Moonshot.

Figure 4. AMD’s “Berlin” processor brings APU technology to the server space.

I am at the AMD Developer Summit now and hope to learn more about the HSAIL virtual machine. HSAIL (HSA Intermediate Language) is the software interface to the APU’s GPU. The HSAIL Finalizer generates native GPU code from the HSAIL bytecode stream produced by compilers like LLVM.
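The finalizer concept, compiling once to a portable form and generating native code only when the target GPU is known, can be illustrated with a toy Python model. The instruction names and both target tables are invented for this sketch; they are not real HSAIL or any actual GPU instruction set, and a real finalizer also performs register allocation and scheduling.

```python
# Toy finalizer: lowers a target-neutral instruction list to
# target-specific mnemonics. All names here are invented for
# illustration and are not real HSAIL or GPU ISA.

PORTABLE_PROGRAM = [
    ("load", "r0", "x"),
    ("mul", "r1", "r0", "r0"),
    ("store", "y", "r1"),
]

TARGET_TABLES = {
    "gpu_gen_a": {"load": "A_LD", "mul": "A_MUL", "store": "A_ST"},
    "gpu_gen_b": {"load": "B_LOAD", "mul": "B_FMUL", "store": "B_STORE"},
}


def finalize(program, target):
    """Translate each portable op to the chosen target's mnemonic."""
    table = TARGET_TABLES[target]
    return [" ".join([table[op], *operands]) for op, *operands in program]


for target in TARGET_TABLES:
    print(target, finalize(PORTABLE_PROGRAM, target))
```

The same portable program ships once; only the final lowering step depends on which GPU is present.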

The HSA APU is in its infancy. AMD is now delivering the first pass of software that will radically change how programmers take advantage of the GPU. This is just the beginning.
