ARM Lifts Lid on Cortex-A72

ARM made a big splash when it announced the Cortex-A50 family (see “Delivering 64-Bit Arm Platforms”) almost two years ago. It marked the start of the ARMv8 architecture and ARM's entry into the 64-bit fray. Only recently has hardware arrived in more than evaluation quantities. The Cortex-A50 family branched out into the Cortex-A53 and Cortex-A57, which can be paired using ARM's big.LITTLE architecture (see “Little Core Shares Big Core Architecture”).

The Cortex-A53 now gets another big brother with ARM's latest Cortex-A72 (Fig. 1). The new 64-bit core is needed for applications that require features like 120 frame/s 4K video. The architecture targets mobile applications and is designed to scale to higher performance levels. Implementations using TSMC's 16 nm FinFET process can run at 2.5 GHz. As is, it delivers 3.5 times the performance of the 32-bit Cortex-A15 that dominates the mobile space, but the Cortex-A72 will have to challenge the Cortex-A53 products now coming on the market. A 75% energy reduction compared to existing Cortex-A15 technologies should help quite a bit. By comparison, the Cortex-A57 has a 1.9x performance advantage over the Cortex-A15 and uses about half the power.
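The relative figures quoted above can be combined into a rough Cortex-A72 versus Cortex-A57 comparison. The short Python sketch below uses only the numbers from this article, normalized to the Cortex-A15. The variable and function names are illustrative, and the result assumes the two performance figures were measured on comparable workloads, which the article does not confirm.

```python
# Performance figures quoted in the article, relative to the Cortex-A15 (= 1.0).
A57_PERF = 1.9
A72_PERF = 3.5

# The article cites a 75% energy reduction for the Cortex-A72 versus the Cortex-A15.
A72_ENERGY_REDUCTION = 0.75


def relative_speedup(new_perf, old_perf):
    """Return how many times faster new_perf is than old_perf."""
    return new_perf / old_perf


# Implied Cortex-A72 advantage over the Cortex-A57 (assumes comparable workloads).
a72_vs_a57 = relative_speedup(A72_PERF, A57_PERF)

# Fraction of Cortex-A15 energy the Cortex-A72 would use for the same work.
a72_energy = 1.0 - A72_ENERGY_REDUCTION

print(f"Cortex-A72 vs. Cortex-A57 performance: {a72_vs_a57:.2f}x")
print(f"Cortex-A72 energy relative to Cortex-A15: {a72_energy:.2f}")
```

Under those assumptions, the Cortex-A72 comes out at a little over 1.8 times the performance of the Cortex-A57.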

Figure 1. The Cortex-A72 is ARM's latest 64-bit ARMv8 incarnation.

The high-level Cortex-A72 architecture is not new. It matches the functionality of the Cortex-A50 family, with the NEON SIMD engine and floating-point support. The instruction cache supports parity, while the data cache supports ECC, as does the L2 cache. The Cortex-A72 supports the Accelerator Coherency Port (ACP) and the Snoop Control Unit (SCU). The SCU's snoop filter is now in the interconnect instead of within the core cluster. This allows more efficient cache management, which improves power efficiency by reducing transaction overhead. The 128-bit coherent bus interface can be AMBA4 ACE or AMBA5 CHI.

The new CCI-500 Cache Coherent Interconnect (CCI) doubles the memory performance of earlier CCI-400 systems (Fig. 2). This is needed to handle the higher resolution and bandwidth of 120 frame/s 4K video. Up to four memory channels are supported. It supports big.LITTLE. It also has a TrustZone secure media path for encrypted Ultra-HD media.
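To put the 120 frame/s 4K requirement in perspective, the sketch below estimates the raw pixel rate and the bandwidth of a single uncompressed pass over each frame. The 3840 x 2160 resolution and the 4 bytes per pixel are assumptions, not figures from ARM. Real systems read and write each frame several times and use compression, so actual memory traffic will differ.

```python
def pixel_rate(width, height, fps):
    """Pixels per second for a given resolution and frame rate."""
    return width * height * fps


def raw_bandwidth_gbytes(width, height, fps, bytes_per_pixel):
    """Uncompressed bandwidth in GB/s (10**9 bytes) for one pass over each frame."""
    return pixel_rate(width, height, fps) * bytes_per_pixel / 1e9


# Assumed values: Ultra-HD "4K" at 3840 x 2160 with 32-bit pixels.
WIDTH, HEIGHT, FPS, BYTES_PER_PIXEL = 3840, 2160, 120, 4

rate = pixel_rate(WIDTH, HEIGHT, FPS)
bandwidth = raw_bandwidth_gbytes(WIDTH, HEIGHT, FPS, BYTES_PER_PIXEL)

print(f"Pixel rate: {rate / 1e9:.3f} Gpixels/s")
print(f"Single-pass bandwidth: {bandwidth:.2f} GB/s")
```

With these assumptions, one pass over the video stream is just under 1 Gpixel/s, or roughly 4 GB/s, before any composition or GPU traffic is added.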

Figure 2. The CCI-500 Cache Coherent Interconnect doubles memory performance.

The CCI-500 scales from one to four clusters, and each cluster can have one to four cores. A big.LITTLE configuration uses two clusters: one with Cortex-A53 cores and the other with Cortex-A72 cores.
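The cluster limits described above can be expressed as a small configuration check. This is only an illustrative Python sketch: the `total_cores` helper and the list-of-tuples layout are hypothetical, and the four-core big.LITTLE clusters are an example, not a configuration named in the article.

```python
# Limits quoted in the article.
MAX_CLUSTERS = 4
MAX_CORES_PER_CLUSTER = 4


def total_cores(clusters):
    """Validate a list of (core_type, core_count) clusters and return the core total."""
    if not 1 <= len(clusters) <= MAX_CLUSTERS:
        raise ValueError(f"expected 1 to {MAX_CLUSTERS} clusters, got {len(clusters)}")
    for core_type, count in clusters:
        if not 1 <= count <= MAX_CORES_PER_CLUSTER:
            raise ValueError(f"{core_type}: expected 1 to {MAX_CORES_PER_CLUSTER} cores, got {count}")
    return sum(count for _, count in clusters)


# Example big.LITTLE layout: one big cluster and one little cluster (core counts assumed).
big_little = [("Cortex-A72", 4), ("Cortex-A53", 4)]
big_little_cores = total_cores(big_little)

# Upper bound implied by the limits above.
largest_system = MAX_CLUSTERS * MAX_CORES_PER_CLUSTER

print(f"big.LITTLE example: {big_little_cores} cores")
print(f"Largest configuration: {largest_system} cores")
```

Taken together, the limits imply a ceiling of 16 cores behind a single interconnect.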

At the other end of a Cortex-A72 system is likely to be ARM's new Mali-T880 GPU (Fig. 3). The Mali-T880 is supposed to be 40% more energy efficient and 1.8 times faster than the current Mali-T760 family. It is likely to be paired with the Mali-DP550 display and Mali-V550 video processors to handle 4K video.

Figure 3. The 850 MHz Mali-T880 GPU has a throughput of 13.5 Gpixels/s.

The Mali-T880 has up to 16 cores that support virtual memory. Software support includes OpenGL, OpenCL, DirectX 11, and RenderScript. It supports ARM Frame Buffer Compression (AFBC) and Adaptive Scalable Texture Compression (ASTC) (see “Khronos Releases ASTC Next-Generation Texture Compression Specification”). The Transaction Elimination support looks for identical blocks of pixels in two consecutive render targets and skips writing the blocks that have not changed, while the Smart Composition support expands upon this so identical input blocks are read only once.
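The idea behind Transaction Elimination can be sketched in software: keep a signature for each block of the previous render target and write only the blocks whose signature has changed. The Python below is a minimal illustration under stated assumptions. The article does not describe the signature scheme or block size the hardware uses, so the CRC-32 signature, the 16-byte blocks, and the `tiles_to_write` helper are all hypothetical.

```python
import zlib


def tile_signatures(tiles):
    """Compute one signature per tile (CRC-32 is an assumption, not ARM's scheme)."""
    return [zlib.crc32(tile) for tile in tiles]


def tiles_to_write(prev_signatures, tiles):
    """Return (indices of tiles that must be written, signatures of the new frame)."""
    signatures = tile_signatures(tiles)
    # With no usable history, every tile has to be written out.
    if prev_signatures is None or len(prev_signatures) != len(signatures):
        return list(range(len(tiles))), signatures
    changed = [
        index
        for index, (old, new) in enumerate(zip(prev_signatures, signatures))
        if old != new
    ]
    return changed, signatures


# Two consecutive frames of four tiny tiles each; only tile 1 changes.
frame1 = [bytes([0] * 16), bytes([1] * 16), bytes([2] * 16), bytes([3] * 16)]
frame2 = [bytes([0] * 16), bytes([9] * 16), bytes([2] * 16), bytes([3] * 16)]

first_writes, signatures = tiles_to_write(None, frame1)
second_writes, signatures = tiles_to_write(signatures, frame2)

print(f"Frame 1 writes tiles: {first_writes}")
print(f"Frame 2 writes tiles: {second_writes}")
```

In this example the second frame writes one tile instead of four, which is where the memory bandwidth and power savings come from. A signature comparison can in principle miss a change if two different blocks share a signature, so the choice of signature matters in a real design.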

ARM will likely fill out the 64-bit space in 2016 as it did with the 32-bit space, so the Cortex-A72 is just the start. It is also likely that chips based on this architecture will be available a lot sooner than the initial crop of Cortex-A50 chips, primarily because of the demand in the mobile space.
