Cortex-A77 Dials in on Premium Smartphones

Arm’s latest Cortex-A77 CPU and Mali-G77 GPU are designed for high-end smartphones and embedded applications.

Arm’s 7-nm Cortex-A77 is the third generation of the DynamIQ big core technology (Fig. 1). It delivers an overall 4X improvement over the Cortex-A76, with a 20% single-thread speed improvement. Combined with the Mali-G77 GPU, the pair initially targets high-end smartphones. However, both will be equally at home in high-performance embedded applications where the Cortex-A76 has been dominant.

1. The Cortex-A77 provides a 20% performance boost over the Cortex-A76 running at the same frequency. 

Also part of the mix are the Mali-D77 DPU (display processing unit) and Arm’s machine-learning (ML) neural processing unit (NPU) (Fig. 2). These all target the 5G rush that’s putting more display and ML functionality into handheld applications. For example, the DPU can support untethered augmented- and virtual-reality (AR/VR) headsets.

2. Arm’s Cortex-A77 CPU, Mali-G77 GPU, Mali-D77 DPU, and ML NPU combine to target 5G mobile and embedded applications.

Every component has seen enhancements and additions. The Cortex-A77 is based on the ARMv8.2 architecture supporting AArch32 and AArch64 instructions (Fig. 3). Each core has 64-kB L1 instruction and data caches as well as an L2 cache of up to 512 kB, all with error-correcting code (ECC) support. The cores share an L3 cache that can be up to 4 MB. The Cortex-A77 can be combined with a Cortex-A55 in a big.LITTLE configuration.

3. The Cortex-A77 changes run across the chip, from a doubling of the branch prediction size to an L0-like instruction cache.

The Cortex-A77 doubles the branch prediction support, changing the lookahead from 32 bytes/cycle to 64 bytes/cycle, and there’s a faster instruction fetch subsystem. Included are a new integer ALU pipeline and a new macro-OP cache that essentially implements an L0 instruction cache.
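To put the lookahead figures in instruction terms, here is a small back-of-the-envelope sketch. It relies only on the fact that AArch64 (A64) instructions are a fixed 4 bytes wide. The function name is illustrative, and the result is the span the predictor can cover per cycle, not the number of instructions the core actually executes.

```python
# A64 instructions have a fixed 4-byte encoding.
A64_INSTRUCTION_BYTES = 4


def instructions_covered(lookahead_bytes_per_cycle: int) -> int:
    """A64 instructions spanned by one cycle of branch-predictor lookahead."""
    return lookahead_bytes_per_cycle // A64_INSTRUCTION_BYTES


a76_span = instructions_covered(32)  # Cortex-A76 figure from the article
a77_span = instructions_covered(64)  # Cortex-A77 figure from the article

print(a76_span, a77_span, a77_span / a76_span)  # prints: 8 16 2.0
```

In other words, the predictor can run up to 16 A64 instructions ahead per cycle instead of 8.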

The Mali-G77 delivers just under one-and-a-half times the performance of the Mali-G76 that was announced with the Cortex-A76. Built on the new Valhall architecture, it’s designed to handle the latest AR, VR, and ML tasks with a 60% ML improvement over its predecessor. However, ML performance can be further improved using Arm’s NPU. The Valhall architecture uses a superscalar engine and a simplified instruction set, and its internal data architecture has been optimized to handle the latest graphics APIs such as Vulkan.

The Mali-G77 ups the number of FMA lanes to 32. Each core now has two clusters that handle 16-wide warps instead of 8-wide warps. This translates to a 33% performance improvement using the same chip area. The quad texture mapper delivers 4 texels/cycle, or double the throughput of the Mali-G76. The system also supports Arm Frame Buffer Compression (AFBC) 1.3. This latest version supports 2-plane YUV, separate depth/stencil encoding, improved front-buffer rendering, and opacity/transparency hints.
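The 33% figure lines up with a simple lane count. The sketch below takes the Mali-G77 numbers from the text (two clusters, each 16 wide). The Mali-G76 layout of three 8-wide execution engines per core is an assumption based on Arm's public description of that GPU and is not stated in this article. The function name is illustrative.

```python
def fma_lanes_per_core(units_per_core: int, lanes_per_unit: int) -> int:
    """Total FMA lanes in one shader core."""
    return units_per_core * lanes_per_unit


g77_lanes = fma_lanes_per_core(2, 16)  # from the article: two 16-wide clusters
g76_lanes = fma_lanes_per_core(3, 8)   # assumption: three 8-wide engines

lane_gain = g77_lanes / g76_lanes - 1
print(g77_lanes, g76_lanes, f"{lane_gain:.0%}")  # prints: 32 24 33%
```

Under that assumption, the gain comes from 32 lanes replacing 24 in the same core area.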

The Mali-D77 will show up in more demanding AR/VR applications and provide better gaming support. The top-end display controller provides services such as asynchronous timewarp (ATW), lens distortion correction (LDC), and chromatic aberration correction (CAC) (Fig. 4). It can handle composition of up to four VR layers and is optimized for 3Kp120 VR displays, with 4Kp90 support as well.

4. The Mali-D77 is optimized for 3Kp120 displays. Features such as asynchronous timewarp (ATW) implement re-projection to minimize display pipeline latency, which improves the AR/VR experience.

All of these features are designed to provide high-resolution support while minimizing display pipeline latencies. Features like LDC and CAC allow for the use of lower-cost optics, since the hardware can address discrepancies that would otherwise need to be handled by higher-end lens systems.
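As a rough illustration of what LDC and CAC involve, the sketch below applies a generic polynomial radial warp with a separate set of coefficients for each color channel. This is a common textbook model, not a description of how the Mali-D77 implements these features. The coefficient values and function names are hypothetical; real coefficients would come from calibrating a specific lens.

```python
from typing import Dict, Tuple

# Hypothetical per-channel radial coefficients (k1, k2).
# Using different values per channel is what models chromatic aberration.
CHANNEL_COEFFS: Dict[str, Tuple[float, float]] = {
    "r": (-0.20, 0.02),
    "g": (-0.22, 0.02),
    "b": (-0.24, 0.02),
}


def radial_warp(x: float, y: float, k1: float, k2: float) -> Tuple[float, float]:
    """Scale a normalized coordinate (origin at the lens center) radially."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale


def sample_positions(x: float, y: float) -> Dict[str, Tuple[float, float]]:
    """Source position to sample for each color channel of one output pixel."""
    return {ch: radial_warp(x, y, k1, k2) for ch, (k1, k2) in CHANNEL_COEFFS.items()}


# The lens center is unchanged; toward the edge, the channels diverge.
print(sample_positions(0.0, 0.0))
print(sample_positions(1.0, 0.0))
```

Doing this per pixel in the display pipeline, rather than relying on corrective optics, is the trade-off the paragraph above describes.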

The NPU isn’t new, but it rounds out the platform by significantly accelerating ML models. These are important in a range of applications, from image processing to gaming, where the other three components are key. Not all systems will need the NPU, but high-end smartphones are one likely target.

Overall, Arm’s latest combination is designed to take advantage of the higher bandwidth provided by 5G.
