ARM continues to push the envelope with its latest trio of cores: the Cortex-A55 and Cortex-A75 CPUs (Fig. 1) and the Mali-G72 GPU. These take advantage of ARM’s recently announced DynamIQ architecture. The combination targets the high-end mobile space, as well as applications that use machine learning and augmented, mixed, and virtual reality (AR/MR/VR).
The core clusters have private L2 caches and a 4 Mbyte shared, 16-way set associative L3 cache that can be partitioned into a maximum of four groups. Repartitioning can be done at runtime by the OS or hypervisor. The DynamIQ L3 cache snoop control unit (SCU) is shared by all cores in the cluster. The SCU is part of the DynamIQ shared unit (DSU), which also includes low-latency interfaces for closely coupled accelerators, in addition to advanced power management support.
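The cache geometry described above can be worked through in a few lines. This is only a rough sketch: the 64-byte line size and the idea of handing each group a contiguous range of ways are assumptions made for illustration, and the function names are hypothetical rather than part of any ARM interface.

```python
# Sketch of the L3 geometry: 4 Mbyte, 16-way set associative, up to four groups.
# ASSUMPTIONS (not from the article): 64-byte cache lines, and partitions that
# are expressed as contiguous ranges of ways.
CACHE_BYTES = 4 * 1024 * 1024
WAYS = 16
LINE_BYTES = 64   # assumed line size
MAX_GROUPS = 4


def num_sets(cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Number of sets = capacity / (ways * line size)."""
    return cache_bytes // (ways * line_bytes)


def partition_ways(ways_per_group):
    """Split the 16 ways among 1 to 4 groups (hypothetical scheme).

    Returns one dict per group with its half-open way range and capacity.
    """
    if not 1 <= len(ways_per_group) <= MAX_GROUPS:
        raise ValueError("between 1 and 4 groups are allowed")
    if any(w <= 0 for w in ways_per_group) or sum(ways_per_group) != WAYS:
        raise ValueError("groups must use positive way counts that sum to 16")
    sets = num_sets()
    layout, first = [], 0
    for w in ways_per_group:
        layout.append({"ways": (first, first + w), "bytes": w * sets * LINE_BYTES})
        first += w
    return layout


if __name__ == "__main__":
    print(num_sets())  # 4096 sets under the 64-byte line assumption
    for group in partition_ways([8, 4, 2, 2]):
        print(group)
```

Under these assumptions, each way holds 256 kbytes, so a group that is given eight ways owns half of the cache.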
The cluster can contain any combination of up to eight CPU cores, from the typical 4 by 4 big.LITTLE configuration to more device-specific arrangements such as one Cortex-A75 and seven Cortex-A55s, or vice versa. This allows developers to choose the combination that works best for their application. The latest combination also supports DynamIQ Energy Aware Scheduling (EAS).
The Cortex-A75 delivers 50% more performance than its Cortex-A72 and Cortex-A73 siblings. Likewise, the Cortex-A55 is 2.5 times more power-efficient than the Cortex-A53, which is also found in big.LITTLE combos with the Cortex-A72 and Cortex-A73. The Cortex-A55 is built on the ARMv8.2 specification. The in-order CPU has a very small die area and is highly energy-efficient. The Cortex-A75 is 2.5 times larger in area than the Cortex-A55 but is more than 20% faster than the Cortex-A73.
The system allows fine-grained power management, ranging from control of individual cores to cache management. Parts of the L3 cache can also be turned off when they are not required, such as during audio or video playback, when much of the system can be shut down.
The Cortex-A75 and Cortex-A55 offer a number of enhancements over earlier platforms. These include the Virtualization Host Extensions (VHE) needed for Type 2 hypervisors like Linux’s KVM. The cores also support atomic operations, extended cache stashing, and the wider 256-bit AMBA 5 interface. The clean to persistent memory feature is designed for future non-volatile memory hierarchies.
The Int8 byte-oriented dot product targets neural network and machine learning applications. Essentially, deep neural networks (DNNs) work very well with small weight values, and 8 bits is usually more than adequate. This allows the CPU to handle these matrix operations efficiently. The GPUs are also being tuned to handle 8-bit values instead of just larger integers or floating point numbers.
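To make the Int8 dot product concrete, the sketch below models the arithmetic in plain Python: groups of four signed 8-bit values are multiplied pairwise and summed into a wider accumulator. The function names are hypothetical, and grouping by four into a 32-bit accumulator is an assumption about how such an instruction is typically organized. Python integers do not overflow, so the wrap-around of a real 32-bit accumulator is not modeled.

```python
def dot4_accumulate(acc, a, b):
    """Return acc + sum(a[i] * b[i]) for up to four signed 8-bit values.

    Models one dot-product step (hypothetical name). The range check enforces
    the signed 8-bit operand width; the accumulator is an ordinary Python int.
    """
    for v in (*a, *b):
        if not -128 <= v <= 127:
            raise ValueError("operands must fit in a signed 8-bit value")
    return acc + sum(x * y for x, y in zip(a, b))


def int8_dot(a, b):
    """Dot product of two equal-length int8 vectors, four elements per step."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    for i in range(0, len(a), 4):
        acc = dot4_accumulate(acc, a[i:i + 4], b[i:i + 4])
    return acc


if __name__ == "__main__":
    weights = [1, 2, 3, 4, 5, 6, 7, 8]
    inputs = [1, 1, 1, 1, -1, -1, -1, -1]
    print(int8_dot(weights, inputs))  # -16
```

A matrix multiply in a DNN layer is many such dot products, which is why handling four byte-sized multiplies per step pays off.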
ARM is also providing new branch prediction support that takes a neural net-like approach. This isn’t the first time this approach has been used. AMD’s Ryzen also uses a neural net structure for its branch prediction support.
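The article does not describe how ARM's predictor works internally. As an illustration of what a neural net-like branch predictor can look like, the sketch below implements the perceptron predictor known from the computer architecture literature. This is a swapped-in textbook technique, not ARM's or AMD's design, and the table size, history length, and training threshold are assumptions.

```python
class PerceptronPredictor:
    """Textbook perceptron branch predictor (illustrative, not a vendor design)."""

    def __init__(self, history_len=8, table_size=64):
        # Global history: +1 for a taken branch, -1 for not taken.
        self.history = [1] * history_len
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Training threshold from the literature (assumed here).
        self.threshold = int(1.93 * history_len + 14)

    def _output(self, pc):
        w = self.weights[pc % len(self.weights)]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % len(self.weights)]
            w[0] += t
            for i, h in enumerate(self.history):
                w[i + 1] += t * h
        self.history = self.history[1:] + [t]


if __name__ == "__main__":
    predictor = PerceptronPredictor()
    misses = 0
    for step in range(100):
        taken = step % 2 == 0          # alternating taken / not-taken branch
        if predictor.predict(0x40) != taken:
            misses += 1
        predictor.update(0x40, taken)
    print("mispredictions:", misses)
```

The appeal of this style of predictor is that each weight learns how strongly one past branch outcome correlates with the current branch, so long histories can be tracked with modest storage.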
A typical system will often include the Mali-G72 GPU, along with existing display and video support such as the Mali-V550 video subsystem (Fig. 2). The Mali-G72 GPU extends ARM’s Bifrost architecture and provides 1.4 times the performance of earlier subsystems. It has also been enhanced to support DNNs. Its GEneral Matrix Multiply (GEMM) is 17% more energy-efficient than on earlier Mali GPUs.
The Mali-G72 GPU also targets the AR/MR/VR space with multiview drawing support. With multiview, two almost identical images are rendered, one for each eye. Software that supports AR/MR/VR can take advantage of this hardware acceleration, allowing higher frame rates with reduced overhead and lower power requirements.
The GPU includes additional AR/MR/VR enhancements such as multisample anti-aliasing and foveated rendering. With foveated rendering, higher-definition processing is applied only to the area where the eyes are focused, which is determined by tracking where the eye is looking.
The Mali-G72 GPU also has Adaptive Scalable Texture Compression (ASTC) support. The transaction elimination (TE) support works on 16- by 16-pixel blocks to identify identical blocks between two consecutive render targets. The Smart Composition feature extends TE to every stage of the user interface composition system. It eliminates the need to read and process identical information.
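The idea behind transaction elimination can be sketched in software: compute a signature for each 16- by 16-pixel block, compare it with the signature of the same block in the previous frame, and write out only the blocks that changed. The sketch below is a simplified model, not the hardware's actual algorithm. The CRC32 signature, the single-byte pixel format, and the function names are all assumptions.

```python
import zlib

TILE = 16  # block edge in pixels, per the 16- by 16-pixel blocks described above


def tile_signatures(frame, width, height):
    """Map (tile_x, tile_y) to a CRC32 of that tile's pixels.

    `frame` is a list of rows of 0..255 values (assumed one byte per pixel).
    """
    sigs = {}
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            data = bytes(
                frame[y][x]
                for y in range(ty, min(ty + TILE, height))
                for x in range(tx, min(tx + TILE, width))
            )
            sigs[(tx // TILE, ty // TILE)] = zlib.crc32(data)
    return sigs


def tiles_to_write(prev_sigs, new_sigs):
    """Return the tiles whose signature differs from the previous frame."""
    return [tile for tile, sig in new_sigs.items() if prev_sigs.get(tile) != sig]


if __name__ == "__main__":
    width = height = 32
    frame_a = [[0] * width for _ in range(height)]
    frame_b = [row[:] for row in frame_a]
    frame_b[5][20] = 255  # change a single pixel in the top-right tile
    changed = tiles_to_write(
        tile_signatures(frame_a, width, height),
        tile_signatures(frame_b, width, height),
    )
    print(changed)  # [(1, 0)]
```

In this example only one of the four tiles is written back, which is the source of the bandwidth savings for mostly static content such as user interfaces.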
The high-fidelity gaming market is also supported by the Mali-G72. It delivers an 87% bandwidth savings compared to the Mali-G71. This is handled by the pixel local storage (PLS) G-Buffer.
The Cortex-A55, Cortex-A75, and Mali-G72 can be used by themselves, but they are designed to be integrated. They will likely wind up in high-end system-on-chip (SoC) solutions for the mobile space.