
Cortex-A78 Brings Machine Learning to Smartphones

May 28, 2020
The latest crop of IP from Arm targets smartphones that support machine-learning applications.

Arm announced its latest smartphone IP, which includes the Cortex-A78, Mali-G78, and Ethos-N78, along with the Cortex-X1, an optimized custom processor core that complements the Cortex-A78. The Ethos-N78 pairs with the Cortex-A78 in much the same way that the company’s latest microcontroller machine-learning (ML) offering pairs the Cortex-M55 with the Ethos-U55.

The Cortex-A78 provides a 30% performance improvement over the Cortex-A77 while staying within a 1-W/core power budget. The Cortex-A78 can run at 3 GHz versus 2.6 GHz for the Cortex-A77. The eight-core Cortex-A78/Cortex-A55 combination is also 15% smaller than last year’s Cortex-A77/Cortex-A55 combination. The Cortex-A78 includes ML enhancements, but it will normally be paired with the Ethos-N78 for more demanding ML workloads.
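As a rough sanity check on the clock figures above, the following Python sketch computes the frequency uplift from 2.6 GHz to 3 GHz. It then assumes (the article does not state this) that the 30% figure is a performance gain measured at those clocks and that it splits multiplicatively into a frequency part and a per-clock part. The function names are illustrative only.

```python
def clock_uplift(new_ghz: float, old_ghz: float) -> float:
    """Fractional clock-frequency increase, e.g. 0.15 for +15%."""
    return new_ghz / old_ghz - 1.0


def residual_per_clock_gain(total_gain: float, freq_gain: float) -> float:
    """Per-clock gain implied if total speedup = frequency gain x per-clock gain.

    The multiplicative split is an assumption for illustration,
    not something Arm has stated.
    """
    return (1.0 + total_gain) / (1.0 + freq_gain) - 1.0


freq = clock_uplift(3.0, 2.6)                   # figures quoted in the article
per_clock = residual_per_clock_gain(0.30, freq)
print(f"clock uplift: {freq:.1%}")
print(f"implied per-clock gain: {per_clock:.1%}")
```

Under those assumptions, the clock increase alone accounts for roughly 15% and the remainder, about 13%, would have to come from per-clock improvements.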

The Ethos-N78 design is scalable, providing from 1 to 10 TOPS. It bests the Ethos-N77 by doubling performance and improving DRAM bandwidth efficiency by up to 40%. Performance efficiency can improve by more than 25% for some ML models.
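To put the 1-to-10-TOPS range in perspective, here is a back-of-the-envelope Python estimate of inference time from peak throughput. The model size (2 billion operations per inference) and the 50% utilization factor are hypothetical values chosen for illustration; they are not Arm figures, and real throughput also depends on memory bandwidth and on how well a model maps to the hardware.

```python
def inference_time_ms(ops_per_inference: float,
                      peak_tops: float,
                      utilization: float = 1.0) -> float:
    """Estimate the time for one inference in milliseconds.

    ops_per_inference: operations needed per inference (hypothetical input)
    peak_tops:         peak throughput in tera-operations per second
    utilization:       fraction of peak actually sustained (assumed, 0 < u <= 1)
    """
    if peak_tops <= 0 or not 0 < utilization <= 1:
        raise ValueError("peak_tops must be > 0 and utilization in (0, 1]")
    sustained_ops_per_s = peak_tops * 1e12 * utilization
    return ops_per_inference / sustained_ops_per_s * 1e3


# Hypothetical 2-GOP model at both ends of the quoted 1-to-10-TOPS range
for tops in (1, 10):
    ms = inference_time_ms(2e9, tops, utilization=0.5)
    print(f"{tops:>2} TOPS: {ms:.2f} ms per inference")
```

With these assumed inputs, the estimate drops from about 4 ms at 1 TOPS to about 0.4 ms at 10 TOPS, which is simply the tenfold scaling of the peak figure.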

The Mali-G78’s improvements target complex gaming scenes with effects such as smoke, moving grass, and trees, delivering a more realistic virtual environment. The design changes yield a 17% performance improvement. The platform supports up to 24 GPU cores that are 30% better in terms of power consumption, due in part to an asynchronous design technology.

The Mali-G78 can also take on ML chores. It offers a 15% ML performance uplift compared to the Mali-G77. The Ethos-N78 will still be used for heavy lifting, but a CPU/GPU combination can handle many ML tasks. It’s even possible to distribute work across a CPU/GPU/NPU configuration. There’s also a Mali-G68 that has up to six cores. It targets sub-premium-tier devices that can benefit from a smaller footprint and a 30% power reduction.

Arm is providing a Performance Advisor that highlights performance bottlenecks across the SoC—not just one component. It’s part of the Arm Mobile Studio that’s free to licensees. The generated reports provide improvement suggestions as well.

The Cortex-X1 is an interesting departure from Arm’s conventional IP releases. It’s designed to deliver a 30% peak performance boost (Fig. 1) and is part of the Cortex-X Custom (CXC) program. Arm architecture licensees like Apple, which design their own cores, have been able to push the envelope, reaching levels that weren’t possible for vendors using the standard Arm IP. The Cortex-X1 is a step toward allowing those vendors to match these improvements.

Two performance areas that stand out with the Cortex-X1 are integer performance and ML (Fig. 2). The integer improvements are incremental, but the machine-learning performance change is significant. It can also be a major factor in a system design that may not include an Ethos-N78.

The Cortex-X1 doubles the amount of L1, L2, and L3 cache. It also doubles the number of 128-bit Neon SIMD units to four. A Cortex-X1 fits into Arm’s DynamIQ cluster, so it’s possible to have more than one of these cores. However, a single one is likely to be the norm. The Cortex-A78 is already a capable core; it’s a matter of balancing burst operation with performance and power constraints.
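The effect of going from two to four 128-bit Neon vector units can be illustrated with simple lane arithmetic. The Python sketch below assumes, purely for illustration, that every unit can issue one full-width vector operation per cycle; actual throughput depends on the instruction mix and the microarchitecture. The function name is hypothetical.

```python
NEON_VECTOR_BITS = 128  # Neon vector register width


def simd_lanes_per_cycle(units: int, element_bits: int,
                         vector_bits: int = NEON_VECTOR_BITS) -> int:
    """Peak element lanes processed per cycle.

    Assumes each unit issues one full-width vector op per cycle,
    which is an idealization rather than a documented guarantee.
    """
    if element_bits <= 0 or vector_bits % element_bits:
        raise ValueError("element width must evenly divide the vector width")
    return units * (vector_bits // element_bits)


# Compare two units (the pre-doubling count) with the Cortex-X1's four
for bits in (8, 16, 32):
    two = simd_lanes_per_cycle(2, bits)
    four = simd_lanes_per_cycle(4, bits)
    print(f"{bits:>2}-bit elements: {two:>2} -> {four:>2} lanes per cycle")
```

For 8-bit data, common in quantized ML inference, this idealized peak goes from 32 to 64 lanes per cycle, which is consistent with ML being the area where the Cortex-X1 stands out most.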

Overall, the new combination of cores provides incremental improvements. The Cortex-X program will be one to watch to see how vendors can take advantage of the customizations and whether developers will be able to utilize these enhancements.
