ARM has provided a little insight into its next-generation architecture, known as DynamIQ (Fig. 1). The architecture improves 64-bit, multicore Cortex-A integration with hardware accelerators. This will allow developers to incorporate on-chip acceleration units that target artificial intelligence applications.
ARM says doing so could provide a 50× performance boost in these applications. It also claims a 10× improvement in communication response time between the processing cluster and hardware accelerators compared to existing architectures. The details of how these gains are attained are still fuzzy, but ARM did reveal a few reasons why system performance will be enhanced.
1. ARM’s next-generation DynamIQ architecture targets AI applications.
The DynamIQ architecture is built around a cluster of up to eight cores (Fig. 2), versus the current four-core cluster approach. It has a redesigned memory subsystem that will work with the new ARMv8.2 Cortex-A cores that are compatible with DynamIQ. There is also a new interface that links hardware accelerators to the cluster.
2. The DynamIQ architecture supports up to 8 cores in a cluster, and the cores may all have different configurations.
DynamIQ extends the big.LITTLE approach (Fig. 3) by allowing each core in the cluster to be different, so each core can be used in an optimal fashion. Existing big.LITTLE systems, and the operating system support for them, typically deal with two different core types, often placed in different clusters. DynamIQ will also allow finer-grain speed control and power management, because cores can be managed independently.
3. DynamIQ extends the big.LITTLE approach allowing each core to be used in an optimized fashion as prescribed by the operating system and applications.
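To make the idea of a mixed cluster concrete, here is a minimal sketch of how an operating system might pick a core for a task when every core in the cluster can be different. It is not ARM's scheduler or any real API: the `Core` and `Cluster` classes, the `pick_core` policy, and all performance and power numbers are hypothetical and chosen only for illustration. The one detail taken from the article is the limit of eight cores per cluster.

```python
from dataclasses import dataclass

# Upper bound on cluster size, as stated in the article.
MAX_CORES_PER_CLUSTER = 8


@dataclass(frozen=True)
class Core:
    """Hypothetical core description; the units are arbitrary and illustrative."""
    name: str
    perf: int   # relative peak performance
    power: int  # relative power cost while active


class Cluster:
    """Hypothetical model of a cluster whose cores may all differ."""

    def __init__(self, cores):
        cores = list(cores)
        if not 1 <= len(cores) <= MAX_CORES_PER_CLUSTER:
            raise ValueError("a cluster holds between 1 and 8 cores")
        self.cores = cores

    def pick_core(self, demand):
        """Return the lowest-power core that meets the demand.

        If no core is fast enough, fall back to the fastest core available.
        This policy is an assumption, not a documented DynamIQ behavior.
        """
        capable = [c for c in self.cores if c.perf >= demand]
        if capable:
            return min(capable, key=lambda c: (c.power, c.perf))
        return max(self.cores, key=lambda c: c.perf)


# One cluster mixing three core types, something a two-type,
# two-cluster big.LITTLE layout would not express directly.
cluster = Cluster([
    Core("big0", perf=100, power=10),
    Core("mid0", perf=60, power=5),
    Core("little0", perf=25, power=1),
    Core("little1", perf=25, power=1),
])

light = cluster.pick_core(20)    # background work
medium = cluster.pick_core(50)   # moderate load
heavy = cluster.pick_core(500)   # exceeds every core's capability
```

With these made-up numbers, the light task lands on a little core, the moderate task on the mid-range core, and the oversized task on the big core. A real scheduler would also weigh factors such as current load, thermal limits, and per-core frequency settings.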
The architecture will include some new instructions to support the new features and integration with accelerators, including instructions to enhance machine learning (ML) and artificial intelligence (AI). It is also designed to facilitate ASIL D support to address safety-critical applications such as automotive advanced driver-assistance systems (ADAS) and self-driving cars.
The memory subsystem redesign provides improved, tightly coupled memory. Memory blocks can be managed individually to save power, and the caching system has been restructured to better handle multilevel cache designs.
The accelerator interface will be an open standard like ARM’s current AMBA interface. Vendors can create compatible designs, allowing developers to easily incorporate third-party hardware.