Multicore processors have dominated the high end of the computing spectrum, including high-end mobile devices. Arm’s big.LITTLE approach even mixes different core types on one chip to reduce power consumption in more powerful smartphones and tablets (see “Little Core Shares Big Core Architecture” on electronicdesign.com).
The off-the-shelf designs that have typically come closest to asymmetric solutions are platforms like AMD’s accelerated processing units (APUs), which combine CPU and GPU cores on the same chip (see “APU Blends Quad Core x86 With 384 Core GPU” on electronicdesign.com). A range of microcontrollers also combine a CPU and a GPU on a single chip. GPUs are increasingly used for general-purpose computation, not just display support. Still, GPU computing normally requires a different programming environment, such as OpenCL or CUDA, rather than the common programming environment used on symmetric multiprocessing (SMP) systems.
Asymmetric multiprocessing (AMP) configurations have been less common. However, off-the-shelf asymmetric microcontrollers are becoming more readily available, giving developers more flexible solutions for power-constrained applications such as battery-powered mobile devices.
One of the first AMP microcontrollers was NXP’s LPC4300 family (see “New Platform Approaches Deliver Top Digital Designs In 2010” on electronicdesign.com). The LPC4300 combines 32-bit Arm Cortex-M0 and Cortex-M4 cores on the same chip (Fig. 1). The Cortex-M4 instruction set is a superset of the Cortex-M0’s, and the Cortex-M0 shares memory and some peripherals with the Cortex-M4. How application code is split between the two cores is application-specific. For example, the Cortex-M0 might handle communication while the Cortex-M4 runs the user interface.
The LPC4300 is used in the Pixy Cam smart camera (see “A Tale Of Two Camera Kits” on electronicdesign.com). The Cortex-M0 acts as a peripheral controller, capturing data from the camera and performing some limited post-processing before passing the data to the Cortex-M4, which handles the image analysis. The analysis results, which can include the size and position of colored objects, are passed to the host via USB or a serial port.
The Ineda Systems Dhanush Wearable Processing Unit (WPU) family (Fig. 2) has up to three MIPS-compatible cores (see “Hierarchical Processors Target Wearable Tech” on electronicdesign.com). The entry-level member is a single-core system based on the MIPS microAptiv UC core (see “MIPS Aptiv Family Brings Consolidation And Raises Performance Bar” on electronicdesign.com). At the high end, the third core is an interAptiv. The WPU is designed for power-constrained environments, and its cores can operate individually or in concert. The lowest-power active mode uses only the smallest core, SRAM, and some peripherals.
These microcontrollers have advantages and disadvantages from a programming standpoint. Hardware partitioning enables cleaner software partitioning, but at the cost of added software complexity. Power-management tradeoffs are also more complex because multiple cores and subsystems are involved, although hardware partitioning gives developers finer control over the system and its power usage.
Specialized Peripheral Controllers
Other AMP configurations include platforms with specialized peripheral controllers that have some degree of programmability but tend to be less capable than the main CPU core. For example, the Texas Instruments Sitara AM437x has a single Arm Cortex-A9 core (Fig. 3). It also has four Programmable Real-time Unit and Industrial Communication Subsystem (PRU-ICSS) cores.
Each PRU-ICSS core is a simple 32-bit RISC processor that keeps its code and data in SRAM. It is typically programmed in assembly language, and its instruction set has only about 40 instructions. Like the cores in the other AMP systems, these cores can operate while the main core is shut down. This programmable subsystem provides significantly more functionality than smart peripherals or advanced direct memory access (DMA) subsystems.
The PRU-ICSS can also provide higher-level protocol support on communication links or perform advanced motor control. Because each core has its own program memory, it executes deterministically, independent of all other cores in the system. Shared memory supports interprocessor communication.
Of course, the challenge is to weigh the complexity of a multicore system against the alternative of a single faster, more powerful processor. Multicore has become a requirement at the high end because of power and clock-frequency limitations, but those tradeoffs play out differently for microcontrollers.