Symmetric Multiprocessing Vs. Asymmetric Processing

Download this article in .PDF format
This file type includes high resolution graphics and schematics when applicable.

Engineers contemplating a migration from a single-core to a multicore processor must identify where parallelism exists in their application. The next decision is how to partition the code over the cores of the device. The two main options are symmetric-multiprocessor (SMP) mode and asymmetric-multiprocessor (AMP) mode.

In some cases, combinations of these make sense as well. There’s just one kernel in SMP mode, and it’s run by all cores. In AMP mode, each core has its own copy of a kernel, which could be different (heterogeneous operating systems) from, or identical (homogenous operating systems) to, the one the other core is executing.

In SMP mode, a single operating system (OS) runs on all processors, which access a single image of the OS in the memory. The OS is responsible for extracting parallelism in the application. It dynamically partitions tasks across the processors, manages the ordering of task completion, and controls the sharing of all resources between the cores.

The latter point is important, especially if there isn’t a one-to-one relationship of chip resources (such as memory controllers) to cores. In this case, the OS will dedicate one core to “own” the resource and alert others if they need to know, but the user needn’t know which core owns the resource.

To the user, the program appears to run on a single processor, and much of the chip complexity is “under the hood” of the OS. This is the simplest programming model. It’s frequently used to scale the performance of an application that has parallelism (i.e., processes), but isn’t inherently parallel (as is the case when managing data streams originating from millions of separate users).

Performance will scale less than linearly with the number of cores put in place. In fact, the incremental value of adding another core approaches zero as the high degree of intercore communication eventually overwhelms the gain adding the core.

In AMP mode, the processor cores in the device are largely unaware of each other. Separate OS images exist in main memory, though there may be a shared location for interprocessor communications. AMP may take the form of multiple instances of the same non-SMP-aware OS.

Or, different OSs may reside on the two cores. For example, if two previously separate processors are being collapsed onto one dual-core device, different OSs may reside on the two cores.

In another usage model for AMP, a statically partitioned, task-offload arrangement-one core manages the computationally intensive calculations while the other runs the main OS. This arrangement sometimes is used with a real-time OS (RTOS), which forwards packets, and the Linux OS, which implements higher-level applications.

In any of these three cases, the user must pay attention to the resource sharing between the two OSs. Neither OS owns the entire chip, so resource sharing isn’t “under the hood” as in SMP mode.

If there’s a single instance of a resource on the multicore device, one core typically “owns” it. That means it receives interrupts and error messages and would be responsible, at the application-level, for passing the required information to the other core.

AMP has some advantages, though. It’s the only approach that works when two separate OSs are in place. Also, resources can be dedicated to critical tasks, resulting in more deterministic performance. And it often has higher performance than SMP, because the cores spend less time handshaking with each other.

The decision to use SMP or AMP largely will be about what’s easiest to implement. Applications that already run on an SMP-aware OS, such as Linux, can easily scale by adding more cores under SMP. AMP is a good choice if the application has obvious parallelism that’s easily partitioned to the number of cores at the user level.

Networked multiprocessing is an asymmetric implementation mostly found in embedded architectures where the system typically has a network connection, shared bus, or shared mailbox for communication between processing elements. Otherwise, the main memory isn’t shared.