Multicore MCUs Propel Performance To New Heights

Dec. 8, 2004
A trio of multicore microcontrollers expands computing-power and bandwidth boundaries.

Pushing the envelope? Get an extra shove by adding a few extra processor cores to that high-performance microcontroller (MCU). Just such an approach was taken by Broadcom, Freescale, and PMC-Sierra: Start with high-performance 64-bit processor cores and multiply their effectiveness with one to three additional cores.

Multiple-core solutions offer benefits that aren't possible with a faster, single-core solution. Interrupt handling can improve because each processor can service an interrupt, so several can be handled at once. A faster single processor will get through each interrupt more quickly, but it must still complete one interrupt-handling routine before moving on to the next interrupt.
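To put rough numbers on the interrupt argument, here is a toy queueing sketch. It assumes every interrupt is non-preemptive, all handlers take the same time, and any free core can take the next interrupt. The function name and the timing values are hypothetical and are not taken from any of the three chips.

```python
import heapq

def worst_case_latency(arrivals, service_time, cores):
    """Longest wait between an interrupt arriving and its handler starting.

    arrivals: arrival times in ascending order
    service_time: how long one handler runs (same for every interrupt)
    cores: number of cores that can each run one handler at a time
    """
    # Min-heap holding the time at which each core next becomes free.
    free_at = [0.0] * cores
    heapq.heapify(free_at)
    worst = 0.0
    for t in arrivals:
        core_free = heapq.heappop(free_at)      # earliest available core
        start = max(t, core_free)               # wait if that core is still busy
        worst = max(worst, start - t)
        heapq.heappush(free_at, start + service_time)
    return worst

# Hypothetical burst: four interrupts arrive at the same instant.
burst = [0.0, 0.0, 0.0, 0.0]
single_fast = worst_case_latency(burst, service_time=5.0, cores=1)   # one core at 2x clock
dual_base = worst_case_latency(burst, service_time=10.0, cores=2)    # two cores at base clock
quad_base = worst_case_latency(burst, service_time=10.0, cores=4)    # four cores at base clock
print(single_fast, dual_base, quad_base)
```

In this simplified model the last interrupt in the burst waits longest on the single fast core, because the handlers run back to back, while four cores start all four handlers immediately. Real behavior depends on interrupt routing, priorities, and memory contention, none of which is modeled here.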

An application written for a multiprocessor environment will be more complex, but it can typically run on any number of processors. Some MCU solutions, like Broadcom's BCM1480, also support NUMA (non-uniform memory access).

Striking similarities exist among the three MCUs. Each has multiple 64-bit processor cores tied to a high-speed, internal interconnect. All sport quad Gigabit Ethernet interfaces, which isn't surprising given that one of the target application areas is communications.

PMC-Sierra's RM11200 incorporates a HyperTransport interface along with dual PCI Express interfaces. Freescale's MPC8641D has two independent high-speed serial interfaces in addition to standard interfaces like PCI Express and Serial RapidIO. Dual SDRAM memory controllers deliver data as fast as it can be extracted from the off-chip memory. Broadcom's BCM1480 stays with the tried and true 133-MHz, 64-bit PCI-X interface for off-chip peripherals, but it's no slouch when it comes to communication or expansion. In fact, its support for NUMA via HyperTransport links enables multicore, multichip solutions with up to 16 processor cores.

Beyond their many similarities, each chip possesses unique features and uses different techniques that provide low-latency, high-throughput data flow.

A COUPLE OF MIPS

The 1.8-GHz RM11200 is built around a 16-port XBAR nonblocking, asynchronous switch with a low latency of just 3 ns (Fig. 1). The device only uses 11 of the ports, so expect new chips based around the same architecture.

Each port has a bandwidth of 192 Gbits/s, and aggregate throughput measures 1.5 Tbits/s. All transactions are pipelined. Of note is the asynchronous switch, which eliminates the global-clock skew problem found in high-speed, synchronous designs. It also simplifies the interfacing of each port while boosting performance. A processor-specific port handles cache coherency and supports a bandwidth of over 200 Gbits/s.

PMC-Sierra has put a lot of design time into the cache. Each processor has its own level 1 and level 2 cache protected by ECC. With PMC-Sierra's Direct Deposit technology, data from peripherals can be placed directly into a processor's cache.

The MIPS cores are the latest design, with seven-stage superscalar pipelines and an 8k-entry jump prediction table. EJTAG and trace support are standard, because off-chip in-circuit emulation (ICE) is totally impractical with a multicore design.

The RM11200 has an interesting mix of peripheral interfaces. It sports HyperTransport and dual PCI Express interfaces. HyperTransport opens the RM11200 to a multiprocessor environment, though the interface doesn't support NUMA-based memory sharing.

PAIR OF PowerPCs

Freescale's 1.5-GHz MPC8641D also packs a pair of e600 PowerPC processors onto one chip (Fig. 2). Its MPX bus coherency interface provides cache coherency support similar to that of the RM11200. But the MPC8641D uses a very fast parallel bus as its internal interconnect. The bus is sufficient for the needs of two processors, though its performance would have to increase if more cores were added in a future design.

Like the RM11200, the MPC8641D gives each processor its own level 1 and level 2 cache. This is especially effective when the operating system supports processor affinity.
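As a rough illustration of processor affinity, the sketch below pins the current process to a single core so that its working set tends to stay in that core's private caches. It uses the Linux-only `os.sched_setaffinity` and `os.sched_getaffinity` calls from the Python standard library. The helper name `pin_to_core` is hypothetical, and nothing here is specific to the MPC8641D or to any operating system that Freescale supports.

```python
import os

def pin_to_core(core_id):
    """Pin the calling process to one core and return the new affinity set.

    Returns None when the platform does not expose the affinity calls
    or refuses the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                      # not Linux; affinity API unavailable
    allowed = os.sched_getaffinity(0)    # cores this process may currently use
    if core_id not in allowed:
        core_id = min(allowed)           # fall back to the lowest permitted core
    try:
        os.sched_setaffinity(0, {core_id})
    except OSError:
        return None                      # restricted environment; leave affinity alone
    return os.sched_getaffinity(0)

pinned = pin_to_core(0)
print(pinned)
```

Pinning trades scheduling flexibility for cache locality, so it tends to pay off for long-running, cache-sensitive tasks rather than for short-lived ones.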

The MPC8641D shows its heritage with support for 1x or 4x Serial RapidIO instead of HyperTransport. One advantage of Serial RapidIO is that it can be used as a switched backplane technology. Each Serial RapidIO lane operates at 2.5 Gbits/s. Serial RapidIO has found a home in the communications space, so expect many MPC8641Ds linked by a switch fabric, such as in an AdvancedTCA 3.5-based system. A developer will have to forfeit one of the PCI Express interfaces to use Serial RapidIO.
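The 1x and 4x figures translate into link bandwidth with simple arithmetic. The sketch below assumes the 2.5-Gbit/s number is the raw per-lane signaling rate and that the link uses 8b/10b line coding, as first-generation Serial RapidIO does. Packet and protocol overhead are ignored, so delivered throughput would be lower still.

```python
LANE_RATE_GBITS = 2.5   # raw per-lane rate quoted in the article

def link_rates(lanes):
    """Return (raw, payload) rates in Gbits/s for a link of the given width.

    Assumes 8b/10b coding: 8 data bits are carried in every 10 line bits.
    """
    raw = lanes * LANE_RATE_GBITS
    payload = raw * 8 / 10
    return raw, payload

for width in (1, 4):
    raw, payload = link_rates(width)
    print(f"{width}x: {raw} Gbits/s raw, {payload} Gbits/s after 8b/10b")
```

Under these assumptions a 4x link carries four times what a 1x link does, which is the main reason to choose the wider configuration for a backplane fabric.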

The PCI Express interfaces can be set up individually as a root or endpoint at initialization time. As a result, the chip can act like a PCI Express peripheral itself or control one.

The other notable feature in Freescale's chip is found within each Gigabit Ethernet port. The ports can be set up as a simple 8-bit FIFO, or a pair can be linked as a 16-bit FIFO. When used as an Ethernet port, it supports TCP/IP checksum and IPv6 layer 4 hardware acceleration. The port also provides 16 quality-of-service queues. Layer 2 features include VLAN insertion and deletion per frame and a 16-exact-match MAC address table.
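To show what the checksum acceleration saves the processor cores, here is a software version of the standard 16-bit ones'-complement Internet checksum (RFC 1071) used by IPv4, TCP, and UDP. It is a generic sketch of the algorithm, not Freescale's implementation, and the sample header bytes are a commonly used textbook example rather than traffic from this device.

```python
def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit ones'-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

# Sample 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)
print(hex(checksum))  # 0xb861

# A receiver verifies by summing the header with the checksum in place.
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))  # 0
```

Running this loop over every byte of every packet at gigabit rates costs real processor time, which is why moving it into the Ethernet port hardware is worthwhile.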

MULTIPLE QUADS

Broadcom's 1.2-GHz BCM1480 represents a more advanced version of its BCM1250 (Fig. 3). The BCM1480 brings significant enhancements, including three SPI-4/HyperTransport (HT) links with ccNUMA (cache coherent NUMA) support. The ccNUMA support operates in a fashion similar to AMD's Opteron processors, allowing a multichip, multiprocessor system to be constructed without additional glue logic. It's simply a matter of hooking the chips together via the HT links. There's a delay in accessing remote memory, but most accesses are a single HT hop away. Combined with large caches, this keeps the overhead of off-chip access to a minimum.

Designers can choose the necessary mix of SPI-4 and HT links. The HT links can be used to access HT peripherals or memory on another BCM1480. Three ports allow designers to create a number of different configurations from a mesh of processors or SPI-4 interconnects to a bridge between the two interfaces. The BCM1480 should find homes in SPI-4 packet-processing applications.

Expect more multicore chips and matching application software in the next few years. The trend is in place and will continue to unfold.



