Hybrid 32-Bit MCUs Master Memory, Power, And Price

Large, fast, low-cost memory has been a boon to 32-bit MCUs, turning this market segment into one of the most competitive. Their larger die size leaves room for accelerators and processing peripherals. FPGAs, DSP instructions, and coprocessors abound. Even Java byte-code acceleration can be found in standard components.

Many high-performance 32-bit microcontrollers (MCUs), such as Motorola's 56F8300, follow the conventional MCU definition by incorporating nonvolatile memory with RAM and a host of peripherals. Others minimize the amount of external hardware necessary to support external memory and peripherals. Another approach is to use a high-speed transport mechanism, such as HyperTransport or RapidIO. On-chip peripherals tend to be more powerful than their 8- and 16-bit counterparts. Likewise, high-end 32-bit cores may have DSP, SIMD, or floating-point support. Some integrate these features into a single core, while others turn to multiprocessing cores like Oki Semiconductor's ML67Q5200 (Fig. 1).

This level of optimization and integration has obvious payoffs in performance, but it can also yield better power utilization than low-end, 32-bit parts. Kevin Klien, standard products marketing manager for Motorola's 32-bit Embedded Controller Division, indicates that a high-performance, 32-bit part running at a slower speed often is a better choice than a less sophisticated, 32-bit MCU. In addition, completing a job more quickly and efficiently may allow the MCU to drop into a lower-power mode, such as a reduced clock speed or supply voltage, while a different MCU would need to continue running.
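The power argument above can be illustrated with simple arithmetic. The sketch below compares the energy used over one task period by a part that finishes quickly and then sleeps against a part that stays busy for most of the period. The function name and every figure in it are hypothetical values chosen for illustration, not measurements of any MCU named in this article.

```python
def energy_per_period_mj(active_mw, sleep_mw, busy_ms, period_ms):
    """Energy in millijoules for one period: active while busy, asleep otherwise."""
    if busy_ms > period_ms:
        raise ValueError("task does not fit in the period")
    idle_ms = period_ms - busy_ms
    # milliwatts * milliseconds = microjoules; divide by 1000 for millijoules
    return (active_mw * busy_ms + sleep_mw * idle_ms) / 1000.0


# Hypothetical parts: one draws more power but finishes the job six times sooner.
fast_part = energy_per_period_mj(active_mw=300, sleep_mw=1, busy_ms=10, period_ms=100)
slow_part = energy_per_period_mj(active_mw=100, sleep_mw=1, busy_ms=60, period_ms=100)

print(f"fast part: {fast_part:.2f} mJ, slow part: {slow_part:.2f} mJ")
```

Under these assumed numbers, the faster part uses roughly half the energy per period even though its active power is three times higher. The result depends entirely on the sleep current and on how much sooner the job completes.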

Effective use of on-chip peripherals and computational resources is what 32-bit MCUs do best. But off-chip peripherals and communication are sometimes needed to exploit off-chip resources or coprocessors.

HIGH-PERFORMANCE INTERCONNECTS A number of 32-bit MCUs incorporate Ethernet support, which provides a high-speed networking solution. However, it comes with significant software overhead. HyperTransport and RapidIO offer interconnects that match the performance of the core processor while requiring low communication overhead.

Intrinsity's FastMath and FastMIPS MCUs combine a high-speed MIPS-32 processing core with a pair of RapidIO ports (Fig. 2). They also support vector and matrix operations.

The 2-GHz FastMath chip incorporates a 4-by-4 SIMD array of 32-bit processing elements, each with its own local register file. The 1 Mbyte of on-chip memory is configurable as a layer 2 (L2) cache or SRAM. The dual RapidIO ports offer a 4-Gbyte/s aggregate throughput.

HyperTransport, which has appeared in more 64-bit MCUs than 32-bit MCUs, suits high-performance 32-bit MCUs. Nonetheless, RapidIO seems to have the edge in terms of the number of MCUs available with this communication option. RapidIO is also popular with 32-bit DSPs, which have been gaining more general-purpose processor support.

CUSTOM AUGMENTATION Meeting high-performance application needs can be tough for a 32-bit MCU, even with the wide range of communication and on-chip peripherals. On-chip customization is one way to close that gap.

QuickLogic's solution combines an FPGA with a MIPS-32 processing core (Fig. 3). The FPGA can be used to augment I/O transfers or implement parallel-processing algorithms. Many algorithms, such as encryption, can be implemented more efficiently in hardware than in software. The FPGA can be reprogrammed on the fly, so multiple-part algorithms can be implemented in a step-wise fashion.

Although their products are not strictly MCU solutions, Xilinx and Altera provide large logic arrays that can be used to implement one or more 32-bit processors (Altera's Nios soft RISC processor is available in 16- and 32-bit versions). Thus, there's plenty of room for peripherals and custom logic. These companies have also been delivering more fixed solutions at a lower cost, implementing the systems originally programmed into the reprogrammable logic array parts.

Ubicom takes a software approach to peripherals (see "Hardware Scheduling Accelerates Soft Peripherals," p. 58). It includes a few standard peripherals like Ethernet media access controllers (MACs), but it provides others as virtual peripherals implemented as bit-banging device drivers. A very high-speed processor efficiently implements byte and bit processing, making the approach practical for a variety of peripheral implementations. In contrast to QuickLogic's FPGA approach, the software approach allows for the creation of peripherals on demand and the implementation of complex algorithms.
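To make the idea of a bit-banged virtual peripheral concrete, here is a minimal sketch of a software UART transmitter. It is a generic model, not Ubicom's implementation: the choice of a UART with 8N1 framing, the function names, and the `set_pin` and `wait_bit_time` callbacks standing in for a GPIO write and a timing delay are all assumptions made for illustration.

```python
def uart_frame_bits(byte):
    """Line levels for one 8N1 frame: start bit, 8 data bits LSB first, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    bits = [0]                                    # start bit pulls the line low
    bits += [(byte >> i) & 1 for i in range(8)]   # data bits, least significant first
    bits.append(1)                                # stop bit returns the line high
    return bits


def bitbang_send(data, set_pin, wait_bit_time):
    """Drive each bit onto the pin, holding it for one bit time."""
    for byte in data:
        for level in uart_frame_bits(byte):
            set_pin(level)
            wait_bit_time()


# Simulated pin: record the levels instead of touching hardware.
trace = []
bitbang_send(b"U", set_pin=trace.append, wait_bit_time=lambda: None)
print(trace)
```

On real hardware the processor must hit each bit boundary on time, which is why the approach depends on a fast core with predictable timing.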

STANDARD ARM One architecture that has been relatively devoid of standard parts is Arm. Initially found in custom system-on-a-chip (SoC) products, 32-bit Arm processors have gained a reputation for lower power consumption and high performance.

Lately there's been a flood of standard Arm-based MCUs from the likes of Sharp, STMicroelectronics, Philips Semiconductor, and Samsung. These standard parts span the architectural gamut from low-end 32-bit MCUs running the 16-bit Thumb instruction set to high-end Arm processors.

Standard Arm components have been available, though most have been customized for a particular application. AMD's Alchemy targets mobile devices. Also, Digi International's NetSilicon Net+Arm is designed for embedded network devices. In fact, it was one of the first to incorporate an Ethernet adapter and a bundled operating system.

DOUBLE DOWN The Arm architecture is great for data manipulation but falls short with DSP chores. Texas Instruments' OMAP architecture overcomes this limitation by pairing an Arm processor with a TI DSP (Fig. 4). This lets each processor handle the work it suits best. The two are tied together with a shared mailbox interface.
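A mailbox interface between two processors can be modeled roughly as follows. This is a generic one-slot mailbox sketch, not TI's register layout or API: the `Mailbox` class, its method names, and the command value are hypothetical. In hardware, the full flag would typically raise an interrupt on the receiving processor rather than being polled.

```python
class Mailbox:
    """One-slot mailbox: the sender posts a message word, the receiver fetches it."""

    def __init__(self):
        self._word = None
        self._full = False

    def post(self, word):
        """Store a word if the slot is free; return False if the receiver is behind."""
        if self._full:
            return False
        self._word = word
        self._full = True
        return True

    def fetch(self):
        """Return the pending word and free the slot, or None if nothing is waiting."""
        if not self._full:
            return None
        word = self._word
        self._full = False
        return word


# One mailbox per direction keeps the two processors from contending for a slot.
mcu_to_dsp = Mailbox()
dsp_to_mcu = Mailbox()

mcu_to_dsp.post(0x0001)            # hypothetical "start filter" command
command = mcu_to_dsp.fetch()       # DSP side picks it up
dsp_to_mcu.post(command | 0x8000)  # DSP acknowledges by echoing with a done bit
reply = dsp_to_mcu.fetch()
print(hex(reply))
```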

Matching two processors can create significant development advantages. The architecture is well defined, enabling development tools and third-party libraries to be customized for the chip. The software can also take advantage of the power-management features found on a standard chip. Moreover, a two-processor architecture is more responsive. That's because the DSP portion needn't be interrupted or crippled to handle non-DSP chores due to the presence of a processor dedicated to that function.

CRUNCHING NUMBERS A wider bus width makes it easier to handle larger data words. Because floating-point support is also important in the 32-bit realm, you'll find a good number of 32-bit MCUs with at least one floating-point unit.

Motorola's PowerQUICC III also incorporates single-instruction, multiple-data (SIMD) instructions. Integer SIMD support can augment audio and image processing, encryption, and a number of other integer and floating-point intensive operations. While not as sophisticated as Motorola's AltiVec array processing system, found on its PowerPC host processors, the SIMD support is more than adequate for most embedded applications.
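The benefit of integer SIMD is that one operation works on several small values packed into a single word. The sketch below models a packed add of four unsigned 8-bit lanes held in one 32-bit word, with each lane wrapping independently. It is a generic illustration and does not represent the PowerQUICC III instruction set; the function name is hypothetical.

```python
def packed_add_u8(a, b):
    """Add four unsigned 8-bit lanes packed in 32-bit words; each lane wraps mod 256."""
    # Add the low 7 bits of every lane; a carry cannot cross into the next lane.
    low7 = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    # Fold each lane's top bit back in with XOR, discarding the lane's carry-out.
    top = (a ^ b) & 0x80808080
    return (low7 ^ top) & 0xFFFFFFFF


# Lanes 0x01+0x01, 0xFF+0x01, 0x10+0x20, and 0xF0+0x20 are added in one step.
result = packed_add_u8(0x01FF10F0, 0x01012020)
print(hex(result))
```

A hardware SIMD unit performs the same lane-wise work in a single instruction, which is where the gains in audio and image processing come from.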

SURFING THE NET Network processing is another area where high-power, 32-bit MCUs have found a home. While many 32-bit MCUs provide Ethernet support, these network devices usually offer even better support. For example, IBM's PowerPC 440GX incorporates a TCP/IP acceleration engine that handles a number of TCP/IP-related chores in hardware, reducing the size of the TCP/IP stack and considerably improving system throughput.

Motorola's MPC8560 has a more flexible Communications Processor Module (CPM) that handles the on-chip communication peripherals, not just the Ethernet interfaces. It supports an aggregate bandwidth of more than 1 Gbit/s and has its own ROM and dual-port RAM. Such an approach allows peripheral control to be more automated and more sophisticated than the DMA support found on most 32-bit MCUs.

DSP OR MCU? SIMD and dual processing are two ways to improve 32-bit MCU DSP support. Another is to incorporate DSP hardware and instructions into a standard 32-bit MCU core. NEC Electronics takes this approach with its V850. The V850's MAC hardware lets it handle many DSP chores. It's a very fast processor to begin with, so there's usually enough time to handle other services as well.

Adding such support is relatively straightforward for MCU designers. It requires a few wide registers plus hardware multiply-and-add support that's usually independent of the main arithmetic unit. Floating-point support requires additional hardware along the same lines.
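The wide-register, multiply-and-add arrangement described above can be sketched as a dot product feeding a saturating accumulator, which is the core of a FIR filter. The 40-bit default width and the function names are assumptions for illustration and are not taken from the V850 or any other part mentioned here.

```python
def saturate(value, bits):
    """Clamp a value to the range of a signed integer of the given width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def mac_dot(samples, coeffs, acc_bits=40):
    """Multiply-accumulate pairs of values into a wide saturating accumulator."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc = saturate(acc + x * c, acc_bits)  # one MAC step per tap
    return acc


# Four full-scale 16-bit products overflow a 32-bit accumulator but fit in 40 bits.
full_scale = [32767] * 4
narrow = mac_dot(full_scale, full_scale, acc_bits=32)
wide = mac_dot(full_scale, full_scale, acc_bits=40)
print(narrow, wide)
```

The extra accumulator bits are what let a long run of products be summed without losing the result, which is why the registers need to be wider than the data path.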

Still, turning a 32-bit MCU into a DSP, or a close cousin, typically requires more work. Zero overhead loops, pointer register manipulation, and other DSP-oriented architectural features don't always mesh well with RISC designs.

Analog Devices has taken the other tack by giving its DSPs a more general-purpose architecture. This makes programming easier and lets the DSP handle non-DSP-related services, like network support. It's a great approach for developers well versed in DSP system design who require more conventional peripheral support without adding a second conventional processor.

Hyperstone AG understood the need for DSP support within a general-purpose operating environment by designing its E1-32 from the ground up with a RISC/DSP architecture. It incorporates a flexible peripheral interface so no additional logic is needed for most memory, flash, or peripherals.

ROLL YOUR OWN Off-the-shelf MCUs address most developers' needs, but custom implementations are often more cost-effective and provide better performance. Custom MCUs are available from a large number of sources, including those with standard 32-bit MCU parts. However, Tensilica and Arc are dedicated to custom 32-bit implementations.

Tensilica's Xtensa architecture implements a five-stage pipeline with user-selectable datapath widths up to 1024 bits. Tensilica Instruction Extensions (TIE) enhance Xtensa's instruction set and link instructions to custom logic.

Arc takes a similar approach, offering a range of standard configurations that can be combined on one chip. These include the standard ARCtangent architecture as well as ARCtangent for DSP and the ArcLite microRISC 8-bit RISC core. Mixing multiple processors on a chip allows each to be pruned down to the minimum requirements while optimizing functionality.

David Fritz, VP of technical marketing, says that Arc's IPShield security support improves throughput by a factor of 20 for Advanced Encryption Standard (AES) when implemented as ARCtangent instructions. Configuration for most features is a matter of selecting options from a dialog box (Fig. 5).

The key to both companies' solutions is the software that encompasses the package. Compilers, debuggers, and integrated development environments are customized based upon the hardware design.

The plethora of 8- and 16-bit MCUs is primarily due to the vast number of peripheral and memory combinations available. The diversity of 32-bit MCUs is augmented by the wide range of architectures, acceleration methods, and multiprocessing that's rarely found in their smaller cousins. It makes the developer's job of finding the right high-performance component for a project difficult but interesting.

Need More Information?

Arc
www.arc.com

Altera
www.altera.com

AMD (Advanced Micro Devices)
www.amd.com

Analog Devices
www.analog.com

Atmel
www.atmel.com

Arm
www.arm.com

Digi International
www.netsilicon.com

IBM
www.ibm.com

Hyperstone AG
www.hyperstone.com

HyperTransport Consortium
www.hypertransport.org

Intrinsity
www.intrinsity.com

QuickLogic
www.quicklogic.com

MIPS Technologies
www.mips.com

Motorola
www.motorola.com

NEC Electronics
www.necl.com

Philips Semiconductor
www.semiconductors.philips.com

Oki Semiconductor
www.okisemi.com

RapidIO Trade Association
www.rapidio.org

Renesas Technology
www.renesas.com

Samsung
www.samsung.com

Sharp Microelectronics
www.sharpmsa.com

STMicroelectronics
us.st.com

Tensilica
www.tensilica.com

Texas Instruments
www.ti.com

Triscend
www.triscend.com

Ubicom
www.ubicom.com

Xilinx
www.xilinx.com