Electronic Design

Embedded Processors Leverage Parallelism To Deliver Supercomputer Performance

The latest crop of high-performance RISC processor cores can perform many operations in parallel to deliver a major jump in performance. With no fewer than six new implementations of the popular MIPS instruction-set architecture (ISA) appearing over the last two months, designers have plenty of 32- and 64-bit high-performance options.

At the high end of the spectrum, three companies have unveiled 64-bit cores that allow the architecture to deliver up to 2000 MIPS. Meanwhile, three other firms have leveraged the 32-bit MIPS architecture to craft a high-performance core and two highly integrated single-chip solutions. These latter processors target more modest but still demanding computing applications.

With network communications in mind, SiByte Inc. of Santa Clara, Calif., has licensed the MIPS64 architecture from MIPS Technologies Inc. of Mountain View, Calif. SiByte has since optimized the architecture for very low in-system power consumption at clock speeds up to 1 GHz. At this rate, the SB-1 core delivers a throughput of up to 2000 MIPS and consumes less than 2.5 W, thanks to the extensive use of clock gating and a 1.2-V power supply. Designers can use the SB-1 to bring server-class performance to embedded applications.

The 64-bit architecture provides 64-bit registers, 64-bit execution units, and 64-bit data paths, which all work in unison to accelerate network-related operations. Such operations include moving large data packets, stripping headers and reinserting routing and control information, and executing encoding and decoding algorithms. To achieve such high performance, SiByte based its design on a 0.15-µm CMOS process with seven levels of metal interconnect. Although SiByte's long-term goal is to offer clock speeds as high as 1 GHz, initial versions of the core operate at 600 MHz.

A nine-stage superscalar pipeline lets the processor issue four instructions every cycle. A dual 64-bit execution pipeline speeds the in-order execution of the instructions. Instruction and data caches, each containing 32 kbytes of four-way set-associative storage, provide quick access to critical data and instructions (Fig. 1). The dual double-precision floating-point execution units included in the core also contain the MDMX instruction-set extensions to improve CPU performance in graphics and multimedia applications.

The dual floating-point units can each be split to perform two 32-bit single-precision multiplication-accumulation operations. This gives the floating-point units a peak throughput of 8 GFLOPS when performing single-precision computations. An internal bus that's 256 bits wide, clocked at 500 MHz, achieves a peak bandwidth of up to 16 Gbytes/s.
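The peak figures quoted above follow from simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that one multiply-accumulate counts as two floating-point operations and that the core runs at its 1-GHz target clock, neither of which the figures above spell out.

```python
def peak_gflops(fp_units, macs_per_unit, clock_ghz, flops_per_mac=2):
    """Peak floating-point rate in GFLOPS.

    flops_per_mac=2 is an assumption: one multiply plus one add per MAC.
    """
    return fp_units * macs_per_unit * flops_per_mac * clock_ghz


def bus_bandwidth_gbytes(width_bits, clock_mhz):
    """Peak bus bandwidth in Gbytes/s, assuming one transfer per clock."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz / 1000


# Two FP units, each split into two single-precision MACs, at an assumed 1 GHz
print(peak_gflops(fp_units=2, macs_per_unit=2, clock_ghz=1.0))   # 8.0

# 256-bit internal bus clocked at 500 MHz
print(bus_bandwidth_gbytes(width_bits=256, clock_mhz=500))       # 16.0
```

Under these assumptions, the results match the 8-GFLOPS and 16-Gbyte/s peaks cited for the SB-1.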

Along with MIPS Technologies, Toshiba America Electronic Components of Irvine, Calif., is investigating the 64-bit embedded space. Both companies have unveiled 64-bit cores.

The MIPS64 20K family by MIPS Technologies combines the MIPS64 microprocessor/core with the MIPS3D graphics/multimedia extensions (see "64-Bit Architecture's New Instructions Take On Digital Entertainment, More," Electronic Design, June 12, p. 25). The R20K doesn't employ as aggressive a technology as SiByte's core does. Yet when clocked at 500 MHz, its top throughput is about 1000 MIPS and about 2 GFLOPS. It also runs on a 1.8-V supply and employs a seven-stage pipeline that performs dual in-order instruction issues while permitting out-of-order completion.

Like the SB-1, the R20K includes dual 32-kbyte four-way set-associative caches. Unlike the SB-1, the R20K doesn't use the MDMX instruction extensions to support graphics and multimedia operations. Instead, designers included 13 new MIPS3D application-specific extensions to the instruction set. The extensions help accelerate geometry and other operations, letting the R20K deliver 18 million to 25 million polygons/s (8 million to 10 million with lighting).

Toshiba isn't pushing clock speeds or lithography quite as hard as the other manufacturers. Its TX49H family of 64-bit cores has been designed for the company's 0.25-µm ASIC process, which will let the core operate at clock speeds up to 150 MHz. As a core, the TX49H lets designers customize the instruction- and data-cache sizes, up to a maximum of 32 kbytes each. The core also can be configured as an integer-only processor or, with the optional floating-point coprocessor, as a floating-point-capable unit. Toshiba expects to migrate the core to its latest 0.14-µm ASIC process, which will let it operate at higher speeds and lower power.

With their sights set on applications that can still leverage 32-bit processing power, Lexra Inc. of Waltham, Mass., Alchemy Semiconductor Inc. of Austin, Texas, and Integrated Device Technology Inc. of Santa Clara, Calif., have opted to extend the MIPS 32-bit architecture.

Designed for network communications, Lexra's NetVortex processor starts with a MIPS-3000-class RISC processor and adds a 64-bit multimaster system bus capable of handling split transactions. An eight-bank register file provides hardware support for fast context switches, which take just one cycle.

The multimaster bus manages multiple NetVortex cores or peripheral engines that can perform specific processing tasks. A multichannel DMA controller reduces CPU overhead when transferring packets to and from local memory over as many as four transfer buses, each up to 64 bits wide. A host PCI interface also is included, making it easy to tie the processor into the host system. Lexra plans to offer the core as both a soft, synthesizable RTL model and a hard macro. The synthesizable version will be able to clock up to 250 MHz, while the hard macro will clock up to 450 MHz.

One potential application of the core would be to implement an OC-192-capable router. Sixteen processor cores can be co-integrated, with each core able to perform the packet processing for eight input and four output channels. An additional core could serve as a control processor (Fig. 2). Such a configuration could deliver a throughput of 7200 MIPS and provide an internal packet bus bandwidth of 115 Gbits/s.
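One plausible way to arrive at the 7200-MIPS figure is to multiply the 16 packet-processing cores by the hard macro's 450-MHz clock. The short Python sketch below works through that arithmetic. It assumes each core retires one instruction per cycle and that the control processor is not counted, neither of which is stated explicitly here, and the function names are hypothetical.

```python
def aggregate_mips(cores, clock_mhz, instr_per_cycle=1):
    """Aggregate peak throughput in MIPS.

    instr_per_cycle=1 is an assumption (single-issue core).
    """
    return cores * clock_mhz * instr_per_cycle


def total_channels(cores, inputs_per_core, outputs_per_core):
    """Total (input, output) channels handled across all cores."""
    return cores * inputs_per_core, cores * outputs_per_core


# 16 packet-processing cores at the hard macro's 450-MHz clock
print(aggregate_mips(cores=16, clock_mhz=450))    # 7200

# Eight input and four output channels per core
print(total_channels(16, 8, 4))                   # (128, 64)
```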

The Au1000 Internet Edge Processor by Alchemy Semiconductor takes aim at Internet support hardware. Based on the MIPS32 instruction set, the chip was designed for high performance at very low power. It can run at clock speeds of up to 500 MHz, yet it consumes less than half a watt when clocked at 400 MHz. This translates to a performance of over 900 Dhrystone 2.1 MIPS/W.

Additionally, Alchemy has integrated a set of peripherals on the Au1000 that targets communications market segments whose products operate on the outer edge of the Internet. As a result, the chip suits wireless palm-sized PCs, third-generation data devices, voice-over-IP telephony devices, and networking products like residential gateways, firewalls, and routers.

The Au1000's architecture consists of an Alchemy-designed 32-bit core, a high-speed multiply-accumulate (MAC) unit, and an R4000-class memory management unit (MMU). The core employs a five-stage scalar pipeline, a 16-kbyte instruction cache, and a 16-kbyte nonblocking data cache. On-chip memory control lets the CPU interface to SDRAM, SRAM, or Flash EPROM.

Rather than offer a core, IDT's designers have completed the chip-level integration themselves and crafted a communications processor based on the RC32300 MIPS CPU core. The RC32334 can operate at clock rates of up to 150 MHz. It combines the CPU, a PCI bus interface that operates at up to 66 MHz, a memory controller, and various I/O peripheral support, including a DMA controller, an interrupt controller, and serial ports. At the heart of the chip is the company's IPBus, an internal backbone bus that permits design reuse.

At 150 MHz, the processor delivers a top throughput of 197 MIPS. Unlike most of the other cores, this is a standalone device designed for low-cost systems. The 150-MHz grade sells for just $24 each in lots of 10,000 units. Versions that operate at 100 or 133 MHz also are available for $19 and $21 each. Industrial-temperature-range versions of the same chips are available, too. Prices for industrial versions start at $22.50 each.
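For a rough sense of what those numbers imply, the Python sketch below derives per-clock efficiency and cost per MIPS for the 150-MHz grade. These are derived figures, not vendor specifications, and the function names are hypothetical. The 100- and 133-MHz grades are left out because no throughput is quoted for them.

```python
def mips_per_mhz(mips, clock_mhz):
    """Throughput normalized to clock rate."""
    return mips / clock_mhz


def dollars_per_mips(unit_price, mips):
    """Unit price divided by peak throughput."""
    return unit_price / mips


# 150-MHz grade: 197 MIPS, $24 each in lots of 10,000 units
print(f"{mips_per_mhz(197, 150):.2f} MIPS/MHz")     # about 1.31
print(f"${dollars_per_mips(24, 197):.3f} per MIPS")  # about $0.122
```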

For more information about these processors, visit the companies at www.alchemysemi.com, www.idt.com, www.toshiba.com/taec, www.mips.com, www.sibyte.com, and www.lexra.com.
