Mini CPU Ramps Up Throughput At Low Power

March 1, 2004

Taking aim at high-performance embedded applications, the ARC-700 CPU core from ARC International delivers the smallest chip area for a 400-MHz, 32-bit RISC processor. This configurable and extendible core can be tailored for speed, area, and power for a given application, just like ARC's previous offerings.

According to the company, the base ARC-700 is about one-third the chip area of an ARM-11 CPU from ARM Inc. when implemented in a 130-nm process technology (about 0.56 mm²). At 130 nm, the core can operate at up to 400 MHz and consumes about 0.15 mW/MHz. When implemented in a 180-nm process, the core runs at 266 MHz, occupies 1 mm², and consumes 0.5 mW/MHz.
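To put the mW/MHz figures in absolute terms, the short C sketch below multiplies the quoted rating by the quoted clock rate. It assumes power scales linearly with frequency, which is what a mW/MHz rating implies; the function name `core_power_mw` is purely illustrative and not part of any ARC tool.

```c
#include <assert.h>

/* Estimated core power in milliwatts: (mW per MHz) x (clock in MHz).
   Assumption: power scales linearly with clock frequency, as a
   mW/MHz rating implies. Real silicon also has static leakage. */
double core_power_mw(double mw_per_mhz, double clock_mhz)
{
    assert(mw_per_mhz >= 0.0 && clock_mhz >= 0.0);
    return mw_per_mhz * clock_mhz;
}
```

With the article's numbers, `core_power_mw(0.15, 400.0)` works out to roughly 60 mW at 130 nm, and `core_power_mw(0.5, 266.0)` to roughly 133 mW at 180 nm.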

To achieve the higher throughput, the CPU core includes a seven-stage pipeline, configurable dynamic branch prediction, and extensive data forwarding to minimize the impact of pipeline depth (go to www.elecdesign.com to see an online schematic of the ARC-700 core). Memory ports were widened to 64 bits, improving memory bandwidth. Nonblocking access and two-deep hit-under-miss operation keep the pipeline executing while data accesses occur in parallel. Out-of-order completion helps maintain pipeline operation for instructions that do not depend on one another.

The basic core uses a Harvard architecture. It includes 128 user instructions, 26 general-purpose registers (extendible to 54), and additional dedicated registers. Standard instruction extensions include a 32- by 32-bit multiplier, a find-first-bit normalization operation, and a swap operation. Later this year, ARC plans to release a full DSP extension suite for better handling of DSP applications.

As in previous ARC processors, the ARC-700 employs dynamic code sizing, allowing for both 16- and 32-bit instructions. This helps minimize the code footprint and reduces memory cost and power consumption.
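The footprint effect of mixing instruction widths is simple arithmetic: a 16-bit instruction occupies 2 bytes and a 32-bit instruction 4 bytes. The C sketch below makes that explicit. The function name `code_bytes` and any instruction mix passed to it are hypothetical; the article gives no measured mix for real ARC-700 code.

```c
#include <assert.h>
#include <stddef.h>

/* Code size in bytes for a program with n16 16-bit instructions
   and n32 32-bit instructions (2 bytes and 4 bytes each). */
size_t code_bytes(size_t n16, size_t n32)
{
    return 2u * n16 + 4u * n32;
}
```

As an illustration only, a 1,000-instruction routine in which 600 instructions fit the 16-bit encoding would take `code_bytes(600, 400)` = 2,800 bytes, against 4,000 bytes if every instruction were 32 bits wide.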

Processor core timing is isolated from bus timing, opening the door for flexible interconnections that enable the core to meet most timing constraints. With this isolation, each CPU can act independently in a multiprocessor design. An Atomic Exchange instruction speeds up semaphore operations for fast interprocessor communications, while a Sync instruction flushes caches to maintain cache coherency.
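The way an atomic exchange supports semaphore operations can be sketched in portable C11, using `atomic_exchange_explicit` from `<stdatomic.h>` as a stand-in. This is a minimal sketch, not ARC's code: the type and function names (`xchg_lock_t`, `xchg_try_lock`, and so on) are hypothetical, and the assumption that a compiler would lower the C11 call to the core's Atomic Exchange instruction is not confirmed by the article.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Binary semaphore built on atomic exchange (hypothetical names). */
typedef struct {
    atomic_int held;   /* 0 = free, 1 = taken */
} xchg_lock_t;

static inline void xchg_lock_init(xchg_lock_t *l)
{
    assert(l != 0);
    atomic_init(&l->held, 0);
}

/* Atomically write 1 and read back the previous value in one step.
   If the previous value was 0, the caller now owns the lock. */
static inline bool xchg_try_lock(xchg_lock_t *l)
{
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

/* Spin until the exchange observes the lock as free. */
static inline void xchg_lock(xchg_lock_t *l)
{
    while (!xchg_try_lock(l)) {
        /* busy-wait; a real design might back off or sleep here */
    }
}

/* Release by storing 0 so the next exchange sees the lock as free. */
static inline void xchg_unlock(xchg_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

Because the read and the write happen as one indivisible operation, two processors cannot both observe the flag as free, which is what makes a single exchange enough to claim a semaphore.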

A complete suite of design tools supports the CPU cores. The ARChitect 2 suite lets designers select various CPU options and create instruction set extensions, and it provides all of the tools needed to integrate the core as part of a system-on-a-chip design.

ARC International, www.arc.com

See associated figure

About the Author

Dave Bursky | Technologist

Dave Bursky is the founder of New Ideas in Communications, a publication website featuring the blog column Chipnastics – the Art and Science of Chip Design. He is also president of PRN Engineering, a technical writing and market consulting company. Prior to these organizations, he spent about a dozen years as a contributing editor to Chip Design magazine. Concurrent with Chip Design, he was also the technical editorial manager at Maxim Integrated Products, and prior to Maxim, Dave spent over 35 years working as an engineer for the U.S. Army Electronics Command and as an editor with Electronic Design Magazine.
