Electronic Design

Hard RISC Cores To Power System-Level FPGAs

On-Chip ARM, MIPS, And PowerPC RISCs Will Drive Cost-Effective, High-Performance FPGA System Designs.

FPGAs are becoming true high-level system chips. This requires high-performance, dense processing power. Vendors of FPGAs are turning to hard RISC processor cores to get that deployable processing power. Developers will no longer need to rely on relatively inefficient soft cores to generate 32-bit processing power. They can now get ARM, MIPS, or PowerPC RISC CPUs as hard cores tuned for the underlying FPGA silicon process.

For a long time, FPGA vendors thought that everything could be done in FPGA gates or cells using the standard FPGA interconnect buses. Even with ever-rising silicon densities, it has become apparent that FPGA cells can't do everything. All of today's leading FPGA vendors offer some level of hard core IP in their FPGA chips. Until now, that hard IP was limited to SRAM blocks and some special math function blocks.

But silicon densities have moved FPGAs from supplying the glue logic to a processor or board, to where the FPGA itself can be the system chip holding the host or peripheral processor. Soft cores have been available for years, starting with the 32-bit ARC and ARM. Some designers have created their own FPGA-based processing elements, especially for network processors. For many mainstream designers, however, an FPGA-based processor makes sense if it's a mainstream standard (with tools and a design base), and if that processor can be effectively deployed in FPGA silicon.

That's where hard IP processor cores come in. They provide a standard RISC processor that has been tuned for the FPGA process. This enables designers to work with a standard design (ARM, MIPS, or PowerPC) almost as if they were working in the IC design plane. They can concentrate on their application design without porting the processor architecture to the FPGA silicon and then optimizing the CPU design for the process, which can take a year or more. CPU silicon has always represented the most expensive silicon real estate, and that hasn't changed with FPGAs. Hard cores enable FPGA SOC designers to work from optimized CPU silicon, minimizing silicon costs. These hard cores can be as small as 1.5 to 2.0 mm² of silicon (core only, no caches).

For many markets, such as telecom, datacom, and consumer electronics, time-to-market is critical. Employing hard core processors enables engineers to quickly turn their designs, relying on fixed, already working blocks to minimize design and test times.

Some additional advantages come with hard cores in comparison to ASSP or IC silicon. They're on-chip processors, so memory and bus links are faster and require less drive. Also, the designs are extensible as far as cache and some functionality is concerned. Designs can thus be tuned to the application, with just enough cache added to hit the performance mark needed. Cache is added to the design on silicon by using either existing hard core memory blocks or hard IP memory blocks.

Hard 32-bit CPU cores selected by FPGA vendors for use include the ARM 922T (Altera, QuickLogic), the PowerPC 405 (Xilinx), and the MIPS 4Kc (QuickLogic). They're all descendants of standard 32-bit architectures, each with a large tool, software, and programmer base. Each is a contender for a 32-bit processor standard for the 21st century, and has a history of use and a broad implementation base.

Also Available As ASSPs And ICs
The ARM, MIPS, and PowerPC microprocessor architectures are not just restricted to a core implementation base. Each is available also as an ASSP and an IC. ARM has the widest base and is heavily used as a core in ASICs and ASSPs. ARM is a dominant processor in low-power, portable devices. Right behind ARM is the MIPS architecture, one of the earlier RISCs, which has become a major architecture for games, printers, and communications. It has a broad base in ICs and a growing base in high-end ASIC-based processing. On the other hand, the PowerPC's main base is in ICs. It powers the latest Apple desktop computers, as well as servers and embedded boards. The PowerPC has a growing core presence due to IBM's ASIC business and its licensing of the core to ASIC and FPGA vendors.

The cores in these families selected for FPGA operation—the ARM9T, the MIPS32 4Kc, and the PowerPC 405C—are tuned to deliver cost-effective performance, balancing execution throughputs against silicon real estate and power dissipation. They

  • run at clock rates approaching 200 MHz and above
  • have 5-stage pipelines
  • implement scalar operation
  • have 2 to 3 execution units
  • are static designs
  • target low power dissipation

These cores deliver 200 peak MIPS at clock rates up to 200 MHz (and beyond) with a scalar RISC implementation. Most instructions execute in one cycle (pipelined). The cores are designed to work with L1 caches, which improve memory-fetch efficiency. Some can run without a cache at degraded performance. They typically have separate I and D caches (the ARM7 has a unified cache), as well as an MMU that accommodates an OS with memory protection.
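As a rough worked example of these figures, the sketch below computes peak MIPS from the clock rate and cycles per instruction, then shows how cache misses pull the delivered rate below the peak. The function names, the 2% miss rate, and the 20-cycle miss penalty are illustrative assumptions, not data for any of the cores discussed here.

```python
def peak_mips(clock_mhz: float, base_cpi: float = 1.0) -> float:
    """Peak million instructions per second for a scalar pipeline."""
    return clock_mhz / base_cpi


def effective_mips(clock_mhz: float, base_cpi: float,
                   misses_per_instr: float, miss_penalty_cycles: float) -> float:
    """MIPS after adding average memory-stall cycles to the base CPI."""
    effective_cpi = base_cpi + misses_per_instr * miss_penalty_cycles
    return clock_mhz / effective_cpi


if __name__ == "__main__":
    # 200 MHz at one instruction per cycle gives the 200 peak MIPS quoted above.
    print(peak_mips(200.0))
    # Hypothetical workload: 2% of instructions miss the L1 cache,
    # and each miss stalls the pipeline for 20 cycles.
    print(round(effective_mips(200.0, 1.0, 0.02, 20.0), 1))
```

Under these assumed numbers the effective CPI rises from 1.0 to 1.4, so the delivered rate falls to roughly 143 MIPS. The same arithmetic suggests why running without a cache degrades performance: every fetch then pays the memory penalty.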

A system is more than just a CPU and its cache memory. Buses are needed to connect the CPU efficiently to main memory, and to link it to peripherals and available FPGA functionality. Complex systems require a bus hierarchy to support multiple connectivity levels and maintain RISC CPU performance.

Two busing systems for hard cores are attracting FPGA attention—IBM's CoreConnect and ARM's AMBA. Both define a hierarchy with a main processor, a local bus, and a supplementary peripheral bus. They were initially created to support a proprietary RISC architecture—the PowerPC for CoreConnect and the ARM for AMBA. Both are now available for FPGA use. A third bus, PCI-X, the new PCI extension, also is under consideration as a systems/peripheral bus to supplement the processor's local bus.

ARM's AMBA bus system defines three buses: two system buses, the AMBA High-Speed Bus (AHB) and the AMBA System Bus (ASB), plus the AMBA Peripheral Bus (APB). The system buses support high-speed operation, are pipelined, and operate with multiple bus masters. They are non-multiplexed buses, with separate address and data lines. The earlier bus, the ASB, is 32 bits wide and targets typical low-end systems. The later and more sophisticated AHB is 64 or 128 bits wide, supports burst transfers and split transactions for more efficient operation, and is intended for high-performance systems.

The AMBA Peripheral Bus is bridged to the system bus; this bridge acts as a slave device to the system bus. The peripheral bus was designed for lower-power operation and provides a simple memory-mapped interface for lower-speed devices. Like the system buses, the peripheral bus is non-multiplexed. However, if tri-state buffering is not used for the data bus, ARM recommends that the peripheral bus be implemented with separate read and write data buses.
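The two-level arrangement described here, with a bridge that appears as one slave on the system bus and forwards accesses to memory-mapped peripherals, can be sketched as a simple address decoder. This is a toy model, not the AMBA signal protocol: the class names, region names, and every address in the memory map are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A named address window decoded by one slave."""
    name: str
    base: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


@dataclass
class Bus:
    name: str
    slaves: list = field(default_factory=list)  # (Region, target) pairs

    def attach(self, region: Region, target=None) -> None:
        """Map a region to a slave; target is a downstream Bus for a bridge."""
        self.slaves.append((region, target))

    def route(self, addr: int) -> list:
        """Return the names of the buses and slaves an access passes through."""
        for region, target in self.slaves:
            if region.contains(addr):
                if isinstance(target, Bus):
                    return [self.name, region.name] + target.route(addr)
                return [self.name, region.name]
        raise ValueError(f"no slave decodes address {addr:#x}")


# Hypothetical memory map, for illustration only.
peripheral_bus = Bus("peripheral")
peripheral_bus.attach(Region("uart", 0x8000_0000, 0x1000))
peripheral_bus.attach(Region("timer", 0x8000_1000, 0x1000))

system_bus = Bus("system")
system_bus.attach(Region("sram", 0x0000_0000, 0x1_0000))
# The bridge is an ordinary slave on the system bus that forwards downstream.
system_bus.attach(Region("bridge", 0x8000_0000, 0x1_0000), peripheral_bus)

if __name__ == "__main__":
    print(system_bus.route(0x0000_0100))
    print(system_bus.route(0x8000_1004))
```

In this model, an access to the assumed timer address crosses the system bus, the bridge, and the peripheral bus, while an SRAM access stays on the system bus. That separation is what keeps slow peripheral traffic from loading the high-speed bus.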

IBM's CoreConnect consists of a Processor Local Bus (PLB) that interfaces with CPU cores, and the On-Chip Peripheral Bus (OPB). The PLB is non-multiplexed and supports overlapped writes and reads (separate read and write data buses), split transactions, and address pipelining for efficient operation. It also supports up to 16 bus masters and 16-/32-/64-byte line transfers. There are 32-/64-/128-bit versions, which can be extended to 256 bits. PLB clock rates are 66, 133, and 183 MHz for the 32-, 64-, and 128-bit versions, respectively. For larger systems, CoreConnect defines a crossbar switch that connects multiple PLB-based subsystems.
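To put the PLB widths and clock rates in perspective, the following sketch converts them to peak bandwidth on a single data bus. It assumes one full-width transfer per clock cycle and ignores arbitration and address overhead, so the results are upper bounds rather than measured throughput. The function name is illustrative.

```python
# (bus width in bits, clock in MHz) pairs as quoted in the text for the PLB.
PLB_VERSIONS = [(32, 66), (64, 133), (128, 183)]


def peak_bandwidth_mbytes(width_bits: int, clock_mhz: float) -> float:
    """Peak MB/s on one data bus, assuming one transfer per clock cycle."""
    return (width_bits / 8) * clock_mhz


if __name__ == "__main__":
    for width, clock in PLB_VERSIONS:
        rate = peak_bandwidth_mbytes(width, clock)
        print(f"{width}-bit @ {clock} MHz: {rate:.0f} MB/s")
```

Under that assumption the three versions peak at 264, 1064, and 2928 MB/s per data bus. Because the read and write data buses are separate and can overlap, the aggregate figure could in principle be higher.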

See associated figure

ARM: ARM 922T
  • Size: 11.8 mm² (with caches, 0.18 µm)
  • Clock: to 200 MHz
  • Comments: Used by Altera and QuickLogic for their FPGAs. Used with the ARM AMBA* busing system. Scalar processor with the full ARM ISA and the Thumb 16-bit ISA extension. 16 KB I and D caches, 5-stage pipeline, 160-mW dissipation with caches. Has no hardware divide, but has a hardware multiply. Most instructions are conditional.

ARM: ARM 740T
  • Size: 2.5 mm² (with caches, 0.18 µm)
  • Clock: to 60 MHz
  • Comments: Used by Triscend for its FPGA-based microcontroller. The ARM7 acts as the processor and is combined with memory blocks and FPGA logic and peripherals. The ARM7TDMI incorporates the basic ARM ISA and the Thumb 16-bit ISA subset. This is the basic cached processor microcell. It has 8 KB of mixed I/D unified cache, a 4-channel DMA controller, and an external memory controller/interface.

IBM Microelectronics: PowerPC 405C
  • Size: 2 mm² (without caches, 0.18 µm)
  • Clock: to 200 MHz
  • Comments: Used by Xilinx with IBM's CoreConnect** busing system. This is a scalar core with the PowerPC ISA. The I and D caches go to 32 KB each. It has a 5-stage pipeline and dissipates 400 mW with caches. Has a MPY/DIV unit, and static branch prediction. It dissipates 1.0 W.

MIPS Technologies: MIPS32 4Kc
  • Size: 3 mm² (without caches, 0.25 µm)
  • Comments: Used by QuickLogic for its FPGA with ARM's AMBA* busing system. The 4Kc is also used by Altera with its FPGA; Altera also uses the AMBA* busing system. The MIPS32 4Kc is a scalar 32-bit RISC. It has 16 KB I and D caches, a 5-stage pipeline, and a MPY/DIV/MAC unit that does a MAC in 1 cycle. It dissipates 400 mW at 200 MHz with the caches.

*AMBA: the Advanced Microcontroller Bus Architecture, developed by ARM (www.arm.com)
**CoreConnect: developed by IBM Microelectronics (www.chips.ibm.com)

(408) 579-2200

IBM Microelectronics
(914) 499-6435

MIPS Technologies
(650) 567-5000

(408) 990-4040

(650) 968-8668

(408) 559-7778
