Start with a standard processor core design. Highlight performance and power bottlenecks. Replace key logic with advanced clock-gated logic. Significantly cut power requirements. Incorporate into popular multimedia devices. Profit.
That’s Intrinsity’s plan. The company started with ARM’s Cortex A8 architecture, a 13-stage, in-order, dual-issue superscalar microprocessor core with a global history-based branch prediction system (Fig. 1). The core incorporates a 10-stage Neon media pipeline designed to accelerate media codecs such as H.264 and MP3.
The core is ARMv7-compliant, including Thumb-2 support and Jazelle RCT (runtime compilation target) Java-acceleration technology designed to optimize Just In Time (JIT) and Dynamic Adaptive Compilation (DAC) support. It also supports ARM’s TrustZone technology for secure transactions and Digital Rights Management (DRM).
The Cortex A8 is already designed to be a low-power platform. By identifying critical spots in the design and replacing them with dynamic logic, though, Intrinsity’s designers were able to increase performance while reducing power requirements (Fig. 2).
Intrinsity used its proprietary Fast14 1-of-n domino logic (NDL) technology, which can be employed in non-ARM designs as well. Gate delays are one-quarter of a clock cycle, and overlapping clocks allow delay to be borrowed from adjacent phases. Domino logic usually takes less space than conventional CMOS logic, and its parasitic capacitance is lower. It employs an inverting circuit paired with a dynamic gate between stages. There is no fanout on the inverter, so it can be small and fast.
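To make the 1-of-n idea concrete, here is a minimal Python sketch of the encoding alone. It assumes n = 4 purely for illustration and says nothing about how Fast14's proprietary circuits are built; the function names are hypothetical. A dual-rail encoding is included only as a point of comparison.

```python
def encode_1_of_n(value: int, n: int = 4) -> list[int]:
    """Return n wire levels with exactly one wire asserted (one-hot)."""
    if not 0 <= value < n:
        raise ValueError(f"value must be in range 0..{n - 1}")
    return [1 if i == value else 0 for i in range(n)]


def decode_1_of_n(wires: list[int]) -> int:
    """Recover the value; reject anything that is not exactly one-hot."""
    if any(w not in (0, 1) for w in wires) or sum(wires) != 1:
        raise ValueError("exactly one wire must be asserted")
    return wires.index(1)


def encode_dual_rail(value: int, bits: int = 2) -> list[int]:
    """Comparison case: each bit gets a true wire and a complement wire."""
    wires = []
    for b in range(bits):
        bit = (value >> b) & 1
        wires += [bit, 1 - bit]
    return wires


if __name__ == "__main__":
    # Both encodings carry two bits of information on four wires.
    for v in range(4):
        one_hot = encode_1_of_n(v)
        dual = encode_dual_rail(v)
        print(v, one_hot, sum(one_hot), dual, sum(dual))
```

In this sketch both encodings use four wires for two bits, but the 1-of-4 form asserts one wire per value while the dual-rail form asserts two. That difference in switching activity is one commonly cited motivation for 1-of-n signaling.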
PRECHARGE AND EVALUATION PHASES
The system operates in two phases: precharge and evaluation. In effect, it acts like a latch between stages. Multiple states within the overall circuit increase overall bandwidth. Timing is critical because the precharge/evaluation cycle behaves more like dynamic memory than like conventional latched stages.
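The two-phase behavior can be sketched with a toy behavioral model. This is a generic textbook-style domino gate, not Intrinsity's circuit; the class name `DominoGate` and its methods are hypothetical, and all analog effects (charge leakage, clock overlap) are ignored.

```python
class DominoGate:
    """Toy behavioral model of one domino stage (dynamic node plus inverter)."""

    def __init__(self, pulldown):
        # pulldown: function of the inputs, True when the pull-down
        # network conducts and discharges the dynamic node.
        self.pulldown = pulldown
        self.node = 1  # dynamic node starts precharged high

    def precharge(self):
        """Precharge phase: pull the dynamic node high again."""
        self.node = 1

    def evaluate(self, *inputs):
        """Evaluation phase: the node can only fall, never rise."""
        if self.pulldown(*inputs):
            self.node = 0
        return self.output

    @property
    def output(self):
        """The inverter output: low after precharge, high once discharged."""
        return 1 - self.node


if __name__ == "__main__":
    and_gate = DominoGate(lambda a, b: bool(a and b))
    and_gate.precharge()
    print(and_gate.output)          # output is low after precharge
    print(and_gate.evaluate(1, 1))  # node discharges, output goes high
    print(and_gate.evaluate(0, 0))  # output stays high until next precharge
```

The last call shows why timing matters in this model: once the node has discharged, a late change on the inputs cannot undo it, so the stage holds its value until the next precharge.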
NDL is just one of a number of optimizations that Intrinsity employs in the design. Power gating, custom static logic and memory, and short-wire floorplanning are a few others. These approaches are built into Intrinsity’s design flow toolset, so they can be applied to almost any design. For example, highly automated Vt and cell selection flows permit selection of the best gates for speed while balancing power usage.
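As a rough illustration of what Vt and cell selection involves, the sketch below picks, for a given delay budget, the lowest-leakage cell variant that is still fast enough. It is a generic greedy rule, not Intrinsity's flow, and the cell names, delays, and leakage figures are invented placeholders.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    name: str
    delay_ps: float    # hypothetical gate delay
    leakage_nw: float  # hypothetical leakage power


# Placeholder library: lower threshold voltage (Vt) means faster but leakier.
CELLS = [
    Cell("high_vt", 30.0, 1.0),
    Cell("std_vt", 22.0, 5.0),
    Cell("low_vt", 16.0, 25.0),
]


def pick_cell(delay_budget_ps: float, cells=CELLS) -> Cell:
    """Return the lowest-leakage cell that meets the delay budget."""
    candidates = [c for c in cells if c.delay_ps <= delay_budget_ps]
    if not candidates:
        raise ValueError("no cell meets the delay budget")
    return min(candidates, key=lambda c: c.leakage_nw)


if __name__ == "__main__":
    for budget in (40.0, 25.0, 16.0):
        print(budget, pick_cell(budget).name)
```

Under these assumed numbers, relaxed paths get the slow, low-leakage cell and only the tightest paths pay for the fast, leaky one.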
The Cortex A8 is available in 65 nm with a low-power (LP), low-leakage version that runs at up to 650 MHz. The general-purpose (GP) version cracks the 1-GHz/2000 DMIPS mark, but Intrinsity’s Hummingbird achieves this using the LP process while drawing under 0.75 mW/MHz. A multi-VDD and multi-frequency design methodology enables the chip to run at high speed even at the minimum supply voltage of 1.0 V. Samsung chips based on the design will target a range of portable multimedia applications.
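The headline figures can be combined with some quick arithmetic. The sketch below assumes the 0.75-mW/MHz ceiling applies at the full 1-GHz clock and scales linearly with frequency, which the figures quoted here do not state explicitly; the function names are hypothetical.

```python
FREQ_MHZ = 1000.0        # 1 GHz, from the figures quoted above
DMIPS = 2000.0           # quoted Dhrystone throughput at 1 GHz
MW_PER_MHZ_MAX = 0.75    # quoted upper bound on power per MHz


def power_ceiling_mw(freq_mhz: float = FREQ_MHZ,
                     mw_per_mhz: float = MW_PER_MHZ_MAX) -> float:
    """Upper bound on power, assuming linear scaling with clock."""
    return freq_mhz * mw_per_mhz


def dmips_per_mhz(dmips: float = DMIPS, freq_mhz: float = FREQ_MHZ) -> float:
    """Throughput normalized to clock frequency."""
    return dmips / freq_mhz


def min_dmips_per_mw(dmips: float = DMIPS) -> float:
    """Lower bound on efficiency, since power is quoted as 'under' a ceiling."""
    return dmips / power_ceiling_mw()


if __name__ == "__main__":
    print(power_ceiling_mw())   # ceiling in mW at 1 GHz
    print(dmips_per_mhz())      # DMIPS per MHz
    print(round(min_dmips_per_mw(), 2))
```

Under that assumption, the core would draw less than 750 mW at 1 GHz while delivering 2.0 DMIPS/MHz, or at least about 2.67 DMIPS per milliwatt.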