Innovate for Low Power in a High-Performance FPGA

Traditionally, digital logic has not consumed significant static power, but this has changed dramatically as process nodes shrink. Leakage current in digital logic is now the primary challenge for FPGAs as process geometries decrease. If power-reduction strategies are not employed, power consumption becomes a critical issue as static power can increase dramatically at the 65-nm process node. Static power consumption rises largely because of increases in various sources of leakage current (Fig. 1).

Power consumption is composed of static and dynamic power. Static power is the power consumed by an FPGA when it’s programmed with a programmer object file (.pof), but no clocks are operating. Both digital and analog logic consume static power. In an analog system, static power primarily consists of the quiescent current of the analog circuit based on its interface (Fig. 2).

Dynamic power is the added power consumed when the device is operating, which is caused by toggling signals and charging and discharging capacitive loads. The main variables that affect dynamic power are the capacitance being charged, the supply voltage, and the clock frequency (Fig. 3).
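As a rough illustration of how those three variables interact, the sketch below uses the standard first-order switching-power relation, P = a × C × V² × f. The activity factor a (the fraction of nodes toggling per cycle) and every numeric value are illustrative assumptions, not figures from any device datasheet.

```python
def dynamic_power_watts(capacitance_farads, voltage_volts, frequency_hz, activity=1.0):
    """First-order switching power: P = activity * C * V^2 * f."""
    return activity * capacitance_farads * voltage_volts ** 2 * frequency_hz


# Illustrative numbers only: 1 nF of switched capacitance at 1.1 V and
# 200 MHz, with an assumed 25% of nodes toggling each cycle.
p_base = dynamic_power_watts(1e-9, 1.1, 200e6, activity=0.25)

# Doubling the clock frequency doubles dynamic power; voltage enters squared.
p_fast = dynamic_power_watts(1e-9, 1.1, 400e6, activity=0.25)

print(f"base: {p_base:.4f} W, doubled clock: {p_fast:.4f} W")
```

Because voltage enters as a square, lowering the supply is usually the most effective single lever on dynamic power.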

For an equivalent circuit, dynamic power decreases with each process node shrink, in line with Moore’s Law, because capacitance and voltage both drop. The challenge is that more circuits are implemented with each process shrink and the maximum clock frequency increases.

While the power of an equivalent circuit declines from process node to process node, FPGA capacity keeps doubling. On top of that, the maximum clock frequency keeps increasing.

FPGA ARCHITECTURE

Advances in architecture, process technology, and circuit techniques help attack these different power challenges. One such example is Altera’s Stratix III FPGA.

The company’s Programmable Power Technology helps reduce power in high-end FPGAs. Traditionally, all high-performance FPGAs have been implemented with a high-performance fabric, where every logic element (LE) provides the maximum performance with correspondingly high leakage power.

Programmable Power Technology takes advantage of the fact that most circuits in a design have excess slack and therefore don’t require the highest performance logic. Figure 4 shows a typical excess slack histogram, where the majority of the paths (on the left) have slack and only a few critical paths (on the right) need the highest performance logic to meet timing requirements.

With Programmable Power Technology, the logic fabric of Stratix III can be programmed at the logic-array-block (LAB) level to provide high-speed or low-power logic, depending on what the specific logic path requires (Fig. 5). In this way, the small percentage of timing-critical circuits is set to the high-speed setting, with the remainder using the low-power setting, resulting in a 70% drop in leakage power for the low-power logic. Placing unused logic, as well as DSP blocks and TriMatrix memory, into the low-power modes further decreases power.
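The selection idea can be sketched in a few lines. This is a simplified illustration, not the algorithm the vendor software uses: the function names, the per-LAB slack values, the fixed delay penalty for low-power mode, and the assumption that every LAB leaks equally are all hypothetical. Only the 70% leakage reduction for low-power logic comes from the text above.

```python
# Low-power logic leaks 70% less than high-speed logic (figure from the text).
LOW_POWER_LEAKAGE_FACTOR = 0.30


def assign_lab_modes(lab_slack_ns, low_power_delay_penalty_ns):
    """Pick 'low_power' for LABs whose worst slack can absorb the slowdown.

    lab_slack_ns: hypothetical map of LAB name -> worst-case slack in ns.
    low_power_delay_penalty_ns: assumed extra delay of the low-power setting.
    """
    return {
        lab: "low_power" if slack >= low_power_delay_penalty_ns else "high_speed"
        for lab, slack in lab_slack_ns.items()
    }


def relative_leakage(modes):
    """Leakage relative to an all-high-speed fabric, assuming equal leakage per LAB."""
    factors = [
        LOW_POWER_LEAKAGE_FACTOR if mode == "low_power" else 1.0
        for mode in modes.values()
    ]
    return sum(factors) / len(factors)


# Made-up slack values: only lab1 is timing-critical.
slack = {"lab0": 2.4, "lab1": 0.1, "lab2": 1.8, "lab3": 3.0}
modes = assign_lab_modes(slack, low_power_delay_penalty_ns=0.5)
print(modes)
print(f"leakage vs. all high-speed: {relative_leakage(modes):.3f}")
```

With three of the four LABs in low-power mode, the sketch reports a little under half the leakage of an all-high-speed fabric, which is the effect the slack histogram is meant to convey.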

SELECTABLE CORE VOLTAGE

Selectable core voltage lets designers use a 0.9- or 1.1-V core voltage based on performance requirements. The 0.9-V core voltage provides the overall minimum dynamic and leakage power, while the 1.1-V core voltage delivers the overall highest performance. Dynamic power scales with the square of the core voltage, while static power scales with the core voltage raised to the power of 2.5.
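To put numbers on those two scaling rules, the short sketch below normalizes power at 1.1 V to 1.0 and applies the exponents stated above. The function name is hypothetical, and the result is only the first-order scaling, not a measured device figure.

```python
def scaled_power(power_at_ref, v_new, v_ref, exponent):
    """Scale a reference power by (v_new / v_ref) ** exponent."""
    return power_at_ref * (v_new / v_ref) ** exponent


# Power at 1.1 V is normalized to 1.0 for both components.
dynamic_at_0v9 = scaled_power(1.0, 0.9, 1.1, 2.0)  # square law
static_at_0v9 = scaled_power(1.0, 0.9, 1.1, 2.5)   # power-of-2.5 law

print(f"dynamic: {dynamic_at_0v9:.3f}, static: {static_at_0v9:.3f}")
```

Under these rules, moving from 1.1 V to 0.9 V cuts dynamic power by roughly a third and static power by nearly 40%.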

The selectable core voltage input can be set to 0.9 V or 1.1 V during board design. This core voltage supplies all LABs, memories, and DSP functions in the core fabric. The selectable core voltage affects the fabric performance, so when a device and speed grade are selected in the software, a core voltage selection is also required. The software uses timing and power models specific to the selected core voltage to implement all timing-dependent and power-dependent analysis and optimization.

When choosing which core voltage to use, a designer must consider the system performance requirements against the results reported by the timing analysis. If a system’s performance requirements can be met at 0.9 V, that setting always consumes less power than 1.1 V.
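That decision rule amounts to picking the lowest voltage that still meets timing. A minimal sketch, assuming the designer has already collected a maximum clock frequency for each core voltage from timing analysis (the function name and the frequencies below are invented for illustration):

```python
def pick_core_voltage(fmax_mhz_by_voltage, required_mhz):
    """Return the lowest core voltage whose reported fmax meets the requirement.

    Returns None when no available voltage meets timing.
    """
    for voltage in sorted(fmax_mhz_by_voltage):
        if fmax_mhz_by_voltage[voltage] >= required_mhz:
            return voltage
    return None


# Hypothetical timing-analysis results for one design.
reported_fmax = {0.9: 250.0, 1.1: 320.0}

print(pick_core_voltage(reported_fmax, required_mhz=200.0))
print(pick_core_voltage(reported_fmax, required_mhz=300.0))
```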

MERGING TECHNOLOGIES

Combining Programmable Power Technology and selectable core voltage delivers various performance and power operating points that achieve over 50% power reduction at 1.1 V (Fig. 6). Static power varies considerably depending on the use of the various resources, such as DSP blocks and TriMatrix memory blocks.

The combined static and dynamic power varies across combinations of core voltage and percentage of high-speed versus low-power logic. In most designs, where maximum FPGA performance isn’t required, the total power of a design can be reduced by 50% or more.


PROCESS AND CIRCUIT TECHNOLOGY

The semiconductor industry constantly battles the evolving challenges of small process dimensions through huge investments in equipment, process technologies, design tools, and circuit techniques. In particular, the challenge of increasing leakage power with small process geometries is felt across the industry. Thus, many well-known technologies at the 65-nm process node (and prior) are used to maintain or increase performance while managing leakage power:

Copper routing
Low-k dielectric
Multi-threshold transistors
Variable gate-length transistors
Triple gate oxide
Super-thin gate oxide
Strained silicon

To attain high efficiency and performance, Stratix III FPGAs leverage an adaptive-logic-module (ALM) logic architecture and a MultiTrack interconnect fabric. This combination allows more logic to be packed with less routing.

ALM technology, which is said to have 80% more logic functions than other architectures, includes an eight-input fracturable lookup table (LUT), two 2-bit adders, and two registers. MultiTrack interconnect provides one-hop interconnectivity between different LABs. Interconnect efficiency is measured by the number of “hops” required to get from LAB to LAB. Each added interconnect hop increases capacitance; the fewer the hops, the less high-speed logic is required to meet performance.

MultiTrack interconnect provides one-hop interconnectivity that yields the lowest possible power (Fig. 7).

Hierarchical clocking is used in the Stratix III FPGAs to support up to 360 unique clocks. The propagation of every clock network can be controlled down to a LAB level. Logic with common clocks is grouped into LABs. Clocks are only propagated where the logic uses that clock. All other clock signals are shut down to minimize power consumption.
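The gating rule described here can be modeled as a simple lookup: a clock reaches a LAB only if logic in that LAB uses it. The sketch below is a conceptual model with hypothetical LAB and clock names, not a description of the device's clock-network hardware.

```python
def clock_enables(lab_clock_usage, all_clocks):
    """For each LAB, mark a clock enabled only if logic in that LAB uses it."""
    return {
        lab: {clk: (clk in used) for clk in all_clocks}
        for lab, used in lab_clock_usage.items()
    }


# Hypothetical placement: which clocks the logic in each LAB actually needs.
usage = {"lab0": {"clk_a"}, "lab1": {"clk_a", "clk_b"}, "lab2": set()}
enables = clock_enables(usage, ["clk_a", "clk_b", "clk_c"])

# Count how many LAB/clock branches stay active out of all possible branches.
active = sum(on for per_lab in enables.values() for on in per_lab.values())
total = sum(len(per_lab) for per_lab in enables.values())
print(f"{active} of {total} clock branches active")
```

Every branch that stays off is clock-tree capacitance that never toggles, which is where the dynamic power saving comes from.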

SAVE MEMORY INTERFACE POWER

Double-data-rate (DDR) memory interfaces are among the most common I/O interfaces in designs today, and they can be fairly power-hungry. To combat those power issues, designers can turn to dynamic on-chip termination and DDR3.

When reading from and writing to external memory, it’s vital to have an impedance-matched buffer, with both series and parallel termination. If there’s a 50-Ω transmission line when writing to memory, a matched buffer with a series impedance of 50 Ω is needed. When receiving data from the memory, a 50-Ω parallel termination resistor pulled to a termination voltage is desired. This applies not only to DDR-type interfaces, but also to RLDRAM and QDR SRAM.

With dynamic on-chip termination, FPGA designers can switch the parallel termination resistor on or off (open circuit), depending on whether a read or write is being executed. During a write, the FPGA output driver impedance must be matched to the transmission line. However, the parallel resistor to VTT wastes energy and reduces signal swing. To avoid this, the resistor can be turned off (Fig. 8).

During a read, the parallel resistor is on to terminate the transmission line to reduce reflections that degrade signal integrity and the ability to reliably read data.
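The switching behavior reduces to a small truth table: parallel termination is on only while the FPGA is reading. The sketch below models that rule; the enum and function names are hypothetical and do not correspond to any vendor API.

```python
from enum import Enum


class BusState(Enum):
    READ = "read"
    WRITE = "write"
    IDLE = "idle"


def parallel_termination_on(state):
    """Parallel termination to VTT is enabled only while the FPGA is reading."""
    return state is BusState.READ


for state in BusState:
    setting = "on" if parallel_termination_on(state) else "off (open circuit)"
    print(f"{state.value:5s} -> parallel termination {setting}")
```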

The significant benefits of dynamic on-chip termination are realized whenever the bus is either performing a write from the FPGA or the bus is idle. First, power is greatly reduced: 1.6 W of static power can be saved on a 72-bit DDR2 bus. In addition, a pure series line termination is achieved when writing. Finally, the need for many board termination resistors is removed, saving board cost and complexity.

DDR3 provides 30% lower power than DDR2 because it runs at a lower voltage: 1.5 V versus 1.8 V. For example, a system with a 72-bit, 200-MHz (400-Mbit/s) memory interface with on-chip termination would dissipate 3.9 W for just one memory interface. Using dynamic on-chip termination (wherein the parallel termination resistor is turned off when idle or when performing a write) can save 1.6 W. If both DDR3 and dynamic on-chip termination are used, power drops to 1.6 W, saving a total of 2.3 W. These savings are on a per-interface basis (i.e., four memory interfaces in an FPGA would save 9.2 W).
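The arithmetic behind those figures can be checked directly. The sketch below uses only the wattages quoted above; the variable names are ours, and the implied DDR3 factor is derived from those numbers rather than stated in the text.

```python
BASELINE_W = 3.9            # one interface, on-chip termination always on
DYNAMIC_OCT_SAVING_W = 1.6  # saved by switching off parallel termination
COMBINED_W = 1.6            # DDR3 plus dynamic on-chip termination
INTERFACES = 4

with_dynamic_oct_w = BASELINE_W - DYNAMIC_OCT_SAVING_W  # about 2.3 W
total_saving_w = BASELINE_W - COMBINED_W                # about 2.3 W
four_interface_saving_w = INTERFACES * total_saving_w   # about 9.2 W

# Ratio of the combined figure to the dynamic-termination-only figure;
# it should land near 0.70 if DDR3 draws about 30% less power.
implied_ddr3_factor = COMBINED_W / with_dynamic_oct_w

print(f"dynamic termination only: {with_dynamic_oct_w:.1f} W")
print(f"total saving per interface: {total_saving_w:.1f} W")
print(f"saving across {INTERFACES} interfaces: {four_interface_saving_w:.1f} W")
print(f"implied DDR3 factor: {implied_ddr3_factor:.2f}")
```

The implied factor comes out close to 0.70, so the quoted totals are consistent with the stated 30% DDR3 advantage.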

The move to very small process nodes (65 nm and below) delivers the expected Moore’s Law benefits of increased density and performance. However, the boost in performance seriously increases power consumption, introducing the risk of consuming unacceptable amounts of power.

If power-reduction strategies aren’t used, static power consumption will increase significantly. Also, without a specific power optimization effort, dynamic power consumption rises due to the increased logic capacity and higher switching frequencies.

Overcoming these power challenges with an enabling and innovative architecture, combined with advances in process technology and circuit techniques, provides an efficient and scalable solution for today’s increasingly complex FPGA-based designs.