Electronic Design

Innovate For Low Power In A High-Performance FPGA

Overcome static and dynamic power consumption challenges by employing novel power-reduction techniques.

Traditionally, digital logic has not consumed significant static power, but this has changed dramatically as process nodes have shrunk. Leakage current in digital logic is now the primary power challenge for FPGAs. If power-reduction strategies are not employed, power consumption becomes a critical issue, because static power can increase dramatically at the 65-nm process node. Static power consumption rises largely because of increases in various sources of leakage current (Fig. 1).

Power consumption is composed of static and dynamic power. Static power is the power consumed by an FPGA when it's programmed with a programmer object file (.pof), but no clocks are operating. Both digital and analog logic consume static power. In analog circuitry, static power comes primarily from the quiescent current of the analog circuit, which depends on its interface (Fig. 2 and the table).

Dynamic power is the added power consumed when the device is operating, which is caused by toggling signals and charging and discharging capacitive loads. The main variables affecting dynamic power are capacitance charging, the supply voltage, and the clock frequency (Fig. 3).
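
As a rough illustration of how these three variables interact, the sketch below uses the standard CMOS switching-power relationship, P = a · C · V² · f. The activity factor a (the fraction of the capacitance that toggles each cycle) and all of the numeric values are assumptions chosen for illustration; they are not taken from the Stratix III data.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Return switching power in watts: activity * C * V^2 * f.

    `activity` is an assumed toggle-rate factor between 0 and 1.
    """
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical design: 1 nF of switched capacitance, 1.1-V core, 200-MHz clock.
base = dynamic_power(1e-9, 1.1, 200e6)

# Doubling the clock doubles dynamic power; voltage has a squared effect.
doubled_clock = dynamic_power(1e-9, 1.1, 400e6)
lower_voltage = dynamic_power(1e-9, 0.9, 200e6)

print(f"base:          {base:.3f} W")
print(f"2x clock:      {doubled_clock:.3f} W")
print(f"0.9-V supply:  {lower_voltage:.3f} W")
```

Because voltage enters as a square, a modest supply reduction saves more dynamic power than the same percentage reduction in frequency or capacitance.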

Dynamic power for a given circuit decreases with Moore's Law, because each process node shrink reduces capacitance and voltage. The challenge is that more circuits are implemented with each process shrink and the maximum clock frequency increases. So while the power of an equivalent circuit declines from process node to process node, the FPGA capacity keeps doubling and the maximum clock frequency keeps increasing.

Advances in architecture, process technology, and circuit techniques help attack these power challenges. One such example is Altera's Stratix III FPGA.

The company's Programmable Power Technology helps reduce power in high-end FPGAs. Traditionally, all high-performance FPGAs have been implemented with a high-performance fabric, where every logic element (LE) provides the maximum performance and consequently has high leakage power.

Programmable Power Technology takes advantage of the fact that most circuits in a design have excess slack and therefore don't require the highest performance logic. Figure 4 shows a typical excess slack histogram, where the majority of the paths (on the left) have slack and only a few critical paths (on the right) need the highest performance logic to meet timing requirements.

With Programmable Power Technology, the logic fabric of Stratix III can be programmed at the logic-array-block (LAB) level to provide either high-speed logic or low-power logic, depending on which is required by the specific logic path (Fig. 5). In this way, the small percentage of timing-critical circuits is set to the high-speed mode, with the remainder using the low-power mode, resulting in a 70% decrease in leakage power for the low-power logic. Placing unused logic, as well as DSP blocks and TriMatrix memory, into the low-power mode further decreases power.
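
The idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the algorithm used by the vendor's software: it assumes each LAB can be summarized by a single worst-case slack value, and that switching a LAB to low-power mode costs a fixed delay penalty. The names `assign_lab_modes`, `relative_leakage`, the LAB labels, and the 0.5-ns penalty are all hypothetical. Only the 70% leakage reduction for low-power logic comes from the text above.

```python
# Article figure: low-power logic leaks 70% less than high-speed logic.
LOW_POWER_LEAKAGE_FACTOR = 0.30


def assign_lab_modes(lab_slack_ns, low_power_penalty_ns):
    """Pick a mode per LAB: low power if its slack absorbs the delay penalty.

    Both the per-LAB slack summary and the fixed penalty are assumptions.
    """
    return {
        lab: "low_power" if slack >= low_power_penalty_ns else "high_speed"
        for lab, slack in lab_slack_ns.items()
    }


def relative_leakage(modes):
    """Fabric leakage relative to an all-high-speed fabric (1.0 = no savings)."""
    total = sum(
        LOW_POWER_LEAKAGE_FACTOR if mode == "low_power" else 1.0
        for mode in modes.values()
    )
    return total / len(modes)


# Hypothetical worst-case slack per LAB, in nanoseconds.
slacks = {"lab0": 2.4, "lab1": 1.8, "lab2": 0.1, "lab3": 3.0, "lab4": 0.9}
modes = assign_lab_modes(slacks, low_power_penalty_ns=0.5)
leakage = relative_leakage(modes)

print(modes)
print(f"leakage vs. all high-speed: {leakage:.2f}")
```

In this toy example only one LAB in five stays in high-speed mode, so fabric leakage falls to a little under half of the all-high-speed figure. A real flow works on full timing paths rather than per-LAB summaries.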

Selectable core voltage lets designers use a 0.9- or 1.1-V core voltage based on performance requirements. The 0.9-V core voltage provides the overall minimum dynamic and leakage power, while the 1.1-V core voltage delivers the overall highest performance. Dynamic power scales with the square of the core voltage, while static power scales with the core voltage raised to the power of 2.5.
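
A quick calculation shows what these exponents imply when moving from 1.1 V to 0.9 V. The sketch below assumes the same design running at the same clock frequency at both voltages, so that only the voltage term changes; the function name `scale_factors` is illustrative.

```python
def scale_factors(v_new, v_ref):
    """Return (dynamic, static) power at v_new relative to v_ref.

    Uses the exponents stated in the article: V^2 for dynamic power
    and V^2.5 for static power.
    """
    ratio = v_new / v_ref
    return ratio ** 2, ratio ** 2.5


dynamic_ratio, static_ratio = scale_factors(0.9, 1.1)

print(f"dynamic power at 0.9 V: {dynamic_ratio:.1%} of the 1.1-V value")
print(f"static power at 0.9 V:  {static_ratio:.1%} of the 1.1-V value")
```

Under these assumptions, dropping to 0.9 V cuts dynamic power by roughly a third and static power by nearly 40%.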

The selectable core voltage input can be set to 0.9 V or 1.1 V during board design. This core voltage supplies all of the LABs, memories, and DSP functions in the core fabric. The selectable core voltage affects the fabric performance, so when a device and speed grade are selected in the software, a core voltage selection is also required. The software uses timing and power models specific to the selected core voltage to implement all timing-dependent and power-dependent analysis and optimization.

When choosing which core voltage to use, a designer must consider the system performance requirements reported from the timing analysis. If a system's performance requirements can be met with 0.9 V, the design will always consume less power than it would at 1.1 V.

Combining Programmable Power Technology and selectable core voltage delivers various performance and power operating points that achieve over 50% power reduction at 1.1 V (Fig. 6). Static power varies considerably depending on the utilization of the various resources, such as DSP blocks and TriMatrix memory blocks.

The combined static and dynamic power varies across combinations of core voltage and percentage of high-speed versus low-power logic. In most designs, where the maximum performance of the FPGA is not required, the total power of a design can be reduced by 50% or more.

The semiconductor industry constantly battles the evolving challenges of small process dimensions through huge investments in equipment, process technologies, design tools, and circuit techniques. In particular, the challenge of increasing leakage power with small process geometries is felt across the industry. Thus, many well-known technologies at the 65-nm process node (and prior) are used to maintain or increase performance while managing leakage power:

  • Copper routing
  • Low-k dielectric
  • Multi-threshold transistors
  • Variable gate-length transistors
  • Triple gate oxide
  • Super-thin gate oxide
  • Strained silicon

To attain high efficiency and performance, Stratix III FPGAs leverage an adaptive-logic-module (ALM) logic architecture and a MultiTrack interconnect fabric. This combination allows more logic to be packed with less routing.

ALM technology, which is said to implement 80% more logic functions than other architectures, includes an eight-input fracturable lookup table (LUT), two 2-bit adders, and two registers.

MultiTrack interconnect provides one-hop interconnectivity between different LABs. Interconnect efficiency can be measured by the number of "hops" required to get from one LAB to another. Adding interconnect hops increases capacitance; the fewer the hops, the less high-speed logic is required to meet performance, which in turn yields the lowest possible power (Fig. 7).

Hierarchical clocking is used in the Stratix III FPGAs to support up to 360 unique clocks. The propagation of every clock network can be controlled down to a LAB level. Logic with common clocks is grouped into LABs. Clocks are only propagated where the logic uses that clock. All other clock signals are shut down to minimize power consumption.

Double-data-rate (DDR) memory interfaces are one of the most common I/O interfaces in designs today, and they can be fairly power-hungry. To combat those power issues, designers can turn to dynamic on-chip termination and DDR3.

When reading from and writing to external memory, it's vital to have an impedance-matched buffer with both series and parallel termination. If there's a 50-Ω transmission line when writing to memory, a matched buffer with a series impedance of 50 Ω is needed. When receiving data from the memory, a 50-Ω parallel termination resistor pulled to a termination voltage is desired. This applies not only to DDR-type interfaces, but also to RLDRAM and QDR SRAM.

With dynamic on-chip termination, FPGA designers can switch the parallel termination resistor on or off (open circuit), depending on whether a read or write is being executed. During a write, the FPGA output driver impedance must be matched to the transmission line. However, a parallel resistor to VTT wastes energy and reduces signal swing during a write. To avoid this, the resistor can be turned off (Fig. 8).

During a read, the parallel resistor is on to terminate the transmission line to reduce reflections that degrade signal integrity and the ability to reliably read data.

The significant benefits of dynamic on-chip termination are realized whenever the bus is either performing a write from the FPGA or is idle. First, power is greatly reduced: 1.6 W of static power can be saved on a 72-bit DDR2 bus. In addition, a pure series line termination is achieved when writing. Finally, the need for a large number of board termination resistors is removed, saving board cost and complexity.

DDR3 provides 30% lower power than DDR2 because it runs at a lower voltage: 1.5 V versus 1.8 V. For example, a system with a 72-bit, 200-MHz (400-Mbit/s) memory interface with on-chip termination would dissipate 3.9 W for only one memory interface. Using dynamic on-chip termination (wherein the parallel termination resistor is turned off when idle or when performing a write) can save 1.6 W. If both DDR3 and dynamic on-chip termination are used, power drops to 1.6 W, saving a total of 2.3 W. These savings are on a per-interface basis (i.e., four memory interfaces in an FPGA would save 9.2 W).
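
The arithmetic behind these figures can be reproduced as follows. The sketch assumes that the 30% DDR3 reduction applies to the power remaining after dynamic on-chip termination has removed 1.6 W; the article does not state the order of operations, but this interpretation matches its rounded totals. It also checks that the quoted 30% is in line with simple voltage-squared scaling from 1.8 V to 1.5 V.

```python
BASELINE_W = 3.9               # one DDR2 interface with static on-chip termination
DYNAMIC_OCT_SAVING_W = 1.6     # saved by switching off parallel termination
DDR3_REMAINING_FRACTION = 0.70 # DDR3 uses 30% less power than DDR2

# Assumed order: apply dynamic termination first, then the DDR3 reduction.
with_dynamic_oct = BASELINE_W - DYNAMIC_OCT_SAVING_W
with_ddr3_too = with_dynamic_oct * DDR3_REMAINING_FRACTION
saving_per_interface = BASELINE_W - with_ddr3_too
saving_four_interfaces = 4 * saving_per_interface

# Cross-check: V^2 scaling from 1.8 V to 1.5 V gives about a 30% reduction.
voltage_squared_ratio = (1.5 / 1.8) ** 2

print(f"DDR2 + dynamic OCT: {with_dynamic_oct:.1f} W")
print(f"DDR3 + dynamic OCT: {with_ddr3_too:.1f} W")
print(f"saving per interface: {saving_per_interface:.1f} W")
print(f"saving for four interfaces: {saving_four_interfaces:.1f} W")
print(f"(1.5/1.8)^2 = {voltage_squared_ratio:.3f}")
```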

The move to very small process nodes (65 nm and below) delivers the expected Moore's Law benefits of increased density and performance. However, the boost in performance can result in huge increases in power consumption, introducing the risk of consuming unacceptable amounts of power.

If power-reduction strategies aren't used, static power consumption will increase significantly. Also, without a specific power optimization effort, dynamic power consumption rises due to the increased logic capacity and higher switching frequencies.

Overcoming these power challenges with an enabling and innovative architecture, combined with advances in process technology and circuit techniques, provides an efficient and scalable solution for today's increasingly complex FPGA-based designs.
