Save Those Watts With A Power-Aware Design Flow For SoCs

At a time when a single data center may consume more power than millions of homes¹, it's easy to see that power consumption has become critically important for all designs—not just battery-powered products. Leakage power now dominates 90- and 65-nm devices, and high power consumption imposes ever more severe heat and performance penalties.

Of course, the chip- or system-level power requirements are in addition to the perennial requirements of higher performance, lower cost, and faster time-to-market. As a result, one must rethink chip design methodologies that traditionally didn't consider power reduction as a critical requirement.

Fewer and fewer designs can tolerate the traditional approach of managing power as an afterthought. Trying to correct power problems after the design's architecture is fixed makes it difficult to manage power/timing/area tradeoffs, perform functional verification, and manage the many other design steps affected by power consumption. Because each fix is applied incrementally to an already fixed architecture, the resulting power reduction is sub-optimal. Therefore, reducing system-on-a-chip (SoC) power consumption must be considered at every stage of the design flow, from architecture and library characterization to verification and final layout. It's also critical that the flow provides visibility and control of power/timing/area tradeoffs from the earliest stages, while ensuring continuous convergence as the design progresses.

EVOLVING TOWARD TRUE POWER-AWARE DESIGN

Designing SoCs with good power management requires a design flow that integrates appropriate power-saving methodologies to the greatest possible extent. Such a flow avoids extra design hierarchy for meeting power goals. It also lets designers use the same scripts for single or multiple power domains. It eliminates unnecessary iterations. Moreover, it enables better quality of silicon (QoS).

The table provides a rough idea of the power reductions available from various techniques, along with the timing/area tradeoffs and their potential methodology impact. In general, there's a tradeoff between the amount of power reduction you can expect and the amount of work needed to apply the techniques.

The challenge for designers is choosing the most suitable power-management techniques that deliver the target QoS while minimizing the cost and risk associated with methodology changes.² As an example, the following techniques can be added to the traditional design flow without fundamentally changing the way the tools work:

Global concurrent optimization of timing, area, and power
Leakage optimization methods, including multi-VT synthesis
Hierarchical clock gating
Low-power clock-tree synthesis

All of these techniques are useful, and the list could be even longer, including a number of other well-understood techniques that involve minimal tradeoffs for designers. Techniques such as pin swapping, operand isolation, and toggle-rate reduction may be easy to implement, but they have minimal impact on power.

We recommend adopting these techniques as the first step toward a power-aware design methodology. Though easy to adopt, they deliver only limited power reduction. To achieve dramatic reduction, you have to consider advanced techniques.

Two of these techniques, multiple power domains and power shut-off methods, are worth a closer look because they've become the focus for minimizing both active and leakage power in a broad range of designs. Although a number of design teams in the past used these techniques for power-critical designs, the overall methodology was manual, tedious, and risky. In addition, the design approach resulted in sub-optimal power, area, and timing tradeoffs. Over the last year, however, EDA tools have surfaced that automate the entire design flow and permit the adoption of advanced techniques with minimal methodology impact.

MANAGING MULTIPLE POWER DOMAINS

Multiple domains can conserve power in virtually any SoC if you simply run some domains at a lower supply voltage or switch off a domain's power when it's idle (Fig. 1). Traditionally, you had to create multiple domains by partitioning the design, synthesizing each block for the lowest VDD that should support the target timing, and then putting the blocks together to see whether the design worked.

This manual, labor-intensive methodology generally resulted in overpowering the voltage domains to ensure some timing margin. Today, with the right kind of tools, it's possible to use the same flow for either single or multiple domains (Fig. 2). The latter flow simply adds two steps for switch-cell insertion (to turn off each domain's power independently) and optional level-shifter/isolation-cell insertion.

Today's multidimensional optimization lets you assign voltages to blocks without partitioning and then synthesize each block with the lowest voltage level that meets overall timing. This approach also lets you quickly perform a "what-if" analysis for voltage levels to determine the lowest voltage that can meet timing targets.
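As a rough illustration of such a "what-if" sweep, the sketch below scans candidate supply voltages and picks the lowest one that still meets the clock period. It assumes a simple alpha-power-law delay model; the threshold voltage, the exponent, the nominal voltage, and the function names are all hypothetical placeholders, and a real flow would rely on characterized library data rather than a closed-form model.

```python
def scaled_delay(delay_nom_ns, v, v_nom=1.2, v_t=0.35, alpha=1.3):
    """Scale a path delay from v_nom to v with an assumed alpha-power-law model."""
    if v <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")

    def k(vdd):
        # Delay is taken as proportional to Vdd / (Vdd - Vt)^alpha (assumption).
        return vdd / (vdd - v_t) ** alpha

    return delay_nom_ns * k(v) / k(v_nom)


def lowest_voltage_meeting_timing(delay_nom_ns, clock_period_ns, candidates):
    """Return the lowest candidate voltage whose scaled delay fits the period, or None."""
    for v in sorted(candidates):
        if scaled_delay(delay_nom_ns, v) <= clock_period_ns:
            return v
    return None


# Hypothetical candidate supplies and block delays (measured at 1.2 V).
candidates = [0.8, 0.9, 1.0, 1.1, 1.2]
fast_block = lowest_voltage_meeting_timing(2.0, 4.0, candidates)
slow_block = lowest_voltage_meeting_timing(3.9, 4.0, candidates)
no_fit = lowest_voltage_meeting_timing(5.0, 4.0, candidates)
```

With these made-up numbers, the block with plenty of slack can drop to the lowest supply, while the timing-critical block has to stay at the nominal voltage.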

Even so, bear in mind that you need the right methodologies for physical implementation (especially for power switch-off). You must be thorough with verification. The choice of where to place the level shifters (i.e., which power domain), the size of level shifters, and the power-grid definition all can further affect design tradeoffs.

One of the basic choices that designers need to make is the number of power domains and power supplies to use. Don't be tempted to go with lots of different supply voltages to minimize power consumption, since each voltage and domain adds an area penalty.

For example, you need separate power-distribution networks for each voltage and many level shifters between domains. Also, you must supply each voltage to the chip or include voltage regulators on-chip. Of course, with the right kind of design tools, one can do early explorations to make tradeoff decisions among the number of power supplies, number of power domains, and associated area penalty.

SWITCHING OFF DOMAIN POWER

Switching off power to a block is an excellent way to reduce both dynamic and leakage power. However, power switch-off (PSO) incurs further overhead for the switch-off gates, state-retention registers, and isolation logic. The isolation logic clamps outputs from the switched-off domain so that unknown states don't propagate.

Instead of manually inserting isolation cells, we advise adopting synthesis tools that can insert these cells automatically and using formal verification techniques to ensure adequate isolation. Functionally, remember to allow an appropriate bring-up time when restoring power to domains. For successful implementation, pay particular attention to these factors:

Analyze switching activity to ensure that the block is inactive often enough to justify the overhead associated with this methodology.
Size power gates to accommodate IDsat (total) switching current.
Ensure fast slew rate for power-gate control.
Keep simultaneous switching capacitance small to prevent ground bounce.
Make sure power-gate leakage doesn't increase the overall leakage.
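The first factor in this list amounts to a break-even calculation: shutting a block off only pays when the leakage energy saved during an idle period exceeds the energy spent on switching, state retention, and recharging the domain. The sketch below is a minimal illustration under that assumption; the function names and all numbers are hypothetical.

```python
def break_even_idle_time_us(leak_on_mw, leak_off_mw, overhead_energy_nj):
    """Idle time (in microseconds) beyond which power switch-off saves energy."""
    saved_mw = leak_on_mw - leak_off_mw
    if saved_mw <= 0:
        raise ValueError("gating must reduce leakage to be worthwhile")
    # nJ / mW = 1e-9 J / 1e-3 W = 1e-6 s, i.e. microseconds.
    return overhead_energy_nj / saved_mw


def net_savings_nj(idle_intervals_us, leak_on_mw, leak_off_mw, overhead_energy_nj):
    """Total energy saved if the block is gated only in intervals that pay off."""
    saved_mw = leak_on_mw - leak_off_mw
    total = 0.0
    for t_us in idle_intervals_us:
        gain = saved_mw * t_us - overhead_energy_nj  # mW * us = nJ
        total += max(0.0, gain)  # skip intervals shorter than break-even
    return total


# Hypothetical block: 5 mW leakage when on, 0.25 mW when gated,
# 95 nJ spent per shut-off/wake-up cycle.
t_break_even = break_even_idle_time_us(5.0, 0.25, 95.0)
savings = net_savings_nj([5.0, 50.0, 200.0], 5.0, 0.25, 95.0)
```

Feeding the function idle intervals extracted from switching-activity data gives a quick estimate of whether the block is inactive often enough, and for long enough, to justify the overhead.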

When planning a switch-off strategy, the biggest choice you face is deciding between fine-grain and coarse-grain power-gating schemes³ (Fig. 3). For the fine-grain approach, every standard cell that you want to power down must include a power-gating transistor. Because this transistor must be sized to accommodate worst-case current (i.e., assuming that the cell will switch during every clock cycle), this approach has a relatively high area penalty. On the other hand, fine-grain power gating is transparent to synthesis, and you can implement the technique using existing tools.

The real challenge—and the future of PSO technology—lies in the coarse-grain approach, which greatly reduces the area penalty by using a single gate to cut off power to an entire block. Gate sizing depends on the block's switching current. Since only a fraction of a block's circuits switch at any given time, these power gates can be much smaller than the total area required for fine-grain switches. We recommend using dynamic power simulators with worst-case vectors to determine the switching current that the power gates must handle.
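To make the area argument concrete, the following sketch compares the total switch width needed when every cell is sized for its own worst-case current (fine grain) against a shared switch sized for the block's simulated peak current (coarse grain). It assumes a simple resistive switch model in which on-resistance is inversely proportional to width; the resistance constant, the allowed voltage drop, the per-cell current, and the simultaneous-switching fraction are hypothetical values, not data from any library.

```python
def switch_width_um(peak_current_ma, max_drop_mv, r_on_ohm_um=2000.0):
    """Minimum switch width so that I * R_on stays within the allowed drop.

    Assumes R_on = r_on_ohm_um / width, so width >= I * r_on_ohm_um / V_drop.
    Units: mA * (ohm * um) / mV = um, because mA / mV = 1 / ohm.
    """
    if max_drop_mv <= 0:
        raise ValueError("allowed drop must be positive")
    return peak_current_ma * r_on_ohm_um / max_drop_mv


# Hypothetical block: 10,000 cells, each able to draw 0.05 mA when it switches.
num_cells = 10_000
cell_peak_ma = 0.05
max_drop_mv = 50.0

# Fine grain: every cell's gate is sized as if the cell switches every cycle.
fine_grain_um = num_cells * switch_width_um(cell_peak_ma, max_drop_mv)

# Coarse grain: one shared gate sized for the simulated block peak,
# here assumed to be 20% of the cells switching at once.
block_peak_ma = 0.2 * num_cells * cell_peak_ma
coarse_grain_um = switch_width_um(block_peak_ma, max_drop_mv)
```

In this toy case the shared switch needs one-fifth of the total width, which mirrors the simultaneous-switching fraction; the block peak current is exactly the figure that the worst-case dynamic simulation is meant to supply.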

Switching on several powered-off blocks simultaneously raises major IR-drop concerns. You can limit switching capacitance (and thus ground bounce) by daisy-chaining the gate-control buffers, which creates a natural delay in the control signal propagation. That, in turn, sequentially turns on the power switches.
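The effect of daisy-chaining can be estimated with a simple model. The sketch below assumes each power switch draws a rush current that starts at a fixed value and decays exponentially once the switch turns on; the number of switches, the initial current, the time constant, and the buffer delay are hypothetical, and a real analysis would use circuit simulation of the power grid.

```python
import math


def peak_rush_current_ma(num_switches, i0_ma, tau_ns, stagger_ns):
    """Peak total rush current when switch k turns on at time k * stagger_ns.

    Each switch is assumed to draw i0_ma * exp(-(t - t_on) / tau_ns) after it
    turns on, so the total can only jump upward at a turn-on instant.
    """
    turn_on_times = [k * stagger_ns for k in range(num_switches)]
    peak = 0.0
    for t in turn_on_times:
        total = sum(
            i0_ma * math.exp(-(t - t_on) / tau_ns)
            for t_on in turn_on_times
            if t_on <= t
        )
        peak = max(peak, total)
    return peak


# Hypothetical chain of 8 switches, 40 mA initial rush each, 2 ns time constant.
simultaneous_ma = peak_rush_current_ma(8, 40.0, 2.0, stagger_ns=0.0)
daisy_chained_ma = peak_rush_current_ma(8, 40.0, 2.0, stagger_ns=4.0)
```

With zero stagger the currents simply add, whereas a buffer delay of a couple of time constants keeps the peak close to that of a single switch, at the cost of a longer wake-up time.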

Fast turn-on times can lead to unexpected problems, such as latch-up, because of large voltage differences across the circuits being powered up. This requires dynamic gate power analysis to better understand the turn-on characteristics of a block and analyze the rush current or ground bounce.

Coarse-grain power gating demands advanced EDA tool support to make the necessary tradeoffs in gate size, placement, routing, simultaneous switching analysis, and the slew rate of the gate-control signal. Also, note that absolutely clean signal integrity (SI) is vital for this control network because a spurious signal caused by an SI aggressor could shut down an entire module.

Implementation and analysis tools need to be multiple-power-domain aware so that they can perform the necessary tasks without compromising the quality of silicon. With these capabilities, coarse-grain power gating will deliver the aggressive power reduction required for today's and tomorrow's designs.

Dynamic voltage and frequency scaling (DVFS) offers another choice for substantial power reduction. Scaling the frequency and voltage of a given block simultaneously can result in higher power/energy savings than scaling either parameter individually. To employ DVFS, you need a subsystem that automatically adjusts the supply voltage and clock frequency of some power domains to suit the application's workload at any given time. The software supplies the control based on when an application is scheduled and how long it will be active. Typically, hardware scales supply voltage and clock frequency in response to software demands.
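The benefit of scaling both parameters together follows from the standard first-order relation for dynamic power, P = C·V²·f. The sketch below works through it for a fixed-length task; the effective capacitance, the operating points, and the cycle count are hypothetical, and leakage is ignored for simplicity.

```python
def dynamic_power_mw(c_eff_nf, v, f_mhz):
    """First-order dynamic power, P = C * V^2 * f (nF * V^2 * MHz = mW)."""
    return c_eff_nf * v ** 2 * f_mhz


def task_energy_mj(c_eff_nf, v, f_mhz, mega_cycles):
    """Energy to run a task of a fixed cycle count at one operating point."""
    time_s = mega_cycles / f_mhz  # Mcycles / MHz = seconds
    return dynamic_power_mw(c_eff_nf, v, f_mhz) * time_s  # mW * s = mJ


# Hypothetical block with 1 nF effective switched capacitance, 100 Mcycle task.
full_speed = task_energy_mj(1.0, 1.2, 400.0, 100.0)      # 1.2 V, 400 MHz
freq_only = task_energy_mj(1.0, 1.2, 200.0, 100.0)       # 1.2 V, 200 MHz
volt_and_freq = task_energy_mj(1.0, 0.9, 200.0, 100.0)   # 0.9 V, 200 MHz

power_full = dynamic_power_mw(1.0, 1.2, 400.0)
power_dvfs = dynamic_power_mw(1.0, 0.9, 200.0)
```

Halving the frequency alone halves the power but doubles the run time, so the task's dynamic energy is unchanged. Lowering the voltage along with the frequency cuts the energy by the square of the voltage ratio.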

DVFS can be applied to designs with a deterministic work pattern, such as a microprocessor. In a microprocessor, the operating system schedules various tasks and keeps track of when each task starts and when the next task becomes active. To implement DVFS, simultaneous multiple-mode multiple-corner analysis and optimization is required. If this isn't available, verifying all of the structures needed to implement the technique takes lots of time and many manual interventions in the design flow.

POWER-AWARE PHYSICAL IMPLEMENTATION

Even if your synthesis tool automatically optimizes power-related structures (such as multiple power domains) and your analysis tools simplify verification, you need physical implementation methodologies that follow through on the necessary strategies. In our discussions with designers, we often find that the physical implementation, ranging from floorplanning to routing, is considered to be the most challenging aspect of advanced power-management techniques.

Early in the flow, silicon virtual prototyping and design partitioning play important roles in reducing power consumption. For example, since interconnect capacitance is a big component of the dynamic power, wires with high switching probabilities must be kept as short as possible. That usually means keeping such wires within a partition.
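One way to picture which wires deserve short routes is to rank nets by their switched-capacitance power, using the first-order estimate 0.5·α·C·V²·f, where α is the average number of transitions per clock cycle. The sketch below is only illustrative; the net names, toggle rates, capacitances, supply voltage, and clock frequency are hypothetical.

```python
from typing import NamedTuple


class Net(NamedTuple):
    name: str
    toggle_rate: float  # average transitions per clock cycle
    cap_ff: float       # estimated interconnect capacitance in fF


def net_power_uw(net, vdd=1.0, f_mhz=500.0):
    """Switching power of one net: 0.5 * alpha * C * V^2 * f.

    fF * MHz = 1e-15 F * 1e6 Hz = nW per V^2, so divide by 1000 for uW.
    """
    return 0.5 * net.toggle_rate * net.cap_ff * vdd ** 2 * f_mhz / 1000.0


def rank_nets_by_power(nets):
    """Return nets sorted from highest to lowest switching power."""
    return sorted(nets, key=net_power_uw, reverse=True)


nets = [
    Net("scan_en", 0.001, 300.0),    # long wire that almost never toggles
    Net("clk_en", 0.9, 40.0),        # short wire that toggles constantly
    Net("data_bus[0]", 0.4, 120.0),  # moderately active, moderately long
]
ranked = rank_nets_by_power(nets)
ranked_names = [net.name for net in ranked]
```

The long but nearly static net ends up last, which is why switching probability, not length alone, should guide which wires are kept within a partition.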

Cell placement must account for this need without extensive iterations to determine which wires contribute the most capacitance to the power equation. In particular, multi-power-domain designs need partitioning and floorplanning strategies that support the domains' requirements. Doing this requires integration with analysis tools that can accurately predict power relationships even at the floorplanning stage, a non-trivial task but an essential capability for meeting power goals.

Similar requirements apply to routing. But at this stage of the flow, you have extracted Standard Parasitic Exchange Format (SPEF) information based on routing to support accurate power analysis. The physical implementation tools must use this information for power-grid synthesis, clock-tree synthesis, and further optimization to meet power goals. For multiple power domains, the tools must handle signal routing and power connections to level shifters in suitable ways. Level-shifting isolation cells bridge from one power domain to another, which requires that tools handle these cells carefully (Fig. 4).

The router must understand the power-down-switch placement scheme to implement any power-down approach. For distributed switch layouts, it's important to place the switch and then make sure it has adequate current-carrying capacity. The switch-enable distribution also is critical because this signal uses a high-fan-out net that's always on. Thus, buffers on this net must be kept out of partitions that will be powered down.

VERIFYING AND ANALYZING LOW-POWER DESIGNS

Verification tools must be able to check details such as power-down-switch-enable networks, even though name mapping can be a challenge. State-retention power gating (SRPG) cells pose particular verification challenges. The master latch in the flip-flop connects to switched power (VDD), while the slave connects to always-on power (VRET) (Fig. 5a). When the clock is disabled (gated) and retention (RET) is enabled, the master latch powers down while the slave latch retains the state (Fig. 5b). At wakeup, power switches back on for the master, RET is disabled, and only then is the clock enabled.
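The ordering described above can be written down as a small sequence checker. The sketch below is a simplified illustration, not a substitute for formal verification: the event names are hypothetical, and the rule that retention may only be enabled while the clock is gated is an assumption drawn from the power-down description.

```python
def check_srpg_sequence(events):
    """Return a list of ordering violations in a power-down/wake-up event trace."""
    clock_on, ret_on, power_on = True, False, True
    errors = []
    for step, event in enumerate(events):
        if event == "clock_gate":
            clock_on = False
        elif event == "ret_enable":
            if clock_on:
                errors.append(f"step {step}: RET enabled while clock is running")
            ret_on = True
        elif event == "power_off":
            if clock_on or not ret_on:
                errors.append(f"step {step}: power removed before state was retained")
            power_on = False
        elif event == "power_on":
            power_on = True
        elif event == "ret_disable":
            if not power_on:
                errors.append(f"step {step}: RET disabled while power is off")
            ret_on = False
        elif event == "clock_enable":
            if not power_on or ret_on:
                errors.append(f"step {step}: clock enabled before wake-up finished")
            clock_on = True
        else:
            raise ValueError(f"unknown event: {event}")
    return errors


good = ["clock_gate", "ret_enable", "power_off",
        "power_on", "ret_disable", "clock_enable"]
bad = ["clock_gate", "ret_enable", "power_off",
       "power_on", "clock_enable", "ret_disable"]
good_errors = check_srpg_sequence(good)
bad_errors = check_srpg_sequence(bad)
```

The second trace re-enables the clock before RET is released, which is exactly the kind of ordering error the wake-up sequence is meant to rule out.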

The best way to make sure that such power-up and power-down sequences work correctly throughout the design is to analyze the logic using formal proofs. In fact, formal verification is essential to ensure that low-power optimizations done in logic synthesis, physical synthesis, and place and route don't introduce logical errors.

Low-power design requires several types of analysis, including power integrity analysis that checks for IR-drop and electromigration effects for multiple voltages. The tools must consider chip temperature (thermal distribution), because hot spots can contribute to leakage power surges and add to the chances of electromigration within the wires.

Traditionally, accurately modeling cell delays for a particular voltage level required characterized timing views for that voltage level—a requirement that involves a great deal of characterization. Even then, voltage variations due to IR drop affect timing in ways that require additional analysis.

Fortunately, current-source models make it possible to predict the effects of voltage variations on delay to within 2% of Spice simulations, including accounting for IR drop. Moving to libraries and tools that use such models will greatly reduce the work involved in verifying designs with multiple power domains and enable faster timing closure. Otherwise, it's a good idea to limit the number of different supply voltages.

Ignoring the impact of IR drop on timing can cause additional setup or hold violations due to the varying operating voltages around a design. Additional delays caused by IR drop on the signal paths create setup-time violations, while additional delays on the clock network create hold-time violations. Performing timing analysis at fixed voltages will never show these additional violations because operating voltages are treated as constant.

WHAT'S NEXT

The use of multiple power domains raises issues for design reuse. Specifically, how do you design blocks for power shutoff while making the RTL reusable? How do you ensure that the functionality isn't altered when the blocks are shut down? How do you simulate the effect of level shifters and isolation cells? Do you create a mixed gate and RTL simulation for these conditions?

Today, specifying power intent for every design stage is cumbersome and ad hoc. Engineers implementing advanced low-power techniques face these challenges right now. As more and more engineers look toward these advanced techniques as the only option available for power reduction, the need to support them holistically becomes more apparent.

Hence, engineers should demand that their tools share a common understanding of power. The simulation tools must talk to the synthesis tools, and the synthesis tools must talk to the implementation tools, with every tool interpreting the designer's power intent in the same way.

A common specification of power intent understood by all of the tools, from architectural specification to functional verification to implementation, could simplify the design process enormously. What's needed is an infrastructure to capture power intent throughout the flow. This level of support is imperative for any future tools, since these engineers don't want to compromise anything to realize their power-reduction needs.

Mohit Bhatnagar is currently responsible for marketing digital prototyping and physical synthesis products for Cadence Design Systems Inc. (www.cadence.com), San Jose, Calif. He holds a PhD in electrical engineering from North Carolina State University, Raleigh.

Jack Erickson is product marketing director for RTL synthesis at Cadence Design Systems Inc. He holds a BSEE from Tufts University, Medford, Mass.

Anand Iyer is product marketing director for the Encounter digital IC design platform at Cadence Design Systems Inc. He holds an MBA from Santa Clara University, Calif., an MSEE from the University of California, Santa Barbara, and a master's of technology in reliability engineering from IIT, Bombay, India.

Pete McCrorie is responsible for DFM product marketing at Cadence Design Systems Inc. He holds a master's in physics and electronics from the University of Liverpool, United Kingdom.

References: