Gain Abstraction And Accuracy From RTL Power Estimation

Jan. 18, 2007


Excessive power consumption can destroy a design’s commercial viability. Modern cell phones are permitted to consume no more than a few hundred milliwatts for voice communications. Yet in the past, designers were forced to estimate power consumption using a manual spreadsheet approach with inaccuracies of scores of milliwatts. Worse, the design of an advanced 3G phone with multiple functions—camera, video, audio, and data connectivity—will be especially power-critical. Measuring power consumption only after you’ve produced silicon prototypes, or even synthesized logic to gates, is far too late.

So how can you know early in the design cycle whether you’ve exceeded the power budget? How can you optimize the design for power early enough to avoid time-consuming, late-stage redesigns?

This article describes a register-transfer-level (RTL) power-estimation methodology that designers have shown to be accurate to within 8% to 15% of actual silicon power consumption. Such accuracy is more than sufficient to make the critical “big-picture” analyses and decisions that determine chip-level power consumption and to support an RTL power sign-off milestone.

But let’s first ask: “What’s wrong with traditional power-estimation methodologies?” The manual spreadsheet approach has broken down in the face of nanometer design complexity, exacerbated by the need for greater accuracy—especially for leakage power—at smaller process nodes and by interdependent power-management techniques. This has robbed design teams of the up-front analysis necessary to guide chip architecture decisions. Chip micro-architecture is a major determinant of chip power consumption—accounting for 80% or more of the power. Consequently, the failure of the spreadsheet approach presents quite a serious problem for chip architects.

Moreover, power analysis undertaken at the gate level has failed for the same reasons. Worse, the gate and transistor levels of abstraction are microscopic levels that possess no chip micro-architecture context. This severely limits the designer’s ability to identify power-hungry functional scenarios and devise appropriate “worst-case power” stimuli. In addition, these microscopic levels of abstraction overload not only the designer, but also the performance and capacity of power-estimation tools. In any case, gate-level power estimation is executed far too late in the design flow to take significant power-reduction measures.

Now we come to the question of economics and effort. Choosing power-estimation and power-reduction methodologies involves not only what they can (and cannot) do and how well they do it, but at what cost. Costs include power-estimation tools and model libraries (and their up-keep), the additional die area used to implement power-management techniques, and the cost (and time) to achieve the power-consumption objectives. These costs, of course, affect both the design team’s budget and the device’s economic competitiveness.

Certainly, designers have their favored power-management techniques (Fig. 1). These data were collected from 115 respondents to a targeted survey conducted by the authors. The respondents were SoC designers in wireless telecommunications (31%), portable electronics (21%), and networking (27%).

Clearly, these power-management techniques are in widespread use. The selected power-estimation methodology must therefore measure their effects in any given design with enough accuracy for the designer to deploy them with maximum effectiveness. This raises the question: What is that power-estimation methodology?

The Story At RTL
RTL is the level of abstraction that possesses both the micro-architectural context for determining “big-picture” power consumption and the structural detail necessary for reasonably accurate analysis. It’s the prime level at which the effects of power-reduction techniques, such as clock gating, power gating, mixed threshold voltages, voltage islands, and memory partitioning, can realistically be estimated. And it’s also the point in the design flow at which effective remedial action can be taken with minimal adverse effects on design time.

However, this raises a question: Is RTL power estimation accurate enough? Fig. 2 compares user-reported results for an RTL power-estimation methodology (such as Sequence Design’s PowerTheater) with those for the gate level, Spice, and silicon, assuming a simulation vector set common to all levels. Here, RTL estimation results correlate to within 8% to 15% of silicon, and even overlap gate-level estimation results. Indeed, a leading consumer electronics manufacturer has reported a correlation between RTL and gate level of within 3% to 8% on three designs.

The difference between RTL and gate-level results is determined largely by power model accuracy and the algorithms used by the RTL estimation methodology. The difference between gate-level results and silicon is determined largely by the libraries and the simulation vectors.

This RTL correlation is sufficient to establish an RTL power sign-off milestone—similar to the functional/timing sign-off that’s been used for over two decades.

How It Works
So, how does such an RTL power-estimation methodology work? We’ll start by defining what must be measured, and then derive and discuss some basic methodology attributes necessary for accurate measurement.

The total power consumption—the sum of dynamic and static power consumption factors—of a device is expressed as:

P_TOTAL = P_DYNAMIC + P_STATIC,

where P_DYNAMIC = (Pdyn_cells + Pdyn_loads) and

P_STATIC = (Pstatic_current + Pstatic_state)

It should be noted that static power consumption has increased—both in absolute terms and as a proportion of total power consumption—as processing technology has moved to ever-smaller feature sizes. Indeed, at 90 nm and 65 nm, static power consumption constitutes 20% to 30% of total power consumption (Fig. 3).
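The decomposition above can be sketched as a small data structure. This is only an illustration: the class name `PowerBreakdown`, its field names, and the milliwatt figures are assumptions chosen to mirror the four terms in the equations, not values from any real design.

```python
from dataclasses import dataclass


@dataclass
class PowerBreakdown:
    """Power components in milliwatts, named after the terms in the equations."""
    pdyn_cells: float       # dynamic power consumed internally by all cells
    pdyn_loads: float       # power spent charging wire and pin capacitance
    pstatic_current: float  # state-independent static power
    pstatic_state: float    # state-dependent static power

    @property
    def dynamic(self) -> float:
        # P_DYNAMIC = Pdyn_cells + Pdyn_loads
        return self.pdyn_cells + self.pdyn_loads

    @property
    def static(self) -> float:
        # P_STATIC = Pstatic_current + Pstatic_state
        return self.pstatic_current + self.pstatic_state

    @property
    def total(self) -> float:
        # P_TOTAL = P_DYNAMIC + P_STATIC
        return self.dynamic + self.static

    @property
    def static_fraction(self) -> float:
        # Share of total power that is static
        return self.static / self.total


# Hypothetical numbers, chosen only so the static share lands in the
# 20% to 30% range quoted for 90 nm and 65 nm.
example = PowerBreakdown(
    pdyn_cells=90.0, pdyn_loads=60.0, pstatic_current=30.0, pstatic_state=20.0
)
print(example.total, example.static_fraction)
```

Keeping the four terms separate, rather than storing only a total, makes it easy to see which component a given power-reduction technique actually affects.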

Cell Power Modeling
An effective RTL power-estimation methodology must leverage a standard, easily-available library of cell power models with accurate measurement algorithms.

Cell- and load-level calculations—derived in conjunction with simulation—require the deployment of cell models that capture all of the requisite data. So what data and cell models are required?

The factor Pdyn_cells is the sum of the dynamic power consumed internally by all cells, which is determined largely by internal capacitance charge/discharge and the crowbar currents from V_DD to V_SS when an internal node switches state. Power-estimation tools measure this factor using cell power models compliant with standards such as Liberty (.lib), as well as the emerging Effective Current Source Model (ECSM) and the Liberty Composite Current Source (CCS) library data formats for synthesis. All of these standards emphasize power analysis and optimization. The models must also comprehend the effect of the reduced voltage swings used in, for instance, high-speed CMOS I/O cells.

The factor Pdyn_loads is the sum of the power consumed in charging all nodal capacitances—both wire and pin capacitance. It also depends on both input ramp time and output load, which determine the time that the V_DD/GND path is open. The calculation of load-dependent power consumption must also include the effects of three-state bus switching, where identification of the active driver is essential.

The factor Pstatic_current is the sum of the state-independent static power consumed by all cells, and is determined by the current from V_DD to V_SS when the cell isn’t switching. This is a relatively simple calculation.

Less simple is the measurement of Pstatic_state, which is the sum of the state-dependent static power consumed by all cells. This power consumption depends on the time that any given cell spends in any given state—data that are derived from simulation. A particular case of state-dependent power consumption involves I/O pads with external terminations; thus, the calculation must comprehend the termination voltage and any pull resistors.
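One way to picture the state-dependent term is as a time-weighted average of per-state leakage. The sketch below assumes a cell model that lists one leakage value per input state and a simulation that reports how long the cell spent in each state; the function name, the NAND example, and all numbers are hypothetical.

```python
def state_dependent_static_power(leakage_by_state_nw, time_in_state_ns):
    """Time-weighted average leakage (nW) of one cell across its input states."""
    total_time = sum(time_in_state_ns.values())
    if total_time == 0:
        raise ValueError("simulation window has zero duration")
    weighted = sum(
        leakage_by_state_nw[state] * duration
        for state, duration in time_in_state_ns.items()
    )
    return weighted / total_time


# Hypothetical 2-input NAND: leakage per input state (nW) and the time the
# simulation spent in each state (ns). Values are illustrative only.
nand_leakage = {"00": 1.0, "01": 2.0, "10": 3.0, "11": 4.0}
nand_residency = {"00": 50.0, "01": 25.0, "10": 25.0, "11": 0.0}

avg_leakage = state_dependent_static_power(nand_leakage, nand_residency)
print(avg_leakage)
```

Summing this quantity over all cells would give Pstatic_state for the simulated scenario, which is why the result changes with the stimulus.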

Of course, the entire analysis is activity-dependent and varies over time as circuit modes of operation change. Block power is temporal, so vectors must realistically reflect the stimulus from the environment in which the block operates. To supply this activity data, all signal nets are captured during RTL simulation by performing a level-0 .vcd dump.
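To make the role of the .vcd dump concrete, here is a minimal sketch of how per-signal activity might be pulled from such a file. It is not how any particular power-estimation tool works: it handles only single-bit signals, ignores hierarchy and vectors, and the function name `count_toggles` and the sample dump are assumptions made for illustration.

```python
def count_toggles(vcd_text):
    """Count value changes per single-bit signal in a VCD dump (vectors ignored)."""
    names = {}    # VCD identifier code -> signal name
    last = {}     # identifier code -> last value seen
    toggles = {}  # signal name -> number of value changes
    in_header = True
    for line in vcd_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if in_header:
            # Declaration form: $var <type> <width> <id> <name> $end
            if tokens[0] == "$var" and len(tokens) >= 5 and tokens[2] == "1":
                names[tokens[3]] = tokens[4]
                toggles[tokens[4]] = 0
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        token = tokens[0]
        # Scalar value change: one value character followed by the id code
        if token[0] in "01xXzZ" and token[1:] in names:
            code, value = token[1:], token[0]
            if code in last and last[code] != value:
                toggles[names[code]] += 1
            last[code] = value
    return toggles


# Tiny hand-written dump used only to exercise the function.
sample_vcd = """\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 1 % en $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
0%
$end
#5
1!
#10
0!
1%
#15
1!
"""

print(count_toggles(sample_vcd))
```

Dividing each toggle count by the simulated time gives a per-net switching rate, which is the kind of activity figure the dynamic power terms depend on.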

Libraries and Accuracy
Clearly, each cell power model is a multi-characteristic entity. To achieve the requisite model accuracy, the accuracy of each characteristic of the cell and its constituent gates must be maximized. For instance, the dynamic power of a gate is expressed as:

Pdynamic = f * C * V²

where f is the frequency of operation, C the capacitance, and V the supply voltage.

A “rough-and-ready” approach to power estimation is to assign a gate model that approximates the gate in use. Static timing analysis identifies critical paths in which gates must be faster or drive larger loads and, again, the nearest approximate gate models are assigned. This approach fails, however, to comprehend the complexity and difficulty of accurately measuring f and C. If the gate is calculated to switch 20% of the time and actually switches 30% of the time, the power consumption will be 50% higher than estimated. The same holds for errors in capacitance. A power-estimation methodology must be able to accurately measure f and C.
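The 20% versus 30% example can be checked with a few lines of arithmetic. The sketch assumes that the effective switching frequency is the clock frequency multiplied by an activity factor, which is one common reading of f in the formula; the clock rate, capacitance, and supply voltage are hypothetical and cancel out of the ratio.

```python
def dynamic_power_w(freq_hz, cap_f, vdd_v, activity=1.0):
    """Dynamic power f * C * V^2, with f scaled by an assumed activity factor."""
    return activity * freq_hz * cap_f * vdd_v ** 2


# Hypothetical gate: 200 MHz clock, 10 fF switched capacitance, 1.2 V supply.
estimated = dynamic_power_w(200e6, 10e-15, 1.2, activity=0.20)
actual = dynamic_power_w(200e6, 10e-15, 1.2, activity=0.30)

# Relative amount by which the actual power exceeds the estimate.
underestimate = actual / estimated - 1.0
print(f"actual power exceeds the estimate by {underestimate:.0%}")
```

Because power is linear in both f and C, a relative error in either term passes straight through to the result, which is why a misjudged activity of 20% instead of 30% yields a 50% shortfall.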

In any case, timing analysis doesn’t enable activity and power estimation over time. Moreover, the gate model assigned by the “rough and ready plus timing analysis” approach won’t necessarily reflect the gate that’s actually assigned by logic-synthesis tools. How do these tools handle clock gating, power gating, mixed threshold voltages and clock-tree synthesis?

A power-estimation methodology must therefore deploy tools that estimate activity and power over time; “understand” downstream gate optimizations; and use a comprehensive library of accurate cell power models. “Rough and ready” simply isn’t good enough.

Using accurate cell power models with the attributes delineated above, coupled with annotated wire capacitances or wire load models, the methodology can estimate chip-level power consumption. However, local power optimization—at the cell and wire level—will have a relatively small impact on global power consumption. The micro-architectural level is the level at which power consumption is best analyzed and optimized. Therefore, an effective chip-level power-estimation methodology must also infer—or abstract—the micro-architecture from the HDL code. Now let’s take a look at the micro-architectural abstraction issues.

Micro-architectural Inferencing and Algorithms
Micro-architectural inferencing is simply the abstraction of parameterized, higher-level blocks from the HDL design, using appropriate language compilers. These blocks can include sequential elements (registers and latches), instantiated elements (I/O cells and memories), datapath elements (adders and multipliers), and control logic (decoders and multiplexers).

The output of micro-architectural inferencing is a complete interconnected netlist of inferred modules. The inferred modules are then mapped to their corresponding power models, and any instantiated elements are mapped to library file entries. The resulting power model netlist is combined with the target processing technology to deliver accurate power estimation.
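The mapping step can be pictured with a toy model. Everything here is an assumption made for illustration: the `InferredModule` fields, the per-bit energy table, and the idea that energy scales linearly with bit width are simplifications, not the models used by any real tool or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferredModule:
    name: str
    kind: str        # inferred block type, e.g. "register", "adder", "mux2"
    width: int       # bit-width parameter of the inferred block
    activity: float  # average operations per clock cycle, from simulation


# Hypothetical energy per bit per operation, in picojoules.
ENERGY_PJ_PER_BIT = {"register": 0.02, "adder": 0.05, "mux2": 0.01}


def estimate_power_mw(modules, clock_hz):
    """Map each inferred module to its power model and sum the result in mW."""
    total_pj_per_cycle = 0.0
    for module in modules:
        if module.kind not in ENERGY_PJ_PER_BIT:
            raise KeyError(f"no power model for module kind {module.kind!r}")
        total_pj_per_cycle += (
            ENERGY_PJ_PER_BIT[module.kind] * module.width * module.activity
        )
    # pJ per cycle -> joules per cycle -> watts -> milliwatts
    return total_pj_per_cycle * 1e-12 * clock_hz * 1e3


netlist = [
    InferredModule("acc_reg", "register", width=32, activity=0.5),
    InferredModule("acc_add", "adder", width=32, activity=0.25),
]
power_mw = estimate_power_mw(netlist, clock_hz=100e6)
print(power_mw)
```

Because each block keeps its width and activity as parameters, an architectural change such as narrowing a datapath can be re-evaluated without descending to gates.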

Commonly inferred components include primitives such as registers; latches; adders; multipliers; 2-to-1 multiplexers; tri-states; buffers; inverters; NAND, OR, NOR, XOR, and XNOR gates; and macros such as register files, latch files, unencoded muxes, and decoders. Macros are modeled using primitives as building blocks.

Such high-level inferencing is quite accurate because the stimuli necessary to exercise these macros reflect their operational environment and can be tailored to their actual operating modes in the design. Deconstructing the macros to the gate level destroys this functional view and necessitates the generation of gate-level stimuli, which may not accurately or comprehensively reflect the macro’s operation. Moreover, estimating power consumption at the macro level reduces the run time and increases the capacity of the power-estimation tool.

The HDL code can be structural, behavioral, or a mixture of the two, although language compilers perform inferencing on only the behavioral code. Clearly, inferencing must adhere to standard guidelines and rules for gate-level synthesis to avoid netlist mismatches.

RTL power estimation tools can measure the power consumption of these micro-architectural blocks. This lets the designer perform architectural “what if?” analyses and design modification to power-optimize the design.

From a graphical summary of many of the factors that determine the accuracy of RTL power estimation, it’s clear that power estimation is a multi-dimensional problem (Fig. 4). A power-estimation methodology must comprehend all of these factors to deliver the requisite accuracy.

Examples of Silicon-Aware Power Optimization
One power-optimization technique utilizes a power-estimation tool to measure the effects of clock gating. The designer simply defines the conditions under which clock gating should be applied, and the tool applies the appropriate clock-gating cells designed according to an industry-standard methodology. The tool specifies which registers are gated, as well as the power-consumption effects.

Another technique is the use of mixed threshold-voltage (V_T) cells. Higher-performance circuits often use low-V_T cells at the expense of higher leakage, but an easy-to-implement optimization technique is the use of multi-threshold voltage cells. An early RTL power-estimation methodology can predict leakage based on low-V_T cell population expectations.
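A population-based leakage prediction of this kind might look like the sketch below. The function name, cell count, and per-cell leakage figures are hypothetical; a real estimate would draw these from the cell library and from synthesis expectations.

```python
def leakage_estimate_uw(cell_count, low_vt_fraction, low_vt_leak_nw, high_vt_leak_nw):
    """Estimate total leakage (uW) from the expected share of low-V_T cells."""
    if not 0.0 <= low_vt_fraction <= 1.0:
        raise ValueError("low_vt_fraction must lie between 0 and 1")
    low = cell_count * low_vt_fraction * low_vt_leak_nw
    high = cell_count * (1.0 - low_vt_fraction) * high_vt_leak_nw
    return (low + high) / 1000.0  # nW -> uW


# Hypothetical design: one million cells, 10 nW per low-V_T cell and
# 1 nW per high-V_T cell.
all_low_vt = leakage_estimate_uw(1_000_000, 1.0, 10.0, 1.0)
mixed_vt = leakage_estimate_uw(1_000_000, 0.2, 10.0, 1.0)
print(all_low_vt, mixed_vt)
```

Sweeping `low_vt_fraction` shows how quickly leakage falls as low-V_T cells are confined to timing-critical paths.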

Yet another power-management technique involves using voltage islands, which supply different voltages to different parts of the design—for instance, the functional core and the input/output (I/O) ring. Given that power is proportional to V², even a modest reduction in supply voltage—to, say, blocks with non-critical timing—can significantly reduce power consumption. Specifically, reducing the supply voltage from V_HIGH to V_LOW results in a power consumption of:

P_LOW = P_HIGH*(V_LOW/V_HIGH)²

Thus, changing the supply from 1.2 V to 0.9 V reduces power consumption by approximately 44%. Again, an RTL power-estimation tool can measure the effects of various voltage island configurations.
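The 44% figure follows directly from the scaling formula, as this short check shows. It assumes the quadratic relationship stated above with everything else held constant; the function name is illustrative.

```python
def scaled_power(p_high, v_high, v_low):
    """P_LOW = P_HIGH * (V_LOW / V_HIGH)^2"""
    return p_high * (v_low / v_high) ** 2


# Normalize P_HIGH to 1.0 so the result reads directly as a fraction.
p_low = scaled_power(1.0, v_high=1.2, v_low=0.9)
reduction = 1.0 - p_low
print(f"power falls to {p_low:.4f} of its original value ({reduction:.1%} lower)")
```

The ratio 0.9/1.2 is 0.75, and 0.75 squared is 0.5625, so the exact reduction is 43.75%.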

Another effective optimization is the use of power gating, which puts various functional domains in the design into “sleep” mode when not required. This reduces the combinational logic power in the affected circuits to zero. It does require the use of state-retention registers to hold critical state node values during sleep mode. Early planning for power gating is becoming important, and EDA tools that support it help produce efficient power-management circuitry. The user defines a virtual power supply per library supply rail per power domain to support the specification of power on/off conditions. With an appropriate power-estimation tool, the user can perform multiple “what-if” analyses to determine the optimum power-gating scheme. Again, early power planning using an RTL estimation methodology achieves optimal results.

Users who address chip power consumption with a combination of these types of analysis and optimization consistently report overall power savings in the range of 30% to 65%.

Results
An RTL power-estimation methodology has achieved accurate results on a number of advanced chip designs, as shown in Table 1:

The mixed signal Wi-Fi device—a multi-million gate design—was developed by Airgo, a wireless connectivity company that invented the multiple-input, multiple-output (MIMO) technology for next-generation Wi-Fi. The design achieves IEEE 802.11 a/b/g compliance, integrating a 2x3 MIMO system, PHY and MAC layers, as well as analog-to-digital converters and digital-to-analog converters—all on one chip.

Cradle Technologies, a fabless semiconductor company that develops multi-core DSP (MDSP) for next-generation multimedia applications, designed the MDSP. The chip deploys 16 DSPs and eight general-purpose processors to encode 16 channels of MPEG4 SP@L3, 4 channels of MPEG4 ASP@L5, or 1 channel of H.264 Main Profile D1—all in real time.

Toshiba developed the 11 mobile wireless designs. In addition to achieving an average correlation within 15% of silicon, the methodology enabled a 10X to 30X leakage reduction across those designs.

Conclusion
Extensive comparative analyses of power estimation at RTL, gate, and silicon levels demonstrate that RTL power estimation is more than sufficiently accurate to estimate chip power consumption. And, it’s done at an early juncture in the design flow so that significant optimizations can be performed without jeopardizing the design schedule. Certainly, gate-level analysis is somewhat more accurate. However, the competitive advantage of understanding chip power-consumption characteristics—and specifically what causes them—early in the design flow is considerably greater than having a slightly better understanding late in the design flow. In addition, effective optimizations at the gate-level are very difficult and time-consuming due to the lack of micro-architectural visibility—the visibility that’s only offered by an RTL-based power-estimation methodology.

So, does RTL power estimation combine the advantages of both abstraction and accuracy? Yes, it does.