Analyze And Optimize Throughout The Flow

The increased use of portable, wireless, battery-powered electronic systems is driving the demand for integrated-circuit (IC) and system-on-a-chip (SoC) devices that consume the smallest possible amounts of power. With every new product generation, users demand smaller size, increased functionality, and longer battery life. In the case of a modern cell phone, for example, users expect advanced features. They think that the phone should have the ability to act as a personal organizer, play games, take and transmit pictures, connect to the Internet, and so forth. At the same time, they expect the device to weigh in at around 4 oz. or less. The device's battery is expected to last at least three hours when in use, or five or more days while in standby mode.

Whenever the industry moves from one technology node to another, existing design constraints are tightened. New constraints then emerge. In the case of today's increasingly complex and sophisticated devices, power, signal-integrity (SI), and timing constraints need to be imposed throughout the entire design flow. Such an approach should maximize the quality of results as well as the reliability of devices.

The problem is that power considerations, SI issues, and timing effects are related. These relationships become more significant with deep-submicron (DSM) implementation technologies. The combination of design power closure, circuit power integrity, signal integrity, and timing closure is a major drain on engineering resources. It impacts the device's total time to market.

To create an optimal low-power design, tradeoffs like timing versus power and area versus power must often be made at different stages of the design flow. At the same time, the timing and integrity of signals must be ensured. Engineers therefore need access to appropriate power, SI, and timing-analysis and optimization engines. Such engines need to be integrated with, and applied throughout, the entire RTL-to-GDSII flow.

Furthermore, the complex interrelationships between diverse effects must be handled. It is necessary to use an integrated design environment in which all of the power, SI, and timing tools are fully integrated with each other. They also must be integrated with the other analysis and implementation engines in the flow. For example, changing the size of a cell affects its associated currents (and power consumption). This, in turn, affects the voltage drops associated with that cell and its neighbors.

To fully account for the impact of voltage-drop effects, it is important to derate for timing on a cell-by-cell basis based on actual voltage drops. The timing-analysis engine should concurrently make use of this derated timing data to identify potential changes to the critical paths. In turn, the optimization engine should make appropriate cell sizing changes to address potential setup or hold problems, which appear as a result of the timing changes. This will affect the currents, which will then affect the voltage drops, and so on.
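The feedback loop described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Cell` class, the constants, the linear derating factor, and the "upsize the slowest cell" rule are all hypothetical stand-ins chosen for illustration, not the behavior of any real tool. What the sketch does show is the essential point: every sizing change alters the currents, so voltage drop and timing have to be re-analyzed together on each pass.

```python
from dataclasses import dataclass

# All constants below are illustrative placeholders, not process data.
R_SEG = 2.0         # resistance of each power-rail segment (ohm)
I_PER_SIZE = 0.002  # current drawn per unit of cell size (A)
K_DERATE = 2.0      # fractional delay increase per volt of supply drop


@dataclass
class Cell:
    name: str
    size: float           # relative drive strength (1.0 = minimum size)
    nominal_delay: float  # delay at size 1.0 and full supply (ns)


def voltage_drops(cells):
    """IR drop seen by each cell on a rail fed from the first cell's end."""
    drops, drop = [], 0.0
    for i in range(len(cells)):
        # Each rail segment carries the current of this cell and all later ones.
        downstream = sum(c.size * I_PER_SIZE for c in cells[i:])
        drop += R_SEG * downstream
        drops.append(drop)
    return drops


def derated_delays(cells):
    """Per-cell delay, derated by that cell's own voltage drop."""
    return [
        (c.nominal_delay / c.size) * (1.0 + K_DERATE * d)
        for c, d in zip(cells, voltage_drops(cells))
    ]


def close_timing(cells, target_ns, max_iters=20):
    """Upsize the slowest cell until the path meets target_ns or we give up."""
    for iteration in range(max_iters):
        delays = derated_delays(cells)  # re-analyze after every change
        if sum(delays) <= target_ns:
            return iteration, sum(delays)
        worst = max(range(len(cells)), key=lambda i: delays[i])
        cells[worst].size *= 1.25       # bigger cell: faster, but more current
    return max_iters, sum(derated_delays(cells))


if __name__ == "__main__":
    path = [Cell("u1", 1.0, 1.0), Cell("u2", 1.0, 1.0), Cell("u3", 1.0, 1.0)]
    print(close_timing(path, target_ns=2.8))
```

In this model, each upsizing step makes one cell faster but raises the drop seen by every cell on the rail, which is why the loop recomputes all delays rather than only the one it changed.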

Similarly, voltage drops can alter the noise margins and the susceptibility to crosstalk effects that are associated with a cell. These crosstalk effects impact cell performance in terms of functionality and timing. To fix timing, it may be necessary to resize the cells. Resizing, in turn, affects their power consumption and associated voltage-drop effects. These effects impact their noise margins and susceptibility to crosstalk, and so on. If any of these interrelationships are not addressed due to the lack of a concurrent, integrated design environment, the competitors will surely be first to market with lower-power designs.

The term quality of results (QoR) refers to the way in which engineers measure different aspects of a design compared to the original design goals. They assume that the higher the speed, the higher the quality; the smaller the die size, the higher the quality; the lower the power dissipation, the higher the quality, etc. This article shows how concurrent analysis and optimization deliver lower-power designs with higher QoR. An introduction to the key signal-integrity, power-dissipation, and power-distribution considerations follows.

Common signal-integrity effects
Early IC implementation technologies were cell-delay dominated. The delays associated with the logic elements far outweighed the delays associated with the interconnect. By comparison, today's DSM implementation technologies are interconnect-delay dominated. In terms of relative magnitude and significance, resistive and capacitive (RC) interconnect-delay effects that used to be third or fourth order are now first order. As a result, any changes in signal behavior can have a major effect on the design's quality.

Increased sidewall-capacitive coupling
Again, look to early IC implementation technologies. The aspect ratio of tracks was such that their width was significantly greater than their height (FIG. 1A). Feature sizes continue to shrink, however. The processes now used to create these devices result in track aspect ratios in which height predominates over width (FIG. 1B).

The outcome is a dramatic increase in the coupling capacitance (C_XCOUP) between the sidewalls of adjacent tracks relative to the substrate capacitances, C_AREA (track base to substrate) and C_FRINGE (sidewall to substrate). Also, today's devices are associated with high integration densities. They can support eight-plus metallization layers. Yet these integration densities result in significant capacitive coupling between adjacent layers, as represented by C_CROSSOVER (FIG. 2). The combination of these factors leads to a tremendous increase in the complexity of crosstalk-related noise (glitch) and timing effects.
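To get a feel for why the aspect ratio matters, the following sketch compares sidewall coupling to base-to-substrate capacitance using a crude parallel-plate approximation. That approximation is an assumption made here for illustration (it ignores C_FRINGE and C_CROSSOVER entirely), and the function name and dimensions are hypothetical. With the same dielectric on all sides, the permittivity cancels, so only the geometry remains.

```python
def coupling_to_area_ratio(width, height, spacing, dielectric_thickness):
    """Ratio C_XCOUP / C_AREA per unit track length (parallel-plate estimate).

    C_XCOUP ~ height / spacing              (sidewall facing the neighbor)
    C_AREA  ~ width / dielectric_thickness  (track base facing the substrate)
    All dimensions must use the same unit; the permittivity cancels out.
    """
    c_xcoup = height / spacing
    c_area = width / dielectric_thickness
    return c_xcoup / c_area


if __name__ == "__main__":
    # Hypothetical dimensions in micrometers, chosen only to show the trend.
    wide_flat = coupling_to_area_ratio(1.0, 0.5, 1.0, 0.5)    # width > height
    tall_narrow = coupling_to_area_ratio(0.2, 0.4, 0.2, 0.4)  # height > width
    print(wide_flat, tall_narrow)
```

Under these assumptions, the tall, narrow, closely spaced track has a much larger coupling-to-area ratio than the wide, flat one, which matches the trend the text describes.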

Crosstalk-induced glitches
When signals in neighboring wires transition between logic values, the coupling capacitance between the wires causes a transfer of charge. There can be significant crosstalk-induced glitches, depending on the slew of the signals (the speed of switching in terms of rise and fall times). These glitches also are impacted by the amount of mutual crosstalk capacitance or C_XTALK (Figure 3).

In this example, a transition on a fast aggressor net causes a glitch to be presented to the input of an adjacent victim net's load/receiver. Of course, this illustration presents a very simplistic view. In reality, each track may be formed from multiple segments that occupy multiple levels of metallization. The resistances (R_WIRE1 and R_WIRE2) and capacitances (C_WIRE1 and C_WIRE2) will consist of the multiple elements that are associated with the different segments.

The mutual coupling crosstalk capacitance (C_XTALK) also may consist of multiple elements. If segments of the aggressor and victim nets are neighbors on the same metallization layer, some of these elements will be equivalent to C_XCOUP. They will be equivalent to C_CROSSOVER if the aggressor and victim nets cross each other on adjacent layers.

The example glitch illustrated in Figure 3 represents only one of four generic possibilities. They are based on the fact that a rising or falling transition on the aggressor net may be coupled with a logic 0 or logic 1 on the victim net (FIG. 4). If the ensuing low-noise or high-noise glitches on the victim net cross the input-switching threshold of its load/receiver, a functional (logic) error may occur. In some cases, this error may manifest itself as an incorrect data value that is subsequently loaded into a register or latch. In other cases, the error may cause a latch to perform an unintended load, set, or reset.
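The four cases can be captured in a small lookup, shown here as a Python sketch. The function names are hypothetical. The peak estimate uses a simple capacitive-divider bound, which is an assumption added for illustration: it treats the victim as floating and therefore overstates the glitch on a net that is actively held by its driver.

```python
# (aggressor transition, victim logic level) -> glitch type
GLITCH_TYPES = {
    ("rise", 0): "low-noise",       # bump upward from logic 0
    ("fall", 0): "low-undershoot",  # dip below logic 0
    ("rise", 1): "high-overshoot",  # bump above logic 1
    ("fall", 1): "high-noise",      # dip downward from logic 1
}


def classify_glitch(aggressor_edge, victim_level):
    """Return which of the four generic glitch types applies."""
    return GLITCH_TYPES[(aggressor_edge, victim_level)]


def glitch_peak(vdd, c_xtalk, c_victim_to_ground):
    """Upper-bound glitch height from a capacitive divider (floating victim)."""
    return vdd * c_xtalk / (c_xtalk + c_victim_to_ground)


def may_cause_logic_error(kind, peak, vdd, threshold_fraction=0.5):
    """True if the glitch can cross the receiver's input-switching threshold."""
    threshold = threshold_fraction * vdd
    if kind == "low-noise":
        return peak > threshold        # rises from 0 V past the threshold
    if kind == "high-noise":
        return vdd - peak < threshold  # falls from VDD past the threshold
    return False  # overshoot/undershoot: a reliability concern, not a logic one


if __name__ == "__main__":
    kind = classify_glitch("rise", 0)
    peak = glitch_peak(vdd=1.2, c_xtalk=3e-15, c_victim_to_ground=1e-15)
    print(kind, peak, may_cause_logic_error(kind, peak, vdd=1.2))
```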

The victim net's low-undershoot and high-overshoot glitches pose a different problem. They can cause undesirable charge carriers to be trapped in the transistors that form the logic gates. The result is degraded circuit performance. These effects, which are commonly known as hot-electron effects, are not a major threat in the context of current IC implementation technologies. They will become increasingly significant, however, as device geometries progress further into the DSM realm.

Crosstalk-induced timing errors
The situation becomes even more complex when simultaneous switching occurs on both the aggressor and victim nets. For example, in the case of opposing transitions, the signal on the victim net may be slowed down (FIG. 5). If that signal were transitioning in isolation, it would take a certain amount of time to reach its load/receiver's switching threshold. (For the purposes of these discussions, that threshold is assumed to be 50% of the value between logic 0 and logic 1.)

In this example, however, the glitch caused by a simultaneous transition on the aggressor net holds the victim's signal above its load/receiver's switching threshold for an additional amount of time. The signal therefore arrives later than expected, which may ultimately result in a downstream setup violation.

An alternative scenario occurs when a transition on the victim is complemented by a simultaneous transition on the aggressor in the same direction. In this case, the signal on the victim may speed up (FIG. 6). The glitch caused by a simultaneous transition on the aggressor net causes the victim's signal to cross the load/receiver's switching threshold earlier than expected. The result may be a downstream hold violation. (The examples shown here are somewhat simplistic. In real-world designs, multiple aggressors may affect each victim net. The accurate analysis of today's DSM designs requires each aggressor's contribution to be individually accounted for and analyzed.)
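The two timing scenarios reduce to a simple rule, sketched below in Python. The function names and the slack values are hypothetical, and the single `delta` term is a simplifying assumption that lumps the crosstalk-induced delay change into one number. A slow-down makes data arrive later, which eats into setup slack; a speed-up makes data arrive earlier, which eats into hold slack.

```python
def crosstalk_timing_effect(victim_edge, aggressor_edge):
    """Same-direction switching speeds the victim up; opposing slows it down."""
    for edge in (victim_edge, aggressor_edge):
        if edge not in ("rise", "fall"):
            raise ValueError(f"unknown edge: {edge!r}")
    return "speed-up" if victim_edge == aggressor_edge else "slow-down"


def check_path(setup_slack, hold_slack, delta, effect):
    """Apply a crosstalk delay change of magnitude delta to both slacks."""
    if effect == "slow-down":
        setup_slack -= delta  # later arrival: less time before the capture edge
        hold_slack += delta
    else:
        setup_slack += delta
        hold_slack -= delta   # earlier arrival: data may change too soon
    return {
        "setup_violation": setup_slack < 0,
        "hold_violation": hold_slack < 0,
    }


if __name__ == "__main__":
    effect = crosstalk_timing_effect("fall", "rise")
    print(effect, check_path(0.05, 0.20, delta=0.10, effect=effect))
```

With multiple aggressors, a real analysis would accumulate each aggressor's contribution separately, as the text notes; this sketch handles only one.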

The following discussions focus on power dissipation. They assume the use of complementary-metal-oxide-semiconductor devices, as CMOS is currently the most prevalent digital IC implementation technology.

DYNAMIC POWER
Dynamic power consumption occurs in logic gates that are transitioning from one state to another. During the act of switching, any of the internal capacitances associated with the gate's transistors have to be charged. As a result, they consume power. More importantly, the gate also has to charge any external (load) capacitances. These capacitances consist of the parasitic wire capacitances and the input capacitances that are associated with any downstream logic gates.

For the purposes of this article, the amount of dynamic power consumption may be represented using the following equation:

Dynamic Power ~ αf × C × V²

Where: αf = the amount of activity as a function of the clock frequency (f); C = the amount of capacitance being driven/switched; and V² = the square of the supply voltage.

This equation shows that the dynamic power consumption may be reduced by minimizing the circuit activity, reducing the capacitance being driven, and/or reducing the supply voltage. The amount of switching activity may be reduced in a number of ways. The most obvious solution is to reduce the frequency of the system clock. Yet this approach will have a corresponding impact on device performance. Another technique is clock gating. Here, the distribution of the clock is restricted to only those portions of the device that actually perform useful tasks at that time. Local data activity (glitches and hazards) also can be reduced by applying appropriate delay balancing.
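As a quick numerical illustration of the relationship above, the sketch below evaluates the proportionality with a constant of 1. The function name and all input values are hypothetical, so the absolute result means little; the ratios between cases are the point.

```python
def dynamic_power(activity, freq_hz, cap_farads, vdd):
    """Dynamic power ~ (activity * f) * C * V^2, proportionality constant = 1."""
    return activity * freq_hz * cap_farads * vdd ** 2


if __name__ == "__main__":
    # Hypothetical block: 20% activity, 500 MHz, 1 nF switched, 1.2 V supply.
    base = dynamic_power(0.2, 500e6, 1e-9, 1.2)

    half_activity = dynamic_power(0.1, 500e6, 1e-9, 1.2)  # e.g. clock gating
    half_cap = dynamic_power(0.2, 500e6, 0.5e-9, 1.2)     # e.g. shorter wires
    half_vdd = dynamic_power(0.2, 500e6, 1e-9, 0.6)       # lower supply

    for label, p in [("activity/2", half_activity),
                     ("capacitance/2", half_cap),
                     ("voltage/2", half_vdd)]:
        print(f"{label}: {p / base:.2f} of baseline")
```

Halving the activity or the capacitance halves the power, while halving the supply voltage cuts it to a quarter because of the squared term.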

If the engineer opts to reduce the amount of capacitance, one approach is to downsize the gates that are driving over-driven wires. The capacitance values associated with these gates will then be lowered. It also is possible to use a power-aware placement algorithm. Such an algorithm minimizes the length of critical wires, thereby reducing their associated parasitic capacitances. Ideally, such power-aware placement should be weighted by the amount of switching activity that is associated with each wire. It also is possible to exploit technology options, such as using low-resistance/low-capacitance copper (Cu) tracks and low-k dielectric (insulating) materials.

The last option for cutting dynamic power consumption is to reduce the supply voltage. Keep in mind that dynamic power is a function of the square of the supply voltage. As a result, lowering the supply voltage has a dramatic effect with regard to reducing a logic gate's power consumption. The act of lowering the supply voltage to a logic gate, however, also significantly reduces that gate's switching speed. One option is to have different areas of the chip running at different voltages. In this case, performance-critical functions would be located in a higher-voltage domain. Non-critical functions would be allocated to a lower-voltage domain.

Tradeoffs also can be made between functional parallelism and frequency and/or voltage during the early (algorithmic and architectural) design stages. For example, a block of logic running at frequency "f" with supply voltage "V" may be replaced with two copies of that block. Each copy then performs half of the task while running at a lower frequency and/or using a lower voltage. Using this technique, the total power consumption of the function may be reduced. Yet the performance is only maintained at the expense of using more silicon real estate.
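The parallelism tradeoff can be checked with the same dynamic-power proportionality. In the sketch below, the function names and the 0.7 voltage-scaling figure are hypothetical, and the model assumes that the lower clock frequency is what makes the lower voltage possible. It also ignores the extra logic needed to split and merge the work, which a real design would have to count.

```python
def dynamic_power(activity, freq_hz, cap_farads, vdd):
    """Dynamic power ~ (activity * f) * C * V^2, proportionality constant = 1."""
    return activity * freq_hz * cap_farads * vdd ** 2


def parallel_power_ratio(copies, voltage_scale):
    """Power of `copies` blocks at f/copies and scaled V, relative to one block."""
    base = dynamic_power(1.0, 1.0, 1.0, 1.0)
    split = copies * dynamic_power(1.0, 1.0 / copies, 1.0, voltage_scale)
    return split / base


if __name__ == "__main__":
    # Two copies at half frequency: no saving unless the voltage also drops.
    print(parallel_power_ratio(2, 1.0))
    # Two copies at half frequency and a hypothetical 0.7x supply voltage.
    print(parallel_power_ratio(2, 0.7))
```

Throughput is unchanged because the two copies together still complete the same work per unit time, but the silicon area for the function roughly doubles.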

STATIC POWER
Static power consumption is associated with the logic gates that are not currently in the process of switching between states. Theoretically, such gates should not be drawing any power at all. In reality, however, some amount of leakage current is always passing through the transistors.

An individual transistor's static power consumption is extremely small. Yet the total effect becomes significant with today's integrated circuits, which can contain tens of millions of logic gates. Furthermore, transistors shrink in size when the industry moves from one technology node to another. The level of doping then has to be increased, which causes the leakage currents to become relatively larger. It follows that even if a large portion of the device is totally inactive, it may still be consuming a significant amount of power.

When it comes to addressing static power consumption, one equation needs to be considered first. This equation describes the leakage that is associated with the transistors:

Leakage ~ exp(−qV_t / kT)

As the chip heats up, its static power consumption increases exponentially as a function of its temperature (T). In addition, the static power consumption has an exponential dependency on the switching threshold of the transistors (V_t). To address low-power designs, IC foundries now offer multi-threshold CMOS (MTCMOS) technologies. These technologies enable multiple V_t libraries. The logic gates formed from low-threshold transistors will switch quickly. Yet they also will have higher leakage and consume more power. The gates formed from high-threshold transistors will have lower leakage and consume less power. They will switch more slowly, however.

Next, consider the equation that describes how the delay (switching time) associated with a transistor is affected by its switching threshold (V_t) and supply voltage (V_DD):

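Delay ~ V_DD / (V_DD − V_t)^α

Here, α is a technology-dependent exponent. It is unrelated to the activity term used in the dynamic-power equation.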

Lowering the supply voltage reduces the heat being generated. This, in turn, lowers the static power consumption. Lowering the supply voltage also increases gate delays, however. By comparison, lowering the transistors' switching thresholds speeds them up. Yet it also exponentially increases their leakage and thus their static power consumption.
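The opposing pulls on leakage and delay can be seen by evaluating both relationships side by side. The sketch below assumes that the delay relation takes the alpha-power form, Delay ~ V_DD / (V_DD − V_t)^α, and uses α = 1.3 purely as a placeholder. The function names and operating points are hypothetical. Both functions return relative values only, because the proportionality constants are not given in the text.

```python
import math

Q = 1.602176634e-19  # elementary charge (C)
K = 1.380649e-23     # Boltzmann constant (J/K)


def relative_leakage(vt, temp_kelvin):
    """Leakage ~ exp(-q * V_t / (k * T)); relative units only."""
    return math.exp(-Q * vt / (K * temp_kelvin))


def relative_delay(vdd, vt, alpha=1.3):
    """Delay ~ V_DD / (V_DD - V_t)^alpha; alpha = 1.3 is a placeholder."""
    if vdd <= vt:
        raise ValueError("supply must exceed the threshold voltage")
    return vdd / (vdd - vt) ** alpha


if __name__ == "__main__":
    # Hypothetical operating points: (V_DD, V_t, temperature in kelvin).
    for vdd, vt, temp in [(1.0, 0.3, 300.0), (1.0, 0.4, 300.0),
                          (0.8, 0.3, 300.0), (1.0, 0.3, 350.0)]:
        print(f"VDD={vdd} Vt={vt} T={temp}: "
              f"delay={relative_delay(vdd, vt):.3f} "
              f"leakage={relative_leakage(vt, temp):.3e}")
```

Raising V_t lowers leakage but lengthens delay, lowering V_DD lengthens delay, and raising the temperature increases leakage, which is the set of tradeoffs the paragraph describes.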

One solution is to only use low V_t transistors on timing-critical paths. On non-critical paths, high V_t transistors should be employed. This option may be combined with the use of multiple voltage domains.
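One simple way to picture this assignment is a greedy pass, sketched below in Python. This is a hypothetical illustration, not the method any particular tool uses: the function name, the cell names, the delay figures, and the single-period timing check are all assumptions. Every cell starts as low V_t (fast, leaky). Each is then tentatively swapped to high V_t, and the swap is kept only if every path still meets the clock period.

```python
def assign_vt(cells, paths, period):
    """Greedy multi-V_t assignment.

    cells:  name -> (delay_with_low_vt, delay_with_high_vt)
    paths:  list of paths, each a list of cell names
    period: maximum allowed path delay
    Returns name -> "low" or "high".
    """
    vt = {name: "low" for name in cells}

    def path_delay(path):
        return sum(cells[n][0] if vt[n] == "low" else cells[n][1] for n in path)

    for name in cells:
        vt[name] = "high"                       # try the low-leakage option
        if any(path_delay(p) > period for p in paths):
            vt[name] = "low"                    # timing broke: revert
    return vt


if __name__ == "__main__":
    # Hypothetical delays in ns: (low-V_t delay, high-V_t delay).
    cells = {"a": (1.0, 1.4), "b": (1.0, 1.4), "c": (1.0, 1.4), "d": (1.0, 1.4)}
    paths = [["a", "b", "c"], ["d"]]  # one near-critical path, one short path
    print(assign_vt(cells, paths, period=3.5))
```

The result depends on the order in which cells are visited, so a production flow would more likely rank candidates by leakage savings or slack. The cell on the short path always ends up as high V_t, while most cells on the near-critical path stay low V_t.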

Another technique is to power-down selected blocks whenever those device portions are not required. Switching entire blocks on and off can cause significant current surges, though. The use of additional circuitry may then be needed to provide a "soft" (staged) power on/off for these blocks.

POWER DISTRIBUTION
When it comes to power distribution, the first problem is to get the power from the outside world through the device's package and to the silicon chip itself. The length of the wires used to distribute power throughout the chip determines both their resistances and the associated voltage drops. If the wires are longer, their resistances and associated voltage drops also will be greater. It follows that in the case of today's extremely large and complex designs, the traditional packaging technologies based on peripheral power pads are no longer an acceptable option.

Flip-chip-based packaging is a solution, however. The pads located across the die's face are used to deliver power from the external power supply directly to the chip's internal areas. Clearly, this approach is able to support many more power and ground pads. It also minimizes the distance that the power has to travel to reach the internal logic. Furthermore, the inductance of the solder bumps used in flip-chip packages is significantly lower than that of the bonding wires utilized with traditional packaging techniques.

When dealing with packaging, temperature and performance considerations also must be taken into account. Power consumption—both static and dynamic—increases a device's operating temperature. Engineers may therefore need to employ expensive device packaging and external cooling technology.

To accommodate variations in operating temperature and supply voltage, designers have traditionally been obliged to pad device characteristics and design margins. It is inefficient, however, to create a device's power network using excessively conservative design practices. This method consumes valuable silicon real estate, increases congestion, and results in performance that is significantly below the silicon's full potential.

Another consideration is the on-chip temperature gradient. It is defined as the difference in temperatures at different portions of the device. These differences are caused by unbalanced power consumption. The on-chip temperature gradient can produce mechanical stress that may degrade the device's reliability.

VOLTAGE-DROP EFFECTS
Deep-submicron devices also are prone to voltage-drop effects. The resistance that is associated with the network of wires causes these effects. That network is used to distribute power and ground from the external pins to the internal circuitry. In the case of DC-related voltage drops, these effects are often referred to as IR drop effects.

For a simple example, consider a chain of inverter gates that is connected to the same power and ground tracks. Every power and ground-track segment has a small amount of resistance associated with it. The logic gate that is closest to the IC's primary power or ground pins is therefore presented with the optimal supply. The next gate in the chain will be presented with a slightly degraded supply, and so on down the chain.
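The inverter-chain example can be worked numerically with Ohm's law. In the Python sketch below, the function name, the segment resistance, and the gate currents are hypothetical, and only the power rail is modeled. The ground rail would contribute a similar rise at each gate, which this simplification leaves out.

```python
def supply_along_chain(vdd, seg_resistance, gate_currents):
    """Supply voltage seen by each gate on a rail fed from one end.

    Each rail segment carries the current of its own gate plus every gate
    farther down the chain, so the drop accumulates segment by segment.
    """
    voltages = []
    v = vdd
    remaining = sum(gate_currents)
    for current in gate_currents:
        v -= seg_resistance * remaining  # IR drop across this segment
        voltages.append(v)
        remaining -= current             # this gate's current leaves the rail
    return voltages


if __name__ == "__main__":
    # Hypothetical values: 1.0 V supply, 0.5 ohm per segment, 10 mA per gate.
    for i, v in enumerate(supply_along_chain(1.0, 0.5, [0.01] * 4), start=1):
        print(f"gate {i}: {v:.3f} V")
```

Each successive gate sees a lower supply than the one before it, with the largest single step occurring nearest the pin, where the segment carries the current of the whole chain.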

This problem is exacerbated in the case of transient or AC voltage-drop effects. Such effects occur when gates are switching from one value to another. They also take place when entire blocks are switched on and off. This scenario is considered even worse, as it causes transitory power surges. Those surges momentarily reduce the voltage supply to gates farther down the power-supply chain.

Voltage-drop effects matter because the input-to-output delay across a logic gate increases as the voltage supplied to that gate is reduced. Such a reduction can cause the gate to miss its timing specifications. An increase in interconnect delays and a susceptibility to crosstalk effects also are associated with the wires driven by underpowered gates. Furthermore, a gate's input switching thresholds are modified when its supply is reduced. The gate then is more susceptible to noise.

Voltage-drop effects are gaining significance. The resistivity of the power and ground tracks rises as a function of decreasing feature sizes or track widths. These effects can be minimized by increasing the width of the power and ground tracks. Yet this approach consumes valuable silicon real estate, thereby causing routing congestion problems. To solve these problems, the logic functions have to be spaced farther apart. This solution increases delays (and power consumption) due to longer signal tracks. Thus, implementing an optimal power network requires the balancing of many diverse factors.

ELECTROMIGRATION EFFECTS
When the current density (the current per cross-sectional area) in the tracks is too high, electromigration occurs. In the case of power and ground tracks, electromigration effects are DC-based. The so-called "electron wind," which is induced by the current flowing through a track, causes metal ions in the track to migrate. This migration creates "voids" in the "upwind" direction. Meanwhile, metal ions can accumulate "downwind" to form features called "hillocks" and "whiskers."
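A basic electromigration check compares the current density in a track against a limit. The Python sketch below is illustrative only: the function names are hypothetical, and the limit of 1e10 A/m² is a placeholder, since real limits depend on the metal, the temperature, and the foundry's rules.

```python
def current_density(current_a, width_m, thickness_m):
    """Current per cross-sectional area of a rectangular track (A/m^2)."""
    return current_a / (width_m * thickness_m)


def em_violation(current_a, width_m, thickness_m, j_max):
    """True if the track's current density exceeds the allowed limit."""
    return current_density(current_a, width_m, thickness_m) > j_max


if __name__ == "__main__":
    J_MAX = 1e10  # placeholder limit in A/m^2, not a foundry figure
    # Hypothetical track: 1 mA through a 0.2 um wide, 0.3 um thick wire.
    print(em_violation(1e-3, 0.2e-6, 0.3e-6, J_MAX))
    # Doubling the width halves the current density for the same current.
    print(em_violation(1e-3, 0.4e-6, 0.3e-6, J_MAX))
```

Widening a track lowers its current density for the same current, at the cost of the routing area discussed in the voltage-drop section.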

In power and ground tracks, electromigration causes timing problems. The increased track resistance that is associated with a void can result in a corresponding voltage drop. In affected logic gates, this drop will produce increased delays along with increased noise and crosstalk susceptibility.

Power and ground electromigration also may cause major functional errors. The voids may eventually lead to open circuits. At the same time, the hillocks and whiskers may trigger short circuits to neighboring wires.

The majority of today's design environments concentrate on analyzing and addressing power considerations toward the back end of the design process. By that point, it is almost impossible to fix any problems that were caused by poor decisions made during the early design stages. It follows that a key requirement for a true low-power, high-quality-of-results design environment is to provide concurrent analysis. Using whatever data is available at the time, the environment should analyze effects like the impact of voltage drop on timing and signal integrity. Then, it should successively optimize the design as more accurate data becomes available. Such an approach would allow potential timing, power, and SI problems to be identified and resolved as soon as possible.

Creating optimal low-power designs involves making tradeoffs at different stages of the design flow. Such tradeoffs include timing-versus-power and area-versus-power. To enable designers to accurately and efficiently perform these tradeoffs, low-power and SI optimization techniques must be integrated with and applied throughout the entire RTL-to-GDSII flow.

A number of very sophisticated power, SI, and timing-analysis tools are available to designers. These tools are typically provided as third-party point solutions, however. They are not tightly integrated into the main design environment. As such, they require the use of multiple databases. Or, such tools combine disparate data models into one database. The design environments based on these tools have to perform internal or external data translations and file transfers. Data management is therefore cumbersome, time-consuming, and prone to error.

Correlating results from different point tools can be difficult. Problems may be discovered late in the design cycle. Or they may never be detected at all. Perhaps the most significant problem with existing design environments, however, is that power, timing, and SI effects are strongly interrelated in the nanometer domain. Yet the conventional point-solution design tools cannot consider all of these effects and their interrelationships concurrently.

To fully account for the impact of voltage-drop effects, for example, one needs an environment that can derate for timing on a cell-by-cell basis and based on actual voltage drops. The timing-analysis engine should then make use of this derated timing data to identify potential changes to the critical paths. In turn, the optimization engine should make modifications that address any potential setup or hold problems, which appear as a result of the timing changes.

In such a design environment, the power analysis, voltage-drop analysis, derating calculations, timing analysis, and optimization engines all work together seamlessly. If this type of integrated environment does not exist, one would have to transfer large amounts of data, such as SDF files, between the different point tools. The engineer would then have to iterate between them to address the timing and signal-integrity problems caused by voltage-drop-induced delays.

This lack of integration between the analysis and optimization engines can result in numerous time-consuming design iterations. For example, say the results from the power analysis are used to locate and isolate timing and/or signal-integrity problems. The act of fixing these kinds of problems may introduce new problems into the power network. Solving those may, in turn, create new timing and/or signal-integrity problems.

Using point-solution power-analysis tools can result in non-convergent solutions. Such solutions prevent designs from meeting their time-to-market windows or from being realized at all. In contrast, a true low-power, high-QoR design environment should have all of the power-analysis tools operating concurrently with the implementation tools. These tools include synthesis, place-and-route, clock-tree, extraction, timing, and signal-integrity analysis. Furthermore, all of the environment's tools should operate on a common data model. They will have concurrent access to analysis data, thereby making it possible to perform "on-the-fly" changes to the design.

To address the power, timing, and SI problems that are associated with today's DSM devices, a process must have design, analysis, and optimization tools. These tools must work throughout the entire RTL-to-GDSII design flow. After all, identifying and resolving problems late in the flow requires expensive, time-consuming iterations. Overly conservative analysis and design is not a viable solution either. What is required is the ability to identify and resolve problems throughout the flow. Also, it must be possible to "forget" issues once they have been addressed and made "safe."

To handle the complex interrelationships between diverse effects, it is necessary for the power, timing, and SI engines to be fully integrated with each other. They also must be integrated with the other analysis and implementation engines in the flow, including synthesis, place-and-route, voltage-drop derating, and optimization. All of the implementation and analysis engines must therefore have concurrent access to the design data via a common data model. Any changes made by one tool must be immediately tested and validated by the others. This approach results in a convergent algorithm that quickly determines optimal solutions. It also provides better QoR without resorting to time-consuming iterations.