As IC design complexity and sensitivity rise and operating voltages fall, the use of constant (average) chip temperature models can lead to over-design or multiple design spins. The variation in temperature across a chip, as well as its impact on power consumption, reliability, and timing, should be considered throughout the design flow. The next important breakthrough in design integrity is to understand local temperature variations and their impact on the overall design.
Due to the exponential relationship between sub-threshold leakage power and temperature, any prediction of leakage power should include an analysis of temperature. The relationship feeds on itself -- increased leakage creates an increase in temperature, which in turn creates increased leakage. This scenario can potentially end in thermal runaway unless the package can dissipate the extra self-heating due to leakage power. Another impact of rising leakage power is an increase in IR drop on power-supply structures, creating the potential for performance impact on the design's active circuitry.
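The feedback loop described above can be sketched as a simple fixed-point iteration: leakage raises junction temperature through the package's thermal resistance, and the higher temperature raises leakage again. Every constant in this sketch (the leakage-doubling interval, the theta-JA values, the runaway cutoff) is hypothetical and chosen only to illustrate the behavior; a real flow requires characterized leakage models and package data.

```python
# Illustrative sketch of the leakage-temperature feedback loop.
# All constants are hypothetical; real designs need characterized
# leakage models and measured package thermal resistance.

def settle_temperature(p_dynamic_w, theta_ja_c_per_w,
                       t_ambient_c=25.0, p_leak0_w=0.5,
                       t_ref_c=25.0, leak_doubling_c=20.0,
                       max_iters=200, tol=1e-6):
    """Fixed-point iteration: leakage raises temperature, which raises
    leakage. Returns (temperature_c, leakage_w) if the loop settles,
    or None to signal thermal runaway."""
    t = t_ambient_c
    for _ in range(max_iters):
        # Sub-threshold leakage grows roughly exponentially with T;
        # here it doubles every `leak_doubling_c` degrees (hypothetical).
        p_leak = p_leak0_w * 2.0 ** ((t - t_ref_c) / leak_doubling_c)
        t_new = t_ambient_c + theta_ja_c_per_w * (p_dynamic_w + p_leak)
        if abs(t_new - t) < tol:
            return t_new, p_leak
        if t_new > 300.0:   # far beyond any survivable junction temperature
            return None     # runaway for this chip/package pairing
        t = t_new
    return None

# A package with low thermal resistance settles to a stable point...
print(settle_temperature(p_dynamic_w=2.0, theta_ja_c_per_w=10.0))
# ...while a poorer package with the same chip does not.
print(settle_temperature(p_dynamic_w=2.0, theta_ja_c_per_w=40.0))
```

Note that the same die converges or runs away purely as a function of the package's ability to dissipate the extra self-heating, which is exactly the distinction the paragraph above draws.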
Designing with worst-case assumptions produces a large over-estimate of the voltage-drop constraint -- large because leakage grows exponentially with temperature -- and results in an overly pessimistic design. Worst-case assumptions have yet another huge drawback: worst-case means a worst-case constant temperature. A constant-temperature model actually hides the worst-case timing conditions, because race conditions in timing are affected more by temperature differences between signal paths than by any constant temperature applied across them.
To thoroughly analyze electromigration and reliability, designers should include an analysis of self-heating in the wires to determine the localized temperatures of power, high-drive signal, and clock nets. The local temperature of the wire would then be used to determine electromigration margins, because these localized effects can lead to a decrease in mean time to failure (MTTF). In addition, the increase in resistance due to elevated wire temperatures on these nets contributes to IR drop and, hence, affects timing.
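Both effects can be quantified with standard textbook models: Black's equation for electromigration MTTF and a linear temperature coefficient for metal resistance. The specific constants below (activation energy, current-density exponent, copper's alpha) are typical published values, not characterized process data, and the temperatures are hypothetical:

```python
# Sketch of why local wire temperature matters for electromigration
# margins and IR drop. Black's equation and the linear resistance
# model are standard; the constants are typical textbook values.
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def mttf_black(current_density, temp_c, a_const=1.0, n=2.0, ea_ev=0.9):
    """Black's equation: MTTF = A * J^-n * exp(Ea / (k * T))."""
    temp_k = temp_c + 273.15
    return a_const * current_density ** -n * math.exp(
        ea_ev / (K_BOLTZMANN_EV * temp_k))

def wire_resistance(r0_ohm, temp_c, t_ref_c=25.0, alpha=0.0039):
    """Linear temperature dependence of copper resistance."""
    return r0_ohm * (1.0 + alpha * (temp_c - t_ref_c))

# Same current density, two temperature assumptions:
j = 1e6  # A/cm^2, hypothetical
avg = mttf_black(j, 85.0)   # constant average-temperature model
hot = mttf_black(j, 110.0)  # local self-heated wire temperature
print(f"MTTF penalty from 25 C of self-heating: {avg / hot:.1f}x")

# The hotter wire is also more resistive, worsening IR drop:
print(f"{wire_resistance(1.0, 110.0):.3f} ohm "
      f"vs {wire_resistance(1.0, 85.0):.3f} ohm at average temperature")
```

Because temperature appears in an exponential in Black's equation, a modest amount of localized self-heating translates into a multiple-fold MTTF reduction, which is why per-wire temperatures rather than a chip average are needed for honest electromigration margins.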
To thoroughly understand the performance of an IC design, designers must consider the impact of power variations on IR drop. These power variations create significant temperature gradients, so timing analysis should account for the impact of localized temperature on cell and interconnect delays.
Noise margins can also be impacted by temperature variation on the cells and interconnect of potential crosstalk victims and aggressors. Failure to analyze the impact of temperature variation on final timing could lead to design respins to correct timing errors, delaying design closure.
Thermal problems on ICs can often appear to be correctable with electrical fixes. Without a clear understanding of thermal variation across a chip, excessive power consumption may appear to be due to poor leakage characterization and/or improper power estimation. Such excessive power consumption might be addressed by decreasing overall power consumption through existing circuit design methods or by moving to a more expensive package. However, this approach overly worst-cases leakage power, and even with an improved package, problems due to temperature gradients can still cause a chip failure.
Timing issues that surface as signal-integrity problems, such as crosstalk, may be caused or exacerbated by temperature variations across the chip. Timing problems might be repaired by changing the interconnect parameters or upsizing a driver. But costly and time-consuming repairs made after a chip is manufactured can be avoided with a more thorough analysis and understanding of how and where temperature might impact design integrity.
The lack of a thermal analysis solution has created problems in current design methodologies. Multiple respins performed to fix temperature-related problems can add months to a design cycle, increasing design costs. If a more expensive package is required to fix the problem, the added cost may be equal to or greater than the profit margin for the packaged part.
Parametric failures due to temperature gradients don't only occur in digital design; analog designs can be victimized as well. With the move toward SoC designs, temperature gradients are often intractable due to circuit complexity. Failures in bandgap reference circuits and other temperature-gradient-sensitive circuits, such as ADCs, DACs, and comparators, have been the culprit in several instances. As mask costs rise for such SoC designs, standard methods involving manual estimates based on test chip measurements often result in costly respins of silicon. Test-chip temperature measurements require strategic placement of the temperature sensors to detect the temperature profile. Without a priori knowledge of temperature variation on the chip, placement of the temperature sensors can be misguided and may not provide an accurate assessment of the temperature gradients.
For maximum benefit, then, thermal analysis should be utilized early in the implementation stage. Defining the chip's thermal signature begins in the placement stage in prototyping or physical synthesis. At this point, the design's active elements are positioned on the die and define the initial input into thermal analysis. With a good power methodology, designers may expect the highest-power locations to correspond directly to the highest temperatures on the die. However, many factors can invalidate this assumption -- package characteristics, die-to-package attachment materials, bonding/bump connections, and heat dissipation through materials on the die.
Designers should do thermal analysis at the physical synthesis stage to determine the temperature magnitudes and variance across the die. At this early stage, placement can be optimized, with temperature taken into account to avoid time-consuming modifications later in the design flow. Thermal analysis can be utilized in each step of the design flow, from power distribution to clock distribution, signal routing, and signoff, to ensure that the impact of temperature on design integrity is considered.
Thermal analysis is the next level of analysis required to ensure design integrity for temperature-sensitive designs at all process nodes, especially for complex designs at 90 nm and for all 65-nm design flows. Understanding the temperature gradient on a chip will give designers insight into the actual performance, reliability, and power consumption of their designs.
Rajit Chandra is the founder, president, and CEO of Gradient Design Automation. Before starting Gradient, Chandra co-founded Moscape Inc., which specialized in software solutions for signal integrity in chip designs. Moscape was acquired by Magma in August 2000, and Chandra held the title of VP of technology at Magma following the acquisition. He left Magma in 2002 to pursue his interest in the challenges of nanometer-scale semiconductor design. Chandra holds engineering degrees from the University of Calcutta, India, and a PhD (EE) from London South Bank University, UK. He can be reached at [email protected]