Recently, one industry analyst noted that server farms in Seattle dissipate as much power as the rest of the Seattle metro area, including the Boeing plants in Everett. Moreover, 25% of their total operating cost goes to power and air conditioning. The notion of electronics as a low-cost, environmentally friendly industry is being challenged by the reality that this equipment may be one of the drivers of the rolling West Coast blackouts. All of this power consumption is transformed into heat, forcing designers to address thermal issues on top of the other difficulties of creating new designs for ICs and equipment.
Power densities for other parts of our environment run as follows: A typical house uses from 0.1 to 1 W/ft2, with a peak power density of about 10 W/ft2. An office consumes from 5 to 10 W/ft2, with a peak of about 25 W/ft2. The above-mentioned server farm falls in the range of 1 to 10 kW/ft2. All of these figures assume a constant power load, with peaks only when restarting equipment. These power densities will continue to increase in all areas, raising thermal-management concerns from the end user to the designer as consumers demand solutions to their heat problems.
The power dissipated in a 2- by 2.5-ft rack can be 10 kW or higher. That's the heat equivalent of 100 100-W light bulbs packed into a space less than half the size of a typical home refrigerator. Using existing methods, providing sufficient cooling for one such rack isn't particularly difficult, but cooling hundreds of these racks in a densely packed area requires a whole different approach (Fig. 1).
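A quick back-of-the-envelope check ties these rack numbers to the power densities quoted earlier. The figures below come straight from the article (a 2- by 2.5-ft rack dissipating 10 kW); only the arithmetic is added:

```python
# Sanity-check the rack figures from the article.
rack_area_ft2 = 2 * 2.5          # footprint: 2 by 2.5 ft
rack_power_w = 10_000            # dissipation: 10 kW

power_density = rack_power_w / rack_area_ft2   # W/ft^2
bulb_equivalent = rack_power_w / 100           # number of 100-W bulbs

print(power_density)     # 2000.0 W/ft^2, i.e. 2 kW/ft^2
print(bulb_equivalent)   # 100.0
```

At 2 kW/ft2, a single rack footprint already sits squarely inside the 1- to 10-kW/ft2 server-farm range cited above.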
Beyond the heating and cooling problems, thermal management must contend with classical heat issues. The reliability of ICs and systems is inversely related to temperature: the higher the operating temperature, the lower the reliability. As electronic equipment heats up, the changing operating conditions alter component behavior, affecting timing, noise performance, and, in analog sections, parameter drift.
System-Level Concerns: Because systems use multiple ICs, total system power is becoming a major problem. Systems are shrinking while packing more highly integrated chips onto each board. At the same time, the environment (noise, external ambient temperature, etc.) is growing more restrictive. These drivers increase the total power density within the board and the system.
Smaller sizes make adding extra cooling more difficult. Alternatives for shedding the heat can become very complex. Also, the costs of acquisition and ownership limit the acceptable options for removing or reducing the heat.
Heat transfers from its source to the ambient through radiation, convection, and conduction. Radiation is fairly limited and hard to enhance because it depends on surface properties, the theoretical ideal being a blackbody. Convection and conduction, on the other hand, are easier to improve. Forced-air cooling and heatsinks enhance local convective cooling. Conduction occurs via the package leads and body, so pc-board construction and heat pipes make good thermal enhancements.
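These conduction and convection paths are commonly modeled as a series of thermal resistances (in degC/W) between the junction and ambient. The sketch below illustrates the idea; the 15-W load, 40 degC ambient, and the individual resistance values are hypothetical numbers, not from the article:

```python
def junction_temp(power_w, t_ambient_c, resistances_c_per_w):
    """Series thermal-resistance model: each resistance (degC/W) adds
    power * R of temperature rise between ambient and the junction."""
    return t_ambient_c + power_w * sum(resistances_c_per_w)

# Hypothetical numbers: a 15-W device in 40 degC ambient, with
# junction-to-case 0.8, case-to-heatsink 0.3, and heatsink-to-air
# 1.5 degC/W thermal resistances.
tj = junction_temp(15.0, 40.0, [0.8, 0.3, 1.5])
print(round(tj, 1))  # 79.0 degC
```

In this model, a bigger heatsink or more air flow shrinks only the heatsink-to-air term, which is why convection enhancements pay off most when that term dominates the chain.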
To address these issues, designers need accurate data on power consumption. The data must address thermal matters such as temperature gradients across the box. Sometimes the solution for high-temperature problems is to move the devices around to place the hot components in a cooling air flow. Or, it may be to allow more space for heatsinks. But moving components to solve thermal problems usually conflicts with the requirement to maintain short electrical distances between the fast components.
Tools For Analysis: Every design with thermal issues goes through iterations, because each design change must be analyzed for effectiveness. Thermal analysis tools implement computational fluid dynamics (CFD) to model the heat-generation and heat-removal functions. Most of the tools create a mesh around the components to resolve the boundary layers. This mesh usually isn't uniform; its resolution varies depending on the components.
For example, a heatsink may get a fairly coarse mesh that models the cooler's heat-removal characteristics at a given volume of air flow. But an individual component requires fairly fine detail to model local hot spots, the conduction paths through the leads, and the multiple thermal paths available for heat distribution and removal through the packaging.
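Full CFD solves coupled flow and heat-transfer equations on such a mesh; as a much-simplified illustration of the mesh-and-iterate idea, here is a conduction-only relaxation on a uniform grid with one heated cell. The grid size, boundary temperature, and hot-spot temperature are all assumed values, and real tools refine the mesh locally and model the air flow as well:

```python
import numpy as np

# Conduction-only sketch of a meshed thermal solve: a coarse uniform
# grid, fixed 25-degC boundaries, and one cell clamped at 100 degC.
N = 20
t = np.full((N, N), 25.0)          # initial and boundary temperatures
hot = (N // 2, N // 2)             # hypothetical hot-spot location

for _ in range(2000):              # relax toward steady state
    # Each interior cell moves toward the average of its 4 neighbors.
    avg = (t[:-2, 1:-1] + t[2:, 1:-1] +
           t[1:-1, :-2] + t[1:-1, 2:]) / 4.0
    t[1:-1, 1:-1] = avg
    t[hot] = 100.0                 # re-clamp the heated cell

# The cell next to the hot spot settles between source and ambient.
print(t[hot[0] + 1, hot[1]])
```

The nonuniform meshes the commercial tools build serve exactly this trade-off: fine cells where gradients are steep (near the hot spot), coarse cells where the field is smooth.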
FLO/MCAD from Flomerics enables parts and assemblies from any mechanical computer-aided design (MCAD) software—such as Pro/Engineer, I-DEAS, and Solid Designer—to be transferred easily and rapidly to and from Flotherm for thermal analysis. The interface program intelligently filters the geometrical data for a particular part or assembly and creates a simplified "thermal equivalent" for analysis purposes.
This data reduction step is critical, because production-quality MCAD solid models contain a vast amount of thermally insignificant geometric detail. Simply importing the geometry from the MCAD system into Flotherm will create a thermal analysis problem so complex that it will take weeks to solve.
A wiser approach is to simplify the geometry to a level that matches the thermal importance of the part, e.g., little or no simplification for thermally critical geometry, a lot of simplification for small or passive geometry. A few moments spent simplifying the problem can save days or weeks later.
The CFD process starts with preprocessing, the first stage of building and analyzing a flow model. Preprocessing includes building the model (or importing one from a CAD package), applying a mesh, and entering the data. Next, the CFD solver performs the calculations and produces the results. Postprocessing, the final step, involves organizing and interpreting the resulting data and images.
Avijit Goswaami, a director at Applied Thermal Technologies, says that the key to good thermal performance is to start the thermal design early. Thermal design can no longer be an afterthought, because too many issues and parameters in today's systems are thermally constrained. There is no alternative to addressing heat from the start; otherwise, the design is bound for trouble.
After developing a thermal model of the components and pc boards, the design team analyzes the CFD data and determines whether problems exist. As problem areas are identified, the thermal engineer tries to optimize the thermal processes. Usually, the evaluations are performed entirely in software, in a manner similar to the EDA tools. In some instances, the mechanical/thermal engineer will construct a thermal prototype, especially when the analysis indicates very little margin for error.
As an alternative, the designer can increase conduction through techniques such as thermally enhanced packaging. Or, he or she could use bigger pc-board traces to draw the heat out of the packages and into the boards through thermally enhanced pc-board construction, with power and ground planes thermally connected to the case. The design could also employ heat pipes, external hardware that takes advantage of the energy a material absorbs when going through a phase change.
Other options include thermoelectric coolers and even liquid or gas refrigerants. All of these technologies help move heat from a hot spot to some other area. However, increasing air flow across the heat-generating components only works if the air is at a reasonable ambient temperature. It makes no sense to pump hot air from one part of the system to cool off another section.
For example, Intel defines a heatsink-and-fan combination for Pentium processors. When someone uses a Pentium in a server, it may sit in a 1U or rack-mount system, often very close to other boards and nothing like a PC's thermal environment. In one case, the heatsink/fan combination drove all of the hot air into the system fan, which then moved it into the rest of the system, causing the system to overheat. One solution turned off the Pentium's fan, letting the cooling system remove all excess heat. Another changed the Pentium's fan assembly and heatsink.
Prabhu Sathyamurthy, director of the Icepak business unit at Fluent, notes that designers must identify the critical areas within their design and understand how air flow, air-flow distribution, and component temperature profiles affect the overall thermal environment. Sathyamurthy adds that people need to know how to look at the problems to arrive at solutions. The intent is to get to the concept of the solution and see where to change the design to meet the thermal requirements.
Board-Level Tools: In the old design flow, the electronics engineer hands the design and layout information to the CAD department. The CAD group then places and routes the pc board and gives the database to the mechanical engineering team for thermal analysis. Problems arise because the electrical and thermal requirements usually conflict. After some number of iterations, a compromise solution emerges.
To minimize these looping problems, many of the CFD tools interface to the CAD tools to perform thermal analysis at the board level. Most of the tools have library elements to represent common components like IC packages. Choosing the right level of model detail is very important here, because far more detail is available than is necessary. For instance, you could model the thermal resistance of each and every ball on a BGA package. But you wouldn't gain any accuracy for the very high cost in compute time of the fine-grained model versus a lumped thermal resistance for the whole array (Fig. 2).
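The BGA case shows why lumping is so cheap: identical parallel thermal paths collapse exactly into one equivalent resistance. The ball count and per-ball resistance below are assumed values for illustration:

```python
# Hypothetical BGA: 256 solder balls, each presenting the same thermal
# resistance from die pad to board. Identical parallel thermal paths
# combine just like parallel electrical resistors.
n_balls = 256
r_ball = 60.0                      # degC/W per ball (assumed value)

# Fine-grained view: sum 256 parallel conductances, then invert.
r_parallel = 1.0 / sum(1.0 / r_ball for _ in range(n_balls))

# Lumped view: one equivalent resistance for the whole array.
r_lumped = r_ball / n_balls

print(r_parallel, r_lumped)        # both about 0.234 degC/W
```

When the balls aren't identical (corner balls over different copper, for instance), the lumped value is an approximation, which is why a thermal prototype is still worth building when the analysis shows little margin.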
Dave Rosato, president of Harvard Thermal Inc., says that tools need to analyze the interactions between the thermal characteristics and component placement at the board level. The various elements on the board must be analyzed for their individual contributions to the total power dissipation. Again, components that restrict air flow or generate large amounts of heat need careful analysis.
IC Tools: The inherent conflicts between the mechanical and electronics engineers require alternatives to minimize the design iteration loops that result from their disparate needs. The mechanical engineers demand that hot parts be separated and located near the exhaust port of the chassis, while the electronics engineers want the critical high-speed parts next to each other in the signal path. Of course, both groups are correct.
One alternative is to start with low(er)-power ICs. Power and thermal management tools haven't been very high profile until recently because the vast majority of designs didn't need to account for power or heat removal. Now, most EDA tools supply at least a perfunctory view of power consumption. The optimization parameters within synthesis have always included timing and power as related variables, but the power analysis function is moving from the back-end physical tools to the RTL stages.
The earlier one addresses power as an issue, the greater the effect of the decisions. In synthesis, manipulating the switches can change the total power by as much as 10% to 15%. At the register-transfer-level (RTL) stage, where the architectures are still being massaged, up to 90% power savings are possible. Moving power as a design parameter into the early design stages produces high-impact results, but at the expense of major changes in design philosophy and methodology.
Wolfgang Nebel, vice president of engineering at ChipVision Design Systems Inc., opines that designers must shift their philosophy from gate-level optimization to higher levels of architectures and algorithms. This change involves more than just switching tools. The design team must expend more effort on early specifications and algorithm analysis, as opposed to the current approach of a quick and dirty general specification, which can lead to a poor choice of design solutions. Designers must base their decisions on evaluations of data, not just on design ideas.
Tools for power design must be able to estimate floor plans. They also must generate a thermal heat map of the estimated floor plan and produce a trend analysis, a graphical representation of how changes affect power consumption. These functions should interact with a scheduling graph that helps identify potential timing problems, because a low-power design that doesn't meet timing will never tape out (Fig. 3).
The International Technology Roadmap for Semiconductors (ITRS) for 2001 forecasts that high-performance ICs will dissipate 200 W within the next five years. The easy power reductions afforded by cutting the supply voltage from 5 to 3.3 V, and eventually to 1 V or less, cannot keep up with the other factors driving power consumption. For most systems in existence, power is proportional to clock frequency, total capacitance, and voltage squared (P ∝ fCV²). The total switching capacitance is a function of the gate count, and this is where the problems arise.
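The proportionality P ∝ fCV² makes both the payoff and the limit of voltage scaling concrete. The sketch below uses hypothetical numbers (20% switching activity, 500 MHz, 10 nF of total switched capacitance), not figures from the article:

```python
def dynamic_power(activity, freq_hz, cap_f, vdd):
    """Dynamic switching power: P = a * f * C * V^2."""
    return activity * freq_hz * cap_f * vdd ** 2

# Hypothetical chip: 20% switching activity, 500-MHz clock,
# 10 nF of total switched capacitance.
p_5v0 = dynamic_power(0.2, 500e6, 10e-9, 5.0)
p_3v3 = dynamic_power(0.2, 500e6, 10e-9, 3.3)
p_1v0 = dynamic_power(0.2, 500e6, 10e-9, 1.0)

print(p_5v0, p_3v3, p_1v0)   # roughly 25, 10.9, and 1 W
```

Dropping from 5 to 1 V cuts power 25-fold, but that's a one-time win; f and C keep climbing with every process generation, which is the article's point.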
For high-end system chips, gate counts are now measured in tens of millions. The gate capacitance is swamped by the interconnect capacitance, as total interconnect length grows beyond 10 km. So power increases as a linear function of the product of capacitance and frequency, and both are growing in line with Moore's Law, doubling about every 18 months.
Jerry Frenkil, vice president of advanced development at Sequence Design Inc., notes that the problems to be solved are changing with the designs. Thermal issues are just starting to become significant to most designers. These issues aren't just at the chip level, but at all of the higher levels of the system too. Therefore, designers must change their approaches to design problems and solutions.
Temperature effects include changes in delay, power-handling capability, power analysis, and component lifetime. In addition, electromigration worsens at higher temperatures. Temperature is not a static parameter; it varies over time and from place to place on the die. Signal activity within a chip must be traded against performance, and both are becoming power limited.
An example of this trend is Transmeta's Crusoe chip. Originally designed for laptop computers, it's now moving into blade servers due to limits in the power-dissipation capabilities of the boxes. Large systems cannot afford to add more blade servers, even when space is available within the rack, because of the power densities within the enclosures. Blade servers are failing as the heat from one board overheats adjacent boards.
Lee Hansen, product marketing manager at Xilinx, states that design flows need to change, from the reactionary phase of correcting problems after they appear, to a proactive flow that looks at power and thermal issues early in the design stages and aids in the prevention of downstream problems.
The design tools for some FPGAs even look at various forms of power consumption. Xilinx has a tool originally developed for ASIC designers who wanted power analysis for the designs they were developing in the FPGAs.
There are two sets of users for this tool. The first, complex programmable logic device (CPLD) designers, seeks low-power implementations (as in battery-powered equipment) and needs all possible information about power consumption within the design. The other group comprises designers who use many chips in their designs (such as network equipment) and must address the overall thermal issues of the design.
These high-performance designs need to consider parameters such as available heatsinking, thermal flows, and the effect of clocking across multiple chips to manage the total power consumption and keep all the chips within reasonable thermal parameters.
Need More Information?
Applied Thermal Technologies
ChipVision Design Systems Inc.
Harvard Thermal Inc.
Sequence Design Inc.
The Uptime Institute
Texas Instruments Inc.