Cooling Strategies Must Get Smarter

With Semiconductors Adding Density And Handling More Power, Cooling Strategies Must Learn To Manage Modern-System Heat Loads.

Dave Keller

Nov. 1, 2002


For an electronic device, heat removal is critical to assuring proper operation and long-term reliability. Elevated operating temperatures can introduce circuit stability problems. They also can take semiconductor junctions to the point where they break down electrochemically and fail. Over the long term, high temperatures can degrade low-power electrical components, insulation, adhesives, and other structural components. To further complicate reliability and cooling, more and more power is being forced through semiconductors. At the same time, the semiconductors are becoming more functionally dense. This trend is causing greater heat localization in smaller and smaller physical areas.

As a result of these changes, a process that was once as simple as adding a fan to move air through a system chassis has become a complex discipline. It requires expertise in design, materials science, manufacturing, and assembly. The first step in achieving effective cooling is to quantify the process. That way, subsequent steps can be taken in a deliberate fashion with some confidence in the outcome. Unfortunately, the flow of cooling air inside a cabinet is not always easy to predict. It can change considerably in response to fairly minor changes in the placement of equipment, cables, and other components.

A simplified model of enclosure cooling makes it clear that the engineer must account for all of the electrical power entering a system (Figure 1). Any power that is not re-routed to external devices will ultimately be converted to heat within the enclosure. The dissipation of heat from the points where it is created must overcome the path's "thermal resistance." Furthermore, the final temperature of the equipment will always be some number of degrees above ambient temperature unless pre-cooled air is forced through the system.
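The power balance described above can be sketched in a few lines of Python. The function name and the sample wattages are illustrative assumptions, not figures from the article.

```python
def enclosure_heat_load(power_in_w, power_out_w=0.0):
    """Heat (W) left inside the enclosure after exported power is subtracted."""
    if power_out_w > power_in_w:
        raise ValueError("exported power cannot exceed input power")
    return power_in_w - power_out_w


# Hypothetical example: 400 W drawn from the supply,
# 60 W passed on to external devices.
print(enclosure_heat_load(400.0, 60.0))  # 340.0
```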

Thermal resistance is a function of the distance and materials through which the heat must flow. It is commonly signified with the lower-case Greek letter theta (θ) and is expressed in units of °C/W. A simplified equation that relates temperature to power dissipation is:

Temperature differential = system power dissipation × thermal resistance, or

T_SYSTEM − T_AMBIENT = P × θ

For example, take a system in which the net power to be dissipated is 100 W, the ambient temperature is 30°C, and θ = 0.25°C/W. Solving the equation for the system temperature gives 55°C:

T_SYSTEM − T_AMBIENT = P × θ

T_SYSTEM = P × θ + T_AMBIENT

T_SYSTEM = 100 W × 0.25°C/W + 30°C

T_SYSTEM = 25°C + 30°C = 55°C
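The same arithmetic can be written as a small Python function. This is only a sketch of the simplified point-source equation above; the function and parameter names are chosen here for illustration.

```python
def system_temperature(power_w, theta_c_per_w, ambient_c):
    """Simplified model: T_SYSTEM = P * theta + T_AMBIENT (all in deg C and W)."""
    return power_w * theta_c_per_w + ambient_c


# Values from the worked example: 100 W, 0.25 degC/W, 30 degC ambient.
print(system_temperature(100.0, 0.25, 30.0))  # 55.0
```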

Note that the simplified formula treats heat as a point source. Real-world systems, in contrast, typically represent a much more complex, distributed heat source. Lower figures for θ are more desirable, as they indicate a smaller temperature rise for a given amount of power dissipated in a system. Thermal resistance can be reduced through means like increasing airflow, providing adequate heatsinking, and using high-performance thermal interface materials between heat-generating devices and heatsinks. Pre-cooling incoming air helps in a different way: it lowers the effective ambient temperature from which the rise is measured.

Simple power devices, such as regulators and power transistors, begin to require external heatsinks at dissipations of a few hundred milliwatts. But today's microprocessors and switching components are denser and faster. The latest generation of microprocessors operates at supply voltages of approximately 1.4 to 1.9 V and current levels of 40 to 50 A. They can dissipate upwards of 100 W of heat.
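As a rough cross-check of those figures, dissipation can be approximated as supply voltage times current. The sketch below assumes a simple DC product (P = V × I) and pairs the low and high ends of the quoted ranges, which is an illustrative simplification rather than a statement about any particular processor.

```python
def dissipation_w(volts, amps):
    """Approximate dissipation as a plain DC product, P = V * I."""
    return volts * amps


low = dissipation_w(1.4, 40.0)   # low end of both quoted ranges
high = dissipation_w(1.9, 50.0)  # high end of both quoted ranges
print(f"roughly {low:.0f} W to {high:.0f} W")
```

Under these assumptions, the result lands in the tens of watts up to nearly 100 W, the same order of magnitude as the dissipation quoted above.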

Though the maximum recommended microprocessor temperatures vary by model, they all range from approximately 65° to 85°C (149° to 185°F). For equipment installed in elevated temperature environments, this range does not provide much margin for cooling.
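To see how little margin that leaves, the simplified equation T_SYSTEM − T_AMBIENT = P × θ can be rearranged to give the largest thermal resistance a cooling path may have. The sketch below applies that system-level formula to a single device, which is an assumption, and the two ambient temperatures are hypothetical; the 85°C limit and 100 W dissipation are the upper figures quoted in the text.

```python
def max_thermal_resistance(t_max_c, ambient_c, power_w):
    """Largest allowable theta (degC/W) so that P * theta + ambient <= t_max."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return (t_max_c - ambient_c) / power_w


# Hypothetical ambients: a 25 degC room and a 50 degC elevated environment.
for ambient_c in (25.0, 50.0):
    theta = max_thermal_resistance(85.0, ambient_c, 100.0)
    print(f"ambient {ambient_c:.0f} degC -> theta must not exceed {theta:.2f} degC/W")
```

The hotter the environment, the smaller the thermal resistance the cooling design is allowed to have for the same device.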

Because microprocessors are not always the most power-hungry components in a system, these figures do not represent worst-case cooling scenarios. Take the packet-switched-backplane (PSB) architecture for CompactPCI, which is described in the PICMG 2.16 specification. A CompactPCI switch-fabric system can consist of more than 20 cards. Of those cards, one or two will be switching cards. Power consumption in each of these switching slots can exceed 70 W. This high consumption will be added to the heat load produced by microprocessors and components on other circuit cards.
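A quick tally shows how the switching slots add to the rest of the chassis. In the sketch below, only the 70 W per switching slot comes from the text; the number of other cards and their 35 W dissipation are hypothetical placeholders.

```python
def chassis_heat_load(card_watts):
    """Total heat load (W) as the sum of per-card dissipation."""
    return sum(card_watts)


switch_cards = [70.0, 70.0]  # two switching slots at the 70 W figure quoted above
node_cards = [35.0] * 19     # hypothetical: 19 other cards at an assumed 35 W each

total_w = chassis_heat_load(switch_cards + node_cards)
print(total_w)  # 805.0
```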

Finally, the cooling systems themselves usually impose an additional heat burden. Fan arrays can be placed at the intake or the exhaust side of the system. The choice is usually a tradeoff between fan reliability and additional heat stress on the system. Fans placed at the intake of a system draw cool air across themselves, but the heat they generate is added to the heat load present in the system. In contrast, fans placed at the exhaust side of an enclosure draw hot air from the system across themselves, which may shorten bearing life.

System cooling methods actually form a hierarchy. They vary from an essentially "minimalist" approach to one which is highly developed, predictable, and efficient. Descriptions of these different techniques follow:

Convection cooling is the simplest cooling method, as it relies on hot air's natural tendency to rise. The convection approach is suitable for cabinets that are lightly loaded with equipment and therefore have low to medium power requirements. System cabinets are often designed to take advantage of convection cooling. They have openings in the sides and top to promote a chimney effect within the cabinet. Space around the equipment must therefore be maintained to assure adequate airflow.
The brute-force method adds to forced convection by scaling up the cooling system through the installation of more fans, faster fans, and so on. This method can increase system airflow by as much as 2X and, with it, the rate of heat removal. But the rate of improvement decreases exponentially, so the added fans raise the cost, noise level, and total heat load of the system while the cooling effectiveness diminishes. Perhaps more importantly, increasing airflow does not automatically provide a more even distribution of cooling air.
Manual tuning is a more focused approach to cooling. Using baffles, ducts, and deflectors within a cabinet, it directs cooling air to particular slots or components. This type of flow control can make it more difficult to achieve uniform performance from chassis to chassis, however. Minor changes in types and locations of cables, boards, and components can alter the flow of air, which requires more manual tuning to optimize cooling.
For commercial system enclosures, vectored airflow is a new technology. Yet it has proven itself in rigorous military aircraft and marine applications. Vector-controlled airflow involves the use of a specially designed fan shroud and perforated baffles, which are inserted between the fan and card cage to control airflow. This technique can provide up to 4X the cooling capacity of conventional fan-driven, forced-air cooling strategies. The technology was developed by Raytheon (www.raytheon.com) under the name "Advanced Vector Controlled Air Flow" (AVCAF). It has since been licensed for use by Tracewell Systems for inclusion in its modified standard and custom system-enclosure designs.
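Returning to the brute-force method in the list above, its cost can be illustrated with the standard fan affinity laws. The article does not cite these laws; they are brought in here as an assumption. For a given fan in an unchanged system, airflow scales with fan speed, pressure with speed squared, and fan power with speed cubed. The sketch is idealized and ignores motor efficiency and changes in system impedance.

```python
def scale_fan(speed_ratio):
    """Idealized fan affinity laws for one fan in a fixed system.

    Returns (airflow, pressure, power) multipliers for a given speed multiplier.
    """
    if speed_ratio <= 0:
        raise ValueError("speed ratio must be positive")
    return speed_ratio, speed_ratio ** 2, speed_ratio ** 3


flow_x, pressure_x, power_x = scale_fan(2.0)
print(flow_x, pressure_x, power_x)  # 2.0 4.0 8.0
```

Under these idealized assumptions, doubling airflow by fan speed alone takes about eight times the fan power, and that extra power ends up as added heat and noise.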

AVCAF is seen as one of the few viable solutions to two problems: the higher heat loads and extreme cooling requirements associated with switch-fabric architectures, and the demanding cooling requirements of future high-performance telecom, datacom, and data-acquisition systems. AVCAF may seem similar to the concept of manual tuning, which can be imprecise and difficult to reproduce from chassis to chassis. Yet the basis of AVCAF technology is both quantifiable and repeatable. With AVCAF, the locations and sizes of holes in the AVCAF baffles are determined through a mathematical algorithm that facilitates the development of a precise airflow engineering model. Subtle adjustments in airflow can be made predictably. This, in turn, provides increased design efficiency and faster time-to-market.

Normally, AVCAF hardware is designed to provide a uniform flow of air both across and along each slot in the backplane. AVCAF component designs can be engineered, however, to concentrate airflow over specific portions of the system. Conventional wisdom might indicate that this arrangement would be restrictive to airflow. After all, structural components are being inserted between the fan and circuit cards. But the actual result is greater uniformity of flow with very little additional back pressure.

The air velocity through a system is illustrated in Figure 2. It is depicted in linear feet per minute (Y axis) according to slot position (X axis) and depth along each slot (Z axis). Entrance and exit velocities are shown for both a system without AVCAF and a system with AVCAF. The charts indicate that AVCAF can make air velocity significantly more uniform at both the entrance and exit. It can attain the same results for air pressure and flow.

The average velocity in all charts is approximately 1000 linear feet per minute (lfm). In the figure, the charts on the right (with AVCAF) show the entrance and exit air velocities converging into a narrow range of approximately 800 to 1100 lfm. The charts on the left (without AVCAF) show a much wider spread. For entrance velocities, the span is approximately 200 to 2000 lfm. At the exit, it ranges from approximately 600 to 1600 lfm.
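The difference in uniformity is easier to compare as a span and a max-to-min ratio. The sketch below uses only the approximate ranges quoted above; the helper name and the choice of these two measures are illustrative.

```python
# Approximate velocity ranges (lfm) as quoted in the text.
ranges_lfm = {
    "entrance, without AVCAF": (200, 2000),
    "exit, without AVCAF": (600, 1600),
    "entrance and exit, with AVCAF": (800, 1100),
}


def spread(low_lfm, high_lfm):
    """Return (span in lfm, max/min ratio) for a velocity range."""
    return high_lfm - low_lfm, high_lfm / low_lfm


for label, (low_lfm, high_lfm) in ranges_lfm.items():
    span, ratio = spread(low_lfm, high_lfm)
    print(f"{label}: span {span} lfm, max/min ratio {ratio:.2f}")
```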

Power-hungry, high-performance electronic systems will present increasing challenges to systems integrators. Faster switching circuits, increased component density, and enclosure crowding produce more waste heat. Recent developments in backplane technology can generate heat loads of 250 W or more per backplane slot. Compared to earlier generations of electronics, these loads can require 3X to 7X the cooling. Traditional forced-air-convection cooling is becoming increasingly inadequate for such high heat loads.

As a potential solution to this problem, the AVCAF technique has been field-proven. This engineered, repeatable cooling technology has been shown to achieve more uniform cooling-air velocity throughout a system enclosure. Improved cooling comes through the increased efficiency of airflow. Yet AVCAF doesn't create other problems, like the issues associated with increasing the speed and number of fans in conventional fan-driven cooling systems.