Cooling Techniques Attack MPU Processing Heat

The continuing evolution toward higher-performance microprocessor units (MPUs) has revolutionized the design of computers large and small. This evolution has generally followed Moore’s law—the semiconductor industry doubles transistor density every two years while increasing performance with each new generation. Increased performance has contributed to a rise in microprocessor chip power dissipation and power density.

An example of the heightened power dissipation can be found in the 2007 edition of the International Technology Roadmap for Semiconductors (ITRS). It says there is now a maximum power dissipation of approximately 120 W due to package cost, reliability, and cooling cost issues.¹

Starting with this ITRS power-dissipation statement, the 4.7-GHz MPU clock frequency is projected to increase by a factor of at most 1.25 times per technology generation. Power density is estimated to reach 200 W/cm² by the end of the 2008 ITRS timeframe. MPUs that continue using existing circuit and architecture techniques would exceed package power limits by a factor of nearly 4 by the end of 2020.

SOME OPTIONS TO TRY One approach to cutting power dissipation is to reduce power-supply voltage, which is driven by reduced transistor channel length and the reliability of gate dielectrics. Even with lower supply voltage, total power consumption will continue to increase, driven by higher chip operating frequencies, higher overall interconnect capacitance and resistance, and the increasing gate leakage of an exponentially growing number of scaled on-chip transistors.

MPUs must control their operating temperature, which affects reliability as defined by failure rate, expressed in failures per 10⁶ hours over the useful system life (Fig. 1). The Arrhenius reliability model states that failure rate is a function of temperature stress: the higher the stress, the higher the failure rate. Typically, each 10°C rise in temperature causes a 50% increase in the failure rate. Conversely, cutting the operating temperature by 10°C reduces the failure rate by about one-third.

Thus, failure rate and its inverse, mean time between failures (MTBF), are measures of thermal-management effectiveness in electronic systems. In dealing with thermal problems, the electronic system designer will have to enter the domain of the packaging and thermal design engineer.
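The rule of thumb above can be turned into a quick estimate. The sketch below assumes the 50% increase compounds for every 10°C step, which is an interpretation of the rule rather than a full Arrhenius calculation. The baseline failure rate is a hypothetical value chosen only for illustration.

```python
def relative_failure_rate(delta_t_c, increase_per_10c=0.5):
    """Failure-rate multiplier for a temperature change of delta_t_c (deg C).

    Assumes each 10 deg C rise multiplies the failure rate by
    (1 + increase_per_10c), compounding per step (rule of thumb only).
    """
    return (1.0 + increase_per_10c) ** (delta_t_c / 10.0)


def mtbf_hours(failures_per_million_hours):
    """MTBF is the inverse of the failure rate given in failures per 10^6 h."""
    return 1.0e6 / failures_per_million_hours


if __name__ == "__main__":
    base_rate = 2.0  # hypothetical baseline, failures per 10^6 hours
    for delta in (-10, 0, 10, 20):
        rate = base_rate * relative_failure_rate(delta)
        print(f"{delta:+d} C: {rate:.2f} failures/1e6 h, "
              f"MTBF {mtbf_hours(rate):,.0f} h")
```

Under these assumptions, a 10°C rise multiplies the failure rate by 1.5, and a 10°C drop multiplies it by about 0.67.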

Besides reliability and performance issues, a microprocessor’s thermal management also involves economic and mechanical challenges. Cost is obviously an important consideration. Equally important are size considerations when trying to accommodate increasingly higher-power microprocessors, especially in laptop computers.

“Most of today’s high-performance microprocessors use an area array, flip-chip interconnect scheme to connect the active (circuit) side of the die to an organic or ceramic package substrate. The package substrate is either soldered to the computer motherboard through a grid array of solder joints or has pins that are inserted into a socket that is soldered to the motherboard (another alternate socket is the land grid array socket where socket fingers contact pads on the surface of the package),” says R. Mahajan, et al.²

“In all cases, when dealing with high cooling demand, and in attempting to establish cooling envelopes, a reasonable first-order assumption is that the bulk of the heat will have to be removed from the inactive side that is farther away from the motherboard. Given the limited airflow and the presence of significant amounts of lower thermal conductivity organic material on the active side, this is a reasonable first assumption,” Mahajan continues.

“There are two thermal design architectures,” says Mahajan (Fig. 2). “Architecture I is one where a bare die interfaces to the heatsink solution through a thermal interface material (TIM) and Architecture II is one where an integrated heat spreader (IHS) is attached to the die through the use of a TIM and the heatsink interfaces to the IHS through a second TIM. Architecture I has a lower profile compared to Architecture II and is often used for microprocessors in mobile and handheld computers. Architecture II is typically used for microprocessors in desktop and server applications.”

HEATSINKS The most widely used thermal-management device, the heatsink, transfers heat by conduction from a microprocessor to a specially constructed metal plate. The most common heatsink type has many metal fins. The metal’s high thermal conductivity and large surface area transfer the heat from the microprocessor to the heatsink and then to the surrounding air. The heatsink’s ability to transfer heat depends on its material, geometry, and overall surface heat transfer coefficient.

Heatsink material is usually aluminum or copper. Copper is more expensive and heavier than aluminum, while aluminum has the advantage of being more easily formed and shaped into different geometries. Heatsinks with fins come in many forms: extruded, cold forged, die cast, milled, bonded, and folded. Some heatsinks consist of a series of round pins force-fit into a baseplate.

A key parameter in using a heatsink is the thermal resistance of the associated microprocessor package, which is its ability to conduct heat away into the surrounding environment. A design goal is a low thermal resistance value for a given amount of power, which allows the microprocessor’s junction to operate at an optimum temperature and provide a longer useful life.


Without a heatsink, the dissipated heat flows in all directions. With a microprocessor heatsink, the heat passes from the case to the sink before being emitted into the air. Thus, the heatsink increases the effective heat dissipation area and removes heat from the microprocessor, permitting it to operate at higher power levels.

Figure 3 shows heat flow with a heatsink. Schematically, thermal resistances are represented as resistors, although they are really the equivalent thermal values. Mathematically, thermal resistance is the rise in the junction temperature above a reference temperature (the case or the ambient air) per unit of power dissipated in the device:

θjc = (Tj − Tc)/Pd    (1)

θja = (Tj − Ta)/Pd    (2)

where:

θjc = thermal resistance from junction-to-case in °C/W, which is a function of the microprocessor and its package

θja = thermal resistance from junction-to-ambient in °C/W

Tc = microprocessor case temperature in °C

Ta = ambient air temperature in °C

Tj = microprocessor junction temperature in °C

Pd = microprocessor power dissipation in W
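As a worked illustration, Equation 2 can be rearranged to estimate junction temperature, or to find the largest θja a cooling solution may have. The ambient temperature and junction limit used in the example are hypothetical. Only the 120-W figure comes from the ITRS limit cited earlier.

```python
def junction_temperature(t_ambient_c, power_w, theta_ja_c_per_w):
    """Equation 2 rearranged: Tj = Ta + Pd * theta_ja."""
    return t_ambient_c + power_w * theta_ja_c_per_w


def max_theta_ja(t_junction_max_c, t_ambient_c, power_w):
    """Largest junction-to-ambient resistance that keeps Tj at the limit."""
    return (t_junction_max_c - t_ambient_c) / power_w


if __name__ == "__main__":
    ta = 35.0       # hypothetical ambient air temperature, deg C
    tj_max = 100.0  # hypothetical junction limit, deg C
    pd = 120.0      # W, the ITRS maximum cited in the article
    print(f"Required theta_ja <= {max_theta_ja(tj_max, ta, pd):.3f} C/W")
    print(f"Tj at 0.5 C/W: {junction_temperature(ta, pd, 0.5):.1f} C")
```

With these assumed numbers, the whole path from junction to air must stay below roughly 0.54°C/W, which shows how little thermal resistance a 120-W part can tolerate.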

THERMAL INTERFACE MATERIALS Ideally, heatsinks require intimate surface-to-surface contact with the microprocessor to be cooled. But in the real world, irregular surface areas on the microprocessor and heatsink prevent this intimate physical contact. Therefore, some type of thermally conductive interface material is necessary to fill any gaps between the mating surfaces. In many cases, this interface material must also act as an electrical insulator and thermal conductor.

These materials range from thermal grease to adhesive tapes and phase change materials that are solid at room temperature and flow at higher temperatures. Certain cure-in-place compounds and elastomeric pads can act as gap fillers. Figure 3 shows the thermal resistances between the microprocessor junction and ambient air temperature. Without a heatsink:

θja = θjc + θca

Here, θca (case-to-ambient) thermal resistance is a relatively high value. Therefore, θja is also high and the microprocessor may operate at a relatively high, unreliable temperature. With a heatsink:

θja = θjc + θci + θis + θsa

where:

θci = thermal resistance from case-to-thermal interface in °C/W

θis = thermal resistance from thermal interface-to-sink in °C/W

θsa = thermal resistance from sink-to-ambient air in °C/W

Here, the heatsink and thermal interface material reduce the overall thermal resistance, and the microprocessor operates at a reliable temperature.
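The two series-resistance expressions can be compared numerically. The sketch below sums each path and converts it to a junction temperature. All resistance values, the power level, and the ambient temperature are hypothetical and chosen only to show the relative effect.

```python
def series_resistance(*thetas_c_per_w):
    """Thermal resistances in series simply add, like electrical resistors."""
    return sum(thetas_c_per_w)


def junction_temperature(t_ambient_c, power_w, theta_ja_c_per_w):
    """Tj = Ta + Pd * theta_ja."""
    return t_ambient_c + power_w * theta_ja_c_per_w


if __name__ == "__main__":
    ta, pd = 35.0, 50.0  # hypothetical ambient (deg C) and power (W)

    # Without a heatsink: theta_ja = theta_jc + theta_ca
    bare = series_resistance(0.3, 5.0)

    # With a heatsink: theta_ja = theta_jc + theta_ci + theta_is + theta_sa
    sinked = series_resistance(0.3, 0.05, 0.05, 0.3)

    print(f"No heatsink: {bare:.2f} C/W, Tj = {junction_temperature(ta, pd, bare):.0f} C")
    print(f"Heatsink:    {sinked:.2f} C/W, Tj = {junction_temperature(ta, pd, sinked):.0f} C")
```

With these assumed values, the bare package would reach about 300°C, while the heatsink path holds the junction near 70°C.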

HEAT PIPES Heat pipes are an ideal heat transport means for relatively long distances—for example, if a sensitive heat source is separated from a remote heat exchanger or heatsink. Heat pipes are also good heat spreaders. When combined with folded aluminum fins, they can be used as efficient and lightweight heatsinks.

A heat pipe consists of a sealed aluminum or copper container whose inner surfaces have a capillary wicking material (Fig. 4). A liquid under its own pressure in the container enters the pores of the capillary material, wetting all internal surfaces. Applying heat at any point along the heat pipe’s surface causes the liquid at that point to boil and enter a vapor state.

When that occurs, the liquid picks up the latent heat of vaporization. The vapor, which then has a higher pressure, moves inside the container to a colder location where it condenses. In condensing, the vapor gives up the latent heat of vaporization, moving heat from the input to the output end of the heat pipe. Heat pipes have an effective thermal conductivity many times that of copper.

The container isolates the working fluid from the outside environment. Therefore, it must be leak-proof, maintain the pressure differential across its walls, and enable the transfer of heat to take place to and from the working fluid. It also must be nonporous to prevent diffusion of vapor.

Working fluids have to be compatible with the wick and wall materials, offer good thermal stability, provide wettability of wick and wall materials, and have a vapor pressure that’s appropriate for the operating temperature range. Also, a high latent heat of vaporization is desirable to transfer large amounts of heat with minimum liquid flow and to maintain low pressure drops within the heat pipe.
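A rough calculation shows why a high latent heat of vaporization matters. The sketch below compares the mass flow needed to carry a given heat load by phase change with the flow needed if the liquid were only warmed. It assumes water as the working fluid, with approximate handbook properties near 100°C at atmospheric pressure. Real heat pipes run at other pressures, so the numbers are indicative only.

```python
WATER_LATENT_HEAT_J_PER_KG = 2.26e6       # approximate, near 100 deg C
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0   # approximate, liquid water


def latent_mass_flow(heat_w, h_fg=WATER_LATENT_HEAT_J_PER_KG):
    """Mass flow (kg/s) that carries heat_w by evaporation: m = Q / h_fg."""
    return heat_w / h_fg


def sensible_mass_flow(heat_w, delta_t_k, cp=WATER_SPECIFIC_HEAT_J_PER_KG_K):
    """Mass flow (kg/s) that carries heat_w by warming liquid: m = Q / (cp * dT)."""
    return heat_w / (cp * delta_t_k)


if __name__ == "__main__":
    q = 100.0  # hypothetical heat load, W
    m_latent = latent_mass_flow(q)
    m_sensible = sensible_mass_flow(q, delta_t_k=10.0)
    print(f"Phase change: {m_latent * 1e3:.3f} g/s")
    print(f"10 K liquid temperature rise: {m_sensible * 1e3:.3f} g/s")
    print(f"Ratio: {m_sensible / m_latent:.0f}x")
```

Under these assumptions, moving 100 W by evaporation takes about 0.04 g/s of water, roughly 50 times less flow than warming the liquid by 10 K.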


HEAT SPREADERS Heat spreaders are used in die-level packaging to spread heat from a microprocessor chip into its associated heatsink. One type of heat spreader is a natural graphite sheet with anisotropic thermal properties: it exhibits a high thermal conductivity in the plane of the sheet and a much lower thermal conductivity through the thickness of the sheet.

This allows a natural graphite sheet to function as both an insulator and heat spreader that eliminates hot spots in microprocessor chips. Also, because of their excellent flexibility, natural graphite materials can conform well to surfaces under low contact pressures. This combination of properties makes natural graphite a potential substitute for aluminum and copper materials as heat spreaders.

Made from natural graphite, Graftech’s Spreadershield products distribute heat evenly while providing thermal insulation. They offer a variety of in-plane thermal conductivities, from 300 to 500 W/m-K. By eliminating heavy thermal solutions, these products slim down product design and reduce product weights by up to 50%.
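To put the quoted in-plane conductivities in context, the one-dimensional conduction formula R = L/(k·A) can be applied to identical strips of different materials. In the sketch below, the strip dimensions are hypothetical. The copper and aluminum conductivities are approximate handbook values for the pure metals, not figures from the article.

```python
def conduction_resistance(length_m, k_w_per_m_k, area_m2):
    """One-dimensional conduction resistance in C/W: R = L / (k * A)."""
    return length_m / (k_w_per_m_k * area_m2)


if __name__ == "__main__":
    # Hypothetical strip: 50 mm long, 20 mm wide, 0.5 mm thick,
    # with heat flowing along its length (in-plane).
    length = 0.050
    area = 0.020 * 0.0005

    conductivities = {
        "graphite, in-plane (low end)": 300.0,   # from the article
        "graphite, in-plane (high end)": 500.0,  # from the article
        "copper (approx.)": 400.0,               # assumed handbook value
        "aluminum (approx.)": 237.0,             # assumed handbook value
    }
    for name, k in conductivities.items():
        r = conduction_resistance(length, k, area)
        print(f"{name:32s} {r:6.2f} C/W")
```

The comparison covers in-plane spreading only. The through-thickness conductivity of the graphite sheet is much lower, as noted above, and is not modeled here.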

THERMOELECTRIC MODULES Thermoelectric modules (TEMs) must be placed between the MPU die or package and a heatsink (Fig. 5). The heatsink must dissipate the power consumed by the TEM in addition to the heat the TEM pumps, which can result in higher ambient temperatures at the heatsink that may impact downstream components. TEMs have low efficiency because they consume more power than they transport.
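The extra heatsink burden follows from a simple energy balance: the hot side must reject the heat pumped from the MPU plus the electrical power fed to the module. The sketch below uses hypothetical power values to show the heatsink load and the coefficient of performance (COP), the ratio of heat pumped to electrical power consumed.

```python
def heatsink_load_w(heat_pumped_w, tem_input_power_w):
    """Heat the heatsink must reject: pumped heat plus TEM electrical input."""
    return heat_pumped_w + tem_input_power_w


def coefficient_of_performance(heat_pumped_w, tem_input_power_w):
    """COP = heat pumped / electrical power consumed."""
    return heat_pumped_w / tem_input_power_w


if __name__ == "__main__":
    q_cold = 40.0  # hypothetical heat pumped from the MPU, W
    p_in = 60.0    # hypothetical TEM electrical input, W
    print(f"Heatsink load: {heatsink_load_w(q_cold, p_in):.0f} W")
    print(f"COP: {coefficient_of_performance(q_cold, p_in):.2f}")
```

A COP below 1, as in this example, corresponds to a module that consumes more power than it transports.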

TEMs employ the Peltier effect, which produces rapid heating or cooling of electronic components. These solid-state devices have no moving parts, making them reliably maintenance-free. They’re often used to eliminate hot spots on an MPU.

Applying a low-voltage dc to the TEM causes one side to cool down and the other side to heat up. Cooling is proportional to the amount of current applied. Varying the current applied and the direction of current provides tight temperature control in cooling applications.

A typical module has two wires for the application of power, which must be the correct polarity for cooling. The wrong polarity will heat rather than cool. If the TEM cooling fails, the results can be disastrous. Also, don’t apply power to a TEM without a heatsink, or it may overheat and fail.

A TEM can damage an electronic circuit with condensation because it’s possible to cool components below ambient temperature. The exact temperature at which condensation occurs depends on the ambient temperature and humidity.
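The condensation threshold is the dew point, which can be estimated from ambient temperature and relative humidity. The sketch below uses the Magnus approximation with one commonly used coefficient set. The formula and coefficients are brought in for illustration and are not part of the original discussion.

```python
import math

# Magnus coefficients over water (one common set; others exist)
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # deg C


def dew_point_c(t_ambient_c, relative_humidity_pct):
    """Approximate dew point (deg C) from air temperature and RH (%)."""
    gamma = (math.log(relative_humidity_pct / 100.0)
             + MAGNUS_A * t_ambient_c / (MAGNUS_B + t_ambient_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def condensation_risk(cold_side_c, t_ambient_c, relative_humidity_pct):
    """True if the cooled surface is at or below the dew point."""
    return cold_side_c <= dew_point_c(t_ambient_c, relative_humidity_pct)


if __name__ == "__main__":
    td = dew_point_c(25.0, 50.0)
    print(f"Dew point at 25 C, 50% RH: {td:.1f} C")
    print("Risk at 10 C cold side:", condensation_risk(10.0, 25.0, 50.0))
```

At 25°C and 50% relative humidity, the estimate is a dew point a little under 14°C, so a cold side held at 10°C would be expected to collect condensation.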

THERMAL-ANALYSIS SOFTWARE Before committing a design to production, it’s a good idea to evaluate its thermal characteristics. Several software programs can perform this evaluation. For instance, Flomerics’ Flotherm 3D simulation software for thermal design of electronic components and systems enables the creation of virtual models of electronic equipment (Fig. 6).

Flotherm also lets designers perform thermal analysis and test design modifications quickly and easily in the early stages of the design process, well before any physical prototypes are built. It uses advanced computational fluid dynamics (CFD) techniques to predict airflow, temperature, and heat transfer in components, boards, and complete systems.

In evaluating thermal-analysis software for electronic systems, it’s imperative for the user to have readily accessible technical support from the supplier. The user should consider the modeling methodology, definition of a system for analysis, creation of a computational grid, solution and control features, and presentation of the results.

DYNAMIC POWER MANAGEMENT Heatsinks and fans can only go so far to cool microprocessors. However, sleep and suspend modes can also reduce power consumption. This has led to new circuit techniques, called dynamic power management (DPM), that reduce a microprocessor’s average power dissipation by dynamically reconfiguring a system to lower power consumption during low-workload periods.

In principle, DPM identifies low-processing-requirement periods and reduces operating voltage (voltage scaling) and/or frequency (frequency scaling) to reduce operating power consumption. This technique is called dynamic voltage and frequency scaling (DVFS). Furthermore, during these low-power-requirement periods, idle circuits can be turned off to provide even lower power consumption.
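The leverage of DVFS can be seen with the standard first-order model for CMOS switching power, P = α·C·V²·f. That model is an assumption introduced here for illustration and ignores leakage. The capacitance and supply voltage below are hypothetical. Only the 4.7-GHz clock comes from the article.

```python
def dynamic_power_w(c_eff_f, v_supply, freq_hz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * c_eff_f * v_supply ** 2 * freq_hz


if __name__ == "__main__":
    c_eff = 1.0e-9  # hypothetical effective switched capacitance, F
    full = dynamic_power_w(c_eff, v_supply=1.2, freq_hz=4.7e9)

    # Scale both supply voltage and clock frequency to 80% of nominal.
    scaled = dynamic_power_w(c_eff, v_supply=1.2 * 0.8, freq_hz=4.7e9 * 0.8)

    print(f"Nominal: {full:.2f} W")
    print(f"Scaled:  {scaled:.2f} W ({scaled / full:.1%} of nominal)")
```

Because voltage enters as a square, scaling voltage and frequency together to 80% cuts switching power to about 51% of nominal in this model.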

Proposed DPM solutions can be categorized as either predictive or stochastic. Predictive schemes attempt to predict a device’s usage behavior in the future, based on past experience. Stochastic techniques make probabilistic assumptions based on usage-pattern observations. To be effective, DPM must account for the time it takes to change a power-supply voltage. Plus, the processor must be able to operate reliably when its supply voltage or clock rate changes.
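A predictive scheme can be sketched with an exponentially weighted average of past idle periods. In the sketch below, the processor drops to a low-power state only when the predicted idle time exceeds a break-even time that covers the cost of changing the supply voltage. The class name, function name, smoothing factor, and timing values are hypothetical, and this is one simple predictor among many rather than a specific published scheme.

```python
class IdlePredictor:
    """Predicts the next idle period from an exponential average of past ones."""

    def __init__(self, alpha=0.5, initial_ms=0.0):
        self.alpha = alpha              # weight given to the newest observation
        self.predicted_ms = initial_ms  # current prediction

    def update(self, observed_idle_ms):
        """Blend the latest observed idle period into the prediction."""
        self.predicted_ms = (self.alpha * observed_idle_ms
                             + (1.0 - self.alpha) * self.predicted_ms)
        return self.predicted_ms


def should_enter_low_power(predicted_idle_ms, break_even_ms):
    """Scale down only if the idle period should outlast the transition cost."""
    return predicted_idle_ms > break_even_ms


if __name__ == "__main__":
    predictor = IdlePredictor(alpha=0.5)
    break_even_ms = 6.0  # hypothetical cost of a voltage transition, in ms
    for idle_ms in (10.0, 10.0, 2.0, 12.0):
        prediction = predictor.update(idle_ms)
        decision = should_enter_low_power(prediction, break_even_ms)
        print(f"observed {idle_ms:5.1f} ms -> predicted {prediction:5.2f} ms, "
              f"low power: {decision}")
```

The break-even threshold is where the voltage-transition time enters the decision: a short predicted idle period is not worth the switching overhead.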

REFERENCES
1. 2007 International Technology Roadmap for Semiconductors (ITRS).
2. R. Mahajan et al., “Cooling a Microprocessor Chip,” Proceedings of the IEEE, August 2006, pp. 1476-1486.