How To Minimize Component Thermal Overstress Failures (Part 2)

Some practical tips harness the all-too-often ignored techniques of thermal design to improve product reliability.

March 19, 2001

Part 1 of this article appeared in our Analog Supplement, Nov. 20, 2000, p. 23.

Temperature can accelerate the physico-chemical factors that influence numerous failure mechanisms in electronic components. By varying temperature appropriately, it's possible to accelerate life-testing methods to screen electronic components and weed out infant-mortality cases. This article uncovers the path to good thermal design.

Thermal overstress effects can be quite dramatic on both components and pc boards. Photographs bring home this point. The charring of a bipolar junction transistor due to thermal overstress caused by electrical overstress (EOS) is shown in Figure 1. When a device is subjected to more than its rated current or voltage and it exceeds the power dissipation defined by its safe operating area, EOS occurs. Another example of EOS-induced thermal overstress is revealed in Figure 2. Electrostatic discharge (ESD) can cause thermal overstress, too (Fig. 3).

Accelerated testing provides information about the lifetime distribution of a component within a compressed time-frame, thereby reducing the cost of testing. With semiconductor devices, accelerated testing is implemented by applying temperature cycling or other stressing factors. In addition to providing information on the life expectancy of the device, the testing exposes latent defects. Data from accelerated tests helps predict the reliability of components.

Accelerated tests use temperatures in the 75°C to 225°C range and humidity in the 50% to 90% range, depending on the failure mechanisms and category of device. A standard combination is 85°C with 85% relative humidity. Accelerated testing stimulates failure mechanisms, such as internal corrosion and metallic growth due to ion migration.

Tests that apply environmental stresses in addition to temperature are known as Highly Accelerated Stress Tests (HAST). A number of models, such as the Arrhenius model (which was explained in Part 1 of this article), the Eyring model, the Reich-Hakim model, the Peck model, and the Lawson model, are used to represent accelerated life-testing on electronic components.
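As a rough illustration of how the Arrhenius model translates a high-temperature test into equivalent time at use conditions, here is a minimal Python sketch. The function name, the 0.7-eV activation energy, and the 55°C and 125°C temperatures are assumptions chosen for illustration only; real activation energies depend on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_use_c, t_stress_c, activation_energy_ev):
    """Acceleration factor of a stress temperature relative to a use temperature.

    AF = exp[(Ea / k) * (1/T_use - 1/T_stress)], with temperatures in kelvin.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


# Hypothetical case: a 0.7-eV mechanism, 55°C in use, 125°C under test.
af = arrhenius_acceleration(55.0, 125.0, 0.7)
print(f"Acceleration factor: {af:.0f}")
print(f"168 test hours represent about {168 * af:.0f} use hours")
```

Under these assumed numbers, each hour of testing stands in for several tens of hours of field operation, which is why a test of only a few days can reveal failures that would otherwise take much longer to appear.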

The main objective of any screening technique is to accelerate the failure mechanisms and processes so that weak products—those that have intrinsic faults such as manufacturing defects—fail during testing. This eliminates the infant-mortality cases. Products that pass the screening test are regarded as being in their "useful life" phase of the familiar bathtub curve (Fig. 4).

A widely implemented screening technique is Environmental Stress Screening (ESS). In this procedure, environmental stresses are applied in an accelerated manner to force the failure of defective products. For electronic components that have a long, productive operating life, this technique is useful to screen components. Devices that survive ESS tests will perform well during their useful life phase until wear-out failure occurs.

Stresses used for ESS screening include random vibration, temperature cycling, thermal shock, high temperature, and electrical stimuli. A good ESS screen, as established by experience, constitutes the cyclic application of various temperatures at different rates for a specific number of times, with dwell times at different temperatures. Customized ESS screens, however, can be designed to reflect the operating environment in which the component has to work.

In ESS, subjecting the product to very high and very low temperatures at a fast rate accelerates failure mechanisms. The rapid temperature change creates stresses in the product due to the different thermal coefficients of expansion for the materials used. Plus, it triggers temperature-dependent failure mechanisms. The rate of temperature change is about 5°C to 10°C per minute between 10°C to 70°C, with a dwell time at each temperature limit of around 30 minutes.
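To get a feel for how long such a screen takes, the sketch below estimates the duration of one temperature cycle from the ramp rate, the temperature limits, and the dwell time. The function name is hypothetical, and the figures passed in are simply the ones quoted above as printed.

```python
def ess_cycle_minutes(t_low_c, t_high_c, ramp_c_per_min, dwell_min):
    """Duration of one full cycle: ramp up, dwell, ramp down, dwell."""
    ramp_time = (t_high_c - t_low_c) / ramp_c_per_min
    return 2 * ramp_time + 2 * dwell_min


# Limits and dwell time as quoted in the text, at both ends of the ramp-rate range.
for rate in (5.0, 10.0):
    minutes = ess_cycle_minutes(10.0, 70.0, rate, 30.0)
    print(f"{rate:.0f}°C/min ramp: {minutes:.0f} minutes per cycle")
```

With these numbers, the dwell periods rather than the ramps dominate the cycle time.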

Component-Level Screening Methods

Before qualifying for use in high-reliability systems, components undergo three major types of testing: environmental, physical, and electrical characteristics. The widely used MIL-STD-202F outlines the test procedures involved. For electronic components, the burn-in standards are based on MIL-STD-883, MIL-STD-750, and MIL-S-19500.

The salient features of the tests related to thermal effects are:

High-temperature burn-in. For burn-in testing, the device is subjected to temperature stress in the range for which it's rated (that is, 70°C for commercial devices and 125°C for military devices) for a duration ranging from 24 to 168 hours. The device is tested functionally by powering it up and applying test patterns so the testing is dynamic. The purpose here is to eliminate marginal devices that have manufacturing defects, including wire-bond defects, oxide-layer faults, and metallization defects. After burn-in testing, the component is tested electrically to determine whether its parameters have degraded and to study its characteristics. Burn-in testing at the device level is more economical than at the card or system level.

Temperature cycling. Exposing the devices to alternating periods of low and high temperature brings out defects like poor wire bonds, die-substrate attachment problems, cracks in the die, mismatch in the thermal coefficients of expansion from different construction materials, seal defects, and defects in the plastic packages. The temperature limits for cycling devices lie in the region of −40°C to 125°C or −65°C to 150°C, depending on the category of the device (industrial or military). The recommended number of cycles is 20 (at least 10). At temperature extremes, the dwell time should be at least 10 minutes. After cycling, the temperature is brought down to 25°C, and the device's electrical parameters are measured and compared with its rated specifications.

Storage at high temperature. In this test, the component is subjected to a higher temperature than with burn-in testing, but without applying power or stimuli. The temperature applied is around 150°C for plastic encapsulated devices and about 250°C for hermetically sealed devices. The component is kept at the test temperature for 24 hours. This test can reveal a variety of problems which include moisture entrapment, metallization defects, bulk defects in the silicon, ionic contamination, surface defects, oxidation, and contact defects.

Life at elevated temperature. This test evaluates how a part operates at an elevated ambient temperature for a fixed length of time. The test checks the device's mechanical and electrical properties while it performs its function. Capacitance, dielectric strength, insulation resistance, and surge current are some of the parameters affected when temperature increases.

Thermal shock. This test is performed to evaluate the resistance of a component to extreme temperatures and cyclic exposure to extreme temperatures. A number of problems can spring up. For instance, cracking of surfaces, delamination, rupturing of seals, change in electrical characteristics, or leakage of filler materials (like electrolyte in a capacitor) might result.

Resistance to soldering heat. This test determines how well components withstand exposure to heat during soldering and cleaning processes. Soldering heat may cause a change in electrical characteristics, damage to mechanical parts (such as loosening of terminations), softening of insulation, opening of solder seals, and weakening of mechanical joints. For this test, the solder bath, in which the component is immersed for 10 ±2 seconds, is maintained at a temperature of 260 ±5°C. This is the preferred test for pc-board through-hole mounted components.

Basic Thermal-Resistance Concepts

The flow of heat can be modeled in a manner similar to current flow in an electrical circuit. The path from the junction of a device through the die and encapsulation to the ambient forms a chain of thermal resistances, connected in series, that oppose the transfer of heat.

Manufacturer datasheets provide information about thermal resistance from device junction to case and from device junction to ambient. Thermal resistance is denoted by R_θ, and its unit of measurement is °C/W. For example, when a device's thermal resistance from junction to case is 1.2°C/W, the temperature differential between the device junction and its case will be 1.2°C for a 1-W power dissipation.

The mathematical definition of thermal resistance is:

R_θ = ΔT/P_D

where ΔT is the temperature difference in °C, and P_D is the power, in watts, dissipated by the device during operation. Rearranging the above equation yields:

ΔT = P_D × R_θ

Thermal resistances exist between different interfaces—between junction and case, between case and heatsink, and between heatsink and ambient. These values are specified on the device datasheets and in the heatsink catalogs.

Thermal device parameters are analogous to electrical device parameters: temperature is like voltage, thermal resistance is like electrical resistance, thermal power dissipated is like electrical current, and ambient temperature is like the reference ground in an electrical circuit. The concept of thermal resistance modeled along the lines of an electrical circuit is shown in Figure 5.

The thermal resistance from junction to ambient (R_θ_JA) is the sum of the thermal resistances of the paths involved: thermal resistance from junction to case (R_θ_JC), plus thermal resistance from case to heatsink (R_θ_CS), plus thermal resistance from heatsink to ambient (R_θ_SA).

R_θ_CS from case to heatsink depends on the thermal conductivities of the interface surfaces, such as insulating washers if any are used. R_θ_JC from junction to case depends on operating currents, voltages, and temperature. R_θ_SA from heatsink to ambient varies with temperature and is lower at high temperatures.

The objective of good thermal design is to achieve the minimum possible thermal resistance from the device junction to the ambient for heat to transfer efficiently from the junction to the ambient. Therefore, the values of the individual contributing thermal resistances must be as small as possible.

You can apply the concept of thermal resistance to estimate the junction temperature of a device during operation. Simply add the reference ambient temperature and the P_D × R_θ product (which gives the ΔT):

T_J = T_A + P_D × R_θ

where T_J is the junction temperature, T_A is the ambient temperature, P_D is the power dissipation of the device mounted on the heatsink, and R_θ is the thermal resistance from junction to ambient for the device.
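The series-resistance model lends itself to a short calculation. The Python sketch below sums the individual thermal resistances and applies the junction-temperature equation. The function name is hypothetical, and apart from the 1.2°C/W junction-to-case figure used earlier, the example values are made up for illustration.

```python
def junction_temperature(t_ambient_c, power_w, thermal_resistances_c_per_w):
    """T_J = T_A + P_D * (sum of series thermal resistances)."""
    r_theta_ja = sum(thermal_resistances_c_per_w)
    return t_ambient_c + power_w * r_theta_ja


# Illustrative values: R_θ_JC = 1.2, R_θ_CS = 0.5, R_θ_SA = 10.0 (all in °C/W),
# with 2 W dissipated at a 25°C ambient.
t_j = junction_temperature(25.0, 2.0, [1.2, 0.5, 10.0])
print(f"Estimated junction temperature: {t_j:.1f}°C")  # prints 48.4°C
```

Because the resistances simply add, the largest one in the chain (often the heatsink-to-ambient term) sets most of the temperature rise.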

You can use a heatsink to dissipate a device's power in the form of heat. This way, the temperature remains within specified limits. Heatsinks dissipate heat by the fundamental modes of heat transfer—conduction, convection, and radiation. They're available in all shapes and sizes to suit various device packages. When choosing a heatsink, the guiding principle is to select the one that has the largest surface area for a given volume. The material used to make the heatsink should also have a high thermal conductivity, ease of shaping into different configurations, ease of machining, wide availability, and above all else, low cost.

Aluminum Extrusions Work Well

Aluminum meets these broad guidelines, and it is the material of choice. Extrusions of aluminum are shaped into various configurations for use as heatsinks, usually with finned shapes to increase the surface area for better heat emission to the ambient. Aluminum castings don't make good heatsinks because they're porous and can trap hot air. This reduces their ability to conduct heat efficiently. Moreover, cast aluminum is difficult to machine.

Small heatsinks, which must fit snugly onto small device packages such as TO-18 and TO-39, are made of a beryllium-copper alloy. This material has a spring-like property that helps maintain tight coupling between the heatsink and the device, even after the repeated thermal expansion and contraction of multiple heating and cooling cycles. This spring-like property, along with its high thermal conductivity, makes the beryllium-copper alloy the material of choice for small heatsinks that are generally snap-fitted to the device.

The way a device is attached to a heatsink determines how efficiently heat transfers from device to heatsink. Avoiding air gaps and voids at the device-heatsink boundary improves heat transfer by reducing the interface thermal resistance.

Sometimes more than one device shares the same large heatsink. In these cases, an insulating washer placed between the heatsink and the device maintains isolation of the devices from each other and from the chassis. The washer generally consists of mica, rubber, plastic, or a similar insulating material.

Another insulating technique is to mount the devices directly on heatsinks made of anodized aluminum, which is coated with aluminum oxide for good electrical insulation. But this practice isn't recommended. Any scratch or surface blemish caused during assembly or later could damage the oxide film and create an unwanted electrical connection between device and heatsink.

To improve thermal conductivity between device and heatsink, a thin layer of thermal grease is generally applied on the contacting surfaces. Such compounds contain a metal powder in a greasy medium. They fill up the surface pores caused by imperfections in the contacting surfaces, resulting in a larger contact surface area and better heat transfer. Synthetic-grease materials have better properties than silicone-oil-based compounds.

In recent years, thermally conductive insulating pads have replaced thermal grease and its associated mess. Aside from providing good electrical isolation between heatsink and device, these flexible pads are compressible and fit snugly between the device and the heatsink.

Heatsink color also is an important factor. Black is preferable because a black surface is a good absorber and radiator of heat. In addition, a dull, nonreflecting matte finish prevents the reflection of incident thermal radiation and absorbs heat better.

The best way to mount a heatsink is to keep it vertical in a system cabinet with vent holes at the top and bottom to achieve a good convection current flow. Cooler air gets sucked in at the bottom and becomes hotter as it passes up through the holes at the top.

Convection Is Often Sufficient

Natural convection cooling is sufficient when the system doesn't have to dissipate a large amount of heat and the acceptable temperature limit can be maintained without fans or blowers. Both the amount of heat that can be naturally dissipated and how much temperature can be reduced depend upon air velocity, volume of air circulated, heatsink surface area, and the ambient temperature, among other factors.

A fan increases the cooling rate by speeding up air circulation. One is only needed in equipment where a large amount of heat must be continuously carried away at a fast rate. Apart from the power the fan needs, the added cost and space requirements are other points to consider when deciding whether fan-based cooling is necessary.

Suppose you must select a heatsink for a TO-220 transistor being used in a power-supply circuit. Assume that P_D is 10 W under peak load conditions, the device is mounted on its heatsink with a 2-mil mica insulating washer, the contacting surfaces are coated with thermal grease, and T_A is 50°C. The transistor's datasheet specifies that typically R_θ_JC is 1.7°C/W and R_θ_CS is 1.6°C/W.

The R_θ_SA of the heatsink picked must be low enough to prevent T_J from going beyond its 125°C maximum and causing device failure at the high ambient temperature of 50°C. The thermal equation is:

(T_J − T_A) = P_D(R_θ_JC + R_θ_CS + R_θ_SA)

T_J = (R_θ_JC + R_θ_CS + R_θ_SA)P_D + T_A

Substituting the device datasheet values of T_J = 125°C, T_A = 50°C, P_D = 10 W, R_θ_JC = 1.7°C/W, and R_θ_CS = 1.6°C/W yields:

125 > [(1.7 + 1.6 + R_θ_SA)(10) + 50]

R_θ_SA < {[(125 − 50)/10] − 3.3}

R_θ_SA < 4.2°C/W

Therefore, the heatsink must have a thermal resistance of less than 4.2°C/W to prevent device junction temperature from exceeding 125°C, without any forced-air cooling. A variety of commercial heatsinks are available to meet common application requirements, and custom heatsinks can be designed for special cases.
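The heatsink selection above can be sketched in a few lines of Python. The function name is hypothetical; the inputs are the values from the TO-220 example.

```python
def max_heatsink_resistance(t_j_max_c, t_ambient_c, power_w, r_jc, r_cs):
    """Largest R_θ_SA (°C/W) that keeps the junction at or below t_j_max_c.

    Derived from T_J = T_A + P_D * (R_θ_JC + R_θ_CS + R_θ_SA).
    """
    total_allowed = (t_j_max_c - t_ambient_c) / power_w
    return total_allowed - (r_jc + r_cs)


# TO-220 example: T_J(max) = 125°C, T_A = 50°C, P_D = 10 W,
# R_θ_JC = 1.7°C/W, R_θ_CS = 1.6°C/W.
r_sa_limit = max_heatsink_resistance(125.0, 50.0, 10.0, 1.7, 1.6)
print(f"R_θ_SA must be below {r_sa_limit:.1f}°C/W")  # prints 4.2°C/W
```

A negative result from this function would mean that no heatsink alone can meet the requirement, and that the power dissipation, ambient temperature, or interface resistance has to change.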

While electronic circuit designers always look at the current, voltage, power, and timing specifications of a device, they seldom consider the thermal specifications in detail, except for total power dissipation. Let's look more closely at the thermal specs that can help to achieve a better thermal design. The datasheet for a TIP-120 power transistor lists the following thermal characteristics:

P_D = 65 W for T_C = 25°C (case), derated at 0.52 W/°C above 25°C
P_D = 2 W for T_A = 25°C, derated at 0.016 W/°C above 25°C
T_J (operating) and T_STG (storage) junction temperatures range from −65°C to 150°C
R_θ_JC = 1.92°C/W
R_θ_JA = 62.5°C/W

The 65-W P_D rating applies only as long as the case temperature remains at 25°C. If the device is operated in free air without a heatsink, P_D is limited to 2 W at an ambient temperature of 25°C.

Likewise, the other specifications must also be derated by their appropriate factors if you expect the case temperature or the ambient temperature to be higher than 25°C. For example, if T_A = 35°C:

P_D = 2 W − [0.016 × (35 − 25)]

P_D = 1.84 W

The device can handle just 1.84 W of power at an ambient temperature of 35°C if it's mounted in free air without a heatsink.
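The linear derating used here is easy to capture in code. The sketch below uses a hypothetical function name and the TIP-120 figures listed above, and assumes the rating is flat up to 25°C and never falls below zero.

```python
def derated_power(p_rated_w, derate_w_per_c, temp_c, ref_temp_c=25.0):
    """Power rating after linear derating above the reference temperature."""
    if temp_c <= ref_temp_c:
        return p_rated_w
    return max(0.0, p_rated_w - derate_w_per_c * (temp_c - ref_temp_c))


# Free-air rating: 2 W at T_A = 25°C, derated at 0.016 W/°C.
print(f"{derated_power(2.0, 0.016, 35.0):.2f} W at T_A = 35°C")  # prints 1.84 W

# Case-referenced rating: 65 W at T_C = 25°C, derated at 0.52 W/°C.
print(f"{derated_power(65.0, 0.52, 90.0):.1f} W at T_C = 90°C")
```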

You can calculate the power dissipation for any ambient temperature:

P_D = (T_J − T_A)/R_θ_JA

For T_J = 150°C and T_A = 50°C:

P_D = (150 − 50)/62.5 = 1.6 W

Similarly, when the device is mounted on a heatsink, the power dissipation at a case temperature of 90°C is:

P_D = (T_J − T_C)/R_θ_JC

P_D = (150 − 90)/1.92

P_D = 31.25 W

This is the maximum allowed power dissipation. If you have to keep the junction cooler at, say, T_J = 120°C, the allowable P_D becomes:

P_D = (T_J − T_C)/R_θ_JC

P_D = (120 − 90)/1.92

P_D = 15.625 W

which is half of the allowable P_D when T_J can rise up to 150°C.
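The three power-dissipation results above all come from the same relation, so one small function covers them. This is a minimal sketch with a hypothetical function name, using the TIP-120 values from the text.

```python
def allowable_power(t_junction_c, t_reference_c, r_theta_c_per_w):
    """P_D = (T_J - T_ref) / R_θ, where T_ref is the ambient or case temperature."""
    return (t_junction_c - t_reference_c) / r_theta_c_per_w


R_THETA_JA = 62.5  # °C/W, junction to ambient (no heatsink)
R_THETA_JC = 1.92  # °C/W, junction to case

print(f"{allowable_power(150.0, 50.0, R_THETA_JA):.2f} W")   # free air, T_A = 50°C
print(f"{allowable_power(150.0, 90.0, R_THETA_JC):.3f} W")   # T_C = 90°C, T_J = 150°C
print(f"{allowable_power(120.0, 90.0, R_THETA_JC):.3f} W")   # T_C = 90°C, T_J = 120°C
```

Halving the junction-to-case temperature difference from 60°C to 30°C halves the allowable dissipation, which is why a modest reduction in the junction limit costs so much power headroom.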

You have to select the appropriate heatsink to maintain the junction temperature within the limits that you impose in the design. Databooks generally give power derating curves for various case temperatures. Use these graphs to pick the derating factor for different temperatures.

To achieve optimal thermal performance, thermal design of an electronic circuit or a system should be an integral part of the design cycle. It's best to address thermal design in the initial design phase of the product development cycle, when thermal management costs the least. If thermal-mode failures of components and systems aren't identified until a later stage, it will be difficult to implement the changes needed to correct the problems.

Choose the component that meets your application requirements and the operating thermal environment. Use components with a history of good endurance based on screening test results. Because the reliability of semiconductor devices is higher when they're operated at lower junction temperatures, device parameters should be derated adequately to avoid operating devices near the limits of their specifications.

Additionally, even under worst-case conditions, device junction temperature limits must not be exceeded. Furthermore, adequate heatsinking should be provided where necessary to keep the device temperatures low, and ventilation must be sufficient for better air circulation around heatsinks and hot devices.

Forced Air May Be Needed

High-power equipment that has to operate continuously may require the use of forced-air cooling. As a general guideline, it's recommended that device-cooling accessories, such as heatsinks, limit the junction temperature rise of semiconductor devices to within about 125°C. The lower the junction temperature, the better the conditions are for device reliability.

A word of caution: overspecifying heatsink requirements will increase cost without significantly reducing temperature. Similarly, beyond a certain limit, increasing fan speed and, consequently, air speed in a forced-convection cooling system won't greatly improve the cooling rate.

The aim of good thermal design is to achieve optimum cooling at minimum cost. The following steps will help you attain this goal:

Good thermal design begins with good board design. By selecting appropriate components and placing them properly on the board, you will avoid thermally induced failures in heat-sensitive components. Heat-generating components, like power devices and power resistors, should be mounted away from heat-sensitive components, such as electrolytic capacitors.
Provide adequate heatsinking for power-dissipating components and baffles, if necessary, for better air circulation.
Because hot air is less dense and rises, cool air should be passed from the bottom so it gets heated and rises. Mount cooling fans at the bottom of cabinets and provide holes near the top to ensure a chimney effect.
Keep air inlets and outlets away from each other to prevent hot air from getting sucked up into the chassis through the cooling fan.
Any dust filters used in the system cabinet should be kept clean to ensure proper air passage.
The key to better heat disposal is employing a heatsink with a large surface area and lower thermal resistance to the ambient.
Cooling accessories like heatsinks and fans should be specified during initial design. They can't be added later as an afterthought. Such retrofits will have space constraints and limited impact on achieving better air circulation.
Derate your device's thermal specifications, depending on your application and the degree of reliability to be achieved. An 80% derating factor is a good guideline for the junction temperature in °C and the power-dissipation rating in watts.
Use a thermal simulation package, if it's available, to estimate the thermal profile of your board. Conduct a thermographic study of your prototype under actual operation to expose any thermal problems. In a thermographic study, thermal imaging equipment, like an infrared system, creates a graphic thermal-profile picture of the heat distributions throughout a powered-up system, such as a pc board.
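As an illustration of the derating guideline in the list above, the sketch below applies an 80% factor to the TIP-120 ratings quoted earlier. It assumes the factor is applied directly to the rated values, as the guideline describes; the function name is hypothetical.

```python
def derate(rated_value, factor=0.8):
    """Design limit obtained by multiplying a rated value by a derating factor."""
    return rated_value * factor


t_j_design = derate(150.0)  # rated maximum junction temperature, °C
p_d_design = derate(65.0)   # rated power dissipation at T_C = 25°C, W

print(f"Design junction limit: {t_j_design:.0f}°C")  # prints 120°C
print(f"Design power limit: {p_d_design:.0f} W")     # prints 52 W
```

The 120°C result matches the cooler junction limit used in the earlier power-dissipation example.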

When all is said and done, you must achieve the right tradeoff between cooling requirements and the economics involved in your design, bearing the reliability factor in mind.

Currently there's a growing trend toward developing miniature electronic systems-on-a-chip in shrinking packages operating in an environment of high levels of thermal stress. Obviously, accomplishing high operational reliability will be a challenge for electronic designers in the new millennium.

Bibliography

"Attaching Heatsinks to Components," Susan Crum, Electronic Packaging & Production, July 1997
Cooling Techniques for Electronic Equipment, Dave S. Steinberg, John Wiley & Sons, 1991
"How to Select a Heatsink," Seri Lee, Electronics Cooling, Vol. 1, No. 1, June 1995
"Reliability Prediction of Electronic Equipment, MIL-HDBK-217F"
"Test Method Standards for Microcircuits, MIL-STD-883E"
"Test Methods for Electronic and Electrical Component Parts, MIL-STD-202F"
Thermal Stress and Strain in Microelectronics Packaging, John H. Lau, Van Nostrand Reinhold, 1993