Use Thermal Analysis And Other Types Of Simulation To Craft A “Cool” Design

Thermal analysis is critical in determining potential failure mechanisms in ICs, packages, and boards. What is often less well understood is how thermal analysis ties in with electrical and electromagnetic analyses. This article digs into those dependencies.

Byron Blackmore

Sept. 16, 2011



Fig 1. This image shows the temperature distribution on the top surface of a USB memory stick PCB.

Fig 2. Shown is an isolated view of thermal bottleneck distribution in the top layer of the memory stick PCB shown in Figure 1. White and yellow areas indicate where heat flows are most constrained.

Fig 3. Shown is a typical neck-down area on a power distribution net. The net in this view is color-graded by current density.

Fig 4. A large heatsink in a power amplifier module can mechanically stress the board and its connections.

The primary function of thermal analysis is to predict the temperatures of components and parts within a product. By visualizing these temperatures, along with heat fluxes, thermal bottlenecks, and missed shortcut opportunities, the designer can find and eliminate thermal compliance issues.

These temperature predictions are important to other analysis disciplines as well, as many real-world engineering materials are known to have temperature-dependent thermo-physical properties. For example, copper’s electrical resistivity increases with increased temperature even within common design temperature ranges. Temperature effects can therefore be critically important to the electrical design, especially for power distribution, signal integrity, and timing considerations.

Moreover, there may be tradeoffs when deciding what is good for thermal performance and what is good for the rest of the design. Thermal analysis results, then, can influence other forms of analysis by forcing design tradeoffs and compromises.

Thermal Analysis Moves Ahead

For the past 20 years, computational fluid dynamics (CFD) techniques have provided 3D conjugate thermal simulation results that predict and display temperatures in and around electronic product designs. Thermal designers routinely use predicted temperatures to judge thermal compliance, simply by comparing the simulated temperatures to maximum rated operating temperatures.

If the operating temperature exceeds the maximum rated value, there will be at least a potential degradation in the performance of the packaged IC and at worst an unacceptable risk of thermo-mechanical failure. These techniques are commonplace today, with widespread adoption all across the electronics sector including heavy usage in semiconductors, telecommunications, automotive, aerospace, and consumer products.
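The compliance check described above amounts to a per-component comparison. As a minimal sketch, assuming simulated temperatures and rated maximums are available as plain dictionaries (the function names and the component labels in the usage note are hypothetical, not taken from any particular tool):

```python
def compliance_margins(simulated_c, rated_max_c):
    """Return {component: margin in degC}. A negative margin means the
    simulated temperature exceeds the maximum rated operating temperature."""
    return {name: rated_max_c[name] - temp for name, temp in simulated_c.items()}


def violations(simulated_c, rated_max_c):
    """Return the sorted names of components that exceed their rating."""
    margins = compliance_margins(simulated_c, rated_max_c)
    return sorted(name for name, margin in margins.items() if margin < 0)
```

For instance, calling `violations({"U1": 92.0, "U2": 71.5}, {"U1": 85.0, "U2": 105.0})` flags only the hypothetical component `U1`, whose simulated 92°C is above its 85°C rating.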

The typical means of visualizing the predicted temperature field for a printed-circuit board (PCB) provides useful information (Fig. 1). However, the latest advances in thermal simulation also offer the calculation and display of thermal bottlenecks and shortcut opportunities (Fig. 2). These offer insight into the reasons why certain temperature distributions occur and how best to resolve thermal issues.

Electronic Materials and Temperature

Frequently the variation in thermo-physical properties for a substance is large enough across the expected temperature range to be a first-order design effect. A common example is the thermal conductivity of silicon, which decreases by approximately 20% as temperature increases from 350 K (~77°C) to 400 K (~127°C).

Of course, this has the tendency to exacerbate thermal problems at the die level. The hotter the die becomes, the more difficulty heat has in exiting the die due to the lower thermal conductivity value. This effect is often described as a “thermal runaway” scenario.

Copper is used extensively in the electronics industry, and it too can have key thermo-physical property changes over the expected range of operating temperatures. For example, the electrical resistivity of copper increases approximately 4% for every 10°C temperature rise within typical temperature ranges. That equates roughly to a 32% variation in resistivity over an 80°C span of temperatures.
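The arithmetic behind the 32% figure can be sketched with a linear resistivity model. The coefficient of 0.004 per °C restates the 4% per 10°C rule of thumb; the 20°C reference point and the reference resistivity value are assumptions chosen for illustration, and the percentage is measured relative to the low end of the span.

```python
# Linear temperature model for copper resistivity (illustrative sketch).
RHO_20C = 1.68e-8    # ohm*m, assumed handbook value for copper at 20 degC
ALPHA_PER_C = 0.004  # 1/degC, i.e. roughly 4% per 10 degC


def copper_resistivity(temp_c, rho_ref=RHO_20C, t_ref_c=20.0, alpha=ALPHA_PER_C):
    """Resistivity in ohm*m at temp_c, linearized about t_ref_c."""
    return rho_ref * (1.0 + alpha * (temp_c - t_ref_c))


def relative_change(t_low_c, t_high_c):
    """Fractional resistivity increase going from t_low_c to t_high_c."""
    return copper_resistivity(t_high_c) / copper_resistivity(t_low_c) - 1.0


if __name__ == "__main__":
    # 80 degC span starting at the reference temperature
    print(f"{relative_change(20.0, 100.0):.1%}")
```

Because the model is linear about the reference temperature, the same 80°C span starting from a hotter point gives a slightly smaller percentage, which is one reason the figure is only a rough guide.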

This has a big effect on the dc resistance of the copper in the board and significantly impacts the voltage drop and current density within the board. As current density and resistivity directly cause joule heating effects, temperature impacts the power distribution, and the power distribution impacts temperature.

This strong interaction is one of the greatest design challenges in modern PCB design, as it adds complexity to any attempt to provide enough metal on the board for dc current needs. “Neck-downs” in the power distribution network will cause locally increased current density and will induce large joule heating terms, elevated temperatures, and associated changes in electrical resistivity.

Consider a typical neck-down on a power distribution plane (Fig. 3). A neck-down may be a narrow section of a plane, a via that is connecting the power supply to the plane or two planes together, or a narrow trace that is expected to carry tens of amperes.

Such neck-downs can, in severe cases, act like fuses that can lead to disconnected power situations and even mechanical failures. At the very least, these neck-downs cause a rise in temperature on the board. The temperature rise depends on how the surrounding metal is connected. Prediction of the rise requires a detailed thermal simulation of the board.
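A detailed simulation is needed for real boards, but the feedback loop between joule heating and resistance can be illustrated with a lumped model. The sketch below assumes a single neck-down with a known resistance at a reference temperature and a single thermal resistance to ambient (`theta_c_per_w`, a hypothetical parameter that stands in for everything a full thermal simulation would resolve). It iterates until temperature and resistance agree.

```python
def neckdown_temperature(current_a, r_ref_ohm, theta_c_per_w,
                         t_amb_c=25.0, t_ref_c=20.0, alpha=0.004,
                         tol=1e-6, max_iter=200):
    """Fixed-point solve for the steady temperature of a copper neck-down.

    Resistance rises linearly with temperature (alpha per degC), and the
    joule heating I^2 * R raises the temperature through theta_c_per_w.
    Returns (temperature in degC, dissipated power in W).
    """
    temp = t_amb_c
    for _ in range(max_iter):
        resistance = r_ref_ohm * (1.0 + alpha * (temp - t_ref_c))
        power = current_a ** 2 * resistance
        new_temp = t_amb_c + theta_c_per_w * power
        if abs(new_temp - temp) < tol:
            return new_temp, power
        temp = new_temp
    raise RuntimeError("no convergence: heating feedback may be too strong")
```

With illustrative values such as 20 A through 1 mΩ and 40°C/W to ambient, the loop settles within a few iterations at a temperature somewhat higher than a calculation that ignores the resistance change would give. The iteration only converges when the product of thermal resistance, current squared, reference resistance, and `alpha` is below one.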

Both power and temperature affect transistor operation. In fact, transistor performance is usually partitioned into PVT corner cases that map variations in process, voltage, and temperature. The PCB design greatly affects voltage and temperature.

Standard I/O buffer models known as IBIS models are used in system simulations to characterize buffers by using I-V (current-voltage) and V-t (voltage-time) tables for each of the different PVT corners. For example, the maximum corner for a CMOS buffer has I-V and V-t tables that correspond to fast process, high voltage, and low temperature. It is important to consider all these factors to properly account for the many I/O buffer performance variations that can arise.

At IC process technologies of 90 nm and below, leakage currents begin to cause appreciable additional heat sources. These leakage currents have a non-linear, increasing relationship with temperature. At these process scales, the temperature is required to evaluate the power dissipation, and vice versa, making the inclusion of this relationship in the thermal management scheme a necessity.
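The mutual dependence of leakage power and temperature can be sketched as another fixed-point problem. The exponential leakage model, its coefficient `k_per_c`, the junction-to-ambient resistance `theta_ja_c_per_w`, and the temperature limit are all assumptions made for illustration; real leakage characteristics come from the process and device data.

```python
import math


def solve_die_temperature(p_dyn_w, p_leak_ref_w, theta_ja_c_per_w,
                          t_amb_c=25.0, t_ref_c=25.0, k_per_c=0.02,
                          t_limit_c=200.0, tol=1e-6, max_iter=1000):
    """Iterate die temperature and leakage power to a consistent solution.

    Leakage is modeled as p_leak_ref_w * exp(k_per_c * (T - t_ref_c)).
    Returns the converged temperature in degC, or None if the iteration
    exceeds t_limit_c or fails to converge (no stable operating point).
    """
    temp = t_amb_c
    for _ in range(max_iter):
        p_leak = p_leak_ref_w * math.exp(k_per_c * (temp - t_ref_c))
        new_temp = t_amb_c + theta_ja_c_per_w * (p_dyn_w + p_leak)
        if new_temp > t_limit_c:
            return None  # temperature keeps climbing past the limit
        if abs(new_temp - temp) < tol:
            return new_temp
        temp = new_temp
    return None
```

A modest case such as 2 W dynamic power, 0.5 W reference leakage, and 10°C/W converges to a stable temperature, while raising both the leakage and the thermal resistance enough makes the function return `None`, which corresponds to the situation where neither power nor temperature can be evaluated without the other.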

Temperature also has a powerful effect on mechanical stress and strain. Most materials experience a significant decrease in Young’s modulus (e.g., a drop of approximately 20% from 50°C to 100°C for Sn-3.5Ag solder) and a decrease in yield stress with an increase in temperature, as well as an associated rise in the coefficient of thermal expansion for that material. Including temperature effects during stress analysis simulation is critical to properly predict thermo-mechanical failure and reliability metrics.

Be aware of these temperature effects throughout the design process. The temperature dependence of physical properties in common electronic materials means thermal design must be coordinated with the power distribution, signal integrity, and mechanical failure analyses. In addition, many thermal issues stem from the temperature dependency of material properties in other analysis and design disciplines.

Electrical and Thermal Tradeoffs

Component placement commonly involves tradeoffs between thermal and electrical disciplines. The ideal placement from an electrical perspective often is the least desirable from a thermal management perspective.

A prime example is the placement of components as closely together as possible for purely electrical reasons. Shorter connections from pin to pin are generally good from a signal integrity standpoint. It is common practice to constrain routing with maximum allowable distances for particular connections, a direct result of prioritizing electrical considerations.

But this may conflict with the thermal management ideal. Placing components close together results in increased power density locally and leads to elevated temperatures among all the components in a group. When components are grouped tightly to improve electrical performance, the “thermal victim” effect may appear in components that would not otherwise pose a thermal challenge.

A second example is the thermal rule of thumb claiming that components with the largest thermal management issues (high powers and power density) should be placed as near the leading edge of the board as possible to receive the coolest possible air in a forced convection cooling system. But this may be impractical from an electrical perspective when components from diverse functional partitions are grouped together for reasons of routing, timing, and signal considerations.

In the field of IC packaging, there is the design problem of hot spots (zones of elevated power density) aligned vertically in stacked-die devices. This can have a drastic effect on the peak silicon temperature as well as the temperature gradients present across the dice. Moving the hot spots so they do not stack vertically is a sound approach, but it can add electrical and manufacturing difficulties in packaging. Moreover, it requires careful planning in the functional partitioning of the active surfaces.

EMC And Thermal Tradeoffs

There are further design tradeoffs to consider in the field of electromagnetic compatibility (EMC) and its relationship to thermal issues. Here too, design proposals aimed at improved EM containment can detract from thermal performance.

This applies to many aspects of EM design. Consider a vent or perforated plate design. From the EMC perspective, each cooling vent in the chassis should have as small a free-area ratio as feasible. In fact, the best-case scenario is often to eliminate vents.

However, any reduction in the free-area ratio of a cooling vent will likely decrease the thermal performance of the design. A less open vent will impose more flow resistance through the system, reducing the amount of air that moves through the chassis, whether the movement is mechanical (fans or blowers) or buoyancy driven. A design compromise must be found that is acceptable to both design disciplines.

A second example is the use of shielding cans, which are metallic enclosures that envelop “noisy” components that pose particular difficulty for the EM design. While these cans can effectively attenuate the emissions from the components, they present additional challenges to the thermal design.

By placing a solid obstruction over a component, we are effectively removing a heat transfer avenue by reducing the convective heat transfer ability on the top of the component. This forces most of the heat to reach the ambient via the PCB, and changes to the local copper content and distribution in the form of fills and thermal vias may be needed to achieve satisfactory thermal performance.

A third example is found in heatsink design. Considering the thermal design only, a heatsink with more fins and more surface area is generally better at allowing heat to escape from a component. This isn’t always true of course, as heatsink geometry imposes an obstruction to air flow that must be considered, but for the purposes of this argument we’ll allow it.

Yet the EMC design may very well suffer as larger and larger heatsinks are proposed, as the heatsink may begin to serve as an “emissions antenna” and exacerbate EMC problems. The best heatsink for the thermal design may not be ideal for the EMC design.

Stress and Thermal Tradeoffs

The design of heatsinks can have further tradeoffs when considering mechanical stress as well. One common example of this is the size (and therefore the mass) of a heatsink design. Usually a bigger and heavier heatsink will yield better thermal performance than a smaller, lighter one (again, ignoring the complexity posed by reduced flow rates and questions of cost effectiveness).

The increased mass will cause more mechanical stress concerns, though. A heavier heatsink attached to a component may require additional mounting attachments. A vertically mounted heatsink may act as a “cantilever” with increased stress effects being observed at the heatsink attachment points (Fig. 4).

Further, the material selection can have important effects. Copper has excellent heat conduction characteristics (k ~ 400 W/m·K) and is often used when thermal management is the foremost concern. However, the use of copper may have drawbacks for the stress design. Copper’s coefficient of thermal expansion is about six times larger than that of silicon, which imposes more challenges in some cooling strategies (such as through-silicon vias), as the materials will tend to strain at greatly differing rates.
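The size of the expansion mismatch can be estimated with a one-line calculation. The sketch below assumes typical room-temperature handbook values for the two coefficients (about 17 ppm/°C for copper and 2.6 ppm/°C for silicon) and treats them as constant over the temperature swing, which is a simplification.

```python
# Assumed typical handbook values near room temperature, in 1/degC.
CTE_COPPER = 17e-6
CTE_SILICON = 2.6e-6


def mismatch_strain(delta_t_c, cte_a=CTE_COPPER, cte_b=CTE_SILICON):
    """Unconstrained differential thermal strain between two bonded
    materials for a temperature change of delta_t_c (dimensionless)."""
    return (cte_a - cte_b) * delta_t_c


if __name__ == "__main__":
    # Differential strain for a 100 degC temperature swing
    print(f"{mismatch_strain(100.0):.2e}")
```

For a 100°C swing these values give a differential strain on the order of 0.1%, which the joint between the two materials must absorb as stress or deformation.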

Summary

Temperature predictions within electronic systems are still used primarily to compare the thermal performance of the design and judge thermal compliance. But it is vital to acknowledge the secondary effects of temperature on other design disciplines.

A design change targeted to improve thermal performance often will negatively impact another aspect of the design because the properties of copper, silicon, and other common materials have important—and differing—dependencies on temperature. These can complicate the design of power distribution, signal integrity, electrical timing, and mechanical stress solutions and should be built into the design as early as possible by utilizing thermal simulation techniques in parallel with other design flows.