Simulation Techniques Improve Electronics Cooling

Localized gridding, compact modeling and optimization methodologies speed thermal-management simulation.

Byron Blackmore, Applications Engineer, Flomerics, Marlborough, Mass.

Aug. 1, 2005

When physical testing shows that component junction temperatures are too high, the usual approach is to add heatsinks and increase fan performance. Unfortunately, this type of solution, even when it is possible, drives up the cost of the product and can increase noise, reduce reliability, and have a negative impact on electromagnetic compatibility or other aspects of the design. Computer simulation, which can be performed either before or after prototypes are available, frequently reveals that the root cause of the problem is inefficient use of the air being driven through the enclosure, something that is nearly impossible to detect with physical testing. This design example explains how the latest thermal-management simulation technology can be brought to bear on this problem.

Compact models of semiconductors, heatsinks and fans can substantially reduce the time required to build and solve the model without a significant loss of accuracy. Another aspect of electronics cooling simulation is gridding, the process of discretizing the model volume into many smaller volumes (cells) for computation. Gridding has recently been improved so that volumes of very different length scales can be placed next to each other efficiently. This “localized” gridding can provide high accuracy in critical areas while minimizing solution times. Optimization technology can automatically adjust design parameters to meet specified goals, such as satisfying thermal-management requirements while minimizing manufacturing cost.

Thermal performance is a critical aspect of nearly every electronics system design. But historically, it has been difficult to determine the thermal performance of a new design without actually building and testing it, which normally doesn't occur until relatively late in the design cycle. This helps explain why electronics cooling analysis has entered the engineering mainstream and is used today by most electronics OEMs and component suppliers from the early stages of the design process to ensure that thermal-management issues are identified when they can be inexpensively corrected.

But some companies that have used this technology have run into difficulties: long model-building times for complicated systems, long compute times for the simulations themselves, and the need to perform many design iterations by hand to optimize the design.

New Cooling Analysis Methods

More recently, developers of electronic cooling analysis software have addressed these issues and provided tools that streamline the modeling, simulation and optimization processes.

The following design example illustrates some of these improvements while addressing the system-level thermal design of a 3U network server based on a real product. The chassis of the server shown in Fig. 1 is 26 in. long × 17 in. wide × approximately 5 in. high. The main components include four power supplies, two printed circuit (pc) boards, a fan tray with three axial fans, and pin-fin heatsinks. The boards include a motherboard carrying dual central processing units (CPUs) and dual memory-controller hubs (MCHs).

The chassis is configured for front-to-back airflow, so the air cools the motherboard components first and then the power supplies. IDF files define each of the pc boards, and a Pro/ENGINEER file defines the chassis and fan tray geometry.

The electronics cooling analysis process begins by importing these files into the FLOTHERM simulation software. Next, the power dissipation of each component is determined from the integrated circuit manufacturers' specifications. While this process is relatively straightforward, the make and model of the fan have not been specified yet. In this example, the mechanical design specifies the fan's physical size and rated flow rate but does not include a fan curve. The fan dimensions are derived from the form factor (that is, the fan has to fit within the height of the box). For this model, the height of the power supplies dictates that the server enclosure be a 3U-high chassis.

Fortunately, many fan manufacturers, such as JMC, Sunon, Comair Rotron and Delta, among others, have posted compact models of their most popular products on the SmartParts3D website. Each FLOTHERM fan model from www.SmartParts3D.com comes complete with a nonlinear fan curve, swirl settings and the correct geometry for the cowling and hub.
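The fan curve matters because a fan does not run at its rated flow; it runs where its pressure-versus-flow curve crosses the pressure drop of the enclosure. The minimal Python sketch below assumes a hypothetical quadratic fan curve (200 Pa at shutoff, 0.029 m³/s at free delivery) and a hypothetical system resistance Δp = k·Q² with k = 5 × 10⁵ Pa·s²/m⁶. Neither is data for any real fan or for this chassis; a real FLOTHERM fan model uses the manufacturer's tabulated curve instead.

```python
# HYPOTHETICAL illustration: the fan-curve and system-resistance coefficients
# below are made-up values, not data for any real fan or enclosure.


def fan_pressure(q, p_max=200.0, q_max=0.029):
    """Static pressure (Pa) a hypothetical fan develops at flow q (m^3/s).

    A simple quadratic stand-in for a tabulated, nonlinear fan curve:
    p_max at zero flow (shutoff), falling to 0 at free delivery q_max.
    """
    return p_max * (1.0 - (q / q_max) ** 2)


def system_pressure_drop(q, k=5.0e5):
    """Pressure drop (Pa) through the enclosure, modeled as k * q^2."""
    return k * q * q


def operating_point(q_max=0.029, tol=1e-12):
    """Flow (m^3/s) where the fan curve meets the system curve, by bisection."""
    lo, hi = 0.0, q_max  # fan pressure exceeds the drop at 0 and falls below it at q_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fan_pressure(mid, q_max=q_max) > system_pressure_drop(mid):
            lo = mid  # fan still has surplus pressure, so the operating point is at higher flow
        else:
            hi = mid
    return 0.5 * (lo + hi)


q_op = operating_point()
print(f"operating flow: {q_op:.4f} m^3/s ({q_op / 0.029:.0%} of free delivery)")
```

With these made-up numbers the fan settles at roughly 57% of its free-delivery flow, which is the kind of shortfall the first-pass derating rule of thumb used later in the sizing calculation is meant to cover.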

Calculating Flow Rate

Hand calculations roughly determine the flow rate that is required to cool the chassis. The table lists the power levels dissipated by the major components in the design. From this data, the power dissipation of the entire system is determined to be 333 W.

At this point, a popular rule of thumb provides a ballpark figure for the required flow rate and helps get the design started: assume a certain bulk air temperature rise through the system (usually between 10°C and 15°C) and size the airflow to achieve it. Because this rule of thumb treats the problem from a bulk airflow perspective, it may or may not hold up once the system is considered in detail.

With a 10°C assumption, a simple energy-balance analysis can be applied to the server. Steady-state conservation of energy gives the following relationship:

Heat = air density × flow rate × specific heat of air × change in temperature

where heat is specified in watts (W), air density in kg/m³, flow rate in m³/s, specific heat of air in J/kg/K, and change in temperature in Kelvin (K).

With the known values inserted:

333 W = (1.16 kg/m³) × (flow rate) × (1005 J/kg/K) × (10 K)

yielding: Flow rate = 0.029 m³/s.
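The energy balance above is easy to check, or to rerun for other temperature rises, in a few lines of Python. This is a minimal sketch using the air properties quoted in the text; the function name is illustrative.

```python
AIR_DENSITY = 1.16          # kg/m^3, value used in the text
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg*K), value used in the text


def required_flow_rate(heat_w, delta_t_k, rho=AIR_DENSITY, cp=AIR_SPECIFIC_HEAT):
    """Steady-state energy balance, heat = rho * flow * cp * dT, solved for flow (m^3/s)."""
    return heat_w / (rho * cp * delta_t_k)


flow = required_flow_rate(333.0, 10.0)
print(f"required flow: {flow:.3f} m^3/s")  # required flow: 0.029 m^3/s

# The upper end of the 10 to 15 degree rule of thumb needs proportionally less air.
print(f"at 15 K: {required_flow_rate(333.0, 15.0):.3f} m^3/s")  # at 15 K: 0.019 m^3/s
```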

This value is the flow rate that must actually pass through the enclosure, not the rated flow rate of a fan. The pressure drop through the enclosure causes a fan to deliver significantly less than its rated flow rate. For first-pass fan sizing, it is common (and conservative) to assume that a fan will deliver 50% of its rated flow rate. Assuming that one of the three fans is there for redundancy, the remaining two fans must together deliver 0.029 m³/s, or 0.0145 m³/s each. Applying the 50% rule of thumb, each fan should therefore be rated for 0.029 m³/s, which is approximately 61 CFM.
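The sizing arithmetic above (redundancy plus the 50% delivery rule of thumb) can be captured in a small helper. This Python sketch is illustrative; the function name and parameter names are hypothetical, and the conversion uses 1 m³/s ≈ 2118.88 ft³/min.

```python
M3S_TO_CFM = 2118.88  # 1 m^3/s is about 2118.88 ft^3/min


def rated_flow_per_fan(required_flow, total_fans, redundant_fans, delivery_fraction):
    """Rated (free-delivery) flow each fan needs, in m^3/s.

    required_flow: flow the system must actually receive (m^3/s)
    redundant_fans: fans held in reserve (assumed not contributing)
    delivery_fraction: fraction of rated flow a fan is assumed to deliver
        against the enclosure's pressure drop (0.5 is the first-pass rule of thumb)
    """
    active_fans = total_fans - redundant_fans
    if active_fans <= 0:
        raise ValueError("need at least one active fan")
    return required_flow / active_fans / delivery_fraction


per_fan = rated_flow_per_fan(0.029, total_fans=3, redundant_fans=1, delivery_fraction=0.5)
print(f"each fan rated for {per_fan:.3f} m^3/s = {per_fan * M3S_TO_CFM:.0f} CFM")
# each fan rated for 0.029 m^3/s = 61 CFM
```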

By inputting the fan size and flow rate at the SmartParts3D website, a suitable fan may be identified (Fig. 2). In this case, a search reveals that the Delta FFB0912HHE fan meets these criteria (Fig. 3). Once a fan is identified, the model of the fan is dragged into the cooling model within FLOTHERM. A similar process is then used to model the heatsinks. A website search for heatsinks that match this design's size and thermal requirements leads to the selection of Alpha Novatech's UB40-25B pin-fin heatsinks, which are made of 6063-T6 aluminum.

With all the key components in place, the next challenge arises: how to simulate and optimize the design both accurately and quickly. The heatsinks and components are critical from a thermal point of view, so they are defined with a high level of geometric detail.

With traditional cooling analysis software, in order to maintain accuracy at these important points, it would be necessary to use a very fine grid throughout the entire solution domain. This would result in a model with a large number of cells that would require a considerable amount of time, probably several days, to provide results. Although waiting that long for a single analysis run might be acceptable, the problem is that engineers typically need to perform many analysis runs in order to solve problems and optimize the design.

Cooling analysis software vendors have overcome this problem by providing localized gridding capabilities that make it possible to generate independent but interconnected grids for different parts of the model. The latest technology makes it possible to define localized grid spaces that are nested, internally abutting or externally abutting. The externally abutting feature is particularly useful when two or more parts are in conduction contact but have geometric differences that would benefit from different grid refinement levels.
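To see why localized gridding matters, compare cell counts for a rough case. The Python sketch below uses the 26 × 17 × 5 in. chassis from the text but assumes hypothetical cell sizes (0.05 in. near the heatsink fins, 0.5 in. elsewhere) and two hypothetical 3 × 3 × 1.5 in. refined regions. It also assumes each refined box lines up with the coarse grid and ignores any transition cells a real tool would add.

```python
import math


def cells(extent, cell_size):
    """Cells of size cell_size needed to span extent; the epsilon absorbs float round-off."""
    return math.ceil(extent / cell_size - 1e-9)


def box_cells(dims, cell_size):
    """Cell count for a box of dimensions (x, y, z) meshed uniformly at cell_size."""
    nx, ny, nz = (cells(d, cell_size) for d in dims)
    return nx * ny * nz


DOMAIN = (26.0, 17.0, 5.0)   # chassis size from the text, inches
FINE = 0.05                  # HYPOTHETICAL cell size needed at the heatsink fins, inches
COARSE = 0.5                 # HYPOTHETICAL cell size adequate elsewhere, inches
LOCAL_BOX = (3.0, 3.0, 1.5)  # HYPOTHETICAL refined region around one heatsink, inches
N_BOXES = 2                  # ASSUMPTION: one refined region per CPU heatsink

# Traditional approach: the fine cell size applied to the whole domain.
uniform = box_cells(DOMAIN, FINE)

# Localized approach: coarse background, minus the coarse cells each local
# grid replaces, plus the fine cells inside each local grid.
localized = (box_cells(DOMAIN, COARSE)
             - N_BOXES * box_cells(LOCAL_BOX, COARSE)
             + N_BOXES * box_cells(LOCAL_BOX, FINE))

print(f"uniform fine grid: {uniform:,} cells; localized grids: {localized:,} cells")
```

With these assumed sizes the uniform fine grid needs 17,680,000 cells while the localized arrangement needs 233,464, more than 70 times fewer, which is why solution time drops so sharply.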

Reducing Solution Time

Most electronic systems are cluttered with hundreds, sometimes thousands, of objects, so the gridding system must be efficient in terms of both memory requirements and solution speed. In this case, localized grids are defined for the heatsinks and DIMMs because of their detailed geometry. This keeps the model to about 250,000 cells and the solution time to about 20 min on a single-CPU, 2.1-GHz computer.

The software automatically generates the grid in the form of rectangular (Cartesian) cells that meet at their vertices. Because every cell has this predefined shape, the solver only needs to keep track of an index for each cell rather than the absolute location, shape and orientation of each cell. This reduced overhead makes the Cartesian grid system much more efficient than unstructured grid options, resulting in much smaller memory requirements and faster solutions for typical electronics problems.
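The bookkeeping advantage of a structured Cartesian grid can be shown in a few lines. In this generic Python sketch (an illustration of the idea, not FLOTHERM's internal data layout; the class name is hypothetical), a cell's storage index and its face neighbors are computed from its (i, j, k) position alone, so no per-cell coordinates or adjacency lists are stored. An unstructured mesh would have to store that connectivity explicitly.

```python
class CartesianGrid:
    """Minimal structured (i, j, k) grid: connectivity is implied by the indices."""

    def __init__(self, nx, ny, nz):
        self.nx, self.ny, self.nz = nx, ny, nz

    def index(self, i, j, k):
        """Linear storage index of cell (i, j, k), with i varying fastest."""
        return i + self.nx * (j + self.ny * k)

    def ijk(self, n):
        """Inverse of index(): recover (i, j, k) from a linear index."""
        i = n % self.nx
        j = (n // self.nx) % self.ny
        k = n // (self.nx * self.ny)
        return i, j, k

    def neighbors(self, i, j, k):
        """Indices of the face neighbors, found by +/-1 arithmetic (no stored adjacency)."""
        result = []
        for di, dj, dk in ((-1, 0, 0), (1, 0, 0), (0, -1, 0),
                           (0, 1, 0), (0, 0, -1), (0, 0, 1)):
            a, b, c = i + di, j + dj, k + dk
            if 0 <= a < self.nx and 0 <= b < self.ny and 0 <= c < self.nz:
                result.append(self.index(a, b, c))
        return result


grid = CartesianGrid(4, 3, 2)
print(grid.index(1, 2, 1), grid.ijk(21), grid.neighbors(0, 0, 0))
```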

The results of simulating the initial concept design show MCH junction temperatures of 110°C at an ambient temperature of 40°C, about 10°C above the maximum rating shown in the table. At this stage, a logical question is why thermal failures occur when the design is receiving the airflow that was calculated by hand earlier. This question would be difficult to answer by building a physical prototype and taking measurements.
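Checking predictions against the ratings in the table is simple bookkeeping. In the Python sketch below, the limits come from the table, the dictionary keys are shortened names chosen for illustration, and the only predicted temperature used is the 110°C MCH result reported above.

```python
# Maximum rated junction temperatures from the table, degrees C.
TJ_MAX = {
    "CPU": 75.0,
    "MCH": 100.0,
    "ICH": 100.0,
    "DIMM": 100.0,
    "Network processor": 100.0,
    "5633 switching processor": 100.0,
    "5821 security processor": 85.0,
}


def over_limit(predicted):
    """Return {component: degrees over its T_J max} for components exceeding their rating."""
    return {name: t - TJ_MAX[name] for name, t in predicted.items() if t > TJ_MAX[name]}


# Only the MCH result is reported in the text: 110 C at 40 C ambient.
print(over_limit({"MCH": 110.0}))  # {'MCH': 10.0}
```

Lowering the MCH to 95°C, the 15°C improvement reported for the optimized design, leaves `over_limit({"MCH": 95.0})` empty.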

Given the problem, an immediate concern is whether or not airflow is being delivered efficiently inside the chassis. With a considerable amount of effort, it would be possible to measure the speed of airflow at a few points inside the chassis, and from there attempt to infer airflow direction. However, this is a difficult process that often doesn't tell the whole story.

Simulation, on the other hand, can provide a much clearer picture of what is happening inside the chassis with minimal additional effort. Graphically plotting the airflow speed and direction inside the chassis using the software reveals that the air is flowing far above the principal components and moving straight into the power supply, as shown in Fig. 4.

Optimizing the Design

One possible solution to this problem is to add heatsinks to the failing components. Without the simulation results, this would be a tempting but costly answer, because there would be no way to determine the root cause of the problem. With the benefit of the simulation results, however, it becomes clear that a baffle could be used to force the air where it is needed. But this raises a question: exactly what type of baffle gives the best results?

In the past, an engineer would typically sit down at a computer and model and solve as many different designs as he or she had time to pursue. And often it wasn't very many. In many cases, time constraints made it necessary to settle for the first design that provided “good enough” results, rather than aiming at true optimization.

In this case, however, new software capabilities automate the optimization process. The user defines which design parameters can be varied, such as component locations, the number of fins on a heatsink and the size or location of vents, along with the allowed range of each. The user also sets a design objective to be optimized, often referred to as a cost function, such as component temperatures, pressure drop, airflow rate or some combination of these. The design optimization software then searches for values of the design parameters that satisfy all constraints and give the optimum value of the cost function.

Instead of running every possible combination of values to explore the whole design envelope, which would take a great deal of CPU time, the software runs a relatively small number of variants, fits a response surface to the results and uses it to work out how to adjust the design variables for subsequent runs. This produces an optimized design in far less time than an exhaustive search would take.
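The text does not specify the response-surface model the software uses. As an illustration only, the Python sketch below fits a classic quadratic response surface by least squares to a handful of runs of one design variable and predicts where the optimum lies. The sample data are synthetic, generated from a made-up quadratic, and are not results from this server model.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit y ~ a + b*x + c*x^2 via the normal equations; returns (a, b, c)."""
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gaussian elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        coef[r] = (m[r][3] - sum(m[r][c] * coef[c] for c in range(r + 1, 3))) / m[r][r]
    return tuple(coef)


def surface_minimum(xs, ys, lo, hi):
    """Predicted best x within [lo, hi] according to the fitted quadratic."""
    a, b, c = fit_quadratic(xs, ys)
    if c > 0:
        x_star = -b / (2 * c)  # vertex of an upward-opening parabola
        return min(max(x_star, lo), hi)
    # Not convex: the lowest predicted value lies at one of the bounds.
    f = lambda x: a + b * x + c * x * x
    return lo if f(lo) <= f(hi) else hi


# SYNTHETIC stand-in for "sum of junction temperatures vs. baffle angle":
# a made-up quadratic with its minimum at 30 degrees, sampled at five angles.
angles = [1.0, 20.0, 45.0, 70.0, 89.0]
responses = [250.0 + 0.02 * (a - 30.0) ** 2 for a in angles]
best = surface_minimum(angles, responses, lo=1.0, hi=89.0)
print(f"predicted best angle: {best:.1f} deg")
```

A real optimizer would fit several variables at once and then solve the predicted point to confirm it, but the principle is the same: a few solved variants stand in for an exhaustive search.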

Here, the physical limitations of the chassis help define the design space for the baffle. The angle between the baffle and the roof of the enclosure can vary from 1 degree to 89 degrees, the length of the baffle from 2 in. to 3.5 in., and its location relative to the back of the box from 22.5 in. to 27.25 in. In addition, the position of the fan tray can vary from 29 in. to 31 in. from the back of the enclosure. The constraints are the maximum junction temperatures specified for each component. Within these constraints, the goal is to minimize the sum of selected component junction temperatures. With these parameters entered, the software is instructed to optimize the design.
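The setup just described can be written down as data plus an objective. In this Python sketch, the ranges come from the text, while turning the junction-temperature constraints into a penalty, and the penalty weight, are assumptions made for illustration; the names `Parameter`, `in_bounds` and `cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    """One design variable and its allowed range."""
    name: str
    lo: float
    hi: float


# Ranges as stated in the text (angle in degrees, lengths and positions in inches).
DESIGN_SPACE = [
    Parameter("baffle_angle_deg", 1.0, 89.0),
    Parameter("baffle_length_in", 2.0, 3.5),
    Parameter("baffle_position_in", 22.5, 27.25),
    Parameter("fan_tray_position_in", 29.0, 31.0),
]


def in_bounds(design):
    """True if every design variable lies inside its allowed range."""
    return all(p.lo <= design[p.name] <= p.hi for p in DESIGN_SPACE)


def cost(junction_temps, tj_max, penalty_per_deg=100.0):
    """Sum of selected junction temperatures, plus a penalty for each degree
    any component exceeds its rated maximum.

    ASSUMPTION: the penalty formulation and the weight of 100 per degree
    are illustrative choices, not the software's documented method.
    """
    total = sum(junction_temps.values())
    violation = sum(max(0.0, t - tj_max[name]) for name, t in junction_temps.items())
    return total + penalty_per_deg * violation


# The 110 C MCH result is 10 C over its 100 C limit, so it is heavily penalized;
# a design at 95 C pays no penalty.
print(cost({"MCH": 110.0}, {"MCH": 100.0}))  # 1110.0
print(cost({"MCH": 95.0}, {"MCH": 100.0}))   # 95.0
```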

The software first runs a series of 10 iterations that explore the design space to locate promising regions for further exploration during local optimization. In doing so, it finds one region of the design space that is far superior to the others (Fig. 5). It then sets up three more iterations in this region to perform local optimization. Finally, it runs five more iterations to confirm the design identified during local optimization.
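The text does not say how the software chooses its 10 exploratory runs. Latin hypercube sampling is a common design-of-experiments choice for this kind of space-filling exploration; the Python sketch below assumes that technique (an assumption, not FLOTHERM's documented method) and uses the parameter ranges given above.

```python
import random

# (lo, hi) for each design variable, from the text.
BOUNDS = {
    "baffle_angle_deg": (1.0, 89.0),
    "baffle_length_in": (2.0, 3.5),
    "baffle_position_in": (22.5, 27.25),
    "fan_tray_position_in": (29.0, 31.0),
}


def latin_hypercube(bounds, n, seed=0):
    """n space-filling samples: each variable's range is split into n equal
    strata, and every stratum is used exactly once per variable."""
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in bounds.items():
        strata = list(range(n))
        rng.shuffle(strata)  # pair strata randomly across variables
        width = (hi - lo) / n
        # One random point inside each assigned stratum.
        columns[name] = [lo + (s + rng.random()) * width for s in strata]
    return [{name: columns[name][i] for name in bounds} for i in range(n)]


samples = latin_hypercube(BOUNDS, n=10)
for run, design in enumerate(samples, start=1):
    print(run, {k: round(v, 2) for k, v in design.items()})
```

Compared with a random scatter, this guarantees that every part of each variable's range is tried once in only 10 runs, which suits a first pass whose job is to find the promising region.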

The optimization process, which included 18 iterations, took less than 3 hr, considerably less than the 6 hr (18 × 20 min) that would be expected if every run took the 20 min needed to solve a single design from scratch. The time savings come from the software starting each successive iteration from the solution of the previous iteration, so the later iterations converge very quickly. The optimal design was the 13th iteration, with a baffle angle of 19 degrees, baffle length of 3.3 in., baffle position of 25.5 in. from the rear and fan tray position of 28.8 in. from the rear.
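The warm-start effect can be reproduced on a toy problem. This Python sketch is not FLOTHERM's solver: it solves hypothetical 1-D steady heat conduction with a uniform source by Gauss-Seidel iteration, first for one design variant and then for a variant with 5% more heat, starting the second solve either from ambient (cold start) or from the first solution (warm start). The source strengths and grid size are made-up values.

```python
def solve_conduction(q, t_amb=40.0, n=20, initial=None, tol=1e-6, max_sweeps=100_000):
    """Gauss-Seidel solution of -T'' = q on [0, 1] with T = t_amb at both ends.

    Returns (temperatures at the n interior nodes, sweeps used).
    `initial` lets the caller warm-start from a previous solution.
    """
    h = 1.0 / (n + 1)
    t = list(initial) if initial is not None else [t_amb] * n
    for sweep in range(1, max_sweeps + 1):
        biggest_change = 0.0
        for i in range(n):
            left = t[i - 1] if i > 0 else t_amb
            right = t[i + 1] if i < n - 1 else t_amb
            new = 0.5 * (left + right + h * h * q)  # central-difference update
            biggest_change = max(biggest_change, abs(new - t[i]))
            t[i] = new
        if biggest_change < tol:
            return t, sweep
    raise RuntimeError("did not converge")


# HYPOTHETICAL design variant 1, then a slightly different variant 2 (5% more heat).
base, _ = solve_conduction(q=1000.0)
cold, cold_sweeps = solve_conduction(q=1050.0)                # start from ambient
warm, warm_sweeps = solve_conduction(q=1050.0, initial=base)  # start from variant 1
print(f"cold start: {cold_sweeps} sweeps, warm start: {warm_sweeps} sweeps")
```

Because the two variants differ only slightly, the warm start begins much closer to the answer and needs fewer sweeps to reach the same tolerance; the same reasoning explains why the later optimization runs converge quickly.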

This design drops the junction temperatures of critical MCH components by 15°C, which is more than enough to meet design requirements. The relatively short time required for optimization makes it clear that in more difficult cases engineers can easily explore a wider range of alternatives, such as adding or subtracting heatsinks, changing component location and changing fan performance characteristics. This process can be used to optimize the mechanical design to a much higher level of performance than could be achieved using traditional build and test methods or even traditional simulation methods.

Table. Power dissipations and maximum rated junction temperatures for major components.

Component                            Quantity  Power (W)  T_J max (°C)
CPU                                  2         35         75
Memory controller hub (MCH)          2         3.5        100
Input/Output controller hub (ICH)    1         2          100
DIMM                                 6         12         100
Network processor (switch)           2         12         100
5633 gigabit switching processor     1         4.5        100
5821 network security processor      2         3.5        85