Burn-In Issues

July 1, 2005

Just how important in terms of power-supply product reliability is the quality and repeatability of the assembly process? Don Gerstle explains.

Don Gerstle

Users of power-supply products demand increasingly high levels of reliability and performance. Although the suppliers of individual components can confidently provide impressive life and reliability data, the compound effect on overall reliability can be significant when a large number of individual components are combined in a module such as a power supply.
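The compounding effect can be illustrated with a minimal sketch. It assumes a series model (any single component failure fails the supply) and a constant failure rate per part, neither of which the article states; the part count and the 5 FIT figure are hypothetical values chosen only for illustration.

```python
import math

HOURS_PER_FIT_BASE = 1e9  # 1 FIT = 1 failure per 1e9 device-hours


def system_failure_rate(component_fits):
    """Series model: the supply's failure rate is the sum of its parts' rates."""
    return sum(component_fits)


def mtbf_hours(total_fit):
    """MTBF implied by a constant failure rate expressed in FIT."""
    return HOURS_PER_FIT_BASE / total_fit


def survival_probability(total_fit, hours):
    """Probability of running `hours` without failure (exponential model)."""
    return math.exp(-total_fit * hours / HOURS_PER_FIT_BASE)


# Hypothetical bill of materials: 200 parts, each rated at 5 FIT.
parts = [5.0] * 200
total = system_failure_rate(parts)        # 1000 FIT for the whole supply
print(mtbf_hours(total))                  # 1,000,000 hours
print(survival_probability(total, 43800)) # chance of surviving five years
```

Each part on its own looks very reliable, yet under these assumptions the assembled supply has a failure rate 200 times higher than any single component.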

Perhaps more important in terms of product reliability is the quality and repeatability of the assembly process. Solder joints, connectors, and mechanical fixings are all potential origins for product failure. In use, operating temperature and other environmental factors also affect the longevity and reliability of a power supply.

Burn-in and various other forms of life and stress testing help provide the data to enable power-supply manufacturers to continually improve the reliability of their products. Indeed, when analysed correctly and fed back into the design and assembly process, the accumulated data can be used to optimise the test and burn-in process and can even demonstrate that burn-in is not necessary to achieve the target reliability for a particular product.

THE BURN-IN PROCESS The purpose of the burn-in process for power supplies is to weed out "infant mortalities," as seen in the first portion of the well-known "bathtub curve" of failure rate versus operational time (Fig. 1). These latent, early-life failures are attributable to intrinsic gross faults within the bought-in components, assembly errors, and faults induced in components by inappropriate handling (e.g., ESD damage). It should be noted that there are certainly no absolutes in the world of reliability testing, only probability and confidence levels for large populations.

Hence, there is never a guarantee that the burn-in process catches all infant mortalities. In fact, some problems need to be seen in functional testing.

The conventional approach to power-supply burn-in over many years has involved running the supplies at an elevated temperature, often the maximum rated operating temperature listed in the product data specifications, where it is assumed the rate of appearance of latent defects is accelerated.
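One common way to put a number on the assumed acceleration is the Arrhenius model. The article does not name a model, so the sketch below is an assumption, and the 0.7 eV activation energy is a frequently quoted placeholder rather than a value from the article; the real figure depends on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_use_c, t_stress_c, activation_energy_ev=0.7):
    """Acceleration factor of running at t_stress_c instead of t_use_c.

    Temperatures are in degrees Celsius. The Arrhenius model and the
    default activation energy are assumptions made for illustration.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


# Hypothetical case: 40 degC typical use versus a 70 degC burn-in chamber.
print(arrhenius_acceleration(40.0, 70.0))
```

Under these assumptions a 30-degree rise gives roughly an order of magnitude of acceleration, which is why the choice of chamber temperature has such a strong effect on the burn-in time needed.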

The supplies are run under full load with power cycling, and the input voltage run at either the maximum or minimum voltage to provide either maximum voltage stress or maximum current stress, depending on the design topology. Care in the choice of conditions is necessary as, for example, some components in some topologies can see more stress at light loads, such as snubber networks in variable frequency converters. Some ingenuity can also be applied. For example, if a product is intended to operate normally with forced air, it could be run in still air at light load and continue to achieve comparable temperature stress levels.

Data logging and analysis of the units under test is important to determine whether a failure has occurred, and if so, when. If all failures actually occur within the first few minutes of a 48-hour burn-in sequence, there is very good reason to know about it: the burn-in time can be shortened, increasing throughput while saving energy. It is normal for companies such as C&D Technologies to test products comprehensively before and after burn-in to ensure that any changes in performance are identified.

This can also show whether there are any intermittent problems. Understanding and using burn-in data to modify product design and manufacturing processes can result in improved reliability and yield that will be reflected in future data collected from burn-in.
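As a rough illustration of this kind of analysis, the sketch below takes logged times-to-failure and reports how long a burn-in would have needed to run to catch a chosen share of them. The function name and the sample log are hypothetical.

```python
import math


def hours_to_capture(failure_hours, fraction=1.0):
    """Earliest burn-in time by which `fraction` of logged failures occurred."""
    if not failure_hours:
        raise ValueError("no failures logged")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(failure_hours)
    # Round first so float noise (e.g. 7.000000000000001) does not bump the count.
    needed = math.ceil(round(fraction * len(ordered), 9))
    return ordered[needed - 1]


# Hypothetical log: hours into a 48-hour burn-in at which each failure appeared.
log = [0.1, 0.2, 0.2, 0.5, 1.5, 3.0, 0.3, 0.4, 0.1, 30.0]
print(hours_to_capture(log, 0.9))  # time that would have caught 90% of failures
print(hours_to_capture(log, 1.0))  # time that would have caught all of them
```

A large gap between the two figures, as in this made-up log, is the sort of signal that justifies investigating the late outlier separately before deciding on a shorter burn-in.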

Experience in burn-in testing has shown that thermal cycling precipitates more infant mortalities than a constant elevated ambient, although the sets of failures don't completely overlap. Thermal cycling with a dwell time at each thermal extreme is, therefore, the preferred process. Increasing the thermal rate of change precipitates more failures in fewer cycles (Fig. 2). Generally, care must be taken to ensure that the products are not stressed outside of their ratings in the often untypical environment of burn-in. If overstressed, some useful life of a good product could be "used up" and at worst, hard or latent failures could actually be induced in otherwise good product.

At C&D Technologies, the burn-in process normally starts with a duration of 48 hours, with a decision process to reduce the time of burn-in when no failures occur after a set number of hours. Depending on the product's complexity and topology, a decision is made to reduce the future burn-in hours by half after 200 to 500 units have gone through the process with no failures occurring in a quarter of the current burn-in time. This process is continued until the burn-in time is reduced to two hours, where it is held for the remainder of production. Some argue that burn-in can be eliminated when no failures occur after multiple production builds. However, it could be argued that this removes the insurance against a group of defective components being used and/or a process anomaly occurring.
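The halving rule can be sketched as follows. This is an illustration, not C&D Technologies' actual procedure: the 200-unit threshold is the lower end of the range quoted above, and the way failures are counted against the criterion is left to the caller as `disqualifying_failures`, a hypothetical parameter.

```python
def next_burn_in_hours(current_hours, units_screened, disqualifying_failures,
                       min_units=200, floor_hours=2.0):
    """Halve the burn-in time once enough units pass cleanly, down to a floor."""
    if units_screened >= min_units and disqualifying_failures == 0:
        return max(current_hours / 2.0, floor_hours)
    return current_hours


def clean_run_schedule(start_hours=48.0, floor_hours=2.0):
    """Burn-in durations over successive reviews when every review passes."""
    hours = start_hours
    steps = [hours]
    while hours > floor_hours:
        hours = next_burn_in_hours(hours, 200, 0, floor_hours=floor_hours)
        steps.append(hours)
    return steps


print(clean_run_schedule())  # [48.0, 24.0, 12.0, 6.0, 3.0, 2.0]
```

With no failures at any review, five reductions take the burn-in from 48 hours to the two-hour floor; a single disqualifying failure holds the duration where it is until the next review.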

In volume production of parts that are known to have a significant infant mortality rate, perhaps because of the degree of manual assembly, a regime of variable burn-in can be used. In this case, failures are expected, but when a pre-calculated period of failure-free operation of a batch has elapsed, burn-in is terminated.
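A variable burn-in of this kind amounts to a timer that restarts on every failure. The sketch below is one possible reading, with hypothetical names; the upper cap `max_hours` is an added assumption so that a troublesome batch cannot run indefinitely.

```python
def termination_time(failure_hours, quiet_window_hours, max_hours):
    """Hour at which a batch's burn-in ends under a failure-free-window rule.

    The window restarts after each failure; burn-in stops once a full
    window passes with no failure, or at max_hours (an assumed cap).
    """
    window_start = 0.0
    for t in sorted(failure_hours):
        if t - window_start >= quiet_window_hours:
            break  # a full quiet window elapsed before this failure
        window_start = t  # failure inside the window: restart the clock
    return min(window_start + quiet_window_hours, max_hours)


# Hypothetical batch: failures at 0.5 h and 1.0 h, then nothing for 4 hours.
print(termination_time([0.5, 1.0], quiet_window_hours=4.0, max_hours=48.0))  # 5.0
```

A batch with no failures at all finishes after a single window, while one that keeps failing runs until the cap, which is the behaviour the variable regime is meant to provide.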

HASS AND HALT Some manufacturers have taken the burn-in process further after finding that the types of burn-in described don't eliminate, within a reasonable time, all of the failures seen to occur in the early life of a power supply. Also, conventional burn-in doesn't provoke early failures that could be a result of the shock and vibration of shipping and handling.

To combat this, a more aggressive HASS (Highly Accelerated Stress Screen) can be used that applies mechanical, thermal, and electrical stress typically beyond product ratings but within design margins. Acceleration factors of more than 40 over conventional burn-in have been claimed for this method, giving correspondingly shorter test times. A problem, however, is that the stress levels are so extreme that there is a risk of damaging good product with hard or latent failures.

In answer to this, the HALT (Highly Accelerated Life Test) process was designed to identify the real damage limits in a product by stressing the product to failure with temperature extremes, thermal cycling, progressively higher levels of vibration, and then a combination of thermal cycling and vibration. During this testing, the destruction limits of the power supply are identified. These limits are then used to set the lower HASS test levels.

HALT is also used extensively during product development to identify potential weaknesses in the design. The test equipment required to do HALT must typically ramp temperature between –55 and +125°C while applying six-axis linear and rotational random vibration. This requires a major capital investment and is often subcontracted to specialist test houses.

THE NO BURN-IN PRODUCTION MODEL As described earlier in the article, once burn-in failures have reduced to a certain level, some manufacturers feel that the process can be dropped completely. This can be considered only if the manufacturing process is entirely predictable and the quality of bought-in material is such that it has no gross latent intrinsic defects. Although commodity components approach this quality level and modern manufacturing quality control can minimise process variations, there is still a real risk that a customer may see some early life failures. The cost of this in terms of goodwill has to be weighed against the costs of burn-in.

ON-GOING LIFE TESTING While extended burn-in tests may be employed on small numbers of units to gauge whether all infant-mortality failures have been identified, on-going life tests are run for up to six months on 25 to 50 units at a moderately elevated temperature. These tests are normally only utilised when there are large quantities of units built on a continuing basis, and can give an estimate of the intrinsic reliability of a product in service, that is, MTBF. It should be emphasised that real field failure rate is the most accurate measure of the reliability of a product.

A calculated MTBF can be compared with the demonstrated figure obtained through burn-in to check for consistency.
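A demonstrated MTBF is, at its simplest, accumulated unit-hours divided by observed failures. The sketch below is a point estimate only, under simplifying assumptions not taken from the article: a constant failure rate, every unit credited with the full test duration, and an optional acceleration factor for the elevated temperature. The sample figures are hypothetical.

```python
def demonstrated_mtbf(units, hours_per_unit, failures, acceleration=1.0):
    """Point-estimate MTBF in hours from a life test.

    Assumes a constant failure rate and that every unit accumulates the
    full test time. With zero failures a point estimate is undefined.
    """
    if failures <= 0:
        raise ValueError("point estimate needs at least one failure")
    return units * hours_per_unit * acceleration / failures


# Hypothetical life test: 50 units for six months (4380 h), 2 failures.
demonstrated = demonstrated_mtbf(50, 4380, 2)
calculated = 150_000.0  # hypothetical figure from a parts-count prediction
print(demonstrated)               # 109500.0
print(demonstrated / calculated)  # ratio used as a consistency check
```

With so few failures the estimate carries wide uncertainty, so a modest gap between the demonstrated and calculated figures is not in itself evidence of a problem.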

The important point to note is that quality and reliability cannot be tested in or inspected in. Burn-in testing is ultimately another inspection process but serves as a mechanism for process control and feedback. Failures in burn-in along with field failures prompt failure analysis and corrective action, to ensure that the product design and process have been centred and optimised to provide the best product possible to the field. Studies have shown that higher factory yields give higher product reliability, happier customers, and lower warranty return costs.

See Figure 3