Test-In Quality With HALT

March 1, 2003

Enlightened destruction of product prototypes helps to ensure survival of production units in the field.

New-product development is a complex undertaking. In the phrase new-product development, new means different as well as recent. You may ask, “How different must a product be before it can be considered new?” For example, some current car models differ from the previous year’s models only in small details. Other designs truly are new and innovative and stand out from the rest.

It’s all a matter of degree. Incremental change is the foundation of much new-product development for a good reason: If a new product can be based closely on an existing one, development risk is reduced. The corollary to lower risk is greater certainty of timely project completion. And from that follows a smoother product launch; better coordination of purchasing, production, marketing, and sales; and ideally, greater and earlier profits.

What happens if your competitor already has introduced a model with only incremental changes? You may be tempted to skip making similar changes to your own existing product, instead electing to develop something really new. Many companies make this choice, citing greater market share and financial growth as the anticipated rewards. The downside to such a decision includes unforeseen costs, delays, and technical setbacks. Truly new development is full of risks.

HALT Adds Reliability

Risk is associated with more than just the development of highly innovative new products. After the design is complete and the production phase begins, the service history will start accumulating. Incrementally different products already have a service history. Truly new products don’t.

In an incrementally improved product, most of the design hasn’t been altered from an earlier, well-understood model. Reliability engineers have to concentrate only on the changes made in the new model. They already have found whatever problems existed in the old model and corrected them. Proof that the problems have been solved may be found in many years’ worth of field-service reports.

Assessing the reliability of innovative new products is difficult. There may be little relevant field data to compare the new product against. Theoretical mean time between failures (MTBF) calculations are of some help but may be inappropriate given the novel construction or use modes of the product. Detailed testing during development takes a long time and may not adequately address those areas that eventually result in field failures.

Highly accelerated life testing (HALT) is an approach to the problem that has become popular in many industries. HALT is fundamentally different from conventional forms of testing because its purpose is to quickly break the product in an informed manner. In contrast, conventional testing attempts to prove that a product works correctly, although meeting the design specification might not ensure reliable operation in the field.

In a paper presented at NEPCON West 2000, Neill Doertenbach, formerly product manager at QualMark and currently manager of Advantage Technical Services, commented, “…the goal of HALT is to quickly break the product and learn from the failure modes the product exhibits. The key value of the testing lies in the failure modes that are uncovered and the speed with which they are uncovered. HALT is considered a success when failures are induced, the failure modes are understood, corrective action has been taken, and the limits of the product are clearly defined and pushed out as far as possible….HALT is a process of discovery and design optimization.” ¹

The HALT process results in greater product reliability because the discovery and design optimization loop is repeated at higher and higher stress levels. Eventually, when many things fail as the result of a small increase in the stress level, the end of the HALT process may have been reached.

To withstand even greater stress might require substantial redesign or much more expensive components. At this point, maximum performance has been wrung out of the design. Common types of stress are temperature and vibration, but changes in the power-supply level and power cycling also can be effective.
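The discover-and-fix loop described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name step_stress, the 10°C step size, and the stand-in lambdas that play the role of a real functional test on the DUT. It is not the procedure from either cited paper.

```python
def step_stress(levels, operates_at):
    """Walk through increasing stress levels.

    Returns (last_passing_level, first_failing_level); the second value
    is None if the DUT survived every level in the sequence.
    """
    last_pass = None
    for level in levels:
        if not operates_at(level):      # functional test failed at this level
            return last_pass, level
        last_pass = level
    return last_pass, None


# Hypothetical hot step stress: 20 degC to 150 degC in 10 degC steps.
hot_steps = range(20, 160, 10)

# Stand-in for a DUT whose weakest part fails at 120 degC (illustrative number).
before_fix = step_stress(hot_steps, lambda temp_c: temp_c < 120)

# After corrective action the same step stress is repeated to find the next weak link.
after_fix = step_stress(hot_steps, lambda temp_c: temp_c < 130)

print(before_fix, after_fix)
```

In practice the pass/fail decision would come from a functional test run at each dwell, and the loop would be repeated for each stress type.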

“Although failure modes are induced by stresses in excess of the product specification, they typically are valid failure modes that would show up in the product in the field,” Mr. Doertenbach continued. “…The important thing to remember is that HALT is finding the weakest parts of the design. These weak links will be the source of warranty problems in the field….HALT is not intended to demonstrate that a product will function in its intended environment. Consequently, the stresses do not attempt in any way to duplicate those expected in real life. Rather, the stresses are specifically designed to quickly bring out failure modes.”

When HALT has been used in the development of a product, the final design verification testing should proceed smoothly, and there should be fewer problems found when beginning production. However, marginal production processes and bad batches of components also can introduce faults leading to field failures.

Highly accelerated stress screening (HASS) verifies that products have been manufactured correctly. Based on the destruction and performance limits established by HALT, HASS attempts to precipitate latent faults without actually damaging the product. Developing a HASS profile that includes the right amount of the right kinds of stresses is a skilled and sometimes iterative process.

An Example

HALT was introduced at Otis Elevator in late 1995 by Mark Morelli, principal engineer, product reliability, in response to a management challenge to improve quality and reduce costs. Since then, more than 200 HALT tests have been performed. HALT first was used on previously released products to prove its capability to identify known faults.²

Table 1. Projected Benefits of Motor-Controller HASS Program
Year	Cumulative Failures Prevented	Cumulative Cost of Failures	Cost of HASS (per Year)	Cumulative Cost Savings
1	10	$116k	$29k	$87k
2	20	$231k	$29k	$173k *
3	30	$347k	$29k	$260k
4	40	$462k	$29k	$346k **
5	50	$578k	$29k	$433k

* HASS has been completed on two years’ worth of products.
** The cost of the HASS chamber and related equipment will be paid for after four years of testing.

Courtesy of Otis Elevator

Figure 1 shows the types and number of faults found in one model of an elevator-motor controller. Three of the top four types of field failures reported for this assembly were observed during HALT. These faults had contributed to more than 50% of the field failures on previous versions of the product.

Typically, HALT requires only a small number of DUTs. Because the stress is increased until DUT failure, it is not necessary to use a large population to have a better chance of observing a failure. HALT guarantees that failure modes will be found. Nevertheless, more than one DUT is needed to verify that a failure mode is common to the assembly, and having several DUTs available ensures that testing can continue in the event of a catastrophic DUT failure.

To avoid damaging DUTs unnecessarily, testing usually starts with cold step stressing, then hot, followed by rapid thermal cycling, then vibration, and finally a combined thermal/vibration environment. Failures experienced during the motor-controller tests included the following:

A number of processor boards operated incorrectly at low temperatures.
A snubber circuit operated incorrectly at several different temperatures.
A rectifier diode failed at 120°C.
Short-circuit failures occurred in the high-voltage board’s 24-V supply and the output driver IGBT at 130°C.

When the HALT findings were compared to actual field failure data, HALT could not account for all the failures that had been reported. It was obvious that the manufacturing process was accounting for some of the faults.

Figure 2 relates to faults caused by leaky ceramic capacitors. These failures were caused by a bad batch of parts and could only be eliminated by a suitable HASS.

As shown in Figure 3, the HASS profile developed for the motor controller consists of four temperature cycles performed over two hours with simultaneous 25g vibration. The general shape of the profile conforms to the theoretical description given in Mr. Doertenbach’s paper:

“There are two parts to the screen. The first part is the precipitation screen. This screen stresses the product beyond the operational limits and near the destruct limits found in HALT. It is intended to precipitate failures in the product due to latent defects….

“The second part of the screen is the detection screen. During the detection screen, the product is stressed to near the operational limit found in HALT. Now, the product is being functionally tested. Any hard failures induced during the precipitation screen will be detected as well as any soft failures that may be induced by the stresses.

“Figure 4 provides an overview of the purpose and limits of these screens. It shows the margin discovery curves, overlaid with the precipitation and detection screens. The limits on the screens are set so that they are outside of the tails of the distribution of the failure modes that define the operational and destruct limits for the product. Consequently, product that has no new latent failure modes should pass the screen undamaged. Any new failure mode, however, will be exposed.”
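The relationship between the two screens and the HALT limits can be written down as a simple consistency check. The Python sketch below is one reading of the quoted description, not a method taken from the paper: the class name UpperLimits, the function check_screen, and the example temperatures are all hypothetical, and it considers only the upper limits of a single stress type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpperLimits:
    """Upper limits for one stress type as found in HALT (hypothetical container)."""
    operating: float
    destruct: float


def check_screen(limits, precipitation, detection):
    """Return a list of problems with proposed screen levels (empty if none)."""
    problems = []
    # Precipitation: beyond the operating limit, but short of the destruct limit.
    if not limits.operating < precipitation < limits.destruct:
        problems.append("precipitation level should fall between the "
                        "operating and destruct limits")
    # Detection: the product is functionally tested, so stay within the operating limit.
    if detection > limits.operating:
        problems.append("detection level should not exceed the operating limit")
    return problems


# Illustrative numbers only, in degC.
limits = UpperLimits(operating=100.0, destruct=130.0)
print(check_screen(limits, precipitation=120.0, detection=95.0))   # no problems
print(check_screen(limits, precipitation=135.0, detection=105.0))  # two problems
```

How close each screen sits to its limit is a judgment made from the HALT margin data, so the check deliberately encodes only the ordering, not specific margins.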

Of the many motor controllers Otis manufactured during 1998 and 1999, 729 were subjected to this HASS profile. The result has been the complete elimination of field failures in this population over the two-year period. During the same period, the population of 5,678 similar motor controllers not subjected to HASS continued to have failures at about a 2.6% cumulative rate after three years of field use. This failure rate correlated well to the 2.6% fallout from the two-hour HASS test in the factory.

The elimination of field failures corresponds to 10 motor controllers per year that will not fail because HASS was used in production. Table 1 shows the warranty savings that are anticipated to accumulate over time both in terms of units and dollars on a year-by-year basis.
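As a rough cross-check, the figure of about 10 units per year is consistent with applying the 2.6% rate to the screened population. The short Python sketch below shows that arithmetic; the function name expected_failures is hypothetical, and the assumption that the 2.6% rate applies uniformly to each population is a simplification rather than something the source states.

```python
def expected_failures(units, failure_rate):
    """Expected number of failing units for a population and a failure rate."""
    return units * failure_rate


RATE = 0.026                       # 2.6% rate quoted in the article

screened = expected_failures(729, RATE)       # two years of HASS-screened units
unscreened = expected_failures(5678, RATE)    # similar units not subjected to HASS
per_year = screened / 2                       # screened units span 1998 and 1999

print(f"screened population:   about {screened:.1f} failures avoided")
print(f"per year:              about {per_year:.1f}")
print(f"unscreened population: about {unscreened:.1f} failures")
```

The per-year value comes out a little under 10, which agrees with the rounded figure used in Table 1.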

Cost of failures includes both visible costs such as field labor for diagnosis and repair, the cost of building new replacement assemblies, and warranty costs as well as hidden costs such as engineering and factory charges and continuing support. The cost of HASS was approximately $80 per unit or $29k per year based on 365 units per year.
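The figures in Table 1 can be reproduced with a few lines of Python. The sketch below assumes, based on the arithmetic, that the cost-of-failures column is cumulative while the $29k HASS cost recurs each year; the variable and function names are illustrative.

```python
HASS_COST_PER_UNIT = 80        # dollars per unit, from the article
UNITS_PER_YEAR = 365           # units per year, from the article

annual_hass_cost = HASS_COST_PER_UNIT * UNITS_PER_YEAR   # dollars per year
annual_hass_cost_k = round(annual_hass_cost / 1000)      # rounded to $k

# Cost-of-failures column of Table 1, in $k, for years 1 through 5.
cumulative_failure_cost_k = [116, 231, 347, 462, 578]


def cumulative_savings_k(failure_costs_k, hass_cost_k_per_year):
    """Cumulative savings = cumulative failure cost minus HASS cost to date."""
    return [cost - hass_cost_k_per_year * year
            for year, cost in enumerate(failure_costs_k, start=1)]


savings = cumulative_savings_k(cumulative_failure_cost_k, annual_hass_cost_k)
print(annual_hass_cost)   # 29200
print(savings)            # [87, 173, 260, 346, 433]
```

The result matches the Cumulative Cost Savings column, which suggests the table subtracts the accumulated HASS cost from the accumulated cost of failures.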

Conclusion

Implementation of HALT and HASS programs at Otis Elevator has been shown to improve quality and reduce cost, two major points in management’s original challenge. In addition, the combined programs also address the need to be more innovative in an increasingly competitive environment and to speed up product development and introduction. As a result, HALT and HASS are performed on all new products, and field failure data is continuously monitored to ensure the effectiveness of the testing.

References

1. Doertenbach, N., “Highly Accelerated Life Testing—Testing With a Different Purpose,” Proceedings of the Technical Program, NEPCON West 2000, Vol. 2, pp. 765-773.
2. Morelli, M., “Effectiveness of HALT and HASS,” Proceedings of the Technical Program, NEPCON West 2000, Vol. 2, pp. 801-805.

Acknowledgement

Thanks to Mark Morelli at Otis Elevator for his help in preparing this article.

FOR MORE INFORMATION on HALT/HASS training

www.rsleads.com/303ee-200

on HALT/HASS technical papers

www.rsleads.com/303ee-201

Published by EE-Evaluation Engineering
All contents © 2003 Nelson Publishing Inc.
No reprint, distribution, or reuse in any medium is permitted
without the express written consent of the publisher.
