Let HALT Improve Your Product

Accelerated stress testing is used to detect and correct any inherent design and manufacturing flaws and determine a product’s robustness. The first component of the stress-testing process is the highly accelerated life test (HALT). Typically, a series of individual and combined stresses, such as multiaxis vibration, temperature cycling, and product power cycling, is applied in steps of increasing intensity—well beyond the expected field environments—until the product fails.

The HALT process continues with a test-analyze-verify-and-fix approach, with root cause analysis of all failures. Test time is compressed with accelerated stressing, leading to earlier product maturity.

The benefits of HALT can be summarized as follows:

Rapid design and process maturation.

Reduced total engineering time and cost.

Reduced production and warranty costs.

Earlier introduction of a mature product (yields stabilized).

Higher mean time between failures.

Greatly reduced manufacturing screening costs.

Faster corrective action for design and process problems.

Delighted customers.

Typically, HALT starts with the lowest modular units, usually individual printed circuit boards (PCBs), CPUs, card cages, fans, and power supplies, and concludes with testing of the complete product. Production should be delayed until HALT results are satisfactory.

HALT Stresses

Appropriate stresses must be determined for each assembly since they have unique electrical, mechanical, thermal mass, and vibration characteristics. Typical system stresses used during HALT are shown in Table 1.

Actual stresses will be determined during an evaluation of the product prior to HALT and during HALT itself. Some stresses may be deleted; others may be added; multiple stresses may be used in combination.

HALT is also an iterative process, so stresses may be added or deleted during the fail-fix-retest sequence. Each stress has three pairs of limit values: upper and lower specification limits, upper and lower operating limits, and upper and lower destruct limits.
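The stepping approach and the operating and destruct limits can be sketched in Python. This is a minimal illustration for one stress stepped upward, not a prescribed HALT procedure: the function name step_stress, the unit_ok and unit_recovers callbacks, and all numeric levels are hypothetical and stand in for a real functional test and recovery check.

```python
from typing import Callable, Optional, Tuple


def step_stress(
    start: float,
    step: float,
    max_level: float,
    unit_ok: Callable[[float], bool],
    unit_recovers: Callable[[float], bool],
) -> Tuple[Optional[float], Optional[float]]:
    """Raise one stress in fixed steps and report two limits.

    Returns (operating_limit, destruct_limit):
      operating_limit: last level at which the unit still passed its
                       functional test (None if it never passed).
      destruct_limit:  first level from which the unit did not recover
                       (None if not reached by max_level).
    Both callbacks are hypothetical stand-ins for real test equipment.
    """
    operating_limit: Optional[float] = None
    operating_found = False
    level = start
    while level <= max_level:
        if not operating_found:
            if unit_ok(level):
                operating_limit = level      # still working at this level
            else:
                operating_found = True       # first functional failure
        # Past the operating limit, keep stepping until the unit no
        # longer recovers when the stress is removed.
        if operating_found and not unit_recovers(level):
            return operating_limit, level
        level += step
    return operating_limit, None


# Simulated unit (illustrative values only): works up to 90, and
# recovers from any excursion up to 110.
operating, destruct = step_stress(
    start=20.0,
    step=10.0,
    max_level=150.0,
    unit_ok=lambda level: level <= 90,
    unit_recovers=lambda level: level <= 110,
)
print(operating, destruct)
```

In practice each limit found this way would trigger root cause analysis and a fix before stepping resumes, and the same loop would be repeated downward for the lower limits.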

Why These Stresses?

Historically, the stresses shown in Table 1 have caused certain defects to precipitate. Again, they are examples, not recommendations. Actual values must be determined for each product.

VCC Voltage Margin

With a nominal 5.0-V VCC, the problem range is 3.9 to 6.0 V.
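As a worked example of the arithmetic, the 3.9 to 6.0 V range can be expressed as a percentage of the 5.0-V nominal. The helper name margin_percent is an assumption made for this illustration.

```python
def margin_percent(nominal: float, value: float) -> float:
    """Deviation of `value` from `nominal`, as a percentage of nominal."""
    return round((value - nominal) / nominal * 100.0, 1)


low = margin_percent(5.0, 3.9)    # -22.0
high = margin_percent(5.0, 6.0)   # 20.0
print(low, high)
```

The range therefore spans roughly 22% below to 20% above the nominal supply voltage.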

Clock Frequency

Sometimes circuit requirements make it impossible to vary the clock frequency as a stress. Where it can be varied, however, it is useful in identifying marginal components.

Temperature Screening

Every assembly and subassembly has its own temperature characteristic. HALT determines what that is. For example, the rated temperature range of a simple TTL digital logic PCB is 0 to 70°C, but it may operate correctly from -55°C to +125°C.

Failures in the expanded range may indicate marginal design, bad components, or processing problems. Some commercial parts do not even meet specification requirements for operation from 0 to 70°C.
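The margin between the rated range and the range actually observed in HALT is simple to compute. The sketch below uses the TTL figures quoted above; the names TempRange and temperature_margins, and the second sample part, are hypothetical and only illustrate the calculation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TempRange:
    low_c: float    # lower limit, degrees C
    high_c: float   # upper limit, degrees C


def temperature_margins(rated: TempRange, observed: TempRange) -> Tuple[float, float]:
    """Return (cold_margin, hot_margin) in degrees C.

    A positive value means the unit operated beyond its rated limit;
    a negative value means it fell short of the rating.
    """
    cold_margin = rated.low_c - observed.low_c
    hot_margin = observed.high_c - rated.high_c
    return cold_margin, hot_margin


rated = TempRange(0.0, 70.0)

# TTL logic board from the text: rated 0 to 70, operates -55 to +125.
print(temperature_margins(rated, TempRange(-55.0, 125.0)))  # (55.0, 55.0)

# Hypothetical part that stops working at +5 degrees C: negative cold margin.
print(temperature_margins(rated, TempRange(5.0, 70.0)))     # (-5.0, 0.0)
```

A negative margin corresponds to the case of a part that does not meet its own specification range.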

Temperature Cycling

Temperature cycling detects failures that would otherwise happen over time in the field. These include weak solder joints, IC package integrity problems, PCB mounting problems, and PCB processing issues.

Vibration

Vibration testing normally is used to verify that a product withstands shipping and operational vibration levels. PCB testing can reveal weak or brittle solder connections. Bad connections may be stressed to failure at levels that do not harm good connections.

HALT Results Example

The initial HALT done on a controller PCB design produced several failures at low temperature extremes. These units exhibited microprocessor write errors at 30°C, system panics at 20°C, data errors periodically during temperature cycling ramping from 45°C to -25°C, and checksum parity errors periodically between 30°C and 65°C.

Development engineering analyzed the design to determine the cause of these problems. Timing issues were found and corrected by changing equations in some of the programmable array logic used in the design. The modified boards were run through HALT, and significant improvement was seen in design margins.

Determinants of HALT Success

The objective of event-driven HALT is not one of compliance but of results, corrective action, and prevention. The processes to achieve the objectives require physically testing units to failure to identify weak links, determine the root causes, and implement corrective action.

In HALT, there is every intention to physically damage the product in order to maximize and quantify its margins of strength, both operating and destruct, by stressing it beyond the expected end-use environments.

Using a closed-loop, corrective-action process to preclude recurrence of the failures is critical to achieving lasting results. To be effective, the results of HALT must be:

Fed back to design to assist in selecting a different supplier, improving a supplier’s process, or making a circuit design/layout change.

Fed back to manufacturing to make a process change, typically of a workmanship nature.

Used to determine the environmental stress screening (ESS) profiles for production testing, as appropriate.

Some stresses are universal in their application, such as temperature, thermal cycling, and vibration. Others are suitable to more specific types of products, such as clock margining for logic boards and current loading for power components. Vibration and thermal stresses generally are the most effective environmental stresses in precipitating failure.

Determining the root causes of all failures is critical. Root cause failure analysis often is overlooked or neglected because the resources and disciplines required to carry it out properly are underestimated. If failure analysis is not carried through to determination of all root causes, the benefits of the HALT process are lost.

Start HALT at the lowest level of the product’s subassemblies and then repeat it at each level of assembly until the total system is tested. The stress levels that are tolerated decrease with each higher level of assembly, with the system capable of undergoing substantially lower stress levels than the subassemblies.

HALT requires extensive and intimate involvement by all concerned parties, including design/development engineering, manufacturing/production, quality, and reliability:

Management: Project management must be willing to provide resources necessary to support testing. Time and funds must be committed up front to supply test units and expeditiously support failure analysis.

Development: Product designers are critical in providing in-depth knowledge of the design for effective stress selection and troubleshooting of failures. Immediate availability of the designers when a problem occurs is critical for the HALT process to be effective in compressing the development cycle.

Production: Manufacturing and process issues must be given the same priority as design issues. Problems in production will impact the customer as much as problems with design.

HALT requires extensive equipment, competent personnel, and a dedicated test plan and HALT profile for each new product. Each design has unique features and, consequently, unique flaws that must be precipitated and detected. A test plan developed for one product may not be applicable to another product because of design differences.

Hints on Conducting HALT

Here are some essential elements for success in HALT:

Start the testing as soon as hardware is available for a new project.

Use as many units as possible.

Conduct failure analysis to the root cause.

Treat all failures as relevant.

Develop documentation and failure reporting robust enough to prevent failures or their solutions from being overlooked.
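One way to keep failures and their solutions from being overlooked is a failure log in which a report stays open until it has a root cause, a corrective action, and a verified retest. The sketch below is only an illustration of that idea: the FailureReport fields, the open_reports helper, and the unit identifiers are hypothetical, and the sample entries loosely mirror the controller PCB example earlier in the article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureReport:
    unit_id: str                         # hypothetical identifier
    stress: str                          # stress applied when the failure occurred
    symptom: str
    root_cause: Optional[str] = None
    corrective_action: Optional[str] = None
    fix_verified: bool = False           # True once the fix has passed a HALT retest

    def is_closed(self) -> bool:
        """A report closes only when all three steps are complete."""
        return bool(self.root_cause) and bool(self.corrective_action) and self.fix_verified


def open_reports(reports: List[FailureReport]) -> List[FailureReport]:
    """Every failure is treated as relevant: nothing is filtered out
    except reports that are fully closed."""
    return [r for r in reports if not r.is_closed()]


log = [
    FailureReport(
        unit_id="PCB-A",
        stress="temperature cycling",
        symptom="periodic data errors",
        root_cause="timing issue",
        corrective_action="changed programmable array logic equations",
        fix_verified=True,
    ),
    FailureReport(
        unit_id="PCB-B",
        stress="cold step",
        symptom="system panic",
    ),
]

for report in open_reports(log):
    print(report.unit_id, report.symptom)   # prints: PCB-B system panic
```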

Sample size is a critical decision that must be made before testing. The probability of uncovering a defect during testing increases as the number of test units increases. The larger the sample size, the more thorough the testing coverage. Diagnostic software must identify and isolate fault conditions.
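The effect of sample size can be made concrete with a simple model. Assuming, purely for illustration, that each unit independently carries a given defect with probability p and that the test always exposes the defect when it is present, the chance that at least one of n units reveals it is 1 - (1 - p)^n. The function name and the 10% defect rate below are assumptions, not figures from any actual test.

```python
def detection_probability(defect_rate: float, units: int) -> float:
    """Probability that at least one of `units` test units carries the defect.

    Simplified model: units are independent, and the defect is always
    exposed by the test when present.
    """
    if not 0.0 <= defect_rate <= 1.0:
        raise ValueError("defect_rate must be between 0 and 1")
    if units < 0:
        raise ValueError("units must not be negative")
    return 1.0 - (1.0 - defect_rate) ** units


# Illustrative defect rate of 10% across several sample sizes.
for n in (1, 5, 10, 20):
    print(n, round(detection_probability(0.10, n), 3))
```

Under these assumptions a single unit has a 10% chance of exposing the defect, while 20 units raise that chance to roughly 88%.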

ESS

Sometimes, depending on both market requirements and product/process maturity, ESS is used to quickly identify latent component, manufacturing, and workmanship issues that could later cause failure at the customer’s site. ESS is particularly effective in finding faults that are present only in a small percentage of the product—typically the case for process faults and component faults.

Optimum ESS assumes that design defects have been identified and corrected through implementation of the accelerated stress testing process. The ESS profiles are derived from the HALT results. As a result, the proper application of product environmental testing will ensure that the product design can be purged of latent defects that testing to product specifications will miss.

Any applied stimulus accumulates some fatigue damage in the product under test. It is therefore up to the manufacturer to prove that the screening process works and is rapid and cost-effective.

This validation is accomplished with the proof-of-screen (POS) process. A POS demonstrates that the screen exposes the design marginalities or manufacturing defects that would otherwise surface in the field, and that it does so without breaking good product or consuming an unacceptable amount of product fatigue life.

ESS Results Example

Figure 1 shows the pre-ESS and post-ESS yield results for Q3 1997 for five products then in production. It shows the value of conducting ESS in production and the potential losses in system test or in the field if ESS is not conducted.

The mature PCBs (PCBs #1 to #3) have a high ESS yield, and the new boards (PCBs #4 and #5) have a low ESS yield. The benefit of ESS for new products is evident here. The post-ESS yields for both mature and immature products are equivalent, indicating that ESS is finding the latent defects. Nonetheless, the value of ESS must be constantly evaluated. Once the yield for a product is stable and high, it may make sense to discontinue ESS for that product.
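A rule for when the ESS yield counts as stable and high can be written down explicitly. The sketch below is one possible formulation under stated assumptions, not a criterion taken from the article: the function name, the 98% floor, the one-point spread, the four-period window, and the sample yield histories are all hypothetical.

```python
from typing import Sequence


def ess_removal_candidate(
    ess_yields: Sequence[float],
    min_yield: float = 0.98,    # hypothetical "high" threshold
    max_spread: float = 0.01,   # hypothetical "stable" threshold
    periods: int = 4,           # hypothetical number of recent periods
) -> bool:
    """True if the most recent `periods` ESS yields are all high and
    vary by no more than `max_spread`."""
    recent = list(ess_yields)[-periods:]
    if len(recent) < periods:
        return False            # not enough history to judge stability
    high = min(recent) >= min_yield
    stable = (max(recent) - min(recent)) <= max_spread
    return high and stable


# Illustrative yield histories, not data from Figure 1.
mature_board = [0.985, 0.990, 0.988, 0.991]
new_board = [0.900, 0.950, 0.990, 0.990]

print(ess_removal_candidate(mature_board))  # True
print(ess_removal_candidate(new_board))     # False
```

A product flagged by such a rule would still need engineering review before its screen is dropped, since the thresholds themselves are a judgment call.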

Conclusion

At the Tandem Division of Compaq, we strongly believe in the use of HALT and ESS in our product development and manufacturing processes, respectively. In fact, they must be used for all new products.

About the Author

Eugene R. Hnatek is director of the Product Evaluation Center at the Tandem Division of Compaq Computer Corp. Previously, he was the component engineering manager at the company. Mr. Hnatek has worked in the IC quality and reliability field for more than 30 years and has published 11 books on the subject. Compaq Computer Corp., 10300 N. Tantau Ave., CAC05-53, Cupertino, CA 95014-0725, (408) 285-2609.

Parameters               Faults Found
VCC                      Design Faults, Faulty Components
Clock Speed/Frequency    Design Faults, Faulty Components
Clock Symmetry           Design Faults, Faulty Components
Power Holdup/Cycling     Overloads, Marginal Components
Temperature
  Cold                   Margins
  Hot                    Overloads, Low-Quality Components
  Cycling                Processing Faults, Soldering
Vibration                Processing Problems, Soldering

Table 1. Typical system stresses used during HALT

Copyright 1999 Nelson Publishing Inc.

May 1999
