Optimize HALT Results With Best Practices

To get the best results from a highly accelerated life test (HALT) and a highly accelerated stress screen (HASS), it is important to plan, implement, and document the tests properly and to take corrective action on what they reveal. When design defects are removed proactively before manufacturing, warranty costs are reduced and sales and reorders increase because customers are more satisfied with the product.

Even though HALT is inserted within the product development cycle, time to market is shortened because last-minute problems are significantly reduced (Figure 1). Problems found early in product development are less expensive to resolve. Quality specifications are met, which produces greater yields, lower costs, and higher reliability. And, once HALT is implemented using best practices, managers can see its value and incorporate HALT into future new-product development plans.

Figure 1. Adding HALT to Product Development Speeds Time to Market

The main idea behind the HALT process is to break the product to reveal weaknesses. The weaknesses found during HALT can then be analyzed for root cause and later evaluated for mitigation.

A common error is to stress test the product only to, or slightly above, the specified temperature. If the product is specified at 0°C to 80°C, many engineers will test only to these limits or perhaps from -10°C to 90°C. Go further, in steps. Testing to -40°C or even -60°C on the cold side and to 140°C on the hot side is recommended. If there are weaknesses in the design, a failure will more readily occur at these higher stress levels.

Temperatures from -40°C to 140°C should be used, even for products with much lower temperature specifications such as 0° to 40°C. The engineering team then can evaluate the relevance of failures found in HALT and implement corrective action if necessary.

Vibration is much the same. Operate the table to at least 60g rms (2 Hz to 5 kHz) on the high side. Many designers balk at stressing above 20g rms, knowing that their product will reside in a typical office environment where vibration is minimal. In this case, the design specification will be very low; stressing to 20g rms ensures there is a little margin in the test. It is better to stress to at least 60g rms.

Is there a best stress-test temperature and vibration profile that suits everything? Not necessarily for all products, but there is a very good profile that meets the needs of many products (Figure 2).

Figure 2. HALT Stress Profile Accommodates Many Products

The best practices recommended are the following: Close the chamber doors and verify that the product is turned on and operating normally. While the product is in normal operation, set the chamber to room temperature and the vibration to 5g rms to 10g rms and run for 5 minutes. This low level of vibration can mechanically loosen any small hardware that might fall in a tight place and cause damage.

If you see or hear hardware flopping around, stop the vibration, open the door, and fix what’s loose. This helps to ensure that all the instrumentation cables and connectors are fastened correctly and there is no loose hardware in the chamber that could lead to damage at higher vibration levels later in the testing.

Next, stop the vibration. Lower the chamber temperature to -10°C. Let the product ramp there and then soak for 10 minutes. In most cases, longer soak times are not necessary.

Monitor the instrumentation to verify that the product is operating normally. Continue lowering the temperature in 10°C to 20°C steps until you reach -40°C (or lower). Monitor the instrumentation throughout. In HALT, the idea is to stress to failure, not to verify in-spec performance.

After the soak at the lowest temperature, ramp to 40°C and soak for 10 minutes. Increase the temperature in 10°C to 20°C steps until you reach 140°C. Continuously monitor the instrumentation for normal operation. Ramp to 20°C.

Begin vibration from 10g rms to 60g rms in 10g-rms steps with 10-minute dwells. Monitor the instrumentation throughout. At this point, you have applied temperature and vibration as separate stressors.
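The separate-stressor steps described above can be written down as a simple schedule. The following Python sketch is illustrative only: the helper name step_levels and the choice of 10°C temperature steps (the text allows 10°C to 20°C) are assumptions, while the end points, the 10g-rms vibration increments, and the 10-minute dwells come from the profile described here. Ramp times are not modeled.

```python
def step_levels(start, stop, step):
    """Return stress levels from start to stop (inclusive), moving toward stop.

    If the step does not land exactly on stop, the last level returned is
    the final one that does not overshoot stop.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    direction = 1 if stop >= start else -1
    levels = []
    level = start
    while (level - stop) * direction <= 0:
        levels.append(level)
        level += direction * step
    return levels


DWELL_MIN = 10  # soak/dwell time per step, from the text

cold_steps_c = step_levels(-10, -40, 10)   # cold step stress
hot_steps_c = step_levels(40, 140, 10)     # hot step stress
vib_steps_grms = step_levels(10, 60, 10)   # vibration step stress at 20 C

# Dwell time only; chamber ramp time between steps is extra.
thermal_dwell_min = (len(cold_steps_c) + len(hot_steps_c)) * DWELL_MIN
vibration_dwell_min = len(vib_steps_grms) * DWELL_MIN

print("Cold steps (C):", cold_steps_c)
print("Hot steps (C):", hot_steps_c)
print("Vibration steps (g rms):", vib_steps_grms)
print("Dwell minutes (thermal, vibration):", thermal_dwell_min, vibration_dwell_min)
```

Choosing 20°C steps instead shortens the schedule but gives coarser resolution on where a failure first appears.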

Now, combine the two stressors. Lower the temperature to 0°C and set the vibration to 20g rms. Dwell for 10 minutes while monitoring the instrumentation. Repeat at 40g rms and 60g rms. Move to the next temperature in the profile and repeat the vibration steps. Products will have resonances that vary at different temperatures so vibrating at many temperatures may precipitate surprise failures.
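The combined-stress portion amounts to running the same three vibration levels at each temperature in the profile. A minimal Python sketch of that test matrix follows. The function name combined_steps is hypothetical, and the temperature list in the example is a placeholder: only the 0°C starting point, the 20g, 40g, and 60g rms levels, and the 10-minute dwell are taken from the text.

```python
def combined_steps(temperatures_c, vibration_grms=(20, 40, 60), dwell_min=10):
    """Build the combined temperature/vibration matrix as a list of steps.

    All vibration levels are run at one temperature before moving to the
    next temperature, matching the order described in the text.
    """
    return [
        {"temp_c": temp, "vib_grms": grms, "dwell_min": dwell_min}
        for temp in temperatures_c
        for grms in vibration_grms
    ]


# Placeholder temperatures for illustration; use the ones in your own profile.
example_temperatures_c = [0, -20, -40]

matrix = combined_steps(example_temperatures_c)
total_dwell_min = sum(step["dwell_min"] for step in matrix)

for step in matrix:
    print(step)
print("Total dwell time (min):", total_dwell_min)
```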

After completing the steps at 140°C, move to the final section of the stress profile. Here the high limit is 10°C lower (130°C) and the low limit is 10°C higher (-30°C) than in the main profile. This accommodates some chamber overshoot during rapid temperature changes over a wide range.

Run the vibration at 10g rms throughout this final ramp test. When done, set the temperature to 20°C and the vibration to 0g rms and wait until the product has cooled and can be handled. One HALT stress test is complete.

Remove the product from the chamber and inspect for damage. Place the second test unit in the HALT chamber and repeat the test profile.

As the test runs, take careful notes of what’s happening when a failure occurs. Note in the time log the temperature and vibration levels when the product fails. If you are applying additional stressors such as DC supply voltage, clock frequency, or (for products with heavier power usage) AC line voltage, frequency, and phase, note those as well.
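One way to keep the time log consistent is to record every failure in the same structure. The Python sketch below is an assumption about how such a record might look, not a prescribed format: the class name FailureLogEntry, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FailureLogEntry:
    """One line of the HALT time log, captured at the moment of failure."""
    elapsed_min: float          # time since the start of the test
    temperature_c: float        # chamber temperature at failure
    vibration_grms: float       # vibration level at failure
    description: str            # what was observed
    other_stressors: dict = field(default_factory=dict)  # e.g. supply voltage

    def summary(self):
        """Return a one-line, human-readable version of the entry."""
        line = (
            f"t+{self.elapsed_min:g} min | {self.temperature_c:g} C | "
            f"{self.vibration_grms:g} g rms | {self.description}"
        )
        extras = ", ".join(f"{k}={v}" for k, v in self.other_stressors.items())
        return f"{line} | {extras}" if extras else line


# Hypothetical example values, for illustration only.
entry = FailureLogEntry(95, -30, 0, "display goes blank", {"dc_supply_v": 4.5})
print(entry.summary())
```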

Units Required for a Well-Done HALT

In performing HALT, you will need four or five units, one being a gold unit. The gold unit is used when a failure is difficult to diagnose. By simply inserting the gold unit into the test setup, you can learn whether the removed product has the failure or whether the interconnections or test setup were at fault.

Do not HALT the gold unit. If you don’t stress the gold unit, it can be sold.

If you HALT one unit, you probably will have several failures. You can’t easily know if the failure is design, manufacturing, component, or software related. By HALTing three or four units, hardware design- and software-related failures will repeat very closely under similar stress levels. Manufacturing-related failures may occur only one time in a set of three or four test samples. Component failures may or may not repeat.
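The repeatability reasoning above can be turned into a first-pass triage hint. The Python sketch below is only that, a hint to guide root-cause analysis, not a diagnosis. The function name triage and the tolerance used to decide that stress levels are "similar" are assumptions that do not come from the text.

```python
def triage(failure_levels, units_tested, tolerance=10.0):
    """Suggest a likely failure category from how often a failure repeated.

    failure_levels: the stress level at which each affected unit failed
                    (one entry per unit that showed this failure).
    units_tested:   number of units run through HALT (not the gold unit).
    tolerance:      assumed maximum spread, in the same units as the stress
                    levels, for failures to count as "similar".
    """
    count = len(failure_levels)
    if count == 0:
        return "not observed"
    if count == 1:
        return "possible manufacturing or component defect"
    spread = max(failure_levels) - min(failure_levels)
    if count == units_tested and spread <= tolerance:
        return "likely design or software related"
    return "inconclusive; component failures may or may not repeat"


# Hypothetical observations from three HALTed units.
print(triage([-40, -40, -30], units_tested=3))
print(triage([120], units_tested=3))
print(triage([60, 20], units_tested=3))
```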

Each failure must be analyzed to its root cause by the HALT team. Sometimes root causes are easily determined. The corrective actions can be defined and verified later.

Strategy for ReHALT

If a failure is relatively simple, the team may determine that the fix does not need to be run through a full HALT again. If the failure requires major repairs like layout changes and component changes, a reHALT is a must. This will verify that the change was an acceptable fix and did not cause new problems.

One unit is required for the reHALT. More usually are not needed because only the physical redesign changed.

Hard and Soft Failures

When collecting failure data, note the hard and soft failures. Hard failures stay in the failure mode even when the stressors are removed. A soft failure recovers when the stressors are lowered or removed completely.

Soft failures actually can be turned on and off by changing stress levels. When you find a soft failure, note the test specs and then reduce the stress levels 10% to 20%. When the product recovers, return to the stress levels where the soft failure occurred. If this is a true soft failure, it should fail again. Reduce the stress levels again and obtain a second soft failure to ensure you have repeatability.
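The confirmation procedure above is a short loop: back off, check for recovery, return to the failure level, and check that the failure comes back. The Python sketch below shows that loop against a simulated product. Everything named here is hypothetical, including confirm_soft_failure, the fails_at callback, the 15% reduction (chosen from the 10% to 20% range in the text), and the simulated 45g rms threshold. A percentage reduction is applied to vibration here; for temperature you would need to decide what a 10% to 20% reduction means relative to ambient.

```python
def confirm_soft_failure(fails_at, failure_level, reduction=0.15, repeats=2):
    """Return True if the failure recovers and reappears on every repeat.

    fails_at:      callable taking a stress level and returning True if the
                   product is failing at that level (hypothetical hook into
                   your own monitoring).
    failure_level: stress level at which the failure was first seen.
    reduction:     fraction by which to lower the stress to look for recovery.
    """
    reduced_level = failure_level * (1 - reduction)
    for _ in range(repeats):
        if fails_at(reduced_level):
            return False  # no recovery at reduced stress: behaves like a hard failure
        if not fails_at(failure_level):
            return False  # failure did not reappear: not repeatable
    return True


def soft_product(level_grms):
    """Simulated product that fails only at or above 45 g rms."""
    return level_grms >= 45


def hard_failed_product(level_grms):
    """Simulated product that stays failed regardless of stress level."""
    return True


print(confirm_soft_failure(soft_product, failure_level=50))
print(confirm_soft_failure(hard_failed_product, failure_level=50))
```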

Make careful note of what failed and what the failure stress levels were. This will be very helpful when doing HASS in manufacturing. If a soft failure that is expected in HASS disappears or new soft failures appear, something has changed that needs looking into. This detected change might be an early indicator of a new problem.

Planning for HALT

HALT doesn’t just happen all by itself. Planning for HALT begins with the reliability engineer on the project meeting with the lead designer. They determine who should attend HALT planning meetings and take part in the actual HALT. The team should be composed of the lead designer, the lead software engineer, a test engineer, a manufacturing engineer, and the reliability engineer. The first HALT planning meeting should be scheduled 12 weeks before the test so that samples will be ready for the actual HALT. Then, the team should meet weekly.

The reliability engineer chairs the meetings and takes notes for later distribution. During the first meeting, action items with completion dates will be created for team members and even non-team members. The reliability engineer should track the status of the action items.

Subsequent meetings often will be brief. Sometimes only a few action items must be completed in a week so the reliability engineer can just phone each team member to get the current status. In some weeks, no meeting may be necessary.

A final HALT planning meeting should take place before the actual HALT start day to ensure that everyone is in agreement with what is needed, all the tasks are done, and what was done meets the needs of the testing. Have all the planned apparatus in the HALT lab a few days before the HALT start. This eliminates last-minute scrambles that delay the start of the HALT.

Make sure to have an adequate supply of spare parts on hand for repairing samples. Remember to disable the thermal shutoffs and protection circuits to ensure that high/low temperatures can be reached during HALT. Also, review the results of previously tested products one last time, looking for any information that could help optimize the pending tests.

During the HALT week, conduct small meetings in the lab to modify the test plan. Even with all the original planning, things change due to surprises discovered in early testing.

HALT Report

The reliability engineer writes the HALT report. It should distill everything that occurred during the HALT. Generally, good report-writing techniques will suffice, but a few pointers here will help. Note members of the team as well as additional contributors. List all the instrumentation used to monitor the samples during testing including software or firmware revisions. Add temperature and vibration graphs to assist in understanding what happened during the test. Include the stress test profile and any deviations from the plan.

Take many photographs of the hardware and all the interconnections. These photos will be very helpful when describing failures. Also, the photos will be valuable in setting up the test apparatus if a reHALT is needed.

Put all the important information on the first page of the report so your manager can efficiently scan the test results. Describe both hard and soft failures and the number of units HALTed. List what is planned to fix the failures, why, and how long it will take. Also, explain any failures that the team decided not to fix and why.

HALT Closure

As a follow-up to the HALT report, note which action items were completed and verified, when, and whether a reHALT was needed. Closure ensures that there is one document that describes the total activities performed on the product. It creates a one-stop location for finding out what happened. This is especially helpful when embarking on a new product similar to one already tested. There is no need to reinvent the test.

Prepare a chart that describes the stress levels applied to the product and at what stress levels hard or soft failures occurred. A bar chart will make it easy to place the results of one product after another to get a graphical view of the stresses and the failures encountered during HALT.

Establishing HASS Stress Levels

As a final step in the HALT, establish the best stress levels for HASS. At this point, you know which stressors to apply and which levels will not remove an excessive amount of life from the product after it has been manufactured. This process is called proof of screen.

The idea is to determine the temperature and vibration stress levels that, when applied to the product, will not cause a failure or remove too much life, assuming that the product was manufactured correctly. Stress testing any product will remove some life from the end of its expected life. So if you remove 5% of the life of a product with a 20-year life, it becomes a 19-year product.

For example, you have decided to stress a product at -40°C to 80°C and up to 30g rms vibration for two full cycles. You place one unit in the test chamber and run your profile for many cycles (Figure 3). If the product fails in five or six cycles, your somewhat arbitrary HASS profile is clearly too severe. If the product doesn’t fail until you have run 100 HASS cycles, then the HASS is too weak. You need stress levels in between. If you perform HASS on a product and it fails in 20 cycles, you have it about right.

Figure 3. HASS Stress Profile

If one HASS cycle removes 5% of the life, every additional cycle removes another 5%. So in 20 cycles, the wear-out accumulates to 100%, and the product fails. This is what you want (Reference 1).
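The arithmetic behind this can be checked with a few lines of Python. The sketch assumes, as the 5%-per-cycle reasoning here implies, that every cycle removes the same fraction of life; the function name remaining_life_years is hypothetical.

```python
def remaining_life_years(design_life_years, cycles_to_failure, screen_cycles):
    """Estimate life left after a number of HASS cycles.

    Assumes each cycle consumes an equal share of life, so a product that
    fails after cycles_to_failure cycles loses 1/cycles_to_failure per cycle.
    """
    if cycles_to_failure <= 0:
        raise ValueError("cycles_to_failure must be positive")
    consumed_years = design_life_years * screen_cycles / cycles_to_failure
    return max(design_life_years - consumed_years, 0.0)


# A 20-year product that fails after 20 cycles loses 5% (one year) per cycle.
print(remaining_life_years(20, cycles_to_failure=20, screen_cycles=1))   # 19.0
print(remaining_life_years(20, cycles_to_failure=20, screen_cycles=20))  # 0.0
```

Under the same assumption, a profile that fails a unit in five cycles would consume 20% of life per cycle, which is why the text calls such a screen too severe.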

Reference

1. Levin, M.A. and Kalal, T.T., Improving Product Reliability: Strategies and Implementation, 2003.

About the Authors

Ted Kalal is a reliability consultant for global clients, focusing on teaching product designers about proactive design tools that improve reliability. He has authored several papers on electronic circuitry and HALT and holds a patent in the field of power electronics. 1624 Old Course Dr., Plano, TX 75093, 972-447-8693, e-mail: [email protected]

Wayne Tustin founded Equipment Reliability Institute, a specialized engineering school, in 1995. His short courses address vibration and shock measurement, analysis, and testing. Equipment Reliability Institute, 1520 Santa Rosa Ave., Santa Barbara, CA 93109, 805-564-1260, e-mail: [email protected]

Ken Duncan is co-owner of Reliant Labs and has more than 10 years of testing expertise in networking and telecommunication equipment, power supplies, medical equipment, and a range of consumer electronic products. Mr. Duncan obtained a B.S. in industrial technology with a concentration in quality assurance from California State University in San Jose. Reliant Labs, 925 Thompson Place, Sunnyvale, CA 94085, 408-737-7500

March 2009
