Automated Fault Diagnosis Ensures Product Success

SPICE-based fault diagnosis eliminates potential product failures early in the design cycle, so they don’t occur in the field.

Tim Ghazaleh, Director of Marketing, Intusoft, Carson, Calif.

April 1, 2008

A product's future can be at stake if any of its parts fails through a short, open or stuck-at condition. To avoid such problems, product designers should understand the cause and effect of part failures, as well as the potential risk to the end user. Designers should not have to discover these failures in the field; they should be able to predict them during the design process. Automated fault diagnosis makes this possible.

Automated fault diagnosis capability is now available with SPICE-simulation tools for analog and mixed-signal designs. In the past, the lack of this capability forced designers to insert circuit faults painstakingly, one at a time, using design-automation tools. Or they might resort to unconventional measures on the circuit board itself, with marginal results.

For example, a designer could insert a super-high resistance value (1 GΩ) to simulate an open condition, or insert a wire to simulate a shorted component. Although that is all right for simulation, it is potentially dangerous with a prototype board. Or, the designer could use a heat source to temporarily raise a transistor's temperature on a circuit board. However, inserting faults in a complex IC can create its own set of problems. Manually pursuing fault coverage under any of these conditions could take weeks of setup and attempted measurement, plus the burden of tracking an immense amount of design-alteration data.
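The two stand-in values mentioned above can be tried on a trivial circuit. The following is a minimal Python sketch, not SPICE and not any tool's actual method. It assumes a hypothetical 10 V source feeding a 10 kΩ/10 kΩ resistive divider, uses the 1 GΩ value from the text for an open, and assumes 1 mΩ as the resistance of a shorting wire.

```python
# Hypothetical circuit: 10 V source into an unloaded 10 kOhm / 10 kOhm divider.
V_IN = 10.0
R_NOMINAL = 10e3
R_OPEN = 1e9     # 1 GOhm stands in for an open circuit (value from the article)
R_SHORT = 1e-3   # 1 mOhm stands in for a shorting wire (assumed value)


def divider_out(v_in, r_top, r_bottom):
    """Output voltage of an unloaded resistive divider."""
    return v_in * r_bottom / (r_top + r_bottom)


nominal = divider_out(V_IN, R_NOMINAL, R_NOMINAL)
bottom_open = divider_out(V_IN, R_NOMINAL, R_OPEN)    # output rises toward V_IN
bottom_short = divider_out(V_IN, R_NOMINAL, R_SHORT)  # output collapses toward 0 V

for label, value in [("nominal", nominal),
                     ("bottom resistor open", bottom_open),
                     ("bottom resistor short", bottom_short)]:
    print(f"{label}: {value:.6f} V")
```

Using finite stand-in values rather than a true open or a zero-ohm short keeps the arithmetic well defined, which is the same reason such values are convenient in a simulator.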

Automated fault diagnosis spares the designer from such laborious manual tasks. It provides a convenient means of first assigning pass/fail limits to a component's electrical operation and signal levels, such as maximum power, root-mean-square voltage, transistor collector current, rise time and propagation delay. Then the designer assigns shorts, opens and stuck-at values to passive, discrete and active devices. The designer also can assign fault levels for device temperature and power-supply sources. After entering these adverse conditions, the designer can run fault simulations, which automatically simulate all prescribed faults, record designated measurements with pass/fail status and flag out-of-tolerance component conditions.
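The workflow just described (assign limits, assign faults, run every fault, record pass/fail) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the circuit is a hypothetical two-resistor divider solved by hand rather than by SPICE, and the names `Limits`, `FAULTS`, `LIMITS`, `simulate` and `run_faults` are invented for this sketch, as are all component values and limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Pass/fail window for one measurement (hypothetical helper)."""
    low: float
    high: float

    def passes(self, value):
        return self.low <= value <= self.high


# Assumed nominal design: V1 drives R1 in series with R2; output is across R2.
NOMINAL = {"V1": 10.0, "R1": 10e3, "R2": 10e3}

# Each fault overrides one nominal parameter (stand-in values are assumptions).
FAULTS = {
    "none": {},
    "R1 open": {"R1": 1e9},
    "R2 short": {"R2": 1e-3},
    "V1 stuck at 4.9 V": {"V1": 4.9},
}

# Assumed test limits: output voltage in volts, R1 dissipation in watts.
LIMITS = {"vout": Limits(4.5, 5.5), "p_r1": Limits(0.0, 0.005)}


def simulate(params):
    """Closed-form DC solution of the divider; a real flow would call SPICE."""
    current = params["V1"] / (params["R1"] + params["R2"])
    return {"vout": current * params["R2"],
            "p_r1": current ** 2 * params["R1"]}


def run_faults():
    """Apply every fault in turn and record (value, passed) per measurement."""
    report = {}
    for name, overrides in FAULTS.items():
        measured = simulate({**NOMINAL, **overrides})
        report[name] = {key: (value, LIMITS[key].passes(value))
                        for key, value in measured.items()}
    return report


if __name__ == "__main__":
    for fault, results in run_faults().items():
        flags = ", ".join(f"{k}={'PASS' if ok else 'FAIL'}"
                          for k, (_, ok) in results.items())
        print(f"{fault}: {flags}")
```

Keeping each fault as a small override of the nominal parameters means the fault list can grow without touching the simulation or reporting code.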

Fault Simulation Setup

The designer can prepare a fault simulation in either the time or frequency domain. Design simulation setup is done quickly by dialog-box entry, specifying the start/stop points in time for transient analysis, or the start/stop frequency and the number of points (per decade, per octave or linearly spaced) for frequency-domain analysis.
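To make the "points per decade" setting concrete, here is a small Python sketch that builds a logarithmically spaced frequency list from a start frequency, a stop frequency and a points-per-decade count. The function name `decade_sweep` is hypothetical, and the exact placement of points in any particular simulator may differ from this sketch.

```python
import math


def decade_sweep(f_start, f_stop, points_per_decade):
    """Return log-spaced frequencies from f_start to f_stop (inclusive)."""
    n_decades = math.log10(f_stop / f_start)
    n_points = int(round(n_decades * points_per_decade)) + 1
    return [f_start * 10 ** (k / points_per_decade) for k in range(n_points)]


# Assumed example: 10 Hz to 100 kHz with 10 points per decade.
frequencies = decade_sweep(10.0, 100e3, 10)
print(len(frequencies), frequencies[0], frequencies[-1])
```

A per-decade sweep spends the same number of points on 10 Hz to 100 Hz as on 10 kHz to 100 kHz, which is usually what a Bode-style plot needs.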

Next, the designer assigns fault conditions. Fig. 1 illustrates this process for a UC1524 PWM controller IC design. The designer can perform this on any device, like R3 in Fig. 1. The type of prescribed fault appears in the “failure mode” dialog for R3 and, optionally, for all like parts (e.g., all resistors). The designer can also do this for power sources, the PWM IC and the device temperature. The “failure parameters” dialog enables the designer to change default values for short and open faults, as well as stuck-at values. With fault conditions assigned in this fashion, only one fault (L1 short) is initially enabled in the failure-mode dialog box, out of the larger collection of faults previously established.

Next, the designer should select the electrical properties of components and signal lines to monitor during post-fault simulations, including pass/fail status. Fig. 2 includes several signals and devices selected to record their “final value” post-fault simulation. The drop-down list provides several other selectable measurements. The fault run (time domain) was prescribed over a 1-msec period.

Fig. 3 illustrates the resulting measured final-value readings (measurements column) for several signal lines and devices, including their pass/fail status by colored histogram bars. Actual min/max test limits will be described later. For now, default values are automatically assigned by the simulator. In the lower-right corner of the dialog box, L1 short was selected from a drop-down list of several possible faults.

Multiple faults enabled for simulation appear in Fig. 4. Note that four new faults were enabled (Q1 open, R7 stuck at 1 kΩ, R3 short and source V1 stuck at 4.9 V). Fig. 5 shows the simulation results, with display waveforms, pass/fail status and measured results.

Fig. 5 has all five enabled faults selected for analysis and displays the R7 stuck-at condition. After simulation, the designer can choose any of the enabled faults from the list at the top, which instantly shows the corresponding changes in the measurement values in the main portion of the dialog box for the prescribed devices and signals.

Next, observe that a detailed breakout of the devices and signals is on the left of the dialog, just under the shaded final-value text. Selecting one of these entities causes the instant display of the corresponding measurement and pass/fail conditions on the right for all five fault conditions. Both ways of displaying measurements and pass/fail status (selecting all devices and signals as a collective group as shown in Fig. 5, or selecting individual ones to display data from all enabled faults) provide a fast what-if analysis of how faults affect critical devices and signal lines. The simulation waveforms at the top of the design show the current through R7 and the voltage on signal V(18). Such waveforms also can be displayed in real time during fault runs.

Establishing Design Test Limits

Establishing electrical test limits for devices and signal lines is an important aspect of fault simulation, because a faltering signal could lead to catastrophic product operation. The first option for establishing test limits is to probe anywhere in the circuit after a nominal simulation and view signal waveforms. From this the designer might ascertain acceptable deviation for pass/fail test limits. Alternatively, the designer could interpolate such limits from final or maximum values charted after simulation. Other ways of establishing test limits make use of acceptance criteria from product specifications and component data sheet ratings (e.g., device maximum power).

A production-oriented method for establishing test limits is to first run a Monte Carlo statistical analysis, which randomly varies component parameters through their tolerance range (using a Gaussian, bimodal or other distribution), then examines devices and signal lines for a corresponding change in behavior. Graphical representation of Monte Carlo signal behavior, such as a curve family, enables interpolation of maximum variations within the waveform's envelope caused by the design's altered component tolerances.

The designer also can select a histogram bar graph from the Monte Carlo simulation that shows how many runs fell within a range (bin) of signal values (e.g., seven runs fell within 1.25 mV to 1.3 mV). If necessary, once the design is tweaked enough that signal variations are acceptable, the designer can use the min and max values for this range to establish pass/fail limits on signal lines and devices. Besides showing how the design will perform with device variation in production, the test limits can serve as pass/fail criteria when injecting faults into the design simulation.
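The Monte Carlo approach to test limits can be illustrated with a short Python sketch. It is not a SPICE run: it assumes a hypothetical 10 V source and a 10 kΩ/10 kΩ divider whose resistors vary with a Gaussian distribution, with 3σ taken as an assumed 5% tolerance. The function names and the run count are invented for this example.

```python
import random


def divider_out(v_in, r_top, r_bottom):
    """Output voltage of an unloaded resistive divider."""
    return v_in * r_bottom / (r_top + r_bottom)


def sample(nominal, tolerance, rng):
    """Draw a Gaussian value, treating the tolerance as a 3-sigma spread."""
    return rng.gauss(nominal, nominal * tolerance / 3.0)


def monte_carlo(runs=200, tolerance=0.05, seed=1):
    """Vary both resistors independently and collect the output voltage."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [divider_out(10.0,
                        sample(10e3, tolerance, rng),
                        sample(10e3, tolerance, rng))
            for _ in range(runs)]


def histogram(values, bins=10):
    """Return (bin_low, bin_high, count) tuples spanning min to max."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, count)
            for i, count in enumerate(counts)]


if __name__ == "__main__":
    results = monte_carlo()
    # The observed extremes become candidate pass/fail limits.
    print(f"candidate limits: {min(results):.4f} V to {max(results):.4f} V")
    for bin_low, bin_high, count in histogram(results):
        print(f"{bin_low:.4f} to {bin_high:.4f} V: {count} runs")
```

Taking the observed minimum and maximum as limits is the simplest choice; a designer might instead widen them by a margin or derive them from a sigma multiple.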

A useful feature is the ability to quickly expand test limits in any number of ways following fault simulation (Fig. 6). Recall that after the earlier fault run, specific test limits were not yet established. Now the “expand to pass” option (left dialog) provides ways to open min/max test limits for all measured readings under the fault at hand (e.g., Q1 open). Corresponding colored histograms to the right reflect the new pass/fail results. The dialog shows expand-to-pass options:

Assignment of any sigma limit (3, 4, 5…), whereby 3σ is taken as a device's assigned tolerance variation (e.g., 5%).
Manual value, which the user assigns in the associated “value” box. The value could be based on design specifications, the component data sheet, signal waveform data or the designer's discretion.
Expand to pass, which sets the measured value as the high or low limit depending on the sign, or both minimum and maximum limits symmetrically if the “with symmetry” box is checked as in Fig. 6.
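The sigma and expand-to-pass options can be expressed as simple arithmetic. The Python sketch below reflects one reading of the descriptions above, not the tool's documented behavior: `sigma_limits` treats the tolerance as a 3σ spread around nominal, and `expand_to_pass` moves only the violated limit out to the measured value, or widens both limits equally about nominal when symmetry is requested. Both function names and all numbers are assumptions.

```python
def sigma_limits(nominal, tolerance, n_sigma):
    """Limits at n_sigma, where 3 sigma equals the tolerance band."""
    sigma = abs(nominal) * tolerance / 3.0
    return nominal - n_sigma * sigma, nominal + n_sigma * sigma


def expand_to_pass(low, high, nominal, measured, symmetric=False):
    """Widen (low, high) just enough that the measured value passes."""
    if symmetric:
        # Mirror the measured deviation on both sides of nominal.
        reach = abs(measured - nominal)
        return min(low, nominal - reach), max(high, nominal + reach)
    # Otherwise move only the limit on the side the reading falls.
    return min(low, measured), max(high, measured)


if __name__ == "__main__":
    # Assumed example: 5 V nominal signal with a 5% tolerance.
    low, high = sigma_limits(5.0, 0.05, 3)
    print("3-sigma limits:", low, high)
    print("one-sided:", expand_to_pass(low, high, 5.0, 2.45))
    print("symmetric:", expand_to_pass(low, high, 5.0, 2.45, symmetric=True))
```

The manual option needs no function here, since it simply replaces a limit with a value the user types in.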

Once the designer specifies the test limits, the new limits can be used to study test-limit compliance with successively selected faults; that is, to see how other faults' measured results comply with the new test limits. For example, Fig. 6 shows test limits expanded for a Q1 open fault, so that all monitored signals and devices pass. But when the earlier R7 stuck-at fault is selected, three of the pass/fail histograms change color (to red, indicating failure) despite the new expand-to-pass limits for Q1 open.
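Checking one fault's readings against limits expanded for another fault amounts to a simple comparison, sketched below in Python. Every number here is a made-up placeholder, not a value from the figures; only the signal names V(18) and the R7 current echo the article, and P(Q1) is an invented third measurement. The helper names are likewise hypothetical.

```python
# Placeholder nominal values and fault readings (NOT taken from the figures).
NOMINAL = {"V(18)": 5.0, "I(R7)": 0.002, "P(Q1)": 0.15}
READINGS = {
    "Q1 open": {"V(18)": 0.2, "I(R7)": 0.0001, "P(Q1)": 0.0},
    "R7 stuck-at": {"V(18)": 7.9, "I(R7)": 0.006, "P(Q1)": 0.4},
}


def limits_expanded_for(nominal, reading):
    """Stretch each limit from nominal out to the reading so that it passes."""
    return {name: (min(nominal[name], value), max(nominal[name], value))
            for name, value in reading.items()}


def failures(limits, reading):
    """Names of measurements that fall outside their limits."""
    return [name for name, value in reading.items()
            if not limits[name][0] <= value <= limits[name][1]]


limits = limits_expanded_for(NOMINAL, READINGS["Q1 open"])
print("Q1 open failures:", failures(limits, READINGS["Q1 open"]))
print("R7 stuck-at failures:", failures(limits, READINGS["R7 stuck-at"]))
```

With these placeholder values, the limits widened for one fault still reject the other fault's readings, which is the kind of cross-check the dialog makes quick to perform.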

Product Safeguards

So what does all this have to do with design reliability? Beyond reliability analyses such as Monte Carlo and worst case, not much directly. However, fault simulation deliberately subjects a design to adverse conditions in several possible ways, which ultimately enables the designer to view measured data that could be critical to production or field operation. Safeguards can then be built into the design to help circumvent hazardous product operation if a fault were actually to occur. For instance, if fault simulation shows that a possible component failure could drop a signal below a certain voltage, then special electronics could be built into the design to take a prescribed course of action. Examples include an LED indication that a fault condition occurred, invocation of equipment shutdown and activation of backup electronics.

Examine how a design will operate under faulty conditions early in a product's design-simulation phase. Historically, SPICE tools have provided powerful ways of varying component tolerances and temperature to study signal variation for manufacturing compliance. This method and other means also serve to establish pass/fail criteria for acceptable operation under fault conditions. In the end, incorporate product safeguards early in the design cycle to help eliminate possible damage from faults encountered in production and in the field. The process not only saves weeks to months of time compared with traditional attempts at fault diagnosis, but also is far more effective in ensuring accurate and thorough fault coverage.