Leading-Edge Diagnostic Tools Help Ramp Up SoC Production

DESIGN VIEW is the summary of the complete DESIGN SOLUTION contributed article, which begins on Page 2.

There's a direct relationship between the quickest route to volume IC production and profitability. That said, today's SoC designs demand more elaborate testing and the application of more types of tests. Thus, two key factors in ramping up to volume production are how quickly you can troubleshoot errors in first silicon and how quickly you can reach a sufficient target process yield. Failure diagnostics plays a vital role in both of these areas.

Traditional analysis methods are less productive with technology trends such as shrinking feature sizes, and conventional instruments are too cumbersome when isolating root-cause logic failures without precision guidance. This is where expert systems can step in to help track failure trends and provide warnings when problems arise. The more automated the link between the tester, diagnostics software, and root-cause analysis, the faster products can be ramped up to volume.

Scan technology is a key enabler of effective diagnostics. Scan cells are used instead of standard flip-flops or latches for internal sequential elements. In functional mode, the cells operate as standard flip-flops or latches. In test mode, they operate as shift registers, enabling control and observability at each sequential element. Thus, a 10-million-gate SoC may have nearly 500,000 internal control and observe points using scan technology.

This article delves into the use of automatic-test-pattern-generation (ATPG) tools in scan technology and the use of built-in self-test (BIST) methods when testing SoC memory arrays. It also discusses logic BIST diagnostics and stresses that electronic design automation and automatic-test-equipment vendors are partnering to develop new solutions for leading-edge design-for-test and BIST diagnostics.

HIGHLIGHTS:
How Diagnostics Work	Scan technology is key to effective diagnostics. Scan cells naturally partition the complex function and sequential nature of an SoC into small combinational blocks. ATPG tools use the failing scan cells' values to run an ATPG-based diagnosis.
Diagnostics For Memories	Memory arrays within SoCs are often tested with on-chip memory BIST controllers. These controllers can easily report failing data and addresses that don't match expected values. Many times, a separate diagnostics port running from a clock other than the system clock is required.
Logic BIST And Embedded Compression	Logic BIST is almost always based on scan technology. However, the scan chains are loaded automatically by on-chip logic, and the results are compressed into a signature on-chip. Logic BIST is used when tester access is impractical or impossible. Use of logic BIST will greatly simplify tester requirements and test-program data for pass/fail testing.
Design For Test Is A Must	Without guidance from a software-based diagnostic system, traditional failure analysis methods become ineffective guesswork in the face of increasing device complexity.

Full article begins on Page 2

There’s a direct relationship between the quickest route to volume IC production and profitability. That said, today’s system-on-a-chip (SoC) designs demand more elaborate testing and the application of more types of tests. Consequently, two key factors in ramping up to volume production are how quickly errors in first silicon can be debugged and how quickly a sufficient target process yield can be reached. Failure diagnostics plays a vital role in both of these areas. Moreover, automatic diagnostics must be available for failures in any of these tests to aid in a quick root-cause analysis.

Conventional failure-analysis instruments are too cumbersome when isolating root-cause logic failures without precise guidance. And technology trends, such as shrinking feature sizes and increasing numbers of metal layers, are rendering traditional analysis methods (e.g., visual inspection, signal acquisition) less productive. However, expert systems can help track failure trends and provide warnings when problems arise. The more automated the link between the tester, diagnostics software, and root-cause analysis, the faster products can be ramped up to volume, and thus the faster the track to profitability.

The ability to debug failures in first silicon is vital for most companies developing SoCs, even if the chips are fabricated elsewhere. This is because the first chips produced might not work. If that is the case, all production must stop until the source of the problem is determined.

Of course, companies that run fabrication lines know about the necessity of perfecting new processes to a high yield and then maintaining it. They also know that automated diagnostics will dramatically reduce the time and effort needed to isolate the potential root cause of a defective chip, from several weeks to under an hour.

How Diagnostics Works

Scan technology is a key enabler of effective diagnostics. The technique employs scan cells instead of standard flip-flops or latches for internal sequential elements. The scan cells operate as standard flip-flops or latches in functional mode. In test mode, they operate as shift registers and enable control and observability at each sequential element. Consequently, a 10-million-gate SoC may have nearly 500,000 internal control and observe points using scan technology.
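The two operating modes of a scan cell can be pictured with a small behavioral model. This is only a sketch: the `ScanChain` class and its method names are illustrative assumptions, not part of any real DFT tool, and clocking details are ignored.

```python
class ScanChain:
    """Toy behavioral model of a scan chain made of 1-bit scan cells."""

    def __init__(self, length):
        self.cells = [0] * length

    def capture(self, values):
        # Functional mode: every cell loads its own data input in parallel,
        # exactly as an ordinary flip-flop would.
        if len(values) != len(self.cells):
            raise ValueError("one value per cell is required")
        self.cells = list(values)

    def shift(self, scan_in_bit):
        # Test mode: the cells form a shift register. The last cell drives
        # scan-out and every other cell passes its value to its neighbor.
        scan_out_bit = self.cells[-1]
        self.cells = [scan_in_bit] + self.cells[:-1]
        return scan_out_bit

    def load(self, bits):
        # Shift a whole vector in and return what was shifted out meanwhile.
        return [self.shift(b) for b in bits]


chain = ScanChain(4)
chain.capture([1, 0, 1, 1])          # response captured in functional mode
unloaded = chain.load([0, 0, 0, 0])  # observed by shifting in test mode
print(unloaded)
```

The cell nearest scan-out emerges first, so the unloaded vector is the captured vector in reverse order. Loading a new vector while unloading the old one is what gives control and observability at every sequential element.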

The scan cells naturally partition the complex function and sequential nature of an SoC into small combinational logic blocks (Fig. 1). A tester will shift out the scan-cell values for comparison to known-good values. If a failure occurs, the internal location (scan cell) that mismatched at the tester is a good starting point for diagnostics tools. Automatic-test-pattern-generation (ATPG) tools can then use the failing scan cells’ values to run an ATPG-based diagnosis on the small combinational logic feeding the failing scan cells.
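The comparison against known-good values can be sketched as follows. The data layout (bit strings keyed by pattern number and chain name, with `X` marking a bit that is not compared) is an assumption made for illustration, not a real tester format.

```python
def failing_scan_cells(observed, expected):
    """Return (pattern, chain, position) for every scan-cell mismatch.

    Both arguments map (pattern, chain) to a bit string as unloaded by the
    tester. An 'X' in the expected string marks a bit that is not compared.
    """
    fails = []
    for (pattern, chain), good in sorted(expected.items()):
        seen = observed[(pattern, chain)]
        if len(seen) != len(good):
            raise ValueError(f"length mismatch on pattern {pattern}, {chain}")
        for position, (s, g) in enumerate(zip(seen, good)):
            if g != "X" and s != g:
                fails.append((pattern, chain, position))
    return fails


# Hypothetical unload data for one pattern on two chains.
expected = {(0, "chain1"): "1011X0", (0, "chain2"): "000111"}
observed = {(0, "chain1"): "100110", (0, "chain2"): "000111"}
print(failing_scan_cells(observed, expected))
```

Each reported tuple names one failing scan cell, which is the starting point handed to the diagnostics tool.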

To ensure a high-quality test, today’s SoCs require applying multiple types of test patterns. Many patterns are created using scan and ATPG tools. The same ATPG tools that produce stuck-at, transition, and path delay at-speed tests, as well as specialized scan tests, can perform diagnostics in a similar manner. Failing scan-cell values for multiple patterns are read into the diagnostics tools and an ATPG-based analysis is performed to determine the source(s) of the failures. Failing-gate information can then be used with physical layout tools to direct physical inspection and determine the cause of the defect (Fig. 2). Knowing the root cause of the failure will identify which part of the fabrication process needs to be improved. Similarly, scan diagnostics, when applied across large numbers of devices, can also help isolate design-related problems.

To enable scan diagnosis, the automatic test equipment (ATE) must provide a datalog with the failing pattern, pin, and cycle information. Generation of this datalog could affect test time if the baseline indicates only a pass or fail condition. The tester must also be able to capture the requisite number of scan failures to be data logged.
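One way to picture the ATPG-based analysis of failing scan cells is a toy cause-effect search: simulate every single stuck-at fault in the small combinational block and keep the faults whose simulated failures match what the tester logged. The netlist, net names, and functions below are hypothetical, and commercial diagnosis tools use far more elaborate fault models and algorithms than this sketch.

```python
from itertools import product

# Hypothetical combinational block feeding two scan cells (y1, y2).
# Gates are listed in topological order: (output net, operator, input nets).
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("n2", "NOT", ("c",)),
    ("y1", "OR", ("n1", "c")),
    ("y2", "AND", ("n2", "d")),
]
INPUTS = ("a", "b", "c", "d")
OUTPUTS = ("y1", "y2")
OPS = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}


def simulate(pattern, fault=None):
    """Evaluate the block; `fault` is an optional (net, stuck_value) pair."""
    nets = dict(zip(INPUTS, pattern))
    if fault and fault[0] in nets:
        nets[fault[0]] = fault[1]
    for out, op, ins in NETLIST:
        nets[out] = OPS[op]([nets[i] for i in ins])
        if fault and fault[0] == out:
            nets[out] = fault[1]
    return tuple(nets[o] for o in OUTPUTS)


def failures(patterns, fault):
    """Set of (pattern index, output) pairs where `fault` changes the response."""
    fails = set()
    for idx, p in enumerate(patterns):
        good, bad = simulate(p), simulate(p, fault)
        fails |= {(idx, o) for o, g, b in zip(OUTPUTS, good, bad) if g != b}
    return fails


def diagnose(patterns, observed_fails):
    """Single stuck-at faults whose simulated failures match the tester's."""
    all_nets = list(INPUTS) + [gate[0] for gate in NETLIST]
    return [
        (net, val)
        for net in all_nets
        for val in (0, 1)
        if failures(patterns, (net, val)) == observed_fails
    ]


patterns = list(product((0, 1), repeat=len(INPUTS)))  # exhaustive, 16 patterns
observed = failures(patterns, ("n1", 0))  # stand-in for a real tester datalog
print(diagnose(patterns, observed))
```

The search returns several candidates because stuck-at-0 on either AND input is indistinguishable from stuck-at-0 on the AND output. This is why failing-gate information is then combined with physical layout data to direct inspection.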

Typically, it takes only a few hundred fail cycles, or fewer, to perform scan diagnosis. In terms of failing test patterns, this equates to 50 failing patterns, although diagnostics are still possible on 20 or fewer failing patterns. Capture memory of this size is well within the specifications of most ATE. The diagnostic engine, running offline, would import the scan pattern number, chain, and scan-cell position for each tester failure.

However, tester failures are typically reported in terms of failing pattern name, cycle number, and pin name. This information must be converted into the "scan domain" to be understood by the diagnostic engine. Some ATE vendors provide conversion capabilities; otherwise, in-house scripts can be written. The challenge lies in processing and diagnosing failures from multiple wafers or lots in order to separate systematic defects from random defects, and to identify those with the highest impact on yield. Software systems must be created to perform this high-value-added task.
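An in-house conversion script of the kind mentioned above might look like the following sketch. Everything about the test-program layout is assumed for illustration: the pin-to-chain map, the chain length, a fixed block of shift cycles plus one capture cycle per pattern, and the convention that block k unloads the response of scan pattern k. A real conversion depends on the actual pattern format and ATE datalog.

```python
from typing import NamedTuple


class TesterFail(NamedTuple):
    pattern_name: str
    cycle: int      # cycle offset within the named tester pattern
    pin: str


class ScanFail(NamedTuple):
    scan_pattern: int
    chain: str
    cell: int       # 0 = cell closest to scan-in


# Assumed, illustrative test-program layout (not a real ATE format).
PIN_TO_CHAIN = {"SO1": "chain1", "SO2": "chain2"}
CHAIN_LENGTH = 8
CYCLES_PER_PATTERN = CHAIN_LENGTH + 1   # 8 shift cycles + 1 capture cycle


def to_scan_domain(fail: TesterFail) -> ScanFail:
    """Convert a tester-domain fail into pattern, chain, and cell position."""
    # Each block of CYCLES_PER_PATTERN cycles unloads one pattern's response.
    scan_pattern, offset = divmod(fail.cycle, CYCLES_PER_PATTERN)
    if offset >= CHAIN_LENGTH:
        raise ValueError("fail landed on a capture cycle, not a shift cycle")
    # The cell nearest scan-out is observed first, so positions count down.
    cell = CHAIN_LENGTH - 1 - offset
    return ScanFail(scan_pattern, PIN_TO_CHAIN[fail.pin], cell)


print(to_scan_domain(TesterFail("scan_main", 20, "SO2")))
```

The output triple of scan pattern number, chain, and cell position is the form the offline diagnostic engine imports.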

Diagnostics for Memories

Memory arrays within SoCs are often tested with on-chip memory built-in self-test (BIST) controllers. These controllers can easily report failing data and addresses that don’t match expected values. Running memory BIST on ATE is rather simple and straightforward. The test challenge lies in extracting the failure information for diagnosis (e.g., bitmapping). Generally, the tester needs to determine whether or not a failure occurred. Then it extracts the failing address, data, controller state, and so forth.
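As a rough illustration of bitmapping, the sketch below turns extracted fail records into a map of failing bits per address. The record layout (address, expected data, observed data) and the 8-bit word width are assumptions; actual BIST controllers report fail information in their own formats.

```python
def failing_bits(expected, observed, width):
    """Bit positions (0 = LSB) where the read data differs from expected data."""
    diff = expected ^ observed
    return [bit for bit in range(width) if (diff >> bit) & 1]


def build_bitmap(fail_log, width=8):
    """Map each failing address to the set of failing bit positions."""
    bitmap = {}
    for address, expected, observed in fail_log:
        bits = failing_bits(expected, observed, width)
        if bits:
            bitmap.setdefault(address, set()).update(bits)
    return bitmap


# Hypothetical fail records: (address, expected data, observed data).
fail_log = [(0x10, 0xAA, 0xAB), (0x10, 0x55, 0x54), (0x2F, 0xFF, 0x7F)]
print(build_bitmap(fail_log))
```

Plotting such a map against the physical array layout is what lets an engineer see whether fails cluster in a single cell, a row, or a column.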

Many memory BIST controllers are designed to operate at full system speed and will operate from the system clock (Fig. 3). But trying to report diagnostics from the system clock may be impractical. Many times, therefore, a separate diagnostics port running from a different clock is used. Although various methods are available, including extracting the entire memory contents, this article will focus on two common approaches that extract only fail information.

The first approach is to rely solely on the IEEE 1149.1 Test Access Port (TAP). The BIST can be initialized and run through a set of TAP instructions. Upon completion, the ATE gives additional instructions to check the pass/fail result and shift out the fail information on TDO (Test Data Out). Only a limited number of fails can be stored on-chip. To extract the entire set of fail information, the process repeats itself. This method has the advantage of requiring just the TAP pins to operate. On the downside, test time is longer if more than a few fails need to be analyzed.
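The cost of repeating the process can be seen in a small model. This sketch assumes the controller can be told to ignore fails that were already reported on earlier runs; that resume mechanism, the function names, and the storage capacity are illustrative assumptions rather than features of any particular BIST controller or of IEEE 1149.1 itself.

```python
def run_bist(all_fails, skip, capacity):
    """Model one BIST run: log up to `capacity` fails after ignoring `skip`."""
    return all_fails[skip:skip + capacity]


def extract_all_fails(all_fails, capacity):
    """Rerun the BIST until every fail has been shifted out through the TAP."""
    if capacity < 1:
        raise ValueError("on-chip fail storage must hold at least one entry")
    collected, runs = [], 0
    while True:
        batch = run_bist(all_fails, len(collected), capacity)
        runs += 1
        collected.extend(batch)
        if len(batch) < capacity:   # storage did not fill, so nothing is left
            return collected, runs


# Five hypothetical failing addresses, two fail entries stored on-chip.
fails, runs = extract_all_fails([0x03, 0x11, 0x12, 0x40, 0x7F], capacity=2)
print(fails, runs)
```

Because each pass reruns the whole BIST, the number of runs, and therefore test time, grows with the number of fails to be analyzed.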

Another approach uses dedicated memory BIST pins: a diagnostic clock, an enable pin, and a fail-data-out pin. The tester provides a diagnostic clock to shift out the fail data when the enable pin is asserted. If the design permits the diagnostic clock to be continuously applied, the BIST can run and pause itself to shift out fail information, then continue. Furthermore, the pause is necessary only when a second fail occurs before a prior fail has completely shifted out. This method offers the advantage of improved test time. Figure 4 illustrates how this would look (the capture region is shaded). When the enable pin is asserted, the tester captures the fail data.
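On the tester side, capturing fail data only while the enable pin is asserted might be post-processed as in the sketch below. The serial record format (a 4-bit address followed by 4 bits of data, most significant bit first) is purely an assumption for illustration.

```python
ADDR_BITS = 4
DATA_BITS = 4
RECORD_BITS = ADDR_BITS + DATA_BITS


def decode_fail_stream(enable, fail_data_out):
    """Decode (address, data) records from per-cycle pin samples.

    `enable` and `fail_data_out` are equal-length lists of 0/1 samples, one
    per diagnostic-clock cycle. Only cycles with enable asserted are kept.
    """
    captured = [bit for en, bit in zip(enable, fail_data_out) if en]
    if len(captured) % RECORD_BITS:
        raise ValueError("captured stream ends in the middle of a record")
    records = []
    for start in range(0, len(captured), RECORD_BITS):
        bits = captured[start:start + RECORD_BITS]
        word = int("".join(map(str, bits)), 2)
        records.append((word >> DATA_BITS, word & ((1 << DATA_BITS) - 1)))
    return records


# One hypothetical fail record surrounded by idle cycles.
enable = [0, 0] + [1] * 8 + [0, 0, 0]
data = [0, 0] + [1, 0, 1, 0, 0, 0, 0, 1] + [0, 0, 0]
print(decode_fail_stream(enable, data))
```

Because the BIST keeps running between records, the tester needs only enough capture memory for the asserted-enable cycles.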

A host of other methods also perform BIST diagnostics. Key factors in selecting a diagnostic strategy are the number of available device pins and the test time required to extract the fail data. The latter can be greatly influenced by such ATE factors as capture-memory depth, pattern matching, and capture-on-fail capability.

Memory BIST diagnostics and information reported on the diagnostics port can also be optimized for memory-repair strategies. A BIST controller configured for repair may report failure information in the form of a defective row, column, or bank. This can be more directly used by the repair technique rather than reporting many failing address and data bits. The BIST controller for a repair system can also report if a repair isn’t possible based on the locations of failures.
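The idea of reporting defective rows and columns, and of flagging an unrepairable memory, can be sketched as an offline search. This is not how an on-chip repair controller is implemented; the function below is a hypothetical exhaustive search over a small fail list, assuming a simple model with a fixed number of spare rows and spare columns.

```python
def find_repair(fails, spare_rows, spare_cols):
    """Return (rows, cols) to replace so every fail is covered, or None.

    `fails` is an iterable of (row, col) failing cells.
    """
    fails = set(fails)
    if not fails:
        return (), ()
    # Any remaining fail must be fixed by replacing its row or its column.
    row, col = min(fails)
    if spare_rows:
        rest = {f for f in fails if f[0] != row}
        sub = find_repair(rest, spare_rows - 1, spare_cols)
        if sub is not None:
            return (row,) + sub[0], sub[1]
    if spare_cols:
        rest = {f for f in fails if f[1] != col}
        sub = find_repair(rest, spare_rows, spare_cols - 1)
        if sub is not None:
            return sub[0], (col,) + sub[1]
    return None


# Three fails along row 2 plus one isolated fail in column 3.
print(find_repair([(2, 0), (2, 3), (2, 5), (7, 3)], spare_rows=1, spare_cols=1))
# Three fails on a diagonal cannot be covered by one row and one column.
print(find_repair([(0, 0), (1, 1), (2, 2)], spare_rows=1, spare_cols=1))
```

The first case yields a compact report (replace row 2 and column 3) instead of four separate failing addresses, and the second returns `None`, which corresponds to the controller reporting that a repair isn’t possible.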

Logic BIST And Embedded Compression

Logic BIST is almost always based on scan technology. However, the scan chains are loaded automatically by on-chip logic, and the results are compressed into a signature on-chip as well. Logic BIST is mostly used for situations where tester access is impractical or impossible, such as in fielded system test. Use of logic BIST greatly simplifies tester requirements and test-program data for pass/fail testing.

Logic BIST diagnostics are fairly simple from an ATE perspective. If the BIST fails, as indicated by a failing signature, then the BIST logic must be placed in bypass mode so that a regular ATPG-based pattern can be applied directly to the internal scan chains. The standard scan diagnostics approach is then applied to the bypass pattern (i.e., datalog the failing pattern, pin, and cycle information). Because a second bypass pattern set must be applied, test time increases significantly. To effectively address this problem, EDA and ATE vendors must partner to develop new solutions for diagnosing logic BIST failures quickly and efficiently.

For silicon debug, logic BIST diagnostic routines can be hosted on the tester to enable a slightly more intelligent flow. Test routines would rerun the logic BIST patterns, but either in self-contained groups or self-contained patterns. Each self-contained group would load the pattern generator and signature compressor with a starting seed, and perform a comparison of the signature against a known-good signature for the group. Seed values and signatures can be applied and observed using a diagnostic port. This port will operate using a diagnostics clock to allow for an easy ATE interface. Based on the results, bypass patterns can be generated and applied for only the failing specific patterns or groups of patterns.
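The self-contained-group flow can be illustrated with a tiny model. The 4-bit LFSR pattern generator, the signature register, the seeds, and the stand-in circuits below are all assumptions chosen for brevity; real logic BIST hardware is far wider and drives scan chains rather than a single response word.

```python
WIDTH = 4
TAPS = 0b1001   # feedback taps for the toy 4-bit LFSR (period 15)


def lfsr_step(state):
    """Advance the pattern generator by one pattern."""
    feedback = bin(state & TAPS).count("1") & 1
    return ((state << 1) | feedback) & ((1 << WIDTH) - 1)


def misr_step(signature, response):
    """Fold one response word into the running signature."""
    return lfsr_step(signature) ^ response


def run_group(circuit, seed, length):
    """Run one self-contained group and return its final signature."""
    state, signature = seed, 0
    for _ in range(length):
        signature = misr_step(signature, circuit(state))
        state = lfsr_step(state)
    return signature


def good_circuit(pattern):
    return (pattern ^ (pattern >> 1)) & 0xF


def bad_circuit(pattern):
    # Hypothetical defect that is visible for a single pattern only.
    response = good_circuit(pattern)
    return response ^ 0b0100 if pattern == 0b0110 else response


GROUPS = [(0b0001, 5), (0b1101, 5), (0b1100, 5)]  # (starting seed, patterns)
golden = [run_group(good_circuit, seed, n) for seed, n in GROUPS]
observed = [run_group(bad_circuit, seed, n) for seed, n in GROUPS]
failing_groups = [i for i, (g, o) in enumerate(zip(golden, observed)) if g != o]
print(failing_groups)
```

Only the group whose signature mismatches needs bypass patterns, which is how this flow limits the extra test time spent on diagnosis.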

As mentioned earlier, SoC testing is encountering new demands to maintain product quality. This is mostly driven by a large increase in the number of test patterns necessary during test. The growing sizes of SoCs and a rise in speed-related defects associated with 0.13-µm and smaller geometries force an increase in test data and time. Logic BIST provides testing with minimal test data, but must be supplemented by deterministic tests to provide adequate test quality. Several approaches have been proposed to deal with this issue. Some suggest additional requirements and methodologies to modify the functional logic so that it can work with BIST.

Scan-based technologies, such as embedded deterministic test (EDT), are quickly being adopted to deal with the growth in pattern count without forcing changes to the functional design or tester. EDT, which involves locating minimal logic around the scan-chain I/O, uses advanced pattern-generation techniques. The result is compressed data that’s applied by the tester and expanded by the EDT logic at the scan-chain inputs (Fig. 5).

From the ATE perspective, the pattern application is the same as standard scan test patterns. The scan-chain outputs are compressed by EDT logic and shifted out for comparison to compressed good values within the tester. The pattern generation is deterministic, so the same quality of test that scan and ATPG produce can be maintained, but with up to 100 times more compression. It can also support advanced at-speed testing, such as using programmable phase-locked loops to drive the at-speed clocking.

However, EDT uses compressed outputs. As a result, it’s difficult to determine the internal scan cells that fail. This complicates the ability to perform a direct diagnosis, even though it’s not as extreme as having only a BIST signature for comparison. A simple diagnostics approach is to bypass the EDT logic and apply standard ATPG diagnostics, as with logic BIST.
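Why compressed outputs complicate direct diagnosis can be shown with a generic XOR space compactor. This is only an assumed stand-in; the text does not describe the internals of the EDT compactor, and the chain contents below are made up.

```python
from functools import reduce
from operator import xor


def compact(chains):
    """XOR the bits that leave all chains in the same shift cycle."""
    return [reduce(xor, bits) for bits in zip(*chains)]


def flip(chains, chain, position):
    """Return a copy of the chain data with one scan-cell value inverted."""
    copy = [list(c) for c in chains]
    copy[chain][position] ^= 1
    return copy


good = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
fault_a = flip(good, 0, 2)               # fail in chain 0, cell 2
fault_b = flip(good, 2, 2)               # fail in chain 2, cell 2
fault_c = flip(flip(good, 0, 2), 1, 2)   # two fails in the same shift cycle

print(compact(good), compact(fault_a), compact(fault_b), compact(fault_c))
```

The first two faults produce the same compacted output, so the tester sees a failing cycle but cannot tell which chain holds the failing cell. In this simple compactor, the double fail cancels out entirely, which is one reason bypassing the compression logic is the simple route to standard ATPG diagnostics.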

Moreover, new on-line diagnostic routines to directly diagnose EDT compactor outputs are under development. These techniques eliminate the need for additional tester time to perform on-line diagnostics without sacrificing test quality. But care must be taken not to create design-for-test (DFT) strategies that are difficult, or even impossible, to interface with ATE. Ultimately, the diagnostic scheme must make sense in a production environment.

DFT methods for diagnostics and problem resolution remain underused by the semiconductor industry. Yet the technical and economic imperatives to deploy DFT diagnostics to speed production ramps of SoC devices are increasingly forceful. The 2001 International Technology Roadmap for Semiconductors (ITRS) notes that the "need for alternatives [to traditional defect isolation methods] is driven by such factors as finding smaller, more subtle defects, tighter pitches, and increasing numbers of metal layers." Without guidance from a design-based software diagnostic system, traditional failure analysis methods become ineffective guesswork in the face of increasing device complexity. The result is time-to-volume delays and unprofitable yields. The 2001 Roadmap goes on to highlight software-based fault-localization methods, "the need for which is especially acute." Thus, software-based SoC diagnostics are becoming both challenging and necessary.

Leading-edge practices, incorporating DFT diagnostics, aim beyond diagnosis of a few devices to quick isolation of systematic defect types with the biggest potential impact on yield. This next evolution of DFT diagnostics depends on current and emerging diagnosis algorithms, post-diagnosis analysis, physical domain links, and the growing integration between ATE and DFT tools.
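A first step toward separating systematic from random defects is simply to count how often each diagnosed suspect recurs across failing devices. The sketch below assumes per-die diagnosis results are already available as sets of suspect names; the die identifiers, net names, and the 20 percent threshold are hypothetical.

```python
from collections import Counter


def rank_suspects(diagnoses, min_share=0.2):
    """Suspects seen on at least `min_share` of failing dies, most frequent first.

    `diagnoses` maps a die identifier to the suspects reported for that die.
    """
    total = len(diagnoses)
    counts = Counter(s for suspects in diagnoses.values() for s in set(suspects))
    return [(s, n) for s, n in counts.most_common() if n / total >= min_share]


# Hypothetical diagnosis results for five failing dies.
diagnoses = {
    "w1_d03": {"u_alu/n42"},
    "w1_d17": {"u_alu/n42", "u_dec/n7"},
    "w2_d05": {"u_alu/n42"},
    "w2_d21": {"u_mem/n310"},
    "w3_d09": {"u_alu/n42"},
}
print(rank_suspects(diagnoses, min_share=0.5))
```

A suspect that recurs on most failing dies points to a systematic problem worth physical inspection, while suspects that appear only once are more likely random defects.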

To unify these elements, to maximize automation and ease of deployment, and to facilitate fast processing of large datasets, an integrating software framework should prove fundamental. Such a framework would also empower product and test engineers to describe yield-loss problems in actionable, design-oriented terms. This would limit DFT or design-engineer involvement to only those areas where their expert judgment is required. This vision can be realized when the emerging capabilities from EDA and ATE vendors are combined with IC-designer cooperation to implement diagnosable designs. Product and test engineers must also cooperate in learning to support advanced diagnostic approaches in the manufacturing environment.