Why Tools Are Failing Designers Of Deep-Submicron Chips

Until recently, integrated-circuit design and manufacturing methods have scaled successfully. They allowed designers and fabs to make incremental changes while continuing to use existing tools. But traditional design methods are running out of steam with today's deep-submicron chips. The demands of system-on-a-chip (SoC) devices with millions of gates have stretched the tools beyond their practical limits, and design productivity has not kept pace. Timing verification is just one area in which the limits of today's tools are making SoC design harder, longer, and less efficient.

There's no question that powerful tools are enablers for ASIC designs. But these tools don't yet have the capabilities required for successful SoC design. They need more than the specification of a complex SoC. They also need that specification to be communicated throughout the design process. Design and layout tools must be able to integrate well with all of the other toolsets used in the design. After all, improper translation between toolsets can lead to inaccuracies in the final design.

Until recently, the traditional design-flow practices that push layout and timing verification late into the design cycle also hampered the effectiveness of these tools. Even now, design teams are often forced to perform numerous iterations before achieving a successful design.

Other problems lie in the verification of the SoC. The chip's magnitude and complexity have made this step both difficult and time-consuming. Today's deterministic, functional verification methods aren't successful in catching corner cases and exceptions, which can have grave effects on the design's overall effectiveness. Most designers get stuck running excessive simulations at the gate level. This takes a great deal of time and computing power. It also makes deterministic, functional-verification methods impractical when trying to meet a time-to-market window.

To solve these issues, many designers are striving for higher levels of abstraction, moving from the gate level to the register-transfer and C levels. Too many gates exist to simulate and verify each one. Each step up the abstraction ladder offers much faster dynamic verification, though at the expense of accuracy.

Timing delays, however, are now predominantly determined by interconnect rather than gate delays. So timing is heavily influenced by layout. Interconnect capacitance can be estimated before layout, but only the final design will reveal actual capacitances and related delays. Designers accept the need to intervene manually to fix some timing problems. But even if the interconnect estimates are 99.99% accurate, a large number of errors will still require such intervention in a million-gate design. A large chip simply has too many timing paths.
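To put a rough number on that claim, the short Python sketch below computes how many paths would still be misestimated. It assumes, purely for illustration, one timing path per gate; the article gives no path count, and a real million-gate design would typically have more paths than that.

```python
# Illustrative arithmetic only. The one-path-per-gate figure is an assumption,
# not a number taken from any real design.

def paths_needing_fixes(num_paths: int, accuracy: float) -> int:
    """Expected number of timing paths whose pre-layout estimate is wrong."""
    return round(num_paths * (1.0 - accuracy))

if __name__ == "__main__":
    # One million paths at 99.99% estimate accuracy.
    print(paths_needing_fixes(1_000_000, 0.9999))
```

Under that assumption, roughly 100 paths would still need manual attention, which is why even a very accurate estimate does not remove the need for intervention.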

To overcome some of these difficulties, the design team and the tools they use must employ a hierarchical approach to the design process. In the flat design approach, all of the blocks of a chip, as well as the interconnects within and between those blocks, are physically constructed as one large component. Any changes required could cause a rippling effect.

Often, modifications to block A may impact the timing performance of block B. Fixing block B can then cause a problem in block C, and so on. Not only is this harmful to overall design productivity, it also means that no part is finished until the entire design is complete.

Minimizing The Ripple

A hierarchical approach minimizes this rippling effect. It actually bears a lot of resemblance to the design of a printed-circuit board. A pc-board designer focuses on device I/O, giving minimal consideration to the internal workings of each chip. Timing is a function of the interconnection of the chips on the board and the design of the traces that join them.

Likewise, in hierarchical SoC design, the increasing use of core-based IP enables designers to treat blocks as "black boxes." Only interblock links need to be considered. The internal timing of each block is verified beforehand, so the block only needs to be verified once during the design process. Incremental changes to one block need not affect the internal-timing characteristics of other blocks.
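The black-box idea can be sketched in a few lines of Python. The sketch assumes each block exports only two pre-verified boundary numbers (a clock-to-output delay and an input setup time), so that a top-level check needs nothing but those numbers and the interblock wire delay. The class, field names, and values are hypothetical and are not taken from any particular tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockTiming:
    """Pre-verified boundary timing of a black-box block (all values in ns)."""
    clk_to_out: float   # clock edge to valid data at the block's output pin
    input_setup: float  # time data must be stable at the input pin before the clock edge

def interblock_slack(src: BlockTiming, dst: BlockTiming,
                     wire_delay: float, clock_period: float) -> float:
    """Slack of a path that leaves src, crosses the interblock wire, and enters dst."""
    return clock_period - (src.clk_to_out + wire_delay + dst.input_setup)

if __name__ == "__main__":
    block_a = BlockTiming(clk_to_out=1.5, input_setup=0.4)
    block_b = BlockTiming(clk_to_out=1.2, input_setup=0.5)
    # Hypothetical 5-ns clock and 2-ns interblock wire delay.
    print(interblock_slack(block_a, block_b, wire_delay=2.0, clock_period=5.0))
```

In this simplified view, a change inside block_a matters to the rest of the chip only if it alters block_a's boundary numbers.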

With the increased use of block-based design, floorplanning is becoming more prevalent in SoC designs. It allows for the relatively quick, high-level positioning of circuit blocks. Designers can then identify the placement that makes the best use of area, minimizes interblock routes, and maximizes data flow at the full-chip level. Performing several quick assessments up front helps the team converge on the best full-chip floorplan. This can save unnecessary iterations and cycle time later in the design flow.
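As a minimal sketch of such a quick assessment, the Python below scores candidate floorplans by the total Manhattan distance between the centers of connected blocks and keeps the shortest. This is a deliberately crude stand-in: it ignores block area, overlap, and data flow, and every block name, coordinate, and net is hypothetical.

```python
# All block names, coordinates, and nets below are hypothetical.

def total_wire_length(placement, nets):
    """Sum of Manhattan distances between the centers of connected blocks."""
    total = 0
    for a, b in nets:
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total

def best_floorplan(candidates, nets):
    """Name of the candidate placement with the shortest interblock wiring."""
    return min(candidates, key=lambda name: total_wire_length(candidates[name], nets))

nets = [("cpu", "cache"), ("cpu", "io"), ("cache", "dram_ctl")]

candidates = {
    "plan_a": {"cpu": (0, 0), "cache": (4, 0), "io": (0, 5), "dram_ctl": (4, 5)},
    "plan_b": {"cpu": (2, 2), "cache": (3, 2), "io": (2, 0), "dram_ctl": (4, 2)},
}

if __name__ == "__main__":
    winner = best_floorplan(candidates, nets)
    print(winner, total_wire_length(candidates[winner], nets))
```

Because each evaluation is cheap, many candidate placements can be compared before any detailed layout work begins.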

Correlating the timing constraints of synthesis with those of the physical layout requires up-front estimates of the interconnects' capacitance and resistance. The normal industry practice is to create statistical interconnects. Referred to as wireload models, these interconnects are typically provided by the semiconductor manufacturer. The models capture pre-layout information for timing estimation in the synthesis tools, static timing analysis, and timing validation.

The key is to select an appropriate wireload value for the design. Being too pessimistic limits performance and causes difficulties in synthesis, while an aggressive value means that the place-and-route results will never meet requirements. The goal is to have pre-layout timing results that are close to those of the post-layout timing, while remaining slightly pessimistic.
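The trade-off can be illustrated with a small Python sketch. It assumes a wireload model is a simple table from net fanout to estimated capacitance, and it compares an estimate against a post-layout extracted value. The table entries and the 10% pessimism margin are made-up illustrative values, not vendor data.

```python
# Hypothetical wireload table: net fanout -> estimated capacitance in pF.
WIRELOAD = {1: 0.02, 2: 0.04, 3: 0.07, 4: 0.105}

def estimate_cap(wireload, fanout):
    """Look up the estimated capacitance, clamping at the table's largest fanout."""
    return wireload[min(fanout, max(wireload))]

def classify(estimated, extracted, margin=0.10):
    """Compare a pre-layout estimate with the post-layout extracted value."""
    if estimated < extracted:
        return "aggressive"            # layout will come in slower than synthesis assumed
    if estimated > extracted * (1.0 + margin):
        return "too pessimistic"       # synthesis over-designs and performance is lost
    return "slightly pessimistic"      # the target region

if __name__ == "__main__":
    est = estimate_cap(WIRELOAD, fanout=4)
    print(est, classify(est, extracted=0.10))
```

The margin parameter is where a design team would encode how much pessimism it is willing to carry into synthesis.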

One Model Not Enough

The typical approach is to use one standard wireload model for each given technology, or for each given complexity (block size) within a technology. In many cases, a single wireload model is used for every block that contains 30,000 gates. It's been my experience, however, that this approach cannot provide accurate results. Figure 1 shows the extracted wireload models representing four different 30,000-gate modules in the same silicon technology. The chart illustrates that there is no uniformity in the results. Each block can have its own wireload model, which is driven not by the block's size but by the specific logic structures within that circuit.

An alternative solution is to offer a selection of tuned wireload models, enabling the designer to select a model that more closely mimics the design. The first step is to develop the "wiring profile" of the chip through floorplanning. This circuit-specific profile is then benchmarked against the standard wireload model. If the two are not aligned, the designer chooses a model that does match from a library of wireload models. Each model in that library is based on a significant amount of data, rather than on a single-instance custom wireload model.
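The selection step described above might look like the following Python sketch. It assumes the wiring profile and each wireload model are tables from fanout to capacitance, and it measures alignment with a sum of squared differences. The metric, the tolerance, and every model name and value are assumptions made for illustration.

```python
def mismatch(model, profile):
    """Sum of squared capacitance differences over the fanouts in the profile."""
    return sum((model[fanout] - cap) ** 2 for fanout, cap in profile.items())

def choose_model(profile, standard, library, tolerance):
    """Keep the standard model if it is aligned; otherwise pick the closest tuned model."""
    if mismatch(standard, profile) <= tolerance:
        return "standard"
    return min(library, key=lambda name: mismatch(library[name], profile))

# Hypothetical data: fanout -> capacitance in pF.
profile = {1: 0.03, 2: 0.06, 4: 0.12}          # from floorplanning
standard = {1: 0.02, 2: 0.04, 4: 0.08}
library = {
    "sparse":    {1: 0.02, 2: 0.04, 4: 0.08},
    "dense":     {1: 0.03, 2: 0.065, 4: 0.12},
    "bus_heavy": {1: 0.05, 2: 0.10, 4: 0.20},
}

if __name__ == "__main__":
    print(choose_model(profile, standard, library, tolerance=0.0005))
```

With these made-up numbers, the standard model falls outside the tolerance and the "dense" model is the closest match.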

Timing-driven layout, in which timing constraints are defined before layout rather than analyzed afterward, supplements dynamic timing analysis. Constraints are forwarded from the timing-analysis tools to the place-and-route tools, ensuring a complete and smooth flow of information through the design process. Some tool vendors are working toward a seamless flow between toolsets to ensure the accurate communication of this timing information. This will permit designers to complete physical design more quickly and move on to analyze the interconnect information, such as capacitance and resistance.

Performing in-place optimization (IPO) and/or engineering change orders (ECO) can accelerate timing closure on the handful of rogue nets that remain after "completing" full-chip place and route. To enable these capabilities, it's crucial to configure the cell library with a sufficient set of drive strengths for each functional family. Current tools can perform quick cell swapping, both up and down, and/or logic re-optimization to improve the timing through these critical paths. These changes can be quickly incorporated into the existing layout, creating a combination that can automate timing closure and move the design to the tape-out stage more quickly.
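A toy version of drive-strength swapping is sketched below in Python. It assumes a linear delay model (intrinsic delay plus drive resistance times load) and a three-member cell family. The cell names, coefficients, and the model itself are hypothetical simplifications of what a real library and tool would use.

```python
# Hypothetical cell family: name -> (intrinsic delay in ns, drive resistance in ns/pF),
# listed from weakest to strongest drive.
DRIVES = {
    "X1": (0.10, 4.0),
    "X2": (0.12, 2.0),
    "X4": (0.15, 1.0),
}

def cell_delay(drive, load_pf):
    """Linear delay model: intrinsic delay plus resistance times load capacitance."""
    intrinsic, resistance = DRIVES[drive]
    return intrinsic + resistance * load_pf

def swap_for_timing(load_pf, required_ns):
    """Return the weakest drive strength that meets the required delay, or None."""
    for drive in DRIVES:  # dicts preserve insertion order: weakest first
        if cell_delay(drive, load_pf) <= required_ns:
            return drive
    return None  # no swap is enough; the path needs logic re-optimization instead

if __name__ == "__main__":
    print(swap_for_timing(load_pf=0.2, required_ns=0.6))
```

Choosing the weakest cell that meets timing covers both directions of swapping: a heavily loaded net is sized up, while a lightly loaded one can be sized down. A family with too few drive strengths leaves more paths in the None case, which is why the library configuration matters.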

A hierarchical approach has resulted in a faster time-to-market in the example of a quad pointer-processor chip (Fig. 2). This device supports both SONET and SDH applications, and provides the pointer processing and path monitoring on an equivalent of 48 Synchronous Transport Signal-Level 1s (STS-1s).

Consistent with the approach, a single pointer-processor element was built as a standalone block. The majority of the effort was spent on optimizing that block for both area and timing. In the final layout, the block was instantiated in four places. This allowed the design team to finalize it earlier in the design cycle, while other control-logic and mixed-signal clock-/data-recovery elements were being implemented.

Fewer Design Iterations

The final integration efforts required designers to focus on the pointer-processor I/O timing with the control logic, a much smaller and more manageable effort as compared to the full design. This approach significantly reduced the number of design iterations and brought the product to market several months faster than the previous methodology.

By shifting from a flat view to a hierarchical perspective, SoC design teams are having greater success and reaching the market faster than before. Floorplanning, timing-driven layout, and IPO/ECO have already begun to prove their effectiveness. With new tools and capabilities being introduced by key vendors, SoC teams will come even closer to their goal of first-success designs.

As better wireload models and utilities move designers forward, don't forget that further improvements in several other areas could increase the accuracy in timing estimates. They could even simultaneously speed up the design process.

Improvements Needed

Take synthesis tools, for instance. They lack the ability to build models with sophisticated knowledge of interconnects and cell placement. Rather than requiring rough estimates from the designer, the tool should know the placement of the blocks on the chip, perform intelligent placement of interconnects, and apply a more sophisticated capacitance for the design's concurrent synthesis. Many EDA tool providers are improving their existing toolsets to offer some of this intelligence.

Other tools need to provide better accuracy on early estimations of block-level boundary conditions, like the slew rates of data entering the block and the loading of data presented to the block. These estimates then need to be passed to other blocks within the design to provide faster timing resolution.

Finally, the collection of EDA tools must be seamlessly integrated through the design flow via common library and database format structures. Even though there's a significant investment in the tools used today, integrating this sort of sophisticated functionality into a single toolset is the best way to push future chip design to the next level.