Try A Hybrid Flow To Overcome Hierarchical Design Limitations
In a flat design flow, placement and routing resources are always visible and available. Designers can therefore optimize routing and avoid congestion to achieve good-quality results. Yet flat design becomes less desirable for large, optimization-intensive designs because of long tool run times and large memory requirements.
On the other hand, conventional hierarchical design-optimization flows dramatically reduce tool run times and memory space requirements. They let designers partition a design into a number of blocks and optimize the blocks separately and in parallel. But the conventional hierarchical flow can lead to sub-optimal timing for critical paths traversing through the blocks and for critical nets routed around the blocks.
Consequently, a hybrid hierarchical flow was developed to overcome challenges in a high-performance DSP application. It succeeded in implementing this design after a conventional hierarchical flow failed to close timing.
The hybrid hierarchical flow combines the strengths of the conventional flat and hierarchical design optimization flows. It allows for a hierarchical, efficient optimization of a large design with quality results comparable to flat design optimization. With the hybrid flow, designers can route critical top-level nets through blocks. Blocks in global timing paths can be further optimized at top-level design optimization.
The hybrid hierarchical flow overcomes the limitations of conventional hierarchical design optimization methods because global nets needn't be routed around the blocks. In addition, block optimization is less sensitive to the quality of the block's floorplan and the accuracy of the block's timing budget.
With the hybrid hierarchical design flow, efficient and predictable design completion is possible without compromising optimization targets. It exploits existing design-tool capabilities to the fullest with advanced design techniques.
Hierarchical Design Limitations
It's often difficult to set a good I/O budget for every block in a hierarchical design because designers don't have accurate predictions of top-level global path timing and interblock routing before layout. The resulting poor timing constraints can create a problem because the timing of block I/O paths can't be changed during top-level design optimization in a conventional hierarchical flow.
In the conventional hierarchical flow, blocks are modeled as black boxes when they're integrated into a top-level design. Therefore, they act as placement and routing obstructions. Nets that interconnect the blocks must be routed through channels between blocks. These routes tend to be long and can cause timing and signal-integrity problems. Designers can insert signal repeaters and net splitters in the channels to fix problematic routes, but this strategy may not succeed. Figure 1 shows a simplified view of a fully hierarchical design that leads to long signal routing.
The channelized floorplan also may cause routing congestion in core-limited layouts, particularly in a design with many cross-chip connections. As a result, conventional hierarchical timing-closure methods usually require more chip area than flat methods.
The Hybrid Hierarchical Flow
The hybrid hierarchical design flow introduces the concept of a pseudoblock. The pseudoblock models parts of the design that will cause timing violations and routing congestion at top-level design if they're modeled as black boxes. Designers optimize the pseudoblocks separately and in parallel with the regular blocks, but they don't "black-box" the pseudoblocks during top-level design optimization. Instead, the pseudoblocks' optimized netlist and cell placement are integrated into a top-level design. The pseudoblock placement and routing resources are then visible in top-level design optimization. Cross-chip nets can be routed through the pseudoblocks (Fig. 2).
Designers also can optimize pseudoblock I/O path logic at the top level, where they can get accurate timing of global paths traversing through the pseudoblock I/O paths. Even though the netlist of a pseudoblock is present in the top-level design, top-level design optimization is quite efficient because the pseudoblocks are optimized before being integrated into the top-level design. This "pre-optimization" significantly reduces the scale and number of violations in the top-level design, and hence the optimization runtime.
The hybrid flow first departs from the conventional hierarchical flow at the partitioning step: it divides the design into three types of sub-designs, namely blocks, pseudoblocks, and glue-logic blocks (Fig. 3). Each type of sub-design has particular qualifications:
In the conventional hierarchical flow, proper physical constraints such as pin-side assignment and pin-placement blockages are necessary to achieve high-quality pin assignment. In the hybrid hierarchical timing-closure flow, pseudoblock pin assignments don't have to be optimal so long as the pins are placed on the correct side because the pins serve only to guide I/O cell placement in pseudoblock optimization. The pseudoblock's I/O connections are made through its I/O cells, which can be further optimized at the top level where necessary.
Unlike purely function-based block partitioning, partitioning in the hybrid flow must also account for physical-implementation and timing-closure requirements. For that reason, the partitioning process is combined with floorplanning.
Designers can derive a pseudoblock floorplan from the top-level floorplan. The size of the pseudoblock floorplan depends on the defined cell usage, with the consideration of potential local-routing congestion and the number of global paths crossing the pseudoblock. The size also depends on the target fabrication process because routing resources vary with the process. For a deep-sub-micron CMOS process, it's a good idea to use 55% to 75% cell-area usage to determine pseudoblock size. Due to the impact of the global nets routed through the pseudoblocks, higher cell usage tends to result in significant pseudoblock timing variations when integrating the optimized pseudoblock into a top-level design.
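As a rough illustration of how the cell-usage target drives pseudoblock size, the Python sketch below divides the total cell area by a target utilization. The function names, the aspect-ratio parameter, and the example cell area are assumptions made for illustration only; they aren't part of the original flow or of any tool.

```python
import math


def pseudoblock_area(total_cell_area: float, utilization: float) -> float:
    """Return the floorplan area needed to hit a target cell-area utilization.

    total_cell_area: sum of the cell areas in the pseudoblock (any area unit).
    utilization: target cell-area usage as a fraction, e.g. 0.55 to 0.75.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in the range (0, 1]")
    return total_cell_area / utilization


def pseudoblock_dimensions(total_cell_area: float, utilization: float,
                           aspect_ratio: float = 1.0) -> tuple[float, float]:
    """Return (width, height) of a rectangular floorplan.

    aspect_ratio is width / height (hypothetical parameter; 1.0 means square).
    """
    if aspect_ratio <= 0.0:
        raise ValueError("aspect_ratio must be positive")
    area = pseudoblock_area(total_cell_area, utilization)
    height = math.sqrt(area / aspect_ratio)
    width = area / height
    return width, height


if __name__ == "__main__":
    # Hypothetical pseudoblock with 600,000 square microns of cell area.
    for target in (0.55, 0.65, 0.75):
        print(target, round(pseudoblock_area(600_000.0, target)))
```

With these made-up numbers, a 75% target needs 800,000 square microns, while a 55% target needs about 1,090,909. The lower usage target costs area but leaves more routing resources for global nets that cross the pseudoblock.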
Top-Level Design Integration
Similar to a conventional hierarchical design flow, the hybrid hierarchical flow calls for separate timing closure for each block and pseudoblock. Designers then integrate the blocks and pseudoblocks into a top-level design. The integration involves netlist generation, pseudoblock placement integration, top-level constraint adjustment, plus block layout and timing-model integration.
In the netlist generation, designers combine the netlists of the optimized pseudoblocks with the glue-logic-block netlists. Also, designers should add prefixes to the names of the instances in the pseudoblocks to comply with the top-level logic hierarchy.
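The instance-renaming step can be pictured with a short Python sketch. It assumes the instance names are already available as plain strings and that the top-level hierarchy uses "/" as its separator; both are assumptions for illustration, as are the function and instance names.

```python
def prefix_instances(instance_names, pseudoblock_name, separator="/"):
    """Prepend the pseudoblock's top-level instance name to each cell name.

    separator is the hierarchy delimiter (assumed to be "/" here; the
    delimiter in a real flow depends on the tool and netlist format).
    """
    return [f"{pseudoblock_name}{separator}{name}" for name in instance_names]


if __name__ == "__main__":
    # Hypothetical cell names taken from an optimized pseudoblock netlist.
    cells = ["u_add", "u_mul/reg_0"]
    print(prefix_instances(cells, "pb_fft"))
```

A real flow would apply the same renaming consistently to the netlist, the placement data, and any constraints that mention those instances.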
In the pseudoblock placement integration, designers need to adjust the placement coordinates of the optimized cells in a pseudoblock according to the position of the pseudoblock in the top-level floorplan. A simple script can shift the XY placement by a delta value. This adjustment is necessary because the cell-placement locations are referenced to the relative origin of the pseudoblock (0,0) in the pseudoblock optimization. As a result, the relative origin must be converted into the absolute origin of the pseudoblock in the top-level layout.
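A minimal Python sketch of such a shift script follows. It assumes each placement record is a cell name plus an (x, y) location relative to the pseudoblock origin, and that the pseudoblock is neither rotated nor mirrored in the top-level floorplan. The record layout and all names are hypothetical.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    cell: str
    x: float
    y: float


def shift_placements(placements, origin_x, origin_y):
    """Translate block-relative cell locations into top-level coordinates.

    origin_x and origin_y give the position of the pseudoblock's (0,0)
    corner in the top-level floorplan. Orientation changes aren't handled.
    """
    return [Placement(p.cell, p.x + origin_x, p.y + origin_y)
            for p in placements]


if __name__ == "__main__":
    # Hypothetical cells placed relative to the pseudoblock origin.
    local = [Placement("pb_fft/u_add", 10.0, 20.0),
             Placement("pb_fft/u_mul", 35.5, 4.0)]
    for p in shift_placements(local, 1200.0, 800.0):
        print(p.cell, p.x, p.y)
```

The delta is simply the pseudoblock's origin in the top-level layout, so one pair of values applies to every cell in that pseudoblock.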
In the top-level constraint adjustment, designers need to convert path constraints that refer to block internal cells or pins into constraints referring to block I/O pins. This adjustment is necessary because blocks are black boxes at the top level, so they can be referenced only through their I/O pins. In a situation where different path constraints go through the same block I/O pin, the constraints need further adjustment to distinguish the paths by adding more constraint points in the paths. This task is accomplished by cascading the top timing constraint to all cell names, which is the forte of Synopsys Professional Services' Physical Compiler. Next, all block internal-path constraints should be removed because the block has already achieved timing closure.
Top-Level Optimization
Although timing closure of the individual blocks and pseudoblocks significantly reduces top-level timing and design-rule violations, further optimization of the integrated top-level design is necessary to fix violations in the glue-logic blocks and global paths traversing through the blocks and the pseudoblocks. A two-pass optimization strategy works well for this top-level optimization.
In the first pass, designers fix the placement of the cells in the pseudoblocks and turn off timing checks on pseudoblock internal paths. As a result, the pseudoblocks will not be considered for placement optimization and violation fixes during the top-level design optimization. However, the pseudoblocks' cells and I/O paths are still visible in the top-level design optimization. By applying this strategy, designers can greatly improve the efficiency of the top-level design optimization. To further accelerate top-level optimization, they also can turn off scan-chain timing checks.
After the first-pass optimization, some cells might be placed in the pseudoblocks, and some nets might be routed through the pseudoblocks. Consequently, timing and design-rule violations might occur. The second-pass top-level optimization fixes these violations.
In the second-pass optimization, designers remove the fixed attribute from the pseudoblock cells and turn on timing checks for the pseudoblock internal paths to allow further optimizing of the pseudoblocks in the top-level design. Then, they apply an incremental optimization to the top-level design to fix the violations that remain after the first-pass optimization.
In practice, it's a good idea to leave as much cell-utilization margin as practical when determining a pseudoblock floorplan size. This extra margin reserves routing resources for potential crossover global nets in the top-level design optimization, reducing the risk of routing congestion in the pseudoblock during that optimization. It's also a good idea to add enough timing margin to pseudoblock I/O timing constraints. This will absorb possible path-delay variations caused by crosstalk effects from cross-pseudoblock routings and global-path timing variations introduced in the top-level design optimization.
Following these margin recommendations may prevent violations in the pseudoblocks after the first-pass top-level optimization. In that case, the second-pass optimization isn't necessary.
A DSP Example
The hybrid hierarchical flow was developed for a 1.5-million-gate DSP subsystem. Before applying the hybrid flow to the design, the design team placed and routed the design. Here, large and numerous timing and design-rule violations were discovered.
At this point, the hybrid hierarchical flow was applied to the design. We partitioned the design into 24 blocks, seven pseudoblocks, and 11 glue-logic blocks. After optimizing the blocks and pseudoblocks individually and in parallel, we integrated the blocks and pseudoblocks into the design and checked post-layout timing with a sign-off static timing analysis flow using our PrimeTime tool.
The results were much better than the initial placement results. The worst setup-time violation dropped from -14.8 ns to -0.58 ns, and the worst capacitance violation decreased from -3516 to -200 library capacitance units. (To speed up calculations, capacitance in a cell library is defined by integer multiples of a capacitance unit, instead of floating point numbers, in our Design Compiler.)
We then optimized the integrated design with the hybrid hierarchical top-level optimization method, reducing the worst setup-time violation to -0.4 ns and the total setup-time violation from -15.56 ns to -1.7 ns. Design-rule violations also were reduced. The worst capacitance violation fell further to -44 library capacitance units, and total capacitance violations dropped from -173,343 to -627 library capacitance units.
The team fixed the remaining minor violations via post-layout ECO along with functional ECO fixes. The DSP subsystem is now embedded in a number of wireless application chips.