A multisource clock tree is a hybrid containing the best aspects of a conventional clock tree and a pure clock mesh. It offers lower skew and better on-chip variation (OCV) performance than a conventional clock tree, and lower clock-tree power and area plus a shorter, easier flow than a pure clock mesh implementation. A renewed emphasis on high-frequency clock design has heightened interest in multisource clock-tree synthesis (CTS). This article provides a tutorial on how to implement a multisource CTS design.
Table of Contents
- Introduction To Multisource CTS
- Pre-Mesh Tree
- Multisource Mesh Fabric
- Multisource Clock Trees
- Multisource CTS Flow Steps
- Mesh Fabric Pitch Determination
- Mesh Creation
- Tap Point Determination And Sink Assignment
- Synthesize The Multisource Trees
- Timing Analysis
- Multisource CTS Performance And Flexibility
Introduction to Multisource CTS
Multisource CTS represents a new clock-distribution technology that fills the methodology gap between conventional CTS and pure clock mesh. Whereas pure clock mesh delivers the best possible clock frequency, skew, and OCV results, and whereas conventional CTS delivers the lowest power consumption and the easiest flow, multisource CTS offers a compromise between the two methods while favoring the OCV tolerant nature of pure clock mesh. As a result, a larger set of designs can access the considerable benefits garnered from mesh technology.
The primary benefits of multisource CTS include:
• Higher performance/lower skew than conventional CTS
• Better OCV tolerance (within die) than conventional CTS
• Better multi-corner performance (die-to-die) than conventional CTS
• Less power consumption than pure clock mesh
• Greater tolerance for irregular, macro-laden designs than pure clock mesh
• Faster, easier flow than pure clock mesh
• Deeper clock gating levels enabled for more complex power plans
There’s renewed focus on high clock frequencies, evidenced by the announcements of gigahertz-plus design releases. The emergence of multisource CTS coincides with this rising trend. The many performance and flexibility benefits of multisource CTS make it a strong candidate for adoption in a broad set of design types. In fact, several high-clock frequency designs are already starting and taping out with multisource CTS.
Fig. 1. A multisource CTS design comprises three different structures in the design. Starting from the top, there’s the clock root and the pre-mesh clock tree. The next structure down is the mesh fabric (shown in blue). At the bottom resides a collection of moderately sized clock trees.
To understand multisource CTS, it’s best to first examine a conceptual side angle view of a multisource CTS design (Fig. 1). On visual inspection, one can quickly determine three different structures in the design. Starting from the top, there’s the clock root and the pre-mesh clock tree. The next structure down is the mesh fabric (shown in blue). At the bottom resides a collection of moderately sized clock trees.
Pre-Mesh Tree
Each buffer in the pre-mesh tree drives four other buffers, which implies that the pre-mesh topology is implemented using H-tree placement and routing. An H-tree structure provides a uniform, scalable, and predictable means of distributing the root clock over a large area. In addition, H-trees exhibit excellent corner-to-corner variation tolerance because of their balanced structure.
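The balance property of an H-tree can be illustrated with a short Python sketch. It is an idealized geometric model rather than any tool's actual algorithm: the function name `h_tree_leaves`, the die dimensions, and the rule of placing each driven buffer at the center of a quadrant are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def h_tree_leaves(center: Point, width: float, height: float,
                  levels: int, path_len: float = 0.0) -> List[Tuple[Point, float]]:
    """Return (location, root-to-leaf wire length) for every leaf buffer.

    Idealized model: each buffer drives four buffers, one at the center
    of each quadrant of the region it serves.
    """
    if levels == 0:
        return [(center, path_len)]
    cx, cy = center
    dx, dy = width / 4.0, height / 4.0
    step = dx + dy  # Manhattan distance from a buffer to each of its four children
    leaves: List[Tuple[Point, float]] = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            child = (cx + sx * dx, cy + sy * dy)
            leaves.extend(h_tree_leaves(child, width / 2.0, height / 2.0,
                                        levels - 1, path_len + step))
    return leaves

# Hypothetical 1000 x 1000 region with two H-tree levels.
leaves = h_tree_leaves((500.0, 500.0), 1000.0, 1000.0, levels=2)
print(len(leaves), "leaf buffers")
print("distinct root-to-leaf lengths:", {length for _, length in leaves})
```

Every leaf ends up the same wire distance from the root, which is the structural reason the text gives for the H-tree's predictable, variation-tolerant behavior.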
Multisource Mesh Fabric
The multisource mesh fabric resembles a power/ground or clock mesh fabric, but is one or two orders of magnitude less dense. The coarse fabric smooths out any remaining clock arrival-time differences among the multiple H-tree buffers that directly drive it, so the skew measured at the mesh plane is effectively zero.
The fabric is also the lowest part of the multisource CTS topology shared by every sink in the design. Its conceptual position along the Z-axis sets the OCV tolerance of the design: the higher the mesh sits in the topology, the more the design behaves like conventional CTS and the lower its OCV tolerance. Conversely, the further down the mesh is pushed, the more the design behaves like clock mesh and the better its OCV tolerance.
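A rough way to see why the mesh position matters is a flat-derate model, sketched below in Python. This is an illustrative assumption, not the article's analysis method: the function `ocv_skew_bound_ps`, the 5% derate, and the latency numbers are all hypothetical, and both sinks are assumed to have the same nominal latency.

```python
def ocv_skew_bound_ps(total_latency_ps: float, shared_latency_ps: float,
                      derate: float) -> float:
    """Worst-case OCV skew between two sinks under a flat +/- derate.

    Only the non-shared part of each clock path can diverge: one branch
    is assumed late (1 + derate) and the other early (1 - derate).
    """
    if not 0.0 <= shared_latency_ps <= total_latency_ps:
        raise ValueError("shared latency must lie between 0 and the total latency")
    non_shared = total_latency_ps - shared_latency_ps
    return non_shared * (1.0 + derate) - non_shared * (1.0 - derate)

# Hypothetical numbers: 600 ps total latency, 5% derate.
for label, shared in (("mesh high in the topology", 100.0),
                      ("mesh low in the topology", 450.0)):
    print(f"{label}: {ocv_skew_bound_ps(600.0, shared, 0.05):.1f} ps")
```

In this toy model, moving the shared portion from 100 ps to 450 ps of a 600-ps path reduces the bound from 50 ps to 15 ps, mirroring the trend described above.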
In multisource CTS, the design team chooses the point along the OCV tolerance spectrum to target the design. Though the mesh fabric confers considerable benefits to the approach, it also requires circuit simulation to time the design. This is discussed in more detail later in the article.
Multisource Clock Trees
The multiple clock trees attached to the coarse mesh give the technology its name. As mentioned, designers can set the OCV performance level by choosing the depth of the clock trees. In clock mesh, the guideline is to restrict the buffer and clock-gating depth to one or, at most, two levels. Multisource CTS generally ranges from three to nine levels of buffers or clock gating. If more levels of clock gating become necessary, conventional CTS may be the natural choice.
The synthesis and optimization of the multiple clock trees leverage conventional CTS methods. One benefit of multisource CTS is that designers can take a “divide and conquer” approach. In this scenario, the root-to-mesh portion is timed with circuit simulation, and then the multiple clock trees are timed with standalone signoff timing engines or with the timer embedded in the place-and-route tool.
Multisource CTS Flow Steps
With an understanding of the main concepts of multisource CTS in place, it’s time to proceed through the steps in the flow. The starting point is just after cell placement and placement optimization—a fully placed design.
Mesh-Fabric Pitch Determination
Perhaps the hardest part of the flow comes first—determining the pitch of the multisource mesh. In some multisource CTS implementations, the intersections of the horizontal and vertical mesh spines become potential tap-point locations. For that particular style, determination of the mesh pitch must consider the location of the tap points.
Fig. 2. OCV latency is used to establish the tap-point density in all cases. The goal is to establish the minimum topology that meets the design’s OCV latency target. This will become an automated step in the near future, but currently it’s determined per the flow shown.
In all cases, OCV latency is used to establish the tap-point density. The goal is to establish the minimum topology that meets the design’s OCV latency target. This will become an automated step in the near future, but currently it’s determined per the flow shown (Fig. 2).
Fig. 3. A plot of multiple trial runs with different tap-point configurations can be examined to determine the optimum density. The graph shows the OCV latency decreasing as a function of an increasing number of tap points. Timing analysis of the mesh fabric validates the tap-point locations, and thus the mesh-fabric design.
Next, a plot of multiple trial runs with different tap-point configurations is examined to determine the optimum density (Fig. 3). The graph shows the OCV latency decreasing as a function of an increasing number of tap points. Timing analysis of the mesh fabric validates the tap-point locations, and thus the mesh-fabric design.
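The selection step can be expressed as a small Python sketch: given the OCV latency measured in each trial run, pick the smallest tap-point count that meets the target. The trial data, the 120-ps target, and the function name `min_taps_meeting_target` are invented for illustration; real values would come from the trial runs themselves.

```python
from typing import Dict, Optional

def min_taps_meeting_target(trials: Dict[int, float],
                            target_ps: float) -> Optional[int]:
    """Return the smallest tap-point count whose OCV latency meets the target.

    `trials` maps a tap-point count to the OCV latency (ps) measured in
    that trial run. Returns None if no trial meets the target.
    """
    passing = [taps for taps, latency in trials.items() if latency <= target_ps]
    return min(passing) if passing else None

# Made-up trial results: OCV latency falls as tap points are added.
trials = {4: 182.0, 9: 141.0, 16: 118.0, 25: 109.0, 36: 105.0}
chosen = min_taps_meeting_target(trials, target_ps=120.0)
print("minimum tap points meeting target:", chosen)
```

Choosing the minimum passing count, rather than the lowest latency, reflects the stated goal of finding the minimum topology that meets the OCV latency target.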
Mesh Creation
Once the mesh-fabric pitch is established, its values serve as inputs to the fabric-creation command. Other command inputs include the number of spines in x and y and any regions where fabric creation should be suppressed.
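To make those inputs concrete, the following Python sketch derives spine coordinates from a spine count and cuts a spine where it crosses a suppressed region. It does not reproduce any place-and-route tool's command: `spine_positions`, `vertical_spine_segments`, the half-pitch edge offset, and the rectangle format are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)

def spine_positions(lo: float, hi: float, count: int) -> List[float]:
    """Evenly space `count` spines across [lo, hi], half a pitch from each edge."""
    pitch = (hi - lo) / count
    return [lo + pitch * (i + 0.5) for i in range(count)]

def vertical_spine_segments(x: float, y_lo: float, y_hi: float,
                            keepouts: List[Rect]) -> List[Tuple[float, float]]:
    """Split one vertical spine at `x` into the y-intervals outside every keep-out."""
    segments = [(y_lo, y_hi)]
    for kx_lo, ky_lo, kx_hi, ky_hi in keepouts:
        if not (kx_lo <= x <= kx_hi):
            continue  # this keep-out does not intersect the spine
        remaining = []
        for s_lo, s_hi in segments:
            if ky_hi <= s_lo or ky_lo >= s_hi:
                remaining.append((s_lo, s_hi))  # no overlap, keep whole segment
                continue
            if s_lo < ky_lo:
                remaining.append((s_lo, ky_lo))  # piece below the keep-out
            if ky_hi < s_hi:
                remaining.append((ky_hi, s_hi))  # piece above the keep-out
        segments = remaining
    return segments

# Hypothetical 1000 x 1000 die, four vertical spines, one macro keep-out.
xs = spine_positions(0.0, 1000.0, 4)
macro = [(300.0, 400.0, 700.0, 600.0)]
for x in xs:
    print(x, vertical_spine_segments(x, 0.0, 1000.0, macro))
```

Horizontal spines would be handled the same way with the axes swapped.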
Tap-Point Determination and Sink Assignment
Once the number and location of the tap points are determined, the sinks in each local area are attached to that area’s tap point. Although multisource CTS tap-point clustering is based primarily on geography, it’s also influenced by the design’s natural hierarchy.
Fig. 4. In this example, the highlighted sink lies in the geographical domain of the adjacent tap point, but is naturally part of the bin on the left. Keeping that sink in its natural hierarchy saves clock buffer area and power, since it needn’t replicate the clock gating logic on the adjacent bin.
An example shows that the highlighted sink lies in the geographical domain of the adjacent tap point, but is naturally part of the bin on the left (Fig. 4). Keeping that sink in its natural hierarchy saves clock buffer area and power, since it needn’t replicate the clock gating logic on the adjacent bin.
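One simple way to model this behavior is sketched below in Python: sinks that share a clock-gating group are kept together and assigned to the tap point nearest the group's centroid, instead of each sink going to its own nearest tap. The grouping rule, the names (`assign_sinks`, `tap_left`, `icg_0`, and so on), and the coordinates are assumptions for illustration and not the algorithm of any specific tool.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def nearest_tap(loc: Point, taps: Dict[str, Point]) -> str:
    """Purely geographic choice: the tap point closest to `loc`."""
    return min(taps, key=lambda name: manhattan(taps[name], loc))

def assign_sinks(sinks: Dict[str, Tuple[Point, str]],
                 taps: Dict[str, Point]) -> Dict[str, str]:
    """Assign each sink to a tap point, keeping gating groups intact.

    `sinks` maps a sink name to (location, gating-group name). Every
    member of a group goes to the tap nearest the group's centroid.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sink, (_, group) in sinks.items():
        groups[group].append(sink)
    assignment: Dict[str, str] = {}
    for members in groups.values():
        cx = sum(sinks[m][0][0] for m in members) / len(members)
        cy = sum(sinks[m][0][1] for m in members) / len(members)
        tap = nearest_tap((cx, cy), taps)
        for m in members:
            assignment[m] = tap
    return assignment

taps = {"tap_left": (250.0, 500.0), "tap_right": (750.0, 500.0)}
sinks = {
    "ff_a": ((100.0, 500.0), "icg_0"),
    "ff_b": ((200.0, 480.0), "icg_0"),
    "ff_c": ((520.0, 510.0), "icg_0"),  # geographically closer to tap_right
    "ff_d": ((800.0, 500.0), "icg_1"),
}
assignment = assign_sinks(sinks, taps)
print(assignment)
```

Here `ff_c` stays with the rest of its gating group on the left tap even though the right tap is closer, so the gating logic does not have to be replicated under the adjacent tap.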
Synthesize the Multisource Trees
Now that the sinks are associated with their tap points, the clock trees are synthesized using conventional CTS methods. The process starts by placing buffers at the tap points. The input pin of these buffers attaches to the mesh fabric, and the output is the local clock root for each instance of the multiple clock trees below the mesh.
The logical clock tree is then split commensurate to the defined tap points. Subsequently, clock trees are compiled and optimized for skew. The multiple clock trees are balanced during compilation or as a post-processing step, per the designer’s preferred practice. After clock-tree synthesis and optimization, the clocks are routed and the design is ready for signoff timing analysis.
Timing Analysis
An enabling technology for mesh-based clocking is sign-off-quality timing analysis of the multiply driven clock-mesh loads. While static timing tools alone aren’t well-suited to analyzing parallel-drive networks, combining SPICE-accurate simulation with a static timing engine provides a seamless, signal-integrity-aware timing approach.
Many digital designers are uncomfortable with circuit simulation techniques and usage. Ideally, then, the place-and-route tool would encapsulate a call to a circuit simulator to time the multiply-driven net. The same encapsulated flow should also back-annotate the timing results onto the design for later use via static timing for both reporting and optimization activities. Reduction of the learning curve overhead to time the multiply-driven circuit is an important part of a product-worthy multisource CTS flow.
As noted earlier, some design teams can isolate circuit-simulation activities away from pure digital design by timing the root to mesh path separately from the multiple clock trees. An ideal clock applied to the multiple clock trees enables timing and optimization in an all-digital place-and-route environment. It’s always recommended, however, to perform final sign-off timing with a full circuit-simulation-based timing analysis run.
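The divide-and-conquer bookkeeping can be sketched in a few lines of Python: the arrival time at each tap comes from circuit simulation of the root-to-mesh path, the latency below each tap comes from static timing with an ideal clock, and the two are summed per sink. All names and numbers here are hypothetical, and the sketch only combines results; it does not run a simulator or a timer.

```python
from typing import Dict, Tuple

def total_latency(mesh_arrival_ps: Dict[str, float],
                  subtree_latency_ps: Dict[str, Tuple[str, float]]) -> Dict[str, float]:
    """Per-sink insertion delay = simulated tap arrival + STA latency below the tap.

    `mesh_arrival_ps` maps tap name -> arrival time from circuit simulation.
    `subtree_latency_ps` maps sink name -> (tap name, latency from the tap
    buffer to the sink, as reported by static timing with an ideal clock).
    """
    return {sink: mesh_arrival_ps[tap] + latency
            for sink, (tap, latency) in subtree_latency_ps.items()}

def global_skew(latencies: Dict[str, float]) -> float:
    """Difference between the latest and earliest sink arrival."""
    return max(latencies.values()) - min(latencies.values())

# Hypothetical results from the two analyses.
mesh_arrival = {"tap_a": 301.0, "tap_b": 302.5}
subtrees = {
    "ff1": ("tap_a", 84.0),
    "ff2": ("tap_a", 90.0),
    "ff3": ("tap_b", 86.5),
}
latencies = total_latency(mesh_arrival, subtrees)
print(latencies)
print("global skew (ps):", global_skew(latencies))
```

As the text recommends, a combined estimate of this kind would still be followed by a full circuit-simulation-based sign-off run.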
Multisource CTS Performance and Flexibility
As designs grow, insertion delay, or latency, becomes an increasingly important quality metric, because large latencies lead to higher jitter. The OCV-tolerant physical structure of multisource CTS (the large amount of shared path, the coarse mesh fabric, and the relatively shallow clock trees of three to nine levels) delivers attractive latency and OCV tolerance.
Fig. 5. Multisource CTS provides superior OCV latency performance compared to conventional CTS. The left-most bar of the chart measures the OCV latency of a conventional CTS design. The remaining bars in blue show each of the post-mesh sub-trees in the multisource CTS version of the same design. Latency is much lower in the multisource CTS implementation.
A graph helps illustrate the superior OCV latency performance with multisource CTS versus that of conventional CTS (Fig. 5). The left-most bar of the chart measures the OCV latency of a conventional CTS design; the remaining bars in blue show each of the post-mesh sub-trees in the multisource CTS version of the same design. Latency is much lower in the multisource CTS implementation.
Multisource CTS also exhibits a high degree of flexibility. It can cover almost the entire spectrum between conventional CTS and clock mesh. Designers can target specific OCV latency, skew, power, and clock frequency results, enabling them to make the required tradeoffs when prioritizing these metrics. The controlling knobs to turn include the pitch of the mesh, the number of tap points, the clocking depth of the clock trees, and the presence or absence of the mesh fabric itself.
Multisource CTS presents a viable hybrid approach for designers seeking the best of conventional CTS and pure clock mesh. It provides better high-clock-frequency performance and OCV tolerance than conventional CTS. It’s also more tolerant of complex floorplans and offers greater flexibility in clock-gating depth than pure clock mesh.
The multisource CTS design flow is easier to use than the pure clock-mesh flow, yet maintains many of the benefits of clock mesh. Multisource CTS further lowers the deployment barrier for mesh-style design practices.
Early multisource CTS adopters have had prior experience with mesh technologies, helping shape the flow automation. Adoption of multisource CTS continues to grow, as design teams seek the easiest path to very high-frequency clock design.
- Toyama, Harvey, Multi-Source CTS Delivers Flexible High Performance and Variation Tolerance, http://www.synopsys.com/cgi-bin/protected/iccwp/pdfr.cgi?file=multiSource_cts_wp_v2.pdf