With the proliferation of multicore, deep-sub-micron designs, the verification of clock domain-crossing (CDC) signals has become a critical step in the design process. Multicore devices incorporate an ever-increasing number of asynchronous clock domains and integrate many design IP blocks on a single chip; thus, full-chip CDC verification must deal with a wider variety of tricky synchronization schemes. Although many design teams have adopted automated CDC verification tools to handle this situation, several significant methodology challenges remain.
At AMD, we addressed these challenges by successfully developing a CDC verification methodology and adopting a CDC tool that supported a partitioned verification approach. This approach was necessary because our device combined both a multicore central processing unit (CPU) and a graphics-processing unit (GPU) on a single chip—known as an accelerated processing unit (APU), or what the company refers to as the AMD Fusion processor. Typically, verification of the GPU and CPU does not have to happen concurrently in a single tool flow because they occupy separate chips. However, in the case of the AMD Fusion processor, doing so became mandatory.
To satisfy the wide spectrum of requirements imposed by such a high level of integration, we found the most effective RTL CDC verification methodology should include a standardized block-level flow, support for a wide variety of synchronization schemes, and a hierarchical approach. Beyond the RTL, our methodology recognized that CDC verification was not truly complete without avoiding common pitfalls at the gate level. We proved our methodology and tool choice in the case study described in this article, which demonstrated the effectiveness of both our RTL and gate-level CDC verification.
The Elusive CDC Value
CDC signal issues are nothing new. However, designers of multicore designs must be even more vigilant in handling metastability, one of the more troublesome causes of CDC errors. Metastability occurs when a flop samples an input that changes too close to its clock edge, violating setup or hold time and leaving the flop's output in an indeterminate state. When the value of a signal is metastable, it randomly and unpredictably stabilizes to a 0 or a 1, only one of which is correct. This random behavior of the flop may lead to all kinds of design failures, and because RTL simulation abstracts these effects away completely, simulation alone will not expose them. Hence, an in-depth and complete CDC analysis is as important as the verification of any other functionality in a design because a small error could have serious consequences.
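To give a rough sense of why metastability is handled with synchronizers rather than ignored, the sketch below evaluates the widely used first-order estimate MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). The formula is a textbook model, not something taken from AMD's flow, and every numeric value here is an assumed placeholder rather than data from a real process or design.

```python
import math

def synchronizer_mtbf(t_resolve_s, tau_s, t_window_s, f_clk_hz, f_data_hz):
    """Mean time between synchronization failures, in seconds.

    First-order model: MTBF = exp(t_r / tau) / (T_w * f_clk * f_data).
    """
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clk_hz * f_data_hz)

# Assumed, illustrative numbers only (not from any real process or design).
TAU = 20e-12        # flop resolution time constant
T_WINDOW = 30e-12   # metastability window around the clock edge
F_CLK = 1e9         # receiving clock frequency
F_DATA = 100e6      # toggle rate of the asynchronous signal
period = 1 / F_CLK

# A single flop feeding logic might have only part of a cycle to resolve
# (assumed 30% here); a second flop adds roughly one full clock period.
one_stage = synchronizer_mtbf(0.3 * period, TAU, T_WINDOW, F_CLK, F_DATA)
two_stage = synchronizer_mtbf(1.3 * period, TAU, T_WINDOW, F_CLK, F_DATA)

print(f"one stage : {one_stage:.3e} s")
print(f"two stages: {two_stage:.3e} s")
```

Because the resolution time sits in the exponent, one extra clock period of settling time improves the estimate by a factor of exp(period / tau). That is the usual argument for multi-flop synchronizers.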
By definition, the random quality of metastability makes tracking CDC errors in silicon elusive because the CDC signal does not yield the same signature every time it is tested. Finding such an elusive problem is difficult, if not nearly impossible, once the design tapes out. This puts more emphasis on the quality of CDC verification.
The Meta-CDC Challenge Grows
The trend toward multicore designs exacerbates CDC methodological challenges in several ways. The first has to do with the sheer size of today’s designs. The ability to squeeze more and more design features and functionality into a single chip as the industry moves deeper into sub-micron and nanotechnologies means that 100 million- and even billion-gate designs have become a reality. In fact, AMD’s APU chip has more than a billion transistors. It is impossible for a single engineer to handle a design of this magnitude, so CDC verification must be segmented and shared among different verification teams. The CDC verification tool also must be able to handle a billion-transistor design efficiently and reliably.
Splitting these designs across multiple teams, often spread across the globe, is a practical necessity. Because different design teams use different design styles, a successful CDC methodology must ensure proper sharing of information and that teams make correct assumptions about it. Hence, it is critical to validate those assumptions when assembling multi-sourced IP.
In a multicore fusion design such as AMD’s APU, these complications, along with design complexity, double because an APU combines what would otherwise be two independent chips onto one SoC. This widens the spectrum of CDC requirements to design and verify, as designers must deal with multiple clock domains in a single processor and across multiple cores.
Finally, it is essential that verification teams are aware of common methodological pitfalls and know how to avoid them. These often-overlooked issues require gate-level CDC verification. This article focuses on the approach we used to standardize RTL CDC verification, but it is important to realize that a design’s CDCs cannot be considered clean unless certain measures are taken at the gate level.
Common Methodology Is Key
At AMD, addressing these bigger challenges meant establishing a standardized block-level CDC verification methodology, defining an approved set of synchronization schemes, and adopting a partitioned verification approach. The standardized methodology employed a project-wide CDC tool flow that prescribed a specified CDC verification tool, a common set of scripts and preference settings, and a universal design approach. A common, block-level flow enabled regulation of the CDC verification approach across the entire project. This eliminated duplication of effort in setting up the CDC verification environment across multiple teams and provided a predictable, uniform format and quality of results.
AMD’s approach brings together diverse design styles by supporting a wide variety of synchronization schemes. In addition to common, relatively simple CDC synchronization schemes, it supports complex schemes such as FIFOs, handshakes, and ratioed synchronous clock enables. It also supports custom synchronization schemes and latch-based synchronization.
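As background on why FIFO-based schemes need dedicated handling, asynchronous FIFOs commonly pass their read and write pointers between clock domains in Gray code, so that only one bit changes per increment. The article does not describe AMD's FIFO implementation. The sketch below only illustrates that general property, and the 4-bit pointer width is an assumption.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)

def gray_to_bin(g: int) -> int:
    """Invert bin_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

DEPTH_BITS = 4              # assumed pointer width, for illustration
size = 1 << DEPTH_BITS

# Every increment, including the wrap from size-1 back to 0,
# flips exactly one bit of the Gray-coded pointer.
single_bit_steps = all(
    bits_changed(bin_to_gray(p), bin_to_gray((p + 1) % size)) == 1
    for p in range(size)
)
print("Gray pointer changes one bit per step:", single_bit_steps)

# A plain binary pointer can flip several bits at once, e.g. 7 -> 8.
print("binary 7 -> 8 flips", bits_changed(7, 8), "bits")
```

If a multi-bit binary pointer were sampled mid-transition, the receiving domain could capture a value that was never actually present. With a Gray-coded pointer, the sampled value is either the old count or the new one.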
Finally, the hierarchical approach to CDC verification eliminated any concerns about large-scale designs. Because tool capacity scales with design size, there is no hard ceiling on how large a design can be. Partitioned verification allowed our designs to be either shared among different project teams or split up into manageable chunks for an individual team. By contrast, we found that running a top-down approach was inefficient in terms of compute resources, debugging time, and the distributed computational and human effort required.
Tools Make the Difference
With a common methodology in place, the next step was to select a single tool that best suited the entire project’s needs. Use of a single tool creates a seamless flow from the SoC to the block level as well as from the block to the SoC level. It also eliminates concerns about how to combine such things as constraints from different tools used at different abstraction levels.
An important quality of the CDC verification tool is that it must overcome the challenges of designing and verifying large multicore designs. Because flat, top-level CDC verification is impractical, the tool must have the capability to partition the design, verify the blocks independently, and then put them all together again and verify the design as a whole. Having this hierarchical capability in the CDC tool eliminates the capacity bottleneck, which is essential for a multicore, billion-gate design.
In developing the methodology, a strong candidate for the tool of choice was Mentor Graphics’ Questa CDC verification tool. It establishes a hierarchical flow that allows verification of IP from anywhere in the world. It also interprets the input constraints and the design so that it can output hierarchical control files. These are essentially gray boxes: black boxes with enough information about inputs and outputs to run CDC verification at the interfaces after the blocks are combined in the SoC. The gray-box models capture the clock domain information for the IP’s ports that is required for checking connectivity between blocks. At the top level, the tool ignores the internals of blocks; IP block designers are responsible for intra-block CDC path verification. The resulting top-level verification is fast and produces a manageable number of CDC reports to review.
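The gray-box idea can be sketched as a small data model. The sketch is not the Questa CDC control-file format, which the article does not show. The class names, port names, clock names, and the `synchronized` flag are all hypothetical, chosen only to illustrate a top-level check that looks at port clock domains and ignores block internals.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Port:
    name: str
    direction: str              # "in" or "out"
    clock_domain: str
    synchronized: bool = False  # inputs only: block synchronizes it internally

@dataclass
class GrayBox:
    """Interface-only view of a block: ports and clock domains, no internals."""
    name: str
    ports: dict = field(default_factory=dict)

    def add(self, port: Port) -> "GrayBox":
        self.ports[port.name] = port
        return self

def check_top_level(blocks, connections):
    """Return (driver, receiver) pairs that cross clock domains
    into an input the receiving block does not synchronize."""
    violations = []
    for src_blk, src_port, dst_blk, dst_port in connections:
        src = blocks[src_blk].ports[src_port]
        dst = blocks[dst_blk].ports[dst_port]
        if src.clock_domain != dst.clock_domain and not dst.synchronized:
            violations.append((f"{src_blk}.{src_port}", f"{dst_blk}.{dst_port}"))
    return violations

# Hypothetical blocks and clocks, for illustration only.
cpu = GrayBox("cpu")
cpu.add(Port("req_out", "out", "clk_cpu"))
cpu.add(Port("status_in", "in", "clk_cpu"))            # no synchronizer declared
gpu = GrayBox("gpu")
gpu.add(Port("req_in", "in", "clk_gpu", synchronized=True))
gpu.add(Port("status_out", "out", "clk_gpu"))

blocks = {"cpu": cpu, "gpu": gpu}
connections = [
    ("cpu", "req_out", "gpu", "req_in"),       # crossing, synchronized in gpu
    ("gpu", "status_out", "cpu", "status_in"), # crossing, not synchronized
]
violations = check_top_level(blocks, connections)
print(violations)
```

The top-level pass only walks inter-block connections, so its cost depends on the number of interface ports rather than on the gates inside each block. This matches the division of labor described above, where intra-block paths remain the block owner's responsibility.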
An Example For Illustration
Consider the example of an APU design that includes the CPU, GPU, DDR PHY, and PCI PHY domains. Each domain is further divided into different blocks. Typically, the CPU and DDR PHY make up one chip and the GPU and PCI PHY make up another. The APU, however, combines all of them.
The purpose of the exercise was to determine how many blocks to expect in an ordinary APU design so we could gauge how big the design would be. The design size of a typical APU is more than one billion transistors. This particular design had more than 60 clock domains, twelve gray-box blocks (these are the hierarchical blocks), and more than 15,000 custom synchronization cells.
After dividing the entire design into the twelve blocks, we ran CDC verification. The Questa CDC tool generated a hierarchical control file, and then we pushed verification up to the SoC level. The results showed the CDC tool gave very good coverage and proved very productive at finding issues at the block, IP, and system levels.
Finally, we checked for potential CDC problems that remain because of unavoidable holes in static CDC verification at the register-transfer level. These are common pitfalls because many teams assume that CDC verification is complete at RTL. However, some CDC issues cannot be analyzed until after gate-level synthesis. Chief among these are design elements that have asynchronous interfaces added at the gate level; custom macros and analog blocks whose RTL is not available for RTL CDC verification; deadlock scenarios; and differences between the CDC tool’s synthesis results and the actual RTL synthesis results.
As expected, when we ran gate-level CDC verification, we found a few bugs related to these common pitfalls. The results emphasize that it is not good enough to run CDC only at the RTL. Be sure to run CDC at the gate level.
Fully and accurately verifying the CDCs of very large, multicore designs requires a standardized methodology. Such standardization requires a CDC verification tool that is capable of partitioned verification. The tool must support different design styles and different synchronization styles at the block level, and it must be able to piece together everything at the top level for full-chip verification. Finally, one must run CDC verification at the gate level to plug any holes in CDC verification at RTL. Complete and accurate CDC verification is critical: a small error can have serious consequences that may cost millions of dollars. Only by establishing the right methodology and selecting the right tool can global design teams be certain their multicore designs are truly CDC-clean.