High-level synthesis (HLS), or the notion of synthesizing a design into RTL from a higher level of abstraction, has been gaining currency among design teams. For some time now, there have been compelling reasons to explore HLS methodologies for certain kinds of designs, or certain blocks within a larger design, such as signal-processing blocks. Such a design flow can get you to RTL faster from languages like C++ or SystemC. And because simulation at the transaction level is orders of magnitude faster than at RTL, far more of the design’s behavior can be exercised before synthesis, so, at least theoretically, the RTL you get out of an HLS tool should be cleaner.
A leading HLS tool, Catapult C from Mentor Graphics, has been continually improved since its 2004 launch. Initially built for block-level synthesis to RTL from pure ANSI C++ input, it has added optimizations for video and wireless designs and the ability to synthesize multiple blocks. But significant as they may have been, these improvements pale compared to the latest overhaul. Mentor has now fully endowed Catapult C with the ability to synthesize control-logic blocks, enabling it to synthesize full chips from ANSI C++ to RTL.
The simultaneous HLS of algorithmic and control-logic blocks has historically been an elusive goal. The two types of blocks have very different properties. For example, algorithmic blocks synchronize on data while control blocks synchronize on clocks. In algorithmic blocks, arbitration is implicit in the code sequence. In control blocks, arbitration is explicitly modeled. Typically, algorithmic blocks never drop data, while control blocks are often required to drop and/or ignore data.
Algorithmic blocks are usually idle when no data is available for processing. Control blocks must execute and update their states even if no data is available. These disparities between the algorithmic signal-processing blocks and control-logic blocks have led to the development of a number of domain-specific language styles for coding of control logic at levels of abstraction above RTL. Bluespec comes to mind as an example.
It’s worth pointing out at this juncture that in Mentor’s view, there are three different flavors of control logic. According to Shawn McCloud, Mentor’s product line director for HLS products, Catapult has been synthesizing control logic for years. “We had a philosophy that we want to be able to infer the control logic and automatically build it for as long as possible,” says McCloud. When it comes to intra-block control, for example, much of the logic is not explicitly coded in the C++ source but can be inferred.
“Say an algorithm is performing a transformation, such as a fast-Fourier transform. There’s a sequence of data through the algorithm. When you synthesize this and produce the data path, all of the control logic related to interfacing with this block can be implicitly inferred from the C source and built automatically,” McCloud adds. Catapult C has been able to synthesize this sort of intra-block control logic since its launch in 2004.
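To make the idea of inferred control concrete, here is a minimal sketch of an algorithmic block written as untimed C++. It is an illustration only: the `Fir4` name, the four-tap filter, and the coefficient values are assumptions made for this example and do not come from Mentor. The point is what the source leaves out. There is no clock, no handshake signal, and no state machine; in the kind of flow McCloud describes, that interface and sequencing logic would be left for the tool to build.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4-tap FIR filter, written as a plain untimed algorithm.
// The only state is the delay line; nothing here mentions clocks,
// ready/valid handshakes, or a controlling state machine.
struct Fir4 {
    static constexpr int kTaps = 4;
    std::array<int16_t, kTaps> delay{};  // previous samples, newest first

    int32_t step(int16_t sample) {
        // Illustrative coefficients (an assumption for this sketch).
        const int16_t coeff[kTaps] = {1, 2, 2, 1};

        // Shift the delay line and insert the new sample.
        for (int i = kTaps - 1; i > 0; --i) {
            delay[i] = delay[i - 1];
        }
        delay[0] = sample;

        // Multiply-accumulate across all taps.
        int32_t acc = 0;
        for (int i = 0; i < kTaps; ++i) {
            acc += static_cast<int32_t>(coeff[i]) * delay[i];
        }
        return acc;
    }
};
```

Feeding a single 1 followed by zeros into `step` returns the coefficients in order, which is a quick way to check the model in an ordinary C++ testbench.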
In 2006, Catapult C introduced support for a second variety of control logic, known as multi-block dataflow control logic. The idea here is chaining single blocks to create a higher-level subsystem. Again, the control logic is not necessarily modeled in the source code but is inferred. “This sort of logic involves the communication channels between the block and the top-level finite state machine controller of the system,” says McCloud. “This can be very complicated, like, for instance, a ping-pong memory manager.”
The leap forward in the latest incarnation of Catapult C is its ability to handle a third variety of control logic: synchronous, reactive inter-block control logic. “This concerns synthesis of control-centric blocks that are purely reactive,” says McCloud. With this kind of control logic, which is explicitly defined in the C++ source, it’s important to give designers a way to explicitly model the control logic. “Now, you can model a series of ports and make a decision when there’s a conflict, such as an arbiter when two requests are coming in at the same time. The decision as to which port wins is very much a user decision,” says McCloud.
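The arbiter McCloud mentions can be sketched in a few lines of ordinary C++. This is a hedged illustration, not Catapult C's coding style: `Grant`, `fixed_priority`, and `RoundRobin` are hypothetical names invented here. The sketch shows two policies for the same conflict to underline that the choice of winner is written down by the designer rather than inferred.

```cpp
// Hypothetical result of one arbitration decision.
struct Grant {
    bool valid;  // true if some port was granted
    int  port;   // index of the winning port, or -1 if none
};

// Policy 1: fixed priority. Port 0 always beats port 1.
Grant fixed_priority(bool req0, bool req1) {
    if (req0) return {true, 0};
    if (req1) return {true, 1};
    return {false, -1};
}

// Policy 2: round robin. On a conflict, the port that did not win
// most recently gets the grant.
struct RoundRobin {
    int last = 1;  // start so that port 0 wins the first conflict

    Grant pick(bool req0, bool req1) {
        if (req0 && req1) {
            last = 1 - last;
            return {true, last};
        }
        if (req0) { last = 0; return {true, 0}; }
        if (req1) { last = 1; return {true, 1}; }
        return {false, -1};
    }
};
```

With both requests asserted, `fixed_priority` always grants port 0, while `RoundRobin::pick` alternates between the two ports on successive calls.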
In adding synthesis of this sort of reactive inter-block control logic, the challenge for Mentor was to determine how to maintain the abstraction benefits of C++ while permitting users to specify lower-level detail. The answer comes in the form of a new synthesizable C++ construct for asynchronous data communication.
The construct lets designers easily specify asynchronous data communication, allowing full control of the creation of concurrent hardware (see the figure). It enables interfacing of data-driven algorithms with control-centric blocks synchronized by clocks.
“We call this a decoupling control channel,” says McCloud. “The channel handles data on one end and clocks on the other, allowing you to connect between these two abstraction domains.” With this, designers now have all the semantics needed to define what control logic does, including prioritization of tasks and coordination of data. It also provides the ability to query the channel for content availability. All of this can be coherently modeled in pure ANSI C++, in a coding style that’s familiar to hardware designers, who now can express communication, priority, and task coordination within an abstract representation of concurrency.
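The article does not show the construct itself, so the following is only a sketch of the idea under stated assumptions. `Channel`, `available`, `nb_read`, and `Controller` are hypothetical names chosen for this example and should not be read as Catapult C's actual API. The sketch models a channel that can be queried for content, plus a control step that runs once per clock, prioritizes one channel over another, and still updates its state when no data is present.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical decoupling channel: a FIFO that a producer writes when it
// has data and that a consumer may query without blocking.
template <typename T>
class Channel {
public:
    void write(const T& value) { fifo_.push_back(value); }

    // Query for content availability without consuming anything.
    bool available(std::size_t n = 1) const { return fifo_.size() >= n; }

    // Non-blocking read: returns false and leaves 'out' untouched if empty.
    bool nb_read(T& out) {
        if (fifo_.empty()) return false;
        out = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::deque<T> fifo_;
};

// Hypothetical reactive control block. step() stands for one clock cycle.
struct Controller {
    int last_value = 0;   // most recently accepted data item
    int idle_cycles = 0;  // state that advances even when no data arrives

    void step(Channel<int>& high, Channel<int>& low) {
        int v = 0;
        if (high.nb_read(v)) {        // explicit priority: 'high' wins
            last_value = v;
            idle_cycles = 0;
        } else if (low.nb_read(v)) {  // served only if 'high' is empty
            last_value = v;
            idle_cycles = 0;
        } else {
            ++idle_cycles;            // no data, but state still updates
        }
    }
};
```

If both channels hold data when `step` is called, the item from `high` is consumed first and the item from `low` waits for the next call, which is the kind of prioritization the quote describes.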
THE VERIFICATION PIECE
Getting these complex control-logic blocks from C++ to RTL is one thing. However, making sure they still function properly at that level of abstraction is another. “It’s easy to create RTL just to lose the benefit of getting there faster by overcomplicating verification,” says McCloud.
The C++ representation of a control-logic block is very different from that same block at RTL, where there are pin-level interfaces, memory arrays, clocks, and so on.
If the block exhibits unexpected behavior after synthesis to RTL, the challenge is figuring out why. Mentor has filed for a patent on a technique for providing the necessary debug visibility into these kinds of aberrant behaviors. The technique involves back-annotation of the RTL behavior onto the C++ source code. Designers can thus execute the C++ source code with the RTL behavior overlaid, enabling them to validate detailed RTL block interactions at the C level.
A final enhancement lies in power optimization. Many design teams have adopted clock gating as a power-management tool, but the insertion of clock gating is typically a manual process. In general, the team’s power expert examines the RTL code to identify registers that are candidates for clock gating. It’s a tedious, time-consuming step. Moreover, it’s pretty easy to overlook candidates for clock gating.
Because Catapult C synthesizes the RTL from an untimed description, the tool can glean knowledge about the design through detailed sequential analysis. It uses the result of that analysis to automate the process of multilevel clock gating. At register-level granularity, the tool decides which registers should be gated before it produces the RTL. Thus, the RTL it does produce will include clock gating on all registers that can benefit from it.
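A toy model helps show why the benefit of register-level gating depends on the stimulus. The sketch below is not Catapult C's sequential analysis and it is not a power model. It simply counts, for one register, how many cycles deliver a clock edge with and without gating, assuming the register needs a clock only on cycles when it loads new data. `GatingStats`, `count_clock_events`, and `fraction_saved` are hypothetical names for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Clock edges delivered to a single register over a run of cycles.
struct GatingStats {
    std::size_t ungated;  // one edge per cycle, regardless of activity
    std::size_t gated;    // edges only on cycles where the register loads
};

// enable_per_cycle[i] is true if the register loads new data in cycle i.
GatingStats count_clock_events(const std::vector<bool>& enable_per_cycle) {
    GatingStats stats{enable_per_cycle.size(), 0};
    for (bool enable : enable_per_cycle) {
        if (enable) ++stats.gated;
    }
    return stats;
}

// Fraction of clock edges removed by gating; a crude proxy, not watts.
double fraction_saved(const GatingStats& stats) {
    if (stats.ungated == 0) return 0.0;
    return 1.0 - static_cast<double>(stats.gated) /
                     static_cast<double>(stats.ungated);
}
```

A register that loads on two of four cycles loses half of its clock edges in this model, while one that loads every cycle loses none, so the same design can show very different results under different input vectors.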
How much clock gating is created for a given design is very much design- and vector-dependent, explains McCloud. “We’ve seen anywhere from 10% to 90% power savings,” he says. On average, power consumption is reduced by 40%.
The 2009a release of Catapult C Synthesis is available now. Pricing for the product ranges from $140,000 to $390,000 for time-based licenses.