HARDWARE DESIGN is a process of refining an idea from a highly abstract form to a concrete, physical implementation. Along the way, a design is continually transformed from a given state of abstraction to another less abstract representation, finally ending with physical design. Those transformations are achieved through synthesis.
For the past 15 years, synthesis primarily has meant transforming an RTL design description to a gate-level netlist. Synopsys' Design Compiler, fueled by the emerging Verilog language standard, represented a new design paradigm that helped designers manage the spiraling complexity of LSI chips.
But as electronic-system-level (ESL) design emerged, EDA tool vendors began providing tools for synthesis of designs from higher levels of abstraction into RTL. It's hard not to see close parallels between the state of high-level synthesis today and the early days of Design Compiler, when it battled for acceptance in a design community that was loath to give up its beloved schematics.
HOW HIGH IS UP?
Why would designers want to move up in abstraction in the first place? There are three primary reasons, according to Jeff Jussel, vice president of marketing and Americas General Manager at Celoxica.
"One reason is when you have both hardware and software in the system and it's a benefit to have a common language (most often C) between the two," he says. "A second is when you're working with an algorithm that's just too complex to write in RTL. The third is verification. C is used for modeling because it's far faster than RTL in simulation (due to the absence of timing information)."
A few classes of synthesis reside above RTL, more or less categorized as algorithmic, coprocessor, and behavioral. One case that stands out is a kind of reverse synthesis in which C-level models are synthesized from existing RTL (see "Synthesis In Reverse?").
In the algorithmic category, much of the attention centers on DSP algorithms. Randy Allen, president and CEO of Catalytic Inc., believes that algorithms are more critical to the design effort than ever before, especially as more designers begin to rely on standardized programmable hardware platforms.
"What we see is that there will be fewer people designing ASICs, but many more writing code for them," he says. "For 100-Mgate SoCs, the only way to take advantage of those gates with respect to manufacturing is to make them programmable."
Like others who dwell in the algorithmic space, Allen and Catalytic take a view of high-level synthesis that would resonate with a software engineer. The idea, as Catalytic sees it, is to help designers synthesize algorithms from the traditional Matlab development environment into efficient C code. Such code can either be deployed on a DSP or processor or used as a functional model for RTL development (Fig. 1).
Many programmers write code targeted toward a specific signal-processing application, knowing that it will run on a dedicated DSP chip. They'll typically want to start in a high-level language like the MathWorks' Matlab, which allows them to focus on the mathematics and ignore the implementation.
The biggest difference between algorithm writers using Matlab and hardware designers going with lower-level languages like HDLs, or even SystemC, is thinking in terms of parallelism. Languages like Matlab and C/C++ are inherently sequential, while hardware (at least efficient hardware) is inherently parallel.
Traditionally, compiling general-purpose programs to run on general-purpose parallel architectures has been nearly impossible. Yet compiling application-specific programs with application-specific parallelism is well within the scope of today's algorithmic synthesis flows.
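To make the contrast concrete, here is a minimal sketch in Python, used only as an executable model and not as the input language of any tool mentioned in this article. The function names, the lane count, and the one-group-per-step cost model are illustrative assumptions. The loop has no dependence between iterations, so application-specific hardware could spread it across several multipliers and produce the same result in fewer steps.

```python
def scale_sequential(samples, gain):
    """Process one sample per step, as a sequential program would."""
    out = []
    for s in samples:
        out.append(s * gain)
    return out

def scale_parallel(samples, gain, lanes):
    """Model `lanes` hardware multipliers working side by side.

    Each step consumes up to `lanes` samples. This is legal only because
    no iteration depends on the result of another.
    """
    out = []
    steps = 0
    for start in range(0, len(samples), lanes):
        group = samples[start:start + lanes]
        out.extend(s * gain for s in group)  # all lanes fire in the same step
        steps += 1
    return out, steps

samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
seq = scale_sequential(samples, 3)
par, steps = scale_parallel(samples, 3, lanes=4)
print(len(samples), "sequential steps vs", steps, "parallel steps")
# prints: 10 sequential steps vs 3 parallel steps
```

A running sum, by contrast, carries a value from one iteration to the next and could not be split this simply. Spotting which loops fall into which category is the kind of analysis an application-specific flow can afford to do.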
Synplicity's Synplify DSP is a DSP synthesis tool that performs a number of behavioral-like optimizations, such as tradeoffs between timing and area. Interestingly, it can automatically multithread processing hardware through instantiation of multiplexing.
AccelChip is another vendor plying the waters of high-level synthesis. Preferring the term "architectural synthesis," AccelChip has a methodology that combines a synthesis engine with its AccelWare core generators for complex DSP-related functions to create RTL from Matlab algorithmic expressions.
Architectural synthesis gives DSP algorithm engineers a means of very quickly sorting through a large number of microarchitecture options. With Matlab as a starting point, there's nothing implied in terms of implementation in the algorithm. AccelChip's flow abstracts away hardware, allowing designers to specify the details of, say, a fast Fourier transform while remaining at the architectural level (Fig. 2).
The tools generate synthesizable Matlab and perform a floating-to-fixed-point conversion. Unlike C-based tools, which typically operate with 8-, 16-, or 32-bit boundaries, Matlab can provide up to 63,000 bits of accuracy.
The fixed-point Matlab is simulated, the inputs and outputs are captured, and the tool builds a self-checking testbench for the RTL in VHDL or Verilog. As a result, the designer can verify that the RTL code emerging from synthesis matches the characteristics and performance of the fixed-point Matlab code.
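The float-to-fixed conversion and the self-checking testbench can be sketched in a few lines. This is a minimal Python model under stated assumptions, not AccelChip's implementation: the 12-bit fractional format, the three-tap multiply-accumulate, the stimulus values, and every function name are hypothetical, and the "device under test" is a stand-in for a simulation of the synthesized RTL.

```python
def to_fixed(x, frac_bits):
    """Quantize a real value to an integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))

def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)

def mac_float(coeffs, xs):
    """Floating-point reference: a multiply-accumulate (dot product)."""
    return sum(c * x for c, x in zip(coeffs, xs))

def mac_fixed(coeffs_q, xs_q, frac_bits):
    """Fixed-point model: integer products, then rescale the accumulator."""
    acc = sum(c * x for c, x in zip(coeffs_q, xs_q))  # 2*frac_bits fractional bits
    return acc >> frac_bits                           # truncate back to frac_bits

def capture_vectors(coeffs_q, stimuli_q, frac_bits):
    """Run the fixed-point model and record (input, expected output) pairs."""
    return [(xs, mac_fixed(coeffs_q, xs, frac_bits)) for xs in stimuli_q]

def self_check(dut, vectors):
    """Replay the captured inputs on `dut`; return indices of mismatching vectors."""
    return [i for i, (xs, expected) in enumerate(vectors) if dut(xs) != expected]

FRAC = 12  # assumed number of fractional bits
coeffs = [0.25, 0.5, 0.25]
coeffs_q = [to_fixed(c, FRAC) for c in coeffs]
stimuli = [[0.1, -0.2, 0.3], [0.9, 0.8, -0.7], [-0.5, 0.0, 0.5]]
stimuli_q = [[to_fixed(x, FRAC) for x in xs] for xs in stimuli]

vectors = capture_vectors(coeffs_q, stimuli_q, FRAC)

def dut(xs_q):
    """Stand-in for the synthesized RTL; here it is the same fixed-point model."""
    return mac_fixed(coeffs_q, xs_q, FRAC)

mismatches = self_check(dut, vectors)

# Worst-case quantization error of the fixed-point result versus the float reference.
worst = max(abs(to_float(q, FRAC) - mac_float(coeffs, xs))
            for (_, q), xs in zip(vectors, stimuli))
print("mismatching vectors:", mismatches, "worst error:", worst)
```

The point of the captured vectors is that the RTL is checked against the fixed-point model bit for bit, while the fixed-point model itself is judged against the floating-point original by an error tolerance.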
While customizing software to fit a given hardware architecture is one approach, another involves designing custom hardware for software applications. Celoxica's methodology starts with a function and then works through the architectural issues, as opposed to starting with an architecture and trying to map the function to it.
Rather than start from an algorithmic level, the methodology begins with a proprietary ANSI C variant called Handel-C. Celoxica tried to strike a balance with Handel-C to find a language that lets designers express parallelism through a rules-based approach. At the same time, Handel-C avoids imposing hardware constructs that would take C further in an HDL-like direction.
"If you're going to synthesize C code, it requires some tradeoffs," says Celoxica's Jussel. Some of these tradeoffs might mean deciding whether or not to represent the design structure in C, or deciding how much to do to your C code to represent hardware. "You can write SystemC with so much implementation detail that it's practically RTL. At that point, why bother? But pure C and C++ contain no notion of hardware, so again, there's a tradeoff there."
WORKING FROM C
Celoxica is just one of many options for designers looking to synthesize RTL from a C-level description. Other vendors' approaches, such as one from Synfora, attempt to analyze sequential C code and extract as much parallelism from it as possible. Synfora's methodology seeks to solve, at block level, the issue of imposing parallelism on sequential C algorithms at four different levels.
First, parallelism is extracted at task level so various tasks can be run in parallel. Next, loops in the code are examined for parallelization potential. Multiple iterations of these loops then can be run in parallel. Finally, loops with different operations can run in parallel. The tool adjusts the parallelism to match the designer's desired throughput.
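The link between loop-level parallelism and throughput can be illustrated with ordinary pipelining arithmetic. The sketch below is not Synfora's algorithm; it is a simplified Python model in which the function names, the latency, and the cycle budget are all hypothetical. It assumes at least one iteration, a fixed loop-body latency, and unrolled copies of the body that start together.

```python
import math

def loop_cycles(iterations, latency, initiation_interval, unroll=1):
    """Cycles to finish a pipelined loop.

    `latency` is the cycle count of one loop body, `initiation_interval`
    is the gap between successive starts, and `unroll` is how many
    iterations are started together as parallel copies of the body.
    """
    starts = math.ceil(iterations / unroll)  # groups of parallel iterations
    return latency + (starts - 1) * initiation_interval

def smallest_unroll(iterations, latency, initiation_interval, cycle_budget):
    """Smallest unroll factor that meets the cycle budget, or None if none does."""
    for unroll in range(1, iterations + 1):
        if loop_cycles(iterations, latency, initiation_interval, unroll) <= cycle_budget:
            return unroll
    return None

# A 64-iteration loop whose body takes 5 cycles.
print(loop_cycles(64, 5, 5))             # no overlap between iterations: 320
print(loop_cycles(64, 5, 1))             # a new iteration starts every cycle: 68
print(smallest_unroll(64, 5, 1, 40))     # two parallel copies meet a 40-cycle budget: 2
```

Picking the smallest factor that meets the target, rather than the largest possible one, reflects the usual tradeoff: every extra copy of the loop body costs area.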
In Synfora's flow, the tools take in the sequential C code as well as a testbench with the test vectors written in C. From that, the tool creates an RTL testbench and an instrumented C testbench. It also creates three different levels of SystemC models: bit-accurate, which is completely untimed; parallel, which is timed at the high level; and RTL, which is timed at the cycle level.
Mentor Graphics entered the C-synthesis space just over a year ago with its CatapultC tool, which it calls an algorithmic synthesis tool. CatapultC takes in an ANSI C++ functional description of the algorithm. Users tell the tool what technology they're targeting, such as a 90-nm ASIC technology, and specify a clock period and hardware interfaces. The tool generates RTL tuned for the exact system specification.
The latest CatapultC revision was extended to automatically generate SystemC models. The tool now generates fully timed and cycle-accurate SystemC models. Mentor expects it to be able to generate timed and untimed TLM models as well later this year.
ON THEIR BEST BEHAVIOR
On the behavioral-synthesis front, many eyes have been on Forte Design for some time. Forte's Cynthesizer is the latest hope for a behavioral-synthesis tool that will wipe away the bad memories of earlier generations of behavioral-synthesis technology that didn't live up to the hype.
Designers should take behavioral synthesis seriously for a number of reasons. A key reason is the existence of C, C++, and SystemC as inputs to a behavioral tool, as well as the plethora of C-level models.
The advantages of behavioral synthesis come in a couple of flavors. For one, it enables designers to quickly respin a design, because it's a lot easier to make a change in a functional specification in C code than RTL. Behavioral synthesis also lets designers rapidly retarget a design to different technology libraries. The technology library is an input to the Cynthesizer tool. Making a change in libraries is simply a matter of having the tool regenerate RTL targeted for the new library.
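One way to picture why the technology library is an input is a toy scheduler that packs a chain of dependent operations into clock cycles using per-operation delays. The Python sketch below is an illustration under stated assumptions, not Cynthesizer's algorithm: the greedy policy, the library names, and the delay figures are all hypothetical.

```python
def schedule_chain(ops, library, clock_period):
    """Greedy schedule of a dependent chain of operations into clock cycles.

    `ops` is a list of operation names executed in order, `library` maps each
    name to a combinational delay in ns, and `clock_period` is in ns.
    Operations are chained within a cycle until the next one would not fit.
    Returns a list of cycles, each a list of operation names.
    """
    cycles, current, used = [], [], 0.0
    for op in ops:
        delay = library[op]
        if delay > clock_period:
            raise ValueError(f"{op} ({delay} ns) cannot fit in a {clock_period} ns cycle")
        if current and used + delay > clock_period:
            cycles.append(current)       # close the cycle and start a new one
            current, used = [], 0.0
        current.append(op)
        used += delay
    if current:
        cycles.append(current)
    return cycles

# Hypothetical delays in ns for two libraries; not taken from any real process.
SLOW_LIB = {"mul": 3.2, "add": 1.1, "sub": 1.1}
FAST_LIB = {"mul": 1.8, "add": 0.6, "sub": 0.6}

ops = ["mul", "add", "mul", "sub"]
slow = schedule_chain(ops, SLOW_LIB, clock_period=4.0)
fast = schedule_chain(ops, FAST_LIB, clock_period=4.0)
print(len(slow), "cycles with the slow library,", len(fast), "with the fast one")
# prints: 4 cycles with the slow library, 2 with the fast one
```

The functional description in `ops` never changes. Only the library data does, and the schedule, and therefore the generated RTL, changes with it.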
It's worth noting, though, that behavioral synthesis from C to RTL doesn't absolve designers from knowledge of hardware design.
"You have to be a hardware designer to get value out of our tool," says Brett Cline, Forte's director of marketing. "We won't let you do things like access memory during every clock cycle. But we will generate RTL that will meet timing, because we characterize a basic portfolio of parts for the technology library and speed you give us."
For some synthesis vendors, it's not so much the flavor of C you start from, but whether you start from a hardware- or software-centric point of view. Celoxica's notion of starting with a function as opposed to an architecture isn't at all lost on Critical Blue, whose Cascade coprocessor-synthesis tool spins RTL code for programmable coprocessor architectures from C.
Cascade relieves overburdened ARM CPUs by synthesizing hardware coprocessors from the software routines that run on them. According to David Stewart, CEO of Critical Blue, power consumption will increasingly drive designers toward programmable coprocessor synthesis.
"Even though general-purpose processors can run at extremely high speeds in 90-nm processes, the reality is that power budgets usually won't allow them to do so," he says. "What that's going to mean is that, increasingly, pieces of code currently running on processors will have to be offloaded into more power-efficient solutions."
RTL: NOT DEAD YET
While many designers explore their options above RTL, RTL synthesis hasn't gone away. But even RTL synthesis is creeping upward in abstraction.
For example, Bluespec's tool accepts an untimed design description in a proprietary flavor of SystemVerilog. Bluespec's approach uses what it calls "rules and methods" to describe parallelism and block-to-block interfaces. "Rules and methods" are embodied in extensions to SystemVerilog. The resulting amalgamation is what the company calls Bluespec SystemVerilog.
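The flavor of a rule-based description can be conveyed with a small model of guarded actions. What follows is a Python sketch, not Bluespec SystemVerilog syntax and not the company's scheduler: the `Rule` class, the one-rule-per-step policy, and the GCD example are assumptions chosen for illustration. Each rule pairs a guard with an action that updates state atomically, and the design is finished when no guard holds. The example assumes positive starting values.

```python
class Rule:
    """A guarded atomic action: `action` may run only when `guard` is true."""
    def __init__(self, name, guard, action):
        self.name, self.guard, self.action = name, guard, action

def run(state, rules, max_steps=1000):
    """Fire one enabled rule per step until no guard holds; return the firing trace."""
    trace = []
    for _ in range(max_steps):
        enabled = [r for r in rules if r.guard(state)]
        if not enabled:
            return trace
        rule = enabled[0]      # fixed priority; real hardware could fire several at once
        rule.action(state)     # the whole update happens as one atomic step
        trace.append(rule.name)
    raise RuntimeError("rules did not quiesce")

def swap(s):
    s["x"], s["y"] = s["y"], s["x"]

def subtract(s):
    s["y"] = s["y"] - s["x"]

# Euclid's GCD expressed as two rules over the registers x and y.
gcd_rules = [
    Rule("swap", lambda s: s["x"] > s["y"] and s["y"] != 0, swap),
    Rule("subtract", lambda s: s["x"] <= s["y"] and s["y"] != 0, subtract),
]

state = {"x": 15, "y": 6}
trace = run(state, gcd_rules)
print("gcd =", state["x"], "after", len(trace), "rule firings")
# prints: gcd = 3 after 6 rule firings
```

Nothing in the description says when each rule fires relative to a clock. That ordering is left to the scheduler, which is where a synthesis tool working from rules gets its freedom.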
"Traditional behavioral synthesis is very good at taking a tightly nested for-loop and parallelizing that into hardware," says George Harper, Bluespec's director of marketing. "But the bulk of IP out there doesn't fall into that category." Bluespec has tried to subtly raise the level of abstraction of RTL synthesis by adding these rules and methods, which can be likened to assertions.
An important consideration in today's synthesis environment is the proliferation of implementation choices. Designers can choose from FPGAs, ASICs, and structured ASICs. As a result, it's crucial that synthesis tools be tuned to the architecture of the chosen implementation.
Synplicity attempted to address this requirement in its range of synthesis tools. The company recently introduced graph-based physical synthesis for FPGAs. With this capability, Synplicity's tool can perform simultaneous placement, optimization, and routing. It then passes forward the placement information and the netlist to the FPGA vendors' routers.
Similarly, Synplicity has worked closely with structured-ASIC vendors such as LSI Logic, NEC, and Fujitsu so its physical synthesis tools can offer close correlation between the tools' timing analysis and the ASIC vendors' back-end flows.
Synplicity uses a technology it calls Sensitive Net Analysis and Prevention to reduce the probability that variations in routing will cause a correlation problem (Fig. 3). This way, the synthesis tool ensures that the placed-gates design still meets timing after detailed routing.
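The underlying idea, finding paths whose timing would flip if routing came out slightly worse than estimated, can be sketched as follows. This Python model is an assumption-laden illustration and not Synplicity's algorithm: the path names, the delay figures, and the uniform 20% wire-delay variation are all hypothetical.

```python
def sensitive_paths(paths, clock_period, variation):
    """Return names of paths that meet timing as estimated but would fail
    if their wire delay grew by the fraction `variation`.

    Each path is (name, logic_delay_ns, estimated_wire_delay_ns).
    """
    flagged = []
    for name, logic, wire in paths:
        nominal = logic + wire
        worst = logic + wire * (1.0 + variation)
        if nominal <= clock_period < worst:
            flagged.append(name)
    return flagged

paths = [
    ("ctrl_to_reg", 2.0, 1.0),  # 3.0 ns nominal, plenty of slack
    ("mul_to_acc", 3.0, 1.8),   # 4.8 ns nominal, 5.16 ns if wires run 20% slow
    ("bus_decode", 4.0, 0.4),   # 4.4 ns nominal, 4.48 ns if wires run 20% slow
]
flagged = sensitive_paths(paths, clock_period=5.0, variation=0.20)
print(flagged)  # ['mul_to_acc']
```

A path flagged this way is a candidate for extra margin before detailed routing, for instance through a stronger driver or a tighter placement, which is where a small area cost can buy predictability.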
"In our experience, it takes a pretty small amount of area expenditure to tighten up the prediction," says CEO Ken McElvain.
The granddaddy of all RTL synthesis tools, Synopsys' Design Compiler, still commands an overwhelming market share in its domain. The latest revision, Design Compiler 2005, brings more accuracy to the table in an effort to have its timing and area reports correlate more closely with physical implementation.
To this end, Synopsys did away with wireload models in Design Compiler and is pushing more physical information forward into the synthesis step. Meanwhile, the company managed to keep the look and feel of the tool consistent.
Magma Design Automation subscribes to a similar philosophy—accuracy in logic synthesis with respect to the physical domain. "The value in RTL synthesis is in giving the designer insight into what the physical implementation is likely to be," says Yatin Trivedi, Magma's director of product marketing.
"The game has shifted from 'give me the most optimized netlist' to 'help me create better RTL and complete the constraints,'" Trivedi says. In Trivedi's and Magma's view, physical synthesis is the best opportunity for real improvements in timing, not logic synthesis.
"Physical synthesis provides an opportunity to reimplement the datapath," says Trivedi. "Traditionally, people have fixed their datapath elements. Then they run into problems and try to fix them in the back end. What you really need to do is implement datapath synthesis dynamically or on the fly."
In its RTL Compiler technology, Cadence takes an approach it calls global logic synthesis. RTL Compiler doesn't create an implementation randomly and then optimize. Rather, the tool considers all axes of optimization during the global surveying process. According to Chi-Ping Hsu, corporate vice president at Cadence, global synthesis has great impact in terms of power optimization.
"The optimization for leakage power tends to be more in sync with area objectives, while optimization for dynamic power tends to be contrary to area objectives," says Hsu. "When you add multiple voltages, it really makes the traditional incremental approach very difficult. Runtimes are extremely slow, and results are suboptimal."
In the physical-synthesis realm, as with logic synthesis, EDA vendors have strived for greater accuracy. Synopsys updated IC Compiler to extend physical synthesis to cover both placement and routing.
Just as earlier generations of physical synthesis closed the gap between synthesis and placement, Synopsys' Extended Physical Synthesis (EPS) technology brings placement, clock-tree synthesis, and routing into close alignment. The net effect is that placement gains greater visibility into clock-tree synthesis. The results of that previously separate step can be anticipated and accounted for during placement optimization.
EPS technology also enables clock-tree synthesis to anticipate and guide routing. Furthermore, routing can modify placement and locally perform resynthesis to complete the design where needed.
What could the future hold for physical synthesis? One indication comes from Zenasis Technologies. Its flagship tool, ZenTime, performs transistor-level optimization of standard cells to precisely tune them to meet timing objectives.
"In the late 1980s and early 1990s, logic synthesis was the factor that changed standard-cell design, followed by physical synthesis, which has now taken over the whole synthesis domain," says Sunil Mudinuri, marketing manager at Zenasis. "We believe that in the future, the new synthesis domain will be what we're calling flex-cell or design-specific cell synthesis."
Today, most standard-cell designs are being done on fixed libraries. Zenasis proposes imposing a layer, or wrapper, around the standard cells that would enable ESL tools to replace them, as necessary, with cells that would satisfy specific timing, area, and power constraints.
Zenasis' technology operates at the transistor level, giving the tools a good deal of insight into what's happening inside the cells. In the flex-cell concept, more physical information can be piped into the ESL tools for a better estimation of a given implementation.
It's almost ironic that a futuristic concept for physical synthesis would reach all the way up to the ESL level. But given the path taken by synthesis to this point, it's not surprising that EDA vendors would seek to drive physical information as high in the flow as possible.
In a sense, Zenasis' concept closes the circle, bringing the end point of the design process—physical design—around to the front end. This is, in all likelihood, the path that the EDA industry will, and must, traverse if Moore's Law is to survive long into the 21st century.
NEED MORE INFORMATION?
Carbon Design Systems
Forte Design Systems
Magma Design Automation