Go Abstract To Speed Up Your Design Flow

New silicon-process-technology nodes are coming out every 18 months. Each process node brings a doubling of chip gate capacity. On top of that, the new 90-nm technology with 7.2-ps gate delays and 10 layers of interconnect offers the capacity to integrate more than 100 million logic gates. This technology has far-reaching implications for design methodology.

Traditionally, the divide-and-conquer method has been used in complexity management. But after more than a decade of dividing, the sheer volume of pieces is creating its own complexity crisis. Two strategies address this issue: working with bigger pieces or using more abstraction to simplify the problem.

The first option requires a new generation of register-transfer-level (RTL) synthesis tools based on a completely new approach to the synthesis problem. Incremental fixes and patches being applied to the incumbent solutions haven't kept up with silicon technology.

One of the more serious methodology issues relates to using synthesis tools whose capacity and runtime limitations haven't kept pace with process technology. Most designers using old synthesis technology create more subblocks than their designs actually need. If we look at the design process for a high-performance, 50-Mgate design employing old synthesis tools, we could expect that the design would be subdivided into approximately 500 modules of 50 to 100 kgates (Fig. 1).
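
As a rough check on that figure, the module count is simply the total gate count divided by the number of gates each module holds. The short Python sketch below is illustrative only; `module_count` is a hypothetical helper, not part of any synthesis tool.

```python
def module_count(total_gates, gates_per_module):
    """Number of modules needed, rounded up (hypothetical helper)."""
    return -(-total_gates // gates_per_module)  # integer ceiling division


DESIGN_GATES = 50_000_000  # the 50-Mgate design discussed in the text

for module_size in (100_000, 50_000):
    count = module_count(DESIGN_GATES, module_size)
    print(f"{module_size // 1000}-kgate modules: {count}")
```

At the 100-kgate end of the range this gives the 500 modules cited. At 50 kgates per module, the count doubles to 1000.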

Of course, it takes time and engineering effort to create these 500 chip partitions. Worse yet, the price of this added partitioning continues to mount throughout the design process. Indeed, the expense of the additional complexity to manage too many blocks recurs throughout the design process and in the reuse of the design data. These costs include:

The designer must write longer synthesis scripts than are needed to compile the design. For a design of just a few million gates, scripts of more than 3000 lines aren't uncommon. When compile scripts are more complicated than the RTL code, it's time to rethink design strategies and tools.
Once written, the script must be debugged. Obviously, debugging thousands of lines of script is more complicated than debugging a script that's tens of lines. Design errors and project delays often result from such complications. This arcane scripting language is far from self-documenting.
These scripts must be maintained throughout the project's life, as well as during the design data's useful life. Out-of-date and overly complex compilation scripts impede design and script re-use. This has been a great benefit for those in the cottage industry of tool jockeys who specialize in synthesis scripting.
Junior engineers face a steep learning curve to become proficient in the arcane tool knowledge required to make the hyperpartitioned design strategies work. This means that the attention of the most-experienced engineers will become tool-centric, instead of design-centric. This loss of focus is subtle, but it is potentially a serious problem.

The second option to combat escalating complexity—add abstraction—has some perceived barriers and risks. These must be addressed before it really is considered a viable option.

False Starts And Bad Karma: A few years ago, a great deal of hoopla surrounded the second coming—behavioral synthesis. It was going to revolutionize the way that the electronics world would design. While adding more abstraction to the design process is fundamentally a sound idea, the actual implementation tarnished the entire idea for a generation of designers. A high-level synthesis tool detached from the ability to make accurate performance calculations just wasn't what the market was looking for.

Further, it was assumed that this bad tool was representative of high-level synthesis in general. Soon, it became fashionable to lambaste anyone foolish enough to touch the stuff.

But the compelling complexity crunch has motivated designers to again consider the value of abstraction beyond RTL. Luckily, there are also new generations of high-level synthesis tools to turn to. However, the macro-economic climate doesn't always cooperate with needs generated by new technology. Most corporations have cut their exploratory methodology projects, choosing instead to milk everything they can out of their existing tools and remaining designers.

There never seems to be a good time to change design methodologies. But if teams don't evolve, they will perish. Evolution is the key. To reduce training and ramp-up time, along with the risk of making a change, evolution—not revolution—is necessary.

A high-level summary of the differences between RTL-based and architectural-based design is shown in the table. Note that architectural synthesis yields automation three steps up the design-implementation chain. This automation comes from combining the more abstract starting point of the pins-out-cycle-accurate (POCA) coding style and the automated, optimized implementation provided by architectural synthesis, fine-tuned for technology process and constraints.

Starting The (R)Evolution: Architectural synthesis is eminently accessible to the RTL designer because it represents an evolutionary path to higher abstraction. The body of new knowledge required to access the power of abstraction is fairly small and limited in scope. But to exploit its power, designers will have to understand three primary areas of architectural synthesis. Ranging from most difficult to least difficult, they are:

HDL entry differences: requires new conceptualization of the design;
Running synthesis, analyzing results: new analysis abstractions to use;
Design-flow differences: incremental additions to existing flows.

One of the biggest challenges facing the seasoned RTL designer in adopting a higher level of abstraction is the need to change the conceptualization of the design problem. For years, RTL designers learned to visualize banks of registers and the logic between them. Actually, such direct creation of micro-architectures has been the hallmark of RTL-based design.

To adopt architectural synthesis, designers must change from a hardware-declarative method of thought and hardware description language (HDL) coding to an I/O-operation algorithm style. Designers who don't make this conceptual leap will find architectural design inefficient and frustrating.

The easiest way to make the conceptualization breakthrough is to attack specific modules that are well suited to the architectural design style, then generalize to the entire design. Just choosing a module can be a challenge to the RTL designer because the partitioning criteria for the RTL and architectural styles are different.

In RTL, partitioning is driven by logic-synthesis limitations that dictate boundaries along register lines. So as much as possible, critical paths are contained within single modules.

In contrast, architectural-synthesis partitions are based on aggregating algorithmic or computational threads together—even though they might eventually be broken up over the time axis. As a result, selecting a module for the first attempt at architectural synthesis may involve some design reorganization. But it's well worth the effort.

A design methodology based on architectural synthesis can replace the RTL methodology in each case. But depending on the design type, the design environment, and the target technology, the advantages may be insignificant, or even counterproductive. Eight decisive factors favor a design flow based on architectural synthesis instead of an RTL-based design flow.

1. Algorithmic Complexity: A design is an ideal candidate for architectural synthesis when it possesses considerable algorithm complexity, either due to a convoluted control flow or because of elaborate arithmetic operations. Here, architectural synthesis relieves the designer from the cumbersome task of coding the finite state machine (FSM) in detail. That's because architectural synthesis accepts procedural descriptions that implicitly contain the FSM hidden within loops, conditional branches, and other high-level constructs.

Hence, the more complex the control flow of a design, the greater the benefits of using architectural synthesis. Unlike RTL synthesis, architectural synthesis doesn't require the comprehensive description of any complex arithmetic operation, such as an arithmetic logic unit (ALU) or cyclic redundancy checker (CRC). Consequently, designs that include extensive arithmetic operations will benefit from an architectural synthesis flow. Examples of designs that encompass elaborate arithmetic operations are digital signal processors and microprocessors.

2. Memory-Access-Dominated Designs: Any RTL designer will admit that writing code to manage memories and their interfaces isn't a trivial task. The more complex the memory interface, the more demanding the modeling job.

Architectural synthesis fully automates implementing the interface logic of any memory block once the designer specifies the number of memory ports and the read/write protocol. Memories with multiports and composite read/write schemes are ideal candidates for architectural synthesis. Different memory types can actually be swapped in and out of the synthesis process without recoding the HDL.

3. Designs With Pipelines: Pipelining improves the performance (higher throughput) and/or the manufacturing cost (fewer hardware resources) of a design. It breaks up the logic into smaller pieces and inserts registers to hold intermediate data between those pieces. An upstream piece can begin to process new data before a downstream piece has finished processing the previous data. Depending on the nature of the design, pipelining might not be an option. It can be successful, though, if the design involves parallel computation of operations.

In the RTL flow, the designer must manually implement the pipeline, which may be an intricate assignment prone to traps and mistakes. Architectural synthesis automates implementing a pipeline: the designer merely specifies the latency and the area of the design.
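
The throughput benefit of pipelining can be seen with simple cycle counting. The Python sketch below is a minimal model under stated assumptions: every stage takes exactly one clock cycle, a new input can enter the pipeline every cycle, and there are no stalls. The function names are hypothetical and chosen for illustration.

```python
def cycles_unpipelined(num_items, num_stages):
    """Each item must pass through every stage before the next item starts."""
    return num_items * num_stages


def cycles_pipelined(num_items, num_stages):
    """The first item fills the pipeline; each later item completes one cycle
    after its predecessor (assumes one new input per cycle and no stalls)."""
    if num_items == 0:
        return 0
    return num_stages + (num_items - 1)


ITEMS, STAGES = 100, 4  # illustrative values, not taken from the article
print("unpipelined:", cycles_unpipelined(ITEMS, STAGES))  # 400 cycles
print("pipelined:  ", cycles_pipelined(ITEMS, STAGES))    # 103 cycles
```

The latency of a single item is the same in both cases (four cycles here). What changes is how often a new result emerges, which is why the gain depends on the design having a steady stream of independent data to process.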

4. Limited Knowledge Of Component Timing: Developers writing RTL code must estimate the propagation delays of structural blocks that make up the design and ensure that it meets top-level timing constraints. With the exception of simple logic, such as gates, this mission is especially challenging. Routinely, designers "guesstimate" the propagation delays of everything more complex than gates, like multiplexers, ALUs, RAMs, CRCs, and FSMs. Estimation is based on either experience with similar designs or on rough metrics.

Frequently, erroneous calculations are made, producing synthesized gate-level netlists that don't meet timing targets. To correct the mismatch, designers must rewrite parts (or all) of the RTL code. Thus, multiple iterations with the RTL synthesis tool are required until timing converges.

In contrast, architectural synthesis automates the propagation-delay calculation of every element—whether complex structures or simple logic, including interconnections between hardware elements—by mapping the RTL and the gate structures, then performing timing analysis on those structures. This automation of calculating process-accurate timing also helps designers who lack experience with a class of designs and eliminates the bad assumptions.

5. Need To Perform Architectural Explorations: Any design can be implemented with one of several different hardware architectures and still meet the design goals. But finding the appropriate architecture at the RTL may not be possible considering the tight timing schedule usually assigned to a team creating complex devices in the competitive electronics industry.

In an RTL flow, the development team identifies the design architecture in the HDL description by explicitly coding machine states, operations to be performed in those states, registers to store temporary values, and whatever other structural details make up the design. Consequently, altering an architecture requires the design's entire RTL description to be rewritten. It might also force rewriting the testbench, as the throughput of the design could have changed.

In an architectural synthesis flow, the design team can explore different architectures over a wide range of performance and cost by switching target libraries. Or, they can modify design goals such as area, number and type of resources, number of states, clock cycles, or design latency, and then resynthesize the same architectural HDL code. As long as the I/O protocol of the design doesn't change, the same testbench may be used repeatedly.
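
To make the idea of exploration concrete, the Python sketch below reschedules one fixed set of operations under different resource limits and reports how many clock cycles each choice needs. It is a simplified illustration, not the algorithm of any particular tool: it assumes every operation takes one cycle, uses a basic greedy list scheduler, and the operation graph (`FIR_OPS`, a hypothetical four-multiply, three-add computation) is invented for the example.

```python
def list_schedule(ops, limits):
    """Greedy resource-constrained scheduling.

    ops:    {name: (kind, [names of operations it depends on])}
    limits: {kind: number of units of that kind usable per cycle}
    Returns ({name: cycle}, total_cycles). Every operation takes one cycle.
    """
    start_cycle = {}
    cycle = 0
    while len(start_cycle) < len(ops):
        used = {}
        started = []
        for name, (kind, deps) in ops.items():
            if name in start_cycle:
                continue
            # Ready only if every dependency finished in an earlier cycle.
            ready = all(d in start_cycle and start_cycle[d] < cycle for d in deps)
            if ready and used.get(kind, 0) < limits.get(kind, 0):
                used[kind] = used.get(kind, 0) + 1
                started.append(name)
        if not started:
            raise ValueError("no progress: missing resources or cyclic dependencies")
        for name in started:
            start_cycle[name] = cycle
        cycle += 1
    return start_cycle, cycle


# Hypothetical data flow: four products summed by a small adder tree.
FIR_OPS = {
    "m0": ("mul", []), "m1": ("mul", []), "m2": ("mul", []), "m3": ("mul", []),
    "a0": ("add", ["m0", "m1"]),
    "a1": ("add", ["m2", "m3"]),
    "a2": ("add", ["a0", "a1"]),
}

for limits in ({"mul": 4, "add": 2}, {"mul": 2, "add": 1}, {"mul": 1, "add": 1}):
    _, total = list_schedule(FIR_OPS, limits)
    print(limits, "->", total, "cycles")
```

With these assumptions, four multipliers and two adders finish in three cycles, two multipliers and one adder in four, and a single multiplier and adder in six. The description of the computation never changes; only the constraints do, which is the point the text makes about reusing the same architectural code.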

6. Switching Process Technologies: Similar considerations made for performing architectural explorations also hold true for retargeting to a different process library. A micro-architecture that performs best with one process technology may yield poor results with another process. When changing a target library in an RTL flow, a process-technology-optimized micro-architecture may need to be rewritten to produce the best results. In an architectural synthesis flow, the design team can resynthesize the same architectural HDL code targeting a different process library and generate a new netlist optimized for the new library.

7. Specifications Not Completely Frozen: The RTL design methodology requires the complete and in-depth definition of the design architecture before the start of the coding phase. Design specifications must be nailed down in all details, so RTL developers can work out the final architecture and finish the coding. If the coding has already started when the RTL development team is notified of a change in the spec, a significant disruption will arise in the project schedule and a costly slip will occur.

The architectural design flow doesn't need the complete and detailed architecture of the design to perform the implementation task. So the development team can make last-minute changes, with only minor updates to the functional HDL description of the design.

8. IP Creation: Architectural synthesis is the perfect technology for creating libraries of soft intellectual-property (IP) macros and boosting their popularity. Two problems have slowed the use of soft IP macros. First, unlike hard macros that "plug and play" in a design without the need for customization, soft IP macros written at RTL almost always require partial redesign of the host circuitry. This places unforeseen burdens on the development team and delays the project. In contrast, a soft IP macro described as a POCA functional model is constrained only at the I/O protocol level and sits in a design with minor editing.

Second, soft IP macros written in RTL require partial rewriting of their code when they're retargeted to a different process technology. But a soft IP macro described at the architectural level doesn't need customization. It accommodates a different process technology by being synthesized to target the new library. Soft IP macros described at the architectural level are ideal to support families created from variants of any given design.

POCA Coding: As mentioned previously, the HDL coding style for architectural synthesis is called pins-out-cycle-accurate, or POCA. This style can be written using any of the popular HDLs, including Verilog, VHDL, SystemVerilog, Superlog, SystemC, and CoWare C.

The basic premise of POCA is that the input and output transactions (reads and writes) of the module being created are explicitly fixed into clock cycle time, while all intervening calculations can be moved during optimization. As a result, the I/O protocol of the modules is the same before and after optimization. This overcomes a major headache found in behavioral synthesis where I/O protocols between modules are constantly shifting and made incompatible.

Language constructs used for POCA-style HDL coding are largely the same as those used in RTL-based design. While POCA permits more-general use of loops, the key difference is in the more-general approach to clock-edge usage that POCA permits. RTL code is restricted to a single clock edge per process, and the clock edge must be located at the beginning of the process.

Conversely, POCA permits as many clock edges as desired. They may be scattered throughout a process. A simple filter design coded in POCA-style Verilog uses many familiar RTL constructs (Fig. 2). The accompanying control-flow graph (CFG) has been annotated with the loop and data operations. The designer will use this graphical view to understand the automatic micro-architecture created by architectural synthesis.
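
The idea of several clock edges scattered through one process can be mimicked in software. The Python sketch below is only an analogy and is not the Verilog of Fig. 2: it models a process as a generator in which each `yield` stands for a clock edge, so the cycle in which each input is read and each output is written is fixed by the code. The function `moving_sum_process` and its four-sample accumulation are hypothetical, chosen for illustration.

```python
def moving_sum_process(inputs, taps=4):
    """POCA-like process model: every `yield` represents one clock edge.

    The value yielded is what appears on the output in that cycle
    (None means nothing is written in that cycle).
    """
    samples = iter(inputs)
    while True:
        acc = 0
        for _ in range(taps):
            try:
                sample = next(samples)  # input read, pinned to this cycle
            except StopIteration:
                return                  # no more input: the process stops
            acc += sample               # calculation between clock edges
            yield None                  # clock edge inside the loop
        yield acc                       # clock edge with the output write


# Cycle-by-cycle output trace for eight input samples.
trace = list(moving_sum_process([1, 2, 3, 4, 5, 6, 7, 8]))
print(trace)
```

The trace shows one result every five cycles: four cycles of reads followed by one write. The control state (which loop iteration the process is in) is never coded as an explicit state machine; it is implied by the position of the clock edges, which mirrors the way procedural descriptions hide the FSM within loops and branches.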

Running Synthesis: Architectural-synthesis tools are operated and controlled in much the same way as RTL synthesis tools. They can use the same technology libraries and types of timing constraints and produce the same kinds of results reports. This discussion will focus only on the differences, but most of the environment is nearly identical to that of RTL-based environments.

Architectural synthesis does "bigger" design transformations than RTL synthesis, as it's not constrained within the solution space of a single micro-architecture. The two categories of optimization transformations that architectural synthesis performs are scheduling and resource allocation.

The scheduling process assigns each operation to a clock cycle. It performs timing analysis, looks at the clock cycles and clock trees, and then distributes operations across clock cycles. If a value crosses a cycle boundary, it is saved in a register. To avoid multicycle operations, it splits complex operations into smaller ones spread over multiple cycles. It checks for timing violations and fixes them by rescheduling the operations.
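
A toy version of this scheduling step is sketched below in Python. It is a minimal as-soon-as-possible scheduler under stated assumptions: each operation has a single known delay, dependent operations may be chained within one cycle as long as their combined delay fits in the clock period, and an operation that would overrun the period is pushed to the next cycle. Clock-tree analysis and the splitting of oversized operations are not modeled, and all names and delay values are hypothetical.

```python
def asap_schedule(ops, clock_period):
    """ops: {name: (delay, [dependencies])}, listed in dependency order.
    Returns {name: (cycle, finish_time_within_that_cycle)}."""
    sched = {}
    for name, (delay, deps) in ops.items():
        if delay > clock_period:
            # A real tool would split such an operation; this sketch does not.
            raise ValueError(f"{name} does not fit in one clock period")
        cycle, start = 0, 0
        for dep in deps:
            dep_cycle, dep_finish = sched[dep]
            if dep_cycle > cycle:
                cycle, start = dep_cycle, dep_finish
            elif dep_cycle == cycle:
                start = max(start, dep_finish)
        if start + delay > clock_period:
            cycle, start = cycle + 1, 0  # timing violation: move to next cycle
        sched[name] = (cycle, start + delay)
    return sched


def registered_values(ops, sched):
    """Values consumed in a later cycle than they are produced need a register."""
    regs = set()
    for name, (_, deps) in ops.items():
        for dep in deps:
            if sched[dep][0] < sched[name][0]:
                regs.add(dep)
    return regs


# Hypothetical chain of operations with delays in arbitrary time units.
OPS = {
    "mul": (6, []),
    "add": (3, ["mul"]),
    "sub": (3, ["add"]),
    "cmp": (2, ["sub"]),
}
schedule = asap_schedule(OPS, clock_period=10)
print(schedule)
print("registers needed for:", registered_values(OPS, schedule))
```

With a clock period of 10 units, the multiply and add are chained in cycle 0 (finishing at 9 units), while the subtract would overrun and moves to cycle 1, so the result of the add is the one value that must be held in a register.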

On the other hand, the resource-allocation process decides the number and type of resources that can be used in a given implementation. It creates an FSM to control the use of these resources and generates a set of registers or memory blocks to store intermediate values. If possible, it reuses hardware resources like registers, functions, and operators.
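
Register reuse of the kind described above can be illustrated with a small interval-packing exercise. The Python sketch below is a greedy pass in the spirit of the classic left-edge approach, shown under simplifying assumptions: each intermediate value has a known lifetime (the first and last clock cycle in which it must be held, inclusive), and two values may share a register when their lifetimes do not overlap. The value names and lifetimes are hypothetical.

```python
def share_registers(lifetimes):
    """lifetimes: {value: (first_cycle, last_cycle)}, both inclusive.
    Returns {value: register_index}, reusing registers where possible."""
    assignment = {}
    busy_until = []  # for each register, the last cycle in which it is occupied
    # Visit values in order of the cycle in which they are first needed.
    for value, (first, last) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        for reg, occupied_through in enumerate(busy_until):
            if occupied_through < first:  # register is free before this value
                busy_until[reg] = last
                assignment[value] = reg
                break
        else:
            busy_until.append(last)       # no free register: allocate a new one
            assignment[value] = len(busy_until) - 1
    return assignment


# Hypothetical intermediate values and the cycles in which they are live.
LIFETIMES = {"t0": (0, 1), "t1": (0, 2), "t2": (2, 3), "t3": (3, 4)}
registers = share_registers(LIFETIMES)
print(registers)
print("registers used:", len(set(registers.values())))
```

Here four intermediate values fit in two registers: `t2` reuses the register vacated by `t0`, and `t3` reuses the one vacated by `t1`. The same packing idea applies to functional units, where operations scheduled in different cycles can share one operator.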

Moreover, an architectural synthesis tool offers several capabilities not available in RTL synthesis, such as chaining, pipelining, memory inferencing, resource sharing, and register sharing. These optimization transformations are explained in many books on the subject of high-level synthesis.

In RTL-based design, the designer must perform scheduling and resource allocation tasks. With the automation provided by architectural synthesis, results of these tasks must be communicated to the designer who wants or needs to understand details of the module implementation. The most concise way to explain results is by using design abstractions, or design views, that may not be familiar to all designers. These views are the control-flow graph and data-flow graph (DFG).

One way to view a design is to look at it as consisting of three pieces of information: what, when, and where. A design is characterized by the desired behavior (what), mapped to time (when), and implemented by a specific structure (where). The DFG encapsulates the desired behavior, whereas the control-flow graph condenses timing information. During design elaboration, architectural synthesis extracts the CFG and DFG from the high-level HDL code. In the scheduling phase of architectural synthesis, the two graphs are linked and design optimizations are performed. Finally, in the allocation phase, architectural synthesis assigns hardware resources to the operations.

Architectural synthesis tools offer user interfaces that depict CFG and DFG views of the design, along with the more familiar source-code, hierarchy, and gate-level views. These views are all hyperlinked together so that their relationships can be readily understood.

The steps within an architectural-synthesis tool proceed in a generally sequential fashion, with many hidden iterations performed during various optimization steps (Fig. 3). Inputs are the POCA-style HDL code, implementation constraints, and process library (the same as used for RTL synthesis). Outputs can be generated at multiple levels of abstraction: an optimized POCA-style scheduled design, RTL, or gate level.