C-Code Algorithms Infiltrate Hardware

April 1, 2004
This algorithm-to-tapeout synthesis tool performs tradeoffs between wireless software and hardware implementations.

Few areas of embedded design are more challenging than the development of mobile wireless products. In this arena, designers must carefully balance overall performance against power consumption and time-to-market pressures. Tools that assist in this balancing act are among the resources that wireless developers seek most.

Building on research that was originally conducted at Hewlett Packard Labs, a new company known as Synfora now claims to have the first true "algorithm-to-tapeout" synthesis technology. Called Program In Chip Out, or PICO, this technology promises to greatly reduce design risks by enabling the early exploration of architectural design alternatives. Specifically, PICO will create efficient hardware from compute-intensive, algorithmic C descriptions. This early exploration of "what-if" performance and power scenarios allows wireless designers to find the optimal mix of hardware and software implementations.

Tradeoff analysis between system performance and power consumption is especially critical in the design of algorithm-intensive applications. These applications are abundant in wireless systems, which encompass everything from digital-baseband processing to MPEG-4 coding and decoding. Although such compute-intensive algorithms are usually first written as C software, performance and power constraints often mandate a hardware solution.

The company's first commercialization of PICO technology is called PICO Express. This tool takes compute-intensive blocks of algorithmic code (typically representing items like Viterbi decoders and data filters) and accelerates them in hardware. The tool's output is automatically generated register-transfer-level (RTL) code for building hardware accelerators. This hardware interfaces directly with the host processor (typically, but not necessarily, an ARM core) and implements the compute-intensive functions of the overall algorithm.
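To make "compute-intensive blocks of algorithmic code" concrete, here is a minimal sketch of a data filter written as a plain C loop nest. It is not taken from PICO Express or from Synfora; the function name `fir_filter`, the tap count `NUM_TAPS`, and the zero-padding choice are all assumptions made for illustration. It only shows the kind of regular, loop-dominated kernel a designer might consider moving into hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tap count, chosen only to keep the example small. */
#define NUM_TAPS 4

/* Illustrative FIR filter: out[i] = sum over t of in[i - t] * coeff[t].
 * Samples before in[0] are treated as zero (an assumption of this sketch).
 * The accumulator is wider than the samples so products do not overflow. */
void fir_filter(const int16_t *in, int32_t *out, size_t n,
                const int16_t coeff[NUM_TAPS])
{
    assert(in != NULL && out != NULL && coeff != NULL);

    for (size_t i = 0; i < n; i++) {
        int32_t acc = 0;
        for (size_t t = 0; t < NUM_TAPS; t++) {
            if (i >= t) {
                acc += (int32_t)in[i - t] * (int32_t)coeff[t];
            }
        }
        out[i] = acc;
    }
}
```

The inner loop has a fixed trip count and no data-dependent control flow beyond the boundary check, which is the sort of structure that lends itself to pipelined or parallel hardware.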

Designers have full discretion in the selection of which blocks of code should be moved to hardware. Once those blocks are selected, PICO Express can perform an analysis—called a space walk—of the selected code to examine power and performance tradeoffs. For example, power consumption of the selected algorithmic block can be checked throughout a range of clock frequencies and performance-throughput targets. The tool can then output a variety of RTL implementations for the designer's final selection.
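The idea of sweeping clock frequencies and throughput targets can be sketched as a small design-space search. This is not Synfora's algorithm: the cost model in `evaluate`, the notion of parallel "lanes", and every number below are assumptions invented for the example. The sketch simply enumerates candidate design points and keeps the lowest-power one that meets a throughput target.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int  clock_mhz;   /* candidate clock frequency */
    int  lanes;       /* candidate number of parallel datapaths (hypothetical) */
    long throughput;  /* modeled throughput, arbitrary units */
    long power;       /* modeled power, arbitrary units */
} design_point;

/* Toy cost model (assumption): throughput grows with clock * lanes, while
 * power per lane has a fixed term plus a term that grows with clock squared. */
static design_point evaluate(int clock_mhz, int lanes)
{
    design_point p;
    p.clock_mhz = clock_mhz;
    p.lanes = lanes;
    p.throughput = (long)clock_mhz * lanes;
    p.power = (long)lanes * (5000L + (long)clock_mhz * clock_mhz);
    return p;
}

/* Returns 1 and fills *best with the lowest-power candidate that meets
 * min_throughput; returns 0 if no candidate meets the target. */
int explore_design_space(const int *clocks, size_t n_clocks,
                         const int *lanes, size_t n_lanes,
                         long min_throughput, design_point *best)
{
    int found = 0;
    assert(clocks != NULL && lanes != NULL && best != NULL);

    for (size_t i = 0; i < n_clocks; i++) {
        for (size_t j = 0; j < n_lanes; j++) {
            design_point p = evaluate(clocks[i], lanes[j]);
            if (p.throughput < min_throughput) {
                continue;  /* misses the performance target */
            }
            if (!found || p.power < best->power) {
                *best = p;
                found = 1;
            }
        }
    }
    return found;
}
```

Under this toy model, a wide design at a low clock can beat a narrow design at a high clock for the same throughput. A real tool would base that comparison on far more detailed power and timing estimates.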

Along with the RTL description, PICO Express provides a synthesis script, test bench, and software-driver code. It therefore enables the integration of the RTL into the existing system-on-a-chip (SoC). The tool provides checking and validation of the generated RTL code, including bit-accurate C simulation to detect overflows. A program verifier ensures that the design creates a highly efficient hardware accelerator. One of the main testing activities is a perturbation test, which validates that the structures added will not cause functional or timing failures.

Once PICO Express generates the RTL driver code, it can be compiled into executable code. The tool then creates the completely verified hardware accelerator, called the Pipeline of Processor Arrays (PPA) architecture (SEE FIGURE). Included in this architecture is the bus interface between the PPA and the primary processor core.

The interface between the accelerator and the core is critical because bus bandwidth is a very limited resource. PICO Express analyzes the bus data access and determines how it can best cache data to reduce constant data loading and storing activities. This effort can significantly reduce traffic to the host processor. Furthermore, the tool creates a streaming interface pipeline. It allows processed data to be passed between several accelerators without ever going back to the system bus or processor.
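The effect of caching data next to the accelerator can be illustrated with a small model that counts bus accesses. It is a sketch under stated assumptions, not a description of what PICO Express generates: `bus_read`, the `WINDOW` size, and the sliding-window sum are hypothetical stand-ins for a real kernel and a real bus. The naive version fetches every sample in the window for every output, while the buffered version keeps recent samples in a local shift register and fetches each sample once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 4  /* hypothetical window length */

static size_t bus_reads;  /* counts modeled accesses over the system bus */

static int16_t bus_read(const int16_t *mem, size_t idx)
{
    bus_reads++;
    return mem[idx];
}

/* Naive: every output re-reads its whole window over the bus.
 * Writes n - WINDOW + 1 sums to out and returns the number of bus reads. */
size_t window_sum_naive(const int16_t *mem, size_t n, int32_t *out)
{
    bus_reads = 0;
    for (size_t i = 0; i + WINDOW <= n; i++) {
        int32_t acc = 0;
        for (size_t t = 0; t < WINDOW; t++) {
            acc += bus_read(mem, i + t);
        }
        out[i] = acc;
    }
    return bus_reads;
}

/* Buffered: a local shift register holds the last WINDOW samples, so each
 * sample crosses the bus once. Produces the same sums as the naive version. */
size_t window_sum_buffered(const int16_t *mem, size_t n, int32_t *out)
{
    int16_t local[WINDOW] = {0};

    bus_reads = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t t = WINDOW - 1; t > 0; t--) {
            local[t] = local[t - 1];       /* shift older samples along */
        }
        local[0] = bus_read(mem, i);       /* the only bus access per sample */

        if (i + 1 >= WINDOW) {
            int32_t acc = 0;
            for (size_t t = 0; t < WINDOW; t++) {
                acc += local[t];
            }
            out[i + 1 - WINDOW] = acc;
        }
    }
    return bus_reads;
}
```

For n input samples, the naive version performs (n - WINDOW + 1) * WINDOW bus reads and the buffered version performs n, so the saving grows with the window length.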

Gate counts for the resulting hardware accelerator depend greatly upon the function of the original algorithmic block. Gate-count values can range from 35 to 1000 kgates. Typically, a series of accelerators will be needed for more demanding designs, such as a CDMA modem.

PICO Express is available immediately. It is priced at $125,000 for a design-project license. The first customer silicon using PICO Express is expected this month.

Synfora, Inc.
2465 Latham St., Suite 100, Mountain View, CA 94040; (650) 314-0500, FAX: (650) 314-0501, www.synfora.com.

About the Author

John Blyler

John Blyler has more than 18 years of technical experience in systems engineering and program management. His systems engineering (hardware and software) background encompasses industrial (GenRad Corp, Wacker Siltronics, Westinghouse, Grumman and Rockwell Intern.), government R&D (DoD-China Lake) and university (Idaho State Univ, Portland State Univ, and Oregon State Univ) environments. John is currently the senior technology editor for Penton Media’s Wireless Systems Design (WSD) magazine. He is also the executive editor for the WSD Update e-Newsletter.

Mr. Blyler has co-authored an IEEE Press (1998) book on computer systems engineering titled "What's Size Got To Do With It: Understanding Computer Systems." Until just recently, he wrote a regular column for the IEEE I&M magazine. John continues to develop and teach web-based, graduate-level systems engineering courses on a part-time basis for Portland State University.

John holds a BS in Engineering Physics from Oregon State University (1982) and an MS in Electronic Engineering from California State University, Northridge (1991).
