When DSP Is Not Enough, And Time Is Too Short For Gates

High-performance, feature-rich signal-processing functions are becoming more common in devices across all market segments as companies seek to further differentiate their products. Convergence within consumer products has seen digital cameras integrated into cell phones, and PDAs combined with digital media players. Manufacturers are under pressure to accommodate multiple data formats and communication standards within a single product, and to anticipate the needs of future standards in the design platform. The spread of wireless connectivity is adding to the convergence trend.

These pressures have multiplied the required processing in many devices by several orders of magnitude. The transition from speech to data processing in mobile Internet devices and the pressure to achieve increased spectrum efficiency and network coverage have led to the development of new algorithms. These processing algorithms, in turn, require more DSP performance to cope. Many analysts believe that this demand is actually outpacing Moore's law, resulting in a widening performance gap.

This gap cannot be closed through current design practice. The universal requirements for low power and minimum cost suggest that traditional signal-processing solutions will not scale to meet the processing demands of new applications. One possible solution is configurable signal-processing intellectual property (IP).

The traditional general-purpose DSP approach often forces the system designer to fit an algorithm to the capability of the target DSP. This makes it very difficult to execute a high-performance algorithm without compromising some aspect of its performance. As a result, there is a real danger of arriving at a sub-optimal program running on a sub-optimal architecture.

To get around this problem, general-purpose DSPs now include a number of special instructions that provide algorithm-specific optimizations. However, these instructions are at best representative: they accelerate only the key algorithms in high-volume applications. With GSM, for example, the Viterbi decoding algorithm was identified as a key bottleneck. In later DSPs, the Viterbi algorithm was accommodated by a single "butterfly" function built into the DSP instruction set. But with 3G processing, one of the critical algorithms, and the new bottleneck, is turbo decoding. Currently, no general-purpose DSP offers a turbo decoding solution that satisfies the stringent power budget of a 3G handset.
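To make the "butterfly" concrete, the following is a minimal sketch of the add-compare-select step at the heart of Viterbi decoding, written in Python for readability. It is not the instruction of any particular DSP. The function name, the argument names, and the convention that the lower path metric survives are assumptions chosen for illustration.

```python
def acs_butterfly(pm_even, pm_odd, bm_a, bm_b):
    """One Viterbi add-compare-select (ACS) butterfly (illustrative sketch).

    pm_even, pm_odd: path metrics of the two predecessor states.
    bm_a: branch metric for the even->low and odd->high transitions.
    bm_b: branch metric for the even->high and odd->low transitions.
    Assumption: metrics are distances, so the smaller candidate survives.

    Returns (pm_low, dec_low, pm_high, dec_high), where each decision bit
    is 0 if the even predecessor survived and 1 if the odd one did.
    """
    # Add: extend both predecessor paths into each successor state.
    low_from_even = pm_even + bm_a
    low_from_odd = pm_odd + bm_b
    high_from_even = pm_even + bm_b
    high_from_odd = pm_odd + bm_a

    # Compare and select: keep the cheaper path; ties go to the even state.
    if low_from_even <= low_from_odd:
        pm_low, dec_low = low_from_even, 0
    else:
        pm_low, dec_low = low_from_odd, 1

    if high_from_even <= high_from_odd:
        pm_high, dec_high = high_from_even, 0
    else:
        pm_high, dec_high = high_from_odd, 1

    return pm_low, dec_low, pm_high, dec_high


if __name__ == "__main__":
    print(acs_butterfly(3, 5, 1, 2))
```

A decoder repeats this step for every pair of states at every received symbol, which is why collapsing the four additions, two comparisons, and two selections into one instruction pays off for this algorithm, and why that same instruction does little for an algorithm with a different structure.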

Many chip designers have turned to dedicated logic to meet data throughput requirements at acceptable power and area. But dedicated logic is time-consuming and difficult to design for complex algorithms, and it sacrifices both flexibility and time-to-market.

The other option is configurable signal-processing IP, which offers a best-of-both-worlds solution: the flexibility of a DSP, with power efficiency and performance near those of dedicated logic. Dubbed a data engine, this IP can be tuned to a set of algorithms for video, audio, 3G, and other applications, providing a flexible design that helps close the computational and performance gap.

Configurable signal-processing IP enables the system designer to create an optimal solution based on the algorithm's requirements, not the fixed capability of a pre-defined architecture. Data engines provide far more design freedom, better hardware implementations, and more efficient compiled code. They are highly configurable and can be designed to exploit the full parallelism available within the target algorithm. Because the data engine remains reprogrammable at the software level, designers can freeze its architecture while continuing to tune the algorithm through C-code changes. This also allows multiple algorithms with similar requirements to run on the same data engine hardware. For example, different variations of the MPEG algorithm can be accommodated on the same data engine hardware using different code.

Implemented well, this new approach can deliver an optimal combination of low power, high performance, and area efficiency without sacrificing time-to-market or design flexibility.