Dual-Core DSP Serves Up 40-Bit Precise GFLOPS

April 12, 2004

Merging a VLIW DSP engine and a 32-bit RISC processor produces single-cycle FFTs and complex-domain computations.

DSPs must deliver higher throughputs to execute the more complex algorithms today's demanding systems require. In applications like high-end audio, imaging, and beamforming, they have to produce that throughput at higher levels of precision. The DSP engines also must be easy to program, so designers can quickly port algorithms developed on large systems while maintaining the floating-point precision of the large-computer software.

These challenging requirements, plus the need to provide some system control functions, led Atmel Corp. to create a novel system-on-a-chip. The company combined its recent mAgic very-long-instruction-word (VLIW) floating-point DSP with an ARM7 RISC microprocessor core, various cache and data memories (1.9 Mbits total), and peripheral support functions (serial ports, timers, etc.).

The result is the AT572D740 Diopsis, which delivers 1 GFLOPS of floating-point throughput and 1.5 Goperations/s when clocked at 100 MHz. It does so while consuming less than 1.2 W, with the core logic powered by a 1.8-V supply and the I/O pins running from 3.3 V.
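The per-cycle figures implied by these numbers follow from simple division. The short Python sketch below only restates that arithmetic using the values quoted above; the variable names are illustrative and do not come from Atmel documentation.

```python
# Figures quoted in the article for the AT572D740.
CLOCK_HZ = 100e6      # 100-MHz clock
FLOPS = 1e9           # 1 GFLOPS
TOTAL_OPS = 1.5e9     # 1.5 Goperations/s
MAX_POWER_W = 1.2     # quoted ceiling ("less than 1.2 W")

flops_per_cycle = FLOPS / CLOCK_HZ
ops_per_cycle = TOTAL_OPS / CLOCK_HZ

# A lower bound on efficiency, because 1.2 W is an upper bound on power.
min_mflops_per_watt = FLOPS / 1e6 / MAX_POWER_W

print(flops_per_cycle)             # floating-point operations per clock
print(ops_per_cycle)               # total operations per clock
print(round(min_mflops_per_watt))  # at least this many MFLOPS per watt
```

In other words, the quoted throughput corresponds to 10 floating-point operations and 15 total operations per clock cycle.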

The Diopsis processor handles modern signal-processing algorithms that make intensive use of complex-domain arithmetic. Such algorithms include those that use short-time Fourier transforms or complex wavelets. Examples include audio and speech processing, spectrum analysis/surveillance, and vibration analysis. The dual-core architecture lets designers partition tasks and map them to the cores to achieve the best performance and code density (see the figure).
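To make "complex-domain arithmetic" concrete, here is a minimal Python sketch of a radix-2 FFT butterfly, the inner operation of many FFT implementations, written with explicit real arithmetic so the operations can be counted. It is a generic textbook formulation, not Atmel code, and the function name `butterfly` is hypothetical. Written this way, one butterfly takes 10 real floating-point operations (4 multiplies and 6 additions or subtractions), the same count as the 10 floating-point operations per cycle implied by 1 GFLOPS at 100 MHz. How the hardware actually issues them is not described here.

```python
def butterfly(a, b, w):
    """Radix-2 decimation-in-time butterfly on complex inputs.

    Returns (a + w*b, a - w*b), computed using real arithmetic only.
    """
    ar, ai = a.real, a.imag
    br, bi = b.real, b.imag
    wr, wi = w.real, w.imag

    # Complex multiply t = w * b: 4 multiplies, 1 subtract, 1 add.
    tr = wr * br - wi * bi
    ti = wr * bi + wi * br

    # Complex add and complex subtract: 4 more additions/subtractions.
    top = complex(ar + tr, ai + ti)
    bottom = complex(ar - tr, ai - ti)
    return top, bottom


# A 2-point DFT of [1, 2] uses the twiddle factor w = 1.
top, bottom = butterfly(complex(1, 0), complex(2, 0), complex(1, 0))
print(top, bottom)
```

A full N-point radix-2 FFT applies this butterfly (N/2)·log2(N) times with varying twiddle factors `w`.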

The 40-bit precision of the DSP core's floating-point computational blocks permits direct algorithm-to-code conversion: floating-point algorithms typically used during development on large computers can be compiled directly for the mAgic DSP. Although the mAgic core uses a 128-bit very long instruction word, programming the core is simple thanks to an Atmel scheduling algorithm. The algorithm automatically analyzes the logical and temporal data dependencies, then schedules operations to optimize both resource usage and pipeline depth for maximum execution throughput.
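Atmel's scheduler itself is not described in detail here. As a rough illustration of the general idea (analyze dependencies, then pack independent operations into wide instruction words), the following Python sketch shows a generic greedy list scheduler. Everything in it is an assumption made for illustration: the `schedule` function, the operation names, the single `slots_per_word` limit, and the one-cycle latency of every operation.

```python
def schedule(deps, slots_per_word):
    """Greedy list scheduling of single-cycle operations.

    deps maps each operation name to the set of operations it depends on.
    Returns a list of instruction words, each a list of operation names
    that can issue together.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    done = set()
    words = []
    while remaining:
        # An operation is ready once all of its inputs have completed.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic or unresolved dependency")
        word = ready[:slots_per_word]  # fill at most one wide word
        words.append(word)
        done.update(word)
        for op in word:
            del remaining[op]
    return words


# Hypothetical dependency graph: four independent multiplies,
# two operations that combine them, then one final add.
deps = {
    "m1": set(), "m2": set(), "m3": set(), "m4": set(),
    "re": {"m1", "m2"},
    "im": {"m3", "m4"},
    "sum": {"re", "im"},
}
wide = schedule(deps, slots_per_word=4)
narrow = schedule(deps, slots_per_word=2)
for cycle, word in enumerate(wide):
    print(cycle, word)
```

With four issue slots the seven operations fit in three words; with two slots they need four. A production VLIW scheduler would also have to model distinct functional units, register pressure, and multi-cycle pipeline latencies, which this sketch deliberately ignores.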

Housed in a 352-contact plastic BGA, the industrial-temperature-grade AT572D740 costs $30 in lots of 1000. Samples are immediately available.

Atmel Corp. www.atmel.com/products