Digital-signal-processing (DSP) technology has made tremendous strides thanks to two developments: advances in semiconductor processing that allow more memory and compute resources to be integrated on one chip, and architectural enhancements that let the processors do more during each clock cycle.
Although the basic Harvard-style architecture is still used by many 16-bit DSP cores and chips, most higher-performance solutions have moved to architectures that do more in parallel, using either very-long-instruction-word (VLIW) approaches or single-instruction/multiple-data (SIMD) schemes. At the very high end, some companies are starting to employ multiple-instruction/multiple-data (MIMD) architectures to achieve the highest degree of parallelism and thus maximum performance.
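To make the idea of doing more per instruction concrete, here is a minimal Python sketch of the multiply-accumulate (MAC) loop at the heart of most DSP algorithms, first as a scalar loop and then as a toy model of a SIMD unit. The function names and the four-lane width are illustrative assumptions, not details of any vendor's instruction set.

```python
def mac_scalar(coeffs, samples):
    """Scalar model: one multiply-accumulate (MAC) per step."""
    acc = 0
    steps = 0
    for c, x in zip(coeffs, samples):
        acc += c * x
        steps += 1
    return acc, steps


def mac_simd(coeffs, samples, lanes=4):
    """Toy SIMD model: each step applies one MAC to `lanes` data pairs at once.

    `lanes=4` is an assumed width chosen for illustration only.
    """
    partial = [0] * lanes  # one accumulator per lane
    steps = 0
    for start in range(0, len(coeffs), lanes):
        block = zip(coeffs[start:start + lanes], samples[start:start + lanes])
        for lane, (c, x) in enumerate(block):
            partial[lane] += c * x  # on real hardware these run in parallel
        steps += 1
    return sum(partial), steps


coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
samples = [8, 7, 6, 5, 4, 3, 2, 1]
print(mac_scalar(coeffs, samples))  # (120, 8)
print(mac_simd(coeffs, samples))    # (120, 2)
```

In this model, an eight-tap dot product takes eight scalar steps but only two SIMD steps. Real hardware gains depend on data alignment, memory bandwidth, and how well the loop maps onto the available units.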
Yet along with increased parallelism comes the challenge of programming and controlling all the resources. Thus, software tools and algorithm application libraries will play key roles in getting systems to market as fast as possible. The tools and libraries often will make or break the popularity and success of a particular DSP architecture.
Today, a basic MP3 music player requires a throughput of about 30 MIPS, while more advanced audio applications, such as the new Windows Media Audio Professional software, require closer to 100 MIPS to execute. Digital-camera image-processing requirements are also increasing as designers add more capabilities, pushing up DSP needs to several hundred MIPS. But to fit these applications, the DSP solution must be highly integrated and inexpensive.
The latest generation of DSP chips, including the Blackfin family from Analog Devices and the TMS320C6412 and 320F2801 families from Texas Instruments, has been designed to sell for as little as $5 each yet deliver throughputs of several hundred MIPS. Such chips, as well as those from Motorola and StarCore, are the new workhorses for many consumer, telematics, and industrial applications. They will deliver 100 to 300 MIPS, the new mainstream performance level for the consumer audio/video market.
In the cell-phone handset and basestation market, DSP cores and chips offered by Ceva Inc. (formerly Parthus-Ceva), StarCore (a partnership formed by Agere, Motorola, and most recently, Infineon), Texas Instruments, Motorola, and Analog Devices, as well as captive DSP solutions from companies such as STMicroelectronics, Intel, and others, handle the audio processing requirements with ease. However, next-generation cell phones are now incorporating features such as MP3 players, games with sound effects, polyphonic tones, and image/video capture and playback. These features place additional demands on the DSP subsystem, which will need much higher throughput to handle the more complex algorithms.
To tackle those increasing demands, Ceva recently unveiled its next-generation scalable DSP core architecture, which it expects to sample by mid-2004. Designed to be compiled to specifically match the system needs, the CEVA-X core combines both VLIW and SIMD architectural aspects. At 450 MHz with a dual-multiplier-accumulator (MAC) implementation, its performance exceeds the throughput of all dual-MAC DSP chips or cores and even most quad-MAC DSPs. Compared to the company's own Teak DSP core, the CEVA-X will deliver a 12-fold performance improvement, with a peak throughput of 11 billion instructions/s. Because handheld applications are one key application area for the CEVA-X core, it was also designed for low-power operation. It dissipates just 0.06 mW per MMAC. Included in the roadmap are 16-bit versions with two, four, and eight MACs, as well as a still higher-performance 32-bit implementation of the architecture targeted at home-entertainment applications like HDTV and multimedia streaming.
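As a rough sanity check on the power figure, the sketch below multiplies the quoted 0.06 mW per MMAC by a peak MAC rate. It assumes that each MAC unit completes one MAC per clock cycle and that the 0.06-mW figure applies at the 450-MHz operating point. The article states neither, so treat the result as an estimate only. The function names are hypothetical.

```python
def peak_mmacs(clock_mhz, mac_units):
    """Peak million MACs/s, assuming one MAC per unit per cycle (an assumption)."""
    return clock_mhz * mac_units


def mac_power_mw(mmacs, mw_per_mmac=0.06):
    """Estimated power in mW, using the quoted 0.06 mW per MMAC."""
    return mmacs * mw_per_mmac


rate = peak_mmacs(clock_mhz=450, mac_units=2)  # 900 MMACs for the dual-MAC core
print(f"{rate} MMACs, about {mac_power_mw(rate):.0f} mW")
```

Under these assumptions, the dual-MAC core running flat out would dissipate on the order of 54 mW. Sustained power in a real handset would depend on the workload, the memory system, and the process technology.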
Based on a super Harvard architecture with lots of on-chip memory, the Sharc family of DSP chips from Analog Devices can deliver 32/40-bit floating-point computational throughputs at up to 5 billion operations/s. Further, these chips can do it at a budget price, which starts at less than $10 in volume. Competing with the super Harvard approach are VLIW architectures like those used in the TMS320C6212 family. By dividing the algorithms across multiple functional units, the processors can achieve throughputs of 2000 MMACs when clocked at 500 MHz. Additionally, the high level of integration designers have achieved provides a system-on-a-chip class solution. Planned for this year is a clock-speed boost up to 1 GHz from the current maximum clock speed of 720 MHz for the TMS320C6414. This promises to up the maximum throughput to 4000 MMACs, and at a budget price. The flagship processor will cost about $189 in lots of 10,000.
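The VLIW throughput figures above follow from simple arithmetic. Here is a minimal sketch, assuming peak throughput scales linearly with clock speed. The MACs-per-cycle value is inferred from the quoted 2000 MMACs at 500 MHz rather than stated in the article, and the function names are hypothetical.

```python
def macs_per_cycle(mmacs, clock_mhz):
    """MACs completed per clock cycle, inferred from a quoted peak figure."""
    return mmacs / clock_mhz


def peak_mmacs(clock_mhz, per_cycle):
    """Peak million MACs/s at a given clock, assuming linear scaling."""
    return clock_mhz * per_cycle


per_cycle = macs_per_cycle(2000, 500)  # 4.0 MACs per cycle
for clock_mhz in (500, 720, 1000):
    print(clock_mhz, "MHz ->", peak_mmacs(clock_mhz, per_cycle), "MMACs")
```

The 1-GHz result reproduces the 4000 MMACs quoted above. The 720-MHz value of 2880 MMACs is derived from the same assumption and is not a vendor-quoted number.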
Highly parallel DSP architectures that implement SIMD and MIMD approaches are now available from close to a dozen companies. Some of the architectures are fixed arrays of computational units. Others consist of a software-configurable array of processors that can be configured to optimally map to the desired algorithm (for more about configurable DSPs, see "Harness Today's DSPs: Propel Tomorrow's Design," Electronic Design, Dec. 18, 2003, p. 47). These highly parallel processors can achieve throughputs of 10 to 25 gigaoperations/s, which lets them perform algorithms to implement software-defined radios, antenna beam forming, multiple HDTV datastream processing, and many other complex tasks.
- Samples of a 1-GHz DSP chip will be released by TI in Q2 of 2004. Based on 90-nm design rules, the processor will be a drop-in replacement for TI's previous best, a 720-MHz processor.
- Prototypes of a new DSP core that combines VLIW and SIMD architectural approaches will be released by CEVA Inc. The 16-bit version of the core will deliver 12 times the performance of the company's previous high-end processor core, the Teak DSP.
- Low-cost floating-point DSPs targeted at telematics, audio processing, and streaming media applications will be available from ADI. The Sharc DSP chips will operate at a 300-MHz internal core frequency and deliver a throughput of 5 billion operations/s.
- The first samples of the highly parallel and reconfigurable compute fabric developed by Motorola will be released during the first half of 2004. The array is based on a core compute element developed by Morpho Technology and licensed by Motorola.
- Software libraries and development tools will play an ever more important role as the DSP chip architectures get more complex. Due to the complexity of the new, highly parallel architectures, designers will require more software support to reduce the programming time and get systems to market as quickly as possible.
- Operating power levels for DSP cores will keep dropping as designers try to extend system battery life while adding more functionality. Next-generation cell phones, for instance, are adding cameras, multimedia players, and other features that require additional DSP MIPS, but battery size and weight will either remain constant or decrease. So, lower power consumption is essential.
- DSP cores are growing in popularity as designers move to a system-on-a-chip solution. Integration levels are increasing as engineers combine the DSP cores with standard RISC processor cores, large blocks of memory, and such system interfaces as Ethernet ports, PCI bus interfaces, and serial I/O ports.
- Control and DSP functions are merging into a single core along with flash-based program storage and a broad array of peripheral interface functions. Samples of next-generation controller/DSP chips, the TMS320F28xx family, will come from TI in Q4 of this year.
- Compute throughputs exceeding 20 GFLOPS will be achieved by some of the latest highly parallel array processors. These software-configurable processors will deliver unparalleled performance. But software tools will be key to achieving the performance by helping to optimize the architecture of the array to the algorithms.
- Look for FPGAs to also play a role as DSP accelerators or coprocessors. The ability to configure an FPGA into an array of multipliers or other functions will allow systems to rip through large data tables or perform other highly parallel operations. The configurable nature of the FPGAs also allows their functionality to change by simply loading a new bit stream.