Applications that don't leverage some form of digital signal processing are as scarce as the proverbial hen's teeth. DSP capability surfaces in standalone DSP chips, one or multiple DSP cores integrated into an ASIC, general-purpose CPUs enhanced to run DSP algorithms, and even FPGAs configured to execute key algorithms.
As DSP-related algorithms and tasks gain in complexity, DSP engines must clock faster, do more during each clock cycle, or manage both feats. Increasing the clock speed is the simplest approach, which is why we now see 1-GHz clock speeds. But higher speeds usually mean higher power consumption.
However, over the last decade, power consumption per unit of performance has dropped by more than an order of magnitude, from about 2 mW/MIPS to 0.1 mW/MIPS. Over the next few years, look for that to fall another tenfold to 0.01 mW/MIPS (see the figure). This bodes well for portables, which will operate longer or handle heavier task loads without increasing the battery drain.
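To put those efficiency figures in perspective, the arithmetic can be sketched in a few lines of Python. Only the mW/MIPS values come from the text; the 500-MIPS workload and the names `power_mw` and `WORKLOAD_MIPS` are hypothetical, chosen purely for illustration.

```python
# Power-efficiency figures quoted in the text, in mW per MIPS.
MW_PER_MIPS = {"decade ago": 2.0, "today": 0.1, "projected": 0.01}


def power_mw(workload_mips, mw_per_mips):
    """Power (in mW) drawn by a DSP running a workload at a given efficiency."""
    return workload_mips * mw_per_mips


# Hypothetical workload, for illustration only (not a figure from the article).
WORKLOAD_MIPS = 500

for era, efficiency in MW_PER_MIPS.items():
    print(f"{era:>10}: {power_mw(WORKLOAD_MIPS, efficiency):8.1f} mW")

# Ratio between the two historical figures quoted in the text.
improvement = MW_PER_MIPS["decade ago"] / MW_PER_MIPS["today"]
print(f"Improvement over the decade: about {improvement:.0f}x")
```

Under these assumptions, the same workload that once drew on the order of a watt now fits in tens of milliwatts, which is the margin portable designs can spend on longer battery life or heavier processing.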
To cover the range of performance that such a variety of applications demands, DSP vendors have crafted chips and cores at many performance levels. Other vendor developments include more highly integrated solutions that run the DSP algorithms as well as perform all system control tasks. This eliminates the separate controller chip, reducing system chip count.
One notable trend involves delivering off-the-shelf, system-on-a-chip solutions targeted at specific market segments. Such is the case with Texas Instruments' just-released DaVinci DSP family. The on-chip resources of its first two members, the TMS320DM6443 and TMS320DM6446, support video playback only and video capture plus playback, respectively. Future family members will be optimized for other application areas. All of the chips are based around TI's C64x+ DSP core, which can run at up to 600 MHz. The first two devices also include an ARM926EJ-S 32-bit processor running at 300 MHz for control tasks.
But the integration doesn't stop there. Dedicated digital video-processing blocks handle video encoding (6446 only) and playback of H.264 video streams. These blocks offload the main DSP engine and lower the speed that would otherwise be necessary to perform all of the calculations.
The DaVinci family is just one of several lines in TI's C6000 high-performance series. The C5000 series of power-efficient DSPs targets communications and audio products. The C2000 series comprises control-oriented DSP engines.
Analog Devices and Freescale Semiconductor are perhaps TI's largest rivals. In the floating-point arena, ADI's TigerSHARC family is essentially TI's only competition. In audio and control DSPs, though, both ADI and Freescale compete with TI, through ADI's Blackfin devices and Freescale's 56xxx series. Freescale also offers highly integrated DSP chips based on the 16-bit StarCore architecture.
The StarCore Alliance (StarCore LLC), formed by Freescale, Agere, and Infineon, licenses the StarCore DSP cores. The latest implementation, the StarCore V5 architecture definition, arrived late last year. Samples of the core are expected early this year. Some of its 47 new instructions enhance single-instruction/multiple-data (SIMD) processing and help deliver best-in-class multimedia and Viterbi performance.
With the availability of DSP cores, designers can craft their own ASIC solutions. In addition to the StarCore offerings, STMicroelectronics and CEVA DSP offer various 16-bit cores, some with throughputs of up to 600 MMACs. CEVA also has developed several application-optimized cores—one fits audio applications, while another takes aim at Voice over Internet Protocol (VoIP) systems. Configurable CPU cores, such as those offered by ARC and Tensilica, can be extended with the addition of multiple MACs to deliver top-notch DSP performance.
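Because MAC throughput is the yardstick used here, it may help to recall what a multiply-accumulate does. The following is a minimal sketch of a direct-form FIR filter, a typical MAC-bound DSP kernel, in plain Python. The names `fir_filter` and `macs_per_second` and the sample values are illustrative assumptions, not part of any vendor's library.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: each output costs one MAC per tap."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:  # skip taps that reach before the first sample
                acc += coeff * samples[n - k]  # the multiply-accumulate step
        out.append(acc)
    return out


def macs_per_second(sample_rate_hz, num_taps):
    """MAC rate needed to run one FIR channel in real time."""
    return sample_rate_hz * num_taps


# Illustrative use: a two-tap moving sum over a short ramp.
print(fir_filter([1, 2, 3], [1, 1]))
# Hypothetical channel: 64 taps at a 48-kHz sample rate.
print(macs_per_second(48_000, 64))
```

A hypothetical 64-tap filter at 48 kHz needs roughly 3 MMACs per channel, so MAC units working in parallel translate fairly directly into more taps or more channels per core, before memory bandwidth and control overhead are taken into account.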
Yet another class of DSP solutions focuses on the world of multimedia: chips dedicated to MPEG-2, MPEG-4, H.264, and other video data formats. Many companies offer ultra-low-power devices for mobile phones and similar battery-powered handheld systems. ARM's recent Cortex-A8 core also targets multimedia systems and delivers 2000 MIPS of compute throughput.
Meanwhile, some manufacturers push performance to the limit via highly parallel architectures. Cradle Technologies' CT3616 uses 16 DSP cores and eight general-purpose processors on one chip. Through software, it can be configured to encode 16 real-time MPEG-4 channels at SIF resolution and 16 G.711 voice channels. It delivers an aggregate compute throughput of 24 GMACs when clocked at 375 MHz. In effect, it's a 16-channel digital video recorder.
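The aggregate figure is easier to appreciate when broken down per clock cycle. The short calculation below is only a sketch: it assumes that all of the quoted MAC throughput comes from the 16 DSP cores, which the text does not state, and the variable names are illustrative.

```python
# Figures quoted in the text for the CT3616.
AGGREGATE_MACS_PER_S = 24e9   # 24 GMACs
CLOCK_HZ = 375e6              # 375 MHz
DSP_CORES = 16

# MACs the whole chip must complete on every clock cycle.
macs_per_cycle = AGGREGATE_MACS_PER_S / CLOCK_HZ

# Assumption: the work is spread evenly over the DSP cores only.
macs_per_cycle_per_core = macs_per_cycle / DSP_CORES

print(f"MACs per cycle (chip): {macs_per_cycle:.0f}")
print(f"MACs per cycle (per DSP core): {macs_per_cycle_per_core:.0f}")
```

Under that assumption, each core would sustain several MACs per cycle, which illustrates how a modest clock rate combined with wide parallelism can reach throughput that a single fast core cannot.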