Awakening to a CD-player/clock radio. Chatting on a cell phone. Listening to an MP3 player. Watching a DVD-based movie on a widescreen TV. It's hard to go a day without coming in contact with at least two or three devices that contain some form of dedicated digital-signal-processing engine. Available DSP engines range from relatively simple 16-bit units, usually in the form of intellectual property (IP) blocks, to complex 32-bit and larger devices that employ very-long-instruction-word (VLIW) architectures.
The cell-phone market consumes the lion's share of all DSP blocks produced, but a majority of them are 16-bit engines. These are mostly blocks of IP co-integrated with the control processor and other functions.
However, as more cell phones offer multimedia-type functions (MP3 players, still-image capture, video conferencing, etc.), more complex DSP engines are being added to these phones. To that end, ParthusCeva (formerly known as DSP Group) unveiled its next-generation scalable processor core, called Cedar. This block of IP has an architecture that combines VLIW and a single-instruction/multiple-data approach to perform many more operations in parallel. Plus, designers can tailor the architecture in terms of both compute resources (the number of multiplier-accumulators) and data word width (16, 24, or 32 bits) as the system is being designed.
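To make the two tuning knobs concrete, here is a minimal Python model of a dot product running on a bank of multiplier-accumulators. It is only a sketch under stated assumptions: the names `saturate` and `mac_dot`, the saturating arithmetic, and the accumulator width of twice the data word are illustrative choices, not details of the Cedar core.

```python
def saturate(value, bits):
    """Clamp value to the range of a signed two's-complement word of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def mac_dot(samples, coeffs, word_bits=16, num_macs=2):
    """Dot product spread across `num_macs` hypothetical MAC units.

    Assumption: each unit accumulates every num_macs-th product in an
    accumulator twice as wide as the data word, with saturation.
    """
    acc_bits = 2 * word_bits
    accs = [0] * num_macs
    for i, (x, c) in enumerate(zip(samples, coeffs)):
        x = saturate(x, word_bits)      # inputs are limited to the data word width
        c = saturate(c, word_bits)
        lane = i % num_macs             # which MAC unit handles this product
        accs[lane] = saturate(accs[lane] + x * c, acc_bits)
    # Combine the partial sums held by the individual MAC units.
    return saturate(sum(accs), acc_bits)
```

In this model, a sample of 40000 is clipped when `word_bits=16` but passes through unchanged when `word_bits=24`, which is the kind of trade-off a designer would weigh when choosing the word width.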
Less powerful cell phones can go more conservatively with the dual-core approach of the OMAP5910 processor from Texas Instruments (TI). The chip combines the real-time processing capabilities of a TMS320C55x DSP core and the control capabilities of a TI-enhanced 32-bit ARM processor core. The company targets the chip at high-end 2.5G and 3G handsets, as well as PDAs that incorporate communications capabilities.
Integrating multiple engines on a DSP chip can overcome the clock speed limit of current fabrication processes. Companies have taken two paths to accomplish this task. In one approach, VLIW architectures are created to combine a half-dozen computational blocks that all operate in parallel. TI's TMS320C6000 series was one of the first to offer such an architecture. Here, the blocks are all controlled by an instruction word that's partitioned to enable simultaneous control of all blocks.
Two additional examples of this approach include the recently released BSP-15 media processor from Equator Technology and the Nexperia processor developed by Philips. The BSP-15, for example, delivers up to 40 GOPS of video processing power. It can replace the control processor, MPEG codecs, and other functions to implement a system like a set-top box or a personal TV recorder. The Philips Nexperia family offers several variations that can target different applications, including set-top boxes, DVD recorder/players, and HDTV systems.
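The idea of a partitioned instruction word can be illustrated with a toy model in Python. This is a sketch only: the bundle format, the register names, and the `execute_bundle` helper are hypothetical and do not correspond to the instruction set of any of the processors named above. Each slot of the bundle controls one compute block, and all slots see the register state from before the cycle.

```python
import operator

# Hypothetical operations that a compute block might support.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def execute_bundle(regs, bundle):
    """Execute one VLIW-style bundle in a single modeled cycle.

    Every slot reads the registers as they were before the cycle,
    and all results are written back together afterward.
    """
    results = []
    for slot in bundle:
        if slot is None:                # empty slot: that block idles this cycle
            continue
        op, dst, src1, src2 = slot
        results.append((dst, OPS[op](regs[src1], regs[src2])))
    new_regs = dict(regs)
    for dst, value in results:
        new_regs[dst] = value
    return new_regs


regs = {"r0": 2, "r1": 3, "r2": 0, "r3": 0}
# One wide instruction: slot 0 adds, slot 1 multiplies, slot 2 is unused.
bundle = [("add", "r2", "r0", "r1"), ("mul", "r3", "r0", "r1"), None]
regs = execute_bundle(regs, bundle)
```

Filling the slots so that no two operations in a bundle depend on each other is the scheduling work that falls to the programmer or the compiler.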
The other architectural approach taken by a few companies employs a highly parallel array that contains dozens to hundreds of small processors. This single-instruction/multiple-data architecture works well on large arrays of data, such as those found in image and video processing. One newcomer to this market is PACT, which has developed an array architecture called XPP (eXtreme processor platform). It can deliver scalable performance from a few to hundreds of GOPS. Other vendors developing array-processor-like architectures include Adelante, Improv Systems, Jazz Semiconductor, and 3DSP.
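As a rough illustration of why this style suits image data, the following Python sketch applies one operation to a row of pixels in lockstep groups. The function name `simd_apply`, the number of processing elements, and the brightness operation are assumptions made for the example; they are not taken from XPP or any other vendor's architecture.

```python
def simd_apply(op, data, num_pes=4):
    """Apply one operation across `data`, `num_pes` elements per step.

    Models an array of processing elements that all execute the same
    instruction on different data. Returns the results and the number
    of lockstep steps taken.
    """
    out = []
    steps = 0
    for start in range(0, len(data), num_pes):
        chunk = data[start:start + num_pes]   # one element per processing element
        out.extend(op(x) for x in chunk)      # same instruction, different data
        steps += 1
    return out, steps


def brighten(pixel):
    """Double an 8-bit pixel value, clipping at 255."""
    return min(255, pixel * 2)


pixels = [10, 100, 200, 50, 60]
result, steps = simd_apply(brighten, pixels, num_pes=4)
```

In this model the step count falls as processing elements are added, which mirrors the scalable performance that array-based designs aim for.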
One emerging trend is the development of reconfigurable architectures aimed at high-performance DSP applications. Among many others, these applications include wireless basestations and voice channels in telecommunications systems. QuickSilver Technologies, Elixent Ltd., Morpho Technologies, RadioScape, and a few other companies are working in this arena.
>STANDALONE 16-BIT DIGITAL SIGNAL PROCESSORS are beginning to fade. Their functions are becoming co-integrated onto ASICs to help reduce component count and lower power consumption in such applications as cell phones and pagers.
>HIGH-PERFORMANCE ARRAY-BASED PROCESSORS will provide scalable performance—from a few to a hundred GOPS—by offering platforms of dozens to hundreds of processors that employ a single-instruction/multiple-data architecture. Such processors are ideal for crunching large arrays of data such as those encountered in imaging and graphics.
>SOFTWARE-CONFIGURABLE DSP CHIPS will give designers the flexibility to develop families of end products. They will provide a highly integrated set of resources such as multiple multiplier-accumulators, dedicated blocks such as an MPEG-2 encoder, and still others that can be tuned via software to various tasks.
>LOWER-POWER DSP ENGINES with operating voltages of under 1 V and performance levels of 20 to 40 MIPS will be readily available for use in handheld portable systems such as next-generation MP3 players, portable DVD players, low-end cell phones, and many other applications.
>VLIW PROCESSORS will become easier to program because C-based programming tools can hide the complexity of coordinating operations of all programmable engines typically integrated on chip. Programming the engines has typically been the most complex challenge due to the many simultaneous operations possible with the multiple compute blocks.
>COMBINATION DSP/CONTROL-PROCESSOR ASICS will become mainstream for low-end cell phones and consumer appliances, such as MP3 and CD players. Processor performance for these applications will also increase from the 20 MIPS needed for standard audio and MP3 algorithms to about 40 MIPS for more complex algorithms like Windows Media Audio (WMA).
>DSP CORES will bring more scalable performance options via software tools. This will allow designers to change data-word width, the number of computational units (multiplier-accumulators), and other aspects of the architecture to optimize it to the system application.
>SOFTWARE TOOLS WILL IMPROVE in efficiency and minimize the overhead penalties suffered when writing DSP algorithms in a high-level language such as C. Part of this improvement will come from better compilers that can more efficiently convert the high-level commands into the native instruction set. Additional improvements will come from the chips themselves, which will have higher performance levels that can handle the overhead and architectures better tuned for the high-level instructions such as those found in C.
>NOVEL ARCHITECTURES will exploit the ability to integrate, on a single chip, a large number of configurable computing elements interconnected by a programmable fabric. Such architectures will provide the maximum flexibility to handle complex algorithms that new applications will require the chips to execute.
>FLEXIBLE DSP SUPPORT will be provided by new generations of FPGAs. These FPGAs will be able to implement multipliers, shifters, and other DSP compute elements used to accelerate DSP functions, or serve as a coprocessor to tackle a speed-critical application.