Programmable Media Processors Deliver Flexible Solution

May 12, 2003
Resource-rich multimedia engines handle MPEG-2, MPEG-4, and other video-processing tasks for entertainment and handheld systems.

Low-cost dedicated and programmable video engines deliver the performance and flexibility needed to handle the plethora of standards used to encode and decode video across a growing range of consumer and business applications. Most high-end general-purpose CPUs, such as the Intel Pentium 4 or the Sun UltraSparc, can handle media encoding or decoding. But they're too expensive and power-hungry for consumer and portable systems such as set-top boxes, DVD/personal video recorders, Internet appliances, and cell phones. Low cost is paramount when designing many of these applications.

That alone rules out CPUs initially targeted at desktop computers, because the total semiconductor bill of materials must typically be kept to less than about $70. Often, joining an inexpensive CPU that handles the user interface with either an application-specific IC or a programmable high-speed DSP will deliver the performance needed to handle one or more video streams. Such a combination, along with the necessary memories and other support circuits, can keep the cost within the desired range.

Various video algorithms require differing amounts of computational throughput. Most have an inverse relationship between the bit rate and the amount of horsepower required to process the data. Typically, the lower the bit rate, the more processing power it takes to either encode or decode the image and maintain the desired image quality.

Thus, a typical MPEG-2 decode algorithm might demand about 300 MIPS from a DSP engine, while an MPEG-4 decode function takes slightly more because its algorithms are a bit more complex. Similarly, processing requirements for algorithms such as Microsoft's Windows Media Video could hit about 500 MIPS, because that format uses a more complex compression/decompression algorithm.
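
As a rough budgeting aid, estimates like these can be turned into a headroom check. The Python sketch below is illustrative only. The 300- and 500-MIPS loads are the approximate figures quoted above, while the function name, the `usable_fraction` derating factor, and its 0.5 default are assumptions made for the example. MIPS ratings are also not strictly comparable from one architecture to another.

```python
# Approximate decode loads quoted in the text, in MIPS.
DECODE_LOAD_MIPS = {
    "MPEG-2": 300,
    "Windows Media Video": 500,
}


def streams_supported(peak_mips, codec, usable_fraction=0.5):
    """Return how many streams of `codec` fit in a derated MIPS budget.

    usable_fraction is a hypothetical derating factor: peak ratings are
    rarely sustained once memory stalls and control code are included.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    budget = peak_mips * usable_fraction
    return int(budget // DECODE_LOAD_MIPS[codec])


if __name__ == "__main__":
    # Example budget: a device rated at 1333 peak MIPS.
    for codec in DECODE_LOAD_MIPS:
        print(codec, streams_supported(1333, codec))
```

A real design would replace the derating factor with profiled cycle counts for the specific codec implementation and memory system.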

One way to tackle the problem is to program these algorithms to execute on a generic DSP chip, such as the Blackfin or TigerSharc chips from Analog Devices or low-cost versions of the TMS320C5500 or C62/64/6700 families from Texas Instruments (TI). These chips pack an array of compute resources, including multiple ALUs and multipliers. They also shoehorn in system resources like multichannel DMA controllers and significant amounts of on-chip cache memory. (For more information about Analog Devices' Blackfin family, see "Cost-Savvy DSP Chip Trio Keeps Performance High," Electronic Design, March 31, 2003, page 42.)

TI offers a wide variety of VLIW-based (very-long-instruction-word) DSP chips that range from under $10 to over $500 in 1000-unit lots. Low cost and high throughput are both key design parameters, so let's look at the types of resources available on the cheapest members: the fixed-point TMS320C6204 and 620, as well as the fixed- and floating-point TMS320C6211B and 6711B.

Using its VelociTI VLIW-based C6000 series CPU to control twin datapaths, the TMS320C6211/12 or 6711/12 can execute up to eight 32-bit instructions per cycle (Fig. 1). The 6211/12 and 6411 families handle fixed-point calculations, while the 6711/12 supports both fixed- and floating-point computations.

Each datapath includes two ALUs (one floating point and one fixed point) as well as other blocks that perform data addressing and other functions. When clocked at its top speed of 200 MHz, the 6711/12 processors can deliver a peak throughput of 1200 MFLOPS, while the 6211 offers a peak throughput of 1333 MIPS when clocked at 167 MHz. The latest addition to the family, the TMS320C6411, ups the clock rate to 300 MHz and can deliver a throughput of 2400 MIPS. But it sells for more than double the price of the 6211B. To achieve the higher throughput, the 6411 incorporates an extension to the VelociTI VLIW architecture that allows each ALU to support single 32-bit, dual 16-bit, or quad 8-bit arithmetic operations during each clock cycle. That dramatically increases the number of computations that can be done, especially on pixel-type data sets.
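
The peak figures quoted here follow from simple arithmetic: clock rate multiplied by the number of operations issued per cycle. The short Python sketch below reproduces them. The function names are illustrative, the 167-MHz clock is taken as a nominal 500/3 MHz, and the six floating-point operations per cycle for the 6711 is an inference from 1200 MFLOPS at 200 MHz rather than a quoted specification.

```python
def peak_rate_millions(clock_mhz, ops_per_cycle):
    """Peak rate in millions of operations per second (MIPS or MFLOPS)."""
    return clock_mhz * ops_per_cycle


def implied_ops_per_cycle(rate_millions, clock_mhz):
    """Operations per cycle implied by a quoted peak rate and clock."""
    return rate_millions / clock_mhz


if __name__ == "__main__":
    # Eight instructions per cycle is the VLIW issue width given in the text.
    print(peak_rate_millions(500 / 3, 8))  # 6211 at a nominal 166.67 MHz
    print(peak_rate_millions(300, 8))      # 6411 at 300 MHz
    # Assumption: the 6711 figure counts six floating-point operations
    # per cycle, inferred here from 1200 MFLOPS / 200 MHz.
    print(implied_ops_per_cycle(1200, 200))
```

These are ceilings that assume every issue slot is filled on every cycle. Sustained throughput on real video kernels depends on how well the compiler or programmer can schedule the code.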

Taking aim at the high end of the DSP space, TI has also just released 720-MHz versions of its TMS320C6416, 15, and 14 DSP chips—a 20% speed improvement versus the company's best previous devices. The most highly integrated chip, the 6416, includes on-chip Viterbi and Turbo-code coprocessors to improve its ability to handle more channels in 3G wireless basestations or implement adaptive antenna array processing while providing eight time slots for GSM/GPRS/EDGE modems. All three DSP chips include 1 Mbyte of on-chip high-speed memory and high-speed peripherals that accelerate applications and the processing of real-time data.

VLIW architectures are also employed by two other more dedicated chips, both aimed at media processing. Now in its second generation, the BSP-15 media processor from Equator delivers an equivalent throughput of over 10 Goperations/s (Fig. 2). It achieves such throughput by combining a VLIW controller that packs four integer ALUs, two 64-bit single-instruction/multiple-data (SIMD) ALUs, and two 128-bit SIMD ALUs, along with dedicated coprocessor blocks that perform variable length encoding/decoding, video filtering, audio processing, and so forth.
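
To show what a SIMD ALU does with pixel data, the Python sketch below models one 64-bit word as eight independent 8-bit lanes and adds two such words lane by lane. It is a behavioral model only, not the BSP-15's instruction set. The lane width, the saturating (clamp-at-255) behavior, and all function names are assumptions chosen because they are typical for pixel arithmetic. Hardware would perform the whole operation in a single instruction.

```python
LANE_BITS = 8
LANES = 8  # eight 8-bit lanes in one 64-bit word
LANE_MAX = (1 << LANE_BITS) - 1


def unpack_lanes(word):
    """Split a 64-bit integer into eight 8-bit lane values (lane 0 = LSBs)."""
    return [(word >> (LANE_BITS * i)) & LANE_MAX for i in range(LANES)]


def pack_lanes(lanes):
    """Pack eight 8-bit lane values into one 64-bit integer."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & LANE_MAX) << (LANE_BITS * i)
    return word


def packed_add_saturate(a, b):
    """Add two packed words lane by lane, clamping each lane at 255."""
    sums = [min(x + y, LANE_MAX)
            for x, y in zip(unpack_lanes(a), unpack_lanes(b))]
    return pack_lanes(sums)


if __name__ == "__main__":
    pixels = pack_lanes([250, 10, 0, 255, 1, 2, 3, 4])
    offset = pack_lanes([10, 20, 0, 1, 1, 1, 1, 1])
    print(unpack_lanes(packed_add_saturate(pixels, offset)))
```

Saturation keeps a bright pixel from wrapping around to black when an offset is added, which is why media instruction sets commonly provide it alongside ordinary modular arithmetic.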

ONE CHIP FORMS THE SYSTEM

Nearly a full system-on-a-chip, the BSP-15 also includes audio and video I/O ports, serial interfaces, a memory controller, and all memory needed to support the coprocessors. Via software, the chip can serve as the main processor in a high-end set-top box, HDTV controller, or other system that must handle multiple MPEG data streams (a videoconferencing system, for instance).

The Nexperia processor family from Philips also is based on a VLIW engine developed by the company several years ago. The TriMedia engine, surrounded by a mix of coprocessors and peripheral I/O functions, enabled Philips to develop several variations of the Nexperia chip, each with hardware resources targeted at specific applications like HDTV, set-top cable boxes, and DVD/personal video recorders.

Already available are the PNX8320 and PNX8500 processors. The PNX8320 targets applications that use 2D graphics and a single stream of video data. The higher-performance PNX8500, which can handle dual video streams, includes a 3D graphics engine.

Ratcheting performance up another notch, Philips developed two families of media processor chips, the PNX1300 and PNX1500. Targeted at media gateways, the PNX1500 includes a 10/100-Mbit/s Ethernet port, as well as many coprocessors and support functions that suit it to applications like a set-top box or personal video recorder (Fig. 3). To support video applications, the chip incorporates a video scaler and de-interlacer, a 2D drawing engine, a variable-length decoder, a DVD descrambler, an LCD controller, and other features. The PNX1500's revamped TriMedia VLIW engine (the TM3260) enhances its ability to process multimedia signals.

Another device homing in on network media processing is LSI Logic's recently released processor, a full system solution on a chip with an IEEE 1394 (FireWire) network interface. The DoMiNo network media processor contains an audio/video codec subsystem, host interface, graphics engine, and I/O subsystems to control both a hard-disk drive and a DVD drive (Fig. 4).

Dual 150-MIPS RISC processors provide the horsepower to handle DoMiNo's graphics and host control functions. These processors run a real-time operating system to handle all housekeeping functions and the C code to implement the signal-processing algorithms. The built-in graphics processor handles 24-bit RGB and 8-bit alpha blending and supports up to four graphic planes to provide on-screen display, backgrounds, and other capabilities. A flicker filter reduces the flicker on interlaced TV screens for better viewing of Web sites visited with a TV-screen-based browser.
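
Alpha blending of a graphics plane over video can be sketched in a few lines. The Python example below uses the conventional "source over" formula on 24-bit RGB pixels with an 8-bit alpha value. The text does not say how DoMiNo implements blending, so the formula, the rounding step, and the function names are assumptions for illustration.

```python
def blend_channel(src, dst, alpha):
    """Blend one 8-bit channel: alpha=255 is fully src, alpha=0 is fully dst.

    The +127 term rounds to the nearest integer instead of truncating.
    """
    return (src * alpha + dst * (255 - alpha) + 127) // 255


def blend_pixel(src_rgb, dst_rgb, alpha):
    """Blend a graphics pixel over a background pixel, channel by channel."""
    return tuple(blend_channel(s, d, alpha) for s, d in zip(src_rgb, dst_rgb))


if __name__ == "__main__":
    overlay = (255, 0, 0)     # red on-screen-display pixel
    background = (0, 0, 255)  # blue video pixel
    print(blend_pixel(overlay, background, 128))
```

With several graphic planes, the same operation would be applied plane by plane from back to front, each plane using its own alpha value.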

The chip can perform both MPEG-2 (HDTV, SDTV) and MPEG-4 decoding. Motion-compensated de-interlacing algorithms enable format upconversion of SD for large-screen displays. Built-in A/V networking, thanks to the IEEE 1394 link interface, makes it easy to add networking to consumer devices such as digital video camcorders.

DSP, CPU CORES WORK TOGETHER

By combining a high-performance RISC CPU with DSP support, chips like SuperH's SH5-100, the Nomadik processor from STMicroelectronics, and TI's OMAP1510 can take on streaming media applications at power levels low enough to fit handheld applications. The SH5, when clocked at 400 MHz, delivers 700 MIPS and up to 2.8 GFLOPS throughput. Despite that throughput, the core consumes less than 400 mW when running at 400 MHz from a 1.2-V supply. In addition to packing a 64-bit CPU, the chip contains a 64-bit multimedia processing unit (a SIMD engine) and a 64- by 64-bit integer/multimedia register set.
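
The SH5 figures above can be reduced to per-cycle and per-milliwatt terms, which makes comparisons among handheld-class parts easier. The Python sketch below does that arithmetic. The function names are illustrative, and because the text gives power only as "less than 400 mW," the efficiency result is a lower bound.

```python
def per_cycle(rate_per_second, clock_hz):
    """Average operations completed per clock cycle."""
    return rate_per_second / clock_hz


def mips_per_milliwatt(mips, power_mw):
    """Throughput efficiency in MIPS per milliwatt."""
    return mips / power_mw


if __name__ == "__main__":
    clock_hz = 400e6
    print(per_cycle(700e6, clock_hz))    # instructions per cycle
    print(per_cycle(2.8e9, clock_hz))    # floating-point operations per cycle
    # 400 mW is the quoted upper limit, so this is a worst-case efficiency.
    print(mips_per_milliwatt(700, 400))
```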

The Nomadik media processor uses an ARM 926EJ processor core for its main CPU. It is Java-enabled thanks to the Jazelle Java extensions included in the core. Moreover, the chip is surrounded by audio and video acceleration engines, as well as a wide selection of peripheral interfaces—serial, parallel, storage cards, cameras, and USB on-the-go, among others (Fig. 5). The ARM core can run at a top clock speed of 350 MHz when implemented in STMicro's 130-nm CMOS process. Supporting the core is a memory management unit, 32 kbytes of instruction cache, 16 kbytes of data cache, a 16- by 32-bit multiplier, and strong real-time debug support.

Also based on an ARM 9-family core, TI's OMAP1510 processor combines the ARM core and a TMS320C55x DSP core. With a top clock speed of 200 MHz, the DSP section can tackle many of the media-processing requirements of cell phones and other handheld systems. The on-chip ARM core runs slightly slower, with a typical top clock speed of 175 MHz. Both the DSP core and the ARM 9 core have associated data and instruction caches.

Looking to cell-phone, PDA, and other very low-power applications, Emblaze Semiconductor devised the ER4521, which enables multimedia streaming in mobile and handheld applications. Similarly, NeoMagic designers have defined the MiMagic 3, a highly integrated, low-power multimedia processor. Another solution for MPEG-4 applications in handheld systems comes via Toshiba, which detailed a second-generation solution at last month's International Solid-State Circuits Conference in San Francisco.

MPEG-4 APPLICATIONS NOW HAVE OPTIONS

The Emblaze chip supports MPEG-4, H.263, JPEG, MP3, AAC, and GSM-AMR compression algorithms. On-chip hardware macros enhance and accelerate the critical computations needed to handle the video and audio processing requirements. The company also offers the ER4520, an A/V codec for streaming, messaging, and conferencing applications. The chip delivers MPEG-4 with QCIF resolution. Actually a superset of the ER4520, the ER4521 adds an LCD controller and VGA capture capabilities. The end of the second quarter should mark the arrival of the ER4525, an application processor that includes even better graphics capabilities, a four-channel DMA controller, an SDRAM interface, and three universal serial ports.
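
To put QCIF in perspective, the Python sketch below works out the size of the job: QCIF is 176 by 144 pixels, which a block-based codec divides into 16-by-16 macroblocks. The 15-frame/s rate and the 4:2:0 sampling (12 bits per pixel on average) are assumptions for illustration, not figures quoted for the Emblaze parts.

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144
MACROBLOCK_SIZE = 16


def macroblocks_per_frame(width, height, mb_size=MACROBLOCK_SIZE):
    """Number of mb_size x mb_size macroblocks needed to cover a frame."""
    across = -(-width // mb_size)   # ceiling division
    down = -(-height // mb_size)
    return across * down


def raw_bits_per_second(width, height, fps, bits_per_pixel=12):
    """Uncompressed video rate; 12 bits/pixel corresponds to 4:2:0 sampling."""
    return width * height * bits_per_pixel * fps


if __name__ == "__main__":
    fps = 15  # assumed frame rate for a handheld stream
    print(macroblocks_per_frame(QCIF_WIDTH, QCIF_HEIGHT))
    print(raw_bits_per_second(QCIF_WIDTH, QCIF_HEIGHT, fps))
```

Comparing the raw rate with the bandwidth of a mobile link shows how much compression the codec has to deliver, and the macroblock count multiplied by the frame rate gives the number of blocks the hardware macros must process each second.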

Working on a second-generation MPEG-4 decoder/encoder, Toshiba designers just unveiled the enhanced device. It consumes a mere 160 mW when active and only 80 nA on standby. The chip packs four 16-bit RISC processors, 16 Mbits of embedded DRAM, and a 5-Goperation/s adaptive filter engine.

Unveiled late last year, the MiMagic 3 streaming media processor from NeoMagic aims at PDAs and other low-power portable systems. The chip combines an ARM 720T RISC processor core with 8 kbytes of cache that operates at up to 110 MHz. The applications processor achieves its high performance and low power through architectural innovations, including two independent memory buses: a static bus for flash memory and a separate synchronous bus for DRAM system memory. These two memory buses provide separate interfaces for simultaneous access to program and data storage, avoiding bus-contention issues. When powered by a 1.8-V supply, the chip consumes about 100 mW. On standby, the power can drop to 0.6 mW.
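
Active and standby figures like these are usually combined into an average power for a given usage pattern. The Python sketch below does so for the MiMagic 3 numbers quoted above (about 100 mW active at 1.8 V, 0.6 mW on standby). The 10% active duty cycle and the function names are hypothetical, chosen only to illustrate the calculation.

```python
def average_power_mw(active_mw, standby_mw, active_fraction):
    """Time-weighted average power for a given active duty cycle."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_mw + (1.0 - active_fraction) * standby_mw


def supply_current_ma(power_mw, supply_v):
    """Supply current in mA drawn at a given power and supply voltage."""
    return power_mw / supply_v


if __name__ == "__main__":
    # Hypothetical usage pattern: processor active 10% of the time.
    avg = average_power_mw(active_mw=100.0, standby_mw=0.6, active_fraction=0.10)
    print(round(avg, 2), "mW average")
    print(round(supply_current_ma(100.0, 1.8), 1), "mA while active")
```

For devices that spend most of their time idle, the standby figure contributes a meaningful share of the average, which is why vendors quote it alongside active power.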

The abundant choices for handling multimedia video and mixed audio/video applications let designers select devices that best fit their application. Plus, with the availability of many media-processing functions as blocks of intellectual property, designers can also "roll their own" media processor if the features of off-the-shelf devices don't meet their needs.

Need More Information?
Analog Devices Inc.
www.analog.com

Emblaze Semiconductor Ltd.
www.emblazesemiconductor.com

Equator Technologies Inc.
www.equator.com

LSI Logic Corp.
www.lsilogic.com

NeoMagic Corp.
www.neomagic.com

Philips Semiconductors
www.semiconductor.philips.com

STMicroelectronics
www.st.com

Super-H Inc.
www.super-h.com

Texas Instruments Inc.
www.ti.com

Toshiba Corp.
www.toshiba.com/taec

About the Author

Dave Bursky | Technologist

Dave Bursky is the founder of New Ideas in Communications, a publication website featuring the blog column Chipnastics – the Art and Science of Chip Design. He is also president of PRN Engineering, a technical writing and market consulting company. Prior to these organizations, he spent about a dozen years as a contributing editor to Chip Design magazine. Concurrent with Chip Design, he was also the technical editorial manager at Maxim Integrated Products, and prior to Maxim, Dave spent over 35 years working as an engineer for the U.S. Army Electronics Command and an editor with Electronic Design Magazine.
