High-Def Video Brings Telepresence Into Focus

Imagine that you’re walking into a darkened conference room. You switch on the lights and make a few phone calls. All of a sudden, three of your colleagues from across the globe appear at the conference room table as if they were sitting there in the dark all along. This represents the essence of telepresence—an ultra-high-end video-conferencing system.

These systems employ high-definition video on 50-in. or larger flat-panel displays with audio designed to make all of the participants’ voices seem like they’re coming straight from their lips. And that’s not all. Typically, factors such as lighting and even furniture are taken into account, with possibly half a conference table in one room and the other half in the remote room.

A telepresence system like this could cost several hundred thousand dollars, as is the case with the TelePresence 3000 from Cisco Systems. But viable alternatives exist at a variety of price points from companies such as Hewlett-Packard, LifeSize Communications, Polycom, Sony, Telanetix, and Vidyo.

Design engineers wanting to build telepresence and high-definition video-conferencing systems, from high-end setups down to those that might run on PCs and video phones, should begin by surveying the hardware needed to implement these systems. The latest H.264 codecs are a good starting point.

H.264 CODECS

The driving technology behind telepresence and high-definition video conferencing is the H.264 video standard, which provides over twice the compression ratio of MPEG-2. Several companies make H.264 codecs, including Fujitsu Microelectronics America, W&W Communications, and Mobilygen.

Fujitsu’s MB86H51 compresses and decompresses full high-definition video (1920 dots by 1080 lines) in real time using the H.264 format (Fig. 1). This is a single-chip implementation for full HD H.264 High Profile Level 4.0 video processing that incorporates embedded memory. It also compresses and decompresses audio in real time by utilizing formats such as the MPEG-1 Audio Layer.

The MB86H51 uses a proprietary algorithm that automatically applies less compression to areas in the image where compression artifacts are most noticeable to human vision, such as human faces or slow-moving objects, and increased compression to other areas. The effect is to maximize image quality for those critical zones. This feature also makes it possible to reduce image size to between one-half and one-third the size of the MPEG-2 format with an equivalent level of image quality.

“The advantage of our chip lies in our compression algorithm,” says Davy Yoshida, director of Business Development of Fujitsu Microelectronics America. “Comparing the compression of MPEG-2 and H.264 is 2.5 times the compression. So a 25-meg image will be 10 megs, at equal quality. But our chip can compress, with very little depreciation, to a smaller size, like 25 megs to 5 megs, and still show a very good quality picture.”
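The arithmetic in Yoshida's comparison can be checked with a short Python sketch. The function names are illustrative and not part of any Fujitsu tool, and the sizes are treated as the unitless "megs" of the quote. It only restates the quoted figures: a 2.5:1 gain turns 25 into 10, and going from 25 to 5 corresponds to a 5:1 gain.

```python
def compressed_size(original: float, ratio: float) -> float:
    """Return the size after compressing `original` by a factor of `ratio`."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return original / ratio


def compression_ratio(original: float, compressed: float) -> float:
    """Return how many times smaller `compressed` is than `original`."""
    if compressed <= 0:
        raise ValueError("compressed size must be positive")
    return original / compressed


# Figures taken from the quote; units are whatever "megs" means there.
h264_size = compressed_size(25, 2.5)    # generic H.264 gain over MPEG-2
fujitsu_gain = compression_ratio(25, 5)  # the 25-to-5 case in the quote
print(h264_size, fujitsu_gain)
```

Read this way, the quoted 25-to-5 case is twice the reduction of the generic 2.5:1 figure.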

The chip also contains two blocks of 256-Mbit fast-cycle random access memory (FCRAM) embedded on-chip. The chip measures only 15 mm squared and consumes just 750 mW. The MB86H51 comes in a 650-pin FBGA package and began mass production in July of last year, priced at $295 in sample quantities. Fujitsu plans to develop a much more cost-effective version of this codec, and it may launch in the latter half of this year.

W&W Communications’ WW10K H.264 HD codec chip set consists of the WW10000BA single-chip encoder and the WW10001BA single-chip decoder (Fig. 2). The low encode-decode tandem delay as well as the ability to encode and decode 1080p and 720p video at low bit rates suit the WW10K chip set for high-definition video-conferencing and telepresence applications.

The WW10K runs at 110 MHz in single-chip implementations of the encoder and decoder. The WW10000BA encoder compresses 1080p or 720p HD video at half the bit rate of MPEG-2 HD encoders, with 15% better peak signal-to-noise ratio (PSNR). The WW10001BA decompresses the encoder’s bit stream into quality 1080i/p or 720p HD video.

The chip set has an encode-decode tandem delay of less than 35 ms or about 1 frame at 30 frames/s, delivering performance very close to the H.264 Joint Model. It can handle up to four video inputs simultaneously at different bit rates and resolutions, up to 1920 by 1088. This makes it possible to design systems that dedicate one camera per participant or group of participants and one display per participant or group of participants, delivering more immersive and lifelike video communications experiences.
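Two of the numbers above are easy to sanity-check. The sketch below, in Python with illustrative function names, computes the frame period at 30 frames/s and rounds picture dimensions up to a multiple of 16, the H.264 macroblock size. The article does not say why the maximum height is 1088 rather than 1080; macroblock alignment is the likely explanation and is treated here as an assumption.

```python
MACROBLOCK = 16  # H.264 codes pictures in 16-by-16-pixel macroblocks


def coded_dimension(pixels: int, block: int = MACROBLOCK) -> int:
    """Round a picture dimension up to the next multiple of `block`."""
    return -(-pixels // block) * block  # ceiling division, then scale back


def frame_period_ms(frames_per_second: float) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 / frames_per_second


print(coded_dimension(1920), coded_dimension(1080))  # coded width and height
print(round(frame_period_ms(30), 1))                  # one frame at 30 frames/s
```

Under that assumption, a 1080-line picture is carried as 1088 coded lines with the extra eight cropped on display, and the quoted delay of less than 35 ms sits just above one 33.3-ms frame period.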

The WW10000BA encoder also integrates an advanced context-adaptive noise-reduction filter. This not only cuts noise in the source video, but also improves the encoder’s compression efficiency significantly, depending on the video content.

The WW10K H.264 HD codec chip set has been in mass production since April 2007. A development kit is available with HDMI, component video, Y/C and composite video inputs and outputs, and PCI and 10/100BaseT Ethernet interfaces.

Mobilygen’s H.264 HD codec system-on-a-chip (SoC), the MG3500, is a member of the company’s en-ViE platform (Fig. 3). It can encode HD content, including 720p60, 1080p24, 1080p30, or 1080i60 material. It additionally may be used to encode two 720p30 sources or encode and decode 720p30 content simultaneously.

The MG3500 supports H.264’s Baseline, Main, and High Profiles up to Level 4.1. Macro-Block Adaptive Field/Frame (MBAFF) encoding in the Main and High Profiles allows the highest quality per bit of interlaced material. It also supports IDE and CompactFlash and extends the Ethernet MAC capability to support Gigabit Ethernet.

Last June, Mobilygen announced its en-ViE platform of codec SoCs for the creation, playback, and distribution of HD H.264 video. The en-ViE platform offers High Profile 1080i H.264 encoding at full 1920-by-1080 resolution with both Context-Adaptive Binary Arithmetic Coding (CABAC) and MBAFF to provide the highest-quality video at any given bit rate.

The MG3500’s ability to perform full-duplex 720p30 encoding and decoding makes it possible to implement single-chip video-conferencing systems. But two chips are needed for 1080i60 resolution—one for encode and one for decode. The real-time transcoding of legacy HD MPEG-2 streams into H.264 helps minimize video storage requirements and enables reliable video streaming over wireless networks.
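One way to see why full-duplex 720p30 fits on one chip while 1080i60 needs two is to count macroblocks per second. The Python sketch below is only a rough model. The article does not give the MG3500's actual throughput limit, so the budget used here is an assumption: the H.264 Level 4.1 ceiling of 245,760 macroblocks/s, chosen because the part is described as supporting up to Level 4.1.

```python
MB = 16  # H.264 macroblock edge, in pixels

# Assumption: use the H.264 Level 4.1 limit as a stand-in for the chip budget.
ASSUMED_BUDGET_MB_PER_S = 245_760


def macroblocks_per_frame(width: int, height: int) -> int:
    """Macroblocks needed to cover a frame, rounding each dimension up."""
    cols = -(-width // MB)
    rows = -(-height // MB)
    return cols * rows


def macroblocks_per_second(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Total macroblock rate for `streams` identical streams."""
    return macroblocks_per_frame(width, height) * fps * streams


# Encode plus decode of 720p30 counts as two streams.
duplex_720p30 = macroblocks_per_second(1280, 720, 30, streams=2)
# 1080i60 is 60 fields/s, which is 30 full frames/s.
encode_1080i60 = macroblocks_per_second(1920, 1080, 30)
duplex_1080i60 = macroblocks_per_second(1920, 1080, 30, streams=2)

for name, rate in [("720p30 duplex", duplex_720p30),
                   ("1080i60 encode", encode_1080i60),
                   ("1080i60 duplex", duplex_1080i60)]:
    print(name, rate, "fits" if rate <= ASSUMED_BUDGET_MB_PER_S else "exceeds budget")
```

With this stand-in budget, duplex 720p30 and one-way 1080i60 both fit, while duplex 1080i60 needs roughly twice the capacity, which is consistent with the two-chip arrangement described above.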

The SoCs integrate an ARM9 CPU dedicated to user applications. A programmable multimedia engine supports all leading audio formats, including AAC, MP3, G.7xx, and Dolby Digital. Network connectivity is provided via integrated Gigabit Ethernet and high-speed USB 2.0 OTG.

AES encryption and digital signature hardware provide secure networking and storage. Most popular video storage devices are supported, including USB, SD, MMC, CompactFlash, CE-ATA, and IDE. In addition, the en-ViE SoCs support digital image stabilization for use in IP cameras. The MG3500 HD codec costs $30 in high volume.

Since latency can be an issue in telepresence systems, Mobilygen has come up with a unique way of dealing with those requirements. “Typically, the requests we get are sub-100-ms response,” says Brian Johnson, vice president of marketing at Mobilygen.

“And we have the ability to encode at a slice level, which is a fraction of a frame. By making a slice smaller and smaller, you can minimize latency. You don’t have to wait until you get a whole frame before you start encoding,” says Johnson. “There are a number of things like that that you can do to meet the latency requirements. We can get encode, decode, and buffering in between, in under 50 ms.”
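Johnson's point about slices can be illustrated with a simplified model, sketched here in Python. It assumes the pixels of a frame arrive evenly over one frame period and that the encoder can start as soon as one slice's worth of lines is in. Blanking intervals, encode time, and network buffering are ignored, and the function name is illustrative rather than part of Mobilygen's software.

```python
def slice_wait_ms(frames_per_second: float, slices_per_frame: int) -> float:
    """Time spent waiting for one slice of video to arrive before encoding can start.

    Simplified model: a frame arrives evenly over one frame period, and each
    slice covers an equal share of that frame.
    """
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    if slices_per_frame < 1:
        raise ValueError("slices_per_frame must be at least 1")
    frame_period_ms = 1000.0 / frames_per_second
    return frame_period_ms / slices_per_frame


for slices in (1, 2, 4, 8):
    print(slices, "slice(s) per frame:", round(slice_wait_ms(30, slices), 2), "ms")
```

In this model, waiting for a whole frame at 30 frames/s costs about 33 ms before encoding starts, while four slices per frame cut that wait to about 8 ms, which leaves more of a 50-ms budget for decoding and buffering.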

The en-ViE platform features a complete Linux development environment, substantially expanding upon Mobilygen’s existing H.264 software. This includes production-ready application programming interfaces (APIs), drivers, optimized codec firmware, example applications, and complete reference designs to accelerate time-to-market.

DOING H.264 WITH DSPS

One of the premier telepresence systems employs DSPs from Analog Devices to perform the necessary encoding and decoding of high-definition video, among other things. Design engineers from Cisco Systems used ADI’s Blackfin DSP as a main ingredient of the company’s TelePresence 1000 and 3000 systems.

Cisco’s codec is based on the H.264 video codec standard, and it provides best-in-class latency and up to 1080p30 video in real time. By minimizing the HD video encode/decode latency, Cisco’s solution gives more “latency budget” back to the network. This enables swift deployment, with a minimum of hardware and software upgrades for the network to handle the new application.

With dual symmetric, 600-MHz, high-performance Blackfin cores, Blackfin ADSP-BF561 processors were the ideal choice for Cisco’s exceedingly challenging and complex video application. Cisco’s video-codec functionality is distributed across a multiprocessor farm of Blackfin ADSP-BF561 processors, delivering more than 0.5 tera-instructions/s of processing muscle. Thus, the video subsystem performs at truly best-in-class levels, enabling practical solution deployment.

On another DSP front, Spirit DSP recently announced the HD-enabled version of its flagship product, the TeamSpirit Voice&Video Engine. Available for both PC and mobile platforms, the engine now has new features to empower high-definition video-conferencing terminals and mobile devices with high-quality wideband voice and seamless video.

The upgraded version of the engine features wideband codecs (like Spirit’s proprietary IP-MR and GSM AMR-WB) and ultra-wideband codecs (like audio AAC LD) as the most advanced audio technology for high-end video telephony. In addition to its video engine and software video codecs, the TeamSpirit Engine easily integrates with external hardware-accelerated video codecs. It also increases the video quality and robustness of the entire solution.

“Today, HD is the fast-moving trend, leaving grainy and blurry video conferencing behind. It allows users to see everyone in the room, and the resolution is high enough to detect even an eye roll. Full-duplex audio allows people to talk at the same time without muddying the sound,” says Slava Borilin, vice president of Products & Marketing at Spirit.

HIGH-DEF VIDEO SENSOR

OmniVision Technologies calls its OV9710 CameraChip the first true HD video sensor for the mobile handset and notebook PC markets. The OV9710 is a 1-Mpixel CMOS sensor built with OmniVision’s proprietary OmniPixel3 architecture, which uses a 3- by 3-µm pixel, for optimal low-light sensitivity and high-quality HD video performance at 30 frames/s. It meets all camera phone (1280 by 720) and PC multimedia (1280 by 800) market requirements in terms of performance, quality, reliability, and power consumption.

This is the kind of sensor that can significantly raise the bar for PC-type video-conferencing applications. “With sensitivity ratings of over 2300 mV per lux-second, we are providing exceptional image quality at price points that are attractive to high-volume markets,” says Bruce Weyer, OmniVision’s vice president of marketing.

Designed for use as a 0.25-in. HD video camera, the OV9710 provides full-frame, sub-sampled, windowed 8/10-bit images in raw RGB format via the digital video port. The sensor delivers full-frame HD video at 30 frames/s in WXGA (1280 by 800) or 60 frames/s in sub-sampled WXGA (640 by 400) with complete user control over image quality, formatting, and output data transfer.
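The raw output rate implied by these modes follows directly from resolution, frame rate, and bit depth. The Python sketch below counts active pixels only. Blanking intervals and any port overhead are ignored, so the actual pixel-clock requirement of the OV9710 will be higher, and the function name is illustrative.

```python
def raw_bit_rate(width: int, height: int, frames_per_second: int, bits_per_pixel: int) -> int:
    """Active-pixel data rate in bits per second, ignoring blanking and overhead."""
    return width * height * frames_per_second * bits_per_pixel


# The two modes described above, at the 10-bit raw setting.
full_wxga = raw_bit_rate(1280, 800, 30, 10)
subsampled = raw_bit_rate(640, 400, 60, 10)

print("pixels per full frame:", 1280 * 800)
print("full-frame WXGA:", full_wxga / 1e6, "Mbits/s")
print("sub-sampled:", subsampled / 1e6, "Mbits/s")
```

The full frame holds 1,024,000 pixels, which matches the 1-Mpixel rating, and the sub-sampled 60-frames/s mode needs only half the raw bandwidth of the full-frame 30-frames/s mode.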

The OV9710 incorporates image-processing functions, including exposure control, gain control, white balance, lens correction, and defect pixel canceling. These functions are also programmable through the Serial Camera Control Bus interface.

Available in a variety of lead-free packaging options, the OV9710 operates from –30°C to 70°C. It’s currently available in sample quantities. Volume shipping is expected to begin in the second quarter of this year.