Electronic Design

Video Processing Brings New Meaning To Motion

Hundreds of video processing and compression methods present a real challenge for analog and digital video engineers.

In the not-too-distant future, a single fiber-optic cable will deliver phone service, an Internet connection, and of course, digital television (DTV), movies, and other multimedia to the home. Until that time, though, most consumers will have to deal with some combination of plain old telephone service (POTS) wiring, digital satellite, cable, Ethernet, USB, wireless networks, and so on.

If we narrow the focus to DTV and other forms of high-resolution video, most consumer products require one or more forms of video decoding and processing. With high-definition television (HDTV), Internet Protocol television (IPTV), set-top boxes, high-definition DVDs, and mobile media devices, there are hundreds of video processing methods, each with its own advantages and disadvantages (see "Standards Around The House" at www.electronicdesign.com, Drill Deeper 13287).

While hundreds of video codecs are in use today, most consumer applications integrate only a handful of them, because manufacturers want to standardize on the codecs that benchmark best.

Emmy Award-winning MPEG-2 revolutionized the way consumers watch video at home, largely because DVDs were encoded with it. MPEG-2 is still in widespread use today, primarily for encoding broadcast signals and movies on DVDs. Even many commercial titles released on Blu-ray Disc are currently encoded with MPEG-2. This is expected to change soon, since HD DVD titles are mostly encoded with the VC-1 and H.264 standards.

MPEG-2 has 10 profiles that work at bit rates ranging from 96 kbits/s for 176- by 144-pixel resolution at 15 frames per second (fps) to 300 Mbits/s at a resolution of 1920 by 1080 pixels at 30 fps or 1280 by 720 at 60 fps. DVDs, digital video broadcasting (DVB), and ATSC each support a different set of resolutions. MPEG-2 coding also has several other key characteristics:

  • Video images are split into one luminance (Y) and two chrominance (Cb and Cr) channels.
  • Blocks of luminance and chrominance arrays are organized into macroblocks.
  • Macroblocks are then used for motion estimation, which produces motion vectors.
  • Each block is transformed with a discrete cosine transform (DCT); the resulting coefficients are quantized, reordered (zigzag scanned) to maximize compression, and compressed with run-length and Huffman (variable-length) coding.
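
The transform-and-compress step in the list above can be illustrated with a minimal Python sketch. It assumes an 8 by 8 block, an orthonormal DCT, a single uniform quantizer step (real MPEG-2 encoders also apply weighting matrices and a quantizer scale), and the conventional zigzag scan; the function names are hypothetical. The Huffman tables themselves are omitted: the sketch stops at the (zero run, level) pairs that a variable-length coder would consume.

```python
import math

N = 8  # MPEG-2 transforms 8x8 blocks of samples

def dct_2d(block):
    """Orthonormal 2-D type-II DCT of an 8x8 block (direct O(N^4) form, for clarity)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Uniform quantizer (the lossy step): larger steps zero out more coefficients."""
    return [[round(c / step) for c in row] for row in coeffs]

def zigzag(block):
    """Conventional zigzag scan: read coefficients from low to high spatial frequency."""
    order = sorted(((u, v) for u in range(N) for v in range(N)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[u][v] for u, v in order]

def run_length(scan):
    """Convert the scan into (zero_run, level) pairs, ending with an end-of-block marker."""
    pairs, run = [], 0
    for level in scan:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block code
    return pairs
```

For example, a uniform block of value 128 transforms to a single DC coefficient of 1024 with every AC coefficient at zero, so with a step of 16 the whole block collapses to `[(0, 64), "EOB"]`. Because quantization zeroes most high-frequency coefficients, the zigzag order pushes those zeros to the end, where one end-of-block code replaces them.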

Building on features of MPEG-1 and MPEG-2, MPEG-4 includes a vast array of parts (22 in total, counting those awaiting approval). It can deliver the same video quality as MPEG-2 at lower bit rates. MPEG-4 also introduced new concepts, such as composing natural and synthetic objects into a single scene and support for user interaction.

Video producers appreciate MPEG-4 for its improved content protection and its ability to create more flexible, reusable content. Thanks to its lower bit rates, MPEG-4 video can be delivered over a broadband link, and satellite and DVB providers have also adopted it.

Since MPEG-4 has so many parts, developers decide which ones to implement based on the needs of the end system. Unless you're designing for backward compatibility, there's really no reason to implement all of them, because some merely improve on earlier parts. For video, both Part 2 and Part 10 define codecs, but the Part 10 codec delivers roughly twice the compression efficiency of the Part 2 codec.

H.264/MPEG-4 AVC
H.264 (MPEG-4 Part 10) Advanced Video Coding (AVC), which is quickly becoming the de facto standard for home consumer video products, is best known for its high compression efficiency and picture quality. Developed by the Video Coding Experts Group of the ITU Telecommunication Standardization Sector (ITU-T), H.264 includes an extensive set of video-processing features.

H.264 defines seven profiles and 16 levels, which makes the codec very versatile. In the highest profile (High 4:4:4), the maximum bit rate ranges from 256 kbits/s at the lowest level (for example, 128 by 96 pixels at 30.9 fps) to 960 Mbits/s at the highest level (for example, 1920 by 1088 pixels at 120.5 fps, 4096 by 2048 at 30 fps, or 4096 by 2304 at 26.7 fps). H.264 incorporates many video-processing features, including, but not limited to:

  • Pulse-code modulation (PCM) macroblock representation mode and context-adaptive binary arithmetic coding (CABAC), both of which permit lossless encoding/recreation of some data samples
  • Multipicture and variable block size motion compensation with weighted prediction for scaling and offset
  • Six-tap filtering to derive half-pel luma sample predictions, for sharper sub-pixel motion compensation
  • Quarter-pixel precision for motion compensation
  • In-loop de-blocking filter
  • Flexible macroblock ordering (FMO), arbitrary slice ordering (ASO), data partitioning (DP), and redundant slices (RS) mostly used to improve error immunity
  • Spatial block transforms to reduce ringing and allow for better compression.
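
The six-tap half-pel filter and quarter-pixel averaging from the list above can be sketched in Python for one row of 8-bit luma samples. The taps (1, -5, 20, 20, -5, 1), the +16 rounding, the divide-by-32 shift, and the clip to the sample range follow H.264 luma interpolation for horizontal half-sample positions. Repeating the border sample beyond the row is a simplification of reference-picture padding, the function names are hypothetical, and the two-dimensional center position (which the standard derives with different intermediate rounding) isn't covered.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # luma half-sample filter; the taps sum to 32

def clip1(value, bit_depth=8):
    """Clamp to the valid sample range, e.g. 0..255 for 8-bit video."""
    return max(0, min((1 << bit_depth) - 1, value))

def half_pel_row(row):
    """Half-sample values between row[i] and row[i + 1] for one row of integer samples.

    Positions beyond the row repeat the border sample (a simplification of padding).
    """
    n = len(row)
    halves = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(TAPS):
            idx = min(max(i - 2 + k, 0), n - 1)  # samples i-2 .. i+3
            acc += tap * row[idx]
        halves.append(clip1((acc + 16) >> 5))    # round, divide by 32, clip
    return halves

def quarter_pel(a, b):
    """Quarter-sample value: rounded average of the two nearest integer/half samples."""
    return (a + b + 1) >> 1
```

On a step edge such as `[0, 0, 0, 100, 100, 100]`, `half_pel_row` returns `[3, 0, 50, 113, 97]`. The overshoot to 113 comes from the negative taps, which keep interpolated edges crisper than a plain two-sample average would.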

Windows Media Video (WMV) version 9 was the first codec developed by Microsoft (along with 15 other companies) to be submitted to the Society of Motion Picture and Television Engineers (SMPTE), in 2003. Standardized as VC-1 (SMPTE 421M), it was finally approved last March, making it the only non-proprietary codec developed by Microsoft.

VC-1 may be H.264's biggest competitor. Developed for progressive encoding, VC-1 supports resolutions from 176 by 144 pixels at 15 fps and 96 kbits/s to 1920 by 1080 pixels at 60 fps or 2048 by 1536 pixels at 24 fps and 136 Mbits/s. It delivers the same image quality as MPEG-2 at lower bit rates. VC-1 defines three profiles: simple, main, and advanced.

The advanced profile offers baseline intra-frame compression; variable-sized, 16-bit, and overlapped transforms; four motion vectors per macroblock; quarter-pixel luminance/chrominance motion compensation; extended motion vectors; a loop filter; dynamic resolution change; adaptive macroblock quantization; B frames; intensity compensation; range adjustment; field and frame coding modes; a GOP layer; and display metadata.

Although WMV is predominantly used for Internet video, both Blu-ray and HD DVD have adopted VC-1 as a mandatory codec. The implication is that, going forward, all HD DVD and Blu-ray players must be able to decode VC-1.

Most consumer video products are differentiated by the video-processing techniques and algorithms applied to the video stream during and after decompression. Some video-processing techniques may be accomplished using a different method than the ones mentioned below, and the techniques mentioned may go by different names in different applications.

Film is typically recorded at 24 fps. Viewing it on a TV requires converting those frames into the display's interlaced fields. This process is known as telecine, or X:Y pulldown. Converting film to NTSC format for viewing on TV is referred to as 3:2 pulldown; it converts 24 film frames per second into approximately 60 fields per second (59.94).

This is a two-step conversion. First, the film is slowed down by 0.1%, from 24 to 23.976 fps. Second, every four film frames are converted into 10 NTSC fields. Step two exploits the interlaced nature of NTSC to stretch four frames into five video frames. The 3:2 (or really 2:3) comes into play as the telecine places one film frame across two fields and the next frame across three, alternating.
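
The 2:3 cadence can be traced with a short Python sketch in which letters stand in for film frames (the function names are hypothetical). It also checks the rate arithmetic: slowing 24 fps by a factor of 1000/1001 gives 23.976 fps, and turning every four frames into 10 fields multiplies that by 2.5.

```python
from fractions import Fraction

# Film rate after the 0.1% slowdown: 24 * 1000/1001, about 23.976 frames/s
FILM_RATE = Fraction(24000, 1001)
# Four film frames become 10 fields, so the field rate is 2.5x the film rate
FIELD_RATE = FILM_RATE * Fraction(10, 4)  # 60000/1001, about 59.94 fields/s

def pulldown_2_3(frames):
    """Spread progressive film frames over interlaced fields with a 2:3 cadence.

    Returns (source_frame, parity) tuples; parity alternates top/bottom,
    starting with the top field.
    """
    fields = []
    for i, frame in enumerate(frames):
        repeats = 2 if i % 2 == 0 else 3  # A gets 2 fields, B gets 3, C 2, D 3, ...
        for _ in range(repeats):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields

def video_frames(fields):
    """Pair consecutive fields into interlaced video frames (top source, bottom source)."""
    return [(fields[i][0], fields[i + 1][0]) for i in range(0, len(fields) - 1, 2)]
```

`video_frames(pulldown_2_3("ABCD"))` yields `[('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]`: five video frames, two of which mix fields from different film frames. Those mixed frames, and the uneven two-then-three field timing, are the source of telecine judder.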

As with any conversion process, telecine produces some distortion that's known as telecine judder. To remove telecine judder and reproduce the original signal, reverse telecine (reverse 3:2 pulldown) is applied to the converted signal to convert it back to 24 fps. Reverse telecine is a form of de-interlacing.
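
A minimal Python sketch of reverse telecine follows. It assumes fields are lists of rows of pixel values that alternate top/bottom starting with a top field, and that the content moves between film frames. A field nearly identical to the field two positions earlier (same parity) is treated as a pulldown repeat and dropped, and the surviving fields are woven back into progressive frames. The threshold and names are hypothetical; production inverse telecine locks onto the cadence rather than trusting each comparison, because static scenes would look like repeats.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two fields (lists of rows of pixel values)."""
    total = count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count

def inverse_telecine(fields, threshold=1.0):
    """Undo 2:3 pulldown on a top-first, alternating-parity field sequence."""
    kept = []
    for i, field in enumerate(fields):
        if i >= 2 and mean_abs_diff(field, fields[i - 2]) < threshold:
            continue  # same as the previous same-parity field: a pulldown repeat
        kept.append((i % 2, field))  # remember parity: 0 = top, 1 = bottom
    frames = []
    for j in range(0, len(kept) - 1, 2):
        (pa, fa), (pb, fb) = kept[j], kept[j + 1]
        top, bottom = (fa, fb) if pa == 0 else (fb, fa)
        frame = []
        for t_row, b_row in zip(top, bottom):  # weave: interleave the two fields' lines
            frame.extend([t_row, b_row])
        frames.append(frame)
    return frames
```

In each 10-field group the two repeated fields are discarded, leaving eight fields that pair up into the original four film frames at 24 (or 23.976) fps.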

To take full advantage of HD televisions and other displays that show non-interlaced (progressive-scan) video, an interlaced broadcast stream must be converted to progressive form. This conversion process is called de-interlacing. It is required for interlaced sources such as NTSC and is typically performed after a composite signal has been decoded into a component format like RGB or YCbCr.

De-interlacing algorithms generally come in several flavors (and are known by different names), including film mode, video mode, motion compensation, field combination (weaving, blending, selective blending), and field-extension (half sizing, line doubling). Each form has its benefits and drawbacks, and the best de-interlacers use some combination of these methods. Still, MPEG decoders sometimes use 3:2 pulldown to avoid the requirement to de-interlace.
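
Two of the simpler families named above, field combination (weaving) and field extension (line doubling), plus a crude per-pixel motion-adaptive mix of the two, can be sketched in Python. Fields are assumed to be lists of rows of luma values, the motion threshold is an arbitrary illustrative value, and the function names are hypothetical; commercial de-interlacers use far more elaborate motion detection and motion compensation.

```python
def weave(top, bottom):
    """Field combination: interleave the lines of two fields into one frame."""
    frame = []
    for t_line, b_line in zip(top, bottom):
        frame.extend([list(t_line), list(b_line)])
    return frame

def line_double(field):
    """Field extension: repeat every line of a single field."""
    frame = []
    for line in field:
        frame.extend([list(line), list(line)])
    return frame

def motion_adaptive(top, bottom, prev_bottom, threshold=8):
    """Keep the top field; rebuild each bottom-field pixel by weaving where it is
    static and interpolating between neighboring top-field lines where it moved."""
    frame = []
    for i, t_line in enumerate(top):
        below = top[i + 1] if i + 1 < len(top) else t_line
        frame.append(list(t_line))
        filled = []
        for x, b in enumerate(bottom[i]):
            if abs(b - prev_bottom[i][x]) < threshold:
                filled.append(b)                            # static pixel: weave
            else:
                filled.append((t_line[x] + below[x]) // 2)  # moving pixel: interpolate
        frame.append(filled)
    return frame
```

Weaving keeps full vertical resolution but produces combing on motion; line doubling avoids combing at the cost of half the vertical detail. Switching between them per pixel is the simplest form of the "combination" approach the best de-interlacers take.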

With almost every video application that offers multiple resolutions, scaling is mandatory, because signals must be converted from one resolution to another to fit the end application. As with de-interlacing, composite signals must first be converted to component form before they can be scaled.

There are several scaling schemes, including anti-aliased resampling, linear interpolation, and pixel dropping/duplication (also called nearest-neighbor scaling). Pixel dropping removes pixels to shrink the image, while pixel duplication repeats pixels to enlarge it. As you can imagine, this method produces mediocre results and introduces aliasing.
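
Pixel dropping/duplication fits in a few lines of Python. This sketch (the function name is hypothetical) maps each output pixel back onto the input grid with integer floor division and copies that pixel, so upscaling duplicates pixels and downscaling skips them.

```python
def nearest_neighbor_scale(image, out_w, out_h):
    """Pixel dropping/duplication: each output pixel copies the input pixel it maps onto."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]
```

Enlarging `[[1, 2], [3, 4]]` to 4 by 4 simply doubles every pixel. Shrinking the stripe pattern `[[0, 255, 0, 255]]` to two pixels wide returns `[[0, 0]]`: the white stripes vanish entirely, which is aliasing in its most obvious form.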

Linear interpolation isn't much better; it can also cause aliasing, especially with high-frequency content. Anti-aliased resampling produces the best results because it low-pass filters the image, removing detail too fine for the new resolution before resampling, so frequency content scales correctly. There are several approaches to anti-aliasing, and the best choice depends on the end application.
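
To see why anti-aliased resampling wins, the Python sketch below reduces an 8-pixel row of alternating black and white pixels (detail at the Nyquist limit) to 4 pixels three ways. The box filter, a plain average of each pair of samples, is about the simplest anti-aliasing low-pass filter and is used here only as an illustrative assumption; practical scalers typically use multi-tap polyphase filters. The function names are hypothetical.

```python
def linear_resample(row, out_len):
    """Linear interpolation with the first and last samples aligned."""
    n = len(row)
    if out_len == 1:
        return [float(row[0])]
    out = []
    for i in range(out_len):
        pos = i * (n - 1) / (out_len - 1)  # position of output sample i on the input grid
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append(row[left] * (1 - frac) + row[right] * frac)
    return out

def box_downscale(row, factor):
    """Average each group of `factor` samples: a crude low-pass filter before decimation."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row) - factor + 1, factor)]

stripes = [0, 255] * 4                # finest possible detail: alternating black/white
dropped = stripes[::2]                # pixel dropping
linear = linear_resample(stripes, 4)  # linear interpolation
boxed = box_downscale(stripes, 2)     # anti-aliased (box-filtered) decimation
```

Pixel dropping turns the stripes solid black, and linear interpolation turns them into a false ramp of roughly 0, 85, 170, 255; both are aliases. The box filter yields a uniform 127.5, the average gray that the too-fine pattern actually represents at the lower resolution.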

Video frames are normally divided into blocks of 16 samples by 16 lines, called macroblocks, which are used for motion estimation and motion vectors. When macroblocks are decompressed and reconstructed, adjacent macroblock edges may not quite match because of the errors inherent in lossy compression.

When the edges don't match, the macroblock boundaries become visible. To remove these unwanted artifacts, a low-pass de-blocking filter is applied to blend and smooth them. H.264 builds an in-loop de-blocking filter into the codec itself; for codecs that lack one, such as MPEG-2, the filter must be implemented separately as a post-processing step.
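
A simplified one-dimensional de-blocking filter can be sketched in Python. It follows the general idea described above (smooth small steps across block boundaries, because large steps are more likely genuine edges) but is not the H.264 filter: the 8-pixel block size, the threshold, and the [1, 2, 1] smoothing taps are illustrative assumptions, and the function name is hypothetical.

```python
def deblock_row(row, block_size=8, threshold=20):
    """Smooth small discontinuities at block boundaries along one row of pixels."""
    out = list(row)
    for edge in range(block_size, len(row) - 1, block_size):
        p1, p0 = row[edge - 2], row[edge - 1]  # last two pixels of the left block
        q0, q1 = row[edge], row[edge + 1]      # first two pixels of the right block
        if abs(p0 - q0) < threshold:           # small step: likely a blocking artifact
            out[edge - 1] = (p1 + 2 * p0 + q0 + 2) // 4
            out[edge] = (p0 + 2 * q0 + q1 + 2) // 4
        # large steps are left alone as genuine image edges
    return out
```

A 10-level step between two flat blocks (100 to 110) is softened to 100, 103, 108, 110 across the boundary, while a 200-level step is treated as a real edge and passed through unchanged.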

When an encoder discards too much information while quantizing macroblocks, artifacts appear as ripples around sharp edges in the image. This type of error is known as ringing. Like de-blocking, de-ringing uses an adaptive low-pass finite impulse response (FIR) filter to hide its effects.
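
An adaptive low-pass FIR filter of the kind described above can be sketched in Python for one row of pixels. The adaptation rule (smooth a pixel only when it differs little from both neighbors), the [1, 2, 1]/4 taps, and the threshold are illustrative assumptions rather than any particular decoder's de-ringing filter; the function name is hypothetical.

```python
def dering_row(row, threshold=20):
    """Adaptive 3-tap [1, 2, 1]/4 low-pass filter: small ripples are flattened,
    but pixels next to a large step (a real edge) are left untouched."""
    out = list(row)
    for i in range(1, len(row) - 1):
        left, center, right = row[i - 1], row[i], row[i + 1]
        if max(abs(center - left), abs(center - right)) < threshold:
            out[i] = (left + 2 * center + right + 2) // 4
    return out
```

Applied to `[50, 54, 46, 54, 46, 200, 200, 200]`, the filter returns `[50, 51, 50, 50, 46, 200, 200, 200]`: the ±4 ripple in the flat area shrinks, while the two pixels that form the edge keep their values.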

Errors can occur during broadcast or playback for a number of reasons. For example, a scratched DVD can produce read errors that can't be corrected. Error concealment can often compensate for such uncorrectable errors, using techniques such as interpolation or replacing bad data with data from earlier or later frames.
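
Both concealment strategies can be sketched in Python: temporal concealment copies the co-located block from the previous frame, and spatial concealment interpolates vertically between the intact rows bordering the lost block. Frames are assumed to be lists of rows of luma values, lost blocks are identified by (block row, block column), the frame is assumed to be at least two blocks tall, and the function names are hypothetical. Real decoders generally prefer motion-compensated replacement when motion vectors survive.

```python
def conceal_temporal(frame, prev_frame, lost_blocks, block=16):
    """Replace each lost block with the co-located block of the previous frame."""
    out = [list(row) for row in frame]
    for by, bx in lost_blocks:
        x0, x1 = bx * block, (bx + 1) * block
        for y in range(by * block, (by + 1) * block):
            out[y][x0:x1] = prev_frame[y][x0:x1]
    return out

def conceal_spatial(frame, lost_blocks, block=16):
    """Fill each lost block by blending the intact rows just above and below it."""
    out = [list(row) for row in frame]
    h = len(frame)
    for by, bx in lost_blocks:
        x0, x1 = bx * block, (bx + 1) * block
        y_above, y_below = by * block - 1, (by + 1) * block
        # At a picture edge only one border row exists; use it for both sides.
        above = frame[y_above][x0:x1] if y_above >= 0 else frame[y_below][x0:x1]
        below = frame[y_below][x0:x1] if y_below < h else above
        for y in range(by * block, (by + 1) * block):
            w = (y - y_above) / (y_below - y_above)  # 0 at the row above, 1 at the row below
            out[y][x0:x1] = [round(a * (1 - w) + b * w) for a, b in zip(above, below)]
    return out
```

Temporal replacement works well for static scenes; spatial interpolation is the fallback after a scene cut, when the previous frame has nothing in common with the damaged one.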

Edge enhancement improves the perceived sharpness of an image by applying a filter that increases the contrast between the lighter and darker pixels on opposite sides of edges. This technique can be useful in lower-grade displays and works best when the screen is viewed from a distance. Because edge enhancement can add visible halos around edges and otherwise degrade the picture, it generally isn't used in high-end displays.
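
Unsharp masking is one common way to implement edge enhancement, and it is used here as an illustrative choice rather than any particular display's method. The Python sketch works on one row of 8-bit pixels; the blur kernel, the `amount` parameter, and the function name are assumptions.

```python
def unsharp_mask(row, amount=1.0):
    """Edge enhancement: add back `amount` times the difference between the signal
    and a [1, 2, 1]/4 blurred copy, clipping the result to the 8-bit range."""
    blurred = ([row[0]]
               + [(row[i - 1] + 2 * row[i] + row[i + 1]) / 4 for i in range(1, len(row) - 1)]
               + [row[-1]])
    return [min(255, max(0, round(p + amount * (p - b)))) for p, b in zip(row, blurred)]
```

On the edge `[50, 50, 50, 150, 150, 150]` the result is `[50, 50, 25, 175, 150, 150]`. The dark undershoot and bright overshoot beside the edge raise perceived sharpness, and they are also the halos that make the technique unwelcome on high-end displays.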

Transcoding is a direct translation from one digital format to another. It involves decoding/decompressing an encoded video stream to its original format, as if it were to be played back (Fig. 1), and then compressing/encoding the stream to the new desired format. Because most codecs are lossy, each conversion adds errors and associated artifacts that accumulate, so quality degrades with every generation.

"Transcoding is going to emerge as the most challenging function for video systems over the next several years," says Jeremiah Golston, CTO for Streaming Media, Texas Instruments.

"High video quality must be sustained while converting between a wide range of video codec formats, bit rates, and resolutions to share content across media devices in the home," Golston adds.

"Innovation in both algorithms and architectures are needed to achieve these goals within tight system budgets for I/O bandwidth and overall cost. Real-time HD transcoding solutions with DSP-based technology can offer the right mix of high performance and flexibility to meet these demanding requirements."

If speed is the top priority, ICs with dedicated silicon that handle codecs and video processing should be your first choice. The table provides an overview of available chips based on their native handling of codecs and video-processing capabilities. It is by no means exhaustive, and most of the companies listed make more than one product or family for video processing.

HDTV isn't just about digital processing. Flat-panel HDTVs have to match multiple display standards to their single fixed-display resolutions, creating design headaches not experienced with CRT-based TVs. Finding solutions for them is tricky enough for graphics cards in PCs. But HDTV receivers have to do it faster and better, and the challenge only gets stiffer when the HDTV must deal with legacy video from VHS/DVD players as well.

Interestingly, this is mainly a problem for high-end systems. Mass-market HD sets need only deal with cable feeds; consumers have to pay extra for the additional connectors that external SD sources require. Chips called analog front ends (AFEs) handle this, but they must be much more sophisticated than the familiar AFEs that simply condition signals for analog-to-digital converters (ADCs).

Several analog companies offer products that deal with these issues one way or another. Intersil's Automatic Black Level Compensation (ABLC) function and Analog Devices' Noise Shaped Video approach these challenges with particular sophistication (see "The Devil Is In The Design Details," Drill Deeper 13289, and "Challenges Introduced By Legacy Video," Drill Deeper 13290).

Analog video signals contain horizontal and vertical retrace intervals, during which a CRT's electron beam returns to the start of a new line or field. Flat panels have no horizontal or vertical retrace, but they do need a pixel clock.

Analog video signals don't provide a pixel clock, so a phase-locked loop (PLL) must generate it. HD resolution requires a PLL with low jitter, but the range of HD standards creates a design challenge for analog PLLs because it's hard to optimize the loop filter across horizontal frequencies from 10 to 150 kHz.
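
The pixel-clock requirement can be made concrete with a little arithmetic. A line-locked PLL typically multiplies the incoming horizontal sync frequency by a feedback divider N equal to the total number of samples per line, blanking included. The Python sketch below computes the line rate and pixel clock for three common formats from their standard raster totals (858 samples by 525 lines for 480i, 1650 by 750 for 720p60, and 2200 by 1125 for 1080p60). The function names are hypothetical, and exact fractions keep the 1000/1001 NTSC rate from introducing rounding error.

```python
from fractions import Fraction

# Standard raster totals (samples per line and lines per frame, blanking included)
# and frame rates for three example formats.
FORMATS = {
    "480i": (858, 525, Fraction(30000, 1001)),
    "720p60": (1650, 750, Fraction(60)),
    "1080p60": (2200, 1125, Fraction(60)),
}

def line_rate_hz(lines_per_frame, frame_rate):
    """Horizontal (line) frequency: total lines per frame times frames per second."""
    return lines_per_frame * frame_rate

def pixel_clock_hz(samples_per_line, line_rate):
    """Clock a line-locked PLL must synthesize: line rate times feedback divider N."""
    return samples_per_line * line_rate

for name, (samples, lines, rate) in FORMATS.items():
    f_h = line_rate_hz(lines, rate)
    print(f"{name}: f_H = {float(f_h) / 1e3:.2f} kHz, N = {samples}, "
          f"pixel clock = {float(pixel_clock_hz(samples, f_h)) / 1e6:.2f} MHz")
```

Even these three formats span line frequencies from about 15.7 to 67.5 kHz and pixel clocks from 13.5 to 148.5 MHz, which is why a single analog loop filter is hard to optimize across every standard an HDTV must accept.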

Offset is another intrinsic AFE challenge. The AFE video path typically consists of a dc-restore clamp, offset and gain correction, and analog-to-digital conversion. A good dc-restore function will eliminate the offset at the input to the AFE, but the active devices that follow reintroduce offsets. The offsets are random: They vary from device to device, and they usually have large temperature coefficients, causing them to drift as the display warms up.

Here's the problem with offset. In component video, the Y signal (luminance, the gray-scale information) and the Pb and Pr signals (chrominance, the color information) are carried on three separate channels. The Y signal is unipolar, and an offset on the Y channel shifts brightness. The Pb and Pr signals are bipolar, and together they define an orthogonal two-dimensional color space.

Random offsets on Pb and Pr move the center of this space away from 0 V. This adds color to what should be gray images and shifts the overall color space, causing colors to be displayed incorrectly (Fig. 2). Historically, display manufacturers have done a one-time calibration during the display's production test or simply not addressed it at all, shipping devices with large black level and color variations. As a result, it's up to the user to manually adjust the settings.
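
Continuous black-level correction, as opposed to a one-time factory calibration, can be sketched generically in Python. The sketch assumes the ADC is set up so that black on Y should read code 16 and zero on Pb/Pr should read code 128 (the 8-bit ITU-R BT.601 levels), and that samples can be captured during an interval when the input is known to be at black/zero, such as the back porch. Smoothing the per-line measurements lets the correction follow slow thermal drift. This is a hypothetical illustration, not a description of Intersil's ABLC or any vendor's implementation.

```python
# Assumed target codes: 8-bit BT.601 levels (black = 16 on Y, zero = 128 on Pb/Pr).
TARGETS = {"Y": 16, "Pb": 128, "Pr": 128}

def measure_offsets(blank_samples, targets=TARGETS):
    """Per-channel offset: average code read while the input is at black/zero,
    minus the code it should read."""
    return {ch: sum(s) / len(s) - targets[ch] for ch, s in blank_samples.items()}

def track(previous, measured, alpha=0.1):
    """Low-pass the per-line measurements so the estimate follows slow drift
    without reacting to noise on any single line."""
    return {ch: previous[ch] + alpha * (measured[ch] - previous[ch]) for ch in previous}

def correct(sample, offsets):
    """Subtract the estimated offset from each channel of an active-video sample."""
    return {ch: value - offsets[ch] for ch, value in sample.items()}
```

Removing the Pb and Pr offsets re-centers the color space on zero, so gray stays gray; removing the Y offset restores the intended black level.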

See associated figure


Analog Devices
Connex Technology
Texas Instruments
The MathWorks
W&W Communications
