Requirements vary when choosing embedded CPU cores for system-on-a-chip designs. For DVD encoding, system engineers should emphasize an embedded CPU's level of flexibility and programmability. It's important to select an embedded CPU that gives engineers top-level control of the entire coding process in a high-level language (HLL) such as C or C++.
With a basic understanding of how the encoding engines operate, an engineer can modify an encoding algorithm through straightforward reprogramming and get immediate feedback on the video quality. This kind of CPU programmability via C/C++ is critical for supporting an assortment of different algorithms, including adaptive ones, giving designers an ample range of product differentiation.
A completely hardware-based architecture, on the other hand, locks systems designers into a single algorithm. Implementing a newer algorithm or performing alterations to an existing one requires a complete hardware change, thus incurring major design-engineering costs. Even architectures that are programmable may impose similar restraints if the microcoding isn't user-friendly.
It also is best not to get bogged down by low-level, mundane, fixed tasks like discrete-cosine-transform (DCT) and variance calculations. Designers should choose an embedded CPU that lets them operate at a higher level, focusing on issues that affect video quality and/or bit rate, such as quantization selection or mode decision.
Quantization and Quality
Quantization (Quant) selection, in particular, plays a crucial role in DVD video quality. It's important to have the necessary CPU flexibility and programmability to target the proper levels of Quant selection, not only for video quality, but also for system differentiation. Today, no single algorithm is known to produce an "optimum" quantization value. But through the programmability of an embedded CPU, system engineers can easily incorporate tomorrow's algorithms, allowing the product to improve as compression technology evolves.
To understand why Quant selection is so crucial to video quality, you must understand that real-world video compression is a lossy process. A certain amount of image detail must be sacrificed to achieve DVD bit rates. You simply can't have high compression without significant loss of data. But once this data is lost, it's gone forever. Ideally, this loss is introduced where it's least noticeable. That's what quantization is all about--it lets you decide how much loss to introduce into each part of the image or picture.
In MPEG, quantization takes place in the frequency domain. An 8-by-8 block of pixels is first converted to an 8-by-8 block of frequency coefficients by the Discrete Cosine Transform (DCT). The DCT itself is a completely reversible process, meaning that the original coefficients can be converted back to the original 8-by-8 block of pixels using an Inverse DCT (or IDCT), without any loss.
The quantization process essentially involves dividing these coefficients by a Quant value during encoding, and then multiplying by the same Quant value during decoding. This does two things to the original frequency coefficients: First, it reduces the accuracy of the larger values. In general, this type of "loss" is very minor, and typically unnoticeable, after conversion back to the spatial domain. The second effect is the complete elimination of the smaller coefficients. This is much more serious, because it can introduce very noticeable losses in the spatial domain, especially if it involves a lower frequency.
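As a minimal sketch (assuming integer coefficients, and ignoring the per-coefficient weighting matrices and rounding rules that real MPEG quantization adds), the divide/multiply round trip looks like this in C:

```c
/* Encoder side: divide a frequency coefficient by the Quant value.
 * C integer division discards the remainder -- this is where the loss occurs. */
int quantize(int coeff, int quant) {
    return coeff / quant;
}

/* Decoder side: multiply the transmitted level back by the same Quant value. */
int dequantize(int level, int quant) {
    return level * quant;
}
```

With a Quant of 16, a large coefficient of 100 survives as 96--a minor accuracy loss--while a small coefficient of 7 is eliminated outright and comes back as 0.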
Although the most noticeable distortion results from eliminating coefficients, this also contributes most to higher compression ratios. And there's the rub. To achieve high compression ratios, you have to start throwing away coefficients--a lot of them. At a bit rate of 4 Mbits/s, an average I-block uses only about six coefficients. Some blocks need more and some need fewer. But which ones? The trick to successful quantization is finding a way to eliminate all coefficients that are the least noticeable to the human eye.
Ultimately, the entire quantization algorithm boils down to a sophisticated modeling of the human visual system. Some kinds of data loss are more tolerable than others. While this process may sound simple, it's highly complex. Still, the basic principle is to throw away enough data to meet the compression required, but do it in a way that's least noticeable. It's a tricky proposition; there are many different ways to perform Quant selection, and engineers are always finding better algorithms. That's the reason this particular encoding function should be performed in software and not in hardware.
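One published example of such modeling is the adaptive-quantization step of the MPEG-2 Test Model 5 (TM5), which codes visually "busy" blocks (high spatial activity) more coarsely than flat ones, where errors are easier to see. The sketch below follows that idea; the function names are illustrative:

```c
/* TM5-style normalization: maps a block's activity to a factor in
 * [0.5, 2.0] relative to the average activity of the picture. */
double normalized_activity(double act, double avg_act) {
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act);
}

/* Scale the rate-control quantizer by the activity factor and clamp it
 * to the legal MPEG quantizer_scale range of 1..31. */
int select_quant(int base_quant, double act, double avg_act) {
    int q = (int)(base_quant * normalized_activity(act, avg_act) + 0.5);
    if (q < 1)  q = 1;
    if (q > 31) q = 31;
    return q;
}
```

A block four times busier than average is quantized at 1.5 times the base value; a nearly flat block drops toward half of it.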
Variable Bit Rate
Embedded CPU flexibility and programmability also play a part in attaining video-quality goals in a DVD system design. Consider that most DVD systems are based on variable bit rate (VBR). The instantaneous bit rate of a DVD system varies continuously with the complexity of the image being encoded, which allows the encoded video quality to remain relatively constant. The average bit rate for DVD is 4.7 Mbits/s, but actual data rates can slip below 2 Mbits/s and accelerate to a maximum of over 10 Mbits/s. Because a DVD player/recorder is a "closed-loop" system, VBR is the method used to generate high-quality images from the DVD disc.
In contrast, a broadcast application is an open-loop system with severe constraints on channel bandwidth and buffers. In this case, the bit rate control of choice is Constant Bit Rate (CBR). Think of this as the antithesis of VBR.
In the real world, VBR and CBR actually represent the endpoints of an entire range of algorithmic possibilities. The specific algorithm selected depends on various system parameters, and, as in Quant selection, these algorithms are constantly evolving. With that in mind, it's simply not smart engineering to hardwire the encoding function with the latest algorithm. A single hardware platform that can easily accommodate future algorithmic refinements is not only more economical, but will also accelerate the evolution of video-encoding technology.
Each To Their Own Duties
In the encoder circuitry shown in Figure 1, the embedded CPU (see "Embedded CPU Core Is Programmer-Friendly," below) controls the bit-stream generation and monitors the results of the encoder's various subengines. These include blocks such as motion estimation, mode-decision calculations, quantization/inverse quantization, and the variable-length encoder (VLE). After motion estimation is performed, the CPU reads the resulting motion vectors from a register. Mode decision works the same way: the module makes the necessary calculations and stores the results in several registers, which the CPU reads before arriving, via software, at a mode decision and a quantization value.
In this encoding application, a custom coprocessor performs the variable-length encoding (VLE). VLE is a reversible, lossless coding procedure: it assigns shorter code words to frequent events and longer code words to less-frequent events, thereby achieving further compression. Huffman coding is the most widely used form of VLE because of its simplicity and its efficiency in reducing the number of bits needed to encode without losing information.
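The principle can be sketched with a toy code table. The codes below are hypothetical--the real MPEG tables are fixed by the standard and are far larger:

```c
#include <stdint.h>

/* Toy VLC table: shorter codes for more frequent symbols.
 * (Hypothetical codes, for illustration only.) */
typedef struct { uint32_t code; int bits; } vlc_t;

static const vlc_t table[4] = {
    {0x0, 1},  /* symbol 0, most frequent:  "0"   */
    {0x2, 2},  /* symbol 1:                 "10"  */
    {0x6, 3},  /* symbol 2:                 "110" */
    {0x7, 3},  /* symbol 3, least frequent: "111" */
};

/* Append each symbol's code word, MSB first, to a zero-initialized
 * output buffer. Returns the total number of bits written. */
int vle_encode(const int *symbols, int n, uint8_t *out) {
    int bitpos = 0;
    for (int i = 0; i < n; i++) {
        vlc_t v = table[symbols[i]];
        for (int b = v.bits - 1; b >= 0; b--) {
            if ((v.code >> b) & 1)
                out[bitpos >> 3] |= 1u << (7 - (bitpos & 7));
            bitpos++;
        }
    }
    return bitpos;
}
```

Encoding the sequence 0, 0, 1, 3 costs only seven bits because the two most frequent symbols each take a single bit.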
Several key attributes contributing to design flexibility differentiate one encoder from another. They include quantization-value selection, rate control, the number of frames stored, the number of bidirectional or interpolated pictures (B frames) between anchors, and whether original and/or reconstructed data is used for encoding.
The embedded CPU in this encoding application also supports syntax generation for all six MPEG layers. Each layer supports either a signal-processing or a system function (see "Two Paths For MPEG Syntax Generation," below).
To sum things up, the ideal encoder combines programmable and hardwired functions to achieve the best possible cost/performance trade-off. It performs math-intensive, well-defined functions in hardwired "subengines," while leaving programmable those functions and decisions that let engineers differentiate the end product.
Embedded CPU Core Is Programmer-Friendly
The 81-MHz, 71-MIPS 16-/32-bit TinyRISC TR4101 embedded microprocessor core consists of a register file, system control coprocessor (CP0), arithmetic logic unit (ALU), shifter, CBus interface, and a computational bolt-on (CBO) interface (Figure 2). The register file contains the general-purpose registers, supplies source operands to the execution units of the encoding function, and handles the storage of results to the target registers.
CP0 handles exceptions, including interrupts; the ALU performs the arithmetic and logical operations that support the encoding functions and does address calculations; and the shifter performs shift operations. The CBO interface gives the systems engineer a way to add specialized arithmetic instructions to the microprocessor. For example, a multiply-divide unit (MDU) can be attached via the CBO interface to perform such encoding-support functions as complex rate-control calculations.
The CBus interface passes data to and from the core. Thus, systems engineers can attach up to three tightly coupled special-purpose coprocessors that enhance the embedded microprocessor's general-purpose computational power. By taking this approach, high-performance, application-specific hardware is made directly accessible to a programmer at the instruction-set level.
The embedded CPU's code is written in C/C++, then compiled into instructions and stored in memory. Besides handling syntax generation for all MPEG layers, the code also handles frame control and type, rate control, audio and system-stream multiplexing, and parts of the mode decision process.
Two Paths For MPEG Syntax Generation
The MPEG syntax layers correspond to a hierarchical structure. A sequence constitutes the top layer of the video-coding hierarchy, consisting of a header and a number of group of pictures (GOPs). A GOP is a random-access point, meaning it's the smallest coding unit that can be independently encoded within a sequence. It contains a header and a number of pictures. The GOP header features time and editing information.
The TinyRISC TR4101 embedded microprocessor core (see the figure in "Embedded CPU Core Is Programmer-Friendly," above) supports syntax generation for all six MPEG layers, each of which supports either a signal-processing or a system function. The layers are: system, sequence, GOP, picture, slice, and macroblock (Figure 3).
The three types of pictures are intracoded (I), predictive-coded (P), and bidirectionally predictive-coded (B). "I" pictures are coded without reference to any other pictures; "P" pictures are coded using motion-compensated prediction from the previous I or P reference pictures; and "B" pictures are coded using motion compensation from a previous and a future I or P picture. Pictures consist of a header and one or more slices. The picture header includes time, picture type, and coding information.
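Because B pictures reference a future anchor, the encoder must transmit pictures out of display order: each anchor is coded before the B pictures between it and the previous anchor. The reordering can be sketched as follows (assuming the GOP opens with an I picture, with m - 1 B pictures between anchors, and ignoring any trailing B pictures after the last anchor):

```c
/* Convert display order to coding order for a GOP that opens with an
 * I picture and places m - 1 B pictures between anchors (m = 3 gives
 * the common IBBP pattern). Anchors must be coded before the B
 * pictures that reference them. Returns the number of entries written. */
int coding_order(int n_frames, int m, int *out) {
    int k = 0;
    out[k++] = 0;                       /* the opening I picture */
    for (int anchor = m; anchor < n_frames; anchor += m) {
        out[k++] = anchor;              /* the P (or I) anchor first */
        for (int b = anchor - m + 1; b < anchor; b++)
            out[k++] = b;               /* then the Bs it closes */
    }
    return k;
}
```

For seven frames with m = 3, display order I0 B1 B2 P3 B4 B5 P6 becomes coding order I0 P3 B1 B2 P6 B4 B5.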
A slice provides immunity to data errors. If the encoded bit stream is unreadable within a picture, the decoder can recover by waiting for the next slice, without having to drop an entire picture. Slices consist of a header and one or more macroblocks. The slice header contains position and quantizer scale information.
A macroblock is the basic unit for motion compensation and quantizer-scale changes. In MPEG-2, the macroblock can be either field or frame coded. Each macroblock consists of a header and six component 8-by-8 blocks: four blocks of luminance, one block of Cb chrominance, and one block of Cr chrominance. The macroblock header contains quantizer-scale and motion-compensation information. A macroblock covers a 16-pixel-by-16-line section of the luminance component and the spatially corresponding 8-pixel-by-8-line section of each chrominance component.
Blocks are the basic coding unit, and the DCT is applied at this block level. Each block contains 64 component pixels arranged in an 8-by-8 order. After the DCT, the resulting 8-by-8 block of coefficients is quantized, zig-zagged, grouped into run-level pairs (the number of zero coefficients preceding each nonzero coefficient is the "run"; the nonzero coefficient itself is the "level"), and finally Huffman encoded. Because these operations are very math intensive, they are performed in hardware rather than by the CPU; the Huffman coding, for example, is handled by the VLE coprocessor.
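The zig-zag scan and run-level grouping are simple enough to show concretely. The scan table below is the standard MPEG zig-zag order; the pairing function is a straightforward sketch (real encoders also emit an end-of-block code, omitted here):

```c
/* Standard MPEG zig-zag scan order for an 8-by-8 block: low-frequency
 * coefficients first, so the zero runs cluster at the end. */
static const int zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

typedef struct { int run; int level; } run_level_t;

/* Scan quantized coefficients in zig-zag order and emit (run, level)
 * pairs: run = number of zeros preceding each nonzero coefficient.
 * Returns the number of pairs produced. */
int run_level(const int block[64], run_level_t *pairs) {
    int n = 0, run = 0;
    for (int i = 0; i < 64; i++) {
        int c = block[zigzag[i]];
        if (c == 0) {
            run++;
        } else {
            pairs[n].run = run;
            pairs[n].level = c;
            n++;
            run = 0;
        }
    }
    return n;
}
```

A sparsely populated block--the common case after quantization--collapses to a handful of pairs, which is exactly what makes the subsequent Huffman stage so effective.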
The embedded microprocessor also handles the rate-control function of the encoder. Rate-control algorithms are feedback mechanisms that regulate the number of bits generated during the transform coding process over a given elapsed time. They're typically divided into two groups--fixed or variable.
For a fixed data rate, the output bit stream must be constant to ensure the encoder operates properly with a fixed-rate communications channel, such as a satellite link. Equally important, it must also ensure that the decoder receiving the fixed-rate bit stream operates properly. Over a period of time determined by the size of the encoder's output or channel buffer, the average number of bits per macroblock must be held below a fixed threshold to keep the decoder's buffer from underflowing or overflowing. As a result, the quality of the video varies inversely with the image complexity.
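A minimal feedback loop of this kind can be sketched as follows (illustrative only--production controllers such as TM5 also model per-picture complexity and allocate a bit budget per picture type):

```c
/* Minimal CBR feedback sketch: raise the quantizer as the output
 * buffer fills, lower it as the buffer drains, so the long-term
 * average bit rate tracks the channel rate. */
typedef struct {
    int buffer;       /* bits currently queued for the channel */
    int buffer_size;  /* total buffer capacity, in bits */
    int quant;        /* current quantizer scale, 1..31 */
} rate_ctl_t;

void rate_update(rate_ctl_t *rc, int bits_produced, int bits_drained) {
    rc->buffer += bits_produced - bits_drained;
    if (rc->buffer < 0) rc->buffer = 0;
    /* Map buffer fullness (0..size) linearly onto quant 1..31. */
    rc->quant = 1 + (30 * rc->buffer) / rc->buffer_size;
    if (rc->quant > 31) rc->quant = 31;
}
```

When the buffer sits at half capacity the quantizer lands mid-range; as the channel drains the buffer, the quantizer relaxes back toward its finest setting.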
When in a variable data-rate mode, the instantaneous bit rate is allowed to vary continuously in proportion to the level of complexity of the image. This is also known as "constant quality" bit-rate encoding. Variable rate control can be useful when there are multiple channels being multiplexed onto a single transport stream, or in a closed-loop system such as DVD. By knowing the type of the source material in advance (using so-called forward-analysis techniques), the encoder can optimize the image compression, based on statistics and image complexity, and set priorities.