Digital video compression is already an important technology in virtually every type of video application. As the trend toward media convergence continues, compression and interoperability will only become more critical.
Among the most prominent digital video applications are DVD, HDTV, video telephony/teleconferencing, and, more recently, video surveillance. Each of these applications, however, has a different historical background. As a result, each has adopted different compression algorithms (see "Video Processing Brings New Meaning To Motion," Electronic Design, Sept. 1, 2006, ED Online 13291).
Over the last few years, the number of standards in the market has grown significantly, particularly with the introduction of new codecs such as H.264 (MPEG-4 Part 10) Advanced Video Coding (AVC), with its various profiles, and Windows Media Video 9 (WMV9) and its profiles. At the same time, engineers designing video systems must still deal with legacy equipment that supports only a few older standards, such as H.261, H.263, MPEG-2, or MPEG-4 Part 2. Depending on the application, this equipment must interoperate with the latest equipment integrating the newest algorithms.
Algorithm development is another part of the challenge. As new, more powerful algorithms are developed and standardized, either they must be compatible with all pre-existing algorithms (a daunting task at best), or some powerful, universal transcoding scheme must be developed.
From an intellectual-property perspective, the problem becomes even more complex. Although many video coding algorithms (e.g., MPEG-2, MPEG-4, H.263, and H.264) are published standards, others (e.g., On2's codecs and RealVideo) are proprietary. Sometimes, proprietary algorithms become standards. WMV9, for example, began as a proprietary algorithm and was ultimately adopted by the Society of Motion Picture and Television Engineers (SMPTE) as the public VC1 standard.
What Is Video Transcoding?
Video transcoding converts video from one format to another. It has two distinct and important roles:
- Transcoding enables communication between legacy and emerging devices. Many existing video-conference systems, for instance, are based on the legacy H.263 standard, while more recent systems use the H.264 Baseline Profile. Real-time transcoding is required for the two to communicate.
- Networks, particularly the Internet, impose bandwidth constraints on video transmission. Most movies today, for example, are distributed on DVD in MPEG-2 format. The bandwidth limits of video-on-demand and video-streaming-over-IP systems require the video data to be converted to a more bandwidth-efficient format by real-time transcoding before transmission.
In a video-conferencing system, legacy and emerging video streams must be converted between formats. In a video-on-demand application, the conversion usually runs from a stream encoded in a legacy standard (MPEG-2, H.263) to one encoded in a newer, more advanced standard (H.264 or VC1). The rationale is that the newer formats can save up to 50% of network bandwidth at comparable video quality.
Background of a Video Transcoding System
From an operational perspective, video transcoding is typically deployed in the video infrastructure of a central office. The most common implementation has the host processor handle network traffic while multiple DSPs handle the video decoding and encoding that transcoding involves. Usually, a single video multipoint control unit (MCU) is powerful enough to handle multiple transcoding channels simultaneously.
As an example, Figure 1 illustrates the basic video transcoding requirements and dataflow in a video-conferencing system. DSP2 decodes the input video stream and generates a reconstructed video frame, which is transmitted to DSP1 over the Serial RapidIO (sRIO) interface. A second DSP then encodes the reconstructed frame into the target format. In the most common scenario, one end of the video conference uses H.263-based equipment while the other uses H.264-based equipment.
Here, the host processor communicates with multiple DSPs (four in this instance) over a PCI connection. A critical feature for interprocessor communication in this example is the set of sRIO connections among the four DSPs. Because the data transferred between DSPs is uncompressed video, typically at 30 frames/s, the bandwidth required between devices is very high.
Taking NTSC standard-resolution (720 by 480) video in the YUV 4:2:0 color space with 8-bit samples as an example, each frame occupies 720 × 480 × 1.5 = 518,400 bytes. At 30 frames/s, each channel therefore requires approximately 124 Mbits/s (518,400 bytes × 8 bits × 30). sRIO was chosen both for this bandwidth and for its support of a flexible switching fabric. In turn, sRIO support becomes a critical factor in choosing a DSP for this application.
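The per-channel figure is simple arithmetic. A minimal Python sketch, assuming 8-bit samples and ignoring blanking and packet overhead; the function name `raw_yuv420_bitrate` is illustrative and not part of any TI library:

```python
def raw_yuv420_bitrate(width: int, height: int, fps: int) -> int:
    """Bits per second of uncompressed 8-bit YUV 4:2:0 video.

    4:2:0 keeps a full-resolution Y plane plus two chroma planes at
    quarter resolution, i.e. 1.5 bytes per pixel.
    """
    bytes_per_frame = width * height * 3 // 2
    return bytes_per_frame * 8 * fps


frame_bytes = 720 * 480 * 3 // 2
print(frame_bytes)                             # 518400
print(raw_yuv420_bitrate(720, 480, 30) / 1e6)  # 124.416 (Mbits/s)
```

Integer arithmetic keeps the result exact; the division by 1e6 only converts to Mbits/s for display.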
An ac-coupled interface, sRIO runs at three line rates: 1.25, 2.5, and 3.125 Gbaud, which after 8b/10b coding correspond to 1, 2, and 2.5 Gbits/s of data per lane. The interface uses a serializer/deserializer (SERDES) that recovers the clock from the data stream. The serial specification supports one-lane (1×) and four-lane (4×) port sizes. Its physical layer defines the handshaking mechanism between link partners and CRC-based error detection. It also defines packet priority, which is used for routing within the switch fabric.
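The relationship between line rate, 8b/10b coding, port width, and usable data rate can be sketched in a few lines of Python. The function name is hypothetical, and the result is the raw data rate before packet headers and control symbols. At 3.125 Gbaud, a 4× port carries 10 Gbits/s of data per direction, or 20 Gbits/s counting both directions, consistent with the bidirectional peak quoted for the C6455.

```python
BAUD_RATES_GBAUD = (1.25, 2.5, 3.125)  # serial RapidIO lane signaling rates
CODING_EFFICIENCY = 8 / 10             # 8b/10b: every 10 line bits carry 8 data bits


def srio_data_rate_gbps(baud_gbaud: float, lanes: int = 1,
                        bidirectional: bool = False) -> float:
    """Raw data rate of a serial RapidIO port before packet overhead."""
    rate = baud_gbaud * CODING_EFFICIENCY * lanes
    return 2 * rate if bidirectional else rate


for baud in BAUD_RATES_GBAUD:
    one_x = srio_data_rate_gbps(baud)
    four_x = srio_data_rate_gbps(baud, lanes=4)
    print(f"{baud} Gbaud: 1x = {one_x:g} Gbits/s, 4x = {four_x:g} Gbits/s")
```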
To take full advantage of sRIO's bandwidth, the DSPs must have built-in sRIO interfaces. The sRIO interface in Texas Instruments' TMS320C6455 DSP supports four simultaneous links and a peak data rate of 20 Gbits/s bidirectionally.
Video-Transcoding Prototype Implementation
Transcoding is also appropriate for sending DVD-originated content over an IP network, as in corporate training, video-on-demand, or video-broadcasting applications. In this case, MPEG-2 is the source format and VC1 is the most likely target format. This section describes a prototype of such a system built with two TI TMS320C6455 DSPs.
Technically, video transcoding must address many issues, such as format conversion, bit-rate reduction, and temporal/spatial resolution reduction. Correspondingly, different intelligent transcoding schemes have been developed for different issues. The guiding principle is to reuse as much information from the original incoming stream as possible in order to reduce complexity.
For instance, motion-vector (MV) mapping, discrete-cosine-transform (DCT) domain conversion, and residual re-estimation are popular techniques that can significantly reduce the computational complexity of transcoding.
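The article does not specify which MV-mapping variant it has in mind. As one illustration, the Python sketch below applies a common approach for 2:1 spatial downscaling: each output macroblock covers a 2×2 group of input macroblocks, and its candidate vector is the component-wise median of the four incoming vectors, halved to match the smaller picture. The function name and the choice of median (rather than mean or weighted averaging) are assumptions; an encoder would typically refine such a candidate with a small local search instead of running a full motion search.

```python
from statistics import median

MotionVector = tuple[float, float]  # (dx, dy) in pixels


def downscale_mvs(mvs: list[list[MotionVector]]) -> list[list[MotionVector]]:
    """Map per-macroblock MVs to a half-resolution macroblock grid.

    Hypothetical sketch: each output MV is the component-wise median of
    a 2x2 group of input MVs, divided by 2. An odd trailing row or
    column (e.g. the 45th MB column of a 720-pixel-wide picture) is
    simply dropped here for brevity.
    """
    rows, cols = len(mvs), len(mvs[0])
    out = []
    for r in range(0, rows - 1, 2):
        out_row = []
        for c in range(0, cols - 1, 2):
            group = [mvs[r][c], mvs[r][c + 1], mvs[r + 1][c], mvs[r + 1][c + 1]]
            dx = median(v[0] for v in group) / 2
            dy = median(v[1] for v in group) / 2
            out_row.append((dx, dy))
        out.append(out_row)
    return out


incoming = [[(4, 2), (4, 2)],
            [(4, 2), (8, 6)]]
print(downscale_mvs(incoming))  # [[(2.0, 1.0)]]
```

The median rejects the single outlier vector (8, 6), which is the usual reason for preferring it over a plain average.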
A simple, extensible transcoding architecture is also desirable. Because different transcoding solutions require tailoring algorithms and architectures in various ways, and there is no single standardized transcoding scheme, a programmable DSP such as the C6455 suits this domain.
In the remainder of this section, we propose a general video-transcoding architecture and prototype that can accommodate many kinds of transcoding schemes. As a starting point, we chose the simplest scheme, which fully decodes the incoming stream and re-encodes it subject to the new constraints.
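The full decode/re-encode scheme, and the hook that lets it grow into an intelligent transcoder, can be outlined as a small pipeline. This is a hypothetical Python sketch, not the prototype's software: `DecodedFrame`, `transcode`, and the toy codec functions are placeholders for a real MPEG-2 decoder and WMV9 encoder. In the baseline, the encoder receives no side information; an MV-mapping variant would pass the decoder's side information through as hints.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class DecodedFrame:
    index: int
    pixels: bytes                                  # reconstructed frame
    side_info: dict = field(default_factory=dict)  # e.g. incoming MVs


Decoder = Callable[[bytes, int], DecodedFrame]
Encoder = Callable[[DecodedFrame, Optional[dict]], bytes]


def transcode(packets: Iterable[bytes], decode: Decoder, encode: Encoder,
              reuse_side_info: bool = False) -> list[bytes]:
    """Cascaded transcoder: fully decode each frame, then re-encode it.

    reuse_side_info=False is the baseline full re-encode; True hands the
    decoder's side information to the encoder as hints.
    """
    output = []
    for index, packet in enumerate(packets):
        frame = decode(packet, index)
        hints = frame.side_info if reuse_side_info else None
        output.append(encode(frame, hints))
    return output


# Toy stand-ins so the sketch runs; real codecs would go here.
def toy_decode(packet: bytes, index: int) -> DecodedFrame:
    return DecodedFrame(index, packet.upper(), {"mvs": [(0, 0)]})


def toy_encode(frame: DecodedFrame, hints: Optional[dict]) -> bytes:
    tag = b"H" if hints else b"F"  # H = hinted encode, F = full search
    return tag + frame.pixels


print(transcode([b"ab", b"cd"], toy_decode, toy_encode))        # [b'FAB', b'FCD']
print(transcode([b"ab", b"cd"], toy_decode, toy_encode, True))  # [b'HAB', b'HCD']
```

Keeping the decoder and encoder behind plain function interfaces is what lets a smarter scheme be swapped in without changing the pipeline itself.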
This initial implementation does not reuse information from the incoming stream; instead, it demonstrates the capacity to handle the full complexity of decoding and re-encoding. However, the architecture and software infrastructure can be extended with intelligent transcoding schemes (MV mapping, DCT domain conversion, etc.) to increase channel density and exploit potential quality optimizations. Many conventional and novel transcoding schemes can be implemented on this flexible hardware/software framework.
DSPs Are Crucial
High computational performance, like that provided by the C6455 DSP, is a prerequisite for video encoding and decoding. Other features are also critical for video-infrastructure applications, and they fall into four primary areas:
Multiple powerful I/O options: System designers approach problems from different perspectives, so a DSP for video-infrastructure applications should provide a range of I/O options for board-level connectivity. As previously mentioned, an sRIO port is built in for interdevice communication. The high-throughput message-passing scheme used by sRIO achieves 95% utilization of the available data bandwidth. Other I/O options include a 1-Gbit/s Ethernet media access controller (EMAC), a 32-bit double-data-rate (DDR2-500) memory controller, and a 66-MHz Peripheral Component Interconnect (PCI) bus.
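The 95% utilization figure makes it easy to estimate how many uncompressed channels one sRIO port could carry. A rough Python sketch, assuming a 4× port at 3.125 Gbaud (10 Gbits/s of data per direction after 8b/10b coding), the uncompressed NTSC channel rate computed earlier, and no other traffic on the link. It bounds only the link, not the DSP processing load; the names are illustrative.

```python
import math

FOUR_X_DATA_RATE_BPS = 4 * 3.125e9 * 8 / 10      # 4 lanes, 3.125 Gbaud, 8b/10b -> 10 Gbits/s
SRIO_UTILIZATION = 0.95                          # message-passing efficiency cited for sRIO
NTSC_CHANNEL_BPS = 720 * 480 * 3 // 2 * 8 * 30   # uncompressed YUV 4:2:0 at 30 frames/s


def max_channels(link_bps: float, utilization: float, channel_bps: float) -> int:
    """Upper bound on uncompressed channels one link direction can carry."""
    return math.floor(link_bps * utilization / channel_bps)


print(max_channels(FOUR_X_DATA_RATE_BPS, SRIO_UTILIZATION, NTSC_CHANNEL_BPS))  # 76
```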
Efficient on-chip data movement: In video-infrastructure applications, DSPs act as slave devices to the host processor, so high-throughput, low-latency, concurrent data transfers between masters and slaves are important. Architecturally, these requirements call for interconnecting the peripherals, internal memory, and DSP core through an efficient switched central resource (SCR), like that in the C6455 DSP.
Dataflow streamlining is also important. Improvements are realized by employing 256-bit wide memory buses and an internal DMA (IDMA). The IDMA performs background data movement between the two levels of internal memory and to/from the peripheral bus.
Large on-chip memory: Compared with off-chip SDRAM, on-chip SRAM is much faster, but because of its implementation cost it is also much smaller. In a typical video application, on-chip memory serves two main purposes. First, it stores frequently accessed code and data, such as variable-length-code (VLC) tables. Second, it holds temporary data that is swapped in before processing and out afterward. Generally, the more on-chip memory available, the better the application performance. The C6455 DSP integrates up to 2 Mbytes of on-chip SRAM, which boosts video-application performance and makes it possible to handle multiple channels.
Code compatibility: Backward code compatibility is important because a great deal of video code was developed long before transcoding for video-infrastructure applications became commonplace. Rather than changing the instruction set, which would break that compatibility, the best place to improve performance for critical signal-processing operations is the DSP core architecture.
For instance, the C6455 introduces two architectural innovations. The first is a loop buffer, which can improve the software-pipelining efficiency of small loops. The second is a set of 16-bit versions of native 32-bit instructions, which significantly reduces code size and therefore lowers program-cache miss rates.
Prototype Implementation
Figure 2 illustrates a video-transcoding engine using MPEG-2 and WMV9 as the source and target formats. Although the MPEG-2/WMV9 combination is expected to be very common, the programmability of DSPs makes it easy to handle virtually any combination of source and destination formats.
The system's dataflow begins on the left side of the diagram, with the compressed MPEG-2 video file stored on a hard disk, and ends at a flat-panel display driven by Windows Media Player. In this demonstration, the video is at NTSC standard resolution (720 by 480 pixels) and is transcoded at 30 frames/s.
The streaming receiver module running on DSP1 buffers the incoming MPEG-2 stream and manages the input data for the MPEG-2 decoder module. Data reception is handled by TI's Network Development Kit (NDK) library, which is essentially a TCP/IP stack. DSP2 runs an NDK-based HTTP server that handles streaming requests from Windows Media Player and transmits Advanced Systems Format (ASF) packets to it. Windows Media Player then decodes the ASF packets and displays the video on screen.
One of the most interesting and challenging aspects of the dataflow is the interaction between the two DSPs over the sRIO interface. Each video frame transfer works as follows:
- As soon as DSP1 finishes sending a video frame, it sends what the sRIO specification calls a DOORBELL packet. The DOORBELL packet raises a system interrupt on DSP2, notifying it that a frame is available; DSP2 then starts WMV9 encoding. After the frame is encoded, DSP2 sends a DOORBELL packet back to DSP1, which triggers an interrupt notifying DSP1 that it can send the next frame. In the actual implementation, a ping-pong (double) buffer scheme lets encoding/decoding run in parallel with data transfer. The sequence repeats until the demonstration is stopped.
The GUI block represents the control and monitoring functionality built into the system. It displays the activity of the sRIO link and both Gigabit MAC (GMAC) links in real time. On the link carrying the MPEG-2 stream, the average data rate is 8 Mbits/s, typical for standard-resolution video encoded at 30 frames/s. On the link carrying ASF packets, the average bit rate is 4 Mbits/s, showing that WMV9 saves about 50% of the bandwidth while achieving similar video quality. On the sRIO link, the average bit rate is 124 Mbits/s.
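The DOORBELL handshake with ping-pong buffering described above behaves like a bounded producer/consumer queue of depth two. The Python sketch below models it with threads and semaphores: releasing a semaphore stands in for sending a DOORBELL packet, and acquiring one stands in for waiting on the resulting interrupt. The buffer count, function names, and use of threads are modeling assumptions; on the C6455 the hand-off would be an sRIO frame transfer followed by a DOORBELL, with interrupt handlers in place of blocking calls.

```python
import threading

NUM_BUFFERS = 2  # ping-pong: two frame buffers on the receiving DSP (assumed)


def run_pipeline(num_frames: int) -> list[int]:
    """Model the DSP1 -> DSP2 frame hand-off with DOORBELL-style signaling."""
    buffers: list = [None] * NUM_BUFFERS
    free = threading.Semaphore(NUM_BUFFERS)  # DOORBELL DSP2 -> DSP1: buffer released
    ready = threading.Semaphore(0)           # DOORBELL DSP1 -> DSP2: frame available
    encoded: list[int] = []

    def dsp1_decode_and_send() -> None:
        for n in range(num_frames):
            free.acquire()                   # wait until the target buffer is free
            buffers[n % NUM_BUFFERS] = n     # stands in for the sRIO frame transfer
            ready.release()                  # "ring" DOORBELL -> interrupt on DSP2

    def dsp2_encode() -> None:
        for n in range(num_frames):
            ready.acquire()                  # wait for the DOORBELL interrupt
            frame = buffers[n % NUM_BUFFERS]
            encoded.append(frame)            # stands in for WMV9 encoding
            free.release()                   # DOORBELL back: DSP1 may reuse buffer

    threads = [threading.Thread(target=dsp1_decode_and_send),
               threading.Thread(target=dsp2_encode)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return encoded


print(run_pipeline(5))  # [0, 1, 2, 3, 4]
```

With two buffers, DSP1 can fill one while DSP2 encodes the other, yet a buffer is never overwritten before its DOORBELL comes back, so frames arrive in order.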
Although the basic technology has been in place for quite some time, higher-quality video, and eventually HDTV, over IP networks requires a new video-infrastructure architecture that includes sRIO and DSPs that interact with, and complement, the interface's transmission scheme. DSPs used in infrastructure applications typically need large on-chip memory to handle multiple channels simultaneously. They must also give designers multiple I/O options, including GMAC and UTOPIA, and provide efficient internal dataflow. The C6455 DSP and the demonstration software components described here show that the challenge of video over IP can be met now and in the future.