As Compression Technologies Reach Their Limits, What’s Next?

Compression continues to be an active academic and commercial research area, and some clear trends for 2011 are emerging. Markets that have used compression for years are hitting the wall as they approach the theoretical highest compression ratio with acceptable quality. Other areas where the use of compression is relatively new can expect significant improvements in both compression performance and signal quality, often at surprisingly low resource utilization levels.

Rather than focusing on just the traditional rate-distortion tradeoff, engineers are actively looking for compression algorithms with low complexity and low latency. That’s because MIPS (compression in software) and gates (compression in hardware) are always in short supply, especially in mobile devices with limited battery life.

First, let’s identify those data types that are already being compressed to near their theoretical limit: computer files, speech, and audio signals. Because more complex algorithms fail to achieve higher compression ratios on these data types, compression researchers are focusing their efforts on other data types.

Both Lempel-Ziv (L-Z) and Burrows-Wheeler (B-W) algorithms achieve about 2.5:1 lossless compression on computer files. One interesting trend for computer file compression is the emergence of companies like SandForce and Cavium Networks that have integrated lossless compression (L-Z, bzip2, etc.) into ICs for storage and networking applications.
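A quick way to check lossless ratios like these on your own files is to run both algorithm families from Python's standard library. The sketch below is only an illustration: `zlib` (DEFLATE, an L-Z derivative) and `bz2` (bzip2, which is B-W based) stand in for the two families, and the helper names `compression_ratio` and `measure` are hypothetical. The sample text is a placeholder, not a benchmark.

```python
import bz2
import zlib


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Return the original-to-compressed size ratio (2.5 means 2.5:1)."""
    return len(raw) / len(compressed)


def measure(raw: bytes) -> dict:
    """Compress `raw` with an L-Z-based and a B-W-based codec and report ratios."""
    lz = zlib.compress(raw, 9)  # DEFLATE: LZ77 plus Huffman coding
    bw = bz2.compress(raw, 9)   # bzip2: Burrows-Wheeler transform plus Huffman coding
    # Lossless means the round trip must reproduce the input exactly.
    assert zlib.decompress(lz) == raw
    assert bz2.decompress(bw) == raw
    return {
        "zlib": compression_ratio(raw, lz),
        "bz2": compression_ratio(raw, bw),
    }


if __name__ == "__main__":
    # Placeholder input; substitute the bytes of a real file for a fair test.
    sample = ("Compression continues to be an active research area. " * 200).encode()
    for name, ratio in measure(sample).items():
        print(f"{name}: {ratio:.1f}:1")
```

Highly repetitive input like the placeholder compresses far better than 2.5:1, and already-compressed files (JPEG, MP3, ZIP) barely shrink at all, so the figure quoted above is best read as an average over mixed, real-world files.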

Today’s mobile-quality speech requires about 10 kbits/s. Compressed speech below 10 kbits/s typically exhibits unacceptable artifacts. The latest Advanced Audio Coding (AAC) standard achieves transparent coding of stereo audio at 128 kbits/s. AAC also includes multi-channel HD audio support, with up to 48 wideband channels and 16 low-frequency effects (LFE) channels contained in one AAC bit stream.
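To put the 128-kbit/s figure in context, it helps to compare it with the uncompressed bit rate of the source. The arithmetic below is a sketch under one stated assumption that is not in the text above: the source is CD-style PCM (44.1 kHz, 16 bits per sample, two channels). The function names are illustrative.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def rate_ratio(raw_bps: float, coded_bps: float) -> float:
    """Compression ratio implied by a raw and a coded bit rate."""
    return raw_bps / coded_bps


# Assumption (not from the article): CD-style PCM source material.
RAW_BPS = pcm_bit_rate(44_100, 16, 2)  # 1,411,200 bits/s
AAC_BPS = 128_000                      # 128 kbits/s, as quoted above

if __name__ == "__main__":
    print(f"raw PCM: {RAW_BPS / 1000:.1f} kbits/s")
    print(f"implied ratio: {rate_ratio(RAW_BPS, AAC_BPS):.1f}:1")
```

Under that assumption, transparent stereo coding at 128 kbits/s corresponds to roughly 11:1 lossy compression, several times what lossless coders reach on computer files.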

The availability of free, downloadable source code for most text, speech, and audio compression standards is a telling indicator of the maturity of compression for these data types, since proprietary algorithms no longer offer any advantages.

Image And Video Compression
With the wide availability of the wavelet transform-based JPEG2000 algorithm, image compression has reached technical maturity. JPEG2000 includes a host of features, including lossless (~2:1) and lossy (up to 100:1) compression modes, “encode once, decode many” flexibility, compressed domain editing, and video support.

The only knock on JPEG2000 is its complexity, as high-quality compressed images can take seconds to decode on a 2-GHz CPU. Microsoft recently standardized its lower-complexity HD Photo image codec (now called JPEG XR) as ISO/IEC 29199-2 and as ITU Recommendation T.832.

Video compression is a very active research area where improvements reduce the bandwidth of mobile uploads to, and downloads from, popular video sharing sites like YouTube and Hulu. While many of us prefer to view video on 50-in. HD screens at home, we are quickly becoming addicted to the “everywhere/always on” mobile video experience on our iPhones, iPads, and Android devices.

Because video bandwidth continues to be a limited, expensive resource, video compression improvements capture immediate economic benefits for wireless, satellite, and cable providers. H.264 is the most widely adopted video compression standard, with more than 50 companies (including large vendors like Nvidia, Broadcom, and Texas Instruments) offering H.264-compliant chips for video cards, cable set-top boxes, and Blu-ray disc players. H.264 software support is included in products from Adobe, Apple, Intel, Microsoft, and many other companies.


Medical, Wireless, And Supercomputing
These areas include the most visible and high-volume uses for compression. But some of the most interesting compression developments are appearing in lower-volume applications, such as medical imaging, wireless infrastructure, and supercomputing, which exhibit their own unique bandwidth and storage bottlenecks.

Computed tomography (CT) scanners, ultrasound machines, and magnetic resonance imaging (MRI) equipment generate prodigious amounts of sensor data, often above 100 Gbits/s. This high rate must be transferred across cables, backplanes, and ICs.

If compression could reduce the rate without affecting diagnostic image quality, significant cost savings would be realized. GE Healthcare and Stanford have demonstrated that 4:1 compression of sampled x-ray signals did not change a radiologist’s clinical diagnosis across 400 CT images (see “Compress CT Samples At 64 Gbits/s”).

As 3G and 4G wireless basestations are being deployed worldwide, vendors are challenged by the significant cost of 6-Gbit/s and higher fiber-optic cables that carry sampled data between the bottom and the top of cellular towers.

By having transmitters compress wireless baseband signals by at least 2:1, and then having receivers decompress these signals, expensive 6-Gbit/s and higher electrical-to-optical and optical-to-electrical transceivers, and the pricey 6-Gbit/s and higher SERDES-enabled (serializer-deserializer) FPGAs that drive them, can be replaced with much less expensive 3-Gbit/s components.
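The link-budget reasoning here is simple division, sketched below so the numbers can be varied. The function names are hypothetical, and the model is deliberately idealized: it assumes a constant compression ratio and ignores framing or line-coding overhead on the link.

```python
def compressed_link_rate(raw_gbps: float, ratio: float) -> float:
    """Link rate needed after compressing `raw_gbps` by `ratio`:1."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_gbps / ratio


def fits_link(raw_gbps: float, ratio: float, link_gbps: float) -> bool:
    """True if the compressed stream fits on a link of `link_gbps`."""
    return compressed_link_rate(raw_gbps, ratio) <= link_gbps


if __name__ == "__main__":
    # 6 Gbits/s of baseband samples at 2:1 just fits a 3-Gbit/s link.
    print(compressed_link_rate(6.0, 2.0), fits_link(6.0, 2.0, 3.0))
    # At only 1.5:1 the stream needs 4 Gbits/s and no longer fits.
    print(compressed_link_rate(6.0, 1.5), fits_link(6.0, 1.5, 3.0))
```

This is why the text specifies "at least 2:1": any ratio below that leaves the compressed stream above 3 Gbits/s, and the cheaper components cannot be used.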

Finally, the supercomputing world has recently discovered that graphics processing units (GPUs) like those found on Nvidia’s Tesla boards make lower-cost, lower-power compute engines than traditional CPU-based compute servers. Nvidia Fermi GPUs include up to 448 cores per socket, and GPU programmers are realizing that keeping this many cores “fed” causes significant I/O bottlenecks at both the PCI Express interface (8 Gbytes/s) and the GDDR5 memory interface (more than 100 Gbytes/s).

Because 32-bit floating-point numbers are supercomputing’s most common data type, compressing floats reduces these bottlenecks in a novel way. Using a floating-point compression algorithm called Prism FP, software-based compression experiments in medical imaging, financial modeling, fluid flow, and oil and gas exploration recently showed that lossy compression of floating-point numbers doesn’t affect the results.
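The article does not describe how Prism FP works, so the sketch below is not that algorithm. It illustrates one generic idea behind lossy float compression instead: zeroing the low-order mantissa bits of a 32-bit float, which bounds the relative error and leaves trailing zeros that a later packing stage could drop. The function names and the choice of 12 kept bits are assumptions made for illustration.

```python
import struct


def truncate_mantissa(value: float, keep_bits: int) -> float:
    """Round-trip `value` through float32, zeroing the low mantissa bits.

    float32 has 23 explicit mantissa bits; keeping fewer of them bounds
    the relative truncation error by 2**-keep_bits for normal numbers.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    drop = 23 - keep_bits
    # Mask keeps the sign, the exponent, and the top `keep_bits` mantissa bits.
    mask = (0xFFFFFFFF >> drop) << drop
    (approx,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return approx


def max_relative_error(values, keep_bits: int) -> float:
    """Worst-case relative error over the nonzero entries of `values`."""
    worst = 0.0
    for v in values:
        if v != 0.0:
            approx = truncate_mantissa(v, keep_bits)
            worst = max(worst, abs(approx - v) / abs(v))
    return worst


if __name__ == "__main__":
    samples = [3.14159, -2.71828, 1e-3, 12345.678]  # illustrative data only
    print(max_relative_error(samples, keep_bits=12))
```

Whether an error of this size "doesn't affect the results" depends entirely on the application, which is why experiments like those cited above are run per domain rather than assumed.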

Prism FP could someday be integrated into the instruction set (or even the memory controllers) of CPUs and GPUs, following the recent trend of adding dedicated silicon accelerators for video compression and decompression. Such integration could make limited on-chip memories effectively two to 10 times larger and faster.