JPEG 2000 To Arrive In December With Tantalizing Array Of Features

Oct. 30, 2000

As if to trumpet its arrival, JPEG 2000 was the topic of an extraordinary number of the 800 papers presented at the recent IEEE International Conference on Image Processing (ICIP) in Vancouver, Canada. Like JPEG and several of its upgrades, JPEG 2000 is a compressed representation of still images. Yet one primary feature of this new standard can be characterized as "successive refinement": JPEG 2000 can extract a number of different images from a single compressed bit stream.

During web browsing and typical web-page printing, users initially may display a low-resolution version of an image. If they need more detail in a portion of the image, the server can provide the necessary additional data for that region. Or, the image might be reproduced at the next higher resolution.

If a user intends to print that part of the image on a grayscale printer, the server can send a higher-resolution version suited to a 600-dot-per-inch (DPI) printer rather than a 75-DPI screen, and it would send only the grayscale data. Not only is the raw image data compressed, but more importantly, the server selectively transmits just the data the application needs.

In digital camera applications, extremely high-quality—even lossless—pictures could be taken and stored using JPEG 2000 until the memory card is filled. At that point, portions of the compressed data from each image could be discarded, allowing more images to be stored. It's always possible to take "just one more" picture, provided that a decrease in quality in the images already captured is acceptable.

Satellite imagery frequently encounters situations where bandwidth is the single most expensive part of the system. At times, a recognizable image is desired no matter how low the bit rate goes. For these systems, JPEG 2000 is expected to provide the best compression of any publicized standard. What's more, experts say JPEG 2000 will be able to compress satellite images at low bit rates far better than any current standards.

JPEG 2000 comprises three key steps. A discrete wavelet transform first decomposes the image pixels into subbands. The subbands represent different frequency components of the same image. For instance, the smooth or slowly changing low-frequency components are separated from high-frequency components such as abrupt boundaries or edges in the image. Each subband is then partitioned into relatively small blocks known as code blocks. These are independently coded into embedded block bit streams. Finally, the embedded block bit streams are packed into "quality layers" in the code stream.

It is the embedded block coding algorithm at the heart of JPEG 2000 that delivers excellent compression performance. Each code block is coded entirely independently without reference to other blocks in the same or other subbands. Adopting independent embedded block coding for JPEG 2000 has other benefits as well. These independent features let JPEG 2000 users arbitrarily select the contributions made by each code block to each quality layer to reflect their priorities in reconstructing the data. This flexibility in image quality and resolution more than compensates for the cost of restarting the coding process at each block.
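
To make the "embedded" idea concrete, here is a minimal Python sketch. It is not the actual JPEG 2000 block coder, which is considerably more elaborate. It assumes a code block is simply a short list of non-negative integer coefficients and sends them one bit plane at a time, most significant first. The function names and sample values are hypothetical. The point is that a stream cut short after any plane still decodes to a usable, coarser block.

```python
def encode_bitplanes(coeffs, num_planes):
    """Split non-negative integers into bit planes, most significant first."""
    return [[(c >> p) & 1 for c in coeffs]
            for p in range(num_planes - 1, -1, -1)]


def decode_bitplanes(planes, num_planes, length):
    """Rebuild coefficients from however many leading planes were received."""
    values = [0] * length
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i
        for j, bit in enumerate(plane):
            values[j] |= bit << shift
    return values


# Hypothetical 8-bit coefficients standing in for one code block.
block = [200, 13, 97, 255, 0, 64]
planes = encode_bitplanes(block, 8)

full = decode_bitplanes(planes, 8, len(block))        # all 8 planes
coarse = decode_bitplanes(planes[:3], 8, len(block))  # stream cut after 3 planes

# Total absolute error after keeping the first k planes, for k = 1..8.
errors = [
    sum(abs(a - b) for a, b in zip(block, decode_bitplanes(planes[:k], 8, len(block))))
    for k in range(1, 9)
]
print(full, coarse, errors)
```

Because each block's stream can be cut at any plane independently of its neighbors, an encoder or server is free to decide how many planes of each block go into each quality layer.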

Part I of JPEG 2000, the "Core Coding System," has been submitted to JPEG committee members for approval. If they give it the nod, it will become a specification this December. Once it is ratified, the International Organization for Standardization (ISO) will sell copies. Part I limits the available options so that decoders, browsers, printers, and PDAs can all implement them, thereby assuring that a discernible image can be displayed on every device. Part II will consist of "value added" options not required for all implementations.

The original JPEG ISO/ITU-T standard, created in the late 1980s, is "image in, image out" compression. The quantization, resolution, region, and component decisions are all made at the time the image is encoded. Little or no variation on these decisions is possible at the decoder. With JPEG 2000, more flexibility in accessing the image comes into being.

A pyramidal structure will allow extraction of images of different resolutions. Along with the use of redundant copies of the image, this structure will enable zooming and panning through the image. Some of JPEG 2000's features, such as progression by resolution and pixel fidelity, will be selectable at decode time. These features also include region-of-interest decoding and out-of-order decoding. The rate-distortion performance should improve significantly over the original JPEG standard as well. But the credit for the flexibility goes to the wavelet and bit-plane coding technology that's the foundation of JPEG 2000's development.

Central to this development is the concept of wavelets. These tools decompose signals into a hierarchy of increasing resolution. Put simply, the more resolution layers added, the more detailed the image becomes. Wavelets, which are like mathematical microscopes, permit zooming in and out at multiple resolutions. Whereas the familiar discrete cosine transform of JPEG can be considered a spectrum analyzer, a wavelet transform can be perceived as a microscope.

This relatively young technique employs filter banks in which the signal is decomposed into its high- and low-frequency components, and downsampled by a factor of two. Designers reconstruct the original image by upsampling by a factor of two and then recombining the subbands. As it turns out, it's possible to achieve perfect reconstruction with implementable filters. But the use of wavelets is one of many techniques in the set of tools that are employed in the vast domain of compression technology.

Compression's Importance

Compression has been here for some time. Since it's invisible, it's easy to forget its large role in our daily lives. This became clear in a tutorial about compression presented at ICIP by Michelle Effros, associate professor of electrical engineering at the California Institute of Technology in Pasadena. Her presentation, titled "Data Compression Demystified," brought forth some astonishing numbers.

In a world without compression, an 8.5- by 11-in. page of text scanned at 300 pixels/in., with 1-bit assignment per pixel, would require 8.4 Mbits of memory to store and 15 minutes to transmit over a 9600-baud modem. Also, a single photo taken with a 35-mm negative, scanned at 12-µm resolution, would eat up 233 Mbits. Without compression, storage would be devoured at extraordinary rates, bandwidth would be hogged by the transmission of rather mundane documents, and transmissions would take far longer.
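
The scanned-page figures are easy to check. The short Python calculation below reproduces them under one assumption not stated in the talk: that the 9600-baud modem delivers 9600 bits per second with no protocol overhead. The variable names are illustrative.

```python
# Page dimensions and scan settings taken from the text.
width_in, height_in = 8.5, 11
pixels_per_inch = 300
bits_per_pixel = 1

# Assumption: 9600 baud carries 9600 payload bits per second.
modem_bits_per_second = 9600

width_px = int(width_in * pixels_per_inch)    # 2550 pixels
height_px = int(height_in * pixels_per_inch)  # 3300 pixels

page_bits = width_px * height_px * bits_per_pixel
page_mbits = page_bits / 1_000_000
transmit_minutes = page_bits / modem_bits_per_second / 60

print(f"{page_mbits:.1f} Mbits, {transmit_minutes:.1f} minutes")
```

The result is about 8.4 Mbits and a little under 15 minutes, in line with the numbers quoted.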

Successful image processing relies heavily upon compression, and compression in turn depends on finding good algorithms. Compression addresses one of the two central challenges of communications: information representation. (The other is information transmission.) One of the fundamental keys to compression is that it exploits redundancies, taking advantage of patterns in data to describe common events efficiently. Lossy compression also introduces acceptable inaccuracies, removing information that viewers can't see and listeners can't hear.

The challenge, then, is choosing the right algorithm for a given application and achieving the best possible performance with a given code. But as Effros points out, "Theory tells us how well we can hope to do it, but it does not tell us how to do it."

Compression's challenge is realizing the best possible tradeoff between description length in bits per symbol (R) and reproduction fidelity, or distortion (D), which can be thought of as a measure of lack of description fidelity. Effros warns that compression algorithms don't always work. There exist source distributions for which no compression algorithm will deliver a decrease in the expected file size. She notes, though, that these are not types of distributions encountered in typical data sets of human interest.
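
A quick experiment illustrates the warning. The Python sketch below runs the standard library's `zlib` coder, used here only as a convenient stand-in for "a compression algorithm," on two hypothetical inputs: pseudo-random bytes, which have no patterns to exploit, and a highly repetitive byte string. It illustrates the claim for one coder and does not prove it for all of them.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)


# Seeded generator so the run is repeatable; the seed value is arbitrary.
rng = random.Random(2000)
noise = bytes(rng.getrandbits(8) for _ in range(50_000))

# Repetitive data of the same length, full of exploitable redundancy.
text = b"JPEG 2000 " * 5_000

noise_ratio = compression_ratio(noise)
text_ratio = compression_ratio(text)
print(f"random bytes: {noise_ratio:.3f}, repetitive text: {text_ratio:.3f}")
```

The repetitive input shrinks to a small fraction of its size, while the random input stays essentially the same size, since there is no redundancy for the coder to remove.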

In lossless compression, the goal is to minimize R subject to the constraint that D must equal zero. But in lossy compression, there are three options. In the first option, the designer must minimize R, subject to the constraint that D is less than some D_MAX. The objective here is to obtain the lowest possible rate for a desired reproduction fidelity. Second, designers must minimize D, subject to the constraint that R must be less than or equal to R_MAX. Here, the goal is to get the lowest possible distortion for an acceptable file size. Or third, designers can find a compromise between these two figures of merit (see the figure).

In this third option, D + λR is minimized for some slope, −λ. This recognizes that the user's perception of system performance may rely less on absolute quality metrics, like D_MAX or file sizes like R_MAX, and more on some measure of how much quality improvement would result from an incremental increase in rate. Optimizing D + λR is equivalent to saying "keep increasing the rate R as long as the benefit from increasing the rate is large." The parameter λ defines how fast the improvement must be in order to be considered worthwhile by the system user.
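
The third option can be sketched as a small search. The Python example below assumes a coder offers a handful of operating points, each a (rate, distortion) pair, and picks the one that minimizes D + λR. The points and the function name are hypothetical values chosen for illustration, not measurements from any real coder.

```python
def best_operating_point(points, lam):
    """Return the (rate, distortion) pair that minimizes D + lam * R."""
    return min(points, key=lambda p: p[1] + lam * p[0])


# Hypothetical operating points: (R in bits per symbol, D as an error measure).
points = [(0.25, 100.0), (0.5, 60.0), (1.0, 30.0), (2.0, 12.0), (4.0, 8.0)]

# A large lambda makes rate expensive; a small lambda makes it cheap.
choice_stingy = best_operating_point(points, 200)
choice_middle = best_operating_point(points, 50)
choice_generous = best_operating_point(points, 1)

print(choice_stingy, choice_middle, choice_generous)
```

With λ = 200 the search settles on the lowest rate, because no step up in rate buys enough quality. With λ = 1 it goes all the way to the highest rate, because even the small final improvement is judged worthwhile.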

Nonetheless, the problem is that the interplay between rate, distortion, and complexity isn't well understood. Treating multiple pixels as a single vector leads to better performance than coding the corresponding pixels separately. The higher the dimension, the better the performance. But even the best code at dimension K isn't guaranteed to be better than an arbitrary (suboptimal) code at dimension K + 1. The objective, then, is to design suboptimal (but clever!) high-dimensional codes with low computational complexity. One of the tricks of the trade, as Effros puts it, is to take advantage of the correlation in the data: remove the correlation, code the resulting uncorrelated data with a simple code, and then add the correlation back in.
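
A small Python sketch shows why this trick pays off. It assumes neighbor-to-neighbor differencing as the decorrelating step, which is a simple stand-in and not the wavelet approach JPEG 2000 takes. It uses first-order empirical entropy as a rough proxy for what a simple symbol-by-symbol code would cost. The pixel row and the function names are hypothetical.

```python
from collections import Counter
from itertools import accumulate
from math import log2


def entropy_bits(symbols):
    """First-order empirical entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(n / total * log2(n / total) for n in counts.values())


# Hypothetical row of slowly rising pixel values (strongly correlated).
pixels = [100 + i + (1 if i % 3 == 0 else 0) for i in range(64)]

# Remove the correlation: keep the first pixel, then only the differences.
first = pixels[0]
diffs = [b - a for a, b in zip(pixels, pixels[1:])]

# Add the correlation back in: a running sum restores the original row.
restored = list(accumulate([first] + diffs))

raw_entropy = entropy_bits(pixels)
diff_entropy = entropy_bits(diffs)
print(f"raw: {raw_entropy:.2f} bits/symbol, differences: {diff_entropy:.2f} bits/symbol")
```

The differences take only a few distinct values, so a simple code needs far fewer bits per symbol for them than for the raw pixels, and the running sum recovers the row exactly.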

For more about JPEG 2000, go to www.jpeg.org.