All compression algorithms use two basic functions. The first removes redundant or irrelevant information; redundancy removal can be lossless or lossy. The second function, bit packing, packs the redundancy remover’s output bits together in a lossless manner.
Redundancy occurs when a signal stream contains more bits than required to represent the inherent information it carries. For instance, telephone calls could transmit the full 20-kHz bandwidth that humans hear. But in the 1960s, telephone engineers decided that 3 kHz of bandwidth was enough to represent speech adequately, because both the words and the speaker remained recognizable.
Speech compression advanced further in the 1980s when mathematical models such as linear predictive coding (LPC) and code-excited linear prediction (CELP) enabled speech coders to send numerical descriptions of speech waveforms, instead of the speech samples themselves. Speech samples contain redundancy that a mathematical description reduces while retaining intelligibility.
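The predictive idea behind LPC can be illustrated with a toy sketch. This is not a real LPC or CELP coder; it uses a hypothetical fixed two-tap predictor with made-up coefficients simply to show that, for a smooth waveform, the prediction residual carries far less energy than the samples themselves, which is why sending a model plus residual beats sending raw samples.

```python
import math

# Hypothetical fixed 2-tap predictor (illustrative only, not real LPC/CELP):
# each sample is predicted as a1*x[n-1] + a2*x[n-2], and only the smaller
# prediction residual would need to be transmitted.
def predict_residuals(samples, a1=1.8, a2=-0.9):
    residuals = []
    for n in range(len(samples)):
        x1 = samples[n - 1] if n >= 1 else 0.0
        x2 = samples[n - 2] if n >= 2 else 0.0
        residuals.append(samples[n] - (a1 * x1 + a2 * x2))
    return residuals

# A smooth, speech-like sinusoid: the predictor tracks it closely,
# so the residual energy is a small fraction of the signal energy.
signal = [math.sin(2 * math.pi * 0.02 * n) for n in range(200)]
res = predict_residuals(signal)
signal_energy = sum(x * x for x in signal)
residual_energy = sum(e * e for e in res)
print(residual_energy < 0.05 * signal_energy)   # → True
```

A real LPC coder would instead solve for the predictor coefficients adaptively from each frame of speech, but the payoff is the same: the residual needs fewer bits than the samples.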
Redundancy removers for speech are lossy: as long as the speech sounds the same, the parties talking on the phone are satisfied, even though the original samples have changed. In contrast, because an error in even a single character or byte of a computer file can have disastrous consequences, redundancy removers for computer files must be lossless.
IN THE COMPUTER WORLD
In 1977, Abraham Lempel and Jacob Ziv of Israel’s Technion developed a text compression algorithm with a powerful text redundancy remover that still works well for computer files such as spreadsheets, databases, and programs (executables).
The Lempel-Ziv algorithm combines a dictionary of frequently occurring phrases with a character-matching algorithm to shrink many text files to half their original size. Patterns in ASCII text can be exploited to make computer files even smaller. The dictionary is searched for the entry having the longest character match, and a pointer to that dictionary entry is sent along with a character-match count, replacing the original text.
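The search-for-longest-match idea can be sketched in a few lines. The following is a toy LZ77-style coder, not the exact 1977 algorithm: it uses the recent text itself as the “dictionary” (a sliding window), finds the longest match at each position, and emits (offset, length, next-character) tokens in place of the raw text.

```python
# Toy LZ77-style redundancy remover (illustrative sketch, not the exact
# Lempel-Ziv algorithm): replace repeated text with back-references.
def lz77_compress(text, window=255):
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Matches may overlap the current position (self-referential).
            while i + k < len(text) - 1 and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        nxt = text[i + best_len]          # literal that follows the match
        tokens.append((best_off, best_len, nxt))
        i += best_len + 1
    return tokens

def lz77_decompress(tokens):
    out = []
    for off, length, nxt in tokens:
        for _ in range(length):
            out.append(out[-off])         # copy from the sliding window
        out.append(nxt)
    return "".join(out)

sample = "abracadabra abracadabra abracadabra"
tokens = lz77_compress(sample)
assert lz77_decompress(tokens) == sample  # round-trips losslessly
print(len(tokens), "tokens for", len(sample), "characters")
```

Repetitive input collapses to a handful of tokens, while the decompressor reconstructs the text exactly, which is the lossless behavior computer files require.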
Redundancy removers for digitized analog waveforms (DAWs), such as those that analog-to-digital converters (ADCs) generate, exploit three universal characteristics of sampled data signals. Examples of DAWs, whose redundancy can be effectively reduced, include radar, sonar, x-ray, imaging, oscilloscope, and seismic signals.
First, DAW signals often have a time-varying dynamic range (see the figure, a). A DAW redundancy remover would not deliver N bits all the time, but instead would provide time-varying sample resolution. Second, DAW signals are regularly oversampled when a signal with bandwidth B is sampled by an ADC operating well above the minimum Nyquist sample rate of 2B.
Oversampling introduces “white space” in the spectrum (see the figure, b). A DAW redundancy remover downsamples signals whenever possible to reduce white space. Third, ADCs have an inherent, measured noise floor. An ADC that provides N bits per sample (see the figure, c) usually has between N – 1 and N – 2 “effective” bits (effective number of bits, or ENOB).
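The downsampling step can be made concrete with a small sketch using hypothetical numbers: a signal of bandwidth B sampled at 8B carries four times as many samples as the Nyquist rate of 2B requires, so a redundancy remover can decimate by 4 before bit packing.

```python
import math

fs = 8000.0                     # ADC sample rate, Hz (assumed)
B = 1000.0                      # signal bandwidth, Hz (assumed)
decimate = int(fs / (2 * B))    # oversampling factor relative to Nyquist: 4

# Band-limited test tone at 400 Hz, well inside bandwidth B, so no
# anti-alias filtering is needed before decimating in this sketch.
x = [math.sin(2 * math.pi * 400.0 * n / fs) for n in range(4096)]
y = x[::decimate]               # keep every 4th sample: 4x fewer bits to pack

print(len(x), "->", len(y), "samples")
```

A production decimator would apply an anti-alias low-pass filter first in case energy exists above the new Nyquist frequency; the sketch skips that because the tone is already band-limited.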
ENOB is an ADC measurement. A DAW redundancy remover should monitor and preserve the ADC’s ENOB, not its resolution. In this third case, irrelevancy is removed, not redundancy. DAW customers have the choice of lossless and lossy redundancy removers, depending on their end-to-end system requirements.
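The ENOB figure mentioned above follows the standard relation from ADC testing practice (IEEE Std 1241): ENOB = (SINAD − 1.76 dB) / 6.02 dB, where SINAD is the measured signal-to-noise-and-distortion ratio. A short sketch, using a hypothetical 16-bit ADC measurement:

```python
# Standard ENOB relation from ADC test practice (IEEE Std 1241):
# each effective bit is worth about 6.02 dB of SINAD, with a 1.76-dB
# offset from the ideal quantization-noise model.
def enob(sinad_db):
    return (sinad_db - 1.76) / 6.02

# A nominally 16-bit ADC with a measured SINAD of 86 dB (hypothetical
# value) delivers about 14 effective bits -- consistent with the usual
# N-1 to N-2 range cited above.
print(round(enob(86.0), 1))   # → 14.0
```

This is why a DAW redundancy remover that preserves 14 effective bits, rather than all 16 converter bits, discards only irrelevancy, not information.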