HPC And “Big Data” Apps Tap Floating-Point Number Compression

Jan. 9, 2012
How compressing big floating-point datasets can ease the I/O bottlenecks that throttle multicore systems.

Figure: Floating-point values contain three fields: a sign bit, exponent bits, and significand (mantissa) bits. The IEEE-754 standard defined a common floating-point format that most processor vendors implemented. (courtesy of IEEE Std. 754-2008)

Text and integer numbers in the form of audio, speech, image, and video files have been the target of innumerable compression algorithms. Floating-point numbers, though, have drawn the proverbial short stick when it comes to compression research.

With the rise of high-performance computing (HPC) and so-called “big data” applications in seismology, physics, meteorology, and genomics, floating-point values are becoming more prevalent. Big data is the popular term for databases that hold many terabytes (10¹² bytes) of data, often in numerical form.

In HPC, big data is processed, searched, summarized, and visualized by thousands of microprocessor cores. Unlike many business and library text databases, HPC datasets contain numerical data—integer and floating-point values. The most common scientific datatype is the 32-bit floating-point number. 

With the explosion of mobile devices and ubiquitous sensing, companies and governments are collecting more real-world data than ever, in fields ranging from satellite tracking of illicit activity and climate monitoring to astronomy, energy exploration, and drug discovery.

While sensor data is first captured in integer form as the output of an analog-to-digital converter (ADC), it is commonly converted to floating-point form, simply because floats have a much wider dynamic range than integers and are easier for software to manipulate.
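As a rough illustration (a hypothetical 12-bit ADC and scale factor, not taken from the article), the C fragment below converts raw integer samples to 32-bit floats; once in float form, the values share a single representation whose range (roughly ±10³⁸) dwarfs anything a 12- or 16-bit integer can hold:

#include <stddef.h>
#include <stdio.h>
#include <stdint.h>

/* Hypothetical example: convert raw 12-bit ADC counts to volts as
 * 32-bit floats. The 2.5-V reference is made up for illustration;
 * real sensors supply their own calibration constants. */
static void adc_to_float(const uint16_t *counts, float *volts, size_t n)
{
    const float volts_per_count = 2.5f / 4096.0f;
    for (size_t i = 0; i < n; i++)
        volts[i] = (float)counts[i] * volts_per_count;
}

int main(void)
{
    uint16_t raw[4] = { 0, 1, 2048, 4095 };
    float v[4];
    adc_to_float(raw, v, 4);
    for (int i = 0; i < 4; i++)
        printf("count %4u -> %.6f V\n", raw[i], v[i]);
    return 0;
}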

Floating-Point Values

Floating-point values comprise three fields: a sign bit, some exponent bits, and significand or mantissa bits (see the figure). In the 1970s, microprocessor vendors developed proprietary floating-point formats that led to incompatibilities in data representation, causing the same scientific source code (typically written in FORTRAN) to generate different results on different processors.

The IEEE resolved this incompatibility issue in 1985 by ratifying the IEEE-754 standard, which defined a common floating-point format that most processor vendors implemented. The IEEE-754 standard specifies 32-bit floats with 1 sign bit, 8 exponent bits, and 23 mantissa bits.
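For readers who want to see that layout concretely, the short C fragment below (a sketch, assuming float is an IEEE-754 binary32 value) pulls the three fields out of a 32-bit float:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Split an IEEE-754 binary32 value into its three fields:
 * 1 sign bit, 8 exponent bits, 23 mantissa (significand) bits. */
static void print_fields(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           /* reinterpret the 32 bits */
    uint32_t sign     = bits >> 31;           /* bit 31 */
    uint32_t exponent = (bits >> 23) & 0xFF;  /* bits 30..23, biased by 127 */
    uint32_t mantissa = bits & 0x7FFFFF;      /* bits 22..0 */
    printf("%g: sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
           f, sign, exponent, (int)exponent - 127, mantissa);
}

int main(void)
{
    print_fields(1.0f);      /* sign=0, exponent=127, mantissa=0 */
    print_fields(-0.15625f); /* sign=1, exponent=124, mantissa=0x200000 */
    return 0;
}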

Floating-point values in the IEEE-754 format are hard to compress because the mantissa bits follow an unusual statistical distribution described by Benford’s law. In 1938 Frank Benford, a physicist at General Electric, observed that numbers looked up in logarithm tables were far more likely to begin with 1, 2, or 3 than with 8 or 9, a pattern that holds for many real-world datasets. The result is that floating-point mantissas, which make up the bulk of a float’s bits, follow a broad, skewed distribution with few of the repeating patterns that compression algorithms exploit.
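Benford’s law assigns a leading decimal digit d the probability log10(1 + 1/d), so a 1 leads about 30% of the time while a 9 leads under 5% of the time. The small C program below (added here for illustration; it is not from the article) prints the distribution:

#include <stdio.h>
#include <math.h>

/* Benford's law: the probability that a value's leading decimal digit
 * is d equals log10(1 + 1/d). */
int main(void)
{
    for (int d = 1; d <= 9; d++)
        printf("P(leading digit = %d) = %.3f\n", d, log10(1.0 + 1.0 / d));
    return 0;
}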

Floating-Point Compression

Peter Lindstrom and Martin Isenburg, then at Lawrence Livermore National Laboratory, published a paper in 2006 on the lossless compression of scientific floating-point data, covering datasets such as unstructured meshes, point sets, images, and voxel grids. Rather than aiming for the highest possible lossless compression ratio, Lindstrom and Isenburg designed a software compression algorithm that could operate at the I/O rates of the time.

This design goal matters: an HPC system won’t generate results any faster if compression can’t keep pace with I/O. Their algorithm predicts each new floating-point value with a Lorenzo predictor and then entropy-encodes the difference between the predicted and actual values using an integer variant of arithmetic coding. The Lindstrom/Isenburg algorithm achieved an average lossless compression ratio of 1.5:1 at a rate of 20 Mbytes/s (5 Mfloats/s).
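The C sketch below illustrates the general predict-then-encode idea; it is not the Lindstrom/Isenburg implementation. It uses a trivial previous-value predictor on a 1-D stream (their Lorenzo predictor generalizes this to multidimensional grids), XORs the predicted and actual bit patterns, and counts the leading zero bits of the residual. A good prediction leaves mostly zeros, which an entropy coder can then squeeze out:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Illustrative predict-then-encode sketch (not the published algorithm):
 * predict each float as the previous one, XOR the bit patterns, and
 * report how many leading bits of the residual are zero. An entropy
 * coder would spend very few bits on those zero runs. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static int leading_zeros(uint32_t x)
{
    int n = 0;
    for (uint32_t mask = 1u << 31; mask && !(x & mask); mask >>= 1)
        n++;
    return n;
}

int main(void)
{
    /* Smooth scientific data: neighboring values are highly correlated. */
    float data[] = { 1.000f, 1.001f, 1.002f, 1.004f, 1.007f, 1.011f };
    uint32_t predicted = float_bits(data[0]);   /* seed the predictor */
    for (int i = 1; i < 6; i++) {
        uint32_t actual   = float_bits(data[i]);
        uint32_t residual = actual ^ predicted; /* mostly-zero high bits */
        printf("value %.3f: residual has %d leading zero bits\n",
               data[i], leading_zeros(residual));
        predicted = actual;                     /* previous-value predictor */
    }
    return 0;
}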

Improvements To Floating-Point Compression

Today, chip designers at companies like Intel, Nvidia, IBM, and ARM are aware that their multicore designs are hitting the memory wall (see “The Memory Wall Is Ending Multicore Scaling” at electronicdesign.com). Memory, bus, and disk bandwidth limitations significantly reduce the benefit of adding compute cores.

If floating-point compression and decompression are to keep up with today’s Gbyte/s I/O rates, the compression algorithms that reduce multicore I/O bottlenecks will have to be significantly accelerated in software or implemented in hardware.

If hardware acceleration were to provide compression and decompression, the block would ideally accept both floating-point and integer values and would support fast lossless compression as well as lossy modes in which users specify either the desired compression ratio or the decompressed data quality. With these improvements, numerical compression could flexibly accelerate the I/O rates that degrade the throughput of many multicore applications.
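One simple way such a block could trade accuracy for compression ratio, sketched below as an assumption rather than a description of any shipping product, is to zero a user-selected number of low-order mantissa bits before lossless coding; the cleared bits compress to almost nothing, and the user controls the worst-case relative error:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Illustrative lossy preprocessing step (an assumption, not a product
 * feature): clear the low-order mantissa bits of a binary32 value.
 * Keeping k of the 23 mantissa bits bounds the relative error below
 * about 2^-k, and the cleared bits cost almost nothing to compress
 * losslessly afterward. */
static float truncate_mantissa(float f, int keep_bits)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    uint32_t mask = ~((1u << (23 - keep_bits)) - 1u);  /* keep top bits */
    u &= mask;
    memcpy(&f, &u, sizeof f);
    return f;
}

int main(void)
{
    float x = 3.14159265f;
    for (int keep = 23; keep >= 8; keep -= 5)
        printf("keep %2d mantissa bits: %.8f\n", keep, truncate_mantissa(x, keep));
    return 0;
}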
