What’s the Difference Between Fixed-Point, Floating-Point, and Numerical Formats? (.PDF Download)

Aug. 31, 2017

Embedded C and C++ programmers are familiar with signed and unsigned integers and floating-point values of various sizes, but a number of other numerical formats can be used in embedded applications. Here we take a look at these formats and where they might be found.

One reason for examining different formats is to understand how they work and where they can be applied. For example, fixed-point values can often be used when floating-point support isn’t available. Fixed point may even be preferable in some instances where floating-point support is available, for reasons such as precision or representation.
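As a rough illustration of how fixed point can stand in for floating point, the sketch below stores values in a 32-bit integer with 16 fractional bits. The Q16.16 layout and the names `q16_16_t`, `q16_16_from_int`, and `q16_16_mul` are assumptions chosen for this example, not something prescribed by the article or by any particular library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 format: 16 integer bits, 16 fractional bits. */
typedef int32_t q16_16_t;

#define Q16_16_FRAC_BITS 16
#define Q16_16_ONE ((q16_16_t)1 << Q16_16_FRAC_BITS) /* 1.0 == 65536 */

/* Convert a small integer to Q16.16 by scaling it by 2^16. */
static q16_16_t q16_16_from_int(int16_t n) {
    return (q16_16_t)n * Q16_16_ONE;
}

/* Addition is plain integer addition (no overflow check in this sketch). */
static q16_16_t q16_16_add(q16_16_t a, q16_16_t b) {
    return a + b;
}

/* Multiplication: widen to 64 bits, then remove the extra 2^16 scale.
 * Division is used instead of a right shift so that negative results
 * truncate toward zero in a well-defined way. */
static q16_16_t q16_16_mul(q16_16_t a, q16_16_t b) {
    int64_t wide = (int64_t)a * (int64_t)b;
    int64_t scaled = wide / Q16_16_ONE;
    assert(scaled >= INT32_MIN && scaled <= INT32_MAX); /* result must fit */
    return (q16_16_t)scaled;
}
```

With this layout, 1.5 is stored as 98304 and 2.5 as 163840, and `q16_16_mul` of the two yields 245760, which is 3.75 scaled by 65536. Only integer instructions are involved, which is why the approach suits processors without a floating-point unit.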

Developers may be using single- and double-precision IEEE 754 standard formats, but what about 16-bit half precision or even 8-bit floating point? The latter is being used in deep neural networks (DNNs), where compact values with limited precision are often sufficient. Small integers and fixed point can be used with DNN weights as well, depending on the application and hardware.
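To make the 16-bit case concrete, here is a minimal sketch that decodes an IEEE 754 half-precision bit pattern (1 sign bit, 5 exponent bits with a bias of 15, 10 fraction bits) into a C `float`. The helper names `scale_pow2` and `half_to_float` are made up for this example; real projects would more likely rely on compiler or hardware support where it exists.

```c
#include <math.h>   /* NAN and INFINITY macros only */
#include <stdint.h>

/* Multiply x by 2^e using repeated exact doubling or halving. */
static float scale_pow2(float x, int e) {
    while (e > 0) { x *= 2.0f; e--; }
    while (e < 0) { x *= 0.5f; e++; }
    return x;
}

/* Decode a half-precision bit pattern into a float. */
static float half_to_float(uint16_t h) {
    int sign = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int fraction = h & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or subnormal: fraction * 2^-10 * 2^-14 */
        value = scale_pow2((float)fraction, -24);
    } else if (exponent == 31) {
        /* All-ones exponent encodes infinity or NaN. */
        value = fraction ? NAN : INFINITY;
    } else {
        /* Normal: (1 + fraction/1024) * 2^(exponent - 15) */
        value = scale_pow2((float)(fraction + 1024), exponent - 25);
    }
    return sign ? -value : value;
}
```

For instance, the pattern `0x3C00` decodes to 1.0 and `0x7BFF`, the largest finite half-precision value, decodes to 65504.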

There are a variety of ways to represent numbers. However, the layouts tend to vary only in the number of bits involved (see the figure). The meaning of the sign bit in binary-encoded values differs depending on whether 1’s or 2’s complement encoding is used. The 1’s complement approach negates a value by inverting all of its bits, which means there is actually a positive and negative zero value. A 2’s complement number has a single zero value, but there’s one more negative value than positive value. For example, an 8-bit signed integer includes values −128 to −1, 0, and 1 to 127.
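The difference between the two encodings can be sketched in a few lines of C. In 1’s complement, a negative value is the bitwise inverse of its positive counterpart, while 2’s complement is equivalent to reducing the value modulo 256 for an 8-bit field. The function names below are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* 8-bit 2's complement pattern of v (valid for -128..127).
 * Conversion to an unsigned type is defined as reduction modulo 256. */
static uint8_t twos_complement_bits(int v) {
    return (uint8_t)v;
}

/* 8-bit 1's complement pattern of v (valid for -127..127).
 * A negative value is the bitwise inverse of its magnitude. */
static uint8_t ones_complement_bits(int v) {
    return v >= 0 ? (uint8_t)v : (uint8_t)~(unsigned)(-v);
}

/* Decode an 8-bit 1's complement pattern back to an int.
 * Both 0x00 and 0xFF decode to zero: the two zero encodings. */
static int ones_complement_value(uint8_t bits) {
    return (bits & 0x80) ? -(int)(uint8_t)~bits : (int)bits;
}
```

Here −1 encodes as `0xFF` in 2’s complement but as `0xFE` in 1’s complement, where `0xFF` is the negative zero. The pattern `0x80` is −128 in 2’s complement and −127 in 1’s complement, which accounts for the extra negative value in the 2’s complement range.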
