Add AI Acceleration with This Tiny Coprocessor
What you’ll learn:
· How to add artificial intelligence acceleration to a host microcontroller.
· How to add always-listening keyword detection using less than 100 mW.
Artificial-intelligence (AI) developers are improving the performance of AI and machine-learning (ML) models through a range of techniques, as models like DeepSeek have demonstrated. Exploiting model sparsity is one of these methods. While much of this focus is on high-end, cloud-based solutions, the approach is equally applicable to low-power embedded solutions.
I recently talked with Sam Fok, CEO at Femtosense, about how sparsity and other techniques (Fig. 1) enable the company to provide very low-power hardware for AI edge computing. Sparse matrices are common in machine-learning models because many of the weights are zero or close to zero. Skipping the arithmetic operations for those weights can reduce computational overhead by a factor of 100 or more.
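To illustrate why sparsity helps, here's a minimal sketch (not Femtosense's implementation) of a compressed-sparse-row (CSR) matrix-vector product in Python. Only the nonzero weights are stored and multiplied, so the work scales with the number of nonzeros rather than the full matrix size:

```python
import numpy as np

def sparse_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR-encoded sparse weight matrix by vector x.

    Only nonzero weights are stored and multiplied, so a 95%-sparse
    layer does roughly 1/20th the work of a dense matrix-vector product.
    """
    y = np.zeros(len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y

# A 4x4 weight matrix with one nonzero per row, in CSR form.
values  = np.array([0.5, -1.2, 0.8, 2.0])  # nonzero weights only
col_idx = np.array([2, 0, 3, 1])           # column of each nonzero
row_ptr = np.array([0, 1, 2, 3, 4])        # start of each row in values
x = np.array([1.0, 2.0, 3.0, 4.0])
print(sparse_matvec(values, col_idx, row_ptr, x))  # [ 1.5 -1.2  3.2  4. ]
```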
Femtosense’s hardware is based on a sparse processing unit (SPU). This neural processing unit (NPU) is optimized to handle sparse data, often consuming under 1 mW. The SPU-001 (Fig. 2) uses an SPI interface to connect to a host processor. The evaluation board contains an SPU-001 and plugs into the PMOD connector found on many processor evaluation boards. Femtosense also offers a single-chip solution: the AI-ADAM100, which incorporates a Cortex-M0+ core and an SPU.
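As a rough sketch of what driving such a coprocessor over SPI could look like from a Linux-class host, the following Python uses the spidev library. The command bytes and transaction framing are hypothetical placeholders, not the actual SPU-001 protocol, which is defined in Femtosense's documentation:

```python
import spidev  # Linux userspace SPI driver (e.g., on a Raspberry Pi host)

# Hypothetical command bytes -- placeholders for illustration only.
CMD_WRITE_INPUT   = 0x01
CMD_RUN_INFERENCE = 0x02
CMD_READ_OUTPUT   = 0x03

spi = spidev.SpiDev()
spi.open(0, 0)               # SPI bus 0, chip-select 0
spi.max_speed_hz = 8_000_000
spi.mode = 0

def run_inference(samples: bytes, out_len: int) -> list[int]:
    """Push one frame of input data to the coprocessor, trigger the
    model, and clock out the result, all over the 4-wire SPI link."""
    spi.xfer2([CMD_WRITE_INPUT] + list(samples))
    spi.xfer2([CMD_RUN_INFERENCE])
    # First returned byte is clocked out during the command byte; drop it.
    return spi.xfer2([CMD_READ_OUTPUT] + [0x00] * out_len)[1:]
```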
The company’s software tools accept models from popular AI/ML frameworks like PyTorch and TensorFlow. The tools include a software simulator that reports power requirements, latency, and memory footprint. The SPU-001 includes 1 MB of on-chip SRAM.
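Femtosense's own toolchain isn't shown here, but a sparse model of the kind such tools ingest can be produced with PyTorch's built-in pruning utilities. A minimal sketch, with illustrative layer sizes:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small keyword-spotting-sized network; layer sizes are illustrative.
model = nn.Sequential(
    nn.Linear(40, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Zero out 90% of each layer's weights by magnitude, then make the
# pruning permanent so the zeros survive export.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")  # roughly 90% of weights are zero
```

The fraction of zero weights, together with the compressed encoding, determines whether a model fits within the SPU-001's 1 MB of SRAM.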
The SPU can handle a range of applications, but the focus is on audio applications such as keyword detection. Its low power requirements make an always-listening mode practical even on battery power. Currently, the SPU can be found in some earbud applications.
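To put the power budget in perspective, a back-of-the-envelope calculation (the battery capacity and draw below are assumptions for illustration, not vendor figures):

```python
# Assume a typical 60-mAh earbud cell at 3.7 V and a 1-mW
# always-listening budget for the keyword-detection path.
capacity_mwh = 60 * 3.7            # ~222 mWh of stored energy
listen_power_mw = 1.0              # SPU-class draw
print(capacity_mwh / listen_power_mw, "hours")  # ~222 hours of listening
```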