AI Accelerator IP Family Is Flexible and Future-Proof
CEVA unveiled its enhanced NeuPro-M neural processing unit (NPU) family for AI inferencing workloads. The NeuPro-M NPU architecture is specifically designed to handle the transformer networks at the heart of large language models (LLMs) and other advanced AI models.
According to CEVA, the accelerator IP also delivers the right performance, power efficiency, cost, latency, and memory profile to handle more traditional machine-learning models, including convolutional neural networks (CNNs).
The power-efficient NeuPro-M delivers up to 350 tera-operations per second per watt (TOPS/W) at a 3-nm node, and it can process more than 1.5 million tokens per second per watt when running LLM inference.
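For context, those headline figures translate into a straightforward energy budget per token. The short sketch below is a back-of-the-envelope check using only the numbers quoted above, assuming both are best-case peak values.

```python
# Back-of-the-envelope check of the quoted efficiency figures.
# Assumption: the 350 TOPS/W and 1.5M tokens/s/W numbers are peak,
# best-case values taken directly from the article.

TOPS_PER_WATT = 350e12           # operations per joule (350 TOPS/W)
TOKENS_PER_SEC_PER_WATT = 1.5e6  # tokens per joule (1.5M tokens/s/W)

energy_per_token_j = 1.0 / TOKENS_PER_SEC_PER_WATT
ops_per_token = TOPS_PER_WATT / TOKENS_PER_SEC_PER_WATT

print(f"Energy per token: {energy_per_token_j * 1e6:.2f} uJ")    # ~0.67 uJ
print(f"Implied ops per token: {ops_per_token / 1e6:.0f} M ops") # ~233 M
```

At those rates, each token costs roughly 0.67 µJ, which implies a budget of about 233 million operations per token at peak efficiency.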
CEVA said the NeuPro-M IP is flexible and future-proof thanks to its integrated vector processing unit (VPU), which supports future network layers. The architecture also supports any activation and data flow, with true sparsity for data and weights that enables up to 4× acceleration in performance. As a result, it’s flexible enough to be used in cost- and space-constrained edge devices as well as in the latest server-class processors for the data center—and everything in between.
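To see where an "up to 4×" figure can come from, consider a dot product in which roughly half the activations and half the weights are zero: only about a quarter of the multiply-accumulates contribute, so skipping the rest yields an ideal 4× speedup. The NumPy sketch below illustrates that counting argument only; it is not CEVA's hardware mechanism.

```python
import numpy as np

# Minimal illustration of dual (data + weight) sparsity: skip any
# multiply-accumulate (MAC) where either operand is zero.

rng = np.random.default_rng(0)
acts = rng.standard_normal(1024) * (rng.random(1024) > 0.5)     # ~50% zero activations
weights = rng.standard_normal(1024) * (rng.random(1024) > 0.5)  # ~50% zero weights

useful = (acts != 0) & (weights != 0)  # MACs that actually contribute
result = np.dot(acts[useful], weights[useful])

speedup = acts.size / useful.sum()  # ideal speedup if skipped MACs are free
print(f"Executed {useful.sum()} of {acts.size} MACs; ideal speedup ~{speedup:.1f}x")
```

With about 50% sparsity on each side, only ~25% of the MACs survive, which lines up with the claimed 4× upper bound.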
Beyond the existing NPM11 and NPM18, the NeuPro-M family adds new NPM12 and NPM14 NPU cores, with two and four NeuPro-M engines, respectively, making it easier to scale up to higher-performance AI workloads.
Accompanying the NeuPro-M NPU IP is a new development toolset based on CEVA's AI compiler, CDNN. The compiler is "architecture-aware," so customers can fully exploit the NeuPro-M's processing engines.
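As an illustration of what "architecture-aware" compilation means in principle, a compiler can place each layer of a network graph on the execution unit best suited to it, falling back to a general-purpose unit such as the VPU for unsupported or future layers. The sketch below is entirely hypothetical; the names and heuristics are invented for illustration and are not CDNN's actual API or internals.

```python
# Hypothetical sketch of architecture-aware layer placement.
# The engine names and the op-to-engine table are invented for
# illustration; they do not reflect CDNN or NeuPro-M internals.

ENGINE_FOR_OP = {
    "conv2d":    "mac_array",        # dense convolution -> parallel MAC grid
    "matmul":    "mac_array",
    "attention": "transformer_unit",
    "softmax":   "vpu",              # custom/future layers fall back to the VPU
}

def assign_engines(graph):
    """Return a per-layer placement, defaulting unknown ops to the VPU."""
    return {layer: ENGINE_FOR_OP.get(op, "vpu") for layer, op in graph.items()}

model = {"l0": "conv2d", "l1": "attention", "l2": "softmax", "l3": "custom"}
print(assign_engines(model))
# {'l0': 'mac_array', 'l1': 'transformer_unit', 'l2': 'vpu', 'l3': 'vpu'}
```

The VPU fallback in this toy placement mirrors the role the article ascribes to the integrated vector processing unit: keeping the IP usable as new layer types appear.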