NovuMind is another startup with its eyes on inference at the edge, leveraging dedicated silicon designed to deliver high inference throughput with minimal power. It's not alone in this space: many GPU and FPGA machine-learning (ML) platforms are available, but fewer dedicated inference platforms are shipping at this point. Many of these target specific applications, such as Intel's Movidius series. The advantage of the newer chips is that their power requirements are a few watts instead of the hundreds needed for high-end GPU boards.
Dr. Ren Wu, founder and CEO of NovuMind, says, “Until now, GPUs have powered the advances in AI, particularly around the training of deep-neural-network models from large sets of data. Once models are trained, however, the challenge is to deploy them at scale. GPUs and other processors are expensive and consume large amounts of power. Their architectures are optimized for two-dimensional matrix computation. While they perform well when processing large batches of data, these chips are not suited for real-time applications that require low latency. They also lack power efficiency and they tend to be very expensive. With the arrival of our NovuTensor chip, we are breaking these barriers and ushering in a new era where AI can be deployed at scale.”
1. The 400-MHz NovuTensor delivers 15 TOPS while the quad-chip PCI Express card pushes 60 TOPS.
NovuMind’s 400-MHz NovuTensor is designed to deliver 15 TOPS, with the neural engine itself consuming under 5 W; total chip power is 15 W. It’s available as a bare chip or on a short PCI Express card (Fig. 1). The card carries four chips delivering a combined 60 TOPS.
Details about the chip are a bit sparse at this point. In general, one of its advantages is its ability to perform 3D tensor calculations directly, without unfolding the data into 2D matrices. According to its patent, “The contraction engine calculates the tensor contraction by executing calculations from equivalent matrix multiplications, as the tensors were unfolded into matrices, but avoiding the overhead of expressly unfolding the tensors. The contraction engine includes a plurality of outer product units that calculate matrix multiplications by a sum of outer products. By using outer products, the equivalent matrix multiplications can be partitioned into smaller matrix multiplications, each of which is localized with respect to which tensor elements are required.”
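The outer-product decomposition the patent describes can be sketched in a few lines of plain Python. This is purely illustrative of the math, not NovuMind's implementation: a matrix product C = A × B is accumulated as one outer product per shared index, so each step touches only a single column of A and a single row of B, which is the locality property the patent is after.

```python
# Sketch: computing C = A x B as a sum of outer products, one per
# shared index k, instead of row-by-column dot products. Each outer
# product reads only one column of A and one row of B, so the data
# each step needs is highly localized. Illustrative only; not
# NovuMind's actual hardware algorithm.

def matmul_outer_products(A, B):
    """Multiply A (m x k) by B (k x n) as a sum of k outer products."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for p in range(k):                        # one outer product per shared index
        col = [A[i][p] for i in range(m)]     # p-th column of A
        row = B[p]                            # p-th row of B
        for i in range(m):
            for j in range(n):
                C[i][j] += col[i] * row[j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_outer_products(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

Because each outer product is independent and local, the same idea extends to partitioning a large multiplication into small blocks that fit in on-chip memory, which is the kind of decomposition the patent claims for tensor contraction.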
Part of this approach is to minimize the amount of data movement. This is also something that Flex Logix does with its NMAX approach to neural-net processing. Moving data around takes time and power, but it’s necessary to keep the matrix multipliers fed. Most systems can’t keep these calculators running all of the time and often end up waiting for data to arrive.
2. A NovuTensor chip takes on tasks such as scaling streaming media to 4K video or 8K video using four chips.
NovuTensor can be used for most inferencing chores. It’s able to handle challenging applications like scaling streaming media to 4K video using a single chip or 8K video using four chips (Fig. 2).
The challenge for developers will be benchmarking this and other chips with their applications and neural-network models. Most benchmarks these days don’t address real-world applications all that well. Likewise, the scale and implementation of a model can have a significant impact on how it’s partitioned and implemented on a particular system. This will be especially critical for embedded systems where using the smallest, lowest-power chip can make the difference between a good, economical product and an expensive one that doesn’t work.
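The batch-versus-latency distinction behind such benchmarks can be made concrete with a short timing sketch. The "model" here is a stand-in Python function (a real benchmark would call the chip's inference runtime); the point is only that per-call latency and batched throughput are measured differently and can tell very different stories.

```python
# Sketch: measuring single-call latency vs. batched throughput.
# fake_model is a hypothetical stand-in for an inference call;
# real benchmarking would invoke the target hardware's runtime.
import time

def fake_model(batch):
    """Stand-in inference: one output per input sample."""
    return [sum(x) for x in batch]

samples = [[float(i), float(i + 1)] for i in range(1000)]

# Latency: time each single-sample call, as a real-time app sees it.
start = time.perf_counter()
for s in samples:
    fake_model([s])
latency_per_call = (time.perf_counter() - start) / len(samples)

# Throughput: one large batch, as a data-center benchmark runs it.
start = time.perf_counter()
fake_model(samples)
batch_time = time.perf_counter() - start

print(f"mean single-call latency: {latency_per_call * 1e6:.1f} us")
print(f"batch throughput: {len(samples) / batch_time:.0f} samples/s")
```

A chip that posts impressive batched-throughput numbers may still miss a real-time latency budget, which is why developers should benchmark with their own models and batch sizes rather than rely on headline TOPS figures.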