The Ever-Improving Inference at the Edge

April 12, 2019

Running machine-learning inference at the edge has never been easier, thanks to platforms like NVIDIA’s Jetson Nano.

Not every application can take advantage of machine-learning (ML) inference, but most can. Performing it at the information source instead of in the cloud is becoming easier thanks to improved artificial-intelligence (AI) software support plus hardware acceleration that targets deep neural networks (DNNs). Platforms like Renesas’ e-AI, STMicroelectronics’ STM32Cube.AI, and NXP’s eIQ all support ML and target hardware ranging from conventional microcontrollers to systems with hardware acceleration.

ML hardware acceleration can significantly improve the performance of inference applications on the edge, opening up new application opportunities that would not be possible on stock hardware. GPGPUs and multicore CPUs led the charge, but ML-specific hardware has the edge. Even the latest versions of these general-purpose platforms have been enhanced to address inference chores. For example, Intel’s latest Xeons include instructions targeting ML, and its Movidius vision processing unit (VPU) zeros in on specific ML application spaces.

NVIDIA’s Jetson Nano (see figure) brings a full SoC to the ML table. The 128-CUDA-core Maxwell GPGPU handles most of the DNN model processing, assisted by the 64-bit, quad-core Cortex-A57 CPU cluster. The compact SO-DIMM-format module also includes 4 GB of DRAM and runs Linux. Its hardware encode and decode support can process one 4K or eight 1080p video streams while running ML models on each stream. Convection cooling easily handles the 5 to 10 W of power, allowing the Jetson Nano to work in compact, low-power AI applications on the edge. The Jetson Nano provides the same functionality as its older and more powerful siblings, including the ability to support major platforms like TensorFlow, PyTorch, Caffe/Caffe2, MXNet, and Keras.

The Jetson Nano from NVIDIA is an SoC that supports machine-learning inference chores in embedded systems.

Coprocessors are also answering the call for more efficient inference and identification chores in embedded systems, where a batch size of one is important. Servers typically handle large batch sizes more efficiently, but they’re also working with larger datasets versus embedded systems that might have a single camera delivering data for analysis.
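The reason a batch size of one matters for a single camera can be shown with simple arithmetic: a batch cannot be processed until its last frame arrives, so the first frame waits for the rest. The sketch below assumes a hypothetical 30-frame/s camera, and the function name is illustrative only. It counts just the time spent waiting for the batch to fill, not the inference time itself.

```python
def batching_delay_ms(batch_size, frames_per_second):
    """Extra time the first frame of a batch waits for the batch to fill.

    Frames arrive one per frame interval, so a batch of N frames is
    complete only (N - 1) intervals after its first frame arrives.
    """
    frame_interval_ms = 1000.0 / frames_per_second
    return (batch_size - 1) * frame_interval_ms


if __name__ == "__main__":
    # Hypothetical camera rate, chosen only for illustration.
    fps = 30
    for batch in (1, 4, 16):
        delay = batching_delay_ms(batch, fps)
        print(f"batch={batch:2d}  added latency ~ {delay:.1f} ms")
```

A server fed by many streams can fill a large batch almost instantly, which is why batching pays off there. A lone camera cannot, so hardware that stays efficient at a batch size of one avoids this added delay.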

Chips like Flex Logix’s InferX X1 target this space. The chip incorporates multiple nnMAX processing tiles specifically designed to handle each layer within a DNN model that’s been trained on a server using boards like NVIDIA’s latest Tesla T4 or FPGA boards such as Xilinx’s Alveo or Intel’s Programmable Acceleration Cards (PACs). The InferX X1 is optimized to implement Winograd acceleration, which can speed up INT8 layers by a factor of 2.25 without sacrificing accuracy. The system transforms the 3-by-3 convolution to a 4-by-4 one, dynamically translating the weights to 12 bits. The support also handles input and output translation on-the-fly, minimizing the loading of weights within the system.
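The arithmetic behind the 2.25 figure can be sketched with the standard F(2x2, 3x3) Winograd transform. This is a minimal illustration in plain Python using the widely published transform matrices. It is not a description of how the InferX X1 implements the technique, and all function names are hypothetical. A 2-by-2 output tile needs 36 multiplies when computed directly, but only 16 element-wise multiplies after the transform, and 36/16 = 2.25.

```python
from fractions import Fraction

# Standard F(2x2, 3x3) Winograd transform matrices.
B_T = [[1, 0, -1, 0],
       [0, 1, 1, 0],
       [0, -1, 1, 0],
       [0, 1, 0, -1]]
H = Fraction(1, 2)
G = [[Fraction(1), 0, 0],
     [H, H, H],
     [H, -H, H],
     [0, 0, Fraction(1)]]
A_T = [[1, 1, 1, 0],
       [0, 1, -1, -1]]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def direct_conv(tile, kernel):
    """CNN-style 3x3 convolution over a 4x4 tile: 4 outputs x 9 = 36 multiplies."""
    return [[sum(tile[i + u][j + v] * kernel[u][v]
                 for u in range(3) for v in range(3))
             for j in range(2)] for i in range(2)]


def winograd_conv(tile, kernel):
    """Same 2x2 output using only 16 data-dependent multiplies."""
    u = matmul(matmul(G, kernel), transpose(G))      # 3x3 weights -> 4x4
    v = matmul(matmul(B_T, tile), transpose(B_T))    # 4x4 input   -> 4x4
    # The only multiplies between data and weights: 16 element-wise products.
    # The transforms use constants 0, +/-1, and 1/2, i.e. adds and shifts.
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]
    y = matmul(matmul(A_T, m), transpose(A_T))       # 4x4 -> 2x2 output
    return [[int(x) for x in row] for row in y]


if __name__ == "__main__":
    # Arbitrary INT8-range test values, chosen only for illustration.
    tile = [[12, -7, 3, 100],
            [-128, 5, 44, -9],
            [8, 127, -60, 2],
            [0, -33, 71, 15]]
    kernel = [[-3, 0, 127],
              [5, -128, 9],
              [1, 64, -2]]
    print(winograd_conv(tile, kernel) == direct_conv(tile, kernel))
    print(36 / 16)
```

The transformed weights span a wider range than the originals. If G is scaled by 2 to keep everything in integers, the worst-case magnitude grows by up to a factor of 9, so signed 8-bit weights would need about 12 bits. That is one plausible reading of the 12-bit figure above, not something the source spells out.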

Figuring out whether AI will benefit an application is a chore in and of itself. However, once that determination is made, lots of options are available to implement these systems. Of course, one may have to apply AI just to wade through the options.

About the Author

William G. Wong | Senior Content Director - Electronic Design and Microwaves & RF
