ITC keynoter addresses hardware inference accelerators for machine learning

Nov. 20, 2016

Fort Worth, TX. Rob A. Rutenbar, Bliss Professor and Head of the Department of Computer Science at the University of Illinois at Urbana-Champaign, delivered Wednesday afternoon's keynote address at the International Test Conference. His topic was hardware inference acceleration for machine learning, and he described joint work with his colleague Jungwook Choi of the IBM T.J. Watson Research Center.

“We increasingly need custom accelerators,” he said. “We have thus far been blessed with opportunity to ride Moore’s Law, but we urgently need a new plan because it’s about to give up the ghost.”

Plan B, he said, is to throw transistors at the problem and see if something good happens. As examples of where throwing transistors at a problem already helps, he cited bitcoin mining and high-frequency trading—you can’t make money at either using a software-based approach. With high-frequency trading, you have 100 ns to decide whether to trade and another 1,000 ns to execute the trade.

As for AI, he said applications can show us stuff but not understand it. Machine learning, however, is yielding breakthroughs in recognition and classification.

Relevant activity across a wide spectrum, he said, includes efforts related to Microsoft Catapult, Google TensorFlow, and the Qualcomm brain-inspired neural processing unit (NPU).

“You can’t go to a computer architecture conference without seeing someone doing a chip for deep learning,” he said.

The particular focus of his ITC keynote was structured prediction involving inference on graphical models. With such models, he said, nodes encode what you observe and know. The question, he said, is: what is the most likely set of labels for the graph?

He described the belief propagation (BP) inference method of solving such problems. Belief propagation, he said, has wonderful properties: nodes exchange messages iteratively, and each node on the graph decides which label it wants to acquire based on the labels of the surrounding nodes. “The goal is to figure out what the graph wants to be,” he said.

The approach relies on smart, local, iterative message passing (as in the Viterbi algorithm for hidden Markov models); otherwise the problem is intractable. On graphs with cycles the messages can circulate, a variant known as “loopy” BP, and conflicting information can keep it from converging.
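The iterative, local message passing Rutenbar described can be sketched in a few lines. Below is a minimal max-product belief-propagation example on a three-node chain, where BP reduces to a Viterbi-style decode; the node and pairwise potentials are invented for illustration and are not from the talk.

```python
import numpy as np

# Node potentials: each node's local preference over two labels.
unary = np.array([[0.7, 0.3],   # node 0 prefers label 0
                  [0.4, 0.6],   # node 1 slightly prefers label 1
                  [0.2, 0.8]])  # node 2 strongly prefers label 1
# Pairwise potential: neighboring nodes prefer to agree.
pairwise = np.array([[0.9, 0.1],
                     [0.1, 0.9]])

n = len(unary)

# Forward pass: each node sends its right neighbor a message giving,
# for each of the neighbor's labels, the best achievable score so far.
msg = np.ones((n, 2))
for i in range(1, n):
    msg[i] = np.max(pairwise * (unary[i - 1] * msg[i - 1])[:, None], axis=0)

# Backward pass: the same, from right to left.
back = np.ones((n, 2))
for i in range(n - 2, -1, -1):
    back[i] = np.max(pairwise * (unary[i + 1] * back[i + 1])[:, None], axis=0)

# Combine local evidence with incoming messages; each node then
# "decides what label it wants" -- the per-node argmax.
beliefs = unary * msg * back
labels = beliefs.argmax(axis=1)
print(labels)  # the agreement prior pulls all three nodes to label 1
```

On a chain this forward–backward scheme is exact; on a graph with cycles, the same message updates are simply repeated until they (hopefully) settle, which is the “loopy” variant mentioned above.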

He cited stereo image matching as an example of using BP inference. The graphical nature of the presentation precludes a textual description, but you can download a relevant paper, “Configurable and scalable belief propagation accelerator for computer vision,” by Rutenbar and Choi, from IEEE Xplore.

Rutenbar cited challenges ahead. “The unfortunate reality for today’s fab now is that every chip behavior is a little smear of probability.” That presents the need for resilience on a stochastic fabric. BP with its iterative character is already quite resilient, he said, but not enough for a “really nasty stochastic fabric.” Studies are underway to make BP even more resilient.

“Doing machine learning in FPGAs and custom hardware is academically challenging and industrially relevant,” he said, “and is likely to be increasingly relevant as FPGAs appear in data centers.”

ITC organizers designated Rutenbar’s talk the “Special Keynote in Honor of Professor Edward J. McCluskey.” Rutenbar concluded his address by noting that the passing of Moore’s Law has been accompanied by the sadder passing of pioneers such as Professor McCluskey himself.

Rutenbar said he has been teaching MOOCs on EDA. “There are 25,000 EDA professionals on the planet,” he said, “and 51,602 have signed up for my VLSI CAD course.”

He said that McCluskey’s 1986 book Logic Design Principles: With Emphasis on Testable Semicustom Circuits has been a valuable source for his classes and his constant companion.
