The Crest of Wave Computing

Before he started Tallwood Venture Capital in 2000, Dado Banatao founded S3 Graphics, one of the first major suppliers of graphics chips. Over the years, the company was squeezed out of the market by Nvidia, which has since tapped into the growing use of graphics accelerators for training neural networks, the building blocks of machine learning software.

Banatao has moved on to bankrolling another company he founded, one that is trying to end the dominance of graphics chips. Wave Computing is one of the rare machine learning chip startups to have raised over $100 million from investors since it was founded in 2009. And the company’s hardware, which is due to be generally available before the end of the year, could pose the first major challenge to Nvidia’s market lead in machine learning.

The Santa Clara, California-based company has been flexing its muscles. Derek Meyer, Wave’s chief executive, said his company’s chips can complete training tasks hundreds of times faster than graphics chips, sifting through millions of images or hours of human speech to teach algorithms to draw new conclusions. The company is targeting both training and inferencing, which typically occur in data centers.

Wave is also taking pages from Nvidia’s playbook, handing customers the keys to systems loaded with memory, storage and connectivity that can be slipped into servers or used on premises. It does not sell chips directly. “We’re going to configure different systems and solutions for different use cases and depending on where the data resides, whether that’s in the cloud, private enterprise, or a gateway on the edge,” Meyer explained. “Wherever the data, we don’t care.”

The keystone is Wave’s dataflow processing unit, also known as the DPU, which runs computations by sending data through a graph of predetermined instructions. Thousands of cores inside save power by staying dormant until data starts flowing through them, like a water wheel. “Flowing and streaming data is inherent to neural networks so dataflow hardware is a technological match,” Meyer told Electronic Design.
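The idea of cores that stay dormant until data arrives can be illustrated with a toy dataflow graph. This is only a sketch of the general dataflow principle, not Wave’s actual design: the `Node` class, the `run` function and the two-operation graph are all hypothetical names invented for the example.

```python
from collections import deque


class Node:
    """One operation in a toy dataflow graph (hypothetical, for illustration).

    A node does nothing until every one of its inputs has received a value.
    """

    def __init__(self, name, func, num_inputs):
        self.name = name
        self.func = func
        self.num_inputs = num_inputs
        self.inputs = {}    # input slot -> value received so far
        self.targets = []   # (downstream node, input slot) pairs

    def connect(self, target, slot):
        self.targets.append((target, slot))

    def receive(self, slot, value):
        self.inputs[slot] = value
        return len(self.inputs) == self.num_inputs  # True when ready to fire


def run(initial_tokens):
    """Push (node, slot, value) tokens through the graph until none remain."""
    queue = deque(initial_tokens)
    results = {}   # outputs of nodes that have no downstream targets
    fired = []     # order in which nodes actually executed
    while queue:
        node, slot, value = queue.popleft()
        if not node.receive(slot, value):
            continue  # still waiting on other inputs, so the node stays idle
        args = [node.inputs[i] for i in range(node.num_inputs)]
        node.inputs = {}
        out = node.func(*args)
        fired.append(node.name)
        if not node.targets:
            results[node.name] = out
        for target, target_slot in node.targets:
            queue.append((target, target_slot, out))
    return results, fired


# A two-node graph computing (x * w) + b, the core of a neural network layer.
mul = Node("mul", lambda x, w: x * w, 2)
add = Node("add", lambda p, b: p + b, 2)
mul.connect(add, 0)

# The bias reaches "add" first, but "add" cannot fire until "mul" has produced its result.
results, fired = run([(add, 1, 1.0), (mul, 0, 3.0), (mul, 1, 2.0)])
print(results, fired)  # {'add': 7.0} ['mul', 'add']
```

Execution order here is set by when data becomes available rather than by a program counter, which is the property the water wheel comparison points to.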

Founded before the current machine learning boom, Wave was focused on building general-purpose chips, but the company changed course in 2013. Not only were manufacturing advances slowing down significantly, but running machine learning software was also becoming more computationally demanding. Today, the processing power used in training the most complex algorithms is 300,000 times what it was five years ago, doubling every three and a half months, according to OpenAI.
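The two figures attributed to OpenAI can be checked against each other with a little arithmetic. The sketch below assumes steady exponential growth at the quoted doubling time; the function names are made up for the example.

```python
import math

# Figures quoted in the article (attributed to OpenAI).
growth_factor = 300_000   # total increase in training compute
doubling_months = 3.5     # quoted doubling time


def months_to_grow(factor, doubling_time_months):
    """Months needed to grow by `factor` at a fixed doubling time."""
    return math.log2(factor) * doubling_time_months


def growth_over(months, doubling_time_months):
    """Growth factor accumulated over `months` at a fixed doubling time."""
    return 2.0 ** (months / doubling_time_months)


doublings = math.log2(growth_factor)
months = months_to_grow(growth_factor, doubling_months)
print(f"{doublings:.1f} doublings, about {months / 12:.1f} years")
# 18.2 doublings, about 5.3 years
```

A 300,000-fold increase takes a little over 18 doublings, or roughly five years and four months at one doubling every three and a half months, so the two numbers are consistent with the "five years" in the text.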

That has emboldened startups like Graphcore and Cerebras Systems to enter the machine learning race. They are competing with each other on the training side as well as with major suppliers like Intel and Xilinx on the inference side. Other players are cloud computing companies like Google, whose tensor processing unit touched off millions of dollars of new investment in the sector. New challengers are lighting a fire under Nvidia, whose graphics chips are the current gold standard in training. The company has added custom tensor cores to boost the performance-per-watt of its latest chips.

Wave is zigging where other companies are zagging. Most new chips are accelerators, filled with thousands of cores that run machine learning tasks faster and more efficiently. But Nvidia’s graphics chips and other accelerators still need instructions from separate processors called hosts. And the amount of power and latency the host wastes taking calls from the accelerator is not insignificant, Meyer said.

“Going back and forth to a host is a bottleneck, so we eliminated the host completely from the equation,” Meyer told Electronic Design. The company also removed the clocks, the fundamental way that chips organize and execute their work, from its chips, allowing them to operate at around 6.7 GHz. An embedded microcontroller loads instructions, cutting down on the power and latency wasted by traditional accelerators.

Each chip contains 16,384 cores arranged in a coarse-grained reconfigurable array (CGRA), a design meant to avoid the disadvantages of both field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs), which excel at running very specific types of software. The 16-nanometer FinFET processor supports real-time programmability and operations ranging from 8 and 16 bits to 32 and 64 bits.

There's a debate over whether to favor power efficiency or flexibility in the development of machine learning chips, and Wave’s processor roughly falls between FPGAs and ASICs. “In the world of artificial intelligence, software is changing almost daily, so you need something that can adapt to changing algorithms,” Meyer said. “If the world moves on, you throw out your ASIC and build another one.”

But the new architecture is useless without tools capable of mapping machine learning algorithms onto silicon. Three-quarters of the company, which employs 80 people, is focused on building compilers and other software tools to get customers programming with TensorFlow and other frameworks. No wrestling with hardware needed, Meyer said. These tools could be critical to challenging Nvidia, which has reinforced its market stronghold with investments in CUDA, its programming software.

“We know what the structure of the neural network is in advance and what operations we need to perform because it is wired into the neural network,” said Meyer. “You can grab software components called agents from Wave’s library and program the array of processors to load those instructions. They don’t go back and forth to the host because they do what they’re supposed to do, and they know because the neural network tells them.”

The key question is whether customers will switch from Nvidia’s graphics chips to Wave’s rival technology, and software could help determine the answer, industry analysts say. Nvidia’s focus has been on boosting the versatility of its graphics chips, which work with every major machine learning environment, including TensorFlow and Caffe. Wave only supports TensorFlow, though it plans to expand into others.

But winning over customers is still an educational endeavor for Wave. Last year, the company started an early access program to give potential customers an inside look at Wave's architecture. This month, the company is rolling out its first product to these early customers, who can install the workstation in hospitals, factories or wherever else they want to experiment with machine learning models before training them in the cloud.

Focusing somewhere other than the cloud could give Wave some breathing room. Cloud computing companies like Google and Microsoft are cutting into potential revenue by building their own chips for data centers, making the competition even fiercer. Wave’s focus is not only on these major corporations but also on smaller enterprises dipping their toes into machine learning to assess medical records, financial transactions and other data that can’t be stored in the cloud.

Wave is standing apart in other ways. The company recently bought MIPS Technologies, which is targeting its namesake architecture at machine learning for embedded devices. Wave plans to combine the multithreading and real-time capabilities of the MIPS architecture with its dataflow hardware, but a spokesperson declined to comment on what the new solutions would look like or whether they would scale down to sensors and cameras.

Some early Wave employees previously worked for MIPS Technologies, which was owned by Banatao’s Tallwood Venture Capital before the transaction. Meyer was once vice president of sales and marketing for MIPS. Other former MIPS employees came over in recent years. Darren Jones, the former head of engineering at MIPS, joined Wave in 2015, and Mike Uhler, its former chief technology officer, was hired in 2016. MIPS will continue to license CPU cores as a separate business unit.

“The acquisition of MIPS by Wave Computing is a bold move, and could accelerate its time to profitability and industry presence,” said Karl Freund of Moor Insights and Strategy in a statement. The MIPS architecture powers billions of devices globally, with wide support for operating systems and tools. Nvidia estimates the market for chips used in training and inferencing could generate $15 billion and $11 billion, respectively, in 2020.

The company could also benefit as machine learning, specifically the training phase, spills out of data centers, said Jin Kim, Wave’s chief data scientist. He told Electronic Design last year that on-premises applications will likely grow faster than the data center. Smaller enterprises are increasingly looking to repurpose machine learning software for other tasks, training models that already grasp the basics of how to do something, he said.

Financial and healthcare companies are toying with a technique called transfer learning, which involves removing the top layers of a neural network trained for a specific task, like identifying faces in a photo, while preserving the lower levels of the network, which handle more primitive tasks like pattern matching. New higher levels can then be taught more specialized tasks, like spotting cancerous spots on skin, with less training.
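A stripped-down sketch of the technique, in plain Python rather than any particular framework, might look like the following. Everything in it is an assumption made for illustration: the frozen weights stand in for lower layers trained on an earlier task, the new top layer is a simple logistic regression, and the six-sample dataset is invented.

```python
import math

# Stand-in for lower layers "pretrained" on an earlier task. The values are
# made up for illustration and are never updated below (they are frozen).
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]


def frozen_features(x):
    """Frozen lower layers: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only the new top layer (logistic regression) on frozen features."""
    feats = [frozen_features(x) for x in samples]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(head, x):
    w, b = head
    f = frozen_features(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) > 0.5


# New, specialized task with very little labeled data:
# label 1 when the first input is larger than the second.
samples = [(2.0, 0.5), (1.5, 0.2), (0.3, 1.8), (0.1, 1.2), (1.0, 0.1), (0.2, 0.9)]
labels = [1, 1, 0, 0, 1, 0]

head = train_head(samples, labels)
print(predict(head, (1.8, 0.4)), predict(head, (0.4, 1.6)))
```

Only the weights of the new top layer change during training, which is why far less data and compute are needed than when training the whole network from scratch.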

The shift toward unsupervised learning could unseal other markets. “Most of the economic value of artificial intelligence comes from supervised learning right now, but less than 10 percent of data out there is labeled or can be labeled,” said Kim. “If you can do unsupervised learning, suddenly the 90 percent that you can’t label becomes accessible, and that dramatically expands the business opportunities for artificial intelligence.”