No one knows what the future of artificial intelligence will look like, and no one knows what computer architecture will take it there. For years, Nvidia has been trying to expand the market for its graphics chips, which are the current gold standard for training and running algorithms based on deep learning. And it has often used tactics unrelated to the performance of its chips.
Last year, the company released the DGX Station to enable software engineers to experiment with software libraries used in artificial intelligence and improve algorithms before sending them to the cloud, where the software is trained on enormous amounts of data. The workstation contains chips based on Nvidia’s Volta architecture and provides 480 trillion floating-point operations per second (480 teraflops, or tflops).
The DGX workstation shares the same software stack as the DGX-1 appliance, a miniature supercomputer that provides 960 tflops of peak performance. That way, software engineers can swiftly swap software between Nvidia’s workstations and appliances, which can be installed in data centers where training typically happens.
Nvidia introduced both products to tighten its grip over the artificial intelligence market and promote its Volta architecture, which contains custom tensor cores for handling deep learning. But according to one industry executive, the company’s rivals could use the same strategy to push their custom chips onto software engineers.
“They say, in the early phases of designing neural networks, we don’t want to go to data centers,” said Jin Kim, chief data science officer for machine learning chip startup Wave Computing. “We want a workstation right next to us for experimentation, taking elements of existing neural networks and putting them together like Lego blocks.”
He declined to disclose whether Wave Computing plans to release its own workstation. But the company, which has raised $117 million over the last nine years, has been putting the finishing touches on an appliance equipped with its dataflow processing unit (DPU), which supports lower-precision operations that consume less power and memory than the operations run on traditional chips.
When it is finished, the appliance is projected to provide performance of 2.9 quadrillion operations per second (2.9 petaflops) for machine learning workloads. Wave Computing has also built a special compiler that translates code into a form that its silicon can understand. The company designed its coarse-grained reconfigurable array chips to have 16,384 cores.
Wave Computing is acutely aware that software engineers are asking for workstations to experiment with algorithms outside of the data center, said Kim. Other startups have almost certainly gotten the same requests. But none have ventured to challenge Nvidia yet.
Nvidia pushes its appliances and workstations as a powerful package for deep learning. It sold one of the first bundles to Avitas Systems, an automated inspection startup that coaches software to identify corrosion and other defects using photographs of things like underwater pipelines and critical equipment in facilities like power plants and oil refineries.
“You might have researchers or data scientists that are doing a lot of experimentation, refining their models,” said Tony Paikeday, director of product marketing for Nvidia’s DGX Systems. “At that stage of the development lifecycle, we found that developers prefer not to feel encumbered by a resource sitting way out in a data center. They want it to sit close to where they sit.”
“We did this because we wanted to offer a proof point to the marketplace,” said Paikeday, adding that Nvidia looks forward to original equipment manufacturers selling their own workstations using its graphics chips and software language. “We wanted to set a blueprint for them to follow,” he said in an interview with Electronic Design.
Other companies may also follow Nvidia’s blueprint. In addition to Wave Computing, both Graphcore and Intel are working on server appliances that could potentially be paired with a workstation. Startups like Groq and Cerebras Systems could imitate them, building boxes with custom silicon and the massive amounts of memory required for training.
There are other hints. Over the last year, Graphcore has raised $110 million from investors that include Dell, the second largest supplier of workstations as well as a prolific purveyor of server infrastructure. The company claims that its custom hardware can be used to shorten the training phase of deep learning from days to hours.
Industry analysts say it is also possible that Dell will acquire Graphcore, giving it custom chips to install in its servers and gateways. Hewlett Packard, the largest supplier of workstations and another major maker of servers, is also considered a potential destination for startups like Groq and Cerebras Systems, which are still operating under the radar.
That could affect where these companies stand in the market for deep learning chipsets, which the research firm Tractica predicts could grow from $513 million in 2016 to $12.2 billion by 2025. Nvidia estimates that the market for computer chips used in training could generate $15 billion in 2020, in contrast to $11 billion for inferencing, which requires less powerful chips.
Paikeday could not discuss specific numbers, but he said that Nvidia almost always sells its workstations paired with an appliance. That could change as it continues to charm small companies with the power and portability of the Station. For instance, Avitas Systems takes the system out to where it collects information, editing algorithms in the field.
For Kim, Wave’s chief data science officer, the way that Avitas Systems is using its workstation points to another trend. And it could provide an opportunity for startups like Wave.
“I think training will move out of the data center,” he said in an interview. “There are small enterprises that want to have early stage training done locally, especially when using methods like transfer learning, where you don’t need data centers full of compute resources to train and deploy reasonably accurate models.”
Transfer learning is a technique for training software on small amounts of data. It involves removing the upper layers of a neural network trained for a specific task, like identifying faces in a photograph, while preserving the lower layers of the network, which handle more primitive tasks like pattern matching. Using the lower layers as the foundation, new upper layers can be trained on smaller amounts of data to complete more specialized tasks, like spotting skin cancer.
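To make the idea concrete, here is a minimal sketch in plain Python, not the method used by any company mentioned here. It assumes a hypothetical "pretrained" lower layer whose weights (`FROZEN_W`, `FROZEN_B`) are hard-coded stand-ins for features learned elsewhere, and it trains only a new top layer on a four-example toy dataset. All names and values are illustrative assumptions.

```python
import math

# Hypothetical "pretrained" lower layer. In a real system these weights
# would come from a network trained on a large dataset; here they are
# hard-coded stand-ins and are never updated (frozen).
FROZEN_W = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
FROZEN_B = [0.0, 0.0, -1.0]


def frozen_features(x):
    """Run an input through the frozen lower layer (linear + ReLU)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(FROZEN_W, FROZEN_B)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=500, lr=0.5):
    """Train only the new top layer (a logistic-regression head)."""
    w = [0.0] * len(FROZEN_W)
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            f = frozen_features(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(head, x):
    """Classify an input using the frozen features and the trained head."""
    w, b = head
    f = frozen_features(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) > 0.5


# Toy "specialized task" with only four labeled examples (an XOR pattern).
small_data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]
head = train_head(small_data)
```

Only the four numbers in `head` are learned. The lower layer is reused as-is, which is why this style of training needs far less data and compute than training an entire network from scratch.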
Many industries, such as finance and healthcare, are starting to experiment with transfer learning, said Kim. And since it requires less computing power and less data than training from scratch, engineers could make do with appliances or workstations instead of renting cloud infrastructure. That could be a faster-growing market than data centers, said Kim.