When he was promoted to chief executive of Xilinx, Victor Peng said that the job was a rare opportunity to capitalize on the fundamental changes taking place in computing. In his previous roles as executive vice president of product and then briefly chief operating officer of the FPGA maker, he became acutely aware of the challenges.
He oversaw the development of the last three generations of Xilinx FPGAs and has worked to push them further into the market for accelerated computing and artificial intelligence. Two months into the CEO job, Peng has charted a similar course for the San Jose, California-based company, one focused on bringing the benefits of programmable silicon to the masses.
On Monday, he announced what he called a completely new product category: the adaptive compute acceleration platform, or ACAP. These heterogeneous multicore chips can be programmed to accelerate a wide range of tasks, giving them similarities to traditional FPGAs. But users can program ACAPs without digging into the guts of the hardware; they can use a software language like Python instead.
Peng also announced a new strategy Monday that doubles down on data centers, where Xilinx has pushed FPGAs to accelerate applications ranging from machine learning tasks like image recognition to processing human genomes in minutes versus days. The company's FPGAs can also be used in server storage and networking.
The central component of that strategy is the ACAP and the first processor family based on the technology, codenamed Everest. The company said that the Everest line of chips can accelerate machine learning tasks with better performance per watt than general-purpose chips like CPUs and GPUs, which are the current gold standard.
“While FPGA and Zynq SoC technologies are still core to our business, Xilinx is not just an FPGA company anymore,” Peng said. With Everest, which will first be manufactured with 7-nanometer technology and released next year, Xilinx is trying to build momentum with cloud giants that offer its chips, including Amazon, Baidu, Tencent, and Alibaba.
Xilinx is also trying to become more than a peripheral player in machine learning chips. The benefit of programmable logic is that it can be customized for specific algorithms and altered as those algorithms change. It is the opposite of an ASIC, which trades flexibility for performance. Xilinx is targeting the inference side of machine learning rather than training, the side where Nvidia’s GPUs excel.
The race to artificial intelligence, combined with the slowing of Moore’s Law, the chip industry’s economic engine for decades, has made chip development a high-stakes game. Nvidia is spending billions to hold onto its dominant lead in machine learning chips. Other companies, including startups as well as Google and Microsoft, are either trying to muscle into the business or wean themselves off Nvidia.
And yet no one knows what the future of computing will look like and what combination of processors will ultimately be installed in things like wireless sensors monitoring a factory floor, gateways that pool information from them, and the data centers where it is all analyzed. How important will CPU, GPU, FPGA, DSP, ASIC, or ACAP accelerators be? It's still early.
Traditionally, the FPGA has been a bulky and expensive slab of hardware suited for aerospace, industrial, medical, test and measurement, and other markets. These chips are divided into separate routing and logic blocks arranged in a checkerboard pattern, and programmers tie together the individual cells based on the workload.
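The programming model is easier to picture with a toy example. The Python below is purely illustrative, not vendor tooling: it models a single 2-input lookup table (LUT), the kind of cell those logic blocks are built from, and shows that reconfiguring the chip amounts to rewriting small configuration words like these, plus the routing between cells.

```python
# Illustrative only: a software model of an FPGA lookup table (LUT).
# A 2-input LUT stores one output bit per input combination, so any
# 2-input Boolean function is just a 4-bit configuration word.

def make_lut2(config_bits):
    """Return a 2-input logic function defined by 4 configuration bits.

    config_bits[i] is the output when the inputs, read as a two-bit
    number (a << 1 | b), equal i.
    """
    assert len(config_bits) == 4
    return lambda a, b: config_bits[(a << 1) | b]

# The same physical cell becomes XOR or AND depending on its bits;
# "reprogramming" an FPGA rewrites millions of such words.
xor_gate = make_lut2([0, 1, 1, 0])
and_gate = make_lut2([0, 0, 0, 1])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "XOR:", xor_gate(a, b), "AND:", and_gate(a, b))
```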
Xilinx said that ACAP devices lower the bar for that type of programming for users both in data centers and other industries, like test equipment and wireless. “We want to increase the number of users on our products by an order of magnitude,” Peng said in a recent conference call with reporters. “There are easily a thousand times more software developers than there are FPGA developers.”
Everest, and any other accelerator based on ACAP technology, will include a next-generation FPGA as well as a software-programmable, yet hardware-adaptable, compute engine based on a new architecture that the company has declined to discuss in more detail. These processors will be connected to highly integrated programmable I/O blocks with a network-on-chip (NoC) system.
The NoC allows Everest and other ACAP devices to be programmed with more flexibility than FPGAs, the company said. The I/O functionality ranges from high-bandwidth memory to advanced SerDes technology, which the company recently pushed to 56 gigabits per second and plans to boost further in the next few years.
The chip can be programmed with software languages including Python and OpenCL, the company said. “Simply dropping an FPGA block into an ASIC design doesn’t really bring anything new and different to the game, but Xilinx designed a lot of flexibility into the rest of the ACAP design,” said Paul Teich, an analyst with Tirias Research.
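Xilinx has not published the ACAP toolchain, so as a point of reference, here is a rough host-side sketch of how OpenCL offload to an accelerator card commonly works today, written with the pyopencl bindings. It follows the usual FPGA pattern of loading a precompiled kernel binary instead of compiling source at run time; the binary file and the vadd kernel name are hypothetical placeholders.

```python
# Rough sketch of host-side OpenCL offload (pip install pyopencl).
# The kernel binary and its "vadd" entry point are hypothetical; they
# stand in for whatever a vendor compiler would produce.
import numpy as np
import pyopencl as cl

platform = cl.get_platforms()[0]
try:
    devices = platform.get_devices(cl.device_type.ACCELERATOR)
except cl.RuntimeError:
    devices = []
devices = devices or platform.get_devices()  # fall back to any device
ctx = cl.Context(devices[:1])
queue = cl.CommandQueue(ctx)

# FPGA flows load a precompiled binary rather than building from source.
with open("vadd_kernel.bin", "rb") as f:  # hypothetical file
    prg = cl.Program(ctx, devices[:1], [f.read()]).build()

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.empty_like(a)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

# Enqueue the kernel across n work items, then copy the result back.
prg.vadd(queue, (n,), None, a_buf, b_buf, out_buf, np.int32(n))
cl.enqueue_copy(queue, out, out_buf)
```

The promise of the higher-level ACAP tools is that the boilerplate above, and the hardware design behind the kernel binary, would be hidden from the Python developer.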
Reducing the programming complexity also allows the hardware to be altered faster and adapt to different workloads more dynamically. “Although compiling an application to accelerate a workload can take seconds, minutes, or hours, dynamically swapping the combined binaries takes milliseconds,” a Xilinx spokesperson told Electronic Design.
“We are talking about data centers being able to program their servers to change workloads depending upon compute demands, like video transcoding during the day and then image recognition at night,” said Patrick Moorhead, founder of technology research firm Moor Insights & Strategy, in a statement.
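As a thought experiment, that scheduling could look something like the sketch below on the host side. The Accelerator class and its load_binary method are invented for illustration; Xilinx has not disclosed its actual reconfiguration interface.

```python
# Hypothetical sketch of time-based workload swapping on a
# reconfigurable accelerator. The Accelerator class is a stand-in,
# not a real Xilinx API.
import datetime
import time

class Accelerator:
    """Models a device where swapping a precompiled binary takes
    milliseconds, while compiling one takes seconds to hours."""
    def __init__(self):
        self.loaded = None

    def load_binary(self, path):
        if path != self.loaded:
            print(f"reconfiguring device with {path}")
            self.loaded = path

def binary_for(hour):
    # Daytime traffic: video transcoding. Overnight: image recognition.
    return "transcode.bin" if 8 <= hour < 20 else "imagenet.bin"

device = Accelerator()
while True:
    device.load_binary(binary_for(datetime.datetime.now().hour))
    time.sleep(60)  # re-check the schedule once a minute
```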
Xilinx said that the Everest line of chips would scale to devices with 50 billion transistors, and claimed that the 7nm processor provides 20 times better performance on machine learning workloads than its 16nm Virtex FPGAs. The company also said that radios based on Everest would have four times the bandwidth of radios based on 16nm technology.
The company has not yet benchmarked the Everest accelerator against GPUs but claimed that it would provide up to a hundred times the performance of CPUs. The company plans to tape out Everest with Taiwan Semiconductor Manufacturing Co. before the end of the year, with customer shipments scheduled for next year.
Xilinx has been working to tune its products for data centers. Before he was promoted to chief executive, Peng pushed the company to release software libraries that accelerate image recognition on its hardware. The company also moved to integrate SoCs onto programmable dies and partnered on the development of the Cache Coherent Interconnect for Accelerators, or CCIX.
Xilinx said that Everest would be essential to helping it push further into data centers. The ACAP architecture “is likely to give the cloud giants more of what they want from FPGA-based hardware platforms,” said Teich. “Xilinx didn’t design ACAP in a vacuum – early software tools are available to strategic customers.” A spokesperson declined to comment on the identity of these customers.
And the programming tools for Everest should move even further up the software stack. “Fundamentally, there is a cap on the number of programmers who can understand massively parallel programming,” said Teich. “It requires thinking differently. Everyone in the industry is trying to develop software tools to hide that complexity from most programmers, who just want to simulate things and look for patterns in data.”
That was the same thinking behind the development of TensorFlow, Caffe, and other machine learning libraries. That type of software has been an equalizer in the machine learning hardware space, as startups like Graphcore and Wave Computing can focus on developing compilers rather than an expansive ecosystem on par with Nvidia’s CUDA platform.
Peng said Xilinx is also working on integration with TensorFlow, Caffe, and other frameworks.
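Those integration details were not public, but the developer-facing side of such frameworks is well established. The sketch below shows ordinary TensorFlow inference with the stock Keras MobileNet model, purely for illustration: this is the application code an accelerator backend would need to speed up without changing.

```python
# Plain TensorFlow inference; an accelerator runtime would slot in
# beneath this code as an execution backend. Requires tensorflow.
import numpy as np
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.applications.mobilenet import (
    preprocess_input,
    decode_predictions,
)

model = MobileNet(weights="imagenet")

# One fake 224x224 RGB image; in practice this would be real input data.
image = np.random.rand(1, 224, 224, 3).astype(np.float32) * 255.0
probs = model.predict(preprocess_input(image))

# Top-3 ImageNet labels. Offloading the convolutions to an FPGA or
# ACAP device should leave this application code untouched.
for _, label, score in decode_predictions(probs, top=3)[0]:
    print(label, round(float(score), 4))
```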
Every chip maker targeting artificial intelligence is spending lavishly. Xilinx poured more than a billion dollars over four years into the roughly 1,500 employees who worked on ACAP and Everest. That is almost half the company’s $2.15 billion in research and development expenses from 2014 to 2017, and more than a tenth of its $9.32 billion in revenue over the same span.
The expense underlines the lengths to which companies are going to develop machine learning chips. Jensen Huang, Nvidia’s chief executive officer, has said that it cost $3 billion to create the company’s Volta architecture. Volta graphics chips contain custom tensor cores that accelerate the matrix multiplications used in deep learning.
Intel, struggling with the speed of the artificial intelligence market, reportedly spent $350 million on its acquisition of Nervana Systems, which has been tasked with engineering a server chip for neural network training. The company also reportedly spent about $400 million on Movidius, which built a computer vision chip architecture for edge devices.
Hundreds of millions of dollars of funding is flowing into hardware startups, too. Last year, Graphcore raised $50 million in financing led by Silicon Valley institution Sequoia Capital, while Wave Computing has raised $117 million from investors including Dado Banatao. Cerebras Systems, which last year was reportedly valued at almost $1 billion, is funded in part by Benchmark Capital.
“The world of CPU-centric computing is over,” Peng said. “What that means is that in this new era, architecture will be heterogeneous with accelerators. I say accelerators, not accelerator, because the breadth of applications that will integrate some form of artificial intelligence is vast. There is not a single accelerator that will do all that well.”
Xilinx believes that the ACAP technology inside Everest can take the place of multiple accelerators, switching between different workloads and algorithms dynamically. “We are in the very early stages of the emergence of artificial intelligence,” Peng told reporters in the conference call.