Nvidia's Latest Chip Aims Five Thousand Cores at Deep Learning

Nvidia's chief Jensen Huang said that the new chip contains 21.1 billion transistors and that the processor's die measures 815 square millimeters, or a little smaller than an Apple Watch face.

James Morra

May 10, 2017

(Image courtesy of Nvidia)

Nvidia unveiled a graphics chip with more than five thousand processing cores that significantly speeds up deep learning jobs in data centers. The design underlines the chip maker's belief that artificial intelligence is central to its future.

Jensen Huang, Nvidia's chief executive, announced the new product last week at the company's annual GPU Technology Conference in San Jose. The Tesla V100 is capable of training software on large amounts of data, applying what it has learned to new information, or transferring that task to conventional computer chips.

The processor contains 21.1 billion transistors built on a 12-nanometer process, making it roughly 50% faster than Pascal, Nvidia’s previous graphics architecture, for traditional computing jobs. The processor’s die is massive, measuring 815 square millimeters, or slightly smaller than an Apple Watch face.

“It is at the limit of photolithography, meaning you can’t make a chip any bigger than this because the transistors would fall on the ground,” said Huang, wearing his usual black leather jacket, during the opening keynote. “Just the fact that this could be manufactured is an incredible feat.”

The announcement shows how seriously Nvidia is targeting customers in the market for deep learning. The chip maker poured around $3 billion into developing the Volta architecture at the heart of the V100, which contains specialized tensor cores to run mathematical operations in deep neural networks.

The parallel processor contains 640 such cores, which together can perform 120 trillion operations per second on deep learning workloads. It stands out not for rendering graphics but for teaching autonomous cars to drive within highway lanes or simulating what could happen if two galaxies collided.
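As a back-of-the-envelope check, dividing the quoted aggregate figure by the number of tensor cores gives the implied throughput of each core. The short Python sketch below uses only the two numbers cited above. The function and variable names are illustrative, and it assumes the 120 trillion operations are spread evenly across all 640 cores, which Nvidia has not stated here.

```python
# Figures quoted in the article for the Tesla V100.
TENSOR_CORES = 640
TOTAL_OPS_PER_SECOND = 120e12  # 120 trillion operations per second


def ops_per_core(total_ops: float, cores: int) -> float:
    """Average operations per second per core, assuming an even split."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_ops / cores


per_core = ops_per_core(TOTAL_OPS_PER_SECOND, TENSOR_CORES)
print(f"{per_core / 1e9:.1f} billion operations per second per tensor core")
```

Under that assumption, each tensor core would account for roughly 187.5 billion operations per second.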

Huang said that the new architecture would train deep learning software 12 times faster than Pascal, allowing models to be created in days rather than weeks. The cores are also six times faster at inferencing, or solving new problems with software models that were trained in data centers.

The Tesla V100 contains a new generation of HBM2 memory from Samsung as well as a novel memory controller that improves memory bandwidth by 50% over Pascal. It uses Nvidia’s proprietary interconnect to transfer data to computer processors at 300 gigabytes per second.

The advances amount to the clearest statement yet that artificial intelligence is the future for Nvidia, which burnished its brand in video game graphics. For years, the company has tuned its chips to handle the workloads of cloud companies like Google, Microsoft, and Amazon. Huang introduced Nvidia’s own cloud computing platform this week.

Last year, Nvidia started an accelerator program called Inception that offers hardware and programming tools for start-ups in fields like medical diagnostics and predictive maintenance. Thirteen of the fifteen companies funded by Nvidia’s venture capital unit are involved in deep learning or autonomous driving, a field where Nvidia has supplied hardware to Audi, Mercedes, and Toyota.

The shift might be existential for Nvidia, but it is far from Intel’s crisis of faith. Though its chips are still indispensable for data centers, the world’s largest chip maker has been forced to admit that new generations of chips have failed to deliver the advances that once came from doubling the number of transistors etched onto silicon chips every two years.

For handling intensive tasks like voice recognition, chip makers are shifting toward specialized hardware. Two years ago, Intel acquired Altera, along with its vast knowledge of FPGA chips, for $16.7 billion. FPGAs exhibit the same parallelism that lets graphics chips divide up programs and run the pieces simultaneously.

Intel sells FPGAs, which can be reprogrammed on the fly, in its Go autonomous driving system as well as its Deep Learning Inference Accelerators for data centers. The chips are used as accelerator chips in servers at Baidu and Microsoft, which has also built custom FPGAs for its machine learning work.

Last year, Intel spent almost $400 million on Nervana Systems, whose chief Naveen Rao, now vice president of Intel’s artificial intelligence unit, had started devising custom chips to train deep learning software faster than graphics chips. Nervana argued that graphics chips are inefficient because they are not built for anything but graphics.

The pendulum in the semiconductor industry is swinging toward customized chips, now that internet companies are tailoring chips for machine learning. Last year, Sundar Pichai, Google’s chief executive, announced that the search engine giant had built the tensor processing unit for accelerating inferencing tasks in its data centers.

Pichai claimed that the TPU is three processor generations ahead of traditional computer and graphics chips, escaping some of the gravity of Moore’s Law. The chip has been used to enhance Google’s search engine and an artificial intelligence program that last year mastered the devilishly complex board game of Go.

Last month, Google disclosed the performance of its tensor processing unit, which it claims is 15 to 30 times faster than GPUs and CPUs on inferencing tasks and 30 to 80 times more energy efficient per trillion operations. Nvidia contends that Google was measuring its TPUs against older graphics chips, which are still the fastest for training software.

With the rise of custom silicon, Nvidia is increasingly finding itself in competitive cross hairs. Trying to break into machine learning chips are a number of start-ups, including stealthy companies like Groq, which was co-founded by Jonathan Ross, a former Google engineer who helped invent the tensor processing unit.

But the Volta architecture is a powerful statement of Nvidia’s head start in accelerated computing. It is not clear how badly Nvidia has been hurt by customization efforts at Google and others, but its data center business reaped $409 million in the first quarter of this year, an increase of 186% from a year earlier.

Nvidia’s bet on graphics accelerators is paying dividends. Last week, the company reported first quarter revenue of $1.94 billion, up from $1.3 billion a year earlier. Its profits soared to $507 million, up from $208 million in last year's first quarter. Nvidia’s stock price jumped almost 20% during Huang’s keynote, noted an attendee on Twitter.

"It is the reason of our existence, recognizing that we need to find a life after Moore's Law," Huang said.

About the Author

James Morra

Senior Editor

James Morra is the senior editor for Electronic Design, covering the semiconductor industry and new technology trends, with a focus on power electronics and power management. He also reports on the business behind electrical engineering, including the electronics supply chain. He joined Electronic Design in 2015 and is based in Chicago, Illinois.