Nvidia's Turing Architecture Starts to Take Hold

Having long dominated the market for chips used to train machine learning models, Santa Clara, California-based Nvidia has shifted its focus somewhat in recent months. Last year the company started selling graphics chips for training based on its Volta architecture, but this year it has focused on inference, the process of running trained models on new data to interpret it.

Nvidia released the latest version of its TensorRT software in the second quarter to speed up inference running on Nvidia chips. According to investment research firm Morningstar, the market for inference chips will reach an estimated $11.8 billion by 2021, compared with $8.2 billion for training chips. In September, the company announced the Tesla T4 accelerator, an inference chip based on its latest Turing architecture.

On Monday, Google said that it would be the first cloud provider to make Nvidia's latest chip available to customers. This follows Google's announcement in August that it would offer Nvidia's previous-generation P4 processor to cloud customers. Google continues to give customers access to Nvidia's chips despite its investment in custom tensor processing units (TPUs) for machine learning jobs.

Nvidia is trying to defend its stronghold in machine learning against rivals ranging from Intel and Advanced Micro Devices to startups such as Graphcore and Wave Computing. In August, Intel reported $1 billion in sales last year from Xeon processors used to run artificial intelligence workloads. In May, Nvidia said that it had doubled its annual shipments of inference chips to data center customers.

Jensen Huang, Nvidia's founder and chief executive officer, said at the Supercomputing conference on Monday that the T4 is also available in almost 60 server models from leading manufacturers, including Lenovo, Supermicro and Dell. "We have never before seen such rapid adoption of a data center processor," Ian Buck, the company's vice president of accelerated computing, said in a statement.

The new processor contains 2,560 CUDA cores for graphics and general-purpose processing and 320 tensor cores dedicated to machine learning. Although it has lower throughput and higher latency than Nvidia's Tesla V100, the T4 offers higher energy efficiency. Built with 13.6 billion transistors on a 12-nanometer process, the chip is also significantly faster than its predecessor, the P4, at speech and image recognition.

The processor also supports multiple levels of numerical precision. The company said it can deliver more than 8 trillion floating-point operations per second (teraflops) at 32-bit precision and 65 teraflops at 16-bit precision. At 8-bit integer precision it reaches around 130 trillion operations per second (TOPS), and at 4-bit precision 260 TOPS. The company said the T4 consumes around 70 watts.
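As a rough illustration of what these figures mean for efficiency, the sketch below divides each quoted peak throughput by the roughly 70-watt power figure and compares each precision against the 16-bit peak. It uses only the numbers stated above; treating the FP32 figure as exactly 8 trillion is an assumption (the company said "more than 8 trillion"), and peak figures are not what real workloads typically sustain. The names `PEAK_TERA_OPS`, `tera_ops_per_watt` and `speedup_over` are illustrative, not part of any Nvidia tool.

```python
# Peak throughput figures quoted for the Tesla T4, in trillions of operations per second.
# FP32 was quoted as "more than 8 trillion"; 8.0 is used here as a lower bound (assumption).
PEAK_TERA_OPS = {
    "FP32": 8.0,
    "FP16": 65.0,
    "INT8": 130.0,
    "INT4": 260.0,
}

BOARD_POWER_WATTS = 70.0  # "around 70 watts", per the company


def tera_ops_per_watt(peak, watts):
    """Return each format's peak throughput divided by the power draw."""
    return {fmt: ops / watts for fmt, ops in peak.items()}


def speedup_over(peak, baseline):
    """Return each format's peak throughput relative to a baseline format."""
    base = peak[baseline]
    return {fmt: ops / base for fmt, ops in peak.items()}


if __name__ == "__main__":
    efficiency = tera_ops_per_watt(PEAK_TERA_OPS, BOARD_POWER_WATTS)
    ratios = speedup_over(PEAK_TERA_OPS, "FP16")
    for fmt in PEAK_TERA_OPS:
        print(f"{fmt}: {efficiency[fmt]:.2f} tera-ops per watt, {ratios[fmt]:.1f}x the FP16 peak")
```

Each halving of precision from 16 to 8 to 4 bits doubles the quoted peak, so at the stated power the 8-bit and 4-bit modes work out to roughly 1.9 and 3.7 trillion operations per watt.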