
NVIDIA Brings Kubernetes to GPUs

April 3, 2018
Kubernetes—a container orchestration tool for CPUs—was just extended so that it can now take on GPU application and resource management.

One aspect of using the cloud is having lots of computing and storage resources available at any time. The challenge is managing them and deploying them. This is where Kubernetes enters the picture.

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. Google uses it to run billions of containers every week, and others are doing the same thing. It makes management of cloud resources practical, even with a small management team, since reusing configurations is significantly easier than individually configuring systems. The other advantage is that these containers can run almost anywhere, because Kubernetes presents a relatively consistent view of the underlying cloud infrastructure.

1. NVIDIA demonstrated Kubernetes running on its GPGPUs.

Against that backdrop, NVIDIA’s demo (Fig. 1) is impressive for two reasons. First, it allows Kubernetes to manage GPGPU resources alongside CPUs. This GPU awareness enables more effective management of GPU applications, such as training in machine-learning (ML) applications.

Second, users can take advantage of NVIDIA’s large-scale GPGPU solutions like the latest DGX-2 (Fig. 2). The DGX-2 utilizes NVIDIA’s 16-port NVLink switch chips along with 16 32-GB Tesla V100 GPGPUs. Together, the GPGPUs share a 512-GB HBM2 memory space (16 × 32 GB). This large compute engine will be deployed in the cloud with Kubernetes support.

The NVIDIA demonstration in Fig. 1 scanned pictures of flowers, using machine-learning algorithms running on GPU compute resources to identify the type of flower in each picture. This is an application that scales well by adding more parallel instances of the same application. The demo starts with a single GPU and shows an obvious performance increase as more systems with GPUs are brought online.

2. NVIDIA’s DGX-2 combines 16 32-GB Tesla V100 GPGPUs into a single GPU system using NVSwitch chips.

Kubernetes was managing the systems, and it’s easy to see how this could scale to hundreds or more systems using GPGPUs. However, the really impressive part of the demo occurred when some systems were removed and automatically replaced by other nodes within the cloud. This type of load leveling and resilience is part of the Kubernetes solution that’s now applicable to GPU-managed containers and resources.
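To make the replace-on-failure behavior concrete, here is a minimal sketch of a reconciliation loop that keeps a desired number of one-GPU pods running as nodes come and go. It is a toy model, not Kubernetes code: the `Node` class, the `reconcile` and `remove_node` helpers, the node names, and the "most free GPUs first" placement rule are all assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass, field

_pod_ids = itertools.count()  # hypothetical pod-name counter


@dataclass
class Node:
    """Toy cluster node; every pod placed on it uses exactly one GPU."""
    name: str
    gpus: int
    pods: list = field(default_factory=list)

    @property
    def free_gpus(self):
        return self.gpus - len(self.pods)


def running(nodes):
    """Count the pods currently placed across all nodes."""
    return sum(len(n.pods) for n in nodes.values())


def reconcile(nodes, desired):
    """Start pods on nodes with free GPUs until `desired` replicas are running."""
    while running(nodes) < desired:
        candidates = [n for n in nodes.values() if n.free_gpus > 0]
        if not candidates:
            break  # no free GPU anywhere: remaining replicas stay pending
        # Assumed placement rule: pick the node with the most free GPUs.
        target = max(candidates, key=lambda n: n.free_gpus)
        target.pods.append(f"flower-classifier-{next(_pod_ids)}")
    return running(nodes)


def remove_node(nodes, name):
    """Simulate a node leaving the cluster; its pods are lost with it."""
    return nodes.pop(name).pods


def demo():
    nodes = {n: Node(n, gpus=2) for n in ("node-a", "node-b", "node-c")}
    before = reconcile(nodes, desired=4)           # initial placement
    lost = remove_node(nodes, "node-a")            # a system is pulled
    after_failure = running(nodes)                 # replicas left running
    after_reconcile = reconcile(nodes, desired=4)  # replacements scheduled
    return before, len(lost), after_failure, after_reconcile


if __name__ == "__main__":
    print(demo())
```

In this sketch the replica count dips when a node is removed and recovers on the next reconcile pass, provided the surviving nodes still have free GPUs. If they do not, the missing replicas simply stay pending until capacity returns.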

So far, Amazon Web Services (AWS) and Google Cloud Platform (GCP) provide this type of GPU-enabled Kubernetes support. Microsoft Azure certification is in the works.

The GPU support is implemented as a Kubernetes plug-in. The plug-in makes GPUs a first-class resource that’s managed like other system resources. Future enhancements include GPU monitoring as well as support for different GPUs.
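As a rough illustration of what "first-class resource" means in practice, the sketch below builds a Pod manifest that requests a GPU in the same `resources` section used for CPU and memory. It assumes GPUs are advertised under the extended resource name `nvidia.com/gpu`, the name used by NVIDIA's device plugin for Kubernetes; the article itself does not name the resource. The `gpu_pod_manifest` helper, the pod name, and the container image are hypothetical.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Pod manifest asking the scheduler for `gpus` whole GPUs."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Assumed resource name; GPUs are requested in whole
                    # units under `limits`, next to any CPU/memory limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical pod name and image, used only for illustration.
manifest = gpu_pod_manifest("flower-trainer", "example.com/flower-trainer:latest")
print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed manifest could be saved to a file and submitted with `kubectl apply -f`. The scheduler would then place the pod only on a node with an unallocated GPU.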

Such support should prove very useful for ML training and inference in the cloud. It can work in private, public, or hybrid clouds, as well as large embedded environments. Large-scale automotive and robotic simulations could benefit from this support, too.
