NVIDIA Brings Kubernetes to GPUs

Kubernetes—a container orchestration tool for CPUs—was just extended so that it can now take on GPU application and resource management.

William G. Wong

April 3, 2018

One aspect of using the cloud is having lots of computing and storage resources available at any time. The challenge is managing them and deploying them. This is where Kubernetes enters the picture.

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. Google uses it to run billions of containers every week, and others are doing the same thing. It makes management of cloud resources practical, even with a small management team, since reusing configurations is significantly easier than individually configuring systems. The other advantage is that these containers can run almost anywhere, because Kubernetes orchestration presents a relatively consistent view of the underlying cloud structure.

1. NVIDIA demonstrated Kubernetes running on its GPGPUs.

This makes NVIDIA’s demo (Fig. 1) very impressive for two reasons. First, it allows Kubernetes to manage GPGPU resources alongside CPUs. This GPU awareness enables more effective management of GPU applications, such as training in machine-learning (ML) applications.

Second, users are able to take advantage of NVIDIA’s large-scale GPGPU solutions like the latest DGX-2 (Fig. 2). The DGX-2 utilizes NVIDIA’s 18-port NVSwitch NVLink switch chips along with 16 32-GB Tesla V100 GPGPUs. The GPGPUs share a 512-GB HBM2 memory space. This large compute engine will be deployed in the cloud with Kubernetes support.

The NVIDIA demonstration in Fig. 1 used machine-learning algorithms to scan pictures of flowers and identify the type of flower in each picture, taking advantage of GPU compute resources. This is an application that scales well by adding more parallel instances of the same application. The demo started with a single GPU and showed an obvious performance increase as more systems with GPUs were brought online.

2. NVIDIA’s DGX-2 combines 16 32-GB Tesla V100 GPGPUs into a single GPU system using NVSwitch chips.

Kubernetes was managing the systems, and it’s easy to see how this could scale to hundreds of systems or more using GPGPUs. However, the really impressive part of the demo occurred when some systems were removed and automatically replaced by other nodes within the cloud. This type of load leveling and resilience is part of the Kubernetes solution that’s now applicable to GPU-managed containers and resources.
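The replacement behavior described above can be pictured as a reconciliation pass: compare the number of workers that should be running with the number that survived, then place the difference on nodes that still have free GPUs. The Python below is only a toy simulation of that idea, not Kubernetes code. The function name `reconcile`, the node names, and the one-GPU-per-worker rule are all assumptions made for illustration.

```python
from __future__ import annotations


def reconcile(capacity: dict[str, int], placement: dict[str, int],
              desired: int) -> dict[str, int]:
    """Return a new placement of one-GPU workers across the live nodes.

    capacity  maps each live node name to its GPU count (hypothetical names).
    placement maps node names to the workers that were running before this pass.
    desired   is the number of workers that should be running.

    This sketch only adds workers; it never scales down.
    """
    # Workers on nodes that have disappeared are lost; keep only the survivors.
    new = {node: min(placement.get(node, 0), gpus)
           for node, gpus in capacity.items()}
    missing = desired - sum(new.values())

    # Fill free GPUs on the remaining nodes until the desired count is met.
    for node, gpus in capacity.items():
        if missing <= 0:
            break
        added = min(gpus - new[node], missing)
        new[node] += added
        missing -= added
    return new


if __name__ == "__main__":
    nodes = {"node-a": 2, "node-b": 2, "node-c": 2}
    running = reconcile(nodes, {}, desired=4)
    print("initial placement:", running)

    del nodes["node-a"]  # simulate a system being pulled from the cluster
    running = reconcile(nodes, running, desired=4)
    print("after node-a is removed:", running)
```

Removing a node and running the pass again moves the lost workers onto the spare GPUs of the remaining nodes, which is the effect shown in the demo. A real scheduler also weighs CPU, memory, and placement constraints that this sketch ignores.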

So far, Amazon Web Services (AWS) and Google Cloud Platform (GCP) provide this type of GPU-enabled Kubernetes support. Microsoft Azure certification is in the works.

The GPU support is implemented as a Kubernetes plug-in. The plug-in makes GPUs a first-class resource that’s managed like other system resources. Future enhancements include GPU monitoring as well as support for different GPUs.
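To make "first-class resource" concrete, here is a minimal sketch of a pod manifest that requests GPUs the same way it would request any other resource. It assumes the plug-in advertises GPUs under the resource name `nvidia.com/gpu`, which is the name NVIDIA's device plug-in uses but which this article does not state. The pod name, image name, and the helper `gpu_pod_manifest` are hypothetical. The manifest is built in Python and printed as JSON, which Kubernetes accepts in place of YAML.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that asks the scheduler for `gpus` whole GPUs."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image name
                    # Assumed resource name; GPUs are requested in whole units
                    # under `limits`, just like other extended resources.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        "flower-classifier", "example.com/flower-classifier:latest", gpus=1
    )
    print(json.dumps(manifest, indent=2))
```

With a manifest like this, the scheduler would only place the pod on a node that has an unallocated GPU, so the container does not need to know which machine it lands on.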

Such support should prove very useful for ML training and inference in the cloud. It can work in private, public, or hybrid clouds, as well as large embedded environments. Large-scale automotive and robotic simulations could benefit from this support, too.

About the Author

William G. Wong

Senior Content Director - Electronic Design and Microwaves & RF

I am Editor of Electronic Design focusing on embedded, software, and systems. As Senior Content Director, I also manage Microwaves & RF and I work with a great team of editors to provide engineers, programmers, developers and technical managers with interesting and useful articles and videos on a regular basis. Check out our free newsletters to see the latest content.

You can send press releases for new products for possible coverage on the website. I am also interested in receiving contributed articles for publishing on our website. Use our template and send to me along with a signed release form.

Check out my blog, AltEmbedded on Electronic Design, as well as my latest articles on this site.

I earned a Bachelor of Electrical Engineering at the Georgia Institute of Technology and a Master’s in Computer Science from Rutgers University. I still do a bit of programming using everything from C and C++ to Rust and Ada/SPARK. I also do a bit of PHP programming for Drupal websites and have posted a few Drupal modules.

I still get hands-on with software and electronic hardware. Some of this can be found on our Kit Close-Up video series. You can also see me on many of our TechXchange Talk videos. I am interested in a range of projects from robotics to artificial intelligence.