
NVIDIA Brings Kubernetes to GPUs

April 3, 2018
Kubernetes, a container orchestration tool that has so far managed CPU-based resources, was just extended so that it can also handle GPU applications and resource management.

One aspect of using the cloud is having lots of computing and storage resources available at any time. The challenge is managing them and deploying them. This is where Kubernetes enters the picture.

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It is built on the same principles that allow Google to run billions of containers every week, and many other organizations now rely on it as well. It makes management of cloud resources practical, even with a small management team, since reusing configurations is significantly easier than configuring systems individually. The other advantage is that these containers can run almost anywhere, because Kubernetes presents a relatively consistent view of the underlying cloud infrastructure.

Fig. 1. NVIDIA demonstrated Kubernetes running on its GPGPUs.

This makes NVIDIA’s demo (Fig. 1) very impressive for two reasons. First, it allows Kubernetes to manage GPGPU resources alongside CPUs. This GPU awareness enables more effective management of GPU applications, such as machine-learning (ML) training.

Second, users are able to take advantage of NVIDIA’s large-scale GPGPU solutions like the latest DGX-2 (Fig. 2). The DGX-2 uses NVIDIA’s 18-port NVSwitch NVLink switch chips along with sixteen 32-GB Tesla V100 GPGPUs. Together, the GPGPUs share a 512-GB HBM2 memory space (16 × 32 GB). This large compute engine will be deployed in the cloud with Kubernetes support.

The NVIDIA demonstration in Fig. 1 scanned pictures of flowers, using machine-learning algorithms running on GPU compute resources to identify the type of flower in each picture. This is an application that scales well by adding more parallel instances of the same application. The demo starts with a single GPU, and performance increases visibly as more systems with GPUs are brought online.

Fig. 2. NVIDIA’s DGX-2 combines sixteen 32-GB Tesla V100 GPGPUs into a single GPU system using NVSwitch chips.

Kubernetes was managing the systems, and it’s easy to see how this could scale to hundreds or more systems using GPGPUs. However, the really impressive part of the demo occurred when some systems were removed and their workloads were automatically picked up by other nodes within the cloud. This type of load leveling and resilience is part of the Kubernetes solution, and it now applies to GPU-enabled containers and resources.
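The recovery behavior described above can be illustrated with a toy model. This is only a sketch of the general "compare desired state with actual state and close the gap" idea, not Kubernetes code and not NVIDIA's demo. The `Node` class, the `reconcile` function, the node names, and the one-GPU-per-pod rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count

_pod_ids = count(1)  # source of unique pod names for this toy model


@dataclass
class Node:
    name: str
    gpus_total: int
    pods: list = field(default_factory=list)

    @property
    def gpus_free(self) -> int:
        # Assumption: each pod in this model occupies exactly one GPU.
        return self.gpus_total - len(self.pods)


def running_pods(nodes: dict) -> int:
    """Count the pods currently placed on the surviving nodes."""
    return sum(len(node.pods) for node in nodes.values())


def reconcile(desired: int, nodes: dict) -> int:
    """Place new pods on free GPUs until `desired` pods run or capacity runs out."""
    for node in nodes.values():
        while running_pods(nodes) < desired and node.gpus_free > 0:
            node.pods.append(f"worker-{next(_pod_ids)}")
    return running_pods(nodes)


def make_cluster() -> dict:
    """Three hypothetical nodes with two GPUs each."""
    return {name: Node(name, gpus_total=2) for name in ("node-a", "node-b", "node-c")}


cluster = make_cluster()
reconcile(4, cluster)          # fills node-a and node-b
del cluster["node-a"]          # a node disappears, taking its two pods with it
print(running_pods(cluster))   # 2
print(reconcile(4, cluster))   # 4: the lost pods are restarted on node-c
```

The point of the sketch is that nothing special happens on failure: the same loop that starts the workload also repairs it, as long as spare GPU capacity exists somewhere in the cluster.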

So far, Amazon Web Services (AWS) and Google Cloud Platform (GCP) provide this type of GPU-enabled Kubernetes support. Microsoft Azure certification is in the works.

The GPU support is implemented as a Kubernetes plug-in. The plug-in makes GPUs a first-class resource that’s managed like other system resources. Future enhancements include GPU monitoring as well as support for different GPUs.
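To make "first-class resource" more concrete, here is a minimal sketch of how a container might ask for a GPU in the same way it asks for CPU or memory. The article does not name the plug-in or the resource, so this assumes the NVIDIA device plug-in for Kubernetes, which advertises GPUs under the resource name `nvidia.com/gpu`. The helper `gpu_pod_manifest`, the pod name, and the image path are hypothetical.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a Pod manifest whose single container requests `gpus` whole GPUs."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Assumption: the NVIDIA device plug-in is installed, so the
                    # scheduler knows the extended resource "nvidia.com/gpu".
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical pod name and image, used only for illustration.
manifest = gpu_pod_manifest(
    "flower-classifier", "example.com/flower-classifier:latest", gpus=1
)
print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output could be saved to a file and submitted to a cluster. The scheduler would then place the pod only on a node that has a free GPU.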

Such support should prove very useful for ML training and inference in the cloud. It can work in private, public, or hybrid clouds, as well as large embedded environments. Large-scale automotive and robotic simulations could benefit from this support, too.
