AMD Announces Next-Gen GPUs and Software for AI
What you’ll learn:
- What advances were made with the fourth-gen AMD Instinct MI350?
- What’s new with ROCm 7?
- How does AMD fare against the competition?
AMD’s latest press conference highlighted a family of new GPUs and software targeting artificial-intelligence (AI) applications. These offerings compete with those from NVIDIA and Intel, as well as other AI-focused hardware vendors.
AMD’s Instinct MI350 GPGPU
The AMD Instinct MI350 is built on the company’s fourth-generation Instinct architecture, using a 3-nm process node and CoWoS-S chiplet packaging (Fig. 1). The system interconnect is based on AMD’s Infinity Fabric AP interconnect.
The chip packs in 185 billion transistors and incorporates eight stacks of high-bandwidth memory, HBM3E (Fig. 2). The architecture stacks N3P accelerator complex dies (XCDs) on two N6 I/O base dies (IODs). Each of the eight XCDs contains 32 AMD CDNA 4 compute units. In addition, the GPU can be divided into one to eight partitions with SR-IOV support.
The chip also adds support for smaller, AI-friendly floating-point formats, including FP4 and FP6. The more compact formats reduce large-language-model (LLM) size and increase performance of the models because smaller values can be manipulated more efficiently.
Models that use the smaller floating-point formats can see performance improve by up to a factor of four. The 288 GB of HBM3E can support LLMs with up to 520 billion parameters.
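The back-of-the-envelope arithmetic below shows why that parameter count fits; it's a rough sketch that counts weight storage only, ignoring activations, KV cache, and other runtime overhead.

```python
# Back-of-the-envelope weight-storage estimate (weights only; ignores
# activations, KV cache, and other runtime overhead).
GB = 1e9  # decimal gigabytes, as spec sheets typically use

def weights_gb(params: float, bits_per_param: int) -> float:
    """Approximate weight storage for a model at a given precision."""
    return params * bits_per_param / 8 / GB

params = 520e9  # a 520-billion-parameter LLM
for fmt, bits in [("FP16", 16), ("FP8", 8), ("FP6", 6), ("FP4", 4)]:
    print(f"{fmt:4}: {weights_gb(params, bits):7,.0f} GB")

# FP16:  1,040 GB -> well beyond a single GPU's 288 GB of HBM3E
# FP4 :    260 GB -> fits within one MI350's 288 GB
```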
As part of the announcement, AMD is supporting the industry-standard UBB8 GPU node in both air-cooled and direct liquid-cooled versions (Fig. 3). The boards, which target large data centers, include eight AMD Instinct GPUs. Total HBM3E memory in the system is 2.3 TB with a bandwidth of 64 TB/s. Floating-point performance scales from 161 PF for FP4 down to 0.63 PF for FP64 values.
The eight chips are linked via 153-GB/s AMD Infinity Fabric, while 128-GB/s PCI Express (PCIe) Gen 5 links provide outside communication.
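Those system totals follow directly from the per-GPU numbers, as the quick sketch below shows. The 8-TB/s per-GPU memory bandwidth is an assumption not stated above; the 288-GB capacity is.

```python
# Aggregate UBB8 platform figures from per-GPU specs.
gpus = 8
hbm_gb_per_gpu = 288   # stated per-GPU HBM3E capacity
bw_tbs_per_gpu = 8     # assumed per-GPU HBM3E bandwidth

print(f"Total HBM3E:     {gpus * hbm_gb_per_gpu / 1000:.1f} TB")  # 2.3 TB
print(f"Total bandwidth: {gpus * bw_tbs_per_gpu} TB/s")           # 64 TB/s
```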
ROCm 7.0 Improvements
ROCm provides the underlying software support for AMD’s hardware (Fig. 4). It’s comparable to NVIDIA’s CUDA, with one major difference: ROCm is open source. It lets software be written once and target different hardware platforms, including CPUs, GPUs, DPUs, and compute clusters.
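As a small illustration of that portability, the sketch below assumes a ROCm build of PyTorch, which exposes AMD GPUs through the familiar torch.cuda API; the same code runs unchanged on a CPU or an NVIDIA GPU.

```python
import torch

# ROCm builds of PyTorch surface AMD GPUs through the torch.cuda
# namespace, so device-agnostic code needs no vendor-specific branches.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
print(f"matmul ran on {device}; checksum = {(a @ b).sum().item():.3f}")
```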
AMD ROCm Enterprise AI builds on ROCm 7, adding support for Kubernetes and Slurm along with cluster provisioning and system telemetry.
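To illustrate the Kubernetes side: AMD’s GPU device plugin for Kubernetes advertises GPUs under the amd.com/gpu resource name, so a workload requests one much like the hypothetical manifest sketched below (the pod name and container image are placeholders).

```python
import json

# Hypothetical pod manifest requesting a single AMD GPU via the
# amd.com/gpu resource exposed by AMD's Kubernetes device plugin.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "rocm-training-job"},   # placeholder name
    "spec": {
        "containers": [{
            "name": "trainer",
            "image": "rocm/pytorch:latest",      # placeholder image tag
            "resources": {"limits": {"amd.com/gpu": 1}},
        }],
        "restartPolicy": "Never",
    },
}
print(json.dumps(pod, indent=2))
```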
AMD’s AI Environment
AMD’s Instinct and ROCm take on a wide range of solutions, from embedded applications with a single CPU or GPU through cloud computing with disaggregated compute and storage. The three main components in higher-end systems are the company’s x86 EPYC “Turin” CPU, the Instinct MI350 Series GPU, and the AMD Pollara 400 NIC. The latter is designed to support low-latency, 400G Ethernet connectivity. Its P4-based architecture supports ATS and RDMA as well as the new P4DMA.
AMD has large-scale data-center solutions built on the latest GPUs, designed to deliver dense compute (Fig. 6). In a forward-looking view, the company discussed its “Helios” AI-optimized rack solution, slated for 2026 delivery, which will use its next-generation CPU, GPU, and NIC.
ROCm 7 works with the AMD Developer Cloud, which provides a zero-setup environment using Jupyter Notebooks and integration with GitHub. Preinstalled Docker containers support the latest AI tools.
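A first notebook cell in such an environment might simply verify the preinstalled stack; this sketch assumes the container ships a ROCm build of PyTorch.

```python
import torch

# Environment check for a ROCm-backed notebook (assumes a ROCm
# build of PyTorch is preinstalled in the container).
print("PyTorch version:", torch.__version__)
print("HIP runtime:    ", torch.version.hip)  # None on non-ROCm builds
print("GPUs visible:   ", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device 0:       ", torch.cuda.get_device_name(0))
```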