HBM4 Fuels Next-Gen AI with Unprecedented Memory Bandwidth

HBM4 will be essential for handling the demands of AI in the cloud.
Dec. 11, 2025
4 min read

What you'll learn:

  • What is HBM4?
  • How does HBM4 differ from HBM3?
  • HBM4 implementation challenges.

The debut of DeepSeek R1 sent ripples through the AI community, not just for its capabilities, but also for the sheer scale of its development. The release of the 671-billion-parameter, open-source language model marked a pivotal moment for AI: it was trained on over 20 trillion tokens using tens of thousands of NVIDIA H100 GPUs, highlighting the insatiable demand for data in the realm of large language models (LLMs).

Crucial to the H100’s ability to handle such immense data throughput is its reliance on HBM3 memory. Each H100 SXM GPU pairs 80 GB of HBM3 memory with 3.35 TB/s of bandwidth. While this represents a significant advancement over previous generations, GPU memory capacity and bandwidth still aren’t growing fast enough to keep pace with the exponential growth of AI models.

For instance, the H100 offers twice as much memory capacity and bandwidth as NVIDIA's previous-generation A100 GPU, which initially offered 40 GB of HBM2 memory and 1.55 TB/s of bandwidth. However, the size of AI models has grown by over 100X in the last two years — far outpacing memory growth.
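To make that gap concrete, here’s a quick back-of-the-envelope comparison in Python using the figures cited above (the model-growth factor is the rough 100X estimate quoted in this article, not a measured value):

```python
# Rough comparison of GPU memory scaling vs. model growth, using the
# figures quoted in this article (approximate, for illustration only).

a100_capacity_gb, a100_bw_tbs = 40, 1.55   # NVIDIA A100 (initial HBM2 configuration)
h100_capacity_gb, h100_bw_tbs = 80, 3.35   # NVIDIA H100 SXM (HBM3)

capacity_growth = h100_capacity_gb / a100_capacity_gb   # 2.0x
bandwidth_growth = h100_bw_tbs / a100_bw_tbs            # ~2.2x
model_growth = 100                                       # >100X model-size growth cited above

print(f"Memory capacity grew {capacity_growth:.1f}x, bandwidth {bandwidth_growth:.1f}x")
print(f"Model sizes grew roughly {model_growth}x over the same period")
print(f"Gap: models outgrew memory capacity by ~{model_growth / capacity_growth:.0f}x")
```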

This disparity highlights a critical challenge in AI development: Traditional memory technologies simply can't keep up with the bandwidth and capacity demands of modern AI training. Massive datasets need to be rapidly accessed and processed, and without sufficient memory capacity and performance, AI compute resources are left underutilized.

Enter High Bandwidth Memory

This is where high bandwidth memory (HBM) comes in (see table). By stacking memory dies vertically and connecting them with a wide, high-speed interface, HBM delivers a significant leap in performance and capacity compared to traditional memory architectures. It’s quickly become the memory solution of choice for advanced AI workloads.

High bandwidth memory (HBM) has ramped up in both capacity and performance. (Credit: Rambus)

The evolution of HBM has been remarkable. It launched with a 1-Gb/s data rate and a maximum of eight 16-Gb die in a single 3D stack. With HBM3e, an enhanced version of HBM3, the data rate scales up to 9.6 Gb/s, and the devices can support up to 16-high stacks of 32-Gb die for a total of 64 GB per device. 
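As a quick sanity check on those figures, the per-device capacity and peak bandwidth fall out of simple arithmetic. The sketch below assumes the standard 1024-bit HBM3/HBM3e interface width; all numbers are theoretical per-device peaks, not measured values.

```python
# Back-of-the-envelope HBM3E stack math from the figures above.

dies_per_stack = 16      # 16-high stack
die_density_gb = 32      # 32-Gb DRAM die
data_rate_gbps = 9.6     # per-pin data rate
interface_bits = 1024    # HBM3/HBM3e interface width

capacity_gbytes = dies_per_stack * die_density_gb / 8      # 64 GB per device
peak_bw_gbytes_s = data_rate_gbps * interface_bits / 8     # ~1229 GB/s per device

print(f"HBM3e stack capacity: {capacity_gbytes:.0f} GB")
print(f"HBM3e peak bandwidth: {peak_bw_gbytes_s:.0f} GB/s (~{peak_bw_gbytes_s / 1000:.2f} TB/s)")
```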

To cope with the memory bottlenecks encountered in AI training, high performance computing (HPC), and other demanding applications, the industry has been eagerly awaiting the next generation of HBM memory, HBM4. The HBM4 memory standard was recently announced by JEDEC, promising another significant leap forward for the industry.

JEDEC has reached an initial agreement on speed bins up to 6.4 Gb/s. Moreover, by employing a 2048-bit-wide interface, double that of previous HBM generations, HBM4 doubles the memory bandwidth at the same data rate compared to the initial version of HBM3 and delivers 33% more bandwidth than the HBM3e standard supports. This translates to significantly faster data access and processing speeds, enabling AI models to train and operate more efficiently than ever before.
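Those comparisons follow directly from the cited data rates and interface widths. A short sketch, treating each figure as a theoretical per-device peak and ignoring protocol overhead:

```python
# Peak per-device bandwidth implied by the data rates and interface
# widths cited above (theoretical maxima, ignoring protocol overhead).

def peak_bw_gbs(data_rate_gbps: float, interface_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin rate * interface width / 8 bits per byte."""
    return data_rate_gbps * interface_bits / 8

hbm3  = peak_bw_gbs(6.4, 1024)   # initial HBM3:     ~819 GB/s
hbm3e = peak_bw_gbs(9.6, 1024)   # HBM3e:            ~1229 GB/s
hbm4  = peak_bw_gbs(6.4, 2048)   # HBM4 at 6.4 Gb/s: ~1638 GB/s

print(f"HBM4 vs. HBM3:  {hbm4 / hbm3:.2f}x")    # ~2.00x, as stated above
print(f"HBM4 vs. HBM3e: {hbm4 / hbm3e:.2f}x")   # ~1.33x, i.e. ~33% more bandwidth
```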

HBM4 also incorporates advanced reliability, availability, and serviceability (RAS) features. This is crucial in massively parallel processing architectures with thousands of GPUs, where hardware failures can occur every few hours on average. Higher reliability is paramount to ensuring consistent performance and minimizing downtime.

To fully harness the power of HBM4, a sophisticated memory controller is essential. Leading controllers on the market support the JEDEC spec of 6.4 Gb/s and can be paired with third-party or customer PHY solutions to create a complete HBM4 memory subsystem.

Challenges in Implementing HBM4

Implementing HBM4 presents new challenges. One major obstacle is managing the complexity of data parallelism at higher speeds. New HBM4 controllers incorporate more sophisticated reordering logic that optimizes outgoing HBM transactions and incoming read data, keeping the high-bandwidth data interface efficiently utilized while holding power consumption to manageable levels.
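To illustrate the general idea (not the actual logic of any shipping controller), here’s a toy row-hit-first scheduler in Python: requests that target an already-open DRAM row are serviced ahead of older requests that would force a precharge and activate. This is one simple form of the kind of reordering described above.

```python
# Toy sketch of request reordering: prefer requests that hit an already-open
# row over requests that would require a precharge/activate. Simplified and
# generic; a real HBM controller's logic is far more involved.

from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    bank: int
    row: int
    addr: int

def schedule(queue: deque, open_rows: dict) -> Request:
    """Pick the oldest row-hit request if one exists; otherwise the oldest request."""
    for req in queue:                              # oldest-first scan
        if open_rows.get(req.bank) == req.row:     # row already open -> cheap to service
            queue.remove(req)
            return req
    req = queue.popleft()                          # no row hits: fall back to oldest (fairness)
    open_rows[req.bank] = req.row                  # model activating the new row
    return req

# Usage: two requests to bank 0; the younger one hits the open row and goes first.
q = deque([Request(bank=0, row=7, addr=256), Request(bank=0, row=3, addr=512)])
open_rows = {0: 3}
print(schedule(q, open_rows))   # Request(bank=0, row=3, addr=512)
```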

Another challenge is thermal management. Higher performance brings a greater potential for thermal hotspots, and HBM memory controllers must account for it. Next-generation HBM4 controllers address this by providing mechanisms for the host system to read out the thermal condition of the memory die, helping the overall system stay within its thermal limits.
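A minimal sketch of how a host-side loop might use such a read-out is shown below. The read_die_temperature() hook and the throttle threshold are hypothetical placeholders for illustration, not part of the JEDEC HBM4 spec or any particular controller’s API.

```python
# Hypothetical host-side thermal polling loop. The temperature read-out hook
# and threshold below are placeholders, not a real controller interface.

import random

def read_die_temperature(die: int) -> float:
    """Placeholder for the controller mechanism that reports die temperature (deg C)."""
    return 60.0 + random.uniform(-5.0, 25.0)

THROTTLE_C = 85.0   # assumed threshold at which memory traffic is throttled back

def thermal_poll(num_dies: int = 8) -> None:
    for die in range(num_dies):
        temp_c = read_die_temperature(die)
        if temp_c >= THROTTLE_C:
            # In a real system this would reduce request rate or shift traffic.
            print(f"die {die}: {temp_c:.1f} C -> throttle traffic")
        else:
            print(f"die {die}: {temp_c:.1f} C -> normal operation")

thermal_poll()
```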

As the era of generative AI unfolds, increasingly sophisticated and data-hungry models will emerge, and the importance of memory bandwidth can’t be overstated. Enabling the next generation of AI will require unlocking unprecedented HBM4 memory performance and beyond. With a keen eye on the future, chip designers are shaping the trajectory of the AI revolution, empowering researchers and developers to push the boundaries of what’s possible.

About the Author

Steven Woo

Steven Woo is a Fellow and Distinguished Inventor at Rambus Inc., working on technology and business development efforts across the company. He is currently leading research work within Rambus Labs on advanced memory systems for data centers and AI/ML accelerators, and manages a team of senior technologists.

Since joining Rambus, Steve has worked in various roles leading architecture, technology, and performance analysis efforts, and in marketing and product planning roles leading strategy and customer programs. Steve received his PhD and MS degrees in Electrical Engineering from Stanford University, and Master of Engineering and BS Engineering degrees from Harvey Mudd College.
