Packing CXL-Attached Memory into a PCIe Card

April 26, 2024

SMART Modular Technologies’ 4- and 8-DIMM CXL-memory boards deliver up to 4 TB of storage.

William G. Wong

Related To:

Electronic Design

TechXchange

CXL for Memory and More

What you’ll learn:

What is CXL-attached memory?
What is the difference between CXL and NVMe storage?
What SMART Modular Technologies is bringing to the table.

Using CXL-attached memory is becoming more commonplace in high-performance servers. It takes advantage of the CXL standard, which is moving into its third incarnation. CXL, built on PCI Express (PCIe) Gen 5, provides a cache-coherent environment that allows for disaggregation of memory. It enables applications running on multiple processing elements to access any memory connected to a CXL fabric (Fig. 1).

1. A cache-coherent environment enables applications running on multiple processing elements to access any memory connected to a CXL fabric.

Typically, processor-attached memory can be shared among multiple processors. In the past, however, this interface was proprietary. Dual inline memory modules (DIMMs) attached to one processor could be accessed by all, but the number of DIMMs was limited. On top of that, the approach doesn’t scale to the hundreds or thousands of processing elements found in cloud servers. CXL offers this type of scaling while maintaining cache-coherent support like that of the proprietary systems.

Nonvolatile Memory Express (NVMe) is also based on PCIe, but it’s block-oriented and generally used with flash memory. It doesn’t have to support cache coherency, which simplifies the controller. NVMe-over-CXL (NVMe-oC) is an emerging option that takes advantage of CXL while retaining the NVMe interface already supported by operating systems and applications. NVMe over Fabrics (NVMe-oF) is already in play to address the hyperscalers’ high-performance-computing (HPC) requirements. CXL-attached memory is just another piece of the hyperscaler HPC puzzle.
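To make the block-versus-memory distinction concrete, here is a minimal Python sketch. It is a toy model only: the BlockDevice class, the helper functions, and the 4-KiB block size are assumptions made for illustration, not any real NVMe or CXL API. It simply counts how many bytes move when a single byte is updated through a block interface versus a byte-addressable one.

```python
BLOCK_SIZE = 4096  # assumed logical block size for this toy model


class BlockDevice:
    """Hypothetical block-oriented store: every transfer moves a whole block."""

    def __init__(self, num_blocks):
        self._data = bytearray(num_blocks * BLOCK_SIZE)
        self.bytes_moved = 0  # running total of bytes transferred

    def read_block(self, lba):
        self.bytes_moved += BLOCK_SIZE
        start = lba * BLOCK_SIZE
        return bytearray(self._data[start:start + BLOCK_SIZE])

    def write_block(self, lba, block):
        if len(block) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self.bytes_moved += BLOCK_SIZE
        start = lba * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = block


def update_byte_on_block_device(dev, offset, value):
    """Read-modify-write: changing one byte costs two block transfers."""
    lba, within = divmod(offset, BLOCK_SIZE)
    block = dev.read_block(lba)
    block[within] = value
    dev.write_block(lba, block)


def update_byte_in_memory(mem, offset, value):
    """Byte-addressable memory: a single store touches only the target byte."""
    mem[offset] = value


dev = BlockDevice(num_blocks=4)
update_byte_on_block_device(dev, offset=5000, value=0xAB)

mem = bytearray(4 * BLOCK_SIZE)
update_byte_in_memory(mem, offset=5000, value=0xAB)

print(dev.bytes_moved)  # 8192: one block read plus one block written
```

Real hardware is more nuanced (a processor moves memory in cache-line units, and operating systems cache blocks), but the sketch shows why load/store access suits fine-grained shared data better than a block protocol.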

Delivering on the CXL-Attached Memory Promise

SMART Modular Technologies has pushed the boundaries of memory technology since its inception, so it’s no surprise that the company’s latest products address CXL-attached storage. The eight-DIMM CXA-8F2W (Fig. 2) and four-DIMM CXA-4F1W add-in cards (AICs) include a CXL controller and a bunch of SMART Modular DIMMs.

2. SMART Modular Technologies' CXA-8F2W hosts 4 TB of DDR5-4800 storage from eight DIMMs.

“The CXL protocol is an important step toward achieving industry-standard memory disaggregation and sharing, which will significantly improve the way memory is deployed in the coming years,” said Andy Mills, senior director of advanced product development at SMART Modular.

The cards have a full-height, half-length, x16 PCIe form factor. They use standard DDR5 registered DIMMs (RDIMMs) that provide up to 4 TB of storage with the fully populated CXA-8F2W. That card uses two CXL memory controllers, delivering a total bandwidth of 64 GB/s with a 200-ns latency. In the 4-TB configuration, it dissipates 135 W of power. Users can select smaller RDIMMs, with a corresponding reduction in capacity and power requirements. The top end uses 512-GB modules, while a 90-W system would employ 64-GB modules for a 512-GB capacity.
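The capacity figures follow directly from the module size and the eight DIMM slots. The short Python sketch below works through that arithmetic using only the numbers quoted above; the function names are made up for illustration, and the streaming time is a simple capacity-divided-by-bandwidth estimate that ignores latency and protocol overhead.

```python
DIMM_SLOTS = 8              # CXA-8F2W slot count, per the article
TOTAL_BANDWIDTH_GB_S = 64   # total across both controllers, per the article
CONTROLLERS = 2


def card_capacity_gb(module_gb, slots=DIMM_SLOTS):
    """Total capacity of a fully populated card, in GB."""
    return module_gb * slots


def seconds_to_stream(capacity_gb, bandwidth_gb_s=TOTAL_BANDWIDTH_GB_S):
    """Idealized time to read the whole card once at full bandwidth."""
    return capacity_gb / bandwidth_gb_s


for module_gb in (64, 512):
    capacity = card_capacity_gb(module_gb)
    print(f"{module_gb}-GB modules: {capacity} GB total, "
          f"{seconds_to_stream(capacity):.0f} s to stream once")

# Even split of the quoted total; an assumption, not a published per-controller spec.
print(TOTAL_BANDWIDTH_GB_S / CONTROLLERS, "GB/s per controller")
```

With 64-GB modules the card holds 512 GB, and with 512-GB modules it holds 4,096 GB (4 TB), matching the two configurations described.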

What may be interesting to some is that the x16 PCIe card exposes the CXL controllers as a pair of x8 PCIe connections. This is supported by the PCIe standard as well as switches that negotiate the type of connection and speeds involved. It offers a more efficient interface overall.
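A rough calculation shows how two x8 links line up with the card’s quoted bandwidth. The sketch below assumes the PCIe Gen 5 signaling rate of 32 GT/s per lane and 128b/130b line encoding, which come from the PCIe specification rather than from SMART Modular. It ignores packet and CXL protocol overhead, so the results are upper bounds per direction, not measured throughput.

```python
GT_PER_S_PER_LANE = 32.0         # PCIe Gen 5 signaling rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding
BITS_PER_BYTE = 8


def link_gb_per_s(lanes):
    """Idealized one-direction link bandwidth in GB/s for a given lane count."""
    return lanes * GT_PER_S_PER_LANE * ENCODING_EFFICIENCY / BITS_PER_BYTE


x8 = link_gb_per_s(8)
print(f"x8 link:      {x8:.1f} GB/s")
print(f"two x8 links: {2 * x8:.1f} GB/s")
print(f"x16 link:     {link_gb_per_s(16):.1f} GB/s")
```

Each x8 link works out to roughly 31.5 GB/s, so the pair totals about 63 GB/s, in line with the approximately 64 GB/s quoted for the two controllers.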

Why CXL-Attached Memory is Important

Any programmer knows that there’s never enough available memory. This is especially true for the massive cloud servers that provide HPC services. Artificial intelligence and machine learning (AI/ML) demands in this space include the need for very large amounts of memory, which is available with CXL-attached memory.

Using cache-coherent CXL-attached memory is much more efficient for most applications that must share data, versus NVMe or application-based communication over network connections like Ethernet.

The applicability to embedded solutions tends to be more limited because core counts and memory capacities are usually much smaller than in cloud servers, where terabytes of memory are the proverbial “drop in the bucket.” Still, the flexibility that comes with CXL-attached memory is something embedded HPC application developers should not ignore, especially when dealing with AI/ML applications on the edge.

William G. Wong | Senior Content Director - Electronic Design and Microwaves & RF

I am Editor of Electronic Design focusing on embedded, software, and systems. As Senior Content Director, I also manage Microwaves & RF and I work with a great team of editors to provide engineers, programmers, developers and technical managers with interesting and useful articles and videos on a regular basis. Check out our free newsletters to see the latest content.

I earned a Bachelor of Electrical Engineering at the Georgia Institute of Technology and a Masters in Computer Science from Rutgers University. I still do a bit of programming using everything from C and C++ to Rust and Ada/SPARK. I do a bit of PHP programming for Drupal websites. I have posted a few Drupal modules.

I still get hands-on with software and electronic hardware. Some of this can be found on our Kit Close-Up video series. You can also see me on many of our TechXchange Talk videos. I am interested in a range of projects from robotics to artificial intelligence.