
Switch ASIC Wires Together Data Centers for Faster AI

May 2, 2023
Broadcom’s Jericho3-AI acts as a high-bandwidth networking fabric for data centers focused on AI.

A new high-bandwidth, low-latency switch from Broadcom can wire together tens of thousands of chips in data centers over standard Ethernet, in a bid to run artificial-intelligence workloads faster.

The Jericho3-AI serves as a networking fabric that reduces the time it takes to run AI training and inference in the data center. Complementing Broadcom’s high-end Tomahawk and feature-rich Trident series, the switch is said to deliver more bandwidth at lower latency, along with more advanced packet processing intended to stay a step ahead of future AI workloads.

The problem the company is trying to tackle with the Jericho3-AI is related to the recent boom in advanced AI models, including ChatGPT from Microsoft-backed startup OpenAI and Bard from Google. These technologies are taking a toll on the networks inside the AI supercomputers where they’re trained and run.

These AI models must be trained on colossal amounts of data for several days or even months at a time. But the underlying workloads are too large and complicated for a single server to manage alone.

Instead, cloud and other technology giants spread out the job over thousands of graphics processing units (GPUs) that are wired together in a cluster. But as it stands, these workloads are limited by the latency and bandwidth of the underlying network.

Broadcom said Jericho3-AI can stitch together up to 32,000 GPUs at the same time, giving each one up to 800 Gb/s of connectivity over Ethernet. It also supports RDMA over Converged Ethernet (RoCEv2), a networking protocol that allows for remote direct memory access over Ethernet.
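A quick back-of-envelope calculation, using only the figures above rather than any Broadcom datasheet, shows how those two numbers combine into fabric-wide bandwidth:

```python
# Back-of-envelope scale check using the article's figures: up to 32,000
# GPUs, each with up to 800 Gb/s of Ethernet connectivity. Derived from
# the numbers quoted above, not from a Broadcom specification.

gpus = 32_000                # maximum GPUs in one Jericho3-AI fabric
per_gpu_gbps = 800           # Ethernet bandwidth per GPU (Gb/s)

aggregate_gbps = gpus * per_gpu_gbps
aggregate_pbps = aggregate_gbps / 1_000_000   # 1 Pb/s = 1,000,000 Gb/s

print(f"Aggregate connectivity: {aggregate_pbps:.1f} Pb/s")
# Prints 25.6 Pb/s, which lines up with the fabric-wide figure
# Broadcom cites later in this article.
```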

AI Networking Acceleration

The networking chip leverages 144 SerDes lanes running 100-Gb/s PAM4 signaling, supplying up to 28.8 Tb/s of total bandwidth. The Jericho3-AI supports up to 72x 200G, 36x 400G, or 18x 800G ports on the front of the switch. It’s the latest offering in Broadcom’s Jericho family, which is designed with deep buffers to absorb traffic during periods of heavy network congestion.
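The port counts follow directly from the lane budget, as the short sketch below shows; treating the 28.8-Tb/s headline figure as counting both directions of traffic (or both the network- and fabric-facing interfaces) is an assumption on our part, not something Broadcom spells out here.

```python
# Port-configuration math implied by the lane count and lane rate above.

lanes = 144                  # SerDes lanes feeding the front panel
lane_gbps = 100              # PAM4 signaling rate per lane (Gb/s)

front_panel_gbps = lanes * lane_gbps   # 14,400 Gb/s = 14.4 Tb/s per direction
print(f"Front-panel bandwidth: {front_panel_gbps / 1000:.1f} Tb/s")

# Each supported port configuration consumes the same total bandwidth.
for count, speed in [(72, 200), (36, 400), (18, 800)]:
    print(f"{count} x {speed}G = {count * speed / 1000:.1f} Tb/s")
```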

“The benchmark for AI networking is reducing the time and effort it takes to complete the training and inference of large-scale AI models,” said Ram Velaga, SVP and GM of the core switching business at Broadcom.

To that end, Broadcom said Jericho3-AI brings up to 25.6 petabits per second of network connectivity to the servers in the data center, which is approximately 4X more bandwidth than its predecessor.

Jericho3-AI features intelligent load balancing that sprays traffic evenly over the fabric, preventing the network from becoming overcrowded under the heaviest loads, as sketched below. The company said the new chip also adds features that reduce the risk of collisions and the jitter that can add latency. It can reroute packets of data around blockages in the network to shorten the runtime of AI computations.
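Broadcom hasn’t detailed the algorithm behind that load balancing, but the general idea of per-packet spraying can be illustrated with a minimal sketch; the link names, the queue-depth metric, and the decision rule here are illustrative assumptions, not Broadcom’s implementation.

```python
import heapq

# Minimal sketch of per-packet "spraying": each packet goes out over the
# currently least-loaded fabric link instead of pinning an entire flow to
# one path. Link names and byte counters are illustrative only; real
# hardware also has to restore packet order at the far end.

links = [(0, "link-0"), (0, "link-1"), (0, "link-2"), (0, "link-3")]
heapq.heapify(links)  # min-heap keyed on bytes currently queued per link

def spray(packet_bytes: int) -> str:
    """Pick the least-loaded link, charge the packet to it, return its name."""
    queued, name = heapq.heappop(links)
    heapq.heappush(links, (queued + packet_bytes, name))
    return name

for size in [1500, 9000, 1500, 4096]:
    print(f"{size:>5}-byte packet -> {spray(size)}")
```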

These improvements cut the time it takes to run AI workloads by 10% compared to alternatives based on InfiniBand, according to Broadcom. That plays favorably into the economics of cloud data centers, since AI chips can be used more efficiently, saving costs. GPUs and other types of server chips are among the most expensive chips on the market, with price tags of up to thousands of dollars.
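To see why a 10% reduction in job-completion time matters, consider a hypothetical back-of-envelope estimate; the cluster size, rental rate, and training duration below are made-up inputs for illustration, not figures from Broadcom or this article.

```python
# Hypothetical economics of a 10% shorter training run. Every input here
# (cluster size, hourly GPU price, baseline duration) is an illustrative
# assumption, not a figure from Broadcom or the article.

gpus = 8_000                    # GPUs reserved for one training job
dollars_per_gpu_hour = 2.00     # assumed cloud rental rate per GPU
baseline_hours = 30 * 24        # assumed month-long training run

baseline_cost = gpus * dollars_per_gpu_hour * baseline_hours
savings = 0.10 * baseline_cost  # 10% less job-completion time

print(f"Baseline GPU cost: ${baseline_cost:,.0f}")
print(f"Saved by a 10% faster run: ${savings:,.0f}")
```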

Broadcom said the network “pays for itself” when the Jericho3-AI is plugged into AI-focused data centers.

The company is betting on Jericho3-AI to keep Ethernet competitive with rival networking fabrics, specifically NVIDIA’s InfiniBand technology. NVIDIA is also the dominant supplier of the GPUs that are the current gold standard for AI.

Even though InfiniBand is at the heart of many of the world’s fastest supercomputers, Ethernet gives companies a high degree of flexibility in network architecture. Ethernet also belongs to a much larger ecosystem of hardware and software vendors. Thus, companies can choose the best parts from different suppliers, instead of having to buy server GPUs and switch ASICs from the same purveyor.

On top of faster connectivity, Broadcom said Jericho3-AI also burns through 40% less power per gigabit, which will go a long way in reducing the vast amount of electricity expended by data centers.
