Can Silicon Supply Enough Power for the Future of AI Silicon?

Dec. 6, 2023
The Briefing: Can traditional power electronics satisfy the rising power demands of AI chips in data centers?

This article is part of the TechXchange: Gallium Nitride (GaN).

As AI chips grow larger to handle the rapid development and deployment of large language models (LLMs) such as OpenAI’s ChatGPT, the amount of power that they require is rapidly ramping up, too.

New power-hungry AI accelerators such as NVIDIA’s H100 GPUs have a thermal design power (TDP) of 700 W, while the most advanced CPUs and purpose-built AI chips are close behind. This is driving power-per-rack specifications up to 90 kW, well beyond the 15 to 30 kW per rack that most data centers can manage today. As rack power demands for data centers rise by up to 3X, delivering more power in a smaller space is vital.
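To put those rack figures in perspective, the short Python sketch below divides each rack budget by the 700-W TDP quoted above. It is a rough upper bound only: it assumes, purely for illustration, that the whole rack budget goes to accelerators and ignores CPUs, memory, networking, fans, and conversion losses.

```python
# Figures taken from the article; the "all power goes to GPUs" simplification
# is an assumption made only for this back-of-the-envelope estimate.
GPU_TDP_W = 700.0
NEW_RACK_W = 90_000.0
LEGACY_RACK_W = (15_000.0, 30_000.0)

def max_accelerators(rack_budget_w: float, tdp_w: float = GPU_TDP_W) -> int:
    """Upper bound on accelerators a rack budget could feed at full TDP."""
    return int(rack_budget_w // tdp_w)

new_count = max_accelerators(NEW_RACK_W)
legacy_counts = [max_accelerators(w) for w in LEGACY_RACK_W]
ratio_vs_30kw = NEW_RACK_W / LEGACY_RACK_W[1]

print(f"90-kW rack: up to {new_count} accelerators at 700 W each")
print(f"15- to 30-kW rack: up to {legacy_counts[0]} to {legacy_counts[1]}")
print(f"90 kW is {ratio_vs_30kw:.0f}X a 30-kW rack budget")
```

Under these assumptions, a 90-kW rack is three times the budget of a 30-kW rack, which matches the "up to 3X" increase cited above.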

What must change to deliver power safely and reliably to these densely packed server racks? What about limiting and removing the heat generated in the process? For Stephen Oliver, VP of marketing and sales at Navitas Semiconductor, the solution starts in the power electronics. He contends that the silicon MOSFETs widely used today to convert and regulate the power entering each rack are reaching the end of their rope.

For its part, Navitas rolled out a new generation of gallium-nitride (GaN)-based power ICs, called GaNSafe, that “break through the glass ceiling” into high power levels reaching 22 kW in data centers, solar, and other renewable-energy storage, and even electric-vehicle (EV) subsystems. It’s placing everything possible into a single GaN power IC, including control, gate-drive, sensing, and protection features that lend themselves to the harsh realities of these applications.

While silicon still dominates, many of the movers and shakers in the world of power semiconductors, such as Infineon and TI, are also pushing for power switching devices based on GaN as an alternative. GaN is the gold standard for consumer fast-chargers and power adapters, but the challenge is proving that it can handle even higher power levels (thousands of watts at a time) as safely and robustly as silicon MOSFETs, said Oliver.

He added, “GaN is relatively new technology, so expanding into these high-reliability areas has taken time.”

Power Supplies: Translating Electricity from the Grid

Once electricity enters a data center from the power grid, it travels through several different checkpoints, where it’s converted, regulated, and conditioned as it gets closer to the processors in servers. These chips require stable, clean DC power at very precise voltages. If the voltage is insufficient, the chip will fail to switch correctly. Overloading the chip with too much voltage can cause permanent damage to its delicate circuitry.

As power traverses a series of transformers when entering the data center, one of the core building blocks it encounters before hitting the processor is a switch-mode power supply (SMPS). Housed in a “silver box,” its job is to step down the high-voltage electricity entering the rack (typically 220 V AC) to a lower DC voltage that can be successfully used by the circuit board housing AI silicon. For a long time, 12 V was the standard DC voltage output by a power supply, but technology giants are currently raising their standards up to 48 V.

The key to minimizing power loss is to deliver power at higher voltages and lower currents and then step down the voltage as close as possible to server processors. Since electrical power is equal to the current times the voltage (P = I × V), upgrading the power delivery from 12 V to 48 V means that only one-quarter of the current is required to deliver the same power. And because resistive losses in the wiring scale with the square of the current (P = I² × R), power losses in the distribution path will be 16X less in the end. Elevating the voltage also makes it possible to use thinner, lighter copper wiring to ferry power through the system, saving on cost.
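The arithmetic behind the 16X figure can be sketched in a few lines of Python. Only the 12-V and 48-V bus voltages come from the text; the 1-kW load and the 1-mΩ path resistance are illustrative assumptions, and the ratios do not depend on them.

```python
def bus_current(power_w: float, voltage_v: float) -> float:
    """Current needed to deliver power_w at voltage_v (I = P / V)."""
    return power_w / voltage_v

def conduction_loss(current_a: float, resistance_ohm: float) -> float:
    """Resistive loss in the distribution path (P_loss = I^2 * R)."""
    return current_a ** 2 * resistance_ohm

LOAD_W = 1000.0    # assumed load, for illustration only
PATH_OHM = 0.001   # assumed 1-milliohm path resistance, for illustration only

i_12 = bus_current(LOAD_W, 12.0)
i_48 = bus_current(LOAD_W, 48.0)
loss_12 = conduction_loss(i_12, PATH_OHM)
loss_48 = conduction_loss(i_48, PATH_OHM)

current_ratio = i_12 / i_48
loss_ratio = loss_12 / loss_48

print(f"12 V bus: {i_12:.1f} A, {loss_12:.2f} W lost in the path")
print(f"48 V bus: {i_48:.1f} A, {loss_48:.2f} W lost in the path")
print(f"current ratio {current_ratio:.0f}X, loss ratio {loss_ratio:.0f}X")
```

Because the load and the resistance cancel out of both ratios, any other assumed values would give the same 4X reduction in current and 16X reduction in resistive loss.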

Even so, server power supplies are becoming more of a challenge to run effectively as the demands for data-center rack power continue to rise, said Oliver. While the silicon MOSFETs that have long dominated power electronics are constantly improving, they fall short when it concerns AI chip power delivery. “You are throwing money at the problem by using more—and larger—silicon transistors than you need,” he noted.

Given the challenges of supplying power to AI accelerators, Oliver said it finally makes sense to swap silicon out for GaN FETs. Using the unique material properties of GaN, these chips reduce gate charge (Qg), output capacitance (COSS), and reverse-recovery charge (QRR) and limit power losses at faster switching frequencies. So, while they are more expensive per unit, they end up bringing better power density to the table, said Navitas.

Oliver pointed out that “if you want three times more power, it’s a lot easier to slide in GaN power electronics. Upgrading from silicon, which is at the end of its life, to GaN is a game changer for the data center.”

GaN Geared for AI Power Delivery

Navitas is upgrading its GaN power ICs to check more of the boxes for AI power delivery in the data center.

GaNSafe is based on the same basic blueprint as the company’s existing GaNFast family, integrating a dc-dc bias supply and gate driver with control and sensing in a single chip to deliver higher power density. But it also adds “application-specific” protection features in a 10- × 10-mm, 4-pin TOLL package, which is physically more robust to survive mechanical fatigue as well as harsh temperatures and other grueling operating conditions.

According to Navitas, the GaN power ICs are significantly larger and 2.5X thicker than the last generation to improve their power-handling capabilities. The power FET inside operates at voltages of up to 650 V—typically, the upper limit for GaN—with the ability to tolerate large transient voltages of up to 800 V to aid with survival in extraordinary conditions. Navitas said the family covers a range of RDS(ON) from 35 to 98 mΩ.

A wider safe operating area allows the ICs to handle higher power levels during overcurrent scenarios. They also stand out with 2 kV of electrostatic-discharge (ESD) protection, fortifying the system against ESD events.

Targeted at 1- to 22-kW systems, the GaNSafe chips also have protected and regulated gate-drive control, with minimal gate-source loop inductance to curtail ringing and reduce the risk of damaging voltage spikes. Reducing the inductance also opens the door to reliable high-frequency switching of up to 2 MHz. Faster switching rates allow for the use of smaller transformers, chokes, capacitors, and other passives in the system, adding to power density. GaN also increases slew rate during turn-on and turn-off, limiting switching losses.

The GaN power ICs also bring very fast short-circuit protection to the table. For rapid detection of potentially harmful short-circuit conditions—also called “desaturation detection”—the chips have what Navitas calls its autonomous “detect and protect” feature, making it possible for the power IC to sense and respond to short circuits without external assistance. This feature facilitates shutdown times of 50 ns or less, said Navitas.

Rapid responses prevent unusual conditions in the power electronics from affecting the rest of the system. “Safety in this context means the power supply won’t fail,” said Oliver. “Data centers are all about uptime.”

Electromagnetic interference (EMI) is one of the risks for any high-reliability electronic system. To effectively reject it, Navitas said GaNSafe gives customers the ability to precisely and dynamically control the turn-on and turn-off speeds (dV/dt) of the circuit. Integrating programmable resistors and diodes at the input and output pins of the GaN power IC enables real-time adjustments to the dV/dt rate. This is crucial for mitigating undesired voltage spikes that can violate EMI specifications.

Don’t Overlook Data-Center Power Supplies

The TOLL package is a cut above in terms of mechanical durability and heat dissipation. Navitas said the new packaging gives its GaN power ICs more robust performance compared to multichip modules (MCMs), which require three times as many connections and have trouble staying cool. The TOLL package is also equipped with a larger, thicker copper pad to optimize heat removal.

While the TOLL package is easy to cool compared to a standard QFN, the high efficiency of GaN also means less heat is generated in the first place. “For every watt of power burned on the way from the power supply to the processor, that is another watt that you have to air condition away from it,” said Oliver. The rule of thumb in data centers, he pointed out, is that for every dollar you spend on electricity for a processor, you spend another cooling the system.

While the power supply itself is easy to overlook, even small efficiency gains when converting power add up for hyperscalers when the process happens as often as it does in data centers. As their fast-switching speeds reduce the area, weight, and cost of the passives in a power supply, Navitas estimates its new GaN power ICs save 5% of the LLC-stage cost, plus more than $60 of savings per power supply in electricity over three years.

Though not required, GaN power FETs are frequently paired with the latest power topologies. This includes the interleaved CCM totem-pole PFC topology with full-bridge LLC used in the latest 3.2-kW server reference design from Navitas—the CRPS185. In this case, the Common Redundant Power Supply (CRPS) is a standard form factor defined by the Open Compute Project (OCP) that includes companies such as Dell, Google, Intel, Meta, and Microsoft.

According to Navitas, the power-supply design uses the GaNSafe family of power ICs to fit 3200 W of power into a single silver box that measures 40 × 73.5 × 185 mm, delivering close to 100 W/in.³ of power density.
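The power-density claim can be checked from the quoted dimensions. The sketch below uses only the 3200-W rating and the 40 × 73.5 × 185 mm enclosure from the text, and treats the box as a plain rectangular volume, which is a simplifying assumption.

```python
MM3_PER_IN3 = 25.4 ** 3  # 1 in = 25.4 mm exactly, so 1 in^3 = 25.4^3 mm^3

def power_density_w_per_in3(power_w: float, dims_mm: tuple) -> float:
    """Rated power divided by the enclosure volume, in W per cubic inch."""
    width, height, depth = dims_mm
    volume_in3 = (width * height * depth) / MM3_PER_IN3
    return power_w / volume_in3

density = power_density_w_per_in3(3200.0, (40.0, 73.5, 185.0))
print(f"Power density: {density:.1f} W/in^3")
```

The result comes out at roughly 96 W/in.³, consistent with the "close to 100" figure.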

Everything inside the power supply squeezes into a 40% smaller area than it would with legacy silicon. It hits over 96.5% efficiency at a 30% load and 96% across a 20% to 60% load range, better than the “Titanium” benchmark.
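To see what such an efficiency figure means in watts, the sketch below works through the 30% load point of a 3200-W supply at 96.5% efficiency. Treating the efficiency as exactly 96.5% is an assumption, since the article says "over 96.5%," so the real loss would be slightly lower.

```python
def input_and_loss(output_w: float, efficiency: float) -> tuple:
    """Return (input power, dissipated power) for a given output and efficiency."""
    input_w = output_w / efficiency
    return input_w, input_w - output_w

RATED_W = 3200.0            # rated output from the article
output_w = 0.30 * RATED_W   # 30% load point
input_w, loss_w = input_and_loss(output_w, 0.965)

print(f"Output: {output_w:.0f} W")
print(f"Input:  {input_w:.1f} W")
print(f"Lost as heat in the supply: {loss_w:.1f} W")
```

At this operating point, about 35 W is dissipated in the supply, and by Oliver's rule of thumb each of those watts must also be removed by the cooling system.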

“The less space taken up by a power supply, the more space they [the hyperscalers] have for computer and memory chips in the server and in the rack,” said Oliver. “If you don’t go to GaN technologies, you eventually will not be able to put as many processors in each server rack as you’d like to.”

Despite its advantages over silicon, upgrading to GaN power supplies is not a cure-all for the challenges of powering AI chips, which will require many other changes up and down the power-delivery network (PDN).
