11 Myths About Liquid-Cooling Technology
With the rapid emergence of 50- to 100-MW AI factories built to support massive AI workloads, liquid cooling has become a critical requirement for nearly every data center around the globe. These facilities were already struggling to manage their heat and footprint. Now they must figure out how to bring in the next generation of AI superchips operating at 2,800 W and beyond.
There’s no question that the only way to handle the heat from these new AI-powered chips is to use liquid-cooling technology. That’s why this market is projected to surge from $4.1 billion in 2024 to $19.4 billion by 2031.
Whether you’re a chipmaker, server manufacturer, OEM, hyperscaler, or data center or hyperscale operator, you know you need it. What you may not know are the specifics of the different liquid-cooling options, and how to maximize their benefits while keeping costs down and sustainability up.
This article aims to answer those questions by laying out the 11 most common myths surrounding liquid-cooling technology.
1. Immersion and direct-to-chip liquid cooling are pretty much the same thing.
This is one of the biggest myths out there, and it’s wrong. Liquid-cooling technologies fall into two broad categories: immersion and direct-to-chip (Fig. 1). Direct-to-chip is often called “cold plate” cooling because it uses cold plates that sit on top of the GPU or CPU. Immersion cooling, on the other hand, submerges the servers, chips, and other equipment in large, heavy tanks of fluid.
2. Liquid cooling uses water inside the servers.
This statement is correct, but only for single-phase direct-to-chip liquid cooling, where water or a water-glycol mix serves as the coolant in the cold plate. The water stays liquid, so this method’s ability to remove heat depends on water flow: the higher the chip’s power, the greater the flow required. That in turn requires investment in larger pipes, tubes, and connectors, as well as power-hungry pumps to keep the water moving through the system.
In contrast, neither immersion nor two-phase direct-to-chip liquid cooling uses water to carry heat away from the CPUs or GPUs (these systems connect to facility water loops to condense the vapor back to liquid or to cool down the liquid). Single-phase immersion uses an oily fluid, while two-phase immersion uses a dielectric fluid. In both cases, the servers and IT equipment are submerged in heavy tanks filled with the fluid.
Two-phase direct-to-chip technology uses compact cold plates that sit on top of the GPUs. A heat-transfer fluid contained inside each cold plate absorbs heat from the components. Unlike in immersion cooling, this liquid never comes into contact with the chips or other server components (Fig. 2).
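The flow requirement for single-phase direct-to-chip cooling follows from the sensible-heat balance Q = ṁ·c_p·ΔT: at a fixed allowed coolant temperature rise ΔT, the mass flow ṁ must grow in proportion to chip power Q. The Python sketch below assumes pure water (c_p ≈ 4,186 J/(kg·K), density ≈ 1 kg/L) and a hypothetical 10 K temperature rise; a water-glycol mix has a lower specific heat and would need somewhat more flow.

```python
# Minimal sketch: coolant flow needed by a single-phase cold plate.
# Assumptions (illustrative, not from the article): pure water,
# c_p = 4186 J/(kg*K), density = 1.0 kg/L.

WATER_CP_J_PER_KG_K = 4186.0
WATER_DENSITY_KG_PER_L = 1.0


def required_flow_l_per_min(chip_power_w: float, delta_t_k: float) -> float:
    """Volumetric flow (L/min) needed to absorb chip_power_w with a
    coolant temperature rise of delta_t_k, from Q = m_dot * c_p * dT."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_per_s = chip_power_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_per_s / WATER_DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    # Hypothetical 10 K allowed rise; 2,800 W matches the chip power cited above.
    for power_w in (700, 1400, 2800):
        flow = required_flow_l_per_min(power_w, delta_t_k=10.0)
        print(f"{power_w:>5} W -> {flow:.2f} L/min")
```

Because the relationship is linear, doubling chip power at the same temperature rise doubles the flow that the pumps, pipes, and connectors must handle.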
3. If I want AI performance, I need to throw sustainability out the window.
This is absolutely not the case, as long as you pick a sustainable liquid-cooling solution. To make sure you’re building for sustainability, you need to ask the following questions:
- Does the liquid-cooling technology use water? This is an important question because a 100-MW data center using single-phase direct-to-chip cooling can consume approximately 1.1 million gallons of water every day. Water is already a scarce resource globally, so the best approach is to use a waterless system.
- What is the power usage effectiveness (PUE) of the system? PUE is total facility power divided by IT equipment power, so the closer it is to 1.0, the more efficient the operation.
- Do I need to rebuild, or can I retrofit my existing data center to accommodate the next generation of AI GPUs?
- What’s the infrastructure investment that goes along with the liquid-cooling technology? If you need large heavy tanks, pumps, and tubing, it costs money and takes up valuable space.
- What are the long-term maintenance costs? Does the liquid have to be replaced?
- What is the lifespan of the equipment that comes in contact with the liquid?
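Two of these questions reduce to simple ratios. PUE divides total facility power by IT equipment power (1.0 would mean zero overhead), and the daily water figure above can be normalized per kilowatt-hour of IT energy, similar in spirit to water usage effectiveness (WUE), so sites of different sizes can be compared. This sketch assumes the 100 MW refers to IT load running continuously; the 120 MW facility total in the example is hypothetical.

```python
# Minimal sketch: two sustainability ratios from the checklist above.
# Assumptions: the 100 MW figure is continuous IT load; the 120 MW
# facility total used in the example is hypothetical.

LITERS_PER_US_GALLON = 3.785411784


def pue(total_facility_power_mw: float, it_power_mw: float) -> float:
    """Power usage effectiveness: total facility power / IT power (>= 1.0)."""
    if it_power_mw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_power_mw / it_power_mw


def water_liters_per_kwh(gallons_per_day: float, it_power_mw: float) -> float:
    """Daily water use normalized to daily IT energy (liters per kWh)."""
    if it_power_mw <= 0:
        raise ValueError("IT power must be positive")
    it_kwh_per_day = it_power_mw * 1000.0 * 24.0
    return gallons_per_day * LITERS_PER_US_GALLON / it_kwh_per_day


if __name__ == "__main__":
    # 1.1 million gallons/day for a 100 MW site is the article's figure.
    print(f"Water intensity: {water_liters_per_kwh(1.1e6, 100.0):.2f} L/kWh")
    # Hypothetical facility drawing 120 MW in total to support 100 MW of IT.
    print(f"PUE: {pue(120.0, 100.0):.2f}")
```

With these assumptions, the article’s figure works out to roughly 1.7 liters of water per kWh of IT energy.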
4. The use of dielectric fluid is bad for the environment.
Many of the dielectric fluids used in two-phase cooling are fluorinated compounds that fall under the broad perfluoroalkyl and polyfluoroalkyl substances (PFAS) umbrella. While PFAS were broadly considered dangerous in the past, certain PFAS options are now considered safe. Even so, when dealing with any PFAS, a best practice is to keep it in a contained system such as a closed loop.
If the fluid sits in tanks that must be opened during maintenance, some of it will inevitably escape into the atmosphere. Ask the liquid-cooling manufacturer whether their fluid ever needs to be replaced, whether it’s ever exposed to outside air, and, more importantly, what their plans are for moving toward “zero PFAS” in the future.
5. The heat from GPUs can’t be reused.
There is no reason the heat generated by AI GPUs can’t be used to warm nearby rooms or buildings, particularly in cities where everything is in close proximity. Liquid-cooling solutions such as two-phase direct-to-chip technology have been designed not only to let facilities capture and reuse this heat, but also to turn it into reusable energy.
6. Liquid cooling with cold plates can create hot spots.
Pool boiling inside a cold plate has long been the holy grail of liquid cooling, but until now, no one had figured out how to prevent the boiling bubbles from causing hot spots. To overcome this, ZutaCore developed a structure of fins with a porous, sponge-like wick located between them (Fig. 3).
The wick soaks up the liquid, and bubbles form at the interface between the wick, the liquid, and the fins. This prevents bubbles from forming on the surface and keeps cooling uniform.
7. I can get by with just air cooling.
Traditional air cooling is largely considered obsolete for these workloads because of the enormous amount of energy needed to power the fans and chillers, as well as the valuable real estate needed to house them. The advantages of moving from air cooling to liquid cooling are significant, and they add up with every additional watt of computing added to a facility.
For example, a data center using only air cooling needs 1 W of cooling for every watt of computing. That means 50% of the power goes to the cooling system alone! In contrast, with an advanced liquid-cooling technology, every watt of cooling supports 10 W of computing, cutting cooling’s share to roughly 9% (1 W out of every 11 W).
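The 50% figure follows from simple arithmetic: if every watt of cooling supports R watts of computing, cooling consumes 1/(1 + R) of the combined load. The sketch below applies that to the two ratios quoted above. The "compute+cooling ratio" it prints is a cooling-only, PUE-style figure that ignores other overheads such as power conversion, so treat it as an illustration rather than a full PUE.

```python
# Minimal sketch: share of total power consumed by cooling, given how many
# watts of computing each watt of cooling supports (ratios from the text).


def cooling_share(compute_w_per_cooling_w: float) -> float:
    """Fraction of (compute + cooling) power that goes to cooling."""
    if compute_w_per_cooling_w <= 0:
        raise ValueError("ratio must be positive")
    cooling_w = 1.0
    compute_w = compute_w_per_cooling_w
    return cooling_w / (cooling_w + compute_w)


if __name__ == "__main__":
    for label, ratio in (("air (1:1)", 1.0), ("liquid (1:10)", 10.0)):
        share = cooling_share(ratio)
        # (compute + cooling) / compute: a cooling-only, PUE-style ratio that
        # ignores other overheads such as power conversion and lighting.
        overhead = 1.0 + 1.0 / ratio
        print(f"{label}: cooling share {share:.1%}, compute+cooling ratio {overhead:.2f}")
```

For the air-cooled case this gives a 50.0% cooling share (ratio 2.00); for the liquid-cooled case, about 9.1% (ratio 1.10).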
8. A water leak won’t do any damage.
Hyperscalers are risk-averse, and with AI servers approaching the $350K range, running water through them is risky. A leak can also significantly delay or even halt production.
Just this past year, TweakTown reported that NVIDIA's next-gen GB200 Superchip was about to ship when leaks were discovered in the liquid-cooling system inside the AI server cabinet, causing delays. Beyond leaks, water can cause corrosion and erosion, and it requires continuous filtering and treatment to control biological growth.
9. Liquid cooling is limited by how hot the chips get in the future (maximum chip power).
Some liquid-cooling options do have limitations that will prevent them from scaling as chips move to higher power. That’s why it’s important to use future-proof methods such as the pool boiling used in two-phase cold plates. Inside the cold plate is a pool of heat-transfer fluid; when the chip generates heat, the liquid boils, and the heat is carried away as the liquid turns into vapor.
The liquid remains at a consistent boiling temperature as chip power rises, ensuring predictable thermal performance, provided the heat flux stays within the boiling surface’s limits and the vapor is condensed back to liquid. As a result, this cooling method can scale to hotter and hotter chips as they become available, without new equipment or infrastructure changes. It’s like a pot of water on the stove: triple the heat and the water still stays at its boiling point; it simply boils faster.
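The contrast between sensible and latent heat removal can be sketched numerically. In a single-phase loop at fixed flow, the coolant’s outlet temperature climbs with chip power. In a boiling pool, the fluid stays at its saturation temperature, and only the rate of vapor generation (power divided by the heat of vaporization, h_fg) increases. All fluid properties below, including the 50 °C boiling point and the 150 kJ/kg h_fg, are hypothetical illustrative values rather than data for any specific product. The model also ignores chip-surface superheat and the critical heat flux limit.

```python
# Minimal sketch contrasting sensible (single-phase) and latent (two-phase)
# heat removal as chip power rises. All fluid properties below are
# hypothetical, illustrative values, not data for any specific product.
from __future__ import annotations

WATER_CP_J_PER_KG_K = 4186.0     # single-phase water coolant
FLOW_KG_PER_S = 0.02             # hypothetical fixed pump flow, single-phase loop
INLET_TEMP_C = 30.0              # hypothetical single-phase inlet temperature

SATURATION_TEMP_C = 50.0         # hypothetical boiling point of the two-phase fluid
LATENT_HEAT_J_PER_KG = 150e3     # hypothetical heat of vaporization (h_fg)


def single_phase_outlet_c(power_w: float) -> float:
    """Outlet temperature at fixed flow: T_out = T_in + P / (m_dot * c_p)."""
    return INLET_TEMP_C + power_w / (FLOW_KG_PER_S * WATER_CP_J_PER_KG_K)


def two_phase_state(power_w: float) -> tuple[float, float]:
    """Fluid temperature (C) and vapor generation rate (g/s) while boiling.

    The fluid stays at saturation; only the boil-off rate P / h_fg grows.
    Valid only below the surface's critical heat flux and while the
    condenser keeps returning liquid at the same rate it boils off."""
    vapor_g_per_s = power_w / LATENT_HEAT_J_PER_KG * 1000.0
    return SATURATION_TEMP_C, vapor_g_per_s


if __name__ == "__main__":
    for p in (1000, 2000, 3000):
        t_sp = single_phase_outlet_c(p)
        t_tp, vapor = two_phase_state(p)
        print(f"{p} W: single-phase out {t_sp:.1f} C | "
              f"two-phase {t_tp:.1f} C, {vapor:.1f} g/s vapor")
```

Tripling the power in this toy model triples the vapor generation rate while the boiling fluid’s temperature stays put, which is the behavior the pot-of-water analogy describes.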
10. Liquid cooling requires a lot of maintenance.
Maintenance cost depends on the liquid-cooling approach. For example, if you use large, heavy tanks that require a forklift to lift servers out, maintenance will obviously be costly. To estimate ongoing maintenance costs, look at the whole system, identify every component that could fail, and determine how it would be fixed. These include tubes, pumps, tanks, and any fluid that must be replaced.
11. If I don’t have a facility water loop, I can’t use liquid cooling in my facility.
While some data centers have a facility water loop, it’s possible to deploy liquid cooling without one. This can be achieved with a liquid loop that relies on ambient air, rather than facility water, to condense the vapor back into liquid form. Such an approach enables the deployment of liquid-cooling infrastructure independently of the building’s existing water systems.
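When ambient air rather than facility water condenses the vapor, the air side must carry the full heat load, again following Q = ṁ·c_p·ΔT, and the air must be cooler than the fluid’s boiling point for condensation to happen at all. The sketch below assumes air near room conditions (c_p ≈ 1,005 J/(kg·K), density ≈ 1.2 kg/m³); the 40 kW rack load, 15 K air temperature rise, 50 °C boiling point, and 10 K minimum approach are hypothetical values chosen for illustration.

```python
# Minimal sketch: airflow an ambient-air condenser must move to reject a
# given heat load. Assumed air properties near room conditions:
# c_p = 1005 J/(kg*K), density = 1.2 kg/m^3.

AIR_CP_J_PER_KG_K = 1005.0
AIR_DENSITY_KG_PER_M3 = 1.2


def condenser_airflow_m3_per_s(heat_load_w: float, air_temp_rise_k: float) -> float:
    """Volumetric airflow from Q = m_dot * c_p * dT on the air side."""
    if air_temp_rise_k <= 0:
        raise ValueError("air temperature rise must be positive")
    mass_flow_kg_per_s = heat_load_w / (AIR_CP_J_PER_KG_K * air_temp_rise_k)
    return mass_flow_kg_per_s / AIR_DENSITY_KG_PER_M3


def has_condensing_margin(ambient_c: float, saturation_temp_c: float,
                          min_approach_k: float = 10.0) -> bool:
    """True if ambient air is at least min_approach_k (hypothetical default)
    below the fluid's boiling point; a necessary, not sufficient, check."""
    return saturation_temp_c - ambient_c >= min_approach_k


if __name__ == "__main__":
    # Hypothetical 40 kW rack, 15 K air temperature rise, 50 C boiling point.
    print(f"Airflow: {condenser_airflow_m3_per_s(40_000, 15.0):.2f} m^3/s")
    print("Condensing margin at 25 C ambient:", has_condensing_margin(25.0, 50.0))
```

Under these assumptions, the condenser would need to move roughly 2.2 m³/s of air; a larger allowed air temperature rise reduces that airflow proportionally.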
Conclusion: Liquid Cooling and Sustainability
Hopefully, the common myths I outlined above have helped clear up some of the mystery around liquid cooling. It’s an exciting time to be involved in the AI ramp-up, even though it has moved faster than anyone imagined. There has certainly been a learning curve, but the industry can now clearly see a path to maintaining its focus on sustainability while also getting the computing power it needs for the AI future.