Traditional air cooling has hit its limits as rack power densities surpass 100 kW, driven by the relentless growth of AI and high-performance computing (HPC) workloads. CPUs and GPUs already exceed 700–1,000 W per socket, with projections of 1,500 W and beyond. Fans and heat sinks simply cannot handle these thermal loads at scale.
Hybrid cooling strategies are becoming the only scalable, sustainable path forward.
Single-phase direct-to-chip (DTC) liquid cooling has emerged as the most practical and serviceable solution, delivering coolant directly to cold plates attached to processors and accelerators. However, Direct Liquid Cooling (DLC) cannot be scaled safely or efficiently with plumbing alone. The key enabler is the Coolant Distribution Unit (CDU) – a system that integrates pumps, heat exchangers, sensors, and control logic into a coordinated package.
CDUs are often mistaken for passive infrastructure. In reality, they act as the brains of DLC, orchestrating isolation, stability, adaptability, and efficiency to make DTC viable at data center scale. They serve as the intelligent control layer for the entire thermal management system.
Intelligent Orchestration
CDUs do far more than transport fluid around the cooling system. They think, adapt, and protect the liquid cooling portion of the hybrid cooling system: they maintain redundancy to ensure continuous operation, control flow and pressure using automated valves and variable-speed pumps, filter particulates to protect cold plates, and keep coolant temperature above the dew point to prevent condensation. In doing so, they provide the precise, intelligent, and flexible coordination the complete thermal management system requires.
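As a rough illustration of the dew-point safeguard described above, the sketch below estimates the room dew point with the Magnus approximation and raises the requested coolant supply setpoint whenever it would drop below that dew point plus a margin. The function names, the 2 °C margin, and the example readings are illustrative assumptions, not values from any particular CDU.

```python
import math

def dew_point_c(air_temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point (in °C) using the Magnus formula."""
    a, b = 17.62, 243.12
    gamma = math.log(rel_humidity_pct / 100.0) + (a * air_temp_c) / (b + air_temp_c)
    return (b * gamma) / (a - gamma)

def safe_supply_setpoint(requested_setpoint_c: float,
                         room_temp_c: float,
                         room_rh_pct: float,
                         margin_c: float = 2.0) -> float:
    """Keep the coolant supply setpoint above the room dew point plus a margin,
    so condensation cannot form on cold plates or manifolds (assumed 2 °C margin)."""
    floor_c = dew_point_c(room_temp_c, room_rh_pct) + margin_c
    return max(requested_setpoint_c, floor_c)

# Example: a 25 °C room at 60% RH has a dew point near 16.7 °C,
# so a requested 17 °C supply is raised to roughly 18.7 °C.
print(safe_supply_setpoint(17.0, room_temp_c=25.0, room_rh_pct=60.0))
```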
Because of their greater cooling capacity, CDUs are ideal for large HPC data centers. They do add complexity, however, because they must be connected to the facility's chilled water supply or another heat rejection source to continuously deliver liquid to the cold plates.
CDUs typically fall into two categories:
· Liquid to Liquid (L2L): High-capacity L2L CDUs are well suited to large HPC facilities. Through heat exchangers, they transfer chip heat from the isolated IT loop into the facility's chilled water loop, such as the facility water system (FWS).
· Liquid to Air (L2A): L2A CDUs are simpler and suited to smaller deployments, but offer lower cooling capacity. Instead of a chilled water supply or FWS, they use liquid-to-air heat exchangers to reject heat from the coolant returning from the cold plates into the surrounding data center air, which is then handled by conventional HVAC systems. (A rough capacity comparison is sketched below.)
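To put the capacity difference in perspective, the sketch below applies the sensible-heat relation Q = ρ·V̇·cp·ΔT to two hypothetical loops. The flow rates and temperature rise are illustrative assumptions rather than specifications of any real L2L or L2A unit.

```python
# Rough sensible-heat capacity of a coolant loop: Q = rho * V_dot * cp * dT.
# The flow rates and temperature rise below are illustrative assumptions,
# not specifications for any particular CDU.

RHO_WATER = 997.0   # kg/m^3 at ~25 °C
CP_WATER = 4181.0   # J/(kg*K)

def loop_capacity_kw(flow_lpm: float, delta_t_c: float) -> float:
    """Heat a water loop can carry at a given flow (L/min) and temperature rise (°C)."""
    flow_m3s = flow_lpm / 1000.0 / 60.0
    return RHO_WATER * flow_m3s * CP_WATER * delta_t_c / 1000.0  # kW

# Example: a large L2L CDU moving 600 L/min with a 10 °C rise carries ~417 kW,
# while a small in-row L2A unit at 60 L/min carries only ~42 kW.
print(loop_capacity_kw(600, 10))
print(loop_capacity_kw(60, 10))
```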
Isolation: Safeguarding IT from Facility Water
CDUs act as the bridge between the FWS and the dedicated technology cooling system (TCS), which delivers filtered liquid coolant directly to the chips via cold plates. In this role, they isolate sensitive server cold plates from external variability, ensuring a safe and stable environment while constantly adjusting to shifting workloads.
One of the L2L CDU's primary functions is to create a dual-loop architecture:
· Primary loop (facility side): connects to building chilled water, district cooling, or dry coolers.
· Secondary loop (IT side): delivers conditioned coolant directly to IT racks.
CDUs isolate the primary loop (which may carry contaminants, particulates, scaling agents, or chemical treatments such as biocides and corrosion inhibitors – chemistry that is incompatible with IT gear) from the secondary loop. Beyond preventing corrosion and fouling, this isolation provides the safety margin operators need for board-level confidence in liquid cooling.
The integrity of the server cold plates is safeguarded by the CDU, which uses a heat exchanger to separate the two environments and maintain a clean, controlled fluid in the IT loop. Because CDUs are fitted with variable speed pumps, automated valves, and sensors, they can dynamically adjust the flow rate and pressure of the TCS to ensure optimal cooling even when HPC workloads change.
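A minimal sketch of that dynamic adjustment is shown below, assuming a simple proportional control step on a secondary-loop flow setpoint. The gain, speed limits, and sensor readings are hypothetical; real CDU firmware is considerably more sophisticated.

```python
# Minimal sketch of how a CDU controller might trim variable-speed pumps to
# hold a secondary-loop flow setpoint as rack demand changes. The gain,
# limits, and sensor readings are illustrative assumptions.

def pump_speed_step(speed_pct: float,
                    flow_setpoint_lpm: float,
                    flow_measured_lpm: float,
                    gain: float = 0.05,
                    min_pct: float = 20.0,
                    max_pct: float = 100.0) -> float:
    """One proportional control step: raise pump speed when flow falls short,
    lower it when flow is in excess, clamped to a safe operating range."""
    error = flow_setpoint_lpm - flow_measured_lpm
    return min(max_pct, max(min_pct, speed_pct + gain * error))

speed = 60.0
for measured in (480, 500, 540, 610):   # L/min readings as the workload ramps
    speed = pump_speed_step(speed, flow_setpoint_lpm=600, flow_measured_lpm=measured)
    print(f"measured {measured} L/min -> pump speed {speed:.1f}%")
```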
Stability: Balancing Thermal Predictability with Unpredictable Loads
HPC and AI workloads are not only high power; they are also volatile. GPU-intensive training jobs or fluctuating CPU workloads can cause high-frequency power swings that, without regulation, would translate into thermal instability. The CDU mitigates this risk by controlling temperature, pressure, and flow across all racks and nodes, absorbing dynamic changes and delivering predictable thermal conditions.
However erratic the workload, sensor arrays keep the cooling loop within specification, variable-speed pumps adjust flow to match demand, and heat exchangers are controlled to maintain an established approach temperature.
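The approach temperature mentioned above is, roughly, how close the IT-side supply can get to the facility-side supply across the heat exchanger. The short sketch below shows the resulting headroom calculation under assumed numbers; the 3 °C approach, 30 °C facility supply, and 40 °C cold-plate inlet limit are illustrative, not product figures.

```python
# The "approach" of a liquid-to-liquid heat exchanger is roughly how close the
# secondary (IT-side) supply temperature can get to the primary (facility-side)
# supply temperature. All numbers here are illustrative assumptions.

def secondary_supply_c(primary_supply_c: float, approach_c: float) -> float:
    """Coolest coolant the IT loop can receive for a given facility supply and approach."""
    return primary_supply_c + approach_c

facility_supply_c = 30.0    # warm-water facility loop (assumed)
design_approach_c = 3.0     # assumed plate heat exchanger approach
max_chip_inlet_c = 40.0     # assumed coolant inlet limit for the cold plates

it_supply_c = secondary_supply_c(facility_supply_c, design_approach_c)
print(f"IT supply ≈ {it_supply_c} °C; "
      f"headroom to cold-plate limit: {max_chip_inlet_c - it_supply_c} °C")
```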
Adaptability: Bridging Facility Constraints with IT Requirements
The thermal architecture of data centers varies widely; some use warm-water loops that operate between 20 and 40°C. The CDU adapts to these variations by adjusting secondary loop conditions to align IT requirements with the facility. It uses mixing or bypass control to temper supply water, can alternate between tower-assisted cooling, free cooling, or dry cooler rejection depending on environmental conditions, and can adjust flow distribution among racks to match real-time demand.
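The sketch below illustrates two of these adaptations in simplified form: selecting a heat-rejection mode from outdoor temperature, and computing a mixing fraction to temper the secondary supply. The thresholds and temperatures are assumptions chosen purely for illustration.

```python
# Hedged sketch of two adaptations described above: choosing a heat-rejection
# mode from outdoor conditions, and computing a mixing fraction to temper the
# secondary supply. Thresholds and temperatures are illustrative assumptions.

def rejection_mode(outdoor_temp_c: float) -> str:
    """Pick a heat-rejection strategy from outdoor temperature (assumed thresholds)."""
    if outdoor_temp_c < 18.0:
        return "free cooling"
    if outdoor_temp_c < 30.0:
        return "dry cooler rejection"
    return "tower-assisted cooling"

def mix_fraction(supply_c: float, return_c: float, target_c: float) -> float:
    """Fraction of warm return water blended into the supply to hit a target temperature."""
    if return_c == supply_c:
        return 0.0
    return min(1.0, max(0.0, (target_c - supply_c) / (return_c - supply_c)))

print(rejection_mode(12.0))                                       # free cooling
print(mix_fraction(supply_c=25.0, return_c=40.0, target_c=32.0))  # ≈ 0.47 of return water
```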
This adaptability makes DTC deployable in a variety of infrastructures without requiring extensive facility renovations. It also makes it possible for liquid cooling to be phased in gradually – ideal for operators who need to make incremental upgrades.
Efficiency: Enabling Sustainable Scale
Beyond risk and reliability, CDUs unlock possibilities that make liquid cooling a sustainable option.
By managing flow and temperature, CDUs eliminate the inefficiencies of over-pumping and over-cooling. They also maximize the scope for free cooling and heat recovery integration, such as connecting to district heating networks and reclaiming waste heat as a revenue stream or sustainability benefit. This allows operators to lower PUE (Power Usage Effectiveness) below 1.1 while simultaneously reducing WUE (Water Usage Effectiveness) by minimizing evaporative cooling – all while meeting the extreme thermal demands of AI and HPC workloads.
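For reference, the sketch below spells out how those two ratios are computed and how a lean cooling overhead brings PUE under 1.1. The energy and water figures are invented for illustration, not measurements from any facility.

```python
# PUE = total facility energy / IT energy; WUE = water consumed / IT energy.
# The energy and water figures below are illustrative assumptions, not
# measurements from any specific facility.

def pue(it_energy_kwh: float, cooling_kwh: float, other_overhead_kwh: float) -> float:
    return (it_energy_kwh + cooling_kwh + other_overhead_kwh) / it_energy_kwh

def wue(water_liters: float, it_energy_kwh: float) -> float:
    return water_liters / it_energy_kwh   # L/kWh

# A hypothetical 10 MWh-per-day IT load with a lean liquid-cooling overhead:
print(pue(10_000, cooling_kwh=600, other_overhead_kwh=300))   # ≈ 1.09
# Minimizing evaporative cooling cuts water draw per unit of IT energy:
print(wue(water_liters=2_000, it_energy_kwh=10_000))          # 0.2 L/kWh
```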
CDUs as the Thermal Control Plane
Viewed holistically, CDUs are far more than pumps and pipes. They are the control plane for thermal management, orchestrating safe isolation, dynamic stability, infrastructure adaptability, and operational efficiency.
They translate unpredictable IT loads into manageable facility-side conditions, ensuring that single-phase DTC can be deployed at scale and enabling HPC and AI data centers to evolve toward multi-hundred-kilowatt racks without thermal failure.
Without CDUs, direct-to-chip cooling would be risky, uncoordinated, and inefficient. With CDUs, it becomes an intelligent and resilient architecture capable of supporting 100 kW and higher racks and the escalating thermal demands of AI and HPC clusters.
As workloads continue to climb and rack power densities surge, the industry’s ability to scale hinges on this intelligence. CDUs are not a supporting component. They are the enabler of single-phase DTC at scale and a cornerstone of the future data center.