Preparing for Rubin: Getting Your Data Center Ready for Dense Inference and DLC

NVIDIA Rubin isn’t just “the next GPU.” It’s a density and thermals inflection point. 

Hopper and Blackwell forced enterprise operators to take 40–120 kW racks seriously and to experiment with liquid cooling. Rubin-class platforms push racks into the 100–200+ kW range, with each GPU around 2.3 kW TDP and some rack-scale designs exceeding 230 kW. At this level you no longer choose whether to use direct liquid cooling (DLC): you either design for DLC or you don’t deploy Rubin. 

If you want dense LLM inference at Rubin scale without cooking your data hall, you have to treat Rubin readiness as its own design track. It won’t be a light upgrade from your last GPU refresh. 

 This article outlines how to prepare for Rubin-generation GPUs and how AHEAD can help. 

Most Enterprises Won’t Start with a Full NVL72 (And That’s OK) 

Most of the hype focuses on NVIDIA Vera Rubin NVL72 racks: 72 Rubin GPUs, 36 Vera CPUs, and 190–230 kW of IT load in a single cabinet. A full NVIDIA® DGX SuperPOD adds networking, management, and storage racks on top of that, delivering huge inference and agentic capacity along with an equally huge power and cooling budget. That’s an important upper bound, but not where most enterprises need to begin. 

A few key reasons why it’s OK to start smaller: 

  • Smaller deployments already have enormous capacity. A single Rubin GPU delivers roughly 50 PFLOPS of low-precision (NVFP4) inference, several times what a Blackwell GPU offers. Eight GPUs in a DGX Rubin NVL8 give you hundreds of PFLOPS of NVFP4 inference per node, which is plenty for initial enterprise LLM and RAG workloads. 
  • Utilization is usually the constraint. Very few organizations have the sustained, well-governed LLM traffic to keep a full NVL72 busy. Data pipelines, storage, model lifecycle, and application integration almost always bottleneck before GPU silicon does. 
  • Facilities lead times dwarf hardware lead times. You can often stand up a small DLC Rubin cluster in months if you have spare power and some chilled-water headroom. Re-plumbing a hall for multiple 130–200+ kW DLC racks can easily take 18–36 months, and many sites have no additional power, meaning entire rows of legacy gear must be retired just to free up enough capacity for a single NVL72 row. 

For many enterprises, 4–8 NVIDIA HGX or NVIDIA DGX Rubin NVL8 systems are a far more realistic first step than a Vera Rubin NVL72 rack or SuperPOD, and they still represent a huge jump in on-prem AI capability. 

What 4–8 DGX Rubin NVL8s Actually Mean for the Data Center

A DGX Rubin NVL8 node roughly includes: 

  • 8x Rubin GPUs on NVIDIA NVLink 
  • ~400 PFLOPS NVFP4 inference 
  • Dual host CPUs (e.g., Intel® Xeon® 6) 
  • ~2.3 TB of HBM plus high-bandwidth fabric 
  • 8x 800 Gb/s ConnectX®-9 ports for east–west traffic 
  • 2x 800 Gb/s BlueField®-4 DPUs for north–south or storage 

Per-node facilities requirements: 

  • ~24 kW of power at full utilization 
  • DLC as primary cooling: Initially, you might use a liquid-to-air (L2A) CDU, but serious planning should assume liquid-to-liquid (L2L) heat rejection. 

Scaling to a small pod: 

  • 4 systems: 
    • ~96 kW of IT load, plus networking, storage, and management. 
    • DLC via L2L or L2A depending on water readiness. 
    • Typically one dense MGX rack with ~4 × 33 kW power shelves (N+1), plus additional cabinets for networking and storage. 
    • Enough to run multiple production LLMs, internal copilots, and RAG services, supporting roughly 800–1,600 concurrent knowledge workers on a 70B-parameter model at 10–20 tokens per second (see the sizing sketch after this list). 
  • 8 systems: 
    • ~192 kW of IT load, plus supporting infrastructure. 
    • DLC should be L2L; L2A is only a stopgap if facility water isn’t yet available. 
    • Expect two dense MGX racks plus networking and storage. 
    • Roughly 1,600–3,200 concurrent knowledge workers under similar assumptions, which is a serious multi-tenant AI pod for most enterprises. 
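
As a sanity check on the figures above, here is a rough back-of-envelope sizing sketch in Python. The ~24 kW per node, 33 kW power shelf rating, and 10–20 tokens-per-second-per-user figures come from this article; the aggregate tokens-per-second numbers per pod are illustrative assumptions chosen to land in the user ranges quoted above, not benchmarks.

```python
import math

# Back-of-envelope sizing for a small DGX Rubin NVL8 pod.
# Values marked "assumption" are illustrative, not vendor specifications.

NODE_POWER_KW = 24.0    # ~24 kW per NVL8 node at full utilization (from this article)
SHELF_KW = 33.0         # 33 kW MGX power shelf rating used in this article
TOKENS_PER_USER = 15.0  # 10–20 tokens/s per knowledge worker (midpoint of article range)

def size_pod(nodes: int, pod_tokens_per_s: float) -> dict:
    """Estimate IT load, N+1 power shelves, and concurrent users for an NVL8 pod."""
    it_load_kw = nodes * NODE_POWER_KW
    shelves_n_plus_1 = math.ceil(it_load_kw / SHELF_KW) + 1  # enough shelves, plus one spare
    concurrent_users = pod_tokens_per_s / TOKENS_PER_USER
    return {
        "it_load_kw": it_load_kw,
        "power_shelves_n_plus_1": shelves_n_plus_1,
        "concurrent_users": round(concurrent_users),
    }

# Aggregate served tokens/s per pod are assumptions chosen to match the ranges above.
print(size_pod(4, 12_000))  # ~96 kW IT load, 4 shelves (N+1), ~800 concurrent users
print(size_pod(8, 24_000))  # ~192 kW IT load, 7 shelves (N+1), ~1,600 concurrent users
```

The point is not precision; it is that a spreadsheet-level model of power, shelves, and concurrency is enough to drive the early facility conversations.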

Either size cluster can: 

  • Serve tens of thousands of tokens per second across mixed models. 
  • Run fine-tuning or post-training jobs alongside inference. 
  • Support both online services (e.g. chat, agents, copilots) and offline pipelines (e.g. batch summarization, embeddings, analytics). 

You don’t need a Vera Rubin NVL72 SuperPOD to feel the infrastructure impact. A 4–8 node DGX Rubin NVL8 cluster is more than enough to expose weaknesses in your power, cooling, and operational model. 

Treat Any Rubin Deployment as a Rubin-Class Density Problem 

You might land a few Rubin NVL8s without major facility changes, but your CTO won’t be happy when it’s time to grow and the site can’t support it. Whether you expect five nodes or fifty, plan as though you’re designing for Rubin-class density and work backward from the facility. 

Per-Rack Power Envelope 

  • Aim for 190–230 kW per rack capability in at least a subset of your whitespace, even if you start smaller. 
  • Future generations will push toward 400+ kW per rack, so keep that in mind if you’re already investing in plant upgrades. 
  • In addition, crossing the 400+ kW boundary will bring 800 V DC input into the cabinet and require additional planning. 
  • If your current design tops out at 30–40 kW per rack, you’re not Rubin-ready. You may land a small starter cluster, but growth will quickly saturate capacity. 
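
To make that concrete, here is a tiny back-of-envelope check of how many ~24 kW NVL8 nodes fit under different per-rack power budgets; the per-node figure comes from this article, and the small per-rack allowance for switches and management is an assumption.

```python
# How many ~24 kW DGX Rubin NVL8 nodes fit under different rack power budgets?
NODE_KW = 24.0       # ~24 kW per NVL8 node at full utilization (from this article)
OVERHEAD_KW = 5.0    # assumed allowance for switches and management per rack

for rack_budget_kw in (30, 40, 120, 200, 230):
    usable_kw = rack_budget_kw - OVERHEAD_KW
    nodes = max(0, int(usable_kw // NODE_KW))
    print(f"{rack_budget_kw:>3} kW rack budget -> {nodes} NVL8 node(s)")
```

Under these assumptions, a legacy 30–40 kW rack holds a single node, while a 190–230 kW envelope is what it takes to put a full eight-node pod in one cabinet.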

Coolant Distribution 

  • Every cabinet will require access to a technology cooling system (TCS) loop fed by a coolant distribution unit (CDU). 
  • The CDU can be in-rack, in-row, or in a gallery space; the choice is driven by overall capacity and redundancy requirements. 

MGX and ORv3 Racks Change the Physical Assumptions 

Rubin doesn’t just stress power and cooling; it changes rack form factors. MGX and ORv3 were optional for Blackwell, but with Rubin they become the expected baseline. 

Both MGX (19″) and ORv3 (21″) racks: 

  • Deliver power via integrated DC busbars and 48 V power shelves, not traditional AC PDUs. 
  • Use blind-mate DLC manifolds for coolant distribution. 
  • Are often engineered as rack-scale systems following NVIDIA reference designs. 

For facilities, these are not drop-in replacements: 

  • Footprint and clearances may differ from legacy 19″ EIA racks. 
  • Power entry shifts to high-amperage 400–415 V 3-phase feeds into the shelves. 
  • Equipment density plus coolant significantly increases rack weight, often challenging raised-floor ratings. 

Assume at least a portion of your environment will move to MGX/ORv3, and verify that floor loading, containment, busway routing, and maintenance practices can support them. 
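
To put numbers on why DC busbars and high-amperage feeds matter, here is a minimal current estimate for a ~192 kW rack, assuming a 48 V busbar and 415 V 3-phase input; the efficiency and power-factor values are illustrative assumptions.

```python
import math

# Rough electrical sizing for a dense MGX/ORv3 rack.
# Only the ~192 kW rack load comes from this article; other inputs are assumptions.

RACK_LOAD_KW = 192.0     # ~8 NVL8 nodes at ~24 kW each
BUSBAR_VOLTS = 48.0      # ORv3/MGX-style DC busbar voltage
FEED_VOLTS = 415.0       # 3-phase line-to-line input to the power shelves
SHELF_EFFICIENCY = 0.97  # assumed AC-to-DC conversion efficiency
POWER_FACTOR = 0.99      # assumed power factor of the shelves

# DC busbar current actually drawn by the rack's hardware.
busbar_amps = RACK_LOAD_KW * 1000 / BUSBAR_VOLTS

# 3-phase feeder current into the shelves: P_out = sqrt(3) * V_LL * I * PF * eff
feeder_amps = (RACK_LOAD_KW * 1000) / (
    math.sqrt(3) * FEED_VOLTS * POWER_FACTOR * SHELF_EFFICIENCY
)

print(f"DC busbar current:     ~{busbar_amps:,.0f} A")   # ~4,000 A at 48 V
print(f"3-phase feeder current: ~{feeder_amps:,.0f} A")  # ~280 A at 415 V
```

Four thousand amps on a busbar and feeders in the hundreds of amps per rack are simply not loads that traditional AC PDU designs were built around.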

Operating Rubin-Class Pods and Planning for Ultra 

Rubin-class DLC pods fail differently than conventional air-cooled rows. Day-2 success depends on operations design as much as on facilities. 

Plan for: 

  • End-to-end telemetry: Coolant temps, flow, and pressure; GPU power and temperatures; leak detection on trays, manifolds, and under-rack lines; integration into DCIM and observability tooling. 
  • Playbooks: What happens when a pump fails mid-inference, how to drain and service a rack without taking down the aisle, and who owns the boundary between facility water and IT loops. 
  • Spares and serviceability: Cold plates, manifolds, hose assemblies, sensors — not just GPUs and NICs — and rack designs that allow node swaps without disassembling major plumbing. 
  • Thermal change management: Firmware or power-profile changes across thousands of GPUs materially change thermal load and should be treated like high-risk production changes. 
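
As a concrete example of the telemetry and playbook items above, here is a minimal sketch of a threshold check that could feed DCIM or observability tooling; every field name and threshold is hypothetical rather than taken from any specific CDU, BMS, or DCIM product.

```python
from dataclasses import dataclass

# Hypothetical DLC telemetry snapshot; field names and thresholds are
# illustrative, not tied to a specific CDU, BMS, or DCIM product.

@dataclass
class CoolantReading:
    rack_id: str
    supply_temp_c: float       # technology coolant supply temperature
    return_temp_c: float       # return temperature back to the CDU
    flow_lpm: float            # loop flow in liters per minute
    leak_detected: bool        # aggregated leak-rope / drip-tray sensor state

def evaluate(reading: CoolantReading) -> list[str]:
    """Return alert strings for conditions worth paging on (assumed thresholds)."""
    alerts = []
    if reading.leak_detected:
        alerts.append(f"{reading.rack_id}: LEAK detected, isolate loop per playbook")
    if reading.supply_temp_c > 45.0:                           # assumed max supply temp
        alerts.append(f"{reading.rack_id}: supply temp {reading.supply_temp_c:.1f} C high")
    if reading.return_temp_c - reading.supply_temp_c > 15.0:   # assumed max delta-T
        alerts.append(f"{reading.rack_id}: delta-T above design envelope")
    if reading.flow_lpm < 60.0:                                # assumed minimum loop flow
        alerts.append(f"{reading.rack_id}: low coolant flow ({reading.flow_lpm:.0f} LPM)")
    return alerts

# Example: a reading that should trip the low-flow alert.
print(evaluate(CoolantReading("rack-a01", 32.0, 43.0, 48.0, False)))
```

Whatever tooling you actually use, the goal is the same: leak, temperature, delta-T, and flow conditions should page a human with a playbook, not sit in a dashboard nobody watches.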

Even if your first Rubin deployment is a single rack of DGX Rubin NVL8s, assume you might add a Vera Rubin Ultra NVL576 rack in the next cycle. You don’t need to build for 600 kW racks today, but you can: 

  • Reserve at least one row or cage with suitable floor loading, ceiling height, and access for a 600 kW+ rack and manifolds. 
  • Oversize, where practical, medium-voltage gear, main busways, and primary chilled-water distribution so you can step up to NVL576-class loads without opening walls and ceilings again. 
  • Standardize DLC practices, tooling, spares, and runbooks so your first Rubin pod serves as rehearsal. 
  • Align with colo and facilities roadmaps so contracts and capital plans reflect the path from 100–250 kW racks today to 600 kW tomorrow. 

Done well, the jump from a few DGX Rubin NVL8s to a Vera Rubin Ultra NVL576 pod becomes a capacity and business decision rather than a forced data-center redesign. 

How AHEAD Helps 

Rubin-class deployments are co-design efforts across facilities, networking, storage, security, and AI platform teams. 

AHEAD, through Foundry and our liquid-cooled rack integration practice, helps enterprises: 

  • Evaluate existing data centers and colos for Rubin-class density and DLC readiness. 
  • Design and build Rubin racks and 4–8 node DGX Rubin NVL8 clusters in the AHEAD Foundry integration facility. 
  • Co-design and deploy full Vera Rubin racks and SuperPODs (NVL72 and beyond), including networking, storage, security, and facilities requirements. 
  • Engineer and integrate Technology Cooling Systems tied into existing mechanical plants. 
  • Redesign power distribution and protection for 100–250+ kW racks. 
  • Identify and secure colo space purpose-built for higher densities. 
  • Architect GPU-based AI clusters with the right networking, storage, and security integration, and build them at Foundry. 
  • Deploy the instrumentation, observability, and Day-2 operations required for reliable Rubin-class AI infrastructure, including AHEAD managed services for both hardware and software. 

Rubin will raise the bar for what AI-ready infrastructure means. Organizations that treat dense inference and DLC as design inputs now, rather than emergency retrofits later, will be the ones ready to exploit Rubin silicon on day one. 

AHEAD’s goal is simple: when your AI teams are ready for Rubin, your data center will be too. 

About the author

David Smiley

Principal Specialist Solutions Engineer

David is passionate about delivering large scale IT infrastructure projects with customers and specializes in AI deployments.
