Beyond the Chip: Why AI Is Now an Enterprise Infrastructure Strategy Problem

Insights from NANOG 96

At NANOG 96, AHEAD’s Bill Conrades, Chris Stewart, and Michael Balasko did something refreshing. 

Instead of debating AI model architectures or chasing the latest GPU announcement, they focused on what actually determines whether enterprise AI initiatives ship on time and scale reliably: power, cooling, and network infrastructure. 

Their message to CIOs, CTOs, and network leaders was direct. 

If you want AI to scale across the enterprise, infrastructure cannot be treated as a downstream constraint. AI infrastructure strategy must be a first-class component of AI strategy from day one. 

This keynote recap highlights what matters most to enterprise leaders as AI moves from pilot to production and what that shift means for your data center, facilities, and network roadmap. 

Watch AHEAD’s Full NANOG 96 Keynote 

AI Infrastructure Has Outgrown Traditional Data Center Design 

Bill opened with a reality most executives already feel: AI demand is growing faster than the facilities built to support it. 

For years, scaling infrastructure was hard but straightforward. When the business pushed for more capacity, you could add more servers, roll in more racks, and upgrade links.  

The math roughly held. That is no longer true. 

In today’s environment: 

  • A single AI training or inference platform can consume more power, cooling, and network capacity than entire data centers did not long ago. 
  • Racks that used to draw 10–15 kW are now being designed for 100 kW and beyond, with roadmaps pointing higher still. 
  • East–west GPU traffic overwhelms the north–south patterns most networks were built for. 

The result? A widening gap between AI ambition and what buildings, utilities, and networks can safely deliver on the timelines your leadership expects. 

Bill framed AI’s progression in simple terms: 

  • Early analytics and computer vision workloads fit inside existing data centers. 
  • The rise of generative AI turned GPU clusters into executive-level priorities, forcing rapid retrofits. 
  • The current “agentic” era embeds AI into products and daily workflows, turning AI from a project into a continuous demand signal. 

Traditional three-to-five-year data center plans are struggling to keep up. AI is moving at the speed of software. Power and construction are moving at the speed of physics and permitting. 

Closing with Gartner’s Emerging Tech Impact Radar on Energy-Efficient Massive Compute, Bill pointed to direct-to-chip liquid cooling in the near term and AI Ethernet fabrics over the next few years as clear industry inflection points. As AI and HPC push beyond the limits of air-cooled infrastructure and facility cooling capacity, the window for enterprise leaders to act is now. 

Power and Cooling Requirements for AI: The Decisions That Make or Break AI Timelines 

Chris picked up from there with a blunt reality that enterprise IT leaders face. 

When you say “we’ll just run it in the cloud,” what you are really betting on is someone else’s ability to source power, move heat, and keep GPUs online. 

Whether the facility is yours, a colocation, or a hyperscaler’s, a few facts are now in play: 

  • GPU thermal and power requirements have grown by multiples in just a few years. 
  • Even modest AI deployments are measured in hundreds of kilowatts and often push into megawatts. 
  • Many legacy facilities are already uncomfortable at 15–20 kW per rack, while AI designs are pushing into 60–100 kW per rack and beyond. 
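To make the density math concrete, here is a minimal sketch of how per-rack power budgets shrink the number of racks a fixed-power hall can host. The facility size, rack wattages, and overhead fraction below are illustrative assumptions, not figures from the keynote:

```python
# Rough capacity check: how many racks fit in a fixed power envelope?
# All numbers below are illustrative assumptions, not vendor or site specs.

def racks_supported(facility_kw: float, rack_kw: float,
                    overhead_fraction: float = 0.3) -> int:
    """Racks a facility can power after reserving a fraction of
    capacity for cooling, distribution losses, and headroom."""
    usable_kw = facility_kw * (1 - overhead_fraction)
    return int(usable_kw // rack_kw)

# A hypothetical ~2 MW hall: legacy 15 kW racks vs. 100 kW AI racks.
legacy_racks = racks_supported(facility_kw=2000, rack_kw=15)
ai_racks = racks_supported(facility_kw=2000, rack_kw=100)
print(legacy_racks, ai_racks)  # the same building hosts far fewer AI racks
```

The same building that comfortably housed dozens of traditional racks supports only a handful of AI racks before the power envelope, not floor space, becomes the limit.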

Above a certain point, air cooling simply runs out of room. You can only push so much air through a rack before physics pushes back. That is why liquid cooling is moving from “interesting” to unavoidable for dense AI and HPC. 

Chris broke it down in plain terms: 

  • Air-only designs still have a role for traditional workloads and lighter AI. 
  • Rear-door heat exchangers and direct-to-chip liquid cooling are becoming standard for dense GPU racks. 
  • Facilities are shifting to new rack standards, DC bus bars, and even high-voltage DC distribution so they can deliver 100 kW+ into a single cabinet while keeping people and equipment safe. 

For enterprise IT leaders, the takeaway is less about the technology choice and more about timing and risk: 

  • If you treat power and cooling as a late-stage detail, you risk discovering that your chosen building cannot support your AI roadmap without years of utility work. 
  • If you bring facilities and thermal engineering into the conversation early, you can pick sites and designs that support multiple generations of AI, not just the first project. 

Enterprise AI Infrastructure Case Study: AHEAD’s Libertyville Lesson 

Bill shared how we at AHEAD faced this decision ourselves. 

AHEAD Foundry, our primary integration facility in Libertyville, IL, operates with just under 2 MW of power and has supported traditional enterprise gear for years. When direct-to-chip liquid-cooled AI and HPC systems became part of everyday work, the obvious first move was to upgrade that site. 

A deeper analysis showed: 

  • Bringing in enough additional power would require substation enhancements and new permits. 
  • The timeline was multi-year, far slower than AI projects and client demand. 

Instead, AHEAD acquired a nearby industrial facility with 10 MW of existing power and designed it specifically around AI and HPC integration: 

  • Support for both air-cooled and liquid-cooled equipment 
  • Built-in trenching for secondary fluid networks, with containment for safe integration work 
  • An environment engineered from day one for high-density, liquid-cooled racks 

For clients, the impact is simple: we can integrate and validate AI infrastructure on realistic timelines without waiting for long utility projects. 

That is the kind of decision many enterprises will have to make: retrofit where it makes sense, and design AI-first capacity where it does not. 

AI Network Design: From Basic Connectivity to Performance and Cost Control 

Michael closed the keynote with the “unsung hero” of AI infrastructure: the network. 

In a traditional environment, the network played a familiar role: 

  • Provide reliable bandwidth into and out of the data center. 
  • Tie a few key locations together. 
  • Keep an eye on utilization and add capacity before things get tight. 

AI has changed that picture. 

Modern AI platforms treat the network less like a roadway and more like a shared memory bus between GPUs and storage: 

  • Collective communication libraries move huge volumes of data sideways inside a data center, not just in and out. 
  • GPU clusters rely on consistent, low-jitter performance across thousands of links. 
  • As facilities tap out on power and cooling, organizations increasingly spread AI workloads across multiple sites, which means more performance-sensitive traffic over data center interconnects. 
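To see why this east-west volume dominates, consider a back-of-the-envelope estimate of ring all-reduce traffic, the collective operation behind gradient synchronization. The model size, precision, and GPU count below are assumptions chosen for illustration:

```python
# Back-of-the-envelope: bytes each GPU must send per gradient all-reduce.
# Model size, precision, and cluster size are illustrative assumptions.

def allreduce_bytes_per_gpu(param_count: int, bytes_per_param: int,
                            num_gpus: int) -> float:
    """Per-GPU send volume for one ring all-reduce of the full
    gradient: 2 * (N - 1) / N times the payload size."""
    payload = param_count * bytes_per_param
    return 2 * (num_gpus - 1) / num_gpus * payload

# Hypothetical 7B-parameter model in fp16 (2 bytes) across 1,024 GPUs.
gb = allreduce_bytes_per_gpu(7_000_000_000, 2, 1024) / 1e9
print(f"{gb:.1f} GB sent per GPU, per training step")  # roughly 28 GB
```

Tens of gigabytes moving sideways between GPUs on every step, thousands of times per run, is traffic no north-south design ever anticipated.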

If the network is not designed with this reality in mind, you feel it in three ways: 

  • Unpredictable timelines: Training runs that should finish in hours slip to days when congestion shows up in unexpected places.
  • Higher cost per experiment: GPUs sit idle while they wait on data, driving up the effective cost of each run.
  • Limited scale-out options: Theoretical multi-site designs stay on the whiteboard because the network cannot reliably support real workloads. 
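The cost point above is simple arithmetic: billed GPU time divides by utilization. The hourly rate and utilization figures here are assumptions for illustration, not quoted prices:

```python
# Effective cost of a training run when GPUs stall waiting on the network.
# Hourly rate and utilization figures are illustrative assumptions.

def effective_cost(gpu_hours_of_work: float, hourly_rate: float,
                   utilization: float) -> float:
    """Total spend when only `utilization` of each billed hour
    performs useful work; the rest is spent waiting on data."""
    return gpu_hours_of_work * hourly_rate / utilization

well_fed = effective_cost(1000, 3.00, 0.90)  # healthy fabric
starved = effective_cost(1000, 3.00, 0.50)   # congested fabric
print(f"${well_fed:,.0f} vs ${starved:,.0f} for the same experiment")
```

Halving utilization doubles the cost of every experiment, which is why network congestion shows up on the finance dashboard, not just the monitoring one.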

Michael’s point was not that every leader needs to become an expert in fabrics or co-packaged optics. It is that you cannot separate AI planning from network planning anymore. 

In practice, that means: 

  • Building separate, purpose-built fabrics for user access, GPU traffic, and inter-site connectivity. 
  • Designing around the fact that GPUs are the most aggressive packet generators you have ever deployed. 
  • Taking advantage of emerging capabilities like enhanced Ethernet, AI-ready switching, and co-packaged optics that reduce power draw and support AI at rack scale without sacrificing performance. 

What Enterprise Leaders Need to Know About Scaling AI Infrastructure 

AHEAD’s NANOG 96 keynote underscored a simple point: AI roadmaps will only move as fast as the infrastructure beneath them. The question is no longer whether models can deliver value in a lab, but whether power, cooling, and networks can support that value at the speed your business expects. 

If there is one takeaway for enterprise leaders, it is this: successful modern AI and HPC deployments require bringing new stakeholders into the conversation much earlier than in the past. The boundaries between facilities and technology have blurred, and AI infrastructure now sits squarely at that intersection. 

Traditional owners of compute, networking, storage, and power remain essential. But avoiding delays, redesigns, and stalled deployments now depends on early involvement from facilities, power, and thermal teams; operations leaders responsible for building management systems; and project management teams coordinating trades and timelines end-to-end. In most cases, upgrades to power infrastructure and facility cooling, both liquid and air, are no longer optional. They are foundational to delivering AI at scale. 

Why AHEAD Is Uniquely Positioned to Power, Cool, and Connect AI At Scale 

Enterprises need a partner that can connect AI strategy to the hard realities of power, cooling, and networks. AHEAD does that by combining advisory, engineering, and lifecycle services into a single, accountable model that is built for AI at rack scale. 

We design GPU-ready architectures, validate liquid-cooled racks and AI fabrics in AHEAD Foundry, and integrate those platforms with your existing cloud, data, and security investments. We then keep them running with managed AI infrastructure operations, observability, and continuous optimization, so you can track utilization, cost, and performance with confidence. 

If your AI plans are beginning to outgrow what your current facilities and networks can support, now is the time to re-baseline. Connect with AHEAD to assess your AI infrastructure readiness, see our AI Factory designs in action, and build a roadmap that lets you power, cool, and connect AI at scale across your enterprise. 
