Platform Engineering 2.0: Manage AI Costs and Risks Without Rebuilding Infrastructure

Platform engineering teams have spent the better part of a decade building something admirable: standardized Kubernetes clusters, CI/CD pipelines, internal developer platforms (IDPs), and self-service infrastructure that let developers ship applications safely, efficiently, and repeatedly. That foundation held up well until AI arrived and changed every assumption underneath it.

The debate over DevOps versus platform engineering now feels quaint.

A far more consequential challenge has taken its place: How do you build, govern, isolate, and operate AI workloads on infrastructure that was never designed to carry them? Broadcom and PlatformEngineering.org say there is an answer in Broadcom’s new Platform Engineering 2.0 framework. Broadcom says it drew on its collective enterprise experience across AI, software, and private cloud infrastructure when it created the framework.

Platform Engineering 2.0 also builds upon the successful foundations of Platform Engineering 1.0, rather than replacing them. The framework was designed to serve as a natural progression rather than a departure from existing platform investments.

The Gap that AI Exposed

Platform Engineering 1.0 was built for containerized, developer-centric, human-paced workflows. AI broke that model in three distinct ways.

First, AI introduces prompts and retrieved content as a live, unpredictable input channel. Data is no longer just data. It is an executable influence on model output, which makes the traditional isolation properties that once held for compute and storage suddenly unreliable.

Second, AI workloads span multiple execution planes at once: SaaS APIs, fine-tuned hosted models, on-premises inference, retrieval layers, and tool-calling agents that reach deep into existing systems. The platform was never designed to govern across all of those simultaneously.

Third, and most critically, AI moves the trust boundary away from the application — and into the interplay between models, tools, data sources, humans, and non-human agents steering them. That is not a gap you can patch. It is a structural problem. In addition to introducing new security and governance concerns, AI also creates operational fragmentation as teams independently adopt different models, tooling, retrieval approaches, and observability practices.

The consequences are already landing in enterprise budgets. At June’s FinOps X conference, Mike Eisenstein, Accenture’s FinOps Global Practice Lead, relayed a CIO’s account of Claude API costs escalating from $250,000 per day to $400,000 per day in a single month. As J.R. Storment, Executive Director of the FinOps Foundation, put it plainly: “AI’s rate of change is exceptionally fast. What’s a good policy one day can be out of date the next week.”
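To put the reported figures in proportion, the arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the 30-day window is an assumption used only to illustrate what the higher daily rate adds up to; the article reports the two daily rates, not a monthly total.

```python
def daily_growth(old_daily, new_daily):
    """Fractional increase in daily spend between two rates."""
    return (new_daily - old_daily) / old_daily


def added_spend(old_daily, new_daily, days):
    """Extra spend over `days` if the higher daily rate persists."""
    return (new_daily - old_daily) * days


old_rate = 250_000  # dollars per day, as reported
new_rate = 400_000  # dollars per day, as reported

print(f"{daily_growth(old_rate, new_rate):.0%}")    # prints "60%"
print(f"${added_spend(old_rate, new_rate, 30):,}")  # prints "$4,500,000"
```

A 60% jump in the daily rate means roughly $4.5 million of additional spend for every 30 days the new rate holds.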

This is not sustainable — and application-level fixes won’t solve it.

Why App-Level Fixes Don’t Scale

The instinctive response has been to let each application team wrap its own guardrails around its AI use case. Chatbots add prompt hardening. Document tools bolt on access checks. Code assistants get separate logging. Every team builds its own perimeter.

That reflex has structural limits. Policy interpretation fragments across teams. For example, “no PII to external models” means something different in marketing than it does in finance. Security leaders cannot answer basic questions about which models are running, where, and under what policies, because the answers are scattered across dozens of services and vendor consoles.

Shadow AI compounds this further. It is more dangerous than shadow IT ever was — not just because of data exposure risk, but because of the cost profile it creates.

AI governance confined to documents and application code is not enough. Security responsibilities must move down into the platform itself.

Two pillars of Platform Engineering 2.0 are foundational to that shift: model governance as a control plane and workload isolation as a structural guarantee.

Model Governance as a Control Plane

Most enterprises now run multiple models across multiple providers. As the AI security firm Airia notes, “model-specific governance breaks at scale.” It proposes “a control layer that sits above the model level — one that enforces policy, logs decisions, and monitors behavior regardless of which underlying model is executing a task.”

Platform Engineering 2.0 turns that principle into a concrete service: a central model registry and routing layer; unified authentication and policy enforcement applied uniformly across OpenAI, Anthropic, or any on-premises model; and a single pane of glass for audit, observability, and compliance. Developers request model access through the platform; the platform maps those requests to risk tiers and approval workflows. Think of it as infrastructure as AI: the platform becomes both the rulebook and the referee.
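One way to picture such a control layer is the minimal Python sketch below. It is not taken from the Broadcom framework: the registry entries, the three risk tiers, the tier-to-outcome mapping, and every name in it are assumptions made for illustration. What it shows is the shape of the idea, with one registry, one rule set, and one audit trail, whichever provider ends up serving the request.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class ModelEntry:
    provider: str   # e.g. "openai", "anthropic", "on-prem"
    external: bool  # True when requests leave the enterprise boundary


@dataclass(frozen=True)
class AccessRequest:
    team: str
    model: str
    contains_pii: bool


# Hypothetical central registry: the only place models are declared.
REGISTRY = {
    "hosted-chat": ModelEntry(provider="openai", external=True),
    "hosted-assistant": ModelEntry(provider="anthropic", external=True),
    "local-llm": ModelEntry(provider="on-prem", external=False),
}

# Assumed mapping from risk tier to workflow outcome.
WORKFLOW = {
    RiskTier.LOW: "allow",
    RiskTier.MEDIUM: "needs-approval",
    RiskTier.HIGH: "deny",
}

AUDIT_LOG = []  # one record per decision, regardless of provider


def classify(request, entry):
    """Assign a risk tier using one shared rule set for every team."""
    if request.contains_pii and entry.external:
        return RiskTier.HIGH  # "no PII to external models"
    if request.contains_pii or entry.external:
        return RiskTier.MEDIUM
    return RiskTier.LOW


def decide(request):
    """Return the workflow outcome for a request and log the decision."""
    entry = REGISTRY.get(request.model)
    if entry is None:
        outcome = "deny"  # unregistered models are never routed
    else:
        outcome = WORKFLOW[classify(request, entry)]
    AUDIT_LOG.append(
        {"team": request.team, "model": request.model, "outcome": outcome}
    )
    return outcome
```

In this sketch, `decide(AccessRequest("marketing", "hosted-chat", True))` and the same request from a finance team both return `"deny"`, because the rule lives in the platform rather than in each team's application code.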

Workload Isolation as a Structural Guarantee

If model governance defines what AI is allowed to do, workload isolation defines where it can do it and how far failures can spread. That means dedicated isolation domains across experimental sandboxes, internal workloads, and customer-facing regulated data environments. And it means zero-trust service identities bound to models, agents, and tools — because prompt injections will succeed, and lateral movement must be structurally difficult when they do.
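The deny-by-default idea can be sketched in Python as follows. The three domains mirror the ones named above; the tool inventory, the identity fields, and the `authorize` function are hypothetical and stand in for whatever identity and network controls a real platform would use.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    SANDBOX = "sandbox"
    INTERNAL = "internal"
    REGULATED = "regulated"


@dataclass(frozen=True)
class ServiceIdentity:
    subject: str              # hypothetical ID for a model, agent, or tool
    domain: Domain            # the isolation domain the identity is bound to
    allowed_tools: frozenset  # explicit allowlist; nothing is implied


# Hypothetical inventory mapping each tool to its isolation domain.
TOOL_DOMAINS = {
    "scratch-notebook": Domain.SANDBOX,
    "wiki-search": Domain.INTERNAL,
    "customer-records": Domain.REGULATED,
}


def authorize(caller, tool):
    """Allow a call only within the caller's domain and allowlist."""
    tool_domain = TOOL_DOMAINS.get(tool)
    if tool_domain is None:
        return False  # unknown tools are denied
    if tool_domain is not caller.domain:
        return False  # no cross-domain calls in this sketch
    return tool in caller.allowed_tools


agent = ServiceIdentity(
    subject="agent:prototype",
    domain=Domain.SANDBOX,
    allowed_tools=frozenset({"scratch-notebook"}),
)
```

Here `authorize(agent, "customer-records")` returns `False` even if an injected prompt convinces the agent to attempt the call. The check depends on the identity's domain binding, not on the model behaving well.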

The Agentic Frontier

These two pillars converge on something larger. Autonomous AI agents are arriving as a new class of platform user — with no prior persona to inherit from and no human in the loop to catch a misconfigured scope. The platform must support both human developers and AI agents simultaneously, as first-class consumers.

“Platform engineering is no longer a software delivery discipline,” the Platform Engineering 2.0 whitepaper states. “It is becoming the operational foundation for the enterprise’s agentic future.”

The Platform Transformation Imperative

Platform Engineering 2.0 is an evolution, not a reset. The foundations — Platform as Product, golden paths, self-service IDPs, and shift-left security — remain essential. What changes is who the platform serves, what it must do, and how fast it must adapt.

The teams that master this transition hold something more valuable than a delivery pipeline. They hold the substrate for their organization’s AI-native future. Organizations that do not integrate agentic AI directly into their platform control plane will fall significantly behind.

Getting there will not be easy. But the direction is clear, and the cost of delay — in security exposure, operational fragmentation, and runaway AI spend — is already visible. The platform must evolve, and the time to start is now.
