
As AI systems evolve from support tools to autonomous infrastructure layers, platform engineering is undergoing a fundamental transformation. 

Enterprises driving model training, inference and real-time decisioning now face a dual challenge: building platforms that not only scale, but also think and act with a high degree of independence, without compromising trust or governance.

That shift, from AI as a support system to AI as an autonomous infrastructure layer, changes how platforms are designed and managed.

Traditionally, AI augmented human decision-making — helping teams identify patterns, automate scripts, or optimize performance. 

“However, we’re now entering an era where AI agents are not only assisting but also actively making infrastructure decisions in real time,” says Derek Ashmore, AI enablement principal at Asperitas. 

This transition brings significant complexity, requiring organizations to rethink the fundamentals of system design. 

“Agentic AI is causing a dramatic shift in systems engineering,” says Tyler Jewell, CEO of Akka. “We’re moving from decades of best practices designed to create deterministic systems where any deviation from the norm must be actively managed, to creating non-deterministic systems.” 

Ashmore outlines three areas where platform engineering must evolve: architecture, governance and operations. 

“Platforms must now be built with AI agents as first-class citizens,” he says. “This means modular, API-driven systems that expose control surfaces on which AI can safely act—whether to scale services, reroute traffic, or self-heal incidents.” 

At the same time, automation must be tempered with oversight — as AI gains more control, transparency and trust become essential. 

This means platform teams must implement safeguards — such as human-in-the-loop approvals, auditability and policy-based boundaries — to ensure reliable and explainable actions. 
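
To make that concrete, here is a minimal Python sketch of a policy-gated control surface: small actions inside a defined boundary run autonomously, larger ones escalate to a human, and every executed action lands in an audit trail. All names, action types and thresholds are hypothetical illustrations, not any specific product's API.

```python
# A minimal sketch of a policy-gated control surface, assuming a hypothetical
# platform API. Actions inside the policy boundary run autonomously; larger
# ones escalate to a human, and every executed action is audited.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Action:
    """An infrastructure change proposed by an AI agent."""
    kind: str        # e.g. "scale" or "restart"
    target: str      # service the action applies to
    magnitude: int   # e.g. replica delta


audit_log: list[dict] = []

# Policy boundary: the largest change each action type may make unattended.
AUTO_APPROVE_LIMITS = {"scale": 2, "restart": 1}


def requires_human(action: Action) -> bool:
    """Anything outside the policy boundary escalates to a human."""
    limit = AUTO_APPROVE_LIMITS.get(action.kind)
    return limit is None or abs(action.magnitude) > limit


def execute(action: Action, approved_by: str) -> None:
    # Stand-in for the real control surface (e.g. a cluster API call).
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action.kind,
        "target": action.target,
        "magnitude": action.magnitude,
        "approved_by": approved_by,
    })


def handle_agent_proposal(action: Action) -> None:
    if requires_human(action):
        # Human-in-the-loop: queue for review instead of acting.
        print(f"Escalating {action.kind} on {action.target} for approval")
    else:
        execute(action, approved_by="policy")


handle_agent_proposal(Action("scale", "checkout-service", magnitude=1))   # runs
handle_agent_proposal(Action("scale", "checkout-service", magnitude=10))  # escalates
```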

Jewell notes platform teams must anticipate variability and design accordingly.  

“Autonomous decision making based upon unreliable LLMs that can hallucinate requires architects to accept uncertainty and design systems, processes, and oversight that anticipates these variances and addresses them,” he says. 

One of the most difficult engineering challenges, according to both experts, is managing real-time, AI-driven decisions at scale. 

Ashmore points to issues such as low-latency data collection, scalability constraints and control-loop stability. 

“Autonomous platforms operate in feedback loops,” he says. “Poorly tuned loops can cause instability, such as oscillating autoscaling or cascading failures.” 

From his perspective, engineering must focus on designing stable closed-loop control systems and preventing AI agents from interfering with each other’s logic. 
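
As a toy example of what stabilizing such a loop involves, the sketch below uses two common dampening mechanisms, a dead band and a cooldown, to keep an autoscaler from chasing noise. The thresholds are illustrative, not recommendations.

```python
# A toy closed-loop autoscaler with two dampening mechanisms: a dead band
# (tolerated deviation from target utilization) and a cooldown between
# actions. Both keep the loop from oscillating on noisy signals.
class Autoscaler:
    def __init__(self, target_util=0.6, dead_band=0.15, cooldown_s=300.0):
        self.target = target_util
        self.dead_band = dead_band      # ignore errors smaller than this
        self.cooldown_s = cooldown_s    # minimum seconds between actions
        self.replicas = 3
        self._last_action = float("-inf")

    def tick(self, observed_util: float, now: float) -> int:
        """One pass of the control loop; returns the desired replica count."""
        error = observed_util - self.target
        in_cooldown = (now - self._last_action) < self.cooldown_s
        if abs(error) > self.dead_band and not in_cooldown:
            self.replicas = max(1, self.replicas + (1 if error > 0 else -1))
            self._last_action = now
        return self.replicas


scaler = Autoscaler()
for minute, util in enumerate([0.9, 0.85, 0.5, 0.45, 0.4]):
    # Only the first spike triggers a change; the dead band and
    # cooldown absorb the rest instead of bouncing replica counts.
    print(minute, scaler.tick(util, now=minute * 60.0))
```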

Jewell highlights the architectural demands of multi-agent systems: autonomous systems in which separate agents act as reasoners, supervisors, guardians and executors.

“These agents run across time and space, making multi-agent systems distributed systems that require coordination and shared state,” he says. “This creates challenges with developer productivity, reliability and scalability.” 
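
As a rough illustration of the shared-state problem, the sketch below uses versioned writes (optimistic concurrency) so that agents acting on stale reads cannot silently overwrite one another. A production multi-agent system would need genuine distributed coordination such as locks, queues or consensus.

```python
# A sketch of versioned writes over shared state: an agent whose read has
# gone stale gets its write rejected and must re-read before acting.
from dataclasses import dataclass, field


@dataclass
class SharedState:
    version: int = 0
    data: dict = field(default_factory=dict)

    def write(self, expected_version: int, key: str, value) -> bool:
        """Optimistic concurrency: reject writes based on a stale read."""
        if expected_version != self.version:
            return False  # another agent wrote first; caller must re-read
        self.data[key] = value
        self.version += 1
        return True


state = SharedState()

# The reasoner agent proposes a plan based on version 0.
seen = state.version
state.write(seen, "plan", "scale checkout-service +1")

# The guardian agent, still holding version 0, is rejected and must
# re-read the state before it can veto or amend the plan.
accepted = state.write(seen, "plan", "veto")
print(state.data, "guardian write accepted:", accepted)
```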

There’s also a financial reality that can’t be ignored: The costs of a fully autonomous agentic system grow quickly. To remain sustainable, enterprises must tightly manage the context used by AI agents. 

“Context engineering and intelligent management of agent consumption of tokens is critical to staying within a budget,” Jewell explains. 
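
One simple form of that management is a hard token budget on context assembly. The sketch below approximates token counts with word counts; a real system would use the model's own tokenizer, and the budget here is arbitrary.

```python
# A sketch of budget-aware context assembly: keep the most recent items
# that fit, drop the rest, and stay inside a fixed token budget.
MAX_CONTEXT_TOKENS = 1000


def estimate_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer


def build_context(items: list[str], budget: int = MAX_CONTEXT_TOKENS) -> list[str]:
    """Keep the most recent items that fit within the token budget."""
    kept, used = [], 0
    for item in reversed(items):  # walk newest to oldest
        cost = estimate_tokens(item)
        if used + cost > budget:
            break  # older context is dropped once the budget is hit
        kept.append(item)
        used += cost
    return list(reversed(kept))   # restore chronological order


history = ["stale log line " * 400, "recent alert summary", "latest metric reading"]
print(build_context(history))  # the oversized stale entry is dropped
```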

With growing autonomy comes an even greater need for observability and transparency.  

“Telemetry provides a real-time and historical view of what the system is doing and why,” Ashmore says. 

Observability tools surface the outcomes of AI-driven actions — like scaling decisions, configuration changes, or fault remediation — allowing engineers to understand system behavior rather than treat it as a black box. 
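
In practice, that often means emitting a structured event for every autonomous action, recording the inputs and rationale alongside the action itself. The sketch below shows one hypothetical shape for such an event; the field names are illustrative.

```python
# A hypothetical shape for agent-decision telemetry: every autonomous action
# emits a structured event carrying its inputs and rationale, so engineers
# can see why the system acted instead of treating it as a black box.
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agent.telemetry")


def record_decision(action: str, target: str, rationale: str, inputs: dict) -> None:
    """Emit one structured event per AI-driven action."""
    log.info(json.dumps({
        "event": "agent_decision",
        "action": action,
        "target": target,
        "rationale": rationale,  # the "why", not just the "what"
        "inputs": inputs,        # the signals the agent acted on
    }))


record_decision(
    action="scale_up",
    target="checkout-service",
    rationale="p99 latency above SLO for five consecutive minutes",
    inputs={"p99_ms": 820, "slo_ms": 500, "replicas": 3},
)
```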

Both experts emphasize the evolving relationship between platform engineers and data scientists as infrastructure becomes increasingly intelligent. 

“As platforms take on more intelligence-driven workloads, the relationship is evolving from a loosely coupled collaboration to a deeply integrated partnership,” Ashmore says. 

That integration includes shared responsibility for model lifecycle management, performance and compliance. 

“With the rise of MLOps and AI-integrated platforms, both roles now share responsibility for ensuring models are not only performant but resilient, observable, and governed in production environments,” he says. 

Jewell describes the emerging discipline of context engineering as key to that collaboration.  

“The software development will be the easy part,” he says. “Context engineering is the interface between getting the right data, at the right time, analyzed in the right way — whether this is prompts, knowledge, semantic meaning, dynamic transaction data, or tools.” 

He predicts context engineering will emerge as a cross-functional discipline. 
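
As a loose sketch of what that interface might look like, the hypothetical function below assembles a task's context from the sources Jewell lists: prompts, retrieved knowledge, dynamic data and tool definitions. Every helper here is a placeholder, not a real retrieval or telemetry API.

```python
# A loose sketch of a context-engineering interface: one function that
# assembles the right data, at the right time, from several sources.
def retrieve_runbooks(task: str) -> list[str]:
    return [f"runbook snippet relevant to '{task}'"]   # static knowledge


def fetch_recent_metrics(task: str) -> dict:
    return {"p99_ms": 820, "error_rate": 0.04}         # dynamic transaction data


def assemble_context(task: str) -> dict:
    return {
        "prompt": f"Diagnose and resolve: {task}",
        "knowledge": retrieve_runbooks(task),
        "live_data": fetch_recent_metrics(task),
        "tools": ["scale_service", "reroute_traffic"],  # callable actions
    }


print(assemble_context("checkout latency spike"))
```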

As enterprises push further into AI-driven automation, both Ashmore and Jewell stress that trust, governance and clarity must remain core priorities. 

“Successful platform teams treat AI agents not as black boxes, but as collaborative actors operating within well-defined and observable systems—empowered to act, but never beyond scrutiny,” Ashmore says. 

Ultimately, platform engineering is moving from an era of deterministic control to one of intelligent adaptability. 

“Autonomous infrastructure isn’t about removing humans from the loop,” Jewell says. “It’s about rethinking the loop entirely.” 
