
We’re nearly a decade into the infrastructure-as-code mindset, and if you’re feeling overwhelmed by the complexity of managing it at scale, you’re not imagining things. It’s genuinely hard, and you’re not alone.
The industry has spent years pushing a code-first mentality: Codify everything, create repeatable templates, standardize your infrastructure patterns. Build your VPC modules, your EC2 configurations, your networking templates. Package them up, hand them to your consumers, and watch the magic happen. It’s a common pattern, and for many teams, it’s worked.
But here’s the challenge that lots of platform teams are facing right now: They have been doing IaC for years. They have their preferences. They have their established workflows. Their devs are working in CDK, their Ops teams are committed to Terraform, and everyone has their own method (whether good or bad) they’ve developed over time. And reining all of that in? That’s proven to be extraordinarily difficult, especially for teams handling very large or very complex cloud environments.
DevOps and Platform teams now face what can feel like an impossible mandate: build new infrastructure the right way, while simultaneously fixing years of infrastructure built every other way.
But how do they do that, exactly?
Setting Standards is Easy. Enforcing Them Isn’t.
Setting standards is one thing. Enforcing them is another entirely. But the real complexity comes when you’re this many years into your IaC journey and you need to course-correct.
The questions every platform team is asking:
- How do we govern what was created outside our IaC pipelines?
- How do we detect and handle drift that inevitably occurs?
- How do we handle changing or updating our modules at scale?
- How do we standardize when acquiring new cloud environments or inheriting other business units?
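To make the drift question concrete, here is a minimal Python sketch. It assumes you can export two snapshots, one of what your code declares and one of what is actually running, as plain dictionaries. The function name `detect_drift`, the resource IDs, and the attribute names are all hypothetical; real tooling would read these values from your IaC state and your cloud provider's APIs.

```python
from typing import Any

# Hypothetical data shape: attribute name -> value for one resource.
Attributes = dict[str, Any]


def detect_drift(
    declared: dict[str, Attributes],
    live: dict[str, Attributes],
) -> dict[str, dict[str, tuple[Any, Any]]]:
    """Map each drifted resource to {attribute: (declared value, live value)}."""
    drift: dict[str, dict[str, tuple[Any, Any]]] = {}
    for resource_id, want in declared.items():
        have = live.get(resource_id)
        if have is None:
            # Declared in code but no longer present in the environment.
            drift[resource_id] = {"exists": (True, False)}
            continue
        # Keep only the attributes whose live value differs from the code.
        changed = {
            key: (value, have.get(key))
            for key, value in want.items()
            if have.get(key) != value
        }
        if changed:
            drift[resource_id] = changed
    return drift


# Illustrative snapshots (made-up resources, not real output from any tool).
declared = {
    "sg-web": {"ingress_port": 443, "cidr": "10.0.0.0/16"},
    "bucket-logs": {"versioning": True},
    "vm-batch": {"size": "small"},
}
live = {
    "sg-web": {"ingress_port": 443, "cidr": "0.0.0.0/0"},
    "bucket-logs": {"versioning": True},
}

drift_report = detect_drift(declared, live)
```

In this example the report flags `sg-web` (its CIDR range was widened outside the pipeline) and `vm-batch` (declared but missing), while `bucket-logs` is left out because it still matches the code.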
Platform teams now have to build and support new infrastructure for teams in this ideal codified manner, while simultaneously going back and revising the code that has already been created. They’re keeping legacy infrastructure up to date, finding vulnerabilities in patterns that were set years ago, and trying to maintain consistency across it all.
It’s a battle between codifying their legacy resources and keeping a handle on their new resources, and they’re fighting on both fronts at once.
And in a reality where most teams use IaC, but only one-third have codified more than 75% of their infrastructure, there are inevitably a lot of resources out there left unmanaged.
Too Many Tools, Not Enough Alignment: The Reality of Cloud Operations
Frankly, the tools and platforms that are out there aren’t as comprehensive as they need to be. The landscape is fragmented. You have your security tools over here, your infrastructure tools over there, your workflow tools in another corner, and your code repositories somewhere else.
How do you bring all of that together? And not only that, how do you bring it together while accomplishing all of these goals simultaneously:
- Setting and enforcing standards across teams
- Correcting day-two infrastructure that’s already deployed
- Governing and managing what’s running in production
- Shifting left to govern deployments before they reach production
For most organizations, this challenge spans four or five different tools across several different teams. Their security team is viewing and scanning the infrastructure that’s already there. Their platform team is advocating the shift-left mentality. And then they have multiple application teams trying to consume this infrastructure and push to production as quickly as possible. Each group is faced with the challenge of continuously updating its code to keep systems stable and applications online. Your tools shouldn’t be hindering collaboration, but in the real world, they often are.
The result? Exhaustion, and teams so buried in the complexity of managing IaC that they struggle to see a way forward and, admittedly, sometimes give up on it altogether.
If it Feels Hard, That’s Because it Is
Infrastructure-as-Code was designed to make infrastructure management more consistent, more repeatable, and more scalable. It succeeded at that. But consistent doesn’t mean simple, and repeatable doesn’t mean effortless, especially not when you’re managing the accumulated decisions of dozens of teams over nearly a decade.
And if it feels hard, that’s because it genuinely is challenging. You’re not doing something wrong. This is simply the reality of managing infrastructure at scale, in a world where codification has become the standard, but comprehensive solutions haven’t kept pace.
Stuck? Here are Three Questions to Ask Yourself
Rather than trying to solve everything at once, begin with three fundamental questions:
- Do we have visibility into everything deployed, IaC or not?
You can’t govern what you can’t see. Do you have confidence that every deployed resource is tracked, versioned, and linked back to code? Or are there still “shadow” environments, ad-hoc fixes, and resources created outside your pipelines? Visibility isn’t just about Terraform state; it’s about knowing what exists and who owns it.
- How connected are our ‘day two’ operations and ‘shift-left’ workflows?
If these are happening in separate tools with separate teams, you’re fighting an uphill battle. Look for ways to create tighter integration between runtime infrastructure management and pre-production controls.
- Are we using too many tools to solve the same problems?
It’s easy to end up with overlapping scanners, linters, policy engines, and IaC wrappers, all meant to enforce consistency but each adding overhead. Which of these tools actually drive adoption and alignment, and which just add noise? Consolidating even one layer of your toolchain can simplify workflows and reveal where the real gaps are.
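The first question, visibility, is the easiest to turn into a number. The sketch below is one way to do it in Python, assuming you can list every deployed resource (with an owner, where known) and the set of resource IDs your IaC actually manages. The `Resource` class, the `visibility_report` function, and the sample inventory are hypothetical and only illustrate the reconciliation step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    resource_id: str
    owner: Optional[str] = None  # None means nobody has claimed it


def visibility_report(deployed: list[Resource], managed_ids: set[str]) -> dict:
    """Summarize what is running, what code manages, and what has no owner."""
    deployed_ids = {r.resource_id for r in deployed}
    # Running in the cloud but absent from any IaC state.
    unmanaged = sorted(deployed_ids - managed_ids)
    # Running with no team attached to it.
    unowned = sorted(r.resource_id for r in deployed if not r.owner)
    # Share of deployed resources that are codified (1.0 if nothing is deployed).
    coverage = (
        len(deployed_ids & managed_ids) / len(deployed_ids) if deployed_ids else 1.0
    )
    return {"unmanaged": unmanaged, "unowned": unowned, "coverage": coverage}


# Illustrative inventory (made-up resources).
deployed = [
    Resource("vpc-main", owner="platform"),
    Resource("db-orders", owner="payments"),
    Resource("bucket-tmp", owner="data"),
    Resource("vm-hotfix"),
]
managed_ids = {"vpc-main", "db-orders"}

report = visibility_report(deployed, managed_ids)
```

Here half of the inventory is codified, `bucket-tmp` and `vm-hotfix` sit outside the pipelines, and `vm-hotfix` has no owner at all. Tracking those three values over time gives a rough baseline for the other two questions.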
What Does the Path Forward Look Like?
The IaC journey has given us incredible capabilities, but it’s also introduced real complexity that can’t be ignored. If you’re navigating multiple teams with different preferences, managing both legacy and new resources, and trying to maintain security and standards across all of it, recognize that it’s legitimately difficult work. But there is a path forward, and a better way.
Acknowledging and understanding the challenge is the first step toward addressing it effectively.
You’re not alone in this, and the complexity you’re experiencing is a sign of growth and scale, not failure. The next step? For us to collectively build and embrace the comprehensive approaches our infrastructure deserves.
