Platform Engineering is the first discipline in the software space in many years that truly has the potential to let developers and operations teams deliver at the same high speed, without being slowed down by complicated processes and handovers. Where DevOps struggles because it makes everyone work like a developer and thus ignores the reality of any organization, Platform Engineering separates responsibilities clearly. The platform team serves many other dev teams by providing reusable services and delivery pipelines, and by taking care of non-functional requirements such as scalability, reliability, security, and compliance.

Over the years, software development has figured out how to segment things into services, how to make them sit behind stable APIs, and how to enable them to communicate with each other, regardless of their size. It’s relatively straightforward to establish an architecture where multiple teams can work independently, basing their work on a common foundation.

Building delivery pipelines on top of a CI/CD tool is a well-addressed problem. Although this is a grey area where responsibilities aren’t always clearly separated and much can go wrong, it generally works.

However, what Platform Engineering has yet to solve is how to manage the infrastructure required for these services to run. If CI/CD is a grey area, then infrastructure is the dark side of Platform Engineering. This is simply because existing tools focus on a single, all-knowing team member rather than on the different roles within the overall team. As a result, everyone is forced to deal with low-level details that most engineers don’t ever need to see, leading only to frustration and slowdown. Here are a few scenarios that should sound very familiar to the reader:

  • The platform team uses an Infrastructure-as-Code (IaC) tool and forces developers to use it, even if they don’t want to. Developers who lack the time or inclination to learn the tool hand the work back, so every small infrastructure task lands on the platform team’s desk. Some team members might welcome this, but many wouldn’t.
  • The platform team uses an IaC tool but makes developers use a different tool because developers “can do YAML.” YAML is just a format and is error-prone on its own, but the bigger problem is what’s in that YAML. If it contains low-level details that no one understands, there’s no real benefit, and everything reverts to the platform team.
  • Developers break free and bypass the platform team’s infrastructure tools, opting to use cloud provider offerings directly. This might initially seem to speed things up, but underneath, it’s a growing risk for poor performance, questionable security, and potential failure during the next audit.

This list could go on and on. And it gets worse: a modern organization has some roles that don’t fit the classic team-split picture. ML/AI engineers and data scientists need to experiment quickly, try things that are critical for the business, and generally don’t care much for software development processes. Their own tools already provide them with infrastructure, and any attempt to force them into the overall delivery harness leads to unacceptable slowdowns.

For whatever reason, no one even considers the operator. This is not necessarily a separate job title but rather a role that anyone on the team assumes when they go on call and have to fix urgent problems quickly at night. These people don’t have the time or mental capacity during an outage to follow a rigid process. They just need to pinpoint the problem and go back to sleep.

All these actors need tailored solutions to achieve maximum performance. The response to their requirements cannot be to offer them basic tools, as is currently happening. It’s like asking for a birdhouse and receiving hammers and saws instead. Of course, one can build anything with them, but at what overall cost, and why should they have to?

A good platform team needs to adapt to the speed and workflow of its clients, regardless of their role. It needs to build a one-stop shop with a small selection of off-the-shelf solutions, not just hand out the tools used to build them. Especially in infrastructure management, it’s crucial to understand that almost none of the platform team’s clients have the knowledge and understanding of that infrastructure. If they did, they wouldn’t need the platform team to handle it.
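To make the one-stop shop idea concrete, here is a minimal sketch in Python of what such a shop could look like from the client's side. Everything in it is hypothetical and invented for illustration: the `Solution` type, the `CATALOG` entries, the `order` function, and all the low-level settings. It is not the API of any existing tool. The point is only that the client names a ready-made solution and an owner, while the platform team fills in the details the client never needs to see.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Solution:
    """An off-the-shelf offering owned by the platform team (hypothetical)."""
    name: str
    description: str
    # Low-level settings the platform team decides once; clients never edit them.
    low_level: dict = field(default_factory=dict)


# Hypothetical catalog: a small selection of finished solutions, not a toolbox.
CATALOG = {
    "web-service": Solution(
        name="web-service",
        description="Stateless HTTP service with autoscaling",
        low_level={"replicas_min": 2, "replicas_max": 10, "tls": True},
    ),
    "experiment-notebook": Solution(
        name="experiment-notebook",
        description="Short-lived workspace for data scientists",
        low_level={"ttl_hours": 72, "gpu": True, "network": "isolated"},
    ),
}


def order(solution_name: str, owner: str) -> dict:
    """Expand a high-level order into a full spec for provisioning."""
    if solution_name not in CATALOG:
        offered = ", ".join(sorted(CATALOG))
        raise ValueError(f"unknown solution {solution_name!r}; offered: {offered}")
    solution = CATALOG[solution_name]
    # The client supplies only what they know: which solution, and who owns it.
    return {"solution": solution.name, "owner": owner, **solution.low_level}


if __name__ == "__main__":
    print(order("experiment-notebook", owner="data-science"))
```

A request for something outside the catalog fails with a message listing what is on offer, which is the birdhouse counter rather than the hammer aisle: the client picks from finished products instead of assembling one.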

One-stop shop platform engineering offers significant added value beyond basic tools. If the platform team views itself as a hardware store, it provides no extra value and becomes an impediment rather than a help. It’s time for platform teams to shift their mindset from simply providing tools to offering comprehensive, ready-to-use solutions that cater to the diverse needs of all their users. By embracing the one-stop shop model, platform engineering can truly fulfill its promise of accelerating delivery and fostering innovation across the entire organization.

