Spec-Driven AI Development: How We Stopped Shipping AI Code Nobody Could Maintain

We were six weeks into a new platform initiative. AI coding tools were everywhere. Features were shipping, PRs were merging, and velocity was up. However, two months later, I was staring at a codebase nobody could fully explain. Inconsistent IDs across services. Error shapes that differed from endpoint to endpoint, all supposedly built from ‘the same spec’. Except there was no spec. There were three prompts in three chat windows that nobody had saved.

The AI hadn’t done anything wrong. We did. We’d treated AI codegen as a shortcut past the planning process instead of as a faster execution of a planning process that still needed to happen. That’s when we found Spec Kit and SQUAD.

The Real Problem: Institutional Amnesia

The failure mode isn’t the AI writing bad code. Modern AI coding tools write pretty good code, most of the time. The failure mode is institutional amnesia.

When an engineer decides to use UUIDs for primary keys, that decision lives in their head and maybe in a PR description. When an AI makes the same decision because you told it to “build a REST API,” the decision lives nowhere — it just manifests in the code. In a traditional workflow, tech stack choices, error handling conventions, authentication approach and API versioning strategy get captured in ADRs. In an AI-first workflow, they get swallowed by the chat window.

Figure 2: Solving Invisible AI Decisions via Spec-Driven Development

What Spec Kit Does in a DevOps Context

Spec Kit is a CLI maintained by GitHub. Think of it as a structured documentation pipeline: You put a feature description in one end, and five phases of spec artifacts come out the other.

Figure 3: The Spec Kit Five-Phase Pipeline — From Feature Description to a Versioned Artifact Trail in Specs

The part that matters most for DevOps teams isn’t any individual phase — it’s the artifact trail.

The specs/ folder tells you what the feature does (spec.md), why it’s built that way (research.md), what the API contract looks like (contracts/api.md) and the full ordered task list (tasks.md). These are real files in your repo. They’re version-controlled and searchable, and they answer the ‘why did we do it this way?’ question that every 2 a.m. incident call gets stuck on.
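An artifact trail only helps if it is complete, so a team could check that every feature folder actually contains these files. The sketch below is a minimal illustration, assuming one subdirectory per feature under specs/ (for example a hypothetical specs/001-todo-api/) and exactly the four artifact paths listed above. The function name `missing_artifacts` is hypothetical, and your Spec Kit version may emit additional files.

```python
from pathlib import Path

# The four artifact paths named in the article; adjust if your Spec Kit version emits others.
REQUIRED_ARTIFACTS = ("spec.md", "research.md", "contracts/api.md", "tasks.md")


def missing_artifacts(specs_root: Path) -> dict[str, list[str]]:
    """Map each feature folder under specs_root to the required artifacts it lacks.

    Assumes one subdirectory per feature (e.g. specs/001-todo-api/). Folders with
    a complete artifact trail are left out of the result.
    """
    report: dict[str, list[str]] = {}
    for feature_dir in sorted(p for p in specs_root.iterdir() if p.is_dir()):
        missing = [rel for rel in REQUIRED_ARTIFACTS if not (feature_dir / rel).is_file()]
        if missing:
            report[feature_dir.name] = missing
    return report


if __name__ == "__main__":
    root = Path("specs")
    if root.is_dir():
        for feature, files in missing_artifacts(root).items():
            print(f"{feature}: missing {', '.join(files)}")
```

Run from the repository root, it prints one line per incomplete feature folder and nothing when the trail is intact, which makes it easy to wire into CI later.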

SQUAD: The Multi-Agent Team That Executes the Spec

SQUAD is a separate tool built by Brady Gaster. Where Spec Kit creates the planning artifacts, SQUAD executes them using four specialized agents.

Figure 4: SQUAD Architecture — Four Single-Purpose Agents, One Source of Truth

The key design decision: No agent tries to do everything. Linus writes; Livingston reviews. They never share a context window. An agent reviewing its own work catches little, because the gaps in its generation are the same gaps in its review. Livingston never sees Linus’s reasoning, only the output, which is the same adversarial posture a good code reviewer takes.
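The isolation rule can be expressed structurally: the handoff from writer to reviewer carries the artifact and the spec, never the writer’s reasoning. The sketch below models that principle with hypothetical types (`WriterOutput`, `ReviewInput`, `handoff_to_reviewer`). It is an illustration of the design, not SQUAD’s actual implementation, which this article does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriterOutput:
    """What the writing agent produces: code plus the reasoning behind it."""
    code: str
    reasoning: str


@dataclass(frozen=True)
class ReviewInput:
    """What the reviewing agent is allowed to see: the code and the spec, nothing else."""
    code: str
    spec: str


def handoff_to_reviewer(output: WriterOutput, spec: str) -> ReviewInput:
    # Deliberately drop `reasoning`: the reviewer judges the artifact against the
    # spec, so it cannot inherit the writer's blind spots through its explanation.
    return ReviewInput(code=output.code, spec=spec)
```

Making the reviewer’s input a separate type means that leaking the writer’s reasoning would require changing the type, not just passing one more string.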

Install SQUAD: npm install -g @bradygaster/squad-cli@latest

The Constitution: Highest ROI, Lowest Adoption

The constitution is a one-time setup per project. You run:

/speckit.constitution create Python FastAPI application

You get a Constitution.md that encodes your team’s non-negotiables: test runner, error shape standard, ID strategy, API versioning. Here’s what it does in a SQUAD workflow: Basher reads it before accepting any plan. If the plan proposes integer primary keys (violating Principle I, which mandates UUIDs), Basher blocks it and explains why. The violation gets logged in decisions.md.

This is policy as code for your development process. Just as a Rego policy in OPA enforces infrastructure standards at deploy time, the constitution enforces development standards at design time. It is written in natural language and costs little beyond the hour it takes to write. For platform teams standardizing AI codegen across squads, it’s the enforcement mechanism you didn’t know you needed.
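Basher applies the constitution in natural language, but the gate itself is easy to picture as code: each principle is a rule checked against the proposed plan, and every violation is appended to a log standing in for decisions.md. The sketch below is hypothetical throughout. The principle wording, the naive keyword check and the log line format are assumptions, not Spec Kit’s or SQUAD’s actual formats.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Principle:
    number: str                         # matches the constitution's numbering, e.g. "I"
    rule: str                           # human-readable statement of the principle
    violated_by: Callable[[str], bool]  # True if the plan text breaks the rule


def uses_non_uuid_keys(plan: str) -> bool:
    # Naive keyword check standing in for Basher's natural-language judgment.
    text = plan.lower()
    return "integer primary key" in text or "auto-increment" in text


# Hypothetical encoding of Principle I (UUID primary keys).
CONSTITUTION = [
    Principle(number="I", rule="All primary keys are UUIDs.", violated_by=uses_non_uuid_keys),
]


def gate_plan(plan: str, decisions_log: list[str]) -> bool:
    """Return True if the plan passes; append one log entry per violated principle."""
    violations = [p for p in CONSTITUTION if p.violated_by(plan)]
    for p in violations:
        decisions_log.append(
            f"{date.today().isoformat()} BLOCKED: plan violates Principle {p.number} ({p.rule})"
        )
    return not violations
```

A keyword match is far cruder than an agent reading the plan, but the shape is the same as the OPA analogy: a declared rule, an evaluation point before work starts, and an auditable record of every block.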

What Livingston Actually Catches

Once the spec folder exists, SQUAD runs execution. Here’s what Livingston’s review produces on the TODO API from our example — built from the spec, not from vibes:

LIVINGSTON REVIEW REPORT

SPEC COMPLIANCE
✅ 17/17 acceptance scenarios have test coverage
✅ Constitution I: UUIDs used for all primary keys
✅ Constitution III: 93% test coverage (gate: 90%)

SECURITY
✅ No raw SQL concatenation found
✅ Input validation on all route handlers (Pydantic v2)
⚠️ SQL injection surface via sort_by parameter — BLOCKING

QUALITY
✅ Error shapes consistent with Constitution II standard
✅ Async/await correctly used in all route handlers
⚠️ Missing test: pagination reserved parameters return 501

STATUS: 2 blocking issues require resolution before merge.

The SQL injection finding is real: a sort_by query parameter passed directly into ORDER BY is a textbook OWASP injection vector. Linus missed it. Livingston caught it because Livingston runs the same security checklist on every review. The missing test for US4 Scenario 3 was found by checking spec.md line by line. Neither finding would have surfaced in a typical AI codegen workflow. Both would have shipped.
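The standard fix is to never interpolate sort_by at all: map the accepted values to column names through an allowlist and reject everything else. Column names cannot be bound as query parameters, which is why the allowlist is needed. The sketch below uses Python’s standard-library sqlite3 module rather than the project’s actual ORM, and the `todos` table, its columns and the `list_todos` helper are hypothetical.

```python
import sqlite3

# Allowlist: public sort keys -> trusted SQL column names. Only these strings
# can ever reach the ORDER BY clause.
SORTABLE_COLUMNS = {"title": "title", "created_at": "created_at", "priority": "priority"}


def list_todos(
    conn: sqlite3.Connection, sort_by: str = "created_at", descending: bool = False
) -> list[tuple]:
    """Return (title, priority) rows ordered by an allowlisted column."""
    column = SORTABLE_COLUMNS.get(sort_by)
    if column is None:
        # In a web handler this would map to a 4xx error response, never to a query.
        raise ValueError(f"unsupported sort_by value: {sort_by!r}")
    direction = "DESC" if descending else "ASC"
    # Identifiers cannot be bound as ? parameters, so the column comes from the
    # allowlist and the direction from a fixed pair; user input is never spliced in.
    query = f"SELECT title, priority FROM todos ORDER BY {column} {direction}"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE todos (title TEXT, created_at TEXT, priority INTEGER)")
    conn.executemany(
        "INSERT INTO todos VALUES (?, ?, ?)",
        [("write spec", "2024-01-02", 2), ("review plan", "2024-01-01", 1)],
    )
    print(list_todos(conn, sort_by="priority"))  # [('review plan', 1), ('write spec', 2)]
```

A request with sort_by set to something like `title; DROP TABLE todos` raises ValueError before any SQL is built, which is exactly the behavior a test for this finding should assert.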

How This Changes Incident Response

Here’s the concrete difference Spec Kit makes when things go wrong.

Before:

“Why is this endpoint returning integers for IDs? I don’t know, the AI probably just did it that way. Who made this call? Nobody, really.”

After:

Run grep -r "integer" specs/ and the answer turns up in research.md: “Auto-increment integer IDs considered and rejected. Reason: Enumeration attack surface. UUID selected per Constitution Principle I.” Resolution time drops from a week of archaeology to 20 minutes of reading. At 2 a.m. during an incident, that’s the difference between a bad night and a catastrophic one.

How it Compares to the Alternatives

Approach	Traceability	Consistency	New-Engineer Ramp-Up	Incident Debugging
Just use Copilot inline	None	Depends on the day	Weeks of archaeology	Archaeology
Wikis + ADRs (manual)	Good (if maintained)	Low (nobody updates)	Days to weeks	ADRs are stale
Single-agent ‘build it’	None	Varies by context	Archaeology	Archaeology
Spec Kit + SQUAD	Full, versioned	Enforced by constitution	30 minutes	Read the spec folder

The ‘Full, versioned’ entry deserves a note: The spec folder is a first-class citizen in version control. You can git blame it. You can diff it in PRs. You can grep it during incidents. It’s not a separate system; it lives next to the code and moves with it.

The DevOps Readiness Checklist

When I look at a team adopting AI-assisted development, these are the things I check:

A Constitution.md in version control, owned by a named person, updated when standards change.

Spec Kit artifacts committed to specs/ in the same repo as the code they describe.

CI rule that flags PRs where code files change without corresponding spec file updates.

SQUAD’s decisions.md treated as an architectural decision log — committed, reviewed, referenced.

Livingston’s review report in the PR as a machine-generated comment, human-reviewed before merge.

Postmortem template that includes ‘What does research.md say about this decision?’ as a standard question.

New-engineer onboarding: read Constitution.md and research.md before touching code.

Spec folder included in backup and disaster recovery — it’s an artifact, not a scratch pad.

Miss the first three and you get the benefits of AI speed without the traceability you need when things break.
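The third item, the CI rule, is the one teams most often skip because it sounds like tooling work. The sketch below is a minimal version of the check. It assumes the CI job passes the changed paths as arguments (for example from `git diff --name-only` against the target branch) and that “code” means files with the listed extensions. The extension set, the specs/ prefix test and the `spec_drift` name are assumptions to adapt.

```python
import sys

# Hypothetical definition of "code": adjust to your repository's languages.
CODE_SUFFIXES = (".py", ".ts", ".go", ".java")
SPEC_PREFIX = "specs/"


def spec_drift(changed_paths: list[str]) -> list[str]:
    """Return the changed code files if the change set touches code but no spec file.

    An empty list means the PR passes: either no code changed, or at least one
    file under specs/ changed alongside it.
    """
    code_changes = [
        p for p in changed_paths if p.endswith(CODE_SUFFIXES) and not p.startswith(SPEC_PREFIX)
    ]
    spec_changes = [p for p in changed_paths if p.startswith(SPEC_PREFIX)]
    return code_changes if code_changes and not spec_changes else []


if __name__ == "__main__":
    # Assumed usage: python spec_drift.py $(git diff --name-only origin/main...HEAD)
    flagged = spec_drift(sys.argv[1:])
    if flagged:
        print("Code changed without a spec update:", *flagged, sep="\n  ")
        sys.exit(1)  # non-zero exit lets the CI job flag the PR
```

Whether a non-zero exit blocks the merge or only adds a warning is a policy choice; the checklist asks only that such PRs get flagged.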

Advice for the DevOps Team Starting Today

Start with the constitution. Before the next feature gets built, run /speckit.constitution and spend one hour writing down the five or six things your team should never have to redecide. That document is the most valuable thing you’ll produce this month.

Then, for the next feature, run the full four-command pipeline before touching the code. This adds maybe two hours to the front of a feature that probably takes a week. In return, you get the spec folder for the life of the project.

The AI coding problem is not a code quality problem. It’s a knowledge management problem. The code is usually fine. The decisions that produced the code are invisible. Spec Kit and SQUAD make them visible.
