From Conversations to Features: Using AI to Write Better Specs
A workflow for turning what people say into what gets built
My last post covered Bedlam from the developer’s side. This one’s for the PMs.
ICECAP: Interview, Codify, Extract, Compose, Assemble, Prove. Six steps, but PMs really own three of them. Interview starts it. Codify turns the conversation into specs. Prove closes the loop with a demo. The middle three, Extract through Assemble, are where developers and the AI do their work. You’ll see cards moving on the board. You don’t need to be in the kitchen for that part.
Interview
You know how to do this. Sit down with the stakeholder, have a conversation about what they need. In civic tech this is usually someone who’s been doing their job for twenty years and knows exactly what they want but has never had to say it in terms a computer understands. Fine. You’re not asking them to write requirements. You’re having a conversation.
Record it. Transcribe it. That transcript is more valuable than any traditional requirements doc because it has the customer’s actual words. Not what someone interpreted. Not what survived committee review. The real thing.
Codify
Take the transcript, feed it to an LLM, get Gherkin feature files back. Plain English, Given-When-Then format. This is where most people get it wrong, because left alone an LLM will generate everything it can think of. API endpoints. Database schemas. Non-functional requirements about response times. Stuff that’s technically correct but useless for what we need.
What we need is features we can explain back to the customer during Prove. Mrs. Rodriguez from Planning does not care about your REST endpoints.
So you constrain the prompt. Tell it: “Generate only functional behavioral specifications. Do not generate API endpoints, database schemas, performance benchmarks, or system architecture. Keep scenarios focused on user-visible behavior that can be demonstrated to the stakeholder.” You can also feed it the names and roles of the interviewees, reference existing features for context, and tell it what module this belongs to. More context in, better specs out.
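With those constraints, the draft coming back should look something like this. It's a sketch, not output from any particular tool, and it borrows the inspection-photo example you'll see again in Prove; your scenarios will come from your own transcript.

Feature: Contractor photo submissions
  Scenario: Contractor uploads a photo with GPS data
    Given a contractor has an open permit
    When the contractor uploads a photo with GPS data attached
    Then the assigned inspector is notified of the new photo

  Scenario: Photo sits unreviewed for 48 hours
    Given a contractor has uploaded a photo
    When 48 hours pass without an inspector review
    Then the system flags the photo for follow-up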
The AI drafts. You review. Can you show this to the person you interviewed and have them follow it? If not, adjust and regenerate. This is iterative, not one-shot. Think of it like scoping a work order. You wouldn’t tell a contractor “build me a house.” You’d say what’s in scope and what’s not. Same deal. The AI will absolutely tile your bathroom ceiling if you don’t tell it that’s not part of the job.
Tracking the Work
Features map to kanban cards. Complex features with multiple scenarios become epics; individual behaviors become tasks. Same board you’ve always used, just fed by a better source. The difference is what’s on the cards. Instead of “as a user I want to upload files,” you get specific testable behaviors with acceptance criteria baked in. Given-When-Then IS the acceptance criteria. No ambiguity about what “done” means.
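Take the first scenario from the Codify sketch above and drop it on a card. The layout here is illustrative, use whatever fields your board already has, but the acceptance criteria are just the Given-When-Then, verbatim:

Card: Inspector notified when contractor uploads photo
  Acceptance criteria:
    Given a contractor has an open permit
    When the contractor uploads a photo with GPS data attached
    Then the assigned inspector is notified of the new photo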
Extract, Compose, Assemble
Covered this in the Bedlam post. Short version: the AI extracts testable elements from the features, composes code and test bindings, assembles everything into a deployable package. Cards move across the board. Pipeline enforces coverage. Because your features are written in the customer’s language, the developers have clear direction instead of guessing what “intuitive interface” means.
Prove
This is the step most agile teams skip. You demo the working software to the person you interviewed. Not to a proxy. To the actual human who told you what they needed.
“Remember when you said contractors need to upload photos with GPS data? Here it is. Upload a photo, inspector gets notified. Forty-eight hours without a review, system flags it.” You’re reading the feature file back to them and they’re watching it work. The features are the demo script.
This is exactly why you told the AI to skip the API specs during Codify. Nobody from Public Works wants to watch you POST to /api/v2/permits/inspections/photos. You’d lose the room in seconds.
Why This Matters
Small civic orgs can’t afford eighteen months of requirements, development, and testing. By then the regs have changed, the grant has expired, or the department head has retired. ICECAP compresses the timeline because the AI handles the grunt work. The time you used to spend writing specs now goes into the parts where human judgment matters: good interviews, thoughtful review, and proving it works to real people. Those are your three stops. Interview, Codify, Prove. The rest is machinery.
* * *
Interview people. Let AI draft the specs but keep them in the customer’s language. Track work on the board. Let the devs cook. Then prove it works. The AI doesn’t replace the PM. It replaces the busywork that kept you from doing the job that actually matters.


