How I split the work for AI agents depends on how well I understand what I’m building. Before any agent writes code, the work has been broken down into phases — bounded steps with defined inputs and a reviewable result.
The agent and I do this planning together. We start by discussing the problem and writing an issue description. That description becomes the seed for the agent’s first draft — an analysis, a proposed breakdown, a set of phases.
The results vary. Sometimes I have strong opinions about the approach and steer accordingly. Sometimes the defaults are sensible and I let them stand. Sometimes the agent’s understanding of the issue is so different from mine that the analysis is completely off and I need to explain more, rewrite the issue description, and start over.
What matters is that we discuss the decisions and write them down, so the reasoning is in the artifacts when I return to a feature weeks later.
When the architecture is clear
Some work is structurally predictable. The domain is understood, the patterns are established, the layers are defined. A new feature needs a domain model, a repository, a service, an API endpoint, and a UI component. We’ve built this shape before. The decisions are mostly made.
In these cases, I split by horizontal layer. Domain first — the types, the rules, the pure logic that doesn’t touch any external system. Then infrastructure — repositories, API clients, database access. Then presentation — the endpoints, the UI. Each layer builds on the one below it.
This works because the structure is settled. The agent stays within one architectural concern at a time, which means it uses existing patterns and conventions consistently instead of making its own choices. For example, the first two phases of a feature might be pure domain types and parsing functions — no I/O, fully testable in isolation — with external system integration coming only in phase three.
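As an illustration of what such a phase-one artifact might look like (the feature and names here are hypothetical, not taken from any project described above): pure domain types and a parsing function, with no I/O anywhere, so everything is testable in isolation:

```typescript
// Hypothetical phase-one code: pure domain types and parsing, no I/O.
// Nothing here imports a database driver, an HTTP client, or a framework.

type DocumentStatus = "draft" | "published" | "archived";

interface DocumentRecord {
  id: string;
  title: string;
  status: DocumentStatus;
}

// Pure parsing function: untyped input in, a typed record or an error out.
function parseDocument(raw: unknown): DocumentRecord | Error {
  if (typeof raw !== "object" || raw === null) {
    return new Error("expected an object");
  }
  const r = raw as Record<string, unknown>;
  if (typeof r.id !== "string" || r.id.length === 0) {
    return new Error("missing id");
  }
  if (typeof r.title !== "string") {
    return new Error("missing title");
  }
  const status = r.status;
  if (status !== "draft" && status !== "published" && status !== "archived") {
    return new Error("invalid status");
  }
  return { id: r.id, title: r.title, status };
}
```

Because nothing in this phase touches an external system, the reviewer agents can check it for correctness without any environment setup, and later phases can build on types that are already settled.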
Each phase is independently reviewable, independently mergeable. At the end of each phase, separate reviewer agents check the output for architecture, style, and correctness before I look at it.
This assumes I know the right structure upfront, which is only true for problems I’ve seen before.
When the domain is unclear
This has gone wrong for me more times than I’d like to admit. I envision a domain model, work through the cases on paper, discuss it with the AI. The horizontal breakdown looks solid. Then I work through all the layers up to the UI and realize there are things the interface needs that the model doesn’t support — not in a minor way, but structurally. The domain was wrong, and I built four layers on top of it.
Vertical slices are how I prevent this. Each phase is a thin end-to-end scenario: one user story, implemented all the way from domain model through to UI. The first slice is deliberately simple. It forces a concrete domain model into existence — not the right model, but a starting point. And because it reaches the UI, the mismatch shows up immediately, not after all the layers are already built.
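A deliberately simple first slice might look like this sketch (hypothetical names throughout, assuming a "list my documents" user story): one scenario that touches every layer, with the simplest possible implementation of each:

```typescript
// Hypothetical first vertical slice: "a user can see their documents".
// Every layer exists, but each one is the simplest thing that works.

interface Doc { id: string; title: string }             // domain

interface DocRepository {                                // port
  listFor(userId: string): Doc[];
}

class InMemoryDocRepository implements DocRepository {   // infrastructure
  constructor(private docs: Map<string, Doc[]>) {}
  listFor(userId: string): Doc[] {
    return this.docs.get(userId) ?? [];
  }
}

function renderDocList(docs: Doc[]): string {            // stand-in "UI"
  if (docs.length === 0) return "No documents yet.";
  return docs.map((d) => `- ${d.title}`).join("\n");
}

// Wiring the slice end to end.
const repo = new InMemoryDocRepository(
  new Map([["alice", [{ id: "d1", title: "Roadmap" }]]])
);
const page = renderDocList(repo.listFor("alice"));
```

The model here is almost certainly wrong for later scenarios, and that is the point: the next slice grows `Doc` and the repository where the new story demands it, and the UI surfaces the mismatch immediately.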
I ran into this on a document management feature — bulk import, versioning, editing workflows, permission requests. I’d had problems with the horizontal approach before, so I decided to test both: I started the horizontal and vertical breakdowns at the same time and moved them forward in parallel, switching between them. The horizontal approach got me through all the layers, and then the UI revealed structural gaps that required refactoring across every one of them. The vertical slices never hit that problem — each iteration grew the model where it needed to grow, and nothing forced a cross-layer rewrite. I abandoned the horizontal attempt.
With vertical slices, the scenarios are all defined upfront. I know the progression. What changes between phases is the implementation, not the plan.
The agent takes the next scenario and the already-implemented code, and figures out what to change and how to grow the model for the new piece. I often don’t need to steer this explicitly — the agents have clear architectural constraints: functional core with imperative shell, hexagonal architecture, explicit separation between domain types, pure operations, services, and infrastructure.
Those patterns give the agent a shape to follow when modeling the domain, and the shapes tend to come out clean. I do give feedback when the agent is too rigid or too loose with the model, but more often than not the structure is right.
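Those constraints can be sketched in miniature (hypothetical names; the real shapes will differ): a pure core function that makes the decision, a port that expresses the domain's view of the outside world, and a thin imperative shell that wires the two together:

```typescript
// Functional core: a pure decision function. No I/O, trivially testable.
type Outcome = { allowed: true } | { allowed: false; reason: string };

function decidePublish(status: string, isOwner: boolean): Outcome {
  if (!isOwner) return { allowed: false, reason: "not the owner" };
  if (status !== "draft") return { allowed: false, reason: "not a draft" };
  return { allowed: true };
}

// Port: the interface the domain defines, which infrastructure implements.
interface DocStore {
  getStatus(id: string): string;
  setStatus(id: string, status: string): void;
}

// Imperative shell: fetch state, decide with the pure core, act on the result.
function publish(store: DocStore, id: string, isOwner: boolean): Outcome {
  const outcome = decidePublish(store.getStatus(id), isOwner);
  if (outcome.allowed) store.setStatus(id, "published");
  return outcome;
}
```

The value of the split for agent work is that each piece has an obvious home: new rules go in the core, new external systems implement the ports, and the shell stays thin enough that there is little room for the agent to improvise.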
I still review each phase, and sometimes I see an opportunity to reshape things — a better understanding of the problem, an insight that only became visible after a few slices. The structure gives me that opening without requiring me to take it every time.
What changes when you stop writing code
There’s a tension I haven’t fully resolved here. When I wrote code myself, understanding the domain happened through the act of implementation — the tacit knowledge I got from being in the material, making decisions at the keystroke level.
When an agent implements, I get some of that understanding through review, but it’s not the same.
Even before working with agents, I tended not to specify features in full detail upfront, because I knew the shape would change as soon as we started experiencing the model. That instinct hasn’t gone away — if anything, it’s stronger now.
I’ve been thinking about whether there are ways to experience the code more directly — more visual representations of how the system changes between phases, something closer to walking the space than reading diffs. That’s an open problem, not a solved one, and it’s very much on my mind.
Verification and rollback
Each phase produces something that gets verified at multiple levels. The implementing agent checks its own work against tests. Reviewer agents examine the output from specific angles — architecture, security, style, correctness. CI runs as another gate.
By the time I look at a phase result, it has already been through this loop. I think keeping the phases small is what makes this work — each check is scoped to a bounded piece of work, not to the entire feature at once.
The phases aren’t a rigid plan. If a phase goes wrong, I roll back to the checkpoint and try again. I lose that phase, not the whole feature. There’s also a simpler workflow for when something is broken — reproduce, investigate, fix, verify — same phased structure, but focused on diagnostics and fixes with less ceremony.
Context windows keep growing. I started splitting work into phases for a practical reason: the work had to fit in the context window. The phases have turned out to be valuable for reasons that have nothing to do with context size, and I don’t think that changes when contexts get bigger.
The phases also produce a trail of artifacts — analyses, task breakdowns, implementation logs, review packets. The agents use them to orient themselves when returning to the code. I use them when I want to investigate how something went. And I can use them to analyze and improve the system itself. I believe this trail is immensely useful, and it has possibilities I haven’t fully explored yet.