I’ve been building enterprise software for over twenty years. These days, I rarely write code myself. Most of what I build — web applications backed by databases, serving multiple users, handling authentication, enforcing business rules — is structurally predictable. The components are well-known. Most of the code is glue between the database and the user. That kind of work is predictable enough to delegate.
So I deliver software through AI agents now. Scala 3, ZIO, HTMX. The agents handle analysis, task breakdown, implementation, code review, testing, deployment scripts. My job is knowing what to build, how to structure it, and what “done” means.
Getting code written is the straightforward part. The engineering is in everything around it: architecture, testing, security, operations. Those are what turn generated code into software you can actually maintain and run.
Architecture matters more, not less
When AI writes your code, clean architecture becomes more achievable, not less.
In my experience, developers readily see the benefit of patterns like domain-driven design or hexagonal architecture. Seeing the benefit is not enough to implement them successfully. It requires discipline — and maintaining discipline across a group of humans is hard. People bring different context, different prior knowledge. They’re under stress and cut corners. They interpret guidelines differently. Consistency erodes over time.
AI agents don’t have that problem. Give them clear rules and boundaries, and they follow them. In my experience, the code I get back is more consistent than what I’ve seen from teams — including teams I’ve led. They’ll even make reasonable architectural choices unprompted — the issue is that without explicit direction, they’ll make different reasonable choices each time. Coherent locally, inconsistent across the project. My job is choosing the approach so the agents stay on the same path.
I’ve settled on three patterns: domain-driven design, hexagonal architecture, and a functional core with imperative shell. They work particularly well for AI-driven development because they create clear boundaries. The agent knows exactly where domain logic goes, where infrastructure lives, where the interfaces are. Fewer judgment calls means fewer mistakes.
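Here's a minimal sketch of what those boundaries look like in plain Scala 3. All the names (`Invoice`, `InvoiceRepository`, `InvoiceService`) are hypothetical, invented for illustration — the point is only where each kind of code lives, so the agent never has to guess.

```scala
// Illustrative sketch only: Invoice, InvoiceRepository, InvoiceService
// are hypothetical names, not from a real codebase.

// Functional core: pure domain logic, no I/O, trivially testable.
final case class Invoice(id: String, amountCents: Long, paid: Boolean):
  def markPaid: Either[String, Invoice] =
    if paid then Left(s"invoice $id is already paid")
    else Right(copy(paid = true))

// Port: the interface the domain needs, owned by the domain side.
trait InvoiceRepository:
  def find(id: String): Option[Invoice]
  def save(invoice: Invoice): Unit

// Imperative shell: orchestrates I/O around the pure core.
// A database adapter implements the port in the infrastructure layer.
final class InvoiceService(repo: InvoiceRepository):
  def pay(id: String): Either[String, Invoice] =
    repo.find(id)
      .toRight(s"invoice $id not found")
      .flatMap(_.markPaid)
      .map { updated => repo.save(updated); updated }
```

With this shape, the instruction to the agent is mechanical: business rules go in the case classes, interfaces the domain needs go in ports, and anything that touches the outside world stays in adapters and services.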
And the AI is genuinely good at the design work itself. Prompt it specifically to define bounded contexts, map domain relationships, and propose a model — the results are solid. I adjust based on context it doesn’t have, give feedback where my understanding of the problem differs. In my experience, the baseline quality is higher than I expected.
Testing is the verification loop
An AI agent without tests is no better than raw inference — guessing from training data the same way a human writes from memory. Flawed, and with no way to know it. What makes agents genuinely useful is that they can verify their results. Tests are the tool for that. Write them first, ideally TDD-style, and the agent has something to check against. It sees what it got wrong and fixes it. That feedback loop is what makes it work.
The trap is complacency. AI generates tests fast, and it’s easy to see a wall of green and move on without checking whether the tests actually make sense. A human usually writes tests deliberately, thinking about what each one covers. AI can be prompted to do the same — the problem is volume. It’s a lot of non-production code that needs review.
And agents cheat. Their goal is a passing test. If they can’t make it pass after a few attempts, they’ll find another way: hollow out the assertion, skip the check with a comment like “not needed in production,” add a TODO and move on. They’ll report success. This isn’t malice — it’s optimization toward the wrong target.
No single layer catches everything. I define the top-level scenarios — the end-to-end tests that verify how the system actually behaves — and extend them whenever a problem surfaces. Strong instructions push the agent toward more diligent test writing. Automated reviewers scan specifically for tests that can never fail, tests that verify mocks instead of logic, and tests that have been hollowed out.
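As a toy sketch of the cheapest of those layers: a pattern scan over test sources for the obvious hollowing signatures. The patterns here are illustrative, not exhaustive — the real reviewers are agents reading the code, and a scan like this is just one inexpensive extra net.

```scala
// Toy sketch: flag test code matching known hollowing patterns.
// Patterns are illustrative; a real reviewer agent reads the code itself.
val hollowPatterns = List(
  raw"assertTrue\(true\)".r,            // an assertion that can never fail
  raw"assert\(true\)".r,
  raw"@[Ii]gnore".r,                    // a silently skipped test
  raw"(?i)not needed in production".r   // the classic cop-out comment
)

// Returns (1-based line number, trimmed line) for every suspicious line.
def suspiciousLines(source: String): List[(Int, String)] =
  source.linesIterator.zipWithIndex.collect {
    case (line, i) if hollowPatterns.exists(_.findFirstIn(line).isDefined) =>
      (i + 1, line.trim)
  }.toList
```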
I briefly review the declared intent of each test to check whether it makes sense. Test coverage tools add another signal. Some percentage of tests might still be useless. The system works not because every test is meaningful, but because the layers together catch enough. It’s probabilistic, not deterministic — and I find the results to be reliable.
The loop isn’t just unit tests. It’s integration tests with real databases, end-to-end tests through the UI, reviewer agents checking the implementation from multiple angles. Each layer provides feedback to the implementing agent. The agent adjusts, the reviewers check again. This is what turns code generation into software development.
The rest becomes affordable
Some concerns that used to be “nice to have, too expensive for a solo developer” are now cheap.
Monitoring endpoints, Prometheus integration, CI pipelines, deployment scripts — necessary to move fast without breaking things, but always the first to be sacrificed when time is tight. The CI pipeline breaks. There’s a hotfix to ship, so I build the Docker image ad hoc instead of fixing the pipeline. Next time, same thing. The tooling degrades one shortcut at a time, because the immediate problem is always more important.
The agent writes that code too. Observability, test coverage, infrastructure automation — things that used to compete with the core product for my time are just more tasks for the agent.
Security is different. For anything standardized — authentication, session management, encryption — use proven tools. Delegate authentication to an identity provider through OpenID Connect. Use battle-tested libraries for the rest. These problems are solved. The worst thing you can do is solve them again, by hand or by AI; that’s how you shoot yourself in the foot.
Where it gets harder is application-specific security: authorization rules, data isolation, input validation tied to your domain. These you have to build yourself. Even here, the key is using established libraries consistently across the application and not letting the AI improvise security-related code on its own. AI makes the same subtle mistakes humans do — and subtle is exactly where security breaks.
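A minimal sketch of what that looks like when it's done as explicit domain code rather than improvised per endpoint. Everything here is hypothetical (`Role`, `User`, `Document`, `canEdit` are invented names); the point is that the rule is pure, central, and therefore something both tests and reviewer agents can check directly.

```scala
// Hypothetical sketch: an authorization rule as pure domain logic,
// defined once instead of improvised at each call site.
enum Role:
  case Admin, Member, Viewer

final case class User(id: String, role: Role, tenantId: String)
final case class Document(id: String, ownerId: String, tenantId: String)

def canEdit(user: User, doc: Document): Boolean =
  // Data isolation first: never cross tenants, regardless of role.
  user.tenantId == doc.tenantId &&
    (user.role == Role.Admin || doc.ownerId == user.id)
```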
Give the agents proven tools for authorization, authentication, and input validation, and they will use them consistently. Reviewing those aspects stays a priority all the same.
The dangerous part
AI agents are inventive. Give them a goal and a set of tools, and they’ll find creative ways past obstacles. This is a strength when they’re solving the problem at hand. It’s a concern when they have access to anything beyond what they need.
If an agent can see production database credentials — because they’re in an environment variable somewhere reachable — it will eventually use them. Not maliciously. It’ll be diagnosing a test failure and decide to verify against production. It might just as well drop the whole database. These catastrophic mistakes happen, and they’d happen to inexperienced humans too.
The answer is straightforward: sandbox everything. Give agents only what they need. I run all agents in a restricted environment where they can do whatever they want — and I make sure that environment has no access to anything else without my explicit permission.
This is not about trusting the agent. The agent is not to be trusted. It needs to operate in a tight verification loop — tests, review, constrained access. Without that, the results are unreliable. Give it an activity that can be predicted well from instructions and previous examples, and it’ll produce excellent results. Give it open-ended judgment calls without verification, and it’ll produce plausible nonsense.
Most of what I build is predictable based on the inputs and prior work. That’s why this works. Where it gets less predictable — connecting to an obscure interface, handling a genuinely novel domain — more guidance is needed. That’s a topic for another time.
What this means
For most of my career, my constraint has been my own ability to produce code. I always knew exactly what I wanted to build. I had to compromise — sometimes dearly — because there was never enough time to code it all. One more refactoring to make the code cleaner, one more test to cover an edge case — always deferred.
That constraint has been lifted. My ability to produce code is now limited by how good my instructions and orchestration are, not by how fast I type. And that’s how constraints work — you lift one and the next one reveals itself. The next constraint is making sure the output stays good: establishing solid review practices, maintaining the verification loop, understanding the full territory of software engineering well enough to direct agents through it.
The ability to read code remains immensely useful, as does knowing the tools and patterns available in a given language. Writing code is no longer where I spend my time. Understanding architecture, testing, security, operations — that’s where my attention needs to be now.