Code Is Cheap Now. Here’s What Actually Matters.
Orchestration is the new bottleneck. Imagination might be the last one.
Something shifted in November 2025.
I’ve been building AI agent systems for a while now — contract analytics platforms, hybrid agent-graph execution models, the usual orchestration puzzles. But watching GPT-5.1 and Claude Opus 4.5 cross what felt like a reliability threshold made me reconsider what I thought I understood about this field. We’re no longer in the era of code completion. We’re in the era of agentic engineering, where coding agents independently write, test, and debug large volumes of code. The question is not whether this changes software development — it already has. The question is what remains valuable on the other side.
Simon Willison — Django co-creator, the person who coined “prompt injection” and “agentic engineering,” and perhaps the most visible practitioner documenting this transition in real time — has been writing about this with characteristic clarity. Code has become cheap. What hasn’t become cheap: human agency, ambition, and taste. The ability to direct these tools toward something worth building. The judgment to know when the output is actually good.
I want to work through what I think this means.
The Dark Factory Pattern
There’s a concept Willison describes that I can’t stop thinking about: the “dark factory” approach to software production. Highly automated. No one writes the code. No one reads the code. Quality assurance happens through swarms of agentic testers that simulate end-users and stress-test systems around the clock.
This sounds like science fiction until you watch an auto-research agent replicate a complex SaaS application in six hours while you sleep. Or until you see a Chrome extension get built in fifteen seconds — functional, installable, doing exactly what was asked. The gap between “this is theoretically possible” and “I just watched it happen” has collapsed.
The dark factory pattern raises an obvious question: if no human reads the code, how do you know it’s correct? Willison’s answer is pragmatic — red/green test-driven development. Write the tests first. Let the agent produce code that makes them pass. The tests become the specification, the verification, and the documentation all at once. You’re not reviewing code; you’re reviewing behavior.
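To make the red/green loop concrete, here’s a minimal sketch in Python. Everything in it is my own illustration, not Willison’s code: `slugify`, `build_spec`, `is_green`, `not_written_yet`, and `agent_attempt` are hypothetical names, and the “agent output” is hand-written as a stand-in. The tests exist before any implementation does, fail against the empty stub (red), and pass once a candidate implementation shows up (green).

```python
import unittest
from typing import Callable


def build_spec(slugify: Callable[[str], str]) -> unittest.TestSuite:
    """The specification, written first: the behavior we expect from slugify."""

    class SlugifySpec(unittest.TestCase):
        def test_lowercases_and_hyphenates(self):
            self.assertEqual(slugify("Code Is Cheap Now"), "code-is-cheap-now")

        def test_drops_punctuation(self):
            self.assertEqual(slugify("What Actually Matters."), "what-actually-matters")

        def test_empty_title(self):
            self.assertEqual(slugify(""), "")

    return unittest.defaultTestLoader.loadTestsFromTestCase(SlugifySpec)


def is_green(slugify: Callable[[str], str]) -> bool:
    """Run the spec against one candidate; True only if every test passes."""
    result = unittest.TestResult()
    build_spec(slugify).run(result)
    return result.wasSuccessful()


def not_written_yet(title: str) -> str:
    """Red phase: nothing is implemented, so every test errors out."""
    raise NotImplementedError


def agent_attempt(title: str) -> str:
    """Green phase: a stand-in for code an agent might hand back."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return "-".join(cleaned.split())


if __name__ == "__main__":
    print("stub is green:", is_green(not_written_yet))
    print("agent attempt is green:", is_green(agent_attempt))
```

The human-facing surface here is `build_spec`. In this setup you would read and argue about those three tests, and only glance at `agent_attempt` if the suite went red.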
I find this persuasive but incomplete. Tests can verify that code does what you specified. They can’t verify that what you specified was the right thing to build. That judgment call — what should exist, what’s worth the compute, what users actually need — remains stubbornly human.
The Mental Exhaustion Problem
Here’s something I didn’t anticipate: managing multiple agents in parallel is cognitively brutal.
Willison describes himself getting “wiped out” by mid-morning from the mental load of coordinating agent swarms. I’ve experienced this. You spawn three agents — one for logic, one for tests, one for integration — and suddenly you’re context-switching between their outputs, catching subtle errors, making judgment calls about which branch to pursue. It’s not that any single interaction is hard. It’s that the aggregate load compounds faster than you expect.
This maps onto a pattern I’ve noticed in my own work: orchestration is the scarce resource, not intelligence. The LLMs are capable enough. What’s missing is the coordination layer that maintains state across extended workflows, decides which tool to invoke when, and keeps the whole system coherent over time. Claude Code’s leaked 70,000 lines of orchestration code matter more than the underlying model because they solve this problem.
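As a rough illustration of what I mean by a coordination layer, here’s a toy sketch using Python’s `asyncio`. The three agents are stubs, and every name in it (`AgentResult`, `stub_agent`, `orchestrate`) is hypothetical; real orchestration code is far more involved than this. The sketch only shows the shape: fan a task out in parallel, keep the results in one shared state, and queue whatever fails for a human.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List


@dataclass
class AgentResult:
    role: str
    output: str
    ok: bool


# An agent is modeled as any async callable that takes a task description.
Agent = Callable[[str], Awaitable[AgentResult]]


def stub_agent(role: str, ok: bool = True) -> Agent:
    """Build a fake agent; a real one would call a model and run tools."""

    async def run(task: str) -> AgentResult:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return AgentResult(role=role, output=f"{role} result for: {task}", ok=ok)

    return run


async def orchestrate(task: str, agents: Dict[str, Agent]) -> Dict[str, object]:
    """Run all agents concurrently and collect one shared state object."""
    results: List[AgentResult] = await asyncio.gather(
        *(agent(task) for agent in agents.values())
    )
    return {
        "task": task,
        "results": {r.role: r for r in results},
        # Anything that did not succeed lands in the human's review queue.
        "needs_human": [r.role for r in results if not r.ok],
    }


if __name__ == "__main__":
    agents = {
        "logic": stub_agent("logic"),
        "tests": stub_agent("tests", ok=False),  # pretend this one failed
        "integration": stub_agent("integration"),
    }
    state = asyncio.run(orchestrate("add CSV export", agents))
    print("needs human review:", state["needs_human"])
```

The `needs_human` list is where the exhaustion comes from: every entry in it is a context switch for whoever is supervising.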
The dark factory vision assumes we automate the coordination too. Maybe we will. But right now, humans are the bottleneck — and the bottleneck is getting squeezed harder as the agents get more capable.
Who Gets Amplified, Who Gets Displaced
The workforce implications are uneven in ways that don’t map neatly onto seniority.
Senior “10X” engineers get amplified. They already know what good code looks like, what architectural patterns work, where the edge cases hide. AI lets them move faster without losing quality. Juniors, surprisingly, also benefit — they can onboard quickly, learn by iterating with AI feedback, and produce useful output earlier than they otherwise would.
The uncomfortable middle is… the middle. Mid-career engineers who’ve built their value on knowing how to write code — but not yet on judgment, architecture, or system design — face the steepest adaptation curve. The skill that got them here (implementation) is precisely the skill that’s being commoditized.
I don’t know what the right response is for someone in that position. “Learn orchestration” is easy to say and hard to do when the tools are changing monthly. What I do know: the people who will thrive are the ones who can specify clearly what they want, verify rigorously whether they got it, and iterate quickly when they didn’t. That’s less about coding and more about thinking.
The Lethal Trifecta
Willison’s security warnings deserve more attention than they’re getting.
He describes a “lethal trifecta” for prompt injection vulnerabilities: an agent with access to private data, exposed to malicious instructions (say, an untrusted email), and possessing a mechanism for exfiltration. All three conditions are increasingly common. Most people building agentic systems aren’t thinking carefully about which combination of capabilities they’re enabling.
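One cheap way to start thinking about this is to tag each tool an agent can call with the leg of the trifecta it supplies, then check whether a given configuration completes all three. The sketch below is my own illustration, not something Willison published. The tool names and the tagging in `TOOL_LEGS` are hypothetical, and a real audit would need far more than a lookup table.

```python
from typing import Dict, FrozenSet, Iterable, Set

PRIVATE_DATA = "private_data"
UNTRUSTED_CONTENT = "untrusted_content"
EXFILTRATION = "exfiltration"
TRIFECTA: FrozenSet[str] = frozenset({PRIVATE_DATA, UNTRUSTED_CONTENT, EXFILTRATION})

# Hypothetical tools, tagged by hand with the trifecta legs each one supplies.
TOOL_LEGS: Dict[str, Set[str]] = {
    "read_inbox": {PRIVATE_DATA, UNTRUSTED_CONTENT},  # mail is private AND attacker-writable
    "read_crm": {PRIVATE_DATA},
    "fetch_url": {UNTRUSTED_CONTENT, EXFILTRATION},   # data can leak via the requested URL
    "send_email": {EXFILTRATION},
    "run_unit_tests": set(),
}


def legs_enabled(tools: Iterable[str]) -> Set[str]:
    """Union of trifecta legs across the tools; unknown tools raise KeyError."""
    legs: Set[str] = set()
    for tool in tools:
        legs |= TOOL_LEGS[tool]
    return legs


def has_lethal_trifecta(tools: Iterable[str]) -> bool:
    """True when the tool set supplies all three legs at once."""
    return TRIFECTA <= legs_enabled(tools)


if __name__ == "__main__":
    for config in (["read_inbox", "send_email"], ["read_crm", "run_unit_tests"]):
        print(config, "->", "LETHAL" if has_lethal_trifecta(config) else "ok")
```

Failing loudly on an untagged tool is deliberate in this sketch: a tool nobody has classified should block the configuration rather than be silently treated as safe.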
His prediction is grim: we’re heading toward a “Challenger disaster” for AI. The mechanism is normalization of deviance — companies use these systems in unsafe ways, nothing bad happens, they push further, still nothing bad happens, until suddenly something catastrophic does. The absence of failure becomes evidence of safety, when really it’s just evidence that you haven’t been attacked yet.
I see this in my own work. It’s tempting to give agents broad permissions because it makes development faster. The security hygiene that should accompany those permissions — sandboxing, capability restrictions, output validation — often gets deferred. Everyone’s racing to ship. The vulnerabilities accumulate silently.
When the Challenger moment comes, and I think it will, the response will be regulatory overreach and reputational damage that affects the entire field. The people building carefully now will be better positioned to navigate that.
Whimsy as Benchmark
One of Willison’s observations that stuck with me: the field remains “inherently funny.”
He uses a pelican riding a bicycle as a benchmark for testing LLM spatial reasoning through SVG code generation. Can the model draw the bird correctly positioned on the bike? He’s found a strong correlation between this whimsical test and overall model intelligence.
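The drawing itself gets judged by eye, which is part of the charm. Still, the mechanical part can be sketched: take whatever text the model returned and check that it at least parses as SVG and draws something. The code below is a hedged illustration of that pre-check only. The wording of `PROMPT`, the `svg_shape_count` helper, and the sample string are my assumptions, and no model is actually called.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed prompt wording; the model call itself is out of scope for this sketch.
PROMPT = "Generate an SVG of a pelican riding a bicycle"

SHAPE_TAGS = {"path", "circle", "ellipse", "rect", "line", "polygon", "polyline"}


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix like '{http://www.w3.org/2000/svg}'."""
    return tag.rsplit("}", 1)[-1]


def svg_shape_count(text: str) -> Optional[int]:
    """Return how many drawing elements the SVG contains, or None if the
    text is not well-formed XML with an <svg> root."""
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError:
        return None
    if local_name(root.tag) != "svg":
        return None
    return sum(1 for el in root.iter() if local_name(el.tag) in SHAPE_TAGS)


if __name__ == "__main__":
    # Hand-written stand-in for a model response: two wheels and a frame.
    sample = (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
        '<circle cx="30" cy="70" r="15"/>'
        '<circle cx="70" cy="70" r="15"/>'
        '<path d="M30 70 L50 40 L70 70"/>'
        "</svg>"
    )
    print("shapes drawn:", svg_shape_count(sample))
```

A passing check says nothing about whether the pelican is on the bike. That part still needs a human looking at the picture.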
I like this. It suggests that even as the stakes get higher — security vulnerabilities, workforce displacement, economic disruption — there’s room for play. The best engineers I know approach these systems with curiosity rather than grimness. They’re delighted when something unexpected works. They’re amused when it fails in absurd ways. The seriousness and the whimsy coexist.
Maybe that’s what “taste” means in this context: knowing when to push harder and when to step back, when to trust the agent and when to verify, when to optimize and when to ship something good enough. The algorithms don’t have taste. We do.
What Remains Valuable
Here’s where I’ve landed:
Code is cheap. Orchestration — the coordination layer that chains capabilities together across time and context — is becoming the scarce resource. But even orchestration will eventually be automated. What sits upstream of both is the human capacity to want something specific, to have a vision worth executing, to judge whether the output matches the intent.
Willison recommends “hoarding” successful code patterns and research in personal repositories to provide context for future AI tasks. I’ve started doing this systematically — building a library of Claude Code skills, prompt engineering artifacts, spec refinement workflows. Not because I think these specific artifacts will remain valuable forever, but because the practice of curating what works builds judgment that transfers.
The dream is still in your head. The agents can execute it faster than ever. But someone has to dream it first, and someone has to recognize when the execution matches the dream. That’s what humans are still for.
I don’t have a clean conclusion here. The field is moving too fast for clean conclusions.
What I have is a working hypothesis: the value chain is shifting from implementation to specification to judgment. The people who can clearly articulate what they want, verify rigorously whether they got it, and maintain security hygiene while moving fast — they’ll do well. Everyone else is in a race against commoditization.
The orchestration thesis isn’t that orchestration is the final answer. It’s that orchestration is the current bottleneck, and understanding bottlenecks is how you stay relevant as they shift. Right now, the bottleneck is coordination. Soon it might be verification. Eventually it might be imagination itself.
But imagination, I suspect, is the one thing that doesn’t automate away. The dream stays in your head. Everything else is becoming infrastructure.
These are some of my internal mutterings, externalized as a blog post, on things I read, heard, and thought about over the weekend. 👏👏 and subscribe if you liked the post. I would like to hear your thoughts on what the human moat will be in the era of agentic AI.









