On Friday evening I asked Claude to go through my AI legal-advice platform Erst Recht end to end — backend, database, payment flow, the lawyer-upgrade path, the whole machine. Instead of working through it step by step itself, Claude wrote a short script: seven specialized agents, one per domain, all running at once.
Five minutes and eleven seconds later I had a findings list sorted by severity that would have cost a single developer half a day. Among them: several critical findings, including a security-critical flaw a single pass would have missed — and it's exactly this one that best shows what this new way of working is about.
That's impressive. But it isn't the point. The point is what it makes possible that a model can now turn an entire plan into executable code and unleash a swarm of agents on it in parallel. That's exactly what Claude Opus 4.8 and dynamic workflows have been about since May 28.
Opus 4.8 in one paragraph
Anthropic shipped Opus 4.8 just 41 days after 4.7 — its fastest flagship cycle yet. One thing matters most to me: the model has become more honest as an agent. It lets code flaws slip through roughly four times less often, flags its own uncertainties, and makes fewer unsupported claims. On top of that come new "effort controls" (you choose how hard Claude thinks) and a cheaper fast mode. It sounds like detail polish — but it's exactly the foundation on which you'd trust an agent with a live product in the first place.
The real shift: the plan moves into the script
Until now, the ceiling for everything an AI agent could do autonomously was always the same: the context window. The agent held the plan in its head, and every intermediate result — every file read, every tool call — landed in that same window. Eventually it filled up and the agent lost the thread. Pulling in more helpers didn't help, because their results flowed back into that one memory too.
Dynamic workflows turn that around. Claude no longer writes the plan into its head but as a script: which agents run, in what order, what gets passed to whom. A runtime executes that script in the background while the intermediate results live in script variables — not in the model's context. Claude's memory holds only the finished answer at the end.
The plan moves out of the head and into code. That removes the limit that bounded everything before — and a single run can coordinate dozens to hundreds of agents (up to 16 at a time, 1,000 total).
This is more than "the same, but faster." Because the plan is now code, it can enforce repeatable quality patterns: agents that adversarially review what other agents found before anything is reported. In my case, that became the decisive step.
My 5-minute proof
The workflow for Erst Recht had two phases:
Seven read-only agents, each owning a domain: backend API, database, lawyer upgrade, payment, frontend wiring, build/tests/dependencies, email. They only read — and all at the same time.
The seven raw audits are distilled into a single findings list, prioritized by severity — the blueprint for the fixes that followed.
| Agent | Tokens | Tools | Duration |
|---|---|---|---|
| backend-api | 177.5k | 36 | 4m 32s |
| lawyer-upgrade | 157.7k | 34 | 5m 11s |
| database | 111.9k | 40 | 4m 36s |
| payment-stripe | 108.4k | 25 | 4m 34s |
| frontend-wiring | 100.8k | 39 | 4m 01s |
| build-tests-deps | 86.5k | 34 | 4m 27s |
| email-notifications | 71.7k | 25 | 3m 27s |
Here you can see the magic of parallel orchestration in black and white: add the agents up and that's roughly half an hour of work. But because they ran at the same time, the whole phase was done in 5m 11s — exactly the slowest agent, nothing more. That's the difference between "one agent works through a long list" and "a swarm finishes it in the time of its slowest member."
Where the model distrusted itself
The most interesting moment came after the findings — and it belongs to exactly that security-critical flaw. The first fix for it looked clean. But on live verification against the actual system it turned out the fix was a shot in the dark. The root cause sat one level deeper than the first glance suggested — after the "fix" the problem was still exactly as it had been.
Verification beats trust
A normal audit would have checked the first fix off as done. Because the workflow tested every critical finding adversarially against live reality, the failure surfaced — and the problem was actually fixed, not just seemingly.
This is where Opus 4.8's honesty and the structure of workflows mesh: a model more willing to admit "this looks done but isn't," inside a flow that gives that skepticism its own verification step. The end result was six commits on main, across security, the checkout flow, the lawyer upgrade and stale dependencies — every critical fix verified live once more before it was closed.
What this really unlocks
Forget my audit for a moment. Once "a plan plus a thousand agents" becomes normal, what shifts is which work is even conceivable in the first place:
- Tasks nobody touches today become doable. A migration across hundreds of thousands of lines, an audit of every single API route, a cleanup spanning an old repo — things you kept postponing because they were too big for one session.
- Research that cross-checks itself. Instead of an answer that merely sounds plausible, agents fan a question out across several angles, fetch sources, and vote on which claim survives cross-checking. What fails is dropped before you ever see it.
- Decisions from several minds. A hard architecture question can be drafted from three independent angles and weighed against each other before you commit — instead of endlessly spinning a single first draft.
- Verification as the default. The built-in reflex to adversarially refute findings rather than believe them is exactly what saved my no-op fix. That becomes standard, not the exception.
- Processes instead of prompts. A good workflow can be saved and run again and again. The one-off "go do this" becomes a repeatable routine — the branch review, the release check, the weekly audit, orchestrated identically every time.
- A new role for the developer. You type fewer prompts and conduct more swarms. Especially for small teams and solo developers that means reach that used to take a whole department.
The honest flip side
Dynamic workflows are a research preview — not a finished product. A swarm costs noticeably more tokens than the same task in conversation, and more agents aren't automatically better: for a small, well-scoped question a single agent is faster and cheaper. The lever only fires when a task breaks into many independent parts, spans more material than one context window can hold, and produces a result that tolerates cross-checking. That's exactly when — and only when — I reach for it.
Tools used:
Erst Recht is my own legal-tech platform — and at the same time my testbed for exactly these workflows. Wondering how AI-driven multi-agent flows could take real work off your team's plate? Get in touch.