Notes on coordinating AI agent swarms
An inventory of missing primitives in current agent-swarm runtimes, and the manual disciplines that fill the gap.
The studio is called Async Digital. As it turns out, async is also a fair description of what happens when you run AI agents in parallel.
Agent swarms are embarrassingly parallel: many calls on separate compute, with separate context windows. But they have almost none of the primitives a concurrent system actually needs.
The missing primitives
- No shared memory model. Each agent has its own context. They don’t see each other’s reasoning unless it’s explicitly piped. Decisions can diverge silently.
- No synchronisation primitives. No mutex for “I’m editing this file.” Conflicts are detected post-hoc, by git merge or by file overwrites. My own rule against delegating overlapping work to concurrent agents exists because the runtime doesn’t provide a lock.
- No ordering guarantees. “Run these in parallel” means independent execution, not coordination. Agent A might commit before agent B’s invariants land; nothing in the runtime stops it.
- Communication is high-latency, low-bandwidth. Agent-to-agent messaging is expensive in tokens and seconds. Compare that with threads sharing memory in nanoseconds, or Erlang actors passing messages in microseconds.
- No back-pressure or flow control. When three agents ask a fourth for input, nothing queues or throttles the requests; the fourth just gets whatever arrives.
- No transactional boundaries. No way to say “either all three agents commit or none do.” Each ships independently; rollback is bespoke per task.
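To make the last gap concrete, here is a minimal sketch of an all-or-nothing commit that a lead could run by hand. It is an illustration under stated assumptions, not any runtime's API: `AgentResult`, `commit_all_or_nothing`, and the in-memory `workspace` dict standing in for the shared artefact are all hypothetical names. The idea is that agents stage their writes instead of applying them, and the shared state is touched only once every agent has reported success and no two agents have staged the same path.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentResult:
    """What one agent hands back to the lead. All names are hypothetical."""
    agent: str
    ok: bool                                              # did the agent report success?
    files: Dict[str, str] = field(default_factory=dict)   # path -> proposed contents


def commit_all_or_nothing(results: List[AgentResult], workspace: Dict[str, str]) -> bool:
    """Apply every agent's staged writes to `workspace`, or none of them.

    `workspace` is an in-memory dict standing in for the shared artefact.
    Returns True if the batch was applied, False if it was rejected.
    """
    # Reject the whole batch if any single agent failed.
    if not all(r.ok for r in results):
        return False

    # Reject the batch if two agents staged the same path: there is no safe
    # way to pick a winner, so treat it as a conflict.
    owner: Dict[str, str] = {}
    for r in results:
        for path in r.files:
            if path in owner:
                return False
            owner[path] = r.agent

    # Only now touch the shared state.
    for r in results:
        workspace.update(r.files)
    return True


if __name__ == "__main__":
    workspace = {"README.md": "v1"}
    results = [
        AgentResult("a", ok=True, files={"src/a.py": "print('a')"}),
        AgentResult("b", ok=False, files={"src/b.py": "print('b')"}),
    ]
    print(commit_all_or_nothing(results, workspace))  # False: agent b failed
    print(workspace)                                  # {'README.md': 'v1'}
```

With real repositories the same shape would probably mean each agent committing to its own branch and the lead merging only when all of them pass, but that is an assumption about workflow, and it is still a convention the lead enforces, not a guarantee from the runtime.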
The workarounds expose the gap
Every parallel-agent pattern I’ve ended up writing down is a manual concurrency primitive bolted onto a runtime that doesn’t provide one.
- Conflict matrix before delegation. A self-imposed rule: agents touching overlapping files cannot run in parallel. A manual mutex.
- Isolated checkouts. Each parallel writer gets its own git worktree. Process isolation, rebuilt by hand.
- Read-only investigation agents. Investigations skip the isolated checkouts because they don’t write. No write contention by construction.
- Lead fan-out at phase boundaries. Sub-agents communicate via the lead, not directly. A scheduler implemented in prompts.
More on each of these, with worked examples, in Working at AI speed.
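As a rough illustration of the conflict matrix, the check can be sketched as a set intersection run before anything is delegated. This is a minimal sketch, assuming each task's file set is known up front; the task names, paths, and the functions `conflict_pairs` and `parallel_batches` are hypothetical and not part of any agent runtime.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def conflict_pairs(tasks: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return every pair of tasks whose declared file sets overlap."""
    return [
        (a, b)
        for a, b in combinations(sorted(tasks), 2)
        if tasks[a] & tasks[b]
    ]


def parallel_batches(tasks: Dict[str, Set[str]]) -> List[List[str]]:
    """Greedily group tasks into batches with no file overlap inside a batch.

    Tasks in the same batch may run in parallel; batches run one after another.
    """
    batches: List[List[str]] = []
    for name in sorted(tasks):
        for batch in batches:
            # Join the first batch where this task conflicts with nobody.
            if all(not (tasks[name] & tasks[other]) for other in batch):
                batch.append(name)
                break
        else:
            # Conflicts with every existing batch: start a new one.
            batches.append([name])
    return batches


if __name__ == "__main__":
    # Hypothetical tasks and the files each one expects to touch.
    tasks = {
        "auth": {"src/auth.py", "src/db.py"},
        "billing": {"src/billing.py", "src/db.py"},
        "docs": {"README.md"},
    }
    print(conflict_pairs(tasks))    # [('auth', 'billing')]
    print(parallel_batches(tasks))  # [['auth', 'docs'], ['billing']]
```

The first-fit grouping is deliberately simple and does not try to minimise the number of batches. It also only works if the declared file sets are accurate: an agent that strays outside its declared files is invisible to the check.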
Where swarms win, where they fall over
Agent swarms are genuinely useful for independent investigations, scoped file rewrites in non-overlapping directories, and parallel research queries. They fall over on anything that needs synchronisation, cross-agent invariants, or ordered commits to a shared artefact. Which is, in practice, most non-trivial engineering work.