The Subagent Playbook: Three Patterns for AI Orchestration
Not all parallel work is the same, and using agents wrong costs you in one of two ways
Date: 2026-03-24
The Problem
AI coding assistants can spawn subagents — parallel workers that read code, search archives, draft content, and report back. The obvious move is to throw agents at everything and let parallelism do the work. But not all parallel work is the same, and using agents wrong costs you in one of two ways: you waste tokens (using throughput agents when you need judgment) or you sacrifice reasoning quality (delegating decisions that require full visibility).
After running dozens of sessions with Claude Code’s 1M context window, three distinct orchestration patterns have emerged. Each solves a different problem. Each fails when applied to the wrong situation. The difference between a session that produces excellent output and one that produces expensive mediocrity often comes down to which pattern you choose.
Pattern A — Throughput
Agents do the heavy reading. The main thread stays lean.
This is the workhorse pattern. You have a lot of material to process — files to read, sources to check, content to generate — and the work is parallelizable. Each piece can be handled independently. The main thread acts as an orchestrator: it identifies work streams, dispatches agents, and synthesizes their returns.
The economics are striking. Each agent consumes 60-80k tokens internally — reading files, processing context, forming conclusions. But it returns only ~2-3k tokens of distilled findings to the main thread. That’s roughly a 30:1 compression ratio. The main thread never touches the raw material. It works exclusively with distilled output.
Best for: Production workflows, multi-step tasks, content generation, research gathering.
How it works:
- Main thread identifies parallel work streams
- Each agent gets a focused task plus the context it needs
- Agents return distilled results
- Main thread orchestrates and synthesizes — it never duplicates the reading
In practice, a production session might dispatch three research agents and two writing agents simultaneously. The main context stays surprisingly low despite massive output across the agents. The main thread’s job is editorial: sequencing, quality control, integration.
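The orchestration loop above can be sketched in a few lines. This is a minimal illustration, not a real agent API: `run_agent` is a hypothetical stand-in for whatever dispatch call your tooling provides, and the task list is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real agent dispatch: each "agent" would
# read its sources internally and return only a short distilled summary.
def run_agent(task: str, context: str) -> str:
    return f"[distilled findings for: {task}]"

# Example work streams (invented for illustration).
tasks = [
    ("survey error-handling in the API layer", "src/api/"),
    ("summarize recent schema migrations", "migrations/"),
    ("draft the release-notes section on auth", "docs/auth.md"),
]

# Dispatch all work streams in parallel; the main thread never touches
# the raw material, only the small distilled returns.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    findings = list(pool.map(lambda t: run_agent(*t), tasks))

# The main thread's job is editorial: sequence and integrate summaries.
report = "\n\n".join(findings)
```

The key structural property is that `findings` contains only the compressed returns; the 60-80k tokens each agent reads never enter the main thread.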
The failure mode is using Pattern A for work where the connections between pieces matter more than the pieces themselves. If agent #3’s findings would change how you interpret agent #1’s results, you have a judgment problem, not a throughput problem.
Pattern B — Judgment
The main thread reads everything. Agents validate.
This is the opposite architecture. Instead of delegating the reading, the main thread loads the full picture into context — the codebase, the specs, the commit history, the error logs, the design documents. It forms its own conclusions with all connections visible. Only then does it send subagents out, not to research, but to challenge specific conclusions.
The subagents in Pattern B serve a fundamentally different purpose than in Pattern A. They aren’t workers — they’re devil’s advocates. You’ve formed a theory about why the system is failing. You send an agent to try to disprove it. You think the architecture should change in a specific way. You send an agent to find evidence against that change. The agents return not research but opinions on your conclusions, which the main thread evaluates with full context.
Best for: Critical debugging, architecture decisions, postmortems, anything where connections between distant pieces matter.
How it works:
- Main thread loads all relevant context
- Main thread forms its own conclusions
- Subagents get specific conclusions to challenge or validate
- Main thread integrates feedback with full visibility
The evidence for this pattern came from reflecting on what went wrong during an architectural investigation. The investigation uncovered five cascading decisions spread across five separate sessions. Each session had made a locally optimal choice — reasonable given the information available at the time. But no single session had visibility into the compounding effect of all five decisions together. The accumulated technical debt created a crisis that blindsided everyone.
A single session running Pattern B — loading the full project history and forming conclusions with all connections visible — might have caught the compounding risk early. Each individual decision looked fine. The danger was in the interaction between them, and interactions are exactly what you miss when you distribute reasoning across agents that can’t see each other’s context.
The rule: If getting it wrong is expensive, don’t delegate the thinking.
Pattern C — Discovery
Swarms explore broadly. Most return nothing. The ones that hit pay dirt change everything.
This is the least predictable pattern and potentially the most valuable. You send agents out to explore — searching across a codebase, scanning archives, pulling from different data sources — without knowing exactly what they’ll find. The search is intentionally broad. You’re not looking for a specific answer; you’re looking for material you didn’t know existed.
The hit rate is low. In one research session, 15 exploration agents were sent across a machine’s codebase and archives. Six returned useful material — unexpected connections, forgotten documents, code artifacts that reframed the problem. Nine hit walls: permission issues, dead ends, irrelevant results. A 40% hit rate.
But the six that worked surfaced material no amount of directed thinking would have found. A quirky mode buried in an old script. A document from a completely different project that contained exactly the framing needed. An archived story that connected to the current work in a way no one anticipated. These weren’t things you could have searched for, because you didn’t know they existed.
Here’s the critical constraint: creative synthesis must happen in the main thread. The agents find raw material. They don’t write the final output. In the session that tested this pattern, agent-written output was competent — structurally sound, factually correct, entirely adequate. But the best output of the session was written manually in the main thread with full context loaded. It had sustained voice, unexpected connections between the discovered material, and editorial choices that required seeing everything at once.
The difference isn’t subtle. Agent-written content reads like it was assembled from parts, because it was. Main-thread content reads like it was written by someone who had read everything and was making choices about what mattered most. That distinction — assembly versus authorship — is the reason Pattern C has a strict division of labor. Agents are excellent researchers. They are mediocre authors, because authorship requires holding the whole picture in mind at once.
Best for: Content creation, research gathering, exploring unfamiliar territory.
The rule: Use swarms to find. Write the important stuff yourself.
The Decision Guide
When you’re starting a session, the choice between patterns comes down to a few signals:
| Signal | Pattern |
|---|---|
| Work is parallelizable and independent | A (Throughput) |
| Connections between pieces are the point | B (Judgment) |
| Speed matters more than depth | A (Throughput) |
| Getting it wrong is expensive | B (Judgment) |
| Research + synthesis | A (Throughput) |
| Architecture + debugging | B (Judgment) |
| Need to find material across many sources | C (Discovery) |
| Need consistent voice or creative quality | Write manually after C |
The signals aren’t always clean. A debugging session might start as Pattern A (gather logs from multiple services) and shift to Pattern B (form a theory about root cause with full context). The point isn’t rigid categorization — it’s recognizing when you’re in the wrong pattern and switching.
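The decision table can be encoded as a rough heuristic. This is a sketch of the signals above, not a rule engine; the precedence (stakes and connections first) is an editorial choice, not something the table itself mandates.

```python
def choose_pattern(parallelizable: bool, connections_matter: bool,
                   high_stakes: bool, exploring: bool) -> str:
    """Rough encoding of the decision table. Signals, not rules."""
    if high_stakes or connections_matter:
        return "B (Judgment)"       # don't delegate the thinking
    if exploring:
        return "C (Discovery)"      # broad search for unknown material
    if parallelizable:
        return "A (Throughput)"     # independent work, distilled returns
    return "B (Judgment)"           # when in doubt, keep thinking local
```

A usage example: a routine multi-file refactor would hit the `parallelizable` branch, while a postmortem trips `connections_matter` immediately.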
The 1M Context Shift
Pattern B was impractical with smaller context windows. Loading a full project state — the codebase, the specs, the recent commit history, the design documents — consumed most of the available context. There was no room left for actual work, let alone agent round-trips.
The 1M context window changes the calculus fundamentally. You can load an entire project’s state into context and still have hundreds of thousands of tokens available for reasoning, agent dispatches, and iterative refinement. Pattern B goes from “theoretically nice but practically impossible” to “the obvious choice for high-stakes decisions.”
This isn’t just a quantitative improvement. It enables a qualitatively different way of working. When the context window was small, every session was forced into Pattern A by default — you couldn’t fit everything, so you had to delegate the reading. Now you have a choice. And having the choice means you can match the pattern to the problem instead of using the only pattern that fits.
The practical impact: sessions that would have required multiple rounds of “read this, summarize it, read the next thing” can now load everything at once and reason about the connections. The five-session architectural failure described earlier happened partly because the context windows of the time couldn’t hold the full picture. With 1M tokens, a single session can.
This also reframes the cost conversation. A Pattern B session that loads 200k tokens of context and reasons over all of it costs more per session than a Pattern A session that dispatches lightweight agents. But if Pattern B catches a compounding architectural mistake that would take five debugging sessions to unravel later, it’s the cheapest option by far. The cost of the pattern isn’t just the tokens consumed — it’s the cost of the failure mode you avoided.
Combining Patterns
The patterns compose. A real session rarely uses just one.
A research project might start with Pattern C: send a swarm of agents to explore sources, scan archives, pull papers. Most return nothing. Some surface unexpected material. Now you have raw findings scattered across agent returns.
Shift to Pattern B: load all the discovered material into the main thread’s context. Read everything yourself. Form conclusions about what matters, what connects, what the narrative is. The synthesis requires seeing all the pieces at once — exactly what agents can’t do when they each only see their own slice.
Then shift to Pattern A: you know what you want to produce. Dispatch agents to handle the parallelizable execution. One writes section drafts. Another formats references. A third generates supporting materials. The main thread orchestrates and polishes.
C (explore) into B (understand) into A (execute). Each pattern handles the phase of work it’s best suited for.
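The composition reads naturally as a pipeline. These three functions are placeholder stubs for the phases, assumed purely for illustration; in a real session each would be a stretch of interactive work, not a function call.

```python
# Hypothetical phase stubs composing the three patterns in sequence.
def discover(sources: list[str]) -> list[str]:
    # Pattern C: the swarm explores broadly; a real run drops the misses.
    return [f"material from {s}" for s in sources]

def understand(material: list[str]) -> str:
    # Pattern B: the main thread reads everything and forms the narrative.
    return "narrative built from " + ", ".join(material)

def execute(plan: str, n_workers: int = 3) -> list[str]:
    # Pattern A: agents handle the parallelizable output work.
    return [f"section {i} drafted per plan" for i in range(n_workers)]

plan = understand(discover(["archive", "papers", "codebase"]))
drafts = execute(plan)
```

The shape matters more than the stubs: discovery feeds synthesis, synthesis feeds execution, and only the middle phase runs in the main thread.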
The reverse composition works too. You might start with Pattern A (gather information from multiple sources in parallel), realize the findings are contradictory (shift to Pattern B — load everything and reason about why), then send discovery agents to find evidence you hadn’t considered (Pattern C).
The mistake is staying in one pattern when the work has shifted to a different phase. The agent swarm found interesting material — great, that was Pattern C doing its job. Now stop sending more agents and read what you have. The main thread formed a solid theory — great, that was Pattern B. Now stop loading more context and dispatch agents to execute.
Recognizing the transition point is the skill. The signal is usually a change in what you’re bottlenecked on. If you’re bottlenecked on information, you need more agents (A or C). If you’re bottlenecked on understanding, you need to stop delegating and think (B). If you’re bottlenecked on output volume, you need workers (A). Most sessions that feel stuck are stuck because the bottleneck shifted and the pattern didn’t.
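The bottleneck-to-pattern mapping above is small enough to write down directly. A sketch, with the bottleneck labels taken from the paragraph and everything else invented:

```python
def next_pattern(bottleneck: str) -> str:
    """Map the current bottleneck to the pattern that relieves it."""
    return {
        "information": "A or C (send more agents)",
        "understanding": "B (stop delegating and think)",
        "output": "A (dispatch workers)",
    }.get(bottleneck, "re-diagnose the bottleneck")
```

The default branch is the honest case: if you can't name the bottleneck, the session is probably stuck for that reason, not for lack of agents.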
The Takeaway
The three patterns map to three different relationships between an AI main thread and its agents:
- Pattern A: Agents are workers. Main thread is a manager.
- Pattern B: Agents are advisors. Main thread is the decision-maker.
- Pattern C: Agents are scouts. Main thread is the writer.
Each relationship is appropriate for different work. The throughput pattern maximizes output. The judgment pattern maximizes reasoning quality. The discovery pattern maximizes the chance of finding something unexpected.
The cost of getting this wrong isn’t just wasted tokens. Using Pattern A for a critical architectural decision means distributing reasoning across agents that can’t see each other’s context — exactly the failure mode that leads to cascading mistakes. Using Pattern B for production content generation means bottlenecking everything through a single thread that could have been parallelized. Using Pattern C without following up with manual synthesis means getting competent-but-generic output when the material deserved better.
Match the pattern to the problem. Switch when the work changes phase. And when the stakes are high, do the thinking yourself.