OpenBooklet
Architecture · 8 min read

Multi-Agent Orchestration: Sequential, Parallel, and Hierarchical Patterns Compared

James Okafor · Published March 28, 2026 · Updated March 28, 2026

You've decided a single agent isn't enough. The task is too complex, the tools are too diverse, or the workload benefits from specialization. Now comes the hard question: how should these agents talk to each other?

The three fundamental orchestration patterns --- sequential, parallel, and hierarchical --- each come with distinct trade-offs in latency, cost, debuggability, and flexibility. Choosing wrong doesn't just slow you down. It compounds into the kind of architectural debt that takes months to untangle.

Here's how each pattern works, when to use it, and what breaks when you choose wrong.

Key Takeaways

  • Sequential is simplest but slowest. Parallel is fastest but hardest to aggregate. Hierarchical is most flexible but most expensive.
  • Multi-agent systems use approximately 15x more tokens than single-agent approaches (Anthropic data)
  • LangGraph benchmarks show swarm/handoff marginally outperforms hierarchical supervision across all conditions
  • Microsoft Azure recommends trying a single agent with tools before reaching for multi-agent orchestration
  • The #1 anti-pattern is using multi-agent when a single agent would work. The #2 is adding agents without structured topology.

Short Answer

Which pattern should I use? If tasks have clear linear dependencies, use sequential. If tasks are independent, use parallel. If task decomposition varies per input and you need runtime flexibility, use hierarchical. If agents need to self-route based on specialization, use swarm/handoff. But first --- try a single agent with multiple tools. Most of the time, that's enough.

Multi-agent systems use ~15x more tokens than single-agent approaches. A single agent deploys in weeks; multi-agent takes months. Only escalate to multi-agent when a single agent demonstrably fails due to prompt complexity, tool overload, or security isolation requirements.


Pattern 1: Sequential (Pipeline)

Agents execute in a fixed, linear order. Each agent's output feeds the next. No agent decides who goes next --- the pipeline defines it at design time.

Input → Agent A → Agent B → Agent C → Output

Where It's Used

  • Content pipelines: Research → Write → Edit → Publish
  • RAG systems: Query rewrite → Retrieval → Rerank → Generate → Fact-check
  • Legal document generation (Microsoft Azure): Template selection → Clause customization → Regulatory compliance → Risk assessment
  • Financial advisory (Google Cloud): Data retrieval → Analysis → Recommendation → Execution

Implementation

In CrewAI, sequential is the default:

crew = Crew(agents=[researcher, writer, editor], tasks=tasks, process=Process.sequential)

In AutoGen, sequential uses initiate_chats() with a carryover mechanism that accumulates context across chats:

results = sender.initiate_chats([
    {"recipient": agent_1, "message": "Research this topic", "summary_method": "last_msg"},
    {"recipient": agent_2, "message": "Write based on research", "summary_method": "last_msg"},
])

Trade-offs

| Dimension | Rating |
| --- | --- |
| Latency | Slowest (sum of all agents) |
| Cost | Lowest (fewest total LLM calls) |
| Debuggability | Best (linear trace, step-by-step) |
| Flexibility | Lowest (fixed order, no adaptation) |
| Error handling | Poor (bad output at step 1 poisons everything) |

Add gates between pipeline stages --- simple validation checks that catch bad output before it propagates. "Does the output contain required fields? Is confidence above threshold?" These cost almost nothing and prevent cascading failures.
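A gate can be a few lines of plain Python. This sketch assumes each stage returns a dict; the field names, threshold, and `gate` function itself are illustrative, not part of any framework:

```python
def gate(output: dict, required_fields: list[str], min_confidence: float = 0.7) -> dict:
    """Validate a stage's output before it propagates down the pipeline."""
    missing = [f for f in required_fields if f not in output]
    if missing:
        raise ValueError(f"gate failed: missing fields {missing}")
    confidence = output.get("confidence", 0.0)
    if confidence < min_confidence:
        raise ValueError(f"gate failed: confidence {confidence} below {min_confidence}")
    return output

# Between stages: validate the researcher's output before the writer sees it.
research = {"summary": "three key findings", "confidence": 0.9}
validated = gate(research, required_fields=["summary"])
```

A failed gate can retry the stage, fall back to a cheaper path, or halt the pipeline, which is far cheaper than letting a downstream agent elaborate on garbage.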

When to Use

  • Clear linear dependencies (each step needs the previous output)
  • Draft → review → polish workflows
  • When auditability matters (you need to trace each transformation)
  • When you can tolerate sequential latency

When NOT to Use

  • Stages are independent (use parallel instead)
  • You need backtracking (step 3 discovers step 1 was wrong)
  • A single agent with good prompting could handle all stages

Pattern 2: Parallel (Fan-Out / Fan-In)

Multiple agents work simultaneously on independent tasks. A coordinator splits the work, specialists execute concurrently, and an aggregator merges results.

        ┌→ Agent A ─┐
Input → ├→ Agent B ─┼→ Aggregator → Output
        └→ Agent C ─┘

Two Sub-Patterns

Sectioning: Different subtasks in parallel. Security review + style review + performance review of the same PR, simultaneously.

Voting: Same task, multiple times. Three agents generate solutions, a judge picks the best. Catches individual failures through redundancy.

Where It's Used

  • Code review: Security, style, performance, and accessibility agents reviewing the same PR concurrently
  • Stock analysis (Microsoft Azure): Fundamental, technical, sentiment, and ESG analysis agents evaluating the same ticker
  • Anthropic Research: Lead agent spawns 3-5 subagents for concurrent exploration. Reduced research time by up to 90%.
  • Document analysis: Entity extraction + summarization + classification running simultaneously
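There is no single canonical implementation of fan-out/fan-in; a framework-agnostic sketch of the sectioning sub-pattern with `asyncio.gather`, where `review` stands in for an async LLM call:

```python
import asyncio

async def review(aspect: str, pr_diff: str) -> dict:
    await asyncio.sleep(0)  # stand-in for an async LLM request
    return {"aspect": aspect, "finding": f"{aspect} review of {len(pr_diff)}-char diff"}

async def fan_out_fan_in(pr_diff: str) -> list[dict]:
    aspects = ["security", "style", "performance"]
    # Fan-out: all reviewers run concurrently, so latency ≈ the slowest agent.
    results = await asyncio.gather(*(review(a, pr_diff) for a in aspects))
    # Fan-in: the merged list goes to an aggregator (voting, synthesis, etc.).
    return list(results)

results = asyncio.run(fan_out_fan_in("diff --git a/app.py b/app.py"))
```

In production you would also pass `return_exceptions=True` to `gather` so one reviewer's failure doesn't cancel the others, which is exactly the error-isolation property the table below credits to this pattern.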

Trade-offs

| Dimension | Rating |
| --- | --- |
| Latency | Best (time = slowest agent, not sum) |
| Cost | Highest (all agents run simultaneously) |
| Debuggability | Medium (parallel traces, need correlation IDs) |
| Flexibility | Medium (fixed tasks, but adaptable aggregation) |
| Error handling | Good (one agent's failure doesn't block others) |

The Aggregation Problem

The hardest part of parallel orchestration isn't running agents concurrently --- it's merging their results. When three agents analyze the same stock and two say "buy" while one says "sell with high conviction," the aggregation logic determines the outcome.

Strategies:

  • Majority voting for classification tasks
  • Weighted merging for scored recommendations
  • LLM synthesis when results need narrative reconciliation
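The first two strategies are a few lines each. A sketch with illustrative labels and assumed per-agent weights (the function names are not from any framework):

```python
from collections import Counter

def majority_vote(verdicts: list[str]) -> str:
    """Majority voting for classification outputs; ties go to the first seen."""
    return Counter(verdicts).most_common(1)[0][0]

def weighted_merge(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Merge per-agent scores into one recommendation, weighted by trust."""
    total = sum(weights[a] for a in scores)
    return sum(scores[a] * weights[a] for a in scores) / total

verdict = majority_vote(["buy", "buy", "sell"])  # "buy"
score = weighted_merge({"fundamental": 0.8, "sentiment": 0.4},
                       {"fundamental": 2.0, "sentiment": 1.0})
```

Note that neither handles the hard case in the stock example above: a lone dissenter with high conviction. That usually forces the third strategy, an LLM synthesis pass that reads all three rationales.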

When to Use

  • Tasks are genuinely independent (no agent needs another's output)
  • Latency is the primary constraint
  • Multiple perspectives improve quality (code review, research)
  • You need redundancy for reliability (voting pattern)

When NOT to Use

  • Tasks have sequential dependencies
  • Cost is the primary constraint (parallel multiplies LLM calls)
  • Aggregation logic is too complex or degrades quality

Pattern 3: Hierarchical (Supervisor)

A supervisor agent receives the task, dynamically decomposes it, delegates to specialists, reviews outputs, and synthesizes results. Unlike sequential, the decomposition happens at runtime --- different inputs produce different delegation patterns.

Input → Supervisor ─┬→ Worker A → Supervisor
                    ├→ Worker B → Supervisor → Output
                    └→ Worker C → Supervisor

Where It's Used

  • Anthropic Research: Lead researcher plans strategy, spawns subagents for parallel exploration, synthesizes findings. Outperformed single-agent Claude Opus 4 by 90.2% on Anthropic's internal research eval.
  • Complex coding tasks: Supervisor plans architecture, delegates file-specific changes to workers, reviews and integrates.
  • Enterprise customer service: Tier-1 supervisor routes to specialized teams (billing, technical, sales).

Implementation

In LangGraph:

from langchain_openai import ChatOpenAI
from langgraph_supervisor import create_supervisor

workflow = create_supervisor(
    agents=[research_agent, math_agent, writing_agent],
    model=ChatOpenAI(model="gpt-4o"),
    prompt="Delegate research to research_agent, math to math_agent, writing to writing_agent"
)
app = workflow.compile()

Supports nested hierarchies --- compile a supervisor into an agent, add it to another supervisor:

research_team = create_supervisor([search_agent, analysis_agent], model=model).compile(name="research_team")
top_level = create_supervisor([research_team, writing_team, review_team], model=model)

In CrewAI:

crew = Crew(
    agents=specialists,
    tasks=tasks,
    process=Process.hierarchical,
    manager_llm="gpt-4o"
)

The manager dynamically allocates tasks based on agent capabilities. Tasks are not pre-assigned.

Trade-offs

| Dimension | Rating |
| --- | --- |
| Latency | Medium-High (supervisor adds overhead per delegation) |
| Cost | Highest (supervisor LLM calls + all worker calls) |
| Debuggability | Hardest (non-deterministic delegation decisions) |
| Flexibility | Highest (adapts decomposition per input) |
| Error handling | Good (supervisor can reassign failed tasks) |

The "Translation Tax"

LangGraph benchmarks revealed a critical finding: supervisors that paraphrase between sub-agents and users ("the translation tax") use significantly more tokens. Removing this overhead improved performance by nearly 50%. Let workers respond directly when possible.

When to Use

  • Task decomposition varies per input (you can't predict subtasks at design time)
  • You need quality gates (supervisor reviews before synthesis)
  • Cross-functional tasks requiring diverse specializations
  • Third-party agents where you can't modify behavior (supervisor is the most generic pattern)

When NOT to Use

  • The decomposition is always the same (use sequential or parallel)
  • Agents can coordinate directly without a manager (use swarm)
  • Supervisor overhead isn't justified for the task complexity

Pattern 4: Swarm / Handoff

Agents dynamically transfer control to each other based on specialization. No central supervisor. An agent decides it's not the right one for the current task and hands off to a better-suited agent.

User → Agent A → (realizes this is a billing question) → Agent B → User
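A toy handoff loop (not any framework's API) makes the control transfer concrete. Each agent returns either an answer or the name of a better-suited agent; the agent names and routing rules are illustrative:

```python
# Each agent returns ("answer", text) or ("handoff", next_agent_name).
def triage_agent(message: str):
    if "invoice" in message or "charge" in message:
        return ("handoff", "billing")  # not my specialty — transfer control
    return ("answer", "General support response")

def billing_agent(message: str):
    return ("answer", "Billing looked into your charge")

AGENTS = {"triage": triage_agent, "billing": billing_agent}

def run_swarm(message: str, start: str = "triage", max_hops: int = 5) -> str:
    current = start
    for _ in range(max_hops):
        kind, payload = AGENTS[current](message)
        if kind == "answer":
            return payload  # agent responds directly — no supervisor paraphrase
        current = payload   # hand off to the named specialist
    raise RuntimeError("handoff loop exceeded max_hops")
```

The `max_hops` cap matters: without it, two agents that each believe the other is responsible will ping-pong forever.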

The Benchmark Result

LangGraph's benchmarks found that swarm marginally outperformed supervisor across all conditions. The reason: lower token cost because agents respond directly to users without supervisor paraphrasing. However, swarm requires all agents to know about each other, making it unsuitable for third-party integrations.

When to Use

  • Customer service routing (triage → specialist → resolution)
  • Well-defined agent boundaries where routing logic is clear
  • When you control all agents and can configure their handoff rules

The Decision Framework

| Question | If True | Pattern |
| --- | --- | --- |
| Can a single agent with tools handle this? | Yes | Don't use multi-agent |
| Do tasks have clear linear dependencies? | Yes | Sequential |
| Are tasks independent with no shared dependencies? | Yes | Parallel |
| Does task decomposition change per input? | Yes | Hierarchical |
| Can agents self-route based on specialization? | Yes | Swarm/Handoff |

Microsoft Azure's hierarchy (use the simplest that works):

  1. Direct model call --- no agents
  2. Single agent with tools --- "often the right default for enterprise use cases"
  3. Multi-agent orchestration --- only when single agent fails

Anthropic's data: multi-agent systems use approximately 15x more tokens than single-agent approaches. If your single agent works 80% of the time, optimize the prompt before adding more agents.


The Anti-Patterns

The "Bag of Agents"

Adding agents without structured topology. Errors compound multiplicatively through uncoordinated agents --- not additively. Research documents this as the "17x error trap": five agents with individual 90% accuracy chain together to produce 59% overall accuracy.
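The 59% figure is just chained accuracy, which you can verify in four lines:

```python
accuracies = [0.90] * 5  # five agents, each individually 90% accurate
chained = 1.0
for a in accuracies:
    chained *= a  # errors compound multiplicatively, not additively
print(f"chained accuracy: {chained:.0%}")  # 59%
```

Every agent you append multiplies in another factor below 1.0, which is why unstructured chains degrade so much faster than intuition suggests.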

Over-Orchestrating

A major retailer spent 18 months building a "perfect" multi-agent system that was obsolete on launch. One documented case: $12,000/month in LLM tokens to do what three API calls and a decision tree could handle for $40.

Shared Memory Race Conditions

Multiple agents reading and writing shared state without versioning produces non-deterministic behavior impossible to reproduce. Run the same job twice, get different results. This is a production debugging nightmare.
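One mitigation is optimistic versioning: every write names the version it read, and stale writes are rejected instead of silently clobbering state. A minimal sketch (the class and method names are assumptions, not a library API):

```python
import threading

class VersionedStore:
    """Shared state with optimistic concurrency control for agent writers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None
        self._version = 0

    def read(self):
        with self._lock:
            return self._value, self._version

    def write(self, value, expected_version: int) -> bool:
        with self._lock:
            if expected_version != self._version:
                return False  # another agent wrote first — caller must re-read
            self._value = value
            self._version += 1
            return True
```

An agent whose write returns `False` re-reads and reconciles explicitly. The conflict becomes a visible, loggable event instead of a silent lost update.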

Beyond ~4 Agents, Accuracy Saturates

Without structured topology, accuracy gains plateau or fluctuate beyond approximately four agents. Adding more agents without an orchestrating structure is like adding engineers without a project manager.


FAQ

How do I debug multi-agent systems?

Structured logging with correlation IDs that trace a request through every agent. Log every tool call, every agent response, every delegation decision. Without this, you're debugging a black box. OpenTelemetry with agent-specific spans is the emerging standard.
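A standard-library sketch of the correlation-ID idea: a `contextvars` variable carries the request ID, and a logging filter stamps it onto every record (the names here are illustrative):

```python
import contextvars
import logging
import uuid

# One ContextVar per process; each request context gets its own value.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True  # never drop records, just annotate them

logger = logging.getLogger("agents")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
handler.addFilter(CorrelationFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request(task: str):
    correlation_id.set(uuid.uuid4().hex[:8])  # mint one ID per request
    logger.info("delegating task: %s", task)  # every log line carries the ID
```

Because the variable is context-local, concurrent requests in async code each see their own ID, and grepping the logs for one ID reconstructs a single request's path through every agent.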

Which framework should I start with?

CrewAI for simplicity (least code to get started, most opinionated). LangGraph for flexibility (graph-based, supports all patterns). AutoGen for research and group chat patterns. If unsure, try CrewAI first --- you can always migrate.

Can I combine patterns?

Yes, and most production systems do. Routing at the top level dispatches to different pipelines. Pipelines may include parallel steps. A supervisor might use evaluator-optimizer for quality-critical subtasks. Patterns are composable.

What about the A2A protocol for multi-agent?

A2A is for agent-to-agent communication across organizational boundaries --- agents that don't share a framework or deployment. For agents within a single system, framework-native orchestration (LangGraph, AutoGen, CrewAI) is simpler. See our MCP vs A2A comparison for details.


Key Takeaways

  1. Start with a single agent --- escalate to multi-agent only when it demonstrably fails
  2. Sequential for pipelines, parallel for speed, hierarchical for flexibility --- each pattern has clear use cases
  3. Multi-agent costs ~15x more tokens --- budget accordingly or don't do it
  4. Swarm beats supervisor on benchmarks --- but requires all agents to know about each other
  5. The anti-patterns are more dangerous than the wrong pattern --- unstructured agent topology and shared state race conditions cause the worst production failures

Further reading: The 5 AI Agent Design Patterns Every Architect Must Know | MCP vs A2A vs ACP: The Comparison Guide


About the author

James builds distributed systems and writes about agent architecture, protocol design, and the infrastructure behind modern AI workflows.

James Okafor · Platform Engineer
