Single-agent LLM applications hit a ceiling quickly. Complex real-world tasks (research, synthesis, code generation, QA) need specialised agents that each do one thing well and hand off cleanly. This article walks through the architecture patterns, failure modes, and code you need to ship a production-grade multi-agent system.

Why Multi-Agent Instead of One Big Prompt?

A single LLM context window is a finite resource. Stuffing a 50-page research brief, a JSON schema, business rules, and chain-of-thought instructions into one prompt tends to produce confused or hallucinated output. Agent decomposition addresses this by assigning narrow, well-defined responsibilities to specialist nodes:

  • Retriever Agent: Queries the vector DB, web search, or internal knowledge base. Returns ranked chunks only.
  • Reasoner Agent: Synthesises retrieved context, performs step-by-step reasoning, outputs structured JSON.
  • Verifier Agent: Runs factual checks and confidence scoring, and re-routes to the Retriever if uncertain.
  • Writer Agent: Formats verified facts into the target tone, language, and structure.

The Orchestrator Layer

An orchestrator is the director: it receives the original user task, routes it to the right agent, passes outputs downstream, and decides when the pipeline is done. Two common patterns:

  • Plan-then-Execute (PTE): orchestrator creates a full plan upfront (list of steps), then dispatches each to an agent sequentially or in parallel.
  • ReAct Loop: orchestrator picks one action, gets a result, "thinks" about what to do next, repeats until done.

When to use PTE vs ReAct

Use PTE for deterministic workflows (report generation, code refactoring). Use ReAct for exploratory tasks (research, debugging) where you don't know the steps in advance.
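
To make the difference concrete, here is a minimal sketch of both control flows in plain Python. Everything in it is illustrative: the AGENTS registry, the lambda "agents", and simple_policy (a stand-in for the LLM's "think" step) are assumptions, not part of any framework.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agent registry: each agent is a plain function from str to str.
Agent = Callable[[str], str]

AGENTS: Dict[str, Agent] = {
    "retrieve": lambda task: f"chunks for '{task}'",
    "reason": lambda ctx: f"draft from [{ctx}]",
    "write": lambda draft: f"report: {draft}",
}


def plan_then_execute(task: str, plan: List[str]) -> str:
    """PTE: the full list of steps is fixed before any agent runs."""
    result = task
    for step in plan:
        result = AGENTS[step](result)
    return result


def react_loop(
    task: str,
    choose: Callable[[str, List[str]], str],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """ReAct: pick one action, observe the result, then decide again."""
    result: str = task
    history: List[str] = []
    for _ in range(max_steps):
        action = choose(result, history)
        if action == "done":
            break
        result = AGENTS[action](result)
        history.append(action)
    return result, history


def simple_policy(observation: str, history: List[str]) -> str:
    """Stand-in for the LLM's decision: retrieve once, reason once, stop."""
    if "retrieve" not in history:
        return "retrieve"
    if "reason" not in history:
        return "reason"
    return "done"


pte_output = plan_then_execute("q3 churn", ["retrieve", "reason", "write"])
react_output, react_trace = react_loop("q3 churn", simple_policy)
```

In the PTE version the plan is data you can log and review before anything runs. In the ReAct version the sequence only exists after the fact (react_trace), which is why it needs a max_steps cap.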

Implementation with LangGraph

LangGraph models agent workflows as directed state graphs, making the flow auditable and easy to debug.

Python
from typing import List, TypedDict

from langgraph.graph import StateGraph, END

# vector_store, llm, build_prompt and confidence_check are assumed to be
# defined elsewhere in your application.


class AgentState(TypedDict):
    task: str
    context: List[str]
    draft: str
    verified: bool


def retriever(state: AgentState) -> AgentState:
    # Pull relevant chunks from vector store
    chunks = vector_store.similarity_search(state["task"], k=6)
    state["context"] = [c.page_content for c in chunks]
    return state


def reasoner(state: AgentState) -> AgentState:
    # Build a prompt from the task plus retrieved context and draft an answer
    prompt = build_prompt(state["task"], state["context"])
    state["draft"] = llm.invoke(prompt).content
    return state


def verifier(state: AgentState) -> AgentState:
    # Accept the draft only if it is well supported by the context
    score = confidence_check(state["draft"], state["context"])
    state["verified"] = score > 0.75
    return state


def route(state: AgentState) -> str:
    # Finish when verified, otherwise loop back for another retrieval pass
    return "END" if state["verified"] else "retriever"


graph = StateGraph(AgentState)
graph.add_node("retriever", retriever)
graph.add_node("reasoner", reasoner)
graph.add_node("verifier", verifier)

graph.set_entry_point("retriever")
graph.add_edge("retriever", "reasoner")
graph.add_edge("reasoner", "verifier")
graph.add_conditional_edges("verifier", route, {"END": END, "retriever": "retriever"})

app = graph.compile()
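
Note that route() above sends the graph back to the retriever every time verification fails, so a draft that never clears the threshold would cycle indefinitely. One way to bound that, sketched here in plain Python so it runs without LangGraph installed, is to carry a pass counter in the state. The iterations field, MAX_ITERATIONS, count_pass and guarded_route are all hypothetical names introduced for this example.

```python
from typing import List, TypedDict

MAX_ITERATIONS = 3  # assumed budget; tune per workload


class GuardedState(TypedDict):
    task: str
    context: List[str]
    draft: str
    verified: bool
    iterations: int  # hypothetical extra field: completed verifier passes


def count_pass(state: GuardedState) -> GuardedState:
    """Call at the end of the verifier node to record one more pass."""
    return {**state, "iterations": state["iterations"] + 1}


def guarded_route(state: GuardedState) -> str:
    """Same contract as route(), but gives up once the budget is spent."""
    if state["verified"] or state["iterations"] >= MAX_ITERATIONS:
        return "END"
    return "retriever"


# Simulate a draft that never verifies and record where each pass is routed.
state: GuardedState = {
    "task": "demo",
    "context": [],
    "draft": "",
    "verified": False,
    "iterations": 0,
}
hops: List[str] = []
while True:
    state = count_pass(state)
    next_node = guarded_route(state)
    hops.append(next_node)
    if next_node == "END":
        break
```

With a budget of three, the simulated run loops back twice and then exits. LangGraph also accepts a recursion_limit in the config passed to invoke, which works as a coarser backstop; an explicit counter lets you decide what to return when the budget runs out.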

Top Pitfalls & How to Avoid Them

  • Infinite loops: Always set a max_iterations guard in your orchestrator.
  • Context drift: Don't pass the full raw context between every agent; summarise it at each handoff.
  • Hallucinated tool calls: Use strict JSON schemas with Pydantic for every agent output.
  • Token cost explosion: Profile token usage per node; cache retriever results aggressively.
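
For the token cost point, a rough sketch of per-node accounting plus a retriever cache, using only the standard library. The retriever body, the whitespace-based approx_tokens estimate, and the node name are placeholders; a real system would call its own vector store and use the model's tokenizer.

```python
from collections import Counter
from functools import lru_cache
from typing import Tuple

token_usage: Counter = Counter()  # node name -> approximate tokens passed downstream
backend_calls = 0  # counts real (uncached) retrievals


def approx_tokens(text: str) -> int:
    """Crude whitespace count; swap in your model's tokenizer for real numbers."""
    return len(text.split())


@lru_cache(maxsize=256)
def cached_retrieve(query: str) -> Tuple[str, ...]:
    """Hypothetical retriever; returns a tuple so cached results cannot be mutated."""
    global backend_calls
    backend_calls += 1
    return (f"chunk about {query}", f"second chunk about {query}")


def retriever_node(query: str) -> Tuple[str, ...]:
    """Fetch chunks (cached) and record how many tokens they add to the pipeline."""
    chunks = cached_retrieve(query)
    token_usage["retriever"] += sum(approx_tokens(c) for c in chunks)
    return chunks


retriever_node("pricing policy")
retriever_node("pricing policy")  # identical query: served from the cache
```

The second call never reaches the backend, but its chunks still count toward token_usage because the downstream agent pays for them either way. Keeping the two numbers separate shows whether cost comes from retrieval calls or from context size.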

Ecosystem at a Glance

Tool                  | Best For                          | Maturity
LangGraph             | Stateful, cyclical agent graphs   | Stable
CrewAI                | High-level role-based crews       | Stable
AutoGen               | Microsoft's conversational agents | Beta
LlamaIndex Workflows  | Data-heavy document agents        | Stable

Quick Win

Start with two agents (Retriever + Writer) before adding verification layers. Complexity should be earned by actual reliability requirements.
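
As a starting point, a two-agent pipeline can be as small as the sketch below. It assumes an in-memory CORPUS, naive keyword-overlap ranking, and a stub_llm that stands in for a real model call; all three are placeholders chosen so the example runs offline.

```python
from typing import Callable, List

# Hypothetical in-memory corpus standing in for a vector DB or search API.
CORPUS = [
    "LangGraph models workflows as state graphs.",
    "CrewAI organises agents into role-based crews.",
    "Caching retriever results reduces token cost.",
]


def retriever_agent(task: str, k: int = 2) -> List[str]:
    """Rank documents by naive keyword overlap and return the top k."""
    words = set(task.lower().split())

    def overlap(doc: str) -> int:
        return len(words & set(doc.lower().rstrip(".").split()))

    return sorted(CORPUS, key=overlap, reverse=True)[:k]


def stub_llm(prompt: str) -> str:
    """Stub model so the sketch runs offline; replace with a real client call."""
    facts = [line for line in prompt.splitlines() if line.startswith("- ")]
    return f"Answer grounded in {len(facts)} retrieved facts."


def writer_agent(task: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Build a prompt from the task and retrieved facts, then ask the model."""
    prompt = f"Task: {task}\nFacts:\n" + "\n".join(f"- {c}" for c in chunks)
    return llm(prompt)


def run_pipeline(task: str, llm: Callable[[str], str] = stub_llm) -> str:
    """Retriever hands ranked chunks straight to the Writer; no verifier yet."""
    chunks = retriever_agent(task)
    return writer_agent(task, chunks, llm)


top_hit = retriever_agent("how does langgraph model workflows", k=1)
answer = run_pipeline("how does langgraph model workflows")
```

Because the model is passed in as a plain callable, a verifier can later be slotted between the two agents without rewriting either of them.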