Single-agent LLM applications hit a ceiling quickly. Complex real-world tasks that combine research, synthesis, code generation, and QA need specialised agents that each do one thing well and hand off cleanly. This article walks through the architecture patterns, failure modes, and code you need to ship a production-grade multi-agent system.
Why Multi-Agent Instead of One Big Prompt?
A single LLM context window is a finite resource. Stuffing a 50-page research brief, a JSON schema, business rules, and chain-of-thought instructions into one prompt tends to produce confused, hallucinated output. Agent decomposition addresses this by assigning narrow, well-defined responsibilities to specialist nodes.
The Orchestrator Layer
An orchestrator is the director—it receives the original user task, routes it to the right agent, passes outputs downstream, and decides when the pipeline is done. Two common patterns:
- Plan-then-Execute (PTE): orchestrator creates a full plan upfront (list of steps), then dispatches each to an agent sequentially or in parallel.
- ReAct Loop: orchestrator picks one action, gets a result, "thinks" about what to do next, repeats until done.
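The two patterns differ mainly in when decisions are made. The sketch below contrasts them in plain Python, with no framework involved. All names here (`plan_then_execute`, `react_loop`, `planner`, `decide`, and the toy `agents`) are illustrative assumptions, and the lambdas stand in for real LLM-backed agents.

```python
from typing import Callable, Dict, List, Optional, Tuple

Agent = Callable[[str], str]
Step = Tuple[str, str]  # (agent_name, instruction)


def plan_then_execute(
    task: str,
    planner: Callable[[str], List[Step]],
    agents: Dict[str, Agent],
) -> List[str]:
    """PTE: build the whole plan first, then dispatch each step in order."""
    plan = planner(task)
    return [agents[name](instruction) for name, instruction in plan]


def react_loop(
    task: str,
    decide: Callable[[str, List[str]], Optional[Step]],
    agents: Dict[str, Agent],
    max_iterations: int = 5,
) -> List[str]:
    """ReAct: choose one action, observe the result, then decide again."""
    observations: List[str] = []
    for _ in range(max_iterations):  # hard cap so the loop always terminates
        action = decide(task, observations)
        if action is None:  # the orchestrator judges the task done
            break
        name, instruction = action
        observations.append(agents[name](instruction))
    return observations


# Toy stand-ins for real agents (hypothetical).
agents: Dict[str, Agent] = {
    "research": lambda s: f"notes on {s}",
    "write": lambda s: f"draft from {s}",
}


def planner(task: str) -> List[Step]:
    # The full plan is fixed before any agent runs.
    return [("research", task), ("write", task)]


def decide(task: str, observations: List[str]) -> Optional[Step]:
    # The next step depends on what has been observed so far.
    if not observations:
        return ("research", task)
    if len(observations) == 1:
        return ("write", observations[-1])
    return None
```

In the PTE version the writer receives the original task, because the plan was fixed before any research happened. In the ReAct version the writer receives the research output, because each decision can see earlier observations.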
When to use PTE vs ReAct
Use PTE for deterministic workflows (report generation, code refactoring). Use ReAct for exploratory tasks (research, debugging) where you don't know the steps in advance.
Implementation with LangGraph
LangGraph models agent workflows as directed state graphs, making the flow auditable and easy to debug.
```python
from typing import List, TypedDict

from langgraph.graph import StateGraph, END

# NOTE: vector_store, llm, build_prompt and confidence_check are not defined in
# this snippet; they are assumed to be set up elsewhere in your application.


class AgentState(TypedDict):
    task: str
    context: List[str]
    draft: str
    verified: bool


def retriever(state: AgentState) -> AgentState:
    # Pull relevant chunks from vector store
    chunks = vector_store.similarity_search(state["task"], k=6)
    state["context"] = [c.page_content for c in chunks]
    return state


def reasoner(state: AgentState) -> AgentState:
    # Draft an answer from the task plus the retrieved context
    prompt = build_prompt(state["task"], state["context"])
    state["draft"] = llm.invoke(prompt).content
    return state


def verifier(state: AgentState) -> AgentState:
    # Mark the draft as verified only if it clears the confidence threshold
    score = confidence_check(state["draft"], state["context"])
    state["verified"] = score > 0.75
    return state


def route(state: AgentState) -> str:
    # Finish when verified; otherwise loop back and retrieve again
    return "END" if state["verified"] else "retriever"


graph = StateGraph(AgentState)
graph.add_node("retriever", retriever)
graph.add_node("reasoner", reasoner)
graph.add_node("verifier", verifier)

graph.set_entry_point("retriever")
graph.add_edge("retriever", "reasoner")
graph.add_edge("reasoner", "verifier")
graph.add_conditional_edges("verifier", route, {"END": END, "retriever": "retriever"})

app = graph.compile()
```
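The graph above loops from the verifier back to the retriever for as long as verification fails. One way to bound that loop is to count attempts in the state and let `route` stop once a budget is spent. The following is a minimal sketch of that idea, not part of the original graph: the `attempts` field, `MAX_ATTEMPTS`, and `make_verifier` are hypothetical names, and `score_fn` stands in for `confidence_check`.

```python
from typing import Callable, List, TypedDict

MAX_ATTEMPTS = 3  # assumed budget; tune it for your workload


class GuardedState(TypedDict):
    task: str
    context: List[str]
    draft: str
    verified: bool
    attempts: int  # extra field: verification passes completed so far


def make_verifier(
    score_fn: Callable[[str, List[str]], float],
    threshold: float = 0.75,
) -> Callable[[GuardedState], GuardedState]:
    """Build a verifier node that also counts how often it has run."""

    def verifier(state: GuardedState) -> GuardedState:
        score = score_fn(state["draft"], state["context"])
        state["verified"] = score > threshold
        state["attempts"] = state.get("attempts", 0) + 1
        return state

    return verifier


def route(state: GuardedState) -> str:
    """Stop on success, or once the attempt budget is exhausted."""
    if state["verified"] or state["attempts"] >= MAX_ATTEMPTS:
        return "END"
    return "retriever"


def simulate(score: float) -> GuardedState:
    """Drive verifier + route by hand with a fixed score, without LangGraph."""
    verifier = make_verifier(lambda draft, context: score)
    state: GuardedState = {
        "task": "demo", "context": [], "draft": "text",
        "verified": False, "attempts": 0,
    }
    while True:
        state = verifier(state)
        if route(state) == "END":
            return state
```

To wire this into the graph, you would register `make_verifier(confidence_check)` as the "verifier" node and keep the same conditional-edge mapping. When the budget runs out the draft is returned unverified, so callers should check `verified` before trusting it. LangGraph also applies a per-invocation recursion limit, configurable through the `recursion_limit` key of the config passed to `invoke`, which can act as a second line of defence.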
Top Pitfalls & How to Avoid Them
- Infinite loops: Always set a `max_iterations` guard in your orchestrator.
- Context drift: Don't pass the raw full context between every agent; summarise it.
- Hallucinated tool calls: Use strict JSON schemas with Pydantic for every agent output.
- Token cost explosion: Profile token usage per node; cache retriever results aggressively.
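As an illustration of the schema advice in the list above, the sketch below validates an agent's raw JSON output before anything acts on it. It assumes Pydantic is installed. The `ToolCall` model, its fields, the allowed tool names, and `parse_tool_call` are hypothetical examples rather than part of any framework.

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class ToolCall(BaseModel):
    """Hypothetical schema for a single tool invocation emitted by an agent."""

    tool: Literal["search", "summarise"]  # any other tool name is rejected
    query: str = Field(min_length=1)
    max_results: int = Field(default=5, ge=1, le=20)


def parse_tool_call(raw: str) -> Optional[ToolCall]:
    """Return a validated ToolCall, or None if the output is malformed."""
    try:
        return ToolCall(**json.loads(raw))
    except (json.JSONDecodeError, TypeError, ValidationError):
        # Bad JSON, a non-object payload, or a schema violation
        return None


good = parse_tool_call('{"tool": "search", "query": "LangGraph cycles"}')
bad = parse_tool_call('{"tool": "delete_database", "query": "everything"}')
```

Here `good` is a populated `ToolCall` and `bad` is `None`, because `delete_database` is not in the allowed set. Returning `None` lets the orchestrator decide whether to retry the agent or fail the step, instead of executing a call the model invented.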
Ecosystem at a Glance
| Tool | Best For | Maturity |
|---|---|---|
| LangGraph | Stateful, cyclical agent graphs | Stable |
| CrewAI | High-level role-based crews | Stable |
| AutoGen | Microsoft's conversational agents | Beta |
| LlamaIndex Workflows | Data-heavy document agents | Stable |
Quick Win
Start with two agents (Retriever + Writer) before adding verification layers. Complexity should be earned by actual reliability requirements.
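A two-agent starting point can be as small as two functions composed in sequence. The sketch below shows that shape in plain Python. The in-memory `DOCS` list, `keyword_retriever`, and `template_writer` are toy assumptions standing in for a vector store and an LLM call.

```python
from typing import Callable, List

Retriever = Callable[[str], List[str]]
Writer = Callable[[str, List[str]], str]

# Toy corpus standing in for a real vector store (hypothetical data).
DOCS = [
    "LangGraph models workflows as state graphs.",
    "CrewAI organises agents into role-based crews.",
    "Caching retriever results reduces token cost.",
]


def keyword_retriever(task: str) -> List[str]:
    """Return documents that share at least one word with the task."""
    words = {w.lower() for w in task.split()}
    return [
        doc for doc in DOCS
        if words & {w.strip(".").lower() for w in doc.split()}
    ]


def template_writer(task: str, context: List[str]) -> str:
    """Stand-in for an LLM call: stitch task and context into a draft."""
    sources = " ".join(context) if context else "(no context found)"
    return f"Task: {task}\nDraft based on: {sources}"


def run_pipeline(
    task: str,
    retrieve: Retriever = keyword_retriever,
    write: Writer = template_writer,
) -> str:
    """Retriever then Writer: the simplest useful hand-off."""
    return write(task, retrieve(task))
```

Because both agents are passed in as plain callables, a verifier can later be added as a third step without changing the first two.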