Guide
LangGraph fundamentals explained
Linear LLM chains break the moment an agent needs to loop, branch, pause for human approval, or resume after a crash. LangGraph is LangChain’s library for modeling agent workflows as graphs: nodes are functions (call a model, run a tool, validate output), edges define control flow (including cycles back to earlier nodes), and a typed state object accumulates results across steps. Unlike one-shot LangChain LCEL pipelines, LangGraph graphs can run until a termination condition, persist checkpoints to disk or Postgres, and interrupt mid-run for operator review — patterns that multi-agent orchestration at production scale depends on. This guide covers StateGraph construction, state reducers, conditional routing, tool-calling loops, checkpoint savers, human-in-the-loop interrupts, subgraph composition, a Harbor Logistics shipment agent worked example, a framework decision table, common pitfalls, and a practitioner checklist.
Why graphs instead of chains
A chain is a straight-line DAG: step A, then B, then C, done. Agents are rarely that simple. A support bot may classify intent, retrieve policy, call a refund API, discover insufficient permissions, ask a human, then resume. That is a cycle with external input, which is impossible to express cleanly as a fixed linear Runnable without re-implementing a state machine by hand.
LangGraph makes the state machine explicit. You define:
- State: a schema (usually `TypedDict`) holding messages, tool results, flags, and domain fields.
- Nodes: Python callables that receive state and return partial state updates.
- Edges: fixed transitions, or conditional edges that route based on state (e.g. “more tools needed” vs “respond to user”).
The runtime compiles your graph, executes nodes, merges updates into state via reducers, and optionally writes a checkpoint after each super-step so runs survive process restarts and support time-travel debugging in LangSmith.
State schema and reducers
State is the single source of truth for a thread (conversation or job). LangGraph encourages immutable-style updates: each node returns a patch, not a full replacement.
Annotated reducers
List fields like `messages` typically use `Annotated[list, add_messages]` so new messages append rather than overwrite (a message whose ID matches an existing one replaces it). Custom reducers merge dicts or sum counters, while fields without a reducer keep the latest value; define merge semantics once in the schema instead of inside every node.
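```python
from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    # add_messages appends new messages instead of overwriting the list
    messages: Annotated[list, add_messages]
    # No reducer: each update replaces the previous value
    shipment_id: str | None
    needs_human: bool
```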
Keep state small. Store message history for the model, but put large blobs (PDF bytes, full API payloads) behind references (S3 keys, row IDs). Bloated checkpoints slow persistence and inflate LangSmith trace storage.
Thread IDs
Every invocation of a checkpointed graph carries a `thread_id` (and an optional `checkpoint_id` for branching). Map thread IDs to your user session, ticket ID, or job queue message so concurrent runs do not share state. Never reuse thread IDs across tenants.
Building a StateGraph
Core API (Python): create `StateGraph(AgentState)`, add nodes, wire edges, and compile with a checkpointer.
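```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END

# classify_intent, call_model_with_tools, run_tool_node, and should_continue
# are application functions defined elsewhere.
memory_saver = MemorySaver()  # in-memory checkpointer, for development only

builder = StateGraph(AgentState)
builder.add_node("classify", classify_intent)
builder.add_node("agent", call_model_with_tools)
builder.add_node("tools", run_tool_node)

builder.add_edge(START, "classify")
builder.add_edge("classify", "agent")
# Route to "tools" when the model requested tool calls, otherwise finish
builder.add_conditional_edges(
    "agent", should_continue, {"tools": "tools", END: END}
)
builder.add_edge("tools", "agent")  # loop back after tools run

graph = builder.compile(checkpointer=memory_saver)
```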
`should_continue` inspects the last message: if the model emitted tool calls, route to `tools`; otherwise finish. This is the standard ReAct loop encoded as graph topology instead of a `while` loop in application code.
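A router like `should_continue` can be a few lines of plain Python. The sketch below assumes the last message exposes a `tool_calls` list, as LangChain AI messages do; `FakeAIMessage` is a hypothetical stand-in so the example runs without a model, and `END` is inlined with the same string value as `langgraph.graph.END`.

```python
from dataclasses import dataclass, field

END = "__end__"  # same value as langgraph.graph.END, inlined so the sketch runs standalone


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for a chat-model message with optional tool calls."""
    content: str
    tool_calls: list = field(default_factory=list)


def should_continue(state: dict) -> str:
    """Route to the tools node if the last message requested tools, else finish."""
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return END


wants_tool = {"messages": [FakeAIMessage("", [{"name": "track_shipment", "args": {}}])]}
final_answer = {"messages": [FakeAIMessage("Your parcel arrives Friday.")]}
print(should_continue(wants_tool), should_continue(final_answer))
```

The return values match the keys of the mapping passed to `add_conditional_edges`, so the router never needs to know node objects, only names.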
Prebuilt agents
`create_react_agent(model, tools, checkpointer=...)` ships a tested graph for tool-calling agents. Start here for MVPs; drop to custom graphs when you need extra nodes (validation, RAG retrieval, human gates) that prebuilts do not expose.
Conditional edges and routing
Conditional edges are functions `(state) -> str` returning the name of the next node (or `END`). Use them for:
- Tool loops — model requested tools vs final answer.
- Intent branching — route billing vs shipping subgraphs after classification.
- Quality gates — if RAG retrieval score is below threshold, re-query or escalate.
- Retry limits — after N failed tool attempts, fall back to human handoff.
Keep routing logic deterministic where possible. Let the LLM decide inside nodes; let Python functions decide graph topology based on structured fields (`intent`, `error_code`, `retry_count`). Mixing free-form model text into routing invites flaky loops.
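The use cases above can share one deterministic router that reads only structured fields. This is a sketch under assumptions: the thresholds, the `retrieval_score` field, and the node names (`requery`, `human_handoff`, `billing`, `shipping`) are hypothetical and would come from your own graph.

```python
MAX_RETRIES = 3            # assumed retry budget
MIN_RETRIEVAL_SCORE = 0.7  # assumed quality threshold

INTENT_ROUTES = {"billing": "billing", "shipping": "shipping"}


def route_after_retrieval(state: dict) -> str:
    """Pick the next node from structured fields only, never from model prose."""
    # Retry limit: stop looping and hand off to a human
    if state.get("retry_count", 0) >= MAX_RETRIES:
        return "human_handoff"
    # Quality gate: weak retrieval triggers a re-query
    if state.get("retrieval_score", 0.0) < MIN_RETRIEVAL_SCORE:
        return "requery"
    # Intent branching: unknown intents escalate instead of guessing
    return INTENT_ROUTES.get(state.get("intent"), "human_handoff")


print(route_after_retrieval({"intent": "billing", "retrieval_score": 0.9, "retry_count": 0}))
```

Since the same state always yields the same route, a flaky loop can be reproduced from a checkpoint rather than from a model transcript.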
Checkpointing and persistence
Checkpointers serialize state after each super-step. Implementations include in-memory (`MemorySaver` for dev), SQLite, Postgres (`langgraph-checkpoint-postgres`), and Redis. Production agents almost always use durable storage.
What checkpoints enable
- Resume — worker crashes mid-run; next worker loads latest checkpoint and continues.
- Human-in-the-loop — graph pauses at an interrupt; operator edits state or approves; run resumes from same checkpoint.
- Time travel — fork from an earlier checkpoint to explore alternate tool paths (debugging and eval).
- Audit — replay exactly what the agent knew at each step for compliance review.
Configure retention: checkpoints grow with message length and step count. Expire old threads with a TTL or archive them to cold storage, and align retention with GDPR deletion requests when state contains PII.
Human-in-the-loop interrupts
`interrupt_before` and `interrupt_after`, passed to `compile()` as lists of node names, pause execution before or after those nodes run. Common pattern: interrupt before a node that charges a card or sends an external email. The API returns `{"__interrupt__": ...}`; your UI collects approval; you apply any operator edits with `graph.update_state(config, values)` and then call `graph.invoke(None, config)` to continue from the saved checkpoint.
Pair interrupts with guardrails: automated checks can set `needs_human=True` in state, routing to an approval node via a conditional edge instead of relying solely on static interrupt lists.
Subgraphs and multi-agent composition
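A compiled graph can be added as a node inside a parent graph, which is useful for department-specific workflows (billing subgraph, logistics subgraph) with isolated tool sets. When parent and child share state keys, the compiled subgraph can be added directly as a node; when their schemas differ, a wrapper node function maps parent state into the child's input and maps the child's output back, where it merges through the parent's reducers.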
This differs from ad-hoc multi-agent message passing: subgraph boundaries are enforced by the compiler, and checkpoints nest so you can inspect sub-runs in LangSmith. Avoid graphs deeper than two levels without strong justification — debug complexity rises quickly.
For retrieval-heavy agents, embed an agentic RAG subgraph: retrieve, grade, optionally re-query, then hand condensed context to the main reasoning node.
Streaming, observability, and deployment
`graph.stream(..., stream_mode="updates")` yields per-node state patches, ideal for SSE to a browser showing “Checking inventory…” steps. `stream_mode="messages"` token-streams model output during agent nodes.
LangSmith traces graph runs when `LANGCHAIN_TRACING_V2=true` and an API key is configured. Tag runs with `thread_id`, deployment version, and model ID. Set per-graph token budgets and a step limit (`recursion_limit`, passed in the run config) to prevent runaway loops from draining budget.
Deploy compiled graphs behind a queue worker or FastAPI endpoint. Pass `config={"configurable": {"thread_id": ticket_id}}` on every call. Use idempotent tool implementations so resumed runs do not double-charge.
Worked example: Harbor Logistics shipment exception agent
Harbor Logistics processes 3,400 daily shipment exceptions (delays, customs holds, address fixes). A linear LangChain chain could not loop on carrier APIs or pause for customs broker approval. The team rebuilt with LangGraph.
Graph topology
- `parse_ticket`: extracts tracking number, exception code, customer tier.
- `fetch_status`: calls carrier API; writes structured `carrier_state`.
- `policy_rag`: retrieves SLA and compensation rules; sets `eligible_credit`.
- `propose_action`: the model chooses reroute, credit, or escalate; may emit tool calls.
- `tools`: executes carrier reroute or billing credit with idempotency keys.
- `human_gate`: interrupt before credits above $200 or international customs holds.
- `compose_reply`: drafts customer email from final state; no tools.
Routing rules
Conditional edge after `propose_action`: tool calls → `tools`; `needs_human` → `human_gate`; else → `compose_reply`. After `tools`, always return to `propose_action` with updated `carrier_state` (max three tool rounds, then force escalation). A Postgres checkpointer stores state per `thread_id=ticket_id`.
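The routing rules above might be written as a single function. This is a sketch, not Harbor's code: the state fields `pending_tool_calls` and `tool_rounds` are hypothetical, and the source does not say which rule takes precedence, so this version assumes `needs_human` is checked first to keep a flagged credit from reaching `tools`.

```python
MAX_TOOL_ROUNDS = 3  # from the routing rules: three tool rounds, then escalate


def route_after_propose(state: dict) -> str:
    """Choose the node that follows propose_action."""
    # Assumed precedence: a human flag always wins over pending tool calls
    if state.get("needs_human"):
        return "human_gate"
    if state.get("pending_tool_calls"):
        # Force escalation once the tool-round budget is spent
        if state.get("tool_rounds", 0) >= MAX_TOOL_ROUNDS:
            return "human_gate"
        return "tools"
    return "compose_reply"


print(route_after_propose({"pending_tool_calls": [{"name": "reroute"}], "tool_rounds": 1}))
```

The `tools` node would increment `tool_rounds` on each pass, which is what lets the router enforce the three-round cap.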
Results
Auto-resolution rose from 38% to 61% for tier-1 customers; average handle time dropped 4.2 minutes; erroneous high-value credits fell to zero after human_gate shipped. Failed carrier API calls resume from checkpoint instead of restarting classification — saving roughly 1,800 redundant LLM calls per day.
Framework decision table
| Choose LangGraph when… | Stick with LCEL chains when… | Consider alternatives when… |
|---|---|---|
| Agent loops, branches, or cycles are required | Fixed extract-transform-load pipeline, no cycles | Team bans LangChain deps; raw SDK + hand-rolled FSM is smaller |
| Durable checkpoints and resume matter | Stateless one-shot completions | Temporal/Cadence already orchestrates long workflows |
| Human approval mid-run is common | No interrupts or multi-day pauses | MCP tool servers replace framework; graph is thin glue only |
| Multiple specialized subgraphs share a parent | Single model call with structured output | CrewAI-style role YAML is team standard and sufficient |
| LangSmith graph visualization is valuable | Prototype under 50 lines of Python | Eval shows prebuilt create_react_agent already meets requirements |
Common pitfalls
- Unbounded message state — full chat history in every checkpoint; summarize or trim before context limits break runs.
- LLM-driven routing — parsing free-text “next step” for conditional edges causes non-deterministic loops.
- No recursion_limit — tool/model cycles run until API budget exhaustion.
- Shared thread IDs — concurrent users bleed state; always scope by tenant plus session.
- Non-idempotent tools on resume — replaying a checkpoint re-executes nodes unless tools dedupe by idempotency key.
- MemorySaver in production — process restart loses all in-flight agent work.
- God-state objects — one giant TypedDict with 40 fields; split domain state and message state, document ownership per node.
- Skipping interrupt UX — interrupts without a dashboard strand runs forever in paused state.
Practitioner checklist
- Define state schema with explicit reducers for list and dict fields.
- Map `thread_id` to your domain ID; enforce tenant isolation in checkpointer keys.
- Use conditional edges with structured state fields, not raw model prose.
- Set `recursion_limit` and per-run token budgets.
- Deploy Postgres (or equivalent) checkpointer for production; test crash-resume paths.
- Mark side-effecting nodes with interrupts or automated policy gates before execution.
- Implement idempotent tools; log idempotency keys in checkpoint metadata.
- Stream node updates to the UI for perceived latency wins on long runs.
- Enable LangSmith tracing; tag graph version and model IDs per deployment.
- Document subgraph boundaries and which tools each subgraph may call.
Key takeaways
- LangGraph models agent workflows as graphs with explicit state, nodes, and edges — including cycles.
- Reducers define how partial node outputs merge into shared state.
- Checkpoints enable resume, human-in-the-loop, and audit after failures or approvals.
- Conditional edges should route on structured state, not ambiguous model text.
- Subgraphs compose multi-agent systems with enforced boundaries and nested tracing.
Related reading
- LangChain fundamentals explained — LCEL, chains, and when to add LangGraph on top
- Multi-agent orchestration explained — topologies, handoffs, and cost control across specialists
- LLM function calling explained — tool schemas and the call loop inside agent nodes
- Agentic RAG explained — iterative retrieval subgraphs for knowledge-heavy agents