Guide

LangGraph fundamentals explained

Linear LLM chains break the moment an agent needs to loop, branch, pause for human approval, or resume after a crash. LangGraph is LangChain’s library for modeling agent workflows as graphs: nodes are functions (call a model, run a tool, validate output), edges define control flow (including cycles back to earlier nodes), and a typed state object accumulates results across steps. Unlike one-shot LangChain LCEL pipelines, LangGraph graphs can run until a termination condition, persist checkpoints to disk or Postgres, and interrupt mid-run for operator review — patterns that multi-agent orchestration at production scale depends on. This guide covers StateGraph construction, state reducers, conditional routing, tool-calling loops, checkpoint savers, human-in-the-loop interrupts, subgraph composition, a Harbor Logistics shipment agent worked example, a framework decision table, common pitfalls, and a practitioner checklist.

Why graphs instead of chains

A chain is a directed acyclic graph (DAG), usually a straight line: step A, then B, then C, done. Agents are rarely that simple. A support bot may classify intent, retrieve policy, call a refund API, discover insufficient permissions, ask a human, then resume. That is a cycle with external input, which a fixed linear Runnable cannot express cleanly without re-implementing a state machine by hand.

LangGraph makes the state machine explicit. You define:

State — a schema (usually TypedDict) holding messages, tool results, flags, and domain fields.
Nodes — Python callables that receive state and return partial state updates.
Edges — fixed transitions, or conditional edges that route based on state (e.g. “more tools needed” vs “respond to user”).

The runtime compiles your graph, executes nodes, merges updates into state via reducers, and, when a checkpointer is configured, writes a checkpoint after each super-step (one round of node executions) so runs survive process restarts and support time-travel debugging in LangSmith.

State schema and reducers

State is the single source of truth for a thread (conversation or job). LangGraph encourages immutable-style updates: each node returns a patch, not a full replacement.

Annotated reducers

List fields like messages typically use Annotated[list, add_messages] so new messages append rather than overwrite. Custom reducers merge dicts, sum counters, or keep the latest value — define merge semantics once in the schema instead of inside every node.

from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    shipment_id: str | None
    needs_human: bool
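
To make the merge semantics concrete, here is a minimal sketch of how reducer-annotated fields could be combined with a node's patch. It uses only the standard library: apply_patch is an illustrative stand-in for the runtime's merge step, not LangGraph's implementation, and DemoState, merge_dicts, and the sample field values are assumptions made for this example.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


def merge_dicts(old: dict, new: dict) -> dict:
    """Custom reducer: shallow merge, keys from the newer patch win."""
    return {**old, **new}


class DemoState(TypedDict):
    events: Annotated[list, operator.add]        # append new items
    retry_count: Annotated[int, operator.add]    # sum counters
    carrier_state: Annotated[dict, merge_dicts]  # merge dicts
    shipment_id: str                             # no reducer: latest value wins


def apply_patch(schema: type, state: dict, patch: dict) -> dict:
    """Illustrative merge step: use a field's reducer if it declares one."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in patch.items():
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {
    "events": ["parsed"],
    "retry_count": 1,
    "carrier_state": {"eta": "Mon"},
    "shipment_id": "S-1",
}
patch = {
    "events": ["fetched"],
    "retry_count": 1,
    "carrier_state": {"hold": True},
    "shipment_id": "S-2",
}
state = apply_patch(DemoState, state, patch)
# events is appended, retry_count is summed, carrier_state is merged,
# and shipment_id is overwritten because it declares no reducer.
```

The point of the design is that nodes only return what changed; how that change combines with existing state is decided once, in the schema.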

Keep state small. Store message history for the model, but put large blobs (PDF bytes, full API payloads) behind references (S3 keys, row IDs). Bloated checkpoints slow persistence and inflate LangSmith trace storage.

Thread IDs

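When a checkpointer is attached, every invocation identifies its thread by passing a thread_id under config["configurable"] (plus an optional checkpoint_id to branch from an earlier checkpoint). Map thread IDs to your user session, ticket ID, or job queue message so concurrent runs do not share state. Never reuse thread IDs across tenants.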
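
One way to keep tenants apart is to build the config in a single helper rather than formatting thread IDs at each call site. The sketch below is plain Python; the thread_config name and the "tenant:session" format are assumptions for illustration, and any collision-free scheme would do.

```python
def thread_config(tenant_id: str, session_id: str) -> dict:
    """Build a per-call config whose thread_id is scoped by tenant and session."""
    if not tenant_id or not session_id:
        raise ValueError("tenant_id and session_id are both required")
    if ":" in tenant_id:
        # The separator must not appear in the tenant part, or IDs could collide.
        raise ValueError("tenant_id must not contain ':'")
    # Hypothetical "tenant:session" format chosen for this sketch.
    return {"configurable": {"thread_id": f"{tenant_id}:{session_id}"}}


config = thread_config("acme", "ticket-4821")
# The same config would then accompany every call for this thread, e.g.
# graph.invoke({"messages": [...]}, config)
```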

Building a StateGraph

Core API (Python): StateGraph(AgentState), add nodes, wire edges, compile with a checkpointer.

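from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END

# classify_intent, call_model_with_tools, run_tool_node, and should_continue
# are node and router callables defined elsewhere in the application.
builder = StateGraph(AgentState)
builder.add_node("classify", classify_intent)
builder.add_node("agent", call_model_with_tools)
builder.add_node("tools", run_tool_node)

builder.add_edge(START, "classify")
builder.add_edge("classify", "agent")
# Route to the tool node or finish, depending on what should_continue returns.
builder.add_conditional_edges(
    "agent", should_continue, {"tools": "tools", END: END}
)
builder.add_edge("tools", "agent")  # loop back so the model sees tool results

memory_saver = MemorySaver()  # in-memory checkpointer, for development only
graph = builder.compile(checkpointer=memory_saver)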

should_continue inspects the last message: if the model emitted tool calls, route to tools; otherwise finish. This is the standard ReAct loop encoded as graph topology instead of a while loop in application code.
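
A router of this kind can be only a few lines. The following is a sketch under stated assumptions, not the guide's actual function: END is redefined locally with the same string value as langgraph.graph.END so the snippet runs standalone, and SimpleNamespace objects stand in for real message objects, which expose requested calls as a tool_calls list.

```python
from types import SimpleNamespace

# Same string value as langgraph.graph.END, redefined so the sketch has no imports.
END = "__end__"


def should_continue(state: dict) -> str:
    """Route to the tool node if the last message requests tools, else finish."""
    last_message = state["messages"][-1]
    # An empty or missing tool_calls list means the model gave a final answer.
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return END


# Stand-ins for real message objects, for demonstration only.
wants_tool = SimpleNamespace(
    content="", tool_calls=[{"name": "track", "args": {}, "id": "call-1"}]
)
final_answer = SimpleNamespace(content="Your parcel arrives Monday.", tool_calls=[])

next_after_tool_request = should_continue({"messages": [wants_tool]})
next_after_final_answer = should_continue({"messages": [final_answer]})
```

The returned strings are the keys of the mapping passed to add_conditional_edges, so the router never needs to know which node objects sit behind them.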

Prebuilt agents

create_react_agent(model, tools, checkpointer=...) ships a tested graph for tool-calling agents. Start here for MVPs; drop to custom graphs when you need extra nodes (validation, RAG retrieval, human gates) that prebuilts do not expose.

Conditional edges and routing

A conditional edge takes a routing function, typically (state) -> str, that returns the name of the next node (or END). Use them for:

Tool loops — model requested tools vs final answer.
Intent branching — route billing vs shipping subgraphs after classification.
Quality gates — if RAG retrieval score is below threshold, re-query or escalate.
Retry limits — after N failed tool attempts, fall back to human handoff.

Keep routing logic deterministic where possible. Let the LLM decide inside nodes; let Python functions decide graph topology based on structured fields (intent, error_code, retry_count). Mixing free-form model text into routing invites flaky loops.
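
As an illustration of routing on structured fields only, the sketch below combines a retry limit, a retrieval quality gate, and intent branching in one pure function. The node names (human_handoff, requery, billing, shipping), the field names, and both thresholds are hypothetical values chosen for this example.

```python
from typing import TypedDict

MAX_RETRIES = 3            # assumed cap on failed attempts
MIN_RETRIEVAL_SCORE = 0.6  # assumed quality threshold


class RoutingState(TypedDict):
    intent: str             # written by a classifier node, e.g. "billing"
    retrieval_score: float  # written by a retrieval-grading node
    retry_count: int        # incremented by nodes that fail and retry


def route_after_retrieval(state: RoutingState) -> str:
    """Pick the next node from structured fields, never from model prose."""
    if state["retry_count"] >= MAX_RETRIES:
        return "human_handoff"  # retry limit reached
    if state["retrieval_score"] < MIN_RETRIEVAL_SCORE:
        return "requery"        # quality gate failed
    if state["intent"] == "billing":
        return "billing"
    return "shipping"
```

Because the function is deterministic and has no side effects, it can be unit-tested without a model or a compiled graph.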

Checkpointing and persistence

Checkpointers serialize state after each super-step. Implementations include in-memory (MemorySaver for dev), SQLite, Postgres (langgraph-checkpoint-postgres), and Redis. Production agents almost always use durable storage.

What checkpoints enable

Resume — worker crashes mid-run; next worker loads latest checkpoint and continues.
Human-in-the-loop — graph pauses at an interrupt; operator edits state or approves; run resumes from same checkpoint.
Time travel — fork from an earlier checkpoint to explore alternate tool paths (debugging and eval).
Audit — replay exactly what the agent knew at each step for compliance review.

Configure retention: checkpoints grow with message length and step count. TTL old threads or archive to cold storage; align with GDPR deletion when state contains PII.
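
The selection half of a retention job can be sketched independently of the storage backend. In the example below, the last_activity mapping is a hypothetical input (for instance, the result of a query against your checkpoint store); how the selected threads are then deleted or archived depends on the checkpointer you use.

```python
from datetime import datetime, timedelta, timezone


def expired_threads(last_activity: dict, ttl: timedelta, now: datetime) -> list:
    """Return thread IDs whose most recent checkpoint is older than the TTL."""
    cutoff = now - ttl
    return sorted(tid for tid, seen in last_activity.items() if seen < cutoff)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
last_activity = {
    "acme:ticket-1": datetime(2024, 12, 1, tzinfo=timezone.utc),  # 61 days old
    "acme:ticket-2": datetime(2025, 1, 30, tzinfo=timezone.utc),  # 1 day old
}
stale = expired_threads(last_activity, ttl=timedelta(days=30), now=now)
```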

Human-in-the-loop interrupts

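Passing interrupt_before or interrupt_after (lists of node names) when compiling the graph pauses execution before or after those nodes run; both require a checkpointer. Common pattern: interrupt before a node that charges a card or sends an external email. The paused call returns control to the caller (depending on version and stream mode, the pause is surfaced as an "__interrupt__" entry, and graph.get_state(config).next names the pending node); your UI collects approval; you apply any edits with graph.update_state(config, patch) and then call graph.invoke(None, config) to continue from the saved checkpoint.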

Pair interrupts with guardrails: automated checks can set needs_human=True in state, routing to an approval node via conditional edge instead of relying solely on static interrupt lists.
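
A guardrail of this kind can be an ordinary node that returns a needs_human patch, followed by a router that reads the flag. This is a sketch, assuming a proposed_action field with type and amount_usd keys, a 200-dollar threshold, and node names approval and execute; none of these come from a real schema.

```python
APPROVAL_THRESHOLD_USD = 200.0  # assumed policy limit


def policy_check(state: dict) -> dict:
    """Node: inspect the proposed action and return a patch setting needs_human."""
    action = state.get("proposed_action") or {}
    is_large_credit = (
        action.get("type") == "credit"
        and action.get("amount_usd", 0) > APPROVAL_THRESHOLD_USD
    )
    return {"needs_human": is_large_credit}


def route_after_policy(state: dict) -> str:
    """Router for a conditional edge: approval node when flagged, else execute."""
    return "approval" if state["needs_human"] else "execute"


large = policy_check({"proposed_action": {"type": "credit", "amount_usd": 350}})
small = policy_check({"proposed_action": {"type": "credit", "amount_usd": 50}})
```

Keeping the check in its own node means the decision is written to state and therefore visible in checkpoints and traces.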

Subgraphs and multi-agent composition

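A compiled graph can be added as a node inside a parent graph, which is useful for department-specific workflows (billing subgraph, logistics subgraph) with isolated tool sets. When parent and child share state keys, the compiled subgraph can be passed to add_node directly and its output merges back through the parent's reducers; when the schemas differ, wrap the subgraph call in a node function that maps parent state to child input and child output to a parent state patch.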

This differs from ad-hoc multi-agent message passing: subgraph boundaries are enforced by the compiler, and checkpoints nest so you can inspect sub-runs in LangSmith. Avoid graphs deeper than two levels without strong justification — debug complexity rises quickly.

For retrieval-heavy agents, embed an agentic RAG subgraph: retrieve, grade, optionally re-query, then hand condensed context to the main reasoning node.

Streaming, observability, and deployment

graph.stream(..., stream_mode="updates") yields per-node state patches — ideal for SSE to a browser showing “Checking inventory…” steps. stream_mode="messages" token-streams model output during agent nodes.

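LangSmith traces graph runs when tracing is enabled through environment variables (LANGCHAIN_TRACING_V2=true, or LANGSMITH_TRACING=true in newer releases, plus an API key). Tag runs with thread_id, deployment version, and model ID. Set per-graph token budgets and a step limit (the recursion_limit config key) to prevent runaway loops from draining budget.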

Deploy compiled graphs behind a queue worker or FastAPI endpoint. Pass config={"configurable": {"thread_id": ticket_id}} on every call. Use idempotent tool implementations so resumed runs do not double-charge.

Worked example: Harbor Logistics shipment exception agent

Harbor Logistics processes 3,400 daily shipment exceptions (delays, customs holds, address fixes). A linear LangChain chain could not loop on carrier APIs or pause for customs broker approval. The team rebuilt with LangGraph.

Graph topology

parse_ticket — extracts tracking number, exception code, customer tier.
fetch_status — calls carrier API; writes structured carrier_state.
policy_rag — retrieves SLA and compensation rules; sets eligible_credit.
propose_action — model chooses: reroute, credit, or escalate; may emit tool calls.
tools — executes carrier reroute or billing credit with idempotency keys.
human_gate — interrupt before credits above $200 or international customs holds.
compose_reply — drafts customer email from final state; no tools.

Routing rules

Conditional edge after propose_action: tool calls → tools; needs_human → human_gate; else → compose_reply. After tools, always return to propose_action with updated carrier_state (max three tool rounds, then force escalation). Postgres checkpointer stores state per thread_id=ticket_id.
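
These rules could be expressed as a single routing function along the following lines. This is a sketch rather than Harbor's code: the pending_tool_calls and tool_rounds field names are hypothetical, checking needs_human before tool calls is an assumed precedence (the rules above do not state one), and forced escalation is assumed to mean routing to human_gate.

```python
MAX_TOOL_ROUNDS = 3  # "max three tool rounds" from the routing rules


def route_after_propose(state: dict) -> str:
    """Choose the node that follows propose_action."""
    if state.get("needs_human"):
        return "human_gate"
    if state.get("pending_tool_calls"):
        if state.get("tool_rounds", 0) >= MAX_TOOL_ROUNDS:
            return "human_gate"  # cap reached: force escalation
        return "tools"
    return "compose_reply"
```

For the cap to work, the tools node has to increment tool_rounds in the patch it returns, for example through a summing reducer on that field.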

Results

Auto-resolution rose from 38% to 61% for tier-1 customers; average handle time dropped by 4.2 minutes; erroneous high-value credits fell to zero after human_gate shipped. Failed carrier API calls resume from the latest checkpoint instead of restarting classification, saving roughly 1,800 redundant LLM calls per day.

Framework decision table

Choose LangGraph when…	Stick with LCEL chains when…	Consider alternatives when…
Agent loops, branches, or cycles are required	Fixed extract-transform-load pipeline, no cycles	Team bans LangChain deps; raw SDK + hand-rolled FSM is smaller
Durable checkpoints and resume matter	Stateless one-shot completions	Temporal/Cadence already orchestrates long workflows
Human approval mid-run is common	No interrupts or multi-day pauses	MCP tool servers replace framework; graph is thin glue only
Multiple specialized subgraphs share a parent	Single model call with structured output	CrewAI-style role YAML is team standard and sufficient
LangSmith graph visualization is valuable	Prototype under 50 lines of Python	Eval shows prebuilt create_react_agent already meets requirements

Common pitfalls

Unbounded message state — full chat history in every checkpoint; summarize or trim before context limits break runs.
LLM-driven routing — parsing free-text “next step” for conditional edges causes non-deterministic loops.
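Unchecked recursion_limit — raising the step limit without a token budget lets tool/model cycles run until the API budget is exhausted; leaving it at the library default can instead cut off legitimately long runs with a recursion error.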
Shared thread IDs — concurrent users bleed state; always scope by tenant plus session.
Non-idempotent tools on resume — replaying a checkpoint re-executes nodes unless tools dedupe by idempotency key.
MemorySaver in production — process restart loses all in-flight agent work.
God-state objects — one giant TypedDict with 40 fields; split domain state and message state, document ownership per node.
Skipping interrupt UX — interrupts without a dashboard strand runs forever in paused state.

Practitioner checklist

Define state schema with explicit reducers for list and dict fields.
Map thread_id to your domain ID; enforce tenant isolation in checkpointer keys.
Use conditional edges with structured state fields, not raw model prose.
Set recursion_limit and per-run token budgets.
Deploy Postgres (or equivalent) checkpointer for production; test crash-resume paths.
Mark side-effecting nodes with interrupts or automated policy gates before execution.
Implement idempotent tools; log idempotency keys in checkpoint metadata.
Stream node updates to the UI for perceived latency wins on long runs.
Enable LangSmith tracing; tag graph version and model IDs per deployment.
Document subgraph boundaries and which tools each subgraph may call.

Key takeaways

LangGraph models agent workflows as graphs with explicit state, nodes, and edges — including cycles.
Reducers define how partial node outputs merge into shared state.
Checkpoints enable resume, human-in-the-loop, and audit after failures or approvals.
Conditional edges should route on structured state, not ambiguous model text.
Subgraphs compose multi-agent systems with enforced boundaries and nested tracing.