Guide

Celery fundamentals explained

A customer clicks “Place order” on your storefront. The HTTP handler must return in under 200 ms, but fulfillment needs to charge the card, reserve inventory, generate a shipping label, and send email — steps that can fail independently and take seconds each. Celery is the de facto distributed task queue for Python: your web app enqueues a message to a broker, separate worker processes pull jobs and execute them, and results land in a backend store if you need them. Unlike FastAPI background tasks (which die when the process restarts) or raw RabbitMQ consumers you wire by hand, Celery gives you retries, routing, periodic schedules, and workflow primitives out of the box. This guide covers architecture and brokers, defining and calling tasks, worker concurrency and prefetch, acknowledgments and idempotency, Celery Beat for cron jobs, canvas chains and chords, monitoring with Flower, a Harbor Commerce order-fulfillment worked example, a task-queue decision table, common pitfalls, and a production checklist alongside our Python fundamentals guide and Redis fundamentals guide.

What Celery is (and what it is not)

Celery is a task execution framework, not a message broker. It sits between your application code and a broker such as Redis or RabbitMQ, serializing Python function calls into messages workers deserialize and run. The moving parts:

  • Producer — your Django, Flask, or FastAPI app calls send_email.delay(order_id).
  • Broker — holds the queue until a worker is ready (Redis list, RabbitMQ queue).
  • Worker — long-running process that imports your task module and executes jobs.
  • Result backend (optional) — stores return values so callers can poll AsyncResult.
  • Beat scheduler (optional) — separate process that enqueues periodic tasks on a cron-like timetable.

Celery is not a replacement for Kafka event streaming (no durable replay log for analytics), nor for synchronous gRPC RPC. It excels at fire-and-forget side effects, competing consumers on a work queue, and scheduled maintenance — thumbnail generation, report exports, webhook retries, nightly reconciliation.

Installation and minimal app

Install Celery with a broker client. Redis is the simplest starting point for small and medium deployments:

pip install "celery[redis]"

# tasks.py
from celery import Celery

app = Celery("harbor", broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task(bind=True, max_retries=3, default_retry_delay=60)
def charge_and_fulfill(self, order_id: str):
    try:
        payment = charge_card(order_id)
        create_label(order_id, payment)
    except TransientPaymentError as exc:
        raise self.retry(exc=exc)

# Start a worker
# celery -A tasks worker --loglevel=info --concurrency=4

The @app.task decorator registers the function name and module path in the message so any worker with the same code can execute it. Keep task modules importable without side effects — workers import them at startup.

Broker choice

  • Redis — fast, simple ops, good for queues under ~100k msgs/day; use separate DB indexes for broker vs result backend.
  • RabbitMQ — stronger AMQP routing, publisher confirms, quorum queues; better when you already run Rabbit for other services.
  • Amazon SQS — managed, pay-per-message; higher latency but zero broker patching.

Avoid using the result backend for high-volume fire-and-forget tasks — writing every return value to Redis adds write amplification you rarely read.

Calling tasks: delay, apply_async, and routing

task.delay(*args, **kwargs) is shorthand for apply_async. Use the full form when you need control:

charge_and_fulfill.apply_async(
    args=["ord_9f2a"],
    queue="payments",
    countdown=30,          # delay 30 seconds
    expires=3600,          # drop if not started within 1 hour
    priority=5,
)

Define multiple queues in config and route heavy jobs away from quick notifications:

app.conf.task_routes = {
    "tasks.charge_and_fulfill": {"queue": "payments"},
    "tasks.send_receipt_email": {"queue": "email"},
}

Run specialized workers per queue: celery -A tasks worker -Q payments --concurrency=2 so a stuck PDF render cannot starve payment retries.

Workers, concurrency, and prefetch

A Celery worker process can run tasks with different concurrency models:

  • prefork (default) — forked child processes; best for CPU-bound Python; sidesteps the GIL.
  • threads — lighter for I/O-bound HTTP calls; watch thread safety of shared clients.
  • gevent / eventlet — greenlets for massive I/O concurrency; requires monkey-patching and compatible libraries.
  • solo — one task at a time; useful for debugging.

Prefetch controls how many messages a worker reserves before finishing current jobs. High prefetch on long tasks causes unfair distribution — one worker hoards messages while others sit idle. Rule of thumb: worker_prefetch_multiplier=1 for tasks longer than a few seconds; default 4 is fine for sub-second email sends.

Scale horizontally by adding worker containers, not inflating --concurrency on a single machine until CPU is saturated. Pair with Docker Compose or Kubernetes Deployments where each pod runs one worker command.

Acknowledgments, retries, and idempotency

Celery acknowledges broker messages when a task finishes (or fails permanently). If a worker is killed mid-task, the message can be redelivered — tasks must be idempotent or guarded with deduplication keys.

  • acks_late=True — ack after success, not at start; safer but requires idempotency on crash.
  • reject_on_worker_lost=True — requeue if worker disappears.
  • max_retries + self.retry(exc=...) — exponential backoff for transient failures.
  • task_time_limit / task_soft_time_limit — kill runaway jobs.

Store a processing_status column or Redis SETNX lock keyed by order_id before charging cards. On retry, short-circuit if already charged. Failed tasks after max retries should land in a dead-letter queue or alert channel — Celery’s built-in task_reject_on_failure and broker DLX wiring vary by backend; many teams log to Sentry and move poison messages manually.

Celery Beat: periodic and cron schedules

Celery Beat is a scheduler process that enqueues tasks on an interval or crontab. Run exactly one Beat instance per schedule namespace (or use a distributed lock) to avoid duplicate cron fires:

from celery.schedules import crontab

app.conf.beat_schedule = {
    "nightly-inventory-reconcile": {
        "task": "tasks.reconcile_inventory",
        "schedule": crontab(hour=2, minute=15),
    },
    "poll-carrier-tracking": {
        "task": "tasks.refresh_shipment_status",
        "schedule": 300.0,  # every 5 minutes
    },
}

Beat stores last-run times in a local shelve file by default — ephemeral containers lose state on restart and may re-fire tasks. Production setups use django-celery-beat (database scheduler) or Redis-backed redbeat so schedule state survives deploys.

Canvas: chains, groups, and chords

Celery’s canvas primitives compose multi-step workflows:

  • chain — sequential: (validate.s(order_id) | charge.s() | ship.s())()
  • group — parallel fan-out: group(send_email.s(id) for id in ids)
  • chord — parallel then callback: chord(group(...), finalize.s())

Canvas results require the result backend. Chords are notoriously fragile on Redis (lost callback headers under broker restarts) — prefer RabbitMQ or orchestrate sagas explicitly in code with status rows for mission-critical pipelines. For most order flows, a single task with internal steps and structured logging is simpler than a deep chain.

Monitoring, configuration, and deployment

Operate Celery like any production service:

  • Flower — web UI for worker heartbeats, active tasks, revoke; protect with auth in production.
  • Structured logs — include task_id and business keys in every log line; correlate with HTTP request IDs.
  • Metrics — export queue depth, task runtime histograms, and failure rates to Prometheus.
  • Health checks — workers respond to celery inspect ping; orchestrators should restart unresponsive pods.
  • Config — centralize in celeryconfig.py or env vars; never hardcode broker URLs.

Deploy workers independently of web processes. Rolling out new task code requires draining or versioning: old workers cannot deserialize new task signatures. Blue/green worker pools or feature flags inside tasks reduce mixed-version crashes during deploys.

Harbor Commerce order fulfillment: worked example

Harbor Commerce’s checkout API returns 202 Accepted with an order ID while Celery handles side effects. June 2026 architecture (illustrative):

  1. POST /orders — FastAPI validates cart, writes orders row as pending, enqueues fulfill_order.delay(order_id), returns JSON immediately.
  2. Broker — Redis DB 0; separate payments and notifications queues with routed tasks.
  3. fulfill_order — idempotent lock on order_id; calls Stripe; on success updates row to paid, enqueues create_label and send_receipt in parallel via group.
  4. Retries — payment task max_retries=5 with exponential backoff; inventory release on permanent failure.
  5. Beatreconcile_stuck_orders every 15 minutes finds pending older than 1 hour and alerts ops.
  6. Observability — OpenTelemetry trace context injected into task headers; Flower read-only behind SSO.
  7. Verdict — p95 checkout latency 85 ms; fulfillment p95 4.2 s async; zero duplicate charges after idempotency keys shipped in Q1.

The pattern decouples user-perceived speed from partner API slowness — the same reason teams pair Celery with transactional outbox when the enqueue must survive database commit.

Task-queue decision table

NeedBest fitWhy
Python background jobs with retries and cronCelery + Redis/RabbitMQMature ecosystem, routing, Beat, canvas workflows.
Minimal Redis-only queue, tiny codebaseRQ (Redis Queue)Simpler than Celery; fewer features, easier onboarding.
Alternative Python worker with sane defaultsDramatiqSmaller API surface; good middleware story.
Work inside one FastAPI process, best-effortFastAPI BackgroundTasksNo broker ops; tasks lost on restart; fine for non-critical hooks.
Polyglot consumers, complex routingRabbitMQ directly or via CeleryAMQP exchanges; Celery adds Python ergonomics on top.
High-volume event log, replay, stream processingKafkaDurable partitioned log; not a task queue replacement.
Serverless, no workers to patchSQS + LambdaOps-free at cost of cold starts and vendor coupling.
Scheduled k8s batch without app codeCronJobShell scripts and ETL; not for per-user request fan-out.

Common pitfalls

  • Passing non-serializable objects — ORM models and open file handles break JSON serialization; pass IDs only.
  • Import cycles — tasks defined in views.py that import views cause worker startup failures; isolate tasks module.
  • Running Beat twice — duplicate cron enqueues; use one Beat or distributed scheduler lock.
  • Ignoring idempotency — late acks and retries double-charge customers without dedup keys.
  • Monolithic concurrency — one queue for 30s PDF jobs and 200 ms emails starves the fast path.
  • Result backend on everything — Redis fills with stale AsyncResult blobs nobody polls.
  • Chords on Redis in production — lost chord callbacks under restarts; use RabbitMQ or avoid chords.
  • Deploying code before workers — web enqueues new task names old workers cannot find; deploy workers first or use versioned task names.

Production checklist

  • Pin Celery and broker client versions; test upgrades in staging with real queue depth.
  • Separate broker and result backend Redis databases (or disable results for fire-and-forget).
  • Configure task_routes for at least payments vs notifications vs heavy batch.
  • Set acks_late=True with idempotent tasks for anything financial.
  • Define task_time_limit and max_retries on every external API call.
  • Run Beat with persistent schedule store (database or redbeat), not default shelve.
  • Export queue depth and task failure metrics; alert on DLQ growth.
  • Protect Flower or replace with CLI inspect + Grafana dashboards.
  • Document worker deploy order relative to web deploys.
  • Load-test broker under 2× peak enqueue rate before Black Friday.

Key takeaways

  • Celery executes Python functions asynchronously via a broker — it is not the broker itself.
  • Workers scale horizontally; tune concurrency model and prefetch per task duration.
  • Retries and late acks demand idempotent task bodies and deduplication keys.
  • Celery Beat handles cron; persist schedule state in production.
  • Choose RQ or BackgroundTasks for trivial queues; Kafka when you need event replay, not Celery.

Related reading