Guide

SQLAlchemy fundamentals explained

Your Python API needs to persist orders, users, and inventory in PostgreSQL without hand-assembling SQL strings on every route. SQLAlchemy is the de facto Python database toolkit: a low-level Core for explicit SQL expression construction, plus a high-level ORM that maps Python classes to tables and rows to objects. Unlike Django’s bundled ORM (which only works inside Django), SQLAlchemy plugs into Flask, FastAPI, data pipelines, and CLI scripts alike. Version 2.0 unified the API around select() and session.execute(), dropping the legacy session.query() style most tutorials still show. This guide covers Engine and pooling, declarative models, the 2.0 query API, relationships and loading strategies, session transactions, Alembic migrations, async patterns, a Harbor Commerce inventory API worked example, an ORM decision table, common pitfalls, and a production checklist alongside our Python fundamentals guide.

What SQLAlchemy is (Core vs ORM)

SQLAlchemy has two layers you can mix freely:

Core — Table, Column, and composable SQL expressions (select, insert, update) that compile to parameterized SQL for any supported dialect.
ORM — DeclarativeBase mapped classes, relationship() for joins, and the Session identity map that tracks object changes and flushes them as INSERT/UPDATE statements.

The Engine is the entry point: it holds a connection pool to your database URL and hands out connections. A typical sync stack looks like:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine(
    "postgresql+psycopg://user:pass@localhost/harbor",
    pool_size=10,
    max_overflow=20,
    pool_pre_ping=True,
)
SessionLocal = sessionmaker(bind=engine)

SQLAlchemy is not a migration tool (use Alembic for that), not an admin UI, and not a replacement for understanding SQL — complex reports and window functions often belong in Core or raw SQL even when the rest of the app uses the ORM.

Declarative models

SQLAlchemy 2.0 style models inherit from DeclarativeBase and use mapped_column() with Python type annotations:

from datetime import datetime
from sqlalchemy import String, ForeignKey, func
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship

class Base(DeclarativeBase):
    pass

class Product(Base):
    __tablename__ = "products"

    id: Mapped[int] = mapped_column(primary_key=True)
    sku: Mapped[str] = mapped_column(String(32), unique=True, index=True)
    name: Mapped[str] = mapped_column(String(200))
    price_cents: Mapped[int]
    created_at: Mapped[datetime] = mapped_column(server_default=func.now())

    variants: Mapped[list["Variant"]] = relationship(back_populates="product")

class Variant(Base):
    __tablename__ = "variants"

    id: Mapped[int] = mapped_column(primary_key=True)
    product_id: Mapped[int] = mapped_column(ForeignKey("products.id"))
    size: Mapped[str] = mapped_column(String(16))
    stock_qty: Mapped[int] = mapped_column(default=0)

    product: Mapped["Product"] = relationship(back_populates="variants")

Key modeling choices:

Explicit types — Mapped[str] maps to VARCHAR; use Optional for nullable columns.
Indexes and constraints — declare unique=True, index=True, and ForeignKey on columns, not only in migrations.
Server defaults — server_default=func.now() lets the database set timestamps; avoid Python-side defaults for clock consistency across app servers.
Table naming — pick a convention (snake_case plural) and stick to it; Alembic autogenerate depends on stable model definitions.

SQLAlchemy 2.0 query API

The modern pattern uses select() with session.scalars() or session.execute():

from sqlalchemy import select
from sqlalchemy.orm import selectinload

with SessionLocal() as session:
    stmt = (
        select(Product)
        .where(Product.sku == "HBR-MUG-01")
        .options(selectinload(Product.variants))
    )
    product = session.scalars(stmt).one_or_none()

    low_stock = session.scalars(
        select(Variant).where(Variant.stock_qty < 5)
    ).all()

Compared to the legacy session.query(Product).filter_by(...) API, 2.0 queries compose like SQLAlchemy Core expressions — you can reuse where clauses, subqueries, and CTEs across ORM and raw SQL layers. For inserts and updates:

from sqlalchemy import insert, update

session.execute(
    insert(Variant).values(product_id=1, size="L", stock_qty=50)
)
session.execute(
    update(Variant)
    .where(Variant.id == 42)
    .values(stock_qty=Variant.stock_qty - 1)
)
session.commit()

ORM shortcut: mutate mapped objects and let the session track changes:

variant = session.get(Variant, 42)
variant.stock_qty -= 1
session.commit()

Relationships and loading strategies

relationship() defines how ORM classes join. Without a loading strategy, accessing product.variants triggers a lazy SELECT per product — the classic N+1 query problem. SQLAlchemy offers:

selectinload — second query with WHERE id IN (...); best default for collections.
joinedload — single query with LEFT OUTER JOIN; good for many-to-one parent fetches.
subqueryload — older pattern; prefer selectinload in 2.0 apps.
raise_on_sql (testing) — fail tests that accidentally lazy-load.

Many-to-many uses an association table:

from sqlalchemy import Table, Column, Integer, ForeignKey

order_items = Table(
    "order_items",
    Base.metadata,
    Column("order_id", ForeignKey("orders.id"), primary_key=True),
    Column("variant_id", ForeignKey("variants.id"), primary_key=True),
    Column("qty", Integer, nullable=False),
)

For read-heavy list endpoints, consider Core select() with explicit joins instead of deep ORM graphs — fewer surprises in query plans. Pair with database indexing on foreign keys and filter columns.

Session lifecycle and transactions

The Session is a unit-of-work scope: it tracks dirty objects, orders flushes, and wraps transactions. Rules that prevent production bugs:

One session per request — create at request start, close in a finally block or dependency-injection teardown.
Explicit commit/rollback — call session.commit() only after all related writes succeed; rollback() on any exception.
expire_on_commit — default True refreshes objects after commit; set False only if you understand stale attribute risks.
Nested transactions — use session.begin_nested() savepoints for partial rollback inside a larger transaction.

FastAPI dependency pattern:

def get_db():
    db = SessionLocal()
    try:
        yield db
        db.commit()
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()

Long-running Celery tasks need their own session — never share a session across threads or async tasks. Pass primary keys, not ORM instances, into background workers.

Alembic migrations

Alembic is SQLAlchemy’s companion migration tool. It versions schema changes as Python revision files and tracks applied revisions in an alembic_version table:

alembic init migrations
alembic revision --autogenerate -m "add products table"
alembic upgrade head

Autogenerate compares Base.metadata to the live database — it catches new tables and columns but misses renames (it sees drop + add), check constraint tweaks, and some index changes. Review every autogenerated script before merging. Production workflow:

Generate migrations in CI or locally against a schema snapshot, not production.
Run alembic upgrade head in deploy pipeline before traffic shifts.
Keep migrations backward-compatible for blue/green deploys when possible (add column nullable first, backfill, then enforce NOT NULL).
Never edit applied revision files; add a new revision to fix mistakes.

Connection pooling and async SQLAlchemy

The Engine’s pool amortizes TCP and auth handshake cost. Tune for your deployment:

pool_size + max_overflow — cap total connections below PostgreSQL max_connections divided by app instances.
pool_pre_ping — test connections before checkout; essential behind PgBouncer or after idle timeouts.
PgBouncer transaction mode — disable server-side prepared statements (prepare_threshold=None in psycopg3) when pooling externally.

Async SQLAlchemy uses create_async_engine with asyncpg or psycopg async drivers and async_sessionmaker. The ORM API is the same select() style inside async with session blocks. Do not mix sync sessions inside async route handlers without run_in_executor — you will block the event loop.

Worked example: Harbor Commerce inventory API

Harbor Commerce runs a Flask/FastAPI hybrid: Flask serves the admin UI; FastAPI exposes a JSON inventory API backed by SQLAlchemy on PostgreSQL. Requirements: reserve stock atomically when checkout starts, release on payment timeout, and never oversell limited-edition mugs.

Schema

products, variants, and stock_reservations with a partial unique index on active reservations per variant. Foreign keys use ON DELETE CASCADE for variants when products archive.

Reserve flow

def reserve_stock(session, variant_id: int, qty: int, order_id: str):
    variant = session.execute(
        select(Variant).where(Variant.id == variant_id).with_for_update()
    ).scalar_one()
    if variant.stock_qty < qty:
        raise InsufficientStock()
    variant.stock_qty -= qty
    session.add(StockReservation(
        variant_id=variant_id, order_id=order_id, qty=qty, expires_at=utcnow() + timedelta(minutes=15)
    ))
    session.commit()

with_for_update() row-locks the variant during the transaction so concurrent checkouts cannot pass the stock check simultaneously. A Celery Beat task runs every minute to delete expired reservations and restore stock_qty.

List endpoint

The catalog API uses selectinload(Product.variants) in one query batch instead of 200 lazy loads for a 50-product page. Response caching sits in Redis with a 60-second TTL; cache keys invalidate on successful commit() via SQLAlchemy after_commit event listeners.

ORM decision table

Need	Prefer	Why
Python web app, any framework	SQLAlchemy	Framework-agnostic; Core + ORM flexibility
Django-only monolith with admin	Django ORM	Integrated migrations, admin, QuerySet ergonomics
TypeScript Next.js API	Prisma or Drizzle	Generated types; see our Prisma guide
Complex analytics / ETL	SQLAlchemy Core or raw SQL	Window functions, bulk COPY, no identity map overhead
Embedded local-first app	SQLite + SQLAlchemy	Same ORM code path; swap URL for Postgres later
Serverless edge functions	HTTP SQL proxy or Drizzle	Connection pooling is hard; avoid long-lived Engine per invocation

Common pitfalls

Implicit lazy loads in loops — hundreds of queries per request; use selectinload or explicit joins.
Detached instances — passing ORM objects outside session scope after close triggers DetachedInstanceError; pass IDs or DTOs.
Missing transactions on multi-step writes — partial updates without rollback corrupt inventory state.
Autogenerate blind trust — Alembic drops columns you still need in production.
Unicode and timezone naive datetimes — store UTC in timestamptz; use datetime.timezone.utc.
Global session anti-pattern — one module-level Session shared across requests causes cross-request data leaks.
Over-mapping JSON blobs — PostgreSQL JSONB is fine, but ORM change tracking on huge dicts is slow; consider Core for patch updates.
Connection pool per worker explosion — 16 Gunicorn workers × pool_size 20 exhausts Postgres; math pools before deploy.

Production checklist

Pin SQLAlchemy 2.x, driver (psycopg3/asyncpg), and Alembic versions in lockfile.
Set pool_pre_ping=True and document total max connections per service.
Enforce one session per request/task with guaranteed close in finally.
Review Alembic autogenerate diffs in PR; never run unchecked against production.
Add indexes on every foreign key and frequent WHERE column.
Log slow queries (>100 ms) with statement text and bind parameters redacted.
Integration-test critical flows with real Postgres, not SQLite-only, before release.
Use with_for_update() or idempotent upserts for inventory and payment rows.
Export pool checkout time and overflow metrics to your observability stack.
Document rollback procedure for failed migrations (alembic downgrade -1).

Key takeaways

SQLAlchemy splits Core (SQL expressions) and ORM (mapped classes) — use both where each shines.
SQLAlchemy 2.0 standardizes on select() + session.execute/scalars; retire session.query() in new code.
Sessions are scopes, not singletons; commit/rollback explicitly and never share across threads.
Alembic versions schema changes; autogenerate is a draft, not gospel.
Loading strategy choices matter as much as indexes for API latency.