Guide
SQLAlchemy fundamentals explained
Your Python API needs to persist orders, users, and inventory in
PostgreSQL
without hand-assembling SQL strings on every route. SQLAlchemy is the
de facto Python database toolkit: a low-level Core
for explicit SQL expression construction, plus a high-level ORM that
maps Python classes to tables and rows to objects. Unlike
Django’s
bundled ORM (which only works inside Django), SQLAlchemy plugs into
Flask,
FastAPI,
data pipelines, and CLI scripts alike. Version 2.0 unified the API around
select() and session.execute(), dropping the legacy
session.query() style most tutorials still show. This guide covers
Engine and pooling, declarative models, the 2.0 query API, relationships and
loading strategies, session transactions, Alembic migrations, async patterns,
a Harbor Commerce inventory API worked example, an ORM decision table, common
pitfalls, and a production checklist alongside our
Python fundamentals guide.
What SQLAlchemy is (Core vs ORM)
SQLAlchemy has two layers you can mix freely:
- Core —
Table,Column, and composable SQL expressions (select,insert,update) that compile to parameterized SQL for any supported dialect. - ORM —
DeclarativeBasemapped classes,relationship()for joins, and theSessionidentity map that tracks object changes and flushes them as INSERT/UPDATE statements.
The Engine is the entry point: it holds a connection pool to your database URL and hands out connections. A typical sync stack looks like:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
engine = create_engine(
"postgresql+psycopg://user:pass@localhost/harbor",
pool_size=10,
max_overflow=20,
pool_pre_ping=True,
)
SessionLocal = sessionmaker(bind=engine)
SQLAlchemy is not a migration tool (use Alembic for that), not an admin UI, and not a replacement for understanding SQL — complex reports and window functions often belong in Core or raw SQL even when the rest of the app uses the ORM.
Declarative models
SQLAlchemy 2.0 style models inherit from DeclarativeBase and use
mapped_column() with Python type annotations:
from datetime import datetime
from sqlalchemy import String, ForeignKey, func
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
class Base(DeclarativeBase):
pass
class Product(Base):
__tablename__ = "products"
id: Mapped[int] = mapped_column(primary_key=True)
sku: Mapped[str] = mapped_column(String(32), unique=True, index=True)
name: Mapped[str] = mapped_column(String(200))
price_cents: Mapped[int]
created_at: Mapped[datetime] = mapped_column(server_default=func.now())
variants: Mapped[list["Variant"]] = relationship(back_populates="product")
class Variant(Base):
__tablename__ = "variants"
id: Mapped[int] = mapped_column(primary_key=True)
product_id: Mapped[int] = mapped_column(ForeignKey("products.id"))
size: Mapped[str] = mapped_column(String(16))
stock_qty: Mapped[int] = mapped_column(default=0)
product: Mapped["Product"] = relationship(back_populates="variants")
Key modeling choices:
- Explicit types —
Mapped[str]maps to VARCHAR; useOptionalfor nullable columns. - Indexes and constraints — declare
unique=True,index=True, andForeignKeyon columns, not only in migrations. - Server defaults —
server_default=func.now()lets the database set timestamps; avoid Python-side defaults for clock consistency across app servers. - Table naming — pick a convention (snake_case plural) and stick to it; Alembic autogenerate depends on stable model definitions.
SQLAlchemy 2.0 query API
The modern pattern uses select() with session.scalars()
or session.execute():
from sqlalchemy import select
from sqlalchemy.orm import selectinload
with SessionLocal() as session:
stmt = (
select(Product)
.where(Product.sku == "HBR-MUG-01")
.options(selectinload(Product.variants))
)
product = session.scalars(stmt).one_or_none()
low_stock = session.scalars(
select(Variant).where(Variant.stock_qty < 5)
).all()
Compared to the legacy session.query(Product).filter_by(...) API,
2.0 queries compose like SQLAlchemy Core expressions — you can reuse
where clauses, subqueries, and CTEs across ORM and raw SQL layers.
For inserts and updates:
from sqlalchemy import insert, update
session.execute(
insert(Variant).values(product_id=1, size="L", stock_qty=50)
)
session.execute(
update(Variant)
.where(Variant.id == 42)
.values(stock_qty=Variant.stock_qty - 1)
)
session.commit()
ORM shortcut: mutate mapped objects and let the session track changes:
variant = session.get(Variant, 42)
variant.stock_qty -= 1
session.commit()
Relationships and loading strategies
relationship() defines how ORM classes join. Without a loading
strategy, accessing product.variants triggers a lazy SELECT per
product — the classic N+1 query problem. SQLAlchemy offers:
- selectinload — second query with
WHERE id IN (...); best default for collections. - joinedload — single query with LEFT OUTER JOIN; good for many-to-one parent fetches.
- subqueryload — older pattern; prefer selectinload in 2.0 apps.
- raise_on_sql (testing) — fail tests that accidentally lazy-load.
Many-to-many uses an association table:
from sqlalchemy import Table, Column, Integer, ForeignKey
order_items = Table(
"order_items",
Base.metadata,
Column("order_id", ForeignKey("orders.id"), primary_key=True),
Column("variant_id", ForeignKey("variants.id"), primary_key=True),
Column("qty", Integer, nullable=False),
)
For read-heavy list endpoints, consider Core select() with explicit
joins instead of deep ORM graphs — fewer surprises in query plans. Pair with
database indexing
on foreign keys and filter columns.
Session lifecycle and transactions
The Session is a unit-of-work scope: it tracks dirty objects, orders
flushes, and wraps transactions. Rules that prevent production bugs:
- One session per request — create at request start, close in a finally block or dependency-injection teardown.
- Explicit commit/rollback — call
session.commit()only after all related writes succeed;rollback()on any exception. - expire_on_commit — default True refreshes objects after commit; set False only if you understand stale attribute risks.
- Nested transactions — use
session.begin_nested()savepoints for partial rollback inside a larger transaction.
FastAPI dependency pattern:
def get_db():
db = SessionLocal()
try:
yield db
db.commit()
except Exception:
db.rollback()
raise
finally:
db.close()
Long-running Celery tasks need their own session — never share a session across threads or async tasks. Pass primary keys, not ORM instances, into background workers.
Alembic migrations
Alembic is SQLAlchemy’s companion migration tool. It versions
schema changes as Python revision files and tracks applied revisions in an
alembic_version table:
alembic init migrations
alembic revision --autogenerate -m "add products table"
alembic upgrade head
Autogenerate compares Base.metadata to the live database — it
catches new tables and columns but misses renames (it sees drop +
add), check constraint tweaks, and some index changes. Review every autogenerated
script before merging. Production workflow:
- Generate migrations in CI or locally against a schema snapshot, not production.
- Run
alembic upgrade headin deploy pipeline before traffic shifts. - Keep migrations backward-compatible for blue/green deploys when possible (add column nullable first, backfill, then enforce NOT NULL).
- Never edit applied revision files; add a new revision to fix mistakes.
Connection pooling and async SQLAlchemy
The Engine’s pool amortizes TCP and auth handshake cost. Tune for your deployment:
- pool_size + max_overflow — cap total connections below PostgreSQL
max_connectionsdivided by app instances. - pool_pre_ping — test connections before checkout; essential behind PgBouncer or after idle timeouts.
- PgBouncer transaction mode — disable server-side prepared statements (
prepare_threshold=Nonein psycopg3) when pooling externally.
Async SQLAlchemy uses create_async_engine with
asyncpg or psycopg async drivers and
async_sessionmaker. The ORM API is the same select()
style inside async with session blocks. Do not mix sync sessions
inside async route handlers without run_in_executor — you will
block the event loop.
Worked example: Harbor Commerce inventory API
Harbor Commerce runs a Flask/FastAPI hybrid: Flask serves the admin UI; FastAPI exposes a JSON inventory API backed by SQLAlchemy on PostgreSQL. Requirements: reserve stock atomically when checkout starts, release on payment timeout, and never oversell limited-edition mugs.
Schema
products, variants, and stock_reservations
with a partial unique index on active reservations per variant. Foreign keys use
ON DELETE CASCADE for variants when products archive.
Reserve flow
def reserve_stock(session, variant_id: int, qty: int, order_id: str):
variant = session.execute(
select(Variant).where(Variant.id == variant_id).with_for_update()
).scalar_one()
if variant.stock_qty < qty:
raise InsufficientStock()
variant.stock_qty -= qty
session.add(StockReservation(
variant_id=variant_id, order_id=order_id, qty=qty, expires_at=utcnow() + timedelta(minutes=15)
))
session.commit()
with_for_update() row-locks the variant during the transaction so
concurrent checkouts cannot pass the stock check simultaneously. A Celery Beat task
runs every minute to delete expired reservations and restore stock_qty.
List endpoint
The catalog API uses selectinload(Product.variants) in one query
batch instead of 200 lazy loads for a 50-product page. Response caching sits in
Redis with
a 60-second TTL; cache keys invalidate on successful commit() via
SQLAlchemy after_commit event listeners.
ORM decision table
| Need | Prefer | Why |
|---|---|---|
| Python web app, any framework | SQLAlchemy | Framework-agnostic; Core + ORM flexibility |
| Django-only monolith with admin | Django ORM | Integrated migrations, admin, QuerySet ergonomics |
| TypeScript Next.js API | Prisma or Drizzle | Generated types; see our Prisma guide |
| Complex analytics / ETL | SQLAlchemy Core or raw SQL | Window functions, bulk COPY, no identity map overhead |
| Embedded local-first app | SQLite + SQLAlchemy | Same ORM code path; swap URL for Postgres later |
| Serverless edge functions | HTTP SQL proxy or Drizzle | Connection pooling is hard; avoid long-lived Engine per invocation |
Common pitfalls
- Implicit lazy loads in loops — hundreds of queries per request; use selectinload or explicit joins.
- Detached instances — passing ORM objects outside session scope after close triggers
DetachedInstanceError; pass IDs or DTOs. - Missing transactions on multi-step writes — partial updates without rollback corrupt inventory state.
- Autogenerate blind trust — Alembic drops columns you still need in production.
- Unicode and timezone naive datetimes — store UTC in
timestamptz; usedatetime.timezone.utc. - Global session anti-pattern — one module-level Session shared across requests causes cross-request data leaks.
- Over-mapping JSON blobs — PostgreSQL JSONB is fine, but ORM change tracking on huge dicts is slow; consider Core for patch updates.
- Connection pool per worker explosion — 16 Gunicorn workers × pool_size 20 exhausts Postgres; math pools before deploy.
Production checklist
- Pin SQLAlchemy 2.x, driver (psycopg3/asyncpg), and Alembic versions in lockfile.
- Set
pool_pre_ping=Trueand document total max connections per service. - Enforce one session per request/task with guaranteed close in finally.
- Review Alembic autogenerate diffs in PR; never run unchecked against production.
- Add indexes on every foreign key and frequent
WHEREcolumn. - Log slow queries (>100 ms) with statement text and bind parameters redacted.
- Integration-test critical flows with real Postgres, not SQLite-only, before release.
- Use
with_for_update()or idempotent upserts for inventory and payment rows. - Export pool checkout time and overflow metrics to your observability stack.
- Document rollback procedure for failed migrations (
alembic downgrade -1).
Key takeaways
- SQLAlchemy splits Core (SQL expressions) and ORM (mapped classes) — use both where each shines.
- SQLAlchemy 2.0 standardizes on
select()+session.execute/scalars; retiresession.query()in new code. - Sessions are scopes, not singletons; commit/rollback explicitly and never share across threads.
- Alembic versions schema changes; autogenerate is a draft, not gospel.
- Loading strategy choices matter as much as indexes for API latency.
Related reading
- PostgreSQL fundamentals explained — the database most SQLAlchemy Python apps target in production
- Flask fundamentals explained — microframework often paired with Flask-SQLAlchemy
- FastAPI fundamentals explained — async routes with injected SQLAlchemy sessions
- Prisma fundamentals explained — TypeScript-side ORM alternative for Node backends