Guide
SQL query optimization explained
A support dashboard times out whenever someone searches orders by customer email.
The team adds an index on orders.customer_id — no improvement.
The real problem is a WHERE LOWER(email) LIKE '%@acme.com' predicate
that cannot use a B-tree, a join that starts from the million-row line-items table
instead of the filtered orders set, and an ORM that emits SELECT * plus
forty lazy-loaded queries per page. SQL query optimization is the
discipline of reading what the database actually does (via EXPLAIN),
rewriting predicates so indexes apply (SARGable clauses), choosing
efficient join and aggregation shapes, and fixing application-layer patterns that
multiply round trips. This guide covers planner output, predicate and join tuning,
pagination strategies, ORM pitfalls, a Harbor Commerce order search worked example,
a decision table versus indexing-only and
materialized views,
common mistakes, and a production checklist — assuming familiarity with
PostgreSQL
or similar relational engines and baseline
index design.
The optimization workflow
Treat slow-query work as a repeatable loop, not a heroic one-off. Capture the exact
SQL (parameter values included), run EXPLAIN (ANALYZE, BUFFERS) on
production-like data volumes, identify the most expensive plan node, change one
variable (predicate, index, join hint, statistics), and measure again. Skipping
measurement is how teams accumulate indexes that never get used.
What to measure
- Wall-clock latency — p50 and p99 under concurrent load, not a single manual run in psql.
- Rows examined vs returned — a query that reads 2M rows to return 20 is a scan problem even if an index exists elsewhere.
- Buffer hits and reads —
BUFFERSin Postgres shows whether data is cached or hitting disk. - Lock wait time — sometimes the query is fast but blocked by a long-running migration or
ACCESS EXCLUSIVElock.
Log queries above a threshold (e.g. 200ms) with normalized fingerprints. Tools like
pg_stat_statements, MySQL’s performance_schema, or APM traces tied
to ORM spans make recurring offenders visible before customers complain.
Reading EXPLAIN plans
EXPLAIN shows the planner’s predicted plan; EXPLAIN
ANALYZE executes the query and reports actual row counts and timings. Mismatches
between estimated and actual rows usually mean stale statistics —
run ANALYZE on the table before concluding the planner is wrong.
Nodes to recognize
- Seq Scan — full table read; acceptable on small tables or when most rows match; red flag on large fact tables.
- Index Scan / Index Only Scan — B-tree (or other index) lookup; Index Only avoids heap fetches when a covering index includes all selected columns.
- Bitmap Index Scan — combines multiple index predicates; common for moderate selectivity ranges.
- Nested Loop — for each row in the outer table, probe the inner; great when the outer is tiny and the inner is indexed.
- Hash Join — builds a hash table on the smaller side; preferred for larger equi-joins without selective indexes.
- Sort / HashAggregate — expensive on large inputs; sometimes avoidable via pre-sorted indexes or incremental aggregates.
Read plans inside-out or bottom-up: find the node with the highest actual time. A mis-estimated nested loop that runs billions of inner probes dominates the bill even if every other step looks cheap. Compare plans before and after rewrites; screenshot or store them in tickets so future engineers see why a predicate changed.
SARGable predicates
SARGable (Search ARGument ABLE) means the predicate allows the engine to use an index on the raw column. Wrapping a column in a function or casting unpredictably forces scans.
Patterns that break indexes
WHERE YEAR(created_at) = 2026— rewrite to a range:created_at >= '2026-01-01' AND created_at < '2027-01-01'.WHERE LOWER(email) = 'user@example.com'— store a normalizedemail_lowercolumn or use a functional index onlower(email).WHERE amount * 1.1 > 100— isolate the column:amount > 100 / 1.1.- Leading wildcard
LIKE '%foo%'— use full-text search (GIN/tsvector) or trigram indexes for substring search; prefix patternsLIKE 'foo%'can use B-trees. ORacross different columns — often defeats single-index plans; considerUNION ALLof two indexed queries or partial indexes per branch.
Implicit type coercion also hurts: comparing a varchar column to an
integer may cast every row. Align types in the schema and in application code.
When filters are optional, dynamic SQL or planner-friendly (@param IS NULL OR
col = @param) patterns need careful testing — they sometimes suppress
index use unless rewritten with UNION branches.
Join and subquery tuning
Join order matters. The planner usually picks well if statistics are fresh, but star-schema reporting queries with ten joins may need reshaping: filter driving tables early in a CTE or subquery so later joins process fewer rows.
Practical rules
- Join on indexed keys — foreign keys should be indexed on the child side; missing FK indexes are a top cause of slow deletes on parents.
- Avoid SELECT * in joined queries — wide rows bloat memory and prevent index-only scans; project only needed columns.
- Prefer EXISTS over IN (subquery) when checking membership — semantics differ for NULLs, but EXISTS often short-circuits earlier.
- Correlated subqueries — sometimes fine; sometimes rewrite to a join or lateral join (
LATERALin Postgres) for one-pass execution. - Cross joins by accident — missing join conditions explode row counts; always sanity-check row estimates in EXPLAIN.
For analytical workloads on columnar stores, join order and distribution keys dominate; see ClickHouse for merge-tree specifics. OLTP tuning on PostgreSQL usually focuses on selective predicates and FK indexes first.
Pagination, sorting, and aggregation
Keyset vs OFFSET
ORDER BY created_at DESC LIMIT 20 OFFSET 100000 forces the database to
scan and discard 100,020 rows on every page turn. Keyset pagination
(cursor pagination) uses the last seen sort key:
WHERE (created_at, id) < ($last_ts, $last_id) ORDER BY created_at DESC, id DESC LIMIT 20
with a compound index on (created_at, id). Trade-off: jumping to arbitrary
page numbers is awkward; infinite scroll and API cursors fit this model well.
Aggregation without melting memory
- Pre-aggregate with
GROUP BYon indexed dimensions before joining to wide tables. - Use
COUNT(*)approximations (EXPLAINestimates, HyperLogLog, or rolling counters) when exact totals are not required on dashboards. - Roll up hot metrics into summary tables or materialized views refreshed on a schedule when real-time precision is unnecessary.
- Partition large time-series tables so monthly queries prune old partitions automatically.
ORM and application-layer pitfalls
Frameworks like SQLAlchemy and Django ORM generate SQL you do not see until production traffic arrives. Common failure modes:
- N+1 queries — looping over parents and lazy-loading children; fix with
joinedload,selectinload, or explicitprefetch_related. - Implicit casts in ORM filters — Python
datetimevs timezone-naive columns; enforce UTC at the boundary. - Loading full objects for counts — use
query.count()orSELECT COUNT(*), notlen(list(q.all())). - Unbounded queries — always cap list endpoints; stream exports instead of materializing millions of ORM instances.
- Transaction scope too wide — holding locks while calling external APIs; keep transactions short around pure SQL work.
Enable SQL logging in staging with query timing. Copy the worst offenders into migration tickets as raw SQL with EXPLAIN output so DBAs and app engineers share one source of truth.
Worked example: Harbor Commerce order search
Harbor Commerce’s admin API lists orders filtered by status, date range, and
customer email substring. The initial ORM query joins orders,
customers, and order_lines, applies
LOWER(customers.email) LIKE '%' || :term || '%', sorts by
orders.created_at DESC, and paginates with OFFSET.
At 400k orders, p99 latency exceeds 8 seconds.
Diagnosis (EXPLAIN ANALYZE highlights)
- Seq Scan on
customersdue toLOWER(email)wrapper. - Hash Join builds from
order_lines(2.1M rows) before status filter applied. - Sort node buffers 400k rows for OFFSET page 500.
- Twenty follow-up queries load line items (N+1) per request.
Remediation shipped
- Added
customers.email_normalizedmaintained by trigger; trigram GIN index foremail_normalized LIKE :prefix || '%'when search term length ≥ 3; fallback to exact match on short terms. - Rewrote query to filter
ordersby status andcreated_atrange first using index(status, created_at DESC), then join to customers only for surviving rows. - Replaced OFFSET with keyset cursor on
(created_at, order_id). - SQLAlchemy
selectinload(Order.lines)for one round trip of line items. - Dashboard totals served from hourly
order_stats_dailyrollup table.
p99 dropped to 120ms at the same concurrency. No new hardware — just predicate, join, pagination, and ORM fixes aligned with how Postgres actually plans.
Optimization strategy decision table
| Symptom | First lever | Escalate to… |
|---|---|---|
| Full table scan on large table | SARGable predicate + matching index | Partial/covering index, partition pruning, or archive cold data |
| Index exists but unused | Check function wraps, type mismatch, stale stats | Functional index, expression rewrite, or ANALYZE |
| Slow join | Index FK columns; reduce row count early | Query reshape, denormalized summary column, or read replica |
| Deep OFFSET pagination | Keyset cursor on sort key | Search engine (Elasticsearch) for arbitrary page jumps |
| Heavy dashboard aggregates | Rollup table or materialized view | Columnar warehouse (ClickHouse, BigQuery) for ad hoc analytics |
| Many similar short queries | Connection pooling, prepared statements | Cache layer (Redis) with explicit invalidation |
| ORM N+1 | Eager loading, batch queries | Raw SQL for hot path; DTO projection without full entities |
Common pitfalls
- Indexing every column — slows writes and confuses the planner; index for real query shapes.
- Hinting without measurement — join hints and planner settings are last resorts, not defaults.
- Tuning on empty dev databases — plans that work at 1k rows fail at 10M; use restored snapshots or synthetic volume.
- Ignoring vacuum and bloat — dead tuples inflate scans; monitor autovacuum and index bloat.
- Optimizing the wrong query — a nightly batch job matters more than a rare admin report; rank by total DB time (frequency × latency).
- Cache as a band-aid — Redis fixes hot reads but hides inconsistent data; fix SQL first for correctness-critical paths.
- Shipping migrations that lock tables —
CONCURRENTLYindex builds and expand-contract schema changes avoid query pileups.
Practitioner checklist
- Capture slow queries with fingerprints and parameter samples in production logs.
- Run
EXPLAIN (ANALYZE, BUFFERS)on representative data before changing schema. - Rewrite non-SARGable predicates to range or equality forms on indexed columns.
- Verify foreign-key child columns are indexed; add covering indexes for hot SELECT lists.
- Replace deep OFFSET pagination with keyset cursors on stable sort keys.
- Profile ORM-generated SQL in staging; eliminate N+1 with eager loading or batch queries.
- Pre-aggregate dashboard metrics when exact real-time totals are not required.
- Re-test after
ANALYZEand after major data distribution shifts (launches, backfills). - Document plan diffs in PRs when predicate or join shape changes.
- Revisit quarterly: workloads drift as product features add filters and joins.
Key takeaways
- EXPLAIN ANALYZE is the ground truth for what costs time — start at the most expensive plan node.
- SARGable predicates let indexes work; functions on columns and leading wildcards usually defeat them.
- Join order and row cardinality matter as much as indexes — filter early, join on keyed columns.
- Keyset pagination scales; OFFSET does not on large sorted sets.
- ORM convenience hides N+1 and wide selects — inspect and fix the SQL your framework emits.
Related reading
- Database indexing explained — B-trees, compound keys, and covering indexes that make predicates fast
- PostgreSQL fundamentals explained — MVCC, WAL, and planner statistics context
- SQLAlchemy fundamentals explained — ORM patterns that generate optimizable SQL
- Materialized views explained — when to precompute instead of tuning the same aggregate every request