
Microservices architecture explained

Microservices architecture structures an application as a collection of small, independently deployable services, each owning a slice of business capability, communicating over the network, and scaling on its own schedule. The pattern promises team autonomy, technology heterogeneity, and fault isolation. It also introduces distributed-systems complexity that a monolith hides behind a single process and one database transaction. This guide explains when microservices earn their operational tax, how to draw service boundaries with domain-driven design, when to choose synchronous or asynchronous communication, the database-per-service rule, consistency patterns such as sagas, the gateway and observability layers, common failure modes, and a checklist to work through before you split a production system.

Monolith vs microservices: what you are actually trading

A monolith is one deployable unit — one codebase (or a tightly coupled set), often one shared database, one release train. Local function calls are fast and transactional. Refactoring across modules is a compiler error away. The downside is coupling: every team ships together, one slow query can stall the whole binary, and scaling means scaling everything even when only checkout is hot.

Microservices invert that trade. Each service is a separate deployable with its own lifecycle. Teams can ship inventory on Tuesday and billing on Thursday. You scale the search indexer without touching payment pods. But network calls replace in-process calls, partial failures become normal, and debugging a user request may cross six log streams. Martin Fowler's widely cited advice: do not start with microservices — start with a modular monolith and split when organizational or scaling pain justifies the distributed cost.

Microservices are an organizational pattern as much as a technical one. They work when you have multiple teams that need independent velocity, clear ownership, and the platform maturity to run containers, service discovery, CI/CD per service, and production observability. They hurt when a five-person startup splits prematurely and spends quarters fighting distributed transactions instead of serving customers.

Drawing service boundaries: bounded contexts

The most expensive microservices mistake is cutting along technical layers — “user-service,” “database-service,” “API-service” — which recreates a monolith with extra latency. Boundaries should follow business capabilities and bounded contexts from domain-driven design (DDD): areas where a term means one thing inside the boundary and something else outside it.

Signs of a good boundary

Cohesive language — “Order” in fulfillment means pick-pack-ship state; in billing it means invoice line items. Two services, two models, linked by IDs not shared tables.
Independent change velocity — pricing rules change weekly; identity schema changes yearly. Split where change frequency diverges.
Clear data ownership — one service is the system of record for each entity. Others read via API or events, never write another service's tables.
Team ownership — Amazon's “you build it, you run it” works when one team owns end-to-end. Splitting without ownership creates orphan services nobody pages for.

Anti-patterns to avoid

Chatty services — six HTTP hops to render a product page because boundaries were too granular.
Distributed monolith — services must deploy together because shared libraries encode hidden coupling.
Shared database — multiple services writing the same PostgreSQL schema; schema migrations become company-wide change windows.

Start with a modular monolith: enforce package boundaries, ban cross-module SQL, and measure which modules fight each other in code review. Split the module that blocks releases or needs different scaling first — usually checkout, search, or notifications — not the whole codebase on day one.

Communication: synchronous vs asynchronous

Services talk over the network. Two families dominate production architectures:

Synchronous request/response

HTTP/REST, GraphQL, or gRPC: the caller waits for the callee. Simple mental model, easy to debug with traces, natural for read paths and query APIs. Risks: cascading latency (hop latencies add up along a serial chain, and the more dependencies a request touches, the more likely at least one of them responds from its slow tail), cascading failure when retries amplify load, and tight runtime coupling: if inventory is down, checkout cannot even attempt payment.
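To make the tail-latency risk concrete, here is a small Python sketch. It assumes each downstream call independently lands above its own p99 one time in a hundred; real dependencies often slow down together, so treat the numbers as illustrative. The function names are hypothetical.

```python
def prob_any_slow(n_calls: int, p_slow: float = 0.01) -> float:
    """Chance that at least one of n independent calls lands in its slow tail."""
    return 1.0 - (1.0 - p_slow) ** n_calls


def serial_latency_ms(hop_latencies_ms: list[float]) -> float:
    """In a serial chain the caller waits for every hop, so latencies add."""
    return sum(hop_latencies_ms)


if __name__ == "__main__":
    for n in (1, 5, 10, 50):
        print(f"{n:>2} dependencies: {prob_any_slow(n):.1%} of requests hit a p99-slow call")
    print(f"four serial hops: {serial_latency_ms([30, 45, 25, 60])} ms")
```

With ten independent dependencies, roughly 9.6% of requests wait on at least one slow response, even though each service on its own is slow only 1% of the time.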

Mitigate sync paths with timeouts shorter than client patience, circuit breakers, bulkheads (thread pool isolation), and cached fallbacks where stale data beats errors.
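A minimal circuit breaker with a cached fallback, sketched in Python. It is an illustration under stated assumptions, not a production library: it is single-threaded, counts consecutive failures only, and leaves the per-call timeout to your HTTP or gRPC client. The class and parameter names (CircuitBreaker, failure_threshold, reset_timeout_s) are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Closed -> open -> half-open state machine (illustrative, not thread-safe)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if self.state == "open":
            if fallback is not None:
                return fallback()  # e.g. cached data, where stale beats an error
            raise CircuitOpenError("dependency marked unhealthy")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open and restart the cool-down
            if fallback is not None:
                return fallback()
            raise
        self.failures = 0
        self.opened_at = None  # any success closes the breaker
        return result
```

Injecting the clock keeps the breaker testable; libraries and meshes add thread safety, rolling failure windows, and metrics on top of the same state machine.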

Asynchronous messaging

Producers publish events to a message queue or log (Kafka, RabbitMQ, SQS, NATS). Consumers process at their own pace. Decouples availability — billing can lag five minutes behind order placement if the business allows it. Enables fan-out (one OrderPlaced event triggers email, analytics, and warehouse systems without order-service knowing them).
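The fan-out idea can be sketched with an in-memory broker in Python. InMemoryBroker is a hypothetical stand-in, not the API of Kafka, RabbitMQ, SQS, or NATS, and it delivers synchronously for brevity; the point is that the producer names only the event type, never the consumers.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, Dict, List


@dataclass
class Event:
    event_type: str
    payload: Dict[str, object] = field(default_factory=dict)


Handler = Callable[[Event], None]


class InMemoryBroker:
    """Hypothetical stand-in for a real broker: producers only know event types."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # A real broker persists the event and delivers it asynchronously;
        # here every subscriber is called inline to keep the sketch small.
        for handler in self._subscribers[event.event_type]:
            handler(event)


if __name__ == "__main__":
    broker = InMemoryBroker()
    for consumer in ("email", "analytics", "warehouse"):
        broker.subscribe(
            "OrderPlaced",
            lambda e, name=consumer: print(f"{name} got order {e.payload['order_id']}"),
        )
    # The order code publishes once and never references the three consumers.
    broker.publish(Event("OrderPlaced", {"order_id": "o-123"}))
```

Adding a fourth consumer is a new subscribe call in a new service; the publishing side does not change.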

Async trades immediacy for resilience. You need idempotent consumers, dead-letter queues, poison-message handling, and schema evolution discipline on event contracts. Debugging “why wasn't this email sent?” requires distributed tracing across publish and consume spans.
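A sketch of an idempotent consumer with a dead-letter list, in Python. It assumes every message carries a producer-assigned message_id to use as the idempotency key; the names are hypothetical, and the in-memory set stands in for a durable store.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    message_id: str  # idempotency key assigned by the producer
    body: dict
    attempts: int = 0


class IdempotentConsumer:
    """Handles each message_id at most once and parks poison messages in a DLQ."""

    def __init__(self, handle: Callable[[dict], None], max_attempts: int = 3) -> None:
        self.handle = handle
        self.max_attempts = max_attempts
        self.processed_ids: set[str] = set()  # production: a table or key-value store
        self.dead_letters: list[Message] = []

    def on_message(self, msg: Message) -> str:
        if msg.message_id in self.processed_ids:
            return "duplicate-skipped"  # redelivery, e.g. after a lost ack
        try:
            self.handle(msg.body)
        except Exception:
            msg.attempts += 1
            if msg.attempts >= self.max_attempts:
                self.dead_letters.append(msg)  # stop retrying; inspect and replay later
                return "dead-lettered"
            return "retry"
        self.processed_ids.add(msg.message_id)
        return "processed"
```

In production, record the processed ID in the same transaction as the handler's side effect (or make the side effect naturally idempotent); otherwise a crash between the two can still produce a duplicate.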

Choosing a default

Use sync for queries and orchestration where the user waits for an answer. Use async for side effects and propagation — notifications, search index updates, analytics, anything that can be eventually consistent. Hybrid architectures are normal: HTTP to place an order, event to fulfill it.

Data ownership: database per service

The microservices data rule is strict: each service owns its datastore. No foreign keys across service databases. No shared read replicas that other teams query directly. If billing needs order totals, it calls order-service's API or subscribes to OrderCompleted events — it does not SELECT from orders.orders.

This enables independent schema migration, different storage engines (Postgres for transactions, Elasticsearch for search, Redis for sessions), and blast-radius isolation: a runaway analytics query in one service cannot lock checkout rows in another.

Consistency without a global transaction

Cross-service updates cannot use a single ACID transaction. Patterns that replace it:

Saga — sequence of local transactions with compensating steps on failure (cancel reservation, refund payment). See the dedicated saga pattern guide.
Transactional outbox — write business row and outbox event in one local transaction; relay process publishes to the broker.
Eventual consistency — accept temporary divergence; design UX and APIs to reflect in-progress states.
Read models / CQRS — materialized views built from event streams for fast cross-domain queries without cross-database joins.
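An orchestrated saga reduces to a loop: run each local step, and on the first failure run the compensations of the completed steps in reverse order. This Python sketch uses in-process callables as stand-ins for calls to other services; SagaStep and run_saga are hypothetical names, and a real orchestrator also persists its progress so it can resume after a crash.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]      # local transaction owned by one service
    compensate: Callable[[], None]  # semantic undo, e.g. cancel reservation, refund payment


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on the first failure, compensate completed steps in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                # A real orchestrator persists progress and retries compensations
                # until they succeed; this sketch calls each one once.
                done.compensate()
            return False
        completed.append(step)
    return True
```

If shipping fails after a reservation and a charge, the orchestrator refunds first and then releases the reservation, undoing work in the opposite order it was done.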

Understand consistency models before promising users instant cross-service invariants you cannot enforce.
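The transactional outbox from the list above can be sketched with Python's built-in sqlite3 module. The table layout, the OrderPlaced payload, and the publish callback are assumptions for illustration; in a real deployment the relay would publish to your broker and run as a separate process or be replaced by change data capture.

```python
import json
import sqlite3
from typing import Callable


def init(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
    """)


def place_order(conn: sqlite3.Connection, order_id: str, total_cents: int) -> None:
    # One local transaction: the business row and the event commit together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (id, total_cents) VALUES (?, ?)",
                     (order_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )


def relay_once(conn: sqlite3.Connection, publish: Callable[[str, dict], None]) -> int:
    """Publish unsent outbox rows in order, marking each sent afterwards."""
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because the relay marks a row sent only after publishing, a crash in between republishes the event: delivery is at-least-once, which is why consumers must be idempotent.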

API gateway, service mesh, and the edge layer

Clients should not hard-code twelve internal hostnames. An API gateway terminates TLS, authenticates JWTs, rate-limits, routes /v1/orders to order-service, and aggregates responses for mobile clients that need one round trip.
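Routing is the easiest gateway responsibility to show in code. This Python sketch does segment-aware longest-prefix matching over a hypothetical route table (the service hostnames and ports are invented); TLS termination, JWT validation, and rate limiting would wrap around it in a real gateway.

```python
from typing import Optional

# Hypothetical route table: public path prefix -> internal upstream base URL.
ROUTES: dict[str, str] = {
    "/v1/orders": "http://order-service:8080",
    "/v1/orders/returns": "http://returns-service:8080",
    "/v1/catalog": "http://catalog-service:8080",
}


def resolve(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Map a public request path to an upstream URL by longest matching prefix."""
    best: Optional[str] = None
    for prefix in routes:
        # Match whole path segments so "/v1/ordersX" does not hit "/v1/orders".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return None  # the gateway answers 404 itself; no internal service is called
    return routes[best] + path
```

Longest-prefix matching lets a more specific route carve a sub-path out to a different service without reordering the table.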

Inside the cluster, a service mesh (Istio, Linkerd) adds mutual TLS, retries, timeouts, and traffic splitting without baking retry logic into every language SDK. Meshes shine at 20+ services; they are overhead for three Go binaries behind nginx.

BFF (backend for frontend) pattern: separate gateway facades per client type — web BFF shapes payloads for React; mobile BFF minimizes chattiness over cellular. Prevents one generic API from serving every client's conflicting needs.

Deployment, scaling, and team topology

Independent deployment is the microservices superpower. Each service needs its own CI pipeline, container image, health checks, and rollback path. Patterns like blue-green and canary deploys apply per service — shipping payment v2.4 to 5% of traffic while catalog stays on v1.9.
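A canary split is usually performed by the load balancer, ingress, or mesh, but the core idea fits in a few lines: hash a stable key so each user consistently lands on the same version. In this Python sketch the salt, the v2.3 stable version, and the function names are hypothetical.

```python
import hashlib


def canary_bucket(user_id: str, salt: str = "payment-v2.4") -> int:
    """Map a user to a stable bucket in 0..9999 so every request picks the same version."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def choose_version(user_id: str, canary_percent: float = 5.0) -> str:
    # With 5%, buckets 0..499 go to the canary; everyone else stays on stable.
    if canary_bucket(user_id) < canary_percent * 100:
        return "v2.4"
    return "v2.3"
```

Hashing the user instead of rolling a random number per request keeps each session on one version, and changing the salt for each rollout reshuffles which users act as canaries.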

Scaling is per-service: horizontal pod autoscaling on CPU or custom metrics (queue depth, request rate). Cost discipline matters — twenty idle microservices on Kubernetes still bill for control plane, sidecars, and observability ingest.
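Horizontal pod autoscaling follows a ratio rule documented for the Kubernetes HPA: desired replicas = ceil(current replicas * current metric / target metric), left unchanged when the ratio is within a tolerance (0.1 by default) and clamped to the configured minimum and maximum. This Python sketch reproduces only that core formula; the real controller also handles missing metrics, unready pods, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Core HPA ratio rule: scale replicas in proportion to metric / target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: avoid flapping
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For queue-driven workers, four replicas each seeing 300 queued messages against a target of 100 would scale to 12.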

Conway's Law applies: service boundaries should align with how teams communicate. Platform teams provide paved roads — golden-path templates, shared logging libraries, standard Helm charts — so product teams do not reinvent discovery and tracing for every new repo.

Observability and operability are not optional

A monolith stack trace spans one process. Microservices demand metrics, structured logs, and distributed tracing with propagated correlation IDs from the gateway through every hop. Without traces, incidents devolve into log archaeology.
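A minimal sketch of correlation-ID propagation in Python using contextvars. The X-Correlation-ID header name and the logger name are assumptions; OpenTelemetry-based stacks propagate the W3C traceparent header instead, but the flow is the same: accept or mint an ID at the edge, attach it to every outbound call, and stamp it on every log line.

```python
import contextvars
import json
import logging
import uuid
from typing import Optional

HEADER = "X-Correlation-ID"  # assumed header name; OpenTelemetry uses W3C "traceparent"
_correlation_id = contextvars.ContextVar("correlation_id", default="-")
log = logging.getLogger("order-service")  # hypothetical service name


def on_inbound_request(headers: dict[str, str]) -> str:
    """Reuse the ID set upstream (e.g. by the gateway) or mint one at the edge."""
    cid = headers.get(HEADER) or str(uuid.uuid4())
    _correlation_id.set(cid)
    return cid


def outbound_headers(extra: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Headers for any downstream HTTP call or published message."""
    headers = dict(extra or {})
    headers[HEADER] = _correlation_id.get()
    return headers


def log_event(message: str, **fields: object) -> str:
    """Emit a structured JSON log line that always carries the correlation ID."""
    line = json.dumps({"msg": message, "correlation_id": _correlation_id.get(), **fields})
    log.info(line)
    return line
```

Because contextvars are scoped per thread and per asyncio task, concurrent requests in one process keep their IDs separate without passing them through every function.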

Define SLOs per service — availability and latency budgets — and alert on error budget burn, not every blip. Run chaos experiments in staging (then carefully in production) to prove circuit breakers and failover actually work. Maintain runbooks: what to do when the identity provider is slow, when Kafka lag exceeds ten minutes, when a bad deploy needs instant rollback.
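Error-budget burn is simple arithmetic: the budget is 1 - SLO, and the burn rate is the observed error ratio divided by that budget. This Python sketch uses hypothetical function names and a 14.4x fast-burn threshold, which corresponds to spending 2% of a 30-day budget in one hour (14.4 / 720 hours = 0.02).

```python
def error_budget(slo: float) -> float:
    """Allowed failure fraction: 0.001 for a 99.9% availability SLO."""
    return 1.0 - slo


def burn_rate(errors: int, requests: int, slo: float) -> float:
    """Observed error ratio divided by the budget; 1.0 spends it exactly on schedule."""
    if requests == 0:
        return 0.0
    return (errors / requests) / error_budget(slo)


def should_page(errors: int, requests: int, slo: float, threshold: float = 14.4) -> bool:
    # Burning at 14.4x for one hour uses 14.4 / 720 = 2% of a 30-day budget.
    return burn_rate(errors, requests, slo) >= threshold
```

Production alerting usually evaluates several windows and thresholds, paging on fast burns and opening tickets on slow ones, rather than relying on a single check like this.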

Common pitfalls and when to stay monolithic

Premature decomposition — splitting before product-market fit; operational tax kills iteration speed.
No contract testing — consumer-driven contracts (Pact) or schema registries for events prevent silent breakage across teams.
Ignoring network economics — serial sync chains turn 2 ms local calls into 200 ms user-visible latency.
Shared libraries that encode business rules — a fat common JAR recreates the distributed monolith.
Weak platform foundation — no CI/CD, no secrets rotation, no on-call rotation — microservices multiply failure modes faster than features.
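The core of a consumer-driven contract check fits in a few lines. This Python sketch is not Pact's API: the contract format and BILLING_EXPECTS_ORDER are hypothetical. The consumer declares only the fields it reads and their types, and the provider's CI fails if a response stops satisfying them.

```python
from typing import Any

# Hypothetical contract owned by billing (consumer) for order-service (provider):
# only the fields billing actually reads, with the JSON types it expects.
BILLING_EXPECTS_ORDER: dict[str, type] = {
    "order_id": str,
    "total_cents": int,
    "currency": str,
}


def contract_violations(contract: dict[str, type], response: dict[str, Any]) -> list[str]:
    """Empty list means the provider response still satisfies the consumer."""
    problems: list[str] = []
    for field_name, expected_type in contract.items():
        if field_name not in response:
            problems.append(f"missing field: {field_name}")
        elif type(response[field_name]) is not expected_type:  # exact match, so bool != int
            problems.append(
                f"{field_name}: expected {expected_type.__name__}, "
                f"got {type(response[field_name]).__name__}"
            )
    # Extra fields are ignored on purpose (tolerant reader).
    return problems
```

Extra fields pass deliberately, so the provider can add data freely; only removing or retyping something a consumer depends on breaks the build.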

Stay monolithic (or modular monolith) when the team is small, the domain is still shifting weekly, you lack SRE capacity, or your bottleneck is feature delivery, not deploy cadence. Extract services when you have measured pain (release contention, scaling mismatch, regulatory isolation), not because a conference slide said Netflix does it.

Production checklist

Before splitting a production path into a new service, confirm:

Boundary document — owned entities, public API surface, events published, and explicit non-goals.
Data ownership — dedicated database; no cross-service SQL; migration plan for extracted tables.
Sync contract — OpenAPI or protobuf spec, versioning policy, timeout and retry budget documented.
Async contract — event schema, idempotency key, DLQ and replay procedure.
Failure modes — circuit breakers, fallbacks, saga compensations tested in staging.
Observability — RED metrics (rate, errors, duration), trace propagation, dashboards linked from the service README.
Deploy path — automated pipeline, health/readiness probes, rollback under five minutes.
On-call owner — named team, paging policy, runbook in the repo.
Load test — p99 latency and error rate at 2x expected peak with one dependency artificially slow.

Key takeaways

Microservices trade local simplicity for independent deploy and scale — worth it when organizational pain exceeds operational cost.
Boundaries follow business capabilities and bounded contexts, not technical layers.
Database per service is non-negotiable; cross-service consistency uses sagas, outbox, and eventual consistency — not distributed 2PC.
Sync for queries, async for side effects — hybrid communication is the norm in mature systems.
Gateway + observability + platform paved roads — without them, microservices multiply outages instead of isolating them.
Start modular, split on evidence — a well-structured monolith beats a poorly drawn microservices map.