← Back to blog Khalil Drissi

Database schema best practices

Listen to article
0:00

The schema is the part of a system that is hardest to change once it is full of data. You can rewrite a service in a weekend. Migrating a billion-row table without downtime is a project. So I spend real time on the schema up front, because the cost of getting it wrong only grows. These are the decisions I do not regret.

Let the database enforce the rules

Application code is not the place to guarantee data integrity, because there is always another path in: a migration script, a manual fix, a second service, a developer in a console. The database is the one chokepoint everything passes through, so that is where the rules belong. I use NOT NULL aggressively, foreign keys to enforce relationships, unique constraints to prevent duplicates, and check constraints for value ranges.

CREATE TABLE orders (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id bigint NOT NULL REFERENCES customers(id),
  status      text   NOT NULL DEFAULT 'pending'
              CHECK (status IN ('pending','paid','shipped','cancelled')),
  total_cents integer NOT NULL CHECK (total_cents >= 0),
  created_at  timestamptz NOT NULL DEFAULT now()
);

Every constraint here is a bug that can never reach production. A status of “shippd” gets rejected at write time instead of breaking a report three weeks later.

Normalize first, denormalize on evidence

I start normalized: each fact lives in exactly one place. Duplicated data is duplicated truth, and the copies drift apart the moment someone updates one and forgets the other. Normalization keeps writes simple and correctness cheap.

Denormalization is a performance optimization, and like any optimization I want a measurement before I do it. When a specific query is genuinely too slow and the profile points at joins, then I will cache a computed value or duplicate a column, knowing I am taking on the job of keeping the copies in sync. Doing it preemptively is how you get a schema full of fields nobody trusts.

Choose keys and types deliberately

Every table gets a synthetic primary key, usually a bigint identity or a UUID. I avoid natural keys like email addresses as primary keys, because the one thing you can promise about a natural key is that it will change, and changing a primary key that is referenced everywhere is misery. Use types that mean something:

Index for your reads, but know the cost

An index makes reads fast and writes slightly slower, and it takes space. I index foreign keys, columns I filter on regularly, and columns I sort by. I do not index everything, because an unused index is pure overhead on every insert. The way to know is to look: most databases will tell you which indexes go untouched, and those are candidates for removal.

Composite indexes are worth understanding because column order matters. An index on (customer_id, created_at) helps a query filtering by customer and sorting by date, but it does nothing for a query that only filters by date. The same observability mindset I described in logging and observability best practices applies here: measure the real query patterns before you guess.

Treat migrations as code

Every schema change goes through a migration file, checked into version control, reviewed like any other change. No manual ALTER statements run by hand in production, ever, because the next environment will not have them and you will spend a day chasing the difference. The same review discipline from code review best practices applies, with extra care, since a bad migration can lock a table or drop data.

A schema you can evolve safely is worth more than a perfect schema you are afraid to touch. Build for change, because change is the only certainty.

Comments
Leave a comment