Writeups
Problems, and how they got solved
Each one is a problem statement and the way out of it — the invariant it turned on, the tradeoffs, what failed first. The systems are real; where they happened mostly isn't the point.
Reliable Systems on Unreliable Data
June 18, 2026
When every input lies sometimes and every upstream is down sometimes, "handle the happy path and retry" isn't a design. Building for inputs that are wrong on purpose.
3 min read
Counting Views at Scale
June 10, 2026
A view counter is the simplest thing in the world, right up until it's the thing that falls over. How a hot row taught me to count differently.
2 min read
Multi-Tenant Isolation the Database Enforces
May 30, 2026
One forgotten WHERE clause shouldn't be able to leak another tenant's data. Pushing the isolation invariant down to where it can't be bypassed.
3 min read
On the bench · in progress
- Recovering Bad Data Instead of Dropping It — When a record fails because a field is subtly wrong, the cheapest fix is often to go find the right value — carefully, and only where it's safe.
- Query-Aware Analytics Without a Service Per Metric — Instead of deploying a new consumer for every analytical question, compile the question into one consumer that reconfigures itself at runtime.
- Classifying Consumer Failures — A blanket retry treats a timeout and a malformed record the same way. Why the failure taxonomy is the actual design.
- Audit Logs Are Mostly a Storage Problem — Capturing "who did what" is easy. Where it lives, how it's queried, and why it can't be rewritten is the real work.
- A Low-Downtime Kafka Migration — Moving a self-managed broker cluster onto encrypted disks without losing data — quorum, partition reassignment, and a few minutes of acceptable downtime.