Writeups

Problems, and how they got solved

Each one is a problem statement and the way out of it — the invariant it turned on, the tradeoffs, what failed first. The systems are real; where they happened mostly isn't the point.

  1. Reliable Systems on Unreliable Data

    June 18, 2026

    When every input lies sometimes and every upstream is down sometimes, "handle the happy path and retry" isn't a design. Building for inputs that are wrong on purpose.

    3 min read

  2. Counting Views at Scale

    June 10, 2026

    A view counter is the simplest thing in the world, right up until it's the thing that falls over. How a hot row taught me to count differently.

    2 min read

  3. Multi-Tenant Isolation the Database Enforces

    May 30, 2026

    One forgotten WHERE clause shouldn't be able to leak another tenant's data. Pushing the isolation invariant down to where it can't be bypassed.

    3 min read

On the bench · in progress

  • Recovering Bad Data Instead of Dropping It
    When a record fails because a field is subtly wrong, the cheapest fix is often to go find the right value, carefully and only where it's safe.
  • Query-Aware Analytics Without a Service Per Metric
    Instead of deploying a new consumer for every analytical question, compile the question into one consumer that reconfigures itself at runtime.
  • Classifying Consumer Failures
    A blanket retry treats a timeout and a malformed record the same way. Why the failure taxonomy is the actual design.
  • Audit Logs Are Mostly a Storage Problem
    Capturing "who did what" is easy. Where it lives, how it's queried, and why it can't be rewritten is the real work.
  • A Low-Downtime Kafka Migration
    Moving a self-managed broker cluster onto encrypted disks without losing data: quorum, partition reassignment, and a few minutes of acceptable downtime.