Counting Views at Scale

A view counter is everyone's favorite trivial feature. It's a row and a +1. It stays trivial for a long time — and then, at some boundary you didn't see coming, it becomes the most contended object in the system and the whole thing starts to wobble.

The problem

The naive design is one row per item with an exact count, incremented on every view. It's correct, it's obvious, and at scale it's doomed.

Every view of a popular item now targets the same row. Writes serialize behind a single lock, latency climbs, and under a spike the counter becomes a hot spot that drags unrelated queries down with it. Worse, the instinct to "make it exact" — read-modify-write inside a transaction — is exactly what tightens the contention. You've bought a correctness guarantee nobody asked for and paid for it with availability everybody needs.

The deeper trap is the assumption hiding in "exact": that the count must be perfectly right at every instant. For a view counter, it doesn't. Nobody's billing on it. What people actually need is a number that's close, always moving in the right direction, and never stuck.

The solution

Once the requirement is stated honestly, the invariant gets weaker and the design gets stronger: the count only ever moves toward the truth, and never blocks the write path.

A few things follow from that.

Absorb writes, don't serialize them

Increments go into something built for high-throughput append — buffered and aggregated rather than applied one-by-one to a single row. The hot item stops being a hot row, because no individual view is racing any other for the same lock.

Reduce resolution on the way in

You almost never need per-event granularity forever. Roll counts up into time buckets as they arrive — per minute, per hour — so the data you keep is already the shape you'll query. This is the same trick that makes the analytics side survive: decide the resolution you need and throw away the rest at write time, not read time.

Make "stuck" impossible, not just unlikely

The failure mode that actually hurts isn't a count that's off by a few — it's a count that wedges and stops advancing. So the aggregation is designed to be idempotent and monotonic: replaying a batch can't double-count past the floor, and a crash mid-flush resumes instead of stalling. A number that's eventually-right self-heals; a number that's stuck needs a human.

What it cost

You give up instantaneous exactness and a little simplicity in the read path. In return the counter stops being a liability under load, and the same machinery generalizes to every other "count things fast" problem in the product.

The general lesson stuck with me more than the counter did: at scale you don't scale a feature, you re-examine its requirements. Half the time the thing making it hard is a guarantee you assumed you needed and never actually did.