
Timeline Cache Architecture

With 500M Daily Active Users (DAU) each expecting a feed in under 50ms, we cannot build feeds from scratch on every request. We store each user's feed as a Redis sorted set (not a database table) because sorted sets support O(log N + M) range reads by score, giving us the top 30 photos in microseconds.
Why not Memcached?
We need sorted ordering by timestamp, and Memcached only stores flat blobs. Trade-off: Redis uses more memory per key than Memcached, but the sorted set eliminates application-level sorting.
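As a rough illustration of the sorted-set pattern, here is a pure-Python stand-in (not the real Redis client; `FeedCache`, its cap, and its method names are hypothetical). Entries are kept sorted by timestamp score, so reading the newest N is a slice rather than an application-level sort:

```python
import bisect

class FeedCache:
    """Toy in-memory stand-in for one user's Redis sorted set."""

    def __init__(self, cap=500):
        self.entries = []   # (timestamp, photo_id) pairs, sorted ascending by score
        self.cap = cap      # mirrors the 500-ID cap per user

    def add(self, photo_id, ts):
        """Insert like ZADD; evict the oldest entry past the cap (like ZREMRANGEBYRANK)."""
        bisect.insort(self.entries, (ts, photo_id))
        if len(self.entries) > self.cap:
            self.entries.pop(0)

    def top(self, n=30):
        """Newest-first photo IDs, like ZREVRANGE feed 0 n-1."""
        return [pid for _, pid in reversed(self.entries[-n:])]

feed = FeedCache()
for ts, pid in [(100, 1), (300, 3), (200, 2)]:
    feed.add(pid, ts)
print(feed.top(2))  # → [3, 2], newest two photo IDs first
```

The real cluster gets the same behavior from ZADD and ZREVRANGE, with the score ordering maintained inside Redis.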
Each entry is a photo_id (8 bytes) scored by timestamp. We cap at 500 photo IDs per user: 500 x 8B = 4KB per user.
At 500M DAU, that is 500M x 4KB = roughly 2TB total. Implication: 2TB fits across a 50-node Redis cluster at 40GB per node, no disk-based caching needed.
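The sizing arithmetic above can be checked directly (the numbers come from the text; the variable names are just for illustration):

```python
DAU = 500_000_000       # daily active users
IDS_PER_USER = 500      # capped feed length
BYTES_PER_ID = 8        # size of one photo_id
NODES = 50              # Redis cluster size

per_user = IDS_PER_USER * BYTES_PER_ID   # 4,000 bytes ≈ 4 KB per user
total = DAU * per_user                   # 2e12 bytes = 2 TB cluster-wide
per_node = total // NODES                # 4e10 bytes = 40 GB per node

print(per_user, total, per_node)  # → 4000 2000000000000 40000000000
```

40 GB per node sits comfortably in RAM on commodity hardware, which is why no disk-based caching tier is needed.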
The two-step pattern is critical: first read IDs from Redis (microseconds), then hydrate each ID into a full photo object from the metadata store (one batched RPC). Why not store full photo objects in the cache?
Because a single viral photo appears in millions of timelines. Storing the full object would multiply storage by the fanout count, turning 2TB into 200TB+.
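A minimal sketch of the two-step read under stated assumptions: the metadata store is mocked as a dict, and in production step 2 would be one batched RPC (for example an MGET against the metadata service). The function and store names here are hypothetical:

```python
def read_feed(cache_ids, metadata_store, n=30):
    """Step 1: take the top-n photo IDs, already newest-first from the cache read.
    Step 2: hydrate every ID into a full photo object in one batched lookup."""
    top_ids = cache_ids[:n]
    return [metadata_store[pid] for pid in top_ids]

# Mock data: IDs as returned newest-first by the cache; objects live once in metadata.
metadata = {3: {"id": 3, "url": "c.jpg"}, 2: {"id": 2, "url": "b.jpg"}}
print(read_feed([3, 2], metadata))
```

Because the cache holds only 8-byte IDs, a viral photo fanned out to millions of timelines still has exactly one full object in the metadata store.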
Why it matters in interviews
Deriving 2TB for 500M users from first principles and sizing the Redis cluster proves you can translate math into infrastructure. Interviewers want to hear why ID-only storage with separate hydration beats storing full objects, and the fanout multiplication math makes the case concrete.