STANDARDwalkthrough

Instagram ID Generation

3 of 8

3 related

At 200M uploads/day (~2,300 writes/sec), we need globally unique, time-sortable IDs without a coordination bottleneck. We chose in-database PL/pgSQL generation (not a centralized service like Twitter's Snowflake) because it eliminates a network hop and a single point of failure.

300

writes/sec saturating MySQL

Each PostgreSQL shard generates IDs using a local sequence, so there is zero cross-shard coordination. Because the timestamp occupies the most significant bits, IDs are naturally time-sortable: a numeric comparison tells you which photo came first without querying a created_at column.

“The 64-bit ID layout: 41 bits for millisecond timestamp since a custom epoch (January 1, 2011, good for 69 years), 13 bits for a logical shard ID (supporting 8,192 shards), and 10 bits for an auto-incrementing sequence (1,024 IDs per millisecond per shard).”

Trade-off: we accept coupling to PostgreSQL. If we ever migrated away from Postgres, we would need to rewrite the ID generator.

Snowflake avoids this coupling but requires deploying and operating a separate high-availability service. What if the interviewer asks: what happens when the sequence overflows 1,024 per millisecond on a single shard?

The shard stalls until the next millisecond. At peak load, 2,300 writes/sec across 8,192 shards means each shard averages 0.28 writes/ms, far below the 1,024 ceiling.

Why it matters in interviews

Interviewers frequently ask about distributed ID generation. Explaining why we embed it inside PostgreSQL versus running an external Snowflake service shows you weigh operational simplicity against architectural coupling, not default to popular solutions.

Related concepts

← PreviousPhoto Storage and CDN Delivery Next →Timeline Cache Architecture