
ID Generator System Design Walkthrough

Complete design walkthrough with animated diagrams, capacity math, API design, schema, and failure modes.

Solution Path (target: 22 min)

We designed a Snowflake-class ID generator: 64-bit IDs composed of 41 timestamp bits (69.7 years from a custom epoch), 10 worker bits (1,024 coordination-free minters), and 12 sequence bits (4,096 IDs/ms/worker, a 4.2B/sec cluster ceiling). In practice the ceiling is RPC overhead rather than bit width, so batching and library embedding are what let 12 workers serve 10M IDs/sec. Backward clock steps are handled by an explicit policy (spin, refuse, or borrow sequence), worker IDs are leased with fencing deadlines, and a 1-in-1,000 SETNX canary continuously proves uniqueness. Ten bytes of state per worker; correctness by refusal.


What is an ID Generator?

Twitter, 2010: the fail-whale era. Tweets lived in a MySQL database whose auto-increment column handed out IDs that were unique, dense, and simple. It was also a single point of failure through which every tweet on Earth was serialized.

Sharding the database meant the counter could no longer be THE counter, and UUIDs, the obvious replacement, would have doubled index size and randomized every B-tree insert (the locality tax: random keys fragment indexes roughly 10x at scale). Twitter needed IDs that were unique without coordination, small enough for an int64, and roughly time-ordered so tweets could still sort by ID. Nothing off the shelf did all three.

The answer, Snowflake, became the pattern half the industry now runs: Discord message IDs, Instagram media IDs (a PostgreSQL-embedded variant), Sony's Sonyflake, Baidu's UidGenerator. The idea fits in one sentence: compose the ID from a timestamp, a machine number, and a counter, so that uniqueness comes from structure (no two machines share worker bits, and no machine reuses a millisecond-sequence pair) rather than from a central authority or from 122 bits of randomness.
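The composition described above can be sketched in a few lines of Python. This is a minimal illustration assuming the 41/10/12 layout used throughout this walkthrough; the function names compose_id and decompose_id are hypothetical, not taken from any particular library.

```python
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12

WORKER_SHIFT = SEQUENCE_BITS                   # worker sits above the sequence
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS  # timestamp sits above both

MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1
MAX_WORKER = (1 << WORKER_BITS) - 1      # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


def compose_id(ms_since_epoch: int, worker_id: int, sequence: int) -> int:
    """Pack the three fields into one integer; the sign bit stays 0."""
    if not 0 <= ms_since_epoch <= MAX_TIMESTAMP:
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= worker_id <= MAX_WORKER:
        raise ValueError("worker_id does not fit in 10 bits")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence does not fit in 12 bits")
    return (ms_since_epoch << TIMESTAMP_SHIFT) | (worker_id << WORKER_SHIFT) | sequence


def decompose_id(snowflake: int) -> tuple[int, int, int]:
    """Recover (ms_since_epoch, worker_id, sequence) from a packed ID."""
    sequence = snowflake & MAX_SEQUENCE
    worker_id = (snowflake >> WORKER_SHIFT) & MAX_WORKER
    ms_since_epoch = snowflake >> TIMESTAMP_SHIFT
    return ms_since_epoch, worker_id, sequence
```

Because the timestamp occupies the most significant bits, comparing two IDs as plain integers compares their timestamps first, which is where the rough time-ordering comes from.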

What makes the topic interview-worthy is not the happy path (it is bit arithmetic) but the two places physics pushes back: clocks are not monotonic (NTP can step backward, and the generator must pause or refuse rather than ever duplicate) and machine identity needs an arbiter (two workers sharing an ID is the one catastrophe; leases with fencing deadlines prevent it). The scope boundary is worth stating: this is the infrastructure ID. For public URLs you may still layer an opaque slug on top (dense IDs leak volume: the German tank problem), and for zero-infrastructure contexts UUIDv7 is the modern compromise: sortable, but still 16 bytes.
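The sketch below shows one way the two guards could look in code, assuming the "refuse" policy for backward clock steps and a lease deadline supplied by some external arbiter. The class name, the exception names, the EPOCH_MS value, and the injectable now_ms clock are all hypothetical choices for illustration. For simplicity the lease is checked against the same wall clock; a production version would probably measure it against a monotonic clock instead.

```python
import threading
import time

EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch, not from the source
WORKER_BITS, SEQUENCE_BITS = 10, 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class ClockMovedBackward(RuntimeError):
    """Raised instead of minting an ID that could duplicate an earlier one."""


class LeaseExpired(RuntimeError):
    """Raised once the worker-ID lease deadline has passed."""


class SnowflakeGenerator:
    def __init__(self, worker_id, lease_deadline_ms, now_ms=None):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._lease_deadline_ms = lease_deadline_ms
        # Injectable clock so the backward-step path can be exercised in tests.
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # "Refuse" policy: fail loudly rather than risk a duplicate.
                raise ClockMovedBackward(
                    f"clock is {self._last_ms - now} ms behind the last minted ID")
            if now == self._last_ms:
                sequence = (self._sequence + 1) & MAX_SEQUENCE
                if sequence == 0:
                    # 4,096 IDs already minted this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                sequence = 0
            if now >= self._lease_deadline_ms:
                # Fencing: past the deadline another worker may own this ID.
                raise LeaseExpired("worker-ID lease deadline reached")
            self._last_ms, self._sequence = now, sequence
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self._worker_id << SEQUENCE_BITS)
                    | sequence)
```

A refused call leaves the generator's state untouched, so minting resumes normally once the clock catches up or the lease is renewed with a new deadline.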

One design, three requirements, two physics problems: the smallest system in this course with the widest blast radius.

Twitter 2010: auto-increment could not shard, UUIDs paid the locality tax. Snowflake: timestamp + machine + counter = uniqueness from structure, no coordination, int64-sized, k-sortable. The hard parts are physics: clocks step backward and machine identity needs an arbiter. Descendants: Discord, Instagram, Sonyflake.

A unique ID generator is the smallest service with the largest blast radius: every row, message, and order in the platform starts with an identity it mints. The requirements (unique, fast, roughly time-ordered) look trivial until you demand all three across 1,000+ machines minting millions of IDs per second with no coordination on the hot path. The Snowflake answer treats a 64-bit integer as a budget: 41 bits of milliseconds buy time-ordering, 10 bits of worker ID buy machine-level uniqueness, and 12 bits of sequence buy per-millisecond burst capacity. The two real engineering problems live at the edges: clocks that step backward and worker IDs that must never collide.

Layout: 1 + 41 + 10 + 12 bits (sign, ms timestamp, worker, sequence); each field buys a guarantee
Ceilings: 4,096 IDs/ms per worker, 1,024 workers, 69.7 years of clock from a custom epoch
The defining failure: clock steps backward -> spin-wait, refuse, or borrow sequence; never duplicate
Real bottleneck is RPC, not bits: ~150K IDs/sec/worker as a service; batching or library embedding closes the gap
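The ceilings above follow directly from the bit widths, and the arithmetic can be checked in a few lines. The variable names are illustrative. The 150K IDs/sec/worker and 10M IDs/sec figures are the ones quoted in this walkthrough, and a 365.25-day year is assumed.

```python
import math

TIMESTAMP_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 10, 12

ids_per_ms_per_worker = 1 << SEQUENCE_BITS            # 4,096
max_workers = 1 << WORKER_BITS                        # 1,024
ids_per_sec_per_worker = ids_per_ms_per_worker * 1000
cluster_ids_per_sec = ids_per_sec_per_worker * max_workers

# Lifetime of the 41-bit millisecond counter, assuming 365.25-day years.
ms_per_year = 1000 * 60 * 60 * 24 * 365.25
years_of_clock = (1 << TIMESTAMP_BITS) / ms_per_year

# RPC-bound service figure and throughput target quoted in the text.
rpc_ids_per_sec_per_worker = 150_000
target_ids_per_sec = 10_000_000

# Workers needed if every ID costs one unbatched RPC.
workers_needed_unbatched = math.ceil(target_ids_per_sec / rpc_ids_per_sec_per_worker)

# Load per worker if the same target is spread across 12 workers.
per_worker_on_12 = target_ids_per_sec / 12

print(f"{cluster_ids_per_sec:,} IDs/sec cluster ceiling")
print(f"{years_of_clock:.1f} years of timestamp range")
print(f"{workers_needed_unbatched} workers needed without batching")
print(f"{per_worker_on_12:,.0f} IDs/sec per worker on 12 workers")
```

Twelve workers each need roughly 833K IDs/sec, about 5.6 times the unbatched RPC figure but well under the 4,096,000 IDs/sec that the sequence bits allow per worker. That gap is the one batching or library embedding has to close.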