EASY walkthrough
Tweet Snowflake ID
How do we generate unique IDs across thousands of servers without coordination? The constraint: auto-incrementing IDs require a single sequence generator, which becomes a bottleneck and a single point of failure at 600,000 posts per second.
We need IDs that are unique, compact, and time-sortable. The Snowflake algorithm produces a 64-bit ID: one unused sign bit followed by three fields. These are 41 bits for a timestamp (milliseconds since a custom epoch, good for about 69 years), 10 bits for a machine ID (supporting 1,024 worker nodes), and 12 bits for a per-machine sequence number.
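The bit layout above can be sketched as a pair of pack and unpack helpers. This is a minimal illustration, not a reference implementation: the names `pack`, `unpack`, and `CUSTOM_EPOCH_MS` are hypothetical, the epoch (2020-01-01 UTC) is an assumed value, and the sketch assumes the remaining high bit of the 64-bit word stays 0.

```python
from typing import Tuple

# Field widths taken from the text above.
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

MACHINE_SHIFT = SEQUENCE_BITS                   # machine ID sits above the sequence
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS  # timestamp sits above both

# Assumed custom epoch for illustration: 2020-01-01T00:00:00Z in milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000


def pack(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Combine the three fields into one integer that fits in 63 bits."""
    elapsed = timestamp_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id outside the 10-bit range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence outside the 12-bit range")
    return (elapsed << TIMESTAMP_SHIFT) | (machine_id << MACHINE_SHIFT) | sequence


def unpack(snowflake_id: int) -> Tuple[int, int, int]:
    """Recover (timestamp_ms, machine_id, sequence) from an ID."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake_id >> MACHINE_SHIFT) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = (snowflake_id >> TIMESTAMP_SHIFT) + CUSTOM_EPOCH_MS
    return timestamp_ms, machine_id, sequence
```

Because the fields never overlap, `unpack(pack(t, m, s))` returns the original triple for any in-range inputs.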
“Random (version 4) UUIDs solve uniqueness but are 128 bits (twice the storage), are not time-sortable, and fragment B-tree indexes because of their randomness.”
The 12-bit sequence allows 4,096 IDs per millisecond per machine. Implication: a single Snowflake node can generate up to 4,096,000 unique IDs per second, and 1,024 nodes together can produce about 4.19 billion, far exceeding the 600,000 posts per second we need to write.
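A minimal generator sketch shows where the 4,096-per-millisecond limit comes from. The class name `SnowflakeGenerator`, the injectable `clock` parameter, the default epoch, and the choice to raise when the clock moves backwards are all assumptions made for illustration; a production generator may handle these cases differently.

```python
import threading
import time

SEQUENCE_BITS = 12
MACHINE_BITS = 10
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

# Capacity figures from the text, derived from the bit widths.
IDS_PER_SECOND_PER_NODE = (1 << SEQUENCE_BITS) * 1000                   # 4,096,000
IDS_PER_SECOND_CLUSTER = IDS_PER_SECOND_PER_NODE * (1 << MACHINE_BITS)  # 4,194,304,000


class SnowflakeGenerator:
    """Sketch of a per-node generator; epoch and clock are assumed parameters."""

    def __init__(self, machine_id, epoch_ms=1_577_836_800_000, clock=None):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id outside the 10-bit range")
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms
        # Default clock: wall-clock time in milliseconds.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Assumed policy: refuse rather than risk a duplicate.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4,096 values used in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            elapsed = now - self.epoch_ms
            return (elapsed << 22) | (self.machine_id << 12) | self._sequence
```

Calling `SnowflakeGenerator(machine_id=3).next_id()` in a loop yields strictly increasing IDs on that node, because the sequence resets only when the millisecond advances.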
Because the timestamp occupies the most significant bits, IDs are naturally time-sortable: a numeric comparison tells us which post came first, to millisecond granularity and within the clock skew between machines, eliminating the need for a secondary index on created_at. We chose Snowflake (not UUIDs) because our timeline cache sorts by post ID, and time-sortable IDs let us use ZREVRANGE on a Redis Sorted Set without maintaining a separate timestamp score.
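The ordering property can be checked with plain integers, without Redis. In this sketch, `make_id`, `created_at`, and `EPOCH_MS` are hypothetical names, and the epoch value is an assumption; the point is only that the machine ID and sequence bits can never outweigh a one-millisecond difference in the timestamp.

```python
from datetime import datetime, timezone

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z


def make_id(timestamp_ms, machine_id, sequence):
    """Build an ID using the 41/10/12 layout described earlier."""
    return ((timestamp_ms - EPOCH_MS) << 22) | (machine_id << 12) | sequence


def created_at(snowflake_id):
    """Recover the creation time from the ID itself, so no created_at column is needed."""
    ms = (snowflake_id >> 22) + EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# The older post has a large machine ID and the maximum sequence number;
# the newer post, one millisecond later, has small values in both fields.
posts = {
    "older": make_id(EPOCH_MS + 1_000, machine_id=900, sequence=4095),
    "newer": make_id(EPOCH_MS + 1_001, machine_id=1, sequence=0),
}

# Sorting by ID in descending order puts the newest post first,
# which is the order a reverse range over a sorted set would return.
newest_first = sorted(posts, key=posts.get, reverse=True)
```

Two IDs minted in the same millisecond on different machines are ordered by machine ID, not by true arrival order, so the ordering is exact only down to the millisecond.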
Trade-off: we gave up the simplicity of UUIDs for a system that requires machine ID assignment and epoch coordination. Discord uses a Snowflake variant for message IDs, and Instagram uses a similar approach with PostgreSQL sequences.
What if the interviewer asks: 'What happens when two machines get the same machine ID?' We assign machine IDs from ZooKeeper (or etcd) leases. If a lease expires, the node stops generating IDs until it re-acquires a unique lease, preventing duplicates at the cost of brief unavailability on that node.
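The lease rule can be sketched with an in-memory stand-in. This does not use the ZooKeeper or etcd client APIs: `MachineIdLease`, `LeasedIdSource`, and `LeaseExpired` are hypothetical names, and the time-to-live and clock are assumed parameters. It only illustrates the behavior described above, where a node refuses to mint IDs once its lease has lapsed.

```python
import time


class LeaseExpired(Exception):
    """Raised when a node tries to mint an ID without a valid lease."""


class MachineIdLease:
    """Hypothetical in-memory stand-in for a ZooKeeper or etcd lease."""

    def __init__(self, machine_id, ttl_seconds, clock=time.monotonic):
        self.machine_id = machine_id
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = clock() + ttl_seconds

    def is_valid(self):
        return self._clock() < self._expires_at

    def renew(self):
        # A lapsed lease cannot be extended: another node may already hold
        # this machine ID, so the caller must acquire a fresh lease instead.
        if not self.is_valid():
            raise LeaseExpired("lease lapsed; acquire a new machine ID")
        self._expires_at = self._clock() + self._ttl


class LeasedIdSource:
    """Wraps an ID-minting function and checks the lease before every call."""

    def __init__(self, lease, make_id):
        self._lease = lease
        self._make_id = make_id  # callable taking the machine ID

    def next_id(self):
        if not self._lease.is_valid():
            raise LeaseExpired("refusing to generate IDs without a valid lease")
        return self._make_id(self._lease.machine_id)
```

The cost matches the trade-off in the answer: while the lease is invalid, `next_id` fails on that node, and callers need to route writes to another node or retry after a new lease is acquired.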
Related concepts