Tweet Snowflake ID

How do we generate unique IDs across thousands of servers without coordination? The constraint: auto-incrementing IDs require a single sequence generator, which becomes a bottleneck and a single point of failure at 600,000 posts per second.

We need IDs that are unique, compact, and time-sortable. The Snowflake algorithm packs each ID into a signed 64-bit integer: 1 unused sign bit (always 0, so IDs stay positive), then 41 bits for a timestamp (milliseconds since a custom epoch, enough for about 69 years), 10 bits for a machine ID (supporting 1,024 worker nodes), and 12 bits for a per-machine sequence number.
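A minimal Python sketch of this layout, for illustration only; `pack_id` and `unpack_id` are hypothetical helper names, not part of any library. The three fields use 41 + 10 + 12 = 63 bits, so the top (sign) bit stays 0 and every ID fits in a signed 64-bit integer column.

```python
# Field widths from the layout above. The sign bit above them is never set.
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1  # about 69.7 years of milliseconds
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1   # 1,023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 4,095

MACHINE_SHIFT = SEQUENCE_BITS                   # machine ID sits above the sequence
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS  # timestamp occupies the top 41 bits


def pack_id(elapsed_ms: int, machine_id: int, sequence: int) -> int:
    """Combine the three fields into one 64-bit ID (top bit stays 0)."""
    if not 0 <= elapsed_ms <= MAX_TIMESTAMP:
        raise ValueError("timestamp out of range")
    if not 0 <= machine_id <= MAX_MACHINE_ID:
        raise ValueError("machine ID out of range")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence out of range")
    return (elapsed_ms << TIMESTAMP_SHIFT) | (machine_id << MACHINE_SHIFT) | sequence


def unpack_id(snowflake: int) -> tuple[int, int, int]:
    """Split an ID back into (elapsed_ms, machine_id, sequence)."""
    elapsed_ms = snowflake >> TIMESTAMP_SHIFT
    machine_id = (snowflake >> MACHINE_SHIFT) & MAX_MACHINE_ID
    sequence = snowflake & MAX_SEQUENCE
    return elapsed_ms, machine_id, sequence
```

For example, `pack_id(1, 1, 1)` returns 4198401, and packing every field at its maximum yields `2**63 - 1`, the largest signed 64-bit value.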

“Random (version 4) UUIDs solve uniqueness but are 128 bits (twice the storage), not time-sortable, and create fragmented B-tree indexes because of their randomness.”

The 12-bit sequence allows 4,096 IDs per millisecond per machine. Implication: a single Snowflake node can generate up to 4,096 × 1,000 = 4,096,000 IDs per second, and 1,024 nodes together up to about 4.19 billion (4,096,000 × 1,024), far exceeding our write throughput of 600,000 posts per second.
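The per-millisecond budget is where the generator logic lives. Below is a minimal single-process sketch in Python, assuming a hypothetical custom epoch of 2020-01-01 UTC and an injectable clock; `SnowflakeGenerator` and its parameters are illustrative names. When the 12-bit sequence wraps within one millisecond, the generator busy-waits for the clock to tick, and it refuses to issue IDs if the clock moves backwards, since reusing an earlier timestamp could repeat an ID.

```python
import threading
import time

# Assumed layout: 41-bit timestamp | 10-bit machine ID | 12-bit sequence.
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1,023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4,095
MACHINE_SHIFT = SEQUENCE_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in Unix milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000


def wall_clock_ms() -> int:
    """Current Unix time in whole milliseconds."""
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Single-node sketch: at most 4,096 IDs per millisecond."""

    def __init__(self, machine_id: int, clock=wall_clock_ms):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine ID must fit in 10 bits")
        self.machine_id = machine_id
        self._clock = clock            # injectable so tests can fake time
        self._lock = threading.Lock()  # threads in this process share one sequence
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock went backwards: an old timestamp could repeat an ID.
                raise RuntimeError("clock moved backwards; refusing to generate")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4,096 values used this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0  # new millisecond, restart the sequence
            self._last_ms = now
            elapsed_ms = now - CUSTOM_EPOCH_MS
            return ((elapsed_ms << TIMESTAMP_SHIFT)
                    | (self.machine_id << MACHINE_SHIFT)
                    | self._sequence)
```

Usage would be `gen = SnowflakeGenerator(machine_id=7)` followed by `gen.next_id()` per post. The lock keeps one consistent sequence per process; running several generator processes on one host would require giving each its own machine ID.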

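Because the timestamp occupies the most significant bits, IDs are naturally time-sortable: a numeric comparison tells us which post came first (to millisecond precision, and assuming server clocks are kept in sync, for example with NTP), eliminating the need for a secondary index on created_at. We chose Snowflake (not UUIDs) because our timeline cache sorts by post ID, and time-sortable IDs let us use ZREVRANGE on a Redis Sorted Set with the ID itself as the score, without maintaining a separate timestamp score. One caveat: Redis stores scores as double-precision floats, which represent integers exactly only up to 2^53, so a full 64-bit ID used as a score is rounded, and members with equal scores are then ordered lexicographically by member.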
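To illustrate, here is a Python sketch assuming a hypothetical custom epoch of 2020-01-01 UTC; `make_id` and `created_at` are illustrative helpers. Comparing two IDs compares their timestamps first, so a post from the highest machine ID and sequence still sorts before any post made a millisecond later, and the creation time can be derived from the ID instead of being stored and indexed separately.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical custom epoch; a real system fixes this once and never changes it.
CUSTOM_EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)
TIMESTAMP_SHIFT = 22  # 10 machine-ID bits + 12 sequence bits sit below the timestamp
MACHINE_SHIFT = 12


def make_id(elapsed_ms: int, machine_id: int, sequence: int) -> int:
    """Illustrative helper: pack the fields (no range checks)."""
    return (elapsed_ms << TIMESTAMP_SHIFT) | (machine_id << MACHINE_SHIFT) | sequence


def created_at(snowflake: int) -> datetime:
    """Recover the creation time, to the millisecond, from the top bits."""
    return CUSTOM_EPOCH + timedelta(milliseconds=snowflake >> TIMESTAMP_SHIFT)


# An earlier post from the highest machine ID and sequence...
earlier = make_id(1_000, machine_id=1023, sequence=4095)
# ...still sorts before a post made one millisecond later on machine 0.
later = make_id(1_001, machine_id=0, sequence=0)

newest_first = sorted([earlier, later], reverse=True)  # timeline order
print(created_at(later))  # 2020-01-01 00:00:01.001000+00:00
```

Within a single millisecond, IDs from different machines are ordered by machine ID rather than by true arrival order, which is why the ordering is only as fine as the timestamp.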

Trade-off: we gave up the simplicity of UUIDs for a system that requires machine ID assignment and epoch coordination. Discord uses a Snowflake variant for message IDs, and Instagram uses a similar approach with PostgreSQL sequences.

What if the interviewer asks: 'What happens when two machines get the same machine ID?' We assign machine IDs from ZooKeeper (or etcd) leases. If a lease expires, the node stops generating IDs until it re-acquires a unique lease, preventing duplicates at the cost of brief unavailability on that node.
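A minimal sketch of the lease guard in Python. `MachineLease`, `LeaseGuardedGenerator`, and the safety margin are hypothetical stand-ins for whatever ZooKeeper or etcd client the service actually uses; no real client API is shown. The key property is that the node checks its lease before every ID and stops generating as soon as it can no longer be sure it still owns its machine ID.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MachineLease:
    """Hypothetical lease record, as it might be mirrored from ZooKeeper or etcd."""
    machine_id: int
    expires_at: float  # deadline on this node's time.monotonic() clock


class LeaseExpiredError(RuntimeError):
    """Raised when the node can no longer prove it owns its machine ID."""


class LeaseGuardedGenerator:
    def __init__(self, lease: MachineLease,
                 generate: Callable[[int], int],
                 safety_margin_s: float = 1.0):
        self._lease = lease
        self._generate = generate  # e.g. a Snowflake next_id for the given machine ID
        # Stop slightly before the deadline to reduce the chance that clock drift
        # lets two nodes believe they own the same machine ID at once.
        self._safety_margin_s = safety_margin_s

    def renew(self, new_lease: MachineLease) -> None:
        """Install a lease re-acquired from the coordination service."""
        self._lease = new_lease

    def next_id(self) -> int:
        if time.monotonic() >= self._lease.expires_at - self._safety_margin_s:
            # Brief unavailability on this node instead of possible duplicates.
            raise LeaseExpiredError(
                f"lease for machine ID {self._lease.machine_id} expired")
        return self._generate(self._lease.machine_id)
```

After `renew`, the node may hold a different machine ID than before, so the wrapped generator always receives the ID from the current lease rather than caching it.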

Why it matters in interviews

Snowflake-style IDs are a common building block in distributed systems. Explaining the bit layout and why time-sortability removes the need for a separate timestamp index shows we understand distributed ID generation at a practical level, not just a textbook one.
