Consistent Hashing System Design Walkthrough
Complete design walkthrough with animated diagrams, capacity math, API design, schema, and failure modes.
Solution Path (target: 25 min)
We designed a distributed key-value cache for 500M DAU handling 115K ops/sec using consistent hashing with 256 virtual nodes per server. 25,600 ring positions in 307 KB enable 15-comparison binary search lookups. When a node fails, only 1% of keys (5M of 500M) remap, and RF=3 replication means the successor already holds the data. Virtual nodes reduce per-node load variance from 250% to under 10%. Hot keys are split across 10 sub-keys. Ring topology in MySQL; cache data in Redis with sub-millisecond lookups.
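The hot-key mitigation mentioned above (splitting one hot key across 10 sub-keys) can be sketched as follows. This is a minimal illustration, not the production implementation: the key names and helper functions are hypothetical. Writes fan out to every sub-key so that a read of any one shard sees fresh data, while reads pick a random shard so no single ring position absorbs all the traffic.

```python
import random


def shard_hot_key(key: str, num_shards: int = 10) -> str:
    """Pick one shard-suffixed sub-key for a read of a hot key.

    Spreading reads across num_shards sub-keys moves the load off
    a single ring position. (Hypothetical naming: "key#shard".)
    """
    return f"{key}#{random.randrange(num_shards)}"


def all_sub_keys(key: str, num_shards: int = 10) -> list[str]:
    """Every sub-key a write must update so any read shard is fresh."""
    return [f"{key}#{i}" for i in range(num_shards)]
```

The trade-off is write amplification: each update of the hot key becomes 10 cache writes, which is acceptable precisely because hot keys are read-dominated.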
1. What is Consistent Hashing?
Amazon DynamoDB uses consistent hashing to distribute tens of billions of requests per day across thousands of storage nodes. We are designing a distributed key-value cache for 500M DAU that maps keys to servers using a hash ring.
When a node joins or leaves, only 1/N of keys move instead of all of them. The real challenge is keeping load variance under 10% across nodes while handling 115K cache ops/sec with sub-millisecond routing.
Without virtual nodes, 3 servers produce arc sizes varying by 250%, meaning one server handles 6x the traffic of another. The second challenge is rebalancing speed: when a node fails at peak traffic, the successor must absorb 5M keys within seconds or the database gets crushed by cache misses.
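The mechanics described above can be sketched as a ring class with virtual nodes and a binary-search lookup. This is a minimal illustration under assumed choices (MD5-derived 64-bit positions, a sorted list of `(position, server)` pairs, and the `"server#vnN"` vnode naming are all assumptions, not the production layout):

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes.

    Each physical server owns `vnodes` positions on the ring; a key
    is routed to the first vnode at or after its hash position,
    wrapping around at the end of the ring.
    """

    def __init__(self, vnodes: int = 256):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, server)

    @staticmethod
    def _hash(key: str) -> int:
        # 64-bit position derived from MD5 (an assumed choice).
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_server(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#vn{i}"), server))

    def remove_server(self, server: str) -> None:
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def lookup(self, key: str) -> str:
        # Binary search for the first vnode at or after the key's position.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

With 100 servers at 256 vnodes each, the sorted list holds 25,600 entries, so `bisect` resolves a lookup in about 15 comparisons, matching the numbers in the summary. Removing a server deletes only its own vnodes; every other key's successor vnode is unchanged, which is why only that server's keys remap.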
Distributed cache for 500M DAU at 115K ops/sec. Two hard problems: load variance under 10% with virtual nodes, and rebalancing 5M keys on failure without crushing the database.
- Naive modular hashing (hash % N) remaps 99% of keys on a single node failure; consistent hashing remaps only 1/N
- Without virtual nodes, 3 servers produce arc sizes varying by 250%; with 256 vnodes each, variance drops under 10%
- Akamai invented consistent hashing in 1997 to route CDN requests across 300,000+ edge servers
- Ring metadata fits in 307 KB for 100 nodes x 256 vnodes; binary search lookup takes 15 comparisons
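The first bullet's claim can be checked with a quick simulation. This sketch assumes MD5-derived positions and hypothetical server names; it compares how many keys remap when one of 100 servers is removed under modular hashing versus a consistent-hashing ring:

```python
import bisect
import hashlib


def h(key: str) -> int:
    # 64-bit position from MD5 (an assumed hash choice for the demo).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(servers: list[str], vnodes: int = 256) -> list[tuple[int, str]]:
    """Sorted (position, server) pairs: vnodes entries per server."""
    return sorted((h(f"{s}#vn{i}"), s) for s in servers for i in range(vnodes))


def lookup(ring: list[tuple[int, str]], key: str) -> str:
    idx = bisect.bisect_left(ring, (h(key), ""))
    return ring[idx % len(ring)][1]


keys = [f"user:{i}" for i in range(100_000)]
servers = [f"node-{i}" for i in range(100)]

# Modular hashing: going from 100 to 99 servers changes hash % N
# for nearly every key.
moved_mod = sum(1 for k in keys if h(k) % 100 != h(k) % 99)

# Consistent hashing: removing one server remaps only the keys
# whose successor vnode belonged to that server (~1/100).
ring_before = build_ring(servers)
ring_after = build_ring(servers[:-1])
moved_ch = sum(1 for k in keys if lookup(ring_before, k) != lookup(ring_after, k))

print(f"modular:    {moved_mod / len(keys):.0%} of keys remapped")
print(f"consistent: {moved_ch / len(keys):.0%} of keys remapped")
```

The modular scheme remaps on the order of 99% of keys, while the ring remaps roughly 1%, the 5M-of-500M figure from the summary.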