Consistent Hashing / Distributed Cache
VERY COMMON: Consistent hashing is asked at every FAANG company because it is the partition-routing algorithm behind DynamoDB, Cassandra, and Discord's 200M-user message routing. You will design a distributed key-value cache for 500M DAU where adding or removing a node remaps only 1/N of keys instead of reshuffling everything. You will implement virtual nodes that reduce load variance from 250% to under 10%, and design ring-based replication with tunable consistency.
- Design a hash ring with virtual nodes that limits key remapping to 1/N on topology changes
- Solve the hot key problem that causes 80% load imbalance across cache nodes
- Avoid the naive modular hashing trap that reshuffles 99% of keys on node failure
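As a warm-up for the traps above, here is a minimal sketch (assuming md5 as the hash function and a 10-node cluster) contrasting naive mod-N routing with a bare hash ring when one node is removed:

```python
import bisect
import hashlib

def h64(s: str) -> int:
    """Stable 64-bit hash; md5 is an assumption, any uniform hash works."""
    return int(hashlib.md5(s.encode()).hexdigest()[:16], 16)

class Ring:
    """Minimal hash ring, one position per node (virtual nodes come later)."""
    def __init__(self, nodes):
        self.points = sorted((h64(n), n) for n in nodes)
        self.hashes = [p for p, _ in self.points]

    def owner(self, key: str) -> str:
        # First node clockwise from the key's position (wrap with modulo).
        i = bisect.bisect_right(self.hashes, h64(key)) % len(self.points)
        return self.points[i][1]

nodes = [f"node-{i}" for i in range(10)]
keys = [f"user:{i}" for i in range(10_000)]

# Naive mod-N routing: removing a node renumbers almost every key.
moved_mod = sum(h64(k) % 10 != h64(k) % 9 for k in keys) / len(keys)

# Ring routing: only keys in the removed node's arc move.
before, after = Ring(nodes), Ring(nodes[:-1])
moved_ring = sum(before.owner(k) != after.owner(k) for k in keys) / len(keys)

print(f"mod-N moved {moved_mod:.0%} of keys; ring moved {moved_ring:.0%}")
```

With mod-N, a key stays put only when its hash happens to agree modulo both 10 and 9, so almost everything moves; on the ring, only the removed node's arc remaps.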
Visual Solutions
Step-by-step animated walkthroughs with capacity estimation, API design, database schema, and failure modes built in.
Cheat Sheet
Key concepts, trade-offs, and quick-reference notes for Consistent Hashing. Everything you need at a glance.
Anti-Patterns
Common design mistakes candidates make. Wrong approaches vs correct approaches for each trap.
Failure Modes
What breaks in production, how to detect it, and how to fix it. Detection metrics, mitigations, and severity ratings.
Start simple. Build to staff-level.
“I would design a distributed key-value cache for 500M DAU handling 115K ops/sec using consistent hashing for partition routing. Each of 100 physical nodes owns 256 virtual positions on a hash ring, totaling 25,600 entries in 307 KB. When a node fails, only 1% of keys (5M of 500M) remap to the successor, and RF=3 replication means the successor already holds the data, so zero misses reach the database. Virtual nodes reduce per-node load variance from 250% to under 10%. Hot keys are split across 10 sub-keys. Ring topology in MySQL; cache data in Redis with sub-millisecond lookups.”
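The headline numbers in that pitch can be sanity-checked with a few lines of arithmetic (the ~20 cache ops per user per day and the 12-byte ring entry are assumptions chosen to reproduce the quoted figures):

```python
import math

dau = 500_000_000
ops_per_user_per_day = 20                     # assumption: ~20 cache ops/user/day
ops_per_sec = dau * ops_per_user_per_day / 86_400
print(f"{ops_per_sec:,.0f} ops/sec")          # ≈ 115,741 → the quoted 115K ops/sec

nodes, vnodes = 100, 256
entries = nodes * vnodes                      # 25,600 ring entries
ring_bytes = entries * 12                     # assumption: 8 B hash + 4 B node id
print(f"{entries} entries, {ring_bytes / 1000:.0f} KB")   # 25,600 entries, 307 KB

lookups = math.ceil(math.log2(entries))       # binary search over the ring
print(f"{lookups} comparisons per lookup")    # 15
```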
Hash Ring Basics
EASY: Keys and servers map onto a circular hash space; a key is served by the first server clockwise from its position. On node removal, only 1/N of keys remap. The technique was introduced by Karger et al. in 1997 and powers Akamai's 300,000+ CDN servers.
Core Feature Design: Virtual Nodes
STANDARD: Each server places 256 positions on the ring; 100 servers yield 25,600 total vnodes and cut load variance from 250% to under 10%. Ring metadata: ~307 KB; lookup: ~15 binary-search comparisons. 256 is also Cassandra's historical default token count.
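A rough simulation of the vnode effect, assuming md5 and a "node-i#j" string scheme for vnode positions, measuring how far the hottest node sits above the mean:

```python
import bisect
import hashlib
from collections import Counter

def h64(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest()[:16], 16)

def build_ring(nodes, vnodes):
    # Each physical node claims `vnodes` positions: hash("node-3#17"), etc.
    return sorted((h64(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

def max_over_mean(ring, keys):
    """Hottest node's key count relative to the mean (1.0 = perfectly even)."""
    hashes = [p for p, _ in ring]
    counts = Counter()
    for k in keys:
        i = bisect.bisect_right(hashes, h64(k)) % len(ring)
        counts[ring[i][1]] += 1
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean

nodes = [f"node-{i}" for i in range(10)]
keys = [f"key:{i}" for i in range(100_000)]
print("1 vnode/node  :", round(max_over_mean(build_ring(nodes, 1), keys), 2))
print("256 vnodes/node:", round(max_over_mean(build_ring(nodes, 256), keys), 2))
```

With one position per node, arc sizes are wildly uneven; 256 positions per node average out the arcs, pulling the hottest node close to the mean.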
Core Feature Design: Node Addition & Removal
STANDARD: Two-phase bootstrap: copy data in the background, then atomically switch the ring. Transfers ~5 GB in ~42 seconds while the old owner keeps serving reads during migration. Zero downtime.
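The transfer figures follow from two assumptions, roughly 500 GB of hot data cluster-wide and ~1 Gbps of effective NIC bandwidth:

```python
total_cache_gb = 500                       # assumption: ~500 GB hot data cluster-wide
nodes = 100
moved_gb = total_cache_gb / (nodes + 1)    # a new node takes ~1/(N+1) of the keys
print(f"{moved_gb:.1f} GB to transfer")    # ≈ 5.0 GB

nic_mb_per_sec = 120                       # assumption: ~1 Gbps effective throughput
seconds = moved_gb * 1024 / nic_mb_per_sec
print(f"~{seconds:.0f} s background copy") # ≈ 42 s
```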
Core Feature Design: Replication on the Ring
STANDARD: Walk clockwise and collect RF distinct physical nodes, skipping vnodes on the same host. Rack-aware placement also skips same-rack nodes. With RF=3, a single node failure causes zero cache misses.
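A sketch of the clockwise replica walk (host-only version; a rack-aware variant would also track racks and keep skipping same-rack vnodes until RF racks are covered):

```python
import bisect
import hashlib

def h64(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest()[:16], 16)

def preference_list(ring, key, rf):
    """Walk clockwise from the key's position, collecting rf distinct hosts."""
    hashes = [p for p, _ in ring]
    i = bisect.bisect_right(hashes, h64(key)) % len(ring)
    owners = []
    for step in range(len(ring)):
        node = ring[(i + step) % len(ring)][1]
        if node not in owners:          # skip vnodes of hosts already chosen
            owners.append(node)
            if len(owners) == rf:
                break
    return owners

nodes = [f"node-{i}" for i in range(10)]
ring = sorted((h64(f"{n}#{v}"), n) for n in nodes for v in range(64))
replicas = preference_list(ring, "user:42", rf=3)
print(replicas)   # three distinct physical hosts
```

The distinct-host check matters: without it, two adjacent vnodes of the same machine would silently collapse RF=3 down to two real copies.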
Replication and Fault Tolerance: Hot Key Problem
TRICKY: Vnodes balance key count, not request volume; a viral key can send 92K reads/sec to one node. Fix: split it into 10 sub-keys. Detect via per-node CPU divergence.
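One way the sub-key split can look, with hypothetical put_hot/get_hot helpers and a plain dict standing in for the cache:

```python
import random

SPLITS = 10   # split factor from the section above

def put_hot(cache, key, value):
    # Writes fan out to every sub-key, so a read of any sub-key sees the value.
    for i in range(SPLITS):
        cache[f"{key}#{i}"] = value

def get_hot(cache, key):
    # Each reader picks a random sub-key; the sub-keys hash to different ring
    # positions, spreading a viral key's reads over ~SPLITS nodes.
    return cache[f"{key}#{random.randrange(SPLITS)}"]

cache = {}
put_hot(cache, "viral:post:42", "payload")
print(get_hot(cache, "viral:post:42"))   # payload
```

The trade-off is a 10x write amplification on the split keys, which is why splitting is reserved for keys that detection (per-node CPU divergence) has flagged as hot.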
Replication and Fault Tolerance: Bounded-Load Consistent Hashing
TRICKY: Google 2017: cap each node at ⌈c·m/n⌉ (m keys, n nodes, c ≈ 1.25). Keys that would overflow a full node walk clockwise to the next one. All keys are handled uniformly with no per-key metadata. Trade-off: a 2-5% cache hit ratio reduction.
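A sketch of the bounded-load walk; the capacity and load counters here live in one process, whereas a real cluster would track loads via shared counters or gossip:

```python
import bisect
import hashlib
import math

def h64(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest()[:16], 16)

def place(ring, hashes, loads, key, cap):
    """Normal clockwise walk, but skip any node already at its load cap."""
    i = bisect.bisect_right(hashes, h64(key)) % len(ring)
    for step in range(len(ring)):
        node = ring[(i + step) % len(ring)][1]
        if loads[node] < cap:
            loads[node] += 1
            return node
    raise RuntimeError("all nodes at capacity")

nodes = [f"n{i}" for i in range(5)]
ring = sorted((h64(f"{n}#{v}"), n) for n in nodes for v in range(64))
hashes = [p for p, _ in ring]

m, c = 10_000, 1.25
cap = math.ceil(c * m / len(nodes))    # ceil(c * m / n): no node exceeds this
loads = {n: 0 for n in nodes}
for k in range(m):
    place(ring, hashes, loads, f"key:{k}", cap)
print(max(loads.values()), "<=", cap)
```

The overflow walk is what costs the 2-5% hit ratio: a bounced key lands on a node that has never cached it before.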
Core Feature Design: Jump Consistent Hashing
STANDARD: Google 2014: O(log n) time, zero memory, near-perfect distribution. But it requires sequential bucket numbering, so you cannot remove an arbitrary node, only shrink from the top. Better suited to static partitioning.
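Jump hash itself is only a few lines; this is the published Lamping & Veach algorithm, with masking to emulate 64-bit unsigned overflow in Python:

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Lamping & Veach (2014): O(log n) expected time, no ring state at all."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

# Growing 10 -> 11 buckets moves only ~1/11 of keys...
moved = sum(jump_hash(k, 10) != jump_hash(k, 11) for k in range(100_000)) / 100_000
print(f"{moved:.1%} of keys moved")

# ...but buckets are numbered 0..n-1, so you can only shrink from the top;
# "remove bucket 3 of 11" is not expressible, unlike on a ring.
```

When the bucket count grows, every key either stays in its old bucket or jumps to the new highest-numbered one, which is exactly why arbitrary node removal is impossible.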
Core Feature Design: Consistent Hashing in CDNs
EASY: Akamai, Cloudflare, and Fastly use two-tier rings: the first maps content to a POP (by geography), the second maps to a server within the POP (by load). Discord routes 200M users via a guild_id ring.
High Level Design