Consistent Hashing Cheat Sheet

Key concepts, trade-offs, and quick-reference notes for your interview prep.

Modular Hashing Trap: hash % N Reshuffles 99%

#1
With modular hashing, changing N from 100 to 99 (one node failure) remaps 99/100 = 99% of keys. At 500M keys, that is 495M cache misses hitting the database simultaneously. Consistent hashing remaps only 1/N: 500M / 100 = 5M keys (1%). The ring maps keys and servers onto the same circular hash space. A key is served by the first server clockwise from its hash position.

💡 hash % N remaps 99% on node change. Ring remaps 1/N = 1%.

Using modular hashing for dynamic clusters. It works in static environments but causes cache avalanches whenever topology changes.
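The remap fractions above can be checked with a short simulation. This is a sketch, not production code: the `user:i` and `node:i` labels are made up, MD5 stands in for whatever ring hash you use, and the ring here uses one position per node for brevity.

```python
import bisect, hashlib

def h(s: str) -> int:
    # Stable 32-bit hash; Python's built-in hash() is salted per process
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

keys = [f"user:{i}" for i in range(20_000)]

# Modular hashing: shrink from 100 to 99 nodes
moved_mod = sum(h(k) % 100 != h(k) % 99 for k in keys) / len(keys)

# Ring (one position per node for brevity): remove one node
def owner(node_hashes, key):
    i = bisect.bisect_right(node_hashes, h(key)) % len(node_hashes)
    return node_hashes[i]

ring = sorted(h(f"node:{n}") for n in range(100))
smaller = [p for p in ring if p != h("node:0")]
moved_ring = sum(owner(ring, k) != owner(smaller, k) for k in keys) / len(keys)

print(f"modular: {moved_mod:.0%} moved  ring: {moved_ring:.1%} moved")
```

Modular hashing moves roughly 99% of keys; the ring moves only the arc the failed node owned, on the order of 1%.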

Virtual Nodes: 256 Per Server Reduces Variance to <10%

#2
Three physical nodes on a ring produce wildly uneven arcs: load variance reaches 250%. With 256 virtual nodes per server, 100 servers produce 25,600 positions. The law of large numbers smooths arc lengths to under 10% variance. Why 256? Fewer (32) leave 30% variance; more (1024) quadruple metadata with diminishing returns. Cassandra's num_tokens defaulted to 256 for the same reason.

💡 256 vnodes/server. 25,600 total positions. <10% variance.

Using 1 ring position per physical node. With 3 nodes, one server can own 60% of the ring while another owns 10%.
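The smoothing effect is easy to observe directly. A sketch, with made-up vnode labels like `n0-v0` and MD5 as the ring hash; `max_share` reports the largest physical node's slice of the ring relative to the ideal 1/nodes.

```python
import hashlib

def pos(s: str) -> float:
    # Map a label to a position in [0, 1) on the ring
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big") / 2**64

def max_share(nodes: int, vnodes: int) -> float:
    # Largest physical node's share of the ring, relative to the ideal 1/nodes
    points = sorted((pos(f"n{n}-v{v}"), n)
                    for n in range(nodes) for v in range(vnodes))
    share = [0.0] * nodes
    # Each arc ends at a vnode; that vnode's physical node owns the arc
    for (p, owner), (prev, _) in zip(points, [points[-1]] + points[:-1]):
        share[owner] += (p - prev) % 1.0
    return max(share) * nodes

print(max_share(3, 1), max_share(100, 256))  # few positions -> wild imbalance
```

With 3 nodes and 1 position each, the hottest node can own a multiple of its fair share; with 25,600 positions the worst node stays close to 1.0x.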

Ring Lookup: O(log N) Binary Search on 307 KB

#3
Ring metadata: 25,600 vnodes x 12 bytes (4B hash + 8B node_id) = 307 KB. This fits in L2 cache, so lookups never touch main memory. Binary search: log2(25,600) ≈ 15 comparisons at ~1ns each = 15 nanoseconds per lookup. At 115K ops/sec, the gateway spends less than 2ms total per second on ring lookups. The sorted array is rebuilt only on topology changes (rare).

💡 307 KB ring, 15 comparisons, 15 nanoseconds. Fits in L2 cache.

Using a linked list for the ring. Linked list lookup is O(N) = 25,600 comparisons per request. Sorted array with binary search is O(log N) = 15.
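A minimal sketch of the sorted-array lookup using Python's `bisect`; the hash values and node IDs are toy placeholders.

```python
import bisect

class Ring:
    def __init__(self, points):
        # points: (hash, node_id) pairs; rebuilt only on topology change
        self.hashes, self.nodes = zip(*sorted(points))

    def lookup(self, key_hash: int) -> str:
        # First vnode clockwise from the key's position (wraps past the end)
        i = bisect.bisect_right(self.hashes, key_hash) % len(self.hashes)
        return self.nodes[i]

ring = Ring([(100, "a"), (200, "b"), (300, "c")])
print(ring.lookup(150), ring.lookup(350))  # -> b a
```

`lookup(350)` wraps around the end of the array back to the first position, which is what makes the hash space circular.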

Rebalancing Cost: 5 GB Transfer in 42 Seconds

#4
Adding a node to a 100-node cluster transfers 525 GB / 101 ≈ 5.2 GB from existing nodes. At 1 Gbps sustained: 5.2 GB x 8 / 1 Gbps ≈ 42 seconds. During rebalancing, the old owner continues serving reads. We use two-phase bootstrap: (1) copy data in the background, (2) atomically update the ring topology. Why not go faster? During peak traffic the transfer is throttled to 500 Mbps to protect donor nodes from I/O saturation.

💡 5 GB per node add. 42 seconds at 1 Gbps. Two-phase bootstrap.

Rebalancing without bandwidth throttling. The donor node becomes I/O-saturated, degrading live cache performance for all keys it serves.
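The back-of-envelope numbers, spelled out (unit conversion only: 1 GB is 8 gigabits):

```python
total_gb = 525          # total dataset across the cluster, in GB
nodes_after = 101       # 100 existing nodes + 1 new node
link_gbps = 1.0         # sustained transfer rate

transfer_gb = total_gb / nodes_after      # share the new node takes over
seconds = transfer_gb * 8 / link_gbps     # GB -> gigabits, then divide by rate

print(f"{transfer_gb:.1f} GB in {seconds:.0f} s")  # -> 5.2 GB in 42 s
```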

Replication Placement: Walk N Distinct Physical Nodes Clockwise

#5
With RF=3, a key is stored on 3 distinct physical nodes found by walking clockwise from the key's ring position. We skip vnodes belonging to already-selected physical nodes because two consecutive vnodes might belong to the same server. Blindly taking the next 2 positions could place all 3 replicas on one machine. For rack-aware placement, also skip nodes in the same rack.

💡 Walk clockwise, skip same-physical-node vnodes, collect RF distinct hosts.

Replicating to the next RF-1 ring positions without checking physical node identity. Two adjacent vnodes on the same server means one machine failure loses all replicas.
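The clockwise walk with the skip rule, as a sketch. The vnode layout is a toy example, and the function assumes at least RF distinct physical nodes exist (a real implementation needs a guard for smaller clusters).

```python
import bisect

def replicas(points, key_hash, rf=3):
    # points: sorted (vnode_hash, physical_node_id) pairs
    hashes = [h for h, _ in points]
    chosen = []
    i = bisect.bisect_right(hashes, key_hash)
    while len(chosen) < rf:          # assumes >= rf distinct physical nodes
        node = points[i % len(points)][1]
        if node not in chosen:       # skip vnodes of already-selected hosts
            chosen.append(node)
        i += 1
    return chosen

pts = [(10, "A"), (20, "A"), (30, "B"), (40, "C"), (50, "B")]
print(replicas(pts, 5))   # -> ['A', 'B', 'C']: the second "A" vnode is skipped
```

Without the `node not in chosen` check, the walk from hash 5 would pick positions 10 and 20, putting two replicas on host A.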

Hot Key Splitting: Scatter into 10 Sub-Keys

#6
Virtual nodes balance key count, not request volume. A viral key receiving 80% of reads overloads its primary node. Detection: per-node CPU divergence exceeds 3x cluster average. Fix: split into 10 sub-keys (key:0 through key:9). Client appends rand(0,9), spreading reads across up to 10 nodes. Writes update all 10 (10x write amplification, acceptable for read-heavy hot keys).

💡 10 sub-keys. Random suffix on reads. Detect via CPU divergence.

Assuming virtual nodes solve hot key problems. Vnodes balance key distribution, not access distribution. One key with 80% of reads still hits one node.
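A sketch of the client-side split. The `key:suffix` naming and `SPLITS = 10` follow the numbers above; `random` stands in for whatever RNG the client uses.

```python
import random

SPLITS = 10  # key:0 .. key:9

def read_key(hot_key: str) -> str:
    # Each read picks a random sub-key, spreading load over up to 10 nodes
    return f"{hot_key}:{random.randrange(SPLITS)}"

def write_keys(hot_key: str) -> list[str]:
    # Writes must update every sub-key: 10x write amplification
    return [f"{hot_key}:{i}" for i in range(SPLITS)]

print(read_key("viral_video"))         # e.g. viral_video:7
print(len(write_keys("viral_video")))  # 10
```

Because each sub-key hashes to a different ring position, the 80%-of-reads key now looks like ten ordinary keys to the ring.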

Jump Hash vs Ring: Zero Memory but No Arbitrary Removal

#7
Google's Jump Hash (2014) maps keys to N buckets in O(ln N) time with zero memory. Distribution is perfectly uniform. But it requires sequential bucket numbering (0 to N-1). Removing bucket 47 from the middle means renumbering the buckets after it, cascading remaps across their keys. We chose ring + vnodes (not jump hash) because cache nodes fail at arbitrary positions. Jump hash is better for static assignments like log partition numbering.

💡 Jump hash: O(ln N), 0 memory, perfect distribution, no arbitrary removal. Ring: 307 KB, arbitrary removal.

Using jump hash for a cache cluster where nodes can fail. Jump hash requires sequential numbering and cannot handle arbitrary node removal without cascading remaps.
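The published algorithm is tiny. This is a Python transcription of the Lamping-Veach pseudocode, with 64-bit overflow emulated by a mask; the demo below shows the key property that growing N only moves keys into the new last bucket.

```python
def jump_hash(key: int, num_buckets: int) -> int:
    # O(ln N) expected iterations, no ring state at all
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b

# Growing 100 -> 101 buckets: only ~1/101 of keys move, all into bucket 100
moved = [k for k in range(10_000) if jump_hash(k, 100) != jump_hash(k, 101)]
print(len(moved))
```

Note the asymmetry: adding a bucket at the end is cheap, but there is no way to express "remove bucket 47" without renumbering, which is exactly why it fits static assignments and not failing cache nodes.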

Bounded Load: Cap at (1+epsilon) x Average

#8
Google's bounded-load consistent hashing (2017) caps each node at (1 + ε) x avg_load. When a key hashes to an overloaded node, walk clockwise to the next node below the cap. With ε = 0.25, no node exceeds 125% of average. Vimeo adopted this for CDN caching, reducing miss rates by 50% during spikes. Trade-off: key-to-node mapping becomes load-dependent, reducing cache hit ratio by 2-5%.

💡 Cap at (1+epsilon) x average. epsilon=0.25 means max 125%. Dynamic routing reduces hits by 2-5%.

Setting epsilon too low (e.g., 0.01). Requests cascade through many nodes before finding capacity, adding latency and defeating the purpose of consistent hashing.
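The overflow walk, sketched. Load numbers here are toy values; a real implementation tracks live request counts per node and keeps the walk short by choosing ε sensibly.

```python
import bisect

def bounded_lookup(hashes, nodes, loads, key_hash, epsilon=0.25):
    # Cap each node at (1 + epsilon) x average load
    cap = (1 + epsilon) * sum(loads.values()) / len(loads)
    i = bisect.bisect_right(hashes, key_hash)
    for step in range(len(hashes)):
        node = nodes[(i + step) % len(hashes)]
        if loads[node] < cap:        # first node under the cap, clockwise
            return node
    raise RuntimeError("every node is at capacity")

hashes, nodes = [100, 200, 300], ["a", "b", "c"]
loads = {"a": 9, "b": 12, "c": 4}                 # avg ~8.3, cap ~10.4
print(bounded_lookup(hashes, nodes, loads, 150))  # "b" is over cap -> "c"
```

This is also where a too-small ε hurts: the `for` loop degenerates into a long walk over mostly-full nodes.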

Ring Convergence: Versioned Epochs with 5-Second Poll

#9
The ring coordinator assigns a monotonically increasing epoch number to every topology change. Gateways poll every 5 seconds, comparing their local epoch to the coordinator's. If stale, they pull the full 307 KB ring. Target: all gateways converge to the latest epoch within 10 seconds. Stale gateways route to wrong nodes, causing misses. Why not push-based? Push requires maintaining gateway connections; polling is simpler and 10-second staleness is acceptable for cache routing.

💡 Epoch numbers. 5-second poll. 10-second convergence. 307 KB full ring pull.

Storing ring topology only in memory on gateways. A coordinator restart loses the ring state. MySQL provides durable backup; gateways cache locally.
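The pull loop, sketched with an in-memory stand-in for the coordinator. Class names and the dict-shaped ring payload are illustrative; the real coordinator persists the epoch and ring durably, and the gateway runs `poll()` on a 5-second timer.

```python
class Coordinator:
    # Stand-in: the real one persists epoch + ring durably (e.g. in MySQL)
    def __init__(self):
        self.epoch, self.ring = 0, {"vnodes": []}

    def current_epoch(self):
        return self.epoch

    def fetch_ring(self):
        return self.ring          # the full ~307 KB payload in production

class Gateway:
    def __init__(self, coordinator):
        self.coordinator, self.epoch, self.ring = coordinator, -1, None

    def poll(self):
        # Cheap epoch compare every 5 s; pull the full ring only when stale
        latest = self.coordinator.current_epoch()
        if latest > self.epoch:
            self.ring = self.coordinator.fetch_ring()
            self.epoch = latest

coord = Coordinator()
gw = Gateway(coord)
gw.poll()
coord.epoch, coord.ring = 1, {"vnodes": ["n1"]}
gw.poll()                      # stale -> pulls the new ring
print(gw.epoch, gw.ring)
```

Comparing a single integer per poll is what keeps the steady-state cost near zero; the 307 KB transfer happens only on the rare topology change.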

Cache Miss Storm: Request Coalescing + Circuit Breaker

#10
A popular key expires. 10,000 concurrent requests miss simultaneously and all hit the database. This is the thundering herd problem. Fix 1: request coalescing at the gateway. Only the first miss fetches from the database; subsequent requests for the same key wait for that result. Fix 2: circuit breaker. If miss rate exceeds 10x normal for 5 seconds, reject new requests to protect the database. Fix 3: staggered TTLs. Add rand(0, 60s) jitter to expiration to prevent synchronized expiry.

💡 Coalesce concurrent misses. Circuit breaker at 10x miss rate. Staggered TTLs with jitter.

Allowing every cache miss to independently query the database. At 10,000 concurrent misses for one key, the database receives 10,000 identical queries instead of 1.
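A sketch of gateway-side coalescing (the "single-flight" pattern) using `threading.Event`. Error handling and the circuit breaker are omitted for brevity, and the `fetch` callable stands in for the database query.

```python
import threading

class Coalescer:
    def __init__(self, fetch):
        self.fetch = fetch                 # e.g. the database query
        self.lock = threading.Lock()
        self.inflight = {}                 # key -> Event carrying the result

    def get(self, key):
        with self.lock:
            event = self.inflight.get(key)
            leader = event is None
            if leader:
                event = self.inflight[key] = threading.Event()
        if leader:
            event.result = self.fetch(key)  # only the first miss hits the DB
            with self.lock:
                del self.inflight[key]
            event.set()
        else:
            event.wait()                    # followers reuse the leader's result
        return event.result

calls = []
c = Coalescer(lambda key: calls.append(key) or key.upper())
print(c.get("user:42"), calls)   # one fetch, however many callers waited
```

Under concurrent misses, every caller that arrives while the leader's fetch is in flight blocks on the same Event, so 10,000 simultaneous misses for one key produce exactly one database query.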