Consistent Hashing Cheat Sheet

Key concepts, trade-offs, and quick-reference notes for your interview prep.

Modular Hashing Trap: hash % N Reshuffles 99%

#1
With modular hashing, changing N from 100 to 99 (one node failure) remaps 99/100 = 99% of keys. At 500M keys, that is 495M cache misses hitting the database simultaneously. Consistent hashing remaps only 1/N: 500M / 100 = 5M keys (1%). The ring maps keys and servers onto the same circular hash space. A key is served by the first server clockwise from its hash position.

💡 hash % N remaps 99% on node change. Ring remaps 1/N = 1%.

Using modular hashing for dynamic clusters. It works in static environments but causes cache avalanches whenever topology changes.
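The remap fractions above can be checked with a short simulation. This is a sketch, not production code: the `user:i` and `node:i` labels are made up, MD5 stands in for whatever ring hash you use, and the ring here uses one position per node for brevity.

```python
import bisect, hashlib

def h(s: str) -> int:
    # Stable 32-bit hash; Python's built-in hash() is salted per process
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

keys = [f"user:{i}" for i in range(20_000)]

# Modular hashing: shrink from 100 to 99 nodes
moved_mod = sum(h(k) % 100 != h(k) % 99 for k in keys) / len(keys)

# Ring (one position per node for brevity): remove one node
def owner(node_hashes, key):
    i = bisect.bisect_right(node_hashes, h(key)) % len(node_hashes)
    return node_hashes[i]

ring = sorted(h(f"node:{n}") for n in range(100))
smaller = [p for p in ring if p != h("node:0")]
moved_ring = sum(owner(ring, k) != owner(smaller, k) for k in keys) / len(keys)

print(f"modular: {moved_mod:.0%} moved  ring: {moved_ring:.1%} moved")
```

Modular hashing moves roughly 99% of keys; the ring moves only the arc the failed node owned, on the order of 1%.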

Virtual Nodes: 256 Per Server Reduces Variance to <10%

#2
Three physical nodes on a ring produce wildly uneven arcs: load variance reaches 250%. With 256 virtual nodes per server, 100 servers produce 25,600 positions. The law of large numbers smooths arc lengths to under 10% variance. Why 256? Fewer (32) leave 30% variance; more (1024) quadruple metadata with diminishing returns. Cassandra's num_tokens defaulted to 256 for the same reason.

💡 256 vnodes/server. 25,600 total positions. <10% variance.

Using 1 ring position per physical node. With 3 nodes, one server can own 60% of the ring while another owns 10%.
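The smoothing effect is easy to observe directly. A sketch, with made-up vnode labels like `n0-v0` and MD5 as the ring hash; `max_share` reports the largest physical node's slice of the ring relative to the ideal 1/nodes.

```python
import hashlib

def pos(s: str) -> float:
    # Map a label to a position in [0, 1) on the ring
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big") / 2**64

def max_share(nodes: int, vnodes: int) -> float:
    # Largest physical node's share of the ring, relative to the ideal 1/nodes
    points = sorted((pos(f"n{n}-v{v}"), n)
                    for n in range(nodes) for v in range(vnodes))
    share = [0.0] * nodes
    # Each arc ends at a vnode; that vnode's physical node owns the arc
    for (p, owner), (prev, _) in zip(points, [points[-1]] + points[:-1]):
        share[owner] += (p - prev) % 1.0
    return max(share) * nodes

print(max_share(3, 1), max_share(100, 256))  # few positions -> wild imbalance
```

With 3 nodes and 1 position each, the hottest node can own a multiple of its fair share; with 25,600 positions the worst node stays close to 1.0x.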

Ring Lookup: O(log N) Binary Search on 307 KB

#3
Ring metadata: 25,600 vnodes x 12 bytes (4B hash + 8B node_id) = 307 KB. This fits in L2 cache, so lookups never touch main memory. Binary search: log2(25,600) ≈ 15 comparisons at ~1ns each = 15 nanoseconds per lookup. At 115K ops/sec, the gateway spends less than 2ms total per second on ring lookups. The sorted array is rebuilt only on topology changes (rare).

💡 307 KB ring, 15 comparisons, 15 nanoseconds. Fits in L2 cache.

Using a linked list for the ring. Linked list lookup is O(N) = 25,600 comparisons per request. Sorted array with binary search is O(log N) = 15.
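A minimal sketch of the sorted-array lookup using Python's `bisect`; the hash values and node IDs are toy placeholders.

```python
import bisect

class Ring:
    def __init__(self, points):
        # points: (hash, node_id) pairs; rebuilt only on topology change
        self.hashes, self.nodes = zip(*sorted(points))

    def lookup(self, key_hash: int) -> str:
        # First vnode clockwise from the key's position (wraps past the end)
        i = bisect.bisect_right(self.hashes, key_hash) % len(self.hashes)
        return self.nodes[i]

ring = Ring([(100, "a"), (200, "b"), (300, "c")])
print(ring.lookup(150), ring.lookup(350))  # -> b a
```

`lookup(350)` wraps around the end of the array back to the first position, which is what makes the hash space circular.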

Rebalancing Cost: 5 GB Transfer in 42 Seconds

#4
Adding a node to a 100-node cluster transfers 525 GB / 101 ≈ 5.2 GB from existing nodes. At 1 Gbps sustained: 5.2 GB x 8 / 1 Gbps ≈ 42 seconds. During rebalancing, the old owner continues serving reads. We use two-phase bootstrap: (1) copy data in the background, (2) atomically update the ring topology. Why not go faster? During peak traffic the transfer is throttled to 500 Mbps to protect donor nodes from I/O saturation.

💡 5 GB per node add. 42 seconds at 1 Gbps. Two-phase bootstrap.

Rebalancing without bandwidth throttling. The donor node becomes I/O-saturated, degrading live cache performance for all keys it serves.
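The back-of-envelope numbers, spelled out (unit conversion only: 1 GB is 8 gigabits):

```python
total_gb = 525          # total dataset across the cluster, in GB
nodes_after = 101       # 100 existing nodes + 1 new node
link_gbps = 1.0         # sustained transfer rate

transfer_gb = total_gb / nodes_after      # share the new node takes over
seconds = transfer_gb * 8 / link_gbps     # GB -> gigabits, then divide by rate

print(f"{transfer_gb:.1f} GB in {seconds:.0f} s")  # -> 5.2 GB in 42 s
```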

Replication Placement: Walk N Distinct Physical Nodes Clockwise

#5
With RF=3, a key is stored on 3 distinct physical nodes found by walking clockwise from the key's ring position. We skip vnodes belonging to already-selected physical nodes because two consecutive vnodes might belong to the same server. Blindly taking the next 2 positions could place all 3 replicas on one machine. For rack-aware placement, also skip nodes in the same rack.

💡 Walk clockwise, skip same-physical-node vnodes, collect RF distinct hosts.

Replicating to the next RF-1 ring positions without checking physical node identity. Two adjacent vnodes on the same server means one machine failure loses all replicas.
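The clockwise walk with the skip rule, as a sketch. The vnode layout is a toy example, and the function assumes at least RF distinct physical nodes exist (a real implementation needs a guard for smaller clusters).

```python
import bisect

def replicas(points, key_hash, rf=3):
    # points: sorted (vnode_hash, physical_node_id) pairs
    hashes = [h for h, _ in points]
    chosen = []
    i = bisect.bisect_right(hashes, key_hash)
    while len(chosen) < rf:          # assumes >= rf distinct physical nodes
        node = points[i % len(points)][1]
        if node not in chosen:       # skip vnodes of already-selected hosts
            chosen.append(node)
        i += 1
    return chosen

pts = [(10, "A"), (20, "A"), (30, "B"), (40, "C"), (50, "B")]
print(replicas(pts, 5))   # -> ['A', 'B', 'C']: the second "A" vnode is skipped
```

Without the `node not in chosen` check, the walk from hash 5 would pick positions 10 and 20, putting two replicas on host A.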

Hot Key Splitting: Scatter into 10 Sub-Keys

#6
Virtual nodes balance key count, not request volume. A viral key receiving 80% of reads overloads its primary node. Detection: per-node CPU divergence exceeds 3x cluster average. Fix: split into 10 sub-keys (key:0 through key:9). Client appends rand(0,9), spreading reads across up to 10 nodes. Writes update all 10 (10x write amplification, acceptable for read-heavy hot keys).

💡 10 sub-keys. Random suffix on reads. Detect via CPU divergence.

Assuming virtual nodes solve hot key problems. Vnodes balance key distribution, not access distribution. One key with 80% of reads still hits one node.
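A sketch of the client-side split. The `key:suffix` naming and `SPLITS = 10` follow the numbers above; `random` stands in for whatever RNG the client uses.

```python
import random

SPLITS = 10  # key:0 .. key:9

def read_key(hot_key: str) -> str:
    # Each read picks a random sub-key, spreading load over up to 10 nodes
    return f"{hot_key}:{random.randrange(SPLITS)}"

def write_keys(hot_key: str) -> list[str]:
    # Writes must update every sub-key: 10x write amplification
    return [f"{hot_key}:{i}" for i in range(SPLITS)]

print(read_key("viral_video"))         # e.g. viral_video:7
print(len(write_keys("viral_video")))  # 10
```

Because each sub-key hashes to a different ring position, the 80%-of-reads key now looks like ten ordinary keys to the ring.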

Jump Hash vs Ring: Zero Memory but No Arbitrary Removal

#7
Google's Jump Hash (2014) maps keys to N buckets in O(ln N) time with zero memory. Distribution is perfectly uniform. But it requires sequential bucket numbering (0 to N-1). Removing bucket 47 from the middle means renumbering the buckets after it, cascading remaps across their keys. We chose ring + vnodes (not jump hash) because cache nodes fail at arbitrary positions. Jump hash is better for static assignments like log partition numbering.

💡 Jump hash: O(ln N), 0 memory, perfect distribution, no arbitrary removal. Ring: 307 KB, arbitrary removal.

Using jump hash for a cache cluster where nodes can fail. Jump hash requires sequential numbering and cannot handle arbitrary node removal without cascading remaps.
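The published algorithm is tiny. This is a Python transcription of the Lamping-Veach pseudocode, with 64-bit overflow emulated by a mask; the demo below shows the key property that growing N only moves keys into the new last bucket.

```python
def jump_hash(key: int, num_buckets: int) -> int:
    # O(ln N) expected iterations, no ring state at all
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b

# Growing 100 -> 101 buckets: only ~1/101 of keys move, all into bucket 100
moved = [k for k in range(10_000) if jump_hash(k, 100) != jump_hash(k, 101)]
print(len(moved))
```

Note the asymmetry: adding a bucket at the end is cheap, but there is no way to express "remove bucket 47" without renumbering, which is exactly why it fits static assignments and not failing cache nodes.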

Bounded Load: Cap at (1+epsilon) x Average

#8
Google's bounded-load consistent hashing (2017) caps each node at (1 + ε) x avg_load. When a key hashes to an overloaded node, walk clockwise to the next node below the cap. With ε = 0.25, no node exceeds 125% of average. Vimeo adopted this for CDN caching, reducing miss rates by 50% during spikes. Trade-off: key-to-node mapping becomes load-dependent, reducing cache hit ratio by 2-5%.

💡 Cap at (1+epsilon) x average. epsilon=0.25 means max 125%. Dynamic routing reduces hits by 2-5%.

Setting epsilon too low (e.g., 0.01). Requests cascade through many nodes before finding capacity, adding latency and defeating the purpose of consistent hashing.
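The overflow walk, sketched. Load numbers here are toy values; a real implementation tracks live request counts per node and keeps the walk short by choosing ε sensibly.

```python
import bisect

def bounded_lookup(hashes, nodes, loads, key_hash, epsilon=0.25):
    # Cap each node at (1 + epsilon) x average load
    cap = (1 + epsilon) * sum(loads.values()) / len(loads)
    i = bisect.bisect_right(hashes, key_hash)
    for step in range(len(hashes)):
        node = nodes[(i + step) % len(hashes)]
        if loads[node] < cap:        # first node under the cap, clockwise
            return node
    raise RuntimeError("every node is at capacity")

hashes, nodes = [100, 200, 300], ["a", "b", "c"]
loads = {"a": 9, "b": 12, "c": 4}                 # avg ~8.3, cap ~10.4
print(bounded_lookup(hashes, nodes, loads, 150))  # "b" is over cap -> "c"
```

This is also where a too-small ε hurts: the `for` loop degenerates into a long walk over mostly-full nodes.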

Ring Convergence: Versioned Epochs with 5-Second Poll

#9
The ring coordinator assigns a monotonically increasing epoch number to every topology change. Gateways poll every 5 seconds, comparing their local epoch to the coordinator's. If stale, they pull the full 307 KB ring. Target: all gateways converge to the latest epoch within 10 seconds. Stale gateways route to wrong nodes, causing misses. Why not push-based? Push requires maintaining gateway connections; polling is simpler and 10-second staleness is acceptable for cache routing.

💡 Epoch numbers. 5-second poll. 10-second convergence. 307 KB full ring pull.

Storing ring topology only in memory on gateways. A coordinator restart loses the ring state. MySQL provides durable backup; gateways cache locally.
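The pull loop, sketched with an in-memory stand-in for the coordinator. Class names and the dict-shaped ring payload are illustrative; the real coordinator persists the epoch and ring durably, and the gateway runs `poll()` on a 5-second timer.

```python
class Coordinator:
    # Stand-in: the real one persists epoch + ring durably (e.g. in MySQL)
    def __init__(self):
        self.epoch, self.ring = 0, {"vnodes": []}

    def current_epoch(self):
        return self.epoch

    def fetch_ring(self):
        return self.ring          # the full ~307 KB payload in production

class Gateway:
    def __init__(self, coordinator):
        self.coordinator, self.epoch, self.ring = coordinator, -1, None

    def poll(self):
        # Cheap epoch compare every 5 s; pull the full ring only when stale
        latest = self.coordinator.current_epoch()
        if latest > self.epoch:
            self.ring = self.coordinator.fetch_ring()
            self.epoch = latest

coord = Coordinator()
gw = Gateway(coord)
gw.poll()
coord.epoch, coord.ring = 1, {"vnodes": ["n1"]}
gw.poll()                      # stale -> pulls the new ring
print(gw.epoch, gw.ring)
```

Comparing a single integer per poll is what keeps the steady-state cost near zero; the 307 KB transfer happens only on the rare topology change.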

Cache Miss Storm: Request Coalescing + Circuit Breaker

#10
A popular key expires. 10,000 concurrent requests miss simultaneously and all hit the database. This is the thundering herd problem. Fix 1: request coalescing at the gateway. Only the first miss fetches from the database; subsequent requests for the same key wait for that result. Fix 2: circuit breaker. If miss rate exceeds 10x normal for 5 seconds, reject new requests to protect the database. Fix 3: staggered TTLs. Add rand(0, 60s) jitter to expiration to prevent synchronized expiry.

💡 Coalesce concurrent misses. Circuit breaker at 10x miss rate. Staggered TTLs with jitter.

Allowing every cache miss to independently query the database. At 10,000 concurrent misses for one key, the database receives 10,000 identical queries instead of 1.
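A sketch of gateway-side coalescing (the "single-flight" pattern) using `threading.Event`. Error handling and the circuit breaker are omitted for brevity, and the `fetch` callable stands in for the database query.

```python
import threading

class Coalescer:
    def __init__(self, fetch):
        self.fetch = fetch                 # e.g. the database query
        self.lock = threading.Lock()
        self.inflight = {}                 # key -> Event carrying the result

    def get(self, key):
        with self.lock:
            event = self.inflight.get(key)
            leader = event is None
            if leader:
                event = self.inflight[key] = threading.Event()
        if leader:
            event.result = self.fetch(key)  # only the first miss hits the DB
            with self.lock:
                del self.inflight[key]
            event.set()
        else:
            event.wait()                    # followers reuse the leader's result
        return event.result

calls = []
c = Coalescer(lambda key: calls.append(key) or key.upper())
print(c.get("user:42"), calls)   # one fetch, however many callers waited
```

Under concurrent misses, every caller that arrives while the leader's fetch is in flight blocks on the same Event, so 10,000 simultaneous misses for one key produce exactly one database query.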