
Hinted Handoff and Sloppy Quorums

Replica 2 of a key's preference list is rebooting. A write arrives needing W = 2 acknowledgments and only replicas 1 and 3 are up: fine, the quorum succeeds.
But what if BOTH 2 and 3 are briefly unreachable and only replica 1 remains?
A strict quorum fails the write: correct but unavailable, and Dynamo's founding requirement was that the cart must always accept writes. The sloppy quorum answer: the coordinator walks FURTHER around the ring and writes to the next healthy node, say node D, which is not in the preference list at all.
D stores the value with a hint: a note saying "this belongs to replica 2; deliver it when they return". When replica 2 rejoins, D hands off the hinted writes and deletes its copies.
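The write path and the handoff described above can be sketched as follows. This is a minimal illustration, not Dynamo's actual implementation: the `Node` class, `sloppy_write`, `hand_off`, the five-node ring, and the second fallback node `E` are all hypothetical names chosen for the example, and it assumes one fallback stands in for each down owner and that handoff simply overwrites the owner's copy (no version checks).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)   # key -> value this node owns
    hints: list = field(default_factory=list)  # (intended_owner, key, value)


def sloppy_write(ring, start, n, w, key, value):
    """Write `value`, letting healthy fallbacks stand in for down owners.

    `ring` is a clockwise list of nodes; the n nodes from `start` form the
    preference list. Returns True if at least w nodes acknowledged.
    """
    preference = [ring[(start + i) % len(ring)] for i in range(n)]
    acks = 0
    down_owners = []
    for node in preference:
        if node.up:
            node.data[key] = value
            acks += 1
        else:
            down_owners.append(node)

    # Walk further around the ring: each healthy node past the preference
    # list takes one down owner's write and records who it belongs to.
    for i in range(n, len(ring)):
        if not down_owners:
            break
        fallback = ring[(start + i) % len(ring)]
        if fallback.up:
            owner = down_owners.pop(0)
            fallback.hints.append((owner.name, key, value))
            acks += 1
    return acks >= w


def hand_off(holder, nodes_by_name):
    """Deliver hints whose owner is back up; keep the rest for later."""
    still_pending = []
    for owner_name, key, value in holder.hints:
        owner = nodes_by_name[owner_name]
        if owner.up:
            owner.data[key] = value  # simplification: no version comparison
        else:
            still_pending.append((owner_name, key, value))
    holder.hints = still_pending


# Scenario from the text: replicas 2 and 3 are down, W = 2.
r1, r2, r3, d, e = (Node(x) for x in ("r1", "r2", "r3", "D", "E"))
ring = [r1, r2, r3, d, e]
r2.up = r3.up = False
accepted = sloppy_write(ring, start=0, n=3, w=2, key="cart:42", value="milk")

r2.up = True  # replica 2 rejoins
hand_off(d, {node.name: node for node in ring})
```

In this sketch D's hint for replica 2 is delivered and dropped once replica 2 is back, while E keeps its hint for replica 3, which is still down.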
Availability is preserved: the write took W acknowledgments, just not from the canonical owners. The costs are subtle and interviewers love them.
First, sloppy quorums weaken the R + W > N guarantee: a read quorum from the canonical replicas can MISS a write that is sitting as a hint on node D, so during failures, reads can be stale even with R = W = 2. Second, hints are buffered work: a node holding hints for a long-dead peer keeps accumulating them, so we cap hint storage (say 3 hours' worth) and rely on anti-entropy for anything older.
Third, mass failures turn handoff into a thundering herd on recovery, so handoff delivery is rate-limited. The numbers: with a 3-hour hint TTL at 10K writes/sec per node, a dead node's neighbors buffer about 108 million writes between them. Assuming roughly 1 KB per write, that is ~100 GB of hints, which is why the TTL is hours, not days.
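The sizing estimate can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: the 1 KB (1,000 byte) write size is an assumption chosen because it is what the ~100 GB figure implies, and the 2,000 writes/sec handoff throttle is a purely hypothetical value used to show why rate-limited recovery takes a while.

```python
def hint_backlog_writes(writes_per_sec, ttl_hours):
    """Number of writes buffered as hints over one full TTL window."""
    return writes_per_sec * ttl_hours * 3600


def hint_backlog_bytes(writes_per_sec, ttl_hours, bytes_per_write):
    """Total hint storage for one dead node over one full TTL window."""
    return hint_backlog_writes(writes_per_sec, ttl_hours) * bytes_per_write


def drain_hours(backlog_writes, handoff_writes_per_sec):
    """Time to replay the backlog at a fixed handoff rate limit."""
    return backlog_writes / handoff_writes_per_sec / 3600


WRITES_PER_SEC = 10_000    # from the text
BYTES_PER_WRITE = 1_000    # assumption: roughly 1 KB per hinted write
THROTTLE = 2_000           # hypothetical handoff rate limit, writes/sec

three_hour_writes = hint_backlog_writes(WRITES_PER_SEC, ttl_hours=3)
three_hour_gb = hint_backlog_bytes(WRITES_PER_SEC, 3, BYTES_PER_WRITE) / 1e9
three_day_tb = hint_backlog_bytes(WRITES_PER_SEC, 72, BYTES_PER_WRITE) / 1e12
recovery_hours = drain_hours(three_hour_writes, THROTTLE)

print(f"3-hour TTL: {three_hour_writes:,} writes, {three_hour_gb:.0f} GB")
print(f"3-day TTL:  {three_day_tb:.2f} TB")
print(f"Drain at {THROTTLE:,} writes/sec: {recovery_hours:.0f} hours")
```

Under these assumptions a 3-day TTL would push the backlog into the terabytes, and even the 3-hour backlog takes many hours to replay at a throttled rate.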
What if the interviewer asks: is hinted handoff replication? No: it is a temporary durability patch.
The canonical replica count is restored only when handoff completes or anti-entropy repairs the gap.
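The stale-read window and the "not really replication" point can be made concrete with a toy model. This is a simplified sketch: it uses integer version numbers where Dynamo uses vector clocks, and the dictionary layout, `read` helper, and replica names are hypothetical.

```python
def read(replicas, names):
    """Return the newest (version, value) among the contacted replicas."""
    return max(replicas[name] for name in names)


def copies_of(replicas, version):
    """Count canonical replicas that hold the given version."""
    return sum(1 for v, _ in replicas.values() if v == version)


# N = 3 canonical replicas, all holding version 1 of the cart.
replicas = {name: (1, "old-cart") for name in ("r1", "r2", "r3")}
hints_on_d = []  # hints parked on fallback node D: (owner, (version, value))

# Outage: r2 and r3 are unreachable. W = 2 is met by r1 plus fallback D.
replicas["r1"] = (2, "new-cart")
hints_on_d.append(("r2", (2, "new-cart")))

# r2 and r3 recover, but handoff has not run yet.
# A read with R = 2 that happens to contact r2 and r3 misses the write.
before_handoff = read(replicas, ["r2", "r3"])
canonical_before = copies_of(replicas, 2)

# Handoff: D delivers its hints to their owners and drops its copies.
for owner, versioned in hints_on_d:
    replicas[owner] = max(replicas[owner], versioned)
hints_on_d.clear()

after_handoff = read(replicas, ["r2", "r3"])
canonical_after = copies_of(replicas, 2)
```

Before handoff only one canonical replica holds the new version, so an R = 2 read of r2 and r3 returns the old cart. After handoff two canonical replicas hold it and any R = 2 read overlaps the write; r3 still lags until anti-entropy or a later repair catches it up.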
Why it matters in interviews
Hinted handoff is the mechanism behind "always writable", and knowing that it silently weakens quorum overlap during failures is the depth interviewers fish for. The 3-hour TTL and rate-limited recovery show operational maturity.