Hinted Handoff and Sloppy Quorums
Replica 2 of a key's preference list is rebooting. A write arrives needing W = 2 acknowledgments and only replicas 1 and 3 are up: fine, the quorum succeeds.
“But what if BOTH 2 and 3 are briefly unreachable and only replica 1 remains?”
A strict quorum fails the write: correct but unavailable, and Dynamo's founding requirement was that the cart must always accept writes. The sloppy quorum answer: the coordinator walks FURTHER around the ring and writes to the next healthy node, say node D, which is not in the preference list at all.
D stores the value with a hint: a note saying "this belongs to replica 2; deliver it when it returns". When replica 2 rejoins, D hands off the hinted writes and deletes its local copies.
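The write path and the handoff described above can be sketched in a few lines of Python. This is a minimal in-memory model, not Dynamo's implementation: `Node`, `sloppy_write`, and `hand_off` are hypothetical names, the ring is a plain list, and the sketch assumes one stand-in node per unreachable replica.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)   # key -> value this node owns
    hints: list = field(default_factory=list)  # (intended_owner, key, value)


def sloppy_write(ring, start, n, w, key, value):
    """Write to the preference list, falling back to nodes further along the ring.

    The preference list is the n nodes clockwise from ring[start]. Each
    unreachable replica is covered by the next healthy node outside that
    list, which stores the value as a hint. Returns True if at least w
    nodes acknowledged the write.
    """
    preference = [ring[(start + i) % len(ring)] for i in range(n)]
    missing = [node for node in preference if not node.up]
    acks = 0
    for node in preference:
        if node.up:
            node.data[key] = value
            acks += 1
    # Walk further around the ring to find stand-ins for the missing replicas.
    for i in range(n, len(ring)):
        if not missing:
            break
        candidate = ring[(start + i) % len(ring)]
        if candidate.up:
            owner = missing.pop(0)
            candidate.hints.append((owner.name, key, value))
            acks += 1
    return acks >= w


def hand_off(holder, nodes_by_name):
    """Deliver hints whose owner is reachable again, then drop the local copies."""
    remaining = []
    for owner_name, key, value in holder.hints:
        owner = nodes_by_name[owner_name]
        if owner.up:
            owner.data[key] = value
        else:
            remaining.append((owner_name, key, value))
    holder.hints = remaining
```

With a ring of replicas 1 to 3 followed by D, and replicas 2 and 3 marked down, `sloppy_write(ring, 0, 3, 2, "cart", "book")` succeeds and leaves a hint for replica 2 on D. Calling `hand_off` once replica 2 is back moves the value to its owner and empties D's hint list.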
Availability is preserved: the write took W acknowledgments, just not from the canonical owners. The costs are subtle and interviewers love them.
First, sloppy quorums weaken the R + W > N guarantee: a read quorum from the canonical replicas can MISS a write that is sitting as a hint on node D, so during failures, reads can be stale even with R = W = 2. Second, hints are buffered work: a node holding hints for a long-dead peer keeps accumulating them, so we cap hint storage (say 3 hours' worth) and rely on anti-entropy for anything older.
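The stale-read case is easy to reproduce in a toy model. The sketch below assumes N = 3 with versioned values and a hypothetical `read_quorum` helper; it parks a write on D while replicas 2 and 3 are down, then reads from exactly those two replicas before handoff has run.

```python
N, R, W = 3, 2, 2

# Versioned values on the three canonical replicas: name -> (version, value).
canonical = {
    "replica1": (1, "old"),
    "replica2": (1, "old"),
    "replica3": (1, "old"),
}
hints_on_d = []  # hinted writes parked on stand-in node D


def read_quorum(replica_names):
    """Return the newest value among the canonical replicas contacted."""
    assert len(replica_names) >= R
    return max(canonical[name] for name in replica_names)[1]


# Replicas 2 and 3 are unreachable: the write lands on replica 1 and on D.
canonical["replica1"] = (2, "new")
hints_on_d.append(("replica2", (2, "new")))
acks = 2  # replica 1 + D, so W = 2 is satisfied

# Replicas 2 and 3 return before handoff runs; the read contacts only them.
stale = read_quorum(["replica2", "replica3"])

# After handoff delivers the hint, the same read sees the write.
for owner, versioned in hints_on_d:
    canonical[owner] = max(canonical[owner], versioned)
hints_on_d.clear()
fresh = read_quorum(["replica2", "replica3"])
```

Here R + W > N holds on paper, yet the first read returns the old value because the read set and the set of nodes that acknowledged the write do not overlap.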
Third, mass failures turn handoff into a thundering herd on recovery, so handoff delivery is rate-limited. The numbers: with a 3-hour hint TTL at 10K writes/sec per node, a dead node misses about 108 million writes; at roughly 1 KB per write, its neighbors buffer up to ~100 GB of hints between them, which is why the TTL is hours, not days.
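The arithmetic behind that figure, as a quick back-of-envelope check. The TTL and write rate come from the text; the ~1 KB hint size and the handoff rate limit are assumptions chosen for illustration.

```python
HINT_TTL_S = 3 * 60 * 60      # 3-hour hint TTL (from the text)
WRITES_PER_S = 10_000         # writes/sec destined for the dead node (from the text)
BYTES_PER_HINT = 1_000        # ASSUMPTION: roughly 1 KB per hinted write
HANDOFF_RATE_PER_S = 20_000   # ASSUMPTION: hypothetical cap on handoff delivery

hint_count = HINT_TTL_S * WRITES_PER_S      # hints accumulated over one TTL
hint_gb = hint_count * BYTES_PER_HINT / 1e9  # total buffered across the neighbors
drain_minutes = hint_count / HANDOFF_RATE_PER_S / 60  # time to replay at the cap

print(f"{hint_count:,} hints, {hint_gb:.0f} GB, {drain_minutes:.0f} min to drain")
```

Under these assumptions a full TTL of hints is about 108 GB and takes around 90 minutes to replay, which shows how quickly a multi-day TTL would become impractical.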
What if the interviewer asks: is hinted handoff replication? No: it is a temporary durability patch.
The canonical replica count is restored only when handoff completes or anti-entropy repairs the gap.