Read Repair and Merkle-Tree Anti-Entropy
Replicas drift. A node misses writes while rebooting, a hint expires undelivered, a disk sector dies.
Two mechanisms pull replicas back together, one opportunistic and one exhaustive. Read repair is the cheap one: on every quorum read, the coordinator already holds R versions of the value. If they disagree, it resolves the winner (by version), responds to the client immediately, and asynchronously writes the winner back to the stale replicas. Hot keys therefore heal themselves within seconds; the repair rides on traffic we were paying for anyway.
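As a minimal sketch of the read-repair decision, assume each replica returns a value tagged with a single integer version (real systems may use timestamps or vector clocks instead). The names Versioned and read_repair are hypothetical, chosen only for illustration. The function picks the winner and reports which replicas are stale; the actual write-back would be issued asynchronously after the client has its answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    version: int  # assumption: one monotonically increasing version number


def read_repair(responses: dict[str, Versioned]) -> tuple[Versioned, list[str]]:
    """Resolve the winner among R replica responses and list stale replicas.

    `responses` maps a replica name to the version it returned for the key.
    """
    # The highest version wins (the "by version" rule from the text).
    winner = max(responses.values(), key=lambda v: v.version)
    # Any replica holding an older version needs the winner written back.
    stale = sorted(
        node for node, v in responses.items() if v.version < winner.version
    )
    return winner, stale


# Hypothetical quorum read with R = 3: replica "b" has seen a newer write.
responses = {
    "a": Versioned("old", 3),
    "b": Versioned("new", 5),
    "c": Versioned("old", 3),
}
winner, stale = read_repair(responses)
# The coordinator would return winner.value to the client right away, then
# write `winner` back to each replica in `stale` in the background.
```

In this example the winner is the version held by replica "b", and replicas "a" and "c" are queued for repair.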
But read repair only heals what gets read: a key written once and never read again can stay divergent forever. Anti-entropy is the exhaustive one: periodically, replica pairs compare their entire key ranges. Comparing 200 GB per node key-by-key would saturate the network, so each node builds a Merkle tree over its range: leaves hash buckets of keys, parents hash their children.
Two nodes compare roots: identical roots mean identical data, confirmed in one round trip. On mismatch they descend only into the differing subtrees, localizing a divergent bucket in a number of comparisons proportional to the tree depth (logarithmic in the bucket count), and streaming just the differing buckets.
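The root comparison and descent can be sketched as follows. This is an illustration under simplifying assumptions, not how any particular database implements it: the bucket count is a power of two, each bucket is already serialized to bytes, and both trees live in one process rather than being exchanged over the network. The helper names build_tree and diff_buckets are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(buckets: list[bytes]) -> list[list[bytes]]:
    """Return hash levels from leaves (index 0) up to the root (last level).

    Assumption: len(buckets) is a power of two.
    """
    level = [h(b) for b in buckets]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of its two children.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_buckets(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Return indices of leaf buckets that differ between two equal-shape trees."""
    differing = []
    stack = [(len(a) - 1, 0)]  # start at the root: (level, index)
    while stack:
        lvl, idx = stack.pop()
        if a[lvl][idx] == b[lvl][idx]:
            continue  # identical subtree: skip everything beneath it
        if lvl == 0:
            differing.append(idx)  # reached a divergent leaf bucket
        else:
            stack.append((lvl - 1, 2 * idx))
            stack.append((lvl - 1, 2 * idx + 1))
    return sorted(differing)


# Two replicas with 8 buckets each; bucket 5 has diverged on the second one.
replica_a = [f"bucket-{i}".encode() for i in range(8)]
replica_b = list(replica_a)
replica_b[5] = b"bucket-5-stale"

tree_a, tree_b = build_tree(replica_a), build_tree(replica_b)
to_stream = diff_buckets(tree_a, tree_b)
```

Here to_stream contains only bucket 5, so only that bucket would be streamed between the replicas; two identical replicas would stop at the root with an empty result.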
A full-cluster anti-entropy pass runs weekly per range, off-peak, rate-limited. The trade-offs: Merkle trees must be rebuilt when data changes (Cassandra builds them per-repair from SSTables, which is IO-heavy, hence the weekly cadence, not continuous), and repair storms after long outages must be throttled or they become their own incident.
The division of labor is the takeaway: read repair keeps hot data convergent in seconds for free; anti-entropy guarantees cold data converges within the repair interval; hinted handoff bridges short outages in minutes. Three mechanisms, three time horizons.
What if the interviewer asks: why not just re-replicate everything after every failure? Because failures are constant at 500 nodes, and full re-replication for every blip would consume the cluster; targeted repair is what makes always-on feasible.
Related concepts