STANDARDwalkthrough
Failure Detection and Lease-Based Liveness
How does the control plane know a node is dead, not just slow? With 10K nodes sending heartbeats every 10 seconds, the metadata leader processes 1K heartbeats/sec.
Gossip protocols like Cassandra's phi-accrual detector converge probabilistically, leaving a window where two nodes might believe they are the leader. The failure timeline: node dies at , last heartbeat was at (up to 10s ago), lease expires at up to , new election takes 2 seconds, propagation takes 1 second.
“We chose lease-based detection (not gossip-based) because leases give a deterministic fencing boundary: when a 9-second lease expires, the old leader is guaranteed to have stopped serving (it checks its own lease before every operation).”
Typical total: 12 seconds from death to new leader serving. CockroachDB uses 9-second leases for exactly this reason.
Trade-off: leases add latency to failure detection (must wait for full lease duration) but prevent split-brain at the range level.
Related concepts