Failure Detection and Lease-Based Liveness


How does the control plane know a node is dead, not just slow? With 10K nodes sending heartbeats every 10 seconds, the metadata leader processes 1K heartbeats/sec.

Gossip-based detection, such as Cassandra's phi-accrual failure detector, converges probabilistically, leaving a window in which two nodes might both believe they are the leader. The failure timeline: the node dies at t=0, its last heartbeat was at t=-T (up to 10s earlier), the lease expires at t=(10-T)+9 (up to 19s), the new election takes 2 seconds, and propagation takes 1 second.
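The timeline arithmetic can be sketched in a few lines of Python. This is a minimal illustration that assumes the figures quoted in the text (10-second heartbeats, 9-second lease, 2-second election, 1-second propagation); the function and constant names are hypothetical, not taken from any real system.

```python
HEARTBEAT_INTERVAL_S = 10  # heartbeat period quoted in the text
LEASE_DURATION_S = 9       # lease length quoted in the text
ELECTION_S = 2             # election time quoted in the text
PROPAGATION_S = 1          # propagation time quoted in the text


def lease_expiry_after_death(t_since_heartbeat: float) -> float:
    """Seconds from node death (t=0) to lease expiry: (10 - T) + 9."""
    if not 0 <= t_since_heartbeat <= HEARTBEAT_INTERVAL_S:
        raise ValueError("T must lie within one heartbeat interval")
    return (HEARTBEAT_INTERVAL_S - t_since_heartbeat) + LEASE_DURATION_S


def time_to_new_leader(t_since_heartbeat: float) -> float:
    """Seconds from node death until a new leader is serving."""
    return lease_expiry_after_death(t_since_heartbeat) + ELECTION_S + PROPAGATION_S


if __name__ == "__main__":
    for t in (0, 5, 10):
        print(f"T={t}s: new leader serving after {time_to_new_leader(t)}s")
```

Under these assumptions, T=10 (the node dies just as its next heartbeat is due) gives 12 seconds, and T=0 (the node dies right after a heartbeat) gives 22 seconds.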

“We chose lease-based detection (not gossip-based) because leases give a deterministic fencing boundary: when a 9-second lease expires, the old leader is guaranteed to have stopped serving (it checks its own lease before every operation).”
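The self-check described in the quote, where the leader verifies its own lease before every operation, might look roughly like the sketch below. It is a simplified illustration only: the `LeaseHolder` class and its methods are hypothetical, and it ignores clock drift and the gap between the check and the completion of the operation, both of which a real system has to bound.

```python
import time
from typing import Callable

LEASE_DURATION_S = 9.0  # lease length quoted in the text


class LeaseExpired(Exception):
    """Raised when the holder can no longer show that its lease is valid."""


class LeaseHolder:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                # monotonic clock; injectable for tests
        self._expires_at = float("-inf")   # no lease until the first renewal

    def renew(self) -> None:
        """Record a successful renewal (assumed to be acknowledged by the granter)."""
        self._expires_at = self._clock() + LEASE_DURATION_S

    def has_lease(self) -> bool:
        return self._clock() < self._expires_at

    def serve(self, operation: Callable[[], object]) -> object:
        # Fencing check: refuse to act once the lease has lapsed locally.
        if not self.has_lease():
            raise LeaseExpired("lease expired; refusing to serve")
        return operation()
```

Because the holder stops itself, the control plane only needs to wait out the lease duration before electing a replacement; it does not need to reach the old leader.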

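Typical total: about 12 seconds from death to new leader serving (9-second lease, 2-second election, 1-second propagation); with the timeline above, the worst case is 19 + 2 + 1 = 22 seconds. CockroachDB uses 9-second leases for exactly this reason.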

Trade-off: leases add latency to failure detection (must wait for full lease duration) but prevent split-brain at the range level.

Why it matters in interviews

Interviewers probe the gossip vs lease trade-off. Explaining that leases provide deterministic fencing while gossip only offers probabilistic convergence shows staff-level distributed systems understanding.
