Replica Placement Policies
RF=3 means three copies, but placing all three replicas on machines in the same rack defeats the purpose: a single rack failure (power supply, top-of-rack switch) loses every copy at once.
The control plane enforces placement constraints: spread replicas across distinct racks, availability zones, and regions. When the rebalancer or split orchestrator creates a new replica, it queries the placement constraint solver: find a node in a different rack/zone than existing replicas, with sufficient disk space, and below the CPU threshold.
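The constraint check described above can be sketched as a simple filter-and-rank pass over candidate nodes. This is a minimal illustration, not any real system's API; the names (`Node`, `pick_placement`) and thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    node_id: str
    rack: str
    zone: str
    free_disk_gb: float
    cpu_util: float  # fraction in [0.0, 1.0]

def pick_placement(candidates: List[Node],
                   existing_replicas: List[Node],
                   min_disk_gb: float = 50.0,
                   cpu_threshold: float = 0.8) -> Optional[Node]:
    """Return a node in a different rack and zone than the existing
    replicas, with enough free disk and CPU below threshold, or None."""
    used_racks = {n.rack for n in existing_replicas}
    used_zones = {n.zone for n in existing_replicas}
    eligible = [
        n for n in candidates
        if n.rack not in used_racks
        and n.zone not in used_zones
        and n.free_disk_gb >= min_disk_gb
        and n.cpu_util < cpu_threshold
    ]
    # Among eligible nodes, prefer the least CPU-loaded one.
    return min(eligible, key=lambda n: n.cpu_util, default=None)
```

A real solver would also handle the case where no node satisfies every constraint (e.g. fall back to rack diversity only), but the hard-filter-then-rank shape is the core idea.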
“We chose rack-aware placement (not latency-optimized placement) because correlated failures at the rack level are common in production.”
Google Spanner places replicas across continents for disaster recovery and uses Paxos to maintain consistency across 50-100ms round-trip latencies. The trade-off is concrete: cross-region replication adds 50-100ms of latency per write, because the consensus leader (Paxos in Spanner's case, Raft in Raft-based systems) must wait for a majority of replicas to acknowledge.
For read-heavy workloads, this is acceptable because reads can be served from the local region via follower reads with bounded staleness.
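The follower-read routing decision amounts to comparing the local replica's replication lag against the staleness bound. Here is a minimal sketch of that decision; the function name, the dict-based replica representation, and the 5-second default bound are all illustrative assumptions, not any system's actual interface.

```python
import time

def route_read(leader: dict, local_follower: dict,
               max_staleness_s: float = 5.0,
               now: float = None) -> str:
    """Return the node to serve a read from.

    Serve from the local follower if its replication lag is within the
    staleness bound; otherwise pay the cross-region cost and read from
    the leader for freshness.
    """
    now = time.time() if now is None else now
    lag = now - local_follower["last_applied_ts"]
    if lag <= max_staleness_s:
        return local_follower["node_id"]  # fast local read, bounded staleness
    return leader["node_id"]              # cross-region read, fully fresh
```

The key property is that the client never observes data older than `max_staleness_s`, while the common case avoids the 50-100ms cross-region round trip entirely.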
Related concepts