Distributed Database Control Plane
COMMON
This control-plane question is asked at Google, Amazon, and CockroachDB interviews because it tests range metadata management, failure detection, and live rebalancing at global scale. It mirrors how Google Spanner manages millions of key-range splits across datacenters. You will design a Raft-based metadata store tracking 1M ranges, a lease-based failure detector processing 1K heartbeats per second, and a range-split orchestrator that detects hotspots, picks a split key, and atomically transfers ownership without a single query failing.
- Design a Raft-based metadata store tracking 1M range assignments in 200 MB
- Build a lease-based failure detector that identifies dead nodes in under 30 seconds
- Orchestrate range splits that move data without dropping a single query
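The 200 MB figure above is simple arithmetic; a quick back-of-envelope check (the 200-byte entry size comes from the design sketch itself):

```python
# Metadata store sizing: 1M ranges x 200 bytes of metadata per range.
ENTRY_BYTES = 200
NUM_RANGES = 1_000_000

total_mb = ENTRY_BYTES * NUM_RANGES / 1_000_000
print(total_mb)  # the whole range map fits in RAM on each Raft replica
```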
Visual Solutions
Step-by-step animated walkthroughs with capacity estimation, API design, database schema, and failure modes built in.
Cheat Sheet
Key concepts, trade-offs, and quick-reference notes for DB Control Plane. Everything you need at a glance.
Anti-Patterns
Common design mistakes candidates make, with wrong vs. correct approaches for each trap.
Failure Modes
What breaks in production, how to detect it, and how to fix it. Detection metrics, mitigations, and severity ratings.
Start simple. Build to staff-level.
“I would design a control plane for a distributed database managing 10K nodes and 1M ranges. Range metadata (200 bytes per entry) totals 200 MB, fitting entirely in memory on each Raft replica. Failure detection uses 9-second leases across 10K nodes, processing 1K heartbeats per second. When a range exceeds 512 MB, the split orchestrator picks the median key, creates two new ranges, and completes the atomic metadata swap in under 60 seconds. The rebalancer moves ~100 ranges per hour when load variance exceeds 20%. Online schema changes propagate across all 1M ranges in under 30 seconds via parallel workers. Raft provides linearizable metadata reads with 2-failure tolerance.”
Range Partitioning
EASY
We chose range partitioning (not hash) because SQL databases need range scans: 1M ranges at 512 MB each, with hotspots handled by the split/merge mechanism.
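Range-partitioned routing can be sketched as a binary search over the sorted start keys of all ranges (the keys and helper name here are hypothetical, not from the design):

```python
import bisect

# Sorted start keys of each range; range i covers [starts[i], starts[i+1]).
starts = ["a", "f", "m", "t"]

def owning_range(key: str) -> int:
    """Return the index of the range whose key span contains `key`."""
    return bisect.bisect_right(starts, key) - 1

print(owning_range("g"))  # "g" falls in ["f", "m"), i.e. range 1
```

This is why range partitioning suits SQL: a scan over a key interval touches only the contiguous ranges covering it, which hash partitioning cannot offer.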
What is it?
Metadata Store (Raft)
STANDARD
Raft-based store for 1M range entries in 200 MB. Linearizable reads. A 5-node group tolerates 2 failures. Embedded, not an external ZooKeeper.
High Level Design
Lease-Based Failure Detection
STANDARD
9-second leases with 1K heartbeats/sec. An expired lease guarantees fencing. Failure is detected within 12 seconds. Deterministic, not probabilistic.
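A minimal sketch of the lease mechanic, assuming a node renews its lease on every heartbeat and is considered dead (and fenced) once the lease expires. Class and method names are illustrative:

```python
LEASE_SECONDS = 9  # per the design: 9-second leases

class FailureDetector:
    def __init__(self):
        self.last_renewal = {}  # node_id -> timestamp of last heartbeat

    def heartbeat(self, node_id: str, now: float) -> None:
        """Each heartbeat renews the node's lease."""
        self.last_renewal[node_id] = now

    def is_dead(self, node_id: str, now: float) -> bool:
        # An expired lease is a deterministic signal: the node is fenced
        # and may no longer serve its ranges, so handing them off is safe.
        return now - self.last_renewal[node_id] > LEASE_SECONDS

fd = FailureDetector()
fd.heartbeat("n1", now=100.0)
print(fd.is_dead("n1", now=105.0))  # lease still valid -> False
print(fd.is_dead("n1", now=110.0))  # 10s since renewal -> True
```

Passing `now` explicitly keeps the sketch deterministic; a real detector would use a monotonic clock.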
Core Feature Design
Automatic Rebalancing
TRICKY
Triggered at 20% load variance. Moves ~100 ranges/hour. Selects targets by rack, zone, and disk. Two-phase transfer with zero downtime.
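How "load variance" is measured is not spelled out above; one simple, assumed definition is the maximum per-node deviation from mean load:

```python
def should_rebalance(loads: list[float], threshold: float = 0.20) -> bool:
    """Trigger when any node's load deviates from the mean by more than 20%."""
    mean = sum(loads) / len(loads)
    return any(abs(load - mean) / mean > threshold for load in loads)

print(should_rebalance([100, 100, 100, 100]))  # balanced -> False
print(should_rebalance([100, 100, 100, 160]))  # one hot node -> True
```

Throttling the resulting moves (~100 ranges/hour) keeps rebalancing traffic from competing with foreground queries.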
Core Feature Design
Online Schema Changes
TRICKY
Ghost table: shadow schema, parallel backfill across 100 workers, atomic swap. 1M ranges in 30 seconds. No table locks.
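The parallel backfill divides the range space among workers; a sketch of one simple partitioning scheme (contiguous chunks, function name hypothetical):

```python
def assign_backfill(num_ranges: int = 1_000_000, num_workers: int = 100):
    """Partition ranges into contiguous [start, end) chunks, one per worker."""
    per_worker = num_ranges // num_workers
    return [(w * per_worker, (w + 1) * per_worker) for w in range(num_workers)]

chunks = assign_backfill()
print(len(chunks), chunks[0])  # 100 workers, each backfilling 10,000 ranges
```

With 100 workers running concurrently against the shadow schema, no chunk blocks another, and the final atomic swap flips readers to the ghost table without a table lock.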
Database Schema
Replica Placement Policies
STANDARD
RF=3 spread across racks, zones, and regions. A constraint solver picks placement. Trade-off: cross-region replication adds 50-100 ms write latency.
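One way a placement solver can rank candidate nodes is by failure-domain diversity relative to the replicas already placed. The dimensions and weights below are illustrative assumptions, not a spec:

```python
# Assumed diversity weights: a distinct region is worth more than a
# distinct zone, which is worth more than a distinct rack.
WEIGHTS = {"region": 4, "zone": 2, "rack": 1}

def placement_score(candidate: dict, existing: list[dict]) -> int:
    """Higher score = candidate shares fewer failure domains with existing replicas."""
    score = 0
    for dim, weight in WEIGHTS.items():
        if all(candidate[dim] != replica[dim] for replica in existing):
            score += weight
    return score

existing = [{"region": "us-east", "zone": "a", "rack": "r1"}]
same_rack = {"region": "us-east", "zone": "a", "rack": "r1"}
new_zone = {"region": "us-east", "zone": "b", "rack": "r7"}
print(placement_score(same_rack, existing))  # shares every domain -> 0
print(placement_score(new_zone, existing))   # new zone + new rack -> 3
```

The cross-region trade-off shows up here: maximizing the region term buys survivability at the cost of 50-100 ms of extra write latency per quorum round-trip.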
High Level Design
Range Split and Merge
TRICKY
Split at 512 MB using the median key by data volume. Two-phase: in-flight queries are served by the old range until the atomic metadata swap via Raft.
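"Median key by data volume" means the key where cumulative bytes reach half the range's data, so each post-split half carries roughly equal load. A sketch under the assumption that per-key sizes are available from range statistics:

```python
def pick_split_key(key_sizes: list[tuple[str, int]]) -> str:
    """Return the key where cumulative bytes first reach half the total.

    key_sizes: (key, bytes) pairs sorted by key, e.g. from range stats.
    """
    total = sum(size for _, size in key_sizes)
    running = 0
    for key, size in key_sizes:
        running += size
        if running >= total / 2:
            return key

# Splitting by count would pick "b"/"c" regardless of size; splitting by
# volume balances bytes: 50 of 100 bytes accumulate by key "b".
print(pick_split_key([("a", 10), ("b", 40), ("c", 10), ("d", 40)]))
```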
Core Feature Design
Cluster Topology
EASY
10K nodes register via heartbeat. Decommission drains ranges before removal (~60 min). Gossip for discovery, Raft for authority.
Replication and Fault Tolerance