DB Control Plane System Design Walkthrough
Complete design walkthrough with animated diagrams, capacity math, API design, schema, and failure modes.
Solution Path (target: 30 min)
We designed a distributed database control plane for 10K nodes and 1M ranges. It uses a Raft-based metadata store (200 MB in memory), lease-based failure detection (12-second recovery), two-phase range splits at 512 MB with zero downtime, and a rebalancer processing ~100 moves/hour (about 14 MB/sec on average if each moved range is near the 512 MB split threshold). Online schema changes complete in 30 seconds via 100 parallel ghost-table workers.
1. What is a DB Control Plane?
Google Spanner manages millions of key-range splits across global datacenters. The control plane is the brain that sits above the data plane.
The data plane handles reads and writes. The control plane decides which node owns which key range, detects failures in seconds, rebalances shards without downtime, and orchestrates schema changes across thousands of nodes.
The core tension: 10K nodes, 1M ranges, and every routing decision depends on metadata that must be strongly consistent. If two nodes believe they own the same range, writes diverge and data is lost.
This is why we need Raft consensus for the metadata store. The control plane must handle three hard problems simultaneously: (1) failure detection in under 12 seconds, (2) range splits at 512 MB with zero downtime, and (3) rebalancing at ~100 moves/hour without overloading the donor nodes.
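To make the "two owners" hazard concrete, here is a minimal sketch of epoch-based fencing. The descriptor fields follow the per-range metadata this walkthrough lists (range_id, start_key, end_key, leader, replicas, epoch). The `MetadataStore` class, its method names, and the rule "bump the epoch on every leadership change" are assumptions for illustration; in a real deployment the dictionary would be replaced by state replicated through Raft.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RangeDescriptor:
    # Fields mirror the per-range metadata named in the walkthrough.
    range_id: int
    start_key: bytes
    end_key: bytes
    leader: str
    replicas: tuple
    epoch: int


class MetadataStore:
    """In-memory stand-in for the Raft-replicated store (hypothetical API)."""

    def __init__(self):
        self._ranges = {}

    def put(self, desc):
        self._ranges[desc.range_id] = desc

    def transfer_leader(self, range_id, expected_epoch, new_leader):
        """Compare-and-swap: succeeds only if the caller saw the latest epoch."""
        current = self._ranges[range_id]
        if current.epoch != expected_epoch:
            return False  # caller acted on stale metadata
        self._ranges[range_id] = replace(
            current, leader=new_leader, epoch=current.epoch + 1
        )
        return True

    def may_write(self, range_id, node, claimed_epoch):
        """A node may serve writes only as leader at the current epoch."""
        current = self._ranges[range_id]
        return current.leader == node and current.epoch == claimed_epoch


store = MetadataStore()
store.put(RangeDescriptor(1, b"a", b"m", "node-1",
                          ("node-1", "node-2", "node-3"), epoch=7))

moved = store.transfer_leader(1, expected_epoch=7, new_leader="node-2")
old_owner_ok = store.may_write(1, "node-1", claimed_epoch=7)  # stale claim
new_owner_ok = store.may_write(1, "node-2", claimed_epoch=8)
print(moved, old_owner_ok, new_owner_ok)
```

Because every ownership change goes through a single compare-and-swap on the epoch, a node holding an old epoch cannot keep accepting writes after leadership has moved.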
Control plane = the brain above the data plane. Decides range ownership, detects failures (12 seconds), splits ranges (512 MB), rebalances (~100/hour). Core requirement: strongly consistent metadata via Raft.
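As a rough illustration of lease-based failure detection, the sketch below renews a node's lease on each heartbeat and reports nodes whose lease has lapsed. The class name, method names, and the 12-second lease length are assumptions; the walkthrough states a 12-second recovery figure and a 10-second heartbeat interval but does not give the lease duration. Time is passed in explicitly so the behavior is deterministic.

```python
class LeaseFailureDetector:
    """Tracks one lease per node; a lapsed lease marks the node as failed."""

    def __init__(self, lease_ttl_s):
        self.lease_ttl_s = lease_ttl_s
        self._expiry = {}  # node_id -> timestamp at which the lease lapses

    def heartbeat(self, node_id, now):
        # Each heartbeat extends the lease from the time it was received.
        self._expiry[node_id] = now + self.lease_ttl_s

    def expired(self, now):
        # Nodes whose lease end time has been reached are considered failed.
        return sorted(n for n, t in self._expiry.items() if t <= now)


detector = LeaseFailureDetector(lease_ttl_s=12)  # assumed lease length
detector.heartbeat("node-a", now=0)
detector.heartbeat("node-b", now=0)
detector.heartbeat("node-a", now=10)  # node-b misses its 10-second heartbeat

failed_at_11 = detector.expired(now=11)
failed_at_12 = detector.expired(now=12)
print(failed_at_11, failed_at_12)
```

The trade-off is the usual one: a shorter lease detects failures sooner but makes a single delayed heartbeat more likely to trigger a false positive.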
We are designing a control plane for a distributed database with 10K nodes and 1M ranges. The real challenge is not storing data but deciding which node owns which key range, detecting failures in under 30 seconds, and rebalancing shards without a single query failing. The database stores data reliably; the control plane is the brain that makes every routing, placement, and recovery decision.
- Range metadata for 1M ranges fits in 200 MB of memory (200 bytes per range: range_id, start_key, end_key, leader, replicas, epoch)
- 10K nodes report heartbeats every 10 seconds, producing 1K heartbeats/sec for the failure detector to process
- Range rebalancing moves ~100 ranges per hour at steady state, each transfer completing in under 60 seconds
- Online schema changes propagate across 1M ranges without locking any range for more than 1 millisecond
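The capacity figures above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: it uses decimal megabytes and assumes every moved range is at the 512 MB split threshold, which the walkthrough does not state explicitly.

```python
NODES = 10_000
RANGES = 1_000_000
BYTES_PER_RANGE = 200          # range_id, start_key, end_key, leader, replicas, epoch
HEARTBEAT_INTERVAL_S = 10
MOVES_PER_HOUR = 100
RANGE_SIZE_MB = 512            # assumption: moved ranges are at the split threshold
TRANSFER_DEADLINE_S = 60

# Metadata footprint: 1M ranges x 200 bytes each.
metadata_mb = RANGES * BYTES_PER_RANGE / 1_000_000

# Heartbeat load on the failure detector.
heartbeats_per_sec = NODES / HEARTBEAT_INTERVAL_S

# Average rebalancing bandwidth across the whole cluster.
rebalance_mb_per_sec = MOVES_PER_HOUR * RANGE_SIZE_MB / 3600

# Minimum rate for one transfer to finish within the 60-second bound.
per_transfer_mb_per_sec = RANGE_SIZE_MB / TRANSFER_DEADLINE_S

print(f"metadata: {metadata_mb:.0f} MB")
print(f"heartbeats: {heartbeats_per_sec:.0f}/sec")
print(f"rebalance average: {rebalance_mb_per_sec:.1f} MB/sec")
print(f"per-transfer minimum: {per_transfer_mb_per_sec:.1f} MB/sec")
```

Under these assumptions the metadata comes to 200 MB, the heartbeat load to 1,000 per second, and the average rebalancing bandwidth to roughly 14 MB/sec, while a single 512 MB transfer needs about 8.5 MB/sec to finish within 60 seconds.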