
Distributed Database Control Plane


This control-plane design question comes up in interviews at Google, Amazon, and CockroachDB because it tests range-metadata management, failure detection, and live rebalancing at global scale. It is essentially how Google Spanner manages millions of key-range splits across datacenters. You will design a Raft-based metadata store tracking 1M ranges, a lease-based failure detector processing 1K heartbeats per second, and a range-split orchestrator that detects hotspots, picks a split key, and atomically transfers ownership without failing a single query.

  • Design a Raft-based metadata store tracking 1M range assignments in 200 MB
  • Build a lease-based failure detector that identifies dead nodes in under 30 seconds
  • Orchestrate range splits that move data without dropping a single query
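The 200 MB figure in the first bullet follows from simple arithmetic: 1M ranges at ~200 bytes each. A minimal sketch of that sizing, using a hypothetical `RangeDescriptor` layout whose field names and byte estimates are illustrative assumptions, not any real system's schema:

```python
from dataclasses import dataclass

@dataclass
class RangeDescriptor:
    """Hypothetical per-range metadata entry (~200 bytes total)."""
    range_id: int        # 8 bytes
    start_key: bytes     # ~64 bytes on average
    end_key: bytes       # ~64 bytes on average
    replicas: tuple      # 3 node IDs x 8 bytes = 24 bytes
    leaseholder: int     # 8 bytes
    # remaining ~32 bytes: epoch/generation counters and struct overhead

ENTRY_BYTES = 200          # assumed average entry size
NUM_RANGES = 1_000_000     # target scale from the design

total_mb = ENTRY_BYTES * NUM_RANGES / 1_000_000
print(f"{total_mb:.0f} MB")  # 200 MB: small enough to hold in RAM on every Raft replica
```

Keeping the whole map in memory on each replica is what lets metadata reads stay fast; Raft only has to replicate the (small) mutation log, not the full table.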
Google · Amazon · CockroachDB · Meta · PlanetScale · Microsoft
8 concepts (deep dives) · 10 cheat items (quick ref)
Elevator Pitch (3-minute interview summary)

I would design a control plane for a distributed database managing 10K nodes and 1M ranges. Range metadata (200 bytes per entry) totals 200 MB, fitting entirely in memory on each Raft replica. Failure detection uses 9-second leases across 10K nodes, processing 1K heartbeats per second. When a range exceeds 512 MB, the split orchestrator picks the median key, creates two new ranges, and completes the atomic metadata swap in under 60 seconds. The rebalancer moves ~100 ranges per hour when load variance exceeds 20%. Online schema changes propagate across all 1M ranges in under 30 seconds via parallel workers. Raft provides linearizable metadata reads with 2-failure tolerance.
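The lease mechanics above can be sketched in a few lines. This is an illustrative model, not a real API: `LeaseTable`, the injected clock, and the sweep method are assumptions; the 9-second lease length comes from the pitch. Each heartbeat renews a fixed-length lease, and any node whose lease has expired by the next sweep is declared dead, which bounds detection time to roughly one lease plus one sweep interval.

```python
import time

class LeaseTable:
    """Lease-based failure detector sketch (hypothetical names and API).

    10K nodes renewing every ~10 s works out to ~1K heartbeats/second,
    and a 9-second lease keeps worst-case detection well under 30 s.
    """
    LEASE_SECONDS = 9.0

    def __init__(self, clock=time.monotonic):
        self.clock = clock            # injectable clock, for testing
        self.expiry = {}              # node_id -> lease expiration time

    def heartbeat(self, node_id):
        # Renew the node's lease for a full lease interval from "now".
        self.expiry[node_id] = self.clock() + self.LEASE_SECONDS

    def dead_nodes(self):
        # Sweep: any node whose lease expired before "now" is dead.
        now = self.clock()
        return [n for n, exp in self.expiry.items() if exp < now]

# Usage with a fake clock to make expiry deterministic:
t = [0.0]
lt = LeaseTable(clock=lambda: t[0])
lt.heartbeat("n1")
lt.heartbeat("n2")
t[0] = 5.0
lt.heartbeat("n2")        # n2 renews; n1 misses its heartbeat
t[0] = 10.0               # n1's lease expired at t=9.0
dead = lt.dead_nodes()    # -> ["n1"]
```

Injecting the clock is the key design choice here: it keeps the detector deterministic under test and mirrors how production systems decouple lease logic from wall time.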

Concepts Unlocked (8 concepts in this topic)