Distributed Database Control Plane
COMMON
This control-plane question is asked at Google, Amazon, and CockroachDB interviews because it tests range metadata management, failure detection, and live rebalancing at global scale. It mirrors how Google Spanner manages millions of key-range splits across datacenters. You will design a Raft-based metadata store tracking 1M ranges, a lease-based failure detector processing 1K heartbeats per second, and a range-split orchestrator that detects hotspots, picks a split key, and atomically transfers ownership without a single query failing.
- Design a Raft-based metadata store tracking 1M range assignments in 200 MB
- Build a lease-based failure detector that identifies dead nodes in under 30 seconds
- Orchestrate range splits that move data without dropping a single query
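The 200 MB figure above is simple arithmetic; a quick back-of-envelope check (the 200-byte entry size comes from the design sketch itself):

```python
# Metadata store sizing: 1M ranges x 200 bytes of metadata per range.
ENTRY_BYTES = 200
NUM_RANGES = 1_000_000

total_mb = ENTRY_BYTES * NUM_RANGES / 1_000_000
print(total_mb)  # the whole range map fits in RAM on each Raft replica
```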
Visual Solutions
Step-by-step animated walkthroughs with capacity estimation, API design, database schema, and failure modes built in.
Cheat Sheet
Key concepts, trade-offs, and quick-reference notes for DB Control Plane. Everything you need at a glance.
Anti-Patterns
Common design mistakes candidates make, with wrong vs. correct approaches for each trap.
Failure Modes
What breaks in production, how to detect it, and how to fix it. Detection metrics, mitigations, and severity ratings.
Start simple. Build to staff-level.
“I would design a control plane for a distributed database managing 10K nodes and 1M ranges. Range metadata (200 bytes per entry) totals 200 MB, fitting entirely in memory on each Raft replica. Failure detection uses 9-second leases across 10K nodes, processing 1K heartbeats per second. When a range exceeds 512 MB, the split orchestrator picks the median key, creates two new ranges, and completes the atomic metadata swap in under 60 seconds. The rebalancer moves ~100 ranges per hour when load variance exceeds 20%. Online schema changes propagate across all 1M ranges in under 30 seconds via parallel workers. Raft provides linearizable metadata reads with 2-failure tolerance.”
Range Partitioning
EASY
We chose range partitioning (not hash) because SQL databases need range scans: 1M ranges at 512 MB each, with hotspots handled by the split/merge mechanism.
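Range-partitioned routing can be sketched as a binary search over the sorted start keys of all ranges (the keys and helper name here are hypothetical, not from the design):

```python
import bisect

# Sorted start keys of each range; range i covers [starts[i], starts[i+1]).
starts = ["a", "f", "m", "t"]

def owning_range(key: str) -> int:
    """Return the index of the range whose key span contains `key`."""
    return bisect.bisect_right(starts, key) - 1

print(owning_range("g"))  # "g" falls in ["f", "m"), i.e. range 1
```

This is why range partitioning suits SQL: a scan over a key interval touches only the contiguous ranges covering it, which hash partitioning cannot offer.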
What is it?
Metadata Store (Raft)
STANDARD
Raft-based store for 1M range entries in 200 MB. Linearizable reads. A 5-node group tolerates 2 failures. Embedded, not an external ZooKeeper.
High Level Design
Lease-Based Failure Detection
STANDARD
9-second leases with 1K heartbeats/sec. An expired lease guarantees fencing. Failure is detected within 12 seconds. Deterministic, not probabilistic.
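A minimal sketch of the lease mechanic, assuming a node renews its lease on every heartbeat and is considered dead (and fenced) once the lease expires. Class and method names are illustrative:

```python
LEASE_SECONDS = 9  # per the design: 9-second leases

class FailureDetector:
    def __init__(self):
        self.last_renewal = {}  # node_id -> timestamp of last heartbeat

    def heartbeat(self, node_id: str, now: float) -> None:
        """Each heartbeat renews the node's lease."""
        self.last_renewal[node_id] = now

    def is_dead(self, node_id: str, now: float) -> bool:
        # An expired lease is a deterministic signal: the node is fenced
        # and may no longer serve its ranges, so handing them off is safe.
        return now - self.last_renewal[node_id] > LEASE_SECONDS

fd = FailureDetector()
fd.heartbeat("n1", now=100.0)
print(fd.is_dead("n1", now=105.0))  # lease still valid -> False
print(fd.is_dead("n1", now=110.0))  # 10s since renewal -> True
```

Passing `now` explicitly keeps the sketch deterministic; a real detector would use a monotonic clock.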
Core Feature Design
Automatic Rebalancing
TRICKY
Triggered at 20% load variance. Moves ~100 ranges/hour. Selects targets by rack, zone, and disk. Two-phase transfer with zero downtime.
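How "load variance" is measured is not spelled out above; one simple, assumed definition is the maximum per-node deviation from mean load:

```python
def should_rebalance(loads: list[float], threshold: float = 0.20) -> bool:
    """Trigger when any node's load deviates from the mean by more than 20%."""
    mean = sum(loads) / len(loads)
    return any(abs(load - mean) / mean > threshold for load in loads)

print(should_rebalance([100, 100, 100, 100]))  # balanced -> False
print(should_rebalance([100, 100, 100, 160]))  # one hot node -> True
```

Throttling the resulting moves (~100 ranges/hour) keeps rebalancing traffic from competing with foreground queries.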
Core Feature Design
Online Schema Changes
TRICKY
Ghost table: shadow schema, parallel backfill across 100 workers, atomic swap. 1M ranges in 30 seconds. No table locks.
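The parallel backfill divides the range space among workers; a sketch of one simple partitioning scheme (contiguous chunks, function name hypothetical):

```python
def assign_backfill(num_ranges: int = 1_000_000, num_workers: int = 100):
    """Partition ranges into contiguous [start, end) chunks, one per worker."""
    per_worker = num_ranges // num_workers
    return [(w * per_worker, (w + 1) * per_worker) for w in range(num_workers)]

chunks = assign_backfill()
print(len(chunks), chunks[0])  # 100 workers, each backfilling 10,000 ranges
```

With 100 workers running concurrently against the shadow schema, no chunk blocks another, and the final atomic swap flips readers to the ghost table without a table lock.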
Database Schema
Replica Placement Policies
STANDARD
RF=3 spread across racks, zones, and regions. A constraint solver picks placement. Trade-off: cross-region replication adds 50-100 ms write latency.
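One way a placement solver can rank candidate nodes is by failure-domain diversity relative to the replicas already placed. The dimensions and weights below are illustrative assumptions, not a spec:

```python
# Assumed diversity weights: a distinct region is worth more than a
# distinct zone, which is worth more than a distinct rack.
WEIGHTS = {"region": 4, "zone": 2, "rack": 1}

def placement_score(candidate: dict, existing: list[dict]) -> int:
    """Higher score = candidate shares fewer failure domains with existing replicas."""
    score = 0
    for dim, weight in WEIGHTS.items():
        if all(candidate[dim] != replica[dim] for replica in existing):
            score += weight
    return score

existing = [{"region": "us-east", "zone": "a", "rack": "r1"}]
same_rack = {"region": "us-east", "zone": "a", "rack": "r1"}
new_zone = {"region": "us-east", "zone": "b", "rack": "r7"}
print(placement_score(same_rack, existing))  # shares every domain -> 0
print(placement_score(new_zone, existing))   # new zone + new rack -> 3
```

The cross-region trade-off shows up here: maximizing the region term buys survivability at the cost of 50-100 ms of extra write latency per quorum round-trip.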
High Level Design
Range Split and Merge
TRICKY
Split at 512 MB using the median key by data volume. Two-phase: in-flight queries are served by the old range until the atomic metadata swap via Raft.
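"Median key by data volume" means the key where cumulative bytes reach half the range's data, so each post-split half carries roughly equal load. A sketch under the assumption that per-key sizes are available from range statistics:

```python
def pick_split_key(key_sizes: list[tuple[str, int]]) -> str:
    """Return the key where cumulative bytes first reach half the total.

    key_sizes: (key, bytes) pairs sorted by key, e.g. from range stats.
    """
    total = sum(size for _, size in key_sizes)
    running = 0
    for key, size in key_sizes:
        running += size
        if running >= total / 2:
            return key

# Splitting by count would pick "b"/"c" regardless of size; splitting by
# volume balances bytes: 50 of 100 bytes accumulate by key "b".
print(pick_split_key([("a", 10), ("b", 40), ("c", 10), ("d", 40)]))
```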
Core Feature Design
Cluster Topology
EASY
10K nodes register via heartbeat. Decommission drains ranges before removal (~60 min). Gossip for discovery, Raft for authority.
Replication and Fault Tolerance