Key-Value Store
COMMONKey-value store design is asked at Amazon, Google, and Meta because it tests storage engines, replication, and consistency trade-offs in one problem. It is the design behind DynamoDB, Cassandra, and Riak: the systems holding carts, sessions, and device state that must never refuse a write. You will design an LSM storage engine that sustains 10K writes/sec per node by making every disk write sequential, a consistent-hash ring that places 100 billion keys with no metadata service, and tunable quorums (R + W > N) with hinted handoff, vector clocks, and Merkle-tree repair.
- Design an LSM storage engine (WAL + memtable + SSTable) that sustains 10K writes/sec per node
- Apply quorum math (R + W > N) and explain why overlap guarantees intersection, not agreement
- Choose between LWW and vector clocks, and name the clock-skew failure that silently loses writes
Visual Solutions
Step-by-step animated walkthroughs with capacity estimation, API design, database schema, and failure modes built in.
Cheat Sheet
Key concepts, trade-offs, and quick-reference notes for KV Store. Everything you need at a glance.
Anti-Patterns
Common design mistakes candidates make. Wrong approaches vs correct approaches for each trap.
Failure Modes
What breaks in production, how to detect it, and how to fix it. Detection metrics, mitigations, and severity ratings.
Start simple. Build to staff-level.
“I would design a Dynamo-class key-value store holding 100 TB (100 billion keys) on 500 nodes, serving 20M reads and 5M writes per second. The LSM engine makes every disk write sequential: WAL append plus memtable insert, flushed as immutable SSTables, giving 10K writes/sec per node at the cost of 10x compaction amplification, with bloom filters (10 bits/key, 1% false positives) keeping reads to one seek. A consistent-hash ring with 256 vnodes places keys and N=3 replicas with no metadata service. Quorums tune consistency per request: R+W>N for fresh reads. Sloppy quorums with hinted handoff keep writes flowing through failures, and read repair plus weekly Merkle anti-entropy converge replicas afterward.”
LSM Trees
TRICKYWAL + memtable + sequential SSTable flushes: 10K writes/sec/node, paid for in compaction
Core Key-Value DesignRing Placement with Vnodes
STANDARD256 vnodes smooth variance under 10%; preference lists give replication for free
High Level System DesignQuorum Reads/Writes
TRICKYR + W > N guarantees intersection, not agreement: versions and repair close the gap
High Level System DesignLWW vs Vector Clocks
TRICKYClock skew makes LWW lose writes silently; vector clocks surface siblings to merge
Replication and Fault ToleranceHinted Handoff
STANDARDAlways-writable through failures: the next ring node holds a deliver-on-return hint
Replication and Fault ToleranceRead Repair + Anti-Entropy
STANDARDThree repair horizons: read repair (seconds), hints (minutes), Merkle trees (weekly)
Replication and Fault ToleranceBloom Filters
EASY10 bits/key, ~1% false positives, zero false negatives: skip SSTables safely
Core Key-Value DesignCompaction Strategies
TRICKYSize-tiered ~5x vs leveled ~10x write amplification; debt degrades every read
Core Key-Value Design