Anti-Patterns

KV Store Anti-Patterns

Common design mistakes candidates make. Learn what goes wrong and how to avoid each trap in your interview.

B-Tree Engine for a Write-Heavy Workload

Very Common

Choosing an update-in-place B-tree store for millions of writes per second, then scaling the node count to compensate for its limited random-I/O throughput.

Why: B-trees are the familiar default from relational databases, and the random-write ceiling only appears under sustained production load.

WRONG: PostgreSQL-style storage for 5M writes/sec. Each write is a random page update; checkpointing thrashes, p99 spikes, and the fleet balloons to 10x the nodes an LSM design needs.

RIGHT: LSM engine: WAL append + memtable insert, sequential SSTable flushes, background compaction. 10K+ writes/sec per node on the same hardware; pay with read amplification and compaction I/O instead.
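The write path named above (WAL append, memtable insert, sequential flush) can be sketched in a few lines. This is a toy under stated assumptions, not a production engine: `TinyLSM`, `memtable_limit`, and the JSON file layout are hypothetical names chosen for illustration, and compaction is left out entirely.

```python
import json
import os
import tempfile


class TinyLSM:
    """Toy LSM write path: WAL append, memtable insert, sorted flush."""

    def __init__(self, data_dir, memtable_limit=3):
        self.data_dir = data_dir
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)
        self.memtable = {}
        self.sstables = []  # paths of flushed files, oldest first
        self.wal = open(os.path.join(data_dir, "wal.log"), "a")

    def put(self, key, value):
        # 1. Sequential append to the write-ahead log for durability.
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()
        # 2. In-memory insert; no on-disk page is updated in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # 3. Write the memtable as one sorted, immutable file (sequential I/O).
        name = f"sstable-{len(self.sstables):04d}.json"
        path = os.path.join(self.data_dir, name)
        with open(path, "w") as f:
            json.dump(sorted(self.memtable.items()), f)
        self.sstables.append(path)
        self.memtable = {}
        # The flushed entries are now covered by the SSTable, so reset the WAL.
        self.wal.truncate(0)

    def get(self, key):
        # Reads pay for the cheap writes: memtable first, then newest SSTable first.
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstables):
            with open(path) as f:
                for k, v in json.load(f):
                    if k == key:
                        return v
        return None


demo = TinyLSM(tempfile.mkdtemp(), memtable_limit=2)
demo.put("a", 1)
demo.put("b", 2)   # second entry reaches the limit and triggers a flush
demo.put("a", 3)   # newer value lives in the memtable and shadows the SSTable
```

Note that `get` may open every flushed file: that is the read amplification the RIGHT answer says you pay in exchange for sequential writes.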

Assuming R=1, W=1 Reads Are Fresh

Very Common

Running minimal quorums for speed while the application logic quietly assumes read-your-writes consistency.

Why: R=1/W=1 passes every test on a healthy cluster because replication lag is milliseconds; the staleness only surfaces under load or failure.

WRONG: Write a value with W=1, read it back with R=1 from a different replica, get the old value, and file a bug against the database. With N=3, R + W = 1 + 1 = 2, which is not greater than 3, so the read and write quorums are not guaranteed to overlap.

RIGHT: Apply the inequality deliberately: R + W > N where freshness matters (for N=3, R=2 and W=2 give 2 + 2 > 3), document where it does not, and use same-coordinator session semantics for read-your-writes flows.
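The inequality is small enough to check in code when reviewing a configuration. A minimal sketch; the function names are illustrative and not taken from any particular database client:

```python
def quorum_overlaps(n, r, w):
    """True when every read quorum must intersect every write quorum (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


def guaranteed_overlap(n, r, w):
    """Minimum number of replicas shared by any read set and any write set."""
    return max(0, r + w - n)


# N=3 examples from the text above.
fast_but_stale = quorum_overlaps(3, 1, 1)   # False: 1 + 1 is not > 3
fresh = quorum_overlaps(3, 2, 2)            # True: 2 + 2 > 3
```

An overlap of zero means a read can land entirely on replicas the latest write has not reached yet, which is exactly the stale read in the WRONG case.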

LWW as the Universal Conflict Strategy

Common

Configuring last-writer-wins for every table because it never surfaces conflicts to the application.

Why: LWW makes conflicts invisible, which feels like making them solved. Vector clocks require application merge logic nobody wants to write.

WRONG: Shopping-cart table on LWW. A partition splits traffic, two adds land on opposite sides, clocks skew by 800ms, and the losing add silently vanishes. No error, no log: just a missing item and an angry customer.

RIGHT: Classify per table: LWW where a lost concurrent write is tolerable (caches, presence, metrics); vector clocks with application merge or CRDTs where it is not (carts, documents, counters). Alert on coordinator clock skew past 100ms.
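The shopping-cart failure can be reproduced in a few lines. This sketch compares last-writer-wins with two alternatives: a vector-clock concurrency check and a grow-only set merge. The grow-only set is a deliberate simplification (a real cart that supports removal needs something like an observed-remove set), and all names and timestamps here are made up for illustration.

```python
def lww_merge(a, b):
    """Last-writer-wins over (timestamp, value) pairs: larger timestamp survives."""
    return a if a[0] >= b[0] else b


def concurrent(vc_a, vc_b):
    """True when neither vector clock dominates the other (a real conflict)."""
    nodes = set(vc_a) | set(vc_b)
    a_ahead = any(vc_a.get(n, 0) > vc_b.get(n, 0) for n in nodes)
    b_ahead = any(vc_b.get(n, 0) > vc_a.get(n, 0) for n in nodes)
    return a_ahead and b_ahead


def gset_merge(a, b):
    """Grow-only set CRDT merge: set union, so no add is ever dropped."""
    return a | b


base = frozenset({"book"})
# Two adds land on opposite sides of a partition; side B's clock runs 800 ms ahead.
side_a = (1000.0, base | {"lamp"})
side_b = (1000.8, base | {"mug"})

lww_cart = lww_merge(side_a, side_b)[1]        # the lamp silently vanishes
crdt_cart = gset_merge(side_a[1], side_b[1])   # both adds survive
conflict_seen = concurrent({"A": 2, "B": 1}, {"A": 1, "B": 2})
```

LWW picks a winner by timestamp alone, so it cannot tell a genuine overwrite from a concurrent add. The vector-clock check at least surfaces the conflict, and the set merge resolves it without application code.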

Ignoring Read Amplification Until p99 Burns

Common

Treating LSM reads as free, skipping bloom-filter tuning and SSTable-count monitoring while tables accumulate.

Why: Fresh clusters have few SSTables, so reads are fast on day one; amplification compounds silently as data and flush count grow.

WRONG: Reads fan out across 40 accumulated SSTables, each a potential seek. p99 read latency climbs from 4ms to 60ms over a month with no code change, and the team blames the network.

RIGHT: Bloom filters at ~10 bits/key on every SSTable (1% FP, no false negatives), sparse block indexes in RAM, and SSTables-consulted-per-read as a first-class dashboard metric with compaction keeping the count bounded.
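The "~10 bits/key gives about 1% false positives" figure follows from the standard Bloom filter approximation (1 - e^(-k/b))^k, where b is bits per key and k is the number of hash functions. A short sketch, assuming the usual optimal choice k ≈ b · ln 2; the helper names are illustrative:

```python
import math


def bloom_fp_rate(bits_per_key, num_hashes=None):
    """Approximate false-positive rate of a Bloom filter."""
    if num_hashes is None:
        # Optimal hash count is about bits_per_key * ln(2).
        num_hashes = max(1, round(bits_per_key * math.log(2)))
    return (1.0 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


def expected_wasted_probes(num_sstables, fp_rate):
    """Expected SSTables read for nothing when the key lives in at most one of them."""
    return max(0, num_sstables - 1) * fp_rate


fp = bloom_fp_rate(10)                       # a little under 1%
wasted_with = expected_wasted_probes(40, fp)   # well under one extra probe per read
wasted_without = expected_wasted_probes(40, 1.0)  # no filter: every table is probed
```

With 40 SSTables and no filters, a point read may touch 39 tables that do not hold the key. With filters at 10 bits/key, the expected number of wasted probes drops below one.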

Letting Compaction Fall Behind

Common

Throttling or under-provisioning compaction to protect foreground traffic, accumulating unmerged SSTables as invisible debt.

Why: Compaction I/O looks like overhead you can defer; the damage (read amplification, disk pressure, immortal tombstones) accrues off any dashboard people watch.

WRONG: Cap compaction at 10% of disk bandwidth during a traffic surge. Pending compactions climb into the thousands, reads touch dozens of tables, disks fill with dead data, and deletes stop deleting.

RIGHT: Treat compaction throughput as capacity, not overhead: provision disks for ~10x write amplification, alert on pending-compaction growth rate, and shed ingest (backpressure) before shedding compaction.
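The three parts of that answer (provisioning, growth-rate alerting, backpressure) reduce to simple arithmetic. A rough sketch; the thresholds, defaults, and function names are hypothetical, and the linear throttle is just one possible policy:

```python
def required_disk_write_mb_s(ingest_mb_s, write_amplification=10.0):
    """Disk write bandwidth needed once compaction rewrites are counted."""
    return ingest_mb_s * write_amplification


def pending_growth_per_min(samples):
    """Growth rate of pending compactions from (unix_seconds, pending) samples."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (p1 - p0) * 60.0 / (t1 - t0)


def admitted_ingest_fraction(pending, soft_limit=100, hard_limit=1000):
    """Backpressure: shed ingest linearly as the compaction backlog grows."""
    if pending <= soft_limit:
        return 1.0
    if pending >= hard_limit:
        return 0.0
    return (hard_limit - pending) / (hard_limit - soft_limit)


disk_budget = required_disk_write_mb_s(50)                # 50 MB/s ingest
backlog_slope = pending_growth_per_min([(0, 10), (120, 70)])
admit = admitted_ingest_fraction(550)
```

Alerting on the slope rather than the absolute backlog catches the problem while it is still cheap to fix: a positive rate sustained over several samples means compaction is losing to ingest.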

One Celebrity Key Melting Its Preference List

Common

Assuming ring placement means load balance: a single viral key routes all its traffic to the same N=3 replicas.

Why: Vnodes smooth KEY distribution, not REQUEST distribution. Nothing in the ring prevents one key from being 1000x hotter than the rest.

WRONG: A flash-sale item's inventory key takes 500K reads/sec. Its three replicas saturate while 497 nodes idle; the coordinator queue backs up and unrelated keys on those replicas suffer collateral latency.

RIGHT: Detect per-key heat (top-K samplers), cache hot keys at the coordinator layer with short TTLs, and shard writable hot keys (key#1..key#8 with scatter-gather) or move flash-sale counters to a purpose-built counter service.
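The key#1..key#8 scheme can be sketched as follows. A plain dict stands in for the KV store, and `NUM_SHARDS`, `shard_key`, `increment`, and `read_total` are illustrative names rather than any store's API. The assumption is that each sub-key hashes to a different ring position, so writes spread over more than one preference list.

```python
import random

NUM_SHARDS = 8  # matches the key#1..key#8 example above


def shard_key(key, shard):
    """Sub-key name for one shard of a hot key."""
    return f"{key}#{shard}"


def increment(store, key, delta=1, rng=random):
    """Write path: pick one shard at random so no single sub-key takes all writes."""
    sub_key = shard_key(key, rng.randint(1, NUM_SHARDS))
    store[sub_key] = store.get(sub_key, 0) + delta


def read_total(store, key):
    """Read path: scatter to every shard, then gather by summing."""
    return sum(store.get(shard_key(key, s), 0) for s in range(1, NUM_SHARDS + 1))


store = {}
for _ in range(1000):
    increment(store, "inventory:flash-item")
total = read_total(store, "inventory:flash-item")
```

The trade is explicit: writes get up to 8x more replicas to land on, while every read now costs eight lookups, which is why the read side is usually paired with the short-TTL cache.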

Forgetting That Deletes Are Writes

Occasional

Treating DELETE as space reclamation, when in an LSM store it writes a tombstone that must outlive every stale copy of the key.

Why: In B-tree stores, deletes remove data. In LSM stores they ADD data (tombstones) that only compaction can retire, after a safety window (gc_grace) that exists so a rejoining replica cannot resurrect deleted data.

WRONG: A queue-like workload writes and deletes millions of rows per hour. Ranges fill with tombstones; reads scan thousands of dead markers per row returned; disk usage grows while the dataset "shrinks".

RIGHT: Model delete-heavy workloads differently: TTL-based expiry instead of explicit deletes where possible, partition design that lets whole SSTables age out, tombstone-ratio monitoring per range, and gc_grace set longer than the interval between repairs, so every replica sees a tombstone before compaction is allowed to drop it.
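The relationship between gc_grace, repair cadence, and tombstone load can be written down as three small checks. A sketch with hypothetical helper names; the 10-day and 7-day figures are example values only:

```python
DAY = 24 * 60 * 60


def tombstone_purgeable(deleted_at, now, gc_grace_seconds):
    """Compaction may drop a tombstone only after the grace window has passed."""
    return now - deleted_at >= gc_grace_seconds


def gc_grace_is_safe(gc_grace_seconds, repair_interval_seconds):
    """Every replica must be repaired at least once inside the grace window;
    otherwise a replica that missed the delete can resurrect the old value."""
    return gc_grace_seconds > repair_interval_seconds


def tombstone_ratio(live_cells, tombstones):
    """Fraction of scanned cells that are dead markers."""
    total = live_cells + tombstones
    return tombstones / total if total else 0.0


safe = gc_grace_is_safe(10 * DAY, 7 * DAY)      # weekly repair, 10-day grace
unsafe = gc_grace_is_safe(1 * DAY, 7 * DAY)     # grace shorter than repair cadence
queue_ratio = tombstone_ratio(live_cells=1_000, tombstones=9_000)
```

A ratio like the last one means nine dead markers are scanned for every live cell returned, which is the queue-like workload described in the WRONG case.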

Unthrottled Repair After an Outage

Occasional

Kicking off full anti-entropy and hint delivery at maximum speed the moment a failed node returns.

Why: After an outage, the instinct is to restore redundancy as fast as possible; nobody budgets for repair traffic in capacity planning.

WRONG: A rack comes back after 4 hours. Every neighbor simultaneously streams hints and Merkle diffs at full bandwidth; foreground p99 triples and the "recovery" pages more people than the outage did.

RIGHT: Rate-limit hint delivery and repair streaming (fixed MB/sec budget per node), stagger range repairs, and prioritize by divergence age. Redundancy restored over an hour beats a second incident triggered within a minute.
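A fixed per-node bandwidth budget is commonly enforced with a token bucket. The sketch below is one way to do it, not how any particular database implements repair throttling; `TokenBucket` and its parameters are illustrative, and the clock is passed in explicitly so the behavior is easy to test.

```python
class TokenBucket:
    """Token bucket: refills at a fixed rate, caps bursts at `burst_bytes`."""

    def __init__(self, rate_bytes_per_sec, burst_bytes, now=0.0):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = now

    def try_send(self, nbytes, now):
        """Return True and spend tokens if `nbytes` fits the budget at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should wait and retry; foreground traffic keeps its share


MB = 1024 * 1024
repair_budget = TokenBucket(rate_bytes_per_sec=10 * MB, burst_bytes=20 * MB)
first = repair_budget.try_send(20 * MB, now=0.0)    # burst allowance is spent
second = repair_budget.try_send(1 * MB, now=0.0)    # bucket is empty: rejected
third = repair_budget.try_send(10 * MB, now=1.0)    # one second of refill
```

Each streaming sender (hints, Merkle-diff ranges) would draw from the same bucket, so total repair traffic per node stays under the budget no matter how many neighbors start streaming at once.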