
Compaction and Write Amplification

Every memtable flush creates another SSTable. Left alone, a node accumulates hundreds: reads fan out wider, bloom-filter RAM grows, deleted data (tombstones) never actually dies, and overwritten values occupy disk forever. Compaction is the background process that merges SSTables, keeping only the newest version of each key and dropping expired tombstones.
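The merge step itself can be sketched as a k-way merge over sorted SSTables. This is a minimal illustration under stated assumptions, not any engine's actual implementation: each SSTable is modeled as a Python list of hypothetical `(key, seqno, value)` entries sorted by key, a `value` of `None` stands for a tombstone, and `seqno` doubles as the write timestamp used to decide whether a tombstone has expired under a hypothetical `gc_grace` window.

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical entry layout: (key, seqno, value). value=None marks a tombstone.
# seqno also serves as the write timestamp in this sketch (an assumption).
Entry = Tuple[str, int, Optional[str]]


def compact(sstables: List[List[Entry]], now: int, gc_grace: int) -> List[Entry]:
    """Merge key-sorted SSTables, keep only the newest version of each key,
    and drop tombstones older than gc_grace.

    Dropping a tombstone is only safe when this merge covers every SSTable
    that could still hold an older version of that key.
    """
    # heapq.merge streams all inputs in (key, -seqno) order, so for each key
    # the newest version is yielded first.
    merged = heapq.merge(*sstables, key=lambda e: (e[0], -e[1]))
    out: List[Entry] = []
    last_key = None
    for key, seqno, value in merged:
        if key == last_key:
            continue  # older version shadowed by a newer write or tombstone
        last_key = key
        if value is None and now - seqno > gc_grace:
            continue  # expired tombstone: the delete finally takes effect on disk
        out.append((key, seqno, value))
    return out
```

For example, merging `[("a", 1, "x"), ("b", 2, "y"), ("c", 3, "z")]` with a newer table `[("a", 10, "x2"), ("b", 11, None)]` at `now=100, gc_grace=50` keeps `("a", 10, "x2")` and `("c", 3, "z")`: the old `a` is shadowed, and `b` disappears entirely because its tombstone has expired.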
Compaction is also where the LSM design pays its bill: every byte a client writes is rewritten several times as it moves through compaction, a multiplier called write amplification.
Two mainstream strategies trade different resources. Size-tiered compaction (Cassandra's default) merges similarly sized SSTables into bigger ones. It has the lowest write amplification (~4-6x) and suits write-heavy loads, but a key can exist in many tiers at once: reads touch more tables, and space overhead can transiently hit 2x during a big merge.
Leveled compaction (RocksDB, LevelDB) organizes SSTables into levels, each 10x larger than the previous, with non-overlapping key ranges within each level. A read therefore touches at most one SSTable per level (tight read amplification, predictable p99) and space overhead stays around 10%, but write amplification climbs to ~10x because data is rewritten as it cascades down the levels. Our numbers make it concrete: 5 GB/sec of logical ingest across the cluster becomes ~50 GB/sec of physical compaction I/O under leveled compaction. The cluster's disks spend most of their bandwidth on housekeeping, and that is by design.
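The trade-off between the two strategies reduces to back-of-envelope arithmetic. The sketch below hard-codes the rough multipliers quoted in the text (~5x vs ~10x write amplification, 2x vs ~10% worst-case space) as assumptions; the function names are hypothetical, and real values depend on workload, fanout, and tuning.

```python
# Rough multipliers quoted in the text; real values vary with workload and tuning.
WRITE_AMP = {"size_tiered": 5.0, "leveled": 10.0}
# Worst-case disk footprint relative to live data:
# ~2x transiently for size-tiered, ~10% overhead for leveled.
SPACE_FACTOR = {"size_tiered": 2.0, "leveled": 1.1}


def compaction_io_gb_s(logical_ingest_gb_s: float, strategy: str) -> float:
    """Physical compaction I/O implied by a logical ingest rate."""
    return logical_ingest_gb_s * WRITE_AMP[strategy]


def disk_needed_gb(live_data_gb: float, strategy: str) -> float:
    """Worst-case disk to provision for a given amount of live data."""
    return live_data_gb * SPACE_FACTOR[strategy]


def level_capacities_gb(first_level_gb: float, num_levels: int, fanout: int = 10) -> list:
    """Target size of each level when every level is `fanout` times the previous one."""
    return [first_level_gb * fanout ** i for i in range(num_levels)]
```

With the cluster's 5 GB/sec ingest, `compaction_io_gb_s(5.0, "leveled")` gives 50.0 GB/sec and the size-tiered figure is 25.0 GB/sec. `level_capacities_gb(1.0, 4)` shows why leveled data is rewritten so often: the levels grow as 1, 10, 100, 1000 GB, and a byte is rewritten each time it moves down one of them.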
The operational failure mode is compaction debt: if ingest outruns compaction (disks too slow for the write-amplified load, or compaction throttled too hard), SSTables pile up, read latency degrades cluster-wide, and tombstones are never purged, so deletes stop reclaiming space. Monitoring pending compactions and SSTables consulted per read is as important as watching CPU.
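Compaction debt can be reasoned about with a simple linear model: debt grows whenever ingest multiplied by write amplification exceeds the compaction bandwidth the disks can actually deliver. This is a hypothetical back-of-envelope sketch, not a real monitoring API; the function names and the alert threshold are assumptions, and the model ignores second-order effects such as reads slowing down as debt accumulates.

```python
from typing import Optional


def debt_growth_gb_s(ingest_gb_s: float, write_amp: float,
                     compaction_capacity_gb_s: float) -> float:
    """Rate at which pending compaction work grows (negative: compaction is catching up)."""
    return ingest_gb_s * write_amp - compaction_capacity_gb_s


def seconds_until_limit(pending_gb: float, limit_gb: float, ingest_gb_s: float,
                        write_amp: float, compaction_capacity_gb_s: float) -> Optional[float]:
    """Time until pending compaction work crosses a (hypothetical) alert threshold,
    or None if compaction keeps up and the threshold is never reached."""
    growth = debt_growth_gb_s(ingest_gb_s, write_amp, compaction_capacity_gb_s)
    if growth <= 0:
        return None  # debt is flat or draining
    # Already over the limit means the alert should fire now.
    return max(0.0, (limit_gb - pending_gb) / growth)
```

For instance, 5 GB/sec of ingest at 10x write amplification against only 45 GB/sec of compaction bandwidth leaves a 5 GB/sec shortfall, so `seconds_until_limit(0, 3000, 5, 10, 45)` returns 600.0: ten minutes until a 3,000 GB backlog, long before CPU graphs look unusual.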
What if the interviewer asks which strategy we should use? Leveled for read-latency-sensitive ranges, size-tiered for ingest-heavy ranges; per-table strategy selection is exactly the knob real systems expose.
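In Cassandra the per-table knob is the `compaction` table option, whose `class` can be `LeveledCompactionStrategy` or `SizeTieredCompactionStrategy`. The sketch below simply renders that choice as a CQL `ALTER TABLE` statement; the helper names, the boolean workload flag, and the keyspace and table names are hypothetical, and a real deployment would also tune strategy-specific sub-options.

```python
def compaction_class(read_latency_sensitive: bool) -> str:
    """Pick a Cassandra compaction strategy class for a table's workload."""
    if read_latency_sensitive:
        return "LeveledCompactionStrategy"  # at most one SSTable per level on reads
    return "SizeTieredCompactionStrategy"   # lowest write amplification for ingest


def alter_table_cql(keyspace: str, table: str, read_latency_sensitive: bool) -> str:
    """Render the per-table compaction setting as a CQL statement."""
    cls = compaction_class(read_latency_sensitive)
    # Doubled braces in the f-string produce the literal CQL map braces.
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{'class': '{cls}'}};"
```

For example, `alter_table_cql("app", "profiles", True)` yields `ALTER TABLE app.profiles WITH compaction = {'class': 'LeveledCompactionStrategy'};`, while an ingest-heavy events table would get the size-tiered class.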
Why it matters in interviews
Write amplification is the hidden cost that makes LSM vs B-tree a real trade instead of a free win. Quoting size-tiered ~5x vs leveled ~10x, and naming compaction debt as the operational failure, demonstrates production experience with storage engines.