EASYwalkthrough

Bloom Filters and the Read Path

7 of 8

3 related

A read arrives for a key. The LSM engine must check the memtable, then potentially every SSTable on disk, newest to oldest, because the key could live in any of them.

250 MB

of memory for the filter

The fix costs 10 bits per key. A bloom filter is a bit array plus k hash functions; inserting a key sets k bits, and a lookup checks those k bits.

“With 10+ SSTables per node, a naive read costs 10+ disk seeks: this is read amplification, the tax LSM pays for its write speed.”

If ANY bit is zero, the key is definitely not present: skip that SSTable without touching disk. If all k bits are set, the key is probably present: a false positive rate of about 1% at 10 bits per key with the optimal k of 7.

There are no false negatives, which is the property that makes skipping safe. Each SSTable carries its own bloom filter, loaded in RAM: a typical read then touches the memtable, gets "no" from 9 filters in nanoseconds, and does one real disk read against the single SSTable that probably holds the key.

Inside that SSTable, a sparse index (one entry per 64 KB block, also in RAM) narrows the seek to one block. The budget at our scale: 200M keys per node across SSTables at 10 bits per key is ~250 MB of RAM per node for filters, buying a ~10x reduction in read I/O: the best RAM-for-IO trade in the whole design.

The limits: bloom filters do nothing for range scans (no membership question to ask) and false positives still cost a wasted seek 1% of the time; sizing bits-per-key is a tunable (RocksDB defaults near 10). What if the interviewer asks: what happens as SSTables accumulate?

Filters multiply and so do false-positive chances; that is compaction's other job: fewer, larger SSTables keep the filter count and read fan-out bounded.

Why it matters in interviews

Bloom filters are the canonical probabilistic data structure question hiding inside the KV topic. Quoting the numbers (10 bits/key, ~1% FP, no false negatives) and placing the filter at the exact point in the read path shows precision that generic "use a bloom filter" answers lack.

Related concepts

← PreviousRead Repair and Merkle-Tree Anti-Entropy Next →Compaction and Write Amplification