
Offline Builds and Immutable Index Snapshots

The precomputed index has to come from somewhere, and the answer is a pipeline that never touches the serving path. It runs in four stages.

Collection: every executed search logs an event to Kafka with the query text, locale, and timestamp, plus (later) whether a suggestion was clicked. Individual keystrokes are not logged, because they are noise. At 5B searches/day that is ~58K events/sec, which is trivial ingest.

Aggregation: a batch job (hourly or daily) computes each query's score. It starts from raw frequency, applies time decay so last week outweighs last year, and weights by click-through rate so suggestions people actually choose rise above suggestions people scroll past. The job also filters queries against the blocklist, then deduplicates and normalizes them (casing, whitespace, obvious typo folding).

Build: from the scored corpus, generate every prefix's top-10 and write the result as an immutable snapshot, meaning a versioned, checksummed artifact in object storage, sharded to match the serving topology.

Ship: serving nodes download their shard of snapshot N+1 in the background, warm it in RAM alongside N, then atomically swap a pointer. Promotion is instant, rollback is the same pointer swap back to N, and a corrupted build can never partially overwrite a good one.
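The Aggregation and Build stages can be sketched in a few lines of Python. This is a toy, in-memory illustration and not the course's actual job: the 7-day half-life, the smoothed click-through formula, the event tuple layout, and the names `normalize`, `score_queries`, and `build_prefix_index` are all assumptions made for the example, and typo folding is left out.

```python
from collections import defaultdict

HALF_LIFE_DAYS = 7.0  # assumed: a search loses half its weight every 7 days
TOP_K = 10            # top-10 suggestions per prefix, as in the text


def normalize(query):
    """Fold casing and collapse whitespace (typo folding is omitted here)."""
    return " ".join(query.lower().split())


def score_queries(events, now_day, blocklist):
    """events: iterable of (query_text, day_number, clicked) tuples.

    Returns {normalized_query: score}, where score is the time-decayed
    frequency multiplied by a smoothed click-through rate.
    """
    decayed = defaultdict(float)
    searches = defaultdict(int)
    clicks = defaultdict(int)
    for text, day, clicked in events:
        query = normalize(text)
        if query in blocklist:
            continue  # blocked queries never reach the index
        age_days = now_day - day
        decayed[query] += 0.5 ** (age_days / HALF_LIFE_DAYS)
        searches[query] += 1
        clicks[query] += int(clicked)
    # Assumed smoothing: (clicks + 1) / (searches + 2) keeps rare queries
    # from scoring an extreme 0% or 100% click-through rate.
    return {
        q: decayed[q] * (clicks[q] + 1) / (searches[q] + 2)
        for q in decayed
    }


def build_prefix_index(scores, k=TOP_K):
    """Map every prefix of every query to its k best queries, best first."""
    buckets = defaultdict(list)
    for query, s in scores.items():
        for end in range(1, len(query) + 1):
            buckets[query[:end]].append((s, query))
    return {
        prefix: [q for _, q in sorted(items, key=lambda t: (-t[0], t[1]))[:k]]
        for prefix, items in buckets.items()
    }


events = [
    ("Cats", 10, True),
    ("cats ", 10, False),  # normalizes to the same query as "Cats"
    ("car", 10, True),
    ("cab", 3, False),     # a week old, so it counts half
    ("bad", 10, True),     # removed by the blocklist
]
scores = score_queries(events, now_day=10, blocklist={"bad"})
index = build_prefix_index(scores)
print(index["ca"])
```

The resulting dictionary is what a real build would serialize, checksum, and shard before writing it to object storage.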
The trade-off is staleness by construction. The world can change between builds, which is precisely the gap the trending overlay covers: the batch layer owns stable popularity, the overlay owns the last few minutes, and neither contaminates the other.

This is the SSTable lesson from the KV store wearing different clothes. Immutability is the deployment strategy: no in-place mutation of a live index, no locks between readers and builders, no torn state on crash.

What if the interviewer asks: why not update the index in real time as queries arrive?
Because a live-mutated 80 GB structure under 700K reads/sec needs concurrency control that costs more than it gives, and a bad update (a manipulation spike, a pipeline bug) becomes instantly irreversible. Batch + overlay gives freshness AND an undo button.
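The swap-and-rollback behavior can be illustrated with a minimal single-process sketch. The class `SnapshotStore`, its method names, and the JSON payload format are hypothetical choices for this example. It also relies on the fact that rebinding a reference is atomic in CPython; a real serving node would use its own runtime's atomic pointer primitive.

```python
import hashlib
import json


class SnapshotStore:
    """Holds the live index snapshot plus the previous one for rollback."""

    def __init__(self):
        self._active = None    # (version, index) currently being served
        self._previous = None  # last good snapshot, kept warm in memory

    def promote(self, version, payload, expected_sha256):
        """Verify and fully load a new snapshot, then swap it in."""
        if hashlib.sha256(payload).hexdigest() != expected_sha256:
            # A corrupted artifact is rejected before it can touch live state.
            raise ValueError(f"checksum mismatch for snapshot {version}")
        index = json.loads(payload)  # warm the new snapshot off to the side
        self._previous = self._active
        self._active = (version, index)  # the swap: one reference rebind

    def rollback(self):
        """Swap back to the previous snapshot."""
        if self._previous is None:
            raise RuntimeError("no previous snapshot to roll back to")
        self._active, self._previous = self._previous, self._active

    def active_version(self):
        return None if self._active is None else self._active[0]

    def suggest(self, prefix):
        snapshot = self._active  # read the pointer once per request
        if snapshot is None:
            return []
        _, index = snapshot
        return index.get(prefix, [])


def pack(index):
    """Serialize an index and compute the checksum a build would publish."""
    payload = json.dumps(index).encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()


store = SnapshotStore()
payload_1, sha_1 = pack({"ca": ["cats", "car"]})
payload_2, sha_2 = pack({"ca": ["car", "cab"]})

store.promote(1, payload_1, sha_1)
store.promote(2, payload_2, sha_2)
store.rollback()  # snapshot 2 turned out to be bad: serve snapshot 1 again
print(store.active_version(), store.suggest("ca"))
```

Because `suggest` reads the pointer once, each request sees either all of snapshot N or all of snapshot N+1, never a mix, and no lock sits between readers and the builder.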
Why it matters in interviews
The build/ship/swap pattern (immutable snapshot, atomic promotion, trivial rollback) is reusable across every precomputed-index system: feature stores, ranking models, embeddings. Tying it back to SSTable immutability shows the course's ideas composing.