Autocomplete System Design Walkthrough

Complete design walkthrough with animated diagrams, capacity math, API design, schema, and failure modes.

Solution Path (Target: 25 min)
We designed search autocomplete for 500M DAU: 700K requests/sec at peak, answered in under 100ms end-to-end. The key insight is to precompute every prefix's top-10 offline (200M entries, 80 GB of sharded RAM) so that serving is a single hash get; query-time ranking is abolished, not optimized. A client contract (debounce, 2-character minimum, sequence numbers) eliminates 60-70% of keystrokes, and Zipf-aligned browser and edge caching absorbs two-thirds of the rest. Freshness is layered: hourly immutable snapshot builds, promoted by a canaried pointer swap, plus a megabyte-scale decaying trending overlay that can add a capped number of slots but can never reorder. Quality is treated as a safety system: CTR-weighted scoring, build-time filters, and a 60-second kill switch.
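The client contract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the 150 ms debounce window, the function `requests_to_send`, and the class `SuggestionBox` are hypothetical names and values chosen for illustration; the source specifies only that the contract includes a debounce, a 2-character minimum, and sequence numbers.

```python
DEBOUNCE_MS = 150  # assumed value; the source does not give the window
MIN_CHARS = 2      # from the source: 2-character minimum


def requests_to_send(keystrokes, debounce_ms=DEBOUNCE_MS, min_chars=MIN_CHARS):
    """Given (timestamp_ms, text_after_keystroke) pairs in time order,
    return the (seq, text) requests a debounced client would send."""
    sent = []
    seq = 0
    for i, (ts, text) in enumerate(keystrokes):
        is_last = i + 1 == len(keystrokes)
        # A keystroke produces a request only if the user then pauses
        # for at least debounce_ms (or stops typing altogether).
        paused = is_last or keystrokes[i + 1][0] - ts >= debounce_ms
        if paused and len(text) >= min_chars:
            seq += 1
            sent.append((seq, text))
    return sent


class SuggestionBox:
    """Renders responses, dropping any that arrive out of order."""

    def __init__(self):
        self.latest_seq = 0
        self.shown = []

    def on_response(self, seq, suggestions):
        if seq < self.latest_seq:
            return False  # stale: a newer response is already on screen
        self.latest_seq = seq
        self.shown = suggestions
        return True


typed = [(0, "w"), (80, "we"), (160, "wea"),
         (400, "weat"), (450, "weath"), (900, "weathe")]
print(requests_to_send(typed))
```

In this toy trace, six keystrokes turn into three requests. The actual reduction depends on typing cadence and the chosen window, so the 60-70% figure should be read as the source's estimate, not something this sketch derives.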
1. What is Autocomplete?

Type "we" into a search box and ranked predictions appear before your finger reaches the next key. The interaction is so familiar that it reads as trivial, yet it is one of the highest-QPS, lowest-latency, most adversarially watched endpoints in consumer software.
Three constraints define it, and they pull in different directions. Latency: the suggestion must land inside ~100 milliseconds, the threshold at which interfaces feel instantaneous, and the server's slice of that is under 10ms, which immediately rules out computing anything interesting at query time. Freshness: popularity is not static. A concert announcement rewrites the right answer for "taylor" within one minute, so an index frozen at build time needs a living edge. Quality: the box speaks first, in the product's voice, to everyone. One offensive completion, or one name paired with an accusation, and the feature is a headline, so ranking is entangled with safety in a way most systems never face. What makes the topic a great interview question is that all three constraints resolve through one architectural idea: precompute every prefix's answer offline (latency), layer a small decaying overlay on top for the last few minutes (freshness), and gate the pipeline with scoring, filters, and a serve-time kill switch (quality).
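The serve path implied by that idea can be sketched in a few lines. This is a minimal illustration, not the design's actual code: the function `suggest`, the cap of two overlay slots, and the choice to place trending entries in the trailing slots are all assumptions. The source states only that the precomputed answer is fetched with one hash get, that the overlay is small, and that a kill switch applies at serve time.

```python
MAX_OVERLAY_SLOTS = 2  # assumed cap on how many trending entries may appear


def suggest(prefix, snapshot, overlay, blocked, k=10,
            max_overlay_slots=MAX_OVERLAY_SLOTS):
    """Return up to k suggestions for a prefix.

    snapshot: dict prefix -> ranked list, built offline (immutable)
    overlay:  dict prefix -> trending completions from the last minutes
    blocked:  set of completions disabled by the kill switch
    """
    base = snapshot.get(prefix, [])  # the single hash get

    # Trending entries not already present, limited to the slot cap.
    extras = [s for s in overlay.get(prefix, []) if s not in base]
    extras = extras[:max_overlay_slots]

    # Assumption: overlay entries take trailing slots, so the relative
    # order of the precomputed entries is left untouched.
    merged = base[: k - len(extras)] + extras

    # Kill switch: filter at serve time, without waiting for a rebuild.
    return [s for s in merged if s not in blocked]


snapshot = {"ta": ["target", "taxes", "tacos"]}
overlay = {"ta": ["taylor swift tour", "target"]}
print(suggest("ta", snapshot, overlay, blocked={"taxes"}, k=3,
              max_overlay_slots=1))
```

Nothing here ranks at query time: the work is a dictionary lookup, a bounded merge, and a set-membership filter, which is what keeps the server's share of the latency budget small.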
The scale framing that surprises candidates: because every search emits several prefix requests, autocomplete out-QPSes the search engine it serves (by 2-3x even after aggressive client-side trimming), while its unit of work, one hash lookup in RAM, is a thousand times cheaper. It is the inverted image of most systems: enormous request volume, nearly weightless requests, and all the real machinery hidden in an offline pipeline and a safety layer.
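To make the offline side concrete, here is a toy version of the precompute step that turns scored queries into a prefix-to-top-k table. It is a sketch under assumptions: `build_snapshot` is a hypothetical name, the scores are taken as given numbers, and everything fits in one process's memory, whereas a pipeline covering 200M entries would presumably be a distributed batch job.

```python
import heapq
from collections import defaultdict


def build_snapshot(query_scores, k=10, min_chars=2):
    """Map every prefix (of at least min_chars) to its top-k queries.

    query_scores: dict query -> score (higher is better); how the score
    is computed is outside this sketch.
    """
    candidates = defaultdict(list)
    for query, score in query_scores.items():
        # Each query is a candidate for every one of its prefixes.
        for end in range(min_chars, len(query) + 1):
            candidates[query[:end]].append((score, query))

    # Keep only the k best per prefix, best first.
    return {
        prefix: [q for _, q in heapq.nlargest(k, scored)]
        for prefix, scored in candidates.items()
    }


scores = {"weather": 90.0, "web md": 70.0, "west elm": 40.0, "wendys": 10.0}
snapshot = build_snapshot(scores, k=2)
print(snapshot["we"])
```

Once a table like this exists, the online request reduces to `snapshot.get(prefix)`, which is the "one hash lookup in RAM" unit of work described above.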
Three competing constraints: latency (~100ms end-to-end, <10ms on the server, so no query-time compute), freshness (the right answer changes mid-afternoon), and quality (the box speaks first, so safety is part of ranking). One idea resolves them: precompute + overlay + gates. And autocomplete out-QPSes search itself by 2-3x, with requests a thousand times cheaper.
Search autocomplete is the interface that answers before the question is finished: by the second keystroke, ranked suggestions are on screen, and the whole loop (keystroke, network, lookup, render) fits inside ~100 milliseconds, faster than the next key falls. Under the box sit three constraints in tension. Latency: the compute budget per request is under 10ms, which rules out any query-time ranking. Freshness: the world changes mid-afternoon, and yesterday's index cannot suggest tonight's breaking news. Quality: the box speaks with the product's voice, and one offensive completion is a headline. The design resolves the tension with one central move: precompute the top-10 answer for every prefix, rebuild it continuously offline, and patch the last few minutes with a small trending overlay.
  • Scale: 500M DAU, ~10B suggest requests/day after the client contract trims keystrokes; 700K/sec peak budget
  • The key move: a precomputed top-10 per prefix (200M entries, ~80 GB), served with one hash get per request
  • Freshness is layered: immutable batch snapshots for stable truth, plus a decaying trending overlay for the last few minutes
  • Quality is a safety system: CTR-weighted scoring, build-time filtering, and a serve-time kill switch
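The headline numbers above can be sanity-checked with a few lines of arithmetic. The inputs are the figures quoted in this walkthrough; the derived quantities (average QPS, peak-to-average ratio, requests per user, bytes per entry) are not stated in the source and are computed here only as a consistency check, assuming 1 GB = 10^9 bytes.

```python
SECONDS_PER_DAY = 86_400

# Figures quoted in the walkthrough
dau = 500_000_000
requests_per_day = 10_000_000_000
peak_qps = 700_000
entries = 200_000_000
ram_bytes = 80 * 10**9  # assumes decimal gigabytes

# Derived quantities
avg_qps = requests_per_day / SECONDS_PER_DAY   # about 115,741 requests/sec
peak_to_avg = peak_qps / avg_qps               # about 6x headroom over average
requests_per_user = requests_per_day / dau     # 20 suggest requests per user per day
bytes_per_entry = ram_bytes / entries          # 400 bytes per prefix entry

print(f"average QPS:        {avg_qps:,.0f}")
print(f"peak / average:     {peak_to_avg:.2f}")
print(f"requests per user:  {requests_per_user:.0f}")
print(f"bytes per entry:    {bytes_per_entry:.0f}")
```

The 400 bytes per entry is a useful cross-check: it has to hold a prefix key plus ten completions, which suggests the stored completions are compact (for example IDs or short strings) rather than rich result objects.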