Whiteboard ScaleTopicsAutocomplete

Search Autocomplete

COMMON

Autocomplete design is asked at Google, Amazon, LinkedIn, and Meta because it hides a complete distributed system behind the most familiar UI element on the internet. It is how a suggest box answers 700K requests per second in under 100ms: faster than the next keystroke falls. You will abolish query-time ranking by precomputing every prefix's top-10 (200M entries, 80 GB of sharded RAM), delete 60-70% of traffic with a three-rule client contract, layer trending freshness over immutable hourly snapshots, and treat suggestion quality as the safety system it really is.

  • Abolish query-time ranking: precompute top-10 per prefix (10M queries -> 200M entries -> 80 GB)
  • Cut 60-70% of QPS with the client contract: debounce, 2-char minimum, sequence-numbered discard
  • Layer freshness: immutable snapshot builds + a capped trending overlay that adds but never reorders
GoogleAmazonLinkedInMetaUberAirbnb
8
Concepts
Deep dives
10
Cheat Items
Quick ref
Elevator Pitch3-minute interview summary

I would design autocomplete for 500M DAU serving 700K requests per second in under 100ms. The key move: abolish query-time ranking by precomputing every prefix's top-10 offline: 10M corpus queries become 200M prefix entries, about 80 GB of hash-sharded RAM, so serving is one hash get in microseconds. A client contract (150ms debounce, 2-character minimum, sequence-numbered stale discard) deletes 60-70% of keystrokes before any server, and Zipf-aligned edge caching absorbs two-thirds of the rest: origin sees about 200K/sec. Freshness layers hourly immutable snapshots (canaried pointer-swap promotion) with a megabytes-sized decaying trending overlay capped at 2-3 marked slots. Quality is a safety system: CTR-weighted scoring, build-time filters, and a 60-second kill switch.

Concepts Unlocked8 concepts in this topic