Autocomplete Cheat Sheet

Key concepts, trade-offs, and quick-reference notes for your interview prep.

Precompute the Answer, Not the Data

Query-time trie walks die on fat prefixes: "a" has hundreds of thousands of descendants to rank inside a 10ms budget. The move: store the final top-10 per prefix as a value. Lookup = one hash get, microseconds. Cost moves to build time and storage: 10M queries x ~20 prefixes each = 200M entries x ~400B = ~80 GB, sharded RAM, rebuilt offline. Same principle as feed fanout-on-write: when queries are enumerable and reads are latency-bound, move work to write time.

💡 k is a product constant (UI shows 8-10). Storing top-100 multiplies the index for invisible rows.

Client Contract: The 60-70% QPS Cut

Three client rules before any server exists: debounce 150-200ms (type "weather" = 7 keystrokes, 2-3 requests), minimum 2 characters (deletes the hottest, least useful tier), sequence numbers + cancellation (render a response only if its seq exceeds the last rendered: kills out-of-order flicker; AbortController cancels in-flight). Together: 60-70% of raw volume never sent. Bonus levers: prefetch-on-focus (zero perceived latency for keystroke one) and client-side recent-history merge (keeps the server index per-locale, not per-user).

💡 Debouncing does not slow users: the final prefix (the one they stop on) is the one that renders.

Zipf Head: Three Caches Before the Service

Top ~100K prefixes serve roughly half of all requests. Collect the skew: browser cache (Cache-Control 60-300s), CDN edge (responses are identical per (prefix, locale): perfectly cacheable GETs), in-process hot map (top ~1M entries pinned per node). Result: 700K/sec peak -> ~200K/sec at origin. Two-tier TTLs: trending-eligible short prefixes 30-60s, stable long prefixes 5 min. Guard hot-key expiry with edge request coalescing (one origin fetch per key), same as the maps tile-rollout pattern.

💡 No Redis between edge and service: the index is already RAM on the serving nodes; a cache there adds a hop, not speed.

Index Math: 10M Queries to 80 GB

Suggestion corpus: top 10M scored queries (from billions of raw). Average query ~20 chars -> ~20 prefixes each -> 200M prefix entries. Each entry: 10 suggestions x ~40B (string + score) = ~400B. Total ~80 GB per locale (en); replicate 3x for read throughput = 240 GB RAM fleet-wide, ~7 GB per shard at 12 shards. Rebuild cost is a batch job, not a serving concern. Sharding: hash the full prefix (never alphabetic ranges: Zipf melts a-c while x-z idles).

💡 Replication is for throughput, not durability: the immutable snapshot in object storage is the durable copy.

Build -> Ship -> Swap: Immutable Snapshots

Pipeline: searches log to Kafka (~58K/sec) -> batch scoring (time-decay + CTR weighting + blocklist + dedup) -> generate all prefix top-10s -> write a versioned, checksummed snapshot to object storage, pre-partitioned by shard. Serving nodes download their slice, warm it beside the live index, atomically swap a pointer. Promotion is instant; rollback is the same swap back; a corrupt build can never partially overwrite a good one. SSTable immutability as a deployment strategy.

💡 Never mutate the live index in place: real-time updates belong to the overlay, not the index.

Trending Overlay: Add, Never Reorder

Batch owns stable popularity; a decaying sliding-window counter (5-15 min, Redis/in-process, megabytes) owns the last minutes: tracking acceleration, not popularity. Serve time: fetch batch top-10, fetch overlay hits, merge with a cap (trending earns at most 2-3 slots, marked in the UI): the overlay can add, it cannot reorder the batch list wholesale, bounding manipulation blast radius. Decay halves counts every few minutes: spikes clean themselves up.

💡 Why not 5-minute full rebuilds? Compute waste for the 99.99% unchanged, and no human window to catch a poisoned build.

Trending Manipulation Defenses

A botnet pumping one query is, naively, indistinguishable from breaking news. Stack defenses: per-source rate caps in the counters (one IP/session contributes bounded weight), velocity baselines per locale (spike = multiple of normal, not absolute count), a sanity delay (sustain velocity N minutes before surfacing), the 2-3 slot merge cap, and the blocklist applying to the overlay with zero exceptions. Aftermath is handled by decay, not cleanup jobs.

💡 Design for an adversary: the overlay is the single most manipulable surface in the system.

Scoring and the Safety Gates

Score = frequency, time-decayed, CTR-weighted (shown-but-never-clicked suggestions get demoted: they are harming the list), normalized across near-duplicates. Safety: build-time filtering (blocklist, living-person policy classes, PII scrubbing: pasted emails/phones must never surface), plus a serve-time kill switch: a small deny set consulted per response, updatable in seconds during an incident, holding until the next clean build. Every suggestion is auditable: build version, scores, overlay contribution.

💡 Posture is minutes-to-mitigate + days-to-clean, not the fantasy of prevention. Say that out loud.

Capacity: Keystrokes to Origin QPS

500M DAU x 10 searches/day = 5B searches/day. ~6 raw keystroke-events per search = 30B/day; the client contract sends ~10B requests/day = 116K/sec average, ~350K/sec at 3x peak... plus spikes: budget 700K/sec. Edge + browser absorb 60-70%: ~200K/sec at origin across 12 shards x 3 replicas = ~5.5K/sec per node against an in-RAM hash get (microseconds): CPU headroom is enormous; the fleet is sized by RAM and blast radius, not QPS.

💡 Autocomplete traffic ~= 2-3x search traffic even after debouncing. Derive it, do not hand-wave it.

The Metrics: Latency Is Table Stakes, CTR Is Truth

#10

p99 end-to-end <100ms (edge) and service compute <10ms: table stakes. The quality truth: suggestion CTR (clicks / impressions): a slow decline means the index is drifting stale or scoring broke; a sudden per-prefix collapse means something offensive or wrong is ranked first. Index age (alert past 2 build cycles: serving stale is a failure even when latency is green), trending lag (event to surfaced, target <2 min), kill-switch hit rate (nonzero = live incident), edge hit ratio, and per-shard heat divergence.

💡 A perfectly fast autocomplete serving last week's world is broken. Index age is an availability metric.