
Scoring, Filtering, and the Cost of a Bad Suggestion

Autocomplete is the product's mouth: it speaks before the user finishes asking, and a single bad utterance (an offensive completion, a privacy leak, a defamatory pairing with a person's name) becomes a screenshot, a headline, sometimes a lawsuit. Quality is therefore not a ranking nicety; it is a safety system with three gates.
Gate one: scoring. The base score is query frequency, but frequency alone promotes garbage, so it is time-decayed (recent behavior dominates), CTR-weighted (a suggestion shown often but never clicked is actively harming the list, so demote it), and normalized across near-duplicates so that variant spellings of "weather today" (differing only in case or spacing, for instance) pool their evidence.
Gate two: filtering, applied at build time and again at the overlay. It has three parts: a maintained blocklist of slurs, adult content, and violence patterns; policy classes for suggestions about living persons (many products simply refuse to complete name + accusation patterns); and PII scrubbing: if users paste emails or phone numbers into search, those queries must never become suggestions, no matter their frequency.
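The gate-one adjustments can be sketched in a few lines. This is a minimal illustration rather than the scoring of any particular product: the half-life, the CTR floor, the minimum-impressions threshold, and every function name are assumptions chosen for the example.

```python
from collections import defaultdict

HALF_LIFE_DAYS = 7.0    # assumed: an event loses half its weight every 7 days
CTR_FLOOR = 0.01        # assumed: below 1% click-through, start demoting
MIN_IMPRESSIONS = 100   # assumed: do not judge CTR on tiny samples


def normalize(query: str) -> str:
    """Collapse case and whitespace so near-duplicates share one key."""
    return " ".join(query.lower().split())


def pool(log):
    """Group (query, day) events under their normalized key."""
    pooled = defaultdict(list)
    for query, day in log:
        pooled[normalize(query)].append(day)
    return dict(pooled)


def decayed_count(days, now_day: float) -> float:
    """Frequency where each event's weight halves every HALF_LIFE_DAYS."""
    return sum(0.5 ** ((now_day - day) / HALF_LIFE_DAYS) for day in days)


def score(days, impressions: int, clicks: int, now_day: float) -> float:
    """Time-decayed frequency, scaled down when shown often but rarely clicked."""
    base = decayed_count(days, now_day)
    if impressions < MIN_IMPRESSIONS:
        return base  # not enough evidence to apply a CTR penalty
    ctr = clicks / impressions
    return base * min(1.0, ctr / CTR_FLOOR)
```

With these assumed constants, a suggestion shown 1,000 times and never clicked scores zero regardless of how often it was typed, while one with too few impressions keeps its decayed frequency untouched.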
Filtering at build time means a leaked term requires a rebuild to fully purge, which is why there is also a serve-time kill switch: a small deny set consulted on every response, updatable in seconds when an incident is live, holding the line until the next clean build ships.
Gate three: auditability. Every suggestion shown is reconstructable (which build, which scores, which overlay contribution), because the first question after an incident is "why did we say that", and "we cannot know" is an unacceptable answer.
The uncomfortable truth to volunteer before the interviewer raises it: this gate system is adversarial and far from perfect. Manipulation campaigns, embedding tricks, and novel slurs will get through, so the design goal is minutes-to-mitigate (kill switch) plus days-to-clean (rebuild), not the fantasy of prevention.
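The serve-time kill switch and the audit trail fit naturally in the same response path. The sketch below is one possible shape, under stated assumptions: the class and field names are hypothetical, the deny set is an in-process Python set, and the audit log is an in-memory list standing in for durable storage.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Candidate:
    text: str
    build_score: float          # score baked into the index at build time
    overlay_boost: float = 0.0  # contribution from the real-time overlay


class SuggestServer:
    def __init__(self, build_id: str):
        self.build_id = build_id
        self.deny = set()     # kill switch: small, checked on every response
        self.audit_log = []   # stand-in for a durable audit store

    def block(self, text: str) -> None:
        """Incident response: takes effect on the next response, no rebuild."""
        self.deny.add(text.lower())

    def respond(self, prefix: str, candidates, k: int = 5):
        shown = [c for c in candidates if c.text.lower() not in self.deny]
        shown.sort(key=lambda c: c.build_score + c.overlay_boost, reverse=True)
        shown = shown[:k]
        # Record enough to answer "why did we say that" later.
        self.audit_log.append({
            "prefix": prefix,
            "build_id": self.build_id,
            "shown": [asdict(c) for c in shown],
        })
        return [c.text for c in shown]
```

Calling `block()` removes a term from the very next response, while the index itself still contains it until a clean rebuild ships; each audit record keeps the build, the build-time score, and the overlay contribution for every suggestion shown.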
What if the interviewer asks: why not run every suggestion through an ML safety model at serve time? Latency (a 10ms budget) and cost at 700K/sec rule it out. Models run at build time and on the overlay's much smaller stream; serve time gets the precomputed verdicts.
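To make "precomputed verdicts" concrete, here is a rough sketch in which the expensive check runs once per query at build time and serving is a dictionary lookup. The blocklist entry is a placeholder, the two regular expressions are deliberately crude stand-ins for real PII detection, and `build_time_verdict` occupies the spot where an ML safety model would sit; all names are assumptions for illustration.

```python
import re

BLOCKLIST = {"badword"}  # placeholder; a real list is maintained and much larger

# Crude illustrative patterns, not production-grade PII detection.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"(?:\+?\d[\s().-]*){7,15}")  # 7+ digits with separators


def build_time_verdict(query: str) -> str:
    """Expensive checks live here; this runs once per query per build."""
    if EMAIL_RE.search(query) or PHONE_RE.search(query):
        return "pii"
    if any(token in BLOCKLIST for token in query.lower().split()):
        return "blocked"
    return "ok"


def build_index(queries):
    """Attach a verdict to every candidate while building the index."""
    return {q: build_time_verdict(q) for q in queries}


def serve(index, candidates):
    """Serve time only reads verdicts; unknown candidates fail closed."""
    return [q for q in candidates if index.get(q) == "ok"]
```

The serve path does no classification at all, which is what keeps it inside a tight latency budget; anything without an "ok" verdict, including a candidate the build never saw, is dropped.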
Why it matters in interviews
The suggestion box is a liability surface, and treating filtering as build-time + kill-switch layers (with an honest minutes-to-mitigate posture) shows product and legal awareness most candidates skip entirely. CTR-weighted demotion is the quality detail that separates search-adjacent experience from guesswork.
Related concepts