STANDARDwalkthrough

Trie vs Hash Map: Storing the Prefix Space

2 of 8
3 related
Once every prefix maps to a precomputed top-10, what structure holds 200M prefix entries? Two honest candidates.
Its virtue is prefix sharing: "weather", "weathered", and "weathering" share the "weather" path, so common prefixes are stored once; its lookup walks one pointer per character (20 hops for 20 chars), each a potential cache miss on nodes scattered across the heap: pointer-chasing is the trie's hidden tax. The flat hash map: every prefix string is its own key mapping to its top-10 list.
The trie: one node per character, completions hang off the path, and each node carries its cached top-k.
Lookup is one hash and one probe: a single memory access pattern, friendlier to CPU caches, trivially shardable by key hash. The cost is repeated storage of the prefix strings themselves: "w", "we", "wea"... all materialized: which is why the practical answer for a bounded corpus is the hash map: strings are short, the 80 GB fits in sharded RAM, and O(1) beats O(len) with better constants.
Where the trie earns its place: when the prefix space must support operations a hash cannot: fuzzy matching within edit distance, lexicographic range scans, or memory-constrained embedded serving where prefix compression matters (radix/Patricia tries collapse single-child chains and cut node count dramatically). Real systems often layer both: a hash-map hot path with a trie-based fallback tier for typo tolerance.
The senior move in an interview is naming the decision criterion instead of a winner: hash for exact-prefix speed and shardability; trie for shared-prefix compression and richer traversal: then committing based on the requirements (ours: exact prefix, 100ms budget, RAM is affordable: hash map). What if the interviewer asks: what about a sorted array with binary search per shard?
Legitimate third option (it is how Lucene-style FSTs think); binary search costs log n probes but the array is perfectly cache-linear: worth mentioning, rarely worth building first.
Why it matters in interviews
Everyone says "use a trie" on reflex; the stronger answer weighs pointer-chasing vs string duplication and picks per requirement. Knowing radix tries and the FST/sorted-array third option signals depth beyond the flashcard.
Related concepts