STANDARDwalkthrough

Geocoding and Place Search

7 of 8
3 related
A user types "coffee near me" into the Google Maps search bar. The system must interpret this as a category search ("coffee"), apply a geospatial filter ("near me" means within 2 km of the user's current location), rank results by relevance, rating, and distance, and return the top 10 results in under 200ms.
8 GB
of memory for the filter
Each POI record contains name, address, category tags, coordinates, rating, and hours, averaging 500 bytes. Total storage: 200M×500B=100 GB200\text{M} \times 500\text{B} = 100\text{ GB}.
This requires two core operations. Forward geocoding converts a text address into coordinates: "1600 Pennsylvania Ave, Washington DC" becomes (38.8977,77.0365)(38.8977, -77.0365). Reverse geocoding does the opposite: a tap on the map at (40.7484,73.9857)(40.7484, -73.9857) returns "Empire State Building, New York." We store 200 million Points of Interest (POIs) in an Elasticsearch cluster.
For text search, we tokenize the user's input and run a fuzzy match against place names using Elasticsearch's BM25 scoring. But raw text matching is not enough.
A search for "coffee" from San Francisco should not return results in Tokyo. We apply a geospatial boost: results within 2 km of the user's location receive a 10x relevance multiplier, results within 10 km get 3x, and anything beyond gets no boost.
Elasticsearch's geo_distance scoring function handles this natively. For autocomplete, we use Elasticsearch's completion suggester, which builds an in-memory finite state transducer (FST) for prefix matching.
As the user types "star", the FST returns "Starbucks" within 50ms. The FST for 200M POIs consumes about 8 GB of RAM.
The alternative to Elasticsearch is a custom trie with geospatial indexing (R-tree), which would be faster but requires building and maintaining the entire search infrastructure from scratch. Trade-off: Elasticsearch gives us full-text search, fuzzy matching, geospatial queries, and autocomplete out of the box, but it consumes more memory and has higher tail latency (p99 around 150ms) compared to a hand-tuned solution.
Google Places API handles billions of geocoding requests daily using a proprietary search engine, but the architecture principles are the same. What if the interviewer asks: how do we handle misspellings like "Starbuks"?
Elasticsearch's fuzzy query with edit distance 2 matches "Starbuks" to "Starbucks" with minimal latency impact.
Why it matters in interviews
Geocoding tests whether we can combine text search with geospatial filtering in a single query path. Explaining the geospatial boost scoring, the FST-based autocomplete, and the 100 GB index size shows practical search engineering, not just theoretical knowledge.
Related concepts