STANDARDwalkthrough

Geocoding and Place Search

7 of 8

3 related

A user types "coffee near me" into the Google Maps search bar. The system must interpret this as a category search ("coffee"), apply a geospatial filter ("near me" means within 2 km of the user's current location), rank results by relevance, rating, and distance, and return the top 10 results in under 200ms.

8 GB

of memory for the filter

Each POI record contains name, address, category tags, coordinates, rating, and hours, averaging 500 bytes. Total storage:

200\text{M} \times 500\text{B} = 100\text{ GB}

“This requires two core operations. Forward geocoding converts a text address into coordinates: "1600 Pennsylvania Ave, Washington DC" becomes

(38.8977, -77.0365)

. Reverse geocoding does the opposite: a tap on the map at

(40.7484, -73.9857)

returns "Empire State Building, New York." We store 200 million Points of Interest (POIs) in an Elasticsearch cluster.”

For text search, we tokenize the user's input and run a fuzzy match against place names using Elasticsearch's BM25 scoring. But raw text matching is not enough.

A search for "coffee" from San Francisco should not return results in Tokyo. We apply a geospatial boost: results within 2 km of the user's location receive a 10x relevance multiplier, results within 10 km get 3x, and anything beyond gets no boost.

Elasticsearch's geo_distance scoring function handles this natively. For autocomplete, we use Elasticsearch's completion suggester, which builds an in-memory finite state transducer (FST) for prefix matching.

As the user types "star", the FST returns "Starbucks" within 50ms. The FST for 200M POIs consumes about 8 GB of RAM.

The alternative to Elasticsearch is a custom trie with geospatial indexing (R-tree), which would be faster but requires building and maintaining the entire search infrastructure from scratch. Trade-off: Elasticsearch gives us full-text search, fuzzy matching, geospatial queries, and autocomplete out of the box, but it consumes more memory and has higher tail latency (p99 around 150ms) compared to a hand-tuned solution.

Google Places API handles billions of geocoding requests daily using a proprietary search engine, but the architecture principles are the same. What if the interviewer asks: how do we handle misspellings like "Starbuks"?

Elasticsearch's fuzzy query with edit distance 2 matches "Starbuks" to "Starbucks" with minimal latency impact.

Why it matters in interviews

Geocoding tests whether we can combine text search with geospatial filtering in a single query path. Explaining the geospatial boost scoring, the FST-based autocomplete, and the 100 GB index size shows practical search engineering, not just theoretical knowledge.

Related concepts

← PreviousETA Prediction with ML Next →Location Service and Map Matching