Running Dijkstra on the Full 500M-Node Graph
Very CommonFORMULA
Candidates propose standard Dijkstra for shortest-path routing without considering that the global road graph has 500 million nodes and a single query would explore millions of them.
Why: Dijkstra is the textbook shortest-path algorithm taught in every algorithms course. Candidates default to it without calculating the query cost at scale: 500ms to 2 seconds per query at 500M nodes.
WRONG: Running Dijkstra from source to destination on the full road graph. At 500M nodes, a cross-country route explores millions of nodes, taking 500ms to 2 seconds. At 11,600 QPS, this requires 11,600 dedicated cores doing nothing but pathfinding, which is infeasible.
RIGHT: Use Contraction Hierarchies: preprocess the graph offline by contracting low-importance nodes and adding shortcut edges. Bidirectional upward search explores ~1,000 nodes in under 10ms, a 100x speedup. Trade-off: preprocessing takes 30 minutes and road closures need a real-time edge weight overlay from Redis.
Static Edge Weights Without Traffic Overlay
Very CommonFORMULA
Candidates use fixed speed limits as edge weights in the road graph, ignoring that real-world speeds vary dramatically by time of day and traffic conditions.
Why: Using speed limits as edge weights is the simplest approach and gives correct routes in terms of distance. Candidates do not consider that a route on a highway with a 65 mph speed limit at rush hour may actually average 15 mph.
WRONG: Setting edge weights to distance / speed_limit and never updating them. A highway with a 65 mph limit during a traffic jam actually moves at 15 mph. The router sends users onto congested highways when side roads would be faster. ETAs are off by 20 to 40%.
RIGHT: Use a real-time edge weight overlay from Redis. Ingest 4M GPS probes/sec, aggregate median speed per road segment every 60 seconds, and override the static weight in the CH search. Trade-off: live weights add a Redis lookup per edge during CH traversal, but with only ~1,000 edges explored per query, the overhead is under 1ms.
SQL LIKE Queries for POI Search at 200M Rows
Very CommonFORMULA
Candidates search for places using SQL LIKE '%coffee%' on a 200M-row table, producing multi-second full table scans.
Why: SQL is the default tool for most developers. LIKE queries work fine on small tables but do not scale to 200M rows because they bypass indexes and scan every row.
WRONG: Using SQL LIKE '%coffee%' on the places table. This triggers a full table scan across 200M rows (100 GB), taking 3 to 10 seconds instead of the required 200ms. Adding a B-tree index on name does not help because leading-wildcard queries cannot use B-tree indexes.
RIGHT: Use Elasticsearch with geospatial filtering. Two-phase query: (1) geo-filter to a bounding box around the user, (2) BM25 full-text search on name and category, boosted by rating and distance. Sub-200ms at 200M rows. Trade-off: maintaining an Elasticsearch cluster alongside PostgreSQL adds operational complexity, but search is a separate access pattern from transactional POI updates.
Serving Raster Tiles Instead of Vector Tiles
CommonFORMULA
Candidates propose serving PNG image tiles, not realizing that raster tiles cannot be rotated, re-styled, or scaled to retina displays without server-side re-rendering.
Why: Early web maps (Google Maps pre-2013, OpenStreetMap's default) used raster tiles. Candidates who have seen these tile URLs assume PNG is still the standard without considering modern client-side rendering.
WRONG: Serving pre-rasterized PNG tiles at each zoom level. A raster tile at zoom 14 is 80 to 200 KB versus 20 to 50 KB for vector. Changing the map style requires re-rendering the entire 500 TB tile set. Text labels become illegible when the map is rotated. Retina displays require 4x the pixels, quadrupling bandwidth.
RIGHT: Serve vector tiles (Protocol Buffer MVT format) containing geometric primitives. The client GPU rasterizes using a local style sheet. 3x smaller, rotatable, retina-native. Trade-off: low-end devices with weak GPUs may render slower; fall back to raster PNG for those devices.
Global CDN Invalidation on Map Update
CommonFORMULA
Candidates propose invalidating all CDN tiles worldwide when the map data is updated, causing a cache miss storm at 521K requests per second.
Why: The intuition is that all users should see the latest map immediately. Candidates do not consider the CDN capacity implications of invalidating 500 TB of cached tiles simultaneously.
WRONG: Issuing a global CDN invalidation when new satellite imagery or road data is published. All 521K tile requests per second hit the origin, which is provisioned for only 10.4K/sec. Origin collapses, and all map users see blank tiles until the CDN re-warms.
RIGHT: Use staggered rollout: invalidate tiles POP by POP over 24 hours. At each POP, use request coalescing so concurrent requests for the same tile trigger a single origin fetch. Trade-off: some users see slightly older tiles during the 24-hour rollout window, but the map remains available throughout.
Single CH Graph Without Real-Time Override
CommonFORMULA
Candidates build a Contraction Hierarchies graph with static weights and no mechanism to incorporate live traffic or road closures.
Why: CH is already an advanced algorithm, and candidates feel accomplished just describing the preprocessing and query phases. They do not consider what happens when the preprocessed graph becomes stale due to real-time conditions.
WRONG: Serving routes from a static CH graph preprocessed with speed-limit weights. When a highway closes for construction, the router still sends drivers onto it. When rush hour congestion builds, the router does not account for slower speeds. Routes become dangerously inaccurate.
RIGHT: Combine CH with a real-time edge weight overlay in Redis (900 MB for 50M segments). Set closed roads to infinity weight. Update speeds every 60 seconds from GPS probes. For permanent changes, re-contract the graph nightly. Trade-off: overlay-adjusted routes may be 1 to 3% suboptimal versus a full re-contraction, but the 60-second freshness is worth it.
Using Raw GPS Speed Without Map Matching
CommonFORMULA
Candidates compute traffic speeds directly from GPS coordinates without first snapping those coordinates to the correct road segment.
Why: GPS seems precise enough: coordinates are accurate to 5 to 15 meters. Candidates do not consider that in dense urban areas, 15m error means a probe on an overpass gets attributed to the surface road below it.
WRONG: Taking raw GPS coordinates with speed readings and directly attributing them to the nearest road segment by Euclidean distance. In dense urban areas with overpasses, tunnels, and parallel roads, this attributes probes to the wrong segment, corrupting traffic estimates for both roads.
RIGHT: Use HMM map matching that considers the sequence of GPS points, road geometry, and transition probabilities. The Viterbi algorithm finds the most probable road sequence. Trade-off: adds 2 to 5ms per probe, but this is negligible compared to the cost of corrupted traffic data affecting millions of routing queries.
No Confidence Weighting on Sparse Probes
CommonFORMULA
Candidates take the average speed from GPS probes at face value without considering that segments with very few probes produce unreliable estimates.
Why: Averaging speeds seems straightforward. Candidates do not consider that a segment with 1 probe from a stopped delivery truck shows 0 mph, while the road is actually flowing at 45 mph.
WRONG: Computing traffic speed as the simple average of all GPS probe speeds on a segment. A rural road with 1 probe from a stopped vehicle shows 0 mph and gets marked as congested. Routes are diverted away from a perfectly clear road based on a single misleading data point.
RIGHT: Use confidence-weighted aggregation: segments with fewer than 3 probes in the 60-second window get a blend of probe data and historical speed averages, weighted by probe count. Segments with zero probes use pure historical data. Trade-off: historical fallback may miss sudden events on low-traffic roads, but it prevents single-probe noise from corrupting routing.