
ETA Estimation

The displayed ETA is the single most important number in the rider experience. Get it wrong by 5 minutes and users lose trust.
We chose a three-layer pipeline (not Dijkstra alone, not ML alone) because each layer corrects a different class of error. Trade-off: we accepted higher compute cost (3 model evaluations per query) in exchange for reducing ETA error from 25% to under 5%.
The constraint: a static road graph gives 25% average error because it ignores traffic, construction, and time-of-day patterns.
Layer 1: Dijkstra on the road graph. The road network is a weighted directed graph where edges are road segments and weights are traversal times.
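A minimal sketch of Layer 1, assuming a toy adjacency-list graph where each edge weight is a static traversal time in seconds (the segment IDs and weights here are illustrative, not from a real map):

```python
import heapq

def dijkstra_eta(graph, source, dest):
    """Shortest traversal time in seconds from source to dest.

    graph: {node: [(neighbor, seconds), ...]} -- edges are road
    segments, weights are static traversal times.
    """
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == dest:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr))
    return float("inf")  # dest unreachable

# Toy graph: A -> B -> D (120 s) beats A -> C -> D (270 s).
roads = {
    "A": [("B", 60.0), ("C", 30.0)],
    "B": [("D", 60.0)],
    "C": [("D", 240.0)],
}
print(dijkstra_eta(roads, "A", "D"))  # 120.0
```

In production this runs over millions of segments, so real systems precompute with techniques like contraction hierarchies; plain Dijkstra is shown here only to make the weighted-graph model concrete.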
Layer 2: real-time traffic weights. Every active driver reports speed every 3 seconds.
We aggregate these into per-road-segment speed estimates, updated every 30 seconds. A road segment with 50 drivers averaging 15 km/h overrides the static speed limit of 60 km/h.
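A sketch of the Layer 2 aggregation, assuming driver pings arrive as `(segment_id, speed_kmh)` pairs; the `MIN_REPORTS` threshold and segment names are hypothetical:

```python
from collections import defaultdict

STATIC_SPEED_KMH = {"seg-17": 60.0}  # posted limit per segment (toy data)
MIN_REPORTS = 10                     # hypothetical trust threshold

def aggregate(reports):
    """reports: iterable of (segment_id, speed_kmh) driver pings.
    Returns (mean speed per segment, report count per segment).
    In production this recomputes every 30 seconds."""
    total, count = defaultdict(float), defaultdict(int)
    for seg, speed in reports:
        total[seg] += speed
        count[seg] += 1
    means = {s: total[s] / count[s] for s in total}
    return means, count

def effective_speed(seg, means, counts, fallback_kmh=25.0):
    """Live mean overrides the static limit once enough drivers report."""
    if counts.get(seg, 0) >= MIN_REPORTS:
        return means[seg]
    return STATIC_SPEED_KMH.get(seg, fallback_kmh)

# 50 drivers crawling at 15 km/h on a 60 km/h segment:
pings = [("seg-17", 15.0)] * 50
means, counts = aggregate(pings)
print(effective_speed("seg-17", means, counts))  # 15.0
```

The threshold guards against a single slow driver (parked, or stopped at a light) dragging down a segment's estimate.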
Layer 3: ML correction. A gradient-boosted model trained on billions of historical trips corrects systematic biases: construction zones the map does not know about, traffic lights not modeled in the graph, school zones that slow traffic at 3 PM.
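Layer 3 can be illustrated with a simple stand-in for the gradient-boosted model: a learned multiplicative correction per `(segment, hour-of-day)` bucket, fit offline from historical actual-vs-predicted trip-time ratios. The bucket keys and factors below are invented for illustration:

```python
# Hypothetical learned corrections (actual_time / predicted_time),
# standing in for the gradient-boosted model's output.
CORRECTION = {
    ("seg-school-zone", 15): 1.6,   # 3 PM pickup traffic the graph misses
    ("seg-construction", 9): 2.1,   # lane closure the map doesn't know about
}

def corrected_eta(raw_eta_s, segment_id, hour):
    """Scale the Layer 1+2 ETA by the learned bias for this context."""
    factor = CORRECTION.get((segment_id, hour), 1.0)  # 1.0 = no known bias
    return raw_eta_s * factor

print(corrected_eta(300.0, "seg-school-zone", 15))  # 480.0
```

The real model consumes far richer features (weather, day of week, driver behavior), but the principle is the same: it corrects systematic residual error the graph layers cannot see.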
The fallback when the ML model fails: straight-line distance divided by average city speed (25 km/h). This overestimates by 40% on average but is better than showing nothing.
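The fallback is simple enough to sketch directly: great-circle (haversine) distance between pickup and destination, divided by the 25 km/h city average. The coordinates in the usage line are arbitrary examples:

```python
import math

def fallback_eta_s(lat1, lon1, lat2, lon2, city_speed_kmh=25.0):
    """Last-resort ETA: straight-line distance / average city speed."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    dist_km = 2 * r * math.asin(math.sqrt(a))  # haversine formula
    return dist_km / city_speed_kmh * 3600.0   # hours -> seconds

# Two points roughly 13 km apart straight-line -> about half an hour.
print(round(fallback_eta_s(37.7749, -122.4194, 37.8044, -122.2712)))
```

Because the straight line ignores the road network, this is crude, but it always returns a number, which is the point of a fallback.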
Why it matters in interviews
Interviewers look for the three-layer approach: static graph, real-time traffic, ML correction. What if the interviewer asks: 'Why not use Google Maps API instead of building your own?' We answer: at 35 dispatch queries/sec peak, each requiring ETAs for 20-50 candidate drivers, that is 700-1,750 Google Maps calls/sec. At $5 per 1,000 requests, that is $300K/month for ETA alone. We need in-house computation at this scale.