
ETA Estimation

The displayed ETA is the single most important number in the rider experience. Get it wrong by 5 minutes and users lose trust.
We chose a three-layer pipeline (not Dijkstra alone, not ML alone) because each layer corrects a different class of error. Trade-off: we accepted higher compute cost (3 model evaluations per query) in exchange for reducing ETA error from 25% to under 5%.
The constraint: a static road graph gives 25% average error because it ignores traffic, construction, and time-of-day patterns.
Layer 1: Dijkstra on the road graph. The road network is a weighted directed graph where edges are road segments and weights are traversal times.
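A minimal sketch of Layer 1, assuming a toy adjacency-list graph where each edge weight is a static traversal time in seconds (the segment IDs and weights here are illustrative, not from a real map):

```python
import heapq

def dijkstra_eta(graph, source, dest):
    """Shortest traversal time in seconds from source to dest.

    graph: {node: [(neighbor, seconds), ...]} -- edges are road
    segments, weights are static traversal times.
    """
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == dest:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr))
    return float("inf")  # dest unreachable

# Toy graph: A -> B -> D (120 s) beats A -> C -> D (270 s).
roads = {
    "A": [("B", 60.0), ("C", 30.0)],
    "B": [("D", 60.0)],
    "C": [("D", 240.0)],
}
print(dijkstra_eta(roads, "A", "D"))  # 120.0
```

In production this runs over millions of segments, so real systems precompute with techniques like contraction hierarchies; plain Dijkstra is shown here only to make the weighted-graph model concrete.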
Layer 2: real-time traffic weights. Every active driver reports speed every 3 seconds.
We aggregate these into per-road-segment speed estimates, updated every 30 seconds. A road segment with 50 drivers averaging 15 km/h overrides the static speed limit of 60 km/h.
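A sketch of the Layer 2 aggregation, assuming driver pings arrive as `(segment_id, speed_kmh)` pairs; the `MIN_REPORTS` threshold and segment names are hypothetical:

```python
from collections import defaultdict

STATIC_SPEED_KMH = {"seg-17": 60.0}  # posted limit per segment (toy data)
MIN_REPORTS = 10                     # hypothetical trust threshold

def aggregate(reports):
    """reports: iterable of (segment_id, speed_kmh) driver pings.
    Returns (mean speed per segment, report count per segment).
    In production this recomputes every 30 seconds."""
    total, count = defaultdict(float), defaultdict(int)
    for seg, speed in reports:
        total[seg] += speed
        count[seg] += 1
    means = {s: total[s] / count[s] for s in total}
    return means, count

def effective_speed(seg, means, counts, fallback_kmh=25.0):
    """Live mean overrides the static limit once enough drivers report."""
    if counts.get(seg, 0) >= MIN_REPORTS:
        return means[seg]
    return STATIC_SPEED_KMH.get(seg, fallback_kmh)

# 50 drivers crawling at 15 km/h on a 60 km/h segment:
pings = [("seg-17", 15.0)] * 50
means, counts = aggregate(pings)
print(effective_speed("seg-17", means, counts))  # 15.0
```

The threshold guards against a single slow driver (parked, or stopped at a light) dragging down a segment's estimate.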
Layer 3: ML correction. A gradient-boosted model trained on billions of historical trips corrects systematic biases: construction zones the map does not know about, traffic lights not modeled in the graph, school zones that slow traffic at 3 PM.
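Layer 3 can be illustrated with a simple stand-in for the gradient-boosted model: a learned multiplicative correction per `(segment, hour-of-day)` bucket, fit offline from historical actual-vs-predicted trip-time ratios. The bucket keys and factors below are invented for illustration:

```python
# Hypothetical learned corrections (actual_time / predicted_time),
# standing in for the gradient-boosted model's output.
CORRECTION = {
    ("seg-school-zone", 15): 1.6,   # 3 PM pickup traffic the graph misses
    ("seg-construction", 9): 2.1,   # lane closure the map doesn't know about
}

def corrected_eta(raw_eta_s, segment_id, hour):
    """Scale the Layer 1+2 ETA by the learned bias for this context."""
    factor = CORRECTION.get((segment_id, hour), 1.0)  # 1.0 = no known bias
    return raw_eta_s * factor

print(corrected_eta(300.0, "seg-school-zone", 15))  # 480.0
```

The real model consumes far richer features (weather, day of week, driver behavior), but the principle is the same: it corrects systematic residual error the graph layers cannot see.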
The fallback when the ML model fails: straight-line distance divided by average city speed (25 km/h). This overestimates by 40% on average but is better than showing nothing.
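The fallback is simple enough to sketch directly: great-circle (haversine) distance between pickup and destination, divided by the 25 km/h city average. The coordinates in the usage line are arbitrary examples:

```python
import math

def fallback_eta_s(lat1, lon1, lat2, lon2, city_speed_kmh=25.0):
    """Last-resort ETA: straight-line distance / average city speed."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    dist_km = 2 * r * math.asin(math.sqrt(a))  # haversine formula
    return dist_km / city_speed_kmh * 3600.0   # hours -> seconds

# Two points roughly 13 km apart straight-line -> about half an hour.
print(round(fallback_eta_s(37.7749, -122.4194, 37.8044, -122.2712)))
```

Because the straight line ignores the road network, this is crude, but it always returns a number, which is the point of a fallback.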
Why it matters in interviews
Interviewers look for the three-layer approach: static graph, real-time traffic, ML correction. What if the interviewer asks: 'Why not use Google Maps API instead of building your own?' We answer: at 35 dispatch queries/sec peak, each requiring ETAs for 20-50 candidate drivers, that is 700-1,750 Google Maps calls/sec. At $5 per 1,000 requests, that is $300K/month for ETA alone. We need in-house computation at this scale.