Live Traffic from GPS Probes

It is 5:30 PM on a Friday. A three-car accident closes two lanes on Highway 101 in Silicon Valley.
Within 60 seconds, Google Maps shows the highway segment in red.
How does the system detect this without any human reporting it? The answer is GPS probe data from millions of phones.
Roughly 20 million users are actively navigating at any moment, each sending a GPS update every 5 seconds. That is $20\text{M} / 5 = 4\text{M}$ updates/sec.
Each update contains latitude (8B), longitude (8B), speed (4B), heading (4B), timestamp (8B), and device metadata (8B), totaling about 40 bytes. Throughput into the ingestion layer: $4\text{M} \times 40\text{B} = 160\text{ MB/sec}$.
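The ingestion arithmetic above can be reproduced in a few lines, which makes it easy to vary the assumptions in an interview. This is a minimal sketch; the function and constant names are illustrative, and the result counts payload bytes only (it ignores any message framing or replication overhead).

```python
# Back-of-envelope ingestion estimate using the figures quoted in the text.
ACTIVE_USERS = 20_000_000   # users navigating at any moment
UPDATE_INTERVAL_S = 5       # seconds between GPS updates per user

# Field sizes in bytes, as listed in the text.
FIELD_BYTES = {
    "latitude": 8,
    "longitude": 8,
    "speed": 4,
    "heading": 4,
    "timestamp": 8,
    "device_metadata": 8,
}


def ingestion_estimate(users, interval_s, field_bytes):
    """Return (updates per second, bytes per update, payload bytes per second)."""
    updates_per_sec = users // interval_s
    payload_bytes = sum(field_bytes.values())
    return updates_per_sec, payload_bytes, updates_per_sec * payload_bytes


updates_per_sec, payload_bytes, bytes_per_sec = ingestion_estimate(
    ACTIVE_USERS, UPDATE_INTERVAL_S, FIELD_BYTES
)
print(f"{updates_per_sec:,} updates/sec")
print(f"{payload_bytes} bytes/update")
print(f"{bytes_per_sec / 1e6:.0f} MB/sec")
```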
We pipe this stream through Kafka, partitioned by geographic tile. But raw GPS coordinates carry 5 to 10 meters of error under open sky and 50 meters or more in urban canyons.
A GPS point might land on a parallel frontage road instead of the highway. We solve this with map matching: a Hidden Markov Model (HMM) where the hidden states are road segments and the observations are GPS readings.
The Viterbi algorithm finds the most likely sequence of road segments the driver actually traversed. Once matched, we compute each road segment's current speed as the median speed of all probes on that segment within a 60-second sliding window.
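The map-matching step can be illustrated with a toy Viterbi decoder. This is a sketch under stated assumptions, not a production matcher: coordinates are planar meters, the emission model is a Gaussian on point-to-segment distance with an assumed 10 m sigma, and the transition probabilities (0.45 to stay on the same road, 0.05 to switch) are invented for the example. The segment names and helper functions are hypothetical.

```python
import math


def point_to_segment_distance(p, a, b):
    """Distance in meters from point p to the line segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def log_emission(distance_m, sigma_m=10.0):
    """Log-likelihood (up to a constant) of a GPS reading under Gaussian noise."""
    return -0.5 * (distance_m / sigma_m) ** 2


def log_transition(prev_id, next_id):
    """Assumed transition model: switching roads is unlikely between updates."""
    same_road = prev_id.split("_")[0] == next_id.split("_")[0]
    return math.log(0.45) if same_road else math.log(0.05)


def viterbi_map_match(points, segments):
    """Return the most likely segment id for each GPS point."""
    ids = list(segments)
    score = {
        s: log_emission(point_to_segment_distance(points[0], *segments[s]))
        for s in ids
    }
    backpointers = []
    for p in points[1:]:
        new_score, pointers = {}, {}
        for s in ids:
            best_prev = max(ids, key=lambda r: score[r] + log_transition(r, s))
            emit = log_emission(point_to_segment_distance(p, *segments[s]))
            new_score[s] = score[best_prev] + log_transition(best_prev, s) + emit
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final state.
    path = [max(ids, key=score.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# A highway along y = 0 and a parallel frontage road 30 m away.
segments = {
    "hwy_1": ((0, 0), (500, 0)),
    "hwy_2": ((500, 0), (1000, 0)),
    "frontage_1": ((0, 30), (500, 30)),
    "frontage_2": ((500, 30), (1000, 30)),
}
points = [(100, 5), (300, 18), (600, 4), (900, 6)]

nearest = [
    min(segments, key=lambda s: point_to_segment_distance(p, *segments[s]))
    for p in points
]
matched = viterbi_map_match(points, segments)
print("nearest segment:", nearest)
print("viterbi path:   ", matched)
```

In this toy input the second reading lies closer to the frontage road, so a nearest-segment lookup misplaces it, while the decoded path stays on the highway because two road switches cost more than one noisy reading. Real matchers typically derive transition probabilities from road-network connectivity and travel distance rather than a fixed table.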
We track 50 million road segments globally. Each segment stores a current speed (4B), sample count (2B), confidence score (4B), and timestamp (8B), totaling 18 bytes.
Total in Redis: $50\text{M} \times 18\text{B} = 900\text{ MB}$. For segments with fewer than 5 probes in the window, we apply confidence weighting: the reported speed is a weighted average of the sparse real-time data and the historical average for that segment, time of day, and day of week.
This prevents a single slow-moving delivery truck from painting an entire highway segment red. Trade-off: a shorter window (30 seconds) detects incidents faster but amplifies noise from individual drivers.
A longer window (5 minutes) is smoother but delays detection. Google and Waze settled on roughly 60 seconds as the sweet spot.
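The windowed median and the confidence weighting for sparse segments can be sketched together. The text only says the result is a weighted average, so the linear weight `n / min_probes` used here is an assumption, as are the class and method names. For simplicity the sketch treats each sample as a separate probe and expects samples to arrive in timestamp order.

```python
from collections import deque
from statistics import median

WINDOW_S = 60     # sliding window length from the text
MIN_PROBES = 5    # below this, blend with the historical average


class SegmentSpeedWindow:
    """Per-segment sliding window of (timestamp, speed) samples."""

    def __init__(self, window_s=WINDOW_S):
        self.window_s = window_s
        self.samples = deque()  # assumed to arrive in timestamp order

    def add(self, timestamp_s, speed_kmh):
        self.samples.append((timestamp_s, speed_kmh))

    def _evict(self, now_s):
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= now_s - self.window_s:
            self.samples.popleft()

    def estimate(self, now_s, historical_kmh, min_probes=MIN_PROBES):
        """Return (reported speed, confidence in [0, 1])."""
        self._evict(now_s)
        n = len(self.samples)
        if n == 0:
            return historical_kmh, 0.0
        live = median(speed for _, speed in self.samples)
        # Assumed weighting: confidence grows linearly until min_probes is reached.
        confidence = min(1.0, n / min_probes)
        return confidence * live + (1 - confidence) * historical_kmh, confidence


window = SegmentSpeedWindow()
window.add(0, 20.0)  # a single slow truck on a segment that is usually 100 km/h
sparse_speed, sparse_conf = window.estimate(now_s=10, historical_kmh=100.0)

for t, speed in enumerate([10.0, 12.0, 15.0, 14.0, 11.0, 13.0], start=20):
    window.add(t, speed)  # many slow probes: a real incident
dense_speed, dense_conf = window.estimate(now_s=30, historical_kmh=100.0)

stale_speed, stale_conf = window.estimate(now_s=100, historical_kmh=100.0)
print(sparse_speed, sparse_conf)
print(dense_speed, dense_conf)
print(stale_speed, stale_conf)
```

With one slow sample the estimate stays close to the historical speed; once enough probes agree, the live median takes over entirely. Changing `WINDOW_S` is the knob behind the trade-off described above.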
What if the interviewer asks: what about privacy? Speed data is aggregated and anonymized.
No individual trajectories are stored after map matching. Segments with fewer than 5 unique probes are suppressed entirely.
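The suppression rule can be sketched as a filter applied before any segment speed is published. This is an illustration only: the tuple layout, the ephemeral probe ids, and the function name are assumptions, and the text does not say how suppression interacts with the historical fallback, so suppressed segments are simply omitted here.

```python
from statistics import median

MIN_UNIQUE_PROBES = 5  # threshold from the text


def publishable_segments(observations, min_unique=MIN_UNIQUE_PROBES):
    """Aggregate speeds per segment, dropping segments with too few unique probes.

    observations: iterable of (segment_id, ephemeral_probe_id, speed_kmh).
    Only the per-segment median leaves this function; probe ids are discarded.
    """
    probes, speeds = {}, {}
    for segment_id, probe_id, speed_kmh in observations:
        probes.setdefault(segment_id, set()).add(probe_id)
        speeds.setdefault(segment_id, []).append(speed_kmh)
    return {
        segment_id: median(speeds[segment_id])
        for segment_id in probes
        if len(probes[segment_id]) >= min_unique
    }


observations = [
    ("seg_a", f"p{i}", speed)
    for i, speed in enumerate([30.0, 32.0, 28.0, 31.0, 29.0])
]
# seg_b has many samples but only two distinct probes, so it is suppressed.
observations += [("seg_b", "p9", 60.0)] * 4 + [("seg_b", "p8", 62.0)]

published = publishable_segments(observations)
print(published)
```

Counting unique probes rather than raw samples matters: one device reporting every 5 seconds would otherwise satisfy the threshold on its own.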
Why it matters in interviews
This concept tests real-time data pipeline design under extreme throughput. Explaining the 4M updates/sec ingestion rate, the HMM map matching step, and the confidence weighting for sparse segments shows we can build a live traffic system, not just describe one.