Ride Sharing Cheat Sheet
Key concepts, trade-offs, and quick-reference notes for your interview prep.
H3 Hexagonal Grid
#1 We chose H3 resolution 9 (not QuadTree) because cell assignment is a pure function of (lat, lng) with no tree mutations. Each cell covers 0.1 km² (one city block). Driver cell ID computed from GPS in O(1). Stored in Redis sorted set keyed by cell ID. Hexagons have uniform neighbor distances, eliminating the diagonal bias of square grids. Trade-off: we gave up adaptive density for zero write contention.
💡 H3 is a pure function of (lat, lng) to cell_id. No tree structure, no rebalancing, no locks.
⚠ Using a QuadTree for a write-heavy workload. At 167K updates/sec, the tree constantly rebalances, causing lock contention.
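A minimal sketch of the pure-function property. In production this would call the H3 library itself; here a toy lat/lng quantizer (a stand-in for H3, with an assumed cell size) illustrates why cell assignment needs no tree, no rebalancing, and no locks:

```python
from typing import Dict, Set, Tuple

# Stand-in for H3: quantize (lat, lng) to a grid cell key. A real system
# would use the h3 library at resolution 9; this toy version only shows
# that cell assignment is a pure, stateless, O(1) function of the input.
CELL_DEG = 0.003  # assumed cell size, roughly the scale of an H3 res-9 cell

def cell_id(lat: float, lng: float) -> Tuple[int, int]:
    # Pure function: same input -> same cell. Nothing to traverse or mutate.
    return (int(lat // CELL_DEG), int(lng // CELL_DEG))

# The geo index is then just a map from cell -> set of driver ids:
index: Dict[Tuple[int, int], Set[str]] = {}

def update_driver(driver_id: str, lat: float, lng: float) -> None:
    index.setdefault(cell_id(lat, lng), set()).add(driver_id)
```

A k-ring query (the rider's cell plus its neighbors) is then a handful of O(1) lookups into `index`.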
Location Pipeline
#2 We route 167K updates/sec through Kafka (not directly to Redis) because a Redis pause at that rate queues 83K updates in app memory. Each update: driver_id (8B) + lat (8B) + lng (8B) + timestamp (8B) + status (1B) = 33 bytes. Ingress: 5.5 MB/sec. Consumers write to Redis geo index + Cassandra for analytics. Trade-off: we accepted 1-2 seconds of added latency for zero data loss.
💡 Kafka decouples write-heavy ingestion from the geo index. If Redis is slow, Kafka buffers without dropping updates.
⚠ Writing directly to the geo index from the API layer. A Redis hiccup at 167K/sec causes cascading timeouts across all driver connections.
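The 33-byte update above can be sketched as a fixed binary format. The field sizes come from the text; the exact layout and endianness are assumptions:

```python
import struct
import time

# 33-byte wire format from the capacity math above:
# driver_id (8B) + lat (8B) + lng (8B) + timestamp (8B) + status (1B).
# "<" disables padding so the struct is exactly 33 bytes.
UPDATE_FMT = "<qddqB"
assert struct.calcsize(UPDATE_FMT) == 33

def encode_update(driver_id: int, lat: float, lng: float, status: int) -> bytes:
    return struct.pack(UPDATE_FMT, driver_id, lat, lng, int(time.time()), status)

def decode_update(payload: bytes):
    return struct.unpack(UPDATE_FMT, payload)
```

At 167K updates/sec this gives 167,000 × 33 B ≈ 5.5 MB/sec, matching the ingress figure above.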
Dispatch Algorithm
#3 We query the rider's H3 cell + 6 neighbors (k-ring). We rank candidates by: ETA (50%), direction of travel (20%), rating (15%), acceptance rate (15%). We chose multi-factor ranking (not nearest-distance) because the closest driver is often not the fastest to arrive. First-accept-wins via optimistic locking. Trade-off: 7ms matching latency vs 1ms for nearest-only, but 40% lower average pickup time.
💡 If the driver declines or the 15-second timer expires, re-dispatch to the next candidate. Total matching budget: 30 seconds.
⚠ Matching by nearest distance only. A driver 500m away but stuck in traffic has a worse ETA than one 900m away on a clear road.
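A hedged sketch of the multi-factor score. The four factors and weights come from the text; the normalization (ETA inverted against an assumed 10-minute cap, rating scaled out of 5) is an illustrative choice:

```python
from dataclasses import dataclass
from typing import List

# Weights from the section above: ETA 50%, direction 20%, rating 15%, acceptance 15%.
WEIGHTS = {"eta": 0.50, "direction": 0.20, "rating": 0.15, "acceptance": 0.15}
MAX_ETA_MIN = 10.0  # assumed cap for normalizing ETA into 0..1

@dataclass
class Candidate:
    driver_id: str
    eta_min: float     # estimated minutes to pickup
    direction: float   # 0..1, 1 = already heading toward the rider
    rating: float      # 0..5 stars
    acceptance: float  # 0..1 historical acceptance rate

def score(c: Candidate) -> float:
    eta_score = max(0.0, 1.0 - c.eta_min / MAX_ETA_MIN)  # lower ETA -> higher score
    return (WEIGHTS["eta"] * eta_score
            + WEIGHTS["direction"] * c.direction
            + WEIGHTS["rating"] * c.rating / 5.0
            + WEIGHTS["acceptance"] * c.acceptance)

def rank(candidates: List[Candidate]) -> List[Candidate]:
    # First-accept-wins dispatch offers the ride down this list in order.
    return sorted(candidates, key=score, reverse=True)
```

Note how this handles the pitfall above: a nearby driver stuck in traffic (high ETA) ranks below a slightly farther driver on a clear road.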
Optimistic Locking
#4 Driver status in Redis: AVAILABLE, MATCHED, ON_TRIP, OFFLINE. On accept, we CAS (Compare-And-Swap) from AVAILABLE to MATCHED. We chose optimistic locking (not pessimistic) because at 35 matches/sec peak, holding a pessimistic lock for 15 seconds would block other dispatchers. If CAS fails (another ride matched first), we re-dispatch within 500ms. Trade-off: we accepted occasional CAS failures (under 1%) for higher concurrent throughput.
⚠ No locking on driver status. Two concurrent dispatches send requests to the same driver, both succeed, and one rider gets stranded.
Capacity Numbers
#5 1M DAU riders, 500K drivers. Location QPS: 167K/sec, peak 500K/sec. Implication: no traditional database handles this write rate, so we need a streaming pipeline. Ride matching: 11.6/sec, peak 35/sec. Active driver state: 16.5 MB in Redis. Implication: the entire dataset fits on a single Redis instance. Trip records: 500 MB/day = 182 GB/year.
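The back-of-envelope math above checks out with the 33-byte update record from #2:

```python
# Capacity arithmetic, using the figures stated in the sections above.
drivers = 500_000
update_bytes = 33        # per-update record size from #2
updates_per_sec = 167_000

redis_state_mb = drivers * update_bytes / 1e6   # one record per active driver
ingress_mb_s = updates_per_sec * update_bytes / 1e6
trip_gb_year = 500 / 1000 * 365                 # 500 MB/day of trip records

assert round(redis_state_mb, 1) == 16.5  # fits on a single Redis instance
assert round(ingress_mb_s, 1) == 5.5     # Kafka ingress
assert round(trip_gb_year) == 182        # yearly trip storage
```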
Real-Time Tracking
#6 We chose WebSocket during active rides (not HTTP polling) because one connection replaces 300 poll requests per ride. 3-second push interval. Each update: 40 bytes. At 100K concurrent rides: 1.3 MB/sec egress. Connection registry in Redis maps ride_id to server. Heartbeat every 10s, reconnect on 3 missed pings. Trade-off: we accepted stateful connection complexity for 99% lower request volume.
💡 We use HTTP polling (every 5s) for pre-match phase when the rider is browsing the map. WebSocket only for active rides.
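The push-vs-poll numbers above, made explicit. The ~15-minute average ride length is an assumption (it is what makes "300 polls per ride" consistent with a 3-second interval):

```python
# Egress and request-volume arithmetic from the section above.
concurrent_rides = 100_000
push_interval_s = 3        # one location push every 3 seconds
update_bytes = 40
ride_seconds = 15 * 60     # assumed average active-ride duration

egress_mb_s = concurrent_rides / push_interval_s * update_bytes / 1e6
polls_replaced_per_ride = ride_seconds // push_interval_s

assert round(egress_mb_s, 1) == 1.3
assert polls_replaced_per_ride == 300
```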
Surge Pricing
#7 We compute the supply/demand ratio per H3 cell every 60 seconds (not globally) because Times Square demand should not inflate Brooklyn prices. We apply a multiplier derived from the demand/supply ratio, smoothed with an EMA to prevent oscillation. Trade-off: we accepted a 2-3 minute delay in surge response for price stability. 500K active cells computed per minute.
⚠ Computing surge globally instead of per geo cell. Times Square at midnight has 10x demand while Brooklyn is quiet. Global surge punishes both.
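A hedged sketch of per-cell surge. The source states ratio-based surge with EMA smoothing; the exact multiplier formula, the 3x cap, and the smoothing factor α = 0.3 are assumptions for illustration:

```python
# Assumed parameters: the source specifies EMA smoothing but not these values.
ALPHA = 0.3       # EMA smoothing factor
SURGE_CAP = 3.0   # assumed maximum multiplier

def raw_multiplier(demand: int, supply: int) -> float:
    # Illustrative formula: surge kicks in when demand exceeds supply.
    if supply == 0:
        return SURGE_CAP
    return min(SURGE_CAP, max(1.0, demand / supply))

def smoothed(prev: float, demand: int, supply: int) -> float:
    # EMA damps oscillation: each 60s tick moves only ALPHA of the way
    # toward the new raw multiplier, so prices change gradually.
    return prev + ALPHA * (raw_multiplier(demand, supply) - prev)
```

A demand spike (demand/supply = 2.0) moves the multiplier 1.0 → 1.3 → 1.51 over successive ticks, which is exactly the 2-3 minute surge delay traded for price stability.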
ETA Estimation
#8 We use three layers (not Dijkstra alone): Dijkstra on road graph (25% error), + real-time traffic weights from driver GPS (11% error), + ML correction from historical trips (<5% error). Trade-off: we accepted 3x compute cost per query for 5x accuracy improvement. Fallback: straight-line / 25 km/h (overestimates by 40%). We show a range, not a point estimate, when the ML layer is down.
💡 Traffic weights updated every 30 seconds from aggregated driver speed reports per road segment.
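The three-layer pipeline with its fallback can be sketched as below. The layer functions are illustrative stand-ins; only the layering order and the straight-line/25 km/h fallback come from the text:

```python
STRAIGHT_LINE_KMH = 25.0

def fallback_eta_min(distance_km: float) -> float:
    # Last-resort estimate: straight-line distance at 25 km/h. The text
    # notes this overestimates by ~40%, so callers show a range, not a point.
    return distance_km / STRAIGHT_LINE_KMH * 60

def eta_minutes(graph_eta, traffic_adjust, ml_correct, distance_km: float) -> float:
    # Each layer refines the previous estimate; any failure drops to the fallback.
    try:
        eta = graph_eta()          # layer 1: Dijkstra on the road graph (~25% error)
        eta = traffic_adjust(eta)  # layer 2: live driver-GPS traffic weights (~11%)
        return ml_correct(eta)     # layer 3: ML correction from history (<5%)
    except Exception:
        return fallback_eta_min(distance_km)
```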
Fault Tolerance
#9 Stale driver: no update for >15s means we mark them inactive and remove from geo index. Maximum ghost exposure: 15 seconds. Redis Cluster failover: <5 seconds. During the gap, we fall back to Cassandra geo-bounded scan (200ms vs 5ms). Kafka consumer lag: we auto-scale consumers and drop events older than 10 seconds. Re-dispatch on driver timeout: 500ms.
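The stale-driver rule is a periodic sweep over last-seen timestamps. A minimal sketch (data structures assumed; in production the index lives in Redis):

```python
from typing import Dict, Set

STALE_AFTER_S = 15.0            # ghost exposure bound from the text
last_seen: Dict[str, float] = {}  # driver_id -> last update timestamp (epoch s)
geo_index: Set[str] = set()       # driver_ids currently matchable

def record_update(driver_id: str, now: float) -> None:
    last_seen[driver_id] = now
    geo_index.add(driver_id)

def sweep(now: float) -> None:
    # Remove any driver silent for >15s so riders never see a "ghost"
    # (an offline driver) for more than 15 seconds.
    for driver_id, ts in list(last_seen.items()):
        if now - ts > STALE_AFTER_S:
            geo_index.discard(driver_id)
            del last_seen[driver_id]
```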
Storage Strategy
#10 We chose MySQL for trip state (not DynamoDB) because fare calculation requires ACID transactions. Redis for real-time geo index (16.5 MB total) because the entire driver set fits in memory. Cassandra for location time-series (not MySQL) because LSM-tree handles 167K append-only writes/sec without B-tree rebalancing. Kafka for the streaming pipeline to decouple ingestion from indexing.
💡 The entire active driver set fits in memory. This is a compute problem (matching, ETA), not a storage problem.