WebSocket Connection Management
WebSocket connections are stateful: each connection is pinned to a specific server process. The constraint: when a WebSocket server crashes, all connections on that server drop, and riders lose real-time tracking updates during an active trip.
We solve this with three mechanisms.
First, a connection registry: a Redis hash maps each ride_id to the ID of the WebSocket server holding that ride's connection. We chose Redis over an in-memory registry local to each server because a centralized registry survives the failure of any individual WebSocket server.
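To make the registry concrete, here is a minimal Python sketch. It assumes a single Redis hash under a hypothetical key `ws:ride_to_server`, with one field per ride_id. `ConnectionRegistry` and `InMemoryHash` are illustrative names, and the in-memory class only stands in for Redis so the example runs without a server. With redis-py you could pass `redis.Redis(decode_responses=True)` instead, so that `hget` returns `str` rather than `bytes`.

```python
from typing import Dict, Optional, Protocol

REGISTRY_KEY = "ws:ride_to_server"  # hypothetical Redis key name


class HashClient(Protocol):
    """The three hash commands this sketch needs; redis-py's client provides them."""

    def hset(self, name: str, key: str, value: str) -> int: ...

    def hget(self, name: str, key: str) -> Optional[str]: ...

    def hdel(self, name: str, *keys: str) -> int: ...


class InMemoryHash:
    """Dict-backed stand-in for a Redis hash so the sketch runs without a server."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def hset(self, name: str, key: str, value: str) -> int:
        bucket = self._data.setdefault(name, {})
        added = key not in bucket
        bucket[key] = value  # overwrite, like HSET on an existing field
        return int(added)

    def hget(self, name: str, key: str) -> Optional[str]:
        return self._data.get(name, {}).get(key)

    def hdel(self, name: str, *keys: str) -> int:
        bucket = self._data.get(name, {})
        return sum(1 for k in keys if bucket.pop(k, None) is not None)


class ConnectionRegistry:
    """Maps ride_id -> ID of the WebSocket server holding that ride's connection."""

    def __init__(self, client: HashClient, key: str = REGISTRY_KEY) -> None:
        self.client = client
        self.key = key

    def register(self, ride_id: str, server_id: str) -> None:
        # Called by whichever server accepts the connection. HSET overwrites,
        # so a reconnect to a new server replaces the crashed server's entry.
        self.client.hset(self.key, ride_id, server_id)

    def lookup(self, ride_id: str) -> Optional[str]:
        # One round trip to Redis per routing decision: the extra hop this design accepts.
        return self.client.hget(self.key, ride_id)

    def unregister(self, ride_id: str) -> None:
        self.client.hdel(self.key, ride_id)


registry = ConnectionRegistry(InMemoryHash())
registry.register("ride-42", "ws-3")
print(registry.lookup("ride-42"))  # prints ws-3
registry.register("ride-42", "ws-7")  # rider reconnected after ws-3 crashed
print(registry.lookup("ride-42"))  # prints ws-7
```

Because HSET overwrites the field, the most recent registration wins. One caveat of this simple version: if a half-alive old server calls `unregister` after the client has moved, it deletes the new entry; a compare-and-delete (for example, a small Lua script) avoids that.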
Trade-off: we accepted one extra network hop per routing lookup in exchange for crash-resilient connection tracking.
Second, heartbeat and reconnection: the client sends a ping every 10 seconds. If 3 consecutive pings go unanswered (about 30 seconds), the client reconnects to any available WebSocket server, and the new server registers itself in Redis for that ride_id, replacing the crashed server's entry.
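The heartbeat rule reduces to a counter of consecutive unanswered pings. Below is a minimal, hypothetical Python model: it assumes the client learns once per interval whether a pong came back (a real client would drive this from a timer plus its WebSocket library's ping/pong frames or an application-level ping message), and `HeartbeatMonitor`, `run_client`, and `pick_server` are illustrative names. The 10-second interval and 3-miss threshold come from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

PING_INTERVAL_S = 10  # from the text: one ping every 10 seconds
MAX_MISSED_PINGS = 3  # from the text: 3 consecutive failures trigger a reconnect


@dataclass
class HeartbeatMonitor:
    """Client-side failure detector that counts consecutive unanswered pings."""

    missed: int = 0

    def record(self, pong_received: bool) -> bool:
        """Record one ping outcome; return True when the client should reconnect."""
        if pong_received:
            self.missed = 0  # any answered ping resets the streak
            return False
        self.missed += 1
        return self.missed >= MAX_MISSED_PINGS

    def reset(self) -> None:
        self.missed = 0  # called after a fresh connection is established

    @staticmethod
    def worst_case_detection_s() -> int:
        return PING_INTERVAL_S * MAX_MISSED_PINGS


def run_client(outcomes: List[bool], pick_server: Callable[[], str]) -> List[Tuple[int, str]]:
    """Replay ping outcomes (one per interval); return (time_s, new_server) per reconnect."""
    monitor = HeartbeatMonitor()
    reconnects: List[Tuple[int, str]] = []
    for tick, pong_received in enumerate(outcomes, start=1):
        if monitor.record(pong_received):
            # Reconnect to any available server; that server would then
            # register itself in Redis for this ride_id (not modeled here).
            reconnects.append((tick * PING_INTERVAL_S, pick_server()))
            monitor.reset()
    return reconnects


print(HeartbeatMonitor.worst_case_detection_s())  # prints 30
print(run_client([True, False, True, False, False, False], lambda: "ws-7"))
# prints [(60, 'ws-7')]: pings at t=40, 50, 60 go unanswered, so the client reconnects at t=60
```

Resetting the streak on any answered ping means a single dropped packet never triggers a reconnect; only a sustained outage does, which is why detection can take up to 3 × 10 = 30 seconds.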
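When the client sees the socket close immediately (for example, because the crashed server's operating system closes its connections), it can reconnect at once, and the tracking gap is about 3 seconds (one missed GPS update). When the connection dies silently, the heartbeat bounds detection at about 30 seconds, so the gap can stretch to roughly that long.
Third, graceful draining: when deploying new code, a WebSocket server stops accepting new connections but keeps existing ones alive for up to 60 seconds. This lets rides that are about to finish complete naturally; any connection still open when the window expires is closed, and its client reconnects to another server, which re-registers the ride_id in Redis.
At peak, we maintain 100K concurrent WebSocket connections across a fleet of about 10 servers. Each server handles roughly 10K connections using epoll-based event loops and uses about 1 GB of memory, or roughly 100 KB per connection.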
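The graceful-draining step can be modeled as a small state machine. This is a hypothetical Python sketch, not any framework's API: time is passed in explicitly so the behavior is easy to test, `DrainingServer` and its methods are illustrative names, and it assumes something outside the sketch (for example, the load balancer) stops sending new clients to a draining server. The 60-second window comes from the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

DRAIN_WINDOW_S = 60.0  # from the text: existing connections get up to 60 seconds


@dataclass
class DrainingServer:
    """One WebSocket server's connections during a deploy drain (hypothetical model)."""

    connections: Set[str] = field(default_factory=set)
    drain_started_at: Optional[float] = None

    @property
    def draining(self) -> bool:
        return self.drain_started_at is not None

    def accept(self, conn_id: str) -> bool:
        # Refuse new connections once draining has begun.
        if self.draining:
            return False
        self.connections.add(conn_id)
        return True

    def disconnect(self, conn_id: str) -> None:
        # A client that leaves on its own (e.g. its ride ended) during the window.
        self.connections.discard(conn_id)

    def begin_drain(self, now: float) -> None:
        if not self.draining:
            self.drain_started_at = now

    def tick(self, now: float) -> Set[str]:
        """Once the window has expired, close every remaining connection and return their IDs."""
        if self.drain_started_at is None or now - self.drain_started_at < DRAIN_WINDOW_S:
            return set()
        closed, self.connections = self.connections, set()
        return closed

    def can_shut_down(self) -> bool:
        # Safe to stop the process once draining and no connections remain.
        return self.draining and not self.connections


server = DrainingServer()
server.accept("ride-1")
server.accept("ride-2")
server.begin_drain(now=0.0)
print(server.accept("ride-3"))  # prints False: no new connections while draining
server.disconnect("ride-1")  # this rider's trip ended inside the window
print(server.tick(now=30.0))  # prints set(): still inside the 60-second window
print(server.tick(now=60.0))  # prints {'ride-2'}: closed; that client reconnects elsewhere
print(server.can_shut_down())  # prints True
```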