WebSocket Connection Management

WebSocket connections are stateful: each connection is pinned to a specific server process. The constraint: when a WebSocket server crashes, all connections on that server drop, and riders lose real-time tracking updates during an active trip.

We solve this with three mechanisms.

First, connection registry: a Redis hash maps ride_id to the ID of the WebSocket server hosting that connection. We chose Redis (not an in-memory local registry) because a centralized registry survives individual server failures.

Trade-off: we accepted one extra network hop per routing lookup in exchange for crash-resilient connection tracking.

Second, heartbeat and reconnection: the client sends a ping every 10 seconds. If 3 consecutive pings fail (30 seconds), the client initiates a reconnection to any available WebSocket server. The new server registers itself in Redis for that ride_id, overwriting the stale entry left by the failed server.
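The heartbeat rule above can be illustrated with a minimal client-side sketch. The 10-second interval and the 3-failure threshold come from the text; the `HeartbeatMonitor` class and its method names are hypothetical, and no real network I/O is performed.

```python
from dataclasses import dataclass

PING_INTERVAL_S = 10   # from the text: one ping every 10 seconds
MAX_MISSED_PINGS = 3   # from the text: reconnect after 3 consecutive failures


@dataclass
class HeartbeatMonitor:
    """Counts consecutive failed pings on the client (hypothetical helper)."""
    missed: int = 0

    def record_ping(self, succeeded: bool) -> bool:
        """Record one ping result; return True when the client should reconnect."""
        if succeeded:
            self.missed = 0      # any successful ping resets the streak
            return False
        self.missed += 1
        if self.missed >= MAX_MISSED_PINGS:
            self.missed = 0      # start fresh on the new connection
            return True
        return False


def worst_case_detection_seconds() -> int:
    """Upper bound on how long a dead server goes unnoticed by the client."""
    return PING_INTERVAL_S * MAX_MISSED_PINGS
```

A caller would invoke `record_ping` once per ping and open a new connection whenever it returns `True`. Resetting the counter on any success means only an unbroken run of failures triggers a reconnect, which avoids reconnect storms on a single dropped packet.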

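Once the client has reconnected, tracking resumes with the next GPS update, so the reconnection itself adds a gap of at most 3 seconds (one missed GPS update). The worst-case outage also includes the detection window of up to 30 seconds.

Third, graceful draining: when deploying new code, the WebSocket server stops accepting new connections but keeps existing ones alive for up to 60 seconds. This allows in-progress rides that are close to finishing to complete naturally; connections still open after 60 seconds fall back on the reconnection path above.

At peak, we maintain 100K concurrent WebSocket connections across a fleet of about 10 servers, each handling roughly 10K connections using epoll-based event loops and consuming about 1 GB of memory (roughly 100 KB per connection).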
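The graceful-draining step can be sketched as a small state model. This is an illustration under assumptions, not the actual server: the 60-second window comes from the text, while `DrainingServer`, its methods, and the injected `now` timestamps are hypothetical, and connections are represented only by their ride IDs.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

DRAIN_WINDOW_S = 60.0  # from the text: existing connections live up to 60 s


@dataclass
class DrainingServer:
    """Toy model of one WebSocket server (hypothetical, no real sockets)."""
    connections: Set[str] = field(default_factory=set)
    drain_deadline: Optional[float] = None  # None means "not draining"

    def accept(self, ride_id: str) -> bool:
        """Admit a new connection unless the server is draining."""
        if self.drain_deadline is not None:
            return False
        self.connections.add(ride_id)
        return True

    def start_drain(self, now: float) -> None:
        """Stop admitting connections and start the drain window."""
        self.drain_deadline = now + DRAIN_WINDOW_S

    def close(self, ride_id: str) -> None:
        """A ride finished and its client disconnected on its own."""
        self.connections.discard(ride_id)

    def tick(self, now: float) -> Set[str]:
        """Return ride IDs force-closed because the drain window expired."""
        if self.drain_deadline is None or now < self.drain_deadline:
            return set()
        dropped, self.connections = self.connections, set()
        return dropped
```

In this model, the ride IDs returned by `tick` are the clients that would have to reconnect to another server once the window closes. Passing the clock in as `now` keeps the logic deterministic and easy to test.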

Why it matters in interviews

Interviewers ask how we handle WebSocket server failure during an active ride. A common follow-up: 'Why not use sticky sessions at the load balancer instead of a connection registry?' We answer: sticky sessions tie a client to a specific server for the session lifetime. If that server dies, the load balancer has no record of which ride the connection belonged to, so it cannot re-route the ride context. The Redis registry lets any server pick up the connection.
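To make the registry answer concrete, here is a minimal sketch of the lookup and re-registration flow. A plain dict stands in for the Redis hash so the example runs without a Redis server; with Redis, the `HSET` and `HGET` commands would play the roles of `register` and `lookup`. The class name, function name, and server IDs such as `ws-1` are assumptions made for illustration.

```python
from typing import Dict, Optional, Set


class ConnectionRegistry:
    """ride_id -> server_id map; a dict stands in for the Redis hash."""

    def __init__(self) -> None:
        self._hash: Dict[str, str] = {}

    def register(self, ride_id: str, server_id: str) -> None:
        # Like writing one field of a hash: the last writer wins, so a
        # reconnect overwrites the entry left behind by a crashed server.
        self._hash[ride_id] = server_id

    def lookup(self, ride_id: str) -> Optional[str]:
        return self._hash.get(ride_id)


def route_update(registry: ConnectionRegistry,
                 live_servers: Set[str],
                 ride_id: str) -> Optional[str]:
    """Return the server that should receive the update, or None if the
    registered server is missing or no longer alive."""
    server_id = registry.lookup(ride_id)
    if server_id in live_servers:
        return server_id
    return None


registry = ConnectionRegistry()
live = {"ws-1", "ws-2"}

registry.register("ride-42", "ws-1")   # initial connection
live.discard("ws-1")                   # ws-1 crashes
stale = route_update(registry, live, "ride-42")      # None until reconnect
registry.register("ride-42", "ws-2")   # client reconnects to ws-2
recovered = route_update(registry, live, "ride-42")  # "ws-2"
```

The contrast with sticky sessions shows up in the last two steps: the mapping lives outside any single server, so whichever server the client lands on can claim the ride by overwriting the entry.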
