Chat Server Architecture

A chat server is stateful: it holds WebSocket connections in memory. This breaks the usual load balancing pattern where any server can handle any request.
We cannot use round-robin or random load balancing because a message for User B must reach the specific server holding B's connection.
The architecture has three layers.
Layer 1: Gateway servers terminate WebSocket connections and handle auth, heartbeats, and connection lifecycle. Each gateway holds 50K connections.
Layer 2: Chat service (stateless) handles message routing, ordering, and persistence. It looks up the recipient's gateway in the connection registry and forwards the message.
Layer 3: Storage layer with Kafka for async writes and Cassandra for message persistence.
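To make the registry lookup concrete, here is a minimal sketch of how a stateless chat service might find the recipient's gateway and forward a message. The class names (`Gateway`, `ConnectionRegistry`, `ChatService`) and the in-memory dictionary are illustrative assumptions; the walkthrough does not say how the registry is stored, and a real deployment would keep it in a shared store rather than in process memory.

```python
from typing import Dict, List, Optional, Tuple


class Gateway:
    """Hypothetical stand-in for a gateway server that holds live connections."""

    def __init__(self, gateway_id: str) -> None:
        self.gateway_id = gateway_id
        self.delivered: List[Tuple[str, str]] = []  # (user_id, message) pairs

    def push(self, user_id: str, message: str) -> None:
        # A real gateway would write to the user's WebSocket here.
        self.delivered.append((user_id, message))


class ConnectionRegistry:
    """Maps user_id -> gateway_id. In-memory for illustration only."""

    def __init__(self) -> None:
        self._user_to_gateway: Dict[str, str] = {}

    def register(self, user_id: str, gateway_id: str) -> None:
        self._user_to_gateway[user_id] = gateway_id

    def unregister(self, user_id: str) -> None:
        self._user_to_gateway.pop(user_id, None)

    def lookup(self, user_id: str) -> Optional[str]:
        return self._user_to_gateway.get(user_id)


class ChatService:
    """Stateless router: all connection state lives in the registry."""

    def __init__(self, registry: ConnectionRegistry, gateways: Dict[str, Gateway]) -> None:
        self.registry = registry
        self.gateways = gateways

    def route(self, recipient_id: str, message: str) -> bool:
        gateway_id = self.registry.lookup(recipient_id)
        if gateway_id is None:
            # Recipient is offline; the caller would rely on stored history.
            return False
        self.gateways[gateway_id].push(recipient_id, message)
        return True
```

Because `ChatService` keeps no per-user state of its own, any instance can route any message, which is what lets this layer sit behind an ordinary load balancer.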
Why separate gateway from chat service? Because gateways are connection-bound (limited by file descriptors and RAM) while chat services are CPU-bound (message validation, routing logic).
Scaling them independently lets us add gateways when connection count grows and chat services when message throughput grows. The gateway runs on epoll (Linux) to handle 50K connections per process with a single event loop.
Each connection consumes a file descriptor and ~10KB of kernel buffer. At 50K connections, that is only about 500MB of RAM (50,000 × 10KB), leaving headroom for the OS and application logic on a 16GB server.
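The single-event-loop idea can be sketched with Python's standard `selectors` module, whose `DefaultSelector` uses epoll on Linux. This is an illustration under simplifying assumptions, not the gateway's actual implementation: the handler echoes raw bytes instead of parsing WebSocket frames, and the helper names `serve_once` and `make_echo_handler` are hypothetical.

```python
import selectors
import socket


def make_echo_handler(selector: selectors.BaseSelector):
    """Build a read callback for one connection (hypothetical helper)."""

    def handle(conn: socket.socket) -> None:
        data = conn.recv(4096)
        if data:
            # A real gateway would decode a WebSocket frame here.
            conn.sendall(data)
        else:
            # Empty read means the peer closed: release the file descriptor.
            selector.unregister(conn)
            conn.close()

    return handle


def serve_once(selector: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """Run one loop iteration; return how many ready connections were handled."""
    events = selector.select(timeout)
    for key, _mask in events:
        callback = key.data  # the handler stored at registration time
        callback(key.fileobj)
    return len(events)
```

Each accepted connection would be registered once with `selector.register(conn, selectors.EVENT_READ, make_echo_handler(selector))`, and the process then calls `serve_once` in a loop. One thread services every connection because the selector only reports sockets that are ready, so idle connections cost memory but no CPU.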
Trade-off: the stateful gateway layer means we cannot simply restart servers. A rolling restart of one gateway disconnects 50K users, triggering a reconnection storm.
We mitigate this with graceful drain: the gateway stops accepting new connections, waits for existing ones to close naturally or migrate to another gateway, then shuts down.
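The drain sequence can be sketched as a small state machine. The `DrainingGateway` class, the `deadline_s` cutoff, and the `on_poll` hook are assumptions added for illustration; the walkthrough only specifies the order of steps (stop accepting, wait, shut down), not how long to wait or what happens to connections that never close.

```python
import time
from typing import Callable, List, Optional, Set


class DrainingGateway:
    """Hypothetical gateway that tracks connections by user id."""

    def __init__(self) -> None:
        self.accepting = True
        self.connections: Set[str] = set()

    def accept(self, user_id: str) -> bool:
        if not self.accepting:
            # Draining: the load balancer should send this client elsewhere.
            return False
        self.connections.add(user_id)
        return True

    def disconnect(self, user_id: str) -> None:
        self.connections.discard(user_id)

    def drain(
        self,
        deadline_s: float,
        poll_s: float = 0.01,
        on_poll: Optional[Callable[["DrainingGateway"], None]] = None,
    ) -> List[str]:
        """Stop accepting, wait for connections to leave, then force-close the rest.

        Returns the user ids that were still connected at the deadline.
        """
        self.accepting = False
        deadline = time.monotonic() + deadline_s
        while self.connections and time.monotonic() < deadline:
            if on_poll is not None:
                on_poll(self)  # assumed hook, e.g. to ask clients to migrate
            time.sleep(poll_s)
        forced = sorted(self.connections)
        self.connections.clear()
        return forced
```

Only the users returned by `drain` are disconnected abruptly, so the reconnection storm shrinks from the full 50K to whatever remains when the deadline expires.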
Why it matters in interviews
Separating stateful gateways from stateless chat services is the architectural pattern interviewers want to see. It shows you understand that connection management and message processing have different scaling axes and failure modes.