
Chat System Cheat Sheet

Key concepts, trade-offs, and quick-reference notes for your interview prep.

Per-Message Storage Breakdown

#1
Every message averages ~150 bytes: message_id (8B, Snowflake 64-bit) + conversation_id (8B) + sender_id (8B) + body (100B avg, median chat message is ~50 UTF-8 characters) + created_at (8B) + status (1B, enum: sent/delivered/read) + metadata (17B, flags + device info). Daily: 20B messages × 150 B = 3 TB/day. Yearly at RF=3: 3 TB × 365 × 3 ≈ 3.3 PB. Messages are tiny; the challenge is volume, not per-record size. This means network bandwidth (70 MB/sec) is never the bottleneck.
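
A quick sanity check of that arithmetic as a throwaway Python sketch (the constants are just the assumptions above):

```python
# Back-of-envelope storage math using the assumptions above.
MESSAGES_PER_DAY = 20_000_000_000   # 20B messages/day
BYTES_PER_MESSAGE = 150             # ids + 100B body + status + metadata
REPLICATION_FACTOR = 3

daily = MESSAGES_PER_DAY * BYTES_PER_MESSAGE        # 3e12 bytes = 3 TB/day
yearly_raw = daily * 365                            # ~1.1 PB/year raw
yearly_rf3 = yearly_raw * REPLICATION_FACTOR        # ~3.3 PB/year at RF=3

print(f"{daily / 1e12:.1f} TB/day, {yearly_raw / 1e15:.1f} PB/yr raw, "
      f"{yearly_rf3 / 1e15:.1f} PB/yr at RF={REPLICATION_FACTOR}")
```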

💡 150B per message, 3 TB/day, 1.1 PB/year raw, 3.3 PB at RF=3

Assuming messages are large because they 'contain data'. Text messages are tiny; media messages contain a URL pointer (100B), not the file.

WebSocket Server Sizing

#2
Each gateway server holds 50K concurrent WebSocket connections using epoll for event-driven I/O. Per-connection memory: ~10 KB (kernel socket buffers + user metadata + send/receive buffers). Total per server: 50K × 10 KB = 500 MB. On a 16 GB machine, that leaves 15.5 GB for the OS and application. Server count: 500M / 50K = 10,000 gateway servers. File descriptors: the Linux default is 1024; set ulimit to 65536. Each connection is one fd, so 50K connections use 50K fds, well within 65K.
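
The same sizing math as a sketch (the 10 KB per connection and 50K-per-server targets are the assumptions above, not measured values):

```python
# Gateway sizing using the assumptions above.
CONCURRENT_CONNECTIONS = 500_000_000   # peak concurrent WebSocket users
CONNS_PER_SERVER = 50_000              # epoll-based target per gateway
BYTES_PER_CONN = 10_000                # ~10 KB: socket buffers + metadata

servers = CONCURRENT_CONNECTIONS // CONNS_PER_SERVER          # 10,000 gateways
ram_per_server_mb = CONNS_PER_SERVER * BYTES_PER_CONN / 1e6   # ~500 MB

print(f"{servers:,} gateways, ~{ram_per_server_mb:.0f} MB connection state each")
# One file descriptor per connection, so raise the default ulimit of 1024
# (e.g. to 65536) or the server maxes out at ~1K connections.
```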

💡 50K connections/server, 10K servers, 500MB RAM per server, set ulimit to 65536

Forgetting to increase file descriptor limits. Default Linux ulimit of 1024 means the server maxes out at 1024 connections.

Per-Conversation Sequence Numbers

#3
We assign each message a per-conversation sequence number via Redis INCR on key seq:conversation_id. At 20B messages across ~10B conversations, each conversation averages 2 increments/day, so contention per counter is near zero. A global counter at 231K/sec would serialize every message behind a single lock. Per-conversation counters run in parallel: 231K/sec spread across 10B keys means each key is incremented once every 43,000 seconds on average. The sequence number is assigned before Kafka publish, ensuring ordering survives async persistence.
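
A minimal sketch of the sequence assignment, assuming redis-py, a reachable Redis instance, and the seq:conversation_id key convention above:

```python
import redis

r = redis.Redis()  # connection details are illustrative

def assign_sequence_number(conversation_id: str) -> int:
    """Atomically assign the next per-conversation sequence number.

    INCR is atomic, so concurrent senders in the same conversation
    still receive distinct, monotonically increasing numbers.
    """
    return r.incr(f"seq:{conversation_id}")

# Called on the chat service *before* the Kafka publish, so ordering
# survives asynchronous persistence.
seq = assign_sequence_number("conversation_42")
```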

💡 Per-conversation (not global), Redis INCR, assigned before Kafka publish

Using a global auto-increment counter. At 231K/sec, a single counter becomes a serialization bottleneck with 5-10ms latency per increment under contention.

Idempotency Keys for Dedup

#4
The client generates a UUID idempotency_key per message. The server checks a Redis SET (key: dedup:idempotency_key, TTL: 24 hours) before processing. If the key exists, the server returns the original ack without re-delivering. This gives at-least-once delivery at the transport layer with exactly-once semantics at the application layer. Dedup cache size: 231K keys/sec × 86,400 sec × 40 B ≈ 800 GB. We shard across ~15 Redis nodes. Fallback: if the dedup cache is unavailable, clients dedup using the conversation sequence number.
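
A sketch of the dedup check, assuming redis-py and the dedup:idempotency_key convention above:

```python
import redis

r = redis.Redis()                 # connection details are illustrative
DEDUP_TTL_SECONDS = 24 * 3600     # 24-hour window

def is_duplicate(idempotency_key: str) -> bool:
    """True if this message was already processed.

    SET NX EX is atomic: the first writer claims the key and sets the 24h
    TTL; any retry of the same key fails NX and is treated as a duplicate,
    so the server returns the original ack instead of re-delivering.
    """
    first_time = r.set(f"dedup:{idempotency_key}", 1,
                       nx=True, ex=DEDUP_TTL_SECONDS)
    return not first_time
```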

💡 Client-generated UUID, Redis SET with 24h TTL, ~800 GB dedup cache across 15 nodes

Relying on exactly-once transport (2PC). Two-phase commit adds 50-100ms per message and blocks if the recipient is offline.

Lazy Presence (Not Eager Broadcast)

#5
Eager broadcast: user comes online, push status to all 500 contacts. At morning peak: 500M × 500 / 600 s ≈ 417M events/sec. This melts any pub/sub system. Lazy presence: store status in Redis with a 60-second TTL, refreshed by 30-second heartbeats. Only push status changes to users who are actively viewing a chat with that contact. When User A opens a chat with User B, A subscribes to B's presence channel. Status change events go only to active subscribers, not all contacts. Heartbeat rate: 500M / 30 s ≈ 16.7M/sec.
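
A sketch of the lazy approach with redis-py (key names and the presence channel naming are illustrative):

```python
import redis

r = redis.Redis()
PRESENCE_TTL = 60   # seconds; twice the 30s heartbeat interval

def heartbeat(user_id: str) -> None:
    """Each connected client refreshes its status every ~30s."""
    r.set(f"presence:{user_id}", "online", ex=PRESENCE_TTL)

def get_presence(user_id: str) -> str:
    """Lazy read, done only when someone opens a chat with this user.
    If heartbeats stop, the key expires and the user reads as offline."""
    return "online" if r.exists(f"presence:{user_id}") else "offline"

def watch_presence(contact_id: str):
    """Only active viewers subscribe; status-change events go to this
    channel, never broadcast to all 500 contacts."""
    pubsub = r.pubsub()
    pubsub.subscribe(f"presence-events:{contact_id}")
    return pubsub
```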

💡 Lazy, not eager. 16.7M heartbeats/sec. 60s TTL. Only push to active viewers.

Broadcasting presence to all contacts. The math (417M events/sec) makes it obviously impossible, but most candidates do not compute it.

Connection Registry (Redis)

#6
Each user's WebSocket gateway is tracked in Redis: key = conn:user_id, value = server_id, TTL = 90 seconds refreshed by heartbeat. When routing a message to User B, the chat service does one Redis GET to find B's gateway, then forwards directly. Lookup cost: ~1ms. Alternative: broadcast to all 10,000 gateways. Cost: 10,000 network calls per message at 231K messages/sec = 2.31 billion calls/sec. The registry trades 1ms of lookup for eliminating 9,999 wasted network calls per message.
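
A sketch of the lookup-then-forward path (the gateway transport and offline-delivery helpers are hypothetical placeholders):

```python
import redis

r = redis.Redis()
REGISTRY_TTL = 90   # seconds, refreshed by the gateway heartbeat

def register_connection(user_id: str, server_id: str) -> None:
    """Gateway records which server currently holds this user's WebSocket."""
    r.set(f"conn:{user_id}", server_id, ex=REGISTRY_TTL)

def forward_to_gateway(server_id: bytes, message: bytes) -> None:
    ...  # hypothetical: push to that specific gateway server

def queue_for_offline_delivery(user_id: str, message: bytes) -> None:
    ...  # hypothetical: recipient is offline; catch-up sync covers it

def route_message(recipient_id: str, message: bytes) -> None:
    """One ~1ms GET instead of broadcasting to all 10,000 gateways."""
    server_id = r.get(f"conn:{recipient_id}")
    if server_id is None:
        queue_for_offline_delivery(recipient_id, message)
    else:
        forward_to_gateway(server_id, message)
```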

💡 conn:user_id -> server_id, 90s TTL, 1ms lookup vs 10K broadcast

Broadcasting messages to all gateway servers. At 10K servers and 231K messages/sec, broadcast generates 2.31B network calls/sec.

Kafka as Message Buffer

#7
Kafka sits between the chat service and Cassandra. Why? The message must be acked to the sender in under 200ms, but Cassandra writes can spike to 50ms at p99. If we wrote synchronously, delivery latency = routing (5ms) + Cassandra write (50ms at p99) + gateway push (5ms) = 60ms at the write tail, and any further persistence slowdown lands directly on the delivery path. With Kafka: routing (5ms) + Kafka publish (2ms) + gateway push (5ms) = 12ms. Cassandra writes happen asynchronously via Kafka consumers. If Cassandra is temporarily slow, messages buffer in Kafka (days of retention) without affecting delivery latency.
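
A sketch of the fast path, assuming kafka-python; the topic name, broker address, and gateway push helper are illustrative:

```python
from kafka import KafkaProducer   # kafka-python; broker address is illustrative

producer = KafkaProducer(bootstrap_servers="kafka:9092")

def deliver(message: bytes, conversation_id: str, recipient_gateway) -> None:
    """Fast path: publish to Kafka and push to the gateway, then ack the
    sender. Cassandra writes happen later, in a separate consumer group."""
    # Keying by conversation_id keeps a conversation on one partition,
    # so the Cassandra consumer sees its messages in order.
    producer.send("messages", key=conversation_id.encode(), value=message)
    recipient_gateway.push(message)   # hypothetical gateway push (~5ms)
    # ack the sender here: ~12ms total, independent of Cassandra latency
```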

💡 Kafka decouples delivery from persistence. Ack in 12ms, not 60ms.

Writing synchronously to Cassandra before acking the sender. Cassandra write latency (50ms at p99, and worse during slowdowns) then sits directly on the delivery path and eats into the 200ms budget.

Catch-Up Sync on Reconnection

#8
When a user reconnects after being offline, the client sends its last_seen_sequence_number per conversation. The server queries Cassandra: SELECT * FROM messages WHERE conversation_id = ? AND message_id > ? ORDER BY message_id ASC. Because Cassandra partitions by conversation_id and clusters by message_id DESC, this is a single-partition sequential read. For users with many conversations, we batch queries and stream results over the WebSocket. This pattern handles both brief disconnects (2-5 seconds, a few missed messages) and extended offline periods (hours, hundreds of messages).
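
The catch-up query as a sketch with the DataStax Python driver (host, keyspace, and table names are assumptions):

```python
from cassandra.cluster import Cluster   # DataStax Python driver

session = Cluster(["cassandra-host"]).connect("chat")   # keyspace is illustrative

# Partition key = conversation_id, clustering key = message_id, so this is a
# single-partition sequential read however long the user was offline.
CATCH_UP = session.prepare(
    "SELECT * FROM messages "
    "WHERE conversation_id = ? AND message_id > ? "
    "ORDER BY message_id ASC"
)

def catch_up(conversation_id, last_seen_message_id):
    """Return everything the client missed since its last_seen pointer."""
    return session.execute(CATCH_UP, [conversation_id, last_seen_message_id])
```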

💡 Client sends last_seen_seq per conversation. Cassandra range query. Single-partition read.

Keeping undelivered messages in server memory. If the server crashes, those messages are lost. Cassandra is the durable source of truth.

Group Chat Fan-Out Limits

#9
Fan-out cost scales linearly with group size. At 231K messages/sec with 10% group messages and an average group size of 20: 231K × 0.1 × 20 = 462K deliveries/sec. We cap at 500 members (WhatsApp's limit). A 10,000-member group at 100 messages/minute would generate 1M delivery events/minute from one group. We store once, deliver many: the message body is written once to Cassandra, and only delivery notifications (16B each: message_id + conversation_id) are fanned out to online members.
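
A sketch of store-once/deliver-many (the registry and gateway objects are illustrative stand-ins for the components above):

```python
MAX_GROUP_SIZE = 500   # enforced at group creation, per the cap above

def fan_out_group_message(message: dict, member_ids: list, registry, gateways) -> None:
    """Store once, deliver many: the body was already written once to
    Cassandra; only a tiny notification is pushed to each online member."""
    notification = {
        "message_id": message["message_id"],
        "conversation_id": message["conversation_id"],
    }
    for member_id in member_ids:                        # at most 500 iterations
        server_id = registry.get(f"conn:{member_id}")   # connection registry lookup
        if server_id:
            gateways[server_id].push(member_id, notification)
        # offline members pick the message up via catch-up sync on reconnect
```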

💡 500 member cap. Store once, fan out delivery notifications. 462K deliveries/sec from groups.

Storing a separate copy of the message for each group member. For a 500-member group, that is 500x storage waste. Store once, reference by conversation_id.

E2E Encryption Trade-Offs

#10
End-to-end encryption via the Signal Protocol (Double Ratchet + X3DH key agreement) means the server stores only ciphertext. Forward secrecy: compromising one message key does not reveal past or future messages. The server is a blind relay. Trade-off: the server cannot implement search, spam filtering, link previews, or content moderation on message content. WhatsApp solves spam detection via metadata analysis (message frequency, contact patterns, forwarding chains). Group E2E uses Sender Keys: the sender distributes one symmetric sender key to each member over pairwise-encrypted sessions, then encrypts each message once with that key.

💡 Signal Protocol. Server is blind relay. No server-side search or moderation.

Assuming TLS is sufficient for E2E. TLS encrypts client-to-server, but the server decrypts and re-encrypts when forwarding. The server sees plaintext.