Message Storage (Cassandra)
At 231K messages/sec with 150 bytes per message, we write 3 TB per day. We chose Cassandra (not MySQL, not DynamoDB) for message storage because the access pattern is a perfect fit: writes are append-only (messages are immutable once sent), and reads are sequential within a conversation (load the last 50 messages).
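The 3 TB/day figure follows directly from the write rate and message size; a quick sanity check using the numbers above:

```python
# Sanity-check the daily write volume: 231K msgs/sec at 150 bytes each.
MSGS_PER_SEC = 231_000
BYTES_PER_MSG = 150
SECONDS_PER_DAY = 86_400

bytes_per_day = MSGS_PER_SEC * BYTES_PER_MSG * SECONDS_PER_DAY
tb_per_day = bytes_per_day / 1e12  # decimal terabytes

print(f"{tb_per_day:.2f} TB/day")  # → 2.99 TB/day, i.e. ~3 TB
```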
The partition key is conversation_id, which co-locates all messages in a conversation on the same node. The clustering key is message_id DESC, so reading recent messages is a single sequential disk read with no sorting.
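To make the access pattern concrete, here is a toy in-memory model of that layout (illustrative only, not driver code; Cassandra achieves the same ordering on disk via clustering keys):

```python
from collections import defaultdict

# Toy model: one partition per conversation_id, rows kept in
# message_id DESC order, mirroring the schema described above.
partitions: dict[int, list[dict]] = defaultdict(list)

def write_message(conversation_id: int, message_id: int, body: str) -> None:
    # Snowflake IDs are time-ordered, so a new message always sorts
    # first under message_id DESC.
    partitions[conversation_id].insert(0, {"message_id": message_id, "body": body})

def read_recent(conversation_id: int, limit: int = 50) -> list[dict]:
    # Equivalent of: SELECT ... WHERE conversation_id = ? LIMIT 50
    # -- one partition, one sequential scan, no sort step.
    return partitions[conversation_id][:limit]

for i in range(100):
    write_message(42, i, f"msg {i}")
recent = read_recent(42)
print(recent[0]["message_id"], len(recent))  # → 99 50
```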
"Cassandra's LSM-tree storage engine handles append-only writes at amortized constant cost, regardless of table size."
Why not MySQL? At 231K writes/sec, MySQL's B-tree indexes require random disk I/O for each insert.
Replication lag would grow to minutes. Cassandra's log-structured writes handle this natively.
Why not DynamoDB? DynamoDB could handle the throughput, but provisioned capacity pricing at 231K writes/sec is expensive (~$100K/month), and we lose control over partition placement.
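A rough sketch of where that estimate comes from, assuming 1 WCU per message (items under 1 KB) and roughly $0.00065 per provisioned WCU-hour, an assumed us-east-1 list price; actual costs vary by region and reserved-capacity discounts:

```python
# Back-of-envelope DynamoDB cost at a sustained 231K writes/sec.
# PRICE_PER_WCU_HOUR is an assumed list price -- check current AWS
# pricing before relying on this number.
WCU = 231_000
PRICE_PER_WCU_HOUR = 0.00065
HOURS_PER_MONTH = 730

monthly_cost = WCU * PRICE_PER_WCU_HOUR * HOURS_PER_MONTH
print(f"~${monthly_cost:,.0f}/month")  # roughly $110K/month
```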
With Cassandra, we tune replication factor, compaction strategy, and consistency level per query.

Per-message storage: message_id (8B, Snowflake format) + conversation_id (8B) + sender_id (8B) + body (100B avg) + created_at (8B) + status (1B) + metadata (17B) = ~150 bytes.
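The per-row arithmetic checks out, summing the fields listed above:

```python
# Per-message row size in bytes, summing the fields from the text.
fields = {
    "message_id": 8,      # Snowflake ID
    "conversation_id": 8,
    "sender_id": 8,
    "body": 100,          # average; real bodies vary
    "created_at": 8,
    "status": 1,
    "metadata": 17,
}
row_bytes = sum(fields.values())
print(row_bytes)  # → 150
```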
At RF=3, actual storage is 3 TB x 3 = 9 TB/day across the cluster. Trade-off: Cassandra is eventually consistent.
Two users reading the same conversation at the same instant might see slightly different message counts. We use LOCAL_QUORUM for writes (acknowledged by 2 of 3 replicas) and LOCAL_ONE for reads (fast, single replica), accepting a small window of inconsistency.
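The inconsistency window can be stated numerically: with replication factor N, a write acknowledged by W replicas and a read served by R replicas are guaranteed to overlap on an up-to-date replica only when W + R > N. A small helper showing the arithmetic (illustrative, not driver code):

```python
# Quorum arithmetic for the RF=3 configuration described above.
def quorum(rf: int) -> int:
    # A quorum is a strict majority of replicas.
    return rf // 2 + 1

RF = 3
W = quorum(RF)  # LOCAL_QUORUM writes -> acknowledged by 2 of 3 replicas
R = 1           # LOCAL_ONE reads -> served by a single replica

print(W)           # → 2
print(W + R > RF)  # → False: reads may briefly see stale data
```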