Message Storage (Cassandra)
At 231K messages/sec with 150 bytes per message, we write 3 TB per day. We chose Cassandra (not MySQL, not DynamoDB) for message storage because the access pattern is a perfect fit: writes are append-only (messages are immutable once sent), and reads are sequential within a conversation (load the last 50 messages).
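The 3 TB/day figure follows directly from the write rate and message size; a quick sanity check using the numbers above:

```python
# Sanity-check the daily write volume: 231K msgs/sec at 150 bytes each.
MSGS_PER_SEC = 231_000
BYTES_PER_MSG = 150
SECONDS_PER_DAY = 86_400

bytes_per_day = MSGS_PER_SEC * BYTES_PER_MSG * SECONDS_PER_DAY
tb_per_day = bytes_per_day / 1e12  # decimal terabytes

print(f"{tb_per_day:.2f} TB/day")  # → 2.99 TB/day, i.e. ~3 TB
```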
The partition key is conversation_id, which co-locates all messages in a conversation on the same node. The clustering key is message_id DESC, so reading recent messages is a single sequential disk read with no sorting.
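To make the access pattern concrete, here is a toy in-memory model of that layout (illustrative only, not driver code; Cassandra achieves the same ordering on disk via clustering keys):

```python
from collections import defaultdict

# Toy model: one partition per conversation_id, rows kept in
# message_id DESC order, mirroring the schema described above.
partitions: dict[int, list[dict]] = defaultdict(list)

def write_message(conversation_id: int, message_id: int, body: str) -> None:
    # Snowflake IDs are time-ordered, so a new message always sorts
    # first under message_id DESC.
    partitions[conversation_id].insert(0, {"message_id": message_id, "body": body})

def read_recent(conversation_id: int, limit: int = 50) -> list[dict]:
    # Equivalent of: SELECT ... WHERE conversation_id = ? LIMIT 50
    # -- one partition, one sequential scan, no sort step.
    return partitions[conversation_id][:limit]

for i in range(100):
    write_message(42, i, f"msg {i}")
recent = read_recent(42)
print(recent[0]["message_id"], len(recent))  # → 99 50
```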
"Cassandra's LSM-tree storage engine handles append-only writes at amortized constant cost, regardless of table size."
Why not MySQL? At 231K writes/sec, MySQL's B-tree indexes require random disk I/O for each insert.
Replication lag would grow to minutes. Cassandra's log-structured writes handle this natively.
Why not DynamoDB? DynamoDB could handle the throughput, but provisioned capacity pricing at 231K writes/sec is expensive (~$100K/month), and we lose control over partition placement.
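A rough sketch of where that estimate comes from, assuming 1 WCU per message (items under 1 KB) and roughly $0.00065 per provisioned WCU-hour, an assumed us-east-1 list price; actual costs vary by region and reserved-capacity discounts:

```python
# Back-of-envelope DynamoDB cost at a sustained 231K writes/sec.
# PRICE_PER_WCU_HOUR is an assumed list price -- check current AWS
# pricing before relying on this number.
WCU = 231_000
PRICE_PER_WCU_HOUR = 0.00065
HOURS_PER_MONTH = 730

monthly_cost = WCU * PRICE_PER_WCU_HOUR * HOURS_PER_MONTH
print(f"~${monthly_cost:,.0f}/month")  # roughly $110K/month
```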
With Cassandra, we tune replication factor, compaction strategy, and consistency level per query.

Per-message storage: message_id (8B, Snowflake format) + conversation_id (8B) + sender_id (8B) + body (100B avg) + created_at (8B) + status (1B) + metadata (17B) = ~150 bytes.
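The per-row arithmetic checks out, summing the fields listed above:

```python
# Per-message row size in bytes, summing the fields from the text.
fields = {
    "message_id": 8,      # Snowflake ID
    "conversation_id": 8,
    "sender_id": 8,
    "body": 100,          # average; real bodies vary
    "created_at": 8,
    "status": 1,
    "metadata": 17,
}
row_bytes = sum(fields.values())
print(row_bytes)  # → 150
```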
At RF=3, actual storage is 3 TB x 3 = 9 TB/day across the cluster. Trade-off: Cassandra is eventually consistent.
Two users reading the same conversation at the same instant might see slightly different message counts. We use LOCAL_QUORUM for writes (acknowledged by 2 of 3 replicas) and LOCAL_ONE for reads (fast, single replica), accepting a small window of inconsistency.
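The inconsistency window can be stated numerically: with replication factor N, a write acknowledged by W replicas and a read served by R replicas are guaranteed to overlap on an up-to-date replica only when W + R > N. A small helper showing the arithmetic (illustrative, not driver code):

```python
# Quorum arithmetic for the RF=3 configuration described above.
def quorum(rf: int) -> int:
    # A quorum is a strict majority of replicas.
    return rf // 2 + 1

RF = 3
W = quorum(RF)  # LOCAL_QUORUM writes -> acknowledged by 2 of 3 replicas
R = 1           # LOCAL_ONE reads -> served by a single replica

print(W)           # → 2
print(W + R > RF)  # → False: reads may briefly see stale data
```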