EASYwalkthrough
Social Graph
Who follows whom, and how do we look that up at fanout speed? The constraint: when a user publishes a post, our fanout service needs the complete follower list in under 10ms to begin cache writes.
The social graph stores every follow relationship as a directed edge in an adjacency list. User A follows User B means one edge from A to B.
“A SQL query scanning a billion-row follow table cannot meet that latency target.”
We chose a dual-storage pattern (not a single database) to satisfy both the speed and durability requirements. Redis Sets handle fast membership checks ('does User A follow User B?') and fast follower list retrieval for fanout. MySQL handles durable persistence and complex queries like 'find all mutual followers.' The Redis layer is a read-through cache of the MySQL source of truth.
Why do both layers exist? Because Redis alone risks data loss on restart (even with AOF, recovery takes minutes for billion-entry datasets), and MySQL alone cannot serve follower lookups at the 2 million queries per second our fanout layer demands.
The graph is partitioned by user ID across shards, so retrieving all followers of a single user hits exactly one shard, eliminating cross-shard joins. For bidirectional relationships like Facebook friendships, each connection creates two edges (A follows B and B follows A), doubling storage but making mutual-follow lookups a single set intersection.
Trade-off: we gave up storage efficiency (dual writes to Redis and MySQL) in exchange for sub-10ms lookup latency and full durability. What if the interviewer asks: 'Why not a graph database like Neo4j?' We chose Redis + MySQL (not Neo4j) because our access pattern is exclusively single-hop lookups (get followers of X), not multi-hop traversals.
Neo4j excels at 'friends of friends' queries, but we never need those in the fanout path, and it adds operational complexity for a pattern Redis handles natively.
Related concepts