EASYwalkthrough

Social Graph

4 of 8
3 related
Who follows whom, and how do we look that up at fanout speed? The constraint: when a user publishes a post, our fanout service needs the complete follower list in under 10ms to begin cache writes.
The social graph stores every follow relationship as a directed edge in an adjacency list. User A follows User B means one edge from A to B.
A SQL query scanning a billion-row follow table cannot meet that latency target.
We chose a dual-storage pattern (not a single database) to satisfy both the speed and durability requirements. Redis Sets handle fast O(1)O(1) membership checks ('does User A follow User B?') and fast O(N)O(N) follower list retrieval for fanout. MySQL handles durable persistence and complex queries like 'find all mutual followers.' The Redis layer is a read-through cache of the MySQL source of truth.
Why do both layers exist? Because Redis alone risks data loss on restart (even with AOF, recovery takes minutes for billion-entry datasets), and MySQL alone cannot serve follower lookups at the 2 million queries per second our fanout layer demands.
The graph is partitioned by user ID across shards, so retrieving all followers of a single user hits exactly one shard, eliminating cross-shard joins. For bidirectional relationships like Facebook friendships, each connection creates two edges (A follows B and B follows A), doubling storage but making mutual-follow lookups a single set intersection.
Trade-off: we gave up storage efficiency (dual writes to Redis and MySQL) in exchange for sub-10ms lookup latency and full durability. What if the interviewer asks: 'Why not a graph database like Neo4j?' We chose Redis + MySQL (not Neo4j) because our access pattern is exclusively single-hop lookups (get followers of X), not multi-hop traversals.
Neo4j excels at 'friends of friends' queries, but we never need those in the fanout path, and it adds operational complexity for a pattern Redis handles natively.
Why it matters in interviews
The social graph is the foundation of every feed system. Interviewers want to hear us justify the dual storage pattern and explain how follower list retrieval drives the fanout routing decision at publish time.
Related concepts