Twitter / News Feed

VERY COMMON

News feed design comes up at every FAANG company. It is how Twitter delivers 500 million tweets per day to 200 million users with under 5 seconds of delivery latency. You will solve the fanout problem, handle the celebrity edge case that broke Twitter in 2012, and design a timeline cache that fits 200 million users in 1.28 TB of Redis.

  • Design hybrid fanout that handles accounts with 200 followers as well as accounts with 50M
  • Size a timeline cache for 200M users at 1.28 TB
  • Avoid the celebrity fanout storm that crashes write pipelines
Google, Meta, Amazon, Twitter, LinkedIn, Netflix
Elevator Pitch (3-minute interview summary)

A Twitter feed serves 200M DAU and processes 500M tweets per day. Hybrid fanout: users under 10K followers get fanout-on-write, which pushes tweet IDs to follower timeline caches in Redis. Celebrities above 10K use fanout-on-read, and their tweets are merged in at request time. Each tweet is 1 KB with a Snowflake ID (41-bit timestamp, 10-bit machine, 12-bit sequence) for time-sorted ordering. The timeline cache holds 800 tweet IDs per user at 6.4 KB each, 1.28 TB in total. Fanout flows through Kafka so the POST returns in under 100 ms. Feed ranking combines recency with engagement signals.

Concepts Unlocked (8 concepts in this topic)

Fanout-on-Write (Push Model)

EASY

We chose to push the tweet ID into every follower's timeline cache at write time because it makes reads O(1). The cost we accept: N cache writes per tweet, where N = follower count.

Core Feature Design
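The push model above can be sketched in a few lines. This is a minimal illustration, not a production design: the `followers` and `timelines` dictionaries are in-memory stand-ins (assumptions) for the social graph and the Redis timeline cache, and the function names are hypothetical.

```python
from collections import defaultdict

TIMELINE_LIMIT = 800  # tweet IDs kept per user, matching the cache sizing in this topic

# In-memory stand-ins (assumptions): author -> follower IDs, user -> tweet IDs.
followers = defaultdict(set)
timelines = defaultdict(list)  # each list is kept newest first


def fanout_on_write(author_id, tweet_id):
    """Push tweet_id into every follower's timeline and return the write count."""
    writes = 0
    for follower_id in followers[author_id]:
        timeline = timelines[follower_id]
        timeline.insert(0, tweet_id)   # assumes tweets arrive in time order
        del timeline[TIMELINE_LIMIT:]  # trim to the per-user cache limit
        writes += 1
    return writes  # N writes, where N = follower count


def read_timeline(user_id, count=20):
    """A read is one lookup because the timeline is already materialized."""
    return timelines[user_id][:count]
```

The return value of `fanout_on_write` makes the trade-off visible: the read path does no merging at all, and the write path pays once per follower.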

Fanout-on-Read (Pull Model)

EASY

We chose pull-at-read-time for celebrities (not push) because a 30M-follower account would trigger 30M cache writes per tweet. Reads cost O(N) queries, where N here is the number of celebrity accounts the reader follows, but we avoid massive write amplification.

Core Feature Design
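One way to picture the pull model is as a k-way merge at request time. The sketch below assumes every input list already holds time-sortable IDs in newest-first order; `merge_timeline` and its parameters are hypothetical names, and fetching each celebrity's recent tweets (one query per followed celebrity) is left out.

```python
import heapq
from itertools import islice


def merge_timeline(cached_ids, celebrity_feeds, count=20):
    """Merge the precomputed timeline with celebrity tweets at read time.

    cached_ids: tweet IDs from the reader's timeline cache, newest first.
    celebrity_feeds: one newest-first ID list per followed celebrity.
    Because the IDs sort by time, merging on the ID value merges by time.
    """
    merged = heapq.merge(cached_ids, *celebrity_feeds, reverse=True)
    return list(islice(merged, count))  # stop once the page is full
```

For example, `merge_timeline([90, 70, 10], [[100, 60], [80]], count=4)` returns `[100, 90, 80, 70]`.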

Snowflake ID Generation

STANDARD

We chose Snowflake (not AUTO_INCREMENT or UUID): a 64-bit ID with 41 bits timestamp, 10 bits machine, 12 bits sequence, plus one unused sign bit. It generates up to about 4.19B IDs/sec across all 1,024 machines (4,096 IDs per millisecond each) with zero coordination and gives us ORDER BY id = ORDER BY time for free.

High Level Design
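The Snowflake bit layout can be sketched as follows. This is an illustrative generator under stated assumptions, not Twitter's implementation: `EPOCH_MS` is an arbitrary custom epoch, the injectable `clock` exists only to make the sketch testable, and the class and function names are hypothetical.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

# Theoretical ceiling: 1,024 machines x 4,096 IDs per ms x 1,000 ms.
MAX_IDS_PER_SEC = (1 << MACHINE_BITS) * (1 << SEQUENCE_BITS) * 1000  # 4_194_304_000


class SnowflakeGenerator:
    def __init__(self, machine_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.clock = clock  # returns the current time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4,096 IDs already issued this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            elapsed = now - EPOCH_MS
            if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
                raise OverflowError("timestamp does not fit in 41 bits")
            # Timestamp occupies the high bits, so sorting by ID sorts by time.
            return (
                (elapsed << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )


def decode(snowflake_id):
    """Split an ID back into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, machine_id, sequence
```

Each machine only needs its own `machine_id` and a clock, which is where the zero-coordination property comes from.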

Hybrid Fanout Model

STANDARD

We chose a threshold of 10K followers to split push vs pull. Below 10K: fanout-on-write for sub-10ms reads. Above 10K: fanout-on-read to cap write amplification. Trade-off: celebrity tweets appear 1-2 seconds later.

Core Feature Design
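The hybrid split comes down to one routing decision per tweet. The sketch below is a simplification with hypothetical names; the text does not say which side an account with exactly 10K followers falls on, so treating it as push is an assumption.

```python
CELEBRITY_THRESHOLD = 10_000  # follower count that splits push from pull


def fanout_strategy(follower_count):
    """Return "push" (fanout-on-write) or "pull" (fanout-on-read).

    Assumption: an account with exactly 10K followers is still pushed.
    """
    return "pull" if follower_count > CELEBRITY_THRESHOLD else "push"


def cache_writes_for_tweet(follower_count):
    """Cache writes triggered by one tweet under the hybrid model."""
    if fanout_strategy(follower_count) == "push":
        return follower_count  # one write per follower
    return 0  # celebrity tweets are merged in at read time instead
```

Under this rule a 200-follower account costs 200 writes per tweet, while a 30M-follower account costs none at write time.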

Timeline Cache (Redis Sorted Set)

STANDARD

We chose Redis sorted sets (not Memcached or MySQL) to store 800 tweet IDs per user because ZREVRANGE returns newest-first pages in under 10 ms. At 8 bytes per ID, that is 6.4 KB per user, or 1.28 TB for 200M users.

High Level Design
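The cache size follows from simple arithmetic, reproduced here as a check. It counts only the raw 8-byte IDs (a 64-bit Snowflake ID each); Redis sorted sets add per-entry overhead that is not modeled, so real memory use would be higher.

```python
IDS_PER_USER = 800
BYTES_PER_ID = 8       # one 64-bit Snowflake ID
USERS = 200_000_000

bytes_per_user = IDS_PER_USER * BYTES_PER_ID  # 6_400 bytes = 6.4 KB
total_bytes = bytes_per_user * USERS          # 1_280_000_000_000 bytes = 1.28 TB

print(f"{bytes_per_user / 1e3:.1f} KB per user, {total_bytes / 1e12:.2f} TB total")
```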

Write Amplification

STANDARD

We accept the cost: 1 tweet from a user with 200 followers = 200 cache writes. At 500M tweets/day and an average of 200 followers per author, that is 100 billion cache writes/day. This is the dominant bottleneck, which is why we cap fanout at 10K followers.

Fault Tolerance
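The write-amplification figure can be checked with the same kind of back-of-the-envelope arithmetic. The 200-follower average is taken from the example in the text and applied to every tweet, which is a simplifying assumption; the per-second rate is a derived average, not a peak.

```python
TWEETS_PER_DAY = 500_000_000
AVG_FOLLOWERS = 200        # assumed average per tweeting account
SECONDS_PER_DAY = 86_400

writes_per_day = TWEETS_PER_DAY * AVG_FOLLOWERS       # 100_000_000_000
writes_per_second = writes_per_day / SECONDS_PER_DAY  # roughly 1.16 million

# One tweet from a 30M-follower account, if it were pushed, would add
# 30M writes on its own, which is what the 10K cap avoids.
uncapped_celebrity_writes = 30_000_000

print(f"{writes_per_day:,} writes/day, about {writes_per_second:,.0f} writes/sec")
```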

Social Graph (Redis + MySQL)

EASY

We chose two stores (not one) because neither handles both speed and durability. Redis SETs give us O(1) membership checks and fast follower lookups. A MySQL follows table provides durability and supports analytics. The two are kept in sync via Kafka events.

Database Schema

Cursor-Based Pagination

TRICKY

We chose cursor-based pagination (not OFFSET) using the last seen Snowflake ID as cursor. Each page costs O(log N + M), where N is the timeline size and M is the page size, regardless of page depth, unlike OFFSET pagination, which degrades linearly. Trade-off: we cannot jump to page N directly.

System APIs
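Cursor-based pagination can be sketched with a binary search over a sorted list of IDs. The list here is an in-memory stand-in (assumption) for the timeline store, kept in ascending ID order, and `page_before` is a hypothetical name. The binary search is the O(log N) part and the slice is the O(M) part.

```python
import bisect


def page_before(ids_asc, cursor=None, limit=20):
    """Return (page, next_cursor) for tweets strictly older than cursor.

    ids_asc: time-sortable tweet IDs in ascending order.
    cursor: the last ID the client saw, or None for the first page.
    The page is returned newest first.
    """
    end = len(ids_asc) if cursor is None else bisect.bisect_left(ids_asc, cursor)
    start = max(0, end - limit)
    page = ids_asc[start:end][::-1]
    next_cursor = page[-1] if start > 0 else None  # None means no older tweets
    return page, next_cursor
```

A client walks the timeline by feeding each `next_cursor` back in. With `ids = [10, 20, 30, 40, 50]` and `limit=2`, the pages are `[50, 40]`, `[30, 20]`, then `[10]` with a `None` cursor.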