
News Feed Anti-Patterns

Common design mistakes candidates make. Learn what goes wrong and how to avoid each trap in your interview.

Pure Fanout-on-Write for ALL Users

Very Common

We chose a hybrid model (not pure push) because pushing every tweet to every follower's cache without a celebrity threshold causes catastrophic write amplification for high-follower accounts. Trade-off: the hybrid model adds read-time merge complexity, but it caps maximum write amplification at 10K per tweet.

Why: Fanout-on-write is elegant and makes reads instant. Candidates pick it because the read path is straightforward: fetch the cache. They test with small follower counts (100-500) and it works fine. Then they forget about users with 10M+ followers. One tweet from a celebrity triggers 10M cache writes, which takes minutes to drain and delays every other user's fanout in the queue.

WRONG: Apply fanout-on-write to every user uniformly. A celebrity with 30M followers tweets, and the fanout service queues 30M cache inserts. At 100K writes/sec, that is 5 minutes of queue time. Every other user's tweet delivery stalls behind this backlog.
RIGHT: We use a hybrid model. Users below 10K followers use fanout-on-write. Users above the threshold use fanout-on-read: their tweets are fetched and merged at read time. This caps the maximum write amplification per tweet at 10K.
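The threshold check is a few lines of code. A minimal Python sketch, assuming the 10K cutoff from the design above and a plain dict standing in for the timeline cache (names are illustrative):

```python
CELEBRITY_THRESHOLD = 10_000  # fanout-on-write cutoff from the hybrid design

def choose_fanout_strategy(follower_count: int) -> str:
    """Route a new tweet to push (fanout-on-write) or pull (fanout-on-read)."""
    return "fanout_on_write" if follower_count < CELEBRITY_THRESHOLD else "fanout_on_read"

def fanout(tweet_id: int, follower_ids: list[int], timelines: dict[int, list[int]]) -> int:
    """Push tweet_id into each follower's cached timeline; return writes performed.

    Celebrity tweets are skipped here and merged in at read time instead,
    which caps write amplification at CELEBRITY_THRESHOLD per tweet.
    """
    if len(follower_ids) >= CELEBRITY_THRESHOLD:
        return 0  # served via fanout-on-read
    for follower_id in follower_ids:
        timelines.setdefault(follower_id, []).append(tweet_id)
    return len(follower_ids)
```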

Pure Fanout-on-Read for ALL Users

Very Common

We chose fanout-on-write for the majority of users (not pure pull) because building the timeline on every read by querying each followed user's tweets turns a GET into a scatter-gather across hundreds of sources. Trade-off: we accept write amplification for normal users to keep reads at O(1).

Why: Fanout-on-read avoids write amplification entirely. Candidates choose it because the write path is a single INSERT. But at read time, a user following 500 accounts requires 500 queries to fetch recent tweets, then a merge-sort. Even with caching, cold timelines take 500+ ms to build. At 300M timeline reads/day, that is 150 billion sub-queries per day.

WRONG: Every timeline request triggers a pull from all followees. A user following 500 accounts fires 500 queries, fetches 5,000 tweets, and sorts them. Response time: 800ms. At 300M reads/day, the backend drowns in 150B sub-queries.
RIGHT: We pre-compute timelines via fanout-on-write for the majority of users (those with <10K followers). The timeline cache is ready before the user opens the app. Read latency drops to sub-50ms: one Redis ZREVRANGE call.
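At read time, the hybrid model still merges the pre-built timeline with tweets pulled from followed celebrities. A minimal sketch, assuming each feed is a list of (tweet_id, timestamp) pairs already sorted newest-first:

```python
import heapq
import itertools

def build_timeline(precomputed: list[tuple[int, int]],
                   celebrity_feeds: list[list[tuple[int, int]]],
                   page_size: int = 20) -> list[int]:
    """Merge the user's pre-built timeline with celebrity tweets fetched at
    read time. Every input is sorted newest-first, so a k-way merge on the
    negated timestamp yields one newest-first page without a full sort."""
    merged = heapq.merge(precomputed, *celebrity_feeds, key=lambda pair: -pair[1])
    return [tweet_id for tweet_id, _ in itertools.islice(merged, page_size)]
```

The merge touches only `page_size` items rather than sorting every followee's history, which is why the read-time cost of the hybrid model stays small.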

Full Tweet Objects in Timeline Cache

Very Common

We chose to store tweet IDs only (not full objects) in the timeline cache because full objects waste 125x more memory. Trade-off: we pay one extra round-trip (~2ms) for batch hydration on read, but we save 125x memory.

Why: It feels faster: skip the hydration step and serve tweets directly from the timeline cache. But a tweet object is ~1KB. With 800 tweets per user and 200M users, that is 200M × 800 × 1KB = 160 TB. Storing 8-byte IDs: 200M × 800 × 8B = 1.28 TB. The 125x difference is the gap between a 160-node Redis cluster and a 2-node cluster.

WRONG: Store full tweet objects (1KB each) in the timeline sorted set. Memory balloons to 160 TB for 200M users. Every fanout write pushes 1KB instead of 8B, increasing network bandwidth by 125x.
RIGHT: We store only tweet IDs (8 bytes each) in the timeline cache. On read, we batch-hydrate the top N IDs from a separate tweet cache using MGET. One extra round-trip (~2ms) saves 125x memory.
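A sketch of the ID-only read path, with a plain dict standing in for the separate Redis tweet cache (in production, the batch lookup below is a single MGET):

```python
def read_timeline(timeline_ids: list[int], tweet_cache: dict[int, dict],
                  n: int = 20) -> list[dict]:
    """Hydrate the top-N tweet IDs from the timeline cache in one batch lookup.

    Missing entries (evicted or deleted tweets) are silently dropped, which is
    the usual behavior for an ID-based timeline cache.
    """
    top_ids = timeline_ids[:n]
    hydrated = [tweet_cache.get(tweet_id) for tweet_id in top_ids]  # one MGET round-trip
    return [tweet for tweet in hydrated if tweet is not None]
```

Deleted tweets disappearing from timelines for free is a side benefit: the ID simply stops resolving, with no need to scrub 10K cached copies.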

Auto-Increment IDs Instead of Snowflake

Very Common

We chose Snowflake IDs (not AUTO_INCREMENT) because auto-increment creates a single-point bottleneck across shards and loses time-based sorting. Trade-off: Snowflake IDs leak creation time from the embedded timestamp, which may be a privacy concern.

Why: AUTO_INCREMENT is the default. It works on a single MySQL server. But with 64 shards, each shard generates its own sequence, so tweets on different shards can both get ID 1,000,001. You need a central coordinator (single point of failure) or odd/even offset tricks (brittle when you add shards). Meanwhile, you lose the ability to sort by ID and get chronological order for free.

WRONG: Use AUTO_INCREMENT across 64 MySQL shards. Duplicate IDs collide across shards. Add a central ticket server to hand out unique IDs. That server becomes a bottleneck at 500M tweets/day and a single point of failure.
RIGHT: We use Snowflake IDs: 41-bit timestamp + 10-bit machine ID + 12-bit sequence. Each shard generates globally unique, time-sorted IDs independently. No coordination, no collisions, 4.19B IDs/sec capacity.
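The 41/10/12 bit layout is straightforward to implement. A minimal single-process sketch (the epoch constant is Twitter's published one; class and method names are illustrative):

```python
import threading
import time

class Snowflake:
    """64-bit IDs: 41-bit ms timestamp | 10-bit machine ID | 12-bit sequence."""
    EPOCH = 1288834974657  # Twitter's custom epoch, 2010-11-04 (milliseconds)

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 4096 IDs per ms
                if self.sequence == 0:  # millisecond exhausted: spin to the next one
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.machine_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, sorting by ID sorts by creation time, which is exactly what the timeline cache needs for its score.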

No Cache for Timeline

Common

We chose to pre-build timelines in Redis (not query the database on every read) because 300M daily reads would crush the database. Trade-off: we accept the write amplification cost and memory overhead of maintaining 200M timeline caches to keep reads at sub-10ms.

Why: Candidates sometimes skip caching, assuming the database can handle the load. But building a timeline from MySQL requires joining the follows table with the tweets table, filtering by 500 followee IDs, sorting by timestamp, and paginating. That is a multi-join query across shards. At 300M reads/day (3,472 QPS average, 10K+ peak), this query runs against hot tables with billions of rows.

WRONG: Every GET /timeline runs a multi-shard query: find the user's 500 followees, query each shard for recent tweets, merge-sort across shards. Latency: 500-800ms. Database CPU at 90% during peak hours.
RIGHT: We pre-build timelines into Redis sorted sets via fanout-on-write. GET /timeline becomes a single ZREVRANGE call: O(log N + M), where M is the page size. Latency: sub-10ms. Database serves only cache misses and writes.
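A toy in-memory stand-in for one user's sorted set makes the read path concrete (production uses Redis ZADD/ZREVRANGE; the 800-entry cap matches the sizing above):

```python
import bisect

class TimelineCache:
    """In-memory stand-in for one user's Redis sorted set. Snowflake IDs are
    time-ordered, so the tweet ID itself serves as the score."""

    def __init__(self, cap: int = 800):
        self._ids: list[int] = []  # kept in ascending order
        self.cap = cap

    def zadd(self, tweet_id: int) -> None:
        bisect.insort(self._ids, tweet_id)  # O(log N) search, then insert
        if len(self._ids) > self.cap:
            self._ids.pop(0)  # trim the oldest entry, bounding memory per user

    def zrevrange(self, start: int, stop: int) -> list[int]:
        newest_first = self._ids[::-1]
        return newest_first[start:stop + 1]  # Redis-style inclusive stop
```

The cap is what keeps the cluster at 1.28 TB: each user's set is trimmed on write, so a single user can never grow the cache unboundedly.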

Sharding by tweet_id Instead of user_id

Common

We chose to shard by user_id (not tweet_id) because the most common access pattern is 'get all tweets by user X'. Trade-off: we accept uneven shard sizes (celebrity shards are larger), mitigated with read replicas.

Why: Hash-based sharding on tweet_id gives perfect data distribution. Each shard holds exactly 1/N of all tweets. But the most common access pattern is 'get all tweets by user X' (profile page, fanout reads). With tweet_id sharding, user X's tweets are scattered across all 64 shards. Every profile view fans out to all 64 shards, waits for the slowest one, and merges results.

WRONG: Shard by tweet_id for even distribution. A profile page query for one user fans out to all 64 shards. Each shard scans its index for that user_id. Latency is bounded by the slowest shard: p99 hits 50ms. At 100M profile views/day, that is 6.4 billion shard queries.
RIGHT: We shard by user_id. All of a user's tweets live on one shard. Profile queries hit one shard with a range scan on (user_id, created_at). Trade-off: hot users (celebrities) create a hot shard. We mitigate with a read replica for heavy-read shards.
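The routing rule itself is one line. A sketch, assuming 64 shards and a simple modulo scheme (a hash function would be substituted for non-uniform ID spaces):

```python
NUM_SHARDS = 64

def shard_for_user(user_id: int) -> int:
    """All of a user's tweets land on one shard, so a profile read becomes a
    single-shard range scan on (user_id, created_at) instead of a 64-way
    scatter-gather."""
    return user_id % NUM_SHARDS
```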

Synchronous Fanout Blocking POST Response

Common

We chose async fanout via Kafka (not synchronous writes in the POST handler) because synchronous fanout makes API response time proportional to follower count. Trade-off: followers see the tweet 2-5 seconds later, but the author gets an instant response.

Why: It is the straightforward implementation: inside the POST handler, loop through followers and write to each cache. With 200 followers, that is 200 Redis writes at 0.5ms each = 100ms added to the API response. Seems fine. But a user with 5,000 followers now waits 2.5 seconds for their tweet to post. The HTTP connection times out at 30 seconds for users with 60K+ followers.

WRONG: The POST /tweets handler loops through the user's follower list and writes to each timeline cache synchronously. A user with 5,000 followers waits 2.5 seconds for the API to respond. HTTP timeout errors for users above 60K followers.
RIGHT: POST /tweets writes the tweet to the tweets table and publishes a message to a Kafka topic. We return 201 Created immediately (under 50ms). A separate fanout service consumes the message and writes to follower timeline caches asynchronously. The user sees their tweet instantly; followers see it within 2-5 seconds.
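The decoupling can be sketched with an in-process queue standing in for the Kafka topic (the worker here drains synchronously for clarity; in production it is a separate consumer service running continuously):

```python
import queue

fanout_queue: queue.Queue = queue.Queue()   # stand-in for the Kafka topic
timelines: dict[int, list[int]] = {}        # stand-in for follower timeline caches

def post_tweet(tweet_id: int, follower_ids: list[int]) -> int:
    """API handler: persist the tweet (omitted), enqueue fanout, respond."""
    fanout_queue.put((tweet_id, follower_ids))
    return 201  # the author gets an instant response, before any fanout write

def run_fanout_worker() -> None:
    """Consumer: drains queued tweets and writes each follower's timeline."""
    while not fanout_queue.empty():
        tweet_id, follower_ids = fanout_queue.get()
        for follower_id in follower_ids:
            timelines.setdefault(follower_id, []).insert(0, tweet_id)
```

The key property is visible in the sketch: `post_tweet` returns before a single timeline write happens, so response time no longer depends on follower count.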

No Rate Limiting on Tweet Creation

Common

We rate-limit tweet creation at the API gateway (not leaving it unlimited) because each tweet triggers fanout to all followers. A bot posting 1,000 tweets/minute with 10K followers generates 1,000 × 10,000 = 10M cache writes per minute. Trade-off: legitimate power users hit the rate limit occasionally, but we protect the fanout queue from flood attacks.

Why: Rate limiting feels like a separate concern, so candidates skip it in the core design. But each tweet triggers a fanout to all followers. A bot posting 1,000 tweets/minute with 10K followers generates 1,000 × 10,000 = 10M cache writes per minute. That is 167K extra writes/sec, enough to spike the fanout queue latency for all users.

WRONG: No rate limit on POST /tweets. A bot posts 1,000 tweets/min. Each triggers fanout to 10K followers. The fanout queue backs up with 10M writes/min. Legitimate users' tweets are delayed by minutes.
RIGHT: We rate limit tweet creation: 300 tweets/day per user, 15 tweets per 15-minute window. We apply the limit at the API gateway before the request reaches the tweet service. Return 429 Too Many Requests with a Retry-After header when the limit is hit.