
Instagram / Photo Sharing

VERY COMMON

Instagram-style photo sharing is asked at Meta, Google, and Amazon because it tests photo storage pipelines, CDN delivery, and feed generation at planetary scale: Instagram delivers 1.3 billion photos per day to 500 million daily users with sub-second feed load times. You will design a multi-resolution image processing pipeline, a hybrid fanout system that handles the celebrity edge case, and a PostgreSQL sharding strategy with timestamp-based 64-bit ID generation.

  • Design a photo upload pipeline with multi-resolution image processing
  • Build hybrid fanout that handles 50M-follower celebrity accounts
  • Implement PostgreSQL sharding with timestamp-based 64-bit IDs
Meta · Google · Amazon · Microsoft · Netflix · Uber
8 concepts (deep dives) · 10 cheat items (quick ref)
Elevator Pitch (3-minute interview summary)

Photo sharing for 500M DAU handling 200M uploads/day. Each original lands in S3 at ~3 MB and is resized into 4 variants totaling 4.1 MB, adding 820 TB/day of new storage. Hybrid fanout: accounts under 10K followers use fanout-on-write, pushing photo IDs to Redis timeline caches at 4 KB/user (2 TB total); celebrities use fanout-on-read. PostgreSQL generates 64-bit IDs (41-bit timestamp + 13-bit shard + 10-bit sequence) inside a PL/pgSQL function. The CDN absorbs 350K reads/sec at peak with a 95%+ cache hit ratio, and likes run at 146K/sec via Redis INCR.
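The storage and cache figures in the pitch follow from simple arithmetic. A back-of-envelope check (numbers from the pitch; variable names are illustrative):

```python
UPLOADS_PER_DAY = 200_000_000
VARIANT_SET_MB = 4.1            # four resized variants per photo
DAU = 500_000_000
TIMELINE_KB_PER_USER = 4        # cached photo-ID list per user

new_storage_tb_per_day = UPLOADS_PER_DAY * VARIANT_SET_MB / 1_000_000   # MB -> TB
timeline_cache_tb = DAU * TIMELINE_KB_PER_USER / 1_000_000_000          # KB -> TB

print(round(new_storage_tb_per_day))  # new variant storage, TB/day
print(round(timeline_cache_tb))       # Redis timeline cache, TB
```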

Concepts Unlocked (8 concepts in this topic)

Hybrid Fanout

TRICKY

We chose hybrid fanout (not pure push or pure pull) with a 10K follower threshold. Push for 99% of users gives instant delivery. Pull for celebrities caps write amplification at 100K writes/sec instead of 50M.
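The publish path can be sketched as a single threshold check. This is a minimal illustration, not the production code path; `timelines` stands in for the per-user Redis caches and all names are invented:

```python
FANOUT_THRESHOLD = 10_000  # follower count above which we stop pushing

def fanout_strategy(follower_count):
    # Push (fanout-on-write) for ~99% of accounts, pull (fanout-on-read) for celebrities.
    return "push" if follower_count < FANOUT_THRESHOLD else "pull"

def publish(author, photo_id, followers, timelines, celebrity_posts):
    if fanout_strategy(len(followers)) == "push":
        for f in followers:                        # write amplification = follower count
            timelines.setdefault(f, []).append(photo_id)
    else:
        celebrity_posts.setdefault(author, []).append(photo_id)  # merged in at read time
```

On the read path, a user's cached timeline is merged with recent posts from any celebrities they follow, which is what caps peak write amplification.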

Core Feature Design

Photo Storage & CDN

EASY

We store originals in S3 (not self-managed HDFS) and deliver via CDN with 1-year TTL. 95%+ cache hit ratio absorbs 350K reads/sec because photos are immutable after upload.
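Immutability is what makes the 1-year TTL safe: a variant's URL never serves different bytes, so caches never need invalidation. A hedged sketch of the delivery contract (the hostname is a placeholder, not a real endpoint):

```python
ONE_YEAR_SECONDS = 31_536_000

def cdn_response_headers():
    # `immutable` tells clients they may skip revalidation entirely.
    return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable"}

def variant_url(photo_id, size):
    # New uploads always get new IDs, so a URL's content can never change.
    return f"https://cdn.example.com/photos/{photo_id}/{size}.jpg"
```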

High Level Design

Instagram 64-bit ID Generation

STANDARD

We chose Instagram's 64-bit ID (not UUID v4, not Snowflake) because it is time-sortable, half the bytes, and generated inside PostgreSQL with zero external coordination.
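The bit layout can be sketched in Python (the real generator runs as PL/pgSQL inside each shard; the epoch constant below is an illustrative fixed past instant, not Instagram's actual value):

```python
import time

TS_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10
EPOCH_MS = 1_293_840_000_000   # 2011-01-01T00:00:00Z, chosen arbitrarily

def make_id(shard_id, seq, now_ms=None):
    """41-bit ms timestamp | 13-bit logical shard | 10-bit per-shard sequence."""
    ts = (now_ms or int(time.time() * 1000)) - EPOCH_MS
    return (ts << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | (seq & 0x3FF)

def decode(photo_id):
    return (photo_id >> 23, (photo_id >> SEQ_BITS) & 0x1FFF, photo_id & 0x3FF)
```

Because the timestamp occupies the high bits, plain integer comparison sorts IDs chronologically, which is what makes the timeline cache's range queries work.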

High Level Design

Timeline Cache (Redis Sorted Set)

STANDARD

We chose Redis sorted sets (not Memcached, not plain lists) because we need O(log N) range queries for chronological pagination. 500 photo IDs per user, 4 KB each, 2 TB total.
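The cache's behavior can be shown with an in-memory stand-in (a sketch: real code would issue ZADD, ZREVRANGE, and ZREMRANGEBYRANK against Redis; the cap of 500 IDs is from the figures above):

```python
import bisect

TIMELINE_CAP = 500   # newest photo IDs kept per user

class Timeline:
    def __init__(self):
        self._ids = []                       # sorted ascending; photo IDs are time-sortable

    def add(self, photo_id):
        bisect.insort(self._ids, photo_id)   # ZADD equivalent, O(log N)
        if len(self._ids) > TIMELINE_CAP:
            self._ids.pop(0)                 # trim oldest, like ZREMRANGEBYRANK

    def page(self, offset, limit):
        """Newest-first pagination, like ZREVRANGE offset offset+limit-1."""
        return self._ids[::-1][offset:offset + limit]
```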

High Level Design

Multi-Resolution Image Pipeline

EASY

We pre-generate 4 sizes (not dynamic resizing) because the read path is 50x hotter than writes. Thumbnail first for instant grid display. libvips over ImageMagick for 4-8x speed.
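Variant planning can be sketched as below. The four target widths are assumptions for illustration; the text only fixes the count (4) and the thumbnail-first order:

```python
VARIANTS = [("thumb", 150), ("small", 320), ("medium", 640), ("large", 1080)]

def plan_variants(orig_w, orig_h):
    """Emit (name, w, h) resize jobs, thumbnail first for instant grid display."""
    jobs = []
    for name, target_w in VARIANTS:
        w = min(target_w, orig_w)            # never upscale
        h = round(orig_h * w / orig_w)       # preserve aspect ratio
        jobs.append((name, w, h))
    return jobs
```

Each job would then be handed to a libvips worker; ordering the list smallest-first means the grid thumbnail is ready before the larger variants finish.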

High Level Design

Cassandra Follower Graph

STANDARD

We chose Cassandra (not MySQL) for 100B follow edges because LSM-tree writes stay O(1) regardless of table size. Two denormalized tables: one partitioned by followee for fanout, one by follower for profile pages.
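A sketch of the two denormalized views (in CQL these are separate tables with different partition keys; Python dicts stand in here, and all names are illustrative):

```python
from collections import defaultdict

class FollowGraph:
    def __init__(self):
        self.followers_of = defaultdict(set)   # partitioned by followee: drives fanout
        self.following_of = defaultdict(set)   # partitioned by follower: drives profiles

    def follow(self, follower, followee):
        # Each edge is written twice -- the cost of partition-local reads.
        self.followers_of[followee].add(follower)
        self.following_of[follower].add(followee)

    def unfollow(self, follower, followee):
        self.followers_of[followee].discard(follower)
        self.following_of[follower].discard(followee)
```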

Database Schema

Feed Ranking

TRICKY

We rank feeds by 6 signals (not pure chronological) because ranked feeds increased time spent by 40%. ML scoring in under 50ms applied at read time on top of the chronological cache.

Core Feature Design

Logical vs Physical Sharding

STANDARD

We chose logical sharding (not physical hash-mod-N) with 8,192 logical shards on ~30 physical servers. Move entire shards to new hardware without row-level resharding.
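The indirection can be sketched in a few lines. The contiguous-range placement below is one simple choice among several; the key property is that only the logical-to-physical mapping changes when hardware is added:

```python
N_LOGICAL = 8192     # fixed forever: photo IDs embed the logical shard number

def logical_shard(user_id):
    return user_id % N_LOGICAL

def physical_server(lshard, n_physical=30):
    # Rebalancing moves whole logical shards between servers, never single rows.
    return lshard * n_physical // N_LOGICAL
```

Doubling the fleet just changes `n_physical`; every row keeps its logical shard, so no ID ever needs rewriting.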

Database Schema