Instagram / Photo Sharing
VERY COMMON — Instagram photo sharing is asked at Meta, Google, and Amazon because it tests photo storage pipelines, CDN delivery, and feed generation at planetary scale. The real system delivers 1.3 billion photos per day to 500 million daily users with sub-second feed load times. You will design a multi-resolution image processing pipeline, a hybrid fanout system that handles the celebrity edge case, and a PostgreSQL sharding strategy with timestamp-based 64-bit ID generation.
- Design a photo upload pipeline with multi-resolution image processing
- Build hybrid fanout that handles 50M-follower celebrity accounts
- Implement PostgreSQL sharding with timestamp-based 64-bit IDs
Visual Solutions
Step-by-step animated walkthroughs with capacity estimation, API design, database schema, and failure modes built in.
Cheat Sheet
Key concepts, trade-offs, and quick-reference notes for Photo Sharing. Everything you need at a glance.
Anti-Patterns
Common design mistakes candidates make. Wrong approaches vs correct approaches for each trap.
Failure Modes
What breaks in production, how to detect it, and how to fix it. Detection metrics, mitigations, and severity ratings.
Start simple. Build to staff-level.
“Photo sharing for 500M DAU processing 200M uploads/day. Each photo in S3 at 3 MB, resized into 4 variants totaling 4.1 MB, 820 TB/day new storage. Hybrid fanout: under 10K followers get fanout-on-write pushing IDs to Redis caches at 4 KB/user, 2 TB total. Celebrities use fanout-on-read. PostgreSQL generates 64-bit IDs (41-bit timestamp + 13-bit shard + 10-bit sequence) inside PL/pgSQL. CDN delivers at 350K reads/sec peak with 95%+ cache hit ratio. Likes at 146K/sec via Redis INCR.”
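The headline numbers in that pitch are worth being able to reproduce on a whiteboard. A quick sanity check in Python, using decimal units and counting only the 4.1 MB of variants per upload, as quoted:

```python
KB, MB, TB = 10**3, 10**6, 10**12   # decimal units, as used in the pitch

uploads_per_day = 200_000_000
variant_bytes = 4_100_000           # 4 resized variants, ~4.1 MB per photo
new_storage_tb_per_day = uploads_per_day * variant_bytes / TB
print(new_storage_tb_per_day)       # 820.0 TB/day of new storage

daily_users = 500_000_000
timeline_bytes = 4 * KB             # cached photo IDs per user
timeline_cache_tb = daily_users * timeline_bytes / TB
print(timeline_cache_tb)            # 2.0 TB of Redis timeline cache
```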
Hybrid Fanout
TRICKY — We chose hybrid fanout (not pure push or pure pull) with a 10K-follower threshold. Push for 99% of users gives instant delivery. Pull for celebrities caps write amplification at 100K writes/sec instead of 50M.
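A minimal sketch of both halves of the hybrid: the write-path routing decision (the 10K threshold comes from the card above) and the read-path merge, which is trivial precisely because the photo IDs are time-sortable:

```python
import heapq

FANOUT_THRESHOLD = 10_000   # follower count above which we stop pushing

def route_fanout(follower_count: int) -> str:
    # Push (fanout-on-write) for the ~99% of accounts below the threshold;
    # pull (fanout-on-read) for celebrities, capping write amplification.
    return "push" if follower_count < FANOUT_THRESHOLD else "pull"

def read_timeline(cached_ids, celebrity_post_ids, limit=20):
    # Merge the precomputed push timeline with celebrity posts fetched at
    # read time; time-sortable IDs mean descending ID order == recency order.
    merged = heapq.merge(sorted(cached_ids, reverse=True),
                         sorted(celebrity_post_ids, reverse=True),
                         reverse=True)
    return list(merged)[:limit]
```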
Core Feature Design

Photo Storage & CDN
EASY — We store originals in S3 (not self-managed HDFS) and deliver via CDN with 1-year TTL. 95%+ cache hit ratio absorbs 350K reads/sec because photos are immutable after upload.
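Immutability is what makes the aggressive caching safe. A sketch of the cache headers and the origin-load arithmetic; the header value is standard HTTP, and the traffic numbers come from the card above:

```python
ONE_YEAR = 365 * 24 * 3600   # 31,536,000 seconds

def photo_cache_headers() -> dict:
    # Photos never change after upload, so every variant URL can be cached
    # for a year and marked immutable (browsers skip revalidation entirely).
    return {"Cache-Control": f"public, max-age={ONE_YEAR}, immutable"}

peak_reads_per_sec = 350_000
cache_hit_ratio = 0.95
origin_reads_per_sec = round(peak_reads_per_sec * (1 - cache_hit_ratio))
print(origin_reads_per_sec)   # 17500 requests/sec actually reach S3
```

Only the 5% of misses touch the origin, which is why 350K reads/sec at the edge turns into a very manageable load on S3.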
High Level Design

Instagram 64-bit ID Generation
STANDARD — We chose Instagram's 64-bit ID (not UUID v4, not Snowflake) because it is time-sortable, half the bytes, and generated inside PostgreSQL with zero external coordination.
High Level Design

Timeline Cache (Redis Sorted Set)
STANDARD — We chose Redis sorted sets (not Memcached, not plain lists) because we need O(log N) range queries for chronological pagination. 500 photo IDs per user, 4 KB each, 2 TB total.
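The semantics are easy to model in memory. Below is a stand-in sketch for the relevant Redis commands (ZADD, ZREVRANGEBYSCORE, ZREMRANGEBYRANK), using the photo ID itself as the score since it is time-sortable:

```python
import bisect

TIMELINE_CAP = 500   # photo IDs kept per user, as quoted above

class Timeline:
    """In-memory stand-in for one user's Redis sorted set."""

    def __init__(self):
        self._ids = []   # kept in ascending order, like sorted-set scores

    def push(self, photo_id: int):
        # ~ ZADD timeline:{user} photo_id photo_id
        bisect.insort(self._ids, photo_id)
        if len(self._ids) > TIMELINE_CAP:
            # ~ ZREMRANGEBYRANK: evict the oldest entry beyond the cap
            self._ids.pop(0)

    def page(self, before_id=None, limit=20):
        # ~ ZREVRANGEBYSCORE timeline:{user} (before_id -inf LIMIT 0 limit
        hi = len(self._ids) if before_id is None else bisect.bisect_left(self._ids, before_id)
        return list(reversed(self._ids[max(0, hi - limit):hi]))
```

The cursor-style `before_id` argument gives stable pagination: each page hands the client the smallest ID it saw, and the next call resumes strictly below it.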
High Level Design

Multi-Resolution Image Pipeline
EASY — We pre-generate 4 sizes (not dynamic resizing) because the read path is 50x hotter than writes. Thumbnail first for instant grid display. libvips over ImageMagick for 4-8x speed.
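A sketch of the variant planning step. The four widths below are hypothetical — the card only fixes the count at four and that the thumbnail comes first:

```python
# Hypothetical target widths; thumbnail first, since it unblocks the grid UI.
VARIANT_WIDTHS = (150, 320, 640, 1080)

def variant_dimensions(width: int, height: int):
    """Target (w, h) for each pre-generated size, preserving aspect ratio."""
    out = []
    for target_w in VARIANT_WIDTHS:
        w = min(target_w, width)         # never upscale a small original
        h = round(height * w / width)
        out.append((w, h))
    return out
```

In the real pipeline each (w, h) pair becomes one resize job (libvips in production), and the thumbnail job is enqueued ahead of the larger sizes.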
High Level Design

Cassandra Follower Graph
STANDARD — We chose Cassandra (not MySQL) for 100B follow edges because LSM-tree writes stay fast regardless of table size. Two denormalized tables: partition by followee for fanout, by follower for profile.
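The dual-table write pattern in miniature, with Python dicts standing in for the two Cassandra tables (in production the two writes would typically be issued together):

```python
from collections import defaultdict

# Two denormalized views of the same edge, one per query pattern.
followers_of = defaultdict(set)   # partition key: followee -> drives fanout
following_of = defaultdict(set)   # partition key: follower -> drives profile

def follow(follower: int, followee: int):
    # The edge is written twice; append-only LSM-tree writes keep this
    # cheap no matter how large the tables grow.
    followers_of[followee].add(follower)
    following_of[follower].add(followee)

follow(1, 99)
follow(2, 99)
print(followers_of[99])   # who to fan out to when user 99 posts
print(following_of[1])    # user 1's "following" list for their profile
```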
Database Schema

Feed Ranking
TRICKY — We rank feeds by 6 signals (not pure chronological) because ranked feeds increased time spent by 40%. ML scoring in under 50 ms, applied at read time on top of the chronological cache.
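A read-time scoring sketch. The six signal names and weights below are illustrative placeholders, not the real model — the card only fixes the signal count and the latency budget:

```python
# Hypothetical signals and weights for illustration only.
WEIGHTS = {"recency": 0.30, "affinity": 0.25, "engagement": 0.20,
           "media_type": 0.10, "session_gap": 0.10, "predicted_dwell": 0.05}

def score(signals: dict) -> float:
    # Weighted sum over whatever signals are present for this photo.
    return sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())

def rank(photo_ids, signals_for):
    # Applied at read time on top of the chronological timeline cache,
    # so the cache itself stays a simple sorted set of IDs.
    return sorted(photo_ids, key=lambda pid: score(signals_for(pid)), reverse=True)
```

Keeping ranking on the read path means a model change never invalidates the cached timelines.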
Core Feature Design

Logical vs Physical Sharding
STANDARD — We chose logical sharding (not physical hash-mod-N) with 8,192 logical shards on ~30 physical servers. Move entire shards to new hardware without row-level resharding.
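The indirection fits in a few lines. The round-robin shard-to-server map below is an assumption — any mapping works, because rebalancing means editing the map and copying whole shards, never rehashing individual rows:

```python
LOGICAL_SHARDS = 8_192
PHYSICAL_SERVERS = 30

def logical_shard(user_id: int) -> int:
    # A user's photos live on the same logical shard as the user.
    return user_id % LOGICAL_SHARDS

# The only thing that changes during a migration is this map.
shard_to_server = {s: s % PHYSICAL_SERVERS for s in range(LOGICAL_SHARDS)}

def server_for(user_id: int) -> int:
    return shard_to_server[logical_shard(user_id)]
```

Contrast with hash-mod-N on physical servers: adding a 31st server there would remap nearly every row, while here it only means reassigning some entries in `shard_to_server`.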
Database Schema