Photo Storage and CDN Delivery
At 200M uploads/day, storing only originals wastes bandwidth because a mobile client on 3G would download a 3MB file when it only needs a 15KB thumbnail. We generate 4 resolution variants: 150px thumbnail (15KB), 320px small (50KB), 640px medium (200KB), and 1080px full (800KB).
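The client-side selection logic implied above can be sketched as picking the smallest variant that still covers the requested display width. This is a minimal illustration; `pick_variant` and the variant table are assumptions for this sketch, not part of any real API.

```python
# Variant ladder from the text: (width in px, approximate encoded size in KB).
VARIANTS = [(150, 15), (320, 50), (640, 200), (1080, 800)]

def pick_variant(display_width_px: int) -> tuple[int, int]:
    """Return the smallest variant whose width covers the display width."""
    for width, size_kb in VARIANTS:
        if width >= display_width_px:
            return (width, size_kb)
    # Anything wider than 1080px gets the largest variant we store.
    return VARIANTS[-1]
```

A 120px avatar slot resolves to the 150px/15KB thumbnail instead of the 3MB original, which is exactly the bandwidth saving the variant ladder buys.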
We store originals and variants in S3 (not a distributed filesystem like HDFS) because photos are immutable write-once-read-many blobs, and S3 gives us 11 nines of durability without managing data nodes. Trade-off: S3 costs more per GB than HDFS, but we avoid an entire Hadoop operations team.
Total per photo: 3MB original plus roughly 1.1MB of variants (15 + 50 + 200 + 800 KB) equals roughly 4.1MB.
Since photos never change after upload, we set Cache-Control headers with 1-year TTLs on the CDN. Over 95% of reads are served from edge Points of Presence (POPs), never reaching origin storage.
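The immutable-object caching policy above amounts to a single response header. The sketch below is illustrative (`build_cache_headers` is a hypothetical helper, not a real framework API); the header values themselves are standard HTTP.

```python
# Photos never change after upload, so a 1-year TTL is safe.
ONE_YEAR_SECONDS = 365 * 24 * 3600  # 31536000

def build_cache_headers() -> dict[str, str]:
    return {
        # "immutable" additionally tells browsers to skip revalidation
        # (no conditional requests) for the lifetime of the cache entry.
        "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable",
    }
```

Because the URL of a variant never gets reused for different content, there is no invalidation problem: a new upload always gets a new key.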
At 200M uploads/day and 4.1MB each, that is 820TB of new storage daily. Implication: after one year we need roughly 300PB, which rules out any single-cluster solution. S3 absorbs this scale natively, replicating across availability zones automatically (cross-region replication is an opt-in configuration).
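The back-of-the-envelope math above can be checked in a few lines (using decimal SI units, i.e. 1TB = 10^12 bytes, matching the text's convention):

```python
UPLOADS_PER_DAY = 200_000_000
MB_PER_PHOTO = 4.1  # original plus all four variants

daily_tb = UPLOADS_PER_DAY * MB_PER_PHOTO / 1_000_000  # MB -> TB
yearly_pb = daily_tb * 365 / 1_000                     # TB -> PB

print(f"{daily_tb:.0f} TB/day, {yearly_pb:.1f} PB after one year")
```

This reproduces the 820TB/day figure and lands at about 299PB after a year, i.e. the "roughly 300PB" quoted above.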
What if the interviewer asks: why not generate variants on-demand instead of eagerly? Because a viral photo viewed 10M times would trigger 10M resize operations versus 4 one-time resizes.
Eager generation trades storage cost for compute savings.
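The viral-photo argument reduces to a worst-case operation count. A toy comparison, assuming no resize cache on the on-demand path (the helper and numbers are illustrative, not measured):

```python
def resize_ops(views: int, eager: bool, n_variants: int = 4) -> int:
    """Total resize operations for one photo over its lifetime."""
    if eager:
        return n_variants  # all variants produced once, at upload time
    return views           # worst case: one resize per uncached view

viral_views = 10_000_000
print(resize_ops(viral_views, eager=True))   # 4 one-time resizes
print(resize_ops(viral_views, eager=False))  # up to 10M on-demand resizes
```

In practice an on-demand system would put a cache in front of the resizer, but eager generation makes the hot path read-only, which is the simpler property to operate at this scale.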