Photo Storage and CDN Delivery
At 200M uploads/day, storing only originals wastes bandwidth because a mobile client on 3G would download a 3MB file when it only needs a 15KB thumbnail. We generate 4 resolution variants: 150px thumbnail (15KB), 320px small (50KB), 640px medium (200KB), and 1080px full (800KB).
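The client-side selection logic implied above can be sketched as picking the smallest variant that still covers the requested display width. This is a minimal illustration; `pick_variant` and the variant table are assumptions for this sketch, not part of any real API.

```python
# Variant ladder from the text: (width in px, approximate encoded size in KB).
VARIANTS = [(150, 15), (320, 50), (640, 200), (1080, 800)]

def pick_variant(display_width_px: int) -> tuple[int, int]:
    """Return the smallest variant whose width covers the display width."""
    for width, size_kb in VARIANTS:
        if width >= display_width_px:
            return (width, size_kb)
    # Anything wider than 1080px gets the largest variant we store.
    return VARIANTS[-1]
```

A 120px avatar slot resolves to the 150px/15KB thumbnail instead of the 3MB original, which is exactly the bandwidth saving the variant ladder buys.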
We store originals and variants in S3 (not a distributed filesystem like HDFS) because photos are immutable write-once-read-many blobs, and S3 gives us 11 nines of durability without managing data nodes. Trade-off: S3 costs more per GB than HDFS, but we avoid an entire Hadoop operations team.
Total per photo: 3MB original plus roughly 1.1MB of variants (15 + 50 + 200 + 800 KB) equals roughly 4.1MB.
Since photos never change after upload, we set Cache-Control headers with 1-year TTLs on the CDN. Over 95% of reads are served from edge Points of Presence (POPs), never reaching origin storage.
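The immutable-object caching policy above amounts to a single response header. The sketch below is illustrative (`build_cache_headers` is a hypothetical helper, not a real framework API); the header values themselves are standard HTTP.

```python
# Photos never change after upload, so a 1-year TTL is safe.
ONE_YEAR_SECONDS = 365 * 24 * 3600  # 31536000

def build_cache_headers() -> dict[str, str]:
    return {
        # "immutable" additionally tells browsers to skip revalidation
        # (no conditional requests) for the lifetime of the cache entry.
        "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable",
    }
```

Because the URL of a variant never gets reused for different content, there is no invalidation problem: a new upload always gets a new key.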
At 200M uploads/day and 4.1MB each, that is 820TB of new storage daily. Implication: after one year we need roughly 300PB, which rules out any single-cluster solution. S3 absorbs this scale natively, replicating across availability zones automatically (cross-region replication is an opt-in configuration).
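The back-of-the-envelope math above can be checked in a few lines (using decimal SI units, i.e. 1TB = 10^12 bytes, matching the text's convention):

```python
UPLOADS_PER_DAY = 200_000_000
MB_PER_PHOTO = 4.1  # original plus all four variants

daily_tb = UPLOADS_PER_DAY * MB_PER_PHOTO / 1_000_000  # MB -> TB
yearly_pb = daily_tb * 365 / 1_000                     # TB -> PB

print(f"{daily_tb:.0f} TB/day, {yearly_pb:.1f} PB after one year")
```

This reproduces the 820TB/day figure and lands at about 299PB after a year, i.e. the "roughly 300PB" quoted above.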
What if the interviewer asks: why not generate variants on-demand instead of eagerly? Because a viral photo viewed 10M times would trigger 10M resize operations versus 4 one-time resizes.
Eager generation trades storage cost for compute savings.
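The viral-photo argument reduces to a worst-case operation count. A toy comparison, assuming no resize cache on the on-demand path (the helper and numbers are illustrative, not measured):

```python
def resize_ops(views: int, eager: bool, n_variants: int = 4) -> int:
    """Total resize operations for one photo over its lifetime."""
    if eager:
        return n_variants  # all variants produced once, at upload time
    return views           # worst case: one resize per uncached view

viral_views = 10_000_000
print(resize_ops(viral_views, eager=True))   # 4 one-time resizes
print(resize_ops(viral_views, eager=False))  # up to 10M on-demand resizes
```

In practice an on-demand system would put a cache in front of the resizer, but eager generation makes the hot path read-only, which is the simpler property to operate at this scale.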