EASY walkthrough
Thumbnail Generation
A viewer hovers over the seek bar and expects an instant preview image. Without pre-generated thumbnails, the player would need to decode the video at that timestamp on the fly, consuming significant client CPU and adding visible delay.
We generate thumbnails as a downstream task in the transcoding DAG, triggered after the first resolution finishes encoding. The pipeline extracts keyframes at 10-second intervals, producing a sprite sheet of candidate images per video.
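As a rough sketch of how the 10-second sampling maps onto a sprite sheet, the snippet below computes the extraction timestamps and the tile a player would show for a given hover position. The 10-column grid and 160x90 tile size are assumptions for illustration; the walkthrough does not specify a sprite layout.

```python
from dataclasses import dataclass

INTERVAL_S = 10            # one frame every 10 seconds, as in the text
COLUMNS = 10               # assumed sprite-sheet width in tiles (hypothetical)
TILE_W, TILE_H = 160, 90   # assumed tile size in pixels (hypothetical)


@dataclass(frozen=True)
class Tile:
    index: int  # which 10-second frame this tile holds
    x: int      # left pixel offset inside the sprite sheet
    y: int      # top pixel offset inside the sprite sheet


def frame_timestamps(duration_s: int) -> list[int]:
    """Timestamps (in seconds) at which a frame is extracted."""
    return list(range(0, duration_s, INTERVAL_S))


def tile_for(hover_s: float) -> Tile:
    """Locate the sprite tile covering the hovered timestamp."""
    index = int(hover_s // INTERVAL_S)
    row, col = divmod(index, COLUMNS)
    return Tile(index=index, x=col * TILE_W, y=row * TILE_H)
```

With this layout the player only needs integer arithmetic at hover time: it fetches the sprite once and crops the tile at the computed offset, with no video decoding on the client.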
“The constraint: we need thumbnails at every 10-second interval, at multiple resolutions, stored for low-latency (single-digit-millisecond) retrieval by video ID.”
A 10-minute video at one frame per 10 seconds yields 60 candidates. For creator-facing thumbnails, an ML model scores each candidate on visual quality, face detection, and text clarity, then selects the top 3 to 5 options for the creator to choose from.
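The selection step can be sketched as a weighted score followed by a top-k pick. This is a minimal illustration, not the actual model: the `Candidate` fields, the assumption that each signal is normalized to [0, 1], and the weights are all hypothetical stand-ins for whatever the ML model outputs.

```python
import heapq
from typing import NamedTuple


class Candidate(NamedTuple):
    timestamp_s: int
    quality: float       # visual quality, assumed normalized to [0, 1]
    face: float          # face-detection confidence, assumed in [0, 1]
    text_clarity: float  # text legibility, assumed in [0, 1]


# Hypothetical weights; a real system would learn or tune these.
WEIGHTS = (0.5, 0.3, 0.2)


def score(c: Candidate) -> float:
    """Combine the three signals into one ranking score."""
    return (WEIGHTS[0] * c.quality
            + WEIGHTS[1] * c.face
            + WEIGHTS[2] * c.text_clarity)


def top_candidates(candidates: list[Candidate], k: int = 5) -> list[Candidate]:
    """Return the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, candidates, key=score)
```

For 60 candidates a full sort would be just as fast; `heapq.nlargest` simply states the intent (keep the best k) directly.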
Each thumbnail is resized to three dimensions (120x90, 320x180, 480x360) and compressed to JPEG at roughly 5 KB per image. Implication: 5 thumbnails at 5 KB each is only 25 KB per size, or about 75 KB across all three sizes, making thumbnail storage negligible compared to video storage.
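The storage estimate is easy to check with a few lines. The sketch below assumes the roughly 5 KB figure applies to every stored image regardless of size, and uses a hypothetical 100 MB video purely to illustrate the ratio; neither assumption comes from measured data.

```python
KB_PER_IMAGE = 5  # rough JPEG size per image, from the text
SIZES = [(120, 90), (320, 180), (480, 360)]


def thumbnail_kb(num_thumbnails: int, num_sizes: int = 1) -> int:
    """Total thumbnail storage in KB for one video."""
    return num_thumbnails * num_sizes * KB_PER_IMAGE


def share_of_video(thumb_kb: int, video_mb: int) -> float:
    """Thumbnail storage as a fraction of the video's own size."""
    return thumb_kb / (video_mb * 1024)


creator_one_size = thumbnail_kb(5)               # 25 KB
creator_all_sizes = thumbnail_kb(5, len(SIZES))  # 75 KB
seek_previews = thumbnail_kb(60)                 # 300 KB for a 10-minute video

# Hypothetical 100 MB video: thumbnails are well under 0.1% of its size.
ratio = share_of_video(creator_all_sizes, video_mb=100)
```

Even counting all 60 seek-bar frames, the total stays in the hundreds of kilobytes, so the conclusion that thumbnail storage is negligible holds under these assumptions.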
We store thumbnails in Bigtable (not a traditional SQL database or filesystem) because Bigtable provides single-digit-millisecond random reads keyed by video ID, handles billions of rows without manual sharding configuration, and scales horizontally without manual intervention. A relational database would require explicit sharding for this volume, and a filesystem would lack the indexed lookup speed.
Trade-off: we accept Bigtable's higher per-query cost compared to a filesystem in exchange for consistent low-latency reads at any scale.
Netflix generates up to 30 personalized thumbnail variants per title, selected by an A/B testing framework that optimizes for click-through rate.
What if the interviewer asks: why generate thumbnails in the transcoding DAG rather than as a separate pipeline?
Because the transcoding DAG already decodes the video into frames, extracting thumbnails from those frames is nearly free. A separate pipeline would read and decode the video a second time, doubling the I/O and decode cost for that step.
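The dependency structure can be sketched with Python's standard-library `graphlib`. The task names are hypothetical, and treating 360p as the first resolution to finish is an assumption; the point is only that the thumbnail task hangs off the existing DAG instead of starting its own decode.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (names are hypothetical).
dag = {
    "decode": set(),
    "encode_360p": {"decode"},
    "encode_720p": {"decode"},
    "encode_1080p": {"decode"},
    # Thumbnails are triggered once the first resolution has been encoded.
    "thumbnails": {"encode_360p"},
}

# One valid execution order; independent tasks may appear in any relative order.
order = list(TopologicalSorter(dag).static_order())

# The video is decoded exactly once, and every downstream task reuses that work.
decode_tasks = [task for task in order if task == "decode"]
```

A separate thumbnail pipeline would correspond to a second graph with its own `decode` node, which is where the duplicated cost would come from.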
Related concepts