Video Transcoding Pipeline
A creator uploads a raw 4K video at 50 Mbps. We need to produce 6 renditions (240p through 4K) within minutes, not hours.
We model the pipeline as a directed acyclic graph (DAG) of tasks. First, we split the raw file into GOP-aligned chunks (more on GOP alignment in the chunking concept).
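The split step can be sketched as a small command builder. This is an illustrative sketch, not the exact production invocation: it relies on the fact that FFmpeg's segment muxer, when stream-copying (`-c copy`), can only cut at keyframes, so every chunk starts on a GOP boundary. The file names and chunk duration are assumptions.

```python
def build_split_command(src: str, chunk_seconds: int = 10) -> list[str]:
    """Build an FFmpeg command that splits `src` into GOP-aligned chunks.

    With `-c copy` the segment muxer cannot cut mid-GOP, so every chunk
    begins on a keyframe; `chunk_seconds` is a target, not an exact
    duration, because the cut snaps to the next keyframe.
    """
    return [
        "ffmpeg", "-i", src,
        "-c", "copy",                 # no re-encode: cut on keyframes only
        "-f", "segment",
        "-segment_time", str(chunk_seconds),
        "-reset_timestamps", "1",     # each chunk's timestamps start at 0
        "chunk_%04d.mp4",             # illustrative output naming
    ]
```

Because the split is a stream copy, it is cheap (I/O-bound, no re-encoding), which is why it can sit at the front of the DAG without becoming a bottleneck.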
The constraint: transcoding a 10-minute 4K video on a single machine takes roughly 40 minutes, which is unacceptable when YouTube processes over 500 hours of uploads per minute.
Then each chunk fans out to parallel FFmpeg workers, one per target resolution. Each worker also generates a thumbnail sprite sheet for its segment.
After all chunks at a given resolution finish, a merge task stitches them into a complete rendition. We chose DAG-based orchestration (not a linear queue) because DAG execution lets us parallelize across both chunks and resolutions simultaneously, reducing wall-clock time from 40 minutes to under 5 minutes for a 10-minute video.
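The fan-out and merge structure above can be expressed as a dependency map, which is essentially what a DAG orchestrator consumes. A minimal sketch, with invented task IDs and an assumed six-rung resolution list:

```python
from itertools import product

RESOLUTIONS = ["240p", "360p", "480p", "720p", "1080p", "2160p"]

def build_dag(num_chunks: int) -> dict[str, set[str]]:
    """Map each task ID to the set of task IDs it depends on."""
    deps: dict[str, set[str]] = {"split": set()}
    # Fan out: one encode task per (chunk, resolution) pair.
    for chunk, res in product(range(num_chunks), RESOLUTIONS):
        deps[f"encode:{res}:{chunk}"] = {"split"}
    # Fan in: each rendition's merge waits only on its own resolution's
    # chunks, so a fast 240p merge never blocks on slow 4K encodes.
    for res in RESOLUTIONS:
        deps[f"merge:{res}"] = {f"encode:{res}:{c}" for c in range(num_chunks)}
    return deps

dag = build_dag(num_chunks=3)
assert dag["merge:240p"] == {"encode:240p:0", "encode:240p:1", "encode:240p:2"}
```

The per-resolution merge dependency is what enables progressive availability: low renditions can go live while high ones are still encoding.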
Trade-off: DAG scheduling adds orchestration complexity (dependency tracking, partial failure handling) compared to a flat job queue. Transcoding is also the single most expensive compute step, consuming roughly 70% of total processing cost.
Netflix runs transcoding on AWS with per-title encoding: each title gets a custom bitrate ladder optimized against a perceptual quality metric. Implication: a visually simple animation needs fewer bits than a fast-action scene at the same resolution, saving 20 to 50% of storage per title.
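To make the per-title idea concrete, here is a deliberately simplified sketch: it scales a fixed reference ladder by a per-title complexity score. The reference bitrates, the score, and the linear scaling rule are all invented for illustration; real per-title encoding derives the ladder from trial encodes measured with a quality metric, not from a single scalar.

```python
# Hypothetical reference ladder (kbps per rendition); numbers are illustrative.
REFERENCE_LADDER_KBPS = {
    "240p": 400, "360p": 800, "480p": 1500,
    "720p": 3000, "1080p": 6000, "2160p": 16000,
}

def per_title_ladder(complexity: float) -> dict[str, int]:
    """Scale the reference ladder by a complexity score in [0, 1].

    complexity = 0.0 -> half the reference bits (static, simple content)
    complexity = 0.5 -> the reference ladder unchanged
    complexity = 1.0 -> 50% extra bits (dense, fast-action content)
    """
    scale = 0.5 + complexity
    return {res: round(kbps * scale) for res, kbps in REFERENCE_LADDER_KBPS.items()}
```

Even this toy version shows the payoff: a simple title at complexity 0.0 ships a 1080p rendition at half the bitrate of the one-size-fits-all ladder with no visible quality loss.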
If any chunk fails, we retry only that chunk, not the entire video. This chunk-level retry isolates failures to seconds of content rather than reprocessing hours of video.
What if the interviewer asks: why FFmpeg and not a custom encoder? Because FFmpeg supports every major codec (H.264, H.265, VP9, AV1), is battle-tested across billions of hours of video, and has a massive open-source community.
Building a custom encoder gives marginal quality gains but costs years of engineering effort.
Related concepts