Video Chunking

A 2-hour movie at 1080p is roughly 6 GB. Served as one monolithic download, a seek means fetching everything before the target position, and a single network hiccup can force the whole transfer to restart.

We solve this with video chunking, which splits the file into small, independently decodable segments of 2 to 10 seconds each. Each chunk starts at a Group of Pictures (GOP) boundary, meaning it begins with a keyframe (I-frame) and can be decoded without any data from previous chunks.

“The constraint: we need random access into any point in the video without downloading everything before it.”

This independence enables three capabilities that would be impossible with a monolithic file. First, parallel transcoding: each chunk is encoded separately across workers, so a 10-minute video split into 150 four-second chunks can be transcoded by 150 workers simultaneously.
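As a rough illustration of the fan-out, the sketch below splits a duration into 4-second ranges and maps them over a worker pool. `transcode_chunk` is a hypothetical stand-in for a real encoder call, and the thread pool stands in for what would likely be a fleet of separate worker machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

CHUNK_SECONDS = 4  # the VOD chunk length used in this section


def chunk_ranges(duration_s: int, chunk_s: int = CHUNK_SECONDS) -> List[Tuple[int, int]]:
    """Return (start, end) second offsets for each chunk; the last may be shorter."""
    return [(start, min(start + chunk_s, duration_s))
            for start in range(0, duration_s, chunk_s)]


def transcode_chunk(span: Tuple[int, int]) -> str:
    """Hypothetical stand-in for a real encoder call on one chunk."""
    start, end = span
    return f"chunk_{start:05d}_{end:05d}.ts"


def transcode_parallel(duration_s: int, max_workers: int = 8) -> List[str]:
    # Chunks share no decode state, so execution order does not matter;
    # map() still returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcode_chunk, chunk_ranges(duration_s)))


outputs = transcode_parallel(10 * 60)
print(len(outputs))  # 150 chunks for a 10-minute video
```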

Second, instant seek: the player jumps to the nearest chunk boundary without downloading preceding content. Third, mid-stream ABR switching: the player can request the next chunk at a different resolution without restarting the stream.
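Seek and ABR switching both reduce to picking a chunk index and a resolution. A minimal sketch, assuming playback snaps to the start of the chunk that contains the seek position and a hypothetical URL layout with one directory per resolution:

```python
CHUNK_SECONDS = 4


def chunk_index_for(seek_s: float, chunk_s: int = CHUNK_SECONDS) -> int:
    """Index of the chunk containing seek_s (playback snaps to its start)."""
    return int(seek_s // chunk_s)


def chunk_url(resolution: str, index: int) -> str:
    """Hypothetical URL layout: one directory per resolution."""
    return f"/video/{resolution}/seg_{index:05d}.ts"


# Seek to 1h 00m 10s: only the chunk starting at 3608 s is fetched.
idx = chunk_index_for(3610)
print(idx, idx * CHUNK_SECONDS)    # 902 3608
print(chunk_url("1080p", idx))     # /video/1080p/seg_00902.ts

# ABR switch: bandwidth dropped, so the next chunk is requested at 480p.
print(chunk_url("480p", idx + 1))  # /video/480p/seg_00903.ts
```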

We chose 4-second chunks for VOD (not 2-second or 10-second) because 4 seconds balances two opposing forces: switching responsiveness and request overhead. Shorter chunks (2 seconds, as Twitch uses for live streaming) minimize latency but double the number of HTTP requests and manifest entries compared with 4-second chunks.

Longer chunks (10 seconds) reduce request overhead but make ABR switching less responsive, since the player waits up to 10 seconds before it can change quality. Trade-off: we accept coarser seek granularity (snapping to 4-second boundaries) in exchange for a manageable manifest size.
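The two forces can be put side by side with simple arithmetic. The sketch below uses the 2-hour movie from this section and treats the chunk duration as the worst-case wait before a quality switch, which is a simplification that ignores player buffering.

```python
import math

MOVIE_SECONDS = 2 * 60 * 60  # the 2-hour movie used in this section


def segments_for(duration_s: int, chunk_s: int) -> int:
    """Number of chunks (and so HTTP requests and manifest entries) per resolution."""
    return math.ceil(duration_s / chunk_s)


for chunk_s in (2, 4, 10):
    print(f"{chunk_s:>2} s chunks: {segments_for(MOVIE_SECONDS, chunk_s):>5} segments, "
          f"worst-case wait before a quality switch ~{chunk_s} s")
```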

A 2-hour movie at 4-second chunks produces roughly 1,800 segments per resolution. Implication: at 6 resolutions, we store 10,800 chunk files per movie.
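Each of those segments gets one entry in the HLS media playlist (`.m3u8`) for its resolution. The sketch below builds such a playlist using standard HLS tags; the `seg_00000.ts` file names are hypothetical, and a real packager would emit the measured duration of each segment, not a nominal one.

```python
import math
from typing import List


def build_media_playlist(duration_s: float, chunk_s: int = 4) -> str:
    """Build a VOD HLS media playlist with one entry per chunk."""
    count = math.ceil(duration_s / chunk_s)
    lines: List[str] = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{chunk_s}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for i in range(count):
        # The final chunk may be shorter than chunk_s.
        seg_len = min(chunk_s, duration_s - i * chunk_s)
        lines.append(f"#EXTINF:{seg_len:.3f},")
        lines.append(f"seg_{i:05d}.ts")  # hypothetical file name
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


playlist = build_media_playlist(2 * 60 * 60)
entries = playlist.count("#EXTINF")
print(entries, entries * 6)  # 1800 segments per resolution, 10800 across 6
```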

The manifest file (`.m3u8` for HLS) lists every chunk URL and its duration.

What if the interviewer asks: why align on GOP boundaries and not arbitrary time intervals? Because a chunk that starts mid-GOP requires reference frames from the previous chunk to decode, breaking the independence that enables parallel transcoding and ABR switching.
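One way to picture GOP alignment is to choose cut points only from the timestamps where keyframes occur. The sketch below uses a greedy rule, which is an assumption for illustration and not a description of any particular packager: start a new chunk at the first keyframe at least `target_s` seconds after the previous cut.

```python
from typing import List


def gop_aligned_cuts(keyframe_times: List[float], target_s: float = 4.0) -> List[float]:
    """Pick chunk start times that always coincide with a keyframe.

    Greedy rule (an assumption for illustration): start a new chunk at the
    first keyframe that is at least target_s after the previous cut.
    """
    if not keyframe_times:
        return []
    cuts = [keyframe_times[0]]
    for t in keyframe_times[1:]:
        if t - cuts[-1] >= target_s:
            cuts.append(t)
    return cuts


# Keyframes every 2 s, plus an extra scene-cut keyframe at 5.0 s.
keyframes = [0.0, 2.0, 4.0, 5.0, 6.0, 8.0, 10.0, 12.0]
print(gop_aligned_cuts(keyframes))  # [0.0, 4.0, 8.0, 12.0]
```

Every cut lands on a keyframe, so no chunk depends on frames from its predecessor. In practice, encoders are usually configured with a fixed keyframe interval so that the boundaries fall at the same timestamps in every resolution, which is what makes mid-stream switching line up.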

Why it matters in interviews

Chunking is the foundation that enables ABR, parallel transcoding, and seek. Explaining GOP-aligned boundaries and why we chose 4-second chunks over 2-second or 10-second alternatives shows we understand the latency-versus-overhead trade-off interviewers probe.
