
Resumable Upload

A creator uploads a 5 GB video over home Wi-Fi. At 80% completion (4 GB transferred), their connection drops for 10 seconds. Without resumability, we discard the 4 GB already received and the creator restarts from byte zero.
The constraint: at YouTube's scale of 500+ hours of uploads per minute, even a 1% upload failure rate without resumability can waste terabytes of bandwidth daily.
We chose the tus protocol (not a proprietary chunked upload) because tus is an open standard with client libraries for every major platform, reducing our implementation effort. Google's Resumable Upload API follows the same pattern but is proprietary to Google.
Trade-off: we accept the overhead of tus's HTTP-based negotiation (slightly more verbose than a raw TCP stream) in exchange for broad client compatibility and a well-documented spec.
The protocol works in three phases. First, the client sends a POST to create an upload session and receives a unique upload URI.
Second, the client sends chunks sequentially with PATCH requests (typically 5 to 10 MB each), and the server records the byte offset reached after each completed chunk. Third, on interruption, the client sends a HEAD request to learn the last confirmed offset (returned in the Upload-Offset header) and resumes from exactly that byte.
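The three phases can be sketched with an in-memory stand-in for the server. This is a simplified illustration, not a real tus implementation: `UploadServer`, `upload`, `demo`, and the `fail_at` parameter are hypothetical names, and the `create`, `patch`, and `head` methods only mimic the roles of the POST, chunk (PATCH in the tus specification), and HEAD requests.

```python
import uuid


class UploadServer:
    """In-memory stand-in for a tus-style server (hypothetical, no networking)."""

    def __init__(self):
        self.sessions = {}  # upload_id -> bytearray of confirmed bytes

    def create(self):
        # Phase 1 (POST): open a session and return its identifier.
        upload_id = uuid.uuid4().hex
        self.sessions[upload_id] = bytearray()
        return upload_id

    def patch(self, upload_id, offset, chunk):
        # Phase 2 (chunk request): accept a chunk only if it starts
        # exactly at the last confirmed offset.
        data = self.sessions[upload_id]
        if offset != len(data):
            raise ValueError("offset mismatch")
        data.extend(chunk)
        return len(data)

    def head(self, upload_id):
        # Phase 3 (HEAD): report the last confirmed offset.
        return len(self.sessions[upload_id])


def upload(server, upload_id, payload, chunk_size, fail_at=None):
    """Send chunks starting from the server's confirmed offset.

    fail_at simulates a network drop once that many bytes are confirmed.
    """
    offset = server.head(upload_id)
    while offset < len(payload):
        if fail_at is not None and offset >= fail_at:
            raise ConnectionError("simulated network drop")
        chunk = payload[offset:offset + chunk_size]
        offset = server.patch(upload_id, offset, chunk)
    return offset


def demo():
    server = UploadServer()
    payload = bytes(range(256)) * 40  # 10,240 bytes of test data
    upload_id = server.create()
    try:
        upload(server, upload_id, payload, chunk_size=1024, fail_at=8192)
    except ConnectionError:
        pass  # the connection dropped at 80% of the payload
    resumed_from = server.head(upload_id)
    upload(server, upload_id, payload, chunk_size=1024)
    complete = bytes(server.sessions[upload_id]) == payload
    return resumed_from, complete


print(demo())  # (8192, True)
```

The second call to `upload` begins at byte 8192 rather than byte zero, which is the whole point of the HEAD step: only the unconfirmed remainder is sent again.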
The overhead is minimal: one extra HEAD request per retry, plus the server storing a single 8-byte offset per active session. Implication: at 8 bytes per session, even 1 million concurrent uploads require only 8 MB of server-side offset state, making this cheap to operate.
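The state estimate is simple enough to check directly. The sketch below assumes each offset is stored as a 64-bit integer (8 bytes) and ignores session identifiers and any other metadata a real server would keep; `session_state_bytes` is a hypothetical helper.

```python
OFFSET_BYTES = 8  # assumption: one 64-bit integer offset per session


def session_state_bytes(concurrent_uploads, bytes_per_offset=OFFSET_BYTES):
    """Bytes of offset state needed for the given number of active sessions."""
    return concurrent_uploads * bytes_per_offset


total = session_state_bytes(1_000_000)
print(total / 1_000_000, "MB")  # 8.0 MB
```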
We set the chunk size to 8 MB (not 1 MB or 64 MB) because 1 MB chunks generate excessive HTTP overhead (headers per chunk), while 64 MB chunks mean losing up to 64 MB of progress on each retry. 8 MB balances retry cost against request overhead.
What if the interviewer asks: why not use multipart upload like S3?
S3 multipart upload requires all parts to complete before the object is assembled, meaning a partial upload cannot be served or previewed. Tus allows byte-level resume without reassembly, which is better suited for progressive upload tracking.
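Returning to the chunk-size choice, the trade-off can be made concrete with a rough calculation for the 5 GB file from the scenario. This sketch treats 5 GB as 5 GiB and 1 MB as 1 MiB for round numbers, and `chunk_tradeoff` is a hypothetical helper; it counts requests and worst-case lost progress only, not actual header bytes or latency.

```python
import math

FILE_SIZE = 5 * 1024**3  # assumption: the 5 GB file treated as 5 GiB


def chunk_tradeoff(chunk_mb, file_size=FILE_SIZE):
    """Return (number of chunk requests, max bytes re-sent after one failure)."""
    chunk_bytes = chunk_mb * 1024**2
    requests = math.ceil(file_size / chunk_bytes)
    return requests, chunk_bytes


for size_mb in (1, 8, 64):
    requests, lost = chunk_tradeoff(size_mb)
    print(f"{size_mb} MB chunks: {requests} requests, up to {lost // 1024**2} MB re-sent per retry")
# 1 MB chunks: 5120 requests, up to 1 MB re-sent per retry
# 8 MB chunks: 640 requests, up to 8 MB re-sent per retry
# 64 MB chunks: 80 requests, up to 64 MB re-sent per retry
```

Moving from 1 MB to 8 MB cuts the request count by a factor of eight while keeping the worst-case retry cost small, whereas 64 MB saves comparatively few additional requests for a much larger retry penalty.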
Why it matters in interviews
Skipping resumability means our upload pipeline breaks for any file over a few hundred MB on unreliable networks. Mentioning the tus protocol and explaining why we chose it over proprietary alternatives shows we know the standard approach and considered trade-offs.
Related concepts