File Chunking
Why transfer a 1 GB file when only one paragraph changed? We split every file into fixed-size 4 MB blocks, each identified by its SHA-256 hash.
Only blocks with changed hashes get uploaded. Dropbox pioneered this approach, chunking 600 billion content blocks across 700 million users.
“When a user edits a document, the client recomputes hashes for each block and compares them against the server's chunk manifest.”
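The recompute-and-compare step can be sketched in a few lines. This is a minimal illustration, not Dropbox's client code: `block_hashes` and `changed_blocks` are hypothetical names, and the server's chunk manifest is assumed to be an ordered list of per-block SHA-256 digests.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB fixed-size blocks, as described above


def block_hashes(path):
    """Split a file into fixed-size blocks and return each block's SHA-256."""
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes


def changed_blocks(local_hashes, server_manifest):
    """Indices of blocks whose hash differs from the server's manifest.

    Blocks past the end of the shorter list count as changed
    (the file grew or shrank).
    """
    n = max(len(local_hashes), len(server_manifest))
    return [i for i in range(n)
            if i >= len(local_hashes)
            or i >= len(server_manifest)
            or local_hashes[i] != server_manifest[i]]
```

Only the block indices returned by `changed_blocks` need to be uploaded; everything else is already on the server under the same hash.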
We chose 4 MB (not 64 KB or 64 MB) because smaller chunks detect finer-grained changes but explode the chunk index (a 1 GB file needs 16,384 manifest entries at 64 KB versus 256 at 4 MB), while larger chunks miss partial edits and re-upload many megabytes for a small change. Trade-off: we accepted the CPU cost of hashing 256 chunks per 1 GB file (roughly 200 ms on a modern laptop).
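The index-size arithmetic behind that choice is worth checking directly (plain Python; the labels are illustrative):

```python
GB = 1024 ** 3  # bytes in 1 GB


def chunks_per_gb(block_size):
    """Number of fixed-size blocks (hence manifest entries) per 1 GB file."""
    return GB // block_size


# Manifest entries per 1 GB file for the three candidate block sizes
index_sizes = {label: chunks_per_gb(size)
               for label, size in [("64 KB", 64 * 1024),
                                   ("4 MB", 4 * 1024 ** 2),
                                   ("64 MB", 64 * 1024 ** 2)]}
print(index_sizes)  # → {'64 KB': 16384, '4 MB': 256, '64 MB': 16}
```

At 64 MB the manifest shrinks to 16 entries, but a one-character edit forces a 64 MB re-upload; 4 MB sits between the two failure modes.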
To offset this, we use a cheap rolling hash (Rabin fingerprint) as a first-pass filter for spotting changed blocks, falling back to SHA-256 only to verify apparent matches. The result: roughly 99% bandwidth reduction on incremental edits.
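The two-tier check can be sketched as follows. This is an assumption-laden illustration, not the actual implementation: Adler-32 stands in for the Rabin fingerprint (both are fast, weak, rolling-capable checksums), and the manifest-entry shape is invented for the example.

```python
import hashlib
import zlib


def weak_hash(block):
    """Cheap first-pass checksum (Adler-32 here, standing in for a
    Rabin fingerprint)."""
    return zlib.adler32(block)


def block_changed(block, manifest_entry):
    """Two-tier comparison: the weak hash rejects changed blocks cheaply;
    SHA-256 runs only when the weak hashes agree, to rule out a collision.

    manifest_entry is assumed to hold {"weak": int, "sha256": hex str}.
    """
    if weak_hash(block) != manifest_entry["weak"]:
        return True  # weak mismatch: the block definitely changed
    return hashlib.sha256(block).hexdigest() != manifest_entry["sha256"]
```

A differing block is rejected by the weak checksum in microseconds; the expensive SHA-256 only confirms the (rare) weak-hash agreements.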
What if the interviewer asks: “Why not use content-defined chunking?” Content-defined chunking (CDC) with Rabin fingerprinting produces variable-size chunks whose boundaries shift with the content, improving deduplication when data is inserted mid-file. We chose fixed-size chunks for simpler offset math in the chunk manifest, accepting slightly lower dedup on insertions.
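To make the contrast concrete, here is a toy CDC sketch, under stated assumptions: a polynomial rolling hash over a small sliding window stands in for a true Rabin fingerprint, and a boundary is declared wherever the hash's low bits match a fixed pattern, so boundaries move with the content and resynchronize within one window after an insertion.

```python
WINDOW = 16           # sliding-window width in bytes (illustrative)
MASK = (1 << 6) - 1   # low-6-bit match -> ~64-byte average chunks for this
                      # demo; real systems mask for multi-MB averages
BASE, MOD = 257, 1 << 32
POW = pow(BASE, WINDOW - 1, MOD)  # coefficient of the byte leaving the window


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i >= WINDOW:                      # slide: drop the outgoing byte
            h = (h - data[i - WINDOW] * POW) % MOD
        h = (h * BASE + b) % MOD             # slide: add the incoming byte
        if i + 1 >= WINDOW and (h & MASK) == MASK:
            chunks.append(data[start:i + 1])  # boundary: emit a chunk
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])          # trailing remainder
    return chunks
```

After an insertion near the front of a file, the chunks downstream of the edit resynchronize and dedupe against the original; with fixed-size blocks, every block after the insertion shifts and must be re-uploaded, which is exactly the dedup loss the fixed-size design accepts.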