Cloud Storage Cheat Sheet
Key concepts, trade-offs, and quick-reference notes for your interview prep.
File Chunking
#1💡 The 4 MB size balances granularity (detecting small edits) against index overhead (number of chunks per file). Trade-off: we accepted higher CPU cost for hash computation.
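A minimal sketch of fixed-size chunking, assuming 4 MB chunks hashed with SHA-256. The helper names `iter_chunks` and `changed_chunks` are hypothetical, chosen for illustration only.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the size quoted in the note above


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, str]]:
    """Yield (offset, sha256_hex) for each fixed-size chunk of the stream."""
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield offset, hashlib.sha256(data).hexdigest()
        offset += len(data)


def changed_chunks(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return indexes of chunks in `new` whose hash differs from `old` (or that are new)."""
    old_hashes = [h for _, h in iter_chunks(io.BytesIO(old), chunk_size)]
    changed = []
    for i, (_, h) in enumerate(iter_chunks(io.BytesIO(new), chunk_size)):
        if i >= len(old_hashes) or old_hashes[i] != h:
            changed.append(i)
    return changed
```

Smaller chunks localize an edit to fewer bytes but produce more index entries and more hash calls per file, which is the trade-off the note describes.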
Deduplication
#2💡 The garbage collector deletes a chunk only after its ref_count has dropped to zero and a grace period has passed. We never delete immediately because other users may still reference the chunk.
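A rough sketch of ref-counted garbage collection with a grace period. `ChunkIndex`, `zero_since`, and the 24-hour default are assumptions for illustration; the note does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

GRACE_SECONDS = 24 * 3600  # hypothetical grace period; the note gives no value


@dataclass
class ChunkMeta:
    ref_count: int = 0
    zero_since: Optional[float] = None  # time at which ref_count last hit zero


class ChunkIndex:
    def __init__(self) -> None:
        self.chunks: Dict[str, ChunkMeta] = {}

    def add_ref(self, chunk_hash: str) -> None:
        meta = self.chunks.setdefault(chunk_hash, ChunkMeta())
        meta.ref_count += 1
        meta.zero_since = None  # a new reference cancels pending deletion

    def drop_ref(self, chunk_hash: str, now: float) -> None:
        meta = self.chunks[chunk_hash]
        meta.ref_count -= 1
        if meta.ref_count == 0:
            meta.zero_since = now  # start the grace period, do not delete yet

    def collect(self, now: float, grace: float = GRACE_SECONDS) -> List[str]:
        """Delete chunks that have been unreferenced for at least `grace` seconds."""
        doomed = [
            h for h, m in self.chunks.items()
            if m.ref_count == 0 and m.zero_since is not None and now - m.zero_since >= grace
        ]
        for h in doomed:
            del self.chunks[h]
        return doomed
```

A chunk that is re-referenced during the grace period survives, which is the point of not deleting immediately.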
Sync Protocol
#3💡 Long polling is stateless and firewall-friendly: each poll is an ordinary HTTP request that any server can answer. WebSockets, by contrast, require persistent connections and session affinity.
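An in-process sketch of the long-polling idea, assuming the client sends a cursor with every poll so the server keeps no per-client session. `ChangeFeed` and its methods are hypothetical names, and a real service would expose `poll` over HTTP rather than as a method call.

```python
import threading
from typing import List, Tuple


class ChangeFeed:
    def __init__(self) -> None:
        self._changes: List[str] = []
        self._cond = threading.Condition()

    def publish(self, change: str) -> None:
        """Record a change and wake any waiting polls."""
        with self._cond:
            self._changes.append(change)
            self._cond.notify_all()

    def poll(self, cursor: int, timeout: float) -> Tuple[List[str], int]:
        """Block until there are changes past `cursor` or `timeout` elapses.

        Returns (new_changes, next_cursor). An empty list means the poll
        timed out and the client should simply poll again.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._changes) > cursor, timeout=timeout)
            new = self._changes[cursor:]
            return new, cursor + len(new)
```

Because the cursor travels with the request, the next poll can land on any server that can read the change log.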
Conflict Resolution
#4💡 Fork-and-surface strategy: create 'filename (conflicted copy - device - date)' so no data is lost.
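A small sketch of how the conflicted-copy name could be built. Keeping the file extension after the parenthesized suffix and using an ISO date are assumptions, not something the note specifies; `conflicted_copy_name` is a hypothetical helper that expects a bare file name, not a path.

```python
from datetime import date
from pathlib import PurePosixPath


def conflicted_copy_name(filename: str, device: str, on: date) -> str:
    """Build 'name (conflicted copy - device - date).ext' from a bare file name."""
    p = PurePosixPath(filename)
    return f"{p.stem} (conflicted copy - {device} - {on.isoformat()}){p.suffix}"


print(conflicted_copy_name("report.docx", "laptop", date(2024, 1, 15)))
# prints: report (conflicted copy - laptop - 2024-01-15).docx
```

Both versions stay on disk, and the user decides which one to keep.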
Resumable Upload
#5💡 Idempotent chunk acceptance: if the client retries an already-received chunk, the server recognizes the duplicate by byte range and returns success without double-writing.
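A sketch of idempotent chunk acceptance keyed by byte range. It assumes the client always retries with the same chunk boundaries, so ranges either match exactly or do not overlap; `UploadSession` and the `writes` counter are hypothetical and exist only to make the behavior visible.

```python
from typing import Set, Tuple


class UploadSession:
    def __init__(self, total_size: int) -> None:
        self.total_size = total_size
        self.buffer = bytearray(total_size)
        self.received: Set[Tuple[int, int]] = set()  # (start, end) ranges already stored
        self.writes = 0  # number of actual writes, for illustration

    def accept(self, start: int, data: bytes) -> str:
        """Store a chunk once; a retry of the same byte range is a no-op success."""
        end = start + len(data)
        if start < 0 or end > self.total_size:
            raise ValueError("chunk outside of declared file size")
        if (start, end) in self.received:
            return "ok"  # duplicate: acknowledge without writing again
        self.buffer[start:end] = data
        self.received.add((start, end))
        self.writes += 1
        return "ok"

    def is_complete(self) -> bool:
        return sum(end - start for start, end in self.received) == self.total_size
```

The client can therefore retry blindly after a timeout without knowing whether the first attempt reached the server.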
Storage Architecture
#6💡 Keying chunks by their SHA-256 hash in S3 makes dedup trivial: same hash = same object key. Trade-off: a crash between the S3 write and the MySQL update leaves orphaned chunks, which a reconciliation job cleans up.
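A toy model of content-addressed storage and the reconciliation pass, with in-memory stand-ins for S3 (a dict) and the MySQL metadata table (a set). `ChunkStore` and its methods are hypothetical and do not call any real S3 or MySQL API.

```python
import hashlib
from typing import Dict, List, Set


class ChunkStore:
    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}  # stand-in for the S3 bucket
        self.metadata: Set[str] = set()      # stand-in for the MySQL chunk table

    def put_object(self, data: bytes) -> str:
        """Write the chunk under its SHA-256 key; identical data maps to one object."""
        key = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(key, data)
        return key

    def record_metadata(self, key: str) -> None:
        """Second step of an upload; a crash before this leaves an orphan."""
        self.metadata.add(key)

    def reconcile(self) -> List[str]:
        """Delete objects that have no metadata row and return their keys."""
        orphans = sorted(set(self.objects) - self.metadata)
        for key in orphans:
            del self.objects[key]
        return orphans
```

A production job would likely also skip very recent objects so that uploads still in flight are not mistaken for orphans; that detail is an assumption and is omitted here.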
Capacity Numbers
#7
Bandwidth Optimization
#8💡 Delta sync reduces volume, dedup skips duplicates, compression shrinks what remains. The order matters: dedup check before upload avoids wasting bandwidth on chunks we would skip.
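A sketch of the ordering described above: start from the chunks delta sync flagged as changed, drop any the server already has, and compress only what is left. `plan_upload` and `server_has` are hypothetical names, and zlib is used only as an example compressor.

```python
import hashlib
import zlib
from typing import Dict, Iterable, Set


def plan_upload(changed: Iterable[bytes], server_has: Set[str]) -> Dict[str, bytes]:
    """Return {sha256_hex: compressed_bytes} for chunks that must actually be sent."""
    to_send: Dict[str, bytes] = {}
    for data in changed:
        key = hashlib.sha256(data).hexdigest()
        # Dedup check first: no compression or bandwidth spent on known chunks.
        if key in server_has or key in to_send:
            continue
        to_send[key] = zlib.compress(data)
    return to_send
```

Running the dedup check before compression also saves CPU, since skipped chunks are never compressed.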
Version History
#9💡 Version storage is cheap because versions share most chunks. Only changed chunks add storage.
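A back-of-the-envelope sketch of why shared chunks make versions cheap, assuming each version is stored as an ordered list of chunk hashes. The function names and the chunk labels in the example are hypothetical.

```python
from typing import Dict, List


def storage_used(versions: List[List[str]], chunk_sizes: Dict[str, int]) -> int:
    """Bytes stored when every distinct chunk is kept exactly once."""
    unique = {h for version in versions for h in version}
    return sum(chunk_sizes[h] for h in unique)


def naive_storage(versions: List[List[str]], chunk_sizes: Dict[str, int]) -> int:
    """Bytes stored if every version kept its own full copy."""
    return sum(chunk_sizes[h] for version in versions for h in version)


sizes = {"a": 4, "b": 4, "b2": 4, "c": 4}  # sizes in MB
v1 = ["a", "b", "c"]
v2 = ["a", "b2", "c"]  # only the middle chunk changed
print(storage_used([v1, v2], sizes), naive_storage([v1, v2], sizes))
# prints: 16 24
```

The second version costs one extra chunk (4 MB) instead of a full 12 MB copy.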