Cloud Storage Cheat Sheet

Key concepts, trade-offs, and quick-reference notes for your interview prep.

File Chunking

#1
We split files into 4 MB chunks and compute SHA-256 per chunk. On edit, we compare local hashes to the server manifest and upload only changed blocks. We chose 4 MB (not 64 KB or 64 MB) because smaller chunks explode the index while larger chunks miss partial edits. A 1 GB file with 2 changed chunks uploads 8 MB, not 1 GB.

💡 The 4 MB size balances granularity (detecting small edits) against index overhead (number of chunks per file). Trade-off: we accepted higher CPU cost for hash computation.

⚠️ Common mistake: using file-level hashing instead of block-level. File-level hashing misses partial edits and re-uploads the entire file on every change.
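A minimal sketch of the chunk-manifest diff described above. Names like `changed_chunks` and the manifest-as-list shape are illustrative assumptions, not the real API:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, per the cheat sheet

def chunk_hashes(data: bytes) -> list[str]:
    """Split a blob into 4 MB chunks and hash each with SHA-256."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]

def changed_chunks(local: bytes, server_manifest: list[str]) -> list[int]:
    """Indices of chunks whose hash differs from the server's manifest.
    Only these chunks need to be uploaded."""
    return [
        i for i, h in enumerate(chunk_hashes(local))
        if i >= len(server_manifest) or server_manifest[i] != h
    ]
```

Editing one byte inside one chunk of a 1 GB file yields a single changed index, so only that 4 MB chunk ships.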

Deduplication

#2
We use SHA-256 hash as the chunk primary key and track references with ref_count. We chose reference counting (not mark-and-sweep GC) because ref_count updates are O(1) per operation. Saves 40-60% storage globally. At 10 PB gross: 4-6 PB saved. Trade-off: ref_count can drift on crashes, so we run weekly reconciliation.

💡 Garbage collector deletes chunks only when ref_count drops to zero after a grace period. We never delete immediately because other users may still reference the chunk.

⚠️ Common mistake: deleting a chunk as soon as one user removes their file, breaking other users who share that chunk via dedup.
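The ref-count lifecycle above can be sketched as follows. This is an in-memory illustration, not the real schema; the class and field names are assumptions:

```python
import time

class ChunkStore:
    """Content-addressed store with reference counting and a GC grace period."""
    GRACE_PERIOD = 7 * 24 * 3600  # seconds a zero-ref chunk waits before GC

    def __init__(self):
        self.ref_count = {}    # chunk hash -> number of files referencing it
        self.zero_since = {}   # chunk hash -> time its ref_count hit zero

    def add_reference(self, chunk_hash: str) -> None:
        self.ref_count[chunk_hash] = self.ref_count.get(chunk_hash, 0) + 1
        self.zero_since.pop(chunk_hash, None)  # chunk is live again

    def remove_reference(self, chunk_hash: str) -> None:
        self.ref_count[chunk_hash] -= 1        # O(1), unlike mark-and-sweep
        if self.ref_count[chunk_hash] == 0:
            self.zero_since[chunk_hash] = time.time()

    def gc_eligible(self, now: float) -> list[str]:
        """Chunks that have sat at ref_count zero past the grace period."""
        return [h for h, t in self.zero_since.items()
                if now - t >= self.GRACE_PERIOD]
```

Note that a re-upload during the grace period resurrects the chunk, which is exactly why deletion is never immediate.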

Sync Protocol

#3
We chose long polling (not WebSockets, not SSE) with cursor-based delta sync. Each device holds an open HTTP request for up to 60 seconds. On change, the server responds immediately. The client sends a cursor token to receive only changes since its last sync. Trade-off: reconnect gaps between polls can add up to a second of latency versus WebSocket's near-instant push, but we eliminate connection-affinity overhead.

💡 Long polling is stateless and firewall-friendly, unlike WebSockets which require persistent connections and session affinity.

⚠️ Common mistake: polling every second, which wastes bandwidth (300M mostly-empty requests/sec for 100M DAU × 3 devices) and still adds up to 1 second of latency.
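The held-open request plus cursor can be sketched with a single-waiter, in-process hub. This is a teaching sketch, not the production server (a real implementation handles many waiters and the clear/wait race carefully):

```python
import threading

class ChangeFeed:
    """Minimal long-poll hub: poll() blocks up to `timeout` seconds waiting
    for changes past the caller's cursor; publish() wakes the waiter."""
    def __init__(self):
        self.changes = []                 # append-only change log
        self.event = threading.Event()

    def publish(self, change) -> None:
        self.changes.append(change)
        self.event.set()                  # wake the blocked poller immediately

    def poll(self, cursor: int, timeout: float = 60.0):
        """Return (new_changes, new_cursor). The cursor means the client
        only ever receives deltas since its last sync."""
        if cursor >= len(self.changes):   # nothing new yet
            self.event.clear()
            self.event.wait(timeout)      # the held-open HTTP request
        new = self.changes[cursor:]
        return new, cursor + len(new)
```

The client simply loops: poll with its saved cursor, apply deltas, save the new cursor, poll again.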

Conflict Resolution

#4
We use version vectors to detect concurrent edits. When two devices edit offline, we store both versions and create a conflicted copy. We chose fork-and-surface (not LWW, not OT) because LWW silently discards edits and OT requires an always-online server that does not work for binary files. Trade-off: users see occasional conflict files, but zero data loss.

💡 Fork-and-surface strategy: create 'filename (conflicted copy - device - date)' so no data is lost.

⚠️ Common mistake: using LWW for file content, silently discarding one user's offline edits. LWW is only safe for non-critical metadata like timestamps.
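The version-vector comparison that decides between fast-forward and fork can be sketched like this. Function names and the dict-based vector shape are illustrative:

```python
def compare(vv_a: dict, vv_b: dict) -> str:
    """Compare two version vectors (device id -> edit counter).
    Returns 'a_newer', 'b_newer', 'equal', or 'conflict' for concurrent edits."""
    devices = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(d, 0) > vv_b.get(d, 0) for d in devices)
    b_ahead = any(vv_b.get(d, 0) > vv_a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        return "conflict"   # concurrent offline edits: fork-and-surface
    if a_ahead:
        return "a_newer"    # safe fast-forward, no data at risk
    if b_ahead:
        return "b_newer"
    return "equal"

def conflicted_copy_name(filename: str, device: str, date: str) -> str:
    # Matches the naming convention above; extension handling omitted
    return f"{filename} (conflicted copy - {device} - {date})"
```

Only the "conflict" branch creates a second file; ordered histories merge silently.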

Resumable Upload

#5
We use the tus protocol (not a custom implementation) because it is an open standard with battle-tested client libraries. POST to create a session, PATCH chunks sequentially; the server tracks the byte offset in Redis (24h TTL). On interruption, HEAD to get the last offset and resume from there. Maximum wasted transfer per drop: 4 MB, not the full file.

💡 Idempotent chunk acceptance: if the client retries an already-received chunk, the server recognizes the duplicate by byte range and returns success without double-writing.
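The HEAD/PATCH resume loop and the idempotent-retry behavior can be sketched against an in-memory server stand-in (the stub is an assumption; real tus speaks HTTP with `Upload-Offset` headers):

```python
class TusServerStub:
    """In-memory stand-in for a tus server: tracks received bytes only."""
    def __init__(self):
        self.received = bytearray()

    def head(self) -> int:
        # Real tus returns this as the Upload-Offset response header
        return len(self.received)

    def patch(self, offset: int, chunk: bytes) -> int:
        if offset < len(self.received):
            return len(self.received)   # duplicate retry: accept, no double-write
        assert offset == len(self.received), "gap in upload"
        self.received.extend(chunk)
        return len(self.received)

def resume_upload(server, data: bytes, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Ask where we left off (HEAD), then PATCH one chunk at a time."""
    offset = server.head()
    while offset < len(data):
        offset = server.patch(offset, data[offset:offset + chunk_size])
    return offset
```

A connection drop mid-file loses at most one in-flight chunk; the next HEAD picks up from the last committed offset.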

Storage Architecture

#6
We store metadata in MySQL (ACID for atomic file renames, moves, deletes). We store chunks in S3/GCS (11 nines durability, no disk management). We store session state in Redis (upload progress, long-poll tracking). We chose this split (not a single store) because metadata is 200B/row and read-heavy while chunks are 4 MB and write-heavy. The two systems scale independently.

💡 Chunks keyed by SHA-256 hash in S3 makes dedup trivial: same hash = same object key. Trade-off: crash between S3 write and MySQL update creates orphaned chunks, cleaned by a reconciliation job.
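The write-path ordering implied above (chunks first, metadata last) can be sketched with dicts standing in for S3 and MySQL. All names here are illustrative assumptions:

```python
import hashlib

def commit_file(blob_store: dict, metadata_db: dict,
                path: str, chunks: list[bytes]) -> None:
    """Chunks go to the blob store first, keyed by SHA-256 (dedup is free:
    same hash = same object key). Metadata commits last, so a crash in
    between leaves only orphaned chunks for the reconciliation job, never
    a metadata row pointing at missing data."""
    hashes = []
    for chunk in chunks:
        h = hashlib.sha256(chunk).hexdigest()
        blob_store.setdefault(h, chunk)   # no-op if the object already exists
        hashes.append(h)
    metadata_db[path] = hashes            # the atomic MySQL commit
```

Reversing the order (metadata first) would be the dangerous variant: a crash would leave a file whose chunks were never written.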

Capacity Numbers

#7
100M DAU, 500M users, 100B files. Storage: 10 PB gross, 5 PB after dedup. Implication: dedup saves $115K/month in S3 costs. Sync: 35K ops/sec peak. Implication: we need at least 7 sync service instances. Metadata: 20 TB across 16 shards = 1.25 TB/shard. Chunk index: 100 TB. Upload bandwidth: 280 GB/sec peak across 3+ regions.
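A back-of-envelope check of the numbers above. The S3 price and per-instance sync capacity are assumptions used to reproduce the stated figures:

```python
# Storage: 10 PB gross, 50% dedup (midpoint of the 40-60% range) -> 5 PB saved
gross_pb = 10
saved_pb = gross_pb * 0.5
saved_gb = saved_pb * 1024 * 1024          # PB -> GB

s3_price_per_gb = 0.022                    # assumed standard-tier $/GB-month
monthly_savings = saved_gb * s3_price_per_gb   # ~ $115K/month

# Sync: 35K ops/sec peak at an assumed 5K ops/sec per instance
sync_ops = 35_000
per_instance = 5_000
instances = -(-sync_ops // per_instance)   # ceiling division -> 7
```

Quoting the intermediate arithmetic (5 PB saved, ~5.2M GB, ceiling of 35K/5K) is usually worth more in an interview than the final figures alone.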

Bandwidth Optimization

#8
Three optimizations stack multiplicatively. Delta encoding: only changed chunks upload. Dedup: we skip chunks that already exist globally (not per-user). Compression: we gzip chunks before transfer. Combined: 75-95% bandwidth reduction versus naive full-file upload. We chose global dedup (not per-user) because cross-user duplication is the biggest savings source. Trade-off: global dedup requires a centralized chunk index.

💡 Delta sync reduces volume, dedup skips duplicates, compression shrinks what remains. The order matters: dedup check before upload avoids wasting bandwidth on chunks we would skip.
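The three stacked optimizations, in the order the tip says matters, can be sketched as one planning pass. The function name and return shape are assumptions:

```python
import gzip
import hashlib

def upload_plan(local: bytes, server_manifest: list[str],
                global_index: set, chunk_size: int = 4 * 1024 * 1024):
    """Per chunk: delta check (skip unchanged), then global dedup check
    (send a hash reference, no bytes), then gzip only what actually ships."""
    to_send = []
    for i in range(0, len(local), chunk_size):
        chunk = local[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        idx = i // chunk_size
        if idx < len(server_manifest) and server_manifest[idx] == h:
            continue                           # delta: chunk unchanged
        if h in global_index:
            to_send.append((h, None))          # dedup: reference only
            continue
        to_send.append((h, gzip.compress(chunk)))  # compress the remainder
    return to_send
```

Doing the dedup lookup before compressing is the point of the ordering note: compressing a chunk you were never going to send wastes CPU as well as bandwidth.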

Version History

#9
We keep 100 versions per file with 30-day retention. Each version is a list of chunk hashes, not a full copy. Old versions share chunks with the current version via dedup, so version storage is cheap. We chose 100 versions (not unlimited) to bound the file_versions table size. Trade-off: edits older than 30 days or beyond the 100th version are permanently lost. GC deletes unreferenced chunks after a grace period.

💡 Version storage is cheap because versions share most chunks. Only changed chunks add storage.
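Why versions are cheap falls out of counting unique chunk hashes across the version list. A minimal sketch, with illustrative names:

```python
def version_storage_bytes(versions: list[list[str]],
                          chunk_sizes: dict) -> int:
    """Total storage for all versions of a file. Each version is a list of
    chunk hashes, and a unique chunk is stored exactly once via dedup."""
    unique = {h for version in versions for h in version}
    return sum(chunk_sizes[h] for h in unique)
```

Two 8 MB versions that differ in one 4 MB chunk cost 12 MB total, not 16 MB; only the changed chunk adds storage.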

Security

#10
We use TLS 1.3 for all data in transit and AES-256 encryption at rest in S3. Per-file encryption keys live in a separate KMS (not alongside the data). We offer optional client-side encryption for sensitive files. Trade-off: client-side encryption defeats server-side dedup because identical plaintext produces different ciphertext. We accept this cost for users who opt in to client-side encryption.
⚠️ Common mistake: storing encryption keys in the same database as the encrypted content, defeating the purpose of encryption at rest.
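Why client-side encryption defeats dedup can be shown in a few lines. The hash-of-nonce-plus-plaintext below is a stand-in for real encryption (actual clients would use AES-GCM, which likewise requires a fresh nonce per encryption); the function name is illustrative:

```python
import hashlib
import os

def chunk_key(plaintext: bytes, client_side_encrypted: bool) -> str:
    """Object key for a chunk. Server-side: key = SHA-256 of plaintext, so
    identical chunks collide and dedup. Client-side: a fresh random nonce
    goes into every upload, so identical plaintext yields different keys
    and the server can no longer detect duplicates."""
    if not client_side_encrypted:
        return hashlib.sha256(plaintext).hexdigest()
    nonce = os.urandom(12)   # fresh per upload, as AES-GCM demands
    return hashlib.sha256(nonce + plaintext).hexdigest()
```

This is the trade-off the section accepts: opt-in client-side encryption buys confidentiality from the server at the price of losing that user's dedup savings.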