
Block Storage

The constraint: metadata is small (200 bytes/record), read-heavy, and needs ACID transactions, while file content is large (4 MB/chunk), write-heavy, and needs raw throughput. Storing both in the same database forces one system to handle two conflicting access patterns. Block storage is our decision to separate metadata from content.
Chunks go to S3 or GCS, keyed by their SHA-256 hash: /chunks/sha256_hex. This key structure makes deduplication trivial (same hash = same key = same object) and enables CDN caching.
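The content-addressed key scheme can be sketched in a few lines of Python; here an in-memory dict stands in for the S3/GCS bucket, and `put_if_absent` is an illustrative name, not a real SDK call:

```python
import hashlib

CHUNK_PREFIX = "/chunks/"

def chunk_key(data: bytes) -> str:
    # Same bytes -> same SHA-256 -> same key -> same object.
    return CHUNK_PREFIX + hashlib.sha256(data).hexdigest()

def put_if_absent(store: dict, data: bytes) -> str:
    """Upload a chunk only if its key is new: dedup falls out of the key scheme."""
    key = chunk_key(data)
    if key not in store:      # a HEAD request against a real bucket
        store[key] = data     # PUT only when the content is new
    return key
```

Uploading the same chunk twice produces the same key, so the second upload is a no-op and the bucket stores one copy.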
We chose S3/GCS for chunks (not local ext4, not HDFS) because S3 provides 11 nines of durability (99.999999999%) by replicating objects across multiple availability zones, with no operational overhead for disk management.
Metadata stays in MySQL for its ACID guarantees. Dropbox originally stored blocks on local ext4 file systems before migrating to Magic Pocket, its custom storage system, which handles 600 billion blocks.
Trade-off: separating metadata from content introduces a consistency gap. After uploading a chunk to S3 and updating the metadata in MySQL, a crash between the two steps leaves an orphaned chunk.
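The write ordering matters: content goes to S3 before the metadata commit, so a crash leaves a harmless orphan rather than a metadata row pointing at a chunk that was never written. A minimal sketch, where the `s3` and `db` clients and their method names are placeholders rather than a real SDK:

```python
import hashlib

def upload_chunk(s3, db, chunk: bytes) -> str:
    """Write the chunk to object storage BEFORE committing metadata.

    Crash after step 1 but before step 2: an orphaned chunk in S3,
    reclaimable later by reconciliation. The reverse order could leave
    a dangling metadata reference that breaks downloads.
    """
    key = f"/chunks/{hashlib.sha256(chunk).hexdigest()}"
    s3.put(key, chunk)                     # step 1: content first
    db.insert_chunk_row(key, len(chunk))   # step 2: metadata commit
    return key
```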
We run a periodic reconciliation job that compares the S3 inventory against the chunk index and deletes orphans.

What if the interviewer asks 'Why not HDFS?' HDFS works well for batch analytics (large sequential reads) but adds operational complexity for a file storage system whose access pattern is random reads of individual chunks by hash. S3's HTTP API is operationally simpler.
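A minimal sketch of the reconciliation pass. The grace window is an assumption added here, not from the source: without it, the job could delete a chunk whose MySQL commit is simply still in flight.

```python
import time

GRACE_SECONDS = 24 * 3600  # assumed: longer than any in-flight upload

def reconcile(s3_inventory: dict, chunk_index: set, now: float = None) -> list:
    """Return orphaned chunk keys that are safe to delete.

    s3_inventory maps key -> last-modified time (epoch seconds).
    A chunk is an orphan only if no metadata row references it AND it
    is older than the grace window, so we never race a live upload.
    """
    now = time.time() if now is None else now
    return [key for key, mtime in s3_inventory.items()
            if key not in chunk_index and now - mtime > GRACE_SECONDS]
```

In production the S3 side would come from an inventory report rather than a live listing, since listing billions of objects per run is impractical.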
Why it matters in interviews
The metadata-versus-content split is the fundamental architectural decision. Interviewers want to hear why we separate them, how S3's hash-based keys enable dedup, and how we handle the consistency gap between the two stores.