Sharding the Prefix Space
Eighty gigabytes of prefix index and 200K origin requests/sec do not fit one machine's comfort zone, so the prefix space shards. The instinctive scheme, alphabetic ranges (node 1 takes a-c, node 2 takes d-f), fails immediately on skew, and the skew is enormous: prefixes starting with "a", "s", or "t" carry orders of magnitude more traffic than those starting with "x", "q", or "z", so the a-c node melts while the x-z node idles.
The working scheme: hash the full prefix onto the ring (the consistent-hashing machinery from topic 10, verbatim). "we" and "wea" typically land on different shards, which costs prefix locality (irrelevant, since lookups are exact-key gets, never range scans) and buys statistical smoothing: hot and cold prefixes distribute evenly across the fleet.
Each shard holds its slice in RAM (~7 GB per shard at 12 shards) and replicates 3x. Replication here is for read throughput and availability, not durability: the snapshot in object storage is the durable copy, so a lost node is re-hydrated from the artifact, not from a peer. Residual heat still exists, since a single prefix ("t" during a Swift-related event) can outrun even its 3 replicas, but that is precisely what the cache layers above absorb: the edge serves the head, the in-process hot map serves the shoulders, and shards see the Zipf tail plus cache misses, a far flatter distribution than raw traffic.
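The hash-onto-the-ring placement described above can be sketched roughly as follows. This is a minimal illustration, not the walkthrough's actual implementation: the `HashRing` class, the `shard-N` names, the SHA-256-based hash, and the 128 virtual nodes per shard are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash64(key: str) -> int:
    """Stable 64-bit hash. Python's built-in hash() is salted per process,
    so it cannot be used for placement that must agree across machines."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring (hypothetical; vnode count is an assumption)."""

    def __init__(self, shards, vnodes=128):
        # Each shard owns many points on the ring to smooth out arc sizes.
        points = []
        for shard in shards:
            for i in range(vnodes):
                points.append((_hash64(f"{shard}#{i}"), shard))
        points.sort()
        self._keys = [p[0] for p in points]
        self._owners = [p[1] for p in points]

    def shard_for(self, prefix: str) -> str:
        # Owner is the first ring point clockwise from the prefix's hash,
        # wrapping around to the start of the ring at the end.
        idx = bisect.bisect_right(self._keys, _hash64(prefix)) % len(self._keys)
        return self._owners[idx]


ring = HashRing([f"shard-{n}" for n in range(12)])

# The full prefix is the key, so "we" and "wea" are placed independently.
print(ring.shard_for("we"), ring.shard_for("wea"))
```

Because placement depends only on the hash of the whole prefix, adjacent prefixes share a shard only by coincidence (roughly one time in twelve at 12 shards), which is why a hot first letter no longer concentrates on one node.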
Snapshot shipping aligns with the sharding: each build is pre-partitioned by the same hash, so a node downloads only its shard's file. Resharding (12 to 24 shards) is a build-time decision: the next snapshot is simply cut differently and the fleet swaps to it. There is no live migration, another dividend of immutability.
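Cutting a build into per-shard files might look something like the sketch below. It is deliberately simplified: it uses plain hash-modulo placement as a stand-in for the ring, and `shard_index`, `cut_snapshot`, and the tiny `build` map are hypothetical names and data invented for illustration.

```python
import hashlib
from collections import defaultdict


def shard_index(prefix: str, num_shards: int) -> int:
    """Stable placement: the builder and the serving nodes must agree on this."""
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def cut_snapshot(entries: dict, num_shards: int) -> dict:
    """Split one build's prefix -> suggestions map into per-shard partitions.
    Each partition would be written out as that shard's snapshot file."""
    parts = defaultdict(dict)
    for prefix, suggestions in entries.items():
        parts[shard_index(prefix, num_shards)][prefix] = suggestions
    return dict(parts)


# Hypothetical toy build; a real one would hold the full prefix index.
build = {
    "we": ["weather"],
    "wea": ["weather"],
    "t": ["target"],
    "ta": ["target"],
}

current = cut_snapshot(build, 12)   # today's layout
resharded = cut_snapshot(build, 24)  # next build, simply cut differently
```

Nothing moves between live nodes in this picture: the 24-shard layout exists only as a new set of files, and each node in the new fleet loads the one partition it owns.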
What if the interviewer asks: why not shard by locale instead? Locale IS the outer partition (en-US and ja-JP are separate indexes entirely: different corpora, different blocklists); hash-sharding applies within each locale.