URL Shortener Cheat Sheet

Key concepts, trade-offs, and quick-reference notes for your interview prep.

#1: Keyspace Size

We chose 7 characters (not 6 or 8) because 62^7 ≈ 3.5 trillion unique codes. At 100M new URLs per day, that keyspace lasts 95+ years before exhaustion. Trade-off: longer codes waste screen space in mobile UIs; shorter codes exhaust the keyspace faster.

💡 Show the math in the interview: 3.5T / 100M per day = 35,000 days = 95 years.
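
A few lines of arithmetic reproduce these numbers (an illustrative sketch; the alphabet size, code length, and 100M/day write rate are the figures quoted above):

```python
# Back-of-envelope keyspace math for 7-character Base62 codes.
ALPHABET_SIZE = 62            # a-z, A-Z, 0-9
CODE_LENGTH = 7
NEW_URLS_PER_DAY = 100_000_000

keyspace = ALPHABET_SIZE ** CODE_LENGTH          # 3,521,614,606,208 (~3.5 trillion)
days = keyspace / NEW_URLS_PER_DAY               # ~35,000 days
print(f"{keyspace:,} codes, ~{days / 365:.0f} years of runway")
```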

#2: Counter Range Pre-allocation

We chose range pre-allocation (not a single global counter) because a single Redis INCR caps at ~10K RPS under contention. Each app server grabs a range of 10,000 IDs from ZooKeeper upfront. After that, it increments locally with zero coordination on the hot path. Trade-off: if a server crashes mid-range, we lose the unused IDs in that range, creating small gaps in the keyspace.

💡 Counter ranges eliminate distributed locks. Mention ZooKeeper or etcd as the range allocator.
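
A minimal sketch of the idea, with a toy in-process allocator standing in for the ZooKeeper/etcd fetch-and-add (class and function names are illustrative, not from any real library):

```python
import threading

RANGE_SIZE = 10_000  # from the text: each app server leases 10,000 IDs at a time

class RangeCounter:
    """Hands out unique IDs from a locally held range; refills from a remote allocator."""

    def __init__(self, allocate_range):
        # allocate_range() must atomically reserve RANGE_SIZE ids and return the range start,
        # e.g. a fetch-and-add on a ZooKeeper/etcd counter (not shown here).
        self._allocate_range = allocate_range
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0  # exclusive upper bound of the current range

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:      # range exhausted: one remote call per 10,000 IDs
                start = self._allocate_range()
                self._next, self._end = start, start + RANGE_SIZE
            self._next += 1
            return self._next - 1

# Toy in-process allocator standing in for the remote coordination service (illustrative only).
_state = {"value": 0}
def fake_allocator() -> int:
    start = _state["value"]
    _state["value"] += RANGE_SIZE
    return start

counter = RangeCounter(fake_allocator)
print([counter.next_id() for _ in range(3)])   # [0, 1, 2] with no remote call after the first
```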

#3: Read-to-Write Ratio

We derived a 10:1 ratio: ~289K read RPS vs ~29K write RPS. This ratio alone tells us the architecture must be cache-first. Without a cache layer, we would need 3+ MySQL read replicas for throughput alone.

💡 Derive from DAU: 500M users x 50 reads / 86,400 seconds = 289K.
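
The same derivation as code, using the numbers from the tip above:

```python
# Traffic estimate: 500M DAU x 50 redirects each, spread over a day.
dau = 500_000_000
reads_per_user_per_day = 50
seconds_per_day = 86_400

read_rps = dau * reads_per_user_per_day / seconds_per_day   # ~289,352
write_rps = read_rps / 10                                    # 10:1 read-to-write ratio -> ~29K
print(f"reads: ~{read_rps:,.0f}/s, writes: ~{write_rps:,.0f}/s")
```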

#4: Cache Hit Ratio Target

We target a ≥95% Cache Hit Ratio (CHR). The top 20% of URLs attract 80% of traffic (Pareto distribution). That hot set fits in ~2.7 GB of Redis (not Memcached, because we want persistence for cache warmup after a restart). Trade-off: we accept ~14K cache-miss reads/sec hitting MySQL, which is within a single node's capacity.

💡 Alert when CHR drops below 90%. That is an early sign of cache stampede or key expiry wave.
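
A minimal cache-aside read path, assuming the `redis-py` client and a caller-supplied `db_lookup` function standing in for the MySQL query (both the connection defaults and the helper names are illustrative):

```python
import redis  # pip install redis

r = redis.Redis()            # assumes a local Redis; connection details are illustrative
BASE_TTL = 24 * 3600         # seconds

def resolve(short_code: str, db_lookup) -> str | None:
    """Cache-aside read: serve ~95% of redirects from Redis, fall through to MySQL on a miss."""
    cached = r.get(short_code)
    if cached is not None:
        return cached.decode()
    long_url = db_lookup(short_code)               # the ~5% of reads that reach MySQL
    if long_url is not None:
        r.set(short_code, long_url, ex=BASE_TTL)   # populate the cache on the way back
    return long_url
```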

#5: 301 vs 302 Redirect

We chose 302 (Found) as the default (not 301) because we need analytics. With 302, the browser hits our server on every click. We offer 301 (Moved Permanently) as an opt-in for high-traffic links where the owner trades analytics visibility for up to 60% less server load on repeat visitors. Trade-off: 302 means higher server cost, but we get click-level tracking data.
⚠️ Pitfall: using 301 when you need click analytics. The browser caches the redirect and skips your server entirely, so you lose all visibility into click volume.
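
A sketch of the redirect endpoint, using Flask purely for illustration; `resolve` and `record_click` are stand-in helpers, not part of any stated design:

```python
from flask import Flask, abort, redirect

app = Flask(__name__)

def resolve(short_code: str):        # stand-in for the cache-aside lookup above
    return {"abc1234": "https://example.com/very/long/path"}.get(short_code)

def record_click(short_code: str):   # stand-in for the analytics pipeline
    pass

@app.route("/<short_code>")
def follow(short_code: str):
    long_url = resolve(short_code)
    if long_url is None:
        abort(404)
    record_click(short_code)                  # only possible because 302 is not cached by the browser
    return redirect(long_url, code=302)       # use code=301 for opt-in "permanent" high-traffic links
```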

#6: Storage Growth Rate

Each URL record averages ~135 bytes (short_code 7B + long_url 100B + user_id 8B + timestamps 16B + metadata 4B). At 100M new URLs per day, that is ~13.5 GB/day, roughly 5 TB/year. With max-length URLs (~500B per record), plan for ~18 TB/year. We chose to shard from day one because a single MySQL node handles ~2 TB comfortably before index scans slow down.

💡 Break down per-record storage to individual fields. Interviewers love byte-level reasoning.
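
The field-by-field breakdown as arithmetic (field sizes are the ones quoted above):

```python
# Per-record storage estimate, field by field (bytes).
record = {
    "short_code": 7,
    "long_url":   100,   # average; worst case ~500B for max-length URLs
    "user_id":    8,
    "timestamps": 16,
    "metadata":   4,
}
bytes_per_record = sum(record.values())            # ~135 B
per_day = bytes_per_record * 100_000_000           # ~13.5 GB/day
per_year = per_day * 365                           # ~5 TB/year
print(f"{bytes_per_record} B/record, {per_day / 1e9:.1f} GB/day, {per_year / 1e12:.1f} TB/year")
```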

#7: Sharding Strategy

We shard by short_code hash (not user_id) across 16 shards. Counter-based Base62 codes are sequential, so their hashes distribute evenly across shards. User_id sharding would mean every redirect requires a scatter-gather across all shards, because the read path looks up by short_code. Trade-off: listing all URLs for a given user now requires a fan-out query across 16 shards, but this is a low-frequency operation.
⚠️ Pitfall: sharding by user_id creates hot shards when one user generates millions of URLs, and the read path looks up by short_code, not user_id.
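
A minimal routing sketch, assuming 16 shards and a stable (non-randomized) hash of the short code; the names are illustrative:

```python
import hashlib

NUM_SHARDS = 16

def shard_for(short_code: str) -> int:
    """Route by a stable hash of short_code so sequential Base62 codes spread evenly."""
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Every redirect touches exactly one shard; listing a user's URLs fans out to all 16.
print(shard_for("abc1234"))
```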

#8: Bloom Filter for Dedup

We chose a Bloom filter (not a hash set) because it needs only ~10 bits per element at a 1% false-positive rate. For 10B URLs, a hash set would need ~80 GB (8B per key); the Bloom filter needs only ~12 GB. It eliminates 85% of redundant DB lookups on writes. Trade-off: the 1% false-positive rate means the filter occasionally flags a brand-new URL as a likely duplicate, so we must fall back to a DB check to confirm.

💡 Bloom filters answer 'definitely not in set' or 'probably in set'. They never produce false negatives.
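
The sizing behind the ~12 GB figure, using the standard Bloom filter formulas with n = 10B elements and p = 1% false-positive rate:

```python
import math

n = 10_000_000_000      # expected URLs
p = 0.01                # target false-positive rate

m_bits = -n * math.log(p) / (math.log(2) ** 2)   # optimal filter size in bits (~9.6 bits/element)
k = (m_bits / n) * math.log(2)                   # optimal number of hash functions (~7)

print(f"{m_bits / n:.1f} bits/element, {m_bits / 8 / 1e9:.1f} GB total, k = {k:.0f} hashes")
# ~9.6 bits/element -> ~12 GB, vs ~80 GB for an 8-byte-per-key hash set
```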

#9: TTL Jitter

We randomize cache TTL by ±10% to prevent synchronized expiration. A base TTL of 24 hours becomes 21.6 to 26.4 hours per key. Without jitter, popular keys loaded in the same batch all expire at the same second, flooding MySQL with cache misses. Trade-off: slightly less predictable cache behavior, but we avoid coordinated expiry storms.
⚠️ Pitfall: setting an identical TTL for all keys. Popular keys all expire at the same second, flooding the database with cache misses.
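
A small sketch of jittered TTLs (the 24-hour base and ±10% band are the figures from above):

```python
import random

BASE_TTL = 24 * 3600           # seconds
JITTER = 0.10                  # ±10%

def jittered_ttl() -> int:
    """Spread expirations across 21.6h-26.4h so a batch of hot keys never expires together."""
    return int(BASE_TTL * random.uniform(1 - JITTER, 1 + JITTER))

# e.g. r.set(short_code, long_url, ex=jittered_ttl()) instead of a fixed 24h TTL
print(jittered_ttl())
```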

#10: Write-Ahead Log for Counter

We persist each allocated range to a WAL before acknowledging (not an in-memory-only counter) because a crash without a WAL can reissue IDs that were already handed out, causing collisions. Sequential I/O makes WAL writes ~10x faster than random I/O. Trade-off: each range allocation pays a ~1ms fsync cost, but we guarantee no ID reuse after recovery.

💡 On recovery, resume allocation after the last range recorded in the WAL; without one, a crashed counter can reissue IDs that were already handed out, causing collisions.
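
A minimal sketch of the persist-then-acknowledge step, with an illustrative file path and record format (not a specific WAL library):

```python
import os

WAL_PATH = "counter_ranges.wal"   # illustrative path

def persist_range(start: int, size: int) -> None:
    """Append the leased range to the WAL and fsync before handing IDs to the app server."""
    with open(WAL_PATH, "a") as wal:
        wal.write(f"{start} {size}\n")      # sequential append: the cheap part
        wal.flush()
        os.fsync(wal.fileno())              # the ~1ms durability cost mentioned above

def recover_next_start() -> int:
    """On restart, skip past every range ever handed out so no ID can be reissued."""
    if not os.path.exists(WAL_PATH):
        return 0
    last_end = 0
    with open(WAL_PATH) as wal:
        for line in wal:
            start, size = map(int, line.split())
            last_end = max(last_end, start + size)
    return last_end
```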